Executive Overview: Mid-Term Outlook for Artificial Intelligence
This overview includes the trends to watch, overarching principles, and the impact on publishing and other industries.
We begin with a discussion of short-term concepts such as generative AI, GPT, Llama, and PaLM, exploring how language models can transform content from one medium to another. This includes generative audio and imagery technologies like Stable Diffusion, MidJourney, Adobe Firefly, and HeyGen. We also cover the paid benefits of ChatGPT Plus, including plugins and Advanced Data Analysis, followed by speech-to-text technologies like Whisper and Otter.ai.
Next, we delve into topics expected to emerge in AI over the mid-term. These include the concept of agency with connected and collaborative AI working on our behalf. We then examine multimodality and multimedia within a single interface. Finally, we explore robot embodiment and LLMs (Large Language Models) as interfaces to physical machines.
Looking further ahead, we discuss the decline of traditional web metrics such as page views, the declining role of web browsers, and the horizon of Artificial General Intelligence (AGI).
2.
Short Term – Bulk of discussion:
• Generative AI models: GPT, Llama, PaLM
• ENPS -> web articles
• Box scores -> articles
• Audio creation: ElevenLabs
• Media: MidJourney, Adobe Firefly, HeyGen
• ChatGPT plugins
• Advanced Data Analysis: spreadsheets, insights, code writing
• Speech to text: Whisper, Otter.ai, captions
Mid Term (6-24 months)
Long Term
THIS IS WHAT I WANT TO TALK ABOUT
4. Mid Term – Determine our AI Acumen
• Can you name a few LLMs other than ChatGPT?
• You don’t need to know them… but be aware.
• Llama, PaLM, etc.
5. Mid Term – Generative AI is a LOT more than ChatGPT
[Chart legend: Google’s models are light blue; Meta’s are dark blue. Axis label: Proficiency]
6. Mid Term – Determine our AI Acumen
• Can you name a few LLMs other than ChatGPT?
• You don’t need to know them… but be aware.
• Llama, PaLM, etc.
• Do you have ChatGPT Plus?
• If so, have you used Code Interpreter/Advanced Data Analysis?
• Have you experimented with plugins?
• That’s the difference between a “chit chat bot” and AGENCY, where the language model is an interface to data that you speak to conversationally
7.
Short Term:
• Generative AI models: GPT, Llama, PaLM
• ENPS -> web articles
• Box scores -> articles
• Audio creation: ElevenLabs
• Media: MidJourney, Adobe Firefly, HeyGen
• ChatGPT plugins
• Advanced Data Analysis: spreadsheets, insights, code writing
• Speech to text: Whisper, Otter.ai, captions
Mid Term:
• Agency
Long Term
8. Mid Term – Agency
• Generative AI bots are so good that we THINK THEY ARE ANSWER-BOTS
• On their own… they can pass IQ tests, take the bar exam, etc.
• GPT-4 does so well on some tests that we cannot create a test to measure it!
• Like a punching bag game that goes to 1000… GPT-4 is scoring 1000… but if we can’t measure anything stronger… we don’t know if its strength is 1000 or 10,000… or 100,000
10. Mid Term – Agency
• Generative AI bots are so good that we THINK THEY ARE ANSWER-BOTS
• On their own… they can pass IQ tests, take the bar exam, etc.
• However, when we use them as INTERFACES, connect them with DATA, and give them the ability to reflect and refine (avoid hallucinations, check their work), access the internet, book flights, change hotel rooms… that’s an AGENT. We’re already using them… but they will get more powerful.
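The idea of a model acting as an interface to tools can be sketched in a few lines. This is only an illustration under stated assumptions: `fake_model`, `search_flights`, `book_hotel`, and `run_agent` are hypothetical names invented here, and the "model" is a keyword matcher standing in for a real LLM. What it shows is the shape of the loop: the model proposes steps, the program checks each step against the tools it actually has, and only then acts.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools: plain functions the agent is allowed to call.
def search_flights(destination: str) -> str:
    return f"Found 2 flights to {destination}"

def book_hotel(city: str) -> str:
    return f"Booked a room in {city}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "search_flights": search_flights,
    "book_hotel": book_hotel,
}

def fake_model(request: str) -> List[Tuple[str, str]]:
    """Stand-in for an LLM (assumption): turns a request into (tool, argument) steps.

    A real system would ask a language model to produce this plan.
    Here the last word of the request is treated as the city.
    """
    words = request.rstrip(".!?").split()
    city = words[-1] if words else ""
    text = request.lower()
    plan: List[Tuple[str, str]] = []
    if "flight" in text:
        plan.append(("search_flights", city))
    if "hotel" in text:
        plan.append(("book_hotel", city))
    return plan

def run_agent(request: str) -> List[str]:
    """Run each planned step, checking that the named tool really exists."""
    results: List[str] = []
    for tool_name, argument in fake_model(request):
        tool = TOOLS.get(tool_name)
        if tool is None:
            # "Check their work": skip steps that name a tool we do not have.
            results.append(f"Skipped unknown tool: {tool_name}")
            continue
        results.append(tool(argument))
    return results

if __name__ == "__main__":
    for line in run_agent("Find me a flight and a hotel in Chicago"):
        print(line)
```

A request that matches no tool simply produces an empty plan, which is the "chit chat bot" case: nothing is connected, so nothing gets done.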
11. This is the agent
The Agents are Coming… via Enterprise LLM
“Everything That Can Be LLM-ed, Will Be”
Imagine ‘talking with’ 30 years of ENPS scripts…
14. Mid Term – Agency – Death of the Page View
“The advertising model as we know it — getting people to go to your website and view it
— that’s going to break…I don’t see how that sustains” –Jeremiah Owyang
As consumers increasingly use automated tools to go through the marketing “funnel,” marketers and creators need to consider something that many might think is counterintuitive: you actually want (no, need) LLMs to train on your data.
“If I was a journalist, I would want my articles ingested by all of the LLMs,” he
explained, adding that more and more chatbots are including citations, including Bing,
You.com and Perplexity. “So when people search for that information, I show up first —
it’s the same as SEO strategy,” he said — cautioning that this would not apply to gated
content, which employs a different business model.
https://venturebeat.com/ai/forget-seo-why-ai-engine-optimization-may-be-the-future/
Never leave the chat = Death of the browser
Elon Musk, Bill Gates, Mark Zuckerberg, and Marc Andreessen agree.
15.
Short Term:
• Generative AI models: GPT, Llama, PaLM
• ENPS -> web articles
• Box scores -> articles
• Audio creation: ElevenLabs
• Media: MidJourney, Adobe Firefly, HeyGen
• ChatGPT plugins
• Advanced Data Analysis: spreadsheets, insights, code writing
• Speech to text: Whisper, Otter.ai, captions
Mid Term:
• Agency
• Multimodal
Long Term
16. Mid Term – Multimodal
• Currently, Large Language Models are thought of as single modes:
• ChatGPT = text
• MidJourney = images
• Etc.
• Next gen LLMs will be multimodal:
• Multimodal is the ability to work with and understand ANY media:
• Text, images, video, data
• Image to speech, text to video, etc.
• ChatGPT AGENTS like Code Interpreter can use tools like OCR to read PDFs or write Python to create tables and graphs
• That’s not the same as being multimodal.
17. Mid Term – Multimodal
Actual interaction with Google DeepMind’s Flamingo in April 2022 (17 months ago)
• How many people are in this photo?
• “There are at least five people in this photo”
• How many mirrors are in this photo?
• “There are at least two mirrors”
• What is the person on the scale doing?
• “He is looking at the scale”
• Where is Obama’s foot positioned?
• “Obama’s foot is on the right side of the scale”
• What happens as a result?
• “The scale shows a higher weight”
• Is the person on the scale aware of it?
• “He is not aware of it.”
https://www.youtube.com/watch?v=zRYcKhkAsk4
18. Mid Term – Multimodal
• Currently, Large Language Models are thought of in single modes:
• ChatGPT = text to text
• MidJourney = text to images
• Next gen LLMs will be Multimodal:
• ChatGPT Plus Code Interpreter can use tools like OCR to read PDFs or write Python to create tables and graphs
• That’s not the same as being multimodal.
• Multimodal is the ability to understand ANY media:
• Text, images, video, data
• Image to speech, text to video, etc.
• No more need for metadata, keywords? The AP already has the ability to search its entire archive this way, e.g. “Find all of the photos of Churchill on a roof smoking a cigar” (actual example)
19.
Short Term:
• Generative AI models: GPT, Llama, PaLM
• ENPS -> web articles
• Box scores -> articles
• Audio creation: ElevenLabs
• Media: MidJourney, Adobe Firefly, HeyGen
• ChatGPT plugins
• Advanced Data Analysis: spreadsheets, insights, code writing
• Speech to text: Whisper, Otter.ai, captions
Mid Term:
• Agency
• Multimodal
• Embodiment
Long Term
20. Mid Term – Embodiment
• Using a Large Language Model as an interface to accomplish tasks on a network is AGENCY
• Using a Large Language Model as an interface with a physical robot or machine is EMBODIMENT
21. Mid Term – Embodiment
The robot’s program does not need to know what rice chips are… nor where the drawer is… the LLM works with the video sensors and derives it… and commands it.
Again... “mid-term” is now.
“According to Google, when given a high-level command, such as bring me the rice chips from the drawer, PaLM-E can generate a plan of action for a mobile robot and execute the actions by itself.” – March 2023
22. Mid Term – Embodiment
• Using a Large Language Model as an interface to accomplish tasks online is AGENCY
• Using a Large Language Model as an interface with a robot or machine is EMBODIMENT
• The human says a command in plain English.
• The LLM translates the command into the code required to execute.
• If you had a Big Trak or used the LOGO programming educational tool, imagine just talking to “the turtle” as opposed to having to program it line by line.
• Recently, a large language model embedded in a robot dog was able to INTUITIVELY figure out the dog’s programming code and guide the dog to do plain English tasks like “Go to the living room and bring me the red sock off of the couch”
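The "talk to the turtle" idea above can be sketched as a toy program. Everything here is an assumption made for illustration: `translate` is a simple pattern matcher standing in for the LLM, the command names (`forward`, `left`, `right`) are a made-up LOGO-like vocabulary, and `run` simulates a turtle on paper rather than driving a real robot. The point is only the two-step shape: plain English in, machine commands out, then execution.

```python
import math
import re
from typing import List, Tuple

Command = Tuple[str, float]

def translate(sentence: str) -> List[Command]:
    """Stand-in for the LLM (assumption): plain English -> turtle commands.

    Only understands phrases like 'forward 10', 'left 90', 'right 45'.
    """
    pattern = r"(forward|left|right)\s+(\d+)"
    return [(verb, float(amount))
            for verb, amount in re.findall(pattern, sentence.lower())]

def run(commands: List[Command],
        x: float = 0.0, y: float = 0.0, heading: float = 0.0):
    """Simulate the turtle. Heading is in degrees, 0 = east, left = counterclockwise."""
    for verb, amount in commands:
        if verb == "forward":
            x += amount * math.cos(math.radians(heading))
            y += amount * math.sin(math.radians(heading))
        elif verb == "left":
            heading = (heading + amount) % 360
        elif verb == "right":
            heading = (heading - amount) % 360
    # Round away tiny floating-point noise from the trigonometry.
    return round(x, 6), round(y, 6), heading

if __name__ == "__main__":
    spoken = "Go forward 10, turn left 90, then go forward 5"
    plan = translate(spoken)
    print(plan)
    print(run(plan))
```

In a real embodied system the translation step is where the language model does the hard part: the robot's own program never needs to know what "the red sock" is.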
24.
Short Term:
• Generative AI models: GPT, Llama, PaLM
• ENPS -> web articles
• Box scores -> articles
• Audio creation: ElevenLabs
• Media: MidJourney, Adobe Firefly, HeyGen
• ChatGPT plugins
• Advanced Data Analysis: spreadsheets, insights, code writing
• Speech to text: Whisper, Otter.ai, captions
Mid Term:
• Agency
• Multimodal
• Embodiment
• Artificial General Intelligence (more toward “long term”)
Long Term
25. Long Term – Artificial General Intelligence (AGI)
• OpenAI defines AGI as “highly autonomous systems that outperform humans at most economically valuable work”
• A combination of multimodality, intelligence, agency, embodiment… and exponential improvement.
• AGI Predictions
• DeepMind CEO: A few years
• Alan Thompson: 2 years
• Elon Musk: 5 years
• Ray Kurzweil: 6 years
• Geoffrey Hinton: 5-10 years
• Sam Altman: <10 years
26. 10 Months of “otter on a plane using wifi”
• October 2022
• November 2022
• March 2023
Let’s revisit short term, the novelty stage…
Source: MidJourney & Ethan Mollick
27. Long term sneaks up on us…
• 1964: 1 megaflop = $5M (CDC 6600)
• 1985: 1.9 gigaflops = $15M (Cray-2)
• 1997: 1 teraflop = $46M (ASCI Red)
• 2023: 35 teraflops = $999 (iPhone 15)
Source: Ethan Mollick
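To make the trend concrete, the four data points above can be converted to a common unit, dollars per gigaflop. This is a quick back-of-the-envelope sketch, not a rigorous comparison: it uses the slide's figures as given, in nominal dollars with no inflation adjustment, and the function name `dollars_per_gigaflop` is just a label chosen here.

```python
# (year, machine, floating-point operations per second, price in USD),
# taken as quoted on the slide; not independently verified.
MACHINES = [
    (1964, "CDC 6600", 1e6, 5_000_000),
    (1985, "Cray-2", 1.9e9, 15_000_000),
    (1997, "ASCI Red", 1e12, 46_000_000),
    (2023, "iPhone 15", 35e12, 999),
]

def dollars_per_gigaflop(flops: float, price: float) -> float:
    """Price divided by performance expressed in gigaflops (1e9 FLOPS)."""
    return price / (flops / 1e9)

if __name__ == "__main__":
    for year, name, flops, price in MACHINES:
        cost = dollars_per_gigaflop(flops, price)
        print(f"{year} {name}: ${cost:,.4f} per gigaflop")
```

On these numbers the cost falls from roughly $5 billion per gigaflop in 1964 to under 3 cents in 2023, which is the "sneaks up on us" point of the slide.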
28.
Short Term:
• Generative AI models: GPT, Llama, PaLM
• ENPS -> web articles
• Box scores -> articles
• Audio creation: ElevenLabs
• Media: MidJourney, Adobe Firefly, HeyGen
• ChatGPT plugins
• Advanced Data Analysis: spreadsheets, insights, code writing
• Speech to text: Whisper, Otter.ai, captions
Mid Term:
• Agency
• Multimodal
• Embodiment
Long Term:
• Artificial General Intelligence
These are my foundations for thinking about AI
Thank you.
Editor's Notes
Partnering with RTDNA on policy and ethics
Met with the Associated Press AI team
Leading BLOX product advisory board
Presenting to the ITG in September
Created an in-house group to test and learn: EMPOWER
New models check their work and reflect before responding, catch errors, and suggest improvements. It’s jarring.
Written: transferring existing captions/scripts into articles, grammar, clarity, script critique, summaries/localization of documents
Sports: raw box scores into scripts
Audio: promotion voice track automation
Visual: contest and promotional clip art
Data: large dataset manipulation, insights, graphs, tables
Code: Excel, Python, HTML, JavaScript