From OpenAI
to Open Source AI
Navigating Between Commercial Ownership and Collaborative Openness
https://stateofopencon.com/ #stateofopencon #soocon24 #openuk
https://hachyderm.io/@openuk
Raphaël Semeteys (and Luxin Zhang) - Worldline
Introduction
Raphaël Semeteys
• Open source since 1997, professionally since 2004
• Yoga Teacher, Creator of the QSOS method
• Head of DevRel at Worldline
• 7000+ engineers in over 40 countries
• Managing 43+ billion transactions per year
• €250M spent in R&D every year
• Handling 150+ payment methods
We design payments technology that powers the growth of millions of businesses around the world
The early days of LLMs
From rule-based and simpler statistical models to LLMs
• 2010s: Word embeddings such as Word2Vec and GloVe
• 2017-2018: “Attention Is All You Need”, Transformers, BERT
• 2020s: Generative AI, ChatGPT, responsibility concerns
GenAI is having its Linux Moment
• Just like open source and the Internet, but much faster!
• Dynamics between collaborative openness and commercial ownership
• Need for clarity on licenses
Labs &
Universities
Individuals
Enterprises
Commodities
Defining Openness of an LLM
Components: Pre-training Dataset, Fine-tuning Dataset, Reward Model, Model, Data Processing Code
Defining Openness of an LLM
Components scored: Model (weights), Pre-training Dataset, Fine-tuning Dataset, Reward model, Data Processing Code
Score | Level | Description
0 | Closed | No access to any public information, data or asset
1 | Published research only | Research paper(s) published but with no more information, data or asset
2 | Restricted access | Access to asset is possible only with special agreement (commercial, research…)
3 | Open with limitations | Access and reuse of asset is possible but with certain limitations on usage (e.g. Open RAIL)
4 | Totally open | Access and reuse of asset is possible without restriction (e.g. open source license)
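The scale above can be encoded as a small data structure. What follows is a minimal Python sketch, not something shown in the talk: the names `Openness`, `ComponentScores` and `rated` are illustrative assumptions, and the example values describe a hypothetical model.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Openness(IntEnum):
    """The 0-4 openness scale from the table above."""
    CLOSED = 0
    PUBLISHED_RESEARCH_ONLY = 1
    RESTRICTED_ACCESS = 2
    OPEN_WITH_LIMITATIONS = 3
    TOTALLY_OPEN = 4


@dataclass
class ComponentScores:
    """One score per LLM component; None means the component was not rated."""
    model: Optional[Openness] = None
    pretraining_dataset: Optional[Openness] = None
    finetuning_dataset: Optional[Openness] = None
    reward_model: Optional[Openness] = None
    data_processing_code: Optional[Openness] = None

    def rated(self) -> dict:
        """Return only the components that received a score."""
        return {name: score for name, score in vars(self).items() if score is not None}


# Hypothetical model, used only to illustrate the structure.
example = ComponentScores(
    model=Openness.OPEN_WITH_LIMITATIONS,
    pretraining_dataset=Openness.PUBLISHED_RESEARCH_ONLY,
    data_processing_code=Openness.TOTALLY_OPEN,
)

for name, score in example.rated().items():
    print(f"{name}: {int(score)} ({score.name})")
```

Using an `IntEnum` keeps the levels comparable as plain integers, so scores for different components or models can be sorted or compared directly.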
Market-Leading Player: OpenAI
Deviation from original vision of research transparency & openness
Non/For-profit (US)
Component Score
Level
description
Model 4 Totally open
Dataset 1
Published
research
only
Code 1
Published
research
only
0 Closed

GPT-1 & 2 GPT-3 & 4
ChatGPT
research paper only
No training of other commercial LLMs
You may not: […] Use Output to develop
models that compete with OpenAI.
Market-Leading Player: Google
Transition from open research to proprietary commercial approach
Enterprise (US)
Component | BERT | PaLM 2 & Gemini
Model | 4 Totally open | 1 Published research only
Dataset | 2 Restricted access | 1 Published research only
Code | 4 Totally open | 0 Closed
Market-Leading Player: Meta
Journey to openness
Enterprise (US)
Component | RoBERTa | Llama 2
Model | 4 Totally open | 3 Open with limitations
Dataset | 3 Open with limitations | 1 Published research only
Code | 4 Totally open | 1 Published research only
Restriction on usage: license for platforms with 700+ M users
Additional Commercial Terms. If, on the Llama 2 version release date, the
monthly active users of the products or services made available by or for
Licensee, or Licensee’s affiliates, is greater than 700 million monthly active
users in the preceding calendar month, you must request a license from
Meta, which Meta may grant to you in its sole discretion, and you are not
authorized to exercise any of the rights under this Agreement unless or
until Meta otherwise expressly grants you such rights.
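The threshold in this clause can be read as a simple comparison. The sketch below is a deliberately simplified Python illustration: the function name and constant are assumptions made for this example, and it ignores the clause's details about affiliates and about the count being taken at the Llama 2 version release date.

```python
# "greater than 700 million monthly active users" in the quoted clause
LLAMA2_MAU_THRESHOLD = 700_000_000


def must_request_meta_license(mau_preceding_month: int) -> bool:
    """Return True when the monthly active users exceed the clause's threshold.

    Simplification: the caller is assumed to pass the combined figure for the
    licensee and its affiliates for the relevant calendar month.
    """
    return mau_preceding_month > LLAMA2_MAU_THRESHOLD


print(must_request_meta_license(5_000_000))
print(must_request_meta_license(900_000_000))
```

The comparison is strict because the clause says "greater than", so a service with exactly 700 million monthly active users would not cross the threshold under this reading.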
Llama offspring: Alpaca and Vicuna
Fine-tuned models from Llama 2 by universities
Research (US)
Component | Score | Level description
Model | 3 | Open with limitations
Pre-training Dataset | 1 | Published research only
Fine-tuning Dataset | 2 | Research use only
Code | 4 | Under Apache 2 license
Restrictions from both Llama 2 and OpenAI (ShareGPT)
Collaborative foundational LLMs
Dataset fuzziness: please refer to the specific license depending on the subset you use
Notion of responsible usage
Model | Organization | Model (weights) | Dataset | Code
EleutherAI GPT-J | Non-profit (US) | 4 Access and reuse without restriction | 3 Open with limitations | 4 Completely open
Falcon | Research (UAE) | 3 Open with limitations | 4 Access and reuse without restriction | 1 General instructions
BLOOM | Research (EU) | 3 Open RAIL license | 3 Open with limitations | 4 Completely open
OpenLLaMA | Research (US) | 4 Access and reuse without restriction | 4 Access and reuse without restriction | 1 Just examples
Mistral/Mixtral | Enterprise (FR) | 4 Access and reuse without restriction | 0 No public information or access | 4 Completely open
This license is, in part, based on the Apache License Version 2.0,
with a series of modifications. The contribution of the Apache
License 2.0 to the framing of this document is acknowledged.
Please read this license carefully, as it is different to other ‘open
access’ licenses you may have encountered previously. Use of
Falcon180B for hosted services may require a separate license.
Collaborative fine-tuned LLMs
Impact of foundational model or pre-training datasets
Model | Organization | Model (weights) | Pre-training Dataset | Fine-tuning Dataset | Reward model | Code
Dolly | Enterprise (US) | 4 Based on GPT-J | 3 Based on GPT-J | 4 Access and reuse without restriction | 0 No public information available | 4 Open source
BLOOMChat | Enterprise (US) | 3 Based on BLOOM | 3 Based on BLOOM | 4 Dolly and LAION | 0 No public information available | 3 OpenRAIL
Zephyr | Enterprise (US) | 4 Based on Mistral | 0 Based on Mistral | 2 Research use only (OpenAI) | 3 Paper and code examples | 3 Example code available
LLM360 | Consortium (UAE/US) | 4 Open source | 4 RedPajama, Falcon, StarCoder | 2 Research use only (OpenAI) | 0 No public information available | 4 Open source
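In this table, Dolly, BLOOMChat and Zephyr carry the same model and pre-training dataset scores as the foundational models they are based on. The Python sketch below illustrates that inheritance with the scores read from the slides; the function names, and the idea of reporting the lowest-scoring component, are assumptions made for this example rather than part of the talk.

```python
# Foundational model -> (model score, pre-training dataset score), as read
# from the "Collaborative foundational LLMs" slide.
BASE_SCORES = {
    "GPT-J": (4, 3),
    "BLOOM": (3, 3),
    "Mistral": (4, 0),
}

# Fine-tuned model -> (base model, fine-tuning dataset score), as read
# from the "Collaborative fine-tuned LLMs" slide.
FINE_TUNED = {
    "Dolly": ("GPT-J", 4),
    "BLOOMChat": ("BLOOM", 4),
    "Zephyr": ("Mistral", 2),
}


def inherited_scores(name: str) -> dict:
    """Combine the base model's scores with the fine-tuning dataset score."""
    base, finetuning_score = FINE_TUNED[name]
    model_score, pretraining_score = BASE_SCORES[base]
    return {
        "model": model_score,
        "pretraining_dataset": pretraining_score,
        "finetuning_dataset": finetuning_score,
    }


def weakest_component(scores: dict) -> str:
    """Hypothetical helper: name of the lowest-scoring component."""
    return min(scores, key=scores.get)


for name in FINE_TUNED:
    scores = inherited_scores(name)
    print(name, scores, "weakest:", weakest_component(scores))
```

Looking at the lowest score is one way to see where a fine-tuned model's restrictions come from: for Zephyr it is the pre-training dataset inherited from Mistral, not anything added during fine-tuning.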
BLOOMChat Use Restrictions
l. To provide medical advice and medical results interpretation; or
m. To generate or disseminate information for the purpose to be used for
administration of justice, law enforcement, immigration or asylum processes,
such as predicting an individual will commit fraud/crime
commitment.
Collaboration platform: Hugging Face
• Startup and ecosystem dedicated to democratizing AI
• Open source Transformers library
• LLM leaderboard: upload and assess models
• The “GitHub of AI”
• Collaborative space for exploring, sharing and experimenting with AI
• Hosts thousands of models, datasets, and demo applications
Enabler for collaboration and reuse
Hosting and resource paradigms
• Big players invest billions (Microsoft/OpenAI, AWS/Anthropic)
• CSPs are selling shovels in the AI gold rush
Source: numind.ai
Closed models are centralized and resource-consuming
Hosting and resource paradigms
• Democratizing AI Computing
• Quantization, AI Chips
• Run models locally, in containers
• Emergence of smaller models for edge and mobile
• Small/Tiny Language Models: Gemini nano, Microsoft Phi-2, Huawei TinyBERT
• Domain Specific Language Models: BloombergGPT, Harvey (law)
• Mixture of models: Mixtral 8x7B, OpenMoE → Mixture of licenses?
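Quantization helps models run locally mainly because it shrinks the memory needed for the weights. The Python sketch below gives a rough back-of-the-envelope estimate: the function name and the 7-billion-parameter size are assumptions chosen for illustration, and the figure covers the weights only, not activations, caches or per-block quantization metadata.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical 7-billion-parameter model at three common precisions.
for bits in (16, 8, 4):
    print(f"7B parameters at {bits}-bit: ~{weight_memory_gb(7e9, bits):.1f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the footprint by a factor of four, which is roughly the difference between needing a data-center GPU and fitting on a laptop.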
Key takeaways
• Hyper-centralization leads to black boxes and closed solutions
• Openness
• Fosters collaboration and fuels community-driven innovation
• Enables inclusivity
• Just as with open source software, beware of licenses and restrictions
• AI's democratization continually reshapes the landscape
Thank you
Raphaël Semeteys - Worldline
@RaphaelSemeteys
https://dev.to/raphiki
Check the two-part article co-written with Luxin Zhang
Image credits
• Opensource, Internet & GenAI evolution image generated with DALL-E
• Robot evolution from Freepik
• LLMs’ #parameters evolution from numind.ai
• Shovels in Gold rush image generated with DALL-E
• Logos from official websites
• Coffee cups from Freepik
#stateofopencon #soocon24 #openuk

SOOCon24 - From OpenAI to Opensource AI: Navigating Between Commercial Ownership and Collaborative Openness

  • 1. From OpenAI to Open Source AI Navigating Between Commercial Ownership and Collaborative Openness https://stateofopencon.com/ #stateofopencon #soocon24 #openuk https://hachyderm.io/@openuk Raphaël Semeteys (and Luxin Zhang) - Worldline
  • 2. Introduction Raphaël Semeteys • Open source since 1997, professionally since 2004 • Yoga Teacher, Creator of the QSOS method • Head of DevRel at Worldline 7000+ engineers in over 40 countries Managing 43+ billion transactions per year €250M spent in R&D every year Handling 150+ payment methods We design payments technology that powers the growth of millions of businesses around the world
  • 3. The early days of LLMs From rule-based and simpler statistical models to LLMs 2010’s 2020’s 2017-2018 Word embeddings such as Word2Vec and GloVe “Attention is All You Need" Transformers, BERT Generative AI, ChatGPT responsibility concerns
  • 4. GenAI is having its Linux Moment • Just like open source and Internet, bust much faster! • Dynamics between collaborative openness and commercial ownership • Need of clarity on licenses Labs & Universities Individuals Enterprises Commodities
  • 5. Defining Openness of a LLM Pre-training Dataset Fine-tuning Dataset Reward Model Model Data Processing Code
  • 6. Defining Openness of a LLM Score Level Description Model (weights) Pre- training Dataset Fine- tuning Dataset Reward model Data Processing Code 0 Closed No access to any public information, data or asset 1 Published research only Research papers(s) published but with no more information, data or asset 2 Restricted access Access to asset is possible only with special agreement (commercial, research…) 3 Open with limitations Access and reuse of asset is possible but with certain limitations on usage (ex. Open RAIL) 4 Totally open Access and reuse of asset is possible without restriction (ex. open source license)
  • 7. Market-Leading Player: OpenAI, Non/For-profit (US). Deviation from original vision of research transparency & openness. Component, score, level description: Model 4 (Totally open); Dataset 1 (Published research only); Code 1 (Published research only); 0 (Closed). Models shown: GPT-1 & 2; GPT-3 & 4; ChatGPT. Notes: research paper only; no training of other commercial LLMs.
  • 8. Market-Leading Player: OpenAI (same content as slide 7), with the added excerpt: “You may not: […] Use Output to develop models that compete with OpenAI.”
  • 9. Market-Leading Player: Google, Enterprise (US). Transition from open research to proprietary commercial approach. Component, score, level description. BERT: Model 4 (Totally open); Dataset 2 (Restricted access); Code 4 (Totally open). PaLM 2 & Gemini, in the same component order: 1 (Published research only); 1 (Published research only); 0 (Closed).
  • 10. Market-Leading Player: Meta, Enterprise (US). Journey to openness. Component, score, level description. RoBERTa: Model 4 (Totally open); Dataset 3 (Open with limitations); Code 4 (Totally open). Llama 2, in the same component order: 3 (Open with limitations); 1 (Published research only); 1 (Published research only). → Restriction on usage: license for platforms with 700+ M users.
  • 11. Market-Leading Player: Meta (same content as slide 10), with the added license excerpt: “Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.”
  • 12. Llama offspring: Alpaca and Vicuna, Research (US). Models fine-tuned from Llama 2 by universities. Component, score, level description: Model 3 (Open with limitations); Pre-training Dataset 1 (Published research only); Fine-tuning Dataset 2 (Research use only); Code 4 (Under Apache 2 license). Restrictions from both Llama 2 and OpenAI (ShareGPT).
  • 13. Collaborative foundational LLMs. EleutherAI GPT-J, Non-profit (US): Model 4 (access and reuse without restriction); Dataset 3 (open with limitations); Code 4 (completely open). Falcon, Research (UAE): Model 3 (open with limitations); Dataset 4 (access and reuse without restriction); Code 1 (general instructions). BLOOM, Research (EU): Model 3 (Open RAIL license); Dataset 3 (open with limitations); Code 4 (completely open). OpenLLaMA, Research (US): Model 4 (access and reuse without restriction); Dataset 4 (access and reuse without restriction); Code 1 (just examples). Mistral/Mixtral, Enterprise (FR): Model 4 (access and reuse without restriction); Dataset 0 (no public information or access); Code 4 (completely open). Notes: dataset fuzziness (please refer to the specific license depending on the subset you use); notion of responsible usage.
  • 14. Collaborative foundational LLMs (same content as slide 13), with the added license excerpts: “This license is, in part, based on the Apache License Version 2.0, with a series of modifications. The contribution of the Apache License 2.0 to the framing of this document is acknowledged. Please read this license carefully, as it is different to other ‘open access’ licenses you may have encountered previously.” “Use of Falcon180B for hosted services may require a separate license.”
  • 15. Collaborative fine-tuned LLMs. Impact of foundational model or pre-training datasets. Dolly, Enterprise (US): Model 4 (based on GPT-J); Pre-training Dataset 3 (based on GPT-J); Fine-tuning Dataset 4 (access and reuse without restriction); Reward model 0 (no public information available); Code 4 (open source). BLOOMChat, Enterprise (US): Model 3 (based on BLOOM); Pre-training Dataset 3 (based on BLOOM); Fine-tuning Dataset 4 (Dolly and LAION); Reward model 0 (no public information available); Code 3 (OpenRAIL). Zephyr, Enterprise (US): Model 4 (based on Mistral); Pre-training Dataset 0 (based on Mistral); Fine-tuning Dataset 2 (research use only, OpenAI); Reward model 3 (paper and code examples); Code 3 (example code available). LLM360, Consortium (UAE/US): Model 4 (open source); Pre-training Dataset 4 (RedPajama, Falcon, StarCoder); Fine-tuning Dataset 2 (research use only, OpenAI); Reward model 0 (no public information available); Code 4 (open source).
  • 16. Collaborative fine-tuned LLMs (same content as slide 15), with the added excerpt from the BLOOMChat Use Restrictions: “l. To provide medical advice and medical results interpretation; or m. To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment.”
  • 17. Collaboration platform: Hugging Face • Startup and ecosystem dedicated to democratizing AI • Open source Transformers library • LLM leaderboard: upload and assess models • The “GitHub of AI” • Collaborative space for exploring, sharing and experimenting with AI • Hosts thousands of models, datasets, and demo applications. Takeaway: enabler for collaboration and reuse.
  • 18. Hosting and resource paradigms • Big players invest billions (Microsoft/OpenAI, AWS/Anthropic) • CSP selling shovels in the AI Gold rush Source: numind.ai Closed models are centralized and resource-consuming
  • 19. Hosting and resource paradigms • Democratizing AI Computing • Quantization, AI Chips • Run models locally, in containers • Emergence of smaller models for edge and mobile • Small/Tiny Language Models: Gemini Nano, Microsoft Phi-2, Huawei TinyBERT • Domain Specific Language Models: BloombergGPT, Harvey (law) • Mixture of models: Mixtral 8x7B, OpenMoE → Mixture of licenses?
  • 20. Key takeaways • Hyper-centralization leads to black boxes and closed solutions • Openness: fosters collaboration and fuels community-driven innovation; enables inclusivity • Just like with open source software, beware of licenses and restrictions • AI's democratization continually reshapes the landscape
  • 21. Thank you Raphaël Semeteys - Worldline @RaphaelSemeteys https://dev.to/raphiki Check the two-part article co-written with Luxin Zhang
  • 22. Image credits • Opensource, Internet & GenAI evolution image generated with DALL-E • Robot evolution from Freepik • LLMs’ #parameters evolution from numind.ai • Shovels in Gold rush image generated with DALL-E • Logos from official websites • Coffee cups from Freepik
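
The 0 to 4 openness scale from slide 6 and the per-component scores from slide 13 lend themselves to a small data structure. The Python sketch below is only an illustration: the names `Openness`, `FOUNDATIONAL_SCORES`, `weakest_component` and `inherited_scores` are hypothetical, and reporting the lowest-scoring component as a headline is an assumption of this sketch rather than something the slides prescribe. The scores themselves are transcribed from slide 13.

```python
from enum import IntEnum


class Openness(IntEnum):
    """The 0-4 openness scale described on slide 6."""
    CLOSED = 0
    PUBLISHED_RESEARCH_ONLY = 1
    RESTRICTED_ACCESS = 2
    OPEN_WITH_LIMITATIONS = 3
    TOTALLY_OPEN = 4


# Scores transcribed from slide 13 (foundational LLMs).
# The component keys are illustrative names chosen for this sketch.
FOUNDATIONAL_SCORES = {
    "EleutherAI GPT-J": {"model": 4, "dataset": 3, "code": 4},
    "Falcon": {"model": 3, "dataset": 4, "code": 1},
    "BLOOM": {"model": 3, "dataset": 3, "code": 4},
    "OpenLLaMA": {"model": 4, "dataset": 4, "code": 1},
    "Mistral/Mixtral": {"model": 4, "dataset": 0, "code": 4},
}


def weakest_component(scores):
    """Return the (component, level) pair with the lowest score.

    Assumption of this sketch: the least open component is the most
    useful single summary. Ties go to the first component listed.
    """
    component = min(scores, key=scores.get)
    return component, Openness(scores[component])


def inherited_scores(base_name):
    """Copy the model and pre-training dataset scores of a base model,
    mirroring the "Based on ..." cells of slide 15."""
    base = FOUNDATIONAL_SCORES[base_name]
    return {"model": base["model"], "pre_training_dataset": base["dataset"]}


if __name__ == "__main__":
    for name, scores in FOUNDATIONAL_SCORES.items():
        component, level = weakest_component(scores)
        print(f"{name}: weakest component is {component} ({level.name})")
```

Calling `inherited_scores("Mistral/Mixtral")` yields a model score of 4 and a pre-training dataset score of 0, which matches the Zephyr column on slide 15. Keeping one score per component, instead of a single number per model, preserves the point the slides make: a model with open weights can still rest on an undisclosed dataset.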