Payments to grow your world
Unlocking AI: Navigating Open Source vs. Commercial Frontiers
Raphaël Semeteys
Head of DevRel, Senior Architect at Worldline
March 16th
Centrul Regional de Afaceri, Timișoara
We design payments technology that powers the growth of millions of businesses around the world.
• 7000+ engineers in over 40 countries
• Managing 43+ billion transactions per year
• €250M spent in R&D every year
• Handling 150+ payment methods
The early days of LLMs
From rule-based and simpler statistical models to LLMs
• 2010s: word embeddings such as Word2Vec and GloVe
• 2017-2018: “Attention Is All You Need”, Transformers, BERT
• 2020s: generative AI, ChatGPT, responsibility concerns
GenAI is having its Linux Moment
• Just like open source and the Internet, but much faster!
• Dynamics between collaborative openness and commercial ownership
• Need for clarity on licenses
Labs & Universities, Individuals, Enterprises, Commodities
Defining Openness of an LLM
• Model
• Pre-training Dataset
• Fine-tuning Dataset
• Reward Model
• Data Processing Code
Each component of an LLM (Model (weights), Pre-training Dataset, Fine-tuning Dataset, Reward model, Data Processing Code) is scored on the following scale:
0 (Closed): No access to any public information, data or asset
1 (Published research only): Research paper(s) published, but no further information, data or asset
2 (Restricted access): Access to the asset is possible only with a special agreement (commercial, research…)
3 (Open with limitations): Access and reuse of the asset are possible, but with certain limitations on usage
4 (Totally open): Access and reuse of the asset are possible without restriction on usage (e.g. an open source license)
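The scale maps naturally onto a small data structure. The Python sketch below is a minimal illustration, assuming one integer score per component; the names `Openness`, `LLMOpenness` and `least_open` are hypothetical, and highlighting the least open component is an illustrative choice rather than part of the scoring method presented here.

```python
from dataclasses import dataclass, fields
from enum import IntEnum


class Openness(IntEnum):
    """The 0-4 openness scale (hypothetical encoding of the table above)."""
    CLOSED = 0                   # no public information, data or asset
    PUBLISHED_RESEARCH_ONLY = 1  # paper(s) only
    RESTRICTED_ACCESS = 2        # access only with a special agreement
    OPEN_WITH_LIMITATIONS = 3    # reuse possible, with usage limitations
    TOTALLY_OPEN = 4             # e.g. an open source license


@dataclass(frozen=True)
class LLMOpenness:
    """One score per assessed component of an LLM."""
    model_weights: Openness
    pretraining_dataset: Openness
    finetuning_dataset: Openness
    reward_model: Openness
    data_processing_code: Openness

    def least_open(self) -> tuple[str, Openness]:
        """Return the lowest-scored component (the first one wins on ties)."""
        scores = {f.name: getattr(self, f.name) for f in fields(self)}
        name = min(scores, key=scores.__getitem__)
        return name, scores[name]


# Hypothetical profile, not one of the models discussed in this deck.
profile = LLMOpenness(
    model_weights=Openness.TOTALLY_OPEN,
    pretraining_dataset=Openness.OPEN_WITH_LIMITATIONS,
    finetuning_dataset=Openness.RESTRICTED_ACCESS,
    reward_model=Openness.CLOSED,
    data_processing_code=Openness.TOTALLY_OPEN,
)
print(profile.least_open())
```

Keeping one score per component, rather than a single aggregate, mirrors the point of the scale: a model can be totally open on weights yet closed on data.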
Market-Leading Player: OpenAI
Deviation from the original vision of research transparency and openness
Non/For-profit (US)
GPT-1 & 2: Model 4 (Totally open), Dataset 1 (Published research only), Code 1 (Published research only)
→ GPT-3 & 4, ChatGPT: 0 (Closed), research paper only
No training of other commercial LLMs: “You may not: […] Use Output to develop models that compete with OpenAI.”
Market-Leading Player: Google
Transition from open research to a pragmatic approach
Enterprise (US)
Model: 4 (Totally open) → 1 (Published research only) → 3 (Open with limitations)
Dataset: 2 (Restricted access) → 1 (Published research only) → 1 (Published research only)
Code: 4 (Totally open) → 0 (Closed) → 4 (Toolchain available)
“You may not use nor allow others to use Gemma or Model Derivatives to: [illegal activities, unlicensed practice of a profession, abuse, security bypass, promotion of hatred, abuse, violence, monitoring people without consent, misinformation/defamation, automated decisions concerning human rights and well-being, etc.]”
Responsible AI use restrictions contradict the Open Source Definition, which forbids discrimination against fields of endeavor
Market-Leading Player: Meta
Journey to openness
Enterprise (US)
RoBERTa: Model 4 (Totally open), Dataset 3 (Open with limitations), Code 4 (Totally open)
→ Model 3 (Open with limitations), Dataset 1 (Published research only), Code 1 (Published research only)
Restriction on usage: a separate license is required for platforms with more than 700M monthly active users
Additional Commercial Terms. If, on the Llama 2 version release date,
the monthly active users of the products or services made available by or
for Licensee, or Licensee’s affiliates, is greater than 700 million monthly
active users in the preceding calendar month, you must request a license
from Meta, which Meta may grant to you in its sole discretion, and you
are not authorized to exercise any of the rights under this Agreement
unless or until Meta otherwise expressly grants you such rights.
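Stripped of legal wording, the clause is a strict threshold test on a single figure measured at a fixed date. The sketch below is a simplified reading for illustration only. It assumes the monthly active users of the licensee and its affiliates, for the calendar month preceding the Llama 2 version release date, are already aggregated into one number. The function name is hypothetical, and the license text above remains authoritative.

```python
# Threshold from the Llama 2 "Additional Commercial Terms" quoted above.
LLAMA2_MAU_THRESHOLD = 700_000_000


def needs_license_request_to_meta(mau_preceding_month: int) -> bool:
    """Hypothetical helper: simplified check of the 700M MAU clause.

    mau_preceding_month: monthly active users of the products or services
    of the licensee and its affiliates, in the calendar month preceding the
    Llama 2 version release date (assumed already aggregated).
    """
    if mau_preceding_month < 0:
        raise ValueError("monthly active users cannot be negative")
    # "greater than 700 million": exactly 700 million does not trigger it.
    return mau_preceding_month > LLAMA2_MAU_THRESHOLD


for mau in (50_000_000, 700_000_000, 700_000_001):
    print(mau, needs_license_request_to_meta(mau))
```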
Llama offspring: Alpaca and Vicuna
Models fine-tuned from Llama by universities
Research (US)
Model: 3 (Open with limitations)
Pre-training Dataset: 1 (Published research only)
Fine-tuning Dataset: 2 (Research use only)
Code: 4 (under the Apache 2.0 license)
Restrictions from both Llama 2 and OpenAI (ShareGPT)
Collaborative foundational LLMs
• EleutherAI GPT-J (Non-profit, US): Model 4 (Access and reuse without restriction), Dataset 3 (Open with limitations), Code 4 (Completely open)
• Falcon (Research, UAE): Model 3 (Open with limitations), Dataset 4 (Access and reuse without restriction), Code 1 (General instructions)
• BLOOM (Research, EU): Model 3 (Open RAIL license), Dataset 3 (Open with limitations), Code 4 (Completely open)
• OpenLLaMA (Research, US): Model 4 (Access and reuse without restriction), Dataset 4 (Access and reuse without restriction), Code 1 (Just examples)
• Mistral (Enterprise, FR): Model 4 (Access and reuse without restriction), Dataset 0 (No public information or access), Code 4 (Completely open)
Dataset fuzziness: please refer to the specific license depending on the subset you use
Notion of responsible usage
This license is, in part, based on the Apache License Version 2.0, with a
series of modifications. The contribution of the Apache License 2.0 to
the framing of this document is acknowledged. Please read this license
carefully, as it is different to other ‘open access’ licenses you may have
encountered previously. Use of Falcon180B for hosted services may
require a separate license.
Collaborative fine-tuned LLMs
• Dolly (Enterprise, US): Model 4 (Based on GPT-J), Pre-training Dataset 3 (Based on GPT-J), Fine-tuning Dataset 4 (Access and reuse without restriction), Reward model 0 (No public information available), Code 4 (Open source)
• BLOOMChat (Enterprise, US): Model 3 (Based on BLOOM), Pre-training Dataset 3 (Based on BLOOM), Fine-tuning Dataset 4 (Dolly and LAION), Reward model 0 (No public information available), Code 3 (OpenRAIL)
• Zephyr (Enterprise, US): Model 4 (Based on Mistral), Pre-training Dataset 0 (Based on Mistral), Fine-tuning Dataset 2 (Research use only, OpenAI), Reward model 3 (Paper and code examples), Code 3 (Example code available)
• LLM360 (Consortium, UAE/US): Model 4 (Open source), Pre-training Dataset 4 (RedPajama, Falcon, StarCoder), Fine-tuning Dataset 2 (Research use only, OpenAI), Reward model 0 (No public information available), Code 4 (Open source)
Impact of foundational model or pre-training datasets
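The scores suggest that a fine-tuned model inherits openness limits from its base. Below is a minimal Python sketch of that idea, assuming two rules that the deck does not state explicitly: the weights of a derivative can be no more open than the base weights, and the pre-training dataset score is copied from the base model. Base scores come from the collaborative foundational LLMs slide; `BaseScores`, `inherited_scores` and the `own_release` input are hypothetical.

```python
from typing import NamedTuple


class BaseScores(NamedTuple):
    model: int    # openness of the base model weights (0-4)
    dataset: int  # openness of the base pre-training dataset (0-4)


# Foundational model scores from the "Collaborative foundational LLMs" slide.
BASES = {
    "GPT-J": BaseScores(model=4, dataset=3),
    "BLOOM": BaseScores(model=3, dataset=3),
    "Mistral": BaseScores(model=4, dataset=0),
}


def inherited_scores(base: str, own_release: int) -> dict[str, int]:
    """Upper bounds a fine-tuned model inherits from its base (assumed rules).

    own_release: the score the fine-tuner's own license would earn (0-4).
    """
    b = BASES[base]
    return {
        # A derivative cannot be more open than the weights it builds on.
        "model": min(b.model, own_release),
        # Fine-tuning reuses the base pre-training data as-is.
        "pretraining_dataset": b.dataset,
    }


# Assumption: each fine-tuner releases its own work fully openly (4).
for name, base in (("Dolly", "GPT-J"), ("BLOOMChat", "BLOOM"), ("Zephyr", "Mistral")):
    print(name, inherited_scores(base, own_release=4))
```

Under these assumptions the results line up with the Model and Pre-training Dataset scores on the slide (4 and 3 for Dolly, 3 and 3 for BLOOMChat, 4 and 0 for Zephyr), which is the impact of the foundational model the slide points to.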
BLOOMChat Use Restrictions
l. To provide medical advice and medical results interpretation; or
m. To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment.
Collaboration platform: Hugging Face
Enabler for collaboration and reuse
• Startup and ecosystem dedicated to democratizing AI
• Open source Transformers library
• LLM leaderboard: upload and assess models
• The “GitHub of AI”
• Collaborative space for exploring, sharing and experimenting with AI
• Hosts thousands of models, datasets, and demo applications
Hosting and resource paradigms
Closed models are centralized and resource-intensive
Big players invest billions (Microsoft/OpenAI, AWS/Anthropic)
Cloud service providers (CSPs) are selling shovels in the AI gold rush
Source: numind.ai
Hosting and resource paradigms
• Democratizing AI computing
• Quantization, AI chips
• Running models locally, in containers
• Emergence of smaller models for edge and mobile
• Small/Tiny Language Models: Gemini Nano, Microsoft Phi-2, Huawei TinyBERT
• Domain-Specific Language Models: BloombergGPT, BioMistral, Harvey (law)
• Mixture of experts (MoE): Mixtral 8x7B, OpenMoE → Mixture of licenses?
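Quantization is one of the main levers for running models locally: storing each weight as an 8-bit integer instead of a 32-bit float cuts weight memory roughly fourfold. The Python sketch below shows the simplest variant, symmetric per-tensor int8 quantization. It is illustrative only; production toolchains typically use finer-grained scales and lower bit widths, and the function names here are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w ≈ q * scale, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.33, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Rounding error per weight is at most half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```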
Key takeaways
• Hyper-centralization leads to black boxes and closed solutions
• Openness:
  • fosters collaboration and fuels community-driven innovation
  • enables inclusivity
• As with open source software, beware of licenses and restrictions
• GenAI’s innovation continually reshapes the landscape
Thank you
Raphaël Semeteys - Worldline
@RaphaelSemeteys
https://blog.worldline.tech
https://dev.to/raphiki
Check out the two-part article co-written with Luxin Zhang
Want to shape
how the world
pays & gets paid?
Explore our jobs in tech:
careers.worldline.com

I LOVE Tech 2024 - Unlocking AI: Navigating Open Source vs. Commercial Frontiers

  • 1. Payments to grow your world Unlocking AI: Navigating Open Source vs. Commercial Frontiers Raphaël Semeteys Head of DevRel, Senior Architect at Worldline March 16th Centrul Regional de Afaceri, Timișoara
  • 2. We design payments technology that powers the growth of millions of businesses around the world. 7000+ engineers in over 40 countries Managing 43+ billion transactions per year €250M spent in R&D every year Handling 150+ payment methods
  • 3. The early days of LLMs: from rule-based and simpler statistical models to LLMs. 2010s: word embeddings such as Word2Vec and GloVe; 2017-2018: “Attention Is All You Need”, Transformers, BERT; 2020s: generative AI, ChatGPT, responsibility concerns
  • 4. GenAI is having its Linux Moment • Just like open source and the Internet, but much faster! • Dynamics between collaborative openness and commercial ownership • Need for clarity on licenses. Labs & Universities, Individuals, Enterprises, Commodities
  • 5. Defining Openness of an LLM: Pre-training Dataset, Fine-tuning Dataset, Reward Model, Model, Data Processing Code
  • 6. Defining Openness of an LLM
    Each component (Model (weights), Pre-training Dataset, Fine-tuning Dataset, Reward model, Data Processing Code) is scored on this scale:
    0 Closed: No access to any public information, data or asset
    1 Published research only: Research paper(s) published but with no further information, data or asset
    2 Restricted access: Access to the asset is possible only with a special agreement (commercial, research…)
    3 Open with limitations: Access and reuse of the asset is possible but with certain limitations on usage
    4 Totally open: Access and reuse of the asset is possible without restriction on usage (e.g. open source license)
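To make the rubric concrete, here is a minimal Python sketch that encodes the 0 to 4 scale as an enum and records one score per component. The names (`Openness`, `openness_profile`, `weakest_component`) are hypothetical, and so is the summary rule: the talk does not define an overall score, so the sketch only reports the least open scored component. The usage lines assume that slide 7's "Dataset" and "Code" rows correspond to the pre-training dataset and the data processing code.

```python
from enum import IntEnum
from typing import Optional


class Openness(IntEnum):
    """The 0-4 openness scale from the rubric."""
    CLOSED = 0
    PUBLISHED_RESEARCH_ONLY = 1
    RESTRICTED_ACCESS = 2
    OPEN_WITH_LIMITATIONS = 3
    TOTALLY_OPEN = 4


# The five components the rubric scores.
COMPONENTS = (
    "model_weights",
    "pretraining_dataset",
    "finetuning_dataset",
    "reward_model",
    "data_processing_code",
)


def openness_profile(**scores: Openness) -> dict[str, Optional[Openness]]:
    """Map every component to its score; components left out are None ("not scored")."""
    unknown = set(scores) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return {name: scores.get(name) for name in COMPONENTS}


def weakest_component(profile: dict[str, Optional[Openness]]) -> tuple[str, Openness]:
    """Hypothetical summary: return the least open scored component (first one on ties)."""
    scored = {name: s for name, s in profile.items() if s is not None}
    if not scored:
        raise ValueError("no component has been scored")
    name = min(scored, key=scored.__getitem__)
    return name, scored[name]


# GPT-1 & 2 scores from slide 7 (Dataset/Code mapped to components by assumption).
gpt2 = openness_profile(
    model_weights=Openness.TOTALLY_OPEN,
    pretraining_dataset=Openness.PUBLISHED_RESEARCH_ONLY,
    data_processing_code=Openness.PUBLISHED_RESEARCH_ONLY,
)
print(weakest_component(gpt2))
```

Leaving unscored components as `None` rather than defaulting them to 0 keeps "not assessed" distinct from "Closed", which matters for slides that only score three of the five components.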
  • 7. Market-Leading Player: OpenAI Deviation from original vision of research transparency & openness. Non/For-profit (US)
    GPT-1 & 2: Model 4 Totally open; Dataset 1 Published research only; Code 1 Published research only
    → GPT-3 & 4, ChatGPT: Model 0 Closed; research paper only
  • 8. Market-Leading Player: OpenAI Deviation from original vision of research transparency & openness. No training of other commercial LLMs: “You may not: […] Use Output to develop models that compete with OpenAI.”
  • 9. Market-Leading Player: Google Transition from open research to a pragmatic approach Enterprise (US) Component Score Level description Model 4 Totally open Dataset 2 Restricted access Code 4 Totally open 1 Published research only 1 Published research only 0 Closed → 3 Open with limitations 1 Published research only 4 Toolchain available →
  • 10. Market-Leading Player: Google Transition from open research to a pragmatic approach. “You may not use nor allow others to use Gemma or Model Derivatives to: [illegal activities, unlicensed practice of professions, abuse, security bypass, promotion of hatred and violence, monitoring people without consent, misinformation/defamation, automated decisions concerning human rights and well-being, etc.]” Responsible AI use restrictions contradict the Open Source Definition
  • 11. Market-Leading Player: Meta Journey to openness Enterprise (US) Component Score Level description Model 4 Totally open Dataset 3 Open with limitations Code 4 Totally open RoBERTa 3 Open with limitations 1 Published research only 1 Published research only →
  • 12. Market-Leading Player: Meta Journey to openness. Restriction on usage: a separate license is required for platforms with 700M+ monthly active users. “Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.”
  • 13. Llama offspring: Alpaca and Vicuna Fine-tuned models from LLaMA and Llama 2 by universities. Research (US)
    Model: 3 Open with limitations
    Pre-training Dataset: 1 Published research only
    Fine-tuning Dataset: 2 Research use only
    Code: 4 Under Apache 2 license
    Restrictions from both Llama 2 and OpenAI (ShareGPT)
  • 14. Collaborative foundational LLMs
    GPT-J (EleutherAI, non-profit, US): Model 4 Access and reuse without restriction; Dataset 3 Open with limitations; Code 4 Completely open
    Falcon (research, UAE): Model 3 Open with limitations; Dataset 4 Access and reuse without restriction; Code 1 General instructions
    BLOOM (research, EU): Model 3 Open RAIL license; Dataset 3 Open with limitations; Code 4 Completely open
    OpenLLaMA (research, US): Model 4 Access and reuse without restriction; Dataset 4 Access and reuse without restriction; Code 1 Just examples
    Mistral (enterprise, FR): Model 4 Access and reuse without restriction; Dataset 0 No public information or access; Code 4 Completely open
    Dataset fuzziness: please refer to the specific license depending on the subset you use. Notion of responsible usage
  • 15. Collaborative foundational LLMs “This license is, in part, based on the Apache License Version 2.0, with a series of modifications. The contribution of the Apache License 2.0 to the framing of this document is acknowledged. Please read this license carefully, as it is different to other ‘open access’ licenses you may have encountered previously.” Use of Falcon180B for hosted services may require a separate license.
  • 16. Collaborative fine-tuned LLMs
    Dolly (enterprise, US): Model 4 Based on GPT-J; Pre-training Dataset 3 Based on GPT-J; Fine-tuning Dataset 4 Access and reuse without restriction; Reward model 0 No public information available; Code 4 Open source
    BLOOMChat (enterprise, US): Model 3 Based on BLOOM; Pre-training Dataset 3 Based on BLOOM; Fine-tuning Dataset 4 Dolly and LAION; Reward model 0 No public information available; Code 3 OpenRAIL
    Zephyr (enterprise, US): Model 4 Based on Mistral; Pre-training Dataset 0 Based on Mistral; Fine-tuning Dataset 2 Research use only (OpenAI); Reward model 3 Paper and code examples; Code 3 Example code available
    LLM360 (consortium, UAE/US): Model 4 Open source; Pre-training Dataset 4 RedPajama, Falcon, StarCoder; Fine-tuning Dataset 2 Research use only (OpenAI); Reward model 0 No public information available; Code 4 Open source
    Impact of foundational model or pre-training datasets
  • 17. Collaborative fine-tuned LLMs BLOOMChat Use Restrictions: “l. To provide medical advice and medical results interpretation; or m. To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment.”
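The "impact of foundational model or pre-training datasets" point can be read as an inheritance rule: restrictions on a base model or dataset carry over to whatever is built on it. The Python sketch below assumes, as a simplification the talk does not state formally, that a derivative component is no more open than its most restrictive input; `effective_openness` and the input labels are hypothetical.

```python
from enum import IntEnum


class Openness(IntEnum):
    """The 0-4 openness scale used in the scoring tables."""
    CLOSED = 0
    PUBLISHED_RESEARCH_ONLY = 1
    RESTRICTED_ACCESS = 2
    OPEN_WITH_LIMITATIONS = 3
    TOTALLY_OPEN = 4


def effective_openness(
    own: Openness, inherited: dict[str, Openness]
) -> tuple[Openness, str]:
    """Assumed rule: the effective score is the minimum over the derivative's own
    release terms and every input it inherits from. Returns (score, bounding input)."""
    if "own release" in inherited:
        raise ValueError("'own release' is reserved for the derivative itself")
    candidates = {"own release": own, **inherited}
    # min() keeps the first minimum it sees, so a tie with the own release
    # is attributed to the own release.
    source = min(candidates, key=candidates.__getitem__)
    return candidates[source], source


# Hypothetical: even if a BLOOM derivative released its own weights fully open,
# building on BLOOM (scored 3, RAIL license) would cap the result at 3.
print(effective_openness(
    Openness.TOTALLY_OPEN,
    {"BLOOM base model": Openness.OPEN_WITH_LIMITATIONS},
))
```

Passing several inputs, for example a Llama 2 base and ShareGPT-derived data as on slide 13, returns whichever input is the most restrictive, which is one way to reason about "restrictions from both" sources.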
  • 18. Collaboration platform: Hugging Face Enabler for collaboration and reuse • Startup and ecosystem dedicated to democratizing AI • Open source Transformers library • LLM leaderboard: upload and assess models • The “GitHub of AI” • Collaborative space for exploring, sharing and experimenting with AI • Hosts thousands of models, datasets, and demo applications
  • 19. Hosting and resource paradigms Closed models are centralized and resource-consuming. Big players invest billions (Microsoft/OpenAI, AWS/Anthropic). Cloud service providers (CSPs) are selling shovels in the AI gold rush. Source: numind.ai
  • 20. Hosting and resource paradigms • Democratizing AI computing • Quantization, AI chips • Running models locally and in containers • Emergence of smaller models for edge and mobile • Small/Tiny Language Models: Gemini Nano, Microsoft Phi-2, Huawei TinyBERT • Domain-Specific Language Models: BloombergGPT, BioMistral, Harvey (law) • Mixture of models: Mixtral 8x7B, OpenMoE → Mixture of licenses?
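Quantization is one of the techniques listed for running models on smaller hardware. As a minimal illustration, and not the scheme of any particular toolchain, the Python sketch below applies symmetric per-tensor int8 quantization: a single float scale maps the largest absolute weight to 127, and each weight is stored as a rounded 8-bit integer. The function names are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w ≈ q * scale, with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; an all-zero tensor gets a dummy scale of 1.0.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to guard against float error at the edges.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0, 0.73]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each int8 code takes one byte instead of the four bytes of a float32, so stored weights shrink roughly fourfold at the cost of a rounding error of at most half a scale step per weight. Production toolchains typically use finer-grained scales (per channel or per block of weights) to reduce that error.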
  • 21. Key takeaways • Hyper-centralization leads to black boxes and closed solutions • Openness fosters collaboration, fuels community-driven innovation, and enables inclusivity • Just like with open source software, beware of licenses and restrictions • GenAI’s innovation continually reshapes the landscape
  • 22. Thank you Raphaël Semeteys - Worldline @RaphaelSemeteys https://blog.worldline.tech https://dev.to/raphiki Check the two-part article co-written with Luxin Zhang
  • 23. Want to shape how the world pays & gets paid? Explore our jobs in tech: careers.worldline.com