SlideShare a Scribd company logo
1 of 35
Download to read offline
Building an Event Driven
Gen-AI powered video
summarizer API
I am Zied Ben Tahar
AWS Community Builder / Principal Architect @ Aviv Group /
You can find me:
Blog: https:/
/zied-ben-tahar.medium.com/
Linkedin: https:/
/www.linkedin.com/in/zied-ben-tahar-5200913/
Twitter : @wow_sig
Bonjour !
Agenda
01 02
04
03
05
GenAI on AWS - A
quick overview
Let’s build: AI powered
Video summarizer
Video summarizer -
Scaling the
architecture with EDA
Api-fying the solution Demo Time
GenAI on AWS - A quick
overview
01
Amazon Bedrock
A serverless Gen-AI service
Managed FMs
Access to foundation models
from leading partners on AI space
Customizable
Capabilities to adapt and
privately customize FMs to
domain specific problem
Serverless
No infrastructure to manage,
Good integration with AWS
ecosystem
Gen AI
Addressing some LLM challenges
How to solve issues related to knowledge cut-off, hallucinations, blackbox ?
➔ Prompt engineering
◆ Optimizing prompts for specific tasks and guiding the model to yield the
expected answers
➔ Retrieval augmented generation (RAG)
◆ Allow the model to access to external/up to date information to enhance
response.
➔ Model fine-tuning
◆ Additional training of a pre-trained model on task-specific datasets.
Anthropic’s Claude (2.1) 3
A serious alternative to GPT 4
➔ Handles large context windows, 200K tokens (150k
words)
➔ Able to maintain factual consistency when processing
long prompts
➔ Can generate code and structured data
➔ Vision capabilities (new with Sonnet)
Claude 3 benchmarks
Let’s build!
AI powered Video
summarizer
02
Solution overview
Summarization workflow, first attempt
Extract/Get Video
transcript
Summarize & extract
structured data
Generate audio from
summary
Profit !
01 02 03 04
Solution overview
Serverless Summarizer workflow, a first “naïve” approach
Deep dive
Getting video transcripts
Konami’ing your way into video transcripts
🔗 https:/
/github.com/Kakulukian/youtube-transcript
~ Fast, as it retrieves already generated transcripts from youtube
~ …but it’s brittle, based on a non official youtube API, does not work on
some languages
Deep dive
Prompting techniques with Claude
You are a video transcript summarizer.
Here is a video transcription that you must summarize:
<transcript>{{transcript}}</transcript>
Summarize this transcript in a third person point of view in 10 sentences.
Identify the speakers and the main topics of the transcript and add them in
the output as well.
Do not add or invent speaker names if you not able to identify them.
You must output the summary JSON format conforming to this JSON schema:
{
"type": "object",
"properties": {
"speakers": {
"type": "array",
"items": {
"type": "string"
}
},
"topics": {
"type": "string"
},
"summary": {
"type": "array",
"items": {
"type": "string"
}
}
}
}
➔ Clear without ambiguity, no
room for assumptions
➔ Don’t be polite, be
straightforward
➔ Concrete example/ schema
for structured outputs
Deep dive
Putting words in Claude’s mouth
For structured output, we’ll need to give hints to the Assistant by prefilling its response
With Claude’s Completion API
(Legacy)
Message API
Deep dive
Putting words in Claude’s mouth
❌ ✅
Deep dive
Invoking the model - state machine definition
1- Generate model parameters and write to an s3
bucket
2- Direct invoke to claude model
3- Making sure that JSON is well formed
Deep diving
Invoking the model
Can we do better ?
A better solution
Enters Amazon Transcribe
~ A more stable solution that supports many languages and more media types
~ Some important architectural side effects: The system is now event based and
asynchronous
This is nice…but how can I scale
my architecture ?
Scaling the architecture
with EDA
03
Scaling the architecture
Orchestration or Choreography ?
~ Orchestration: Greater control
over complex workflows. Complex
to evolve when collaborating with
external contexts
~ Choreography: Flexible and and
easier to extend but harder to
Observe
Which pattern works best ?
Scaling the architecture
Let’s try with Choreography
~ Extensible: Tasks can be added or removed without impacting the entire system
~ Resilient: If a task fails, it does not impact the entire application
Scaling the architecture
Scatter-gather pattern
~ Aggregator: A central component handling the events from the scatter phase.
Scaling the architecture
Scatter-gather pattern
…Yes, but
Scaling the architecture
Aggregator pattern can be hard to tame
● What if…
○ a task fails
○ or never publishes the expected event
● How to make sure that failure is handled properly ?
● How to monitor the overall progress of a given Job ?
● How to make sure that my system is observed properly
So…Orchestration or Choreography ?
Scaling the architecture
Orchestration or Choreography…Why not both ?
Apify-ing the solution
04
APIs, like diamonds, are forever
Joshua Bloch
Apify-ing the solution
Principles on designing good “public” Event-Driven APIs
● Be conservative on what you send (Postel’s Law)
● Be mindful of the observed behavior of your APIs (Hyrum’s law)
● Make clear distinction between your private and public events
● Adopt a sane versioning strategy
● Adopt standards: CloudEvents & AsyncAPI
Apify-ing the solution
The awesomest GenAI powered video summarizer
Demo
05
Thanks !
Questions ?
Source code👇

More Related Content

Similar to eda-genai-believe-in-serverless-talk.pdf

Spring '20 Developer Release Highlights
Spring '20 Developer Release HighlightsSpring '20 Developer Release Highlights
Spring '20 Developer Release HighlightsPeter Knolle
 
Questions Log: Tips for Intermediate Cognos Report Studio Authors
Questions Log: Tips for Intermediate Cognos Report Studio AuthorsQuestions Log: Tips for Intermediate Cognos Report Studio Authors
Questions Log: Tips for Intermediate Cognos Report Studio AuthorsSenturus
 
The OpenOffice.org specification process demystified
The OpenOffice.org specification process demystifiedThe OpenOffice.org specification process demystified
The OpenOffice.org specification process demystifiedAlexandro Colorado
 
DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.Vlad Fedosov
 
Breaking Through The Challenges of Scalable Deep Learning for Video Analytics
Breaking Through The Challenges of Scalable Deep Learning for Video AnalyticsBreaking Through The Challenges of Scalable Deep Learning for Video Analytics
Breaking Through The Challenges of Scalable Deep Learning for Video AnalyticsJason Anderson
 
ITARC15 Workshop - Architecting a Large Software Project - Lessons Learned
ITARC15 Workshop - Architecting a Large Software Project - Lessons LearnedITARC15 Workshop - Architecting a Large Software Project - Lessons Learned
ITARC15 Workshop - Architecting a Large Software Project - Lessons LearnedJoão Pedro Martins
 
Front-End Modernization for Mortals
Front-End Modernization for MortalsFront-End Modernization for Mortals
Front-End Modernization for Mortalscgack
 
Front end-modernization
Front end-modernizationFront end-modernization
Front end-modernizationdevObjective
 
Lean Engineering: How to make Engineering a full Lean UX partner
Lean Engineering: How to make Engineering a full Lean UX partnerLean Engineering: How to make Engineering a full Lean UX partner
Lean Engineering: How to make Engineering a full Lean UX partnerBill Scott
 
Building a Great AEM Team: Time Warner Cable's Journey
Building a Great AEM Team: Time Warner Cable's JourneyBuilding a Great AEM Team: Time Warner Cable's Journey
Building a Great AEM Team: Time Warner Cable's JourneyiCiDIGITAL
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Comment tirer partie de Visual Studio Online pour vos développements SharePoint
Comment tirer partie de Visual Studio Online pour vos développements SharePointComment tirer partie de Visual Studio Online pour vos développements SharePoint
Comment tirer partie de Visual Studio Online pour vos développements SharePointGilles Pommier
 
Fowa Miami 09 Cloud Computing Workshop
Fowa Miami 09 Cloud Computing WorkshopFowa Miami 09 Cloud Computing Workshop
Fowa Miami 09 Cloud Computing WorkshopMark Masterson
 
Scaling Machine Learning from zero to millions of users (May 2019)
Scaling Machine Learning from zero to millions of users (May 2019)Scaling Machine Learning from zero to millions of users (May 2019)
Scaling Machine Learning from zero to millions of users (May 2019)Julien SIMON
 
PHP Mega Meetup, Sep, 2020, Anti patterns in php
PHP Mega Meetup, Sep, 2020, Anti patterns in phpPHP Mega Meetup, Sep, 2020, Anti patterns in php
PHP Mega Meetup, Sep, 2020, Anti patterns in phpAhmed Abdou
 

Similar to eda-genai-believe-in-serverless-talk.pdf (20)

Spring '20 Developer Release Highlights
Spring '20 Developer Release HighlightsSpring '20 Developer Release Highlights
Spring '20 Developer Release Highlights
 
Questions Log: Tips for Intermediate Cognos Report Studio Authors
Questions Log: Tips for Intermediate Cognos Report Studio AuthorsQuestions Log: Tips for Intermediate Cognos Report Studio Authors
Questions Log: Tips for Intermediate Cognos Report Studio Authors
 
XP Injection
XP InjectionXP Injection
XP Injection
 
XP Injection
XP InjectionXP Injection
XP Injection
 
Feature folders
Feature foldersFeature folders
Feature folders
 
The OpenOffice.org specification process demystified
The OpenOffice.org specification process demystifiedThe OpenOffice.org specification process demystified
The OpenOffice.org specification process demystified
 
DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.
 
Breaking Through The Challenges of Scalable Deep Learning for Video Analytics
Breaking Through The Challenges of Scalable Deep Learning for Video AnalyticsBreaking Through The Challenges of Scalable Deep Learning for Video Analytics
Breaking Through The Challenges of Scalable Deep Learning for Video Analytics
 
Enabling Lean at Enterprise Scale: Lean Engineering in Action
Enabling Lean at Enterprise Scale: Lean Engineering in ActionEnabling Lean at Enterprise Scale: Lean Engineering in Action
Enabling Lean at Enterprise Scale: Lean Engineering in Action
 
ITARC15 Workshop - Architecting a Large Software Project - Lessons Learned
ITARC15 Workshop - Architecting a Large Software Project - Lessons LearnedITARC15 Workshop - Architecting a Large Software Project - Lessons Learned
ITARC15 Workshop - Architecting a Large Software Project - Lessons Learned
 
Front-End Modernization for Mortals
Front-End Modernization for MortalsFront-End Modernization for Mortals
Front-End Modernization for Mortals
 
Front end-modernization
Front end-modernizationFront end-modernization
Front end-modernization
 
Front end-modernization
Front end-modernizationFront end-modernization
Front end-modernization
 
Lean Engineering: How to make Engineering a full Lean UX partner
Lean Engineering: How to make Engineering a full Lean UX partnerLean Engineering: How to make Engineering a full Lean UX partner
Lean Engineering: How to make Engineering a full Lean UX partner
 
Building a Great AEM Team: Time Warner Cable's Journey
Building a Great AEM Team: Time Warner Cable's JourneyBuilding a Great AEM Team: Time Warner Cable's Journey
Building a Great AEM Team: Time Warner Cable's Journey
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Comment tirer partie de Visual Studio Online pour vos développements SharePoint
Comment tirer partie de Visual Studio Online pour vos développements SharePointComment tirer partie de Visual Studio Online pour vos développements SharePoint
Comment tirer partie de Visual Studio Online pour vos développements SharePoint
 
Fowa Miami 09 Cloud Computing Workshop
Fowa Miami 09 Cloud Computing WorkshopFowa Miami 09 Cloud Computing Workshop
Fowa Miami 09 Cloud Computing Workshop
 
Scaling Machine Learning from zero to millions of users (May 2019)
Scaling Machine Learning from zero to millions of users (May 2019)Scaling Machine Learning from zero to millions of users (May 2019)
Scaling Machine Learning from zero to millions of users (May 2019)
 
PHP Mega Meetup, Sep, 2020, Anti patterns in php
PHP Mega Meetup, Sep, 2020, Anti patterns in phpPHP Mega Meetup, Sep, 2020, Anti patterns in php
PHP Mega Meetup, Sep, 2020, Anti patterns in php
 

Recently uploaded

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

eda-genai-believe-in-serverless-talk.pdf

  • 1. Building an Event Driven Gen-AI powered video summarizer API
  • 2. I am Zied Ben Tahar AWS Community Builder / Principal Architect @ Aviv Group / You can find me: Blog: https:/ /zied-ben-tahar.medium.com/ Linkedin: https:/ /www.linkedin.com/in/zied-ben-tahar-5200913/ Twitter : @wow_sig Bonjour !
  • 3. Agenda 01 02 04 03 05 GenAI on AWS - A quick overview Let’s build: AI powered Video summarizer Video summarizer - Scaling the architecture with EDA Api-fying the solution Demo Time
  • 4. GenAI on AWS - A quick overview 01
  • 5. Amazon Bedrock A serverless Gen-AI service Managed FMs Access to foundation models from leading partners on AI space Customizable Capabilities to adapt and privately customize FMs to domain specific problem Serverless No infrastructure to manage, Good integration with AWS ecosystem
  • 6. Gen AI Addressing some LLM challenges How to solve issues related to knowledge cut-off, hallucinations, blackbox ? ➔ Prompt engineering ◆ Optimizing prompts for specific tasks and guiding the model to yield the expected answers ➔ Retrieval augmented generation (RAG) ◆ Allow the model to access to external/up to date information to enhance response. ➔ Model fine-tuning ◆ Additional training of a pre-trained model on task-specific datasets.
  • 7. Anthropic’s Claude (2.1) 3 A serious alternative to GPT 4 ➔ Handles large context windows, 200K tokens (150k words) ➔ Able to maintain factual consistency when processing long prompts ➔ Can generate code and structured data ➔ Vision capabilities (new with Sonnet) Claude 3 benchmarks
  • 8. Let’s build! AI powered Video summarizer 02
  • 9. Solution overview Summarization workflow, first attempt Extract/Get Video transcript Summarize & extract structured data Generate audio from summary Profit ! 01 02 03 04
  • 10. Solution overview Serverless Summarizer workflow, a first “naïve” approach
  • 11. Deep dive Getting video transcripts Konami’ing your way into video transcripts 🔗 https:/ /github.com/Kakulukian/youtube-transcript ~ Fast, as it retrieves already generated transcripts from youtube ~ …but it’s brittle, based on a non official youtube API, does not work on some languages
  • 12. Deep dive Prompting techniques with Claude You are a video transcript summarizer. Here is a video transcription that you must summarize: <transcript>{{transcript}}</transcript> Summarize this transcript in a third person point of view in 10 sentences. Identify the speakers and the main topics of the transcript and add them in the output as well. Do not add or invent speaker names if you not able to identify them. You must output the summary JSON format conforming to this JSON schema: { "type": "object", "properties": { "speakers": { "type": "array", "items": { "type": "string" } }, "topics": { "type": "string" }, "summary": { "type": "array", "items": { "type": "string" } } } } ➔ Clear without ambiguity, no room for assumptions ➔ Don’t be polite, be straightforward ➔ Concrete example/ schema for structured outputs
  • 13. Deep dive Putting words in Claude’s mouth For structured output, we’ll need to give hints to the Assistant by prefilling its response With Claude’s Completion API (Legacy) Message API
  • 14. Deep dive Putting words in Claude’s mouth ❌ ✅
  • 15. Deep dive Invoking the model - state machine definition 1- Generate model parameters and write to an s3 bucket 2- Direct invoke to claude model 3- Making sure that JSON is well formed
  • 17. Can we do better ?
  • 18. A better solution Enters Amazon Transcribe ~ A more stable solution that supports many languages and more media types ~ Some important architectural side effects: The system is now event based and asynchronous
  • 19. This is nice…but how can I scale my architecture ?
  • 21. Scaling the architecture Orchestration or Choreography ? ~ Orchestration: Greater control over complex workflows. Complex to evolve when collaborating with external contexts ~ Choreography: Flexible and and easier to extend but harder to Observe
  • 23. Scaling the architecture Let’s try with Choreography ~ Extensible: Tasks can be added or removed without impacting the entire system ~ Resilient: If a task fails, it does not impact the entire application
  • 24. Scaling the architecture Scatter-gather pattern ~ Aggregator: A central component handling the events from the scatter phase.
  • 27. Scaling the architecture Aggregator pattern can be hard to tame ● What if… ○ a task fails ○ or never publishes the expected event ● How to make sure that failure is handled properly ? ● How to monitor the overall progress of a given Job ? ● How to make sure that my system is observed properly
  • 29. Scaling the architecture Orchestration or Choreography…Why not both ?
  • 31. APIs, like diamonds, are forever Joshua Bloch
  • 32. Apify-ing the solution Principles on designing good “public” Event-Driven APIs ● Be conservative on what you send (Postel’s Law) ● Be mindful of the observed behavior of your APIs (Hyrum’s law) ● Make clear distinction between your private and public events ● Adopt a sane versioning strategy ● Adopt standards: CloudEvents & AsyncAPI
  • 33. Apify-ing the solution The awesomest GenAI powered video summarizer