Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
https://www.youtube.com/watch?v=Yeua8NlzQ3Y
https://www.conf42.com/Large_Language_Models_LLMs_2024_Tim_Spann_generative_ai_streaming
Adding Generative AI to Real-Time Streaming Pipelines
Abstract
Let’s build streaming pipelines that convert streaming events into prompts, call LLMs, and process the results.
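As a rough illustration of that event-to-prompt-to-result loop, here is a minimal sketch in plain Python. It is not the speaker's code: the event fields `user` and `text`, the helper names, and the stub `fake_llm` are all hypothetical stand-ins, and a real pipeline would swap the stub for an actual model call.

```python
import json
from typing import Callable


def event_to_prompt(event: dict) -> str:
    """Turn one streaming event into a prompt string.

    The "user" and "text" fields are assumed for illustration only.
    """
    return f"Answer the following question from {event['user']}: {event['text'].strip()}"


def process_event(raw: str, call_llm: Callable[[str], str]) -> dict:
    """Parse a JSON event, build a prompt, call the model, enrich the event."""
    event = json.loads(raw)
    prompt = event_to_prompt(event)
    answer = call_llm(prompt)
    # Keep the original fields and attach the prompt and the cleaned answer.
    return {**event, "prompt": prompt, "answer": answer.strip()}


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call, so the sketch runs offline."""
    return f"(stub reply to: {prompt})"


if __name__ == "__main__":
    raw_event = '{"user": "tim", "text": " What is NiFi? "}'
    print(process_event(raw_event, fake_llm))
```

Passing the model call in as a function keeps the enrichment step independent of any one model or vendor.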
Summary
Tim Spann: My talk is adding generative AI to real-time streaming pipelines. I'm going to discuss a couple of different open source technologies. We'll touch on Kafka, NiFi, Flink, Python, Iceberg. All the slides and all the code are out there on GitHub.
LLM, if you didn't know, is rapidly evolving. There's a lot of different ways to interact with models. That enrichment, transformation, processing really needs tools. The number of models and projects and software that are available is massive.
NiFi supports hundreds of different inputs and can convert them on the fly. Great way to distribute your data quickly to whoever needs it without duplication, without tight coupling. Fun to find new things to integrate into.
So what we can do is, well, I want to get a meetup chat going. I have a processor here that just listens for events as they come from Slack. And then I'm going to clean it up, add a couple fields and push that out to Slack. Every model needs a little bit of different tweaking.
NiFi acts as a whole website. And as you see here, it can be GET, POST, PUT, whatever you want. We send that response back to Flink and it shows up here. Thank you for attending this talk. I'm going to be speaking at some other events very shortly.
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, Tim Spann here. My talk is adding generative AI to real-time streaming pipelines, and we're here for the large language model conference at Conf42, which is always a nice one, great place to be. I'm going to discuss a couple of different open source technologies that work together to enable you to build real-time pipelines using large language models. So we'll touch on Kafka, NiFi, Flink, Python, Iceberg, and I'll show you a little bit of each one in the demos. I've been working with data, machine learning, streaming, IoT, some other things for a number of years, and you could contact me at any of these places, whether Twitter or whatever it's called, some different blogs, or in person at my meetups and at different conferences around the world. I do a weekly newsletter covering streaming, ML, a lot of LLM, open source, Python, Java, all kinds of fun stuff. As I mentioned, I do a bunch of different meetups. They are not just on the east coast of the US; they are available virtually live, and I also put them on YouTube, and if you need them somewhere else, let me know. We publish all the slides and all the code on GitHub. Everything you need is out there. Let's get into the talk. LLM, if you didn't know, is rapidly evolving. While you're typing down the things that you use, it
29. Extract Text from Web VTT
● Python 3.10+
● Web VTT to Text
● Web Video Text Tracks Format Extractor
https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API
https://github.com/tspannhw/FLaNK-python-processors/blob/main/TranslateWebVTT.py
WEBVTT

1
00:00:06.066 --> 00:00:07.166
Now let's talk about

2
00:00:07.166 --> 00:00:12.033
data retrieval, views,
and materialized views.
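To make the WebVTT-to-text step concrete, here is a minimal standard-library sketch. It is not the linked TranslateWebVTT.py processor, and the function name `webvtt_to_text` is hypothetical. It assumes simple cue files like the sample on this slide and ignores NOTE, STYLE, and REGION blocks as well as inline cue tags.

```python
def webvtt_to_text(vtt: str) -> str:
    """Return only the spoken text from a simple WebVTT document."""
    lines = [line.strip() for line in vtt.splitlines()]
    kept = []
    for i, line in enumerate(lines):
        # Skip blank separators, the file header, and cue timing lines.
        # Cue text may not contain "-->", so this check is safe.
        if not line or line.startswith("WEBVTT") or "-->" in line:
            continue
        # A cue identifier (such as "1") sits directly above a timing line.
        if i + 1 < len(lines) and "-->" in lines[i + 1]:
            continue
        kept.append(line)
    return " ".join(kept)


SAMPLE = """WEBVTT

1
00:00:06.066 --> 00:00:07.166
Now let's talk about

2
00:00:07.166 --> 00:00:12.033
data retrieval, views,
and materialized views.
"""

if __name__ == "__main__":
    print(webvtt_to_text(SAMPLE))
```

Because the sketch looks ahead for the timing line rather than relying on blank separators, it also copes with cue files where the blank lines between cues were lost.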
30. WatsonX SDK To Foundation
● Python 3.10+
● LLM
● WatsonX.AI Foundation Models
● Inference
● Secure
● Official SDK from IBM
https://github.com/tspannhw/FLaNK-python-watsonx-processor
31. Generate Synthetic Records w/ Faker
● Python 3.10+
● faker
● Choose as many as you want
● Attribute output
32. Download a Wiki Page as HTML or WikiFormat (Text)
● Python 3.10+
● Wikipedia-api
● HTML or Text
● Choose your wiki page dynamically
34. CaptionImage
● Python 3.10+
● Hugging Face
● Salesforce/blip-image-captioning-large
● Generate Captions for Images
● Adds captions to FlowFile Attributes
● Does not require download or copies of your images
https://github.com/tspannhw/FLaNK-python-processors
35. RESNetImageClassification
● Python 3.10+
● Hugging Face
● Transformers
● PyTorch
● Datasets
● microsoft/resnet-50
● Adds classification label to FlowFile Attributes
● Does not require download or copies of your images
https://github.com/tspannhw/FLaNK-python-processors
36. NSFWImageDetection
● Python 3.10+
● Hugging Face
● Transformers
● Falconsai/nsfw_image_detection
● Adds normal and nsfw to FlowFile Attributes
● Gives score on safety of image
● Does not require download or copies of your images
https://github.com/tspannhw/FLaNK-python-processors
37. FacialEmotionsImageDetection
● Python 3.10+
● Hugging Face
● Transformers
● facial_emotions_image_detection
● Image Classification
● Adds labels/scores to FlowFile Attributes
● Does not require download or copies of your images
https://github.com/tspannhw/FLaNK-python-processors
38. Other Python Processors
● Put/Query-Pinecone (Vector DB Interface)
● ChunkDocument, ParseDocument
● ConvertCSVtoExcel
● DetectObjectInImage
● PromptChatGPT
● Put/Query-Chroma (Vector DB Interface)