technical seminar.pptx on multi model of AI

VISVESVARAYA TECHNOLOGICAL UNIVERSITY
"Jnana Sangama", Belgaum: 590 018
H.K.E Society’s
SIR M VISVESVARAYA COLLEGE OF ENGINEERING
(Affiliated to VTU - Belagavi, Approved by AICTE, Accredited by NAAC)
Yeramarus Camp, Raichur-584135, Karnataka
2023-2024
TECHNICAL SEMINAR PRESENTATION
ON
“MULTIMODAL AI”
UNDER THE GUIDANCE
OF
DR. SHARAN KUMAR
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

POWERING THE NEXT CHAPTER IN GENERATIVE AI
MULTIMODAL AI

PRESENTED
BY
B CHANDANA
3SL20EC003

CONTENTS
• Introduction
• Literature survey
• Block diagram
• Applications
• Future scope
• Benefits and challenges
• Conclusion
• References

Introduction
• Multimodal AI is an advanced form of artificial intelligence that can
analyze and interpret multiple modes of data, such as text, images, and
audio, at the same time. Combining these modes allows it to generate
more accurate and human-like responses.

Literature survey
• The release of ChatGPT in November 2022, a conversation-focused
model that follows human instructions, further underscored the
feasibility of AGI in practical applications (Liu et al., 2023a). This
development has had a wide-ranging impact across various sectors,
including journalism (Liu et al., 2023c), education (Zhai, 2023; Liu
et al., 2023b), healthcare (Li et al., 2023; Liu et al., [n. d.]; Holmes
et al., 2023), industry (Dou et al., 2023), agriculture (Rezayi
et al., 2023), law (Bubeck et al., 2023), gaming (Bubeck et al., 2023),
and finance (Wu et al., 2023c), catalyzing a popular wave in AI (Liu
et al., 2023a, g, h).
• Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran
Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg,
Antoine Bosselut, Emma Brunskill, et al. 2021. On the opportunities
and risks of foundation models. arXiv preprint
arXiv:2108.07258 (2021).

Sensory Inputs
Sensory inputs are the forms of data that a multimodal AI system collects
and processes. They are analogous to human senses such as vision,
hearing, touch, and smell.
Data Fusion
Data fusion combines information from multiple modalities, such as text,
images, and video, to improve the accuracy and robustness of an AI
system.
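Two common ways to fuse modalities are early fusion (joining feature vectors before a model sees them) and late fusion (combining per-modality predictions afterwards). The sketch below is a minimal illustration only: the function names, feature values, and weights are all hypothetical and are not taken from the presentation.

```python
from typing import Dict, List


def early_fusion(features: Dict[str, List[float]]) -> List[float]:
    """Concatenate per-modality feature vectors into one joint vector."""
    fused: List[float] = []
    for name in sorted(features):  # fixed order keeps the vector layout stable
        fused.extend(features[name])
    return fused


def late_fusion(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-modality scores.

    Only modalities present in `scores` contribute, so a missing
    modality (audio below) is simply skipped.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("no usable modality scores")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical feature vectors, for illustration only.
features = {"text": [0.2, 0.7], "image": [0.9, 0.1, 0.4]}
joint = early_fusion(features)  # image features first, then text

# Hypothetical per-modality confidence scores and trust weights.
scores = {"text": 0.8, "image": 0.6}
weights = {"text": 1.0, "image": 3.0, "audio": 1.0}
combined = late_fusion(scores, weights)
```

Early fusion lets a single model learn interactions between modalities, while late fusion tolerates a missing modality more gracefully; which one suits a given system depends on the data available.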
Machine Learning Algorithms
Machine learning algorithms are central to multimodal AI. They analyze
and interpret data from multiple sources, such as text, images, and
audio, and learn how those sources relate to one another.
Natural Language Processing
Natural Language Processing is a crucial component of Multimodal AI
technology, allowing for the analysis and understanding of human
language in combination with other modalities such as images or videos.
Computer Vision
Computer Vision is a key component of Multimodal AI technology, which
allows for the integration of visual data processing with other modes of
information to enhance overall system performance.
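One way visual data is often integrated with language is to map images and text into a shared embedding space and compare them by cosine similarity. The sketch below assumes such embeddings already exist: the three-dimensional vectors and captions are made up for illustration, and a real system would obtain them from trained image and text encoders.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_caption(image_vec: List[float],
                 caption_vecs: Dict[str, List[float]]) -> str:
    """Return the caption whose embedding is closest to the image embedding."""
    return max(caption_vecs,
               key=lambda c: cosine_similarity(image_vec, caption_vecs[c]))


# Hypothetical embeddings in a shared image-text space.
image_vec = [0.9, 0.1, 0.0]
caption_vecs = {
    "a dog on grass": [0.8, 0.2, 0.1],
    "a city at night": [0.0, 0.3, 0.9],
}
match = best_caption(image_vec, caption_vecs)
```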

Applications
• Social media content moderation: Multimodal AI can be used to analyze text, images, and audio to
identify and moderate harmful content on social media platforms. For instance, it can detect hate
speech, violence, and bullying.
• Virtual assistants: Smart assistants like Google Assistant and Amazon Alexa are powered by
multimodal AI. They can understand and respond to natural language commands, both spoken and
typed.
• Healthcare imaging: In healthcare, multimodal AI can analyze medical images (X-rays, MRIs) along
with text reports and patient history data to improve diagnostics. This can lead to more accurate
diagnoses and better patient outcomes.
• Autonomous vehicles: Self-driving cars rely heavily on multimodal AI. They use a variety of sensors,
including cameras, radar, and LiDAR, to perceive their surroundings and navigate safely.
• E-commerce product recommendations: Many e-commerce websites use multimodal AI to
personalize product recommendations for customers. By considering both the product image and
description, the AI can recommend items that are more likely to interest the customer.
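As an illustration of the autonomous vehicle item above, the sketch below merges distance estimates from several sensors by inverse-variance weighting, so that less noisy sensors count for more. It assumes independent, unbiased sensor errors, and the readings and variances are hypothetical values chosen for the example; production systems typically use more elaborate filters.

```python
from typing import List, Tuple


def fuse_estimates(readings: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (value, variance) pairs by inverse-variance weighting.

    Returns the fused value and the variance of the fused estimate.
    """
    if not readings:
        raise ValueError("at least one reading is required")
    if any(var <= 0 for _, var in readings):
        raise ValueError("variances must be positive")
    weights = [1.0 / var for _, var in readings]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, readings)) / total
    return value, 1.0 / total


# Hypothetical distance-to-obstacle readings in metres: (estimate, variance).
camera = (21.0, 4.0)   # noisiest sensor in this example
radar = (20.0, 1.0)
lidar = (20.2, 0.25)   # most precise sensor in this example

distance, variance = fuse_estimates([camera, radar, lidar])
```

The fused variance is smaller than that of any single sensor, which is the usual motivation for combining cameras, radar, and LiDAR rather than relying on one of them.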

Conclusion
• The future of AI is not just about seeing or hearing; it is
about truly understanding. Multimodal AI holds the
key to unlocking a new level of human-computer
interaction, with applications that can bridge
communication gaps, enhance our understanding of
the world, and empower us to solve complex
challenges in entirely new ways. Its potential for
positive impact spans many fields.

References
• Rania Abdelghani, Yen-Hsiang Wang, Xingdi Yuan, Tong Wang,
Pauline Lucas, Hélène Sauzéon, and Pierre-Yves Oudeyer.
2023. GPT-3-driven pedagogical agents for training children’s
curious question-asking skills. International Journal of Artificial
Intelligence in Education 167, 3 (2023), 102887.
• Hang Bao, Wen Wang, Li Dong, Qianru Liu, Ola K. Mohammed,
Kirti Aggarwal, and Fang Wei. 2022. VLMo: Unified vision-language
pre-training with mixture-of-modality-experts. In Advances in
Neural Information Processing Systems (NeurIPS), Vol. 35. 32897–32912.
