Uncovering the Causes of Emotions in Software Developer Communication Using Zero-shot LLMs
Mia Mohammad Imran, Preetha Chatterjee, Kostadin Damevski
Drexel University, Virginia Commonwealth University
Understanding Emotion Cause in OSS
Emotion cause extraction involves identifying the text span within an utterance that triggers a particular emotion.
Emotion: Frustration
Utterance: "I'm feeling frustrated because the code isn't compiling no matter what I try."
Cause: "the code isn't compiling no matter what I try"
Outline
● Emotion Models
● Emotion Classification
● Emotion Cause Extraction
● Case Study
Emotion Models
Emotion Models
● Theoretical frameworks to represent emotions
● Shaver’s tree-structured model is the most commonly used model in software engineering research
○ 6 primary categories, 25 secondary categories, and over 100 tertiary categories
● GoEmotions is a recent emotion taxonomy (and dataset) developed by Google for text
Emotion Models: Shaver’s Taxonomy
● 6 primary categories:
○ Anger 😡
○ Love ❤️
○ Fear 😨
○ Joy 😊
○ Sadness 😥
○ Surprise 😲
Shaver’s Taxonomy Is Not Complete
● “I’m curious about this - can you give more context on what exactly goes wrong? Perhaps if that causes bugs this should be prohibited instead?”
○ Expresses Curiosity 🤔
● “And, I am a little confused, if there is not any special
folder, according to the module resolution [URL] How
could file find the correct modules? Did I miss something?”
○ Expresses Confusion 😕
Extended Shaver’s Taxonomy
● Imran et al. [1] proposed extending Shaver’s taxonomy with GoEmotions’ categories
● The extension maps GoEmotions’ categories to Shaver’s primary emotions, e.g.:
○ 👍 Approval to 😊 Joy
○ 👎 Disapproval to 😡 Anger
○ 🤔 Curiosity to 😲 Surprise
○ 🙌 Gratitude to ❤️ Love
[1] Imran et al., “Data augmentation for improving emotion recognition in software engineering communication,” ASE, 2022.
Emotion Classification
State-of-the-Art Models
Tool           Approach
ESEM-E [1]     SVM with unigram and bigram features
EMTk [2]       SVM with unigram, bigram, lexicon, polarity, and mood features
SEntiMoji [3]  Transfer learning with a neural network
[1] Murgia et al., “An exploratory qualitative and quantitative analysis of emotions in issue report comments of open source systems,” ESEM, 2018.
[2] Calefato et al., “EMTk: The emotion mining toolkit,” SEmotion, 2019.
[3] Chen et al., “Emoji-powered sentiment and emotion detection from software developers' communication data,” TOSEM, 2021.
● Studies show that general-purpose tools perform poorly on software engineering text
● All three tools make one-vs-all predictions for each of the 6 basic emotions (Anger, Love, Fear, Joy, Sadness, and Surprise)
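To make "one-vs-all" concrete, here is a minimal pure-Python sketch: one independent binary classifier per basic emotion, trained on unigram and bigram counts, so an utterance can receive zero, one, or several labels. This is not the code of ESEM-E, EMTk, or SEntiMoji; as an assumption, a simple perceptron stands in for their SVMs, and the names `ngram_features`, `BinaryPerceptron`, and `OneVsAll` are hypothetical.

```python
from collections import Counter

EMOTIONS = ["Anger", "Love", "Fear", "Joy", "Sadness", "Surprise"]


def ngram_features(text):
    """Count unigrams and bigrams over lowercased whitespace tokens."""
    tokens = text.lower().split()
    feats = Counter(tokens)
    feats.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return feats


class BinaryPerceptron:
    """Linear yes/no classifier over sparse features (hypothetical stand-in for an SVM)."""

    def __init__(self):
        self.weights = {}
        self.bias = 0.0

    def score(self, feats):
        return self.bias + sum(self.weights.get(f, 0.0) * v for f, v in feats.items())

    def update(self, feats, target):
        # target is +1 (emotion present) or -1 (absent); learn only from mistakes.
        if target * self.score(feats) <= 0:
            for f, v in feats.items():
                self.weights[f] = self.weights.get(f, 0.0) + target * v
            self.bias += target


class OneVsAll:
    """One independent binary classifier per emotion."""

    def __init__(self, labels=EMOTIONS):
        self.models = {label: BinaryPerceptron() for label in labels}

    def fit(self, texts, label_sets, epochs=20):
        data = [(ngram_features(t), labels) for t, labels in zip(texts, label_sets)]
        for _ in range(epochs):
            for feats, labels in data:
                for label, model in self.models.items():
                    model.update(feats, 1 if label in labels else -1)
        return self

    def predict(self, text):
        feats = ngram_features(text)
        return {label for label, m in self.models.items() if m.score(feats) > 0}


clf = OneVsAll().fit(
    ["this bug makes me furious", "thanks a lot, love it", "great work, happy now"],
    [{"Anger"}, {"Love"}, {"Joy"}],
)
print(clf.predict("this bug makes me furious"))  # {'Anger'}
```

Because each emotion has its own classifier, adding or dropping an emotion touches only one model; the trade-off is that the per-emotion decisions are made independently and can overlap or all be negative.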
Compared Fine-tuned LLMs
● BERT: One of the first widely adopted pre-trained transformer models for NLP
● RoBERTa: A robustly optimized variant of BERT
Fine-tuned LLMs: LLMs that are further trained (fine-tuned) on task-specific labeled data
Compared Zero-shot LLMs
● ChatGPT (GPT-3.5): Proprietary model by OpenAI
● GPT-4: OpenAI’s successor to GPT-3.5
● flan-alpaca: open-source
○ Google’s Flan-T5 model instruction-tuned on the Alpaca dataset
○ Alpaca’s instruction data was originally used to fine-tune Meta’s LLaMA model
Zero-shot reasoning: LLMs make decisions on unseen tasks without task-specific training
Evaluating the Models
● Goal: Assess the effectiveness of LLMs against state-of-the-art (SotA) models
● Evaluated on three existing datasets from GitHub [1], JIRA [2], and Stack Overflow [3]
● 80% training set, 20% test set, split with stratified sampling
● Metric: micro-averaged F1-score
[1] Imran et al., “Data augmentation for improving emotion recognition in software engineering communication,” ASE, 2022.
[2] Murgia et al., “An exploratory qualitative and quantitative analysis of emotions in issue report comments of open source systems,” ESEM, 2018.
[3] Calefato et al., “EMTk: The emotion mining toolkit,” SEmotion, 2019.
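The evaluation setup above can be sketched as follows, assuming each utterance carries a set of emotion labels (an empty set for Neutral) and that stratification uses one primary label per utterance. The function names are hypothetical; this illustrates stratified splitting and micro-averaged F1, and is not the authors' evaluation script.

```python
import random
from collections import defaultdict


def stratified_split(items, labels, test_frac=0.2, seed=0):
    """Hold out about test_frac of each label's items, so label proportions
    stay roughly the same in train and test."""
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    rng = random.Random(seed)
    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        test.extend((x, label) for x in group[:n_test])
        train.extend((x, label) for x in group[n_test:])
    return train, test


def micro_f1(gold, pred):
    """Micro-averaged F1 over parallel lists of label sets (empty set = Neutral).

    True positives, false positives, and false negatives are pooled across
    all emotions first, then a single precision/recall/F1 is computed.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [{"Joy"}, {"Anger", "Sadness"}, set()]
pred = [{"Joy"}, {"Anger"}, {"Fear"}]
print(round(micro_f1(gold, pred), 3))  # 0.667
```

Pooling counts before computing F1 means frequent emotions weigh more than rare ones; a macro average would instead give each emotion equal weight.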
Prompt Design for Zero-shot LLM Reasoners
You are a [GitHub/Stack Overflow/JIRA] user. You are reading
comments from [GitHub/Stack Overflow/JIRA]. Your task is to
detect whether there is one of the following emotions aroused in
you while reading the utterance.
Emotions List: Anger, Fear, Love, Joy, Sadness, Surprise.
Utterance: <insert utterance>.
If there is no emotion in the text, write Neutral. Otherwise write
exactly one word, the exact emotion from the emotions list.
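A minimal sketch of filling in the prompt above and normalizing the model's one-word reply. The template wording is copied from the slide; the line breaks, the function names, and the rule of treating any out-of-list reply as a hallucination (returning `None`) are assumptions for illustration. No model API is called here.

```python
from typing import Optional

EMOTIONS = ["Anger", "Fear", "Love", "Joy", "Sadness", "Surprise"]

TEMPLATE = (
    "You are a {platform} user. You are reading comments from {platform}. "
    "Your task is to detect whether there is one of the following emotions "
    "aroused in you while reading the utterance.\n"
    "Emotions List: {emotions}.\n"
    "Utterance: {utterance}.\n"
    "If there is no emotion in the text, write Neutral. Otherwise write "
    "exactly one word, the exact emotion from the emotions list."
)


def build_prompt(platform: str, utterance: str) -> str:
    """Fill the template; platform is "GitHub", "Stack Overflow", or "JIRA"."""
    return TEMPLATE.format(
        platform=platform, emotions=", ".join(EMOTIONS), utterance=utterance
    )


def parse_reply(reply: str) -> Optional[str]:
    """Normalize a one-word reply to an emotion or "Neutral".

    Returns None for anything outside that set (e.g. "Apology"), so such
    hallucinated labels can be counted as errors instead of being coerced.
    """
    word = reply.strip().strip(".!\"'").capitalize()
    if word in EMOTIONS or word == "Neutral":
        return word
    return None


print(build_prompt("GitHub", "the code isn't compiling no matter what I try"))
print(parse_reply(" joy. "), parse_reply("Apology"))  # Joy None
```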
Results (micro-averaged F1-score)
Model        GitHub  Stack Overflow  JIRA
ESEM-E       0.440   0.674           0.744
EMTk         0.434   0.651           0.734
SEntiMoji    0.529   0.721           0.793
BERT         0.588   0.716           0.817
RoBERTa      0.592   0.735           0.818
ChatGPT      0.234   0.339           0.276
GPT-4        0.424   0.293           0.432
flan-alpaca  0.355   0.444           0.256
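Fine-tuned BERT and RoBERTa outperform the other models (the one exception: SEntiMoji edges out BERT on Stack Overflow, 0.721 vs. 0.716).
Zero-shot LLMs perform poorly! We perform an error analysis to understand why.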
Error Analysis
● Misclassifying one emotion as another, e.g., Love as Joy
● Predicting Neutral for utterances that carry an emotion, e.g.:
"My concern is that more new atributes may appear [...] it may break their behavior."
● Hallucinations: responses outside the requested label set, e.g., answering "Apology" for:
"Doh. Sorry for wasting your time."
Zero-shot LLMs: Granular-Level Prompting
● Sampled 400 utterances from the GitHub training set for prompting
● Designed prompts based on various emotion taxonomies:
○ Basic and secondary emotions (36 emotions in total)
○ Secondary layer only (25 emotions in total)
○ All layers of emotions (141 emotions in total)
○ GoEmotions taxonomy (27 emotions in total)
● Output emotions are mapped back to the basic emotions
● The GoEmotions taxonomy yielded the best F1-score
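Mapping granular outputs back to basic emotions could look like the sketch below. It validates a reply against the 27 GoEmotions categories, then maps it with a deliberately partial table: the four pairs listed on the Extended Shaver's Taxonomy slide, plus an assumed identity mapping for the six GoEmotions categories that share a name with a basic emotion. The full mapping from Imran et al. (ASE 2022) is not reproduced, and `to_basic` is a hypothetical name.

```python
from typing import Optional

# The 27 GoEmotions categories (Neutral is handled separately).
GOEMOTIONS = {
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise",
}

# PARTIAL mapping, for illustration only: the four pairs from the
# "Extended Shaver's Taxonomy" slide, plus an ASSUMED identity mapping
# for the six categories that share a name with a basic emotion.
TO_BASIC = {
    "approval": "Joy",
    "disapproval": "Anger",
    "curiosity": "Surprise",
    "gratitude": "Love",
    "anger": "Anger", "fear": "Fear", "joy": "Joy",
    "love": "Love", "sadness": "Sadness", "surprise": "Surprise",
}


def to_basic(reply: str) -> Optional[str]:
    """Map a granular one-word reply to a basic emotion or "Neutral".

    Returns None if the reply is not a GoEmotions category (a hallucination)
    or if this sketch's partial table has no entry for it.
    """
    label = reply.strip().strip(".").lower()
    if label == "neutral":
        return "Neutral"
    if label not in GOEMOTIONS:
        return None
    return TO_BASIC.get(label)


print([to_basic(r) for r in ["Curiosity.", "approval", "neutral", "Apology"]])
# ['Surprise', 'Joy', 'Neutral', None]
```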
How the Zero-shot LLMs Perform Now
● Results on the GitHub dataset (per-emotion and micro-averaged F1-score)
● The open-source flan-alpaca achieved the best zero-shot performance (0.507), outperforming GPT-4 (0.481)!
Model        Anger  Love   Fear   Joy    Sadness  Surprise  Micro avg.
BERT         0.506  0.712  0.536  0.579  0.636    0.594     0.588
RoBERTa      0.525  0.683  0.492  0.500  0.613    0.673     0.592
ChatGPT      0.337  0.490  0.182  0.458  0.412    0.511     0.423
flan-alpaca  0.447  0.543  0.140  0.446  0.451    0.740     0.507
GPT-4        0.437  0.698  0.000  0.446  0.487    0.517     0.481
SotA model   GitHub (micro avg.)
ESEM-E       0.440
EMTk         0.434
SEntiMoji    0.529
Zero-shot LLMs for Emotion Cause Extraction
Emotion Cause Extraction
● Emotion cause extraction involves extracting the text span
within an utterance that triggers a particular emotion
Emotion: Frustration
Utterance: "I'm feeling frustrated because the code isn't compiling no matter what I try."
Cause: "the code isn't compiling no matter what I try"
Emotion Cause Extraction: Challenges
Annotation
● Requires understanding nuances in textual communication
● Causes can be implicit
● There can be multiple causes
Automatic cause extraction
● Requires large amounts of labeled training data, which we lack
Zero-shot LLMs for Cause Extraction
● Requires no training to extract causes
● Prompt design is critical
● Uses the same three models:
○ ChatGPT
○ GPT-4
○ flan-alpaca
Emotion Cause Extraction: Prompt
You are a GitHub user. You are reading utterances from
GitHub issues and pull requests. Your task is to extract the
span that is causing the emotion <insert emotion> in the
following GitHub utterance: <insert utterance>.
Write the cause of the span within a double quote.
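The cause-extraction prompt and a parser for its reply might be wired together as below. The template text follows the slide; the regular expression, which accepts straight or curly double quotes and returns every quoted span (annotations allow multiple causes), is an assumption, as are the function names.

```python
import re

TEMPLATE = (
    "You are a GitHub user. You are reading utterances from GitHub issues and "
    "pull requests. Your task is to extract the span that is causing the "
    "emotion {emotion} in the following GitHub utterance: {utterance}.\n"
    "Write the cause of the span within a double quote."
)

# A non-empty span between straight (") or curly (“ ”) double quotes.
QUOTED = re.compile(r'["\u201c]([^"\u201c\u201d]+)["\u201d]')


def build_cause_prompt(emotion: str, utterance: str) -> str:
    return TEMPLATE.format(emotion=emotion, utterance=utterance)


def extract_spans(reply: str) -> list:
    """Return every double-quoted span in the reply, stripped of whitespace."""
    return [span.strip() for span in QUOTED.findall(reply)]


reply = 'The cause is "the code isn\'t compiling no matter what I try".'
print(extract_spans(reply))  # ["the code isn't compiling no matter what I try"]
```

If a model also quotes the emotion name or echoes the whole utterance in quotes, this parser returns those too, so a real pipeline would likely need an extra filtering step.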
Experiment Setup: Annotation
● Manually annotated 450 utterances
○ 75 utterances for each of 6 basic emotions
● Instructions:
○ Extract the cause span for the associated emotion
○ Multiple causes are allowed
Experiment Setup: Metric
● Metric: BLEU score
○ Compares machine-generated text to human references
○ Measures n-gram precision overlap, with a brevity penalty for short outputs
● BLEU-2 (unigram and bigram precision) is suitable for comparing short texts
● Interpretation:
○ > 0.5: Good fluency and correctness
○ 0.3-0.5: Comprehensible
○ < 0.3: Disfluent or incorrect
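For reference, a minimal sentence-level BLEU-N sketch in pure Python (`max_n=2` gives BLEU-2): a uniformly weighted geometric mean of clipped 1..N-gram precisions, multiplied by a brevity penalty, with several references allowed. The slides do not specify the authors' tokenizer, smoothing, or BLEU implementation; lowercased whitespace tokenization and no smoothing are assumptions here.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=2):
    """Sentence-level BLEU-N: geometric mean of clipped 1..N-gram precisions
    (uniform weights) times a brevity penalty. max_n=2 gives BLEU-2.
    No smoothing: any zero precision makes the score 0.0."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0
    log_p = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        total = sum(cand_counts.values())
        # Clip each n-gram by its highest count in any single reference.
        max_ref = Counter()
        for ref in refs:
            max_ref |= ngram_counts(ref, n)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:  # also covers candidates shorter than n tokens
            return 0.0
        log_p += math.log(clipped / total) / max_n
    # Brevity penalty uses the reference length closest to the candidate's.
    c = len(cand)
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_p)


span = "the code isn't compiling"
gold = ["the code isn't compiling no matter what I try"]
print(round(sentence_bleu(span, gold), 3))  # 0.287
```

In the usage line every n-gram matches, yet the score is only about 0.29 because the brevity penalty punishes the shorter span; without smoothing, a one-word candidate has no bigrams and scores 0.0 under BLEU-2.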
Results
● GPT-4 performs best on every BLEU variant
● GPT-4 and flan-alpaca reach BLEU-2 > 0.5, indicating reasonably correct cause spans
Model        BLEU-1  BLEU-2  BLEU-3  BLEU-4
ChatGPT      0.522   0.489   0.467   0.450
GPT-4        0.637   0.598   0.571   0.554
flan-alpaca  0.571   0.543   0.525   0.508
Error Analysis
● 41 cases in which all three models scored BLEU-2 < 0.3
● Two categories of error:
○ Incorrect emotion detection
○ Identifying the wrong cause span
Incorrect emotion:
“Oh right 🙃! This started as a Mac issue, I forgot to add the rest.”
Annotation: Neglect (2nd level of Sadness)
GPT-4 detected emotion: Amusement
GPT-4 detected cause span: Oh right 🙃
Wrong cause span:
“[USER] yep, it is bug, we will fix it, so we have it in ‘experiments‘ :+1:”
Annotation: Agreement (2nd level of Joy)
GPT-4 detected emotion: Agreement
GPT-4 detected cause span: we will fix it
Case Study
A Case Study on Emotion Cause
● Frustration in the TensorFlow repository, analyzed with flan-alpaca
● Collected all developer comments over a one-year period
● Extracted causes for comments expressing Frustration
● This resulted in a total of 1,275 comments
● Applied DBSCAN clustering to the extracted causes
Methodology
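The clustering step could be sketched as classic DBSCAN over the extracted cause spans. The slides do not say how causes were represented; this sketch swaps in Jaccard distance over lowercased word sets, and `eps`, `min_samples`, and the example spans are illustrative assumptions, not the study's settings.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over lowercased word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def dbscan(points, eps, min_samples, dist=jaccard_distance):
    """Classic DBSCAN; returns one cluster id per point, with -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not yet visited"

    def neighbors(i):
        # The eps-neighborhood includes the point itself (distance 0).
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise reached from a core point
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_samples:  # j is itself a core point: expand
                queue.extend(more)
    return labels


causes = [
    "build fails with cuda error",
    "cuda error when build fails",
    "pr waiting for review",
    "pr still waiting for review",
    "tests are flaky",
]
print(dbscan(causes, eps=0.5, min_samples=2))  # [0, 0, 1, 1, -1]
```

Points labelled -1 are noise. DBSCAN does not need the number of clusters in advance, which suits an exploratory pass over free-text causes; an embedding-based distance would likely group paraphrases better than word overlap.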
Causes of Frustration
● TensorFlow Version and Dependency Issues
● Pull Request Delays and Merge Conflicts
● Failing Tests
● Too Fine-Grained Commits
● CI Flakiness
● CUDA/CuDNN Compatibility Issues
Summary of Contributions
● Utilization of Zero-shot LLMs: Employed zero-shot models such as GPT-3.5, GPT-4, and flan-alpaca to detect emotions and their causes in SE communication
● Annotated Data: 450 GitHub utterances annotated with emotions and their causes
● Resource Sharing: Publicly released source code, annotation guidelines, and dataset
● Novel Research: Among the first to explore emotion causes in SE
● Open-source Case Study: Demonstrated practical benefits of emotion
cause extraction through a case study on a major open-source project
Questions/Thoughts/Collaboration Ideas to:
Mia Mohammad Imran, imranm3@vcu.edu

More Related Content

Similar to Uncovering the Causes of Emotions in Software Developer Communication Using Zero-shot LLMs

Machine Learning Workshop, TSEC 2020
Machine Learning Workshop, TSEC 2020Machine Learning Workshop, TSEC 2020
Machine Learning Workshop, TSEC 2020Siddharth Adelkar
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis worksCJ Jenkins
 
Inspirit AI Facial Emotion Detection Project (Dec 2021)
Inspirit AI Facial Emotion Detection Project (Dec 2021)Inspirit AI Facial Emotion Detection Project (Dec 2021)
Inspirit AI Facial Emotion Detection Project (Dec 2021)EmilyJoseph18
 
Futuristic Background _ by Slidesgo.pptx
Futuristic Background _ by Slidesgo.pptxFuturistic Background _ by Slidesgo.pptx
Futuristic Background _ by Slidesgo.pptxMurlidharBansal3
 
Reporting Metasystem Design and Penalization Strategy Best Practices (Present...
Reporting Metasystem Design and Penalization Strategy Best Practices (Present...Reporting Metasystem Design and Penalization Strategy Best Practices (Present...
Reporting Metasystem Design and Penalization Strategy Best Practices (Present...Intel® Software
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment AnalysisSagar Ahire
 
[DSC Europe 23][Pandora] Nikola_Vasiljevic-Leveraging_Sentiment_and_Topic_Det...
[DSC Europe 23][Pandora] Nikola_Vasiljevic-Leveraging_Sentiment_and_Topic_Det...[DSC Europe 23][Pandora] Nikola_Vasiljevic-Leveraging_Sentiment_and_Topic_Det...
[DSC Europe 23][Pandora] Nikola_Vasiljevic-Leveraging_Sentiment_and_Topic_Det...DataScienceConferenc1
 
Nlp whitepaper the securly way
Nlp whitepaper   the securly wayNlp whitepaper   the securly way
Nlp whitepaper the securly waySecurly
 
NLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPNLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPAnuj Gupta
 
Cooperative game model based sentiment analysis of product reviews.pptx
Cooperative game model based sentiment analysis of product reviews.pptxCooperative game model based sentiment analysis of product reviews.pptx
Cooperative game model based sentiment analysis of product reviews.pptxUsamaHassan90
 
Lecture 7 program development issues (supplementary)
Lecture 7  program development issues (supplementary)Lecture 7  program development issues (supplementary)
Lecture 7 program development issues (supplementary)alvin567
 
Neural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment AnalysisNeural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment AnalysisEditor IJCATR
 
Imagine that you are a public health nurse, and you and your colle
Imagine that you are a public health nurse, and you and your colleImagine that you are a public health nurse, and you and your colle
Imagine that you are a public health nurse, and you and your colleLizbethQuinonez813
 
Sentiment analysis using machine learning
Sentiment analysis using machine learningSentiment analysis using machine learning
Sentiment analysis using machine learningVenkat Projects
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018HJ van Veen
 
Sentiment analysis: Incremental learning to build domain-models
Sentiment analysis: Incremental learning to build domain-modelsSentiment analysis: Incremental learning to build domain-models
Sentiment analysis: Incremental learning to build domain-modelsRaimon Bosch
 

Similar to Uncovering the Causes of Emotions in Software Developer Communication Using Zero-shot LLMs (20)

Introduction To Pc Security
Introduction To Pc SecurityIntroduction To Pc Security
Introduction To Pc Security
 
Machine Learning Workshop, TSEC 2020
Machine Learning Workshop, TSEC 2020Machine Learning Workshop, TSEC 2020
Machine Learning Workshop, TSEC 2020
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis works
 
Inspirit AI Facial Emotion Detection Project (Dec 2021)
Inspirit AI Facial Emotion Detection Project (Dec 2021)Inspirit AI Facial Emotion Detection Project (Dec 2021)
Inspirit AI Facial Emotion Detection Project (Dec 2021)
 
Futuristic Background _ by Slidesgo.pptx
Futuristic Background _ by Slidesgo.pptxFuturistic Background _ by Slidesgo.pptx
Futuristic Background _ by Slidesgo.pptx
 
Reporting Metasystem Design and Penalization Strategy Best Practices (Present...
Reporting Metasystem Design and Penalization Strategy Best Practices (Present...Reporting Metasystem Design and Penalization Strategy Best Practices (Present...
Reporting Metasystem Design and Penalization Strategy Best Practices (Present...
 
Sentimental analysis
Sentimental analysisSentimental analysis
Sentimental analysis
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
[DSC Europe 23][Pandora] Nikola_Vasiljevic-Leveraging_Sentiment_and_Topic_Det...
[DSC Europe 23][Pandora] Nikola_Vasiljevic-Leveraging_Sentiment_and_Topic_Det...[DSC Europe 23][Pandora] Nikola_Vasiljevic-Leveraging_Sentiment_and_Topic_Det...
[DSC Europe 23][Pandora] Nikola_Vasiljevic-Leveraging_Sentiment_and_Topic_Det...
 
ECCAA
ECCAAECCAA
ECCAA
 
NLP Bootcamp
NLP BootcampNLP Bootcamp
NLP Bootcamp
 
Nlp whitepaper the securly way
Nlp whitepaper   the securly wayNlp whitepaper   the securly way
Nlp whitepaper the securly way
 
NLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPNLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLP
 
Cooperative game model based sentiment analysis of product reviews.pptx
Cooperative game model based sentiment analysis of product reviews.pptxCooperative game model based sentiment analysis of product reviews.pptx
Cooperative game model based sentiment analysis of product reviews.pptx
 
Lecture 7 program development issues (supplementary)
Lecture 7  program development issues (supplementary)Lecture 7  program development issues (supplementary)
Lecture 7 program development issues (supplementary)
 
Neural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment AnalysisNeural Network Based Context Sensitive Sentiment Analysis
Neural Network Based Context Sensitive Sentiment Analysis
 
Imagine that you are a public health nurse, and you and your colle
Imagine that you are a public health nurse, and you and your colleImagine that you are a public health nurse, and you and your colle
Imagine that you are a public health nurse, and you and your colle
 
Sentiment analysis using machine learning
Sentiment analysis using machine learningSentiment analysis using machine learning
Sentiment analysis using machine learning
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
 
Sentiment analysis: Incremental learning to build domain-models
Sentiment analysis: Incremental learning to build domain-modelsSentiment analysis: Incremental learning to build domain-models
Sentiment analysis: Incremental learning to build domain-models
 

Recently uploaded

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

Uncovering the Causes of Emotions in Software Developer Communication Using Zero-shot LLMs

  • 1. Uncovering the Causes of Emotions in Software Developer Communication Using Zero-shot LLMs Mia Mohammad Imran, Preetha Chatterjee, Kostadin Damevski Drexel University Virginia Commonwealth University
  • 2. Understanding Emotion Cause in OSS Emotion cause involves identifying the text span within an utterance that triggers a particular emotion Frustration "I'm feeling frustrated because the code isn't compiling no matter what I try." the code isn't compiling no matter what I try Cause Emotion
  • 3. Outline ● Emotion Models ● Emotion Classification ● Emotion Cause Extraction ● Case Study
  • 5. Emotion Models ● Theoretical frameworks to represent emotions ● Shaver’s tree-structured model is most commonly used in Software Engineering Research ○ 6 primary categories, 25 secondary categories and over 100 tertiary categories ● GoEmotions is a recently developed model by Google for text
  • 6. Emotion Models: Shaver’s Taxonomy ● 6 primary categories: ○ Anger 😡 ○ Love ❤️ ○ Fear 😨 ○ Joy 😊 ○ Sadness 😥 ○ Surprise 😲
  • 7. Shaver’s Taxonomy Is Not Complete ● “I’m curious about this - can you give more context on what exactly goes wrong? Perhaps if that causes bugs this should be prohibited instead?" ○ Expresses Curiosity 🤔 ● “And, I am a little confused, if there is not any special folder, according to the module resolution [URL] How could file find the correct modules? Did I miss something?” ○ Expresses Confusion 😕
  • 8. Extended Shaver’s Taxonomy ● Imran et al. [1] proposed an extended Shaver’s Taxonomy by combining GoEmotions’ categories ● Provides mapping between GoEmotions’ categories and primary emotions: ○ 👍 Approval to 😊 Joy ○ 👎 Disapproval to 😡 Anger ○ 🤔 Curiosity to 😲 Surprise ○ 🙌 Gratitude to ❤️ Love [1] Imran et al., “Data augmentation for improving emotion recognition in software engineering communication.” ASE 2022
  • 10. State-of-the Art Models ESEM-E [1] SVM Unigram, bigram EMTk [2] SVM Unigram, bigram, lexicon, polarity, mood SEntiMoji [3] Transfer learning Neural Network [1] Murgia et al., “An exploratory qualitative and quantitative analysis of emotions in issue report comments of open source systems.”, ESEM, 2018 [2] Calefato et al., “Emtk-the emotion mining toolkit.” SEmotion, 2019 [3] Chen et al. “Emoji-powered sentiment and emotion detection from software developers' communication data.” TOSEM, 2021 ● Studies show that general purpose tools perform poorly in software engineering text ● All tools perform one-vs-all predictions for all 6 basic emotions (Anger, Love, Fear, Joy, Sadness, and Surprise)
  • 11. Compared Fine-tuned LLMs ● BERT: First major transformer model applied to NLP ● RoBERTa: An optimized version of BERT when LLMs can be fine-tuned with task-specific data Fine-tuned LLMs
  • 12. Compared Zero-shot LLMs ● ChatGPT (GPT-3.5): Proprietary model by OpenAI ● GPT-4: Updated version of gpt-3.5 ● flan-alpaca: open-source ○ variation of Meta’s LLaMA model ○ instruct tuned with Google’s Flan-T5 model when LLMs can make decisions on unseen tasks without prior training Zero-shot reasoning
  • 13. Evaluating the Models ● Goal: Assess effectiveness of LLMs against SotA model ● Compared against three existing datasets from GitHub[1], JIRA [2] and Stack Overflow [3] ● 80% train set, 20% test set with stratified sampling ● Metric: F1-score (micro-average F1-score) [1] Imran et al., “Data augmentation for improving emotion recognition in software engineering communication.” ASE 2022 [2] Murgia et al., “An exploratory qualitative and quantitative analysis of emotions in issue report comments of open source systems.” ESEM, 2018 [3] Calefato et al., “Emtk-the emotion mining toolkit.” SEmotion, 2019
  • 14. Prompt Design for Zero-shot LLM Reasoners You are a [GitHub/Stack Overflow/JIRA] user. You are reading comments from [GitHub/Stack Overflow/JIRA]. Your task is to detect whether there is one of the following emotions aroused in you while reading the utterance. Emotions List: Anger, Fear, Love, Joy, Sadness, Surprise. Utterance: <insert utterance>. If there is no emotion in the text, write Neutral. Otherwise write exactly one word, the exact emotion from the emotions list.
  • 15. Results (Micro-average F1-score) Model GitHub Stack Overflow JIRA ESEM-E 0.440 0.674 0.744 EMTk 0.434 0.651 0.734 SEntiMoji 0.529 0.721 0.793 BERT 0.588 0.716 0.817 RoBERTa 0.592 0.735 0.818 ChatGPT 0.234 0.339 0.276 GPT-4 0.424 0.293 0.432 flan-alpaca 0.355 0.444 0.256 Fine-tuned BERT and RoBERTa other models Zero-shot LLMs performs badly! We do error analysis to understand why
  • 16. Error Analysis ● Misclassifying one emotion as other, i.e., Love as Joy ● Predicting Neutral "My concern is that more new atributes may appear [...] it may break their behavior." ● Hallucinations: Generated responses that were outside of what asked Apology: "Doh. Sorry for wasting your time."
  • 17. Zero-shot LLMs: Granular Level Prompting ● From the GitHub dataset, sampled 400 utterances from training set and perform prompting ● Designed prompts based on various emotion taxonomies: ○ Using Basic and Secondary emotions (total 36 emotions) ○ Secondary layer only (total 25 emotions) ○ Using all layers of emotions (total 141 emotions) ○ Using GoEmotions taxonomy (total 27 emotions) Output Emotions are mapped to basic emotions GoEmotions taxonomy performed best in F1- score
  • 18. How the Zero-shot LLMs Perform Now ● Output on GitHub Dataset ● Open-source flan-alpaca achieved best zero-shot performance, outperformed GPT-4! Model Anger Love Fear Joy Sadness Surprise Micro avg. BERT 0.506 0.712 0.536 0.579 0.636 0.594 0.588 RoBERTa 0.525 0.683 0.492 0.500 0.613 0.673 0.592 ChatGPT 0.337 0.490 0.182 0.458 0.412 0.511 0.423 flan-alpaca 0.447 0.543 0.140 0.446 0.451 0.740 0.507 GPT-4 0.437 0.698 0.0 0.446 0.487 0.517 0.481 SotA Model GitHub ESEM-E 0.440 EMTk 0.434 SEntiMoji 0.529
  • 19. Zero-shot LLMs for Emotion-Cause Extraction
  • 20. Emotion Cause Extraction
    ● Emotion cause extraction involves extracting the text span within an utterance that triggers a particular emotion
    Example: "I'm feeling frustrated because the code isn't compiling no matter what I try."
      Emotion: Frustration
      Cause: the code isn't compiling no matter what I try
  • 21. Emotion Cause Extraction - Challenges
    Annotation
    ● Requires understanding nuances in textual communication
    ● Causes can be implicit
    ● There can be multiple causes
    Automatic cause extraction
    ● Requires large amounts of training data, which we lack
  • 22. Zero-shot LLMs for Cause Extraction
    ● Require no training to extract causes
    ● Prompt design is critical
    ● We use the same three models:
      ○ ChatGPT
      ○ GPT-4
      ○ flan-alpaca
  • 23. Emotion Cause Extraction: Prompt
    You are a GitHub user. You are reading utterances from GitHub issues and pull requests. Your task is to extract the span that is causing the emotion <insert emotion> in the following GitHub utterance: <insert utterance>. Write the cause of the span within a double quote.
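Because the prompt asks for the cause inside double quotes, the span can be pulled out of the reply with a regular expression. A sketch under two assumptions: the reply may use straight or curly double quotes, and every quoted segment counts as a candidate cause (annotators allowed multiple causes). `build_cause_prompt` and `extract_quoted_spans` are hypothetical helper names.

```python
import re


def build_cause_prompt(emotion: str, utterance: str) -> str:
    """Fill the cause-extraction template for one utterance."""
    return (
        "You are a GitHub user. You are reading utterances from GitHub issues and "
        "pull requests. Your task is to extract the span that is causing the emotion "
        f"{emotion} in the following GitHub utterance: {utterance}. "
        "Write the cause of the span within a double quote."
    )


# Straight (") or curly (\u201c ... \u201d) double quotes around the span.
QUOTED = re.compile(r'["\u201c]([^"\u201d]+)["\u201d]')


def extract_quoted_spans(response: str) -> list[str]:
    """Return every double-quoted segment in the model's reply."""
    return [span.strip() for span in QUOTED.findall(response)]
```

A reply with no quotes yields an empty list, which can be treated as an extraction failure rather than scored against the annotation.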
  • 24. Experiment Setup: Annotation
    ● Manually annotated 450 utterances
      ○ 75 utterances for each of the 6 basic emotions
    ● Instructions:
      ○ Extract the cause span associated with the emotion
      ○ Allow multiple causes
  • 25. Experiment Setup: Metric
    ● We use the BLEU score as the metric
      ○ Compares machine-generated text to human references
      ○ Measures the precision of n-gram overlap
    ● BLEU-2 (bigram) is suitable for comparing short texts
    ● Interpretation:
      ○ > 0.5: good fluency and correctness
      ○ 0.3-0.5: comprehensible
      ○ < 0.3: disfluent or incorrect
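BLEU-2 is the brevity penalty times the geometric mean of clipped unigram and bigram precisions. A from-scratch sketch of unsmoothed sentence-level BLEU, assuming whitespace tokenization and treating each annotated cause as one reference; the authors' tokenizer, smoothing, and handling of multiple causes are not stated, so their numbers may differ from this sketch.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(cand: list[str], refs: list[list[str]], n: int) -> float:
    """Clipped precision: a candidate n-gram counts at most as often as it
    appears in any single reference."""
    cand_counts = ngram_counts(cand, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    max_ref = Counter()
    for ref in refs:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / total


def sentence_bleu(candidate: str, references: list[str], max_n: int = 2) -> float:
    """Unsmoothed sentence-level BLEU-max_n with uniform n-gram weights."""
    if not references:
        raise ValueError("need at least one reference")
    cand = candidate.split()
    refs = [r.split() for r in references]
    precisions = [modified_precision(cand, refs, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    # Brevity penalty against the closest reference length (ties go to shorter).
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return bp * math.exp(log_mean)
```

With the reference "the code isn't compiling no matter what I try", an exact copy scores 1.0, while the truncated span "the code isn't compiling" has perfect precision but is penalized for brevity: exp(1 - 9/4) ≈ 0.287. This is why a correct but partial cause span can fall below the 0.3 threshold.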
  • 26. Results
    ● GPT-4 outperforms the other models on every BLEU variant
    ● BLEU-2 scores for GPT-4 and flan-alpaca exceed 0.5, which indicates they perform reasonably well in correctness
    Model        BLEU-1  BLEU-2  BLEU-3  BLEU-4
    ChatGPT      0.522   0.489   0.467   0.450
    GPT-4        0.637   0.598   0.571   0.554
    flan-alpaca  0.571   0.543   0.525   0.508
  • 27. Error Analysis
    ● 41 cases where the BLEU-2 scores of all three models were < 0.3
    ● Two categories of error:
      ○ Incorrect emotion detection
      ○ Identifying the wrong cause span
    Incorrect emotion: "Oh right 🙃! This started as a Mac issue, I forgot to add the rest."
      Annotation: Neglect (2nd level of Sadness)
      GPT-4 detected emotion: Amusement
      GPT-4 detected cause span: Oh right 🙃
    Wrong cause span: "[USER] yep, it is bug, we will fix it, so we have it in ‘experiments‘ :+1:"
      Annotation: Agreement (2nd level of Joy)
      GPT-4 detected emotion: Agreement
      GPT-4 detected cause span: we will fix it
  • 29. A Case Study on Emotion Cause: Methodology
    ● Frustration in the TensorFlow repository, using flan-alpaca
    ● Collected all comments made by developers over a 1-year period
    ● Extracted causes when the emotion is Frustration
    ● This resulted in a total of 1275 comments
    ● Applied DBSCAN clustering to the extracted causes
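DBSCAN groups points that have at least `min_samples` neighbours within distance `eps` and labels isolated points as noise, so the number of themes does not have to be fixed in advance. A stdlib-only sketch, assuming a bag-of-words representation with cosine distance; the text representation, `eps`, and `min_samples` used in the study are not stated, and the cause strings below are hypothetical examples shaped like the themes on the next slide.

```python
import math
import re
from collections import Counter

NOISE = -1


def bag_of_words(text: str) -> Counter:
    """Lower-cased alphanumeric token counts (a stand-in text representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(points, eps: float, min_samples: int, dist) -> list[int]:
    """Plain O(n^2) DBSCAN; returns a cluster id per point, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbours(i: int) -> list[int]:
        # Includes i itself, matching scikit-learn's min_samples convention.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                seeds.extend(j_neighbours)  # core point: keep expanding
    return labels


causes = [  # hypothetical extracted cause spans
    "tensorflow version dependency conflict",
    "dependency conflict with tensorflow version",
    "tensorflow version mismatch dependency",
    "ci is flaky again",
    "flaky ci run",
    "ci flaky failure",
    "unrelated comment about docs",
]
labels = dbscan([bag_of_words(c) for c in causes], eps=0.5, min_samples=2,
                dist=cosine_distance)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In practice, sentence embeddings with a library implementation such as scikit-learn's `DBSCAN` would replace the toy representation; the resulting clusters are then read and named manually.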
  • 30. Causes of Frustration
    ● TensorFlow Version and Dependency Issues
    ● Pull Request Delays and Merge Conflicts
    ● Failing Tests
    ● Too Fine-Grained Commits
    ● CI Flakiness
    ● CUDA/CuDNN Compatibility Issues
  • 31. Summary of Contributions
    ● Utilization of Zero-shot LLMs: employed zero-shot models such as GPT-3.5, GPT-4, and flan-alpaca to detect emotions and their causes in SE
    ● Annotated Data: 450 GitHub utterances annotated with emotions and causes
    ● Resource Sharing: publicly released source code, annotation guidelines, and dataset
    ● Novel Research: among the first to explore emotion causes in SE
    ● Open-source Case Study: demonstrated practical benefits of emotion cause extraction through a case study on a major open-source project
    Questions/Thoughts/Collaboration Ideas to: Mia Mohammad Imran, imranm3@vcu.edu