EXTRACTING DATA FROM YOUR OPEN SOURCE COMMUNITIES
Dawn M. Foster
@geekygirldawn
dawn@fastwonder.com
fastwonderblog.com
PhD Student, University of Greenwich
Consultant, The Scale Factory
WHOAMI
• Geek, traveler, reader
• 20-year tech career. Past 15 years doing community & open source (Intel, Puppet Labs, etc.)
• PhD student at University of Greenwich researching the Linux kernel
• Community and open source consultant at The Scale Factory
Photos by Josh Bancroft, Don Park
I 💖 METRICS GRIMOIRE
MailingListStats aka MLStats
CVSAnalY - repos
Bicho - bugs
More
Photo by Bitergia
http://metricsgrimoire.github.io/
MLSTATS AND CVSANALY
a) Install
$ python setup.py install
b) Create database
mysql> create database mlstats;
mysql> create database cvsanaly;
c) Import data
$ mlstats http://URLOFYOURLIST
$ cvsanaly2 /path/to/repo
MLSTATS: EXTRACT DATA
Top 100 subject lines by message count (most replied-to threads):
SELECT subject, COUNT(*) AS total
FROM messages
GROUP BY subject
ORDER BY total DESC
LIMIT 100;
Other queries:
Number of messages from a specific person
Number of messages per person from an email domain
Find all messages with a specific word in the subject line (e.g. "patch")
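The other MLStats queries listed above can be sketched as follows. This is a minimal illustration, not the tool's documented interface: it uses Python's built-in sqlite3 with a toy in-memory database in place of MySQL, and the table and column names (messages_people, email_address, type_of_recipient) are assumptions that should be checked against your own MLStats database before reuse.

```python
import sqlite3

# Toy stand-in for an MLStats database. Table and column names are
# assumptions for illustration; verify them against your real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (message_id TEXT PRIMARY KEY, subject TEXT);
CREATE TABLE messages_people (
    message_id TEXT, email_address TEXT, type_of_recipient TEXT);
""")
rows = [
    ("m1", "[PATCH] fix typo", "alice@company.com"),
    ("m2", "Re: [PATCH] fix typo", "bob@example.org"),
    ("m3", "Release plan", "alice@company.com"),
    ("m4", "Re: Release plan", "carol@company.com"),
    ("m5", "Re: Release plan", "alice@company.com"),
]
for message_id, subject, sender in rows:
    conn.execute("INSERT INTO messages VALUES (?, ?)", (message_id, subject))
    conn.execute(
        "INSERT INTO messages_people VALUES (?, ?, 'From')",
        (message_id, sender),
    )


def messages_from(conn, email):
    """Number of messages sent by one address."""
    sql = """SELECT COUNT(*) FROM messages_people
             WHERE type_of_recipient = 'From' AND email_address = ?"""
    return conn.execute(sql, (email,)).fetchone()[0]


def messages_per_person(conn, domain):
    """Message counts per sender, restricted to one email domain."""
    sql = """SELECT email_address, COUNT(*) AS total
             FROM messages_people
             WHERE type_of_recipient = 'From' AND email_address LIKE ?
             GROUP BY email_address
             ORDER BY total DESC, email_address"""
    return conn.execute(sql, ("%@" + domain,)).fetchall()


def subjects_containing(conn, word):
    """Subjects of all messages whose subject line contains a word."""
    sql = "SELECT subject FROM messages WHERE subject LIKE ? ORDER BY message_id"
    return [row[0] for row in conn.execute(sql, ("%" + word + "%",))]
```

The SQL strings should carry over to MySQL largely unchanged, apart from the parameter placeholder style of whichever driver you use. Matching on "%@domain" rather than "%domain" avoids counting addresses at other domains that merely end in the same string.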
CVSANALY: EXTRACT DATA
Number of commits per person by email domain:
SELECT p.name, p.email,
COUNT(DISTINCT s.id) AS num_commits
FROM people p
JOIN scmlog s ON p.id = s.author_id
WHERE p.email LIKE '%@company.com'
GROUP BY p.name, p.email
ORDER BY num_commits DESC;
Other queries:
Top commit authors of all time
Number of commits for a specific person
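The two other CVSAnalY queries listed above might look like the sketch below. It reuses the people and scmlog tables and the author_id join from the slide's query, but runs against a toy in-memory SQLite database rather than a real CVSAnalY import. The function names and sample rows are hypothetical.

```python
import sqlite3

# Toy stand-in for a CVSAnalY database, limited to the columns the
# slide's query uses. Sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE scmlog (id INTEGER PRIMARY KEY, author_id INTEGER);
INSERT INTO people VALUES (1, 'Alice', 'alice@company.com'),
                          (2, 'Bob', 'bob@example.org');
INSERT INTO scmlog VALUES (10, 1), (11, 1), (12, 2), (13, 1);
""")


def top_authors(conn, limit=10):
    """Commit authors ranked by number of commits, all time."""
    sql = """SELECT p.name, p.email, COUNT(DISTINCT s.id) AS num_commits
             FROM people p JOIN scmlog s ON p.id = s.author_id
             GROUP BY p.id, p.name, p.email
             ORDER BY num_commits DESC
             LIMIT ?"""
    return conn.execute(sql, (limit,)).fetchall()


def commits_by(conn, email):
    """Number of commits authored under one email address."""
    sql = """SELECT COUNT(DISTINCT s.id)
             FROM people p JOIN scmlog s ON p.id = s.author_id
             WHERE p.email = ?"""
    return conn.execute(sql, (email,)).fetchone()[0]
```

Both functions count per row in people, so a contributor who commits under several names or addresses will likely appear as several authors unless identities are merged first.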
OTHER GRIMOIRE OPTIONS
Bug data
Wikis
IRC
Aggregate across tools
Photo by Bitergia
GOURCE
Visualize repository data using Gource

http://gource.io/
Dawn Foster
PhD student, University of Greenwich
Consultant, The Scale Factory
@geekygirldawn, dawn@dawnfoster.com
fastwonderblog.com
THANK YOU
