The best thing about open source projects is that all of your community data is public and at your fingertips. You just need to know how to gather that data so that you can hack it together into something interesting that you can really use. We’ll start with some general guidance for coming up with a set of metrics that makes sense for your project. The focus of the session will be on tips and techniques for collecting metrics from tools commonly used by open source projects: Bugzilla, MediaWiki, Mailman, IRC and more. It will include both general approaches and technical details about using various data collection tools, like mlstats. The final section of the presentation will talk about techniques for sharing this data with your community and highlighting contributions from key community members. For anyone who loves playing with data as much as I do, metrics can be a fun way to see what your community members are really doing in your open source project.
Measuring Open Source Community Participation
1. Open Source Community Metrics
Tips and Techniques for Measuring Participation
Open Source Bridge
June 2011
Dawn M. Foster
MeeGo Community Manager at Intel
@geekygirldawn
dawn@fastwonder.com
fastwonderblog.com
meego.com
2. Stuff I'll Talk About
● What, why and example metrics from MeeGo
● Coming up with the right metrics
● Tips and techniques for collecting metrics
● Sharing metrics and highlighting community members
Photo: http://www.flickr.com/photos/falcifer/3136673599
3. Community Definition
● Community includes all of the people who work on the project
● Product contributors: kernel / distribution developers, release
managers, quality assurance, localization, etc.
● App developers: writing applications
● Users: people who run your software and provide feedback
● Vendors: companies creating products based on your project
● Other contributors: promotion, moderation, documentation and more
Some people contribute as part of their employment at companies,
while others contribute free time. The community includes all of the
people who are working on your project.
4. Metrics are Useful for Open Source Projects
● Measure progress in your community over time
● Who contributes
● Where are people contributing
● Spot trends
● Gauge interest
● Learn more about key contributors
● Recognize contributions
6. Example: April MeeGo Community Metrics Summary
• 3,534,575 unique people have visited MeeGo.com (cumulative total)
• 295,992 unique people visited this month (333,293 last month).
• 22,914 people are members of MeeGo.com (was 21,823 last month)
• Dev ML subscribers = 4983; Community = 3929; iL10N = 2871; SDK = 3313
• Mailing Lists: 4891 posts this month; 220 people posted 2+ msgs
• Forums: 862 posts. 123 people posted 2+ messages
• New Bugs Created: 1757; Bugs Resolved: 2988
• 1.1 Downloads: 39,044 Netbook, 4171 Tablet, 3346 IVI, 2699 N900
• Active Users: Estimated at 800 – 1000 people.
• Mailing Lists: 343 people with unique email addresses posted (367 last month)
• Forums: 229 people posted at least one item (281 last month)
• Bugzilla: 716 people performed some action (552 last month)
• IRC: 410-500 people logged into #meego simultaneously most days
http://wiki.meego.com/Metrics
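The summary above lists several "this month (last month)" pairs. A minimal sketch of turning those pairs into month-over-month percentage changes follows; the function name `percent_change` and the dictionary keys are hypothetical labels chosen for illustration, while the numbers are copied from the April summary.

```python
def percent_change(current, previous):
    """Return the change from previous to current as a percentage."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


# (this month, last month) pairs copied from the April summary slide.
# The key names are illustrative labels, not official metric names.
april = {
    "unique visitors": (295992, 333293),
    "site members": (22914, 21823),
    "mailing list posters": (343, 367),
    "forum posters": (229, 281),
    "bugzilla participants": (716, 552),
}

# Round to one decimal place for reporting.
changes = {
    name: round(percent_change(current, previous), 1)
    for name, (current, previous) in april.items()
}

for name, delta in changes.items():
    print(f"{name}: {delta:+.1f}%")
```

Reporting the change alongside the raw count makes it easier to spot trends than comparing two large numbers by eye.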
7. What are the Right Metrics for YOUR Project
● Goals
● What are your overall goals for the project?
● How can you measure progress toward those goals?
● What is important to you and your progress?
● Trends
● What should you measure to recognize trends?
● How do you recognize when something is going wrong?
● Do you notice big improvements?
Note: I measure way too much
Photo: http://www.flickr.com/photos/bandfan/5548675317/
8. Mailing Lists: mlstats
Mailing List Stats is a command line tool used to analyze
mailing list archives. It downloads the archives, places
them in a directory and stores all the information contained
in each mailing list post into a database.
http://libresoft.es/tools/mlstats
9. Mailing Lists: mlstats
● Grab data from your mailing list & store in db (repeat per ML)
– ./mlstats --db-user=user --db-password=pw http://lists.meego.com/pipermail/meego-community/
● Top Content Query
– select subject, monthname(first_date) as m, count(*) as c
  from messages
  where month(first_date)=$MONTH and year(first_date)=$YEAR
  group by subject, month(first_date)
  order by m, c;
● Top Poster Query
– select p.email_address, year(m.first_date) as y, monthname(m.first_date), count(*) as c
  from messages as m, messages_people as p
  where m.message_id=p.message_id
  and month(m.first_date)=$MONTH and year(m.first_date)=$YEAR
  group by p.email_address, month(m.first_date)
  order by y, month(m.first_date), c;
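The top poster query can be tried without a MySQL server. The sketch below is an assumption-laden stand-in, not the mlstats setup itself: it uses Python's built-in `sqlite3` module, swaps the MySQL `month()` / `year()` functions for SQLite's `strftime`, and creates a simplified two-table schema containing only the column names visible in the queries above (a real mlstats database has more columns). The sample rows and the helper names `top_posters` and `people_with_min_posts` are made up for illustration.

```python
import sqlite3


def top_posters(conn, year, month):
    """Return (email_address, post_count) rows for one month, busiest first."""
    query = """
        SELECT p.email_address, COUNT(*) AS c
        FROM messages AS m
        JOIN messages_people AS p ON m.message_id = p.message_id
        WHERE strftime('%Y', m.first_date) = ?
          AND strftime('%m', m.first_date) = ?
        GROUP BY p.email_address
        ORDER BY c DESC, p.email_address
    """
    return conn.execute(query, (f"{year:04d}", f"{month:02d}")).fetchall()


def people_with_min_posts(rows, minimum=2):
    """Count people with at least `minimum` posts (the '2+ msgs' metric)."""
    return sum(1 for _, count in rows if count >= minimum)


# In-memory stand-in database; schema simplified, rows invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (message_id TEXT PRIMARY KEY, subject TEXT, first_date TEXT);
    CREATE TABLE messages_people (message_id TEXT, email_address TEXT);
""")
sample = [
    ("m1", "Release plan", "2011-04-02 10:00:00", "alice@example.com"),
    ("m2", "Re: Release plan", "2011-04-03 11:30:00", "bob@example.com"),
    ("m3", "Re: Release plan", "2011-04-05 09:15:00", "alice@example.com"),
    ("m4", "Old thread", "2011-03-28 08:00:00", "carol@example.com"),
]
for message_id, subject, first_date, email in sample:
    conn.execute("INSERT INTO messages VALUES (?, ?, ?)", (message_id, subject, first_date))
    conn.execute("INSERT INTO messages_people VALUES (?, ?)", (message_id, email))

rows = top_posters(conn, 2011, 4)
for email, count in rows:
    print(email, count)
print("people with 2+ posts:", people_with_min_posts(rows))
```

Passing the month and year as bound parameters plays the role of the `$MONTH` and `$YEAR` shell variables in the queries above.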
10. Mailing Lists: Top Content Result (graphed)
What are people talking about?
12. IRC: irssistats
Generates IRC stats for active people, by hour of the day,
by day, most used words, quotes and more.
http://royale.zerezo.com/irssistats/
16. Bugs
● New bugs vs. resolved bugs
● Can't just look at monthly trends
● Need to take release cycle into account
● Before release: more resolved bugs
● After release: more new bugs
● Participants
● People who file new bugs
● Participate in bugs (comment, etc.)
● Careful with people who resolve bugs (usually QA)
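One way to read new vs. resolved bugs with the release cycle in mind is to tag each month as falling before or after a release. The sketch below assumes bug data has already been exported as (created, resolved) date pairs; the helper names, the sample bugs, and the release date are all hypothetical and not taken from MeeGo's Bugzilla.

```python
from collections import Counter
from datetime import date


def monthly_bug_counts(bugs):
    """Count new and resolved bugs per (year, month).

    `bugs` is a list of (created, resolved) pairs, where resolved is a
    date or None for bugs that are still open.
    """
    new = Counter((created.year, created.month) for created, _ in bugs)
    resolved = Counter(
        (done.year, done.month) for _, done in bugs if done is not None
    )
    months = sorted(set(new) | set(resolved))
    return {m: {"new": new[m], "resolved": resolved[m]} for m in months}


def release_phase(month, release):
    """Label a (year, month) tuple relative to a release date."""
    if month <= (release.year, release.month):
        return "before release"
    return "after release"


# Invented sample data and an invented release date, for illustration only.
bugs = [
    (date(2011, 3, 5), date(2011, 3, 20)),
    (date(2011, 3, 10), date(2011, 4, 2)),
    (date(2011, 4, 1), date(2011, 4, 15)),
    (date(2011, 5, 3), None),
    (date(2011, 5, 9), None),
    (date(2011, 5, 12), date(2011, 5, 30)),
]
release = date(2011, 4, 30)

counts = monthly_bug_counts(bugs)
for month, c in counts.items():
    net = c["new"] - c["resolved"]
    print(month, c, f"net {net:+d}", release_phase(month, release))
```

Printing the phase next to each month's totals keeps a post-release spike in new bugs from being misread as a quality problem.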
Image: http://www.thegeekstuff.com/2010/05/install-bugzilla-on-linux/
19. MediaWiki
● Get Statistics
● http://wiki.meego.com/Special:Statistics
● wget "http://wiki.meego.com/api.php?action=query&meta=siteinfo&siprop=statistics&format=yamlfm"
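The same siteinfo statistics request can be built and parsed from a script. The sketch below makes two assumptions: it asks for `format=json` instead of the `yamlfm` format used in the wget command, so the response can be parsed with the standard library, and it parses a made-up response body rather than contacting the wiki. The helper names `statistics_url` and `parse_statistics` are hypothetical.

```python
import json
from urllib.parse import urlencode


def statistics_url(api_base):
    """Build the siteinfo statistics query URL for a MediaWiki api.php endpoint."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",  # assumption: JSON swapped in for yamlfm
    }
    return f"{api_base}?{urlencode(params)}"


def parse_statistics(body):
    """Extract the statistics dictionary from a JSON response body."""
    return json.loads(body)["query"]["statistics"]


url = statistics_url("http://wiki.meego.com/api.php")
print(url)

# Made-up response body; the values are not real MeeGo wiki numbers.
sample_body = (
    '{"query": {"statistics": {"pages": 1200, "articles": 450, '
    '"edits": 9800, "users": 3100, "activeusers": 210}}}'
)
stats = parse_statistics(sample_body)
for key in ("pages", "articles", "edits", "users", "activeusers"):
    print(key, stats[key])
```

To fetch live data, the URL could be passed to `urllib.request.urlopen` and the decoded body handed to `parse_statistics`.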
20. Website: Google Analytics
[Graph annotations: Nokia Announcement; Roadmaps; Mobile World Congress]
Source: Google Analytics
(excludes wiki prior to Dec 21)
21. Automate
● My less than elegant method
● Giant bash script
● Uses wget, awk, mysql queries, etc.
● Dumps a bunch of csv files on my hard drive
● A better dashboard approach (WIP)
● Open source metrics dashboard
● Uses Pentaho for reporting, runs regularly and produces a
dashboard anyone can view at any time
● Will be finished in the next couple of months
● http://wiki.meego.com/Metrics/Dashboard
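The "dump csv files" step can be kept tidy by appending every run to one long-format file. This is a minimal sketch of that idea, not the bash script described above: the file name, column names, helper names, and report date are assumptions, and the metric values are the April figures from the summary slide.

```python
import csv
import os
import tempfile
from datetime import date


def append_metrics(csv_path, report_date, metrics):
    """Append one (date, metric, value) row per metric; write a header if the file is new."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["date", "metric", "value"])
        for name, value in sorted(metrics.items()):
            writer.writerow([report_date.isoformat(), name, value])


def read_metrics(csv_path):
    """Read the file back as a list of dictionaries (all values are strings)."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


# Hypothetical output location; a real script would use a fixed path.
csv_path = os.path.join(tempfile.mkdtemp(), "community_metrics.csv")

# Values from the April summary; the report date itself is assumed.
report_date = date(2011, 4, 30)
append_metrics(csv_path, report_date, {"ml_posts": 4891, "forum_posts": 862})
append_metrics(csv_path, report_date, {"new_bugs": 1757, "bugs_resolved": 2988})

rows = read_metrics(csv_path)
for row in rows:
    print(row["date"], row["metric"], row["value"])
```

A single long-format file is easy to load into a spreadsheet or reporting tool later, and each collector (mailing lists, forums, bugs) can append to it independently.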
22. Now What?
● Report Regularly
● Monthly – may be too often
● Quarterly? Yearly?
● Share
● Share the reports with the community
● http://wiki.meego.com/Metrics
● Recognize
● Recognize your top contributors
Photo: http://www.flickr.com/photos/play4smee/2439494411/
23. Dawn Foster
MeeGo Community Manager for Intel
@geekygirldawn
dawn.m.foster@intel.com
Photo: http://www.flickr.com/photos/tlk/5630885373/
24. Credits
Thank you to the many people who have contributed to
the metrics
● Dave Neary for many helpful tips & for providing a lot of
help with mailing list stats (mlstats)
● Reggie Suplido for automating forum stats:
http://forum.meego.com/stats/
● Carsten Munk for the IRC stats
● Stephen Gadsby for the bug jars:
http://www.octofish.net/meegobugjar/
● Mike Shaver for a variety of help
● Arjan Van De Ven for some Perl magic
● Adam Gretzinger for providing download data