Crowdsourcing Research Opportunities:
Lessons from Natural Language Processing
  Marta Sabou, Kalina Bontcheva, Arno Scharl
Crowdsourcing
Undefined and generally large group
Crowdsourcing in Science
Crowdsourcing for NLP
Challenges
Crowdsourcing in science is not new
• Sir Francis Galton, "Vox Populi"
• Citizen science, from the early 20th century; 60,000 – 80,000 yearly volunteers
Genre 1: Mechanised Labour
• Participants (workers) are paid a small amount of money to complete easy tasks (HIT = Human Intelligence Task)
Genre 2: Games with a purpose
• From 2008
• 240k players
Crowdsourcing via Facebook
Genre 3: Altruistic Crowdsourcing

• >250K players
• >670K players
Crowdsourcing in Science - Typical Use
[Diagram: Input → Process/Algorithm → Output → Evaluation]
• Harness human intuition to prune the solution space
• Form-based data collection
• Labeling, classification
• Surveys
Crowdsourcing in Science
Crowdsourcing for NLP
Challenges
Crowdsourcing in NLP
Papers relying on crowdsourcing in major NLP venues
Crowdsourcing Genres in NLP
Benefit 1: Affordable, Large-Scale Resources
• A variety of small to medium-sized resources can be obtained with as little as $100 using AMT
• Crowdsourcing is also cost-effective for large resources (Poesio, 2012)

                          $/label    1 M labels ($)
Traditional High Q.        1.00        1,000,000
Mechanical Turk            0.38          380,000 (<40%)
Game                       0.19          217,000 (20%)
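The percentages quoted next to the totals can be checked with a few lines of arithmetic. This is only a sketch: the three totals are copied from the slide, while the dictionary keys and variable names are illustrative assumptions.

```python
# Totals for 1 M labels, in dollars, as quoted on the slide (Poesio, 2012).
# The keys are illustrative labels, not names used by the authors.
cost_per_million_labels = {
    "traditional": 1_000_000,
    "mechanical_turk": 380_000,
    "game": 217_000,
}

baseline = cost_per_million_labels["traditional"]

# Cost of each approach as a percentage of the traditional approach.
share_of_baseline = {
    name: round(100 * cost / baseline, 1)
    for name, cost in cost_per_million_labels.items()
}

for name, share in share_of_baseline.items():
    print(f"{name}: {share}% of the traditional cost")
```

Computed from the quoted totals, Mechanical Turk comes out at 38% of the traditional cost and the game at roughly 22%, in line with the rounded figures on the slide.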
Benefit 2: Diversification of research
Challenge 1: Contributor Selection and Training
• From: prior to resource creation
• To: during resource creation
Challenge 2: Aggregation and Quality Control
• From: a few experts' annotations
• To: multiple, noisy annotations from non-experts
• Approach 1: Statistical techniques
    • Simplest (and most popular): majority voting
    • More complex: machine learning model trained on various features
• Approach 2: Crowdsourcing the QC process itself
    • HIT1 (Create): Translate the following sentence:
    • HIT2 (Verify): Which of these 5 sentences is the best translation?
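Majority voting, the simplest aggregation technique named on this slide, can be sketched as follows. The item identifiers, the labels, and the choice to return None on a tie are assumptions made for illustration; the slides do not prescribe a tie-breaking rule.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label, or None if the top spot is tied or empty."""
    if not labels:
        return None
    ranked = Counter(labels).most_common()
    # A tie for first place means the crowd did not reach a majority.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical noisy annotations from several non-expert workers per item.
annotations = {
    "sentence-1": ["positive", "positive", "negative"],
    "sentence-2": ["neutral", "negative", "negative", "negative"],
    "sentence-3": ["positive", "negative"],
}

aggregated = {item: majority_vote(votes) for item, votes in annotations.items()}
print(aggregated)
```

Under the same assumptions, the function could also tally the answers to a verify HIT, treating each worker's choice of best translation as one vote. Items that come back as None would need more annotations or an expert decision.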
Conclusions (What have we learned from NLP?)
• Crowdsourcing is revolutionising NLP research
    • Cheaper resource acquisition
    • Diversification of research agenda
• But requires more complex methodologies
    • For contributor management
    • For quality control and data aggregation
• Other findings: most popular
    • Genre: mechanised labour
    • Task: acquiring input data
    • Problem: solving subjective tasks
Crowdsourcing in Science
Crowdsourcing for NLP
Challenges
User Motivation
• Motivating users
    • Motivations for scientific projects might differ
    • Task granularity might impact motivation
• Promoting learning and science
    • Advertise STEM research to young people
    • Support learning and self-improvement through participation in crowdsourcing
Legal and Ethical Issues
• Acknowledging the crowd's contribution
    • S. Cooper, [other authors], and Foldit players: Predicting protein structures with a multiplayer online game. Nature, 466(7307):756-760, 2010.
• Ensuring privacy and wellbeing
    • Mechanised labour criticised for low wages (<$2/hour) and lack of worker rights
    • Prevent addiction, prolonged use & user exploitation
• Licensing and consent
    • Some clearly state the use of Creative Commons licenses
    • General failure to provide informed consent information
Technical Issues
• Scaling up to large resources
• Preventing bias
• Increasing repeatability
    • Through reuse of crowdsourcing elements (e.g., HIT templates)
• uComp - Embedded Human Computation for Knowledge Extraction and Evaluation
    • 3-year project, starting November 2012
    • Develops a scalable and generic HC framework for knowledge creation
    • Provides reusable HC elements
Thank you!


Editor's Notes

  1. How does crowdsourcing relate to Research 2.0? My talk will illustrate how certain web technologies can reduce the gap between scientists on the one hand and ordinary citizens on the other, thus enabling a certain form of Research 2.0. If Web 2.0 is often associated with "user generated content", Research 2.0, at least the kind enabled by crowdsourcing, is "user generated/supported science". Taking the field of NLP as an example, I will discuss how crowdsourcing is changing research practices and its effect on this scientific discipline. Research 2.0 deals with the involvement of the web in science. It spans from the utilization of Web 2.0 tools and technologies in research to a more open and sharing approach to science. Some definitions of Research 2.0 even include notions of a methodological change due to the abundance of data and the nature of the socio-technical systems on the web. The focus here is the change in scientific practices due to the involvement of Research 2.0 tools and technologies in the research process, and the effects this has on science itself.
  2. But not projects that: do not have the creation of scientific data as their main goal (e.g., Wikipedia); use crowds to support auxiliary scientific processes (e.g., Mendeley); recruit online but experiment in the lab; recruit processing power and NOT human effort (SETI@home); have scientific staff alone as contributors (e.g., collaboratories).
  3. In fact, already in 1907, Sir Francis Galton (Darwin's cousin and a brilliant Victorian scientist) published a Nature article entitled "Vox Populi" (the voice of the people, the voice of the crowd), in which he describes his experiment at a livestock fair: 787 persons were asked to estimate the weight of an ox and, while none came close to the real value, the mean of the guesses was almost spot-on. Meanwhile, other societies were using the crowd differently, namely to support them in gathering scientific data. From the early 20th century, the Audubon Society has been relying on volunteers to count species of local birds. Their campaigns continue to this date, and in 2012 volunteers submitted over 100,000 checklists, leading to observations of 623 species and over 17 million individual birds. These activities are often termed citizen science. This is not a novel phenomenon: citizen science projects have been around since the beginning of the last century (at least). There is a vast landscape and variety of citizen science projects where scientists call on the public for help - some examples, including from Lora's paper (her talk might have some mentions as well). IT enables virtual citizen science projects, and this upsurge is a direct consequence of new and improved ways to involve the public in scientific processes.
  4. Participants contribute while having fun. 13 Apr 2012 | 16:35 EDT | Posted by Rebecca Hersher: "Two years ago, FoldIt made headlines, lots of them, when players of the online protein-folding video game took three weeks to solve the three dimensional structure of a simian retroviral protein that is used in animal models of HIV, but whose structure had eluded biochemists for more than a decade." http://blogs.nature.com/spoonful/2012/04/foldit-games-next-play-crowdsourcing-better-drug-design.html Phylo is an experimental video game about multiple sequence alignment optimisation. "Since the launch in November 2010, we received more than 350,000 solutions submitted from more than 12,000 registered users. Our results show that solutions submitted contributed to improving the accuracy of up to 70% of the alignment blocks considered." It is about showing that humans can aid algorithms rather than comparing human and machine performance.
  5. In 2008, the group built a Facebook game that required players to rate the sentiment associated with a sentence on a 5-value scale, then used the ratings as a training corpus for the sentiment detection module. Over 800 players played the game. In 2009 the game was released in a slightly different form, with the aim of gathering sentiment lexicons, i.e., associations between words and their sentiment polarity (ratings from as many as 12 players were averaged to get the final value). The game ran in 7 different languages and attracted over 4000 players. Let this be an introductory example of a crowdsourcing project; however, crowdsourcing is not a new phenomenon.
  6. A volunteer contributes because they are interested in a domain or support a cause.
  7. More languages: e.g., Urdu, Arabic, Haitian Creole; Irvine and Klementiev create lexicons between English and 37 low-resource languages. Diverse types of text (besides newswire): emails, Twitter feeds, augmentative and alternative communication texts. Speech: transcription, accent rating, assessment of dialog systems. Subjective tasks: sentiment detection, translation, word sense disambiguation, anaphora resolution, question answering, textual entailment, text summarization, and others. Niche language phenomena. Lab experiments reproduced at a fraction of their cost: e.g., contextual predictability (Cloze task), corpus trends.
  8. Completely new with respect to traditional approaches. Uses "create-verify" workflows. A widespread technique for translation tasks, less so for labeling.
  9. STEM (Science, Technology, Engineering, Mathematics): harness increased visibility and ease of engagement in social networks to make STEM research more attractive and understandable => more young people to study STEM.