Eleanor Rusack (UNITAR) presenting GeoTag-X, a crowdsourcing application for humanitarian crises, at the Citizen Cyberlab Summit, 17-18 September 2015, University of Geneva (UNIGE).
2. WHAT IS UNOSAT?
Operational Satellite Applications Programme of the United
Nations Institute for Training and Research (UNITAR)
Entirely dedicated to satellite imagery analysis, geospatial
information technologies and capacity development
Provide satellite imagery-based products and services in
support of international humanitarian operations
Geneva (powered by CERN IT), N’Djamena, Nairobi,
Bangkok
[Chart: 50% / 50% split between the two core activities]
3. Limitations of satellite imagery
• Angle of view only from above
• Sometimes hard to get clear data
• Weather
Credits: Huda.Sy http://www.panoramio.com/user/7127417
Satellites can’t see what is happening on the ground
4. GeoTag-X?
• Harvest photos coming out of a
disaster
• Analyse photos to produce structured,
relevant data about what is
happening on the ground
• Share experts’ knowledge with the
crowd to perform the analysis
Source: National Geographic
5. Workflow and tools for analysis of media
Experts and
Project
Leaders
Field experience/
surveys
Projects/
Tutorials
GeoTag-X
Analysts
Photo collection
tools
Photos
Analysis
Structured/
relevant data
Disaster managers, NGOs,
UN etc.
Chrome extension
Flickr
Twitter
UN-Asign
Epicollect
geotagx.org
11. Participants: Experts and Project Leaders
• Experts (professionals, academics etc) in relevant topics
• What data can they extract from media?
• Develop structured analysis and tutorials
• Analysis translated into projects on GeoTag-X, put to crowd
Participants: GeoTag-X Analysts
• Anyone, with/without relevant skills
• GeoTag-X provides tutorials, hints, links to relevant info etc to
get them started.
• Self-learning
12. Engagement, promoting project
• Social Media: Twitter account + strong links with partner
accounts, e.g. UNOSAT, UNITAR, Citizen Cyberlab, Mozilla
• Blogs: http://geotagx.org/geotagx/blogs
• Featured on SciStarter; guest blogs (Mozilla Science, SciStarter,
Discover Magazine)
• Links with existing online volunteer groups (GISCorps,
Humanity Road, Mozilla Science, UNV)
• Events: presentations, thinkcamps/hackdays, data sprints
13. Engagement, promoting project
Mozilla Global Sprint:
Winter Shelter
- 36 new Analysts (doubled total)
- 420 tasks
Data sprint 1
- 84 new Analysts
- 7000+ tasks
- 3 projects completed
Data sprint 2: Yemen Stats
- 123 Analysts
- 3000+ tasks
- 529 new photos
14. What types of questions can be answered in GeoTag-X?
Do you see shelter in this photo?

          Yes   No   Don't know   Not clear
Image 1    25    2            2           1
Image 2    10   15            5           0
Image 3    20    2            3           0
Image 4     3    5            0          22
Image 5     7   18            4           1
If volunteers consistently agree in their
answers, the question is easy.
17. GeoTag-X: Questions
How to move questions up the curve?
• Better training?
• How to train large numbers of Analysts when project
leaders have limited time?
• Better photos?
Thailand Floods 2011. Source: UN-ASIGN
18. Palais des Nations
1211 Geneva 10
Switzerland
T +41 22 917 8400
F +41 22 917 8047
www.unitar.org
United Nations Institute for Training and Research
Institut des Nations Unies pour la Formation et la Recherche
Instituto de las Naciones Unidas para Formación Profesional e Investigaciones
Учебньıй и научно-исследовательский институт
Организации Объединенньıх Наций
معهد الأمم المتحدة للتدريب والبحث
联合国训练研究所
Thanks for your attention! Any questions?
Eleanor Cervigni
eleanor.rusack@unitar.org
geotagx.org
www.unitar.org/unosat
Editor's Notes
UNOSAT is the Operational Satellite Applications Programme of the United Nations Institute for Training and Research. Our two core activities are satellite imagery analysis and mapping for international humanitarian operations, and training and capacity development in GIS and satellite imagery for humanitarian operations. We have a team of around 25 with offices at CERN in Geneva, where we make use of powerful IT infrastructure, N’Djamena in Chad, Nairobi, and Bangkok.
So what are some of the limitations of satellite imagery in a humanitarian context? We can only get a bird's-eye view of the situation: if there is damage to the side of a building, we won't see it. The top photo shows building damage in Syria; because the damage is on the side of the building, unless the roof is also damaged, the building will appear undamaged in satellite imagery. Sometimes it is also hard to get a clear picture of what is happening on the ground, like during the Thailand floods, when the signature of the water was impossible to distinguish from that of the urban area. The area here under the flood is Bangkok, which was severely flooded during the event, but we could not pick that up in the imagery because it is a dense urban area; this is why the analysis stops at the edge of the city. And the weather: clouds block the view of certain types of satellites. Aside from these problems, we can't see with a satellite image what is really happening on the ground: who is there, what conditions the population is living in, and so on.
So we have begun working on a project that aims to extract relevant and useful information from the large amounts of media coming out of a disaster. The project was initially developed as a way of filling these gaps in satellite imagery data. However, the data produced in this way can also complement other existing data sources, like field survey assessments, and we have been specifically aiming to do this.
The premise behind our idea of photo analysis is that a knowledgeable expert can tell us a lot about what's happening in a photo. Take this one, for instance: experts with the relevant knowledge could tell us about the ethnicity of the people, their likely level of caloric intake, water quality and availability, environmental health, infrastructure, and so on. This can be captured as data points, and in GeoTag-X we have been attempting to transfer these skills to the crowd so that they can start parsing thousands of photos and converting them to data to help better understand what is happening in a disaster situation.
This is the workflow and set of tools we have set up to achieve this. We have built the GeoTag-X platform by adding customisations to PyBossa, along with a set of tools for collecting photos from a variety of sources, including Flickr and Twitter. For collecting photos in the field we have linked the platform with UN-ASIGN and Epicollect, and finally we developed an extension for Chrome that allows people to send in photos while they are browsing the internet. Over the last few years we have been working with different organisations and individuals to develop pilot projects on the platform, into which the photos are ingested and then presented to the GeoTag-X community for analysis. Analysis questions in the projects are closely linked to other means of collecting data in humanitarian crises, like field survey questionnaires, to make it easy for humanitarian organisations to integrate the data produced into their workflow.
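The ingestion step described above can be sketched as building a task for the PyBossa API. This is only an illustrative sketch: the field names inside `info` (`image_url`, `photo_source`) and the helper itself are assumptions based on PyBossa's generic task model, not GeoTag-X's actual schema.

```python
import json

def build_task_payload(project_id, photo_url, source, n_answers=30):
    """Build the JSON body for creating a task on a PyBossa server.
    The keys inside 'info' are illustrative assumptions; PyBossa itself
    only requires 'project_id' and treats 'info' as free-form data."""
    return json.dumps({
        "project_id": project_id,
        "n_answers": n_answers,  # each photo should be analysed ~30 times
        "info": {"image_url": photo_url, "photo_source": source},
    })

payload = build_task_payload(7, "http://example.org/photo.jpg", "UN-ASIGN")
```

The payload would then be POSTed to the platform's task endpoint; separating payload construction from the network call keeps the ingestion tools (Flickr, Twitter, UN-ASIGN, Epicollect, the Chrome extension) uniform.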
So I think the easiest way to see how it works is to have a go yourselves. So if you could all go to geotagx.org written up here.
Once the photos have been collected, they are pushed into the different categories on GeoTag-X, from which they are displayed in the different projects for analysis. Here we have the different categories (or events) that we are currently working on.
Each category will have a set of associated projects. Most have a geotagging application, along with a variety of topic specific applications. In this particular project we are looking at emergency shelter assessment in the middle east and we have the geotagging project along with a project looking at whether or not shelters shown in the photos are prepared for winter. All photos collected for the emergency shelter assessment category will be displayed in both of these projects.
If we click on the project "Are shelters prepared for winter?" we get taken to its page, from where we can read what it is trying to do (Info), download the photo list and the results (Tasks), try the tutorial, get statistics about the responses, and read the blog. Clicking on the big play button starts the analysis.
If you haven't already completed it, you will be asked to take the tutorial. Here we have chosen a photo and worked with the experts to define correct answers to each question. If you are not sure what you are looking for, you can click on the help to get a detailed explanation with examples.
Once the tutorial is finished you will be taken into the analysis. This project has only three questions: "Is the shelter raised off the ground?", "Does the shelter have a second cover to protect it from the rain?" and "Is there space to put a chimney safely inside the shelter?" If you are stuck at any time you can click on the help to get the explanation and examples. Each photo usually takes around 1-3 minutes to analyse, and that's it. In order to get a good dataset we need about 30 responses for each photo, so that is 30 pairs of eyes looking at each photo.
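As a concrete illustration of what "about 30 responses per photo" buys you, here is one plausible aggregation rule. It is an assumption for illustration, not GeoTag-X's actual consensus scheme: wait for a minimum number of responses, then accept the modal answer only if enough analysts chose it.

```python
from collections import Counter

def aggregate(answers, min_responses=30, consensus=0.7):
    """Collapse one photo's answers ('yes', 'no', "don't know", 'not clear')
    into a single label. The thresholds are illustrative assumptions."""
    if len(answers) < min_responses:
        return None  # not enough eyes on the photo yet; keep it in circulation
    label, count = Counter(answers).most_common(1)[0]
    if count / len(answers) >= consensus:
        return label
    return "no consensus"  # answers too scattered to trust a single label

print(aggregate(["yes"] * 25 + ["no"] * 5))   # → yes
print(aggregate(["yes"] * 15 + ["no"] * 15))  # → no consensus
```

The "no consensus" outcome is itself informative: as the kappa analysis later in the talk shows, photos where 30 analysts cannot agree usually signal a hard photo or a hard question.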
So who participates in GeoTag-X? We have three main groups of participants. The first two are what we call experts and project leaders. Project leaders are individuals who come to us with a project idea that they think could work on GeoTag-X. Experts are people with relevant expertise who provide input in the development of the projects. These are the people with the knowledge that we want to transfer to the crowd: we want to know what they can see in a photo that you or I might not notice, and turn that into structured analysis and tutorials on GeoTag-X following the needs of project leaders.
The third group are our GeoTag-X Analysts: these are the people who do the work. They can be anyone with or without relevant skills. GeoTag-X provides tutorials, hints and links to relevant information and useful tools with which they can learn to do the analyses in the different projects.
How do we connect with the different participants? We have a variety of channels, including a Twitter account on which we post regular updates, thank our volunteers, and launch new projects. We also link with the social media accounts of other, bigger organisations like UNOSAT, UNITAR, Citizen Cyberlab, and Mozilla. We don't have many followers, and these links help us reach a larger audience than otherwise possible.
We write regular blogs talking about projects, events we ran or participated in, and answering questions that we get from participants. We have written guest blogs for other sites, and have been a featured project on SciStarter.
Our most successful outreach, however, has been by linking in with existing online volunteering groups like GISCorps, who have more than 6000 volunteers with an interest or skills in GIS, Humanity Road, Mozilla Science, and the United Nations Volunteers online volunteering platform. These groups have really helped us build our community on GeoTag-X; I will show you some graphs in a second demonstrating this.
We also present and participate in different events. We find that events generally do not lead to much site activity; however, they are great for networking and connecting with people who have project ideas. The Emergency Shelter Assessment in the Middle East and the Yamuna monsoon flooding projects came out of the Port Hackathon and the Citizen Cyberscience Summit respectively.
Finally, our data sprints have proven invaluable in getting activity on the site that translates into data collection and useful feedback.
We have run three data sprints this year. The first was part of the Mozilla Global Sprint. In this case we were looking for a wider audience to test the platform and projects, provide feedback, and start collecting data in the Emergency Shelter Assessment in the Middle East. This was much more successful than we expected. Mozilla Science and the CMS experiment at CERN helped push the project via social media, and we ended up doubling the number of active Analysts on the site and getting through 420 tasks in the shelter projects. Most importantly, we got lots of feedback that helped us improve the platform and projects in time for the next data sprint.
The next two data sprints were run by us in collaboration with the different online volunteering organisations. You can see in these graphs the effects of the data sprints on site activity and on the number of new Analysts registering on the platform. Large peaks in users on the site can be seen starting just before the events and lasting for their duration. This activity clearly translated into new accounts on GeoTag-X, as you can see from the peaks in new accounts in the top graph, as well as a lot of data being collected in the projects.
So we've managed to get people on the site and start collecting data, but how good is that data? Were the Analysts able to answer the questions we asked of them? Were the tutorials and other hints, links etc. enough for them to learn how to recognise the relevant information in the photos?
We've done some exploratory statistics to start quantifying how difficult the different questions were for Analysts to answer. We calculated Fleiss' kappa for each question; this is a measure of how well different Analysts agreed in their classification of the different photos. So here we have the table of responses across different images for a particular question. Analysts could place each photo into one of four classifications: yes, no, don't know, and not clear. We calculate the kappa using this data, and it gives us an idea of how well the different Analysts agreed in their classifications for this particular question.
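The agreement calculation just described can be sketched in a few lines of Python. This is a plain implementation of Fleiss' kappa (generalised to allow rows with different rater counts, since not every photo receives exactly the same number of analyses), applied to the "Do you see shelter in this photo?" table from the slides; it is not the GeoTag-X production code.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a table whose rows are items (photos) and whose
    columns are categories (yes / no / don't know / not clear); each cell
    holds the number of Analysts who chose that category.

    Classic Fleiss' kappa assumes every item has the same number of raters;
    here each row uses its own rater count, a common generalisation."""
    total = sum(sum(row) for row in table)
    k = len(table[0])
    # Chance agreement: overall proportion of assignments per category.
    p = [sum(row[j] for row in table) / total for j in range(k)]
    p_e = sum(pj * pj for pj in p)
    # Observed agreement: mean pairwise agreement within each photo.
    per_item = []
    for row in table:
        n = sum(row)
        per_item.append((sum(c * c for c in row) - n) / (n * (n - 1)))
    p_bar = sum(per_item) / len(table)
    return (p_bar - p_e) / (1 - p_e)

# The "Do you see shelter in this photo?" table from the slides:
shelter = [
    [25, 2, 2, 1],   # Image 1: strong agreement on "yes"
    [10, 15, 5, 0],  # Image 2: split between "yes" and "no"
    [20, 2, 3, 0],   # Image 3
    [3, 5, 0, 22],   # Image 4: mostly "not clear"
    [7, 18, 4, 1],   # Image 5
]
print(round(fleiss_kappa(shelter), 3))  # → 0.317, only moderate agreement
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance, so a value around 0.3 already hints that this question sits in the middle of the difficulty curve discussed next.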
We ended up with an aggregated kappa for each question, and a kappa for each question in each image, because we needed to see not only how difficult the questions were, but how the difficulty of the photos impacted the ability of Analysts to answer a question.
From these calculations we ended up with this graph, which shows the mean kappa for each question (calculated from the kappas for each image) plotted against the standard deviation. There is a clear curve in the data: a low mean kappa is associated with a low standard deviation, a medium mean kappa with a high standard deviation, and a high mean kappa with a low standard deviation. From this graph we have defined three groups of questions. The first, on the bottom left, are questions for which the Analysts consistently showed little agreement in their answers. The second group are questions where for some photos Analysts showed high agreement in their answers, but for others very little. The final group are questions for which Analysts consistently agreed in their answers.
What we think is that questions in group one are consistently hard for Analysts to answer, either because the photo set itself is not clear, the Analysts don't have the skills to answer the question, or the question just can't be answered by looking at a photo.
Group two are questions which the Analysts have some skills to answer, but sometimes they come across a photo that is difficult or not clear, and this throws them and they don't know how to answer.
Group three are questions that Analysts have the skills to answer, generally regardless of the photo.
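The three-way grouping described above can be sketched as follows: from the per-photo kappa scores for one question, compute the mean and standard deviation and bin the question. The cut-off values here (0.3, 0.6, 0.2) are hypothetical; the real thresholds would be read off the curve in the graph.

```python
from statistics import mean, stdev

def question_group(per_photo_kappas, low=0.3, high=0.6, sd_cut=0.2):
    """Classify a question from its per-photo kappa scores.
    Thresholds are illustrative assumptions, not the values used in the talk."""
    m, sd = mean(per_photo_kappas), stdev(per_photo_kappas)
    if m < low:
        return 1  # consistently low agreement: hard or unanswerable from a photo
    if m >= high and sd < sd_cut:
        return 3  # consistently high agreement: Analysts have the skills
    return 2  # agreement depends on how difficult the individual photo is

print(question_group([0.05, 0.10, 0.12, 0.08]))  # → 1
print(question_group([0.90, 0.20, 0.80, 0.30]))  # → 2
print(question_group([0.75, 0.80, 0.70, 0.85]))  # → 3
```

Binning questions this way makes the follow-up actions concrete: group 1 questions need rewording or dropping, group 2 questions need better tutorials covering the awkward photos, and group 3 questions can be left as they are.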
We compared these results with what the volunteers were telling us in their feedback, and with what we could see in the data being collected. Here are a couple of examples: photo one would be considered a hard photo to analyse, photo two an easy one. "Do you see shelter?" fell into group two: sometimes Analysts agreed well, other times not so well. You can see that in the second photo they all agreed, but in the first photo they didn't know what to answer. Volunteers were telling us that there were photos like this with blown-over tents and they didn't know what to answer in these cases. We probably could have moved this question into group three if we had pre-empted these photos and explained to volunteers what they should answer.
Answers to the last question, "Is there space to put a chimney safely in the shelter?", never show agreement, and this question was right at the bottom left-hand corner of the graph. It is one that can't be answered by looking at a photo, because you can never see all sides of the tent and so can never know for sure whether there is a chimney, or space for a chimney, or not.
What we have noticed is that for some questions volunteers really lack the skills necessary to answer them successfully, which indicates that the tutorials and tools on GeoTag-X aren't always sufficient and we need some more in-depth training. So these are some of the questions I want to look at during the break-out session:
How do we provide this training to large groups of volunteers when our project leaders and experts have really limited time to do this?
Are there ways that we can improve the photos to improve the answers given by volunteers?