Universidad del Bío-Bío, Chile
Facultad de Ciencias Empresariales
Iván Cantador, ivan.cantador@uam.es
January 13, 2023
Data science in practice: Case studies in e-participation
About me
• Iván Cantador
• Associate Professor at the Computer Science and Engineering Department
of Universidad Autónoma de Madrid, Spain
http://www.eps.uam.es/~cantador
• Research interests
- Recommender systems
- Information retrieval
- Machine learning
- Natural language processing
- Semantic technologies
- E-government
Contents
1. E-participation
2. Decide Madrid
3. Data acquisition and processing
4. Data mining applications
5. Information retrieval applications
6. Recommendation applications
7. Conclusions
Contents
1. E-participation
• Open government
• Citizen participation
• Digital platforms for citizen participation
2. Decide Madrid
3. Data acquisition and processing
4. Data mining applications
5. Information retrieval applications
6. Recommendation applications
7. Conclusions
Open government
• Open Government (Oszlak, 2013) – A public management paradigm that arises in
a context characterized by:
• Citizen disaffection caused by numerous crises that have called into question
the capacity of public administrations to deal with them
• The ubiquitous use of technologies, which have transformed communications
and interactions between individuals, and have fostered open,
participatory and collaborative practices
• The opening of government, among other institutions, to citizens, aiming to end
the existing disaffection
Open government
• Goals of the open government model (Ramírez-Alujas, 2014):
• Increasing transparency (accountability) and access to
government information through Open Data
- These open data should give citizens access to information
and should promote innovation and economic development in the public sector
• Facilitating collaboration among different actors, particularly public
administrations, civil society and the private sector, in order to co-design and generate
public value
• Promoting citizen participation in the design and implementation of public policies,
i.e., in decision and policy making
Open government
• Background: Memorandum on Transparency and Open Government, USA,
Barack Obama's Administration, 2009
- Transparency: providing information about government activity, its performance, etc.
This encourages and promotes accountability and social control.
- Participation: promoting the right of citizens to actively participate in policy making.
- Collaboration: involving citizens and other actors in scenarios of cooperation and
coordinated work.
- Technology: using technology as an instrument to promote openness in government,
facing the challenges of the new millennium.
Open government
• The Open Government Partnership emerged in 2011 in order to promote open
government in different administrations
• It seeks to get governments to make concrete commitments on transparency and
citizen empowerment, to fight corruption, and to harness new technologies to
strengthen governance
- Founded by 8 countries: Brazil, Mexico, Indonesia, Philippines, Norway, USA, South Africa, UK
- Composed of 70 member states and numerous government organizations
• Principal commitments:
- Improvement of public services
- Increased public integrity
- Effective management of public resources
- Safer communities
- Increased corporate responsibility
https://www.opengovpartnership.org
Citizen participation
• Citizen participation is a process that provides private
individuals an opportunity to influence public decisions,
and has been a component of democratic decision making
• A community-based process in which citizens may organize themselves
and their goals, and may work together through non-governmental organizations
to influence public policies and plans
• Benefits
• Governance: reducing conflicts, strengthening democratic legitimacy, encouraging active
citizenship → government transparency and accountability, and trust between citizens and
political institutions
• Increasing the quality of public decisions and services
• Learning and training to build stronger societies
• Promoting social cohesion, mutual understanding and social justice
Citizen participation
• Ladder of citizen participation (Arnstein, 1969)
• 8 levels in 3 groups
- No participation
- Symbolic participation
- ‘Real’ participation
• Simplified by the OECD model into 3 levels
Citizen participation
• Barriers to citizen participation
• Incompatibilities
- Political, legal, cultural, socioeconomic, organizational
• Intrinsic problems
- Complex, expensive, underrepresentative, non-plural, poorly informed, conflict-prone,
non-deliberative, non-scalable, etc.
• Extrinsic problems
- Arbitrary and manipulable
- Inefficient and non-self-sustaining
- Irrelevant issues and lack of effect
- Citizen saturation
- Monopoly of participation, etc.
Citizen participation
• Tools for citizen participation
• Non-ICT-based
- Questionnaires and surveys
- Seminars, talks, and meetings
- Discussion and work groups
- Cultural, artistic and leisure events
• ICT-based
- E-mail, RSS, SMS, multimedia sharing
- Social media, web portals and e-platforms
- Mobile apps
- Open data, IoT (crowdsensing)
- Augmented/virtual reality
Citizen participation
• Participedia.net
• Anyone can join the Participedia community
and help crowdsource, catalogue, and
compare participatory political processes
around the world
• Cases (2259)
• Methods (360)
• Organizations (841)
• Teaching resources
Digital platforms for citizen participation
• With the advent of social media and mobile computing, nowadays there is a plethora
of digital citizen participation channels
• general-purpose online social networks
• ad hoc e-consultation, e-voting and e-participation platforms
• The huge, ever-increasing amount of citizen-generated content leads to an information
overload problem for both citizens and government stakeholders in decision and
policy making tasks
• Users may feel overwhelmed by the large amount of data, whose exploration and
understanding can be challenging and frustrating
• Citizens may feel thwarted if their proposals do not reach sufficient visibility and impact
Digital platforms for citizen participation
• E-participation refers to ICT-supported citizen
participation in governance processes
• administration
• service delivery
• decision making
• policy making
• It aims to upgrade the relations among stakeholders
in civil society (e.g., local government, citizens, firms),
putting citizens at the center of the processes
• It has given rise to novel consultation and deliberation
initiatives
Digital platforms for citizen participation
• E-participation tools by type of engagement and role of ICT/level of participation (figure)
Source: Aichholzer, G., & Allhutter, D. (2011). Online forms of political participation and their
impact on democracy. Institute of Technology Assessment (ITA).
Digital platforms for citizen participation
• Most current e-participation
platforms are based on web forums
• Citizens make proposals and provide
comments and opinions, forming
large conversation threads
Example of a web forum-based e-participation platform: a citizen proposal and its discussions (figure)
Digital platforms for citizen participation
• Conventional web forums promote social interaction
• Pros
- Easy and fast content generation (through free text posts)
- Smooth, large-scale interaction (via comment threads)
• Cons
- No or very limited functionalities for content organization,
filtering and analysis
- Dispersed and redundant content, since it is organized
chronologically
- Discussions are hard to process and analyze
Contents
1. E-participation
2. Decide Madrid
• Participatory budgeting
• E-participatory budgeting
• The ‘Decide Madrid’ platform
3. Data acquisition and processing
4. Data mining applications
5. Information retrieval applications
6. Recommendation applications
7. Conclusions
Participatory budgeting
• Participatory budgeting (PB) is a democratic
deliberation and decision-making process in
which citizens decide how to spend certain
municipal or public budgets
• informing about issues and problems on a wide range
of subject areas in a city, e.g., housing, public safety,
education, health, transportation and environment
• proposing, debating and supporting/voting for
spending ideas and projects aimed at addressing such
problems
Participatory budgeting
• Pros
• Increased government transparency and trust
• Citizens’ empowerment and change of democratic attitude
• Better allocation of resources (in general)
• Increased voter turnout
• Cons
• Lack of diverse representation
• Time-consuming
• Resource intensive
• Lack of interest or political will
Participatory budgeting
• Since its original invention in Porto Alegre,
Brazil, in 1988, PB has gained much
popularity
• As of 2022, PB had spread to over 4,500 cities
around the world (source: Participatory
Budgeting World Atlas, https://www.pbatlas.net)
• Tools of citizen participation
• Meetings
• Committees
• Consultations
• …
• Electronic participatory platforms
http://www.participatorybudgeting.org
Participatory budgeting
• PB in Europe (https://www.euractiv.com/section/participatory-democracy/infographic/participatory-budgeting-europes-bet-to-increase-trust-in-government)
• While residents’ demands in European cities are often similar, the percentage of budget can
vary widely from one place to another: Paris dedicates 25% of the investment budget to PB,
while smaller cities usually invest 2 to 5% of their resources.
Participatory budgeting
• PB in Chile (https://www.pbatlas.net/chile.html)
• 37 local government initiatives + 1 regional government initiative
• Although PB initiatives in the country were born in 2002 out of the political will of mayors at
the local level, since 2014 the Los Ríos region has run its own process, motivated by:
- the high value the region places on citizen participation
- the historical roots of the region's creation in 2007, preceded by a social
movement of more than 30 years demanding regional status
• Proposals are presented mainly through social leaders
- projects are selected in neighborhood or territorial assemblies,
mostly formed by representatives of social organizations and institutions
• For voting on and prioritizing proposals, the predominant model is the people's
direct and universal vote
E-participatory budgeting
• In addition to ad hoc PB digital applications
and platforms, there are several software
frameworks to build online PB platforms
• CONSUL, http://consulproject.org: tens of cities
in Spain, Italy, France and South America
• Stanford Participatory Budgeting,
http://pbstanford.org: major cities in the USA,
e.g., New York, Chicago, Seattle, Oakland and
Boston
• EU Open Budgets, http://openbudgets.eu/tools
Proposal (figure): title, location, category, author, description, supports, comments
E-participatory budgeting
• Motivations for data science applications
• Limitations of current ePB platforms of large
cities
- very limited search and filtering functionalities
- unable to facilitate the analysis of hundreds,
even thousands, of citizen proposals and
associated comments and discussions
• When creating a budgeting proposal, a citizen should
be aware of similar or related ideas or projects, so
that she can better define her proposal or find
opportunities to collaborate with others
The ‘Decide Madrid’ platform
• A web system designed to allow Madrid
residents to make, discuss and vote on
proposals for the city
• Used since September 2015
• With a budget of €100M in 2017
• Consisting of a 3-phase process
The ‘Decide Madrid’ platform
• ~6,000 citizen proposals per year
• Keyword-based search
• No use of (structured) metadata
• No data analysis
• No personalization
• No recommendation
The ‘Decide Madrid’ platform
• Available data for a proposal
• Title
• Author
• Date
• Summary
• Description
• Freely-chosen tags
• Number of user votes
• User comment threads
The ‘Decide Madrid’ platform
Why consider Decide Madrid a representative case study?
• Participatory budgeting is one of the most widely used citizen participation methods
worldwide:
• Represented in more than 400 cases from a total of 2,000 cases analyzed in Participedia
(https://participedia.net)
• Used in more than 3,000 cities and municipalities worldwide according to the Participatory
Budgeting Project (https://www.participatorybudgeting.org/white-paper)
• Decide Madrid is implemented upon CONSUL (https://consulproject.org), an open-source
framework to develop citizen participation platforms:
• Used by more than 135 institutions in 35 countries
• With a structure similar to other popular frameworks, such as Stanford Participatory
Budgeting (https://pbstanford.org) and EU Open Budgets (http://openbudgets.eu/tools)
Contents
1. E-participation
2. Decide Madrid
3. Data acquisition and processing
• The data mining pipeline
• Data crawling
• Data scraping
• Data processing
4. Data mining applications
5. Information retrieval applications
6. Recommendation applications
7. Conclusions
The data mining pipeline
• Data: pure and simple facts with no particular organization
• Information: processed, filtered, calculated, structured, categorized, contextualized data
• Knowledge: understanding, experience, insights, intuitions to use information
The data mining pipeline
• Unstructured data: no structure; non-restricted vocabulary, no predefined schema
E.g.: free text
• Semi-structured data: simple and flexible structure, no strict format; limited vocabulary, schema mixed with data values
E.g.: taxonomies (categories), folksonomies (tags)
• Structured data: rigid structure, strict format; well-defined vocabularies and representation
E.g.: databases, ontologies
The data mining pipeline
• Open Government Data (OGD) promote transparency, accountability and
public value creation
• By making datasets publicly available, institutions become more transparent and
accountable to citizens
• By facilitating the use, reuse and free distribution of datasets, governments foster business
creation and innovative, citizen-centered digital applications and services
• OGD portals enable the general public to access open data collections
• they allow searching for data files, but not for information within the files
The data mining pipeline
• Open data portals are web sites to access sets
of OGD collections
• Search engine
- Retrieving collections via keyword-based queries
• Collection metadata
- Title, description, date, size, etc.
• Data files
- Formats: CSV, XLS, XML, RDF, etc.
- To be downloaded and opened with specific
applications, e.g., Microsoft Excel
• Documentation
- Inner structure of the data files
Example: open data portal of Madrid City Council
The data mining pipeline
• Open data are commonly provided as tables:
• Rows = data records (instances, individuals)
• Columns = data attributes (features, fields)
Example: records of traffic accidents that occurred in Madrid in 2020
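Once such a file is downloaded, the rows/columns view maps directly onto a list of records keyed by column name. A minimal sketch, assuming a semicolon-separated text file whose first line holds the column names and whose values contain no quoted delimiters; the class name, and the column names used in the usage note, are hypothetical and not those of the actual Madrid dataset:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Loads a simple delimited table: first line = columns (attributes), other lines = rows (records). */
final class OpenDataTable {

    /**
     * Parses the lines of a delimited file into one map per record (column name -> value).
     * delimiterRegex is a regular expression, e.g. ";". Assumes no quoted fields.
     */
    static List<Map<String, String>> parse(List<String> lines, String delimiterRegex) {
        List<Map<String, String>> records = new ArrayList<>();
        if (lines.isEmpty()) {
            return records;
        }
        // Columns = data attributes, read from the header row (dropping a byte order mark if present)
        String[] columns = lines.get(0).replace("\uFEFF", "").split(delimiterRegex, -1);
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) {
                continue; // skip empty lines, e.g. a trailing newline
            }
            // Rows = data records; limit -1 keeps trailing empty fields
            String[] values = line.split(delimiterRegex, -1);
            Map<String, String> record = new LinkedHashMap<>();
            for (int c = 0; c < columns.length; c++) {
                record.put(columns[c].trim(), c < values.length ? values[c].trim() : "");
            }
            records.add(record);
        }
        return records;
    }
}
```

For instance, `OpenDataTable.parse(Files.readAllLines(Path.of("accidents.csv"), StandardCharsets.UTF_8), ";")` would return one map per accident (file name and charset are assumptions). Files with quoted fields need a proper CSV parser instead of `split`.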
The data mining pipeline
Methodology
• Text processing on titles, tags,
descriptions and comments of citizen
proposals
• Semantic annotation of proposals:
topics and districts
• Computing discussion and
controversy metrics on the
comments of each proposal
• Exploiting open data as statistical
indicators about districts: economic,
sociocultural, ideology, employment,
education, health, housing, etc.
The data mining pipeline
• 2 complex processes:
• crawling and scraping the ‘Decide Madrid’ web pages
• mapping tags to places and topics
• 22 districts & hundreds of places
• 30 topics
• urbanism, transport, environment,
health care, education, social rights,
culture, economy, employment,
politics, security, housing, family,
old age, religion, animals, etc.
Assumption: a comment = a (positive, unary) rating
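Under this assumption, the comments become a unary user-proposal rating matrix: a cell is set if the user commented on the proposal and is simply unknown otherwise (there are no negative ratings). A minimal sketch, assuming each comment is reduced to a hypothetical (user id, proposal id) pair; the class and record names are illustrative:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Sparse unary rating matrix built from comments: user -> proposals the user has commented on. */
final class UnaryRatings {

    /** A (user, proposal) pair taken from one comment; field names are illustrative. */
    record Comment(String userId, int proposalId) {}

    static Map<String, Set<Integer>> build(Iterable<Comment> comments) {
        Map<String, Set<Integer>> matrix = new TreeMap<>();
        for (Comment c : comments) {
            // Several comments by the same user on the same proposal still count as one positive rating
            matrix.computeIfAbsent(c.userId(), u -> new TreeSet<>()).add(c.proposalId());
        }
        return matrix;
    }

    /** True if the user "rated" (commented on) the proposal; false means unknown, not negative. */
    static boolean rated(Map<String, Set<Integer>> matrix, String userId, int proposalId) {
        return matrix.getOrDefault(userId, Set.of()).contains(proposalId);
    }
}
```

Storing only the positive cells keeps the matrix sparse, which suits tens of thousands of proposals where each user comments on very few.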
The data mining pipeline
Dataset
• Participatory budgeting of
4 editions: 2015-2018
• Around 29,000 proposals
• More than 86,000
comments
• 30 categories and 325
topics
• 21 districts + “city scope”
Data crawling
• A (web) crawler is a computer program that browses the Web in a methodical,
orderly and automated manner
• Applications
• Web search/indexing
• Vertical (specialized) search engines, e.g., news, shopping, recipes, reviews, papers
• Monitoring web sites and pages of interest
• Business intelligence: collecting information about company competitors and potential
collaborators
• Malicious applications: collecting personal information
Data crawling
• A crawler within a web search engine
Data crawling
• A crawler within a web application
Data crawling
• Generic web crawling process
• Seeds
- A list of starting URLs
• Visiting order
- Frontier = unvisited URLs
- Deciding which URLs should be discarded (or given lower priority)
so as not to fill up the frontier
• Stop criterion
- Empty frontier or maximum number
of pages crawled
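The generic process can be sketched as a loop over a FIFO frontier, which yields a breadth-first visiting order. To keep the sketch self-contained and offline, the Web is simulated by a hypothetical in-memory map from each URL to its outlinks, and the discard rule is assumed to be a simple cap (maxFrontier) on the frontier size; a real crawler would fetch and parse each page instead:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Generic crawling loop: seeds, a frontier of unvisited URLs, and a stop criterion. */
final class GenericCrawler {

    /**
     * Crawls a simulated Web (url -> outlinks) in breadth-first order.
     * Stops when the frontier is empty or maxPages pages have been crawled.
     */
    static List<String> crawl(Map<String, List<String>> web, List<String> seeds, int maxPages, int maxFrontier) {
        Deque<String> frontier = new ArrayDeque<>(seeds);   // unvisited URLs, FIFO order
        Set<String> seen = new HashSet<>(seeds);             // URLs already queued or visited
        List<String> crawled = new ArrayList<>();
        while (!frontier.isEmpty() && crawled.size() < maxPages) {
            String url = frontier.poll();
            crawled.add(url);                                 // simulated "fetch" of the page
            for (String outlink : web.getOrDefault(url, List.of())) {
                // Discard already-seen URLs, and new ones while the frontier is full
                if (frontier.size() < maxFrontier && seen.add(outlink)) {
                    frontier.add(outlink);
                }
            }
        }
        return crawled;
    }
}
```

Replacing the FIFO deque with a priority queue changes the visiting order without touching the rest of the loop, which is exactly the step a topical crawler such as Best First customizes.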
Data crawling
• Best First
• The simplest topical crawler
• The frontier is a priority queue based on text (or keyword) similarity between topic and
parent page
bestFirst(topic, seed_urls) {
    foreach (link in seed_urls) {
        enqueue(frontier, link, 1.0);             // seeds get the highest priority
    }
    visited := 0;
    while (frontier.size() > 0 and visited < MAX_PAGES) {
        link := dequeueMax(frontier);             // dequeue the URL with MAX similarity
        page := fetch(link);
        visited := visited + 1;
        score := sim(topic, page);
        foreach (outlink in extract_links(page)) {   // outlinks inherit the parent page's score
            enqueue(frontier, outlink, score);
        }
    }
}
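A runnable Java version of Best First, as a sketch under stated assumptions: pages are simulated by a hypothetical in-memory map (URL to page text and outlinks), and sim() is assumed to be the fraction of (lowercase) topic keywords that occur in the page text, since the slide does not fix the similarity function; cosine similarity over TF-IDF vectors is a common alternative. As in the pseudocode, each outlink inherits its parent's score; ties are broken in FIFO order so the run is deterministic:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

/** Best-First topical crawler over a simulated Web (url -> page). */
final class BestFirstCrawler {

    /** A simulated page: its text and its outlinks. */
    record Page(String text, List<String> outlinks) {}

    /** Frontier entry: URL, score inherited from its parent page, insertion order for ties. */
    private record Entry(String url, double score, long seq) {}

    /** Assumed sim(): fraction of topic keywords (lowercase) occurring as words in the page text. */
    static double sim(Set<String> topic, String text) {
        if (topic.isEmpty()) {
            return 0.0;
        }
        Set<String> words = new HashSet<>(Arrays.asList(text.toLowerCase().split("[^\\p{L}\\p{N}]+")));
        long hits = topic.stream().filter(words::contains).count();
        return (double) hits / topic.size();
    }

    static List<String> crawl(Map<String, Page> web, Set<String> topic, List<String> seeds, int maxPages) {
        // Max-priority queue: higher score first; equal scores in FIFO order
        PriorityQueue<Entry> frontier = new PriorityQueue<>((x, y) -> {
            int c = Double.compare(y.score(), x.score());
            return c != 0 ? c : Long.compare(x.seq(), y.seq());
        });
        Set<String> seen = new HashSet<>();    // URLs already enqueued, to avoid revisiting
        long seq = 0;
        for (String seed : seeds) {
            if (seen.add(seed)) {
                frontier.add(new Entry(seed, 1.0, seq++));   // seeds get the top score
            }
        }
        List<String> visited = new ArrayList<>();
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll().url();               // dequeueMax
            Page page = web.get(url);                         // simulated fetch
            if (page == null) {
                continue;                                     // broken link: nothing to fetch
            }
            visited.add(url);
            double score = sim(topic, page.text());
            for (String outlink : page.outlinks()) {
                if (seen.add(outlink)) {
                    frontier.add(new Entry(outlink, score, seq++));   // outlinks inherit the parent's score
                }
            }
        }
        return visited;
    }
}
```

With topic {"bici", "movilidad"}, a seed linking to a page about "carril bici" and to an unrelated page, the crawler visits the children of the bike-lane page before those of the unrelated one, which is the intended topical bias.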
Data crawling
<div class="proposal-content">
<h3><a href="/proposals/34239-luces-led-barrio-concepcion-y-san-pascual">
Luces LED Barrio Concepción y San Pascual </a></h3>
<p class="proposal-info">
<span class="icon-comments"></span>&nbsp;
<a href="/proposals/34239-luces-led-barrio-concepcion-y-san-pascual#comments">
Sin comentarios</a>
<span class="bullet">&nbsp;•&nbsp;</span>01/12/2022
Data crawling
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// RAND (a java.util.Random) and USER_AGENTS (a String[] of browser user agents) are static fields of the enclosing class
public static void downloadProposalsURLs(String url, String file, int firstPage, int lastPage, boolean append) throws Exception {
    // try-with-resources closes the writer even if a page fails to download
    try (BufferedWriter writer = new BufferedWriter(
            new OutputStreamWriter(new FileOutputStream(file, append), StandardCharsets.UTF_8))) {
        for (int p = firstPage; p <= lastPage; p++) {
            // Pick a random user agent
            String userAgent = USER_AGENTS[RAND.nextInt(USER_AGENTS.length)];
            // Open the connection and read the web document (listing page number p)
            URI uri = new URI(url + p);
            Document doc = Jsoup.connect(uri.toASCIIString()).userAgent(userAgent).get();
            // Read the proposal URLs: the first <a> link within each <div class="proposal-content"> element
            for (Element proposal : doc.getElementsByClass("proposal-content")) {
                Element link = proposal.getElementsByTag("a").first();
                if (link != null) {
                    writer.write(link.attr("href"));
                    writer.newLine();   // one URL per line
                }
            }
        }
    }
}
Data scraping
<img alt="Armando Cuesta" class="initialjs-avatar author-photo"
data-char-count="1" data-font-size="19" data-height="32"
data-name="Armando Cuesta"
data-radius="4" data-seed="460897"
data-text-color="#ffffff" data-width="32" src="data:image/…">
Data scraping
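import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Proposal is a plain data class (with setters) defined elsewhere
public static Proposal getProposal(String proposalFile, boolean isClosed) throws Exception {
    Proposal proposal = new Proposal();
    Document doc = Jsoup.parse(new File(proposalFile), "UTF-8");
    // URL, from the Open Graph <meta property="og:url"> tag
    Elements elems = doc.select("meta[property=og:url]");
    String url = elems.attr("content").trim();
    proposal.setUrl(url);
    // Id: numeric prefix of the last path segment, e.g. 34239 in ".../proposals/34239-luces-led-..."
    String id = url.substring(url.lastIndexOf("/") + 1);
    int dash = id.indexOf("-");
    if (dash >= 0) {
        id = id.substring(0, dash);
    }
    proposal.setId(Integer.valueOf(id));
    // Title, from <meta property="og:title">
    elems = doc.select("meta[property=og:title]");
    String title = elems.attr("content").trim();
    proposal.setTitle(title);
    // Summary: text of the <blockquote> in the first <div class="proposal-show">;
    // the placeholder "Resumen de la propuesta" ("Proposal summary") is treated as no summary
    String summary = doc.select("div.proposal-show").get(0).getElementsByTag("blockquote").text().trim();
    if (summary.equals("Resumen de la propuesta")) {
        summary = "";
    }
    proposal.setSummary(summary);
    ...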
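The id-extraction step can be isolated into a small helper that is easy to test and tolerates URLs with a fragment such as "#comments" (as in the listing HTML) or without a numeric prefix. A sketch, assuming proposal URLs end in a path segment of the form id-slug; the class and method names are hypothetical:

```java
import java.util.OptionalInt;

/** Extracts the numeric proposal id from a Decide Madrid-style proposal URL. */
final class ProposalIds {

    /** Returns the leading digits of the last path segment, e.g. 34239 for ".../proposals/34239-luces-led-...". */
    static OptionalInt fromUrl(String url) {
        String path = url;
        int cut = indexOfAny(path, '?', '#');   // drop query string and fragment
        if (cut >= 0) {
            path = path.substring(0, cut);
        }
        String segment = path.substring(path.lastIndexOf('/') + 1);
        int end = 0;
        while (end < segment.length() && Character.isDigit(segment.charAt(end))) {
            end++;
        }
        if (end == 0) {
            return OptionalInt.empty();          // no numeric prefix: not a proposal page
        }
        return OptionalInt.of(Integer.parseInt(segment.substring(0, end)));
    }

    /** Position of the first occurrence of a or b in s, or -1 if neither occurs. */
    private static int indexOfAny(String s, char a, char b) {
        int i = s.indexOf(a);
        int j = s.indexOf(b);
        if (i < 0) {
            return j;
        }
        if (j < 0) {
            return i;
        }
        return Math.min(i, j);
    }
}
```

Returning an OptionalInt instead of throwing lets a crawler skip non-proposal links (e.g. "/proposals/new") without aborting the whole run.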
Data processing
• Database tables
• Created by the crawler and scraper
- proposals (code, title, author, date, summary, description, supports,…)
- users
- proposal_tags
- proposal_comments (id, author, text, parent_comment, pos_votes, neg_votes, …)
• Created from proposal_tags
- proposal_categories: text processing + clustering
- proposal_topics: text processing + clustering
- proposal_districts: text processing
- proposal_locations: text processing + mapping to a street directory + geolocation
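The parent_comment column turns the comments of each proposal into a forest of reply threads. The discussion metrics used in this work are not detailed here, so purely as an illustration, a sketch computing three simple ones: number of comments, number of threads (top-level comments) and maximum reply depth. The record mirrors only two proposal_comments columns, and treating a null parent as a top-level comment is an assumption:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simple discussion metrics over the comments of one proposal (a forest of reply threads). */
final class DiscussionMetrics {

    /** Two columns of proposal_comments; parentComment is null for a top-level comment. */
    record Comment(int id, Integer parentComment) {}

    /** Number of comments, number of threads and maximum reply depth. */
    record Metrics(int comments, int threads, int maxDepth) {}

    static Metrics compute(List<Comment> comments) {
        Map<Integer, Integer> parentOf = new HashMap<>();
        for (Comment c : comments) {
            parentOf.put(c.id(), c.parentComment());
        }
        int threads = 0;
        int maxDepth = 0;
        for (Comment c : comments) {
            if (c.parentComment() == null) {
                threads++;                        // each top-level comment starts a thread
            }
            // Depth = 1 + number of ancestors, following parent links up to the root
            // (the size bound guards against malformed, cyclic data)
            int depth = 1;
            Integer p = c.parentComment();
            while (p != null && parentOf.containsKey(p) && depth <= comments.size()) {
                depth++;
                p = parentOf.get(p);
            }
            maxDepth = Math.max(maxDepth, depth);
        }
        return new Metrics(comments.size(), threads, maxDepth);
    }
}
```

Deep threads with few top-level comments suggest back-and-forth debate, whereas many shallow threads suggest independent reactions; which signal to use is a modelling choice.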
Data processing
• Graph building
• Its nodes are the whole set of proposal tags
• Each of its (weighted) edges links a pair of
“related” tags, according to:
- Syntactic similarity
- Semantic similarity
- Co-occurrences within proposals
• Graph clustering method proposed by
Newman and Girvan (2004)
• It uses a criterion (modularity) to automatically
set the optimal number of clusters
• Each cluster represents a topic, which is
composed of a set of tags
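Two ingredients of this step can be sketched compactly: building the tag graph from co-occurrences (only one of the three edge criteria above; the syntactic and semantic similarities are omitted) and the modularity Q that Newman and Girvan use to choose among candidate partitions, Q = Σ_c [ e_c/m - (d_c/2m)² ], where m is the total edge weight, e_c the weight of edges inside cluster c and d_c the total degree of its nodes. The divisive part of their method (repeatedly removing the edge with the highest betweenness) is not shown, and the class name is hypothetical:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Tag co-occurrence graph and the modularity of a tag clustering. */
final class TagGraph {

    /** Undirected weighted graph stored symmetrically as tag -> (neighbour tag -> weight). */
    final Map<String, Map<String, Double>> adj = new TreeMap<>();

    /** Adds 1 to the edge weight of every pair of distinct tags that co-occur in the same proposal. */
    static TagGraph fromProposals(List<Set<String>> proposalTags) {
        TagGraph g = new TagGraph();
        for (Set<String> tags : proposalTags) {
            for (String a : tags) {
                for (String b : tags) {
                    if (a.compareTo(b) < 0) {   // each unordered pair once
                        g.addWeight(a, b, 1.0);
                        g.addWeight(b, a, 1.0);
                    }
                }
            }
        }
        return g;
    }

    private void addWeight(String from, String to, double w) {
        adj.computeIfAbsent(from, k -> new TreeMap<>()).merge(to, w, Double::sum);
    }

    double weight(String a, String b) {
        return adj.getOrDefault(a, Map.of()).getOrDefault(b, 0.0);
    }

    /** Modularity Q = sum over clusters c of [ e_c/m - (d_c/2m)^2 ]; every tag must have a cluster id. */
    double modularity(Map<String, Integer> clusterOf) {
        double twoM = 0.0;                               // 2m = sum of all weighted degrees
        Map<Integer, Double> insideTwice = new HashMap<>();  // 2 * e_c (each edge seen from both ends)
        Map<Integer, Double> degree = new HashMap<>();       // d_c
        for (var node : adj.entrySet()) {
            int c = clusterOf.get(node.getKey());
            for (var edge : node.getValue().entrySet()) {
                double w = edge.getValue();
                twoM += w;
                degree.merge(c, w, Double::sum);
                if (clusterOf.get(edge.getKey()) == c) {
                    insideTwice.merge(c, w, Double::sum);
                }
            }
        }
        if (twoM == 0.0) {
            return 0.0;
        }
        double q = 0.0;
        for (var entry : degree.entrySet()) {
            double d = entry.getValue();
            q += insideTwice.getOrDefault(entry.getKey(), 0.0) / twoM - (d / twoM) * (d / twoM);
        }
        return q;
    }
}
```

For two triangles of tags joined by a single co-occurrence edge, splitting them into two clusters gives Q = 5/14 ≈ 0.357, while putting all tags in one cluster gives Q = 0; the Newman-Girvan procedure keeps the division with the highest Q, which is how the number of clusters is set automatically.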
Data processing
• 2-level taxonomy: 30 categories + 325 topics
Accesibilidad (Accessibility): accesibilidad, accesibilidad metro, aparcamiento para discapacitados, ...
Animales (Animals): adiestramiento canino, águilas, animales, animales de compañía, antitaurino, ...
Asociaciones (Associations): asociaciones, asociaciones de vecinos, asociaciones juveniles, ...
Ayuntamiento y administración pública (City Council and public administration): administracion, alcaldesa, atencion al ciudadano, ...
Civismo (Civic behavior): acoger, bioetica, bullying, cinismo, civico, civismo, colaboracion social, ...
Cultura (Culture): arqueologia, arte, arte callejero, arte urbano, artesania, artistas, ...
Delincuencia (Crime): anti corrupcion, atraco, carteristas, corrupcion, delincuencia, delitos, ...
Deportes (Sports): actividad fisica, anillo ciclista, area de deportes, atletas, atleti, atletismo, ...
Derechos sociales (Social rights): abuso, acoso, albergue, altermundialismo, apoyo emocional, apoyo social, ...
Economía (Economy): actividad económica, ahorro, bancos, bbva, comerciantes, comercio, ...
Educación (Education): acoso escolar, alumnos, bachillerato, bibiotecas, brecha cultural, ...
Empleo (Employment): autoempleo, autónomos, comerciales, conciliacion laboral, contratacion municipal, ...
Equidad e integración (Equity and integration): chabolas, cie, derechos lgtbi, inmigración, desigualdad de genero, ...
Familia e infancia (Family and childhood): actividades infantiles, ayuda embarazo, bebes, carricoche, ...
Jóvenes (Youth): acoso escolar, adolescencia, adolescentes, asociaciones juveniles, ...
Justicia (Justice): constitucion, cumplimiento de las leyes, dictadura, fiscal, franquismo, ...
Medio ambiente (Environment): acusticas, agroecologia, agua, aire, aire acondicionado, ajardinamiento, ...
Movilidad (Mobility): abono transportes, adif, agentes de movilidad, aparamiento regulado, ...
Ocio y entretenimiento (Leisure and entertainment): baile, bares, celebraciones, centro comercial, cines, conciertos, ...
Participación ciudadana (Citizen participation): accion social, avisos madrid, decide madrid, decidemadrid, ...
Política (Politics): 15m, ahora madrid, ayuntamiento, ayuntamiento de madrid, democracia, ...
Religión (Religion): españa laica, estado aconfesional, iglesia, islam, laicismo, religion, ...
Salud y sanidad (Health and health care): acoholismo, acustica, acusticas, aire libre, aire puro, alcohol, ...
Seguridad y emergencias (Safety and emergencies): accidentes, app emergencias, aviso, avisos madrid, bomberos, ...
Sostenibilidad (Sustainability): agroecologia, ahorro de energia, autogestion, ciudad amable, ...
Tercera edad abuelos, ancianos, centros de dia, desempleo mayores, jubilacion....
Transparencia anti corrupcion, datos abiertos, derecho a la informacion....
Turismo oferta turistica, puntos de informacion turistica, puntos de interes...
Urbanismo aceras, adoquinado, ajardinamiento, alumbrado, apariencia edificios....
Vivienda alquileres, alquiler vacacional, alquiler vivienda, derecho a un vivienda....
Contents
1. E-participation
2. Decide Madrid
3. Data acquisition and processing
4. Data mining applications
• Discussion and controversy analysis
• Clustering and visualization
• Intent-based classification
5. Information retrieval applications
6. Recommendation applications
7. Conclusions
Discussion and controversy analysis
• In the literature, there is a predominance of online tools implemented ad hoc to
facilitate citizen participation at scale and to reduce costs
• To analyze in depth how participation takes place in such tools, we conducted a study of a particular tool
• The chosen tool is Decide Madrid (https://decide.madrid.es), the participatory budgeting e-platform of Madrid City Council, in operation since 2015
• The study makes use of diverse data:
• Topics, districts and support levels of citizen proposals
• Controversy level of the comment threads generated around the proposals
• Indicators about economic, sociocultural and ideological aspects of the districts
Discussion and controversy analysis
Motivation
• Government institutions’ limited understanding of the content generated by citizens in electronic tools
• Possibility that institutions fail to meet citizens’ demands
- Certain relevant demands may go unmet, not because they are unfeasible, but because of their controversial nature
• Decreased quality of decision making
• Loss of citizens’ trust
Discussion and controversy analysis
Decide Madrid
• Operational since 2015
• More than 6,000 citizen proposals a year
• More than 400,000 registered users in 2019
• A discussion thread (comments) for each citizen proposal
Figure: Example of a citizen proposal in Decide Madrid, showing its title, author and date, description, tags, votes and comments.
Discussion and controversy analysis
Controversy metrics
• To measure the controversy of a citizen proposal, we aggregate 3 metrics applied to its discussion thread (comments):
- Controversy based on the content (length) of discussions
- Controversy based on opinion polarization (of votes)
- Controversy based on the structure of the conversations
Discussion and controversy analysis
Controversy metrics
• Discussion content-based controversy
• The length of the proposal’s discussion, measured as the sum of the lengths of its comments
• Opinion polarization-based controversy
• A weighted ratio measuring the difference between the positive and negative votes on the proposal’s comments
• Conversation structure-based controversy
• An adaptation of the h-index for measuring discussion diversification
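The slide names the three metrics but not their exact formulas. The Java sketch below implements one plausible reading of each, under explicit assumptions: length is counted in characters; polarization per comment is 1 - |pos - neg| / (pos + neg), averaged with weights equal to each comment's number of votes; and the h-index adaptation is taken as the largest depth h with at least h comments at that depth. `ControversyMetrics`, its `Comment` model and all three formulas are hypothetical, and the aggregation into a single score is not shown.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of three comment-thread controversy metrics (formulas are assumptions). */
final class ControversyMetrics {

    /** Minimal comment model; parentId == -1 marks a top-level comment. */
    static final class Comment {
        final int id;
        final int parentId;
        final String text;
        final int posVotes;
        final int negVotes;

        Comment(int id, int parentId, String text, int posVotes, int negVotes) {
            this.id = id;
            this.parentId = parentId;
            this.text = text;
            this.posVotes = posVotes;
            this.negVotes = negVotes;
        }
    }

    /** Content-based: sum of the lengths (in characters) of all comments. */
    static int discussionLength(List<Comment> thread) {
        int total = 0;
        for (Comment c : thread) total += c.text.length();
        return total;
    }

    /**
     * Polarization-based (assumed form): per comment, 1 - |pos - neg| / (pos + neg), which is 1
     * for an even split and 0 for unanimous votes, averaged with weights = votes per comment.
     */
    static double votePolarization(List<Comment> thread) {
        double weighted = 0;
        double totalVotes = 0;
        for (Comment c : thread) {
            int n = c.posVotes + c.negVotes;
            if (n == 0) continue; // comments without votes carry no polarization signal
            weighted += n * (1.0 - Math.abs(c.posVotes - c.negVotes) / (double) n);
            totalVotes += n;
        }
        return totalVotes == 0 ? 0.0 : weighted / totalVotes;
    }

    /** Structure-based (assumed h-index adaptation): largest h with at least h comments at depth h. */
    static int threadHIndex(List<Comment> thread) {
        Map<Integer, Integer> parentOf = new HashMap<>();
        for (Comment c : thread) parentOf.put(c.id, c.parentId);
        Map<Integer, Integer> commentsAtDepth = new HashMap<>();
        for (Comment c : thread) {
            int depth = 1; // top-level comments have depth 1
            for (int p = c.parentId; p != -1; p = parentOf.getOrDefault(p, -1)) depth++;
            commentsAtDepth.merge(depth, 1, Integer::sum);
        }
        int h = 0;
        for (Map.Entry<Integer, Integer> e : commentsAtDepth.entrySet()) {
            if (e.getValue() >= e.getKey()) h = Math.max(h, e.getKey());
        }
        return h;
    }
}
```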
Discussion and controversy analysis
Some results of the study (I)
• The controversy values follow a heavy-tailed distribution, in which most proposals have low controversy
• The most supported proposals are not necessarily the most controversial
“In Decide Madrid, proposals with a low level of support are currently discarded and archived, regardless of the level of
discussion and controversy they generate. However, from a decision-making perspective, it would be interesting to delve deeper
into the controversial proposals and understand the problems of the city and of the citizens affected by them”.
Discussion and controversy analysis
Some results of the study (II)
• Most controversial and supported topics
• Religion: inclusion of LGTBI+ groups in the Cabalgata de Reyes (Three Kings Parade), public funding and tax benefits for Catholic institutions
• Housing: creation of social housing, annual property taxes
• Culture: prohibition of bullfighting
• Topics with a low-to-moderate number of proposals, a low level of support and high controversy
• Governance: transparency, citizen participation, public administration, laws and legislation
• Rights and social movements: social rights, civility, equity, migration, integration, crime, NIMBY
“In Decide Madrid, citizens’ ideological differences play
an important role in the group of controversial categories”.
“In Decide Madrid, political and social issues receive low-to-moderate relevance (final attention)”.
Discussion and controversy analysis
Some results of the study (II)
• Topics with a large number of proposals and high levels of support and controversy
• Domestic animals, mainly dogs (e.g., cleaning of and fines for dog excrement on public roads, creation of “pipicans” (dog relief areas), compulsory leashes, etc.)
• Topics with a low-to-moderate number of proposals and low-to-moderate levels of support and controversy
• Education, health, family, childhood, old age, employment, accessibility, youth
“In Decide Madrid, proposals aimed at some vulnerable groups
(for example, people with disabilities, the elderly, the unemployed)
tend to generate less citizen participation”.
Discussion and controversy analysis
Some results of the study (III)
• Study of factors external to participation: correlation between the levels of support/controversy
and district “statistical indicators” published as open data
• The districts generating the most proposals are those with:
• A high number of groups, neighborhood associations and consumer organizations
• A more progressive political orientation, i.e., where the majority voted for PSOE and Unidas Podemos
• A greater environmental commitment, i.e., with more environmental associations
• The districts generating the most controversial proposals are those with:
• A higher percentage of young people
• A greater number of citizens belonging to vulnerable groups, such as the elderly, young people and people
with some type of disability
• A higher birth rate and more associations related to childhood
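The slides do not say which correlation coefficient was used. Assuming Pearson's r, the computation for one pair of per-district series (e.g., number of proposals and number of neighborhood associations per district, both hypothetical inputs) can be sketched in Java as follows; `DistrictCorrelation` is an illustrative name.

```java
/** Hypothetical sketch: Pearson correlation between two per-district series. */
final class DistrictCorrelation {

    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length series with at least 2 values");
        }
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double cov = 0;
        double varX = 0;
        double varY = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0 || varY == 0) return Double.NaN; // undefined for a constant series
        return cov / Math.sqrt(varX * varY);
    }
}
```

Given the heavy-tailed distribution of controversy values, a rank-based coefficient such as Spearman's (Pearson's r computed on ranks) may be a more robust alternative.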
Discussion and controversy analysis
Limitations of the study
• Only the votes (for or against) given to the comments have been considered
• The “polarity” (positive or negative) of the comments themselves should also be analyzed, which would require
natural language processing techniques
• Only Decide Madrid, a tool restricted to and tailored for a specific participation procedure, has been analyzed
• More open tools, such as online social networks (e.g., Twitter), should be considered
• Proposals and discussions motivated by political and ideological cleavages that have traditionally divided
Spanish society have been observed (ideological positioning on the left-right scale, religious versus secular
values, traditional versus progressive, etc.)
• Tools from other countries should be analyzed to obtain more generalizable conclusions
• Possible biases (e.g., digital divide, political program) among the users of Decide Madrid and similar tools
have not been considered
Contents
1. E-participation
2. Decide Madrid
3. Data acquisition and processing
4. Data mining applications
• Discussion and controversy analysis
• Clustering and visualization
• Intent-based classification
5. Information retrieval applications
6. Recommendation applications
7. Conclusions
Clustering and visualization
• Citizen collaboration through current digital participation platforms can entail the
generation of large amounts of complex content, which may hide relevant citizens’
concerns, requests and initiatives, diluted in isolated individual proposals
• We present an interactive data mining tool for citizen participation data
visualization and analysis
• Applying natural language processing, text similarity, and graph clustering techniques
• Grouping proposals with common objectives
• Identifying trends and recurrent topics of interest
• Filtering and presenting information according to several criteria
• The tool is flexible, able to process different sources of data, and lightweight as it
uses simple data structures and dynamic HTML-based visualization and interaction
Clustering and visualization
• The tool is built upon the Tableau data visualization software
https://www.tableau.com/resource/data-visualization
• Lightweight
• Easy to configure
• Several visualization functionalities
- Bar charts
- Heat maps
- Time series graphs
Clustering and visualization
• Distribution of proposals, categories and topics, according to:
• Time (year, month) and location (district)
• Support, discussion and controversy levels
• Diverse temporal and geographical analyses
• Better and easier extraction of patterns and insights when analyzing the published citizen-generated content
Clustering and visualization
• Text processing
• Spelling mistake correction
- Dictionary
- Levenshtein distance
• Special character removal
• Stopword removal
• Word lemmatization
• Document similarity
• Word Mover’s Distance (WMD) similarity, which treats text documents as weighted point clouds of word embeddings
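The cleaning and spelling-correction steps above can be sketched in Java as follows, assuming a plain word list as the dictionary and a maximum edit distance of 2 (both hypothetical choices). Accents are stripped to match the unaccented style of the platform's tags. Lemmatization and the WMD similarity (which needs word embeddings and an optimal transport solver) are not shown, and `TextPreprocessor` is an illustrative name.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: normalization, stopword removal and dictionary-based spelling correction. */
final class TextPreprocessor {

    /** Classic dynamic-programming Levenshtein (edit) distance, using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Replaces a word by its closest dictionary entry if it is at most maxDistance edits away. */
    static String correct(String word, Collection<String> dictionary, int maxDistance) {
        if (dictionary.contains(word)) return word;
        String best = word;
        int bestDistance = maxDistance + 1;
        for (String entry : dictionary) {
            int d = levenshtein(word, entry);
            if (d < bestDistance) {
                best = entry;
                bestDistance = d;
            }
        }
        return best;
    }

    /** Lowercases, strips accents and special characters, drops stopwords, corrects spelling. */
    static List<String> preprocess(String text, Set<String> stopwords, Collection<String> dictionary) {
        String plain = Normalizer.normalize(text.toLowerCase(Locale.ROOT), Normalizer.Form.NFD)
                .replaceAll("\\p{M}", ""); // remove combining accent marks
        List<String> tokens = new ArrayList<>();
        for (String token : plain.split("[^a-z0-9]+")) {
            if (token.isEmpty() || stopwords.contains(token)) continue;
            tokens.add(correct(token, dictionary, 2));
        }
        return tokens;
    }
}
```

For instance, the raw Spanish tag “bibiotecas” is one edit away from “bibliotecas” (libraries) and gets corrected. Correcting every token this way is aggressive for short words, so a production pipeline would likely restrict it.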
Clustering and visualization
• Document clustering
• Weighted graph
- Nodes: citizen proposal documents
- Edges: document similarity values
- Removal of edges with “low” weights
• Louvain clustering method
- Optimizes the modularity of the graph, moving nodes between clusters (and then aggregating clusters) until convergence
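A simplified Java sketch of the clustering step, assuming a dense matrix of pairwise document similarities (e.g., WMD-based) is already available. Edges below a threshold are dropped, and then only the first Louvain phase is run: each node moves to the neighboring cluster that most increases modularity, until no move helps. The second phase (collapsing clusters into super-nodes and repeating) is omitted, and the threshold value and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of Louvain's first phase (local moving) on a pruned similarity graph. */
final class LouvainSketch {

    /** Keeps only edges whose similarity reaches the threshold; drops self-loops. */
    static double[][] prune(double[][] similarity, double threshold) {
        int n = similarity.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                w[i][j] = (i != j && similarity[i][j] >= threshold) ? similarity[i][j] : 0.0;
            }
        }
        return w;
    }

    /** Repeatedly moves single nodes to the neighboring cluster with the largest modularity gain. */
    static int[] localMoving(double[][] w) {
        int n = w.length;
        double[] k = new double[n]; // weighted degree of each node
        double twoM = 0;            // sum of all degrees (2m)
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) k[i] += w[i][j];
            twoM += k[i];
        }
        int[] cluster = new int[n];
        double[] tot = new double[n]; // total degree of each cluster
        for (int i = 0; i < n; i++) {
            cluster[i] = i;
            tot[i] = k[i];
        }
        boolean moved = twoM > 0;
        while (moved) {
            moved = false;
            for (int i = 0; i < n; i++) {
                int current = cluster[i];
                tot[current] -= k[i]; // temporarily take node i out of its cluster
                Map<Integer, Double> kIn = new LinkedHashMap<>(); // edge weight from i into each cluster
                kIn.put(current, 0.0);
                for (int j = 0; j < n; j++) {
                    if (j != i && w[i][j] > 0) kIn.merge(cluster[j], w[i][j], Double::sum);
                }
                // Gain of inserting i into cluster c, up to a positive factor: kIn(c) - tot(c) * k_i / 2m
                int best = current;
                double bestGain = kIn.get(current) - tot[current] * k[i] / twoM;
                for (Map.Entry<Integer, Double> e : kIn.entrySet()) {
                    double gain = e.getValue() - tot[e.getKey()] * k[i] / twoM;
                    if (gain > bestGain + 1e-12) {
                        best = e.getKey();
                        bestGain = gain;
                    }
                }
                cluster[i] = best;
                tot[best] += k[i];
                if (best != current) moved = true;
            }
        }
        // Relabel clusters as 0, 1, 2, ... in order of first appearance
        Map<Integer, Integer> label = new HashMap<>();
        int[] result = new int[n];
        for (int i = 0; i < n; i++) result[i] = label.computeIfAbsent(cluster[i], c -> label.size());
        return result;
    }
}
```

On two triangles of documents joined by one weak edge, pruning at 0.5 removes the bridge and `localMoving` returns the labels [0, 0, 0, 1, 1, 1].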
Clustering and visualization
• A coproduction functionality based on the retrieval of existing similar proposals
• A citizen who is interested in submitting a new proposal can first enter it into the tool and check whether related proposals already exist
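The slides rely on WMD similarity for this retrieval. As a simpler, clearly different stand-in, the Java sketch below ranks existing proposals by cosine similarity between bag-of-words term counts, which shows the retrieve-and-rank flow without needing word embeddings. `SimilarProposals` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: rank existing proposals by bag-of-words cosine similarity to a draft. */
final class SimilarProposals {

    /** Term counts after lowercasing and splitting on non-alphanumeric characters. */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** Cosine similarity between two term-count vectors (0 if either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int va = e.getValue();
            normA += (double) va * va;
            dot += (double) va * b.getOrDefault(e.getKey(), 0);
        }
        for (int vb : b.values()) normB += (double) vb * vb;
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Indices of the k existing proposals most similar to the draft, most similar first. */
    static List<Integer> topK(String draft, List<String> proposals, int k) {
        Map<String, Integer> query = termCounts(draft);
        List<Double> scores = new ArrayList<>();
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < proposals.size(); i++) {
            scores.add(cosine(query, termCounts(proposals.get(i))));
            order.add(i);
        }
        // List.sort is stable, so ties keep their original order
        order.sort(Comparator.comparingDouble((Integer i) -> scores.get(i)).reversed());
        return new ArrayList<>(order.subList(0, Math.min(k, order.size())));
    }
}
```

A draft such as “New bike lanes” would then surface existing bike-lane proposals first, so the citizen can review them before submitting.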
Contents
1. E-participation
2. Decide Madrid
3. Data acquisition and processing
4. Data mining applications
• Discussion and controversy analysis
• Clustering and visualization
• Intent-based classification
5. Information retrieval applications
6. Recommendation applications
7. Conclusions
Intent-based classification
• Social networks represent a prominent bidirectional communication channel between citizens and government
• Citizens are…
- content consumers who receive government announcements, to which they react and respond freely according to
their personal ideology, interests and needs, and
- content providers who generate a wide range of messages targeted at government and political stakeholders
• The amount of social media content generated daily by citizens is huge and diverse, and processing it manually
can be too costly and overwhelming
Intent-based classification
• There is increasing interest in, and need for, computer-assisted solutions capable of automatically gathering,
processing and analyzing the underlying information in citizens’ messages (a.k.a. posts) on social networks
• The research literature reports extensive work on:
• analyzing social phenomena produced through online network structures (e.g., information spreading, fake news,
and opinion polarity), mainly triggered by particular events (e.g., natural disasters, elections, and trending news)
• extracting the most popular topics addressed by citizens’ posts in social networks, as well as the general
dynamics (i.e., temporal evolution) of, and opinions on, such topics
Intent-based classification
• Unlike previous work, we go beyond topic extraction by attempting to automatically classify citizens’ posts
(tweets) according to their intents or purposes
1. Complaint: stating something that is unsatisfactory or unacceptable
- “@MADRID after 1 week of calling, the city is yet not clean, and the rats are taking over!!
http://t.co/IiIDuaPFG9”
2. Announcement: making a public statement about a fact, occurrence or event
- “The date, place and schedule of the Festival activities in La Latina have already been
confirmed http://t.co/U0tRwKAC @madrid @madridiario”
3. News item: objectively informing about current events
- “#oladecalor #aemet @Madrid has suffered its warmest night within the latest 100 years
http://t.co/ZSjeqK6m”
4. Personal fact: publicizing self issues and experiences
- “I also support the candidature from @Madrid2020ES @MADRID #aporella”
5. Opinion: expressing subjective opinions about the city, its events, activities, etc.
- “The activity of #emprendeenmadrid is amazing. Congratulations @MADRID and greetings
from an entrepreneur”
6. Request: explicitly asking for something specific
- “Very nice but impossible to ride a bike at normal speed #MadridRio. Please @MADRID
create a bike lane with cyclist priority”
7. Notification: reporting or giving notice of urban, citizenship- or government-related
issues, so that government can quickly act on them and help other citizens
- “@MADRID can you fix this gap in San Bernardino street 8-10 before someone gets hurt?
http://lockerz.com/s/117566458”
8. Question: explicitly asking for information
- “@MADRID could you please give me the telephone number of the press office of the
Madrid city hall”
9. Proposal: suggesting an initiative or project
- “There is a collection of used oil in the center of Alicante. It would be fantastic to have
something similar @MADRID”
Intent-based classification
• To automatically categorize a tweet into one of the previous intents (labels), the tweet is first transformed into a feature vector
• We consider 37 domain- and language-independent features to describe the content of a tweet
• Lexical features
- number of characters
- number of words
- number of exclamation marks
- number of question marks
- existence of a positive emoticon
- existence of a negative emoticon
- existence of a vowel (or “y”) consecutively repeated 3 or more times in a word
• Grammatical features
- number of nouns
- number of proper nouns
- number of adjectives
- number of verbs
- number of adverbs
- number of personal/possessive pronouns
- number of time references (entities)
- number of money-related references
• Social network-based features
- number of followers
- number of friends (a.k.a. followees)
- number of posts
- number of active days on Twitter
- number of hashtags (#)
- number of user mentions (@)
- number of hyperlinks
- number of multimedia items
- maximum hashtag length
- existence of an explicit retweet request (i.e., "RT" abbreviation)
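A Java sketch of how a tweet could be mapped to some of the lexical and social network-based features listed above, using regular expressions. Assumptions: the emoticon lists, the feature names and the choice to ignore links when looking for repeated vowels are hypothetical. Grammatical features (which need a part-of-speech tagger and an entity recognizer) and account-level features (followers, friends, posts, active days, which come from user metadata) are not computed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a subset of the tweet features. */
final class TweetFeatures {
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");
    private static final Pattern MENTION = Pattern.compile("@\\w+");
    private static final Pattern LINK = Pattern.compile("https?://\\S+");
    private static final Pattern REPEATED_VOWEL = Pattern.compile("([aeiouy])\\1{2,}");
    private static final Pattern RT_REQUEST = Pattern.compile("\\bRT\\b");
    // Illustrative emoticon lists (assumptions, not the study's lexicon)
    private static final List<String> POSITIVE = List.of(":)", ":-)", ":D", ";)");
    private static final List<String> NEGATIVE = List.of(":(", ":-(", ":'(");

    private static int count(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    private static double containsAny(String text, List<String> symbols) {
        for (String s : symbols) if (text.contains(s)) return 1.0;
        return 0.0;
    }

    static Map<String, Double> extract(String tweet) {
        Map<String, Double> f = new LinkedHashMap<>();
        String trimmed = tweet.trim();
        String noLinks = LINK.matcher(tweet).replaceAll(" "); // links can contain "iii", ":/" etc.
        f.put("num_chars", (double) tweet.length());
        f.put("num_words", trimmed.isEmpty() ? 0.0 : trimmed.split("\\s+").length);
        f.put("num_exclamations", (double) tweet.chars().filter(ch -> ch == '!').count());
        f.put("num_questions", (double) tweet.chars().filter(ch -> ch == '?').count());
        f.put("has_positive_emoticon", containsAny(noLinks, POSITIVE));
        f.put("has_negative_emoticon", containsAny(noLinks, NEGATIVE));
        f.put("has_repeated_vowel", REPEATED_VOWEL.matcher(noLinks.toLowerCase(Locale.ROOT)).find() ? 1.0 : 0.0);
        f.put("num_hashtags", (double) count(HASHTAG, tweet));
        f.put("num_mentions", (double) count(MENTION, tweet));
        f.put("num_links", (double) count(LINK, tweet));
        int maxHashtag = 0;
        Matcher m = HASHTAG.matcher(tweet);
        while (m.find()) maxHashtag = Math.max(maxHashtag, m.group(1).length()); // length without '#'
        f.put("max_hashtag_length", (double) maxHashtag);
        f.put("has_rt_request", RT_REQUEST.matcher(tweet).find() ? 1.0 : 0.0);
        return f;
    }
}
```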
Intent-based classification
• To validate the proposed approach, we evaluated several machine learning
algorithms on a labeled dataset:
• K-Nearest Neighbors (KNN)
• Logistic Regression (LR)
• Quadratic Discriminant Analysis (QDA)
• Decision Tree (DT)
- executed alone, and in combination with feature selection (RFECV DT) and tree pruning (AP DT) to avoid overfitting
• Gaussian Process (GP)
• Support Vector Machine (SVM)
• Bagging Ensemble (BE)
Intent-based classification
• Dataset: a random sample of 666 tweets mentioning the @Madrid account, each manually labeled by 3 researchers
(almost perfect agreement: Fleiss' kappa = 0.98)
• 9 binary classification problems: one-against-all (i.e., training a single classifier per label)
• Classification metrics
• acc (accuracy)
• acc+ (minority class accuracy)
• acc– (majority class accuracy)
• Observations on the results
- (Very) unbalanced classification problems
- (Misleadingly) high classification accuracies
- Reasonably good accuracy balance for the two labels
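A Java sketch of the evaluation setup, assuming each tweet carries exactly one intent label: `oneVsAll` builds the binary labels for a target intent, and `accuracies` returns acc, acc+ and acc–, taking the minority class to be whichever label has fewer true instances. `BinaryMetrics` and its methods are hypothetical names.

```java
/** Hypothetical sketch: one-vs-all labels and the acc, acc+ and acc- metrics. */
final class BinaryMetrics {

    /** One-vs-all: 1 if the tweet's intent is the target intent, 0 otherwise. */
    static int[] oneVsAll(String[] intents, String target) {
        int[] y = new int[intents.length];
        for (int i = 0; i < intents.length; i++) y[i] = intents[i].equals(target) ? 1 : 0;
        return y;
    }

    /** Returns {acc, acc+, acc-}: overall accuracy, accuracy on the minority and on the majority class. */
    static double[] accuracies(int[] yTrue, int[] yPred) {
        int positives = 0;
        for (int y : yTrue) positives += y;
        int minority = positives <= yTrue.length - positives ? 1 : 0;
        int correct = 0, minTotal = 0, minCorrect = 0, majTotal = 0, majCorrect = 0;
        for (int i = 0; i < yTrue.length; i++) {
            boolean hit = yTrue[i] == yPred[i];
            if (hit) correct++;
            if (yTrue[i] == minority) {
                minTotal++;
                if (hit) minCorrect++;
            } else {
                majTotal++;
                if (hit) majCorrect++;
            }
        }
        return new double[] {
            (double) correct / yTrue.length,
            minTotal == 0 ? Double.NaN : (double) minCorrect / minTotal,
            majTotal == 0 ? Double.NaN : (double) majCorrect / majTotal
        };
    }
}
```

For example, with 10 tweets of which only one is a request, a classifier that always predicts “not request” gets acc = 0.9 but acc+ = 0.0 (and acc– = 1.0), which illustrates why overall accuracy alone is misleading on these unbalanced problems.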
Intent-based classification
• Most discriminating words and features for each of the considered intents
COM = complaint
ANN = announcement
REQ = request
NEW = news item
FAC = personal fact
OPI = personal opinion
Intent-based classification
• The proposed intent-based classification is a step prior to the extraction of topics and opinions, and may help
filter and prioritize citizens’ messages and further automate processes for more efficient and effective
decision and policy making
• There is room for improvement:
• More sophisticated NLP techniques, such as language models and word embeddings, could be used to exploit the
semantics of words and word sequences
- e.g., “opinion is” and “really think that” could be identified as an informative bigram and trigram,
respectively, of the personal opinion intent
• Features from other sources of information could be used, such as the user who creates a post and the user(s)
mentioned in it
- e.g., by considering their types: citizens, neighborhood associations, organizations, or
political actors
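The bigram and trigram example above can be made concrete with a plain word n-gram extractor, sketched in Java below. This is a simple stand-in, not a language model or word-embedding approach; counting n-grams over the tweets labeled with a given intent is one hypothetical way to spot candidate informative sequences. `NGrams` and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: word n-gram extraction and counting. */
final class NGrams {

    /** Lowercased word n-grams, splitting on any non-alphanumeric character. */
    static List<String> wordNGrams(String text, int n) {
        List<String> tokens = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) tokens.add(w);
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return grams;
    }

    /** Frequency of each n-gram over a collection of texts (e.g., all tweets of one intent). */
    static Map<String, Integer> counts(List<String> texts, int n) {
        Map<String, Integer> c = new TreeMap<>();
        for (String t : texts) {
            for (String g : wordNGrams(t, n)) c.merge(g, 1, Integer::sum);
        }
        return c;
    }
}
```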
Contents
1. E-participation
2. Decide Madrid
3. Data acquisition and processing
4. Data mining applications
5. Information retrieval applications
• Argument mining in a nutshell
• Argument-based document search
• Argument-based conversational information access
• Neural network-based argument extraction
6. Recommendation applications
7. Conclusions
Argument mining in a nutshell
• Tasks
• Detection of argument text fragments
• Identification of argument components
• Extraction of argument relations
• Algorithmic foundations
• Natural Language Processing (NLP)
• Machine/deep learning
• Linguistic features
- Sentence-level (e.g., sentence length, argument linkers, etc.)
- Grammatical (e.g., number of nouns, adjectives, modal verbs, etc.)
- Syntactic (e.g., patterns, constituency tree depth, etc.)
- Semantic (e.g., named entities, word embeddings, etc.)
Source: ACL’16 tutorial “NLP Approaches to Computational Argumentation”
Argument mining in a nutshell
• Tasks
1. Detection of arguments
2. Identification of argument components and structures
3. Extraction of argument relations
Source: ACL’16 tutorial
“NLP Approaches to Computational Argumentation”
Argument mining in a nutshell
• Example: Categorization of argumentative components via machine learning
• Classes
- “Major claim”, “Claim”, “Premise”
• Features
- Lexical: lemmatized unigrams, including previous tokens
- Syntactic: number of nested phrases, depth of the syntactic tree, POS distribution, tense of the main verb, modal verbs
- Structural: first or last sentence of a paragraph, presence in the introduction or conclusion, relative position, number of tokens, etc.
- Indicators: connectors such as “because”, “however”, “as a result”, etc.
- Contextual: contextualized connectors, number of words shared by the introduction and conclusion
- Probabilistic: conditional probability P(category | previous tokens)
- Discourse: discourse relations based on the Penn Discourse Treebank
- Embeddings: 300-dimensional vectors trained on the Google News corpus
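A Java sketch of a small subset of the feature families above (structural features, indicator connectors, and a modal-verb count as a simple syntactic proxy) for one sentence of a paragraph. The connector and modal lists, the feature names and the method signature are hypothetical; lexical, probabilistic, discourse and embedding features are not shown, and the resulting vectors would still need to be fed to a classifier over the three classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of structural, indicator and modal-verb features for one sentence. */
final class ArgumentFeatures {
    // Illustrative lists (assumptions, not the lexicons used in the cited work)
    private static final List<String> INDICATORS = List.of("because", "however", "as a result", "therefore");
    private static final List<String> MODALS = List.of("can", "could", "may", "might", "must", "should", "would");

    /** Counts whole-word (or whole-phrase) occurrences of a phrase in a lowercased sentence. */
    private static int countPhrase(String sentence, String phrase) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(phrase) + "\\b").matcher(sentence);
        int c = 0;
        while (m.find()) c++;
        return c;
    }

    static Map<String, Double> extract(List<String> paragraph, int index, boolean inIntroOrConclusion) {
        String s = paragraph.get(index).toLowerCase(Locale.ROOT);
        int n = paragraph.size();
        Map<String, Double> f = new LinkedHashMap<>();
        // Structural features
        f.put("first_in_paragraph", index == 0 ? 1.0 : 0.0);
        f.put("last_in_paragraph", index == n - 1 ? 1.0 : 0.0);
        f.put("relative_position", n == 1 ? 0.0 : index / (double) (n - 1));
        f.put("num_tokens", (double) s.trim().split("\\s+").length);
        f.put("in_intro_or_conclusion", inIntroOrConclusion ? 1.0 : 0.0);
        // Indicator features: one binary feature per connector
        for (String ind : INDICATORS) {
            f.put("has_" + ind.replace(' ', '_'), countPhrase(s, ind) > 0 ? 1.0 : 0.0);
        }
        // Syntactic proxy: number of modal verb occurrences
        int modals = 0;
        for (String mv : MODALS) modals += countPhrase(s, mv);
        f.put("num_modal_verbs", (double) modals);
        return f;
    }
}
```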
Argument mining in a nutshell
• Example: Categorization of argumentative components via machine learning
• Using all features yields the best F1 values
• The classification of claims is the most difficult task
• The structural features are the most valuable
• The discourse features are informative for the identification of claims
• The word embeddings achieve results similar to lexical features
Source: ACL’16 tutorial
“NLP Approaches to Computational Argumentation”
Argument mining in a nutshell
• Corpora
• AIFdb: repository of databases following the Argument Interchange Format (AIF)
- AraucariaDB: news editorials, parliamentary records, court summaries and panel discussions
- MM2012: transcripts of BBC Radio 4
- …
• The Internet Argument Corpus, IAC: set of political
debates in internet forums
• The ECHR Corpus: collection of documents extracted
from legal texts of the European Court of Human Rights
• The Argument Annotated Essays Corpus, AAEC:
collection of persuasive essays
• …
Argument mining in a nutshell
• Tools
• Collaborative editors of argumentative graphs
- Agora, http://agora.gatech.edu
- Argunet, http://www.argunet.org
- DebateGraph, http://debategraph.org
- Rationale Online, https://www.rationaleonline.com
• Argumentative annotation platforms
- Araucaria, http://araucaria.arg.tech
- OVA, http://ova.arg-tech.org
Argument mining in a nutshell
• Events
• International Conference on Computational Models of Argument (COMMA),
https://comma2020.dmi.unipg.it
• Workshop on Argument Mining (ArgMining), https://2021.argmining.org
• Workshop on Computational Models of Natural Argument (CMNA),
http://cmna.csc.liv.ac.uk/CMNA20
• Summer School on Argumentation (SSA), https://ssa2020.dmi.unipg.it
• ACL’19 tutorial “Advances in Argument Mining”, http://arg.tech/~chris/acl2019tut/index.html
• ACL’16 tutorial “NLP Approaches to Computational Argumentation”, http://acl2016tutorial.arg.tech
• Online Seminars on Computational Models of Argument,
https://sites.google.com/view/argumentation-seminar
• Dagstuhl’16 seminar “Natural Language Argumentation: Mining, Processing, and Reasoning over
Textual Arguments”, https://www.dagstuhl.de/16161
• BiCi’14 seminar “Frontiers and Connections between Argumentation Theory and Natural Language Processing”, http://www-sop.inria.fr/members/Serena.Villata/BiCi2014/frontiersARG-NLP.html
Contents
1. E-participation
2. Decide Madrid
3. Data acquisition and processing
4. Data mining applications
5. Information retrieval applications
• Argument mining in a nutshell
• Argument-based document search
• Argument-based conversational information access
• Neural network-based argument extraction
6. Recommendation applications
7. Conclusions
Argument-based document search
• Proposed framework
Argument-based document search
• Argument model
• Premise → Claim → Major claim
• Types and subtypes of argument relations
• Cause: linking an argument that reflects the reason or condition for another argument
• Clarification: introducing a conclusion, exemplification, restatement or summary of an argument
• Consequence: evidencing an explanation, goal or result of a previous argument
• Contrast: attacking arguments, distinguishing between giving alternatives, doing comparisons,
making concessions, and providing oppositions
• Elaboration: introducing an argument that provides details about another one, entailing addition,
precision or similarity issues about the target argument
• Argument mining methods
• Syntactic pattern matching
• Feature-based machine learning classification
• Embedding-based deep neural network
Argument-based document search
• Heuristic algorithm
• For each sentence of an input text, look for certain syntactic patterns that introduce argumentative expressions
• 1,744 arguments extracted from 5,633 comments
• Contrast: 54.1%
• Consequence: 12.1%
• Cause: 3.6%
• Elaboration: 0.1%
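The heuristic above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linker lexicon is hypothetical and uses English markers (the Decide Madrid comments are in Spanish), and each match simply splits the sentence into a claim (text before the linker) and a premise (text after it).

```python
import re
from dataclasses import dataclass

# Hypothetical linker lexicon (English, for illustration only): discourse markers
# mapped to the relation types of the argument model.
LINKERS = {
    "cause": ["because", "since", "due to"],
    "consequence": ["therefore", "as a result", "so that"],
    "contrast": ["but", "however", "although"],
    "clarification": ["for example", "in other words", "in short"],
    "elaboration": ["moreover", "in addition", "furthermore"],
}

# One case-insensitive pattern per relation type, matching whole words or phrases.
PATTERNS = {
    relation: re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    for relation, words in LINKERS.items()
}


@dataclass
class Argument:
    claim: str
    linker: str
    premise: str
    relation: str


def extract_arguments(sentence: str) -> list[Argument]:
    """For each relation type, split the sentence at its first matching linker:
    the text before the linker is the claim, the text after it is the premise.
    Linkers at the very start of the sentence are skipped (no claim precedes them)."""
    arguments = []
    for relation, pattern in PATTERNS.items():
        match = pattern.search(sentence)
        if match is None or match.start() == 0:
            continue
        claim = sentence[: match.start()].strip(" ,;")
        premise = sentence[match.end():].strip(" ,;.")
        if claim and premise:
            arguments.append(Argument(claim, match.group(1), premise, relation))
    return arguments
```

For instance, "Pets should be allowed on the bus because many owners have no car." yields a single cause argument whose claim and premise are the two halves of the sentence. Real patterns would rely on syntactic analysis rather than plain string matching.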
Argument-based document search
• Argument linkers
Argument-based document search
• Information retrieval
• Text processing
• NLP for linguistic feature extraction
• Indexing based on keywords, topics, categories, entities and other metadata
• Search engine based on the vector space model
• Argument-based reranking according to controversy metrics
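A minimal sketch of the retrieval and reranking steps, assuming whitespace tokenization, a simple TF-IDF weighting, and a hypothetical per-document controversy score in [0, 1] mixed linearly with the cosine score. The mixing weight alpha is an assumption; the controversy metrics actually used are not detailed here.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def tfidf_vectors(docs: list[str]):
    """TF-IDF vectors (dicts term -> weight) for the documents, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(len(docs) / n) + 1.0 for term, n in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(a: dict, b: dict) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, docs: list[str], controversy: list[float],
           alpha: float = 0.7, k: int = 10) -> list[int]:
    """Retrieve the top-k documents by cosine similarity in the vector space model,
    then rerank them by alpha * similarity + (1 - alpha) * controversy."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    top_k = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)[:k]
    reranked = sorted(((alpha * s + (1 - alpha) * controversy[i], i) for s, i in top_k),
                      reverse=True)
    return [i for _, i in reranked]
```

Reranking only the top k keeps topical relevance as the first filter; lowering alpha lets controversy dominate the final order.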
Argument-based document search
• Outcomes – arguments
• JSON object created for an argument that evidences a contrast premise on a proposal in
favor of using Madrid public transport with pets
Argument-based document search
• Outcomes – documents, topics and arguments
Argument-based document search
• Dataset
• 80 proposals (covering 10 categories and having high controversy) and 5,633 comments
• Experiment setting
• 3 evaluators
• 3 queries
• Topical relevance – accuracy of an argument with respect to the major claim of the
discussion
• 14.6% of the arguments were labeled as very relevant
• 39.9% as relevant
• 36.9% as not relevant
• 8.6% as incorrect
• Rhetoric quality – effectiveness of an argument in persuading an audience
• 17.1% of the arguments were of high quality
• 40.6% of sufficient quality
• 42.3% of low quality
Argument-based document search
• We have presented a general and flexible argument-based search framework
• Preliminary implementation and evaluation on a dataset with citizen proposals and discussions generated in an
online participatory platform
• Its current implementation includes:
• Various argument extraction methods (heuristic pattern matching, feature-based machine learning, embedding-based deep learning)
• A document retrieval engine built upon vector space-based models
• A reranking strategy that exploits certain controversy metrics
• We envision several open research lines:
• Development of ad hoc argument-based document retrieval methods (so far, we have used a reranking technique)
• Consideration of alternative controversy notions
• Increment of the size and quality of the generated corpus
• Evaluation on other datasets and domains
• Measurement of additional argument quality metrics, e.g., based on diversity, fairness, persuasiveness, etc.
Contents
1. E-participation
2. Decide Madrid
3. Data acquisition and processing
4. Data mining applications
5. Information retrieval applications
• Argument mining in a nutshell
• Argument-based document search
• Argument-based conversational information access
• Neural network-based argument extraction
6. Recommendation applications
7. Conclusions
Argument-based conversational information access
• E-participation, understood as computer-assisted support for citizen
participation, has given rise to novel
consultation and deliberation processes
• Most current e-participation platforms are
based on web forums
• Citizens make proposals and provide comments
and opinions, forming
large conversation threads
• Recent attention has shifted to social media,
especially social networks
(e.g., Facebook and Twitter) and
instant messaging tools
(e.g., Telegram and WhatsApp)
Argument-based conversational information access
• Conventional web forums promote social interaction
• Pros
- Easy and fast content generation (through free text
posts)
- Smooth, large-scale interaction (via comment threads)
• Cons
- No or very limited functionalities for content
organization, filtering and analysis
- Dispersed and redundant content, since it is structured
by time
- Challenging processing of discussions
• Argument-driven tools promote the production and
reuse of collective knowledge
Argument-based conversational information access
• Our work on e-participation…
• addresses 2 promising research lines
- The exploitation of argument mining techniques to automatically
extract and present argumentative information from
citizen-generated content
- The use of conversational agents or chatbots as citizen-to-government
communication channels in instant messaging applications
• targets a final goal
- Helping to identify and understand city problems and
citizens’ concerns, and consequently to form well-founded opinions
for making better decisions in participatory processes
Argument-based conversational information access
• The ‘Decide Madrid’ e-participation platform
• A web system designed to allow Madrid residents to
make, debate and vote on proposals for the city
• Available data from a citizen proposal
• Title
• Author, date
• Summary, description
• Freely-chosen tags
• User comment threads
• Heterogeneous topics and discussions
• urbanism, transport, environment, health care,
education, social rights, culture, economy,
job, politics, security, housing, family, old age,
religion, animals, etc.
https://decide.madrid.es
Argument-based conversational information access
• Argument model
• Premise → Claim → Major claim
• Types and subtypes of argument relations
• Cause: linking an argument that reflects the reason or condition for another argument
• Clarification: introducing a conclusion, exemplification, restatement or summary of an argument
• Consequence: evidencing an explanation, goal or result of a previous argument
• Contrast: attacking arguments, distinguishing between giving alternatives,
doing comparisons, making concessions, and providing oppositions
• Elaboration: introducing an argument that provides details about another one,
entailing addition, precision or similarity issues about the target argument
Argument-based document search
• Example of an extracted argument tree
C = claim
L = linker
P = premise
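One way to hold such an argument tree in code and serialize it to JSON is sketched below. The class and field names (ArgumentTree, Claim, Premise, major_claim, linker, relation) are hypothetical and do not reproduce the JSON schema actually produced by the system; relation subtypes are omitted.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum


class RelationType(str, Enum):
    """Relation types of the argument model (subtypes omitted)."""
    CAUSE = "cause"
    CLARIFICATION = "clarification"
    CONSEQUENCE = "consequence"
    CONTRAST = "contrast"
    ELABORATION = "elaboration"


@dataclass
class Premise:           # P: supports or attacks a claim
    text: str
    linker: str          # L: the discourse marker connecting premise and claim
    relation: RelationType


@dataclass
class Claim:             # C: a claim, possibly backed by several premises
    text: str
    premises: list[Premise] = field(default_factory=list)


@dataclass
class ArgumentTree:
    major_claim: str     # e.g., the position defended by the proposal
    claims: list[Claim] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() turns nested dataclasses into dicts; str-based enum members
        # are written by json as their plain string values.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)
```

Subclassing str for the enum keeps the relation labels human-readable in the JSON output without a custom encoder.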
Argument-based conversational information access
• Through a natural language
conversation with the chatbot,
the user can:
1. explore citizen proposals and
comments, organized by
categories, topics and districts
2. access categorized citizen
arguments given
in the debates around a
proposal
3. provide feedback and
votes for proposals
Argument-based conversational information access
• The chatbot is built upon the Google Dialogflow framework, which connects external web services
with a variety of instant messaging and social networking services, e.g., Google Assistant,
Facebook Messenger, WhatsApp, Telegram and Skype
Argument-based conversational information access
• The chatbot handles several conversation intents, each of them with triggering sentence
patterns and associated functionalities
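The intent mechanism can be illustrated with a rule-based sketch. It does not use the Dialogflow API (Dialogflow matches intents with models trained on example phrases); the intent names, regular-expression triggers and responses below are hypothetical.

```python
import re
from typing import Callable

# Hypothetical intents: (name, trigger patterns, handler building the reply).
Intent = tuple[str, list[re.Pattern], Callable[[re.Match], str]]

INTENTS: list[Intent] = [
    ("browse_category",
     [re.compile(r"proposals (?:about|on) (?P<category>\w+)", re.IGNORECASE)],
     lambda m: f"Listing proposals in category '{m['category'].lower()}'"),
    ("show_arguments",
     [re.compile(r"arguments (?:for|of) proposal (?P<pid>\d+)", re.IGNORECASE)],
     lambda m: f"Showing arguments for proposal {m['pid']}"),
    ("vote",
     [re.compile(r"\bvote (?:for )?proposal (?P<pid>\d+)", re.IGNORECASE)],
     lambda m: f"Vote registered for proposal {m['pid']}"),
]


def handle(utterance: str) -> tuple[str, str]:
    """Return (intent name, reply) for the first intent whose pattern matches."""
    for name, patterns, action in INTENTS:
        for pattern in patterns:
            match = pattern.search(utterance)
            if match:
                return name, action(match)
    return "fallback", "Sorry, I did not understand that."
```

The fallback intent mirrors the usual practice of answering unmatched input with a default response instead of failing silently.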
Argument-based conversational information access
User study: empirical evaluation of the chatbot in terms of:
1. The feasibility of exploring e-participation content via a conversational interface
2. The potential benefits of argument-driven information in e-participation
• Uncontrolled, realistic scenario
• Without external supervision, participants freely tested the chatbot via Telegram during a period of one
week, using their own Telegram accounts and mobile devices
• 32 participants → 2 groups
• Control group: the chatbot’s argument-driven browsing functionalities were disabled
• Experimental group: the chatbot’s argument-driven browsing functionalities were enabled
Argument-based conversational information access
Study questionnaire
• 33 items
• 10 evaluation criteria
• Citizen participation
• Decision making
• Public values
Argument-based conversational information access
32 participants
• Gender: 22 male, 10 female
• Ages: 18-29 years old (12), 30-39 years old (9), 40-49 years old (5), 50-59 years old (4), more
than 59 years old (2)
• Education levels: secondary education (3), vocational education (1), Bachelor’s degree (20),
Master’s degree (6), Doctoral degree (2)
• Those with higher education had studied Sciences (3), Social Sciences (10),
Arts and Humanities (4), and Engineering (11) degrees
• Diverse levels of knowledge/expertise on chatbots: null knowledge and expertise (5),
null expertise (5), low expertise (20), medium expertise (2)
• Diverse levels of knowledge on citizen participation: null (7), low (16), medium (9)
Argument-based conversational information access
• Objective metrics
• Subjective questionnaires
Argument-based conversational information access
• More user activity
• No significant difference in the avg. number of sessions per user between groups
• Longer sessions in the experimental group
- Increase of 45.6% in the avg. session duration (from 16.0 to 23.3 minutes)
- Increase of 14.3% in the avg. number of actions per user (from 56.8 to 64.9)
• Higher user engagement and persuasiveness
• Increase of 23.5% in the avg. number of feedback actions per user (from 1.7 to 2.1)
• Meaningful exploration of arguments (avg. 7.4 actions per user)
• Better user opinions
• About the chatbot: highly efficient, quite effective, moderately easy to use
• About the argumentative information: higher perception of transparency and fairness
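The relative increases reported above follow from the group averages as 100 × (experimental - control) / control; a quick check using the values from the study:

```python
def relative_increase(before: float, after: float) -> float:
    """Percentage change from `before` (control group) to `after` (experimental group)."""
    return 100.0 * (after - before) / before


# Group averages reported above: (control, experimental)
metrics = {
    "session duration (min)": (16.0, 23.3),
    "actions per user": (56.8, 64.9),
    "feedback actions per user": (1.7, 2.1),
}

for name, (control, experimental) in metrics.items():
    print(f"{name}: {relative_increase(control, experimental):.1f}%")
# session duration (min): 45.6%
# actions per user: 14.3%
# feedback actions per user: 23.5%
```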
Argument-based conversational information access
• Participants’ suggestions
• A more “natural” conversation with the chatbot
• A more fluent transition between browsed proposals
• Facilities to read proposals with large descriptions
• Future research directions
• Personalized recommendation mechanisms to proactively present relevant content to the user, thus
mitigating the information overload problem
• Richer data structures, analysis and visualizations for facilitating decision making
• Functionalities oriented to citizen collaboration
• Integration of external data sources, such as open government data and news items
Contents
1. E-participation
2. Decide Madrid
3. Data acquisition and processing
4. Data mining applications
5. Information retrieval applications
• Argument mining in a nutshell
• Argument-based document search
• Argument-based conversational information access
• Neural network-based argument extraction
6. Recommendation applications
7. Conclusions
Neural network-based argument extraction
• Argument retrieval aims to automatically extract the structured argumentative
information contained in a text corpus
• It has been commonly modeled as a pipeline of three tasks, namely argument
segmentation, argument component classification, and argument relation recognition
• We investigate the application of transformer-based deep learning to jointly
address the above tasks as a single end-to-end sequence tagging problem
Neural network-based argument extraction
Deep neural network architecture
• 1st block: BETO Language model
• A BERT-based model trained on a Spanish corpus with Wikipedia articles, legal texts,
and TED Talks transcripts
- 12 encoders with a hidden layer size of 768 units, and 12 self-attention heads
• 2nd block: generic layers of feed-forward neural networks
• 3rd block: task-specific layers that address the following argument mining tasks
• Identification of argumentative units (BIO tagging task)
• Classification of argumentative components: premise, claim, major claim, empty
• Recognition of argumentative relations: 17 subtypes of the 2-level taxonomy
• Classification of argumentative relation intents: support, attack, empty
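The BIO encoding behind the identification task can be sketched as follows, assuming argumentative units are given as token spans (start, exclusive end, component label); the helper names are hypothetical. In this toy example, tokens outside any unit, such as the linker, are tagged O.

```python
def spans_to_bio(n_tokens: int, spans: list[tuple[int, int, str]]) -> list[str]:
    """Encode labelled token spans (start, end exclusive, label) as BIO tags."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: list[str]) -> list[tuple[int, int, str]]:
    """Decode BIO tags back into (start, end, label) spans.
    An I- tag that does not continue a span with the same label opens a new span."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # still inside the current span
        if start is not None:
            spans.append((start, i, label))
        start, label = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


tokens = "pets should ride the metro because owners lack cars".split()
tags = spans_to_bio(len(tokens), [(0, 5, "claim"), (6, 9, "premise")])
```

Casting unit identification as per-token tagging is what allows it to share one sequence-tagging model with the other tasks.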
Neural network-based argument extraction
• Input
• Annotated sentences from citizen comments
• Deep neural network configuration
Neural network-based argument extraction
• ARGAEL: ARGument Annotation and Evaluation tooL
• Simple annotation view: the user identifies argument components and relations (and their types)
Neural network-based argument extraction
• ARGAEL: ARGument Annotation and Evaluation tooL
• Assisted annotation view: the user has access to others’ argument annotations
Neural network-based argument extraction
• ARGAEL: ARGument Annotation and Evaluation tooL
• Evaluation view: the user evaluates others’ argument annotations
Neural network-based argument extraction
• ARGAEL: ARGument Annotation and Evaluation tooL
• Argument component (AC) annotations and evaluations
• Argument relation (AR) annotations and evaluations
Neural network-based argument extraction
• ARGAEL: ARGument Annotation and Evaluation tooL
• Some results of the argument annotation process on the Decide Madrid dataset
Neural network-based argument extraction
• Some preliminary results
• Argument identification
• Argument component classification
Neural network-based argument extraction
• Some preliminary results
• Relation type classification
• Relation intent classification
Contents
1. E-participation
2. Decide Madrid
3. Data acquisition and processing
4. Data mining applications
5. Information retrieval applications
6. Recommendation applications
• Recommender systems in a nutshell
• Personalized recommendations
• Context-aware recommendations
7. Conclusions
Disclaimer: some of the materials of this subsection have been created by
Prof. Pablo Castells for his information retrieval master course at EPS-UAM.
Recommender systems in a nutshell
6. Recommendation applications
Is it possible to help the user to find
information without asking for it?
How to customize the process?
Recommender systems in a nutshell
• Personalized recommendations
Recommender systems in a nutshell
• Many ways to make recommendations
• Spotify: https://www.music-tomorrow.com/blog/how-spotify-recommendation-system-works-a-complete-guide-2022
• Instagram: https://ai.facebook.com/blog/powered-by-ai-instagrams-explore-recommender-system
• Netflix: https://research.netflix.com/research-area/recommendations
https://scale.com/blog/Netflix-Recommendation-Personalization-TransformX-Scale-AI-Insights
• Google Play: https://deepmind.com/blog/article/Advanced-machine-learning-helps-Play-Store-users-discover-personalised-apps
Recommender systems in a nutshell
• It is estimated that recommendations account for…
• 20% of sales on Amazon
• 60% of streaming on YouTube
• 80% of streaming on Netflix
• ∼10% of electronic commerce
• Recommendation has a large market to tap into
• It seems possible to target beyond ∼10% of engagement
• Many companies aim to exploit such potential
Recommender systems in a nutshell
• Situations with option overload
• 1994 → 0.5 million different products on sale in the USA
• 2010 → 24 million products on Amazon alone
• Recommendation = Personalized IR without an explicit query
• First initiatives published in 1992 (Tapestry at Xerox PARC)
• Precedents: user models based on stereotypes (late 70s)
• Conferences: RecSys, SIGIR, ECIR, UMAP
• Confluence with other areas: Machine Learning (ICML, ECML, IJML, etc.), Data Mining
(KDD, etc.), Artificial Intelligence (IJCAI, AAAI), Human Computer Interaction (IUI)
Recommender systems in a nutshell
• Non-personalized recommendations
Recommender systems in a nutshell
• Contextualized recommendations
Recommender systems in a nutshell
• Utility of recommender systems
Jannach, D. and Adomavicius, G. 2016. Recommendations with a purpose. In Proceedings of the 10th
ACM Conference on Recommender Systems (RecSys ’16), pp. 7-10.
Recommender systems in a nutshell
• User preferences
(Figure: ratings, reviews, categorical, thumbs up / down)
Recommender systems in a nutshell
• Personalized recommendations: problem formulation
Recommender systems in a nutshell
• Problem formulation
• Input
- A set U of users
- A set I of items
- A sorted set R of values, e.g., R = { 1, 2, 3, 4, 5 }
- A function r : U × I → R
- Typically, r(u,i) is a “rating”, and represents user u’s assessment of item i on scale R
- This input can be seen as a matrix of ratings
- Most of its values (in general, 95% or more) are unknown
• Goal
- Predicting the values r(u,x) of items x for a user u who has not evaluated such items
- The predicted values r(u,x) are used to decide whether to recommend x to u
- In general, generating a sorted list of items that can be of interest to the user
- This goal is commonly referred to as generating the “top n” recommendations
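The formulation can be made concrete with a toy rating matrix stored sparsely, keeping only the known r(u,i) values. The predictor below is a simple item-mean baseline chosen purely for illustration; any recommendation strategy could replace it to produce the top-n list.

```python
from statistics import mean

# Toy data: known ratings r(u, i) on R = {1, ..., 5}; missing pairs are unknown.
ratings = {
    "ana": {"p1": 5, "p2": 3},
    "luis": {"p1": 4, "p3": 2, "p4": 5},
    "marta": {"p2": 1, "p4": 4},
}
items = {"p1", "p2", "p3", "p4", "p5"}


def sparsity(ratings: dict, items: set) -> float:
    """Fraction of unknown cells in the |U| x |I| rating matrix."""
    known = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1 - known / (len(ratings) * len(items))


def predict(item: str):
    """Item-mean baseline: average known rating of the item, or None if nobody rated it."""
    values = [r[item] for r in ratings.values() if item in r]
    return mean(values) if values else None


def top_n(user: str, n: int = 2) -> list[str]:
    """Rank the items the user has not rated by predicted value (ties broken by item id)."""
    scored = [(predict(i), i) for i in items - set(ratings[user])]
    scored = [(s, i) for s, i in scored if s is not None]
    return [i for s, i in sorted(scored, key=lambda x: (-x[0], x[1]))[:n]]
```

Storing only known ratings in nested dicts reflects the sparsity of real rating matrices, where most cells are empty.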
Recommender systems in a nutshell
• Problem formulation
• Implicit user feedback (preferences)
- No need for asking the user
- r : U × I → {0, 1}, binary, e.g., “u buys i”
- It can be treated as a particular case with R = {0, 1}
- r : U × I → R measuring how often user u accesses item i, e.g., listening to music
- Binarized to 1 if the frequency is > 0
- Applying a conversion function frequency → rating (e.g., percentiles)
- r : U × I → P(T) for users u annotating (tagging) items x, where T is a set of tags
- It can be treated as “1 tag = 1 vote”, but more elaborate and complex techniques can be
applied on graphs of tags, items, users…
- Timestamps
- Frequency data: r(u,i) is a set of timestamps
- Rating data: r(u,i) is a [rating, timestamp] pair
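The two conversions of frequency data mentioned above can be sketched as follows. The percentile mapping (per user, rating = ceil(scale × share of the user's counts that are ≤ the item's count)) is one possible choice, assumed here for illustration; it expects only items with a count > 0.

```python
import math
from bisect import bisect_right


def binarize(freq: dict) -> dict:
    """r(u, i) = 1 if user u accessed item i at least once, else 0."""
    return {u: {i: int(c > 0) for i, c in counts.items()} for u, counts in freq.items()}


def freq_to_rating(freq: dict, scale: int = 5) -> dict:
    """Per-user percentile mapping of access counts to ratings 1..scale:
    rating = ceil(scale * (share of the user's counts that are <= this count))."""
    ratings = {}
    for u, counts in freq.items():
        ordered = sorted(counts.values())
        n = len(ordered)
        ratings[u] = {i: math.ceil(scale * bisect_right(ordered, c) / n)
                      for i, c in counts.items()}
    return ratings
```

Computing percentiles per user normalizes away differences in overall activity: a heavy listener's favourite and a light listener's favourite both map to the top of the scale.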
Recommender systems in a nutshell
• Types of recommendation strategies
• Content-based filtering (CB)
- Item features are considered: words (text case), descriptors (metadata), etc.
- Items are compared with user information collected in a preference profile
- A user profile is long-term; it can be acquired through decision trees, neural networks, etc.
• Collaborative filtering (CF)
- Items are opaque
- The profiles of other users with similar traits (tastes, behavior patterns, demographic data,
etc.) are used to recommend items
• Hybrid filtering: combining different recommendation strategies
- Combining the output of CB and CF
- Inserting CB elements into CF or vice versa
- Unified models
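A weighted combination of CB and CF outputs, the first hybrid option above, can be sketched as follows. Min-max normalization and the weight w_cb are assumptions made for illustration; the score dictionaries stand for the outputs of any CB and CF recommenders.

```python
def minmax(scores: dict) -> dict:
    """Rescale a dict of item scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    return {i: (s - lo) / (hi - lo) if hi > lo else 0.0 for i, s in scores.items()}


def weighted_hybrid(cb_scores: dict, cf_scores: dict, w_cb: float = 0.5) -> list[str]:
    """Combine CB and CF outputs: normalize each to [0, 1] and take a weighted sum.
    Items missing from one source get 0 from that source. Ties are broken by item id."""
    cb, cf = minmax(cb_scores), minmax(cf_scores)
    combined = {i: w_cb * cb.get(i, 0.0) + (1 - w_cb) * cf.get(i, 0.0)
                for i in cb.keys() | cf.keys()}
    return sorted(combined, key=lambda i: (-combined[i], i))
```

Normalizing first matters because CB similarities and CF predictions usually live on different scales.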
Recommender systems in a nutshell
• Content-based filtering
Recommender systems in a nutshell
• Content-based filtering
• Recommendations for each user are computed without looking at other users
• A feature space for the items is needed → items are represented as vectors in such space
- “Data” that describe the items, structured or unstructured, e.g., item metadata (author,
place, language, categories, tags), words in the text associated with items, etc.
- Binary, integer or real values
• A similarity function on the feature space, e.g.,
- Cosine similarity for numerical features
- Jaccard similarity for binary features
• Two very common methods: kNN- and centroid-based
- but many other classification-based methods can be used
(where users essentially play the role of classes)
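The two similarity functions and the centroid-based method can be sketched as follows, assuming items are described by numeric feature vectors of equal length; the example proposal features are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two numeric feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two sets of binary features (e.g., tags)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def centroid_recommend(item_vecs: dict, liked: set, n: int = 3) -> list[str]:
    """Centroid-based CB: the user profile is the mean vector of the liked items,
    and the remaining items are ranked by cosine similarity to that profile."""
    dim = len(next(iter(item_vecs.values())))
    profile = [sum(item_vecs[i][d] for i in liked) / len(liked) for d in range(dim)]
    candidates = [i for i in item_vecs if i not in liked]
    return sorted(candidates, key=lambda i: cosine(item_vecs[i], profile), reverse=True)[:n]


# Hypothetical features per proposal: [transport, environment, culture]
proposals = {"p1": [1, 0, 0], "p2": [1, 1, 0], "p3": [0, 0, 1], "p4": [0, 1, 0]}
```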
Recommender systems in a nutshell
• Content-based filtering: kNN-based
• Adaptation of the kNN classification algorithm
- In classification, r(u,i) would be binary
- Ranking of “instances” (items) for each “class” (user), rather than the opposite
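A kNN-based variant can be sketched under the assumption that an unseen item's score is the sum of the user's ratings for the k most similar rated items, each weighted by cosine similarity (normalizing by the sum of similarities is another common choice). The data are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two numeric feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_score(item_vecs: dict, user_ratings: dict, target: str, k: int = 2) -> float:
    """Similarity-weighted sum of the user's ratings for the k rated items
    most similar to `target`."""
    neighbors = sorted(
        ((cosine(item_vecs[target], item_vecs[j]), r) for j, r in user_ratings.items()),
        reverse=True,
    )[:k]
    return sum(sim * r for sim, r in neighbors)


def knn_rank(item_vecs: dict, user_ratings: dict, k: int = 2) -> list[str]:
    """Rank the items the user has not rated by their kNN score."""
    unseen = [i for i in item_vecs if i not in user_ratings]
    return sorted(unseen, key=lambda i: knn_score(item_vecs, user_ratings, i, k), reverse=True)


# Hypothetical features per proposal: [transport, environment, culture]
proposals = {"p1": [1, 0, 0], "p2": [1, 1, 0], "p3": [0, 0, 1], "p4": [0, 1, 0]}
user_ratings = {"p1": 5, "p3": 1}
```

Here the user (class) is fixed and the items (instances) are ranked, which is the inversion of kNN classification noted above.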
EC policy actions and priorities in employment, and the potential of online e...James Stewart
 
Making transparency work for you 2014
Making transparency work for you 2014Making transparency work for you 2014
Making transparency work for you 2014Common Futures
 
Catalan ecosystem of citizen participation: Open infrastructures for communit...
Catalan ecosystem of citizen participation: Open infrastructures for communit...Catalan ecosystem of citizen participation: Open infrastructures for communit...
Catalan ecosystem of citizen participation: Open infrastructures for communit...Ismael Peña-López
 
2018.07.10 MyGov citizen centric service. Université de l'innovation publique...
2018.07.10 MyGov citizen centric service. Université de l'innovation publique...2018.07.10 MyGov citizen centric service. Université de l'innovation publique...
2018.07.10 MyGov citizen centric service. Université de l'innovation publique...MiquelEstape
 
DRT dissemination event: discussion highlights
DRT dissemination event: discussion highlightsDRT dissemination event: discussion highlights
DRT dissemination event: discussion highlightsOpen Data Research Network
 
Tom Symons, Principal Researcher, Policy and Research, Nesta
Tom Symons, Principal Researcher, Policy and Research, NestaTom Symons, Principal Researcher, Policy and Research, Nesta
Tom Symons, Principal Researcher, Policy and Research, NestaLucia Garcia
 
Open data MISA_ON November 2011
Open data  MISA_ON November 2011Open data  MISA_ON November 2011
Open data MISA_ON November 2011City of London
 
Digital citizen Working roup
Digital citizen Working roupDigital citizen Working roup
Digital citizen Working roupKarl Donert
 

Similar to Data science in practice: Case studies in e-participation (20)

Niklas wilhelmsson eparticipation-26.11.2013
Niklas wilhelmsson eparticipation-26.11.2013Niklas wilhelmsson eparticipation-26.11.2013
Niklas wilhelmsson eparticipation-26.11.2013
 
Engaging Times: 20 Years of E-Democracy Lessons
Engaging Times: 20 Years of E-Democracy LessonsEngaging Times: 20 Years of E-Democracy Lessons
Engaging Times: 20 Years of E-Democracy Lessons
 
2nd Stakeholder workshop: Bertin, Embrapa's appraoch to open Agricultural Sci...
2nd Stakeholder workshop: Bertin, Embrapa's appraoch to open Agricultural Sci...2nd Stakeholder workshop: Bertin, Embrapa's appraoch to open Agricultural Sci...
2nd Stakeholder workshop: Bertin, Embrapa's appraoch to open Agricultural Sci...
 
eParticipation in East Africa: Theory, platforms and cases - Amahoro Mu Matora
eParticipation in East Africa: Theory, platforms and cases - Amahoro Mu MatoraeParticipation in East Africa: Theory, platforms and cases - Amahoro Mu Matora
eParticipation in East Africa: Theory, platforms and cases - Amahoro Mu Matora
 
Pcst2014 salvador
Pcst2014 salvadorPcst2014 salvador
Pcst2014 salvador
 
20131002 athens e democracy m gonzalez-sancho
20131002 athens e democracy m gonzalez-sancho20131002 athens e democracy m gonzalez-sancho
20131002 athens e democracy m gonzalez-sancho
 
Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...
Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...
Technological pillars to enable Smarter (Collaborative + Inclusive) Environme...
 
Transiting to Open Knowledge by fostering Collaboration through CO-CREATION
Transiting to Open Knowledge by fostering Collaboration through CO-CREATIONTransiting to Open Knowledge by fostering Collaboration through CO-CREATION
Transiting to Open Knowledge by fostering Collaboration through CO-CREATION
 
Digital inclusion cambridgeshire 2014 01 15
Digital inclusion cambridgeshire 2014 01 15Digital inclusion cambridgeshire 2014 01 15
Digital inclusion cambridgeshire 2014 01 15
 
EC policy actions and priorities in employment, and the potential of online e...
EC policy actions and priorities in employment, and the potential of online e...EC policy actions and priorities in employment, and the potential of online e...
EC policy actions and priorities in employment, and the potential of online e...
 
Making transparency work for you 2014
Making transparency work for you 2014Making transparency work for you 2014
Making transparency work for you 2014
 
Catalan ecosystem of citizen participation: Open infrastructures for communit...
Catalan ecosystem of citizen participation: Open infrastructures for communit...Catalan ecosystem of citizen participation: Open infrastructures for communit...
Catalan ecosystem of citizen participation: Open infrastructures for communit...
 
2018.07.10 MyGov citizen centric service. Université de l'innovation publique...
2018.07.10 MyGov citizen centric service. Université de l'innovation publique...2018.07.10 MyGov citizen centric service. Université de l'innovation publique...
2018.07.10 MyGov citizen centric service. Université de l'innovation publique...
 
DRT dissemination event: discussion highlights
DRT dissemination event: discussion highlightsDRT dissemination event: discussion highlights
DRT dissemination event: discussion highlights
 
Drt discussion highlights
Drt discussion highlightsDrt discussion highlights
Drt discussion highlights
 
2002 EGPA Conference presentation
2002 EGPA Conference presentation2002 EGPA Conference presentation
2002 EGPA Conference presentation
 
Tom Symons, Principal Researcher, Policy and Research, Nesta
Tom Symons, Principal Researcher, Policy and Research, NestaTom Symons, Principal Researcher, Policy and Research, Nesta
Tom Symons, Principal Researcher, Policy and Research, Nesta
 
Open data MISA_ON November 2011
Open data  MISA_ON November 2011Open data  MISA_ON November 2011
Open data MISA_ON November 2011
 
Universities & Public Engagement
Universities & Public EngagementUniversities & Public Engagement
Universities & Public Engagement
 
Digital citizen Working roup
Digital citizen Working roupDigital citizen Working roup
Digital citizen Working roup
 

Recently uploaded

Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Vipesco
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...henrik385807
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024eCommerce Institute
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Hasting Chen
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceDelhi Call girls
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Pooja Nehwal
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesPooja Nehwal
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Delhi Call girls
 
Mathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMoumonDas2
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )Pooja Nehwal
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfhenrik385807
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AITatiana Gurgel
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardsticksaastr
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxNikitaBankoti2
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024eCommerce Institute
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxraffaeleoman
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaKayode Fayemi
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubssamaasim06
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyPooja Nehwal
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Kayode Fayemi
 

Recently uploaded (20)

Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
 
Mathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptx
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AI
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubs
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
 

Data science in practice: Case studies in e-participation

  • 1. Data science in practice: Case studies in e-participation
    Iván Cantador, ivan.cantador@uam.es
    Facultad de Ciencias Empresariales, Universidad del Bío-Bío, Chile
    January 13, 2023
  • 2. About me
    • Iván Cantador
    • Associate Professor at the Computer Science and Engineering Department of Universidad Autónoma de Madrid, Spain
      http://www.eps.uam.es/~cantador
    • Research interests
      - Recommender systems
      - Information retrieval
      - Machine learning
      - Natural language processing
      - Semantic technologies
      - E-government
  • 3. Contents
    1. E-participation
    2. Decide Madrid
    3. Data acquisition and processing
    4. Data mining applications
    5. Information retrieval applications
    6. Recommendation applications
    7. Conclusions
  • 4. Contents
    1. E-participation
       • Open government
       • Citizen participation
       • Digital platforms for citizen participation
    2. Decide Madrid
    3. Data acquisition and processing
    4. Data mining applications
    5. Information retrieval applications
    6. Recommendation applications
    7. Conclusions
  • 5. Open government
    • Open Government (Oszlak, 2013): a public management paradigm that arises in a context characterized by:
      - Citizens' disaffection, caused by numerous crises that call into question the capacity of the Public Administration to deal with them
      - The rise of ubiquitous technologies, which have transformed communication and interaction between individuals and have fostered the emergence of open, participatory and collaborative practices
      - The opening of government, among other institutions, to citizens, with the aim of ending this disaffection
  • 6. Open government
    • Goals of the open government model (Ramírez-Alujas, 2014):
      • Increasing transparency (accountability) and access to government information through Open Data
        - Open data should give citizens access to information and promote innovation and economic development in the public sector
      • Facilitating collaboration between different actors, particularly public administrations, civil society and the private sector, in order to co-design and generate public value
      • Promoting citizen participation in the design and implementation of public policies, i.e., in decision and policy making
  • 7. Open government
    • Background: Memorandum on Transparency and Open Government, USA, Barack Obama's Administration, 2009
      - Transparency: providing information about government activity, its performance, etc., which encourages and promotes accountability and social control
      - Participation: promoting the right of citizens to actively participate in policy making
      - Collaboration: involving citizens and other actors in scenarios of cooperation and coordinated work
      - Technology: using technology as an instrument to promote openness in government and to face the challenges of the new millennium
  • 8. Open government
    • The Open Government Partnership emerged in 2011 to promote open government across different administrations
    • It seeks specific commitments from governments on transparency and citizen empowerment, the fight against corruption, and the use of new technologies to strengthen governance
      - Founded by 8 countries: Brazil, Mexico, Indonesia, Philippines, Norway, USA, South Africa, UK
      - Composed of 70 member states and numerous government organizations
    • Principal commitments:
      - Improvement of public services
      - Increased public integrity
      - Effective management of public resources
      - Safer communities
      - Increased corporate responsibility
    https://www.opengovpartnership.org
  • 10. Citizen participation
    • Citizen participation is a process that gives private individuals an opportunity to influence public decisions, and it has been a component of democratic decision making
    • It is a community-based process in which citizens may organize themselves and their goals, and may work together through non-governmental organizations to influence public policies and plans
    • Benefits
      - Governance: reducing conflict, strengthening democratic legitimacy and encouraging active citizenship, which leads to government transparency and accountability, and to trust between citizens and political institutions
      - Increasing the quality of public decisions and services
      - Learning and training to build stronger societies
      - Promoting social cohesion, mutual understanding and social justice
  • 11. Citizen participation
    • Ladder of citizen participation (Arnstein, 1969)
      • 8 levels in 3 groups
        - No participation
        - Symbolic participation
        - ‘Real’ participation
      • Simplified by the OECD model into 3 levels
  • 12. Citizen participation
    • Barriers to citizen participation
      • Incompatibilities
        - Political, legal, cultural, socioeconomic, organizational
      • Intrinsic problems
        - Complex, expensive, under-representative, non-plural, poorly informed, conflict-prone, non-deliberative, non-scalable, etc.
      • Extrinsic problems
        - Arbitrary and manipulable
        - Inefficient and non-self-sustaining
        - Irrelevant issues and lack of effect
        - Citizen saturation
        - Monopoly of participation, etc.
  • 13. Citizen participation
    • Tools for citizen participation
      • Non-ICT-based
        - Questionnaires and surveys
        - Seminars, talks and meetings
        - Discussion and work groups
        - Cultural, artistic and leisure events
      • ICT-based
        - E-mail, RSS, SMS, multimedia sharing
        - Social media, web portals and e-platforms
        - Mobile apps
        - Open data, IoT (crowdsensing)
        - Augmented/virtual reality
  • 14. Citizen participation
    • Participedia.net
      • Anyone can join the Participedia community and help crowdsource, catalogue and compare participatory political processes around the world
      • Cases (2259)
      • Methods (360)
      • Organizations (841)
      • Teaching resources
  • 16. Digital platforms for citizen participation
    • With the advent of social media and mobile computing, there is now a wide range of digital citizen participation channels:
      - general-purpose online social networks
      - ad hoc e-consultation, e-voting and e-participation platforms
    • The huge and ever-increasing amount of citizen-generated content leads to information overload for both citizens and government stakeholders in decision- and policy-making tasks
      - Users may feel overwhelmed by the large amount of data, which can be challenging and frustrating to explore and understand
      - Citizens may feel thwarted if their proposals do not achieve sufficient visibility and impact
  • 17. Digital platforms for citizen participation
    • E-participation refers to ICT-supported citizen participation in governance processes:
      - administration
      - service delivery
      - decision making
      - policy making
    • It aims to improve the relations among stakeholders in civil society (e.g., local government, citizens, firms), placing citizens at the center of these processes
    • It has given rise to novel consultation and deliberation initiatives
  • 18. Digital platforms for citizen participation
    • E-participation tools by type of engagement and role of ICT/level of participation
    [Figure. Source: Aichholzer, G., & Allhutter, D. (2011). Online forms of political participation and their impact on democracy. Institute of Technology Assessment (ITA).]
  • 19. Digital platforms for citizen participation
    • Most current e-participation platforms are based on web forums
      - Citizens make proposals and provide comments and opinions, forming large conversation threads
    [Figure: example of a web forum-based e-participation platform, showing a citizen proposal and its discussions]
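A proposal followed by nested comments, as described on this slide, is naturally a tree. The minimal sketch below models a thread that way; the `Post` class and its fields are hypothetical and do not reproduce any real platform's data model. It counts the posts in a thread and measures its depth, two simple indicators of how large a discussion has grown, and flattens it in the depth-first order a forum page typically renders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Post:
    """A proposal or comment in a forum-style thread (hypothetical model)."""
    post_id: int
    author: str
    text: str
    replies: list[Post] = field(default_factory=list)


def thread_size(post: Post) -> int:
    """Count this post plus every reply beneath it."""
    return 1 + sum(thread_size(reply) for reply in post.replies)


def thread_depth(post: Post) -> int:
    """Length of the longest reply chain, counting the root post as depth 1."""
    if not post.replies:
        return 1
    return 1 + max(thread_depth(reply) for reply in post.replies)


def flatten(post: Post, level: int = 0):
    """Yield (level, post) pairs in depth-first order."""
    yield level, post
    for reply in post.replies:
        yield from flatten(reply, level + 1)


if __name__ == "__main__":
    proposal = Post(1, "ana", "More bike lanes in the city centre", [
        Post(2, "luis", "Agree, especially near schools", [
            Post(4, "ana", "Good point, I will add it"),
        ]),
        Post(3, "marta", "What about parking spaces?"),
    ])
    print(thread_size(proposal), thread_depth(proposal))
    for level, post in flatten(proposal):
        print("  " * level + f"{post.author}: {post.text}")
```

Recursion keeps the sketch short; very deep threads would call for an iterative traversal to stay clear of Python's recursion limit.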
  • 20. Digital platforms for citizen participation
    • Conventional web forums promote social interaction
    • Pros
      - Easy and fast content generation (through free-text posts)
      - Smooth, large-scale interaction (via comment threads)
    • Cons
      - No or very limited functionalities for content organization, filtering and analysis
      - Dispersed and redundant content, since it is structured by time
      - Discussions that are challenging to process
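Several of these cons stem from presenting posts only in time order. As a minimal sketch, assuming each post already carries a hypothetical `topic` label and a vote count (neither is taken from a specific platform), the same posts can be shown newest-first, as a conventional forum does, or grouped by topic and filtered by keyword:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class ForumPost:
    title: str
    topic: str      # hypothetical topic label, assumed to be already assigned
    created: date
    votes: int


def timeline(posts):
    """Conventional forum view: newest first, regardless of topic."""
    return sorted(posts, key=lambda p: p.created, reverse=True)


def by_topic(posts):
    """Group posts by topic (alphabetically), most-voted first within each group."""
    groups = defaultdict(list)
    for post in posts:
        groups[post.topic].append(post)
    return {
        topic: sorted(groups[topic], key=lambda p: p.votes, reverse=True)
        for topic in sorted(groups)
    }


def search(posts, keyword):
    """Case-insensitive keyword match on titles, keeping the input order."""
    needle = keyword.lower()
    return [p for p in posts if needle in p.title.lower()]


if __name__ == "__main__":
    posts = [
        ForumPost("More bike lanes", "mobility", date(2022, 3, 1), 120),
        ForumPost("Plant trees in Vallecas", "environment", date(2022, 3, 5), 45),
        ForumPost("Safer bike parking", "mobility", date(2022, 3, 9), 300),
    ]
    print([p.title for p in timeline(posts)])
    for topic, items in by_topic(posts).items():
        print(topic, [p.title for p in items])
    print([p.title for p in search(posts, "bike")])
```

On a real platform the topic labels would themselves have to come from somewhere (user-selected categories, text classification or clustering); here they are simply given.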
  • 21. Contents
    1. E-participation
    2. Decide Madrid
       • Participatory budgeting
       • E-participatory budgeting
       • The ‘Decide Madrid’ platform
    3. Data acquisition and processing
    4. Data mining applications
    5. Information retrieval applications
    6. Recommendation applications
    7. Conclusions
  • 22. Participatory budgeting
    • Participatory budgeting (PB) is a democratic deliberation and decision-making process in which citizens decide how to spend certain municipal or public budgets by:
      - informing about issues and problems in a wide range of subject areas in a city, e.g., housing, public safety, education, health, transportation and environment
      - proposing, debating and supporting/voting for spending ideas and projects aimed at addressing such problems
  • 23. Participatory budgeting
    • Pros
      - Increased government transparency and trust
      - Citizens' empowerment and change of democratic attitude
      - Better allocation of resources (in general)
      - Increased voter turnout
    • Cons
      - Lack of diverse representation
      - Time-consuming
      - Resource-intensive
      - Lack of interest or political will
  • 24. Participatory budgeting
    • Since its original invention in Porto Alegre, Brazil, in 1988, PB has gained much popularity
      - As of 2022, PB had spread to over 4,500 cities around the world (source: Participatory Budgeting World Atlas, https://www.pbatlas.net)
    • Tools of citizen participation
      - Meetings
      - Committees
      - Consultations
      - …
      - Electronic participatory platforms
    http://www.participatorybudgeting.org
  • 25. Participatory budgeting
    • PB in Europe (https://www.euractiv.com/section/participatory-democracy/infographic/participatory-budgeting-europes-bet-to-increase-trust-in-government)
      - While residents' demands in European cities are often similar, the percentage of the budget allocated to PB varies widely from one place to another: Paris dedicates 25% of its investment budget to PB, while smaller cities usually invest 2 to 5% of their resources
  • 26. Participatory budgeting
    • PB in Chile (https://www.pbatlas.net/chile.html)
      - 37 local government initiatives + 1 regional government initiative
    • Although PB initiatives in the country were born in 2002 out of the political will of mayors at the local level, since 2014 the region of Los Ríos has run its own process, owing to:
      - the high value placed on citizen participation in the region
      - the historical roots of the region's creation in 2007, which was preceded by a social movement of more than 30 years demanding regional status
    • Proposals are presented mainly by social leaders
      - Projects are selected in neighborhood or territorial assemblies, mostly formed by representatives of social organizations and institutions
    • For voting on and prioritizing proposals, the predominant model is the people's direct and universal vote
  • 27. Contents 1. E-participation 2. Decide Madrid • Participatory budgeting • E-participatory budgeting • The ‘Decide Madrid’ platform 3. Data acquisition and processing 4. Data mining applications 5. Information retrieval applications 6. Recommendation applications 7. Conclusions
  • 28. E-participatory budgeting • In addition to ad hoc PB digital applications and platforms, there are several software frameworks for building online PB platforms • CONSUL, http://consulproject.org: dozens of cities in Spain, Italy, France and South America • Stanford Participatory Budgeting, http://pbstanford.org: major cities in the USA, e.g., New York, Chicago, Seattle, Oakland and Boston • EU Open Budgets, http://openbudgets.eu/tools • [Figure: a proposal and its fields: title, location, category, author, description, supports, comments]
  • 29. E-participatory budgeting • Motivations for data science applications • Limitations of the current ePB platforms of large cities - very limited search and filtering functionalities - inability to facilitate the analysis of hundreds, even thousands, of citizen proposals and their associated comments and discussions • When creating a budgeting proposal, a citizen should be aware of similar or related ideas or projects, so that she can better define the proposal or find opportunities to collaborate with others
  • 30. Contents 1. E-participation 2. Decide Madrid • Participatory budgeting • E-participatory budgeting • The ‘Decide Madrid’ platform 3. Data acquisition and processing 4. Data mining applications 5. Information retrieval applications 6. Recommendation applications 7. Conclusions
  • 31. The ‘Decide Madrid’ platform • A web system designed to allow Madrid residents to make, discuss and vote on proposals for the city • In use since September 2015 • With a €100M budget in 2017 • Consisting of a 3-phase process
  • 32. The ‘Decide Madrid’ platform • ~6,000 citizen proposals per year • Keyword-based search • No use of (structured) metadata • No data analysis • No personalization • No recommendation
  • 33. The ‘Decide Madrid’ platform • Available data for a proposal • Title • Author • Date • Summary • Description • Freely-chosen tags • Number of user votes • User comment threads
  • 34. The ‘Decide Madrid’ platform • Why consider Decide Madrid a representative case study? • Participatory budgeting is one of the most widely used citizen participation methods worldwide: • Represented in more than 400 of the 2,000 cases analyzed in Participedia (https://participedia.net) • Used in more than 3,000 cities and municipalities worldwide, according to the Participatory Budgeting Project (https://www.participatorybudgeting.org/white-paper) • Decide Madrid is implemented upon CONSUL (https://consulproject.org), an open-source framework for developing citizen participation platforms: • Used in more than 135 institutions in 35 countries • With a structure similar to other popular frameworks, such as Stanford Participatory Budgeting (https://pbstanford.org) and EU Open Budgets (http://openbudgets.eu/tools)
  • 35. Contents 1. E-participation 2. Decide Madrid 3. Data acquisition and processing • The data mining pipeline • Data crawling • Data scraping • Data processing 4. Data mining applications 5. Information retrieval applications 6. Recommendation applications 7. Conclusions
  • 36. The data mining pipeline • Data: pure and simple facts with no particular organization • Information: processed, filtered, calculated, structured, categorized, contextualized data • Knowledge: understanding, experience, insights, intuitions to use information
  • 37. The data mining pipeline • Unstructured data: no structure; non-restricted vocabulary, no predefined schema; e.g., free text • Semi-structured data: simple and flexible structure, no strict format; limited vocabulary, schema mixed with data values; e.g., taxonomies (categories), folksonomies (tags) • Structured data: rigid structure, strict format; well-defined vocabularies and representation; e.g., databases, ontologies
  • 38. The data mining pipeline • Open Government Data (OGD) promote transparency, accountability and public value creation • By making datasets publicly available, institutions become more transparent and accountable to citizens • By facilitating the use, reuse and free distribution of datasets, governments foster business creation and innovative, citizen-centered digital applications and services • OGD portals give the general public access to open data collections • They allow searching for data files, but not for information within the files
  • 39. The data mining pipeline • Open data portals are websites for accessing OGD collections • Search engine - Retrieving collections via keyword-based queries • Collection metadata - Title, description, date, size, etc. • Data files - Formats: CSV, XLS, XML, RDF, etc. - To be downloaded and opened with specific applications, e.g., Microsoft Excel • Documentation - Inner structure of the data files • Example: open data portal of Madrid City Council
  • 40. The data mining pipeline • Open data are commonly provided as tables: • Rows = data records (instances, individuals) • Columns = data attributes (features, fields) • Example: records of traffic accidents that occurred in Madrid in 2020
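Reading an open data file as records (rows) and attributes (columns) can be sketched in a few lines of Java. This is a minimal sketch, assuming a semicolon-delimited text file whose first line holds the column names and whose fields contain no quotes or embedded delimiters; the class name, the method name and the toy table in `main` are hypothetical and do not reproduce the layout of any actual Madrid dataset.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class OpenDataTable {
    // Parses a delimited text table into records (rows) keyed by attribute (column) name.
    // Simplification: quoted fields and escaped delimiters are not handled.
    static List<Map<String, String>> parse(String text, String delimiter) {
        String sep = Pattern.quote(delimiter);       // treat the delimiter literally, not as a regex
        String[] lines = text.strip().split("\\R");  // split on any line break
        String[] header = lines[0].split(sep, -1);
        List<Map<String, String>> records = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isBlank()) continue;        // skip empty lines
            String[] values = lines[i].split(sep, -1);
            Map<String, String> record = new LinkedHashMap<>();
            for (int c = 0; c < header.length; c++) {
                // Missing trailing fields become empty strings
                record.put(header[c].strip(), c < values.length ? values[c].strip() : "");
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical toy table: column names and values are invented for illustration
        String csv = "id;district;date\n1;CENTRO;2020-01-01\n2;RETIRO;2020-01-02\n";
        List<Map<String, String>> rows = parse(csv, ";");
        System.out.println(rows.size() + " records with attributes " + rows.get(0).keySet());
    }
}
```

Real open data files usually call for a full CSV parser that handles quoted fields and the file's declared character encoding.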
  • 41. The data mining pipeline Methodology • Text processing of the titles, tags, descriptions and comments of citizen proposals • Semantic annotation of proposals: topics and districts • Computation of discussion and controversy metrics on the comments of each proposal • Exploitation of open data as statistical indicators about districts: economic, sociocultural, ideological, employment, education, health, housing, etc.
  • 42. The data mining pipeline • 2 complex processes: • crawling and scraping the ‘Decide Madrid’ web pages • mapping tags to places and topics • 22 districts & hundreds of places • 30 topics • urbanism, transport, environment, health care, education, social rights, culture, economy, employment, politics, security, housing, family, old age, religion, animals, etc. • Assumption: a comment = a (positive, unary) rating
  • 43. The data mining pipeline Dataset • Participatory budgeting of 4 editions: 2015-2018 • Around 29,000 proposals • More than 86,000 comments • 30 categories and 325 topics • 21 districts + “city scope”
  • 44. Contents 1. E-participation 2. Decide Madrid 3. Data acquisition and processing • The data mining pipeline • Data crawling • Data scraping • Data processing 4. Data mining applications 5. Information retrieval applications 6. Recommendation applications 7. Conclusions
  • 45. Data crawling • A (web) crawler is a computer program that browses the Web in a methodical (orderly), automated manner • Applications • Web search/indexing • Vertical (specialized) search engines, e.g., news, shopping, recipes, reviews, papers • Monitoring web sites and pages of interest • Business intelligence: collecting information about company competitors and potential collaborators • Malicious applications: collecting personal information
  • 46. Data crawling • A crawler within a web search engine
  • 47. Data crawling • A crawler within a web application
  • 48. Data crawling • Generic web crawling process • Seeds - A list of starting URLs • Visiting order - Frontier = unvisited URLs - Deciding which URLs to discard (lower priority) so that they do not fill up the frontier • Stop criterion - Empty frontier or maximum number of pages crawled
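The generic process (seeds, a frontier of unvisited URLs, a discard rule and a stop criterion) can be sketched in Java as a breadth-first crawl. This is a minimal sketch, assuming an in-memory link graph in place of real HTTP fetching and link extraction; the class name, the `maxFrontier` cap used as the discard rule and the toy graph are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GenericCrawler {
    // Breadth-first crawl over a link graph. The "web" is an in-memory map from a URL
    // to its outlinks: a hypothetical stand-in for fetching pages and extracting links.
    static List<String> crawl(Map<String, List<String>> web, List<String> seeds,
                              int maxPages, int maxFrontier) {
        Deque<String> frontier = new ArrayDeque<>(seeds); // unvisited URLs, visited in FIFO order
        Set<String> seen = new HashSet<>(seeds);          // URLs already queued or visited
        List<String> visited = new ArrayList<>();
        // Stop criterion: empty frontier or maximum number of pages crawled
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.pollFirst();
            visited.add(url);
            for (String out : web.getOrDefault(url, List.of())) {
                // New URLs are discarded once the frontier is full
                if (!seen.contains(out) && frontier.size() < maxFrontier) {
                    seen.add(out);
                    frontier.addLast(out);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Hypothetical link graph: page -> outlinks
        Map<String, List<String>> web = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d"),
                "c", List.of("d"),
                "d", List.of("a"));
        System.out.println(crawl(web, List.of("a"), 10, 10)); // prints [a, b, c, d]
    }
}
```

Replacing the FIFO deque with a priority queue turns this into a topical crawler, which is what Best First does.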
  • 49. Data crawling • Best First • The simplest topical crawler • The frontier is a priority queue based on the text (or keyword) similarity between the topic and the parent page
    bestFirst(topic, seed_urls) {
      foreach link (seed_urls) {
        enqueue(frontier, link);
      }
      visited := 0;
      while (frontier.size() > 0 and visited < MAX_PAGES) {
        link := dequeueMax(frontier);            // dequeue the link with MAX similarity
        page := fetch(link);
        visited := visited + 1;
        score := sim(topic, page);
        foreach outlink (extract_links(page)) {  // outlinks of the fetched page
          enqueue(frontier, outlink, score);     // scored by the parent page's similarity
        }
      }
    }
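The Best First pseudocode can be turned into runnable Java with a max-priority queue. This is a minimal sketch under several assumptions: pages come from a hypothetical in-memory map rather than HTTP, `sim` is taken to be the fraction of topic keywords that occur in the page text, seeds are enqueued with the top score 1.0, and already fetched URLs are skipped. The class and record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

class BestFirstCrawler {
    // A page of the hypothetical in-memory web: its text and its outlinks
    record Page(String text, List<String> links) {}

    // A frontier entry: a URL and the priority it was enqueued with
    record Entry(String url, double score) {}

    // Assumed similarity: fraction of (lower-case) topic keywords found in the page text.
    // Tokens are runs of Unicode letters/digits, so accented Spanish words stay whole.
    static double sim(Set<String> topic, String text) {
        if (topic.isEmpty()) return 0.0;
        Set<String> words = new HashSet<>(Arrays.asList(text.toLowerCase().split("[^\\p{L}\\p{N}]+")));
        long hits = topic.stream().filter(words::contains).count();
        return (double) hits / topic.size();
    }

    static List<String> crawl(Map<String, Page> web, Set<String> topic,
                              List<String> seeds, int maxPages) {
        // Max-priority queue: the entry with the highest score is dequeued first
        PriorityQueue<Entry> frontier =
                new PriorityQueue<>((a, b) -> Double.compare(b.score(), a.score()));
        for (String seed : seeds) frontier.add(new Entry(seed, 1.0)); // seeds first (assumption)
        Set<String> visited = new LinkedHashSet<>(); // keeps the visiting order
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll().url();
            if (!visited.add(url)) continue;   // already fetched: skip duplicates
            Page page = web.get(url);          // "fetch" the page
            if (page == null) continue;        // unknown URL: nothing to expand
            double score = sim(topic, page.text());
            for (String out : page.links()) {
                // Outlinks are scored with the similarity of their parent page
                if (!visited.contains(out)) frontier.add(new Entry(out, score));
            }
        }
        return new ArrayList<>(visited);
    }

    public static void main(String[] args) {
        Map<String, Page> web = Map.of(
                "s1", new Page("park trees madrid", List.of("x")),
                "s2", new Page("traffic news", List.of("y")),
                "x", new Page("new park", List.of()),
                "y", new Page("bus lanes", List.of()));
        // With a budget of 3 pages, x (linked from the on-topic seed) beats y
        System.out.println(crawl(web, Set.of("park", "trees"), List.of("s1", "s2"), 3));
    }
}
```

Ties between equal scores are broken arbitrarily by the priority queue, so only the set of crawled pages, not their exact order, is guaranteed in this example.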
  • 50. Data crawling <div class="proposal-content"> <h3><a href="/proposals/34239-luces-led-barrio-concepcion-y-san-pascual"> Luces LED Barrio Concepción y San Pascual </a></h3> <p class="proposal-info"> <span class="icon-comments"></span>&nbsp; <a href="/proposals/34239-luces-led-barrio-concepcion-y-san-pascual#comments"> Sin comentarios</a> <span class="bullet">&nbsp;•&nbsp;</span>01/12/2022
  • 51. Data crawling
    import java.io.BufferedWriter;
    import java.io.FileOutputStream;
    import java.io.OutputStreamWriter;
    import java.net.URI;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    // RAND (a java.util.Random) and USER_AGENTS (a String[] of user agent strings) are class fields not shown on the slide
    public static void downloadProposalsURLs(String url, String file, int firstPage, int lastPage, boolean append) throws Exception {
        // try-with-resources closes the writer even if a page fails to download
        try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file, append), "UTF-8"))) {
            for (int p = firstPage; p <= lastPage; p++) {
                // Pick a random user agent
                String userAgent = USER_AGENTS[RAND.nextInt(USER_AGENTS.length)];
                // Open the connection and read the web document
                URI uri = new URI(url + p);
                Document doc = Jsoup.connect(uri.toASCIIString()).userAgent(userAgent).get();
                // Read the proposal URLs from the web document -> identified by the first <a> link within each <div class="proposal-content"> element
                for (Element block : doc.getElementsByClass("proposal-content")) {
                    Element link = block.selectFirst("a");
                    if (link != null) {
                        writer.write(link.attr("href") + "\n"); // one URL per line
                    }
                }
            }
        }
    }
  • 52. Contents 1. E-participation 2. Decide Madrid 3. Data acquisition and processing • The data mining pipeline • Data crawling • Data scraping • Data processing 4. Data mining applications 5. Information retrieval applications 6. Recommendation applications 7. Conclusions
  • 53. Data scraping <img alt="Armando Cuesta" class="initialjs-avatar author-photo" data-char-count="1" data-font-size="19" data-height="32" data-name="Armando Cuesta" data-radius="4" data-seed="460897" data-text-color="#ffffff" data-width="32" src="data:image/…">
  • 54. Data scraping
    import java.io.File;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;
    // Proposal is the project's own data class (not shown on the slide)
    public static Proposal getProposal(String proposalFile, boolean isClosed) throws Exception {
        Proposal proposal = new Proposal();
        Document doc = Jsoup.parse(new File(proposalFile), "UTF-8");
        // URL, read from the <meta property="og:url"> tag
        Elements elems = doc.select("meta[property=og:url]");
        String url = elems.attr("content").trim();
        proposal.setUrl(url);
        // Id: the digits before the first "-" of the last URL segment, e.g. ".../proposals/34239-luces-led-..." -> 34239
        String id = url.substring(url.lastIndexOf("/") + 1);
        int dash = id.indexOf("-");
        if (dash >= 0) {
            id = id.substring(0, dash); // a segment without a slug is used as it is
        }
        proposal.setId(Integer.valueOf(id));
        // Title, read from the <meta property="og:title"> tag
        elems = doc.select("meta[property=og:title]");
        String title = elems.attr("content").trim();
        proposal.setTitle(title);
        // Summary: the placeholder "Resumen de la propuesta" ("Proposal summary") is treated as an empty summary
        String summary = doc.select("div.proposal-show").get(0).getElementsByTag("blockquote").text().trim();
        if (summary.equals("Resumen de la propuesta")) {
            summary = "";
        }
        proposal.setSummary(summary);
        ...
  • 55. Contents 1. E-participation 2. Decide Madrid 3. Data acquisition and processing • The data mining pipeline • Data crawling • Data scraping • Data processing 4. Data mining applications 5. Information retrieval applications 6. Recommendation applications 7. Conclusions
  • 56. Data processing • Database tables • Created by the crawler and scraper - proposals (code, title, author, date, summary, description, supports, …) - users - proposal_tags - proposal_comments (id, author, text, parent_comment, pos_votes, neg_votes, …) • Created from proposal_tags - proposal_categories → text processing + clustering - proposal_topics → text processing + clustering - proposal_districts → text processing - proposal_locations → text processing + mapping to a street directory + geolocation
  • 57. Data processing • Graph building • Its nodes are the whole set of proposal tags • Each of its (weighted) edges links a pair of “related” tags, according to: - Syntactic similarity - Semantic similarity - Co-occurrences within proposals • Graph clustering method proposed by Newman and Girvan (2004) • It has a criterion to automatically set an optimal number of clusters • Each cluster represents a topic, which is composed of a set of tags
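The co-occurrence criterion for weighting tag graph edges can be sketched in Java as follows. This is a minimal sketch that covers only co-occurrences, not the syntactic or semantic similarity criteria. The class name, the `"a|b"` edge key and the toy tag sets are hypothetical. The resulting weighted graph is the kind of input a community detection method such as Newman and Girvan's takes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class TagCooccurrence {
    // Co-occurrence edges of a tag graph: the weight of edge "a|b" is the number of
    // proposals tagged with both a and b. Similarity-based edges are not modelled here.
    static Map<String, Integer> edges(List<Set<String>> proposalTags) {
        Map<String, Integer> weights = new TreeMap<>();
        for (Set<String> tags : proposalTags) {
            // Sorting gives each unordered pair a canonical key
            List<String> sorted = new ArrayList<>(new TreeSet<>(tags));
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    weights.merge(sorted.get(i) + "|" + sorted.get(j), 1, Integer::sum);
                }
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        // Hypothetical tag sets of three proposals
        List<Set<String>> proposals = List.of(
                Set.of("parques", "arboles"),
                Set.of("arboles", "parques", "agua"),
                Set.of("agua"));
        System.out.println(edges(proposals)); // prints {agua|arboles=1, agua|parques=1, arboles|parques=2}
    }
}
```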
  • 58. Data processing • 2-level taxonomy: 30 categories + 325 topics (category names with English translations; example tags in the original Spanish)
    - Accesibilidad (Accessibility): accesibilidad, accesibilidad metro, aparcamiento para discapacitados, ...
    - Animales (Animals): adiestramiento canino, águilas, animales, animales de compañía, antitaurino, ...
    - Asociaciones (Associations): asociaciones, asociaciones de vecinos, asociaciones juveniles, ...
    - Ayuntamiento y administración pública (City Council and public administration): administracion, alcaldesa, atencion al ciudadano, ...
    - Civismo (Civic behavior): acoger, bioetica, bullying, cinismo, civico, civismo, colaboracion social, ...
    - Cultura (Culture): arqueologia, arte, arte callejero, arte urbano, artesania, artistas, ...
    - Delincuencia (Crime): anti corrupcion, atraco, carteristas, corrupcion, delincuencia, delitos, ...
    - Deportes (Sports): actividad fisica, anillo ciclista, area de deportes, atletas, atleti, atletismo, ...
    - Derechos sociales (Social rights): abuso, acoso, albergue, altermundialismo, apoyo emocional, apoyo social, ...
    - Economía (Economy): actividad económica, ahorro, bancos, bbva, comerciantes, comercio, ...
    - Educación (Education): acoso escolar, alumnos, bachillerato, bibiotecas, brecha cultural, ...
    - Empleo (Employment): autoempleo, autónomos, comerciales, conciliacion laboral, contratacion municipal, ...
    - Equidad e integración (Equity and integration): chabolas, cie, derechos lgtbi, inmigración, desigualdad de genero, ...
    - Familia e infancia (Family and childhood): actividades infantiles, ayuda embarazo, bebes, carricoche, ...
    - Jóvenes (Youth): acoso escolar, adolescencia, adolescentes, asociaciones juveniles, ...
    - Justicia (Justice): constitucion, cumplimiento de las leyes, dictadura, fiscal, franquismo, ...
    - Medio ambiente (Environment): acusticas, agroecologia, agua, aire, aire acondicionado, ajardinamiento, ...
    - Movilidad (Mobility): abono transportes, adif, agentes de movilidad, aparamiento regulado, ...
    - Ocio y entretenimiento (Leisure and entertainment): baile, bares, celebraciones, centro comercial, cines, conciertos, ...
    - Participación ciudadana (Citizen participation): accion social, avisos madrid, decide madrid, decidemadrid, ...
    - Política (Politics): 15m, ahora madrid, ayuntamiento, ayuntamiento de madrid, democracia, ...
    - Religión (Religion): españa laica, estado aconfesional, iglesia, islam, laicismo, religion, ...
    - Salud y sanidad (Health and health care): acoholismo, acustica, acusticas, aire libre, aire puro, alcohol, ...
    - Seguridad y emergencias (Security and emergencies): accidentes, app emergencias, aviso, avisos madrid, bomberos, ...
    - Sostenibilidad (Sustainability): agroecologia, ahorro de energia, autogestion, ciudad amable, ...
    - Tercera edad (Old age): abuelos, ancianos, centros de dia, desempleo mayores, jubilacion, ...
    - Transparencia (Transparency): anti corrupcion, datos abiertos, derecho a la informacion, ...
    - Turismo (Tourism): oferta turistica, puntos de informacion turistica, puntos de interes, ...
    - Urbanismo (Urban planning): aceras, adoquinado, ajardinamiento, alumbrado, apariencia edificios, ...
    - Vivienda (Housing): alquileres, alquiler vacacional, alquiler vivienda, derecho a un vivienda, ...
  • 59. Contents 1. E-participation 2. Decide Madrid 3. Data acquisition and processing 4. Data mining applications • Discussion and controversy analysis • Clustering and visualization • Intent-based classification 5. Information retrieval applications 6. Recommendation applications 7. Conclusions
  • 60. Discussion and controversy analysis • In the literature, online tools implemented ad hoc predominate as a means to facilitate citizen participation at scale and to reduce costs • To analyze in depth how participation takes place in such tools, we conduct a study of a particular tool • The chosen tool is Decide Madrid (https://decide.madrid.es), the participatory budgeting e-platform of Madrid City Council since 2015 • The study makes use of diverse data: • Topics, districts and support levels of citizen proposals • Controversy levels of the comment threads originated by the proposals • Indicators about economic, sociocultural and ideological aspects of the districts
  • 61. Discussion and controversy analysis Motivation • Government institutions’ lack of understanding of the content generated by citizens in electronic tools • Risk that institutions fail to meet citizens’ demands - Certain relevant demands may be left unmet, not because they are unfeasible, but because of their controversial nature • Decreased quality of decision making • Loss of confidence on the part of the citizenry
  • 62. Discussion and controversy analysis Decide Madrid • Operational since 2015 • More than 6,000 citizen proposals a year • More than 400,000 registered users in 2019 • A structure of discussion threads (comments) for each citizen proposal • [Figure: example of a citizen proposal in Decide Madrid, showing its title, author, date, description, tags, votes and comments]
  • 63. Discussion and controversy analysis Controversy metrics • To measure the controversy of a citizen proposal, we aggregate 3 metrics applied to its discussion threads (comments): • Controversy based on the content (length) of discussions • Controversy based on opinion polarization (of votes) • Controversy based on the structure of the conversations
  • 64. Discussion and controversy analysis Controversy metrics • Discussion content-based controversy • The length of the proposal’s discussion, measured as the sum of the lengths of its comments • Opinion polarization-based controversy • A weighted ratio measuring the difference between positive and negative votes on the proposal’s comments • Conversation structure-based controversy • An adaptation of the h-index for measuring discussion diversification
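The three metrics can be sketched in Java as follows. The content-based metric follows the slide directly. The slide gives neither the exact weighted ratio nor the exact h-index adaptation, so the polarization formula below is a hypothetical stand-in, and the h-index uses one common definition for discussion threads: the largest depth h with at least h comments at that depth. How the three values are aggregated into one controversy score is not shown. Class and record names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ControversyMetrics {
    // A comment of a proposal's thread: text, votes received, and depth (1 = direct reply)
    record Comment(String text, int posVotes, int negVotes, int depth) {}

    // Content-based: length of the discussion as the sum of the comment lengths (characters)
    static int discussionLength(List<Comment> thread) {
        return thread.stream().mapToInt(c -> c.text().length()).sum();
    }

    // Polarization-based (hypothetical formula): each comment's balance
    // 1 - |pos - neg| / (pos + neg), averaged with weights pos + neg.
    // This simplifies to sum(2 * min(pos, neg)) / sum(pos + neg): 0 = unanimous, 1 = evenly split.
    static double polarization(List<Comment> thread) {
        long balanced = 0, total = 0;
        for (Comment c : thread) {
            balanced += 2L * Math.min(c.posVotes(), c.negVotes());
            total += c.posVotes() + c.negVotes();
        }
        return total == 0 ? 0.0 : (double) balanced / total;
    }

    // Structure-based: h-index of a thread, taken here as the largest depth h
    // such that at least h comments sit at depth h
    static int hIndex(List<Comment> thread) {
        Map<Integer, Integer> perDepth = new HashMap<>();
        for (Comment c : thread) perDepth.merge(c.depth(), 1, Integer::sum);
        int h = 0;
        for (Map.Entry<Integer, Integer> e : perDepth.entrySet()) {
            if (e.getValue() >= e.getKey()) h = Math.max(h, e.getKey());
        }
        return h;
    }

    public static void main(String[] args) {
        // Hypothetical thread: three direct replies, two replies at depth 2, one at depth 3
        List<Comment> thread = List.of(
                new Comment("abc", 5, 5, 1), new Comment("de", 10, 0, 1),
                new Comment("f", 0, 0, 1), new Comment("", 0, 0, 2),
                new Comment("", 0, 0, 2), new Comment("", 0, 0, 3));
        System.out.println(discussionLength(thread) + " " + polarization(thread) + " " + hIndex(thread));
    }
}
```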
  • 65. Discussion and controversy analysis Some results of the study (I) • The controversy values follow a heavy-tailed distribution, in which the majority of the proposals have low controversy • The most highly supported proposals are not necessarily the most controversial • “In Decide Madrid, proposals with a low level of support are currently discarded and archived, regardless of the level of discussion and controversy they have. However, from a decision-making perspective, it would be interesting to delve deeper into the controversial proposals and understand the problems of the city and the citizens affected by them”.
  • 66. Discussion and controversy analysis Some results of the study (II) • Most controversial and most supported topics • Religion: inclusion of LGBTI+ groups in the Cabalgata de Reyes, public funding and tax benefits for Catholic institutions • Housing: creation of social housing, annual property taxes • Culture: prohibition of bullfighting • Topics with a low-to-moderate number of proposals, a low level of support and high controversy • Governance: transparency, citizen participation, public administration, laws and legislation • Rights and social movements: social rights, civility, equity, migration, integration, crime, NIMBY • “In Decide Madrid, citizens’ ideological differences play an important role in the group of controversial categories”. • “In Decide Madrid, political and social issues reach a low-moderate relevance (final attention)”.
  • 67. Discussion and controversy analysis Some results of the study (II) • Topics with a large number of proposals and high levels of support and controversy • Domestic animals, mainly dogs (e.g., cleaning of and fines for dog excrement on public roads, creation of “pipicanes” (dog relief areas), compulsory leashes, etc.) • Topics with a low-to-moderate number of proposals and low-to-moderate levels of support and controversy • Education, health, family, childhood, old age, employment, accessibility, youth • “In Decide Madrid, proposals aimed at some vulnerable groups (for example, people with disabilities, the elderly, the unemployed) tend to generate less citizen participation”.
  • 68. Discussion and controversy analysis Some results of the study (III) • Study of factors external to participation: correlation between the levels of support/controversy and district “statistical indicators” published as open data
  • 69. Discussion and controversy analysis Some results of the study (III) • Study of factors external to participation: correlation between the levels of support/controversy and district “statistical indicators” published as open data • The districts where the greatest number of proposals are generated are those with: • A high number of groups, neighborhood associations and consumer organizations • A more progressive position, that is, where the majority voted for PSOE and Unidas Podemos • A greater environmental commitment, that is, with more ecological associations • The districts where the most controversial proposals are generated are those with: • A higher percentage of young people • A greater number of citizens belonging to vulnerable groups, such as the elderly, young people and people with some type of disability • A higher birth rate and more childhood-related associations
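The slide does not say which correlation coefficient was used. As a hedged illustration, assuming Pearson's coefficient computed over one value per district, the calculation looks like this; the class name and the sample values in `main` are hypothetical. A rank-based coefficient such as Spearman's applies the same formula to the ranks of the values.

```java
class Correlation {
    // Pearson correlation coefficient between two samples of equal length,
    // e.g. one value per district for controversy and for an open-data indicator
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("samples must have equal length >= 2");
        }
        double mx = 0, my = 0;
        for (int i = 0; i < x.length; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= x.length;
        my /= y.length;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;  // co-variation
            sxx += dx * dx;  // variation of x
            syy += dy * dy;  // variation of y
        }
        return sxy / Math.sqrt(sxx * syy); // NaN if either sample is constant
    }

    public static void main(String[] args) {
        double[] controversy = {0.2, 0.5, 0.9};   // hypothetical per-district values
        double[] youngShare = {0.10, 0.15, 0.30}; // hypothetical indicator values
        System.out.println(pearson(controversy, youngShare));
    }
}
```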
  • 70. Discussion and controversy analysis Limitations of the study • Only the votes (for or against) given to the comments have been considered • The polarity (positive or negative) of the comments themselves should be analyzed, which would require natural language processing techniques • Only Decide Madrid, a tool restricted and adjusted to a specific participation procedure, has been analyzed • More open tools, such as online social networks (e.g., Twitter), should be considered • Proposals and discussions motivated by the political and ideological cleavages that traditionally divide Spanish society have been observed (ideological positioning on the left-right scale, religious versus secular values, traditional versus progressive, etc.) • Tools from other countries should be analyzed to obtain more generalizable conclusions • Possible biases (e.g., digital divide, political program) among the users of Decide Madrid and similar tools have not been considered
  • 71. Contents 1. E-participation 2. Decide Madrid 3. Data acquisition and processing 4. Data mining applications • Discussion and controversy analysis • Clustering and visualization • Intent-based classification 5. Information retrieval applications 6. Recommendation applications 7. Conclusions
  • 72. Clustering and visualization • Citizen collaboration through current digital participation platforms can generate large amounts of complex content, which may hide relevant citizens’ concerns, requests and initiatives diluted in isolated individual proposals • We present an interactive data mining tool for visualizing and analyzing citizen participation data • Applying natural language processing, text similarity and graph clustering techniques • Grouping proposals with common objectives • Identifying trends and recurrent topics of interest • Filtering and presenting information according to several criteria • The tool is flexible, able to process different sources of data, and lightweight, as it uses simple data structures and dynamic HTML-based visualization and interaction
  • 73. Clustering and visualization • The tool is built upon the Tableau data visualization software (https://www.tableau.com/resource/data-visualization) • Lightweight • Easy to configure • Several visualization functionalities - Bar charts - Heat maps - Time series graphs
  • 74. Clustering and visualization • Distribution of proposals, categories and topics according to: • Time (year, month) and location (district) • Support, discussion and controversy levels • Diverse temporal and geographical analyses • Better and easier extraction of patterns and insights when analyzing the published citizen-generated content
  • 75. Clustering and visualization • Text processing • Mistake correction - Dictionary - Levenshtein distance • Special character removal • Stopword removal • Word lemmatization • Document similarity • Word Mover’s Distance (WMD), which treats text documents as weighted point clouds of word embeddings
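Dictionary-based mistake correction with the Levenshtein distance can be sketched as follows: a word that is not in the dictionary is replaced by its nearest dictionary entry when that entry is close enough. The maximum-distance threshold, the class name and the toy dictionary are hypothetical, since the slide gives no such details. The sample misspelling is one of the tags in the taxonomy above.

```java
import java.util.List;

class SpellCorrector {
    // Levenshtein distance: minimum number of single-character insertions,
    // deletions and substitutions turning a into b (two-row dynamic programming)
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from "" to b[0..j)
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;                                     // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(prev[j] + 1, cur[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the current row becomes the previous one
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Replaces a word with its closest dictionary entry, if one is within maxDistance edits
    static String correct(String word, List<String> dictionary, int maxDistance) {
        if (dictionary.contains(word)) return word;
        String best = word;
        int bestDistance = maxDistance + 1;
        for (String entry : dictionary) {
            int d = levenshtein(word, entry);
            if (d < bestDistance) {
                best = entry;
                bestDistance = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of("bibliotecas", "biblioteca", "alumnos"); // hypothetical
        System.out.println(correct("bibiotecas", dictionary, 2)); // prints bibliotecas
    }
}
```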
Clustering and visualization
• Document clustering
- Weighted graph: nodes are citizen proposal documents, and edges are weighted with document similarity values
- Removal of edges with “low” weights
• Louvain clustering method
- Optimizes the modularity of the graph, iteratively associating nodes to clusters until convergence
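The graph construction and clustering steps can be sketched as follows. This is a simplified illustration, not the tool’s code: the pairwise similarities and the pruning `threshold` are hypothetical inputs, and only Louvain’s first phase (greedy local moving of nodes to the neighbouring community with the largest modularity gain) is implemented; full Louvain then aggregates each community into a super-node and repeats.

```python
def build_similarity_graph(similarities, threshold):
    """Undirected weighted graph {node: {neighbour: weight}}; edges below threshold are dropped."""
    adj = {}
    for (u, v), weight in similarities.items():
        adj.setdefault(u, {})
        adj.setdefault(v, {})
        if weight >= threshold:
            adj[u][v] = weight
            adj[v][u] = weight
    return adj


def louvain_local_moving(adj, max_passes=100):
    """Single-level Louvain local-moving phase; returns {node: community label}."""
    community = {node: node for node in adj}            # start from singleton communities
    degree = {node: sum(adj[node].values()) for node in adj}
    m = sum(degree.values()) / 2                         # total edge weight
    if m == 0:
        return community
    total = dict(degree)                                 # sum of node degrees per community
    for _ in range(max_passes):
        moved = False
        for node in adj:
            own = community[node]
            # Edge weight from `node` to each neighbouring community
            links = {}
            for neighbour, weight in adj[node].items():
                if neighbour != node:
                    c = community[neighbour]
                    links[c] = links.get(c, 0.0) + weight
            total[own] -= degree[node]                   # take the node out of its community
            # Modularity gain of inserting the node into community c, scaled by m:
            # k_in(c) - total(c) * k / (2m)
            best = own
            best_gain = links.get(own, 0.0) - total[own] * degree[node] / (2 * m)
            for c, k_in in links.items():
                gain = k_in - total[c] * degree[node] / (2 * m)
                if gain > best_gain:
                    best, best_gain = c, gain
            total[best] += degree[node]
            if best != own:
                community[node] = best
                moved = True
        if not moved:                                    # converged: no move improves modularity
            break
    return community


sims = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.85,
        ("d", "e"): 0.9, ("d", "f"): 0.8, ("e", "f"): 0.95,
        ("c", "d"): 0.2}                                 # weak link, pruned below
graph = build_similarity_graph(sims, threshold=0.5)
print(louvain_local_moving(graph))
```

With the weak a–f bridge pruned, the two groups of mutually similar proposals end up in separate communities. Community labels are arbitrary node names. A full implementation is available in libraries such as NetworkX.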
Clustering and visualization
• A coproduction functionality based on retrieving existing similar proposals
• A citizen interested in submitting a new proposal can first bring it into the tool and check whether related proposals already exist
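A minimal sketch of the “check for related proposals” step. To stay self-contained, it swaps the WMD similarity used by the tool for a plain Jaccard overlap of word sets. The `k` and `min_similarity` parameters and the sample proposals are hypothetical.

```python
import re


def word_set(text):
    """Lower-cased word set (stand-in for the full text processing pipeline)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; a simple substitute for WMD similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_proposals(draft, proposals, k=3, min_similarity=0.15):
    """Return up to k (proposal id, similarity) pairs above min_similarity, best first."""
    draft_words = word_set(draft)
    scored = [(jaccard(draft_words, word_set(text)), pid) for pid, text in proposals.items()]
    scored = [(score, pid) for score, pid in scored if score >= min_similarity]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(pid, round(score, 3)) for score, pid in scored[:k]]


existing = {
    "p1": "Create a bike lane in Madrid Rio",
    "p2": "More trees in city parks",
    "p3": "Bike lane along Castellana",
}
print(related_proposals("Please create a bike lane near Madrid Rio", existing))
```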
Contents
1. E-participation
2. Decide Madrid
3. Data acquisition and processing
4. Data mining applications
• Discussion and controversy analysis
• Clustering and visualization
• Intent-based classification
5. Information retrieval applications
6. Recommendation applications
7. Conclusions
Intent-based classification
• Social networks are a prominent bidirectional communication channel between citizens and government
• Citizens are…
- content consumers, who receive government announcements and react and respond freely according to their personal ideology, interests and needs, and
- content providers, who generate a wide range of messages targeted at government and political stakeholders
• The amount of social media content generated daily by citizens is huge and diverse, and processing it manually may be too costly and overwhelming
Intent-based classification
• There is an increasing interest in, and need for, computer-assisted solutions capable of automatically gathering, processing and analyzing the information underlying citizens’ messages (a.k.a. posts) on social networks
• The research literature reports extensive work on:
- analyzing social phenomena produced through online network structures (e.g., information spreading, fake news and opinion polarity), mainly originated by particular events (e.g., natural disasters, elections and trending news)
- extracting the most popular topics addressed by citizens’ posts in social networks, as well as the general dynamics (i.e., temporal evolution) of and opinions on such topics
Intent-based classification
• Unlike previous work, we go beyond topic extraction by attempting to automatically classify citizens’ posts (tweets) according to their intents or purposes
1. Complaint: stating something that is unsatisfactory or unacceptable
- “@MADRID after 1 week of calling, the city is yet not clean, and the rats are taking over!! http://t.co/IiIDuaPFG9”
2. Announcement: making a public statement about a fact, occurrence or event
- “The date, place and schedule of the Festival activities in La Latina have already been confirmed http://t.co/U0tRwKAC @madrid @madridiario”
3. News item: objectively informing about current events
- “#oladecalor #aemet @Madrid has suffered its warmest night within the latest 100 years http://t.co/ZSjeqK6m”
4. Personal fact: publicizing personal issues and experiences
- “I also support the candidature from @Madrid2020ES @MADRID #aporella”
5. Opinion: expressing subjective opinions about the city, its events, activities, etc.
- “The activity of #emprendeenmadrid is amazing. Congratulations @MADRID and greetings from an entrepreneur”
6. Request: explicitly asking for something specific
- “Very nice but impossible to ride a bike at normal speed #MadridRio. Please @MADRID create a bike lane with cyclist priority”
7. Notification: reporting or giving notice of urban, citizenship- or government-related issues, so that the government can act on them quickly and other citizens are helped
- “@MADRID can you fix this gap in San Bernardino street 8-10 before someone gets hurt? http://lockerz.com/s/117566458”
8. Question: explicitly asking for information
- “@MADRID could you please give me the telephone number of the press office of the Madrid city hall”
9. Proposal: suggesting an initiative or project
- “There is a collection of used oil in the center of Alicante. It would be fantastic to have something similar @MADRID”
Intent-based classification
• To automatically categorize a tweet into one of the previous intents (labels), the tweet is first transformed into a feature vector
• We consider 37 domain- and language-independent features to describe the content of a tweet, among them:
• Lexical features
- number of characters
- number of words
- number of exclamation marks
- number of question marks
- existence of a positive emoticon
- existence of a negative emoticon
- existence of a vowel (or “y”) consecutively repeated 3 or more times in a word
• Grammatical features
- number of nouns
- number of proper nouns
- number of adjectives
- number of verbs
- number of adverbs
- number of personal/possessive pronouns
- number of time references (entities)
- number of money-related references
• Social network-based features
- number of followers
- number of friends (a.k.a. followees)
- number of posts
- number of active days in Twitter
- number of hashtags (#)
- number of user mentions (@)
- number of hyperlinks
- number of multimedia items
- maximum hashtag length
- existence of an explicit retweet request (i.e., "RT" abbreviation)
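Many of these features can be computed from the tweet text alone with regular expressions. The sketch below covers the lexical features and the text-derived social network features. The emoticon sets and regexes are illustrative assumptions. Grammatical features would need a part-of-speech tagger and entity recognizer, and follower, friend, post and active-day counts come from the author’s profile metadata, so both are left out.

```python
import re

# Hypothetical, deliberately small emoticon inventories
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}

REPEATED_VOWEL = re.compile(r"([aeiouy])\1{2,}", re.IGNORECASE)  # e.g. "sooo"
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")
RT_REQUEST = re.compile(r"\bRT\b")


def tweet_features(text):
    """Text-derived subset of the tweet features (words = whitespace-separated tokens)."""
    tokens = text.split()
    hashtags = HASHTAG.findall(text)
    return {
        "n_chars": len(text),
        "n_words": len(tokens),
        "n_exclamations": text.count("!"),
        "n_questions": text.count("?"),
        "has_pos_emoticon": any(t in POSITIVE_EMOTICONS for t in tokens),
        "has_neg_emoticon": any(t in NEGATIVE_EMOTICONS for t in tokens),
        "has_repeated_vowel": bool(REPEATED_VOWEL.search(text)),
        "n_hashtags": len(hashtags),
        "n_mentions": len(MENTION.findall(text)),
        "n_hyperlinks": len(URL.findall(text)),
        "max_hashtag_len": max((len(h) - 1 for h in hashtags), default=0),  # without "#"
        "has_rt_request": bool(RT_REQUEST.search(text)),
    }


print(tweet_features("Please @MADRID create a bike lane!!! #MadridRio :) http://t.co/abc"))
```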
Intent-based classification
• To validate the proposed approach, we evaluated several machine learning algorithms on a labeled dataset:
- K-Nearest Neighbors (KNN)
- Logistic Regression (LR)
- Quadratic Discriminant Analysis (QDA)
- Decision Tree (DT), executed alone and in combination with feature selection (RFECV DT) and tree pruning (AP DT) to avoid overfitting
- Gaussian Process (GP)
- Support Vector Machine (SVM)
- Bagging Ensemble (BE)
Intent-based classification
• Dataset: a random sample of 666 tweets mentioning the @Madrid account, each manually labeled by 3 researchers (almost perfect agreement: Fleiss' kappa = 0.98)
• 9 binary classification problems, one-against-all (i.e., a single classifier is trained per label)
• Classification metrics
- acc (accuracy)
- acc+ (minority class accuracy)
- acc– (majority class accuracy)
• Observations on the results:
- (very) unbalanced classification problems
- (misleading) high classification accuracies
- reasonably good accuracy balance for the two labels
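The one-against-all setup and the three metrics can be sketched as below, assuming (as in this dataset) that the positive class of each binary problem is the minority class. The toy example shows why plain accuracy is misleading here: a classifier that never predicts the minority label still scores 90% accuracy, while its acc+ is 0.

```python
def one_vs_all(labels, target):
    """Binary labels for one intent: 1 for the target intent, 0 for any other."""
    return [1 if label == target else 0 for label in labels]


def balanced_report(y_true, y_pred):
    """acc, acc+ (accuracy on positives, the minority class) and acc- (on negatives)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    return {
        "acc": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "acc+": sum(p == 1 for p in pos) / len(pos) if pos else float("nan"),
        "acc-": sum(p == 0 for p in neg) / len(neg) if neg else float("nan"),
    }


labels = ["COM"] + ["OPI"] * 9          # toy data: 1 complaint, 9 opinions
y_true = one_vs_all(labels, "COM")
always_negative = [0] * len(y_true)     # trivial majority-class classifier
print(balanced_report(y_true, always_negative))
```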
Intent-based classification
• Most discriminating words and features for each of the considered intents
- COM = complaint, ANN = announcement, REQ = request, NEW = news item, FAC = personal fact, OPI = personal opinion
Intent-based classification
• The proposed intent-based classification is a task prior to the extraction of topics and opinions; it may help filter and prioritize citizens’ messages, and further automate processes for more efficient and effective decision and policy making
• There is room for improvement:
- More sophisticated NLP techniques, such as language models and word embeddings, could be used to exploit the semantics of words and word sequences, e.g., “opinion is” and “really think that” could be identified as informative bigrams and trigrams of the personal opinion intent
- Features could be drawn from other sources of information, such as the user who creates a post and the user(s) mentioned in it, e.g., by considering their types: citizens, neighborhood associations, organizations or political actors
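As a starting point for the n-gram idea, word bigrams and trigrams such as “really think that” can be counted per intent. This sketch uses simple whitespace tokenization, which is an assumption. Comparing the counts obtained for opinion tweets against those for the other intents would then surface candidate informative n-grams.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token sequences (n=2 gives bigrams, n=3 trigrams)."""
    return list(zip(*(tokens[i:] for i in range(n))))


def ngram_counts(texts, n_max=3):
    """Counts of all 1..n_max-grams over a collection of texts."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        for n in range(1, n_max + 1):
            counts.update(" ".join(gram) for gram in ngrams(tokens, n))
    return counts


opinion_tweets = ["I really think that bikes help", "I really think that parks matter"]
print(ngram_counts(opinion_tweets).most_common(3))
```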
Contents
1. E-participation
2. Decide Madrid
3. Data acquisition and processing
4. Data mining applications
5. Information retrieval applications
• Argument mining in a nutshell
• Argument-based document search
• Argument-based conversational information access
• Neural network-based argument extraction
6. Recommendation applications
7. Conclusions
Argument mining in a nutshell
• Tasks
- Detection of argument text fragments
- Identification of argument components
- Extraction of argument relations
• Algorithmic foundations
- Natural Language Processing (NLP)
- Machine/deep learning
• Linguistic features
- Sentence-level (e.g., sentence length, argument linkers), grammatical (e.g., number of nouns, adjectives, modal verbs), syntactic (e.g., patterns, constituency tree depth) and semantic (e.g., named entities, word embeddings)
Source: ACL’16 tutorial “NLP Approaches to Computational Argumentation”
Argument mining in a nutshell
• Tasks
1. Detection of arguments
2. Identification of argument components and structures
3. Extraction of argument relations
Source: ACL’16 tutorial “NLP Approaches to Computational Argumentation”
Argument mining in a nutshell
• Example: Categorization of argumentative components via machine learning
• Classes
- “Major claim”, “Claim”, “Premise”
• Features
- Lexical: lemmatized unigrams, including previous tokens
- Syntactic: number of nested phrases, depth of the syntactic tree, POS distribution, tense of the main verb, modal verbs
- Structural: first or last sentence of a paragraph, presence in the introduction or conclusion, relative position, number of tokens, etc.
- Indicators: connectors such as “because”, “however”, “as a result”, etc.
- Contextual: contextualized connectors, number of words shared by introduction and conclusion
- Probabilistic: conditional probability P(category | previous tokens)
- Discourse: discourse relations based on the Penn Discourse Treebank
- Embeddings: 300-dimensional vectors trained on the Google News corpus
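The structural and indicator features are the easiest to reproduce. A minimal sketch for the sentences of one paragraph follows. The connector list is a small hypothetical sample, and the other feature groups (syntactic, discourse, embeddings) would require a parser and pretrained resources.

```python
import re

# Hypothetical sample of indicator connectors
CONNECTORS = ("because", "however", "as a result", "therefore", "in my opinion")


def sentence_features(paragraph_sentences):
    """Structural and indicator features for each sentence of a paragraph."""
    n = len(paragraph_sentences)
    rows = []
    for i, sentence in enumerate(paragraph_sentences):
        lowered = sentence.lower()
        rows.append({
            "first_in_paragraph": i == 0,
            "last_in_paragraph": i == n - 1,
            "relative_position": i / (n - 1) if n > 1 else 0.0,
            "n_tokens": len(re.findall(r"\w+", sentence)),
            # Whole-word/phrase matches only, so "because" does not fire inside other words
            "connectors": [c for c in CONNECTORS
                           if re.search(rf"\b{re.escape(c)}\b", lowered)],
        })
    return rows


paragraph = [
    "Public transport should allow pets.",
    "However, some passengers are allergic.",
    "As a result, dedicated carriages are needed.",
]
for row in sentence_features(paragraph):
    print(row)
```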
Argument mining in a nutshell
• Example: Categorization of argumentative components via machine learning
- Using all features yields the best F1 values
- The classification of claims is the most difficult task
- The structural features are the most valuable
- The discourse features are informative for identifying claims
- The word embeddings achieve results similar to those of the lexical features
Source: ACL’16 tutorial “NLP Approaches to Computational Argumentation”
Argument mining in a nutshell
• Corpora
• AIFdb: repository of databases following the Argument Interchange Format (AIF)
- AraucariaDB: news editorials, parliamentary records, court summaries and panel discussions
- MM2012: transcripts from BBC Radio 4
- …
• The Internet Argument Corpus (IAC): set of political debates in Internet forums
• The ECHR Corpus: collection of documents extracted from legal texts of the European Court of Human Rights
• The Argument Annotated Essays Corpus (AAEC): collection of persuasive essays
• …
Argument mining in a nutshell
• Tools
• Collaborative editors of argumentative graphs
- Agora, http://agora.gatech.edu
- Argunet, http://www.argunet.org
- DebateGraph, http://debategraph.org
- Rationale Online, https://www.rationaleonline.com
• Argumentative annotation platforms
- Araucaria, http://araucaria.arg.tech
- OVA, http://ova.arg-tech.org
Argument mining in a nutshell
• Events
- International Conference on Computational Models of Argument (COMMA), https://comma2020.dmi.unipg.it
- Workshop on Argument Mining (ArgMining), https://2021.argmining.org
- Workshop on Computational Models of Natural Argument (CMNA), http://cmna.csc.liv.ac.uk/CMNA20
- Summer School on Argumentation (SSA), https://ssa2020.dmi.unipg.it
- ACL’19 tutorial “Advances in Argument Mining”, http://arg.tech/~chris/acl2019tut/index.html
- ACL’16 tutorial “NLP Approaches to Computational Argumentation”, http://acl2016tutorial.arg.tech
- Online Seminars on Computational Models of Argument, https://sites.google.com/view/argumentation-seminar
- Dagstuhl’16 seminar “Natural Language Argumentation: Mining, Processing, and Reasoning over Textual Arguments”, https://www.dagstuhl.de/16161
- BiCi’14 seminar “Frontiers and Connections between Argumentation Theory and Natural Language Processing”, http://www-sop.inria.fr/members/Serena.Villata/BiCi2014/frontiersARG-NLP.html
Argument-based document search
• Proposed framework
Argument-based document search
• Argument model
- Premise → Claim → Major claim
• Types and subtypes of argument relations
- Cause: linking an argument that reflects the reason or condition for another argument
- Clarification: introducing a conclusion, exemplification, restatement or summary of an argument
- Consequence: evidencing an explanation, goal or result of a previous argument
- Contrast: attacking arguments, distinguishing between alternatives, comparisons, concessions and oppositions
- Elaboration: introducing an argument that provides details about another one, entailing addition, precision or similarity with respect to the target argument
• Argument mining methods
- Syntactic pattern matching
- Feature-based machine learning classification
- Embedding-based deep neural network
Argument-based document search
• Heuristic algorithm
- For each sentence of an input text, look for certain syntactic patterns that introduce argumentative expressions
• 1,744 arguments extracted from 5,633 comments
- Contrast: 54.1%
- Consequence: 12.1%
- Cause: 3.6%
- Elaboration: 0.1%
Argument-based document search
• Argument linkers
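The heuristic algorithm above (per-sentence matching of argument linkers) can be sketched as follows. The English linker lexicon is a hypothetical sample mapped to the five relation types; the actual system’s linker list is not reproduced here. The sketch only handles linkers inside a sentence, splitting it into a claim (before the linker) and a premise (after it).

```python
import re

# Hypothetical linker lexicon: linker -> argument relation type
LINKERS = {
    "because": "cause", "since": "cause",
    "for example": "clarification", "in other words": "clarification",
    "therefore": "consequence", "so that": "consequence",
    "however": "contrast", "but": "contrast",
    "in addition": "elaboration", "moreover": "elaboration",
}
# Longest linkers first, so multi-word linkers are preferred in the alternation
LINKER_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(l) for l in sorted(LINKERS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def extract_arguments(text):
    """Return claim/linker/premise/relation records found sentence by sentence."""
    arguments = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        match = LINKER_PATTERN.search(sentence)
        if not match:
            continue
        claim = sentence[:match.start()].strip(" ,;")
        premise = sentence[match.end():].strip(" ,;.")
        if claim and premise:  # sentence-initial linkers (link to a previous sentence) are skipped
            linker = match.group(1).lower()
            arguments.append({"claim": claim, "linker": linker,
                              "premise": premise, "relation": LINKERS[linker]})
    return arguments


comment = ("Pets should be allowed on the metro because many owners cannot drive. "
           "I like parks. Buses are cheap, but they are often late.")
for arg in extract_arguments(comment):
    print(arg)
```

Each record is a plain dictionary, so it can be serialized with `json.dumps` in the spirit of the JSON argument objects described below.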
Argument-based document search
• Information retrieval
- Text processing
- NLP for linguistic feature extraction
- Indexing based on keywords, topics, categories, entities and other metadata
- Search engine based on the vector space model
- Argument-based reranking according to controversy metrics
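A minimal two-stage sketch of this pipeline: TF-IDF vector space retrieval with cosine similarity, followed by reranking. The slides do not define the controversy metrics, so each document here gets a hypothetical precomputed controversy score in [0, 1], and the blending weight `alpha` is likewise an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """TF-IDF vectors {doc_id: {term: weight}} and the idf table."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter(term for counts in term_counts.values() for term in counts)
    # Smoothed idf (+1) so terms present in every document keep a small weight
    idf = {term: math.log(len(docs) / freq) + 1.0 for term, freq in df.items()}
    vectors = {doc_id: {t: tf * idf[t] for t, tf in counts.items()}
               for doc_id, counts in term_counts.items()}
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def search(query, docs, controversy, alpha=0.3, k=10):
    vectors, idf = tfidf_vectors(docs)
    query_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    relevance = {doc_id: cosine(query_vec, vec) for doc_id, vec in vectors.items()}
    # Stage 1: top-k documents by textual relevance (zero-score documents are dropped)
    candidates = sorted((d for d, s in relevance.items() if s > 0),
                        key=lambda d: -relevance[d])[:k]

    # Stage 2: rerank by blending relevance with the (hypothetical) controversy score
    def blended(d):
        return (1 - alpha) * relevance[d] + alpha * controversy.get(d, 0.0)

    return sorted(candidates, key=blended, reverse=True)


proposals = {"p1": "pets on public transport", "p2": "pets in parks", "p3": "new bike lanes"}
controversy = {"p1": 0.1, "p2": 0.9}
print(search("pets transport", proposals, controversy, alpha=0.0))  # pure relevance
print(search("pets transport", proposals, controversy, alpha=0.6))  # controversy-weighted
```

A larger `alpha` promotes more controversial proposals (here `p2`) over the textually closest one.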
Argument-based document search
• Outcomes: arguments
- JSON object created for an argument that evidences a contrast premise on a proposal in favor of allowing pets on Madrid public transport
Argument-based document search
• Outcomes: documents, topics and arguments
Argument-based document search
• Dataset
- 80 proposals (covering 10 categories and having high controversy) and 5,633 comments
• Experimental setting
- 3 evaluators
- 3 queries
• Topical relevance: accuracy of an argument with respect to the major claim of the discussion
- 14.6% of the arguments were labeled as very relevant
- 39.9% as relevant
- 36.9% as not relevant
- 8.6% as incorrect
• Rhetorical quality: effectiveness of an argument in persuading an audience
- 17.1% of the arguments were of high quality
- 40.6% of sufficient quality
- 42.3% of low quality
Argument-based document search
• We have presented a general and flexible argument-based search framework
• Preliminary implementation and evaluation on a dataset with citizen proposals and discussions generated in an online participatory platform
• Its current implementation includes:
- Various argument extraction methods (heuristic pattern matching, feature-based machine learning, embedding-based deep learning)
- A document retrieval engine built upon vector space models
- A reranking strategy that exploits certain controversy metrics
• We envision several open research lines:
- Development of ad hoc argument-based document retrieval methods (so far, we have used a reranking technique)
- Consideration of alternative notions of controversy
- Increasing the size and quality of the generated corpus
- Evaluation on other datasets and domains
- Measurement of additional argument quality metrics, e.g., based on diversity, fairness, persuasiveness, etc.
Argument-based conversational information access
• E-participation, understood as the computer-assisted support of citizen participation, has given rise to novel consultation and deliberation processes
• Most current e-participation platforms are based on web forums
- Citizens make proposals and provide comments and opinions, forming large conversation threads
• Attention has recently shifted to social media, especially social networks (e.g., Facebook and Twitter) and instant messaging tools (e.g., Telegram and WhatsApp)
Argument-based conversational information access
• Conventional web forums promote social interaction
• Pros
- Easy and fast content generation (through free-text posts)
- Smooth, large-scale interaction (via comment threads)
• Cons
- No or very limited functionalities for content organization, filtering and analysis
- Dispersed and redundant content, since it is structured by time
- Challenging processing of discussions
• Argument-driven tools promote the production and reuse of collective knowledge
Argument-based conversational information access
• Our work on e-participation…
• addresses 2 promising research lines
- The exploitation of argument mining techniques to automatically extract and present argumentative information from citizen-generated content
- The use of conversational agents, or chatbots, as citizen-to-government communication channels in instant messaging applications
• targets a final goal
- Helping users find out about and understand city problems and citizens’ concerns, and consequently form well-grounded opinions for making better decisions in participatory processes
Argument-based conversational information access
• The ‘Decide Madrid’ e-participation platform, https://decide.madrid.es
- A web system designed to allow Madrid residents to make, debate and vote on proposals for the city
• Available data from a citizen proposal
- Title
- Author, date
- Summary, description
- Freely chosen tags
- User comment threads
• Heterogeneous topics and discussions
- urbanism, transport, environment, health care, education, social rights, culture, economy, jobs, politics, security, housing, family, old age, religion, animals, etc.
Argument-based conversational information access
• Argument model
- The same model used for argument-based document search: Premise → Claim → Major claim
- Argument relation types: Cause, Clarification, Consequence, Contrast and Elaboration
Argument-based document search
• Example of an extracted argument tree
- C = claim, L = linker, P = premise
Argument-based conversational information access
• Through a natural language conversation with the chatbot, the user can:
1. explore citizen proposals and comments, organized by categories, topics and districts
2. access categorized citizens’ arguments given in the debates around a proposal
3. provide feedback and votes for proposals
Argument-based conversational information access
• The chatbot is built upon the Google Dialogflow framework, which links external web services with a variety of instant messaging and social networking services, e.g., Google Assistant, Facebook Messenger, WhatsApp, Telegram and Skype
Argument-based conversational information access
• The chatbot handles several conversation intents, each with triggering sentence patterns and associated functionalities
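To illustrate how triggering sentence patterns map a user utterance to an intent, here is a toy regex-based matcher. It is not how Dialogflow works internally (Dialogflow matches utterances against training phrases with machine learning). The intent names and patterns are hypothetical, and the first intent whose pattern matches wins.

```python
import re

# Hypothetical intents and triggering patterns (checked in insertion order)
INTENTS = {
    "browse_proposals": [r"\b(show|list|browse)\b.*\bproposals?\b",
                         r"\bproposals? (about|in|on)\b"],
    "show_arguments": [r"\barguments?\b", r"\b(pros|cons)\b",
                       r"\bwhy\b.*\b(support|oppose)\b"],
    "vote_proposal": [r"\b(vote|support)\b.*\bthis\b",
                      r"\bi (like|agree with) (it|this)\b"],
}
FALLBACK = "fallback"


def detect_intent(utterance):
    """Return the first intent with a matching pattern, or the fallback intent."""
    text = utterance.lower()
    for intent, patterns in INTENTS.items():
        if any(re.search(p, text) for p in patterns):
            return intent
    return FALLBACK


for message in ["Show me proposals about transport",
                "What are the arguments against it?",
                "I want to vote for this one",
                "Hello"]:
    print(message, "->", detect_intent(message))
```

In the actual chatbot, each matched intent would trigger its associated functionality, e.g. a call to an external web service that returns proposals or arguments.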
Argument-based conversational information access
• User study: empirical evaluation of the chatbot in terms of:
1. The feasibility of exploring e-participation content via a conversational interface
2. The potential benefits of argument-driven information in e-participation
• Uncontrolled, realistic scenario
- Without external supervision, participants freely tested the chatbot via Telegram for one week, using their own Telegram accounts and mobile devices
• 32 participants → 2 groups
- Control group: with the chatbot’s argument-driven browsing functionalities disabled
- Experimental group: with the chatbot’s argument-driven browsing functionalities enabled
Argument-based conversational information access
• Study questionnaire
- 33 items
- 10 evaluation criteria
- Citizen participation
- Decision making
- Public values
Argument-based conversational information access
• 32 participants
- Gender: 22 male, 10 female
- Ages: 18-29 years old (12), 30-39 years old (9), 40-49 years old (5), 50-59 years old (4), more than 59 years old (2)
- Education levels: secondary education (3), vocational education (1), Bachelor’s degree (20), Master’s degree (6), Doctoral degree (2)
- Participants with higher education had studied Sciences (3), Social Sciences (10), Arts and Humanities (4) and Engineering (11)
- Diverse levels of knowledge/expertise on chatbots: null knowledge and expertise (5), null expertise (5), low expertise (20), medium expertise (2)
- Diverse levels of knowledge on citizen participation: null (7), low (16), medium (9)
Argument-based conversational information access
• Objective metrics
• Subjective questionnaires
  • 128. Argument-based conversational information access
    • More user activity
      • No significant difference between groups in the avg. number of sessions per user
      • Longer sessions in the experimental group
        - Increase of 45.6% in the avg. session duration (from 16.0 to 23.3 minutes)
        - Increase of 14.3% in the avg. number of actions per user (from 56.8 to 64.9)
    • Higher user engagement and persuasiveness
      • Increase of 23.5% in the avg. number of feedback actions per user (from 1.7 to 2.1)
      • Meaningful exploration of arguments (avg. 7.4 actions per user)
    • Better user opinions
      • About the chatbot: highly efficient, quite effective, moderately easy to use
      • About the argumentative information: higher perception of transparency and fairness
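As a quick arithmetic check, each relative increase above is (experimental − control) / control × 100, computed from the two group averages. The minimal Python sketch below recomputes the three reported percentages; the function name is illustrative, and because the slide reports already-rounded averages, recomputed values could in principle differ from published figures in the last decimal.

```python
def relative_increase(before: float, after: float) -> float:
    """Percentage change from the control-group value to the experimental-group value."""
    return (after - before) / before * 100


# (control, experimental) averages as reported on the slide
reported = {
    "avg. session duration (min)": (16.0, 23.3),
    "avg. actions per user": (56.8, 64.9),
    "avg. feedback actions per user": (1.7, 2.1),
}

for metric, (control, experimental) in reported.items():
    print(f"{metric}: +{relative_increase(control, experimental):.1f}%")
```

Rounded to one decimal, the three results are 45.6%, 14.3% and 23.5%, matching the slide.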
  • 129. Argument-based conversational information access
    • Participants’ suggestions
      - A more “natural” conversation with the chatbot
      - A smoother transition between browsed proposals
      - Facilities for reading proposals with long descriptions
    • Future research directions
      - Personalized recommendation mechanisms that proactively present relevant content to the user, thus mitigating the information overload problem
      - Richer data structures, analyses and visualizations to facilitate decision making
      - Functionalities oriented to citizen collaboration
      - Integration of external data sources, such as open government data and news items
  • 130. Contents
    1. E-participation
    2. Decide Madrid
    3. Data acquisition and processing
    4. Data mining applications
    5. Information retrieval applications
       • Argument mining in a nutshell
       • Argument-based document search
       • Argument-based conversational information access
       • Neural network-based argument extraction
    6. Recommendation applications
    7. Conclusions
  • 131. Neural network-based argument extraction
    • Argument retrieval aims at automatically extracting the structured argumentative information contained in a text corpus
    • It has commonly been modeled as a pipeline of three tasks: argument segmentation, argument component classification, and argument relation recognition
    • We investigate the application of transformer-based deep learning to address these tasks jointly, as a single end-to-end sequence tagging problem
  • 132. Neural network-based argument extraction
    Deep neural network architecture
    • 1st block: BETO language model
      - A BERT-based model trained on a Spanish corpus including Wikipedia articles, legal texts, and TED Talks transcripts
      - 12 encoders with a hidden layer size of 768 units and 12 self-attention heads
    • 2nd block: generic feed-forward neural network layers
    • 3rd block: task-specific layers that address the following argument mining tasks
      - Identification of argumentative units (BIO tagging task)
      - Classification of argumentative components: premise, claim, major claim, empty
      - Recognition of argumentative relations: 17 subtypes of the 2-level taxonomy
      - Classification of argumentative relation intents: support, attack, empty
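The BIO tagging task can be illustrated with a minimal sketch that turns annotated argument-component spans into per-token labels: B- marks the first token of a unit, I- each following token, and O any token outside a unit. Prefixing the component type (claim, premise, …) merges unit identification and component classification into one tagging problem. The span format `(start, end_exclusive, type)`, the function `spans_to_bio` and the example sentence are hypothetical; they are not the format used by the authors’ pipeline or by ARGAEL, and spans are assumed not to overlap.

```python
from typing import List, Tuple

# Hypothetical annotation format: (first_token, end_token_exclusive, component_type)
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping argument-component spans into per-token BIO tags."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# "The bike lane must be extended because it reduces traffic" (illustrative citizen comment)
tokens = "Hay que ampliar el carril bici porque reduce el tráfico".split()
spans = [(0, 6, "claim"), (7, 10, "premise")]
for token, tag in zip(tokens, spans_to_bio(len(tokens), spans)):
    print(f"{token}\t{tag}")
```

Here the connective “porque” stays outside both units (tag O), and the claim and premise tokens receive B-/I- tags of their own type.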
  • 133. Neural network-based argument extraction
    • Input
      - Annotated sentences from citizen comments
    • Deep neural network configuration
  • 134. Neural network-based argument extraction
    • ARGAEL: ARGument Annotation and Evaluation tooL
      - Simple annotation view: the user identifies argument components and relations (and their types)
  • 135. Neural network-based argument extraction
    • ARGAEL: ARGument Annotation and Evaluation tooL
      - Assisted annotation view: the user has access to others’ argument annotations
  • 136. Neural network-based argument extraction
    • ARGAEL: ARGument Annotation and Evaluation tooL
      - Evaluation view: the user evaluates others’ argument annotations
  • 137. Neural network-based argument extraction
    • ARGAEL: ARGument Annotation and Evaluation tooL
      - Argument component (AC) annotations and evaluations
      - Argument relation (AR) annotations and evaluations
  • 138. Neural network-based argument extraction
    • ARGAEL: ARGument Annotation and Evaluation tooL
      - Some results of the argument annotation process on the Decide Madrid dataset
  • 139. Neural network-based argument extraction
    • Some preliminary results
      - Argument identification
      - Argument component classification
  • 140. Neural network-based argument extraction
    • Some preliminary results
      - Relation type classification
      - Relation intent classification
  • 141. Contents
    1. E-participation
    2. Decide Madrid
    3. Data acquisition and processing
    4. Data mining applications
    5. Information retrieval applications
    6. Recommendation applications
       • Recommender systems in a nutshell
       • Personalized recommendations
       • Context-aware recommendations
    7. Conclusions
    Disclaimer: some of the materials of this subsection were created by Prof. Pablo Castells for his information retrieval master’s course at EPS-UAM.
  • 142. Recommender systems in a nutshell
    Is it possible to help users find information without them asking for it? How can the process be customized?
  • 143. Recommender systems in a nutshell
    • Personalized recommendations
  • 144. Recommender systems in a nutshell
  • 145. Recommender systems in a nutshell
  • 146. Recommender systems in a nutshell
    • There are many ways to make recommendations
      - Spotify: https://www.music-tomorrow.com/blog/how-spotify-recommendation-system-works-a-complete-guide-2022
      - Instagram: https://ai.facebook.com/blog/powered-by-ai-instagrams-explore-recommender-system
      - Netflix: https://research.netflix.com/research-area/recommendations
        https://scale.com/blog/Netflix-Recommendation-Personalization-TransformX-Scale-AI-Insights
      - Google Play: https://deepmind.com/blog/article/Advanced-machine-learning-helps-Play-Store-users-discover-personalised-apps
  • 147. Recommender systems in a nutshell
    • It is estimated that recommendations account for…
      - 20% of sales on Amazon
      - 60% of streaming on YouTube
      - 80% of streaming on Netflix
      - ∼10% of electronic commerce
    • Recommendation has a large market to tap into
    • It seems possible to target more than ∼10% of engagement
    • Many companies aim to exploit this potential
  • 148. Recommender systems in a nutshell
    • Situations with option overload
      - 1994 → 0.5 million different products on sale in the USA
      - 2010 → 24 million products on Amazon alone
    • Recommendation = personalized IR without an explicit query
    • First initiatives published in 1992 (Tapestry at Xerox PARC)
    • Precedents: user models based on stereotypes (late 1970s)
    • Conferences: RecSys, SIGIR, ECIR, UMAP
    • Confluence with other areas: Machine Learning (ICML, ECML, IJML, etc.), Data Mining (KDD, etc.), Artificial Intelligence (IJCAI, AAAI), Human-Computer Interaction (IUI)
  • 149. Recommender systems in a nutshell
    • Non-personalized recommendations
  • 150. Recommender systems in a nutshell
    • Contextualized recommendations
  • 151. Recommender systems in a nutshell
    • Utility of recommender systems
    Jannach, D. and Adomavicius, G. 2016. Recommendations with a purpose. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys ’16), pp. 7-10.
  • 152. Recommender systems in a nutshell
    • User preferences: ratings, reviews, categorical, thumbs up / down
  • 153. Recommender systems in a nutshell
    • Personalized recommendations: problem formulation
  • 154. Recommender systems in a nutshell
    • Problem formulation
      • Input
        - A set U of users
        - A set I of items
        - An ordered set R of values, e.g., R = {1, 2, 3, 4, 5}
        - A function r : U × I → R
        - Typically, r(u,i) is a “rating” that represents user u’s assessment of item i on scale R
        - This input can be seen as a matrix of ratings
        - Most of its values (in general, 95% or more) are unknown
      • Goal
        - Predicting the values r(u,x) for items x that user u has not evaluated
        - The predicted values r(u,x) are used to decide whether to recommend x to u
        - In general, generating a sorted list of items that may be of interest to the user
        - This goal is commonly referred to as generating the “top n” recommendations
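The formulation above can be made concrete with a minimal sketch: the rating matrix is stored sparsely (unknown values are simply absent), its sparsity is the fraction of the |U| × |I| cells that are unknown, and top-n recommendation ranks the items a user has not rated by a predicted r(u,x). The predictor `item_mean_predictor` (the item’s mean known rating) is a hypothetical placeholder so that the example runs; any real recommender would replace it. User and item names are illustrative.

```python
from typing import Callable, Dict, List, Tuple

# Known ratings r(u, i) stored sparsely as {(user, item): rating}; unknown values are absent
Ratings = Dict[Tuple[str, str], int]
Predictor = Callable[[str, str], float]


def sparsity(ratings: Ratings, users: List[str], items: List[str]) -> float:
    """Fraction of the |U| x |I| rating matrix whose values are unknown."""
    return 1 - len(ratings) / (len(users) * len(items))


def item_mean_predictor(ratings: Ratings) -> Predictor:
    """Hypothetical placeholder predictor: the item's mean known rating (0.0 if none)."""
    def predict(user: str, item: str) -> float:
        values = [r for (_, i), r in ratings.items() if i == item]
        return sum(values) / len(values) if values else 0.0
    return predict


def top_n(user: str, items: List[str], ratings: Ratings, predict: Predictor, n: int) -> List[str]:
    """Rank the items the user has not rated by predicted r(u, x) and keep the n best."""
    candidates = [x for x in items if (user, x) not in ratings]
    return sorted(candidates, key=lambda x: predict(user, x), reverse=True)[:n]


users = ["ana", "luis", "marta"]
items = ["p1", "p2", "p3", "p4"]
ratings: Ratings = {("ana", "p1"): 5, ("luis", "p1"): 4, ("luis", "p2"): 2,
                    ("marta", "p2"): 1, ("marta", "p3"): 5}
print(sparsity(ratings, users, items))
print(top_n("ana", items, ratings, item_mean_predictor(ratings), n=2))  # ['p3', 'p2']
```

With 5 known values out of 12 cells, 7/12 of this toy matrix is unknown; real rating matrices are far sparser, as the slide notes.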
  • 155. Recommender systems in a nutshell
    • Problem formulation
      • Implicit user feedback (preferences)
        - No need to ask the user
        - r : U × I → {0, 1} binary, e.g., “u buys i”
          It can be treated as a particular case with R = {0, 1}
        - r : U × I → R measuring the frequency with which user u accesses item i, e.g., listening to music
          Binarized to 1 if the frequency > 0
          Applying a conversion function frequency → rating (e.g., percentiles)
        - r : U × I → P(T) for users u annotating (tagging) items x, where T is a set of tags and P(T) its power set
          It can be treated as “one tag, one vote”, but more elaborate and complex techniques can be applied on graphs of tags, items, users…
      • Timestamps
        - Frequency data: r(u,i) is a set of timestamps
        - Rating data: r(u,i) is a [rating, timestamp] pair
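The two conversions of frequency data mentioned above can be sketched as follows: binarization maps any positive frequency to 1, and a percentile-based conversion maps each count to a rating on a 1..5 scale according to where it falls among the counts. Computing percentiles per user (rather than globally) and using rating = ceil(percentile × scale) are assumptions made for this sketch; the slide only says “e.g., percentiles”. Names are illustrative.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Implicit feedback: (user, item) -> access frequency, e.g. number of times a song was played
Counts = Dict[Tuple[str, str], int]


def binarize(counts: Counts) -> Dict[Tuple[str, str], int]:
    """r(u, i) = 1 if user u accessed item i at least once, else 0."""
    return {key: 1 if c > 0 else 0 for key, c in counts.items()}


def percentile_ratings(counts: Counts, scale: int = 5) -> Dict[Tuple[str, str], int]:
    """Convert frequencies into ratings 1..scale from the percentile of each count
    among the same user's counts (assumed rule: ceil(percentile * scale))."""
    by_user: Dict[str, List[int]] = {}
    for (user, _), c in counts.items():
        by_user.setdefault(user, []).append(c)
    for values in by_user.values():
        values.sort()

    ratings = {}
    for (user, item), c in counts.items():
        values = by_user[user]
        rank = bisect_right(values, c)  # number of this user's counts <= c (at least 1)
        # integer ceiling of rank * scale / len(values), avoiding floating-point rounding
        ratings[(user, item)] = (rank * scale + len(values) - 1) // len(values)
    return ratings


plays: Counts = {("ana", "s1"): 1, ("ana", "s2"): 3, ("ana", "s3"): 10,
                 ("ana", "s4"): 50, ("ana", "s5"): 0}
print(binarize(plays))
print(percentile_ratings(plays))
```

Per-user percentiles make a heavy listener and a light listener comparable: each user’s most-played item gets the top rating regardless of the absolute count.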
  • 156. Recommender systems in a nutshell
    • Types of recommendation strategies
      • Content-based filtering (CB)
        - Item features are considered: words (for textual items), descriptors (metadata), etc.
        - Items are compared with user information collected in a preference profile
        - The user profile is long-term; it can be learned with decision trees, neural networks, etc.
      • Collaborative filtering (CF)
        - Items are opaque (their content is not used)
        - The profiles of other users with similar traits (tastes, behavior patterns, demographic data, etc.) are used to recommend items
      • Hybrid filtering: combining different recommendation strategies
        - Combining the outputs of CB and CF
        - Inserting CB elements into CF or vice versa
        - Unified models
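The first hybrid option, combining the outputs of CB and CF, can be sketched as a weighted blend: each recommender’s scores are min-max normalized to [0, 1] so they are comparable, then mixed as alpha × CB + (1 − alpha) × CF. The weighted scheme, the normalization, the weight `alpha` and the example scores are all assumptions for illustration; they are one common way to combine outputs, not the specific method of the course.

```python
from typing import Dict

Scores = Dict[str, float]  # item -> score produced by one recommender


def min_max(scores: Scores) -> Scores:
    """Rescale scores to [0, 1] so that CB and CF outputs are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def weighted_hybrid(cb: Scores, cf: Scores, alpha: float = 0.5) -> Scores:
    """Hybrid score = alpha * CB + (1 - alpha) * CF on normalized scores.
    An item missing from one recommender's output gets 0 from that recommender."""
    cb_n, cf_n = min_max(cb), min_max(cf)
    return {item: alpha * cb_n.get(item, 0.0) + (1 - alpha) * cf_n.get(item, 0.0)
            for item in set(cb) | set(cf)}


cb_scores = {"p1": 4.0, "p2": 1.0, "p3": 2.5}  # e.g. content similarity to the user profile
cf_scores = {"p1": 2.0, "p2": 4.0, "p4": 3.0}  # e.g. predicted ratings from similar users
hybrid = weighted_hybrid(cb_scores, cf_scores, alpha=0.7)
print(sorted(hybrid, key=hybrid.get, reverse=True))  # ['p1', 'p3', 'p2', 'p4']
```

Note that item p4, unknown to the CB recommender, still reaches the list through CF, which illustrates why combining strategies can improve coverage.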
  • 157. Recommender systems in a nutshell
    • Content-based filtering
  • 158. Recommender systems in a nutshell
    • Content-based filtering
      • Recommendations for each user are computed without looking at other users
      • A feature space for the items is needed → items are represented as vectors in that space
        - “Data” that describe the items, structured or unstructured, e.g., item metadata (author, place, language, categories, tags), words in the text associated with items, etc.
        - Binary, integer or real values
      • A similarity function on the feature space, e.g.,
        - Cosine similarity for numerical features
        - Jaccard similarity for binary features
      • Two very common methods: kNN-based and centroid-based
        - Many other classification-based methods can be used as well (where users essentially play the role of classes)
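A minimal sketch of the two similarity functions and of the centroid-based method follows. Assumptions: items are given as numerical feature vectors, the user profile is the mean vector of the items the user liked, and every other item is scored by cosine similarity to that profile. The proposal names and the three topic features are hypothetical.

```python
import math
from typing import Dict, List, Set

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity for numerical feature vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity for binary features, given as the sets of features that are present."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def centroid_scores(item_vectors: Dict[str, Vector], liked: List[str]) -> Dict[str, float]:
    """Centroid-based CB: the user profile is the mean vector of the liked items,
    and every other item is scored by its cosine similarity to that profile."""
    dim = len(item_vectors[liked[0]])
    profile = [sum(item_vectors[i][d] for i in liked) / len(liked) for d in range(dim)]
    return {i: cosine(v, profile) for i, v in item_vectors.items() if i not in liked}


# Hypothetical proposal features: [transport, environment, culture]
proposals = {
    "p_bike": [1.0, 1.0, 0.0],
    "p_parks": [0.0, 1.0, 0.0],
    "p_museum": [0.0, 0.0, 1.0],
    "p_metro": [1.0, 0.0, 0.0],
}
print(centroid_scores(proposals, liked=["p_bike", "p_parks"]))
print(jaccard({"transport", "environment"}, {"environment", "culture"}))
```

For a user who liked the bike-lane and parks proposals, the metro proposal (sharing the transport feature) scores above the museum proposal, which shares no feature with the profile and scores 0.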
  • 159. Recommender systems in a nutshell
    • Content-based filtering: kNN-based
      • Adaptation of the kNN classification algorithm
        - In classification, r(u,i) would be binary
        - Items (“instances”) are ranked for each user (“class”), rather than the opposite
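The kNN adaptation can be sketched as follows, assuming items are described by feature vectors and the user has rated some of them. To score a candidate item, the sketch takes the k rated items whose content is most similar to it and returns the similarity-weighted average of the user’s ratings for those neighbours; candidates are then ranked per user. This prediction rule is one common variant chosen for illustration, not necessarily the exact formulation of the course; names, features and ratings are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two item feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_cb_score(user_ratings: Dict[str, float], item_vectors: Dict[str, Vector],
                 target: str, k: int = 2) -> float:
    """Predict r(u, target) as the similarity-weighted average of the user's ratings
    for the k rated items whose content is most similar to the target item."""
    neighbours = sorted(
        ((cosine(item_vectors[target], item_vectors[i]), r) for i, r in user_ratings.items()),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    return sum(sim * r for sim, r in neighbours) / total if total > 0 else 0.0


def rank_items(user_ratings: Dict[str, float], item_vectors: Dict[str, Vector],
               k: int = 2) -> List[str]:
    """Rank the items the user has not rated (the "instances") for this user (the "class")."""
    candidates = [i for i in item_vectors if i not in user_ratings]
    return sorted(candidates, key=lambda i: knn_cb_score(user_ratings, item_vectors, i, k),
                  reverse=True)


# Hypothetical proposal features: [transport, environment, culture]
proposals = {
    "p_bike": [1.0, 1.0, 0.0],
    "p_parks": [0.0, 1.0, 0.0],
    "p_museum": [0.0, 0.0, 1.0],
    "p_metro": [1.0, 0.0, 0.0],
    "p_trees": [0.0, 1.0, 1.0],
}
ana = {"p_bike": 5.0, "p_parks": 4.0, "p_museum": 2.0}
print(rank_items(ana, proposals))  # ['p_metro', 'p_trees']
```

The choice of k trades off locality and robustness: with a small k, a single very similar rated item dominates the prediction (as for p_metro here), while a larger k smooths it with less similar items.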