7/20/16
1
Data and Algorithmic Bias
in the Web
Ricardo Baeza-Yates
California (NTENT), Catalonia (UPF), Chile (UChile)
WebVisions Barcelona, July 2016
Steve Jobs and Bias
Pervasive Optimistic Bias (Kahneman)
Reality Distortion Field
Every Website is an
Information Market
Good Design
Good Interaction
Right Incentives
All Data has Bias
§  Gender
§  Racial
§  Sexual
§  Religious
§  Social
§  Linguistic
§  Geographic
§  Political
§  Educational
§  Economic
§  Technological
§  from Noise or Spam
§  Validity (e.g. temporal)
§  Completeness
§  Gathering process
§  ….
However, many people extrapolate
results to the whole population
(e.g., social media analysis).
In addition, there is bias when
measuring bias, as well as bias
towards measuring it!
Yes, We Live in a (Very) Biased World!
A Non-Technical Question
Algorithm
Biased
Data
Neutral?
Same
Bias
Not
Always!
Unbias the data
Tune the algorithm
Unbias the output
Bias awareness!
Big Data and Bias
§  The quality of any algorithm is bounded by the
quality of the data it uses
§  Data bias awareness
§  Algorithmic fairness
§  Key issues for machine learning
§  Uniformity of data properties
§  In the Web, distributions resemble a power law
§  Uniformity of error
§  Data sample methodology
§  E.g., sample size to see infrequent events or
sampling bias issues
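The sample-size point can be made concrete with a back-of-the-envelope calculation. The sketch below assumes independent draws and an event that occurs with probability p on each draw; the function name and the default 95% confidence level are illustrative choices, not taken from the talk.

```python
import math

def min_sample_size(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that an event with per-draw probability p is
    observed at least once with the given confidence.

    Assumes independent draws and solves 1 - (1 - p)**n >= confidence.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))
```

Under these assumptions, an event that happens once in a thousand draws needs a sample of roughly three thousand before it is likely to show up at all, which is why small samples systematically miss the tail.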
Data bias
Activity bias
Selection bias
Sampling bias and size
Algorithmic bias
Interface
(Self) selection bias
Second order bias
Sparsity
Privacy
Algorithm
Quantity
Quality
User-generated
Traditional publishing
What is in the Web? How Much Data? How Good is it?
168 million active web servers,
1083 million hostnames and
infinitely many pages!
What else is in the Web?
Noise and Spam
§  Noise may come from many places:
§  Instruments that measure (e.g., IoT)
§  How we interpret the data
§  Spam is everywhere
§  Fight both with the wisdom of the crowds
Data Bias and Redundancy
§  Is there any dependency in the data?
§  Is there any duplication?
§  Lexical duplication in the Web is around 25%
§  Semantic duplication is larger (more later)
§  Any other biases? Many!
§  Web structure (economic, cultural)
§  Web content (linguistic, geography, gender)
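Lexical duplication of the kind mentioned above is commonly measured by comparing sets of word shingles. The sketch below is one standard way to do it, not necessarily the method behind the 25% figure; the shingle length of 3 words and the 0.9 threshold are assumptions chosen for illustration.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (contiguous word windows) of a text."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        # Short texts are represented by a single shingle with all their words.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(doc1: str, doc2: str, k: int = 3,
                      threshold: float = 0.9) -> bool:
    """Flag two documents as lexical near-duplicates (threshold is an assumption)."""
    return jaccard(shingles(doc1, k), shingles(doc2, k)) >= threshold
```

Semantic duplication, where the same content is expressed in different words, would not be caught by this kind of lexical comparison.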
[Baeza-Yates, Castillo & López. Characteristics of the Web of Spain.
The Information Professional (Spanish), 2006, vol. 15, n. 1, pp. 6-17]
Economic Bias in Links
Number of linked domains
Exports (thousands of US$)
Baeza-Yates & Castillo, WWW2006
Exports/Imports vs. Domain Links
[Baeza-Yates, Castillo, Efthimiadis, TOIT 2007]
Website Structure
Minimal effort
Shame
Linguistic Bias
Geographical Bias
[E. Graells-Garrido and M. Lalmas,
“Balancing diversity to counter-measure
geographical centralization in microblogging
platforms”, ACM Hypertext’14]
Gender Bias
[E. Graells-Garrido et al., “First Women, Second Sex: Gender Bias in Wikipedia”, ACM Hypertext’15]
Systemic bias?
Equal opportunity?
•  The Web is already influenced by small groups
•  "0.05% of the user population, attract almost
50% of all attention within Twitter" (50K users)
[Wu, Hofman, Mason & Watts, WWW 2011]
•  We explored this issue further with four different datasets:
1.  a large one from Twitter (2011),
2.  a small one from Facebook (2009),
3. Amazon reviews (2013), and
4.  Wikipedia editors (2015).
•  Digital desert: the content that is never seen
Activity Bias: Wisdom of a Few?
[Baeza-Yates & Saez-Trumper, ACM Hypertext 2015]
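A concentration figure such as "0.05% of users attract almost 50% of attention" can be computed from per-user activity counts. The sketch below is a minimal illustration with a hypothetical function name and toy input; it is not the analysis code from the cited papers.

```python
def share_of_users_for_attention(counts, target=0.5):
    """Smallest fraction of users, taken from most to least active,
    whose combined activity reaches `target` of the total.

    `counts` holds one non-negative activity count per user.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    total = sum(counts)
    if total <= 0:
        raise ValueError("total activity must be positive")
    running = 0
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running >= target * total:
            return rank / len(counts)
    return 1.0
```

For a uniform population the result equals the target, so values far below the target indicate that a few users dominate.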
Examples
[Baeza-Yates & Saez-Trumper, ACM Hypertext 2015]
October 2015
Quality of Content?
•  Does adding content imply adding wisdom?
•  We use the helpfulness of Amazon reviews
•  We computed the text entropy
•  Content-based-wise users
•  How many of those users are being paid?
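The slide does not say how the text entropy was computed. A minimal sketch, assuming word-level Shannon entropy over whitespace-separated, lowercased tokens:

```python
import math
from collections import Counter

def word_entropy(text: str) -> float:
    """Shannon entropy, in bits, of the empirical word distribution of a text.

    Tokenization by whitespace and lowercasing are simplifying assumptions.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    n = len(words)
    counts = Counter(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A review that repeats one word has entropy 0, while a review where every word is different reaches the maximum of log2(number of words), so the measure rewards varied vocabulary rather than length alone.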
Digital Desert
Weblands
of Wisdom
Bias in the Interface
Position bias
Ranking bias
Presentation bias
Social bias
Interaction bias
Presentation Bias
§  Interaction data will be biased to what is shown
§  In recommender systems, items recommended will get
more clicks than items not recommended
§  In search systems top ranked results will get more clicks
than other results
›  Ranking bias
›  Interaction bias
[Figure: CTR (log scale) vs. rank]
[Dupret & Piwowarski, SIGIR 2008]
[Chapelle & Zhang, WWW 2009]
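One simple way to correct clicks for ranking bias is inverse propensity weighting under a position-based examination model: a click requires that the result was examined, and the chance of examination depends only on rank. The sketch below is that simplified model, not the click models of the papers cited above; the per-rank examination probabilities are assumed to be known, and all names are hypothetical.

```python
from collections import defaultdict

def debiased_relevance(log, propensity):
    """Estimate per-document attractiveness from a click log.

    log: iterable of (doc_id, rank, clicked) impressions.
    propensity: dict mapping rank -> assumed probability that a user
        examines that rank (must contain every rank in the log).
    Each click is weighted by 1 / propensity[rank], then averaged over
    the document's impressions.
    """
    weighted_clicks = defaultdict(float)
    impressions = defaultdict(int)
    for doc_id, rank, clicked in log:
        impressions[doc_id] += 1
        if clicked:
            weighted_clicks[doc_id] += 1.0 / propensity[rank]
    return {doc: weighted_clicks[doc] / impressions[doc] for doc in impressions}
```

With this weighting, a document shown at a rarely examined rank is not penalized for receiving fewer raw clicks than a document shown at the top.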
[Ting Wang and Dashun Wang, “Why Amazon’s Ratings Might Mislead You: The Story of Herding Effects”,
Big Data, 2014]
Social Bias
Extreme Algorithmic Bias
Second Order Bias in Web Content
[Baeza-Yates, Pereira & Ziviani,
Genealogical Trees in the Web, WWW 2008]
Person
Web content is redundant
Clicks in results are biased to
the ranking and the interaction
Query
Ranking bias
Redundancy grows (35%)
Search results
New
Most measures in the Web follow a power law
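Whether a measure follows a power law can be checked by estimating its exponent. The sketch below uses the standard maximum-likelihood estimator for a continuous power law; the choice of x_min is an assumption the analyst has to make, and the function name is illustrative.

```python
import math

def power_law_alpha(values, x_min=1.0):
    """Maximum-likelihood exponent alpha for a continuous power law
    with density proportional to x**(-alpha) for x >= x_min.

    Uses alpha = 1 + n / sum(ln(x_i / x_min)) over the values >= x_min.
    """
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum <= 0:
        raise ValueError("need values strictly above x_min")
    return 1.0 + len(tail) / log_sum
```

Fitting a straight line to a log-log histogram is a common alternative, but it is known to give less reliable exponents than the likelihood-based estimate.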
The Long Tail: Sparsity
[Anatomy of the long tail: Ordinary People with Extraordinary Tastes,
Goel, Broder, Gabrilovich, Pang; WSDM 2010]
§  Why is there a long tail?
§  Sampling in the tail
§  When the crowd dominates
§  Empowering the tail
When the Crowd
Dominates
Kills the long tail
Personalization “facets”:
•  Language (not always)
•  Location
•  Semantic facets per user
•  Query intent prediction in search
Empowering the Tail
The Filter “Bubble”, Eli Pariser
•  Avoid the Poor get Poorer Syndrome
•  Avoid the Echo Chamber
•  How to expose opposite views?
Cold start problem solution:
Explore & Exploit
Solutions:
•  Diversity
•  Novelty
•  Serendipity
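Explore and exploit can be sketched with an epsilon-greedy policy, one of the simplest bandit strategies. The talk does not name a specific algorithm, so the class below, its names, and the default epsilon of 0.1 are all assumptions for illustration.

```python
import random

class EpsilonGreedy:
    """Pick which item to show: mostly the best so far, sometimes a random one."""

    def __init__(self, n_items, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.shows = [0] * n_items    # times each item was shown
        self.clicks = [0] * n_items   # clicks each item received
        self.rng = random.Random(seed)

    def _score(self, item):
        # Items never shown get top priority, which addresses cold start.
        if self.shows[item] == 0:
            return float("inf")
        return self.clicks[item] / self.shows[item]

    def select(self):
        # Explore: with probability epsilon, show a uniformly random item.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.shows))
        # Exploit: show the item with the best observed click-through rate.
        return max(range(len(self.shows)), key=self._score)

    def update(self, item, clicked):
        self.shows[item] += 1
        self.clicks[item] += int(bool(clicked))
```

The exploration step is what keeps new or tail items from being starved of impressions, at the cost of occasionally showing a weaker item.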
A Data Portrait is a visual
context where users can
explore how the system
understands their interests.
This context is used to
embed content-based
recommendations, displayed
visually to facilitate
exploration and user
engagement.
To combat homophily,
recommendations are
generated having political
diversity in mind.
Does it work? Yes, by using
intermediary topics that are shared!
But only when users are interested
in politics.
Demo at http://auroratwittera.cl/perfil/YahooLabs
[E. Graells-Garrido, M. Lalmas and R. Baeza-Yates, ACM UAI 2016]
•  Exploit the context (and deep learning!)
91% accuracy to predict the next app you will use
[Baeza-Yates et al, WSDM 2015]
•  Personalization vs. Contextualization
Recall that user interaction is another long tail
People
Interests
Aggregating in the Tail
[De Choudhury et al, ACM HT 2010]
[Quercia et al, ACM HT 2014]
Crowdsourcing Data: Good Paths
Regions from Pictures
[Thomee et al, Demo at CHI 2014]
AOL Query Logs Release Incident
§  No. 4417749 conducted hundreds of searches over
a three-month period on topics ranging from “numb
fingers” to “60 single men”.
§  Other queries: “landscapers in Lilburn, Ga,” several
people with the last name Arnold and “homes sold in
shadow lake subdivision gwinnett county georgia.”
§  Data trail led to Thelma Arnold, a 62-year-old widow
who lives in Lilburn, Ga., frequently researches her
friends’ medical ailments and loves her three dogs.
A Face Is Exposed for AOL Searcher No. 4417749,
By MICHAEL BARBARO and TOM ZELLER Jr,
The New York Times, Aug 9 2006
Risks of Privacy in Query Logs
§  Profile [Jones, Kumar, Pang, Tomkins, CIKM 2007]
•  Gender: 84%
•  Age (±10): 79%
•  Location (ZIP3): 35%
§  Vanity Queries [Jones et al, CIKM 2008]
•  Partial name: 8.9%
•  Complete: 1.2%
§  More information:
•  A Survey of query log privacy-enhancing techniques
from a policy perspective [Cooper, ACM TWEB 2008]
§  A good anonymization technique is still an open problem
Privacy Awareness
§ How does our privacy change when we change our social network?
§ Information gain to predict a private attribute based on public data
§ Each user may have a promiscuity score
§ Example: new friendship request
Promiscuity( me ) > Promiscuity( new)
Promiscuity( me ) ≥ Promiscuity( new ) + max-gain-I-allow
Promiscuity( me ) < Promiscuity( new ) + max-gain-I-allow
Related work by [Estivill-Castro & Nettleton; Singh, ASONAM 2015]
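The slide does not define how the information gain or the promiscuity score is computed. As a minimal sketch, the gain from a public attribute about a private one can be measured with the standard entropy-based definition, H(private) minus H(private given public); the function names and toy data are assumptions.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(public, private):
    """Bits of uncertainty about `private` removed by knowing `public`.

    Both arguments are equal-length lists with one value per user.
    """
    if len(public) != len(private) or not private:
        raise ValueError("need two non-empty lists of equal length")
    groups = defaultdict(list)
    for pub, priv in zip(public, private):
        groups[pub].append(priv)
    n = len(private)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(private) - conditional
```

A gain of zero means the public data reveals nothing about the private attribute; a gain equal to the entropy of the private attribute means it is fully exposed.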
The Web Works Thanks to Bias!
§ Web traffic
›  Local caching
›  Proxy/Akamai caching
§ Search engines
›  Answer caching
›  Essential web pages
•  25% of queries can be answered with less than 1% of the URLs!
[Baeza-Yates, Boldi, Chierichetti, WWW 2015]
§ E-Commerce
›  Large fraction of revenue comes from few popular items
Activity bias
(Self) selection bias
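The benefit of answer caching on a skewed query stream can be simulated in a few lines. The sketch below assumes a least-recently-used (LRU) eviction policy, which is a common choice but is not stated in the talk; the function name is hypothetical.

```python
from collections import OrderedDict

def lru_hit_rate(stream, capacity):
    """Fraction of queries in `stream` answered from an LRU cache
    that holds at most `capacity` distinct queries."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache = OrderedDict()
    hits = 0
    total = 0
    for query in stream:
        total += 1
        if query in cache:
            hits += 1
            cache.move_to_end(query)       # mark as most recently used
        else:
            cache[query] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / total if total else 0.0
```

Because a few queries account for most of the traffic, even a small cache reaches a high hit rate on real logs; on a uniform stream the same cache would help far less.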
Web Data
§  A mirror of ourselves, the good, the bad and the ugly
§  The web amplifies everything, good or bad, but always
leaves traces
§  We have to be aware of the biases and counteract them
§  We have to be aware of our privacy
Big Data of People is huge…..
….. but is tiny compared to the future
Big Data of the Internet of Things (IoT)
It’s Hard to Get Data to Tell the Truth
§  The blindness of the averages
§  Look at distributions
§  Absolute vs. relative
§  Income per capita vs. Inequality
§  Local vs. global optimization
§  Teams competing without knowing, uncorrelated criteria
§  You can always see/torture data as you wish
›  61 analysts, 29 teams: 20 yes and 9 no (Univ. of Virginia, COS)
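The blindness of averages is easy to demonstrate: two populations can share the same income per capita while differing completely in inequality. The sketch below uses the Gini coefficient as the inequality measure, which is an assumed choice; the toy incomes are invented for illustration.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means everyone has the same amount; values near 1 mean that
    almost everything is held by very few.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n

# Two toy populations with the same mean income (10) but opposite distributions.
equal_incomes = [10, 10, 10, 10]
unequal_incomes = [0, 0, 0, 40]
```

Here both lists average 10, yet the first has a Gini coefficient of 0 and the second of 0.75, so reporting only the mean hides the difference entirely.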
Contact: rbaeza@acm.org
www.baeza.cl
@polarbearby
ASIST 2012
Book of the
Year Award
Questions?
Biased Questions?

More Related Content

What's hot

HR Experts Share How Analytics are Shaping a #SmarterWorkforce
HR Experts Share How Analytics are Shaping a #SmarterWorkforceHR Experts Share How Analytics are Shaping a #SmarterWorkforce
HR Experts Share How Analytics are Shaping a #SmarterWorkforceIBM Smarter Workforce
 
2015 back-to-school and back-to-college survey results
2015 back-to-school and back-to-college survey results2015 back-to-school and back-to-college survey results
2015 back-to-school and back-to-college survey resultsDeloitte United States
 
Big Data & The Role Analytics Can Play In Our Organizations
Big Data & The Role Analytics Can Play In Our OrganizationsBig Data & The Role Analytics Can Play In Our Organizations
Big Data & The Role Analytics Can Play In Our OrganizationsAgile Technologies
 
Rapid fire with Douglas Van Praet
Rapid fire with Douglas Van PraetRapid fire with Douglas Van Praet
Rapid fire with Douglas Van PraetPraz Hari
 
The female millennial: A new era of talent
The female millennial: A new era of talentThe female millennial: A new era of talent
The female millennial: A new era of talentPwC
 
Full Study: Adobe State of Create 2016
Full Study: Adobe State of Create 2016Full Study: Adobe State of Create 2016
Full Study: Adobe State of Create 2016Adobe
 
Onboarding AI by Jana Eggers
Onboarding AI by Jana EggersOnboarding AI by Jana Eggers
Onboarding AI by Jana EggersGlobant
 
Connecting Learning to the Right Systems Webinar
Connecting Learning to the Right Systems WebinarConnecting Learning to the Right Systems Webinar
Connecting Learning to the Right Systems WebinarNetDimensions
 
WUD2008 - The Numbers Revolution and its Effect on the Web
WUD2008 - The Numbers Revolution and its Effect on the WebWUD2008 - The Numbers Revolution and its Effect on the Web
WUD2008 - The Numbers Revolution and its Effect on the WebRich Miller
 
Living in a data economy: Transforming the role of HR
Living in a data economy: Transforming the role of HRLiving in a data economy: Transforming the role of HR
Living in a data economy: Transforming the role of HRMartin Sutherland
 
The Future of Personalised Education
The Future of Personalised EducationThe Future of Personalised Education
The Future of Personalised EducationIBM Government
 
The Future of Work: Winning With an Agile Workforce
The Future of Work: Winning With an Agile WorkforceThe Future of Work: Winning With an Agile Workforce
The Future of Work: Winning With an Agile WorkforceCatalant Technologies
 
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup Jason Miller
 
Designing Mobile Experiences
Designing Mobile ExperiencesDesigning Mobile Experiences
Designing Mobile ExperiencesBrian Fling
 
The Customer Experience Revolution Coming to Everywhere Near You!
The Customer Experience Revolution Coming to Everywhere Near You!The Customer Experience Revolution Coming to Everywhere Near You!
The Customer Experience Revolution Coming to Everywhere Near You!Jennie Vickers
 
Roger hoerl say award presentation 2013
Roger hoerl say award presentation 2013Roger hoerl say award presentation 2013
Roger hoerl say award presentation 2013Roger Hoerl
 
Social Media is about People not Technology
Social Media is about People not TechnologySocial Media is about People not Technology
Social Media is about People not TechnologyFatmir Hyseni
 
JESS3 x Power to Fly Meet Neil Branding Presentation
JESS3 x Power to Fly Meet Neil Branding PresentationJESS3 x Power to Fly Meet Neil Branding Presentation
JESS3 x Power to Fly Meet Neil Branding PresentationJESS3
 

What's hot (20)

HR Experts Share How Analytics are Shaping a #SmarterWorkforce
HR Experts Share How Analytics are Shaping a #SmarterWorkforceHR Experts Share How Analytics are Shaping a #SmarterWorkforce
HR Experts Share How Analytics are Shaping a #SmarterWorkforce
 
2015 back-to-school and back-to-college survey results
2015 back-to-school and back-to-college survey results2015 back-to-school and back-to-college survey results
2015 back-to-school and back-to-college survey results
 
Big Data & The Role Analytics Can Play In Our Organizations
Big Data & The Role Analytics Can Play In Our OrganizationsBig Data & The Role Analytics Can Play In Our Organizations
Big Data & The Role Analytics Can Play In Our Organizations
 
Rapid fire with Douglas Van Praet
Rapid fire with Douglas Van PraetRapid fire with Douglas Van Praet
Rapid fire with Douglas Van Praet
 
The female millennial: A new era of talent
The female millennial: A new era of talentThe female millennial: A new era of talent
The female millennial: A new era of talent
 
Full Study: Adobe State of Create 2016
Full Study: Adobe State of Create 2016Full Study: Adobe State of Create 2016
Full Study: Adobe State of Create 2016
 
Onboarding AI by Jana Eggers
Onboarding AI by Jana EggersOnboarding AI by Jana Eggers
Onboarding AI by Jana Eggers
 
The Future of Work
The Future of Work The Future of Work
The Future of Work
 
Connecting Learning to the Right Systems Webinar
Connecting Learning to the Right Systems WebinarConnecting Learning to the Right Systems Webinar
Connecting Learning to the Right Systems Webinar
 
WUD2008 - The Numbers Revolution and its Effect on the Web
WUD2008 - The Numbers Revolution and its Effect on the WebWUD2008 - The Numbers Revolution and its Effect on the Web
WUD2008 - The Numbers Revolution and its Effect on the Web
 
Living in a data economy: Transforming the role of HR
Living in a data economy: Transforming the role of HRLiving in a data economy: Transforming the role of HR
Living in a data economy: Transforming the role of HR
 
The Future of Personalised Education
The Future of Personalised EducationThe Future of Personalised Education
The Future of Personalised Education
 
The Future of Work: Winning With an Agile Workforce
The Future of Work: Winning With an Agile WorkforceThe Future of Work: Winning With an Agile Workforce
The Future of Work: Winning With an Agile Workforce
 
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup
 
Designing Mobile Experiences
Designing Mobile ExperiencesDesigning Mobile Experiences
Designing Mobile Experiences
 
Digital Ethics
Digital EthicsDigital Ethics
Digital Ethics
 
The Customer Experience Revolution Coming to Everywhere Near You!
The Customer Experience Revolution Coming to Everywhere Near You!The Customer Experience Revolution Coming to Everywhere Near You!
The Customer Experience Revolution Coming to Everywhere Near You!
 
Roger hoerl say award presentation 2013
Roger hoerl say award presentation 2013Roger hoerl say award presentation 2013
Roger hoerl say award presentation 2013
 
Social Media is about People not Technology
Social Media is about People not TechnologySocial Media is about People not Technology
Social Media is about People not Technology
 
JESS3 x Power to Fly Meet Neil Branding Presentation
JESS3 x Power to Fly Meet Neil Branding PresentationJESS3 x Power to Fly Meet Neil Branding Presentation
JESS3 x Power to Fly Meet Neil Branding Presentation
 

Viewers also liked

Organizing for Success with Digital Retail
Organizing for Success with Digital RetailOrganizing for Success with Digital Retail
Organizing for Success with Digital RetailJDA Software
 
ELK - What's new and showcases
ELK - What's new and showcasesELK - What's new and showcases
ELK - What's new and showcasesAndrii Gakhov
 
Education faculty sotl workshopc 25 may 2016
Education faculty sotl workshopc 25 may 2016Education faculty sotl workshopc 25 may 2016
Education faculty sotl workshopc 25 may 2016Brenda Leibowitz
 
Myth busting and the Nigerian Prince
Myth busting and the Nigerian PrinceMyth busting and the Nigerian Prince
Myth busting and the Nigerian PrinceDean Shareski
 
Airing of grievances
Airing of grievancesAiring of grievances
Airing of grievancesDean Shareski
 
Nuevas tecnologías de la información mariana garcia
Nuevas tecnologías de la información mariana garciaNuevas tecnologías de la información mariana garcia
Nuevas tecnologías de la información mariana garciaMariana Garcia Ballesteros
 
นางสาวกรุณา สุขโนนทอง
นางสาวกรุณา   สุขโนนทองนางสาวกรุณา   สุขโนนทอง
นางสาวกรุณา สุขโนนทองsuknontong
 
Advantages of native apps
Advantages of native appsAdvantages of native apps
Advantages of native appsJatin Dabas
 
Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...
Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...
Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...Bryan K. O'Rourke
 
Psychological Improvement program
Psychological Improvement programPsychological Improvement program
Psychological Improvement programFarah Hoque
 
טיפוח והזנת העור מרכיבים טבעיים
טיפוח והזנת העור מרכיבים טבעייםטיפוח והזנת העור מרכיבים טבעיים
טיפוח והזנת העור מרכיבים טבעייםOrit Levav
 
Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...
Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...
Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...360mnbsu
 
Vancouver Best Places to Work Roadshow | ATB Financial
Vancouver Best Places to Work Roadshow | ATB FinancialVancouver Best Places to Work Roadshow | ATB Financial
Vancouver Best Places to Work Roadshow | ATB FinancialGlassdoor
 
Are you a Feminist?
Are you a Feminist?Are you a Feminist?
Are you a Feminist?Farah Hoque
 
NEXT11 Sponsoring Opportunites
NEXT11 Sponsoring OpportunitesNEXT11 Sponsoring Opportunites
NEXT11 Sponsoring OpportunitesNEXT Conference
 

Viewers also liked (20)

Reference 2.0
Reference 2.0Reference 2.0
Reference 2.0
 
Thirstier
ThirstierThirstier
Thirstier
 
Organizing for Success with Digital Retail
Organizing for Success with Digital RetailOrganizing for Success with Digital Retail
Organizing for Success with Digital Retail
 
ELK - What's new and showcases
ELK - What's new and showcasesELK - What's new and showcases
ELK - What's new and showcases
 
Education faculty sotl workshopc 25 may 2016
Education faculty sotl workshopc 25 may 2016Education faculty sotl workshopc 25 may 2016
Education faculty sotl workshopc 25 may 2016
 
Imperialismo
ImperialismoImperialismo
Imperialismo
 
Myth busting and the Nigerian Prince
Myth busting and the Nigerian PrinceMyth busting and the Nigerian Prince
Myth busting and the Nigerian Prince
 
Airing of grievances
Airing of grievancesAiring of grievances
Airing of grievances
 
Nuevas tecnologías de la información mariana garcia
Nuevas tecnologías de la información mariana garciaNuevas tecnologías de la información mariana garcia
Nuevas tecnologías de la información mariana garcia
 
นางสาวกรุณา สุขโนนทอง
นางสาวกรุณา   สุขโนนทองนางสาวกรุณา   สุขโนนทอง
นางสาวกรุณา สุขโนนทอง
 
Advantages of native apps
Advantages of native appsAdvantages of native apps
Advantages of native apps
 
Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...
Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...
Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...
 
The Full Gospel
The Full GospelThe Full Gospel
The Full Gospel
 
Psychological Improvement program
Psychological Improvement programPsychological Improvement program
Psychological Improvement program
 
טיפוח והזנת העור מרכיבים טבעיים
טיפוח והזנת העור מרכיבים טבעייםטיפוח והזנת העור מרכיבים טבעיים
טיפוח והזנת העור מרכיבים טבעיים
 
Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...
Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...
Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...
 
Vancouver Best Places to Work Roadshow | ATB Financial
Vancouver Best Places to Work Roadshow | ATB FinancialVancouver Best Places to Work Roadshow | ATB Financial
Vancouver Best Places to Work Roadshow | ATB Financial
 
Are you a Feminist?
Are you a Feminist?Are you a Feminist?
Are you a Feminist?
 
NEXT11 Sponsoring Opportunites
NEXT11 Sponsoring OpportunitesNEXT11 Sponsoring Opportunites
NEXT11 Sponsoring Opportunites
 
Online Marketing and SEO Workshop
Online Marketing and SEO WorkshopOnline Marketing and SEO Workshop
Online Marketing and SEO Workshop
 

Similar to Data and Algorithmic Bias in the Web

Ux day2018 ricardo baeza yayes search-biases-semantics
Ux day2018   ricardo baeza yayes search-biases-semanticsUx day2018   ricardo baeza yayes search-biases-semantics
Ux day2018 ricardo baeza yayes search-biases-semanticsMultiplica
 
Creating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With PurposeCreating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With PurposeTyrone Grandison
 
Big Data Analytics and Open Data
Big Data Analytics and Open Data Big Data Analytics and Open Data
Big Data Analytics and Open Data Sharjeel Imtiaz
 
Using Big Data to Tell Your Story
Using Big Data to Tell Your StoryUsing Big Data to Tell Your Story
Using Big Data to Tell Your StoryBen Wright
 
Policy primer net303 study period 3, 2017
Policy primer net303  study period 3, 2017Policy primer net303  study period 3, 2017
Policy primer net303 study period 3, 2017Steve Mckee
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayatescaise2013vlc
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayatesPROS-UPV
 
Big data in the web
Big data in the webBig data in the web
Big data in the webcaise2013
 
Unpacking Open Data: power, politics and the importance of infrastructure
Unpacking Open Data: power, politics and the importance of infrastructureUnpacking Open Data: power, politics and the importance of infrastructure
Unpacking Open Data: power, politics and the importance of infrastructureTim Davies
 
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...Cloudera, Inc.
 
Informationliteracy
InformationliteracyInformationliteracy
InformationliteracyYvonne M
 
Know4 drr shadrock_roberts_may2015
Know4 drr shadrock_roberts_may2015Know4 drr shadrock_roberts_may2015
Know4 drr shadrock_roberts_may2015know4drr
 

Similar to Data and Algorithmic Bias in the Web (20)

Ux day2018 ricardo baeza yayes search-biases-semantics
Ux day2018   ricardo baeza yayes search-biases-semanticsUx day2018   ricardo baeza yayes search-biases-semantics
Ux day2018 ricardo baeza yayes search-biases-semantics
 
Creating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With PurposeCreating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With Purpose
 
Webinar v.5.23.11
Webinar v.5.23.11Webinar v.5.23.11
Webinar v.5.23.11
 
Big Data Analytics and Open Data
Big Data Analytics and Open Data Big Data Analytics and Open Data
Big Data Analytics and Open Data
 
Using Big Data to Tell Your Story
Using Big Data to Tell Your StoryUsing Big Data to Tell Your Story
Using Big Data to Tell Your Story
 
Purdue IronHacks
Purdue IronHacksPurdue IronHacks
Purdue IronHacks
 
Policy primer net303 study period 3, 2017
Policy primer net303  study period 3, 2017Policy primer net303  study period 3, 2017
Policy primer net303 study period 3, 2017
 
data, big data, open data
data, big data, open datadata, big data, open data
data, big data, open data
 
Your organization and Big Data: Managing access, privacy, and security
Your organization and Big Data: Managing access, privacy, and securityYour organization and Big Data: Managing access, privacy, and security
Your organization and Big Data: Managing access, privacy, and security
 
Innovations in Data for Decision Making
Innovations in Data for Decision MakingInnovations in Data for Decision Making
Innovations in Data for Decision Making
 
Discovering and mapping your community needs
Discovering and mapping your community needsDiscovering and mapping your community needs
Discovering and mapping your community needs
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayates
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayates
 
Big data in the web
Big data in the webBig data in the web
Big data in the web
 
Gettind data used
Gettind data usedGettind data used
Gettind data used
 
Unpacking Open Data: power, politics and the importance of infrastructure
Unpacking Open Data: power, politics and the importance of infrastructureUnpacking Open Data: power, politics and the importance of infrastructure
Unpacking Open Data: power, politics and the importance of infrastructure
 
SLA RGC Universe
SLA RGC Universe SLA RGC Universe
SLA RGC Universe
 
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...
 
Informationliteracy
InformationliteracyInformationliteracy
Informationliteracy
 
Know4 drr shadrock_roberts_may2015
Know4 drr shadrock_roberts_may2015Know4 drr shadrock_roberts_may2015
Know4 drr shadrock_roberts_may2015
 

More from WebVisions

Christian Titze, "Hello From the Other Side: Adapting the Agile Agency to Cli...
Christian Titze, "Hello From the Other Side: Adapting the Agile Agency to Cli...Christian Titze, "Hello From the Other Side: Adapting the Agile Agency to Cli...
Christian Titze, "Hello From the Other Side: Adapting the Agile Agency to Cli...WebVisions
 
Amélie Lamont, "Design Anthropology 101"
Amélie Lamont, "Design Anthropology 101"Amélie Lamont, "Design Anthropology 101"
Amélie Lamont, "Design Anthropology 101"WebVisions
 
Nate Clinton, "Conversations with Machines"
Nate Clinton, "Conversations with Machines"Nate Clinton, "Conversations with Machines"
Nate Clinton, "Conversations with Machines"WebVisions
 
Thomas Phinney, “Fonts. Everything is Changing. Again.”
Thomas Phinney, “Fonts. Everything is Changing. Again.”Thomas Phinney, “Fonts. Everything is Changing. Again.”
Thomas Phinney, “Fonts. Everything is Changing. Again.”WebVisions
 
The Importance of Side Projects
The Importance of Side ProjectsThe Importance of Side Projects
The Importance of Side ProjectsWebVisions
 
Commit to the Crazy
Commit to the CrazyCommit to the Crazy
Commit to the CrazyWebVisions
 
Intuition and Reason in Design
Intuition and Reason in DesignIntuition and Reason in Design
Intuition and Reason in DesignWebVisions
 
Activism x Technology
Activism x TechnologyActivism x Technology
Activism x TechnologyWebVisions
 
Mike Monteiro, "This is the Golden Age of Design...and We're Screwed"
Mike Monteiro, "This is the Golden Age of Design...and We're Screwed"Mike Monteiro, "This is the Golden Age of Design...and We're Screwed"
Mike Monteiro, "This is the Golden Age of Design...and We're Screwed"WebVisions
 
Mark Wyner, "A New Dawn of the Human Experience"
Mark Wyner, "A New Dawn of the Human Experience"Mark Wyner, "A New Dawn of the Human Experience"
Mark Wyner, "A New Dawn of the Human Experience"WebVisions
 
Kevin Hoyt, "On the Verge of Genius: Smart Cities Workshop"
Kevin Hoyt, "On the Verge of Genius: Smart Cities Workshop"Kevin Hoyt, "On the Verge of Genius: Smart Cities Workshop"
Kevin Hoyt, "On the Verge of Genius: Smart Cities Workshop"WebVisions
 
Art + Commerce
Art + CommerceArt + Commerce
Art + CommerceWebVisions
 
Users are People Too
Users are People TooUsers are People Too
Users are People TooWebVisions
 
Happily Ever After: Pain-Free Prioritization
Happily Ever After: Pain-Free PrioritizationHappily Ever After: Pain-Free Prioritization
Happily Ever After: Pain-Free PrioritizationWebVisions
 
Taming Context in the Internet of Things
Taming Context in the Internet of ThingsTaming Context in the Internet of Things
Taming Context in the Internet of ThingsWebVisions
 
Mind Melds and BattleBots: Creating the Right Kind of Designer/Developer Dynamic
Mind Melds and BattleBots: Creating the Right Kind of Designer/Developer DynamicMind Melds and BattleBots: Creating the Right Kind of Designer/Developer Dynamic
Mind Melds and BattleBots: Creating the Right Kind of Designer/Developer DynamicWebVisions
 
Poetry for Robots: A Digital Humanities Experiment
Poetry for Robots: A Digital Humanities ExperimentPoetry for Robots: A Digital Humanities Experiment
Poetry for Robots: A Digital Humanities ExperimentWebVisions
 
Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"
Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"
Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"WebVisions
 
Robert Stulle, "Stories From the Agile Agency"
Robert Stulle, "Stories From the Agile Agency"Robert Stulle, "Stories From the Agile Agency"
Robert Stulle, "Stories From the Agile Agency"WebVisions
 
Mona Patel, "Excuses, Excuses, Excuse Personas"
Mona Patel, "Excuses, Excuses, Excuse Personas"Mona Patel, "Excuses, Excuses, Excuse Personas"
Mona Patel, "Excuses, Excuses, Excuse Personas"WebVisions
 

Data and Algorithmic Bias in the Web

  • 1. 7/20/16 1 Data and Algorithmic Bias in the Web Ricardo Baeza-Yates California (NTENT), Catalonia (UPF), Chile (UChile) WebVisions Barcelona, July 2016 Steve Jobs and Bias Pervasive Optimistic Bias (Kahneman) Reality Distortion Field
  • 2. Every Website is an Information Market: Good Design, Good Interaction, Right Incentives
  • 3. All Data has Bias § Gender § Racial § Sexual § Religious § Social § Linguistic § Geographic § Political § Educational § Economic § Technological § from Noise or Spam § Validity (e.g., temporal) § Completeness § Gathering process § … However, many people extrapolate results to the whole population (e.g., social media analysis). In addition, there is bias when measuring bias as well as bias towards measuring it! Yes, We Live in a (Very) Biased World!
  • 4. A Non-Technical Question Algorithm Biased Data Neutral? Same Bias Not Always! Unbias the data Tune the algorithm Unbias the output Bias awareness! Big Data and Bias § The quality of any algorithm is bounded by the quality of the data that it uses § Data bias awareness § Algorithmic fairness § Key issues for machine learning § Uniformity of data properties § In the Web, distributions resemble a power law § Uniformity of error § Data sample methodology § E.g., sample size to see infrequent events or sampling bias issues
  • 5. Data bias Activity bias Selection bias Sampling bias and size Algorithmic bias Interface (Self) selection bias Second order bias Sparsity Privacy Algorithm
  • 6. Quantity Quality User-generated Traditional publishing What is in the Web? How Much Data? How Good is it? 168 million active web servers, 1083 million hostnames and infinitely many pages! What else is in the Web?
  • 7. Noise and Spam § Noise may come from many places: § Instruments that measure (e.g., IoT) § How we interpret the data § Spam is everywhere § Fight both with the wisdom of the crowds Data Bias and Redundancy § Is there any dependency in the data? § Is there any duplication? § Lexical duplication in the Web is around 25% § Semantic duplication is larger (more later) § Any other biases? Many! § Web structure (economic, cultural) § Web content (linguistic, geography, gender)
  • 8. [Baeza-Yates, Castillo & López. Characteristics of the Web of Spain. The Information Professional (Spanish), 2006, vol. 15, n. 1, pp. 6-17] Economic Bias in Links (plot axes: number of linked domains vs. exports in thousands of US$) [Baeza-Yates & Castillo, WWW 2006] Exports/Imports vs. Domain Links
  • 9. [Baeza-Yates, Castillo, Efthimiadis, TOIT 2007] Website Structure Minimal effort Shame Linguistic Bias
  • 10. Geographical Bias [E. Graells-Garrido and M. Lalmas, “Balancing diversity to counter-measure geographical centralization in microblogging platforms”, ACM Hypertext’14] Gender Bias [E. Graells-Garrido et al., “First Women, Second Sex: Gender Bias in Wikipedia”, ACM Hypertext’15] Systemic bias? Equal opportunity?
  • 11. Activity Bias: Wisdom of a Few? [Baeza-Yates & Saez-Trumper, ACM Hypertext 2015] • The Web is already influenced by small groups • "0.05% of the user population, attract almost 50% of all attention within Twitter" (50K users) [Wu, Hofman, Mason & Watts, WWW 2011] • We explored this issue further with four different datasets: 1. a large one from Twitter (2011), 2. a small one from Facebook (2009), 3. Amazon reviews (2013), and 4. Wikipedia editors (2015). • Digital desert: the content that is never seen Examples [Baeza-Yates & Saez-Trumper, ACM Hypertext 2015]
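The "wisdom of a few" measurement on this slide can be illustrated with a minimal sketch: given per-user activity counts, find the smallest fraction of users that accounts for half of all activity. The function name and the toy counts are assumptions made for illustration; they are not the datasets or the code behind the cited paper.

```python
def smallest_share_for_half(activity_counts, target=0.5):
    """Return the fraction of users needed to cover `target` of total activity.

    Users are taken from most to least active, so the result is the
    smallest share of users that reaches the target.
    """
    counts = sorted(activity_counts, reverse=True)
    total = sum(counts)
    if total == 0:
        raise ValueError("no activity recorded")
    covered = 0
    for users_taken, count in enumerate(counts, start=1):
        covered += count
        if covered >= target * total:
            return users_taken / len(counts)
    return 1.0


# Hypothetical, heavily skewed activity: one user produces half of everything.
toy_counts = [500, 300, 100, 50, 20, 10, 10, 5, 3, 2]
share = smallest_share_for_half(toy_counts)

# With perfectly uniform activity, half of the users are needed.
uniform_share = smallest_share_for_half([1] * 10)
```

On skewed data the returned share is far below 50%, which is the kind of concentration the slide describes.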
  • 12. Quality of Content? • Does adding content imply adding wisdom? • We use the helpfulness of Amazon’s reviews • We computed the text entropy • Content-based-wise users • How many of those users are being paid?
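The slide mentions computing the text entropy of reviews without saying how. A minimal sketch, assuming word-level Shannon entropy over lowercased, whitespace-separated tokens (the tokenization and the function name are assumptions, not the authors' method):

```python
import math
from collections import Counter


def text_entropy(text):
    """Shannon entropy, in bits per token, of the words in `text`."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    n = len(tokens)
    # H = -sum(p * log2(p)) over the empirical token distribution.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


repetitive = text_entropy("good good good good")
varied = text_entropy("sturdy tripod, light enough for hiking")
```

Under this assumption a review that repeats one word scores zero, while a review with varied vocabulary scores higher, which is one plausible proxy for how much information the text carries.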
  • 13. Digital Desert Weblands of Wisdom
  • 14. Bias in the Interface Position bias Ranking bias Presentation bias Social bias Interaction bias Presentation Bias § Interaction data will be biased to what is shown § In recommender systems, items recommended will get more clicks than items not recommended § In search systems, top ranked results will get more clicks than other results › Ranking bias › Interaction bias (plot: CTR on a log scale vs. rank) [Dupret & Piwowarski, SIGIR 2008] [Chapelle & Zhang, WWW 2009]
  • 15. [Ting Wang and Dashun Wang, “Why Amazon’s Ratings Might Mislead You: The Story of Herding Effects”, Big Data, 2014] Social Bias Extreme Algorithmic Bias
  • 16. Second Order Bias in Web Content [Baeza-Yates, Pereira & Ziviani, Genealogical Trees in the Web, WWW 2008] Person Web content is redundant Clicks in results are biased to the ranking and the interaction Query Ranking bias Redundancy grows (35%) Search results New Most measures in the Web follow a power law The Long Tail: Sparsity [Anatomy of the long tail: Ordinary People with Extraordinary Tastes, Goel, Broder, Gabrilovich, Pang; WSDM 2010] § Why is there a long tail? § Sampling in the tail § When the crowd dominates § Empowering the tail
  • 17. When the Crowd Dominates Kills the long tail Personalization “facets”: • Language (not always) • Location • Semantic facets per user • Query intent prediction in search Empowering the Tail The Filter “Bubble”, Eli Pariser • Avoid the Poor get Poorer Syndrome • Avoid the Echo Chamber • How to expose opposite views? Cold start problem solution: Explore & Exploit Solutions: • Diversity • Novelty • Serendipity
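The slide names "Explore & Exploit" as a remedy for the cold start problem but does not say which strategy is meant. One common choice is epsilon-greedy, sketched below as an assumption: with probability epsilon a random item is shown (explore), otherwise the item with the best observed click-through rate is shown (exploit). The class and item names are hypothetical.

```python
import random


class EpsilonGreedy:
    """Minimal epsilon-greedy recommender over a fixed set of items."""

    def __init__(self, items, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.shows = {item: 0 for item in items}
        self.clicks = {item: 0 for item in items}
        self.rng = random.Random(seed)

    def estimated_ctr(self, item):
        # Items never shown get 0.0; only exploration can surface them.
        shown = self.shows[item]
        return self.clicks[item] / shown if shown else 0.0

    def choose(self):
        items = list(self.shows)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(items)          # explore
        return max(items, key=self.estimated_ctr)  # exploit

    def record(self, item, clicked):
        self.shows[item] += 1
        if clicked:
            self.clicks[item] += 1


bandit = EpsilonGreedy(["popular", "niche", "new"], epsilon=0.0, seed=0)
bandit.record("popular", False)
bandit.record("niche", True)
best = bandit.choose()
```

With epsilon set to zero the recommender only exploits, so new or tail items that were never shown stay invisible; a positive epsilon gives them a chance to collect feedback.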
  • 18. A Data Portrait is a visual context where users can explore how the system understands their interests. This context is used to embed content-based recommendations, displayed visually to facilitate exploration and user engagement. To combat homophily, recommendations are generated having political diversity in mind. Does it work? Yes, by using intermediary topics that are shared! But only when users are interested in politics. Demo at http://auroratwittera.cl/perfil/YahooLabs [E. Graells-Garrido, M. Lalmas and R. Baeza-Yates, ACM IUI 2016] • Exploit the context (and deep learning!) 91% accuracy to predict the next app you will use [Baeza-Yates et al, WSDM 2015] • Personalization vs. Contextualization Recall that user interaction is another long tail People Interests Aggregating in the Tail
  • 19. [De Choudhury et al, ACM HT 2010] [Quercia et al, ACM HT 2014] Crowdsourcing Data: Good Paths
  • 20. Regions from Pictures [Thomee et al, Demo at CHI 2014] AOL Query Logs Release Incident § No. 4417749 conducted hundreds of searches over a three-month period on topics ranging from “numb fingers” to “60 single men”. § Other queries: “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.” § Data trail led to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs. [“A Face Is Exposed for AOL Searcher No. 4417749”, by Michael Barbaro and Tom Zeller Jr., The New York Times, Aug 9 2006]
  • 21. Risks of Privacy in Query Logs § Profile [Jones, Kumar, Pang, Tompkins, CIKM 2007] • Gender: 84% • Age (±10): 79% • Location (ZIP3): 35% § Vanity Queries [Jones et al, CIKM 2008] • Partial name: 8.9% • Complete: 1.2% § More information: • A Survey of query log privacy-enhancing techniques from a policy perspective [Cooper, ACM TWEB 2008] § A good anonymization technique is still an open problem
  • 22. Privacy Awareness § How does our privacy change when we change our social network? § Information gain to predict a private attribute based on public data § Each user may have a promiscuity score § Example: new friendship request Promiscuity(me) > Promiscuity(new) Promiscuity(me) ≥ Promiscuity(new) + max-gain-I-allow Promiscuity(me) < Promiscuity(new) + max-gain-I-allow Related work by [Estivill-Castro & Nettleton; Singh, ASONAM 2015] The Web Works Thanks to Bias! § Web traffic › Local caching › Proxy/Akamai caching § Search engines › Answer caching › Essential web pages • 25% of queries can be answered with less than 1% of the URLs! [Baeza-Yates, Boldi, Chierichetti, WWW 2015] § E-Commerce › Large fraction of revenue comes from few popular items Activity bias (Self) selection bias
  • 23. Web Data § A mirror of ourselves, the good, the bad and the ugly § The web amplifies everything, good or bad, but always leaves traces § We have to be aware of the biases and counteract them § We have to be aware of our privacy Big Data of People is huge… but is tiny compared to the future Big Data of the Internet of Things (IoT) It’s Hard to Get Data to Tell the Truth § The blindness of the averages § Look at distributions § Absolute vs. relative § Income per capita vs. Inequality § Local vs. global optimization § Teams competing without knowing, uncorrelated criteria § You can always see/torture data as you wish › 61 analysts, 29 teams: 20 yes and 9 no (Univ. of Virginia, COS)
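The "blindness of the averages" point can be made concrete with a small worked example. The income figures below are invented for illustration (they are not data from the talk): a single very large value pulls the mean far from what a typical member of the sample earns, while the median and the top earner's share expose the skew.

```python
import statistics

# Hypothetical incomes (arbitrary units): nine modest values and one outlier.
incomes = [20, 22, 25, 27, 30, 32, 35, 40, 45, 1000]

mean_income = statistics.mean(incomes)      # "income per capita" view
median_income = statistics.median(incomes)  # what a typical person earns
top_share = max(incomes) / sum(incomes)     # share held by the top earner

print(f"mean={mean_income:.1f} median={median_income:.1f} top share={top_share:.0%}")
```

Here the mean is 127.6 while the median is 31.0, so reporting only the average would hide that one individual holds most of the total.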