Social Networks Analytics
Hubert Lo Prateek Maitra Aaron Strahl
Wikipedia Vote Network
Outline
- Introduction
- Wikipedia Request for Adminship
- Is the RfA process fair?
- Application Techniques
- Descriptive Statistics
- Distributions, Betweenness, Clustering
- Graph Partitioning
- Key Takeaways
- Conclusion
Background
About RfA and its process:
Nomination
Notice of RfA
Expressing Opinions
Discussion, decision, and closing procedures
Research Question
Question: We wanted to analyze the directed graph of votes between
Wikipedia administrators and ordinary users in a Wikipedia voting
dataset.
Are the procedures in place fair or not?
Application Techniques
Techniques:
Descriptive Statistics and Interpretation
Graph Partitioning/Visuals
Filtering the network by increasing degree - Gephi
Network Degree Distribution
Pattern of Random or Preferential Attachment?
Descriptive Statistics
Edges Count: 103,689
Vertices Count: 7,115
Reciprocity: 0.0564
Average Path: 3.34
Diameter: 10
Global Clustering: 0.1254791
Strongly Connected: False
Weakly Connected: False
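A minimal sketch of how reciprocity, average path length, and diameter can be computed, using only the Python standard library. The edge list is a hypothetical toy graph, not the wiki-Vote data, and the choice to ignore unreachable pairs when averaging path lengths is an assumption (one common convention), not necessarily the one used for the figures above.

```python
from collections import deque

# Hypothetical toy vote graph: an edge (u, v) means "u voted on v".
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("a", "d")]

def reciprocity(edges):
    """Fraction of directed edges whose reverse edge is also present."""
    edge_set = set(edges)
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

def shortest_path_lengths(edges):
    """Lengths of all finite directed shortest paths between distinct nodes (BFS)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())
    lengths = []
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        # Assumption: pairs with no directed path are simply left out.
        lengths.extend(d for n, d in dist.items() if n != source)
    return lengths

lengths = shortest_path_lengths(edges)
recip = reciprocity(edges)                 # 2 of 5 edges are reciprocated
avg_path = sum(lengths) / len(lengths)     # mean over reachable ordered pairs
diameter = max(lengths)                    # longest finite shortest path
```

On the toy graph only the a/b pair is mutual, so reciprocity is 0.4; a value near 0.0564 would mean very few such pairs.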
Degree Distribution
• The long-tail distribution is very evident
• Nodes with degree 0 to 100 account for about 85% of all the nodes in the dataset
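The share of nodes below a degree cutoff can be checked with a few lines of Python. This is a sketch on a hypothetical toy edge list; it assumes "degree" means total degree (votes cast plus votes received), and the cutoff of 2 stands in for the 100 used on the slide.

```python
from collections import Counter

# Hypothetical toy edge list (voter, candidate); not the real dataset.
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"),
         ("d", "hub"), ("a", "b"), ("e", "hub")]

def total_degree(edges):
    """Total degree (in + out) per node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def share_at_or_below(deg, threshold):
    """Fraction of nodes whose degree is <= threshold."""
    return sum(1 for d in deg.values() if d <= threshold) / len(deg)

deg = total_degree(edges)
histogram = Counter(deg.values())   # degree -> number of nodes with that degree
share = share_at_or_below(deg, 2)   # 5 of the 6 toy nodes have degree <= 2
```

The `histogram` mapping is the raw material for the degree distribution plot: most of its mass sits at small degrees, with a single high-degree node in the tail.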
Degree Distribution
• Few hubs with a large number of links.
• Many nodes with a small number of links.
Log-Log Plot
• The quantity being measured can be viewed as a type of popularity
• Rich-get-richer phenomenon
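A rough way to read a log-log plot is to fit a straight line to log(count) against log(degree); a roughly constant negative slope is consistent with a power law, though it is not proof of one. The sketch below uses a hypothetical synthetic histogram that follows count = 1000 * degree^-2 exactly, so the fitted slope should come out at -2. A least-squares fit of this kind is only a rough estimate on real data.

```python
import math

def loglog_slope(degree_counts):
    """Least-squares slope of log(count) against log(degree)."""
    points = [(math.log(k), math.log(n))
              for k, n in degree_counts.items() if k > 0 and n > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Hypothetical synthetic histogram: count = 1000 * degree^-2.
degree_counts = {k: 1000.0 * k ** -2 for k in (1, 2, 4, 8, 16, 32)}
slope = loglog_slope(degree_counts)
```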
Average Betweenness and Degree
• Degree centrality and node betweenness appear to be close to linearly related
• Nodes with more connections (higher degree) have higher betweenness scores
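Betweenness counts how often a node lies on shortest paths between other nodes. The sketch below implements Brandes' algorithm for a directed, unweighted graph on a hypothetical four-node path; it is an illustration of the measure, not the tool the slides were produced with.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for a directed, unweighted graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical toy graph: a directed path a -> b -> c -> d.
adj = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
scores = betweenness(adj)
# Total degree = out-degree + in-degree.
degree = {v: len(adj[v]) + sum(v in nbrs for nbrs in adj.values()) for v in adj}
```

In the toy path, the two middle nodes have both the highest degree and the highest betweenness, which is the pattern the plot describes.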
Average Clustering and Degree
• Local clustering appears to decrease roughly exponentially as degree centrality increases, resembling the power-law pattern
• At moderate levels of degree centrality, clustering levels are still high
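The two clustering measures used in this deck can be sketched as follows. The code assumes edge direction is ignored when counting triangles and uses a hypothetical toy graph; other conventions for directed graphs exist.

```python
from itertools import combinations

def undirected_neighbors(edges):
    """Neighbor sets, ignoring edge direction and self-loops."""
    nbrs = {}
    for u, v in edges:
        if u != v:
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    return nbrs

def local_clustering(nbrs, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    k = len(nbrs[node])
    if k < 2:
        return 0.0
    links = sum(1 for x, y in combinations(nbrs[node], 2) if y in nbrs[x])
    return links / (k * (k - 1) / 2)

def global_clustering(nbrs):
    """Closed triplets divided by all triplets (open and closed)."""
    closed = triplets = 0
    for node in nbrs:
        for x, y in combinations(nbrs[node], 2):
            triplets += 1
            closed += y in nbrs[x]
    return closed / triplets if triplets else 0.0

# Hypothetical toy graph: a triangle a-b-c plus a pendant node d attached to a.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
nbrs = undirected_neighbors(edges)
local = {n: local_clustering(nbrs, n) for n in nbrs}
transitivity = global_clustering(nbrs)
```

Node a has the highest degree in the toy graph and a lower local clustering (1/3) than its degree-2 neighbors (1.0), a small-scale version of the downward trend in the plot.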
Average Constraint and Degree
• Average constraint (embeddedness) and degree centrality have a negative linear relationship.
• The majority of users have a relatively low level of constraint.
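Constraint here can be read as Burt's constraint: for node i, sum over contacts j of (p_ij + sum over q of p_iq * p_qj) squared, where p_ij is the share of i's ties that go to j. The sketch below assumes an unweighted, undirected graph, so p_ij is 1/degree(i) for each neighbor; the graph and node names are hypothetical.

```python
def constraint(nbrs, i):
    """Burt's constraint of node i for an unweighted, undirected neighbor map."""
    def p(a, b):
        # Proportion of a's ties that go to b.
        return 1 / len(nbrs[a]) if b in nbrs[a] else 0.0

    total = 0.0
    for j in nbrs[i]:
        # Indirect investment in j through i's other contacts q.
        indirect = sum(p(i, q) * p(q, j) for q in nbrs[i] if q != j)
        total += (p(i, j) + indirect) ** 2
    return total

# Hypothetical toy graph: a star (hub linked to x, y, z)
# and a separate closed triangle (t1, t2, t3).
nbrs = {
    "hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"},
    "t1": {"t2", "t3"}, "t2": {"t1", "t3"}, "t3": {"t1", "t2"},
}
hub_c = constraint(nbrs, "hub")    # many unconnected contacts: low constraint
leaf_c = constraint(nbrs, "x")     # a single contact: high constraint
tri_c = constraint(nbrs, "t1")     # contacts tied to each other: high constraint
```

The hub, with three contacts that are not tied to one another, is the least constrained node, which matches the negative relationship between degree and constraint on the slide.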
Average Neighbor Degree and Degree Plot
• Low-degree users show a wide spread of values, and their neighbors tend to have a higher average degree.
• As degree increases, the neighbors have comparatively lower degree.
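The curve in an average-neighbor-degree plot can be built by computing, for each node, the mean degree of its neighbors and then averaging that value over nodes of equal degree. A sketch on a hypothetical toy graph, assuming an undirected view of the network:

```python
def average_neighbor_degree(nbrs):
    """Mean degree of each node's neighbors (undirected view)."""
    return {
        node: sum(len(nbrs[n]) for n in neighbors) / len(neighbors)
        for node, neighbors in nbrs.items() if neighbors
    }

def mean_by_degree(nbrs, values):
    """Average a per-node value over all nodes that share the same degree."""
    buckets = {}
    for node, value in values.items():
        buckets.setdefault(len(nbrs[node]), []).append(value)
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}

# Hypothetical toy graph: a hub linked to four nodes, two of which are also linked.
nbrs = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub", "b"}, "b": {"hub", "a"},
    "c": {"hub"}, "d": {"hub"},
}
knn = average_neighbor_degree(nbrs)
curve = mean_by_degree(nbrs, knn)   # degree -> mean neighbor degree
```

In the toy graph the curve falls from 4.0 at degree 1 to 1.5 at degree 4: low-degree nodes attach to the hub, while the hub's own neighbors are mostly low-degree.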
Application Techniques - Partitioning
Challenge: how do we partition the graph? The network is very dense,
with a lot of edges.
Nodes: 7,066
Edges: 103,663
Graph Networks - Partitioning
We raised the minimum degree step by step to see how the network
evolved.
Degree range: 2 to 1,167.
Nodes: 4,797 (67.42%)
Edges: 101,394 (97.79%)
Graph Networks - Increasing Degree
Graph Networks - Partitioning
Degree range: 160 to 1,167.
Nodes: 262 (3.68%)
Edges: 9,959 (9.60%)
Graph Networks - Partitioning
Degree range: 260 to 1,167.
Nodes: 92 (1.29%)
Edges: 2,098 (2.02%)
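The degree-range filtering can be approximated outside Gephi as follows. This is a sketch under two assumptions: degree is the total degree measured on the full graph, and an edge survives only if both endpoints survive. Gephi's filter settings may differ, and the edge list and threshold are hypothetical.

```python
from collections import Counter

def filter_by_degree(edges, min_degree):
    """Keep nodes with total degree >= min_degree and the edges among them."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    kept_nodes = {n for n, d in deg.items() if d >= min_degree}
    kept_edges = [(u, v) for u, v in edges
                  if u in kept_nodes and v in kept_nodes]
    return kept_nodes, kept_edges, len(deg)

# Hypothetical toy edge list; not the real dataset.
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("a", "d"), ("e", "a"), ("f", "b")]
nodes, core_edges, n_total = filter_by_degree(edges, 3)
node_share = 100 * len(nodes) / n_total          # percent of nodes kept
edge_share = 100 * len(core_edges) / len(edges)  # percent of edges kept
```

Raising `min_degree` in steps and recording `node_share` and `edge_share` each time gives the same kind of node and edge percentages reported on these slides.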
Core Component
• The majority of these nodes have very high betweenness scores.
• The majority of these nodes have high eigenvector centrality.
• They belong to the strongly connected component with id 1016.
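Strongly connected components group nodes that can all reach one another along directed edges. The sketch below uses Kosaraju's algorithm on a hypothetical toy graph; component ids such as 1016 are assigned by the analysis tool, so this code only returns the components as sets.

```python
def strongly_connected_components(adj):
    """Kosaraju's algorithm, written iteratively to avoid deep recursion."""
    order, seen = [], set()
    for start in adj:                       # pass 1: record DFS finish order
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(adj[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(adj[nxt])))
                    break
            else:
                order.append(node)          # all successors explored
                stack.pop()
    reverse = {v: [] for v in adj}
    for u in adj:
        for v in adj[u]:
            reverse[v].append(u)
    components, assigned = [], set()
    for start in reversed(order):           # pass 2: search the reversed graph
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = {start}, [start]
        while todo:
            node = todo.pop()
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    todo.append(prev)
        components.append(component)
    return components

# Hypothetical toy graph: a, b, c vote in a cycle; d votes on a but gets no votes.
# Every node must appear as a key in adj.
adj = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
components = strongly_connected_components(adj)
core = max(components, key=len)             # largest strongly connected component
```

The peripheral voter d ends up in a component of its own, while the mutually reachable nodes form the core, mirroring the dense-core and fringe structure described here.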
Key Takeaways
- The RfA vote network for adding new administrators is neither
weakly nor strongly connected
- The network structure points toward a dense, central core with
many nodes around the periphery
- It exhibits characteristics of the rich-get-richer/preferential
attachment model
- Although every vote counts the same, an administrator’s vote has
the potential to bring many more votes along with it
- Graph partitioning allows us to view the core clearly
So is it Fair?
- Ultimately, we determined that the Wikipedia RfA process
is fair but highly flawed, with underlying nuances
- Although a new user’s vote and an administrator’s vote
technically carry the same weight, administrators
leverage the power of their personal network
- As a result, current administrators retain control over
the network as a whole and decide who gets to become an
administrator
Questions?


Editor's Notes

  1. https://snap.stanford.edu/data/wiki-Vote.html
  2. Describe the RfA dataset, which runs from the start of Wikipedia in 2001 to 2008. Data: 7,066 nodes; 103,663 edges. It only includes nominations for other users (not self-nominations).
  3. Describe the RfA dataset, which runs from the start of Wikipedia in 2001 to 2008. Data: 7,066 nodes; 103,663 edges. It only includes nominations for other users (not self-nominations).
  4. Users fill out an application (questionnaire). Existing admins vote on the application and provide commentary on why the candidate was or was not accepted. Successful promotion is exclusive, with just a 44% success rate.
     - Candidates display the RfA-notice tag on their user pages. The RfA remains open for seven days, during which contributors ask questions and make comments as they wish.
     - Bureaucrats review and close the RfA; a request with at least 75% support will most likely pass.
     - A strong edit history, user interaction, high-quality articles, and trustworthiness count in a candidate's favor.
  5. Behavior of the network: is it fair? Business question: can the behavior be used to predict behavior in other vote-driven processes, such as political elections, board seats, etc.?
  6. Basic stats. Graph partitioning: paring down extraneous edges for visualization.
  7. Reciprocity: the likelihood that vertices in a directed network are mutually linked. 0 means purely unidirectional; 1 means purely bidirectional (everything points back). The value here is low, which indicates that there are not many mutual linkages. This suggests the process may be fair, because we do not see many people voting for each other. It may also be due to the small number of people up for nomination.
     Average Path: the average number of steps along the shortest paths for all possible pairs of network nodes. It is a measure of the efficiency of information or mass transport on a network. This value makes sense given the dense core of the network. However, note the large number of fringe nodes.
     Diameter: find the shortest path between each pair of vertices; the greatest length of any of these paths is the diameter of the graph. We expect a much higher number than the average path length, since fringe nodes are highly isolated from other fringe nodes due to the structure of the overall network. For the two nodes most distant from one another, the shortest path is 10 steps long.
     Weakly Connected: a directed graph is weakly connected if replacing all of its directed edges with undirected edges produces a connected (undirected) graph.
     Strongly Connected: a directed graph is strongly connected if every node can be reached from every other node by following directed edges. Given the number of data points in our network, it is difficult to isolate clusters. However, it makes sense that the graph is not entirely connected given the way the voting process works. There are different communities based on language, country of origin, etc. There is also a drastic difference between the center and the periphery of the network. The graph is, however, made up of strongly connected components. This too makes sense, given that there are smaller communities of editors/readers/administrators who would all be connected due to the “networking” aspect of the nomination process.
     Global Clustering Coefficient: the number of closed triplets (or 3 x triangles) over the total number of triplets (both open and closed), a measure between 0 and 1. This is important to us because it measures the degree of triadic closure in the network, which is relatively low overall.
  8. Long-tail degree distribution: highly skewed and uneven, with very few high-degree nodes. Nodes with degree 0 to 100 account for over 85% of the nodes in the entire dataset. In this distribution, the degrees are the number of votes cast. This tells us that a large number of individuals vote a small number of times, which makes sense given the network structure.
  9. An admin's vote counts as much as a user voting for another user. Admins have power and have been Wikipedia editors for longer. Right: few high degrees, the core component. Left: most of the nodes are in the area below 100. Note that there are sparsely distributed nodes. High preferential attachment, but not totally pure: a high concentration of near-zero degrees, but not exactly a textbook form of preferential attachment. This demonstrates that seeking approval from a high-degree user is a formula for success, and Wikipedia tells you who these people are. In the previous slide, we couldn’t see any degree distribution above 250. Here, however, we clearly see the long tail stretch out to in excess of 800. This is the component we find the most interesting. These degrees represent votes cast or votes received. How do we reach in excess of 800 votes? Perhaps administrators’ networks cast votes for nominees based on who the administrator is voting for, so that the administrator's credibility extends to the nominee. Preferential attachment: when preferential attachment exists, we see an extremely high concentration of observations clustered around 0. In this case we do not see this, but instead a hybrid attachment model.
  10. The network demonstrates the rich-get-richer phenomenon of the power law (preferential attachment). This could be due to the influence of the administrator’s followers. Another important characteristic of scale-free networks is the clustering coefficient distribution, which decreases as the node degree increases.
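One rough way to read a log-log plot like this is to fit a straight line to log(count) against log(degree). The sketch below does that on a synthetic degree sequence built to follow roughly k^-2; the data and the exponent are assumptions for illustration, not values from the vote network. A least-squares fit on log-log axes is only a quick diagnostic, not a rigorous test for a power law.

```python
import math
from collections import Counter

# Hypothetical degree sequence: about 1000 * k**-2 nodes of degree k (not the real data).
degrees = [k for k in range(1, 11) for _ in range(round(1000 * k ** -2))]

counts = Counter(degrees)  # degree -> number of nodes with that degree
xs = [math.log(k) for k in sorted(counts)]
ys = [math.log(counts[k]) for k in sorted(counts)]

# Ordinary least-squares slope of log(count) against log(degree).
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
print(f"estimated log-log slope: {slope:.2f}")
```

A clearly negative, roughly constant slope is what the rich-get-richer reading of the slide would lead one to expect.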
  11. Nodes at the center of the network lie on the shortest paths between a greater percentage of the entire network than any other nodes, which goes hand in hand with their high degree centrality. The relationship between degree centrality and average node betweenness is roughly linear: nodes with a higher degree (more connections) are expected to have higher betweenness scores.
  12. The degree centrality and local clustering plot. Local clustering measures how connected a node's neighbors are to one another: the fraction of possible interconnections among the neighbors of a vertex V that actually exist. The low-degree, high-clustering observations on the left-hand side of the graph represent nodes at the edge of the network. As degree increases, the clustering coefficient decreases, similar to what we saw in the scale-free network/power law plot. The plot looks roughly like a negative exponential. Even at relatively high degree centrality, average clustering still holds up. This is different from what we saw in the SAP networks.
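The averaging behind a clustering-versus-degree plot can be sketched as follows. The toy graph (a hub "h" with four neighbors, two of which are linked) is hypothetical, and treating nodes with fewer than two neighbors as undefined is an assumption; some tools report 0 for them instead.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical undirected toy graph; not the real vote network.
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b")]
nbrs = defaultdict(set)
for u, v in edges:
    nbrs[u].add(v)
    nbrs[v].add(u)

def local_clustering(node):
    """Fraction of possible links among node's neighbors that actually exist."""
    k = len(nbrs[node])
    if k < 2:
        return None  # undefined with fewer than two neighbors (assumption)
    links = sum(1 for x, y in combinations(nbrs[node], 2) if y in nbrs[x])
    return 2 * links / (k * (k - 1))

# Group local clustering values by degree, then average within each degree.
by_degree = defaultdict(list)
for node in list(nbrs):
    c = local_clustering(node)
    if c is not None:
        by_degree[len(nbrs[node])].append(c)
avg_clustering_by_degree = {k: sum(v) / len(v) for k, v in sorted(by_degree.items())}
print(avg_clustering_by_degree)
```

Here the degree-2 nodes sit inside a triangle and have clustering 1.0, while the degree-4 hub has only one of six possible neighbor links.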
  13. A very linear trend: as average constraint (embeddedness) increases, degree centrality decreases. Users with lower degree centrality are highly constrained in their access to information and are reliant on high-betweenness nodes. Constraint is a summary measure that taps the extent to which ego's connections are to others who are connected to one another. If ego's potential trading partners all have one another as potential trading partners, ego is highly constrained. If ego's partners do not have other alternatives in the neighborhood, they cannot constrain ego's behavior.
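As an illustration of the constraint measure described here, the sketch below computes Burt's constraint on a hypothetical undirected, unweighted toy graph. It assumes each node splits its ties equally among its neighbors (p_ij = 1/degree of i); whether the slide's figure used this exact variant is not stated in the notes.

```python
from collections import defaultdict

# Hypothetical undirected toy graph: hub "h" plus one extra tie a-b.
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b")]
nbrs = defaultdict(set)
for u, v in edges:
    nbrs[u].add(v)
    nbrs[v].add(u)

def p(i, j):
    """Share of i's ties invested in j (equal shares in an unweighted graph)."""
    return 1 / len(nbrs[i]) if j in nbrs[i] else 0.0

def constraint(i):
    """Burt's constraint: sum over contacts j of (direct + indirect investment)^2."""
    total = 0.0
    for j in nbrs[i]:
        # Indirect investment in j through i's other contacts q.
        indirect = sum(p(i, q) * p(q, j) for q in nbrs[i] if q != j)
        total += (p(i, j) + indirect) ** 2
    return total

# List nodes from lowest to highest degree with their constraint.
for node in sorted(nbrs, key=lambda n: len(nbrs[n])):
    print(node, len(nbrs[node]), round(constraint(node), 3))
```

In this toy case the degree-1 nodes have constraint 1.0 and the degree-4 hub about 0.41, which matches the direction of the negative relationship on the slide.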
  14. First we had to remove loops, eliminating cases where users voted for themselves, as these introduced noise into the data. Patterns: weakly connected fringe cases and a dense, strongly connected interior. Peripheral nodes showed a pattern of one node accounting for one vote a majority of the time, whereas nodes in the strongly connected component frequently voted for one another hundreds of times per case.
  15. How did we begin to make sense of the overall structure of the network? We filtered out the nodes with a degree of just one and saw less of a halo pattern between the first and second iteration. All of the nodes pictured in the above visualization have a degree between 2 and 1167.
  16. Filtering to reveal the underlying structure in the network (percentages are shares of all nodes and all edges):
Degree = 10: Nodes 2864 (40.25%), Edges 93681 (90.35%)
Degree = 20: Nodes 2200 (30.92%), Edges 85153 (82.12%)
Degree = 40: Nodes 1576 (22.15%), Edges 69764 (67.28%)
Degree = 80: Nodes 757 (10.64%), Edges 36121 (34.84%)
Degree = 120: Nodes 428 (6.02%), Edges 19113 (18.4%)
Degree = 160: Nodes 262 (3.68%), Edges 9959 (9.6%)
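A minimal sketch of this kind of degree filtering, assuming a single pass that keeps nodes whose total degree (in plus out) meets the threshold and then keeps only edges between surviving nodes. The function name `filter_by_degree` and the toy edge list are hypothetical, and whether this matches the exact Gephi filter settings used for the slides is an assumption.

```python
from collections import Counter

def filter_by_degree(edges, min_degree):
    """Keep nodes whose total degree (in + out) is >= min_degree, plus edges among them."""
    edges = [(u, v) for u, v in edges if u != v]  # drop self-loops first
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    kept_nodes = {n for n, d in degree.items() if d >= min_degree}
    kept_edges = [(u, v) for u, v in edges if u in kept_nodes and v in kept_nodes]
    return kept_nodes, kept_edges

# Hypothetical toy edge list of (voter, candidate) pairs; "e" votes for itself.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("b", "c"), ("d", "c"), ("e", "e")]
nodes, kept = filter_by_degree(edges, min_degree=3)
print(sorted(nodes), len(kept))

# Shares reported on the slide for Degree = 10, recomputed from the raw counts.
node_share = round(100 * 2864 / 7115, 2)
edge_share = round(100 * 93681 / 103689, 2)
print(node_share, edge_share)
```

The last two lines recompute the first row of the slide's figures against the full network size of 7,115 nodes and 103,689 edges.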
  17. 160
  18. Less than 2% of the entire network makes up the strongest connected component within the network. These are presumably the members with the strongest possibility of becoming an administrator. Note that strong triadic closure is demonstrated through a small number of people voting a great deal.
  19. High betweenness: vertices with high betweenness may have considerable influence within a network by virtue of their control over information passing between others. Eigenvector centrality: the assumption is that each node's centrality is proportional to the sum of the centrality values of the nodes that it is connected to. Eigenvector centrality is another means of gauging influence. A node with a smaller number of higher-quality connections can have a higher eigenvector centrality than a node with a larger number of lower-quality connections.
OLD: LSCC: nodes in the largest strongly connected component. Adjust it to the giant component stats. Maximum node degree: 1167. Size of LSCC: 1300 (the size of the maximal subset of nodes such that there is a directed path from each node to each other node). Strongly connected component id: 1016. Preferential attachment, similar to how we interpreted it earlier: the probability that a new node links up with an existing node, such as an earlier administrator, is proportional to that node's current number of edges. 80% of all nodes have a betweenness centrality of 0.
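The eigenvector centrality idea in this note can be sketched with plain power iteration. The toy graph is hypothetical, and iterating on A + I (adding each node's own score) is a common stabilizing choice assumed here, not something the slides specify; it leaves the ranking unchanged.

```python
from collections import defaultdict

# Hypothetical undirected toy graph: hub "h" plus one extra tie a-b.
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b")]
nbrs = defaultdict(set)
for u, v in edges:
    nbrs[u].add(v)
    nbrs[v].add(u)

def eigenvector_centrality(nbrs, iterations=100):
    """Power iteration on (A + I); the identity shift avoids oscillation."""
    score = {n: 1.0 for n in nbrs}
    for _ in range(iterations):
        # Each node's new score is its own score plus the sum of its neighbors' scores.
        new = {n: score[n] + sum(score[m] for m in nbrs[n]) for n in nbrs}
        norm = sum(v * v for v in new.values()) ** 0.5
        score = {n: v / norm for n, v in new.items()}
    return score

centrality = eigenvector_centrality(nbrs)
for node, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(node, round(value, 3))
```

Nodes "a" and "b" score above "c" and "d" even though all four are tied to the hub, because each also has a second, well-connected neighbor.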
  20. Essentially there are two parts to the graph: the outer editors/voters, and presumably the power players/existing admins, or other popular users up for adminship.
  21. Information we would have liked to have: who are the admins? And who are their network cohorts? Had we had access to this information, we would have been able to home in on the impact of the administrators' entire networks.
  22. Describe RfA. Dataset from start of Wikipedia in 2001 to 2008. Data: 7,066 nodes; 103,663 edges. Only includes nominations for other users (not self-nominations).