Deploying Open Learning Analytics at a National Scale
1. Lessons from the Real World
Oct 2016 Deploying Open Learning Analytics at National Scale
2. Michael Webb
Director of Technology and Analytics, Jisc
Eitel J.M. Lauría, PhD
Professor of Information Technology & Systems, Marist College
Kate Valenti
Vice President of Operations, Unicon
3. Agenda
» Strategic View
› A brief introduction to Learning Analytics
› National issues in the UK
» Technical View
› Open architecture
› Predictive modeling
» Implementation View
› Trends and tactics from the field
» Discussion
5. What do we mean by Learning Analytics?
» The application of big data techniques such as machine learning and data mining to help learners and institutions meet their goals:
› For our project:
– Improve retention (current project)
– Improve achievement (current project)
– Improve employability (current project)
– Improve learning design (later stage)
6. Learning Analytics stages get progressively “smarter”
Basic Analytics: What has happened
Automated Analytics: What is happening
Predictive Analytics: What might happen
8. National issues in the UK: Retention
» 16-18 Education:
› 178,100 students aged 16-18 failed to finish (2012/13)
› costing the UK £814 million a year
» Undergraduates:
› 8% of undergraduates drop out in their first year of study
› This costs universities up to £33,000 per student
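As a rough worked figure, the two 16-18 numbers quoted above imply an average cost per student who failed to finish. The sketch below only divides one quoted figure by the other; the variable names are illustrative, and the result is a derived average, not a number stated on the slide.

```python
# Figures quoted on the slide for 16-18 education (2012/13).
students_not_finishing = 178_100
annual_cost_gbp = 814_000_000

# Average annual cost per student who failed to finish (simple division).
cost_per_student = annual_cost_gbp / students_not_finishing
print(f"Roughly £{cost_per_student:,.0f} per student")
```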
9. National issues in the UK: Differential achievement
» Parental background and ethnicity impact achievement:
10. National issues in the UK: Differential achievement
2/03/2016 The case for Learning Analytics
» Which behaviours are associated with lower than expected academic achievement?
13. Jisc’s Learning Analytics project
Three core strands of Jisc Learning Analytics:
» Learning Analytics architecture and service
» Toolkit
» Community
14. Toolkit: Code of practice
15. Jisc Learning Analytics architecture
What?
» Building a national architecture
» Defined standards and models
» Implementation with core services
Why?
» Standards mean models, visualisations and so on can be shared
» Lower cost per institution through shared infrastructure
» Lower barrier to innovation – the underpinning work is already done
16. What do we mean by an open architecture?
» All APIs published, with a process for engaging in their development
» Open standards and definitions
› Data models and definitions under Creative Commons
› Developed openly on GitHub
» All core elements open source or open specification (e.g. Creative Commons)
» Freedom to implement both commercial and open solutions as the non-core elements
17. Jisc Learning Analytics open architecture: core
Layers: Data Collection; Data Storage and Analysis; Presentation and Action
Components: Alert and Intervention system; Staff Dashboards; Consent; Student App; Learning Analytics Processor; Learning Records Warehouse; Student Records; VLE; Library; DataExplorer; Self Declared Data
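The slide does not say how collected data reaches the Learning Records Warehouse. Since the closing slide mentions xAPI recipes, the following is a minimal sketch, assuming VLE activity is recorded as xAPI-style statements (actor, verb, object, timestamp). The helper name `build_statement`, the email address and the activity URL are hypothetical and only illustrate the shape of such a record.

```python
import json
from datetime import datetime, timezone

def build_statement(student_email, verb, activity_id, when):
    """Build a minimal xAPI-style statement (hypothetical helper)."""
    return {
        "actor": {"objectType": "Agent", "mbox": f"mailto:{student_email}"},
        "verb": {
            "id": f"http://adlnet.gov/expapi/verbs/{verb}",
            "display": {"en-GB": verb},
        },
        "object": {"objectType": "Activity", "id": activity_id},
        "timestamp": when.isoformat(),
    }

# Hypothetical event: a student completes a quiz in the VLE.
statement = build_statement(
    "student@example.ac.uk",
    "completed",
    "https://vle.example.ac.uk/course/101/quiz/3",
    datetime(2016, 10, 3, 9, 30, tzinfo=timezone.utc),
)
print(json.dumps(statement, indent=2))
```

Keeping every source (student records, VLE, library) in one statement shape is what would let shared models and dashboards read from the warehouse without per-institution adapters.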
18. Meanwhile, in the US...
Learning Analytics Processor: Predictive Modeling Framework
19. Motivation: Alarming Stats in 2010
36%: 4-year completion rate across all four-year institutions in the US
• 21% for Black students
• 25% for Hispanic students
58%: 6-year completion rate for four-year institutions
• 40% for Black students
• 49% for Hispanic students
41%: 25-to-34-year-olds with an associate degree or higher (US ranked 12th among 36 developed nations)
Sources: U.S. Dept. of Education, Integrated Postsecondary Education Data System (2009); College Board Advocacy & Policy Center, The Completion Agenda 2011 Progress Report
20. Open Academic Analytics Initiative @ Marist
EDUCAUSE Next Generation Learning Challenges (NGLC) grant
Funded by the Bill & Melinda Gates Foundation
Use machine learning to find patterns in large datasets as a means to predict student academic performance.
Create “early alert” framework:
• Predict academically at-risk students in initial weeks of a course
• Deploy intervention to improve chances of success
Based on an open ecosystem for academic analytics
• Sakai Collaboration and Learning Environment
• Pentaho Business Intelligence Suite (Kettle + Weka)
• Collaboration with commercial vendors (IBM SPSS Modeler)
21. Learning Analytics Processor @ Marist: Early Alert
How does it actually work? (binary classification problem)
Single node architecture
Hardware Platform: IBM zEnterprise 114 with BladeCenter Extension (zBX)
Virtualized Servers: 64 bit, 16/32 GB RAM, Linux Red Hat
Relational Storage
Input data:
• Student Academic Data: SATs, GPA, HS ranking, Course size, Course grade (target feature)
• Student Demographic Data: Age, gender, ethnicity, income level
• LMS Event Log Data: Sessions, Resources, Lessons, Assignments, Forums, Tests
• LMS Gradebook Data: Partial contributions to final grade
Pipeline:
• Extraction, Transformation & Loading
• Predictive Model Building (classifiers learnt from data): Logistic Regression, SVMs, Naïve Bayes, J48 Decision Trees
• Scoring (predictions on new student data using library of persisted learnt classifiers)
• New Student Data (early in the Semester) → Prediction of at-risk students → Intervention
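The build, persist and score steps of this early-alert pipeline can be sketched as follows. This is a toy illustration under stated assumptions, not the Marist implementation: it uses a plain-Python logistic regression (one of the classifier families named on the slide) trained by gradient descent, two made-up features scaled to 0-1, and `pickle` as a stand-in for the library of persisted classifiers. All function names, feature values and student ids are hypothetical.

```python
import math
import pickle

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, x):
    """Estimated probability that a student is at risk (weights[0] is the bias)."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def train_logistic(rows, labels, lr=0.5, epochs=5000):
    """Fit logistic regression weights by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = score(weights, x) - y
            grads[0] += err
            for j, v in enumerate(x):
                grads[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights

# Toy history: [GPA scaled 0-1, LMS sessions scaled 0-1]; label 1 = at risk.
train_x = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.3, 0.2], [0.2, 0.4], [0.4, 0.1]]
train_y = [0, 0, 0, 1, 1, 1]

# Model building, then persistence of the learnt classifier.
model_blob = pickle.dumps(train_logistic(train_x, train_y))

# Scoring: load the persisted classifier and flag new students early in the term.
weights = pickle.loads(model_blob)
new_students = {"s1": [0.8, 0.75], "s2": [0.25, 0.3]}
at_risk = [sid for sid, x in new_students.items() if score(weights, x) >= 0.5]
print(at_risk)
```

Separating model building from scoring mirrors the slide: classifiers are learnt once from historical semesters and then reused on new student data, so scoring early in the semester does not require retraining.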
22. Learning Analytics Processor @ Marist: Early Alert
New Iteration: Cluster Computing Architecture
Scales well for Big Data use cases (more volume & variety)
Hardware Platform (Dev): Linux VMs (32GB RAM) running on IBM PureFlex System
Distributed Storage (HDFS) & Processing
Job Scheduling
Input data (labelled CURRENT and FUTURE on the slide):
• Student Academic Data
• Student Demographic Data
• LMS Event Log Data
• LMS Gradebook Data
• Library Data
• Student Engagement Data
• Social Network Data
• and more …
Pipeline:
• Extraction, Transformation & Loading
• Predictive Model Building (classifiers learnt from data): Logistic Regression, Random Forests, Naïve Bayes
• Scoring (predictions on new student data using library of persisted learnt classifiers)
• New Student Data (early in the Semester) → Prediction of at-risk students → Intervention
23. Promising Outcomes
Phase II: Cluster Computing              Accuracy   Recall   FP Rate
Marist
- 3 semesters, 25K records each            86%        87%      14%
North Carolina State University
- 3 semesters, 160K recs each              81%        77%      18%
- 3 semesters, online, 85K recs each       80%        82%      19%
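The slide does not define its three metrics. A minimal sketch using the standard confusion-matrix definitions follows, assuming the positive class is "at risk". The function name and the ten toy labels are hypothetical and are not drawn from the Marist or NCSU data.

```python
def classification_metrics(actual, predicted):
    """Accuracy, recall and false-positive rate for binary labels (1 = at risk)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # at risk, flagged
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # at risk, missed
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # not at risk, flagged
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # not at risk, not flagged
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn),
        "fp_rate": fp / (fp + tn),
    }

# Toy example: 4 students truly at risk, 6 not.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

For early alert, recall and FP rate matter together: recall is the share of at-risk students who are caught, while the FP rate is the share of students flagged unnecessarily among those not at risk.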
Jisc Project:
• 260,000 records
• 4 institutions (Aberystwyth University, University of Gloucestershire, Cardiff
Metropolitan University, University of Greenwich)
• Results due in December 2016
26. Discovery activity assesses institutional readiness
– Goal: to assess institutional readiness (think organizational maturity)
» Measured on 26 factors crossing organizational and technical considerations
» Approximately 60% of the first 11 institutions are ready to implement Learning Analytics technology solutions
Source: Moving the Red Queen Forward, EDUCAUSE Review, September/October 2016, Dahlstrom
27. Varied activities show adoption flexibility
Profile: Russell Group
  Aim: Retention of widening participation + support for students to achieve 2.1 or better
  Activity: Discovery + Tribal Insight + Learning Locker
  Data sources: Moodle + Student Records
Profile: Research led University
  Aim: Retention, improve teaching, empowering students
  Activity: Discovery + OpenSource Suite + Student App
  Data sources: Moodle + Attendance + Student Records
Profile: Teaching led University with WP mission
  Aim: Retention - requirement to make identifying students more efficient so they can focus on interventions
  Activity: Tribal Insight + Learning Locker
  Data sources: Blackboard + Attendance + Student Records
Profile: Research led University
  Aim: Student engagement
  Activity: Discovery + Student App + Learning Locker
  Data sources: Moodle + Student Records
Profile: Teaching led
  Aim: Understanding of how Learning Analytics can be used
  Activity: Discovery + Technical Integration
  Data sources: Moodle
28. It’s (almost) all about change management
Organizational Trends
» Top level support is critical
» Change culture makes things easier
» Red tape is real (in policy management)
» Academics looking for evidence-based results
Tactics
» Convene a Learning Analytics committee (include students)
» Identify champions and advocates
» Adjust existing policies rather than creating new ones
» Pilot the solution
29. It’s (almost) all about the data
Technical Trends
» Institutional infrastructure for data collection requires improvement
» Unified data management desired but not realized
» Data quality issues are common
» Integration with existing infrastructure a challenge
» Doing more with the same technical staff
Tactics
» Perform data audits and quality checks (early and often)
» Look for “all inclusive” offerings (predictions)
» Look for integration options
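The data audits and quality checks recommended on this slide can be sketched as a small routine that counts missing values and duplicate student ids before any modeling starts. This is a minimal sketch under assumptions: the record layout, the field names (`student_id`, `gpa`, `vle_sessions`) and the function name are hypothetical, and a real audit would cover many more checks.

```python
def audit_records(records, required_fields, key="student_id"):
    """Count missing required values and duplicate keys in a list of dicts."""
    missing = {field: 0 for field in required_fields}
    seen, duplicates = set(), set()
    for rec in records:
        for field in required_fields:
            # Treat absent keys, None and empty strings as missing.
            if rec.get(field) in (None, ""):
                missing[field] += 1
        k = rec.get(key)
        if k in seen:
            duplicates.add(k)
        seen.add(k)
    return {"rows": len(records), "missing": missing, "duplicates": sorted(duplicates)}

# Toy extract with one missing GPA, one blank session count and a repeated id.
records = [
    {"student_id": "a1", "gpa": 3.2, "vle_sessions": 14},
    {"student_id": "a2", "gpa": None, "vle_sessions": 3},
    {"student_id": "a2", "gpa": 2.9, "vle_sessions": ""},
]
report = audit_records(records, ["gpa", "vle_sessions"])
print(report)
```

Running a check like this on every extract, rather than once at project start, is one way to act on the "early and often" advice.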
30. Keep it simple, snowflakes
Pilot Trends
» Customizations are required to meet institutional needs
» Ditto for integrations
» Data gathering effort is considerable
» Did we mention data quality?
Tactics
» Simplify for pilot; add complexity later
» Overall, only add components you don’t already have
» Flexibility (by institution and by vendors) is key
31. Q&A
Interested in more detail?
» Data quality challenges
» Predictive model research
» Data collection, UDD, xAPI recipes, use of standards
» Spark, ETL flows
» ?
Michael Webb
michael.webb@jisc.ac.uk
Eitel J.M. Lauría, PhD
eitel.lauria@marist.edu
Kate Valenti
kvalenti@unicon.net