In England, an important role in the judgement of educational quality is played by the national school inspectorate, Ofsted. It periodically inspects schools and judges them; the result of each inspection is captured in inspection reports and associated documents. Ofsted has had several chief inspectors (HMCIs) since 2000, and every HMCI tends to put his or her own mark on the inspectorate. This paper extends the analysis in Author (2020) of a corpus of more than 17,000 Ofsted documents, which were scraped from the inspectorate's website with text-mining techniques. Using the computational research method of structural topic modelling, I re-analyse a set of documents that typically could not be analysed with manual methods, and juxtapose the findings with previous findings from sentiment analyses. The paper does not just cover the substantive topic at hand, but also provides insight into how the methods work and how they reveal policy shifts during the ‘reign’ of different HMCIs. All in all, we can see how such text-mining techniques allow us to analyse existing documents at scale.
1. STRUCTURAL TOPIC MODELLING
OF OFSTED DOCUMENTS
BERA seminar, 16 September 2021
Dr Christian Bokhove
Southampton Education School
University of Southampton
Trusting the text: corpus-assisted approaches in education research
2. Contents
1. Computational Social Science
2. Ofsted’s inspection context and prior
research
3. Structural Topic Modelling
4. Conclusions
3. Computational research methods
An approach that relies on automated, computer-based analysis of information
to answer education research questions.
The methods can include one or more of the following:
• Analysis depends on algorithms, including the use of
• Artificial intelligence (AI) - computers make complex, human-like judgements
• Machine learning (ML) - computers learn from data to mimic human behaviour
• Data sets are usually large-scale ('Big Data'); sometimes millions of
sources are collected and analysed.
• Information already exists, rather than being collected specifically for
research.
• 'Scraping' from websites (news, reports, blogs, etc)
• Extraction from databases and archives created for other purposes (e.g. journal
contents, interactions with a learning platform)
• Social networks (e.g. social media)
• Simulating new data
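The 'scraping' category above can be sketched with a minimal example. The studies discussed here used R-based pipelines; purely as an illustration in Python, the standard-library `html.parser` can extract the visible text from a downloaded report page. The HTML snippet is invented for the example, not taken from a real report.

```python
# A minimal sketch of automated information extraction: pulling the visible
# text out of an HTML page (e.g. a downloaded inspection report page).
# Illustrative only; the cited work used R-based scraping pipelines.
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)


page = "<html><body><h1>Inspection report</h1><p>The school is good.</p></body></html>"
print(extract_text(page))  # Inspection report The school is good.
```

In a real pipeline the HTML would be fetched from the inspection website and the extracted text stored per document before analysis.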
4. Adapting Cioffi-Revilla (2017), we can distinguish different
types of computational social science, each with associated
computational research methods.
• Automated social information extraction;
• Social networks and social complexity;
• Social simulation modelling.
Here, we focus on the first category.
Cioffi-Revilla, C. (2017). Introduction to computational social science (2nd edition).
London, UK: Springer.
5.
6. For example, Bokhove
(2015) scraped thousands of
OFSTED reports from the
inspection website to answer
the question whether topics
and sentiments in the
reports had changed over
time, so-called ‘sentiment
analysis.
Bokhove, C. (2015). Text mining school inspection reports in England with R. University of
Southampton.
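To give a hedged sense of what a sentiment analysis computes: the published work used R (with validated sentiment lexicons), but a toy lexicon-based scorer in Python might look like the following. The positive and negative word lists here are invented for illustration, not the lexicons used in the actual study.

```python
# A toy lexicon-based sentiment score: (positive - negative) word counts,
# normalised by document length. The word lists are made up for illustration;
# real analyses use validated lexicons.
import re

POSITIVE = {"good", "outstanding", "excellent", "effective", "strong"}
NEGATIVE = {"inadequate", "weak", "poor", "ineffective", "concerns"}


def sentiment_score(text: str) -> float:
    """Return (positive - negative) matches divided by total word count."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


print(sentiment_score("Teaching is good and leadership is strong."))  # 2/7
```

Averaging such scores per document, and then per inspection grade or per HMCI, yields summaries like the boxplots on the following slides.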
7.
8. Bokhove, C., & Sims, S. (2020). Demonstrating the potential of text mining for analyzing school inspection
reports: a sentiment analysis of 17,000 Ofsted documents. International Journal of Research and Method in
Education. https://doi.org/10.1080/1743727X.2020.1819228
9.
10. Boxplot showing the distribution of sentiment scores by inspection grade. N=3,155.
11. Average sentiment score for the corpus of inspection documents by Chief Inspector. N=17,212.
13. Analytical approach
• 3155 documents, classified by judgement
• Outstanding
• Good
• Requiring Improvement
• Satisfactory
• Inadequate
• Lower case, stemming, remove stopwords, remove
punctuation, remove numbers
• “Your corpus now has 3155 documents, 1435 terms and
1767508 tokens.”
• Judgement as covariate.
• Age as covariate.
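The preprocessing steps listed above (lowercasing, stemming, removing stopwords, punctuation and numbers) can be sketched as follows. The actual pipeline used R; in this Python version the stopword list is abbreviated and the suffix-stripping "stemmer" is a crude stand-in for a proper stemming algorithm.

```python
# A sketch of the preprocessing steps on the slide: lowercase, remove
# numbers, strip punctuation, drop stopwords, then crudely stem.
# The stopword list and stemmer are simplified stand-ins for illustration.
import re

STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "in", "to", "for"}
SUFFIXES = ("ing", "ers", "er", "ed", "s")  # crude; not a real Porter stemmer


def crude_stem(word: str) -> str:
    """Strip the first matching suffix if enough of the word remains."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word


def preprocess(text: str) -> list:
    text = text.lower()
    text = re.sub(r"[0-9]+", " ", text)   # remove numbers
    tokens = re.findall(r"[a-z]+", text)  # strips punctuation too
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


print(preprocess("The 25 teachers are improving behaviour in lessons."))
```

After these steps each document becomes a bag of stemmed terms, which is the input the topic model actually sees (hence counts like "3155 documents, 1435 terms").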
14.
15. Ten topics; some make sense and are more prevalent in
‘inadequate’ and ‘requiring improvement’ reports.
Some topics are hard to gauge:
16.
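Structural topic modelling estimates how topic prevalence varies with document covariates such as the judgement. As a much-simplified illustration of that idea (not the stm estimation itself, which models covariates in the prior), one can average per-document topic proportions within each judgement group. All proportions below are invented toy numbers, not results from the actual corpus.

```python
# Simplified illustration of "topic prevalence by covariate": average
# per-document topic proportions within each judgement group.
# The proportions are invented toy data, not results from the corpus.
from collections import defaultdict

N_TOPICS = 3  # toy model with three topics

# (judgement, [proportion of topic 1, topic 2, topic 3]) per document
docs = [
    ("outstanding", [0.6, 0.3, 0.1]),
    ("outstanding", [0.5, 0.4, 0.1]),
    ("inadequate",  [0.1, 0.2, 0.7]),
    ("inadequate",  [0.2, 0.1, 0.7]),
]


def mean_prevalence(docs):
    """Mean topic proportions per judgement group."""
    sums = defaultdict(lambda: [0.0] * N_TOPICS)
    counts = defaultdict(int)
    for judgement, props in docs:
        counts[judgement] += 1
        for k, p in enumerate(props):
            sums[judgement][k] += p
    return {j: [s / counts[j] for s in sums[j]] for j in sums}


for judgement, means in mean_prevalence(docs).items():
    print(judgement, [round(m, 2) for m in means])
```

In stm itself the judgement (and age) covariates enter the model directly, so differences in prevalence come with regression-style uncertainty estimates rather than raw group means.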
17. • Munoz-Najar Galvez et al. (2020) used text analysis to
study the paradigm wars in graduate research in the field of
education.
• Topic modelling by Inglis and Foster (2018) with the
package MALLET, to study evidence of the ‘social turn’ in
five decades of mathematics education research.
Munoz-Najar Galvez, S., Heiberger, R., & McFarland, D. (2020). Paradigm wars
revisited: A cartography of graduate research in the field of education (1980–2010).
American Educational Research Journal, 57(2), 612-652.
Other examples…
Inglis, M., & Foster, C. (2018). Five decades of mathematics education research.
Journal for Research in Mathematics Education, 49(4), 462-500.
18. Conclusions
• Large corpora of documents can be analysed at scale
with computational methods (e.g. text mining).
• There are several methods to do this, for example
sentiment analysis and (structural) topic modelling.
• Some methods allow for including other variables.
• Real-world documents are messy and probably require
plenty of cleaning. Interpretation can be a challenge.
• Choosing the number of topics is not straightforward. There
are methods for this (e.g. ‘perplexity’).
• Computational methods work well in combination with
qualitative methods e.g. ‘quotes in context’.
19. Thank you - Questions
• C.Bokhove@soton.ac.uk
• Southampton Education School
• Twitter: @cbokhove
• Website: www.bokhove.net