Mining Paper Catalogues
A Multilingual Solution to Reduce Verbose Fields to Consistent Terminology
Tim Evans¹, Felix Kußmaul²
23rd Annual Meeting EAA, Maastricht
31 August 2017
¹ Archaeology Data Service, University of York
² Archaeological Institute, University of Cologne
Data Source
Figure 1: Sample from Ettlinger, Conspectus.
T. Evans & F. Kußmaul (York, Cologne) Mining Paper Catalogues: 31 August 2017 2/22
Oh dear!
Problem
Running texts contain a lot of irrelevant information (for machine processing).
This makes database lookups without keywords extremely inefficient.
What we have: UNSTRUCTURED DATA
What we want: STRUCTURED DATA
[
  {
    "form": "23.1",
    "origin": "Italy",
    "decoration": "none",
    "occurs": "uncommon"
  },
  {
    "form": "23.2",
    "origin": "Italy, not Padana",
    "occurs": "Mediterranean region; North-Italy"
  }
]
Information Extraction
Definition: Information Extraction (IE)
“[IE refers to] the identification and extraction of instances of a particular class
of events or relationships in a natural language text and their transformation
into a structured representation.” – Grishman 1997, Eikvil 1999
IE Process Pipeline
unstructured document −→ Tokenisation −→ Lemmatisation −→ POS-Tagging −→ NER −→ Information Extraction −→ structured data
Tools named in the figure: Stanford, ClearNLP, OpenNLP, CoreNLP, UIMA Ruta, PosMapper, MatePos
Figure 2: IE process pipeline.
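The chaining of stages can be sketched in a few lines of Python. This is only a toy illustration of the first two stages: the `tokenise`, `lemmatise` and `run_pipeline` functions and the `LEMMAS` lookup are hypothetical stand-ins, not the tools named in the figure.

```python
import re
from functools import reduce

def tokenise(text):
    """Split running text into word tokens and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

# Hypothetical lemma lookup; a real lemmatiser relies on a lexicon or a model.
LEMMAS = {"jumps": "jump", "dogs": "dog"}

def lemmatise(tokens):
    """Replace each token by its lemma where the toy lookup knows one."""
    return [LEMMAS.get(token.lower(), token) for token in tokens]

def run_pipeline(text, stages):
    """Feed the output of each stage into the next one."""
    return reduce(lambda data, stage: stage(data), stages, text)

tokens = run_pipeline(
    "The quick brown fox jumps over the lazy dog.",
    [tokenise, lemmatise],
)
print(tokens)
```

Later stages (POS-tagging, NER, extraction) would be appended to the list of stages in the same way.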
Named Entity Recognition
The  quick  brown  fox  jump  over  the  lazy  dog  .
DT   JJ     JJ     NN   VBD   IN    DT   JJ    NN   .
(“jump” is the lemma of the original token “jumps”)
Figure 3: POS-tagging examples after lemmatisation.
Adapting the NER
Most NERs (e. g. Stanford CoreNLP) only recognise 8 entity types:
PERSON, ORGANIZATION, LOCATION, PERCENT, DATE, TIME, MONEY, MISC
So we have to add the custom entity type FORM.
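One simple way to picture a custom FORM entity is a pattern-based tagger. The sketch below is an assumption for illustration, not the project's implementation: the `FORM_PATTERN` regular expression is derived only from designations that appear in these slides ("23.1", "Subform 23.2", "K 612"), and `tag_forms` is a hypothetical helper.

```python
import re

# Assumed surface patterns for form designations; real catalogues will need more.
FORM_PATTERN = re.compile(r"\b(?:(?:Sub)?[Ff]orm\s+\d+(?:\.\d+)?|K\s?\d+)\b")

def tag_forms(text):
    """Return (start, end, matched text, label) for every candidate FORM mention."""
    return [
        (match.start(), match.end(), match.group(0), "FORM")
        for match in FORM_PATTERN.finditer(text)
    ]

entities = tag_forms("Subform 23.2 occurs in North Italy, unlike K 612.")
for entity in entities:
    print(entity)
```

The character offsets make it possible to merge these spans with the entities produced by a standard NER component.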
Two approaches for NER
Rule-based approach
• High precision, but lower recall
⇒ Many many rules?!
Figure 4: Excerpt from Gempeler,
Elephantine X.
Machine-learning approach
• Lower precision, but high recall
• Needs to be trained!
Figure 5: Manually annotated sentence
from Ettlinger, Conspectus in iepy.
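The precision/recall trade-off between the two approaches can be made concrete with a small calculation. The sets below are invented for illustration only (they are not project results), and `precision_recall` is a hypothetical helper.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall of predicted entities against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Invented example data: four annotated forms in the gold standard.
gold = {"23.1", "23.2", "K 612", "K 613"}
rule_based = {"23.1", "23.2"}  # finds few entities, but all are correct
learned = {"23.1", "23.2", "K 612", "K 613", "Italy", "Padana"}  # finds all, plus noise

print(precision_recall(rule_based, gold))
print(precision_recall(learned, gold))
```

In this made-up case the rule-based output has perfect precision but misses half of the forms, while the learned output finds every form at the cost of two false positives.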
Temporal Expressions
With HeidelTime, temporal expressions are mapped to the TIMEX3 standard:
around 140 B. C. −→ APPROX BC0140
Spätes 3.–4. Jh. n. Chr. (“late 3rd–4th century A. D.”) −→ END 02; 03
second quarter first century B. C. −→ XXXX-Q2 BC00
first half third century A. D. −→ XXXX-H1 02
HeidelTime also supports many other languages, e. g. German, Italian, French, …
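To illustrate what such a normalisation step does, here is a toy normaliser for a single pattern, "[around] <year> B. C.". It is not HeidelTime's implementation or API: the `YEAR_BC` pattern and `normalise_bc_year` function are hypothetical, and the output format simply copies the first example above (the form without "around" is an extrapolation from it).

```python
import re

# Toy pattern for one expression type only; everything else is left untouched.
YEAR_BC = re.compile(r"(?P<approx>around\s+)?(?P<year>\d{1,4})\s+B\.\s?C\.")

def normalise_bc_year(expression):
    """Map '[around] <year> B. C.' to a normalised value, or None if no match."""
    match = YEAR_BC.search(expression)
    if match is None:
        return None
    value = f"BC{int(match.group('year')):04d}"  # zero-pad the year to four digits
    return f"APPROX {value}" if match.group("approx") else value

print(normalise_bc_year("around 140 B. C."))
```

Century and quarter expressions such as those listed above would need their own rules, which is where a dedicated temporal tagger earns its keep.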
Relation Extraction
Subject            Relation     Object
quick brown fox    jump over    lazy dog
K 612              dates        03 [1]
Subform 23.2       occurs       North Italy
Subform 23.2       dates        XXXX-Q2 00 [2]
⇒ e. g.
{ "form": "23.2",
  "dating": "XXXX-Q2 00" }
[1] “4th century A. D.”
[2] “second and third quarters of the first century A. D.”
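The step from extracted triples to a structured record can be sketched as follows. This is a minimal sketch under stated assumptions: the `FIELD_FOR_RELATION` mapping and the `triples_to_records` helper are hypothetical, and only the field names "form", "occurs" and "dating" come from the slides.

```python
# Assumed mapping from relation names to record fields.
FIELD_FOR_RELATION = {"occurs": "occurs", "dates": "dating"}
SUBFORM_PREFIX = "Subform "

def triples_to_records(triples):
    """Group (subject, relation, object) triples into one record per form."""
    records = {}
    for subject, relation, obj in triples:
        field = FIELD_FOR_RELATION.get(relation)
        if field is None:
            continue  # ignore relations that have no target field
        form = subject[len(SUBFORM_PREFIX):] if subject.startswith(SUBFORM_PREFIX) else subject
        record = records.setdefault(form, {"form": form})
        record[field] = obj
    return list(records.values())

triples = [
    ("quick brown fox", "jump over", "lazy dog"),
    ("Subform 23.2", "occurs", "North Italy"),
    ("Subform 23.2", "dates", "XXXX-Q2 00"),
]
records = triples_to_records(triples)
print(records)
```

Triples whose relation is not mapped, such as the "quick brown fox" example, are simply dropped.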
MULTILINGUALISM
Background
Two problems:
• Linguistic
• Conceptual
Different languages
Different traditions
Figure 6: Plate, platter or dish?
Creating controlled vocabularies
Creating wordlists that the project team thought would be most useful to describe
the key features of a vessel or sherd:
• Sherd type (e.g. rim or handle)
• Form (e.g. plate or bowl)
• Decoration form (e.g. burnished)
• Decoration color (e.g. yellow)
• Fabric (e.g. Dressel 28 fabric)
Lessons from ARIADNE
Used tools and methodology developed for the ARIADNE project by the
Hypermedia Research Group at the University of South Wales
• Created a neutral spine based on the Getty Research Institute’s Art & Architecture
Thesaurus (AAT)
• This spine was populated by members from partner organisations,
identifying common terms and concepts within it
• Project partners then mapped terms in their language to this neutral spine
• French terms supplied courtesy of a 2001 Master’s thesis by Caroline Sourzat
(thanks to Eleni Schindler Kaudelka for identifying this on the ArchAIDE blog!)
Mapping terms and concepts (part 1)
Often this was very straightforward, for example:
• The Italian terms graffita, graffita a punta, graffita a stecca = “sgraffito”
(http://vocab.getty.edu/aat/300266416)
• The Spanish term Cántaro = “jars” (http://vocab.getty.edu/aat/300195348)
• The German terms gebogener Henkel, Ohrförmiger Henkel, langer
Vertikalhenkel = “handles” (http://vocab.getty.edu/aat/300266416)
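A mapping of this kind amounts to a lookup from (language, term) to a concept URI. The sketch below uses only the Italian and Spanish mappings quoted above; the language codes, the `TERM_TO_CONCEPT` table and the `lookup` helper are assumptions for illustration.

```python
import unicodedata

AAT = "http://vocab.getty.edu/aat/"

def _key(language, term):
    """Normalise case, whitespace and Unicode form so lookups are robust."""
    return language, unicodedata.normalize("NFC", term.strip().lower())

# Mappings as quoted on the slide; language codes are illustrative labels.
TERM_TO_CONCEPT = {
    _key("it", "graffita"): AAT + "300266416",
    _key("it", "graffita a punta"): AAT + "300266416",
    _key("it", "graffita a stecca"): AAT + "300266416",
    _key("es", "Cántaro"): AAT + "300195348",
}

def lookup(language, term):
    """Return the concept URI for a term, or None if it is not mapped."""
    return TERM_TO_CONCEPT.get(_key(language, term))

print(lookup("es", "cántaro"))
print(lookup("it", "Graffita a punta"))
```

Several local terms can point at the same concept, which is what reduces verbose descriptions to consistent terminology.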
Mapping terms and concepts (part 2)
Often this was more complicated, with partners having differing perceptions of
what to call something (e.g. “plate” versus “platter”).
In truth, this confusion may also be reflected by what has come out of the ground!
An advantage of using the AAT (a “SKOS’d” thesaurus) is that ambiguity or
differences in nomenclature can be resolved by a broader term or concept, so for
example …
Mapping terms and concepts (part 3)
Looking at the hierarchies for plate and platter in the AAT, we can see that both
are “dishes (vessels for food)” or, even broader, “culinary containers”. So while
we can retain our original classifications (and this is essential for text mining), we
can agree at a fundamental level on what these objects are.
Figure 7: AAT Hierarchies for Plate and Platter
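Resolving two terms to a shared broader concept can be sketched as a walk up broader-term links. The `BROADER` table below is a hypothetical, simplified stand-in: it assumes direct links between the concepts named above, whereas the real AAT hierarchy may contain additional levels.

```python
# Simplified, assumed broader-term links (child -> parent).
BROADER = {
    "plate": "dishes (vessels for food)",
    "platter": "dishes (vessels for food)",
    "dishes (vessels for food)": "culinary containers",
}

def ancestors(concept):
    """List the broader concepts of a concept, nearest first."""
    chain = []
    while concept in BROADER:
        concept = BROADER[concept]
        chain.append(concept)
    return chain

def nearest_common_broader(first, second):
    """Return the closest broader concept shared by both terms, or None."""
    shared = set(ancestors(second))
    for concept in ancestors(first):
        if concept in shared:
            return concept
    return None

print(nearest_common_broader("plate", "platter"))
```

The original labels stay attached to each record; the shared broader concept is only used when records from different traditions need to be compared.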
Outlook
• Recognize reigns of emperors as DATE entities
• Coreferences in general
• HeidelTime:
second and third quarter of the first century A. D. −→ XXXX-Q3; 00
• Returning to difference in ceramic recording details
• Fabric names often contain locations, e. g. Magdalensberg xyz
• Location sometimes narrow, sometimes whole regions
• In many cases the form is not named in particular but just described
References
Ettlinger, Elisabeth. Conspectus formarum terrae sigillatae Italico modo confectae.
Ed. by Deutsches Archäologisches Institut zu Frankfurt and
Römisch-Germanische Kommission. Materialien zur römisch-germanischen
Keramik. Bonn: Habelt, 1990.
Gempeler, Robert D. Elephantine X. Die Keramik römischer bis früharabischer Zeit.
Mainz: Von Zabern, 1992.
Thank you very much for your attention!
Questions?
This project has received funding from the European Union’s Horizon 2020
research and innovation programme under grant agreement № 693548
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AI
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
 

Mining Paper Catalogues: A Multilingual Solution to Reduce Verbose Fields to Consistent Terminology

  • 1. Mining Paper Catalogues: A Multilingual Solution to Reduce Verbose Fields to Consistent Terminology. Tim Evans (Archaeology Data Service, University of York), Felix Kußmaul (Archaeological Institute, University of Cologne). 23rd Annual Meeting EAA, Maastricht, 31 August 2017.
  • 2. Data Source Figure 1: Sample from Ettlinger, Conspectus. T. Evans & F. Kußmaul (York, Cologne) Mining Paper Catalogues: 31 August 2017 2/22
  • 3. Oh dear! Problem: Running texts contain a lot of information that is irrelevant for machine processing. This makes database lookups without keywords extremely inefficient.
  • 4. What we have: unstructured data (running catalogue text). What we want: structured data: { "form": "23.1", "origin": "Italy", "decoration": "none", "occurs": "uncommon" }, { "form": "23.2", "origin": "Italy, not Padana", "occurs": "Mediterranean region; North-Italy" }
  • 7. Information Extraction. Definition: Information Extraction (IE): “[IE refers to] the identification and extraction of instances of a particular class of events or relationships in a natural language text and their transformation into a structured representation.” – Grishman 1997, Eikvil 1999
  • 8. IE Process Pipeline. Figure 2: IE process pipeline: unstructured document → Tokenisation → Lemmatisation → POS-Tagging → NER → Information Extraction → structured data. Tools named in the figure: Stanford CoreNLP, ClearNLP, OpenNLP, UIMA Ruta, PosMapper, MatePos.
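The first stages named in Figure 2 can be pictured as a chain of small functions. The following is only a toy sketch in plain Python, not the CoreNLP, OpenNLP or UIMA Ruta setup from the slide: the lemma table, the tag lexicon and all function names are assumptions made up for illustration.

```python
import re

# Hypothetical toy resources; a real pipeline would use trained models.
LEMMAS = {"occurs": "occur"}
TAGS = {"occur": "VB", "in": "IN", "north": "NNP", "italy": "NNP", ".": "."}


def tokenise(text):
    """Split text into numbers (e.g. 23.2), words and punctuation marks."""
    return re.findall(r"\d+(?:\.\d+)?|\w+|[^\w\s]", text)


def lemmatise(tokens):
    """Replace each token by its lemma if the toy table knows one."""
    return [LEMMAS.get(token.lower(), token) for token in tokens]


def pos_tag(lemmas):
    """Tag numbers as CD, look up the rest, and fall back to NN."""
    tagged = []
    for lemma in lemmas:
        if lemma[0].isdigit():
            tag = "CD"
        else:
            tag = TAGS.get(lemma.lower(), "NN")
        tagged.append((lemma, tag))
    return tagged


def run_pipeline(text):
    """Tokenisation -> lemmatisation -> POS tagging, in that order."""
    return pos_tag(lemmatise(tokenise(text)))
```

Calling `run_pipeline("Form 23.2 occurs in North Italy.")` yields one `(lemma, tag)` pair per token; the NER and extraction stages would consume that list.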
  • 9. Named Entity Recognition. Figure 3: POS-tagging example after lemmatisation (“jumps” is lemmatised to “jump”): The/DT quick/JJ brown/JJ fox/NN jump/VBD over/IN the/DT lazy/JJ dog/NN ./.
  • 11. Adapting the NER. Most NERs (e. g. Stanford CoreNLP) only recognise 8 entity types: PERSON, DATE, ORGANIZATION, TIME, LOCATION, MONEY, PERCENT, MISC. So we have to add the custom entity type FORM.
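As a rough illustration of what a custom FORM entity could look like, here is a minimal rule-based recogniser. It assumes that form labels appear as "Form 23" or "Subform 23.2" (the latter label is used later in the deck); the pattern and the function name `find_forms` are hypothetical and not part of any NER toolkit.

```python
import re

# Assumed surface pattern for form labels, e.g. "Form 23" or "Subform 23.2".
FORM_PATTERN = re.compile(r"\b(?:Subform|Form)\s+(\d+(?:\.\d+)?)")


def find_forms(text):
    """Return (start, end, label, value) for every FORM mention in text."""
    return [
        (match.start(), match.end(), "FORM", match.group(1))
        for match in FORM_PATTERN.finditer(text)
    ]
```

A single pattern like this is precise but misses every spelling it was not written for, which is exactly the recall problem of the rule-based route.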
  • 15. Two approaches for NER. Rule-based approach: high precision, but lower recall ⇒ many, many rules?! (Figure 4: Excerpt from Gempeler, Elephantine X.) Machine-learning approach: lower precision, but high recall; needs to be trained! (Figure 5: Manually annotated sentence from Ettlinger, Conspectus in iepy.)
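The precision/recall trade-off can be made measurable by comparing predicted entity spans with a manual annotation such as the one in Figure 5. The sketch below uses made-up spans purely for illustration; none of the numbers come from the project.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall for two collections of entity spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up (start, end, label) spans, not real annotation data.
gold = {(0, 12, "FORM"), (23, 34, "LOCATION"), (40, 47, "FORM"), (60, 70, "DATE")}
rule_based = {(0, 12, "FORM"), (40, 47, "FORM")}
learned = {(0, 12, "FORM"), (23, 34, "LOCATION"), (40, 47, "FORM"),
           (50, 55, "FORM"), (80, 85, "DATE")}
```

In this invented example the rule set finds only half of the gold spans but makes no mistakes, while the learned tagger finds more of them at the cost of two false positives.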
  • 16. Temporal Expressions. With HeidelTime, temporal expressions are mapped to the TIMEX3 standard: • “around 140 B. C.” → APPROX BC0140 • “Spätes 3.–4. Jh. n. Chr.” (late 3rd to 4th century A. D.) → END 02; 03 • “second quarter first century B. C.” → XXXX-Q2 BC00 • “first half third century A. D.” → XXXX-H1 02. HeidelTime supports many other languages, e. g. German, Italian, French, …
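To make the notation in these examples concrete, the following toy normaliser reproduces two of the mappings shown on the slide. It is not HeidelTime and does not use its API; it only mimics the output strings above, under the assumption that the century is written as a zero-based, two-digit index (first century → 00) with a BC prefix for dates before Christ.

```python
import re

ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4}
PATTERN = re.compile(
    r"(?P<part>first|second|third|fourth)\s+(?P<unit>quarter|half)\s+"
    r"(?P<century>first|second|third|fourth)\s+century\s+"
    r"(?P<era>B\.\s*C\.|A\.\s*D\.)"
)


def normalise(expression):
    """Map e.g. 'first half third century A. D.' to 'XXXX-H1 02'."""
    match = PATTERN.fullmatch(expression.strip())
    if match is None:
        return None  # outside the single pattern this sketch covers
    unit = "Q" if match["unit"] == "quarter" else "H"
    part = ORDINALS[match["part"]]
    century = ORDINALS[match["century"]] - 1  # first century -> 00
    era = "BC" if match["era"].startswith("B") else ""
    return f"XXXX-{unit}{part} {era}{century:02d}"
```

Expressions such as “around 140 B. C.” fall outside the pattern and return `None`; covering them would need further rules per language, which is the work a dedicated temporal tagger takes over.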
  • 17. Relation Extraction. Subject / Relation / Object: • quick brown fox / jump over / lazy dog • K 612 / dates / 03 (“4th century A. D.”) • Subform 23.2 / occurs / North Italy • Subform 23.2 / dates / XXXX-Q2 00 (“second and third quarters of the first century A. D.”) ⇒ e. g. { "form": "23.2", "dating": "XXXX-Q2 00" }
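The step from extracted triples to a structured record can be sketched as a simple grouping by subject. The triples below are copied from the slide; the mapping from relation names to field names and the helper `triples_to_records` are assumptions for illustration.

```python
import json

# Triples taken from the slide; the relation-to-field mapping is assumed.
TRIPLES = [
    ("Subform 23.2", "occurs", "North Italy"),
    ("Subform 23.2", "dates", "XXXX-Q2 00"),
]
FIELD_FOR_RELATION = {"occurs": "occurs", "dates": "dating"}


def triples_to_records(triples):
    """Group (subject, relation, object) triples into one dict per form."""
    records = {}
    for subject, relation, obj in triples:
        form_id = subject.split()[-1]  # "Subform 23.2" -> "23.2"
        record = records.setdefault(form_id, {"form": form_id})
        record[FIELD_FOR_RELATION.get(relation, relation)] = obj
    return list(records.values())


print(json.dumps(triples_to_records(TRIPLES), indent=2))
```

Taking the last whitespace-separated token as the form identifier works for labels like "Subform 23.2" but is a simplification; other subject shapes would need their own handling.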
  • 20. Background. Two problems: • Linguistic • Conceptual
  • 21. Different languages
  • 22. Different traditions. Figure 6: Plate, platter or dish?
  • 23. Creating controlled vocabularies. Creating wordlists of the terms that the project team considered most useful for describing the key features of a vessel or sherd: • Sherd type (e.g. rim or handle) • Form (e.g. plate or bowl) • Decoration form (e.g. burnished) • Decoration color (e.g. yellow) • Fabric (e.g. Dressel 28 fabric)
  • 24. Lessons from ARIADNE. Used tools and methodology developed for the ARIADNE project by the Hypermedia Research Group at the University of South Wales: • Created a neutral spine based on the Getty Research Institute’s Art and Architecture Thesaurus (AAT) • This spine was populated by members from partner organisations, identifying common terms and concepts within it • Project partners then mapped terms in their language to this neutral spine • French terms supplied courtesy of a 2001 Master’s thesis by Caroline Sourzat (thanks to Eleni Schindler Kaudelka for identifying this on the ArchAIDE blog!)
  • 25. Mapping terms and concepts (part 1). Often this was very straightforward, for example: • The Italian terms graffita, graffita a punta, graffita a stecca = “sgraffito” (http://vocab.getty.edu/aat/300266416) • The Spanish term Cántaro = “jars” (http://vocab.getty.edu/aat/300195348) • The German terms gebogener Henkel, Ohrförmiger Henkel, langer Vertikalhenkel = “handles” (http://vocab.getty.edu/aat/300266416)
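A mapping of this kind boils down to a lookup table from (language, term) to a concept URI on the neutral spine. The sketch below uses only the Italian and Spanish examples from the slide to stay short; the language codes, the lower-casing rule and the function name `lookup` are assumptions.

```python
# Concept URIs as given on the slide.
AAT_SGRAFFITO = "http://vocab.getty.edu/aat/300266416"
AAT_JARS = "http://vocab.getty.edu/aat/300195348"

# (language code, lower-cased term) -> AAT concept URI.
TERM_TO_CONCEPT = {
    ("it", "graffita"): AAT_SGRAFFITO,
    ("it", "graffita a punta"): AAT_SGRAFFITO,
    ("it", "graffita a stecca"): AAT_SGRAFFITO,
    ("es", "cántaro"): AAT_JARS,
}


def lookup(language, term):
    """Return the AAT concept URI for a term, or None if it is unmapped."""
    return TERM_TO_CONCEPT.get((language, term.strip().lower()))
```

Several source terms collapsing onto one URI is the intended effect: verbose, language-specific wording is reduced to a single shared concept.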
  • 26. Mapping terms and concepts (part 2). Often this was more complicated, with partners having differing perceptions of what to call something (e.g. “plate” versus “platter”). In truth, this confusion may also be reflected in what has come out of the ground! An advantage of using the AAT (a “SKOS’d” thesaurus) is that ambiguity or difference in nomenclature can be resolved by a broader term or concept, so for example …
  • 27. Mapping terms and concepts (part 3). Looking at the hierarchies for plate and platter in the AAT, we can see that both are “dishes (vessels for food)” or, even broader, “culinary containers”. So while we can retain our original classifications (and this is essential for text mining), we can agree at a fundamental level on what these objects are. Figure 7: AAT Hierarchies for Plate and Platter
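Resolving two competing terms through a broader concept can be sketched as a walk up the hierarchy until both chains meet. The table below encodes only the concepts named on the slide as direct broader links; the real AAT hierarchy may contain intermediate levels, and the function names are hypothetical.

```python
# Toy hierarchy limited to the concepts named on the slide.
BROADER = {
    "plates": "dishes (vessels for food)",
    "platters": "dishes (vessels for food)",
    "dishes (vessels for food)": "culinary containers",
}


def ancestors(concept):
    """Return the concept followed by all of its broader concepts."""
    chain = [concept]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


def common_broader(a, b):
    """Return the nearest concept that both a and b fall under, or None."""
    chain_b = set(ancestors(b))
    for concept in ancestors(a):
        if concept in chain_b:
            return concept
    return None
```

With this table, `common_broader("plates", "platters")` returns "dishes (vessels for food)", so each partner keeps its own label while records remain comparable at the shared level.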
  • 28. Outlook • Recognize reigns of emperors as DATE entities • Coreferences in general • HeidelTime: “second and third quarter of the first century A. D.” → XXXX-Q3; 00 • Returning to difference in ceramic recording details • Fabric names often contain locations, e. g. Magdalensberg xyz • Location sometimes narrow, sometimes whole regions • In many cases the form is not named in particular but just described
  • 29. References • Ettlinger, Elisabeth. Conspectus formarum terrae sigillatae Italico modo confectae. Ed. by Deutsches Archäologisches Institut zu Frankfurt and Römisch-Germanische Kommission. Materialien zur römisch-germanischen Keramik. Bonn: Habelt, 1990. • Gempeler, Robert D. Elephantine X. Die Keramik römischer bis früharabischer Zeit. Mainz: Von Zabern, 1992.
  • 30. Thank you very much for your attention! Questions? This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement № 693548