Literature Recommendation Software
Faruk Cankaya
Melike Keskin
Supervisor: Florian Schramm
Professor: Prof. Dr. Jürgen Ernstberger
April 15, 2021
Agenda
➢ Introduction
➢ Related Works
➢ Methodology
○ Data preparation
○ Topic Extraction
○ Finding similar papers
➢ Results
➢ Conclusion and Future Work
➢ Questions
Introduction
➢ Problem Statement
○ No preliminary data
○ Paragraph input
Introduction
➢ Keyword-based input (X)
➢ Reference-based recommendation (X)
➢ Most-cited papers (X)

https://images.unsplash.com/photo-1526721940322-10fb6e3ae94a?utm_medium=medium&w=700&q=50&auto=format
https://cdn-images-1.medium.com/max/880/0*LHnFAic3Jw4N_IdP
https://images.unsplash.com/photo-1532012197267-da84d127e765?utm_medium=medium&w=700&q=50&auto=format
Introduction
➢ Motivation
○ First recommender system based on just a paragraph input
○ Paper recommendation for a specific area (domain)
○ Wide scope for trying different combinations of techniques
○ Makes thesis writing easier
○ Saves time
Related Works
○ Scienstein: A Research Paper Recommender System
■ Paper recommender
■ Hybrid filtering
■ Citation, author and source analysis
■ Preliminary data (citation analysis, author analysis, source analysis)
○ Science Concierge: A Fast Content-Based Recommendation System for Scientific Publications
■ Paper recommender
■ Content-based filtering
■ Topic Modeling
■ Preliminary data (users’ votes)
○ ScienceDirect: Topic Modeling Driven Content-Based Jobs Recommendation Engine for Recruitment Industry
■ Job recommender
■ Content-based filtering
■ Topic Modeling
■ Preliminary data (job description, user details)
Methodology
➢ Method used
○ Content-based filtering
○ Data preprocessing
■ Cleaning + Tokenization + Stop Word Removal + Lemmatization
○ Topic modelling
■ LDA
■ NMF
○ Similarity function
■ Cosine similarity
Methodology
➢ Data preparation
○ Number of documents: ~12,000 papers
○ Candidate steps: tokenization, text cleaning, stop word removal, stemming, lemmatization, synonym replacement, POS tagging, etc.
○ Our model: Cleaning + Tokenization + Stop Word Removal + Lemmatization
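The four preprocessing steps named above (cleaning, tokenization, stop word removal, lemmatization) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `STOP_WORDS` set and the `LEMMAS` lookup table are tiny hypothetical stand-ins for a real stop word list and a real lemmatizer.

```python
import re
from typing import List

# Hypothetical, deliberately tiny resources used only for illustration.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "is", "are", "to", "with"}
LEMMAS = {"papers": "paper", "models": "model", "topics": "topic", "recommends": "recommend"}


def preprocess(text: str) -> List[str]:
    # Cleaning: lowercase and replace every run of non-letters with a space.
    cleaned = re.sub(r"[^a-z]+", " ", text.lower())
    # Tokenization: split on whitespace.
    tokens = cleaned.split()
    # Stop word removal: drop very common function words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Lemmatization: map each token to its base form (dictionary lookup stand-in).
    return [LEMMAS.get(t, t) for t in tokens]


print(preprocess("The system recommends papers on topic models!"))
```

In practice the lookup table would be replaced by a proper lemmatizer from an NLP library; the order of the steps is the part that mirrors the slide.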
Methodology
➢ Vectorization
[Diagram: Preprocessed input text → Vectorization (Bag of Words, TF-IDF, ...) → Vectorized data]
Methodology
➢ Vectorization
○ Bag-of-Words
○ TF-IDF
[Figure: document-term matrix, with one axis labeled "terms, features or corpus" and the other "items or documents"]
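To make the vectorization step concrete, here is a small sketch that turns tokenized documents into a document-term matrix: raw counts give the Bag-of-Words representation, and multiplying by an inverse document frequency gives TF-IDF. The smoothed IDF formula used here, log((1 + n) / (1 + df)) + 1, is one common variant and is an assumption; the slides do not say which weighting was used.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, matrix); matrix[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (assumed variant, see lead-in).
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}
    matrix = []
    for doc in docs:
        counts = Counter(doc)  # Bag-of-Words counts for this document
        matrix.append([counts[term] * idf[term] for term in vocab])
    return vocab, matrix


docs = [["topic", "model", "paper"], ["paper", "citation"], ["topic", "topic", "lda"]]
vocab, matrix = tfidf(docs)
print(vocab)
for row in matrix:
    print([round(x, 3) for x in row])
```

Terms that occur in fewer documents receive a larger IDF, so they weigh more than terms spread across the whole collection.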
Methodology
➢ Topic Extraction
○ Applied Topic Modeling Technique
■ LDA
■ NMF
Methodology
➢ Topic Extraction
[Diagram: Vectorized data → Topic Extraction → Terms in each topic; Topic probability of each document]
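As an illustration of how topic extraction yields both outputs (terms per topic and a topic distribution per document), the sketch below factorizes a small document-term matrix V into W (documents × topics) and H (topics × terms) with NMF. It uses the standard multiplicative update rules in plain Python; this is an assumed, simplified stand-in, not the solver the authors used, and the toy matrix is made up. NMF weights are not true probabilities, so each row of W is normalized to sum to 1 to obtain a probability-like vector.

```python
import random


def matmul(a, b):
    # Plain-Python matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, k, iters=300, seed=0, eps=1e-9):
    """Factorize v (docs x terms) into w (docs x k) and h (k x terms), all non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def normalize_rows(m):
    # Scale each row to sum to 1 (rows that are all zero are left unchanged).
    return [[x / (sum(row) or 1.0) for x in row] for row in m]


# Toy document-term matrix: documents 0-1 use terms 0-1, documents 2-3 use terms 2-3.
v = [[3, 3, 0, 0], [2, 2, 0, 0], [0, 0, 1, 1], [0, 0, 4, 4]]
w, h = nmf(v, k=2)
for row in normalize_rows(w):
    print([round(x, 3) for x in row])
```

Each row of H lists the term weights of one topic, and each normalized row of W plays the role of the topic vector of one document.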
Methodology
➢ Prediction / Recommendation
○ Based on cosine similarity between the topic probability vector of the input and the topic probability matrix of the dataset
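The recommendation step can be sketched as ranking every paper by the cosine similarity between its topic vector and the topic vector of the input paragraph. The function names `cosine` and `recommend`, the `top_n` parameter, and the example numbers are illustrative assumptions.

```python
import math


def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(input_vec, doc_topic_matrix, top_n=3):
    """Return (document index, similarity) pairs for the top_n most similar documents."""
    scored = [(cosine(input_vec, row), idx) for idx, row in enumerate(doc_topic_matrix)]
    # Highest similarity first; ties are broken by the lower document index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(idx, score) for score, idx in scored[:top_n]]


# Made-up topic vectors for three papers and one input paragraph.
doc_topics = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
query_topics = [0.8, 0.2]
for idx, score in recommend(query_topics, doc_topics, top_n=2):
    print(idx, round(score, 3))
```

Because cosine similarity depends only on the angle between vectors, it compares topic mixtures regardless of how the vectors are scaled.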
Results
➢ Effect of data preprocessing steps
Results
➢ Model comparisons
Results
➢ Number of words in user input
Results
➢ Validation with user feedback
○ Before user feedback
■ Accuracy with content 3
● LDA is better than NMF
■ Accuracy with content 10
● NMF is better than LDA
○ After user feedback
■ NMF is better than LDA
Conclusion & Future Works
➢ Conclusion
○ Found the optimal data preprocessing model
■ Cleaning + Tokenization + Stop Word Removal + Lemmatization
○ Compared two topic modelling techniques
■ LDA and NMF
○ Compared model accuracies
○ User ratings
■ Models with LDA and NMF
Conclusion & Future Works
➢ Future Works
○ Try other techniques such as BERT and check whether they give better results on user rating feedback
○ Use user ratings to improve the recommendation system
○ Add new features to the website
○ Try different topic modelling techniques
○ Try different similarity functions
○ Train a model using the extracted topics
○ Tune the hyperparameters for the new techniques
DEMO
➢ Web Site
Thank You
Questions?
