SlideShare a Scribd company logo
1 of 12
NATURAL LANGUAGE PROCESSING
INTRODUCTION:
Natural Language Processing (NLP) is an area of research and application that
explores how computers can be used to understand and manipulate natural language text
or speech to do useful things. NLP researchers aim to gather knowledge on how human
beings understand and use language so that appropriate tools and techniques can be
developed to make computer systems understand and manipulate natural languages to
perform the desired tasks. The foundations of NLP lie in a number of disciplines, viz.
computer and information sciences, linguistics, mathematics, electrical and electronic
engineering, artificial intelligence and robotics, psychology, etc.
NLP has become quite prominent due to the proliferation of the World Wide Web and
digital libraries. Several researchers have pointed out the need for appropriate research in
facilitating multi- or cross-lingual information retrieval, including multilingual text processing
and multilingual user interface systems, in order to exploit the full benefit of the www and digital
libraries.
APPROACHES TO NATURAL LANGUAGE PROCESSING:
Natural language processing approaches fall roughly into four categories:
symbolic, statistical, connectionist, and hybrid.
Symbolic approaches perform deep analysis of linguistic phenomena and are
based on explicit representation of facts about language through well-understood
knowledge representation schemes and associated algorithms. A good example of
symbolic approaches is seen in logic or rule-based systems. Another example of symbolic
approaches is semantic networks. Symbolic approaches have been used for a few decades
in a variety of research areas and applications such as information extraction, text
categorization, ambiguity resolution, and lexical acquisition. Typical techniques include:
explanation-based learning, rule-based learning, inductive logic programming, decision
trees, conceptual clustering, and K nearest neighbor algorithms.
Statistical approaches employ various mathematical techniques and often use
large text corpora to develop approximate generalized models of linguistic phenomena
based on actual examples of these phenomena provided by the text corpora without
adding significant linguistic or world knowledge. A frequently used statistical model is
the Hidden Markov Model (HMM) inherited from the speech community. Statistical
approaches have typically been used in tasks such as speech recognition, lexical
acquisition, parsing, part-of-speech tagging, collocations, statistical machine translation,
and statistical grammar learning, and so on.
Connectionist approaches also develop generalized models from examples of
linguistic phenomena. What separates connectionism from other statistical methods is
that connectionist models combine statistical learning with various theories of
representation - thus the connectionist representations allow transformation, inference,
and manipulation of logic formulae. They perform well at tasks such as word-sense
disambiguation, language generation, and limited inference. These models are well suited
for natural language processing tasks such as syntactic parsing, limited domain
translation tasks, and associative retrieval.
NATURAL LANGUAGE UNDERSTANDING:
Liddy (1998) and Feldman (1999) suggest that in order to understand natural languages, it is
important to be able to distinguish among the following seven interdependent levels, that people
use to extract meaning from text or spoken languages:
• Phonetic or phonological level that deals with pronunciation
• Morphological level that deals with the smallest parts of words, that carry a meaning, and
suffixes and prefixes
• Lexical level that deals with lexical meaning of words and parts of speech analyses
• Syntactic level that deals with grammar and structure of sentences
• Semantic level that deals with the meaning of words and sentences
• Discourse level that deals with the structure of different kinds of text using document
structures and
• Pragmatic level that deals with the knowledge that comes from the outside world, i.e., from
outside the contents of the document.
A natural language processing system may involve all or some of these levels of analysis.
MAJOR TASKS IN NLP:
The following is a list of some of the most commonly researched tasks in NLP.
Note that some of these tasks have direct real-world applications, while others more
commonly serve as subtasks that are used to aid in solving larger tasks. What
distinguishes these tasks from other potential and actual NLP tasks is not only the volume
of research devoted to them but the fact that for each one there is typically a well-defined
problem setting, a standard metric for evaluating the task, standard corpora on which the
task can be evaluated, and competitions devoted to the specific task.
 Automatic summarization: Produce a readable summary of a chunk of text.
Often used to provide summaries of text of a known type, such as articles in the
financial section of a newspaper.
 Coreference resolution: Given a sentence or larger chunk of text, determine
which words ("mentions") refer to the same objects ("entities"). Anaphora
resolution is a specific example of this task, and is specifically concerned with
matching up pronouns with the nouns or names that they refer to. The more
general task of coreference resolution also includes identify so-called "bridging
relationships" involving referring expressions. For example, in a sentence such as
"He entered John's house through the front door", "the front door" is a referring
expression and the bridging relationship to be identified is the fact that the door
being referred to is the front door of John's house (rather than of some other
structure that might also be referred to).
 Machine translation: Automatically translate text from one human language to
another. This is one of the most difficult problems, and is a member of a class of
problems colloquially termed "AI-complete", i.e. requiring all of the different
types of knowledge that humans possess (grammar, semantics, facts about the real
world, etc.) in order to solve properly.
 Morphological segmentation: Separate words into individual morphemes and
identify the class of the morphemes. The difficulty of this task depends greatly on
the complexity of the morphology (i.e. the structure of words) of the language
being considered. English has fairly simple morphology, especially inflectional
morphology, and thus it is often possible to ignore this task entirely and simply
model all possible forms of a word (e.g. "open, opens, opened, opening") as
separate words. In languages such as Turkish, however, such an approach is not
possible, as each dictionary entry has thousands of possible word forms.
 Named Entity Recognition : It is also known as entity identification and entity
extraction is a subtask of information extraction that seeks to locate and classify
atomic elements in text into predefined categories such as the names of persons,
organizations, locations, expressions of times, quantities, monetary values,
percentages, etc.
 Natural language generation: Convert information from computer databases
into readable human language.
 Optical Character Recognition (OCR): It is the mechanic or electronic
translation of scanned images of handwritten, type written or printed text into
machine-encoded text. It is widely used to convert books and documents into
electronic files, to computerize a record-keeping system in an office, or to publish
the text on a website. OCR makes it possible to edit the text, search for a word or
phrase, store it more compactly, display or print a copy free of scanning artifacts,
and apply techniques such as machine translation, text-to-speech and text
mining to it. OCR is a field of research in pattern recognition, artificial
intelligence and computer vision.
 Part-of-speech tagging: Given a sentence, determine the part of speech for each
word. Many words, especially common ones, can serve as multiple parts of
speech. For example, "book" can be a noun ("the book on the table") or verb ("to
book a flight"); "set" can be a noun, verb or adjective; and "out" can be any of at
least five different parts of speech. Note that some languages have more such
ambiguity than others.
 Parsing: It is an (ordered, rooted) tree that represents the syntactic structure of
a string according to some formal grammar. In a parse tree, the interior nodes are
labeled by non-terminals of the grammar, while the leaf nodes are labeled
by terminals of the grammar. A parse tree is made up of nodes and branches. The
image below represents a linguistic parse tree, here representing
the English sentence "John hit the ball". (The parse tree here is a greatly
simplified one; for more information, see X-bar theory.) The parse tree is the
entire structure, starting from S and ending in each of the leaf nodes (John, hit,
the, ball). We use the following abbreviations in the example:
 S for sentence, the top-level structure in this example
 NP for noun phrase. The first (leftmost) NP, a single noun "John", serves as
the subject of the sentence. The second one is the object of the sentence.
 VP for verb phrase, which serves as the predicate
 V for verb. In this case, it's a transitive verb "hit".
 Det for determiner, in this instance the definite article "the"
 N for noun.
 Question Answering: Given a human-language question, determine its answer.
Typical questions have a specific right answer (such as "What is the capital of
Canada?"), but sometimes open-ended questions are also considered (such as
"What is the meaning of life?").
 Relationship extraction: Given a chunk of text, identify the relationships among
named entities (e.g. who is the wife of whom).
 Sentence breaking (also known as sentence boundary disambiguation): Given a
chunk of text, find the sentence boundaries. Sentence boundaries are often marked
by periods or other punctuation marks, but these same characters can serve other
purposes (e.g. marking abbreviations).
 Sentiment analysis: Extract subjective information usually from a set of
documents, often using online reviews to determine "polarity" about specific
objects. It is especially useful for identifying trends of public opinion in the social
media, for the purpose of marketing.
 Speech recognition: Given a sound clip of a person or people speaking,
determine the textual representation of the speech. In natural speech there are
hardly any pauses between successive words, and thus speech segmentation is a
necessary subtask of speech recognition. Note also that in most spoken languages,
the sounds representing successive letters blend into each other in a process
termed co-articulation, so the conversion of the analog signal to discrete
characters can be a very difficult process.
 Speech segmentation: Given a sound clip of a person or people speaking,
separate it into words. A subtask of speech recognition and typically grouped with
it.
 Topic segmentation and recognition: Given a chunk of text, separate it into
segments each of which is devoted to a topic, and identify the topic of the
segment.
 Word segmentation: Separate a chunk of continuous text into separate words.
For a language like English, this is fairly trivial, since words are usually separated
by spaces. However, some written languages like Chinese, Japanese and Thai do
not mark word boundaries in such a fashion, and in those languages text
segmentation is a significant task requiring knowledge of
the vocabulary and morphology of words in the language.
 Word sense disambiguation: Many words have more than one meaning; we
have to select the meaning which makes the most sense in context. For this
problem, we are typically given a list of words and associated word senses, e.g.
from a dictionary or from an online resource such as Word Net.
 Stemming is the process for reducing inflected (or sometimes derived) words to
their stem, base or root form—generally a written word form. Many search
engines treat words with the same stem as synonyms as a kind of query
broadening, a process called conflation. Stemming programs are commonly
referred to as stemming algorithms or stemmers. A stemming algorithm reduces
the words "fishing", "fished", "fish", and "fisher" to the root word, "fish".
 Text simplification is an operation used in natural language processing to
modify, enhance, classify or otherwise process an existing corpus of human-
readable text in such a way that the grammar and structure of the prose is greatly
simplified, while the underlying meaning and information remains the same. Text
simplification is an important area of research, because natural human languages
ordinarily contain complex compound constructions that are not easily processed
through automation.
 Text to speech: Speech synthesis is the artificial production of human speech. A
computer system used for this purpose is called a speech synthesizer, and can be
implemented in software or hardware. A text-to-speech (TTS) system converts
normal language text into speech; other systems render symbolic linguistic
representations like phonetic transcriptions into speech.
 Text proofing : Proofreading is the reading of a galley proof or computer
monitor to detect and correct production-errors of text or art. Proofreaders are
expected to be consistently accurate by default because they occupy the last stage
of typographic production before publication.
 Natural Language User Interfaces (LUI) are a type of computer human
interface where linguistic phenomena such as verbs, phrases and clauses act as UI
controls for creating, selecting and modifying data in software applications.
 Query expansion (QE) is the process of reformulating a seed query to improve
retrieval performance in information retrieval operations. In the context of
web search engines, query expansion involves evaluating a user's input and
expanding the search query to match additional documents.
 True casing is the problem in natural language processing (NLP) of determining
the proper capitalization of words where such information is unavailable.
In some cases, sets of related tasks are grouped into subfields of NLP that are often
considered separately from NLP as a whole. Examples include:
 Information retrieval (IR): This is concerned with storing, searching and
retrieving information. It is a separate field within computer science (closer to
databases), but IR relies on some NLP methods (for example, stemming). Some
current research and applications seek to bridge the gap between IR and NLP.
 Information extraction (IE): This is concerned in general with the extraction of
semantic information from text. This covers tasks such as named entity
recognition, coreference resolution, relationship extraction, etc.
 Dialogue Systems: perhaps the omnipresent application of the future, in the
systems envisioned by large providers of end-user applications. Dialogue systems,
which usually focus on a narrowly defined application (e.g. your refrigerator or
home sound system), currently utilize the phonetic and lexical levels of language.
It is believe that utilization of all the levels of language processing explained
above offer the potential for truly habitable dialogue systems.
APPLICATIONS:
Natural language processing can be used in the automatic scoring of the
students’ papers. Automatic scoring is a help to teachers of giving students exact marks
in examination. Chinese subjective questions scoring algorithms are based on similarity
calculation and natural language processing. Forward maximum matching algorithm and
semantic similarity computation and contrary degree calculation can overcome problems
of Chinese subjective questions scoring. Forward maximum matching algorithm is a
convenient segmentation method.
Mohd Ibrahim, Rodina Ahmad Proposes a method and a tool to facilitate
requirements analysis process and class diagram extraction from textual requirements
supporting natural language processing and Domain Ontology techniques. Requirement
Analysis and Class Diagram Extraction (RACE) tool system is decomposed in to internal
and external components subsystems. They are a) Open NLP parser b) Race stemming
algorithm c) Word Net d) Concept extraction engine e) Domain ontology f) Class
extraction engine g) Race concept management (UI).
The natural language processing software which transforms a natural language
sentence into a series of commands for a mobile service robot. A prototype application
is presented which allows severely disabled persons bound to a wheel chair to control the
chair and their home environment by natural language. The spoken sentence is parsed by
Finite State Transducer Network (FSTN) , which checks the grammatical structure and
vocabulary. The output of FSTN then is processed by transformation module. First the
sentence is split into the basic components “instruction”, “intervention”, and “question”.
The splitting algorithm is based on the recognition of grammatical word pattern. These
patterns are stored and checked on semantic completeness. This process is based on the
environment and semantic database contained within the environment module. In case of
missing information, the system checks the context memory to find missing data. Incase
of no entry can be found, the question/answer module is initiated and a question is
generated. Finally, a control sequence is generated and transmitted to the controller.
Mladen Stanojevic and Sanja Vranes propose a model that could be used to
describe roughly the process of understanding as it happens in our brain. They developed
a Hierarchical Semantic Form (HSF) for the representation of semantics and the
corresponding SOUL (Space of Universal Links) learning algorithm. The proposed HSF
which resembles hierarchically organized neural network and represents a hierarchical
equivalent of plain text forms, where all semantic categories are explicitly represented
and hierarchically organized. To validate the ideas of HSF and SOUL we have developed
a prototype of Semantic Web Service for Flight Information Service (FIS).
Flight schedule query system based on NLP is developed in order to increase the
capability of natural language processing from time to time. Here there are two types of
methodology they are requirement prototype and evolution prototype. The prototype
consists of 4 procedures that are initial investigation, system analysis, system design and
system implementation. The system design process is emphasizes on 3 main aspects
which are design of knowledge base, inference engine and user interface.
The Email-Based Intrusion Detection System to find Social Engineering using
Natural language processing(EBIDS-SENLP) does this by reading plain text emails and
sending emails natural language body to OntoSem(Ontology). However, OntoSem needs
pretraining to look for certain segments of language as concepts of deception to
meaningfully analyze the raw text and tell EBIDS what types of deception the e-mail
could contain.
The E-democracy European network project applied NLP to improve
communication between public administrations and their citizens- are a major application
field for NLP and language engineering. Our goal was twofold: to test whether we could
meet e-democracy requirements using advanced linguistic technologies and to test
whether Augmented Phrase Structure Grammars (APSGs) were robust and well-assessed
enough to use in a real world (and highly sensitive) environment. To improve
communication the set of modules as a toolset comprising 5 toolsets address guesser,
answer tree, style enhancer, multi language helper and natural language map.
In clinical application, the NLP and Vision Processing (VP) might be harnessed
together in to a single system. The key to our integration is use of a single target
representation. Their combination is more powerful than either in isolation because they
mutually disambiguate and confirm one another.
Speech based information service systems are much friendlier and more
accessible than keyboard type-in based one. Here a new way of error detection and
correction using Comprehensive Information based natural language processing. A
module of text post-processing is added after a Speech Recognizer. In this module,
syntactic, semantic and pragmatic information are analyzed to check the result sentence
text of the Speech Recognizer. If there are some errors, the module also tries to correct
them.
ALGORITHM FOR CI BASED SENTENCE PROCESSING
To enhance the readability of Japanese texts by machine-assisted language control
with automatic paraphrasing. Technically plain Japanese is a controlled language and its
specification includes controlled spelling, controlled vocabulary, controlled grammar and
limitation of quantitative parameters such as sentence length.
Natural Languages Interfaces to Data Bases” (NLIDB) is a system that allows a
user to interact with a database by expressing himself in a natural language such as
English or any other. This system prepares an “expert system” implemented in prolog
which it can identify synonymous words in any language. It first parses the input
sentences, and then the natural language expressions are transformed to SQL language.
Understanding natural language by machine includes Syntactic knowledge (shallow
parsing, syntactic parsing, and lexical cohesion), semantic knowledge, expert system,
prolog, semantic analysis, implementing proposal determinant system.
Conclusions:
While NLP is a relatively recent area of research and application, as compared to
other information technology approaches, there have been sufficient successes to date
that suggest that NLP-based information access technologies will continue to be a major
area of research and development in information systems now and far into the future.
REPORT ON NATURAL
LANGUAGE PROCESSING

More Related Content

Similar to REPORT.doc

Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
 
Natural language processing
Natural language processingNatural language processing
Natural language processingSaurav Aryal
 
Natural Language Processing: A comprehensive overview
Natural Language Processing: A comprehensive overviewNatural Language Processing: A comprehensive overview
Natural Language Processing: A comprehensive overviewBenjaminlapid1
 
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShashank Shisodia
 
Natural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxNatural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxAlyaaMachi
 
Natural Language Processing from Object Automation
Natural Language Processing from Object Automation Natural Language Processing from Object Automation
Natural Language Processing from Object Automation Object Automation
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguisticsAdnanBaloch15
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)VenkateshMurugadas
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingBhavya Chawla
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processingdhruv_chaudhari
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Saurabh Kaushik
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptOlusolaTop
 

Similar to REPORT.doc (20)

Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
nlp (1).pptx
nlp (1).pptxnlp (1).pptx
nlp (1).pptx
 
NLP
NLPNLP
NLP
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural Language Processing: A comprehensive overview
Natural Language Processing: A comprehensive overviewNatural Language Processing: A comprehensive overview
Natural Language Processing: A comprehensive overview
 
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliterator
 
Natural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxNatural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptx
 
Natural Language Processing from Object Automation
Natural Language Processing from Object Automation Natural Language Processing from Object Automation
Natural Language Processing from Object Automation
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Nlp ambiguity presentation
Nlp ambiguity presentationNlp ambiguity presentation
Nlp ambiguity presentation
 
NLP
NLPNLP
NLP
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
 
NLP
NLPNLP
NLP
 
NLP
NLPNLP
NLP
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.ppt
 
Language Modeling.docx
Language Modeling.docxLanguage Modeling.docx
Language Modeling.docx
 
Nltk
NltkNltk
Nltk
 

Recently uploaded

internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 

REPORT.doc

  • 1. NATURAL LANGUAGE PROCESSING INTRODUCTION: Natural Language Processing (NLP) is an area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do useful things. NLP researchers aim to gather knowledge on how human beings understand and use language so that appropriate tools and techniques can be developed to make computer systems understand and manipulate natural languages to perform the desired tasks. The foundations of NLP lie in a number of disciplines, viz. computer and information sciences, linguistics, mathematics, electrical and electronic engineering, artificial intelligence and robotics, psychology, etc. NLP has become quite prominent due to the proliferation of the World Wide Web and digital libraries. Several researchers have pointed out the need for appropriate research in facilitating multi- or cross-lingual information retrieval, including multilingual text processing and multilingual user interface systems, in order to exploit the full benefit of the www and digital libraries. APPROACHES TO NATURAL LANGUAGE PROCESSING: Natural language processing approaches fall roughly into four categories: symbolic, statistical, connectionist, and hybrid. Symbolic approaches perform deep analysis of linguistic phenomena and are based on explicit representation of facts about language through well-understood knowledge representation schemes and associated algorithms. A good example of symbolic approaches is seen in logic or rule-based systems. Another example of symbolic approaches is semantic networks. Symbolic approaches have been used for a few decades in a variety of research areas and applications such as information extraction, text categorization, ambiguity resolution, and lexical acquisition. Typical techniques include: explanation-based learning, rule-based learning, inductive logic programming, decision trees, conceptual clustering, and K nearest neighbor algorithms. Statistical approaches employ various mathematical techniques and often use large text corpora to develop approximate generalized models of linguistic phenomena based on actual examples of these phenomena provided by the text corpora without adding significant linguistic or world knowledge. A frequently used statistical model is
  • 2. the Hidden Markov Model (HMM) inherited from the speech community. Statistical approaches have typically been used in tasks such as speech recognition, lexical acquisition, parsing, part-of-speech tagging, collocations, statistical machine translation, and statistical grammar learning, and so on. Connectionist approaches also develop generalized models from examples of linguistic phenomena. What separates connectionism from other statistical methods is that connectionist models combine statistical learning with various theories of representation - thus the connectionist representations allow transformation, inference, and manipulation of logic formulae. They perform well at tasks such as word-sense disambiguation, language generation, and limited inference. These models are well suited for natural language processing tasks such as syntactic parsing, limited domain translation tasks, and associative retrieval. NATURAL LANGUAGE UNDERSTANDING: Liddy (1998) and Feldman (1999) suggest that in order to understand natural languages, it is important to be able to distinguish among the following seven interdependent levels, that people use to extract meaning from text or spoken languages: • Phonetic or phonological level that deals with pronunciation • Morphological level that deals with the smallest parts of words, that carry a meaning, and suffixes and prefixes • Lexical level that deals with lexical meaning of words and parts of speech analyses • Syntactic level that deals with grammar and structure of sentences • Semantic level that deals with the meaning of words and sentences • Discourse level that deals with the structure of different kinds of text using document structures and • Pragmatic level that deals with the knowledge that comes from the outside world, i.e., from outside the contents of the document. A natural language processing system may involve all or some of these levels of analysis. MAJOR TASKS IN NLP: The following is a list of some of the most commonly researched tasks in NLP. Note that some of these tasks have direct real-world applications, while others more commonly serve as subtasks that are used to aid in solving larger tasks. What distinguishes these tasks from other potential and actual NLP tasks is not only the volume
  • 3. of research devoted to them but the fact that for each one there is typically a well-defined problem setting, a standard metric for evaluating the task, standard corpora on which the task can be evaluated, and competitions devoted to the specific task.  Automatic summarization: Produce a readable summary of a chunk of text. Often used to provide summaries of text of a known type, such as articles in the financial section of a newspaper.  Coreference resolution: Given a sentence or larger chunk of text, determine which words ("mentions") refer to the same objects ("entities"). Anaphora resolution is a specific example of this task, and is specifically concerned with matching up pronouns with the nouns or names that they refer to. The more general task of coreference resolution also includes identify so-called "bridging relationships" involving referring expressions. For example, in a sentence such as "He entered John's house through the front door", "the front door" is a referring expression and the bridging relationship to be identified is the fact that the door being referred to is the front door of John's house (rather than of some other structure that might also be referred to).  Machine translation: Automatically translate text from one human language to another. This is one of the most difficult problems, and is a member of a class of problems colloquially termed "AI-complete", i.e. requiring all of the different types of knowledge that humans possess (grammar, semantics, facts about the real world, etc.) in order to solve properly.  Morphological segmentation: Separate words into individual morphemes and identify the class of the morphemes. The difficulty of this task depends greatly on the complexity of the morphology (i.e. the structure of words) of the language being considered. English has fairly simple morphology, especially inflectional morphology, and thus it is often possible to ignore this task entirely and simply model all possible forms of a word (e.g. "open, opens, opened, opening") as separate words. In languages such as Turkish, however, such an approach is not possible, as each dictionary entry has thousands of possible word forms.
  • 4.  Named Entity Recognition : It is also known as entity identification and entity extraction is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.  Natural language generation: Convert information from computer databases into readable human language.  Optical Character Recognition (OCR): It is the mechanic or electronic translation of scanned images of handwritten, type written or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text on a website. OCR makes it possible to edit the text, search for a word or phrase, store it more compactly, display or print a copy free of scanning artifacts, and apply techniques such as machine translation, text-to-speech and text mining to it. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.  Part-of-speech tagging: Given a sentence, determine the part of speech for each word. Many words, especially common ones, can serve as multiple parts of speech. For example, "book" can be a noun ("the book on the table") or verb ("to book a flight"); "set" can be a noun, verb or adjective; and "out" can be any of at least five different parts of speech. Note that some languages have more such ambiguity than others.  Parsing: It is an (ordered, rooted) tree that represents the syntactic structure of a string according to some formal grammar. In a parse tree, the interior nodes are labeled by non-terminals of the grammar, while the leaf nodes are labeled by terminals of the grammar. A parse tree is made up of nodes and branches. The image below represents a linguistic parse tree, here representing the English sentence "John hit the ball". (The parse tree here is a greatly simplified one; for more information, see X-bar theory.) The parse tree is the
  • 5. entire structure, starting from S and ending in each of the leaf nodes (John, hit, the, ball). We use the following abbreviations in the example:  S for sentence, the top-level structure in this example  NP for noun phrase. The first (leftmost) NP, a single noun "John", serves as the subject of the sentence. The second one is the object of the sentence.  VP for verb phrase, which serves as the predicate  V for verb. In this case, it's a transitive verb "hit".  Det for determiner, in this instance the definite article "the"  N for noun.  Question Answering: Given a human-language question, determine its answer. Typical questions have a specific right answer (such as "What is the capital of Canada?"), but sometimes open-ended questions are also considered (such as "What is the meaning of life?").  Relationship extraction: Given a chunk of text, identify the relationships among named entities (e.g. who is the wife of whom).  Sentence breaking (also known as sentence boundary disambiguation): Given a chunk of text, find the sentence boundaries. Sentence boundaries are often marked by periods or other punctuation marks, but these same characters can serve other purposes (e.g. marking abbreviations).  Sentiment analysis: Extract subjective information usually from a set of documents, often using online reviews to determine "polarity" about specific objects. It is especially useful for identifying trends of public opinion in the social media, for the purpose of marketing.
  • 6.  Speech recognition: Given a sound clip of a person or people speaking, determine the textual representation of the speech. In natural speech there are hardly any pauses between successive words, and thus speech segmentation is a necessary subtask of speech recognition. Note also that in most spoken languages, the sounds representing successive letters blend into each other in a process termed co-articulation, so the conversion of the analog signal to discrete characters can be a very difficult process.  Speech segmentation: Given a sound clip of a person or people speaking, separate it into words. A subtask of speech recognition and typically grouped with it.  Topic segmentation and recognition: Given a chunk of text, separate it into segments each of which is devoted to a topic, and identify the topic of the segment.  Word segmentation: Separate a chunk of continuous text into separate words. For a language like English, this is fairly trivial, since words are usually separated by spaces. However, some written languages like Chinese, Japanese and Thai do not mark word boundaries in such a fashion, and in those languages text segmentation is a significant task requiring knowledge of the vocabulary and morphology of words in the language.  Word sense disambiguation: Many words have more than one meaning; we have to select the meaning which makes the most sense in context. For this problem, we are typically given a list of words and associated word senses, e.g. from a dictionary or from an online resource such as Word Net.  Stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form—generally a written word form. Many search engines treat words with the same stem as synonyms as a kind of query broadening, a process called conflation. Stemming programs are commonly referred to as stemming algorithms or stemmers. A stemming algorithm reduces the words "fishing", "fished", "fish", and "fisher" to the root word, "fish".
  • 7.  Text simplification is an operation used in natural language processing to modify, enhance, classify or otherwise process an existing corpus of human- readable text in such a way that the grammar and structure of the prose is greatly simplified, while the underlying meaning and information remains the same. Text simplification is an important area of research, because natural human languages ordinarily contain complex compound constructions that are not easily processed through automation.  Text to speech: Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.  Text proofing : Proofreading is the reading of a galley proof or computer monitor to detect and correct production-errors of text or art. Proofreaders are expected to be consistently accurate by default because they occupy the last stage of typographic production before publication.  Natural Language User Interfaces (LUI) are a type of computer human interface where linguistic phenomena such as verbs, phrases and clauses act as UI controls for creating, selecting and modifying data in software applications.  Query expansion (QE) is the process of reformulating a seed query to improve retrieval performance in information retrieval operations. In the context of web search engines, query expansion involves evaluating a user's input and expanding the search query to match additional documents.  True casing is the problem in natural language processing (NLP) of determining the proper capitalization of words where such information is unavailable. In some cases, sets of related tasks are grouped into subfields of NLP that are often considered separately from NLP as a whole. Examples include:  Information retrieval (IR): This is concerned with storing, searching and retrieving information. It is a separate field within computer science (closer to
  • 8. databases), but IR relies on some NLP methods (for example, stemming). Some current research and applications seek to bridge the gap between IR and NLP.  Information extraction (IE): This is concerned in general with the extraction of semantic information from text. This covers tasks such as named entity recognition, coreference resolution, relationship extraction, etc.  Dialogue Systems: perhaps the omnipresent application of the future, in the systems envisioned by large providers of end-user applications. Dialogue systems, which usually focus on a narrowly defined application (e.g. your refrigerator or home sound system), currently utilize the phonetic and lexical levels of language. It is believe that utilization of all the levels of language processing explained above offer the potential for truly habitable dialogue systems. APPLICATIONS: Natural language processing can be used in the automatic scoring of the students’ papers. Automatic scoring is a help to teachers of giving students exact marks in examination. Chinese subjective questions scoring algorithms are based on similarity calculation and natural language processing. Forward maximum matching algorithm and semantic similarity computation and contrary degree calculation can overcome problems of Chinese subjective questions scoring. Forward maximum matching algorithm is a convenient segmentation method. Mohd Ibrahim, Rodina Ahmad Proposes a method and a tool to facilitate requirements analysis process and class diagram extraction from textual requirements supporting natural language processing and Domain Ontology techniques. Requirement Analysis and Class Diagram Extraction (RACE) tool system is decomposed in to internal and external components subsystems. They are a) Open NLP parser b) Race stemming algorithm c) Word Net d) Concept extraction engine e) Domain ontology f) Class extraction engine g) Race concept management (UI). The natural language processing software which transforms a natural language sentence into a series of commands for a mobile service robot. A prototype application is presented which allows severely disabled persons bound to a wheel chair to control the chair and their home environment by natural language. The spoken sentence is parsed by
  • 9. Finite State Transducer Network (FSTN) , which checks the grammatical structure and vocabulary. The output of FSTN then is processed by transformation module. First the sentence is split into the basic components “instruction”, “intervention”, and “question”. The splitting algorithm is based on the recognition of grammatical word pattern. These patterns are stored and checked on semantic completeness. This process is based on the environment and semantic database contained within the environment module. In case of missing information, the system checks the context memory to find missing data. Incase of no entry can be found, the question/answer module is initiated and a question is generated. Finally, a control sequence is generated and transmitted to the controller. Mladen Stanojevic and Sanja Vranes propose a model that could be used to describe roughly the process of understanding as it happens in our brain. They developed a Hierarchical Semantic Form (HSF) for the representation of semantics and the corresponding SOUL (Space of Universal Links) learning algorithm. The proposed HSF which resembles hierarchically organized neural network and represents a hierarchical equivalent of plain text forms, where all semantic categories are explicitly represented and hierarchically organized. To validate the ideas of HSF and SOUL we have developed a prototype of Semantic Web Service for Flight Information Service (FIS). Flight schedule query system based on NLP is developed in order to increase the capability of natural language processing from time to time. Here there are two types of methodology they are requirement prototype and evolution prototype. The prototype consists of 4 procedures that are initial investigation, system analysis, system design and system implementation. The system design process is emphasizes on 3 main aspects which are design of knowledge base, inference engine and user interface. The Email-Based Intrusion Detection System to find Social Engineering using Natural language processing(EBIDS-SENLP) does this by reading plain text emails and sending emails natural language body to OntoSem(Ontology). However, OntoSem needs pretraining to look for certain segments of language as concepts of deception to meaningfully analyze the raw text and tell EBIDS what types of deception the e-mail could contain. The E-democracy European network project applied NLP to improve communication between public administrations and their citizens- are a major application
  • 10. field for NLP and language engineering. Our goal was twofold: to test whether we could meet e-democracy requirements using advanced linguistic technologies and to test whether Augmented Phrase Structure Grammars (APSGs) were robust and well-assessed enough to use in a real world (and highly sensitive) environment. To improve communication the set of modules as a toolset comprising 5 toolsets address guesser, answer tree, style enhancer, multi language helper and natural language map. In clinical application, the NLP and Vision Processing (VP) might be harnessed together in to a single system. The key to our integration is use of a single target representation. Their combination is more powerful than either in isolation because they mutually disambiguate and confirm one another. Speech based information service systems are much friendlier and more accessible than keyboard type-in based one. Here a new way of error detection and correction using Comprehensive Information based natural language processing. A module of text post-processing is added after a Speech Recognizer. In this module, syntactic, semantic and pragmatic information are analyzed to check the result sentence text of the Speech Recognizer. If there are some errors, the module also tries to correct them.
  • 11. ALGORITHM FOR CI BASED SENTENCE PROCESSING To enhance the readability of Japanese texts by machine-assisted language control with automatic paraphrasing. Technically plain Japanese is a controlled language and its specification includes controlled spelling, controlled vocabulary, controlled grammar and limitation of quantitative parameters such as sentence length. Natural Languages Interfaces to Data Bases” (NLIDB) is a system that allows a user to interact with a database by expressing himself in a natural language such as English or any other. This system prepares an “expert system” implemented in prolog which it can identify synonymous words in any language. It first parses the input sentences, and then the natural language expressions are transformed to SQL language. Understanding natural language by machine includes Syntactic knowledge (shallow parsing, syntactic parsing, and lexical cohesion), semantic knowledge, expert system, prolog, semantic analysis, implementing proposal determinant system. Conclusions: While NLP is a relatively recent area of research and application, as compared to other information technology approaches, there have been sufficient successes to date that suggest that NLP-based information access technologies will continue to be a major area of research and development in information systems now and far into the future.