Natural Language Processing (NLP) is a field of study within machine learning that enables computers to understand human language. Understanding a language means studying and analyzing complex patterns through a series of processes, starting from noisy and incomplete voice input, through lexical identification and syntactic and semantic analysis, to interpreting language in context.
In this article, we are mainly going to focus on Syntactic Analysis, which is a crucial part of NLP. We are going to discuss the following in brief:
Syntactic Analysis
Parsers
Grammar
Syntactic Analysis vs Lexical Analysis
(More to learn: NLP guide for beginners)
The first question that naturally arises is: what exactly is syntactic analysis? Syntactic analysis is the study of how words are arranged into well-formed phrases and sentences.
It is the process of analyzing natural language against the rules of a formal grammar to determine the grammatical structure of a sentence.
It is the third phase of NLP and works on groups of words or whole sentences rather than on individual words, since individual words alone do not determine the grammar of a sentence.
Syntactic analysis is also known as Syntax analysis or Parsing. To implement the task of parsing, we use parsers. Now let us learn what parsers are.
(Must read: NLP interview questions)
We already know that parsers are used to implement parsing, but how is a parser defined? A parser is a software component that takes input text and produces a structural representation of it, after validating that the syntax is correct according to a formal grammar.
It also builds a data structure, typically a parse tree, an abstract syntax tree, or another hierarchical structure. Searching over the space of possible trees, it attempts to find the best tree for a given text.
Generally, there are two types of Parsing: Top-down parsing and Bottom-up parsing.
In top-down parsing, the parser builds the parse tree from the start symbol and then attempts to transform the start symbol into the input. The most popular form of top-down parsing processes the input recursively, but it has one major drawback: backtracking.
In bottom-up parsing, by contrast, the parser begins with the input symbols and works its way up to the start symbol, attempting to construct the parse tree. Different parsers use these two types of parsing, as described below.
(Also read: Applications of NLP)
The following are the types of parsers that are available:
The recursive descent parser is a straightforward parser that is used frequently in practice. It follows a top-down process, checking whether the syntax of the input is correct by scanning the text from left to right.
The basic operation required of this sort of parser is to read characters from the input stream and match them against the terminals of the grammar. We will learn about grammar later in this article.
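To make this concrete, here is a minimal sketch of recursive descent parsing with NLTK; the toy grammar and sentence are our own illustrative assumptions, not something fixed by the article.

```python
import nltk

# Toy grammar and sentence (illustrative assumptions, not from the article).
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'Delhi' | Det N
VP -> V NP
Det -> 'the'
N -> 'capital'
V -> 'is'
""")

# Top-down: the parser expands from the start symbol S and backtracks
# whenever a chosen production fails to match the input.
parser = nltk.RecursiveDescentParser(grammar)
for tree in parser.parse("Delhi is the capital".split()):
    print(tree)
```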
The shift-reduce parser uses a bottom-up process, unlike the recursive descent parser. Its goal is to locate words and phrases that correspond to the right-hand side of a grammar production and replace them with the left-hand side, repeating this until the entire sentence has been reduced.
Thus this parser starts with the input symbols and builds the parse tree all the way up to the start symbol.
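NLTK also ships a shift-reduce parser, so we can sketch the bottom-up behavior on the same toy grammar (again, our own illustrative example; note that NLTK's shift-reduce parser does not backtrack, so it may fail on grammars with conflicts).

```python
import nltk

# Same toy grammar as above (an illustrative assumption).
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'Delhi' | Det N
VP -> V NP
Det -> 'the'
N -> 'capital'
V -> 'is'
""")

# Bottom-up: tokens are shifted onto a stack, and stack tops matching the
# right-hand side of a production are reduced to its left-hand side.
parser = nltk.ShiftReduceParser(grammar)
for tree in parser.parse("Delhi is the capital".split()):
    print(tree)
```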
The chart parser is mainly used for ambiguous grammars, such as the grammars of natural languages. It tackles the parsing problem with dynamic programming: partially hypothesized results are saved in a structure called a 'chart', so that they can be reused in a variety of situations rather than recomputed.
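A classic way to see this is the ambiguous PP-attachment sentence from the NLTK book; the sketch below uses NLTK's ChartParser and is an illustration rather than the article's own example.

```python
import nltk

# The classic ambiguous PP-attachment grammar from the NLTK book.
grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")

# Dynamic programming: partial results are stored in a chart and reused,
# so both readings of the ambiguous sentence are found efficiently.
parser = nltk.ChartParser(grammar)
for tree in parser.parse("I shot an elephant in my pajamas".split()):
    print(tree)  # two parse trees, one per reading
```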
The regexp (regular expression) parser is one of the most popular parsers out there. It applies a grammar, defined in the form of regular expressions, on top of a POS-tagged string: it parses the input phrases using these regular expressions and generates a parse tree as a result.
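Here is a minimal sketch using NLTK's RegexpParser; the chunk pattern and the hand-tagged sentence are our own assumptions for illustration.

```python
import nltk

# A chunk grammar written as a regular expression over POS tags:
# an NP is an optional determiner, any number of adjectives, then a noun.
parser = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")

# RegexpParser works on POS-tagged input, not raw text.
tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(parser.parse(tagged))  # NP chunks appear as subtrees labeled 'NP'
```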
Now that we know the types of parsing and the types of parsers, let us learn about another important topic: parse trees.
(Suggested blog: Text mining techniques)
A parse tree is a graphical representation of a derivation. The root node of the parse tree is the start symbol of the derivation, the leaf nodes are terminals, and the inner nodes are non-terminals. The most useful characteristic of a parse tree is that reading its leaves in order reproduces the original input string.
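A short sketch with NLTK's Tree class shows these properties; the bracketed tree below is our own small example.

```python
import nltk

# A hand-written parse tree in bracketed notation (our own small example).
tree = nltk.Tree.fromstring(
    "(S (NP Delhi) (VP (V is) (NP (Det the) (N capital))))")

print(tree.label())             # S -- the start symbol at the root
print(tree.leaves())            # ['Delhi', 'is', 'the', 'capital'] -- terminals
print(" ".join(tree.leaves()))  # reading the leaves in order recovers the input
```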
Parsing is done to analyze the grammar of a sentence, so we need a basic idea of the concept of grammar. Grammar is highly significant for describing the syntactic structure of well-formed programs; in the literary sense, it denotes the syntactic rules for conversation in natural languages.
Linguists have long sought to define the grammar of natural languages such as English and Hindi. The theory of formal languages is also useful in computer science, particularly in the areas of programming languages and data structures.
In the 'C' programming language, for example, precise grammar rules state how functions are made up of lists and statements.
(Recommended blog: Text Cleaning & Preprocessing in NLP)
There are three types of grammar that we will list out here: Constituency grammar, dependency grammar, and context-free grammar.
Constituency grammar, also known as phrase structure grammar, was proposed by Noam Chomsky. It is based on the constituency relation (hence the name) and is the opposite of dependency grammar.
In this type of grammar, sentence structure is viewed through the lens of constituency relations in all relevant frameworks. The constituency relation is derived from the subject-predicate division of Latin and Greek grammar.
The basic sentence structure is understood in terms of a noun phrase (NP) and a verb phrase (VP). A parse tree that uses constituency grammar is known as a constituency-based parse tree.
The following are the most important aspects of dependency grammar (DG) and the dependency relation:
The linguistic units, i.e. words, are linked together via directed connections in DG.
The verb takes center stage in the sentence structure.
All other syntactic units are connected to the verb via directed links; these syntactic units are called dependencies.
Parse trees that use dependency grammar are called dependency-based parse trees.
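A minimal dependency-parse sketch with spaCy illustrates these points, assuming the spacy package and its small English model are installed (both are assumptions on our part, not requirements of the article).

```python
import spacy

# Assumes spaCy and its small English model are installed:
#   pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Delhi is the capital of India")

# Every word is linked to a head word by a directed, labeled dependency;
# the main verb sits at the root of the structure.
for token in doc:
    print(f"{token.text:10} --{token.dep_}--> {token.head.text}")
```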
Context-free grammar (CFG) is a notation for describing languages and a superset of regular grammar. A CFG consists of a finite set of grammar rules built from the following four components:
A set of non-terminals, indicated by the letter V. Non-terminals are syntactic variables that represent sets of strings, which helps define the language generated by the grammar.
A set of terminals, also known as tokens, denoted by Σ. Terminals are the basic symbols from which strings are formed.
A set of productions, denoted by P. The productions specify how terminals and non-terminals can be combined. Every production consists of a non-terminal (the left side of the production), an arrow, and a sequence of terminals and/or non-terminals (the right side).
A start symbol, denoted by S, from which the production process begins. The start symbol is always a non-terminal.
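The four components map directly onto NLTK's grammar notation, as the following sketch shows; the grammar itself is our own toy example.

```python
import nltk

# In this notation, symbols on the left of '->' are non-terminals (V),
# quoted symbols are terminals (Σ), each line is a production (P),
# and S is the start symbol. The grammar itself is a toy example.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased'
""")

print(grammar.start())        # S
print(grammar.productions())  # the full list of productions
```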
The main difference between syntactic analysis and lexical analysis is that lexical analysis is concerned with data cleaning and feature extraction, using techniques such as stemming, lemmatization, and correcting misspelled words. Syntactic analysis, on the other hand, analyzes the roles played by words in a sentence, determines the relationships between the words, and interprets the grammatical structure of the sentence.
For example, if we look into two sentences:
“Delhi is the capital of India” and “Is Delhi the capital of India?”
These two sentences contain exactly the same words, yet their different word order makes one a statement and the other a question. Basic lexical processing techniques cannot capture this difference, because they ignore word order.
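A tiny sketch makes the point: a bag-of-words view, standing in here for purely lexical processing (the snippet is our own illustration), treats the two sentences as identical.

```python
# Purely lexical (bag-of-words) processing ignores word order, so it
# cannot distinguish the statement from the question.
statement = "Delhi is the capital of India"
question = "Is Delhi the capital of India"

def bag_of_words(sentence):
    return sorted(sentence.lower().split())

print(bag_of_words(statement) == bag_of_words(question))  # True
```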
As a result, more advanced syntax-processing algorithms are required to understand the relationships between individual words in a phrase. The following diagram shows the relation between lexical analysis and syntactic analysis:
Figure: Interaction between a lexical analyzer and a parser
Syntactic analysis also makes choices that lexical analysis does not:
Word order matters: the goal of syntactic analysis is to extract the relationships between the words in a document, and a statement becomes hard to understand if its words are rearranged.
Stop-words are retained: removing stop-words can completely change the meaning of a sentence.
Stemming and lemmatization are avoided, since reducing words to their simplest forms changes a sentence's syntax.
Determining each word's correct part of speech (POS) is crucial; a small POS-tagging example follows this list.
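As promised, here is a minimal POS-tagging sketch with NLTK's default tagger; note that the exact downloadable resource names can vary slightly between NLTK versions (an assumption on our part).

```python
import nltk

# NLTK's default POS tagger; resource names may differ across versions.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("Delhi is the capital of India")
print(nltk.pos_tag(tokens))
# e.g. [('Delhi', 'NNP'), ('is', 'VBZ'), ('the', 'DT'), ...]
```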
(Top reading: Text Analytics and Models in NLP)
NLP is becoming more popular every day, with applications such as chatbots, voice assistants, and speech recognition. Syntactic analysis is a very important part of NLP that helps in understanding the grammatical structure of a sentence. In this article, we discussed the definition of syntactic analysis (parsing), talked about the types of parsers, and covered the basic concept of grammar. We also learned the difference between syntactic analysis and lexical analysis.