Some of the key tasks in NLP involve understanding grammar, semantics, and syntax. As a first step, lemmatization and stemming normalize raw text (e.g. converting “is” to “be” and “cities” to “city”). Further preprocessing is often necessary before applying NLP algorithms: removing many common words, called stop words, and tokenizing the text into sequences of sentences and words. Taggers play a central role in NLP by labeling words with a finite set of types. Part-of-Speech (POS) taggers specify the grammatical role of each word in a sentence, and Named-Entity Recognition (NER) systems tag words that fall within a small set of entity types such as people, places, companies, dates, times, money, percentages, and geo-political entities.
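The preprocessing steps above can be illustrated with a minimal, self-contained sketch. The stop-word list and lemma table here are toy placeholders sized only for the example; real systems use trained resources such as those shipped with NLTK or spaCy.

```python
import re

# Toy lexicons -- hypothetical, minimal tables for illustration only.
# Production pipelines rely on trained models and full dictionaries.
STOP_WORDS = {"the", "a", "an", "of", "in", "are"}
LEMMAS = {"is": "be", "are": "be", "cities": "city", "running": "run"}

def tokenize(text):
    """Split text into lowercase word tokens (a crude word tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def preprocess(text):
    """Tokenize, drop stop words, then lemmatize what remains."""
    tokens = tokenize(text)
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The cities are running"))  # ['city', 'run']
```

The same three stages (tokenize, filter stop words, normalize) appear in virtually every NLP pipeline; only the quality of the lexicons and models differs.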
Semantic Role Labeling (SRL) breaks a sentence down into constituents such as noun phrases (NP) and verb phrases (VP) and assigns them semantic roles, forming a tree over the sentence. The central challenges of modern NLP are deciphering meaning, known as Natural Language Understanding (NLU), and deciding what the machine should “say” back, known as Natural Language Generation (NLG). Both require machines to handle the ambiguities of human language and to be trained for new tasks and new concepts. NLU and NLG play major roles in the creation of intelligent chatbots with human-like conversational capabilities.
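The idea of grouping words into NP and VP constituents can be sketched with a naive rule-based chunker. The POS lexicon below is a hypothetical toy; real SRL and phrase-structure parsing use statistically trained models, not hand-written rules like these.

```python
# Toy POS lexicon -- hypothetical, covering only the example sentence.
POS = {"the": "DT", "cat": "NN", "chased": "VB", "mouse": "NN"}

def chunk(tokens):
    """Group determiner+noun pairs into NPs; leave verbs as VB chunks.

    A fuller grammar would then attach a VB chunk and the NP that
    follows it into a VP node, yielding the tree described above.
    """
    i, out = 0, []
    while i < len(tokens):
        tag = POS.get(tokens[i], "NN")
        if tag == "DT" and i + 1 < len(tokens) and POS.get(tokens[i + 1]) == "NN":
            out.append(("NP", tokens[i:i + 2]))
            i += 2
        else:
            out.append((tag, [tokens[i]]))
            i += 1
    return out

print(chunk("the cat chased the mouse".split()))
# [('NP', ['the', 'cat']), ('VB', ['chased']), ('NP', ['the', 'mouse'])]
```

Even this toy output shows the shape SRL builds on: once constituents are found, semantic roles (e.g. who chased whom) are assigned to each phrase.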