What is NLP?
NLP (Natural Language Processing) is an interdisciplinary field that uses computational methods to analyze the properties of human language and to model the mechanisms underlying the understanding and production of language by computers.
In simple words, NLP is a way for computers to understand and generate human language.
Why is NLP important?
To interact with machines using human language (e.g., chatbots).
To mine information stored as written natural-language text.
Basic Model
Challenges
Primary Challenges
Ambiguity
Irregularity
Unknown Words
Other challenges: slang, neologisms, bias and stereotyping
Ambiguity
An input is ambiguous if more than one linguistic structure can be built for it.
Ex: I made her duck (here "duck" might mean the animal or the action of ducking).
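To make "more than one linguistic structure" concrete, here is a minimal sketch that enumerates every part-of-speech reading a tiny lexicon allows for "I made her duck". The lexicon, the tag names, the single grammatical constraint, and the helpers `tag_readings` and `is_ambiguous` are all hypothetical, chosen only for illustration; a real system would use a full grammar or a trained tagger.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to the parts of speech it can take.
LEXICON = {
    "i": ["PRON"],
    "made": ["VERB"],
    "her": ["PRON", "DET"],    # object pronoun ("made her ...") or possessive ("her duck")
    "duck": ["NOUN", "VERB"],  # the animal, or the action of ducking
}

def is_valid(tags):
    """Toy constraint: a determiner must be immediately followed by a noun."""
    return all(
        tag != "DET" or (i + 1 < len(tags) and tags[i + 1] == "NOUN")
        for i, tag in enumerate(tags)
    )

def tag_readings(sentence):
    """Return every tag sequence the lexicon and the toy constraint allow."""
    words = sentence.lower().split()
    options = [LEXICON[w] for w in words]  # assumes every word is in the lexicon
    return [list(zip(words, tags)) for tags in product(*options) if is_valid(tags)]

def is_ambiguous(sentence):
    """An input is ambiguous if more than one reading can be built for it."""
    return len(tag_readings(sentence)) > 1

for reading in tag_readings("I made her duck"):
    print(reading)
```

Three tag sequences survive the toy constraint ("her" as a pronoun with "duck" as a noun or a verb, and "her" as a possessive with "duck" as a noun), so the sentence counts as ambiguous.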
Types of ambiguity:
Lexical: a word has more than one meaning.
Ex: I made her duck.
Structural: a sentence has multiple interpretations because of its structure.
Ex: The boy saw the man with the telescope.
1: The boy saw the man by using a telescope.
2: The boy saw the man who was holding a telescope.
Anaphoric: arises from the use of anaphoric entities (pronouns such as he, she, it, etc.) in sentences.
Ex: The horse ran up the hill with the cat. It soon got tired.
Here the word "It" might refer to the horse or the cat.
Pragmatic: the context of a phrase gives it multiple interpretations.
Ex: The chicken is ready to eat.
1: The chicken is ready to eat its food.
2: The cooked chicken is ready to be served.
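The structural example above ("Boy saw the man with telescope") can be checked mechanically: with a small grammar, a chart parser finds two distinct trees for the same word sequence. The sketch below is a CYK parser that counts parse trees, assuming a hypothetical toy grammar in Chomsky normal form; the sentence is written with articles ("the boy saw the man with the telescope") so the toy grammar covers it, and `count_parses` is an illustrative name, not a library function.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun: "the man with the telescope"
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase: "saw ... with the telescope"
    ("PP", "P", "NP"),
]
LEXICAL_RULES = {
    "the": ["Det"],
    "boy": ["N"], "man": ["N"], "telescope": ["N"],
    "saw": ["V"],
    "with": ["P"],
}

def count_parses(sentence, start="S"):
    """CYK chart that counts how many distinct parse trees cover the sentence."""
    words = sentence.lower().split()
    n = len(words)
    # chart[i][j][label] = number of trees rooted at `label` spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for label in LEXICAL_RULES[w]:  # assumes every word is in the lexicon
            chart[i][i + 1][label] += 1
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length
            for k in range(i + 1, j):       # split point
                for parent, left, right in BINARY_RULES:
                    left_count = chart[i][k][left]
                    right_count = chart[k][j][right]
                    if left_count and right_count:
                        chart[i][j][parent] += left_count * right_count
    return chart[0][n][start]

print(count_parses("the boy saw the man with the telescope"))  # 2
print(count_parses("the boy saw the man"))                     # 1
```

The two parses correspond to the two readings listed above: the PP "with the telescope" attaches either to the verb phrase (the boy used a telescope) or to the noun phrase (the man held one).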
Irregularity
In simple words, irregularity is the existence of the same word in different forms that follow no specific rule, so they cannot be generalized (e.g., go/went/gone, mouse/mice).
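One way to see why irregular forms cannot be generalized: a rule-based lemmatizer handles regular inflections with suffix rules, but irregular forms need an explicit exception list. The sketch below assumes a hypothetical, deliberately naive set of suffix rules and a hand-written exception table; `lemmatize` and its `use_exceptions` flag are illustrative names, not a real library API.

```python
# Hypothetical, deliberately naive suffix rules: (suffix, replacement), tried in order.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]

# Irregular forms follow no suffix rule, so they must be listed one by one.
EXCEPTIONS = {"went": "go", "gone": "go", "mice": "mouse", "children": "child", "feet": "foot"}

def lemmatize(word, use_exceptions=True):
    """Reduce a word to a base form using the exception table, then suffix rules."""
    word = word.lower()
    if use_exceptions and word in EXCEPTIONS:
        return EXCEPTIONS[word]
    for suffix, replacement in SUFFIX_RULES:
        # Keep at least two characters of stem so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)] + replacement
    return word

print(lemmatize("studies"))                     # study
print(lemmatize("walked"))                      # walk
print(lemmatize("went"))                        # go
print(lemmatize("went", use_exceptions=False))  # went (no rule can recover "go")
```

The suffix rules generalize across many regular words, while every irregular form has to be added to the table by hand, which is exactly what makes irregularity a challenge.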
Colloquialisms and slang
Colloquialisms may have no dictionary definition at all, and they vary across geographical areas.
Cultural slang is constantly changing and expanding.
Bias
Bias can arise from the data, the annotation process (the metadata used to mark elements or words in the dataset), input representations, the models, and the research design.
Neologism
Neologisms are new words, such as words formed by combining existing words or by attaching unique prefixes and suffixes to them.
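Because neologisms are often built from known pieces, a system that meets an unknown word can try to decompose it before giving up. The sketch below is a hypothetical heuristic, not a standard algorithm: it checks a small vocabulary, then a known prefix or suffix around a known word, then a split into two known words. The vocabulary, affix lists, and the `analyze` function are all assumptions for illustration.

```python
# Hypothetical vocabulary and affix lists, for illustration only.
VOCAB = {"friend", "photo", "bomb", "self", "watch", "like"}
PREFIXES = {"un", "re", "de"}
SUFFIXES = {"ie", "able", "ing"}

def analyze(word):
    """Return a guessed decomposition of `word`, or None if nothing matches."""
    word = word.lower()
    if word in VOCAB:
        return [word]
    # Known prefix + known word, e.g. "unfriend"
    for prefix in sorted(PREFIXES):
        if word.startswith(prefix) and word[len(prefix):] in VOCAB:
            return [prefix + "-", word[len(prefix):]]
    # Known word + known suffix, e.g. "selfie"
    for suffix in sorted(SUFFIXES):
        if word.endswith(suffix) and word[: -len(suffix)] in VOCAB:
            return [word[: -len(suffix)], "-" + suffix]
    # Two known words joined together, e.g. "photobomb"
    for i in range(1, len(word)):
        if word[:i] in VOCAB and word[i:] in VOCAB:
            return [word[:i], word[i:]]
    return None  # still an unknown word

for w in ["unfriend", "selfie", "photobomb", "xyzzy"]:
    print(w, analyze(w))
```

A heuristic like this only covers neologisms built from parts already in the vocabulary; a word with no recognizable parts stays unknown.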
In the next article, we will cover basic models (N-grams).