Introduction

An overview of Natural Language Processing

What is NLP?

NLP is an interdisciplinary field that uses computational methods to analyze the properties of human language and to model the mechanisms underlying the understanding and production of language by computers.

In simple words, NLP is a way for computers to understand and generate human language.

Why is NLP important?

  • To interact with machines using human language (e.g., chatbots).

  • To mine information stored in the form of written natural language.

Basic Model

Challenges

Primary Challenges

  • Ambiguity

  • Irregularity

  • Unknown Words

    Other challenges: slang, neologisms, bias, and stereotyping.

Ambiguity

An input is ambiguous if more than one linguistic structure can be built for it.

Ex: I made her duck. (Here "duck" might mean the animal or the action of ducking.)

Types of ambiguity:

  1. Lexical: a word has more than one meaning.

    Ex: I made her duck.

  2. Structural: a sentence has multiple interpretations because it can be parsed in more than one way.

    Ex: The boy saw the man with the telescope.

    1: The boy used a telescope to see the man.

    2: The boy saw a man who was holding a telescope.

  3. Anaphoric: arises from the use of anaphoric expressions (pronouns such as he, she, it, etc.) in sentences.

    Ex: The horse ran up the hill with the cat. It soon got tired.

    Here the word "It" might refer to the horse or the cat.

  4. Pragmatic: the context of a phrase gives it multiple interpretations.

    Ex: The chicken is ready to eat.

    1: The chicken is ready to eat its food.

    2: The cooked chicken is ready to be served
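One way to make "multiple linguistic structures" concrete is to count parse trees. The sketch below is a minimal CYK (Cocke-Younger-Kasami) parser that counts the parses of the telescope sentence from the structural-ambiguity example, with articles added so the grammar covers it. The grammar and lexicon (BINARY_RULES, LEXICON) are hypothetical toy assumptions written in Chomsky Normal Form for this one sentence only; a real parser would rely on a far larger, usually probabilistic, grammar.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky Normal Form: (parent, left child, right child)
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # PP attached to a noun phrase: "the man with the telescope"
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attached to the verb phrase: "saw ... with the telescope"
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: word -> possible part-of-speech categories
LEXICON = {
    "the": ["Det"],
    "boy": ["N"],
    "man": ["N"],
    "telescope": ["N"],
    "saw": ["V"],
    "with": ["P"],
}


def count_parses(sentence: str, start: str = "S") -> int:
    """Count the distinct parse trees of a sentence with the CYK algorithm."""
    words = sentence.lower().split()
    n = len(words)
    # table[i][j][category] = number of ways `category` can span words[i:j]
    table = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]

    # Spans of length 1 come straight from the lexicon.
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            table[i][i + 1][category] += 1

    # Build longer spans by combining two adjacent shorter spans.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    ways = table[i][k][left] * table[k][j][right]
                    if ways:
                        table[i][j][parent] += ways

    return table[0][n][start]


print(count_parses("the boy saw the man with the telescope"))  # prints 2
```

The two parses correspond to attaching "with the telescope" to the verb phrase (the boy used the telescope) or to the noun phrase "the man" (the man had the telescope). Dropping the phrase, as in count_parses("the boy saw the man"), leaves a single parse.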

Irregularity

In simple words, irregularity is the existence of the same word in different forms that follow no specific rule, so they cannot be generalized (e.g., "go" and "went", "mouse" and "mice").
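To illustrate why irregular forms cannot be generalized, the sketch below compares a naive suffix-stripping lemmatizer with one that first consults an exception table. The rules in SUFFIX_RULES and the entries in IRREGULAR_FORMS are small hypothetical examples, not a complete description of English morphology.

```python
# Hypothetical exception table: irregular forms that no suffix rule can recover.
IRREGULAR_FORMS = {
    "went": "go",
    "gone": "go",
    "ate": "eat",
    "mice": "mouse",
    "children": "child",
}

# Hypothetical regular suffix rules, tried in order: (suffix to strip, replacement)
SUFFIX_RULES = [("ies", "y"), ("ed", ""), ("s", "")]


def rule_based_lemma(word: str) -> str:
    """Apply the first matching regular suffix rule (fails on irregular words)."""
    for suffix, replacement in SUFFIX_RULES:
        # Keep at least two characters of stem so short words like "is" survive.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)] + replacement
    return word


def lemma(word: str) -> str:
    """Look up irregular forms first, then fall back to the regular rules."""
    word = word.lower()
    return IRREGULAR_FORMS.get(word, rule_based_lemma(word))


for w in ["walked", "studies", "cats", "went", "mice", "children"]:
    print(w, "->", rule_based_lemma(w), "vs", lemma(w))
```

The regular rules handle "walked", "studies", and "cats", but leave "went", "mice", and "children" unchanged; those only come out right because they are listed explicitly. This is why many practical lemmatizers combine rules with lookup tables of exceptions.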

Colloquialisms and slang

  • Colloquialisms may have no dictionary definition at all, and they vary by geographical area.

  • Cultural slang is constantly changing and expanding.

Bias

Bias can arise from the data, the annotation process (the metadata used to label elements or words in the dataset), the input representations, the models, and the research design.

Neologism

Neologisms are newly coined words, for example words formed by combining existing words or by attaching new prefixes and suffixes to existing words.
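As a rough illustration, a system that meets an unseen word built this way could try to split it into known words and affixes. The sketch below does a simple dictionary-based segmentation with backtracking; KNOWN_PARTS is a hypothetical mini-vocabulary, and the approach only covers compounding and affixation, not blends such as "brunch" or entirely new coinages.

```python
from typing import List, Optional, Set

# Hypothetical mini-vocabulary of known words and affixes.
KNOWN_PARTS = {"photo", "bomb", "binge", "watch", "cyber", "security", "un", "friend"}


def split_into_known_words(word: str, vocab: Set[str]) -> Optional[List[str]]:
    """Segment a word into known parts, or return None if that is impossible.

    Tries the shortest matching prefix first and backtracks when the
    remainder of the word cannot be segmented.
    """
    if word == "":
        return []
    for end in range(1, len(word) + 1):
        prefix = word[:end]
        if prefix in vocab:
            rest = split_into_known_words(word[end:], vocab)
            if rest is not None:
                return [prefix] + rest
    return None


for w in ["photobomb", "unfriend", "bingewatch", "cybersecurity", "yeet"]:
    print(w, "->", split_into_known_words(w, KNOWN_PARTS))
```

Words such as "photobomb" or "unfriend" decompose into familiar pieces, while a coinage like "yeet" returns None and would still need to be treated as an unknown word.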

In the next article, we will cover basic models (N-grams).

Support Aryan Sri harsha by becoming a sponsor. Any amount is appreciated!