
Tag: keywordextraction

Keyword Extraction Methods — The Overview

What is keyword extraction?


NLP Natural Language Processing

Main ideas

An NLP task follows much the same process as other classification algorithms:

  • Compile documents. Gather the data, which is usually raw text.
  • Featurize documents. Convert the text into a format that ML algorithms understand.
  • Compare features for classification. Use ML techniques to build the model.

Unstructured text ⇒ Compile documents ⇒ Featurize them ⇒ Compare features
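
To make the three stages concrete, here is a minimal sketch of the pipeline. The function names, the word-count features, and the "most shared words" comparison rule are all illustrative assumptions, not a method prescribed by the post; a real project would normally use a proper ML model for the last step.

```python
from collections import Counter


def compile_documents(raw_texts):
    """Step 1: collect the raw text and normalise case and whitespace."""
    return [" ".join(text.lower().split()) for text in raw_texts]


def featurize(document):
    """Step 2: turn a document into word counts (a bag of words)."""
    return Counter(document.split())


def classify(features, labelled_features):
    """Step 3: return the label of the example sharing the most words."""
    def overlap(a, b):
        # Counter intersection keeps the minimum count of each shared word.
        return sum((a & b).values())

    best_label, _ = max(labelled_features, key=lambda pair: overlap(features, pair[1]))
    return best_label


# Hypothetical toy data, invented for this illustration only.
train_texts = ["The match ended in a draw", "The stock market fell today"]
train_labels = ["sports", "finance"]

docs = compile_documents(train_texts)
labelled = list(zip(train_labels, (featurize(d) for d in docs)))

query = featurize(compile_documents(["Stock prices fell again"])[0])
predicted = classify(query, labelled)
print(predicted)  # finance
```

The query shares "stock" and "fell" with the second training text and nothing with the first, so it is assigned that text's label.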

How does the algorithm work?

In NLP the featurization is done through vectorization:

  • Corpus of D documents: a = “The House is Blue”, b = “The House is Red”.
  • Build an index of relevant, meaningful keywords, e.g. (house, blue, red).
  • Vectorize documents, e.g. a = “The House is Blue” ⇒ (1,1,0) and b = “The House is Red” ⇒ (1,0,1).
  • Compare the docs as follows:

Use cosine similarity to compare: similarity(a,b) = cos(θ) = (a · b) / (‖a‖ ‖b‖), where θ is the angle between the two document vectors.
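
The vectorization and comparison steps can be sketched in a few lines of plain Python. This is a rough illustration that assumes binary vectors (1 if the keyword appears, 0 otherwise) and whitespace tokenization; the helper names `vectorize` and `cosine_similarity` are made up for this example.

```python
import math


def vectorize(text, index):
    """Binary vector: 1 if the keyword occurs in the text, else 0."""
    words = set(text.lower().split())
    return [1 if keyword in words else 0 for keyword in index]


def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| * |v|); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


index = ["house", "blue", "red"]
a = vectorize("The House is Blue", index)  # [1, 1, 0]
b = vectorize("The House is Red", index)   # [1, 0, 1]

similarity = cosine_similarity(a, b)
print(round(similarity, 3))  # 0.5
```

The two documents share only "house", so the dot product is 1 and each vector has length √2, which gives a similarity of about 0.5.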

Characterize the terms:
Term Frequency TF(t,d) = how often term t occurs in doc d ⇒ Importance of the term t within doc d
Inverse Doc Frequency IDF(t) = log (D / DF(t)), where DF(t) is the number of docs that contain t ⇒ Importance of the term within the corpus of D docs
TF-IDF(t,d) = TF(t,d) × IDF(t) ⇒ This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus.…
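
As a rough worked example of the weighting, the sketch below computes TF-IDF by hand for the two-sentence corpus used earlier. It rests on several assumptions: TF is the term count divided by the document length, IDF is read as the natural log of the number of documents over the number of documents containing the term, and no smoothing is applied. Libraries often use slightly different variants, and the function names here are illustrative.

```python
import math


def term_frequency(term, document):
    """Share of the words in the document that equal the term."""
    words = document.lower().split()
    if not words:
        return 0.0
    return words.count(term) / len(words)


def inverse_document_frequency(term, corpus):
    """log(D / DF(t)); 0.0 if no document contains the term."""
    containing = sum(1 for doc in corpus if term in doc.lower().split())
    if containing == 0:
        return 0.0
    return math.log(len(corpus) / containing)


def tf_idf(term, document, corpus):
    return term_frequency(term, document) * inverse_document_frequency(term, corpus)


corpus = ["The House is Blue", "The House is Red"]

house_weight = tf_idf("house", corpus[0], corpus)
blue_weight = tf_idf("blue", corpus[0], corpus)
print(house_weight)           # 0.0
print(round(blue_weight, 3))  # 0.173
```

"house" appears in every document, so its IDF is log(1) = 0 and it carries no weight, while "blue" appears in only one document and is scored as the more distinctive term.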