FastText Vector Norms And OOV Words

Summary

Word embeddings trained on large unlabeled corpora are useful for many natural language processing tasks. FastText (Bojanowski et al., 2016), in contrast to the Word2vec model, accounts for sub-word information by also embedding sub-word n-grams. A FastText word representation is the sum of the word's embedding vector and the vectors of the n-grams contained in it. Word2vec vector norms have been shown (Schakel & Wilson, 2015) to be correlated with word significance. This blog post visualizes the vector norms of FastText embeddings and evaluates the use of a word's FastText vector norm multiplied by its number of n-grams for detecting non-English OOV words.
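As a rough illustration of this metric, below is a minimal Python sketch. It assumes a pretrained FastText model loaded with Gensim's load_facebook_vectors (the model path cc.en.300.bin is only an example), and it counts character n-grams with fastText's default settings (boundary markers < and >, n-gram lengths 3 to 6); the exact n-gram enumeration inside fastText differs slightly at word boundaries, so treat the count as an approximation.

```python
import numpy as np
from gensim.models.fasttext import load_facebook_vectors

def count_ngrams(word: str, minn: int = 3, maxn: int = 6) -> int:
    """Approximate fastText's character n-gram count: the word is
    wrapped in boundary markers '<' and '>', and all substrings of
    length minn..maxn are enumerated."""
    wrapped = f"<{word}>"
    return sum(
        len(wrapped) - n + 1
        for n in range(minn, maxn + 1)
        if len(wrapped) >= n
    )

# Example path to a pretrained fastText .bin model (an assumption,
# not shipped with this repository).
vectors = load_facebook_vectors("cc.en.300.bin")

def oov_score(word: str) -> float:
    """The evaluated metric: vector norm times the number of n-grams."""
    return float(np.linalg.norm(vectors[word]) * count_ngrams(word))

print(oov_score("apple"))   # common English word
print(oov_score("xqzvtt"))  # likely non-English OOV string
```

Indexing an OOV word works here because Gensim's FastTextKeyedVectors composes a vector for unseen words from their n-gram embeddings.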
