FastText Vector Norms And OOV Words

Summary

Word embeddings trained on large unlabeled corpora are useful for many natural language processing tasks. FastText (Bojanowski et al., 2016), in contrast to the Word2vec model, accounts for sub-word information by also embedding sub-word n-grams. A FastText word representation is the sum of the word's embedding vector and the vectors of the n-grams contained in it. Word2vec vector norms have been shown (Schakel & Wilson, 2015) to be correlated with word significance. This blog post visualizes the vector norms of FastText embeddings and evaluates the use of a word's FastText vector norm multiplied by its number of n-grams for detecting non-English OOV words.
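As a rough illustration of this metric, below is a minimal Python sketch. It assumes a pretrained FastText model loaded with Gensim's load_facebook_vectors (the model path cc.en.300.bin is only an example), and it counts character n-grams with fastText's default settings (boundary markers < and >, n-gram lengths 3 to 6); the exact n-gram enumeration inside fastText differs slightly at word boundaries, so treat the count as an approximation.

```python
import numpy as np
from gensim.models.fasttext import load_facebook_vectors

def count_ngrams(word: str, minn: int = 3, maxn: int = 6) -> int:
    """Approximate fastText's character n-gram count: the word is
    wrapped in boundary markers '<' and '>', and all substrings of
    length minn..maxn are enumerated."""
    wrapped = f"<{word}>"
    return sum(
        len(wrapped) - n + 1
        for n in range(minn, maxn + 1)
        if len(wrapped) >= n
    )

# Example path to a pretrained fastText .bin model (an assumption,
# not shipped with this repository).
vectors = load_facebook_vectors("cc.en.300.bin")

def oov_score(word: str) -> float:
    """The evaluated metric: vector norm times the number of n-grams."""
    return float(np.linalg.norm(vectors[word]) * count_ngrams(word))

print(oov_score("apple"))   # common English word
print(oov_score("xqzvtt"))  # likely non-English OOV string
```

Indexing an OOV word works here because Gensim's FastTextKeyedVectors composes a vector for unseen words from their n-gram embeddings.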
