Copper City Labs

About

Copper City Labs is a research lab specializing in natural language processing for the Uzbek language. You can reach us at hello@coppercitylabs.com.

Publications

  1. UzBERT: pretraining a BERT model for Uzbek [PDF, arXiv, figshare]
  2. Uzbek Cyrillic-Latin-Cyrillic Machine Transliteration [PDF, arXiv, figshare]; code
  3. Development of Word Embeddings for Uzbek Language [PDF, arXiv, figshare]

Models

  1. Uzbek news category classifier (based on UzBERT) [Hugging Face Hub]
  2. UzBERT (BERT for Cyrillic Uzbek) [Hugging Face Hub]
  3. Uzbek tokenizers [GitHub]
  4. Word embeddings for Uzbek (Cyrillic):
    • 100d fastText (CBOW) [figshare]
    • 100d fastText (skip-gram) [figshare]
    • 100d word2vec (CBOW, negative sampling) [figshare]
    • 100d word2vec (skip-gram, negative sampling) [figshare]
    • 300d fastText (CBOW) [figshare]
    • 300d fastText (skip-gram) [figshare]
    • 300d GloVe [figshare]
    • 300d word2vec (CBOW, hierarchical softmax) [figshare]
    • 300d word2vec (skip-gram, hierarchical softmax) [figshare]