Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Add Mixtral] Adds support for the Mixtral MoE #27942

Merged
merged 111 commits into from
Dec 11, 2023
Merged
Show file tree
Hide file tree
Changes from 93 commits
Commits
Show all changes
111 commits
Select commit Hold shift + click to select a range
15ef543
up
younesbelkada Dec 8, 2023
3367d25
up
younesbelkada Dec 8, 2023
f9da444
test
younesbelkada Dec 8, 2023
f59eacc
logits ok
younesbelkada Dec 8, 2023
7e0968a
up
younesbelkada Dec 8, 2023
0bfcd75
up
younesbelkada Dec 8, 2023
6b84e42
few fixes
younesbelkada Dec 8, 2023
2896e2f
conversion script
younesbelkada Dec 8, 2023
92d143f
up
younesbelkada Dec 8, 2023
d3261c1
nits
ArthurZucker Dec 8, 2023
407f8a8
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker Dec 8, 2023
65bbd30
nits
ArthurZucker Dec 8, 2023
6afc8f3
update
ArthurZucker Dec 8, 2023
bfef811
Merge branch 'main' into add-mixtral-alternative
younesbelkada Dec 8, 2023
7a54d1a
nuke
younesbelkada Dec 8, 2023
b9f3fc0
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
younesbelkada Dec 8, 2023
f8513e8
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker Dec 8, 2023
0d31424
more updates
ArthurZucker Dec 8, 2023
c8987cb
nites
ArthurZucker Dec 8, 2023
d82c8ee
fix many issues
younesbelkada Dec 9, 2023
ccc6011
nit
younesbelkada Dec 9, 2023
356d484
scatter
ArthurZucker Dec 9, 2023
e858c01
nit
younesbelkada Dec 9, 2023
82037ca
nuke megablocks
younesbelkada Dec 9, 2023
e66d1a9
nits
ArthurZucker Dec 9, 2023
49eb7f0
fix conversion script
younesbelkada Dec 9, 2023
ffc8463
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker Dec 9, 2023
82e4a1b
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker Dec 9, 2023
0b1ca52
nit
younesbelkada Dec 9, 2023
3616d3b
remove
ArthurZucker Dec 9, 2023
6d73a58
nits
ArthurZucker Dec 9, 2023
4c1fbf3
nit
younesbelkada Dec 9, 2023
a922710
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker Dec 9, 2023
1abf6bd
update
ArthurZucker Dec 9, 2023
b938a30
oupsssss
ArthurZucker Dec 9, 2023
12ddba9
change
younesbelkada Dec 9, 2023
445e6e6
nits device
ArthurZucker Dec 9, 2023
1e83d0e
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker Dec 9, 2023
263310f
nits
ArthurZucker Dec 9, 2023
b2bedb1
fixup
ArthurZucker Dec 9, 2023
4afd7e4
update
ArthurZucker Dec 9, 2023
c0e6dfd
merge
ArthurZucker Dec 9, 2023
dd33a59
add copied from
ArthurZucker Dec 9, 2023
0de7081
fix the copy mentions
ArthurZucker Dec 9, 2023
48945de
update tests
ArthurZucker Dec 9, 2023
d927baf
more fixes
ArthurZucker Dec 9, 2023
7402aca
nits
ArthurZucker Dec 9, 2023
54bee10
conversion script
younesbelkada Dec 9, 2023
1ae98dc
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
younesbelkada Dec 9, 2023
8bf257f
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker Dec 9, 2023
01b2969
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker Dec 9, 2023
80c593e
add parts of the readme
ArthurZucker Dec 9, 2023
dab227f
Update tests/models/mixtral/test_modeling_mixtral.py
younesbelkada Dec 9, 2023
ca1f7d0
new test + conversion script
younesbelkada Dec 9, 2023
e4237a3
Apply suggestions from code review
younesbelkada Dec 9, 2023
0c04bc3
Apply suggestions from code review
younesbelkada Dec 9, 2023
bd7c786
fix
younesbelkada Dec 9, 2023
b2b8e01
fix copies
younesbelkada Dec 9, 2023
badceae
fix copies
younesbelkada Dec 9, 2023
11a4db9
ooops
younesbelkada Dec 9, 2023
419ddb3
fix config
younesbelkada Dec 9, 2023
bbbd1b2
Apply suggestions from code review
younesbelkada Dec 9, 2023
2b23c47
fix nits
younesbelkada Dec 9, 2023
76a65e6
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
younesbelkada Dec 9, 2023
18caab8
nit
younesbelkada Dec 9, 2023
a00ad3a
add copies
younesbelkada Dec 9, 2023
657dd95
add batched tests
younesbelkada Dec 9, 2023
a092648
docs
younesbelkada Dec 10, 2023
67e8e03
fix flash attention
younesbelkada Dec 10, 2023
72542dd
let's add more verbose
younesbelkada Dec 10, 2023
d3f5abb
add correct outputs
ArthurZucker Dec 10, 2023
e900e36
support router ouptus
ArthurZucker Dec 10, 2023
68b8b41
ignore copies where needed
ArthurZucker Dec 10, 2023
ded6028
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker Dec 10, 2023
1ceb940
fix
ArthurZucker Dec 10, 2023
38eef46
cat list if list is given for now
ArthurZucker Dec 10, 2023
8d3f83f
nits
ArthurZucker Dec 10, 2023
ee5f3e9
Update docs/source/en/model_doc/mixtral.md
younesbelkada Dec 10, 2023
e834e89
finish router refactoring
ArthurZucker Dec 10, 2023
1b6358e
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker Dec 10, 2023
54f2a48
fix forward
ArthurZucker Dec 10, 2023
19be169
fix expected values
ArthurZucker Dec 10, 2023
5c929df
nits
ArthurZucker Dec 10, 2023
872ee24
fixup
ArthurZucker Dec 10, 2023
eaa2a5f
fix
younesbelkada Dec 10, 2023
703672d
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker Dec 10, 2023
10760c1
fix bug
younesbelkada Dec 10, 2023
3499c98
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
younesbelkada Dec 10, 2023
19e4aea
fix
ArthurZucker Dec 10, 2023
9bc7d5b
fix dtype mismatch
younesbelkada Dec 10, 2023
290f621
fix
ArthurZucker Dec 10, 2023
1f411a1
Merge branch 'add-mixtral-alternative' of https://github.com/huggingf…
ArthurZucker Dec 10, 2023
9ebf661
grrr grrr I support item assignment
ArthurZucker Dec 10, 2023
23abc46
fix CI
younesbelkada Dec 10, 2023
6549f48
docs
younesbelkada Dec 11, 2023
39e38ed
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker Dec 11, 2023
e4b84bc
fixup
ArthurZucker Dec 11, 2023
80d49aa
remove some copied form
ArthurZucker Dec 11, 2023
fbde97b
fix weird diff
younesbelkada Dec 11, 2023
20091dc
skip doctest fast on the config and modeling
ArthurZucker Dec 11, 2023
adc7113
mark that is supports flash attention in the doc
ArthurZucker Dec 11, 2023
c6ddca8
update
ArthurZucker Dec 11, 2023
3f62433
Update src/transformers/models/mixtral/modeling_mixtral.py
ArthurZucker Dec 11, 2023
6c6df4e
Update docs/source/en/model_doc/mixtral.md
ArthurZucker Dec 11, 2023
d4e826f
revert router logits config issue
ArthurZucker Dec 11, 2023
d17b756
update doc accordingly
ArthurZucker Dec 11, 2023
e86facd
Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
younesbelkada Dec 11, 2023
2c85405
nits
ArthurZucker Dec 11, 2023
bb88c76
use torch testing asssert close
ArthurZucker Dec 11, 2023
6624e9c
fixup
ArthurZucker Dec 11, 2023
c26aaa4
doc nits
ArthurZucker Dec 11, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -417,6 +417,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (from Alibaba Research) released with the paper [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592) by Peng Wang, Cheng Da, and Cong Yao.
1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
1. **[Mixtral](https://huggingface.co/docs/transformers/main/model_doc/mixtral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (from Facebook) released with the paper [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
Expand Down
1 change: 1 addition & 0 deletions README_es.md
Original file line number Diff line number Diff line change
Expand Up @@ -392,6 +392,7 @@ Número actual de puntos de control: ![](https://img.shields.io/endpoint?url=htt
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (from Alibaba Research) released with the paper [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592) by Peng Wang, Cheng Da, and Cong Yao.
1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed..
1. **[Mixtral](https://huggingface.co/docs/transformers/main/model_doc/mixtral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (from Facebook) released with the paper [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
Expand Down
1 change: 1 addition & 0 deletions README_hd.md
Original file line number Diff line number Diff line change
Expand Up @@ -366,6 +366,7 @@ conda install -c huggingface transformers
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (NVIDIA से) साथ वाला पेपर [Megatron-LM: ट्रेनिंग मल्टी-बिलियन पैरामीटर लैंग्वेज मॉडल्स यूजिंग मॉडल पैरेललिज़्म] (https://arxiv.org/abs/1909.08053) मोहम्मद शोएबी, मोस्टोफा पटवारी, राउल पुरी, पैट्रिक लेग्रेस्ले, जेरेड कैस्पर और ब्रायन कैटानज़ारो द्वारा पोस्ट किया गया।
1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (Alibaba Research से) Peng Wang, Cheng Da, and Cong Yao. द्वाराअनुसंधान पत्र [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592) के साथ जारी किया गया
1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed..
1. **[Mixtral](https://huggingface.co/docs/transformers/main/model_doc/mixtral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (फ्रॉम Studio Ousia) साथ में पेपर [mLUKE: द पावर ऑफ एंटिटी रिप्रेजेंटेशन इन मल्टीलिंगुअल प्रीट्रेन्ड लैंग्वेज मॉडल्स](https://arxiv.org/abs/2110.08151) रयोकन री, इकुया यामाडा, और योशिमासा त्सुरोका द्वारा।
1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (Facebook से) Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli. द्वाराअनुसंधान पत्र [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) के साथ जारी किया गया
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (सीएमयू/गूगल ब्रेन से) साथ में कागज [मोबाइलबर्ट: संसाधन-सीमित उपकरणों के लिए एक कॉम्पैक्ट टास्क-अज्ञेय बीईआरटी] (https://arxiv.org/abs/2004.02984) Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, और Denny Zhou द्वारा पोस्ट किया गया।
Expand Down
1 change: 1 addition & 0 deletions README_ja.md
Original file line number Diff line number Diff line change
Expand Up @@ -426,6 +426,7 @@ Flax、PyTorch、TensorFlowをcondaでインストールする方法は、それ
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (NVIDIA から) Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro から公開された研究論文: [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053)
1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (Alibaba Research から) Peng Wang, Cheng Da, and Cong Yao. から公開された研究論文 [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592)
1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed..
1. **[Mixtral](https://huggingface.co/docs/transformers/main/model_doc/mixtral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (Studio Ousia から) Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka から公開された研究論文: [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151)
1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (Facebook から) Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli. から公開された研究論文 [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516)
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (CMU/Google Brain から) Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou から公開された研究論文: [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984)
Expand Down
1 change: 1 addition & 0 deletions README_ko.md
Original file line number Diff line number Diff line change
Expand Up @@ -341,6 +341,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (NVIDIA 에서) Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro 의 [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) 논문과 함께 발표했습니다.
1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (Alibaba Research 에서 제공)은 Peng Wang, Cheng Da, and Cong Yao.의 [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592)논문과 함께 발표했습니다.
1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed..
1. **[Mixtral](https://huggingface.co/docs/transformers/main/model_doc/mixtral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (Studio Ousia 에서) Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka 의 [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) 논문과 함께 발표했습니다.
1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (Facebook 에서 제공)은 Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.의 [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516)논문과 함께 발표했습니다.
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (CMU/Google Brain 에서) Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou 의 [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) 논문과 함께 발표했습니다.
Expand Down
1 change: 1 addition & 0 deletions README_zh-hans.md
Original file line number Diff line number Diff line change
Expand Up @@ -365,6 +365,7 @@ conda install -c huggingface transformers
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (来自 NVIDIA) 伴随论文 [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) 由 Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro 发布。
1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (来自 Alibaba Research) 伴随论文 [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592) 由 Peng Wang, Cheng Da, and Cong Yao 发布。
1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed..
1. **[Mixtral](https://huggingface.co/docs/transformers/main/model_doc/mixtral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (来自 Studio Ousia) 伴随论文 [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) 由 Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka 发布。
1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (来自 Facebook) 伴随论文 [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) 由 Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli 发布。
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (来自 CMU/Google Brain) 伴随论文 [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) 由 Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou 发布。
Expand Down
1 change: 1 addition & 0 deletions README_zh-hant.md
Original file line number Diff line number Diff line change
Expand Up @@ -377,6 +377,7 @@ conda install -c huggingface transformers
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (from Alibaba Research) released with the paper [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592) by Peng Wang, Cheng Da, and Cong Yao.
1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed..
1. **[Mixtral](https://huggingface.co/docs/transformers/main/model_doc/mixtral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (from Facebook) released with the paper [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
Expand Down
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -396,6 +396,8 @@
title: MegatronGPT2
- local: model_doc/mistral
title: Mistral
- local: model_doc/mixtral
title: Mixtral
- local: model_doc/mluke
title: mLUKE
- local: model_doc/mobilebert
Expand Down