Abron [abr] Ghana

No Statistics available for this language.

Aceh [ace] Indonesia (Sumatra)

Corpus: ace_community_2017
Sentences: 4,539, Types: 12,771, Tokens: 84,860
URLs: 1,133

Acholi [ach] Uganda

Corpus: ach_community_2017
Sentences: 186, Types: 1,713, Tokens: 5,292
URLs: 49

Afar [aar] Ethiopia

Corpus: aar_community_2017
Sentences: 56, Types: 586, Tokens: 899
URLs: 3

Ahirani [ahr] India

No Statistics available for this language.

Akan [aka] Ghana

No Statistics available for this language.

Alur [alz] Dem. Rep. of Congo

No Statistics available for this language.

Amharic [amh] Ethiopia

Corpus: amh_community_2017
Sentences: 45,092, Types: 139,904, Tokens: 671,748
URLs: 7,755

Anaang [anw] Nigeria

Corpus: anw_community_2017
Sentences: 30,362, Types: 67,236, Tokens: 479,884
URLs: 14,860

Assamese [asm] India

Corpus: asm_community_2017
Sentences: 63,627, Types: 121,419, Tokens: 905,966
URLs: 2,651

Awadhi [awa] India

No Statistics available for this language.

Aymara [aym] Bolivia

Corpus: aym_community_2017
Sentences: 3,264, Types: 9,600, Tokens: 49,740
URLs: 1,836

Bagheli [bfy] India

No Statistics available for this language.

Bakhtiari [bqi] Iran

No Statistics available for this language.

Bali [ban] Indonesia (Java and Bali)

Corpus: ban_community_2021 + 1
Sentences: 598, Types: 2,920, Tokens: 9,969
URLs: 133

Baluchi [bal] Pakistan

No Statistics available for this language.

Bamanankan [bam] Mali

Corpus: bam_community_2021 + 1
Sentences: 10,874, Types: 14,371, Tokens: 217,734
URLs: 748

Banjar [bjn] Indonesia (Kalimantan)

Corpus: bjn_community_2017
Sentences: 7,240, Types: 22,439, Tokens: 126,268
URLs: 1,119

Baoulé [bci] Côte d’Ivoire

No Statistics available for this language.

Bashkort [bak] Russian Federation

Corpus: bak_community_2019 + 1
Sentences: 3,332, Types: 11,450, Tokens: 58,078
URLs: 296

Batak Dairi [btd] Indonesia (Sumatra)

No Statistics available for this language.

Batak Mandailing [btm] Indonesia (Sumatra)

No Statistics available for this language.

Batak Simalungun [bts] Indonesia (Sumatra)

No Statistics available for this language.

Batak Toba [bbc] Indonesia (Sumatra)

No Statistics available for this language.

Bedawiyet [bej] Sudan

No Statistics available for this language.

Bemba [bem] Zambia

Corpus: bem_community_2017
Sentences: 81, Types: 422, Tokens: 606
URLs: 32

Bengali [ben] Bangladesh

Corpus: ben_community_2022 + 3
Sentences: 469,711, Types: 266,808, Tokens: 5,997,574
URLs: 21,299

Berom [bom] Nigeria

No Statistics available for this language.

Betawi [bew] Indonesia (Java and Bali)

No Statistics available for this language.

Bhili [bhb] India

No Statistics available for this language.

Bhojpuri [bho] India

No Statistics available for this language.

Bikol [bik] Philippines

Corpus: bik_community_2017
Sentences: 12,023, Types: 26,115, Tokens: 265,360
URLs: 1,060

Bodo [brx] India

No Statistics available for this language.

Bosnian [bos] Bosnia and Herzegovina

Corpus: bos_community_2017
Sentences: 659,405, Types: 492,801, Tokens: 14,318,352
URLs: 128,753

Bouyei [pcc] China

No Statistics available for this language.

Brahui [brh] Pakistan

No Statistics available for this language.

Bugis [bug] Indonesia (Sulawesi)

Corpus: bug_community_2017
Sentences: 583, Types: 1,901, Tokens: 7,452
URLs: 466

Bundeli [bns] India

No Statistics available for this language.

Burmese [mya] Myanmar

Corpus: mya_community_2022 + 1
Sentences: 7,478, Types: 52,098, Tokens: 110,351
URLs: 2,846

Cebuano [ceb] Philippines

Corpus: ceb_community_2017
Sentences: 729,461, Types: 387,017, Tokens: 11,495,546
URLs: 702,495

Central Atlas Tamazight [tzm] Morocco

No Statistics available for this language.

Central Bikol [bcl] Philippines

Corpus: bcl_community_2017
Sentences: 15,726, Types: 38,027, Tokens: 318,931
URLs: 3,430

Kalanga [kck] Zimbabwe

Corpus: kck_community_2019
Sentences: 996, Types: 4,156, Tokens: 16,798
URLs: 26

Central Kanuri [knc] Nigeria

No Statistics available for this language.

Central Khmer [khm] Cambodia

Corpus: khm_community_2017
Sentences: 1,773, Types: 10,143, Tokens: 19,124
URLs: 895

Central Kurdish [ckb] Iraq

Corpus: ckb_community_2017
Sentences: 4,978, Types: 19,063, Tokens: 101,004
URLs: 427

Chechen [che] Russian Federation

Corpus: che_community_2023 + 1
Sentences: 49,553, Types: 70,347, Tokens: 650,090
URLs: 27,881

Chhattisgarhi [hne] India

No Statistics available for this language.

Chiga [cgg] Uganda

No Statistics available for this language.

Chittagonian [ctg] Bangladesh

No Statistics available for this language.

Chokwe [cjk] Dem. Rep. of Congo

No Statistics available for this language.

Chuanqiandian Cluster Miao [cqd] China

No Statistics available for this language.

Chuvash [chv] Russian Federation

Corpus: chv_community_2017
Sentences: 71,333, Types: 119,208, Tokens: 917,180
URLs: 11,649

Dan [dnj] Côte d’Ivoire

No Statistics available for this language.

Dari [prs] Afghanistan

Corpus: prs_community_2017
Sentences: 71,489, Types: 127,345, Tokens: 1,809,533
URLs: 6,408

Deccan [dcc] India

No Statistics available for this language.

Dholuo [luo] Kenya

No Statistics available for this language.

Dhundari [dhd] India

No Statistics available for this language.

Dimli [diq] Turkey

Corpus: diq_community_2017
Sentences: 31,099, Types: 67,631, Tokens: 442,831
URLs: 5,337

Dinka [din] Sudan

No Statistics available for this language.

Dogri [doi] India

No Statistics available for this language.

Domari [rmt] Iran

No Statistics available for this language.

Eastern Balochi [bgp] Pakistan

No Statistics available for this language.

Eastern Maninkakan [emk] Guinea

Corpus: emk_community_2017
Sentences: 30, Types: 147, Tokens: 233
URLs: 5

Eastern Tamang [taj] Nepal

No Statistics available for this language.

Eastern Yiddish [ydd] Israel

Corpus: ydd_community_2017
Sentences: 21,819, Types: 32,472, Tokens: 489,592
URLs: 9

Ebira [igb] Nigeria

No Statistics available for this language.

Edo [bin] Nigeria

No Statistics available for this language.

Ekegusii [guz] Kenya

No Statistics available for this language.

Éwé [ewe] Ghana

Corpus: ewe_community_2017
Sentences: 10,173, Types: 15,532, Tokens: 198,511
URLs: 988

Fang [fan] Guinea

No Statistics available for this language.

Filipino [fil] Philippines

No Statistics available for this language.

Fon [fon] Benin

Corpus: fon_community_2017
Sentences: 19, Types: 174, Tokens: 445
URLs: 1

Fulah [ful] Cameroon

Corpus: ful_community_2017
Sentences: 1,002, Types: 5,337, Tokens: 18,654
URLs: 94

Galician [glg] Spain

Corpus: glg_community_2022 + 2
Sentences: 270,434, Types: 267,554, Tokens: 6,173,696
URLs: 42,282

Gamo [gmv] Ethiopia

No Statistics available for this language.

Gan Chinese [gan] China

Corpus: gan_community_2017
Sentences: 9,897, Types: 12,707, Tokens: 45,618
URLs: 1,027

Ganda [lug] Uganda

Corpus: lug_community_2017
Sentences: 78,609, Types: 178,380, Tokens: 1,382,770
URLs: 4,842

Garhwali [gbm] India

No Statistics available for this language.

Garo [grt] India

No Statistics available for this language.

Gikuyu [kik] Kenya

Corpus: kik_community_2017
Sentences: 815, Types: 3,586, Tokens: 9,848
URLs: 326

Goan Konkani [gom] India

Corpus: gom_community_2017
Sentences: 39,645, Types: 71,960, Tokens: 583,042
URLs: 126

Godwari [gdx] India

No Statistics available for this language.

Gogo [gog] Tanzania

No Statistics available for this language.

Gondi [gon] India

No Statistics available for this language.

Gorontalo [gor] Indonesia (Sulawesi)

No Statistics available for this language.

Guarani [grn] Paraguay, Bolivia

Corpus: grn_community_2021 + 1
Sentences: 14,613, Types: 40,052, Tokens: 213,397
URLs: 2,254

Hadiyya [hdy] Ethiopia

No Statistics available for this language.

Haitian [hat] Haiti

Corpus: hat_community_2017
Sentences: 23,384, Types: 32,792, Tokens: 335,466
URLs: 14,585

Halh Mongolian [khk] Mongolia

Corpus: khk_community_2017
Sentences: 19,593, Types: 43,349, Tokens: 351,010
URLs: 2,182

Haryanvi [bgc] India

No Statistics available for this language.

Hassaniyya [mey] Mauritania

No Statistics available for this language.

Hausa [hau] Nigeria

Corpus: hau_community_2017 + 2
Sentences: 41,589, Types: 40,232, Tokens: 1,017,657
URLs: 2,492

Haya [hay] Tanzania

No Statistics available for this language.

Hazaragi [haz] Afghanistan

No Statistics available for this language.

Hiligaynon [hil] Philippines

Corpus: hil_community_2017
Sentences: 69, Types: 399, Tokens: 1,390
URLs: 3

Hmong Daw [mww] China

No Statistics available for this language.

Hmong [hmn] China

No Statistics available for this language.

Ho [hoc] India

No Statistics available for this language.

Hunsrik [hrx] Brazil

No Statistics available for this language.

Ibibio [ibb] Nigeria

Corpus: ibb_community_2017
Sentences: 66, Types: 217, Tokens: 383
URLs: 17

Igbo [ibo] Nigeria

Corpus: ibo_community_2017
Sentences: 7,742, Types: 14,660, Tokens: 158,694
URLs: 1,429

Ilocano [ilo] Philippines

Corpus: ilo_community_2017
Sentences: 18,026, Types: 43,542, Tokens: 438,419
URLs: 3,402

Izon [ijc] Nigeria

No Statistics available for this language.

Jambi Malay [jax] Indonesia

No Statistics available for this language.

Javanese [jav] Indonesia (Java and Bali)

Corpus: jav_community_2017
Sentences: 83,798, Types: 134,699, Tokens: 1,506,187
URLs: 18,815

Jula [dyu] Burkina Faso

Corpus: dyu_community_2017
Sentences: 8, Types: 17, Tokens: 75
URLs: 1

Kabardian [kbd] Russian Federation

Corpus: kbd_community_2017
Sentences: 9,242, Types: 34,481, Tokens: 116,016
URLs: 1,170

Kabuverdianu [kea] Cape Verde Islands

Corpus: kea_community_2017
Sentences: 258, Types: 1,108, Tokens: 2,269
URLs: 99

Kabyle [kab] Algeria

Corpus: kab_community_2017
Sentences: 4,778, Types: 21,054, Tokens: 103,625
URLs: 945

Kalenjin [kln] Kenya

No Statistics available for this language.

Kamba [kam] Kenya

No Statistics available for this language.

Kanauji [bjj] India

No Statistics available for this language.

Kangri [xnr] India

No Statistics available for this language.

Kannada [kan] India

Corpus: kan_community_2017
Sentences: 102,397, Types: 273,390, Tokens: 1,590,451
URLs: 10,550

Kanuri [kau] Nigeria

No Statistics available for this language.

Kashkay [qxq] Iran

No Statistics available for this language.

Kashmiri [kas] India

Corpus: kas_community_2017
Sentences: 448, Types: 2,439, Tokens: 4,749
URLs: 230

Khams Tibetan [khg] China

No Statistics available for this language.

Kimbundu [kmb] Angola

No Statistics available for this language.

Kimîîru [mer] Kenya

No Statistics available for this language.

Kipsigis [sgc] Kenya

No Statistics available for this language.

Kituba [ktu] Dem. Rep. of Congo

No Statistics available for this language.

Kituba [mkw] Congo

Corpus: mkw_community_2017
Sentences: 143,476, Types: 218,126, Tokens: 2,025,305
URLs: 39,683

Kongo [kon] Dem. Rep. of Congo

Corpus: kon_community_2017
Sentences: 337, Types: 1,448, Tokens: 3,740
URLs: 179

Koongo [kng] Dem. Rep. of Congo

Corpus: kng_community_2017
Sentences: 41, Types: 178, Tokens: 292
URLs: 24

Kpelle [kpe] Guinea

No Statistics available for this language.

Kumaoni [kfy] India

No Statistics available for this language.

Kurdish [kur] Kurdistan, Iraq, Turkey

Corpus: kur_community_2017
Sentences: 36,729, Types: 141,732, Tokens: 789,948
URLs: 4,912

Kurux [kru] India

No Statistics available for this language.

Kyrgyz [kir] Kyrgyzstan

Corpus: kir_community_2017
Sentences: 251,608, Types: 281,359, Tokens: 4,382,407
URLs: 33,270

Lahnda [lah] Pakistan

No Statistics available for this language.

Laki [lki] Iran

No Statistics available for this language.

Lambadi [lmn] India

No Statistics available for this language.

Lango [laj] Uganda

No Statistics available for this language.

Lao [lao] Laos

Corpus: lao_community_2021 + 1
Sentences: 11,099, Types: 38,524, Tokens: 243,067
URLs: 4,513

Limburgish [lim] Netherlands

Corpus: lim_community_2019 + 1
Sentences: 0, Types: 0, Tokens: 0
URLs: 0

Lingala [lin] Dem. Rep. of Congo

Corpus: lin_community_2017
Sentences: 1,300, Types: 5,060, Tokens: 24,488
URLs: 214

Lomwe [ngl] Mozambique

Corpus: ngl_community_2017
Sentences: 76, Types: 270, Tokens: 470
URLs: 36

Luba-Kasai [lua] Dem. Rep. of Congo

No Statistics available for this language.

Luba-Katanga [lub] Dem. Rep. of Congo

No Statistics available for this language.

Lubukusu [bxk] Kenya

No Statistics available for this language.

Lugbara [lgg] Uganda

Corpus: lgg_community_2017
Sentences: 95, Types: 368, Tokens: 665
URLs: 40

Maasai [mas] Kenya

No Statistics available for this language.

Maasina Fulfulde [ffm] Mali

No Statistics available for this language.

Madura [mad] Indonesia (Java and Bali)

Corpus: mad_community_2017
Sentences: 32, Types: 94, Tokens: 280
URLs: 7

Magahi [mag] India

No Statistics available for this language.

Maguindanao [mdh] Philippines

No Statistics available for this language.

Mahasu Pahari [bfz] India

No Statistics available for this language.

Maithili [mai] India

No Statistics available for this language.

Makasar [mak] Indonesia (Sulawesi)

No Statistics available for this language.

Makhuwa [vmw] Mozambique

No Statistics available for this language.

Makhuwa-Meetto [mgh] Mozambique

No Statistics available for this language.

Makonde [kde] Tanzania

Corpus: kde_community_2017
Sentences: 96, Types: 340, Tokens: 705
URLs: 39

Malagasy [mlg] Madagascar

Corpus: mlg_community_2017
Sentences: 92,762, Types: 126,134, Tokens: 1,191,086
URLs: 37,324

Malay [msa] Thailand, Malaysia

Corpus: msa_community_2017
Sentences: 756,962, Types: 341,656, Tokens: 15,392,221
URLs: 89,140

Malay [zlm] Malaysia (Peninsular)

No Statistics available for this language.

Malayalam [mal] India

Corpus: mal_community_2021 + 1
Sentences: 602,385, Types: 1,001,257, Tokens: 7,066,023
URLs: 62,153

Mandingo [man] Senegal

No Statistics available for this language.

Mandinka [mnk] Senegal

No Statistics available for this language.

Manyika [mxc] Zimbabwe

No Statistics available for this language.

Marwari [mwr] India

No Statistics available for this language.

Masaaba [myx] Uganda

No Statistics available for this language.

Mazanderani [mzn] Iran

Corpus: mzn_community_2017
Sentences: 26,727, Types: 46,533, Tokens: 459,204
URLs: 10,350

Meitei [mni] India

No Statistics available for this language.

Mende [men] Sierra Leone

No Statistics available for this language.

Min Dong Chinese [cdo] China

Corpus: cdo_community_2017
Sentences: 3,280, Types: 10,200, Tokens: 55,617
URLs: 584

Min Nan Chinese [nan] China

Corpus: nan_community_2021 + 1
Sentences: 78,527, Types: 103,993, Tokens: 1,413,232
URLs: 2,930

Mina [myi] India

No Statistics available for this language.

Minangkabau [min] Indonesia (Sumatra)

Corpus: min_community_2017
Sentences: 186,423, Types: 93,939, Tokens: 1,927,011
URLs: 170,993

Mundari [unr] India

No Statistics available for this language.

Muong [mtq] Viet Nam

No Statistics available for this language.

Musi [mui] Indonesia (Sumatra)

No Statistics available for this language.

Mòoré [mos] Burkina Faso

Corpus: mos_community_2017
Sentences: 107, Types: 382, Tokens: 693
URLs: 43

Ndau [ndc] Zimbabwe

No Statistics available for this language.

Ndebele [nde] Zimbabwe

No Statistics available for this language.

Ndonga [ndo] Namibia

Corpus: ndo_community_2017
Sentences: 13,495, Types: 37,191, Tokens: 308,836
URLs: 1,253

Ngbaka [nga] Dem. Rep. of Congo

No Statistics available for this language.

Nigerian Fulfulde [fuv] Nigeria

No Statistics available for this language.

Nigerian Pidgin [pcm] Nigeria

No Statistics available for this language.

Nimadi [noe] India

No Statistics available for this language.

Northern Hindko [hno] Pakistan

No Statistics available for this language.

Northern Khmer [kxm] Thailand

No Statistics available for this language.

Northern Luri [lrc] Iran

No Statistics available for this language.

Northern Qiandong Miao [hea] China

No Statistics available for this language.

Northern Sotho [nso] South Africa

Corpus: nso_community_2021 + 1
Sentences: 9,560, Types: 12,871, Tokens: 171,558
URLs: 4,293

Norwegian [nor] Norway

Corpus: nor_community_2017
Sentences: 69,299, Types: 95,606, Tokens: 984,898
URLs: 28,907

Nuosu [iii] China

No Statistics available for this language.

Nyakyusa-Ngonde [nyy] Tanzania

No Statistics available for this language.

Nyanja [nya] Malawi

Corpus: nya_community_2017
Sentences: 896, Types: 4,343, Tokens: 13,712
URLs: 162

Nyankore [nyn] Uganda

Corpus: nyn_community_2017
Sentences: 5, Types: 15, Tokens: 36
URLs: 1

Occitan [oci] France

Corpus: oci_community_2023 + 2
Sentences: 199,559, Types: 258,826, Tokens: 4,215,601
URLs: 36,911

Oluluyia [luy] Kenya

No Statistics available for this language.

Oriya [ori] India

Corpus: ori_community_2017
Sentences: 39,314, Types: 71,426, Tokens: 504,296
URLs: 5,127

Oromo [orm] Ethiopia

Corpus: orm_community_2021 + 1
Sentences: 2,637, Types: 12,613, Tokens: 44,923
URLs: 401

Pahari-Potwari [phr] Pakistan

No Statistics available for this language.

Pampangan [pam] Philippines

Corpus: pam_community_2017
Sentences: 17,250, Types: 41,952, Tokens: 347,606
URLs: 5,593

Pangasinan [pag] Philippines

Corpus: pag_community_2017
Sentences: 6,012, Types: 11,854, Tokens: 79,080
URLs: 3,784

Pontic [pnt] Greece

Corpus: pnt_community_2017
Sentences: 1,564, Types: 6,290, Tokens: 25,581
URLs: 398

Pulaar [fuc] Senegal

Corpus: fuc_community_2017
Sentences: 124, Types: 1,022, Tokens: 2,642
URLs: 63

Pular [fuf] Guinea

No Statistics available for this language.

Pwo Eastern Karen [kjp] Myanmar

No Statistics available for this language.

Quechua [que] Bolivia, Peru

Corpus: que_community_2017
Sentences: 21,139, Types: 38,495, Tokens: 266,643
URLs: 13,583

Quiché [quc] Guatemala

No Statistics available for this language.

Rajasthani [raj] India

No Statistics available for this language.

Rangpuri [rkt] Bangladesh

No Statistics available for this language.

Rohingya [rhg] Myanmar

No Statistics available for this language.

Romany [rom] Romania

Corpus: rom_community_2019
Sentences: 1,435, Types: 6,646, Tokens: 28,320
URLs: 237

Rundi [run] Burundi

Corpus: run_community_2017
Sentences: 17,361, Types: 56,856, Tokens: 363,797
URLs: 1,812

Rwanda [kin] Rwanda

Corpus: kin_community_2022 + 3
Sentences: 30,270, Types: 88,135, Tokens: 638,210
URLs: 4,259

S'gaw Karen [ksw] Myanmar

Corpus: ksw_community_2017
Sentences: 448, Types: 2,415, Tokens: 4,749
URLs: 230

Sadri [sck] India

No Statistics available for this language.

Santali [sat] India

No Statistics available for this language.

Sasak [sas] Indonesia (Nusa Tenggara)

No Statistics available for this language.

Sena [seh] Mozambique

Corpus: seh_community_2017
Sentences: 27, Types: 118, Tokens: 167
URLs: 10

Seraiki [skr] Pakistan

Corpus: skr_community_2017
Sentences: 87, Types: 885, Tokens: 1,897
URLs: 42

Serer-Sine [srr] Senegal

No Statistics available for this language.

Shan [shn] Myanmar

No Statistics available for this language.

Shekhawati [swv] India

No Statistics available for this language.

Shona [sna] Zimbabwe

Corpus: sna_community_2017
Sentences: 48,339, Types: 122,437, Tokens: 792,698
URLs: 4,881

Sidamo [sid] Ethiopia

No Statistics available for this language.

Sindhi [snd] Pakistan

Corpus: snd_community_2017
Sentences: 7,431, Types: 22,097, Tokens: 140,754
URLs: 588

Sinhala [sin] Sri Lanka

No Statistics available for this language.

Soga [xog] Uganda

No Statistics available for this language.

Somali [som] Somalia

Corpus: som_community_2017
Sentences: 170,575, Types: 216,881, Tokens: 4,320,798
URLs: 22,918

Songe [sop] Dem. Rep. of Congo

No Statistics available for this language.

Soninke [snk] Mali

Corpus: snk_community_2017
Sentences: 124, Types: 454, Tokens: 866
URLs: 40

Southern Balochi [bcc] Pakistan

No Statistics available for this language.

Southern Dong [kmc] China

No Statistics available for this language.

Southern Kurdish [sdh] Iran

No Statistics available for this language.

Southern Ndebele [nbl] South Africa

Corpus: nbl_community_2017
Sentences: 318, Types: 2,643, Tokens: 5,424
URLs: 29

Southern Sotho [sot] South Africa, Lesotho

Corpus: sot_community_2017
Sentences: 9,773, Types: 17,421, Tokens: 238,709
URLs: 542

Sukuma [suk] Tanzania

Corpus: suk_community_2017
Sentences: 47, Types: 163, Tokens: 391
URLs: 13

Sunda [sun] Indonesia (Java and Bali)

Corpus: sun_community_2017
Sentences: 50,340, Types: 84,722, Tokens: 913,176
URLs: 6,187

Surgujia [sgj] India

No Statistics available for this language.

Surjapuri [sjp] India

No Statistics available for this language.

Susu [sus] Guinea

Corpus: sus_community_2017
Sentences: 9, Types: 79, Tokens: 108
URLs: 5

Swahili [swa] Tanzania

Corpus: swa_community_2023 + 3
Sentences: 59,342, Types: 93,552, Tokens: 1,412,549
URLs: 4,949

Swati [ssw] South Africa, Swaziland

Corpus: ssw_community_2017
Sentences: 380, Types: 3,132, Tokens: 5,895
URLs: 60

Sylheti [syl] Bangladesh

No Statistics available for this language.

Tachawit [shy] Algeria

No Statistics available for this language.

Tachelhit [shi] Morocco

No Statistics available for this language.

Tagalog [tgl] Philippines

Corpus: tgl_community_2017
Sentences: 979,689, Types: 472,209, Tokens: 20,664,580
URLs: 106,820

Tajiki [tgk] Tajikistan

Corpus: tgk_community_2022 + 2
Sentences: 941,793, Types: 596,826, Tokens: 19,341,776
URLs: 93,504

Tamashek [tmh] Niger

No Statistics available for this language.

Tarifit [rif] Morocco

No Statistics available for this language.

Tausug [tsg] Philippines

No Statistics available for this language.

Teso [teo] Uganda

No Statistics available for this language.

Thai [tha] Thailand

Corpus: tha_community_2021 + 1
Sentences: 57,017, Types: 196,452, Tokens: 793,384
URLs: 21,462

Themne [tem] Sierra Leone

Corpus: tem_community_2017
Sentences: 8, Types: 15, Tokens: 40
URLs: 1

Tibetan [bod] China

Corpus: bod_community_2017
Sentences: 7,525, Types: 22,081, Tokens: 32,178
URLs: 4,379

Tigrigna [tir] Ethiopia, Eritrea

Corpus: tir_community_2017
Sentences: 1,379, Types: 5,700, Tokens: 17,837
URLs: 181

Tigré [tig] Eritrea

No Statistics available for this language.

Tiv [tiv] Nigeria

Corpus: tiv_community_2017
Sentences: 3, Types: 20, Tokens: 60
URLs: 1

Tonga [toi] Zambia, Zimbabwe

No Statistics available for this language.

Tsonga [tso] South Africa

Corpus: tso_community_2017
Sentences: 10,571, Types: 17,493, Tokens: 238,504
URLs: 446

Tswa [tsc] Mozambique

No Statistics available for this language.

Tswana [tsn] South Africa, Botswana

Corpus: tsn_community_2017
Sentences: 28,276, Types: 34,977, Tokens: 687,676
URLs: 2,772

Tulu [tcy] India

No Statistics available for this language.

Tumbuka [tum] Malawi

Corpus: tum_community_2017
Sentences: 240, Types: 1,645, Tokens: 4,989
URLs: 31

Turkmen [tuk] Turkmenistan

Corpus: tuk_community_2023 + 2
Sentences: 181, Types: 1,504, Tokens: 2,952
URLs: 99

Tày [tyz] Viet Nam

No Statistics available for this language.

Umbundu [umb] Angola

No Statistics available for this language.

Uyghur [uig] China

Corpus: uig_community_2021 + 1
Sentences: 68,736, Types: 143,824, Tokens: 1,177,946
URLs: 4,703

Uzbek [uzb] Uzbekistan

Corpus: uzb_community_2017
Sentences: 663,119, Types: 706,425, Tokens: 10,900,014
URLs: 65,787

Varhadi-Nagpuri [vah] India

No Statistics available for this language.

Vasavi [vas] India

No Statistics available for this language.

Venda [ven] South Africa

Corpus: ven_community_2017
Sentences: 9,279, Types: 14,412, Tokens: 179,877
URLs: 375

Vlaams [vls] Belgium

Corpus: vls_community_2017
Sentences: 36,393, Types: 75,458, Tokens: 693,658
URLs: 4,740

Waray-Waray [war] Philippines

Corpus: war_community_2017
Sentences: 808,036, Types: 399,359, Tokens: 13,358,684
URLs: 793,771

Western Balochi [bgn] Pakistan

No Statistics available for this language.

Western Panjabi [pnb] Pakistan

Corpus: pnb_community_2017
Sentences: 63,683, Types: 64,365, Tokens: 1,052,347
URLs: 26,859

Wolaytta [wal] Ethiopia

No Statistics available for this language.

Wolof [wol] Senegal

Corpus: wol_community_2017
Sentences: 9,988, Types: 22,011, Tokens: 254,548
URLs: 1,628

Xhosa [xho] South Africa

Corpus: xho_community_2019 + 1
Sentences: 63,387, Types: 172,520, Tokens: 972,301
URLs: 4,227

Yao [yao] Malawi

No Statistics available for this language.

Yilumbu [lup] Gabon

Corpus: lup_community_2017
Sentences: 12,246, Types: 16,227, Tokens: 87,930
URLs: 1

Yombe [yom] Dem. Rep. of Congo

No Statistics available for this language.

Yoruba [yor] Nigeria

Corpus: yor_community_2017
Sentences: 10,703, Types: 28,265, Tokens: 210,852
URLs: 1,961

Zande [zne] Dem. Rep. of Congo

No Statistics available for this language.

Zarma [dje] Niger

No Statistics available for this language.

Zaza [zza] Turkey

No Statistics available for this language.

Zhuang [zha] China

Corpus: zha_community_2017
Sentences: 2,306, Types: 6,395, Tokens: 20,110
URLs: 642

Zulu [zul] South Africa

Corpus: zul_community_2021 + 1
Sentences: 158,644, Types: 372,449, Tokens: 2,520,740
URLs: 5,316
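Every entry in this list uses the same plain-text layout. There is a header line giving the language name, a three-letter code in brackets, and a region. After that comes either a "No Statistics" note or three lines: a `Corpus:` line, the sentence/type/token counts, and the URL count. To work with these figures programmatically, an entry can be parsed into a record. The Python below is a minimal sketch that assumes exactly that layout as reproduced here. The `CorpusEntry` class and `parse_entry` function are hypothetical helpers, not part of any published API.

```python
import re
from dataclasses import dataclass
from typing import Optional


# Hypothetical record for one language entry in the listing above.
@dataclass
class CorpusEntry:
    name: str
    code: str                      # three-letter code shown in brackets
    region: str
    corpus: Optional[str] = None   # stays None when no statistics are listed
    sentences: int = 0
    types: int = 0
    tokens: int = 0
    urls: int = 0


# Patterns assume the exact wording used in this listing.
HEADER = re.compile(r"^(?P<name>.+?) \[(?P<code>[a-z]{3})\] (?P<region>.+)$")
COUNTS = re.compile(r"Sentences: ([\d,]+), Types: ([\d,]+), Tokens: ([\d,]+)")
URLS = re.compile(r"URLs: ([\d,]+)")


def to_int(text: str) -> int:
    # Counts use commas as thousands separators, e.g. "84,860".
    return int(text.replace(",", ""))


def parse_entry(block: str) -> CorpusEntry:
    """Parse one entry (header plus its detail lines) into a CorpusEntry."""
    lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
    header = HEADER.match(lines[0])
    if header is None:
        raise ValueError(f"unrecognised header: {lines[0]!r}")
    entry = CorpusEntry(header["name"], header["code"], header["region"])
    for line in lines[1:]:
        if line.startswith("Corpus: "):
            # Keeps suffixes such as "+ 1" as part of the corpus label.
            entry.corpus = line[len("Corpus: "):]
        elif m := COUNTS.search(line):
            entry.sentences, entry.types, entry.tokens = (to_int(g) for g in m.groups())
        elif m := URLS.search(line):
            entry.urls = to_int(m.group(1))
    return entry


if __name__ == "__main__":
    aceh = parse_entry(
        "Aceh [ace] Indonesia (Sumatra)\n\n"
        "Corpus: ace_community_2017\n"
        "Sentences: 4,539, Types: 12,771, Tokens: 84,860\n"
        "URLs: 1,133"
    )
    print(aceh)
```

Entries that report "No Statistics available" parse with `corpus` set to `None` and all counts left at zero. A caller can therefore tell them apart from entries such as Limburgish, which do list a corpus but report zero counts.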