Abron [abr] Ghana
No statistics available for this language.
Aceh [ace] Indonesia (Sumatra)
Corpus: ace_community_2017
Sentences: 4,539,
Types: 12,771,
Tokens: 84,860
URLs: 1,133
Acholi [ach] Uganda
Corpus: ach_community_2017
Sentences: 186,
Types: 1,713,
Tokens: 5,292
URLs: 49
Afar [aar] Ethiopia
Corpus: aar_community_2017
Sentences: 56,
Types: 586,
Tokens: 899
URLs: 3
Ahirani [ahr] India
No statistics available for this language.
Akan [aka] Ghana
No statistics available for this language.
Alur [alz] Dem. Rep. of Congo
No statistics available for this language.
Amharic [amh] Ethiopia
Corpus: amh_community_2017
Sentences: 45,092,
Types: 139,904,
Tokens: 671,748
URLs: 7,755
Anaang [anw] Nigeria
Corpus: anw_community_2017
Sentences: 30,362,
Types: 67,236,
Tokens: 479,884
URLs: 14,860
Assamese [asm] India
Corpus: asm_community_2017
Sentences: 63,627,
Types: 121,419,
Tokens: 905,966
URLs: 2,651
Awadhi [awa] India
No statistics available for this language.
Aymara [aym] Bolivia
Corpus: aym_community_2017
Sentences: 3,264,
Types: 9,600,
Tokens: 49,740
URLs: 1,836
Bagheli [bfy] India
No statistics available for this language.
Bakhtiari [bqi] Iran
No statistics available for this language.
Bali [ban] Indonesia (Java and Bali)
Corpus: ban_community_2021 + 1
Sentences: 598,
Types: 2,920,
Tokens: 9,969
URLs: 133
Baluchi [bal] Pakistan
No statistics available for this language.
Bamanankan [bam] Mali
Corpus: bam_community_2021 + 1
Sentences: 10,874,
Types: 14,371,
Tokens: 217,734
URLs: 748
Banjar [bjn] Indonesia (Kalimantan)
Corpus: bjn_community_2017
Sentences: 7,240,
Types: 22,439,
Tokens: 126,268
URLs: 1,119
Baoulé [bci] Côte d’Ivoire
No statistics available for this language.
Bashkort [bak] Russian Federation
Corpus: bak_community_2019 + 1
Sentences: 3,332,
Types: 11,450,
Tokens: 58,078
URLs: 296
Batak Dairi [btd] Indonesia (Sumatra)
No statistics available for this language.
Batak Mandailing [btm] Indonesia (Sumatra)
No statistics available for this language.
Batak Simalungun [bts] Indonesia (Sumatra)
No statistics available for this language.
Batak Toba [bbc] Indonesia (Sumatra)
No statistics available for this language.
Bedawiyet [bej] Sudan
No statistics available for this language.
Bemba [bem] Zambia
Corpus: bem_community_2017
Sentences: 81,
Types: 422,
Tokens: 606
URLs: 32
Bengali [ben] Bangladesh
Corpus: ben_community_2022 + 3
Sentences: 469,711,
Types: 266,808,
Tokens: 5,997,574
URLs: 21,299
Berom [bom] Nigeria
No statistics available for this language.
Betawi [bew] Indonesia (Java and Bali)
No statistics available for this language.
Bhili [bhb] India
No statistics available for this language.
Bhojpuri [bho] India
No statistics available for this language.
Bikol [bik] Philippines
Corpus: bik_community_2017
Sentences: 12,023,
Types: 26,115,
Tokens: 265,360
URLs: 1,060
Bodo [brx] India
No statistics available for this language.
Bosnian [bos] Bosnia and Herzegovina
Corpus: bos_community_2017
Sentences: 659,405,
Types: 492,801,
Tokens: 14,318,352
URLs: 128,753
Bouyei [pcc] China
No statistics available for this language.
Brahui [brh] Pakistan
No statistics available for this language.
Bugis [bug] Indonesia (Sulawesi)
Corpus: bug_community_2017
Sentences: 583,
Types: 1,901,
Tokens: 7,452
URLs: 466
Bundeli [bns] India
No statistics available for this language.
Burmese [mya] Myanmar
Corpus: mya_community_2022 + 1
Sentences: 7,478,
Types: 52,098,
Tokens: 110,351
URLs: 2,846
Cebuano [ceb] Philippines
Corpus: ceb_community_2017
Sentences: 729,461,
Types: 387,017,
Tokens: 11,495,546
URLs: 702,495
Central Atlas Tamazight [tzm] Morocco
No statistics available for this language.
Central Bikol [bcl] Philippines
Corpus: bcl_community_2017
Sentences: 15,726,
Types: 38,027,
Tokens: 318,931
URLs: 3,430
Kalanga [kck] Zimbabwe
Corpus: kck_community_2019
Sentences: 996,
Types: 4,156,
Tokens: 16,798
URLs: 26
Central Kanuri [knc] Nigeria
No statistics available for this language.
Central Khmer [khm] Cambodia
Corpus: khm_community_2017
Sentences: 1,773,
Types: 10,143,
Tokens: 19,124
URLs: 895
Central Kurdish [ckb] Iraq
Corpus: ckb_community_2017
Sentences: 4,978,
Types: 19,063,
Tokens: 101,004
URLs: 427
Chechen [che] Russian Federation
Corpus: che_community_2023 + 1
Sentences: 49,553,
Types: 70,347,
Tokens: 650,090
URLs: 27,881
Chhattisgarhi [hne] India
No statistics available for this language.
Chiga [cgg] Uganda
No statistics available for this language.
Chittagonian [ctg] Bangladesh
No statistics available for this language.
Chokwe [cjk] Dem. Rep. of Congo
No statistics available for this language.
Chuanqiandian Cluster Miao [cqd] China
No statistics available for this language.
Chuvash [chv] Russian Federation
Corpus: chv_community_2017
Sentences: 71,333,
Types: 119,208,
Tokens: 917,180
URLs: 11,649
Dan [dnj] Côte d’Ivoire
No statistics available for this language.
Dari [prs] Afghanistan
Corpus: prs_community_2017
Sentences: 71,489,
Types: 127,345,
Tokens: 1,809,533
URLs: 6,408
Deccan [dcc] India
No statistics available for this language.
Dholuo [luo] Kenya
No statistics available for this language.
Dhundari [dhd] India
No statistics available for this language.
Dimli [diq] Turkey
Corpus: diq_community_2017
Sentences: 31,099,
Types: 67,631,
Tokens: 442,831
URLs: 5,337
Dinka [din] Sudan
No statistics available for this language.
Dogri [doi] India
No statistics available for this language.
Domari [rmt] Iran
No statistics available for this language.
Eastern Balochi [bgp] Pakistan
No statistics available for this language.
Eastern Maninkakan [emk] Guinea
Corpus: emk_community_2017
Sentences: 30,
Types: 147,
Tokens: 233
URLs: 5
Eastern Tamang [taj] Nepal
No statistics available for this language.
Eastern Yiddish [ydd] Israel
Corpus: ydd_community_2017
Sentences: 21,819,
Types: 32,472,
Tokens: 489,592
URLs: 9
Ebira [igb] Nigeria
No statistics available for this language.
Edo [bin] Nigeria
No statistics available for this language.
Ekegusii [guz] Kenya
No statistics available for this language.
Éwé [ewe] Ghana
Corpus: ewe_community_2017
Sentences: 10,173,
Types: 15,532,
Tokens: 198,511
URLs: 988
Fang [fan] Guinea
No statistics available for this language.
Filipino [fil] Philippines
No statistics available for this language.
Fon [fon] Benin
Corpus: fon_community_2017
Sentences: 19,
Types: 174,
Tokens: 445
URLs: 1
Fulah [ful] Cameroon
Corpus: ful_community_2017
Sentences: 1,002,
Types: 5,337,
Tokens: 18,654
URLs: 94
Galician [glg] Spain
Corpus: glg_community_2022 + 2
Sentences: 270,434,
Types: 267,554,
Tokens: 6,173,696
URLs: 42,282
Gamo [gmv] Ethiopia
No statistics available for this language.
Gan Chinese [gan] China
Corpus: gan_community_2017
Sentences: 9,897,
Types: 12,707,
Tokens: 45,618
URLs: 1,027
Ganda [lug] Uganda
Corpus: lug_community_2017
Sentences: 78,609,
Types: 178,380,
Tokens: 1,382,770
URLs: 4,842
Garhwali [gbm] India
No statistics available for this language.
Garo [grt] India
No statistics available for this language.
Gikuyu [kik] Kenya
Corpus: kik_community_2017
Sentences: 815,
Types: 3,586,
Tokens: 9,848
URLs: 326
Goan Konkani [gom] India
Corpus: gom_community_2017
Sentences: 39,645,
Types: 71,960,
Tokens: 583,042
URLs: 126
Godwari [gdx] India
No statistics available for this language.
Gogo [gog] Tanzania
No statistics available for this language.
Gondi [gon] India
No statistics available for this language.
Gorontalo [gor] Indonesia (Sulawesi)
No statistics available for this language.
Guarani [grn] Paraguay, Bolivia
Corpus: grn_community_2021 + 1
Sentences: 14,613,
Types: 40,052,
Tokens: 213,397
URLs: 2,254
Hadiyya [hdy] Ethiopia
No statistics available for this language.
Haitian [hat] Haiti
Corpus: hat_community_2017
Sentences: 23,384,
Types: 32,792,
Tokens: 335,466
URLs: 14,585
Halh Mongolian [khk] Mongolia
Corpus: khk_community_2017
Sentences: 19,593,
Types: 43,349,
Tokens: 351,010
URLs: 2,182
Haryanvi [bgc] India
No statistics available for this language.
Hassaniyya [mey] Mauritania
No statistics available for this language.
Hausa [hau] Nigeria
Corpus: hau_community_2017 + 2
Sentences: 41,589,
Types: 40,232,
Tokens: 1,017,657
URLs: 2,492
Haya [hay] Tanzania
No statistics available for this language.
Hazaragi [haz] Afghanistan
No statistics available for this language.
Hiligaynon [hil] Philippines
Corpus: hil_community_2017
Sentences: 69,
Types: 399,
Tokens: 1,390
URLs: 3
Hmong Daw [mww] China
No statistics available for this language.
Hmong [hmn] China
No statistics available for this language.
Ho [hoc] India
No statistics available for this language.
Hunsrik [hrx] Brazil
No statistics available for this language.
Ibibio [ibb] Nigeria
Corpus: ibb_community_2017
Sentences: 66,
Types: 217,
Tokens: 383
URLs: 17
Igbo [ibo] Nigeria
Corpus: ibo_community_2017
Sentences: 7,742,
Types: 14,660,
Tokens: 158,694
URLs: 1,429
Ilocano [ilo] Philippines
Corpus: ilo_community_2017
Sentences: 18,026,
Types: 43,542,
Tokens: 438,419
URLs: 3,402
Izon [ijc] Nigeria
No statistics available for this language.
Jambi Malay [jax] Indonesia
No statistics available for this language.
Javanese [jav] Indonesia (Java and Bali)
Corpus: jav_community_2017
Sentences: 83,798,
Types: 134,699,
Tokens: 1,506,187
URLs: 18,815
Jula [dyu] Burkina Faso
Corpus: dyu_community_2017
Sentences: 8,
Types: 17,
Tokens: 75
URLs: 1
Kabardian [kbd] Russian Federation
Corpus: kbd_community_2017
Sentences: 9,242,
Types: 34,481,
Tokens: 116,016
URLs: 1,170
Kabuverdianu [kea] Cape Verde Islands
Corpus: kea_community_2017
Sentences: 258,
Types: 1,108,
Tokens: 2,269
URLs: 99
Kabyle [kab] Algeria
Corpus: kab_community_2017
Sentences: 4,778,
Types: 21,054,
Tokens: 103,625
URLs: 945
Kalenjin [kln] Kenya
No statistics available for this language.
Kamba [kam] Kenya
No statistics available for this language.
Kanauji [bjj] India
No statistics available for this language.
Kangri [xnr] India
No statistics available for this language.
Kannada [kan] India
Corpus: kan_community_2017
Sentences: 102,397,
Types: 273,390,
Tokens: 1,590,451
URLs: 10,550
Kanuri [kau] Nigeria
No statistics available for this language.
Kashkay [qxq] Iran
No statistics available for this language.
Kashmiri [kas] India
Corpus: kas_community_2017
Sentences: 448,
Types: 2,439,
Tokens: 4,749
URLs: 230
Khams Tibetan [khg] China
No statistics available for this language.
Kimbundu [kmb] Angola
No statistics available for this language.
Kimîîru [mer] Kenya
No statistics available for this language.
Kipsigis [sgc] Kenya
No statistics available for this language.
Kituba [ktu] Dem. Rep. of Congo
No statistics available for this language.
Kituba [mkw] Congo
Corpus: mkw_community_2017
Sentences: 143,476,
Types: 218,126,
Tokens: 2,025,305
URLs: 39,683
Kongo [kon] Dem. Rep. of Congo
Corpus: kon_community_2017
Sentences: 337,
Types: 1,448,
Tokens: 3,740
URLs: 179
Koongo [kng] Dem. Rep. of Congo
Corpus: kng_community_2017
Sentences: 41,
Types: 178,
Tokens: 292
URLs: 24
Kpelle [kpe] Guinea
No statistics available for this language.
Kumaoni [kfy] India
No statistics available for this language.
Kurdish [kur] Kurdistan, Iraq, Turkey
Corpus: kur_community_2017
Sentences: 36,729,
Types: 141,732,
Tokens: 789,948
URLs: 4,912
Kurux [kru] India
No statistics available for this language.
Kyrgyz [kir] Kyrgyzstan
Corpus: kir_community_2017
Sentences: 251,608,
Types: 281,359,
Tokens: 4,382,407
URLs: 33,270
Lahnda [lah] Pakistan
No statistics available for this language.
Laki [lki] Iran
No statistics available for this language.
Lambadi [lmn] India
No statistics available for this language.
Lango [laj] Uganda
No statistics available for this language.
Lao [lao] Laos
Corpus: lao_community_2021 + 1
Sentences: 11,099,
Types: 38,524,
Tokens: 243,067
URLs: 4,513
Limburgish [lim] Netherlands
Corpus: lim_community_2019 + 1
Sentences: 0,
Types: 0,
Tokens: 0
URLs: 0
Lingala [lin] Dem. Rep. of Congo
Corpus: lin_community_2017
Sentences: 1,300,
Types: 5,060,
Tokens: 24,488
URLs: 214
Lomwe [ngl] Mozambique
Corpus: ngl_community_2017
Sentences: 76,
Types: 270,
Tokens: 470
URLs: 36
Luba-Kasai [lua] Dem. Rep. of Congo
No statistics available for this language.
Luba-Katanga [lub] Dem. Rep. of Congo
No statistics available for this language.
Lubukusu [bxk] Kenya
No statistics available for this language.
Lugbara [lgg] Uganda
Corpus: lgg_community_2017
Sentences: 95,
Types: 368,
Tokens: 665
URLs: 40
Maasai [mas] Kenya
No statistics available for this language.
Maasina Fulfulde [ffm] Mali
No statistics available for this language.
Madura [mad] Indonesia (Java and Bali)
Corpus: mad_community_2017
Sentences: 32,
Types: 94,
Tokens: 280
URLs: 7
Magahi [mag] India
No statistics available for this language.
Maguindanao [mdh] Philippines
No statistics available for this language.
Mahasu Pahari [bfz] India
No statistics available for this language.
Maithili [mai] India
No statistics available for this language.
Makasar [mak] Indonesia (Sulawesi)
No statistics available for this language.
Makhuwa [vmw] Mozambique
No statistics available for this language.
Makhuwa-Meetto [mgh] Mozambique
No statistics available for this language.
Makonde [kde] Tanzania
Corpus: kde_community_2017
Sentences: 96,
Types: 340,
Tokens: 705
URLs: 39
Malagasy [mlg] Madagascar
Corpus: mlg_community_2017
Sentences: 92,762,
Types: 126,134,
Tokens: 1,191,086
URLs: 37,324
Malay [msa] Thailand, Malaysia
Corpus: msa_community_2017
Sentences: 756,962,
Types: 341,656,
Tokens: 15,392,221
URLs: 89,140
Malay [zlm] Malaysia (Peninsular)
No statistics available for this language.
Malayalam [mal] India
Corpus: mal_community_2021 + 1
Sentences: 602,385,
Types: 1,001,257,
Tokens: 7,066,023
URLs: 62,153
Mandingo [man] Senegal
No statistics available for this language.
Mandinka [mnk] Senegal
No statistics available for this language.
Manyika [mxc] Zimbabwe
No statistics available for this language.
Marwari [mwr] India
No statistics available for this language.
Masaaba [myx] Uganda
No statistics available for this language.
Mazanderani [mzn] Iran
Corpus: mzn_community_2017
Sentences: 26,727,
Types: 46,533,
Tokens: 459,204
URLs: 10,350
Meitei [mni] India
No statistics available for this language.
Mende [men] Sierra Leone
No statistics available for this language.
Min Dong Chinese [cdo] China
Corpus: cdo_community_2017
Sentences: 3,280,
Types: 10,200,
Tokens: 55,617
URLs: 584
Min Nan Chinese [nan] China
Corpus: nan_community_2021 + 1
Sentences: 78,527,
Types: 103,993,
Tokens: 1,413,232
URLs: 2,930
Mina [myi] India
No statistics available for this language.
Minangkabau [min] Indonesia (Sumatra)
Corpus: min_community_2017
Sentences: 186,423,
Types: 93,939,
Tokens: 1,927,011
URLs: 170,993
Mundari [unr] India
No statistics available for this language.
Muong [mtq] Viet Nam
No statistics available for this language.
Musi [mui] Indonesia (Sumatra)
No statistics available for this language.
Mòoré [mos] Burkina Faso
Corpus: mos_community_2017
Sentences: 107,
Types: 382,
Tokens: 693
URLs: 43
Ndau [ndc] Zimbabwe
No statistics available for this language.
Ndebele [nde] Zimbabwe
No statistics available for this language.
Ndonga [ndo] Namibia
Corpus: ndo_community_2017
Sentences: 13,495,
Types: 37,191,
Tokens: 308,836
URLs: 1,253
Ngbaka [nga] Dem. Rep. of Congo
No statistics available for this language.
Nigerian Fulfulde [fuv] Nigeria
No statistics available for this language.
Nigerian Pidgin [pcm] Nigeria
No statistics available for this language.
Nimadi [noe] India
No statistics available for this language.
Northern Hindko [hno] Pakistan
No statistics available for this language.
Northern Khmer [kxm] Thailand
No statistics available for this language.
Northern Luri [lrc] Iran
No statistics available for this language.
Northern Qiandong Miao [hea] China
No statistics available for this language.
Northern Sotho [nso] South Africa
Corpus: nso_community_2021 + 1
Sentences: 9,560,
Types: 12,871,
Tokens: 171,558
URLs: 4,293
Norwegian [nor] Norway
Corpus: nor_community_2017
Sentences: 69,299,
Types: 95,606,
Tokens: 984,898
URLs: 28,907
Nuosu [iii] China
No statistics available for this language.
Nyakyusa-Ngonde [nyy] Tanzania
No statistics available for this language.
Nyanja [nya] Malawi
Corpus: nya_community_2017
Sentences: 896,
Types: 4,343,
Tokens: 13,712
URLs: 162
Nyankore [nyn] Uganda
Corpus: nyn_community_2017
Sentences: 5,
Types: 15,
Tokens: 36
URLs: 1
Occitan [oci] France
Corpus: oci_community_2023 + 2
Sentences: 199,559,
Types: 258,826,
Tokens: 4,215,601
URLs: 36,911
Oluluyia [luy] Kenya
No statistics available for this language.
Oriya [ori] India
Corpus: ori_community_2017
Sentences: 39,314,
Types: 71,426,
Tokens: 504,296
URLs: 5,127
Oromo [orm] Ethiopia
Corpus: orm_community_2021 + 1
Sentences: 2,637,
Types: 12,613,
Tokens: 44,923
URLs: 401
Pahari-Potwari [phr] Pakistan
No statistics available for this language.
Pampangan [pam] Philippines
Corpus: pam_community_2017
Sentences: 17,250,
Types: 41,952,
Tokens: 347,606
URLs: 5,593
Pangasinan [pag] Philippines
Corpus: pag_community_2017
Sentences: 6,012,
Types: 11,854,
Tokens: 79,080
URLs: 3,784
Pontic [pnt] Greece
Corpus: pnt_community_2017
Sentences: 1,564,
Types: 6,290,
Tokens: 25,581
URLs: 398
Pulaar [fuc] Senegal
Corpus: fuc_community_2017
Sentences: 124,
Types: 1,022,
Tokens: 2,642
URLs: 63
Pular [fuf] Guinea
No statistics available for this language.
Pwo Eastern Karen [kjp] Myanmar
No statistics available for this language.
Quechua [que] Bolivia, Peru
Corpus: que_community_2017
Sentences: 21,139,
Types: 38,495,
Tokens: 266,643
URLs: 13,583
Quiché [quc] Guatemala
No statistics available for this language.
Rajasthani [raj] India
No statistics available for this language.
Rangpuri [rkt] Bangladesh
No statistics available for this language.
Rohingya [rhg] Myanmar
No statistics available for this language.
Romany [rom] Romania
Corpus: rom_community_2019
Sentences: 1,435,
Types: 6,646,
Tokens: 28,320
URLs: 237
Rundi [run] Burundi
Corpus: run_community_2017
Sentences: 17,361,
Types: 56,856,
Tokens: 363,797
URLs: 1,812
Rwanda [kin] Rwanda
Corpus: kin_community_2022 + 3
Sentences: 30,270,
Types: 88,135,
Tokens: 638,210
URLs: 4,259
S'gaw Karen [ksw] Myanmar
Corpus: ksw_community_2017
Sentences: 448,
Types: 2,415,
Tokens: 4,749
URLs: 230
Sadri [sck] India
No statistics available for this language.
Santali [sat] India
No statistics available for this language.
Sasak [sas] Indonesia (Nusa Tenggara)
No statistics available for this language.
Sena [seh] Mozambique
Corpus: seh_community_2017
Sentences: 27,
Types: 118,
Tokens: 167
URLs: 10
Seraiki [skr] Pakistan
Corpus: skr_community_2017
Sentences: 87,
Types: 885,
Tokens: 1,897
URLs: 42
Serer-Sine [srr] Senegal
No statistics available for this language.
Shan [shn] Myanmar
No statistics available for this language.
Shekhawati [swv] India
No statistics available for this language.
Shona [sna] Zimbabwe
Corpus: sna_community_2017
Sentences: 48,339,
Types: 122,437,
Tokens: 792,698
URLs: 4,881
Sidamo [sid] Ethiopia
No statistics available for this language.
Sindhi [snd] Pakistan
Corpus: snd_community_2017
Sentences: 7,431,
Types: 22,097,
Tokens: 140,754
URLs: 588
Sinhala [sin] Sri Lanka
No statistics available for this language.
Soga [xog] Uganda
No statistics available for this language.
Somali [som] Somalia
Corpus: som_community_2017
Sentences: 170,575,
Types: 216,881,
Tokens: 4,320,798
URLs: 22,918
Songe [sop] Dem. Rep. of Congo
No statistics available for this language.
Soninke [snk] Mali
Corpus: snk_community_2017
Sentences: 124,
Types: 454,
Tokens: 866
URLs: 40
Southern Balochi [bcc] Pakistan
No statistics available for this language.
Southern Dong [kmc] China
No statistics available for this language.
Southern Kurdish [sdh] Iran
No statistics available for this language.
Southern Ndebele [nbl] South Africa
Corpus: nbl_community_2017
Sentences: 318,
Types: 2,643,
Tokens: 5,424
URLs: 29
Southern Sotho [sot] South Africa, Lesotho
Corpus: sot_community_2017
Sentences: 9,773,
Types: 17,421,
Tokens: 238,709
URLs: 542
Sukuma [suk] Tanzania
Corpus: suk_community_2017
Sentences: 47,
Types: 163,
Tokens: 391
URLs: 13
Sunda [sun] Indonesia (Java and Bali)
Corpus: sun_community_2017
Sentences: 50,340,
Types: 84,722,
Tokens: 913,176
URLs: 6,187
Surgujia [sgj] India
No statistics available for this language.
Surjapuri [sjp] India
No statistics available for this language.
Susu [sus] Guinea
Corpus: sus_community_2017
Sentences: 9,
Types: 79,
Tokens: 108
URLs: 5
Swahili [swa] Tanzania
Corpus: swa_community_2023 + 3
Sentences: 59,342,
Types: 93,552,
Tokens: 1,412,549
URLs: 4,949
Swati [ssw] South Africa, Swaziland
Corpus: ssw_community_2017
Sentences: 380,
Types: 3,132,
Tokens: 5,895
URLs: 60
Sylheti [syl] Bangladesh
No statistics available for this language.
Tachawit [shy] Algeria
No statistics available for this language.
Tachelhit [shi] Morocco
No statistics available for this language.
Tagalog [tgl] Philippines
Corpus: tgl_community_2017
Sentences: 979,689,
Types: 472,209,
Tokens: 20,664,580
URLs: 106,820
Tajiki [tgk] Tajikistan
Corpus: tgk_community_2022 + 2
Sentences: 941,793,
Types: 596,826,
Tokens: 19,341,776
URLs: 93,504
Tamashek [tmh] Niger
No statistics available for this language.
Tarifit [rif] Morocco
No statistics available for this language.
Tausug [tsg] Philippines
No statistics available for this language.
Teso [teo] Uganda
No statistics available for this language.
Thai [tha] Thailand
Corpus: tha_community_2021 + 1
Sentences: 57,017,
Types: 196,452,
Tokens: 793,384
URLs: 21,462
Themne [tem] Sierra Leone
Corpus: tem_community_2017
Sentences: 8,
Types: 15,
Tokens: 40
URLs: 1
Tibetan [bod] China
Corpus: bod_community_2017
Sentences: 7,525,
Types: 22,081,
Tokens: 32,178
URLs: 4,379
Tigrigna [tir] Ethiopia, Eritrea
Corpus: tir_community_2017
Sentences: 1,379,
Types: 5,700,
Tokens: 17,837
URLs: 181
Tigré [tig] Eritrea
No statistics available for this language.
Tiv [tiv] Nigeria
Corpus: tiv_community_2017
Sentences: 3,
Types: 20,
Tokens: 60
URLs: 1
Tonga [toi] Zambia, Zimbabwe
No statistics available for this language.
Tsonga [tso] South Africa
Corpus: tso_community_2017
Sentences: 10,571,
Types: 17,493,
Tokens: 238,504
URLs: 446
Tswa [tsc] Mozambique
No statistics available for this language.
Tswana [tsn] South Africa, Botswana
Corpus: tsn_community_2017
Sentences: 28,276,
Types: 34,977,
Tokens: 687,676
URLs: 2,772
Tulu [tcy] India
No statistics available for this language.
Tumbuka [tum] Malawi
Corpus: tum_community_2017
Sentences: 240,
Types: 1,645,
Tokens: 4,989
URLs: 31
Turkmen [tuk] Turkmenistan
Corpus: tuk_community_2023 + 2
Sentences: 181,
Types: 1,504,
Tokens: 2,952
URLs: 99
Tày [tyz] Viet Nam
No statistics available for this language.
Umbundu [umb] Angola
No statistics available for this language.
Uyghur [uig] China
Corpus: uig_community_2021 + 1
Sentences: 68,736,
Types: 143,824,
Tokens: 1,177,946
URLs: 4,703
Uzbek [uzb] Uzbekistan
Corpus: uzb_community_2017
Sentences: 663,119,
Types: 706,425,
Tokens: 10,900,014
URLs: 65,787
Varhadi-Nagpuri [vah] India
No statistics available for this language.
Vasavi [vas] India
No statistics available for this language.
Venda [ven] South Africa
Corpus: ven_community_2017
Sentences: 9,279,
Types: 14,412,
Tokens: 179,877
URLs: 375
Vlaams [vls] Belgium
Corpus: vls_community_2017
Sentences: 36,393,
Types: 75,458,
Tokens: 693,658
URLs: 4,740
Waray-Waray [war] Philippines
Corpus: war_community_2017
Sentences: 808,036,
Types: 399,359,
Tokens: 13,358,684
URLs: 793,771
Western Balochi [bgn] Pakistan
No statistics available for this language.
Western Panjabi [pnb] Pakistan
Corpus: pnb_community_2017
Sentences: 63,683,
Types: 64,365,
Tokens: 1,052,347
URLs: 26,859
Wolaytta [wal] Ethiopia
No statistics available for this language.
Wolof [wol] Senegal
Corpus: wol_community_2017
Sentences: 9,988,
Types: 22,011,
Tokens: 254,548
URLs: 1,628
Xhosa [xho] South Africa
Corpus: xho_community_2019 + 1
Sentences: 63,387,
Types: 172,520,
Tokens: 972,301
URLs: 4,227
Yao [yao] Malawi
No statistics available for this language.
Yilumbu [lup] Gabon
Corpus: lup_community_2017
Sentences: 12,246,
Types: 16,227,
Tokens: 87,930
URLs: 1
Yombe [yom] Dem. Rep. of Congo
No statistics available for this language.
Yoruba [yor] Nigeria
Corpus: yor_community_2017
Sentences: 10,703,
Types: 28,265,
Tokens: 210,852
URLs: 1,961
Zande [zne] Dem. Rep. of Congo
No statistics available for this language.
Zarma [dje] Niger
No statistics available for this language.
Zaza [zza] Turkey
No statistics available for this language.
Zhuang [zha] China
Corpus: zha_community_2017
Sentences: 2,306,
Types: 6,395,
Tokens: 20,110
URLs: 642
Zulu [zul] South Africa
Corpus: zul_community_2021 + 1
Sentences: 158,644,
Types: 372,449,
Tokens: 2,520,740
URLs: 5,316