2005-09-20

The Big Eight

Languages with more than a hundred million speakers, that is; not accounting firms, which in any case are now reduced to the Big Four thanks to mergers and malpractice.

Update: Blast it, Japanese has 122 million speakers. Why does Ethnologue claim there are only 8 languages in this range? Screws up the whole post. It makes me wonder if there are more hiding in the database but not readily accessible from the Web pages. Anyhow, thanks to JibberJim for the heads-up here.

Chinese. 873 million native speakers. The Big One, really; it's almost three times the size of Spanish and English, the next largest languages. It's also the most widely taught second language in the world, though most of the people who learn it as a second language live in China, and almost half of the second-language speakers use another Sinitic language as their mother tongue.

Spanish. 322 million native speakers. The widespread one: it's an official or national language in 21 countries plus New Mexico, a state of the U.S. It's truly remarkable how well this old colonial empire has held together linguistically despite vast political and geographical differences; it's possible to tell where a Spanish-speaker comes from, but there are no serious barriers to mutual intelligibility (though there are many jokes about it).

English. 309 million native speakers. English now belongs to the world: about 200 million people speak it as a second language with varying degrees of proficiency, making it the most widely taught foreign language. If we counted English as a Second Language as a separate language, it would make this list the Big Nine.

Bengali. 211 million native speakers, half of them in the small but very densely populated country of Bangladesh (which indeed is named after the language). Most people who speak Bengali, interestingly, are descended from second-language speakers, which gives it an unusual degree of consistency for such a large language.

Arabic. 206 million native speakers. This is the diverse one: "Arabic" is actually a cover term for over 30 closely related (but not always mutually intelligible) languages, unified by Modern Standard Arabic, which is an updated version of Classical Arabic, the language of the Quran. Nobody actually speaks it, except for politicians making speeches, and even then the speeches have to be translated into the local Arabic language.

Hindi. 181 million native speakers. If you count the closely related Urdu (different script, different religion, different set of twenty-dollar words), tack on another 60 million native speakers. There are a lot of other languages in India, some of them quite large, and the term "Hindi" covers a lot of linguistic territory: it has been said that the local language in India changes significantly every 100 kilometers.

Portuguese. 177 million native speakers. Bizarrely, the Ethnologue calls this "a language of Portugal", even though only a tiny fraction of Portuguese speakers live there. The overwhelming majority, of course, live in Brazil, the other large non-Spanish-speaking American country.

Russian. 145 million native speakers. Here the much more recent colonial empire didn't hold together so well: except in Russia itself and in Kyrgyzstan, the language is not official anywhere, though there are plenty of Russian-speakers in what Russians call "the near foreign".

So how are we doing if we know all those languages (and hardly anyone does, I'll bet)? Well, we can now handle the native languages of just 40% of the world's population. That's how linguistically diverse the planet is. If you throw in the next 75 languages, with more than 10 million but fewer than 100 million speakers, you do much better, reaching 79%; add to that the 264 languages with more than a million but fewer than 10 million speakers, and coverage goes up to 93%.

After that, it's the Long Tail: 6,565 languages with fewer than a million speakers each, covering the remaining 7%. To be sure, sometimes native languages aren't everything: in the Pacific island nation of Vanuatu, with about 200,000 people, no single language has more than about 7,000 native speakers, but almost everyone can handle at least some Bislama, an English-based creole (which itself has only 5,000 native speakers).
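
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The counts and percentages are the ones quoted in this post; the tier labels, variable names, and the tier_summary function are my own, purely for illustration.

```python
# Native speakers of the Big Eight, in millions, as quoted in the post.
big_eight = {
    "Chinese": 873, "Spanish": 322, "English": 309, "Bengali": 211,
    "Arabic": 206, "Hindi": 181, "Portuguese": 177, "Russian": 145,
}

# (illustrative label, number of languages, cumulative % of world population)
tiers = [
    ("over 100 million", 8, 40),
    ("10 to 100 million", 75, 79),
    ("1 to 10 million", 264, 93),
    ("under 1 million", 6565, 100),
]

def tier_summary(tiers):
    """Return (label, languages so far, share added by this tier, cumulative %)."""
    rows = []
    languages_so_far = 0
    previous = 0
    for label, count, cumulative in tiers:
        languages_so_far += count
        rows.append((label, languages_so_far, cumulative - previous, cumulative))
        previous = cumulative
    return rows

big_eight_total = sum(big_eight.values())
print(f"Big Eight native speakers: {big_eight_total} million")

for label, running, share, cumulative in tier_summary(tiers):
    print(f"{label:>18}: {running:>5} languages so far, +{share}% -> {cumulative}%")
```

The Big Eight come to 2,424 million native speakers, and the four tiers together add up to 6,912 languages.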

Essentially all the data here comes from the 15th edition of the Ethnologue.

11 comments:

M. T. MacPhee said...

"...the other large non-Spanish-speaking American country."

OK. Let's see, the largest non-Spanish-speaking American country is Canada (the second largest country in the world, after Russia), with French and English native speakers. You mentioned Brazil, or as they write it, Brasil.

Was there anyone else?

Mike

david said...

You know this already, but it's still nasty treating Chinese as a single language, while Spanish and Portuguese (which are mostly mutually-comprehensible, unlike, say, Mandarin and Cantonese) are treated as separate languages.

John Cowan said...

m.t.: In this context, "large" means large in population.

david: By "Chinese" I mean exclusively Mandarin here: note the reference to "other Sinitic languages".

M. T. MacPhee said...

No! Really?

Thanks for the slight, though, eh?

Mike

Tim May said...

Was it your intention not to name "the Pacific island nation of 200,000 people"? It's not hard to work out, but the sentence is a little odd.

John Cowan said...

mike: No slight intended.

tim may: Thanks; I dropped a line out of my draft. Fixed.

M. T. MacPhee said...

"No slight intended."

Well, that will make all Bahamians, Jamacians, Caymaners, Belizians, Netherlands Antillians, Trinidad and Tobaboans, Granadans, Barbadians, Grenadinians, St. Lucians, Martiniques, Dominicans, Quadeloupes, Montserratians, St. Kitts and Nevisians, Antigua and Barbudans, Anguillans, BV Islanders, and Haitians as well as Canadians feel ever so much better.*

Intended or not, it was very real.

Mike

*With apologies to those I may have missed, or misspelled. Such a *long* list of non-Spanish speaking American countries. So little time.

Anton said...

... closely related (but not always mutually intelligible languages), ...

Was the placement of the closing paren. intentional?

Anton said...

Speaking of that other Portuguese-speaking country, I'm curious about when "Spain" came to mean "Iberia minus Portugal". In 1469, when Isabella married Ferdinand? In 1492, when Islam lost the last bit of Andalus? In 1516, when Ferdinand died, leaving Castile and Leon and Aragon and Granada to his Habsburg grandson?

Anonymous said...

According to Harmann 2001b, page 11ff.

Chinese
English
Hindi
Spanish
Russian
Arabic
Bengali
Portuguese

Anonymous said...

The numbers are off and some languages are missing. Where are French and German? CIA factbook is a good source. Ethnologue is great, but it has its limitations. Check out Wiki too.

http://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers