Why do some languages have more words than others?

Ever heard that Arabic has 10 million words? That English has a million? That French has 80,000? Or that the Alaskan Inuit have no word for love but 16 for snow?

Why is that?

A totally weird illustration: lists of things that look like they should be words but really aren’t.
Words are only words because we all agree that they’re words. Stare long enough and semantic saturation will kick in and you’ll actually think this says something. It doesn’t. DALL-E was trippin’ on mescaline again.

“Word” is tricky to define

This is one of those things that’s really easy to define so long as you don’t know much about linguistics. A word is some combination of sounds that communicates meaning, right?

ehhhhhhhhhhhhhhhhhhhh.

Linguists have a much harder time defining a word than non-linguists do.[1] I’ll spare you the full debate, but it boils down to this: we kinda think a word is something like a single unit of meaning attached to a sound, but maybe not under all conditions.

A Word Example:

We might say “word” is a word.

Cool.

It means something like “a set of sounds that communicates meaning and can’t be divided.” Maybe it also means “a brief remark or conversation.”[2]

Cool cool.

So then, what about “reword”? Is that a new word? You added the prefix “re-”, which means “again” and usually attaches to verbs. So… definitely a new word, right? So in your analog dictionary, are you flipping to the R section or the W?

But “words” isn’t a new word. That’s just the plural of the noun “word.” The “s” at the end means “more than one,” but you can’t write “s” on its own and have it mean “many,” so “words” isn’t two words smashed together like “can’t” or “isn’t.” So “words” really isn’t a new word. Right?

Cool cool cool.

What about “unword”? “Un-” means “not.” Can we “unword” something? Does that even make sense? Does it have to make sense to count as a word, though?

What about “Wordle”? That’s a new word, right? We added “-le,” which doesn’t really mean anything here, so it’s a new word even though it has “word” in it.

OK, what about “wordy”? As in, “Frank, you’re being way too wordy.” Isn’t that a totally different word from “word,” since it’s now an adjective and we adjectivize nouns with “-y”? Or is it not a new word, because you had the word “word” to build on and the adjective just kinda means “yeah, so it’s like the thing I’m talking about”?

So adjectivized nouns aren’t new words, right? Or are they?

What if I verbify a noun, though? Does that make a new word?

The kind of language affects how we might define what a word is

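Languages that pack several meaningful pieces into a single word are called synthetic, and they come in (at least) two flavors: inflectional languages and agglutinative languages. (Agglutinative is a fancy term that means “gluey.”)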

Inflectional languages can trick you into thinking many different words are one word

Spanish, for instance, is a kind of synthetic language that we’d call inflectional.

So in Spanish, decir (“to say”) is the dictionary word behind all of these:

  • digo
  • digas
  • diga
  • digamos
  • digan

Those are all inflections: changes to a “base” form.

Ok, sure, they’re totally different “words”, but we don’t put all of those forms in a dictionary, do we? We say that Spanish has a verb “decir” and that we can change it in all these ways that just so happen to form very different shapes from each other.

See how counting the “words” can get confusing?
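To make the counting problem concrete, here is a minimal Python sketch. It assumes a tiny hand-built lemma table (a hypothetical stand-in for a real morphological analyzer) and counts the same list of Spanish forms two ways: as distinct surface forms and as distinct dictionary headwords.

```python
# Hypothetical, hand-built lemma table: each surface form points to the
# headword a dictionary would list. A real system would use a morphological
# analyzer instead of a hard-coded dict.
LEMMAS = {
    "digo": "decir",
    "digas": "decir",
    "diga": "decir",
    "digamos": "decir",
    "digan": "decir",
}


def count_words(tokens):
    """Return (distinct surface forms, distinct lemmas) for a list of tokens."""
    forms = {t.lower() for t in tokens}
    # Forms missing from the table are treated as their own headword.
    lemmas = {LEMMAS.get(f, f) for f in forms}
    return len(forms), len(lemmas)


print(count_words(["Digo", "digas", "diga", "digamos", "digan"]))  # (5, 1)
```

Five “words” or one? Both answers are right; they just answer different questions.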

Agglutinative languages can trick you into thinking really big words are many little words.

In Turkish you can say Çekoslovakyalılaştıramadıklarımızdanmışçasına. You’d translate that as “as if you were one of those whom we could not make resemble the Czechoslovakian people.”

Aside: Damn Turkish, calm down.

Agglutination means you glue little bits together, each bit carries its own meaning, and once they’re mashed together you get your word. The order of the bits can shift the meaning, too. So how many words does Turkish even have? Shouldn’t it be infinite? Then why does Wikipedia say Turkish dictionaries max out at 316,000?[3]
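Here is a rough Python sketch of why the count balloons. It is heavily simplified and assumes a made-up inventory of three suffix slots for the Turkish noun ev (“house”): plural, possessive, and case. The suffix spellings are hard-coded already vowel-harmonized for ev, so this is an illustration, not a general Turkish morphology engine.

```python
from itertools import product

# Hypothetical, heavily simplified suffix slots for the noun "ev" (house).
# Real Turkish picks suffix vowels by vowel harmony; these spellings are
# already harmonized for "ev", so they only work with this stem.
PLURAL = ["", "ler"]             # "" = singular, "ler" = plural
POSSESSIVE = ["", "im", "imiz"]  # none, "my", "our"
CASE = ["", "de", "den"]         # none, "in/at", "from"


def glue(stem, *suffixes):
    """Agglutinate: concatenate the stem and suffixes in slot order."""
    return stem + "".join(suffixes)


# Every combination of one choice per slot gives one word form.
forms = [glue("ev", p, s, c) for p, s, c in product(PLURAL, POSSESSIVE, CASE)]

print(len(forms))                         # 18
print(glue("ev", "ler", "imiz", "den"))   # evlerimizden ("from our houses")
```

Three small slots already yield 18 forms from one stem. Real Turkish offers more options per slot and more slots, so the number of possible forms grows fast, while a dictionary still lists just ev.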

Some languages have more words than others because of how they form words

It’s that inflectional versus agglutinative thing. Inflectional languages like Spanish (and, to a lesser degree, English) may look at totally different sound/meaning chunks and call them the same “word.”

“Be, is, was, were” in English are all “to be.”

So are “soy, eres, es, fue” in Spanish: all forms of ser.

Notice how those to-be words look nothing alike, yet we’ve all been acting like they mean the same thing? Spooky, ain’t it?

Then over in German you can say “Backpfeifengesicht” (a face in need of a fist) as one word. German happily builds compounds like this on the spot, so plenty of them never make it into a dictionary, but you will find them in Slack DMs when the new boss wants you to work late on Fridays.

Dictionaries are not comprehensive lists of all words

The job of a dictionary is to give you the “base form,” or “lemma,” of a word. So it matters whether you’re counting dictionary entries or everything the language can actually express. A language may not have many dictionary entries, but its agglutinative (gluey) nature may make up for that.

Does this mean some languages are more efficient than others?

Efficient is kind of a subjective thing, right? How do we measure “efficiency”? Is it letter (sound) count, syllables, the number of pauses between sounds, or how many breaths you take?

Is it more efficient to say “garbage collector on the back of a truck” or “Müllautohintendraufsteher”?

Do what the Persians did and try it drunk, then try it sober, and report back.