How to Program Like a Linguist

Reading Time: 12 minutes

This here is a friendly introduction to linguistics for programmers.

There are many ways to improve our programming skills: watching videos, reading articles, joining hackathons, and entering code competitions. But there’s another way that takes less time in front of an IDE: learning to think about how programming works.

We’re not talking about how compilers work, how we get to 1’s and 0’s, or how the hardware works. We’re talking about learning to think about how the language works.

Parsing errors with Prepositions is something up with which I shall not put

Here’s what I mean:

if you ever code something that "feels like a hack but it works," just remember that a CPU is literally a rock that we tricked into thinking
— ben 🚀 cobalt core! now! (@daisyowl) March 15, 2017

How much thought have you put into the way you’ve been talking to (or swearing at) the rock?

You know the keywords and the general syntax of your programming language(s) — but have you thought about what they mean and why they have to be written that way?

Let’s do that: Let’s go through some exercises that will teach us to think about a programming language the same way that linguists think about a natural language and see if we become better developers as a result.

What is linguistics?

Before we learn how to apply linguistics to our coding, let’s keep in mind that the word linguistics is a lot like the term “computer science”; it’s an umbrella term that covers many specific fields of study. Let’s get familiar with a few of the different branches:

I/O
- Phonetics: How sounds are physically produced and interpreted
- Phonology: The organization of speech sounds
- Psycholinguistics: How the brain understands language
Graph Theory
- Historical Linguistics: How languages are evolving
- Comparative Linguistics: How languages compare to each other
Engineering
- Computational Linguistics: Statistical and rule-based modeling of natural languages
- Applied Linguistics: Creating real-world usage of some study of linguistics (e.g. teaching, developing software, writing textbooks)
Parsing and Compiling
- Syntax: Also known as “grammar”; the rules for how language is used
- Semantics: The study of meaning
- Morphology: The parts of words
Quality and Style
- Pragmatics: How language is used
- Stylistics: The study of patterns and devices in language
User Acceptance
- Sociolinguistics: How language affects interactions and relationships

Suffice it to say: a lot is going on in the world of linguistics. Someone could know a lot about etymology (origins of words) and very little about phonology.

If it sounds a bit confusing to you, just remember that there’s someone in marketing who doesn’t know the difference between you and a DBA. We’ll get through this together.

What kinds of linguistics is a programmer involved in?

Unless you’re actively working on language software in some way, chances are you aren’t technically involved in linguistics. But what we’re going to find out is that a lot of things you’ve already been doing as a programmer have had you thinking like a linguist:

If you’ve pseudo-coded something before taking it to an IDE, you’ve committed an exercise in Applied Linguistics.

Have you ever asked yourself how = ended up meaning, “assign this variable?” That’s semantics. And the fact that the name of the thing is on the left but the value is on the right is syntax.

When you wondered why Python prefers snake_case, JavaScript likes camelCase, but C# likes PascalCase, that’s pragmatics.

Describing someone’s code as more “functional”, object-oriented, or imperative is a form of stylistics.

Remember that time someone you love confused Java and JavaScript and you explained the difference? Congratulations, you’re both patient and using historical and comparative linguistics.

You were secretly a linguist the whole time, but, really—how do you linguist?

First, let’s dispel common misconceptions about linguists and languages:

Not every multilingual person is a translator
Not every translator is a linguist
Not every linguist is multilingual

Linguists say that about as often as you refuse to fix a printer because, “that’s a hardware issue.”

A linguist is an observer of language

When we talk about “programming like a linguist,” we aren’t talking about learning multiple programming languages or explaining how to migrate a JavaScript library into Rust.

Regardless of being multilingual, a translator, or a programmer, we start linguisting when we:

Become an observer of the language
Identify interesting observations within the language and explore them
Consider those observations from both the transmitter’s and receiver’s perspective

What makes us different from a natural-language linguist is that we know the receiver is some sort of rock we’ve tricked into thinking.

Introductory Exercise: Analyze XKCD

'help, I fell down a hole,' shouted someone in a hole. Then a linguist ran over and asked, 'is fell down a hole exactly equivalent to fell in a hole? Or do they have slightly different implications?'

Linguists are the kind of people who enthusiastically exclaim, 'yes exactly,' when you tell them they're too focused on semantics.

Did you write const RETRY_LIMIT because you really didn’t want it to change or because you were shouting at the API?

Linguists aren’t pedants

The first thing we have to notice here is Randall’s caption where he explains that linguists aren’t grammar pedants. That’s true. Linguists generally fall into two camps:

Descriptive
Prescriptive

The English teacher who told you that sentences couldn’t end in a preposition was a prescriptive linguist. They were an observer of the language—with the intent of observing and informing you how you were using it incorrectly. Everyone who told you, “it isn’t a word unless it’s in the dictionary,” was a prescriptive linguist. Your JavaScript linter and Rust compiler are prescriptivists.

Linguists out in the wild doing the work of observing people trapped in holes and whatnot are descriptive linguists. They don’t see right and wrong grammar. If you say things like, “linguisty” they just scribble a note about, “y is an adjectivizer.” We, the human programmers, are descriptivists. ¹

The semantics of fell, down, and “a hole”

You might be tempted to think that this is just a matter of going to the dictionary, looking up the definitions, and then producing meaning by stringing those definitions together. That works fine until you realize that:

“fell” can be something you do to cut down a tree,
“down” can be a covering of soft feathers,
And “a hole” could mean a prison cell

Thank goodness we have an illustration to clarify that someone did not, in fact, errantly swing an ax at a goose-feathered pillow causing it to go into the nearby penitentiary.

Of course, these conceptual semantics are hyperbolic, but they are still technically correct — which is the ~~best~~ worst kind of correct.

Native English speakers will intuitively apply lexical semantics (how words refer to real things or concepts) and use context to understand that “fell” is the past tense of “fall”, “down” is an adverb² describing the direction of the fall, and “a hole” is a noun phrase representing a real physical location in the ground where the speaker landed.

The pragmatics of “in” and “down”

This is the actual point of the cartoon; why did our poor stick figure say “down” instead of “in”? “I fell in a hole,” and, “I fell down a hole,” are both grammatically correct and both communicate a sense of, “I didn’t start in a hole but now I’m in one.” But the choice to say “down a hole” communicates some shade of difference in how the poor stick figure fell.

The syntax of “I fell down a hole”

English is a very special language in that we can be a bit loosey-goosey with word order at times. But generally speaking, we like our sentences to go Subject-Verb-Object and we can see that in this sentence:

“I” subject
“fell” verb
“a hole” object

And as an additional English rule, we want our prepositions (those words that show how things relate to each other) to come before their complements—that’s why they are prepositions; they come before the thing they’re connecting. So “I fell in a hole” follows the 100% A-ok grammatically-correct order of “Subject-Verb-(preposition)-object”.

Except of course the sentence was, “I fell down a hole,” and down is a bit more ambiguous because it can be a noun, verb, adjective, adverb, or preposition. Which part of speech is it‽

Syntax is here to help!

It isn’t a verb: English syntax wants a subject to precede a verb. The subject is “I”, and it’s already got a verb, “fell”.
It isn’t a noun because English syntax wants a verb after a subject, and what comes after “down” is “a hole”.
It could be an adverb because adverbs can come after verbs
Or it could be a preposition because prepositions come before objects

So by knowing a bit of English syntax, suddenly we realize that there is some semantic ambiguity between “I fell in a hole” and, “I fell down a hole”; both describe where you are, but only one could be describing how you fell. Is “down” an adverb or a preposition? More importantly, will it help someone get out of the hole?

If you thought this was messy, remember that JavaScript has its own set of rules (automatic semicolon insertion) for figuring out whether you meant to put a semicolon there, and it only gets it wrong sometimes.
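To see one of the times it gets it wrong, here is a minimal sketch (the function names are made up for illustration). JavaScript treats a line break right after return as the end of the statement, so the value on the next line is never returned:

```javascript
// A line break directly after `return` ends the statement,
// so the string on the next line is never returned.
function brokenGreeting() {
  return
    "hello";
}

// Keeping the value on the same line as `return` avoids the problem.
function workingGreeting() {
  return "hello";
}

console.log(brokenGreeting());  // undefined
console.log(workingGreeting()); // hello
```

Both functions are grammatical as far as JavaScript is concerned. Only one of them says what we meant.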

The Stylistics

If someone said, “Down in the hole fell I”, that would be a ~~technically correct~~ terrible structure that would evoke thoughts of Jane Austen and 18th-century aristocracy.

No native English speaker stuck at the bottom of a hole would say that; it sounds like something we would say as part of telling a story about falling in a hole. It sounds like a second half of a chiasmus or an antimetabole where the speaker has no immediate concern for their well-being. But we would observe it and take note of who was listening to this remarkable story as it was being told.

Exercise: CSS Selectors

CSS is a domain-specific style language that’s low in syntax and rich in vocabulary. All the while it carries very few reserved keywords and instead opts for a small set of symbols. One of the best things to analyze with our linguist brain is selectors because that can truly help us write more efficient CSS.

Let’s start by looking at two rule-sets:

#topic-123 .header h1.title > a:hover {
  border-bottom: 1px solid blue;
}

.title a:hover {
  border-bottom: 1px solid blue;
}

Semantics tells us that a few symbols have a special meaning:

# means this is an element with an id
. means this is an element with a class
> means, “direct child of”
: means, “a pseudo-class of”
White space between two selectors means “is somewhere inside of”
{ means, “start of the declaration block” and } means, “end of the declaration block”
: inside of a declaration block separates the property name from the property value
; inside of a declaration block means “end of declaration”

Syntax tells us

The right-most selector is the element that receives the style
If there is no whitespace between a selector and a reserved symbol, that is a secondary description of that element

So, knowing the semantics and the syntax of a CSS selector, let’s turn this code into a human-readable sentence:

A :hover pseudo-class for the a element that is directly inside of an element with the class title that is also an h1 that is inside of an element with the class header that is inside of an element with the id topic-123.
A hover pseudo-class for the anchor element that is somewhere inside of an element with the class title

Now let’s apply pragmatics

One of those sentences is long with lots of descriptors, and the other isn’t. Why would we choose the former while the latter is so succinct?

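If we didn’t know how the browser parsed (i.e. evaluated the syntax and semantics of) a CSS selector, we might be tempted to write very long CSS selectors. But browsers read a selector from right to left: they find candidates for the right-most part first, then check every condition to its left against each candidate’s ancestors. Every extra part is extra checking, and with that knowledge we can try to write more efficient CSS selectors.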
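Here is a toy sketch of matching a selector from right to left, the way browsers are usually described as doing it. Everything in it is an assumption made for illustration: el, matchesCompound, matches, and compoundChecks are hypothetical names, elements are plain objects, and only element names, ids, classes, and the white-space combinator are supported (no > and no :hover). Real browser engines are far more sophisticated.

```javascript
// Hypothetical toy element: a plain object that knows its parent.
function el(tag, { id = null, classes = [], parent = null } = {}) {
  return { tag, id, classes, parent };
}

// Counts how many element-versus-selector-part comparisons we make.
let compoundChecks = 0;

// Does one element satisfy one chained part such as "h1.title" or "#topic-123"?
function matchesCompound(element, compound) {
  compoundChecks += 1;
  const parts = compound.match(/^[a-zA-Z][\w-]*|[#.][\w-]+/g) ?? [];
  return parts.every((part) => {
    if (part.startsWith("#")) return element.id === part.slice(1);
    if (part.startsWith(".")) return element.classes.includes(part.slice(1));
    return element.tag === part;
  });
}

// Match a space-separated selector from right to left:
// the right-most part must match the element itself,
// then each remaining part must match some ancestor, walking upward.
function matches(element, selector) {
  const compounds = selector.trim().split(/\s+/);
  if (!matchesCompound(element, compounds.pop())) return false;
  let ancestor = element.parent;
  while (compounds.length > 0) {
    if (ancestor === null) return false;
    if (matchesCompound(ancestor, compounds[compounds.length - 1])) compounds.pop();
    ancestor = ancestor.parent;
  }
  return true;
}

// section#topic-123 > div.header > h1.title > a
const section = el("section", { id: "topic-123" });
const header = el("div", { classes: ["header"], parent: section });
const heading = el("h1", { classes: ["title"], parent: header });
const link = el("a", { parent: heading });

compoundChecks = 0;
const shortMatch = matches(link, ".title a");
const shortChecks = compoundChecks;

compoundChecks = 0;
const longMatch = matches(link, "#topic-123 .header h1.title a");
const longChecks = compoundChecks;

console.log(shortMatch, shortChecks); // true 2
console.log(longMatch, longChecks);   // true 4
```

Both selectors find the same link. The longer one just asks more questions before it agrees.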

And that gets us to stylistics

Knowing that it’s always the right-most item that’s selected, we uncover an unexpected feature of CSS’ syntax: when chaining selectors, the chain is commutative.

All of these select the same element:

section#topic-123:hover.container
section.container:hover#topic-123
section:hover#topic-123.container
section:hover.container#topic-123
section.container#topic-123:hover
section#topic-123.container:hover
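Without a browser handy, one rough way to convince ourselves is to break each chain into its pieces and compare the piles. This is a sketch, not a real CSS parser: simpleSelectors is a hypothetical helper, and its regular expression only understands element names, ids, classes, and pseudo-classes without arguments.

```javascript
// Hypothetical helper: split one chained selector into its pieces.
function simpleSelectors(compound) {
  // An element name at the very start, or any #id, .class, or :pseudo-class.
  return compound.match(/^[a-zA-Z][\w-]*|[#.:][\w-]+/g) ?? [];
}

const chains = [
  "section#topic-123:hover.container",
  "section.container:hover#topic-123",
  "section:hover#topic-123.container",
  "section:hover.container#topic-123",
  "section.container#topic-123:hover",
  "section#topic-123.container:hover",
];

// Sort the pieces so that order no longer matters, then compare.
const canonical = chains.map((chain) => simpleSelectors(chain).sort().join(" "));
const distinct = new Set(canonical);

console.log(distinct.size);    // 1
console.log([...distinct][0]); // #topic-123 .container :hover section
```

The element name is the one piece that can’t move: it has to come first, which is why every chain above starts with section.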

Don’t ask a web developer, “should I write these?” Most front-end developers would say, “no!” ³ Instead ask, “If I had to choose one to write, which one would it be?” And then ask, “what scenarios would make me choose a different one?” Now we’re exploring stylistics!

Any time the syntax gives us multiple choices in how we can say something, the choice we make is added information that we can share.

I think most front-end developers would tell you, without knowing why, that they would prefer the sixth option. That decision algorithm isn’t quite as complex as the Urinal Protocol Vulnerability that lives in every man’s head, but it is up there.⁴

Finally, some sociolinguistics

Almost all CSS frameworks select elements with short classes and don’t use much chaining. Now we can see why:

Shorter selectors are more efficient
Chaining is commutative and could reveal unintended meanings

Front-end developers are favoring frameworks with lots of selectors containing few classes.

Exercise: Functions in JavaScript

JavaScript is a high-level, just-in-time compiled language that runs in all web browsers and quite a few servers, too. It’s one of the core languages of the web (like CSS). So let’s use our language brains to learn something about functions.

function sayWord(word) {
  console.log(word);
}

Semantics tells us that a few words and symbols have special meanings:

function is a reserved word that means that this is a named block of code that can be executed later
() encloses the names of the parameters the function accepts
{ tells us that a block is starting with some statements (instructions)
; tells us that a statement has ended
} tells us that the block of statements is finished

Syntax tells us

The name of the function comes after the function keyword
The statements come after the opening {
The statements end with the closing }
The ; goes at the end of a statement

With semantics and syntax out of the way, let’s once again make it into a sentence:

There is a function named sayWord that takes an argument called word and when this function runs it should console.log(word);

Now let’s apply pragmatics

Pragmatics boils down to a focus on the implied and inferred meanings in our statements. We don’t have to look at the mechanics of each component here; instead, we look at what we’re “really” saying.

So what are we “saying” when we make a function called sayWord? We’re saying that it’s a function whose purpose isn’t to say two words. Or three. Or even a sentence. That’s even reinforced by the name of the function’s parameter, word. Both the name of the function and the name of the parameter reinforce usage.

You might realize that your pragmatics is incredibly valuable for dynamically typed languages like Python and JavaScript because it’s how you can communicate how your classes and ultimately libraries should be used.
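A small sketch of that idea. sayWords is a hypothetical sibling function, invented here only to show how a different name and parameter imply a different usage:

```javascript
// The function from above: singular name, singular parameter.
function sayWord(word) {
  console.log(word);
}

// Hypothetical sibling: the plural name and rest parameter invite any number of words.
function sayWords(...words) {
  console.log(words.join(" "));
}

sayWord("hello");           // logs: hello
sayWords("hello", "there"); // logs: hello there
sayWord("hello", "there");  // logs: hello (the extra argument is silently ignored)
```

Nothing in JavaScript stops that last call. The only thing telling the caller it’s a mistake is the name.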

Exercise: Statements…in JavaScript and Python

We learned in our JavaScript exercise with functions that { and } start and end a function block. But our study of semantics would be incomplete if we thought they only worked when we used the function keyword.

First, let’s observe how we’d create a loop in JavaScript:

for (const i in ['foo', 'bar']) {
  console.log(i); // logs the property names '0' and '1'
}

Semantics reinforces what we knew about some symbols, and introduces two new keywords

{ tells us that a block is starting with some statements (instructions)
; tells us that a statement has ended
} tells us that the group of statements is finished
for is a reserved word that means, “for each item that is countable”
in is a reserved word that means, “the property in the object”

Syntax tells us

The arguments for for come inside parentheses placed after the keyword
The left-hand side of the in keyword comes out of the object on the right
The statements come after the opening {
The statements end with the closing }
The ; goes at the end of a statement
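It’s worth checking what “the property in the object” means for an array, because it may not be what the sentence-like syntax suggests. In this small sketch (the variable names are made up), for...in hands us the property names, which for an array are its indexes as strings, while the related for...of form hands us the values:

```javascript
const words = ["foo", "bar"];

// for...in walks the property names of the object.
const keys = [];
for (const key in words) {
  keys.push(key);
}

// for...of walks the values the array yields.
const items = [];
for (const item of words) {
  items.push(item);
}

console.log(keys);  // [ '0', '1' ]
console.log(items); // [ 'foo', 'bar' ]
```

Same English word, two different meanings, and the only way to know which one we’re getting is to know the language’s semantics.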

Next, let’s observe some totally valid (albeit weird) JavaScript:

const cracker = "saltine";
{
  const topping = "peanut butter";
  const cracker = "graham";
  console.log(topping, cracker); // logs 'peanut butter graham'
}
console.log(cracker); // logs 'saltine'

Semantics tells us that those curly braces mean, “a collection of statements.”

Syntax tells us that we don’t need any keyword at all in front of those statements.

Wild, huh?

Now let’s apply some comparative linguistics

Let’s translate our code into Python and do some analysis:

for i in ['foo', 'bar']:
   print(i)

Semantics tells us

: introduces a block
Any indentation indicates that it’s an instruction belonging to the block
for is a reserved word that means, “for each item that is countable”
in is a reserved word that means, “the item in the object”

Syntax tells us

The arguments for for come after the keyword
The left-hand side of the in keyword comes out of the thing on the right
The instructions come after the : and are indented
The instructions end before the non-indented code
A new line indicates the previous instruction has ended

Great. We’ve learned some things about blocks.

But what about this?

cracker = "saltine"

   topping = "deep"
   cracker = "graham"
   print(topping, cracker)

print(cracker)

Python will tell you it’s a syntax error: IndentationError: unexpected indent. And it’ll still do it if you add a colon (:) above the indented code. Or at the end of line 1. Why?

Let’s take a step back and learn something from human languages:

Any translator will tell you, “translation is not a matter of words only,” ⁵ , because translation is not simply about semantics; it’s a combined effort between semantics and syntax (and in literary translation, pragmatics and stylistics, too).

The block in Python is not simply a colon (:) followed by an indented line.

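In Python, the block starts with a keyword!

More technically, a Python block only exists as part of a compound statement: a header line that begins with a keyword such as if, for, or def and ends with a colon, followed by the indented block.

But JavaScript doesn’t have compound statements; it has block statements. A block statement is delimited with just { and } and doesn’t require a keyword in front of it.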

By doing some comparative linguistics on some syntax, we’ve discovered that two things that look similar on the surface are in fact very different things. So different it would take some serious mental effort to translate them.

What have we gained from programming like a linguist?

We chose to become observers of our programming languages. We picked on CSS, JavaScript, and Python today, but we could do this in C#, Java, and SQL if we wanted. The point was that we made a concerted effort to not just use them but think about them.

Then we picked up on some interesting observations in those languages. We looked at how CSS selectors select elements, and how JavaScript identifies functions, then asked ourselves, “what is a block?” And then we decided to think about what the code that we’re writing means not only to us but to someone else.

If we’re talking about how we’ve named things and written in a certain way, that “someone else” is another programmer. But if we’re talking about how they work and what they mean, we’re talking about an interpreter and/or compiler.

So the next time you decide to write some fresh code, take a minute to think about all the little tiny rules you’re following, the words you’re using, and what it all truly means. Your computer is a rock we’ve tricked into thinking, but maybe now you’ve got a better idea of how we tricked it into thinking with a few little words and rules. Now you can program like a linguist.

Sources and Whatnots

¹ Save for the few like Guido van Rossum, Larry Wall, Bjarne Stroustrup, Brendan Eich, and others who are going out creating programming languages like Python, Perl, C++, and JavaScript. Though Larry Wall might object to being called a prescriptivist purely on the basis that he is also a linguist.
² Properly trained linguists may argue that “down” is a preposition that can function as a verbal modifier either on its own, or as the head of a prepositional phrase that modifies the verb. But that’s really an argument to have for someone who’s ready to accept that there aren’t really adverbs and adjectives. This isn’t that time.
³ Doing so is often considered a hack.
⁴ The reason for option six has to do with the fact that developers prefer pseudo-class (:hover) at the end because this is a state, which is notable. They also prefer the most-permanent attribute at the beginning (the #) because that’s what’s most likely to change when something goes wrong. The syntactic constraints of chaining then put element name (section) on the left side of the id.
⁵ https://www.goodreads.com/quotes/3211247-translation-is-not-a-matter-of-words-only-it-is

3 Comments

Dominic
(1 year ago)

Fell isn’t something you do to chop down a tree. It’s not swinging the axe. It’s chopping down the tree.
Arguments will be entertained as to whether those things on the back of excavators actually fell trees or merely snip them off and place them gently on the ground.
1. paceaux
  (1 year ago)
  
  Merriam-Webster defines “fell” as “to cut, knock, or bring down”. Whether “fell” means an axe has been swung or an excavator has been used is really up to the imagination of the writer.
  1. Dominic
    (1 year ago)
    
    You swing the axe, but swinging the axe isn’t felling until the tree comes down. The excavator thing was an attempt to make humour from the fact that fell is cognate with fall, and if the tree is gently placed on the ground…. well whatever ;-)

Frank M Taylor
