Introducing Isidore

Anyone who knows me knows that I have a bit of a thing for languages. I like studying them and I like speaking them. But I also like learning how they…work. A few years ago I read a book called The Secret Life of Pronouns, which got me interested in sentiment analysis. The central thesis of James Pennebaker’s book is that, by evaluating a small subset of the words someone uses, you can learn a lot about that person: gender, age, whether the person is a subordinate, etc. The only catch is that, well, you have to know what a pronoun is. Or a definite article. Or an adjective.

And that got me thinking: I need a thing that can do that. I need a part-of-speech tagger.

Which also got me thinking, “I should make a part-of-speech tagger.”

So I did.

The Name is Isidore

Jeff Atwood is right: naming things is hard. So I decided to look for patron saints of languagey things. Saint Isidore of Seville was a scholar who lived in Seville, Spain. He wrote a book, Etymologiae, which, as the name suggests, deals with etymology: the study of the origin of words. And…well, I want to study words.

Saint Isidore is also considered the patron saint of the internet. And I work on the internet.

So Isidore seems like an appropriate name.

What does Isidore do?

Right now, Isidore does part-of-speech tagging. Feed it some text and it’ll tell you whether each word is a noun, a pronoun, a verb, an adjective, or an adverb. What you do with that information is up to you.

Eventually I want Isidore to be able to do some analysis: tell you how many verbs, adjectives, adverbs, etc. you have in a block of text. But for right now, it’s a part-of-speech tagger.

Feed it something like this:

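// Assumption: the package installed with `npm install isidore` exposes Sentence on its main export
const isidore = require('isidore');
const { Sentence } = isidore;
const mySentence = new Sentence('He gives him a car.');
const { wordList } = mySentence;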

And you get a result like this:

Sentence {
    text: 'He gives him a car.',
    language: 'En',
    rawWordList: [ 'he', 'gives', 'him', 'a', 'car' ],
    wordList:
    [
        Pronoun {
            partOfSpeech: 'pronoun',
            word: 'he',
            referent: 'animate',
            gender: 'masculine',
            type: 'subject',
            person: 3,
            quantity: 'singular'
        },
        Verb {
            partOfSpeech: 'verb',
            word: 'give',
            type: 'transitive',
            valence: 2
        },
        Pronoun {
            partOfSpeech: 'pronoun',
            word: 'him',
            referent: 'animate',
            gender: 'masculine',
            type: 'object',
            person: 3,
            quantity: 'singular'
        },
        Adjective {
            partOfSpeech: 'adjective',
            word: 'a',
            type: 'article',
            degree: undefined
        },
        Noun {
            partOfSpeech: 'noun',
            word: 'car',
            type: 'entityClass',
            subType: 'common',
            inflection: undefined
        }
    ],
    type: 'declarative'
}
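
Until Isidore grows the analysis feature mentioned earlier, a tally can be sketched by hand. The helper below, countPartsOfSpeech, is a hypothetical name of my own and not part of Isidore; it only assumes that each entry in wordList carries a partOfSpeech string, as in the output above. The sample list is a trimmed, hand-written copy of that output so the snippet runs without the package installed.

```javascript
// Hypothetical helper, not part of Isidore's API.
// Assumes every tagged word has a `partOfSpeech` string, as in the sample output.
function countPartsOfSpeech(wordList) {
    const counts = {};
    for (const { partOfSpeech } of wordList) {
        counts[partOfSpeech] = (counts[partOfSpeech] || 0) + 1;
    }
    return counts;
}

// Hand-written stand-in for `mySentence.wordList`, trimmed to the fields used here.
const sampleWordList = [
    { partOfSpeech: 'pronoun', word: 'he' },
    { partOfSpeech: 'verb', word: 'give' },
    { partOfSpeech: 'pronoun', word: 'him' },
    { partOfSpeech: 'adjective', word: 'a' },
    { partOfSpeech: 'noun', word: 'car' }
];

console.log(countPartsOfSpeech(sampleWordList));
// { pronoun: 2, verb: 1, adjective: 1, noun: 1 }
```

With the real package, you would presumably pass mySentence.wordList in place of the sample list.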

Isidore isn’t too special

Natural Language Processing and part-of-speech tagging aren’t new. They’ve been around for a hot minute. So Isidore isn’t particularly special. I will call out two things that might make Isidore a bit different from other PoS utilities:

It’s written in JavaScript. As most NLP utilities are written in Python, this is a differentiator.
It’s written with multiple languages in mind. I’ve done my level best to not be English-centric in my approach.

Why Use Isidore?

Really, the idea is powered by what James Pennebaker talked about in his book: discover neat facts about a person from the pronouns (or other words) they use. The difference is being able to do it in more languages, with a part-of-speech tagger that’s built to support them.
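
As a rough illustration of that idea, here’s a sketch that groups the pronouns in a tagged list by grammatical person and reports what share are first person. Both pronounsByPerson and firstPersonShare are hypothetical helpers, not Isidore’s API. They assume pronouns carry partOfSpeech: 'pronoun' and a numeric person field, like the sample output earlier in this post. The ratio is only an illustrative statistic of my choosing, and the tagged words are hand-written rather than produced by Isidore.

```javascript
// Hypothetical helpers, not part of Isidore's API.
// Assumes pronouns look like { partOfSpeech: 'pronoun', person: 1 | 2 | 3, ... }.
function pronounsByPerson(wordList) {
    const tally = { 1: 0, 2: 0, 3: 0 };
    for (const word of wordList) {
        if (word.partOfSpeech === 'pronoun' && word.person in tally) {
            tally[word.person] += 1;
        }
    }
    return tally;
}

// Share of pronouns that are first person; 0 when there are no pronouns at all.
function firstPersonShare(wordList) {
    const tally = pronounsByPerson(wordList);
    const total = tally[1] + tally[2] + tally[3];
    return total === 0 ? 0 : tally[1] / total;
}

// Hand-written tagged words, trimmed to the fields used here.
const taggedWords = [
    { partOfSpeech: 'pronoun', word: 'i', person: 1 },
    { partOfSpeech: 'verb', word: 'give' },
    { partOfSpeech: 'pronoun', word: 'him', person: 3 },
    { partOfSpeech: 'pronoun', word: 'we', person: 1 },
    { partOfSpeech: 'pronoun', word: 'you', person: 2 }
];

console.log(pronounsByPerson(taggedWords));
console.log(firstPersonShare(taggedWords));
```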

Where’s the code?

Of course, it’s on GitHub. The master branch is currently version 0.0.4, and I’m already working on version 0.0.5 (my develop branch). You’ll see that I’ve set up issues and a project within GitHub, so you can see what I’m working on.

If you want to install it and try it out, it’s as easy as:

npm install isidore

You can check it out on npm, if you’re really curious.

As usual, I’m interested in feedback and open to contributions.

Frank M Taylor
