Introducing Methodius CLI

Reading Time: 4 minutes

People are sometimes unprepared to learn that I have a deep (read: unhealthy) interest in languages, up to and including “how do you even figure out what makes a language so languagey?” (Side note: apologies to the mother of the 8-year-old a few days ago who wanted to know why bears are called bears.)

A while back I shared that I created a JavaScript library called Methodius, a utility that lets you analyze frequencies and other facts about arbitrarily sized chunks of text. Now I get to share a tool that uses it.

Cyril and Methodius were very proud of the script they created. Latin was for dorks. — Cyril and Methodius, the conlangers of the 9th century. Painted by Zahari Zograf (Захарий Христович Димитров). Public Domain

How do you use Methodius CLI?

Well, step 1 is to install it globally:

npm i -g methodius-cli

Now run it on a text file:

methodius -f "inferno.txt"

What are the options with Methodius CLI?

Designate what properties you want:

If you look at Methodius, you’ll see there are a lot of properties to choose from. The more properties you ask for, the slower it gets. So if you’re only interested in meanWordSize, letterPositions, or wordFrequencies, you can get just that thing by passing -p once per property:

methodius -f "inferno.txt" -p letterFrequencies -p uniqueBigrams

Methodius will tell you in the terminal window that it has started and what it's doing. It will output the exact filename you've designated. - Methodius, like me after 3 drinks, likes oversharing. It starts by telling you it’s started, and what it’s reading.

Methodius will tell you how many characters it's read, that it's created the analysis object, and even give you a summary.

The summary tells you how long it all took, the mean and median word sizes, the top related bigrams, and where the output file is. - Methodius gives a helpful summary once analysis finishes.

Methodius will also tell you what properties it scanned for and what topMethods it looked for. — Methodius will also remind you what you asked for at the end.

Run the top methods and set a limit

Not everything on a Methodius instance is a property. So for the top methods like getTopLetters() and getTopBigrams(), you can request those with -t and optionally set the limit they receive with -l.

methodius -f "inferno.txt" -t topLetters -t topWords -l 25
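To make "top" and "limit" concrete, here is a rough sketch of what a top method presumably boils down to: sort a frequency table and keep the first few entries. The function name topEntries, the Map input, and the sample counts are all made up for illustration; this is not Methodius's own code or API.

```javascript
// Hypothetical stand-in for a "top" method: NOT the Methodius implementation.
// Assumes frequencies arrive as a Map of item -> count.
function topEntries(frequencies, limit) {
  return [...frequencies.entries()]
    .sort(([, a], [, b]) => b - a) // highest count first
    .slice(0, limit);              // keep only the first `limit` entries
}

// Made-up letter counts, just to show the shape of the result.
const letters = new Map([["e", 12], ["a", 9], ["z", 1], ["o", 7]]);
const topTwo = topEntries(letters, 2);
console.log(topTwo); // [ [ 'e', 12 ], [ 'a', 9 ] ]
```

In this sketch, -l 25 would correspond to calling topEntries(frequencies, 25).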

Run it on multiple files:

Methodius would be super tedious if you had to do all this on one file at a time. So you have the option to set multiple files.

methodius -f "inferno.txt" -f "divine-comedy.txt"

Designating output file

So this is a bit complex. The default output file name is analysis.json. You can change that to whatever you want.

methodius -f "inferno.txt" -t topWords -l 25 -o inferno-top-words.json

But what if you have multiple files? How does the output name work then?

methodius -f "inferno.txt" -f "divine-comedy.txt" -t topWords -l 25 -o words

Something like the above will give you words.inferno.json and words.divine-comedy.json.

Methodius will tell you at the end what your filenames are — Methodius won’t leave you guessing, it’ll tell you at the end what the results are.

Of course, you could also designate a folder, too:

methodius -f "inferno.txt" -f "divine-comedy.txt" -t topWords -l 25 -o "words/"

Methodius will show the directory and filename of the results file. — Easy peasy lemon-squeezy

And that would give you words/inferno.json and words/divine-comedy.json.
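The naming rules above can be summed up in a few lines of JavaScript. This is only a sketch of the behavior described in this post, with hypothetical helper names (stem, outputNames); it is not taken from the CLI's source, and it deliberately leaves out the case of multiple files with no -o value, which isn't covered here.

```javascript
// Hypothetical helpers: a sketch of the naming rules, not the CLI's real code.

// "texts/inferno.txt" -> "inferno"
function stem(file) {
  const base = file.split("/").pop();
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(0, dot) : base;
}

// files: the -f values; out: the -o value
function outputNames(files, out) {
  if (out.endsWith("/")) {
    // -o "words/" -> words/<input name>.json
    return files.map((file) => `${out}${stem(file)}.json`);
  }
  if (files.length === 1) {
    // one input file -> use the -o value exactly as given
    return [out];
  }
  // several input files -> <-o value>.<input name>.json
  return files.map((file) => `${out}.${stem(file)}.json`);
}

const both = ["inferno.txt", "divine-comedy.txt"];
console.log(outputNames(["inferno.txt"], "inferno-top-words.json"));
// [ 'inferno-top-words.json' ]
console.log(outputNames(both, "words"));
// [ 'words.inferno.json', 'words.divine-comedy.json' ]
console.log(outputNames(both, "words/"));
// [ 'words/inferno.json', 'words/divine-comedy.json' ]
```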

Merge results from multiple files

So this is what really makes the CLI so useful for analysis. You could analyze multiple text files at once and have it merge all the results for you into one file.

When it merges, if the property is an Object or a Map, it’s the keys that are merged (with duplicates removed). If the property is an Array or a Set, they’re concatenated and duplicates are removed. If the value is a number, then what you get is an average.

methodius -f "inferno.txt" -f "divine-comedy.txt" -p bigramFrequencies -p wordFrequencies -l 25 -m

The output will be a merged.json file.
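If the merge rules are easier to read as code, here is a small sketch of them for two analyses. It is my own illustration, not the code the CLI runs. The function name mergeValues and the sample data are made up, and one detail is an outright assumption: when both files share a key, the sketch merges the two values with the same rules (so shared numeric counts get averaged).

```javascript
// Illustrative sketch of the merge rules: NOT the code Methodius CLI runs.
function mergeValues(a, b) {
  if (typeof a === "number" && typeof b === "number") {
    return (a + b) / 2; // numbers are averaged
  }
  if (Array.isArray(a) && Array.isArray(b)) {
    return [...new Set([...a, ...b])]; // concatenate, then drop duplicates
  }
  if (a instanceof Set && b instanceof Set) {
    return new Set([...a, ...b]); // a Set drops duplicates on its own
  }
  if (a instanceof Map && b instanceof Map) {
    const merged = new Map(a);
    for (const [key, value] of b) {
      // ASSUMPTION: values under a shared key are merged recursively
      merged.set(key, merged.has(key) ? mergeValues(merged.get(key), value) : value);
    }
    return merged;
  }
  if (a && b && typeof a === "object" && typeof b === "object") {
    const merged = { ...a };
    for (const [key, value] of Object.entries(b)) {
      // ASSUMPTION: same recursive rule as for Maps
      merged[key] = key in merged ? mergeValues(merged[key], value) : value;
    }
    return merged;
  }
  return b; // ASSUMPTION: for anything else, keep the later value
}

// Made-up analysis objects; the real property shapes may differ.
const inferno = {
  meanWordSize: 4,
  uniqueWords: ["selva", "oscura"],
  wordFrequencies: { selva: 2, via: 1 },
};
const comedy = {
  meanWordSize: 5,
  uniqueWords: ["oscura", "stelle"],
  wordFrequencies: { selva: 4, stelle: 3 },
};

const merged = mergeValues(inferno, comedy);
console.log(merged.meanWordSize);    // 4.5
console.log(merged.uniqueWords);     // [ 'selva', 'oscura', 'stelle' ]
console.log(merged.wordFrequencies); // { selva: 3, via: 1, stelle: 3 }
```

Note that this only handles two inputs. Chaining it across three or more files would not give a true average, so a real merger would need to collect every number before dividing.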

You also have the option to use a dedicated merge command, because maybe you want to produce the analyses in one command, and then you want to merge separately:

methodius-merge -f "inferno.analysis.json" "divine-comedy.analysis.json" -o "dante.json"

If you decide to use methodius-merge, you get the added option of being able to designate exactly which properties are put in the output file:

methodius-merge -f "inferno.analysis.json" "divine-comedy.analysis.json" -o "dante.json" -p uniqueWords

The nice thing about the dedicated merge is that it’s less cluttered.

See for Yourself

Feel free to clone the repository.

If you’re a VSCode user, you’ll notice that there’s a launch.json file there with some commands already set up for you to run on the sample texts in that repository. I’ve found that copying & pasting those commands was about the easiest way to do text analysis.

Editing the launch.json file means the commands will appear as dropdowns in VSCode's debugger. You'll see multi-file, All-English, single-file, Multi-file Merge and dedicated merge. - There are options. That’s all I’m saying. Who doesn’t like options?

What’s Next?

Maybe more work in the merger/summarization process? IDK.

What would aspiring computational linguists want to know once they’ve scanned a bunch of texts?

Frank M Taylor
