
Introducing SelectorHound

Reading Time: 4 minutes

A few years back I ran into a difficult situation on a project: I needed to find out where a particular CSS selector was used.

I had a static version of the site, and so I did what any fool might do and I tried searching for it in my IDE. I had two problems though:

  1. Writing a RegEx to essentially parse HTML is substantially more difficult and dangerous than you might expect.
  2. The actual live version of the site was managed by a CMS (content management system), so it would be much more difficult to know where that selector might end up.

So after almost a day of failing to produce a good-enough RegEx, I got an idea: What if I just scanned the live site for that selector?

In about as much time as it took me to write a RegEx that didn’t always work, I was able to produce a Node.js-based script that could scan a live site for the selector.

So with that, I got the bright idea to make it a proper NPM package that could run on the command line. And now I should introduce you.

It can be very, very, very hard to find things on websites. It's especially defeating when you built the damned thing yourself.
Buddy you’re never gonna find #treats if you don’t stop crapping on my shoes

Introducing SelectorHound

SelectorHound is on NPM and, believe it or not, it’s already at version 2.2!

It’s a Command Line Interface (CLI) that offers a pretty robust set of options:

  • Give it a single selector or a CSS file
  • Give it a URL to a sitemap or tell it to crawl your site
  • Ask for a lot of details about HTML elements that match the selector, or a screenshot
  • Tell it to treat pages like they’re a SPA (Single Page Application) or like static HTML

What it’s good for

  • Do you have CSS on your site that you’d like to delete, but you’re uncertain if it’s used anywhere?
  • Are you looking for instances where one element may be next to another?
  • Would you like to know if your stylesheet has CSS that could be deleted?
  • Has malware infected your CMS and started adding weird links?
  • Do you have calls to action that might be missing data attributes?

All of these are real-world use cases that I’ve used SelectorHound for.
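For instance, the “next to another element” and “missing data attribute” cases are really just CSS selectors you’d hand to the -s flag (covered in “Try it out” below). The class names and attributes here are made up purely for illustration:

SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s ".card + .card"

SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s "a.cta:not([data-analytics])"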

Try it out

First, install it:

npm i -g selector-hound

Or, for more speed:

bun install -g selector-hound

Then run it:

SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s "h1"

Then look at what you got

It’ll tell you what it’s doing as it gets started.

SelectorHound will tell you what the sitemap is and which CSS selector you've asked for.
It’s a very proud pupper as you can see

And it will export all those URLs to a JSON file. This means you can customize the pages it scans. It’ll rely on that JSON file for every scan unless you pass -X to force it to generate a new sitemap file.
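So if your sitemap has changed and you want SelectorHound to re-fetch it instead of reusing the saved JSON, rerunning the earlier command with -X should do it (same flags as before, just with the refresh added):

SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s "h1" -X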

SelectorHound tells you how many URLs it finds in your sitemap
Fetching is so much easier to represent in emoji than URLs and reading from a disk

Then it’ll tell you when it’s finished and give you a nice summary of what it found.

SelectorHound will tell you how long it took, how many pages it scanned, which pages had a match, the total results, and the name of the file you need to look in
SelectorHound is a very good doggo

You can modify the output file name with the -o flag. Your chosen name will be prepended to pages.json.
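So a run like the one below should name the output after “headings” rather than plain pages.json. The name “headings” is just an arbitrary example, and exactly how it gets joined onto pages.json is up to the tool:

SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s "h1" -o headings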

Don’t forget to check the log

The log.txt file will show you any errors that happen while it's running. It'll also have the same details that you see in the command line.
The log file looks a lot like what you see in the CLI. One difference is that any errors will appear between the start and finish messages, should they occur

And then look at the results

SelectorHound outputs results to a pages.json file where, with the -e flag, you can see details that make it easy to spot the element on the page.
The -e flag will give you all the sporty details you need to know exactly where the element is on the page and what it looks like.

The output can be pretty robust because it’ll give you results for every page that it found. I am working on a reporting feature that can summarize the results if you don’t want to wade through what could be thousands of lines of JSON.
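If you want those per-element details captured in the first place, add -e to the scan. Assuming the flags combine the way you’d expect, that looks something like:

SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s "h1" -e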

Is it performant?

It’s faster than writing a RegEx to scan your codebase, that’s for sure.

I’ve done a little bit of testing and found that, if you’re looking for a single HTML element, it takes on average about 0.52s per page. Installing with Bun might shave off another 0.1s per page.

I’ve used SelectorHound with sitemaps containing up to 2000 links, and with crawling that produced up to 500 pages.
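As a rough back-of-the-envelope estimate using those numbers: 2000 pages × 0.52s per page works out to a little over 17 minutes for a full scan, give or take whatever concurrency and your network conditions contribute.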

Activating Puppeteer, either to take screenshots or to treat pages as a SPA, will slow things down significantly, so use those options with caution.

Where can you see the code?

It’s over on GitHub. I welcome contributions and feature requests.
