2 Node.js Apps That Showed Me The Light

I don’t know if you know this, but everybody and their dog is writing node.js applications. It’s more popular than Kanye memes.

It’s a contagious bug, and I caught it too. I had one specific use case that it was perfect for, since a JavaScript library already existed to do it. Another idea came later, and that’s the one that started it for me.

So let’s get on with that.

shortestpaper

This was the one I started on. The basic problem was that most times I went to read my Instapaper articles, I’d have 15 minutes or so to do it. I’d want to hammer through a bunch of short ones, but I never knew which ones were the short ones.

Instapaper also has a “Text” feature, which works like Readability ¹. It strips out all the crap, and just gives you the article text, formatted nicely so you can actually read it. If I could count the number of words in the relevant element (and let’s reject words shorter than 3 characters), then I’d have a pretty good idea of its length, and if I knew the length of all of them, I could sort them.
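A rough sketch of that counting idea, assuming the article text is already in hand as a plain string. This is not the shortestpaper code; `countWords` and `MIN_WORD_LENGTH` are names made up for illustration.

```javascript
// Words shorter than this are ignored (the "reject words shorter than 3 characters" rule).
// MIN_WORD_LENGTH is an assumed name, not something from shortestpaper.
const MIN_WORD_LENGTH = 3;

// Count the words in a block of text by splitting on whitespace
// and dropping anything too short to matter.
function countWords(text) {
  return text
    .split(/\s+/) // any run of spaces, tabs, or newlines separates words
    .filter(function (word) {
      return word.length >= MIN_WORD_LENGTH;
    })
    .length;
}

console.log(countWords('It is a very short article'));
```

The empty strings that `split` produces for leading or trailing whitespace fall out for free, since they are shorter than the minimum length.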

Proxy time!

The code is worth a thousand words, but this is basically what happens:

1. Proxy all requests to www.instapaper.com.
2. If the Content-Type of the response is HTML, we keep it and send it to a page processing task.
3. We also stuff in a couple script tags: one for jQuery, and one for the script from the shortestpaper application.
4. In the page processing task, we use jsdom and jQuery to extract all the URLs, which are stuffed into a queue if nothing exists in the Redis store for that URL.
5. Another process polls the queue,² and requests the Instapaper text page for that URL.
6. We again use jsdom and jQuery to grab the relevant element from that response, grab the innerText, split on whitespace and count the words.
7. We store that count in Redis using the first 10 characters of the SHA1 of the URL as the key.
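The key and queueing steps above might look something like this. It is a sketch under assumptions, not the real code: `keyFor` and `enqueueMissing` are hypothetical names, the SHA1 is assumed to be the hex digest, and a plain `Map` stands in for the Redis store so the example runs on its own.

```javascript
const crypto = require('crypto');

// Derive the storage key: the first 10 characters of the URL's SHA1.
// Assumes the hex form of the digest.
function keyFor(url) {
  return crypto.createHash('sha1').update(url).digest('hex').slice(0, 10);
}

// Push every URL that has no stored count yet onto the queue.
// `store` stands in for Redis here; anything with a has(key) method works.
function enqueueMissing(urls, store, queue) {
  urls.forEach(function (url) {
    if (!store.has(keyFor(url))) {
      queue.push(url);
    }
  });
  return queue;
}

// Usage: one URL is already counted, so only the other one gets queued.
const store = new Map([[keyFor('http://a.example/'), 120]]);
const queue = enqueueMissing(['http://a.example/', 'http://b.example/'], store, []);
console.log(queue);
```

Truncating the hash keeps the keys short, at the cost of a small chance of collisions.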

Okay, so now what? Remember that script we insert into the document?

1. The script grabs all the URLs, calculates their SHA1, and requests some JSON from the server.
2. This JSON is a SHA1 ⇒ count mapping.
3. Sort the elements, and add the word count to the controls!
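Leaving the DOM and jQuery parts out, the sorting step boils down to something like the sketch below. The shape of `articles` (objects with a `key` and a `title`) and the name `sortByCount` are assumptions for illustration; `counts` is the SHA1 ⇒ count mapping from the server.

```javascript
// Return a copy of `articles` sorted by word count, shortest first.
// Articles the server has not counted yet sort to the end.
function sortByCount(articles, counts) {
  function countOf(article) {
    return Object.prototype.hasOwnProperty.call(counts, article.key)
      ? counts[article.key]
      : Infinity;
  }

  return articles.slice().sort(function (a, b) {
    const countA = countOf(a);
    const countB = countOf(b);
    if (countA === countB) return 0;
    return countA < countB ? -1 : 1;
  });
}

// Usage with made-up keys and counts.
const articles = [
  { key: 'aaaaaaaaaa', title: 'Long read' },
  { key: 'bbbbbbbbbb', title: 'Not counted yet' },
  { key: 'cccccccccc', title: 'Quick note' }
];
const counts = { aaaaaaaaaa: 900, cccccccccc: 150 };

sortByCount(articles, counts).forEach(function (article) {
  console.log(article.title);
});
```

Putting uncounted articles last means the page is still usable while the queue is catching up.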

Now I can burn through short articles.

I could also have done this using a Chrome extension, but developing Chrome extensions isn’t my favorite thing in the world, so I went this route. It also will work in all browsers, so that’s a big win. Future improvements are probably going to include using the Readability stuff from the next project so I’m not bound by the Instapaper rate limit.

If you use Instapaper, check out shortestpaper at http://shortestpaper.darkhax.com/.

kindlebility

kindlebility was my original use case. I wanted to be able to use Readability on the server, turn an article into a clean PDF, and send it to my Kindle. One click! Bookmarklet. That’s what I wanted. So I did it.

I do nothing in the request except add a job to the queue. I used technoweenie’s chain gang for that, since the queue doesn’t need to be persistent.
From there, the worker is a big chain of callbacks:
1. Download the page.
2. Run node-readability on it.
3. Save the HTML out to a file.
4. Run wkhtmltopdf on it.
5. Read the PDF in and base64 encode it.
6. Email it to my Kindle address using Postmark.
7. Clean up.
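To give a feel for the shape of that worker, here is a minimal sketch that runs Node-style callback steps one after another. It is not the kindlebility code: `runInSeries` and `fakeStep` are hypothetical helpers, and the steps are stand-ins that only record that they ran, where the real ones would download the page, call node-readability, run wkhtmltopdf, and so on.

```javascript
// Run async steps in order, handing each step's result to the next one.
// Every step has the signature step(input, callback), and the callback
// follows the usual Node convention of (err, output).
function runInSeries(steps, input, done) {
  let index = 0;

  function next(err, value) {
    if (err) return done(err);                         // stop at the first failure
    if (index === steps.length) return done(null, value);
    const step = steps[index++];
    step(value, next);
  }

  next(null, input);
}

// Hypothetical stand-in for a real step: it just logs its name on the job.
function fakeStep(name) {
  return function (job, callback) {
    job.log.push(name);
    callback(null, job);
  };
}

const steps = [
  'download', 'readability', 'saveHtml', 'wkhtmltopdf',
  'base64', 'email', 'cleanup'
].map(fakeStep);

runInSeries(steps, { url: 'http://example.com/article', log: [] }, function (err, job) {
  if (err) return console.error('job failed:', err.message);
  console.log('finished steps:', job.log.join(' -> '));
});
```

A helper like this keeps the seven steps flat. Nesting the callbacks directly works just as well for a chain this short.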

Ten minutes later, you’ve got the article on your Kindle, converted by Amazon to be all nice and readable. In one click.

Deployment

Since I wanted both of these apps on port 80 and didn’t want to run nodejs as root, I put both apps behind Mongrel2 on my Rackspace slice.

The config:

	shortestpaper = Host(name = 'shortestpaper.darkhax.com', routes = {
	    '/': Proxy(addr = '127.0.0.1', port = 8080)
	})

	kindle = Host(name = 'kindle.darkhax.com', routes = {
	    '/': Proxy(addr = '127.0.0.1', port = 9090)
	})

	main = Server(
	    uuid = 'util',
	    chroot = '.',
	    pid_file = '/run/mongrel2.pid',
	    access_log = '/logs/access.log',
	    error_log = '/logs/error.log',
	    default_host = 'shortestpaper.darkhax.com',
	    name = 'util',
	    port = 80,
	    hosts = [shortestpaper, kindle]
	)

	settings = {
	    'zeromq.threads': 1,
	    'limits.buffer_size': 4096,
	    'limits.proxy_read_retry_warn': 5
	}

	servers = [main]

At first, shortestpaper was a bit wonky. Sometimes it would be slow and sort of never finish. When I talked with Zed Shaw about it, he suggested cranking up the buffer_size in the settings, and that did the trick. I might even crank it up some more. If you’re having problems with Proxy setups in Mongrel2, look at the buffer_size.

Postmortem

shortestpaper took me a few days to write, as I was just learning nodejs. Some error messages are confusing, some libraries didn’t work 100% the first time around, and I was getting used to npm. kindlebility took me a couple hours one day after work.

All in all, I’m quite impressed. nodejs seemed to like to eat the CPU on my slice in small spurts, and liked to eat RAM, though it gave it back. It’s damn fast though. Development is quick, but the error messages are sometimes frustrating. Debugging isn’t built in, but go grab ndb and you can use debugger; in your code, and it will shell out to a debugger console. Code reloading isn’t built in either, but there are other modules that can apparently do that, and forks of nodejs with it integrated into the server.

These little apps, along with all the other cool nodejs stuff I’ve seen, have really convinced me. The fact that you can simulate a browser window and run JavaScript designed for the client like it’s just another day of the week is pretty mind-boggling, not to mention powerful. JavaScript on the server is here to stay.

¹ Which is what Apple used for Safari’s Reader functionality.

² Instapaper has a rate limit, which we need to obey.