So let’s get on with that.
This was the one I started on. The basic problem was that most times I went to read my Instapaper articles, I’d have 15 minutes or so to do it. I’d want to hammer through a bunch of short ones, but I never knew which ones were the short ones.
Instapaper also has a “Text” feature, which works like Readability.1 It strips out all the crap and just gives you the article text, formatted nicely so you can actually read it. If I could count the number of words in the relevant element (rejecting words of fewer than 3 characters), I’d have a pretty good idea of an article’s length, and if I knew the length of all of them, I could sort them.
The code is worth a thousand words, but this is basically what happens:
- Proxy all requests to www.instapaper.com
- If the Content-Type of the response is HTML, we keep it and send it to a page processing task.
- We also stuff in a couple script tags: one for jQuery, and one for the script from the shortestpaper application.
- In the page processing task, we use jsdom and jQuery to extract all the URLs, which are stuffed into a queue if nothing exists in the Redis store for that URL.
- Another process polls the queue,2 and requests the Instapaper text page for that URL.
- We again use jsdom and jQuery to grab the relevant element from that response, grab the innerText, split on whitespace and count the words.
- We store that count in Redis using the first 10 characters of the SHA1 of the URL as the key.
Okay, so now what? Remember that script we insert into the document?
- The script grabs all the URLs, calculates their SHA1, and requests some JSON from the server.
- This JSON is a SHA1 ⇒ count mapping.
- Sort the elements, and add the word count to the controls!
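A rough sketch of that client-side sort, assuming the server has already returned the hash ⇒ count map (the browser side computes each URL’s SHA1 with a JS SHA1 library; the names here are illustrative, not shortestpaper’s actual code):

```javascript
// counts: the JSON map from the server, keyed by the first 10 chars of
// each URL's SHA1. articles: the list items scraped from the page, each
// carrying the hash of its own URL.
function sortByLength(articles, counts) {
  function countOf(article) {
    var c = counts[article.hash];
    // Articles we have no count for yet sort to the bottom.
    return typeof c === 'number' ? c : Number.MAX_VALUE;
  }
  return articles.slice().sort(function (a, b) {
    return countOf(a) - countOf(b);
  });
}
```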
Now I can burn through short articles.
I could also have done this using a Chrome extension, but developing Chrome extensions isn’t my favorite thing in the world, so I went this route. It also will work in all browsers, so that’s a big win. Future improvements are probably going to include using the Readability stuff from the next project so I’m not bound by the Instapaper rate limit.
If you use Instapaper, check out shortestpaper at http://shortestpaper.darkhax.com/.
kindlebility was my original use case. I wanted to be able to use Readability on the server, turn an article into a clean PDF, and send it to my Kindle. One click! Bookmarklet. That’s what I wanted. So I did it.
- I do nothing in the request, except add a job to the queue. I used technoweenie’s chain gang since it doesn’t need to be persistent.
- From there, the worker is a big chain of callbacks.
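The shape of that chain, with hypothetical stand-ins for the real steps (the real versions would run the article through Readability, render a PDF, and mail it to the Kindle address):

```javascript
// A sketch of the callback chain, not kindlebility's actual code.
// Each step follows node's (err, result) callback convention.
function fetchReadable(url, cb) { cb(null, 'cleaned article HTML for ' + url); }
function renderPdf(html, cb)    { cb(null, { pdf: html }); }
function mailToKindle(doc, cb)  { cb(null, 'sent'); }

// The worker pulls a URL off the queue and runs it through each step,
// bailing out to the job's callback on the first error.
function processJob(url, done) {
  fetchReadable(url, function (err, html) {
    if (err) return done(err);
    renderPdf(html, function (err, doc) {
      if (err) return done(err);
      mailToKindle(doc, done);
    });
  });
}
```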
Ten minutes later, you’ve got the article on your Kindle, converted by Amazon to be all nice and readable. In one click.
At first, shortestpaper was a bit wonky. Sometimes it would be slow and never quite finish. After talking with Zed Shaw about it, he suggested cranking up the buffer_size in the settings, and that did the trick. I might even crank it up some more. If you’re having problems with proxy setups in Mongrel2, look at the buffer_size setting first.
shortestpaper took me a few days to write, as I was just learning nodejs. Some error messages were confusing, some libraries didn’t work 100% the first time around, and I was getting used to npm. kindlebility took me a couple of hours one day after work.
All in all, I’m quite impressed. nodejs seemed to like to eat the CPU on my slice in small spurts, and liked to eat RAM, though it gave it back. It’s damn fast, though. Development is quick, but the error messages are sometimes frustrating. Debugging isn’t built in, but go grab ndb and you can use debugger; in your code, and it will shell out to a debugger console. Code reloading isn’t built in either, but there are other modules that can apparently do that, and forks of nodejs with it integrated into the server.
1 Which is what Apple used for Safari’s Reader functionality.
2 Instapaper has a rate limit, which we need to obey.