Thursday, December 5, 2013


I don't have screenshots for this because it happened a while back and I've since switched jobs, so I'll tell this tale in text.

While migrating articles out of a company wiki, I wrote a script to download them automatically using the wiki's REST API. Here's the general algorithm:

1. Start at article 0 and request a batch of 100 articles (the maximum allowed)
2. Request the text of the first article in the batch
3. Request the list of attachments for that article
4. Download each attachment
5. Repeat for the next article in the batch
6. Grab the next batch
7. Stop when the batch returned contains fewer than 100 articles
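
I no longer have the original script, so here is a minimal sketch of that loop in Python. Everything named here is an assumption for illustration: the four callables stand in for whatever HTTP requests the wiki's API actually needed, and the "id" key is a hypothetical field on each article record.

```python
from typing import Callable, Dict, List

BATCH_SIZE = 100  # the maximum batch size the wiki allowed


def download_all(
    list_articles: Callable[[int, int], List[dict]],   # (start, limit) -> batch
    get_text: Callable[[str], str],                    # article id -> body text
    list_attachments: Callable[[str], List[str]],      # article id -> file names
    download_attachment: Callable[[str, str], bytes],  # (article id, name) -> data
) -> Dict[str, dict]:
    """Walk the wiki batch by batch and collect text plus attachments."""
    results: Dict[str, dict] = {}
    start = 0
    while True:
        batch = list_articles(start, BATCH_SIZE)
        for article in batch:
            article_id = article["id"]  # hypothetical field name
            files = {
                name: download_attachment(article_id, name)
                for name in list_attachments(article_id)
            }
            results[article_id] = {
                "text": get_text(article_id),
                "attachments": files,
            }
        # A short (or empty) batch means we have reached the end.
        if len(batch) < BATCH_SIZE:
            break
        start += BATCH_SIZE
    return results
```

Note that the loop only works if offset N always points at the same article between requests, which is exactly the assumption that bit me.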

Since the API is labeled "RESTful", this should be fine, right? A request for a given batch of 100 will always return the same 100 articles, so walking through them sequentially is safe, right?

Wrong. So very wrong. Putting the word "REST" next to the word "API" does not necessarily mean they gave you a REST API.

One article was failing, making my whole script bomb. Thinking I could pop in and skip that specific article, I found its index number and excluded it. But the next day the script failed again. I dug into why and realized that the bad article's position in the index was moving down once every 6 to 10 hours. Which means that the indices were stateful. Which means it's not REST.
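
To show why excluding by index stopped working, here is a toy Python simulation. I never learned what actually caused the shift, so the trigger here (an earlier article disappearing and the listing closing the gap), the direction of the shift, and the article names are all assumptions made up for illustration.

```python
def skip_by_index(listing, bad_index):
    """Return every article except the one at a fixed position."""
    return [a for i, a in enumerate(listing) if i != bad_index]


def skip_by_id(listing, bad_id):
    """Return every article except the one with a given identifier."""
    return [a for a in listing if a != bad_id]


# Day 1: the broken article sits at index 3, so excluding index 3 works.
day1 = ["intro", "setup", "faq", "BROKEN", "changelog"]
assert "BROKEN" not in skip_by_index(day1, 3)

# Day 2 (assumed cause): "setup" is gone and everything after it shifts.
day2 = ["intro", "faq", "BROKEN", "changelog"]

# The same exclusion now skips a healthy article and lets the bad one through.
assert skip_by_index(day2, 3) == ["intro", "faq", "BROKEN"]

# Excluding by a stable identifier does not care where the article sits.
assert skip_by_id(day2, "BROKEN") == ["intro", "faq", "changelog"]
```

If the API had exposed a stable identifier for each article, keying the exclusion on that instead of the position would have survived the shift.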

I get it. I honestly do. Not cleaning up indices makes for a painful system that can get bloated fast. But don't use the buzzword if you can't actually make it work.