Improving an RSS Twitter Bot

Shortly after creating the RSS Twitter bot to tweet out a deduplicated list of published items, I realized the approach had a few issues:

Because of this, the “_once” accounts now retweet the publications’ original tweets instead of composing their own tweets.

Resolving URLs

We’ll still need a way to keep track of the articles that are retweeted so we don’t retweet a tweet that contains an article that’s already been sent out. Most of the infrastructure and logic from the RSS bot still applies: store article URLs in a keyed DynamoDB table and query against it to determine if the article’s been sent out already.
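A minimal sketch of that lookup, assuming a table named retweeted-articles with a string partition key called url (both names are hypothetical, not taken from the bot). It only builds the parameter objects that the AWS SDK's DocumentClient get and put calls accept. The conditional put is a variation not described above: it records the URL only if it isn't already present, so the check and the write happen in one atomic step.

```javascript
// Hypothetical table name; adjust to your own schema.
const TABLE_NAME = 'retweeted-articles';

// Params for a DocumentClient get(): look up an article by its resolved URL.
// An empty result (no Item) means the article hasn't been sent out yet.
function buildGetParams(resolvedUrl) {
  return { TableName: TABLE_NAME, Key: { url: resolvedUrl } };
}

// Params for a DocumentClient put() that only succeeds if the URL is new.
// If the key already exists, DynamoDB rejects the write with a
// ConditionalCheckFailedException, which means "already retweeted".
function buildPutIfAbsentParams(resolvedUrl, tweetId) {
  return {
    TableName: TABLE_NAME,
    Item: { url: resolvedUrl, tweetId },
    // The attribute name placeholder avoids clashes with DynamoDB reserved words.
    ConditionExpression: 'attribute_not_exists(#u)',
    ExpressionAttributeNames: { '#u': 'url' }
  };
}
```

With the AWS SDK for JavaScript v2, for example, these would be passed as docClient.get(params).promise() and docClient.put(params).promise(), checking err.code === 'ConditionalCheckFailedException' on the put.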

Using the article’s URL as a key was simple with an RSS feed, since each item’s link already served as its unique identifier in the feed. With tweets, URLs are usually shortened, which makes them unsuitable as unique keys: the same article can be shared under two different shortened links.

To fix this, we can resolve the shortened link by sending an HTTP HEAD request and following any redirects. HEAD is identical to GET except that the server doesn’t return the response body. That suits us: we don’t need the body, and we want the call to be quick. Following redirects is necessary because a shortened link is essentially just a redirect to the real URL.

The popular request module for Node.js makes this resolution easy:

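```javascript
const request = require('request');

request({
  method: 'HEAD',
  url: 'shortened url',
  followAllRedirects: true
}, (err, res) => {
  if (err) return console.error(err);
  // Once all redirects are followed, the request's URI is the final, resolved URL
  const resolvedUrl = res.request.uri.href;
});
```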
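The request module has been deprecated since February 2020, so here is a minimal sketch of the same HEAD-and-follow-redirects resolution using only Node's built-in http and https modules. The function name resolveUrl and the default limit of 10 redirects are my own assumptions, not part of the original bot.

```javascript
const http = require('http');
const https = require('https');

// Resolve a (possibly shortened) URL by sending HEAD requests and following
// Location headers until a non-redirect response arrives.
function resolveUrl(startUrl, maxRedirects = 10) {
  return new Promise((resolve, reject) => {
    const visit = (current, remaining) => {
      const target = new URL(current);
      const client = target.protocol === 'https:' ? https : http;
      const req = client.request(target, { method: 'HEAD' }, (res) => {
        res.resume(); // HEAD responses carry no body; drain the stream anyway
        const { statusCode, headers } = res;
        if (statusCode >= 300 && statusCode < 400 && headers.location) {
          if (remaining === 0) {
            reject(new Error(`Too many redirects resolving ${startUrl}`));
            return;
          }
          // Location may be relative, so resolve it against the current URL.
          visit(new URL(headers.location, target).href, remaining - 1);
        } else {
          // Not a redirect: this is the final URL.
          resolve(target.href);
        }
      });
      req.on('error', reject);
      req.end();
    };
    visit(startUrl, maxRedirects);
  });
}
```

Some servers answer HEAD with 405 Method Not Allowed; this sketch would then return that URL as-is, and falling back to a GET would be a reasonable extension.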

After that, we can determine whether the incoming tweet contains a link that’s been retweeted before, and use the Twitter API to retweet it if not.
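Here is a sketch of that per-tweet decision. It assumes tweets arrive as Twitter API v1.1 tweet objects, where each link appears in entities.urls with an expanded_url, and it takes the resolver, the set of already-sent URLs, and the retweet call (for example, v1.1's POST statuses/retweet/:id) as parameters so it runs without network access. processTweet is a hypothetical name, and an in-memory Set stands in for the DynamoDB table.

```javascript
// Hypothetical helper: decides whether to retweet a tweet and records its links.
//   tweet   - a Twitter API v1.1 tweet object (links live in entities.urls)
//   resolve - async function mapping a link to its final URL
//   seen    - a Set of resolved URLs already retweeted (stand-in for DynamoDB)
//   retweet - async function that retweets a tweet by its ID string
async function processTweet(tweet, resolve, seen, retweet) {
  const links = (tweet.entities && tweet.entities.urls) || [];
  if (links.length === 0) {
    return false; // no article link to deduplicate on, so skip the tweet
  }
  // expanded_url replaces Twitter's t.co wrapper but may still be a shortener.
  const resolved = await Promise.all(
    links.map((link) => resolve(link.expanded_url || link.url))
  );
  if (resolved.some((url) => seen.has(url))) {
    return false; // at least one of these articles was already sent out
  }
  await retweet(tweet.id_str);
  // Record URLs only after the retweet succeeds, so failures can be retried.
  resolved.forEach((url) => seen.add(url));
  return true;
}
```

Recording after the retweet keeps failed retweets retryable; the trade-off is that two handlers running at once could both retweet the same article, which a conditional write in DynamoDB would prevent.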

All code and the current list of feeds can be found here. These feeds are active; tweet at @srednass or open an issue on the GitHub project to suggest a new publication’s feed.