Readable, Writable, and Transform Streams in Node.js

Streams in Node.js are powerful constructs, and also one of the harder concepts in Node to wrap your head around. Over the past few weeks I’ve become more familiar with them, and the best way I’ve found to understand them is to build simple examples myself. This post goes through trivial implementations of a readable, a writable, and a transform stream, and demonstrates how to interact with each and how they interact with each other.
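As a rough sketch of what such trivial implementations can look like, the snippet below defines one stream of each kind and wires them together. The class names (`ArraySource`, `UpperCase`, `Collector`) and the `runPipeline` helper are made up for illustration and are not taken from the post; only `Readable`, `Writable`, `Transform`, and `pipeline` come from Node's `stream` module.

```javascript
const { Readable, Writable, Transform, pipeline } = require('stream');

// Readable: hands out one item of a fixed array per _read() call,
// then pushes null to signal the end of the stream.
class ArraySource extends Readable {
  constructor(items) {
    super({ objectMode: true });
    this.items = items.slice();
  }
  _read() {
    this.push(this.items.length ? this.items.shift() : null);
  }
}

// Transform: upper-cases each chunk on its way through.
class UpperCase extends Transform {
  constructor() {
    super({ objectMode: true });
  }
  _transform(chunk, encoding, callback) {
    callback(null, String(chunk).toUpperCase());
  }
}

// Writable: collects every chunk it receives into an array.
class Collector extends Writable {
  constructor() {
    super({ objectMode: true });
    this.received = [];
  }
  _write(chunk, encoding, callback) {
    this.received.push(chunk);
    callback();
  }
}

// Hypothetical helper: source -> transform -> sink, resolving with
// everything the sink collected once the pipeline has finished.
function runPipeline(items) {
  return new Promise((resolve, reject) => {
    const sink = new Collector();
    pipeline(new ArraySource(items), new UpperCase(), sink, (err) =>
      err ? reject(err) : resolve(sink.received)
    );
  });
}
```

Calling `runPipeline(['a', 'b'])` should resolve with the upper-cased items. Object mode is used here only to keep the chunks as plain strings; a byte-oriented stream would work with Buffers instead.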

Setting up Nginx Reverse Proxy for Node.js on Fedora

One of Nginx’s popular uses is as a reverse proxy for several different Node.js servers. A reverse proxy makes it easy to point to each separate app without having to remember which instance is on what port. Nginx makes this very easy, but Fedora’s SELinux policies make the setup less straightforward. In this tutorial, I’m going to map three Node.js web servers, running on different ports, to different virtual directories on the same domain.

Understanding JavaScript Callbacks

The first step in understanding the concept of JavaScript callbacks is to realize that functions are really objects. The thing that makes them special is that you can invoke them (i.e. ‘run’ them). This is done by using ‘.call()’, or simply ‘()’. Because functions are objects, you can pass them as parameters to any other function.
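A small sketch of those three points: a function carrying a property like any other object, being invoked with both ‘()’ and ‘.call()’, and being passed to another function as a callback. The names `double` and `applyToEach` are invented for this example.

```javascript
// A function is an object, so it can hold properties of its own.
function double(n) {
  return n * 2;
}
double.description = 'multiplies by two';

// Two ways to invoke it. The first argument to .call() becomes `this`
// inside the function, which double() does not use.
const viaParens = double(4);
const viaCall = double.call(null, 4);

// Because it is an object, it can be handed to another function,
// which decides when to invoke it: that is all a callback is.
function applyToEach(items, callback) {
  const results = [];
  for (const item of items) {
    results.push(callback(item));
  }
  return results;
}

const doubled = applyToEach([1, 2, 3], double);
```

Note that `double` is passed without parentheses: `applyToEach([1, 2, 3], double())` would call it immediately and pass its result instead of the function itself.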

Infinite Scrolling in Android

Infinite scrolling has become very popular in recent years, especially on mobile devices, because it lets you fetch new data while the user is still browsing data that’s already been fetched. The concept is pretty simple: once the end of the list is detected to be near, a call is made to fetch more data, which is then appended to the end of the list. This post goes through my implementation, contained here.

Data Join Techniques in JavaScript

When I decided to cut over my festival guide to a standalone site, I needed to figure out a way to do the data operations on the client’s browser that I was doing on the MySQL server. Specifically, I needed to emulate joining tables. The nested loop join is the simplest join you can do in SQL, but it happens to be the most costly. The merge join is a much faster method that can also be easily emulated in JavaScript, but it is only advantageous if the sets are already sorted (the cost of sorting them first can offset any performance gain over the nested loop join). I wanted to find out if any of the popular JavaScript data manipulation libraries offer an advantage over using standard JavaScript operations.
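To make the two join strategies concrete, here is a minimal sketch of each in plain JavaScript. It is not the code from the post: the function names, the `artistId` key, and the sample rows are all assumptions chosen for illustration, and the merge join assumes both inputs are already sorted ascending by the join key.

```javascript
// Nested loop join: compare every left row with every right row.
// Simple, but does left.length * right.length comparisons.
function nestedLoopJoin(left, right, key) {
  const out = [];
  for (const l of left) {
    for (const r of right) {
      if (l[key] === r[key]) out.push({ ...l, ...r });
    }
  }
  return out;
}

// Merge join: walk both inputs in step. Assumes left and right are
// both sorted ascending by `key`.
function mergeJoin(left, right, key) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < left.length && j < right.length) {
    if (left[i][key] < right[j][key]) {
      i++;
    } else if (left[i][key] > right[j][key]) {
      j++;
    } else {
      // Emit every right row that shares the current left row's key.
      for (let k = j; k < right.length && right[k][key] === left[i][key]; k++) {
        out.push({ ...left[i], ...right[k] });
      }
      // Advance only the left side, so a repeated left key matches the
      // same run of right rows again.
      i++;
    }
  }
  return out;
}

// Hypothetical sample data, sorted by artistId.
const artists = [
  { artistId: 1, name: 'A' },
  { artistId: 2, name: 'B' },
  { artistId: 3, name: 'C' },
];
const sets = [
  { artistId: 1, stage: 'Main' },
  { artistId: 1, stage: 'Tent' },
  { artistId: 3, stage: 'Main' },
];

const joinedNested = nestedLoopJoin(artists, sets, 'artistId');
const joinedMerge = mergeJoin(artists, sets, 'artistId');
```

On sorted input both functions return the same rows in the same order; the difference is that the merge join never revisits right rows whose key is smaller than the current left key.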

Text Deduplication in SQL

Data deduplication is essential when importing similar data from different sources. Different providers store data differently, and several variations (both correct and incorrect) exist in the English language for names of people, companies, and entities in general. Deduplication is often made easier if there is a lot of other information associated with the data because it gives you several things to compare to identify a dupe (such as birthday for people, location for company, etc.). When you’re trying to identify duplicate names only, things get a bit tricky.

Mapping the Snapchat Data Leak

I’ve been following the Snapchat data leak pretty closely the past few weeks, from the announced weakness to the actual leak of the phone numbers. What I found most interesting about this in particular was that instead of email addresses, password hashes, or credit cards, the leaked data was geographical, mappable data.

Hashing With SQL Server CLR

I have been looking at using hashes in a computed column to determine equality among rows, rather than comparing each column. While running some tests, I encountered a limitation with SQL Server’s HASHBYTES function: the input can only be 8000 bytes or smaller. This won’t work for our purposes, as some of our tables have NVARCHAR(MAX) columns whose maximum length exceeds 8000 bytes. One solution I’m looking into is using a CLR function. UPDATE: I’ve added remarks and benchmarks for the undocumented function “fn_repl_hash_binary”.