Publishing compute-histogram (or unexpected depth)

I published my first module (compute-histogram) to npm this weekend. I originally whipped up a quick histogram implementation in JavaScript to better understand how frequent wide swings are in the Treasury bond market. Recently I decided to improve the functionality, specifically by adding heuristics for automatically determining the number of bins. The shape of a histogram can vary widely depending on the number of bins.
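To make the bin-count problem concrete, here is a minimal sketch of two textbook heuristics, Sturges' rule and the Freedman-Diaconis rule. These are not necessarily the heuristics compute-histogram uses, and the function names (sturgesBinCount, quantile, freedmanDiaconisBinCount) are hypothetical names chosen for illustration. The quantile helper assumes linear interpolation, which is only one of several common conventions.

```javascript
// Sturges' rule: ceil(log2(n)) + 1 bins for n data points.
function sturgesBinCount(n) {
  return Math.ceil(Math.log2(n)) + 1;
}

// Quantile of an ascending-sorted array using linear interpolation (0 <= q <= 1).
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Freedman-Diaconis rule: bin width = 2 * IQR / cbrt(n),
// then bin count = ceil(range / width).
function freedmanDiaconisBinCount(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  if (n < 2) return 1; // too little data to estimate a spread

  const iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
  const range = sorted[n - 1] - sorted[0];
  if (iqr === 0 || range === 0) return 1; // degenerate data: fall back to one bin

  const width = (2 * iqr) / Math.cbrt(n);
  return Math.ceil(range / width);
}

console.log(sturgesBinCount(100)); // 8
console.log(freedmanDiaconisBinCount([1, 2, 3, 4, 5, 6, 7, 8])); // 2
```

Sturges' rule depends only on the number of data points, while Freedman-Diaconis also looks at the spread of the data, so the two can disagree sharply on the same data set.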

Histograms are a rather simple statistical concept, but I will admit I was surprised by how deep you can go into the topic. My implementation only scratches the surface. The NumPy implementation, which is fairly robust, runs to about 1,000 lines of code and comments, and even it doesn’t fully explore bin size estimation.

In many ways, it is easier to write an application today than it was 10 years ago, but it is still possible to go insanely deep on many topics. User authentication is table stakes for any SaaS application, yet it is surprisingly complex. In fact, entire companies focus on the problem. One of my favorite articles on user authentication, by one of the authors of Django’s registration module, is “Let’s talk about usernames.” It spends 6,000 words on the uniqueness of usernames.
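To give a flavor of why uniqueness is harder than comparing two strings, here is a minimal sketch that treats usernames as equal when they match after Unicode normalization and lowercasing. The helper names (usernameKey, isUsernameTaken) are hypothetical, and the choice of NFKC plus toLowerCase is an assumption made for illustration. A real system would need more than this, for example handling characters from different scripts that merely look alike.

```javascript
// Hypothetical helper: reduce a username to a canonical key for uniqueness checks.
// NFKC folds compatibility forms (such as fullwidth letters) into their plain
// equivalents; toLowerCase makes the comparison case-insensitive.
function usernameKey(username) {
  return username.normalize("NFKC").toLowerCase();
}

// Hypothetical check against a list of existing usernames.
function isUsernameTaken(candidate, existingUsernames) {
  const taken = new Set(existingUsernames.map(usernameKey));
  return taken.has(usernameKey(candidate));
}

console.log(isUsernameTaken("ALICE", ["alice"])); // true
console.log(isUsernameTaken("\uFF21lice", ["alice"])); // true (fullwidth "A")
console.log(isUsernameTaken("bob", ["alice"])); // false
```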

For software developers and designers, especially at smaller companies, there is a constant tradeoff between depth and breadth. Small companies have to go deep on some topics, specifically the areas that differentiate them in the market, but most can’t afford expert-level knowledge in topics like user authentication. I think the real art of product development is knowing when you’ve gone deep enough.