To date we have tried out and used plenty of databases. Some of them are relational, and some are document stores or column stores. We have tried out MySQL, Cassandra, Redis and MongoDB. But our Assortment API poses an interesting challenge that has to be handled carefully. Before going into the details of how we actually handled it, let's shed some light on what assortment means. Each website has its own set of products across different categories. Apart from category, a product has a number of important attributes: subcategory, product type, price, discount, etc. Each source tries its best to showcase its products competitively with respect to other websites. Our Assortment API provides intelligence on the assortment of products for each of these sources. For example, a customer might like to know how many products each source has in the jeans category within the price range 1000 to 2000, broken down by brand.
The challenge is to build an API flexible enough to sustain all the query requirements. This needs a database with a very expressive query language. The queries can vary from category to category. One customer might like to view apparel assortments by size, fit and price range. Another might want to view laptop assortments by RAM, processor speed and price range. It would be unwise to have one assortment API per category. From an engineering perspective, the challenge is to accommodate every request in one solid API, with enough flexibility under the hood to serve any request at scale.
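To make the "one API for every category" idea concrete, here is a minimal sketch of what a single request shape might look like. The `AssortmentQuery` class, its field names, and the sample attribute names and values are all hypothetical and chosen for illustration only; they are not our actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class AssortmentQuery:
    """Hypothetical request shape; every field name here is illustrative."""
    category: str
    # Exact-match attribute filters, e.g. {"fit": "slim"}
    filters: Dict[str, Any] = field(default_factory=dict)
    # Inclusive numeric ranges, e.g. {"price": (1000, 2000)}
    ranges: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    # Attributes to break the product counts down by
    group_by: List[str] = field(default_factory=list)


# An apparel request and a laptop request share the same shape;
# only the attribute names and values differ (all values are made up).
apparel = AssortmentQuery(
    category="apparel",
    ranges={"price": (1000, 2000)},
    group_by=["source", "size", "fit"],
)
laptops = AssortmentQuery(
    category="laptops",
    filters={"ram_gb": 8},
    ranges={"price": (30000, 60000)},
    group_by=["source", "processor_speed"],
)
```

Because the category-specific parts live in plain dictionaries and lists rather than in the endpoint itself, a new category would only need new attribute names, not a new API.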
We have found MongoDB to do just that. Its query language is expressive. The aggregation framework is robust and the Python driver serves all your needs. Since we have pushed the aggregation framework to its limits, I will spend a few more sentences praising it. Thanks to its pipeline, you can push all your operations (limit and many more) into one single query and still get back your data in real time. Currently the DB has a few hundred thousand products in each collection, yet it still delivers results with subsecond latency. This happens on a single Mongo server! Things are bound to get interesting later on when we spread the data across multiple nodes and shards.
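As a rough illustration of the pipeline idea, the jeans question from earlier (products per source and brand, priced between 1000 and 2000) could be written as a single aggregation. This is only a sketch under assumptions: the helper `build_assortment_pipeline` is hypothetical, the document fields `category`, `price`, `source` and `brand` are an assumed schema, and the choice of `$match`, `$group`, `$sort` and `$limit` stages is ours for the example rather than a record of our production query.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_assortment_pipeline(
    category: str,
    ranges: Optional[Dict[str, Tuple[float, float]]] = None,
    group_by: Optional[List[str]] = None,
    limit: int = 50,
) -> List[Dict[str, Any]]:
    """Build a MongoDB aggregation pipeline as plain Python dicts.

    Hypothetical helper; field names are an assumed schema.
    """
    # Filter first so later stages only see the relevant products.
    match: Dict[str, Any] = {"category": category}
    for attr, (low, high) in (ranges or {}).items():
        match[attr] = {"$gte": low, "$lte": high}

    # Group on the requested attributes; None puts everything in one group.
    group_id = {attr: f"${attr}" for attr in (group_by or [])} or None

    return [
        {"$match": match},
        {"$group": {"_id": group_id, "product_count": {"$sum": 1}}},
        {"$sort": {"product_count": -1}},
        {"$limit": limit},
    ]


pipeline = build_assortment_pipeline(
    category="jeans",
    ranges={"price": (1000, 2000)},
    group_by=["source", "brand"],
)
```

With the Python driver, a pipeline like this would be passed to `aggregate` on a collection, for example `list(products.aggregate(pipeline))`, where `products` is a hypothetical PyMongo collection handle. The whole question is answered in one round trip to the server.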
Things look promising at this point, and we believe we should be able to scale this Assortment API over time, from serving a few million products to hundreds of millions in the future.