Exploring MongoDB’s Aggregation Pipeline

The perfect solution to Fisher Guiding’s complex search requirements

by Jonathan Cox

Space. Time. Spacetime. While recently reading Why Does E=mc²? (And Why Should We Care?) by Brian Cox (no relation) and Jeff Forshaw, I learned that distances in four-dimensional spacetime are invariant and can easily be measured using the equation s² = (ct)² - x², where c is conventionally called the speed of light.* Easily.

Thankfully, this post deals with programming challenges that are much less complex but do involve space and time in the guises of maps and calendars. More specifically, I’ll be discussing some complicated search requirements and how we fulfilled them using MongoDB’s aggregation pipeline, just one component of a large custom application we recently built for Fisher Guiding.

Think of Fisher Guiding as “Airbnb for anglers”: an app that lets a fisher book a fishing trip with a guide near a particular place on a particular date. Here’s how it works. A visitor arrives at the site and enters, at a minimum, the name of a destination where they’d like to cast a line. Upon clicking “search,” the would-be fisher sees a list of guides, their locations and the prices of their lowest-cost trips. That’s the simple version. Under the hood, that single action puts a number of variables into play. The destination is transformed from a location name into a pair of coordinates, which are then checked against the coordinates of available trips; the resulting list of trips is sorted by geographic proximity and grouped by guide, with each guide’s lowest trip price attached.
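To make the shape of that result concrete, here is a plain-Python sketch of the grouping and ordering steps run over in-memory data rather than in MongoDB. It is illustrative only: the Trip fields, the sample values and the choice to report each guide’s nearest trip distance are assumptions, not Fisher Guiding’s actual data model.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    guide: str    # hypothetical guide identifier
    price: float  # trip price
    miles: float  # distance from the searched destination


def group_by_guide(trips):
    """Collapse trips into one entry per guide, keeping the lowest price and
    (by assumption) the nearest trip's distance, then sort by distance."""
    best = {}
    for t in trips:
        entry = best.get(t.guide)
        if entry is None:
            best[t.guide] = {"guide": t.guide, "min_price": t.price, "miles": t.miles}
        else:
            entry["min_price"] = min(entry["min_price"], t.price)
            entry["miles"] = min(entry["miles"], t.miles)
    # Nearest guides first.
    return sorted(best.values(), key=lambda e: e["miles"])


trips = [Trip("ana", 350, 12.0), Trip("ben", 200, 4.5), Trip("ana", 275, 15.0)]
print(group_by_guide(trips))
```

With the made-up trips above, "ben" comes first (4.5 miles, lowest price 200) and "ana" second (12.0 miles, lowest price 275, not the 350 of her nearer trip).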

It only gets more complicated if the visitor takes advantage of the optional filters: number of allowed guests, trip type, trip attributes and price. But by far the most complicated component is the date filter. Guides manage their availability like repeating events, so they can create a regular schedule by choosing the days of the week on which they’re available. This regular schedule can be complemented with a custom schedule on a date-by-date basis. For example, if I have recurring availability on Saturdays but will be visiting Grandma on the last day of April (a Saturday), I can mark myself unavailable on the 30th. Similarly, if I’m usually unavailable on Fridays but happen to be free one Friday, I can mark myself available on that particular day. This presented another challenge: how do we turn this combined schedule into a set of dates that can be queried efficiently enough that search results are returned quickly?
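A weekly schedule plus date-by-date exceptions can be expanded into concrete dates with a short loop. This is a minimal sketch, not Fisher Guiding’s code. It assumes a hypothetical representation in which the regular schedule is a set of weekday numbers (Python’s date.weekday(), Monday = 0) and the custom schedule is a dict of per-date overrides that always take precedence.

```python
from datetime import date, timedelta


def expand_availability(weekdays, overrides, start, end):
    """Return the available dates from start to end (inclusive), in order.

    weekdays:  set of weekday numbers (Monday=0 ... Sunday=6) from the
               guide's regular weekly schedule (hypothetical format).
    overrides: dict mapping a specific date to True (available) or
               False (unavailable); an override beats the weekly rule.
    """
    available = []
    day = start
    while day <= end:
        # Use the date-specific override if there is one; otherwise
        # fall back to the regular weekly schedule.
        if overrides.get(day, day.weekday() in weekdays):
            available.append(day)
        day += timedelta(days=1)
    return available


# Saturdays (weekday 5), away on Saturday April 30, extra Friday April 22.
april = expand_availability(
    {5},
    {date(2016, 4, 30): False, date(2016, 4, 22): True},
    date(2016, 4, 1),
    date(2016, 4, 30),
)
print(april)
```

In April 2016, when the 30th fell on a Saturday, this yields April 2, 9, 16, 22 and 23. Precomputing a list like this whenever a guide edits their schedule, and storing it, is one way to end up with the queryable set of dates the paragraph asks for.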

We initially planned to use MongoDB as a complement to the standard MySQL database. It would essentially hold a pre-warmed cache of denormalized availability data, letting us avoid running expensive application code on each search request and bypass the hydration process for entities (the app uses Doctrine as its ORM). But as it turned out, that was only the first of numerous advantages offered by MongoDB. Its geospatial commands let us skip a complex SQL implementation of the haversine formula, the approach traditionally used to calculate distances in a relational database, and its powerful aggregation pipeline let us completely avoid an SQL/application-code hybrid for generating our search results.
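For comparison, here is the great-circle calculation the haversine formula performs, written in Python rather than SQL. It is a sketch assuming a spherical Earth with a mean radius of 3,958.8 miles. With a GeoJSON point and a 2dsphere index, MongoDB’s $geoNear does the spherical distance work for us, so none of this has to live in a query.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; a spherical approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude)
    points given in degrees, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    # a is the haversine of the central angle between the two points.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# One degree of latitude is roughly 69 miles.
print(round(haversine_miles(0, 0, 1, 0), 2))
```

Note the argument order is (latitude, longitude), whereas GeoJSON coordinates, as used by $geoNear, are written [longitude, latitude], a common source of bugs when moving between the two.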

MongoDB’s aggregation pipeline processes documents in a sequence of stages. The following is a general overview of how a Fisher Guiding search moves through them.

  1. $geoNear — In this stage, the collection of Trip documents is queried to find all trips that fall within a certain radius of the submitted coordinates. A stage is not limited to its primary operator: $geoNear also accepts a query option that takes ordinary MongoDB query operators. So by the time the $geoNear stage is complete, we have already filtered out any trips that do not match the optional filters, if any were used (price, minimum guests, etc.).
  2. $lookup — This is like a left outer join in a relational database and lets us combine data from two different document collections. In this case, Availability documents are joined to Trip documents that share the same guide identifier. At this point, we can also filter out any trips with no available dates inside the visitor’s requested date range, if one was provided.
  3. $project — That’s the verb project, like actors do with their voices. This stage lets you reshape the data from the documents in the current result set into any form you wish — the new form does not even need to match any existing document structure. This is where we append the distance in miles and available dates to each Trip as miles and days, respectively.
  4. $group — At this point, our result set is still essentially lots and lots of trips. As its name indicates, this is the stage in which the newly restructured documents are grouped by guide, effectively transforming the results from a list of trips into a list of guides. One notable detail at this stage is that we use the $min accumulator to keep each guide’s lowest trip price (an accumulator such as $first would instead return an arbitrary price from within the same grouping).
  5. $sort — The last stage simply sorts our search results by ascending distance. This step is needed because $group does not preserve the distance ordering produced by $geoNear.
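The stages above can be sketched as a pymongo-style pipeline, a plain list of stage dicts. This is a hypothetical reconstruction, not the actual Fisher Guiding query: the collection and field names (availability, guideId, price, dates) are placeholders. It also assumes one availability document per guide holding a dates array of datetimes, and trips that carry a GeoJSON location with a 2dsphere index.

```python
from datetime import datetime

METERS_PER_MILE = 1609.344  # exact length of the international mile in meters


def build_search_pipeline(lng, lat, radius_miles, filters=None, start=None, end=None):
    """Return the aggregation stages for a trip search (hypothetical schema)."""
    pipeline = [
        # 1. $geoNear must be the first stage; its "query" option applies the
        #    optional filters (price, guests, ...) during the distance search.
        {"$geoNear": {
            "near": {"type": "Point", "coordinates": [lng, lat]},  # [longitude, latitude]
            "distanceField": "distance",  # reported in meters for GeoJSON points
            "maxDistance": radius_miles * METERS_PER_MILE,
            "query": filters or {},
            "spherical": True,
        }},
        # 2. $lookup joins each trip to its guide's availability document.
        {"$lookup": {
            "from": "availability",
            "localField": "guideId",
            "foreignField": "guideId",
            "as": "availability",
        }},
        # Flatten the joined array; trips whose guide has no availability are dropped.
        {"$unwind": "$availability"},
    ]

    days = "$availability.dates"
    if start is not None and end is not None:
        # Keep only trips with at least one available date inside the range...
        pipeline.append({"$match": {
            "availability.dates": {"$elemMatch": {"$gte": start, "$lte": end}},
        }})
        # ...and, in $project below, keep only those in-range dates.
        days = {"$filter": {
            "input": "$availability.dates",
            "as": "d",
            "cond": {"$and": [{"$gte": ["$$d", start]}, {"$lte": ["$$d", end]}]},
        }}

    pipeline += [
        # 3. $project reshapes each trip, appending "miles" and "days".
        {"$project": {
            "guideId": 1,
            "price": 1,
            "miles": {"$divide": ["$distance", METERS_PER_MILE]},
            "days": days,
        }},
        # 4. $group collapses trips into guides, keeping the lowest price.
        {"$group": {
            "_id": "$guideId",
            "minPrice": {"$min": "$price"},
            "miles": {"$min": "$miles"},
            "trips": {"$push": {"tripId": "$_id", "days": "$days"}},
        }},
        # 5. $group does not preserve order, so sort guides by distance.
        {"$sort": {"miles": 1}},
    ]
    return pipeline


pipeline = build_search_pipeline(-106.6, 35.1, 50, {"price": {"$lte": 400}},
                                 datetime(2016, 4, 1), datetime(2016, 4, 30))
print([next(iter(stage)) for stage in pipeline])
```

With pymongo, the list would be passed to Collection.aggregate (for example, db.trips.aggregate(pipeline)). datetime.datetime values are used rather than datetime.date because pymongo encodes the former as BSON dates. Converting meters to miles in $project mirrors the description above; $geoNear’s distanceMultiplier option would be an alternative.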

I did not previously know about Mongo’s aggregation pipeline, so discovering it at the beginning of this project was rather fortuitous — I cannot now think of a technology better suited to the project’s search requirements. Instead of a complicated mix of SQL and application code, we have a single, fluent query that lets us search, filter and transform the data into a consumable set of results.

Oh, and how fast is it? It may not quite reach the speed of light, but try it for yourself.

* I say “conventionally” because, as the authors write, c is actually the speed of particles with no mass. What’s more, photons actually have the potential to move faster than c over short distances. Can you tell that particle physics is a recent fascination of mine!? In addition to their book on relativity, I highly recommend Cox and Forshaw’s The Quantum Universe (And Why Anything That Can Happen, Does) and Richard Feynman’s popular lectures on quantum electrodynamics, published simply as QED.

