Sometimes, it looks like it is not possible to avoid using an accumulating array, a pattern that feels unnatural in Ruby.
Recently, I needed to chase down and unroll pagination links over a JSON / REST API. I don’t know how many pages there will be, and it’s probable (but not guaranteed) that I need to retrieve and use all of the content. Since each page depends on results from the previous page, there is no obvious Enumerable parallel. Here, I’ll demonstrate a quick refactoring that results in a clean, lazy enumerable object.
This being a HATEOAS API, the next page link is embedded in the response JSON, tucked under a ["links"]["next"] key. To fetch all of the data, I end up with code that looks something like:
    def retrieve_all_pages(url)
      widgets = []
      while url
        response = connection.get url
        json = JSON.parse response.body
        url = json["links"]["next"]
        widgets.concat json["widgets"]
      end
      widgets
    end
As Ruby goes, this is pretty ugly. When there’s something distinct to enumerate over, it’s recommended to replace the accumulating widgets array with a #map call or some other Enumerable method. Unfortunately, there isn’t a clear parallel for this case.
Another issue is that this loop fetches every single page before returning control, regardless of how many results I actually end up using.
Fortunately, there is Enumerator, Ruby’s answer to producing generators. The Enumerator class produces an enumerable backed by an arbitrary block. I can refactor the while loop to look more like this:
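    def retrieve_all_pages(url)
      Enumerator.new do |yielder|
        # Copy into a block-local variable so every enumeration restarts
        # from the first page instead of sharing state through the closure.
        next_url = url
        while next_url
          response = connection.get next_url
          json = JSON.parse response.body
          next_url = json["links"]["next"]
          Array(json["widgets"]).each do |widget|
            yielder << widget
          end
        end
      end
    end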
We’ve gotten rid of the accumulator array, and instead have something that looks much closer to idiomatic Ruby. Additionally, nothing is fetched until the caller starts iterating. The enumerator then yields widgets as soon as the first page arrives and retrieves additional pages only when needed, without client code needing to understand the mechanics of the underlying pagination.
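To see the laziness in action, here is a minimal, self-contained sketch. FakeConnection, the PAGES hash, and the /widgets?page=N URLs are all hypothetical stand-ins for the real HTTP client and API, and the connection is passed in as an argument purely so the example can run on its own. The sketch also copies the starting URL into a block-local variable so that the enumerator restarts from the first page each time it is enumerated.

```ruby
require "json"

# Hypothetical stand-in for the HTTP connection: serves canned JSON pages
# and records which URLs were requested.
class FakeConnection
  Response = Struct.new(:body)

  attr_reader :requested

  def initialize(pages)
    @pages = pages
    @requested = []
  end

  def get(url)
    @requested << url
    Response.new(JSON.generate(@pages.fetch(url)))
  end
end

# Made-up pages in the shape described above: widgets plus a "next" link.
PAGES = {
  "/widgets?page=1" => { "widgets" => [1, 2], "links" => { "next" => "/widgets?page=2" } },
  "/widgets?page=2" => { "widgets" => [3, 4], "links" => { "next" => "/widgets?page=3" } },
  "/widgets?page=3" => { "widgets" => [5],    "links" => { "next" => nil } }
}.freeze

def retrieve_all_pages(connection, url)
  Enumerator.new do |yielder|
    next_url = url # block-local, so each enumeration starts over
    while next_url
      json = JSON.parse(connection.get(next_url).body)
      next_url = json["links"]["next"]
      Array(json["widgets"]).each { |widget| yielder << widget }
    end
  end
end

connection = FakeConnection.new(PAGES)
widgets = retrieve_all_pages(connection, "/widgets?page=1")

first_three = widgets.first(3)                    # => [1, 2, 3]
requested_after_first = connection.requested.dup  # only pages 1 and 2 were fetched
```

Asking for three widgets stops the enumeration partway through the second page, so the third page is never requested.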
It is important to note that instead of returning an Array, retrieve_all_pages now returns an Enumerator, which is Enumerable and generally quacks the same. It’s rather unlikely that any client of the original implementation was relying on direct Array semantics; if one was, a simple to_a converts the Enumerator to a normal Array.
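As a rough illustration of where the two types differ, the sketch below uses a hand-built enumerator of made-up widget names in place of the paginated one. Enumerable methods such as map behave as they would on an Array, while indexing with [] is Array-only and needs the to_a conversion first.

```ruby
# Hypothetical widget names, standing in for results from the API.
widgets = Enumerator.new do |yielder|
  %w[sprocket gear cog].each { |name| yielder << name }
end

upcased   = widgets.map(&:upcase)     # Enumerable methods work unchanged
indexable = widgets.respond_to?(:[])  # false: Enumerator has no #[]
as_array  = widgets.to_a              # explicit conversion for Array semantics
second    = as_array[1]               # "gear"
```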
Overall though, I find the resulting enumerator to be far easier to work with in standard Ruby, providing a more flexible and versatile encapsulation of the remote interface.