Dec 21

Hypermedia APIs, Part One

Part of me bristles when I hear someone say “Hypermedia API.” I worry it’ll become the sort of phrase, like “semantic web,” that means different things to different people, and ends up covering such a breadth of ideas that it’s impossible to argue for or against without specifying which flavor you’re addressing.

Nonetheless, when I see DHH arguing against Hypermedia APIs, I worry that we’re in serious “die, heretic scum” territory. I’m no expert, but the difference between REST and Hypermedia really doesn’t seem that large, especially in a universe where SOAP is a thing. Moreover, Rails deserves a lot of credit for demonstrating that web APIs could work within HTTP rather than try to reinvent it. Out of the box, Rails checks three out of four of Steve Klabnik’s boxes, and all we’re arguing over is that last one.

Anyway, what prompted this was a post by Adam Keys, my former Gowalla colleague. I agree with most of what he’s saying here. My gut reaction to Hypermedia APIs is this:

Roughly 90% of it is sensible stuff that I’ve already seen in the wild and which is demonstrably a Good Idea. The remaining 10% is the stuff that (at this early stage) seems non-intuitive, or overkill, or YAGNI, or whatever the word is for a thing that you think is awesome but which your users won’t give a damn about.

In fact, that last thing is my chiefest concern. The final 10% seems to require nontrivial re-education on the part of consumers. I don’t mean they’d have to be brainwashed; I just mean that some of the stated benefits only come to pass if the consumers buy in, and in my experience an API consumer wants to do the simplest thing that could possibly work. I believe this is what Adam is getting at in his follow-up post.

The Gowalla API

Adam’s opinions on hypermedia are informed, in part, by his time at Gowalla, and so are mine. Before I convince myself it’s a bad idea, let’s take a retrospective look at the Gowalla API (which, by the way, was started in 2008–2009) and see how it measures up against a hypermedia rubric.

Things we did right

Addressability

URLs identified resources. A spot had the same URL whether you were requesting an HTML representation in a web browser or a JSON representation from curl. If a form could create a resource by POSTing some multipart form data to a URL, odds are a client could create the same resource by POSTing some JSON to that same URL.

This was less like a knowing philosophical decision and more like a thing that Rails just does by default. Until rather late in the game, if you were using the Gowalla API, your requests were hitting the same controllers and actions as web users’ requests. (Eventually we decided to move API stuff into dedicated controllers for maintainability’s sake, but that tilting-at-a-windmill saga will have to be told on another day.)

Content negotiation

As implied above, the API was driven by content negotiation. If you asked for HTML, you got a browser representation; if you asked for JSON, you got a pure data representation. (If you asked for XML, we pretended we didn’t hear you.)
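A rough sketch of the idea in plain Ruby, outside of Rails. Everything here is hypothetical: the SPOT data, the RENDERERS table, and the negotiate helper are stand-ins for illustration, not Gowalla code. The Accept parsing is deliberately naive (q-values and wildcards are ignored), and answering 406 to an XML request is this sketch's choice, not something the post documents.

```ruby
require "json"

# Hypothetical stand-in for a spot record; not real Gowalla data.
SPOT = { "name" => "Red Lobster", "url" => "/spots/15555" }

# Representations this sketch is willing to serve, keyed by MIME type.
RENDERERS = {
  "text/html"        => ->(spot) { "<h1>#{spot["name"]}</h1>" },
  "application/json" => ->(spot) { JSON.generate(spot) }
}

# Pick the first MIME type in the Accept header that we can render.
# Simplification: q-values and wildcards are ignored.
# Returns [status, content_type, body].
def negotiate(accept_header, resource)
  accept_header.split(",").each do |entry|
    mime = entry.split(";").first.to_s.strip
    renderer = RENDERERS[mime]
    return [200, mime, renderer.call(resource)] if renderer
  end
  # Nothing we can render (e.g. a client asking only for XML).
  [406, "text/plain", "Not Acceptable"]
end

status, content_type, body = negotiate("application/json", SPOT)
```

The same resource and the same URL sit behind both renderers; only the requested representation changes.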

HATEOAS

We endeavored to practice what Steve calls HATEOAS: Hypermedia As The Engine Of Application State. To over-simplify: a response should publicize the URLs of any resources that are reasonably related to it.

(By the way: I do not come down on one side or the other here. If there’s a natural workflow to your API, as there was for Gowalla’s, it obviously makes sense to publicize related resources rather than force a user to memorize your URL-making conventions. On the other hand, odds are high that your API consumers will make assumptions about your URL schemes anyway. So I’m not sure what HATEOAS gets you in the real world, except for the ability to say “I told you so.” Which, admittedly, is underrated.)

But back to Gowalla. If you were authenticated with Gowalla and requested the resource for your own user profile, this is a snapshot of what you saw:

// GET /users/savetheclocktower
{
  "stamps_count": 14,
  "stamps_url": "/users/savetheclocktower/stamps",

  "pins_count": 11,
  "pins_url": "/users/savetheclocktower/pins",

  "top_spots_url": "/users/savetheclocktower/top_spots",

  "friends_count": 44,
  "friends_url": "/users/savetheclocktower/friends",

  // ...

  "visited_spots_urls_url": "/users/savetheclocktower/visited_spots_urls"
}

Nearly every meaningful kind of resource is discoverable by starting at this response and navigating through the various URLs. (Of course, not every API use case would start with loading a specific user’s profile. For instance, those that were interested mainly in the place database would probably start with the result set of spots from a geographical search.) Though the URL conventions were simple enough that a client could build URLs on their own, we tried to make it so that building URLs was harder than just using the URLs that we’d given you in the response. This gave us a theoretical freedom to change URLs in the future (not that we’d ever want to do so, we thought).

This style — in which everything ending in url points to another resource — is just one version of what HAL or Collection+JSON are trying to formalize. It’s a pattern that worked very well for us. It made our API very “surfable,” and though I doubt we had machine discovery in mind when we were doing it, it did mean that the API explorer I built was a lot of fun to use — anything that looked like a URL was hyperlinked, and clicking on it would load that new resource in the explorer. We updated the URL hash, too, so the back button would return you to the previous resource.
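To make the convention concrete, here is a minimal sketch of how a client (or an explorer like the one described above) might discover links in a response. The collect_links helper is hypothetical, and the rule it applies, "a string property named url or ending in _url is a link," is an assumption drawn from the sample responses in this post.

```ruby
require "json"

# Recursively collect every string property whose name is "url" or ends in
# "_url". Returns a hash of dotted property paths => URLs.
def collect_links(node, path = [], links = {})
  case node
  when Hash
    node.each do |key, value|
      here = path + [key]
      if value.is_a?(String) && key.to_s.match?(/(\A|_)url\z/)
        links[here.join(".")] = value
      else
        collect_links(value, here, links)
      end
    end
  when Array
    node.each_with_index { |item, i| collect_links(item, path + [i], links) }
  end
  links
end

# A trimmed copy of the user response shown earlier.
response = JSON.parse(<<~BODY)
  {
    "stamps_count": 14,
    "stamps_url": "/users/savetheclocktower/stamps",
    "friends_count": 44,
    "friends_url": "/users/savetheclocktower/friends"
  }
BODY

links = collect_links(response)
```

A client that works from links never has to assemble a URL by hand, which is the whole point of the pattern.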

Crucial to all of this is that the API used a resource’s URL as its unique identifier, rather than a raw ID. This is the part that Rails didn’t give you out of the box, so credit to Scott for designing it this way.

What we could’ve done better

API Versioning

Rather than version our API with MIME types, we used a separate X-Gowalla-API-Version header, defaulting to the most recent version if a JSON-requesting client omitted this header.

I don’t necessarily think that our approach was wrong — only that if we’d made people opt into a particular MIME-type, rather than just the generic application/json, and if the MIME-type was tied to a particular API version, we likely would’ve had fewer incidents where changes we made inadvertently broke third-party tools.
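The difference between the two approaches can be sketched in a few lines. Only the X-Gowalla-API-Version header name comes from the post; the vendor MIME type (application/vnd.gowalla.v1+json), the LATEST_VERSION value, and both helper names are made up for illustration.

```ruby
# Hypothetical; the post does not say how many API versions existed.
LATEST_VERSION = 2

# Hypothetical vendor MIME type, e.g. "application/vnd.gowalla.v1+json".
VENDOR_TYPE = %r{\Aapplication/vnd\.gowalla\.v(\d+)\+json\z}

# What the post describes: a separate header, silently defaulting to the
# newest version when the client omits it.
def version_from_header(headers)
  Integer(headers.fetch("X-Gowalla-API-Version", LATEST_VERSION))
end

# The alternative: the version is part of the MIME type, so a client sending
# generic application/json gets no silent default.
# Returns nil when no version was requested; the caller could answer 406.
def version_from_accept(headers)
  match = VENDOR_TYPE.match(headers.fetch("Accept", "").strip)
  match ? Integer(match[1]) : nil
end
```

With the header approach, a client that never sent a version is moved to each new release automatically, which is how a third-party tool can break without anyone touching it. With the MIME-type approach, that client would have had to pick a version up front.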

Discoverability

When I said that every resource was discoverable, I was lying. Nearly all GET requests were discoverable. Anything that required a POST (and any GET that involved query parameters) wasn’t documented within the API itself, so you’d have to dig into the API documentation to figure out exactly how they worked. If we were doing it over again now, it’s possible that we’d toss in query templates or something like it, but I suspect we wouldn’t have bothered.
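One reading of "query templates" is something like URI templates (RFC 6570): the response advertises a URL with named holes in it, and the client fills them in. The sketch below assumes a made-up property such as "spots_search_url": "/spots{?lat,lng,radius}" and handles only the {?a,b,c} form of the syntax; the helper name and the parameter names are hypothetical.

```ruby
require "erb" # for ERB::Util.url_encode

# Expand only the "{?a,b,c}" form of a URI template (form-style query
# expansion). Variables with no supplied value are left out.
def expand_query_template(template, values)
  template.sub(/\{\?([^}]*)\}/) do
    pairs = $1.split(",").map do |name|
      value = values[name]
      "#{name}=#{ERB::Util.url_encode(value.to_s)}" unless value.nil?
    end.compact
    pairs.empty? ? "" : "?#{pairs.join("&")}"
  end
end

search_url = expand_query_template(
  "/spots{?lat,lng,radius}",
  { "lat" => 30.448674, "lng" => -90.105324 }
)
```

The template tells a client which parameters exist without a trip to the documentation, though it still says nothing about what they mean.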

Sub-resources

We never really figured out the best way to do sub-resources. Consider a checkin, which referenced one user and one spot:

// GET /checkins/131072

{
  "created_at": "2010-12-21T01:03:15-06:00",
  "message": "I am eating here under protest.",
  "url": "/checkins/131072",

  "user": {
    "first_name": "Andrew",
    "last_name": "Dupont",
    "url": "/users/savetheclocktower",
    "image_url": "http://some.crazy.cdn.url/jklyjksljkrewus.jpg",
    "hometown": "Austin, TX",
    "photos_url": "/users/savetheclocktower/photos"
  },

  "spot": {
    "name": "Red Lobster",
    "url": "/spots/15555",
    "image_url": "http://some.crazy.cdn.url/jjkpwopresas.jpg",
    "lat": 30.448674,
    "lng": -90.105324,
    "address": {
      "street_address": "123 Fake St.",
      "locality": "New Orleans",
      "region": "LA",
      "iso3166": "US"
    }
  }

  // ...
}

Now, we don’t want to dump the whole user resource into our response, but neither do we want to force someone to follow a URL to learn anything about the person who checked in. So we chose the middle ground: include a “concise” representation of the resource. In this case, the properties we show from the sub-resources are the things we’d need to know if we were rendering the checkin in a list; with this response, I can render the sentence “Andrew checked in at Red Lobster,” along with a user avatar and a spot icon, without having to make any other requests.

This eventually got crazy, though, because a sub-resource could plausibly have a half-dozen representations of varying lengths, each of which could be justified from context. For instance, if you requested a user’s checkins, you’d get a list of these:

// GET /users/savetheclocktower/checkins
{
  "checkins": [
    {
      "created_at": "2010-12-21T01:03:15-06:00",
      "message": "I am eating here under protest.",
      "url": "/checkins/131072",

      "user": {
        "first_name": "Andrew",
        "last_name": "Dupont",
        "url": "/users/savetheclocktower"
      },

      "spot": {
        "name": "Red Lobster",
        "url": "/spots/15555",
        "image_url": "http://some.crazy.cdn.url/jjkpwopresas.jpg",
        "lat": 30.448674,
        "lng": -90.105324,
        "address": {
          "street_address": "123 Fake St.",
          "locality": "New Orleans",
          "region": "LA",
          "iso3166": "US"
        }
      }
    },
    {
      "created_at": "2010-12-21T01:02:44-06:00",
      "message": "I am in need of fuel for my car.",
      "url": "/checkins/130808",

      "user": {
        "first_name": "Andrew",
        "last_name": "Dupont",
        "url": "/users/savetheclocktower"
      },

      "spot": {
        "name": "Chevron",
        "url": "/spots/91142",
        "image_url": "http://some.crazy.cdn.url/oahkhjs.jpg",
        "lat": 30.444994,
        "lng": -90.105416,
        "address": {
          "street_address": "919 Fake St.",
          "locality": "New Orleans",
          "region": "LA",
          "iso3166": "US"
        }
      }
    },

    // ...
  ]
}

Here, the spot resource is using the same representation that it did for an individual checkin, but the user resource is much more sparse. Why? Because (a) in this response, all the checkins are guaranteed to be from the same user, and the redundancy bothered the hell out of me; (b) chances are you followed this URL from the response for /users/savetheclocktower and thus already have the full representation of this user.

If you were to ask for a single spot’s checkins, the situation would be reversed — the user representation would be the same as for a single checkin, but the spot representation would be as minimal as possible.

We managed this complexity as best we could. First we added a to_public_json method on models, so named because it wasn’t trying to be exhaustive like to_json; it merely wanted to expose properties that would be relevant for a public API. It optionally took a symbol argument that would specify a named representation, much like DateTime#to_formatted_s lets you choose between date formats. When even that got too complicated, Brad Fults wrote an awesome thing called Boxer that centralized all this logic in a place that was neither a controller nor a model.

I’d always wished for YAML-style anchors and references in JSON, but I didn’t want to do anything crazy with our JSON responses that put an extra burden on API consumers. Still, if I were to do it over again, I’d probably do something like this:

// (hypothetically)
// GET /users/savetheclocktower/checkins

{
  "includes": {
    "users": {
      "savetheclocktower": {
        "first_name": "Andrew",
        "last_name": "Dupont",
        "url": "/users/savetheclocktower",
        "image_url": "http://some.crazy.cdn.url/jklyjksljkrewus.jpg",
        "hometown": "Austin, TX",
        "photos_url": "/users/savetheclocktower/photos"
      }
    },
    "spots": {
      "15555": {
        "name": "Red Lobster",
        "url": "/spots/15555",
        "image_url": "http://some.crazy.cdn.url/jjkpwopresas.jpg",
        "lat": 30.448674,
        "lng": -90.105324,
        "address": {
          "street_address": "123 Fake St.",
          "locality": "New Orleans",
          "region": "LA",
          "iso3166": "US"
        }
      },
      "91142": {
        "name": "Chevron",
        "url": "/spots/91142",
        "image_url": "http://some.crazy.cdn.url/oahkhjs.jpg",
        "lat": 30.444994,
        "lng": -90.105416,
        "address": {
          "street_address": "919 Fake St.",
          "locality": "New Orleans",
          "region": "LA",
          "iso3166": "US"
        }
      }
    }
  },

  "checkins": [
    {
      "created_at": "2010-12-21T01:03:15-06:00",
      "message": "I am eating here under protest.",
      "url": "/checkins/131072",

      "user": { "include": "/users/savetheclocktower" },
      "spot": { "include": "/spots/15555" }
    },
    {
      "created_at": "2010-12-21T01:02:44-06:00",
      "message": "I am in need of fuel for my car.",
      "url": "/checkins/130808",

      "user": { "include": "/users/savetheclocktower" },
      "spot": { "include": "/spots/91142" }
    },
    // ...
  ]
}

All sub-resources would get put into a hierarchical repository at the root of the response, and the structure of that repository would mirror the URL structure, so that when you saw an object with an “include” property, you could try to look it up locally and then fall back to another HTTP request if necessary. This is probably overkill, but dammit, if I’m going to introduce an extra-language convention into JSON, I’m going to give it some style.
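On the consuming side, the lookup described above might go something like this. It is a sketch of a hypothetical convention, so every name in it is an assumption: resolve_include is invented, and the fallback is a block the caller supplies (an HTTP GET in real life, a stub here).

```ruby
# Resolve an { "include" => "/spots/15555" } reference against the response's
# "includes" repository, whose nesting mirrors the URL path. Falls back to the
# given block (e.g. an HTTP GET) when the resource is not found locally.
def resolve_include(response, reference)
  url = reference.fetch("include")
  segments = url.split("/").reject(&:empty?) # "/spots/15555" => ["spots", "15555"]

  # Walk the repository one path segment at a time.
  node = segments.empty? ? nil : response["includes"]
  segments.each { |segment| node = node.is_a?(Hash) ? node[segment] : nil }

  return node unless node.nil?
  block_given? ? yield(url) : nil
end

# A trimmed version of the hypothetical response above.
response = {
  "includes" => {
    "spots" => {
      "15555" => { "name" => "Red Lobster", "url" => "/spots/15555" }
    }
  },
  "checkins" => [
    { "url" => "/checkins/131072", "spot" => { "include" => "/spots/15555" } }
  ]
}

spot = resolve_include(response, response["checkins"][0]["spot"])
```

A consumer would need only this one helper to use the format, which is about the smallest burden an invented convention can impose.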

The Verdict

On reflection, I think we did pretty well, especially considering that these decisions were made incrementally over the course of two years. I can think of only one instance when the API design painted us into a corner, and that’s the story I’ll save for next time.

Comments

  1. Andrew, thanks for sharing. This was a great article. I was wondering how you handle getting sub-resources internally and populating the final response?

  2. Thanks, Sergey.

    Over the life of Gowalla, we generated API JSON three different ways. But there’s one constant across all three phases: we were always working with an ActiveRecord object, and so a resource’s associations were always within reach. If we were rendering a spot, we had a spot object, and spot.user would always give us back a user. If we had a trip object, the spots that belonged to the trip were accessible via trip.spots.

    Phase 1: we generated the JSON with ERB. An API request to /users/savetheclocktower would hit the show action on the users controller, which would in turn grab some data and send it to a template called show.json.erb. Inside that ERB file, we’d take the result and manually build a hash with the attributes we wanted to return. If there was a subresource, we’d build a hash representation for it (staying in the same template) and attach it to the first hash. If it was a collection of subresources, we’d do the same thing, but over the entire collection. (Ruby’s Enumerable#map is a godsend for this sort of thing.)

    So for phase 1, we went into each sub-resource manually and picked which of its attributes would be serialized for the parent object’s response.

    Phase 2: we had to_public_json methods on each model object, and each would return a hash that was ready to be serialized via to_json. As I mentioned, such methods would optionally take a symbol argument specifying which “kind” of representation we wanted. So the method might look like this:

    class Checkin < ActiveRecord::Base
      def to_public_json(format=:full)
        hash = {
          :message    => self.message,
          :created_at => self.created_at,
          :spot       => self.spot.to_public_json,
          :user       => self.user.to_public_json
        }
        
        # Perhaps attach other properties, depending on which format we wanted
        if format == :full
          # et cetera
        end
        
        hash
      end
    end

    So in phase 2, we relied on the sub-resources to serialize themselves with their own to_public_json methods. Of course, sometimes we wanted to control the format of sub-resources’ serializations, so we’d do something like this:

    class Checkin < ActiveRecord::Base
      def to_public_json(format=:full, options={})
        hash = {
          :message    => self.message,
          :created_at => self.created_at,
          :spot       => self.spot.to_public_json(options[:spot_format] || :short),
          :user       => self.user.to_public_json(options[:user_format] || :short)
        }
        
        # ...
      end
    end

    This worked well enough, but it lived in the models, and the format of an API response probably shouldn’t be a model’s concern.

    Thus, phase 3 was Boxer. Boxer’s README has some great examples of how it would handle the above situation.

    I hope I understood your question right. If this doesn’t answer it, please let me know.

  3. Hey Andrew, I’m interested to know how the API used the resource’s URL as its unique identifier rather than a raw ID? Assuming resources themselves still used a raw ID, at what point did you dissect it from the URL?

  4. Ryan, Rails routes handled that for us for 99% of cases. By the time a request hit the controller, the URL had been split into its parts, and there was a params[:id] waiting for us.

    If, for some reason, we had (e.g.) a spot URL and we had to turn it into a spot record, we’d define a method like Spot.find_by_url that would handle turning the URL into something that we could use to query the database. I know a few of our models had such a method, but I don’t remember exactly when we needed it.

  5. Andrew, you answered my question precisely. I agree that the API response shouldn’t be on the model, but with attributes on the model or reflection we can help the API response serializer generate a response. The phase 3 Boxer seems very interesting; unfortunately, I’m not doing this in Ruby.
