Why is OAuth2 still hard?

Abstract

  1. the OAuth standard is just too big and complex
  2. everybody’s OAuth is different in subtle ways
  3. many APIs add nonstandard extensions to OAuth
  4. debugging OAuth flows is hard
  5. cumbersome approvals to build on top of APIs
  6. OAuth security is hard and a moving target
  7. other things: token storage, scope checks, refresh race conditions, revocation, and more

OAuth is a standard protocol. Right? And there are client libraries for OAuth 2.0 available in basically every programming language you can imagine.

You might conclude that, armed with a client library, you would be able to implement OAuth for any API in about 10 minutes. Or at least in an hour.

If you manage, please email us — we’d like to treat you to a delicious dinner and hear how you did it.

OAuth in practice

We implemented OAuth for the 50 most popular APIs, such as Google (Gmail, Calendar, Sheets etc.), HubSpot, Shopify, Salesforce, Stripe, Jira, Slack, Microsoft (Azure, Outlook, OneDrive), LinkedIn, Facebook and other OAuth APIs.

Our conclusion: The real-world OAuth experience is comparable to JavaScript browser APIs in 2008. There’s a general consensus on how things should be done, but in reality every API has its own interpretation of the standard, implementation quirks, and nonstandard behaviors and extensions. The result: footguns behind every corner.

If it weren’t so annoying, it would be quite funny. Let’s dive in!

Problem 1: The OAuth standard is just too big and complex

“This API also uses OAuth 2.0, and we already did that a few weeks ago. I should be done by tomorrow.”
– Famous last words from the intern

OAuth is a very big standard. The official OAuth 2.0 site currently lists 17 different RFCs (documents defining a standard) that together define how OAuth 2 works. They cover everything from the OAuth framework and Bearer tokens to threat models and private key JWTs.

“But,” I hear you say, “surely not all of these RFCs are relevant for a simple third-party-access token authorization with an API?”
You’re right. Let’s focus only on the things that are likely to be relevant for the typical API third-party-access use case:

  • OAuth standard: OAuth 2.0 is the default now, but OAuth 1.0a is still used by some (and 2.1 is around the corner). Once you know which one your API uses, move on to:
  • Grant type: Do you need authorization_code, client_credentials, or device_code? What do they do, and when should you use each of them? When in doubt, try authorization_code.
  • Side note: Refresh tokens are also a grant type, but kind of a special one. How they work is standardized, but how you ask for them in the first place is not. More on that later.
  • Now that you’re ready for your requests, let’s look at the many (72, to be precise) official OAuth parameters with a defined meaning and behavior. Common examples are prompt, scope, audience, resource, assertion, and login_hint. However, in our experience, most API providers seem to be as oblivious to this list as you probably were until now, so don’t worry too much about it.
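To make the list above concrete, here is a minimal sketch of building the authorize URL for an authorization_code flow. The endpoint, client ID, and callback URL are placeholders we made up for illustration; which extra parameters (prompt, audience, and so on) an API expects has to come from its docs.

```python
import secrets
from typing import Optional
from urllib.parse import urlencode


def build_authorize_url(
    authorize_endpoint: str,
    client_id: str,
    redirect_uri: str,
    scopes: list[str],
    extra_params: Optional[dict] = None,
) -> tuple[str, str]:
    """Return (url, state) for an authorization_code flow."""
    # `state` is a random value you store and compare on the callback (CSRF protection).
    state = secrets.token_urlsafe(32)
    params = {
        "response_type": "code",  # selects the authorization_code grant
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # the spec uses space-delimited scopes
        "state": state,
    }
    # Provider-specific parameters such as prompt, audience, or login_hint go here.
    params.update(extra_params or {})
    return f"{authorize_endpoint}?{urlencode(params)}", state


url, state = build_authorize_url(
    "https://provider.example/oauth/authorize",  # hypothetical endpoint
    client_id="my-client-id",
    redirect_uri="https://app.example/callback",
    scopes=["read", "write"],
    extra_params={"prompt": "consent"},
)
print(url)
```

The function itself is the easy part. Knowing what belongs in `extra_params` for each API is where the time goes.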

If you think this still feels too complicated and like a lot to learn, we tend to agree with you.

Most teams building public APIs seem to agree as well. Instead of implementing the full OAuth 2.0 standard, they implement only the parts of OAuth they think they need for their API’s use case. This leads to pretty long pages in docs outlining how OAuth works for this particular API. But we have a hard time blaming them; they have only the best intentions in mind for their DX. And if they truly tried to implement the full standard, you’d need to read a small book!

The Salesforce authorization_code OAuth flow. What’s not to like about a clear visual for this simple 10-step process?

The trouble is that everybody has a slightly different idea of which subset of OAuth is relevant for them, so you end up with lots of different (sub-) implementations.

Problem 2: Everybody’s OAuth is different in subtle ways

As every API implements a different subset of OAuth, you quickly get into a situation where you are forced to read their long pages of OAuth docs in detail:

  • Which parameters do they require in the authorize call?
    • For Jira, the audience parameter is key (and must be set to a specific fixed value). Google prefers to handle this through different scopes but really cares about the prompt parameter. Meanwhile, somebody at Microsoft discovered the response_mode parameter and demands that you always set it to query.
    • The Notion API takes a radical approach and does away with the ubiquitous scope parameter. In fact, you won’t even find the word “scope” in their API docs. Notion calls them “capabilities,” and you set them when you register the app. It took us 30 confused minutes to understand what was going on. Why did they reinvent this wheel?
    • It gets worse with offline_access: Most APIs these days expire access tokens after a short while. To get a refresh token, you need to request “offline_access,” which needs to be done through a parameter, a scope, or something you set when you register your OAuth app. Ask your API or OAuth doctor for details.
  • What do they want to see in the token request call?
    • Some APIs, like Fitbit, insist on getting data in the headers. Most really want it in the body, encoded as application/x-www-form-urlencoded, except for a few, such as Notion, which prefer to get it as JSON.
    • Some want you to authenticate this request with Basic auth. Many don’t bother with that. But beware, they may change their mind tomorrow.
  • Where should I redirect my users to authorize?
    • Shopify and Zendesk have a model in which every user gets a subdomain like {subdomain}.myshopify.com. And yes, that includes the OAuth authorization page, so you’d better build dynamic URLs into your model and frontend code.
    • Zoho Books has different data centers for their customers in different locations. Hopefully, they remember where their data resides: To authorize your app, your U.S. customers should go to https://accounts.zoho.com, Europeans can visit https://accounts.zoho.eu, and Indians are welcome at https://accounts.zoho.in. The list goes on.
  • But at least I can pick my callback URL, no?

We could go on for a long time, but we think you probably get the point by now.
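One way to cope with these differences is to describe each provider as data and let a single code path build the requests. The sketch below is only an illustration under our own assumptions: the `ProviderConfig` fields, the two example entries, and all hosts and paths are hypothetical and not copied from any provider's documentation.

```python
import base64
import json
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class ProviderConfig:
    # Templates may contain placeholders such as {subdomain}, filled in per customer.
    authorize_url_template: str
    token_url_template: str
    token_body_format: str = "form"  # "form" or "json"
    client_auth: str = "body"  # "body" or "basic"


# Illustrative entries only; every value here is an assumption.
PROVIDERS = {
    "per-tenant-shop": ProviderConfig(
        authorize_url_template="https://{subdomain}.shop.example/oauth/authorize",
        token_url_template="https://{subdomain}.shop.example/oauth/token",
    ),
    "json-basic": ProviderConfig(
        authorize_url_template="https://api.provider.example/oauth/authorize",
        token_url_template="https://api.provider.example/oauth/token",
        token_body_format="json",
        client_auth="basic",
    ),
}


def authorize_base_url(cfg: ProviderConfig, **placeholders: str) -> str:
    """Fill per-customer values (subdomain, data center, ...) into the URL template."""
    return cfg.authorize_url_template.format(**placeholders)


def build_token_request(cfg, client_id, client_secret, code, redirect_uri, **placeholders):
    """Return (url, headers, body_bytes) for the code-for-token exchange."""
    payload = {"grant_type": "authorization_code", "code": code, "redirect_uri": redirect_uri}
    headers = {}
    if cfg.client_auth == "basic":
        # Simplified: RFC 6749 also asks for form-encoding the ID and secret first.
        creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
        headers["Authorization"] = f"Basic {creds}"
    else:
        payload["client_id"] = client_id
        payload["client_secret"] = client_secret
    if cfg.token_body_format == "json":
        headers["Content-Type"] = "application/json"
        body = json.dumps(payload).encode()
    else:
        headers["Content-Type"] = "application/x-www-form-urlencoded"
        body = urlencode(payload).encode()
    return cfg.token_url_template.format(**placeholders), headers, body
```

Keeping the quirks in configuration rather than in branching code at least makes them visible in one place, although some providers will still need behavior that no config field anticipates.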

OAuth is too complex; let’s make a simpler version of OAuth that has everything we need! ©XKCD

Problem 3: Many APIs add nonstandard extensions to OAuth

Even though the OAuth standard is vast, many APIs still seem to find gaps in it for features they need. A common issue we see is that you need some data in addition to the access_token to work with the API. Wouldn’t it be neat if this additional data could be returned to you together with the access_token in the OAuth flow?

We actually think this is a good idea, or at least it’s better than forcing users to do quirky additional API requests afterward to fetch this information (looking at you, Jira). But it does mean more nonstandard behavior that you specifically need to implement for every API.

Here’s a small list of nonstandard extensions we have seen:

  • QuickBooks employs a realmId, which you need to pass in with every API request. The only time they tell you this realmId is as an additional parameter in the OAuth callback. Better store it somewhere safe!
  • Braintree does the same with a companyID.
  • Salesforce uses a different API base URL for each customer; they call this the instance_url. Thankfully, they return the instance_url of the user together with the access token in the token response, but you do need to parse it out from there and store it.
  • Unfortunately, Salesforce also does even more annoying things: Access tokens expire after a preset period of time, which can be customized by the user. Fine so far, but for some reason they don’t tell you in the token response when the access token you just received will expire (everybody else does this). Instead, you need to query an additional token details endpoint to get the (current) expiration date of the token. Why, Salesforce, why?
  • Slack has two different types of scopes: scopes you hold as a Slack bot and scopes that allow you to take action on behalf of the user who authorized your app. Smart, but instead of just adding different scopes for each, they implemented a separate user_scopes parameter that you need to pass in the authorization call. You’d better be aware of this, and good luck finding support for this in your OAuth library.
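A defensive pattern for extensions like these is to separate the standard fields from everything else and store the extras alongside the token, so nothing the provider sends is silently dropped. Below is a minimal sketch; the function names are ours, and the sample inputs (including the exact spelling of the extra parameters) are assumptions for illustration.

```python
import json
from urllib.parse import parse_qs, urlparse

# Fields defined for the token response in RFC 6749, section 5.1.
STANDARD_TOKEN_FIELDS = {"access_token", "token_type", "expires_in", "refresh_token", "scope"}
# Query parameters defined for a successful authorization callback.
STANDARD_CALLBACK_PARAMS = {"code", "state"}


def split_token_response(raw_json: str) -> tuple[dict, dict]:
    """Split a token response into (standard fields, provider-specific extras)."""
    data = json.loads(raw_json)
    standard = {k: v for k, v in data.items() if k in STANDARD_TOKEN_FIELDS}
    extras = {k: v for k, v in data.items() if k not in STANDARD_TOKEN_FIELDS}
    return standard, extras


def callback_extras(callback_url: str) -> dict:
    """Return nonstandard query parameters the provider appended to the callback."""
    query = parse_qs(urlparse(callback_url).query)
    return {k: v[0] for k, v in query.items() if k not in STANDARD_CALLBACK_PARAMS}


# Hypothetical inputs modeled on the examples in the list above.
standard, extras = split_token_response(
    '{"access_token": "abc", "token_type": "Bearer", "instance_url": "https://acme.example"}'
)
print(extras)
print(callback_extras("https://app.example/callback?code=xyz&state=s1&realmId=12345"))
```

Storing the extras as an opaque blob next to the token means a new provider quirk usually needs a new read path, not a schema change.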

For the sake of brevity and simplicity, we’re skipping the many not-really-standard OAuth flows we have encountered.

Problem 4: “invalid_request” — debugging OAuth flows is hard

Debugging distributed systems is always hard. It’s harder when the service you’re working with uses broad, generic error messages.

OAuth2 has standardized error messages, but they’re about as useful in telling you what’s going on as the example in the title above (which, by the way, is one of the recommended error messages from the OAuth standard).

You could argue that OAuth is a standard and that there are docs for every API, so what is there to debug?
A lot. I cannot tell you how often the docs are wrong. Or missing a detail. Or have not been updated for the latest change. Or you missed something when you first looked at them. A good 80% of the OAuth flows we implement have some problem upon first implementation and require debugging.

How did Randall observe me while I was debugging OAuth flows? ©XKCD

Some flows also break for what seem to be random reasons: LinkedIn OAuth, for instance, breaks if you pass in PKCE parameters. The error you get? “client error - invalid OAuth request.” That is … telling? It took us an hour to understand that passing in (optional, usually disregarded) PKCE parameters is what breaks the flow.

Another common mistake is sending scopes that don’t match the ones you preregistered with the app. (Preregister scopes? Yes, a lot of APIs these days demand that.) This often results in a generic error message about there being an issue with scopes. Duh.
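Since the error codes themselves say so little, it helps to capture everything the provider sends with them. The sketch below parses the error fields the standard defines (error, error_description, error_uri) from both the callback redirect and the token endpoint. The `OAuthError` type and the `unparseable_error` fallback code are our own inventions for providers that answer with something else entirely.

```python
import json
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlparse


@dataclass
class OAuthError:
    code: str
    description: Optional[str]
    uri: Optional[str]
    raw: dict  # everything the provider sent, kept for debugging


def error_from_callback(callback_url: str) -> Optional[OAuthError]:
    """Parse an error redirect such as ?error=invalid_request&error_description=..."""
    query = {k: v[0] for k, v in parse_qs(urlparse(callback_url).query).items()}
    if "error" not in query:
        return None
    return OAuthError(query["error"], query.get("error_description"), query.get("error_uri"), query)


def error_from_token_response(status: int, body: str) -> Optional[OAuthError]:
    """Parse the JSON error body of a failed token request."""
    if status < 400:
        return None
    try:
        data = json.loads(body)
    except ValueError:
        data = None
    if not isinstance(data, dict) or "error" not in data:
        # Not every provider follows the spec; keep a slice of the raw body instead.
        return OAuthError("unparseable_error", body[:200] or None, None, {"status": status})
    return OAuthError(str(data["error"]), data.get("error_description"), data.get("error_uri"), data)


err = error_from_callback(
    "https://app.example/callback?error=invalid_request&error_description=bad+scope&state=s"
)
print(err)
```

Logging the full `raw` payload together with the exact request you sent is often what finally reveals the offending parameter.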

Problem 5: Cumbersome approvals to build on top of APIs

The truth is, if you build toward some other system by using their API, you’re probably in the weaker position. Your customers are asking for the integration because they’re already using the other system. Now you need to make them happy.

To be fair, many APIs are liberal and provide easy self-service signup flows for developers to register their apps and start using OAuth. But some of the most popular APIs out there require reviews before your app becomes public and can be used by any of their users. Again, to be fair, most review processes are sane and can be completed in a few days. They’re probably a net gain in terms of security and quality for end users.

But some notorious examples can take months to complete, and some even require you to enter into revenue-share agreements:

  • Google requires a “security review” if you want to access scopes with more sensitive user data, such as email contents. We have heard these reviews can take days or weeks to pass and require a nontrivial amount of work on your side.
  • Looking to integrate with Rippling? Get ready for their 30-plus questions and security preproduction screening. We hear access takes months (if you are approved).
  • HubSpot, Notion, Atlassian, Shopify, and pretty much everybody else who has an integrations marketplace or app store requires a review to get listed there. Some reviews are mild, and some ask you for demo logins, video walkthroughs, blog posts (yes!), and more. However, listing on the marketplace or store is often optional.
  • Ramp, Brex, Twitter, and a good number of others don’t have a self-service signup flow for developers and require that you fill in forms for manual access. Many are quick to process requests, but we’re still waiting to hear back from some after weeks.
  • Xero is a particularly drastic example of a monetized API: If you want to exceed a limit of 25 connected accounts, you have to become a Xero partner and list your app in their app store. They will then take (as of the time of this writing) a 15% revenue cut from every lead generated from that store.

Problem 6: OAuth security is hard and a moving target

As attacks have been uncovered, and the available web technologies have evolved, the OAuth standard has changed as well. If you’re looking to implement the current security best practices, the OAuth working group has a rather lengthy guide for you. And if you’re working with an API that is still using OAuth 1.0a today, you realize that backwards compatibility is a never-ending struggle.

Luckily, security is getting better with every iteration, but it often comes at the cost of more work for developers. The upcoming OAuth 2.1 standard will make some current best practices mandatory, including PKCE (today only a handful of APIs require this), and will add restrictions for refresh tokens.
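For reference, the PKCE part is small once you know the recipe from RFC 7636: generate a random code verifier, send its SHA-256 hash as code_challenge (with code_challenge_method=S256) in the authorize call, and send the verifier itself as code_verifier in the token request. A minimal sketch, with function names of our own choosing:

```python
import base64
import hashlib
import secrets


def challenge_for(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge)."""
    # 32 random bytes yield a 43-character verifier, the minimum length RFC 7636 allows.
    verifier = secrets.token_urlsafe(32)
    return verifier, challenge_for(verifier)


verifier, challenge = make_pkce_pair()
print(verifier, challenge)
```

The verifier has to be stored between the authorize redirect and the callback, typically next to the state value.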

At least OAuth already implements a two-factor auth model. ©XKCD

The biggest change has probably been ushered in with expiring access tokens and the rise of refresh tokens. On the surface, the process seems simple: Whenever an access token expires, refresh it with the refresh token and store the new access token and refresh token.

In reality, when we implemented this we had to consider:

  • Race conditions: How can we make sure no other requests run while we refresh the current access token?
  • Some APIs also expire the refresh token if you don’t use it for a certain number of days (or if the user has revoked the access). Expect some refreshes to fail.
  • Some APIs issue you a new refresh token with every refresh request …
  • … but some also silently assume that you will keep the old refresh token and keep on using it.
  • Some APIs will tell you the access token expiration time in absolute values. Others only in relative “seconds from now.” And some, like Salesforce, don’t divulge this kind of information easily.
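The points above can be combined into one small sketch: a lock so that only one caller refreshes at a time, a fallback to the old refresh token when no new one is issued, and expiry normalized to an absolute timestamp. Everything here is an assumption for illustration: the `Connection` class, the injected `refresh_fn` (which would perform the actual HTTP request), and the `expires_at` field name for providers that report absolute times.

```python
import threading
import time
from typing import Callable, Optional


def absolute_expiry(token_response: dict, now: float) -> Optional[float]:
    """Normalize expiry to a Unix timestamp, whichever form the provider used."""
    if "expires_in" in token_response:  # relative seconds, the form RFC 6749 defines
        return now + float(token_response["expires_in"])
    if "expires_at" in token_response:  # absolute; the field name is an assumption
        return float(token_response["expires_at"])
    return None  # the provider did not say


class Connection:
    def __init__(self, access_token: str, refresh_token: str,
                 expires_at: Optional[float], refresh_fn: Callable[[str], dict]):
        self.access_token = access_token
        self.refresh_token = refresh_token
        self.expires_at = expires_at
        self._refresh_fn = refresh_fn  # injected so the sketch runs without a network
        self._lock = threading.Lock()

    def get_access_token(self, now: Optional[float] = None, leeway: float = 60.0) -> str:
        now = time.time() if now is None else now
        with self._lock:  # one thread refreshes; the others wait and reuse the result
            # Unknown expiry: keep using the token and let a 401 be handled elsewhere.
            if self.expires_at is None or now < self.expires_at - leeway:
                return self.access_token
            response = self._refresh_fn(self.refresh_token)
            self.access_token = response["access_token"]
            # Keep the old refresh token when the provider does not send a new one.
            self.refresh_token = response.get("refresh_token", self.refresh_token)
            self.expires_at = absolute_expiry(response, now)
            return self.access_token
```

Note that a `threading.Lock` only protects a single process. With several workers or servers you would need a lock in the shared datastore instead, and a failing `refresh_fn` (for example, because the refresh token has expired) should mark the connection as needing re-authorization.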

Last but not least: Some things we haven’t talked about yet

Sadly, we have only just scratched the surface of your OAuth implementation. Now that your OAuth flow runs and you get access tokens, it’s time to think about:

  • How to securely store these access tokens and refresh tokens. They are like passwords to your users’ accounts. But hashing is not an option; you need secure, reversible encryption.
  • Checking that the granted scopes match the requested scopes (some APIs allow users to change the scopes they grant in the authorize flow).
  • Avoiding race conditions when refreshing tokens.
  • Detecting access tokens revoked by the user on the provider side.
  • Letting users know that access tokens have expired, so they can re-authorize your app if needed.
  • How to revoke access tokens you no longer need (or that the user has requested you delete under GDPR).
  • Changes in available OAuth scopes, provider bugs, missing documentation, and so on.
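As one example from this list, the scope check can be sketched in a few lines. The standard says the token response may omit scope when it is identical to what was requested, so a missing field is treated as "everything granted" here. Accepting commas as a separator is our own assumption for providers that deviate from the space-delimited format.

```python
import re
from typing import Optional


def parse_scopes(scope_value: Optional[str]) -> set:
    """Split a scope string into a set of scope names."""
    if not scope_value:
        return set()
    # The spec uses spaces; tolerating commas is an assumption for nonstandard providers.
    return {s for s in re.split(r"[\s,]+", scope_value.strip()) if s}


def missing_scopes(requested: list, token_response: dict) -> set:
    """Return the requested scopes that the user or provider did not grant."""
    granted_raw = token_response.get("scope")
    if granted_raw is None:
        # RFC 6749: scope may be omitted when identical to the requested scope.
        return set()
    return set(requested) - parse_scopes(granted_raw)


print(missing_scopes(["read", "write"], {"access_token": "x", "scope": "read"}))
```

If the result is non-empty, your app can disable the affected features or ask the user to re-authorize, rather than failing later with a permissions error.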

A better way?

If you’ve read this far, you might be thinking, “There must be a better way!”

We think there is, which is why we’re building Nango: An open-source, self-contained service that comes with prebuilt OAuth flows, secure token storage, and automatic token refreshes for more than 90 OAuth APIs.

If you give it a try, we’d love to hear your feedback. And if you want to share your worst OAuth horror story with us, we’d love to hear about it in our Slack community.

Thanks for reading and happy authorizing!