Tanya and I have been evaluating authentication use cases against available solutions for Pulp 3. This email includes a summary of the goals we identified, based on input from numerous stakeholders, and a working proposal for a set of technologies we can use to achieve them. Please provide feedback, ask questions, and suggest alternatives if you know of something that may be better. I hope to turn this thread into a set of redmine tasks and stories soon.
Please note that this covers authentication only. Authorization is a related, but distinct, problem that we think can be solved separately.
Planning thus far has been tracked here:
- As a user, I can authenticate to the REST API when pulp is the authn authority.
- As a user, I can authenticate to the REST API using an external authority such as FreeIPA or AD.
- Participate in the django auth ecosystem so we don't have to reinvent things, and so we can integrate well with django add-ons.
- The REST API does not and should not have sessions. Any persistent token should be used for auth only.
- Authenticating to external authorities is hard. To the extent that it's reasonable to do so, leverage other tools for this.
The working proposal is to use djangorestframework-jwt for token-based authentication to the REST API, and leverage a set of apache modules to handle authentication and retrieval of user attributes from an external authority.
Pulp 2 supports client SSL certificates as an option for authentication, and as the only option when using the "login" feature. Client SSL certs are a well-known and tested standard that is robust. However, their complexity has caused continued friction in the user experience. Situations such as an expired certificate, changing the CA on the server, or restricted filesystem access to a certificate are difficult to diagnose in large part because SSL libraries do a poor job of error reporting. Connection negotiation fails, and it can be unclear why.
From a development perspective, working with client SSL certificates can also be challenging. For example, many libraries require a path to a certificate on disk.
As such, while client SSL certificates would be a viable solution for pulp 3, a token-based authentication approach would be simpler and more in line with how other APIs handle authentication.
Token authentication may be marginally less secure than client SSL certificates, since the entire token must be sent with every request. However, in order for the token to be compromised (assuming https is in use), a third party would need the ability to eavesdrop on and decrypt the SSL traffic.
Django and DRF (django rest framework) provide basic auth support out of the box, including password management. This can be enabled for the entire REST API and any other views we want.
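As a rough illustration of "enabled for the entire REST API", a settings fragment along these lines would make basic auth the default for every DRF view. This is a sketch of the usual DRF configuration, not a decision about what Pulp 3's settings will contain.

```python
# settings.py fragment (sketch): make HTTP basic auth the default
# authentication method for all DRF views in the project.
REST_FRAMEWORK = {
    "DEFAULT_AUTHENTICATION_CLASSES": (
        "rest_framework.authentication.BasicAuthentication",
    ),
}
```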
Most modern network-based APIs use some variety of token authentication. A token is a string obtained after proving identity (through basic auth or some other means), which is used to prove identity for future requests by including it in an Authorization header.
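To make the "Authorization header" part concrete, here is a minimal client-side sketch using only the standard library. The URL and token value are made up, and the "JWT" prefix is an assumption based on the default header prefix of the library proposed later in this email; other systems commonly use "Bearer" or "Token".

```python
import urllib.request


def build_authenticated_request(url, token, prefix="JWT"):
    """Return a Request that carries the token in the Authorization header."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"{prefix} {token}")
    return request


# Hypothetical endpoint and token, for illustration only.
req = build_authenticated_request(
    "https://pulp.example.com/api/v3/repositories/", "abc.def.ghi"
)
```

The request is only constructed here, not sent. Every later request would carry the same header until the token expires.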
Authenticating against an external authority can be an expensive operation, particularly in terms of latency and load on the external service. Using tokens allows the external authn to take place only once per user within some period of time.
Tokens normally do not require server-side state. (There are exceptions, such as DRF's own token support, which stores a random string in the DB just because it's convenient.) This reduces database dependence and use, which has slight performance benefits and would allow services to respond to requests even when the database is unavailable. It also allows trust to easily be delegated among services.
One advantage that client SSL certs have over tokens is that they can be explicitly revoked (although doing so may not be easy). Tokens can be indirectly revoked if they contain an "issued" timestamp. For example, it's normal for a token-based authn system to reject a token that was issued before the user's most recent password reset. That said, if tokens are only being used for authn, user access can and should still be enforced by authz, so that is another mechanism for turning off a user's access.
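The indirect revocation check described above can be sketched in a few lines. The function and argument names are hypothetical. It assumes the application records when each user last reset their password and that the token carries its issue time.

```python
from datetime import datetime, timezone


def token_is_current(issued_at, password_changed_at):
    """Return False for tokens issued before the most recent password reset."""
    if password_changed_at is None:
        # The user has never reset their password, so nothing to compare.
        return True
    return issued_at >= password_changed_at


last_reset = datetime(2017, 6, 1, tzinfo=timezone.utc)
old_token_issued = datetime(2017, 5, 31, tzinfo=timezone.utc)
new_token_issued = datetime(2017, 6, 2, tzinfo=timezone.utc)
```

With these values, a token issued on May 31 would be rejected and one issued on June 2 would be accepted.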
DRF includes built-in token support, but it stores tokens in the database. This adds the characteristics of a session identifier, which is not desirable for a stateless REST API.
JSON Web Tokens are a popular open standard (RFC 7519) commonly used for API authentication. They are widely supported by many libraries in many languages, simple to use, small, and can include arbitrary data as is useful to the issuing application. Validity is verified by signature and an optional expiration time.
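For anyone who has not looked inside a JWT, the following standard-library sketch shows the general shape: a base64url-encoded header and payload, joined with dots and signed. It assumes HMAC-SHA256 ("HS256") signing with a shared secret, and the claim names other than "exp" are made up. It is meant to illustrate the signature and expiration checks, not to replace a real JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_jwt(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256)
    return signing_input + "." + _b64url(signature.digest())


def decode_jwt(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Verify the signature and optional "exp" claim, then return the claims."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if "exp" in claims and now >= claims["exp"]:
        raise ValueError("token expired")
    return claims


# Hypothetical claims; "exp" is seconds since the epoch.
token = encode_jwt({"username": "alice", "exp": 2000}, b"server-secret")
```

Note that nothing here touches a database: any service holding the secret can verify the token on its own.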
djangorestframework-jwt is a recommended library that works out of the box, and it is our current proposal for use with Pulp 3. I'll refer to it as "drf-jwt" for brevity.
How to Get a Token
drf-jwt comes with a view that issues a token after successful basic auth, integrating with django's user ecosystem. There is also a view for renewing a token.
Given that pulp also needs to authenticate against external sources, the view used to obtain a token needs to be extended to support that. It's a simple integration point, but before diving more into drf-jwt, let's look at that integration.
The FreeIPA project has a comprehensive guide that is worth reading if you are interested in the topic:
They advise that utilizing any of several apache modules for external auth is a best practice, combined with mod_lookup_identity to pass user attributes to a web application. The primary advantage is that this offloads authentication work, which can be very complex, to a separate project that specializes in it.
One downside is that it does potentially limit deployment to apache httpd. But there is work in progress to make similar modules available for nginx. Presumably users who do not require external auth could use a non-apache web server. Another option is that there are some plugins available for django that support specific types of external authn without requiring apache modules, such as django-auth-ldap.
mod_lookup_identity is the recommended solution by the FreeIPA project for allowing a web app to discover user identity and related attributes from a trusted authentication source. It uses SSSD to look up attributes, and then it sets various REMOTE_USER_* environment variables within the context of a request. Any web application can then trust those values, making it a simple integration point.
Commonly-available attributes include username, email, first name, last name, and group membership.
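On the application side, reading those values could look roughly like this. The specific variable names below (REMOTE_USER_EMAIL and so on) are assumptions: they are chosen in the apache configuration, so a real deployment may use different ones. The function name is hypothetical.

```python
# Assumed mapping from environment variable names to user fields.
# The names on the left are set in the apache configuration and may differ.
ATTRIBUTE_VARS = {
    "REMOTE_USER_EMAIL": "email",
    "REMOTE_USER_FIRSTNAME": "first_name",
    "REMOTE_USER_LASTNAME": "last_name",
}


def remote_user_attributes(environ):
    """Extract the externally authenticated user from a WSGI-style environ.

    Returns None when the web server did not authenticate anyone.
    """
    username = environ.get("REMOTE_USER")
    if not username:
        return None
    attrs = {"username": username}
    for var, field in ATTRIBUTE_VARS.items():
        if var in environ:
            attrs[field] = environ[var]
    return attrs


example = remote_user_attributes(
    {"REMOTE_USER": "alice", "REMOTE_USER_EMAIL": "alice@example.com"}
)
```

This only makes sense when the web server is the sole path to the application, since the application trusts these values without further checks.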
Auto-Creation of Users
In addition to looking for and trusting the REMOTE_USER environment variable, the drf-jwt token creation view would be extended to automatically create and update users. When invoked, it would:
- trust the REMOTE_USER value for authentication
- if the user exists in the DB, update its attributes with the other REMOTE_USER_* values
- if the user does not exist in the DB, create it based on the REMOTE_USER_* values
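The create-or-update steps above can be sketched with a plain dict standing in for the user table. The function name and record layout are hypothetical. In Django the same idea would most likely map onto something like the ORM's update_or_create(), but that detail is not settled here.

```python
def sync_external_user(users, attrs):
    """Create or update a user record from externally supplied attributes.

    `users` is a dict mapping username -> record, standing in for the DB.
    Returns (record, created).
    """
    username = attrs["username"]
    created = username not in users
    record = users.setdefault(username, {"username": username})
    record.update(attrs)  # the external authority's values win
    return record, created


users = {}
first, created_first = sync_external_user(
    users, {"username": "alice", "email": "alice@example.com"}
)
second, created_second = sync_external_user(
    users, {"username": "alice", "email": "alice@corp.example.com"}
)
```

The first call creates the record and the second updates the email on the same record, so the local copy follows the directory each time a token is requested.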
Group membership is an interesting aspect. One compelling approach we saw is to auto-create groups and prefix their names with something like "ext:". A user in the "ops" group in their enterprise directory would get put in an auto-generated pulp group called "ext:ops".
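A minimal sketch of that naming scheme, assuming the directory group names have already been collected into a list. The function and constant names are hypothetical.

```python
EXTERNAL_GROUP_PREFIX = "ext:"


def external_group_names(directory_groups, prefix=EXTERNAL_GROUP_PREFIX):
    """Map directory group names to prefixed, de-duplicated local names."""
    cleaned = {name.strip() for name in directory_groups if name.strip()}
    return sorted(prefix + name for name in cleaned)


pulp_groups = external_group_names(["ops", "dev", "ops"])
# pulp_groups == ["ext:dev", "ext:ops"]
```

The prefix keeps auto-generated groups from colliding with groups that an administrator creates by hand inside pulp.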
Authz will merit its own planning, but presumably we will include the ability to authorize based on group membership. This would provide a nice integration story where a new employee could be added to a group by HR in their enterprise directory, and they would automatically get the corresponding permissions in pulp.
The working proposal is to use djangorestframework-jwt, and extend it to trust the REMOTE_USER_* environment variables when creating tokens. External authentication would be done by one of several apache modules, thus requiring no explicit support in pulp.
Please ask questions and provide feedback.
A big thank-you goes to Tanya, who kept this investigation going and did a ton of the analysis. Also thank you to the many users who provided input, which we will continue to apply while planning authz.