Overview
Identity and authentication are at the core of a secure and manageable cloud platform. This document describes the overall architecture and the specific mechanisms used by the OpenMethods cloud. Access to cloud user interfaces requires the user to be authenticated. Any cloud component that accesses or manipulates cloud data must also be authenticated.
Success metrics
Goal | Metric |
---|---|
Assumptions
Milestones
Requirements
Requirement | User Story | Importance | Jira Issue | Notes | |
---|---|---|---|---|---|
1 | HIGH | ||||
2 |
User interaction and design
Identity Realm
This component is responsible for managing a set of users, including basic user information and user credentials. It also issues and manages security tokens for managed users who authenticate with the realm. The Identity Realm component does not provide a REST API directly, but is meant to be bundled into a more complete package such as the Authentication Server or the Cloud Manager. It also does not manage a user’s permissions, which are centrally controlled from the Cloud Manager.
Authentication Server
An Authentication Server contains an Identity Realm and provides a UI for interactive logins, an API for service logins, and an API for managing security tokens. An Authentication Server is deployed for each cloud shard and ensures that the Personally Identifiable Information (PII) for managed users remains within the shard’s home AWS region. This is important for shards, such as GDPR, that have regulatory requirements for data location and lifecycle.
Cloud Manager
The Cloud Manager provides myriad services related to the operation and management of the OpenMethods cloud, including key roles in user authentication and permission management. The Cloud Manager contains an Identity Realm that is responsible for handling logins and security tokens for Cloud Team users and shared infrastructure systems. It provides APIs for service logins and security tokens similar to those of the Authentication Server, but the interactive login is handled by the Cloud Manager Admin Console instead of an independent UI.
In addition to the identity information for some users, the Cloud Manager is responsible for storing and providing access to extended information for all users regardless of home identity realm. Cloud Manager includes a component that manages the extended user information and exposes a secure REST API for retrieving this information. The REST API requires a valid security token for a user that has the appropriate permissions. Changes to a user’s extended information, including user permissions, can only be made through the Public Console by an authenticated user with appropriate permissions.
User Authentication
There are two types of cloud users: Humans and Systems. There are two styles of login: UI and API. Human logins are restricted to the user interface and cannot be used with the Login API. System logins use the Login API and will not work in the user interface. Humans access the login screen using a browser and provide their username (email address) and password to authenticate. Systems use the Login API and provide the username (special value) and password (special value). Once the user’s credentials are validated, a JWT is generated for the user. A Human user is only allowed to maintain a single active token, so any existing tokens are immediately revoked when a new one is issued. A System user can have any number of active tokens, each with its own lifecycle. There may be some identifiable information related to the specific instance of a system user that can be used to ensure only a single token is active for that instance.
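The token policy described above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the actual implementation: the `TokenRegistry` class, its method names, the `"human"` and `"system"` type strings, and the use of an instance identifier to enforce one token per system instance are all hypothetical.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenRegistry:
    """Hypothetical in-memory registry of active tokens, keyed by username."""
    # username -> {token: instance_id or None}
    active: dict = field(default_factory=dict)

    def issue(self, username: str, user_type: str,
              instance_id: Optional[str] = None) -> str:
        tokens = self.active.setdefault(username, {})
        if user_type == "human":
            # A Human keeps a single active token: revoke any existing ones.
            tokens.clear()
        elif instance_id is not None:
            # Assumed behavior: at most one active token per system instance.
            for old in [t for t, inst in tokens.items() if inst == instance_id]:
                del tokens[old]
        token = secrets.token_urlsafe(16)  # stand-in for a real JWT
        tokens[token] = instance_id
        return token

    def is_active(self, username: str, token: str) -> bool:
        return token in self.active.get(username, {})
```

A second Human login invalidates the first token, while two System instances can hold tokens side by side.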
Human users access the main login screen by visiting their home shard’s authentication server. This will be available at login[.shard].openmethodscloud.com. If the user attempts to visit the public console directly, the public console will redirect the user to the authentication server login page. After the login is successful, a cookie will automatically be set containing the JWT and associated with the shard’s domain.
Roles
Permissions
Extended User Information
User Assigned Roles
User Assigned Permissions
Token Leasing Values
There is a separation between authenticating a user and retrieving the properties associated with a user, including the user’s permissions. The user’s token DOES NOT include the user’s extended information. This information must be retrieved in a separate request to the Cloud Manager. The request uses the token of the system or human user making the request (the requester token) and provides the token of the user whose information is being requested (the query token). The Cloud Manager first determines whether the requester token is authentic and valid. If it is, the Cloud Manager determines whether the requester has the RETRIEVE_EXTENDED_INFORMATION permission. If the requester has that permission, the Cloud Manager determines whether the query token is authentic and valid. If it is, the query user’s extended information is returned to the requester.
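The order of checks in this lookup can be illustrated with a short sketch. It assumes a plain dictionary standing in for token validation; the `User` class, the `AccessDenied` exception, and the `get_extended_info` function are hypothetical names, and only the RETRIEVE_EXTENDED_INFORMATION permission comes from this document.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    permissions: set = field(default_factory=set)
    extended_info: dict = field(default_factory=dict)


class AccessDenied(Exception):
    """Raised when any check in the lookup sequence fails."""


def get_extended_info(valid_tokens: dict, requester_token: str,
                      query_token: str) -> dict:
    """valid_tokens maps authentic, currently valid tokens to their users."""
    # 1. The requester token must be authentic and valid.
    requester = valid_tokens.get(requester_token)
    if requester is None:
        raise AccessDenied("requester token is not valid")
    # 2. The requester must hold the RETRIEVE_EXTENDED_INFORMATION permission.
    if "RETRIEVE_EXTENDED_INFORMATION" not in requester.permissions:
        raise AccessDenied("requester lacks RETRIEVE_EXTENDED_INFORMATION")
    # 3. The query token must be authentic and valid.
    subject = valid_tokens.get(query_token)
    if subject is None:
        raise AccessDenied("query token is not valid")
    # 4. Only then is the query user's extended information returned.
    return subject.extended_info
```

How failures are reported to the caller is not specified here; raising an exception is just one option.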
Token Renewal
When a token is issued for a user, the time it remains valid is limited. After that time elapses, the token is no longer valid and the user must log in again. API requests using an expired token will fail. The token’s expiration can be extended through renewal. This involves making a request to the Login API and providing the original token as well as the security stamp returned during the initial login. The renewal can occur at any time prior to expiry. An updated token is then issued to the requester.
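A minimal sketch of the renewal rules, assuming an in-memory record per token. The `IssuedToken` class, the `renew` function, and the `lifetime` parameter are hypothetical; the actual token lifetime is not stated in this document.

```python
import secrets
from dataclasses import dataclass


@dataclass
class IssuedToken:
    token: str
    security_stamp: str
    expires_at: float  # seconds since the epoch


def renew(current: IssuedToken, presented_stamp: str,
          now: float, lifetime: float) -> IssuedToken:
    """Return an updated token, or raise ValueError if renewal is not allowed."""
    if now >= current.expires_at:
        # Renewal must happen before expiry; afterwards the user logs in again.
        raise ValueError("token has expired")
    if not secrets.compare_digest(presented_stamp, current.security_stamp):
        raise ValueError("security stamp does not match")
    # Issue an updated token with a replacement stamp for the next renewal.
    return IssuedToken(
        token=secrets.token_urlsafe(16),  # stand-in for a real JWT
        security_stamp=secrets.token_urlsafe(8),
        expires_at=now + lifetime,
    )
```

The stamp comparison uses `secrets.compare_digest` so that a mismatch does not leak timing information; that is a design choice of the sketch, not something the document requires.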
Token Leasing
For APIs that receive multiple requests from the same client in quick succession, validating the client’s token on every request can become very expensive and would direct significant traffic at the authentication service. To avoid this, a service can choose to lease a token’s validation result. The user’s extended information includes values related to token leasing. Each value is the length of time for which a previous validation can be reused to service a request without revalidating the token.
The type of request determines which value is used. The lease for read operations is typically the longest, as these requests are generally safe. Some write operations are only moderately dangerous, so they can be performed within a shorter window. Security-critical or destructive operations such as deletes should typically be performed only in conjunction with a token validation, so they usually do not allow a lease window. Whenever a new validation occurs, regardless of the type of request that required it, the clocks for all leases are refreshed.
Scenario 1 - Valid Cache Hit
An initial request comes in and the token is validated. A read request comes in 6 seconds after the initial request. The token is found in the cache. The last validated property of the cached token is checked to see whether the new request falls within the token’s read window by adding the read lease time to the last validated time and comparing the result to the time of the request. The request is inside the window when currentTime < lastValidated + readLease. Since the new request falls within the read window, the request is processed as if the token is valid, without explicitly validating the token with the issuer.
Scenario 2 - Invalid Cache Hit
An initial request comes in and the token is validated. A write request comes in 6 seconds after the initial request. The token is found in the cache. The last validated property of the cached token is checked to see whether the new request falls within the token’s write window by adding the write lease time to the last validated time and comparing the result to the time of the request. The request is outside the window when lastValidated + writeLease < currentTime. Since the new request falls outside the write window, the token is validated with the issuer before the request can be processed. Once validated, the last validated property is updated and the lease windows begin again.
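The two scenarios can be reproduced with a small lease check. This is a sketch only: the `CachedValidation` class, the function names, and the request kind strings are hypothetical, and treating any request that is neither a read nor a write as having no lease window is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CachedValidation:
    last_validated: float  # time of the last validation with the issuer
    read_lease: float      # seconds a validation may be reused for reads
    write_lease: float     # seconds a validation may be reused for writes


def needs_revalidation(entry: CachedValidation, kind: str, now: float) -> bool:
    """Return True when the token must be validated with the issuer again."""
    if kind == "read":
        lease = entry.read_lease
    elif kind == "write":
        lease = entry.write_lease
    else:
        lease = 0.0  # assumed: critical or destructive operations get no lease
    # Inside the window means now < last_validated + lease.
    return now >= entry.last_validated + lease


def mark_validated(entry: CachedValidation, now: float) -> None:
    # A fresh validation restarts every lease window at once.
    entry.last_validated = now
```

With illustrative lease values of 30 seconds for reads and 5 seconds for writes (not values taken from this document), a read arriving 6 seconds after validation is served from the lease, while a write arriving at the same moment triggers a new validation.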
Token Revocation
A token can be revoked by another user that has the appropriate permissions. As a result of features like high availability of authentication servers and token leasing, revoking a token does not take effect immediately. There may be some lag between the time a token is revoked and the time API requests reflect the revocation. The maximum amount of time a revoked token may remain in effect is the authentication server cache cycle plus the longest token lease time frame. In most cases the revocation will be in effect within 30 seconds, and typically much sooner.
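The upper bound on revocation lag is simple arithmetic, shown here as a one-line helper. The function name and the example durations are hypothetical; the document does not state the configured cache cycle or lease values.

```python
def max_revocation_lag(cache_cycle: float, lease_windows: list) -> float:
    """Worst-case seconds a revoked token can still be honored."""
    # Cache cycle of the authentication server plus the longest lease window.
    return cache_cycle + max(lease_windows, default=0.0)
```

For instance, an assumed cache cycle of 20 seconds with lease windows of 10, 3, and 0 seconds gives a worst case of 30 seconds.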
Token Caching
All tokens are stored in a database that is shared by all authentication servers servicing a user realm. For large user realms, the number of token-related requests can negatively impact the overall performance of the authentication server. To avoid accessing the database for every request, the authentication server employs a token caching strategy. When a token is issued after a successful login, it is stored in the database immediately. The token is also placed in the cache. When a validation request arrives for a token, the authentication server first looks for the token in the cache. If the token is found in the cache and is valid, the token is successfully validated. If the token is found in the cache but is invalid, the token is removed from the cache. If the cached token was invalid or the token is not in the cache, a request is made to the database to retrieve the token. If the token is found in the database and is valid, it is placed in the cache and returned. If the token is not found or is not valid, the validation request is denied.
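The lookup order described above can be sketched as a small read-through cache. A dictionary stands in for the shared database, and the `TokenRecord` and `TokenValidator` names are hypothetical. The definition of "valid" used here (ACTIVE state and not yet expired) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TokenRecord:
    token: str
    state: str         # e.g. "ACTIVE"
    expires_at: float

    def is_valid(self, now: float) -> bool:
        # Assumed definition of validity for this sketch.
        return self.state == "ACTIVE" and now < self.expires_at


class TokenValidator:
    def __init__(self, database: dict):
        self.database = database  # stand-in for the shared token database
        self.cache = {}

    def issue(self, record: TokenRecord) -> None:
        # Store in the database immediately, then place in the cache.
        self.database[record.token] = record
        self.cache[record.token] = record

    def validate(self, token: str, now: float) -> bool:
        cached = self.cache.get(token)
        if cached is not None:
            if cached.is_valid(now):
                return True        # valid cache hit, no database access
            del self.cache[token]  # an invalid cached token is evicted
        # Cache miss, or the cached copy was invalid: ask the database.
        stored = self.database.get(token)
        if stored is not None and stored.is_valid(now):
            self.cache[token] = stored
            return True
        return False               # not found or not valid: deny
```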
The cache must also be cleaned up periodically to avoid validating revoked tokens and leaking memory over time. The cache cleanup process runs in the background, independently of incoming requests. When the authentication server starts up, a timestamp is taken and stored as the previous cleanup cycle time. A timer is created to launch the cleanup cycle and is scheduled to fire after the cleanup cycle cooldown time. When the cleanup cycle starts, a new timestamp is taken and stored in a temporary variable. A database request is made to update all tokens in the ACTIVE state whose expire time has passed. A second database request is made to retrieve all tokens not in the ACTIVE state whose end date is after the previous cleanup cycle time. All tokens returned in this list of newly ended tokens are removed from the cache. Finally, the temporary timestamp is stored as the previous cleanup cycle time and the cleanup process is scheduled again.
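One pass of the cleanup cycle might look like the sketch below. A list of dictionaries stands in for the database, and the timer scheduling is left out so the function stays testable. The function name, the field names, the `"EXPIRED"` state name, and setting the end date of an expired token to the cycle start time are all assumptions.

```python
def run_cleanup_cycle(database: list, cache: dict,
                      previous_cycle_time: float, now: float) -> float:
    """Run one cleanup pass and return the new previous-cycle time."""
    cycle_started = now  # temporary timestamp taken when the cycle starts

    # First database request: update ACTIVE tokens whose expire time has passed.
    for row in database:
        if row["state"] == "ACTIVE" and row["expires_at"] <= cycle_started:
            row["state"] = "EXPIRED"         # assumed state name
            row["ended_at"] = cycle_started  # assumed end date

    # Second request: tokens not ACTIVE that ended after the previous cycle.
    newly_ended = [
        row for row in database
        if row["state"] != "ACTIVE" and row["ended_at"] > previous_cycle_time
    ]

    # Remove every newly ended token from the cache.
    for row in newly_ended:
        cache.pop(row["token"], None)

    # The caller stores this as the previous cleanup cycle time and reschedules.
    return cycle_started
```

In a running server the returned timestamp would be kept between passes and the next pass scheduled after the cooldown, for example with `threading.Timer` from the standard library.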
Authentication Server API
/loginUI
Only accepts credentials for human type users.
POST
{ "username": "[username]", "password": "[password]" }
Success Response - 200 OK
Sets a cookie containing the JWT, associated with the authentication server domain used for the login.
{ "JWT": [JWT Content], "securityStamp": "[security stamp used for renewal]" }
/loginSystem
Only accepts credentials for system type users.
POST
{ "username": "[username]", "password": "[password]", "instanceId": "[instance identifiable information]" }
Success Response - 200 OK
{ "JWT": [JWT Content], "securityStamp": "[security stamp used for renewal]" }
/validateToken?critical=true|false
Determines if the provided token is valid.
POST
{ "JWT": [JWT Content] }
Success Response - 200 OK
No response content
/renewToken
Extends the expiry time of the provided token.
POST
{ "JWT": [JWT to Renew. Can be provided as cookie], "securityStamp": "[previous security stamp]" }
Success Response - 200 OK
If JWT was provided as a cookie, JWT is updated with returned content.
{ "JWT": [JWT Content], "securityStamp": "[replacement security stamp used for next renewal]" }
/logoutToken
Logs the provided token out. This is considered a normal end of a token’s lifecycle in contrast to expiry or revocation.
POST
{ "JWT": [JWT to logout. Can be provided as cookie], "securityStamp": "[previous security stamp]" }
Success Response - 200 OK
If JWT was provided as a cookie, the cookie is removed.
No response content
/revokeToken
Cancels the provided token. Requires the CANCEL_TOKEN permission. This is an abnormal end of a token’s lifecycle.
POST
{ "authJWT": [JWT of canceling user. Can be provided as cookie], "JWT": [JWT to Cancel], "securityStamp": "[previous security stamp]" }
Success Response - 200 OK
No response content
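As an illustration of how a system client might prepare calls to these endpoints, the sketch below builds the HTTP requests without sending them. The helper names and the `base_url` parameter are hypothetical. Sending the body as JSON with a `Content-Type: application/json` header and representing the JWT as a string are assumptions, since the document does not specify the wire encoding.

```python
import json
import urllib.request


def build_request(base_url: str, endpoint: str,
                  payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed encoding
        method="POST",
    )


def system_login_request(base_url: str, username: str, password: str,
                         instance_id: str) -> urllib.request.Request:
    # Field names follow the /loginSystem request body.
    return build_request(base_url, "loginSystem", {
        "username": username,
        "password": password,
        "instanceId": instance_id,
    })


def validate_token_request(base_url: str, jwt: str,
                           critical: bool = False) -> urllib.request.Request:
    # The critical flag is passed as a query parameter, as in /validateToken.
    flag = "true" if critical else "false"
    return build_request(base_url, f"validateToken?critical={flag}",
                         {"JWT": jwt})
```

A caller would pass the resulting object to `urllib.request.urlopen` and treat a 200 OK status as success.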
Open Questions
Question | Answer | Date Answered |
---|---|---|