Cloud Authentication

Target release
Epic
Document status	DRAFT
Document owner	Trip Gilman
Designer
Tech lead
Technical writers
QA

Overview

Identity and authentication are at the core of a secure and manageable cloud platform. This document describes the overall architecture and specific mechanisms used by the OpenMethods cloud. Access to cloud user interfaces requires the user to be authenticated. Any cloud component that accesses or manipulates cloud data must also be authenticated.

Success metrics

Goal	Metric

Assumptions

Requirements

	Requirement	User Story	Importance	Jira Issue	Notes
1			HIGH
2

User interaction and design

Identity Realm

This component is responsible for managing a set of users, including basic user information and user credentials. It also issues and manages security tokens for managed users who authenticate with the realm. The Identity Realm component does not provide a REST API directly, but is meant to be bundled into a more complete package such as the Authentication Server or the Cloud Manager. It also does not manage a user’s permissions, which are centrally controlled from the Cloud Manager.

Authentication Server

An Authentication Server contains an Identity Realm and provides an API for service logins and an API for managing security tokens. An Authentication Server is deployed for each cloud shard and ensures that the Personally Identifiable Information (PII) for managed users remains within the shard’s home AWS region. This is important for shards, such as one subject to GDPR, that have regulatory requirements for data location and lifecycle. A second type of Authentication Server is designed to integrate with a customer’s Active Directory (AD) or LDAP domain. In that case the PII is stored and managed externally, while the tokens are still managed locally.

Cloud Manager

The Cloud Manager provides myriad services related to the operation and management of the OpenMethods cloud, including key roles in user authentication and permission management. The Cloud Manager contains an Identity Realm that is responsible for handling logins and security tokens for Cloud Admin users and shared infrastructure systems. It provides APIs for service logins and security tokens similar to those of the Authentication Server. It also provides the UI for all interactive logins for access to the Cloud Manager Admin Console.

In addition to the identity information for some users, the Cloud Manager is responsible for storing and providing access to extended information for all users regardless of home identity realm. Cloud Manager includes a component that manages the extended user information and exposes a secure REST API for retrieving this information. The REST API requires a valid security token for a user that has the appropriate permissions. Changes to a user’s extended information, including user permissions, can only be made through the Public Console by an authenticated user with appropriate permissions.

User Authentication

There are two types of cloud users: Humans and Systems. Human logins are restricted to the user interface and cannot be used with the Login API. Humans access the login screen using a browser and provide their username (email address) and password to authenticate. Once the user’s credentials are validated, a JWT is generated for the user. A Human user is only allowed to maintain a single active token; issuing a new token for a Human user automatically revokes any existing tokens for that user. Human users access the main login screen by visiting the Cloud Management Console. This will be available at console[.shard].openmethodscloud.com and is hosted by the Cloud Manager. After a successful login, a cookie containing the JWT is set automatically and associated with the shard’s domain.

System logins use the Login API and will not work in the user interface. Systems provide a username (special value) and password (special value). A System user can have any number of active tokens, each with its own lifecycle. There may be some identifiable information related to the specific instance of a system user that can be used to ensure only a single token is active for that instance.
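The two issuance policies above can be sketched as follows. This is a minimal illustration, not the realm's actual implementation: the TokenRegistry class, its in-memory dictionary, and the random token ids are all hypothetical, and the per-instance rule is modeled only because the text says instance information may be used that way.

```python
from dataclasses import dataclass, field
from enum import Enum
import secrets


class UserType(Enum):
    HUMAN = "human"
    SYSTEM = "system"


@dataclass
class TokenRegistry:
    """In-memory stand-in for a realm's token store (hypothetical)."""

    # user id -> {token id: instance id, or None when no instance was given}
    active: dict = field(default_factory=dict)

    def issue(self, user_id, user_type, instance_id=None):
        tokens = self.active.setdefault(user_id, {})
        if user_type is UserType.HUMAN:
            # A Human user keeps a single active token, so every
            # existing token is revoked before the new one is issued.
            tokens.clear()
        elif instance_id is not None:
            # Assumed optional rule: one active token per system instance.
            stale = [t for t, inst in tokens.items() if inst == instance_id]
            for token_id in stale:
                del tokens[token_id]
        token_id = secrets.token_hex(8)
        tokens[token_id] = instance_id
        return token_id
```

A System user that logs in from two different instances ends up with two active tokens, while a Human user who logs in twice keeps only the most recent one.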

Roles

A role is a predefined set of permissions that can be applied to a user. A role typically groups permissions based on the type of activities being enabled. There are 4 standard role types: VIEW, MANAGE, ADMIN, and SUPER. The role types generally inherit the capabilities of lower role types. For example, the MANAGE role type includes all the permissions associated with the VIEW role type. It would be hard to MANAGE something if you can’t VIEW it to begin with. A role also has a scope that determines which objects are applicable to a role’s permissions. There are 3 role scopes: DEPLOYMENT, CUSTOMER, SYSTEM. DEPLOYMENT scoped roles hold permissions tied to a specific customer deployment. CUSTOMER scoped roles hold permissions related to a customer and any of their deployments. SYSTEM scoped roles hold permissions related to overall system operation, customer activities, and customer deployments.
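As a rough sketch of how role types and scopes might combine, the example below treats the four role types as a strict ordering and checks a role against a target customer or deployment. The strict ordering is an assumption (the text only says types "generally" inherit), and the Role fields and the role_allows function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Optional


class RoleType(IntEnum):
    # Higher values are assumed to include the capabilities of lower ones.
    VIEW = 1
    MANAGE = 2
    ADMIN = 3
    SUPER = 4


class Scope(Enum):
    DEPLOYMENT = "deployment"
    CUSTOMER = "customer"
    SYSTEM = "system"


@dataclass(frozen=True)
class Role:
    role_type: RoleType
    scope: Scope
    customer_id: Optional[str] = None
    deployment_id: Optional[str] = None


def role_allows(role, required, customer_id, deployment_id=None):
    """Return True if the role grants the required type on the target object."""
    if role.role_type < required:
        return False
    if role.scope is Scope.SYSTEM:
        # SYSTEM scope spans customers and their deployments.
        return True
    if role.customer_id != customer_id:
        return False
    if role.scope is Scope.CUSTOMER:
        # CUSTOMER scope covers the customer and any of its deployments.
        return True
    # DEPLOYMENT scope only matches its own deployment.
    return deployment_id is not None and role.deployment_id == deployment_id
```

Under these assumptions a customer-scoped MANAGE role can view any deployment of that customer, while a deployment-scoped VIEW role does not reach the customer-level object or sibling deployments.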

A user can have any number of roles assigned. A user must be associated with a customer before a customer-scoped or deployment-scoped role can be assigned. Similarly, a user must be associated with the system level object before a system-scoped role can be assigned.

Each customer will have 3 roles automatically created. These default roles reflect the VIEW, MANAGE, and ADMIN role types and contain all permissions for those types. They are scoped to that customer and all deployments owned by that customer. For example, assigning the customer VIEW role allows a user to view all customer information and all information for every deployment of that customer. Each deployment will also have 3 roles automatically created that reflect the VIEW, MANAGE, and ADMIN role types. These roles are specific to the deployment.

Permissions

A permission represents the ability to perform some action within the system. Similar to roles, permissions follow the standard types of VIEW, MANAGE, ADMIN, and SUPER. Each permission also targets a specific scope of DEPLOYMENT, CUSTOMER, or SYSTEM. A permission can be assigned either to a role or to a user directly.

Extended User Information

User Assigned Roles
User Assigned Permissions
Token Leasing Values

There is a separation between authenticating a user and retrieving the properties associated with a user, including the user’s permissions. The user’s token DOES NOT include the user’s extended information. This information must be retrieved in a separate request to the Cloud Manager. The request carries the token of the system or human user making the request and the id of the user whose information is being requested. The Cloud Manager first determines whether the requester token is authentic and valid. If it is, the Cloud Manager then determines whether the requester has the RETRIEVE_EXTENDED_INFORMATION permission. Only when both checks pass does the Cloud Manager return the extended information.
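The two-step check described above could look roughly like this. It is a sketch under stated assumptions: the ExtendedInfo fields mirror the three items listed in this section, while the get_extended_info function, the validate_token callback, and the dictionary used as a store are hypothetical. For brevity it only looks at permissions assigned directly to the requester and ignores permissions granted through roles.

```python
from dataclasses import dataclass, field

RETRIEVE_EXTENDED_INFORMATION = "RETRIEVE_EXTENDED_INFORMATION"


class AccessDenied(Exception):
    """Raised when the requester may not read extended information."""


@dataclass
class ExtendedInfo:
    roles: list = field(default_factory=list)
    permissions: set = field(default_factory=set)
    lease_seconds: dict = field(default_factory=dict)


def get_extended_info(requester_token, target_user_id, validate_token, store):
    """Return the target user's extended information if the requester may see it.

    validate_token maps a token to the requester's user id, or None when the
    token is not authentic and valid. store maps user id -> ExtendedInfo.
    """
    # Step 1: the requester token must be authentic and valid.
    requester_id = validate_token(requester_token)
    if requester_id is None:
        raise AccessDenied("requester token is not authentic and valid")
    # Step 2: the requester must hold the retrieval permission.
    requester = store.get(requester_id)
    if requester is None or RETRIEVE_EXTENDED_INFORMATION not in requester.permissions:
        raise AccessDenied("requester lacks RETRIEVE_EXTENDED_INFORMATION")
    return store[target_user_id]
```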

Token Renewal

When a token is issued for a user, the time it remains valid is limited. After that time expires, the token is no longer valid and the user must log in again. API requests using an expired token will fail. The token expiration can be extended through renewal. This involves making a request to the Login API and providing the original token as well as the security stamp returned during the initial login. The renewal can occur any time prior to expiry. An updated token will be issued to the requester. A token owner should leave plenty of time prior to expiry to attempt the renewal in case of network delays or temporary outages. Typically the token owner should begin the renewal process with at least a quarter of the token’s lifetime remaining.
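The "quarter of the time remaining" guideline can be turned into a small calculation. The sketch below assumes the token owner knows when the token was issued and when it expires; the function names and the remaining_fraction parameter are illustrative, not part of the Login API.

```python
from datetime import datetime, timedelta, timezone


def renewal_start(issued_at, expires_at, remaining_fraction=0.25):
    """Return the moment at which renewal attempts should begin."""
    lifetime = expires_at - issued_at
    return expires_at - lifetime * remaining_fraction


def should_renew(now, issued_at, expires_at, remaining_fraction=0.25):
    """True once the renewal window has opened and the token has not expired."""
    return renewal_start(issued_at, expires_at, remaining_fraction) <= now < expires_at


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = issued + timedelta(hours=1)
# With a one hour lifetime, renewal attempts begin 45 minutes after issue.
start = renewal_start(issued, expires)
```

Starting early leaves room for several retries before expiry if the first attempt hits a network delay or a temporary outage.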

Token Leasing

For APIs that receive multiple requests from the same client in quick succession, validating the client’s token on every request can become very expensive and would direct significant traffic at the authentication service. To avoid this situation, a service can choose to lease a token’s validation result. The user’s extended information includes values related to token leasing. These values represent the length of time a previous validation can be reused to service a request without having to revalidate the token.

The type of request determines which value is used. The lease time for read operations is typically longer, as these types of requests are generally safe. Some write operations are only moderately dangerous, so they can be performed within a shorter window of time. Security critical or destructive operations such as deletes should typically only be performed in conjunction with a token validation, so they usually don’t allow a lease window. Whenever a request of any type triggers a new validation, the clock for all lease windows is refreshed. A successful lease hit does not restart the clock for the ongoing lease window.
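One way to picture the per-request-type lease values is a small lookup with a zero-length window for critical operations. The class and field names below are hypothetical, and the three-way split into read, write, and critical requests is an assumption based on the description above.

```python
from dataclasses import dataclass
from enum import Enum


class RequestKind(Enum):
    READ = "read"
    WRITE = "write"
    CRITICAL = "critical"  # security critical or destructive, such as deletes


@dataclass(frozen=True)
class LeaseValues:
    """Lease lengths taken from a user's extended information (assumed shape)."""

    read_seconds: float
    write_seconds: float

    def window_for(self, kind):
        if kind is RequestKind.READ:
            return self.read_seconds
        if kind is RequestKind.WRITE:
            return self.write_seconds
        # Critical operations get no lease window, so they always revalidate.
        return 0.0


def needs_validation(kind, leases, last_validated, now):
    """True when the previous validation cannot be reused for this request."""
    if last_validated is None:
        return True
    return now >= last_validated + leases.window_for(kind)
```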

Scenario 1 - Valid Lease Hit

An initial request comes in and the token is validated. A read request comes in 6 seconds after the initial request. The token is found in the leasing cache. The last validated property of the leased token is checked to see if the new request falls into the token’s read window by adding the read lease time to the last validated time and comparing the result to the time of the request. The request is inside the window when currentTime < lastValidated + readLease. Since the new request falls into the read window, the request is processed as if the token is valid without explicitly validating the token with the issuer.

Scenario 2 - Invalid Lease Hit

An initial request comes in and the token is validated. A write request comes in 6 seconds after the initial request. The token is found in the leasing cache. The last validated property of the leased token is checked to see if the new request falls into the token’s write window by adding the write lease time to the last validated time and comparing the result to the time of the request. The request is inside the window when currentTime < lastValidated + writeLease. Since the new request falls outside the write window, the token is validated with the issuer before the request can be processed. Once validated, the last validated property is updated and the lease windows begin again.
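Both scenarios can be replayed with a small leasing cache. The lease lengths used here (10 seconds for reads, 5 seconds for writes) are assumed values picked so that a request arriving at 6 seconds lands inside the read window and outside the write window; the document does not specify actual values. The LeaseCache class and its return strings are hypothetical.

```python
class LeaseCache:
    """Remembers when each token was last validated with the issuer."""

    def __init__(self, validate_with_issuer, read_lease, write_lease):
        self._validate = validate_with_issuer
        self._lease = {"read": read_lease, "write": write_lease}
        self._last_validated = {}

    def check(self, token, kind, now):
        last = self._last_validated.get(token)
        window = self._lease.get(kind, 0.0)
        if last is not None and now < last + window:
            # Lease hit: reuse the earlier result without restarting the clock.
            return "lease-hit"
        if not self._validate(token):
            self._last_validated.pop(token, None)
            return "rejected"
        # A fresh validation restarts the clock for every lease window.
        self._last_validated[token] = now
        return "validated"


issuer_calls = []


def fake_issuer(token):
    # Stand-in for a call to the authentication server.
    issuer_calls.append(token)
    return True


cache = LeaseCache(fake_issuer, read_lease=10.0, write_lease=5.0)
first = cache.check("tok", "read", now=0.0)         # initial request
scenario_1 = cache.check("tok", "read", now=6.0)    # inside the read window
scenario_2 = cache.check("tok", "write", now=6.0)   # outside the write window
```

After the second scenario the last validated time is 6 seconds, so a read arriving at 15 seconds is still served from the lease.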

Token Caching

All tokens are stored in a database that is shared by all authentication servers servicing a user realm. For large user realms, the number of token related requests can negatively impact the overall performance of the authentication server. To avoid accessing the database for every request, the authentication server employs a token caching strategy. When a token is issued after a successful login, it is stored in the database immediately. The token is also placed in the cache. When a validation request arrives for a token, the authentication server first looks for the token in the cache. If the token is found in the cache and is valid, the token is successfully validated. If the token is found in the cache but is invalid, the token is removed from the cache. If the cached token was invalid or the token is not in the cache, a request is made to the database to retrieve the token. If the token is found in the database and is valid, it is placed in the cache and returned. If the token is not found or is not valid, the validation request is denied.
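The lookup order described above can be sketched as a small class. The dictionaries standing in for the shared database and the cache, the record layout, and the definition of "valid" (ACTIVE state and not yet expired) are assumptions made for illustration.

```python
class TokenValidator:
    """Cache-first token validation with a database fallback (sketch)."""

    def __init__(self, database, now):
        self.database = database  # token id -> record; stands in for the shared database
        self.cache = {}
        self.now = now            # callable returning the current time in seconds

    def _is_valid(self, record):
        return record["state"] == "ACTIVE" and record["expires_at"] > self.now()

    def issue(self, token_id, expires_at):
        record = {"state": "ACTIVE", "expires_at": expires_at}
        self.database[token_id] = record     # stored in the database immediately
        self.cache[token_id] = dict(record)  # and also placed in the cache

    def validate(self, token_id):
        cached = self.cache.get(token_id)
        if cached is not None:
            if self._is_valid(cached):
                return True
            # An invalid cached entry is removed before consulting the database.
            del self.cache[token_id]
        record = self.database.get(token_id)
        if record is not None and self._is_valid(record):
            self.cache[token_id] = dict(record)
            return True
        return False
```

Note that a token revoked in the database keeps validating from the cache until the cached copy expires or is evicted, which is why the cache needs the periodic cleanup described next.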

The cache must also be cleaned up periodically to avoid validating revoked tokens and leaking memory over time. The cache cleanup process runs in the background and is not tied to incoming requests. When the authentication server starts up, a timestamp is taken and stored as the previous cleanup cycle time. A timer is created to launch the cleanup cycle and is scheduled to fire after the cleanup cycle cool-down time. When the cleanup cycle starts, a new timestamp is taken and stored in a temporary variable. A database request is made to move all tokens in the ACTIVE state whose expiry time has passed out of the ACTIVE state. A second database request is made to retrieve all tokens not in the ACTIVE state whose end date is after the previous cleanup cycle time. All tokens returned in this list of newly ended tokens are removed from the cache. Finally, the temporary timestamp is stored as the previous cleanup cycle time and the cleanup process is scheduled again.
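A single pass of the cleanup cycle might look like the sketch below. The dictionary standing in for the token table, the EXPIRED state name, and the ended_at field are assumptions; the two loops correspond to the two database requests in the description. Scheduling is left out, but in a real service the function could be re-armed with something like threading.Timer after each pass.

```python
def run_cleanup_cycle(tokens, cache, previous_cycle_time, now):
    """Run one cleanup pass and return the new previous cleanup cycle time.

    tokens maps token id -> {"state", "expires_at", "ended_at"} and stands in
    for the shared database. cache maps token id -> cached token.
    """
    cycle_started = now  # the temporary timestamp taken when the cycle starts

    # First request: ACTIVE tokens whose expiry time has passed leave ACTIVE.
    for record in tokens.values():
        if record["state"] == "ACTIVE" and record["expires_at"] <= cycle_started:
            record["state"] = "EXPIRED"  # assumed state name
            record["ended_at"] = cycle_started

    # Second request: non-ACTIVE tokens that ended after the previous cycle.
    newly_ended = [
        token_id
        for token_id, record in tokens.items()
        if record["state"] != "ACTIVE" and record["ended_at"] > previous_cycle_time
    ]
    for token_id in newly_ended:
        cache.pop(token_id, None)

    # The caller stores this value as the previous cleanup cycle time.
    return cycle_started
```

Using the timestamp taken at the start of the cycle, rather than at the end, means a token that ends while the cycle is running is picked up by the following cycle instead of being missed.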

Token Revocation

A token can be revoked by another user that has the appropriate permissions. As a result of features like high availability of authentication servers and token leasing, revoking a token won’t take effect immediately. There may be some time lag between when a token is revoked and when API requests become aware of the revocation. The maximum amount of time a revoked token may remain in effect is the authentication server cache cleanup cycle time plus the longest token lease time frame. In most cases the revocation will be in effect within 30 seconds and typically within a much shorter period of time.
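The worst-case bound above is simple arithmetic. The numbers in this sketch (a 15 second cache cycle and lease windows of 10, 5, and 0 seconds) are purely illustrative assumptions and are not values taken from this document.

```python
def max_revocation_lag(cache_cycle_seconds, lease_seconds):
    """Upper bound on how long a revoked token can keep working."""
    return cache_cycle_seconds + max(lease_seconds.values(), default=0.0)


# Assumed example configuration.
example_lag = max_revocation_lag(15.0, {"read": 10.0, "write": 5.0, "critical": 0.0})
```

With these assumed values the bound is 25 seconds, and it applies only to read requests; write and critical requests would observe the revocation sooner because their lease windows are shorter.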

Authentication Server API

/loginUI

Only accepts credentials for human type users.

POST

{
  "username": "[username]",
  "password": "[password]"
}

Success Response - 200 OK

Sets a cookie containing the JWT, associated with the domain of the authentication server used.

{
  "JWT": [JWT Content],
  "securityStamp": "[security stamp used for renewal]"
}

/loginSystem

Only accepts credentials for system type users.

POST

{
  "username": "[username]",
  "password": "[password]",
  "instanceId": "[instance identifiable information]"
}

Success Response - 200 OK

{
  "JWT": [JWT Content],
  "securityStamp": "[security stamp used for renewal]"
}
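A system client could assemble the /loginSystem call roughly as shown below. This sketch only builds the request and parses a response body; it does not send anything. The base URL, the Content-Type header, and the assumption that the JWT is returned as a JSON string are not confirmed by this document.

```python
import json
import urllib.request


def build_system_login_request(base_url, username, password, instance_id):
    """Build (but do not send) a POST request for /loginSystem."""
    body = json.dumps(
        {"username": username, "password": password, "instanceId": instance_id}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/loginSystem",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def parse_login_response(raw_body):
    """Extract the JWT and the security stamp from a 200 OK response body."""
    payload = json.loads(raw_body)
    return payload["JWT"], payload["securityStamp"]


# Hypothetical host name used only for illustration.
request = build_system_login_request(
    "https://auth.example.invalid/", "svc-user", "svc-secret", "node-1"
)
```

Sending the request would be a matter of passing it to urllib.request.urlopen and feeding the response body to parse_login_response. The security stamp needs to be kept alongside the JWT because it is required for renewal.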

/validateToken?critical=true|false

Determines if the provided token is valid.

POST

{
  "JWT": [JWT Content]
}

Success Response - 200 OK

No response content

/renewToken

Extends the expiry time of the provided token.

POST

{
  "JWT": [JWT to Renew.  Can be provided as cookie],
  "securityStamp": "[previous security stamp]"
}

Success Response - 200 OK

If the JWT was provided as a cookie, the cookie is updated with the returned JWT.

{
  "JWT": [JWT Content],
  "securityStamp": "[replacement security stamp used for next renewal]"
}
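Because each renewal returns a replacement security stamp, a client has to swap both values after every successful call. The sketch below shows one way to keep them together; the TokenSession class is hypothetical and treats the JWT as an opaque string, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TokenSession:
    """Client-side holder for the current JWT and its security stamp."""

    jwt: str
    security_stamp: str

    def renewal_payload(self):
        # Body for POST /renewToken, using the stamp from the previous response.
        return {"JWT": self.jwt, "securityStamp": self.security_stamp}

    def apply_renewal(self, response):
        # Both values are replaced; the old stamp is not reused.
        self.jwt = response["JWT"]
        self.security_stamp = response["securityStamp"]


session = TokenSession(jwt="jwt-1", security_stamp="stamp-1")
first_payload = session.renewal_payload()
session.apply_renewal({"JWT": "jwt-2", "securityStamp": "stamp-2"})
next_payload = session.renewal_payload()
```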

/logoutToken

Logs the provided token out. This is considered a normal end of a token’s lifecycle in contrast to expiry or revocation.

POST

{
  "JWT": [JWT to logout.  Can be provided as cookie],
  "securityStamp": "[previous security stamp]"
}

Success Response - 200 OK

If JWT was provided as a cookie, the cookie is removed.

No response content

/revokeToken

Cancels the provided token. Requires the CANCEL_TOKEN permission. This is considered an abnormal end of a token’s lifecycle.

POST

{
  "authJWT": [JWT of canceling user.  Can be provided as cookie],
  "JWT": [JWT to Cancel],
  "securityStamp": "[previous security stamp]"
}

Success Response - 200 OK

No response content

Open Questions

Question	Answer	Date Answered

Out of Scope