This is a continuation of the addendum to a series of articles on ELK on K8s.

ElasticSearch on K8s: 01 - Basic Design
ElasticSearch on K8s: 02 - Log Collection with Filebeat
ElasticSearch on K8s: 03 - Log Enrichment with Logstash
ElasticSearch on K8s: 04 - Log Storage and Search with ElasticSearch
ElasticSearch on K8s: 05 - Visualization and Production Readying
ElasticSearch Index Management
Authentication and Authorization for ElasticSearch: 01 - A Blueprint for Multi-tenant SSO
Authentication and Authorization for ElasticSearch: 02 - Basic SSO with Role Assignment
Authentication and Authorization for ElasticSearch: 03 - Multi-Tenancy with KeyCloak and Kibana

The previous post in this series was about enabling SSO between Kibana and KeyCloak. However, it did not address multi-tenancy with SSO. This article aims to come up with a design for a multi-tenant SSO solution for Kibana with KeyCloak.

The Problem of Multi-Tenancy

A multi-tenant deployment is one that can accommodate different organizational units of users (called tenants) without compromising the logical separation between those units. Achieving this is the responsibility of the deployment stack (including hardware and software). It could be a fully separated multi-tenant system, where each tenant is served out of its own physical (or virtual) stack of software and the code is largely free from knowledge of the multi-tenant complexities of the larger system. Or it could be an in-software multi-tenant design, where the code itself is fully aware of tenancy in the request-response flow and partitions tenant data accordingly.

Physically separate multi-tenancy

In-Software multi-tenancy

Both approaches have their pros and cons. Physically separate multi-tenancy, although simpler to implement at first, could easily become an operational burden if not done correctly. It can also easily evolve (or devolve, depending on the Ops perspective) into a serverless design if the software architecture can support it.

Support from SP and IDP

When it comes to the ELK stack, what should be tenant aware is Kibana, the main user interface of the stack. Kibana offers an in-software multi-tenancy model where a feature called Spaces and ElasticSearch Roles act in combination to provide tenant specific experiences of Kibana. In other words, different tenants could have access to different Spaces in Kibana, which provides isolation between tenants and avoids sharing the same set of data and configuration between them.

On the other side of the process, KeyCloak has a built-in model for multi-tenancy called Realms (which we are already familiar with after enabling single tenant SSO in the previous post). As the IDP, KeyCloak provides logical separation between tenants with this concept of Realms.

We can simply map these two functionalities so that a set of users from one KeyCloak Realm are assumed to be of the same Tenant and assign them one or more Roles in Kibana (to be precise, ElasticSearch) so that they all have access to a single Space, and only to that Space. This is the basis of our multi-tenant design for Kibana. This results in a simple user model, where a single user may only be part of a single tenant. Note that there could be more complex user models with a single user having the ability to be part of different tenants. For the purposes of this article, a 1-1 mapping between a user and tenant is enough.

Mapping Kibana Spaces to KeyCloak Realms

Tenant Discovery and the Lack of it

In theory, Kibana and KeyCloak having support for multi-tenancy should be the end of the story. Unfortunately it is not. There is another part of multi-tenancy beyond support for logical separation. The initiating component of the multi-tenant design should be able to figure out which users belong to which tenants, using some distinguishing feature in the request. This is called tenant/realm discovery. For example, this could be a design based on URL segments, where a user visiting https://saas.my.org/tenant1/ is assumed to be part of the tenant tenant1. There are multiple other ways of doing realm discovery (e.g. using a custom header, deriving information from standard headers, or additional prompts where the user is allowed to specify the tenant they wish to be part of). Because this is such an open ended approach, different service providers may decide how they want to do tenant discovery on their own terms.
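To make URL segment based discovery concrete, here is a minimal sketch in Python. It is not something Kibana or KeyCloak provides; the function name discover_tenant and the KNOWN_TENANTS set are assumptions made up for illustration, and a real deployment would load the tenant list from configuration.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical list of tenants; assumed for this sketch only.
KNOWN_TENANTS = {"tenant1", "tenant2"}


def discover_tenant(url: str) -> Optional[str]:
    """Return the tenant named by the first path segment, or None if unknown."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if segments and segments[0] in KNOWN_TENANTS:
        return segments[0]
    return None


print(discover_tenant("https://saas.my.org/tenant1/"))
```

A request for an unknown or missing segment returns None, at which point the component would have to fall back to some other discovery method (or prompt the user).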

As the Service Provider in this SSO scenario, Kibana is the first point of contact and the component that initiates the SSO flow. This means it is Kibana that should start separating users into their respective tenants. If there was no authentication for Kibana, it would present the list of Spaces to select from when the UI is visited. However, with authentication in place, the user should already have been granted the proper roles by the time the Space selection interface is presented.

Therefore, it’s clear that Role Mapping rules (discussed in the previous article) should take care of assigning proper roles to users. However, tenant discovery matters a step before this. Since KeyCloak realms are completely separate, to authenticate users from different tenants, Kibana should initiate SSO with the respective realm in the first place, i.e. users from tenant1 should be sent to KeyCloak with SSO initiated against a Client configuration in the tenant1 realm. To do this, Kibana should already have derived the user’s tenant, i.e. Kibana should have done tenant discovery. This is why tenant discovery is an important feature for multi-tenant SSO.

Unfortunately, as of the current version (7.4), Kibana does not have any kind of tenant discovery mechanism built into it. A complex URL segment based discovery mechanism may be designed with a reverse proxy in front of Kibana. However, with other limitations (e.g. ElasticCloud’s inability to support proxy URLs), this quickly becomes a rigid, high-maintenance structure.

A Solution with Federation

Handling Tenant Discovery

A simpler approach may be the easiest one to implement. If the burden of the decision during tenant discovery can be pushed off to the end-user, the impact from the lack of software features can be mitigated. Therefore, when the user is redirected to KeyCloak, they should be able to decide which tenant to authenticate against. Kibana, on the other hand, should have a single point of contact in KeyCloak to send users to for authentication. In other words, an SSO facade that speaks SAML should be present between Kibana and KeyCloak. With KeyCloak’s support for IDP Federation, this SAML facade can be another KeyCloak realm that acts as a proxy for the rest of the actual tenant realms.

For example, for a list of tenant realms named org1, org2, org3, KeyCloak can have another realm named federator (let’s call it the federator realm) to act as a facade. The realm federator will have the realms org1, org2, and org3 as pseudo-external IDPs, using OIDC as the communication protocol between federator and the tenant realms. For the federator realm, the external IDP configured to point to org1 appears to be a completely separate KeyCloak instance, even though the two realms run on the same CPU cycles on the same VM.

IDP Federation is a feature typically used to enable social authentication with public providers like Google, Twitter, or Facebook. However, because of standardized protocols like OIDC (on which the typical social providers usually build and end up with their custom protocols), any compliant provider could be an externally federated IDP. Since any KeyCloak realm is OIDC compliant, a self-referencing federation is like any other integration.

When a user visits a Client configuration on the federator realm to authenticate, they will be provided with a list of external IDPs to select from. If the user is from organization1, they can select the org1 option from the list, after which they will be redirected to a Client on the org1 realm to be authenticated. After successful authentication, the user details will be sent back to the federator realm, which can send them on to the Service Provider that initiated the SSO flow.

The user interface provided to a user when visiting a Client for authentication is part of a KeyCloak Theme. With the default theme, the federated IDPs are rendered as selectable options. However, a customized theme can easily change this to a text input for the user to fill in, if exposing other realm names is a security concern.

We have solved the problem of tenant discovery by essentially not choosing to do it programmatically.

Implementation

KeyCloak IDP federation is a well documented feature. The essence of implementing it is:

1. Create an OIDC client in the tenant realm (e.g. org1) to be the point of contact during identity brokering
2. Create an external IDP configuration in the federator realm and use the OIDC client created in #1
3. Add the entry details created in #2 as a redirect URL in the OIDC client created in #1 to complete the plumbing
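The URLs involved in this plumbing follow a predictable layout, which the following Python sketch spells out. It assumes the /auth context path seen in the other KeyCloak URLs in this article, and that the IDP alias in the federator realm is the same as the tenant realm name; the function names and dictionary keys are only labels chosen for this illustration.

```python
def tenant_oidc_endpoints(base_url: str, tenant_realm: str) -> dict:
    """OIDC endpoints of the tenant realm, used by the IDP entry in the federator realm."""
    root = f"{base_url}/auth/realms/{tenant_realm}/protocol/openid-connect"
    return {
        "authorizationUrl": f"{root}/auth",
        "tokenUrl": f"{root}/token",
        "userInfoUrl": f"{root}/userinfo",
    }


def broker_redirect_uri(base_url: str, federator_realm: str, idp_alias: str) -> str:
    """Redirect URL to register in the tenant realm's OIDC client (step 3)."""
    return f"{base_url}/auth/realms/{federator_realm}/broker/{idp_alias}/endpoint"


endpoints = tenant_oidc_endpoints("https://keycloak.my.org", "org1")
redirect = broker_redirect_uri("https://keycloak.my.org", "federator", "org1")
print(endpoints["tokenUrl"])
print(redirect)
```

Deriving both sides from the same inputs is one way to keep the tenant client and the federator IDP entry from drifting apart when more tenants are added.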

OIDC Client configuration in the `org1` Tenant Realm

IDP Federation configuration in the `federator` Realm

Doing these simple tasks will create the basic connectivity between the federated and the federating realms. A few things to note are,

As mentioned above, the federation here uses OIDC as the protocol. SAML or OAuth2 could also be used as the federation protocol. However, for the authorization needs of our design, either OIDC or SAML is a must.
Since we are using OIDC, the backchannel communication that is part of the process requires direct communication between the two parties involved, i.e. we need to enable KeyCloak to KeyCloak direct communication. This means that if the infrastructure uses NAT, the public IP addresses of the NAT proxies should be whitelisted at the network firewall. To be more K8s specific, the firewall for the Cloud Load Balancer that gets created as part of the Ingress should allow the IP addresses of the NAT proxy/proxies of the deployment, as KeyCloak would essentially call itself through the publicly resolvable name.

Handling Authorization

With SSO and Tenant Discovery out of the way, we can start evolving the design towards multi-tenant authorization. In other words, as the Service Provider, Kibana should be able to separate the users out using the user information received from the IDP. Let’s explore what this would look like in a federated IDP design.

In the previous post, we essentially ignored authorization and assigned the superuser role in ElasticSearch to everyone who authenticates using SAML. However, in a real world deployment, there should be one or more SAML Attributes in the SAML response from the IDP that helps to match one or more Role Mapping Rules to decide one or more Roles to be assigned to the user.

Propagating User Information Through IDP Federation

The federated tenant realm is the source of truth for information about a specific user. However, in the federated IDP design that we came up with so far, Kibana does not have an idea of the federated tenant realm, as it only communicates with the SAML facade, the federator realm. Therefore, the user information should be propagated from the federated tenant realms, through the federator realm, to Kibana.

In KeyCloak, this is done using Client and IDP federation Mappers. Mappers are a means of dynamically translating protocol specific data into (and out of) KeyCloak internal data structures. For example, we can use Mappers for IDP federation to translate a Claim in the OIDC token that is received as part of the federated IDP authentication process into an entry in the SAML Attribute document in the SAML response sent to the Service Provider.

In detail, the mapping would happen as follows:

The user would end up in the federated realm’s authentication step and successfully authenticate against the tenant realm (e.g. org1)
The federator realm will request user information from the tenant realm, org1. This will be done through the OIDC client created for the purposes of communication during federation. There should be Client Mappers in this Client that set additional Claims in the outgoing token as a result of the /userinfo API call. For example, the KeyCloak Roles that the user is attached to could be part of the userinfo token.
In the federator realm, federation for the tenant realm org1 should be configured to read and translate the specific Claims in the userinfo JWT into the KeyCloak internal data structures, called Attributes. For example, the role details in the incoming JWT can be translated into a KeyCloak attribute named userroles. This is done through IDP Mappers.
The next hop in the process is the SAML client in the federator realm that creates a SAML response to be sent to the Service Provider, Kibana. This Client should have Client Mappers that translate the KeyCloak attributes to statements in the SAML Attribute document. For example, the KeyCloak attribute named userroles that we read off from the JWT from org1 could be an Attribute in the SAML response named UserRoles. Since SAML is a prime example of a verbose XML based protocol, you could sprinkle a “friendly” name on top, org.my.kibana.saml.userroles.
Kibana, upon receiving the SAML response, should be configured to read the additional attributes off the document properly. This is a SAML specific implementation detail in ElasticSearch, where the SAML attributes can be referred to in the Role Mapping Rules by using certain naming conventions. Using this feature, Kibana (to be precise, ElasticSearch) will figure out the roles to assign to the user by evaluating custom role mappings.
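The hops above can be modelled as a chain of small translations. The Python sketch below only simulates the data flow with plain dictionaries; the three function names are hypothetical and none of this is KeyCloak or ElasticSearch code. It assumes a single-valued claim named userroles, and for brevity it uses UserRoles as the name under which ElasticSearch sees the attribute.

```python
def idp_mapper(userinfo_claims: dict, claim: str, attribute: str) -> dict:
    """Federator IDP Mapper: copy one claim from the tenant's userinfo into a user attribute."""
    if claim not in userinfo_claims:
        return {}
    return {attribute: userinfo_claims[claim]}


def saml_client_mapper(attributes: dict, attribute: str, saml_name: str) -> dict:
    """SAML Client Mapper: expose a user attribute as a SAML attribute (as a string)."""
    if attribute not in attributes:
        return {}
    return {saml_name: str(attributes[attribute])}


def es_metadata(saml_attributes: dict) -> dict:
    """ElasticSearch side: SAML attributes surface as user metadata prefixed with saml_."""
    return {f"saml_{name}": value for name, value in saml_attributes.items()}


claims = {"sub": "1234", "userroles": "editor"}        # from org1 /userinfo
attrs = idp_mapper(claims, "userroles", "userroles")   # stored in the federator realm
saml = saml_client_mapper(attrs, "userroles", "UserRoles")
meta = es_metadata(saml)
print(meta)
```

If any hop is missing its mapper, the value silently disappears from the chain, which is usually the first thing to check when a Role Mapping does not match.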

With this knowledge, we can come up with a more detailed picture of what happens when a user tries to authenticate through federated SSO.

Example: Determining Read-only vs Write Access in Kibana

Let’s take a really specific case of assigning write access to users in Kibana (or effectively ElasticSearch). For this example, let’s decide write access using a SAML attribute named kibanaAdmin, with the value true giving the user write access, and the value false restricting them to a read-only role.

For this, the following configurations should be done in Kibana and KeyCloak.

Create 2 ElasticSearch Roles for Write and Read-only roles
Create 2 ElasticSearch Role Mappings that contain rules to evaluate to assign the above created roles
Configure KeyCloak tenant realm org1 to include the details for kibanaAdmin in the userinfo JWT
Configure KeyCloak federator realm to read and translate the JWT to create the SAML attribute kibanaAdmin

ElasticSearch Roles

The following two roles, roaccess and writeaccess could be created in ElasticSearch to encapsulate the two levels of permissions the users would get.

roaccess

{
  "applications": [],
  "transient_metadata": {
    "enabled": true
  },
  "run_as": [],
  "cluster": [
    "monitor"
  ],
  "indices": [
    {
      "privileges": [
        "read",
        "monitor",
        "view_index_metadata"
      ],
      "field_security": {
        "except": [],
        "grant": [
          "*"
        ]
      },
      "allow_restricted_indices": false,
      "names": [
        "*"
      ]
    }
  ],
  "metadata": {}
}

writeaccess

{
  "applications": [
    {
      "application": "*",
      "privileges": [
        "*"
      ],
      "resources": [
        "*"
      ]
    }
  ],
  "transient_metadata": {
    "enabled": true
  },
  "run_as": [],
  "cluster": [
    "monitor"
  ],
  "indices": [
    {
      "privileges": [
        "all"
      ],
      "field_security": {
        "except": [],
        "grant": [
          "*"
        ]
      },
      "allow_restricted_indices": false,
      "names": [
        "*"
      ]
    }
  ],
  "metadata": {}
}

Note that due to frequent ElasticSearch and Kibana API changes, these definitions could slightly change.

ElasticSearch Role Mappings

With the name of the SAML attribute decided, we can create the following two role mappings, roaccessmapping and writeaccessmapping to map the above two roles to the authenticating users.

writeaccessmapping

{
  "rules": {
    "all": [
      {
        "field": {
          "realm.name": "cloud-saml"
        }
      },
      {
        "field": {
          "metadata.saml_kibanaAdmin": "true"
        }
      }
    ]
  },
  "enabled": true,
  "roles": [
    "writeaccess"
  ],
  "metadata": {
    "version": 1
  }
}

roaccessmapping

{
  "rules": {
    "all": [
      {
        "field": {
          "realm.name": "cloud-saml"
        }
      },
      {
        "field": {
          "metadata.saml_kibanaAdmin": "false"
        }
      }
    ]
  },
  "enabled": true,
  "roles": [
    "roaccess"
  ],
  "metadata": {
    "version": 1
  }
}

Note the use of field name matching for the field metadata.saml_kibanaAdmin. This will result in the lookup of a SAML attribute in the SAML response with the friendly name of kibanaAdmin.
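To reason about which mapping a given user would hit, the rule structure can be evaluated by hand. The following Python sketch is a deliberately simplified evaluator written for this article, not ElasticSearch’s implementation: it handles only all, any, and field with exact string matches on single values, and ignores wildcards, except, and multi-valued fields. The user dictionaries are made-up test inputs.

```python
WRITE_RULES = {
    "all": [
        {"field": {"realm.name": "cloud-saml"}},
        {"field": {"metadata.saml_kibanaAdmin": "true"}},
    ]
}


def lookup(user: dict, path: str):
    """Resolve a dotted path such as realm.name against nested dictionaries."""
    value = user
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(rule: dict, user: dict) -> bool:
    """Evaluate a (simplified) role mapping rule against a user."""
    if "all" in rule:
        return all(matches(r, user) for r in rule["all"])
    if "any" in rule:
        return any(matches(r, user) for r in rule["any"])
    if "field" in rule:
        return all(lookup(user, p) == expected for p, expected in rule["field"].items())
    raise ValueError(f"unsupported rule: {rule}")


admin = {"realm": {"name": "cloud-saml"}, "metadata": {"saml_kibanaAdmin": "true"}}
reader = {"realm": {"name": "cloud-saml"}, "metadata": {"saml_kibanaAdmin": "false"}}
print(matches(WRITE_RULES, admin), matches(WRITE_RULES, reader))
```

Note that a user whose SAML response carries no kibanaAdmin attribute at all matches neither mapping, and so ends up with no role.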

With the client side of the process complete, let’s explore how to set up the user information so that the details that we are looking for end up in the SAML response.

KeyCloak Tenant OIDC Client Mapper

As the source of truth, the tenant realm org1 should start embedding the data needed to derive the information at the ElasticSearch side.

To differentiate between a user that should have write access in Kibana and a user that should only have read-only access, we can make use of a KeyCloak role assignment. There could be a KeyCloak role named kibana_admin that gets assigned to the user only if they should have write access in Kibana. We can take another step for the sake of clean design and make use of KeyCloak Client Roles instead of Realm wide Roles to restrict the impact of that role on other Service Providers. A KeyCloak Client Role is a role that is associated with a specific KeyCloak Client only. In our case, this is the OIDC Client that was created to facilitate federation. Using a Client Role restricts the meaning of that role to the scope of the client, i.e. the kibana_admin role has a meaning only if the user is authenticated through this specific Client.

We could embed all the client roles in the JWT issued through the OIDC Client. However, for the sake of simplicity, we can instead embed a single Claim named kibanaAdmin with the value true or false. For this we can use the Script Mapper, a feature in KeyCloak that lets us extend the functionality a bit.

With a Script Mapper, we can use JavaScript to programmatically create or modify the Claims in the JWT. The JavaScript is executed using the Nashorn runtime inside the JVM, so accessing data exposed by the JVM runtime differs somewhat from the common JavaScript dialect. However, including a custom boolean Claim takes only a few lines of code.

The Script Mapper attached to the OIDC Client for federation would look like the following.

/**
 * Available variables:
 * user - the current user
 * realm - the current realm
 * token - the current token
 * userSession - the current userSession
 * keycloakSession - the current keycloakSession
 */

// default to read-only unless the kibana_admin client role is found
var kibana_admin = false;

// check all client roles mapped to the user for this client
var client = keycloakSession.getContext().getClient();
var forEach = Array.prototype.forEach;
forEach.call(user.getClientRoleMappings(client).toArray(), function(roleModel) {
    if (roleModel.getName() === "kibana_admin") {
        kibana_admin = true;
    }
});

// set Claim
token.setOtherClaims("kibanaAdmin", kibana_admin);

After this Mapper gets evaluated, the JWT will contain a Claim named kibanaAdmin with its value set to either true or false.
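When debugging, it helps to look inside a token and confirm that the Claim is really there. The Python sketch below decodes the payload of a JWT without verifying its signature, so it is suitable for inspection only and never for making access decisions. The sample token is fabricated locally for the example; a real one would be captured from the tenant realm.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying the signature (debugging only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64(obj: dict) -> str:
    """Helper used only to fabricate an unsigned sample token."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


sample_token = ".".join([
    _b64({"alg": "none"}),
    _b64({"preferred_username": "alice", "kibanaAdmin": True}),
    "",  # empty signature part
])

claims = jwt_claims(sample_token)
print(claims["kibanaAdmin"])
```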

KeyCloak IDP and SAML Client Mappers

With the proper values in the incoming JWT, we can now programme the rest of the process to propagate it through federation into the SAML document.

The first step is to configure the IDP federation to read the Claim off of the JWT and store it as a KeyCloak attribute. For this, the Attribute Importer Mapper that is configured to read kibanaAdmin can be used.

The next step is to configure the SAML client for Kibana to include the attribute read off from the JWT in the SAML response. For this, a User Attribute Mapper can be used.

With all these in place, the final SAML response from KeyCloak should contain the following SAML Attribute, that can be evaluated by ElasticSearch.

The value of Response.Assertion.AttributeStatement[.FriendlyName == 'kibanaAdmin'].AttributeValue will be what the above created ElasticSearch Role Mappings would be evaluating.

<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol"
                xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion"
                Destination="https://kibana.my.org:443/api/security/v1/saml"
                ...
                >
    <saml:Issuer>https://keycloak.my.org/auth/realms/federator</saml:Issuer>
    <dsig:Signature xmlns:dsig="http://www.w3.org/2000/09/xmldsig#">
       ...
    </dsig:Signature>
    ...
    <saml:Assertion>
        <saml:Issuer>https://keycloak.my.org/auth/realms/federator</saml:Issuer>
        <saml:Subject>
            ...
        </saml:Subject>
        <saml:Conditions>
            ...
        </saml:Conditions>
        <saml:AuthnStatement>
            ...
        </saml:AuthnStatement>
        <saml:AttributeStatement>
            ...
            <saml:Attribute FriendlyName="kibanaAdmin"
                            Name="kibanaAdmin"
                            NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:basic"
                            >
                <saml:AttributeValue xmlns:xs="http://www.w3.org/2001/XMLSchema"
                                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                                     xsi:type="xs:string"
                                     >true</saml:AttributeValue>
            </saml:Attribute>
            ...
        </saml:AttributeStatement>
    </saml:Assertion>
</samlp:Response>
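To check a captured response for the expected attribute, something like the following Python sketch could be used. It relies only on the standard library and does no signature validation, so treat it as a debugging aid. The embedded sample is a trimmed, hand-written stand-in for a real response, and saml_attribute is a name made up for this example.

```python
import xml.etree.ElementTree as ET

NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

SAMPLE = """
<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol"
                xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <saml:Assertion>
    <saml:AttributeStatement>
      <saml:Attribute FriendlyName="kibanaAdmin" Name="kibanaAdmin">
        <saml:AttributeValue>true</saml:AttributeValue>
      </saml:Attribute>
    </saml:AttributeStatement>
  </saml:Assertion>
</samlp:Response>
""".strip()


def saml_attribute(response_xml: str, friendly_name: str) -> list:
    """Return all values of the attribute with the given FriendlyName (empty if absent)."""
    root = ET.fromstring(response_xml)
    for attr in root.iterfind(".//saml:Attribute", NS):
        if attr.get("FriendlyName") == friendly_name:
            return [v.text for v in attr.findall("saml:AttributeValue", NS)]
    return []


print(saml_attribute(SAMPLE, "kibanaAdmin"))
```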

This is a somewhat detailed example of how to do fine grained authorization based on information passed through federation in KeyCloak. Let’s consider another scenario where authorization should be done to implement multi-tenancy.

Tenant Separation

Since users can now be authorized into different levels of access, let’s consider how to do that within the confines of separate tenants.

We can take the same approach as we did for multi-level access, i.e. sending data from KeyCloak that helps to determine the tenant at the Kibana level. We decided to map a KeyCloak Realm to a Kibana Space and consider that as the basis of multi-tenancy in our solution. Therefore, Kibana should have the data that helps to determine the tenant realm that the user belongs to (not the federator realm, as it is only a facade for authentication) and assign roles that only allow access to the Space that tenant is mapped to.

For this,

KeyCloak should send another SAML Attribute stating the original tenant realm name
ElasticSearch Role Mappings should be created that evaluate rules to check for the existence of the original tenant realm name in the SAML response
ElasticSearch Roles should be created that restrict access to the specific Spaces and Indices

KeyCloak Mappers for Tenant Name

Just as we dynamically added a Claim in the JWT and propagated that through to the SAML document, we can dynamically add the original realm name. Another Script Mapper can be used in this case too. This mapper is far simpler than the earlier one, since the information we need is already available in the JavaScript variables injected by the JVM runtime.

/**
 * Available variables: 
 * user - the current user
 * realm - the current realm
 * token - the current token
 * userSession - the current userSession
 * keycloakSession - the current keycloakSession
 */

token.setOtherClaims("tenant", realm.getName());

We’ll propagate this Claim from the JWT using an Attribute Importer Mapper on the IDP and a User Attribute Mapper on the SAML client, as before.

This should result in an additional SAML Attribute in the Attribute Statement section of the SAML response.

<saml:Attribute FriendlyName="tenant"
                Name="tenant"
                NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:basic"
                >
    <saml:AttributeValue xmlns:xs="http://www.w3.org/2001/XMLSchema"
                         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                         xsi:type="xs:string"
                         >org1</saml:AttributeValue>
</saml:Attribute>

ElasticSearch Changes to Lookup the New SAML Attribute

To read this SAML Attribute at the ElasticSearch side of things, we could follow the same approach as the earlier one. However, we can make use of an ElasticSearch SAML configuration that lets us specify what should be translated as the groups variable for a particular subject (i.e. the authenticating user). Doing this instead of treating tenant as just another SAML Attribute makes sense, since it is the actual grouping factor for a KeyCloak user.

To do this, we should edit the ElasticSearch SAML configuration that was added in the previous post, with a slight modification to the field xpack.security.authc.realms.saml.cloud-saml.attributes.groups.

xpack:
  security:
    authc:
      realms:
        saml:
          cloud-saml:
            order: 2
            attributes.principal: "email"
            attributes.groups: "tenant"
            idp.metadata.path: "/app/config/saml/metadata.xml"
            idp.entity_id: "https://keycloak.my.org/auth/realms/elastic"
            sp.entity_id: "https://kibana.my.org:443/"
            sp.acs: "https://kibana.my.org:443/api/security/v1/saml"
            sp.logout: "https://kibana.my.org:443/logout"

This would enable us to query for the groups field in the Role Mappings to figure out the Space specific Role to assign.

For example, for the tenant org1, the following two Role Mappings can be added to determine:

whether the user should be assigned a role related to org1
whether the user should be able to do write operations on the tenant org1

Write Access

{
  "rules": {
    "all": [
      {
        "field": {
          "realm.name": "cloud-saml"
        }
      },
      {
        "field": {
          "groups": "org1"
        }
      },
      {
        "field": {
          "metadata.saml_kibanaAdmin": "true"
        }
      }
    ]
  },
  "enabled": true,
  "roles": [
    "org1writeaccess"
  ],
  "metadata": {
    "version": 1
  }
}

Read-only Access

{
  "rules": {
    "all": [
      {
        "field": {
          "realm.name": "cloud-saml"
        }
      },
      {
        "field": {
          "groups": "org1"
        }
      },
      {
        "field": {
          "metadata.saml_kibanaAdmin": "false"
        }
      }
    ]
  },
  "enabled": true,
  "roles": [
    "org1roaccess"
  ],
  "metadata": {
    "version": 1
  }
}
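Since these two mappings differ between tenants only in the tenant name, they lend themselves to being generated. The Python sketch below builds both documents for a given tenant. The mapping names (for example org1writeaccessmapping) are an assumption of this sketch, as the article does not name them, and the metadata key is spelled kibanaAdmin to match the FriendlyName of the SAML attribute.

```python
def tenant_role_mappings(tenant: str, saml_realm: str = "cloud-saml") -> dict:
    """Build the write and read-only role mappings for one tenant, keyed by mapping name."""

    def mapping(is_admin: str, role: str) -> dict:
        return {
            "rules": {
                "all": [
                    {"field": {"realm.name": saml_realm}},
                    {"field": {"groups": tenant}},
                    {"field": {"metadata.saml_kibanaAdmin": is_admin}},
                ]
            },
            "enabled": True,
            "roles": [role],
            "metadata": {"version": 1},
        }

    return {
        f"{tenant}writeaccessmapping": mapping("true", f"{tenant}writeaccess"),
        f"{tenant}roaccessmapping": mapping("false", f"{tenant}roaccess"),
    }


mappings = tenant_role_mappings("org1")
print(sorted(mappings))
```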

These evaluations will result in either of the following roles being assigned to the user.

org1writeaccess

{
  "applications": [
    {
      "application": "kibana-.kibana",
      "privileges": [
        "feature_discover.all",
        "feature_visualize.all",
        "feature_dashboard.all",
        "feature_dev_tools.all",
        "feature_indexPatterns.all",
        "feature_savedObjectsManagement.all"
      ],
      "resources": [
        "space:org1"
      ]
    }
  ],
  "transient_metadata": {
    "enabled": true
  },
  "run_as": [],
  "cluster": [],
  "indices": [
    {
      "privileges": [
        "all"
      ],
      "field_security": {
        "except": [],
        "grant": [
          "*"
        ]
      },
      "allow_restricted_indices": false,
      "names": [
        "logstash*org1*"
      ]
    }
  ],
  "metadata": {}
}

org1roaccess

{
  "applications": [
    {
      "application": "kibana-.kibana",
      "privileges": [
        "feature_discover.read",
        "feature_visualize.read",
        "feature_dashboard.read",
        "feature_indexPatterns.read",
        "feature_savedObjectsManagement.read"
      ],
      "resources": [
        "space:org1"
      ]
    }
  ],
  "transient_metadata": {
    "enabled": true
  },
  "run_as": [],
  "cluster": [],
  "indices": [
    {
      "privileges": [
        "read",
        "view_index_metadata",
        "monitor"
      ],
      "field_security": {
        "except": [],
        "grant": [
          "*"
        ]
      },
      "allow_restricted_indices": false,
      "names": [
        "logstash*org1*"
      ]
    }
  ],
  "metadata": {}
}

Note how the roles restrict access to one Space named org1 and only Indices that match the name pattern logstash*org1*.
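The role definitions follow the same pattern, so they can be templated per tenant as well. The following Python sketch produces a trimmed version of the two roles above, keeping only the fields that vary. It assumes that the Space identifier and the index naming pattern both use the tenant name, as in the org1 example; tenant_role is a hypothetical helper, not an ElasticSearch API.

```python
# Kibana features granted to both access levels in the example roles above.
FEATURES = ["discover", "visualize", "dashboard", "indexPatterns", "savedObjectsManagement"]


def tenant_role(tenant: str, write: bool) -> dict:
    """Build a role restricted to the tenant's Space and its logstash indices."""
    level = "all" if write else "read"
    # Dev Tools is only granted to the write role in the example above.
    features = FEATURES + ["dev_tools"] if write else FEATURES
    return {
        "applications": [
            {
                "application": "kibana-.kibana",
                "privileges": [f"feature_{name}.{level}" for name in features],
                "resources": [f"space:{tenant}"],
            }
        ],
        "cluster": [],
        "indices": [
            {
                "names": [f"logstash*{tenant}*"],
                "privileges": ["all"] if write else ["read", "view_index_metadata", "monitor"],
            }
        ],
    }


read_only = tenant_role("org2", write=False)
print(read_only["applications"][0]["resources"])
```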

Conclusion

With the topics covered in the articles in this series so far, it should be a fairly straightforward exercise to design a custom multi-tenant SSO solution that doesn’t require complex organization specific customizations. KeyCloak was used as a reference IDP in this scenario, however any IDP that

has a native concept of multi-tenancy
supports SAML and/or OIDC SSO protocols
allows extending the functionality through easy configuration
is preferably Open Source so that you can fall back to the code when things aren’t clear

can easily replace it.

How fine-grained the authorization at ElasticSearch gets depends purely on the specific requirements of the deployment. For example, there could easily be other roles that only allow read access to dashboards in Kibana.

A few things to note about this solution are,

This is a SAML/OIDC mixed bag - This solution makes use of both SAML and OIDC, which is unconventional in SSO designs. The main reason for this is ElasticSearch’s limited support for OIDC and ElasticCloud lagging behind when it comes to enabling new features of the product. ElasticCloud was what I was able to experiment with for this solution, therefore most of my experience with ElasticSearch is from there.
OIDC requires back channel communication - Because OIDC has back channel communication steps (unlike SAML, which can be configured to do all communication in the front channel through the user’s browser), traffic originating from KeyCloak itself has to be whitelisted. This could probably be worked around by manipulating name resolution (e.g. split DNS inside the network boundary), however that is complex enough to avoid altogether.
Automation - Configuring the above mentioned user model for one tenant and one level of granularity is relatively easy. However, things easily get out of hand when there are multiple tenants involved, and tenants need to be dynamically created and provisioned. Therefore, the above mentioned configuration steps should be automated as part of Infrastructure as Code scripts (or an API that exposes tenant creation as an admin operation). This is easy to do with KeyCloak, as it comes with a well documented REST API that is easy to work with. However, if you are using ElasticCloud, some operations that should be done as part of maintenance are hard to automate, as ElasticCloud does not have an API to work with at all.
Avoid user information provisioning during federated authentication - One slight downside of a federated multi-tenant SSO solution comes from a feature that is built into IDP federation. The users that are authenticated from the external IDPs (the tenant realms in this scenario) are “provisioned” into the federator realm to be reused later. This is a useful feature for the usual scope of IDP federation, since you don’t want to connect to the social IDP every time you need to read the user’s email address. However, in this example, it could be a downside, since user information that is updated in the tenant realm may not get updated in the federator realm, and different users in different realms who have the same username (KeyCloak realms are about full separation) could be detected as a conflict if authenticated through the same federator realm. Some of these drawbacks could be worked around (e.g. using a Username Changer Mapper to customize how the username is stored during provisioning), however a longer term solution would be to write a custom First Login Flow that does not do provisioning during federated authentication.
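The username conflict described in the last point, and the effect of namespacing usernames during provisioning, can be illustrated with a small Python model. This only mimics the outcome; it is not how KeyCloak implements provisioning or its mappers, and the tenant.username format is an arbitrary choice made for this sketch.

```python
def provision(store: dict, tenant: str, username: str, namespaced: bool) -> str:
    """Record a federated user in the (simulated) federator realm and return the stored name."""
    key = f"{tenant}.{username}" if namespaced else username
    owner = store.setdefault(key, tenant)
    if owner != tenant:
        raise ValueError(f"username conflict: {key!r} already provisioned from {owner}")
    return key


# Without namespacing, alice from org2 collides with alice from org1.
plain = {}
provision(plain, "org1", "alice", namespaced=False)
try:
    provision(plain, "org2", "alice", namespaced=False)
    conflict = False
except ValueError:
    conflict = True

# With namespacing, both users can coexist in the federator realm.
prefixed = {}
first = provision(prefixed, "org1", "alice", namespaced=True)
second = provision(prefixed, "org2", "alice", namespaced=True)
print(conflict, first, second)
```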

With a certain level of added complexity, we have now implemented a multi-tenant SSO solution for the ELK stack. However, there could be simpler solutions that remove the need to design such a solution altogether. This way of doing things is the least that can be done with a minimal level of paid support. With new product features, Kibana may introduce tenant discovery in the future, which would get rid of most of the complexity in this design in the first place.

I have been writing about ELK for some time, and this will hopefully be my last post in the series. I hope this series helps you navigate around this interesting stack that works 100% of the time sometimes.
