Why is Kerberos Terrible?
TL;DR: It’s really not. Kerberos is showing its age, but it has served us well over the years. As we build new protocols we should remember all the things we got right with it, and account for all the things we got wrong.
A Bit of Background on Kerberos
Kerberos is an authentication protocol. It allows a party (A) to prove to another party (B) they are who they say they are by having a third party (C) vouch for them.
In technical terms this means a principal can prove to another principal who they are by authenticating themselves to an authentication server.
In practical terms a user can log in to their application by authenticating to Active Directory using a password.
This protocol is built on a set of message exchanges.
The following is mostly accurate, but a few liberties have been taken for the sake of simplification. All users are principals, but not all principals are users. All principals can authenticate to all principals, but the type of principal dictates properties of messages.
Here we’re looking at a standard user principal authenticating to a service principal for the simplest explanation as it’s the most common.
Authenticating to the Authentication Server
The first message exchange is between the (user) principal and the authentication server.
The user creates a message, encrypts it using their password, and sends it to the authentication server. The server knows the user’s password and can decrypt the message, which proves each party’s identity to the other (within the limit of how easy it is to guess or steal this password). The server then generates a new message containing a ticket and a session key, encrypts it against the user’s password, and returns it to the user.
The user can decrypt the message and now has a ticket and session key. The session key can be used to secure any future communications between the two parties, and the ticket is used as evidence of authentication for future requests requiring authentication. This ticket is called the ticket granting ticket (TGT) — you use it to get other tickets to services.
The TGT contains identifying information about the user as well as a copy of the negotiated session key, and is encrypted to the password of a special account named krbtgt. Only the authentication server knows the password for krbtgt.
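The first exchange can be sketched in a few lines of Python. Treat this as a toy model under loud assumptions: seal and unseal are a made-up HMAC-based stand-in for Kerberos’s real encryption types, JSON stands in for the real wire format, and names like KDC_DB, as_exchange, and client_login are hypothetical. The five minute clock skew and ten hour ticket lifetime are common defaults, not requirements.

```python
import hashlib, hmac, json, os, time

def derive_key(password: str, salt: str) -> bytes:
    # Stand-in for Kerberos string-to-key: the password itself never goes on the wire.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), 4096)

def _stream(key: bytes, nonce: bytes, n: int) -> bytes:
    out, i = b"", 0
    while len(out) < n:
        out += hmac.new(key, b"enc" + nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
        i += 1
    return out[:n]

def seal(key: bytes, obj: dict) -> bytes:
    # TOY authenticated encryption (assumption: NOT a real Kerberos encryption type).
    pt, nonce = json.dumps(obj).encode(), os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(pt, _stream(key, nonce, len(pt))))
    return nonce + hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest() + ct

def unseal(key: bytes, blob: bytes) -> dict:
    nonce, tag, ct = blob[:16], blob[16:48], blob[48:]
    if not hmac.compare_digest(tag, hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()):
        raise ValueError("wrong key or tampered message")
    return json.loads(bytes(a ^ b for a, b in zip(ct, _stream(key, nonce, len(ct)))))

# What the authentication server knows: long-term keys that are never transmitted.
KDC_DB = {
    "alice": derive_key("correct horse", "EXAMPLE.COMalice"),
    "krbtgt": os.urandom(32),
}

def as_exchange(username: str, request: bytes, now: float) -> bytes:
    """Authentication server side of the first exchange."""
    user_key = KDC_DB[username]
    claim = unseal(user_key, request)          # only opens with the right password
    if abs(now - claim["timestamp"]) > 300:
        raise ValueError("timestamp outside allowed clock skew")
    session_key = os.urandom(32).hex()
    # The TGT is sealed to krbtgt, so the user can carry it but not read it.
    tgt = seal(KDC_DB["krbtgt"], {"user": username, "session_key": session_key,
                                  "expires": now + 10 * 3600})
    return seal(user_key, {"session_key": session_key, "tgt": tgt.hex()})

def client_login(username: str, password: str, salt: str, now: float) -> dict:
    user_key = derive_key(password, salt)
    request = seal(user_key, {"timestamp": now})
    return unseal(user_key, as_exchange(username, request, now))

reply = client_login("alice", "correct horse", "EXAMPLE.COMalice", time.time())
```

After the call, reply holds the session key in the clear plus an opaque TGT. A wrong password fails inside as_exchange because the server cannot open the request.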
Authenticating to the Service
The second message exchange is between the (user) principal and (service) principal. It’s a bit more complicated than the first exchange.
A protocol higher in the stack indicates to the user that the service requires authentication. Kerberos doesn’t much care; it’s up to the user to initiate the exchange. The user determines the service principal name (SPN) of this service and generates a new message to the ticket granting server asking for a service ticket.
This message from the user to the ticket granting server contains the ticket-granting-ticket (TGT) from the first exchange. The message is encrypted to the previously negotiated session key. The ticket granting server also has knowledge of the krbtgt password and so can decrypt the TGT and extract the session key.
The ticket granting server can then validate the message and generate a response containing a service ticket for the service principal containing identifying information from the TGT. The service ticket is encrypted against the password of the service principal. Only the ticket granting server and service principal know this password. The encrypted ticket is wrapped in a message that is then encrypted using the previously negotiated session key and returned to the user.
The user is able to decrypt the message using the session key and can forward the service ticket to the service. The user can’t see the contents of the ticket (and therefore can’t manipulate it). All the user can do is forward it. The service receives the encrypted service ticket and decrypts it using its own password. The service now knows the identity of the user. The service can optionally respond with a final message encrypted with the subsession key to prove receipt or require a separate subsession key.
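The second exchange follows the same pattern, which a short sketch makes easier to see. The same caveats apply: seal and unseal are a toy HMAC-based stand-in for real Kerberos encryption, and SPN, issue_tgt, tgs_exchange, and service_accept are hypothetical names for illustration. The "authenticator" here is simply a small message sealed with the key found inside the ticket.

```python
import hashlib, hmac, json, os

def _stream(key: bytes, nonce: bytes, n: int) -> bytes:
    out, i = b"", 0
    while len(out) < n:
        out += hmac.new(key, b"enc" + nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
        i += 1
    return out[:n]

def seal(key: bytes, obj: dict) -> bytes:
    # TOY authenticated encryption (assumption: NOT a real Kerberos encryption type).
    pt, nonce = json.dumps(obj).encode(), os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(pt, _stream(key, nonce, len(pt))))
    return nonce + hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest() + ct

def unseal(key: bytes, blob: bytes) -> dict:
    nonce, tag, ct = blob[:16], blob[16:48], blob[48:]
    if not hmac.compare_digest(tag, hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()):
        raise ValueError("wrong key or tampered message")
    return json.loads(bytes(a ^ b for a, b in zip(ct, _stream(key, nonce, len(ct)))))

SPN = "HTTP/app.example.com"            # hypothetical service principal name
KRBTGT_KEY = os.urandom(32)             # known only to the KDC
SERVICE_KEYS = {SPN: os.urandom(32)}    # known to the KDC and to the service

def issue_tgt(user: str, now: float):
    """Shortcut for the first exchange: returns (TGT, session key)."""
    session_key = os.urandom(32)
    tgt = seal(KRBTGT_KEY, {"user": user, "key": session_key.hex(), "expires": now + 36000})
    return tgt, session_key

def _check(ticket_key: bytes, ticket: bytes, authenticator: bytes, now: float) -> dict:
    # Used by both the TGS and the service: open the ticket, then use the key
    # inside it to confirm the sender really holds that key.
    contents = unseal(ticket_key, ticket)
    if contents["expires"] < now:
        raise ValueError("ticket expired")
    proof = unseal(bytes.fromhex(contents["key"]), authenticator)
    if proof["user"] != contents["user"]:
        raise ValueError("authenticator does not match ticket")
    return contents

def tgs_exchange(tgt: bytes, authenticator: bytes, spn: str, now: float) -> bytes:
    contents = _check(KRBTGT_KEY, tgt, authenticator, now)
    new_key = os.urandom(32).hex()
    # Sealed to the service's key: the user can forward it but not read or alter it.
    service_ticket = seal(SERVICE_KEYS[spn],
                          {"user": contents["user"], "key": new_key, "expires": now + 3600})
    return seal(bytes.fromhex(contents["key"]), {"key": new_key, "ticket": service_ticket.hex()})

def service_accept(spn: str, service_ticket: bytes, authenticator: bytes, now: float) -> str:
    return _check(SERVICE_KEYS[spn], service_ticket, authenticator, now)["user"]

now = 1_700_000_000.0
tgt, tgt_key = issue_tgt("alice", now)
reply = unseal(tgt_key, tgs_exchange(tgt, seal(tgt_key, {"user": "alice", "time": now}), SPN, now))
service_key = bytes.fromhex(reply["key"])
who = service_accept(SPN, bytes.fromhex(reply["ticket"]),
                     seal(service_key, {"user": "alice", "time": now}), now)
```

Note that the service never talks to the KDC here. Everything it needs arrives inside a ticket it alone (besides the KDC) can open.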
Protected Application Layer Exchanges
Now that the service has verified the user and the user has verified the service it’s possible for them to use a key derived from the subsession key to safely continue communications between the user and the service.
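As a rough illustration of "a key derived from the subsession key", here is a minimal sketch. It assumes HMAC-SHA256 as the derivation function, which is a stand-in for the key derivation Kerberos actually specifies, and it only shows integrity protection. The purpose labels and the derive, protect, and verify names are made up for this example.

```python
import hashlib, hmac, os

def derive(subsession_key: bytes, purpose: str) -> bytes:
    # One-way derivation: leaking a derived key does not reveal the subsession key.
    return hmac.new(subsession_key, b"derive:" + purpose.encode(), hashlib.sha256).digest()

def protect(key: bytes, seq: int, payload: bytes) -> bytes:
    # The sequence number is covered by the tag so messages can't be reordered or replayed.
    tag = hmac.new(key, seq.to_bytes(8, "big") + payload, hashlib.sha256).digest()
    return tag + payload

def verify(key: bytes, seq: int, message: bytes) -> bytes:
    tag, payload = message[:32], message[32:]
    want = hmac.new(key, seq.to_bytes(8, "big") + payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, want):
        raise ValueError("message failed integrity check")
    return payload

subsession_key = os.urandom(32)
client_to_service = derive(subsession_key, "client->service integrity")
service_to_client = derive(subsession_key, "service->client integrity")

message = protect(client_to_service, 0, b"GET /payroll")
```

Using a separate key per direction and purpose means a message captured in one direction can’t be reflected back in the other.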
The Pros and Cons of Kerberos
It shouldn’t come as a surprise that Kerberos has many positive and negative traits.
Kerberos is Cryptographically Sound
This turns out to be a useful property of an authentication protocol. You can generally prove the cryptographic promises of each leg are secure without making too many assumptions.
A principal’s identity is proved by the secret properties of the password and hardness of the encrypting algorithm; the identity of the server is proven conversely by knowing the password and being able to decrypt the message without having to reveal the password to any party. This is an often overlooked property of most authentication protocols.
The multileg exchange is provably secure by guaranteeing only the final recipients can decrypt messages; messages can’t be modified because they’re signed by the various authoritative sources; and messages can’t be replayed because they carry timestamps and sequence numbers.
All of these properties accrue to a confidential, tamper-evident, authenticated, non-repudiated authentication protocol. Most other protocols don’t have all these properties and rely on external protocols to provide these guarantees.
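The replay protection mentioned above is usually enforced with a timestamp check plus a cache of recently seen messages. A minimal sketch, assuming a five minute clock skew window and a hypothetical ReplayCache class (real implementations key the cache on more fields than this):

```python
class ReplayCache:
    """Reject messages that are stale or that have been seen before."""

    def __init__(self, max_skew: float = 300.0):
        self.max_skew = max_skew
        self._seen = set()

    def accept(self, client: str, timestamp: float, now: float) -> bool:
        # Forget entries that are too old to be accepted anyway.
        self._seen = {(c, t) for (c, t) in self._seen if now - t <= self.max_skew}
        if abs(now - timestamp) > self.max_skew:
            return False                      # stale, or too far in the future
        if (client, timestamp) in self._seen:
            return False                      # exact replay
        self._seen.add((client, timestamp))
        return True

cache = ReplayCache()
first = cache.accept("alice", 1000.0, now=1001.0)
replayed = cache.accept("alice", 1000.0, now=1002.0)
```

The skew window is what keeps the cache small: anything older than the window is rejected on the timestamp alone, so it never needs to be remembered.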
Cryptographic Properties Extend to Services
Services that rely on the derived session keys apply the properties of the authenticated exchanges to the derived key. This is non-trivial to get right in the best of cases so having it available to you and bound to your authentication service is useful.
The Kerberos V5 RFC is mostly complete. It provides an end-to-end solution for multi-party authentication with solutions for transport, retry semantics, supported credential types, etc. This is useful because it limits one’s ability to break interoperability while still being true to the specification. A specification is a great equalizer. You either built it correctly or you didn’t.
Implementors will invariably break interop one way or another, but the break will contravene the spec. There is utility in being able to go to the implementor, point to the gap, and ask them to fix it.
There are of course areas where there are gaps; no spec is perfect. However, there are at least 12 recognized specs intended to close these gaps and introduce improvements.
It has Provisions for Authenticating Both Parties
Most multi-party federation protocols only dictate how you authenticate the final leg of the trip, meaning from the identity provider (authentication service) to the relying party (service). This means you’re left on your own to figure out how to authenticate to the identity provider. This is not a bad thing, but it starts placing assumptions on the properties of the authenticated identity. This leads to questionable interop.
Kerberos is pretty clear about how a user authenticates to the authentication service to get tickets. The implementor doesn’t need to design their own system.
Windows Integrated Authentication relies heavily on Kerberos; this means any application running on Windows can authenticate users or services using Kerberos. Most other major operating systems can do Kerberos, either natively or through third party implementations like Heimdal or MIT.
Applications can opt in to Kerberos authentication using SSPI or GSS-API with relatively little or no effort. High level development frameworks often wrap these APIs further to simplify usage.
Most organizations have Active Directory, which is amongst other things an implementation of the Kerberos AS and TGS.
This generally means you won’t be seriously limiting yourself if all you relied on for authentication was Kerberos.
Kerberos Credential Protection is Well Understood
There is a fairly large gap in the second exchange, where if an attacker can steal the TGT, they can take that TGT and use it on other clients. This means they can impersonate the user as long as the TGT is valid, and all the services involved really won’t know any better.
This is an unfortunate side effect most authentication systems have because you’re exchanging an expensive authentication process (user entering password) for an inexpensive process (caching a TGT locally) to improve performance and reduce user annoyance.
There are ways to solve this problem, which is generally described as proof-of-possession, where you stamp the ticket with a proof of work. The entity receiving the ticket can request you prove ownership by doing work that only you can do and is incredibly difficult for an attacker to forge. The simplest form is by signing a challenge with a secret only the user knows, such as with a private key stored on a smart card or FIDO2 device.
But down the rabbit hole we go because now the verifier needs to know the public key ahead of time, and… We’ve increased complexity of the system making it more fragile and difficult to use correctly.
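To make the challenge-signing idea concrete, here is a small sketch. Python’s standard library has no RSA or ECDSA, so this swaps in a Lamport one-time signature built from SHA-256. That is purely an assumption for illustration: a smart card or FIDO2 device would use a conventional signature scheme, and a Lamport key pair must never sign more than one challenge. The keygen, sign, and verify names are hypothetical.

```python
import hashlib, os

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def keygen():
    # Private key: 256 pairs of random secrets. Public key: their hashes.
    private = [(os.urandom(32), os.urandom(32)) for _ in range(256)]
    public = [(H(a), H(b)) for a, b in private]
    return private, public

def _bits(message: bytes):
    digest = H(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(private, message: bytes):
    # Reveal one secret from each pair, chosen by the bits of the message hash.
    return [private[i][bit] for i, bit in enumerate(_bits(message))]

def verify(public, message: bytes, signature) -> bool:
    return len(signature) == 256 and all(
        H(sig) == public[i][bit]
        for i, (bit, sig) in enumerate(zip(_bits(message), signature))
    )

private, public = keygen()      # the verifier must learn `public` ahead of time
challenge = os.urandom(32)      # fresh value chosen by the verifier
proof = sign(private, challenge)
ok = verify(public, challenge, proof)
```

An attacker who steals only the ticket has nothing to sign with, which is the whole point. The cost is exactly the one described above: someone has to distribute and manage those public keys.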
It’s like entering a carnival. You can either pay each ride independently, which is annoying, time consuming, and potentially dangerous for all parties, or you can go to the ticket booth, buy a bunch of tickets, and use those tickets as evidence you paid money. It’s easier and safer, and it’s not the end of the world if an attacker steals your tickets because they’re only useful for a short period of time.
Given that this is a well known issue, we’ve done a lot of work making sure you can’t steal these tickets. In Windows we have things like centralized protection of secrets in the LSA, which moves any dangerous secrets out of a given application (making it hard to steal secrets) as well as Credential Guard, which uses virtualization-based security to move all the important secrets to a separate virtual machine so compromising LSA won’t get you anything (making it extremely difficult to steal secrets).
Conversely other authentication protocols are less well understood, primarily because they haven’t been around as long and aren’t as widely adopted.
Cross-Realm Trusts Let you Create Security Boundaries
Kerberos has a concept of cross-realm authentication. This allows two organizations to trust the identities issued by the other organization. Trusts can be directional or bidirectional, meaning organization A might trust identities from organization B, but B won’t trust A.
This has untold utility, because you can isolate resources based on the amount of security required by the resource. There aren’t many authentication protocols that have a built-in capability for this without presenting raw credentials.
It’s easy to find fault in things so this is primarily about the big issues people have complained about in the past. We’re not talking minor nits.
The Complexity might Kill You
There’s an old joke: I went to explain Kerberos to someone and we both walked away not understanding it.
Exhibit A: See the first section.
Joking aside, this is a complicated protocol to understand completely. It doesn’t matter to end users so much, but it seriously degrades one’s ability to troubleshoot failures, verify security guarantees, or extend it in any meaningful way.
Cross-Realm Trusts are Complicated
Cross-realm trusts can be transitive, meaning if A trusts B and B trusts C, A could trust C. In principle this is simple and has a logic to it, but in practice it’s difficult to fit in your head and reason about.
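One way to reason about it is to treat trusts as a directed graph and search for a path. This is a simplified sketch: the realm names and the TRUSTS table are hypothetical, and real deployments add rules (such as which trusts are marked transitive) that this ignores.

```python
from collections import deque

# TRUSTS[X] lists the realms whose identities X accepts directly ("X trusts them").
TRUSTS = {
    "A.EXAMPLE": {"B.EXAMPLE"},
    "B.EXAMPLE": {"C.EXAMPLE"},
    "C.EXAMPLE": set(),
}

def trust_path(resource_realm: str, user_realm: str, trusts=TRUSTS):
    """Return the chain of realms from the resource to the user, or None."""
    queue = deque([[resource_realm]])
    visited = {resource_realm}
    while queue:
        path = queue.popleft()
        if path[-1] == user_realm:
            return path
        for nxt in sorted(trusts.get(path[-1], ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

forward = trust_path("A.EXAMPLE", "C.EXAMPLE")   # a user from C reaching a resource in A
backward = trust_path("C.EXAMPLE", "A.EXAMPLE")  # a user from A reaching a resource in C
```

Here forward finds a path through B while backward finds nothing, because none of the trusts run in that direction. With three realms that is obvious; with thirty it is not.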
It’s also difficult to reason about authorization to resources. Presenting an identity from another realm and trusting it hasn’t been tampered with is one thing, but deciding that the claims presented about the user should be considered during authorization decisions is something completely different. You are often forced to explicitly set rules within the resource realm instead of implicitly trusting the information because the authorization rules are contextually relevant.
(Unconstrained) Delegated Authentication needs to Die
Once a service has a user’s identity, it’s useful to forward that identity to another service as proof that the first service is operating on behalf of the user. This means you don’t end up with applications having the highest possible permissions any user will need and instead just rely on the user having the necessary permission. Compromising downstream resources via the application requires having an active user versus having unfettered access.
Kerberos got delegation wrong though. It lets the application operate as the user and access further resources on behalf of that user, but it’s not constrained to specific services. An application might only need access to a single database server, but when given delegation rights, it can access any resource as that user.
However, we have a solution to this: constrained delegation and resource-based constrained delegation, which specifically list which principals can be delegated to, as well as which services can receive those delegated tickets.
The problem is that most people don’t understand how constrained delegation works, which tends to lead them to fall back to unconstrained delegation.
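Stripped of the protocol details, the three delegation models come down to where the allow-list lives. A minimal sketch, with hypothetical table names and service principal names; it models only the policy decision, not the ticket exchanges behind it:

```python
# Classic constrained delegation: the list is configured on the front-end service.
ALLOWED_TO_DELEGATE_TO = {
    "HTTP/app.example.com": {"MSSQL/db.example.com"},
}

# Resource-based constrained delegation: the list is configured on the target.
ALLOWED_TO_ACT_ON_BEHALF = {
    "CIFS/files.example.com": {"HTTP/app.example.com"},
}

# Unconstrained delegation: no list at all.
UNCONSTRAINED = {"HTTP/legacy.example.com"}

def may_delegate(front_end: str, target: str) -> bool:
    if front_end in UNCONSTRAINED:
        return True   # the failure mode: any target, as any user who shows up
    if target in ALLOWED_TO_DELEGATE_TO.get(front_end, set()):
        return True
    return front_end in ALLOWED_TO_ACT_ON_BEHALF.get(target, set())
```

The unconstrained branch returns before any list is consulted, which is why falling back to it quietly undoes the other two.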
Cryptographic Primitives are Starting to Smell
DES was the standard when Kerberos was first published. It’s still supported by all the major implementations in one form or another and most enterprises still have it enabled.
RC4 is still kicking and is often the default especially when trying to work across multiple vendors.
Key derivation at its lowest form is still generally based on MD4.
Integrity is based on a truncated SHA1 HMAC in most implementations. SHA256 has recently been standardized, but see the next few issues.
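For a sense of what "truncated" means here, this sketch computes an HMAC and keeps only the leading bytes: 96 bits for the SHA1 variant and 128 bits for the newer SHA256 one. It is a simplification, since real Kerberos checksums use a key derived for the specific purpose rather than the raw key, and the function names are made up for illustration.

```python
import hashlib, hmac

def truncated_hmac(key: bytes, message: bytes, digest, length: int) -> bytes:
    # Compute the full HMAC, then keep only the first `length` bytes.
    return hmac.new(key, message, digest).digest()[:length]

def hmac_sha1_96(key: bytes, message: bytes) -> bytes:
    return truncated_hmac(key, message, hashlib.sha1, 12)

def hmac_sha256_128(key: bytes, message: bytes) -> bytes:
    return truncated_hmac(key, message, hashlib.sha256, 16)

def verify(key: bytes, message: bytes, tag: bytes, scheme) -> bool:
    return hmac.compare_digest(scheme(key, message), tag)

key = b"k" * 32
old_tag = hmac_sha1_96(key, b"ticket contents")
new_tag = hmac_sha256_128(key, b"ticket contents")
```

Truncation shortens the tag an attacker has to guess, so the security margin depends on both the hash and how many bytes survive.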
Ossification makes Changing Cryptographic Primitives Impossible
We learned this with TLS 1.3: protocols don’t like change. Things start to harden and become brittle when implementations make assumptions about protocol designs. Every implementation makes assumptions in one way or another and often they become difficult to fix.
Fundamental overhauls break everything even if the original protocols were designed to support future changes.
Changes need Critical Mass before they can be Turned on by Default
Turning on a new cryptographic primitive like SHA256 probably won’t break much, but it can’t be default until everyone supports it. It will take a decade before most implementations support it, and who knows how long it’ll be before it can be turned on by default, because there’s always some box that won’t support it.
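The "some box that won’t support it" problem is easy to see in a sketch of encryption type negotiation. This is a simplification, assuming the choice is just the first client preference the other side supports; the numbers are the registered Kerberos encryption type identifiers, while negotiate and the example hosts are hypothetical.

```python
ETYPE_NAMES = {
    20: "aes256-cts-hmac-sha384-192",
    19: "aes128-cts-hmac-sha256-128",
    18: "aes256-cts-hmac-sha1-96",
    17: "aes128-cts-hmac-sha1-96",
    23: "rc4-hmac",
}

def negotiate(client_preferences, server_supported) -> int:
    # Pick the client's most preferred type that the other side also supports.
    for etype in client_preferences:
        if etype in server_supported:
            return etype
    raise ValueError("no common encryption type")

modern_only = [20, 19, 18, 17]
modern_server = {17, 18, 19, 20}
legacy_box = {23}                      # the one appliance nobody can upgrade

best_case = negotiate(modern_only, modern_server)
compromise = negotiate(modern_only + [23], legacy_box)
```

With a modern peer the strongest type wins, but talking to the legacy box only works if RC4 stays in the client’s list. Remove it from the defaults and that one box stops authenticating, which is why the defaults lag the specs by years.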
ASN.1 Data Formatting Suuuuuuuuuuucks
This is a personal gripe. I dislike ASN.1 because it’s a complex serialization format. It’s difficult to parse and more difficult to generate.
This is a problem because it limits one’s abilities to build Kerberos implementations and results in only a handful of libraries that are feature rich. This has the unfortunate side effect that fewer people understand the protocol (coupled with complexity) and fewer still contribute to its long term development.
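To give a flavor of the format, here is a tiny encoder for the DER rules Kerberos messages rely on. It handles only integers, sequences, and explicit context tags numbered 0 to 30, and the function names are my own. The example encodes the first two fields of a KDC request: protocol version 5 under tag 1 and message type 10 under tag 2.

```python
def der_length(n: int) -> bytes:
    # Short form for lengths under 128, long form (count byte + big-endian length) otherwise.
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def der_tlv(tag: int, content: bytes) -> bytes:
    return bytes([tag]) + der_length(len(content)) + content

def der_integer(value: int) -> bytes:
    # Minimal two's complement: 128 needs a leading zero byte, -128 does not.
    size = (value if value >= 0 else ~value).bit_length() // 8 + 1
    return der_tlv(0x02, value.to_bytes(size, "big", signed=True))

def der_sequence(*items: bytes) -> bytes:
    return der_tlv(0x30, b"".join(items))

def der_explicit(tag_number: int, content: bytes) -> bytes:
    # Context-specific, constructed tag wrapping an already encoded value.
    return der_tlv(0xA0 | tag_number, content)

header = der_sequence(
    der_explicit(1, der_integer(5)),    # pvno
    der_explicit(2, der_integer(10)),   # msg-type for an AS request
)
print(header.hex())
```

Two small integers cost twelve bytes and three layers of nesting, and this is before optional fields, string types, and application tags enter the picture. Parsing is the same exercise in reverse with a lot more error handling.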
Kerberos Message Structures have a Range of Serialization Formats
Another gripe of mine. There are multiple serialization formats in use depending on the type of data structure you’re looking at. Kerberos itself uses ASN.1, but MS-KILE (Microsoft’s specification of deltas) uses RPC (NDR) serialization for authorization structures. Having to flip between formats is mostly just annoying, but it also complicates interop.
All the Crypto is Symmetric
Kerberos is a protocol that was built back when asymmetric cryptography was too expensive to use routinely or securely. It relies entirely on shared secrets between all parties, which means anyone that’s verifying a message can spoof a message. This can be negated by asymmetric signing.
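The "anyone who can verify can also spoof" point is worth seeing in code. A minimal sketch using HMAC-SHA256 as a stand-in for a Kerberos checksum, with made-up message contents:

```python
import hashlib, hmac, os

shared_key = os.urandom(32)   # held by both the KDC and the service

def mac(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(mac(key, message), tag)

# Produced by the KDC.
genuine = b"user=alice;groups=staff"
genuine_tag = mac(shared_key, genuine)

# Produced by the service itself, using the very same key it verifies with.
forged = b"user=alice;groups=admins"
forged_tag = mac(shared_key, forged)

both_valid = verify(shared_key, genuine, genuine_tag) and verify(shared_key, forged, forged_tag)
```

Both messages verify, and nothing in the tag says who produced it. With an asymmetric signature the service would hold only the public key and could check the tag without being able to create one.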
PKINIT is an extension to Kerberos that allows users to execute the first message exchange using asymmetric cryptography, but all future exchanges still rely on symmetric secrets.
We actually built a variation of Kerberos that uses asymmetric crypto called PKU2U, but it’s designed for user-to-user authentication — i.e. without a trusted third party. It was originally built for HomeGroup and is now used for point-to-point authentication using Active Directory credentials.
Initial Authentication Generally Assumes Passwords
The last point suggested PKINIT can be used for authentication, and it can, but this assumes the application knows how it works and supports certificates. It turns out this isn’t as common as you’d think, and applications tend to just prompt for passwords. There will always be legacy applications that just work that will never be updated.
The other problem here is that you’re still limited to passwords or RSA-based asymmetric keys. We’re trending towards ECC asymmetric crypto with FIDO and it’s impossible to fit that into an exchange that is universally (or at least critically) supported let alone other arbitrary authentication messages.
It should be pointed out that using passwords does have its benefits. They are incredibly simple to use when machine generated and managed.
Line of Sight to Authentication Servers is Mandatory
Line of sight to the authenticating server is often a requirement in authentication protocols, but Kerberos services tend to live only within private networks. Whether you should move them to the open internet is up for debate (actually it’s not — don’t do it), but most organizations choose not to do it and would rather rely on a service born on the internet if needed.
So What’s the Takeaway?
There are certainly a lot of problems with Kerberos, but they are the kind of criticisms that come with heavy adoption. The protocol still has value, but it’s starting to show its age and we need to look forward to what replaces it. As we move away from it we need to remember that Kerberos got quite a few things right and learn from that.