Kerberos Explained in a Little Too Much Detail
When a client, like Windows, decides it wants to do Kerberos it first needs to find a KDC. There are a couple ways this can work.
The first is pretty straightforward: hardcode a list of KDCs. This is how many clients work. MIT Kerberos for instance supports this.
My Kerberos .NET (kerberos.dev) library also supports this using the same schema.
The other way is through DNS lookups. Many clients support this, but often as a secondary approach because DNS isn't entirely secure (at least it wasn't, at the time). This works by looking for an SRV record _kerberos._transport.realm.net. The results are KDCs.
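A minimal sketch of those two approaches, assuming a client that prefers its hardcoded list and only falls back to DNS. The `STATIC_KDCS` table, the host names in it, and both function names are made up for illustration. Actually resolving SRV records needs a DNS library, so this only builds the names a client would query.

```python
# Hypothetical static configuration: realm -> list of "host:port" KDC addresses.
STATIC_KDCS = {
    "THREADABORT.NET": ["kdc1.threadabort.net:88", "kdc2.threadabort.net:88"],
}


def srv_query_names(realm):
    """Build the _kerberos SRV names to look up for a realm (UDP first, then TCP)."""
    domain = realm.lower().rstrip(".")
    return ["_kerberos._{}.{}".format(transport, domain) for transport in ("udp", "tcp")]


def kdc_lookup_plan(realm):
    """Prefer hardcoded KDCs; otherwise report which SRV names would be queried."""
    static = STATIC_KDCS.get(realm.upper())
    if static:
        return {"source": "static", "kdcs": list(static)}
    return {"source": "dns", "queries": srv_query_names(realm)}
```

Calling `kdc_lookup_plan("example.com")` returns the DNS plan because nothing is hardcoded for that realm.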
Windows prefers another approach entirely because hardcoding doesn't scale well when you have thousands of DCs, and DNS doesn't include enough detail. What it does is something called DC location using the DC locator service.
The DC locator works by first querying DNS for some LDAP SRV records. These list all (K)DCs in the domain, and the locator goes through each record it finds and makes a UDP LDAP call to the server.
The server either responds immediately, or the request times out quickly and the locator moves on to the next record. The locator checks each response for KDC info and tries to find one that's close enough based on IP subnet and the registered AD site.
As it narrows the list down it then looks to make sure the DC supports the things it needs. There are a couple dozen options to choose from and they're documented here: https://docs.microsoft.com/en-us/windows/win32/api/dsgetdc/nf-dsgetdc-dsgetdcnamea.
/// <summary>
/// Forces cached domain controller data to be ignored.
/// </summary>
DS_FORCE_REDISCOVERY = 1 << 0,
/// <summary>
/// Requires that the returned domain controller support directory services.
/// </summary>
DS_DIRECTORY_SERVICE_REQUIRED = 1 << 4,
/// <summary>
/// Attempts to find a domain controller that supports directory service functions.
/// </summary>
DS_DIRECTORY_SERVICE_PREFERRED = 1 << 5,
/// <summary>
/// Requires that the returned domain controller be a global catalog server for
/// the forest of domains with this domain as the root.
/// </summary>
DS_GC_SERVER_REQUIRED = 1 << 6,
/// <summary>
/// Requires that the returned domain controller be the primary domain controller for the domain.
/// </summary>
DS_PDC_REQUIRED = 1 << 7,
/// <summary>
/// Requests that cached domain controller data should be used.
/// </summary>
DS_BACKGROUND_ONLY = 1 << 8,
/// <summary>
/// This parameter indicates that the domain controller must have an IP address.
/// </summary>
DS_IP_REQUIRED = 1 << 9,
/// <summary>
/// Requires that the returned domain controller be currently running the Kerberos Key Distribution Center service.
/// </summary>
DS_KDC_REQUIRED = 1 << 10,
/// <summary>
/// Requires that the returned domain controller be currently running the Windows Time Service.
/// </summary>
DS_TIMESERV_REQUIRED = 1 << 11,
/// <summary>
/// Requires that the returned domain controller be writable; that is, host a writable copy of the directory service.
/// </summary>
DS_WRITABLE_REQUIRED = 1 << 12,
/// <summary>
/// Attempts to find a domain controller that is a reliable time server.
/// </summary>
DS_GOOD_TIMESERV_PREFERRED = 1 << 13,
/// <summary>
/// Specifies that the returned domain controller name should not be the current computer.
/// </summary>
DS_AVOID_SELF = 1 << 14,
/// <summary>
/// Specifies that the server returned is an LDAP server.
/// </summary>
DS_ONLY_LDAP_NEEDED = 1 << 15,
/// <summary>
/// Specifies that the DomainName parameter is a flat name. This flag cannot be combined with the DS_IS_DNS_NAME flag.
/// </summary>
DS_IS_FLAT_NAME = 1 << 16,
/// <summary>
/// Specifies that the DomainName parameter is a DNS name. This flag cannot be combined with the DS_IS_FLAT_NAME flag.
/// </summary>
DS_IS_DNS_NAME = 1 << 17,
/// <summary>
/// Attempts to find a domain controller in the same site as the caller otherwise attempts to resolve the next closest site.
/// </summary>
DS_TRY_NEXTCLOSEST_SITE = 1 << 18,
/// <summary>
/// Requires that the returned domain controller be running Windows Server 2008 or later.
/// </summary>
DS_DIRECTORY_SERVICE_6_REQUIRED = 1 << 19,
/// <summary>
/// Requires that the returned domain controller be currently running the Active Directory web service.
/// </summary>
DS_WEB_SERVICE_REQUIRED = 1 << 20,
/// <summary>
/// Requires that the returned domain controller be running Windows Server 2012 or later.
/// </summary>
DS_DIRECTORY_SERVICE_8_REQUIRED = 1 << 21,
/// <summary>
/// Requires that the returned domain controller be running Windows Server 2012 R2 or later.
/// </summary>
DS_DIRECTORY_SERVICE_9_REQUIRED = 1 << 22,
/// <summary>
/// Requires that the returned domain controller be running Windows Server 2016 or later.
/// </summary>
DS_DIRECTORY_SERVICE_10_REQUIRED = 1 << 23,
//////////////////////////////////////////////////////////////////////////////////////////////
/// <summary>
/// Specifies that the names returned should be DNS names.
/// </summary>
DS_RETURN_DNS_NAME = 1 << 30,
/// <summary>
/// Specifies that the names returned should be flat names.
/// </summary>
DS_RETURN_FLAT_NAME = unchecked(1U << 31)
Eventually it finds a DC that it likes and returns the address to the Kerberos stack.
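The narrowing step can be sketched roughly like this. This is not how DC Locator is actually implemented. It's a toy model, assuming each candidate advertises a set of capability bits and a site name. `Capability`, `Candidate`, and `pick_dc` are hypothetical names, and the bit values are illustrative rather than the real DS_* values.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Capability(Flag):
    """Illustrative capability bits; NOT the real DS_* flag values."""
    KDC = auto()
    WRITABLE = auto()
    TIME_SERVER = auto()
    GLOBAL_CATALOG = auto()


@dataclass
class Candidate:
    name: str
    site: str
    capabilities: Capability


def pick_dc(candidates, required, client_site):
    """Return a DC that has every required bit, preferring the client's own site."""
    matches = [c for c in candidates if (c.capabilities & required) == required]
    same_site = [c for c in matches if c.site == client_site]
    chosen = same_site or matches
    return chosen[0] if chosen else None
```

Requirements are combined with `|`, so asking for `Capability.KDC | Capability.WRITABLE` only matches servers that advertise both.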
Incidentally this functionality is why we can do amazing things with FAST, or Windows Hello, or FIDO and only require a handful of DCs running the latest bits instead of all of them. The client knows it needs a KDC that supports e.g. FIDO, so it just asks DC Locator for one.
However, suppose Windows isn't talking to an Active Directory domain. Maybe it's talking to an MIT or Heimdal KDC. They don't support DC Locator (well, probably don't?), so Windows will fall back to looking up the DNS SRV _kerberos records.
You'll notice I haven't said a thing about domain-joined vs workgroup vs AADJ vs whatever. This is by design because Windows doesn't give a crap about whether you're any of those (I mean, it does, but...). Windows is just another Kerberos client.
All Windows needs to make Kerberos work is have a user principal name, a credential, and a realm. It can guess about your realm based on your username. If I type jack@threadabort.net it's plausible my realm is THREADABORT.NET. At least, it's worth trying.
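The guess is about as simple as it sounds. Here's a small sketch, assuming the realm is just the upper-cased UPN suffix; `guess_realm` is a made-up helper name, not a Windows API.

```python
def guess_realm(upn):
    """Guess a Kerberos realm from the suffix of a user principal name."""
    user, sep, suffix = upn.rpartition("@")
    if not sep or not user or not suffix:
        raise ValueError("not a UPN: {!r}".format(upn))
    # Realm names are conventionally upper case.
    return suffix.upper()
```

It's only a guess: the realm it returns still has to be confirmed by actually finding a KDC for it.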
So Windows will DC locate threadabort.net: it'll jump down the LDAP rabbit hole, and then it'll move on to straight DNS, then it'll even try HTTP (we'll get to that in a bit).
Eventually it'll probably find a KDC, and Windows will do the AS-REQ to get a krbtgt ticket, then a TGS-REQ to the service in question. No domain-joined'ness involved.
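That ordering (DC locator, then DNS SRV, then HTTP) is just a fallback chain. Here's a hedged sketch with stub finders standing in for the real mechanisms. Every name and the proxy address below are invented for illustration.

```python
def locate_kdc(realm, strategies):
    """Try each (name, finder) pair in order and return the first one that finds KDCs."""
    for name, finder in strategies:
        kdcs = finder(realm)
        if kdcs:
            return name, kdcs
    return None, []


# Stub finders standing in for the real mechanisms (all hypothetical).
def dc_locator(realm):
    return []  # pretend no AD domain controller answered


def dns_srv(realm):
    return []  # pretend no _kerberos SRV records exist


def http_proxy(realm):
    return ["https://proxy.example/" + realm]  # made-up proxy address


ORDER = [("dc-locator", dc_locator), ("dns-srv", dns_srv), ("http-proxy", http_proxy)]
```

With these stubs, `locate_kdc("THREADABORT.NET", ORDER)` falls all the way through to the proxy entry.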
So what does domain join do? First it configures your desktop authority. That means Windows will create a desktop and logon session for you if the configured domain gives you a ticket to itself.
Well, the next thing Kerberos does is uses the TGT to make a TGS-REQ to AD for the machine you just logged on to: host/yourdesktop.your.domain.com. Your machine is registered as a computer object in AD and has a password. The ticket for this machine is encrypted to that password.
— Steve Syfuhs (@SteveSyfuhs) August 24, 2020
As it happens there's nothing inherently special about that last process, beyond knowing what knobs to turn. If you call all the correct APIs you too could create a logon session without being joined to that domain.
The second thing joining the domain does is it provides a bunch of hints to the Kerberos logon process. If my UPN is jack@threadabort.net, and my realm is THREADABORT.NET, Windows can reason through that. It can't reason through it if my realm is CORP.THREADABORT.NET.
Domain join provides these hints to say that if it can't find the domain through its standard process, try the machine's configured domain. It's either the right one, or maybe it's a domain across a trust.
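In other words, the joined domain becomes one more realm to try. Here's a rough sketch of that ordering, assuming the UPN suffix is tried first and the machine's domain second. `candidate_realms` is a hypothetical helper, and the real logic also has to deal with trusts and referrals.

```python
def candidate_realms(upn, machine_domain=None):
    """Realms to try in order: the UPN suffix first, then the machine's joined domain."""
    realms = []
    _, sep, suffix = upn.rpartition("@")
    if sep and suffix:
        realms.append(suffix.upper())
    if machine_domain and machine_domain.upper() not in realms:
        realms.append(machine_domain.upper())
    return realms
```

A workgroup machine passes no `machine_domain` and is left with only the guess from the UPN.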
The third thing of course is kinda sorta related to the last one, but is more generically about policy: group policy. It's about management of the device from a centralized location.
And the inverse? An Active Directory realm with a Linux or Mac client? It pretty much works the same.
Have user, have cred, have realm.
Go find KDC for realm.
Do AS-REQ, TGS-REQ.
Connect to App.
Profit.
Where Windows differentiates itself here is the out-of-the-box capability to join the domain and be managed, and use the creds the user typed in to get to the desktop to make SSO magical behind the scenes.
Mac and Linux definitely have this capability, which can be configured through, say, MDM. They kinda sorta work the same way. They start setting device defaults and optimistically using those values and creds.
But back to Windows. That covers domain join, which is starting to show its age. These days we have hybrid join, and AAD join. All of these do Kerberos to on-prem DCs.
In the hybrid case, you're still domain joined. Nothing special about that. AADJ is something else entirely though.
AADJ changes the desktop authority from your on-prem domain to Azure AD. It uses a different authentication provider (CloudAP instead of Kerberos). But it still does Kerberos with SSO. How?
Well, it's deceptively simple. Remember how I said Windows doesn't much care what domain state it's in? If you give it a cred it'll just do Kerberos. AADJ works the same way. The difference is that when CloudAP gets a successful response back from AAD, it includes metadata.
That metadata includes useful information like your real realm name, and your fully qualified UPN. So even if I typed jack@threadabort.net, it'll return jack@corp.threadabort.net+CORP.THREADABORT.NET. CloudAP hands this off to Kerberos, says have at it, and Kerberos does an AS-REQ.
The difference here is that Windows now doesn't care if the AS-REQ succeeds. It either did, and now it has a TGT to do SSO to on-prem stuff, or it didn't. Either way it's at the desktop and you still have your cloud creds for SSO to AAD.
The move to the cloud puts you in an interesting predicament though. Half your stuff is in the cloud, and half your stuff is still on-prem. However, now you're out and about or, given the current state of things, working from home because of a pandemic.
All the on-prem stuff isn't reeeeeally necessary for day-to-day work, so you don't set up a VPN and everything just kinda hums along while users access their email from the cloud using AAD on their hybrid joined device.
It's been 6 months and your password expired some time ago. You need to change it because stuff is starting to lock you out. The only way to change your password is through CTRL-ALT-DEL, but you need line of sight to a DC and you don't have a VPN ARRRRRGGG.
Sometime before 2012 Windows introduced this idea of KDC Proxy. The ability to tunnel KDC messages (AS/TGS/ChangePW) over an HTTP channel instead of requiring line of sight on port 88.
If configured, Windows will natively use HTTP(S) to proxy KDC messages. It's pretty cool. The way it works is also pretty simple. Windows is configured with a couple registry keys that map a realm to a proxy server.
Windows will attempt DC location and fail. It'll try DNS SRV _kerberos and fail. It'll check this registry location and go AHA!
The protocol is pretty simple. It's just the Kerberos binary format (ASN.1 DER encoded bytes) wrapped in a special KDC Proxy DER structure, and fired off in an HTTP POST.
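Here's a minimal sketch of that wrapper, assuming the KDC-PROXY-MESSAGE layout from the MS-KKDCP spec: a SEQUENCE holding the length-prefixed Kerberos message in field [0] and the target realm in field [1]. The optional DC locator hint field is left out. The tiny DER helpers are hand-rolled for illustration and the function names are made up; a real client would use a proper ASN.1 library.

```python
import struct


def der_len(n):
    """Encode a DER length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_tlv(tag, value):
    """Encode one tag-length-value element."""
    return bytes([tag]) + der_len(len(value)) + value


def kdc_proxy_message(kerb_message, realm):
    """Wrap raw Kerberos bytes (e.g. an AS-REQ) for an HTTP POST to a KDC proxy."""
    # The Kerberos message carries the same 4-byte big-endian length used over TCP.
    framed = struct.pack(">I", len(kerb_message)) + kerb_message
    kerb = der_tlv(0xA0, der_tlv(0x04, framed))                   # [0] OCTET STRING
    target = der_tlv(0xA1, der_tlv(0x1B, realm.encode("ascii")))  # [1] GeneralString
    return der_tlv(0x30, kerb + target)                           # SEQUENCE
```

The resulting bytes are what goes in the POST body. The proxy unwraps them, forwards the inner message to a KDC for the target realm, and wraps the reply the same way.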
This is somewhat less useful for AS and TGS because you're not connecting to internal resources, but there's another message flow: Change Password!
Changing a Password
The change password protocol is an extension of Kerberos. You start with an AS-REQ to the 'kadmin/changepw' service instead of krbtgt. This tells the KDC to ignore if the password is expired or required to change, because obviously we're gonna change it in a moment.
Then you send the service ticket to the changepw service on port 464. The message is an AP-REQ + an encrypted structure. The structure contains the new password, and is encrypted to the session key in the authenticator of the AP-REQ. See how it all ties back together?
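The framing of that message is small enough to sketch. This assumes the layout from RFC 3244: three big-endian 16-bit fields (total length, protocol version, AP-REQ length) followed by the AP-REQ and then the encrypted KRB-PRIV. Building the AP-REQ and KRB-PRIV themselves is out of scope here, so they're passed in as opaque bytes. `frame_change_password` is a hypothetical name.

```python
import struct

# Protocol version from RFC 3244; the older change-password draft used 1.
KPASSWD_VERSION = 0xFF80


def frame_change_password(ap_req, krb_priv, version=KPASSWD_VERSION):
    """Frame an AP-REQ and KRB-PRIV pair as a change-password request."""
    total = 6 + len(ap_req) + len(krb_priv)  # the length counts the 6-byte header too
    if total > 0xFFFF:
        raise ValueError("message too large for a 16-bit length field")
    header = struct.pack(">HHH", total, version, len(ap_req))
    return header + ap_req + krb_priv
```

The receiver uses the AP-REQ length to split the two parts back out, validates the ticket, and only then decrypts the KRB-PRIV to get the new password.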
In any case, now with a configured Windows device you can do CTRL-ALT-DEL and change your password when you're not on the corporate network. Neat.
And that concludes the introductory section to Kerberos.
.
.
.
In our next session we'll talk crypto and passwords because someone asked and brain dumps like this are kinda fun.
Way back in the beginning of time I mentioned the KDC encrypts the AS-REP client bit to the client password. Doesn't this mean the KDC needs to know the password??
I mean, yeah. Kinda.
In actual fact, what the KDC really needs to know is what the derived key is. Kerberos doesn't just take your password and encrypt stuff to it. Passwords are too weak. They need to be run through key-derivation functions, which is just a fancy name for hashing a thousand times.
The derivation functions are complicated. For AES+Sha1 you take the password and
hash = PBKDF2(password, salt, 4096 iterations)
constant = "kerberos"
ki = n-fold(constant)
key = AESCTS(ki, hash)    (encrypt ki using hash as the AES key)
See here if you can stomach it: https://github.com/dotnet/Kerberos.NET/blob/develop/Kerberos.NET/Crypto/AES/AESTransformer.cs
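The first step is easy to try at home. Here's a sketch of just the PBKDF2 part using Python's standard library, assuming HMAC-SHA1, a salt made of the realm followed by the user name, and 4096 iterations. The n-fold and AES steps are left out because the standard library has no AES, so the output here is the intermediate hash and not a usable Kerberos key. `aes_pbkdf2_step` is a made-up name.

```python
import hashlib


def aes_pbkdf2_step(password, salt, iterations=4096, key_size=32):
    """Run only the PBKDF2-HMAC-SHA1 step of the AES string-to-key derivation.

    key_size is 32 bytes for AES256 and 16 bytes for AES128.
    """
    return hashlib.pbkdf2_hmac(
        "sha1",
        password.encode("utf-8"),
        salt.encode("utf-8"),
        iterations,
        dklen=key_size,
    )


# Example salt: realm followed by the user name (illustrative values).
intermediate = aes_pbkdf2_step("P@ssw0rd!", "THREADABORT.NETjack")
```

Changing either the salt or the iteration count gives a completely different result, which is why the client needs that info from the KDC before it can derive the same key.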
AES+Sha1 is the most common key mode out there today, but there are others. DES, 3DES, RC4, AES+Sha256, Camellia, etc. DES, 3DES, and RC4 are deprecated by standards bodies.
RC4 is still in use by Windows by default for backwards compatibility, but we're trying to make it go away.
But back to the key thing. No, the KDC doesn't store the raw passwords. The KDC stores these derived keys. The client knows how to convert the password into these keys too, so when it needs to encrypt or decrypt something, it'll run through it with some info provided by the KDC.
That information, specifically the key salt, provided by the KDC is returned in an error to the original AS-REQ.
There's also another bit that I left out, which is this idea of pre-authentication. The key derivation functions described earlier aren't perfect. With enough effort, or a weak enough password, you can crack the derived keys.
If you couple that with a KDC that'll return a blob encrypted to that derived key, you get a nice little crypto oracle. You can take that blob offline, crack it, and now you have a password. Oops.
We need to make that first step of requesting the encrypted blob a little harder.
This is where pre-auth data comes in. Client does AS-REQ, KDC responds with an error, including that salt data, and says "by the way, you need to prove yourself. I support XYZ ways." One of those ways is an encrypted timestamp.
It's kinda simple. Take the salt you were just given, join it to the password and derive. Then take the current timestamp and encrypt it. Stick it into that pre-auth data section of the AS-REQ and away you go.
Much like on the TGS side of things, the AS sees the PA data, decrypts it, checks the timestamp is within the last few minutes and then returns whatever ticket was requested.
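The freshness check at the end is the easy part to show. Here's a small sketch, assuming the timestamp has already been decrypted and that the allowed clock skew is five minutes (a common default, but configurable). `timestamp_is_fresh` and `MAX_SKEW` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Assumed tolerance; real deployments can configure this.
MAX_SKEW = timedelta(minutes=5)


def timestamp_is_fresh(client_time, now=None, max_skew=MAX_SKEW):
    """Accept a decrypted pre-auth timestamp only if it is close to the KDC's clock."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Clocks can drift either way, so compare the absolute difference.
    return abs(now - client_time) <= max_skew
```

If the decryption fails or the timestamp is stale, the KDC returns an error instead of a ticket, so there's no encrypted blob to take offline.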
And so we have a handful of these hardness makers, or preauth types. I talked about FAST in this other thread:
It's Monday evening, the weather is great, and we're in the middle of a pandemic. Lets talk Kerberos! Or rather, it's little known nephew FAST and Armoring.
— Steve Syfuhs (@SteveSyfuhs) July 28, 2020
And I talked about PKINIT (certificates) and Windows Hello here:
Maybe lets dig into that a bit: PKINIT. It's almost a four letter word. How do you make a symmetric crypto protocol work with asymmetric keys? Well, first the client generates a Diffie-Hellman key pair and signs the public key using the client certificate private key.
— Steve Syfuhs (@SteveSyfuhs) August 24, 2020
And the Kerberos IETF working group is continuing to build out more options like SPAKE pre-auth.
Anyway, that's it. Here's Riley trying to follow along.