Let's talk Azure AD join and what that means to a Windows device. What's it mean to be joined to something?

Like all good things this is mostly correct, with a few details fuzzier than others for reasons: a) details are hard on twitter; b) details are fudged for greater clarity; c) maybe I'm just dumb.

Back in the early days of the universe PCs were workgroup machines, meaning everything stayed local to it, or they were domain joined, meaning they belonged to a domain.

Local is pretty self explanatory. Domain join is where a Domain Controller dictated things such as authentication, authorization, policy, and what not. This allows for centralized management of two or more machines. Neat.

I've gone into great detail about how authentication works on domain join.

A useful model to think about is the idea of an authority. A local workgroup machine is its own authority, meaning the local machine stores the passwords and does the auth. Domain join has the Domain Controller as the authority, meaning it needs a DC to bless the logon.

This authority more or less has final say over everything on that machine. There is exactly one authority in Windows.

If we jump ahead a decade or two we come across The Cloud and it forever changed how everything everywhere did things. For better or for worse.

Domain Joined machines didn't exactly fit well into this new world because of technical limitations of how authentication and management worked. They always need line of sight to a domain controller to get anything interesting done.

With The Cloud you don't need line of sight to your internal servers anymore because everything is out on the internet.

So we introduced Azure AD Join. That means we changed the authority from your on-prem domain controller to Azure AD. When you type in your password it gets verified by AAD, not AD. Let's talk about how that works.

Well, it turns out it works almost identically to domain join. Windows is kinda predictable like that.

Where it diverges is in which packages get used. Instead of msv1_0 and Kerberos, we have a new package: CloudAP. CloudAP is the thing that talks to AAD and MSA (formerly live[.]com and Passport).

Each of those exist as separate, but internal plugin implementations to CloudAP. We're focusing on the AAD plugin.

So you've typed your password in, the credential providers do their thing, they fire the credentials off to the LSA, and LSA iterates through all the APs.

It hits msv1_0 and Kerberos and both say 🤷‍♂️ "not our problem". It then hits CloudAP and it says "heck yes I can do something with this." So off CloudAP goes.
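
Here's a minimal sketch of that "who can do something with this?" loop. The three functions are hypothetical stand-ins for the real authentication packages, and the rule CloudAP uses to claim the credential is an assumption made up for illustration.

```python
from typing import Optional

# Hypothetical stand-ins for authentication packages. Each one takes a
# username and returns a logon result, or None to say "not our problem".
def msv1_0(username: str) -> Optional[str]:
    return None  # assumed: no matching local account on this AADJ device

def kerberos(username: str) -> Optional[str]:
    return None  # assumed: no on-prem domain authority configured

def cloud_ap(username: str) -> Optional[str]:
    # Assumed rule for the sketch: anything that looks like a UPN is ours.
    return f"CloudAP logon for {username}" if "@" in username else None

PACKAGES = [msv1_0, kerberos, cloud_ap]

def lsa_logon(username: str) -> Optional[str]:
    """Offer the credential to each package in turn; the first taker wins."""
    for package in PACKAGES:
        result = package(username)
        if result is not None:
            return result
    return None

print(lsa_logon("jack@threadabort.net"))
```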

Now CloudAP determines it's AAD, loads up that plugin, and begins the authentication dance. Of course, the first thing it does is check the cache, because that's how logon works. Do a fast check to get to the desktop, then a long check in the background to whatever authority.

However, if the last logon timestamp is less than a few hours ago, the long check is skipped because frankly it's kinda unnecessary every. single. time. But let's say this is the first logon, or it's been more than a few hours. How does the client authenticate to AAD?

Through OAuth! Believe it or not it's OAuth all the way down. A customized form of OAuth, but at its core it's completely compatible and per-spec.

Anyway, the first thing the plugin does is figure out where AAD lives. It turns out we have more than one AAD: public, regional, and government. The client was stamped with this information long ago, so in the end it knows it needs to hit https://login.microsoftonline.com/tid/token.

From there it determines if it can authenticate directly to AAD, or if it's a federated user and needs to go elsewhere. We'll come back to federated. Let's assume it's a regular managed user.

So the client knows where to go, and it first requests a nonce from AAD. This acts as a liveness check to make sure the request isn't a replay. It's a short randomly generated value.

The client takes the nonce plus the user's username and password and signs it with a device key that was registered when the machine was first joined.

It then takes that signed blob and fires it off to that AAD /token endpoint. AAD looks up the device, verifies the blob, validates the username and password (and makes sure they all live in the same tenant), and if all goes well forms a response.
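
A rough sketch of the nonce-then-signed-blob idea, heavily simplified. Everything here is an assumption for illustration: the `ToyAad` class, the JSON payload, and especially the HMAC, which stands in for the real device-key signature so the example only needs the standard library. It is not the actual wire format.

```python
import hashlib
import hmac
import json
import secrets

class ToyAad:
    """Hypothetical server side: issues nonces and verifies signed requests."""

    def __init__(self, device_keys, users):
        self.device_keys = device_keys  # device id -> key registered at join
        self.users = users              # username -> password (toy only)
        self.outstanding = set()        # nonces issued but not yet redeemed

    def issue_nonce(self) -> str:
        nonce = secrets.token_hex(16)
        self.outstanding.add(nonce)
        return nonce

    def token(self, device_id: str, payload: bytes, signature: bytes) -> bool:
        key = self.device_keys.get(device_id)
        if key is None:
            return False  # unknown device
        expected = hmac.new(key, payload, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, signature):
            return False  # blob was not signed by this device
        request = json.loads(payload)
        # A nonce is good for exactly one request, so a replayed blob fails here.
        if request["nonce"] not in self.outstanding:
            return False
        self.outstanding.remove(request["nonce"])
        return self.users.get(request["username"]) == request["password"]

def build_request(device_key: bytes, nonce: str, username: str, password: str):
    """Client side: bundle nonce + credentials and sign with the device key."""
    payload = json.dumps(
        {"nonce": nonce, "username": username, "password": password}
    ).encode()
    return payload, hmac.new(device_key, payload, hashlib.sha256).digest()

device_key = secrets.token_bytes(32)
aad = ToyAad({"device-1": device_key}, {"jack@threadabort.net": "hunter2"})
payload, signature = build_request(
    device_key, aad.issue_nonce(), "jack@threadabort.net", "hunter2"
)
print(aad.token("device-1", payload, signature))  # first use of the nonce
print(aad.token("device-1", payload, signature))  # same blob replayed
```

The point of the nonce shows up in the last two lines: the same signed blob is accepted once and rejected the second time.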

This response includes a Primary Refresh Token (PRT), an encrypted session key, and an ID Token. The PRT is kinda like your TGT. You use it to exchange it for tokens to other resources. The ID token is like that workstation ticket that tells the machine all about the user.

And that session key is special. The session key is encrypted to a device key that was registered way back when the device was first set up. This key is used to bind the PRT to the device because the session key is used when exchanging the PRT.

Now the client has a useful PRT so it stuffs it into the cache, decrypts the session key and also stuffs it into the cache, and then validates the ID token to log the user on. Within the ID token is useful information like user SID and what not.

All this bubbles up out of CloudAP, through to LSA so it can fill in all the session details, and off you go.

As I've said, at a high enough level this is identical to the other flows.

Now supposing you're an enterprise customer and you live in both AD and AAD. You need access to on-prem resources, how does this work?

It's kinda easy actually. Once Windows has proof from AAD that your credentials are good, LSA opens up the Kerberos AP, hands those creds to Kerberos and says "have at it", and then it's Kerberos all the way down again.

But here's where it diverges a bit more. When an application needed a Kerberos ticket it would call into the SSPI (nay GSS) library and ask for a ticket to some resource. We don't have an SSPI plugin for OAuth.

The reason for this is that SSPIs don't allow for UI (ish), whereas OAuth is inherently UI-driven for things like Consent, or in our case Conditional Access.

It also happens that SSPIs are error prone and kind of a PITA to use if you've never touched them before. So we built a new thing: Web Account Manager (WAM). It's more or less like SSPI, except it has a different API model and handles UI natively.

So when an application like Office or Teams or Edge or whatever needs an OAuth token it asks WAM for one, and if the same application needs a Kerberos ticket it asks SSPI.

Now the interesting thing is, what happens when an AADJ machine needs access to another AADJ machine for something like file shares, or RDP?

Well, it turns out we can't do Kerberos because the on-prem KDC doesn't know anything about the other machine (because AAD is the authority). So what do?

We do PKU2U! Which I discussed in the RDP thread.

Tl;dr; It's kinda like Kerberos (it's actually a copy), but instead of symmetric secrets it uses certificates, and instead of three parties it's two. Those certificates are issued by AAD. Go read the other thread for more information.

So now machine A and B are talking to each other using PKU2U. What's in these tickets so things like authorization can happen?

Let's talk about Windows' authorization for a moment. For the last thirty plus years Windows has relied on the same model, more or less. It all relies on this thing called the SID -- the security identifier. Everything has one: users, groups, computers, domains, etc.

A SID has a special form of S-1-AuthorityIdentifier-Authority1-Authority2-Authority3-Authority4-RelativeIdentifier. In other words S-1-5-21-111-222-333-555, where 111-222-333 represents your domain, and 555 represents you the user.

In on-prem Active Directory the form is S-1-{Domain}-{User}. This makes it super easy to identify things later on, and you immediately know what domain a user belongs to. This, however, is an incredibly painful design for AD internals because of how those RIDs are allocated.
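
To make the domain-plus-RID split concrete, here's a small sketch that pulls a string SID apart. The `Sid` class and `parse_sid` function are names invented for this example, not a Windows API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sid:
    revision: int
    identifier_authority: int
    sub_authorities: tuple

    @property
    def domain(self) -> str:
        """Everything except the final RID, i.e. the domain portion in AD."""
        parts = (self.revision, self.identifier_authority) + self.sub_authorities[:-1]
        return "S-" + "-".join(str(p) for p in parts)

    @property
    def rid(self) -> int:
        """The relative identifier: the last sub-authority."""
        return self.sub_authorities[-1]

def parse_sid(text: str) -> Sid:
    fields = text.split("-")
    if len(fields) < 4 or fields[0] != "S":
        raise ValueError(f"not a SID: {text!r}")
    numbers = [int(f) for f in fields[1:]]
    return Sid(numbers[0], numbers[1], tuple(numbers[2:]))

sid = parse_sid("S-1-5-21-111-222-333-555")
print(sid.domain)  # S-1-5-21-111-222-333
print(sid.rid)     # 555
```

Two users in the same AD domain share the `domain` part and differ only in `rid`, which is exactly the relationship that goes away with AAD-generated SIDs.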

Tl;dr; each DC gets a pool of RIDs within a range. DC1 gets RIDs 1000-1500, DC2 gets RIDs 1501-2000, etc. Eventually you run out because of this allocation mechanism and it's a bad bad bad bad bad day for anyone needing to recover from it.

This simply wouldn't scale in AAD, so we make up a SID and the entire thing represents the user. There is zero relationship to the domain or tenant. It's a fixed value so it's consistent for each user, but there's no relationship between users anymore.

This kinda makes it difficult to manage authorization rules. Mea culpa -- we're working on making it better, promise!

Anyway, as it turns out, we store these SIDs in the certificate, not the PKU2U ticket, so the far machine must parse the certificate, extract these details and we now have something that can turn into an NT token.

As you might imagine this gets a bit complicated when connecting from domain-joined to AADJ machines. That's because the AADJ machine has no knowledge of the SIDs in the Kerberos ticket PAC, and in fact doesn't even have a key to decrypt the ticket, so there's no ticket at all.

The domain-joined machine might not be aware of PKU2U, so depending on a whole bunch of conditions authentication might succeed, or might not. Either way the SIDs don't match anything, so you're back to maybe authenticating but not being authorized.

Wheeeeeeee. We're working on it.

So that in a nutshell is AADJ. Let's look at the other thing that people tend to get confused about, which is Hybrid Join.

Remember those authority things? We have a domain authority, and we have an AAD authority. Hybrid join uses the *domain* authority. Full stop. End of the line. What makes it hybrid?

The hybrid story here is about management and SSO. The hybrid joined machine can be managed by Intune/MDM or Group Policy. It gets SSO support to cloud resources AND on-prem resources, but no matter what the domain is the authorizing thing.

The way this works is you push a group policy configuration down to each of your domain joined machines, and this tells them to start connecting to AAD and registering themselves. This registration creates the device in AAD which registers some keys.

Then the next time the user logs on it goes through the same old dance. Credential goes to credential provider, CP goes to LSA, LSA says "who can do something about this??" MSV1_0 and Kerberos say "ayyyye" and do their thing.

But now once those are done CloudAP jumps up and exclaims it too can do something!!! And so it does. The difference is if Kerberos fails, it doesn't move on to AAD, there's no cache involved for CloudAP, etc.

So hybrid gets your WAM for SSO, but you're still relying on your on-prem domain to do things.

Why don't we make hybrid allow you to log in with AAD (as described by authority)? Well, it's a little more mundane than folks would think: because it would be impossible for everyone to reason about or manage.

Have you ever had two bosses? They have competing agendas and priorities and many times they're in conflict. Trying to unravel it is an exercise in madness. So if you want cloud to be your authority you should consider switching to AADJ. Your stress levels will thank you.

Speaking of bosses, here's Bruce reviewing these threads and wondering why I'm not coding.

At its core Kerberos is an authenticated key agreement protocol based on the Needham-Schroeder protocol.

We'll keep the crypto to a minimum because I'm kind of dumb, but here it is in its most basic form. Wikipedia has everything you need on the crypto aspects of it.

Anyway, the point of this key agreement protocol is so two different parties, A and B, can prove to one another that they have knowledge of the same key without leaking the key to the other party, all the while proving each party is who they say they are.

Why is this the basis for authentication? Because we don't actually care about how a user authenticated (I mean, we do, but...). What we really care about is the application communicating between client A and server B is doing so securely. This is done by using that key.

Remember, we're talking about a time long before SSL and TLS were invented, so this key is *the thing* securing all communications.

It turns out this agreement thing is incredibly difficult to do securely with just two parties (at scale, at least). So we introduce a third party C. Both A and B trust C, so C can act as a go-between for both parties.

If client A can authenticate to C, C can prove to B that A is actually A. Then, C can prove to A that B is actually B. This way A and B really don't need to know anything about each other, other than C says it's all good. Let's build this into a protocol: Kerberos.

Okay, so we have our three parties: The client (user/human), the application (say SMB share), and the trusted third party (KDC).

The client knows it needs to talk to the application, so it begins by speaking to the KDC. The client and KDC don't implicitly trust each other, so they need to prove to one another that they are who they say they are. This is done by doing an authentication request (AS-REQ).

An AS-REQ is a message sent to the authentication service (AS). All the AS does is exchange credentials for tickets. These credentials can be anything, but are often passwords.

You'll notice there's nothing in the AS-REQ that includes any password information. That's because of the key agreement thing -- don't leak the key. The KDC happens to know the password, so it generates a response (AS-REP).

The AS-REP includes two things: an encrypted ticket, and an encrypted client blob. The encrypted client blob is encrypted using the user password. The KDC has now proven to the client it is the KDC because only the KDC knows the password.

Within the blob is metadata about that other (double) encrypted thing -- the ticket. One other part of the metadata is this thing called the session key. The session key encrypts the doubly-encrypted ticket.

The client decrypts this ticket and gets, well, not a lot -- another encrypted blob. That's okay though because the client has everything it needs to set up a session with application B now. This might be confusing because I'm leaving a flow out, but we'll get there. Trust me.

This opaque blob of a ticket was encrypted by the KDC to the target application's long-term key -- its password. The application knows the password so it can decrypt it, and since the KDC knew the password too, the application knows the KDC is the KDC.

Within the blob is a whole bunch of metadata, in fact the same metadata in the client blob. Including that same key.

Aha. So now client and the application know a key only they know (well, the KDC too, hold that thought though). This means the client and application can talk to each other securely! Woohoo!
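
Here's a toy model of that exchange, just to show how both sides end up holding the same session key. Loud assumptions: `seal`/`unseal` are a throwaway hash-based cipher (not Kerberos crypto), `long_term_key` is a single SHA-256 instead of real key derivation, the metadata is JSON instead of ASN.1, and the extra session-key layer around the ticket is left out.

```python
import hashlib
import hmac
import json
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def seal(key: bytes, plaintext: bytes) -> bytes:
    """Toy authenticated encryption: XOR keystream plus an HMAC tag."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag

def unseal(key: bytes, blob: bytes) -> bytes:
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered blob")
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))

def long_term_key(password: str) -> bytes:
    return hashlib.sha256(password.encode()).digest()  # stand-in derivation

class ToyKdc:
    def __init__(self, principals):
        # The KDC knows every principal's long-term key.
        self.keys = {name: long_term_key(pw) for name, pw in principals.items()}

    def as_req(self, client: str, service: str):
        session_key = secrets.token_bytes(32)
        metadata = json.dumps(
            {"client": client, "service": service, "session_key": session_key.hex()}
        ).encode()
        client_blob = seal(self.keys[client], metadata)  # only the client can open
        ticket = seal(self.keys[service], metadata)      # only the service can open
        return client_blob, ticket

kdc = ToyKdc({"jack": "P@ssw0rd", "host/some-app": "service-secret"})
client_blob, ticket = kdc.as_req("jack", "host/some-app")

client_view = json.loads(unseal(long_term_key("P@ssw0rd"), client_blob))
service_view = json.loads(unseal(long_term_key("service-secret"), ticket))
print(client_view["session_key"] == service_view["session_key"])
```

Neither side ever sent its password anywhere, and the client can't read the ticket it's carrying. It just knows the application will be able to.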

Hold up you say! I'm missing a huge important piece -- the TGS-REQ. Yep, let's talk about that.

Remember how I said the only job of the AS-REQ is to exchange credentials for tickets? Well it's true. You can say AS-REQ => (creds, 'host/some-app') and the KDC will happily oblige. It turns out this is incredibly inefficient and possibly insecure though.

Credentials are super secret. We don't want them around all the time, and we do some heavy crypto with those creds before we can encrypt stuff to them, so that could just be resource intensive. So what do?

What we do is an AS-REQ and we ask for a special service ticket to a special service called the Ticket-Granting-Service (TGS), i.e. krbtgt. This solves both credential problems. How does a TGS-REQ work?

The TGS-REQ is almost identical to the AS-REQ. In fact the message structure is identical. The difference is that we include a special thing called pre-auth data, which is that encrypted ticket (plus goo, we'll get into it).

The TGS is almost always the same server as the AS, it just has a logically different purpose. The REQ message contains the name of the requested service (host/service.threadabort.net) in this case, plus the ticket granting ticket in the preauth-data.

The TGS receives this message and first checks the preauth-data, extracting that TGT. The TGT is encrypted to the krbtgt long-term key -- its password. Only the KDC knows the krbtgt password, so it knows it's genuine and came from itself previously.

Once decrypted we now have that session key that only the client (and now TGS) knows. The KDC generates a response (REP) and instead of encrypting the client metadata to the client password, it encrypts it to the session key.

And so we're back to the client. The client receives the REP, decrypts the client metadata blob, and extracts the key. The key is used to decrypt the doubly-encrypted ticket blob, and voila the client now has a ticket and a key it can send to the application.

But we glossed over the sending-to-the-application part. It turns out this is partially undefined, so that the application protocol (say SMB) can include the ticket however it needs to. Mostly. We have some housekeeping first, though.

That housekeeping is the act of converting the encrypted ticket blob into an application request (AP-REQ). Why do this, and not just send the ticket? Well, the AP-REQ serves a few purposes: it provides a way to prevent message replay, and it includes a way to switch up keys.

An AP-REQ is made up of two things: a ticket and an authenticator. We already know the ticket -- it's encrypted to the application long term key, and contains a special session key. The authenticator is special however.

The authenticator is encrypted with the ticket session key. The application should decrypt it because it includes additional metadata, like a sequence number.

This sequence number can be tracked by the application. If it sees the same number more than once it can treat it as a replay and kill the second attempt. See, each time a client kicks off a request it must generate a new AP-REQ, that way the application can prevent replay.
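
The tracking described above can be sketched as a tiny replay cache. This is an illustration only: the `ReplayCache` name and the choice to key on (client, sequence number) are assumptions, and a real implementation would also expire old entries.

```python
class ReplayCache:
    """Remembers which authenticators the application has already accepted."""

    def __init__(self):
        self._seen = set()

    def accept(self, client: str, sequence_number: int) -> bool:
        entry = (client, sequence_number)
        if entry in self._seen:
            return False  # same authenticator seen before: treat as a replay
        self._seen.add(entry)
        return True

cache = ReplayCache()
print(cache.accept("jack@THREADABORT.NET", 1001))  # True, first time
print(cache.accept("jack@THREADABORT.NET", 1001))  # False, replayed
print(cache.accept("jack@THREADABORT.NET", 1002))  # True, fresh AP-REQ
```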

The other important bit of information in this authenticator is a sub-session key. This key is a special key that only the client (and now the application) know, so you can guarantee only the two parties know it without the KDC knowing it. Neat.

The application may or may not decide it wants to switch to the sub-session key. Regardless of that, it still needs to do one final thing: respond to the client with an AP-REP.

The AP-REP is mostly just an ACK. It contains a blob encrypted to the (sub-)session key, and this tells the client that the application is really who they say they are because it could decrypt the ticket issued by the KDC, and therefore the authenticator.

But then it has one last nugget: yet another sub-session key. It turns out the application may decide it wants to use a different key for whatever reason it wants. Now any future communications between client and application are encrypted using keys that can be authenticated.

BUT WAIT THERE'S MORE!

I go into great detail about how this all fits into the Windows logon process in this other thread, but maybe we can go a bit deeper?

When a client, like Windows, decides it wants to do Kerberos it first needs to find a KDC. There are a couple of ways this can work.

The first is pretty straightforward: hardcode a list of KDCs. This is how many clients work. MIT Kerberos for instance supports this.

My Kerberos .NET (kerberos.dev) library also supports this using the same schema.

The other way is through DNS lookups. Many clients support this, but often as a secondary approach because DNS isn't entirely secure (at least it wasn't, at the time). This works by looking for an SRV record _kerberos._transport.realm.net. The results are KDCs.
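
As a sketch of the SRV approach: build the query names for a realm, then order whatever records come back. The actual DNS query is left out because the standard library has no SRV resolver. `SrvRecord` and both functions are made up for this example, and the ordering is a simplification (lowest priority first, then highest weight, rather than the weighted random selection SRV clients normally use).

```python
from typing import NamedTuple

class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str

def kerberos_srv_names(realm: str):
    """Names a client would query for KDCs, one per transport."""
    return [f"_kerberos._{transport}.{realm.lower()}" for transport in ("udp", "tcp")]

def order_kdcs(records):
    """Simplified ordering: lowest priority first, then highest weight."""
    ranked = sorted(records, key=lambda r: (r.priority, -r.weight))
    return [f"{r.target}:{r.port}" for r in ranked]

print(kerberos_srv_names("THREADABORT.NET"))

# Hypothetical answers to the SRV query.
answers = [
    SrvRecord(10, 50, 88, "dc2.threadabort.net"),
    SrvRecord(0, 100, 88, "dc1.threadabort.net"),
]
print(order_kdcs(answers))
```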

Windows prefers another approach entirely because hardcoding doesn't scale well when you have thousands of DCs, and DNS doesn't include enough detail. What it does is something called DC location using the DC locator service.

The DC locator works by first querying DNS for some LDAP SRV records. These list all (K)DCs in the domain, and the locator goes through each record it finds and makes a UDP LDAP call to the server.

The server either responds immediately, or the request times out quickly and moves on to the next record. The locator checks each record for KDC info and tries to find one that's close enough based on IP subnet and the registered AD site.
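
A very loose sketch of that loop, to show the shape of it rather than the real algorithm. The `Candidate` type, the injected `ping` callable (standing in for the UDP LDAP call and its timeout), and the "same site wins, otherwise first responder" rule are all assumptions for illustration.

```python
from typing import Callable, Iterable, NamedTuple, Optional

class Candidate(NamedTuple):
    host: str
    site: str

def locate_dc(
    candidates: Iterable[Candidate],
    ping: Callable[[str], bool],
    client_site: str,
) -> Optional[Candidate]:
    """Walk the SRV results; prefer a responsive DC in the client's own site."""
    fallback = None
    for candidate in candidates:
        if not ping(candidate.host):
            continue  # timed out, move on to the next record
        if candidate.site == client_site:
            return candidate  # close enough, stop looking
        if fallback is None:
            fallback = candidate  # remember the first DC that answered
    return fallback

# Hypothetical data: dc1 is down, dc2 is up but far away, dc3 is up and local.
records = [
    Candidate("dc1.threadabort.net", "Seattle"),
    Candidate("dc2.threadabort.net", "London"),
    Candidate("dc3.threadabort.net", "Seattle"),
]
alive = {"dc2.threadabort.net", "dc3.threadabort.net"}
print(locate_dc(records, lambda host: host in alive, "Seattle"))
```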

As it narrows the list down it then looks to make sure the DC supports the things it needs. There's a couple dozen options to choose from and they're documented here: https://docs.microsoft.com/en-us/windows/win32/api/dsgetdc/nf-dsgetdc-dsgetdcnamea.

/// <summary>
/// Forces cached domain controller data to be ignored.
/// </summary>
DS_FORCE_REDISCOVERY = 1 << 0,

/// <summary>
/// Requires that the returned domain controller support directory services.
/// </summary>
DS_DIRECTORY_SERVICE_REQUIRED = 1 << 4,

/// <summary>
/// Attempts to find a domain controller that supports directory service functions.
/// </summary>
DS_DIRECTORY_SERVICE_PREFERRED = 1 << 5,

/// <summary>
/// Requires that the returned domain controller be a global catalog server for
/// the forest of domains with this domain as the root.
/// </summary>
DS_GC_SERVER_REQUIRED = 1 << 6,

/// <summary>
/// Requires that the returned domain controller be the primary domain controller for the domain.
/// </summary>
DS_PDC_REQUIRED = 1 << 7,

/// <summary>
/// Requests that cached domain controller data should be used.
/// </summary>
DS_BACKGROUND_ONLY = 1 << 8,

/// <summary>
/// This parameter indicates that the domain controller must have an IP address.
/// </summary>
DS_IP_REQUIRED = 1 << 9,

/// <summary>
/// Requires that the returned domain controller be currently running the Kerberos Key Distribution Center service.
/// </summary>
DS_KDC_REQUIRED = 1 << 10,

/// <summary>
/// Requires that the returned domain controller be currently running the Windows Time Service.
/// </summary>
DS_TIMESERV_REQUIRED = 1 << 11,

/// <summary>
/// Requires that the returned domain controller be writable; that is, host a writable copy of the directory service.
/// </summary>
DS_WRITABLE_REQUIRED = 1 << 12,

/// <summary>
/// Attempts to find a domain controller that is a reliable time server.
/// </summary>
DS_GOOD_TIMESERV_PREFERRED = 1 << 13,

/// <summary>
/// Specifies that the returned domain controller name should not be the current computer.
/// </summary>
DS_AVOID_SELF = 1 << 14,

/// <summary>
/// Specifies that the server returned is an LDAP server.
/// </summary>
DS_ONLY_LDAP_NEEDED = 1 << 15,

/// <summary>
/// Specifies that the DomainName parameter is a flat name. This flag cannot be combined with the DS_IS_DNS_NAME flag.
/// </summary>
DS_IS_FLAT_NAME = 1 << 16,

/// <summary>
/// Specifies that the DomainName parameter is a DNS name. This flag cannot be combined with the DS_IS_FLAT_NAME flag.
/// </summary>
DS_IS_DNS_NAME = 1 << 17,

/// <summary>
/// Attempts to find a domain controller in the same site as the caller otherwise attempts to resolve the next closest site.
/// </summary>
DS_TRY_NEXTCLOSEST_SITE = 1 << 18,

/// <summary>
/// Requires that the returned domain controller be running Windows Server 2008 or later.
/// </summary>
DS_DIRECTORY_SERVICE_6_REQUIRED = 1 << 19,

/// <summary>
/// Requires that the returned domain controller be currently running the Active Directory web service.
/// </summary>
DS_WEB_SERVICE_REQUIRED = 1 << 20,

/// <summary>
/// Requires that the returned domain controller be running Windows Server 2012 or later.
/// </summary>
DS_DIRECTORY_SERVICE_8_REQUIRED = 1 << 21,

/// <summary>
/// Requires that the returned domain controller be running Windows Server 2012 R2 or later.
/// </summary>
DS_DIRECTORY_SERVICE_9_REQUIRED = 1 << 22,

/// <summary>
/// Requires that the returned domain controller be running Windows Server 2016 or later.
/// </summary>
DS_DIRECTORY_SERVICE_10_REQUIRED = 1 << 23,

//////////////////////////////////////////////////////////////////////////////////////////////

/// <summary>
/// Specifies that the names returned should be DNS names.
/// </summary>
DS_RETURN_DNS_NAME = 1 << 30,

/// <summary>
/// Specifies that the names returned should be flat names.
/// </summary>
DS_RETURN_FLAT_NAME = unchecked(1U << 31)

Eventually it finds a DC that it likes and returns the address to the Kerberos stack.

Incidentally this functionality is why we can do amazing things with FAST, or Windows Hello, or FIDO and only require a handful of DCs running the latest bits instead of all of them. The client knows it needs a KDC that supports e.g. FIDO, so it just asks DC Locator for one.

However, suppose Windows isn't talking to an Active Directory domain. Maybe it's talking to an MIT or Heimdal KDC. They don't support DC Locator (well, probably don't?), so Windows will fall back to looking up the DNS SRV _kerberos records.

You'll notice I haven't said a thing about domain-joined vs workgroup vs AADJ vs whatever. This is by design because Windows doesn't give a crap about whether you're any of those (I mean, it does, but...). Windows is just another Kerberos client.

All Windows needs to make Kerberos work is have a user principal name, a credential, and a realm. It can guess about your realm based on your username. If I type jack@threadabort.net it's plausible my realm is THREADABORT.NET. At least, it's worth trying.

So Windows will DC locate threadabort.net: it'll jump down the LDAP rabbit hole, and then it'll move on to straight DNS, then it'll even try HTTP (we'll get to that in a bit).

Eventually it'll probably find a KDC, and Windows will do the AS-REQ to get a krbtgt ticket, then a TGS-REQ to the service in question. No domain-joined'ness involved.

So what does domain join do? First it configures your desktop authority. That means Windows will create a desktop and logon session for you if the configured domain gives you a ticket to itself.

As it happens there's nothing inherently special about that last process, beyond knowing what knobs to turn. If you call all the correct APIs you too could create a logon session without being joined to that domain.

The second thing joining the domain does is it provides a bunch of hints to the Kerberos logon process. If my UPN is jack@threadabort.net, and my realm is THREADABORT.NET, Windows can reason through that. It can't reason through it if my realm is CORP.THREADABORT.NET.

Domain join provides these hints to say that if it can't find the domain through its standard process, try the machine's configured domain. It's either the right one, or maybe it's a domain across a trust.
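
Roughly, the realm guessing plus the domain-join hint looks like this. The function name and the "UPN suffix first, joined domain second" order are assumptions for the sketch. The real logic has more steps than this.

```python
from typing import List, Optional

def realm_candidates(upn: str, joined_domain: Optional[str] = None) -> List[str]:
    """Realms worth trying, in order: the UPN suffix, then the machine's domain."""
    candidates = []
    if "@" in upn:
        suffix = upn.rsplit("@", 1)[1]
        if suffix:
            candidates.append(suffix.upper())
    if joined_domain and joined_domain.upper() not in candidates:
        candidates.append(joined_domain.upper())
    return candidates

# Workgroup machine: all we have is the guess from the username.
print(realm_candidates("jack@threadabort.net"))

# Domain-joined machine: the configured domain is the fallback hint.
print(realm_candidates("jack@threadabort.net", "corp.threadabort.net"))
```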

The third thing of course is kinda sorta related to the last one, but is more generically about policy: group policy. It's about management of the device from a centralized location.

And the inverse? An Active Directory realm with a Linux or Mac client? It pretty much works the same. 

Have user, have cred, have realm. 

Go find KDC for realm.

Do AS-REQ, TGS-REQ.

Connect to App.

Profit.

Where Windows differentiates itself here is the out-of-the-box capability to join the domain and be managed, and use the creds the user typed in to get to the desktop to make SSO magical behind the scenes.

Mac and Linux definitely have this capability, which can be configured through, say MDM. They kinda sorta work the same way. They start setting device defaults and optimistically using those values and creds.

But back to Windows. That covers domain join, which is starting to show its age. These days we have hybrid join, and AAD join. All of these do Kerberos to on-prem DCs.

In the hybrid case, you're still domain joined. Nothing special about that. AADJ is something else entirely though.

AADJ changes the desktop authority from your on-prem domain to Azure AD. It uses a different authentication provider (CloudAP instead of Kerberos). But it still does Kerberos with SSO. How?

Well, it's deceptively simple. Remember how I said Windows doesn't much care what domain state it's in? If you give it a cred it'll just do Kerberos. AADJ works the same way. The difference is that when CloudAP gets a successful response back from AAD, it includes metadata.

That metadata includes useful information like your real realm name, and your fully qualified UPN. So even if I typed jack@threadabort.net, it'll return jack@corp.threadabort.net+CORP.THREADABORT.NET. CloudAP hands this off to Kerberos, says have at it, and Kerberos does an AS-REQ.

The difference here is that Windows now doesn't care if the AS-REQ succeeds. It either did, and now it has a TGT to do SSO to on-prem stuff, or it doesn't. Either way it's at the desktop and you still have your cloud creds for SSO to AAD.

The move to the cloud puts you in an interesting predicament though. Half your stuff is in the cloud, and half your stuff is still on-prem. However, now you're out and about, or given the current state of things working from home because of a pandemic.

All the on-prem stuff isn't reeeeeally necessary for day-to-day work, so you don't set up a VPN and everything just kinda hums along while users access their email from the cloud using AAD on their hybrid joined device.

It's been 6 months and your password expired some time ago. You need to change it because stuff is starting to lock you out. The only way to change your password is through CTRL-ALT-DEL, but you need line of sight to a DC and you don't have a VPN ARRRRRGGG.

Sometime before 2012 Windows introduced this idea of KDC Proxy. The ability to tunnel KDC messages (AS/TGS/ChangePW) over an HTTP channel instead of requiring line of sight on port 88.

If configured, Windows will natively use HTTP(S) to proxy KDC messages. It's pretty cool. The way it works is also pretty simple. Windows is configured with a couple registry keys that map a realm to a proxy server.

Windows will attempt DC location and fail. It'll try DNS SRV _kerberos and fail. It'll check this registry location and go AHA!

The protocol is pretty simple. It's just the Kerberos binary format (ASN.1 DER encoded bytes) wrapped in a special KDC Proxy DER structure, and fired off in an HTTP POST.
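
Here's a sketch of that wrapping, hand-rolling just enough DER to build the outer structure. It assumes the layout from the published KDC Proxy protocol spec as I understand it: a SEQUENCE holding the length-prefixed Kerberos message as `[0] OCTET STRING` and the realm as `[1] GeneralString`. The optional DC locator hint field is left out, the helper names are invented, and nothing is actually sent over HTTP.

```python
import struct

def der_length(n: int) -> bytes:
    """DER length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(raw)]) + raw

def der(tag: int, content: bytes) -> bytes:
    """One tag-length-value element."""
    return bytes([tag]) + der_length(len(content)) + content

def kdc_proxy_message(kerberos_message: bytes, realm: str) -> bytes:
    # The Kerberos bytes carry the same 4-byte big-endian length prefix
    # they would have over TCP.
    framed = struct.pack(">I", len(kerberos_message)) + kerberos_message
    kerb_message = der(0xA0, der(0x04, framed))                  # [0] OCTET STRING
    target_domain = der(0xA1, der(0x1B, realm.encode("ascii")))  # [1] GeneralString
    return der(0x30, kerb_message + target_domain)               # SEQUENCE

# Placeholder bytes standing in for a DER-encoded AS-REQ.
body = kdc_proxy_message(b"\x6a\x03abc", "THREADABORT.NET")
print(body.hex())
```

The resulting bytes would be the body of the HTTP POST to the proxy.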

This is somewhat less useful for AS and TGS because you're not connecting to internal resources, but there's another message flow: Change Password!

The change password protocol is an extension of Kerberos. You start with an AS-REQ to the 'kadmin/changepw' service instead of krbtgt. This tells the KDC to ignore if the password is expired or required to change, because obviously we're gonna change it in a moment.

Then you send the service ticket to the changepw service on port 464. The message is an AP-REQ + an encrypted structure. The structure contains the new password, and is encrypted to the session key in the authenticator of the AP-REQ. See how it all ties back together?

In any case, now with a configured Windows device you can do CTRL-ALT-DEL and change your password when you're not on the corporate network. Neat.

And that concludes the introductory section to Kerberos.

.

.

.

In our next session we'll talk crypto and passwords because someone asked and brain dumps like this are kinda fun.

Way back in the beginning of time I mentioned the KDC encrypts the AS-REP client bit to the client password. Doesn't this mean the KDC needs to know the password??

I mean, yeah. Kinda.

In actual fact, what the KDC really needs to know is what the derived key is. Kerberos doesn't just take your password and encrypt stuff to it. Passwords are too weak. They need to be run through key-derivation functions, which is just a fancy name for hashing a thousand times.

The derivation functions are complicated. For AES+Sha1 you take the password and

hash = PBKDF2(password, salt, 4096 iterations)

constant = dk(...)

ki = n-fold(constant)

key = AESCTS(ki, hash)

See here if you can stomach it: https://github.com/dotnet/Kerberos.NET/blob/develop/Kerberos.NET/Crypto/AES/AESTransformer.cs
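
For a feel of the first two ingredients, here's a standard-library-only sketch of the PBKDF2 step and the n-fold operation. It stops short of the final AES step because Python's standard library has no AES, so this is not a complete string-to-key. The function names and the salt shown (realm plus username) are illustrative assumptions.

```python
import hashlib
from math import gcd

def pbkdf2_step(password: str, salt: str, iterations: int = 4096, key_bytes: int = 32) -> bytes:
    """The 'hash the password a few thousand times' step (PBKDF2-HMAC-SHA1)."""
    return hashlib.pbkdf2_hmac(
        "sha1", password.encode("utf-8"), salt.encode("utf-8"), iterations, key_bytes
    )

def _rotate_right(value: int, bits: int, width: int) -> int:
    bits %= width
    return ((value >> bits) | (value << (width - bits))) & ((1 << width) - 1)

def n_fold(data: bytes, out_bytes: int) -> bytes:
    """Stretch or shrink data to out_bytes by rotating, repeating, and summing."""
    in_bits, out_bits = len(data) * 8, out_bytes * 8
    lcm_bits = in_bits * out_bits // gcd(in_bits, out_bits)
    value = int.from_bytes(data, "big")
    # Concatenate copies of the input, each rotated right 13 bits more than the last.
    big = 0
    for i in range(lcm_bits // in_bits):
        big = (big << in_bits) | _rotate_right(value, 13 * i, in_bits)
    # Add the out_bits-sized chunks together with end-around carry.
    mask = (1 << out_bits) - 1
    total = 0
    while big:
        total += big & mask
        big >>= out_bits
    while total > mask:
        total = (total & mask) + (total >> out_bits)
    return total.to_bytes(out_bytes, "big")

intermediate = pbkdf2_step("P@ssw0rd", "THREADABORT.NETjack")
folded_constant = n_fold(b"kerberos", 16)
print(intermediate.hex())
print(folded_constant.hex())
```

The last step, which isn't shown, encrypts the folded constant with AES using the PBKDF2 output as the key. That result is what gets used as the long-term key.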

AES+Sha1 is the most common key mode out there today, but there are others. DES, 3DES, RC4, AES+Sha256, Camellia, etc. DES, 3DES, and RC4 are deprecated by standards bodies.

RC4 is still in use by Windows by default for backwards compatibility, but we're trying to make it go away.

But back to the key thing. No, the KDC doesn't store the raw passwords. The KDC stores these derived keys. The client knows how to convert the password into these keys too, so when it needs to encrypt or decrypt something, it'll run the password through the same derivation using some info provided by the KDC.

That information, specifically the key salt, is returned by the KDC in an error response to the original AS-REQ.

There's also another bit that I left out, which is this idea of pre-authentication. The keys that come out of the derivation process described earlier aren't perfect. With enough effort, or a weak enough password, you can crack them.

If you couple that with a KDC that'll return a blob encrypted to that derived key, you get a nice little crypto oracle. You can take that blob offline, crack it, and now you have a password. Oops.

We need to make that first step of requesting the encrypted blob a little harder.

This is where pre-auth data comes in. Client does AS-REQ, KDC responds with an error, including that salt data, and says "by the way, you need to prove yourself. I support XYZ ways." One of those ways is an encrypted timestamp.

It's kinda simple. Take the salt you were just given, join it to the password and derive. Then take the current timestamp and encrypt it. Stick it into that pre-auth data section of the AS-REQ and away you go.

Much like on the TGS side of things, the AS sees the PA data, decrypts it, checks the timestamp is within the last few minutes and then returns whatever ticket was requested.
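
Here's a small sketch of just the timestamp part of that check, with the encryption left out entirely. It assumes the timestamp travels in the usual Kerberos time format (YYYYMMDDHHMMSSZ in UTC) and that "the last few minutes" means a five minute window in either direction, which is a common default but still an assumption here. All of the names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)  # assumed window; deployments can configure this


def kerberos_time(moment: datetime) -> str:
    """Format a datetime the way the client would before encrypting it."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")


def parse_kerberos_time(text: str) -> datetime:
    """Parse the decrypted timestamp back into an aware UTC datetime."""
    return datetime.strptime(text, "%Y%m%d%H%M%SZ").replace(tzinfo=timezone.utc)


def timestamp_acceptable(client_stamp: str, now: datetime, max_skew: timedelta = MAX_SKEW) -> bool:
    """KDC-side check: is the client's timestamp close enough to our clock?"""
    return abs(now - parse_kerberos_time(client_stamp)) <= max_skew


kdc_now = datetime(2020, 9, 11, 12, 0, 0, tzinfo=timezone.utc)
fresh = kerberos_time(kdc_now - timedelta(minutes=2))
stale = kerberos_time(kdc_now - timedelta(minutes=10))
print(fresh, timestamp_acceptable(fresh, kdc_now))
print(stale, timestamp_acceptable(stale, kdc_now))
```

In the real exchange the KDC only gets to this check after the decryption succeeds, and a failed decryption is what tells it the client doesn't know the password.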

And so we have a handful of these hardness makers, or preauth types. I talked about FAST in this other thread:

And I talked about PKINIT (certificates) and Windows Hello here:

And the Kerberos IETF working group is continuing to build out more options like SPAKE pre-auth.

Anyway, that's it. Here's Riley trying to follow along.

[Image: Riley]

A new command line tool called Bruce has been created for managing Kerberos and Kerberos.NET... things.

The Kerberos.NET library is incredibly flexible in what it allows you to do with Kerberos-related things...in code. However, sometimes you don't want to write a bunch of boilerplate code and just want to test out a few scenarios or set up an environment that's compatible with MIT Kerberos, Heimdal, or Windows.

In an attempt to make things easier I recently built and published a new command line tool called Bruce*. It follows the MIT and Heimdal model for tooling. There's a bunch of commands separated into logical purposes:

  1. kinit -- Obtain and cache a Kerberos Ticket-Granting-Ticket.
  2. klist -- Displays the current list of tickets in the cache.
  3. kconfig -- Set or review configuration settings at the default path.
  4. kdestroy -- Delete the ticket cache.

The Bruce tool is a simple shell-like environment that lets you call the commands within itself or from its own command line.


  ____
 | __ ) _ __ _   _  ___ ___
 |  _ \| '__| | | |/ __/ _ \
 | |_) | |  | |_| | (_|  __/
 |____/|_|   \__,_|\___\___|


Command Line tooling for the Kerberos.NET library. (v4.5.14+0131d7e8b6)

(C) Copyright 2020 .NET Foundation

bruce>kinit

Password for steve@SYFUHS.NET: ******************************

Ticket Count: 1

#0>                 Client:  steve @ SYFUHS.NET
                    Server:  krbtgt/SYFUHS.NET @ SYFUHS.NET
              Ticket EType:  AES256_CTS_HMAC_SHA1_96
                     Flags:  EncryptedPreAuthentication, PreAuthenticated, Initial, Renewable, Forwardable
                Start Time:  12/31/1969 4:00:00 PM -08:00
                  End Time:  9/11/2020 9:37:15 PM -07:00
               Renew Until:  9/12/2020 11:37:15 AM -07:00

bruce>_

For added flexibility you can also copy bruce.exe and rename the copy to any of the above commands, and it'll act as if it were its own self-contained command.


C:\dev\>copy bruce.exe kinit.exe
C:\dev\>kinit.exe
Password for steve@SYFUHS.NET: ******************************

Ticket Count: 1

#0>                 Client:  steve @ SYFUHS.NET
                    Server:  krbtgt/SYFUHS.NET @ SYFUHS.NET
              Ticket EType:  AES256_CTS_HMAC_SHA1_96
                     Flags:  EncryptedPreAuthentication, PreAuthenticated, Initial, Renewable, Forwardable
                Start Time:  12/31/1969 4:00:00 PM -08:00
                  End Time:  9/11/2020 9:43:55 PM -07:00
               Renew Until:  9/12/2020 11:43:55 AM -07:00

C:\dev\>_
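
For the curious, the general trick behind a rename-to-change-behavior binary is to look at the name the program was launched as and dispatch on it. This is a generic sketch of that pattern, not a description of how Bruce does it internally. The function name is hypothetical, and the command set is just the four commands listed earlier.

```python
from pathlib import PureWindowsPath

COMMANDS = {"kinit", "klist", "kconfig", "kdestroy"}


def command_from_program_name(argv0: str):
    """Return the command implied by the executable name, or None for the shell."""
    # PureWindowsPath splits on both backslashes and forward slashes,
    # and .stem drops the directory and the .exe extension.
    name = PureWindowsPath(argv0).stem.lower()
    return name if name in COMMANDS else None


for launched_as in (r"C:\dev\kinit.exe", r"C:\dev\bruce.exe"):
    command = command_from_program_name(launched_as)
    print(launched_as, "->", command or "interactive shell")
```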

Why didn't you just create a bunch of different command line tools then?

Because I just didn't want to.

The commands are extensible. You can create your own if you want. It just requires some attribute decorations:

    [CommandLineCommand("klist", Description = "KerberosList")]
    public class KerberosListCommand : BaseCommand

Getting the Tool

The tool is shipped as a dotnet tool right now. It's called 'bruce'. Go figure.

> dotnet tool install -g bruce

A standalone installer is forthcoming.

*Why the name Bruce?

Bruce is our guard dog.