Domain trusts are complicated. Here's how they work.

Like all good things this is mostly correct, with a few details fuzzier than others for reasons: a) details are hard on twitter; b) details are fudged for greater clarity; c) maybe I'm just dumb.

A while back I made a joke. It was hilarious.

Q: how does a user in forest A get a ticket to a resource in a domain in forest B?

Me: 😬

D6n6Be-UUAANofz.jpg

Because it was accurate. 

Why? Because domain trusts are complicated. Here's how they work.

To understand trusts we have to understand domains. In Windows-land domains are logical groupings of things like users and resources like computers and services. These things are grouped together by a name -- the domain name: foo.name.com.

These domain names can be whatever you want. They can mirror real DNS names registered publicly, or they can be internal-only and represent just your own stuff.

The things within the domain can only ever belong to a single domain. UserA belongs to domain.com. The computer myserver$ belongs to just domain.com. Both of these things can also exist in niamod.com, but they're different entities. Totally unrelated.

A user in a domain can logically access resources in the same domain. That is, they exist within the same security boundary (we'll come back to this). The user and resource both trust the domain, so when the domain says the user can access the resource, the resource listens.

The thing that dictates these security rules is the domain controller. It is the arbiter of access control. In Kerberos-land it is the key distribution center: A bit about Kerberos (syfuhs.net)

So far all of this is pretty simple (in the sense that I'm lying to you and it's actually quite complicated, but you know, details) because everything is self-contained within a single domain. It's easy enough to reason about.

However, it gets a little more interesting when users in one domain need to access resources in another domain. 

How do? Through a domain trust.

A domain trust is an agreement between two domains where domain B is willing to allow users in domain A to access resources in domain B. In effect to act like a member of domain B.

Trusts only work in a single direction. Domain B trusts domain A. If a user in B tries to access a resource in domain A, domain A will block it.

But that doesn't mean you can't have multiple trusts between the same two domains.

B trusts A. That's one trust. Users in A can access resources in B.

A trusts B. That's another trust. Users in B can access resources in A.

There are only two domains, and each trust runs in one direction, so there can only ever be two trusts between them.

However, there's nothing to stop a single domain from trusting multiple domains.

B trusts A.

B trusts C.

Users in C can access resources in B, but not in A.

Users in B cannot access resources in A or C.

Users in A can access resources in B, but not in C.

And trusts can be transitive. Meaning a user in domain A can access resources in domain C, by way of domain B.

B trusts A.

C trusts B.

A => B => C.
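If it helps, the rules above can be modelled as a tiny directed graph. This is only a sketch of the idea, not how a DC stores trusts: the TRUSTS set and the can_access function are names I made up, and each pair reads (trusting, trusted).

```python
from collections import deque

# Each pair is (trusting, trusted): "B trusts A" is ("B", "A").
TRUSTS = {("B", "A"), ("C", "B")}


def can_access(user_domain, resource_domain, trusts, transitive=True):
    """Return True if a user in user_domain can reach resource_domain."""
    if user_domain == resource_domain:
        return True
    seen = {user_domain}
    queue = deque([user_domain])
    while queue:
        current = queue.popleft()
        # Access flows from the trusted domain to the domains that trust it.
        for trusting, trusted in trusts:
            if trusted != current or trusting in seen:
                continue
            if trusting == resource_domain:
                return True
            if transitive:
                seen.add(trusting)
                queue.append(trusting)
    return False
```

With these two trusts a user in A reaches C only when the trusts are treated as transitive, and nobody in C reaches A at all.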

This is all well and good, but how does it actually work using the protocols Windows understands? In this case Kerberos and NTLM. Let's set the stage.

You have a user Alice in domain A. You have an SMB file share \\partnerstuff\ in domain B. A domain trust exists between the two domains such that domain B trusts domain A. In other words users in A can access stuff in B.

The user types \\partnerstuff\ into explorer and the SMB stack lights up. It connects to that server and the server asks it to authenticate. This is just plain old SSO so far: How Windows Single Sign-On Works (syfuhs.net)

The client will attempt Kerberos first and it connects to the domain controller and asks for a ticket to cifs/partnerstuff. Here the fun begins.

A domain controller has a list of all the service principals in its domain. In this case the service principal doesn't exist in this domain. It exists in another domain. What does the domain controller do? Well, it consults its list of domain trusts.

The domain controller checks the SPN and looks for a suffix. Suppose the user typed in \\partnerstuff.domainb.com. The DC can infer from the name and has a pretty good idea which domain oversees this resource: domainb. However, the user just typed in \\partnerstuff. Hmmmm.

So it...guesses. Well, in the sense that it has a list of domains, and they've optionally been configured in an order, and it grabs the first one from the list.
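Roughly, the decision looks something like this. It's a sketch under my own assumptions: pick_target_realm is a hypothetical name, the trusted realms are just an ordered list of strings, and the real DC logic has more going on than a suffix check.

```python
def pick_target_realm(spn, trusted_realms):
    """Guess which trusted realm owns an SPN such as 'cifs/partnerstuff.domainb.com'."""
    host = spn.split("/", 1)[1].lower()
    # Fully qualified name: look for a trusted realm that is a suffix of the host.
    for realm in trusted_realms:
        if host.endswith("." + realm.lower()):
            return realm
    # Short name: no suffix to go on, so take the first realm in the configured order.
    return trusted_realms[0] if trusted_realms else None
```

Note the second branch really is a guess. If the first realm in the list doesn't own the name, the client just ends up chasing another referral.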

Now the domain controller has the domain it thinks the resource belongs to so it creates a referral ticket to that domain. A referral ticket is just a plain old service ticket, except the resource is actually another domain instead of the file share.

Remember the security boundary thing. Domain A can't issue tickets to resources in domain B. However, domain A can issue a referral to domain B. When the domain trust was created, a secret key was shared between both domains. Domain A encrypts the referral ticket to that secret.

The domain controller has generated the referral and returns it to the client. The client looks at the response and says "waaaaaait, this isn't for the thing I asked for". The referral is instead for krbtgt/domainb.com.

The client is aware of this special service name [format]. It knows this is a referral. It knows it needs to use this special ticket to get the real ticket it wants, and it now knows what domain oversees this resource.

Since the client knows this is a special ticket, it knows it can do something weird, like use it in place of a TGT. In this case the client now has two TGTs: one to its own realm domaina.com, and now a referral TGT to domainb.com.

Though the special-ness of this ticket is a bit overstated. The client received the ticket and it stuck it in the cache. The client then decided it needed to make a TGS-REQ to a KDC in domainb so it looked for krbtgt/domainb.com in the cache, and oh hey, there's a ticket.

Anyway, the client has been given a hint: domaina doesn't know this resource, but here's a ticket to domainb, go ask them.

So the client makes a TGS-REQ to domainb, using the TGT it now has for domainb, and asks for cifs/partnerstuff.

The KDC receives this request and looks at the TGT. It's for krbtgt/domainb.com -- good good, but it's issued by realm domaina.com -- uhhhhh, what? This is a special hint to the KDC to go check the list of trusts it knows about.

So the KDC finds the domaina trust and gets the secret that was previously shared. The KDC decrypts the ticket. The KDC then looks for the cifs/partnerstuff SPN and finds it. Woohoo! The KDC generates a service ticket and returns it to the client.

The client now has a service ticket and it hands it off to the SMB stack. The SMB stack fires it off to the remote server and down the SSO rabbit hole we go.

But then I also said the trusts can be transitive. What happens when the SMB server is actually in domain C? The same exact thing, except repeated from B to C.

A returns a referral to B. B looks up the SPN and says "pfft, no idea, maybe try C" and returns a referral to C. C says "oh yeah, I got you" and returns a service ticket.
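The whole chase fits in a short loop. This is a toy simulation, not real Kerberos: the KDCS table, tgs_req, and get_service_ticket are all invented for illustration, tickets are plain dicts with no encryption, and the max_hops value is arbitrary.

```python
# Hypothetical toy KDCs: each realm knows its own SPNs and where to refer unknown ones.
KDCS = {
    "domaina.com": {"spns": set(), "refer_to": "domainb.com"},
    "domainb.com": {"spns": set(), "refer_to": "domainc.com"},
    "domainc.com": {"spns": {"cifs/partnerstuff"}, "refer_to": None},
}


def tgs_req(realm, spn):
    """Return a ticket for spn, or a referral ticket for krbtgt/<next realm>."""
    kdc = KDCS[realm]
    if spn in kdc["spns"]:
        return {"sname": spn, "issuer": realm}
    if kdc["refer_to"] is None:
        raise LookupError(f"{spn} not found")
    return {"sname": "krbtgt/" + kdc["refer_to"], "issuer": realm}


def get_service_ticket(home_realm, spn, max_hops=10):
    """Chase referrals until some realm issues a ticket for spn."""
    realm, path = home_realm, [home_realm]
    for _ in range(max_hops):
        ticket = tgs_req(realm, spn)
        if ticket["sname"] == spn:
            return ticket, path
        # Anything named krbtgt/<realm> is a referral: go ask that realm next.
        realm = ticket["sname"].split("/", 1)[1]
        path.append(realm)
    raise RuntimeError("too many referrals")
```

Asking domaina.com for cifs/partnerstuff walks domaina.com, domainb.com, domainc.com and comes back with a ticket issued by domainc.com.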

"But Steve" you say "this is just for domains. What's the deal with these forest things?" Hooboy. Okay. Deep breath. Forests.

Forests are hierarchical collections of domains. In Windows-land a domain MUST belong to a forest. It may be a forest of one, but there will always be a forest. It's kind of like a tree. The forest root is a domain itself.

Now suppose you have a forest corp.company.com. In this forest you have two domains: childa and childb. Their names are childa.corp.company.com and childb.corp.company.com.

Forests provide this special structure to the domains by way of trusts. Childa and childb trust corp, and corp trusts them back. By virtue of the transitive property childa trusts childb and childb trusts childa.

Forests also provide another form of security boundary. To understand this we have to look at how authorization works across trusts.

Remember in the SSO thread where I said authorization is based on this Privilege Attribute Certificate thing? The PAC. That still applies to trusts. How Windows Single Sign-On Works (syfuhs.net)

The PAC contains a list of group memberships in the form of Security Identifiers -- SIDs. The SID is a globally unique identifier of the group or user. They're of the form S-1-{authority}-{domain}-{RID}.

The domain portion is the SID of the domain itself, and the RID is the relative identifier of the user/group in the domain. The RID is guaranteed unique within a domain, and the SID is guaranteed unique globally.
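Splitting a SID string into those two pieces is simple enough to sketch. The split_sid helper is my own name, and the SID used to try it out below is made up.

```python
def split_sid(sid):
    """Split a SID string into (domain_sid, rid)."""
    parts = sid.split("-")
    if len(parts) < 4 or parts[0] != "S":
        raise ValueError(f"not a SID: {sid!r}")
    # Everything up to the last dash identifies the domain; the last number is the RID.
    return "-".join(parts[:-1]), int(parts[-1])
```

For a made-up SID like S-1-5-21-111-222-333-512 this gives the domain portion S-1-5-21-111-222-333 and the RID 512.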

So when the KDC of your domain issues a referral ticket, it includes your PAC with your SIDs. The domain receiving the referral examines your SIDs and filters out SIDs that shouldn't belong in it.

What does that mean? SIDs that shouldn't belong? Well, a user in a forest might be a member of groups from a whole bunch of domains in the forest. That identity gets projected to all resources within the forest through this PAC.

But also across forests. You can have trusts between forests. The corp forest domain trusts childa, and the partner forest domain trusts the corp forest.

childa => corp => partner.

So a user in childa can access resources in partner, despite being an entirely different forest.

However, forests are bigger security boundaries. When you project your identity through the PAC to the other forest, the other forest is going to filter out anything that might be dangerous.

Principally that means any SID that has the domain portion matching the SID of the partner forest. The forest will accept 

S-1-{corp}-123

But it'll block

S-1-{partner}-456

This is because the partner forest only trusts corp to project identities from corp.
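As a sketch, the filter is little more than a membership test on the domain portion of each SID. The filter_sids name and the idea of handing it a set of trusted domain SIDs are my assumptions. Real SID filtering has more rules than this.

```python
def filter_sids(pac_sids, trusted_domain_sids):
    """Keep only the SIDs whose domain portion belongs to the trusted forest."""
    kept, dropped = [], []
    for sid in pac_sids:
        # The domain portion is everything before the final RID.
        domain_sid = sid.rsplit("-", 1)[0]
        if domain_sid in trusted_domain_sids:
            kept.append(sid)
        else:
            dropped.append(sid)
    return kept, dropped
```

Feed it a PAC holding one corp SID and one partner SID, with only corp's domain SID in the trusted set, and the partner SID lands in the dropped list.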

Why would corp ever provide a SID for a forest outside its own security boundary? It wouldn't, unless it were evil and wanted to get access to resources it shouldn't normally have. Hence the filtering.

Plus, forest trusts are NOT transitive. You can't get referrals from foresta to forestb to forestc to forestd. It won't work. It'll be blocked.

This is why we generally refer to forests as the real security boundary. Domains will do some SID filtering for sanity reasons, but you can still project a false identity with SIDs from the target domain.

Now, if forests are separate collections of domains, how do domain controllers know to issue referrals to these entirely unrelated named things?

We're back to that hint or guess step. If the SPN is fully qualified, the DC grabs the rightmost portion of the name and compares that to the Top Level Names list of a trust. These TLNs say "I can (probably) issue tickets to this resource if the rightmost portion ends in my TLN".

My forest is corp.company.com and I have a trust to ext.partner.com. The TLN on the trust is ext.partner.com. The rightmost portion of cifs/partnerstuff.ext.partner.com is ext.partner.com, so go use that trust. Easy.

But again, people often just type \\partnerstuff. We're back to guessing. The KDC has this thing called Forest Search Order. It basically says "if you can't get a ticket from our domain then try all these other forests in this order until you get a ticket".

You can configure FSO on the KDC or the client. Both of them are really just hints. The KDC is either telling the client "eeeehhh, maybe try this one", or the client is asking the KDC "eeeeeeh, maybe I should try this one?"

Eventually a referral is chased enough that it finds a domain that'll issue the ticket. In large environments this can be kind of a pain. Normally whenever the client needed this ticket it'd have to start from the beginning and chase it down every time.

Thankfully Windows clients have this thing called the SPN cache. It basically acts as a shortcut. When the client finally receives the ticket it requested so many hops before, the realm of the final ticket is logged with the SPN.

The next time a service ticket is requested for that SPN, the cache is consulted. The client then knows it can skip all the referral chasing and just go directly to the domain that has the SPN. The client already has the referral TGT in the cache, so no extra work.
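Conceptually the cache is just a lookup from SPN to realm. A minimal sketch, with SpnCache and its methods being hypothetical names rather than anything Windows exposes:

```python
class SpnCache:
    """Remembers which realm finally issued a ticket for each SPN."""

    def __init__(self):
        self._realm_by_spn = {}

    def record(self, spn, realm):
        # Called once the referral chase ends with a real service ticket.
        self._realm_by_spn[spn.lower()] = realm

    def starting_realm(self, spn, home_realm):
        # Known SPN: go straight to the realm that answered last time.
        # Unknown SPN: start the chase from the home realm as usual.
        return self._realm_by_spn.get(spn.lower(), home_realm)
```

The first request for an SPN starts at the home realm. After record is called, later requests start at the realm that issued the ticket.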

But that initial chase might still be expensive. It's going to take a while if you have to hop through a bunch of domains and the KDCs for those domains are over slow networks or just very far away. It adds up.

This is also kind of a pain because there's only so many hops you can go through before the client gives up. This is more of a safety thing so it doesn't get stuck in a cycle chasing the same referrals over and over again. I think Windows allows up to 25 referrals?

Give or take. I can't be bothered to find it in the code. It's not like anyone has that complex of a forest environment anyway.

But anyway, every hop still counts. The DCs are sometimes very clever. Sometimes they know exactly what domain a resource is in, but can't necessarily get you directly to it. This is common in child domain => forest a => forest b chasing.

The DC can analyze this graph and provide a shortcut hint. It can't provide a referral directly to the final domain, but it can provide a hint that says when you get to domain B explicitly request a referral to domain C.

Before I wrap this up I wanted to touch on how NTLM fits into all this. Ugggggh.

It basically works the exact opposite way.

Instead of the client asking the KDC for a ticket, the client connects to the target resource, and the target server hands it a nonce (the challenge). The client computes a response from the nonce and its password hash and sends that back. The target server forwards the nonce and the response off to its own DC, and the DC checks the response.

The DC can't check the response because the user is from another domain, so the DC finds a DC in the other domain and asks the other domain to process it. If that other domain can't process it, that domain finds another domain, and so on until it finds the appropriate domain.
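Here's a toy version of that pass-along. Big caveats: this is not the real NTLM math (I'm using HMAC-SHA256 as a stand-in), the nonce here is a challenge generated on the server side that the client answers, and ACCOUNTS, FORWARD_TO, and the function names are all invented for the sketch.

```python
import hashlib
import hmac
import os

# Hypothetical account databases, one per domain.
ACCOUNTS = {
    "domaina": {"alice": b"alice-secret"},
    "domainb": {},
}
# Where each DC forwards logons it cannot check itself.
FORWARD_TO = {"domainb": "domaina"}


def new_challenge():
    """The nonce for one logon attempt."""
    return os.urandom(8)


def client_response(secret, challenge):
    # Stand-in for the real NTLM response computation.
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def dc_validate(domain, user_domain, user, challenge, response, hops=0):
    """Check the response locally, or pass it along to another domain's DC."""
    if domain == user_domain:
        secret = ACCOUNTS.get(domain, {}).get(user)
        return secret is not None and hmac.compare_digest(
            client_response(secret, challenge), response
        )
    next_domain = FORWARD_TO.get(domain)
    if next_domain is None or hops >= 10:
        return False
    return dc_validate(next_domain, user_domain, user, challenge, response, hops + 1)
```

The client never talks to a DC in this flow. The server in domainb hands the nonce and response to its own DC, and the recursion stands in for one DC calling the next.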

This does have the useful property that the client isn't particularly chatty, but the connection to the target is held open for quite a while. On the other hand as the client you also have no idea WTF is going on behind the scenes.

Anyway, here's Bruce not pleased at all with having to listen to me explain all this.

EnnnqmaVcAAorlw.jpg

Have you ever wondered how Windows does Single Sign-on?

Like all good things this is mostly correct, with a few details fuzzier than others for reasons: a) details are hard on twitter; b) details are fudged for greater clarity; c) maybe I'm just dumb.

 For SSO in Windows to make sense you need to take a look at how Windows logon first works: What happens when you type your password into Windows.

(And for completeness how it works for Azure AD joined machines: How Windows Azure AD sign in works)

(And of course how Kerberos works: A bit about Kerberos) Whew!

Tl;dr; you log into Windows with a credential and get a long lived ticket and that ticket is used in place of your password for lots of stuff.

But that just gets you to the desktop. How do you get access to file shares, web sites, and who knows what on your network?

All of this is handled through a protocol named Negotiate, or SPNEGO (the Simple and Protected GSSAPI Negotiation Mechanism). It's a somewhat simple protocol on the wire, but the implementation is somewhat complicated. Here's how it works.

First, you try and access a resource, like a file share. You type in \\myserver\mysuperdupershare\ and Windows knows the \\ means "file share", so it opens up the SMB stack and says "hey SMB go connect to this thing...please". SMB does its thing and connects.

The server on the far side says "whoaaa, this isn't an open share, papers please." The SMB client stack asks Windows for a ticket to the service, Windows obliges, and SMB hands that ticket to the remote service. The remote service verifies the ticket and allows SMB to continue.

Where does Negotiate fit into all this? Isn't this all Kerberos under the covers? Kinda, sorta.

First we have to understand that SPNEGO isn't an authentication protocol per-se. It's a protocol wrapper, meaning it takes one or more protocols and lets clients and servers negotiate which protocol they want to use (clever name, right?).

The protocol itself is pretty simple. It starts with a request: 

{

   I support: Kerberos, NTLM, NegoEx, digest, basic, etc.

   Also, here's an optimistic Kerberos ticket: <ticket>

}

 

The server looks at the list and says "oh Kerberos, let's do that" and uses the supplied ticket.

But it's negotiable, so the server might respond with "sorry, Kerberos doesn't work, please gimme NTLM". The client redoes the request with NTLM. The server can ACK it (and include any sub-protocol response) or fail it, and we're done. That's it, that's the entire protocol.
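The server's side of that decision can be sketched in a few lines. This is a simplification with made-up names (server_negotiate and the state strings are mine), and it doesn't validate the token or build real SPNEGO messages.

```python
def server_negotiate(offered_mechs, optimistic_mech, optimistic_token, server_mechs):
    """Pick a mechanism the way the server side of the exchange might."""
    # If the optimistic token is for a mechanism the server supports, use it.
    if optimistic_mech in server_mechs and optimistic_token is not None:
        return {"state": "use-token", "mech": optimistic_mech, "token": optimistic_token}
    # Otherwise ask for the first mutually supported mechanism, in client order.
    for mech in offered_mechs:
        if mech in server_mechs:
            return {"state": "request-mech", "mech": mech, "token": None}
    return {"state": "reject", "mech": None, "token": None}
```

A server that only knows NTLM answers a Kerberos-first offer with request-mech for NTLM, and the client then redoes the request with that mechanism.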

That, in my mind, isn't the interesting thing here. The interesting thing is how this is implemented in Windows.

So how does Windows implement this? Through a thing called the Security Support Provider Interface or SSPI for short. SSPI has a fraternal twin called the Generic Security Service Application Program Interface or GSSAPI. SSPI is Windows, GSSAPI is...everything else.

They're both compatible with one another. This is how Windows can communicate with Linux servers, and how Linux clients can communicate with Windows servers. They both support SPNEGO.

Anyway, SSPI and GSSAPI (SSPI from here on out) are each a collection of functions exposed by the platform. There's only a handful of functions.

These functions operate in what's often called The Loop (SSPI loop or GSS loop). It starts with a call to AcquireCredentialsHandle. Here you tell it what credentials you want to use. It could be blank, meaning it uses the default creds (more later) or supplied creds.

The call to ACH gives you a handle, and you pass this handle off to InitializeSecurityContext. You tell ISC you want to do "negotiate" or "kerberos" or "NTLM" or whatever is implemented. ISC returns to you a blob of binary goo.

Our friendly SMB stack is doing this whole thing during its connection and now has this goo from ISC. SMB takes this goo and fires it off to the server. The server receives this goo and now starts its side of the loop. The server calls AcquireCredentialsHandle and gets a handle.

The server then passes the handle to AcceptSecurityContext plus "negotiate" plus the goo from the client. ASC returns more goo and the SMB server takes this goo and returns it to the client.

The client takes this goo and hands it, plus the handle from before, back into InitializeSecurityContext. ISC gives you more goo and SMB must fire that back to the server. The server takes that goo and passes it to AcceptSecurityContext.

This whole process can go on a few times, hence the name. Once both sides determine the other side is properly authenticated, the server creates an NT token based on the user information in the ticket and the application can then impersonate that user locally.
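The shape of the loop is easier to see with the crypto stripped out. This sketch doesn't call the real SSPI functions at all. ToyClientContext and ToyServerContext are stand-ins I invented for the ISC and ASC halves, and the byte strings are placeholder goo.

```python
class ToyClientContext:
    """Stand-in for the client half (InitializeSecurityContext)."""

    def __init__(self):
        self.complete = False

    def step(self, incoming):
        if incoming is None:
            return b"client-hello"  # first call: nothing from the server yet
        self.complete = True  # the server's reply finishes our side
        return None


class ToyServerContext:
    """Stand-in for the server half (AcceptSecurityContext)."""

    def __init__(self):
        self.complete = False

    def step(self, incoming):
        self.complete = True
        return b"server-ok:" + incoming


def run_loop(client, server, max_rounds=10):
    """Shuttle goo back and forth until both sides report completion."""
    to_client, rounds = None, 0
    while not (client.complete and server.complete):
        rounds += 1
        if rounds > max_rounds:
            raise RuntimeError("handshake did not converge")
        to_server = client.step(to_client)
        if to_server is not None:
            to_client = server.step(to_server)
    return rounds
```

The application (SMB in this story) owns the while loop and the transport. The contexts only ever see opaque blobs, which is the whole point of the interface.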

That's nice and simple at the 10k foot view. But how does Windows handle this internally?

Way back in the 'how logon works' thread I glossed over the "logon session". You've typed your password and out pops an NT token. This NT token is a kernel structure and is (sort of) a handle to your logon session. The logon session is a logical container stored in LSA.

Within the container are things like your group membership (SIDs) as well as your Kerberos ticket cache, plus additional housekeeping. Whenever you need to access this container you reference it through your NT token. Every process you start has a reference to this NT token.

So when explorer.exe starts up it has your NT token. When explorer realizes it needs to get a ticket it asks SSPI for one. SSPI now knows about this NT token.

So now the SSPI loop starts. ACH is called with a blank credentials structure. The ACH/ISC/ASC functions are all mostly just stubs though. Just enough internals to validate parameters and fire off those parameters to LSA.

So ACH is called, the parameters are validated, and a new message is created and fired off to LSA via RPC. LSA receives this message and creates a credential handle tied to the NT token, and therefore the logon session. The handle is returned to the calling application.

So this handle is just a pointer to a structure in your logon session in LSA. You call ISC. ISC validates parameters and fires it off to LSA. LSA sees that you passed "negotiate" and opens the SPNego security package.

Security packages are all about SSO. They provide the internal LSA implementation of an SSO protocol, and these implementations mirror SSPI. SPNego is one of these packages.

But remember SPNego isn't an authentication protocol. It's a wrapper. So all it knows is a bunch of other protocols like Kerberos and NTLM. What's it do? It iterates through each of those protocol packages.

First it asks Kerberos "hey kerbie can you get a ticket to cifs/myserver?" Kerberos checks the logon session cache for a TGT, finds it, and fires a TGS-REQ to the KDC. KDC returns a ticket or says 🤷‍♂️. If that fails it moves on to NTLM, otherwise it continues with that ticket.
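In sketch form, the package iteration is a try-then-fall-back. None of these function names come from LSA. kerberos_package, ntlm_package, and negotiate_first_token are invented, and the "ticket" is just a string.

```python
def kerberos_package(spn, tgt_cache, kdc_spns):
    """Return a ticket if we hold a TGT and the KDC knows the SPN, else None."""
    if not tgt_cache or spn not in kdc_spns:
        return None
    return f"ticket-for-{spn}"


def ntlm_package(spn):
    """NTLM needs no ticket up front, so it can always produce a first message."""
    return "ntlm-negotiate"


def negotiate_first_token(spn, tgt_cache, kdc_spns):
    """Try Kerberos first, then fall back to NTLM."""
    ticket = kerberos_package(spn, tgt_cache, kdc_spns)
    if ticket is not None:
        return "Kerberos", ticket
    return "NTLM", ntlm_package(spn)
```

If there's no TGT in the cache, or the KDC doesn't know the SPN, the caller still gets a usable first message. It's just an NTLM one.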

SPNego takes the ticket as well as the list of packages it knows, creates the message and returns it to the calling application. ISC gets this from LSA, hands it to the caller, and the caller fires it off to the server.

The server in the meantime has done ACH on its side. The server process is running as a logged on user. It has to. All processes have to run as a user. In some cases that user may just be SYSTEM. In either case a logon session is present.

The server process receives the message and passes the goo to ASC. ASC validates parameters and fires it off to the server LSA. LSA sees it asked for "negotiate" so it finds the SPNego package and hands the goo to the nego package.

The nego package decodes the message and sees it was Kerberos. The package goes and finds the Kerberos package, hands the ticket to the Kerberos package, and tells it to process it.

The Kerberos package receives this goo, does some validation, and goes looking for the decryption key. The decryption key is the logged on session's password (OR the cred passed through ACH earlier).

Once Kerberos decrypts the message it looks for a special structure called the PAC -- the Privilege Attribute Certificate. Silly name, but it's the thing that contains all your group membership info.

EnISBnZVQAMXOWl.png

The PAC contains a collection of structures such as your logon info (full name, logon server, groups, etc.), as well as other structures like your user or device claims, plus some signatures.

The logon info structure contains group membership SIDs plus metadata like when your password is going to expire, how many logons you've done on a DC, etc.

EnISdG5UwAElfM2.png

All of these structures get signed by the KDC using the service key (AKA the service account password), and then that signature is counter-signed using the krbtgt key.
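The sign-then-counter-sign arrangement looks roughly like this. It's a sketch only: I'm using HMAC-SHA256 as a stand-in for the actual checksum types, and sign_pac and server_verifies are hypothetical helpers, not anything in the Kerberos package.

```python
import hashlib
import hmac


def sign_pac(pac_bytes, service_key, krbtgt_key):
    """Return (server_signature, kdc_signature) for a PAC blob."""
    server_sig = hmac.new(service_key, pac_bytes, hashlib.sha256).digest()
    # The KDC signature covers the server signature, not the PAC itself.
    kdc_sig = hmac.new(krbtgt_key, server_sig, hashlib.sha256).digest()
    return server_sig, kdc_sig


def server_verifies(pac_bytes, server_sig, service_key):
    """The service only holds its own key, so it can only check the server signature."""
    expected = hmac.new(service_key, pac_bytes, hashlib.sha256).digest()
    return hmac.compare_digest(expected, server_sig)
```

The service can check the first signature with the key it already has, but only something holding the krbtgt key can check the counter-signature.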

The server receiving the Kerberos ticket validates the PAC signature because it has the key (machine password, service account password, etc.).

The server *may* decide to validate the PAC further by firing the PAC off to a DC via netlogon to ask if it's valid. However, this only occurs under a handful of instances, specifically when it's possible for the account receiving the ticket to be low privileged trying to do EOP.

Anyway, the Kerberos package now has this validated PAC and the group memberships. Kerberos runs all these groups through SID filtering.

You might have thought SID filtering just happens when you're crossing forests. It's more than that -- it's handled wherever a ticket is accepted. However, there are degrees of filtering, sort of a high/medium/low kind of thing. Here it's more like a sanity check.

Anyway, again, the Kerberos package has the filtered SIDs. The package asks LSA to create an NT token, this time an impersonation token. The impersonation token is special. It sorta has a logon session container, but it's limited in what it can do. More on this in a second.

So now the Kerberos package has the NT token, and indicates to the nego package that it's done. Nego indicates to ASC, ASC indicates to the server, server indicates to the client, client informs ISC, and the loop is done. The client now communicates with the server as the user.

Back to the server. The server process has that impersonation token. A process can only have a single logon token, but it can have many impersonation tokens. When the server process wants to run as this user it needs to use the impersonation token. How?

Well, remember in Windows a process itself doesn't do anything. It's just a big container, and all processing actually happens on threads. One property of a thread is the thread identity -- an NT token.

When a process wants to run as a specific impersonated identity, it sets the thread identity to that specific impersonation token.

Now the server process is receiving requests from the client, such as "hey server, I want to access folder .\scratch\steve". The server is processing this request on the thread with the impersonated identity and calls for an ACL check to make sure the user can access the folder.

The ACL check function opens the current thread and gets the thread NT token. The ACL function examines the NT token group memberships (SIDs) and checks against what the folder requires.
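Thread-level impersonation plus the ACL check can be mimicked with a thread-local. Heavily simplified: tokens are just sets of SID strings, the ACL is an allow-list with no deny entries, and impersonate, revert_to_self, and access_check are names I picked for the sketch.

```python
import threading

# Per-thread identity, standing in for the thread's NT token.
_thread_identity = threading.local()


def impersonate(token_sids):
    """Set the current thread's identity to an impersonated token (a set of SIDs)."""
    _thread_identity.sids = set(token_sids)


def revert_to_self():
    """Drop the impersonated identity so the thread runs as the process again."""
    _thread_identity.sids = None


def access_check(folder_acl, process_sids):
    """Allow access if the effective token holds any SID the folder's ACL allows."""
    thread_sids = getattr(_thread_identity, "sids", None)
    # Use the thread token when impersonating, otherwise the process token.
    effective = thread_sids if thread_sids is not None else set(process_sids)
    return bool(effective & set(folder_acl))
```

The same access_check call gives different answers before and after impersonate, because the identity lives on the thread and not on the process.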

This goes on for quite a while and eventually the client closes the connection. The server acknowledges this and begins cleaning up resources for the connection.

It locates the impersonation token and closes it. LSA sees this request to close and cleans up any resources it allocated. Now the server process can't impersonate the user anymore.

If the client wants to reconnect it has to start this dance all over again.

Riley as usual trying to follow along.

EnIZaEaVoAE6PUg.jpg