Over the past week I’ve been building a lab for an upcoming deep dive into Microsoft’s Web Application Proxy. During the course of building the lab I ran into a few interesting issues with AD FS and the Web Application Proxy that I wanted to cover. Some were similar to issues I’ve run into in production environments and some were new to me.
These issues are interesting in that there aren’t any obvious indicators of the problem in any of the typical logs. Two of the three required some trial and error to determine the root cause, while the third drove me quite insane for a good two weeks before I got an answer from an “official” source. Over the course of this series of posts I’ll cover each issue in detail in the hope that it will help others troubleshoot these issues in the future.
Issue 1: AD FS Certificate authentication fails
I’m going to start with the problem that took me the longest to resolve and eventually required getting the answer directly from an official source.
For those of you who are unfamiliar, AD FS can offer multi-factor authentication methods, both native and third-party. Out of the box, it supports certificate-based authentication as an option for a multi-factor or “step-up” authentication mechanism.
A few months back I wanted to take advantage of the certificate authentication feature to provide a two-factor authentication solution for applications integrated with AD FS. Like a good engineer I did my Googling and read the Microsoft articles and various blogs out there to understand how the feature worked and what the requirements were. I built a lab in Azure, set up an AD FS server, and ensured port 49443 was open in addition to the typical ports required by AD FS. I created my instance of AD CS, issued a user certificate containing the user’s UPN in the subject alternative name field, set up a sample SAML app, and configured it to require certificate authentication.
How easy it all sounds, right? I navigated to the sample application and got the screen below…
and I waited…. and waited…. and waited… Ummm, what went wrong? Well surely the AD FS log will tell me what happened.
Well isn’t that odd. No errors or warnings in the AD FS Admin log. A quick check of the Application and System logs showed no errors either. Maybe the AD FS Debug log would show me something? I flipped on the log and attempted another authentication.
Nothing there either? Maybe the server can’t query the revocation lists designated in the certificate’s CDP? Nope, not that either; the server can successfully contact the CDP endpoints. At this point I began to get quite frustrated and tried packet captures, Fiddler captures, and anything and everything else I could think of. Nothing I tried revealed the answer.
I finally gave in (which I can tell you is incredibly challenging for me) and reached out to an “official” source. We chatted back and forth and went through many of the same steps outlined above to ensure I hadn’t missed anything. However, we ran into another dead end. He then reached out to some other engineers he knew and eventually we got a hit. We were told to check whether there were any intermediate CA certificates stored within the trusted root certificate authorities store. Sounds like an odd circumstance, but sure, why not.
Upon opening up the certificates MMC, opening the machine store, and exploring the trusted root certificate authorities store, lo and behold, I saw an intermediate CA certificate within the store. I deleted the certificate, restarted the AD FS server, attempted another login to the sample application, and hit the screen below.
Boom, I’m finally receiving the certificate prompt. Clicking the OK button brings about the successful login below.
So what was the issue? Apparently AD FS certificate authentication fails without generating an error in any logical location (maybe nowhere at all?) if there is an intermediate CA certificate in the machine’s trusted root certificate authorities store. I’ve verified this is an issue in both AD FS 2012 R2 and AD FS 2016. Why this occurs is unknown to me. It could be the underlying HTTP.SYS driver that pukes and doesn’t report any errors to the event logs. I didn’t get a straight answer as to why this occurs, just that it does, due to some type of integrity check on the machine certificate store. Odd, right?
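If you want to check a server for this condition without eyeballing the store, the rule of thumb is simple: a true root certificate is self-signed, so its Issuer matches its Subject, and anything in the trusted root store where the two differ is a candidate for the problem described above. Below is a minimal Python sketch of that check. It is not something I ran against AD FS itself. The `CertRecord` type, the function names, and the sample entries are all hypothetical, and the sketch assumes you have already exported the Subject, Issuer, and Thumbprint values from the machine store (for instance by reading them out of the certificates MMC). Comparing names is only a heuristic; it does not verify signatures.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CertRecord:
    """Hypothetical record holding fields copied from a certificate store entry."""
    subject: str
    issuer: str
    thumbprint: str


def _normalize(dn: str) -> str:
    # Collapse whitespace and ignore case so cosmetic differences in how
    # the distinguished name was copied do not cause false positives.
    return " ".join(dn.split()).casefold()


def find_non_self_signed(records: Iterable[CertRecord]) -> List[CertRecord]:
    """Return entries whose Issuer differs from their Subject.

    Heuristic only: such certificates are not self-signed, so they are
    likely intermediate CA certificates that do not belong in a root store.
    """
    return [r for r in records if _normalize(r.subject) != _normalize(r.issuer)]


if __name__ == "__main__":
    # Made-up sample data standing in for an exported trusted root store.
    trusted_root_store = [
        CertRecord("CN=Lab Root CA", "CN=Lab Root CA", "01"),
        CertRecord("CN=Lab Issuing CA", "CN=Lab Root CA", "02"),
    ]
    for cert in find_non_self_signed(trusted_root_store):
        print(f"Not self-signed: {cert.subject} (thumbprint {cert.thumbprint})")
```

Anything the check flags would normally belong in the intermediate certification authorities store instead, though it is worth confirming what each certificate is before moving or deleting it.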
That completes the rundown of the first of the three problems I’ll be outlining in this series of posts. Hopefully this helps save someone else some time and aggravation.
See you next post!