Capturing and Visualizing Office 365 Security Logs – Part 2

Posted on January 31, 2019 by mattfeltonma

Hello again my fellow geeks.

Welcome to part two of my series on visualizing Office 365 security logs. In my last post I walked through the process of getting the sign-in and security logs and provided a link to some Lambdas I put together to automate pulling them down from Microsoft Graph. Recall that the Lambda stores the files in raw format (with a small bit of transformation on the time stamps) in Amazon S3 (Simple Storage Service). For this demonstration I modified the parameters for the Lambda to download 30 days of the sign-in logs and to store them in an S3 bucket I use for blog demos.

When the logs are pulled from Microsoft Graph they come down in JSON (JavaScript Object Notation) format. Love JSON or hate it, it is the common standard for exchanging information these days. The schema for the JSON representation of the sign-in logs is fairly complex and heavily nested because there is a ton of great information in there. Thankfully Microsoft has done a wonderful job of documenting the schema. Now that we have the logs and the schema we can start working with the data.

When I first started this effort I had put together a Python function which transformed the files into a CSV using pipe delimiters. As soon as I finished the function I wondered if there was an alternative way to handle it. In comes Amazon Athena to the rescue with its Openx-JsonSerDe library. After reading through a few blogs (great AWS blog here), StackOverflow posts, and the official AWS documentation I was ready to put something together myself. After some trial and error I put together a working DDL (Data Definition Language) statement for the data structure. I’ve made the DDLs available on GitHub.
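For readers curious what the pipe-delimited approach might look like, here is a minimal sketch. It is not the original function; the sample record, the `flatten` and `to_pipe_delimited` names, and the underscore-joined column names are all assumptions made for illustration.

```python
import csv
import io
import json

# Hypothetical sample shaped like a Microsoft Graph sign-in response;
# real records carry many more fields.
SAMPLE = json.loads("""
{"value": [
  {"id": "1", "userPrincipalName": "user@example.com",
   "status": {"errorCode": 0, "failureReason": null},
   "location": {"city": "Boston",
                "geoCoordinates": {"latitude": 42.36, "longitude": -71.06}}}
]}
""")


def flatten(record, prefix=""):
    """Collapse nested dicts into one level, joining keys with underscores."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        elif isinstance(value, list):
            # Lists are squeezed into a single space-separated cell.
            flat[name] = " ".join(str(item) for item in value)
        else:
            flat[name] = "" if value is None else value
    return flat


def to_pipe_delimited(document):
    """Return the records under "value" as pipe-delimited text with a header."""
    rows = [flatten(record) for record in document.get("value", [])]
    # Union of all columns in first-seen order so sparse records still line up.
    columns = list(dict.fromkeys(col for row in rows for col in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="|",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `to_pipe_delimited(SAMPLE)` returns a header line followed by one line per sign-in record. The obvious drawback, and part of why a schema-on-read tool is attractive, is that every change to the nested structure means touching the flattening code.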

Once I had the schema defined, I created the table in Athena. The official AWS documentation does a fine job explaining the few clicks that are provided to create a table, so I won’t re-create that here. The DDLs I’ve provided you above will make it a quick and painless process for you.

Let’s review what we’ve done so far. We’ve set up a recurring job that pulls the sign-in and audit logs via the API and dumps all that juicy data into cheap object storage, where we can further enforce lifecycle policies. We’ve then defined the schema for the data and made it available via standard SQL queries. All without provisioning a server and for pennies on the dollar. Not too shabby!

At this point you can use your analytics tool of choice whether it be QuickSight, Tableau, Power BI, or the many other tools that have flooded the market over the past few years. Since I don’t make any revenue from these blog posts, I like to go the cheap and easy route of using Amazon QuickSight.

After completing the initial setup of QuickSight I was ready to go. The next step was to create a new data set. For that I clicked the Manage Data button and selected New Data Set.

On the Create a Data Set screen I selected the Athena option and created a name for the data source.

From there I selected the database in Athena which for me was named azuread. The tables within the database are then populated and I chose the tbl_signin_demo which points to the test S3 bucket I mentioned previously.

Due to the complexity of the data structure I opted to use a custom SQL query. There is no reason why you couldn’t create the table I’m about to create in Athena and then connect to that table instead to make it more consumable for a wider array of users. It’s really up to you and I honestly don’t know what the appropriate “big data” way of doing it is. Either way, those of you with real SQL skills may want to look away from this query lest you experience a Raiders of the Lost Ark moment.

You were warned.

SELECT
    records.id,
    records.createddatetime,
    records.userprincipalname,
    records.userDisplayName,
    records.userid,
    records.appid,
    records.appdisplayname,
    records.ipaddress,
    records.clientappused,
    records.mfadetail.authdetail AS mfadetail_authdetail,
    records.mfadetail.authmethod AS mfadetail_authmethod,
    records.correlationid,
    records.conditionalaccessstatus,
    records.appliedconditionalaccesspolicy.displayname AS cap_displayname,
    array_join(records.appliedconditionalaccesspolicy.enforcedgrantcontrols,' ') AS cap_enforcedgrantcontrols,
    array_join(records.appliedconditionalaccesspolicy.enforcedsessioncontrols,' ') AS cap_enforcedsessioncontrols,
    records.appliedconditionalaccesspolicy.id AS cap_id,
    records.appliedconditionalaccesspolicy.result AS cap_result,
    records.originalrequestid,
    records.isinteractive,
    records.tokenissuername,
    records.tokenissuertype,
    records.devicedetail.browser AS device_browser,
    records.devicedetail.deviceid AS device_id,
    records.devicedetail.iscompliant AS device_iscompliant,
    records.devicedetail.ismanaged AS device_ismanaged,
    records.devicedetail.operatingsystem AS device_os,
    records.devicedetail.trusttype AS device_trusttype,
    records.location.city AS location_city,
    records.location.countryorregion AS location_countryorregion,
    records.location.geocoordinates.altitude,
    records.location.geocoordinates.latitude,
    records.location.geocoordinates.longitude,
    records.location.state AS location_state,
    records.riskdetail,
    records.risklevelaggregated,
    records.risklevelduringsignin,
    records.riskstate,
    records.riskeventtypes,
    records.resourcedisplayname,
    records.resourceid,
    records.authenticationmethodsused,
    records.status.additionaldetails,
    records.status.errorcode,
    records.status.failurereason
FROM "azuread"."tbl_signin_demo"
CROSS JOIN (UNNEST(value) as t(records))

This query will de-nest the data and give you a detailed (possibly extremely large depending on how much data you are storing) parsed table. I was now ready to create some data visualizations.
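If the CROSS JOIN UNNEST step feels opaque, the rough Python equivalent below may help. It is only a sketch of the idea, not what Athena does internally: each stored document carries a `value` array, and the de-nesting emits one flat row per element of that array. The sample documents and the handful of columns picked out are made up for illustration.

```python
# Two hypothetical stored documents, each with a "value" array of sign-ins.
FILES = [
    {"value": [
        {"id": "a1", "userprincipalname": "matt@example.com",
         "devicedetail": {"operatingsystem": "Windows 10"},
         "location": {"city": "Boston"}},
        {"id": "a2", "userprincipalname": "wife@example.com",
         "devicedetail": {"operatingsystem": "MacOs"},
         "location": {"city": "Boston"}},
    ]},
    {"value": []},  # a document with no sign-ins contributes no rows
]


def unnest(files):
    """Yield one flat row per sign-in record across all documents."""
    for document in files:
        for record in document.get("value", []):
            # Nested structs become prefixed columns, as in the SQL aliases.
            device = record.get("devicedetail") or {}
            location = record.get("location") or {}
            yield {
                "id": record.get("id"),
                "userprincipalname": record.get("userprincipalname"),
                "device_os": device.get("operatingsystem"),
                "location_city": location.get("city"),
            }


rows = list(unnest(FILES))
```

Here `rows` ends up with two entries, one per sign-in, regardless of how many documents the records were spread across.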

The first visual I made was a geospatial visual using the location data included in the logs filtered to failed logins. Not surprisingly our friends in China have shown a real interest in my and my wife’s Office 365 accounts.

Next up I was interested in seeing if there were any patterns in the frequency of the failed logins. For that I created a simple line chart showing the number of failed logins per user account in my tenant. Interestingly enough the new year meant back to work for more than just you and me.

Like I mentioned earlier Microsoft provides a ton of great detail in the sign-in logs. Beyond just location, they also provide reasons for login failures. I next created a stacked bar chart to show the different reasons for failed logins by user. I found the blocked sign-ins by malicious IPs interesting. It’s nice to know that is being tracked and taken care of.
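The grouping behind a chart like that is simple to reproduce outside of QuickSight. The sketch below counts failures per user and reason over rows shaped like the query output. The sample rows and failure reason strings are invented, and it assumes an `errorcode` of 0 marks a successful sign-in, which is worth confirming against the schema documentation.

```python
from collections import Counter

# Hypothetical rows using column names from the de-nested query output.
ROWS = [
    {"userprincipalname": "matt@example.com", "errorcode": 0,
     "failurereason": None},
    {"userprincipalname": "matt@example.com", "errorcode": 50126,
     "failurereason": "Invalid username or password."},
    {"userprincipalname": "matt@example.com", "errorcode": 50126,
     "failurereason": "Invalid username or password."},
    {"userprincipalname": "wife@example.com", "errorcode": 50053,
     "failurereason": "Sign-in was blocked."},
]


def failures_by_user_and_reason(rows):
    """Count failed sign-ins keyed by (user, failure reason)."""
    counts = Counter()
    for row in rows:
        # Assumption: errorcode 0 means success, anything else is a failure.
        if row["errorcode"] != 0:
            counts[(row["userprincipalname"], row["failurereason"])] += 1
    return counts


failure_counts = failures_by_user_and_reason(ROWS)
```

Each key in `failure_counts` corresponds to one segment of a stacked bar: the user picks the bar and the reason picks the colour.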

Failed logins are great, but the other thing I was interested in is successful logins and user behavior. For this I created a vertical stacked bar chart that displayed the successful logins by user by device operating system (yet more great data captured in the logs). You can tell from the bar on the right my wife is a fan of her Mac!

As I gather more data I plan on creating some more visuals, but this was great to start. The geo-spatial one is my favorite. If you have access to a larger data set with a diverse set of users your data should prove fascinating. Definitely share any graphs or interesting data points you end up putting together if you opt to do some of this analysis yourself. I’d love some new ideas!

That will wrap up this series. As you’ve seen the modern tool sets available to you now can do some amazing things for cheap without forcing you to maintain the infrastructure behind it. Vendors are also doing a wonderful job providing a metric ton of data in their logs. If you take the initiative to understand the product and the data, you can glean some powerful information that has both security and business value. Even better, you can create some simple visuals to communicate that data to a wide variety of audiences making it that much more valuable.

Have a great weekend!

Active Directory Federation Services – SQL Attribute Store

Posted on May 28, 2017 by mattfeltonma

Hi everyone,

I recently had a use case come across my desk where I needed to do a SAML integration with a SaaS provider. The provider required a number of pieces of information about the user beyond the standard unique identifier. The additional information would be used to direct the user to the appropriate instance of the SaaS application.

In the past fifty or so SAML integrations I’ve done, I’ve been able to source my data directly from the Active Directory store. This was because Active Directory was authoritative for the data or there was a reliable data synchronization process in place such that the data was being sourced from an authoritative source. In this scenario, neither option was available. Thankfully the data source I needed to hit to get the missing data exposed a subset of its data through a Microsoft SQL view.

I have done a lot in AD FS over the past few years from design to operational support of the service, but I had never sourced information from a data source hosted via MS SQL Server. I reviewed the Microsoft documentation available via TechNet and found it to be lacking. Further searches across MS blogs and third-party blogs provided a number of “bits” of information but no real end to end guide. Given the lack of solid content, I decided it would be fun to put one together so off to Azure I went.

For the lab environment, I built the following:

Active Directory forest name – geekintheweeds.com
Server 1 – SERVERDC (Windows Server 2016)
- Active Directory Domain Services
- Active Directory-integrated DNS (Domain Name System)
- Active Directory Certificate Services
Server 2 – SERVER-ADFS (Windows Server 2016)
- Active Directory Federation Services
- Microsoft SQL Server Express 2016
Server 3 – SERVER-WEB (Windows Server 2016)
- Microsoft IIS

On SERVER-WEB I installed the sample claims application referenced here. Make sure to follow the instructions in the blog to save yourself some headaches. There are plenty of blogs out there that discuss building a lab consisting of the services outlined above, so I won’t cover those details.

On SERVER-ADFS I created a database named hrdb within the same instance as the AD FS databases. Within the database I created a table named dbo.EmployeeInfo with 5 columns named givenName, surName, email, userName, and role, all of data type nvarchar(MAX). The userName column contained the unique values I used to relate a user object in Active Directory back to a record in the SQL database.

Once the database was created and populated with some sample data and the appropriate Active Directory user objects were created, it was time to begin to configure the connectivity between AD FS and MS SQL. Before we go creating the new attribute store, the AD FS service account needs appropriate permissions to access the SQL database. I went the easy route and gave the service account the db_datareader role on the database, although the CONNECT and SELECT permissions would have probably been sufficient.

After the service account was given appropriate permissions the next step was to configure the database as an attribute store in AD FS. To do that I opened the AD FS management console, expanded the Service node, right-clicked on Attribute Stores, and selected the Add Attribute Store option. I used mysql as the store name and selected the SQL option from the drop-down box. My SQL was a bit rusty so the connection string took a few tries to get right.

I then created a new claim description to hold the role information I was pulling from the SQL database.

The last step in the process was to create some claim rules to pull data from the SQL database. Pulling data from a MS SQL datastore requires the use of custom claim rules. If you’re unfamiliar with the custom claim language, the following two links are two of the best I’ve found on the net:

The first claim rule I created was a rule to query Active Directory via LDAP for the SAM-Account-Name attribute. This is the attribute I would be using to query the SQL database for the user’s unique record.

Next up was my first custom claim rule. It queried the SQL database for the record whose userName column matched the SAM-Account-Name value I pulled in the earlier step, and requested back the value in the email column of that record. Since I wanted to do some transforming of the information in a later step, I added the claim to the incoming claim set.

I then issued another query for the value in the role column.
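The two lookups above boil down to a parameterized query keyed on the user name. The sketch below models that logic in Python with an in-memory SQLite database as a stand-in. It is not AD FS claim rule syntax and not MS SQL. The table and column names come from the lab setup, while the sample rows and the `lookup_attribute` helper are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the EmployeeInfo table in hrdb."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE EmployeeInfo "
        "(givenName TEXT, surName TEXT, email TEXT, userName TEXT, role TEXT)"
    )
    # Hypothetical sample data.
    conn.executemany(
        "INSERT INTO EmployeeInfo VALUES (?, ?, ?, ?, ?)",
        [
            ("Jane", "Doe", "jane.doe@example.com", "jdoe", "Manager"),
            ("John", "Smith", "john.smith@example.com", "jsmith", "Analyst"),
        ],
    )
    return conn


def lookup_attribute(conn, sam_account_name, column):
    """Return one attribute for the record matching the SAM account name."""
    # Column names cannot be bound as parameters, so allow-list them.
    if column not in {"email", "role"}:
        raise ValueError(f"unsupported column: {column}")
    row = conn.execute(
        f"SELECT {column} FROM EmployeeInfo WHERE userName = ?",
        (sam_account_name,),
    ).fetchone()
    return row[0] if row else None


conn = build_demo_db()
email = lookup_attribute(conn, "jdoe", "email")
role = lookup_attribute(conn, "jdoe", "role")
```

The point of the stand-in is the shape of the query: the value from the earlier LDAP rule is passed in as a parameter rather than concatenated into the SQL text, and a user with no matching record simply yields no claim.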

Finally, I performed some transforms to verify I was getting the appropriate data that I wanted. I converted the email address claim type to the Common Name type and the custom claim definition role I referenced above to the out of the box role claim definition. I then hit the endpoint for the sample claim app and… VICTORY!

Simple right? Well it would be if this information had been documented within a single link. Either way, I had some good lessons learned that I will share with you now:

Do NOT copy and paste claim rules. I chased a number of red herrings trying to figure out why my claim rule was being rejected. More than likely the copy/paste added an invalid character I was unable to see.
Brush up on your MS SQL before you attempt this. My SQL was super rusty and it caused me to go down a number of paths which wasted time. Thankfully, my coworker Jeff Lee was there to add some brain power and help work through the issues.

Before I sign off, I want to thank my coworker Jeff Lee for helping out on this one. It was a great learning experience for both of us.

Thanks and have a wonderful Memorial Day!

Journey Of The Geek

The chronicles of a Bostonian tech geek navigating through life and technology
