Hello again my fellow geeks.
Welcome to part two of my series on visualizing Office 365 security logs. In my last post I walked through the process of getting the sign-in and security logs and provided a link to some Lambdas I put together to automate pulling them down from Microsoft Graph. Recall that the Lambda stores the files in raw format (with a small bit of transformation on the time stamps) in Amazon S3 (Simple Storage Service). For this demonstration I modified the Lambda's parameters to download the last 30 days of sign-in logs and store them in an S3 bucket I use for blog demos.
When the logs are pulled from Microsoft Graph they come down in JSON (JavaScript Object Notation) format. Love it or hate it, JSON is the common standard for exchanging information these days. The schema for the JSON representation of the sign-in logs is fairly complex and deeply nested because there is a ton of great information in there. Thankfully, Microsoft has done a wonderful job of documenting the schema. Now that we have the logs and the schema, we can start working with the data.
When I first started this effort I put together a Python function that transformed the files into a CSV using pipe delimiters. As soon as I finished the function I wondered if there was an alternative way to handle it. In comes Amazon Athena to the rescue with its OpenX JSON SerDe library. After reading through a few blogs (great AWS blog here), Stack Overflow posts, and the official AWS documentation, I was ready to put something together myself. After some trial and error I had a working DDL (Data Definition Language) statement for the data structure. I've made the DDLs available on GitHub.
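To give you a sense of what that looks like, here is a trimmed-down sketch of the kind of DDL Athena wants for this data. The full DDLs on GitHub cover the complete schema; the S3 location below is a placeholder you would swap for your own bucket, and the types shown for this handful of fields are my best guesses rather than the definitive schema.

CREATE EXTERNAL TABLE tbl_signin_demo (
  -- Each file the Lambda writes is one Graph response: a single "value" array of sign-in records
  value array<struct<
    id:string,
    createddatetime:string,
    userprincipalname:string,
    userdisplayname:string,
    appdisplayname:string,
    ipaddress:string,
    clientappused:string,
    conditionalaccessstatus:string,
    devicedetail:struct<browser:string, deviceid:string, iscompliant:boolean, ismanaged:boolean, operatingsystem:string, trusttype:string>,
    location:struct<city:string, state:string, countryorregion:string, geocoordinates:struct<altitude:double, latitude:double, longitude:double>>,
    status:struct<errorcode:int, failurereason:string, additionaldetails:string>
  >>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://your-demo-bucket/signinlogs/';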
Once I had the schema defined, I created the table in Athena. The official AWS documentation does a fine job of explaining the few clicks it takes to create a table, so I won't re-create that here. The DDLs I've provided above will make it a quick and painless process for you.
Let’s review what we’ve done so far. We’ve set up a recurring job that pulls the sign-in and audit logs via the API and dumps all that juicy data into cheap object storage, to which we can also apply lifecycle policies. We’ve then defined the schema for the data and made it available via standard SQL queries. All without provisioning a server, and for pennies on the dollar. Not too shabby!
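If you want to confirm everything is wired up before moving on, a quick sanity check in the Athena console might look something like this (using the database and table names you'll see me use below; yours will differ):

-- Each row in the table is one JSON file the Lambda wrote; each file holds an array of sign-in records
SELECT COUNT(*) AS log_files,
       SUM(cardinality(value)) AS signin_records
FROM "azuread"."tbl_signin_demo";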
At this point you can use your analytics tool of choice, whether it be QuickSight, Tableau, Power BI, or any of the other tools that have flooded the market over the past few years. Since I don’t make any revenue from these blog posts, I like to go the cheap and easy route of using Amazon QuickSight.
After completing the initial setup of QuickSight I was ready to go. The next step was to create a new data set. For that I clicked the Manage Data button and selected New Data Set.
On the Create a Data Set screen I selected the Athena option and created a name for the data source.
From there I selected the database in Athena, which for me was named azuread. The tables within the database then populated and I chose tbl_signin_demo, which points to the test S3 bucket I mentioned previously.
Due to the complexity of the data structure I opted to use a custom SQL query. There is no reason why you couldn’t create the table I’m about to create in Athena and connect to that table instead, to make it more consumable for a wider array of users. It’s really up to you, and I honestly don’t know what the appropriate “big data” way of doing it is. Either way, those of you with real SQL skills may want to look away from this query lest you experience a Raiders of the Lost Ark moment.
You were warned.
SELECT
  records.id,
  records.createddatetime,
  records.userprincipalname,
  records.userDisplayName,
  records.userid,
  records.appid,
  records.appdisplayname,
  records.ipaddress,
  records.clientappused,
  records.mfadetail.authdetail AS mfadetail_authdetail,
  records.mfadetail.authmethod AS mfadetail_authmethod,
  records.correlationid,
  records.conditionalaccessstatus,
  records.appliedconditionalaccesspolicy.displayname AS cap_displayname,
  array_join(records.appliedconditionalaccesspolicy.enforcedgrantcontrols, ' ') AS cap_enforcedgrantcontrols,
  array_join(records.appliedconditionalaccesspolicy.enforcedsessioncontrols, ' ') AS cap_enforcedsessioncontrols,
  records.appliedconditionalaccesspolicy.id AS cap_id,
  records.appliedconditionalaccesspolicy.result AS cap_result,
  records.originalrequestid,
  records.isinteractive,
  records.tokenissuername,
  records.tokenissuertype,
  records.devicedetail.browser AS device_browser,
  records.devicedetail.deviceid AS device_id,
  records.devicedetail.iscompliant AS device_iscompliant,
  records.devicedetail.ismanaged AS device_ismanaged,
  records.devicedetail.operatingsystem AS device_os,
  records.devicedetail.trusttype AS device_trusttype,
  records.location.city AS location_city,
  records.location.countryorregion AS location_countryorregion,
  records.location.geocoordinates.altitude,
  records.location.geocoordinates.latitude,
  records.location.geocoordinates.longitude,
  records.location.state AS location_state,
  records.riskdetail,
  records.risklevelaggregated,
  records.risklevelduringsignin,
  records.riskstate,
  records.riskeventtypes,
  records.resourcedisplayname,
  records.resourceid,
  records.authenticationmethodsused,
  records.status.additionaldetails,
  records.status.errorcode,
  records.status.failurereason
FROM "azuread"."tbl_signin_demo"
CROSS JOIN (UNNEST(value) AS t(records))
This query will de-nest the data and give you a detailed (possibly extremely large depending on how much data you are storing) parsed table. I was now ready to create some data visualizations.
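If you would rather go the route I mentioned earlier of materializing this flattened data as its own table in Athena instead of pasting the query into QuickSight, a CTAS (Create Table As Select) statement along these lines should do it. Treat this as a sketch: the new table name and S3 location are placeholders of my own invention, and writing the results out as Parquet is just one reasonable choice.

CREATE TABLE azuread.tbl_signin_flat
WITH (
  format = 'PARQUET',
  external_location = 's3://your-demo-bucket/signin-flat/'
) AS
SELECT
  records.id,
  records.createddatetime,
  records.userprincipalname,
  -- ...paste in the rest of the SELECT list from the query above...
  records.status.errorcode,
  records.status.failurereason
FROM "azuread"."tbl_signin_demo"
CROSS JOIN (UNNEST(value) AS t(records));

Connecting QuickSight (or any other tool) to that table is then a plain SELECT, and the folks consuming it never have to care how the nesting got flattened.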
The first visual I made was a geospatial visual using the location data included in the logs, filtered to failed logins. Not surprisingly, our friends in China have shown a real interest in my wife’s and my Office 365 accounts.
Next up I was interested in seeing if there were any patterns in the frequency of the failed logins. For that I created a simple line chart showing the number of failed logins per user account in my tenant. Interestingly enough the new year meant back to work for more than just you and me.
Like I mentioned earlier, Microsoft provides a ton of great detail in the sign-in logs. Beyond just location, they also provide reasons for login failures. I next created a stacked bar chart to show the different reasons for failed logins by user. I found the blocked sign-ins by malicious IPs interesting. It’s nice to know that is being tracked and taken care of.
Failed logins are great, but the other thing I was interested in was successful logins and user behavior. For this I created a vertical stacked bar chart that displays successful logins by user by device operating system (yet more great data captured in the logs). You can tell from the bar on the right that my wife is a fan of her Mac!
As I gather more data I plan on creating some more visuals, but this was a great start. The geospatial one is my favorite. If you have access to a larger data set with a diverse set of users, your data should prove fascinating. Definitely share any graphs or interesting data points you end up putting together if you opt to do some of this analysis yourself. I’d love some new ideas!
That will wrap up this series. As you’ve seen, the modern tool sets available to you can do some amazing things for cheap, without forcing you to maintain the infrastructure behind them. Vendors are also doing a wonderful job of providing a metric ton of data in their logs. If you take the initiative to understand the product and the data, you can glean some powerful information that has both security and business value. Even better, you can create some simple visuals to communicate that data to a wide variety of audiences, making it that much more valuable.
Have a great weekend!