Using Python to Pull Data from MS Graph API – Part 2

Welcome back my fellow geeks!

In this series I’m walking through my experience putting together some code to integrate with the Microsoft Graph API (Application Programming Interface).  In the last post I covered the logic behind this pet project and the tools I used to get it done.  In this post I’ll be walking through the code and covering what’s happening behind the scenes.

The project consists of three files.  The awsintegration.py file contains functions for the integration with AWS Systems Manager Parameter Store and Amazon S3 using the Python boto3 SDK (Software Development Kit).  The graphapi.py file contains two functions: one uses Microsoft’s Azure Active Directory Library for Python (ADAL) to obtain an access token, and the other uses Python’s Requests library to make calls to the MS Graph API.  Finally, the main.py file contains the code that brings everything together.  You’ll notice a few trends across the code.  First, it’s very simple, since I’m a long way from being able to do any fancy tricks; second, I tried to stay away from using too many third-party modules.

Let’s first dig into the awsintegration.py module.  In the first few lines below I import the required modules, which include AWS’s Boto3 library.

import json
import boto3
import logging

Python has a stellar standard logging module that makes logging to a centralized location across a package a breeze.  The line below creates a logger named after the module; because that logger propagates up to the logging configuration defined in the main package, anything these modules log is directed to the same log file.

log = logging.getLogger(__name__)

This next function uses Boto3 to call AWS Systems Manager Parameter Store to retrieve a secure string.  Be aware that if you’re using Parameter Store to store secure strings the security principal you’re using to make the call (in my case an IAM User via Cloud9) needs to have appropriate permissions to Parameter Store and the KMS CMK.  Notice I added a line here to log the call for the parameter to help debug any failures.  Using the parameter store with Boto3 is covered in detail here.

def get_parametersParameterStore(parameterName,region):
    # Retrieve and decrypt a secure string parameter from the
    # AWS Systems Manager Parameter Store in the given region
    log.info('Request %s from Parameter Store',parameterName)
    client = boto3.client('ssm', region_name=region)
    response = client.get_parameter(
        Name=parameterName,
        WithDecryption=True
    )
    return response['Parameter']['Value']

The last function in this module again uses Boto3 to upload the file to an Amazon S3 bucket with a specific prefix.  Using S3 is covered in detail here.

def put_s3(bucket,prefix,region,filename):
    # Upload the file to the S3 bucket under the given prefix
    s3 = boto3.client('s3', region_name=region)
    s3.upload_file(filename,bucket,prefix + "/" + filename)

Next up is the graphapi.py module.  In the first few lines I again import the necessary modules as well as the AuthenticationContext module from ADAL.  This module contains the AuthenticationContext class which is going to get the OAuth 2.0 access token needed to authenticate to the MS Graph API.

import json
import requests
import logging
from adal import AuthenticationContext

log = logging.getLogger(__name__)

In the function below an instance of the AuthenticationContext class is created and the acquire_token_with_client_credentials method is called.   It uses the OAuth 2.0 Client Credentials grant type which allows the script to access the MS Graph API without requiring a user context.  I’ve already gone ahead and provisioned and authorized the script with an identity in Azure AD and granted it the appropriate access scopes.

Behind the scenes Azure AD (authorization server in OAuth-speak) is contacted and the script (client in OAuth-speak) passes a unique client id and client secret.  The client id and client secret are used to authenticate the application to Azure AD which then looks within its directory to determine what resources the application is authorized to access (scope in OAuth-speak).  An access token is then returned from Azure AD which will be used in the next step.

def obtain_accesstoken(tenantname,clientid,clientsecret,resource):
    auth_context = AuthenticationContext('https://login.microsoftonline.com/' +
        tenantname)
    token = auth_context.acquire_token_with_client_credentials(
        resource=resource,client_id=clientid,
        client_secret=clientsecret)
    return token
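
For illustration, here’s how the function might be called with placeholder values (in the real script the tenant name comes from the parameters file and the client id and client secret come from Parameter Store):

token = obtain_accesstoken('mytenant.com','client-id-guid',
    'client-secret-value','https://graph.microsoft.com')

ADAL returns the token as a dictionary, with the access token itself stored under the accessToken key; that’s how the next function consumes it.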

A properly formatted header is created and the access token is included.  The function checks to see if the q_param parameter has a value and, if it does, passes it as a dictionary object to the Python Requests library, which includes the key-value pairs as query strings.  The request is then made to the appropriate endpoint.  If the response code is anything but 200 an exception is raised, written to the log, and the script terminates.  Assuming a 200 is received, the Python JSON library is used to parse the response.  The JSON content is searched for an @odata.nextLink attribute, which indicates the results have been paged.  The function handles this by recursively requesting each page until there are no longer any paged results and combining the paged results into a single JSON array to make them easier to work with moving forward.

def makeapirequest(endpoint,token,q_param=None):
    headers = {'Content-Type':'application/json',
        'Authorization':'Bearer {0}'.format(token['accessToken'])}

    log.info('Making request to %s...',endpoint)

    if q_param is not None:
        response = requests.get(endpoint,headers=headers,params=q_param)
        log.info('Full request URL: %s',response.url)
    else:
        response = requests.get(endpoint,headers=headers)
    if response.status_code == 200:
        json_data = json.loads(response.text)

        # If the result was paged, recursively request the next page
        # and fold its records into this page's value array
        if '@odata.nextLink' in json_data.keys():
            log.info('Paged result returned...')
            record = makeapirequest(json_data['@odata.nextLink'],token)
            for entry in record['value']:
                json_data['value'].append(entry)
        return json_data
    else:
        raise Exception('Request failed with {0} - {1}'.format(
            response.status_code,response.text))
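
To make the paging behavior concrete, here’s a trimmed sketch of what a paged response looks like; the shape follows MS Graph conventions and the values are placeholders:

{
    "@odata.nextLink": "https://graph.microsoft.com/beta/users?$top=1&$select=id&$skiptoken=...",
    "value": [
        {"id": "00000000-0000-0000-0000-000000000000"}
    ]
}

Each recursive call follows @odata.nextLink and appends its value entries to the first page’s value array, so the caller ends up with one combined result set.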

Lastly there is main.py, which stitches the script together.  The first section imports the modules we’ve already covered, in addition to the argparse library, which is used to handle arguments passed when the script is executed.

import json
import requests
import logging
import time
import graphapi
import awsintegration
from argparse import ArgumentParser

A simple configuration for the logging module is set up, instructing it to write to msapiquery.log at the INFO level and applying a standard format.

logging.basicConfig(filename='msapiquery.log', level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

This chunk of code creates an instance of the ArgumentParser class and configures two arguments.  The sourcefile argument designates the JSON parameters file containing all the necessary information, while the --s3 switch tells the script to upload the results to an S3 bucket.

The parameters file is then opened and processed.  Note that the S3 parameters are only pulled in if the --s3 switch was used.

parser = ArgumentParser()
parser.add_argument('sourcefile', type=str, help='JSON file with parameters')
parser.add_argument('--s3', help='Write results to S3 bucket',action='store_true')
args = parser.parse_args()

try:
    with open(args.sourcefile) as json_data:
        d = json.load(json_data)
        tenantname = d['parameters']['tenantname']
        resource = d['parameters']['resource']
        endpoint = d['parameters']['endpoint']
        filename = d['parameters']['filename']
        aws_region = d['parameters']['aws_region']
        q_param = d['parameters']['q_param']
        clientid_param = d['parameters']['clientid_param']
        clientsecret_param = d['parameters']['clientsecret_param']
        if args.s3:
            bucket = d['parameters']['bucket']
            prefix = d['parameters']['prefix']
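
For reference, the script would be invoked along these lines (a hypothetical example assuming the parameters file is saved as parameters.json):

python3 main.py parameters.json --s3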

Next up, the get_parametersParameterStore function from the awsintegration module is executed twice: once to get the client id and once to get the client secret.  Note that the get_parameters method of the Boto3 Systems Manager client could have been used to get both parameters in a single call, but I didn’t go that route (a sketch of that alternative follows the code below).

    logging.info('Attempting to contact Parameter Store...')
    clientid = awsintegration.get_parametersParameterStore(clientid_param,aws_region)
    clientsecret = awsintegration.get_parametersParameterStore(clientsecret_param,aws_region)
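
For comparison, here’s a minimal sketch of that single-call alternative (not what the script does; variable names are reused from the surrounding code):

client = boto3.client('ssm', region_name=aws_region)
response = client.get_parameters(
    Names=[clientid_param,clientsecret_param],
    WithDecryption=True
)
values = {p['Name']: p['Value'] for p in response['Parameters']}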

In these next four lines the access token is obtained by calling the obtain_accesstoken function and the request to the MS Graph API is made using the makeapirequest function.

    logging.info('Attempting to obtain an access token...')
    token = graphapi.obtain_accesstoken(tenantname,clientid,clientsecret,resource)

    logging.info('Attempting to query %s ...',endpoint)
    data = graphapi.makeapirequest(endpoint,token,q_param)

This section creates a string representing the current year, month, and day and prepends it to the filename that was supplied in the parameters file.  The file is then opened using the with statement.  If you’re familiar with the using statement from C#, the with statement is similar in that it ensures resources are cleaned up after being used.
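For example, with a filename of sign_in_logs.json, a run on January 9th, 2019 would produce 2019-01-09-sign_in_logs.json.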

Before the data is written to file, I remove the @odata.nextLink key if it’s present.  This is totally optional and just something I did to pretty up the results.  The data is then written to the file as raw text by using the Python JSON encoder/decoder.

    logging.info('Attempting to write results to a file...')
    timestr = time.strftime("%Y-%m-%d")
    filename = timestr + '-' + filename
    with open(filename,'w') as f:
        
        ## If the data was paged remove the @odata.nextLink key
        ## to clean up the data before writing it to a file

        if '@odata.nextLink' in data.keys():
            del data['@odata.nextLink']
        f.write(json.dumps(data))

Finally, if the --s3 argument was passed when the script was run, the put_s3 function from the awsintegration module is called and the file is uploaded to S3.

    if args.s3:
        logging.info('Attempting to write results to %s S3 bucket...',bucket)
        awsintegration.put_s3(bucket,prefix,aws_region,filename)

Exceptions thrown anywhere in the script are captured here and written to the log file.  I played around with a few different ways of handling exceptions, and everything was so interdependent that if there was a failure it was best for the script to stop altogether and inform the user.  Naftali Harris has an amazing blog that walks through the many different ways of handling exceptions in Python and their various advantages and disadvantages.  It’s a great read.

except Exception as e:
    logging.error('Exception thrown: %s',e)
    print('Error running script.  Review the log file for more details')

So that’s the code.  Let’s take a quick look at the parameters file below.  It’s very straightforward.  Keep in mind both the bucket and prefix parameters are only required when using the --s3 option.  Here are some details on the other options:

  • The tenantname attribute is the DNS name of the Azure AD tenant being queried.
  • The resource attribute specifies the resource the access token will be used for.  If you’re going to be hitting the MS Graph API, more than likely it will be https://graph.microsoft.com.
  • The endpoint attribute specifies the endpoint the request is being made to, including any query strings you plan on using.
  • The clientid_param and clientsecret_param attributes are the AWS Systems Manager Parameter Store parameter names that hold the client id and client secret the script was provisioned with in Azure AD.
  • The q_param attribute is a dictionary of key-value pairs intended to store OData query strings.
  • The aws_region attribute is the region the S3 bucket and Parameter Store data live in.
  • The filename attribute is the name you want to set for the file the script will produce.

{
    "parameters":{
        "tenantname": "mytenant.com",
        "resource": "https://graph.microsoft.com",
        "endpoint": "https://graph.microsoft.com/beta/auditLogs/signIns",
        "clientid_param":"myclient_id",
        "clientsecret_param":"myclient_secret",
        "q_param":{"$filter":"createdDateTime gt 2019-01-09"},
        "aws_region":"us-east-1",
        "filename":"sign_in_logs.json",
        "bucket":"mybucket",
        "prefix":"myprefix"
    }
}

Now that the script has been covered, let’s see it in action.  First I’m going to demonstrate how it handles paging by querying the MS Graph API endpoint that lists the users in the directory.  I’m going to append the $select query parameter set to return just the user’s id to keep the output simple, and set the $top query parameter to one to limit the results to one user per page.  The endpoint looks like this: https://graph.microsoft.com/beta/users?$top=1&$select=id.

I’ll be running the script from an instance of Cloud9.  The IAM user I’m using with AWS has appropriate permissions to the S3 bucket, KMS CMK, and parameters in the Parameter Store.  I’ve set each of the parameters in the parameters file to the appropriate values for the environment I’ve configured.  I’ll additionally be using the --s3 option.

[Screenshot: running the script in Cloud9]

Once the script completes, it’s time to look at the log file that was created.  As seen below, each step in the script is logged to aid with debugging if something were to fail.  The log also indicates the results were paged.

[Screenshot: log file showing each step of the run]

The output is nicely formatted JSON that could be further transformed or fed into something like Amazon Athena for further analysis (future post maybe?).

[Screenshot: formatted JSON output]

Cool right?  My original use case was sign-in logs, so let’s take a glance at that.  Here I’m going to use an endpoint of https://graph.microsoft.com/beta/auditLogs/signIns with an OData filter of createdDateTime gt 2019-01-08, which will limit the data returned to today’s sign-ins.

In the logs we see the script was successfully executed and included the filter specified.

[Screenshot: log entries showing the sign-in query with the filter applied]

The output is the raw JSON of the sign-ins over the past 24 hours.  For your entertainment purposes I’ve included one of the malicious sign-ins that was captured.  I SO can’t wait to examine this stuff in a BI tool.

[Screenshot: raw JSON of a captured sign-in]

Well that’s it folks.  It may be ugly, but it works!  This was a fun activity to undertake as a first stab at making something useful in Python.  I especially enjoyed the lack of documentation available on this integration.  It really made me dive deep and learn things I probably wouldn’t have if there were a billion examples out there.

I’ve pushed the code to GitHub, so feel free to muck around with it to your heart’s content.

Using Python to Pull Data from MS Graph API – Part 1

Welcome to 2019 fellow geeks! I hope each of you had a wonderful holiday with friends and family.

It’s been a few months since my last post. As some of you may be aware I made a career move last September and took on a new role with a different organization. The first few months have been like drinking from multiple fire hoses at once and I’ve learned a ton. It’s been an amazing experience that I’m excited to continue in 2019.

One area I’ve been putting some focus in is learning the basics of Python. I’ve been a PowerShell guy (with a bit of C# thrown in there) for the past six years so diving into a new language was a welcome change. I picked up a few books on the language, watched a few videos, and it wasn’t clicking. At that point I decided it was time to jump into the deep end and come up with a use case to build out a script for. Thankfully I had one queued up that I had started in PowerShell.

Early last year my wife’s Office 365 account was hacked. Thankfully no real damage was done minus some spam email that was sent out. I went through the wonderful process of changing her passwords across her accounts, improving the complexity and length, getting her on-boarded with a password management service, and enabling Azure MFA (Multi-factor Authentication) on her Office 365 account and any additional services she was using that supported MFA options.  It was not fun.

Curious about what the logs would have shown, I had begun putting together a PowerShell script that was going to pull down the logs from Azure AD (Active Directory), extract the relevant data, and export it to CSV (comma-separated values) where I could play around with it in whatever analytics tool I could get my hands on.  Unfortunately life happened and I never had a chance to finish the script or play with the data.  This would be the use case for my first Python script.

Azure AD offers a few different types of logs which Microsoft divides into a security pillar and an activity pillar. For my use case I was interested in looking at the reports in the Activity pillar, specifically the Sign-ins report. This report is available for tenants with an Azure AD Premium P1 or P2 subscription (I added P2 subscriptions to our family accounts last year).  The sign-in logs have a retention period of 30 days and are available either through the Azure Portal or programmatically through the MS Graph API (Application Programming Interface).

My primary goals were to create as much reusable code as possible and experiment with as many APIs/SDKs (Software Development Kits) as I could.  This was accomplished by breaking the code into various reusable modules and leveraging AWS (Amazon Web Services) services for secure storage of Azure AD application credentials and cloud-based storage of the exported data.  Going this route forced me to use the MS Graph API, Microsoft’s Azure Active Directory Library for Python (or ADAL for short), and Amazon’s Boto3 Python SDK.

On the AWS side I used AWS Systems Manager Parameter Store to store the Azure AD credentials as secure strings encrypted with a customer-managed AWS KMS (Key Management Service) customer master key (CMK).  For cloud storage of the log files I used Amazon S3.

Lastly I needed a development environment and source control.  For about a day I simply used Sublime Text on my Mac and saved the file to a personal cloud storage account.  This was obviously not a great idea so I decided to finally get my GitHub repository up and running.  Additionally I moved over to using AWS’s Cloud9 for my IDE (integrated development environment).   Cloud9 has the wonderful perk of being web based and has the capability of creating temporary credentials that can do most of what my AWS IAM user can do.  This made it simple to handle permissions to the various resources I was using.

Once the instance of Cloud9 was spun up, I needed to set the environment up for Python 3 and add the necessary libraries.  The AMI (Amazon Machine Image) used by the Cloud9 service to provision new instances includes both Python 2.7 and Python 3.6.  This matters when adding the ADAL and Boto3 modules via pip, because if you simply run pip install module_name it will be installed for Python 2.7.  Instead you’ll want to execute python3 -m pip install module_name, which ensures the modules are installed in the appropriate location.
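
For example, to pull in both libraries for Python 3 in one shot (using the package names as published on PyPI):

python3 -m pip install adal boto3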

In my next post I’ll walk through and demonstrate the script.

Have a great week!

Azure AD ASP .NET Web Application

Hi all. Before I complete my series on Azure AD Provisioning with a look at how provisioning works with the Graph API, I want to take a detour and cover some Microsoft Visual Studio. Over the past month I’ve been spending some time building very basic applications.

As I’ve covered previously in my blog, integration is going to be the primary responsibility of IT professionals as more infrastructure shifts to the cloud.  Our success at integration will largely depend on how well we understand the technologies, our ability to communicate what business problems these technologies can solve, and our understanding of how to help developers build applications to consume these technologies.  In my own career, I’ve spent significant time on all of the above except the last piece.  That is where I’m focusing now.

Recently I built a small .NET Forms application that integrated with the new Azure AD B2B API.  Over the past few days I’ve been diving into ASP .NET Web Applications built with an MVC architecture.  I decided to build a small MVC application that performed queries against the Graph API, and wow was it easy.  There was little to no code that I had to provide myself and “it just worked”.  You can follow these instructions if you’d like to play with it as well.  If you’ve read this blog, you know I don’t do well with things that just work.  I need to know how it works.

If you follow the instructions in the above link you will have an ASP .NET Web Application that is integrated with your Azure AD tenant, uses Azure AD for authentication via the Open ID Connect protocol, and is capable of reading directory data from the Graph API using OAuth 2.0.  So how is all this accomplished?  What actually happened?  What protocol is being used for authentication, and how does the application query for directory data?  Those are the questions I’ll focus on answering in this blog post.

Let’s first answer the question as to what Visual Studio did behind the scenes to integrate the application with Azure AD. In the explanation below I’ll be using the technology terms (OAuth 2.0 and Open ID Connect) in parentheses to translate the Microsoft lingo. During the initialization process Visual Studio communicated with Azure AD (Authorization Server) and registered (registration) the application (confidential client) with Azure AD as a Web App and gave it the delegated permissions (scopes) of “Sign in and read user profile” and “Read directory data”.

In addition to the registration of the application in Azure AD, a number of libraries and code have been added to the project that make the authentication and queries to the Graph API “out of the box”. All of the variables specific to the application such as Client ID, Azure AD Instance GUID, application secret key are stored in the web.config. The Startup.cs has the simple code that adds Open ID Connect authentication. Microsoft does a great job explaining the Open ID Connect code here. In addition to the code to request the Open ID Connect authentication, there is code to exchange the authorization code for an access token and refresh token for the Graph API as seen below with my comments.

AuthorizationCodeReceived = (context) =>
{
    var code = context.Code;
    // MF -> Create a client credential object with the Client ID and Application key
    ClientCredential credential = new ClientCredential(clientId, appKey);
    // MF -> Extract the signed-in user's ID and build an authentication context backed by that user's token cache
    string signedInUserID = context.AuthenticationTicket.Identity.FindFirst(ClaimTypes.NameIdentifier).Value;
    AuthenticationContext authContext = new AuthenticationContext(Authority, new ADALTokenCache(signedInUserID));
    // MF -> Acquire a token by submitting the authorization code, providing the URI that is registered for the application, the application secret key, and the resource (in this scenario the Graph API)
    AuthenticationResult result = authContext.AcquireTokenByAuthorizationCode(
        code, new Uri(HttpContext.Current.Request.Url.GetLeftPart(UriPartial.Path)), credential, graphResourceId);
    return Task.FromResult(0);
}

Now that the user has authenticated and the application has a Graph API access token, I’ll hop over to the UserProfileController.cs. The code we’re concerned about in here is below with my comments.


{
    Uri servicePointUri = new Uri(graphResourceID);
    Uri serviceRoot = new Uri(servicePointUri, tenantID);
    ActiveDirectoryClient activeDirectoryClient = new ActiveDirectoryClient(serviceRoot,
        async () => await GetTokenForApplication());
    // MF -> Use the access token previously obtained to query the Graph API
    var result = await activeDirectoryClient.Users.Where(u => u.ObjectId.Equals(userObjectID)).ExecuteAsync();
    IUser user = result.CurrentPage.ToList().First();
    return View(user);
}

Next I’ll hop over to the UserProfile view to look at the Index.cshtml. In this file a simple table is constructed that returns information about the user from the Graph API. I’ve removed some of the pesky HTML and replaced it with the actions.


@using Microsoft.Azure.ActiveDirectory.GraphClient
@model User
@{
ViewBag.Title = "User Profile";
}
"CREATE TABLE"
"TABLE ROW"
Display Name
@Model.DisplayName
"TABLE ROW"
First Name
@Model.GivenName
"TABLE ROW"
Last Name
@Model.Surname
"TABLE ROW"
Email Address
@Model.Mail

Simple right? I can expand that table to include any attribute exposed via the Graph API. As you can see in the above, I’ve added email address to the display. Now that we’ve reviewed the code, let’s cover the steps the application takes and what happens in each step:

  1. App accesses https://login.microsoftonline.com//.well-known/openid-configuration
    • Get OpenID configuration for Azure AD
  2. App accesses https://login.microsoftonline.com/common/discovery/keys
    • Retrieve the public keys used to verify the signatures of Open ID Connect tokens
  3. User’s browser directed to https://login.microsoftonline.com//oauth2/authorize
    • Request an open id connect id token and authorization code for user’s profile information
  4. User’s browser directed to https://login.microsoftonline.com//login
    • User provides credentials to AAD and receives back
      1. id token
      2. access code for graph API with Directory.Read, User.Read scope
  5. User’s browser directed back to application
    • Return id token and access code to application
      1. id token authenticates user to application
      2. Access code for graph API with Directory.Read, User.Read scope temporarily stored
  6. Application accesses https://login.microsoftonline.com//oauth2/token
    • Exchanges the access code for a bearer token (see the sketch after this list)
  7. Application sends OData query to Graph API and attaches bearer token.
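
To make step 6 concrete, here’s a minimal sketch of the token exchange in Python (the language used earlier in this series); the request shape follows the OAuth 2.0 authorization code grant against Azure AD, and every value below is a placeholder:

import requests

# Exchange the authorization code from step 5 for a bearer token;
# tenant, code, client id/secret, and redirect URI are all placeholders
token_response = requests.post(
    'https://login.microsoftonline.com/mytenant.com/oauth2/token',
    data={
        'grant_type': 'authorization_code',
        'code': 'AUTHORIZATION_CODE_FROM_STEP_5',
        'client_id': 'CLIENT_ID',
        'client_secret': 'CLIENT_SECRET',
        'redirect_uri': 'https://localhost:44300/',
        'resource': 'https://graph.microsoft.com'
    })
bearer_token = token_response.json()['access_token']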

That’s it folks! In my next post I will complete the Azure AD Provisioning series with a simple ASP .NET Web app that provisions new users into Azure AD.

Azure AD User Provisioning – Part 4

Today I will continue my series in Azure AD User Provisioning. In previous posts I looked at the GUI methods of provisioning users. I’ll now begin digging into the methods that provide opportunities for programmatic management of a user’s identity management lifecycle. For this post we’ll cover every IT Professional’s favorite Microsoft topic, PowerShell.

Microsoft has specific PowerShell modules for administration of Azure Active Directory.  Microsoft is in the process of transitioning from the MSOnline module (Azure Active Directory PowerShell v1) to the AzureAD module (Azure Active Directory PowerShell v2).  At this time the AzureAD PowerShell module is in public preview, with plans to migrate all the functionality from the MSOnline module to the AzureAD module.  Until that point there will be some activities you’ll need to do with the MSOnline cmdlets.

Creating a new user using the MSOnline module (lovingly referred to as the “msol” cmdlets) is a quick and easy single line of code:

New-MsolUser -userPrincipalName user@sometenant.com -DisplayName "Some User"

The user can then be modified using cmdlets such as Set-MsolUser, Set-MsolUserLicense, Set-MsolUserPrincipalName, Set-MsolUserPassword, and Remove-MsolUser.  Using this set of cmdlets you can assign and un-assign user licenses and manage the identity lifecycle of the user account.

There are some subtle differences between the MSOnline and AzureAD cmdlets.  These differences arise from Microsoft’s drive to make the experience of using PowerShell with the AzureAD module similar to the experience of using the Graph API.  The AzureAD cmdlets are much more full-featured than the MSOnline cmdlets: close to every aspect of Azure AD can be managed, not just users.  Here is an example of how you would create a user with the new cmdlets:

New-AzureADUser -UserPrincipalName user@sometenant.com -AccountEnabled $False -DisplayName "user" -PasswordProfile $UserPasswordProfile

You’ll notice a few differences when we compare the minimum options required to create a user with the MSOnline cmdlets.  As you can see above, there are not just more required options, but also one that may look unfamiliar, called PasswordProfile.  The PasswordProfile option configures the user’s password and whether or not the user is going to be forced to change the password at next login.  I struggled a bit with getting the cmdlet to accept the PasswordProfile, but the documentation on the Azure Graph API and a Microsoft blog provided some helpful information.  It can be set with the following lines of code:

$UserPasswordProfile = "" | Select-Object password,forceChangePasswordNextLogin
$UserPasswordProfile.forceChangePasswordNextLogin = $true
$UserPasswordProfile.password = "some password"

So what is the big difference between the MSOnline and AzureAD cmdlets?  It comes down to the API each uses to interact with Azure AD.  The MSOnline module (aka Azure AD PowerShell v1) uses the old SOAP API; oddly enough, it’s the same SOAP API that Azure AD Connect uses.  If you’ve read my Azure AD Connect – Behind the Scenes series, you’ll notice the similarities in the Fiddler capture below.

[Screenshot: Fiddler capture of the MSOnline cmdlets calling the SOAP API]

The AzureAD cmdlets (aka Azure AD PowerShell v2) use the Graph API.  In the Fiddler capture below you can see the authentication to Azure AD, the obtaining of the authorization code, the exchange for a bearer token, and the delivery of the bearer token to the Graph API to retrieve the user information for Dr. Frankenstein.  Check out the screenshot from Fiddler below:

[Screenshot: Fiddler capture of the AzureAD cmdlets using the Graph API]

As you can see, PowerShell presents a powerful method of managing Azure AD and its resources.  Given how familiar IT professionals are with PowerShell, it presents a ton of opportunities for automation and standardization without a significant learning curve.  Microsoft’s move to having PowerShell leverage the Graph API will help IT professionals harness its power without having to break out Visual Studio.

In my next post I’ll write a simple application in Visual Studio to demonstrate the simplicity of integration with the Graph API and the opportunities that could be presented to integrating with existing toolsets.