Visualizing AWS Logging Data in Azure Monitor – Part 2

Visualizing AWS Logging Data in Azure Monitor – Part 2

Welcome back folks!

In this post I’ll be continuing my series on how Azure Monitor can be used to visualize log data generated by other cloud services.  In my last post I covered the challenges that multicloud brings and what Azure can do to help with it.  I also gave an overview of Azure Monitor and covered the design of the demo I put together and will be walking through in this post.  Please take a read through that post if you haven’t already.  If you want to follow along, I’ve put the solution up on Github.

Let’s quickly review the design of the solution.

Capture

This solution uses some simple Python code to pull information about the usage of AWS IAM User access id and secret keys from an AWS account.  The code runs via a Lambda and stores the Azure Log Analytics Workspace id and key in environment variables of the Lambda that are encrypted with an AWS KMS key.  The data is pulled from the AWS API using the Boto3 SDK and is transformed to JSON format.  It’s then delivered to the HTTP Data Collector API which places it into the Log Analytics Workspace.  From there, it becomes available to Azure Monitor to query and visualize.

Setting up an Azure environment for this integration is very simple.  You’ll need an active Azure subscription.  If you don’t have one, you can setup a free Azure account to play around.  Once you’re set with the Azure subscription, you’ll need to create an Azure Log Analytics Workspace.  Instructions for that can be found in this Microsoft article.  After the workspace has been setup, you’ll need to get the workspace id and key as referenced in the Obtain workspace ID and key section of this Microsoft article.  You’ll use this workspace ID and key to authenticate to the HTTP Data Collector API.

If you have a sandbox AWS account and would like to follow along, I’ve included a CloudFormation template that will setup the AWS environment.  You’ll need to have an AWS account with sufficient permissions to run the template and provision the resources.  Prior to running the template, you will need to zip up the lambda_function.py and put it on an AWS S3 bucket you have permissions on.  When you run the template you’ll be prompted to provide the S3 bucket name, the name of the ZIP file, the Log Analytics Workspace ID and key, and the name you want the API to assign to the log in the workspace.

The Python code backing the solution is pretty simple.  It uses all standard Python modules except for the boto3 module used to interact with AWS.

import json
import logging
import re
import csv
import boto3
import os
import hmac
import base64
import hashlib
import datetime

from io import StringIO
from datetime import datetime
from botocore.vendored import requests

The first function in the code parses the ARN (Amazon Resource Name) to extract the AWS account number.  This information is later included in the log data written to Azure.

# Parse the IAM User ARN to extract the AWS account number
def parse_arn(arn_string):
    acct_num = re.findall(r'(?<=:)[0-9]{12}',arn_string)
    return acct_num[0]

The second function uses the strftime method to transform the timestamp returned from the AWS API to a format that the Azure Monitor API will detect as a timestamp and make that particular field for each record in the Log Analytics Workspace a datetime type.

# Convert timestamp to one more compatible with Azure Monitor
def transform_datetime(awsdatetime):
transf_time = awsdatetime.strftime("%Y-%m-%dT%H:%M:%S")
return transf_time

The next function queries the AWS API for a listing of AWS IAM Users setup in the account and creates dictionary object representing data about that user. That object is added to a list which holds each object representing each user.

# Query for a list of AWS IAM Users
def query_iam_users():
    
    todaydate = (datetime.now()).strftime("%Y-%m-%d")
    users = []
    client = boto3.client(
        'iam'
    )

    paginator = client.get_paginator('list_users')
    response_iterator = paginator.paginate()
    for page in response_iterator:
        for user in page['Users']:
            user_rec = {'loggedDate':todaydate,'username':user['UserName'],'account_number':(parse_arn(user['Arn']))}
            users.append(user_rec)
    return users

The query_access_keys function queries the AWS API for a listing of the access keys that have been provisioned the AWS IAM User as well as the status of those keys and some metrics around the usage.  The resulting data is then added to a dictionary object and the object added to a list.  Each item in the list represents a record for an AWS access id.

# Query for a list of access keys and information on access keys for an AWS IAM User
def query_access_keys(user):
    keys = []
    client = boto3.client(
        'iam'
    )
    paginator = client.get_paginator('list_access_keys')
    response_iterator = paginator.paginate(
        UserName = user['username']
    )

    # Get information on access key usage
    for page in response_iterator:
        for key in page['AccessKeyMetadata']:
            response = client.get_access_key_last_used(
                AccessKeyId = key['AccessKeyId']
            )
            # Santize key before sending it along for export

            sanitizedacctkey = key['AccessKeyId'][:4] + '...' + key['AccessKeyId'][-4:]
            # Create new dictonionary object with access key information
            if 'LastUsedDate' in response.get('AccessKeyLastUsed'):

                key_rec = {'loggedDate':user['loggedDate'],'user':user['username'],'account_number':user['account_number'],
                'AccessKeyId':sanitizedacctkey,'CreateDate':(transform_datetime(key['CreateDate'])),
                'LastUsedDate':(transform_datetime(response['AccessKeyLastUsed']['LastUsedDate'])),
                'Region':response['AccessKeyLastUsed']['Region'],'Status':key['Status'],
                'ServiceName':response['AccessKeyLastUsed']['ServiceName']}
                keys.append(key_rec)
            else:
                key_rec = {'loggedDate':user['loggedDate'],'user':user['username'],'account_number':user['account_number'],
                'AccessKeyId':sanitizedacctkey,'CreateDate':(transform_datetime(key['CreateDate'])),'Status':key['Status']}
                keys.append(key_rec)
    return keys

The next two functions contain the code that creates and submits the request to the Azure Monitor API.  The product team was awesome enough to provide some sample code in the in the public documentation for this part.  The code is intended for Python 2 but only required a few small changes to make it compatible with Python 3.

Let’s first talk about the build_signature function.  At this time the API uses HTTP request signing using the Log Analytics Workspace id and key to authenticate to the API.  In short this means you’ll have two sets of shared keys per workspace, so consider the workspace your authorization boundary and prioritize proper key management (aka use a different workspace for each workload, track key usage, and rotate keys as your internal policies require).

Breaking down the code below, we the string that will act as the header includes the HTTP method, length of request content, a custom header of x-ms-date, and the REST resource endpoint.  The string is then converted to a bytes object, and an HMAC is created using SHA256 which is then base-64 encoded.  The result is the authorization header which is returned by the function.

def build_signature(customer_id, shared_key, date, content_length, method, content_type, resource):
    x_headers = 'x-ms-date:' + date
    string_to_hash = method + "\n" + str(content_length) + "\n" + content_type + "\n" + x_headers + "\n" + resource
    bytes_to_hash = bytes(string_to_hash, encoding="utf-8")  
    decoded_key = base64.b64decode(shared_key)
    encoded_hash = base64.b64encode(
        hmac.new(decoded_key, bytes_to_hash, digestmod=hashlib.sha256).digest()).decode()
    authorization = "SharedKey {}:{}".format(customer_id,encoded_hash)
    return authorization

Not much needs to be said about the post_data function beyond that it uses the Python requests module to post the log content to the API.  Take note of the limits around the data that can be included in the body of the request.  Key takeaways here is if you plan pushing a lot of data to the API you’ll need to chunk your data to fit within the limits.

def post_data(customer_id, shared_key, body, log_type):
    method = 'POST'
    content_type = 'application/json'
    resource = '/api/logs'
    rfc1123date = datetime.utcnow().strftime('%a, %d %b %Y %H:%M:%S GMT')
    content_length = len(body)
    signature = build_signature(customer_id, shared_key, rfc1123date, content_length, method, content_type, resource)
    uri = 'https://' + customer_id + '.ods.opinsights.azure.com' + resource + '?api-version=2016-04-01'

    headers = {
        'content-type': content_type,
        'Authorization': signature,
        'Log-Type': log_type,
        'x-ms-date': rfc1123date
    }

    response = requests.post(uri,data=body, headers=headers)
    if (response.status_code >= 200 and response.status_code <= 299):
        print("Accepted")
    else:
        print("Response code: {}".format(response.status_code))

Last but not least we have the lambda_handler function which brings everything together. It first gets a listing of users, loops through each user to information about the access id and secret keys usage, creates a log record containing information about each key, converts the data from a dict to a JSON string, and writes it to the API. If the content is successfully delivered, the log for the Lambda will note that it was accepted.

def lambda_handler(event, context):

    # Enable logging to console
    logging.basicConfig(level=logging.INFO,format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    try:

        # Initialize empty records array
        #
        key_records = []
        
        # Retrieve list of IAM Users
        logging.info("Retrieving a list of IAM Users...")
        users = query_iam_users()

        # Retrieve list of access keys for each IAM User and add to record
        logging.info("Retrieving a listing of access keys for each IAM User...")
        for user in users:
            key_records.extend(query_access_keys(user))
        # Prepare data for sending to Azure Monitor HTTP Data Collector API
        body = json.dumps(key_records)
        post_data(os.environ['WorkspaceId'], os.environ['WorkspaceKey'], body, os.environ['LogName'])

    except Exception as e:
        logging.error("Execution error",exc_info=True)

Once the data is delivered, it will take a few minutes for it to be processed and appear in the Log Analytics Workspace. In my tests it only took around 2-5 minutes, but I wasn’t writing much data to the API.  After the data processes you’ll see a new entry under the listing of Custom Logs in the Log Analytics Workspace.  The entry will be the log name you picked and with a _CL at the end.  Expanding the entry will display the columns that were created based upon the log entry.  Note that the columns consumed from the data you passed will end with an underscore and a character denoting the data type.

mylog

Now that the data is in the workspace, I can start querying it and creating some visualizations.  Azure Monitor uses the Kusto Query Language (KQL).  If you’ve ever created queries in Splunk, the language will feel familiar.

The log I created in AWS and pushed to the API has the following schema.  Note the addition of the underscore followed by a character denoting the column data type.

  • logged_Date (string) – The date the Lambda ran
  • user_s (string) – The AWS IAM User the key belongs to
  • account_number_s (string) – The AWS Account number the IAM Users belong to
  • AccessKeyId (string) – The id of the access key associated with the user which has been sanitized to show just the first 4 and last 4 characters
  • CreateDate_t (timestamp) – The date and time when the access key was created
  • LastUsedDate_t (timestamp) – The date and time the key was last used
  • Region_s (string) – The region where the access key was last used
  • Status_s (string) – Whether the key is enabled or disabled
  • ServiceName_s (string) – The AWS service where the access key was last used

In addition to what I’ve pushed, Azure Monitor adds a TimeGenerated field to each record which is the time the log entry was sent to Azure Monitor.  You can override this behavior and provide a field for Azure Monitor to use for this if you like (see here).  There are some other miscellaneous fields are inherited from whatever schema the API is drawing from.  These are fields such as TenantId and SourceSystem, which in this case is populated with RestAPI.

Since my personal AWS environment is quite small and the AWS IAM Users usage are very limited, my data sets aren’t huge.  To address this I created a number of IAM Users with access keys for the purpose blog.  I’m getting that out of the way so my AWS friends don’t hate on me. 🙂

One of core best practices in key management with shared keys is to ensure you rotate them.  The first data point I wanted to extract was which keys that existed in my AWS account were over 90 days old.  To do that I put together the following query:

AWS_Access_Key_Report_CL
| extend key_age = datetime_diff('day',now(),CreateDate_t)
| project Age=key_age,AccessKey=AccessKeyId_s, User=user_s
| where Age > 90
| sort by Age

Let’s walk through the query.  The first line tells the query engine to run this query against the AWS_Access_Key_Report_CL.  The next line creates a new field that contains the age of the key by determining the amount of time that has passed between the creation date of the key and today’s date.  The line after that instructs the engine to pull back only the key_age field I just created and the AccessKeyId_s, user_s , and status_s fields.  The results are then further culled down to pull only records where the key age is greater than 90 days and finally the results are sorted by the age of the key.

query1

Looks like it’s time to rotate that access key in use by Azure AD. 🙂

I can then pin this query to a new shared dashboard for other users to consume.  Cool and easy right?  How about we create something visual?

Looking at the trends in access key creation can provide some valuable insights into what is the norm and what is not.  Let’s take a look a the metrics for key creation (of the keys still exist in an enabled/disabled state).  For that I’m going to use the following query:

AWS_Access_Key_Report_CL
| make-series AccessKeys=count() default=0 on CreateDate_t from datetime(2019-01-01) to datetime(2020-01-01) step 1d

In this query I’m using the make-series operator to count the number of access keys created each day and assigning a default value of 0 if there are no keys created on that date.  The result of the query isn’t very useful when looking at it in tabular form.

query2.PNG

By selecting the Line drop down box, I can transform the date into a line grab which shows me spikes of creation in log creation.  If this was real data, investigation into the spike of key creations on 6/30 may be warranted.

quer2_2.PNG

I put together a few other visuals and tables and created a custom dashboard like the below.  Creating the dashboard took about an hour so, with much of the time invested in figuring out the query language.

dashboard

What you’ve seen here is a demonstration of the power and simplicity of Azure Monitor.  By adding a simple to use API, Microsoft has exponentially increased the agility of the tool by allowing it to become a single pane of glass for monitoring across clouds.   It’s also worth noting that Microsoft’s BI (business intelligence) tool Power BI has direct integration with Azure Log Analytics.  This allows you to pull that log data into PowerBI and perform more in-depth analysis and to create even richer visualizations.

Well folks, I hope you’ve found this series of value.  I really enjoyed creating it and already have a few additional use cases in mind.  Make sure to follow me on Github as I’ll be posting all of the code and solutions I put together there for your general consumption.

Have a great day!

 

 

Visualizing AWS Logging Data in Azure Monitor – Part 1

Visualizing AWS Logging Data in Azure Monitor – Part 1

Hi folks!

2019 is more than halfway over and it feels like it has happened in a flash.  It’s been an awesome year with tons of change and even more learning.  I started the year neck deep in AWS and began transitioning into Azure back in April when I joined on with Microsoft.  Having the opportunity to explore both clouds and learn the capabilities of each offering has been an amazing experience that I’m incredibly thankful for.  As I’ve tried to do for the past 8 years, I’m going to share some of those learning with you.  Today we’re going to explore one of the capabilities that differentiates Azure from its competition.

One of the key takeaways I’ve had from my experiences with AWS and Microsoft is enterprises have become multicloud.  Workloads are quickly being spread out among public and private clouds.  While the business benefits greatly from a multicloud approach where workloads can go to the most appropriate environment where the cost, risks, and time tables best suit it, it presents a major challenge to the technical orchestration behind the scenes.  With different APIs (application programmatic interface), varying levels of compliance, great and not so great capabilities around monitoring and alerting, and a major industry gap in multicloud skills sets, it can become quite a headache to successfully execute this approach.

One area Microsoft Azure differentiates itself is its ability to easy the challenge of monitoring and alerting in a multicloud environment.  Azure Monitor is one of the key products behind this capability.  With this post I’m going to demonstrate Azure Monitor’s capabilities in this realm by walking you through a pattern of delivering, visualizing, and analyzing log data collected from AWS.  The pattern I’ll be demonstrating is reusable for most any cloud (and potentially on-premises) offering.  Now sit back, put your geek hat on, and let’s dive in.

First I want to briefly talk about what Azure Monitor is?  Azure Monitor is a solution which brings together a collection of tools that can be used to collect and analyze the large abundance of telemetry available today.  This telemetry could be metrics in regards to a virtual machine’s performance or audit logs for Azure Active Directory.  The product team has put together the excellent diagram below which explains the architecture of the solution.

As you can see from the inputs on the left, Azure Monitor is capable of collecting and analyzing data from a variety of sources.  You’ll find plenty of documentation the product team has made publicly available on the five gray items, so I’m going to instead focus on custom sources.

For those of you who have been playing in the AWS pool, you can think of Azure Monitor as something similar (but much more robust) to CloudWatch Metrics and CloudWatch Logs.  I know, I know, you’re thinking I’ve drank the Microsft Kool-Aid.

koolaid.jpg

While I do love to reminisce about cold glasses of Kool-Aid on hot summers in the 1980s, I’ll opt to instead demonstrate it in action and let you decide for yourself.  To do this I’ll be leveraging the new API Microsoft introduced.  The Azure Monitor HTTP Data Collector API was introduced a few months back and provides the capability of delivering log data to Azure where it can be analyzed by Azure Monitor.

With Azure Monitor logs are stored in an Azure resource called a Log Analytics Workspace.  For you AWS folk, you can think of a Log Analytics Workspace as something similar to CloudWatch Log Groups where the data stored in a logical boundary where the data shares a retention and authorization boundary.  Logs are sent to the API in JSON format and are placed in the Log Analytics Workspace you specify.  A high level diagram of the flow can be seen below.

So now that you have a high level understanding of what Azure Monitor is, what it can do, and how the new API works, let’s talk about the demonstration.

If you’ve used AWS you’re very familiar with the capabilities CloudWatch Metrics Dashboards and the basic query language available to analyze CloudWatch Logs.  To perform more complex queries and to create deeper visualizations, third-party solutions are often used such as ElasticSearch and Kibana.  While these solutions work, they can be complex to implement and can create more operational overhead.

When a peer informed me about the new API a few weeks back, I was excited to try it out.  I had just started to use Azure Monitor to put together some dashboards for my personal Office 365 and Azure subscriptions and was loving the power and simplicity of the analytics component of the solution.  The new API opened up some neat opportunities to pipe logging data from AWS into Azure to create a single dashboard I could reference for both clouds.  This became my use case and demonstration of the pattern of delivering logs from a third party to Azure Monitor with some simple Python code.

The logs I chose to deliver to the API were logs containing information surrounding the usage of AWS access ids and keys.  I had previously put together some code to pull this data and write it to an S3 bucket.

Let’s take a look at the design of the solution.  I had a few goals I wanted to make sure to hit if possible.  My first goal was to keep the code simple.  That mean limiting the usage of third-party modules and avoid over complicating the implementation.

My second goal was to limit the usage of static credentials.  If I ran the code in Azure, I’d need to setup an AWS IAM User and provision an access id and secret key.  While I’m aware of the workaround to use SAML authentication, I’m not a fan because in my personal opinion, it’s using SAML in such a way you are trying to hammer in a square peg in a round hole.  Sure you can do it, but you really shouldn’t unless you’re out of options.  Additionally, the solution requires some fairly sensitive permissions in AWS such as IAM:ListAccessKeys so the risk of the credentials being compromised could be significant.  Given the risks and constraints of authentication methods to the AWS API, I opted to run my code as a Lambda and follow AWS best practices and assign the Lambda an IAM role.

On the Azure side, the Azure Monitor API for log delivery requires authentication using the Workspace ID and Workspace key.   Ideally these would be encrypted and stored in AWS Secrets Manager or as a secure parameter in Parameter Store, but I decided to go the easy route and store them as environment variables for the Lambda and to encrypt them with AWS KMS.  This cut back on the code and made the CloudFormation templates easier to put together.

With the decisions made the resulting design is pictured above.

Capture.PNG

I’m going to end the post here and save the dive into implementation and code for the next post.  In the meantime, take a read through the Azure Monitor documentation and familiarize yourself with the basics.  I’ve also put the whole solution up on Github if you’d like to follow along for next post.

See you next post!

 

Using Python to Pull Data from MS Graph API – Part 2

Using Python to Pull Data from MS Graph API – Part 2

Welcome back my fellow geeks!

In this series I’m walking through my experience putting together some code to integrate with the Microsoft Graph API (Application Programming Interface).  In the last post I covered the logic behind this pet project and the tools I used to get it done.  In this post I’ll be walking through the code and covering what’s happening behind the scenes.

The project consists of three files.  The awsintegration.py file contains functions for the integration with AWS Systems Manager Parameter Store and Amazon S3 using the Python boto3 SDK (Software Development Kit).  Graphapi.py contains two functions.  One function uses Microsoft’s Azure Active Directory Library for Python (ADAL) and the other function uses Python’s Requests library to make calls to the MS Graph API.  Finally, the main.py file contains the code that brings everything together. There are a few trends you’ll notice with all of the code. First off it’s very simple since I’m a long way from being able to do any fancy tricks and the other is I tried to stay away from using too many third-party modules.

Let’s first dig into the awsintegration.py module.  In the first few lines above I import the required modules which include AWS’s Boto3 library.

import json
import boto3
import logging

Python has a stellar standard logging module that makes logging to a centralized location across a package a breeze.  The line below configures modules called by the main package to inherit the logging configuration from the main package.  This way I was able to direct anything I wanted to log to the same log file.

log = logging.getLogger(__name__)

This next function uses Boto3 to call AWS Systems Manager Parameter Store to retrieve a secure string.  Be aware that if you’re using Parameter Store to store secure strings the security principal you’re using to make the call (in my case an IAM User via Cloud9) needs to have appropriate permissions to Parameter Store and the KMS CMK.  Notice I added a line here to log the call for the parameter to help debug any failures.  Using the parameter store with Boto3 is covered in detail here.

def get_parametersParameterStore(parameterName,region):
    log.info('Request %s from Parameter Store',parameterName)
    client = boto3.client('ssm', region_name=region)
    response = client.get_parameter(
        Name=parameterName,
        WithDecryption=True
    )
    return response['Parameter']['Value']

The last function in this module again uses Boto3 to upload the file to an Amazon S3 bucket with a specific prefix.  Using S3 is covered in detail here.

def put_s3(bucket,prefix,region,filename):
    s3 = boto3.client('s3', region_name=region)
    s3.upload_file(filename,bucket,prefix + "/" + filename)

Next up is the graphapi.py module.  In the first few lines I again import the necessary modules as well as the AuthenticationContext module from ADAL.  This module contains the AuthenticationContext class which is going to get the OAuth 2.0 access token needed to authenticate to the MS Graph API.

import json
import requests
import logging
from adal import AuthenticationContext

log = logging.getLogger(__name__)

In the function below an instance of the AuthenticationContext class is created and the acquire_token_with_client_credentials method is called.   It uses the OAuth 2.0 Client Credentials grant type which allows the script to access the MS Graph API without requiring a user context.  I’ve already gone ahead and provisioned and authorized the script with an identity in Azure AD and granted it the appropriate access scopes.

Behind the scenes Azure AD (authorization server in OAuth-speak) is contacted and the script (client in OAuth-speak) passes a unique client id and client secret.  The client id and client secret are used to authenticate the application to Azure AD which then looks within its directory to determine what resources the application is authorized to access (scope in OAuth-speak).  An access token is then returned from Azure AD which will be used in the next step.

def obtain_accesstoken(tenantname,clientid,clientsecret,resource):
    auth_context = AuthenticationContext('https://login.microsoftonline.com/' +
        tenantname)
    token = auth_context.acquire_token_with_client_credentials(
        resource=resource,client_id=clientid,
        client_secret=clientsecret)
    return token

A properly formatted header is created and the access token is included. The function checks to see if the q_param parameter has a value and it if does it passes it as a dictionary object to the Python Requests library which includes the key values as query strings. The request is then made to the appropriate endpoint. If the response code is anything but 200 an exception is raised, written to the log, and the script terminates.  Assuming a 200 is received the Python JSON library is used to parse the response.  The JSON content is searched for an attribute of @odata.nextLink which indicates the results have been paged.  The function handles it by looping until there are no longer any paged results.  It additionally combines the paged results into a single JSON array to make it easier to work with moving forward.

def makeapirequest(endpoint,token,q_param=None):
 
    headers = {'Content-Type':'application/json', \
    'Authorization':'Bearer {0}'.format(token['accessToken'])}

    log.info('Making request to %s...',endpoint)
        
    if q_param != None:
        response = requests.get(endpoint,headers=headers,params=q_param)
        print(response.url)
    else:
        response = requests.get(endpoint,headers=headers)    
    if response.status_code == 200:
        json_data = json.loads(response.text)
            
        if '@odata.nextLink' in json_data.keys():
            log.info('Paged result returned...')
            record = makeapirequest(json_data['@odata.nextLink'],token)
            entries = len(record['value'])
            count = 0
            while count < entries:
                json_data['value'].append(record['value'][count])
                count += 1
        return(json_data)
    else:
        raise Exception('Request failed with ',response.status_code,' - ',
            response.text)

Lastly there is main.py which stitches the script together.  The first section adds the modules we’ve already covered in addition to the argparse library which is used to handle arguments added to the execution of the script.

import json
import requests
import logging
import time
import graphapi
import awsintegration
from argparse import ArgumentParser

A simple configuration for the logging module is setup instructing it to write to the msapiquery.log using a level of INFO and applies a standard format.

logging.basicConfig(filename='msapiquery.log', level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

This chunk of code creates an instance of the ArgumentParser class and configures two arguments.  The sourcefile argument is used to designate the JSON parameters file which contains all the necessary information.

The parameters file is then opened and processed.  Note that the S3 parameters are only pulled in if the –s3 switch was used.

parser = ArgumentParser()
parser.add_argument('sourcefile', type=str, help='JSON file with parameters')
parser.add_argument('--s3', help='Write results to S3 bucket',action='store_true')
args = parser.parse_args()

try:
    with open(args.sourcefile) as json_data:
        d = json.load(json_data)
        tenantname = d['parameters']['tenantname']
        resource = d['parameters']['resource']
        endpoint = d['parameters']['endpoint']
        filename = d['parameters']['filename']
        aws_region = d['parameters']['aws_region']
        q_param = d['parameters']['q_param']
        clientid_param = d['parameters']['clientid_param']
        clientsecret_param = d['parameters']['clientsecret_param']
        if args.s3:
            bucket = d['parameters']['bucket']
            prefix = d['parameters']['prefix']

Next up the get_parametersParameterStore function from the awsintegration module is executed twice.  Once to get the client id and once to get the client secret.  Note that the get_parameters method for Boto3 Systems Manager client could have been used to get both of the parameters in a single call, but I didn’t go that route.

    logging.info('Attempting to contact Parameter Store...')
    clientid = awsintegration.get_parametersParameterStore(clientid_param,aws_region)
    clientsecret = awsintegration.get_parametersParameterStore(clientsecret_param,aws_region)

In these next four lines the access token is obtained by calling the obtain_accesstoken function and the request to the MS Graph API is made using the makeapirequest function.

    logging.info('Attempting to obtain an access token...')
    token = graphapi.obtain_accesstoken(tenantname,clientid,clientsecret,resource)

    logging.info('Attempting to query %s ...',endpoint)
    data = graphapi.makeapirequest(endpoint,token,q_param)

This section creates a string representing the current day, month, and year and prepends the filename that was supplied in the parameters file.  The file is then opened using the with statement.  If you’re familiar with the using statement from C# the with statement is similar in that it ensures resources are cleaned up after being used.

Before the data is written to file, I remove the @odata.nextLink key if it’s present.  This is totally optional and just something I did to pretty up the results.  The data is then written to the file as raw text by using the Python JSON encoder/decoder.

    logging.info('Attempting to write results to a file...')
    timestr = time.strftime("%Y-%m-%d")
    filename = timestr + '-' + filename
    with open(filename,'w') as f:
        
        ## If the data was paged remove the @odata.nextLink key
        ## to clean up the data before writing it to a file

        if '@odata.nextLink' in data.keys():
            del data['@odata.nextLink']
        f.write(json.dumps(data))

Finally, if the s3 argument was passed when the script was run, the put_s3 method from the awsintegration module is run and the file is uploaded to S3.

    logging.info('Attempting to write results to %s S3 bucket...',bucket)
    if args.s3:
        awsintegration.put_s3(bucket,prefix,aws_region,filename)

Exceptions thrown anywhere in the script are captured here written to the log file.  I played around a lot with a few different ways of handling exceptions and everything was so interdependent that if there was a failure it was best for the script to stop altogether and inform the user.  Naftali Harris has an amazing blog that walks through the many different ways of handling exceptions in Python and the various advantages and disadvantages.  It’s a great read.

except Exception as e:
    logging.error('Exception thrown: %s',e)
    print('Error running script.  Review the log file for more details')

So that’s what the code is.  Let’s take a quick look at the parameters file below.  It’s very straight forward.  Keep in mind both the bucket and prefix parameters are only required when using the –s3 option.  Here are some details on the other options:

  • The tenantname attribute is the DNS name of the Azure AD tenant being queries.
  • The resource attribute specifies the resource the access token will be used for.  If you’re going to be hitting the MS Graph API, more than likely it will be https://graph.microsoft.com
  • The endpoint attribute specifies the endpoint the request is being made to including any query strings you plan on using
  • The clientid_param and clientsecret_param attributes are the AWS Systems Manager Parameter Store parameter names that hold the client id and client secret the script was provisioned from Azure AD
  • The q_param attribute is an array of key value pairs intended to story OData query strings
  • The aws_region attribute is the region the S3 bucket and parameter store data is stored in
  • The filename attribute is the name you want to set for the file the script will produce
{
    "parameters":{
        "tenantname": "mytenant.com",
        "resource": "https://graph.microsoft.com",
        "endpoint": "https://graph.microsoft.com/beta/auditLogs/signIns",
        "clientid_param":"myclient_id",
        "clientsecret_param":"myclient_secret",
        "q_param":{"$filter":"createdDateTime gt 2019-01-09"},
        "aws_region":"us-east-1",
        "filename":"sign_in_logs.json",
        "bucket":"mybucket",
        "prefix":"myprefix"
    }
}

Now that the script has been covered, let’s see it action.  First I’m going to demonstrate how it handles paging by querying the MS Graph API endpoint to list out the users in the directory.  I’m going to append the $select query parameter and set it to return just the user’s id to make the output more simple and set the $top query parameter to one to limit the results to one user per page.  The endpoint looks like this https://graph.microsoft.com/beta/users?$top=1&select=id.

I’ll be running the script from an instance of Cloud9.  The IAM user I’m using with AWS has appropriate permissions to the S3 bucket, KMS CMK, and parameters in the parameter store.  I’ve set each of the parameters in the parameters file to the appropriate values for the environment I’ve configured.  I’ll additionally be using the –s3 option.

 

run_script.png

Once the script is complete it’s time to look at the log file that was created.  As seen below each step in the script to aid with debugging if something were to fail.  The log also indicates the results were paged.

log

The output is nicely formatted JSON that could be further transformed or fed into something like Amazon Athena for further analysis (future post maybe?).

json.png

Cool right?  My original use case was sign-in logs so let’s take a glance that.  Here I’m going to use an endpoint of https://graph.microsoft.com/beta/auditLogs/signIns with a OData filter option of createdDateTime gt 2019-01-08 which will limit the data returned to today’s sign-ins.

In the logs we see the script was successfully executed and included the filter specified.

graphapi_log_sign.png

The output is the raw JSON of the sign-ins over the past 24 hours.  For your entertainment purposes I’ve included one of the malicious sign-ins that was captured.  I SO can’t wait to examine this stuff in a BI tool.

sign_in_json

Well that’s it folks.  It may be ugly, but it works!  This was a fun activity to undertake as a first stab at making something useful in Python.  I especially enjoyed the lack of documentation available on this integration.  It really made me dive deep and learn things I probably wouldn’t have if there were a billion of examples out there.

I’ve pushed the code to Github so feel free to muck around with it to your hearts content.

Using Python to Pull Data from MS Graph API – Part 1

Welcome to 2019 fellow geeks! I hope each of you had a wonderful holiday with friends and family.

It’s been a few months since my last post. As some of you may be aware I made a career move last September and took on a new role with a different organization. The first few months have been like drinking from multiple fire hoses at once and I’ve learned a ton. It’s been an amazing experience that I’m excited to continue in 2019.

One area I’ve been putting some focus in is learning the basics of Python. I’ve been a PowerShell guy (with a bit of C# thrown in there) for the past six years so diving into a new language was a welcome change. I picked up a few books on the language, watched a few videos, and it wasn’t clicking. At that point I decided it was time to jump into the deep end and come up with a use case to build out a script for. Thankfully I had one queued up that I had started in PowerShell.

Early last year my wife’s Office 365 account was hacked. Thankfully no real damage was done minus some spam email that was sent out. I went through the wonderful process of changing her passwords across her accounts, improving the complexity and length, getting her on-boarded with a password management service, and enabling Azure MFA (Multi-factor Authentication) on her Office 365 account and any additional services she was using that supported MFA options.  It was not fun.

Curious of what the logs would have shown, I had begun putting together a PowerShell script that was going to pull down the logs from Azure AD (Active Directory), extract the relevant data, and export it CSV (comma-separate values) where I could play around with it in whatever analytics tool I could get my hands on. Unfortunately life happened and I never had a chance to finish the script or play with the data. This would be my use case for my first Python script.

Azure AD offers a few different types of logs which Microsoft divides into a security pillar and an activity pillar. For my use case I was interested in looking at the reports in the Activity pillar, specifically the Sign-ins report. This report is available for tenants with an Azure AD Premium P1 or P2 subscription (I added P2 subscriptions to our family accounts last year).  The sign-in logs have a retention period of 30 days and are available either through the Azure Portal or programmatically through the MS Graph API (Application Programming Interface).

My primary goals were to create as much reusable code as possible and experiment with as many APIs/SDKs (Software Development Kits) as I could.  This was accomplished by breaking the code into various reusable modules and leveraging AWS (Amazon Web Services) services for secure storage of Azure AD application credentials and cloud-based storage of the exported data.  Going this route forced me to use the MS Graph API, Microsoft’s Azure Active Directory Library for Python (or ADAL for short), and Amazon’s Boto3 Python SDK.

On the AWS side I used AWS Systems Manager Parameter Store to store the Azure AD credentials as secure strings encrypted with a AWS KMS (Key Management Service) customer-managed customer master key (CMK).  For cloud storage of the log files I used Amazon S3.

Lastly I needed a development environment and source control.  For about a day I simply used Sublime Text on my Mac and saved the file to a personal cloud storage account.  This was obviously not a great idea so I decided to finally get my GitHub repository up and running.  Additionally I moved over to using AWS’s Cloud9 for my IDE (integrated development environment).   Cloud9 has the wonderful perk of being web based and has the capability of creating temporary credentials that can do most of what my AWS IAM user can do.  This made it simple to handle permissions to the various resources I was using.

Once the instance of Cloud9 was spun up I needed to set the environment up for Python 3 and add the necessary libraries.  The AMI (Amazon Machine Image) used by the Cloud9 service to provision new instances includes both Python 2.7 and Python 3.6.  This fact matters when adding the ADAL and Boto3 modules via pip because if you simply run a pip install module_name it will be installed for Python 2.7.  Instead you’ll want to execute the command python3 -m pip install module_name which ensures that the two modules are installed in the appropriate location.

In my next post I’ll walk through and demonstrate the script.

Have a great week!

 

 

 

 

Integrating Azure AD and G-Suite – Automated Provisioning

Integrating Azure AD and G-Suite – Automated Provisioning

Today I’ll wrap up my series on Azure Active Directory’s (Azure AD) integration with Google’s G-Suite.  In my first entry I covered the single-sign on (SSO) integration and in my second and third posts I gave an overview of Google’s Cloud Platform (GCP) and demonstrated how to access a G-Suite domain’s resources through Google’s APIs.  In this post I’m going to cover how Microsoft provides automated provisioning of user, groups, and contacts .  If you haven’t read through my posts on Google’s API (part 1, part 2) take a read through so you’re more familiar with the concepts I’ll be covering throughout this post.

SSO using SAML or Open ID Connect is a common capability of most every cloud solutions these days.  While that solves the authentication problem, the provisioning of users, groups, and other identity-relates objects remains a challenge largely due to the lack of widely accepted standards (SCIM has a ways to go folks).  Vendors have a variety of workarounds including making LDAP calls back to a traditional on-premises directory (YUCK), supporting uploads of CSV files, or creating and updating identities in its local databases based upon the information contained in a SAML assertion or Open ID Connect id token.  A growing number of vendors are exposing these capabilities via a web-based API.  Google falls into this category and provides a robust selection of APIs to interact with its services from Gmail to resources within Google Cloud Platform, and yes even Google G-Suite.

If you’re a frequent user of Azure AD, you’ll have run into the automatic provisioning capabilities it brings to the table across a wide range of cloud services.  In a previous series I covered its provisioning capabilities with Amazon Web Services.  This is another use case where Microsoft leverages a third party’s robust API to simplify the identity management lifecycle.

In the SSO Quickstart Guide Microsoft provides for G-Suite it erroneously states:

“Google Apps supports auto provisioning, which is by default enabled. There is no action for you in this section. If a user doesn’t already exist in Google Apps Software, a new one is created when you attempt to access Google Apps Software.”

This simply isn’t true.  While auto provisioning via the API can be done, it is a feature you need to code to and isn’t enabled by default.  When you enable SSO to G-Suite and attempt to access it using an assertion containing the claim for a user that does not exist within a G-Suite domain you receive the error below.

google4int1

This establishes what we already knew in that identities representing our users attempting SSO to G-Suite need to be created before the users can authenticate.  Microsoft provides a Quickstart for auto provisioning into G-Suite.  The document does a good job telling you were to click and giving some basic advice but really lacks in the detail into what’s happening in the background and describing how it works.

Let’s take a deeper look shall we?

If you haven’t already, add the Google Apps application from the Azure AD Application Gallery.  Once the application is added navigate to the blade for the application and select the Provisioning page.  Switch the provisioning mode from manual to automatic.

google4int2.png

Right off the bat we see a big blue Authorize button which tells us that Microsoft is not using the service accounts pattern for accessing the Google API.  Google’s recommendation is to use the service account pattern when accessing project-based data rather than user specific data.  The argument can be made that G-Suite data doesn’t fall under project-based data and the service account credential doesn’t make sense.  Additionally using a service account would require granting the account domain-wide delegation for the G-Suite domain allowing the account to impersonate any user in the G-Suite domain.  Not really ideal, especially from an auditing perspective.

By using the Server-side Web Apps pattern a new user in G-Suite can be created and assigned as the “Azure AD account”. The downfall with of this means you’re stuck paying Google $10.00 a month for a non-human account. The price of good security practices I guess.

google4int3.png

Microsoft documentation states that the account must be granted the Super Admin role. I found this surprising since you’re effectively giving the account god rights to your G-Suite domain. It got me wondering what authorization scopes is Microsoft asking for? Let’s break out Fiddler and walk through the process that kicks off after clicking on the Authorization button.

A new window pops up from Google requesting me to authenticate. Here Azure AD, acting as the OAuth client, has made an authorization request and has sent me along with the request over to the Google which is acting as the authorization server to authenticate, consent to the access, and take the next step in the authorization flow.

google4int4

When I switch over to Fiddler I see a number of sessions have been captured.  Opening the WebForms window of the first session to accounts.google.com a number of parameters that were passed to Google.

google4int5

The first parameter gives us the three authorization scopes Azure AD is looking for.  The admin.directory.group and admin.directory.user are scopes are both related to the Google Directory API, which makes sense if it wants to manage users and groups.  The /m8/feeds scope grants it access to manage contacts via the Google Contacts API.  This is an older API that uses XML instead of JSON to exchange information and looks like it has been/is being replaced by the Google People API.

Management of contacts via this API is where the requirement for an account in the Super Admin role originates.  Google documentation states that management of domain shared contacts via the /m8/feeds API requires an administrator username and password for Google Apps.  I couldn’t find any privilege in G-Suite which could be added to a custom Admin role that mentioned contacts.  Given Google’s own documentation along the lack of an obvious privilege option, this may be a hard limitation of G-Suite.  Too bad too because there are options for both Users and Groups.  Either way, the request for this authorization scope drives the requirement for Super Admin for the account Azure AD will be using for delegated access.

The redirect_uri is the where Google sends the user after the authorization request is complete.  The response_type tells us Azure AD and Google are using the OAuth authorization code grant type flow.  The client_id is the unique identifier Google has assigned to Azure AD in whatever project Microsoft has it built in.  The approval_prompt setting of force tells Google to display the consent window and the data Azure AD wants to access.  Lastly, the access_type setting of offline allows Azure AD to access the APIs without the user being available to authenticate via a refresh token which will be issued along with the access token.  Let’s pay attention to that one once the consent screen pops up.

I plug in valid super user credentials to my G-Suite domain and authenticate and receive the warning below.  This indicates that Microsoft has been naughty and hasn’t had their application reviewed by Google.  This was made a requirement back in July of 2017… so yeah… Microsoft maybe get on that?

google4int6.png

To progress to the consent screen I hit the Advanced link in the lower left and opt to continue.  The consent window then pops up.

google4int7.png

Here I see that Microsoft has registered their application with a friendly name of azure.com.  I’m also shown the scopes that the application wants to access which jive with the authorization scopes we saw in Fiddler.  Remember that offline access Microsoft asked for?  See it mentioned anywhere in this consent page that I’m delegating this access to Microsoft perpetually as long as they ask for a refresh token?  This is one of my problems with OAuth and consent windows like this.  It’s entirely too vague as to how long I’m granting the application access to my data or to do things as me.  Expect to see this OAuth consent attacks continue to grow in in use moving forward.  Why worry about compromising the user’s credentials when I can display a vague consent window and have them grant me access directly to their data?  Totally safe.

Hopping back to the window, I click the Allow button and the consent window closes.  Looking back at Fiddler I see that I received back an authorization code and posted it back to the reply_uri designated in the original authorization request.

google4int8.png

Switching back to the browser window for the Azure Portal the screen updates and the Test Connection button becomes available.  Clicking the button initiates a quick check where Azure AD obtains an access token for the scopes it requires unseen to the user.  After the successful test I hit the Save button.

google4int9.png

Switching to the browser window for the Google Admin Portal let’s take a look at the data that’s been updated for the user I used to authorize Microsoft its access.  For that I select the user, go to the Security section and I now see that the Azure Active Directory service is authorized to the contacts, user, and group management scopes.

google4int10.png

Switching back to the browser window for the Azure Portal I see some additional options are now available.

google4int11.png

The mappings are really interesting and will look familiar to you if you’ve ever done anything with an identity management tool like Microsoft Identity Manager (MIM) or even Azure AD Sync.  The user mappings for example show which attributes in Azure AD are used to populate the attributes in G-Suite.

google4int12.png

The attributes that have the Delete button grayed out are required by Google in order to provision new user accounts in a G-Suite domain.  The options available for deletion are additional data beyond what is required that Microsoft can populate on user accounts it provisions into G-Suite.  Selecting the Show advanced options button, allow you to play with the schema Microsoft is using for G-Suite.   What I found interesting about this schema is it doesn’t match the resource representation Google provides for the API.  It would have been nice to match the two to make it more consumable, but they’re probably working off values used in the old Google Provisioning API or they don’t envision many people being nerdy enough to poke around the schema.

Next up I move toggle the provisioning status from Off to On and leave the Scope option set to sync only the assigned users and groups.

google4int13.png

I then hit the Save button to save the new settings and after a minute my initial synchronization is successful.  Now nothing was synchronized, but it shows the credentials correctly allowed Azure AD to hit my G-Suite domain over the appropriate APIs with the appropriate access.

google4int14.png

So an empty synchronization works, how about one with a user?  I created a new user named dutch.schaefer@geekintheweeds.com with only the required attributes of display name and user principal name populated, assigned the new user to the Google Apps application and give Azure AD a night to run another sync.  Earlier tonight I checked the provisioning summary and verified the sync grabbed the new user.

google4int15.png

Review of the audit logs for the Google Apps application shows that the new user was exported around 11PM EST last night.  If you’re curious the synch between Azure AD and G-Suite occurs about every 20 minutes.

google4int16.png

Notice that the FamilyName and GivenName attributes are set to a period.  I never set the first or last name attributes of the user in Azure AD, so both attributes are blank.  If we bounce back to the attribute mapping and look at the attributes for Google Apps, we see that FamilyName and GivenName are both required meaning Azure AD had to populate them with something.  Different schemas, different requirements.

google4int17

Switching over to the Google Admin Console I see that the new user was successfully provisioned into G-Suite.

google4int18.png

Pretty neat overall.  Let’s take a look at what we learned:

  • Azure AD supports single sign-on to G-Suite via SAML using a service provider-initiated flow where Azure AD acts as the identity provider and G-Suite acts as the service provider.
  • A user object with a login id matching the user’s login id in Azure Active Directory must be created in G-Suite before single sign-on will work.
  • Google provides a number of libraries for its API and the Google API Explorer should be used for experimentation with Google’s APIs.
  • Google’s Directory API is used by Azure AD to provision users and groups into a G-Suite domain.
  • Google’s Contacts API is used by Azure AD to provision contacts into a G-Suite domain.
  • A user holding the Super Admin role in the G-Suite domain must be used to authorize Azure AD to perform provisioning activities.  The Super Admin role is required due to the usage of the Google Contact API.
  • Azure AD’s authorization request includes offline access using refresh tokens to request additional access tokens to ensure the sync process can be run on a regular basis without requiring re-authorization.
  • Best practice is to dedicate a user account in your G-Suite domain to Azure AD.
  • Azure AD uses the Server-side Web pattern for accessing Google’s APIs.
  • The provisioning process will populate a period for any attribute that is required in G-Suite but does not have a value in the corresponding attribute in Azure AD.
  • The provisioning process runs a sync every 20 minutes.

Even though my coding is horrendous, I absolutely loved experimenting with the Google API.  It’s easy to realize why APIs are becoming so critical to a good solution.  With the increased usage of a wide variety of products in a business, being able to plug and play applications is a must.  The provisioning aspect Azure AD demonstrates here is a great example of the opportunities provided when critical functionality is exposed for programmatic access.

I hope you enjoyed the series, learned a bit more about both solutions, and got some insight into what’s going on behind the scenes.

 

Integrating Azure AD and G-Suite – Google API Integration Part 2

Integrating Azure AD and G-Suite – Google API Integration Part 2

Welcome back.

We’ve had an interesting journey so far in this series exploring the Azure Active Directory (Azure AD) and Google G-Suite (G-Suite) integration.  In the first entry I covered how the single sign-on integration works.  The second entry covered explored the process of registering an application with the Google Cloud Platform (GCP) for API access and GCP’s relationship to G-Suite.  Today I demonstrate interaction with Google’s API through the use of some very basic .NET console applications I’ve put together.

As you’ll recall from the second entry, I created a new project with GCP and created a service account identity for my sample application and the Google Admin API has been enabled for the project.  The application has been granted domain-wide delegation rights to my G-Suite domain at geekintheweeds.com.  Finally, the application’s client ID added to my G-Suite domain and has been granted access to the https://www.googleapis.com/auth/admin.directory.user.readonly scope.  The application has an identity, credentials to authenticate, and has been authorized with appropriate access.  Let’s get to some coding (or the mess of garbled logic that accounts for my coding ability 🙂 ).

Google provides API client libraries for a number of languages.  For the purposes of this demonstration, I’ll be using the .NET client library.  Play it smart and leverage the libraries vendors provide for ease of integration as well as avoiding critical mistakes that could impact the security of your API calls.  Now using libraries doesn’t mean you get to ignore what goes on behind the curtains.  It’s critical to understand what the libraries are doing to ensure they’re doing things securely and for the purposes of working out bugs.  APIs in the cloud are changing on a daily basis, don’t count on the libraries keeping up with the vendor pace.  The last thing you want to run into is an situation where your application kicks the bucket due to an API change that isn’t reflected in the library and you’re stuck with a broken production application.

With that lecture out of the way, let’s get into the code for the demo application.

My first step is to open a new project in Visual Studio creating a Visual C# Console App using the .NET Framework.  Before I jump into using Google’s libraries I’m going to do the exact opposite of what I advised you to do above.  Yes folks, I’m going to make a call to the API without using Google’s libraries for the purposes of demonstrating the steps involved in acquiring an access token and sending that access token to the API to retrieve data.  I’ll then demonstrate how much simpler it is to use the library so you’ll appreciate the value of them.

As I explained in my last post, Google APIs use OAuth 2.0 for delegated access to data.  Google does a good job explaining the basic steps in obtaining an access token in this article so I won’t go too deeply into detail.  Instead I’m going to walk through putting these steps into code.  Fair warning, this code is meant for demo purposes only.  This means I’m not going to do any data validation, exception catching, or exercise secure coding practices.  Please please please don’t use my terrible coding practices in any real applications you develop.

Ready?  Let’s begin.

The article above first instructs you to obtain OAuth 2.0 credential which I did in my last entry.  The second step is to obtain an access token from the Google Authorization Server.  I’ll be following the instructions for the OAuth 2.0 for Service Accounts since I’m building a console application for simplicity purposes.

The first thing we need to is collect a few pieces of information.  In the service account scenario we need to know the unique identifier of the service account and the scopes we want to access.  For that I pop open a browser and go to the my project in the console for GCP.  From there I navigate to the APIs & Services section and select the credentials option.

google2int1

On the Credentials page I click on my service account to bring up the relevant information.  The service account field contains the unique identifier (or email address as Google refers to it) of the service account I setup.

google2int2.pngThis identifier is one of the pieces of information I’ll need to include in the JSON Web Token (JWT) I’m going to send to Google’s API to obtain an access token.  In addition to the service account identifier, I also need the scopes I want to access, Google Authorization Server endpoint, G-Suite user I’m impersonating, the expiration time I want for the access token (maximum of one hour), and the time the assertion was issued.

google2int3

Now that I’ve collected the information I need for my claim set, I need to grab the private key from the PKCS#12 (.p12) certificate I was issued when I setup my service account.

google2int4.pngAt this point I have all the components I need to assemble my JWT.  For that I’m going to add the jose-jwt library to my application and leverage it to simplify the creation of the JWT.

google2int5.png

Once the library is added I create the JWT.  Google requires it include a base64 encoded header, base64 encoded claim set, and the signature providing my application’s identity and ensuring the integrity of the JWT.  The signature is creating using the RSA-SHA256 signing algorithm, which is the only algorithm Google supports at this time.

google2int6.png
I quickly debug the app, dump the token variable to a string, copy it into Fiddler’s Text Wizard and Base64 decode it, we see the JWT we’ll be passing to Google.

google2int7.pngI now have my JWT assembled and am ready to deliver to Google for an access token request.  The next step is to submit it to Google’s authorization server.  For that I need to a do a few things.  The first thing I need to is create a new class that will act as the data model for JSON response Google will return to me after I submit my authorization request.  I’ll deserialize the JWT I receive back using this data model.

google2int8.png

Next I create a new task that will accept a base web address, uri, and a JWT.  The task then uses an instance of the HttpClient class to post it to Google’s authorization server and to capture the JSON response.

google2int9I then call the new task, pass the appropriate parameters, deserialize the JSON response using the data model I created earlier, and dump the bearer access token into a string variable.

google2int10So I have my bearer access token that allows me to impersonate the user within my G-Suite domain for the purposes of hitting the Google Directory API. I whip up another task that will deliver the bearer access token to the Google Directory API and the JSON response which will include details about the user.

google2int11Finally I call the task and dump the JSON response to the console.

google2int12.pngThe results look like the below.

google2int13.png

Victory!  Whew a lot of work there to pull a small amount of data from an API.   I had to create models for the data (you’ll notice I got lazy at the end and didn’t create a user model), properly secure the JWT for the access token request, manage my own http clients, and handle the JSON responses.  It ended up being a fair amount of code for a relatively simple process.

How much easier is it using Google’s .NET library?  Let’s take a look at it.

In the code below I create a service account credential which handles the process of creating the appropriate JWT access token.

google2int14.png

Next up I create a new instance of the DirectoryService class which will represent the connection to the Google Directory API.  This class the assembly of the eventual POST to obtain the bearer access token.

google2int15.png

Last but not least I submit a request for user information for marge.simpson@geekinthweeds.com using the instance of the DirectoryService class I created and placing the JSON response into instance of the User class included in the Directory API which will serve as the data model for the response.

google2int16
Just a tad bit easier using the library right? If we break out Fiddler we see that four sessions have been created.

google2int17Session 1 and 2 display the submission of the JWT and acquisition of the bearer token.

google2int18.png

Sessions 3 and 4 display the delivery of the bearer token to the API and the JSON response received back.

google2int19.png

What did we learn from these past two posts (besides the fact I’m a terrible developer)?  We saw programmatic access to G-Suite is very dependent on foundational components of GCP.  Application identities are provisioned in the Google’s IAM Service instance associated with the GCP Project.  The appropriate APIs must be enabled for the GCP project that the application needs to use.  The application must then be granted access to the appropriate authorization scopes within the G-Suite domain and Google provides an option for the application to impersonate users within the G-Suite domain rather than relying upon the user’s consent.

My hope is the biggest lesson you took from these two posts is how valuable it is to understand the inner workings of how a vendor leverages a technology.  Vendor-provided libraries are wonderful and make our lives far easier and our code potentially more simple and secure, but nothing is ever more powerful than understanding the inner workings of the foundational technology.  The knowledge you take at the technology level will translate across all vendors you encounter making grasping that new product all that much easier.  Take your time, dig deep into the weeds, and enjoy the journey into the technology.

See you in the next post where I’ll wrap up this series and break down the Microsoft Azure AD identity provisioning integration with G-Suite.  Have a great week!

 

 

Integrating Azure AD and G-Suite – Google API Integration Part 1

Integrating Azure AD and G-Suite – Google API Integration Part 1

Hi everyone,

Welcome to the second post in my series on the integration between Azure Active Directory (Azure AD) and Google’s G-Suite (formally named Google Apps).  In my first entry I covered the single sign-on (SSO) integration between the two solutions.  This included a brief walkthrough of the configuration and an explanation of how the SAML protocol is used by both solutions to accomplish the SSO user experience.  I encourage you to read through that post before you jump into this.

So we have single sign-on between Azure AD and G-Suite, but do we still need to provision the users and group into G-Suite?  Thanks to Google’s Directory Application Programming Interface (API) and Azure Active Directory’s (Azure AD) integration with it, we can get automatic provisioning into G-Suite .  Before I cover how that integration works, let’s take a deeper look at Google’s Cloud Platform (GCP) and its API.

Like many of the modern APIs out there today, Google’s API is web-based and robust. It was built on Google’s JavaScript Object Notation (JSON)-based API infrastructure and uses Open Authorization 2.0 (OAuth 2.0) to allow for delegated access to an entities resources stored in Google. It’s nice to see vendors like Microsoft and Google leveraging standard protocols for interaction with the APIs unlike some vendors… *cough* Amazon *cough*. Google provides software development kits (SDKs) and shared libraries for a variety of languages.

Let’s take a look at the API Explorer.  The API Explorer is a great way to play around with the API without the need to write any code and to get an idea of the inputs and outputs of specific API calls.  I’m first going to do something very basic and retrieve a listing of users in my G-Suite directory.  Once I access the API explorer I hit the All Versions menu item and select the Admin Directory API.

google1int1

On the next screen I navigate down to the directory.users.list method and select it.  On the screen that follows I’m provided with a variety of input fields.  The data I input into these fields will affect what data is returned to me from the API. I put the domain name associated with my G-Suite subscription and hit the Authorize and Execute button.  A new window pops up which allows me to configure which scope of access I want to grant the API Explorer.  I’m going to give it just the scope of https://www.googleapis.com/auth/admin.directory.user.readonly.

google1int2

I then hit the Authorize and Execute button and I’m prompted to authenticate to Google and delegate API Explorer to access data I have permission to access in my G-Suite description.   Here I plug in the username and password for a standard user who isn’t assigned to any G-Suite Admin Roles.

google1int3

After successfully authenticating, I’m then prompted for consent to delegate API Explorer to view the users configured in the user’s G-Suite directory.

google1int4

I hit the allow button, the request for delegated access is complete, and a listing of users within my G-Suite directory are returned in JSON format.

google1int5

Easy right?  How about we step it up a notch and create a new user.  For that operation I’ll be delegating access to API Explorer using an account which has been granted the G-Suite User Management Admin role.  I navigate back to the main list of methods and choose the directory.users.insert method.  I then plug in the required values and hit the Authorize and Execute button.  The scopes menu pops up and I choose the https://www.googleapis.com/auth/admin.directory.user scope to allow for provisioning of the user and then hit the Authorize and Execute button.  The request is made and a successful response is returned.

google1int6

Navigating back to G-Suite and looking at the listing of users shows the new user Marge Simpson as appearing as created.

google1int7

Now that we’ve seen some simple samples using API Explorer let’s talk a bit about how you go about registering an application to interact with Google’s API as well as covering some basic Google Cloud Platform (GCP) concepts.

First thing I’m going to do is navigate to Google’s Getting Started page and create a new project.  So what is a project?  This took a bit of reading on my part because my prior experience with GCP was non-existent.  Think of a Google Project like an Amazon Web Services (AWS) account or a Microsoft Azure Subscription.  It acts as a container for billing, reporting, and organization of GCP resources.  Projects can be associated with a Google Cloud Organization (similar to how multiple Azure subscriptions can be associated with a single Microsoft Azure Active Directory (Azure AD) tenant) which is a resource available for a G-Suite subscription or Google Cloud Identity resource.  The picture below shows the organization associated with my G-Suite subscription.

google1int8

Now that we have the concepts out of the way, let’s get back to the demo. Back at the Getting Started page, I click the Create a new project button and authenticate as the super admin for my G-Suite subscription. I’ll explain why I’m using a super admin later. On the next screen I name the project JOG-NET-CONSOLE and hit the next button.

google1int9

The next screen prompts me to provide a name which will be displayed to the user when the user is prompted for consent in the instance I decide to use an OAuth flow which requires user consent.

google1int10

Next up I’m prompted to specify what type of application I’m integrated with Google.  For this demonstration I’ll be creating a simple console app, so I’m going to choose Web Browser simply to move forward.  I plug in a random unique value and click Create.

google1int11

After creation is successful, I’m prompted to download the client configuration and provided with my Client ID and Client Secret. The configuration file is in JSON format that provides information about the client’s registration and information on authorization server (Google’s) OAuth endpoints. This file can be consumed directly by the Google API libraries when obtaining credentials if you’re going that route.

google1int12

For the demo application I’m building I’ll be using the service account scenario often used for server-to-server interactions. This scenario leverages the OAuth 2.0 Client Credentials Authorization Grant flow. No user consent is required for this scenario because the intention of the service account scenario is it to access its own data. Google also provides the capability for the service account to be delegated the right to impersonate users within a G-Suite subscription. I’ll be using that capability for this demonstration.

Back to the demo…

Now that my application is registered, I now need to generate credentials I can use for the service account scenario. For that I navigate to the Google API Console. After successfully authenticating, I’ll be brought to the dashboard for the application project I created in the previous steps. On this page I’ll click the Credentials menu item.

google1int13

The credentials screen displays the client IDs associated with the JOG-NET-CONSOLE project.  Here we see the client ID I received in the JSON file as well as a default one Google generated when I created the project.

google1int14

Next up I click the Create Credentials button and select the Service Account key option.  On the Create service account key page I provide a unique name for the service account of Jog Directory Access.

The Role drop down box relates to the new roles that were introduced with Google’s Cloud IAM.

You can think of Google’s Cloud IAM as Google’s version of Amazon Web Services (AWS) IAM  or Microsoft’s Azure Active Directory in how the instance is related to the project which is used to manage the GCP resources.  When a new service account is created a new security principal representing the non-human identity is created in the Google Cloud IAM instance backing the project.

Since my application won’t be interacting with GCP resources, I’ll choose the random role of Logs Viewer.  When I filled in the service account name the service account ID field was automatically populated for me with a value.  The service account ID is unique to the project and represents the security principal for the application.   I choose the option to download the private key as a PKCS12 file because I’ll be using the System.Security.Cryptography.X509Certificates namespace within my application later on.  Finally I click the create button and download the PKCS12 file.

google1int15

The new service account now shows in the credential page.

google1int16

 

Navigating to the IAM & Admin dashboard now shows the application as a security principal within the project.

google1int17

I now need to enable the APIs in my project that I want my applications to access.  For this I navigate to the API & Services dashboard and click the Enable APIs and Services link.

google1int23.png

On the next page I use “admin” as my search term, select the Admin SDK and click the Enable button.  The API is now enabled for applications within the project.

From here I navigate down to the Service accounts page and edit the newly created service account.

google1int19

At this point I’ve created a new project in GCP, created a service account that will represent the demo application, and have given that application the right to impersonate users in my G-Suite directory.  I now need to authorize the application to access the G-Suite’s data via Google’s API.  For that I switch over to the G-Suite Admin Console and authenticate as a super admin and access the Security dashboard.  From there I hit the Advanced Setting option and click the Manage API client access link.

google1int20

On the Manage API client access page I add a new entry using the client ID I pulled previously and granting the application access to the  https://www.googleapis.com/auth/admin.directory.user.readonly scope.  This allows the application to impersonate a user to pull a listing of users from the G-Suite directory.

google1int21

Whew, a lot of new concepts to digest in this entry so I’ll save the review of the application for the next entry.  Here’s a consumable diagram I put together showing the relationship between GCP Projects, G-Suite, and a GCP Organization.  The G-Suite domain acts as a link to the GCP projects.  The G-Suite users can setup GCP projects and have a stub identity (see my first entry LINK) provisioned in the project.  When an service account is created in a project and granted G-Suite Domain-wide Delegation, we use the Client ID associated with the service account to establish an identity for the app in the G-Suite domain which is associated with a scope of authorized access.

google1int22

In this post I covered some basic GCP concepts and saw that the concepts are very similar to both Microsoft and AWS.  I also covered the process to create a service account in GCP and how all the pieces come together to programmatic access to G-Suite resources.  In my next entry I’ll demo some simple .NET applications and walk through the code.

Have a great weekend and go Pats!