Setting up a Python Coding Environment

Welcome back folks!

Like many of my fellow veteran men and women in tech, I’ve been putting in the effort to evolve my skill set and embrace the industry’s shift to a more code-focused world. Those of us who came from the “rack and stack” generation did some scripting here and there where it was workable, using VB, Bash, Batch, Perl, or the many other languages that have had their time in the limelight. The concept of a development lifecycle and code repository typically consisted of a few permanently open Notepad instances or, if you were really fancy, scripts saved to a file share with files labeled v1, v2, and so on. Times have changed, and we must change with them.

Over the past two years I’ve done significantly more coding.  These efforts ranged from creating infrastructure using Microsoft ARM (Azure Resource Manager) and AWS CloudFormation templates to embracing serverless with Azure Functions and AWS Lambdas.  Through this process I’ve quickly realized that the toolsets available to manage code and its lifecycle have evolved and gotten more accessible to us “non-developers”.

I’m confident there are others like me out there who are coming from a similar background, and I wanted to put together a post that might help them begin or move forward with their own journeys. So for this post I’m going to cover how to set up a Visual Studio Code environment on a Mac for developing code using Python.

With the introduction done, let’s get to it!

First up, you’ll need to get Python installed. The Windows installer is pretty straightforward and can be downloaded here. Macs are a bit trickier because OS X ships with Python 2.7 by default. You can validate this by running python --version from the terminal. This means you’ll need to install Python 3.7 alongside it. Thankfully the process is heavily documented by others who are far more knowledgeable than me; William Vincent has some wonderful instructions.
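
With two interpreters on the same Mac, it is easy to lose track of which one a command actually launches. The short sketch below uses only the standard library's sys module and can be run with either python or python3 to find out. The 3.7 floor mirrors this post, and the check_python helper name is just for illustration.

```python
import sys

# Minimum version assumed for this walkthrough (Python 3.7, per this post).
REQUIRED = (3, 7)

def check_python(required=REQUIRED):
    """Return True if the running interpreter is at least the required (major, minor)."""
    return sys.version_info[:2] >= required

if __name__ == "__main__":
    print(sys.executable)          # full path of the interpreter that is running
    print(sys.version.split()[0])  # the version string, e.g. a 2.7.x or 3.7.x release
    if not check_python():
        sys.exit("Python 3.7+ required; try running this with python3 instead")
```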

Once Python 3.7 is installed, we’ll want to set up our IDE (integrated development environment). I’m partial to VSC (Visual Studio Code) because it’s free, cross-platform, and simple to use. Installation is straightforward, so I won’t be covering those steps.

Well, you have your interpreter and your IDE, but you need a good solution to store and track changes to the code you’re going to put together. Gone are the days of managing it by saving copies to your desktop (if you even got that far); these are the days of Git. You can roll your own Git service or use a managed service. Since I’m a newbie, I’ve opted to go mainstream and simple with GitHub. A free account should more than suffice unless you’re planning on doing something that requires a ton of collaboration.

Now that your account is set up, let’s go through the process of creating a simple Python script, creating a new repository, committing the code, and pushing it up to GitHub. We’ll first want to create a new workspace in VSC. One of the benefits of a workspace is that you can configure settings on a per-project basis rather than modifying the settings of VSC as a whole.

To do this open VSC and create a new empty file using the New file shortcut as seen below.

[Screenshot: the New File shortcut in VSC]

Once the new window is open, choose Save Workspace As from the File menu. Create a new directory for the project (I’ll refer to this as the project directory) and save the workspace to that folder. Then create a subfolder under the project directory (I’ll refer to this as the working directory).

We’ll now want to initialize the local repository. Use the shortcut Command+Shift+P to open the command palette in VSC. Search for Git, choose Git: Initialize Repository, and select the working directory. You’ll be prompted to add the folder to the workspace, which you’ll want to do.

[Screenshot: Git: Initialize Repository in the command palette]

VSC will begin tracking changes to files you put in the folder and the Source Control icon will now be active.

[Screenshot: the Source Control icon in VSC]

Let’s now save the new file we created as hello-world.py. The .py extension tells VSC that this is Python code, which gives you a number of benefits such as IntelliSense. If you navigate back to Source Control, you’ll see there are uncommitted changes from the new hello-world.py file. Let’s add the classic line of code to print Hello World. To execute the code, choose the Start Without Debugging option from the Debug menu.
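
For reference, hello-world.py can be as short as a single print("Hello World") line. The version below wraps that line in a main() function behind the common if __name__ == "__main__": guard. The guard is an optional Python idiom, not something VSC requires.

```python
# hello-world.py: the classic first program.
def main() -> None:
    print("Hello World")

# Runs only when the file is executed directly (e.g. via Start Without Debugging),
# not when it is imported from another module.
if __name__ == "__main__":
    main()
```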

[Screenshot: running hello-world.py with Start Without Debugging]

The built-in Python libraries will serve you well, but there are a TON of great third-party libraries out there that you’ll almost certainly want to use. Wouldn’t it be wonderful if you could have separate instances of the interpreter, each with its own specific libraries? In comes the awesomeness of virtual environments. Using them isn’t required, but it is best practice in the Python world and will make your life a lot easier.

Creating a new virtual environment is easy.

  1. Open a new terminal in Visual Studio Code, navigate to your working directory, and create a new folder named envs.
  2. Create the new virtual environment using the command below.
    python3 -m venv ./envs
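
The command above runs the standard library's venv module. If you ever want to script that step, the same module exposes venv.EnvBuilder. The create_env helper below is a hypothetical name, and it is only a rough equivalent of the command. For example, the command-line tool prefers symlinks on macOS, while EnvBuilder copies files by default.

```python
import venv
from pathlib import Path

def create_env(env_dir: str = "envs", with_pip: bool = True) -> Path:
    """Create a virtual environment, roughly what `python3 -m venv ./envs` does."""
    path = Path(env_dir)
    # EnvBuilder writes pyvenv.cfg plus bin/ (Scripts\ on Windows) into env_dir;
    # with_pip=True also bootstraps pip into the environment via ensurepip.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(path)
    return path
```

Calling create_env() from the working directory produces the same envs folder as the command in step 2.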

You’ll now be able to select the virtual environment for use in the bottom left-hand corner of VSC, as seen below.

[Screenshot: selecting the virtual environment interpreter in VSC]

After you select it, close the terminal window and open a new one in VSC by selecting New Terminal from the Terminal menu. You’ll notice the source command is run to activate the virtual environment. You can now add new libraries using pip (Python’s package manager) as needed, and they will be installed into the virtual environment you created.
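
A quick way to confirm the terminal really is using the virtual environment is to compare sys.prefix with sys.base_prefix. Inside a venv, the first points at the environment and the second still points at the base installation. The sketch below assumes an environment created with the standard venv module, as above, and the in_virtualenv name is illustrative. Installing with python -m pip install <package> also guarantees that the package lands in whichever interpreter python resolves to.

```python
import sys

def in_virtualenv() -> bool:
    """True when running inside a venv: sys.prefix points at the environment
    while sys.base_prefix still points at the base Python installation."""
    return sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```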

If you go back to the Source Control view, you’ll notice there are a whole bunch of new files. Git is trying to track all of the files within the virtual environment. You’ll want Git to ignore them by creating a file named .gitignore in the working directory. Within the file we’ll add two entries, one for the ignore file itself and one for the virtual environment directory (plus a few others if you have hidden files like the Mac’s .DS_Store).
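
If you set up projects often, the .gitignore step is easy to script. The sketch below appends the entries described above to .gitignore only when they are missing, so running it twice does no harm. The default entry list follows this post: the ignore file itself, envs/, and .DS_Store. Many projects commit their .gitignore instead of ignoring it, so treat the defaults as adjustable. The ensure_gitignore name is hypothetical.

```python
from pathlib import Path

# Entries from the post: the ignore file itself, the envs virtual environment
# folder, and macOS Finder metadata. Adjust to taste.
DEFAULT_ENTRIES = (".gitignore", "envs/", ".DS_Store")

def ensure_gitignore(repo_dir: str, entries=DEFAULT_ENTRIES) -> Path:
    """Append any missing entries to <repo_dir>/.gitignore, creating the file if needed."""
    path = Path(repo_dir) / ".gitignore"
    text = path.read_text() if path.exists() else ""
    existing = text.splitlines()
    missing = [entry for entry in entries if entry not in existing]
    if missing:
        # Start on a fresh line if the existing file lacks a trailing newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        path.write_text(text + prefix + "\n".join(missing) + "\n")
    return path
```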

[Screenshot: the .gitignore file in VSC]

Let’s now commit the new file hello-world.py to the local repository.  Accompanying the changes, you’ll also add a message about what has changed in the code.  There is a whole art around good commit messages which you can research on the web.  Most of my stuff is done solo, so it’s simple short messages to remind me of what I’ve done.   You can make your Git workflows more sophisticated as outlined here, but for very basic development purposes a straight commit to the master works.
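
Behind the scenes, the Source Control view runs ordinary Git commands. If you'd rather script the same commit, here is a minimal sketch that shells out to the git CLI using Python's subprocess module. It assumes git is on your PATH and that your user.name and user.email are configured. The git and commit_file helpers are hypothetical names for illustration.

```python
import subprocess
from pathlib import Path

def git(repo: Path, *args: str) -> str:
    """Run a git command inside repo and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(["git", *args], cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout

def commit_file(repo: Path, filename: str, message: str) -> None:
    """Stage a single file and commit it with a message, as the Source Control view does."""
    git(repo, "add", filename)
    git(repo, "commit", "-m", message)
```

For example, commit_file(Path("."), "hello-world.py", "Add hello world script") commits the script from the working directory.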

Now that we have the changes committed to our local repository, let’s push them up to a new remote repository on GitHub. First you’ll want to create an empty repository on GitHub. To add data to the repo, you’ll need to authenticate. I’ve enabled two-factor authentication on my GitHub account, which Visual Studio Code doesn’t appear to support at this time. To work around the limitation, you can create personal access tokens. It’s not a great solution, but it will suffice as long as you practice good key management: create the tokens with a limited authorization scope and limit their lifetime.

Once your repository is set up and you’ve created your access token, you can push to the remote repository. In Visual Studio Code, press Command+Shift+P to open the command palette and run the Git: Add Remote command. Provide a name for the remote (I simply used origin, which is the conventional name) and the URL of your repository. When you push (for example, with Git: Push), you’ll be prompted to authenticate. Provide your GitHub username and use the personal access token as the password. Your changes will then be pushed to the repository.
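
The same two steps map to git remote add and git push on the command line. Below is a minimal sketch using subprocess. It assumes git is installed, and the repository URL in the usage note is a placeholder. When git push talks to GitHub over HTTPS, Git asks for credentials (or consults a credential helper), which is where the token goes. Avoid pasting the token into the remote URL, since it would then sit in plain text in .git/config. The helper names are hypothetical.

```python
import subprocess

def run_git(repo: str, *args: str) -> str:
    """Run a git command inside repo and return stdout; raises CalledProcessError on failure."""
    result = subprocess.run(["git", *args], cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout

def add_remote(repo: str, url: str, name: str = "origin") -> None:
    """Equivalent of the Git: Add Remote command: register url under the given name."""
    run_git(repo, "remote", "add", name, url)

def push(repo: str, name: str = "origin", branch: str = "master") -> None:
    """Push branch and record it as the upstream (-u) so later pushes can be a plain `git push`."""
    # Output is not captured, so any credential prompt and progress stay visible.
    subprocess.run(["git", "push", "-u", name, branch], cwd=repo, check=True)
```

For example: add_remote(".", "https://github.com/<your-username>/<your-repo>.git") followed by push(".").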

There you have it folks!  I’m sure there are better ways to orchestrate this process, but this is what’s working for me.  If you have alternative methods and shortcuts, I’d love to hear about them.

Have a great week!

Passing the AZ-300

Hello all!

Over the past year I’ve been buried in Amazon Web Services (AWS), learning the platform, and working through the certification paths. As part of my new role at Microsoft, I’ve been given the opportunity to pursue the Microsoft Certified: Azure Solutions Architect Expert. In the world of multi-cloud, who doesn’t want to learn multiple platforms? 🙂

The Microsoft Certified: Azure Solutions Architect Expert certification is part of Microsoft’s new set of certifications.  If you’re already familiar with the AWS Certification track, the new Microsoft track is very similar in that it has three paths.  These paths are Developer, Administrator, and Architect.  Each path consists of two exams, again similar to AWS’s structure of Associate and Professional.

Even though the paths are similar, the focus and structure of Microsoft’s first tier of exams differ greatly from the AWS Associate exams. The AWS exams are primarily multiple choice, while Microsoft’s first-tier exams consist of multiple choice, drag and drop, fill in the blank, case studies, and emulated labs. Another difference is that the AWS exams focus heavily on how the products work and when and where to use each one. Microsoft’s first-tier exams cover those topics too, but they additionally test your ability to implement the technologies.

When I started studying for the AZ-300 – Microsoft Azure Architect Technologies two weeks ago, I had a difficult time finding good study materials because the exam is so new and has changed a few times since Microsoft released it last year. Google searches brought up a lot of illegitimate study materials (brain dumps) but not much in the way of helpful materials beyond the official Azure documentation. After passing the exam this week, I wanted to give back to the community and provide some tips, links, and the study guide I put together to help prepare for the exam.

To prepare for an exam I have a standard routine.

  1. I first start with referencing the official exam requirements.
  2. From there, I take one or two on-demand training classes. I watch each lesson in a module at 1.2x speed (1x always seems too slow, which I think is largely due to living in Boston where we tend to talk very quickly). I then go back through each module at 1.5x to 2.0x, taking notes on paper. Finally, I type up the notes and organize them into topics.
  3. Once I’m done with the training I’ll usually dive deep into the official documentation on the subjects I’m weak on or that I find interesting.
  4. During the entirety of the learning process I will build out labs to get a feel for implementation and operation of the products.
  5. I wrap it up by adding the additional learnings from the public documentation and labs into my digital notes.  I then pull out the key concepts from the digital notes and write up flash cards to study.
  6. Practice makes perfect, and for that I will leverage legitimate practice exams (braindumps make the entire exercise a pointless waste of time and degrade the value of the certification) like those offered by MeasureUp.

Yes, I’m a bit nuts about my studying process but I can assure you it works and you will really learn the content and not just memorize it.

From a baseline perspective, my experience with Microsoft’s cloud services was primarily in Azure Active Directory and Azure Information Protection. In Azure I had built some virtual networks with virtual machines in the past, but nothing more than that. I have a pretty solid foundation in AWS and cloud architectural patterns, which definitely came in handy since the base offerings of each of the cloud providers are fairly similar.

For on-demand training, A Cloud Guru has always been my go-to. Unfortunately, their Azure training options aren’t as robust as their AWS offerings, but Nick Colyer’s AZ-300 course is solid. It CANNOT be your sole source of material, but as with most training from the site, it will give you the 10,000 ft view. Once I finished with A Cloud Guru, I moved on to Udemy. Scott Duffy’s AZ-300 course doesn’t come close to the detail of Nick’s course, but it provides a lot more hands-on activities that will get you working with the platform via the GUI and the CLI. Add both courses together and you’ll cover a good chunk of the exam.

The courses themselves are not sufficient to pass the exam. They will give you the framework, but docs.microsoft.com is your best friend. There is a risk you’ll dive deeper into a product than you need to, so refer back to the exam outline to keep yourself honest. Hell, the worst-case scenario is you learn more than you need to. 🙂 Gregor Suttie put together a wonderful course outline with links to the official documentation that will help you target key areas of the public documentation.

Perhaps most importantly, you need to lab.  Then lab again.  Lab once more, and then another time.  Run through the Quickstarts and Tutorials on docs.microsoft.com.  Get your hands dirty with the CLI, PowerShell, and the Portal.  You don’t have to be an expert, but you’ll want to understand the basics and the general syntax of both the CLI and PowerShell.  You will have fully interactive labs where you’ll need to implement the products given a set of requirements.

Finally, I’ve added the study guides I put together to my GitHub. I make no guarantees that the content is up to date or even that there aren’t mistakes in some of it. Use it as an artifact to supplement your studies as you prepare your own study guide.

Summing it up, don’t just look at the exam as a piece of virtual paper. Look at it as an opportunity to learn and grow your skill set. Take the time to not just memorize, but to understand and apply what you learn. Be thankful you work in an industry where things change and that gives you the opportunity to learn something new and exercise that big brain of yours.

I wish you the best of luck in your studies and if you have additional materials or a website you’ve found helpful, please comment below.

Thanks!

Some updates…

Hello folks!  Life has been busy with some wonderful work and some big career changes.  As some of you may know, I moved on from my role at the Federal Reserve last summer.  While I loved the job, the people, and the organization, I wanted to try something new and different.

I was lucky enough to have the opportunity to work for one of the big three cloud providers in a security-focused professional services role supporting public sector customers.  The role was amazing, I learned a TON about a cloud platform I had never worked with, interacted with some of the smartest people I’ve ever met, and had the chance to help architect and implement some really awesome environments for some stellar customers.

Unfortunately, the travel started taking a toll on my personal life and family time. I made the tough decision to move on and find something more regional with less travel. I hit the lottery once more, and in April I started as a Cloud Solution Architect at Microsoft focusing on Infrastructure and Security in Azure. I’ve once again been drinking from multiple firehoses and learning my second cloud platform. It’s been a ball so far, and I’m extremely excited to learn and contribute to Microsoft’s mission to empower every person and every organization on the planet to achieve more!

Expect a lot more activity on this blog as I share my experiences and my learnings with the wider tech community.  It’s going to be a fun ride!

Capturing and Visualizing Office 365 Security Logs – Part 2

Hello again my fellow geeks.

Welcome to part two of my series on visualizing Office 365 security logs. In my last post I walked through the process of getting the sign-in and security logs and provided a link to some Lambda functions I put together to automate pulling them down from Microsoft Graph. Recall that the Lambda stores the files in raw format (with a small bit of transformation on the time stamps) in Amazon S3 (Simple Storage Service). For this demonstration I modified the Lambda’s parameters to download the last 30 days of sign-in logs and store them in an S3 bucket I use for blog demos.
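
The Lambda code itself lives on GitHub and isn’t reproduced here. The sketch below only illustrates the general shape of a "last 30 days" pull. It builds a Microsoft Graph auditLogs/signIns URL with a $filter on createdDateTime, then follows @odata.nextLink across result pages. Several details are assumptions: the v1.0 version segment, the helper names, and the injected fetch_page callable, which stands in for the authenticated HTTP GET and is where token handling would live.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterator, Optional
from urllib.parse import quote

# Assumed API version segment; swap in /beta if that is what your tooling uses.
GRAPH_BASE = "https://graph.microsoft.com/v1.0"

def signins_url(days: int = 30, now: Optional[datetime] = None) -> str:
    """Build a sign-ins URL filtered to the last `days` days. Pass a timezone-aware `now`."""
    now = now or datetime.now(timezone.utc)
    start = (now - timedelta(days=days)).astimezone(timezone.utc)
    odata_filter = f"createdDateTime ge {start.strftime('%Y-%m-%dT%H:%M:%SZ')}"
    return f"{GRAPH_BASE}/auditLogs/signIns?$filter={quote(odata_filter)}"

def iter_signins(first_url: str, fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every sign-in record across pages by following @odata.nextLink.

    fetch_page is supplied by the caller: it performs the authenticated GET and
    returns the decoded JSON body, shaped like {"value": [...], "@odata.nextLink": ...}.
    """
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("value", [])
        url = page.get("@odata.nextLink")
```

Each page keeps its records in a top-level value array, the same array the Athena query later in this post unnests.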

When the logs are pulled from Microsoft Graph, they come down in JSON (JavaScript Object Notation) format. Love it or hate it, JSON is the common standard for exchanging information these days. The schema for the JSON representation of the sign-in logs is fairly complex and deeply nested because there is a ton of great information in there. Thankfully Microsoft has done a wonderful job of documenting the schema. Now that we have the logs and the schema, we can start working with the data.

When I first started this effort, I put together a Python function that transformed the files into a CSV using pipe delimiters. As soon as I finished the function, I wondered if there was an alternative way to handle it. In comes Amazon Athena to the rescue with its OpenX JSON SerDe library. After reading through a few blogs (great AWS blog here), StackOverflow posts, and the official AWS documentation, I was ready to put something together myself. After some trial and error I put together a working DDL (Data Definition Language) statement for the data structure. I’ve made the DDLs available on GitHub.
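
The original pipe-delimited function isn’t shown in the post. The sketch below is a hypothetical take on that kind of transformation, built with the standard library’s csv module. It assumes each downloaded file holds one Graph response page with a top-level value array, and it exports only a few top-level fields. Nested objects such as status or location would need flattening first, which is exactly the chore the Athena SerDe approach avoids.

```python
import csv
import io
import json

# A handful of top-level sign-in fields to export (names from the Graph signIn resource).
COLUMNS = ("id", "createdDateTime", "userPrincipalName", "ipAddress", "clientAppUsed")

def signins_to_pipe_csv(raw_json: str, columns=COLUMNS) -> str:
    """Turn one downloaded page ({"value": [...]}) into pipe-delimited text, one row per sign-in."""
    records = json.loads(raw_json).get("value", [])
    out = io.StringIO()
    # extrasaction="ignore" drops fields not listed in columns; missing fields become "".
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="|",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return out.getvalue()
```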

Once I had the schema defined, I created the table in Athena.  The official AWS documentation does a fine job explaining the few clicks that are provided to create a table, so I won’t re-create that here.  The DDLs I’ve provided you above will make it a quick and painless process for you.

Let’s review what we’ve done so far. We’ve set up a recurring job that pulls the sign-in and audit logs via the API and dumps all that juicy data into cheap object storage, on which we can further enforce lifecycle policies. We’ve then defined the schema for the data and made it available via standard SQL queries. All without provisioning a server and for pennies on the dollar. Not too shabby!

At this point you can use your analytics tool of choice, whether it be QuickSight, Tableau, Power BI, or the many other tools that have flooded the market over the past few years. Since I don’t make any revenue from these blog posts, I like to go the cheap and easy route of using Amazon QuickSight.

After completing the initial setup of QuickSight I was ready to go.  The next step was to create a new data set.  For that I clicked the Manage Data button and selected New Data Set.

[Screenshot: Manage Data and New Data Set in QuickSight]

On the Create a Data Set screen I selected the Athena option and created a name for the data source.

[Screenshot: the Athena option on the Create a Data Set screen]

From there I selected the database in Athena which for me was named azuread.  The tables within the database are then populated and I chose the tbl_signin_demo which points to the test S3 bucket I mentioned previously.

[Screenshot: selecting the azuread database and the tbl_signin_demo table]

Due to the complexity of the data structure, I opted to use a custom SQL query. There is no reason why you couldn’t create the table I’m about to create in Athena and then connect to that table instead, to make it more consumable for a wider array of users. It’s really up to you, and I honestly don’t know what the appropriate “big data” way of doing it is. Either way, those of you with real SQL skills may want to look away from this query lest you experience a Raiders of the Lost Ark moment.

[Image: Indiana Jones]

You were warned.

SELECT records.id, records.createddatetime, records.userprincipalname, records.userDisplayName, records.userid, records.appid, records.appdisplayname, records.ipaddress, records.clientappused, records.mfadetail.authdetail AS mfadetail_authdetail, records.mfadetail.authmethod AS mfadetail_authmethod, records.correlationid, records.conditionalaccessstatus, records.appliedconditionalaccesspolicy.displayname AS cap_displayname, array_join(records.appliedconditionalaccesspolicy.enforcedgrantcontrols,' ') AS cap_enforcedgrantcontrols, array_join(records.appliedconditionalaccesspolicy.enforcedsessioncontrols,' ') AS cap_enforcedsessioncontrols, records.appliedconditionalaccesspolicy.id AS cap_id, records.appliedconditionalaccesspolicy.result AS cap_result, records.originalrequestid, records.isinteractive, records.tokenissuername, records.tokenissuertype, records.devicedetail.browser AS device_browser, records.devicedetail.deviceid AS device_id, records.devicedetail.iscompliant AS device_iscompliant, records.devicedetail.ismanaged AS device_ismanaged, records.devicedetail.operatingsystem AS device_os, records.devicedetail.trusttype AS device_trusttype,records.location.city AS location_city, records.location.countryorregion AS location_countryorregion, records.location.geocoordinates.altitude, records.location.geocoordinates.latitude, records.location.geocoordinates.longitude,records.location.state AS location_state, records.riskdetail, records.risklevelaggregated, records.risklevelduringsignin, records.riskstate, records.riskeventtypes, records.resourcedisplayname, records.resourceid, records.authenticationmethodsused, records.status.additionaldetails, records.status.errorcode, records.status.failurereason  FROM "azuread"."tbl_signin_demo" CROSS JOIN (UNNEST(value) as t(records))

This query will de-nest the data and give you a detailed (possibly extremely large depending on how much data you are storing) parsed table. I was now ready to create some data visualizations.
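
To see what the CROSS JOIN UNNEST(value) is doing, here is the same idea in plain Python. It produces one output row per element of the top-level value array, walks nested objects into underscore-joined column names, and space-joins arrays of plain values, much like array_join(..., ' ') in the query. The column naming scheme is an assumption for illustration and does not reproduce the query's aliases. Arrays that contain objects are passed through unchanged.

```python
def flatten(record: dict, parent: str = "", sep: str = "_") -> dict:
    """Flatten nested dicts into one level of underscore-joined column names.

    Lists of scalar values are space-joined (like array_join(..., ' '));
    lists that contain objects or lists are kept unchanged.
    """
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list) and all(not isinstance(v, (dict, list)) for v in value):
            flat[name] = " ".join(str(v) for v in value)
        else:
            flat[name] = value
    return flat

def unnest(page: dict) -> list:
    """One flattened row per element of the top-level 'value' array (CROSS JOIN UNNEST(value))."""
    return [flatten(record) for record in page.get("value", [])]
```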

The first visual I made was a geospatial visual using the location data included in the logs filtered to failed logins. Not surprisingly our friends in China have shown a real interest in my and my wife’s Office 365 accounts.

[Screenshot: geospatial map of failed logins]

Next up I was interested in seeing if there were any patterns in the frequency of the failed logins.  For that I created a simple line chart showing the number of failed logins per user account in my tenant.  Interestingly enough the new year meant back to work for more than just you and me.
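
QuickSight does this aggregation for you, but the logic behind the line chart is simple enough to sketch. The code below counts failed sign-ins per user per day. It treats any nonzero errorcode as a failure, since Graph reports errorCode 0 for a successful sign-in. It assumes rows shaped like the query output above (lowercase column names) with ISO 8601 timestamp strings in createddatetime. The function name is hypothetical.

```python
from collections import Counter

def failed_logins_per_user_per_day(rows) -> Counter:
    """Count failed sign-ins keyed by (userprincipalname, 'YYYY-MM-DD')."""
    counts = Counter()
    for row in rows:
        # errorcode 0 means success; int() also copes with string-typed columns, None -> 0.
        if int(row.get("errorcode") or 0) != 0:
            day = row["createddatetime"][:10]  # date prefix of an ISO 8601 timestamp
            counts[(row["userprincipalname"], day)] += 1
    return counts
```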

[Screenshot: line chart of failed logins per user]

Like I mentioned earlier, Microsoft provides a ton of great detail in the sign-in logs. Beyond just location, they also provide reasons for login failures. I next created a stacked bar chart to show the different reasons for failed logins by user. I found the sign-ins blocked due to malicious IPs interesting. It’s nice to know that is being tracked and taken care of.

[Screenshot: stacked bar chart of failure reasons by user]

Failed logins are great, but the other thing I was interested in is successful logins and user behavior.  For this I created a vertical stacked bar chart that displayed the successful logins by user by device operating system (yet more great data captured in the logs).  You can tell from the bar on the right my wife is a fan of her Mac!

[Screenshot: successful logins by user and device operating system]

As I gather more data I plan on creating some more visuals, but this was great to start.  The geo-spatial one is my favorite.  If you have access to a larger data set with a diverse set of users your data should prove fascinating.  Definitely share any graphs or interesting data points you end up putting together if you opt to do some of this analysis yourself.  I’d love some new ideas!

That will wrap up this series.  As you’ve seen the modern tool sets available to you now can do some amazing things for cheap without forcing you to maintain the infrastructure behind it.  Vendors are also doing a wonderful job providing a metric ton of data in their logs.  If you take the initiative to understand the product and the data, you can glean some powerful information that has both security and business value.  Even better, you can create some simple visuals to communicate that data to a wide variety of audiences making it that much more valuable.

Have a great weekend!