Lessons Learned – Managing your Critical IT Infrastructure during a Pandemic

Worldwide Craziness

The Novel Coronavirus has already devastated the global economy. Historically, most business continuity plans for data centers have been based on local scenarios, where “acts of God” wreak havoc in one place. Rarely had anyone considered that the “one place” might be all of Earth.

A Change in IT Mindset

It is not — at least not yet — the equivalent of a worldwide hurricane. Today, the world’s data centers are, for the most part, functional. Modern enterprise data centers have already been designed to operate with as few as three full-time staff members onsite.

You don’t have to look far to see how the global COVID-19 pandemic has fundamentally upended IT. As organizations in all sectors have rapidly emptied their offices and sent their employees home to comply with ever more expansive shelter-in-place and quarantine mandates, replicating the full breadth of services remotely has been IT’s biggest priority.

All of this is nothing short of a remote collaboration revolution. It is already rewriting how work gets done — and how technology gets supported — when direct access to traditional, physical infrastructure is no longer a given.

But this is merely one aspect of IT. As we begin to digest how these changes will shape technology best practices, both during the current crisis and well into the future, we can’t afford to ignore the often unseen underpinnings of IT infrastructure that don’t have the luxury of working remotely.

Not an Option

Put simply, mission-critical facilities like data centers can’t be relocated into employees’ home offices. While transferring end-user productivity out of a traditional office context is a fairly straightforward process, the same can’t be said for the highly specialized workloads that can only be managed within the framework of a data center. Beyond the unique and non-transferrable capabilities of the facilities themselves (grid access, raw compute power, failover, security, etc.), there is the genuine accountability associated with the sheer volume and type of workloads managed within them.

Regulatory constraints around how critical data must be managed and protected throughout all phases of its lifecycle add even more complexity to data center protocols during a pandemic.

So while you can’t simply abandon your data center in the same manner as your end users have cleared out their offices, you can, and must, understand how to rebalance your provision of data center services in light of how the pandemic continues to evolve. And you must do so while you continue to keep the lights on for stakeholders who need uninterrupted access to data center services now more than ever.

Against this backdrop, if you haven’t already examined your data center management strategy through a COVID-19 lens, now is the time to do so. As with anything related to the data center, however, this will be a complex, multifaceted process. Position yourself to navigate it by looking at it through the following contexts.

  • Capacity Management

    The historically unpredictable global business environment is putting unprecedented pressure on capacity management, with businesses barely able to forecast demand — or, in many cases, keep up with it. Global internet traffic is trending upward, with several exchanges routinely reaching record throughput as entire economies and workforces adjust to the new lockdown paradigm. Some organizations facing spiking demand have no choice but to move services out of their own data centers and lean more heavily on vendors. This makes absolute sense in an unpredictable landscape where scale needs to be implemented without delay. Still, it doesn’t make everyday issues like bandwidth, power, CPU, memory, and disk space disappear. Instead, it shifts the burden onto these external providers and their specific infrastructure. IT leadership must adapt these partnerships to keep pace because, if vendors don’t stay ahead of the curve, IT may find itself unable to serve the business adequately.

  • Connectivity

    The old advice to avoid putting all your eggs in one basket has never been more valid than it is now. This issue relates directly to capacity management, and, as the crisis deepens, the strain on all aspects of infrastructure will only increase. Diversify your upstream providers as much as possible to mitigate the risks associated with any one of them being compromised by pandemic-related resourcing constraints. This minimizes the potential for back-end interruptions to reach your customers. Leverage third-party user reviews and analyst resources to better assess and compare vendors, match provider capabilities to fast-changing business needs, and position yourself to make best-of-breed decisions faster.

  • Disaster Recovery

    The uptick in mission-critical services being deployed off-premises doesn’t only impact day-to-day service delivery and the service level agreements (SLAs) that set expectations and confirm accountabilities. It also has significant implications for disaster recovery (DR) planning and implementation. It shifts a fair degree of risk over to the third-party providers now responsible for delivering these services. DR plans must be updated to reflect this new world of vendor-distributed work, and vendors must be integral to this process to ensure they are in a position to fulfill all requirements.

  • Security

    Cybercriminals have never missed an opportunity to take advantage of periods of uncertainty to ply their evil trade, and the COVID-19 pandemic is no exception. As more organizations move their services to centralized locations, bad actors suddenly have significantly more — and better defined — higher-value targets. From a cybercriminal’s perspective, why attack one company and net only one victim when you can strike a mission-critical data center and compromise many victims? This sobering reality reinforces the need to nail down end-to-end security protocols with all vendors, including, but not limited to, encryption, authentication, and onsite access control. Reaffirming your cybersecurity skills inventory — and closing any gaps with targeted training — should also be prioritized.

  • Colocation

    If you are either using or responsible for colocated resources or infrastructure, you must take immediate steps to reduce physical risks at all levels, including:

    • Focus on disease control and disinfection throughout the facility.
    • Enforce monitoring — including temperature checks — at tightly controlled entries, and turn away anyone exhibiting symptoms to avoid compromising the facility itself.
    • Reduce the number of people onsite, especially unknowns and other individuals not considered essential to the business.
    • Consider extending shift lengths from eight to 12 hours and moving to a two-shift schedule, if local labor laws will accommodate.
    • Take individual steps to protect technical staff with skills required to maintain data center uptime, including sequestering them in a third, unscheduled shift, and holding them in reserve if primary staff exhibit symptoms.
    • Incorporate in-person monitoring of tasks during shift rotations to ensure continuity of operations. Implement contactless handovers to minimize transmission risk during these critical periods.
    • Assign activities and technical resources to single buildings and prevent them from moving to other buildings within a larger campus.
    • Prioritize the implementation of “smart hands” services to ensure trained, known resources handle tasks requiring onsite engagement.
    • Leverage guidance from local and regional health authorities to ensure nothing is missed, including physical traffic control methods in shared areas to support social distancing.

Focus on the Opportunity

Not everything about the current pandemic should incite fear — all significant disruptions offer opportunities to rethink how data center operations are planned, managed, and evolved over time. The possibilities can be game-changing, but only if you take the time to get out of firefighting mode and zero in on what your strategy should look like once COVID-19 is firmly behind us.

For example, as more data physically moves offsite toward data centers, hardware GPUs can be leveraged for compute-intensive artificial intelligence, machine learning, and related data analysis applications. Recognize that data has gravity and tends to pull surrounding apps with it. Position yourself to sell compute capacity to meet these shifting demands.

Don’t Reinvent the Wheel

As the pandemic continues to play out, expect the value of traditional data center best practice to be reinforced. This isn’t so much a time to rip apart and rebuild as it is to validate what you’ve been doing all along and double down on it.

Start by ensuring your basics are sound and that your existing slate of products and services is reliable, secure, and well-communicated to your stakeholders. The sudden increase in demand for data center services and capacity may be unique in history, but stakeholders will depend on you having a firm foundation. By taking the time to reaffirm that this is indeed the case, you’re in a much better position to scale and meet this demand.

Learn from Experience

As unique as this experience seems to us all, recognize that we’ve been through this before — including the SARS, H1N1, and Ebola outbreaks in 2003, 2009, and 2014, respectively. Refer back to any documentation you may have from those periods to inform your thinking and responses for the current pandemic, but bear in mind that the impact in those previous cases was significantly smaller, and we “returned to normal” much more quickly.

This time out, the impact is unprecedented, and the future timeline won’t be resolving itself anytime soon. Expect it to take far longer than initially expected to return to anything remotely approaching “normal,” and, even then, expect the very definition of the word to evolve.

Many economic, technological, and social changes will indeed be permanent, which means your go-forward strategy to manage data center resources should not be to overutilize what you’ve got and hope to ride out the storm. Instead, now is the time to scale your investments in critical infrastructure and prepare for a changing world after that. This strategy will maximize your business continuity and minimize the risks associated with navigating these strange times.

Until next time, Rob.

Azure Powershell – How to Build and Deploy Azure IaaS VMs

Throughout my career, my primary role has always been to make things more efficient and automated.  And now more than ever, automation is needed to manage and deploy IT services at scale to support our ever-changing needs.

In my opinion, one of the most convenient aspects of public cloud-based services is the ability to host virtual machines (VMs). Hosting VMs in the cloud doesn’t just mean putting your VMs in someone else’s datacenter. It’s a way to achieve a scalable, low-cost and resilient infrastructure in a matter of minutes.

What once required hardware purchases, layers of management approval and weeks of work now can be done with no hardware and in a fraction of the time. We still probably have those management layers though 🙁

Microsoft Azure is in the lead pack along with Google (GCP) and Amazon (AWS). Azure has made great strides over the past few years in its Infrastructure as a Service (IaaS) offering, which allows you to host VMs in their cloud.

Azure provides a few different ways to build and deploy VMs in Azure.

  • You could choose to use the Azure portal, or build VMs through Azure Resource Manager (ARM) templates and some PowerShell
  • Or you could simply use a set of PowerShell cmdlets to provision a VM and all its components from scratch.

Each has its advantages and drawbacks. However, the main reason to use PowerShell is for automation tasks. If you’re working on automated VM provisioning for various purposes, PowerShell is the way to go 😉

Let’s look at how we can use PowerShell to build all of the various components that a particular VM requires in Azure to eventually come up with a fully-functioning Azure VM.

To get started, you’ll first obviously need an Azure subscription. If you don’t have one, you can sign up for a free trial to start playing around. Once you have a subscription, I’m also going to be assuming you’re using at least Windows 10 with PowerShell version 6. Even though the commands I’ll be showing you might work fine on older versions of PowerShell, it’s always a good idea to work alongside me with the same version, if possible.

You’ll also need to have the Azure PowerShell module installed. This module contains hundreds of cmdlets and sub-modules. The one we’ll be focusing on is called AzureRM. This contains all of the cmdlets we’ll need to provision a VM in Azure.

Building a VM in Azure isn’t quite as simple as New-AzureVM; far from it actually. Granted, you might already have much of the underlying infrastructure required for a VM, but in case you don’t, I’ll be going over how to build every component necessary, assuming you’re starting from a blank Azure subscription.

At its most basic, an ARM VM requires eight individual components:

  1. A resource group
  2. A virtual network (VNET)
  3. A storage account
  4. A network interface with private IP on VNET
  5. A public IP address (if you need to access it from the Internet)
  6. An operating system
  7. An operating system disk
  8. The VM itself (compute)

In order to build any components between numbers 2 and 7, they must all reside in a resource group so we’ll need to build this first. We can then use it to place all the other components in. To create a resource group, we’ll use the New-AzureRmResourceGroup cmdlet. You can see below that I’m creating a resource group called NetWatchRG and placing it in the East US datacenter.

New-AzureRmResourceGroup -Name 'NetWatchRG' -Location 'East US'

Next, I’ll build the networking that is required for our VM. This requires both creating a virtual subnet and adding that to a virtual network. I’ll first build the subnet where I’ll assign my VM an IP address dynamically in the 10.0.1.0/24 network when it gets built.

$newSubnetParams = @{
'Name' = 'NetWatchSubnet'
'AddressPrefix' = '10.0.1.0/24'
}
$subnet = New-AzureRmVirtualNetworkSubnetConfig @newSubnetParams

Next, I’ll create my virtual network and place it in the resource group I just built. You’ll notice that the subnet’s network is a slice of the virtual network (my virtual network is a /16 while my subnet is a /24). This allows me to segment out my VMs.

$newVNetParams = @{
'Name' = 'NetWatchNetwork'
'ResourceGroupName' = 'NetWatchRG'
'Location' = 'East US'
'AddressPrefix' = '10.0.0.0/16'
'Subnet' = $subnet
}
$vNet = New-AzureRmVirtualNetwork @newVNetParams
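
To make the /16 versus /24 relationship concrete, here is a minimal sketch that checks locally whether a subnet range sits inside a virtual network range before anything is sent to Azure. Both function names are hypothetical helpers of my own, not AzureRM cmdlets, and the sketch only handles IPv4 CIDR notation.

```powershell
# Hypothetical helper: turn a dotted IPv4 address into a single number
# so that address ranges can be compared.
function ConvertTo-IPv4Number {
    param([string]$Address)
    $n = [long]0
    foreach ($b in [System.Net.IPAddress]::Parse($Address).GetAddressBytes()) {
        $n = $n * 256 + $b
    }
    $n
}

# Hypothetical helper: return the first and last address number of a CIDR block.
function Get-CidrRange {
    param([string]$Cidr)
    $address, $prefixLength = $Cidr -split '/'
    $size  = [long][math]::Pow(2, 32 - [int]$prefixLength)
    $value = ConvertTo-IPv4Number -Address $address
    $start = $value - ($value % $size)   # round down to the block boundary
    @{ Start = $start; End = $start + $size - 1 }
}

# True when every address in the subnet also belongs to the virtual network.
function Test-SubnetInVNet {
    param([string]$SubnetCidr, [string]$VNetCidr)
    $subnet = Get-CidrRange -Cidr $SubnetCidr
    $vnet   = Get-CidrRange -Cidr $VNetCidr
    ($subnet.Start -ge $vnet.Start) -and ($subnet.End -le $vnet.End)
}

Test-SubnetInVNet -SubnetCidr '10.0.1.0/24' -VNetCidr '10.0.0.0/16'   # True
Test-SubnetInVNet -SubnetCidr '10.1.1.0/24' -VNetCidr '10.0.0.0/16'   # False
```

A check like this is handy once you start carving several subnets out of one virtual network in a script.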

Next, we’ll need somewhere to store the VM so we’ll need to build a storage account. You can see below that I’m building a storage account called netwatchsa; Azure only accepts lowercase letters and numbers in storage account names.

$newStorageAcctParams = @{
'Name' = 'netwatchsa'
'ResourceGroupName' = 'NetWatchRG'
'Type' = 'Standard_LRS'
'Location' = 'East US'
}
$storageAccount = New-AzureRmStorageAccount @newStorageAcctParams
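
Storage account names have to be 3 to 24 characters long and may contain only lowercase letters and numbers, so a quick local check can save a failed call. The sketch below is a hypothetical helper of my own, not an AzureRM cmdlet, and it only checks the format; it does not tell you whether the name is already taken.

```powershell
# Hypothetical helper: validate the format of a storage account name locally.
function Test-StorageAccountName {
    param([string]$Name)
    # -cmatch is the case-sensitive regex operator; plain -match would
    # ignore case and wrongly accept uppercase letters.
    $Name -cmatch '^[a-z0-9]{3,24}$'
}

Test-StorageAccountName -Name 'NetWatchSA'   # False (uppercase letters)
Test-StorageAccountName -Name 'netwatchsa'   # True
```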

Once the storage account is built, I’ll now focus on building the public IP address. This is not required but if you’re just testing things out now it’s probably easiest to simply access your VM over the Internet rather than having to worry about setting up a VPN.

Here I’m calling it NetWatchPublicIP and I’m ensuring that it’s dynamic since I don’t care what the public IP address is. I’m using many of the same parameters as the other objects as well.

$newPublicIpParams = @{
'Name' = 'NetWatchPublicIP'
'ResourceGroupName' = 'NetWatchRG'
'AllocationMethod' = 'Dynamic' ## Dynamic or Static
'DomainNameLabel' = 'netwatchvm1' ## DNS labels must be lowercase
'Location' = 'East US'
}
$publicIp = New-AzureRmPublicIpAddress @newPublicIpParams

Once the public IP address is created, I then need a way to connect the VM to my virtual network and ultimately the Internet. I’ll create a network interface, again using the same resource group and location. You can also see how I’m slowly building all of the objects I need as I go along. Here I’m specifying the ID of the subnet I created earlier and the ID of the public IP address I just created. Each step requires objects from the previous steps.

$newVNicParams = @{
'Name' = 'NetWatchNic1'
'ResourceGroupName' = 'NetWatchRG'
'Location' = 'East US'
'SubnetId' = $vNet.Subnets[0].Id
'PublicIpAddressId' = $publicIp.Id
}
$vNic = New-AzureRmNetworkInterface @newVNicParams

Once we’ve got the underlying infrastructure defined, it’s now time to build the VM.

First, you’ll need to define the performance of the VM. Here I’m choosing an inexpensive, lower-performance option with a Standard A3. This is great for testing but might not be enough performance for your production environment.

$newConfigParams = @{
'VMName' = 'NETWATCHVM1'
'VMSize' = 'Standard_A3'
}
$vmConfig = New-AzureRmVMConfig @newConfigParams

Next, we need to create the OS itself. Here I’m specifying that I need a Windows VM, the name it will have, the credentials for the local administrator account and a couple of other Azure-specific parameters. The Azure VM agent is installed by default anyway; here I’m requesting it explicitly and also turning on automatic updates. You don’t explicitly need a VM agent but it will come in handy if you begin to need more advanced automation capabilities down the road.

$newVmOsParams = @{
'Windows' = $true
'ComputerName' = 'NETWATCHVM1'
'Credential' = (Get-Credential -Message 'Type the name and password of the local administrator account.')
'ProvisionVMAgent' = $true
'EnableAutoUpdate' = $true
}
$vm = Set-AzureRmVMOperatingSystem @newVmOsParams -VM $vmConfig

Next, we need to pick what image our OS will come from. Here I’m picking Windows Server 2016 Datacenter with the latest patches. This will pick an image from the Azure image gallery to be used for our VM.

$newSourceImageParams = @{
'PublisherName' = 'MicrosoftWindowsServer'
'Version' = 'latest'
'Skus' = '2016-Datacenter'
'VM' = $vm
}
## The publisher may list more than one offer, so pick WindowsServer explicitly
$offer = Get-AzureRmVMImageOffer -Location 'East US' -PublisherName 'MicrosoftWindowsServer' | Where-Object { $_.Offer -eq 'WindowsServer' }
$vm = Set-AzureRmVMSourceImage @newSourceImageParams -Offer $offer.Offer

Next, we’ll attach the NIC we built earlier to the VM, referencing it by its ID so that more NICs can be added later if needed.

$vm = Add-AzureRmVMNetworkInterface -VM $vm -Id $vNic.Id

At this point, Azure still doesn’t know how you’d like the disk configuration on your VM. To define where the operating system will be stored, you’ll need to create an OS disk. The OS disk is a VHD that’s stored in your storage account. Here I’m putting the VHD in a vhds storage container (folder) in Azure. This step gets a little convoluted since we must specify the VhdUri. This is the URI of the VHD inside the storage account we created earlier.

$vmName = 'NETWATCHVM1'
$osDiskName = 'OSDisk'
$osDiskUri = $storageAccount.PrimaryEndpoints.Blob.ToString() + "vhds/" + $vmName + $osDiskName + ".vhd"

$newOsDiskParams = @{
'Name' = 'OSDisk'
'CreateOption' = 'fromImage'
'VM' = $vm
'VhdUri' = $osDiskUri
}

$vm = Set-AzureRmVMOSDisk @newOsDiskParams

OK, whew! We now have all the components required to finally bring up our VM. To build the actual VM, we’ll use the New-AzureRmVM cmdlet. Since we’ve already done all of the hard work ahead of time, at this point, I simply need to pass the resource group name, the location, and the VM object which contains all of the configurations we just applied to it.

$newVmParams = @{
'ResourceGroupName' = 'NetWatchRG'
'Location' = 'East US'
'VM' = $vm
}
New-AzureRmVM @newVmParams

Your VM should now be showing up under the Virtual Machines section in the Azure portal. If you’d like to check on the VM from PowerShell you can also use the Get-AzureRmVM cmdlet.

Now that you’ve got all the basic code required to build a VM in Azure, I suggest you go and build a PowerShell script from this tutorial. Once you’re able to bring this code together into a script, building your second, third or tenth VM will be a breeze!
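
As a starting point for that script, here is a minimal sketch of one way to parameterize the walkthrough: a function that derives the per-VM settings from a VM name and returns them as hashtables ready for splatting. The function name, the naming scheme for the public IP and NIC, and the default values are all my own assumptions for illustration; nothing here calls Azure.

```powershell
# Hypothetical helper: build the splatting hashtables for one VM.
function Get-VmDeploymentSettings {
    param(
        [Parameter(Mandatory)][string]$VMName,
        [string]$ResourceGroupName = 'NetWatchRG',
        [string]$Location = 'East US',
        [string]$VMSize = 'Standard_A3'
    )
    @{
        PublicIp = @{
            Name              = "${VMName}-PublicIP"
            ResourceGroupName = $ResourceGroupName
            Location          = $Location
            AllocationMethod  = 'Dynamic'
            DomainNameLabel   = $VMName.ToLower()   # DNS labels must be lowercase
        }
        Nic = @{
            Name              = "${VMName}-Nic1"
            ResourceGroupName = $ResourceGroupName
            Location          = $Location
        }
        VmConfig = @{
            VMName = $VMName
            VMSize = $VMSize
        }
    }
}

$settings = Get-VmDeploymentSettings -VMName 'NETWATCHVM2'
$settings.PublicIp.DomainNameLabel   # netwatchvm2

# With an authenticated Azure session, each table can then be splatted:
# $publicIpParams = $settings.PublicIp
# $publicIp = New-AzureRmPublicIpAddress @publicIpParams
```

Keeping the names in one place like this means the second or tenth VM only needs a different -VMName value.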

One final tip: in addition to managing Azure through the portal in a browser, there are mobile apps for iOS and Android, and now the new Azure portal app (currently in preview). It gives you the same experience as the Azure portal without the need for a browser like Microsoft Edge or Google Chrome. Great for environments that have restrictions on browsing.

Until next time, Rob…

My thoughts on the Future of the Cloud

Many people in IT consider containers, a technology used to isolate applications in their own environment, to be the future.

However, serverless geeks think that containers will gradually fade away. They will exist as a low-level implementation detail bubbling below the surface but most software developers will not have to deal with them directly. It may seem premature to declare victory for serverless just yet but there are enough positive signs already. Forward-thinking organizations like iRobot, Coca-Cola, Thomson Reuters, and Autodesk are experimenting with and adopting serverless technologies. All major and minor cloud providers, including players like Azure, AWS, GCP, IBM, Oracle, and Pivotal, are working on serverless offerings. If you want to learn more, just take a quick look at this link: https://docs.microsoft.com/en-us/archive/blogs/wincat/validating-hybrid-cloud-scenarios-in-the-server-2012-technology-adoption-program-tap.

Together with the major players, a whole ecosystem of startups is emerging. These startups attempt to solve problems around deployment and observability, provide new security solutions, and help enterprises evolve their systems and architectures to take advantage of serverless. This isn’t, of course, to mention a vibrant community of enthusiasts who contribute to serverless open source projects, evangelize at conferences and online, and promote ideas within their organizations.

It would be great to close the book now and declare victory for the serverless camp, but the reality is different. There are challenges that the community and vendors are yet to solve. These challenges are cultural and technological: there’s tribal friction within the tech community, inertia to adoption within organizations, and issues around some of the technology itself. Also remember to make sure that you are properly certified if you are running cloud-based services; ISO 27017 is the certification you need for that.

Confusion and the Cloud

While adoption of serverless is growing, more work needs to be done by the serverless community to communicate what this technology is all about. The community needs to bring more people in and explain how serverless adds value. It’s inarguable that there are good questions from members of the tech community. These can range from trivial disagreements over “serverless” as a name, to more philosophical arguments about fit, use-case, and lock-in. This is a perfectly normal example of past successes (with other technologies) breeding inertia to change.

This isn’t to say that those who have objections are wrong. Serverless in its current incarnation isn’t suitable in all cases. There are limitations on how long functions can run, tooling is immature and monitoring distributed applications made up of a lot of functions and cloud services can be difficult (although some progress is being made to address this).

There’s also a need for a robust set of example patterns and architectures. After all, the best way to convince someone of the merit of technology is to build something with it and then show them how it was done.

Confusingly, there is a tendency by some vendors to label their offerings as serverless when they aren’t. This makes it look like they are jumping on the bandwagon rather than thoughtfully building services that adhere to serverless principles. Some of the bigger cloud vendors are guilty of this and, unfortunately, this muddies people’s understanding of the technology.

Go Big or Go Home

At the very large end of the scale, companies like Netflix and Uber are building their own internal serverless-like platforms. But unless you are the size of Netflix or Uber, building your own Function as a Service (FaaS) platform from scratch is a terrible idea. Think of it this way: it’s like building a toaster yourself rather than buying a commoditized, off-the-shelf product. Interestingly, Google recently released a product called Knative. This product, based on the open source Kubernetes container orchestration software, is designed to help build, deploy and manage serverless workloads on your own servers.

For example, Google’s Bret McGowen, at Serverlessconf San Francisco ’18, gave an example of a real-life customer scenario out on an oil rig in the middle of an ocean with poor Internet connectivity. The customer needed to perform computation with terabytes of telemetry data but uploading it to a cloud platform over a connection equivalent to a 3G modem wasn’t feasible. “They cannot use cloud and it’s totally unfair to say ‘sorry buddy, hosted functions-as-a-service or bust’; their developers deserve to have the same serverless experience as the rest of us” was Bret’s explanation why, in this case, running Knative locally on the oil rig made sense.

He is, of course, correct. Having a serverless system running in your own environment, when you cannot use a cloud platform, is better than nothing. However, for most of us, serverless solutions like Google Cloud Functions, Azure Functions, or AWS Lambda offer a far smaller barrier to entry and remove many administrative headaches. It’s fair to say that most companies should look at serverless solutions like Lambda first and, if they don’t satisfy requirements, look at other alternatives, like Knative and containers, second.

The Future…in my humble opinion

It’s likely that some of the major limitations with serverless functions are going to be solved in the coming years, if not months. Cloud vendors will allow functions to run for longer, support more languages, and allow deeper customizations. A lot of work is being done by cloud vendors to allow developers to bring their own containers to a hosted environment and then have those containers seamlessly managed by the platform alongside regular functions.

In the end, “do you have a choice?” “No, none, whatsoever” was Bret’s succinct, brutal answer at the conference. Existing limitations will be solved and serverless compute technologies will herald the rise of new, emerging architectural patterns and practices. We are yet to see what these are, but this is the future and it is unavoidable.

Cloud computing is where we are, and where the world is going for the next decade or two. After that, probably something new will come along.

But the reasons for going to cloud computing in general and the inevitable wind-down of on-premises to niche special functions are now pretty obvious.

  • Security – Big cloud operators have FAR more security people and capacity than even a big enterprise, and your own disgruntled employees don’t have the keys to the servers.
  • Cost-effectiveness – Economies of scale. The rule of big numbers.
  • Zero capital outlay – reduced costs.
  • For software developers, no more software piracy. That’s a big saving on the cost of developing software, especially for sales in certain countries.
  • Compliance – So much easier if your cloud vendor is fully certified, so you only have to worry about your part of the puzzle.
  • Energy efficiency – Big, well-designed datacentres use a LOT less global resources.

My next post in this series will be on “The Past and On-prem and the Cloud?”

Until next time, Rob