Cloud security is like voodoo. Clients blindly trust the cloud providers and the security they provide. If we look at popular cloud vulnerabilities, we see that most of them focus on the security of the client’s applications (aka misconfigurations or vulnerable applications), and not the cloud provider infrastructure itself. We wanted to disprove the assumption that cloud infrastructures are secure. In this part, we demonstrate various attack vectors and vulnerabilities we found on Azure Stack.
Check Point Research informed the Microsoft Security Response Center about the vulnerabilities exposed in this research, and a solution was responsibly deployed to ensure its users can safely continue using Azure Stack.
Researching cloud components can be difficult, particularly as most of the time it’s “black box” research. Fortunately, Microsoft has an on-premises Azure environment called Azure Stack, which is meant primarily for enterprise usage. There is also a free version called Azure Stack Development Kit (ASDK). All you have to do is get a single server that meets the installation hardware requirements and follow the detailed installation guides. Once the installation is finished, you will be greeted with the User/Admin Portal, which looks very similar to the Azure Portal:
By default, ASDK comes with a small set of features (core components) which can be extended with features like SQL Providers, App Service and more. With that said, let’s see how ASDK compares to Azure.
Note – Most of the data in this section is taken from this book
Let’s break down the diagram by layers:
First, we have the Azure Stack portal that provides a simple and accessible UI, along with Templates, PowerShell, etc. These components are used for deploying and managing resources and are the common interfaces in Azure Stack. They are built on top of and interact with the Azure Resource Manager (ARM). The ARM decides which requests it can handle and which need to be passed on to another layer.
The partition request broker includes core resource providers in Azure Stack. Each resource provider has an API that works back and forth with the ARM layer. A resource provider is what allows you to communicate with the underlying layer, and includes user/admin extensions that are accessible from the portal.
The next layer underneath contains the infrastructure controllers which communicate with the infrastructure roles. This layer has a set of internal APIs which are not exposed to the user.
The infrastructure roles are responsible for tasks such as computing, networking, storage and more.
Finally, the infrastructure roles contain all the management components of Azure Stack, interacting with the underlying hardware layer to abstract hardware features into high-level software services that Azure Stack provides.
ASDK is based on Hyper-V, meaning all of its roles run as separate virtual machines on the host server. The infrastructure has separate virtual networks that isolate them from the host network.
By default, there are several infrastructure roles that are deployed, including:
Name | Description |
AzS-ACS01 | Azure Stack storage services. |
AzS-ADFS01 | Active Directory Federation Services (ADFS). |
AzS-CA01 | Certificate authority services for Azure Stack role services. |
AzS-DC01 | Active Directory, DNS, and DHCP services for Microsoft Azure Stack. |
AzS-ERCS01 | Emergency Recovery Console VM. |
AzS-GWY01 | Edge gateway services such as VPN site-to-site connections for tenant networks. |
AzS-NC01 | Network Controller, which manages Azure Stack network services. |
AzS-SLB01 | Load balancing multiplexer services in Azure Stack for both tenants and Azure Stack infrastructure services. |
AzS-SQL01 | Internal data store for Azure Stack infrastructure roles. |
AzS-WAS01 | Azure Stack administrative portal and Azure Resource Manager services. |
AzS-WASP01 | Azure Stack user (tenant) portal and Azure Resource Manager services. |
AzS-XRP01 | Infrastructure management controller for Microsoft Azure Stack, including the Compute, Network, and Storage resource providers. |
Source: https://docs.microsoft.com/en-us/azure-stack/asdk/asdk-architecture
If we break down the main abstract layers in the diagram above into the main virtual machines:
Let’s look at an example that demonstrates how all the abstract layers in the diagram work together:
A tenant wants to stop a virtual machine in Azure Stack. How does this work?
In the following section, we describe in detail an issue we found in one of the internal services that allowed us to grab screenshots of the tenant and infrastructure machines.
Service Fabric Explorer is a web tool pre-installed in the machine that takes the role of the RP and Infrastructure Control Layer (AzS-XRP01). This enables us to view the internal services which are built as Service Fabric Applications (located in the RP Layer).
When we tried to access the URLs of the services from the Service Fabric Explorer, we noticed that some of them don’t require authentication (usually there is a certificate authentication/HTTP Authentication).
We had some questions:
These services are written in C# and their source code is not public, so we had to use a decompiler to research them. This required us to understand the structure of the Service Fabric applications.
One particular service that didn’t require authentication is called “DataService”. Our first task was to find where this service is located on the AzS-XRP01 machine. We found this easily by running a WMI query to list the running processes:
The result revealed the location of every Service Fabric service on the machine, including DataService. A directory listing of the DataService code folder revealed a lot of DLLs, but their names indicate their purpose:
Decompiling the DLLs gave us the ability to explore the code and find the mapping for the API HTTP routes:
We can see that if the HTTP URI matches one of the route templates, the request is handled by a specific controller, which is a common REST API implementation. Most of the route templates require at least one parameter that we don’t necessarily know. Therefore, we first started looking at those that don’t require additional parameters:
As Azure Stack runs locally on our machine, we can just browse these APIs locally to see how they respond.
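To make the routing behavior concrete, here is a minimal sketch in Python of how template-based routing of this kind typically works. It is not the DataService code: the controller names and the second template (with its `{vmId}` placeholder) are hypothetical and exist only for illustration. Only the `virtualMachines/allocation` path comes from the research.

```python
import re

def compile_route(template: str) -> re.Pattern:
    """Turn a route template such as 'items/{id}/view' into a regex."""
    parts = []
    for segment in template.strip("/").split("/"):
        if segment.startswith("{") and segment.endswith("}"):
            # A placeholder matches exactly one path segment.
            parts.append(f"(?P<{segment[1:-1]}>[^/]+)")
        else:
            parts.append(re.escape(segment))
    return re.compile("^/?" + "/".join(parts) + "/?$", re.IGNORECASE)

# Hypothetical route table: template -> controller name (names are made up).
ROUTES = {
    "virtualMachines/allocation": "AllocationController",
    "virtualMachines/{vmId}/screenshot": "ScreenshotController",
}

def dispatch(path: str):
    """Return (controller, parameters) for the first matching template."""
    for template, controller in ROUTES.items():
        match = compile_route(template).match(path)
        if match:
            return controller, match.groupdict()
    return None  # no template matched
```

A template without placeholders can be requested as-is, which is why parameterless routes are the natural starting point when the required parameter values are unknown.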
When accessing the virtualMachines/allocation API (QueryVirtualMachineInstanceView), it returns a large XML/JSON file (depending on the Accept header you send) which contains a lot of data about infrastructure/tenant machines located on the Hyper-V node in the cluster.
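The Accept header is what selects the response format. As a rough sketch in Python, a request for either representation could be prepared as follows. The base address is a placeholder, since the real host, port and path prefix of the service are not given here, and the media types are the standard ones rather than values confirmed from the service.

```python
from urllib.request import Request

# Hypothetical base address; the real host and port are not given here.
BASE_URL = "https://dataservice.example.internal"

# Standard media types; assumed, not confirmed from the service itself.
MEDIA_TYPES = {"json": "application/json", "xml": "application/xml"}

def build_allocation_request(fmt: str = "json") -> Request:
    """Prepare (but do not send) a GET request for the allocation view."""
    return Request(
        f"{BASE_URL}/virtualMachines/allocation",
        headers={"Accept": MEDIA_TYPES[fmt]},
        method="GET",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Notice that nothing in the request carries a credential.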
This is a snippet from the information returned. We can see here interesting stuff like the virtual machine name and ID, hardware information like cores, total memory, etc.
Now that we know there is an API that can provide information about the infrastructure/tenant machines, we can look at the API calls that require other parameters. For example, VirtualMachineScreenshot looks interesting, so let’s see how it works.
According to the template, several parameters must be supplied to route the request through the VirtualMachineScreenshot controller:
When all of these parameters are provided, the GetVirtualMachineScreenshot function is invoked:
If the virtual machine ID is valid and exists, the GetVmScreenshot function is called. This actually “proxies” the request into another internal service.
We can see that it creates a new request with the specified parameters and passes it to the request executor. The internal service which will process this request is called “Compute Cluster Manager” (located in the Infrastructure Control Layer). From its name, we see that it manages the compute clusters, and can perform relevant actions. Let’s see how this service handles the screenshot request:
First, we encounter this wrapper function, which calls another GetVmScreenshot on the vmScreenshotCollector instance. However, we can see that there is a new parameter, a flag that determines if the compute cluster contains only a single host/node.
After GetVirtualMachineOwnerNode figures out which node of the cluster the virtual machine is located on, it calls the GetVmThumbnail function:
It seems like this function constructs a remote PowerShell command which it executes on the compute node (this is how most of the compute operations work). Let’s look at the compute node and see how Get-CpiVmThumbnail is implemented:
This is the PowerShell implementation of this function. It looks like it executes GetVirtualSystemThumbnailImage, which is a Hyper-V WMI call that grabs the thumbnail for the virtual machine. The thumbnail is the small window at the bottom left of the machine overview in Hyper-V:
However, because the caller can specify the dimensions, the result is effectively a full-quality screenshot.
Now that we have a good understanding of the primitives contained in “DataService”, let’s get back to our first question: Why doesn’t it require authentication? We actually don’t know the answer, but it should absolutely require authentication. We approached this by asking an additional question: In what scenario can we access this service from outside? The answer is SSRF, but where should we start looking? The obvious choice is the User Portal. It is accessible to the tenants and can access services such as ARM. On Azure Stack, it can even directly access the internal services.
Azure Stack and Azure can deploy resources from a template. The template can be loaded from a local file, or a remote URL. It is a very simple feature and also interesting in terms of SSRF, because it sends a GET request to a URL to retrieve data. This is the implementation of the remote template loading (used as Ajax):
The GetStringAsync function sends an HTTP GET request to the templateUri and returns the data as JSON. There is no validation on whether the host is internal or external (and it supports IPv6). Therefore, this method is a perfect candidate for SSRF. Although this allows only GET requests, as we’ve seen above, it’s sufficient for accessing the DataService.
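To show what the missing check could look like, here is a minimal sketch in Python of a host validation step that a template loader might run before fetching a URL. This is an illustration under our own assumptions, not Microsoft's fix. The function names are hypothetical, and the policy (reject loopback, private, link-local and unspecified addresses, including IPv6 and IPv4-mapped IPv6) is only one reasonable choice.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

def resolve_addresses(host: str):
    """Return the IP addresses a host name (or IP literal) refers to."""
    try:
        return [ipaddress.ip_address(host)]  # literal IPv4/IPv6, no DNS needed
    except ValueError:
        pass
    infos = socket.getaddrinfo(host, None)
    # Drop any IPv6 scope id ("%eth0") before parsing.
    return [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]

def is_internal_address(addr) -> bool:
    # Unwrap IPv4-mapped IPv6 such as ::ffff:127.0.0.1.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return (addr.is_loopback or addr.is_private
            or addr.is_link_local or addr.is_unspecified)

def is_internal_url(url: str) -> bool:
    """True if the URL points at an internal host (or cannot be checked)."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # fail closed on URLs without a host
    try:
        addresses = resolve_addresses(host)
    except (socket.gaierror, UnicodeError):
        return True  # fail closed when the name cannot be resolved
    return any(is_internal_address(a) for a in addresses)
```

A check like this only helps if the fetch then connects to the address that was validated. Resolving the name a second time at request time leaves room for DNS rebinding.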
So let’s use an example. We want to get a screenshot of the machine whose ID is f6789665-5e37-45b8-96d9-7d7d55b59be6, at 800×600 dimensions:
The response we got is Base64 encoded raw image data.
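As a rough illustration of the decoding step, the sketch below (in Python) turns such a response into a viewable image file. It rests on one assumption that should be checked against the real response: that the decoded bytes are 16-bit RGB565 pixels in little-endian order, which is the format Hyper-V documents for GetVirtualSystemThumbnailImage. Whether DataService passes that data through unchanged is not confirmed here, and the function names are our own.

```python
import base64

def rgb565_to_rgb888(raw: bytes) -> bytes:
    """Expand little-endian RGB565 pixels to 8-bit-per-channel RGB."""
    if len(raw) % 2:
        raise ValueError("RGB565 data must have an even number of bytes")
    out = bytearray()
    for i in range(0, len(raw), 2):
        value = raw[i] | (raw[i + 1] << 8)  # little-endian 16-bit pixel
        r = (value >> 11) & 0x1F            # 5 bits of red
        g = (value >> 5) & 0x3F             # 6 bits of green
        b = value & 0x1F                    # 5 bits of blue
        # Replicate the high bits so 0x1F maps to 255 rather than 248.
        out += bytes(((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2)))
    return bytes(out)

def decode_screenshot(b64_text: str, width: int, height: int) -> bytes:
    """Return a binary PPM (P6) image built from a Base64 RGB565 payload."""
    raw = base64.b64decode(b64_text)
    if len(raw) != width * height * 2:  # assumption: 2 bytes per pixel
        raise ValueError("payload size does not match the given dimensions")
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + rgb565_to_rgb888(raw)
```

Writing the returned bytes to a `.ppm` file gives an image that most viewers can open. PPM is used here only because it needs no third-party library.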
We can now take the data we got and transform it into an actual image. Here is an example using PowerShell:
We will get this image:
In this part, we showed how a small logical bug can sometimes be leveraged into a serious issue. In our case, because DataService didn’t require authentication, this eventually allowed us to get screenshots and information about tenants and infrastructure machines.
In the second part, we will take a deep dive into Azure App Service internals and examine its architecture, attack vectors, and demonstrate how a critical vulnerability we found in one of its components affected Azure Cloud.
The SSRF vulnerability (CVE-2019-1234) was disclosed and fixed by Microsoft, and was awarded $5,000 from Microsoft’s bug bounty program.
The unauthenticated internal API issue had also been discovered separately by Microsoft, and had been addressed in late 2018 in the Azure Stack 1811 update.
In the next part, we disclose a critical vulnerability we found in the Azure App Service.