Auto-scaling Azure's SignalR Service via a PowerShell Function

Introduction

I am currently working on a project for a rather large cycling event whose organizers wish to display a live race progress map on their website. To do this, the organizers have partnered with a 3rd party service provider who can equip the pace vehicle in front of the race with a GPS tracker and provide an API from which we can access the car's current location.

Now, the cycling event generates a lot of web traffic, and on race days it can reach in excess of 30,000 concurrent connections. So rather than having every client connect directly to the tracking API, we thought a better approach would be to use a bunch of Azure services to offload these demands from both the website and the 3rd party API. To do this we decided to go with the following setup:

  • 1 x Azure SignalR Service: acts as our message hub, pushing the current car location out to all connected clients
  • 1 x timer-triggered Azure Function: connects to the 3rd party API to retrieve the current car location and pushes it out to the SignalR hub
  • 1 x HTTP-triggered Azure Function: acts as the SignalR negotiation endpoint for connecting web clients to the SignalR hub
  • 1 x map web page: the page on the site that displays a Mapbox map and has a basic SignalR JavaScript client, which connects to the SignalR Service, listens for car location messages and updates a map marker accordingly

I'm not going to go into the code for the various parts here (maybe I'll cover them in another post), but within a day we had a working solution with all the components connected exactly as we had hoped, culminating in a map marker automatically moving around a map. Unfortunately I can't show it just yet as the project is yet to go live, but I'm sure you can picture a marker on a map 😁

Our biggest issue, however, came when we started to plan for the levels of traffic the site could expect.

The Problem

The problem was that, because of the nature of the event and the fact that races only last a few hours, the traffic we were expecting on the site wasn't going to be consistent throughout the day. The day would start with basically no connections, but once the race started it could ramp up very quickly to the 30k+ mark and then come crashing back down once the race was over. Compounding this was the fact that it was a multi-race event, with races throughout the week and sometimes multiple races per day, which meant traffic levels to the site were going to be a bit of a roller coaster.

If you were thinking what I was at the start of the project, you'd probably be asking "Isn't that what Azure is meant to do, taking all the scaling and resource management away so you don't have to worry about it?". Well, when it comes to the SignalR Service, that doesn't appear to be the case.

Now, it does offer scaling in a sense, but unfortunately it is completely manual, so it would require us to scale the resources up and down as demand dictated by physically dragging a slider back and forth.

[Image: Azure SignalR Service scaling slider]

Well, needless to say, this wasn't going to work for us, so we had to find another, more automated solution.

The Solution

Thankfully, most Azure services can be provisioned and altered through various APIs, and the SignalR Service was no exception. Unfortunately, it looked like nobody else had done this before. I did find one chap who had asked Microsoft support the same question; I reached out to him and he was kind enough to share his approach, but by that point I'd already started on my own solution, and by his own admission his solution was far from ideal anyway.

Our solution to the problem was to create another Azure Function, this time using PowerShell and the Az PowerShell module to perform the management tasks for us. This would be a timer-triggered function, scheduled to run every 5 minutes, performing the following tasks:

  1. Connect to Azure by authenticating as a service principal linked to the Azure account we wish to manage, setting the context for the following actions
  2. Fetch information about the current configuration of the Azure SignalR Service resource
  3. Query Azure's metrics API for the maximum number of concurrent connections in the 5 minutes since the function last ran
  4. Determine whether we are approaching, or comfortably below, the connection quota of the current unit count and, if so, which unit count we should scale to
  5. If we have determined that we are not on the optimum unit count, update the resource to the new one
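
Step 4 is the only part with any real decision-making in it: pick the smallest supported unit count whose threshold (units x connections per unit x scale threshold) is still above the observed peak. As a minimal sketch, here is that logic pulled out into a standalone function so it can be tried without touching Azure. The name Get-TargetUnitCount is hypothetical, and its defaults simply mirror the values used in run.ps1 below.

```powershell
# Hypothetical helper mirroring the unit-count loop in run.ps1: return the
# smallest supported unit count whose scale threshold is above the peak.
function Get-TargetUnitCount {
    param(
        [double]$MaxConnectionCount,
        [int[]]$UnitCounts = @(1, 2, 5, 10, 20, 50, 100),
        [int]$ConnectionsPerUnit = 1000,
        [double]$ScaleThreshold = 0.95
    )

    foreach ($unitCount in $UnitCounts) {
        # e.g. 5 units x 1000 connections x 0.95 = a threshold of 4,750
        $threshold = $unitCount * $ConnectionsPerUnit * $ScaleThreshold
        if ($threshold -gt $MaxConnectionCount) {
            return $unitCount
        }
    }

    # The peak is beyond even the largest tier's threshold: stay at the maximum
    return $UnitCounts[-1]
}

Get-TargetUnitCount -MaxConnectionCount 0       # returns 1
Get-TargetUnitCount -MaxConnectionCount 4800    # returns 10 (above the 5-unit threshold of 4,750)
Get-TargetUnitCount -MaxConnectionCount 30000   # returns 50
```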

Side note: I'm making this all seem matter-of-fact, as if this was the plan from the get-go, but the truth is I actually know very little about Azure Functions, PowerShell and Azure's APIs, so this literally took me days to piece together. If I told you about every single step, though, this post would get a little long 😁

The Code

To create my function, I followed a tutorial on getting started with PowerShell Azure Functions, which set up my folder structure and boilerplate files. What follows are just the key files you'll want to modify, along with the code required to carry out the plan above.

function.json

{
  "bindings": [
    {
      "name": "Timer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 */5 * * * *"
    }
  ]
}
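
The schedule value is a six-field NCRONTAB expression in which the first field is seconds, so "0 */5 * * * *" fires at second 0 of every fifth minute. To make the field order explicit, here is a small sketch that labels each field of the expression:

```powershell
# Split the NCRONTAB expression from function.json and label each field.
# Azure Functions timer triggers use six fields, with seconds first.
$schedule = "0 */5 * * * *"
$fieldNames = 'second', 'minute', 'hour', 'day', 'month', 'day-of-week'
$fields = $schedule -split '\s+'

$labelledFields = [ordered]@{}
for ($i = 0; $i -lt $fieldNames.Count; $i++) {
    $labelledFields[$fieldNames[$i]] = $fields[$i]
}

# Shows second = 0, minute = */5 and * for every other field
$labelledFields
```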

local.settings.json

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "powershell",
    "ServicePrincipalClientId": "{YOUR_SERVICE_PRINCIPAL_CLIENT_ID}",
    "ServicePrincipalKey": "{YOUR_SERVICE_PRINCIPAL_KEY}",
    "ServicePrincipalTenantId": "{YOUR_SERVICE_PRINCIPAL_TENANT_ID}",
    "SignalRResourceId": "{YOUR_SIGNALR_SERVICE_RESOURCE_ID}"
  }
}
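
The Values in local.settings.json are exposed to the function as environment variables, which is why run.ps1 reads them via $env:. If one of them is missing, the problem can surface later as a less obvious authentication or resource error, so a fail-fast check may help. A minimal sketch, using two hypothetical helpers (Get-MissingSetting and Assert-RequiredSetting) that are not part of the original function:

```powershell
# Hypothetical helper: return the names of any settings that are unset or empty
function Get-MissingSetting {
    param([string[]]$Name)
    @($Name | Where-Object {
        [string]::IsNullOrWhiteSpace([Environment]::GetEnvironmentVariable($_))
    })
}

# Hypothetical helper: stop with a clear message if any required setting is missing
function Assert-RequiredSetting {
    param([string[]]$Name)
    $missing = @(Get-MissingSetting -Name $Name)
    if ($missing.Count -gt 0) {
        throw "Missing app settings: $($missing -join ', ')"
    }
}
```

Calling Assert-RequiredSetting -Name 'ServicePrincipalClientId', 'ServicePrincipalKey', 'ServicePrincipalTenantId', 'SignalRResourceId' near the top of run.ps1 would then stop the run with a clear message instead.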

run.ps1

# Input bindings
param($Timer)

# Variable definitions
$resourceId = $env:SignalRResourceId

$connectionsPerUnit = 1000          # Number of concurrent connections you can have per unit
$unitCounts = 1,2,5,10,20,50,100    # Supported SignalR unit counts
$scaleThreshold = .95               # Fraction of a tier's connection quota at which to scale up

# Authenticate the service principal
$clientId = $env:ServicePrincipalClientId
$key = $env:ServicePrincipalKey
$securePassword = ConvertTo-SecureString $key -AsPlainText -Force
$credentials = New-Object System.Management.Automation.PSCredential($clientId, $securePassword)
$tenantId = $env:ServicePrincipalTenantId

Connect-AzAccount -ServicePrincipal -Credential $credentials -Tenant $tenantId

# Get information about the current resource state
$signalRResource = Get-AzResource -ResourceId $resourceId -Verbose
$currentUnitCount = [int]$signalRResource.Sku.Capacity

# Only scale if we are on the Standard_S1 plan
if ($signalRResource.Sku.Name -eq "Standard_S1") {

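    # Get the peak connection count for the last 5 minutes. The window can span two
    # 5-minute time grains, so take the maximum across every data point returned
    # rather than only the first one
    $connectionCountMetric = Get-AzMetric -ResourceId $resourceId -MetricName "ConnectionCount" -TimeGrain 00:05:00 -StartTime (Get-Date).AddMinutes(-5) -AggregationType Maximum
    $maxConnectionCount = ($connectionCountMetric.Timeseries.Data |
        Where-Object { $null -ne $_.Maximum } |
        Measure-Object -Property Maximum -Maximum).Maximum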

    # Calculate the target unit count
    $targetUnitCount = 1
    foreach ($unitCount in $unitCounts) {
        $unitCountConnections = $unitCount * $connectionsPerUnit
        $unitCountConnectionsThreshold = $unitCountConnections * $scaleThreshold
        if ($unitCountConnectionsThreshold -gt $maxConnectionCount -or $unitCount -eq $unitCounts[-1]) {
            $targetUnitCount = $unitCount
            Break
        }
    }

    # See if we need to change the unit count
    if ($targetUnitCount -ne $currentUnitCount) {

        Write-Host "Scaling resource to unit count: " $targetUnitCount
                
        # Change the resource unit count
        $signalRResource.Sku.Capacity = $targetUnitCount
        $signalRResource | Set-AzResource -Force
        
    } else {

        Write-Host "Not scaling as resource is already at the optimum unit count: " $currentUnitCount

    }

} else {

    Write-Host "Can't scale as resource is not on a scalable plan: " $signalRResource.Sku.Name

}
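
With the values used in run.ps1 ($connectionsPerUnit = 1000 and $scaleThreshold = .95), each tier is treated as full at 95% of its connection quota. The short script below works out those trigger points for every supported unit count, which makes it easier to predict where the function will settle for a given peak:

```powershell
# For each supported unit count, work out the peak connection count at or
# above which run.ps1 would treat that tier as full and pick a larger one.
$connectionsPerUnit = 1000
$unitCounts = 1, 2, 5, 10, 20, 50, 100
$scaleThreshold = .95

$tiers = foreach ($unitCount in $unitCounts) {
    [pscustomobject]@{
        Units         = $unitCount
        ConnectionCap = $unitCount * $connectionsPerUnit
        # Rounded to a whole number of connections for readability
        ScaleUpAt     = [int][math]::Round($unitCount * $connectionsPerUnit * $scaleThreshold)
    }
}

$tiers | Format-Table -AutoSize
```

With the 30,000+ peak mentioned earlier, the function would settle on 50 units: 30,000 is above the 20-unit trigger of 19,000 but below the 50-unit trigger of 47,500.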

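I think the code itself should be pretty self-explanatory, as it follows the structure of the plan and I've added comments to explain the key elements throughout. The important thing really is to update the local.settings.json file with the relevant values: see the Azure documentation for how to create a service principal (it needs permission to read metrics for, and update, the SignalR resource), and you can find the SignalRResourceId in the Properties section of the SignalR Service within the Azure portal. Bear in mind that local.settings.json is only used when running locally; once deployed, the same values need to be added as application settings on the Function App.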
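
If you'd rather build the resource ID than copy it from the portal, it follows the standard Azure Resource Manager format. A minimal sketch; the helper name and the example subscription, resource group and resource names are all hypothetical:

```powershell
# Hypothetical helper: assemble the ARM resource ID of a SignalR Service
# instance from its subscription ID, resource group and resource name.
function Get-SignalRResourceId {
    param(
        [Parameter(Mandatory)][string]$SubscriptionId,
        [Parameter(Mandatory)][string]$ResourceGroupName,
        [Parameter(Mandatory)][string]$Name
    )
    "/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroupName/providers/Microsoft.SignalRService/SignalR/$Name"
}

# Example values only
Get-SignalRResourceId -SubscriptionId '00000000-0000-0000-0000-000000000000' -ResourceGroupName 'race-tracking' -Name 'race-signalr'
```

If you're already signed in with the Az module, Get-AzResource -ResourceType 'Microsoft.SignalRService/SignalR' should also list your SignalR instances along with their ResourceId values.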

Once you have the code implemented, you can refer back to the getting started with PowerShell Azure Functions tutorial for instructions on how to deploy your function.

And with that, you now have a fully configured SignalR Service complete with auto-scaling.

Caveats

Whilst the function I've presented will scale the SignalR Service up and down for you based on demand, it's worth knowing that the service is currently billed per day, rather than on usage like many other Azure services (this caught me out). This means that if you scale up to a larger unit count, even if you scale back down on the next function call, you will be charged for the whole day at the highest unit count you reached within that day.
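
To make the effect of per-day billing concrete, here is a simplified illustration using a handful of invented example readings: the unit count after successive scaling checks on a single race day. Under the billing model described above, the day is charged at the peak, not the average:

```powershell
# Illustrative only: hypothetical unit counts observed after successive
# scaling checks on one race day (not real data).
$unitCountsDuringDay = 1, 1, 5, 50, 20, 5, 1

# Per-day billing charges the whole day at the highest unit count reached...
$billedUnits = ($unitCountsDuringDay | Measure-Object -Maximum).Maximum

# ...even though, on average, far fewer units were actually in use
$averageUnits = ($unitCountsDuringDay | Measure-Object -Average).Average

"Billed at $billedUnits units for the day; average in use was $([math]::Round($averageUnits, 1))"
```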

In Closing

I have passed some of my experience on to a member of the Azure support team, so I am hoping that both auto-scaling and per-minute billing will come to the SignalR Service in the future. Until then, I hope the above code will save you much of the "pain" I had to endure to get this implemented.