Self-hosted Gitlab Runners on AKS, with Managed Identities
This post is long overdue; I've been meaning to write it for a long time but never got around to it. So here goes!
The code used in this post is located in https://gitlab.com/ascodenl/gitlab-runner-aks
The great thing about using Managed Identities in Azure is that they cannot be (ab)used to run elsewhere, like Service Principals can. Yes, you can apply conditional access policies to workload identities nowadays, but SPs are basically just another username/password combination that can be abused for purposes other than what they were meant for. Managed Identities are a special type of Service Principal (which is an Azure Enterprise Application Registration under the hood) that can only be used when attached to an Azure resource. When connected to a Kubernetes service account, the managed identity's permissions can be assumed by the pod that has that service account attached. This allows for fine-grained permissions for specific purposes, for specific Gitlab runners. For example, a runner that is purpose-built for building container images and pushing them to Azure Container Registry can use a Managed Identity that only has the AcrPush permission on the ACR(s) it needs to push to.
The case for self-hosted Gitlab Runners
"Why not use the Gitlab provided runners"? I hear that quite often. In summary, self-hosted GitLab runners are ideal when you need maximum control, security, and flexibility for your CI/CD jobs, want to optimize costs at scale, or have specific compliance and infrastructure requirements that (what Gitlab calls) Instance runners
cannot meet. Yes, it requires maintenance but it outweighs the added benefits - especially in environments with heavy compliance requirements like the Financial industry.
In slightly more detail, there are several reasons:
1. Full Control Over the Build Environment
- Self-hosted runners give complete control over the (virtual) hardware, operating system, and software installed on the machines (in our case, Kubernetes) running CI/CD jobs. This allows you to customize environments to match your production setup, install proprietary or legacy tools, and fine-tune performance for specific workloads.
2. Security and Compliance
- By running jobs on your own infrastructure, you can ensure sensitive code and data never leave your network. This is especially important for organizations with strict compliance, data residency, or privacy requirements, such as those in the public sector.
- Self-hosted runners can be placed behind firewalls or VPNs, further reducing exposure to potential threats.
3. Cost Efficiency and Scalability
- For teams with high CI/CD usage, self-hosted runners can be more cost-effective than paying for shared or cloud-hosted runners, especially at scale. You avoid per-minute billing and can utilize existing cloud resources as needed.
- You can scale your runners as your needs grow.
4. Performance and Flexibility
- Self-hosted runners can be optimized for your specific workloads, providing faster builds and more reliable performance than shared runners.
- You can run jobs on specialized hardware (e.g., GPUs, large-memory machines) or in specific environments (e.g., on-premises, in particular cloud regions) that aren't available with GitLab-hosted runners.
5. Advanced Customization
- You can create custom caching strategies, use local Docker registries, or integrate with internal systems and services that aren't accessible from public runners.
- Self-hosted runners allow for advanced monitoring, logging, and debugging, giving greater visibility into CI/CD processes.
Types of runners
Gitlab has 3 types of runners:
- Instance runners: Shared runners that are built, deployed and managed by Gitlab for generic use.
- Group runners: Deployed at the Gitlab group level and inherited by all repositories (projects) that are part of that group. Self-hosted and self-managed.
- Project runners: Deployed on a single repository. Self-hosted and self-managed.
Types of runtime environments
When registering a GitLab runner, you must select an executor, which determines the environment in which your CI/CD jobs will run. Each executor offers different levels of isolation, scalability, and compatibility, making them suitable for various scenarios.
| Executor | Isolation | Typical Use Case | Pros | Cons |
|---|---|---|---|---|
| Shell | Low | Simple, local jobs | Easy, minimal setup | Low isolation, less secure |
| Docker | High | Reproducible, isolated builds | Clean, scalable, supports services | Needs Docker, some limits |
| Docker Autoscaler | High | Scalable cloud builds | Auto-scales, cloud support | Complex setup |
| Instance | Very High | Full VM per job, high isolation | Max isolation, flexibility | Resource intensive |
| Kubernetes | High | Cloud-native, Kubernetes environments | Scalable, cloud integration | Needs Kubernetes |
| SSH | Varies | Remote, legacy, or custom environments | Remote execution | Limited support |
| VirtualBox/Parallels | High | VM-based isolation on local hardware | Good isolation | Slower, needs virtualization |
| Custom | Varies | Anything not covered above | Flexible | Requires custom scripts |
Choosing the right executor depends on your project's requirements for isolation, scalability, environment, and available infrastructure.
Our default platform of choice is Kubernetes; this article covers the Azure implementation of Kubernetes, called Azure Kubernetes Service (AKS).
Creating infrastructure in Azure (or any environment, for that matter) is done using Infrastructure as Code (IaC). The tool of choice is Terraform or OpenTofu, whatever your preference. The idea is to let a CI/CD pipeline handle the creation, updating, and destruction of Azure resources, using Gitlab Runners on AKS. For this, we need several resources to make that happen.
Here is a quick overview of what we are building:
Azure configuration
The runner will use a Kubernetes Service Account, which is "connected" to an Azure Managed Identity that will be assigned roles with permissions to create resources in Azure. You will need an AKS cluster with the OIDC issuer enabled. Read here how to enable it if it is not configured yet.
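If you also manage the cluster with Terraform, these are the two cluster features this setup relies on. A minimal sketch (resource names, location and node sizes are assumptions, not taken from the repo):

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "aks-gitlab-runners"
  location            = "westeurope"
  resource_group_name = "rg-gitlab-runners"
  dns_prefix          = "gitlabrunners"

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_D2s_v5"
  }

  identity {
    type = "SystemAssigned"
  }

  # The OIDC issuer and workload identity are what allow a pod to exchange its
  # projected service account token for an Entra ID token later on.
  oidc_issuer_enabled       = true
  workload_identity_enabled = true
}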
The Managed Identity is created as a separate resource, outside of any Terraform module. The main reason for this is that we use RBAC to assign permissions on Azure resources. If you want to allow a Managed Identity access to Entra ID (primarily to read groups), assigning those permissions in Entra ID requires elevated privileges that we do not want to delegate to a Managed Identity. To prevent (re)creation of the Managed Identity as part of a module, we create it separately.
locals {
  runners = {
    # one entry per runner; the "tf" runner deploys Terraform resources
    tf = {}
  }
}
Note that we make the MSI specific to deploying Terraform resources (azurerm_user_assigned_identity.gitlab_runner["tf"].principal_id) owner of the subscriptions. Contributor is not going to be enough, as we also want to use the pipeline to do role assignments and RBAC; therefore it needs Owner permissions.
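As a sketch of what that looks like (the location, resource group and subscription variable are assumptions, not taken from the repo):

# One user-assigned identity per runner defined in local.runners
resource "azurerm_user_assigned_identity" "gitlab_runner" {
  for_each            = local.runners
  name                = "msi-gitlab-runner-${each.key}"
  location            = "westeurope"
  resource_group_name = "rg-gitlab-runners"
}

# The "tf" identity gets Owner on the subscription, because Contributor
# cannot create role assignments.
resource "azurerm_role_assignment" "subscription_owner" {
  scope                = "/subscriptions/${var.subscription_id}"
  role_definition_name = "Owner"
  principal_id         = azurerm_user_assigned_identity.gitlab_runner["tf"].principal_id
}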
⚠️ Warning: This means that this MSI has very powerful privileges! Make sure you lock down your pipelines so that not just anyone can run them and make sure you do proper Merge Requests and code reviews!
Kubernetes resources
Remember, we are using Kubernetes as the Gitlab executor. What we are deploying is what can be described as a "runner manager", which will spin up containers (pods, actually) that run the pipeline. Once the pipeline is finished, the pod is destroyed.
The Gitlab Runner is deployed using Helm. Gitlab maintains a Helm chart that you can find on https://gitlab.com/gitlab-org/charts/gitlab-runner/.
Gitlab Runner configuration is done in a config.toml file that we deploy using a template.
gitlabUrl: "${gitlab_url}"
unregisterRunners: true
terminationGracePeriodSeconds: 3600
Then we use Terraform to transform the template and deploy the Helm chart:
locals {
  gitlab_runner_vars = {
    gitlab_url = var.gitlab_url
  }
}
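A sketch of the Helm release itself (the template path is an assumption; pin the chart version you find with helm search below):

resource "helm_release" "gitlab_runner" {
  name       = "gitlab-runner-${random_id.this.hex}"
  namespace  = kubernetes_namespace_v1.gitlab.metadata[0].name
  repository = "https://charts.gitlab.io"
  chart      = "gitlab-runner"
  # Optionally pin the chart version found with helm search, e.g.:
  # version  = "0.71.0"

  # Render the values template with the variables defined above
  values = [
    templatefile("${path.module}/templates/values.yaml.tpl", local.gitlab_runner_vars)
  ]
}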
To check for the latest version(s) of the chart:
helm repo add gitlab-runner https://charts.gitlab.io
helm repo update
helm search repo -l gitlab-runner/gitlab-runner | head -5
The last part is to create the Kubernetes Service Account that "glues" the Managed Identity to the Gitlab Runner:
resource "kubernetes_service_account_v1" "gitlab_runner" {
metadata {
name = "gitlab-runner-${random_id.this.hex}"
namespace = kubernetes_namespace_v1.gitlab.metadata[0].name
annotations = {
"azure.workload.identity/client-id" = var.msi_client_id
"azure.workload.identity/tenant-id" = var.tenant_id
}
labels = {
"azure.workload.identity/use" = "true"
}
}
}
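What ties the two together on the Azure side is a federated identity credential on the Managed Identity, whose subject must match this service account. A sketch, assuming the cluster resource sketched earlier (azurerm_kubernetes_cluster.aks) and an assumed resource group name:

resource "azurerm_federated_identity_credential" "gitlab_runner" {
  name                = "gitlab-runner-${random_id.this.hex}"
  resource_group_name = "rg-gitlab-runners"
  parent_id           = azurerm_user_assigned_identity.gitlab_runner["tf"].id
  audience            = ["api://AzureADTokenExchange"]
  issuer              = azurerm_kubernetes_cluster.aks.oidc_issuer_url
  # The subject must match the namespace and name of the service account above
  subject             = "system:serviceaccount:${kubernetes_namespace_v1.gitlab.metadata[0].name}:${kubernetes_service_account_v1.gitlab_runner.metadata[0].name}"
}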
Finally, some required Kubernetes resources to make this all work. (I set rbac.create = false in the Helm values because I like to be in control of what is created. Setting it to true means the Service Account gets auto-created, and you then have to make sure it is annotated with the correct values.)
resource "kubernetes_namespace_v1" "gitlab" {
metadata {
name = "gitlab-${random_id.this.hex}"
Runner registration
ℹ️ Note: This article describes the new way of registering runners. Please see https://docs.gitlab.com/ci/runners/new_creation_workflow/ for how to migrate.
When deploying a runner, it needs to be registered against a group or repository. Each group or repository has its own unique token. You can do this from the CI/CD settings of the repository or group: create a project or group runner, fill in the details, and out comes a registration token. But who wants to do that manually? Let's automate this.
Terraform has a great provider for Gitlab, found on https://registry.terraform.io/providers/gitlabhq/gitlab/latest/docs. You can use it to fully automate your Gitlab environment, including repositories, groups, authorizations, integrations, etc. We'll focus on the gitlab_user_runner resource to get the registration token.
Each group or project in Gitlab has a unique id, which is hard to find and even harder to remember. We use the path to find the id, which is a lot easier to remember. If you use Terraform to also create your groups and projects, you can even reference the Terraform resource!
Terraform provider configuration is required for Gitlab and Kubernetes (the registration token is stored in a Kubernetes secret):
provider "kubernetes" {
config_path = "~/.kube/config" # Need to create this file from the pipeline or run locally
}
provider "gitlab" {
base_url = "https://gitlab.com/"
token = data.azurerm_key_vault_secret.gitlab_token.value # this can be a Group token or a PAT token with the create_runner scope
}
First, we need to determine if we are deploying a group runner or a project runner:
data "gitlab_group" "group" {
count = var.runner_type == "group_type" ? 1 : 0
full_path = var.repo_path
}
data "gitlab_project" "project" {
count = var.runner_type == "project_type" ? 1 : 0
path_with_namespace = var.repo_path
}
repo_path is the path to your repo, for example ascodenl/infra/tools.
Then, depending on which type of runner you want, create a token and store it in a Kubernetes secret. Note the reference to a Kubernetes Service Account; this will become clear later on.
resource "gitlab_user_runner" "gitlab_runner_project" {
count = var.runner_type == "project_type" ? 1 : 0
runner_type = var.runner_type
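The token then goes into a Kubernetes secret that the Helm chart reads via runners.secret. A sketch, assuming only the project-type runner shown above:

resource "kubernetes_secret_v1" "gitlab_runner_token" {
  metadata {
    name      = "gitlab-runner-token-${random_id.this.hex}"
    namespace = kubernetes_namespace_v1.gitlab.metadata[0].name
  }

  data = {
    # With the new registration workflow, the glrt-... authentication token
    # goes into runner-token and runner-registration-token stays empty.
    runner-registration-token = ""
    runner-token              = gitlab_user_runner.gitlab_runner_project[0].token
  }
}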
Using this in a Gitlab pipeline
Now that the runner is deployed with the proper permissions, it is time to create a pipeline to implement this in CI/CD.
Creating a full multi-environment pipeline is material for a separate blog post, so here is the most important part:
before_script:
- |
if ! [ -x "$(command -v az)" ]; then
echo -e "\e[33mWarn: az is not installed.\e[0m"
exit 1
else
echo "Logging in to Azure using client_id $AZURE_CLIENT_ID..."
az login --service-principal -u $AZURE_CLIENT_ID --tenant $AZURE_TENANT_ID --federated-token $(cat $AZURE_FEDERATED_TOKEN_FILE)
if [[ ! -z ${ARM_SUBSCRIPTION_NAME} ]]; then az account set -n ${ARM_SUBSCRIPTION_NAME}; fi
export ARM_OIDC_TOKEN=$(cat $AZURE_FEDERATED_TOKEN_FILE)
export ARM_CLIENT_ID=$AZURE_CLIENT_ID
export ARM_TENANT_ID=$AZURE_TENANT_ID
fi
If OIDC is working correctly, the Azure token is stored in a file referenced by $AZURE_FEDERATED_TOKEN_FILE, which usually points to /var/run/secrets/azure/tokens/azure-identity-token. Enabling workload identity on AKS deploys something called a "Mutating Admission Webhook", which takes care of projecting a token with a limited lifetime into the pod and refreshing it on expiration. If you are interested in how this works under the hood, look here.
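On the Terraform side, the azurerm provider can pick up those exported variables once OIDC authentication is enabled. A minimal sketch (the subscription id would still come from ARM_SUBSCRIPTION_ID or the provider block):

provider "azurerm" {
  features {}

  # Reads ARM_CLIENT_ID, ARM_TENANT_ID and ARM_OIDC_TOKEN exported in the before_script
  use_oidc = true
}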
I hope you enjoyed this, thanks for sticking around till the end. Until the next!