Part 02: Building a scalable Jenkins server on a Kubernetes Cluster


Introduction

In Part 1, we introduced the system architecture and its value to the business. This part walks through the infrastructure implementation with Terraform (Infrastructure as Code): we dig into the code for the different parts of the system (VPC, subnets, Helm releases, etc.) and also cover the configuration of our star component, the Jenkins server.

System Implementation

The implementation is divided into three main parts:

  • AWS infrastructure provisioning
  • Jenkins configuration
  • Monitoring tools configuration

AWS Cloud Components

In this section, we show the implementation details for the AWS components, along with a short explanation of each block of code.

AWS VPC

resource "aws_vpc" "seitech-vpc" {
  cidr_block           = var.vpc_cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true


  tags = {
    Name = "${var.cluster_name}-vpc"
  }
}

This block uses the aws_vpc resource from the Terraform AWS provider. seitech-vpc is just a local name used to reference this object from other Terraform code.

To build a VPC, you just need to define a CIDR block (the address space). This CIDR block is defined in the attribute cidr_block and is assigned from the variables.

enable_dns_support and enable_dns_hostnames turn on DNS resolution and DNS hostnames inside the VPC. Both must be true if you want to reach the Amazon EKS cluster's Kubernetes API server through its private endpoint.

tags are just key/value pairs assigned to the VPC. Here we use string interpolation with the cluster_name variable to build the Name tag.
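
The cluster_name variable, used throughout the tags, is declared like any other input variable. A minimal sketch (the default value here is only an illustration):

variable "cluster_name" {
  type        = string
  default     = "seitech-cluster" # illustrative default; use your own cluster name
  description = "Name used for the EKS cluster and for tagging related resources"
}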

The vpc_cidr_block variable definition might look like this:

variable "vpc_cidr_block" {
  type        = string
  default     = "10.0.0.0/16"
  description = "VPC CIDR Block"
}

This variable can be overridden with any other valid CIDR block through a tfvars file or the Terraform CLI; otherwise the default value is used.
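
For example, a hypothetical terraform.tfvars entry (or the equivalent -var flag) could override the default:

# terraform.tfvars (hypothetical override)
vpc_cidr_block = "10.10.0.0/16"

# or on the command line:
# terraform apply -var="vpc_cidr_block=10.10.0.0/16"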

AWS Internet Gateway

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.seitech-vpc.id

  tags = {
    Name = "${var.cluster_name}-gw"
  }
}

The structure is the same as for the VPC. The difference is that an Internet Gateway needs a vpc_id, which we reference from the VPC object defined above.

AWS Public and Private Subnet

Public Subnet

We discussed the overall architecture of the system in Part 1. That architecture uses two public subnets in different availability zones. The code below shows one of them; the other is identical except for the availability zone (a sketch of it follows below).

resource "aws_subnet" "public-seitech-subnet1" {
  vpc_id                  = aws_vpc.seitech-vpc.id
  cidr_block              = var.public_subnet1_cidr_block
  availability_zone       = var.public_subnet1_az
  map_public_ip_on_launch = true

  tags = {
    "Name"                                      = "${var.cluster_name}-public-subnet1"
    "kubernetes.io/role/elb"                    = "1"
    "kubernetes.io/cluster/${var.cluster_name}" = "owned"
  }
}

The tags deserve a closer look. The first one is the familiar Name tag. The other two are required by the AWS EKS documentation: kubernetes.io/role/elb marks the subnet as a candidate for public (internet-facing) load balancers, and kubernetes.io/cluster/<cluster-name> = owned tells the cluster that it owns the resources in this subnet.

map_public_ip_on_launch automatically assigns a public IP to instances launched in this public subnet.
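
For completeness, a sketch of the second public subnet; the resource name matches the references used later in the route table associations, while the public_subnet2_* variable names are assumptions:

resource "aws_subnet" "public-seitech-subnet2" {
  vpc_id                  = aws_vpc.seitech-vpc.id
  cidr_block              = var.public_subnet2_cidr_block # assumed variable name
  availability_zone       = var.public_subnet2_az         # the second availability zone
  map_public_ip_on_launch = true

  tags = {
    "Name"                                      = "${var.cluster_name}-public-subnet2"
    "kubernetes.io/role/elb"                    = "1"
    "kubernetes.io/cluster/${var.cluster_name}" = "owned"
  }
}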

Private Subnet

resource "aws_subnet" "private-seitech-subnet1" {
  vpc_id            = aws_vpc.seitech-vpc.id
  cidr_block        = var.private_subnet1_cidr_block
  availability_zone = var.public_subnet1_az

  tags = {
    "Name"                                      = "${var.cluster_name}-private-subnet1"
    "kubernetes.io/role/internal-elb"           = "1"
    "kubernetes.io/cluster/${var.cluster_name}" = "owned"
  }
}

The private subnet has the same structure as before, except for the second tag. Per the AWS EKS documentation, kubernetes.io/role/internal-elb marks the subnet as usable for internal load balancers, i.e. for the private (not public) side of the cluster.

NAT Gateways

A NAT gateway is a Network Address Translation (NAT) service. We use a NAT gateway so that instances in a private subnet can connect to services outside the VPC but external services cannot initiate a connection with those instances.

To create a NAT gateway we need to have the following attributes:

  • allocation_id: the elastic IP to associate with the NAT gateway.
  • subnet_id: the subnet id where the NAT resides.

The following code shows how to create an Elastic IP by setting the vpc attribute to true (on newer versions of the AWS provider this attribute has been replaced by domain = "vpc").

resource "aws_eip" "nat_eip1" {
  vpc = true

  tags = {
    Name = "nat-eip1"
  }
}
resource "aws_nat_gateway" "nat1" {
  allocation_id = aws_eip.nat_eip1.id
  subnet_id     = aws_subnet.public-seitech-subnet1.id

  tags = {
    Name = "${var.cluster_name}-NatGatewayAZ1"
  }

  depends_on = [aws_eip.nat_eip1, aws_subnet.public-seitech-subnet1]
}

The NAT gateway attributes are assigned by referencing the corresponding objects. One thing to note is depends_on. This is a Terraform meta-argument that explicitly declares the resources that must exist before this one is created; here it lists the EIP and the public subnet. Strictly speaking, the attribute references above already create implicit dependencies, so the explicit list mainly documents the intent and guards against ordering errors.

Another EIP and NAT gateway are defined for the other public subnet; only the names differ.
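
A sketch of that second pair (the nat_eip2/nat2 names are assumptions; everything else mirrors the code above):

resource "aws_eip" "nat_eip2" {
  vpc = true

  tags = {
    Name = "nat-eip2"
  }
}
resource "aws_nat_gateway" "nat2" {
  allocation_id = aws_eip.nat_eip2.id
  subnet_id     = aws_subnet.public-seitech-subnet2.id

  tags = {
    Name = "${var.cluster_name}-NatGatewayAZ2"
  }

  depends_on = [aws_eip.nat_eip2, aws_subnet.public-seitech-subnet2]
}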

Route Tables

The final component of the networking infrastructure is the route table. We define a single public route table, routing to the public internet, that will be associated with both public subnets.

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.seitech-vpc.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
  tags = {
    Name    = "Public Subnets"
    Network = "Public"
  }
  depends_on = [
    aws_vpc.seitech-vpc
  ]
}

The code routes all non-local traffic (0.0.0.0/0) to the public internet through the Internet Gateway defined earlier.

The next code is the private route table for one of the private subnets. It routes outbound (egress) traffic to the internet through the NAT gateway.

resource "aws_route_table" "private1" {
  vpc_id = aws_vpc.seitech-vpc.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat1.id
  }
  tags = {
    Name    = "Private Subnet AZ1"
    Network = "Private1"
  }
  depends_on = [
    aws_nat_gateway.nat1,
    aws_vpc.seitech-vpc
  ]
}

We will have another route table, private2, for the other private subnet; only the NAT gateway reference and the tags change.
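
A sketch of that second route table, assuming the second NAT gateway is named nat2 as in the earlier sketch:

resource "aws_route_table" "private2" {
  vpc_id = aws_vpc.seitech-vpc.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat2.id # assumed name of the second NAT gateway
  }
  tags = {
    Name    = "Private Subnet AZ2"
    Network = "Private2"
  }
  depends_on = [
    aws_nat_gateway.nat2,
    aws_vpc.seitech-vpc
  ]
}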

So far we have created all the networking components; the last piece of glue is associating the route tables with their subnets.

resource "aws_route_table_association" "private-us-east-1a" {
  subnet_id      = aws_subnet.private-seitech-subnet1.id
  route_table_id = aws_route_table.private1.id
}

resource "aws_route_table_association" "private-us-east-1b" {
  subnet_id      = aws_subnet.private-seitech-subnet2.id
  route_table_id = aws_route_table.private2.id
}

resource "aws_route_table_association" "public-us-east-1a" {
  subnet_id      = aws_subnet.public-seitech-subnet1.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "public-us-east-1b" {
  subnet_id      = aws_subnet.public-seitech-subnet2.id
  route_table_id = aws_route_table.public.id
}

The last snippet associates the public route table with both public subnets and each private route table with its corresponding private subnet.

AWS EKS

Yet another important component of the system is the Kubernetes cluster, which is provisioned with the AWS EKS service. Provisioning an EKS cluster is much simpler than the networking components above.

resource "aws_eks_cluster" "seitech-cluster" {
  name     = var.cluster_name
  role_arn = aws_iam_role.seitech-cluster-role.arn

  vpc_config {
    subnet_ids = [
      aws_subnet.private-seitech-subnet1.id,
      aws_subnet.private-seitech-subnet2.id,
      aws_subnet.public-seitech-subnet1.id,
      aws_subnet.public-seitech-subnet2.id
    ]
  }

  depends_on = [aws_iam_role_policy_attachment.demo-AmazonEKSClusterPolicy]
}

The code above shows that we only need a cluster name, a role, and the subnets. We already have the name and the subnet IDs, but we still need the role ARN.

This role arn will be created using the following code:

resource "aws_iam_role" "seitech-cluster-role" {
  name = "seitech-cluster-role"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "demo-AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.seitech-cluster-role.name
}

There is not much to discuss in this block: the role and policy attachment follow the AWS documentation for EKS cluster roles.

IAM OIDC

To manage permissions for the applications we deploy in Kubernetes, we have two options. We can attach policies to the Kubernetes nodes directly, in which case every pod gets the same access to AWS resources. Or we can create an OpenID Connect (OIDC) provider, which allows granting IAM permissions based on the service account used by the pod.

data "tls_certificate" "eks" {
  url = aws_eks_cluster.seitech-cluster.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "eks" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.seitech-cluster.identity[0].oidc[0].issuer
}

EKS Autoscaler

We will use the OpenID Connect provider to create an IAM role and bind it to the cluster autoscaler. Let's create the IAM policy and role first. The role is similar to the previous one, but since the autoscaler will be deployed in the kube-system namespace, the trust policy is conditioned on that namespace's cluster-autoscaler service account.

data "aws_iam_policy_document" "eks_cluster_autoscaler_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"

    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub"
      values   = ["system:serviceaccount:kube-system:cluster-autoscaler"]
    }

    principals {
      identifiers = [aws_iam_openid_connect_provider.eks.arn]
      type        = "Federated"
    }
  }
}

resource "aws_iam_role" "eks_cluster_autoscaler" {
  assume_role_policy = data.aws_iam_policy_document.eks_cluster_autoscaler_assume_role_policy.json
  name               = "eks-cluster-autoscaler"
}

resource "aws_iam_policy" "eks_cluster_autoscaler" {
  name = "eks-cluster-autoscaler"

  policy = jsonencode({
    Statement = [{
      Action = [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "ec2:DescribeLaunchTemplateVersions"
            ]
      Effect   = "Allow"
      Resource = "*"
    }]
    Version = "2012-10-17"
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster_autoscaler_attach" {
  role       = aws_iam_role.eks_cluster_autoscaler.name
  policy_arn = aws_iam_policy.eks_cluster_autoscaler.arn
}

Then we need to deploy the auto-scaler using Helm as shown.

resource "helm_release" "cluster_autoscaler" {
  name       = "cluster-autoscaler"
  chart      = "cluster-autoscaler"
  repository = "https://kubernetes.github.io/autoscaler"
  version    = "9.9.2"
  namespace  = "kube-system"

  set {
    name  = "fullnameOverride"
    value = "aws-cluster-autoscaler"
  }

  set {
    name  = "autoDiscovery.clusterName"
    value = var.cluster_name
  }

  set {
    name  = "awsRegion"
    value = var.aws_region
  }

  set {
    name  = "rbac.serviceAccount.name"
    value = "cluster-autoscaler"
  }

  set {
    name  = "rbac.serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = aws_iam_role.eks_cluster_autoscaler.arn
  }


  depends_on = [
    aws_eks_cluster.seitech-cluster,
    aws_iam_role.eks_cluster_autoscaler,
    aws_iam_policy.eks_cluster_autoscaler,
    aws_iam_role_policy_attachment.eks_cluster_autoscaler_attach
  ]
}
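
One piece not shown in the article is how the Helm (and Kubernetes) providers authenticate against the new cluster. A minimal sketch, assuming the standard aws_eks_cluster_auth data source; adapt it to your provider versions and authentication setup:

# Short-lived token for authenticating against the EKS API server
data "aws_eks_cluster_auth" "seitech" {
  name = aws_eks_cluster.seitech-cluster.name
}

provider "helm" {
  kubernetes {
    host                   = aws_eks_cluster.seitech-cluster.endpoint
    cluster_ca_certificate = base64decode(aws_eks_cluster.seitech-cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.seitech.token
  }
}

provider "kubernetes" {
  host                   = aws_eks_cluster.seitech-cluster.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.seitech-cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.seitech.token
}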

Worker Nodes

The last component related to the EKS infrastructure is defining the attributes for the worker nodes.

We will define the roles and policies for the node groups in order to use them in the subsequent sections.

resource "aws_iam_role" "nodes" {
  name = "eks-node-group-nodes"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
    Version = "2012-10-17"
  })
}

resource "aws_iam_role_policy_attachment" "nodes-AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "nodes-AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "nodes-AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.nodes.name
}

Note: all the roles and policies defined in this article are taken from the AWS documentation and are required for the cluster to work correctly.

After defining the roles, we need to define the actual node groups. These come in two flavors: the first provisions on-demand EC2 instances, while the second provisions spot instances. This split keeps cluster costs down, since the spot instances will be used as the Jenkins agents.

resource "aws_eks_node_group" "private-nodes-on-demand" {
  cluster_name    = aws_eks_cluster.seitech-cluster.name
  node_group_name = var.worker_group_name_on_demand
  node_role_arn   = aws_iam_role.nodes.arn

  subnet_ids = [
      aws_subnet.private-seitech-subnet1.id,
      aws_subnet.private-seitech-subnet2.id,
  ]

  capacity_type  = "ON_DEMAND"
  instance_types = var.worker_group_instance_type_on_demand

  scaling_config {
    desired_size = var.autoscaling_group_desired_capacity_on_demand
    max_size     = var.autoscaling_group_desired_capacity_on_demand
    min_size     = var.autoscaling_group_desired_capacity_on_demand
  }

  update_config {
    max_unavailable = 1
  }

  labels = {
    Environment = "${var.cluster_name}-dev-on-demand"
    instance-type = "on-demand"
  }

  depends_on = [
    aws_iam_role_policy_attachment.nodes-AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.nodes-AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.nodes-AmazonEC2ContainerRegistryReadOnly,
  ]
}
resource "aws_eks_node_group" "private-nodes-spot" {
  cluster_name    = aws_eks_cluster.seitech-cluster.name
  node_group_name = var.worker_group_name_spot
  node_role_arn   = aws_iam_role.nodes.arn


  subnet_ids = [
      aws_subnet.private-seitech-subnet1.id,
      aws_subnet.private-seitech-subnet2.id,
  ]

  capacity_type  = "SPOT"
  instance_types = var.worker_group_instance_type_spot

  scaling_config {
    desired_size = var.autoscaling_group_desired_capacity_spot
    max_size     = var.autoscaling_group_max_size_spot
    min_size     = var.autoscaling_group_min_size_spot
  }

  update_config {
    max_unavailable = 1
  }

  labels = {
    Environment = "${var.cluster_name}-dev-spot"
    instance-type = "spot"
  }

  tags = {
    "k8s.io/cluster-autoscaler/${var.cluster_name}" = "owned"
    "k8s.io/cluster-autoscaler/enabled" = "TRUE"
    "k8s.io/cluster-autoscaler/node-template/label/instance-type" = "spot"
  } 

  depends_on = [
    aws_iam_role_policy_attachment.nodes-AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.nodes-AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.nodes-AmazonEC2ContainerRegistryReadOnly,
  ]
}

Both node groups have almost the same attributes, and most are self-explanatory. Two of them deserve emphasis:

  • The tags attribute (only on the spot group):
    • The first two tags enable the cluster autoscaler's auto-discovery of this node group.
    • The third is a node-template hint that tells the autoscaler which instance-type label nodes in this group will carry.
  • The labels attribute: it defines an instance-type label that is used as a nodeSelector in deployments. We will see later that the on-demand group hosts the Jenkins controller, while the spot nodes are selected for the agents.
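
The node group definitions also rely on several input variables that are not shown in the article. Declarations along these lines are assumed (the names match the references above; the defaults are only illustrative):

variable "worker_group_instance_type_on_demand" {
  type        = list(string)
  default     = ["t3.medium"] # illustrative default
  description = "Instance types for the on-demand node group"
}

variable "worker_group_instance_type_spot" {
  type        = list(string)
  default     = ["t3.medium", "t3a.medium"] # illustrative defaults
  description = "Instance types for the spot node group"
}

variable "autoscaling_group_min_size_spot" {
  type        = number
  default     = 1 # illustrative default
  description = "Minimum size of the spot node group"
}

variable "autoscaling_group_max_size_spot" {
  type        = number
  default     = 5 # illustrative default
  description = "Maximum size of the spot node group"
}

The remaining sizing and naming variables (autoscaling_group_desired_capacity_spot, autoscaling_group_desired_capacity_on_demand, worker_group_name_*) follow the same pattern.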

Jenkins Server Configuration

Configuring the Jenkins server without interaction with the UI is one goal of this project. To achieve this we will use the Helm chart and Jenkins Configuration as Code (JCasC).

Let's dig into the Jenkins Values file from top to bottom:

controller:
  installPlugins:
   ....
    - configuration-as-code:latest
    - kubernetes:latest
    - prometheus:latest
   ....
  numExecutors: 1
  containerEnv:
    - name: JENKINS_GITHUB_USERNAME
      valueFrom:
        secretKeyRef:
          name: jenkins-secrets
          key: jenkins-github-username
    - name: JENKINS_GITHUB_PASSWORD
      valueFrom: 
        secretKeyRef:
          name: jenkins-secrets
          key: jenkins-github-password
  serviceType: LoadBalancer
  adminUser: "admin"
  adminPassword: "admin"
  nodeSelector:
    instance-type: on-demand

The first key in the file is controller. It contains nested keys such as installPlugins; let's go through the important ones:

  • controller.installPlugins: list of Jenkins plugins to install. The "kubernetes" and "configuration-as-code" plugins are essential here.
  • numExecutors: number of executors on the controller.
  • containerEnv: environment variables for the Jenkins container. In our example they are read from a Kubernetes secret (created later with Terraform).
  • serviceType: the Kubernetes service type used to expose the controller (here, LoadBalancer).
  • adminUser: admin username.
  • adminPassword: admin password (both are overridden by set_sensitive blocks in the Terraform code below).
  • nodeSelector: node labels for pod assignment; instance-type: on-demand pins the controller to the on-demand node group.

Another important attribute of the Jenkins Controller is the JCasC config:

JCasC:
    defaultConfig: true
    configScripts: 
      base-configs: |
          jenkins:
            systemMessage: "SEITech Demo CI Pipeline"
          credentials:
            system:
              domainCredentials:
                - credentials:
                    - usernamePassword:
                        scope: GLOBAL
                        id: "${JENKINS_GITHUB_USERNAME}-github-organization"
                        username: "${JENKINS_GITHUB_USERNAME}"
                        password: "${JENKINS_GITHUB_PASSWORD}"
                    - aws:
                        scope: GLOBAL
                        id: "${JENKINS_GITHUB_USERNAME}-aws-creds"
                        accessKey: "${AWS_ACCESS_KEY}"
                        secretKey: "${AWS_SECRET_ACCESS_KEY}"
                        description: "AWS Credentials"
          jobs:
            - script: >
                organizationFolder("${JENKINS_GITHUB_USERNAME}") {
                    description("Github Organization - ${JENKINS_GITHUB_USERNAME}")
                    displayName("Organization: ${JENKINS_GITHUB_USERNAME}")
                    buildStrategies {
                        skipInitialBuildOnFirstBranchIndexing()
                    }

                    organizations {
                        github {
                            repoOwner("${JENKINS_GITHUB_USERNAME}")
                            apiUri("https://api.github.com")
                            buildForkPRHead(true)
                            buildForkPRMerge(true)
                            buildOriginBranch(true)
                            buildOriginBranchWithPR(true)
                            buildOriginPRHead(true)
                            buildOriginPRMerge(true)
                            credentialsId("${JENKINS_GITHUB_USERNAME}-github-organization")
                            traits {
                                gitBranchDiscovery()
                                gitHubBranchDiscovery {
                                    strategyId(1)
                                }
                                sourceWildcardFilter {
                                    includes("seitech*")
                                    excludes("")
                                }
                            }
                            configure { node ->
                              node / triggers / 'com.cloudbees.hudson.plugins.folder.computed.PeriodicFolderTrigger' {
                                spec('H H * * *')
                                interval(86400000)
                              }
                            }

                        }
                    }
                }

JCasC uses the Configuration as Code plugin to inject the Jenkins configuration. In our example, we configure the server to show a custom system message and add credentials for GitHub and AWS access. Finally, we define a seed job that creates an Organization Folder job, which integrates all of the organization's repositories into the Jenkins server. Jenkins then detects the Jenkinsfile in each project and executes its pipeline.

Finally, we look at the agent object. We just define the nodeSelector so that the agents are scheduled on the spot instances.

agent:
  nodeSelector:
    instance-type: spot

To deploy all these configurations, we will use the helm_release as shown in the following snippet.

resource "helm_release" "jenkins" {
  name       = "jenkins"
  repository = "https://charts.jenkins.io"
  chart      = "jenkins"

  values = [
    file("jenkins-values.yaml")
  ]

  set_sensitive {
    name  = "controller.adminUser"
    value = var.jenkins_admin_username
  }

  set_sensitive {
    name  = "controller.adminPassword"
    value = var.jenkins_admin_password
  }

  depends_on = [
    aws_eks_cluster.seitech-cluster
  ]
}
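
Once the release is applied, the hostname of the LoadBalancer created for Jenkins can be looked up from Terraform. The sketch below assumes the release's service is named "jenkins" and that your version of the Kubernetes provider exposes the service status, so verify both before relying on it:

data "kubernetes_service" "jenkins" {
  metadata {
    name = "jenkins" # assumed service name created by the chart release
  }
  depends_on = [helm_release.jenkins]
}

output "jenkins_url" {
  # Hostname of the AWS load balancer fronting the Jenkins controller
  value = data.kubernetes_service.jenkins.status[0].load_balancer[0].ingress[0].hostname
}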

Monitoring Tools

Grafana

Again we will use Helm to deploy Grafana with the following values file.

dashboardProviders:
 dashboardproviders.yaml:
   apiVersion: 1
   providers:
   - name: 'default'
     orgId: 1
     folder: ''
     type: file
     disableDeletion: false
     editable: true
     options:
       path: /var/lib/grafana/dashboards/default

datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: http://prometheus-server.default.svc.cluster.local
      access: proxy
      isDefault: true

dashboards: 
  default:
    9964-jenkins-performance-and-health-overview:
      gnetId: 9964
      revision: 1
      datasource: Prometheus

    6417-kubernetes-cluster-prometheus:
      gnetId: 6417
      revision: 1
      datasource: Prometheus

service:
  enabled: true
  type: LoadBalancer

The Helm release for Grafana:

resource "helm_release" "grafana" {
  name       = "grafana"
  chart      = "grafana"
  repository = "https://grafana.github.io/helm-charts"
  version    = "6.15.0"
  namespace  = "default"

  set {
    name  = "adminPassword"
    value = "admin"
  }

  values = [
    file("grafana-values.yaml"),
  ]

  depends_on = [
    aws_eks_cluster.seitech-cluster,
  ]
}

Prometheus

Prometheus values file:

extraScrapeConfigs: |
  - job_name: 'jenkins'
    metrics_path: /prometheus
    params:
      module: [http_2xx]
    tls_config:
      insecure_skip_verify: true
    static_configs:
      - targets:
        - jenkins.default.svc.cluster.local:8080

Prometheus helm release:

resource "helm_release" "prometheus" {
  name       = "prometheus"
  chart      = "prometheus"
  repository = "https://prometheus-community.github.io/helm-charts"
  version    = "15.0"
  namespace  = "default"

    values = ["${file("prometheus-values.yaml")}"]

  set {
    name  = "adminPassword"
    value = "admin"
  }

  depends_on = [
    aws_eks_cluster.seitech-cluster,
 ]
}

Kubernetes Secrets

The following code shows how to deploy the secret referenced by containerEnv in the Jenkins values file.

resource "kubernetes_secret" "jenkins-secrets" {
  metadata {
    name = "jenkins-secrets"
  }

  data = {
    jenkins-github-username = var.jenkins_github_username
    jenkins-github-password = var.jenkins_github_password
  }

  type = "Opaque"
  immutable = true
}
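
The GitHub credential variables feeding this secret (and the Jenkins admin credentials used in the helm_release) are best declared as sensitive so Terraform keeps their values out of plan output. A sketch, using the variable names referenced above:

variable "jenkins_github_username" {
  type        = string
  sensitive   = true
  description = "GitHub username used by Jenkins for organization scanning"
}

variable "jenkins_github_password" {
  type        = string
  sensitive   = true
  description = "GitHub token/password used by Jenkins for organization scanning"
}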

We have reached the end of the implementation section. In the following sections, we will demo some results and comment on this implementation.

Results

The following video shows the results of this implementation.

Conclusion

In the end, implementing this solution in production is helpful for large teams and projects. However, there are many more services you will likely need to integrate with this implementation, especially an artifact manager and security/firewall tooling.

References

1- Online Meetup: Configuration as Code of Jenkins (for Kubernetes): youtu.be/KB7thPsG9VA

2- Terraform Helm EKS Jenkins: github.com/Josh-Tracy/Terraform-Helm-EKS-Je..

3- Jenkins Build agents K8s: devopscube.com/jenkins-build-agents-kuberne..

4- Enter the world of Jenkins Configuration as Code - JCasC: github.com/sharepointoscar/JCasC