Scale a service
The Nomad Autoscaler is a tool that can scale workloads and client nodes in a Nomad cluster automatically. It supports two kinds of scaling scenarios:
- Horizontal application autoscaling is when the autoscaler controls the number of allocations (service instances) Nomad schedules.
- Horizontal cluster autoscaling is when the autoscaler controls the number of Nomad client nodes in the cluster.
Both types of autoscaling are configured with scaling policies that scale according to changes in resource usage, including CPU consumption, memory, or metrics from other Application Performance Monitoring (APM) tools.
When you deploy an application as microservices, the autoscaler helps you scale each service independently. If only one service experiences additional load, then Nomad can add additional allocations for that service only. This approach uses available resources more efficiently than scaling the entire application.
In this tutorial, you deploy a version of HashiCups with a modified job definition for the frontend service. The modified job instructs the autoscaler to create additional instances during high CPU load.
Infrastructure overview
At the beginning of the tutorial, you have the Consul API gateway deployed on the public client node of your cluster.
Prerequisites
This tutorial uses the infrastructure set up in the previous tutorial of this collection, Integrate service mesh and gateway. Complete that tutorial to set up the infrastructure if you have not done so.
Review configuration files
The frontend service renders the HashiCups UI and contains a value in the page footer that shows which instance sent the response to the request. This tutorial uses the footer to show how scaling functions when the autoscaler responds to increased load.
This version of HashiCups adds a scaling block to the frontend service, and includes running the Nomad Autoscaler as a job in Nomad. Additional configurations for the autoscaler exist in the shared/jobs directory and include 05.autoscaler.config.sh and 05.autoscaler.nomad.hcl.
Review the autoscaler configuration
The Nomad Autoscaler is a separate piece of software that runs as a system process like the Consul and Nomad agents, or as a job in Nomad. It scales workloads running in Nomad based on the scaling block in the jobspec.
The repository provides a script, 05.autoscaler.config.sh, that automates the initial configuration required for Nomad to integrate with the Autoscaler.
The setup script first cleans up previous ACL configurations and then applies the ACL policy for the autoscaler.
/shared/jobs/05.autoscaler.config.sh
```shell
## ...
## Delete Nomad ACL policy
nomad acl policy delete autoscaler

## ...
## Create Nomad ACL policy 'autoscaling-policy'
tee ${_scale_policy_FILE} > /dev/null << EOF
namespace "default" {
  policy = "scale"
}

namespace "default" {
  capabilities = ["read-job"]
}

operator {
  policy = "read"
}

namespace "default" {
  variables {
    path "nomad-autoscaler/lock" {
      capabilities = ["write"]
    }
  }
}
EOF

nomad acl policy apply \
  -namespace default \
  -job autoscaler \
  autoscaler ${_scale_policy_FILE}
```
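The script writes the policy rules to a file with a shell heredoc before applying them. A minimal sketch of that pattern follows; the temporary file path is illustrative, not the path the script actually uses.

```shell
# Illustrative sketch of the heredoc-to-file pattern used by the script.
# The file path is a placeholder created with mktemp.
_scale_policy_FILE="$(mktemp)"

tee "${_scale_policy_FILE}" > /dev/null << EOF
namespace "default" {
  policy = "scale"
}
EOF

# The file now contains the policy rule, ready to pass to 'nomad acl policy apply'.
grep -c 'policy = "scale"' "${_scale_policy_FILE}"   # prints 1
```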
Review the autoscaler jobspec
The autoscaler runs as a Docker container. Its configuration defines the Nomad cluster address as well as the Application Performance Monitoring (APM) tool that supplies the metrics.
This jobspec uses the Nomad APM plugin. It is suitable for scaling based on CPU and memory usage. It is not as flexible as other APM plugins, but does not require additional installation or configuration. If you want to scale based on other metrics, consider using the Prometheus plugin or the Datadog plugin.
/shared/jobs/05.autoscaler.nomad.hcl
```hcl
job "autoscaler" {
  group "autoscaler" {
    # ...
    task "autoscaler" {
      # ...
      template {
        data = <<EOF
log_level  = "debug"
plugin_dir = "/plugins"

nomad {
  address     = "https://nomad.service.dc1.global:4646"
  skip_verify = "true"
}

apm "nomad" {
  driver = "nomad-apm"
}
EOF
        destination = "${NOMAD_TASK_DIR}/config.hcl"
      }
    }
  }
}
```
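If you later switch to the Prometheus plugin, the apm block changes along these lines. This is a sketch only; the address is a placeholder for your own Prometheus server.

```hcl
# Sketch: swapping the Nomad APM plugin for the Prometheus plugin.
# The address below is a placeholder, not a real endpoint.
apm "prometheus" {
  driver = "prometheus"
  config = {
    address = "http://prometheus.service.dc1.global:9090"
  }
}
```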
Review the HashiCups jobspec
Open the 05.hashicups.nomad.hcl jobspec file and view the contents.
Nomad scales the frontend service when the CPU usage of all tasks in the frontend group reaches 70% of the maximum allocated CPU for the group. The target value strategy plugin is responsible for the CPU usage calculation. Scaling up happens by at most one instance at a time, while scaling down happens by at most two instances at a time. These values are part of the strategy block configuration.
/shared/jobs/05.hashicups.nomad.hcl
```hcl
# ...
variable "frontend_max_instances" {
  description = "The maximum number of instances to scale up to."
  default     = 5
}

variable "frontend_max_scale_up" {
  description = "The maximum number of instances to scale up by."
  default     = 1
}

variable "frontend_max_scale_down" {
  description = "The maximum number of instances to scale down by."
  default     = 2
}

job "hashicups" {
  # ...
  group "frontend" {
    # ...
    scaling {
      enabled = true
      min     = 1
      max     = var.frontend_max_instances

      policy {
        evaluation_interval = "5s"
        cooldown            = "10s"

        check "high-cpu-usage" {
          source = "nomad-apm"
          query  = "max_cpu-allocated"

          strategy "target-value" {
            driver         = "target-value"
            target         = 70
            threshold      = 0.05
            max_scale_up   = var.frontend_max_scale_up
            max_scale_down = var.frontend_max_scale_down
          }
        }
      }
    }
    # ...
  }
  # ...
}
```
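To build intuition for the target-value strategy, the sketch below approximates its core calculation: the desired count scales the current count by the ratio of the observed metric to the target, rounded up. This is an illustration of the idea, not the plugin's actual implementation, and the sample numbers are hypothetical.

```shell
# Hypothetical illustration of target-value scaling math.
current=3    # current allocations in the frontend group
metric=84    # observed CPU usage, as % of allocated CPU
target=70    # target from the strategy block

# desired ≈ ceil(current * metric / target)
desired=$(awk -v c="$current" -v m="$metric" -v t="$target" \
  'BEGIN { d = c * m / t; if (d == int(d)) print d; else print int(d) + 1 }')

echo "desired count: $desired"   # 3 * 84 / 70 = 3.6, rounded up to 4
```

With max_scale_up set to 1, a larger computed jump would still be capped to one additional instance per evaluation.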
Review the load test script
The load testing script makes requests to the HashiCups URL with the hey tool to trigger scaling. It does so in several waves and adds more requests with each additional wave.
/shared/jobs/05.load-test.sh
```shell
#!/bin/bash

# This script requires the hey tool
# https://github.com/rakyll/hey

[ -z "$1" ] && echo "No URL passed as first argument...exiting" && exit 1

_URL=$1
echo "Application address: $_URL"

_waves=5
_wave_duration=15
_workers_multiplier=7
_rate_multiplier=6
_sleep_time=7

for i in $(seq 1 $_waves); do
  _wave_duration=15
  _concurrent_workers=$(($_workers_multiplier * $i))
  _rate_limit_per_sec_per_worker=$(($_rate_multiplier * $i))
  echo "Sending $(($_wave_duration * $_concurrent_workers * $_rate_limit_per_sec_per_worker)) requests over $_wave_duration seconds"
  hey -z "$_wave_duration"s -c $_concurrent_workers -q $_rate_limit_per_sec_per_worker -m GET ${_URL} > /dev/null
  echo "Waiting $_sleep_time seconds..."
  sleep $_sleep_time
done
```
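Each wave i sends duration × workers × rate = 15 × (7·i) × (6·i) = 630·i² requests. The arithmetic can be checked on its own:

```shell
# Per-wave request totals implied by the script above:
# 15 seconds * (7*i workers) * (6*i requests/sec/worker) = 630 * i^2.
for i in 1 2 3 4 5; do
  echo "Wave $i: $(( 15 * (7 * i) * (6 * i) )) requests"
done
# Wave 1: 630, Wave 2: 2520, Wave 3: 5670, Wave 4: 10080, Wave 5: 15750
```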
Deploy Nomad autoscaler
Deploy the Nomad autoscaler before you deploy the HashiCups application.
Run the autoscaler setup script and jobspec
Run the autoscaler configuration script.
```shell-session
$ ./05.autoscaler.config.sh
Configure environment.
Clean previous configurations.
Successfully deleted autoscaler policy!
Create Nomad ACL policy 'autoscaler'
Successfully wrote "autoscaler" ACL policy!
```
Submit the autoscaler job to Nomad.
```shell-session
$ nomad job run 05.autoscaler.nomad.hcl
==> 2024-11-14T11:10:53+01:00: Monitoring evaluation "470de665"
    2024-11-14T11:10:53+01:00: Evaluation triggered by job "autoscaler"
    2024-11-14T11:10:53+01:00: Allocation "aab6f744" created: node "44bf6336", group "autoscaler"
    2024-11-14T11:10:55+01:00: Evaluation within deployment: "30dcd6fb"
    2024-11-14T11:10:55+01:00: Evaluation status changed: "pending" -> "complete"
==> 2024-11-14T11:10:55+01:00: Evaluation "470de665" finished with status "complete"
==> 2024-11-14T11:10:55+01:00: Monitoring deployment "30dcd6fb"
  ✓ Deployment "30dcd6fb" successful

    2024-11-14T11:11:08+01:00
    ID          = 30dcd6fb
    Job ID      = autoscaler
    Job Version = 0
    Status      = successful
    Description = Deployment completed successfully

    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    autoscaler  1        1       1        0          2024-11-14T10:21:06Z
```
Deploy HashiCups
Submit the HashiCups job to Nomad.
```shell-session
$ nomad job run 05.hashicups.nomad.hcl
==> 2024-11-14T11:12:46+01:00: Monitoring evaluation "ddff28e0"
    2024-11-14T11:12:46+01:00: Evaluation triggered by job "hashicups"
    2024-11-14T11:12:46+01:00: Evaluation within deployment: "ace8ef73"
    2024-11-14T11:12:46+01:00: Allocation "8af7e475" created: node "dda24c18", group "frontend"
    2024-11-14T11:12:46+01:00: Allocation "976c2df2" created: node "3fadad86", group "db"
    2024-11-14T11:12:46+01:00: Allocation "a69ca45d" created: node "3fadad86", group "nginx"
    2024-11-14T11:12:46+01:00: Allocation "b2e63ebb" created: node "3fadad86", group "public-api"
    2024-11-14T11:12:46+01:00: Allocation "bfc46236" created: node "3fadad86", group "payments"
    2024-11-14T11:12:46+01:00: Allocation "f755b328" created: node "dda24c18", group "product-api"
    2024-11-14T11:12:46+01:00: Evaluation status changed: "pending" -> "complete"
==> 2024-11-14T11:12:46+01:00: Evaluation "ddff28e0" finished with status "complete"
==> 2024-11-14T11:12:46+01:00: Monitoring deployment "ace8ef73"
  ✓ Deployment "ace8ef73" successful

    2024-11-14T11:13:28+01:00
    ID          = ace8ef73
    Job ID      = hashicups
    Job Version = 0
    Status      = successful
    Description = Deployment completed successfully

    Deployed
    Task Group   Desired  Placed  Healthy  Unhealthy  Progress Deadline
    db           1        1       1        0          2024-11-14T10:23:19Z
    frontend     1        1       1        0          2024-11-14T10:23:07Z
    nginx        1        1       1        0          2024-11-14T10:23:03Z
    payments     1        1       1        0          2024-11-14T10:23:19Z
    product-api  1        1       1        0          2024-11-14T10:23:22Z
    public-api   1        1       1        0          2024-11-14T10:23:26Z
```
Scale the frontend service
Get the public address of the API gateway and export it as the API_GW environment variable.
```shell-session
$ export API_GW=`nomad node status -verbose \
    $(nomad job allocs --namespace=ingress api-gateway | grep -i running | awk '{print $2}') | \
    grep -i public-ipv4 | awk -F "=" '{print $2}' | xargs | \
    awk '{print "https://"$1":8443"}'`
```
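The pipeline above filters the node's attribute list for its public IPv4 address and builds an HTTPS URL from it. The extraction can be sketched against a single sample line; the attribute name and IP below are illustrative.

```shell
# Hypothetical sample attribute line from 'nomad node status -verbose' output.
line='unique.platform.aws.public-ipv4 = 3.15.17.40'

# Same extraction steps as the pipeline above: filter, split on '=',
# trim whitespace with xargs, then prepend the scheme and append the port.
url=$(echo "$line" | grep -i public-ipv4 | awk -F "=" '{print $2}' \
  | xargs | awk '{print "https://"$1":8443"}')

echo "$url"   # https://3.15.17.40:8443
```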
Open the Nomad UI and log in with the nomad ui -authenticate command. This command opens a web browser window on your machine. Alternatively, you can open the Nomad UI with the IP in Nomad_UI and log in with Nomad_UI_token.
```shell-session
$ nomad ui -authenticate
Opening URL "https://18.116.52.247:4646" with one-time token
```
The hashicups job, which consists of multiple services, appears in the list of jobs.
Click the hashicups job, and then select the frontend task from the list of task groups.
The bottom of this page displays a graph that shows scaling events. Keep this page open so that you can reference it when scaling starts.
Run the load test script and observe the graph on the frontend task page in the Nomad UI. Observe Nomad create additional allocations when the autoscaler scales the frontend service up, and then remove the allocations as the autoscaler scales the service back down.
```shell-session
$ ./05.load-test.sh $API_GW
Application address: https://3.15.17.40:8443
Sending 630 requests over 15 seconds
Waiting 7 seconds...
Sending 2520 requests over 15 seconds
Waiting 7 seconds...
Sending 5670 requests over 15 seconds
Waiting 7 seconds...
Sending 10080 requests over 15 seconds
Waiting 7 seconds...
Sending 15750 requests over 15 seconds
Waiting 7 seconds...
```
In the Consul UI, the number of instances of the frontend service registered in the catalog changes as the autoscaler scales up and down.
In the Consul UI, click the frontend service and then click the Instances tab to view details about each instance.
Before you clean up your environment, you can re-run the load script and observe changes in the Nomad UI and Consul UI as they occur.
Clean up
After you complete this tutorial, you should clean up the deployment. If you want to keep experimenting with the cluster you can clean the cluster state without destroying the underlying infrastructure.
When you are finished, we recommend you destroy the infrastructure to avoid unnecessary costs.
Open the terminal session from which you submitted the jobs and stop the deployment when you are ready to move on. The nomad job stop command can accept more than one job.
```shell-session
$ nomad job stop -purge hashicups autoscaler
==> 2024-11-14T18:31:37+01:00: Monitoring evaluation "a26c41b6"
==> 2024-11-14T18:31:37+01:00: Monitoring evaluation "9592a117"
    2024-11-14T18:31:37+01:00: Evaluation triggered by job "hashicups"
    2024-11-14T18:31:37+01:00: Evaluation status changed: "pending" -> "complete"
    2024-11-14T18:31:37+01:00: Evaluation triggered by job "autoscaler"
==> 2024-11-14T18:31:37+01:00: Evaluation "9592a117" finished with status "complete"
    2024-11-14T18:31:37+01:00: Evaluation status changed: "pending" -> "complete"
==> 2024-11-14T18:31:37+01:00: Evaluation "a26c41b6" finished with status "complete"
```
Clean the autoscaler configuration.
```shell-session
$ ./05.autoscaler.config.sh -clean
Configure environment.
Clean previous configurations.
Successfully deleted autoscaler policy!
Only cleaning selected...Exiting.
```
Stop the API gateway deployment.
```shell-session
$ nomad job stop --namespace ingress -purge api-gateway
==> 2024-11-14T18:32:45+01:00: Monitoring evaluation "d412f90c"
    2024-11-14T18:32:45+01:00: Evaluation triggered by job "api-gateway"
    2024-11-14T18:32:45+01:00: Evaluation status changed: "pending" -> "complete"
==> 2024-11-14T18:32:45+01:00: Evaluation "d412f90c" finished with status "complete"
```
Remove Consul intentions.
```shell-session
$ ./04.intentions.consul.sh -clean
Configure environment.
Clean previous configurations.
Config entry deleted: service-intentions/database
Config entry deleted: service-intentions/product-api
Config entry deleted: service-intentions/payments-api
Config entry deleted: service-intentions/public-api
Config entry deleted: service-intentions/frontend
Config entry deleted: service-intentions/nginx
Only cleaning selected...Exiting.
```
Remove Consul and Nomad configuration.
```shell-session
$ ./04.api-gateway.config.sh -clean
Configure environment.
Clean previous configurations.
Config entry deleted: http-route/hashicups-http-route
Config entry deleted: inline-certificate/api-gw-certicate
Config entry deleted: api-gateway/api-gateway
Binding rule "29fc7353-623c-ed89-5caa-b252ebc59aad" deleted successfully
Successfully deleted namespace "ingress"!
Auth method "nomad-workloads" deleted successfully
Only cleaning selected...Exiting.
```
Next steps
In this tutorial, you deployed a version of HashiCups with a modified job definition for the frontend service that instructed the Nomad Autoscaler to scale up and down based on CPU load.
In this collection, you learned how to migrate a monolithic application to microservices and run them in Nomad with Consul. You deployed a cluster running Consul and Nomad, configured access to the CLI and UI components, deployed several versions of the HashiCups application to show different stages of integration with Consul and Nomad, and automatically and independently scaled one of the HashiCups services with the Nomad Autoscaler.
Check out the resources below to learn more and continue your learning and development.