Developing a Nomad Autoscaler for Harvester
Nomad orchestrates application deployment and management. As applications grow in size, managing resource consumption becomes crucial. The Nomad Autoscaler is a pluggable service that makes workload scaling more accessible, empowering users to create logic for scaling their infrastructure.
Developing a custom plugin is especially beneficial when catering to cloud environments or hypervisors that aren't supported by the HashiCorp community. This blog will guide you through creating a Nomad Autoscaler plugin using the exposed methods: SetConfig, Scale, and Status.
Defining the Plugin Struct
For our Nomad Autoscaler plugin, we'll define a struct to hold configuration and other state information for scaling on Harvester. The Plugin struct implements the sdk.Target interface so it can act as a Nomad autoscaling target plugin. The struct should contain all the state needed to actually implement autoscaling, such as configuration, loggers, and API clients.
package main

import (
    "context"
    "fmt"
    "time"

    harvester "github.com/drewmullen/harvester-go-sdk"
    "github.com/hashicorp/go-hclog"
    "github.com/hashicorp/nomad-autoscaler/sdk"
    "github.com/hashicorp/nomad/api"
)

type HarvesterPlugin struct {
    config          map[string]string
    logger          hclog.Logger
    HarvesterClient *harvester.APIClient
    NomadClient     *api.Client
    scaleTimeout    time.Duration // bounds how long a single scale operation may run
    // Additional config fields can be added here.
}

func NewPlugin(log hclog.Logger) *HarvesterPlugin {
    return &HarvesterPlugin{
        logger: log,
    }
}
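Before the autoscaler agent can use the plugin, the binary has to serve this struct through the autoscaler's go-plugin harness. A minimal entry point, assuming the plugins package from the hashicorp/nomad-autoscaler repository (github.com/hashicorp/nomad-autoscaler/plugins) is added to the import block, might look like this:
// main hands the plugin over to the autoscaler's plugin harness.
func main() {
    plugins.Serve(factory)
}

// factory returns a new instance of the Harvester target plugin.
func factory(log hclog.Logger) interface{} {
    return NewPlugin(log)
}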
Configuring the Plugin
The target plugin works with two different pieces of configuration:
- target: what to scale
- policy: when to scale
Target Configuration
The target configuration holds the plugin-specific settings, such as authentication credentials and other global options.
target "harvester" {
  driver = "harvester"
  config = {
    harvester_url = "https://harvester.example.com"
    auth_token    = "eyabc123"
  }
}
Once instantiated, the Nomad Autoscaler service passes the target configuration options to the plugin's SetConfig method, which can then be used to set up the plugin fields. The configuration also includes the options defined in the General Options documentation.
A sample setup might look something like this:
func (hp *HarvesterPlugin) SetConfig(config map[string]string) error {
    hp.config = config

    // getEnvOrConfig returns the environment variable if set, otherwise the
    // value from the config map. configKeyAuthToken and configKeyHarvesterURL
    // are consts defined elsewhere.
    token := getEnvOrConfig("HARVESTER_TOKEN", config, configKeyAuthToken)
    url := getEnvOrConfig("HARVESTER_URL", config, configKeyHarvesterURL)

    hp.HarvesterClient = harvester.NewAPIClient(&harvester.Configuration{
        DefaultHeader: map[string]string{"Authorization": "Bearer " + token},
        UserAgent:     "nomad-autoscaler",
        Debug:         false,
        Servers: harvester.ServerConfigurations{
            {URL: url, Description: "Harvester API Server"},
        },
    })

    apiConfig := &api.Config{
        Address:   config["nomad_address"],
        Region:    config["nomad_region"],
        Namespace: config["nomad_namespace"],
    }
    if token, ok := config["nomad_token"]; ok {
        apiConfig.Headers = map[string][]string{"X-Nomad-Token": {token}}
    }

    nomadClient, err := api.NewClient(apiConfig)
    if err != nil {
        return fmt.Errorf("failed to create Nomad client: %v", err)
    }
    hp.NomadClient = nomadClient

    // Any other additional config can be parsed and stored here.
    return nil
}
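The getEnvOrConfig helper used above isn't part of any SDK; a minimal sketch, assuming the standard library os package is added to the import block, could look like this:
// getEnvOrConfig is a hypothetical helper that prefers an environment
// variable and falls back to the value in the plugin's config map.
func getEnvOrConfig(envVar string, config map[string]string, configKey string) string {
    if value := os.Getenv(envVar); value != "" {
        return value
    }
    return config[configKey]
}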
Scaling
Cluster operators author scaling policies when interacting with the Autoscaler. The target configuration from the policy is then passed as a parameter to the Scale method so the plugin can allocate the necessary resources.
scaling "cluster_policy" {
enabled = true
min = 1 # min number of VMs to scale
max = 2 # max number of VMs to scale
policy {
....
target "aws-asg" {
dry-run = "false"
node_class = "linux"
node_group_name = "nomad"
namespace = "default"
cpu_request = "2"
memory_request = "4Gi"
...
}
}
}
With the policy defined, Nomad passes the scaling config to the plugin's Scale method. How the plugin counts active nodes and carries out scale operations will depend on your hypervisor.
func (hp *HarvesterPlugin) Scale(action sdk.ScalingAction, config map[string]string) error {
    // Parsing of nodeGroup and namespace from config removed for simplicity.
    ctx, cancel := context.WithTimeout(context.Background(), hp.scaleTimeout)
    defer cancel()

    total, _, remoteIDs, err := hp.countReady(ctx, nodeGroup, namespace)
    if err != nil {
        return fmt.Errorf("failed to count servers in harvester: %v", err)
    }

    diff, direction := hp.calculateDirection(total, action.Count)
    switch direction {
    // SCALE_IN and SCALE_OUT are enums returned by the calculateDirection helper.
    case SCALE_IN:
        if err := hp.scaleIn(ctx, diff, remoteIDs, config); err != nil {
            return fmt.Errorf("failed to perform scale in: %v", err)
        }
    case SCALE_OUT:
        if err := hp.scaleOut(ctx, diff, config); err != nil {
            return fmt.Errorf("failed to perform scale out: %v", err)
        }
    default:
        hp.logger.Debug("scaling not required", "node_group", nodeGroup, "current_count", total, "strategy_count", action.Count)
    }

    return nil
}
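The calculateDirection helper and the SCALE_IN/SCALE_OUT values it returns are not provided by the SDK; a minimal sketch, assuming countReady reports the total as an int64, might look like this:
// scaleDirection enumerates the possible outcomes of comparing the current
// node count with the count requested by the autoscaler strategy.
type scaleDirection int

const (
    SCALE_NONE scaleDirection = iota
    SCALE_IN
    SCALE_OUT
)

// calculateDirection returns how many nodes to add or remove, and in which
// direction, based on the current and desired counts.
func (hp *HarvesterPlugin) calculateDirection(current, desired int64) (int64, scaleDirection) {
    switch {
    case desired < current:
        return current - desired, SCALE_IN
    case desired > current:
        return desired - current, SCALE_OUT
    default:
        return 0, SCALE_NONE
    }
}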
Draining Nodes
During the scaleIn method, HashiCorp recommends that you first drain the node. Draining and purging nodes is critical to scale-in operations: it gives applications time to shut down gracefully before the underlying VM is removed. After some time with the node offline, Nomad's garbage collector removes the node from the cluster.
func (hp *HarvesterPlugin) drainNode(ctx context.Context, nodeID string, timeout time.Duration) error {
    _, err := hp.NomadClient.Nodes().UpdateDrainOpts(
        nodeID,
        &api.DrainOptions{
            DrainSpec: &api.DrainSpec{
                Deadline:         timeout,
                IgnoreSystemJobs: true,
            },
            MarkEligible: false,
        },
        nil,
    )
    if err != nil {
        hp.logger.Warn(fmt.Sprintf("Failed to drain %v. Will continue to deleting: %v", nodeID, err))
    } else {
        // waitForDrained blocks until the drain completes or the context expires.
        drainCtx, cancel := context.WithTimeout(ctx, timeout)
        defer cancel()
        if err := hp.waitForDrained(drainCtx, nodeID); err != nil {
            hp.logger.Warn(fmt.Sprintf("Failed to drain %v: %v", nodeID, err))
        }
    }
    // Drain failures are logged but not fatal; deletion proceeds regardless.
    return nil
}
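The waitForDrained helper referenced above is not part of the Nomad API; a minimal sketch, assuming we simply poll the node until its drain strategy is cleared, might look like this:
// waitForDrained is a hypothetical helper that polls the Nomad API until the
// node's drain strategy is cleared or the context is cancelled.
func (hp *HarvesterPlugin) waitForDrained(ctx context.Context, nodeID string) error {
    ticker := time.NewTicker(5 * time.Second)
    defer ticker.Stop()

    for {
        node, _, err := hp.NomadClient.Nodes().Info(nodeID, nil)
        if err != nil {
            return fmt.Errorf("failed to read node %s: %v", nodeID, err)
        }
        // A nil DrainStrategy means the drain has completed.
        if node.DrainStrategy == nil {
            return nil
        }

        select {
        case <-ctx.Done():
            return fmt.Errorf("timed out waiting for node %s to drain: %v", nodeID, ctx.Err())
        case <-ticker.C:
        }
    }
}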
Status
The Status function reports the current status of your plugin, which helps with debugging and monitoring. The autoscaler determines the current running count from the information returned by the plugin's Status method. The method returns an sdk.TargetStatus, which indicates whether the next Scale call can be performed and provides the current running count used in the next strategy calculation.
func (hp *HarvesterPlugin) Status(config map[string]string) (*sdk.TargetStatus, error) {
    // Parsing of nodeGroup and namespace from config removed for simplicity.
    total, active, _, err := hp.countReady(context.Background(), nodeGroup, namespace)
    if err != nil {
        return nil, fmt.Errorf("failed to count Harvester servers: %v", err)
    }

    return &sdk.TargetStatus{
        Ready: active == total,
        Count: total,
        Meta:  make(map[string]string),
    }, nil
}
Conclusion
Developing a Nomad Autoscaler plugin involves implementing key methods like SetConfig, Scale, and Status. Configuring the plugin requires defining target and policy blocks, which dictate what to scale and under what conditions. Proper handling of node draining gives you control over scale-in operations and helps maintain the reliability of your applications.
Writing an autoscaler plugin lets you tailor your hypervisor to the needs of your Nomad-managed infrastructure. Finally, here's a demo of the autoscaler in action.
For more details and examples, check out the Nomad Autoscaler Plugin authoring guide, and the Nomad Autoscaling tools documentation.
Special thanks to Steve Kalt for helping review this post.