Namit Juneja

Research Scientist

Resource Efficient Bayesian Optimization: A Technical Analysis

Introduction

Bayesian Optimization (BO) has become a standard approach for optimizing expensive black-box functions, particularly in machine learning hyperparameter tuning. While traditional BO focuses on sample efficiency (minimizing function evaluations), it overlooks computational resource efficiency, a critical factor when optimizing models on cloud or high-performance computing (HPC) infrastructures with heterogeneous resources.

This post analyzes the Resource Efficient Bayesian Optimization (REBO) algorithm, which extends traditional BO to optimize not only what parameters to evaluate but also which computational resources to use for each evaluation.


Background: The Standard Bayesian Optimization Framework

Bayesian Optimization constructs a probabilistic surrogate model of the objective function, typically using Gaussian Process (GP) regression, and employs an acquisition function to determine the next point to evaluate. Standard BO can be formalized as:

Given an objective function $f: \mathcal{X} \rightarrow \mathbb{R}$, find $x^* \in \mathcal{X}$ such that:

$$x^* = \underset{x \in \mathcal{X}}{\text{argmin}} \, f(x)$$

The surrogate model provides a posterior distribution over possible functions:

$$f(x) \mid \{x_{1:n}, y_{1:n}\} \sim \mathcal{N}(\mu_n(x), \sigma_n^2(x))$$

Where the mean and variance are defined as:

$$\mu_n(x) = K_*^T [K + \sigma^2 I]^{-1} y$$
$$\sigma_n^2(x) = K_{**} - K_*^T [K + \sigma^2 I]^{-1} K_*$$

The acquisition function, commonly Expected Improvement (EI), is defined as:

$$\alpha_{EI}(x) = \mathbb{E}\left[\max(0, f(x^+) - f(x)) \mid \{x_{1:n}, y_{1:n}\}\right]$$

Where $x^+$ is the current best solution.
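
To make the surrogate and the acquisition function concrete, here is a minimal sketch of EI for minimization, assuming a scikit-learn GP surrogate; the toy objective, candidate grid, and the helper name `expected_improvement` are illustrative choices, not details from the paper.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    """EI for minimization: E[max(0, f(x^+) - f(x))] under the GP posterior."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)        # guard against zero predictive std
    improvement = y_best - mu - xi         # expected gain over the incumbent x^+
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)

# Fit the surrogate GP on observed (x, y) pairs and score a candidate grid.
rng = np.random.default_rng(0)
X_obs = rng.uniform(-2, 2, size=(8, 1))
y_obs = (X_obs ** 2).ravel()               # toy objective: f(x) = x^2
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gp.fit(X_obs, y_obs)

X_cand = np.linspace(-2, 2, 200).reshape(-1, 1)
ei = expected_improvement(X_cand, gp, y_best=y_obs.min())
x_next = X_cand[np.argmax(ei)]             # x_{n+1} = argmax_x alpha_EI(x)
```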


The Resource Efficiency Problem

Traditional BO ignores the fact that, in cloud and HPC environments, function evaluation costs vary with both:

  1. Model parameters (e.g., neural network hyperparameters)
  2. System configuration (e.g., number of nodes, cores, memory allocation)

The evaluation cost can be modeled as:

$$c(x, \theta) = u(\theta) \times t(x, \theta)$$

Where:

  • $x \in \mathcal{X}$ represents model parameters
  • $\theta \in \Omega$ represents system parameters
  • $u(\theta)$ is the per-unit-time cost of configuration $\theta$
  • $t(x, \theta)$ is the time function for evaluating $x$ on configuration $\theta$
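
As a concrete illustration, the sketch below shows how an observed cost could be assembled from a measured wall-clock time and a per-unit-time price for the chosen configuration; the price table and the `evaluate_with_cost` helper are hypothetical, not part of the paper.

```python
import time

# Hypothetical per-hour prices for a few system configurations theta = (nodes, cores).
UNIT_COST_PER_HOUR = {
    (1, 8): 0.34,     # illustrative prices, not taken from the paper
    (2, 16): 0.68,
    (4, 32): 1.36,
}

def evaluate_with_cost(train_fn, x, theta):
    """Run one evaluation and return the objective value and the observed cost c(x, theta)."""
    start = time.perf_counter()
    y = train_fn(x, theta)                                   # t(x, theta) is measured, not known a priori
    elapsed_hours = (time.perf_counter() - start) / 3600.0
    cost = UNIT_COST_PER_HOUR[theta] * elapsed_hours         # c(x, theta) = u(theta) * t(x, theta)
    return y, cost
```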

Existing cost-efficient approaches like EIpu (Expected Improvement per unit time) modify the acquisition function but still operate in a static computing environment, limiting potential cost savings.


The REBO Methodology

REBO decouples the optimization problem into two sequential steps:

  1. Determine the next model parameters $x$ to evaluate using standard EI
  2. Select the optimal system configuration $\theta$ for evaluating those parameters

The algorithm maintains two distinct Gaussian Process models:

  • $\mathcal{GP}_f: \mathcal{X} \rightarrow \mathbb{R}$ for the objective function
  • $\gamma: \mathcal{X} \times \Omega \rightarrow \mathbb{R}$ for the Resource Cost Model (RCM)

The optimal system configuration for a given set of model parameters $x'$ is determined by:

$$\theta' = \underset{\theta \in \Omega}{\text{argmin}} \left( \mu(\theta \mid x') + \lambda \, \sigma(\theta \mid x') \right)$$

Where $\lambda$ is a tunable parameter controlling the exploration-exploitation tradeoff in cost prediction.
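
A minimal sketch of this configuration-selection step, assuming the RCM is a scikit-learn GP fitted on concatenated (x, θ) features and that candidate configurations form a small discrete set; `select_configuration` is an illustrative helper, not the authors' implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def select_configuration(rcm: GaussianProcessRegressor, x_next, candidate_thetas, lam=1.0):
    """Pick theta' = argmin_theta mu(theta | x') + lambda * sigma(theta | x')."""
    # Build joint (x, theta) feature rows for the Resource Cost Model.
    X_joint = np.array([np.concatenate([x_next, theta]) for theta in candidate_thetas])
    mu, sigma = rcm.predict(X_joint, return_std=True)
    scores = mu + lam * sigma
    return candidate_thetas[int(np.argmin(scores))]
```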

Algorithm Breakdown

Algorithm: REBO (Resource Efficient Bayesian Optimization)
Input: 
   Objective function f, 
   model parameter space X, 
   system parameter space Ω
Output: Optimal (x*, θ*) pair

1. Initialize with random (x, θ) pairs and evaluate f
2. While stopping criterion not met:
   a. Update surrogate model GPf with observations
   b. Update Resource Cost Model γ with cost observations
   c. Compute next point: x(n+1) = argmax(αEI(x))
   d. Determine optimal system configuration: 
      θ(n+1) = argmin(μc(θ|x(n+1)) + λσc(θ|x(n+1)))
   e. Evaluate f at (x(n+1), θ(n+1))
   f. Update models with new observations
3. Return best solution found
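
Putting the pieces together, below is a compact sketch of the loop above, reusing the `expected_improvement` and `select_configuration` helpers from the earlier snippets; the random candidate search, kernel choices, and fixed iteration budget are simplifying assumptions rather than details from the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def rebo(f, sample_x, candidate_thetas, n_init=5, n_iter=20, lam=1.0):
    """Minimal REBO-style loop: EI chooses x, a cost confidence bound chooses theta."""
    X, Y, XT, C = [], [], [], []
    # Step 1: random initial (x, theta) pairs.
    for _ in range(n_init):
        x = sample_x()
        theta = candidate_thetas[np.random.randint(len(candidate_thetas))]
        y, c = f(x, theta)                             # f returns (objective value, observed cost)
        X.append(x); Y.append(y)
        XT.append(np.concatenate([x, theta])); C.append(c)

    for _ in range(n_iter):
        # Steps 2a-2b: refit the objective surrogate GP_f and the Resource Cost Model gamma.
        gp_f = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(np.array(X), np.array(Y))
        rcm = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(np.array(XT), np.array(C))
        # Step 2c: pick x_{n+1} by maximizing EI over a random candidate set.
        cands = np.array([sample_x() for _ in range(256)])
        x_next = cands[np.argmax(expected_improvement(cands, gp_f, y_best=min(Y)))]
        # Step 2d: pick theta_{n+1} by minimizing mu + lambda * sigma of the predicted cost.
        theta_next = select_configuration(rcm, x_next, candidate_thetas, lam)
        # Steps 2e-2f: evaluate and record both the objective value and the cost.
        y, c = f(x_next, theta_next)
        X.append(x_next); Y.append(y)
        XT.append(np.concatenate([x_next, theta_next])); C.append(c)

    best = int(np.argmin(Y))
    return X[best], Y[best]
```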

Experimental Results

REBO was evaluated on synthetic benchmark functions (Ackley, Rosenbrock, Matyas) and real-world hyperparameter optimization problems for neural networks, decision trees, and random forests using multiple datasets. The algorithm demonstrated significant cost reductions compared to traditional approaches (EI, EIpu, CEI, EI-cool) across different system configurations, while maintaining comparable or identical convergence performance.


Technical Implications

REBO's effectiveness stems from several key technical innovations:

  1. Dual surrogate modeling: By modeling both the objective function and its computational cost, REBO captures the complex relationship between model parameters, system configurations, and evaluation costs.

  2. Decoupled acquisition strategy: Rather than creating a complex composite acquisition function, REBO separates parameter selection from configuration selection, simplifying the optimization process.

  3. Upper confidence bound minimization: The algorithm uses uncertainty-aware selection of system configurations, balancing between known cost-effective configurations and exploring potentially better alternatives.

  4. Computational overhead management: Despite maintaining an additional GP model, the computational overhead of REBO is negligible compared to the cost savings in function evaluations.

Implementation Considerations

Implementing REBO requires several technical considerations:

  1. The Resource Cost Model requires appropriate kernel selection. The authors found that Radial Basis Function (RBF) kernels work effectively for modeling evaluation costs (a possible setup is sketched after this list).

  2. System configuration space $\Omega$ should be defined to capture relevant dimensions of the available computing infrastructure (nodes, cores, memory, etc.).

  3. The exploration parameter $\lambda$ in the cost model should be tuned based on the reliability of cost observations and the risk tolerance of the optimization process.

  4. The time cost function $t(x, \theta)$ is typically unknown a priori and must be learned through observations, which introduces a cold-start challenge.
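
For the first consideration, a possible RCM setup with an RBF kernel might look like the following; the added white-noise term, hyperparameter values, and variable names are illustrative assumptions, not taken from the paper.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Resource Cost Model gamma over concatenated (x, theta) inputs.
# The RBF length scale and white-noise level are illustrative starting points.
rcm_kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2)
rcm = GaussianProcessRegressor(kernel=rcm_kernel, normalize_y=True, n_restarts_optimizer=5)
# rcm.fit(XT, C)   # XT: rows of concatenated (x, theta) features, C: observed evaluation costs
```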

Conclusion

REBO represents a significant advancement in practical Bayesian Optimization for distributed computing environments. By explicitly modeling and optimizing the computational resources used during the optimization process, it achieves substantial cost reductions without compromising optimization quality.

This approach is particularly valuable for industries and research domains where computational costs are a significant constraint, such as large-scale machine learning, computational physics, and drug discovery simulations.


References

Juneja, N., Desai, P., Wodo, O., Zola, J. and Chandola, V., "Resource Efficient Bayesian Optimization", In IEEE International Conference on Cloud Computing (IEEE Cloud), 2024.