
Resource Efficient Bayesian Optimization: A Technical Analysis
Introduction
Bayesian Optimization (BO) has become a standard approach for optimizing expensive black-box functions, particularly in machine learning hyperparameter tuning. While traditional BO focuses on sample efficiency (minimizing the number of function evaluations), it overlooks computational resource efficiency, a critical factor when optimizing models on cloud or high-performance computing (HPC) infrastructures with heterogeneous resources.
This post analyzes the Resource Efficient Bayesian Optimization (REBO) algorithm, which extends traditional BO to optimize not only what parameters to evaluate but also which computational resources to use for each evaluation.
Background: The Standard Bayesian Optimization Framework
Bayesian Optimization constructs a probabilistic surrogate model of the objective function, typically using Gaussian Process (GP) regression, and employs an acquisition function to determine the next point to evaluate. Standard BO can be formalized as:
Given an objective function $f: \mathcal{X} \to \mathbb{R}$, find $x^*$ such that:

$$x^* = \operatorname*{argmin}_{x \in \mathcal{X}} f(x)$$

The surrogate model provides a posterior distribution over possible functions:

$$f(x) \mid \mathcal{D}_n \sim \mathcal{N}\big(\mu_n(x), \sigma_n^2(x)\big), \qquad \mathcal{D}_n = \{(x_i, y_i)\}_{i=1}^{n}$$

where the mean and variance are defined as:

$$\mu_n(x) = \mathbf{k}_n(x)^\top (\mathbf{K}_n + \sigma_\epsilon^2 \mathbf{I})^{-1} \mathbf{y}_n$$

$$\sigma_n^2(x) = k(x, x) - \mathbf{k}_n(x)^\top (\mathbf{K}_n + \sigma_\epsilon^2 \mathbf{I})^{-1} \mathbf{k}_n(x)$$

with $\mathbf{k}_n(x)$ the vector of covariances between $x$ and the observed points, and $\mathbf{K}_n$ their Gram matrix.

The acquisition function, commonly Expected Improvement (EI), is defined as:

$$\alpha_{EI}(x) = \mathbb{E}\big[\max\big(f(x^+) - f(x),\, 0\big)\big]$$

where $f(x^+)$ is the best (lowest) objective value observed so far.
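For concreteness, here is a minimal sketch of this surrogate-plus-EI machinery for a minimization problem, using scikit-learn's GP regressor. The toy objective, candidate grid, and all variable names are illustrative choices for this sketch, not part of REBO:

```python
# A minimal sketch of GP-based Expected Improvement for minimization.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expected_improvement(X_cand, gp, f_best):
    """Closed-form EI(x) = E[max(f_best - f(x), 0)] under the GP posterior."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)              # guard against zero variance
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Usage: fit the surrogate on observed (x, y) pairs, then score candidates.
X_obs = np.random.uniform(-2, 2, size=(10, 1))
y_obs = (X_obs ** 2).ravel()                      # toy objective f(x) = x^2
gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X_obs, y_obs)
X_cand = np.linspace(-2, 2, 200).reshape(-1, 1)
x_next = X_cand[np.argmax(expected_improvement(X_cand, gp, y_obs.min()))]
```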
The Resource Efficiency Problem
Traditional BO neglects that in cloud and HPC environments, function evaluation costs vary based on:
- Model parameters (e.g., neural network hyperparameters)
- System configuration (e.g., number of nodes, cores, memory allocation)
The evaluation cost can be modeled as:

$$c(x, \theta) = c_u(\theta) \cdot \tau(x, \theta)$$

where:
- $x$ represents the model parameters
- $\theta$ represents the system parameters
- $c_u(\theta)$ is the per-unit-time cost of configuration $\theta$
- $\tau(x, \theta)$ is the time required to evaluate $x$ on configuration $\theta$
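As a purely hypothetical illustration of this decomposition (the instance names, prices, and timings below are invented for the example):

```python
# Hypothetical illustration of c(x, θ) = c_u(θ) · τ(x, θ).
PRICE_PER_HOUR = {"4-core": 0.20, "16-core": 0.80}   # c_u(θ), in $/hour

def evaluation_cost(theta: str, wall_clock_hours: float) -> float:
    """Total dollar cost of one evaluation: price rate times measured time."""
    return PRICE_PER_HOUR[theta] * wall_clock_hours

# A run taking 4.0 h on the 4-core instance costs $0.80, while the same run
# finishing in 1.5 h on the 16-core instance costs $1.20: faster is not
# always cheaper, which is exactly the tradeoff REBO's cost model captures.
print(evaluation_cost("4-core", 4.0))    # ≈ 0.80
print(evaluation_cost("16-core", 1.5))   # ≈ 1.20
```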
Existing cost-efficient approaches like EIpu (Expected Improvement per unit time) modify the acquisition function but still operate in a static computing environment, limiting potential cost savings.
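For contrast, here is a hedged sketch of the EIpu idea, reusing `expected_improvement` from the sketch above and assuming a second GP, `gp_log_time`, fit to log wall-clock times (a common modeling choice, not a detail fixed by this post):

```python
import numpy as np

def ei_per_unit_time(X_cand, gp_f, gp_log_time, f_best):
    """EIpu: EI(x) divided by the predicted evaluation time at x."""
    ei = expected_improvement(X_cand, gp_f, f_best)
    log_t = gp_log_time.predict(X_cand)      # predicted log runtime
    return ei / np.exp(log_t)                # favor cheap-to-evaluate regions
```

Note that the runtime model here depends only on $x$: the computing environment itself is fixed, which is the limitation REBO removes.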
The REBO Methodology
REBO decouples the optimization problem into two sequential steps:
1. Determine the next model parameters $x$ to evaluate using standard EI.
2. Select the optimal system configuration $\theta$ for evaluating those parameters.
The algorithm maintains two distinct Gaussian Process models:
- $\mathcal{GP}_f$ for the objective function
- $\mathcal{GP}_c$ for the Resource Cost Model (RCM), denoted $\gamma$ in the algorithm below

The optimal system configuration for a given set of model parameters is determined by:

$$\theta^*(x) = \operatorname*{argmin}_{\theta \in \Omega} \big[\, \mu_c(\theta \mid x) + \lambda\, \sigma_c(\theta \mid x) \,\big]$$

where $\lambda$ is a tunable parameter controlling the exploration-exploitation tradeoff in cost prediction.
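For a finite candidate set $\Omega$, this configuration step can be sketched as below. The input encoding (concatenating $x$ and $\theta$) and the GP `gp_cost` are assumptions of this sketch, not details fixed by the text:

```python
import numpy as np

def select_configuration(x_next, Omega, gp_cost, lam=1.0):
    """Pick θ minimizing μ_c(θ|x) + λ σ_c(θ|x) over a finite candidate set.

    Assumes gp_cost was fit on concatenated (x, θ) inputs with observed
    costs as targets; that encoding is this sketch's choice.
    """
    inputs = np.hstack([np.tile(x_next, (len(Omega), 1)), Omega])
    mu_c, sigma_c = gp_cost.predict(inputs, return_std=True)
    return Omega[np.argmin(mu_c + lam * sigma_c)]
```

A larger `lam` penalizes configurations with uncertain cost predictions more heavily (risk-averse), while `lam = 0` greedily picks the lowest predicted cost.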
Algorithm Breakdown
Algorithm: REBO (Resource Efficient Bayesian Optimization)
Input:
Objective function f,
model parameter space X,
system parameter space Ω
Output: Optimal (x*, θ*) pair
1. Initialize with random (x, θ) pairs and evaluate f
2. While stopping criterion not met:
a. Update surrogate model GP_f with observations
b. Update Resource Cost Model γ with cost observations
c. Compute next point: x_{n+1} = argmax_x α_EI(x)
d. Determine optimal system configuration:
θ_{n+1} = argmin_θ [ μ_c(θ | x_{n+1}) + λ · σ_c(θ | x_{n+1}) ]
e. Evaluate f at (x_{n+1}, θ_{n+1})
f. Update models with new observations
3. Return best solution found
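Putting the pieces together, here is a hedged end-to-end sketch of this loop, reusing the `expected_improvement` and `select_configuration` helpers from the earlier sketches. The finite candidate sets `X_cand` and `Omega`, the `evaluate` callback returning an `(objective, cost)` pair, and all defaults are assumptions of this sketch, not REBO's reference implementation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def rebo(evaluate, X_cand, Omega, n_init=5, n_iter=20, lam=1.0):
    rng = np.random.default_rng(0)
    # 1. Initialize with random (x, θ) pairs and evaluate f
    X = X_cand[rng.choice(len(X_cand), n_init)]
    T = Omega[rng.choice(len(Omega), n_init)]
    results = [evaluate(x, t) for x, t in zip(X, T)]
    y = np.array([r[0] for r in results])          # objective values
    c = np.array([r[1] for r in results])          # observed costs
    for _ in range(n_iter):
        # 2a/2b. Refit the objective surrogate and the Resource Cost Model
        gp_f = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, y)
        gp_c = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(
            np.hstack([X, T]), c)
        # 2c. Next model parameters by maximizing EI
        x_next = X_cand[np.argmax(expected_improvement(X_cand, gp_f, y.min()))]
        # 2d. Risk-adjusted cheapest configuration for x_next
        t_next = select_configuration(x_next, Omega, gp_c, lam)
        # 2e/2f. Evaluate and append the new observation
        f_val, cost = evaluate(x_next, t_next)
        X, T = np.vstack([X, x_next]), np.vstack([T, t_next])
        y, c = np.append(y, f_val), np.append(c, cost)
    # 3. Return best solution found
    best = np.argmin(y)
    return X[best], T[best], y[best]

# Toy usage with an invented objective and cost:
# rebo(lambda x, t: (float((x**2).sum()), 0.01 * float(t.prod())), X_cand, Omega)
```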
Experimental Results
REBO was evaluated on synthetic benchmark functions (Ackley, Rosenbrock, Matyas) and real-world hyperparameter optimization problems for neural networks, decision trees, and random forests using multiple datasets. The algorithm demonstrated significant cost reductions compared to traditional approaches (EI, EIpu, CEI, EI-cool) across different system configurations, while maintaining comparable or identical convergence performance.
Technical Implications
REBO's effectiveness stems from several key technical innovations:
- Dual surrogate modeling: By modeling both the objective function and its computational cost, REBO captures the complex relationship between model parameters, system configurations, and evaluation costs.
- Decoupled acquisition strategy: Rather than creating a complex composite acquisition function, REBO separates parameter selection from configuration selection, simplifying the optimization process.
- Upper confidence bound minimization: The algorithm uses uncertainty-aware selection of system configurations, balancing exploitation of known cost-effective configurations against exploration of potentially better alternatives.
- Computational overhead management: Despite maintaining an additional GP model, the computational overhead of REBO is negligible compared to the cost savings in function evaluations.
Implementation Considerations
Implementing REBO requires several technical considerations:
- The Resource Cost Model requires appropriate kernel selection. The authors found that Radial Basis Function (RBF) kernels work effectively for modeling evaluation costs (see the sketch after this list).
- The system configuration space $\Omega$ should be defined to capture the relevant dimensions of the available computing infrastructure (nodes, cores, memory, etc.).
- The exploration parameter $\lambda$ in the cost model should be tuned based on the reliability of cost observations and the risk tolerance of the optimization process.
- The time function $\tau(x, \theta)$ is typically unknown a priori and must be learned from observations, which introduces a cold-start challenge.
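As a sketch of the first two points, here is one way to define a small discrete configuration space and fit an RBF-kernel RCM with scikit-learn. The dimensions, value grids, and synthetic costs are hypothetical; for simplicity this sketch conditions the cost model on $\theta$ alone, whereas REBO's RCM also conditions on $x$:

```python
import itertools
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical discrete configuration space Ω: (nodes, cores, memory in GB).
nodes, cores, mem_gb = [1, 2, 4], [4, 8, 16], [16, 32, 64]
Omega = np.array(list(itertools.product(nodes, cores, mem_gb)), dtype=float)

# Fit the RCM on (θ, observed cost) pairs. A WhiteKernel term absorbs
# run-to-run timing noise (a common choice, not mandated by REBO).
theta_obs = Omega[::3]                                      # pretend we ran these
cost_obs = 0.01 * theta_obs[:, 0] * theta_obs[:, 1] + 0.5   # synthetic costs
kernel = RBF(length_scale=np.ones(Omega.shape[1])) + WhiteKernel()
gp_rcm = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(
    theta_obs, cost_obs)
```

The per-dimension `length_scale` lets the kernel learn that, say, core count affects cost more strongly than memory; whether that holds depends on the workload.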
Conclusion
REBO represents a significant advancement in practical Bayesian Optimization for distributed computing environments. By explicitly modeling and optimizing the computational resources used during the optimization process, it achieves substantial cost reductions without compromising optimization quality.
This approach is particularly valuable for industries and research domains where computational costs are a significant constraint, such as large-scale machine learning, computational physics, and drug discovery simulations.