
Hyperparameter Tuning with RandomizedSearchCV

RandomizedSearchCV samples a fixed number of hyperparameter combinations from specified distributions instead of exhaustively testing every combination like GridSearchCV. For large parameter spaces (10^5+ configurations), random search is orders of magnitude cheaper than grid search while still finding competitive solutions. Bergstra and Bengio (2012) showed that, on a fixed budget, random search often matches or beats grid search, because every trial tests a new value of every parameter rather than revisiting the same few grid values. Use RandomizedSearchCV when your grid is too large, or when you are exploring an unfamiliar parameter space.

Imagine tuning 6 hyperparameters with 10 values each: that is 10^6 = 1,000,000 configurations. GridSearchCV would evaluate all 1 million of them (each one multiplied by the number of CV folds). RandomizedSearchCV with n_iter=100 evaluates only 100 sampled configurations, which is 10,000x fewer fits, usually for a small loss in accuracy.

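More importantly, random search often finds better solutions than grid search on a fixed budget. Why? A grid with n points in d dimensions tries only about n^(1/d) distinct values of each parameter. If only one or two parameters really matter, most of those trials repeat the same few values of the important ones. Random search draws a fresh value for every parameter on every trial, so the same budget covers each important parameter far more densely and reaches combinations that lie between grid points.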
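One way to see the fixed-budget effect is to count how many distinct values of a single parameter each strategy tries. The sketch below is illustrative only: it assumes a two-parameter unit square and a budget of 9 trials, neither of which comes from the examples in this article.

```python
import numpy as np

rng = np.random.default_rng(0)
budget = 9  # total configurations we can afford (illustrative)

# Grid: 3 values per axis -> 3 x 3 = 9 configurations
axis_values = np.linspace(0.0, 1.0, 3)
grid = np.array([(a, b) for a in axis_values for b in axis_values])

# Random: 9 configurations drawn uniformly from the same unit square
random_points = rng.uniform(0.0, 1.0, size=(budget, 2))

# How many different values of the first parameter did each strategy try?
grid_distinct = len(np.unique(grid[:, 0]))
random_distinct = len(np.unique(random_points[:, 0]))
print(f"Distinct values of parameter 1: grid={grid_distinct}, random={random_distinct}")
```

With the same 9 trials, the grid tests 3 values of the first parameter while random sampling tests 9. If the second parameter turns out to be unimportant, the grid has effectively spent its budget on three experiments.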

from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
import time

iris = load_iris()
X, y = iris.data, iris.target

# Large parameter space
param_grid = {
    'n_estimators': [50, 100, 150, 200, 250],
    'max_depth': [5, 10, 15, 20, 25, 30],
    'min_samples_split': [2, 3, 4, 5, 10],
    'min_samples_leaf': [1, 2, 3, 4, 5],
    'max_features': ['sqrt', 'log2']
}

# Total: 5 * 6 * 5 * 5 * 2 = 1500 configurations

# GridSearchCV: exhaustive search
start = time.time()
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    n_jobs=-1
)
grid_search.fit(X, y)
grid_time = time.time() - start
print(f"GridSearchCV (1500 configs): {grid_time:.1f}s")
print(f"Best score: {grid_search.best_score_:.3f}")

# RandomizedSearchCV: sample 100 random configurations
start = time.time()
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    n_iter=100,  # Sample 100 random combinations
    cv=5,
    n_jobs=-1,
    random_state=42
)
random_search.fit(X, y)
random_time = time.time() - start
print(f"RandomizedSearchCV (100 sampled): {random_time:.1f}s")
print(f"Best score: {random_search.best_score_:.3f}")

print(f"Speedup: {grid_time / random_time:.1f}x faster")

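For this example, RandomizedSearchCV fits 15x fewer models (100 vs. 1500 configurations), so expect roughly a 10-15x speedup. Because it samples from the same grid and uses the same CV splits, its best score can at most equal GridSearchCV's; in practice it is usually very close.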

RandomizedSearchCV Basics

RandomizedSearchCV accepts distributions as well as discrete lists. Any object with an rvs method works, so use scipy.stats distributions for parameters with continuous or very wide ranges:

from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform, loguniform
from sklearn.svm import SVC
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Parameter distributions (not grids)
param_dist = {
    'C': loguniform(0.01, 100),           # Log-uniform between 0.01 and 100
    'gamma': loguniform(0.001, 1),        # Log-uniform for gamma (SVM)
    'kernel': ['linear', 'rbf', 'poly'],  # Discrete choices still work
}

# RandomizedSearchCV
random_search = RandomizedSearchCV(
    SVC(),
    param_dist,
    n_iter=50,       # Sample 50 random configurations
    cv=5,
    n_jobs=-1,
    random_state=42  # Reproducible random sampling
)

random_search.fit(X, y)
print(f"Best parameters: {random_search.best_params_}")
print(f"Best CV score: {random_search.best_score_:.3f}")

Key distributions from scipy.stats:

  • randint(a, b): random integer from a to b - 1 (the upper bound is exclusive)
  • uniform(loc, scale): continuous uniform on [loc, loc + scale]; the second argument is a width, not an upper bound
  • loguniform(a, b): log-uniform between a and b (useful for C, alpha, learning_rate)
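The ranges of these distributions are easy to get wrong, so it can help to sample from them directly before handing them to a search. The snippet below is a small check, not part of any scikit-learn workflow; the specific bounds are illustrative. Note in particular that scipy's uniform takes (loc, scale), so a range of 0.5 to 1.0 is written uniform(0.5, 0.5).

```python
from scipy.stats import randint, uniform, loguniform

n = 10_000

# randint(low, high): integers from low to high - 1
ints = randint(3, 15).rvs(size=n, random_state=0)

# uniform(loc, scale): floats in [loc, loc + scale], here [0.5, 1.0]
floats = uniform(loc=0.5, scale=0.5).rvs(size=n, random_state=0)

# loguniform(a, b): floats in [a, b], uniform on a log scale
logs = loguniform(0.01, 100).rvs(size=n, random_state=0)

print(f"randint(3, 15):       min={ints.min()}, max={ints.max()}")
print(f"uniform(0.5, 0.5):    min={floats.min():.3f}, max={floats.max():.3f}")
print(f"loguniform(0.01,100): min={logs.min():.4f}, max={logs.max():.2f}")
```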

Choosing Distributions: Log-Uniform for Exponential Parameters

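Some hyperparameters act on a multiplicative scale, where changing the value by a factor of 10 matters more than adding a constant: C in SVM, alpha in regularization, learning_rate in gradient boosting and neural networks. Use log-uniform distributions for these: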

from scipy.stats import loguniform, uniform

# BAD: uniform distribution for C
# scipy's uniform(loc, scale) covers [loc, loc + scale], here [0.01, 100.01]
# About 90% of samples land between 10 and 100; only about 1% fall below 1
param_dist_bad = {
    'C': uniform(0.01, 100)  # Not uniform on log scale!
}

# GOOD: log-uniform distribution for C
# This samples uniformly on the log scale (0.01 to 100)
# Each decade (0.01-0.1, 0.1-1, 1-10, 10-100) gets the same probability
param_dist_good = {
    'C': loguniform(0.01, 100)
}

# For regularization strength (higher alpha is stronger)
param_dist = {
    'alpha': loguniform(1e-5, 1e2)  # Covers 0.00001 to 100
}

# For learning rate
param_dist = {
    'learning_rate': loguniform(1e-4, 1e-1)  # Covers 0.0001 to 0.1
}

Log-uniform is essential when parameters span multiple orders of magnitude.
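To make the difference concrete, the following sketch counts what fraction of samples lands in each decade between 0.01 and 100 under both distributions. The sample size and the helper name decade_fractions are choices made for this illustration.

```python
import numpy as np
from scipy.stats import loguniform, uniform

n = 100_000
# uniform(loc, scale) covers [loc, loc + scale], so this is [0.01, 100]
linear_samples = uniform(loc=0.01, scale=99.99).rvs(size=n, random_state=0)
log_samples = loguniform(0.01, 100).rvs(size=n, random_state=0)

decade_edges = [0.01, 0.1, 1, 10, 100]

def decade_fractions(samples):
    """Fraction of samples falling in each decade between 0.01 and 100."""
    counts, _ = np.histogram(samples, bins=decade_edges)
    return counts / len(samples)

linear_frac = decade_fractions(linear_samples)
log_frac = decade_fractions(log_samples)

for i in range(4):
    label = f"{decade_edges[i]}-{decade_edges[i + 1]}"
    print(f"{label:>10}: uniform={linear_frac[i]:.3f}  loguniform={log_frac[i]:.3f}")
```

The uniform distribution puts roughly 90% of its samples in the top decade (10 to 100) and almost none below 0.1, while the log-uniform distribution gives each decade about a quarter of the samples.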

Combining Discrete and Continuous Parameters

Mix lists and distributions freely:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform, loguniform

# Binary dataset: loss='exponential' is only supported for two-class problems
X, y = load_breast_cancer(return_X_y=True)

param_dist = {
    # Discrete choices
    'loss': ['log_loss', 'exponential'],
    # Distributions
    'learning_rate': loguniform(0.001, 1),
    'max_depth': randint(3, 15),          # Integers 3 to 14
    'min_samples_split': randint(2, 20),  # Integers 2 to 19
    'subsample': uniform(0.5, 0.5),       # uniform(loc, scale): between 0.5 and 1.0
}

random_search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_dist,
    n_iter=50,
    cv=5,
    n_jobs=-1,
    random_state=42
)

random_search.fit(X, y)

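RandomizedSearchCV handles both kinds of entry the same way on each iteration: a list is sampled uniformly at random, and a distribution is sampled by calling its rvs method. If every entry is a list, sampling is done without replacement, so no configuration is evaluated twice.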
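To preview what a mixed dictionary will produce without fitting any models, scikit-learn's ParameterSampler can draw configurations directly. The sketch below reuses gradient-boosting parameter names for illustration; the subsample range is written as uniform(0.5, 0.5) because scipy's uniform takes (loc, scale).

```python
from scipy.stats import loguniform, randint, uniform
from sklearn.model_selection import ParameterSampler

param_dist = {
    'loss': ['log_loss', 'exponential'],    # list: sampled uniformly
    'learning_rate': loguniform(0.001, 1),  # distribution: sampled via rvs
    'max_depth': randint(3, 15),            # integers 3 to 14
    'subsample': uniform(0.5, 0.5),         # floats in [0.5, 1.0]
}

# Draw 5 configurations exactly as RandomizedSearchCV would draw them
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=42))
for config in samples:
    print(config)
```

Printing a handful of sampled configurations is a cheap way to catch range mistakes before launching a long search.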

Analyzing RandomizedSearchCV Results

Inspect the best configurations and the spread of scores:

import pandas as pd

# Assumes random_search is a fitted search over C and gamma (such as the SVC example above)
# Convert results to DataFrame
results_df = pd.DataFrame(random_search.cv_results_)

# Top 10 configurations
print(
    results_df[['param_C', 'param_gamma', 'mean_test_score', 'std_test_score']]
    .sort_values('mean_test_score', ascending=False)
    .head(10)
)

# Distribution of CV scores
print(f"Score range: {results_df['mean_test_score'].min():.3f} to {results_df['mean_test_score'].max():.3f}")
print(f"Score std across configurations: {results_df['mean_test_score'].std():.3f}")

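A wide spread in mean CV scores suggests the hyperparameters matter; nearly uniform scores suggest the model is insensitive to them over the sampled range. Compare the spread with std_test_score, since differences smaller than the fold-to-fold noise are not meaningful.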
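A rough way to see which hyperparameter drives the scores is to rank-correlate each sampled parameter with the mean CV score. The sketch below is one possible approach, not a standard scikit-learn utility: it runs a small SVC search on iris (n_iter=30 and cv=3 are arbitrary choices to keep it fast) and computes a Spearman-style correlation by ranking both columns.

```python
import numpy as np
import pandas as pd
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

search = RandomizedSearchCV(
    SVC(kernel='rbf'),
    {'C': loguniform(0.01, 100), 'gamma': loguniform(0.001, 1)},
    n_iter=30,
    cv=3,
    random_state=0,
)
search.fit(X, y)

df = pd.DataFrame(search.cv_results_)
# param_* columns may have object dtype; convert before taking logs
df['log10_C'] = np.log10(df['param_C'].astype(float))
df['log10_gamma'] = np.log10(df['param_gamma'].astype(float))

# Rank correlation between each parameter and the mean CV score
params = df[['log10_C', 'log10_gamma']].rank()
corr = params.corrwith(df['mean_test_score'].rank())
print(corr)
```

A correlation near zero does not prove a parameter is irrelevant, since the relationship may be non-monotonic (for example, a peak in the middle of the range), so treat this as a first look rather than a conclusion.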

RandomizedSearchCV vs. GridSearchCV: When to Use Each

| Factor | GridSearchCV | RandomizedSearchCV |
| Parameter space size | Small (10^2 - 10^4) | Large (10^5+) |
| Time budget | Unlimited | Limited |
| Parameter importance | Known | Unknown |
| Coverage | Complete lattice | Probabilistic |
| Reproducibility | Exact | Reproducible with random_state |
| Recommended use | 20-100 total configs | 100+ total configs |

For small grids (under 100 configurations), use GridSearchCV for completeness. For large spaces or tight time budgets, use RandomizedSearchCV.
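Before choosing, it is worth counting how many configurations a grid actually contains. scikit-learn's ParameterGrid reports this without fitting anything. The sketch below uses the random-forest grid from the first example; the 5-fold CV count is likewise taken from that example.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'n_estimators': [50, 100, 150, 200, 250],
    'max_depth': [5, 10, 15, 20, 25, 30],
    'min_samples_split': [2, 3, 4, 5, 10],
    'min_samples_leaf': [1, 2, 3, 4, 5],
    'max_features': ['sqrt', 'log2'],
}

n_configs = len(ParameterGrid(param_grid))
cv_folds = 5
total_fits = n_configs * cv_folds

print(f"{n_configs} configurations, {total_fits} fits")
# prints: 1500 configurations, 7500 fits
```

If the fit count is far beyond what your time budget allows, switch to RandomizedSearchCV and set n_iter to the number of configurations you can afford.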

Fine-Tuning After RandomizedSearchCV

Use RandomizedSearchCV as a first pass to narrow the search space, then GridSearchCV for final refinement:

from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV, train_test_split
from sklearn.svm import SVC
from scipy.stats import loguniform

# Hold out a test set so the final score is measured on unseen data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Phase 1: Broad random search
param_dist = {
    'C': loguniform(0.01, 100),
    'gamma': loguniform(0.001, 1),
}

random_search = RandomizedSearchCV(
    SVC(),
    param_dist,
    n_iter=50,
    cv=5,
    n_jobs=-1,
    random_state=42
)
random_search.fit(X_train, y_train)
print(f"Random search best: {random_search.best_params_}")

# Phase 2: Fine grid around the best random result
best_C = random_search.best_params_['C']
best_gamma = random_search.best_params_['gamma']

# Factor of 10 each way; use a smaller factor or more points for a finer grid
param_grid_fine = {
    'C': [best_C / 10, best_C, best_C * 10],
    'gamma': [best_gamma / 10, best_gamma, best_gamma * 10],
}

grid_search_fine = GridSearchCV(
    SVC(),
    param_grid_fine,
    cv=5,
    n_jobs=-1
)
grid_search_fine.fit(X_train, y_train)
print(f"Fine grid best: {grid_search_fine.best_params_}")
print(f"Test score: {grid_search_fine.score(X_test, y_test):.3f}")

This two-phase approach balances exploration (random) and exploitation (grid).

Key Takeaways

  • RandomizedSearchCV evaluates a fixed number of sampled configurations, which is often 10-100x cheaper than grid search for large spaces.
  • Use log-uniform distributions for exponential parameters like C, alpha, and learning_rate.
  • On a fixed budget, RandomizedSearchCV often finds better solutions than grid search because it tries more distinct values of each parameter.
  • Combine discrete choices and continuous distributions in the same param_distributions dictionary.
  • For exploration use random search; for refinement use grid search.

Frequently Asked Questions

How many iterations (n_iter) should I use?

As many as your time budget allows. A common heuristic: try 100-200 iterations for initial exploration. More iterations improve coverage but with diminishing returns. For very large parameter spaces, even 50 iterations can reveal good regions.
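A back-of-the-envelope calculation helps explain the diminishing returns. Assuming each draw is independent and the "good" region occupies a fixed fraction of the search space by probability mass (both are simplifying assumptions), the chance that at least one of n draws lands in that region is 1 - (1 - fraction)^n. The function name below is made up for this illustration.

```python
def prob_hit_top_fraction(n_iter, top_fraction=0.05):
    """Probability that at least one of n_iter independent random draws
    falls in a region covering top_fraction of the search space."""
    return 1.0 - (1.0 - top_fraction) ** n_iter

for n in (10, 50, 60, 100, 200):
    print(f"n_iter={n:>3}: P(at least one draw in top 5%) = {prob_hit_top_fraction(n):.3f}")
```

Under these assumptions, about 60 iterations already give roughly a 95% chance of hitting the top 5% region, and going from 100 to 200 iterations adds very little. This says nothing about how good that region is, only how likely you are to sample from it.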

Does random_state affect reproducibility?

Yes. Set random_state to a fixed integer for reproducible sampling. Different random_state values will sample different configurations (as expected from randomization).
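The effect of random_state on sampling can be checked without fitting any models by using ParameterSampler, which draws configurations the same way RandomizedSearchCV does. The parameter dictionary and seeds below are illustrative.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

param_dist = {
    'C': loguniform(0.01, 100),
    'kernel': ['linear', 'rbf'],
}

# Same seed twice, then a different seed
draw_a = list(ParameterSampler(param_dist, n_iter=5, random_state=42))
draw_b = list(ParameterSampler(param_dist, n_iter=5, random_state=42))
draw_c = list(ParameterSampler(param_dist, n_iter=5, random_state=7))

print("Same seed, same configurations:     ", draw_a == draw_b)
print("Different seed, same configurations:", draw_a == draw_c)
```

Keep in mind that this only fixes which configurations are sampled. For fully reproducible scores, the estimator and any shuffling CV splitter need their own fixed random_state as well.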

Should I use RandomizedSearchCV or Bayesian optimization?

Bayesian optimization (e.g., Optuna, Hyperopt) can be more efficient when each fit is expensive: it uses past results to guide future sampling. RandomizedSearchCV is simpler and requires no additional packages. When fits are cheap, for example on datasets with under about 1,000 samples, the difference is usually small; use RandomizedSearchCV for simplicity.

Can I parallelize RandomizedSearchCV?

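Yes, n_jobs=-1 uses all CPU cores. Each (configuration, CV fold) pair is an independent fit, so all of them can run in parallel. Speedup is typically near linear in the number of cores until memory or I/O limits are reached. If the estimator has its own n_jobs parameter, consider leaving it at 1 so the two levels of parallelism do not oversubscribe the CPU.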

What if my best parameters are at the boundary of my distributions?

It suggests your distribution range is too narrow. Expand the range and re-run RandomizedSearchCV. For example, if best_C is close to 100 in loguniform(0.01, 100), try loguniform(0.01, 1000) next.
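This check can be automated. The helper below is a hypothetical utility written for this article, not a scikit-learn function: it measures where a value sits within a log-scaled range and flags it if it lies within a chosen margin (5% of the log-range by default, an arbitrary threshold) of either bound.

```python
import math

def near_log_boundary(value, low, high, margin=0.05):
    """Return 'low' or 'high' if value lies within `margin` (as a fraction
    of the log-scaled range) of that bound, otherwise None."""
    log_low, log_high = math.log10(low), math.log10(high)
    position = (math.log10(value) - log_low) / (log_high - log_low)
    if position <= margin:
        return 'low'
    if position >= 1 - margin:
        return 'high'
    return None

# Example values for C searched over loguniform(0.01, 100)
print(near_log_boundary(85.0, 0.01, 100))   # near the upper bound
print(near_log_boundary(1.0, 0.01, 100))    # comfortably inside the range
print(near_log_boundary(0.012, 0.01, 100))  # near the lower bound
```

In practice you would pass random_search.best_params_['C'] as the value and widen the corresponding end of the distribution whenever the helper returns 'low' or 'high'.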

Further Reading