
Regression Metrics: Measuring Prediction Accuracy

Regression metrics measure how close predictions are to actual continuous targets. MAE (Mean Absolute Error) is intuitive and less sensitive to outliers than squared-error metrics. RMSE (Root Mean Squared Error) penalizes large errors heavily. R² (coefficient of determination) shows how much of the target's variance your model explains. Choosing the right metric matters: MAE and RMSE are expressed in the units of the target (easy for domain experts to interpret), R² is unitless (convenient for comparing models on the same data), and RMSE is the most sensitive to large errors. Many practitioners report all three to get a complete picture of model performance.

Mean Absolute Error (MAE): Simple and Interpretable

MAE is the average absolute difference between predictions and actuals. It is directly interpretable in the units of your target:

from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
import numpy as np

# Load regression dataset
data = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
data.data, data.target, test_size=0.2, random_state=42
)

# Train a model
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# MAE: average absolute error
mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error: {mae:.2f}")
# Interpretation: on average, predictions are off by that many units of the target

# Visualize predictions vs. actuals
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 5))
plt.scatter(y_test, y_pred, alpha=0.5)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('Regression: Predictions vs. Actuals')
plt.show()

# MAE formula: mean(|y_true - y_pred|)
mae_manual = np.mean(np.abs(y_test - y_pred))
print(f"MAE (manual calculation): {mae_manual:.2f}")

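MAE is less sensitive to outliers than RMSE because it does not square errors: each error contributes in proportion to its size. If one target value is 1000 while most others are under 100, and the model misses it badly, that single error dominates RMSE far more than it affects MAE.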
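To make the outlier effect concrete, here is a minimal sketch with made-up numbers (the arrays and the helper functions `mae` and `rmse` are illustrative assumptions, not part of the diabetes example). It compares both metrics before and after one badly missed target of 1000 is added.

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean of absolute errors
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    # Square root of the mean of squared errors
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Illustrative values only: nine targets under 100, each predicted 5 units too low
y_true = np.array([20, 35, 40, 55, 60, 70, 80, 90, 95], dtype=float)
y_pred = y_true - 5.0

mae_clean, rmse_clean = mae(y_true, y_pred), rmse(y_true, y_pred)

# Add one outlier: the true value is 1000 but the model predicts 100
y_true_out = np.append(y_true, 1000.0)
y_pred_out = np.append(y_pred, 100.0)
mae_out, rmse_out = mae(y_true_out, y_pred_out), rmse(y_true_out, y_pred_out)

print(f"Without outlier: MAE={mae_clean:.2f}, RMSE={rmse_clean:.2f}")
print(f"With outlier:    MAE={mae_out:.2f}, RMSE={rmse_out:.2f}")
print(f"MAE grew {mae_out / mae_clean:.1f}x, RMSE grew {rmse_out / rmse_clean:.1f}x")
```

In this toy setup MAE rises roughly 19-fold while RMSE rises roughly 57-fold. MAE still moves, so it is better described as less sensitive than as immune.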

Mean Squared Error (MSE) and RMSE: Penalizing Large Errors

RMSE (Root Mean Squared Error) penalizes large errors quadratically, then takes the square root to return to original units:

from sklearn.metrics import mean_squared_error
import numpy as np

# MSE: mean of squared errors
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

print(f"Mean Squared Error: {mse:.2f}")
print(f"Root Mean Squared Error: {rmse:.2f}")

# RMSE is always >= MAE; the gap widens as error sizes become more uneven
print(f"RMSE: {rmse:.2f}, MAE: {mae:.2f}")
print(f"Ratio RMSE/MAE: {rmse / mae:.2f}")

# When to use RMSE:
# 1. Outliers matter (you want to penalize them)
# 2. You want to emphasize large prediction errors
# 3. The task is sensitive to extreme deviations (e.g., medicine dosage)

# When to use MAE:
# 1. Outliers are noise (you want to limit their influence)
# 2. All errors are equally important
# 3. The metric should be directly interpretable

RMSE and MAE tell different stories:

  • If RMSE is much larger than MAE, a few predictions have much larger errors than the rest.
  • If RMSE is close to MAE, the errors are all of similar magnitude.
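As a rough guide to reading the RMSE/MAE ratio, the sketch below evaluates it on three synthetic error vectors. The helper `rmse_mae_ratio` and the data are assumptions made for illustration. For normally distributed errors the ratio tends to sqrt(π/2), about 1.25, which can serve as a reference point.

```python
import numpy as np

def rmse_mae_ratio(errors):
    """Return RMSE / MAE for a vector of prediction errors."""
    errors = np.asarray(errors, dtype=float)
    rmse = np.sqrt(np.mean(errors ** 2))
    mae = np.mean(np.abs(errors))
    return rmse / mae

rng = np.random.default_rng(0)

# Every error has the same magnitude: the ratio is exactly 1
same_size = np.array([3.0, -3.0, 3.0, -3.0])

# Normally distributed errors: the ratio approaches sqrt(pi / 2)
gaussian = rng.normal(loc=0.0, scale=1.0, size=100_000)

# Mostly small errors plus one large miss: the ratio climbs well above 1.25
one_big_miss = np.array([1.0] * 99 + [100.0])

ratio_same = rmse_mae_ratio(same_size)
ratio_gauss = rmse_mae_ratio(gaussian)
ratio_miss = rmse_mae_ratio(one_big_miss)

print(f"Equal-size errors: {ratio_same:.2f}")
print(f"Gaussian errors:   {ratio_gauss:.2f}")
print(f"One large miss:    {ratio_miss:.2f}")
```

A ratio near 1 suggests errors of similar size, a ratio near 1.25 is consistent with roughly normal errors, and a much larger ratio points to a few dominant misses.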

R² (Coefficient of Determination): Proportion of Variance Explained

R² measures what fraction of the target's variance your model explains. It ranges from negative infinity to 1, where 1 is perfect prediction:

from sklearn.metrics import r2_score

# R² score
r2 = r2_score(y_test, y_pred)
print(f"R² Score: {r2:.3f}")

# Interpretation:
# R² = 0.85 means the model explains 85% of the variance in the target
# R² = 0 means the model does no better than always predicting the mean (baseline)
# R² < 0 means the model is worse than predicting the mean

# R² formula: 1 - (SS_res / SS_tot)
# SS_res = sum of squared residuals
# SS_tot = total sum of squares
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - (ss_res / ss_tot)
print(f"R² (manual calculation): {r2_manual:.3f}")

# Baseline: predicting the mean
y_pred_mean = np.full_like(y_test, y_test.mean(), dtype=float)
r2_baseline = r2_score(y_test, y_pred_mean)
print(f"R² for predicting mean (baseline): {r2_baseline:.3f}") # Always 0.0

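R² is scale-free: it does not change if you multiply all targets (and the corresponding predictions) by 10. This makes it convenient for comparing models on the same dataset. Be more careful across datasets, because R² also depends on how much variance the target has in each one.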
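The scale-free property can be checked directly. The sketch below uses small made-up arrays and two hypothetical helpers, `r2` and `mae`, and assumes the predictions are rescaled together with the targets (as they would be if the model were refit on the rescaled target).

```python
import numpy as np

def r2(y_true, y_pred):
    # 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

# Illustrative values only
y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.5, 5.5, 6.0, 9.5, 12.0])

r2_original = r2(y_true, y_pred)
r2_rescaled = r2(10 * y_true, 10 * y_pred)
mae_original = mae(y_true, y_pred)
mae_rescaled = mae(10 * y_true, 10 * y_pred)

print(f"R²:  {r2_original:.5f} -> {r2_rescaled:.5f}")   # unchanged
print(f"MAE: {mae_original:.2f} -> {mae_rescaled:.2f}")  # multiplied by 10
```

R² stays the same under rescaling, while MAE (and likewise RMSE) scales with the target units.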

Mean Absolute Percentage Error (MAPE): Relative Error

MAPE expresses error as a percentage of the actual values, which is useful when targets span very different magnitudes:

from sklearn.metrics import mean_absolute_percentage_error

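# MAPE: mean(|y_true - y_pred| / |y_true|)
# scikit-learn returns a fraction (0.25 means 25%), so multiply by 100 to print a percentage
mape = mean_absolute_percentage_error(y_test, y_pred)
print(f"Mean Absolute Percentage Error: {mape * 100:.2f}%")

# Interpretation: on average, predictions are off by that percentage of the actual value
# This is scale-independent, good for comparing across datasets

# MAPE has a drawback: it is undefined when y_true = 0 and explodes when y_true is near 0
# (scikit-learn does not raise in that case; it returns a very large value)
# If you have zeros in your target, use MAE or RMSE instead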

MAPE is useful for forecasting and business metrics (sales, demand) where percent error is more meaningful than absolute error.
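The sketch below computes MAPE by hand on hypothetical sales figures and shows how a single small actual value can dominate the result. The helper `mape_percent` and all numbers are assumptions for illustration; the helper returns percent units and expects no zeros in the actuals.

```python
import numpy as np

def mape_percent(y_true, y_pred):
    """MAPE in percent; assumes y_true contains no zeros."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Hypothetical monthly sales and forecasts
sales = [100, 200, 400]
forecast = [110, 180, 400]
mape_regular = mape_percent(sales, forecast)

# Same absolute miss of 10 units on the first month, but the actual value is tiny
sales_small = [1, 200, 400]
forecast_small = [11, 180, 400]
mape_small = mape_percent(sales_small, forecast_small)

print(f"MAPE, typical sales:      {mape_regular:.1f}%")
print(f"MAPE, one near-zero sale: {mape_small:.1f}%")
```

The absolute errors are identical in both cases; only the denominator changes. This is why MAPE suits targets that stay well away from zero.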

Comparing Regression Models

Use multiple metrics to compare models fairly:

from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

# Train multiple models
models = {
    'Linear Regression': LinearRegression(),
    'Decision Tree': DecisionTreeRegressor(random_state=42),
    'Random Forest': RandomForestRegressor(n_estimators=100, random_state=42),
    'SVR': SVR(kernel='rbf')
}

import pandas as pd

results = []
for name, estimator in models.items():
    estimator.fit(X_train, y_train)
    # Use a separate name so y_pred from the earlier sections is not overwritten
    preds = estimator.predict(X_test)

    results.append({
        'Model': name,
        'MAE': mean_absolute_error(y_test, preds),
        'RMSE': np.sqrt(mean_squared_error(y_test, preds)),
        'R²': r2_score(y_test, preds),
        'MAPE': mean_absolute_percentage_error(y_test, preds),  # fraction, not percent
    })

results_df = pd.DataFrame(results)
print(results_df.to_string(index=False))

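Compare models across all metrics, not just R². On a single test set, R² and RMSE always rank models in the same order, because R² equals 1 minus the MSE divided by the variance of the targets; MAE and MAPE can rank them differently. A model that looks good on MAE but poor on RMSE is probably making a few large errors.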
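Metrics can disagree about which model is better. The following sketch uses two hypothetical prediction vectors (not the models trained above): one misses every point by a small amount, the other is exact except for a single large miss.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Illustrative targets
y_true = np.arange(10, 110, 10, dtype=float)

# Model A: misses every point by 2 units
pred_a = y_true + 2.0

# Model B: exact on nine points, misses the last one by 10 units
pred_b = y_true.copy()
pred_b[-1] += 10.0

mae_a, rmse_a = mae(y_true, pred_a), rmse(y_true, pred_a)
mae_b, rmse_b = mae(y_true, pred_b), rmse(y_true, pred_b)

print(f"Model A: MAE={mae_a:.2f}, RMSE={rmse_a:.2f}")
print(f"Model B: MAE={mae_b:.2f}, RMSE={rmse_b:.2f}")
```

Model B wins on MAE (1.00 vs. 2.00) while Model A wins on RMSE (2.00 vs. about 3.16). Which one is preferable depends on whether a single large miss is costlier than many small ones.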

Residual Analysis: Detecting Model Problems

Residuals (prediction errors) reveal model biases and patterns:

import matplotlib.pyplot as plt

# Compute residuals
residuals = y_test - y_pred

# Plot residuals vs. predictions
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Residuals vs. predicted values (should be random scatter)
axes[0].scatter(y_pred, residuals, alpha=0.5)
axes[0].axhline(y=0, color='r', linestyle='--')
axes[0].set_xlabel('Predicted')
axes[0].set_ylabel('Residuals')
axes[0].set_title('Residuals vs. Predicted')

# Residuals histogram (should be roughly normal)
axes[1].hist(residuals, bins=20, edgecolor='black')
axes[1].set_xlabel('Residuals')
axes[1].set_ylabel('Frequency')
axes[1].set_title('Residuals Distribution')

# Q-Q plot (visual check of normality: points close to the line suggest normal residuals)
from scipy import stats
stats.probplot(residuals, dist='norm', plot=axes[2])
axes[2].set_title('Q-Q Plot')

plt.tight_layout()
plt.show()

# Ideal residuals:
# 1. Centered at 0 (no bias)
# 2. Roughly normally distributed
# 3. Constant variance (homoscedasticity)
# 4. Random pattern (no trends)

# Red flags:
# 1. Residuals drift upward/downward (bias in predictions)
# 2. Residuals have a funnel pattern (heteroscedasticity)
# 3. Residuals follow a clear curve (nonlinear relationship not captured)

Residual plots are as important as aggregate metrics: they reveal how your model fails.
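Plots can be complemented with a couple of rough numeric checks. The sketch below is one possible approach on synthetic residuals, not a formal statistical test: the helper `residual_summary` is hypothetical and reports the mean residual (bias) and the correlation between the prediction and the absolute residual (a crude signal of a funnel pattern).

```python
import numpy as np

def residual_summary(y_hat, residuals):
    """Return (mean residual, correlation of |residual| with the prediction)."""
    bias = residuals.mean()
    spread_trend = np.corrcoef(y_hat, np.abs(residuals))[0, 1]
    return bias, spread_trend

rng = np.random.default_rng(0)
n = 2000
y_hat = np.linspace(1.0, 10.0, n)  # stand-in for model predictions

# Synthetic residuals: constant spread vs. spread that grows with the prediction
resid_constant = rng.normal(0.0, 1.0, size=n)
resid_funnel = rng.normal(0.0, 1.0, size=n) * y_hat

bias_const, trend_const = residual_summary(y_hat, resid_constant)
bias_funnel, trend_funnel = residual_summary(y_hat, resid_funnel)

print(f"Constant spread: bias={bias_const:+.3f}, spread trend={trend_const:+.3f}")
print(f"Funnel pattern:  bias={bias_funnel:+.3f}, spread trend={trend_funnel:+.3f}")
```

A spread trend near zero is consistent with constant variance; a clearly positive value suggests the errors grow with the predicted value. These numbers support the plots and do not replace them.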

Metrics Reference: When to Use Each

Metric  Formula                        Robust to Outliers  Scale-Dependent  Best For
MAE     mean(|y - y_pred|)             Yes                 Yes              Interpretability, outlier-prone data
RMSE    sqrt(mean((y - y_pred)^2))     No                  Yes              Penalizing large errors
R²      1 - SS_res/SS_tot              No                  No               Comparing models, variance explained
MAPE    mean(|y - y_pred|/|y|) * 100   No                  No               Forecasting, percent error

Key Takeaways

  • MAE is interpretable and robust to outliers; use when domain interpretation matters.
  • RMSE penalizes large errors; use when outliers are meaningful, not noise.
  • R² is scale-free and ideal for comparing models; it shows fraction of variance explained.
  • Always visualize residuals: they reveal biases, heteroscedasticity, and nonlinear patterns.
  • Use multiple metrics for a complete picture; no single metric tells the full story.

Frequently Asked Questions

Can R² be negative?

Yes. Negative R² means your model performs worse than predicting the mean (baseline). This happens with very poor models or when hyperparameters are misconfigured. R² near 0 means your model performs as well as the baseline.
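A tiny worked case shows how R² goes negative. The values and the helper `r2` below are illustrative assumptions: the predictions run in the opposite direction to the targets, so they fit worse than a constant prediction at the mean.

```python
import numpy as np

def r2(y_true, y_pred):
    # 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Predictions in reversed order: SS_res = 40, SS_tot = 10
y_pred_reversed = y_true[::-1]
r2_reversed = r2(y_true, y_pred_reversed)

# Baseline: always predict the mean of the targets
y_pred_mean = np.full_like(y_true, y_true.mean())
r2_mean = r2(y_true, y_pred_mean)

print(f"Reversed predictions: R² = {r2_reversed:.1f}")  # -3.0
print(f"Mean baseline:        R² = {r2_mean:.1f}")      # 0.0
```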

Should I use MAE or RMSE?

If outliers are important (e.g., predicting extreme weather), use RMSE. If outliers are noise, use MAE. Many practitioners report both; their ratio reveals outlier sensitivity.

What is a "good" R² value?

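It depends on your domain. Tightly controlled physical measurements often yield R² above 0.99, in the social sciences an R² around 0.3 can already be considered useful, and in business forecasting values above 0.7 are commonly regarded as good. Treat such figures as rough conventions rather than rules, and compare your R² to baseline models, not to arbitrary thresholds.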

Can I use classification metrics for regression?

No. Classification metrics (precision, recall, AUC) require discrete labels. For regression, use MAE, RMSE, R², or MAPE.

How do I handle MAPE with zero values in the target?

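MAPE is undefined when y_true = 0. Use MAE or RMSE instead, or use "symmetric MAPE" (sMAPE), which divides by (|y_true| + |y_pred|) / 2 so that a zero target on its own no longer causes division by zero. sMAPE is still undefined when both the target and the prediction are 0.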
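Here is a minimal sketch of symmetric MAPE using the denominator given above. The function name `smape_percent` is hypothetical, and treating pairs where both values are zero as zero error is an assumption; other conventions exist.

```python
import numpy as np

def smape_percent(y_true, y_pred):
    """Symmetric MAPE in percent, with (|y_true| + |y_pred|) / 2 as the denominator.

    Pairs where both values are zero count as zero error (an assumption).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    # Divide only where the denominator is nonzero; keep 0 elsewhere
    terms = np.divide(abs_err, denom, out=np.zeros_like(abs_err), where=denom != 0)
    return terms.mean() * 100

# Illustrative values with a zero in the target
y_true = [0, 10, 20]
y_pred = [1, 12, 18]

smape_value = smape_percent(y_true, y_pred)
smape_all_zero = smape_percent([0, 0], [0, 0])

print(f"sMAPE: {smape_value:.2f}%")
print(f"sMAPE when everything is zero: {smape_all_zero:.2f}%")
```

Note that a zero target paired with any nonzero prediction contributes the maximum term of 200%, however small the miss, so sMAPE avoids the division error but still treats zeros harshly.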

Further Reading