Convert Python Models to ONNX: Export Guide
ONNX (Open Neural Network Exchange) is an open standard that represents machine learning models as computation graphs, independent of the training framework. Once converted to ONNX, your scikit-learn, PyTorch, or TensorFlow model can run from C++, JavaScript, Java, or C#, or on mobile devices, without retraining. This unlocks cross-platform deployment and edge inference, but conversion requires careful testing because not all Python operations translate to ONNX.
This article teaches you how to export models from popular frameworks to ONNX, validate accuracy post-conversion, and serve ONNX models in Python at production scale.
Why Convert to ONNX?
ONNX solves three key deployment challenges:
- Cross-language serving: Train in PyTorch, infer in C++. Deploy on Windows, Linux, macOS, iOS, and Android with the same model.
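- Performance optimization: ONNX Runtime applies graph-level optimizations and dispatches work to hardware-specific execution providers (CPU, GPU, and other accelerators). It is often faster than native framework inference, but the gain depends on the model and hardware, so benchmark your own workload.
- Framework independence: Your serving stack no longer depends on TensorFlow or PyTorch. You can change training frameworks later without rewriting the inference code.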
The trade-off is that ONNX cannot represent all Python semantics. Custom operations, dynamic shapes, and conditional logic may require workarounds.
Converting Scikit-Learn to ONNX
Scikit-learn models are simple to convert using skl2onnx:
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import numpy as np
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
# Train a pipeline
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", RandomForestClassifier(n_estimators=100, random_state=42))
])
X = np.random.randn(100, 4)
y = np.random.randint(0, 3, 100)
pipeline.fit(X, y)
# Define input shape (None = variable batch size, 4 = feature count)
initial_type = [("float_input", FloatTensorType([None, 4]))]
# Convert to ONNX
onnx_model = convert_sklearn(pipeline, initial_types=initial_type)
# Save it
with open("pipeline.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
print("Pipeline exported to pipeline.onnx")
Supported scikit-learn estimators: linear regression, logistic regression, SVM, random forests, gradient boosting, KMeans, PCA, and most preprocessing transformers. Check the skl2onnx documentation for the full list and any unsupported operations.
Converting XGBoost to ONNX
XGBoost does not ship an ONNX exporter of its own; the usual route is the convert_xgboost function from onnxmltools:
import numpy as np
from xgboost import XGBClassifier
from onnxmltools import convert_xgboost
from onnxmltools.convert.common.data_types import FloatTensorType
# Train an XGBoost model
X = np.random.randn(100, 4).astype(np.float32)
y = np.random.randint(0, 3, 100)
model = XGBClassifier(n_estimators=100, random_state=42)
model.fit(X, y)
# Convert to ONNX (None = variable batch size, 4 = feature count)
onnx_model = convert_xgboost(
    model,
    initial_types=[("float_input", FloatTensorType([None, 4]))],
    doc_string="XGBoost classifier"
)
with open("xgboost_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
The onnxmltools converter handles the tree-based XGBClassifier and XGBRegressor models. For other objectives, such as ranking or survival analysis, check the onnxmltools documentation before relying on the export.
Converting PyTorch to ONNX
PyTorch has built-in ONNX export via torch.onnx.export:
import torch
import torch.nn as nn
# Define a simple neural network
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 16)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(16, 3)
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x
# Train or load the model
model = SimpleNet()
model.eval() # Set to evaluation mode
# Create a dummy input (batch_size=1, features=4)
dummy_input = torch.randn(1, 4)
# Export to ONNX
torch.onnx.export(
    model,
    dummy_input,
    "pytorch_model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
    opset_version=14  # ONNX opset version
)
print("PyTorch model exported to pytorch_model.onnx")
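The dynamic_axes parameter marks dimension 0 as a variable batch size; without it, the exported model only accepts the exact shape of dummy_input. The dummy input is used only to trace the model, so its values do not matter, but its shape and dtype must match what the model will receive in production.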
Converting TensorFlow/Keras to ONNX
TensorFlow SavedModel can be converted to ONNX using tf2onnx:
import tensorflow as tf
import tf2onnx
# Train a Keras model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax")
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
# ... train model ...
model.save("saved_model")  # on Keras 3, use model.export("saved_model") to write a SavedModel
# Convert to ONNX using command line (tf2onnx >= 1.12)
# python -m tf2onnx.convert --saved-model saved_model --output model.onnx
# Or programmatically:
import tf2onnx.convert
spec = (tf.TensorSpec((None, 4), tf.float32, name="input"),)
output_path = "model.onnx"
model_proto, _ = tf2onnx.convert.from_keras(model, input_signature=spec, output_path=output_path)
print("TensorFlow model exported to model.onnx")
Validating Accuracy After Conversion
Critical: always test that ONNX output matches the original framework output. Rounding errors and operator differences can cause accuracy loss:
import numpy as np
import onnxruntime as rt
import torch
# Reuse the `model` instance from the export step. A freshly constructed
# SimpleNet() would have different random weights and could never match.
original_model = model
original_model.eval()
# Load the ONNX model
sess = rt.InferenceSession("pytorch_model.onnx", providers=["CPUExecutionProvider"])
# Create test data (float32, matching the exported input dtype)
X_test = np.random.randn(10, 4).astype(np.float32)
# Get predictions from both
with torch.no_grad():
    pytorch_output = original_model(torch.from_numpy(X_test)).numpy()
onnx_output = sess.run(None, {"input": X_test})[0]
# Compare
max_diff = np.abs(pytorch_output - onnx_output).max()
mean_diff = np.abs(pytorch_output - onnx_output).mean()
print(f"Max difference: {max_diff}, Mean difference: {mean_diff}")
if max_diff > 1e-4:
    print("WARNING: Large discrepancy between PyTorch and ONNX outputs!")
else:
    print("ONNX export validated successfully")
Differences around 1e-5 or smaller are normal for float32 models because of floating-point rounding. The 1e-4 threshold in the check leaves some headroom; anything larger may indicate a conversion error.
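The same comparison can be wrapped in a small reusable check that works for any framework. The helper below is a sketch that uses only NumPy; outputs_match and its default tolerances are illustrative assumptions, not values prescribed by ONNX.

```python
import numpy as np


def outputs_match(reference, candidate, rtol=1e-3, atol=1e-5):
    """Return (ok, max_abs_diff) for two output arrays of the same shape."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    max_abs_diff = float(np.abs(reference - candidate).max())
    # Passes when |candidate - reference| <= atol + rtol * |reference| everywhere
    ok = bool(np.allclose(candidate, reference, rtol=rtol, atol=atol))
    return ok, max_abs_diff


# Simulated outputs: the second differs only by float32-level noise
framework_out = np.array([[0.10, 0.70, 0.20]], dtype=np.float32)
onnx_out = framework_out + np.float32(1e-7)
ok, diff = outputs_match(framework_out, onnx_out)

# A shift of 0.01 is far outside the tolerance and should be flagged
bad, _ = outputs_match(framework_out, framework_out + 0.01)
print(ok, bad)
```

Combining a relative and an absolute tolerance avoids false alarms on large-magnitude outputs while still catching real drift near zero.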
Serving ONNX Models with ONNX Runtime
ONNX Runtime is the most widely used inference engine for ONNX models; it is portable across platforms and often faster than running models in their native frameworks:
import onnxruntime as rt
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel
app = FastAPI()
# Load ONNX model once
sess = rt.InferenceSession(
    "model.onnx",
    providers=["CPUExecutionProvider"]  # or "CUDAExecutionProvider" for GPU
)
# Look up input and output names once at startup
input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name
class PredictionRequest(BaseModel):
    features: list[float]
@app.post("/predict-onnx")
def predict_onnx(request: PredictionRequest):
    # Plain def: FastAPI runs it in a thread pool, so the blocking
    # sess.run call does not stall the event loop
    # Prepare input
    X = np.array(request.features, dtype=np.float32).reshape(1, -1)
    # Infer (assumes the first output holds class probabilities,
    # as with the softmax output of the Keras model above)
    output = sess.run([output_name], {input_name: X})
    pred_proba = output[0][0].tolist()
    pred_class = int(np.argmax(output[0][0]))
    return {
        "prediction": pred_class,
        "probabilities": pred_proba
    }
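The endpoint above reshapes whatever list it receives, so a request with the wrong number of features only fails inside sess.run. A small guard can reject it earlier. The sketch below uses only NumPy; prepare_input and the expected feature count of 4 are assumptions for illustration (with a live session you could read the width from sess.get_inputs()[0].shape).

```python
import numpy as np

N_FEATURES = 4  # assumed to match the model's input width


def prepare_input(features, n_features=N_FEATURES):
    """Validate one request's features and return a (1, n_features) float32 array."""
    X = np.asarray(features, dtype=np.float32)
    if X.ndim != 1 or X.shape[0] != n_features:
        raise ValueError(f"expected {n_features} features, got shape {X.shape}")
    if not np.all(np.isfinite(X)):
        raise ValueError("features must be finite numbers")
    return X.reshape(1, n_features)


batch = prepare_input([5.1, 3.5, 1.4, 0.2])
print(batch.shape, batch.dtype)
```

Inside the FastAPI handler, catching the ValueError and raising an HTTPException with status 422 would report the problem to the client instead of returning a generic server error.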
Comparison Table: Export Frameworks
| Framework | Tool | Opset Support | Dynamic Shapes | Custom Ops | Testing Ease |
|---|---|---|---|---|---|
| scikit-learn | skl2onnx | High | Good | Limited | Very Easy |
| XGBoost | onnxmltools (convert_xgboost) | Good | Good | None | Easy |
| PyTorch | torch.onnx.export | Very High | Good | Possible | Moderate |
| TensorFlow | tf2onnx | Good | Moderate | Possible | Moderate |
| LightGBM | onnxmltools | Good | Good | None | Easy |
Key Takeaways
- ONNX converts models to a framework-agnostic format, enabling cross-language serving.
- Scikit-learn, XGBoost, PyTorch, and TensorFlow all have mature ONNX export pathways.
- Always validate ONNX output against the original framework on test data before deploying.
- ONNX Runtime often runs inference faster than native frameworks thanks to graph and hardware optimizations, but measure on your own workload.
- Dynamic axes (variable batch size) are essential for production models; static shapes limit flexibility.
Frequently Asked Questions
Can I convert a custom PyTorch module to ONNX?
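Yes, as long as the forward pass is built from standard PyTorch operations. Custom C++ extensions have no ONNX equivalent by default, and data-dependent Python control flow is frozen at trace time. Rewrite those sections with standard ops, or register a custom symbolic function for the operations that need one.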
What if ONNX conversion fails?
Check the opset version. Higher opsets support newer operations, so try opset_version=14 or opset_version=18. If it still fails, reduce model complexity: remove unsupported layers and replace custom ops with ONNX-compatible equivalents.
Is ONNX smaller than the original model file?
Not necessarily. File size is dominated by the weights, which ONNX stores at the same precision as the original, so the ONNX file is usually about the same size. To shrink it meaningfully, quantize the model.
Can I use ONNX on edge devices (mobile, IoT)?
Yes. ONNX Runtime Web (the successor to ONNX.js) runs models in browsers via WebAssembly, and ONNX Runtime supports iOS and Android. Quantize the model for smaller files and faster inference on constrained devices.
Further Reading
- ONNX Official Specification — format reference and model zoo.
- ONNX Runtime Performance Tuning — optimization guide.
- skl2onnx Documentation — scikit-learn conversion reference.
- PyTorch ONNX Export — PyTorch ONNX guide.