Scaling Experiment Tracking Across Teams

As ML teams grow from one person to dozens, experiment tracking must scale. Instead of each researcher storing runs locally, a centralized MLflow server lets everyone log, search, and compare experiments on shared infrastructure. In this article, you will learn to deploy MLflow for teams, configure remote storage, set up access control, and establish governance practices.

From Local to Centralized MLflow

When you develop locally, MLflow runs on your machine with experiments stored in mlruns/ (article 2). But when teams work together:

  • Data scientists cannot see each other's runs.
  • Runs are lost if a machine crashes.
  • There is no governance (who can deploy models, who trained this?).
  • Nothing scales: hundreds of researchers cannot share or compare results that live on individual laptops.

A centralized MLflow server solves this: one server hosts all experiments, backed by a database and remote storage. Everyone logs runs to the server and shares a single source of truth.

Architecture: MLflow Server, Database, and Storage

A production MLflow setup has three components:

  1. MLflow Server — REST API that researchers query.
  2. Backend Store — Database (PostgreSQL, MySQL, or SQLite) storing experiment metadata, run data, and metrics.
  3. Artifact Store — Cloud storage (S3, GCS, Azure Blob) or network filesystem (NFS) storing logged files, models, and plots.

Here's the flow:

  • Researcher trains a model and calls mlflow.log_metric("accuracy", 0.92).
  • MLflow client sends data to the MLflow server (REST API).
  • Server writes metadata to the database and artifacts to storage.
  • Other researchers query the server to see the run.
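
To make the second step of this flow concrete, the sketch below builds (but does not send) the request that a client would post to MLflow's runs/log-metric REST endpoint. The helper name build_log_metric_request, the run ID, and the server URL are illustrative assumptions; in normal use mlflow.log_metric assembles and sends this call for you.

```python
import json
import time


def build_log_metric_request(base_url, run_id, key, value, step=0, timestamp_ms=None):
    """Return (url, json_body) for a log-metric call; nothing is sent over the network."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # MLflow timestamps are in milliseconds
    url = base_url.rstrip("/") + "/api/2.0/mlflow/runs/log-metric"
    body = {
        "run_id": run_id,
        "key": key,
        "value": value,
        "timestamp": timestamp_ms,
        "step": step,
    }
    return url, json.dumps(body, sort_keys=True)


# Hypothetical run ID and server address, for illustration only
url, body = build_log_metric_request(
    "http://mlflow.example.com:5000", "abc123", "accuracy", 0.92, timestamp_ms=0
)
print(url)
print(body)
```

The server then persists the metric in the backend store, which is why any other client pointed at the same server can read it back.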

Deploying MLflow Server

Option 1: Simple Docker Deployment

Use Docker to deploy MLflow with PostgreSQL. Create a docker-compose.yml:

version: "3.8"
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: mlflow
      POSTGRES_PASSWORD: mlflow_password
      POSTGRES_DB: mlflow_db
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  mlflow:
    image: ghcr.io/mlflow/mlflow:latest
    command: >
      mlflow server
      --backend-store-uri postgresql://mlflow:mlflow_password@postgres:5432/mlflow_db
      --default-artifact-root s3://my-bucket/mlflow-artifacts
      --host 0.0.0.0
      --port 5000
    depends_on:
      - postgres
    ports:
      - "5000:5000"
    environment:
      AWS_ACCESS_KEY_ID: <your_key>
      AWS_SECRET_ACCESS_KEY: <your_secret>

volumes:
  postgres_data:

Start the stack:

docker-compose up -d

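MLflow is now accessible at http://localhost:5000. If the mlflow container exits at startup with a missing-module error, the image probably lacks the PostgreSQL driver (psycopg2) or the S3 client (boto3); installing both in a small derived image is the usual fix. For production, use managed services such as AWS RDS for the database and S3 for storage, and run the MLflow server on a VM or Kubernetes cluster.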
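
One detail of the --backend-store-uri value is easy to miss: it is parsed as a URL, so a database password containing characters such as @ or / has to be percent-encoded. The following is a minimal sketch of assembling the URI safely; the helper name build_backend_store_uri and the sample password are assumptions for illustration.

```python
from urllib.parse import quote


def build_backend_store_uri(user, password, host, port, database):
    """Build a PostgreSQL URI, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Illustrative password containing URL-reserved characters
uri = build_backend_store_uri("mlflow", "p@ss/word", "postgres", 5432, "mlflow_db")
print(uri)  # postgresql://mlflow:p%40ss%2Fword@postgres:5432/mlflow_db
```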

Option 2: Managed MLflow (Databricks)

Databricks (the company behind MLflow) offers hosted MLflow. Create a Databricks workspace and access MLflow at no additional cost. All experiments are stored in Databricks' managed backend.

Configuring Researchers to Use the Server

Researchers must point their MLflow client to the server. Update their code:

import mlflow

# Point to the centralized server
mlflow.set_tracking_uri("http://mlflow.example.com:5000")

# Optional: set registry URI if using a separate registry server
mlflow.set_registry_uri("http://mlflow.example.com:5000")

# Now all runs are logged to the server
mlflow.set_experiment("Image Classification - ResNet")

# The context manager ends the run even if training raises an exception
with mlflow.start_run():
    # ... training code ...
    mlflow.log_metric("accuracy", 0.95)

Or set via environment variable (good for containerized workflows):

export MLFLOW_TRACKING_URI=http://mlflow.example.com:5000
python train.py
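
A common failure mode with the environment-variable approach is that the variable is missing in some container, and runs quietly land in a local mlruns/ directory instead of the server. The sketch below shows one way a training script might fail fast in that case; the function name resolve_tracking_uri and the policy of accepting only http(s) URIs are assumptions, not MLflow behavior.

```python
import os
from urllib.parse import urlparse


def resolve_tracking_uri(env=None):
    """Return the tracking URI from the environment, or raise if it is unusable."""
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    if not uri:
        raise RuntimeError("MLFLOW_TRACKING_URI is not set; refusing to log to local mlruns/")
    if urlparse(uri).scheme not in ("http", "https"):
        raise RuntimeError(f"Expected an http(s) tracking server, got: {uri}")
    return uri


# Passing a dict instead of os.environ keeps the example deterministic
resolved = resolve_tracking_uri({"MLFLOW_TRACKING_URI": "http://mlflow.example.com:5000"})
print(resolved)
```

Calling this once at the top of train.py, before any MLflow call, turns a silent misconfiguration into an immediate error.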

Managing Access and Permissions

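Open-source MLflow does not enforce authentication by default. Recent releases (2.5 and later) include an optional basic-auth app, started with mlflow server --app-name basic-auth, but many teams still secure the server at the infrastructure level with: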

  1. Network isolation — Run the server in a private VPC, accessible only from your organization's network.
  2. Reverse proxy with authentication — Put Nginx or another reverse proxy in front of MLflow that requires login (e.g., via OAuth).
  3. API token-based access — Use a custom wrapper around MLflow that checks tokens before forwarding requests.

Here's an example Nginx reverse proxy config with basic auth:

server {
    listen 80;
    server_name mlflow.example.com;

    location / {
        auth_basic "MLflow Access";
        auth_basic_user_file /etc/nginx/mlflow_users;

        proxy_pass http://localhost:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Create a password file:

htpasswd -c /etc/nginx/mlflow_users alice
htpasswd /etc/nginx/mlflow_users bob

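Now users must provide their username and password to access MLflow. Basic auth only base64-encodes credentials, so serve the proxy over HTTPS outside a trusted network. In Python, pass the credentials through the environment variables the MLflow client reads rather than embedding them in the URI:

import os

import mlflow

# Placeholder credentials; load real ones from a secret store, not source code
os.environ["MLFLOW_TRACKING_USERNAME"] = "alice"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "<alice_password>"

# The proxy above serves MLflow at the site root, so no path prefix is needed
mlflow.set_tracking_uri("http://mlflow.example.com")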
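
To see why plain HTTP is risky with basic auth, it helps to look at what actually travels in the request. The sketch below reconstructs the Authorization header for a username and password and then decodes it again; the helper name basic_auth_header and the sample password are illustrative assumptions.

```python
import base64


def basic_auth_header(username, password):
    """Build the HTTP Basic Authorization header for the given credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


header = basic_auth_header("alice", "example-password")

# Anyone who can read the header can reverse the encoding
decoded = base64.b64decode(header["Authorization"].split(" ", 1)[1]).decode("utf-8")
print(decoded)  # alice:example-password
```

Because the encoding is trivially reversible, the Nginx server block should listen on 443 with a TLS certificate before it is exposed beyond a private network.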

Organizing Experiments by Team and Project

With many teams logging to one server, organization is critical. Use consistent naming:

mlflow.set_experiment("cv/image_classification/resnet50")
mlflow.set_experiment("nlp/sentiment_analysis/bert")
mlflow.set_experiment("time_series/sales_forecast/lstm")

Pattern: <domain>/<problem>/<model> makes hierarchies clear. In the MLflow UI, experiments are sorted alphabetically, so this structure naturally groups related work.
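
A naming convention only helps if it is checked. One option, sketched here under the assumption that names are restricted to lowercase letters, digits, and underscores, is a small validator that teams call before mlflow.set_experiment; the function name validate_experiment_name and the exact character set are hypothetical choices.

```python
import re

# Three slash-separated segments: <domain>/<problem>/<model>
EXPERIMENT_NAME = re.compile(r"[a-z0-9_]+/[a-z0-9_]+/[a-z0-9_]+")


def validate_experiment_name(name):
    """Return the name unchanged if it follows the convention, else raise ValueError."""
    if not EXPERIMENT_NAME.fullmatch(name):
        raise ValueError(
            f"Experiment name {name!r} does not match <domain>/<problem>/<model>"
        )
    return name


print(validate_experiment_name("cv/image_classification/resnet50"))
```

A free-form name such as "Image Classification - ResNet" would be rejected, which is the point: it forces every experiment into the shared hierarchy.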

Use tags to track ownership:

mlflow.set_tag("team", "computer_vision")
mlflow.set_tag("owner", "<owner_email>")
mlflow.set_tag("project", "real_time_object_detection")
mlflow.set_tag("deadline", "2026-08-01")

In the MLflow UI, filter by tags to see all runs owned by a team or tracked to a project.
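
The same tag filters work programmatically. The sketch below builds a filter string in MLflow's search syntax (tags.<key> = '<value>' clauses joined with "and") from a dictionary of tags; the helper name build_tag_filter is an assumption, and it deliberately rejects keys or values it cannot quote safely rather than guessing at escaping rules.

```python
import re

SAFE_KEY = re.compile(r"[A-Za-z0-9_]+")


def build_tag_filter(tags):
    """Build an MLflow search filter that matches runs carrying all given tags."""
    clauses = []
    for key, value in sorted(tags.items()):
        if not SAFE_KEY.fullmatch(key):
            raise ValueError(f"Unsupported tag key: {key!r}")
        if "'" in str(value):
            raise ValueError(f"Tag value must not contain a single quote: {value!r}")
        clauses.append(f"tags.{key} = '{value}'")
    return " and ".join(clauses)


filter_string = build_tag_filter(
    {"team": "computer_vision", "project": "real_time_object_detection"}
)
print(filter_string)
# tags.project = 'real_time_object_detection' and tags.team = 'computer_vision'
```

The resulting string can be pasted into the UI search box or passed as the filter_string argument of mlflow.search_runs.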

Governance and Approvals

As models move toward production, enforce governance. The model registry (article 4) is your tool:

  1. Staging — New candidate models. Data scientists log to MLflow and register to the registry.
  2. Model Review — Team lead reviews the model (metrics, code, reproducibility, bias analysis).
  3. Production — Only approved models can be transitioned to production.
  4. Archived — Old models are archived after retirement.

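Record each step with model version tags and promote only after approval. The review itself is a process step tracked through tags, not a registry stage. Note also that recent MLflow releases (2.9 and later) deprecate stages in favor of model version aliases, so treat the stage transition below as the classic-registry variant: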

from datetime import date

from mlflow.tracking import MlflowClient

client = MlflowClient()


def request_model_approval(model_name: str, version: int, reason: str):
    """Request approval to transition a model to Production."""
    client.set_model_version_tag(
        name=model_name,
        version=version,
        key="approval_requested",
        value="true",
    )
    client.set_model_version_tag(
        name=model_name,
        version=version,
        key="approval_reason",
        value=reason,
    )
    print(f"Approval requested for {model_name} v{version}")


# Data scientist requests approval
request_model_approval(
    "credit_scoring",
    version=5,
    reason="Achieved 92% accuracy on holdout test set. Precision: 0.91, Recall: 0.90",
)


# Later, ML lead reviews and approves
def approve_model(model_name: str, version: int, approver: str):
    """Approve a model for production and record who approved it and when."""
    client.set_model_version_tag(
        name=model_name,
        version=version,
        key="approved_by",
        value=approver,
    )
    client.set_model_version_tag(
        name=model_name,
        version=version,
        key="approval_date",
        value=date.today().isoformat(),  # record the actual approval date
    )

    # Transition to Production
    client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage="Production",
    )
    print(f"{model_name} v{version} approved and promoted to Production")


approve_model("credit_scoring", 5, approver="<ml_lead_email>")
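
Tags by themselves only record intent; nothing stops someone from calling the transition directly. A deployment script can close part of that gap by checking the tags before it promotes anything. The following is a minimal sketch of such a gate, assuming the tag keys used above and a hypothetical rule that requesters may not approve their own models; in practice the tags dictionary would come from client.get_model_version(name, version).tags.

```python
def can_promote(tags, requested_by):
    """Return (ok, reason) for promoting a model version based on its tags."""
    if tags.get("approval_requested") != "true":
        return False, "approval was never requested"
    approver = tags.get("approved_by")
    if not approver:
        return False, "no approver recorded"
    if approver == requested_by:
        return False, "requester cannot approve their own model"
    return True, "approved"


# Illustrative tag sets; names are placeholders
print(can_promote({"approval_requested": "true", "approved_by": "ml_lead"}, "data_scientist"))
print(can_promote({"approval_requested": "true"}, "data_scientist"))
```

This is still advisory unless the registry itself restricts who may promote models, so pair it with server-side permissions where they are available.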

Shared Dataset Registry

Create a central registry of datasets (like the model registry for data). Store metadata about available datasets:

# dataset_registry.py
import json
from datetime import datetime
from typing import Optional


class DatasetRegistry:
    def __init__(self, filepath: str = "datasets.json"):
        self.filepath = filepath

    def register_dataset(self, name: str, version: str, path: str,
                         description: str, registered_by: str):
        """Register a dataset version along with who registered it."""
        datasets = self.load()
        if name not in datasets:
            datasets[name] = []

        datasets[name].append({
            "version": version,
            "path": path,
            "description": description,
            "registered_at": datetime.now().isoformat(),
            "registered_by": registered_by,
        })

        self.save(datasets)
        print(f"Registered {name} v{version}")

    def list_datasets(self):
        """List all registered datasets with their latest version."""
        datasets = self.load()
        for name, versions in datasets.items():
            latest = versions[-1]
            print(f"{name} (v{latest['version']}): {latest['description']}")

    def get_dataset(self, name: str, version: Optional[str] = None):
        """Get the metadata record (including path) for a dataset."""
        datasets = self.load()
        if name not in datasets:
            raise KeyError(f"Dataset {name} not found")

        if version is None:
            # Return the most recently registered version
            return datasets[name][-1]

        for v in datasets[name]:
            if v["version"] == version:
                return v

        raise KeyError(f"Dataset {name} v{version} not found")

    def load(self):
        try:
            with open(self.filepath, "r") as f:
                return json.load(f)
        except FileNotFoundError:
            # No registry file yet: start with an empty registry
            return {}

    def save(self, datasets):
        with open(self.filepath, "w") as f:
            json.dump(datasets, f, indent=2)


# Usage
registry = DatasetRegistry()

# Data engineer registers a dataset
registry.register_dataset(
    name="sales_data",
    version="2026_q2",
    path="s3://data-bucket/sales_data/2026_q2/",
    description="Sales data for Q2 2026. 1M transactions.",
    registered_by="<data_engineer_email>",
)

# Data scientist retrieves it
dataset_info = registry.get_dataset("sales_data", "2026_q2")
print(f"Using dataset at {dataset_info['path']}")

Share datasets.json in your Git repo so everyone knows which datasets are available.
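
The registry becomes more useful when each run records which dataset version it used. One way to connect the two, sketched below, is to turn a registry record into a small dictionary of run tags; the helper name dataset_run_tags and the dataset.* tag keys are assumptions, not an MLflow convention.

```python
def dataset_run_tags(name, dataset_info):
    """Turn a dataset registry record into tags that can be attached to a run."""
    return {
        "dataset.name": name,
        "dataset.version": dataset_info["version"],
        "dataset.path": dataset_info["path"],
    }


# A record shaped like the ones the registry above returns
record = {
    "version": "2026_q2",
    "path": "s3://data-bucket/sales_data/2026_q2/",
    "description": "Sales data for Q2 2026. 1M transactions.",
}
tags = dataset_run_tags("sales_data", record)
print(tags)
```

Inside an active run, passing this dictionary to mlflow.set_tags makes the dataset version searchable alongside the team and owner tags.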

Key Takeaways

  • Centralized MLflow server (with database and storage) enables team-wide experiment tracking.
  • Secure the server with network isolation and reverse proxy authentication.
  • Organize experiments with consistent naming (e.g., domain/problem/model) and tags.
  • Use the model registry with governance workflows (request -> review -> approve -> deploy).
  • Create a shared dataset registry so teams know which data versions are available.

Frequently Asked Questions

Can I run MLflow without a database backend?

Yes, using local file storage (--backend-store-uri file:./mlruns), but this does not work for multiple concurrent users. For teams, use a proper database.

Should I use Databricks-hosted MLflow or self-hosted?

Databricks is simpler (no DevOps work) and includes authentication/governance. Self-hosted gives you control and privacy. For small teams, Databricks is often worth the cost. For large enterprises, self-hosted on Kubernetes is common.

How do I migrate experiments from local MLflow to a server?

MLflow itself has no built-in migration command. The usual route is to export runs from the local mlruns/ directory and import them into the server with the open-source mlflow-export-import tool, or to re-log the important runs against the server. See the MLflow docs for details.

Can teams work on isolated experiments?

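Yes. Use separate experiments per team or project, and restrict each team to its own experiments. A basic-auth proxy alone only controls who can reach the server, so per-experiment restrictions need either MLflow's optional auth app (which supports per-experiment permissions) or proxy rules you maintain yourself. Alternatively, use separate MLflow servers per team.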

How do I ensure reproducibility across the team?

Mandate that every training run logs to the server rather than to a local mlruns/ directory. Log code version (Git commit), data version, dependencies, and random seed. Article 3 covers reproducibility in detail.
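
As a rough illustration of the metadata worth capturing, the sketch below collects the Git commit, Python version, platform, and seed into a dictionary of strings. The function name collect_run_metadata and the chosen keys are assumptions; the Git lookup falls back to "unknown" when the script is not run from a Git checkout.

```python
import platform
import subprocess
import sys


def collect_run_metadata(seed):
    """Gather reproducibility details as string values suitable for run tags."""
    try:
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        commit = "unknown"  # not a Git checkout, or git is not installed
    return {
        "git_commit": commit,
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "random_seed": str(seed),
    }


metadata = collect_run_metadata(seed=42)
print(metadata)
```

Attaching this dictionary to every run with mlflow.set_tags gives reviewers a consistent starting point when they try to reproduce a result.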

Further Reading