Training large language models requires more than algorithms. It requires infrastructure: distributed training frameworks, data pipelines, experiment tracking, and evaluation harnesses. Today we are open-sourcing Training Gym, our complete platform for model development.

Why Training Gym?

Open AI development faces an infrastructure gap. Publishing model weights is valuable, but it’s not enough. Researchers need:

  • Reproducible training pipelines
  • Scalable distributed training
  • Standardized evaluation
  • Experiment management
  • Data processing tools

Training Gym provides all of this in an integrated, open source package.

Architecture

+------------------+     +------------------+     +------------------+
|   Data Pipeline  | --> |  Training Loop   | --> |   Evaluation     |
+------------------+     +------------------+     +------------------+
         |                       |                        |
         v                       v                        v
+------------------+     +------------------+     +------------------+
|   Data Registry  |     | Checkpoint Store |     |   Metrics DB     |
+------------------+     +------------------+     +------------------+
                                |
                                v
                    +------------------+
                    |   Experiment     |
                    |   Tracker        |
                    +------------------+

Data Pipeline

The data pipeline handles:

  • Ingestion: Load from local files, cloud storage, or streaming sources
  • Processing: Tokenization, filtering, deduplication
  • Mixing: Combine multiple data sources with configurable ratios
  • Streaming: Memory-efficient data loading for large corpora

Declaring a pipeline takes only a few lines:

from training_gym.data import DataPipeline

pipeline = DataPipeline(
    sources=[
        ("s3://data/books", 0.3),
        ("s3://data/web", 0.5),
        ("s3://data/code", 0.2),
    ],
    tokenizer="zoo-labs/zen-tokenizer",
    sequence_length=2048,
)

dataset = pipeline.build()

Distributed Training

Training Gym supports multiple distributed training strategies:

  • Data Parallel: Simple replication across devices
  • Tensor Parallel: Split layers across devices
  • Pipeline Parallel: Split model stages across devices
  • ZeRO: Memory-efficient data parallelism
  • FSDP: Fully sharded data parallel (PyTorch native)

Configuration is declarative:

distributed:
  strategy: fsdp
  world_size: 64
  sharding_strategy: full_shard
  mixed_precision: bf16
  gradient_checkpointing: true
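
The fsdp strategy corresponds to PyTorch's native FullyShardedDataParallel. For orientation, here is a minimal sketch of the equivalent raw-PyTorch setup for the config above (independent of Training Gym; model is a placeholder for your module):

import torch
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)

# Assumes torch.distributed is already initialized (e.g. via torchrun).
# full_shard: shard parameters, gradients, and optimizer state (ZeRO-3 style).
# bf16 mixed precision: compute and communicate in bfloat16.
# Gradient checkpointing is applied separately via torch.utils.checkpoint.
model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,
    mixed_precision=MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    ),
)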

Training Loop

The training loop is modular and extensible:

from training_gym import Trainer, TrainingConfig

config = TrainingConfig(
    model="zen-7b",
    optimizer="adamw",
    learning_rate=1e-4,
    batch_size=2048,
    max_steps=100000,
    warmup_steps=2000,
    weight_decay=0.1,
)

trainer = Trainer(config)
trainer.fit(dataset)

Built-in features:

  • Learning rate scheduling
  • Gradient clipping
  • Mixed precision training
  • Automatic checkpointing
  • Loss spike detection and recovery (see the sketch below)
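
The core of loss spike detection is simple: compare each step's loss to a trailing window and restore the last checkpoint when it jumps. A minimal illustrative sketch, not the library's actual implementation:

from collections import deque

class LossSpikeDetector:
    """Flag steps whose loss far exceeds the recent median."""

    def __init__(self, window: int = 100, threshold: float = 2.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def update(self, loss: float) -> bool:
        # Wait for enough history before judging spikes.
        if len(self.history) >= self.history.maxlen // 2:
            median = sorted(self.history)[len(self.history) // 2]
            if loss > self.threshold * median:
                # Do not record the spike; caller restores the last checkpoint.
                return True
        self.history.append(loss)
        return False

A trainer would call detector.update(loss.item()) each step and, on True, reload the newest checkpoint, optionally skipping the offending data batch.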

Evaluation

Standardized evaluation across common benchmarks:

from training_gym.eval import Evaluator

evaluator = Evaluator(
    benchmarks=["mmlu", "hellaswag", "winogrande", "arc"],
    model=model,
)

results = evaluator.run()

Supported benchmarks:

  • MMLU (multitask language understanding)
  • HellaSwag (commonsense reasoning)
  • WinoGrande (coreference resolution)
  • ARC (science questions)
  • TruthfulQA (truthfulness)
  • HumanEval (code generation)
  • GSM8K (math reasoning)
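
Most of these are multiple-choice tasks, conventionally scored by comparing the model's log-likelihood for each candidate answer rather than by free-form generation. A minimal sketch of length-normalized candidate scoring, assuming a Hugging Face-style model and tokenizer (this is the standard technique, not necessarily Training Gym's exact implementation):

import torch
import torch.nn.functional as F

def completion_logprob(model, tokenizer, context: str, completion: str) -> float:
    # Length-normalized log-likelihood of `completion` given `context`.
    ctx_ids = tokenizer.encode(context)
    full_ids = tokenizer.encode(context + completion)  # assumes ctx tokens are a prefix
    with torch.no_grad():
        logits = model(torch.tensor([full_ids])).logits
    # logprobs[i] is the distribution over the token at position i + 1.
    logprobs = F.log_softmax(logits[0, :-1], dim=-1)
    scores = [
        logprobs[pos, tok].item()
        for pos, tok in zip(range(len(ctx_ids) - 1, len(full_ids) - 1),
                            full_ids[len(ctx_ids):])
    ]
    return sum(scores) / len(scores)

The predicted answer is then the candidate with the highest score.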

Experiment Tracking

Every training run is tracked:

from training_gym import Experiment

with Experiment("zen-7b-v2") as exp:
    exp.log_config(config)
    trainer.fit(dataset)
    exp.log_metrics(results)
    exp.log_artifacts(["model.pt", "tokenizer/"])

The experiment tracker records:

  • Hyperparameters
  • Training metrics (loss, gradient norms, learning rates)
  • Evaluation results
  • System metrics (GPU utilization, memory)
  • Artifacts (checkpoints, configs)
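
Of these, the gradient norm is the metric most often recomputed by hand. A minimal illustrative sketch of the global L2 norm typically logged after each backward pass:

import torch

def global_grad_norm(model: torch.nn.Module) -> float:
    # Global L2 norm over all parameter gradients, computed after backward().
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().float().norm(2).item() ** 2
    return total ** 0.5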

Reproducibility

Training Gym emphasizes reproducibility:

Deterministic Training

reproducibility:
  seed: 42
  deterministic_algorithms: true
  cublas_workspace_config: ":4096:8"

Same seed, same results (up to floating-point nondeterminism on some hardware).
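
Under a PyTorch backend, these options map onto standard determinism controls. A minimal sketch of the equivalent setup (the helper name is ours, not a Training Gym API):

import os
import random

import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    # Seed every RNG that can influence a training run.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Required by cuBLAS for deterministic GEMMs; set before the first CUDA call.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Raise an error if an op has no deterministic implementation.
    torch.use_deterministic_algorithms(True)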

Environment Capture

Every experiment records:

  • Git commit hash
  • Package versions
  • Hardware configuration
  • CUDA/cuDNN versions
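
Capturing this metadata needs nothing beyond the standard library plus the framework itself; a minimal sketch (illustrative, not the tracker's actual code):

import platform
import subprocess
from importlib import metadata

import torch

def capture_environment() -> dict:
    # Snapshot the software and hardware state of a run.
    return {
        "git_commit": subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip(),
        "packages": {d.metadata["Name"]: d.version for d in metadata.distributions()},
        "platform": platform.platform(),
        "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else None,
        "cuda": torch.version.cuda,
        "cudnn": torch.backends.cudnn.version(),
    }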

Configuration as Code

All configs are versioned YAML:

# experiments/zen-7b-v2.yaml
model:
  architecture: llama
  hidden_size: 4096
  num_layers: 32
  num_heads: 32
  vocab_size: 32000

training:
  batch_size: 2048
  learning_rate: 3e-4
  max_steps: 150000
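
Because configs are plain YAML, they can be inspected and sanity-checked with standard tooling before an expensive launch; a minimal sketch using PyYAML (the checks are illustrative):

import yaml

with open("experiments/zen-7b-v2.yaml") as f:
    cfg = yaml.safe_load(f)

# PyYAML (YAML 1.1) parses "3e-4" as a *string* because it lacks a decimal
# point, so coerce numeric fields defensively.
lr = float(cfg["training"]["learning_rate"])

# Cheap sanity checks before an expensive launch.
assert cfg["model"]["hidden_size"] % cfg["model"]["num_heads"] == 0
print(lr)  # 0.0003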

Community Features

Training Gym includes tools for collaborative development:

Model Registry

Share and discover models:

from training_gym.registry import ModelRegistry

registry = ModelRegistry()

# Publish a model
registry.push("my-org/my-model", model, config)

# Load a model
model = registry.pull("zoo-labs/zen-7b")

Leaderboards

Automatic benchmark submission:

evaluator.submit_to_leaderboard(
    model_name="zen-7b-v2",
    organization="zoo-labs",
)

Dataset Sharing

from training_gym.data import DatasetRegistry

# Share processed datasets
DatasetRegistry.push("my-corpus", dataset, license="cc-by-4.0")

# Load shared datasets
dataset = DatasetRegistry.pull("zoo-labs/zen-pretrain-v1")

Getting Started

Installation

pip install training-gym

# For distributed training (quoted so the brackets survive shells like zsh)
pip install "training-gym[distributed]"

# For evaluation suite
pip install "training-gym[eval]"

Quick Start

from training_gym import quickstart

# Train a small model to verify setup
quickstart.train_tiny_model()

# Run evaluation suite
quickstart.evaluate_model("my-model")

Documentation

Full documentation at docs.training-gym.ai:

  • Getting started guide
  • Architecture overview
  • API reference
  • Example configurations
  • Troubleshooting

Roadmap

  • Q4 2023: Multi-modal training support
  • Q1 2024: Reinforcement learning from human feedback (RLHF) integration
  • Q2 2024: Federated training support
  • Q3 2024: Automated hyperparameter optimization

Conclusion

Open AI development needs open infrastructure. Training Gym provides the tools to train, evaluate, and share models. Join us in building the future of open AI.

Repository: github.com/zoo-labs/training-gym


Zach Kelling is a co-founder of Zoo Labs Foundation.