Zen Agentic Dataset
8.47 billion tokens of real-world agentic programming data.
Quick Stats
| Metric | Value |
|---|---|
| Total Tokens | 8.47 billion |
| Training Samples | 3.35 million |
| Validation Samples | 100,000 |
| Total Size | 27 GB |
| Repositories | 1,452 |
| Time Span | 15 years (2010-2025) |
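The figures in this table imply a few derived quantities that can be useful when sizing a training run. The following is a minimal sketch of that arithmetic using only the numbers above. It assumes that the 8.47 billion tokens span both the training and validation samples and that "27 GB" means 27 × 10^9 bytes; the table confirms neither, so treat the results as rough estimates.

```python
# Figures copied from the Quick Stats table.
TOTAL_TOKENS = 8.47e9
TRAIN_SAMPLES = 3_350_000
VALIDATION_SAMPLES = 100_000
TOTAL_BYTES = 27e9  # assumption: "27 GB" read as decimal gigabytes


def derived_stats():
    """Return rough per-sample and per-token averages implied by the table."""
    # Assumption: the token total covers training and validation samples together.
    total_samples = TRAIN_SAMPLES + VALIDATION_SAMPLES
    return {
        "total_samples": total_samples,
        "validation_fraction": VALIDATION_SAMPLES / total_samples,
        "avg_tokens_per_sample": TOTAL_TOKENS / total_samples,
        "avg_bytes_per_token": TOTAL_BYTES / TOTAL_TOKENS,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:,.3f}")
```

Under these assumptions the averages come out to roughly 2,455 tokens per sample and about 3.2 bytes per token, with validation making up about 2.9% of all samples.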
Data Composition
| Component | Tokens | Percentage |
|---|---|---|
| Git History | 4.03B | 48% |
| Agentic Debug Sessions | 2.42B | 29% |
| Architecture Discussions | 1.14B | 13% |
| Code Review Sessions | 0.86B | 10% |
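The percentages in this table can be checked against the token counts. The sketch below recomputes each component's share from the listed figures. Note that the four counts sum to 8.45B rather than the headline 8.47B, which is consistent with each count being rounded to two decimals; computing shares against the summed counts (rather than the headline total) is a choice made here, not something the table specifies.

```python
# Token counts in billions, copied from the Data Composition table.
COMPONENT_TOKENS_B = {
    "Git History": 4.03,
    "Agentic Debug Sessions": 2.42,
    "Architecture Discussions": 1.14,
    "Code Review Sessions": 0.86,
}


def component_shares(tokens_by_component):
    """Return each component's share of the summed token count, in percent."""
    total = sum(tokens_by_component.values())
    return {name: 100 * count / total for name, count in tokens_by_component.items()}


if __name__ == "__main__":
    shares = component_shares(COMPONENT_TOKENS_B)
    for name, pct in shares.items():
        print(f"{name}: {pct:.1f}%")
    print(f"Sum of components: {sum(COMPONENT_TOKENS_B.values()):.2f}B")
```

Rounded to whole percentages, the recomputed shares match the table (48%, 29%, 13%, 10%). The same dictionary could serve as sampling weights when mixing components during training, though that use is a suggestion rather than a documented recipe.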
Domain Coverage
Agentic AI & LLM Infrastructure
- Model Context Protocol (MCP) - 260+ tool implementations
- Multi-agent orchestration
- Agent frameworks - planning, memory, reflection
- LLM gateway - proxy for 100+ providers
Web3 & Blockchain
- Smart contracts - Solidity, Vyper
- Consensus engines - Snow family, BFT, DAG
- Cross-chain bridges
- DeFi protocols - AMMs, lending, staking
Cryptography & Security
- Post-quantum - Kyber, Dilithium, SPHINCS+
- Threshold cryptography - MPC, DKG
- Zero-knowledge proofs
- Key management - HD wallets
Modern Development
- Full-stack TypeScript - Next.js, React
- Systems - Rust, Go, Python, C/C++
- DevOps - Docker, Kubernetes, CI/CD
- Real-time systems - Event sourcing, CQRS
Languages
| Tier 1 (Core) | Tier 2 (Infrastructure) | Tier 3 (Specialized) |
|---|---|---|
| Python | SQL | Solidity |
| TypeScript | Bash/Shell | C/C++ |
| JavaScript | YAML/TOML | Protobuf |
| Rust | Dockerfile | GraphQL |
| Go | Makefile | Move |
What Makes This Unique
Real Agentic Programming
Unlike synthetic datasets, this dataset contains real agentic programming sessions that show:
- Real debugging workflows
- Multi-file refactoring decisions
- Architecture discussions
- Tool use patterns
- Error recovery
Production Code Quality
- Code that shipped to production systems
- Security-audited smart contracts
- Performance-optimized infrastructure
- Battle-tested patterns from real deployments
Access & Licensing
This dataset is available for research and commercial licensing.
Request Access
Email: z@hanzo.ai
Please include:
- Intended use case (training, research, evaluation)
- Organization/affiliation
- Target ecosystem (if applicable)
- Licensing requirements
Hugging Face
View the public preview: hanzoai/zen-agentic-dataset
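One way to inspect the preview programmatically is to stream a few rows with the Hugging Face `datasets` library. This is a minimal sketch, not a documented loading recipe: it assumes the preview repository loads with `load_dataset`, is publicly readable, and exposes a split named "train". The column names are not documented here, so the helper simply shortens whatever fields each row contains. `summarize_row` and `preview_rows` are illustrative names.

```python
from itertools import islice

REPO_ID = "hanzoai/zen-agentic-dataset"  # repository id as given above


def summarize_row(row, max_chars=80):
    """Map each field to a shortened string so long samples stay readable."""
    return {key: str(value)[:max_chars] for key, value in row.items()}


def preview_rows(n=3, split="train"):
    """Stream the first n rows of the preview without downloading all of it.

    Assumptions: the preview loads with the `datasets` library and
    exposes a split called "train"; neither is confirmed by this page.
    """
    from datasets import load_dataset  # requires: pip install datasets

    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return [summarize_row(row) for row in islice(stream, n)]
```

Calling `preview_rows()` needs network access and returns a list of small dictionaries, one per row. Streaming avoids pulling the whole preview to disk just to look at its schema; if the split name differs, pass it via the `split` argument.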