# nanochat

## Docs

- [Checkpoint Management](https://mintlify.wiki/karpathy/nanochat/advanced/checkpoint-management.md): Save, load, and resume training from model checkpoints
- [Custom Identity](https://mintlify.wiki/karpathy/nanochat/advanced/custom-identity.md): Infuse your nanochat with personality using synthetic data
- [Distributed Training](https://mintlify.wiki/karpathy/nanochat/advanced/distributed-training.md): Scale training across multiple GPUs using PyTorch distributed
- [FP8 Training](https://mintlify.wiki/karpathy/nanochat/advanced/fp8-training.md): Accelerate training with 8-bit floating point computation using torchao
- [base_eval.py](https://mintlify.wiki/karpathy/nanochat/api/base-eval.md): Evaluate base language models
- [base_train.py](https://mintlify.wiki/karpathy/nanochat/api/base-train.md): Pretrain a base language model from scratch
- [chat_eval.py](https://mintlify.wiki/karpathy/nanochat/api/chat-eval.md): Evaluate chat models on various benchmarks
- [chat_rl.py](https://mintlify.wiki/karpathy/nanochat/api/chat-rl.md): Reinforcement learning on GSM8K via simplified GRPO
- [chat_sft.py](https://mintlify.wiki/karpathy/nanochat/api/chat-sft.md): Supervised fine-tuning (SFT) for chat models
- [Checkpoint Manager](https://mintlify.wiki/karpathy/nanochat/api/checkpoint-manager.md): Functions for saving and loading model checkpoints
- [CORE Evaluation](https://mintlify.wiki/karpathy/nanochat/api/core-eval.md): Evaluate models using the CORE benchmark from DCLM
- [Dataloader](https://mintlify.wiki/karpathy/nanochat/api/dataloader.md): Distributed data loading with BOS-aligned best-fit packing
- [Engine](https://mintlify.wiki/karpathy/nanochat/api/engine.md): Efficient inference engine with KV cache for autoregressive generation
- [GPT](https://mintlify.wiki/karpathy/nanochat/api/gpt.md): GPT transformer model implementation with modern architecture features
- [Loss Evaluation](https://mintlify.wiki/karpathy/nanochat/api/loss-eval.md): Evaluate model performance using bits per byte metric
- [Optimizers](https://mintlify.wiki/karpathy/nanochat/api/optim.md): Combined MuonAdamW optimizers for training
- [Tasks](https://mintlify.wiki/karpathy/nanochat/api/tasks.md): Task modules for evaluation and fine-tuning
- [Tokenizer](https://mintlify.wiki/karpathy/nanochat/api/tokenizer.md): BPE tokenizer with GPT-4 style splitting pattern
- [Tokenizing Data Loader](https://mintlify.wiki/karpathy/nanochat/architecture/dataloader.md): Distributed data loader with BOS-aligned best-fit document packing
- [Flash Attention 3 Integration](https://mintlify.wiki/karpathy/nanochat/architecture/flash-attention.md): Unified Flash Attention interface with automatic FA3/SDPA fallback based on hardware
- [GPT Model Architecture](https://mintlify.wiki/karpathy/nanochat/architecture/gpt-model.md): Technical details of the nanochat GPT architecture including rotary embeddings, QK normalization, GQA, and sliding window attention
- [MuonAdamW Optimizer](https://mintlify.wiki/karpathy/nanochat/architecture/optimizer.md): Combined optimizer using Muon for matrix parameters and AdamW for embeddings and scalars
- [Contributing](https://mintlify.wiki/karpathy/nanochat/community/contributing.md): How to contribute to nanochat
- [Community Guides](https://mintlify.wiki/karpathy/nanochat/community/guides.md): Tutorials, writeups, and community resources
- [Time-to-GPT-2 Leaderboard](https://mintlify.wiki/karpathy/nanochat/community/leaderboard.md): Community leaderboard for training GPT-2 capability models
- [CORE Metric](https://mintlify.wiki/karpathy/nanochat/evaluation/core-metric.md): Understanding the DCLM CORE benchmark for base model evaluation
- [Loss Evaluation](https://mintlify.wiki/karpathy/nanochat/evaluation/loss-evaluation.md): Bits per byte (BPB) evaluation for comparing model performance
- [Evaluation Tasks](https://mintlify.wiki/karpathy/nanochat/evaluation/tasks.md): Overview of all evaluation tasks for chat models
- [CLI Chat Interface](https://mintlify.wiki/karpathy/nanochat/inference/chat-cli.md): Interactive command-line interface for chatting with NanoChat models
- [Web UI Chat Interface](https://mintlify.wiki/karpathy/nanochat/inference/chat-web.md): Web-based chat interface with multi-GPU support and streaming responses
- [Inference Engine](https://mintlify.wiki/karpathy/nanochat/inference/engine.md): High-performance inference engine with KV cache and tool use support
- [Introduction](https://mintlify.wiki/karpathy/nanochat/introduction.md): Train your own GPT-2 for under $100 in just 3 hours
- [Quickstart](https://mintlify.wiki/karpathy/nanochat/quickstart.md): Train and talk to your own GPT-2 in just 3 hours
- [Base Pretraining](https://mintlify.wiki/karpathy/nanochat/training/pretraining.md): Train a base language model from scratch
- [Reinforcement Learning](https://mintlify.wiki/karpathy/nanochat/training/reinforcement-learning.md): Post-training with RL on GSM8K math problems
- [Scaling Laws](https://mintlify.wiki/karpathy/nanochat/training/scaling-laws.md): Understanding compute-optimal training and the depth parameter
- [Supervised Fine-Tuning (SFT)](https://mintlify.wiki/karpathy/nanochat/training/supervised-finetuning.md): Fine-tune a pretrained model on conversational data
- [Tokenization](https://mintlify.wiki/karpathy/nanochat/training/tokenization.md): Train a BPE tokenizer in the style of GPT-4

## OpenAPI Specs

- [openapi](https://mintlify.wiki/karpathy/nanochat/api-reference/openapi.json)