
Hugging Face Model Trainer
Low Riskby @sickn33Verified Source
About
Train or fine-tune TRL language models on Hugging Face Jobs, including SFT, DPO, GRPO, and GGUF export.
source: "https://github.com/huggingface/skills/tree/main/skills/huggingface-llm-trainer" name: hugging-face-model-trainer description: Train or fine-tune TRL language models on Hugging Face Jobs, including SFT, DPO, GRPO, and GGUF export. license: Complete terms in LICENSE.txt risk: unknown
TRL Training on Hugging Face Jobs
Overview
Train language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup required—models train on cloud GPUs and results are automatically saved to the Hugging Face Hub.
TRL provides multiple training methods:
- SFT (Supervised Fine-Tuning) - Standard instruction tuning
- DPO (Direct Preference Optimization) - Alignment from preference data
- GRPO (Group Relative Policy Optimization) - Online RL training
- Reward Modeling - Train reward models for RLHF
For detailed TRL method documentation:
hf_doc_search("your query", product="trl")
hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer") # SFT
hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer") # DPO
# etc.
See also: references/training_methods.md for method overviews and selection guidance
When to Use This Skill
Use this skill when users want to:
- Fine-tune language models on cloud GPUs without local infrastructure
- Train with TRL methods (SFT, DPO, GRPO, etc.)
- Run training jobs on Hugging Face Jobs infrastructure
- Convert trained models to GGUF for local deployment (Ollama, LM Studio, llama.cpp)
- Ensure trained models are permanently saved to the Hub
- Use modern workflows with optimized defaults
When to Use Unsloth
Use Unsloth (references/unsloth.md) instead of standard TRL when:
- Limited GPU memory - Unsloth uses ~60% less VRAM
- Speed matters - Unsloth is ~2x faster
- Training large models (>13B) - memory efficiency is critical
- Training Vision-Language Models (VLMs) - Unsloth has
FastVisionModelsupport
See references/unsloth.md for complete Unsloth documentation and scripts/unsloth_sft_example.py for a production-ready training script.
Key Directives
When assisting with training jobs:
-
ALWAYS use
hf_jobs()MCP tool - Submit jobs usinghf_jobs("uv", {...}), NOT bashtrl-jobscommands. Thescriptparameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string tohf_jobs(). If user asks to "train a model", "fine-tune", or similar requests, you MUST create the training script AND submit the job immediately usinghf_jobs(). -
Always include Trackio - Every training script should include Trackio for real-time monitoring. Use example scripts in
scripts/as templates. -
Provide job details after submission - After submitting, provide job ID, monitoring URL, estimated time, and note that the user can request status checks later.
-
Use example scripts as templates - Reference
scripts/train_sft_example.py,scripts/train_dpo_example.py, etc. as starting points.
Local Script Execution
Repository scripts use PEP 723 inline dependencies. Run them with uv run:
uv run scripts/estimate_cost.py --help
uv run scripts/dataset_inspector.py --help
Prerequisites Checklist
Before starting any training job, verify:
✅ Account & Authentication
- Hugging Face Account with Pro, Team, or Enterprise plan (Jobs require paid plan)
- Authenticated login: Check with
hf_whoami() - HF_TOKEN for Hub Push ⚠️ CRITICAL - Training environment is ephemeral, must push to Hub or ALL training results are lost
- Token must have write permissions
- MUST pass
secrets={"HF_TOKEN": "$HF_TOKEN"}in job config to make token available (the$HF_TOKENsyntax references your actual token value)
✅ Dataset Requirements
- Dataset must exist on Hub or be loadable via
datasets.load_dataset() - Format must match training method (SFT: "messages"/text/prompt-completion; DPO: chosen/rejected; GRPO: prompt-only)
- ALWAYS validate unknown datasets before GPU training to prevent format failures (see Dataset Validation section below)
- Size appropriate for hardware (Demo: 50-100 examples on t4-small; Production: 1K-10K+ on a10g-large/a100-large)
⚠️ Critical Settings
- Timeout must exceed expected training time - Default 30min is TOO SHORT for most training. Minimum recommended: 1-2 hours. Job fails and loses all progress if timeout is exceeded.
- Hub push must be enabled - Config:
push_to_hub=True,hub_model_id="username/model-name"; Job:secrets={"HF_TOKEN": "$HF_TOKEN"}
Asynchronous Job Guidelines
⚠️ IMPORTANT: Training jobs run asynchronously and can take hours
Action Required
When user requests training:
- Create the training script with Trackio included (use
scripts/train_sft_example.pyas t