
Hugging Face Vision Trainer
Low Riskby @sickn33Verified Source
About
Train or fine-tune vision models on Hugging Face Jobs for detection, classification, and SAM or SAM2 segmentation.
source: "https://github.com/huggingface/skills/tree/main/skills/huggingface-vision-trainer" name: hugging-face-vision-trainer description: Train or fine-tune vision models on Hugging Face Jobs for detection, classification, and SAM or SAM2 segmentation. risk: unknown
Vision Model Training on Hugging Face Jobs
Train object detection, image classification, and SAM/SAM2 segmentation models on managed cloud GPUs. No local GPU setup required—results are automatically saved to the Hugging Face Hub.
When to Use This Skill
Use this skill when users want to:
- Fine-tune object detection models (D-FINE, RT-DETR v2, DETR, YOLOS) on cloud GPUs or local
- Fine-tune image classification models (timm: MobileNetV3, MobileViT, ResNet, ViT/DINOv3, or any Transformers classifier) on cloud GPUs or local
- Fine-tune SAM or SAM2 models for segmentation / image matting using bbox or point prompts
- Train bounding-box detectors on custom datasets
- Train image classifiers on custom datasets
- Train segmentation models on custom mask datasets with prompts
- Run vision training jobs on Hugging Face Jobs infrastructure
- Ensure trained vision models are permanently saved to the Hub
Related Skills
hugging-face-jobs— General HF Jobs infrastructure: token authentication, hardware flavors, timeout management, cost estimation, secrets, environment variables, scheduled jobs, and result persistence. Refer to the Jobs skill for any non-training-specific Jobs questions (e.g., "how do secrets work?", "what hardware is available?", "how do I pass tokens?").hugging-face-model-trainer— TRL-based language model training (SFT, DPO, GRPO). Use that skill for text/language model fine-tuning.
Local Script Execution
Helper scripts use PEP 723 inline dependencies. Run them with uv run:
uv run scripts/dataset_inspector.py --dataset username/dataset-name --split train
uv run scripts/estimate_cost.py --help
Prerequisites Checklist
Before starting any training job, verify:
Account & Authentication
- Hugging Face Account with Pro, Team, or Enterprise plan (Jobs require paid plan)
- Authenticated login: Check with
hf_whoami()(tool) orhf auth whoami(terminal) - Token has write permissions
- MUST pass token in job secrets — see directive #3 below for syntax (MCP tool vs Python API)
Dataset Requirements — Object Detection
- Dataset must exist on Hub
- Annotations must use the
objectscolumn withbbox,category(and optionallyarea) sub-fields - Bboxes can be in xywh (COCO) or xyxy (Pascal VOC) format — auto-detected and converted
- Categories can be integers or strings — strings are auto-remapped to integer IDs
image_idcolumn is optional — generated automatically if missing- ALWAYS validate unknown datasets before GPU training (see Dataset Validation section)
Dataset Requirements — Image Classification
- Dataset must exist on Hub
- Must have an
imagecolumn (PIL images) and alabelcolumn (integer class IDs or strings) - The label column can be
ClassLabeltype (with names) or plain integers/strings — strings are auto-remapped - Common column names auto-detected:
label,labels,class,fine_label - ALWAYS validate unknown datasets before GPU training (see Dataset Validation section)
Dataset Requirements — SAM/SAM2 Segmentation
- Dataset must exist on Hub
- Must have an
imagecolumn (PIL images) and amaskcolumn (binary ground-truth segmentation mask) - Must have a prompt — either:
- A
promptcolumn with JSON containing{"bbox": [x0,y0,x1,y1]}or{"point": [x,y]} - OR a dedicated
bboxcolumn with[x0,y0,x1,y1]values - OR a dedicated
pointcolumn with[x,y]or[[x,y],...]values
- A
- Bboxes should be in xyxy format (absolute pixel coordinates)
- Example dataset:
merve/MicroMat-mini(image matting with bbox prompts) - ALWAYS validate unknown datasets before GPU training (see Dataset Validation section)
Critical Settings
- Timeout must exceed expected training time — Default 30min is TOO SHORT. See directive #6 for recommended values.
- Hub push must be enabled —
push_to_hub=True,hub_model_id="username/model-name", token insecrets
Dataset Validation
Validate dataset format BEFORE launching GPU training to prevent the #1 cause of training failures: format mismatches.
ALWAYS validate for unknown/custom datasets or any dataset you haven't trained with before. Skip for cppe-5 (the default in the training script).
Running the Inspector
Option 1: Via HF Jobs (recommended — avoids local SSL/dependency issues):
hf_jobs("uv", {
"script": "path/to/dataset_inspector.py",
"script_args": ["--dataset", "username/dataset-name", "--split", "train"]
})
Option 2: Locally:
uv run scripts/dataset_inspector.py