Hugging Face Vision Trainer

Low Risk

by @sickn33Verified Source

4.4290 installsv1.0.0Updated May 25, 2026

How to Use

Run in Claude Code terminal

Step 1: Add Marketplace

/plugin marketplace add sickn33/antigravity-awesome-skills

Step 2: Install Plugin

/plugin install hugging-face-vision-trainer@antigravity-awesome-skills

About

Train or fine-tune vision models on Hugging Face Jobs for detection, classification, and SAM or SAM2 segmentation.

source: "https://github.com/huggingface/skills/tree/main/skills/huggingface-vision-trainer" name: hugging-face-vision-trainer description: Train or fine-tune vision models on Hugging Face Jobs for detection, classification, and SAM or SAM2 segmentation. risk: unknown

Vision Model Training on Hugging Face Jobs

Train object detection, image classification, and SAM/SAM2 segmentation models on managed cloud GPUs. No local GPU setup required—results are automatically saved to the Hugging Face Hub.

When to Use This Skill

Use this skill when users want to:

Fine-tune object detection models (D-FINE, RT-DETR v2, DETR, YOLOS) on cloud GPUs or local
Fine-tune image classification models (timm: MobileNetV3, MobileViT, ResNet, ViT/DINOv3, or any Transformers classifier) on cloud GPUs or local
Fine-tune SAM or SAM2 models for segmentation / image matting using bbox or point prompts
Train bounding-box detectors on custom datasets
Train image classifiers on custom datasets
Train segmentation models on custom mask datasets with prompts
Run vision training jobs on Hugging Face Jobs infrastructure
Ensure trained vision models are permanently saved to the Hub

hugging-face-jobs — General HF Jobs infrastructure: token authentication, hardware flavors, timeout management, cost estimation, secrets, environment variables, scheduled jobs, and result persistence. Refer to the Jobs skill for any non-training-specific Jobs questions (e.g., "how do secrets work?", "what hardware is available?", "how do I pass tokens?").
hugging-face-model-trainer — TRL-based language model training (SFT, DPO, GRPO). Use that skill for text/language model fine-tuning.

Local Script Execution

Helper scripts use PEP 723 inline dependencies. Run them with uv run:

uv run scripts/dataset_inspector.py --dataset username/dataset-name --split train
uv run scripts/estimate_cost.py --help

Prerequisites Checklist

Before starting any training job, verify:

Account & Authentication

Hugging Face Account with Pro, Team, or Enterprise plan (Jobs require paid plan)
Authenticated login: Check with hf_whoami() (tool) or hf auth whoami (terminal)
Token has write permissions
MUST pass token in job secrets — see directive #3 below for syntax (MCP tool vs Python API)

Dataset Requirements — Object Detection

Dataset must exist on Hub
Annotations must use the objects column with bbox, category (and optionally area) sub-fields
Bboxes can be in xywh (COCO) or xyxy (Pascal VOC) format — auto-detected and converted
Categories can be integers or strings — strings are auto-remapped to integer IDs
image_id column is optional — generated automatically if missing
ALWAYS validate unknown datasets before GPU training (see Dataset Validation section)

Dataset Requirements — Image Classification

Dataset must exist on Hub
Must have an image column (PIL images) and a label column (integer class IDs or strings)
The label column can be ClassLabel type (with names) or plain integers/strings — strings are auto-remapped
Common column names auto-detected: label, labels, class, fine_label
ALWAYS validate unknown datasets before GPU training (see Dataset Validation section)

Dataset Requirements — SAM/SAM2 Segmentation

Dataset must exist on Hub
Must have an image column (PIL images) and a mask column (binary ground-truth segmentation mask)
Must have a prompt — either:
- A prompt column with JSON containing {"bbox": [x0,y0,x1,y1]} or {"point": [x,y]}
- OR a dedicated bbox column with [x0,y0,x1,y1] values
- OR a dedicated point column with [x,y] or [[x,y],...] values
Bboxes should be in xyxy format (absolute pixel coordinates)
Example dataset: merve/MicroMat-mini (image matting with bbox prompts)
ALWAYS validate unknown datasets before GPU training (see Dataset Validation section)

Critical Settings

Timeout must exceed expected training time — Default 30min is TOO SHORT. See directive #6 for recommended values.
Hub push must be enabled — push_to_hub=True, hub_model_id="username/model-name", token in secrets

Dataset Validation

Validate dataset format BEFORE launching GPU training to prevent the #1 cause of training failures: format mismatches.

ALWAYS validate for unknown/custom datasets or any dataset you haven't trained with before. Skip for cppe-5 (the default in the training script).

Running the Inspector

Option 1: Via HF Jobs (recommended — avoids local SSL/dependency issues):

hf_jobs("uv", {
    "script": "path/to/dataset_inspector.py",
    "script_args": ["--dataset", "username/dataset-name", "--split", "train"]
})

Option 2: Locally:

uv run scripts/dataset_inspector.py

Compatible Tools

Claude CodeCursor

Hugging Face Vision Trainer

About

source: "https://github.com/huggingface/skills/tree/main/skills/huggingface-vision-trainer" name: hugging-face-vision-trainer description: Train or fine-tune vision models on Hugging Face Jobs for detection, classification, and SAM or SAM2 segmentation. risk: unknown

Vision Model Training on Hugging Face Jobs

When to Use This Skill

Local Script Execution

Prerequisites Checklist

Account & Authentication

Dataset Requirements — Object Detection

Dataset Requirements — Image Classification

Dataset Requirements — SAM/SAM2 Segmentation

Critical Settings

Dataset Validation

Running the Inspector

Compatible Tools

Tags

Related Skills

RAG Engineer

"orchestrate-batch-refactor"

Docx Official

Azure AI Agents Persistent Java

Azure Search Documents Ts

Agent Framework Azure AI Py