
name: web-scraper
description: Intelligent multi-strategy web scraping. Extracts structured data from web pages (tables, lists, prices). Supports pagination, monitoring, and CSV/JSON export.
risk: safe
source: community
date_added: '2026-03-06'
author: renat
tags:
- scraping
- data-extraction
- automation
- csv
tools:
- claude-code
- antigravity
- cursor
- gemini-cli
- codex-cli
Web Scraper
Overview
Intelligent multi-strategy web scraping. Extracts structured data from web pages (tables, lists, prices). Supports pagination, monitoring, and CSV/JSON export.
When to Use This Skill
- When the user mentions "scraper" or related topics
- When the user mentions "scraping" or related topics
- When the user mentions "extrair dados web" or related topics
- When the user mentions "web scraping" or related topics
- When the user mentions "raspar dados" or related topics
- When the user mentions "coletar dados site" or related topics
Do Not Use This Skill When
- The task is unrelated to web scraping
- A simpler, more specific tool can handle the request
- The user needs general-purpose assistance without domain expertise
How It Works
Execute phases in strict order. Each phase feeds the next.
1. CLARIFY -> 2. RECON -> 3. STRATEGY -> 4. EXTRACT -> 5. TRANSFORM -> 6. VALIDATE -> 7. FORMAT
Never skip Phase 1 or Phase 2. They prevent wasted effort and failed extractions.
Fast path: If the user provides a URL and a clear data target, and the request is simple (single page, one data type), compress Phases 1-3 into a single action: fetch, classify, and extract in one WebFetch call. Still validate and format.
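The phase ordering and the fast path can be sketched as a small planning function. This is only an illustration: `plan_phases`, its boolean arguments, and the `FAST_FETCH` label are hypothetical names, and keeping TRANSFORM on the fast path is an assumption, since the text only says to still validate and format.

```python
# Phase names come from the pipeline above; everything else is illustrative.
PHASES = ["CLARIFY", "RECON", "STRATEGY", "EXTRACT", "TRANSFORM", "VALIDATE", "FORMAT"]


def plan_phases(has_url: bool, has_clear_target: bool, is_simple: bool) -> list:
    """Return the ordered steps to run for one request."""
    if has_url and has_clear_target and is_simple:
        # Fast path: one fetch-classify-extract call stands in for the early
        # phases. TRANSFORM is kept here as an assumption.
        return ["FAST_FETCH", "TRANSFORM", "VALIDATE", "FORMAT"]
    # Normal path: every phase, in strict order.
    return list(PHASES)
```

Whichever path is taken, VALIDATE and FORMAT remain the last two steps.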
Capabilities
- Multi-strategy: WebFetch (static), Browser automation (JS-rendered), Bash/curl (APIs), WebSearch (discovery)
- Extraction modes: table, list, article, product, contact, FAQ, pricing, events, jobs, custom
- Output formats: Markdown tables (default), JSON, CSV
- Pagination: auto-detect and follow (page numbers, infinite scroll, load-more)
- Multi-URL: extract same structure across sources with comparison and diff
- Validation: confidence ratings (HIGH/MEDIUM/LOW) on every extraction
- Auto-escalation: if WebFetch fails silently, fall back to Browser automation automatically
- Data transforms: cleaning, normalization, deduplication, enrichment
- Differential mode: detect changes between scraping runs
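As one way to picture differential mode, the sketch below compares two runs of extracted rows by a key field. The function name `diff_runs`, the row shape (a list of dicts), and the choice of key are assumptions made for illustration; the skill itself does not specify how changes are detected.

```python
def diff_runs(previous, current, key):
    """Compare two scraping runs (lists of dicts) using `key` as row identity."""
    prev = {row[key]: row for row in previous}
    curr = {row[key]: row for row in current}
    # Rows present only in the new run.
    added = [curr[k] for k in curr if k not in prev]
    # Rows that disappeared since the previous run.
    removed = [prev[k] for k in prev if k not in curr]
    # Rows present in both runs whose fields differ, as (old, new) pairs.
    changed = [(prev[k], curr[k]) for k in curr if k in prev and prev[k] != curr[k]]
    return {"added": added, "removed": removed, "changed": changed}
```

A call such as `diff_runs(old_rows, new_rows, key="sku")` would report new, dropped, and modified products, assuming each row carries a stable `sku` field.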
Web Scraper
Multi-strategy web data extraction with intelligent approach selection, automatic fallback escalation, data transformation, and structured output.
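Automatic fallback escalation could look roughly like the following. The two fetchers are passed in as plain callables because WebFetch and Browser automation are agent tools rather than Python functions; `extract_with_fallback`, `min_chars`, and the "too little content" test for a silent failure are all assumptions of this sketch.

```python
def extract_with_fallback(url, static_fetch, browser_fetch, min_chars=200):
    """Try the cheap static fetch first; escalate to a browser if it yields too little."""
    try:
        html = static_fetch(url)
    except Exception:
        # Treat a hard failure the same as an empty response.
        html = ""
    if len(html.strip()) < min_chars:
        # Assumed signal of a silent failure: the page came back nearly empty,
        # which often means the content is rendered by JavaScript.
        return browser_fetch(url), "browser"
    return html, "static"
```

The second element of the returned tuple records which strategy produced the content, which can be useful when reporting confidence later.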
Phase 1: Clarify
Establish extraction parameters before touching any URL.
Required Parameters
| Parameter | Question to Resolve | Default |
|:--------------|:--------------------------------------|:---------------|
| Target URL(s) | Which page(s) to scrape? | (required) |
| Data Target | What specific data to extract? | (required) |
| Output Format | Markdown table, JSON, CSV, or text? | Markdown table |
| Scope | Single page, paginated, or multi-URL? | Single page |
Optional Parameters
| Parameter | Question to Resolve | Default |
|:-----------|:------------------------------------|:-------------|
| Pagination | Follow pagination? Max pages? | No, 1 page |
| Max Items | Maximum number of items to collect? | Unlimited |
| Filters | Data to exclude or include? | None |
| Sort Order | How to sort results? | Source order |
| Save Path | Save to file? Which path? | Display only |
| Language | Respond in which language? | User's lang |
| Diff Mode | Compare with previous run? | No |
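The required and optional parameters, with their defaults, can be captured in a single record. The sketch below is one possible encoding: the class name `ExtractionParams`, the field names, and the string values used for the defaults are hypothetical, chosen only to mirror the two tables.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtractionParams:
    # Required parameters (no defaults).
    target_urls: List[str]
    data_target: str
    # Defaults below mirror the "Default" column of the tables.
    output_format: str = "markdown_table"
    scope: str = "single_page"
    follow_pagination: bool = False
    max_pages: int = 1
    max_items: Optional[int] = None   # None stands for "Unlimited"
    filters: Optional[str] = None     # None stands for no filters
    sort_order: str = "source"
    save_path: Optional[str] = None   # None stands for "Display only"
    language: Optional[str] = None    # None stands for "use the user's language"
    diff_mode: bool = False
```

With this shape, `ExtractionParams(["https://example.com/pricing"], "plan prices")` fills in every optional value from the defaults, so only the two required answers need to be collected from the user.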
Clarification Rules
- If user provides a URL and clear data target, proceed directly to Phase 2. Do NOT ask unnecessary questions.
- If request is ambiguous (e.g. "scrape this site"), ask ONLY: "What specific data do you want me to extract from this page?"
- Default to Markdown table output. Mention alternatives only if relevant.
- Accept requests in any language. Always respond in the user's language.
- If user says "everything" or "all data", perform recon first, then present what's available and let user choose.
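The clarification rules amount to a short decision procedure, sketched here. The function `next_step` and its action labels are hypothetical; routing a request without a URL to discovery is this sketch's reading of Discovery Mode, and only the question text is taken from the rules above.

```python
from typing import Optional, Tuple

QUESTION = "What specific data do you want me to extract from this page?"
BROAD_TARGETS = {"everything", "all data"}


def next_step(url: Optional[str], data_target: Optional[str]) -> Tuple[str, Optional[str]]:
    """Decide what to do after reading the request: (action, question_to_ask)."""
    target = (data_target or "").strip().lower()
    if not url:
        # No URL at all: a topic-only request goes to discovery.
        return ("discovery", None)
    if target in BROAD_TARGETS:
        # "Everything" is too broad: do recon, then let the user choose.
        return ("recon_then_offer_choices", None)
    if target:
        # URL plus a clear data target: no questions, go straight to Phase 2.
        return ("recon", None)
    # Ambiguous request: ask the single permitted question.
    return ("ask", QUESTION)
```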
Discovery Mode
When user has a topic but no specific URL:
- Use WebSearch to find the most relevant pages
- Present top 3-5 URLs with descriptions
- Let user choose which to scrape, or scrape all
- Proceed to Phase 2 with selected URL(s)
Example: "find and extract pricing data for CRM tools" -> WebSearch("CRM tools pricing comparison 2026") -> Present top results -> User selects -> Extract
Phase 2: Reconnaissance
Analyze the target page before extraction.
Step 2.1: Initial Fetch
Use WebFetch to retrieve and analyze the page structure:
WebFetch(
url = TARGET_URL,
prompt = "Analyze this page structure and report:
1. Page t