# Web Scraping & Data Extraction Engine 🕸️

A complete web scraping methodology for AI agents and developers, covering everything from legal compliance to production-scale data pipelines.

## What This Skill Does

Turns your AI agent into a web scraping expert that:

- **Assesses legality** before touching any site (robots.txt, ToS, GDPR/CCPA)
- **Selects the right tool** (HTTP clients → Scrapy → Playwright → managed services)
- **Defeats anti-bot detection** with proxy rotation, fingerprint diversity, and stealth patterns
- **Builds data pipelines** with validation, deduplication, and structured storage
- **Monitors health** with breakage detection, success rate tracking, and alerting
- **Scales efficiently** from single-site to millions of pages
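
As a rough illustration of the robots.txt part of the legality check, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The helper name `is_allowed`, the `my-bot` user agent, and the sample rules are assumptions made for this example, not part of the skill itself. A robots.txt check also does not cover ToS or GDPR/CCPA obligations.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper for illustration: it parses robots.txt content that
    has already been downloaded, so it runs without network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules (an assumption for this sketch): everything under /private/ is off limits.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for target in (
        "https://example.com/products",
        "https://example.com/private/data",
    ):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS, "my-bot", target) else "disallowed"
        print(f"{target}: {verdict}")
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` would fetch the file instead of parsing a string.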

## Install

```bash
clawhub install afrexai-web-scraping-engine
```

## Quick Start

Tell your agent:

- "Check if I can scrape example.com/products"
- "Build a price monitoring scraper for 3 competitor sites"
- "My scraper keeps getting blocked — help"
- "Extract product data from this URL"

## What's Inside

- **Legal compliance framework** with case law references and decision rules
- **Tool selection matrix** (8 tools compared across 5 dimensions)
- **Anti-detection strategies** (proxy tiers, stealth configs, Cloudflare bypass)
- **Code patterns** for pagination, JS rendering, authentication, change detection
- **Data pipeline** with validation, deduplication, cleaning, and storage
- **5 complete scraping patterns** (e-commerce, jobs, news, social, real estate)
- **Production operations** (monitoring dashboard, breakage detection, runbook)
- **100-point quality scoring** rubric
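
To give a feel for the validation and deduplication steps in the data pipeline item above, here is a minimal Python sketch. The record schema (`url`, `title`, `price`) and the function names are hypothetical, chosen only for this example; the skill's own pipeline may be structured differently.

```python
import hashlib
import json

# Hypothetical schema for a scraped product record (an assumption for this sketch).
REQUIRED_FIELDS = ("url", "title", "price")


def validate(record: dict) -> bool:
    """Accept a record only if every required field is present and price is a non-negative number."""
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return False
    try:
        return float(record["price"]) >= 0
    except (TypeError, ValueError):
        return False


def fingerprint(record: dict) -> str:
    """Hash the normalized required fields so near-identical rows collapse to one key."""
    canonical = json.dumps(
        {field: str(record[field]).strip().lower() for field in REQUIRED_FIELDS},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def clean(records: list) -> list:
    """Drop invalid records, then drop duplicates, keeping the first occurrence."""
    seen = set()
    kept = []
    for record in records:
        if not validate(record):
            continue
        key = fingerprint(record)
        if key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept


SAMPLE_ROWS = [
    {"url": "https://example.com/p/1", "title": "Widget", "price": "9.99"},
    {"url": "https://example.com/p/1", "title": "widget ", "price": "9.99"},  # duplicate after normalization
    {"url": "https://example.com/p/2", "title": "", "price": "5"},  # missing title
    {"url": "https://example.com/p/3", "title": "Gadget", "price": "n/a"},  # unparseable price
]

if __name__ == "__main__":
    for row in clean(SAMPLE_ROWS):
        print(row)
```

Hashing a normalized subset of fields keeps the in-memory `seen` set small. For very large crawls, that set would likely need to live in a database or key-value store instead.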

## ⚡ Level Up

This free skill covers methodology. For industry-specific data extraction strategies, see the context packs:

- **[SaaS Context Pack](https://afrexai-cto.github.io/context-packs/)** — Competitor monitoring, pricing intelligence
- **[Ecommerce Context Pack](https://afrexai-cto.github.io/context-packs/)** — Product data, price tracking at scale
- **[Real Estate Context Pack](https://afrexai-cto.github.io/context-packs/)** — Listing aggregation, market analysis

Each pack: **$47** — complete AI agent context for your vertical.

## 🔗 More Free Skills by AfrexAI

- [afrexai-competitive-intel](https://clawhub.com/skills/afrexai-competitive-intel) — Competitive intelligence system
- [afrexai-data-analyst](https://clawhub.com/skills/afrexai-data-analyst) — Data analysis methodology
- [afrexai-seo-content-engine](https://clawhub.com/skills/afrexai-seo-content-engine) — SEO content creation
- [afrexai-api-architect](https://clawhub.com/skills/afrexai-api-architect) — API design & architecture
- [afrexai-lead-hunter](https://clawhub.com/skills/afrexai-lead-hunter) — Lead generation system

**Browse all AfrexAI skills →** [clawhub.com](https://clawhub.com)

**Full storefront →** [afrexai-cto.github.io/context-packs](https://afrexai-cto.github.io/context-packs/)
