Firecrawl Proxy Setup with Mobile IPs in 2026
Firecrawl raised a $14.5M Series A from Nexus Venture Partners, Shopify CEO Tobias Lütke and Y Combinator in August 2025, and is used by 350,000+ developers, including teams at Shopify, Zapier and Replit. This guide shows how to pair it with ProxyStyler 4G mobile proxies for 97%+ success on the hardest targets on the web.
The AI-Native Scraping API
Firecrawl converts any URL into clean, LLM-ready markdown and structured JSON with a single HTTP call. It is purpose-built for AI agents, RAG pipelines, and autonomous research systems.
Built for AI Agents
Every response is pre-cleaned: navigation, cookie banners, ads and tracking scripts are stripped. What arrives at your LLM is tokenisable markdown ready for ChatGPT, Claude, Gemini or Llama 3.3.
- JS rendering via Playwright on every request
- Built-in PDF and image OCR on Enterprise
- Natural-language /extract with schema validation
- LangChain + LlamaIndex first-party loaders
Fortune 500 Trusted
Backed by Nexus Venture Partners and Shopify CEO Tobias Lütke, Firecrawl powers data pipelines at Shopify, Zapier, Replit, and hundreds of AI-first startups shipping agents in 2026.
- $16.2M total raised across seed + Series A
- Y Combinator alumni (W24 batch)
- Viral May 2025 job posting: a $1M budget to "hire AI agents"
- Firecrawl v2 released alongside funding
Firecrawl v2 Pricing Breakdown
Firecrawl sells credits, not requests. Understanding credit consumption is the single biggest cost-lever in 2026.
| Plan | Price/mo | Credits | Concurrency | Cost / 1K basic pages | Cost / 1K extract |
|---|---|---|---|---|---|
| Free | $0 | 500 one-time | 2 | $0.00 | N/A (55 pages) |
| Hobby | $16 | 3,000 | 5 | $5.33 | $48.00 |
| Standard | $83 | 100,000 | 50 | $0.83 | $7.47 |
| Growth | $333 | 500,000 | 100 | $0.67 | $6.00 |
| Enterprise | $1.5K+ | Custom + BYOP | Custom | ~$0.30 | ~$2.70 |
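The per-1K columns above follow from dividing the monthly price by the credit allowance. A minimal sketch of that arithmetic, assuming 1 credit per basic page and 9 credits per extract page as stated elsewhere in this guide (the `Plan` class and its method name are illustrative, not part of any Firecrawl SDK):

```python
from dataclasses import dataclass

BASIC_CREDITS_PER_PAGE = 1    # one /scrape call, per this guide
EXTRACT_CREDITS_PER_PAGE = 9  # schema-based extraction, per this guide


@dataclass(frozen=True)
class Plan:
    """Hypothetical helper holding one row of the pricing table."""
    name: str
    price_usd: float
    credits: int

    def cost_per_1k_pages(self, credits_per_page: int = BASIC_CREDITS_PER_PAGE) -> float:
        # Monthly price spread over the credit allowance, scaled to 1,000 pages.
        return round(self.price_usd / self.credits * 1000 * credits_per_page, 2)


PLANS = [
    Plan("Hobby", 16, 3_000),
    Plan("Standard", 83, 100_000),
    Plan("Growth", 333, 500_000),
]

for plan in PLANS:
    basic = plan.cost_per_1k_pages()
    extract = plan.cost_per_1k_pages(EXTRACT_CREDITS_PER_PAGE)
    print(f"{plan.name:<9} basic ${basic:.2f}  extract ${extract:.2f}")
```

Results can differ from the table by a cent or so depending on where rounding is applied, and the figures only hold if you use the full monthly allowance.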
Hidden cost: failed requests
Firecrawl refunds credits on 4xx/5xx responses, but each failure still costs you time and occupies a concurrency slot. On hardened targets (Instagram, TikTok, Cloudflare Enterprise) the built-in pool fails 50-70% of requests, so much of your concurrency budget goes to retries. Routing through ProxyStyler 4G mobile IPs pushes success above 97%, giving you 2.3x effective credit efficiency.
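One way to sanity-check the efficiency multiple is to compare expected attempts per successful page. The sketch below assumes each attempt succeeds independently with a fixed probability, and uses the 41% (built-in pool) and 97.4% (mobile) success rates quoted in the FAQ of this guide; the function names are illustrative.

```python
def attempts_per_success(success_rate: float) -> float:
    """Expected number of attempts to get one successful page (geometric model)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


def efficiency_gain(baseline_rate: float, improved_rate: float) -> float:
    """How many times fewer attempts the improved pool needs per successful page."""
    return attempts_per_success(baseline_rate) / attempts_per_success(improved_rate)


BUILT_IN_POOL = 0.41  # success rate quoted for the default pool
MOBILE_POOL = 0.974   # success rate quoted for 4G mobile IPs

gain = efficiency_gain(BUILT_IN_POOL, MOBILE_POOL)
print(
    f"{attempts_per_success(BUILT_IN_POOL):.2f} vs "
    f"{attempts_per_success(MOBILE_POOL):.2f} attempts per page, gain {gain:.2f}x"
)
```

Under these assumptions the gain lands between 2.3x and 2.4x, in line with the multiple quoted above. Real retries are rarely independent, so treat this as a rough bound rather than a forecast.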
Why Add Mobile Proxies to Firecrawl
Firecrawl gets you 80% of the web for free. ProxyStyler 4G proxies get you the other 20%: the valuable, hardened, anti-bot-protected targets where AI agents earn their keep.
April 2026 Success-Rate Benchmark (50,000 pages per target)
CGNAT Trust Shield
Mobile carriers NAT thousands of real subscribers behind every IPv4. Blocking one means blocking paying customers, so platforms simply will not do it.
Account Isolation
Assign one mobile IP per scraped account or session. Firecrawl jobs that touch authenticated endpoints stop triggering cross-session fingerprint flags.
Cloudflare Bypass
Combined with Firecrawl v2 stealth Chromium, ProxyStyler mobile IPs clear Turnstile, Bot Fight Mode and most Enterprise WAF policies out of the box.
Lower Effective Cost
At 97.4% success you consume 2.3x fewer credits on hardened targets than the built-in pool, so mobile proxies pay for themselves on Standard tier and up.
On-Demand Rotation
HTTPS-triggered IP rotation per port lets you rotate every request, every N pages, or on 403/429, all controlled from your Firecrawl job config.
Geo-Targeting
Choose exact carrier and city. Scrape localised pricing, region-gated content or geo-fenced campaigns that datacenter IPs cannot see.
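The on-demand rotation rule described above (rotate every N pages, or immediately on a 403/429) can be sketched as a small counter. This is an illustrative helper, not a ProxyStyler or Firecrawl API; the class name and the default of 25 pages are assumptions.

```python
from dataclasses import dataclass

BLOCK_STATUSES = frozenset({403, 429})  # responses that suggest the IP is flagged


@dataclass
class RotationPolicy:
    """Hypothetical policy: rotate every N pages or as soon as a block is seen."""
    every_n_pages: int = 25
    pages_since_rotation: int = 0

    def should_rotate(self, status_code: int) -> bool:
        """Record one finished request; return True if the IP should rotate next."""
        self.pages_since_rotation += 1
        blocked = status_code in BLOCK_STATUSES
        if blocked or self.pages_since_rotation >= self.every_n_pages:
            self.pages_since_rotation = 0
            return True
        return False


policy = RotationPolicy(every_n_pages=3)
decisions = [policy.should_rotate(code) for code in (200, 200, 200, 403, 200)]
print(decisions)
```

When `should_rotate` returns True, the caller would hit the port's rotation URL before sending the next request.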
Bring Your Own Proxy Setup
Three integration paths depending on whether you run hosted Firecrawl, self-hosted, or a hybrid forwarder pattern.
Self-Hosted Docker
Cleanest path. Env vars PROXY_SERVER / PROXY_USERNAME / PROXY_PASSWORD map straight into Playwright.
Enterprise BYOP
Native BYOP flag in job config. Firecrawl hosted infra routes every request through your ProxyStyler IP pool.
Hybrid Forwarder
Thin FastAPI or Cloudflare Worker in front of Firecrawl that rewrites egress through mobile IPs on any tier.
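For the self-hosted path, the three environment variables correspond to the fields of a Playwright proxy setting (`server`, `username`, `password`). The helper below is a hedged illustration of that mapping, not Firecrawl's actual implementation; the function name is hypothetical.

```python
import os
from typing import Mapping, Optional


def playwright_proxy_from_env(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Build a Playwright-style proxy dict from PROXY_* variables, or None if unset."""
    env = os.environ if env is None else env
    server = env.get("PROXY_SERVER", "").strip()
    if not server:
        return None  # no proxy configured: the browser connects directly

    proxy = {"server": server}
    username = env.get("PROXY_USERNAME")
    if username:
        # Credentials stay out of the server URL and travel as separate fields.
        proxy["username"] = username
        proxy["password"] = env.get("PROXY_PASSWORD", "")
    return proxy


example = playwright_proxy_from_env({
    "PROXY_SERVER": "https://proxy.proxystyler.com:8000",
    "PROXY_USERNAME": "demo-user",   # placeholder credentials
    "PROXY_PASSWORD": "demo-pass",
})
print(example)
```

A dict of this shape is what Playwright's browser launch accepts as its `proxy` option, which is why the docker-compose route needs no extra glue code.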
docker-compose.yml: self-hosted Firecrawl v2 with ProxyStyler

version: "3.9"
services:
  firecrawl-api:
    image: firecrawl/firecrawl:v2.3.1
    ports:
      - "3002:3002"
    environment:
      # --- ProxyStyler 4G BYOP ---
      PROXY_SERVER: "https://proxy.proxystyler.com:8000"
      PROXY_USERNAME: "${CORONIUM_USER}"
      PROXY_PASSWORD: "${CORONIUM_PASS}"
      PROXY_ROTATE_URL: "https://proxystyler.com/api/rotate?key=${CORONIUM_KEY}"
      # --- Firecrawl core ---
      REDIS_URL: "redis://redis:6379"
      PLAYWRIGHT_MICROSERVICE_URL: "http://playwright:3003"
      USE_DB_AUTHENTICATION: "false"
      TEST_API_KEY: "fc-local-dev-key"
    depends_on: [redis, playwright]
  playwright:
    image: firecrawl/playwright-service:latest
    environment:
      # Interpolated from the shell or the .env file
      PROXY_SERVER: "${PROXY_SERVER}"
      PROXY_USERNAME: "${PROXY_USERNAME}"
      PROXY_PASSWORD: "${PROXY_PASSWORD}"
    ports:
      - "3003:3003"
  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]

Python: /scrape with ProxyStyler mobile proxy
from firecrawl import FirecrawlApp
import os
import requests

# Self-hosted Firecrawl already configured with ProxyStyler via docker-compose.
app = FirecrawlApp(
    api_key="fc-local-dev-key",
    api_url="http://localhost:3002",
)

# Rotate IP before a sensitive target; fail fast if the rotation call errors.
rotation = requests.get(
    "https://proxystyler.com/api/rotate",
    params={"key": os.environ["CORONIUM_KEY"]},
    timeout=15,
)
rotation.raise_for_status()

result = app.scrape_url(
    "https://www.instagram.com/shopify/",
    params={
        "formats": ["markdown", "html"],
        "waitFor": 3500,
        "onlyMainContent": True,
        "timeout": 30000,
        "headers": {
            "Accept-Language": "en-US,en;q=0.9",
            "X-ProxyStyler-Session": "shopify-scrape-042026",
        },
    },
)

# Field availability varies by SDK version, so read optional keys defensively.
print(f"Success: {result.get('success')}")
print(f"Credits used: {result.get('metadata', {}).get('creditsUsed')}")
print(result["markdown"][:500])

REST: /crawl a full domain with rotation
# Kick off a full-site crawl, 500 pages, rotate every 25
curl -X POST https://api.firecrawl.dev/v2/crawl \
  -H "Authorization: Bearer $FIRECRAWL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://shop.example.com",
    "limit": 500,
    "scrapeOptions": {
      "formats": ["markdown"],
      "onlyMainContent": true,
      "headers": {
        "X-Proxy-Pool": "proxystyler-4g-us-east",
        "X-Rotate-Every": "25"
      }
    },
    "includePaths": ["/products/*", "/collections/*"],
    "excludePaths": ["/account/*", "/cart"],
    "maxDepth": 5
  }'

# Poll the crawl job
curl https://api.firecrawl.dev/v2/crawl/$JOB_ID \
  -H "Authorization: Bearer $FIRECRAWL_KEY"

Node.js: /extract with LLM schema
import FirecrawlApp from "@mendable/firecrawl-js";
import { z } from "zod";

const firecrawl = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_KEY!,
  apiUrl: "http://localhost:3002", // self-hosted + ProxyStyler
});

const ProductSchema = z.object({
  title: z.string(),
  price: z.number(),
  currency: z.string(),
  inStock: z.boolean(),
  rating: z.number().optional(),
  reviewCount: z.number().optional(),
});

const result = await firecrawl.extract({
  urls: [
    "https://www.amazon.com/dp/B0D7HWDQFX",
    "https://www.amazon.com/dp/B0C7BZ3DVQ",
  ],
  prompt: "Extract the product title, price, currency, stock status, rating and review count.",
  schema: ProductSchema,
  enableWebSearch: false,
  // ProxyStyler mobile IPs route transparently via the forwarder
});

console.log(result.data);
// [{ title: "...", price: 1299, currency: "USD", inStock: true, ... }]

LangChain Integration
The FirecrawlLoader ships first-party inside langchain_community. Five lines of Python take you from URL to embedding-ready documents.
FirecrawlLoader modes
- scrape: Single URL returned as one Document
- crawl: Full site walked recursively, one Document per page
- map: URL tree returned as a single document (cheap and fast)
Downstream vector stores
- Chroma (local + Chroma Cloud)
- Pinecone (serverless + pod)
- Weaviate (OSS + Cloud)
- pgvector / Supabase
- Qdrant, Milvus, LanceDB
LangChain 0.3+: Firecrawl via ProxyStyler to pgvector RAG

from langchain_community.document_loaders.firecrawl import FirecrawlLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector
import os

# Self-hosted Firecrawl v2 already wired to ProxyStyler 4G via docker-compose
loader = FirecrawlLoader(
    api_key="fc-local-dev-key",
    api_url="http://localhost:3002",
    url="https://docs.shopify.com",
    mode="crawl",  # walk the whole docs site
    params={
        "limit": 2000,
        "scrapeOptions": {
            "formats": ["markdown"],
            "onlyMainContent": True,
            "headers": {"X-Rotate-Every": "40"},
        },
        "maxDepth": 6,
    },
)

docs = loader.load()  # Each doc already LLM-ready markdown
print(f"Loaded {len(docs)} pages through ProxyStyler mobile proxies")

splitter = RecursiveCharacterTextSplitter(chunk_size=1200, chunk_overlap=120)
chunks = splitter.split_documents(docs)

vector_store = PGVector.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-large"),
    collection_name="shopify_docs_042026",
    connection=os.environ["PG_CONN"],
)

# Query it
results = vector_store.similarity_search(
    "How do I build a checkout extension in 2026?", k=5
)
for r in results:
    print(r.metadata["source"], "-", r.page_content[:120])

LlamaIndex Integration for RAG
LlamaIndex exposes FirecrawlReader via llama-index-readers-web. The reader returns LlamaIndex Document nodes that plug into any VectorStoreIndex.
LlamaIndex 0.12+: FirecrawlReader for a research agent

from llama_index.readers.web import FirecrawlReader
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.anthropic import Anthropic

Settings.llm = Anthropic(model="claude-opus-4-7")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")

# Crawl options go to the reader constructor; load_data() takes the target URL.
reader = FirecrawlReader(
    api_key="fc-local-dev-key",
    api_url="http://localhost:3002",  # ProxyStyler-backed self-hosted
    mode="crawl",
    params={
        "limit": 300,
        "scrapeOptions": {
            "formats": ["markdown"],
            "onlyMainContent": True,
            "headers": {"X-Proxy-Pool": "proxystyler-4g-us-east"},
        },
    },
)

documents = reader.load_data(url="https://www.shopify.com/partners/blog")

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=6)

answer = query_engine.query(
    "Summarise the three most impactful Shopify partner launches in Q1 2026."
)
print(answer)
for node in answer.source_nodes:
    print("Source:", node.metadata.get("source"))

Agents
Combine FirecrawlReader with LlamaIndex QueryEngineTool for autonomous research agents that scrape and cite sources.
Cost Control
Use mode="map" first to discover URLs, then selectively scrape. This saves up to 70% of credits on large sites.
Streaming
FirecrawlReader works with LlamaIndex Workflows for streaming token output while ingestion continues in background.
Scrape vs Crawl vs Extract
Pick the cheapest endpoint that answers your question. Picking wrong can 9x your bill.
| Endpoint | Best for | Output | Credits | Latency | Pair with mobile? |
|---|---|---|---|---|---|
| /scrape | One known URL | Markdown + HTML | 1 | 1.5-4s | Yes, for hardened targets |
| /crawl | Whole site ingest | Markdown per page | 1 per page | Async job | Essential above 500 pages |
| /map | Site URL discovery | URL list | 1 total | 2-8s | Usually not needed |
| /search | Live web search | Ranked markdown | 1 per result | 1-3s | SERP targets benefit |
| /extract | Structured LLM JSON | Pydantic/Zod JSON | 9 per page | 4-12s | Avoid retries = save 9 credits |
Pro pattern: map → scrape, skip /extract
Call /map first (1 credit) to discover URLs, filter client-side with regex, then call /scrape only on matching pages. Parse markdown locally with marked or markdownify. You save 8 credits per page compared to /extract; at the per-credit rates in the pricing table, that is roughly $43 per 1,000 pages on Hobby and about $6.64 on Standard.
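The filtering step and the credit arithmetic of this pattern can be sketched without touching the API. The URL list below stands in for a /map response, and the helper names and regex patterns are illustrative assumptions.

```python
import re
from typing import Iterable, List

MAP_CREDITS = 1      # one /map call, per the endpoint table
SCRAPE_CREDITS = 1   # per page
EXTRACT_CREDITS = 9  # per page


def filter_urls(urls: Iterable[str], include_patterns: Iterable[str]) -> List[str]:
    """Keep only URLs matching at least one regex, preserving input order."""
    compiled = [re.compile(pattern) for pattern in include_patterns]
    return [url for url in urls if any(rx.search(url) for rx in compiled)]


def credits_saved(page_count: int) -> int:
    """Credits saved by map + scrape versus calling /extract on the same pages."""
    map_and_scrape = MAP_CREDITS + page_count * SCRAPE_CREDITS
    return page_count * EXTRACT_CREDITS - map_and_scrape


# Stand-in for the URL list a /map call would return.
mapped = [
    "https://shop.example.com/products/red-shoe",
    "https://shop.example.com/cart",
    "https://shop.example.com/collections/summer",
    "https://shop.example.com/account/login",
]
targets = filter_urls(mapped, [r"/products/", r"/collections/"])
print(targets)
print(credits_saved(len(targets)))
```

Only the URLs in `targets` would then be sent to /scrape; the saving grows linearly with page count, minus the single credit spent on /map.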
Firecrawl vs Jina Reader vs ScrapeGraphAI
The three AI-native scraping tools devs actually ship with in 2026: a direct feature-for-feature comparison.
| Feature | Firecrawl v2 | Jina Reader | ScrapeGraphAI |
|---|---|---|---|
| Pricing | Free 500 / $16-$1.5K+ | Free + pay-as-you-go | Open source (self-host) |
| Single URL to markdown | Yes (/scrape) | Yes (r.jina.ai prefix) | Yes (SmartScraperGraph) |
| Full-site crawl | Yes (/crawl native) | No | Partial (DeepScraperGraph) |
| Site URL map | Yes (/map) | No | No |
| Live web search | Yes (/search) | Yes (s.jina.ai) | No |
| LLM structured JSON | Yes (/extract) | Limited | Yes (via LLM backend) |
| LangChain loader | First-party | Community | Community |
| LlamaIndex reader | First-party | Community | Community |
| BYOP mobile proxies | Enterprise + self-host | No | Yes (self-host) |
| Cloudflare bypass | Enterprise stealth | Limited | Manual |
| SDK maturity | Python + Node.js official | Python + curl | Python only |
| Funding / traction | $16.2M, 350K+ devs | Jina AI ~$40M Series A | Open source, 15K GitHub stars |
Pick Firecrawl when
- Scraping 1K+ pages/day in production
- Need crawl, map, search and extract in one SDK
- LangChain/LlamaIndex RAG pipeline
- Enterprise support + SLA required
Pick Jina Reader when
- One-off URLs inside an LLM prompt
- Free tier without sign-up
- Prototyping an agent
- No crawl or extract needed
Pick ScrapeGraphAI when
- Self-hosting is a hard requirement
- Zero per-request cost ceiling
- Tinkering with graph-based pipelines
- No managed SLA required
Production Use Cases in 2026
Patterns ProxyStyler customers are running against Firecrawl v2 today. Every pattern below is backed by a live production deployment.
E-commerce price monitoring
/crawl Amazon, Shopify and independent stores daily. ProxyStyler rotates every 25 pages so product pages never trigger bot pages. Feed deltas into Snowflake for competitive intelligence.
AI research agents
LangChain ReAct agents call /search and /scrape tools. Mobile IPs prevent agents from hitting search CAPTCHAs that would collapse the reasoning loop.
Social media intelligence
Instagram, TikTok, Threads and X public profiles. Each campaign gets a dedicated ProxyStyler port + session, so account-level fingerprints stay consistent.
Knowledge base ingestion
Crawl full SaaS documentation (Stripe, Shopify, AWS) into pgvector for in-product LLM help. Mobile IPs avoid 429s on aggressive docs CDNs.
SEO competitive scraping
/map every competitor domain weekly, /scrape net-new URLs, diff against last week. Identifies topic gaps before they rank.
Lead enrichment pipelines
Clay, Smartlead and custom GTM stacks call Firecrawl /extract with schema to pull firmographics from company sites. Mobile IPs avoid LinkedIn rate limits.
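The weekly diff in the SEO pattern above comes down to a set difference over normalised URLs. A minimal sketch, assuming each week's /map output has been saved as a list of strings; the normalisation rules here (lower-case host, no trailing slash, no fragment) are one reasonable choice, not a Firecrawl behaviour.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit


def normalise(url: str) -> str:
    """Canonicalise a URL so trivial variants compare equal."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment; keep the query string because it can change page content.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def net_new_urls(last_week: Iterable[str], this_week: Iterable[str]) -> List[str]:
    """Return URLs present this week but not last week, sorted for stable output."""
    seen = {normalise(url) for url in last_week}
    return sorted({normalise(url) for url in this_week} - seen)


last_week = ["https://Example.com/blog/", "https://example.com/pricing"]
this_week = [
    "https://example.com/blog",
    "https://example.com/pricing#plans",
    "https://example.com/blog/new-post/",
]
print(net_new_urls(last_week, this_week))
```

Only the URLs returned by `net_new_urls` need a /scrape call, which keeps the weekly run close to one credit per genuinely new page.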
Day-1 Implementation Checklist
A concrete 60-minute plan to go from zero to production Firecrawl + ProxyStyler mobile proxies.
Provision ProxyStyler 4G port
Buy a dedicated mobile port from the ProxyStyler dashboard. Copy the HTTPS endpoint, username, password, and rotation key.
Clone Firecrawl v2 repo
Run git clone https://github.com/mendableai/firecrawl && cd firecrawl/apps/api. The monorepo includes the API, Playwright worker and docker-compose.
Wire env variables
Copy .env.example → .env. Set PROXY_SERVER, PROXY_USERNAME, PROXY_PASSWORD from ProxyStyler. Set TEST_API_KEY to any string for local dev.
Bring up the stack
docker-compose up -d spins up Redis, Playwright, and the Firecrawl API on localhost:3002. Tail logs to verify ProxyStyler egress.
Validate proxy IP
Run curl -x $PROXY_SERVER -U $PROXY_USERNAME:$PROXY_PASSWORD https://api.ipify.org. It should return a mobile carrier IP, not your home address.
First /scrape call
Hit the local API with a curl POST to /v2/scrape. Inspect the markdown response and the IP in the response metadata.
Wire LangChain
pip install langchain langchain-community firecrawl-py. Instantiate FirecrawlLoader with api_url=http://localhost:3002 and mode="scrape".
Add rotation hook
Wrap your loader in a retry decorator that calls the ProxyStyler rotation URL on 403/429/5xx before retrying, with exponential backoff and a maximum of 3 tries.
Vector store + query
Pipe FirecrawlLoader documents through RecursiveCharacterTextSplitter → OpenAIEmbeddings → PGVector. Verify a similarity_search returns expected chunks.
Monitor + scale
Add Prometheus scrape of Firecrawl /metrics. Graph creditsUsed, successRate, and rotationsPerHour in Grafana to catch regressions early.
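The rotation hook from the checklist can be sketched as a decorator. How a failed scrape surfaces its HTTP status depends on the SDK version, so `ScrapeError` below is a hypothetical exception you would raise from your own wrapper; the rotation callable and sleep function are injected so the logic can be tested offline.

```python
import functools
import time
import urllib.request
from typing import Callable


class ScrapeError(Exception):
    """Hypothetical error carrying the HTTP status of a failed scrape."""

    def __init__(self, status_code: int):
        super().__init__(f"scrape failed with HTTP {status_code}")
        self.status_code = status_code


def is_retryable(status_code: int) -> bool:
    return status_code in (403, 429) or 500 <= status_code < 600


def rotate_ip(rotate_url: str, timeout: float = 10.0) -> None:
    """Call the port's HTTPS rotation endpoint (URL supplied by the caller)."""
    with urllib.request.urlopen(rotate_url, timeout=timeout) as response:
        response.read()


def retry_with_rotation(
    rotate: Callable[[], None],
    max_tries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Retry on 403/429/5xx, rotating the IP and backing off exponentially."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except ScrapeError as err:
                    if attempt == max_tries or not is_retryable(err.status_code):
                        raise
                    rotate()                                 # fresh IP before retrying
                    sleep(base_delay * 2 ** (attempt - 1))   # 1s, 2s, 4s, ...

        return wrapper

    return decorator
```

In use, you would pass something like `lambda: rotate_ip(os.environ["PROXY_ROTATE_URL"])` as `rotate` and decorate the function that calls your loader. Non-retryable statuses such as 404 are re-raised immediately so they do not trigger pointless rotations.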
Performance Tuning Cheatsheet
Eight settings that separate a production-grade Firecrawl + ProxyStyler deployment from a fragile one.
Rotation cadence
waitFor timing
Concurrency
onlyMainContent
Session stickiness
Retry logic
Credit monitoring
Output format
Ship a 97%-success AI scraper in 2026
Get ProxyStyler 4G mobile proxies dedicated to one user, rotate on demand, and pair them with Firecrawl v2, self-hosted or Enterprise BYOP. The only stack that reaches Instagram, TikTok, Ticketmaster and Cloudflare Enterprise targets at production scale.
- Q01: Does Firecrawl natively support custom proxies (BYOP)?
- Firecrawl ships with a managed proxy pool on every tier, but Bring-Your-Own-Proxy (BYOP) is officially exposed on the Enterprise plan and through self-hosted deployments. On the hosted Standard and Growth tiers you can still route traffic through ProxyStyler by running a thin proxy-forwarder (FastAPI or Cloudflare Worker) that terminates Firecrawl's egress, rewrites the request, and emits it through the mobile IP. Self-hosting Firecrawl v2 via Docker is the cleanest path: the PROXY_SERVER, PROXY_USERNAME and PROXY_PASSWORD environment variables are read directly by the Playwright engine.
- Q02: Why use mobile proxies with Firecrawl instead of the built-in pool?
- Firecrawl's built-in datacenter and residential pools work well for 80% of the public web, but high-trust targets (Instagram, TikTok, LinkedIn, Amazon, Ticketmaster, Cloudflare Enterprise WAF) will throttle or block datacenter ASNs. ProxyStyler 4G mobile IPs sit behind carrier CGNAT, so the target sees an IP shared with thousands of real subscribers and cannot safely block it. Our internal April 2026 benchmark across 50,000 pages shows mobile IPs reach 97.4% success on protected targets versus 41% for Firecrawl's default pool and 68% for generic residential.
- Q03: How much does Firecrawl cost per 1,000 scraped pages in 2026?
- Firecrawl v2 pricing (April 2026): Free 500 credits one-time, Hobby $16/month for 3,000 credits ($5.33 per 1K basic scrapes), Standard $83/month for 100,000 credits ($0.83 per 1K), Growth $333/month for 500,000 credits ($0.67 per 1K), and Enterprise custom starting around $1,500/month. One basic /scrape call = 1 credit, /search = 1 credit per result, browser interactions = 5 credits each, and advanced JSON extraction with schema = 9 credits per page. Adding ProxyStyler mobile proxies costs an additional $80 per port but unlocks targets the base plan cannot reach.
- Q04: Can I use Firecrawl with LangChain and LlamaIndex RAG pipelines?
- Yes. LangChain ships FirecrawlLoader inside langchain_community.document_loaders with modes for scrape, crawl and map. LlamaIndex exposes FirecrawlReader in llama-index-readers-web. Both accept an api_key and any Firecrawl parameters (including proxy overrides on self-hosted instances), so you can feed LLM-ready markdown directly into Chroma, Pinecone, Weaviate or pgvector with a handful of lines. This is the fastest path from public URL to embedding-ready text in 2026.
- Q05: What is the difference between /scrape, /crawl, /map, /search and /extract?
- /scrape returns one URL as clean markdown plus optional structured JSON. /crawl walks an entire site, respects robots.txt and returns every page as markdown. /map returns the full URL tree of a site in seconds without rendering content, useful for building sitemaps. /search performs a live web search and returns results as markdown. /extract takes one or more URLs, a natural-language prompt and a Pydantic/JSON schema, then uses an LLM to pull structured data. Extract is the most expensive (9 credits) but replaces hundreds of lines of BeautifulSoup.
- Q06: Does Firecrawl handle JavaScript-heavy SPAs and Cloudflare?
- Firecrawl v2 runs a headless Chromium (Playwright) on every request, so React, Vue, Next.js and Svelte apps render identically to a real browser. Cloudflare Turnstile and Bot Fight Mode are bypassed on Enterprise via the built-in stealth mode, but Cloudflare WAF Enterprise policies with mTLS fingerprinting still require true mobile IPs. Combining Firecrawl's stealth Chromium with ProxyStyler 4G exit nodes is currently the highest-success configuration we have measured on hardened targets.
- Q07: Is it legal to scrape with Firecrawl and mobile proxies?
- In the United States, scraping publicly available data has generally been held not to violate the CFAA, following hiQ Labs v. LinkedIn (9th Circuit, 2022) and the narrowed CFAA reading from Van Buren v. United States (Supreme Court, 2021). You must still respect robots.txt where contractually bound, avoid authentication walls you have not agreed to bypass, and honour GDPR/CCPA when personal data is involved. Firecrawl defaults to respecting robots.txt; disable that flag only when you have a clear legal basis.
- Q08: How do I rotate ProxyStyler mobile IPs inside a Firecrawl crawl?
- ProxyStyler exposes an HTTPS rotation endpoint per port (https://proxystyler.com/api/rotate?key=YOUR_KEY). On self-hosted Firecrawl, add a per-request middleware that calls the rotation URL every N pages or when a 403/429 is detected. On Enterprise hosted, use the jobOptions.headers.X-ProxyStyler-Rotate flag we expose through our forwarder. Typical cadence is rotate every 25-50 pages for e-commerce and every 5-10 pages for social networks.
- Q09: Firecrawl vs Jina Reader vs ScrapeGraphAI: which should I pick?
- Firecrawl wins on scale, crawl-mode depth, enterprise support and the largest ecosystem (350K+ developers, LangChain/LlamaIndex first-party). Jina Reader is free and excellent for single-URL reading inside an LLM prompt but lacks crawl and site-wide map. ScrapeGraphAI is a Python framework you run yourself; it is the cheapest per request, but you build the infrastructure. For AI-agent production workloads in 2026 we recommend Firecrawl + ProxyStyler; for one-off LLM prompts, Jina; for tinkerers, ScrapeGraphAI.
- Q10: What happens to my credits if a Firecrawl request fails?
- Firecrawl only charges credits for successful 2xx responses. 4xx and 5xx from the target site, timeouts over 30s, and blocked-by-bot-protection responses do not consume credits. This is why adding mobile proxies is cost-positive at scale: every 403 you prevent is a credit you keep. Our benchmarks show a 2.3x effective credit efficiency on hardened targets when routing through ProxyStyler 4G.