AI Open Source · RAG and Retrieval

firecrawl/firecrawl

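Firecrawl packages the pipeline of fetching web pages, cleaning them, and converting them to markdown for AI agents, producing clean text that can be fed directly to an LLM. It is often used in place of a hand-written scraper when doing web search, building a web index, or adding web context to an agent.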

🔥 Search, scrape, and clean the web for AI agents.

Repo: firecrawl/firecrawl
Stars: ★ 121k
Language: TypeScript
License: AGPL-3.0
Last push: today
Created: 2024-04-15
Topics: ai, ai-agents, ai-crawler, ai-scraping, ai-search, crawler
Homepage: https://firecrawl.dev

README

🔥 Firecrawl

Search, scrape, and clean the web for AI agents. The web context API to find sources, extract content, and turn it into clean Markdown or structured data your agents can ship with. Open source and available as a hosted service.


Why Firecrawl?

Industry-leading reliability: Covers 96% of the web, including JS-heavy pages — no proxy headaches, just clean data (see benchmarks)
Blazingly fast: P95 latency of 3.4s across millions of pages, built for real-time agents and dynamic apps
LLM-ready output: Clean markdown, structured JSON, screenshots, and more — spend fewer tokens, build better AI apps
We handle the hard stuff: Rotating proxies, orchestration, rate limits, JS-blocked content, and more — zero configuration
Agent ready: Connect Firecrawl to any AI agent or MCP client with a single command
Media parsing: Parse and extract content from web-hosted PDFs, DOCX, and more
Actions: Click, scroll, write, wait, and press before extracting content
Open source: Developed transparently and collaboratively — join our community

Feature Overview

Core Endpoints

Feature	Description
Search	Search the web and get full page content from results
Scrape	Convert any URL to markdown, HTML, screenshots, or structured JSON
Interact	Scrape a page, then interact with it using AI prompts or code

More

Feature	Description
Agent	Automated data gathering, just describe what you need
Crawl	Scrape all URLs of a website with a single request
Map	Discover all URLs on a website instantly
Batch Scrape	Scrape thousands of URLs asynchronously

Quick Start

Search

Search the web and get full content from results.

Python

from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

search_result = app.search("firecrawl", limit=5)

Node.js

import Firecrawl from '@mendable/firecrawl-js';

const app = new Firecrawl({apiKey: "fc-YOUR_API_KEY"});

// search() is asynchronous, so await the result before using it
const searchResult = await app.search("firecrawl", { limit: 5 });

cURL

curl -X POST 'https://api.firecrawl.dev/v2/search' \
-H 'Authorization: Bearer fc-YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "query": "firecrawl",
  "limit": 5
}'
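
The same request can also be assembled with Python's standard library, which may help when the SDK is not installed. The sketch below mirrors the cURL call above: the endpoint, headers, and JSON body are taken from it. The helper name `build_search_request` is hypothetical and not part of any Firecrawl SDK, and the request is only built here, not sent.

```python
import json
import urllib.request

SEARCH_URL = "https://api.firecrawl.dev/v2/search"  # endpoint from the cURL example


def build_search_request(api_key: str, query: str, limit: int = 5) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the cURL example.

    Hypothetical helper for illustration; not part of any Firecrawl SDK.
    """
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_search_request("fc-YOUR_API_KEY", "firecrawl", limit=5)
# Sending it would be urllib.request.urlopen(req), which needs a valid key and network access.
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.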

CLI

firecrawl search "firecrawl" --limit 5


Output:

[
  {
    "url": "https://firecrawl.dev",
    "title": "Firecrawl",
    "markdown": "Turn websites into..."
  },
  {
    "url": "https://docs.firecrawl.dev",
    "title": "Firecrawl Docs",
    "markdown": "# Getting Started..."
  }
]
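
Results in this shape can be folded into a single source-attributed block of text for an LLM prompt. The following is a minimal sketch, assuming the response is a plain JSON list of objects with `url`, `title`, and `markdown` keys as in the sample output above; real responses may carry more fields or a different wrapper. The function `results_to_context` and its `max_chars_per_page` parameter are hypothetical names introduced for illustration.

```python
import json

# Sample shaped like the output above (assumption: a plain JSON list of objects).
sample_output = """
[
  {"url": "https://firecrawl.dev", "title": "Firecrawl", "markdown": "Turn websites into..."},
  {"url": "https://docs.firecrawl.dev", "title": "Firecrawl Docs", "markdown": "# Getting Started..."}
]
"""


def results_to_context(results, max_chars_per_page=2000):
    """Join search results into one text block, keeping the source URL of each page."""
    sections = []
    for item in results:
        title = item.get("title", "Untitled")
        url = item.get("url", "")
        # Truncate each page so one long result cannot crowd out the others.
        markdown = item.get("markdown", "")[:max_chars_per_page]
        sections.append(f"## {title}\nSource: {url}\n\n{markdown}")
    return "\n\n---\n\n".join(sections)


results = json.loads(sample_output)
context = results_to_context(results)
print(context)
```

Keeping the URL next to each excerpt lets the model cite its sources, and the per-page character cap is a crude stand-in for a token budget.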

Scrape

Get LLM-ready data from any website — markdown, JSON, screenshots, and more.

Python

from firecrawl import Firecrawl

app = Firecrawl(api_key=
