AI Open Source · RAG & Retrieval
firecrawl/firecrawl
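Firecrawl packages the scrape, clean, and convert-to-markdown pipeline for AI agents, producing clean text that can be fed straight to an LLM. It is commonly used instead of hand-written scrapers for web search, building web indexes, and giving agents web context.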
🔥 Search, scrape, and clean the web for AI agents.
- Stars: ★ 121k
- Language: TypeScript
- License: AGPL-3.0
- Last push: today
- Created: 2024-04-15
- Topics: ai, ai-agents, ai-crawler, ai-scraping, ai-search, crawler
- Homepage: https://firecrawl.dev
README
🔥 Firecrawl
Search, scrape, and clean the web for AI agents. The web context API to find sources, extract content, and turn it into clean Markdown or structured data your agents can ship with. Open source and available as a hosted service.
Why Firecrawl?
- Industry-leading reliability: Covers 96% of the web, including JS-heavy pages — no proxy headaches, just clean data (see benchmarks)
- Blazingly fast: P95 latency of 3.4s across millions of pages, built for real-time agents and dynamic apps
- LLM-ready output: Clean markdown, structured JSON, screenshots, and more — spend fewer tokens, build better AI apps
- We handle the hard stuff: Rotating proxies, orchestration, rate limits, JS-blocked content, and more — zero configuration
- Agent ready: Connect Firecrawl to any AI agent or MCP client with a single command
- Media parsing: Parse and extract content from web-hosted PDFs, DOCX, and more
- Actions: Click, scroll, write, wait, and press before extracting content
- Open source: Developed transparently and collaboratively — join our community
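The Actions bullet describes steps that run before extraction. The sketch below is minimal and rests on an assumption: that the hosted scrape endpoint accepts an `actions` array of objects keyed by `type`, next to `url` and `formats`. The per-action fields (`milliseconds`, `selector`, `direction`, `text`, `key`) reflect our reading of the hosted docs, so verify them there. `build_scrape_payload` is a hypothetical helper. It only builds and validates a JSON body, restricted to the five action types named above, and never calls the API.

```python
import json

# Only the five interaction types named in the "Actions" bullet.
# Field names inside each action are assumptions; check the API reference.
ALLOWED_ACTIONS = {"click", "scroll", "write", "wait", "press"}


def build_scrape_payload(url, actions):
    """Return a JSON-serializable scrape body (hypothetical helper).

    `actions` is a list of dicts, each with a "type" key; steps are kept
    in order because they are meant to run sequentially before extraction.
    """
    for step in actions:
        if step.get("type") not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action type: {step.get('type')!r}")
    return {"url": url, "formats": ["markdown"], "actions": actions}


payload = build_scrape_payload(
    "https://example.com",
    [
        {"type": "wait", "milliseconds": 2000},     # let JS-rendered content load
        {"type": "click", "selector": "#load-more"},  # hypothetical selector
        {"type": "scroll", "direction": "down"},
        {"type": "write", "text": "firecrawl"},
        {"type": "press", "key": "ENTER"},
    ],
)
print(json.dumps(payload, indent=2))
```

The validation step exists so that a typo in an action type fails locally rather than in a remote job.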
Feature Overview
Core Endpoints
| Feature | Description |
|---|---|
| Search | Search the web and get full page content from results |
| Scrape | Convert any URL to markdown, HTML, screenshots, or structured JSON |
| Interact | Scrape a page, then interact with it using AI prompts or code |
More
| Feature | Description |
|---|---|
| Agent | Automated data gathering, just describe what you need |
| Crawl | Scrape all URLs of a website with a single request |
| Map | Discover all URLs on a website instantly |
| Batch Scrape | Scrape thousands of URLs asynchronously |
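Map and Batch Scrape pair naturally: first discover a site's URLs, then scrape them in bulk. The sketch below covers only the client-side glue between those two steps. It assumes you want to cap how many URLs go into each batch job. `chunk_urls` and the batch size are our own hypothetical choices, not Firecrawl parameters. Submitting each batch is left to the SDK or API.

```python
from typing import Iterator, List


def chunk_urls(urls: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield de-duplicated URLs, in first-seen order, in batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:  # drop duplicates so no URL is scraped twice
            seen.add(url)
            unique.append(url)
    for start in range(0, len(unique), batch_size):
        yield unique[start:start + batch_size]


mapped = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",
    "https://example.com/c",
]
batches = list(chunk_urls(mapped, batch_size=2))
# → [['https://example.com/a', 'https://example.com/b'], ['https://example.com/c']]
```

Using a generator keeps memory flat even when a Map result holds many thousands of URLs.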
Quick Start
Sign up at firecrawl.dev to get your API key. Try the playground to test it out.
Search
Search the web and get full content from results.
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
search_result = app.search("firecrawl", limit=5)
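You may want to see exactly what goes over the wire without installing the SDK. The sketch below builds the same search request with Python's standard library. The endpoint, headers, and JSON body are copied from the cURL example in this section. `build_search_request` is a hypothetical helper. It only constructs the request; sending it requires network access and a real key.

```python
import json
import urllib.request

API_URL = "https://api.firecrawl.dev/v2/search"  # endpoint from the cURL example


def build_search_request(api_key: str, query: str, limit: int = 5) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the cURL example."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_search_request("fc-YOUR_API_KEY", "firecrawl", limit=5)
# To actually send it (needs network access and a real key):
# with urllib.request.urlopen(req) as resp:
#     response = json.load(resp)
```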
<details>
<summary><b>Node.js / cURL / CLI</b></summary>
Node.js
import Firecrawl from '@mendable/firecrawl-js';
const app = new Firecrawl({apiKey: "fc-YOUR_API_KEY"});
const searchResult = await app.search("firecrawl", { limit: 5 });
cURL
curl -X POST 'https://api.firecrawl.dev/v2/search' \
-H 'Authorization: Bearer fc-YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"query": "firecrawl",
"limit": 5
}'
CLI
firecrawl search "firecrawl" --limit 5
</details>
Output:
[
{
"url": "https://firecrawl.dev",
"title": "Firecrawl",
"markdown": "Turn websites into..."
},
{
"url": "https://docs.firecrawl.dev",
"title": "Firecrawl Docs",
"markdown": "# Getting Started..."
}
]
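The results above arrive as a list of `url`/`title`/`markdown` records. A common next step is to pack them into a single prompt. The sketch below does that under a character budget and assumes the list shape shown in the output above. `results_to_context` and its 4000-character default are hypothetical. Characters are only a rough proxy for tokens.

```python
from typing import Dict, List


def results_to_context(results: List[Dict[str, str]], max_chars: int = 4000) -> str:
    """Join search results into one prompt-ready string of at most max_chars.

    Each result is expected to carry "url", "title", and "markdown" keys.
    Whole sections are kept or dropped; nothing is cut mid-section.
    """
    parts = []
    used = 0
    for item in results:
        section = f"## {item['title']}\nSource: {item['url']}\n\n{item['markdown']}\n"
        if used + len(section) > max_chars:
            break  # stop before exceeding the budget
        parts.append(section)
        used += len(section)
    return "".join(parts)  # sections already end in "\n", so no extra separators


results = [
    {"url": "https://firecrawl.dev", "title": "Firecrawl", "markdown": "Turn websites into..."},
    {"url": "https://docs.firecrawl.dev", "title": "Firecrawl Docs", "markdown": "# Getting Started..."},
]
context = results_to_context(results)
```

Each section keeps its source URL, so the model's answer can cite where each passage came from.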
Scrape
Get LLM-ready data from any website — markdown, JSON, screenshots, and more.
from firecrawl import Firecrawl
app = Firecrawl(api_key=