AI Open Source · RAG and Retrieval

firecrawl/firecrawl

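Firecrawl packages the pipeline of web scraping, cleaning, and conversion to markdown for use by AI agents, producing clean text that can be fed directly to an LLM. It is often used in place of a hand-written scraper when doing web-connected search, building a web page index, or adding web context to an agent.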

🔥 Search, scrape, and clean the web for AI agents.

Stars: 121k
Language: TypeScript
License: AGPL-3.0
Last push: today
Created: 2024-04-15
Topics: ai, ai-agents, ai-crawler, ai-scraping, ai-search, crawler

README


🔥 Firecrawl

Search, scrape, and clean the web for AI agents. The web context API to find sources, extract content, and turn it into clean Markdown or structured data your agents can ship with. Open source and available as a hosted service.


Why Firecrawl?

  • Industry-leading reliability: Covers 96% of the web, including JS-heavy pages — no proxy headaches, just clean data (see benchmarks)
  • Blazingly fast: P95 latency of 3.4s across millions of pages, built for real-time agents and dynamic apps
  • LLM-ready output: Clean markdown, structured JSON, screenshots, and more — spend fewer tokens, build better AI apps
  • We handle the hard stuff: Rotating proxies, orchestration, rate limits, JS-blocked content, and more — zero configuration
  • Agent ready: Connect Firecrawl to any AI agent or MCP client with a single command
  • Media parsing: Parse and extract content from web-hosted PDFs, DOCX, and more
  • Actions: Click, scroll, write, wait, and press before extracting content
  • Open source: Developed transparently and collaboratively — join our community

Feature Overview

Core Endpoints

| Feature | Description |
|---------|-------------|
| Search | Search the web and get full page content from results |
| Scrape | Convert any URL to markdown, HTML, screenshots, or structured JSON |
| Interact | Scrape a page, then interact with it using AI prompts or code |

More

| Feature | Description |
|---------|-------------|
| Agent | Automated data gathering, just describe what you need |
| Crawl | Scrape all URLs of a website with a single request |
| Map | Discover all URLs on a website instantly |
| Batch Scrape | Scrape thousands of URLs asynchronously |

Quick Start

Sign up at firecrawl.dev to get your API key. Try the playground to test it out.

Search

Search the web and get full content from results.

from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

search_result = app.search("firecrawl", limit=5)
<details> <summary><b>Node.js / cURL / CLI</b></summary>

Node.js

import Firecrawl from '@mendable/firecrawl-js';

const app = new Firecrawl({apiKey: "fc-YOUR_API_KEY"});

app.search("firecrawl", { limit: 5 })

cURL

curl -X POST 'https://api.firecrawl.dev/v2/search' \
-H 'Authorization: Bearer fc-YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "query": "firecrawl",
  "limit": 5
}'
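For readers who want the same request without the SDK, the cURL call above can be sketched in Python using only the standard library. This is a minimal sketch, not the official client: the helper name `build_search_request` is hypothetical, and the endpoint, headers, and body fields are copied from the cURL example rather than from a full API reference.

```python
import json
import urllib.request

# Endpoint taken from the cURL example above.
API_URL = "https://api.firecrawl.dev/v2/search"


def build_search_request(api_key, query, limit=5):
    """Build (but do not send) the POST request shown in the cURL example.

    Hypothetical helper for illustration; not part of the Firecrawl SDK.
    """
    payload = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_search_request("fc-YOUR_API_KEY", "firecrawl")

# Sending the request needs a valid key and network access:
# with urllib.request.urlopen(req) as resp:
#     data = json.load(resp)
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before spending API credits.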

CLI

firecrawl search "firecrawl" --limit 5
</details>

Output:

[
  {
    "url": "https://firecrawl.dev",
    "title": "Firecrawl",
    "markdown": "Turn websites into..."
  },
  {
    "url": "https://docs.firecrawl.dev",
    "title": "Firecrawl Docs",
    "markdown": "# Getting Started..."
  }
]
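Assuming results arrive in the shape shown above (a list of objects with `url`, `title`, and `markdown` keys), one way to turn them into a single prompt-ready string might look like the sketch below. The helper `build_context` and its `max_chars` limit are hypothetical and not part of Firecrawl; if the SDK returns objects rather than plain dicts, they would need converting first.

```python
import json

# Sample data copied from the output above; a real response may carry more fields.
SAMPLE_OUTPUT = """
[
  {"url": "https://firecrawl.dev", "title": "Firecrawl", "markdown": "Turn websites into..."},
  {"url": "https://docs.firecrawl.dev", "title": "Firecrawl Docs", "markdown": "# Getting Started..."}
]
"""


def build_context(results, max_chars=2000):
    """Join search results into one string for an LLM prompt (hypothetical helper)."""
    sections = []
    for item in results:
        # .get() yields an empty string for a missing key instead of raising KeyError.
        title = item.get("title", "")
        url = item.get("url", "")
        # Truncate each page so one long result cannot dominate the prompt.
        body = item.get("markdown", "")[:max_chars]
        sections.append(f"Source: {title} ({url})\n{body}")
    return "\n\n".join(sections)


results = json.loads(SAMPLE_OUTPUT)
context = build_context(results)
print(context)
```

Keeping the source URL next to each excerpt lets the model cite where a statement came from.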

Scrape

Get LLM-ready data from any website — markdown, JSON, screenshots, and more.

from firecrawl import Firecrawl

app = Firecrawl(api_key=
