Claude Code Skills · 论文 · 文献综述

deep-research

deep-research 是一个供 Claude Code 使用的文献综述 skill。它将综述拆解为六个有序阶段,从前沿检索、系统调查、深度阅读、代码库考察到最终合成与差距分析,每个阶段产生可检视的中间产物。适用于需要系统了解某个研究方向全貌的研究者,从零开始建立文献库与综述报告。

Conduct systematic academic literature reviews in 6 phases, producing structured notes, a curated paper database, and a synthesized final report. Output is organized by phase for clarity.

Repo
Chanw-research/claude-code-paper-writing
Slug
deep-research

SKILL.md

Deep Research Skill

Trigger

Activate this skill when the user wants to:

  • "Research a topic", "literature review", "find papers about", "survey papers on"
  • "Deep dive into [topic]", "what's the state of the art in [topic]"
  • Uses /research <topic> slash command

Overview

This skill conducts systematic academic literature reviews in 6 phases, producing structured notes, a curated paper database, and a synthesized final report. Output is organized by phase for clarity.

Installation: ~/.claude/skills/deep-research/ — scripts, references, and this skill definition. Output: .//Users/lingzhi/Code/deep-research-output/{slug}/ relative to the current working directory.

CRITICAL: Strict Sequential Phase Execution

You MUST execute all 6 phases in strict order: 1 → 2 → 3 → 4 → 5 → 6. NEVER skip any phase.

This is the single most important rule of this skill. Violations include:

  • ❌ Jumping from Phase 2 to Phase 5/6 (skipping Deep Dive and Code)
  • ❌ Writing synthesis or report before completing Phase 3 deep reading
  • ❌ Producing a final report based only on abstracts/titles from search results
  • ❌ Combining or merging phases (e.g., doing "Phase 3-5 together")

Phase Gate Protocol

Before starting Phase N+1, you MUST verify that Phase N's required output files exist on disk. If they don't exist, you have NOT completed that phase.

PhaseGate: Required Output Files
1 → 2phase1_frontier/frontier.md exists AND contains ≥10 papers
2 → 3phase2_survey/survey.md exists AND paper_db.jsonl has 35-80 papers
3 → 4phase3_deep_dive/selection.md AND phase3_deep_dive/deep_dive.md exist AND deep_dive.md contains detailed notes for ≥8 papers
4 → 5phase4_code/code_repos.md exists AND contains ≥3 repositories
5 → 6phase5_synthesis/synthesis.md AND phase5_synthesis/gaps.md exist

After completing each phase, print a phase completion checkpoint:

✅ Phase N complete. Output: [list files written]. Proceeding to Phase N+1.

Why Every Phase Matters

  • Phase 3 (Deep Dive) is where you actually READ papers — without it, your synthesis is superficial and based only on abstracts
  • Phase 4 (Code & Tools) grounds the research in practical implementations — without it, you miss the open-source ecosystem
  • Phase 5 (Synthesis) requires deep knowledge from Phase 3 — you cannot synthesize papers you haven't read
  • Phase 6 (Report) assembles content from ALL prior phases — it should cite specific findings from Phase 3 notes

Paper Quality Policy

Peer-reviewed conference papers take priority over arXiv preprints. Many arXiv papers have not undergone peer review and may contain unverified claims.

Source Priority (highest to lowest)

  1. Top AI conferences: NeurIPS, ICLR, ICML, ACL, EMNLP, NAACL, AAAI, IJCAI, CVPR, KDD, CoRL
  2. Peer-reviewed journals: JMLR, TACL, Nature, Science, etc.
  3. Workshop papers: NeurIPS/ICML workshops (lower bar but still reviewed)
  4. arXiv preprints with high citations: Likely high-quality but unverified
  5. Recent arXiv preprints: Use cautiously, note "preprint" status explicitly

When to Use arXiv Papers

  • As supplementary evidence alongside peer-reviewed work
  • For very recent results (< 3 months old) not yet at conferences
  • When a peer-reviewed version doesn't exist yet — note (preprint) in citations
  • For survey/review papers (these are useful even without peer review)

Search Tools (by priority)

1. paper_finder (primary — conference papers only)

Location: /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py

Searches ai-paper-finder.info (HuggingFace Space) for published conference papers. Supports filtering by conference + year. Outputs JSONL with BibTeX.

python /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py --mode scrape --config <config.yaml>
python /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py --mode download --jsonl <results.jsonl>
python /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py --list-venues

Config example:

searches:
  - query: "long horizon reasoning agent"
    num_results: 100
    venues:
      neurips: [2024, 2025]
      iclr: [2024, 2025, 2026]
      icml: [2024, 2025]
output:
  root: /Users/lingzhi/Code/deep-research-output/{slug}/phase1_frontier/search_results
  overwrite: true

2. search_semantic_scholar.py (supplementary — citation data + broader coverage)

Location: /Users/lingzhi/.claude/skills/deep-research/scripts/search_semantic_scholar.py Supports --peer-reviewed-only and --top-conferences filters. API key: /Users/lingzhi/Code/keys.md (field S2_API_Key)

3. search_arxiv.py (supplementary — latest preprints)

Location: /Users/lingzhi/.claude/skills/deep-research/scripts/search_arxiv.py For searching recent papers not yet published at conferences. Mark citations with (preprint).

Other Scripts

ScriptLocationKey Flags
download_papers.py~/.claude/skills/deep-research/scripts/--jsonl, --output-dir, --max-downloads, --sort-by-citations
extract_pdf.py~/.claude/skills/deep-research/scripts/--pdf, --pdf-dir, --output-dir, --sections-only
paper_db.py~/.claude/skills/deep-research/scripts/subcommands: merge, search, filter, tag, stats, add, export
bibtex_manager.py~/.claude/skills/deep-research/scripts/--jsonl, --output, --keys-only
compile_report.py~/.claude/skills/deep-research/scripts/--topic-dir

WebFetch Mode (no Bash)

  1. Paper discovery: WebSearch + WebFetch to query Semantic Scholar/arXiv APIs
  2. Paper reading: WebFetch on ar5iv HTML or Read tool on downloaded PDFs
  3. Writing: Write tool for JSONL, notes, report files

6-Phase Workflow

Phase 1: Frontier

Search the latest conference proceedings and preprints to understand current trends.

  1. Write phase1_frontier/paper_finder_config.yaml targeting latest 1-2 years
  2. Run paper_finder scrape
  3. WebSearch for latest accepted paper lists
  4. Identify trending directions, key breakthroughs → Output: phase1_frontier/frontier.md, phase1_frontier/search_results/

Phase 2: Survey

Build a comprehensive landscape with broader time range. Target 35-80 papers after filtering.

  1. Write phase2_survey/paper_finder_config.yaml covering 2023-2025
  2. Run paper_finder + Semantic Scholar + arXiv
  3. Merge all results: python /Users/lingzhi/.claude/skills/deep-research/scripts/paper_db.py merge
  4. Filter to 35-80 most relevant: python /Users/lingzhi/.claude/skills/deep-research/scripts/paper_db.py filter --min-score 0.80 --max-papers 70
  5. Cluster by theme, write survey notes → Output: phase2_survey/survey.md, phase2_survey/search_results/, paper_db.jsonl

Phase 3: Deep Dive ⚠️ DO NOT SKIP

This phase is MANDATORY. You must actually READ 8-15 full papers, not just their abstracts.

  1. Select 8-15 papers from paper_db.jsonl with rationale → write phase3_deep_dive/selection.md
  2. Download PDFs: python download_papers.py --jsonl paper_db.jsonl --output-dir phase3_deep_dive/papers/ --sort-by-citations --max-downloads 15
  3. For EACH selected paper, read the full text (PDF via Read or HTML via WebFetch on ar5iv)
  4. Write detailed structured notes per paper (see note-format.md template): problem, contributions, methodology, experiments, limitations, connections
  5. Write ALL notes → phase3_deep_dive/deep_dive.md

Phase 3 Gate: deep_dive.md must contain detailed notes for ≥8 papers, each with methodology and experiment sections filled in. Abstract-only summaries do NOT count.

→ Output: phase3_deep_dive/selection.md, phase3_deep_dive/deep_dive.md, phase3_deep_dive/papers/

Phase 4: Code & Tools ⚠️ DO NOT SKIP

**This phase is MANDAT

同一分类的其他项