
research-refine

A workflow that refines a vague research direction into an executable method plan. It centers on four principles: freeze the problem anchor and do not drift; choose the smallest direct mechanism that fixes the bottleneck; make one primary contribution per paper; use foundation-model capabilities without stacking buzzwords. Multiple rounds of self-review plus GPT-5.4 review trim the plan down and make it concrete. Best used when the problem is already clear but the technical route is still fuzzy.

Turn a vague research direction into a problem-anchored, elegant, frontier-aware, implementation-oriented method plan via iterative GPT-5.4 review. Use when the user says "refine my approach", "帮我细化方案", "decompose this problem", "打磨idea", "refine research plan", "细化研究方案", or wants a concrete research method that stays simple, focused, and top-venue ready instead of a vague or overbuilt idea.

Repo
Chanw-research/claude-code-paper-writing
Slug
research-refine

SKILL.md

Research Refine: Problem-Anchored, Elegant, Frontier-Aware Plan Refinement

Refine and concretize: $ARGUMENTS

Overview

Use this skill when the research problem is already visible but the technical route is still fuzzy. The goal is not to produce a bloated proposal or a benchmark shopping list. The goal is to turn a vague direction into a problem -> focused method -> minimal validation document that is concrete enough to implement, elegant enough to feel paper-worthy, and current enough to resonate in the foundation-model era.

Four principles dominate this skill:

  1. Do not lose the original problem. Freeze an immutable Problem Anchor and reuse it in every round.
  2. The smallest adequate mechanism wins. Prefer the minimal intervention that directly fixes the bottleneck.
  3. One paper, one dominant contribution. Prefer one sharp thesis plus at most one supporting contribution.
  4. Modern leverage is a prior, not a decoration. When LLM / VLM / Diffusion / RL / distillation / inference-time scaling naturally fit the bottleneck, use them concretely. Do not bolt them on as buzzwords.
Pipeline:

User input (PROBLEM + vague APPROACH)
  -> Phase 0 (Claude): Freeze Problem Anchor
  -> Phase 1 (Claude): Scan grounding papers -> identify technical gap -> choose the sharpest route -> write focused proposal
  -> Phase 2 (Codex/GPT-5.4): Review for fidelity, specificity, contribution quality, and frontier leverage
  -> Phase 3 (Claude): Anchor check + simplicity check -> revise method -> rewrite full proposal
  -> Phase 4 (Codex, same thread): Re-evaluate revised proposal
  -> Repeat Phase 3-4 until OVERALL SCORE >= 9 or MAX_ROUNDS reached
  -> Phase 5: Save full history to refine-logs/
  -> Optional handoff: /experiment-plan for a detailed execution-ready experiment roadmap
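The review-revise loop above can be sketched as a small driver. This is a minimal illustration, not the skill's actual implementation: `review` and `revise` are hypothetical stand-ins for the Codex/GPT-5.4 and Claude calls, assumed to return a `(score, verdict)` pair and a revised proposal respectively.

```python
MAX_ROUNDS = 5
SCORE_THRESHOLD = 9

def refine_loop(proposal, review, revise, max_rounds=MAX_ROUNDS,
                threshold=SCORE_THRESHOLD):
    """Run review-revise rounds until the score clears the threshold
    or the round budget is exhausted (Phases 2-4)."""
    history = []
    for round_no in range(1, max_rounds + 1):
        score, verdict = review(proposal)      # Phase 2 / Phase 4
        history.append((round_no, score, verdict))
        if score >= threshold:
            break                              # OVERALL SCORE >= threshold
        proposal = revise(proposal, verdict)   # Phase 3: anchor + simplicity check
    return proposal, history
```

The loop reviews before revising, so a proposal that already clears the threshold costs exactly one reviewer call.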

Constants

  • REVIEWER_MODEL = gpt-5.4 — Reviewer model used via Codex MCP.
  • MAX_ROUNDS = 5 — Maximum review-revise rounds.
  • SCORE_THRESHOLD = 9 — Minimum overall score to stop.
  • OUTPUT_DIR = refine-logs/ — Directory for round files and final report.
  • MAX_LOCAL_PAPERS = 15 — Maximum local papers/notes to scan for grounding.
  • MAX_CORE_EXPERIMENTS = 3 — Default cap for core validation blocks inside this skill.
  • MAX_PRIMARY_CLAIMS = 2 — Soft cap for paper-level claims. Prefer one dominant claim plus one supporting claim.
  • MAX_NEW_TRAINABLE_COMPONENTS = 2 — Soft cap for genuinely new trainable pieces. Exceed only if the paper breaks otherwise.

Override via argument if needed, e.g. /research-refine "problem | approach" -- max rounds: 3, threshold: 9.
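One way to read those overrides is a small parser along these lines. The exact override grammar is an assumption here; this sketch only handles the `-- max rounds: N, threshold: M` form shown above and falls back to the constants otherwise.

```python
import re

DEFAULTS = {"max_rounds": 5, "threshold": 9}

def parse_overrides(argument: str) -> dict:
    """Extract optional overrides from '"problem | approach" -- key: value, ...'."""
    settings = dict(DEFAULTS)
    if " -- " not in argument:
        return settings                      # no override suffix: use constants
    _, tail = argument.split(" -- ", 1)
    for part in tail.split(","):
        m = re.match(r"\s*max rounds\s*:\s*(\d+)", part)
        if m:
            settings["max_rounds"] = int(m.group(1))
        m = re.match(r"\s*threshold\s*:\s*(\d+)", part)
        if m:
            settings["threshold"] = int(m.group(1))
    return settings
```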

State Persistence (Checkpoint Recovery)

Long-running refinement sessions may fail mid-way (e.g., API timeout, context compaction, or session interruption). To avoid losing completed work, persist state to refine-logs/REFINE_STATE.json after each phase boundary:

{
  "phase": "review",
  "round": 1,
  "threadId": "019cd392-...",
  "last_score": 6.5,
  "last_verdict": "REVISE",
  "status": "in_progress",
  "timestamp": "2026-03-22T20:00:00"
}

Field definitions:

| Field | Values | Meaning |
|---|---|---|
| phase | "anchor" / "proposal" / "review" / "refine" / "done" | Last completed phase |
| round | 0–MAX_ROUNDS | Current round number |
| threadId | string or null | Reviewer thread ID for codex-reply continuity |
| last_score | number or null | Most recent overall score from reviewer |
| last_verdict | string or null | Most recent verdict (READY / REVISE / RETHINK) |
| status | "in_progress" / "completed" | Loop status |
| timestamp | ISO 8601 | When state was last written |

Write rules:

  • Write after each phase completes (not before). Overwrite each time — only the latest state matters.
  • On completion (Phase 5 finished), set "status": "completed".
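The write rules amount to one overwrite per phase boundary. A minimal sketch, assuming the field names in the table above; the `write_state` helper itself is illustrative and not part of the skill:

```python
import datetime
import json
import pathlib

STATE_PATH = pathlib.Path("refine-logs/REFINE_STATE.json")

def write_state(phase, round_no, thread_id=None, last_score=None,
                last_verdict=None, completed=False):
    """Overwrite the checkpoint after a phase completes; only the latest
    state matters, so no history is kept in this file."""
    STATE_PATH.parent.mkdir(parents=True, exist_ok=True)
    state = {
        "phase": phase,
        "round": round_no,
        "threadId": thread_id,
        "last_score": last_score,
        "last_verdict": last_verdict,
        "status": "completed" if completed else "in_progress",
        "timestamp": datetime.datetime.now().isoformat(timespec="seconds"),
    }
    STATE_PATH.write_text(json.dumps(state, indent=2))
    return state
```

Calling `write_state("review", 1, thread_id=..., last_score=6.5, last_verdict="REVISE")` after Phase 2 produces a file like the example above; `write_state("done", n, completed=True)` marks Phase 5.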

Output Structure

refine-logs/
├── REFINE_STATE.json
├── round-0-initial-proposal.md
├── round-1-review.md
├── round-1-refinement.md
├── round-2-review.md
├── round-2-refinement.md
├── ...
├── REVIEW_SUMMARY.md
├── FINAL_PROPOSAL.md
├── REFINEMENT_REPORT.md
└── score-history.md

Every round-N-refinement.md must contain a full anchored proposal, not just incremental fixes.

Workflow

Initialization (Checkpoint Recovery)

Before starting any phase, check whether a previous run left a checkpoint:

  1. Check for refine-logs/REFINE_STATE.json:

    • If it does not exist → fresh start (proceed to Phase 0 normally)
    • If it exists AND status is "completed" → fresh start (delete the state file; the previous run finished)
    • If it exists AND status is "in_progress" AND timestamp is older than 24 hours → fresh start (stale state from a killed/abandoned run — delete the file)
    • If it exists AND status is "in_progress" AND timestamp is within 24 hours → resume
  2. On resume, read the state file and recover context:

    • Read all existing refine-logs/round-*.md files to restore prior work
    • Read refine-logs/score-history.md if it exists
    • Recover threadId for reviewer thread continuity
    • Log to the user: "Checkpoint found. Resuming after phase: {phase}, round: {round}."
    • Jump to the next phase based on the saved phase value:
    | Saved phase | What was completed | Resume from |
    |---|---|---|
    | "anchor" | Phase 0 done | Phase 1 (read anchor from round-0 context) |
    | "proposal" | Phase 1 done | Phase 2 (read round-0-initial-proposal.md) |
    | "review" | Phase 2 or 4 done | Phase 3 (read latest round-N-review.md) |
    | "refine" | Phase 3 done | Phase 4 (read latest round-N-refinement.md) |
  3. On fresh start, ensure refine-logs/ directory exists and proceed to Phase 0.
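The fresh-start vs. resume decision in steps 1-3 can be sketched as a pure function. The 24-hour staleness window follows the rules above; the function name and its return values are illustrative, not part of the skill's interface.

```python
import json
import pathlib
from datetime import datetime, timedelta

def recovery_action(state_path, now=None):
    """Return 'fresh' or 'resume' per the initialization rules."""
    path = pathlib.Path(state_path)
    if not path.exists():
        return "fresh"                       # no checkpoint at all
    state = json.loads(path.read_text())
    if state.get("status") == "completed":
        path.unlink()                        # previous run finished cleanly
        return "fresh"
    now = now or datetime.now()
    written = datetime.fromisoformat(state["timestamp"])
    if now - written > timedelta(hours=24):
        path.unlink()                        # stale state from an abandoned run
        return "fresh"
    return "resume"                          # in progress and recent: recover
```

On `"resume"`, the caller still has to reload the round files and `threadId` as described in step 2; this function only makes the branch decision.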

Phase 0: Freeze the Problem Anchor

Before proposing anything, extract the user's immutable bottom-line problem. This anchor must be copied verbatim into every proposal and every refinement round.

Write:

  • Bottom-line problem: What technical problem must be solved?
  • Must-solve bottleneck: What specific weakness in current methods is unacceptable?
  • Non-goals: What is explicitly not the goal of this project?
  • Constraints: Compute, data, time, tooling, venue, deployment limits.
  • Success condition: What evidence would make the user say "yes, this method addresses the actual problem"?

If later reviewer feedback would change the problem being solved, mark that as drift and push back or adapt carefully.

Checkpoint: Write refine-logs/REFINE_STATE.json with {"phase": "anchor", "round": 0, "threadId": null, "last_score": null, "last_verdict": null, "status": "in_progress", "timestamp": "<now>"}.

Phase 1: Build the Initial Proposal

Step 1.1: Scan Grounding Material

Check papers/ and literature/ first. Read only the relevant parts needed to answer:

  • What mechanism do current methods use?
  • Where exactly do they fail for this problem?
  • Which recent LLM / VLM / Diffusion / RL era techniques are actually relevant here?
  • What training objectives, representations, or interfaces are reusable?
  • What details distinguish a real method from a renamed high-level idea?

If local material is insufficient, search recent top-venue/arXiv work online. Focus on method sections, training setup, and failure modes, not just abstracts.

Step 1.2: Identify the Technical Gap

Do not stop at generic research questions. Make the gap operational:

  1. Current pipeline failure point: where does the baseline break?
  2. Why naive fixes are insufficient: why larger context, more data, prompting, a memory bank, or stacking more modules will not close the gap.
  3. Smallest adequate intervention: what is the least additional mechanism that could plausibly fix the bottleneck?
  4. Frontier-native alternative: is there a more current route using foundation-model-era primitives that better matches the bottleneck?
