
citation-audit

Zero-context verification that every bibliographic entry in the paper is real, correctly attributed, and used in a context the cited paper actually supports. Uses a fresh cross-model reviewer with web/DBLP/arXiv lookup to catch hallucinated authors, wrong years, fabricated venues, version mismatches, and wrong-context citations (cite present but the cited paper does not establish the claim). Use when user says "审查引用", "check citations", "citation audit", "verify references", "引用核对", or before submission to ensure bibliography integrity.

Repo: Chanw-research/claude-code-paper-writing
Slug: citation-audit

SKILL.md

Citation Audit

Verify every \cite{...} in a paper against three independent layers:

  1. Existence — the cited paper actually exists at the claimed arXiv ID / DOI / venue.
  2. Metadata correctness — author names, year, venue, and title match canonical sources (DBLP, arXiv, ACL Anthology, Nature, OpenReview, etc.).
  3. Context appropriateness — the cited paper actually supports the claim it is being used to support in the manuscript.

This skill is the fourth layer of \aris{}'s evidence-and-claim assurance, complementing experiment-audit (code), result-to-claim (science verdict), and paper-claim-audit (numerical claims). Together they form a bottom-up integrity stack from raw evaluation code to manuscript bibliography.

When to Use This Skill

Run before submission. The right gating point is:

  • After paper-write has produced the LaTeX draft and bib file
  • After paper-claim-audit has verified numerical claims
  • Before final paper-compile for submission

Do not run this on a half-written draft — most of the work is in cross-checking each \cite against context, which is wasted on placeholder text.

What This Skill Catches

The dangerous citation problems are not outright fabrications — those are easy to spot. The dangerous ones are:

  • Wrong-context citations: real paper, but the cited claim is not what that paper actually establishes (e.g., citing Self-Refine to support "self-feedback produces correlated errors" — Self-Refine actually argues the opposite).
  • Author hallucinations: anonymous-author placeholders that slipped through, missing co-authors, wrong order.
  • Title drift: arXiv v1 vs v3 with different titles silently merged.
  • Venue confusion: the entry cites the arXiv preprint even though the paper has since been published at CVPR/ICML/NeurIPS, so the bib carries the wrong record.
  • Year mismatch: arXiv 2023 preprint with 2024 conference acceptance, year reported inconsistently.
  • Phantom DOIs: DOI looks real but does not resolve (mechanically checkable; see the sketch after this list).
  • Self-citation drift: your own prior work cited with year off by one.
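
Most of these failures need reviewer judgment, but a phantom DOI can be pre-checked mechanically before spending any reviewer budget. A minimal sketch against the doi.org handle resolver follows; it is illustrative only, and the skill's authoritative check remains the reviewer's web lookup in Step 3. The doi_exists name is hypothetical, not part of this skill's interface.

import json
import urllib.error
import urllib.request

def doi_exists(doi: str, timeout: float = 10.0):
    """Query the doi.org handle API. True/False when it answers; None when unreachable."""
    url = f"https://doi.org/api/handles/{doi}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("responseCode") == 1  # 1 = handle registered
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False   # unregistered handle: the phantom-DOI signature
        return None
    except urllib.error.URLError:
        return None        # network failure: maps to UNCERTAIN, never guess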

Constants

  • REVIEWER_MODEL = gpt-5.4 — Used via Codex MCP. Default for cross-model review with web access.
  • CONTEXT_POLICY = fresh — Each audit run uses a new reviewer thread (REVIEWER_BIAS_GUARD). Never codex-reply.
  • WEB_SEARCH = required — The reviewer must perform real web/DBLP/arXiv lookups, not pattern-match from memory.
  • OUTPUT = CITATION_AUDIT.md — Human-readable per-entry verdict report.
  • STATE = CITATION_AUDIT.json — Machine-readable verdict ledger consumable by downstream tools.
  • SOFT_ONLY = false — When true (set via the --soft-only / --soft_only flag), the audit runs all three layers normally but forbids any .bib file mutation. Findings that would otherwise mutate the bib (FIX / REPLACE / REMOVE) are translated into per-occurrence sentence-rewrite proposals against the citing *.tex files. Used by /resubmit-pipeline Phase 1 to honor the user's hard "freeze the bib" constraint.

Workflow

Step 1: Discover bib file and section files

Locate:

  • references.bib (or paper.bib / similar) under the paper directory
  • All *.tex files containing \cite{...} calls (typically sec/ or sections/)

If multiple bib files exist, audit each separately.
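
A minimal discovery sketch, assuming the paper lives under paper_dir (the regex covers common \cite variants but is illustrative, not exhaustive):

from pathlib import Path
import re

def discover(paper_dir: str):
    """Locate bib files and the tex files that contain \\cite calls."""
    root = Path(paper_dir)
    bib_files = sorted(root.rglob("*.bib"))
    cite_re = re.compile(r"\\cite[a-z]*\*?\{")  # \cite, \citet, \citep, \citeauthor, ...
    tex_files = [p for p in sorted(root.rglob("*.tex"))
                 if cite_re.search(p.read_text(errors="replace"))]
    return bib_files, tex_files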

Step 2: Extract all (cite-key, context) pairs

For each \cite{key1,key2,...} invocation in the paper:

  • Record the cite key
  • Record the file + line number
  • Record the surrounding sentence (≥ 1 full sentence around the cite, for context check)

Output a flat list of (key, file, line, surrounding_sentence) tuples.

Also build the inverse: for each bib entry, the list of all places it is cited.

Define two protocol sets used throughout the rest of the workflow: cited_keys is the set of unique cite keys appearing in any \cite{...} invocation across the audited *.tex files (de-duplicated), and bib_keys is the set of keys parsed from the audited bib file(s). cited_keys drives Step 3 (audit only cited entries); bib_keys \ cited_keys is the uncited residual surfaced by the --uncited opt-in.

If the user passed --uncited, also compute the set difference bib_keys \ cited_keys here and stash it for use in Step 5 and the JSON aggregation; see "Uncited Entry Detection (opt-in)" below for the protocol. The set-diff is a string operation only and does not consume reviewer budget.

Save the extracted contexts to paper/.aris/citation-audit/contexts.txt so the reviewer can read it directly. Use the paper-dir-relative path .aris/citation-audit/contexts.txt when recording the file in audited_input_hashes; do not stage under /tmp or other transient locations that the verifier cannot rehash later.
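
A sketch of the extraction pass producing the tuples, inverse index, and protocol sets defined above. The one-line context window and the bib-key regex are deliberate simplifications; a real pass should handle multi-line \cite calls and @string macros.

import re
from collections import defaultdict
from pathlib import Path

CITE_RE = re.compile(r"\\cite[a-z]*\*?\{([^}]*)\}")
BIBKEY_RE = re.compile(r"^@\w+\{([^,\s]+),", re.MULTILINE)

def extract(tex_files, bib_files):
    uses = []                     # (key, file, line, surrounding_sentence) tuples
    cited_by = defaultdict(list)  # inverse index: key -> list of citing sites
    for tex in tex_files:
        lines = Path(tex).read_text(errors="replace").splitlines()
        for lineno, line in enumerate(lines, 1):
            for m in CITE_RE.finditer(line):
                # crude context window: the citing line plus its neighbors
                ctx = " ".join(lines[max(0, lineno - 2):lineno + 1]).strip()
                for key in (k.strip() for k in m.group(1).split(",")):
                    uses.append((key, str(tex), lineno, ctx))
                    cited_by[key].append((str(tex), lineno))
    cited_keys = {u[0] for u in uses}
    bib_keys = set()
    for bib in bib_files:
        bib_keys |= set(BIBKEY_RE.findall(Path(bib).read_text(errors="replace")))
    uncited = bib_keys - cited_keys  # surfaced only under --uncited
    return uses, cited_by, cited_keys, bib_keys, uncited

The uses list is what gets serialized to .aris/citation-audit/contexts.txt.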

Step 3: Send each entry to fresh cross-model reviewer

For each cited bib entry — i.e., each key in cited_keys with at least one extracted citation context — invoke mcp__codex__codex (NOT codex-reply — fresh thread per entry, or batch with explicit per-entry isolation). Do not send entries in bib_keys \ cited_keys to the reviewer; those are detect-only and surface only when --uncited is explicitly enabled (see "Uncited Entry Detection" below).

mcp__codex__codex:
  model: gpt-5.4
  config: {"model_reasoning_effort": "xhigh"}
  sandbox: read-only
  prompt: |
    You are auditing a bibliographic entry. Use web/DBLP/arXiv search.

    ## Bib entry
    @article{key2024example,
      author = {...}, title = {...}, journal = {...}, year = {...}, ...
    }

    ## Where this entry is cited in the paper
    [paste extracted contexts]

    For this entry, verify:
    1. EXISTENCE: does this paper exist at the claimed arXiv ID / DOI / venue?
       Output: YES / NO / UNCERTAIN, with the verifying URL.
    2. METADATA: are author names, year, venue, title correct?
       For each, output: correct / wrong: should be ... / typo: ...
    3. CONTEXT: for each use, does the cited paper actually support the surrounding claim?
       Output per-use: SUPPORTS / WEAK / WRONG, with one-sentence reasoning.

    VERDICT: KEEP / FIX / REPLACE / REMOVE
    - KEEP: entry is clean, all uses are appropriate
    - FIX: metadata needs correction; uses are appropriate
    - REPLACE: cite is wrong-context, find a different paper that actually supports the claim
    - REMOVE: entry is hallucinated or unsupportable

    Be honest. If you cannot verify online, say UNCERTAIN; do not guess.

Save the response to .aris/traces/citation-audit/<date>_runNN/<key>.md per the review-tracing protocol.
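
A sketch of the per-entry dispatch loop. Here call_codex is hypothetical glue standing in for the mcp__codex__codex tool invocation (one fresh thread per call), and the prompt body is elided; the full template above is authoritative.

from pathlib import Path

PROMPT_TEMPLATE = """You are auditing a bibliographic entry. Use web/DBLP/arXiv search.

## Bib entry
{bib_entry}

## Where this entry is cited in the paper
{contexts}

[full verification instructions as shown above]
"""

def audit_entries(bib_entries, contexts_by_key, call_codex, run_dir):
    """One fresh reviewer thread per cited entry; traces saved per run."""
    run_dir = Path(run_dir)  # e.g. .aris/traces/citation-audit/<date>_runNN
    run_dir.mkdir(parents=True, exist_ok=True)
    for key, entry_text in bib_entries.items():
        contexts = contexts_by_key.get(key)
        if contexts is None:
            continue  # bib_keys \ cited_keys: detect-only, never sent to the reviewer
        response = call_codex(  # hypothetical wrapper; never codex-reply
            model="gpt-5.4",
            config={"model_reasoning_effort": "xhigh"},
            sandbox="read-only",
            prompt=PROMPT_TEMPLATE.format(bib_entry=entry_text, contexts=contexts),
        )
        (run_dir / f"{key}.md").write_text(response)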

Step 4: Aggregate verdicts

Build CITATION_AUDIT.json following the schema defined in "Submission Artifact Emission" below (single authoritative schema for this file). Per-entry ledger data goes under details.per_entry, not under a top-level entries field. The top-level verdict is a single overall value (PASS / WARN / FAIL / NOT_APPLICABLE / BLOCKED / ERROR) derived from per-entry verdicts per the decision table in "Submission Artifact Emission"; the top-level summary is a one-line human-readable string.

Concretely, details carries the per-entry ledger:

"details": {
  "total_entries": 29,
  "counts": { "KEEP": 11, "FIX": 14, "REPLACE": 3, "REMOVE": 1 },
  "per_entry": [
    {
      "key": "lu2024aiscientist",
      "verdict": "KEEP",
      "axis_failures": [],
      "uses": [
        {"file": "sections/1.intro.tex", "line": 11, "verdict": "SUPPORTS"},
        {"file": "sections/6.related.tex", "line": 8, "verdict": "SUPPORTS"}
      ]
    },
    {
      "key": "madaan2023selfrefine",
      "verdict": "FIX",
      "axis_failures": ["CONTEXT"],
      "uses": [
        {"file": "sections/2.overview.tex", "line": 42, "verdict": "WRONG",
         "note": "Self-Refine demonstrates iterative improvement, not correlated errors"},
        {"file": "sections/6.related.tex", "line": 13, "verdict": "SUPPORTS"}
      ]
    }
  ]
}

See "Submission Artifact Emission" for the full artifact (to
