academic-presentations
Create academic presentation slide decks and optionally demo videos from research papers. Use when the user asks to "make slides", "create a deck", "make a presentation", "demo video", "paper slides", "conference talk slides", or wants to turn a paper into a visual presentation. Covers slide generation, narration scripts, TTS audio, and video assembly.
- Repo: Chanw-research/claude-code-paper-writing
- Slug: academic-presentations
SKILL.md
Academic Presentations
Produce slide decks (and optionally narrated demo videos) from research papers. The human drives all outline and visual decisions — the agent executes.
Pipeline
[1] Script Draft (Claude Code) ──→ [2] Slide Generation (nanobanana /edit) ──→ [3] TTS Audio (edge-tts / Kokoro / ElevenLabs, optional) ──→ [4] Video Assembly (ffmpeg, optional)
Skip stages 3–4 for slide-only output. User can enter at any stage.
Stage 1: Script / Outline
Input: paper + user-provided outline or slide plan
Output: video-scripts.md or slide-outline.md — per-slide content with talking points
The agent drafts scripts based on the user's outline. The user owns the structure — agent does not decide slide count, order, or what to emphasize.
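For example, a per-slide entry in video-scripts.md might look like this (illustrative structure only; the skill does not prescribe field names):

Slide 3: Method Overview
Visual: Figure 2 from the paper, wrapped in the deck frame
Narration: "Our method has two stages. First, ..."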
Stage 2: Slide Generation
Full reference: references/slide-generation.md
Tool: nanobanana (Gemini CLI extension)
Priority order (edit-first):
- Has paper figure → nanobanana /edit to wrap it into a slide frame
- Has existing slide → /edit to adapt
- User-provided reference (e.g., from NotebookLM or a PPTX the user made) → /edit to refine
- Title slide from scratch → generate with an academic style prompt
- Content slide from scratch → generate with the deck-style preamble
Key principle: prefer /edit on existing high-quality paper figures over generating from scratch.
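For instance, an /edit call inside the Gemini CLI might look like this (illustrative; the file name and prompt are hypothetical, and the exact invocation syntax should be checked against the nanobanana extension docs):

/edit fig3_architecture.png "Wrap this figure into a 1920x1080 slide frame: white background, slide title 'Architecture' top-left. Keep the original figure exactly as-is, only add framing."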
Deck style: create deck-style.md once per deck, prepend to all generate-from-scratch prompts. For /edit, style is inherited from the base image.
Example deck-style.md:
- Canvas: 1920x1080, white background
- Accent: #2563EB blue, text: #1e293b dark slate
- Clean sans-serif, flat design, no gradients/shadows
- Bottom bar: blue accent with white affiliation text
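A generate-from-scratch prompt then prepends those rules (illustrative; the slide title and content are hypothetical):

[contents of deck-style.md]
Generate a 1920x1080 content slide titled "Results" with three short bullet points and one figure placeholder, following the style rules above.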
Stage 3: TTS Audio (optional)
Full reference: references/tts-engines.md
Batch scripts: scripts/batch_tts_edge.py, scripts/batch_tts_kokoro.py
Output: one audio file per narrated slide
Engine Selection
| Engine | Quality | Cost | Latency | Best For |
|---|---|---|---|---|
| edge-tts (default) | Very good | Free, unlimited | ~6s/slide (cloud) | Quick generation, good male voices |
| Kokoro | Very good | Free, unlimited | ~1.5s/slide (local) | Offline use, fast batch, good female voices |
| ElevenLabs | Premium | 10k chars free/mo | ~3s/slide (cloud) | Highest quality, voice cloning |
Default: Use edge-tts unless user requests offline or premium quality.
Quick Start (edge-tts)
import asyncio
import edge_tts

async def tts_slide(text, output, voice="en-US-AndrewNeural"):
    # Cloud TTS: saves one MP3 per narrated slide (requires internet)
    await edge_tts.Communicate(text, voice).save(output)

asyncio.run(tts_slide("Your slide text here", "slide_01.mp3"))
Voices: AndrewNeural (male, presenter), AriaNeural (female), GuyNeural (male, warm), JennyNeural (female, pro)
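For offline batches, a minimal Kokoro sketch (assuming the kokoro and soundfile packages from the Dependencies table; af_heart is one of Kokoro's bundled voices, but verify against the installed model):

from kokoro import KPipeline
import soundfile as sf

pipeline = KPipeline(lang_code="a")  # "a" = American English
# The pipeline yields (graphemes, phonemes, audio) chunks per text segment
for i, (_, _, audio) in enumerate(pipeline("Your slide text here", voice="af_heart")):
    sf.write(f"slide_01_{i}.wav", audio, 24000)  # Kokoro outputs 24 kHz audio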
Stage 4: Video Assembly (optional)
Tool: ffmpeg
Input: slide PNGs + audio files + optional demo recording
# Use symlink to avoid iCloud path spaces: ln -sfn "long path" /tmp/workdir
# Slide with audio:
ffmpeg -y -loop 1 -i slide.png -i audio.mp3 \
-c:v libx264 -tune stillimage -pix_fmt yuv420p \
-c:a aac -ar 44100 -ac 2 -shortest seg.mp4
# Silent slide (N seconds):
ffmpeg -y -loop 1 -i slide.png -f lavfi -i anullsrc=r=44100:cl=stereo \
-c:v libx264 -tune stillimage -pix_fmt yuv420p \
-c:a aac -ar 44100 -ac 2 -t N seg.mp4
# Concat (always re-encode, never -c copy):
printf "file 'seg1.mp4'\nfile 'seg2.mp4'\n..." > concat.txt
ffmpeg -y -f concat -safe 0 -i concat.txt \
-c:v libx264 -pix_fmt yuv420p -c:a aac -ar 44100 -ac 2 final.mp4
All segments MUST share: 44100Hz sample rate, stereo, AAC codec.
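A quick way to verify a segment before concat (standard ffprobe flags; expect the output aac,44100,2):

ffprobe -v error -select_streams a:0 \
  -show_entries stream=codec_name,sample_rate,channels -of csv=p=0 seg.mp4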
PPTX Conversion (if needed)
Full reference: references/pptx-conversion.md
If starting from an existing PPTX, convert slides to PNG images first:
soffice --headless --convert-to pdf --outdir output/ presentation.pptx
pdftoppm -png -r 300 output/presentation.pdf output/slide
NotebookLM — Human Reference Only
The agent must NOT auto-invoke NotebookLM or use its outputs to drive slide/script decisions. The human owns the outline, visual arrangement, and deck direction.
When to recommend: only when the user says they're unsure what to put on slides or need inspiration.
Gotchas
- iCloud paths with spaces break ffmpeg: symlink to /tmp/
- Audio format mismatch breaks concat: always re-encode with -ar 44100 -ac 2
- ElevenLabs free tier: mp3_22050_32 only, 10k chars/month
- edge-tts needs internet: falls back to Kokoro if offline
- Kokoro WAV files are ~7x larger: convert to MP3 with ffmpeg before video assembly (see the one-liner after this list)
- Kokoro first run downloads a ~350MB model: ensure pip is in the venv
- /edit distorts the figure: be more explicit, e.g. "Keep the original figure exactly as-is, only add framing"
- Style drift across slides: use /edit from the base slide or prepend the shared deck-style.md
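Converting Kokoro WAVs before assembly (standard ffmpeg; -q:a 2 is a suggested VBR quality, file names illustrative):

ffmpeg -y -i slide_01.wav -c:a libmp3lame -q:a 2 slide_01.mp3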
Dependencies
| Tool | Stage | Install |
|---|---|---|
| Gemini CLI + nanobanana | 2 | gemini extensions install https://github.com/gemini-cli-extensions/nanobanana |
| LibreOffice + poppler | 2 (PPTX) | brew install --cask libreoffice && brew install poppler |
| edge-tts | 3 | pip install edge-tts |
| Kokoro | 3 (offline) | pip install kokoro soundfile |
| ElevenLabs | 3 (premium) | pip install elevenlabs + ELEVENLABS_API_KEY |
| ffmpeg | 4 | brew install ffmpeg |