Claude Code Skills · 论文 · 文档处理

markitdown

Microsoft 开源的 MarkItDown 工具,用于将 PDF、Office 文档、图片、音频、HTML 等十余种格式统一转为干净 Markdown。Markdown 对 LLM 处理更为节约 token,适合作为知识库预处理或文档管道的前端环节。此 skill 还融合了生成科学示意图的能力,可在转换完成后自动建议补充论文级插图。

Convert files and office documents to Markdown. Supports PDF, DOCX, PPTX, XLSX, images (with OCR), audio (with transcription), HTML, CSV, JSON, XML, ZIP, YouTube URLs, EPubs and more.

Repo
Chanw-research/claude-code-paper-writing
Slug
markitdown

SKILL.md

MarkItDown - File to Markdown Conversion

Overview

MarkItDown is a Python tool developed by Microsoft for converting various file formats to Markdown. It's particularly useful for converting documents into LLM-friendly text format, as Markdown is token-efficient and well-understood by modern language models.

Key Benefits:

  • Convert documents to clean, structured Markdown
  • Token-efficient format for LLM processing
  • Supports 15+ file formats
  • Optional AI-enhanced image descriptions
  • OCR for images and scanned documents
  • Speech transcription for audio files

Visual Enhancement with Scientific Schematics

When creating documents with this skill, always consider adding scientific diagrams and schematics to enhance visual communication.

If your document does not already contain schematics or diagrams:

  • Use the scientific-schematics skill to generate AI-powered publication-quality diagrams
  • Simply describe your desired diagram in natural language
  • Nano Banana Pro will automatically generate, review, and refine the schematic

For new documents: Scientific schematics should be generated by default to visually represent key concepts, workflows, architectures, or relationships described in the text.

How to generate schematics:

python scripts/generate_schematic.py "your diagram description" -o figures/output.png

The AI will automatically:

  • Create publication-quality images with proper formatting
  • Review and refine through multiple iterations
  • Ensure accessibility (colorblind-friendly, high contrast)
  • Save outputs in the figures/ directory

When to add schematics:

  • Document conversion workflow diagrams
  • File format architecture illustrations
  • OCR processing pipeline diagrams
  • Integration workflow visualizations
  • System architecture diagrams
  • Data flow diagrams
  • Any complex concept that benefits from visualization

For detailed guidance on creating schematics, refer to the scientific-schematics skill documentation.


Supported Formats

FormatDescriptionNotes
PDFPortable Document FormatFull text extraction
DOCXMicrosoft WordTables, formatting preserved
PPTXPowerPointSlides with notes
XLSXExcel spreadsheetsTables and data
ImagesJPEG, PNG, GIF, WebPEXIF metadata + OCR
AudioWAV, MP3Metadata + transcription
HTMLWeb pagesClean conversion
CSVComma-separated valuesTable format
JSONJSON dataStructured representation
XMLXML documentsStructured format
ZIPArchive filesIterates contents
EPUBE-booksFull text extraction
YouTubeVideo URLsFetch transcriptions

Quick Start

Installation

# Install with all features
pip install 'markitdown[all]'

# Or from source
git clone https://github.com/microsoft/markitdown.git
cd markitdown
pip install -e 'packages/markitdown[all]'

Command-Line Usage

# Basic conversion
markitdown document.pdf > output.md

# Specify output file
markitdown document.pdf -o output.md

# Pipe content
cat document.pdf | markitdown > output.md

# Enable plugins
markitdown --list-plugins  # List available plugins
markitdown --use-plugins document.pdf -o output.md

Python API

from markitdown import MarkItDown

# Basic usage
md = MarkItDown()
result = md.convert("document.pdf")
print(result.text_content)

# Convert from stream
with open("document.pdf", "rb") as f:
    result = md.convert_stream(f, file_extension=".pdf")
    print(result.text_content)

Advanced Features

1. AI-Enhanced Image Descriptions

Use LLMs via OpenRouter to generate detailed image descriptions (for PPTX and image files):

from markitdown import MarkItDown
from openai import OpenAI

# Initialize OpenRouter client (OpenAI-compatible API)
client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)

md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-opus-4.5",  # recommended for scientific vision
    llm_prompt="Describe this image in detail for scientific documentation"
)

result = md.convert("presentation.pptx")
print(result.text_content)

2. Azure Document Intelligence

For enhanced PDF conversion with Microsoft Document Intelligence:

# Command line
markitdown document.pdf -o output.md -d -e "<document_intelligence_endpoint>"
# Python API
from markitdown import MarkItDown

md = MarkItDown(docintel_endpoint="<document_intelligence_endpoint>")
result = md.convert("complex_document.pdf")
print(result.text_content)

3. Plugin System

MarkItDown supports 3rd-party plugins for extending functionality:

# List installed plugins
markitdown --list-plugins

# Enable plugins
markitdown --use-plugins file.pdf -o output.md

Find plugins on GitHub with hashtag: #markitdown-plugin

Optional Dependencies

Control which file formats you support:

# Install specific formats
pip install 'markitdown[pdf, docx, pptx]'

# All available options:
# [all]                  - All optional dependencies
# [pptx]                 - PowerPoint files
# [docx]                 - Word documents
# [xlsx]                 - Excel spreadsheets
# [xls]                  - Older Excel files
# [pdf]                  - PDF documents
# [outlook]              - Outlook messages
# [az-doc-intel]         - Azure Document Intelligence
# [audio-transcription]  - WAV and MP3 transcription
# [youtube-transcription] - YouTube video transcription

Common Use Cases

1. Convert Scientific Papers to Markdown

from markitdown import MarkItDown

md = MarkItDown()

# Convert PDF paper
result = md.convert("research_paper.pdf")
with open("paper.md", "w") as f:
    f.write(result.text_content)

2. Extract Data from Excel for Analysis

from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("data.xlsx")

# Result will be in Markdown table format
print(result.text_content)

3. Process Multiple Documents

from markitdown import MarkItDown
import os
from pathlib import Path

md = MarkItDown()

# Process all PDFs in a directory
pdf_dir = Path("papers/")
output_dir = Path("markdown_output/")
output_dir.mkdir(exist_ok=True)

for pdf_file in pdf_dir.glob("*.pdf"):
    result = md.convert(str(pdf_file))
    output_file = output_dir / f"{pdf_file.stem}.md"
    output_file.write_text(result.text_content)
    print(f"Converted: {pdf_file.name}")

4. Convert PowerPoint with AI Descriptions

from markitdown import MarkItDown
from openai import OpenAI

# Use OpenRouter for access to multiple AI models
client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)

md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-opus-4.5",  # recommended for presentations
    llm_prompt="Describe this slide image in detail, focusing on key visual elements and data"
)

result = md.convert("presentation.pptx")
with open("presentation.md", "w") as f:
    f.write(result.text_content)

5. Batch Convert with Different Formats

from markitdown import MarkItDown
from pathlib import Path

md = MarkItDown()

# Files to convert
files = [
    "document.pdf",
    "spreadsheet.xlsx",
    "presentation.pptx",
    "notes.docx"
]

for file in files:
    try:
        result = md.convert(file)
        output = Path(file).stem + ".md"
        with open(output, "w") as f:
            f.write(result.text_content)
        print(f"✓ Converted {file}")
    except Exception as e:
        print(f"✗ Error converting {file}: {e}")

6. Extract YouTube Video Transcription

from markitdown import MarkItDown

md = MarkItDown()

# Convert YouTube video to transcript
result = md.convert("https://www.youtube.com/watch?v=VIDEO_ID")
print(result.tex

同一分类的其他项