Use Cases
Eight practical scenarios with working request examples and notes on edge cases. No marketing language — just what works and what to watch for.
1. Markdown documentation to PDF
Scenario: A CLI tool or CI pipeline converts a Markdown README or technical specification to a PDF for distribution.
curl -s "http://localhost:5741/convert?format=md" \
-H "Content-Type: text/plain" \
--data-binary @README.md \
-o output.pdf
Or via file upload (the format is auto-detected from the .md extension):
curl -s http://localhost:5741/convert \
-F "file=@README.md" \
-o output.pdf
What to watch:
- Markdown with embedded HTML fragments (<div>, <iframe>): the engine sanitises HTML only when format=html is set explicitly or when auto-detection identifies the input as HTML. HTML fragments inside Markdown are not sanitised; they are treated as literal text within the paragraph.
- GFM tables are fully supported. GitHub-specific extensions (task lists, footnotes) are parsed as plain text.
- Fenced code blocks preserve the language hint in the AST (the Lang field), but the renderer does not yet apply syntax highlighting.
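For callers written in Go (the engine itself is started with go run), the two curl forms above map onto ordinary net/http requests. This is a minimal sketch, not a client shipped with the engine: the helper names newRawConvertRequest and newUploadConvertRequest are hypothetical, and the code assumes only what the curl examples show (a POST to /convert, a format query parameter for raw bodies, and a multipart field named file for uploads).

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
	"net/url"
)

// newRawConvertRequest (hypothetical helper) mirrors:
//   curl "http://localhost:5741/convert?format=md" -H "Content-Type: text/plain" --data-binary @README.md
func newRawConvertRequest(baseURL, format string, doc []byte) (*http.Request, error) {
	u, err := url.Parse(baseURL + "/convert")
	if err != nil {
		return nil, err
	}
	q := u.Query()
	q.Set("format", format) // explicit format, no auto-detection needed
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodPost, u.String(), bytes.NewReader(doc))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "text/plain")
	return req, nil
}

// newUploadConvertRequest (hypothetical helper) mirrors:
//   curl -F "file=@README.md" http://localhost:5741/convert
// The filename travels with the part, so extension-based detection can apply.
func newUploadConvertRequest(baseURL, filename string, doc []byte) (*http.Request, error) {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	part, err := mw.CreateFormFile("file", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(doc); err != nil {
		return nil, err
	}
	// Close writes the terminating boundary; without it the body is incomplete.
	if err := mw.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, baseURL+"/convert", &body)
	if err != nil {
		return nil, err
	}
	// The boundary in this header must match the one written into the body.
	req.Header.Set("Content-Type", mw.FormDataContentType())
	return req, nil
}

func main() {
	doc := []byte("# Title\n\nSome text.\n")

	raw, err := newRawConvertRequest("http://localhost:5741", "md", doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(raw.Method, raw.URL, raw.Header.Get("Content-Type"))

	up, err := newUploadConvertRequest("http://localhost:5741", "README.md", doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(up.Method, up.URL, up.Header.Get("Content-Type"))

	// To convert for real, send either request with http.DefaultClient.Do and
	// write resp.Body to output.pdf on success. If the server enforces
	// ANNAVE_INTERNAL_TOKEN, also set the X-Internal-Token header.
}
```

The sketch only builds the requests, so it runs without a server; sending them is one extra call.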
2. CSV data table to PDF
Scenario: Export a spreadsheet or database query result as a formatted PDF table.
curl -s "http://localhost:5741/convert?format=csv" \
-H "Content-Type: text/plain" \
--data-binary @report.csv \
-o report.pdf
Example input (report.csv):
Name,Role,Status
Anna Veretennykova,Lead,Active
PDF Engine,Service,Running
What to watch:
- The first row is treated as column headers.
- Very wide tables (many columns) may produce columns too narrow to read at A4 width. Split into multiple narrower tables in the source document.
- TSV (tab-separated) files are auto-detected both by extension (.tsv) and by content (a tab-delimiter heuristic).
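The header-row rule and delimiter detection can be approximated with Go's encoding/csv. This is a sketch, not the engine's code: the heuristic used here (more tabs than commas on the first line means TSV) is an assumption for illustration, and parseTable and sniffDelimiter are hypothetical names.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"strings"
)

// sniffDelimiter is a hypothetical content heuristic: if the first line has
// more tabs than commas, treat the input as TSV; otherwise as CSV.
func sniffDelimiter(data string) rune {
	first := data
	if i := strings.IndexByte(data, '\n'); i >= 0 {
		first = data[:i]
	}
	if strings.Count(first, "\t") > strings.Count(first, ",") {
		return '\t'
	}
	return ','
}

// parseTable returns the first record as column headers and the rest as rows.
func parseTable(data string) ([]string, [][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.Comma = sniffDelimiter(data)
	records, err := r.ReadAll()
	if err != nil {
		return nil, nil, err
	}
	if len(records) == 0 {
		return nil, nil, errors.New("table has no rows")
	}
	return records[0], records[1:], nil
}

func main() {
	data := "Name,Role,Status\nAnna Veretennykova,Lead,Active\nPDF Engine,Service,Running\n"
	header, rows, err := parseTable(data)
	if err != nil {
		panic(err)
	}
	fmt.Println("columns:", header)
	fmt.Println("rows:", len(rows))
}
```

Counting columns this way before converting is also a cheap way to spot tables that are likely too wide for A4.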
3. DOCX upload to PDF
Scenario: Accept a DOCX file uploaded from a browser and return a PDF.
curl -s http://localhost:5741/convert \
-F "file=@document.docx" \
-o document.pdf
From a browser form:
<form action="http://localhost:5741/convert" method="post" enctype="multipart/form-data">
<input type="file" name="file" accept=".docx">
<button type="submit">Convert</button>
</form>
What to watch:
- DOCX support covers headings (Heading1–Heading6, Title, and Subtitle styles), paragraphs, bold/italic/strikethrough runs, ordered and unordered lists (read from numbering.xml), and tables.
- Embedded images in DOCX are not yet extracted or rendered; they are dropped from the PDF without a placeholder.
- Password-protected DOCX files cannot be parsed; the engine returns ENGINE_ERR_PARSE_FAILED.
- The DOCX parser has no external dependencies: it reads the ZIP/XML structure directly.
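Because a DOCX file is a ZIP archive of XML parts, the core of a dependency-free reader fits in the Go standard library. The sketch below is illustrative, not the engine's parser: it only extracts paragraph text from word/document.xml (joining the w:t runs inside each w:p) and ignores styles, numbering.xml, and tables. docxText, paragraphText, and minimalDocx are hypothetical names.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// docxText reads a DOCX held in memory and returns the plain text of each
// paragraph in word/document.xml.
func docxText(data []byte) ([]string, error) {
	// A DOCX is a ZIP archive. Password-protected Office files use a
	// different (encrypted) container, so this call fails for them.
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	rc, err := zr.Open("word/document.xml")
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	return paragraphText(rc)
}

// paragraphText streams the XML and joins the <w:t> runs inside each <w:p>.
// Matching on local names ignores the "w:" namespace prefix.
func paragraphText(r io.Reader) ([]string, error) {
	dec := xml.NewDecoder(r)
	var paras []string
	var cur strings.Builder
	inText := false
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return paras, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == "t" {
				inText = true
			}
		case xml.EndElement:
			switch t.Name.Local {
			case "t":
				inText = false
			case "p":
				paras = append(paras, cur.String())
				cur.Reset()
			}
		case xml.CharData:
			if inText {
				cur.Write(t)
			}
		}
	}
}

// minimalDocx builds an in-memory ZIP containing only word/document.xml.
// A real DOCX has more parts; this is just enough to exercise docxText.
func minimalDocx(documentXML string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create("word/document.xml")
	if err != nil {
		return nil, err
	}
	if _, err := io.WriteString(w, documentXML); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

const sampleXML = `<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body>` +
	`<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>` +
	`<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p></w:body></w:document>`

func main() {
	data, err := minimalDocx(sampleXML)
	if err != nil {
		panic(err)
	}
	paras, err := docxText(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", paras)
}
```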
4. Jupyter Notebook to PDF
Scenario: Convert a .ipynb notebook for sharing or archiving.
curl -s http://localhost:5741/convert \
-F "file=@analysis.ipynb" \
-o analysis.pdf
What to watch:
- Code cells are rendered as code blocks using the monospace font.
- Markdown cells are fully parsed.
- Cell output (stdout, stderr, display_data) is not included — only cell source content is converted.
- Large notebooks with many cells may hit the document.max_nodes limit. Increase it in config/limits.yaml if needed.
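An .ipynb file is JSON, and each cell's source is stored either as a single string or as a list of lines that keep their own trailing newlines. The minimal Go sketch below extracts only cell type and source, which matches the behaviour described above: outputs are never decoded because the struct has no field for them. notebookSources and joinSource are hypothetical names; this is not the engine's converter.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// cell holds what the conversion keeps from a notebook cell.
type cell struct {
	Kind   string // "markdown", "code", or "raw"
	Source string
}

// notebookSources decodes an .ipynb document. Fields not declared below,
// such as "outputs", are skipped by encoding/json.
func notebookSources(data []byte) ([]cell, error) {
	var nb struct {
		Cells []struct {
			CellType string          `json:"cell_type"`
			Source   json.RawMessage `json:"source"`
		} `json:"cells"`
	}
	if err := json.Unmarshal(data, &nb); err != nil {
		return nil, err
	}
	cells := make([]cell, 0, len(nb.Cells))
	for _, c := range nb.Cells {
		src, err := joinSource(c.Source)
		if err != nil {
			return nil, err
		}
		cells = append(cells, cell{Kind: c.CellType, Source: src})
	}
	return cells, nil
}

// joinSource accepts both encodings of a cell's source: a single string, or
// a list of lines that already carry their own trailing "\n".
func joinSource(raw json.RawMessage) (string, error) {
	if len(raw) == 0 {
		return "", nil
	}
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		return s, nil
	}
	var lines []string
	if err := json.Unmarshal(raw, &lines); err != nil {
		return "", err
	}
	return strings.Join(lines, ""), nil
}

func main() {
	nb := []byte(`{"cells":[
	  {"cell_type":"markdown","source":["# Analysis\n","Intro text"]},
	  {"cell_type":"code","source":"print(1)","outputs":[{"output_type":"stream","name":"stdout","text":["1\n"]}]}
	],"nbformat":4}`)
	cells, err := notebookSources(nb)
	if err != nil {
		panic(err)
	}
	for _, c := range cells {
		fmt.Printf("[%s] %q\n", c.Kind, c.Source)
	}
}
```

Each cell becomes one unit here, which is also why a notebook with many cells can grow the node count quickly.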
5. HTML report to PDF
Scenario: A server-side template renders an HTML report, which is then converted to PDF.
curl -s "http://localhost:5741/convert?format=html" \
-H "Content-Type: text/html" \
--data-binary @report.html \
-o report.pdf
What to watch:
- The engine sanitises HTML input with bluemonday before parsing. Allowed tags are standard text elements (p, h1–h6, ul, ol, li, table, pre, code, blockquote, strong, em, a, img, hr). Script tags, iframes, and style attributes are stripped.
- CSS is not applied. Inline styles, class attributes, and external stylesheets have no effect on the PDF output; the engine's own config/style.yaml controls all styling.
- Complex HTML layouts (grid, flexbox, floats) are not reproduced. The engine extracts the text content and formats it as a document.
6. REST API with token authentication
Scenario: The engine is deployed as an internal service behind an API gateway. Only authorised callers may use it.
Set ANNAVE_INTERNAL_TOKEN to a strong random secret:
ANNAVE_INTERNAL_TOKEN=my-secret go run cmd/server/main.go
All requests must include the token:
curl -s "http://localhost:5741/convert?format=md" \
-H "Content-Type: text/plain" \
-H "X-Internal-Token: my-secret" \
--data-binary @README.md \
-o output.pdf
Without the token:
HTTP 401
{"error":{"code":"ENGINE_ERR_UNAUTHORIZED","stage":"input","message":"Missing or invalid internal token."}}
What to watch:
- If ANNAVE_INTERNAL_TOKEN is empty or unset, token enforcement is disabled. This is intentional for local development; never run without a token in production.
- The token must match exactly, including case and any trailing whitespace. Generate a strong token with openssl rand -hex 32.
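Where openssl is not available, a token of the same shape (32 random bytes, hex-encoded to 64 characters) can be generated with Go's crypto/rand. The sketch also includes tokenMatches, a hypothetical function that illustrates the documented matching rules; it is not the engine's middleware.

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// newToken returns 32 random bytes encoded as 64 hex characters,
// the same shape as the output of `openssl rand -hex 32`.
func newToken() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// tokenMatches is a hypothetical illustration of the documented rules:
// an empty expected token disables enforcement; otherwise the header value
// must match byte for byte, case and whitespace included. For equal-length
// inputs the comparison takes the same time regardless of content, so timing
// does not reveal how many leading bytes matched.
func tokenMatches(expected, got string) bool {
	if expected == "" {
		return true
	}
	return subtle.ConstantTimeCompare([]byte(expected), []byte(got)) == 1
}

func main() {
	tok, err := newToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("ANNAVE_INTERNAL_TOKEN=" + tok)
	fmt.Println(tokenMatches("my-secret", "my-secret"), tokenMatches("my-secret", "my-secret "))
}
```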
7. Large document with pagination
Scenario: Convert a 50-page technical specification or legal document.
curl -s "http://localhost:5741/convert?format=md" \
-H "Content-Type: text/plain" \
--data-binary @spec.md \
-o spec.pdf
If the document exceeds the default limits:
{
  "error": {
    "code": "ENGINE_ERR_TOO_MANY_PAGES",
    "stage": "pagination",
    "message": "Document produced 147 pages; the maximum is 100. Reduce the document size or increase max_pages in config/limits.yaml."
  }
}
Increase the limit in config/limits.yaml:
document:
  max_pages: 500
Then rebuild and redeploy.
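Scripts that call the engine can branch on the code field of the error envelope instead of matching message text. A minimal Go sketch, assuming only the envelope shape shown in the examples ({"error":{"code","stage","message"}}); engineError and decodeEngineError are hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// engineError mirrors the error envelope shown in the examples:
// {"error":{"code":"...","stage":"...","message":"..."}}
type engineError struct {
	Code    string `json:"code"`
	Stage   string `json:"stage"`
	Message string `json:"message"`
}

func (e *engineError) Error() string {
	return fmt.Sprintf("%s (stage %s): %s", e.Code, e.Stage, e.Message)
}

// decodeEngineError parses an error response body. It returns an error of
// its own if the body is not JSON or has no "error" object.
func decodeEngineError(body []byte) (*engineError, error) {
	var env struct {
		Error *engineError `json:"error"`
	}
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, err
	}
	if env.Error == nil {
		return nil, errors.New(`response has no "error" object`)
	}
	return env.Error, nil
}

func main() {
	body := []byte(`{"error":{"code":"ENGINE_ERR_TOO_MANY_PAGES","stage":"pagination","message":"Document produced 147 pages; the maximum is 100. Reduce the document size or increase max_pages in config/limits.yaml."}}`)
	e, err := decodeEngineError(body)
	if err != nil {
		panic(err)
	}
	switch e.Code {
	case "ENGINE_ERR_TOO_MANY_PAGES":
		fmt.Println("page limit reached; raise document.max_pages:", e.Message)
	case "ENGINE_ERR_UNAUTHORIZED":
		fmt.Println("check the X-Internal-Token header")
	default:
		fmt.Println(e)
	}
}
```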
8. Image to single-page PDF
Scenario: Wrap a PNG or JPEG in a PDF page, for archiving or consistent delivery format.
curl -s http://localhost:5741/convert \
-F "file=@diagram.png" \
-o diagram.pdf
Or send the image as a raw request body:
curl -s "http://localhost:5741/convert?format=png" \
-H "Content-Type: application/octet-stream" \
--data-binary @diagram.png \
-o diagram.pdf
What to watch:
- Supported formats: PNG, JPEG, GIF, WebP.
- The image is embedded at its original dimensions, scaled down to fit within the page's text column width if it would otherwise overflow.
- Very large images (thousands of pixels in either dimension) are scaled down; output quality depends on the original resolution.
- The format is auto-detected from magic bytes, so the file extension does not need to be correct.
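Magic-byte detection and fit-to-column scaling can each be sketched in a few lines of Go. The signatures checked are the standard file headers for each listed format; the column width, its units, and the helper names sniffImage and fitToWidth are assumptions for illustration, not the engine's code.

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffImage checks the leading "magic bytes" of the supported formats and
// returns "" for anything else. Illustrative only.
func sniffImage(b []byte) string {
	switch {
	case bytes.HasPrefix(b, []byte("\x89PNG\r\n\x1a\n")):
		return "png"
	case bytes.HasPrefix(b, []byte{0xFF, 0xD8, 0xFF}):
		return "jpeg"
	case bytes.HasPrefix(b, []byte("GIF87a")), bytes.HasPrefix(b, []byte("GIF89a")):
		return "gif"
	case len(b) >= 12 && bytes.Equal(b[:4], []byte("RIFF")) && bytes.Equal(b[8:12], []byte("WEBP")):
		// WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
		return "webp"
	}
	return ""
}

// fitToWidth keeps the original size unless the image is wider than the
// text column, in which case both sides shrink by the same factor so the
// aspect ratio is preserved. All three values must share one unit.
func fitToWidth(width, height, maxWidth float64) (float64, float64) {
	if width <= maxWidth {
		return width, height
	}
	return maxWidth, height * maxWidth / width
}

func main() {
	header := []byte("\x89PNG\r\n\x1a\n")
	fmt.Println(sniffImage(header))
	w, h := fitToWidth(3000, 1500, 480) // hypothetical 480-unit text column
	fmt.Println(w, h)
}
```

Because only the leading bytes are inspected, a misnamed file (for example a JPEG saved as .png) is still identified correctly.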