Use Cases

Eight practical scenarios with working request examples and notes on edge cases. No marketing language — just what works and what to watch for.

1. Markdown documentation to PDF

Scenario: A CLI tool or CI pipeline converts a Markdown README or technical specification to a PDF for distribution.

```bash
curl -s "http://localhost:5741/convert?format=md" \
  -H "Content-Type: text/plain" \
  --data-binary @README.md \
  -o output.pdf
```

Or via file upload (auto-detects format from .md extension):

```bash
curl -s http://localhost:5741/convert \
  -F "file=@README.md" \
  -o output.pdf
```

What to watch:

- Embedded HTML fragments in Markdown (`<div>`, `<iframe>`) are not sanitised. The engine sanitises HTML only when the format is explicitly `html` or when auto-detection identifies the input as HTML; in Markdown input, the fragments are treated as literal text inside the surrounding paragraph.
- GFM tables are fully supported. GitHub-specific extensions (task lists, footnotes) are parsed as plain text.
- Fenced code blocks preserve the language hint in the AST (`Lang` field), but the renderer does not yet apply syntax highlighting.

2. CSV data table to PDF

Scenario: Export a spreadsheet or database query result as a formatted PDF table.

```bash
curl -s "http://localhost:5741/convert?format=csv" \
  -H "Content-Type: text/plain" \
  --data-binary @report.csv \
  -o report.pdf
```

Example input (report.csv):

```csv
Name,Role,Status
Anna Veretennykova,Lead,Active
PDF Engine,Service,Running
```

What to watch:

- The first row is treated as column headers.
- Very wide tables (many columns) may produce columns too narrow to read at A4 width. Split the columns across several narrower tables in the source data.
- TSV (tab-separated) files are auto-detected both by extension (`.tsv`) and by content (a tab-delimiter heuristic).
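
Because the first row is always read as headers, an export from a database or spreadsheet should write the header row first and let a real CSV writer handle quoting. A sketch using Go's standard `encoding/csv`; the `row` type and `toCSV` helper are hypothetical stand-ins for a query result.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"strings"
)

// row is a hypothetical query-result record.
type row struct {
	Name, Role, Status string
}

// toCSV writes the header row first (the engine treats it as column headers),
// then one record per row. encoding/csv quotes fields containing commas,
// quotes or newlines, so a value like "Smith, J." stays in one column.
func toCSV(header []string, rows []row) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write(header); err != nil {
		return "", err
	}
	for _, r := range rows {
		if err := w.Write([]string{r.Name, r.Role, r.Status}); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	out, err := toCSV([]string{"Name", "Role", "Status"}, []row{
		{"Anna Veretennykova", "Lead", "Active"},
		{"PDF Engine", "Service", "Running"},
	})
	if err == nil {
		err = os.WriteFile("report.csv", []byte(out), 0o644)
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

With these two rows it writes the same `report.csv` as the example input above.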

3. DOCX upload to PDF

Scenario: Accept a DOCX file uploaded from a browser and return a PDF.

```bash
curl -s http://localhost:5741/convert \
  -F "file=@document.docx" \
  -o document.pdf
```

From a browser form:

```html
<form action="http://localhost:5741/convert" method="post" enctype="multipart/form-data">
  <input type="file" name="file" accept=".docx">
  <button type="submit">Convert</button>
</form>
```

What to watch:

- DOCX support covers headings (Heading1–Heading6, Title and Subtitle styles), paragraphs, bold/italic/strikethrough runs, ordered and unordered lists (numbering is read from `numbering.xml`), and tables.
- Embedded images are not yet extracted or rendered; they are dropped from the PDF without a placeholder.
- Password-protected DOCX files cannot be parsed; the engine returns `ENGINE_ERR_PARSE_FAILED`.
- The DOCX parser has no external dependencies; it reads the ZIP/XML structure directly.

4. Jupyter Notebook to PDF

Scenario: Convert a .ipynb notebook for sharing or archiving.

```bash
curl -s http://localhost:5741/convert \
  -F "file=@analysis.ipynb" \
  -o analysis.pdf
```

What to watch:

- Code cells are rendered as code blocks in the monospace font.
- Markdown cells are fully parsed.
- Cell output (stdout, stderr, display_data) is not included; only cell source content is converted.
- Large notebooks with many cells may hit `document.max_nodes`. Increase the limit in `config/limits.yaml` if needed.
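
To preview what will actually be converted, a Go sketch can read the notebook JSON and keep only each cell's `cell_type` and `source`, never decoding `outputs`. It assumes the nbformat v4 layout, where `source` may be a single string or a list of strings to concatenate. The cell count it prints is only a rough size signal; how cells map to the nodes counted by `document.max_nodes` is not documented here. The `cellSources` helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// notebook decodes only the nbformat v4 fields this sketch needs; cell
// outputs are skipped, matching the engine's documented behaviour.
type notebook struct {
	Cells []struct {
		CellType string          `json:"cell_type"`
		Source   json.RawMessage `json:"source"`
	} `json:"cells"`
}

type cell struct{ Kind, Source string }

// joinSource accepts "source" as either a string or a list of strings.
func joinSource(raw json.RawMessage) (string, error) {
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		return s, nil
	}
	var parts []string
	if err := json.Unmarshal(raw, &parts); err != nil {
		return "", err
	}
	return strings.Join(parts, ""), nil
}

// cellSources returns the type and full source text of every cell.
func cellSources(data []byte) ([]cell, error) {
	var nb notebook
	if err := json.Unmarshal(data, &nb); err != nil {
		return nil, err
	}
	cells := make([]cell, 0, len(nb.Cells))
	for _, c := range nb.Cells {
		src, err := joinSource(c.Source)
		if err != nil {
			return nil, err
		}
		cells = append(cells, cell{c.CellType, src})
	}
	return cells, nil
}

func run() error {
	data, err := os.ReadFile("analysis.ipynb")
	if err != nil {
		return err
	}
	cells, err := cellSources(data)
	if err != nil {
		return err
	}
	fmt.Printf("%d cells (outputs ignored)\n", len(cells))
	for i, c := range cells {
		fmt.Printf("%3d %-8s %d bytes of source\n", i, c.Kind, len(c.Source))
	}
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```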

5. HTML report to PDF

Scenario: A server-side template renders an HTML report, which is then converted to PDF.

```bash
curl -s "http://localhost:5741/convert?format=html" \
  -H "Content-Type: text/html" \
  --data-binary @report.html \
  -o report.pdf
```

What to watch:

- The engine sanitises HTML input with bluemonday before parsing. Allowed tags are standard text elements (p, h1–h6, ul, ol, li, table, pre, code, blockquote, strong, em, a, img, hr). Script tags, iframes, and style attributes are stripped.
- CSS is not applied. Inline styles, class attributes, and external stylesheets have no effect on the PDF output; the engine's own `config/style.yaml` controls all styling.
- Complex HTML layouts (grid, flexbox, floats) are not reproduced. The engine extracts the text content and formats it as a document.

6. REST API with token authentication

Scenario: The engine is deployed as an internal service behind an API gateway. Only authorised callers may use it.

Set ANNAVE_INTERNAL_TOKEN to a strong random secret:

```bash
ANNAVE_INTERNAL_TOKEN=my-secret go run cmd/server/main.go
```

All requests must include the token:

```bash
curl -s "http://localhost:5741/convert?format=md" \
  -H "Content-Type: text/plain" \
  -H "X-Internal-Token: my-secret" \
  --data-binary @README.md \
  -o output.pdf
```

Without the token:

```text
HTTP 401
{"error":{"code":"ENGINE_ERR_UNAUTHORIZED","stage":"input","message":"Missing or invalid internal token."}}
```

What to watch:

- If `ANNAVE_INTERNAL_TOKEN` is empty or unset, token enforcement is disabled. This is intentional for local development; never run without a token in production.
- The token must match exactly, including case and any trailing whitespace, so a stray newline or space on either side produces the 401 above. Generate a strong token with `openssl rand -hex 32`.

7. Large document with pagination

Scenario: Convert a 50-page technical specification or legal document.

```bash
curl -s "http://localhost:5741/convert?format=md" \
  -H "Content-Type: text/plain" \
  --data-binary @spec.md \
  -o spec.pdf
```

If the rendered document exceeds the configured page limit, the engine returns an error like this:

```json
{
  "error": {
    "code": "ENGINE_ERR_TOO_MANY_PAGES",
    "stage": "pagination",
    "message": "Document produced 147 pages; the maximum is 100. Reduce the document size or increase max_pages in config/limits.yaml."
  }
}
```

Increase the limit in config/limits.yaml:

```yaml
document:
  max_pages: 500
```

Then rebuild and redeploy.
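
Clients can branch on the `code` field rather than parsing the message text. A Go sketch that decodes the `{"error":{"code", "stage", "message"}}` envelope shown in this document; the `engineError` type and the handling chosen for each code are illustrative, not part of the engine.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// engineError mirrors the inner object of the engine's error envelope.
type engineError struct {
	Code    string `json:"code"`
	Stage   string `json:"stage"`
	Message string `json:"message"`
}

func (e *engineError) Error() string {
	return fmt.Sprintf("%s (%s): %s", e.Code, e.Stage, e.Message)
}

// parseEngineError decodes an error response body. It fails if the body is
// not JSON or has no "error.code", e.g. when the body is actually a PDF.
func parseEngineError(body []byte) (*engineError, error) {
	var env struct {
		Error *engineError `json:"error"`
	}
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, err
	}
	if env.Error == nil || env.Error.Code == "" {
		return nil, errors.New("response is not an engine error envelope")
	}
	return env.Error, nil
}

func main() {
	body := []byte(`{"error":{"code":"ENGINE_ERR_TOO_MANY_PAGES","stage":"pagination","message":"Document produced 147 pages; the maximum is 100."}}`)
	e, err := parseEngineError(body)
	if err != nil {
		fmt.Println(err)
		return
	}
	switch e.Code {
	case "ENGINE_ERR_TOO_MANY_PAGES":
		fmt.Println("raise document.max_pages in config/limits.yaml, then rebuild:", e.Message)
	case "ENGINE_ERR_UNAUTHORIZED":
		fmt.Println("check the X-Internal-Token header")
	default:
		fmt.Println(e)
	}
}
```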

8. Image to single-page PDF

Scenario: Wrap a PNG or JPEG in a PDF page, for archiving or consistent delivery format.

```bash
curl -s http://localhost:5741/convert \
  -F "file=@diagram.png" \
  -o diagram.pdf
```

Or send the raw bytes as the request body:

```bash
curl -s "http://localhost:5741/convert?format=png" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @diagram.png \
  -o diagram.pdf
```

What to watch:

- Supported formats: PNG, JPEG, GIF, WebP.
- The image is embedded at its original dimensions, or scaled down to fit the page's text column width if it would otherwise overflow.
- Very large images (thousands of pixels across) are scaled down; output quality depends on the original resolution.
- The format is auto-detected from magic bytes, so the file extension does not need to be correct.