Day 49: Self-Hosted PDF Processing APIs – Form Filling, Metadata, HTML to PDF, and Header Templates – 7 Days of Docker

27. May 2026

PDF operations are one of those recurring pain points that never fully go away. You need to fill a contract template, strip and rewrite document metadata before archiving, generate an invoice from an HTML template, and stamp every page with a branded header. The default answer is a SaaS API subscription that charges per document and routes your files through someone else’s infrastructure. The better answer is a small collection of open-source Docker containers that handle all of this on your own server.

This article maps four specific PDF challenges – form filling, metadata editing, HTML-to-PDF conversion, and header/footer templates – to the Docker tools built to solve them. Day 48 of this series covers Gotenberg in depth, so this article only references it where relevant and focuses on the remaining options.

What Does This Landscape Actually Cover?

Several open-source Docker projects solve parts of the PDF problem. None solves all of it, which is why understanding each tool’s strength matters:

  • Stirling-PDF – All-in-one web UI and REST API backed by LibreOffice and Tesseract. Best for teams that need a single container covering 70+ operations including metadata, conversion, OCR, security, and a full AcroForm form API (/api/v1/form/fill, /api/v1/form/fields, /api/v1/form/modify-fields, and more).
  • KatSick/pdftk-as-a-service – Lightweight Go + Gin wrapper around PDFtk. Fills AcroForm field values via a single REST endpoint. A minimal alternative if you need form filling without running the full Stirling-PDF stack.
  • torfs-ict/docker-pdftk-webservice – PHP + Symfony wrapper around PDFtk. Currently only supports merging PDFs over HTTP. Very limited scope.
  • WeasyPrint Docker (4teamwork/weasyprint-docker) – Python aiohttp service that generates PDFs from HTML using WeasyPrint. Headers and footers are defined in CSS @page rules – no separate HTML file upload needed. Strong CSS Paged Media compliance.
  • wkhtmltopdf Docker – C++ binary wrapped in a container. Accepts base64 HTML via JSON. The upstream binary is abandoned and has rendering bugs with modern CSS. Avoid for new projects.
  • Gotenberg – Covered fully in Day 48. Referenced below for context only.

Four Problems, Four Tool Choices

Problem 1 – Filling PDF Form Fields

PDF forms use AcroForm fields – named input slots embedded in the document. Filling them programmatically means supplying a JSON payload that maps field names to values.

Best option: Stirling-PDF /api/v1/form/fill

Stirling-PDF ships a dedicated form API under the /api/v1/form/ base path, implemented in FormFillController using Apache PDFBox. It covers the full lifecycle of a PDF form in a single container:

  • POST /api/v1/form/fields – Inspect all form fields and their metadata
  • POST /api/v1/form/fields-with-coordinates – Same, with widget pixel coordinates for rendering
  • POST /api/v1/form/extract-csv – Export all field names and current values as CSV
  • POST /api/v1/form/extract-xlsx – Export all field names and current values as XLSX
  • POST /api/v1/form/fill – Fill fields from a JSON object of field-name/value pairs
  • POST /api/v1/form/modify-fields – Update existing field definitions
  • POST /api/v1/form/delete-fields – Remove specific fields from the document

The fill endpoint accepts file (the PDF template), data (a JSON object of field-name to value pairs), and an optional flatten boolean that locks field values into static content after filling.
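As a rough illustration of those three parts, here is a minimal Python sketch that builds the multipart request with the standard library only. The function name build_fill_request, the template.pdf filename, and the host in the usage comment are placeholders, and sending flatten as a lowercase string is an assumption rather than documented behaviour.

```python
import json
import uuid
import urllib.request


def build_fill_request(base_url, pdf_bytes, values, flatten=False):
    """Build (without sending) a multipart POST for /api/v1/form/fill."""
    boundary = uuid.uuid4().hex
    dash = f"--{boundary}\r\n".encode("ascii")

    def text_part(name, value):
        head = f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        return dash + head.encode("utf-8") + value.encode("utf-8") + b"\r\n"

    # "file": the PDF template. The filename is arbitrary.
    file_head = (
        'Content-Disposition: form-data; name="file"; filename="template.pdf"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    )
    body = dash + file_head.encode("ascii") + pdf_bytes + b"\r\n"
    # "data": one JSON object mapping field names to values.
    body += text_part("data", json.dumps(values))
    # "flatten": optional; assumed here to be accepted as "true"/"false".
    body += text_part("flatten", "true" if flatten else "false")
    body += f"--{boundary}--\r\n".encode("ascii")

    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/form/fill",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Usage (needs a running instance, so it is left commented out):
# req = build_fill_request("http://localhost:8080",
#                          open("contract.pdf", "rb").read(),
#                          {"customer_name": "Ada Lovelace"}, flatten=True)
# with urllib.request.urlopen(req) as resp:
#     open("filled.pdf", "wb").write(resp.read())
```

Building the request separately from sending it keeps the payload easy to inspect before anything leaves your machine.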

KatSick/pdftk-as-a-service as a lightweight alternative

KatSick/pdftk-as-a-service wraps PDFtk in a single Go/Gin REST endpoint. It is still a valid option if you specifically want a minimal, standalone form-fill microservice without the full Stirling-PDF stack. For any new deployment that already runs Stirling-PDF, the built-in /api/v1/form/fill endpoint removes the need for a separate container.

Problem 2 – Changing PDF Metadata

Metadata – title, author, subject, keywords, creator – is stored inside the PDF but often wrong, empty, or left over from a template. Correcting it via API is a clean operation that does not re-render any content.

Best option: Stirling-PDF /api/v1/misc/update-metadata

Stirling-PDF’s REST API exposes a dedicated metadata update endpoint. POST the PDF with individual fields as multipart form data, get the corrected PDF back. Simple and reliable.

Problem 3 – HTML to PDF Conversion

HTML-to-PDF is fundamentally about fidelity: does the rendered PDF look exactly like the HTML in a browser? Three self-hosted approaches exist:

  • Chromium-based (Gotenberg, Day 48): Runs a full headless browser. JavaScript executes, web fonts load, modern CSS applies. Best fidelity. Gotenberg’s /forms/chromium/convert/html endpoint is the reference implementation for this approach.
  • CSS Paged Media (WeasyPrint): Interprets CSS @page rules without a browser engine. No JavaScript. Works well for structured, predictable documents where you control the stylesheet. Much smaller container footprint.
  • LibreOffice-based (Stirling-PDF): Handles HTML, but its primary strength is Office formats (.docx, .xlsx, .pptx). Good for documents that originate in Word rather than HTML templates.

Problem 4 – Header and Footer Templates

Two distinct approaches exist here:

  • Separate HTML file upload (Gotenberg, Day 48): Upload header.html and footer.html alongside the main document. Chromium renders them with dynamic CSS classes (.pageNumber, .totalPages, .date, .title). The header and footer run in a separate Chromium context – your main CSS does not apply there and JavaScript does not execute.
  • CSS @page rules (WeasyPrint): Define headers and footers entirely in CSS using @top-center, @bottom-left etc. margin boxes. No separate file upload. Full access to the document’s styles. Best option when you own the full HTML and stylesheet.

Docker Compose Setups

Stirling-PDF

Use the latest-fat variant to unlock LibreOffice-based conversion, Office format support, and enhanced HTML handling:

KatSick/pdftk-as-a-service

A minimal container with no configuration required beyond the port mapping:

WeasyPrint Docker

The 4teamwork/weasyprint-docker image runs a Python aiohttp service. Upload an HTML file with optional CSS and receive a PDF:

torfs-ict/docker-pdftk-webservice

Useful only if you need HTTP-driven PDF merging and nothing else:

Installation Steps

1. Choose Your Stack

Pick tools based on what you actually need:

  • All-in-one PDF operations plus a web UI: Stirling-PDF
  • Programmatic AcroForm filling: Stirling-PDF /api/v1/form/fill (or KatSick as a standalone micro-alternative)
  • HTML-to-PDF with CSS Paged Media headers: WeasyPrint Docker
  • HTML-to-PDF with browser rendering + per-page header/footer files: Gotenberg (Day 48)

2. Create the Project Directory

3. Pull Images

Note: stirlingtools/stirling-pdf:latest-fat is around 2.5 GB due to LibreOffice. The other images are well under 200 MB each.

4. Launch the Stack

5. Verify Each Service

Environment Variables Explained

Stirling-PDF: DOCKER_ENABLE_SECURITY

Purpose: Master switch for the authentication system in Docker deployments.

Format: Boolean string

Default: false

Must be set to true before enabling login. Leaving this false while enabling SECURITY_ENABLELOGIN produces unexpected behaviour.

Stirling-PDF: SECURITY_ENABLELOGIN

Purpose: Requires users to log in before accessing the UI or API.

Format: Boolean string

Default: false

When enabled, API requests must include an X-API-KEY header. Generate the key from user settings after first login.
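A minimal Python sketch of attaching that header, using only the standard library. The helper name with_api_key and the STIRLING_API_KEY environment variable are hypothetical and are not something Stirling-PDF reads itself.

```python
import os
import urllib.request


def with_api_key(request, api_key=None):
    """Attach the X-API-KEY header to an existing urllib request."""
    # Fall back to an environment variable so the key stays out of source code.
    key = api_key or os.environ.get("STIRLING_API_KEY", "")
    if not key:
        raise ValueError("no API key given and STIRLING_API_KEY is not set")
    # urllib normalises the capitalisation of header names when storing them;
    # HTTP header names are case-insensitive, so the server sees the same header.
    request.add_header("X-API-KEY", key)
    return request
```

Failing early on a missing key is a deliberate choice here: an unauthenticated call would otherwise only surface later as a rejected request.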

Stirling-PDF: SECURITY_INITIALLOGIN_USERNAME / PASSWORD

Purpose: Credentials for the administrator account created on first startup.

Security note: Change the default password immediately after first login.

Stirling-PDF: SYSTEM_MAXFILESIZE

Purpose: Maximum single file upload size in megabytes, at the application layer.

Default: 2000

Must be paired with both SPRING_SERVLET_MULTIPART_MAX_FILE_SIZE and SPRING_SERVLET_MULTIPART_MAX_REQUEST_SIZE – Spring enforces its own limit independently and will silently reject large files before Stirling-PDF’s check applies.
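One way to keep the three limits from drifting apart is to derive them from a single number. The Python sketch below does that; the helper name upload_limit_env is made up, and the "MB" suffix on the Spring values is an assumption based on how Spring Boot usually parses data sizes.

```python
def upload_limit_env(max_mb):
    """Return the three upload-limit variables derived from one value in MB."""
    if not isinstance(max_mb, int) or max_mb <= 0:
        raise ValueError("max_mb must be a positive integer")
    spring_size = f"{max_mb}MB"  # assumed Spring data-size notation
    return {
        "SYSTEM_MAXFILESIZE": str(max_mb),  # plain megabytes, no unit
        "SPRING_SERVLET_MULTIPART_MAX_FILE_SIZE": spring_size,
        "SPRING_SERVLET_MULTIPART_MAX_REQUEST_SIZE": spring_size,
    }


# Print KEY=value lines that can be pasted into an environment section.
for key, value in upload_limit_env(2000).items():
    print(f"{key}={value}")
```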

Stirling-PDF: JAVA_TOOL_OPTIONS

Purpose: JVM startup flags. Controls heap memory allocation for the Spring Boot process.

Default: Not set

-Xmx sets the maximum heap. Always leave headroom below total container memory for LibreOffice and OS overhead.
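The headroom rule is simple arithmetic, sketched here in Python. The default of 1536 MB reserved for LibreOffice and the OS is an illustrative guess, not a measured figure, and java_tool_options is a hypothetical helper name.

```python
def java_tool_options(container_limit_mb, headroom_mb=1536, initial_mb=512):
    """Derive a JAVA_TOOL_OPTIONS value from the container memory limit.

    headroom_mb is the memory kept free for LibreOffice and the OS.
    """
    heap_mb = container_limit_mb - headroom_mb
    if heap_mb < initial_mb:
        raise ValueError("container limit too small for the requested headroom")
    return f"-Xms{initial_mb}m -Xmx{heap_mb}m"


# A 6 GB container with 2 GB held back leaves a 4 GB maximum heap.
print(java_tool_options(6144, headroom_mb=2048))
```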

Stirling-PDF: INSTALL_BOOK_AND_ADVANCED_HTML_OPS

Purpose: Installs Calibre and additional conversion tools at startup to enable ebook and advanced HTML processing routes.

Default: false

Only meaningful on the latest-fat image. Has no effect on standard or ultra-lite images.

KatSick/pdftk-as-a-service

This container has no application-level environment variables. It binds to port 8080 inside the container and requires no configuration. Everything is per-request via multipart form data.

WeasyPrint Docker

The 4teamwork weasyprint-docker image also requires no environment variables for basic operation. It starts an aiohttp server on port 5000 immediately.

Volume Mounts Explained

Stirling-PDF: /configs

Purpose: Stores settings.yml and the embedded H2 database, which contains user accounts, API keys, and the audit log.

Mount Point: /configs

Critical when login is enabled. Back this volume up before any container updates.

Stirling-PDF: /usr/share/tessdata

Purpose: Tesseract OCR trained data files. Download one .traineddata file per language and place it here to add OCR support for additional languages without rebuilding the image.

Mount Point: /usr/share/tessdata

Stirling-PDF: /pipeline

Purpose: Stores no-code automation pipeline definitions created in the UI. Each definition is a JSON file that chains multiple PDF operations into a reusable workflow.

Mount Point: /pipeline

KatSick and WeasyPrint

Neither tool requires persistent volumes. Both are stateless: they accept a request, process it in memory, return the result, and retain nothing between calls.

Common Use Cases

Fill a Contract or Invoice Template

Create a PDF template with named AcroForm fields in any PDF editor. First inspect the field names, then POST the values as JSON to Stirling-PDF. The returned PDF has all fields populated and is ready to send. Pass flatten=true to lock the values into static content before delivery:

The flatten=false default leaves fields editable in the returned PDF – useful when the recipient needs to add a signature or make corrections before finalising.

Fix Metadata Before Archiving

Documents generated from templates often carry wrong or empty metadata. Correct it via Stirling-PDF before storing in a document management system:
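Before sending anything, it helps to validate which metadata keys you are about to set. The Python sketch below prepares the text fields for /api/v1/misc/update-metadata; the PDF itself would travel alongside them as the fileInput part. The helper name metadata_fields and the sample values are illustrative.

```python
# Field names the metadata endpoint takes as multipart form values.
ALLOWED_FIELDS = ("title", "author", "subject", "keywords", "creator", "producer")


def metadata_fields(**changes):
    """Return only the metadata fields to send, rejecting unknown names."""
    unknown = sorted(set(changes) - set(ALLOWED_FIELDS))
    if unknown:
        raise ValueError(f"unsupported metadata fields: {', '.join(unknown)}")
    # Skip None so untouched fields are left out of the request entirely.
    return {
        name: str(changes[name])
        for name in ALLOWED_FIELDS
        if changes.get(name) is not None
    }


fields = metadata_fields(title="Invoice 2026-014", author="Accounts", subject=None)
print(fields)
```

Rejecting unknown names up front catches typos such as "autor" that would otherwise be sent as an unrecognised form field.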

Generate a Branded PDF from an HTML Template using WeasyPrint

Write the report in HTML and CSS. Define headers and footers directly in the stylesheet using CSS Paged Media @page rules. Upload to WeasyPrint and receive the PDF:
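To make the stylesheet part concrete, this Python sketch assembles an HTML document whose header and footer live entirely in an @page rule. It only produces the HTML string; how the file is uploaded depends on the service's endpoint, which is not shown here. The names build_report_html and css_string, the page size, and the margins are illustrative choices.

```python
import html


def css_string(text):
    """Quote text as a CSS string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_report_html(title, body_html, header_text):
    """Return a full HTML document with a CSS Paged Media header and footer."""
    page_css = (
        "@page {\n"
        "  size: A4;\n"
        "  margin: 25mm 20mm;\n"
        # Running header: a fixed string in the top-centre margin box.
        f"  @top-center {{ content: {css_string(header_text)}; font-size: 9pt; }}\n"
        # Running footer: page counters resolved by the renderer.
        '  @bottom-right { content: "Page " counter(page) " of " counter(pages); font-size: 9pt; }\n'
        "}\n"
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title>"
        f"<style>\n{page_css}</style></head>\n"
        f"<body>{body_html}</body></html>\n"
    )


document = build_report_html("Q2 Report", "<h1>Q2 Report</h1>", "ACME Corp")
```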

WeasyPrint supports all CSS Paged Media margin boxes: @top-left, @top-center, @top-right, @bottom-left, @bottom-center, @bottom-right, and corner boxes. Headers and footers inherit document fonts and colours because they share the same rendering context, unlike the separate Chromium contexts used by Gotenberg.

Merge and Secure a Document Package

Combine multiple PDFs with Stirling-PDF, stamp the result with a watermark, then add a password last so no later step has to open an encrypted file – all via three API calls in sequence:
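The chaining logic itself is small: feed the output of each call into the next. The Python sketch below leaves the HTTP transport to a caller-supplied post function (any function that sends multipart form data and returns the response body). It watermarks before encrypting, on the assumption that an encrypted file would otherwise need its password for the later step. The /api/v1/security/add-password path, its password field, and the use of fileInput for every step are assumptions to check against your instance's API documentation.

```python
# Each step is (endpoint path, extra multipart text fields).
STEPS = [
    ("/api/v1/general/merge-pdfs", {}),
    ("/api/v1/misc/add-watermark", {
        "watermarkType": "text",
        "watermarkText": "CONFIDENTIAL",
        "fontSize": "30",
        "rotation": "45",
        "opacity": "0.5",
    }),
    # Assumed endpoint and field name; verify before relying on them.
    ("/api/v1/security/add-password", {"password": "change-me"}),
]


def run_chain(pdfs, steps, post):
    """Run steps in order, passing each result on as the next input.

    pdfs  -- list of (filename, bytes) tuples, in merge order
    post  -- callable(path, fields, files) -> bytes, supplied by the caller;
             files is a list of (form_field, filename, bytes)
    """
    current = list(pdfs)
    for path, fields in steps:
        files = [("fileInput", name, data) for name, data in current]
        result = post(path, fields, files)
        # After the first call there is exactly one document to carry forward.
        current = [("package.pdf", result)]
    return current[0][1]
```

Keeping the transport injectable means the chain can be exercised with a fake post function before it ever touches a real server.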

OCR and Archive Scanned Documents

Add a searchable text layer to scanned PDFs via Stirling-PDF’s Tesseract integration, then strip metadata before archiving:

Convert Office Documents to PDF

Stirling-PDF’s fat image includes LibreOffice. POST any Word, Excel, or PowerPoint file and receive a PDF without installing Office anywhere:

Conclusion

No single Docker image covers every PDF need perfectly, but Stirling-PDF comes closest. It wins on breadth – 70+ tools, a polished web UI, a complete REST API, active maintenance – and now ships a full AcroForm form suite under /api/v1/form/, covering field inspection, filling, modification, deletion, and CSV/XLSX export. That closes programmatic form filling, the one gap left in earlier versions.

KatSick/pdftk-as-a-service remains a valid choice if you want a standalone form-fill microservice with zero overhead and no Java runtime to manage, but for any deployment already running Stirling-PDF it is redundant. WeasyPrint Docker handles the CSS Paged Media use case where headers and footers are defined in CSS @page rules rather than uploaded as separate HTML files.

For pixel-perfect HTML-to-PDF with headless browser rendering and per-page header and footer HTML templates, Day 48 of this series covers Gotenberg, which remains the strongest tool for that specific job.

The practical stack for most teams is Stirling-PDF as the primary workhorse. Add WeasyPrint if your invoice or report templates are CSS-driven. Add Gotenberg if you need browser-accurate rendering with per-page header and footer HTML files. All three run happily on the same Docker network and can be composed together in a few lines of YAML.

FAQ

Which self-hosted Docker tool is best for filling PDF form fields via API?

Stirling-PDF’s POST /api/v1/form/fill endpoint is the most capable option. Send the PDF template as file and a JSON object of field-name/value pairs as data. An optional flatten boolean locks the values into static content. The same container also provides field inspection (/fields), modification (/modify-fields), deletion (/delete-fields), and CSV/XLSX export in one stack. KatSick/pdftk-as-a-service remains a lightweight standalone alternative if you specifically want a minimal form-fill microservice without the full Stirling-PDF runtime.

What is Stirling-PDF?

Stirling-PDF is an open-source self-hosted PDF platform with 70+ tools including merge, split, OCR, convert, redact, watermark, and metadata editing. It provides both a React web UI and a full REST API at /api/v1/, backed by LibreOffice, Tesseract, and QPDF.

Does Stirling-PDF support programmatic form filling?

Yes. Stirling-PDF ships a dedicated form API at /api/v1/form/fill. POST the PDF template as file and a JSON object of field-name/value pairs as data. The optional flatten parameter converts filled fields to static content. The same base path also provides /fields (inspect), /fields-with-coordinates, /extract-csv, /extract-xlsx, /modify-fields, and /delete-fields.

How do I update PDF metadata with Stirling-PDF?

POST to /api/v1/misc/update-metadata with the PDF as fileInput and individual fields (title, author, subject, keywords, creator, producer) as multipart form values. The response is the updated PDF.

What is WeasyPrint Docker and when should I use it?

WeasyPrint Docker is a Python aiohttp service that converts HTML to PDF using WeasyPrint’s CSS Paged Media engine. Use it when your HTML and CSS are under your control, you need headers and footers defined in CSS @page rules, and you do not need JavaScript execution during rendering.

What is CSS Paged Media and why does it matter for headers and footers?

CSS Paged Media is a W3C specification that extends CSS to define how a document is paginated for print. The @page rule lets you declare margin boxes (@top-center, @bottom-right, etc.) that hold running headers and footers. WeasyPrint implements this specification, so headers and footers share the document’s full CSS context and can use the same fonts and colours as the main content.

How do I define a page number in a WeasyPrint header?

Use CSS counters inside a @page margin box: content: "Page " counter(page) " of " counter(pages);. WeasyPrint substitutes the correct values at render time. No JavaScript or separate file upload is needed.

What is the difference between WeasyPrint headers and Gotenberg headers?

WeasyPrint defines headers and footers in CSS @page rules that share the document’s full stylesheet. Gotenberg uses separate uploaded header.html and footer.html files rendered in an isolated Chromium context, where the main document’s CSS does not apply and JavaScript does not run.

What does torfs-ict/docker-pdftk-webservice do?

It is a PHP/Symfony webservice that merges PDFs via a POST to /merge with multipart file uploads. Only merge is currently implemented; the project acknowledges that other features are planned but not shipped. It is largely inactive and not recommended for new projects.

Should I use wkhtmltopdf Docker for new projects?

No. The upstream wkhtmltopdf binary is abandoned, has rendering bugs with modern CSS, and is no longer maintained. Use WeasyPrint for CSS Paged Media rendering or Gotenberg (Day 48) for Chromium-based rendering instead.

What are the three Stirling-PDF Docker image variants?

latest is the standard image for most PDF tools. latest-fat adds LibreOffice, extra fonts, and Calibre for Office conversion and highest-quality output. latest-ultra-lite strips it down to core operations only for resource-constrained environments.

Why do I need the fat image for HTML to PDF in Stirling-PDF?

HTML-to-PDF conversion in Stirling-PDF goes through LibreOffice, which is only present in the latest-fat image. The INSTALL_BOOK_AND_ADVANCED_HTML_OPS=true environment variable must also be set to install Calibre at startup.

Does Stirling-PDF have a web UI?

Yes. The React-based web UI is the primary interface and is accessible at port 8080. Every tool is available through the UI with no code required. The same operations are also available via the REST API for automation.

How do I authenticate Stirling-PDF API requests?

When login is enabled, every API request must include -H "X-API-KEY: your-key". Generate the key from the user settings page in the web UI after logging in with the initial admin credentials.

What port does each service use?

Stirling-PDF listens on 8080. KatSick/pdftk-as-a-service listens on 8080 inside the container (map to a different host port if running both). WeasyPrint Docker listens on 5000. torfs-ict listens on 80.

Does KatSick/pdftk-as-a-service support any operations besides form filling?

No. The container exposes a single endpoint, POST /fill-pdf. It does not support metadata editing, merging, splitting, or any other PDF operation. For anything beyond AcroForm field filling, use Stirling-PDF or another tool.

How do I merge PDFs via the Stirling-PDF API?

POST multiple files to /api/v1/general/merge-pdfs using repeated -F "fileInput=@file.pdf" fields. The files are merged in the order they are sent.

Can Stirling-PDF convert Word documents to PDF?

Yes, via POST /api/v1/convert/file/pdf with the latest-fat image. LibreOffice handles .docx, .xlsx, .pptx, .odt, and 100+ other Office formats.

How does OCR work in Stirling-PDF?

Tesseract OCR adds a searchable text layer to image-only or scanned PDFs via POST /api/v1/misc/ocr-pdf. English is bundled. Additional language packs are .traineddata files placed in the /usr/share/tessdata volume mount.

Can I add a watermark via the Stirling-PDF API?

Yes. POST /api/v1/misc/add-watermark accepts watermarkType (text or image), watermarkText, fontSize, rotation, and opacity as multipart fields.

Does Stirling-PDF store uploaded files?

No. Files are processed in memory and deleted immediately after each operation. Nothing is retained between requests. Sensitive documents do not persist on the server.

Which volume is most important to back up in Stirling-PDF?

The /configs volume. It contains settings.yml and the embedded H2 database holding user accounts, API keys, and audit logs. Losing it when login is enabled means losing all user access.

How do I control JVM memory in Stirling-PDF?

Set JAVA_TOOL_OPTIONS="-Xms512m -Xmx4g" to define initial and maximum JVM heap. Always leave memory for the OS and LibreOffice below the Docker container’s memory limit.

Can I redact text programmatically?

Yes. Stirling-PDF’s POST /api/v1/security/auto-redact endpoint accepts search terms or patterns and blacks them out in the output PDF, making the redaction permanent and unrecoverable.

What is the pipeline feature in Stirling-PDF?

Pipelines are no-code automation chains built in the web UI and saved to the /pipeline volume. You link multiple PDF operations in sequence – merge, then watermark, then compress – without writing any code. Pipelines can also be triggered programmatically.

How many languages does the Stirling-PDF UI support?

40+ languages including English, German, French, Spanish, Dutch, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Russian, and many more. Set the default with SYSTEM_DEFAULTLOCALE.

Can I run all these services together on one server?

Yes. Add each service to the same Docker Compose file with a shared network. Use different host port mappings to avoid conflicts. Stirling-PDF (fat) needs 4+ GB RAM; KatSick and WeasyPrint are each under 200 MB and negligible in comparison.
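Because two of these containers listen on 8080 internally, host ports need a quick conflict check. This Python sketch assigns each service its container port on the host when it is free and the next free port otherwise. The service names are placeholders for whatever you call them in your Compose file.

```python
# Container-internal ports as listed in this article.
CONTAINER_PORTS = {
    "stirling-pdf": 8080,
    "pdftk-as-a-service": 8080,
    "weasyprint": 5000,
    "pdftk-webservice": 80,
}


def assign_host_ports(container_ports):
    """Return "host:container" port mappings with no host-port collisions."""
    taken = set()
    mappings = {}
    for service, port in container_ports.items():
        host = port
        # Bump the host port until it no longer clashes with an earlier service.
        while host in taken:
            host += 1
        taken.add(host)
        mappings[service] = f"{host}:{port}"
    return mappings


for service, mapping in assign_host_ports(CONTAINER_PORTS).items():
    print(f"{service}: {mapping}")
```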

What is a good minimum server spec for this PDF stack?

For Stirling-PDF fat plus KatSick and WeasyPrint: 4 GB RAM, 2 vCPUs, and 10 GB disk. For heavy OCR or simultaneous Office conversions, 8 GB RAM and 4 vCPUs is more comfortable.

What license is Stirling-PDF under?

Stirling-PDF uses an open-core model. The community edition is free and open source. Enterprise features such as SSO and audit logging require a commercial license. See the LICENSE file in the GitHub repository for the full terms.

Is Stirling-PDF actively maintained?

Very much so. With over 79,000 GitHub stars and 6,900 forks, it is the most-starred self-hosted PDF project. Issues are responded to quickly and releases are frequent.

Alexander
