RichyHBM

Software engineer with a focus on game development and scalable backend development

Converting HTML to PDF in Continuous Integration using Containers

Displaying content as HTML is convenient for several reasons: it suits rapidly changing content, works with accessibility tools, and can be styled concisely. However, when someone prints the page, you are at the mercy of their browser and printer settings, either of which may add styling of its own. For this reason I wanted to host a PDF version, so that people can easily print or save an offline copy with the styling I originally envisioned.

On my website I currently host a (somewhat redacted) copy of my resume, which I would print to PDF to produce a sendable file. However, if someone wanted a copy directly from the site, they might not configure the print settings the way I intended. I therefore decided to add a PDF version whose styling I control. Since I would likely forget to regenerate the PDF after changing the source files, I wanted to automate the process.

Styling and CSS

Step one is to do as much as possible in CSS. This ensures that even if someone skips the PDF button and prints the HTML directly from their browser, the result looks as close as possible to the PDF. First, I set the page size and margins so that the page prints as A4.

@page
{
  size: 210mm 297mm;
  margin: 15mm 15mm 15mm 15mm;
}

Next is overriding the page CSS with print-specific values. This keeps the text readable (for example, a white page with black text) and defines rules so that content running over onto another page doesn't disturb the formatting, along with other smaller changes.

@media print {
  body {
    background: #fff;
    font-size: 1.2em;
    margin: 0em 0.5em;
    font-family: 'Open Sans', sans-serif;
  }
  .container {
    width: 100%;
  }
  .dont-split-print {
    page-break-inside: avoid;
  }
  .no-print {
    display: none;
  }
  ...
}

Printing with Containers

The main concern when automating the printing process is making sure the output today is the same as in 5 months' time. For that reason I wanted to use containers, which make it easy to pin the versions of all my dependencies. After looking at many different tools, the easiest way to get a PDF print of an HTML file seemed to be a browser's “Print to PDF” feature, and to that end Chrome's CLI was the simplest option.

Because of how reading HTML from disk works, linked files have to use relative paths. I want to keep absolute paths on the actual deployment, however, so once I have created a Hugo build, I make a copy of the file to print and modify that copy to use relative paths.

#!/bin/sh

podman volume create out-resume

podman run --rm -it \
    -v ./out:/site:ro \
    -v out-resume:/out \
    docker.io/library/alpine:3.20 \
        cp /site/resume/index.html /out/index.html

# Rewrite absolute CSS links to relative ones in the copied file
podman run --rm -it \
    -v out-resume:/site \
    docker.io/library/alpine:3.20 \
        sed -i 's/href=\/css/href=..\/css/g' /site/index.html

podman run --user 0:0 --rm -it \
    -v ./out:/site \
    -v out-resume:/site/resume:ro \
    gcr.io/zenika-hub/alpine-chrome:124 \
        --no-sandbox \
        --headless=new \
        --disable-gpu \
        --run-all-compositor-stages-before-draw \
        --hide-scrollbars \
        --no-pdf-header-footer \
        --print-to-pdf=/site/pdf/resume.pdf \
        /site/resume/index.html

podman volume rm out-resume
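
To make the path rewrite in the script above easier to see in isolation, here is a minimal sketch that applies the same substitution to a single line of sample HTML. The sample markup and the `relativise_css` function name are hypothetical, and the sketch reads from standard input rather than editing a file in place. Note that the pattern only matches unquoted attributes (`href=/css...`), which is presumably what the minified Hugo output contains; a quoted `href="/css..."` would be left untouched.

```shell
#!/bin/sh

# relativise_css: read HTML on stdin and turn hrefs that start with /css
# into ../css, using the same substitution as the print script.
# (Hypothetical helper name, for illustration only.)
relativise_css() {
    sed 's|href=/css|href=../css|g'
}

# Hypothetical sample line standing in for the minified page.
sample='<link rel=stylesheet href=/css/main.css><a href=/posts/>Posts</a>'

result=$(printf '%s\n' "$sample" | relativise_css)
printf '%s\n' "$result"
# prints: <link rel=stylesheet href=../css/main.css><a href=/posts/>Posts</a>
```

Only the stylesheet link changes; other absolute links such as `/posts/` stay as they are, which is fine for printing because they do not affect how the page renders.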

Hooking it all up with GitHub Actions

The last piece of the puzzle is combining the above into a set of GitHub Actions steps to automate the creation of the PDF file. This is fairly straightforward: I add the printing process to my list of steps and make sure Podman is installed.

steps:
  - name: Checkout
    uses: actions/checkout@v4
    with:
      submodules: recursive
      fetch-depth: 0
    #
    # ...
    #
  - name: Install Podman and fetch images
    run: |
      sudo apt-get update
      sudo apt-get install -y podman
      podman pull docker.io/library/alpine:3.20
      podman pull gcr.io/zenika-hub/alpine-chrome:124
    #
  - name: Merge CV
    run: sh merge.sh
    #
  - name: Build with Hugo
    env:
      # For maximum backward compatibility with Hugo modules
      HUGO_ENVIRONMENT: production
      HUGO_ENV: production
      TZ: Europe/London
    run: |
      hugo \
        --gc \
        --minify \
        --baseURL "${{ steps.pages.outputs.base_url }}/"
    #
  - name: Print CV
    run: sh print.sh
    #
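
Since the whole point is to avoid shipping a stale or missing PDF, it may be worth adding a small sanity check after the print step. The following is a sketch only, not part of my actual workflow: the `is_pdf` function is a hypothetical helper, and the `out/pdf/resume.pdf` path is inferred from the print script's `./out` mount and `--print-to-pdf` target. It relies on `head -c`, which is widely available but not strictly POSIX.

```shell
#!/bin/sh

# is_pdf FILE: succeed only if FILE is non-empty and begins with the
# "%PDF-" signature that every PDF file starts with.
# (Hypothetical helper, for illustration only.)
is_pdf() {
    [ -s "$1" ] || return 1
    [ "$(head -c 5 "$1")" = "%PDF-" ]
}

# Example use at the end of print.sh or as its own CI step
# (path assumed from the print script's output location):
#   is_pdf out/pdf/resume.pdf || { echo "resume.pdf missing or invalid" >&2; exit 1; }
```

This only confirms that a PDF was written at all; it does not check that the content or styling is what was intended.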