HTML to PDF with Emoji using Headless Chrome

The goal is to produce PDFs like test.pdf from HTML, and have them be automatically built by CI. See here for the repo.

I also wanted to use the same emoji as on Android, so that they look familiar to the recipient when I print a letter off. This turned out to have some problems, so it needed a workaround.

Motivation

I wanted a way to write letters that could include charts, emoji and arbitrary layouts, but that was still easy to write in. Furthermore, I wanted to be able to see the (rough) output without a toolchain. This led me to choose web technologies and then print to PDF.

I tried \( \LaTeX \), but found layout too fiddly for one-off letters, and it also required a toolchain. I found Pandoc not flexible enough.

Having HTML as the source means the file can be locally served and opened in a browser to check what it'll (roughly) look like when output to PDF, and anything that can be rendered in the browser can be rendered in the output, e.g. D3.js, Mermaid, and even TikZJax.

Implementation

All of the actual work is done by Puppeteer/Chrome. A build script serves the src directory locally and prints each HTML file it finds there to PDF. The whole thing runs in Docker so that it can be run by GitLab CI.

{ build.js }

const express = require("express");
const fs = require("fs").promises;
const globby = require("globby");
const path = require("path");
const puppeteer = require("puppeteer");
const replaceExt = require("replace-ext");


async function main() {
    const server = await startServer(3000);
    const browser = await startBrowser();

    try {
        await fs.mkdir("build", { recursive: true });

        // Print every HTML file in src/ to a PDF of the same name in build/.
        for (const file of await globby("src/*.html")) {
            const fileName = path.basename(file);
            const outputFileName = replaceExt(fileName, ".pdf");

            await printToPdf(
                browser,
                `http://localhost:3000/${fileName}`,
                `build/${outputFileName}`,
            );
        }
    } finally {
        // Shut down even if a page failed to print, so the process can exit.
        await browser.close();
        server.close();
    }
}

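async function startServer(port) {
    const app = express();
    app.use(express.static(path.join(__dirname, "src")));

    return new Promise((resolve, reject) => {
        // Resolve only once the server is actually listening. Startup
        // failures (e.g. the port is already in use) arrive either as a
        // callback argument or as an "error" event, depending on the
        // Express version, so handle both.
        const server = app.listen(port, (error) => {
            if (error) {
                return reject(error);
            }
            resolve(server);
        });
        server.on("error", reject);
    });
}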

async function startBrowser() {
    const browser = await puppeteer.launch({
        args: [
            "--headless",
            "--no-sandbox",
            "--disable-gpu",
            "--disable-dev-shm-usage",
        ],
    });

    return browser;
}

async function printToPdf(browser, url, outputPath) {
    const page = await browser.newPage();
    await page.goto(url, {
        waitUntil: "networkidle0",
    });
    await page.pdf({
        path: outputPath,
        printBackground: true,
        format: "A4",
    });
    // Close the tab so pages do not pile up when there are many files.
    await page.close();
}

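// Report failures and exit non-zero so that the CI job fails visibly.
main().catch((error) => {
    console.error(error);
    process.exit(1);
});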

{ .gitlab-ci.yml }

image: zenika/alpine-chrome:84-with-node

variables:
  PUPPETEER_SKIP_CHROMIUM_DOWNLOAD: '1'
  PUPPETEER_EXECUTABLE_PATH: /usr/bin/chromium-browser

build:
  stage: build
  script:
    - npm ci
    - npm run build
  when: manual
  only:
    - master
  artifacts:
    paths:
      - build

Problems

One problem is that emoji are rendered using the system font, so they look different on different systems. We can fix this by referencing an emoji font in CSS, e.g. Noto Color Emoji. This makes what is shown in the browser consistent, but there seems to be a limitation in Chrome when printing colour fonts to PDF: they end up all black in the output! See below.

A workaround is to use something like Twemoji to replace all of the emoji with SVG representations of themselves. In fact, if I were using emoji on a website I might opt for this approach anyway: rather than loading a chunky web font, it only loads the SVGs for the emoji actually used.
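As a rough illustration of how this kind of replacement finds its images, here is a minimal sketch that maps an emoji to an SVG file named after its code points. It is a simplified version of the idea, not Twemoji's actual code: the real library also deals with details such as variation selectors. The function names and the `/emoji` base path are hypothetical.

```javascript
// Simplified sketch only: the helper names and the "/emoji" folder are
// made up for illustration, and Twemoji itself handles more edge cases.

// "😀" -> "1f600"; multi-code-point sequences are joined with "-".
function toCodePoints(emoji) {
    // Array.from iterates by code point, so surrogate pairs stay together.
    return Array.from(emoji, (char) => char.codePointAt(0).toString(16)).join("-");
}

// Build an <img> tag pointing at the SVG for a single emoji.
// `base` is assumed to be a folder of SVGs named by code point.
function emojiToImg(emoji, base) {
    return `<img class="emoji" alt="${emoji}" src="${base}/${toCodePoints(emoji)}.svg">`;
}

console.log(emojiToImg("😀", "/emoji"));
// prints: <img class="emoji" alt="😀" src="/emoji/1f600.svg">
```

Because the file name is derived purely from the code points, any emoji set can be swapped in as long as its SVGs follow the same naming.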

This works, but we lose out on the Noto emoji. However, the noto-emoji repo has SVG assets, so we can rename them to the form Twemoji expects and point it at them instead. Comment/uncomment the lines in the head of test.html to load a different emoji set. We can even go back in time and get blob emoji!

I definitely prefer the middle row, the 2020-04-08 build of the emoji. I don't like the newest ones' lack of border.

NOTE: My renaming of the SVG assets was a little naive; I think it breaks on files with leading 0s, but the idea works. It might be worth collecting all the different versions of these emoji, renaming them properly, and pairing them with a script based on a stripped-down version of what Twemoji does.
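A more careful rename might look something like the sketch below. It assumes the Noto SVGs are named like `emoji_u1f600.svg`, with code points separated by underscores and sometimes zero-padded, and that the target names are lowercase hex code points joined by hyphens with no padding. Both naming schemes are assumptions here and should be checked against the actual files; the function name is hypothetical, and special cases such as variation selectors are not handled.

```javascript
// Sketch under assumed naming schemes (see the paragraph above):
//   Noto:   emoji_u1f468_200d_1f469.svg, emoji_u0023_20e3.svg
//   target: 1f468-200d-1f469.svg,        23-20e3.svg
function notoToTwemojiName(fileName) {
    const match = /^emoji_u([0-9a-f]+(?:_[0-9a-f]+)*)\.svg$/i.exec(fileName);
    if (!match) {
        return null; // Not an emoji asset, leave it alone.
    }

    const codePoints = match[1]
        .toLowerCase()
        .split("_")
        // Strip zero padding but always keep at least one digit.
        .map((codePoint) => codePoint.replace(/^0+(?=[0-9a-f])/, ""));

    return codePoints.join("-") + ".svg";
}

console.log(notoToTwemojiName("emoji_u1f600.svg")); // prints: 1f600.svg
console.log(notoToTwemojiName("emoji_u0023_20e3.svg")); // prints: 23-20e3.svg
```

Keeping the mapping as a pure function makes it easy to check against a handful of known file names before renaming a whole directory.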