In the unlikely event you need to represent Emoji in RTF using Perl …

Of all the niche blog entries I’ve written, this must be the nichest. I don’t even like the topic I’m writing about. But I’ve worked it out, and there seems to be a shortage of documented solutions.

For the both of you that generate Rich Text Format (RTF) documents by hand, you might be wondering how RTF converts ‘💩’ (that’s code point U+1F4A9) to the seemingly nonsensical \u-10179?\u-9047?. It seems that RTF imposes two encoding limitations on characters: firstly, everything must be in 7-bit ASCII for easy transmission, and secondly, it uses the somewhat old-fashioned UTF-16 representation for non-ASCII characters.

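UTF-16 grew out of an early standard, UCS-2, that was all like “Hey, there will never be a Unicode code point above 65535, so we can hard code the characters in two bytes … oh shiiii…”. So not merely does it have to escape emoji code points down to a pair of two-byte values using a very dank scheme indeed, it then has to further escape everything to ASCII. RTF’s \u takes a signed 16-bit decimal number, so each half of the pair is written as its value minus 65536, with a ‘?’ tacked on as the fallback character for readers that can’t do Unicode. That’s how your single emoji becomes 17 bytes in an RTF document.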
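
To see where those numbers come from, here is a minimal step-by-step sketch for U+1F4A9. The variable names are purely illustrative, and it assumes the usual UTF-16 surrogate arithmetic plus RTF reading the \u argument as a signed 16-bit decimal:

```perl
#!/usr/bin/env perl
# Step-by-step sketch of the surrogate-pair arithmetic for one code point.
# All variable names are illustrative assumptions.
use strict;
use warnings;

my $cp     = 0x1F4A9;                        # the example code point
my $offset = $cp - 0x10000;                  # 20 bits left once the first 65536 are removed
my $high   = 0xD800 + ( $offset >> 10 );     # top 10 bits    -> high surrogate
my $low    = 0xDC00 + ( $offset & 0x3FF );   # bottom 10 bits -> low surrogate

# values above 32767 don't fit in a signed 16-bit number, so they wrap negative
my @signed = map { $_ > 0x7FFF ? $_ - 0x10000 : $_ } ( $high, $low );

printf "offset = 0x%05X\n",      $offset;
printf "high   = 0x%04X (%d)\n", $high, $signed[0];
printf "low    = 0x%04X (%d)\n", $low,  $signed[1];
printf "rtf    = \\u%d?\\u%d?\n", @signed;
```

Run as-is, the last line it prints should be the same \u-10179?\u-9047? pair quoted earlier.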

So here’s a tiny subroutine to do the conversion. I wrote it in Perl, but it doesn’t do anything Perl-specific:

#!/usr/bin/env perl
# emoji2rtf - 2017 - scruss
# See UTF-16 decoder for the dank details
#  <>

use v5.20;
use strict;
use warnings;
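use utf8;
use Encode qw(decode);
use open qw(:std :encoding(UTF-8));    # so STDOUT can carry the emoji
sub emoji2rtf($);

# @ARGV arrives as raw bytes: decode it first, or substr() returns one byte
# (assumes a UTF-8 terminal, and that perl isn't already run with -CA)
my $c = substr( decode( 'UTF-8', $ARGV[0] ), 0, 1 );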
say join( "\t⇒ ", $c, sprintf( "U+%X", ord($c) ), emoji2rtf($c) );

sub emoji2rtf($) {
    my $n = ord( substr( shift, 0, 1 ) );
    die "emoji2rtf: code must be >= 65536\n" if ( $n < 0x10000 );
    return sprintf( "\\u%d?\\u%d?",
        0xd800 + ( ( $n - 0x10000 ) & 0xffc00 ) / 0x400 - 0x10000,
        0xdC00 + ( ( $n - 0x10000 ) & 0x3ff ) - 0x10000 );
}

This will take any emoji fed to it and spit out the RTF code:

📓	⇒ U+1F4D3	⇒ \u-10179?\u-9005?
💽	⇒ U+1F4BD	⇒ \u-10179?\u-9027?
🗽	⇒ U+1F5FD	⇒ \u-10179?\u-8707?
😱	⇒ U+1F631	⇒ \u-10179?\u-8655?
🙌	⇒ U+1F64C	⇒ \u-10179?\u-8628?
🙟	⇒ U+1F65F	⇒ \u-10179?\u-8609?
🙯	⇒ U+1F66F	⇒ \u-10179?\u-8593?
🚥	⇒ U+1F6A5	⇒ \u-10179?\u-8539?
🚵	⇒ U+1F6B5	⇒ \u-10179?\u-8523?
🛅	⇒ U+1F6C5	⇒ \u-10179?\u-8507?
💨	⇒ U+1F4A8	⇒ \u-10179?\u-9048?
💩	⇒ U+1F4A9	⇒ \u-10179?\u-9047?
💪	⇒ U+1F4AA	⇒ \u-10179?\u-9046?
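
As a rough cross-check, the mapping can be run backwards. This is a sketch only: rtf2codepoint is a hypothetical helper name, and it assumes the input is exactly one pair of \uN? escapes like the ones listed above:

```perl
#!/usr/bin/env perl
# rtf2codepoint is a hypothetical helper, not part of emoji2rtf
use strict;
use warnings;

sub rtf2codepoint {
    my ($rtf) = @_;

    # expect exactly two \uN? escapes, each a signed decimal number
    my ( $hi, $lo ) = $rtf =~ /^\\u(-?\d+)\?\\u(-?\d+)\?$/
        or die "rtf2codepoint: expected two \\uN? escapes\n";

    # undo the signed 16-bit wrap-around
    $hi += 0x10000 if $hi < 0;
    $lo += 0x10000 if $lo < 0;

    die "rtf2codepoint: not a surrogate pair\n"
        unless $hi >= 0xD800 && $hi <= 0xDBFF
        && $lo >= 0xDC00 && $lo <= 0xDFFF;

    # glue the two 10-bit halves back together and restore the offset
    return 0x10000 + ( ( $hi - 0xD800 ) << 10 ) + ( $lo - 0xDC00 );
}

printf "U+%X\n", rtf2codepoint('\u-10179?\u-9047?');    # prints U+1F4A9
```

Feeding it any other row from the list should give back that row's U+ value.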

Just to show that this encoding scheme really is correct, I made a tiny test RTF file unicode-emoji.rtf that looked like this in Google Docs on my desktop:

It looks a bit better on my phone, but there are still a couple of glyphs that won’t render:

“The Error Message as a bourgeois construct”

If you try to run the (unmodified) BASIC code for Oregon Trail (1975) on PDP-8 BASIC, you get this:

 DI 30
 XC 45
 XC 205
 IF 700
 NM 730
… (many, many more lines …)

I thought at first it was a stack trace, but nope — it’s error messages! You need to dig through your trusty language manual, and on page 132 it has a table to explain:


(and yes, they’re in all-caps. Mixed case? Mixed feelings!)

So whenever Python throws a tantrum (or as it calls it, an exception) and wails at length about its problems, remember PDP-8 BASIC: Two letters + a line number. That’s all.

PoorFish, v2

On FontLibrary: PoorFish

Local copy:

Full Language Support: Afrikaans, Baltic, Basic Latin, Catalan, Central European, Dutch, Esperanto, Euro, Turkish, Western European. Terrible kerning comes free.

(original version from 2010: PoorFish)

The Pocket DEC Pretender (PDP) Zero

PDP (Pocket DEC Pretender) Zero: lettering came out a bit more, um,  artisanal than I’d hoped …

Digital (aka DEC) used to make some very solid minicomputers back when a minicomputer was fridge-sized and people were still building nuclear power stations to be controlled by them. The Raspberry Pi Zero is a very mini computer indeed, and in USB gadget mode running SimH it makes a nice little emulation platform.

The case is from Thingiverse: One Piece Raspberry Pi Zero + Camera Case (with GPIO) by Superrei, but with the DEC PDP logo in relief on the top.

DEC minis were famous for their arrays of blinkenlights. The Pocket DEC Pretender, not so much: it has one tiny green light that flickers a bit now and again:

PDP (Pocket DEC Pretender) Zero: case open, very few blinkenlights

But it’s a genuinely useful (for my values of useful) emulation platform. Here it is pretending to be a PDP-8, running BASIC under OS-8:

PDP (Pocket DEC Pretender) Zero: PDP-8 BASIC!

(background in case pictures woven in Toronto by Deftly Weft)

TYPE BANG: First Person Shooter, 1975 style

— from the source code of an early (1975) time-shared system version of The Oregon Trail, as documented in On the Trail of the Oregon Trail.

Library Hand – Disjoint

LibHandDis — Based on scans of “Library Hand – Disjoint”, described in Dana’s A Library Primer, with some modifications.

Major changes from scan:

  • As the scan only covered A-Z, a-z, 0-9 and ‘&’, I had to make the rest up.
  • Many of the descenders had to be shortened to fit with modern typography conventions.
  • Kerning is much tighter than Dana’s guidelines suggest.

(idea for this came via MetaFilter, This question of library handwriting is an exceedingly practical one)

Local copy:

The punctuation is a farce, the kerning is ropey – but here’s my attempt at Dana’s Library Hand

a font for the person you’re just dotty about

LoveMatrix is a lo-fi dot matrix font made of ♥♥♥s. It’s a seasonally-adjusted version of my mnicmp font.

Local copy:

more geometry to colour in

Click on image to download as PDF

based on the main repeating pattern from a Pierced Window Screen at The Metropolitan Museum of Art — particularly this image.