Instagram filter used: Lo-fi
Author: scruss
-
In the unlikely event you need to represent Emoji in RTF using Perl …
Of all the niche blog entries I’ve written, this must be the nichest. I don’t even like the topic I’m writing about. But I’ve worked it out, and there seems to be a shortage of documented solutions.
For the both of you that generate Rich Text Format (RTF) documents by hand, you might be wondering how RTF converts ‘💩’ (that’s code point U+1F4A9) to the seemingly nonsensical \u-10179?\u-9047?. It seems that RTF imposes two encoding limitations on characters: firstly, everything must be in 7-bit ASCII for easy transmission, and secondly, it uses the somewhat old-fashioned UTF-16 representation for non-ASCII characters.
UTF-16 grew out of an early standard, UCS-2, that was all like “Hey, there will never be a Unicode code point above 65535, so we can hard code the characters in two bytes … oh shiiii…”. So not merely does it have to escape emoji code points down to pairs of two-byte values using a very dank scheme indeed, it then has to further escape everything to ASCII. That’s how your single emoji becomes 17 bytes in an RTF document.
So here’s a tiny subroutine to do the conversion. I wrote it in Perl, but it doesn’t do anything Perl-specific:
    #!/usr/bin/env -S perl -CAS
    # emoji2rtf - 2017 - scruss
    # See UTF-16 decoder for the dank details
    # <https://en.wikipedia.org/wiki/UTF-16>
    # run with 'perl -CAS ...' or set PERL_UNICODE to 'AS' for UTF-8 argv
    # doesn't work from Windows cmd prompt because Windows ¯\_(ツ)_/¯
    # https://scruss.com/blog/2017/03/12/in-the-unlikely-event-you-need-to-represent-emoji-in-rtf-using-perl/

    use v5.20;
    use strict;
    use warnings qw( FATAL utf8 );
    use utf8;
    use open qw( :encoding(UTF-8) :std );
    sub emoji2rtf($);

    my $c = substr( $ARGV[0], 0, 1 );
    say join( "\t⇒ ", $c, sprintf( "U+%X", ord($c) ), emoji2rtf($c) );
    exit;

    sub emoji2rtf($) {
        my $n = ord( substr( shift, 0, 1 ) );
        die "emoji2rtf: code must be >= 65536\n" if ( $n < 0x10000 );
        return sprintf( "\\u%d?\\u%d?",
            0xd800 + ( ( $n - 0x10000 ) & 0xffc00 ) / 0x400 - 0x10000,
            0xdC00 + ( ( $n - 0x10000 ) & 0x3ff ) - 0x10000 );
    }
This takes any emoji fed to it as a command line argument and spits out the RTF code:
    📓 ⇒ U+1F4D3 ⇒ \u-10179?\u-9005?
    💽 ⇒ U+1F4BD ⇒ \u-10179?\u-9027?
    🗽 ⇒ U+1F5FD ⇒ \u-10179?\u-8707?
    😱 ⇒ U+1F631 ⇒ \u-10179?\u-8655?
    🙌 ⇒ U+1F64C ⇒ \u-10179?\u-8628?
    🙟 ⇒ U+1F65F ⇒ \u-10179?\u-8609?
    🙯 ⇒ U+1F66F ⇒ \u-10179?\u-8593?
    🚥 ⇒ U+1F6A5 ⇒ \u-10179?\u-8539?
    🚵 ⇒ U+1F6B5 ⇒ \u-10179?\u-8523?
    🛅 ⇒ U+1F6C5 ⇒ \u-10179?\u-8507?
    💨 ⇒ U+1F4A8 ⇒ \u-10179?\u-9048?
    💩 ⇒ U+1F4A9 ⇒ \u-10179?\u-9047?
    💪 ⇒ U+1F4AA ⇒ \u-10179?\u-9046?
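If you'd rather check those numbers than trust them, going the other way only takes a few lines. This is just a sketch: `rtf2cp` is a made-up name and not part of the script above, and it assumes the input is exactly one pair of `\uN?` escapes with `?` as the fallback character.

```perl
#!/usr/bin/env perl
# rtf2cp - hypothetical helper, not part of emoji2rtf
# Turns an RTF escape pair such as '\u-10179?\u-9047?' back into a code point.
use strict;
use warnings;

sub rtf2cp {
    my ($rtf) = @_;

    # Assumption: exactly two \uN? escapes and nothing else
    my ( $hi, $lo ) = $rtf =~ /^\\u(-?\d+)\?\\u(-?\d+)\?$/
      or die "rtf2cp: expected two \\uN? escapes\n";

    # RTF writes 16-bit values as signed numbers; wrap negatives to 0..65535
    $hi += 0x10000 if $hi < 0;
    $lo += 0x10000 if $lo < 0;

    # Both halves must land in the UTF-16 surrogate ranges
    die "rtf2cp: not a surrogate pair\n"
      unless $hi >= 0xD800 && $hi <= 0xDBFF
      && $lo >= 0xDC00 && $lo <= 0xDFFF;

    # Top ten bits come from the high surrogate, bottom ten from the low
    return 0x10000 + ( ( $hi - 0xD800 ) << 10 ) + ( $lo - 0xDC00 );
}

printf "U+%X\n", rtf2cp('\u-10179?\u-9047?');
```

Fed `\u-10179?\u-9047?`, it should hand back U+1F4A9.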
Just to show that this encoding scheme really is correct, I made a tiny test RTF file unicode-emoji.rtf that looked like this in Google Docs on my desktop:
It looks a bit better on my phone, but there are still a couple of glyphs that won’t render:
Update, 2020-07: something has changed in the Unicode handling, so I’ve modified the code to expect arguments and stdio in UTF-8. Thanks to Piyush Jain for noticing this little piece of bitrot.

Further update: Windows command prompt does bad things to arguments in Unicode, so this script won’t work. Strawberry Perl gives me:

    perl -CAS .\emoji2rtf.pl ☺
    emoji2rtf: code must be >= 65536; saw 63

I have no interest in finding out why.
-
hidráulica tile query from metafilter
(to explain this: Origin of a geometric tile design pattern?)
-
“The Error Message as a bourgeois construct”
If you try to run the (unmodified) BASIC code for Oregon Trail (1975) on PDP-8 BASIC, you get this:
    DI 30
    XC 45
    XC 205
    …
    IF 700
    NM 730
    …
    (many, many more lines …)
I thought at first it was a stack trace, but nope — it’s error messages! You need to dig through your trusty language manual, and on page 132 it has a table to explain:
    DI  ERROR IN DIM STATEMENT
    IF  ERROR IN IF STATEMENT
    NM  MISSING LINE NUMBER
    XC  CHARS AFTER END OF LINE
(and yes, they’re in all-caps. Mixed case? Mixed feelings!)
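If paging through the manual gets old, the lookup is easy enough to sketch. This is purely hypothetical: `explain` is my own name for it, the output wording is invented, and it only knows the four codes from the table above, where the real manual lists more.

```perl
#!/usr/bin/env perl
# Hypothetical decoder for PDP-8 BASIC's two-letter error codes.
# Only the four codes quoted from the manual's table are included.
use strict;
use warnings;

my %error = (
    DI => 'ERROR IN DIM STATEMENT',
    IF => 'ERROR IN IF STATEMENT',
    NM => 'MISSING LINE NUMBER',
    XC => 'CHARS AFTER END OF LINE',
);

# Expand a message such as 'DI 30' into its meaning plus the line number
sub explain {
    my ($msg) = @_;
    my ( $code, $line ) = $msg =~ /^([A-Z]{2})\s+(\d+)$/
      or return "?? $msg";    # not in 'two letters + number' form
    return sprintf '%s AT LINE %d',
      $error{$code} // "UNKNOWN ERROR $code", $line;
}

print explain($_), "\n" for 'DI 30', 'XC 45', 'IF 700', 'NM 730';
```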
So whenever Python throws a tantrum (or as it calls it, an exception) and wails at length about its problems, remember PDP-8 BASIC: Two letters + a line number. That’s all.
-
PoorFish, v2
On FontLibrary: PoorFish
Local copy: PoorFish.zip
Full Language Support: Afrikaans, Baltic, Basic Latin, Catalan, Central European, Dutch, Esperanto, Euro, Turkish, Western European. Terrible kerning comes free.
(original version from 2010: PoorFish)
-
The Pocket DEC Pretender (PDP) Zero
PDP (Pocket DEC Pretender) Zero: lettering came out a bit more, um, artisanal than I’d hoped …

Digital (aka DEC) used to make some very solid minicomputers back when a minicomputer was fridge-sized and people were still building nuclear power stations to be controlled by them. The Raspberry Pi Zero is a very mini computer indeed, and in USB gadget mode running SimH it makes a nice little emulation platform.
The case is from Thingiverse: One Piece Raspberry Pi Zero + Camera Case (with GPIO) by Superrei, but with the DEC PDP logo in relief on the top.
DEC minis were famous for their arrays of blinkenlights. The Pocket DEC Pretender, not so much: it has one tiny green light that flickers a bit now and again:
PDP (Pocket DEC Pretender) Zero: case open, very few blinkenlights

But it’s a genuinely useful (for my values of useful) emulation platform. Here it is pretending to be a PDP-8, running BASIC under OS-8:

PDP (Pocket DEC Pretender) Zero: PDP-8 BASIC!

(background in case pictures woven in Toronto by Deftly Weft)
-
TYPE BANG: First Person Shooter, 1975 style
— from the source code of an early (1975) time-shared system version of The Oregon Trail, as documented in On the Trail of the Oregon Trail.
-
Library Hand – Disjoint
LibHandDis: based on scans of “Library Hand – Disjoint”, described in Dana’s A Library Primer, with some modifications.

Major changes from scan:
- As the scan only covered A-Z, a-z, 0-9 and ‘&’, I had to make the rest up.
- Many of the descenders had to be shortened to fit with modern typography conventions.
- Kerning is much tighter than Dana’s guidelines suggest.
(idea for this came via MetaFilter, This question of library handwriting is an exceedingly practical one)
Local copy: LibHandDis.zip.
-
The punctuation is a farce, the kerning is ropey – but here’s my attempt at Dana’s Library Hand
Update: don’t use this terrible thing. Margo Burns has made an amazing version: Dana Library Hand.
Instagram filter used: Normal
-
a font for the person you’re just dotty about
LoveMatrix is a lo-fi dot matrix font made of ♥♥♥s. It’s a seasonally-adjusted version of my mnicmp font.
Local copy: LoveMatrix.zip