Here are my slides:
Or if you like PDF:
work as if you live in the early days of a better nation
Here are my slides:
Or if you like PDF:
If you need to ship things, you’re probably not too keen on queuing at the post office right now. Canada Post’s Ship Online service is pretty handy if you have a printer. The PDFs it produces are okay to print on plain paper, but if you’re using full-sheet labels like Avery 5165 you’re going to waste half a sheet of expensive labels.
If you’ve got two parcels to mail, this shell script will extract the right side of each page and create a single 2-up PDF with both your labels on the same page. You will need:
On my Ubuntu system, you can get good-enough¹ versions by doing this:
sudo apt install poppler-utils netpbm img2pdf
The code:
#!/bin/bash
# cp2up.sh - fits the important part of Canada Post print labels 2 per sheet
# scruss, 2021-05 - CC-BY-SA
# hard-coded input name (document.pdf)
# hard-coded output name (labels-2up.pdf)
# accepts exactly two labels (sorry)
dpi=600
width_in=11
height_in=8.5
# png intermediate format uses pixels per metre
dpm=$(echo "scale=3; $dpi * 1000 / 25.4" | bc)
# calculated pixel sizes, truncated to integer
half_width_px=$(echo "$width_in * $dpi / 2" | bc | sed 's/\..*$//')
height_px=$(echo "$height_in * $dpi" | bc | sed 's/\..*$//')
pdftoppm -mono -r "$dpi" -x "$half_width_px" -y 0 \
-W "$half_width_px" -H "$height_px" document.pdf labels
pnmcat -lr labels-1.pbm labels-2.pbm |\
pnmtopng -compression 9 -phys "$dpm" "$dpm" 1 > labels.png \
&& rm labels-1.pbm labels-2.pbm
# fix PDF time stamps
now=$(date --utc --iso-8601=seconds)
img2pdf -o labels-2up.pdf --creationdate "$now" --moddate "$now" \
--pagesize "Letter^T" labels.png \
&& rm labels.png
# saved from:
# history | tail | awk '{$1=""; print}' |\
# perl -pwle 'chomp;s/^\s+//;' > cp2up.sh
It’s got a few hard-coded assumptions:
Clever people could write code to work around these. Really clever people could modify this to feed a dedicated label printer.
Yes, I could probably have done all this with one ImageMagick command. When ImageMagick’s command line syntax begins to make sense, however, it’s probably time to retire to that remote mountain cabin and write that million-word thesis on a manual typewriter. Also, ImageMagick’s PDF generation is best described as pish.
One of the issues that this script avoids is aliasing in the bar-codes. For reasons known only to the anonymous PDF rendering library used by Canada Post, their shipping bar-codes are stored as smallish (780 × 54 px) bitmaps that are scaled up to a 59 × 19 mm print size. Most PDF viewers (and Adobe Viewer is one of these) will anti-alias scaled images, making them slightly soft. If you’re really unlucky, your printer driver will output these as fuzzy lines that no bar-code scanner could ever read. Rendering them to high resolution mono images may still render the edges a little roughly, but they’ll be crisply rough, and scanners don’t seem to mind that.
¹: Debian/Ubuntu’s netpbm package is roughly 20 years out of date for reasons that only a very few nerds care about, and the much better package is blocked by Debian’s baroque and gatekeepery packaging protocol. I usually build it from source for those times I need the new features.
It is good that there are so many scanned manuals for old computer systems out there. Every old system did things its very own special way, and life’s too short to guess. I mean, there’s not much out there on the SYM-1 I’m trying to get working again:
— not much except for 6502.org’s excellent Synertek SYM-1 Resources, that is.
Some manuals, though, while lovingly scanned, are just too large to download, browse or file. Take, for instance, AppleIIScans’ Apple II BASIC Programming With ProDOS. It’s a very faithful colour scan, but at 170 MB for 280 pages, it’s a bit unwieldy. I suspect it’s Adobe Acrobat Paper Capture’s fault: while it makes turning scans into readable files really easy, it doesn’t warn against using 600 dpi full colour for a book with only decorative use of colour.
Good old Ghostscript saves the day, though:
gs -sDEVICE=pdfwrite -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -dSAFER -q -sOutputFile=1983-A2L2013-m-a2-bpwp-grey.pdf -- 1983-A2L2013-m-a2-bpwp.pdf
By downsampling the scanned images and converting everything to greyscale, the result’s only 16 MB. All text and indexing from Acrobat is left intact.
The original seems to have fallen off the web (though the Wayback machine might have it), but:
Wikimedia has some metadata from the source, drawn by Anthony Atkielski:
By Agateller at English Wikipedia – Transferred from en.wikipedia to Commons by Sreejithk2000 using CommonsHelper., Public Domain, Link
All fired up by Natalie Draz‘s presentation about The Transmitting Library at Make Change Conference last Sunday, I made a tiny zine using Natalie’s template:
Fold it, cut across the broken line in the middle, then re-fold so the front and back cover are on the outside. Colour it in!
I’ve supplied it in a couple of formats:
My lightning talk for GTALUG seemed to go down quite well. Here are the slides. It’s mostly based on experience gleaned from My bank broke PDF … and how I used PDFBeads to fix it. I really must write this up properly … oh wait, I just did.
I also prepared — but didn’t get to use — notes on using Mini Printers and Linux. Again, this is from Thermal Printer driver for CUPS, Linux, and Raspberry Pi: zj-58 and Notes on mini-printers and Linux.
I’m rather pleased with this, as it’s the first pattern I’ve worked out from a real example, a screen in the Aga Khan Museum:
Here’s the pattern as a full page PDF from Inkscape: akm-screen-tiled-strapwork.pdf. The pattern’s not much more than an 8-pointed star with a smaller 8-pointed star inside it, rotated 22½°. But it’s still kinda neat.
Artifex’s GSView is rather good. It describes itself as ‘a user friendly viewer for Postscript, PDF, XPS, EPUB, CBZ, JPEG, and PNG’, and it sure does those things. It’s currently bundled as Mac, Windows and Linux Intel-only binaries, but maybe we’ll see ARM distribution or source soon enough.
The name confused me a bit. Russell Lang of Ghostgum Software Pty Ltd has maintained a nice Windows-only Ghostscript front end called GSview for years. Note the huge difference in names: Artifex‘s release is GSView 6, while Ghostgum’s is GSview 5. Hmm.
Naming aside, GSView does make it very easy to convert its input files to PDF/A, the ISO standard archival PDF definition that is immune to Adobe’s format meddling. (Adobe have, with Acrobat Reader DC, maintained an unbroken tradition that their latest PDF reader software is more bloated and craptastic than the last.)
PDF/A defines several archival settings such as font embedding and colour management. It’s possible to do this on the Ghostscript command line, but it’s fiddly. GSView just needs you to point it at the colour standard files on your system. On Mac, these live in /System/Library/ColorSync/Profiles/, and in the image below, I’ve picked out the generic ones:
On Linux, these files will likely be somewhere predictable; for me, they are in /usr/share/ghostscript/9.15/iccprofiles/. I made copies in the GSView executable folder so they wouldn’t get lost if my system updates Ghostscript:
The PDF/A files you get can be considerably smaller than the originals. A 10 MB LibreOffice Impress slide deck from a presentation on OpenStreetMap that I gave last week shrunk down to 1.3 MB when saved by GSView, with only very minor JPEG gribblies visible in the slide background. The graphic above (modified from the Ghostscript example file ‘golfer.eps’; yay, Illustrator 1.0!) shrunk by ⅓. These are handy savings, plus you get a standalone archival format that will never change!
Instagram filter used: Normal
I made this after being inspired by this MetaFilter comment. There’s a PDF linked from the image below:
Aargh! Ubuntu 16.10 has decided that ImageMagick doesn’t need JPEG 2000 support, and will quietly (and very very wrongly) write JP2s as JPEG.
(NB: JPEG 2000 images still maybe crash Ubuntu’s file browser in 14.10. My old installation didn’t like them, but my reinstall seems quite happy. Go figure.)
JPEG 2000 is a great image file format: well-defined, and able to store high quality photographic data in a very small space. It truly is the JPEG of the 2000s — except for its dismal support under Ubuntu.
The problem is the patents. An open library has been a long time coming, and lots of Linux software is built without JP2 support. This helped keep it away from my desktop.
Under Ubuntu 14.04, here’s what does and doesn’t support JP2 files:
Since JPEG 2000 isn’t included in web browsers (grar), I’ve embedded a sample scanned JPEG into a PDF, and added a series of progressively more compressed JPEG 2000 versions: JPEG-2000Booklet [PDF]. The booklet has notes showing the byte size of each page. The image still looks pretty good at 8% of the original file size!
It’s impractically huge, but under the image link lives a table of all of the Hershey fonts (well, the Western ones, at least). It’s interesting to note Dr Hershey’s preferences in this pre-ASCII table: almost every variant has degree, minute and second symbols, but none of them have ‘\’. Many of them don’t have ‘@’, either, so no e-mail addresses in Hershey Fraktur for you …
ICQuestionBank2csv: A tool to extract both the Basic and Advanced Amateur Radio Examination guides from Industry Canada’s rather annoying two-column PDFs. Written for IC’s 2014-02 database updates.
See: Amateur Radio Exam Generator.
Written by Stewart C. Russell (aka scruss) / VA3PID – 2014-03-07.
Run either basic2csv.sh
or advanced2csv.sh
to download the source PDF and extract the data.
WTFPL (srsly).
Oh man, Protext! For years, it was all I used: every magazine article, every essay at university (all two of them), my undergraduate dissertation (now mercifully lost to time: The Parametric Design of a Medium Specific Speed Pump Impeller, complete with spline-drawing code in HiSoft BASIC for the Amiga, is unlikely to be of value to anyone these days), letters — you name it, I used Protext for it.
I first had it on 16kB EPROM for the Amstrad CPC464; instant access with |P. I then ran it on the Amiga, snagging a cheap copy direct from the authors at a trade show. I think I had it for the PC, but I don’t really remember my DOS days too well.
The freeware version runs quite nicely under dosemu. You can even get it to print directly to PDF:
The results come out not bad at all:
Protext’s file import and export is a bit dated. You can use the CONVERT utility to create RTF, but it assumes Code page 437, so your accents won’t come out right. Adding \ansicpg437 to the end of the first line should make it read okay.
(engraving of Michel de Montaigne in mad puffy sleeves: public domain from Wikimedia Commons: File:Michel de Montaigne 1.jpg – Wikimedia Commons)
This is how wind turbines were supposed to look, at least in the 1940s. It’s the experimental Smith-Putnam 1.25 MW unit than ran for a short while on a hill near Rutland, VT. The picture’s from a rather falling-apart copy of Large Horizontal-axis Wind Turbines (Thresher, R. W., & Solar Energy Research Institute. (1982). Large horizontal-axis wind turbines: Proceedings of a workshop held in Cleveland, Ohio, July 28-30, 1981. Golden, Colo: Solar Energy Research Institute) that I rescued from Jim‘s recycling years ago.
The first part of these proceedings has a historical review of the Smith-Putnam turbine, including an excerpt from the S. Morgan Smith Company’s house organ on the project. As the rest of the book is pretty much all about the MOD series of turbines, it’s of less interest. I’ve scanned the bits about the Smith-Putnam turbine, and put them here: NASA_DOE-1981-large_horizontal_axis_wind_turbines-excerpt. If anyone wants the book, let me know. It’s very ratty, but readable.
I’ve written about this turbine before, but in relation to a packet of crayons. More awesome turbine pictures from Paul Gipe: Smith-Putnam Industrial Photos.
Here are the complete 1988-vintage Sun manuals “Using NROFF and TROFF†and “Formatting Documents†scanned just for you. I’d scanned these in 2000, and they’d sat on a forgotten archive volume since then.
Update: there are better versions on the Internet Archive: Using NROFF and TROFF and Formatting Documents, all as part of the Sun Microsystems, Inc. manual collection.
(if you need to get your troff on, go to Ralph’s troff.org.)
I don’t often need it, but the code printing facility in the Arduino IDE is very weak. It has some colour highlighting, but no page numbering, no line numbering, and no headers at all.
a2ps will sort you right out here. Years back, it was a simple text to PostScript filter, but now it has many wonderful filters for pretty-printing code. The Wiring/Arduino language is basically C++, and a2ps knows how to deal with that. So, to create a PostScript file with a nice version of the the most basic Blink sketch:
a2ps --pro=color -C -1 -M letter -g --pretty-print='c++' -o ~/Desktop/Blink.ps Blink.ino
If you’re somewhere that uses sensible paper sizes (in other words, not North America), you probably don’t want the -M letter
option. a2ps is supposed to have a PDF print option (. You can’t use the -P pdf
), but it doesn’t work on my installation, so I just splat the output through ps2pdf-P «printer»
option combined with the -o «file»
option, but cups-pdf is your friend if you need to print to a PDF. The results are linked below:
Not bad, eh?
(Update: think I must have written this post on a Mac with a case-insensitive filesystem. Using the --pretty-print='C++'
option I had before failed on Linux.)
— an ad from the December 1976 edition of Byte, from the BYTE magazine scanning effort.
I’m on a major decluttering toot. When I realised that the filing cabinet I bought three years ago would no longer close with all the papers stuffed in it, I knew something had to change. I’ve been shredding like it’s Houston in 2001. I have the duplex scanner to suck in the stuff I need to keep. I’m moving to paperless wherever possible to stop it building up again.
My bank provides PDF statements. Of this I approve. PDF is almost perfect for this: it provides an electronic version of the page, but with searchable text and the potential for some level of security. Except, this is not the way that my bank does it. At first glance, the text looks pretty harmless:
Zoom in, and it gets a bit blocky:
Zoom right in:
Aargh! Blockarama! Did they really store text as bitmaps? Sure enough, pdftotext output from the files contains no text. Running pdfimages produces hundreds of tiny images; here’s just a few:
Dear oh dear. This format is the worst of electronic, combined with paper’s lack of computer indexability. The producer claims to be Xenos D2eVision. Smooth work there, Xenos.
So, how can I fix this? It’s a bit of a pain to set this workflow up, but what I’ve done is:
gs -SDEVICE=tiffg4 -r300x300 -sOutputFile=file%03d.tif -dNOPAUSE -dBATCH -- file.pdf
for f in file*tif
do
tesseract $f `basename $f` hocr
done
for f in file*tif
do
cuneiform -f hocr -o `basename $f .tif`.html $f
done
pdfbeads * > ../Output.pdf
The files are really small, and the text is recognized pretty well. It still looks pretty bad:
but at least the text can be copied and indexed.
This thread “Convert Scanned Images to a Single PDF File†got me up and running with PDFBeads. You might also have success using the method described here: “How to extract text with OCR from a PDF on Linux?†— it uses hocr2pdf to create single-page OCR’d PDFs, then joins them.