Big old scanned manuals to small old scanned manuals

It is good that there are so many scanned manuals for old computer systems out there. Every old system did things its very own special way, and life’s too short to guess. I mean, there’s not much out there on the SYM-1 I’m trying to get working again:

— not much except for 6502.org’s excellent Synertek SYM-1 Resources, that is.

Some manuals, though, while lovingly scanned, are just too large to download, browse or file. Take, for instance, AppleIIScans’ Apple II BASIC Programming With ProDOS. It’s a very faithful colour scan, but at 170 MB for 280 pages, it’s a bit unwieldy. I suspect it’s Adobe Acrobat Paper Capture’s fault: while it makes turning scans into readable files really easy, it doesn’t warn against using 600 dpi full colour for a book with only decorative use of colour.

Good old Ghostscript saves the day, though:

gs -sDEVICE=pdfwrite -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -dSAFER -q -sOutputFile=1983-A2L2013-m-a2-bpwp-grey.pdf -- 1983-A2L2013-m-a2-bpwp.pdf

By downsampling the scanned images and converting everything to greyscale, the result’s only 16 MB. All text and indexing from Acrobat is left intact.

Fortran Coding Form

The original seems to have fallen off the web (though the Wayback machine might have it), but:

fortran(PDF is linked under image)

Wikimedia has some metadata from the source, drawn by Anthony Atkielski:

FortranCodingForm.png
By Agateller at English Wikipedia – Transferred from en.wikipedia to Commons by Sreejithk2000 using CommonsHelper., Public Domain, Link

nerrrdy Bourgoin mini-zine

All fired up by Natalie Draz‘s presentation about The Transmitting Library at Make Change Conference last Sunday, I made a tiny zine using Natalie’s template:

patterns_from_bourgoin-zine_smallJust fold it, cut across the broken line in the middle, then re-fold so the front and back cover are on the outside. Colour it in!

I’ve supplied it in a couple of formats:

  1. PDF: all vector, no jaggies — patterns_from_bourgoin-zine.pdf.
  2. PNG: if you’re still all about the pixels — patterns_from_bourgoin-zine.png.

GTALUG Lightning Talk(s): Mostly Searchable and Mini Printers

My lightning talk for GTALUG seemed to go down quite well. Here are the slides. It’s mostly based on experience gleaned from My bank broke PDF … and how I used PDFBeads to fix it. I really must write this up properly.

I also prepared — but didn’t get to use — notes on using Mini Printers and Linux. Again, this is from Thermal Printer driver for CUPS, Linux, and Raspberry Pi: zj-58 and Notes on mini-printers and Linux.

Screen pattern, Aga Khan Museum

Rendering of Aga Khan Museum screen

I’m rather pleased with this, as it’s the first pattern I’ve worked out from a real example, a screen in the Aga Khan Museum:

Screen detail, Aga Khan Museum

Here’s the pattern as a full page PDF from Inkscape: akm-screen-tiled-strapwork.pdf. The pattern’s not much more than an 8-pointed star with a smaller 8-pointed star inside it, rotated 22½°. But it’s still kinda neat.

GSView … not to be confused with GSview

Artifex’s GSView is rather good. It describes itself as ‘a user friendly viewer for Postscript, PDF, XPS, EPUB, CBZ, JPEG, and PNG’, and it sure does those things. It’s currently bundled as Mac, Windows and Linux Intel-only binaries, but maybe we’ll see ARM distribution or source soon enough.

The name confused me a bit. Russell Lang of Ghostgum Software Pty Ltd has maintained a nice Windows-only Ghostscript front end called GSview for years. Note the huge difference in names: Artifex‘s release is GSView 6, while Ghostgum’s is GSview 5. Hmm.

GSView on Linux

Naming aside, GSView does make it very easy to convert its input files to PDF/A, the ISO standard archival PDF definition that is immune to Adobe’s format meddling. (Adobe have, with Acrobat Reader DC, maintained an unbroken tradition that their latest PDF reader software is more bloated and craptastic than the last.)

PDF/A defines several archival settings such as font embedding and colour management. It’s possible to do this on the Ghostscript command line, but it’s fiddly.  GSView just needs you to point it at the colour standard files on your system. On Mac, these live in /System/Library/ColorSync/Profiles/, and in the image below, I’ve picked out the generic ones:

GSView Mac Colour Profiles

On Linux, these files will likely be somewhere predictable; for me, they are in /usr/share/ghostscript/9.15/iccprofiles/. I made copies in the GSView executable folder so they wouldn’t get lost if my system updates Ghostscript:

GSView Linux Colour Profiles

The PDF/A files you get can be considerably smaller than the originals. A 10 MB LibreOffice Impress slide deck from a presentation on OpenStreetMap that I gave last week shrunk down to 1.3 MB when saved by GSView, with only very minor JPEG gribblies visible in the slide background. The graphic above (modified from the Ghostscript example file ‘golfer.eps’; yay, Illustrator 1.0!) shrunk by ⅓. These are handy savings, plus you get a standalone archival format that will never change!

The Gadsden Flag Nouveau

The Gadsden Flag Nouveau

Instagram filter used: Normal

View in Instagram ⇒

I made this after being inspired by this MetaFilter comment. There’s a PDF linked from the image below:

gadsden_nouveau

JPEG 2000 on Ubuntu — without anyone getting stabbed

Aargh! Ubuntu 16.10 has decided that ImageMagick doesn’t need JPEG 2000 support, and will quietly (and very very wrongly) write JP2s as JPEG.

(NB: JPEG 2000 images still maybe crash Ubuntu’s file browser in 14.10. My old installation didn’t like them, but my reinstall seems quite happy. Go figure.)

JPEG 2000 is a great image file format: well-defined, and able to store high quality photographic data in a very small space. It truly is the JPEG of the 2000s — except for its dismal support under Ubuntu.

The problem is the patents. An open library has been a long time coming, and lots of Linux software is built without JP2 support. This helped keep it away from my desktop.

Under Ubuntu 14.04, here’s what does and doesn’t support JP2 files:

  • Gimp — not supported. It appears to have a non-functioning plugin that tries to read the file, then gives up. This is annoying, as Gimp is defined as the system default viewer for JPEG 2000.
  • Image Viewer — does support JP2, but occasionally mis-renders pages. To make this the default, right-click on a JP2 file, and select Open with → Other Application …, then choose Image Viewer. It should work from then onwards.
  • Document Viewer — a bit rough when looking at JPEG 2000-encoded PDFs. Very slow, too.
  • GraphicsMagick — seems to be the most painless way of converting graphics files to JPEG 2000. My preferred method of invoking it is:
    gm convert -define 'jp2:rate=0.008' in.png out.jp2
    The rate option should be a small number; the smaller, the greater the compression, and the worse the image quality.
  • OpenJPEG — provides the image_to_j2k and j2k_to_image tools. Far more picky about input formats than it should be, and often fails on seemingly perfect input.
  • img2pdf — (built from source)  is a tiny gem of a package. All it does is wrap various image formats into a PDF file. It doesn’t modify the image data in any way, so with a bit of ingenuity (and pdftk) you can use PDF as a true metafile archive. You can view the content on any platform, but get the source images out bit-for-bit perfect. We used to call files which could contain files metafiles, but that stopped being popular when TIFF started to be a baroque travesty of an image container back in the mid-1990s.
  • poppler — (for full features, build from source) has a tool, pdfimages, which can extract embedded image files from PDFs. Some of the metadata might get lost, but all of the image bits come through.

Since JPEG 2000 isn’t included in web browsers (grar), I’ve embedded a sample scanned JPEG into a PDF, and added a series of progressively more compressed JPEG 2000 versions: JPEG-2000Booklet [PDF].  The booklet has notes showing the byte size of each page. The image still looks pretty good at 8% of the original file size!

Precisely what nobody wanted — the Hershey fonts as a huge great PDF

HersheyNumberedOccidentalGrid-section

It’s impractically huge, but under the image link lives a table of all of the Hershey fonts (well, the Western ones, at least). It’s interesting to note Dr Hershey’s preferences in this pre-ASCII table: almost every variant has degree, minute and second symbols, but none of them have ‘\’. Many of them don’t have ‘@’, either, so no e-mail addresses in Hershey Fraktur for you …

ICQuestionBank2csv

ICQuestionBank2csv: A tool to extract both the Basic and Advanced Amateur Radio Examination guides from Industry Canada’s rather annoying two-column PDFs. Written for IC’s 2014-02 database updates.

See: Amateur Radio Exam Generator.

Written by Stewart C. Russell (aka scruss) / VA3PID – 2014-03-07.

Requirements

  • Perl, with Text::CSV_XS
  • xpdf tools
  • Bash
  • wget

Usage

Run either basic2csv.sh or advanced2csv.sh to download the source PDF and extract the data.

Licence

WTFPL (srsly).

For all your HP 7470a plotter manual needs

On the off chance you need to control a 30 year old graphics plotter, have I got something for you:

HP-7470A_Graphics_Plotter-Interfacing_and_Programming_Manual

The image links to a scanned copy of the HP 7470A Graphics Plotter: Interfacing and Programming Manual which I found on the web, and cleaned up. The pages have been OCR’d, so it should be searchable.

Protext lives!

Protext screenshot (dosemu)

Oh man, Protext! For years, it was all I used: every magazine article, every essay at university (all two of them), my undergraduate dissertation (now mercifully lost to time: The Parametric Design of a Medium Specific Speed Pump Impeller, complete with spline-drawing code in HiSoft BASIC for the Amiga, is unlikely to be of value to anyone these days), letters — you name it, I used Protext for it.

I first had it on 16kB EPROM for the Amstrad CPC464; instant access with |P. I then ran it on the Amiga, snagging a cheap copy direct from the authors at a trade show. I think I had it for the PC, but I don’t really remember my DOS days too well.

The freeware version runs quite nicely under dosemu. You can even get it to print directly to PDF:

  1. In your Linux printer admin, set up a CUPS PDF printer. Anything sent to it will appear as a PDF file in the folder ~/PDF.
  2. Add the following lines to your ~/.dosemurc:
    $_lpt1 = “lpr -l -P PDF”
    $_printer_timeout = (20)
  3. In Protext, configure your printer to be a PostScript printer on LPT1:

The results come out not bad at all:

protext output as pdfProtext’s file import and export is a bit dated. You can use the CONVERT utility to create RTF, but it assumes Code page 437, so your accents won’t come out right. Adding \ansicpg437 to the end of the first line should make it read okay.

(engraving of Michel de Montaigne in mad puffy sleeves: public domain from Wikimedia Commons: File:Michel de Montaigne 1.jpg – Wikimedia Commons)

Wind Power, 1940s style

smith putnam wind turbineThis is how wind turbines were supposed to look, at least in the 1940s. It’s the experimental Smith-Putnam 1.25 MW unit than ran for a short while on a hill near Rutland, VT. The picture’s from a rather falling-apart copy of Large Horizontal-axis Wind Turbines (Thresher, R. W., & Solar Energy Research Institute. (1982). Large horizontal-axis wind turbines: Proceedings of a workshop held in Cleveland, Ohio, July 28-30, 1981. Golden, Colo: Solar Energy Research Institute) that I rescued from Jim‘s recycling years ago.

The first part of these proceedings has a historical review of the Smith-Putnam turbine, including an excerpt from the S. Morgan Smith Company’s house organ on the project. As the rest of the book is pretty much all about the MOD series of turbines, it’s of less interest. I’ve scanned the bits about the Smith-Putnam turbine, and put them here: NASA_DOE-1981-large_horizontal_axis_wind_turbines-excerpt. If anyone wants the book, let me know. It’s very ratty, but readable.

I’ve written about this turbine before, but in relation to a packet of crayons. More awesome turbine pictures from Paul Gipe: Smith-Putnam Industrial Photos.

.awesome

Here are the complete 1988-vintage Sun manuals “Using NROFF and TROFF” and “Formatting Documents” scanned just for you. I’d scanned these in 2000, and they’d sat on a forgotten archive volume since then.

(if you need to get your troff on, go to Ralph’s troff.org.)

pretty-printing Arduino sketches

I don’t often need it, but the code printing facility in the Arduino IDE is very weak. It has some colour highlighting, but no page numbering, no line numbering, and no headers at all.

a2ps will sort you right out here. Years back, it was a simple text to PostScript filter, but now it has many wonderful filters for pretty-printing code. The Wiring/Arduino language is basically C++, and a2ps knows how to deal with that. So, to create a PostScript file with a nice version of the the most basic Blink sketch:

a2ps --pro=color -C -1 -M letter -g --pretty-print='c++' -o ~/Desktop/Blink.ps Blink.ino

If you’re somewhere that uses sensible paper sizes (in other words, not North America), you probably don’t want the -M letter option. a2ps is supposed to have a PDF print option (-P pdf), but it doesn’t work on my installation, so I just splat the output through ps2pdf. The results are linked below:

Not bad, eh?

(Update: think I must have written this post on a Mac with a case-insensitive filesystem. Using the --pretty-print='C++' option I had before failed on Linux.)

My bank broke PDF … and how I used PDFBeads to fix it

I’m on a major decluttering toot. When I realised that the filing cabinet I bought three years ago would no longer close with all the papers stuffed in it, I knew something had to change. I’ve been shredding like it’s Houston in 2001. I have the duplex scanner to suck in the stuff I need to keep. I’m moving to paperless wherever possible to stop it building up again.

My bank provides PDF statements. Of this I approve. PDF is almost perfect for this: it provides an electronic version of the page, but with searchable text and the potential for some level of security. Except, this is not the way that my bank does it. At first glance, the text looks pretty harmless:

Zoom in, and it gets a bit blocky:

Zoom right in:

Aargh! Blockarama! Did they really store text as bitmaps? Sure enough, pdftotext output from the files contains no text. Running pdfimages produces hundreds of tiny images; here’s just a few:

Dear oh dear. This format is the worst of electronic, combined with paper’s lack of computer indexability. The producer claims to be Xenos D2eVision. Smooth work there, Xenos.

So, how can I fix this? It’s a bit of a pain to set this workflow up, but what I’ve done is:

  1. Convert the PDF to individual TIFF files at 300 dpi. Ghostscript is good for this:
    gs -SDEVICE=tiffg4 -r300x300 -sOutputFile=file%03d.tif -dNOPAUSE -dBATCH -- file.pdf
  2. Run Tesseract OCR on the TIFF files to make hOCR output:
    for f in file*tif
    do
    tesseract $f `basename $f` hocr
    done

    Update: Cuneiform seems to work better than Tesseract when feeding pdfbeads:
    for f in file*tif
    do
    cuneiform -f hocr -o `basename $f .tif`.html $f
    done
  3. Move the images and the hOCR files to a new folder; the filenames must correspond, so file001.tif needs file001.html, file002.tif file002.html, etc.
  4. In the new folder, run pdfbeads * > ../Output.pdf

The files are really small, and the text is recognized pretty well. It still looks pretty bad:

but at least the text can be copied and indexed.

This thread “Convert Scanned Images to a Single PDF File” got me up and running with PDFBeads. You might also have success using the method described here: “How to extract text with OCR from a PDF on Linux?” — it uses hocr2pdf to create single-page OCR’d PDFs, then joins them.

TCA is online

Radio Amateurs of Canada may seem a bit slow at times, but they’ve quietly gone and put their magazine The Canadian Amateur online. It has a decent interface, definitely up there with Exact Editions‘ work:

The files are downloadable as PDF, too. They look pretty decent on my e-reader:

(and yes, that is really an article about making a contact over 121km using a 5mW laser)

I don’t think any of the editions before 2012 will be going online. It would be nice, but RAC is severely limited in resources. The almost total lack of fanfare is a contrast to the ARRL’s digital QST, which is much announced but not actually available yet …

pdf qsl cards for all!

I’m right chuffed that people have started sending me PDF QSL cards in return. Here are just a few samples:

(Okay, so Federico’s was a JPEG; near enough …)

Thanks to all who sent them.

Yours truly, ‰te>a…t

I like epost. I’d like it even more if they hurried up and processed my direct payment ability — which required a form and a void cheque mailed to an address in Toronto — but it’s a pretty good service. I get my bills, viewable and payable online, on the day of issue. No paper. This is good.

This is good because every single filing container I buy eventually ends up full of (paid) bills and financial administrivia. Less paper = less messy Stewart = happy Stewart. Some messes, like my electronics table, could be classed as glorious, however, and therefore joyous in their creation and use. Not all tidiness is good.

So I got my first Visa bill by e-post. Yay! Reviewed it, paid it. No hassle. But since this a PDF facsimile of my bill, something mighty odd has happened to my address:

It’s a perfect substitution cypher of my name and address. I’ve been out of the prepress industry for long enough not to immediately recognize it as a font encoding error. I’m confused why it might have A, T, E & N, but no M. Odd indeed.