Tag Archives: scan

A (mostly) colour-managed workflow for Linux for not too many $$$

Colour management is good. It means that what I see on the screen is what you meant it to look like, and anything I make with a colour-managed workflow you’ll see in the colours I meant it to have. (Mostly.) You can spend a lot of money to do this professionally, but you can also get most of the benefits for about $125, if you’re prepared to do some fiddly stuff.

The most important part is calibrating your display. Hughski’s ColorHug (which I’ve mentioned before) is as close to plug-and-play as you’ll get: plug it in, and the colour management software pops up with prompts on what to do next. Attach the ColorHug to the screen (with the newly supplied stretchy band), let it burble away for 10–20 minutes, and the next time you log in, colours will be just right.

Calibrating the scanner on my Epson WorkForce WF-7520 was much more work, and the process could use optimization. To calibrate any scanner, you need a physical colour target to scan and compare against reference data. The cheapest place to get these (unless there was one in the box with your scanner) is Wolf Faust’s Affordable IT 8.7 (ISO 12641) Scanner Colour Calibration Targets. If there are a bunch of likeminded folk in your area, it’s definitely worth clubbing together on a group buy to save on shipping. It’s also less work for Wolf, since he doesn’t have to send out so many little packages.

(I’ve known of Wolf Faust since my Amiga days. He produced the most glorious drivers for Canon printers, and Jeff Walker produced the camera-ready copy for JAM using Wolf’s code. While Macs had the high end DTP sewn up back then, you could do amazing things on a budget with an Amiga.)

colour target

The target comes packed in a protective sleeve, along with a CD-R containing the calibration data that matches the target’s print run. Wolf makes a lot of targets for OEMs, and cost savings from his volume clients allow him to sell to individuals cheaply.

Scanning the thing without introducing automatic image corrections was the hard part. I found that my scanner had two drivers (epson2 and epkowa), the latter of which claimed to support 48-bit scanning. Unfortunately, it only supports 24-bit, like the epson2 driver, so the choice between them was moot. I used the scanimage command line tool to make the scan:

scanimage --mode Color -x 175 -y 125 --format=tiff --resolution 300 > Epson-Workforce_WF-7520-WFaust-R1.tiff

which looks, when reduced down to web resolution, a bit like this:

Epson-Workforce_WF-7520-WFaust-R1

It looks a lot darker than the physical target, so it’s clear that the scanner needs calibrating. To do this, you need two tools from the Argyll Colour Management System. The first creates a text representation of the scanned target’s colour patches:

scanin -v Epson-Workforce_WF-7520-WFaust-R1.tiff /usr/share/color/argyll/ref/it8.cht IT87/r130227.txt diag.tiff

The result is a smallish text file Epson-Workforce_WF-7520-WFaust-R1.ti3 which needs one more step to make a standard ICC profile:

colprof -A Epson -M 'Workforce WF-7520' -D 'WFaust R1' -ax -qm Epson-Workforce_WF-7520-WFaust-R1

I didn’t quite need to add that much metadata, but I could, so I did. The resultant ICC file can be used to apply colour calibrations to scanned images. Here’s the target scan, corrected:

Epson-Workforce_WF-7520-WFaust-R1-corrected

(I’ve made this a mouseover with the original image, so you can see the difference. Also, yes, there is a greasy thumb-print on my scanner glass near the bottom right, thank you so much for noticing.)

I used tifficc from the Little CMS package to apply the colour correction:

tifficc -v -i Epson-Workforce_WF-7520-WFaust-R1.icc Epson-Workforce_WF-7520-WFaust-R1.tiff Epson-Workforce_WF-7520-WFaust-R1-corrected.tiff

There are probably many easier, quicker ways of doing this, but this was the first thing I found that worked.
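Pulling the four steps together, here is a sketch of a helper that prints the commands used above, in order, for a given basename. The name calibration_commands is hypothetical; the flags are exactly those shown above, the metadata strings are the ones from this scanner, and paths containing spaces are not handled. Pipe the output to sh to actually run it.

```shell
#!/bin/sh
# Sketch: print the scan/profile/correct commands for one scanner and target.
# Nothing is executed here; pipe the output to sh to run the commands.

calibration_commands() {
    base=$1    # basename for the scan, .ti3, .icc and corrected files
    chart=$2   # Argyll chart recognition file (.cht)
    ref=$3     # reference data supplied with the target

    # 1. Raw scan of the target
    echo "scanimage --mode Color -x 175 -y 125 --format=tiff --resolution 300 > $base.tiff"
    # 2. Read the colour patches from the scan into $base.ti3
    echo "scanin -v $base.tiff $chart $ref diag.tiff"
    # 3. Build the ICC profile from the .ti3 file
    echo "colprof -A Epson -M 'Workforce WF-7520' -D 'WFaust R1' -ax -qm $base"
    # 4. Apply the profile to the original scan
    echo "tifficc -v -i $base.icc $base.tiff $base-corrected.tiff"
}

calibration_commands Epson-Workforce_WF-7520-WFaust-R1 \
    /usr/share/color/argyll/ref/it8.cht IT87/r130227.txt
```

Printing the commands first makes it easy to check filenames before anything touches the scanner.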

To show you a real example, here’s an un-retouched scan of the cover of Algrove Publishing’s book “All the Knots You Need”, scanned at 75 dpi. Mouseover to see the corrected version:

all-the-knots-you-need_algrove

(Incidentally, there are two old but well-linked programs that are out there that purport to do scanner calibration: Scarse and LPROF. Don’t use them! They’re really hard to build on modern systems, and the Argyll tools work well.)

The last part of my workflow that remains uncalibrated is my printer. I could make a target with Argyll, print it, scan it, colour correct it, then use that as the input to colprof as above. I suspect the results would be mediocre, as my scanner’s bit depth isn’t great, and I’d have to repeat the process for every combination of paper and print settings. I’d also have to work out what magic CUPS does and compensate. Maybe later, but not yet.

Wind Power, 1940s style

smith putnam wind turbine

This is how wind turbines were supposed to look, at least in the 1940s. It’s the experimental Smith-Putnam 1.25 MW unit that ran for a short while on a hill near Rutland, VT. The picture’s from a rather falling-apart copy of Large Horizontal-axis Wind Turbines (Thresher, R. W., & Solar Energy Research Institute. (1982). Large horizontal-axis wind turbines: Proceedings of a workshop held in Cleveland, Ohio, July 28-30, 1981. Golden, Colo: Solar Energy Research Institute) that I rescued from Jim’s recycling years ago.

The first part of these proceedings has a historical review of the Smith-Putnam turbine, including an excerpt from the S. Morgan Smith Company’s house organ on the project. As the rest of the book is pretty much all about the MOD series of turbines, it’s of less interest. I’ve scanned the bits about the Smith-Putnam turbine, and put them here: NASA_DOE-1981-large_horizontal_axis_wind_turbines-excerpt. If anyone wants the book, let me know. It’s very ratty, but readable.

I’ve written about this turbine before, but in relation to a packet of crayons. More awesome turbine pictures from Paul Gipe: Smith-Putnam Industrial Photos.

.awesome

Here are the complete 1988-vintage Sun manuals “Using NROFF and TROFF” and “Formatting Documents” scanned just for you. I’d scanned these in 2000, and they’d sat on a forgotten archive volume since then.

(if you need to get your troff on, go to Ralph’s troff.org.)

Too many QR Codes

I have, of late, been rather more attached to QR Codes than might be healthy. I’ve been trying all sorts of sizes and input data, printing them, and seeing what camera phones can scan them. I tried three different devices to scan the codes:

  • iPhone 4s – 8 MP, running either i-nigma (free) or Denso Wave’s own QRdeCODE ($2). QRdeCODE is better, but then, it should be, since it was created by the developer of the QR Code standard.
  • Nexus 7 – 1.2 MP, running Google Goggles.
  • Nokia X2-01 – Catherine’s new(ish) phone, which I can’t believe only has a 0.3 MP VGA camera on it. Still, it worked for a small range of codes.

QR Code readability is defined by the module size; that is, the number of device pixels (screen or print) that represent a single QR Code pixel. Denso Wave recommends that each module is made up of 4 or more dots. I was amazed that the iPhone could read images with a module size of 1 from the screen, like this one:

hello_____-ei-m01-300dpi

On this laptop, one pixel is about 0.24 mm. The other cameras didn’t fare so well on reading from the screen:

  • iPhone 4s – Min module size: 1-2 pixels (0.24-0.48 mm/module)
  • Nexus 7 – Min module size: 2-3 pixels (0.48-0.72 mm/module)
  • Nokia X2-01 – Min module size: 3-4 pixels (0.72-0.96 mm/module)

So I guess for screen scanning, Denso Wave’s recommendation of 4 pixels/module will pretty much work everywhere.

I then generated and printed a bunch of codes on a laser printer, and scanned them. The results were surprisingly similar:

  • iPhone 4s – Min module size: 3-4 dots (0.25-0.34 mm/module)
  • Nexus 7 – Min module size: 4-5 dots (0.34-0.42 mm/module)
  • Nokia X2-01 – Min module size: 8-9 dots (0.68-0.76 mm/module)

A test print on an inkjet gave far less impressive results. I reckon you need to make the module size around 25% bigger on an inkjet than on a laser, perhaps because the inkjet is less crisp.
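The millimetre figures in the laser list are consistent with a 300 dpi print, where one dot is 25.4/300 mm. Here is a small sketch of that conversion, assuming 300 dpi (the post does not state the printer resolution) and using a made-up helper name, module_mm:

```shell
# module_mm DOTS DPI: printed width of one QR module in millimetres
module_mm() {
    awk -v dots="$1" -v dpi="$2" 'BEGIN { printf "%.2f\n", dots * 25.4 / dpi }'
}

# The module sizes from the laser printer list above
for dots in 3 4 5 8 9; do
    echo "$dots dots at 300 dpi = $(module_mm "$dots" 300) mm/module"
done
```

Pass a different DPI value as the second argument to see how the same dot counts work out on another printer.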

I have to admit I went a bit nuts with QR Codes. I made a Vcard: my vcard

(and while I was at it, I created a new field for ham radio operators: X-CALLSIGN. Why not?). I even encoded some locations in QR Codes.

Just to show you what qrencode can do, here’s a favourite little piece of prose:

a_real_man

optar: paper-based archiving

I’ve spent most of the day messing around with Twibright Optar, a way of creating printed archives of binary data that can be scanned back in and restored.  It looks like it was written as a proof-of-concept, as the only way to change options is to modify the code and recompile. Eppur si muove.

To compile the code on OS X, I found I had to change this line in the Makefile from:

LDFLAGS=-lm

to

LDFLAGS=-lm  `libpng-config --L_opts`

After trying to print some samples at the default resolution, I had no luck, so for reliability I halved the data density settings in the file optar.h:

#define XCROSSES 33 /* Number of crosses horizontally */
#define YCROSSES 43 /* Number of crosses vertically */

It’s quite important that your image prints and scans with a whole number of printer dots to image pixels. This used to be quite easy to do, before the advent of PDF’s “Scale to fit” misfeature, and also printer drivers that do a tonne of work in the background to “improve” the image. Add the mismatch between laser printer resolutions (300, 600, 1200 dpi …) and inkjets (360, 720, 1440 dpi …), and you’ve got lots of ways that this can go wrong.

Thankfully, there’s one common resolution that works across both types of printers. If you output the image at 120 dpi, each image pixel is exactly five laser printer dots at 600 dpi, or six inkjet dots at 720 dpi. And there was peace in the kingdom.
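The 120 dpi figure is the greatest common divisor of 600 and 720, so it is the highest image resolution that divides evenly into both. A minimal sketch of that arithmetic follows; the gcd framing and the function name are an illustration, not something optar provides:

```shell
# gcd A B: greatest common divisor, by Euclid's algorithm
gcd() {
    a=$1 b=$2
    while [ "$b" -ne 0 ]; do
        t=$((a % b))
        a=$b
        b=$t
    done
    echo "$a"
}

laser=600
inkjet=720
image=$(gcd "$laser" "$inkjet")

echo "image resolution: $image dpi"
echo "laser dots per image pixel: $((laser / image))"
echo "inkjet dots per image pixel: $((inkjet / image))"
```

Any divisor of that result (60 dpi, 40 dpi and so on) also gives whole dots per pixel, at the cost of lower data density.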

Here’s a demo, based on this:

So I took this track (which I used to have as a 7″, got at a jumble sale in the mid-70s) and converted it to a really low quality MPEG-2.5: MichelinJingle8kbit. That’s 175KB for just shy of three minutes of music (which, at this bitrate, sounds like it’s played through a layer of socks at the bottom of the Mariana Trench, but still).
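As a rough sanity check on those numbers, file size and bitrate give the playing time. This is only a sketch, assuming a constant 8 kbit/s stream with no header overhead; duration_s is a name made up for the example:

```shell
# duration_s BYTES BITS_PER_SECOND: playing time in whole seconds,
# assuming a constant bitrate and no header overhead
duration_s() {
    echo $(( $1 * 8 / $2 ))
}

duration_s 175000 8000   # 175 KB read as 175 x 1000 bytes
duration_s 179200 8000   # 175 KB read as 175 x 1024 bytes
```

Either reading of “KB” comes out a few seconds under three minutes, which fits the description.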

Passing it through optar (which I wish wouldn’t produce PGM files; its output is mono) and bundling the pages into a PDF, I get this: optar_mj.pdf (760KB). Scanning that printout at 600dpi and running the pages through unoptar, I got this: optar1_mj.mp3. It’s the same as the input file, except padded with zeros at the end.

Sometimes, the scanning and conversion doesn’t do so well:

  • mjoptar300dpi.mp3 — this is what happens when you scan at too low a resolution.
  • mjx.mp3 — I have no idea what went wrong here, but: glitchtastic!

My bank broke PDF … and how I used PDFBeads to fix it

I’m on a major decluttering toot. When I realised that the filing cabinet I bought three years ago would no longer close with all the papers stuffed in it, I knew something had to change. I’ve been shredding like it’s Houston in 2001. I have the duplex scanner to suck in the stuff I need to keep. I’m moving to paperless wherever possible to stop it building up again.

My bank provides PDF statements. Of this I approve. PDF is almost perfect for this: it provides an electronic version of the page, but with searchable text and the potential for some level of security. Except, this is not the way that my bank does it. At first glance, the text looks pretty harmless:

Zoom in, and it gets a bit blocky:

Zoom right in:

Aargh! Blockarama! Did they really store text as bitmaps? Sure enough, pdftotext output from the files contains no text. Running pdfimages produces hundreds of tiny images; here’s just a few:

Dear oh dear. This format is the worst of electronic, combined with paper’s lack of computer indexability. The producer claims to be Xenos D2eVision. Smooth work there, Xenos.

So, how can I fix this? It’s a bit of a pain to set this workflow up, but what I’ve done is:

  1. Convert the PDF to individual TIFF files at 300 dpi. Ghostscript is good for this:
    gs -sDEVICE=tiffg4 -r300x300 -sOutputFile=file%03d.tif -dNOPAUSE -dBATCH -- file.pdf
  2. Run Tesseract OCR on the TIFF files to make hOCR output:
    for f in file*tif
    do
    tesseract "$f" "$(basename "$f" .tif)" hocr
    done

    Update: Cuneiform seems to work better than Tesseract when feeding pdfbeads:
    for f in file*tif
    do
    cuneiform -f hocr -o `basename $f .tif`.html $f
    done
  3. Move the images and the hOCR files to a new folder; the filenames must correspond, so file001.tif needs file001.html, file002.tif file002.html, etc.
  4. In the new folder, run pdfbeads * > ../Output.pdf
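Since the image and hOCR filenames must correspond, a quick check before step 4 can save a confusing run. Here is a minimal sketch, assuming the .tif/.html naming from step 3; missing_hocr is a hypothetical helper, not part of PDFBeads:

```shell
# missing_hocr DIR: list the .tif pages in DIR that have no matching .html file
missing_hocr() {
    for tif in "$1"/*.tif; do
        [ -e "$tif" ] || continue        # glob matched nothing
        html="${tif%.tif}.html"          # file001.tif -> file001.html
        [ -e "$html" ] || basename "$tif"
    done
}

# Check the current folder; no output means every page has its hOCR file
missing_hocr .
```

If it prints any names, re-run the OCR step for those pages before calling pdfbeads.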

The files are really small, and the text is recognized pretty well. It still looks pretty bad:

but at least the text can be copied and indexed.

This thread “Convert Scanned Images to a Single PDF File” got me up and running with PDFBeads. You might also have success using the method described here: “How to extract text with OCR from a PDF on Linux?” — it uses hocr2pdf to create single-page OCR’d PDFs, then joins them.

creating a TrueType font from your handwriting with your scanner, your printer, and FontForge

This looks more than a bit like my handwriting

because it is my handwriting! Sure, the spacing of the punctuation needs major work, and I could have fiddled with the baseline alignment, but it’s legible, which is more than can usually be said of my own chicken-scratch.

This process is a little fiddly, but all the parts are free, and it uses free software. This all runs from the command line. I wrote and tested this on a Mac (with some packages installed from DarwinPorts), but it should run on Linux. It might need Cygwin under Windows; I don’t know.

Software you will need:

  • a working Perl interpreter
  • NetPBM, the free graphics converter toolkit
  • FontForge, the amazing free font editor. (Yes, I said amazing. I didn’t say easy to use …)
  • autotrace or potrace so that FontForge can convert the scanned bitmaps to vectors
  • some kind of bitmap editor.

You will need to download

  • fonttrace.pl – splits up a (very particular) bitmap grid into character cells
  • chargrid.pdf – the font grid template for printing

Procedure:

  1. Print at least the first page of chargrid.pdf. The second page is guidelines that you can place under the page. This doesn’t work very well if you use thick paper.
  2. Draw your characters in the boxes. Keep well within the lines; there’s nothing clever about how fonttrace.pl splits the page up.
  3. Scan the page, making sure the page is as straight as possible and the scanner glass is spotless. You want to scan in greyscale or black and white.
  4. Crop/rotate/skew the page so the very corners of the character grid table are at the edges of the image. I find it helpful at this stage to clean off any specks/macules. I also scale and threshold the image so I get a very dark image at 300-600dpi.
  5. Save the image as a Portable Bitmap (PBM). It has to be 1-bit black and white. You might want to put a new font in a new folder, as the next stage creates lots of files, and might overwrite your old work.
  6. Run fonttrace.pl like this:
    fonttrace.pl infile.pbm | sh
    If you miss out the call to the shell, it will just print out the commands it would have run to create the character tiles.
  7. This should result in a bunch of files called uniNNNN.png in the current folder, like these: uni0057.png (W), uni0069.png (i), uni0073.png (s), uni0070.png (p), uni0079.png (y).

  8. Fire up FontForge. You’ll want to create a New font. Now File→Import…, and use Image Template as the format. Point it at the first of the image tiles (uni0020.png), and Import.
  9. Select Edit→Select→All, then Element→Autotrace. You’ll see your characters appear in the main window.
  10. And that’s – almost – it. You’ll need to fiddle with (auto)spacing, set up some kerning tables, set the font name (in Element→Font Info … – and you’ll probably want to set the em scale to 1024, as TrueType fonts like powers of two), then File→Generate Fonts. FontForge will throw you a bunch of warnings and suggestions, and I’d recommend reading the help to find out what they mean.

There are a couple of limitations to the process:

  • Most of the above process could be written into a FontForge script to make things easier
  • Only ASCII characters are supported, to keep the number of scanned pages simple. Sorry. I’d really like to support more. You’re free to build on this.

Lastly, a couple of extra files:

  • CrapHand2.pbm – a sample array drawn by me, gzipped for your inconvenience (and no, I don’t know why WordPress is changing the file extension to ‘pbm_’ either).
  • chargrid.ods – the OpenOffice spreadsheet used to make chargrid.pdf

Have fun! Write nicely!

to scan film, or not

I’ve recently taken up film photography again. But processing is expensive.

To have 24 exposures processed and scanned at 6MP at Downtown Camera costs $12 + tax. That’s a pretty good price for black and white.

I can process at home (yay stinky toxic chemicals!) for a bit less. I’d need to buy a scanner, and the cheapest film scanners come in at around $300.

What to do, what to do?

Lady Goosepelt Rides Again!

Lady Goosepelt, from What a Life!

In case anyone wants them, the 600 dpi page images of What a Life! are stored in this PDF: what_a_life.pdf (16MB). If you merely wish to browse, all the images from the book are here.

I got a bit carried away with doing this. Instead of just smacking together all the 360 dpi TIFFs I scanned seven years ago, I had to scan a new set at a higher resolution, then crop them, then fix the page numbers, add chapter marks, and make the table of contents a set of live links.

I’ve got out of the way of thinking in PostScript, so I spent some time looking for tools that would do things graphically. Bah! These things’d cost a fortune, so armed only with netpbm, libtiff, ghostscript, the pdfmark reference, Aquamacs, awk to add content based on the DSC, and gimp to work out the link zones on the contents page, I made it all go. Even I’m impressed.

One thing that didn’t impress me, though:

aquamacs file size warning

I used to edit multi-gigabyte files with emacs on Suns. They never used to complain like this. They just loaded (admittedly fairly slowly) and let me do my thing. Real emacs don’t give warning messages.

All the printers I’ve ever owned …

bird you can see: hp print test

  • An ancient (even in 1985) Centronics serial dot-matrix printer that we never got working with the CPC464. The print head was driven along a rack, and when it hit the right margin, an idler gear was wedged in place, forcing the carriage to return. Crude, noisy but effective.
  • Amstrad DMP-2000. Plasticky but remarkably good 9-pin printer. Had an open-loop ribbon that we used to re-ink with thick oily endorsing ink until the ribbons wore through.
  • NEC Pinwriter P20. A potentially lovely 24-pin printer ruined by a design flaw. Print head pins would get caught in the ribbon, and snap off. It didn’t help that the dealer that sold it to me wouldn’t refund my money, and required gentle persuasion from a lawyer to do so.
  • Kodak-Diconix 300 inkjet printer. I got this to review for Amiga Computing, and the dealer never wanted it back. It used HP ThinkJet print gear which used tiny cartridges that sucked ink like no tomorrow; you could hear the droplets hit the page.
  • HP DeskJet 500. I got this for my MSc thesis. Approximately the shape of Torness nuclear power station (and only slightly smaller), last I heard it was still running.
  • Canon BJ 200. A little mono inkjet printer that ran to 360dpi, or 720 if you had all the time in the world and an unlimited ink budget.
  • Epson Stylus Colour. My first colour printer. It definitely couldn’t print photos very well.
  • HP LaserJet II. Big, heavy, slow, and crackling with ozone, this was retired from Glasgow University. Made the lights dim when it started to print. Came with a clone PostScript cartridge that turned it into the world’s second-slowest PS printer. We did all our Canadian visa paperwork on it.
  • Epson Stylus C80. This one could print photos tolerably well, but the cartridges dried out quickly, ruining the quality and making it expensive to run.
  • Okidata OL-410e PS. The world’s slowest PostScript printer. Sold by someone on tortech who should’ve known better (and bought by someone who also should’ve known better), this printer jams on every sheet fed into it due to a damaged paper path. Unusually, it uses an LED imaging system instead of laser xerography, and has a weird open-hopper toner system that makes transporting a part-used print cartridge a hazard.
  • HP LaserJet 4M Plus. With its duplexer and extra paper tray it’s huge and heavy, but it still produces crisp pages after nearly 1,000,000 page impressions. I actually have two of these; one was bought for $99 refurbished, and the other (which doesn’t print nearly so well) was got on eBay for $45, including duplexer and 500-sheet tray. Combining the two (and judiciously adding a bunch of RAM) has given me a monster network printer which lets you know it’s running by dimming the lights from here to Etobicoke.
  • IBM Wheelwriter typewriter/daisywheel printer. I’ve only ever produced a couple of pages on this, but this is the ultimate letter-quality printer. It also sounds like someone slowly machine-gunning the neighbourhood, so mostly lives under wraps.
  • HP PhotoSmart C5180. It’s a network photo printer/scanner that I bought yesterday. Really does print indistinguishably from photos, and prints direct from memory cards. When first installed, makes an amusing array of howls, boinks, squeals, beeps and sproings as it primes the print heads.

stones, as current vernacular would have it

Finatics' sign, by Big Al's

I’m no fan of billboards, but I have to congratulate Mike of Finatics for sheer gall when he put up this sign. See the plastic shark on the building behind? That’s Big Al’s, one of the biggest aquarium stores in Canada. Mike’s probably not going to get any favours from them any time soon.

in Aalborg

I like Aalborg. I think we’re staying in exactly the same hotel (the Scandic) as I stayed in 10 years ago with RES. We’re going to see some really big wind turbines tomorrow.

Oh, and the Google Maps locations I picked off are pretty darn accurate; the one I double-clicked on for this hotel is less than 50m from my room. I like.