Raspberry Pi Zero 2 W: initial performance

Running A Pi Pie Chart turned out some useful performance numbers. It’s almost, but not quite, a Raspberry Pi 3B in a Raspberry Pi Zero form factor.

32-bit mode

Running stock Raspberry Pi OS with desktop, compiled with stock options:

pie chart comparing multi-thread numeric performance of Raspberry Pi Zero 2 W: slightly faster than a Raspberry Pi 2B
multi-thread results
pie chart comparing single-thread numeric performance of Raspberry Pi Zero 2 W: slightly faster than a Raspberry Pi 2B
single-thread results
time ./pichart-openmp -t "Zero 2W, OpenMP"
pichart -- Raspberry Pi Performance OPENMP version 36

Prime Sieve          P=14630843 Workers=4 Sec=2.18676 Mops=427.266
Merge Sort           N=16777216 Workers=8 Sec=1.9341 Mops=208.186
Fourier Transform    N=4194304 Workers=8 Sec=3.10982 Mflops=148.36
Lorenz 96            N=32768 K=16384 Workers=4 Sec=4.56845 Mflops=705.102

The Zero 2W, OpenMP has Raspberry Pi ratio=8.72113
Making pie charts...done.

real	8m20.245s
user	15m27.197s
sys	0m3.752s

-----------------------------

time ./pichart-serial -t "Zero 2W, Serial"
pichart -- Raspberry Pi Performance Serial version 36

Prime Sieve          P=14630843 Workers=1 Sec=8.77047 Mops=106.531
Merge Sort           N=16777216 Workers=2 Sec=7.02049 Mops=57.354
Fourier Transform    N=4194304 Workers=2 Sec=8.58785 Mflops=53.724
Lorenz 96            N=32768 K=16384 Workers=1 Sec=17.1408 Mflops=187.927

The Zero 2W, Serial has Raspberry Pi ratio=2.48852
Making pie charts...done.

real	7m50.524s
user	7m48.854s
sys	0m1.370s

64-bit

Running stock/beta 64-bit Raspberry Pi OS with desktop. Curiously, these ran out of memory (at least, in oom-kill‘s opinion) with the desktop running, so I had to run from console. This also meant it was harder to capture the program run times.

The firmware required to run in this mode should be in the official distribution by now.

pie chart comparing 64 bit multi-thread numeric performance of Raspberry Pi Zero 2 W: slightly faster than a Raspberry Pi 2B
multi-thread, 64 bit: no, I can’t explain why Lorenz is better than a 3B+
pie chart comparing 64 bit single-thread numeric performance of Raspberry Pi Zero 2 W: slightly faster than a Raspberry Pi 2B
single thread, again with the bump in Lorenz performance
pichart -- Raspberry Pi Performance OPENMP version 36

Prime Sieve          P=14630843 Workers=4 Sec=1.78173 Mops=524.395
Merge Sort           N=16777216 Workers=8 Sec=1.83854 Mops=219.007
Fourier Transform    N=4194304 Workers=4 Sec=2.83797 Mflops=162.572
Lorenz 96            N=32768 K=16384 Workers=4 Sec=2.66808 Mflops=1207.32

The Zero2W-64bit has Raspberry Pi ratio=10.8802
Making pie charts...done.

-------------------------

pichart -- Raspberry Pi Performance Serial version 36

Prime Sieve          P=14630843 Workers=1 Sec=7.06226 Mops=132.299
Merge Sort           N=16777216 Workers=2 Sec=6.75762 Mops=59.5851
Fourier Transform    N=4194304 Workers=2 Sec=7.73993 Mflops=59.6095
Lorenz 96            N=32768 K=16384 Workers=1 Sec=9.00538 Mflops=357.7

The Zero2W-64bit has Raspberry Pi ratio=3.19724
Making pie charts...done.

The main reason for the Raspberry Pi Zero 2 W appearing slower than the 3B and 3B+ is likely that it uses LPDDR2 memory instead of LPDDR3. 64-bit mode provides is a useful performance increase, offset by increased memory use. I found desktop apps to be almost unusably swappy in 64-bit mode, but there might be some tweaking I can do to avoid this.

Unlike the single core Raspberry Pi Zero, the Raspberry Pi Zero 2 W can be made to go into thermal throttling if you’re really, really determined. Like “3 or more cores running flat-out“-determined. In my testing, two cores at 100% (as you might get in emulation) won’t put it into thermal throttling, even in the snug official case closed up tight. More on this later.

(And a great big raspberry blown at Make, who leaked the Raspberry Pi Zero 2 W release a couple of days ago. Not classy.)

bench64: a new BASIC benchmark index for 8-bit computers

Nobody asked for this. Nobody needs this. But here we are …

commodore 64 screen shot showing benchmark results:
basic bench index
>i good. ntsc c64=100

1/8 - for:
 60 s; 674.5 /s; i= 100
2/8 - goto:
 60 s; 442.3 /s; i= 100
3/8 - gosub:
 60 s; 350.8 /s; i= 100
4/8 - if:
 60 s; 242.9 /s; i= 100
5/8 - fn:
 60 s; 60.7 /s; i= 100
6/8 - maths:
 60 s; 6.4 /s; i= 100
7/8 - string:
 60 s; 82.2 /s; i= 100
8/8 - array:
 60 s; 27.9 /s; i= 100

overall index= 100


ready.
bench64 running on the reference system, an NTSC Commodore 64c

Inspired by J. G. Harston’s clever but domain-specific ClockSp benchmark, I set out to write a BASIC benchmark suite that was:

  1. more portable;
  2. based on a benchmark system that more people might own;
  3. and a bunch of other less important ideas.

Since I already had a Commodore 64, and seemingly several million other people did too, it seemed like a fair choice to use as the reference system. But the details, so many details …

basic bench index
>i good. ntsc c64=100

1/8 - for:
 309.5 s; 130.8 /s; i= 19 
2/8 - goto:
 367.8 s; 72.1 /s; i= 16 
3/8 - gosub:
 340.9 s; 61.7 /s; i= 18 
4/8 - if:
 181.8 s; 80.1 /s; i= 33 
5/8 - fn:
 135.3 s; 26.9 /s; i= 44 
6/8 - maths:
 110.1 s; 3.5 /s; i= 54 
7/8 - string:
 125.8 s; 39.2 /s; i= 48 
8/8 - array:
 103 s; 16.3 /s; i= 58 

overall index= 29
It was entirely painful running the same code on a real ZX Spectrum at under â…“ the speed of a C64

(I mean: who knew that Commodore PET BASIC could run faster or slower depending on how your numbered your lines? Not me — until today, that is.)

While the benchmark doesn’t scale well for BASIC running on modern computers — the comparisons between a simple 8-bit processor at a few MHz and a multi-core wildly complex modern CPU at many GHz just aren’t applicable — it turns out I may have one of the fastest 8-bit BASIC computers around in the matchbox-sized shape of the MinZ v1.1 (36.864 Z180, CP/M 2.2, BBC BASIC [Z80] v3):

BASIC BENCH INDEX
>I GOOD. NTSC C64=100

1/8 - FOR:
 3.2 S; 12778 /S; I= 1895 
2/8 - GOTO:
 6.1 S; 4324.5 /S; I= 978 
3/8 - GOSUB:
 3.1 S; 6789 /S; I= 1935 
4/8 - IF:
 2.9 S; 4966.9 /S; I= 2046 
5/8 - FN:
 3.5 S; 1030.6 /S; I= 1698 
6/8 - MATHS:
 1.5 S; 255.3 /S; I= 4000 
7/8 - STRING:
 2.6 S; 1871.6 /S; I= 2279 
8/8 - ARRAY:
 3.1 S; 540.3 /S; I= 1935 

OVERALL INDEX= 1839 

That’s more than 9× the speed of a BBC Micro Model B.

Github link: bench64 – a new BASIC benchmark index for 8-bit computers.

Archive download:

BeagleBone Black: slow as a dog

All benchmarks are artificial, but this one had me scratching my head. One hears  that the BeagleBone Black is screamingly fast compared to the Raspberry Pi; faster, newer processor, blahdeblah, mcbtyc, etc. I found the opposite is true.

So I buy one at the exceptionally soggy Toronto Mini Maker Faire. Props to the CircuitCo folks, they are easy to set up: just a mini-USB cable provides power and virtual network shell. And BoneScript — an Arduino-like JavaScript library — is very clever indeed. But I need to see if this thing has any grunt, and so I need a benchmark.

After hearing about the business-card raytracer, I thought it would be perfect. I compiled it on both machines with:

g++  -Ofast   card.cpp   -o card

and then ran it with:

time ./card > /dev/null

The results are … surprising:

  • Raspberry Pi: 4′ 15″
  • BeagleBone Black: 12′ 39″ → 3× slower

(In comparison, my i7 quad-core laptop runs it in 8½ seconds.)

I don’t have any explanation why the BBB is so much slower. It’s almost as if the compiler isn’t fully optimizing under Ã…ngström Linux.

Raspberry Pi: system info

$ uname -a
Linux rpi 3.6.11+ #538 PREEMPT Fri Aug 30 20:42:08 BST 2013 armv6l GNU/Linux

$ cat /proc/cpuinfo 
Processor    : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS    : 697.95
Features    : swp half thumb fastmult vfp edsp java tls 
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x0
CPU part    : 0xb76
CPU revision    : 7

Hardware    : BCM2708
Revision    : 000f

BeagleBone Black: system info

# uname -a
Linux beaglebone 3.8.13 #1 SMP Tue Jun 18 02:11:09 EDT 2013 armv7l GNU/Linux
# cat /proc/cpuinfo 
processor    : 0
model name    : ARMv7 Processor rev 2 (v7l)
BogoMIPS    : 297.40
Features    : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls 
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x3
CPU part    : 0xc08
CPU revision    : 2

Hardware    : Generic AM33XX (Flattened Device Tree)
Revision    : 0000

Both boards are running at stock speed.

Update: I’ve tried with an external power supply, and checked that the processor was running at full speed. It made no difference. I suspect that Raspbian enables armhf floating point by default, while Ã…ngström needs to be told to use it.