All benchmarks are artificial, but this one had me scratching my head. One hears that the BeagleBone Black is screamingly fast compared to the Raspberry Pi; faster, newer processor, blahdeblah, mcbtyc, etc. I found the opposite is true.
So I buy one at the exceptionally soggy Toronto Mini Maker Faire. Props to the CircuitCo folks, they are easy to set up: just a mini-USB cable provides power and virtual network shell. And BoneScript — an Arduino-like JavaScript library — is very clever indeed. But I need to see if this thing has any grunt, and so I need a benchmark.
After hearing about the business-card raytracer, I thought it would be perfect. I compiled it on both machines with:
g++ -Ofast card.cpp -o card
and then ran it with:
time ./card > /dev/null
The results are … surprising:
- Raspberry Pi: 4′ 15″
- BeagleBone Black: 12′ 39″ → 3× slower
(In comparison, my i7 quad-core laptop runs it in 8½ seconds.)
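As a sanity check on the "3× slower" figure, the timings above can be converted to seconds and divided. This is only a minimal sketch; the helper name `to_seconds` is made up for illustration and is not part of any tool mentioned here.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds stopwatch reading to plain seconds."""
    return 60 * minutes + seconds

rpi = to_seconds(4, 15)     # Raspberry Pi: 4' 15"
bbb = to_seconds(12, 39)    # BeagleBone Black: 12' 39"
laptop = 8.5                # i7 laptop: 8.5 seconds

print(f"BBB / Pi:    {bbb / rpi:.2f}x")
print(f"Pi / laptop: {rpi / laptop:.0f}x")
```

So the BBB takes just under three times as long as the Pi, and the Pi in turn takes about thirty times as long as the laptop.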
I don’t have any explanation why the BBB is so much slower. It’s almost as if the compiler isn’t fully optimizing under Ångström Linux.
Raspberry Pi: system info
$ uname -a
Linux rpi 3.6.11+ #538 PREEMPT Fri Aug 30 20:42:08 BST 2013 armv6l GNU/Linux
$ cat /proc/cpuinfo
Processor : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS : 697.95
Features : swp half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xb76
CPU revision : 7
Hardware : BCM2708
Revision : 000f
BeagleBone Black: system info
# uname -a
Linux beaglebone 3.8.13 #1 SMP Tue Jun 18 02:11:09 EDT 2013 armv7l GNU/Linux
# cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 2 (v7l)
BogoMIPS : 297.40
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc08
CPU revision : 2
Hardware : Generic AM33XX (Flattened Device Tree)
Revision : 0000
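The two Features lines quoted above are easier to compare as sets. The sketch below is illustrative only: the function name `features` is hypothetical, and the two strings are copied from the listings rather than read from a live board (on real hardware you could pass in the contents of /proc/cpuinfo instead).

```python
def features(cpuinfo_text):
    """Return the set of flags on the 'Features' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Features":
            return set(value.split())
    return set()

# Copied from the two system-info listings in this post.
RPI = "Features : swp half thumb fastmult vfp edsp java tls"
BBB = "Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls"

only_bbb = sorted(features(BBB) - features(RPI))
only_rpi = sorted(features(RPI) - features(BBB))
print("BBB only:", only_bbb)
print("Pi only: ", only_rpi)
```

Both boards report vfp, so the hardware floating-point unit is present on each; the flags alone say nothing about whether the compiled binary actually uses it.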
Both boards are running at stock speed.
Update: I’ve tried with an external power supply, and checked that the processor was running at full speed. It made no difference. I suspect that Raspbian enables armhf floating point by default, while Ångström needs to be told to use it.
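One way to test that suspicion is to look at the float-ABI bits in the compiled binary's ELF header. The sketch below reads the `e_flags` field of a 32-bit ELF file and checks the `EF_ARM_ABI_FLOAT_HARD` (0x400) and `EF_ARM_ABI_FLOAT_SOFT` (0x200) bits from `elf.h`. The function names are hypothetical, and this is an assumption-laden check rather than a definitive one: older toolchains may set neither bit, in which case the result is inconclusive.

```python
import struct

EM_ARM = 40                      # e_machine value for 32-bit ARM
EF_ARM_ABI_FLOAT_SOFT = 0x200    # e_flags bits as defined in elf.h
EF_ARM_ABI_FLOAT_HARD = 0x400

def arm_float_abi(header):
    """Classify the float ABI recorded in the first 40 bytes of an ELF32 file."""
    if len(header) < 40 or header[:4] != b"\x7fELF":
        return "not an ELF file"
    if header[4] != 1:                       # EI_CLASS: 1 means 32-bit
        return "not a 32-bit ELF file"
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 means little-endian
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    (flags,) = struct.unpack_from(endian + "I", header, 36)
    if machine != EM_ARM:
        return "not an ARM binary"
    if flags & EF_ARM_ABI_FLOAT_HARD:
        return "hard"
    if flags & EF_ARM_ABI_FLOAT_SOFT:
        return "soft"
    return "unmarked"

def check(path):
    """Read just the ELF header of the file at `path` and classify it."""
    with open(path, "rb") as f:
        return arm_float_abi(f.read(40))
```

Calling `check("./card")` on each board would then show whether the two builds differ. If the answer is "unmarked", the output of `readelf -A` is the more usual place to look.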
Out of the box, the BeagleBone does not run at its rated clock speed.
Use cpufreq-info to check the current speed and cpufreq-set to change it.
Look here: http://beaglebone.cameon.net/home/set-cpu-speed
I ran this test on my original BeagleBone (not the Black), with the clock forced to 720 MHz, on Ubuntu 13.10 (Saucy):
ubuntu@arm:~/temp$ g++ -Ofast card.cpp -o card
ubuntu@arm:~/temp$ time ./card > /dev/null
real 11m7.456s
user 11m4.244s
sys 0m0.125s
uname -a readout:
3.8.13-bone28 #1 SMP Fri Sep 13 01:11:14 UTC 2013 armv7l armv7l armv7l GNU/Linux
cat /proc/cpuinfo readout:
processor : 0
model name : ARMv7 Processor rev 2 (v7l)
BogoMIPS : 181.83
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc08
CPU revision : 2
Hardware : Generic AM33XX (Flattened Device Tree)
Revision : 0000
Serial : 0000000000000000
Interesting how a slower cpu with slower ram can run a little faster. Perhaps you are right about the distro?
Also, -march=native only reduced execution time by a few seconds, so its impact is negligible.
I forgot my g++ -v:
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/4.8/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.8.1-10ubuntu8' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libitm --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-armhf/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-armhf --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-armhf --with-arch-directory=arm --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu8)
BeagleBones don’t run at top speed if powered only by USB. The power adaptor must be connected.
That seems a bit daft.
Hmm, I tried it. With an external power supply and the frequency set to max (1000 MHz), I got the same result: 12 min. It is weird.
Is your Raspberry Pi overclocked? Maybe PREEMPT Linux helps it get better results… I don’t know.
I booted Debian from SD card and I tried it again.
System information:
Linux debian-armhf 3.8.13-bone30 #1 SMP Thu Nov 14 02:59:07 UTC 2013 armv7l GNU/Linux
processor : 0
model name : ARMv7 Processor rev 2 (v7l)
BogoMIPS : 663.07
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc08
CPU revision : 2
Hardware : Generic AM33XX (Flattened Device Tree)
Revision : 0000
Serial : 0000000000000000
Compilation:
g++ -Ofast card.cpp -o card -mfloat-abi=hard
Result:
real 4m55.239s
user 4m55.117s
sys 0m0.008s
Yes, I have now found out that Ångström only runs as soft float, and to get peak performance, you need a command line like:
g++ -o card-bbb -march=armv7-a -mtune=cortex-a8 -mfloat-abi=hard -mfpu=neon -ffast-math -O3 -lm card.cpp
What is the peak performance you get, if you use those options?
The same 4′ 55″ that BeranekCZ got.
My Beaglebone Black has BogoMIPS as 990+
As does mine, now it’s running Debian.
My BBB has BogoMIPS as 990+
And with a single-cable connection, the same exercise yields a time of
real 12m56.424s
user 12m41.154s
sys 0m0.191s
Ooh, painful; I managed to put a dev system onto an Intel Galileo. I may be doing something wrong:
time ./card > /dev/null
real 13m19.794s
user 13m11.310s
sys 0m3.560s
=======
# uname -a
Linux clanton 3.8.7-yocto-standard #1 Tue Oct 1 00:09:01 IST 2013 i586 GNU/Linux
# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 5
model : 9
model name : 05/09
stepping : 0
cpu MHz : 399.100
cache size : 0 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : yes
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 7
wp : yes
flags : fpu vme pse tsc msr pae cx8 apic pge pbe nx smep
bogomips : 798.20
clflush size : 32
cache_alignment : 32
address sizes : 32 bits physical, 32 bits virtual
power management:
Note that in this case, the Raspberry Pi’s ARM11 CPU is actually superior!
The ARM11 has a nice, pipelined VFP implementation. Cortex-A8 only has a stripped down VFP-lite configuration that is not pipelined. You can force the A8 to use the NEON unit for some VFP instructions (the so-called RunFast mode), but that has its limits and isn’t trivial to use from C.
So even with hardfloat configured correctly, Cortex-A8 is simply unlikely to outperform ARM11 clock by clock.
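To put a rough number on "clock by clock", multiply each run time by the clock speed to estimate the cycles spent. This sketch assumes the Pi runs at a stock 700 MHz (that figure is not stated anywhere in this thread) and the BBB at the 1000 MHz reported above; it uses the Pi's original 4′ 15″ and the BBB's hard-float 4′ 55″. The helper name `gigacycles` is made up for illustration.

```python
def gigacycles(seconds, mhz):
    """Approximate clock cycles spent, in billions: seconds * MHz / 1000."""
    return seconds * mhz / 1000

pi = gigacycles(4 * 60 + 15, 700)     # assumed stock clock for the Pi
bbb = gigacycles(4 * 60 + 55, 1000)   # hard-float result on Debian

print(f"Pi:  {pi:.1f} Gcycles")
print(f"BBB: {bbb:.1f} Gcycles ({bbb / pi:.2f}x the Pi)")
```

Under those assumptions the Cortex-A8 spends roughly 1.65 times as many cycles as the ARM11 on this benchmark, which fits the explanation above.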
I tested it on an Intel Edison (EDI1BB.AL.K) running Ubilinux:
bogomips 998.40 (but it has 2 cores!)
time ./card >/dev/null
real 5m18.076s
user 5m17.130s
sys 0m0.000s
Hi Alain — extra cores won’t necessarily help you here, as it’s a single process.
Just ran it on a Raspberry Pi 2: 1′ 53″ …
When using BOINC, the BeagleBone Black reports 184 MIPS floating point and 2047 MIPS integer performance under Ångström Linux, the OS the BeagleBone ships with on the eMMC.
Installing Android 4.4.4 gave 277 MIPS floating point (up some 50%!) and 1607 integer MIPS (down about 20%) under the BOINC benchmarks.
With the latest BeagleBone Debian Stretch installed, floating-point MIPS come to 226 (up from 184 under Ångström), while integer MIPS go through the roof: 11,779! Feature-wise, swp gets traded in for vfpd32.
Hmm, something does seem to be running more quickly on a Stretch-based BBB, ‘cos good old card runs in 7′ 51″