
BeagleBone Black: slow as a dog

All benchmarks are artificial, but this one had me scratching my head. One hears that the BeagleBone Black is screamingly fast compared to the Raspberry Pi: faster, newer processor, blahdeblah, mcbtyc, etc. I found the opposite to be true.

So I buy one at the exceptionally soggy Toronto Mini Maker Faire. Props to the CircuitCo folks: it's easy to set up, since a single mini-USB cable provides both power and a virtual network shell. And BoneScript, an Arduino-like JavaScript library, is very clever indeed. But I need to see if this thing has any grunt, so I need a benchmark.

After hearing about the business-card raytracer, I thought it would be perfect. I compiled it on both machines with:

g++ -Ofast card.cpp -o card

and then ran it with:

time ./card > /dev/null

The results are … surprising:

  • Raspberry Pi: 4′ 15″
  • BeagleBone Black: 12′ 39″ → 3× slower
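The 3× figure comes from converting both run times to seconds:

```latex
\frac{t_{\mathrm{BBB}}}{t_{\mathrm{RPi}}}
  = \frac{(12 \times 60 + 39)\ \mathrm{s}}{(4 \times 60 + 15)\ \mathrm{s}}
  = \frac{759\ \mathrm{s}}{255\ \mathrm{s}}
  \approx 2.98
```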

(In comparison, my i7 quad-core laptop runs it in 8½ seconds.)

I don't have any explanation for why the BBB is so much slower. It's almost as if the compiler isn't fully optimizing under Ångström Linux.

Raspberry Pi: system info

$ uname -a
Linux rpi 3.6.11+ #538 PREEMPT Fri Aug 30 20:42:08 BST 2013 armv6l GNU/Linux

$ cat /proc/cpuinfo 
Processor    : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS    : 697.95
Features    : swp half thumb fastmult vfp edsp java tls 
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x0
CPU part    : 0xb76
CPU revision    : 7

Hardware    : BCM2708
Revision    : 000f

BeagleBone Black: system info

# uname -a
Linux beaglebone 3.8.13 #1 SMP Tue Jun 18 02:11:09 EDT 2013 armv7l GNU/Linux
# cat /proc/cpuinfo 
processor    : 0
model name    : ARMv7 Processor rev 2 (v7l)
BogoMIPS    : 297.40
Features    : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls 
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x3
CPU part    : 0xc08
CPU revision    : 2

Hardware    : Generic AM33XX (Flattened Device Tree)
Revision    : 0000

Both boards are running at stock speed.

Update: I've tried it with an external power supply, and checked that the processor was running at full speed. It made no difference. I suspect that Raspbian uses the hard-float (armhf) ABI by default, while Ångström needs to be told to use it.

20 replies on “BeagleBone Black: slow as a dog”

I ran this test on my original BeagleBone (not the Black), with the clock forced to 720 MHz, under Ubuntu 13.10 (Saucy):

ubuntu@arm:~/temp$ g++ -Ofast card.cpp -o card
ubuntu@arm:~/temp$ time ./card > /dev/null

real 11m7.456s
user 11m4.244s
sys 0m0.125s

uname -a readout :

3.8.13-bone28 #1 SMP Fri Sep 13 01:11:14 UTC 2013 armv7l armv7l armv7l GNU/Linux

cat /proc/cpuinfo readout:
processor : 0
model name : ARMv7 Processor rev 2 (v7l)
BogoMIPS : 181.83
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc08
CPU revision : 2

Hardware : Generic AM33XX (Flattened Device Tree)
Revision : 0000
Serial : 0000000000000000

Interesting how a slower CPU with slower RAM can run a little faster. Perhaps you're right about the distro?

Also, -march=native only cut the run time by a few seconds, a negligible improvement.

I forgot to include my g++ -v output:

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/4.8/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.8.1-10ubuntu8' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libitm --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-armhf/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-armhf --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-armhf --with-arch-directory=arm --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu8)

BeagleBones don't run at top speed if powered only by USB. The power adapter must be connected.

Hmm, I tried it with an external power supply and the frequency set to the maximum (1000 MHz), and got the same result: 12 min. Weird.

Is your Raspberry Pi overclocked? Maybe the PREEMPT kernel helps give better results… I don't know.

I booted Debian from SD card and I tried it again.

System information:
Linux debian-armhf 3.8.13-bone30 #1 SMP Thu Nov 14 02:59:07 UTC 2013 armv7l GNU/Linux

processor : 0
model name : ARMv7 Processor rev 2 (v7l)
BogoMIPS : 663.07
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc08
CPU revision : 2

Hardware : Generic AM33XX (Flattened Device Tree)
Revision : 0000
Serial : 0000000000000000

Compiled with:
g++ -Ofast card.cpp -o card -mfloat-abi=hard

Result:
real 4m55.239s
user 4m55.117s
sys 0m0.008s

Yes, I've now found out that Ångström only runs soft-float, and to get peak performance you need a command line like:

g++ -o card-bbb -march=armv7-a -mtune=cortex-a8 -mfloat-abi=hard -mfpu=neon -ffast-math -O3 card.cpp -lm

My BBB reports 990+ BogoMIPS, and with a single-cable (USB-only) connection the same exercise took:
real 12m56.424s
user 12m41.154s
sys 0m0.191s

Ooh, painful; I managed to put a dev system onto an Intel Galileo. I may be doing something wrong:

time ./card > /dev/null

real 13m19.794s
user 13m11.310s
sys 0m3.560s

=======
# uname -a
Linux clanton 3.8.7-yocto-standard #1 Tue Oct 1 00:09:01 IST 2013 i586 GNU/Linux
# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 5
model : 9
model name : 05/09
stepping : 0
cpu MHz : 399.100
cache size : 0 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : yes
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 7
wp : yes
flags : fpu vme pse tsc msr pae cx8 apic pge pbe nx smep
bogomips : 798.20
clflush size : 32
cache_alignment : 32
address sizes : 32 bits physical, 32 bits virtual
power management:

Note that in this case, the Raspberry Pi’s ARM11 CPU is actually superior!

The ARM11 has a nice, pipelined VFP implementation. Cortex-A8 only has a stripped down VFP-lite configuration that is not pipelined. You can force the A8 to use the NEON unit for some VFP instructions (the so-called RunFast mode), but that has its limits and isn’t trivial to use from C.

So even with hard-float configured correctly, Cortex-A8 is simply unlikely to outperform ARM11 clock for clock.
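A rough clock-for-clock check using figures from this thread: the Pi's 4′ 15″ (255 s) at its stock 700 MHz, against the hard-float Debian run on the BBB (4m55.239s). The BBB clock for that run is an assumption: 1 GHz is its stock maximum, but that board reported 663.07 BogoMIPS, so it may have been running slower.

```latex
\begin{aligned}
N_{\mathrm{RPi}} &= 255\ \mathrm{s} \times 700\ \mathrm{MHz} = 1.785 \times 10^{11}\ \text{cycles}\\
N_{\mathrm{BBB}} &= 295.24\ \mathrm{s} \times 1000\ \mathrm{MHz} \approx 2.952 \times 10^{11}\ \text{cycles}\\
\frac{N_{\mathrm{BBB}}}{N_{\mathrm{RPi}}} &\approx 1.65\\
f_{\text{break-even}} &= \frac{1.785 \times 10^{11}\ \text{cycles}}{295.24\ \mathrm{s}} \approx 605\ \mathrm{MHz}
\end{aligned}
```

On these numbers the BBB needs about 65% more cycles for the same render; it would have had to be running below roughly 605 MHz for the Pi to come out behind per clock.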

I tested it on an Intel Edison (EDI1BB.AL.K) running Ubilinux:
BogoMIPS 998.40 (but it has 2 cores!)

time ./card >/dev/null
real 5m18.076s
user 5m17.130s
sys 0m0.000s

Hi Alain — extra cores won’t necessarily help you here, as it’s a single process.

Just ran it on a Raspberry Pi 2: 1′ 53″ …

When running BOINC, the BeagleBone Black reports 184 MIPS floating-point and 2047 MIPS integer performance under Ångström Linux, the OS the BeagleBone ships with on its eMMC.
Installing Android 4.4.4 gave 277 MIPS floating-point (up some 50%!) and 1607 MIPS integer (down about 20%) in the BOINC benchmarks.

With the latest BeagleBone Debian Stretch installed, floating-point performance goes up to 226 MIPS, while integer performance goes through the roof: 11,779 MIPS! Feature-wise, swp gets traded in for vfpd32.
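Relative to the Ångström baseline (184 MIPS floating-point, 2047 MIPS integer), the quoted BOINC figures work out as:

```latex
\begin{aligned}
\text{Android 4.4.4:}\quad & \frac{277}{184} \approx 1.51\ (\text{FP}, +50.5\%), && \frac{1607}{2047} \approx 0.785\ (\text{integer}, -21.5\%)\\
\text{Debian Stretch:}\quad & \frac{226}{184} \approx 1.23\ (\text{FP}, +22.8\%), && \frac{11779}{2047} \approx 5.75\ (\text{integer})
\end{aligned}
```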
