BeagleBone Black: slow as a dog

All benchmarks are artificial, but this one had me scratching my head. One hears that the BeagleBone Black is screamingly fast compared to the Raspberry Pi; faster, newer processor, blahdeblah, mcbtyc, etc. I found the opposite is true.

So I buy one at the exceptionally soggy Toronto Mini Maker Faire. Props to the CircuitCo folks: the boards are easy to set up, with a single mini-USB cable providing both power and a virtual network shell. And BoneScript, an Arduino-like JavaScript library, is very clever indeed. But I need to see if this thing has any grunt, and so I need a benchmark.

After hearing about the business-card raytracer, I thought it would be perfect. I compiled it on both machines with:

g++ -Ofast card.cpp -o card

and then ran it with:

time ./card > /dev/null

The results are … surprising:

  • Raspberry Pi: 4′ 15″
  • BeagleBone Black: 12′ 39″ → 3× slower

(In comparison, my i7 quad-core laptop runs it in 8½ seconds.)
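
For anyone who wants to check the arithmetic, here is a minimal sketch that turns durations in the style printed by time into seconds and works out the slowdown. The helper names to_seconds and slowdown are made up for this example, and it assumes every duration has the form NmS.SSSs.

```shell
#!/bin/sh
# Convert a duration such as 4m55.239s into seconds.
# (to_seconds is a hypothetical helper name, not part of any tool.)
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times longer the second run took than the first.
slowdown() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", b / a }'
}

# Raspberry Pi vs. BeagleBone Black, using the times quoted above.
slowdown 4m15s 12m39s    # prints 2.98
```

So "3× slower" is a fair rounding of 759 seconds against 255.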

I don’t have any explanation why the BBB is so much slower. It’s almost as if the compiler isn’t fully optimizing under Ångström Linux.

Raspberry Pi: system info

$ uname -a
Linux rpi 3.6.11+ #538 PREEMPT Fri Aug 30 20:42:08 BST 2013 armv6l GNU/Linux

$ cat /proc/cpuinfo 
Processor    : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS    : 697.95
Features    : swp half thumb fastmult vfp edsp java tls 
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x0
CPU part    : 0xb76
CPU revision    : 7

Hardware    : BCM2708
Revision    : 000f

BeagleBone Black: system info

# uname -a
Linux beaglebone 3.8.13 #1 SMP Tue Jun 18 02:11:09 EDT 2013 armv7l GNU/Linux
# cat /proc/cpuinfo 
processor    : 0
model name    : ARMv7 Processor rev 2 (v7l)
BogoMIPS    : 297.40
Features    : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls 
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x3
CPU part    : 0xc08
CPU revision    : 2

Hardware    : Generic AM33XX (Flattened Device Tree)
Revision    : 0000

Both boards are running at stock speed.
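
One way to confirm the clock a board is actually running at is to read the kernel's cpufreq value from sysfs. This is a rough sketch, assuming the running kernel exposes the usual cpufreq interface for cpu0; the function name cpu_mhz is invented for this example, and the optional path argument is only there so it can be pointed at another file.

```shell
#!/bin/sh
# Print the current CPU clock in MHz; the kernel reports the value in kHz.
# Assumption: the standard cpufreq sysfs file exists on this kernel.
cpu_mhz() {
    f="${1:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq}"
    if [ -r "$f" ]; then
        awk '{ printf "%d MHz\n", $1 / 1000 }' "$f"
    else
        echo "cpufreq not available" >&2
        return 1
    fi
}

# Report the clock if the interface is present; carry on quietly otherwise.
cpu_mhz || true
```

Running it before and during the benchmark shows whether the governor is really stepping the clock up under load.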

Update: I’ve tried with an external power supply, and checked that the processor was running at full speed. It made no difference. I suspect that Raspbian enables armhf floating point by default, while Ångström needs to be told to use it.
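
A quick way to test that suspicion is to ask the compiler which floating-point ABI it targets by default. GCC can dump its predefined macros with -dM -E, and on 32-bit ARM it defines __ARM_PCS_VFP for the hard-float calling convention and __SOFTFP__ when floating point is done in software. The sketch below classifies such a dump; float_abi is a name made up for this example, and the three-way split is a simplification.

```shell
#!/bin/sh
# Classify the floating-point ABI from a dump of GCC's predefined macros,
# read on standard input. (float_abi is a hypothetical helper name.)
float_abi() {
    awk '
        $2 == "__arm__"       { arm = 1 }
        $2 == "__ARM_PCS_VFP" { hard = 1 }
        $2 == "__SOFTFP__"    { soft = 1 }
        END {
            if (!arm)      print "not an ARM target"
            else if (hard) print "hard"
            else if (soft) print "soft"
            else           print "softfp"
        }'
}

# Usage on the board, with whatever flags you normally compile with:
#   g++ -Ofast -dM -E -x c++ /dev/null | float_abi
```

Here "softfp" means FPU instructions are emitted but arguments are still passed the soft-float way. For a binary that is already built, readelf -A card should list Tag_ABI_VFP_args when it was compiled hard-float.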

Comments

20 Responses to “BeagleBone Black: slow as a dog”

  1. tlhingan

    The stock speed on the BeagleBone is not set at the specs.
    Use cpufreq-info and cpufreq-set to change.
    Look here: http://beaglebone.cameon.net/home/set-cpu-speed

  2. Casey

    I ran this test on my original BeagleBone (not the Black), with the clock forced to 720 MHz, on Ubuntu 13.10 (Saucy):

    ubuntu@arm:~/temp$ g++ -Ofast card.cpp -o card
    ubuntu@arm:~/temp$ time ./card > /dev/null

    real 11m7.456s
    user 11m4.244s
    sys 0m0.125s

    uname -a readout :

    3.8.13-bone28 #1 SMP Fri Sep 13 01:11:14 UTC 2013 armv7l armv7l armv7l GNU/Linux

    cat /proc/cpuinfo readout:
    processor : 0
    model name : ARMv7 Processor rev 2 (v7l)
    BogoMIPS : 181.83
    Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
    CPU implementer : 0x41
    CPU architecture: 7
    CPU variant : 0x3
    CPU part : 0xc08
    CPU revision : 2

    Hardware : Generic AM33XX (Flattened Device Tree)
    Revision : 0000
    Serial : 0000000000000000

    Interesting how a slower cpu with slower ram can run a little faster. Perhaps you are right about the distro?

  3. Casey

    Also, -march=native only reduced execution time by mere seconds, so its impact is negligible.

    I forgot my g++ -v:

    Using built-in specs.
    COLLECT_GCC=g++
    COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/4.8/lto-wrapper
    Target: arm-linux-gnueabihf
    Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.8.1-10ubuntu8' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libitm --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-armhf/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-armhf --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-armhf --with-arch-directory=arm --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
    Thread model: posix
    gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu8)

  4. person

    Beaglebones don’t run at top speed if powered only by USB. The power adaptor must be connected.

  5. scruss

    That seems a bit daft.

  6. BeranekCZ

    Hmm, I tried it. With an external power supply and the frequency set to max (1000 MHz), I got the same result: 12 min. It is weird.

    Is your Raspberry overclocked? Maybe PREEMPT Linux helps it get better results… I don’t know.

  7. BeranekCZ

    I booted Debian from SD card and I tried it again.

    System information:
    Linux debian-armhf 3.8.13-bone30 #1 SMP Thu Nov 14 02:59:07 UTC 2013 armv7l GNU/Linux

    processor : 0
    model name : ARMv7 Processor rev 2 (v7l)
    BogoMIPS : 663.07
    Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
    CPU implementer : 0x41
    CPU architecture: 7
    CPU variant : 0x3
    CPU part : 0xc08
    CPU revision : 2

    Hardware : Generic AM33XX (Flattened Device Tree)
    Revision : 0000
    Serial : 0000000000000000

    Compile command:
    g++ -Ofast card.cpp -o card -mfloat-abi=hard

    Result:
    real 4m55.239s
    user 4m55.117s
    sys 0m0.008s

  8. scruss

    Yes, I now find out that Ångström only runs as soft float, and to get peak performance, you need a command line like:

    g++ -o card-bbb -march=armv7-a -mtune=cortex-a8 -mfloat-abi=hard -mfpu=neon -ffast-math -O3 -lm card.cpp

  9. Cev Ing

    What is the peak performance you get, if you use those options?

  10. scruss

    The same 4′ 55″ that BeranekCZ got.

  11. zdf

    My Beaglebone Black has BogoMIPS as 990+

  12. scruss

    As does mine, now it’s running Debian.

  13. RAJIV JAIN

    My BBB has BogoMIPS as 990+
    And with a single-cable connection, the same exercise yields a time of
    real 12m56.424s
    user 12m41.154s
    sys 0m0.191s

  14. scruss

    Ooh, painful; I managed to put a dev system onto an Intel Galileo. I may be doing something wrong:

    time ./card > /dev/null

    real 13m19.794s
    user 13m11.310s
    sys 0m3.560s

    =======
    # uname -a
    Linux clanton 3.8.7-yocto-standard #1 Tue Oct 1 00:09:01 IST 2013 i586 GNU/Linux
    # cat /proc/cpuinfo
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 5
    model : 9
    model name : 05/09
    stepping : 0
    cpu MHz : 399.100
    cache size : 0 KB
    fdiv_bug : no
    hlt_bug : no
    f00f_bug : yes
    coma_bug : no
    fpu : yes
    fpu_exception : yes
    cpuid level : 7
    wp : yes
    flags : fpu vme pse tsc msr pae cx8 apic pge pbe nx smep
    bogomips : 798.20
    clflush size : 32
    cache_alignment : 32
    address sizes : 32 bits physical, 32 bits virtual
    power management:

  15. greg

    Note that in this case, the Raspberry Pi’s ARM11 CPU is actually superior!

    The ARM11 has a nice, pipelined VFP implementation. Cortex-A8 only has a stripped down VFP-lite configuration that is not pipelined. You can force the A8 to use the NEON unit for some VFP instructions (the so-called RunFast mode), but that has its limits and isn’t trivial to use from C.

    So even with hardfloat configured correctly, Cortex-A8 is simply unlikely to outperform ARM11 clock by clock.

  16. Alain

    I tested it on an Intel Edison (EDI1BB.AL.K) running Ubilinux
    bogomips 998.40 (but it has 2 cores!)

    time ./card >/dev/null
    real 5m18.076s
    user 5m17.130s
    sys 0m0.000s

  17. scruss

    Hi Alain — extra cores won’t necessarily help you here, as it’s a single process.

    Just ran it on a Raspberry Pi 2: 1′ 53″ …

  18. Dirk Broer

    When using BOINC, the Beaglebone Black reports 184 MIPS floating point and 2047 MIPS integer performance under Ångström Linux, the OS the Beaglebone is provided with on the eMMC.
    Installing Android 4.4.4 gave 277 MIPS floating point (up some 50%!) and 1607 integer MIPS (down about 20%) under the BOINC benchmarks.

  19. Dirk Broer

    With the latest BeagleBone Debian Stretch installed, floating-point MIPS goes up to 226 while integer MIPS goes through the roof: 11,779! Feature-wise, swp gets traded in for vfpd32.

  20. scruss

    Hmm, something does seem to be running more quickly on a Stretch-based BBB, ‘cos good old card runs in 7′ 51″
