mamba create -n bench -c anaconda -c conda-forge ipython numpy "libblas=*=*mkl" "liblapack=*=*mkl" intel-openmp
- =*=*openblas: uses OpenBLAS (the default)
- =*=*mkl: uses Intel's Math Kernel Library (MKL)
- =*=*accelerate: uses Apple's Accelerate framework with vecLib
- =*=*blis: uses AMD's AOCL-BLIS
TODO:
- what about MKL/Accelerate builds for SciPy and scikit-learn?
- does Intel offer optimizations or hardware acceleration for pandas?
Matrix Multiplication (2048x2048) on Apple M1 (4 performance + 4 efficiency cores)
- OpenBLAS: 94.7 ms ± 2.34 ms per loop
- Accelerate: 68.6 ms ± 321 µs per loop
- Notes: OpenBLAS uses all available cores, while Accelerate dedicates only one performance core. Accelerate still comes out ahead, with lower CPU usage.
Matrix Multiplication (2048x2048) on Mac Intel i7 (6 cores)
- OpenBLAS: 96.5 ms ± 6.72 ms per loop
- MKL: 74.6 ms ± 3.35 ms per loop
- Accelerate: 99 ms ± 4.18 ms per loop
- Notes: MKL uses only the physical cores, while OpenBLAS and Accelerate also use the hyperthreads. All libraries peaked at ~100 W during the operation.
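The measurements above came from IPython's %timeit; a plain-Python sketch of the same benchmark (matrix size and repeat count are choices, not anything mandated by the backends) looks like this:

```python
import timeit
import numpy as np

# Time a 2048x2048 float64 matrix multiplication with whatever
# BLAS backend the installed NumPy is linked against.
rng = np.random.default_rng(0)
a = rng.standard_normal((2048, 2048))
b = rng.standard_normal((2048, 2048))

# Roughly equivalent to `%timeit a @ b` in IPython; the minimum
# over repeats is less noisy than the mean for short kernels.
times = timeit.repeat(lambda: a @ b, number=1, repeat=10)
print(f"best of 10: {min(times) * 1e3:.1f} ms")
```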
On AMD CPUs, MKL is still faster than BLIS.
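Since the backends differ in how many cores they grab by default, one way to make comparisons fairer is to pin the thread count. These environment variables are read when the BLAS library initializes, so they must be set before NumPy is imported (variable names are the commonly used ones; the value 4 is an arbitrary example):

```python
import os

# Pin BLAS thread pools *before* NumPy is imported; once the
# pools are spawned, these variables have no effect.
os.environ["OMP_NUM_THREADS"] = "4"        # generic OpenMP-based backends
os.environ["MKL_NUM_THREADS"] = "4"        # MKL-specific override
os.environ["OPENBLAS_NUM_THREADS"] = "4"   # OpenBLAS-specific override

import numpy as np  # noqa: E402 (import deliberately after the env setup)
```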