mamba create -n bench -c anaconda -c conda-forge ipython numpy "libblas=*=*mkl" "liblapack=*=*mkl" intel-openmp
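To check which BLAS implementation NumPy actually loads inside the `bench` environment, here is a minimal sketch. It assumes the separate `threadpoolctl` package, which the command above does not install; add it with `mamba install -n bench threadpoolctl`. The helper name `loaded_blas_backends` is hypothetical. With the MKL variants of `libblas`/`liblapack` selected, the result should include `"mkl"`.

```python
import numpy as np  # importing NumPy loads the BLAS library it is linked against

try:
    # Assumption: threadpoolctl is installed separately; it is not part of the env above.
    from threadpoolctl import threadpool_info
except ImportError:
    threadpool_info = None


def loaded_blas_backends():
    """Return sorted BLAS backend names (e.g. ['mkl']) seen at runtime, or None if
    threadpoolctl is unavailable. Hypothetical helper for illustration."""
    if threadpool_info is None:
        return None
    # Each entry describes one loaded native library; keep only BLAS libraries.
    return sorted({info["internal_api"] for info in threadpool_info()
                   if info.get("user_api") == "blas"})


if __name__ == "__main__":
    print(np.__version__, loaded_blas_backends())
```

Inspecting the loaded libraries at runtime is more reliable than `np.show_config()` here, because conda-forge swaps the BLAS implementation behind a generic `libblas` name that NumPy's build-time configuration does not reflect.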

TODO:

Matrix Multiplication (2048x2048) on Apple M1 (4 performance + 4 efficiency cores)

Matrix Multiplication (2048x2048) on Mac Intel i7 (6 cores)

On AMD CPUs, MKL outperforms BLIS.
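A 2048x2048 matrix multiplication timing of the kind listed above could be taken with a sketch like the following. The dtype (`float64`), the warm-up call, and the best-of-5 repetition are assumptions, not the settings behind the original measurements; `time_matmul` is a hypothetical helper. Throughput uses the standard count of about 2·n³ floating-point operations for a dense n x n by n x n product.

```python
import time

import numpy as np


def time_matmul(n=2048, repeats=5, dtype=np.float64, seed=0):
    """Return (best wall-clock seconds, GFLOP/s) for an n x n matrix product.
    Hypothetical helper; dtype and repeat count are assumptions."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)).astype(dtype)
    b = rng.standard_normal((n, n)).astype(dtype)
    a @ b  # warm-up: the first call may include BLAS thread-pool start-up
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - start)
    # A dense n x n by n x n product costs about 2 * n**3 floating-point operations.
    gflops = 2 * n**3 / best / 1e9
    return best, gflops


if __name__ == "__main__":
    seconds, gflops = time_matmul()
    print(f"2048x2048 float64 matmul: {seconds * 1e3:.1f} ms ({gflops:.1f} GFLOP/s)")
```

Taking the minimum over repeats reduces noise from other processes; running the same script in environments built against different BLAS variants (MKL, BLIS, Accelerate) gives comparable numbers per machine.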