I have merged the Dense Algebra branch into the master branch. Now, if NTPoly detects that two matrix blocks are both dense (more than 30% full), it multiplies them with a dense matrix multiply instead of a sparse one. This optimization might sound like what the DBCSR[1] library does, but it’s not nearly that fancy. It is really meant for cases where you want to compare against the full dense solution, because our block sizes are still fairly large.
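
To make the idea concrete, here is a rough sketch of that kind of density check in plain Python. This is not NTPoly's actual code: the `Block` class, the function names, and the dictionary storage are all made up for illustration, and only the 30% threshold comes from the description above.

```python
# Illustrative sketch only; every name here is hypothetical, not NTPoly's API.
DENSE_THRESHOLD = 0.30  # from the post: a block is "dense" above 30% fill


class Block:
    """A matrix block stored sparsely as {(row, col): value}."""

    def __init__(self, rows, cols, entries=None):
        self.rows, self.cols = rows, cols
        self.entries = dict(entries or {})

    def density(self):
        # Fraction of the block's slots that hold a stored value.
        return len(self.entries) / (self.rows * self.cols)

    def to_dense(self):
        dense = [[0.0] * self.cols for _ in range(self.rows)]
        for (i, j), value in self.entries.items():
            dense[i][j] = value
        return dense


def multiply_sparse(a, b):
    # Group b's entries by row so each entry of a only meets matching rows.
    b_rows = {}
    for (k, j), value in b.entries.items():
        b_rows.setdefault(k, []).append((j, value))
    out = {}
    for (i, k), a_value in a.entries.items():
        for j, b_value in b_rows.get(k, []):
            out[(i, j)] = out.get((i, j), 0.0) + a_value * b_value
    return Block(a.rows, b.cols, out)


def multiply_dense(a, b):
    # Expand both blocks and run a plain triple loop over them.
    da, db = a.to_dense(), b.to_dense()
    out = {}
    for i in range(a.rows):
        for j in range(b.cols):
            total = sum(da[i][k] * db[k][j] for k in range(a.cols))
            if total != 0.0:
                out[(i, j)] = total
    return Block(a.rows, b.cols, out)


def multiply_blocks(a, b, threshold=DENSE_THRESHOLD):
    """Use the dense kernel only when both blocks exceed the threshold."""
    if a.density() > threshold and b.density() > threshold:
        return multiply_dense(a, b), "dense"
    return multiply_sparse(a, b), "sparse"
```

Both paths give the same product; the only difference is which kernel does the work. In a real code the dense path would call an optimized routine rather than a Python loop, which is where any speedup would come from.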

[1] https://dbcsr.cp2k.org/