This paper lays out insights and opportunities for implementing
higher-p...
Matrix libraries often focus on achieving high performance for problems
...
As the ratio between the rate of computation and the rate at which data ca...
We approach the problem of implementing mixed-datatype support within th...
Conventional GPU implementations of Strassen's algorithm (Strassen) typi...
Discovering "good" algorithms for an operation is often considered an ar...
Dijkstra observed that verifying correctness of a program is difficult a...
Tensor contraction (TC) is an important computational kernel widely used...
Matrix multiplication (GEMM) is a core operation in numerous scientific
...
Matrix-matrix multiplication is a fundamental operation of great importa...
We dispel the "street wisdom" regarding the practical implementation of...