This paper introduces the batch-parallel Compressed Packed Memory Array ...
Dedicated accelerator hardware has become essential for processing AI-ba...
Similarity search is one of the most fundamental computations that are r...
We present a data structure to randomly sample rows from the Khatri-Rao ...
Low-rank Candecomp / PARAFAC (CP) Decomposition is a powerful tool for t...
Modern network sensors continuously produce enormous quantities of raw d...
De novo genome assembly, i.e., rebuilding the sequence of an unknown gen...
Long range detection is a cornerstone of defense in many operating domai...
The Internet has become a critical component of modern civilization requ...
Sampled Dense Times Dense Matrix Multiplication (SDDMM) and Sparse Times...
We develop a family of parallel algorithms for the SpKAdd operation that...
We present Atos, a task-parallel GPU dynamic scheduling framework that i...
Computing the product of two sparse matrices (SpGEMM) is a fundamental o...
The Internet has never been more important to our society, and understan...
Combinatorial algorithms such as those that arise in graph analysis, mod...
Randomized algorithms have propelled advances in artificial intelligence...
Can cloud computing infrastructures provide HPC-competitive performance ...
Understanding protein structure-function relationships is a key challeng...
One of the most computationally intensive tasks in computational biology...
Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in ...
Identifying similar protein sequences is a core step in many computation...
Graph Neural Networks (GNNs) are powerful and flexible neural networks t...
HipMCL is a high-performance distributed memory implementation of the po...
Pairwise sequence alignment is one of the most computationally intensive...
We present a parallel algorithm and scalable implementation for genome a...
Genomic data sets are growing dramatically as the cost of sequencing con...
We present the design of a solver for the efficient and high-throughput ...
Distributed data structures are key to implementing scalable application...
High-performance implementations of graph algorithms are challenging to ...
One-sided communication is a useful paradigm for irregular parallel appl...
Metagenome assembly is the process of transforming a set of short, overl...
We factor Beamer's push-pull, also known as direction-optimized breadth-...
Sparse matrix-matrix multiplication (SpGEMM) is a computational primitiv...
We implement two novel algorithms for sparse-matrix dense-matrix multipl...
We design and implement an efficient parallel approximation algorithm fo...
We propose a new integrated method of exploiting model, batch and domain...
We propose a new integrated method of exploiting both model and data par...
Undirected graphical models compactly represent the structure of large, ...
Ordering vertices of a graph is key to minimize fill-in and data structu...
The GraphBLAS standard (GraphBlas.org) is being developed to bring the p...