A growing number of applications depend on Machine Learning (ML) functio...
Benchmarking and co-design are essential for driving optimizations and i...
Collective communications are an indispensable part of distributed train...
Recently, the U.S. Department of Energy (DOE), Office of Science, Biolog...
As deep learning models and input data are scaling at an unprecedented r...
HPC applications are critical in various scientific domains ranging from...
Deep Learning (DL) acceleration support in CPUs has recently gained a lo...
Sparsity is a growing trend in modern DNN models. Existing Sparse-Sparse...
Modern Deep Learning (DL) models have grown to sizes requiring massive c...
Real-time multi-model multi-task (MMMT) workloads, a new form of deep le...
Map Space Exploration is the problem of finding optimized mappings of a ...
Sparsity has become one of the promising methods to compress and acceler...
RDMA over Converged Ethernet (RoCE) has gained significant traction fo...
The high efficiency of domain-specific hardware accelerators for machine...
The design of DNN accelerators includes two key parts: HW resource confi...
Dataflow/mapping decides the compute and energy efficiency of DNN accele...
Recently, numerous sparse hardware accelerators for Deep Neural Networks...
The continuous growth in both size and training data for modern Deep Neu...
As AI-based applications become pervasive, CPU vendors are starting to i...
Deep Neural Networks have gained significant traction due to their wid...
To meet the extreme compute demands for deep learning across commercial ...
Design space exploration is an important but costly step involved in the...
Attention mechanisms form the backbone of state-of-the-art machine learn...
There is a growing interest in custom spatial accelerators for machine l...
As Deep Learning continues to drive a variety of applications in datacen...
Sparsity, which occurs in both scientific applications and Deep Learning...
Recently, Graph Neural Networks (GNNs) have received a lot of interest b...
With increasing diversity in Deep Neural Network (DNN) models in terms of...
The everlasting demand for higher computing power for deep neural networ...
Deep neural network (DNN) models continue to grow in size and complexity...
DNN accelerators provide efficiency by leveraging reuse of activations/w...
Recent advancements in machine learning algorithms, especially the devel...
Using multiple nodes and parallel computing algorithms has become a prin...
Compute in-memory (CIM) is a promising technique that minimizes data tra...
The open-source and community-supported gem5 simulator is one of the mos...
Deep Learning (DL) training platforms are built by interconnecting multi...
The design of specialized architectures for accelerating the inference p...
Designing resource-efficient Deep Neural Networks (DNNs) is critical to...
To efficiently run DNNs on the edge/cloud, many new DNN inference accele...
The efficiency of a spatial DNN accelerator depends heavily on the compi...
Neural Architecture Search (NAS) has demonstrated its power on various A...
Deep Neural Networks have flourished at an unprecedented pace in recent...
Recent advances in deep neural networks (DNNs) have made DNNs the backbo...
Applying Machine Learning (ML) techniques to design and optimize compute...
Systolic Arrays are one of the most popular compute substrates within De...
Modern deep learning systems rely on (a) a hand-tuned neural network top...
We present MAESTRO, a framework to describe and analyze CNN dataflows, a...