A growing number of applications depend on Machine Learning (ML) functio...
Benchmarking and co-design are essential for driving optimizations and i...
Collective communications are an indispensable part of distributed train...
Recently, the U.S. Department of Energy (DOE), Office of Science, Biolog...
As deep learning models and input data are scaling at an unprecedented r...
HPC applications are critical in various scientific domains ranging from...
Deep Learning (DL) acceleration support in CPUs has recently gained a lo...
Sparsity is a growing trend in modern DNN models. Existing Sparse-Sparse...
Modern Deep Learning (DL) models have grown to sizes requiring massive c...
Real-time multi-model multi-task (MMMT) workloads, a new form of deep le...
Map Space Exploration is the problem of finding optimized mappings of a ...
Sparsity has become one of the promising methods to compress and acceler...
RDMA over Converged Ethernet (RoCE) has gained significant traction fo...
The high efficiency of domain-specific hardware accelerators for machine...
The design of DNN accelerators includes two key parts: HW resource confi...
Dataflow/mapping decides the compute and energy efficiency of DNN accele...
Recently, numerous sparse hardware accelerators for Deep Neural Networks...
The continuous growth in both size and training data for modern Deep Neu...
As AI-based applications become pervasive, CPU vendors are starting to i...
Deep Neural Networks have gained significant traction due to their wid...
To meet the extreme compute demands for deep learning across commercial ...
Design space exploration is an important but costly step involved in the...
Attention mechanisms form the backbone of state-of-the-art machine learn...
There is a growing interest in custom spatial accelerators for machine l...
As Deep Learning continues to drive a variety of applications in datacen...
Sparsity, which occurs in both scientific applications and Deep Learning...
Recently, Graph Neural Networks (GNNs) have received a lot of interest b...
With increasing diversity in Deep Neural Network (DNN) models in terms of...
The everlasting demand for higher computing power for deep neural networ...
Deep neural network (DNN) models continue to grow in size and complexity...
DNN accelerators provide efficiency by leveraging reuse of activations/w...
Recent advancements in machine learning algorithms, especially the devel...
Using multiple nodes and parallel computing algorithms has become a prin...
Compute in-memory (CIM) is a promising technique that minimizes data tra...
The open-source and community-supported gem5 simulator is one of the mos...
Deep Learning (DL) training platforms are built by interconnecting multi...
The design of specialized architectures for accelerating the inference p...
Designing resource-efficient Deep Neural Networks (DNNs) is critical to...
To efficiently run DNNs on the edge/cloud, many new DNN inference accele...
The efficiency of a spatial DNN accelerator depends heavily on the compi...
Neural Architecture Search (NAS) has demonstrated its power on various A...
Deep Neural Networks have flourished at an unprecedented pace in recent...
Recent advances in deep neural networks (DNNs) have made DNNs the backbo...
Applying Machine Learning (ML) techniques to design and optimize compute...
Systolic Arrays are one of the most popular compute substrates within De...
Modern deep learning systems rely on (a) a hand-tuned neural network top...
We present MAESTRO, a framework to describe and analyze CNN dataflows, a...