Pradeep Dubey

research

∙ 04/14/2023

AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks

Sparse training is emerging as a promising avenue for reducing the compu...

0 Abhisek Kundu, et al. ∙

research

∙ 09/12/2022

FP8 Formats for Deep Learning

FP8 is a natural progression for accelerating deep learning training inf...

0 Paulius Micikevicius, et al. ∙

research

∙ 10/29/2020

Systolic Computing on GPUs for Productive Performance

We propose a language and compiler to productively build high-performanc...

0 Hongbo Rong, et al. ∙

research

∙ 06/05/2020

MISIM: An End-to-End Neural Code Similarity System

Code similarity systems are integral to a range of applications from cod...

6 Fangke Ye, et al. ∙

research

∙ 03/24/2020

Context-Aware Parse Trees

The simplified parse tree (SPT) presented in Aroma, a state-of-the-art c...

7 Fangke Ye, et al. ∙

research

∙ 09/17/2019

K-TanH: Hardware Efficient Activations For Deep Learning

We propose K-TanH, a novel, highly accurate, hardware efficient approxim...

0 Abhisek Kundu, et al. ∙

research

∙ 05/29/2019

A Study of BFLOAT16 for Deep Learning Training

This paper presents the first comprehensive empirical study demonstratin...

0 Dhiraj Kalamkar, et al. ∙

research

∙ 03/29/2019

SysML: The New Frontier of Machine Learning Systems

Machine learning (ML) techniques are enjoying rapidly increasing adoptio...

0 Alexander Ratner, et al. ∙

research

∙ 02/03/2018

Mixed Precision Training of Convolutional Neural Networks using Integer Operations

The state-of-the-art (SOTA) for mixed precision training is dominated by...

0 Dipankar Das, et al. ∙

research

∙ 01/24/2018

On Scale-out Deep Learning Training for Cloud and HPC

The exponential growth in use of large deep neural networks has accelera...

0 Srinivas Sridharan, et al. ∙

research

∙ 08/31/2017

Galactos: Computing the Anisotropic 3-Point Correlation Function for 2 Billion Galaxies

The nature of dark energy and the complete theory of gravity are two cen...

0 Brian Friesen, et al. ∙

research

∙ 08/17/2017

Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data

This paper presents the first, 15-PetaFLOP Deep Learning system for solv...

0 Thorsten Kurth, et al. ∙

research

∙ 07/15/2017

Ternary Residual Networks

Sub-8-bit representation of DNNs incur some discernible loss of accuracy...

0 Abhisek Kundu, et al. ∙

research

∙ 05/02/2017

Ternary Neural Networks with Fine-Grained Quantization

We propose a novel fine-grained quantization (FGQ) method to ternarize p...

0 Naveen Mellempudi, et al. ∙

research

∙ 11/18/2016

Parallelizing Word2Vec in Multi-Core and Many-Core Architectures

Word2vec is a widely used algorithm for extracting low-dimensional vecto...

0 Shihao Ji, et al. ∙

research

∙ 08/04/2016

Faster CNNs with Direct Sparse Convolutions and Guided Pruning

Phenomenally successful in practical inference problems, convolutional n...

0 Jongsoo Park, et al. ∙

research

∙ 04/15/2016

Parallelizing Word2Vec in Shared and Distributed Memory

Word2Vec is a widely used algorithm for extracting low-dimensional vecto...

0 Shihao Ji, et al. ∙

research

∙ 11/21/2015

BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies

We propose BlackOut, an approximation algorithm to efficiently train mas...

0 Shihao Ji, et al. ∙

