Graph Neural Networks (GNNs) are a powerful tool for handling structured...
Execution graphs of parallel loop programs exhibit a nested, repeating s...
The current hardware landscape and application scale are driving performa...
With the rise of specialized hardware and new programming languages, cod...
As deep learning models grow, sparsity is becoming an increasingly criti...
Performance optimization is an increasingly challenging but often repeti...
As the accuracy of machine learning models increases at a fast rate, so ...
The multi-pumping resource sharing technique can overcome the limitation...
Post-processing ensemble prediction systems can improve weather forecast...
Optimizing application performance in today's hardware architecture land...
Multilinear algebra kernel performance on modern massively-parallel syst...
We present a new parallel model of computation suitable for spatial arch...
Earth system models are developed with a tight coupling to target hardwa...
C is the lingua franca of programming and almost any device can be progr...
Rapid progress in deep learning is leading to a diverse set of quickly c...
Matrix factorizations are among the most important building blocks of sc...
Python has become the de facto language for scientific computing. Progra...
We present a graph neural network to learn graph coloring heuristics usi...
Determining I/O lower bounds is a crucial step in obtaining communicatio...
The growing energy and performance costs of deep learning have driven th...
I/O is emerging as a major bottleneck for machine learning training, esp...
Compiler architects increasingly look to machine learning when building...
Hierarchical structure and repetition are prevalent in graphs originatin...
Spatial computing devices have been shown to significantly accelerate st...
Developing high-performance and energy-efficient algorithms for maximum...
Dense linear algebra kernels, such as linear solvers or tensor contracti...
Transformers have become widely used for language modeling and sequence...
Quantifying uncertainty in weather forecasts typically employs ensemble...
Deep learning at scale is dominated by communication time. Distributing...
The increasing complexity of computing systems places a tremendous burde...
Designing efficient cooling systems for integrated circuits (ICs) relies...
The computational efficiency of a state-of-the-art ab initio quantum tra...
Modern weather forecast models perform uncertainty quantification using...
Load imbalance pervasively exists in distributed deep learning training...
The ubiquity of accelerators in high-performance computing has driven pr...
With the ubiquity of accelerators, such as FPGAs and GPUs, the complexit...
Graph processing has become an important part of various areas, such as...
We introduce Deep500: the first customizable benchmarking infrastructure...
Large-batch SGD is important for scaling training of deep neural network...
With the recent success of embeddings in natural language processing, re...
NVIDIA cuDNN is a low-level library that provides GPU kernels frequently...
Deep Neural Networks (DNNs) are becoming an important tool in modern com...