Georg Hager

research

∙ 09/11/2023

SPEChpc 2021 Benchmarks on Ice Lake and Sapphire Rapids Infiniband Clusters: A Performance and Energy Case Study

In this work, fundamental performance, power, and energy characteristics...

0 Ayesha Afzal, et al. ∙

research

∙ 09/05/2023

Algebraic Temporal Blocking for Sparse Iterative Solvers on Multi-Core CPUs

Sparse linear iterative solvers are essential for many large-scale simul...

0 Christie Alappat, et al. ∙

research

∙ 02/23/2023

Making Applications Faster by Asynchronous Execution: Slowing Down Processes or Relaxing MPI Collectives

Comprehending the performance bottlenecks at the core of the intricate h...

0 Ayesha Afzal, et al. ∙

research

∙ 02/22/2023

MD-Bench: Engineering the in-core performance of short-range molecular dynamics kernels from state-of-the-art simulation packages

Molecular dynamics (MD) simulations provide considerable benefits for th...

0 Rafael Ravedutti Lucio Machado, et al. ∙

research

∙ 09/05/2022

Orthogonal layers of parallelism in large-scale eigenvalue computations

We address the communication overhead of distributed sparse matrix-(mult...

0 Andreas Alvermann, et al. ∙

research

∙ 05/27/2022

Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications

This paper studies the utility of using data analytics and machine learn...

0 Ayesha Afzal, et al. ∙

research

∙ 05/09/2022

The Role of Idle Waves, Desynchronization, and Bottleneck Evasion in the Performance of Parallel Programs

The performance of highly parallel applications on distributed-memory sy...

0 Ayesha Afzal, et al. ∙

research

∙ 05/03/2022

Level-based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication

The multiplication of a sparse matrix with a dense vector (SpMV) is a ke...

0 Christie L. Alappat, et al. ∙

research

∙ 04/29/2022

Analytical Performance Estimation during Code Generation on Modern GPUs

Automatic code generation is frequently used to create implementations o...

0 Dominik Ernst, et al. ∙

research

∙ 07/02/2021

Opening the Black Box: Performance Estimation during Code Generation for GPUs

Automatic code generation is frequently used to create implementations o...

0 Dominik Ernst, et al. ∙

research

∙ 03/04/2021

Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact

Most distributed-memory bulk-synchronous parallel programs in HPC assume...

0 Ayesha Afzal, et al. ∙

research

∙ 03/04/2021

ECM modeling and performance tuning of SpMV and Lattice QCD on A64FX

The A64FX CPU is arguably the most powerful Arm-based processor design t...

0 Christie Alappat, et al. ∙

research

∙ 10/31/2020

An analytic performance model for overlapping execution of memory-bound loop kernels on multicore CPUs

Complex applications running on multicore processors show a rich perform...

0 Ayesha Afzal, et al. ∙

research

∙ 09/29/2020

Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX

The A64FX CPU powers the current number one supercomputer on the Top500 ...

0 Christie L. Alappat, et al. ∙

research

∙ 02/09/2020

Understanding HPC Benchmark Performance on Intel Broadwell and Cascade Lake Processors

Hardware platforms in high performance computing are constantly getting ...

0 Christie L. Alappat, et al. ∙

research

∙ 02/07/2020

Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs

Analytic, first-principles performance modeling of distributed-memory pa...

0 Ayesha Afzal, et al. ∙

research

∙ 10/01/2019

Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels

Useful models of loop kernel runtimes on out-of-order architectures requ...

0 Jan Laukemann, et al. ∙

research

∙ 07/15/2019

A Recursive Algebraic Coloring Technique for Hardware-Efficient Symmetric Sparse Matrix-Vector Multiplication

The symmetric sparse matrix-vector multiplication (SymmSpMV) is an impor...

0 Christie L. Alappat, et al. ∙

research

∙ 07/01/2019

Bridging the Architecture Gap: Abstracting Performance-Relevant Properties of Modern Server Processors

We describe a universal modeling approach for predicting single- and mul...

0 Johannes Hofmann, et al. ∙

research

∙ 06/19/2019

Collecting and Presenting Reproducible Intranode Stencil Performance: INSPECT

Stencil algorithms have been receiving considerable interest in HPC rese...

0 Julian Hornich, et al. ∙

research

∙ 05/25/2019

Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study

Analytic, first-principles performance modeling of distributed-memory ap...

0 Ayesha Afzal, et al. ∙

research

∙ 05/25/2019

Delay Propagation and Overlapping Mechanisms on Clusters: A Case Study of Idle Periods based on Workload, Communication, and Delay Granularity

Analytic, first-principles performance modeling of distributed-memory ap...

0 Ayesha Afzal, et al. ∙

research

∙ 05/08/2019

Performance Engineering for Real and Complex Tall Skinny Matrix Multiplication Kernels on GPUs

General matrix-matrix multiplications with double-precision real and com...

0 Dominik Ernst, et al. ∙

research

∙ 05/08/2019

Performance Engineering for a Tall Skinny Matrix Multiplication Kernel on GPUs

General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS lib...

0 Dominik Ernst, et al. ∙

research

∙ 01/16/2019

Analytic Performance Modeling and Analysis of Detailed Neuron Simulations

Big science initiatives are trying to reconstruct and model the brain by...

0 Francesco Cremonesi, et al. ∙

research

∙ 09/04/2018

Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures

An accurate prediction of scheduling and execution of instruction stream...

0 Jan Laukemann, et al. ∙

research

∙ 03/06/2018

Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs

Chebyshev filter diagonalization is well established in quantum chemistr...

0 Moritz Kreutzer, et al. ∙

research

∙ 03/05/2018

On the accuracy and usefulness of analytic energy models for contemporary multicore processors

This paper presents refinements to the execution-cache-memory performanc...

0 Johannes Hofmann, et al. ∙

research

∙ 10/11/2017

Validation of hardware events for successful performance pattern identification in High Performance Computing

Hardware performance monitoring (HPM) is a crucial ingredient of perform...

0 Thomas Röhl, et al. ∙

research

∙ 08/31/2017

A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials

We introduce PVSC-DTM (Parallel Vectorized Stencil Code for Dirac and To...

0 Andreas Pieper, et al. ∙

research

∙ 08/31/2017

PVSC-DTM: A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials

We introduce PVSC-DTM, a highly parallel and SIMD-vectorized library and...

0 Andreas Pieper, et al. ∙

research

∙ 02/24/2017

An analysis of core- and chip-level architectural features in four generations of Intel server processors

This paper presents a survey of architectural features among four genera...

0 Johannes Hofmann, et al. ∙

research

∙ 01/13/2017

Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels

Achieving optimal program performance requires deep insight into the int...

0 Julian Hammer, et al. ∙

research

∙ 12/17/2013

Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator

We examine the Xeon Phi, which is based on Intel's Many Integrated Cores...

0 Johannes Hofmann, et al. ∙

research

∙ 07/23/2013

A unified sparse matrix data format for efficient general sparse matrix-vector multiply on modern processors with wide SIMD units

Sparse matrix-vector multiplication (spMVM) is the most time-consuming k...

0 Moritz Kreutzer, et al. ∙

research

∙ 12/23/2011

Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation

Sparse matrix-vector multiplication (spMVM) is the dominant operation in...

0 Moritz Kreutzer, et al. ∙

