The Two-Pass Softmax Algorithm

01/13/2020
by Marat Dukhan, et al.

The softmax (also called softargmax) function is widely used in machine learning models to normalize real-valued scores into a probability distribution. To avoid floating-point overflow, the softmax function is conventionally implemented in three passes: the first pass computes the normalization constant, and the two other passes compute outputs from the normalized inputs. We analyze two variants of the Three-Pass algorithm and demonstrate that, in a well-optimized implementation on HPC-class processors, the performance of all three passes is limited by memory bandwidth. We then present a novel algorithm that computes softmax in just two passes. The proposed Two-Pass algorithm avoids both numerical overflow and the extra normalization pass by employing an exotic representation for intermediate values, in which each value is represented as a pair of floating-point numbers: one representing the "mantissa" and another representing the "exponent". Performance evaluation demonstrates that on out-of-cache inputs on an Intel Skylake-X processor the new Two-Pass algorithm outperforms the traditional Three-Pass algorithm by up to 28%. The proposed Two-Pass algorithm also outperforms the traditional Three-Pass algorithm on Intel Broadwell and AMD Zen 2 processors. To foster reproducibility, we released an open-source implementation of the new Two-Pass Softmax algorithm, along with the other experiments in this paper, as part of the XNNPACK library at GitHub.com/google/XNNPACK.
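As a rough illustration of the idea summarized above, the scalar C sketch below (a hypothetical two_pass_softmax helper written for this summary, not code taken from the paper or from XNNPACK) carries each exponential as a (mantissa, exponent) pair so that no maximum-finding pass is needed, and then normalizes the stored mantissas in a second pass.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>

/* Illustrative scalar sketch (not the vectorized XNNPACK kernel): each
 * exp(x[i]) is carried as a pair (mantissa, exponent) with the value
 * mantissa * 2^exponent, so no separate maximum-finding pass is needed
 * to keep the exponentials from overflowing. */
static void two_pass_softmax(const float* x, float* y, size_t n) {
  const double log2e = 1.4426950408889634;  /* log2(e) */

  /* Pass 1: accumulate the normalizer sum(exp(x[i])) as (sum_m, sum_e). */
  double sum_m = 0.0;       /* mantissa of the running sum             */
  double sum_e = -DBL_MAX;  /* base-2 exponent of the running sum      */
  for (size_t i = 0; i < n; i++) {
    /* exp(x) = 2^(x*log2(e)) = m * 2^e with e = floor(x*log2(e)) and
     * m = 2^(fractional part), so m stays in [1, 2) and never overflows. */
    const double t = (double) x[i] * log2e;
    const double e = floor(t);
    const double m = exp2(t - e);
    y[i] = (float) m;                   /* stash the mantissa for pass 2 */
    if (e > sum_e) {                    /* rescale the smaller operand   */
      sum_m = sum_m * exp2(sum_e - e) + m;
      sum_e = e;
    } else {
      sum_m += m * exp2(e - sum_e);
    }
  }

  /* Pass 2: y[i] = m[i] * 2^(e[i] - sum_e) / sum_m.  The exponent is
   * recomputed from x[i] here to avoid a second scratch buffer; the
   * paper's algorithm stores the full (mantissa, exponent) pair instead. */
  for (size_t i = 0; i < n; i++) {
    const double e = floor((double) x[i] * log2e);
    y[i] = (float) ((double) y[i] * exp2(e - sum_e) / sum_m);
  }
}
```

Note that keeping the "exponent" in floating point (rather than as a machine integer) is what lets the representation cover the full range of single-precision inputs; the vectorized kernels in the paper follow the same principle.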

