GELU - Gaussian Error Linear Unit

Versioned name: Gelu-7

Category: Activation

Short description: Calculates the Gaussian error linear unit (GELU) activation function element-wise.

Detailed description: Gelu(x) = x * Φ(x), where Φ(x) is the cumulative distribution function of the standard Gaussian distribution. The Gelu operation is introduced in the paper.
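The definition above can be checked numerically with a minimal sketch that uses only the Python standard library. The function name `gelu_reference` is an assumption made for illustration and is not part of the operation specification.

```python
from statistics import NormalDist

# Standard Gaussian distribution (mean 0, standard deviation 1).
_STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def gelu_reference(x: float) -> float:
    """Return x * Phi(x), where Phi is the standard Gaussian CDF.

    Hypothetical helper for illustration; it handles a single element.
    """
    return x * _STANDARD_NORMAL.cdf(x)


for value in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"gelu_reference({value:+.1f}) = {gelu_reference(value):+.6f}")
```

For large positive inputs Φ(x) approaches 1, so the result approaches x; for large negative inputs Φ(x) approaches 0, so the result approaches 0.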

Attributes

  • approximation_mode
    • Description: Specifies the formula used to calculate the output.
    • Range of values:
      • erf -- calculate the output using the Gauss error function.
      • tanh -- calculate the output using the tanh approximation.
    • Type: string
    • Default value: erf
    • Required: no

Mathematical Formulation

For the erf approximation mode:

\f[ Gelu(x) = 0.5 \cdot x \cdot (1.0 + erf(x / \sqrt{2})) \f]

For the tanh approximation mode:

\f[ Gelu(x) \approx 0.5 \cdot x \cdot (1.0 + tanh(\sqrt{2.0/\pi} \cdot (x + 0.044715 \cdot x ^ 3))) \f]
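The two formulas can be sketched for a single element as follows. This is a scalar illustration using only the Python standard library, not the implementation used by any runtime. The function name `gelu` and the choice to raise `ValueError` for an unrecognized mode are assumptions; the specification only lists `erf` and `tanh` as valid values.

```python
import math


def gelu(x: float, approximation_mode: str = "erf") -> float:
    """Scalar Gelu for one element, selecting the formula by mode.

    Hypothetical helper; the default mirrors the documented default "erf".
    """
    if approximation_mode == "erf":
        return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
    if approximation_mode == "tanh":
        inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
        return 0.5 * x * (1.0 + math.tanh(inner))
    # Assumption: reject any value outside the documented range.
    raise ValueError(f"unsupported approximation_mode: {approximation_mode!r}")


# Largest absolute gap between the two modes on a grid over [-6, 6].
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(gelu(x, "erf") - gelu(x, "tanh")) for x in grid)
print(f"max |erf - tanh| on [-6, 6]: {max_gap:.2e}")
```

The printed gap gives a rough sense of how closely the tanh mode tracks the erf mode on this grid; it is a spot check, not a bound.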

Inputs:

  • 1: Multidimensional input tensor of type T. Required.

Outputs:

  • 1: Floating point tensor of type T with the same shape as the input tensor.

Types

  • T: any floating point type.

Examples

<layer ... type="Gelu">
    <data approximation_mode="tanh"/>
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>128</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>1</dim>
            <dim>128</dim>
        </port>
    </output>
</layer>

<layer ... type="Gelu">
    <data approximation_mode="erf"/>
    <input>
        <port id="0">
            <dim>3</dim>
            <dim>7</dim>
            <dim>9</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>3</dim>
            <dim>7</dim>
            <dim>9</dim>
        </port>
    </output>
</layer>
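As a rough illustration of how such a layer description might be read, the sketch below parses a fragment shaped like the first example with Python's standard `xml.etree.ElementTree` module. The `id` and `name` attributes on `<layer>` are placeholders standing in for the elided `...`, and the helper `port_dims` is hypothetical. The fallback to `erf` follows the documented default for `approximation_mode`.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical fragment: id and name are placeholder attributes.
LAYER_XML = """
<layer id="0" name="gelu_example" type="Gelu">
    <data approximation_mode="tanh"/>
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>128</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>1</dim>
            <dim>128</dim>
        </port>
    </output>
</layer>
"""


def port_dims(layer: ET.Element, section: str) -> List[int]:
    """Collect the <dim> values of the first port under <input> or <output>."""
    port = layer.find(f"{section}/port")
    return [int(dim.text) for dim in port.findall("dim")]


layer = ET.fromstring(LAYER_XML)
data = layer.find("data")
# approximation_mode is optional; fall back to the documented default "erf".
mode = data.get("approximation_mode", "erf") if data is not None else "erf"

input_dims = port_dims(layer, "input")
output_dims = port_dims(layer, "output")
print(mode, input_dims, output_dims)
```

Because Gelu is applied per element, the output port dimensions are expected to equal the input port dimensions, which is what both examples show.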