Sigmoid
$\sigma(x) = \frac{1}{1+e^{-x}}$
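A minimal NumPy sketch of the formula (the function name and sample inputs are my own, for illustration):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x)), applied element-wise
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # approx. [0.1192, 0.5, 0.8808] -- squashes inputs into (0, 1)
```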
Tanh
$tanh(x) = \frac{1-e^{-2x}}{1+e^{-2x}}$
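A small sketch of the formula above; `np.tanh` computes the same values and serves as a cross-check (sample inputs are illustrative):

```python
import numpy as np

def tanh(x):
    # tanh(x) = (1 - exp(-2x)) / (1 + exp(-2x)), output range (-1, 1)
    return (1.0 - np.exp(-2.0 * x)) / (1.0 + np.exp(-2.0 * x))

x = np.array([-1.0, 0.0, 1.0])
print(tanh(x))     # approx. [-0.7616, 0.0, 0.7616]
print(np.tanh(x))  # identical values, using NumPy's built-in
```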
ReLU
$ReLU(x)=\left\lbrace\begin{array}{cll}
x & , & x \ge 0 \\
0 & , & x < 0
\end{array}\right.$
or equivalently
$ReLU(x) = \max(0, x)$
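A one-line NumPy sketch of the max form (naming is my own):

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x), element-wise; negative inputs are zeroed out
    return np.maximum(0.0, x)

x = np.array([-1.5, 0.0, 2.0])
print(relu(x))  # [0. 0. 2.]
```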
Leaky ReLU
$LeakyReLU(x)=\left\lbrace\begin{array}{cll}
x & , & x \ge 0 \\
0.1x & , & x < 0
\end{array}\right.$
or equivalently
$LeakyReLU(x) = \max(0.1x, x)$
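A sketch with the negative slope exposed as a parameter (0.1 follows the formula above; other small values such as 0.01 are also common):

```python
import numpy as np

def leaky_relu(x, negative_slope=0.1):
    # x for x >= 0, negative_slope * x for x < 0 (keeps a small gradient for negatives)
    return np.where(x >= 0, x, negative_slope * x)

x = np.array([-2.0, 0.0, 3.0])
print(leaky_relu(x))  # [-0.2  0.   3. ]
```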
ELU
$ELU(x)=\left\lbrace\begin{array}{cll}
x & , & x \ge 0 \\
\alpha(e^x-1) & , & x < 0
\end{array}\right.$
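A sketch with $\alpha$ as a parameter (the default $\alpha = 1$ is an assumption, not part of the formula above):

```python
import numpy as np

def elu(x, alpha=1.0):
    # x for x >= 0, alpha * (exp(x) - 1) for x < 0; saturates toward -alpha for large negative x
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, 0.0, 3.0])
print(elu(x))  # approx. [-0.8647, 0.0, 3.0]
```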
Swish
$swish(x) = x \cdot \sigma(x) = \frac{x}{1+e^{-x}}$
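Swish reuses the sigmoid, so the sketch is a one-liner (sample inputs are illustrative):

```python
import numpy as np

def swish(x):
    # swish(x) = x * sigma(x) = x / (1 + exp(-x))
    return x / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(swish(x))  # approx. [-0.2384, 0.0, 1.7616]
```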
Maxout
$Maxout(x) = \max(z_1, z_2, \dots, z_k)$, where each $z_i = w_i^\top x + b_i$ is the pre-activation of one of $k$ parallel linear units
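Unlike the other functions, Maxout has learnable parameters. A sketch of a single maxout unit, assuming $k$ parallel affine pieces (shapes and names are my own):

```python
import numpy as np

def maxout(x, W, b):
    # One maxout unit: z_i = w_i . x + b_i for i = 1..k, output is max_i z_i
    z = W @ x + b      # shape (k,): the k pre-activations z_1, ..., z_k
    return np.max(z)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)        # input of dimension 4
W = rng.standard_normal((3, 4))   # k = 3 linear pieces
b = rng.standard_normal(3)
print(maxout(x, W, b))
```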
Softmax
$softmax(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}$
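A sketch of the formula; subtracting the max is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    # softmax(z_i) = exp(z_i) / sum_j exp(z_j)
    z = z - np.max(z)   # shift by the max for numerical stability (result unchanged)
    e = np.exp(z)
    return e / np.sum(e)

z = np.array([1.0, 2.0, 3.0])
print(softmax(z))  # approx. [0.0900, 0.2447, 0.6652], sums to 1
```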