- Known as today’s godfather of deep neural networks
- Half-time work at Google Brain (2013 - 2017)
- Now Chief Scientific Advisor of the Vector Institute at the University of Toronto

To begin with, consider the XOR classification example from the AML book “Learning from Data” (e-Chapter 7):

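The two linear decision functions for predicting \(\{-1, +1\}\) labels: \[ \begin{aligned} h_1(\mathbf{x}) & = \mbox{sign}(w_{10} + w_{11}x_1 + w_{12}x_2) = \mbox{sign}(\mathbf{w}_1^T\mathbf{x})\\ h_2(\mathbf{x}) & = \mbox{sign}(w_{20} + w_{21}x_1 + w_{22}x_2) = \mbox{sign}(\mathbf{w}_2^T\mathbf{x}) \end{aligned} \] Here \(\mathbf{x} = (1, x_1, x_2)^T\) is the input augmented with a constant first coordinate, and \(\mathbf{w}_i = (w_{i0}, w_{i1}, w_{i2})^T\) for \(i = 1, 2\).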
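As a minimal sketch of the two decision functions, the snippet below evaluates \(h_1\) and \(h_2\) on four points and combines them with an XOR on \(\pm 1\) labels. The weight values, the helper name `linear_sign`, the sample points, and the convention \(\mbox{sign}(0) = +1\) are all assumptions chosen for illustration. They are not taken from the book.

```python
import numpy as np

def linear_sign(w, X):
    """Return sign(w^T x) for each row of X, after prepending x_0 = 1."""
    X_aug = np.column_stack([np.ones(len(X)), X])
    # Assumption: a score of exactly 0 is mapped to +1.
    return np.where(X_aug @ w >= 0, 1, -1)

# Hypothetical weights (w_i0, w_i1, w_i2), chosen only for illustration.
w1 = np.array([0.0, 1.0, 0.0])  # h1(x) = sign(x1)
w2 = np.array([0.0, 0.0, 1.0])  # h2(x) = sign(x2)

# One sample point per quadrant.
X = np.array([[1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])

h1 = linear_sign(w1, X)
h2 = linear_sign(w2, X)

# XOR on +/-1 labels: +1 exactly when h1 and h2 disagree.
xor = np.where(h1 != h2, 1, -1)

for point, a, b, y in zip(X, h1, h2, xor):
    print(point, a, b, y)
```

With these weights, the XOR label is \(+1\) in the second and fourth quadrants and \(-1\) in the first and third. Neither \(h_1\) nor \(h_2\) can produce that pattern alone, since each splits the plane with a single line.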