
Gradients & Hessians

Gradients and Hessians allow us to find directions of change and understand the shape (curvature) of functions of multiple variables.


🟒 1. The Gradient Vector ($\nabla f$)

The gradient is the vector of all first-order partial derivatives: $\nabla f(\mathbf{x}) = \left[ \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right]^T$

Properties of the Gradient

  • Steepest Ascent: The gradient points in the direction in which the function $f$ increases most rapidly.
  • Normal Vector: The gradient at $(a, b)$ is perpendicular to the level curve $f(x, y) = k$ through that point.
  • Magnitude: $\|\nabla f\|$ is the maximum rate of change.

Directional Derivatives

The rate of change of $f$ in the direction of a unit vector $\mathbf{u}$ is: $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$
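As a quick numeric sketch of these ideas (the function $f(x, y) = x^2 + 3y^2$, the point $(1, 2)$, and the direction $(1, 1)/\sqrt{2}$ are illustrative choices of mine, not from the original):

```python
import numpy as np

# Example function: f(x, y) = x^2 + 3y^2
def grad_f(x, y):
    """Analytic gradient: [df/dx, df/dy] = [2x, 6y]."""
    return np.array([2 * x, 6 * y])

g = grad_f(1.0, 2.0)           # gradient at (1, 2) -> [2, 12]

# Directional derivative along the unit vector u = (1, 1)/sqrt(2)
u = np.array([1.0, 1.0]) / np.sqrt(2)
D_u = g @ u                    # D_u f = grad(f) . u

# The maximum rate of change equals the gradient's magnitude
print(D_u, np.linalg.norm(g))
```

Note that $D_{\mathbf{u}} f$ is always at most $\|\nabla f\|$, with equality when $\mathbf{u}$ points along the gradient itself.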


🟑 2. The Hessian Matrix (Curvature)

The Hessian is the square matrix of all second-order partial derivatives: $H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$

For a function $f(x, y)$: $\mathbf{H} = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix}$

Using the Hessian

  • Shape: If the Hessian is positive definite, the surface is shaped like a β€œbowl” (convex).
  • Optimization: We use the Hessian to determine if a critical point is a local maximum, local minimum, or a saddle point.
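One common way to check definiteness is via the eigenvalues of the (symmetric) Hessian; a small sketch, with the example matrices being my own illustrations:

```python
import numpy as np

def classify_hessian(H):
    """Classify local curvature from the eigenvalues of a symmetric Hessian."""
    eigvals = np.linalg.eigvalsh(H)
    if np.all(eigvals > 0):
        return "positive definite (bowl / local min)"
    if np.all(eigvals < 0):
        return "negative definite (dome / local max)"
    if np.any(eigvals > 0) and np.any(eigvals < 0):
        return "indefinite (saddle)"
    return "semi-definite (test inconclusive)"

# Hessian of f = x^2 + y^2 (a bowl) and of f = x^2 - y^2 (a saddle)
print(classify_hessian(np.array([[2.0, 0.0], [0.0, 2.0]])))
print(classify_hessian(np.array([[2.0, 0.0], [0.0, -2.0]])))
```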

πŸ”΄ 3. The Second Derivative Test

To find local extrema of $f(x, y)$, first find critical points $(a, b)$ where $\nabla f(a, b) = \mathbf{0}$. Then calculate $D = f_{xx}(a, b)\,f_{yy}(a, b) - [f_{xy}(a, b)]^2$.

  1. If $D > 0$ and $f_{xx}(a, b) > 0$, then $(a, b)$ is a local minimum.
  2. If $D > 0$ and $f_{xx}(a, b) < 0$, then $(a, b)$ is a local maximum.
  3. If $D < 0$, then $(a, b)$ is a saddle point.
  4. If $D = 0$, the test is inconclusive.
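The four cases above translate directly into code; a minimal sketch (the helper name and the example functions $x^2 + y^2$ and $x^2 - y^2$, both with critical point $(0, 0)$, are my own):

```python
def second_derivative_test(fxx, fyy, fxy):
    """Classify a critical point given the second partials evaluated there."""
    D = fxx * fyy - fxy ** 2
    if D > 0 and fxx > 0:
        return "local minimum"
    if D > 0 and fxx < 0:
        return "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"

# f(x, y) = x^2 + y^2 at (0, 0): fxx = 2, fyy = 2, fxy = 0
print(second_derivative_test(2, 2, 0))   # local minimum
# f(x, y) = x^2 - y^2 at (0, 0): fxx = 2, fyy = -2, fxy = 0
print(second_derivative_test(2, -2, 0))  # saddle point
```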

🎯 4. Taylor Approximation (Quadratic)

We can approximate a function $f$ near a point $\mathbf{a}$ using its gradient and Hessian: $f(\mathbf{x}) \approx f(\mathbf{a}) + \nabla f(\mathbf{a})^T (\mathbf{x} - \mathbf{a}) + \frac{1}{2} (\mathbf{x} - \mathbf{a})^T \mathbf{H}(\mathbf{a}) (\mathbf{x} - \mathbf{a})$
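To see how accurate this quadratic approximation is, here is a sketch using $f(x, y) = \sin x + \cos y$ expanded around $(0, 0)$ (my own example, with the gradient and Hessian worked out by hand):

```python
import numpy as np

def quadratic_approx(f_a, grad_a, H_a, x, a):
    """Second-order Taylor expansion of f around the point a."""
    d = x - a
    return f_a + grad_a @ d + 0.5 * d @ H_a @ d

# f(x, y) = sin(x) + cos(y) expanded around a = (0, 0)
a = np.zeros(2)
f_a = 1.0                              # sin(0) + cos(0)
grad_a = np.array([1.0, 0.0])          # [cos(0), -sin(0)]
H_a = np.array([[0.0, 0.0],
                [0.0, -1.0]])          # [[-sin(x), 0], [0, -cos(y)]] at (0, 0)

x = np.array([0.1, 0.1])
approx = quadratic_approx(f_a, grad_a, H_a, x, a)
exact = np.sin(x[0]) + np.cos(x[1])
print(approx, exact)                   # the two values agree closely near a
```

Near $\mathbf{a}$ the error shrinks cubically, which is why this expansion underpins second-order methods like Newton's method.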


πŸ’‘ Practical Example: Gradient Descent Step

In machine learning, we update parameters in the opposite direction of the gradient to minimize error.

def gradient_descent(x_start, lr, n_steps):
    """Minimize f(x) = x^2 by repeatedly stepping against the gradient."""
    x = x_start
    for _ in range(n_steps):
        # f(x) = x^2, so f'(x) = 2x
        gradient = 2 * x
        x = x - lr * gradient
    return x

# The true minimum of f(x) = x^2 is at x = 0
min_point = gradient_descent(x_start=10, lr=0.1, n_steps=50)
print(f"Approximated Minimum: {min_point}")

πŸš€ Key Takeaways

  • Gradients point in the direction of steepest ascent, so minimization moves against them.
  • Hessians describe the β€œbend” of the surface.
  • The Second Derivative Test uses the Hessian to classify critical points.