
Gradients & Hessians

Gradients and Hessians allow us to find directions of change and understand the shape (curvature) of functions of multiple variables.


🟒 1. The Gradient Vector ($\nabla f$)

The gradient is the vector of all first-order partial derivatives: $\nabla f(\mathbf{x}) = \left[ \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right]^T$

Properties of the Gradient

  • Steepest Ascent: The gradient points in the direction in which the function $f$ increases most rapidly.
  • Normal Vector: The gradient at $(a, b)$ is perpendicular to the level curve $f(x, y) = k$ through that point.
  • Magnitude: $\|\nabla f\|$ is the maximum rate of change.

Directional Derivatives

The rate of change of $f$ in the direction of a unit vector $\mathbf{u}$ is: $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$
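As a quick numeric sketch of these ideas (the function $f(x, y) = x^2 + 3y^2$, the point $(1, 2)$, and the direction $(1, 1)/\sqrt{2}$ are illustrative choices of mine, not from the original):

```python
import numpy as np

# Example function: f(x, y) = x^2 + 3y^2
def grad_f(x, y):
    """Analytic gradient: [df/dx, df/dy] = [2x, 6y]."""
    return np.array([2 * x, 6 * y])

g = grad_f(1.0, 2.0)           # gradient at (1, 2) -> [2, 12]

# Directional derivative along the unit vector u = (1, 1)/sqrt(2)
u = np.array([1.0, 1.0]) / np.sqrt(2)
D_u = g @ u                    # D_u f = grad(f) . u

# The maximum rate of change equals the gradient's magnitude
print(D_u, np.linalg.norm(g))
```

Note that $D_{\mathbf{u}} f$ is always at most $\|\nabla f\|$, with equality when $\mathbf{u}$ points along the gradient itself.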


🟑 2. The Hessian Matrix (Curvature)

The Hessian is the square matrix of all second-order partial derivatives: $H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$

For a function $f(x, y)$: $\mathbf{H} = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix}$

Using the Hessian

  • Shape: If the Hessian is positive definite, the surface is shaped like a β€œbowl” (convex).
  • Optimization: We use the Hessian to determine if a critical point is a local maximum, local minimum, or a saddle point.
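One common way to check definiteness is via the eigenvalues of the (symmetric) Hessian; a small sketch, with the example matrices being my own illustrations:

```python
import numpy as np

def classify_hessian(H):
    """Classify local curvature from the eigenvalues of a symmetric Hessian."""
    eigvals = np.linalg.eigvalsh(H)
    if np.all(eigvals > 0):
        return "positive definite (bowl / local min)"
    if np.all(eigvals < 0):
        return "negative definite (dome / local max)"
    if np.any(eigvals > 0) and np.any(eigvals < 0):
        return "indefinite (saddle)"
    return "semi-definite (test inconclusive)"

# Hessian of f = x^2 + y^2 (a bowl) and of f = x^2 - y^2 (a saddle)
print(classify_hessian(np.array([[2.0, 0.0], [0.0, 2.0]])))
print(classify_hessian(np.array([[2.0, 0.0], [0.0, -2.0]])))
```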

πŸ”΄ 3. The Second Derivative Test

To find local extrema of $f(x, y)$, first find critical points $(a, b)$ where $\nabla f(a, b) = \mathbf{0}$. Then calculate $D = f_{xx}(a, b)\,f_{yy}(a, b) - [f_{xy}(a, b)]^2$.

  1. If $D > 0$ and $f_{xx}(a, b) > 0$, then $(a, b)$ is a local minimum.
  2. If $D > 0$ and $f_{xx}(a, b) < 0$, then $(a, b)$ is a local maximum.
  3. If $D < 0$, then $(a, b)$ is a saddle point.
  4. If $D = 0$, the test is inconclusive.
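The four cases above translate directly into code; a minimal sketch (the helper name and the example functions $x^2 + y^2$ and $x^2 - y^2$, both with critical point $(0, 0)$, are my own):

```python
def second_derivative_test(fxx, fyy, fxy):
    """Classify a critical point given the second partials evaluated there."""
    D = fxx * fyy - fxy ** 2
    if D > 0 and fxx > 0:
        return "local minimum"
    if D > 0 and fxx < 0:
        return "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"

# f(x, y) = x^2 + y^2 at (0, 0): fxx = 2, fyy = 2, fxy = 0
print(second_derivative_test(2, 2, 0))   # local minimum
# f(x, y) = x^2 - y^2 at (0, 0): fxx = 2, fyy = -2, fxy = 0
print(second_derivative_test(2, -2, 0))  # saddle point
```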

🎯 4. Taylor Approximation (Quadratic)

We can approximate a function $f$ near a point $\mathbf{a}$ using its gradient and Hessian: $f(\mathbf{x}) \approx f(\mathbf{a}) + \nabla f(\mathbf{a})^T (\mathbf{x} - \mathbf{a}) + \frac{1}{2} (\mathbf{x} - \mathbf{a})^T \mathbf{H}(\mathbf{a}) (\mathbf{x} - \mathbf{a})$
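To see how accurate this quadratic approximation is, here is a sketch using $f(x, y) = \sin x + \cos y$ expanded around $(0, 0)$ (my own example, with the gradient and Hessian worked out by hand):

```python
import numpy as np

def quadratic_approx(f_a, grad_a, H_a, x, a):
    """Second-order Taylor expansion of f around the point a."""
    d = x - a
    return f_a + grad_a @ d + 0.5 * d @ H_a @ d

# f(x, y) = sin(x) + cos(y) expanded around a = (0, 0)
a = np.zeros(2)
f_a = 1.0                              # sin(0) + cos(0)
grad_a = np.array([1.0, 0.0])          # [cos(0), -sin(0)]
H_a = np.array([[0.0, 0.0],
                [0.0, -1.0]])          # [[-sin(x), 0], [0, -cos(y)]] at (0, 0)

x = np.array([0.1, 0.1])
approx = quadratic_approx(f_a, grad_a, H_a, x, a)
exact = np.sin(x[0]) + np.cos(x[1])
print(approx, exact)                   # the two values agree closely near a
```

Near $\mathbf{a}$ the error shrinks cubically, which is why this expansion underpins second-order methods like Newton's method.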


πŸ’‘ Practical Example: Gradient Descent Step

In machine learning, we update parameters in the opposite direction of the gradient to minimize error.

def gradient_descent(x_start, lr, n_steps):
    """Minimize f(x) = x^2 by repeatedly stepping against the gradient."""
    x = x_start
    for _ in range(n_steps):
        # f(x) = x^2, so f'(x) = 2x
        gradient = 2 * x
        x = x - lr * gradient
    return x

# The true minimum of f(x) = x^2 is at x = 0
min_point = gradient_descent(x_start=10, lr=0.1, n_steps=50)
print(f"Approximated Minimum: {min_point}")

πŸš€ Key Takeaways

  • Gradients point in the direction of steepest ascent, so minimization moves against them.
  • Hessians describe the β€œbend” of the surface.
  • The Second Derivative Test uses the Hessian to classify critical points.