Gradients & Hessians
Gradients and Hessians allow us to find directions of change and understand the shape (curvature) of functions of multiple variables.
1. The Gradient Vector ($\nabla f$)
The gradient is the vector of all first-order partial derivatives. For $f(x_1, \dots, x_n)$:

$\nabla f = \left( \dfrac{\partial f}{\partial x_1}, \dots, \dfrac{\partial f}{\partial x_n} \right)$
Properties of the Gradient
- Steepest Ascent: The gradient points in the direction where the function increases most rapidly.
- Normal Vector: The gradient at a point is perpendicular to the level curve $f(x, y) = c$ through that point.
- Magnitude: $\|\nabla f\|$ is the maximum rate of change.
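As a small sketch of these properties (the function $f(x, y) = x^2 + y^2$ is my own illustrative choice, not from the text), the analytic gradient can be checked against central finite differences:

```python
import numpy as np

def f(v):
    # Illustrative function: f(x, y) = x^2 + y^2
    x, y = v
    return x**2 + y**2

def grad_f(v):
    # Analytic gradient: (2x, 2y)
    x, y = v
    return np.array([2 * x, 2 * y])

p = np.array([1.0, 2.0])
h = 1e-6
# Central finite differences along each coordinate axis
numeric = np.array([
    (f(p + h * e) - f(p - h * e)) / (2 * h)
    for e in np.eye(2)
])
print(grad_f(p))  # [2. 4.]
print(numeric)    # approximately [2. 4.]
```

The numerical estimate agrees with the analytic gradient, which is a quick sanity check worth doing for any hand-derived gradient.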
Directional Derivatives
The rate of change of $f$ in the direction of a unit vector $\mathbf{u}$ is:

$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$
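A minimal sketch of this dot-product formula (the gradient value and direction below are illustrative assumptions):

```python
import numpy as np

# Assume the gradient of some f at a point is (2, 4)
grad = np.array([2.0, 4.0])

# Normalize the direction (1, 1) into a unit vector
u = np.array([1.0, 1.0])
u = u / np.linalg.norm(u)

# Directional derivative D_u f = grad . u
D_u = grad.dot(u)
print(D_u)  # 6 / sqrt(2), about 4.2426
```

Note that the direction must be a unit vector; otherwise the result scales with its length.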
2. The Hessian Matrix (Curvature)
The Hessian is a square matrix of all second-order partial derivatives. For a function $f(x, y)$:

$H = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix}$
Using the Hessian
- Shape: If the Hessian is positive definite, the surface is shaped like a "bowl" (convex).
- Optimization: We use the Hessian to determine if a critical point is a local maximum, local minimum, or a saddle point.
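One standard way to check positive definiteness of a symmetric Hessian is via its eigenvalues (the matrix below is the constant Hessian of the bowl $f(x, y) = x^2 + y^2$, an illustrative choice):

```python
import numpy as np

# Hessian of f(x, y) = x^2 + y^2; constant because f is quadratic
H = np.array([[2.0, 0.0],
              [0.0, 2.0]])

# A symmetric matrix is positive definite iff all eigenvalues are > 0
eigvals = np.linalg.eigvalsh(H)
print(eigvals)                    # [2. 2.]
print(bool(np.all(eigvals > 0)))  # True -> convex ("bowl") at this point
```

A mix of positive and negative eigenvalues would instead indicate a saddle.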
3. The Second Derivative Test
To find local extrema of $f(x, y)$, first find critical points $(a, b)$ where $\nabla f = \mathbf{0}$. Then calculate $D = f_{xx} f_{yy} - (f_{xy})^2$ at each critical point.
- If $D > 0$ and $f_{xx} > 0$, then $(a, b)$ is a local minimum.
- If $D > 0$ and $f_{xx} < 0$, then $(a, b)$ is a local maximum.
- If $D < 0$, then $(a, b)$ is a saddle point.
- If $D = 0$, the test is inconclusive.
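The test can be sketched directly in code. The function $f(x, y) = x^2 - y^2$ and its critical point $(0, 0)$ are my own example (a classic saddle), with partials $f_{xx} = 2$, $f_{yy} = -2$, $f_{xy} = 0$:

```python
# Second Derivative Test at the critical point (0, 0) of f(x, y) = x^2 - y^2
f_xx, f_yy, f_xy = 2.0, -2.0, 0.0
D = f_xx * f_yy - f_xy**2

if D > 0 and f_xx > 0:
    label = "local minimum"
elif D > 0 and f_xx < 0:
    label = "local maximum"
elif D < 0:
    label = "saddle point"
else:
    label = "inconclusive"

print(D, label)  # -4.0 saddle point
```

Here $D = -4 < 0$, so the test classifies the origin as a saddle point, matching the familiar shape of this surface.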
4. Taylor Approximation (Quadratic)
We can approximate a function around a point $\mathbf{a}$ using its gradient and Hessian:

$f(\mathbf{x}) \approx f(\mathbf{a}) + \nabla f(\mathbf{a})^{T} (\mathbf{x} - \mathbf{a}) + \tfrac{1}{2} (\mathbf{x} - \mathbf{a})^{T} H(\mathbf{a}) (\mathbf{x} - \mathbf{a})$
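A short sketch of this quadratic approximation, using the illustrative function $f(x, y) = x^2 + y^2$ expanded around $\mathbf{a} = (1, 1)$ (since $f$ is itself quadratic, the approximation is exact in this case):

```python
import numpy as np

a = np.array([1.0, 1.0])
f_a = 2.0                      # f(a) = 1 + 1
grad_a = np.array([2.0, 2.0])  # gradient (2x, 2y) at a
H_a = np.array([[2.0, 0.0],
                [0.0, 2.0]])   # constant Hessian of f

def taylor2(x):
    # f(a) + grad^T (x - a) + 0.5 (x - a)^T H (x - a)
    d = x - a
    return f_a + grad_a.dot(d) + 0.5 * d.dot(H_a).dot(d)

x = np.array([1.5, 0.5])
print(taylor2(x))  # 2.5
print(x.dot(x))    # 2.5, the true f(x); they match exactly here
```

For non-quadratic functions the two values would only agree near $\mathbf{a}$, which is the whole point of a local approximation.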
Practical Example: Gradient Descent Step
In machine learning, we update parameters in the opposite direction of the gradient to minimize error.
```python
def gradient_descent(x_start, lr, n_steps):
    x = x_start
    for _ in range(n_steps):
        # f(x) = x^2, so f'(x) = 2x
        gradient = 2 * x
        x = x - lr * gradient
    return x

# Minimum is at 0
min_point = gradient_descent(x_start=10, lr=0.1, n_steps=50)
print(f"Approximated Minimum: {min_point}")
```

Key Takeaways
- Gradients point in the direction of steepest ascent, which is why gradient descent moves against them.
- Hessians describe the "bend" (curvature) of the surface.
- The Second Derivative Test uses the Hessian to classify critical points.