%%html <iframe src='./examples/gradient_descent.html' width=820 height=700></iframe>
This graph illustrates how a Wolfe line search works. The goal is to step downhill along the negative gradient far enough that the loss is reduced sufficiently (controlled by the c1 parameter) and that the downhill slope of the loss is also reduced sufficiently (controlled by the c2 parameter). The slope condition rejects steps that are too short, which keeps the search from taking many small steps; it is usually the more important of the two parameters here.
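To make the two conditions concrete, here is a minimal sketch that checks them for a candidate step size. It assumes the weak form of the Wolfe conditions, and the names `phi`, `dphi`, and `satisfies_wolfe` are hypothetical helpers chosen for illustration, not part of any library. `phi(alpha)` is the loss after stepping a distance `alpha` along the search direction, and `dphi(alpha)` is its slope along that direction.

```python
def satisfies_wolfe(phi, dphi, alpha, c1=1e-4, c2=0.9):
    """Return (sufficient_decrease, curvature) for step size alpha.

    Assumes the weak Wolfe conditions with 0 < c1 < c2 < 1.
    """
    phi0, dphi0 = phi(0.0), dphi(0.0)
    # c1 condition: the loss must drop at least in proportion to the step.
    sufficient_decrease = phi(alpha) <= phi0 + c1 * alpha * dphi0
    # c2 condition: the slope must be no steeper downhill than c2 * initial slope.
    curvature = dphi(alpha) >= c2 * dphi0
    return sufficient_decrease, curvature


# Toy problem (an assumption for illustration): f(x) = x**2, starting at x = 1
# and stepping along the negative gradient p = -2, so x(alpha) = 1 - 2 * alpha.
def phi(alpha):
    return (1.0 - 2.0 * alpha) ** 2


def dphi(alpha):
    return -4.0 * (1.0 - 2.0 * alpha)


for alpha in (0.01, 0.5, 1.0):
    print(alpha, satisfies_wolfe(phi, dphi, alpha))
```

In this toy problem a very short step passes the c1 test but fails the c2 test, while an overly long step fails the c1 test, which is the behavior described above.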
The line search does not try to find the exact best point along the line; it only needs to find a good enough point cheaply. The black dots mark the points that were evaluated during the line search. The aim is to take as few samples as possible while still converging quickly:
%%html <iframe src='./examples/line_search.html' width=880 height=660></iframe>
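The sampling behavior can be sketched with a simple bracketing search. This is a bisection-style variant for the weak Wolfe conditions, offered as an assumption-laden illustration and not necessarily the procedure used to produce the plot. The function name `wolfe_line_search` and the toy loss are hypothetical. The returned `samples` list plays the role of the black dots.

```python
import math


def wolfe_line_search(phi, dphi, c1=1e-4, c2=0.9, alpha=1.0, max_iter=50):
    """Bisection-style search for a step satisfying the weak Wolfe conditions.

    Returns (alpha, samples), where samples lists every step size evaluated.
    """
    phi0, dphi0 = phi(0.0), dphi(0.0)
    lo, hi = 0.0, math.inf  # current bracket of acceptable step sizes
    samples = []
    for _ in range(max_iter):
        samples.append(alpha)
        if phi(alpha) > phi0 + c1 * alpha * dphi0:
            # Not enough decrease: the step is too long, so shrink the bracket.
            hi = alpha
            alpha = 0.5 * (lo + hi)
        elif dphi(alpha) < c2 * dphi0:
            # Slope still steeply downhill: the step is too short.
            lo = alpha
            alpha = 2.0 * lo if math.isinf(hi) else 0.5 * (lo + hi)
        else:
            return alpha, samples  # both conditions hold
    # Iteration budget exhausted; this last alpha has not been checked.
    return alpha, samples


# Toy loss along the search direction (an assumption for illustration).
def phi(alpha):
    return (1.0 - 2.0 * alpha) ** 2


def dphi(alpha):
    return -4.0 * (1.0 - 2.0 * alpha)


step, samples = wolfe_line_search(phi, dphi, alpha=1.0)
print("accepted step:", step, "after", len(samples), "samples")
```

Starting from a poor initial guess costs more samples, since the search has to expand or shrink the bracket several times before it lands on an acceptable point. The accepted point is generally not the exact minimizer, only one that satisfies both conditions.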