# A new direction in optimization

Post date: Jul 21, 2009 2:19:57 AM

Early on in my process, I tried to implement a gradient descent optimizer, but the algorithm I used was very sensitive to the step size taken at each iteration, and I could not figure out a way to make it converge robustly from an arbitrary starting position. After that experience, I began experimenting with a global Simulated Annealing optimizer. Due to its stochastic nature, the same parameters would sometimes converge to the true global optimum and other times get stuck in a local minimum. This randomness, combined with the large number of parameters, made the optimizer extremely difficult to tune reliably.
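The step-size sensitivity is easy to reproduce on even the simplest cost function. A toy sketch (not the registration code itself): fixed-step descent on f(x) = x², where one step size converges and a slightly larger one diverges.

```python
def fixed_step_descent(x0, step, n_iter=50):
    """Naive gradient descent on f(x) = x**2, whose gradient is 2*x.
    The update is x <- x - step * f'(x); whether this converges depends
    entirely on the fixed step size."""
    x = x0
    for _ in range(n_iter):
        x = x - step * 2 * x
    return x

good = fixed_step_descent(1.0, 0.1)  # |1 - 2*0.1| = 0.8 < 1: shrinks each step
bad = fixed_step_descent(1.0, 1.1)   # |1 - 2*1.1| = 1.2 > 1: blows up
```

For f(x) = x², any step above 1.0 diverges no matter where you start, which is the same brittleness I was fighting at registration scale.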

Because of this, I have been searching for another optimizer. I decided to revisit the class of gradient descent optimizers by studying the ITK source code. There I found a variant of gradient descent that uses the magnitude of the gradient to normalize the step length taken at each iteration. This feature makes it much more robust. I translated the C++ code into Python and set it up to work for image registration.
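The core idea can be sketched in a few lines of Python. This is a simplified illustration of the ITK-style regular step scheme, not my actual translated code: every iteration moves a fixed distance along the unit-normalized negative gradient, and the step length is relaxed whenever the gradient direction reverses (all names and default values here are my own choices).

```python
import math

def regular_step_descent(grad, x0, initial_step=1.0, min_step=1e-4,
                         relax=0.5, max_iter=500):
    """Sketch of a regular step gradient descent optimizer.

    grad(x) returns the gradient of the cost at x. The gradient is
    normalized to unit length, so each move covers exactly `step`
    regardless of the raw gradient magnitude. When the new gradient
    points against the previous one (dot product < 0), we overshot a
    minimum, so the step is shrunk by `relax`. Iteration stops once
    the step falls below `min_step`.
    """
    x = list(x0)
    step = initial_step
    prev_g = None
    for _ in range(max_iter):
        g = grad(x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm == 0.0:
            break  # exactly at a stationary point
        if prev_g is not None and sum(a * b for a, b in zip(g, prev_g)) < 0:
            step *= relax  # direction reversed: shrink the step
        if step < min_step:
            break
        x = [xi - step * gi / norm for xi, gi in zip(x, g)]
        prev_g = g
    return x

# Toy cost f(x) = ||x - (1, 2)||^2, gradient 2 * (x - (1, 2)):
target = (1.0, 2.0)
result = regular_step_descent(
    lambda x: [2 * (xi - ti) for xi, ti in zip(x, target)],
    x0=(5.0, 5.0), initial_step=2.0)
```

Because the step length is decoupled from the gradient magnitude, the same initial step works across starting positions, which is exactly the robustness the naive version lacked.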

Early results:

The trial began at the following starting position (X, Y, Z translations in mm; X, Y, Z rotations in degrees):

-0.312936991453 0.53450101614 0.496414005756 -8.7072395596 -5.74378739148 6.73976002672

The final position determined by the optimizer:

-0.373708620509 -0.447215463732 -0.380775420333 -0.664190692257 1.00376091072 0.11370633275

These results compare favorably with those from the simulated annealing optimizer, and since the Regular Step Gradient Descent optimizer is deterministic, the results are repeatable every time for a given starting position and parameter set.

I am currently running the set of 200 starting positions with the same parameter set to determine the capture range of the optimizer.

The results will be updated below in real time.

These are the displacements and angulation errors for 200 starting positions (from near 0 to 20 mm/deg rms).
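For reference, a mm/deg rms figure like the one above can be computed by pooling all six parameters; this is a sketch of one plausible convention, not necessarily the exact one behind the plots.

```python
import math

def rms(params):
    """Root-mean-square over the six registration parameters, pooling
    translations (mm) and rotations (deg) together -- an assumed
    convention matching the mm/deg units quoted above."""
    return math.sqrt(sum(p * p for p in params) / len(params))

# Starting position from the trial above:
start = [-0.312936991453, 0.53450101614, 0.496414005756,
         -8.7072395596, -5.74378739148, 6.73976002672]
print(round(rms(start), 2))  # about 5.08 mm/deg rms
```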

If we focus on only the first 70 starting positions (7mm/deg rms), we see that the optimizer does a pretty good job of converging: