Using iterative solvers – a few lessons learnt

I’ve been working on a couple of projects relating to camera and joint calibration on the Nao Robot for the last couple of years. During the last few months I put more focus on calibrating joints of this robot, particularly the legs.

Almost all calibration tools I’ve know of usually rely on non-linear iterative solvers, particularly the venerable Levenberg-Marquardt solver. The usual setting is the user define a cost function or a fitness function (usually in case of a genetic or like algorithm). This can also be introduced as an optimization problem depending on the field of interest.

This post is not meant to be an exhaustive review of the vast field of optimization or to give specific recommendations as I’m not an expert at this, but to share a few issues I came across and how they affect if not taken proper care of.

Lesson 1: Local or Global solver?

This is one of the most crucial selections in my opinion as it determines the time for calculation and many other factors. Why would this matter? Can’t we just throw in a global optimizer for all problems? – The simple answer is No!

Extrema example
Figure 1. Local and Global minimum; Source: I, KSmrq [CC BY-SA 3.0 (
Taking a look at Figure 1, if this plot relates to a cost function, then minimizing the cost is the objective, thus the solution would be at the global minimum. Another example can be in robotics, the case of inverse kinematics where there can be multiple solutions for a robot to reach the same place.

Majority of the solvers, rely on the gradient of the curve (Derivative or Jacobian matrix in case of multi-variable function), why? knowing the derivative helps the solver to quickly know if it is heading in the right direction! Some might not directly use the derivative, but see if it reduced or increased the cost.

The class of solvers called “local minima” solvers will terminate when they find a minima. Since they don’t spend searching the entire solution space, they are usually fast to give a solution. The ones that rely on derivative (AKA gradient) tend to converge fast as they usually employ “acceleration factors”. The overall result is quick results than brute-force search.

Unfortunately, this behavior of the local solvers also means it is sensitive to the initial position/ initial parameter vector, as the solution depends on where it started. Therefore these solvers work poorely when the cost function is noisy or discontinuous.

Why not use a global solver all the time?

Simply because it consumes a lot of time as it has to check the entire solution space (Consider a system with 20 variables with granularity of 0.1 and bounds of +- 5 -> (10/ 0.1)^20 possible solutions. There are huge systems with thousands of parameters. In order to reduce the workload, bounds for the parameters can be introduced and also to simplify the model. There are also methods such as Particle Swarm optimization which enables parallel processing, etc. Yet the results weren’t that magnificent in a few quick trials, coupled with the time for calculation, I would not recommend global solvers unless absolutely necessary.

This is a widely researched field and most of the machine learning or neural network training algorithms rely on one or more flavours of this class of solvers, clearly there are many new and old methods.

L1 : Conclusion

Try to use a local solver, such as Levenberg-Marquardt. But keep in mind of their limitations, particularly providing a good initial parameter set.

Lesson 2: Derivative free solvers?

This is another important choice and it will definitely ruin the day if not taken proper care of!.

  1. Is the cost/ fitness function differentiable?
  2. Is it noisy?
  3. Is there an analytical solution? Or is the numerical solution good/ stable?

If the answer to first is yes, then the options are wider, else the derivative free solvers have to be used or the cost function could be refactored to be differentiable.

If the second is also yes then one might need to employ smoothing functions or to refactor the cost function.

If differentiable but too much effort for analytical solution, then numerical differentiation could be used. But Numerical Differentiation poses additional dangers and can severely degrade performance if not used with care. I would call this the most important learnt-lesson for my case. Once I had a 10% drop of failures just by changing to central difference method alone! An alternative is Automatic Differentiation (Available with libraries like Eigen, see for more)

So far, in my experience, gradient or derivative or similarly motivated solvers converge fast under right conditions than simple brute force searches. So use them if possible. Else Genetic algorithms, CMA-ES, Pattern search, particle swarm optimization, etc could be used.

Lesson 3: Think before leap!

Before wasting hours or days, think it through which solver is good for the particular case. While it is tempting to use Levenberg Marquardt (probably the 3rd mention in this post) which works pretty well for a vast range of situations, it might not be the best for the job. Also pay attention to the factor of differentiability and quality of the differentiation if such solvers are used.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s