Non linear optimization for parameter fitting example

In this page, we describe an example of nonlinear optimization in Scilab. We are searching for the parameters of a system of ordinary differential equations which best fit experimental data. The context is a chemical reaction for processing waters with phenolic compounds. We use the fminsearch, optim, derivative and leastsq functions in a practical case.

These examples must be used under the terms of the CeCILL.

The problem

Consider the following system of ordinary differential equations:

$$\frac{dy}{dt}=f(t,y)$$

$$y(t_0)=y_0$$

where $$y_0$$ is the initial state, f is the right hand side of the ODE. In our particular case, the size of the y is equal to 2. The right hand side function f is defined by the following function.

function dy = myModel ( t , y , a , b ) 
   // The right-hand side of the Ordinary Differential Equation.
    dy(1) = -a*y(2) + y(1) + t^2 + 6*t + b 
    dy(2) = b*y(1) - a*y(2) + 4*t + (a+b)*(1-t^2) 
endfunction 

Notice that the function takes the extra parameter a and b as input arguments. The parameters a and b are unknown. We are given the following experimental data, which indicates the value of y for particular values of the time t.

//
// Experimental data 
t = [0 1 2 3 4 5 6]'; 
y_exp(:,1) = [-1 2 11 26 47 74 107]'; 
y_exp(:,2) = [ 1 3 09 19 33 51 73]'; 
//
// Store data for future use
global MYDATA;
MYDATA.t = t;
MYDATA.y_exp = y_exp;
MYDATA.funeval = 0;

The number of calls to the function funeval is stored in the data structure. This will be useful later in this document, when we will compare the algorithms.

We are searching for the parameters a and b which best fit the experimental data.

Non linear least squares

Given a and b, we can use the ode function and get the value of y at the given times t. The problem is that a and b are unknown.

In order to find the parameters a and b, we define a least squares function, which measures the difference between the simulated value of y, as predicted by the ode function, and the experimental data. The problem is then the unconstrained non linear least squares optimization problem:

Minimize L(k)

where k=[a;b] is the unknown.

We proceed in two steps. First, we define a function which returns a column vector containing the differences between the simulated ordinary differential equation and the experimental data. Then we define the objective function as the sum of squares of these differences.

The function myDifferences below computes the difference between the simulated ODE, as computed by the ode function, and the experimental data. This function returns a column vector containing the differences of both the first component and the second component of the solution, i.e. it concatenates the first and second columns of the difference matrix. Since the function myModel takes 2 extra parameters as input arguments, we give the list list(myModel,a,b) as the 4th input argument of the ode function.

function f = myDifferences ( k ) 
    // Returns the difference between the simulated differential 
    // equation and the experimental data.
    global MYDATA
    t = MYDATA.t
    y_exp = MYDATA.y_exp
    a = k(1) 
    b = k(2) 
    y0 = y_exp(1,:)
    t0 = 0
    y_calc=ode(y0',t0,t,list(myModel,a,b)) 
    diffmat = y_calc' - y_exp
    // Make a column vector
    f = diffmat(:)
    MYDATA.funeval = MYDATA.funeval+ 1
endfunction 

The following L_Squares function defines the cost function of our optimization problem.

function val = L_Squares ( k ) 
    // Computes the sum of squares of the differences.
    f = myDifferences ( k ) 
    val = sum(f.^2)
endfunction 

The initial guess of the optimization problem is given:

// 
// Initial guess
a = 0.1; 
b = 0.4; 
x0 = [a;b]; 

The expected solution is

xstar = [2;3];
fstar = L_Squares ( xstar ) 

where the difference is very small :

-->fstar = L_Squares ( xstar ) 
 fstar  =
    2.029D-14  

With fminsearch

A straightforward way of finding the parameters a and b is to use the fminsearch function, which does not require the derivative of the objective function.

options = optimset("TolFun",1D-02,"TolX",1.D-02,"MaxFunEvals",1.0D+06,"MaxIter",1.0D+06); 
[xopt,fopt,flag,details] = fminsearch(L_Squares,x0,options)

The previous script produces the following output.

-->[xopt,fopt,flag,details] = fminsearch(L_Squares,x0,options)
 details  =
 
   algorithm: "Nelder-Mead simplex direct search"
   funcCount: 105
   iterations: 56
   message: [3x1 string]
 flag  =
    1.  
 fopt  =
    0.0058101  
 xopt  =
    1.9984543    2.9996091  

The least squares value fopt is small, so that this may be an acceptable solution.

The following plots present the evolution of the output variable y depending on the time t before and after optimization.

width=400

width=400

With optim

The non linear least squares problems are usually associated with a regular objective function. This suggests to use the optim function, which can manage smooth objective functions very efficiently.

In order to use the optim function, we must create a glue function, which can be directly used by the optim function. The gradient of the least squares function is computed with the derivative function.

function [f, g, ind] = modelCost (x, ind)
  f = L_Squares ( x )
  g = derivative ( L_Squares , x )
endfunction

Then we use the optim function, as in the following script.

-->[ fopt , xopt , gopt ] = optim ( modelCost , x0 )
 gopt  =
    0.0000005  
    0.0000001  
 xopt  =
    2.  
    3.  
 fopt  =
    2.804D-14  

This produces an extremely small gradient and provides a much more accurate optimum xopt than fminsearch.

With leastsq

Actually, the problem has much more structure than what we have used so far. Indeed, the current problem is not a general unconstrained non-linear optimization problem : it is a least-squares problem. This implies that the optimum can be found by specialized algorithms which are specifically designed for these problems.

The leastsq function is especially designed for non-linear least squares problems. It takes as input argument a function returning the differences to reduce (and not the sum of squares).

-->[fopt,xopt,gopt]=leastsq(myDifferences, x0)
 gopt  =
  - 1.060D-08  
  - 3.155D-09  
 xopt  =
    2.  
    3.  
 fopt  =
    1.804D-14  

Given that the least squares value fopt is very small, the solution is accurate.

The implementation of the leastsq function is in fact based on the optim function. Moreover, we notice that we do not have to provide the gradient, as required by the optim function. Indeed, the gradient can be automatically computed from the function (but it can also be provided by the user). By default, the gradient is based on finite differences. Notice that the gradient of a sum of squares has a particular structure and this is used by leastsq to compute the derivative of the function.

With lsqrsolve

The lsqrsolve function minimizes the sum of the squares of nonlinear functions by the Levenberg-Marquardt algorithm. It makes sense to use this algorithm, because it allows to use the most of the specific structure of our problem. Indeed, sum squares problems are very specific problems, for which the gradient can be accurately computed if the gradient of the functions are provided. Moreover, the Hessian matrix of this type of problems can be more accurately approximated, which may reduce the number of iterations.

In order to use the lsqrsolve function, we must provide a function which computes the differences to be minimized. The first argument of the function is x, the current point and m, the number of functions to be minimized. This is why we define the following function.

function f = myDiffLsqrsolve ( x , m )
  f = myDifferences ( x ) 
endfunction

In our case, the number of functions to minimize is 14, as shown in the following session.

-->y0 = myDifferences ( x0 );
-->m = size(y0,"*")
 m  =
    14.  

The following session shows how the lsqrsolve function performs in our case.

-->[xopt,diffopt]=lsqrsolve(x0,myDiffLsqrsolve,m)
 diffopt  =
    0.         
  - 2.195D-08  
  - 4.880D-08  
  - 8.743D-09  
    2.539D-08  
    7.487D-09  
  - 2.196D-08  
    0.         
    6.534D-08  
  - 6.167D-08  
  - 7.148D-08  
  - 6.018D-09  
    2.043D-08  
    1.671D-08  
 xopt  =
    2.  
    3.  
-->fopt = sum(diffopt.^2)
 fopt  =
    1.804D-14  

Conditionning of the problem

In unconstrained nonlinear optimization, we can measure the difficulty of the problem by computing the condition number of the Hessian matrix. This gives a measure of the narrowness of the curved valley in which the solver tries to find the minimum. This can be done easily, by combining the derivative function (which computes the derivatives with finite differences) with the cond function (which computes the 2-norm condition number).

The following script computes the gradient and Hessian matrix and computes the square root of the 2-norm condition number.

[g1opt,Hopt] = derivative(L_Squares,xopt,H_form="blockmat")
sqrt(cond(Hopt))

This produces :

-->[g1opt,Hopt] = derivative(L_Squares,xopt,H_form="blockmat")
 Hopt  =
    4605.9236    26.592208  
    26.592208    3712.8462  
 g1opt  =
    0.0000093    0.0000036  
-->sqrt(cond(Hopt))
 ans  =
    1.1140084  

This shows that the condition number is very close to 1. It implies that the current problem is very easy to solve numerically and that the solution xopt has many significant digits. If this number were close to 10^17, then we may doubt that the solution has any significant digit.

Computing the gradient

The method that the leastsq is based on the particular structure of a sum of squares. Indeed, we can compute the gradient of the objective function by knowing the value of the differences and the Jacobian matrix.

The following script computes the gradient of the sum of squares by using the derivative function on the L_squares function.

-->g10 = derivative(L_Squares,x0)
 g10  =
  - 22963763.    2594177.2  

In the following script, we compute the gradient of the sum of squares function.

d0 = myDifferences ( x0 );
J0 = derivative(myDifferences,x0);
g20 = 2*d0'*J0

We get :

-->g20 = 2*d0'*J0
 g20  =
  - 22963763.    2594177.2  

We see that the two evaluations match.

A comparison of the algorithms

Given that there are several algorithms to solve the same problem, we may wonder what algorithm works best.

The following function will help us to summarize the results of a given algorithm.

function printsummary(solvername,fopt,xopt,xstar,funeval)
  mprintf("%s:\n",solvername);
  mprintf("  Function evaluations: %d\n", funeval);
  mprintf("  Function value: %e\n", fopt);
  mprintf("  X: [%e,%e]\n", xopt(1), xopt(2));
  xopt=xopt(:)
  mprintf("  Absolute error on X: %e\n", norm(xopt-xstar));
endfunction

In the following script, we measure the performances of the fminsearch algorithm.

MYDATA.funeval = 0;
options = optimset("TolFun",1D-02,"TolX",1.D-02,"MaxFunEvals",1.0D+06,"MaxIter",1.0D+06); 
[xopt,fopt,flag,details] = fminsearch(L_Squares,x0,options);
printsummary("fminsearch",fopt,xopt,xstar,MYDATA.funeval);

The previous script produces the following output.

fminsearch:
  Function evaluations: 105
  Function value: 5.810101e-003
  X: [1.998454e+000,2.999609e+000]
  Absolute error on X: 1.594399e-003

It is then easy to gather the results and to format them into the following table.

Algorithm

Function Evaluations

Function value

Absolute Error on X

fminsearch

105

5.810101e-003

1.594399e-003

optim

200

2.818684e-014

1.946910e-009

leastsq

220

1.803679e-014

1.096167e-009

lsqrsolve

31

1.803705e-014

1.108243e-009

We see that there are many differences between the algorithms. To understand these differences, we focus on the purpose of the algorithms and the way they can get informations from the problem.

The best algorithm for this application is, without any doubt, lsqrsolve. We should not be surprised by this results, given that this algorithm is designed specifically for this type of problems. As a result, the lsqrsolve asks for more details than the other functions.

  1. Notice that the function requires to compute the differences, and not the sum of their squares. Hence the gradient of the objective function can be computed more accurately.
  2. The Hessian matrix of a sum of squares has a very specific form. Hence, the algorithm can approximate it more accurately than a general-purpose function such as optim. This can reduce the number of iterations to get to the optimum.

All in all, lsqrsolve has more informations that the solvers which only get the sum of squares as their input.

The leastsq function seems to perform nearly as well as leastsq, although it requires more function evaluations. Scilab is Open-Source: hence, we know that leastsq is based on the optim solver. This why the optim function performs nearly as well as leastsq in general. But we may expect that leastsq can work a more accurately in some cases. This is because leastsq asks for the differences to be minimized: the sum of squares is computed internally. Hence, the gradient of the sum of squares is computed from the gradient of the differences. This can reduce the error produced by finite differences in some cases.

The optim function works well in this case, although the number of function evaluations is relatively large with respect to lsqrsolve. In fact, optim has a much more restricted view of the structure of the problem, since it only have the sum of squares to minimize, compared to the detailed information which is passed to lsqrsolve. Indeed, the optim solver is a more general-purpose algorithm. The algorithm used by optim/qn is actually excellent: it is based on a BFGS-type unconstrained Quasi-Newton solver, combine with a robust line-search safeguarded algorithm. The update of the approximated Hessian matrix is based on the update of the Cholesky factors, which guarantees a good performance. Moreover, the optim/cg algorithm can manage relatively large optimization problems, based on a limited-memory BFGS algorithm.

The fminsearch algorithm has the poorest results in this case, since the precision that it gives in terms of function value is large and since the absolute error on X is also large. In fact, the algorithm used by fminsearch only uses function values and does not require derivatives. This is an advantage in many situations where computing the derivatives is difficult. In the context of least squares minimization, the gradient can be accurately computed by finite differences, which brings a huge advantage. On the other hand, fminsearch provided a relatively good reduction of the function value, which may be sufficient for some applications.

Acknowledgements

Thanks to Marcio Barbalho for providing this example.

The script

The complete script is available here:

References

public: Non linear optimization for parameter fitting example (last edited 2012-11-30 09:15:31 by michael.baudin@contrib.scilab.org)