D.P. Solomatine
One of the major elements in constructing most mathematical models of natural phenomena is the choice of model parameters' values, or model calibration, which can be formulated as an optimization problem. The present document gives an overview of the calibration problem as an optimisation problem, describes various approaches to optimization, and provides results of applying the selected randomized search techniques (controlled random search, canonical genetic algorithm, and adaptive cluster covering) in solving several minimization problems, including calibration of a hydrologic rainfall-runoff model. Adaptive cluster covering (ACCO) has appeared to be very efficient, requiring the lowest number of function evaluations. The combination of ACCO with the subsequent direct Powell-Brent method of the local search starting from the identified minimizer estimate (ACCOL algorithm) combines rather low number of function evaluations with the high accuracy. The mentioned algorithms are implemented in the form of a prototype PC-based tool GLOBE which can be used for calibration of a wide range of models.
The objective of calibration (parameter optimization) of any model of a physical system is to identify the values of some parameters in a model which are not known a priori. This is achieved by feeding this model with input data, and comparing the computed output variables to the variable values measured in the physical system. Their difference should be minimized by solving an optimization problem, with independent variables being the unknown model parameters.
Given the vector OBSt of observed output variables values, and the corresponding vector MODt of the modelled values at time moments t = 1...T, among the widely used evaluation criteria for the discrepancy (error) between the model results and observations (or, in other words, goodness of fit of the model) the least square fir (SLS) is often used, or a more genral one:
where
We pose an optimization problem as follows. Find an optimizer x* such that
where the objective function (in calibration problems, a discrepancy measure) f(x) is defined in the finite interval (box) region of the n-dimensional Euclidean space:
This constrained optimization problem can be transformed to an unconstrained optimization problem by introducing the penalty function with the high value outside the specified constraints - this allows for using many of the algorithms mentioned below which were initially developed for unconstrained minimization.
In the cases when the exact value of an optimizer cannot be found, we will speak about its estimate, and correspondingly, about the minimum estimate.
Approaches to solving the problem (5)-(6) depend on the properties of f(x). The following cases may be considered:
(1) f(x) is a single-extremum analytically expressed function;
(2) f(x) is a single-extremum function which is not analytically expressed;
(3) no assumptions are made over the properties of f(x), so
in general case it is multi-extremum function which is not expressed analytically.
The problem of single-extremum minimization (or local search problem) has been extensively studied for the last 50 years (references and numerical codes can be found e.g. in Jacobs 1977, Press et al. 1991). If analytical expression of an objective function is available, and derivatives can be computed, then gradient methods are used:
- steepest descent method (classical, often mentioned, but inefficient method);
- conjugate gradient methods (Fletcher-Reeves, Polak-Ribiere algorithms, see Polak 1971);
- quasi-Newton or variable metric methods, like Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS) (see Jacobs 1977);
- set of methods using second derivatives values or assessments.
In the case when no analytical expression is given and derivatives cannot be computed, the following direct search methods can be used:
- downhill simplex (Nelder and Mead 1965) - not to be confused with the simplex method in linear programming;
- rotating directions (Rosenbrock 1960);
- direction set method (Powell 1964) with modifications of Brent 1973.
The use of the methods from this class in hydrological models calibration has been demonstrated, e.g. by Johnston and Pilgrim 1976, Pickup 1977, Sorooshian 1985.
Most of the engineering applications done in the 1970s and 1980s are
using the listed methods for the single-extremum functions minimization,
but often without the investigation of the property of unimodality. Most
researchers recognized the problem of the "good" initial starting point,
and even mentioned the necessity of trying several of such points, but
the rigorous practical procedures have been outside their attention. Partly,
this can be attributed to the lack of the wide awareness of engineering
community of the developments in the area of global optimization.
Methods of global search are not based on the assumption of single-extremum, and some of the groups of such methods are referred below. For the more extensive coverage of the topic see, e.g., Archetti and Schoen 1984, Torn and Zilinskas 1989, Zhigljavski 1991. It is necessary to mention, that many algorithms aimed at global search, include various ideas so that no accurate grouping of methods is possible. For the purposes of this paper we distinguish the following groups:
- set (space) covering techniques;
- random search methods;
- methods based on multiple local searches (multistart);
- evolutionary and genetic algorithms;
- other methods (simulated annealing, trajectory techniques, tunneling
approach, analysis methods based on a stochastic model of the objective
function).
In set covering methods the search space is covered by N subsets X1,...,XN, such that
Then the objective function is evaluated in N representative points {x1, ..., xN}, one representing one subset, and an optimizer with the least function value is taken as an approximation of a global one.
If the way of choosing these sets (or points representing these sets) does not depend on the values that f takes in these points, then the point set {x1, ..., xN} is called a grid, and the corresponding minimization algorithm is called a grid (passive covering) algorithm.
Typically, cubic and rectangular grids are considered; in this case the number of function evaluations grows exponentially with the dimension n. It is also possible to construct random grids; in our classification this type of algorithm, if a uniform distribution is used, falls into the group of "pure random search methods" and is covered below.
For some types of functional classes there exist grid algorithms that are optimal in the sense of number of function evaluations, typically in the minimax (worst case) sense. However, in most practical problems, when objective functions appear to be dissimilar to the functions where optimal algorithms perform best, the grid algorithms are very inefficient.
If, however, all previously chosen points {x1, ..., xk} and function values {f(x1), ..., f(xk)} may be used when choosing the next point xk+1, then the algorithm is called a sequential (active) covering algorithm (for an example of such, see Pinter 1986). This class of algorithms is prefered to grid algorithms since it provides for the more directed type of search.
If f(x) is a Lipschitz function, i.e. the one for which there holds:
(i.e. the variation rate of an objective function is bound by a constant L), then the set covering algorithms have some theoretically attractive properties. One of them is the possibility to find a global minimum with any predefined accuracy, which gives the possibility to establish convergence criteria. For example, if domain X is covered by the balls of radius , then the minimization problem can be solved with the accuracy
with respect
to values of f. In most problems the Lipschitz constant L
is apriori unknown, but it can be estimated during the search.
Note, that if f is continuously differentiable on X,
then (9) always holds. However, many models exhibit "step-wise" changes
in their outputs, so the objective function can be non-smooth and even
discontinuous; in these cases f is not everywhere differentiable,
the condition (8) does not hold and establishing the convergence properties
of algorithms becomes difficult. For example, many rainfall-runoff tank
models exhibit non-smooth behaviour due to the possibility of abrupt changes
in runoff.
Pure direct random search (uniform sampling)
We will start with the simplest scheme of obtaining an optimizer assessment,
a pure direct random search (referred also as pure random search,
Monte Carlo search, or direct search):
Draw N points from a uniform distribution in X and
evaluate f in these points. The smallest function value found
is assessment for f*.
In spite of its obvious simplicity, the pure direct random search offers an asymptotic guarantee in a probabilistic sense. Let X0 \&} X, then the probability that at least one point from uniform sample {xj, j = 1,...,N} will belong to X0 is given by
where m() is the Lebesgue measure of a set (which is, in a sense, is analogous to volume). Now, if X0 is a neighborhood of x*, then one could derive, given and a probability level P´, a sample size
and assume
as an approximation to f*. This allows to prove that
the following:
if f is continuous, then with the increase of N, fN*
converges to the global minimum value f* with probability 1.
If f is a Lipschitz function, then the effectiveness of random
sampling can be evaluated against that given by grid search. Uniform random
sampling is argued to be more effective than grid search for dimension
n>6 (for additional references, see Rinnoy Kan and Timmer
1987). Nonetheless, pure random search in this form, like grid methods,
requires the number of function evaluations that grows exponentially with
dimension n. One of the possible improvements is to make the
generation of evaluation points in a sequential manner; then the previously
chosen points and function values are taken into account when the next
point is chosen. By analogy with set covering methods, this approach can
be called sequential (active) random search, but traditionally
another term is used - adaptive random search.
Adaptive random search (ARS) and controlled random search (CRS)
The basic structure of many random search methods can be outlined by
the following sequence:
Step 0. Choose a point x0, set number of iteration k=0.
Step 1. Generate one (or several) new trial point yk = xk+rk, where rk is a random vector drawn from a distribution µk; yk X.
Step 2. Generate xk+1 according to some rule, e.g.
Adaptive random search. Different random search methods depend on the choice of µk and on the rule used in step 2. Most naturally will be to choose µk depending on f(x1),...,f(xk). In this case the search is called adaptive random search (ARS) (Schumer and Steiglitz 1968, Pronzato et al. 1981).
An idea applied in adaptive random search most often, is the following one. When xk is far from x*, the domain space must be investigated by drawing rk with large variance, and when xk is close to x*, a smaller variance is to be prefered for more detailed investigation. Since it is not known, how far xk is from x*, the scheme proposed by Bekey and Masri 1983 and modified by Pronzato et al. 1984 will be to repeatedly alternate two phases. At a first phase the variance of xk yielding the best value of criterion is selected among a set of predefined variances (for example, every time selecting the variance which is 10 times smaller than the previous one). At the second phase this most successful variance is used for the fixed number of iterations.
Controlled random search. Price (1983) proposed
a version of an algorithm called controlled random search (the
version considered in the referred paper was abbreviated as CRS2).
In it, the vector rk is generated on the basis of
a randomly chosen subset of previously generated points. At each iteration,
the worst point in the initially generated set is replaced by new trial
point.
Step 1. Generate a set A of N random points and evaluate f in these points.
Step 2. Determine the point xa with largest (worst) function value fa, and point xb with the least (best) function value fb.
Step 3. Choose from A randomly any m distinct points r2, r3...,rm+1 excluding xb. Let r1 = fb. Determine the centroid G of the points r1, r2...,rm. Determine the trial point P such that P = 2G-rn+1.
Step 4. If P is outside imposed constraints, then go to 3.
Step 5. Determine function value fP at point P. If fP > fa, then go to 3.
Step 6. In set A, replace the point fa with fP.
Step 7. If stopping criterion not satisfied, go to 2.
As a stopping criterion, Price used the following condition of absolute
convergence: fa - fb < 10-6.
This criterion is not applicable for many situations, so in our study
we found it reasonable to change it to a more complex criterion including
fractional convergence (ratio between function value at best point and
at some point in the middle of a sorted set) of 10-2, or if
the situation occurs when half of all the points in A become
the same.
Lately, there were several variations of the controlled random search
proposed.
The basic idea of the family of the so-called multistart methods is to apply a local search procedure several times, and then to choose an assessment of global optimizer. Here, we will concentrate mainly on one of the versions of multistart based on the beforehand clustering of potential starting points.
For the following discussion, we will define the region (area) of attraction of a local minimum x* as the set of points in X starting from which a given local search procedure P converges to x*. In an ideal case, the multistart methods aim at starting this local search procedure exactly once in every region of attraction.
Multistart methods can be structured after the following pattern:
1. construct a set Xq of q points by drawing them from X according to a certain rule, and compute function values in these points;
2. construct set Xc by selecting `the most promising points' from Xq, and start from them a local search routine obtaining a least value fc;
3. a termination criterion indicates wether to stop or to return to step 1. The local minimum with the smallest function value fc found is the approximation of the global minimum f*.
Constructing Xq. The rule in step 1 may be deterministic (e.g., points may be generated from a deterministic grid), or non-deterministic - then points are sampled from a distribution in X. Typically, a uniform distribution is used, then the Proposition 1 can be applied, and already at this stage the conservative probabilistic bounds for the error in finding global optimizer can be computed.
Constructing Xc. The selection of `the most promising' points can be done in a number of ways. One of the ways, referred as reduction is simply the removal of a certain fraction of the sample points with the highest function values.
Another way of constructing Xc, referred as clustering is to create groups (clusters) of mutually close points that hopefully correspond to relevant regions of attraction. Then local search procedure can be started once in every such region, e.g. from its centroid. In order to identify such clusters, the methods of cluster analysis (also called pattern recognition, or automatic classification) are used (Hartigan 1975, Torn and Zilinskas 1989).
There are two typical problems connected with clustering methods. First, clusters may contain several areas of attraction, and the local search procedure launched once may miss the global minimum; we will call such situation underclustering. Second, one region of attraction may be divided over several clusters, and in this case the local search procedure will locate the same minimum more than once with the corresponding redundant function evaluations; this situation may be called overclustering.
Stopping rules. An important problem is the one of stopping the multistart algorithm, i.e. the decision wether a local minimum can be accepted as the global one. An important step taken by Zielinski 1981 towards a mathematical analysis of the stopping problem, has been followed by other works of (see Snijman and Fatti 1985, Boender et al. 1985).
It has been shown, how to find Bayesian posterior expectation (1) of probability that there are K local minima; (2) of the number of local minima, and (3) of the relative size of the still non-observed regions of attraction - given that W different local minima have been found in M local searches. On the basis of such estimates it becomes possible to construct the optimal Bayesian stopping rules - such rules incorporate assumption about the costs and benefits of continuing search, and weigh these against each other probabilistically to calculate the optimal stopping point.
One the interesting conclusions of these analyses is that the number of local searches that has to be performed, depends only on the number of minima located and not on the dimension n (of course, the computational effort for each local search increases with the increase of n).
Extensive coverage of clustering methods is given in Rinnoy Kan
and Timmer 1987. They analyse various aspects of the approach, including
three methods of cluster identification - density clustering, single linkage
clustering, and mode analysis. Also, some deficiencies of the method are
presented.
The family of evolutionary algorithms is based on the idea of modelling the search process of natural evolution, although these models are crude simplifications of biological reality. In fact, any random search may be interpreted in terms of biological evolution: generating a random point is analogous to a mutation, and the step towards the minimum after a successful trial is a selection.
Evolutionary algorithms introduce some important modifications to random
search, and use the terminology from biology and genetics. The algorithmic
description follows.
0.Set population number t = 0.
1 (initialize population). Construct a set P(t) of N points by drawing them from X according to a certain rule.
2 (evaluate initial population). Compute function values in all points of the population.
3 (recombine). Recombine points from P(t) according to the recombination operator, creating new points (individuals) called offsprings; put them to the population thus creating set P1(t).
4 (mutate). Modify (mutate) coordinates of some points from P1(t) according to the mutation operator, creating set P2(t).
5 (evaluate population). Compute function values in all points of the population P2(t).
6 (select). Apply selection operator and select some best points from P2(t), thus creating the next population P(t+1).
7. If termination criterion is not fulfilled, then increase t
and goto step 3, otherwise stop.
Historically, evolution algorithms have been developed in three variations - evolution strategies (ES), evolutionary programming (EP), and genetic algorithms (GA). Back and Schwefel 1993 give an overview of these approaches, which differ mainly in the types of mutation, recombination and selection operators.
A canonical genetic algorithm (Holland 1975, Goldberg 1989, Michalewicz 1992) received lately a lot of attention, and not only in professional publications. In GA, the binary coding of coordinates is introduced, so that an l-bit binary variable is used to represent integer code of one coordinate xi, with the value ranging from 0 to 2l-1 that can be mapped into the real-valued interval [ai,bi]. Connecting the codings of all coordinates, we get the overall binary string G of length nl for each point, which is called genotype.
For example, if n=4 and l = 5, and the
point (5, 20, 9, 25) is given, then connecting the codings of all coordinates
will give the following binary string:
00101 10100 01001 11001 (nl =28 bits)
x1 x2 x3 x4
Now, mutation operator changes a randomly chosen bit in the
string G to its negation. Recombination (or crossover)
operator is in the following: select two points (parents) S and
T from the population according to some rule (e.g., randomly),
select a number (e.g., randomly) between 1 and nl,
and form either one new point S', or two new points S'
and T', by taking some of the bits of coordinate values from
first parent S, and some from the other parent T according
to the following:
S' = { S1,...,S,T+1,...,Tnl }
T' = { T1,...,T,S+1,...,Snl
}
For example, if two points S and T are coded, and =7:
S= 00101 10100 01001 11001
T= 10001 11111 00011 00011
(=7, so cut is made here)
then the result of the recombination operator will be two offsprings
S' and T':
S'= 00101 10100 01011 00011
T'= 10001 11111 00001 11001
x1 x2 x3 x4
This one-point crossover can be generalized to a m-point crossover by sampling more than one breakpoint and alternately exchanging each second resulting segment, or even to a uniform crossover when a random decision is made whether to exchange information between parents for each bit of G (on multi-point crossover see De Jong and Spears 1992).
There are different versions of GA known. For example, a real-valued version of crossover allows the breakpoint to occur only between codings of individual coordinates, so that can take values in {l, 2l, 3l,...,(n1)l}. In this case, crossover is in fact, exchange of coordinates' values, and the binary coding is not needed at all. In our opinion, such version resembles the process of crossover in nature even closer, since coordinates can be seen as genes, and genes are not split.
Selection can be done in several ways - by selecting µ best points out of the set of offspring points (so that the parents are immediately excluded), or out of the joint population of parents and offsprings. Third possibility is to perform selection of the second type, but allowing parents to survive in the population only for a fixed number of populations. In these two last cases the selection is called elitist, since it chooses only for best points, thus guaranteeing a monotonically improving performance.
In evolutionary strategies (ES), mutation of coordinates is
performed with respect to corresponding variances of a certain n-dimensional
normal distribution. Recombination is known in two variants: discrete
recombination is similar to a crossover in GA with real-valued coding;
intermediate recombination creates a point S, such that
S'i = Si + g (Ti
- Si)
where g {belongs to} [0,1] is a uniform random variable. Selection in
ES, like in GA, can be done in two ways - by selecting µ best points
out of set of K offspring points (so-called (µ,K)-selection), or
out of the union of parents and offsprings ((µ+K)-selection). Applications
of evolutionary algorithms, especially GAs, are found in many areas, see,
e.g., Wang 1991, Cieniawski 1995, Gulsen et al 1995.
To this group the following solution and analysis methods may be ascribed:
- simulated annealing (see e.g., Bohachevski et al. 1986);
- trajectory methods, based on the idea of finding many, hopefully all, local optima by means of numerical integration of a differential equation. One of the examples of such approach is the so-called "heavy ball" method by (Pshenichnyi and Marchenko 1967) according to which the search trajectory coincides with the trajectory of a ball motion on the surface generated by the objective function; the ball may pass by inertia over flat hollows but may also stop in one of them.
- tunneling (Levy and Montalvo 1985) techniques which, based on one local minimum found, seek minimum of specially constructed "tunnelling" function giving a new local minimum of the objective function;
- methods of analysis based on a stochastic model of the objective
function assume that the objective function can be represented as
a sample path of a stochastic process on X. This approach gives
also additional possibilities in constructing stopping rules
and analysis of other algorithms, for example, it allows to move from
the worst case analysis (based on maximum guaranteed error) to average
case analysis (based on average error), by computing the a posteriori
(Bayesian) expected performance of algorithms. The probabilistic
bounds for the error in finding global optimizer can also be computed.
This type of approach has been reported by the group of researchers from
Lithuania (see Mockus et al. 1978, Thorn and Zilinskas 1989),
and also by Zielinski 1981, Betro and Rotondi 1984,
and Boender et al. 1985. Some results in this area we mentioned
when considering clustering methods.
Currently, it is possible to identify three large groups of researchers linked to the following research topics:
- traditional optimization (non-linear optimization, internal point methods, etc.);
- traditional global optimization (set covering, random search methods, analysis based on stochastic model of objective function);
- evolutionary and genetic algorithms;
- other methods (tabu search, simulated annealing etc.).
It seems, that for a long time there was lack of communication and exchange of ideas between these groups. For example, works on global optimization almost never mention evolutionary approaches, and visa versa, papers, books and discussions on Internet on evolutionary strategies and genetic algorithms tend not to notice large area of research called global optimization, and understand under `optimization' mainly derivative-based methods.
There are some works where the proposed search strategies would inherit some features from different methods. One example of such approach is presented by Fox 1993, where tabu search and simulated annealing are combined with genetic algorithm. Another example is given by Duan et al., 1993; there, the controlled random search (Price 1983) is combined with the local search by downhill simplex method (Nelder and Mead 1965) in several areas, and the periodic shuffling of points from different areas (the term `evolution' applied to Nelder-Mead search must not be confused with the notions used in evolutionary algorithms).
One of the objectives of this paper is to show the common features of
many global optimization methods (even if different terminology is used),
and attract attention of all groups working in the area of global optimization
(in the broad sense) to the possibilities of using best features of various
methods.
Historically, an approach presented here was initially conceived as a workable combination of some common sense ideas that could be useful in general-purpose search. Later, of course, the author has discovered that most of these ideas have been long discussed in the literature on global optimization, and used to some extent in different methods of search. However, the way in which it is proposed to combine these ideas, appears to be new.
The strategy, proposed here, is based on the following principles:
Principle 1. Clustering of perspective points in the domain of objective function (principle of clustering).
Principle 2. Sequential random search in certain constrained regions of the domain (principle of repetitive covering).
Principle 3. Shifting the regions of search depending on the obtained information about the function behaviour (principle of adaptation).
Principle 4. Periodic randomization of search in order to cope with the problem of possible "unfortunate" random generation of points that happened to be far from the global optimum (principle of periodic randomization).
Depending on the concrete implementation of each of these principles, it is possible to construct various algorithms, suitable for certain situations, for example, different types of objective functions.
Details of ACCO strategy are described in a separate publication (Solomatine,
1995).
Algorithm ACCOL: combination of ACCO with multiple local searches
In this strategy of global search, referred as ACCOLm (adaptive cluster covering with m local searches), there are two phases:
- (ACCO phase) ACCO algorithm is used to find several regions of attraction, represented by the promising points which are close to potential minimizers (such points we will call "potent"). The potent set P1 is formed by taking one best point from Rk found for each cluster during progress of ACCO. After ACCO stops, the set P1 is reduced to P2 by leaving only several m (1...4) best points wich are also distant from each other, with the distance at each dimension being larger than, for example 10% of the range for this dimension;
- (LS phase) an accurate algorithm of non-derivative (direct) local search (LS) is started from each of potent points of P2 (multistart) in order to accurately find the minimum. As a LS algorithm, we used a version of Powell non-derivative minimization, where the line minimization is achieved by bracketing and Brent quadratic interpolation (see Brent 1973, Press et al., 1991). It has been modified for the use in constrained optimization problems by the introduction of a penalty function with the high values outside the constraints interval.
The advantage of such approach if compared with traditional multistart, is the significant economy in function evaluations due to the following:
- the dangers brought by underclustering (see above) are significantly decreased;
- to process one cluster, ACCO requires significantly less function evaluations than one direct LS;
- ACCO moves the representatives of areas of attraction (potent points) much closer to potential minimizers - this allows to eliminate most "non-interesting" potent points from P1 and thus to significantly reduce the number of the necessary multistarts made;
- the fact that ACCO moves the starting potent points much closer to potential minimizers, allows for the faster identification of minimum by LS.
This strategy of search uses the idea of structural adaptation,
since it changes algortihm in the process of search. Our experiments have
shown high efficiency and effectiveness of such scheme since it combines
the efficiency of ACCO to identify minimum assessment reasonably quickly,
and high effectiveness of local search which, if started from the point
close to the minimum, reaches it with the high degree of accuracy.
In order to test the applicability of global optimization to problems
of models calibration, the prototype PC-based system GLOBE has been built
by the author. GLOBE can be configured to use an external program as a
supplier of the objective function values. The number of independent variables
and the constraints imposed on their values are supplied by the user in
the form of a simple text file. Fig. 2 shows the organization of the calibration
process. Model must be an executable module (program) which does
not require any user input, and the user has to supply two transfer programs
P1 and P2. These three programs (Model, P1,
P2) are activated from GLOBE in a loop.
The user interface includes several graphical windows displaying the progress of minimization in different coordinate planes projections. The parameters of the algorithms can be easily changed by the user. More advanced version of GLOBE for Windows is planned.
Currently, GLOBE includes several algorithms, of which three algorithms of randomized search has been chosen for comparison:
- CRS (controlled random search, by Price 1983) descibed above;
- genetic algorithm (GA) with the one-point crossover, and with the choice among the real-valued or binary (15 bit were used in experiments) coding, and among the normal, upward or downward adjacency mutation;
- adaptive cluster covering (ACCO-1), as described above.
- adaptive cluster covering with direct local search (ACCOL), as described above.
It should be mentioned, that comparison of algorithms proposed by different authors is always a difficult task, since so much depends on the details of implementation. Quite naturally, authors try to design the test suite in a way that would show the best features of the proposed algorithm. In spite of that, testing is always useful, since it gives general idea about algorithms' performance and may point out to classes of functions where this or that algorithm excels.
- the strategy of adaptive cluster covering (ACCO) is potentially efficient, effective and robust. If the fast evaluation of a minimum is needed, ACCO seems to be a preference in most cases;
- the combination of ACCO with the local search being started form the best point (that is, the version called ACCOL), has been obviously superior to other tested methods;
- each direct local search is normally quite expensive in terms of function evaluations, so if many direct local searches have to be started as a part of multistart strategy, the number of function evaluations increases correspondingly;
- CRS (controlled random search of Price 1983) was found to be less efficient than ACCO on most functions, but well-behaving on Rastrigin function.
- genetic algorithm has been found generally less efficient than ACCO(L) and CRS - it requires more function evaluations to reach the similar or even slightly lower accuracy. (Comparable values of the final accuracy could be attributed to the same degree of the variables discretization used).
Important characteristic of any global optimization algortihm is reliability (robusteness), that is low (or zero) number of failures of finding the global minimum. Our experience shows similar robustness (lack of failures to find minimum) of all tested algorithms, except the case of Rastrigin function. On this highly variable function with many optima with comparable values (see Fig. 4 and 5), GA and CRS (being strongly 'search oriented'), were always finding the global minimum, and ACCO (restricting the search every time by the current cluster interval) was converging to one of the nearest local optima approximately one time in five restarts. However, on average, with several restarts, ACCO was identifying the global minimum faster.
In our opinion, the identified relatively lower efficiency of GA and CRS with respect to ACCO, may be attributed to the strategy of search which is permanently oriented towards the whole function domain. with the high percentage of extra function evaluations.
Apart from that, the lower efficiency of GA can be attributed to the type of 'crossover' used (exchange of some of the parents' coordinates' values) which leads often to the appearance of 'offsprings' in the search space quite far from their highly fit parents, and hence normally with the lower fit; so, the fit gained by parents is not inherited by many offsprings.
Is this feature inherent to the whole class of evolutionary algorithms following the ideas of natural evolution, indeed quite appealing but highly redundant, or it is just the feature of the version of a canonical GA implemented in this study, has yet to be investigated. Efficiency of GAs can be increased by using other types of crossover, like intermediate recombination in evolutionary strategies. Examples given in Back and Schewefel 1993 show the higher efficiency of evolution strategies which reportedly better preserve from generation to generation the identified structure of the objective function.
Due to the relative freshness of the approach, the genetic algorithms are receiving a lot of attention, and are applied in many areas, e.g., on their applications in hydraulic research see Wang 1991, Babovic et al. 1994, Cieniawski 1995). It is necessary to mention, that practically in all areas where Gas are currently used, any other global optimization algorithm could be used.
If we turn to ACCO, its high efficiency can be explained by
such features as the randomized start, using clusters of prospective points,
and the adaptive strategy of subsequent randomized search. In our opinion,
this class of algorithms, especially in combination with the subsequent
local search (ACCOL) is very promising for solving wide class
of multi-extremum optimization problems.
1. The traditional single-extremum optimization methods are often inadequate to the complex properties of objective functions in many problems, in particular, parameter optimization problems. Test runs of several algorithms of randomized search using the global optimization tool GLOBE and the results of other studies have confirmed the usefulness of the global optimization methods in parameter optimization (model calibration).
2. Due to the high capacity of modern computers, the computationally intensive global search methods can be now more widely used in model calibration and other optimization problems. One of the technical obstacles here is the difficulty in implementing the calibration scheme of Fig.2 because many commercial and research modeling systems cannot run in `unattended' mode.
3. The presented experiments have shown:
- the attractive features of the proposed strategy of adaptive cluster covering (ACCO), compared with two other algorithms of randomized search - genetic algorithm (GA) and controlled random search (CRS);
- the usefulness of combining an efficient (fast) random search algorithm with the subsequent effective (accurate) one or several local searches. In the present study, a version of ACCO algorithm has been combined with the modified Powell-Brent method of direct non-derivative optimization - comprising the introduced ACCOL algorithm.
4. The choice between various methods of global optimization
may depend on the type of the problem, and more research is needed in
order to compare other reportedly efficient methods like simulated annealing,
evolution strategies and non-canonical genetic algorithms. Our experience
shows, that the best results can be achieved on the way of structural
adaptation, that is by switching in the process of search between
different algorithms.
Archetti, F. and Schoen, F. (1984). A Survey on the global optimization problem: general theory and computational approaches. Annals of Operations Research 1, J.C.Baltzer A.G. Publishing. pp. 87-110.
Babovic, V., Wu, Z. and Larsen L.C. (1994). Calibrating hydrodynamic models by means of simulated evolution. Proc., First Intern. Conference on Hydroinformatics, Delft, The Netherlands. Balkema, Rotterdam, 193-200.
Bekey G.A. and Masri S.F. (1983). Random search techniques for optimization of nonlinear systems with many parameters. Mathematics and Computers in Simulation, v. 25 (3), 210-213.
Boender C.G.E., Rinnoy Kan A.H.G. and Timmer G.T. (1985). A stochastic approach to global optimization. - In: K.Schittkowski, ed. Computational mathematical programming, NATO ASI Series, Vol. F15, Springer-Verlag, Berlin, pp. 291-308.
Bohachevski, I.O., Johnson, M.E., and Stein, M.L. (1986). Generalized simulated annealing for function optimization. Technometrics, 28, No. 3, 209-217.
Brent, R.P. (1973). Algorithms for Minimization without Derivatives. Prentice-Hall, Englewood-Cliffs, N.J., 195p.
Cieniawski, S.E, Eheart, J.W. and Ranjithan, S. (1995) Using genetic algorithms to solve a multiobjective groundwater monitoring problem. Water Resources Research, 31, No.2, 399-409.
De Jong, K.A., and Spears W.M. (1992). A formal analysis of the role of multi-point crossover in genetic algorithms. Annals of Mathematics and Artificial Intelligence 5, J.C.Baltzer A.G. Publishing. pp. 1-26.
Dixon, L.C.W. and Szego, G.P. (eds.) (1978). Towards global optimization. Amsterdam: North-Holland, 472p.
Duan, Q., Sorooshian, S., Gupta, V. (1992). Effective and efficient global optimization for conceptual rainfall-runoff models. Water Resources Research, 28, No.4, 1015-1031.
Goldberg, D.E. (1989). Genetic algorithms in search, optimization and machine learning. Reading, MA: Addison-Wesley.
Griewank, A.O. (1981). Generalized descent for global optimization. Journal of optimization theory and applications, vol. 34, No.1, pp. 11-39.
Hartigan, J. (1975). Cluster analysis. N.Y.: Wiley.
Jacobs, D.A.H. (1977). The state of the art in numerical analysis. London: Academic Press.
Michalewicz, Z. (1992). Genetic algorithms + data structures = evolution programs. Berlin: Springer.
Pinter J. (1986). Globally convergent methods for n-dimensional multiextremal optimization. Optimization, 17, No.2, 187-202.
Polak, E. (1971). Computational methods in optimization. N.Y.: Academic Press.
Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T. (1989). Numerical recipes. The art of scientific computing. Cambridge: Cambridge University Press, 702p.
Price W.L. (1983). Global optimization by controlled random search. Journal of Optimization Theory and Applications, vol. 40, p. 333-348.
Pronzato L., Walter E., Venot A. and Lebruchec J.-F. (1984). A general purpose global optimizer: implementation and applications. Mathematics and Computers in Simulation, v. 26, 412-422.
Rastrigin, L.A. (1974). Systems of extremal control. Moscow: Nauka, 630p. (in Russian).
Rosenbrock, H.H. (1960). An automatic mathod for finding the greatest or least value of a function. Computer J., 3, p. 175-184.
Schumer M.A. and Steiglitz K. (1968). Adaptive step size random search. IEEE Transactions AC, vol. AC-13, p.351.
Solomatine, D.P. (1995). The use of global random search methods for models calibration. Proc. XXVIth Congress of the International Association for Hydraulic Research, London, September 1995.
Sugawara, M. (1978). Automatic calibration of the tank Model. Intern. Symposium on Logistics and Benefits of Using Mathematical Models of Hydrologic and Water Resource Systems. Preprints. Italy, October 24-26. IIASA, Laxenburg, Austria.
Wang, Q.J. (1991). The genetic algorithm and its application to calibrating conceptual rainfall-runoff models. Water Resources Research, 27, No.9, 2467-2471.
Zhigljavsky, A. (1991). Theory of global random search. Dordrecht: Kluwer Academic Publishers, 341p.
Zielinski R. (1981). A stochastic estimate of the structure of multi-extremal problems. Mathematical programming, vol. 21, pp. 348-356.
The following functions were minimized using four algorithms.
Rosenbrok function (2-dimensional, in admissible domain
[-1.5,1.5][-0.5,1.5], with 1 minimum 0 at (1, 1)), see Rosenbrok 1960:
Hosaki function (2-dimensional, with two minima, a global one at (4, 2) and a local one at (1, 2))
Rastrigin function (2-dimensional, in admissible domain [-1,1][-1,1], with global minimum of 0.0 at (0, 0) and more than 50 local minima spread as a lattice around the global one), Rastrigin 1974, Dixon and Szego 1978:
This function appeared to be a hard test for optimization algortihms,
since it has multiple local minima with close values (see Fig. 4 and 5).
Ackley function (generalized variant, 30-dimensional version was used, in admissible domain [-30,30]30, with the global minimum 0 at (0,0)), see Schwefel 1993):
Six-hump camelback function (2-dimensional, in admissible domain [-1,2] × [-1,1], with three pairs of local minima, and global minimum 0 at (0.008983,-0.7126)), see Duan 1993:
Goldstein-Price function (2-dimensional, in admissible domain [-2,2]2, with four local minima and the global minimum 3.0 at a point (0,1), see Price 1983:
Shekel function (4-dimensional, in admissible domain [0,10]4, with the global minimum 0.0 at a point close to (4,4,4,4), and 10 local minima), see Dixon and Szego 1978, Price 1983, Duan et al. 1993:
Griewank function (10-dimensional, in admissible domain [-600,600]10, with several thousands of local minima and the global minimum 0 at the origin), see Griewank 1981, Duan 1993:
Discrepancy error of a conceptual rainfall-runoff hydrological
tank model (Sugawara, 1978) applied in study of a catchment
in Venezuela (8-dimensional).
The following plots show the comparative performance of four algorithms.
The last accurate point on the plot corresponding to ACCOL is produced
by the LS phase working in real-valued space.
Algorithms with the higher efficiency (less function evaluations) will
have the corresponding points closer to the ordinate axis; algorithms
with the higher effectiveness (providing function value closer to the
minimum) will have the corresponding points closer to the abscissa axis.
Among hydrologic models, the lumped conceptual rainfall-runoff (CRR) models, for example, are based upon the simpler algebraic and step-wise relations between various components of rainfall, precipitation-evaporation and the resulting runoff from the catchment. Such models are characterized by the following:
- physically sound structures and equations are used together with semi-empirical ones;
- all parameters and variables represent average values over the entire catchment;
- model parameters
cannot be assessed from field data but have to be obtained through calibration.
Often, such models represent the catchment as a system of inter-connected tanks with additional outlets; e.g., Sacramento (soil moisture accounting part of the National Weather Service River Forecast System, USA) with 17 parameters (see, e.g., Dung er al. 1992), NAM (part of MIKE-11 modelling package) with 11 parameters, and simpler version of Sugawara model (Sugawara 1978) with 8 parameters (see Fig.6; this particular version is used primarily for educational purposes). Inspite of the seeming simplicity of equations used in CRR models, they are, like hydrodynamic models, highly non-linear, and may contain discontinuities, resulting in discontinuous and non-differentiable functions used for calibration evaluation criteria, which are to be minimized (Dung et al. 1992).
Many authors argue that in practical calibration, apart from the minimization of the formal criteria for goodness of fit, there are some additional, "non-formal" conditions in determining goodness of fit, that must be taken into account:
(a) an agreement between average simulated and observed flows (i.e. good water balance)
(b) an agreement for the peak flows (volume, rate, timing) (c) an agreement for low peaks
(d) an overall agreement for hydrograph shape.

In our opinion, these conditions do not contradict the idea of automatic calibration at all - they can be also formalized and taken into account. It is true, that it is difficult to make these conditions part of an analytical formula, but in practice this is not necessary - these conditions are easily made part of the algorithm (program), that represents goodness of fit.
If a hydrologist has his or her own additional criteria used for determining the quality of the model, then an attemp could be made to include these criteria into the evaluation of the goodness of fit, or they can be applied in a non-formal manner - after the stage of automatic calibration.
Taking this all into account, we can specify certain assumptions concerning the properties of the calibration problem in a CRR model context:
1. The model is considered to be deterministic, so that there are no unobservable and/or stochastic processes influencing the results of the model run. It means that the results of a model run can be reproduced.
2. The model outputs and the observations are discretized in time and are represented as scalars - this eliminates the possible ambiguity of calibration of models with several outputs. For CRR models, runoff is considered the only output value.
3. The problem of calibration, or discrepancy minimization is considered as the optimization problem.
The performance of uncalibrated and calibrated models can be seen on Fig. 7 and 8 respectively.
D.P. Solomatine