As we have discussed previously, we will often have to specify a null
hypothesis in terms of a complete set of observable properties of the data.
Only in specific cases (e.g. the two-point autocorrelation function) is there
a one-to-one correspondence to a class of models (here, the ARMA process). In
any case, if $\{\bar{s}_n\}$ denotes a surrogate time series, the
constraints will most often be of (or can be brought into) the form
$$F_i(\{\bar{s}_n\}) = 0, \quad i = 1,\ldots,I. \eqno(21)$$
Such constraints can always be turned into a cost function
$$E(\{\bar{s}_n\}) = \left( \sum_{i=1}^{I} \left| w_i\, F_i(\{\bar{s}_n\}) \right|^{q} \right)^{1/q}. \eqno(22)$$
The fact that $E(\{\bar{s}_n\})$ has a global minimum when the constraints
are fulfilled is unaffected by the choice of the weights $w_i \neq 0$ and the
order $q$ of the average. The least squares or $L^2$ average is obtained at
$q=2$, the $L^1$ average at $q=1$, and the maximum distance when $q \to \infty$.
Geometric averaging is also possible (and can be formally obtained by taking the limit
$q \to 0$ in a proper way). We have experimented with different choices of $q$
but have not found a choice that is uniformly superior to the others. It seems
plausible either to give uniform weights or to enhance those constraints which
are particularly difficult to fulfil. Again, conclusive empirical results are
still lacking.
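As an illustration, the cost function (22) can be sketched in a few lines of Python; this is our own minimal implementation, not code from the paper, and the function name `cost` and the use of `np.inf` to select the maximum-distance case are illustrative choices:

```python
import numpy as np

def cost(discrepancies, weights=None, q=2):
    """Cost E = (sum_i |w_i F_i|^q)^(1/q) for constraint discrepancies F_i.

    q=2 gives the least-squares (L2) average, q=1 the L1 average,
    and q=np.inf the maximum distance over all constraints.
    """
    F = np.asarray(discrepancies, dtype=float)
    w = np.ones_like(F) if weights is None else np.asarray(weights, dtype=float)
    terms = np.abs(w * F)
    if np.isinf(q):
        return terms.max()
    return (terms ** q).sum() ** (1.0 / q)
```

Whatever the choice of `q` and `weights`, `cost` returns zero exactly when all discrepancies vanish, which is the property that makes the minimisation strategy below well defined.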
Consider as an example the constraint that the sample autocorrelations
of the surrogate, $\bar{C}(\tau) = \langle \bar{s}_n \bar{s}_{n-\tau} \rangle$
(data rescaled to zero mean and unit variance), are the same as those of the
data, $C(\tau) = \langle s_n s_{n-\tau} \rangle$. This is done by specifying zero
discrepancy as a constraint: $F_\tau = \bar{C}(\tau) - C(\tau) = 0$ for
$\tau = 1,\ldots,\tau_{\max}$. If
the correlations decay fast, $\tau_{\max}$ can be restricted;
otherwise, $\tau_{\max} = N-1$ (the largest available lag). Thus, a
possible cost function could read
$$E = \max_{\tau=1}^{\tau_{\max}} \left| \bar{C}(\tau) - C(\tau) \right|.$$
Other choices of $q$ and the weights are of course also possible.
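In Python, this maximum-deviation cost over sample autocorrelations might be sketched as follows; the helper names `autocorr` and `autocorr_cost` are our own, and the estimator (mean of lagged products after rescaling) is one common convention among several:

```python
import numpy as np

def autocorr(x, tau_max):
    """Sample autocorrelation C(tau) for tau = 1..tau_max,
    after rescaling x to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    n = len(z)
    return np.array([np.mean(z[tau:] * z[:n - tau])
                     for tau in range(1, tau_max + 1)])

def autocorr_cost(surrogate, data, tau_max):
    """E = max over tau of |C_bar(tau) - C(tau)|, i.e. the q -> infinity cost."""
    return np.abs(autocorr(surrogate, tau_max) - autocorr(data, tau_max)).max()
```

By construction the cost vanishes when the surrogate reproduces the data's autocorrelations exactly, and a random shuffle of the data will in general incur a positive cost.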
In all the cases considered in this paper, one constraint will be that the surrogates take on the same values as the data, but in a different time order. This ensures that data and surrogates are equally likely to be drawn from the same (unknown) single-time probability distribution. This particular constraint is not included in the cost function but is fulfilled identically by considering only permutations of the data (sampling without replacement) in the minimisation.
By introducing a cost function, we have turned a difficult nonlinear, high dimensional root finding problem (21) into a minimisation problem (22). In general, this leads to a great number of false minima, which is why such a strategy is discouraged for root finding problems [42]. Here, the situation is somewhat different, since we need to solve Eq. (21) only over the set of all permutations of $\{s_n\}$. Although this set is large, it is still discrete, and powerful combinatorial minimisation algorithms are available that can deal with false minima very well. We choose to minimise $E(\{\bar{s}_n\})$ among all permutations $\{\bar{s}_n\}$ of the original time series $\{s_n\}$ using the method of simulated annealing. Configurations are updated by exchanging pairs in $\{\bar{s}_n\}$. The annealing scheme decides which changes to accept and which to reject. With an appropriate cooling scheme, the annealing procedure can reach any desired accuracy. Apart from simulated annealing, genetic algorithms [35] have become very popular for this kind of problem, and there is no reason why they could not be used for the present purpose as well.
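The pair-exchange annealing loop can be sketched as follows. This is an illustrative implementation, not the paper's: the geometric cooling schedule (`t0`, `cooling`), the fixed step budget, and the choice to return the best configuration seen are our own simplifications; any cost function over permutations, such as the autocorrelation cost above, can be plugged in.

```python
import numpy as np

def anneal_surrogate(data, cost, n_steps=20000, t0=1.0, cooling=0.9995, seed=0):
    """Minimise `cost` over permutations of `data` by simulated annealing.

    Configurations are updated by exchanging random pairs of values, so the
    surrogate always takes on exactly the same values as the data, only in a
    different time order.
    """
    rng = np.random.default_rng(seed)
    s = rng.permutation(np.asarray(data, dtype=float))  # random initial permutation
    e = cost(s)
    best_s, best_e = s.copy(), e
    temp = t0
    for _ in range(n_steps):
        i, j = rng.integers(len(s), size=2)
        s[i], s[j] = s[j], s[i]          # propose a pair exchange
        e_new = cost(s)
        # Metropolis rule: always accept improvements, sometimes worsenings.
        if e_new <= e or rng.random() < np.exp((e - e_new) / temp):
            e = e_new
            if e < best_e:
                best_s, best_e = s.copy(), e
        else:
            s[i], s[j] = s[j], s[i]      # reject: undo the swap
        temp *= cooling                  # geometric cooling (illustrative)
    return best_s, best_e
```

Because updates are pure pair exchanges of an initial permutation, the value-preservation constraint of the previous paragraph is fulfilled identically at every step rather than being part of the cost.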