The randomisation schemes discussed so far all base the quantification of
linear correlations on the Fourier amplitudes of the data. Unfortunately, this
is not exactly what we want. Remember that the autocorrelation structure given
by $C(\tau)$ corresponds to the Fourier amplitudes only if the time series is one
period of a sequence that repeats itself every N time steps. This is, however,
not what we believe to be the case. Neither is it compatible with the null
hypothesis. Conserving the Fourier amplitudes of the data means that the
periodic auto-covariance function
\[
C_p(\tau) = \frac{1}{N} \sum_{n=1}^{N} s_n \, s_{(n-\tau-1) \bmod N + 1}
\]
is reproduced, rather than $C(\tau)$. This seemingly harmless difference can
lead to serious artefacts in the surrogates, and, consequently, spurious
rejections in a test. In particular, any mismatch between the beginning and
the end of a time series poses problems, as discussed e.g. in
Ref. [7]. In spectral estimation, problems caused by edge
effects are dealt with by windowing and zero padding. None of these techniques
have been successfully implemented for the phase randomisation of surrogates
since they destroy the invertibility of the transform.
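The difference between the two estimators can be checked numerically. The following is a minimal sketch, assuming a zero-mean series, a normalisation $1/(N-\tau)$ for the non-periodic estimator, and a plain $O(N^2)$ DFT from the standard library (an FFT routine would be the usual choice for realistic lengths). The function names are hypothetical. It compares the non-periodic estimate, the periodic estimate, and the inverse transform of the squared Fourier amplitudes.

```python
import cmath
import math


def autocov(s, tau):
    """Non-periodic estimate: average of s[n]*s[n-tau] over the N-tau available pairs."""
    n = len(s)
    return sum(s[i] * s[i - tau] for i in range(tau, n)) / (n - tau)


def autocov_periodic(s, tau):
    """Periodic estimate: indices wrap around, as if s repeated every N steps."""
    n = len(s)
    return sum(s[i] * s[(i - tau) % n] for i in range(n)) / n


def autocov_from_amplitudes(s, tau):
    """Inverse DFT of the squared Fourier amplitudes |S_k|^2 (phases are discarded)."""
    n = len(s)
    power = []
    for k in range(n):
        S_k = sum(s[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        power.append(abs(S_k) ** 2)
    acc = sum(power[k] * cmath.exp(2j * math.pi * k * tau / n) for k in range(n))
    return acc.real / n ** 2


if __name__ == "__main__":
    # A zero-mean ramp: the simplest series with a large end point mismatch.
    ramp = [i - 7.5 for i in range(16)]
    for tau in (1, 2, 3):
        print(tau,
              autocov(ramp, tau),
              autocov_periodic(ramp, tau),
              autocov_from_amplitudes(ramp, tau))
```

The last two columns agree up to rounding, while the first differs: the Fourier amplitudes carry the periodic estimate, including the product of the last and the first sample across the wrap-around.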
Figure 6: Effect of end point mismatch on Fourier based surrogates. Upper trace: 1500 iterates of an almost unstable AR(2) process. Lower trace: a surrogate sequence with the same Fourier amplitudes. Observe the additional ``crinkliness'' of the surrogate.
Let us illustrate the artefact generated by an end point mismatch with an example. In order to generate an effect that is large enough to be detected visually, consider 1500 iterates of an almost unstable AR(2) process (upper trace of Fig. 6). The sequence is highly correlated and there is a rather big difference between the first and the last points. Upon periodic continuation, we see a jump between $s_{1500}$ and $s_1$. Such a jump has spectral power at all frequencies but with delicately tuned phases. In surrogate time series conserving the Fourier amplitudes, the phases are randomised and the spectral content of the jump is spread in time. In the surrogate sequence shown as the lower trace in Fig. 6, the additional spectral power is mainly visible as a high frequency component. It is quite clear that the difference between the data and such surrogates will easily be picked up by, say, a nonlinear predictor, and can lead to spurious rejections of the null hypothesis.
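The spreading of the jump can be made quantitative with a small experiment. The sketch below is an illustration under stated assumptions, not the procedure used for the figure: the AR(2) coefficients 1.9 and -0.9001 are chosen here only to place one root close to the unit circle, the length is reduced to 256 so that a plain $O(N^2)$ DFT stays fast, and all function names are hypothetical. It relies on the fact that the sum of squared increments, taken periodically (including the step from the last point back to the first), depends on the Fourier amplitudes only and is therefore the same for data and surrogate.

```python
import cmath
import math
import random


def dft(x, sign=-1):
    """Plain O(N^2) discrete Fourier transform (unnormalised)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(sign * 2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]


def phase_randomised_surrogate(s, rng):
    """Keep the Fourier amplitudes of s, draw new phases, transform back."""
    n = len(s)
    S = dft(s)
    R = list(S)  # k = 0 (and k = n/2 for even n) are left unchanged
    for k in range(1, (n + 1) // 2):
        phi = rng.uniform(0.0, 2.0 * math.pi)
        R[k] = abs(S[k]) * cmath.exp(1j * phi)
        R[n - k] = R[k].conjugate()  # Hermitian symmetry keeps the result real
    return [z.real / n for z in dft(R, sign=+1)]


def ar2(n, a1, a2, rng, burn_in=500):
    """Iterate x_t = a1*x_{t-1} + a2*x_{t-2} + Gaussian noise, dropping a transient."""
    x1 = x2 = 0.0
    out = []
    for i in range(n + burn_in):
        x = a1 * x1 + a2 * x2 + rng.gauss(0.0, 1.0)
        x2, x1 = x1, x
        if i >= burn_in:
            out.append(x)
    return out


def sum_sq_increments(x, periodic):
    """Sum of squared one-step differences, optionally including the wrap-around step."""
    total = sum((b - a) ** 2 for a, b in zip(x, x[1:]))
    if periodic:
        total += (x[0] - x[-1]) ** 2
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    s = ar2(256, 1.9, -0.9001, rng)  # assumed, illustrative coefficients
    surrogate = phase_randomised_surrogate(s, rng)
    print("squared jump at the wrap-around:", (s[0] - s[-1]) ** 2)
    print("data, open sum          :", sum_sq_increments(s, periodic=False))
    print("surrogate, open sum     :", sum_sq_increments(surrogate, periodic=False))
    print("data, periodic sum      :", sum_sq_increments(s, periodic=True))
    print("surrogate, periodic sum :", sum_sq_increments(surrogate, periodic=True))
```

The two periodic sums coincide up to rounding. In the data, a large part of that sum may sit in the single wrap-around step; in the surrogate it is distributed over all time steps, which is what appears as additional roughness whenever the squared jump is large compared to the typical squared increment.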
Figure 7: Repair of end point mismatch by selecting a sub-sequence of length 1350 of the signal shown in Fig. 6 that has an almost perfect match of end points. The surrogate shows no spurious high frequency structure.
The problem of non-matching ends can often be overcome by choosing a
sub-interval of the recording such that the end points do match as closely as
possible [33]. The possibly remaining finite phase slip at the
matching points usually is of lesser importance. It can become dominant,
though, if the signal is otherwise rather smooth. As a systematic strategy,
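let us propose to measure the end point mismatch by
\[
\gamma_{\rm jump} = \frac{(s_1 - s_N)^2}{\sum_{n=1}^{N} (s_n - \langle s \rangle)^2}
\]
and the mismatch in the first derivative by
\[
\gamma_{\rm slip} = \frac{\left[(s_2 - s_1) - (s_N - s_{N-1})\right]^2}{\sum_{n=1}^{N} (s_n - \langle s \rangle)^2} .
\]
The fractions $\gamma_{\rm jump}$ and $\gamma_{\rm slip}$
give the contributions to the total power of the series of the mismatch of the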
end points and the first derivatives, respectively. For the series shown in
Fig. 6, the end effect
dominates the high frequency end of the spectrum. By systematically going
through shorter and shorter sub-sequences of the data, we find that a segment
of 1350 points starting at sample 102 yields an almost perfect match. That sequence is shown as the
upper trace of Fig. 7, together with a surrogate (lower
trace). The spurious ``crinkliness'' is removed.
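The search for a well matched sub-sequence is easy to automate. The following minimal sketch assumes that both mismatch measures are normalised by the sum of squared deviations from the mean, and it combines them into a single cost with two weights; the weights, the exhaustive search order, and the function names are assumptions made for illustration.

```python
import math


def mismatch(s):
    """Return (jump, slip): end point and first-derivative mismatch as fractions of total power."""
    n = len(s)
    mean = sum(s) / n
    power = sum((x - mean) ** 2 for x in s)
    jump = (s[0] - s[-1]) ** 2 / power
    slip = ((s[1] - s[0]) - (s[-1] - s[-2])) ** 2 / power
    return jump, slip


def best_subsequence(s, min_length, weight_jump=1.0, weight_slip=1.0):
    """Try every segment of at least min_length points, longest first.

    Returns (cost, start, length) for the segment with the smallest
    weighted mismatch; start is a 0-based index into s.
    """
    n = len(s)
    best = None
    for length in range(n, min_length - 1, -1):
        for start in range(n - length + 1):
            jump, slip = mismatch(s[start:start + length])
            cost = weight_jump * jump + weight_slip * slip
            if best is None or cost < best[0]:
                best = (cost, start, length)
    return best


if __name__ == "__main__":
    # Two full periods of a sine, with one spoiling sample attached at each end.
    clean = [math.sin(2.0 * math.pi * i / 20.0) for i in range(41)]
    data = [5.0] + clean + [-3.0]
    cost, start, length = best_subsequence(data, min_length=41)
    print("start:", start, "length:", length, "cost:", cost)
```

Each candidate segment is re-evaluated from scratch, which is adequate for a few thousand points when only a modest fraction of the data may be discarded; running sums would speed this up for longer records. Note that `start` is counted from zero, whereas the sample numbers quoted in the text are counted from one.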
In practical situations, the matching of end points is a simple and mostly sufficient precaution that should not be neglected. Let us mention that the SOI data discussed before is rather well behaved with little end-to-end mismatch. Therefore we didn't have to worry about the periodicity artefact.
The only method that has been proposed so far that strictly implements the non-periodic autocorrelation $C(\tau)$ rather than its periodic counterpart $C_p(\tau)$ is given in Ref. [26] and will be discussed in detail in Sec. 5 below. The method is very accurate but also rather costly in terms of computer time. It should be used in cases of doubt and whenever a suitable sub-sequence cannot be found.