Periodicity artefacts


The randomisation schemes discussed so far all base the quantification of linear correlations on the Fourier amplitudes of the data. Unfortunately, this is not exactly what we want. Remember that the autocorrelation structure given by

\[ C(\tau) = \frac{1}{N-\tau} \sum_{n=\tau+1}^{N} s_n s_{n-\tau} \]

corresponds to the Fourier amplitudes only if the time series is one period of a sequence that repeats itself every $N$ time steps. This is, however, not what we believe to be the case. Neither is it compatible with the null hypothesis. Conserving the Fourier amplitudes of the data means that the periodic auto-covariance function

\[ C_p(\tau) = \frac{1}{N} \sum_{n=1}^{N} s_n s_{(n-\tau-1) \bmod N + 1} \]

is reproduced, rather than $C(\tau)$. This seemingly harmless difference can lead to serious artefacts in the surrogates and, consequently, to spurious rejections in a test. In particular, any mismatch between the beginning and the end of a time series poses problems, as discussed e.g. in Ref. [7]. In spectral estimation, problems caused by edge effects are dealt with by windowing and zero padding. Neither of these techniques has been successfully implemented for the phase randomisation of surrogates, since they destroy the invertibility of the transform.
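
The distinction between the two estimates can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the ordinary estimate averages the products $s_n s_{n-\tau}$ over the $N-\tau$ overlapping pairs, while the periodic one wraps the index modulo $N$. The function names `autocov`, `periodic_autocov` and `ft_surrogate` are hypothetical, and the random-walk test signal is only an illustration.

```python
import numpy as np

def autocov(s, tau):
    """Ordinary estimate: average of s[n] * s[n - tau] over the N - tau overlapping pairs."""
    s = np.asarray(s, dtype=float)
    n = s.size
    return float(np.dot(s[tau:], s[:n - tau]) / (n - tau))

def periodic_autocov(s, tau):
    """Periodic estimate: the index n - tau wraps around modulo N."""
    s = np.asarray(s, dtype=float)
    return float(np.dot(s, np.roll(s, tau)) / s.size)

def ft_surrogate(s, rng):
    """Phase-randomised surrogate that keeps the Fourier amplitudes of s."""
    s = np.asarray(s, dtype=float)
    n = s.size
    spec = np.fft.rfft(s)
    phases = rng.uniform(0.0, 2.0 * np.pi, spec.size)
    phases[0] = 0.0            # leave the zero-frequency (mean) term untouched
    if n % 2 == 0:
        phases[-1] = 0.0       # the Nyquist term must stay real for even N
    return np.fft.irfft(spec * np.exp(1j * phases), n)

# Illustrative signal (an assumption): a random walk with mismatched end points.
rng = np.random.default_rng(0)
data = np.cumsum(rng.standard_normal(256))
surr = ft_surrogate(data, rng)

lags = range(10)
cp_diff = max(abs(periodic_autocov(data, t) - periodic_autocov(surr, t)) for t in lags)
c_diff = max(abs(autocov(data, t) - autocov(surr, t)) for t in lags)
print("largest change in periodic estimate:", cp_diff)
print("largest change in ordinary estimate:", c_diff)
```

The periodic estimate is fixed by the Fourier amplitudes alone, so it should agree between data and surrogate up to rounding error, whereas the ordinary estimate is free to change.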

Figure 6: Effect of end point mismatch on Fourier based surrogates. Upper trace: 1500 iterates of an almost unstable AR(2) process. Lower trace: a surrogate sequence with the same Fourier amplitudes. Observe the additional ``crinkliness'' of the surrogate.

Let us illustrate the artefact generated by an end point mismatch with an example. In order to generate an effect that is large enough to be detected visually, consider 1500 iterates of an almost unstable AR(2) process (upper trace of Fig. 6). The sequence is highly correlated and there is a rather big difference between the first and the last points. Upon periodic continuation, we see a jump between the last point $s_N$ and the first point $s_1$. Such a jump has spectral power at all frequencies but with delicately tuned phases. In surrogate time series conserving the Fourier amplitudes, the phases are randomised and the spectral content of the jump is spread in time. In the surrogate sequence shown as the lower trace in Fig. 6, the additional spectral power is mainly visible as a high frequency component. It is quite clear that the difference between the data and such surrogates will easily be picked up by, say, a nonlinear predictor, and can lead to spurious rejections of the null hypothesis.
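
The spreading of the jump can be made quantitative with a short experiment. This is a sketch under stated assumptions: the AR(2) coefficients 1.3 and -0.31 are illustrative values chosen so that one root of the characteristic polynomial lies close to the unit circle, and the helper names (`ar2`, `ft_surrogate`, `interior_roughness`, `circular_roughness`) are hypothetical. The sum of squared increments taken around the periodic ring is fixed by the Fourier amplitudes, so whatever the data concentrates in the single wrap-around jump must reappear inside the surrogate.

```python
import numpy as np

def ar2(n, a1, a2, rng, burn_in=1000):
    """Iterate s[t] = a1*s[t-1] + a2*s[t-2] + noise[t] and drop the transient."""
    total = n + burn_in
    noise = rng.standard_normal(total)
    s = np.zeros(total)
    for t in range(2, total):
        s[t] = a1 * s[t - 1] + a2 * s[t - 2] + noise[t]
    return s[burn_in:]

def ft_surrogate(s, rng):
    """Phase-randomised surrogate that keeps the Fourier amplitudes of s."""
    s = np.asarray(s, dtype=float)
    n = s.size
    spec = np.fft.rfft(s)
    phases = rng.uniform(0.0, 2.0 * np.pi, spec.size)
    phases[0] = 0.0            # leave the zero-frequency (mean) term untouched
    if n % 2 == 0:
        phases[-1] = 0.0       # the Nyquist term must stay real for even N
    return np.fft.irfft(spec * np.exp(1j * phases), n)

def interior_roughness(s):
    """Sum of squared increments inside the record, without the wrap-around term."""
    return float(np.sum(np.diff(s) ** 2))

def circular_roughness(s):
    """Same sum, including the increment from the last point back to the first."""
    return interior_roughness(s) + float((s[0] - s[-1]) ** 2)

rng = np.random.default_rng(0)
data = ar2(1500, 1.3, -0.31, rng)   # assumed, illustrative coefficients
surr = ft_surrogate(data, rng)

print("wrap-around jump squared, data:", (data[0] - data[-1]) ** 2)
print("circular roughness, data / surrogate:",
      circular_roughness(data), circular_roughness(surr))
print("interior roughness, data / surrogate:",
      interior_roughness(data), interior_roughness(surr))
```

When the end points of the data differ strongly, the interior roughness of the surrogate is expected to exceed that of the data by roughly the squared jump, which is the extra high frequency content visible in the figure.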

Figure 7: Repair of end point mismatch by selecting a sub-sequence of length 1350 of the signal shown in Fig. 6 that has an almost perfect match of end points. The surrogate shows no spurious high frequency structure.

The problem of non-matching ends can often be overcome by choosing a sub-interval of the recording such that the end points match as closely as possible [33]. The finite phase slip that may remain at the matching points is usually of lesser importance. It can become dominant, though, if the signal is otherwise rather smooth. As a systematic strategy, we propose to measure the end point mismatch by

\[ \gamma_{\rm jump} = \frac{(s_1 - s_N)^2}{\sum_{n=1}^{N} (s_n - \langle s \rangle)^2} \]

and the mismatch in the first derivative by

\[ \gamma_{\rm slip} = \frac{\left[(s_2 - s_1) - (s_N - s_{N-1})\right]^2}{\sum_{n=1}^{N} (s_n - \langle s \rangle)^2} \]

The fractions $\gamma_{\rm jump}$ and $\gamma_{\rm slip}$ give the contributions to the total power of the series of the mismatch of the end points and the first derivatives, respectively. For the series shown in Fig. 6, the end effect dominates the high frequency end of the spectrum. By systematically going through shorter and shorter sub-sequences of the data, we find that a segment of 1350 points starting at sample 102 yields an almost perfect match. That sequence is shown as the upper trace of Fig. 7, together with a surrogate (lower trace). The spurious ``crinkliness'' is removed.
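
The search for a well matched sub-sequence can be sketched as follows. The sketch assumes that the end point mismatch is the squared difference between the first and last sample, and the derivative mismatch the squared difference between the first and last increment, each divided by the total power $\sum_n (s_n - \langle s \rangle)^2$ of the segment. Ranking candidates by the plain sum of the two fractions is an assumption made here for illustration, as are the names `end_mismatch` and `best_segment`.

```python
import numpy as np

def end_mismatch(s):
    """Return (jump, slip): end point and first-derivative mismatch as fractions of total power."""
    s = np.asarray(s, dtype=float)
    power = np.sum((s - s.mean()) ** 2)
    jump = (s[0] - s[-1]) ** 2 / power
    slip = ((s[1] - s[0]) - (s[-1] - s[-2])) ** 2 / power
    return float(jump), float(slip)

def best_segment(s, min_length):
    """Scan all segments of length >= min_length; return (start, length, jump, slip).

    `start` is a zero-based index. The cost jump + slip is an assumed ranking rule.
    """
    s = np.asarray(s, dtype=float)
    n = s.size
    if min_length < 3 or min_length > n:
        raise ValueError("min_length must lie between 3 and len(s)")
    # Prefix sums give the power of any segment in constant time.
    c1 = np.concatenate(([0.0], np.cumsum(s)))
    c2 = np.concatenate(([0.0], np.cumsum(s * s)))
    best = None
    for length in range(n, min_length - 1, -1):      # shorter and shorter segments
        starts = np.arange(n - length + 1)
        ends = starts + length - 1                   # index of the last sample
        seg_sum = c1[starts + length] - c1[starts]
        seg_sq = c2[starts + length] - c2[starts]
        power = seg_sq - seg_sum ** 2 / length
        jump = (s[starts] - s[ends]) ** 2 / power
        slip = ((s[starts + 1] - s[starts]) - (s[ends] - s[ends - 1])) ** 2 / power
        cost = jump + slip
        k = int(np.argmin(cost))
        if best is None or cost[k] < best[0]:
            best = (float(cost[k]), int(starts[k]), length, float(jump[k]), float(slip[k]))
    return best[1:]

# Illustrative data (an assumption): a random walk with poorly matched ends.
rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(1500))
print("full series (jump, slip):", end_mismatch(x))
print("best segment (start, length, jump, slip):", best_segment(x, 1200))
```

The segment returned by such a search would then be passed to the surrogate generator in place of the full recording. A constant segment has zero power and would need separate handling.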

In practical situations, the matching of end points is a simple and mostly sufficient precaution that should not be neglected. Let us mention that the SOI data discussed before are rather well behaved, with little end-to-end mismatch. Therefore we did not have to worry about the periodicity artefact.

The only method proposed so far that strictly implements the ordinary, non-periodic autocorrelation rather than its periodic counterpart is given in Ref. [26] and will be discussed in detail in Sec. 5 below. The method is very accurate but also rather costly in terms of computer time. It should be used in cases of doubt and whenever a suitable sub-sequence cannot be found.


Thomas Schreiber
Mon Aug 30 17:31:48 CEST 1999