6.2.5. correlate¶

Correlation functions for multi-channel cross-correlation of seismic data.

Various routines used mostly for testing, including links to a compiled routine using FFTW, a Numpy fft routine which uses bottleneck for normalisation and a compiled time-domain routine. These have varying levels of efficiency, both in terms of overall speed, and in memory usage. The time-domain is the most memory efficient but slowest routine (although fast for small cases of less than a few hundred correlations), the Numpy routine is fast, but memory inefficient due to a need to store large double-precision arrays for normalisation. The fftw compiled routine is faster and more memory efficient than the Numpy routine.

copyright:: EQcorrscan developers.
license:: GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html)

6.2.5.1. Classes & Functions¶

`fftw_multi_normxcorr`	Use a C loop rather than a Python loop - in some cases this will be fast.
`fftw_normxcorr`	Normalised cross-correlation using the fftw library.
`numpy_normxcorr`	Compute the normalized cross-correlation using numpy and bottleneck.
`time_multi_normxcorr`	Compute cross-correlations in the time-domain using C routine.
`get_array_xcorr`	Get an normalized cross correlation function that takes arrays as inputs.
`get_stream_xcorr`	Return a function for performing normalized cross correlation on lists of streams.
`register_array_xcorr`	Decorator for registering correlation functions.

All functions within this module (and therefore all correlation functions used in EQcorrscan) are normalised cross-correlations, which follow the definition that Matlab uses for normxcorr2.

In the time-domain this is as follows: \[c(t) = \frac{\sum_{y}(a_{y} - \overline{a})(b_{t+y} - \overline{b_t})}{\sqrt{\sum_{y}(a_{y} - \overline{a})(a_{y} - \overline{a})\sum_{y}(b_{t+y} - \overline{b_t})(b_{t+y} - \overline{b_t})}}\]

where \(c(t)\) is the normalised cross-correlation at at time \(t\), \(a\) is the template window which has length \(y\), \(b\) is the continuous data, and \(\overline{a}\) is the mean of the template and \(\overline{b_t}\) is the local mean of the continuous data.

In practice the correlations only remove the mean of the template once, but we do remove the mean from the continuous data at every time-step. Without doing this (just removing the global mean), correlations are affected by jumps in average amplitude, which is most noticeable during aftershock sequences, or when using short templates.

In the frequency domain (functions eqcorrscan.utils.correlate.numpy_normxcorr(), eqcorrscan.utils.correlate.fftw_normxcorr(), fftw_multi_normxcorr()), correlations are computed using an equivalent method. All methods are tested against one-another to check for internal consistency, and checked against results computed using Matlab’s normxcorr2 function. These routines also give the same (within 0.00001) of openCV cross-correlations.

6.2.5.2. Selecting a correlation function¶

EQcorrscan strives to use sensible default algorithms for calculating correlation values, however, you may want to change how correlations are calculated to be more advantageous to your specific needs.

There are currently 3 different correlations functions currently included in EQcorrscan:

eqcorrscan.utils.correlate.numpy_normxcorr() known as “numpy”

eqcorrscan.utils.correlate.time_multi_normxcorr() known as “time_domain”

eqcorrscan.utils.correlate.fftw_normxcorr() known as “fftw”

Number 3 is the default.

A further time-domain correlation backend is available, see note below on using Fast Matched Filter within EQcorrscan.

6.2.5.3. Setting FFT length¶

For version 0.4.0 onwards, the “fftw” backend allows the user to pass an fft_len keyword to it. This will set the length of the FFT internally. Smaller FFT’s use less memory, but can be faster. Larger FFTs use more memory and can be slower. By default this is set to the minimum of 2 ** 13, or the next fast length of the sum of the template length and the stream length. #285 showed that 2 ** 13 was consistently fastest over a range of data shapes on an intel i7 with 8-threads. Powers of two are generally fastest.

6.2.5.4. Using Fast Matched Filter within EQcorrscan¶

Fast Matched Filter provides fast time-domain correlations for both CPU and GPU architectures. For massively multi-threaded environment this runs faster than the frequency-domain routines native to EQcorrscan (when more than 40 CPU cores, or an NVIDIA GPU card is available) with less memory consumption. Fast Matched Filter is now supported natively, and can be used as a backend by setting xcorr_func=”fmf”.

6.2.5.5. Switching which correlation function is used¶

You can switch between correlation functions using the xcorr_func parameter included in: - eqcorrscan.core.match_filter.match_filter(), - eqcorrscan.core.match_filter.Tribe.detect(), - eqcorrscan.core.match_filter.Template.detect()

by:

passing a string (eg “numpy”, “time_domain”, or “fftw”) or;
passing a function

for example:

>>> import obspy
>>> import numpy as np
>>> from eqcorrscan.utils.correlate import numpy_normxcorr, set_xcorr
>>> from eqcorrscan.core.match_filter import match_filter


>>> # generate some toy templates and stream
>>> random = np.random.RandomState(42)
>>> template = obspy.read()
>>> stream = obspy.read()
>>> for num, tr in enumerate(stream):  # iter stream and embed templates
...     data = tr.data
...     tr.data = random.randn(6000) * 5
...     tr.data[100: 100 + len(data)] = data

>>> # do correlation using numpy rather than fftw
>>> detections = match_filter(['1'], [template], stream, .5, 'absolute',
...                           1, False, xcorr_func='numpy')

>>> # do correlation using a custom function
>>> def custom_normxcorr(templates, stream, pads, *args, **kwargs):
...     # Just to keep example short call other xcorr function
...     return numpy_normxcorr(templates, stream, pads, *args, **kwargs)

>>> detections = match_filter(
...     ['1'], [template], stream, .5, 'absolute', 1, False,
...     xcorr_func=custom_normxcorr) 

You can also use the set_xcorr object (eqcorrscan.utils.correlate.set_xcorr) to change which correlation function is used. This can be done permanently or within the scope of a context manager:

>>> # change the default xcorr function for all code in the with block
>>> with set_xcorr(custom_normxcorr):
...     detections = match_filter(['1'], [template], stream, .5,
...                               'absolute', 1, False) 

>>> # permanently set the xcorr function (until the python kernel restarts)
>>> set_xcorr(custom_normxcorr)
default changed to custom_normxcorr
>>> detections = match_filter(['1'], [template], stream, .5, 'absolute',
...                           1, False) 
>>> set_xcorr.revert()  # change it back to the previous state

6.2.5.6. Notes on accuracy¶

To cope with floating-point rounding errors, correlations may not be calculated for data with low variance. If you see a warning saying: “Some correlations not computed, are there zeros in the data? If not consider increasing gain”, check whether your data have zeros, and if not, but have low, but real amplitudes, multiply your data by some value.

6.2.5.7. Warning¶

If data are padded with zeros prior to filtering then correlations may be incorrectly calculated where there are zeros. You should always pad after filtering. If you see warnings saying “Low variance, possible zeros, check result”, you should check that you have padded correctly and check that correlations haven’t been calculated when you don’t expect them.