correlate
---------
.. currentmodule:: eqcorrscan.utils.correlate
.. automodule:: eqcorrscan.utils.correlate
.. comment to end block
Classes & Functions
-------------------
.. autosummary::
:toctree: autogen
:nosignatures:
fftw_multi_normxcorr
fftw_normxcorr
numpy_normxcorr
time_multi_normxcorr
get_array_xcorr
get_stream_xcorr
register_array_xcorr
.. comment to end block
All functions within this module (and therefore all correlation functions used
in EQcorrscan) are normalised cross-correlations, which follow the definition
that Matlab uses for |normxcorr2|.
In the time-domain this is as follows
.. math::
c(t) = \frac{\sum_{y}(a_{y} - \overline{a})(b_{t+y} - \overline{b_t})}{\sqrt{\sum_{y}(a_{y} - \overline{a})(a_{y} - \overline{a})\sum_{y}(b_{t+y} - \overline{b_t})(b_{t+y} - \overline{b_t})}}
where :math:`c(t)` is the normalised cross-correlation at at time :math:`t`,
:math:`a` is the template window which has length :math:`y`, :math:`b` is the
continuous data, and :math:`\overline{a}` is the mean of the template and
:math:`\overline{b_t}` is the local mean of the continuous data.
In practice the correlations only remove the mean of the template once, but we do
remove the mean from the continuous data at every time-step. Without doing this
(just removing the global mean), correlations are affected by jumps in average
amplitude, which is most noticeable during aftershock sequences, or when using
short templates.
In the frequency domain (functions :func:`eqcorrscan.utils.correlate.numpy_normxcorr`,
:func:`eqcorrscan.utils.correlate.fftw_normxcorr`, :func:`fftw_multi_normxcorr`),
correlations are computed using an equivalent method. All methods are tested
against one-another to check for internal consistency, and checked against results
computed using Matlab's |normxcorr2| function. These routines also give the same
(within 0.00001) of openCV cross-correlations.
.. |normxcorr2| raw:: html
normxcorr2
Selecting a correlation function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
EQcorrscan strives to use sensible default algorithms for calculating
correlation values, however, you may want to change how correlations are
calculated to be more advantageous to your specific needs.
There are currently 3 different correlations functions currently included in EQcorrscan:
1. :func:`eqcorrscan.utils.correlate.numpy_normxcorr` known as "numpy"
2. :func:`eqcorrscan.utils.correlate.time_multi_normxcorr` known as "time_domain"
3. :func:`eqcorrscan.utils.correlate.fftw_normxcorr` known as "fftw"
Number 3 is the default.
A further time-domain correlation backend is available, see note below on using
Fast Matched Filter within EQcorrscan.
Setting FFT length
~~~~~~~~~~~~~~~~~~
For version 0.4.0 onwards, the "fftw" backend allows the user to pass an `fft_len`
keyword to it. This will set the length of the FFT internally. Smaller FFT's use
less memory, but can be faster. Larger FFTs use more memory and can be slower.
By default this is set to the minimum of 2 ** 13, or the next fast length of the
sum of the template length and the stream length. |#285| showed that 2 ** 13 was
consistently fastest over a range of data shapes on an intel i7 with 8-threads.
Powers of two are generally fastest.
.. |#285| raw:: html
#285
Using Fast Matched Filter within EQcorrscan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|FMF| provides fast time-domain correlations for both CPU and GPU
architectures. For massively multi-threaded environment this runs faster than
the frequency-domain routines native to EQcorrscan (when more than 40 CPU cores,
or an NVIDIA GPU card is available) with less memory consumption. |FMF| is now
supported natively, and can be used as a backend by setting `xcorr_func="fmf"`.
.. |FMF| raw:: html
Fast Matched Filter
Switching which correlation function is used
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can switch between correlation functions using the `xcorr_func` parameter
included in:
- :func:`eqcorrscan.core.match_filter.match_filter`,
- :meth:`eqcorrscan.core.match_filter.Tribe.detect`,
- :meth:`eqcorrscan.core.match_filter.Template.detect`
by:
1. passing a string (eg "numpy", "time_domain", or "fftw") or;
2. passing a function
for example:
.. code-block:: python
>>> import obspy
>>> import numpy as np
>>> from eqcorrscan.utils.correlate import numpy_normxcorr, set_xcorr
>>> from eqcorrscan.core.match_filter import match_filter
>>> # generate some toy templates and stream
>>> random = np.random.RandomState(42)
>>> template = obspy.read()
>>> stream = obspy.read()
>>> for num, tr in enumerate(stream): # iter stream and embed templates
... data = tr.data
... tr.data = random.randn(6000) * 5
... tr.data[100: 100 + len(data)] = data
>>> # do correlation using numpy rather than fftw
>>> detections = match_filter(['1'], [template], stream, .5, 'absolute',
... 1, False, xcorr_func='numpy')
>>> # do correlation using a custom function
>>> def custom_normxcorr(templates, stream, pads, *args, **kwargs):
... # Just to keep example short call other xcorr function
... return numpy_normxcorr(templates, stream, pads, *args, **kwargs)
>>> detections = match_filter(
... ['1'], [template], stream, .5, 'absolute', 1, False,
... xcorr_func=custom_normxcorr) # doctest:+ELLIPSIS
You can also use the set_xcorr object (eqcorrscan.utils.correlate.set_xcorr)
to change which correlation function is used. This can be done permanently
or within the scope of a context manager:
.. code-block:: python
>>> # change the default xcorr function for all code in the with block
>>> with set_xcorr(custom_normxcorr):
... detections = match_filter(['1'], [template], stream, .5,
... 'absolute', 1, False) # doctest:+ELLIPSIS
>>> # permanently set the xcorr function (until the python kernel restarts)
>>> set_xcorr(custom_normxcorr)
default changed to custom_normxcorr
>>> detections = match_filter(['1'], [template], stream, .5, 'absolute',
... 1, False) # doctest:+ELLIPSIS
>>> set_xcorr.revert() # change it back to the previous state
Notes on accuracy
~~~~~~~~~~~~~~~~~
To cope with floating-point rounding errors, correlations may not be
calculated for data with low variance. If you see a warning saying:
*"Some correlations not computed, are there zeros in the data? If not
consider increasing gain"*, check whether your data have zeros, and if not,
but have low, but real amplitudes, multiply your data by some value.
Warning
~~~~~~~
If data are padded with zeros prior to filtering then correlations may be
incorrectly calculated where there are zeros. You should always pad after
filtering. If you see warnings saying *"Low variance, possible zeros, check
result"*, you should check that you have padded correctly and check that
correlations haven't been calculated when you don't expect them.