core.match_filter.tribe

Functions for network matched-filter detection of seismic data.

Designed to cross-correlate templates generated by the template_gen functions with continuous data and output the detections.

copyright:

EQcorrscan developers.

license:

GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html)


class eqcorrscan.core.match_filter.tribe.Tribe(templates=None)[source]

Holder for multiple templates.

Parameters:

templates (List of Template) – The templates within the Tribe.
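
A Tribe behaves as a container of Template objects. The following is a minimal sketch; the length and iteration behaviours shown are assumed container conveniences of Tribe, and the templates attribute is the underlying list documented above:

from eqcorrscan.core.match_filter import Tribe, Template

tribe = Tribe(templates=[Template(name='a'), Template(name='b')])
print(len(tribe))                            # number of templates held, here 2
tribe.templates.append(Template(name='c'))   # .templates is a plain list of Template objects
print([t.name for t in tribe])               # iterating a Tribe yields its Template objects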

Methods

client_detect(client, starttime, endtime, ...)

Detect using a Tribe of templates within a continuous stream.

cluster(method, **kwargs)

Cluster the tribe.

construct(method, lowcut, highcut, ...[, ...])

Generate a Tribe of Templates.

copy()

Copy the Tribe.

detect(stream, threshold, threshold_type, ...)

Detect using a Tribe of templates within a continuous stream.

read(filename)

Read a tribe of templates from a tar formatted file.

remove(template)

Remove a template from the tribe.

select(template_name)

Select a particular template from the tribe.

sort()

Sort the tribe, sorts by template name.

write(filename[, compress, catalog_format])

Write the tribe to a file using tar archive formatting.

__init__(templates=None)[source]
client_detect(client, starttime, endtime, threshold, threshold_type, trig_int, plot=False, plotdir=None, min_gap=None, daylong=False, parallel_process=True, xcorr_func=None, concurrency=None, cores=None, concurrent_processing=False, ignore_length=False, ignore_bad_data=False, group_size=None, return_stream=False, full_peaks=False, save_progress=False, process_cores=None, retries=3, check_processing=True, **kwargs)[source]

Detect using a Tribe of templates within a continuous stream.

Parameters:
  • client (obspy.clients.*.Client) – Any obspy client (or client-like object) with a dataselect service.

  • starttime (obspy.core.UTCDateTime) – Start-time for detections.

  • endtime (obspy.core.UTCDateTime) – End-time for detections.

  • threshold (float) – Threshold level, if using threshold_type=’MAD’ then this will be the multiple of the median absolute deviation.

  • threshold_type (str) – The type of threshold to be used, can be MAD, absolute or av_chan_corr. See Note on thresholding below.

  • trig_int (float) – Minimum gap between detections from one template in seconds. If multiple detections occur within trig_int of one-another, the one with the highest cross-correlation sum will be selected.

  • plot (bool) – Turn plotting on or off.

  • plotdir (str) – The path to save plots to. If plotdir=None (default) then the figure will be shown on screen.

  • min_gap (float) – Minimum gap allowed in data - use to remove traces with known issues

  • daylong (bool) – Set to True to use the eqcorrscan.utils.pre_processing.dayproc() routine, which performs additional checks and is more efficient than other methods for day-long data.

  • parallel_process (bool) –

  • xcorr_func (str or callable) – A str of a registered xcorr function or a callable for implementing a custom xcorr function. For more information see: eqcorrscan.utils.correlate.register_array_xcorr()

  • concurrency (str) – The type of concurrency to apply to the xcorr function. Options are ‘multithread’, ‘multiprocess’, ‘concurrent’. For more details see eqcorrscan.utils.correlate.get_stream_xcorr()

  • cores (int) – Number of workers for processing and detection.

  • concurrent_processing (bool) – Whether to process steps in detection workflow concurrently or not. See https://github.com/eqcorrscan/EQcorrscan/pull/544 for benchmarking.

  • ignore_length (bool) – If using daylong=True, dayproc will check that data are present for at least 80% of the day and will raise an error if too much data are missing. Set ignore_length=True to skip this check. This is not recommended!

  • ignore_bad_data (bool) – If False (default), errors will be raised if data are excessively gappy or are mostly zeros. If True then no error will be raised, but an empty trace will be returned (and not used in detection).

  • group_size (int) – Maximum number of templates to run at once; use to reduce memory consumption. If unset, all templates will be run together.

  • full_peaks (bool) – See eqcorrscan.utils.findpeaks.find_peaks2_short

  • save_progress (bool) – Whether to save the resulting party at every data step or not. Useful for long-running processes.

  • process_cores (int) – Number of processes to use for pre-processing (if different to cores).

  • return_stream (bool) – Whether to also output the stream downloaded, useful if you plan to use the stream for something else, e.g. lag_calc.

  • retries (int) – Number of attempts allowed for downloading - allows for transient server issues.

Returns:

eqcorrscan.core.match_filter.Party of Families of detections.

Note

When using the “fftw” correlation backend the length of the fft can be set. See eqcorrscan.utils.correlate for more info.

Note

Ensures that data overlap between loops, which will lead to no missed detections at data start-stop points (see note for eqcorrscan.core.match_filter.Tribe.detect() method). This will result in end-time not being strictly honoured, so detections may occur after the end-time set. This is because data must be run in the correct process-length.

Warning

Plotting within the match-filter routine uses the Agg backend with interactive plotting turned off, because the function is designed to work in bulk. If you wish to turn interactive plotting on you must import matplotlib in your script first; when you then import match_filter you will get a warning that the call to matplotlib has no effect, which means that match_filter has not changed the plotting behaviour.
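
Based on the warning above, a sketch of preserving interactive plotting is given below; the backend name is only an example of an interactive backend available on some systems:

import matplotlib
matplotlib.use("TkAgg")   # any interactive backend available to you
# Import EQcorrscan only after matplotlib: the Agg call inside match_filter is
# then reported as having no effect and your backend choice is kept.
from eqcorrscan.core.match_filter import Tribe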

Note

Thresholding:

MAD threshold is calculated as the:

\[threshold {\times} (median(abs(cccsum)))\]

where \(cccsum\) is the cross-correlation sum for a given template.

absolute threshold is a true absolute threshold based on the cccsum value.

av_chan_corr is based on the mean values of single-channel cross-correlations assuming all data are present as required for the template, e.g:

\[av\_chan\_corr\_thresh=threshold \times (cccsum / len(template))\]

where \(template\) is a single template from the input and the length is the number of channels within this template.
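
An illustrative sketch of a client_detect call follows; the client name, time-span and threshold values are placeholders, not recommendations, and the tribe archive is assumed to have been written previously with Tribe.write():

from obspy import UTCDateTime
from obspy.clients.fdsn import Client
from eqcorrscan.core.match_filter import Tribe

tribe = Tribe().read("test_tribe.tgz")     # tribe saved earlier with Tribe.write
client = Client("GEONET")                  # any client with a dataselect service
party = tribe.client_detect(
    client=client,
    starttime=UTCDateTime(2016, 1, 1),     # placeholder time-span
    endtime=UTCDateTime(2016, 1, 2),
    threshold=8.0, threshold_type="MAD",   # 8 x median absolute deviation of cccsum
    trig_int=6.0,                          # at least 6 s between detections per template
    return_stream=False)
print(party)                               # Party of Families of detections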

cluster(method, **kwargs)[source]

Cluster the tribe.

Cluster templates within a tribe: returns multiple tribes each of which could be stacked.

Parameters:

method (str) – Method of stacking, see eqcorrscan.utils.clustering

Returns:

List of tribes.

construct(method, lowcut, highcut, samp_rate, filt_order, length, prepick, swin='all', process_len=86400, all_horiz=False, delayed=True, plot=False, plotdir=None, min_snr=None, parallel=False, num_cores=False, skip_short_chans=False, save_progress=False, **kwargs)[source]

Generate a Tribe of Templates.

Parameters:
  • method (str) – Method of Tribe generation. Possible options are: from_client, from_meta_file. See below on the additional required arguments for each method.

  • lowcut (float) – Low cut (Hz), if set to None will not apply a lowcut

  • highcut (float) – High cut (Hz), if set to None will not apply a highcut.

  • samp_rate (float) – New sampling rate in Hz.

  • filt_order (int) – Filter level (number of corners).

  • length (float) – Length of template waveform in seconds.

  • prepick (float) – Pre-pick time in seconds

  • swin (str) – P, S, P_all, S_all or all, defaults to all: see note in eqcorrscan.core.template_gen.template_gen()

  • process_len (int) – Length of data in seconds to download and process.

  • all_horiz (bool) – To use both horizontal channels even if there is only a pick on one of them. Defaults to False.

  • delayed (bool) – If True, each channel will begin relative to its own pick-time; if False, each channel will begin at the same time.

  • plot (bool) – Plot templates or not.

  • plotdir (str) – The path to save plots to. If plotdir=None (default) then the figure will be shown on screen.

  • min_snr (float) – Minimum signal-to-noise ratio for a channel to be included in the template, where signal-to-noise ratio is calculated as the ratio of the maximum amplitude in the template window to the rms amplitude in the whole window given.

  • parallel (bool) – Whether to process data in parallel or not.

  • num_cores (int) – Number of cores to try to use; if False and parallel=True, will use either all available cores or as many cores as there are traces in the data (whichever is smaller).

  • save_progress (bool) – Whether to save the resulting template set at every data step or not. Useful for long-running processes.

  • skip_short_chans (bool) – Whether to ignore channels with insufficient data length. Useful when data quality is not known, e.g. when downloading old, possibly triggered data from a datacentre.

Note

Method specific arguments:

  • from_client requires:
    param str client_id:

    string passable by obspy to generate Client, or any object with a get_waveforms method, including a Client instance.

    param obspy.core.event.Catalog catalog:

    Catalog of events to generate template for

    param float data_pad:

    Pad length for data-downloads in seconds

  • from_meta_file requires:
    param str meta_file:

    Path to obspy-readable event file, or an obspy Catalog

    param obspy.core.stream.Stream st:

    Stream containing waveform data for the templates. Note that this should be the same length of stream as you will use for continuous detection; e.g. if you detect in day-long files, give this a day-long file!

    param bool process:

    Whether to process the data or not, defaults to True.

Note

The from_sac method is not supported by Tribe.construct; use Template.construct instead.

Note

Templates will be named according to their start-time.
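
As a sketch of the from_client pathway, the call below uses placeholder values throughout: the client, catalog query and filter band are illustrative only, and any obspy Catalog containing picks could be used instead:

from obspy import UTCDateTime
from obspy.clients.fdsn import Client
from eqcorrscan.core.match_filter import Tribe

client = Client("GEONET")                       # placeholder client
catalog = client.get_events(                    # placeholder catalog query
    starttime=UTCDateTime(2016, 1, 1), endtime=UTCDateTime(2016, 1, 2),
    minmagnitude=4.0)

tribe = Tribe().construct(
    method="from_client", client_id="GEONET", catalog=catalog, data_pad=20,
    lowcut=2.0, highcut=9.0, samp_rate=20.0, filt_order=4,   # placeholder band
    length=6.0, prepick=0.2, swin="all", process_len=3600,
    all_horiz=True, min_snr=4.0, parallel=False)
print(tribe)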

copy()[source]

Copy the Tribe.

Example

>>> tribe_a = Tribe(templates=[Template(name='a')])
>>> tribe_b = tribe_a.copy()
>>> tribe_a == tribe_b
True
detect(stream, threshold, threshold_type, trig_int, plot=False, plotdir=None, daylong=False, parallel_process=True, xcorr_func=None, concurrency=None, cores=None, concurrent_processing=False, ignore_length=False, ignore_bad_data=False, group_size=None, overlap='calculate', full_peaks=False, save_progress=False, process_cores=None, pre_processed=False, check_processing=True, **kwargs)[source]

Detect using a Tribe of templates within a continuous stream.

Parameters:
  • stream (Queue or obspy.core.stream.Stream) – Queue of streams of continuous data to detect within using the Templates, or just the continuous data itself.

  • threshold (float) – Threshold level, if using threshold_type=’MAD’ then this will be the multiple of the median absolute deviation.

  • threshold_type (str) – The type of threshold to be used, can be MAD, absolute or av_chan_corr. See Note on thresholding below.

  • trig_int (float) – Minimum gap between detections from one template in seconds. If multiple detections occur within trig_int of one-another, the one with the highest cross-correlation sum will be selected.

  • plot (bool) – Turn plotting on or off.

  • plotdir (str) – The path to save plots to. If plotdir=None (default) then the figure will be shown on screen.

  • daylong (bool) – Set to True to use the eqcorrscan.utils.pre_processing.dayproc() routine, which performs additional checks and is more efficient than other methods for day-long data.

  • parallel_process (bool) –

  • xcorr_func (str or callable) – A str of a registered xcorr function or a callable for implementing a custom xcorr function. For more information see: eqcorrscan.utils.correlate.register_array_xcorr()

  • concurrency (str) – The type of concurrency to apply to the xcorr function. Options are ‘multithread’, ‘multiprocess’, ‘concurrent’. For more details see eqcorrscan.utils.correlate.get_stream_xcorr()

  • cores (int) – Number of workers for processing and detection.

  • concurrent_processing (bool) – Whether to process steps in detection workflow concurrently or not. See https://github.com/eqcorrscan/EQcorrscan/pull/544 for benchmarking.

  • ignore_length (bool) – If using daylong=True, dayproc will check that data are present for at least 80% of the day and will raise an error if too much data are missing. Set ignore_length=True to skip this check. This is not recommended!

  • ignore_bad_data (bool) – If False (default), errors will be raised if data are excessively gappy or are mostly zeros. If True then no error will be raised, but an empty trace will be returned (and not used in detection).

  • group_size (int) – Maximum number of templates to run at once; use to reduce memory consumption. If unset, all templates will be run together.

  • overlap (float) – Either None, “calculate” or a float of number of seconds to overlap detection streams by. This is to counter the effects of the delay-and-stack in calculating cross-correlation sums. Setting overlap = “calculate” will work out the appropriate overlap based on the maximum lags within templates.

  • full_peaks (bool) – See eqcorrscan.utils.findpeaks.find_peaks2_short

  • save_progress (bool) – Whether to save the resulting party at every data step or not. Useful for long-running processes.

  • process_cores (int) – Number of processes to use for pre-processing (if different to cores).

  • pre_processed (bool) – Whether the stream has been pre-processed or not to match the templates.

  • check_processing (bool) – Whether to check that all templates were processed the same.

Returns:

eqcorrscan.core.match_filter.Party of Families of detections.

Note

When using the “fftw” correlation backend the length of the fft can be set. See eqcorrscan.utils.correlate for more info.

Note

stream must not be pre-processed. If your data contain gaps you should NOT fill those gaps before using this method. The pre-process functions (called within) will fill the gaps internally prior to processing, process the data, then re-fill the gaps with zeros to ensure correlations are not incorrectly calculated within gaps. If your data have gaps you should pass a merged stream without the fill_value argument (e.g.: stream = stream.merge()).

Note

Data overlap:

Internally this routine shifts and trims the data according to the offsets in the template (e.g. if trace 2 starts 2 seconds after trace 1 in the template then the continuous data will be shifted by 2 seconds to align peak correlations prior to summing). Because of this, detections at the start and end of continuous data streams may be missed. The maximum time-period over which detections might be missed is the maximum offset in the template.

To work around this, if you are conducting matched-filter detections through long-duration continuous data, we suggest using some overlap (a few seconds, on the order of the maximum offset in the templates) in the continuous data. You will then need to post-process the detections (which should be done anyway to remove duplicates). See below note for how overlap argument affects data internally if stream is longer than the processing length.

Note

If stream is longer than processing length, this routine will ensure that data overlap between loops, which will lead to no missed detections at data start-stop points (see above note). This will result in end-time not being strictly honoured, so detections may occur after the end-time set. This is because data must be run in the correct process-length.

Note

Thresholding:

MAD threshold is calculated as the:

\[threshold {\times} (median(abs(cccsum)))\]

where \(cccsum\) is the cross-correlation sum for a given template.

absolute threshold is a true absolute threshold based on the cccsum value.

av_chan_corr is based on the mean values of single-channel cross-correlations assuming all data are present as required for the template, e.g:

\[av\_chan\_corr\_thresh=threshold \times (cccsum / len(template))\]

where \(template\) is a single template from the input and the length is the number of channels within this template.
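
Drawing the notes above together, a minimal sketch of a detect call on locally stored, unprocessed data follows; the file names and threshold values are placeholders:

from obspy import read
from eqcorrscan.core.match_filter import Tribe

tribe = Tribe().read("test_tribe.tgz")
st = read("continuous_day.mseed")          # placeholder path to raw, unprocessed data
st = st.merge()                            # merge gappy data WITHOUT a fill_value (see note above)

party = tribe.detect(
    stream=st,
    threshold=8.0, threshold_type="MAD",   # 8 x MAD of the cross-correlation sum
    trig_int=6.0,                          # minimum 6 s between detections per template
    overlap="calculate",                   # overlap chunks by the maximum template lag
    ignore_bad_data=False)
print(party)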

read(filename)[source]

Read a tribe of templates from a tar formatted file.

Parameters:

filename (str) – File to read templates from.

Example

>>> tribe = Tribe(templates=[Template(name='c', st=read())])
>>> tribe.write('test_tribe')
Tribe of 1 templates
>>> tribe_back = Tribe().read('test_tribe.tgz')
>>> tribe_back == tribe
True
>>> # This can also read pickled templates
>>> import pickle
>>> with open("test_tribe.pkl", "wb") as f:
...    pickle.dump(tribe, f)
>>> tribe_back = Tribe().read("test_tribe.pkl")
>>> tribe_back == tribe
True
remove(template)[source]

Remove a template from the tribe.

Parameters:

template (eqcorrscan.core.match_filter.Template) – Template to remove from tribe

Example

>>> tribe = Tribe(templates=[Template(name='c'), Template(name='b'),
...                          Template(name='a')])
>>> tribe.remove(tribe.templates[0])
Tribe of 2 templates
select(template_name)[source]

Select a particular template from the tribe.

Parameters:

template_name (str) – Template name to look-up

Returns:

Template

Example

>>> tribe = Tribe(templates=[Template(name='c'), Template(name='b'),
...                          Template(name='a')])
>>> tribe.select('b') 
Template b:
 0 channels;
 lowcut: None Hz;
 highcut: None Hz;
 sampling rate None Hz;
 filter order: None;
 process length: None s
sort()[source]

Sort the tribe, sorts by template name.

Example

>>> tribe = Tribe(templates=[Template(name='c'), Template(name='b'),
...                          Template(name='a')])
>>> tribe.sort()
Tribe of 3 templates
>>> tribe[0] 
Template a:
 0 channels;
 lowcut: None Hz;
 highcut: None Hz;
 sampling rate None Hz;
 filter order: None;
 process length: None s
write(filename, compress=True, catalog_format='QUAKEML')[source]

Write the tribe to a file using tar archive formatting.

Parameters:
  • filename (str) – Filename to write to; if it exists it will be appended to.

  • compress (bool) – Whether to compress the tar archive or not, if False then will just be files in a folder.

  • catalog_format (str) – What format to write the detection-catalog with. Only Nordic, SC3ML, QUAKEML are supported. Note that not all information is written for all formats (QUAKEML is the most complete, but is slow for IO).

Example

>>> tribe = Tribe(templates=[Template(name='c', st=read())])
>>> tribe.write('test_tribe')
Tribe of 1 templates
>>> tribe.write(
...    "this_wont_work.bob",
...    catalog_format="BOB") 
Traceback (most recent call last):
TypeError: BOB is not supported

Functions

eqcorrscan.core.match_filter.tribe.read_tribe(fname)[source]

Read a Tribe of templates from a tar archive.

Parameters:

fname – Filename to read from

Returns:

eqcorrscan.core.match_filter.Tribe
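
For example, assuming an archive previously written with Tribe.write (as in the write() example above):

from eqcorrscan.core.match_filter.tribe import read_tribe

tribe = read_tribe("test_tribe.tgz")   # returns a Tribe
print(tribe)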