
EQcorrscan¶
A Python package for the detection and analysis of repeating and near-repeating seismicity. EQcorrscan contains an efficient, multi-parallel, matched-filter detection routine (template-matching), as well as routines to implement subspace detection.
Code is stored on github, the development branches are here, or the latest stable release can be found here.
EQcorrscan uses Obspy bindings when reading and writing seismic data, and for handling most of the event metadata, which ensures that detections can be easily migrated between software.
Also within this package are:
- Correlation re-picking;
- Clustering routines for seismic data;
- Peak finding algorithm (basic);
- Stacking routines, including phase-weighted stacking based on Thurber et al. (2014).
This package is written by the EQcorrscan developers and is distributed under the GNU LGPL licence, Copyright EQcorrscan developers 2017.
Citation¶
If you use this package in your work, please cite the following paper: Chamberlain, C. J., Hopp, C. J., Boese, C. M., Warren-Smith, E., Chambers, D., Chu, S. X., Michailos, K., Townend, J., EQcorrscan: Repeating and near-repeating earthquake detection and analysis in Python. Seismological Research Letters, 2017
Contents:¶
Introduction to the EQcorrscan package¶
Supported environments¶
We support Linux, OSX and Windows environments running Python 3.x.
Functionality¶
Within EQcorrscan you will find the core routines to generate templates (template_gen), compute cross-channel correlations from these templates (match_filter), generate cross-correlation corrected pick-times (lag_calc), and run subspace detection (subspace).
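These routines live in the eqcorrscan.core namespace; for example, the corresponding imports (assuming EQcorrscan is installed) are:

from eqcorrscan.core import template_gen, match_filter, lag_calc, subspace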
Running tests¶
You can run tests yourself locally to ensure that everything runs as you would expect in your environment. Although we try to ensure that these tests run smoothly on all supported environments (using the ci bots), if you do find any issues, please let us know on the github page.
To run the tests you will need to have pytest installed along with a couple of extras (pytest-pep8 and pytest-cov). These can be installed by pip:
pip install pytest pytest-pep8 pytest-cov
To test your installed version of EQcorrscan we provide a test-script. For versions <= 0.3.2 you should download the script and run it. In later versions this script is included in the package.
This test-script will download the test data and run the tests (you no longer have to clone the git repository). Just run (from anywhere):
test_eqcorrscan.py
Tests will take about half an hour to run (as of version 0.3.2) and will provide a coverage report at the end and notify you of any failures.
EQcorrscan installation¶
EQcorrscan is a Python package with C extensions. The C extensions in EQcorrscan have their own dependencies on compiled libraries. We strongly recommend installing EQcorrscan using conda because this will:
- make your life easier;
- separate your EQcorrscan install from your system Python, meaning you can experiment to your heart's content without breaking your operating system (yay);
- ensure that compiled modules are compiled using the correct C-compiler against the correct libraries.
If you do not have either a miniconda or anaconda installation you can follow the conda-install instructions.
If you do not already have a conda environment we recommend creating one with the following:
conda create -n eqcorrscan -c conda-forge eqcorrscan
source activate eqcorrscan
This will create an environment called eqcorrscan and install eqcorrscan and its dependencies in that environment.
If you already have a conda environment that you want to use then to install EQcorrscan you can simply run:
conda install -c conda-forge eqcorrscan
Installation without conda¶
Installing EQcorrscan without conda involves two steps:
- Installing fftw3 libraries;
- Installing Python dependencies and EQcorrscan.
How you undertake the first step depends on your operating system and system package manager.
Non-Python dependencies–Ubuntu:¶
Prior to installing the Python routines you will need to install the fftw library. On Linux use apt (or your default package manager; note you may need sudo access):
apt-get install libfftw3-dev
Note that you will need to ensure you have the single-precision libraries of fftw3 (files named fftw3f…). On CentOS you can install the fftw-libs package.
Non-Python dependencies–OSX:¶
For MacOS/OS-X systems we have tested using homebrew and macports (fink options are available, but we haven’t tested them).
Homebrew¶
You will need a recent version of gcc (the homebrew gcc-4.9 port has issues with openMP). We have tested the following and found it to work (note that you may need to prepend sudo depending on your configuration):
brew install gcc6
brew install fftw
Then run the following to install EQcorrscan (note the need to select CC=gcc, you can install using clang, but you will need additional libraries for openmp support):
CC=gcc pip install eqcorrscan
MacPorts¶
The following has been tested and found to work (note that you may need to prepend sudo depending on your configuration):
- Install an up-to-date gcc (gcc is needed for openmp compatibility) - any gcc should work (>4), here we use gcc6 for example:
port install gcc6
- Install python from macports (tested for python35, but it's up to you):
port install python35
# optional: select python35 as default python for terminal:
port select --set python python35
- Install numpy and pip from macports:
port install py35-numpy py35-pip
# optional, select pip35 as default pip
port select --set pip pip35
Install fftw3 from source:
- Download the fftw source (fftw 3.3.7 was the most recent as of 10/01/2018);
- unzip/untar
- Run the following from within the expanded directory:
./configure --enable-threads --enable-float && make
make install
# Need both double and float precision files
./configure --enable-threads && make
make install
Run the following (if you didn't run the port select --set pip pip35 command you will need to replace pip with pip35):
CC=gcc pip install eqcorrscan
Non-Python dependencies–Windows:¶
For Windows systems you should follow the instructions on the fftw-windows install page and use the pre-compiled dynamic libraries. These should be installed somewhere on your system path, or the install location added to your path. The correlation routines use openMP for parallel workflows; however, some aspects of this run into issues with versions of MSVC < 10.0 (due to old C standards being used), so, by default, the correlation routines are compiled as serial workflows on Windows. If you need this threading on Windows please get in touch with the developers.
EQcorrscan install via pip:¶
Once you have installed fftw the EQcorrscan install should be as simple as:
pip install eqcorrscan
Installation from source¶
pip pulls the package from the PyPi package repository and runs the setup.py file. If instead you wish to install from source, download the package (either by cloning the git repository, or by downloading the source code) from github, change directory to the EQcorrscan directory and run:
python setup.py install
If this fails because the default compiler is clang you can run:
CC=gcc python setup.py install
Note though that this will compile EQcorrscan using a different compiler than was used to build your Python, which may have unwanted effects. If you do this you MUST test your install using the instructions here: Running tests.
Using Intel’s MKL¶
For versions >= 0.3.0 EQcorrscan supports compilation against the Intel Math Kernel Libraries (MKL). This has shown speed-ups compared to the standard FFTW library. To enable this you must install MKL before compiling EQcorrscan. MKL is available from most package managers (including conda). Once you have MKL installed you can follow the Installation from source section. Check near the top of the install output that the MKL libraries are found.
Notes¶
You may have issues with these installs if you don’t have numpy installed: but if you don’t have numpy installed then you have bigger issues…
If you plan to generate a grid of synthetic templates you will need grid csv files, which the authors have previously generated using NonLinLoc. These are not provided here and should be sourced from NonLinLoc, which provides the Grid2Time routine required to set up a lag-time grid for your velocity model. You should read the NonLinLoc documentation for more information regarding how this process works and the input files you are required to give.
EQcorrscan FAQs¶
This is a developing list of frequently asked questions. If you find yourself experiencing a problem similar to one of these, then try the solution here first!
If your problem/question isn’t answered here then check the issues on the github page and, if there isn’t an issue related to your problem/question then open a new one and we will try to help.
Usage Questions¶
No output to terminal¶
EQcorrscan uses Logging to handle output. If you are seeing no output to screen you probably haven’t set up your logging settings. A simple way to do this is to run code like:
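import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s\t%(name)s\t%(levelname)s\t%(message)s")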
You can set different levels to get more or less output (DEBUG is the most output, then INFO, then WARNING, then ERROR, then CRITICAL). You will need to run this before any calls to EQcorrscan's functions to get logging output.
—
No correlations computed¶
Frequently the cause of no correlations being computed is that the SEED IDs (network.station.location.channel) for your template do not match those of your continuous data. Check that they match, and try increasing the logging output (see above) to help you find the issue.
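For example, a minimal sketch for comparing SEED IDs (assuming template is one of your Template objects and st is your continuous Stream):

template_ids = {tr.id for tr in template.st}
stream_ids = {tr.id for tr in st}
# Any IDs printed here will never produce correlations:
print("In template but not in continuous data:", template_ids - stream_ids)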
—
Everything is done multiple times!¶
EQcorrscan uses multiprocessing under the hood, which will spawn multiple processes when called. To ensure that programs do not run multiple times you should always write your scripts with a form:
def main():
    # Do the mahi
    ...


if __name__ == "__main__":
    main()
See the multiprocessing note on “Safe importing of main module” for more info.
—
Making templates from SAC files¶
While there is support for making templates from SAC, it generally isn't a good idea unless you have SAC waveforms of the length that you want to correlate with. This is for two reasons:
1. Templates and continuous data must be processed in the same way, including using the same length FFT, for correlations to be accurate. If you make your template from short data you will be forced to correlate with short chunks of data, which is not very efficient.
2. The programming team are not SAC users, and do not understand the nuances of where picks can be saved in SAC headers.
Instead, you might find it better to convert your SAC picks into another obspy-readable pick/event format. You can try EQcorrscan's basic sac_util module to do this, but not everything will be retained.
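For example, a hedged sketch (the file path is hypothetical, and this assumes your SAC headers hold pick information):

from obspy import read
from eqcorrscan.utils.sac_util import sactoevent

st = read("event_dir/*.SAC")  # hypothetical path to the SAC files for one event
event = sactoevent(st)  # builds an obspy Event from the SAC header information
print(event.picks)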
Design Questions¶
Can I use different lengths for different channels in a template?¶
Not yet in EQcorrscan - we want this as well, but haven’t had time to implement it. If you want this then we would really appreciate the contribution! There are two main issues with this that require some thought: 1) How correlations are computed, and 2) how correlations are weighted in the correlation sum.
Why doesn’t EQcorrscan have a GUI?¶
This would be cool, and it would be great if someone wants to contribute this, however, the developers thus far have been focused on other things and don’t have unlimited time.
If you want this, and know how to program GUIs then please do contribute, it would open EQcorrscan up to more users, which would be great!
Why do you have a functional and object-oriented API?¶
Simply legacy, when Calum started writing EQcorrscan he didn’t know anything about classes. The functional API is retained so that old codes can still be run, but if you are starting from scratch please use the OO API where possible.
What’s new¶
Version 0.4.2¶
- Add seed-ids to the _spike_test's message.
- utils.correlation
  - Cross-correlation normalisation errors no longer raise an error. When “out-of-range” correlations occur a warning is given by the C-function with details of what channel, what template and where in the data vector the issue occurred, for the user to check their data.
  - Out-of-range correlations are set to 0.0.
  - After extensive testing these errors have always been related to data issues within regions where correlations should not be computed (spikes, step artifacts due to incorrectly padding data gaps).
  - USERS SHOULD BE CAREFUL TO CHECK THEIR DATA IF THEY SEE THESE WARNINGS.
- utils.mag_calc.amp_pick_event
  - Added option to output IASPEI standard amplitudes, with static amplification of 1 (rather than 2080 as per Wood-Anderson specs).
  - Added filter_id and method_id to amplitudes to make these methods more traceable.
- core.match_filter
  - Bug-fix: cope with data that are too short with ignore_bad_data=True. This flag is generally not advised, but when used, may attempt to trim all data to zero length. The expected behaviour is to remove bad data and run with the remaining data.
- Party
  - decluster now accepts a hypocentral_separation argument. This allows the inclusion of detections that occur close in time, but not in space. This is underwritten by a new findpeaks.decluster_dist_time function based on a new C-function.
- Tribe
  - Add monkey-patching for clients that do not have a get_waveforms_bulk method for use in .client_detect. See issue #394.
- utils.pre_processing
  - Only templates that need to be reshaped are reshaped now - this can be a lot faster.
Version 0.4.1¶
- core.match_filter
  - BUG-FIX: Empty families are no longer run through lag-calc when using Party.lag_calc(). Previously this resulted in a “No matching data” error, see #341.
- core.template_gen
  - BUG-FIX: Fix bug where events were incorrectly associated with templates in Tribe().construct() if the given catalog contained events outside of the time-range of the stream. See issue #381 and PR #382.
- utils.catalog_to_dd
  - Added ability to turn off parallel processing (this is turned off by default now) for write_correlations - parallel processing for moderate to large datasets was copying far too much data and using lots of memory. This is a short-term fix - ideally we will move filtering and resampling to C functions with shared-memory parallelism and GIL releasing. See PR #374.
  - Moved parallelism for _compute_dt_correlations to the C functions to reduce memory overhead. Using a generator to construct sub-catalogs rather than making a list of lists in memory. See issue #361.
- utils.mag_calc
  - amp_pick_event now works on a copy of the data by default.
  - amp_pick_event uses the appropriate digital filter gain to correct the applied filter. See issue #376.
  - amp_pick_event rewritten for simplicity.
  - amp_pick_event now has simple synthetic tests for accuracy.
  - _sim_wa uses the full response information to correct to velocity. This includes FIR filters (previously not used), and ensures that the Wood-Anderson poles (with a single zero) are correctly applied to velocity waveforms.
  - calc_max_curv is now computed using the non-cumulative distribution.
- Fixed a problem in _match_filter_plot: it now shows all new detections.
- Add plotdir to the eqcorrscan.core.lag_calc.lag_calc function to save the images.
Version 0.4.0¶
- Change resampling to use pyFFTW backend for FFTs. This is an attempt to alleviate issues related to large-prime length transforms. This requires an additional dependency, but EQcorrscan already depends on FFTW itself (#316).
- Refactor of catalog_to_dd functions (#322):
  - Speed-ups, using new correlation functions and better resource management.
  - Removed enforcement of seisan; arguments are now standard obspy objects.
- Add plotdir to lag-calc, template construction and matched-filter detection methods and functions (#330, #325).
- Wholesale re-write of lag-calc function and methods. The external interface is similar, but some arguments have been deprecated as they were unnecessary (#321).
  - This was done to make use of the new internal correlation functions, which are faster and more memory efficient.
  - Party.lag_calc and Family.lag_calc now work in-place on the events in the grouping.
  - Added relative_mags method to Party and Family; this can be called from lag-calc to avoid reprocessing data.
  - Added lag_calc.xcorr_pick_family as a public facing API to implement correlation re-picking of a group of events.
- Renamed utils.clustering.cross_chan_coherence to utils.clustering.cross_chan_correlation to better reflect what it actually does.
- Add --no-mkl flag for setup.py to force the FFTW correlation routines not to compile against Intel's MKL. On NeSI systems MKL is currently causing issues.
- BUG-FIX: eqcorrscan.utils.mag_calc.dist_calc calculated the long way round the Earth when changing hemispheres. We now use the Haversine formula, which should give better results at short distances, and does not use a flat-Earth approximation, so is better suited to larger distances as well.
- Add C-openmp parallel distance-clustering (speed-ups of ~100 times).
- Allow option to not stack correlations in correlation functions.
- Use compiled correlation functions for correlation clustering (speed-up).
- Add time-clustering for catalogs and change how space-time cluster works so that it uses the time-clustering, rather than just throwing out events outside the time-range.
- Changed all prints to calls to logging; as a result, debug is no longer an argument for function calls.
- find-peaks replaced by a compiled peak-finding routine - more efficient in both memory and time (#249) - approx 50x faster.
  - Note that the results of the C-function and the Python functions are slightly different. The C function (now the default) is more stable when peaks are small and close together (e.g. in noisy data).
- multi-find peaks makes use of openMP parallelism for more efficient memory usage (#249).
- Enforce normalization of continuous data before correlation to avoid float32 overflow errors that result in correlation errors (see PR #292).
- Add SEC-C style chunked cross-correlations. This is both faster and more memory efficient. This is now used by default with an fft length of 2 ** 13. This was found to be consistently the fastest length in testing. This can be changed by the user by passing the fft_len keyword argument. See PR #285.
- Outer-loop parallelism has been disabled for all systems now. This was not useful in most situations and is hard to maintain.
- Improved support for compilation on RedHat systems.
- Refactored match-filter into smaller files. Namespace remains the same. This was done to ease maintenance - the match_filter.py file had become massive and was slow to load and process in IDEs.
- Refactored _prep_data_for_correlation to reduce looping for speed; now approximately six times faster than previously (minor speed-up).
  - Now explicitly doesn't allow templates with different length traces - previously this was ignored and templates with different length channels to other templates had their channels padded with zeros or trimmed.
- Add skip_short_channels option to template generation. This allows users to provide data of unknown length, and short channels will not be used, rather than generating an error. This is useful for downloading data from datacentres via the from_client method.
- Remove pytest_namespace in conftest.py to support pytest 4.x.
- Add ignore_bad_data kwarg for all processing functions; if set to True (defaults to False for continuity) then any errors related to bad data at process-time will be suppressed and empty traces returned. This is useful for downloading data from datacentres via the from_client method when data quality is not known.
- Added relative amplitude measurements as utils.mag_calc.relative_amplitude (#306).
- Added relative magnitude calculation using relative amplitudes weighted by correlations to utils.mag_calc.relative_magnitude.
- Added relative_magnitudes argument to eqcorrscan.core.match_filter.party.Party.lag_calc to provide an in-flow way to compute relative magnitudes for detected events.
- Events constructed from detections now include estimated origins alongside the picks. These origins are time-shifted versions of the template origin and should be used with caution. They are corrected for prepick (#308).
- Picks in detection.event are now corrected for prepick if the template is given. This is now standard in all Tribe, Party and Family methods. Picks will not be corrected for prepick in match_filter (#308).
- Fix #298 where the header was repeated in detection csv files. Also added a write_detections function to eqcorrscan.core.match_filter.detection to streamline writing detections.
- Remove support for Python 2.7.
- Add warning about unused data when using Tribe.detect methods with data that do not fit into chunks. Fixes #291.
- Fix #179 when decimating for cccsum_hist in _match_filter_plot.
- utils.pre_processing now uses the .interpolate method rather than .resample to change the sampling rate of data. This is generally more stable and faster than resampling in the frequency domain, but will likely change the quality of correlations.
- Removed deprecated template_gen functions and bright_lights and seismo_logs. See #315.
—
Older Versions¶
Version 0.3.3¶
- Make test-script more stable - use the installed script for testing.
- Fix bug where set_xcorr as context manager did not correctly reset stream_xcorr methods.
- Correct test-script (test_eqcorrscan.py) to find paths properly.
- BUG-FIX in Party.decluster: when detections were made at exactly the same time, the first, rather than the highest, of these was taken.
- Catch one-sample difference in day properly in pre-processing.dayproc
- Shortproc now clips and pads to the correct length asserted by starttime and endtime.
- Bug-fix: Match-filter collection objects (Tribe, Party, Family) implemented addition (__add__) to alter the main object. Now the main object is left unchanged.
- Family.catalog is now an immutable property.
Version 0.3.2¶
- Implement reading Party objects from multiple files, including wildcard expansion. This will only read template information if it was not previously read in (which is a little more efficient).
- Allow reading of Party objects without reading the catalog files.
- Check quality of downloaded data in Tribe.client_detect() and remove it if it would otherwise result in errors.
- Add process_cores argument to Tribe.client_detect() and Tribe.detect() to provide a separate number of cores for processing and peak-finding - both functions are less memory efficient than fftw correlation and can result in memory errors if using lots of cores.
- Allow passing of cores_outer kwarg through to fftw correlate functions to control inner/outer thread numbers. If given, cores will define the number of inner-cores (used for parallel fft calculation) and cores_outer sets the number of channels to process in parallel (which results in increased memory usage).
- Allow Tribe and Party IO to use QUAKEML or SC3ML format for catalogs (NORDIC to come once obspy updates).
- Allow Party IO to not write detection catalogs if so desired, because writing and reading large catalogs can be slow.
- If detection-catalogs are not read in, then the detection events will be generated on the fly using Detection._calculate_event.
- BUG-FIX: When one template in a set of templates had a channel repeated, all detections had an extra, spurious pick in their event object. This should no longer happen.
- Add select method to Party and Tribe to allow selection of a specific family/template.
- Add ability to “retry” downloading in Tribe.client_detect.
- Change behaviour of template_gen for data that are daylong, but do not start within 1 minute of a day-break - previous versions enforced padding to start and end at day-breaks, which led to zeros in the data and undesirable behaviour.
- BUG-FIX: Normalisation errors not properly passed back from internal fftw correlation functions, gaps not always properly handled during long-period trends - variance threshold is now raised, and Python checks for low-variance and applies gain to stabilise correlations if needed.
- Plotting functions are now tested and have a more consistent interface:
- All plotting functions accept the keyword arguments save, savefile, show, return_figure and title.
- All plotting functions return a figure.
- SVD_plot renamed to svd_plot
- Enforce pre-processing even when no filters or resampling is to be done to ensure gaps are properly processed (when called from Tribe.detect, Template.detect or Tribe.client_detect)
- BUG-FIX in Tribe.client_detect where data were processed one sample too long, resulting in minor differences in data processing (due to the difference in FFT length) and therefore minor differences in resulting correlations (~0.07 per channel).
- Includes extra stability check in fftw_normxcorr which affects the last sample before a gap when that sample is near-zero.
- BUG-FIX: fftw correlation dot product was not thread-safe on some systems. The dot-product did not have the inner index protected as a private variable. This did not appear to cause issues on Linux with Python 3.x or on Windows, but did cause issues on Linux with Python 2.7 and on Mac OS builds.
- KeyboardInterrupt (e.g. ctrl-c) should now be caught during python parallel processes.
- Stopped allowing outer-threading on OSX, clang openMP is not thread-safe for how we have this set-up. Inner threading is faster and more memory efficient anyway.
- Added testing script (test_eqcorrscan.py, which will be installed to your path on installation of EQcorrscan) that will download all the relevant data and run the tests on the installed package - no need to clone EQcorrscan to run tests!
Version 0.3.1¶
- Cleaned imports in utils modules
- Removed parallel checking loop in archive_read.
- Add better checks for timing in lag-calc functions (#207)
- Removed gap-threshold of twice the template length in Tribe.client_detect, see issue #224.
- Bug-fix: give multi_find_peaks a cores kwarg to limit thread usage.
- Check for the same value in a row in continuous data when computing correlations and zero resulting correlations where the whole window is the same value repeated (#224, #230).
- BUG-FIX: template generation from_client methods for swin=P_all or S_all now download all channels and return them (as they should). See #235 and #206
- Change from raising an error if data from a station are not long enough, to logging a critical warning and not using the station.
- Add ability to give multiple swin options as a list. Remains backwards compatible with single swin arguments.
- Add option to save_progress for long running Tribe methods. Files are written to temporary files local to the caller.
- Fix bug where if gaps overlapped the endtime set in pre_processing an error was raised - happened when downloading data with a deliberate pad at either end.
Version 0.3.0¶
- Compiled peak-finding routine written to speed-up peak-finding.
- Change default match-filter plotting to not decimate unless it has to.
- BUG-FIX: changed minimum variance for fftw correlation backend.
- Do not try to process when no processing needs to be done in core.match_filter._group_process.
- Length checking in core.match_filter._group_process done in samples rather than time.
- BUG-FIX: Fix bug where data lengths were not correct in match_filter.Tribe.detect when sampling time-stamps were inconsistent between channels, which previously resulted in error.
- BUG-FIX: Fix memory-leak in tribe.construct
- Add plotting options for plotting rate to Party.plot
- Add filtering detections by date as Party.filter
- BUG-FIX: Change method for Party.rethreshold: list.remove was not reliable.
- Add option full_peaks to detect methods to map to find_peaks.
- pre-processing (and match-filter object methods) are now gap-aware and will accept gappy traces and can return gappy traces. By default gaps are filled to maintain backwards compatibility. Note that the fftw correlation backend requires gaps to be padded with zeros.
- Removed sfile_utils. Support for Nordic IO has been upgraded and moved to obspy for obspy version 1.1.0. All functions are there and many bugs have been fixed. This also means the removal of nordic-specific functions in EQcorrscan - the following functions have been removed:
  - template_gen.from_sfile
  - template_gen.from_contbase
  - mag_calc.amp_pick_sfile
  - mag_calc.pick_db
  All removed functions will error and tell you to use obspy.io.nordic.core. This now means that you can use obspy's read_events to read in sfiles.
- Added P_all and S_all options to template generation functions to allow creation of multi-channel templates starting at the P and S times respectively.
- Refactored template_gen, all options are available via template_gen(method=…), and deprecation warnings are in place.
- Added some docs for converting older templates and detections into Template and Party objects.
Version 0.2.7¶
- Patch multi_corr.c to work with more versions of MSVC;
- Revert to using single-precision floats for correlations (as in previous, < 0.2.x versions) for memory efficiency.
Version 0.2.6¶
- Added the ability to change the correlation functions used in detection methods through the parameter xcorr_func of match_filter, Template.detect and Tribe.detect, or using the set_xcorr context manager in the utils.correlate module. Supported options are:
  - numpy
  - fftw
  - time-domain
  - or passing a function that implements the xcorr interface.
- Added the ability to change the concurrency strategy of xcorr functions using the parameter concurrency of match_filter, Template.detect and Tribe.detect. Supported options are:
  - None - for single-threaded execution in a single process
  - multithread - for multi-threaded execution
  - multiprocess - for multiprocess execution
  - concurrent - allows functions to describe their own preferred concurrency methods, defaults to multithread
- Change debug printing output, it should be a little quieter;
- Speed-up time-domain using a threaded C-routine - separate from frequency domain C-routines;
- Expose useful parallel options for all correlation routines;
- Expose cores argument for match-filter objects to allow limits to be placed on how much of your machine is used;
- Limit number of workers created during pre-processing to never be more than the number of traces in the stream being processed;
- Implement openMP parallelisation of cross-correlation sum routines - memory consumption reduced by using shared memory, and by computing the cross-correlation sums rather than individual channel cross-correlations. This also leads to a speed-up. This routine is the default concurrent correlation routine;
- Test examples in rst doc files to ensure they are up-to-date;
- Tests that were prone to timeout issues have been migrated to run on circleci to allow quick re-starting of fails not due to code errors
Version 0.2.5¶
- Fix bug with _group_process that resulted in stalled processes.
- Force NumPy version
- Support indexing of Tribe and Party objects by template-name.
- Add tests for lag-calc issue with preparing data
- Change internals of eqcorrscan.core.lag_calc._prepare_data to use a dictionary for delays, and to work correctly! Issues arose from not checking for masked data properly and not checking length properly.
- Fix bug in match_filter.match_filter when checking for equal length traces, length count was one sample too short.
Version 0.2.4¶
- Increase test coverage (edge-cases) in template_gen;
- Fix bug in template_gen.extract_from_stack for duplicate channels in template;
- Increase coverage somewhat in bright_lights, remove non-parallel option (previously only used for debugging in development);
- Increase test coverage in lag_calc;
- Speed-up tests for brightness;
- Increase test coverage for match_filter including testing io of detections;
- Increase subspace test coverage for edge cases;
- Speed-up catalog_to_dd_tests;
- Lag-calc will pick S-picks on channels ending E, N, 1 and 2, change from only picking on E and N before; warning added to docs;
- Add full tests for pre-processing;
- Run tests in parallel on ci, speed-up tests dramatically;
- Rename singular-value decomposition functions (with deprecation warnings);
- Rename SVD_moments to lower-case and add deprecation warning;
- Increase test coverage in utils.mag_calc;
- Add Template, Tribe, Family, Party objects and rename DETECTION to Detection:
  - Template objects maintain meta-data associated with their creation to stream-line processing of data (e.g. reduce chance of using the wrong filters).
  - Template events have a detect method which takes unprocessed data and does the correct processing using the Template meta-data, and computes the matched-filter detections.
  - Tribe objects are containers for multiple Templates.
  - Tribe objects have a detect method which groups Templates with similar meta-data (processing information) and runs these templates in parallel through the matched-filter routine. Tribe.detect outputs a Party of Family objects.
  - The Party object is a container for many Family objects.
  - Family objects are containers for detections from the same Template.
  - Family and Party objects have a lag_calc method which computes the cross-correlation pick-refinements.
  - The upshot of this is that it is possible to, in one line, generate a Tribe of templates, compute their matched-filter detections, and generate cross-correlation pick refinements, which output Event objects, which can be written to a catalog: Tribe.construct(method, **kwargs).detect(st, **kwargs).lag_calc(**kwargs).write()
  - Added 25 tests for these methods.
  - Add parameters threshold_type and threshold_input to Detection class. Add support for legacy Detection objects via NaN and unset values.
- Removed support for obspy < 1.0.0
- Update / correct doc-strings in template-gen functions when describing processing parameters.
- Add warning message when removing channels from continuous data in match_filter;
- Add min_snr option for template generation routines, if the signal-to-noise ratio is below a user-defined threshold, the channel will not be used.
- Stop enforcing two-channel template channel names.
- Fix bug in detection_multiplot which didn’t allow streams with fewer traces than template;
- Update internals to custom C fftw-based correlation rather than openCV (Major change):
  - OpenCV has been removed as a dependency.
  - eqcorrscan.core.match_filter.normxcorr2 now calls a compiled C routine.
  - Parallel workflows are handled by openMP rather than Python Multiprocessing for matched-filter operations to allow better memory handling.
  - It is worth noting that we tried re-writing using SciPy internals, which led to a significant speed-up but with high memory costs. We ended up going with this option, which was the more difficult option, because it allows effective use on SLURM-managed systems where python multiprocessing results in un-real memory spikes (issue #88).
Version 0.2.0-0.2.3¶
- See 0.2.4: these versions were not fully released while trying to get anaconda packages to build properly.
Version 0.1.6¶
- Fix bug introduced in version 0.1.5 for match_filter where looping through multiple templates did not correctly match image and template data: 0.1.5 fix did not work;
- Bug-fix in catalog_to_dd for events without magnitudes;
- Amend match-filter to not edit the list of template names in place. Previously, if a template was not used (due to no matching continuous data) then the name of the template was removed: this now copies the list of template_names internally and does not change the external list.
Version 0.1.5¶
- Migrate coverage to codecov;
- Fix bug introduced in version 0.1.5 for match_filter where looping through multiple templates did not correctly match image and template data.
Version 0.1.4¶
- Bug-fix in plot_repicked removed where data were not normalized properly;
- Bug-fix in lag_calc where data were missing in the continuous data fixed (this led to incorrect picks, major bug!);
- Output cross-channel correlation sum in lag-calc output;
- Add id to DETECTION objects, which is consistent with the events within DETECTION objects and catalog output, and used in lag_calc to allow linking of detections to catalog events;
- Add lots of logging and error messages to lag-calc to ensure user understands limits;
- Add error to day-proc to ensure user is aware of risks of padding;
- Change utils.pre_processing.process to accept different length of data enforcement, not just full day (allow for overlap in processing, which might be useful for reducing day start and end effects);
- Bug-fix in mag_calc.amp_pick_event, broke loop if data were missing;
- Lots of docs adjustment to sort order of doc-strings and hyper-links;
- Allow multiple uses of the same channel in templates (e.g. you can now use a template with two windows from the same channel, such as a P and an S);
- Add evaluation mode filter to utils.catalog_utils.filter_picks;
- Update subspace plot to work when detector is not partitioned;
- Make tests run a little faster;
- Add pep8 testing for all code.
Version 0.1.3¶
- Now testing on OSX (python 2.7 and 3.5) - also added linux python 3.4;
- Add lag-calculation and tests for it;
- Change how lag-calc does the trace splitting to reduce memory usage;
- Added pick-filtering utility to clean up tutorials;
- Change template generation function names for clarity (wrappers for deprecated names);
- Add more useful error messages when picks are not associated with waveforms;
- Add example plots for more plotting functions;
- Add subspace detector including docs and tutorial.
- Add delayed option to all template_gen functions, set to True by default which retains old behaviour.
Version 0.1.2¶
- Add handling for empty location information in sfiles;
- Added project setup script which creates a useful directory structure and copies a default match-filter script to the directory;
- Add archive reader helper for default script, and parameter classes and definitions for default script;
- Re-write history to make repository smaller, removed trash files that had been added carelessly;
- Now tested on appveyor, so able to be run on Windows;
- Added ability to read hypoDD/tomoDD phase files to obspy events;
- Added simple despiking algorithm - not ideal for correlation as spikes are interpolated around when found: eqcorrscan.utils.despike;
- Option to output catalog object from match_filter - this will become the default once we introduce meta-data to templates - currently the picks for events are the template trace start-times, which will be before the phase-pick by the lag defined in the template creation - also added event into detection class, so you can access the event info from the detections, or create a catalog from a list of detections;
- Add option to extract detections at run-time in match_filter.match_filter;
- Edited multi_event_singlechan to take a catalog with multiple picks, but requires you to specify the station and channel to plot;
- Add normalize option to stacking routines;
- Add tests for stacking - PWS test needs more checks;
- Add many examples to doc-strings, not complete though;
- Change docs to have one page per function.
- Python 3.5 testing underway, all tests pass, but only testing about 65% of codebase.
- Add io functions to match_filter to simplify detection handling including writing detections to catalog and to text file.
- Stricter match_filter testing to enforce exactly the same result with a variety of systems.
- Add hack to template_gen tutorial to fix differences in sorting between python 3.x and python 2.
- Added advanced network triggering routine from Konstantinos, allows different parameters for individual stations - note only uses recursive sta-lta triggering at the moment. Useful for template generations alongside pickers.
- Added magnitude of completeness and b-value calculators to utils.mag_calc
Version 0.1.1¶
- Cope with events not always having time_errors in them in eventtoSfile;
- Convert Quakeml depths from m to km;
- Multiple little fixes to make Sfile conversion play well with GeoNet QuakeML files;
- Add function to convert from obspy.core.inventory.station.Station to string format for Seisan STATION0.HYP file;
- Merged feature branch - hypoDD into develop, this provides mappings for the hypoDD location program, including generation of dt.cc files;
- Added tests for functions in catalog_to_dd;
- Implemented unittest tests;
- Changed name of EQcorrscan_plotting to plotting;
- Added deprecation warnings;
- Changed internal structure of pre-processing to aid long-term upkeep;
- Added warnings in docs for template_gen relating to template generation from set length files;
- Updated template_creation tutorial to use day-long data;
- Renamed Sfile_util to sfile_util, and functions there-in: will warn about name changes;
- Updated template plotting to include pick labels;
- Updated template_creation tutorial to download S-picks as well as P-picks;
- Update sfile_util to cope with many possible unfilled objects;
- Added sac_util to convert from sac headers to useful event information - note, does not convert all things, just origin and pick times;
- Added from_sac function to template_gen.
EQcorrscan tutorials¶
Welcome to EQcorrscan - this package is designed to compute earthquake detections using a parallel, matched-filter, network cross-correlation routine, and to analyse the results.
EQcorrscan is divided into two main sub-modules, the core and utils sub-modules. The core sub-module contains the main, high-level functions:
template_gen:
    A series of routines to generate templates for match-filter detection from continuous or cut data, with pick-times either defined manually, or defined in event files.
match_filter:
    The main matched-filter routines; this is split into several smaller functions to allow python-based parallel-processing.
subspace:
    Subspace detection routine based on Harris (2006).
lag_calc:
    Routines for calculating optimal lag-times for events detected by the match-filter routine; these lags can then be used to define new picks for high-accuracy re-locations.
Some other high-level functions are included in the utils sub-module and are documented here with tutorials:
mag_calc:
    Simple local magnitude calculation and high-precision relative moment calculation using singular-value decomposition.
clustering:
    Routines for clustering earthquakes based on a range of metrics using agglomerative clustering methods.
The utils sub-module contains useful, but small functions. These functions are rarely cpu intensive, but perform vital operations, such as finding peaks in noisy data (findpeaks), converting a database to hypoDD formatted files and computing cross-correlations between detections for HypoDD (a double difference relocation software) (catalog_to_dd), calculating magnitudes (mag_calc), clustering detections (clustering), stacking detections (stacking), making plots (plotting), and processing seismic data in the same way repeatedly using Obspy’s functionality (pre_processing).
As of EQcorrscan v.0.4.0, output is controlled by the Logging module. This allows the user to decide how much output is needed (via the level option), and where output should be sent (either to a file, stdout, or both). By default, if no logging parameters are set before calling EQcorrscan functions, only messages of WARNING or above are printed to stdout. To make things a little prettier you can begin your work using something like:
import logging
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s\t%(name)s\t%(levelname)s\t%(message)s")
This will output messages at level INFO and above (quite verbose), in messages like:
2018-06-25 22:48:17,113 eqcorrscan.utils.pre_processing INFO Working on: NZ.CPWZ.10.EHZ
For more options, see the Logging HOWTO guide.
The following is an expanding set of tutorials that should take you through some of the key functionality of the EQcorrscan package. The tutorials are a (slow) work in progress to convert to jupyter notebook. If you have anything you want to see in the tutorials please let us know on github.
Quick Start¶
The goal of this notebook is to provide you with a quick overview of some of the key features of EQcorrscan. From this you should be able to dig deeper into the API to learn some of the more advanced features.
Matched-filter earthquake detection¶
One of the main applications of EQcorrscan is to detect earthquakes or other seismic signals using matched-filters (aka template-matching). To undertake matched-filtering, templates are required. Templates are simply processed waveforms. In EQcorrscan templates are realised as Template objects, which contain the waveform data and meta-data associated with their creation. Groups of Template objects are stored in Tribe objects. To demonstrate this, we will create a Tribe for the 2004 Parkfield earthquake.
Before we begin we will set up logging - EQcorrscan provides textual output through the native Python logging library. The standard DEBUG, INFO, WARNING, ERROR and CRITICAL levels are supported. For this example we will set the level to ERROR to reduce the output. When starting off you might want to set it to INFO, or, if something is happening that you don't expect, DEBUG will give you much more output.
[1]:
import logging
logging.basicConfig(
level=logging.ERROR,
format="%(asctime)s\t%(name)s\t%(levelname)s\t%(message)s")
First we need a catalog of events. We will just use a few earthquakes from the NCEDC. We are also only going to use picks from the 20 most-picked stations.
[2]:
%matplotlib inline
from obspy import UTCDateTime
from obspy.clients.fdsn import Client
from eqcorrscan.utils.catalog_utils import filter_picks
client = Client("NCEDC")
t1 = UTCDateTime(2004, 9, 28)
t2 = t1 + 86400
catalog = client.get_events(
starttime=t1, endtime=t2, minmagnitude=2.5, minlatitude=35.7, maxlatitude=36.1,
minlongitude=-120.6, maxlongitude=-120.2, includearrivals=True)
fig = catalog.plot(projection="local", resolution="h")
catalog = filter_picks(catalog=catalog, evaluation_mode="manual", top_n_picks=20)
/home/calumch/miniconda3/envs/conda_37/lib/python3.7/site-packages/obspy/imaging/maps.py:387: MatplotlibDeprecationWarning:
The dedent function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use inspect.cleandoc instead.
width=width, height=height, ax=ax)
/home/calumch/miniconda3/envs/conda_37/lib/python3.7/site-packages/obspy/imaging/maps.py:435: MatplotlibDeprecationWarning:
The dedent function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use inspect.cleandoc instead.
bmap.drawcountries(color="0.75")

When data are available via a Client, you can generate templates directly from the client without having to download data yourself. However, if you have data locally you can pass that data. Have a look at the notebook on template creation for more details on other methods.
[3]:
from eqcorrscan import Tribe
tribe = Tribe().construct(
method="from_client", lowcut=4.0, highcut=15.0, samp_rate=50.0, length=6.0,
filt_order=4, prepick=0.5, client_id=client, catalog=catalog, data_pad=20.,
process_len=21600, min_snr=5.0, parallel=True)
print(tribe)
Tribe of 34 templates
This makes a tribe by:
- Downloading data for each event (if multiple events occur close in time they will share data);
- Detrending, filtering and resampling the data;
- Trimming the data around picks within each event in the catalog.
To make this Tribe we used a few different arguments:
- lowcut: This controls the lower corner frequency of a filter in Hz. EQcorrscan natively uses Obspy's butterworth filter functions. For more details, see the Obspy docs. This can optionally be set to None to not apply filtering at low frequencies.
- highcut: This controls the upper corner frequency of a filter in Hz. It is applied in tandem with lowcut, or can be set to None to not apply filtering at high frequencies.
- samp_rate: The desired sampling-rate of the templates in Hz. EQcorrscan requires that templates and continuous data be sampled at the same rate. In this case we are downsampling data from 100.0 Hz to 50.0 Hz. We do this to reduce memory-load. EQcorrscan uses Obspy's resampling method.
- length: This is the length of the template waveform on each channel in seconds.
- filt_order: This sets the number of corners/order of the filter applied using highcut and lowcut.
- prepick: The number of seconds prior to a phase-pick to start a template waveform.
- client_id: This is the Client or client ID (e.g. our client id would be "NCEDC") from which to download the data.
- catalog: The catalog of events to generate templates for.
- data_pad: A fudge-factor to cope with data centres not providing quite the data you asked for. This is the number of seconds extra to download to cope with this.
- process_len: The length of data to download in seconds for each template. This should be the same as the length of your continuous data, e.g. if your data are in day-long files, this should be set to 86400.
- min_snr: The minimum signal-to-noise ratio required for a channel to be included in a template. This provides a simple way of getting rid of poor-quality data from your templates.
- parallel: Whether to process the data in parallel or not. This will increase memory load; if you find yourself running out of memory, set parallel=False for these template construction methods.
Let's have a quick look at the attributes of a Template:
[4]:
print(tribe[0])
fig = tribe[0].st.plot(equal_scale=False, size=(800, 600))
Template 2004_09_28t17_15_25:
20 channels;
lowcut: 4.0 Hz;
highcut: 15.0 Hz;
sampling rate 50.0 Hz;
filter order: 4;
process length: 21600.0 s

We can see that this template has data from multiple stations, and that our processing parameters are stored alongside the template waveform. This means that we can ensure that when we work on continuous data to run matched-filtering the data are processed in the same way.
Let's remove templates with fewer than five stations, then use this tribe to detect earthquakes within the first few days of the Parkfield aftershock sequence:
[5]:
tribe.templates = [t for t in tribe if len({tr.stats.station for tr in t.st}) >= 5]
print(tribe)
party, st = tribe.client_detect(
client=client, starttime=t1, endtime=t1 + (86400 * 2), threshold=9.,
threshold_type="MAD", trig_int=2.0, plot=False, return_stream=True)
Tribe of 28 templates
This will:
- Download the data required;
- Process the data in the same way as the templates in the Tribe (note that your tribe can contain templates processed in different ways: these will be grouped into similarly processed templates and the matched-filter process will be run multiple times, once for each group of similar templates);
- Cross-correlate the templates with the processed data and stack the correlations;
- Detect within each template's stacked cross-correlation array based on a given threshold;
- Generate Detection objects with Obspy events and meta-data.
Again, we used a few arguments here:
- client: This again is the client from which to download data;
- starttime: The start of the data we want to detect within;
- endtime: The end of the data we want to detect within;
- threshold: The threshold for detection: note that the correlation sum is both positive and negative valued. Detections are made based on the absolute of the correlation vector, so both positively and negatively correlated detections are made;
- threshold_type: The type of threshold to use; in this case we used "MAD", the median absolute deviation, so our threshold is the MAD multiplier;
- trig_int: The minimum time (in seconds) between triggers. The highest absolute correlation value will be used within this window if multiple possible triggers occur. Note that this only applies within the detections from one template;
- plot: We turned plotting off in this case;
- return_stream: Setting this to True will return the pre-processed stream downloaded from the client, allowing us to reuse this stream for later processing.
Let's have a look at the cumulative detections we made.
[6]:
fig = party.plot(plot_grouped=True)

We were returned a Party and a Stream. The Party is a container for Family objects. Each Family contains a Template and all the detections associated with that Template. Detections are stored as Detection objects.
Let's have a look at the most productive family:
[7]:
family = sorted(party.families, key=lambda f: len(f))[-1]
print(family)
Family of 106 detections from template 2004_09_28t20_05_53
[8]:
fig = family.template.st.plot(equal_scale=False, size=(800, 600))

We can get a dictionary of streams for each detection in a Family and look at some of those:
[9]:
streams = family.extract_streams(stream=st, length=10, prepick=2.5)
print(family.detections[0])
fig = streams[family.detections[0].id].plot(equal_scale=False, size=(800, 600))
Detection on template: 2004_09_28t20_05_53 at: 2004-09-28T17:24:03.440000Z with 20 channels: [('BAP', 'SHZ'), ('BCW', 'EHZ'), ('BJC', 'EHZ'), ('BPO', 'SHZ'), ('BSG', 'EHZ'), ('PAP', 'SHZ'), ('PBW', 'SHZ'), ('PCA', 'EHZ'), ('PGH', 'EHZ'), ('PHP', 'SHZ'), ('PHS', 'SHZ'), ('PJC', 'EHZ'), ('PMM', 'EHZ'), ('PMR', 'EHZ'), ('PPT', 'SHZ'), ('PSA', 'SHZ'), ('PSM', 'SHZ'), ('PSR', 'EHZ'), ('PST', 'EHZ'), ('PTA', 'SHZ')]

On some stations we can clearly see the earthquake that was detected. Remember that these data have not been filtered.
We can also undertake cross-correlation phase-picking using our templates. This is achieved using the lag_calc method on Party or Family objects. You can also use the eqcorrscan.core.lag_calc module directly on other catalogs.
We need to provide a merged stream to the lag-calc function. The stream we obtained from tribe.client_detect has overlaps in it. Using stream.merge() would result in gaps at those overlaps, which we do not want. Instead we use stream.merge(method=1) to include the real data in those gaps.
[10]:
st = st.merge(method=1)
repicked_catalog = party.lag_calc(st, pre_processed=False, shift_len=0.5, min_cc=0.4)
/home/calumch/miniconda3/envs/conda_37/lib/python3.7/site-packages/obspy/signal/detrend.py:35: RuntimeWarning: invalid value encountered in true_divide
data -= x1 + np.arange(ndat) * (x2 - x1) / float(ndat - 1)
This returns a catalog with picks for each detected event. The arguments we have used are:
- The stream that we downloaded previously;
- pre_processed: Our data have not been processed, so we rely on lag_calc to process our data - this means that our data will be processed in the same way as our templates were;
- shift_len: This is the maximum allowed shift (positive and negative) of the pick-time relative to the template moveout, in seconds;
- min_cc: The minimum normalised cross-correlation value required to generate a pick. Picks are made on the maximum correlation within a window.
Let's look at one of the events that we have picked. The correlation value for the pick is stored in a comment:
[20]:
print(repicked_catalog[100].picks[0])
Pick
resource_id: ResourceIdentifier(id="smi:local/08139539-1fec-4707-b60e-b87f68ecfb9e")
time: UTCDateTime(2004, 9, 28, 18, 45, 22, 820000)
waveform_id: WaveformStreamID(network_code='NC', station_code='BPO', channel_code='SHZ', location_code='')
method_id: ResourceIdentifier(id="EQcorrscan")
phase_hint: 'P'
evaluation_mode: 'automatic'
creation_info: CreationInfo(agency_id='eqcorrscan.core.lag_calc')
---------
comments: 1 Elements
[21]:
print(repicked_catalog[100].picks[0].comments[0])
Comment(text='cc_max=0.439941', resource_id=ResourceIdentifier(id="smi:local/383b3803-a499-407b-bf1c-8d78ab4b73d1"))
Note:¶
Using long templates like this that include both P and S energy does not usually produce good correlation derived phase arrivals. This is simply a demonstration. You should carefully select your template length, and ideally include both P and S picks in the catalog that you use to make templates.
Explore the rest of the tutorials and API docs for further hints.
Matched-filter detection¶
This tutorial will cover using both the match-filter objects, and using the internal functions within match-filter. The match-filter objects are designed to simplify meta-data handling allowing for shorter code with fewer mistakes and therefore more consistent results.
Match-filter objects¶
The match-filter module contains five objects: Tribe, Template, Party, Family and Detection.
The Tribe object is a container for multiple Template objects. Templates contain the waveforms of the template alongside the metadata used to generate the template. Both Templates and Tribes can be written to disk as tar archives containing the waveform data in miniseed format, event catalogues associated with the Templates (if provided) in quakeml format, and meta-data in a csv file. These archives can be read back in or transferred between machines.
The Detection, Family and Party objects are hierarchical: a single Detection object describes a single event detection, and contains information regarding how the detection was made, what time it was made at, alongside other useful information. It does not store the Template object used for the detection, but does store a reference to the name of the Template. Family objects are containers for multiple Detections made using a single Template (name chosen to match the literature). These objects do contain the Template used for the detections, and as such can be used to re-create the list of detections if necessary. Party objects are containers for multiple Family objects. All objects in the detection hierarchy have read and write methods - we recommend writing to tar archives (default) for Party and Family objects, as this will store all metadata used in detection, which should allow for straightforward reproduction of results.
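For example, a minimal sketch of the write/read round-trip (the filenames are hypothetical, and tribe and party are assumed to already exist):

>>> from eqcorrscan.core.match_filter import Tribe, Party
>>> tribe.write('my_tribe')  # writes my_tribe.tgz
>>> tribe_back = Tribe().read('my_tribe.tgz')
>>> party.write('my_party')  # stores detections, families and templates together
>>> party_back = Party().read('my_party.tgz')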
Template creation¶
Templates have a construct method which accesses the functions in template_gen. Template.construct only has access to methods that work on individual events, and not catalogs; for that use the Tribe.construct method. For example, we can use the from_sac method to make a Template from a series of SAC files associated with a single event:
>>> import glob
>>> from eqcorrscan.core.match_filter import Template
>>> import os
>>> from eqcorrscan import tests
>>> # Get the path for the test-data so we can test this
>>> TEST_PATH = os.path.dirname(tests.__file__)
>>> sac_files = glob.glob(TEST_PATH + '/test_data/SAC/2014p611252/*')
>>> # sac_files is now a list of all the SAC files for event id:2014p611252
>>> template = Template().construct(
... method='from_sac', name='test', lowcut=2.0, highcut=8.0,
... samp_rate=20.0, filt_order=4, prepick=0.1, swin='all',
... length=2.0, sac_files=sac_files)
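Templates can then be stored on disk as a tar archive of the waveform plus its metadata; a minimal sketch with a hypothetical filename:
>>> template.write('test_template.tgz')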
Tribe creation¶
As alluded to above, Template.construct only works for individual events; to make many templates we have to use the Tribe.construct method. The syntax is similar, but we don't specify names - templates are named according to their start-time, but you can rename them later if you wish:
>>> from eqcorrscan.core.match_filter import Tribe
>>> from obspy.clients.fdsn import Client
>>> client = Client('NCEDC')
>>> catalog = client.get_events(eventid='72572665', includearrivals=True)
>>> # To speed the example we have a catalog of one event, but you can have
>>> # more, we are also only using the first five picks, again to speed the
>>> # example.
>>> catalog[0].picks = catalog[0].picks[0:5]
>>> tribe = Tribe().construct(
... method='from_client', catalog=catalog, client_id='NCEDC', lowcut=2.0,
... highcut=8.0, samp_rate=20.0, filt_order=4, length=6.0, prepick=0.1,
... swin='all', process_len=3600, all_horiz=True)
Matched-filter detection using a Tribe¶
Both Tribe and Template objects have detect methods. These methods call the main match_filter function. They can be given an un-processed stream and will complete the appropriate processing using the same processing values stored in the Template objects. Because Tribe objects can contain Templates with a range of processing values, this work is completed in groups of Templates that share the same processing values. The Tribe object also has a client_detect method which will download the appropriate data. Both detect and client_detect methods return Party objects.
For example, we can use the Tribe we created above to detect through a day of data by running the following:
>>> from obspy import UTCDateTime
>>> party, stream = tribe.client_detect(
... client=client, starttime=UTCDateTime(2016, 1, 2),
... endtime=UTCDateTime(2016, 1, 3), threshold=8, threshold_type='MAD',
... trig_int=6, plotvar=False, return_stream=True)
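The returned Party can then be explored; as a sketch, to see how many detections each template made you can iterate over its Families:
>>> for family in party:
...     print(family.template.name, len(family))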
Generating a Party from a Detection csv¶
If you are moving from detections written out as a csv file from an older version of EQcorrscan, but want to use Party objects now, then this section is for you!
First, you need to generate a Tribe from the templates you used to make the detections. Instructions for this are in the Template creation tutorial section.
Once you have a Tribe, you can generate a Party using the following:
>>> from eqcorrscan.core.match_filter import read_detections, Party, Family
>>> # detection_file is the path to your existing csv of detections
>>> detections = read_detections(detection_file)
>>> party = Party()
>>> for template in tribe:
... template_detections = [d for d in detections
... if d.template_name == template.name]
... family = Family(template=template, detections=template_detections)
... party += family
Lag-calc using a Party¶
Because parties contain Detection and Template information they can be used to generate re-picked catalogues using lag-calc:
>>> stream = stream.merge().sort(['station'])
>>> repicked_catalog = party.lag_calc(stream, pre_processed=False,
... shift_len=0.2, min_cc=0.4)
By using the above examples you can go from a standard catalog available from data centers, to a matched-filter detected and cross-correlation repicked catalog in a handful of lines.
Simple example - match_filter.match_filter¶
This example does not work out of the box; you will need your own templates and data, and to set things up accordingly. However, in principle, matched-filtering can be as simple as:
from eqcorrscan.core.match_filter import match_filter
from eqcorrscan.utils import pre_processing
from obspy import read

# Read in and process the daylong data
st = read('continuous_data')
# Use the same filtering and sampling parameters as your template!
st = pre_processing.dayproc(
    st, lowcut=2, highcut=10, filt_order=4, samp_rate=50,
    starttime=st[0].stats.starttime.date)
# Read in the templates
templates = []
template_names = ['template_1', 'template_2']
for template_file in template_names:
    templates.append(read(template_file))
detections = match_filter(
    template_names=template_names, template_list=templates, st=st,
    threshold=8, threshold_type='MAD', trig_int=6, plotvar=False, cores=4)
This will create a list of detections, which are of class Detection. You can write the detections out to a csv (colon separated) using the Detection.write method; set append=True to write all the detections to one file. Beware though: if append=True and the file already exists, the new detections will simply be added to the old file.
for detection in detections:
    detection.write('my_first_detections.csv', append=True)
Data gaps and how to handle them¶
Data containing gaps can prove problematic for normalized cross-correlation. Because the correlations are normalized by the standard deviation of the data, if the standard deviation is low, floating-point rounding errors can occur. EQcorrscan tries to avoid this in two ways:
- In the eqcorrscan.utils.correlate (fftw) functions, correlations are not computed when the variance of the data window is less than 1e-10, when there are fewer than template_len - 1 non-flat data values (e.g. at least one sample that is not in a gap), or when the mean of the data multiplied by the standard deviation of the data is less than 1e-10.
- The pre_processing functions fill gaps prior to processing, process the data, then edit the data within the gaps to be zeros. During processing aliased signal will appear in the gaps, so it is important to remove those artifacts to ensure that gaps contain zeros (which will be correctly identified by the correlate functions).
As a caveat of point 1: if your data have very low variance but are real data, they will be artificially gained by pre_processing to ensure stable correlations.
If you provide data with filled gaps (e.g. you used st = st.merge(fill_value=0)) to either the correlate or match_filter functions then you will end up with the wrong result. You should provide data with gaps maintained, but merged (e.g. run st = st.merge() before passing the data to those functions).
If you have data that you know contains gaps that have been padded you must remove the pads and reinstate the gaps.
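As a minimal sketch of gap-safe preparation (assuming st is an obspy Stream read from files that may contain masked or gappy data; note that split() cannot recover gaps that were padded with literal zeros - those you must remove yourself):
from obspy import read

st = read('continuous_data')
st = st.split()  # split masked (previously merged) traces back into contiguous segments
st = st.merge()  # re-merge WITHOUT a fill_value so that gaps remain masked
# st is now safe to pass to the pre_processing and match_filter functions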
Memory limitations and what to do about it¶
You may (if you are running large numbers of templates, long data durations, or using a machine with limited memory) run into errors to do with memory consumption. The most obvious symptom of this is your computer freezing because it has allocated all of its RAM, or declaring that it cannot allocate memory. Because EQcorrscan computes correlations in parallel for multiple templates for the same data period, it will generate a large number of correlation vectors. At start-up, EQcorrscan will try to assign the memory it needs (although it then requires a little more later to do the summation across channels), so you might find that it fills your memory very early - this is just to increase efficiency and ensure that the memory is available when needed.
To get around memory limitations you can:
- Reduce the number of templates you run in parallel at once - for example you can make groups of a number of templates and run each group in parallel before moving on to the next group (see the sketch after this list). This is not much less efficient, unless you have a machine with more CPU cores than your group-size.
- Reduce the length of data you are correlating at any one time. The default is to use day-long files, but there is nothing stopping you using shorter waveform durations.
- Reduce the number of channels in templates to only those that you need. Note, EQcorrscan will generate vectors of zeros for templates that are missing a channel that is present in other templates, again for processing efficiency, if not memory efficiency.
- Reduce your sampling rate. Obviously this needs to be at-least twice as large as your upper frequency filter, but much above this is wasted data.
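As a sketch of the first option (assuming a tribe and client as in the Tribe examples above, with starttime and endtime defined; group_size is a hypothetical value you should tune to your machine):
from eqcorrscan.core.match_filter import Party, Tribe

group_size = 20  # hypothetical: number of templates to correlate at once
party = Party()
templates = sorted(tribe.templates, key=lambda t: t.name)
for i in range(0, len(templates), group_size):
    # Detect with a small sub-Tribe, then accumulate results in one Party
    sub_tribe = Tribe(templates=templates[i:i + group_size])
    party += sub_tribe.client_detect(
        client=client, starttime=starttime, endtime=endtime,
        threshold=8, threshold_type='MAD', trig_int=6, plotvar=False)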
The three threshold parameters¶
EQcorrscan detects both positively and negatively correlated waveforms. The match-filter routine has three key threshold parameters:
- threshold_type can either be MAD, abs or av_chan_corr. MAD stands for Median Absolute Deviation and is the most commonly used detection statistic in matched-filter studies. abs is the absolute cross-channel correlation sum, note that if you have different numbers of channels in your templates then this threshold metric probably isn’t for you. av_chan_corr sets a threshold in the cross-channel correlation sum based on av_chan_corr x number of channels.
- threshold is the value used for the above metric.
- trig_int is the minimum interval in seconds for a detection using the same template. If there are multiple detections within this window for a single template then EQcorrscan will only give the best one (that exceeds the threshold the most).
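As a rough numpy sketch of how each threshold type maps to an absolute cut-off on the cross-channel correlation sum (this mirrors the logic rather than reproducing the internal implementation):
import numpy as np

cccsum = 0.05 * np.random.randn(720000)  # stand-in cross-channel correlation sum
n_chans = 10                             # number of channels summed into cccsum

threshold = 8  # as passed to match_filter or detect
thresh_mad = threshold * np.median(np.abs(cccsum))  # 'MAD': median absolute deviation (about zero)
thresh_abs = threshold                              # 'abs': used directly as an absolute value
thresh_av = 0.3 * n_chans                           # 'av_chan_corr': threshold x n_chans (threshold=0.3 here)
peaks = np.where(np.abs(cccsum) >= thresh_mad)[0]   # both polarities are detected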
Advanced example - match_filter.match_filter¶
In this section we will outline using the templates generated in the first tutorial to scan for similar earthquakes within a day of data. This small example does not truly exploit the parallel operations within this package, however, so you are encouraged to think about where parallel operations occur (hint: the code can run one template per CPU), and why there are --instance and --splits flags in the other scripts in the github repository (hint: if you have heaps of memory and CPUs you can do some brute-force day parallelisation!).
The main processing flow is outlined in the figure below. Note that the main speed-ups in this process are achieved by running multiple templates at once; however, this increases memory usage. If memory is a problem there are flags (mem_issue) in the match_filter.py source that can be turned on - the code will then write temporary files, which is slower, but can allow for more data crunching at once; your trade-off, your call.

"""
Simple tutorial to demonstrate some of the basic capabilities of the EQcorrscan
matched-filter detection routine. This builds on the template generation
tutorial and uses those templates. If you haven't run that tutorial script
then you will need to before you can run this script.
"""
import glob
import logging
from http.client import IncompleteRead
from multiprocessing import cpu_count
from obspy.clients.fdsn import Client
from obspy import UTCDateTime, Stream, read
from eqcorrscan.utils import pre_processing
from eqcorrscan.utils import plotting
from eqcorrscan.core import match_filter
# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s\t%(name)s\t%(levelname)s\t%(message)s")


def run_tutorial(plot=False, process_len=3600, num_cores=cpu_count(),
                 **kwargs):
    """Main function to run the tutorial dataset."""
    # First we want to load our templates
    template_names = glob.glob('tutorial_template_*.ms')
    if len(template_names) == 0:
        raise IOError('Template files not found, have you run the template '
                      'creation tutorial?')
    templates = [read(template_name) for template_name in template_names]
    # Work out what stations we have and get the data for them
    stations = []
    for template in templates:
        for tr in template:
            stations.append((tr.stats.station, tr.stats.channel))
    # Get a unique list of stations
    stations = list(set(stations))
    # We will loop through the data in chunks; these chunks can be any
    # size, in general we have used 1 day as our standard, but this can be
    # as short as five minutes (for MAD thresholds) or shorter for other
    # threshold metrics. However the chunk size should be the same as your
    # template process_len.
    # You should test different parameters!!!
    start_time = UTCDateTime(2016, 1, 4)
    end_time = UTCDateTime(2016, 1, 5)
    chunks = []
    chunk_start = start_time
    while chunk_start < end_time:
        chunk_end = chunk_start + process_len
        if chunk_end > end_time:
            chunk_end = end_time
        chunks.append((chunk_start, chunk_end))
        chunk_start += process_len

    unique_detections = []

    # Set up a client to access the GeoNet database
    client = Client("GEONET")

    # Note that these chunks do not rely on each other, and could be run in
    # parallel on multiple nodes of a distributed cluster, see the SLURM
    # tutorial for an example of this.
    for t1, t2 in chunks:
        # Generate the bulk information to query the GeoNet database
        bulk_info = []
        for station in stations:
            bulk_info.append(('NZ', station[0], '*',
                              station[1][0] + 'H' + station[1][-1], t1, t2))
        # Note this will take a little while.
        print('Downloading seismic data, this may take a while')
        st = Stream()
        for _bulk in bulk_info:
            try:
                st += client.get_waveforms(*_bulk)
            except IncompleteRead:
                print(f"Could not download {_bulk}")
        # Merge the stream, it will be downloaded in chunks
        st.merge()

        # Pre-process the data to set frequency band and sampling rate
        # Note that this is, and MUST BE the same as the parameters used for
        # the template creation.
        print('Processing the seismic data')
        st = pre_processing.shortproc(
            st, lowcut=2.0, highcut=9.0, filt_order=4, samp_rate=20.0,
            num_cores=num_cores, starttime=t1, endtime=t2)
        # Convert from list to stream
        st = Stream(st)

        # Now we can conduct the matched-filter detection
        detections = match_filter.match_filter(
            template_names=template_names, template_list=templates,
            st=st, threshold=8.0, threshold_type='MAD', trig_int=6.0,
            plotvar=plot, plotdir='.', cores=num_cores,
            plot_format='png', **kwargs)

        # Now let's try and work out how many unique events we have, just to
        # compare with the GeoNet catalog of 20 events on this day in this
        # sequence
        for master in detections:
            keep = True
            for slave in detections:
                if not master == slave and abs(master.detect_time -
                                               slave.detect_time) <= 1.0:
                    # If the events are within 1s of each other then test
                    # which was the 'best' match, strongest detection
                    if not master.detect_val > slave.detect_val:
                        keep = False
                        print('Removed detection at %s with cccsum %s'
                              % (master.detect_time, master.detect_val))
                        print('Keeping detection at %s with cccsum %s'
                              % (slave.detect_time, slave.detect_val))
                        break
            if keep:
                unique_detections.append(master)
                print('Detection at :' + str(master.detect_time) +
                      ' for template ' + master.template_name +
                      ' with a cross-correlation sum of: ' +
                      str(master.detect_val))
                # We can plot these too
                if plot:
                    stplot = st.copy()
                    template = templates[template_names.index(
                        master.template_name)]
                    lags = sorted([tr.stats.starttime for tr in template])
                    maxlag = lags[-1] - lags[0]
                    stplot.trim(starttime=master.detect_time - 10,
                                endtime=master.detect_time + maxlag + 10)
                    plotting.detection_multiplot(
                        stplot, template, [master.detect_time.datetime])
    print('We made a total of ' + str(len(unique_detections)) + ' detections')
    return unique_detections


if __name__ == '__main__':
    run_tutorial()
SLURM example¶
When the authors of EQcorrscan work on large projects, we use grid computers with the SLURM (Simple Linux Utility for Resource Management) job scheduler installed. To make setup easier, what follows is an example of how we run this.
#!/bin/bash
#SBATCH -J MatchTest
#SBATCH -A ##########
#SBATCH --time=12:00:00
#SBATCH --mem=7G
#SBATCH --nodes=1
#SBATCH --output=matchout_%a.txt
#SBATCH --error=matcherr_%a.txt
#SBATCH --cpus-per-task=16
#SBATCH --array=0-49
# Load the required modules here.
module load OpenCV/2.4.9-intel-2015a
module load ObsPy/0.10.3rc1-intel-2015a-Python-2.7.9
module load joblib/0.8.4-intel-2015a-Python-2.7.9
# Run your python script using srun
srun python2.7 LFEsearch.py --splits 50 --instance $SLURM_ARRAY_TASK_ID
We use a script (LFEsearch.py) that accepts --splits and --instance flags; the relevant section of the script is as follows:
Split = False
instance = False
if len(sys.argv) == 2:
    flag = str(sys.argv[1])
    if flag == '--debug':
        Test = True
        Prep = False
    elif flag == '--debug-prep':
        Test = False
        Prep = True
    else:
        raise ValueError("I don't recognise the argument, I only know "
                         "--debug and --debug-prep")
elif len(sys.argv) == 5:
    # Arguments to allow the code to be run in multiple instances
    Split = True
    Test = False
    Prep = False
    args = sys.argv[1:len(sys.argv)]
    for i in xrange(len(args)):
        if args[i] == '--instance':
            instance = int(args[i + 1])
            print 'I will run this for instance ' + str(instance)
        elif args[i] == '--splits':
            splits = int(args[i + 1])
            print 'I will divide the days into ' + str(splits) + ' chunks'
elif not len(sys.argv) == 1:
    raise ValueError("I only take one argument, no arguments, or two flags "
                     "with arguments")
else:
    Test = False
    Prep = False
    Split = False
The full script is not included in EQcorrscan, but is available on request.
Template creation (old API)¶
Note: These tutorials are for the functional workflow - for details on creating Template and Tribe objects see the matched filter docs page.
Simple example¶
Within template_gen there are lots of methods for generating templates depending on the type of pick data and waveform data you are using. Have a look at those to check their simple examples.
Example of how to generate a useful template from data available via FDSN (see obspy.clients.fdsn for a list of possible clients):
>>> from obspy.clients.fdsn import Client
>>> from obspy.core.event import Catalog
>>> from eqcorrscan.core.template_gen import template_gen
>>> client = Client('NCEDC')
>>> catalog = client.get_events(eventid='72572665', includearrivals=True)
>>> templates = template_gen(method="from_client", catalog=catalog,
... client_id='NCEDC', lowcut=2.0, highcut=9.0,
... samp_rate=20.0, filt_order=4, length=3.0,
... prepick=0.15, swin='all', process_len=200)
This will download data for a single event (given by eventid) from the NCEDC database, then use that information to download relevant waveform data. These data will then be filtered and cut according to the parameters set.
It is often wise to set these parameters once as variables at the start of your scripts to ensure that the same parameters are used for both template creation and matched-filtering.
The method from_client can cope with multiple events (hence the use of a Catalog object), so you can download a lot of events from your chosen client and generate templates for them all.
Useful considerations¶
With these data-mining techniques, memory consumption is often an issue, as well as speed. To reduce memory consumption and increase efficiency it is often worth using a subset of picks. The advanced example below shows one way to do this. In practice, five picks (and therefore traces in a template) is often sufficient for matched-filter detections. However, you should test this on your own data.
Some other things that you might want to consider when generating templates include:
- Template-length, you probably only want to include the real earthquake signal in your template, so really long templates are probably not the best idea.
- On the same note, don’t include much (if any) data before the P-phase, unless you have good reason to - assuming your noise is random, including noise will reduce the correlations.
- Consider your frequency band - look for peak power in the chosen waveform relative to the noise.
- Coda waves often describe scatterers - scattered waves are very interesting, but may reduce the generality of your templates. If this is what you want, include coda, if you want a more general template, I would suggest not including coda. For examples of this you could try generating a lot of templates from a sequence and computing the SVD of the templates to see where the most coherent energy is (in the first basis vector), or just computing the stack of the waveforms.
Storing templates¶
Templates are returned as obspy Stream objects. You will likely want to store these templates on disk. This is usually best done by saving them as miniseed files. Miniseed uses an efficient compression algorithm and allows multiplexing, which is useful for template storage. However we do not constrain you to this.
>>> templates[0].write('template.ms', format="MSEED")
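Templates saved this way can be read back in with obspy:
>>> from obspy import read
>>> template = read('template.ms')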
Advanced example¶
In this example we will download some data and picks from the New Zealand GeoNet database and make use of the functions in EQcorrscan to quickly and simply generate templates for use in matched-filter detection. In the example we are looking at an earthquake sequence on the east coast of New Zealand’s North Island that occurred on the 4th of January 2016. We will take a set of four template events from the sequence that have been picked by GeoNet, of a range of magnitudes. You can decide if these were good templates or not. You could easily extend this by choosing more template events (the mainshock in the sequence is a M 5 and you can get the information by clicking here).
You do not need to use data from an online server: many pick formats can be parsed, either by obspy, or (for seisan pick files) through the sfile_util module in EQcorrscan.
This template script is written to be general, so you should be able to give command line arguments to the script to generate templates from other FDSN databases. Note that some data-centers do not support full FDSN quakeml files, and working out which do is quite painful.
"""
Simple tutorial detailing how to generate a series of templates from catalog\
data available online.
"""
import logging
from obspy.clients.fdsn import Client
from obspy.core.event import Catalog
from eqcorrscan.utils.catalog_utils import filter_picks
from eqcorrscan.core import template_gen
# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s\t%(name)s\t%(levelname)s\t%(message)s")


def mktemplates(
        network_code='GEONET', plot=True, public_ids=None):
    """Functional wrapper to make templates"""
    public_ids = public_ids or [
        '2016p008122', '2016p008353', '2016p008155', '2016p008194']
    client = Client(network_code)
    # We want to download a few events from an earthquake sequence, these are
    # identified by publicID numbers, given as arguments
    catalog = Catalog()
    for public_id in public_ids:
        try:
            catalog += client.get_events(
                eventid=public_id, includearrivals=True)
        except TypeError:
            # Cope with some FDSN services not implementing includearrivals
            catalog += client.get_events(eventid=public_id)
    # Let's plot the catalog to see what we have
    if plot:
        catalog.plot(projection='local', resolution='h')
    # We don't need all the picks, let's take the information from the
    # five most used stations - note that this is done to reduce computational
    # costs.
    catalog = filter_picks(catalog, top_n_picks=5)
    # We only want the P picks in this example, but you can use others or all
    # picks if you want.
    for event in catalog:
        # Iterate over a copy of the pick list so that removal is safe
        for pick in list(event.picks):
            if pick.phase_hint == 'S':
                event.picks.remove(pick)
    # Now we can generate the templates
    templates = template_gen.template_gen(
        method='from_client', catalog=catalog, client_id=network_code,
        lowcut=2.0, highcut=9.0, samp_rate=20.0, filt_order=4, length=3.0,
        prepick=0.15, swin='all', process_len=3600, plot=plot)
    # We now have a series of templates! Using Obspy's Stream.write() method we
    # can save these to disk for later use. We will do that now for use in the
    # following tutorials.
    for i, template in enumerate(templates):
        template.write('tutorial_template_' + str(i) + '.ms', format='MSEED')
        # Note that this will warn you about data types. As we don't care
        # at the moment, whatever obspy chooses is fine.
    return


if __name__ == '__main__':
    """Wrapper for template creation"""
    import sys
    import warnings
    if not len(sys.argv) > 1:
        warnings.warn('Needs a network ID followed by a list of event IDs, '
                      'will run the test case instead')
        mktemplates()
    else:
        net_code = sys.argv[1]
        idlist = list(sys.argv)[2:]
        print(idlist)
        mktemplates(network_code=net_code, public_ids=idlist)
Try this example for another, Northern California Data Center earthquake:
python template_creation.py NCEDC 72572665
This will (unfortunately for you) generate a warning about un-labelled picks, as many data-centers do not provide phase-hints with their picks. We care about which phase is which when we have to run our cross-correlation re-picker (yet to be completed), which means that we, by convention, assign P-picks to the vertical channel and S-picks to both horizontal channels.
This will also show template waveforms for both the automatic and the manual picks - you can change this by removing either automatic or manual picks by setting the evaluation_mode flag in eqcorrscan.utils.catalog_utils.filter_picks().
Note: To run this script, and for all map plotting, you will need to install the matplotlib basemap toolkit. Install instructions and a link to the download are here. We recommend that you install the package using your system package manager if it is available.
Important considerations¶
In this tutorial we enforce downloading of day-long data for the template generation. This is to ensure that the data we make the template from, and the data we use for detection are processed in exactly the same way. If we were to only download a short segment of data around the event and process this we would find that the resampling process would result in minor differences between the templates and the continuous data. This has the effect that, for self-detections, the cross-correlation values are less than 1.
This is an important effect and something that you should consider when generating your own templates. You MUST process your templates in the exact same way (using the same routines, same filters, same resampling, and same data length) as your continuous data. It can have a very significant impact on your results.
The functions provided in template_gen are there to aid you, but if you look at the source code, all they are doing is:
- Detrending;
- Resampling;
- Filtering;
- and cutting.
If you want to do these things another way you are more than welcome to!
Converting from templates to Template and Tribe objects¶
This section should guide you through generating eqcorrscan.core.match_filter.Template and eqcorrscan.core.match_filter.Tribe objects from the simple templates created above. This should provide you with a means to migrate from older scripts to the more modern, object-oriented workflows in matched filter.
To generate a eqcorrscan.core.match_filter.Template from a cut, processed waveform generated by the template_gen functions you must fill all the attributes of the eqcorrscan.core.match_filter.Template object. You must use the parameters you used when generating the template waveform. Note that you do not need to include an event to make a complete eqcorrscan.core.match_filter.Template object, but it will be beneficial for further location of detected events.
In the following example we take a cut-waveform and convert it to a eqcorrscan.core.match_filter.Template object using the parameters we used to generate the waveform:
>>> from eqcorrscan.core.match_filter import Template
>>> template = Template(
... name='test_template', st=templates[0], lowcut=2.0, highcut=9.0,
... samp_rate=20.0, filt_order=4, prepick=0.15, process_length=200)
You can then make a eqcorrscan.core.match_filter.Tribe from a list of eqcorrscan.core.match_filter.Template objects as follows:
>>> from eqcorrscan.core.match_filter import Tribe
>>> tribe = Tribe(templates=[template])
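Tribes can similarly be written to, and read back from, a tar archive; a minimal sketch (my_tribe is a hypothetical filename, and the .tgz extension is added on writing):
>>> tribe.write('my_tribe')
>>> tribe_back = Tribe().read('my_tribe.tgz')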
Subspace Detection¶
EQcorrscan’s subspace detection methods are closely modelled on the method described by Harris (2006), Subspace Detectors: Theory. We offer options to multiplex data or leave as single-channels (multiplexing is significantly faster).
Subspace detection is implemented in an object-oriented style, whereby individual detectors are constructed from data, then used to detect within continuous data. At the core of the subspace routine is a Cython-based, static-typed routine to calculate the detection statistics. We do this to make use of numpy’s vectorized calculations, while taking advantage of the speed-ups afforded by compiling the sliding window loop.
WARNING¶
Subspace in EQcorrscan is in beta, you must check your results match what you expect - if you find errors please report them. Although our test-cases run correctly, changes in data quality may affect the routines in ways we have not accounted for.
Important¶
How you generate your detector is likely to be the most important thing: careful selection and alignment is key, and because of this we haven't provided a total cookie-cutter system for doing it. You have freedom to choose your parameters, how to process, how you align, what traces to keep, whether you multiplex or not, etc. This also means you have a lot of freedom to get it wrong. You will have to do significant testing with your own dataset to work out what works and what doesn't. If you find anything that doesn't work well in EQcorrscan's system, we would be grateful to hear about it so that we can make it better.
The following examples demonstrate some of the options, but not all of them. The advanced example is the example used to test and develop subspace and took a fair amount of effort over a number of weeks.
Simple example¶
To begin with you will need to create a Detector:
>>> from eqcorrscan.core import subspace
>>> detector = subspace.Detector()
This will create an empty detector object. These objects have various attributes, including the data to be used as a detector (detector.data), alongside the full input and output basis vector matrices (detector.u and detector.v respectively) and the vector of singular-values (detector.sigma). Meta-data are also included, including whether the detector is multiplexed or not (detector.multiplex), the filters applied (detector.lowcut, detector.highcut, detector.filt_order, detector.sampling_rate), the dimension of the subspace (detector.dimension), and the name of the detector, which you can use for book-keeping (detector.name).
To populate the empty detector you need a design set of streams that have been aligned (see clustering submodule for alignment methods).
>>> from obspy import read
>>> import glob
>>> import os
>>> from eqcorrscan import tests
>>> # Get the path for the test-data so we can test this
>>> TEST_PATH = os.path.dirname(tests.__file__)
>>> wavefiles = glob.glob(
... TEST_PATH + '/test_data/similar_events_processed/*')
>>> wavefiles.sort() # Sort the wavefiles to ensure reproducibility
>>> streams = [read(w) for w in wavefiles[0:3]]
>>> # Channels must all be the same length
>>> detector.construct(streams=streams, lowcut=2, highcut=9, filt_order=4,
... sampling_rate=20, multiplex=True, name='Test_1',
... align=True, shift_len=0.5, reject=0.2)
Detector: Test_1
This will populate all the attributes of your detector object, and fill the detector.data with the full input basis vector matrix.
You will want to reduce the dimensions of your subspace detector, such that you are just describing the signal, preferably with a lot of generality. Details for selecting dimensionality should be found in Harris (2006). To do this in EQcorrscan simply use the partition method:
>>> detector.partition(2)
Detector: Test_1
This will populate detector.data with the two left-most input basis vectors. You can test to see how much of your original design set is described by this detector by using the energy_capture method:
>>> percent_capture = detector.energy_capture()
This will return a percentage capture, you can run this for multiple dimensions to test what dimension best suits your application. Again, details for this selection can be found in Harris (2006).
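For example, you could scan a few candidate dimensions and compare the captures (a sketch; detector.u retains the full basis, so repeatedly re-partitioning the same detector is safe):
>>> captures = {}
>>> for dimension in [1, 2, 3]:
...     detector = detector.partition(dimension)  # partition returns the detector
...     captures[dimension] = detector.energy_capture()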
Finally, to use your detector to detect within continuous data you should use the detect method. This requires a stream with the same stations and channels used in the detector, and a threshold from 0-1, where 0 is no signal, and 1 is totally described by your detector. You can extract streams for the detections at the same time as the detections by setting the extract_detections flag to True.
>>> stream = read(wavefiles[0])
>>> detections = detector.detect(st=stream, threshold=0.5, trig_int=3)
Advanced Example¶
This example computes detections for a short data-period during an earthquake sequence in the Wairarapa region of New Zealand’s North Island. This example only shows one subspace detector, but could be extended, using the various clustering routines in EQcorrscan, to create many subspace detectors. These could be run using the subspace_detect function, which runs similar detectors in parallel through the given data.
"""
Advanced subspace tutorial to show some of the capabilities of the method.
This example uses waveforms from a known earthquake sequence (in the Wairarapa
region north of Wellington, New Zealand). The catalogue locations etc can
be downloaded from this link:
http://quakesearch.geonet.org.nz/services/1.0.0/csv?bbox=175.37956,-40.97912,175.53097,-40.84628&startdate=2015-7-18T2:00:00&enddate=2016-7-18T3:00:00
"""
import logging
from http.client import IncompleteRead
from obspy.clients.fdsn import Client
from obspy import UTCDateTime, Stream
from eqcorrscan.utils.catalog_utils import filter_picks
from eqcorrscan.utils.clustering import catalog_cluster
from eqcorrscan.core import subspace
# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s\t%(name)s\t%(levelname)s\t%(message)s")


def run_tutorial(plot=False, multiplex=True, return_streams=False, cores=4,
                 verbose=False):
    """
    Run the tutorial.

    :return: detections
    """
    client = Client("GEONET", debug=verbose)
    cat = client.get_events(
        minlatitude=-40.98, maxlatitude=-40.85, minlongitude=175.4,
        maxlongitude=175.5, starttime=UTCDateTime(2016, 5, 1),
        endtime=UTCDateTime(2016, 5, 20))
    print(f"Downloaded a catalog of {len(cat)} events")
    # This gives us a catalog of events - it takes a while to download all
    # the information, so give it a bit!
    # We will generate a five station, multi-channel detector.
    cat = filter_picks(catalog=cat, top_n_picks=5)
    stachans = list(set(
        [(pick.waveform_id.station_code, pick.waveform_id.channel_code)
         for event in cat for pick in event.picks]))
    # In this tutorial we will only work on one cluster, defined spatially.
    # You can work on multiple clusters, or try the whole set.
    clusters = catalog_cluster(
        catalog=cat, metric="distance", thresh=2, show=False)
    # We will work on the largest cluster
    cluster = sorted(clusters, key=lambda c: len(c))[-1]
    # This cluster contains 32 events, we will now download and trim the
    # waveforms. Note that each channel must start at the same time and be
    # the same length for multiplexing. If not multiplexing, EQcorrscan will
    # maintain the individual differences in time between channels and delay
    # the detection statistics by that amount before stacking and detection.
    client = Client('GEONET')
    design_set = []
    st = Stream()
    for event in cluster:
        print(f"Downloading for event {event.resource_id.id}")
        t1 = event.origins[0].time
        t2 = t1 + 25.1  # Have to download extra data, otherwise GeoNet will
        # trim wherever suits.
        t1 -= 0.1
        for station, channel in stachans:
            try:
                st += client.get_waveforms(
                    'NZ', station, '*', channel[0:2] + '?', t1, t2)
            except IncompleteRead:
                print(f"Could not download for {station} {channel}")
    print(f"Downloaded {len(st)} channels")
    for event in cluster:
        t1 = event.origins[0].time
        t2 = t1 + 25
        design_set.append(st.copy().trim(t1, t2))
    # Construction of the detector will process the traces, then align them,
    # before multiplexing.
    print("Making detector")
    detector = subspace.Detector()
    detector.construct(
        streams=design_set, lowcut=2.0, highcut=9.0, filt_order=4,
        sampling_rate=20, multiplex=multiplex, name='Wairarapa1', align=True,
        reject=0.2, shift_len=6, plot=plot).partition(9)
    print("Constructed Detector")
    if plot:
        detector.plot()
    # We also want the continuous stream to detect in.
    t1 = UTCDateTime(2016, 5, 11, 19)
    t2 = UTCDateTime(2016, 5, 11, 20)
    # We are going to look in a single hour just to minimize cost, but you can
    # run for much longer.
    bulk_info = [('NZ', stachan[0], '*',
                  stachan[1][0] + '?' + stachan[1][-1],
                  t1, t2) for stachan in detector.stachans]
    print("Downloading continuous data")
    st = client.get_waveforms_bulk(bulk_info)
    st.merge().detrend('simple').trim(starttime=t1, endtime=t2)
    # We set a very low threshold because the detector is not that great, we
    # haven't aligned it particularly well - however, at this threshold we
    # make two real detections.
    print("Computing detections")
    detections, det_streams = detector.detect(
        st=st, threshold=0.4, trig_int=2, extract_detections=True,
        cores=cores)
    if return_streams:
        return detections, det_streams
    else:
        return detections


if __name__ == '__main__':
    run_tutorial()
Lag-time and pick correction¶
The following is a work-in-progress tutorial for lag-calc functionality.
An important note¶
Picks generated by lag-calc are relative to the start of the template waveform, for example, if you generated your templates with a pre_pick of 0.2, you should expect picks to occur 0.2 seconds before the actual phase arrival. The result of this is that origin-times will be shifted by the same amount.
If you have applied different pre_picks to different channels when generating templates (currently not supported by any EQcorrscan functions), then picks generated here will not give the correct location.
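If you need absolute pick-times you can add the template pre_pick back on; a minimal sketch (assuming every channel used the same pre_pick of 0.2 s, as in these tutorials, and that repicked_catalog is the Catalog returned by lag_calc):
pre_pick = 0.2  # seconds; the value used when the templates were cut
for event in repicked_catalog:
    for pick in event.picks:
        pick.time += pre_pick  # picks are early by pre_pick, so shift forwards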
Advanced Example: Parkfield 2004¶
"""Tutorial to illustrate the lag_calc usage."""
import logging
from multiprocessing import cpu_count
from obspy.clients.fdsn import Client
from obspy.clients.fdsn.header import FDSNException
from obspy.core.event import Catalog
from obspy import UTCDateTime, Stream
from eqcorrscan.core import template_gen, match_filter, lag_calc
from eqcorrscan.utils import pre_processing, catalog_utils
# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s\t%(name)s\t%(levelname)s\t%(message)s")


def run_tutorial(min_magnitude=2, shift_len=0.2, num_cores=4, min_cc=0.5):
    """Functional, tested example script for running the lag-calc tutorial."""
    if num_cores > cpu_count():
        num_cores = cpu_count()
    client = Client('NCEDC')
    t1 = UTCDateTime(2004, 9, 28)
    t2 = t1 + 86400
    print('Downloading catalog')
    catalog = client.get_events(
        starttime=t1, endtime=t2, minmagnitude=min_magnitude,
        minlatitude=35.7, maxlatitude=36.1, minlongitude=-120.6,
        maxlongitude=-120.2, includearrivals=True)
    # We don't need all the picks, let's take the information from the
    # five most used stations - note that this is done to reduce computational
    # costs.
    catalog = catalog_utils.filter_picks(
        catalog, channels=['EHZ'], top_n_picks=5)
    # There is a duplicate pick in event 3 in the catalog - this has the
    # effect of reducing our detections - check it yourself.
    for pick in list(catalog[3].picks):
        if pick.waveform_id.station_code == 'PHOB' and \
                pick.onset == 'emergent':
            catalog[3].picks.remove(pick)
    print('Generating templates')
    templates = template_gen.template_gen(
        method="from_client", catalog=catalog, client_id='NCEDC',
        lowcut=2.0, highcut=9.0, samp_rate=50.0, filt_order=4, length=3.0,
        prepick=0.15, swin='all', process_len=3600)
    # In this section we generate a series of chunks of data.
    start_time = UTCDateTime(2004, 9, 28, 17)
    end_time = UTCDateTime(2004, 9, 28, 20)
    process_len = 3600
    chunks = []
    chunk_start = start_time
    while chunk_start < end_time:
        chunk_end = chunk_start + process_len
        if chunk_end > end_time:
            chunk_end = end_time
        chunks.append((chunk_start, chunk_end))
        chunk_start += process_len

    all_detections = []
    picked_catalog = Catalog()
    template_names = [template[0].stats.starttime.strftime("%Y%m%d_%H%M%S")
                      for template in templates]
    for t1, t2 in chunks:
        print('Downloading and processing for start-time: %s' % t1)
        # Download and process the data
        bulk_info = [(tr.stats.network, tr.stats.station, '*',
                      tr.stats.channel, t1, t2) for tr in templates[0]]
        # Just downloading a chunk of data
        try:
            st = client.get_waveforms_bulk(bulk_info)
        except FDSNException:
            st = Stream()
            for _bulk in bulk_info:
                st += client.get_waveforms(*_bulk)
        st.merge(fill_value='interpolate')
        st = pre_processing.shortproc(
            st, lowcut=2.0, highcut=9.0, filt_order=4, samp_rate=50.0,
            num_cores=num_cores)
        detections = match_filter.match_filter(
            template_names=template_names, template_list=templates, st=st,
            threshold=8.0, threshold_type='MAD', trig_int=6.0, plotvar=False,
            plotdir='.', cores=num_cores)
        # Extract unique detections from set.
        unique_detections = []
        for master in detections:
            keep = True
            for slave in detections:
                if not master == slave and \
                        abs(master.detect_time - slave.detect_time) <= 1.0:
                    # If the events are within 1s of each other then test
                    # which was the 'best' match, strongest detection
                    if not master.detect_val > slave.detect_val:
                        keep = False
                        break
            if keep:
                unique_detections.append(master)
        all_detections += unique_detections
        picked_catalog += lag_calc.lag_calc(
            detections=unique_detections, detect_data=st,
            template_names=template_names, templates=templates,
            shift_len=shift_len, min_cc=min_cc, interpolate=False, plot=False)
    # Return all of this so that we can use this function for testing.
    return all_detections, picked_catalog, templates, template_names


if __name__ == '__main__':
    run_tutorial(min_magnitude=4, num_cores=cpu_count())
Magnitude calculation¶
EQcorrscan contains both an automatic amplitude picker and a singular-value decomposition derived magnitude calculation, which is very accurate but requires high levels of event similarity.
Relative moment by singular-value decomposition¶
This method closely follows the method outlined by Rubinstein & Ellsworth 2010.
This example requires data downloaded from the eqcorrscan github repository.
>>> from eqcorrscan.utils.mag_calc import svd_moments
>>> from obspy import read
>>> import glob
>>> from eqcorrscan.utils.clustering import svd
>>> import numpy as np
>>> from eqcorrscan import tests
>>> import os
>>> # Get the path for the test-data so we can test this
>>> testing_path = os.path.dirname(
... tests.__file__) + '/test_data/similar_events_processed'
>>> stream_files = glob.glob(os.path.join(testing_path, '*'))
>>> stream_list = [read(stream_file) for stream_file in stream_files]
>>> event_list = []
>>> for i, stream in enumerate(stream_list):
... st_list = []
... for tr in stream:
... # Only use the vertical channels of sites with known high similarity.
... # You do not need to use this step for your data.
... if (tr.stats.station, tr.stats.channel) not in\
... [('WHAT2', 'SH1'), ('WV04', 'SHZ'), ('GCSZ', 'EHZ')]:
... stream.remove(tr)
... continue
... st_list.append(i)
... event_list.append(st_list)
<obspy.core.stream.Stream object at ...>
>>> event_list = np.asarray(event_list).T.tolist()
>>> SVectors, SValues, Uvectors, stachans = svd(stream_list=stream_list)
>>> M, events_out = svd_moments(u=Uvectors, s=SValues, v=SVectors,
... stachans=stachans, event_list=event_list)
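The returned M contains relative moments for the events in events_out. To convert these to magnitudes you need a reference event of known magnitude; a hedged sketch (not an EQcorrscan function; known_index and known_magnitude are hypothetical, and the 2/3 factor comes from the standard moment-magnitude relation):
>>> known_index, known_magnitude = 0, 3.0  # hypothetical reference event
>>> magnitudes = [
...     known_magnitude + (2.0 / 3.0) * np.log10(m / M[known_index])
...     for m in M]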
Clustering and stacking¶
Prior to template generation, it may be beneficial to cluster earthquake waveforms. Clusters of earthquakes with similar properties can then be stacked to create higher signal-to-noise templates that describe the dataset well (and require fewer templates to reduce computational cost). Clusters could also be reduced to their singular-vectors and employed in a subspace detection routine (coming soon to eqcorrscan).
The following outlines a few examples of clustering and stacking.
Cluster in space¶
Download a catalog of global earthquakes and cluster in space, setting the distance threshold to 1,000 km:
>>> from eqcorrscan.utils.clustering import catalog_cluster
>>> from obspy.clients.fdsn import Client
>>> from obspy import UTCDateTime
>>> client = Client("IRIS")
>>> starttime = UTCDateTime("2002-01-01")
>>> endtime = UTCDateTime("2002-02-01")
>>> cat = client.get_events(starttime=starttime, endtime=endtime,
... minmagnitude=6, catalog="ISC")
>>> groups = catalog_cluster(
... catalog=cat, metric="distance", thresh=1000, show=False)
Download a local catalog of earthquakes and cluster much finer (distance threshold of 2km).
>>> client = Client("NCEDC")
>>> cat = client.get_events(starttime=starttime, endtime=endtime,
... minmagnitude=2)
>>> groups = catalog_cluster(
... catalog=cat, metric="distance", thresh=2, show=False)
Setting show to true will plot the dendrogram for the grouping, with individual groups plotted in different colours. The clustering is performed using scipy's hierarchical clustering routines. Specifically, clustering is performed using the linkage method, which is an agglomerative clustering routine. EQcorrscan uses the average method with the euclidean distance metric.
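For reference, the equivalent raw scipy calls look like this (a sketch with stand-in coordinates, not EQcorrscan code):
>>> import numpy as np
>>> from scipy.cluster.hierarchy import fcluster, linkage
>>> from scipy.spatial.distance import pdist
>>> coords = np.random.rand(20, 3) * 100  # stand-in event coordinates in km
>>> Z = linkage(pdist(coords, metric='euclidean'), method='average')
>>> group_ids = fcluster(Z, t=1000, criterion='distance')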
Cluster in time and space¶
EQcorrscan’s space-time clustering routine first computes groups in space, using the space_cluster method, then splits the returned groups based on their inter-event time.
The following example extends the example of the global catalog with a 1,000km distance threshold and a one-day temporal limit.
>>> from eqcorrscan.utils.clustering import space_time_cluster
>>> client = Client("IRIS")
>>> cat = client.get_events(starttime=starttime, endtime=endtime,
... minmagnitude=6, catalog="ISC")
>>> groups = space_time_cluster(catalog=cat, t_thresh=86400, d_thresh=1000)
Cluster according to cross-correlation values¶
Waveforms from events are often best grouped based on their similarity. EQcorrscan has a method to compute clustering based on average cross-correlations. This again uses scipy's hierarchical clustering routines, however in this case clusters are computed using the single method. Distances are computed from the average of the multi-channel cross-correlation values.
The following example uses data stored in the EQcorrscan github repository, in the tests directory.
>>> from obspy import read
>>> import glob
>>> import os
>>> from eqcorrscan.utils.clustering import cluster
>>> from eqcorrscan import tests
>>> # You will need to edit this line to the location of your eqcorrscan repo.
>>> TEST_PATH = os.path.dirname(tests.__file__)
>>> testing_path = TEST_PATH + '/test_data/similar_events_processed'
>>> stream_files = glob.glob(os.path.join(testing_path, '*'))
>>> stream_list = [(read(stream_file), i)
... for i, stream_file in enumerate(stream_files)]
>>> for stream in stream_list:
... for tr in stream[0]:
... if tr.stats.station not in ['WHAT2', 'WV04', 'GCSZ']:
... stream[0].remove(tr)
... continue
>>> groups = cluster(template_list=stream_list, show=False,
... corr_thresh=0.3, cores=2)
Stack waveforms (linear)¶
Following from clustering, similar waveforms can be stacked. EQcorrscan includes two stacking algorithms, a simple linear stacking method, and a phase-weighted stacking method.
The following examples use the test data in the eqcorrscan github repository.
>>> from eqcorrscan.utils.stacking import linstack
>>> # groups[0] should contain 3 streams, which we can now stack
>>> # Groups are returned as lists of tuples, of the stream and event index
>>> group_streams = [st_tuple[0] for st_tuple in groups[0]]
>>> stack = linstack(streams=group_streams)
Stack waveforms (phase-weighted)¶
The phase-weighted stack method closely follows the method outlined by Thurber et al. 2014. In this method the linear stack is weighted by the stack of the instantaneous phase. In this manner, coherent signals are amplified.
>>> from eqcorrscan.utils.stacking import PWS_stack
>>> # groups[0] should contain 3 streams, which we can now stack
>>> # Groups are returned as lists of tuples, of the stream and event index
>>> stack = PWS_stack(streams=group_streams)
EQcorrscan API¶
EQcorrscan contains two main modules, core and utils. Core contains the most useful functions (including matched-filtering), and utils contains a range of helper functions that may be useful to you and/or are used by the core routines.
Core¶
Core routines of the EQcorrscan project.
lag_calc¶
Functions to generate pick-corrections for events detected by correlation.
copyright: EQcorrscan developers.
license: GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html)
Functions for generating pick-corrections from cross-correlations with a template. Originally this was designed for events detected by matched-filtering, however you can use any well correlated events. Based on the method of Shelly and Hardebeck (2010).
Functions¶
eqcorrscan.core.lag_calc.lag_calc(detections, detect_data, template_names, templates, shift_len=0.2, min_cc=0.4, min_cc_from_mean_cc_factor=None, horizontal_chans=['E', 'N', '1', '2'], vertical_chans=['Z'], cores=1, interpolate=False, plot=False, plotdir=None, export_cc=False, cc_dir=None)[source]¶
Cross-correlation derived picking of seismic events.
Overseer function to take a list of detection objects, and cut the data for them to lengths of the same length of the template + shift_len on either side. This will output an obspy.core.event.Catalog of picked events. Pick times are based on the lag-times found at the maximum correlation, providing that correlation is above the min_cc.
Parameters:
- detections (list) – List of eqcorrscan.core.match_filter.Detection objects.
- detect_data (obspy.core.stream.Stream) – All the data needed to cut from - can be a gappy Stream.
- template_names (list) – List of the template names, used to help identify families of events. Must be in the same order as templates.
- templates (list) – List of the templates, templates must be a list of obspy.core.stream.Stream objects.
- shift_len (float) – Shift length allowed for the pick in seconds, will be plus/minus this amount - default=0.2
- min_cc (float) – Minimum cross-correlation value to be considered a pick, default=0.4.
- min_cc_from_mean_cc_factor (float) – If set to a value other than None, then the minimum cross-correlation value for a trace is set individually for each detection based on: min(detect_val / n_chans * min_cc_from_mean_cc_factor, min_cc).
- horizontal_chans (list) – List of channel endings for horizontal-channels, on which S-picks will be made.
- vertical_chans (list) – List of channel endings for vertical-channels, on which P-picks will be made.
- cores (int) – Number of cores to use in parallel processing, defaults to one.
- interpolate (bool) – Interpolate the correlation function to achieve sub-sample precision.
- plot (bool) – To generate a plot for every detection or not, defaults to False.
- plotdir (str) – Path to plotting folder, plots will be output here.
- export_cc (bool) – To generate a binary file in NumPy format for every detection or not, defaults to False.
- cc_dir (str) – Path to saving folder, NumPy files will be output here.
Returns: Catalog of events with picks. No origin information is included. These events can then be written out via obspy.core.event.Catalog.write(), or to Nordic Sfiles using eqcorrscan.utils.sfile_util.eventtosfile() and located externally.
Return type: obspy.core.event.Catalog
Note
Picks output in catalog are generated relative to the template start-time. For example, if you generated your template with a pre_pick time of 0.2 seconds, you should expect picks generated by lag_calc to occur 0.2 seconds before the true phase-pick. This is because we do not currently store template meta-data alongside the templates.
Warning
Because of the above note, origin times will be consistently shifted by the static pre_pick applied to the templates.
Warning
This routine requires only one template per channel (e.g. you should not use templates with a P and S template on a single channel). If this does occur an error will be raised.
Note
S-picks will be made on horizontal channels, and P picks made on vertical channels - the default is that horizontal channels end in one of: ‘E’, ‘N’, ‘1’ or ‘2’, and that vertical channels end in ‘Z’. The options vertical_chans and horizontal_chans can be changed to suit your dataset.
Note
Individual channel cross-correlations are stored as an obspy.core.event.Comment for each pick, and the summed cross-correlation value resulting from these is stored as an obspy.core.event.Comment in the main obspy.core.event.Event object.
Note
The order of events is preserved (e.g. detections[n] == output[n]), providing picks have been made for that event. If no picks have been made for an event, it will not be included in the output. However, as each detection has an ID associated with it, these can be mapped to the output resource_id for each Event in the output Catalog. e.g.
detections[n].id == output[m].resource_id if output[m] is for the same event as detections[n].
Note
The correlation data that are saved to the binary files can be useful to select an appropriate threshold for your data.
eqcorrscan.core.lag_calc.xcorr_pick_family(family, stream, shift_len=0.2, min_cc=0.4, min_cc_from_mean_cc_factor=None, horizontal_chans=['E', 'N', '1', '2'], vertical_chans=['Z'], cores=1, interpolate=False, plot=False, plotdir=None, export_cc=False, cc_dir=None)[source]¶
Compute cross-correlation picks for detections in a family.
Parameters:
- family (eqcorrscan.core.match_filter.family.Family) – Family to calculate correlation picks for.
- stream (obspy.core.stream.Stream) – Data stream containing data for all (or a subset of) detections in the Family
- shift_len (float) – Shift length allowed for the pick in seconds, will be plus/minus this amount - default=0.2
- min_cc (float) – Minimum cross-correlation value to be considered a pick, default=0.4.
- min_cc_from_mean_cc_factor (float) – If set to a value other than None, then the minimum cross-correlation value for a trace is set individually for each detection based on: min(detect_val / n_chans * min_cc_from_mean_cc_factor, min_cc).
- horizontal_chans (list) – List of channel endings for horizontal-channels, on which S-picks will be made.
- vertical_chans (list) – List of channel endings for vertical-channels, on which P-picks will be made.
- cores (int) – Number of cores to use in parallel processing, defaults to one.
- interpolate (bool) – Interpolate the correlation function to achieve sub-sample precision.
- plot (bool) – To generate a plot for every detection or not, defaults to False
- plotdir (str) – Path to plotting folder, plots will be output here.
- export_cc (bool) – To generate a binary file in NumPy for every detection or not, defaults to False
- cc_dir (str) – Path to saving folder, NumPy files will be output here.
Returns: Dictionary of picked events keyed by detection id.
Private Functions¶
Note that these functions are not designed for public use and may change at any point.
eqcorrscan.core.lag_calc._concatenate_and_correlate(streams, template, cores)[source]¶
Concatenate a list of streams into one stream and correlate that with a template.
All traces in a stream must have the same length.
eqcorrscan.core.lag_calc._prepare_data(family, detect_data, shift_len)[source]¶
Prepare data for lag_calc - reduce memory here.
Parameters:
- family (eqcorrscan.core.match_filter.family.Family) – The Family containing the template and detections.
- detect_data (obspy.core.stream.Stream) – Stream to extract detection streams from.
- shift_len (float) – Shift length in seconds allowed for picking.
Returns: Dictionary of detect_streams keyed by detection id to be worked on
Return type: dict
eqcorrscan.core.lag_calc._xcorr_interp(ccc, dt)[source]¶
Interpolate around the maximum correlation value for sub-sample precision.
Parameters:
- ccc (numpy.ndarray) – Cross-correlation array
- dt (float) – sample interval
Returns: Position of interpolated maximum in seconds from start of ccc
Return type: float
match_filter¶
Classes and functions for matched-filtering.
This is designed for large-scale, multi-paralleled detection, with moderate to large numbers (hundreds to a few thousands) of templates.
Object-oriented API¶
EQcorrscan’s matched-filter object oriented API is split into two halves, input (Template and Tribe objects) and output (Detection, Family and Party objects).
- A Tribe is a collection of Templates.
- A Family is a collection of Detections for a single Template.
- A Party is a collection of Families for a collection of Templates (a Tribe).
These objects retain useful meta-data including the obspy Event associated with the Template, and how the Template was processed. A core aim of the object-oriented API is reproducibility, and we encourage first-time users to adopt this from the off.
The methods of Template and Tribe objects mirror each other, and in general you are best-off working with Tribes. Both the Template and Tribe objects have detect methods, which allow you to conduct matched-filtering. The Detection, Family and Party objects contain all the metadata needed to re-create your detections. Furthermore, the Family and Party objects have a lag_calc method for conducting cross-correlation phase-picking based on correlation with the Template.
Object | Purpose |
---|---|
Template | To contain the template waveform and meta-data used to create the template. |
Tribe | A collection of multiple Templates. Use the detect method to run matched-filter detections! |
Detection | Root of the detection object tree - contains information relevant to a single detection from a single template. |
Family | Collection of detections for a single Template. |
Party | Collection of Family objects. |
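A minimal sketch of the object-oriented workflow follows; the file name tribe.tgz, the stream st and the threshold values are placeholders:
>>> from eqcorrscan.core.match_filter import Tribe  # doctest: +SKIP
>>> tribe = Tribe().read('tribe.tgz')  # doctest: +SKIP
>>> party = tribe.detect(
...     stream=st, threshold=8.0, threshold_type='MAD',
...     trig_int=6.0)  # doctest: +SKIP
>>> repicked_catalog = party.lag_calc(
...     st, pre_processed=False, shift_len=0.2, min_cc=0.4)  # doctest: +SKIP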
Function-based API¶
Functions for network matched-filter detection of seismic data.
Designed to cross-correlate templates generated by template_gen function with data and output the detections.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
match_filter | Main matched-filter detection function. |
Helper functions for network matched-filter detection of seismic data.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
extract_from_stream | Extract waveforms for a list of detections from a stream. |
normxcorr2 | Thin wrapper to eqcorrscan.utils.correlate functions. |
temporary_directory | Make a temporary directory, yield its name, and clean up on exit. |
subspace¶
This module contains functions relevant to executing subspace detection for earthquake catalogs.
We recommend that you read Harris’ detailed report on subspace detection theory which can be found here: https://e-reports-ext.llnl.gov/pdf/335299.pdf
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
Subspace detection for either single-channel cases, or network cases. This is modelled on the methods described by Harris. This method allows for slightly more variation in detected waveforms than the traditional matched-filter method. In this method, templates are constructed either by using the empirical subspace method, or by computing the basis vectors by singular-value decomposition. Both methods are provided as part of EQcorrscan in the clustering module.
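To illustrate the singular-value decomposition route, a toy numpy sketch is given below (this is not EQcorrscan's internal implementation; in practice use the svd and empirical_svd functions in the clustering module):
>>> import numpy as np
>>> # Rows are aligned, equal-length event waveforms (toy data)
>>> design_matrix = np.random.randn(8, 2000)
>>> u, sigma, v = np.linalg.svd(design_matrix, full_matrices=False)
>>> basis = v[:4]  # the first four right singular vectors span the subspace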
Classes¶
-
class eqcorrscan.core.subspace.Detector(name=None, sampling_rate=None, multiplex=None, stachans=None, lowcut=None, highcut=None, filt_order=None, data=None, u=None, sigma=None, v=None, dimension=None)[source]¶ Bases: object
Class to serve as the base for subspace detections.
Parameters: - name (str) – Name of subspace detector, used for book-keeping
- sampling_rate (float) – Sampling rate in Hz of original waveforms
- multiplex (bool) – Whether this detector is multiplexed.
- stachans (list) – List of tuples of (station, channel) used in detector. If multiplexed, these must be in the order that multiplexing was done.
- lowcut (float) – Lowcut filter in Hz
- highcut (float) – Highcut filter in Hz
- filt_order (int) – Number of corners for filtering
- data (numpy.ndarray) – The actual detector
- u (numpy.ndarray) – Full rank U matrix of left (input) singular vectors.
- sigma (numpy.ndarray) – Full rank vector of singular values.
- v (numpy.ndarray) – Full rank right (output) singular vectors.
- dimension (int) – Dimension of data.
Warning
Changing between scipy.linalg.svd solvers (noticeable between scipy versions 0.18.x and 0.19.0) results in sign changes in the SVD output. You should only run a detector with the same scipy version that was used to create it.
Methods
construct(streams, lowcut, highcut, …[, …]) | Construct a subspace detector from a list of streams, full rank. |
detect(st, threshold, trig_int[, moveout, …]) | Detect within continuous data using the subspace method. |
energy_capture([stachans, size, show]) | Calculate the average percentage energy capture for this subspace. |
partition(dimension) | Partition subspace into desired dimension. |
plot([stachans, size, show]) | Plot the output basis vectors for the detector at the given dimension. |
read(filename) | Read detector from a file, must be HDF5 format. |
write(filename) | Write detector to a file - uses HDF5 file format. |
-
__init__(name=None, sampling_rate=None, multiplex=None, stachans=None, lowcut=None, highcut=None, filt_order=None, data=None, u=None, sigma=None, v=None, dimension=None)[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
construct(streams, lowcut, highcut, filt_order, sampling_rate, multiplex, name, align, shift_len=0, reject=0.3, no_missed=True, plot=False)[source]¶ Construct a subspace detector from a list of streams, full rank.
Subspace detector will be full-rank, further functions can be used to select the desired dimensions.
Parameters: - streams (list) – List of obspy.core.stream.Stream to be used to generate the subspace detector. These should be pre-clustered and aligned.
- lowcut (float) – Lowcut in Hz, can be None to not apply filter
- highcut (float) – Highcut in Hz, can be None to not apply filter
- filt_order (int) – Number of corners for filter.
- sampling_rate (float) – Desired sampling rate in Hz
- multiplex (bool) – Whether to multiplex the data or not. Data are multiplexed according to the method of Harris, see the multi function for details.
- name (str) – Name of the detector, used for book-keeping.
- align (bool) – Whether to align the data or not; this needs to be done at some point before constructing the detector.
- shift_len (float) – Maximum shift allowed for alignment in seconds.
- reject (float) – Minimum correlation to include traces - only used if align=True.
- no_missed (bool) – Reject streams with missed traces, defaults to True. A missing trace from lots of events will reduce the quality of the subspace detector if multiplexed. Only used when multiplex is set to True.
- plot (bool) – Whether to plot the alignment stage or not.
Note
The detector will be normalized such that the data, before computing the singular-value decomposition, have unit energy, i.e. we divide the amplitudes of the data by the L1 norm of the data.
Warning
EQcorrscan’s alignment will attempt to align over the whole data window given. For long (more than 2 s) chunks of data this can give poor results and you might be better off using the eqcorrscan.utils.stacking.align_traces() function externally, focusing on a smaller window of data. To do this you would align the data prior to running construct.
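As an illustrative sketch (the streams variable and all parameter values are placeholders, and the detector name is hypothetical):
>>> from eqcorrscan.core.subspace import Detector  # doctest: +SKIP
>>> detector = Detector().construct(
...     streams=streams, lowcut=2.0, highcut=9.0, filt_order=4,
...     sampling_rate=20.0, multiplex=True, name='detector_1',
...     align=True, shift_len=0.5, reject=0.3)  # doctest: +SKIP
>>> detector.partition(4)  # doctest: +SKIP
>>> detector.write('detector_1.h5')  # doctest: +SKIP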
-
detect(st, threshold, trig_int, moveout=0, min_trig=0, process=True, extract_detections=False, cores=1)[source]¶ Detect within continuous data using the subspace method.
Parameters: - st (obspy.core.stream.Stream) – Un-processed stream to detect within using the subspace detector.
- threshold (float) – Threshold value for detections, between 0 and 1.
- trig_int (float) – Minimum trigger interval in seconds.
- moveout (float) – Maximum allowable moveout window for non-multiplexed, network detection. See note.
- min_trig (int) – Minimum number of stations exceeding threshold for non-multiplexed, network detection. See note.
- process (bool) – Whether or not to process the stream according to the parameters defined by the detector. Default is True, which will process the data.
- extract_detections (bool) – Whether to extract waveforms for each detection or not, if True will return detections and streams.
- cores (int) – Number of threads to process data with.
Returns: List of eqcorrscan.core.match_filter.Detection
Return type: list
Warning
Subspace is currently in beta, see note in the subspace tutorial for information.
Note
If running in bulk with detectors that all have the same parameters then you can pre-process the data and set process to False. This will speed up this detect function dramatically.
Warning
If the detector and stream are multiplexed then they must contain the same channels and be multiplexed in the same order. This is handled internally when process=True, but if running in bulk you must take care.
Note
Non-multiplexed, network detection. When the detector is not multiplexed, but there are multiple channels within the detector, we do not stack the single-channel detection statistics because we do not have a one-size-fits-all solution for computing delays for a subspace detector (if you want to implement one, then please contribute it!). Therefore, these parameters provide a means for declaring a network coincidence trigger using single-channel detection statistics, in a similar fashion to the commonly used network-coincidence trigger with energy detection statistics.
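A brief usage sketch (st is an un-processed obspy Stream and the threshold and trigger values are placeholders):
>>> detections = detector.detect(
...     st=st, threshold=0.5, trig_int=2.0, cores=4)  # doctest: +SKIP
>>> for detection in detections:  # doctest: +SKIP
...     print(detection.detect_time, detection.detect_val)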
-
energy_capture(stachans='all', size=(10, 7), show=False)[source]¶ Calculate the average percentage energy capture for this subspace.
Returns: Percentage energy capture
Return type: float
-
partition(dimension)[source]¶ Partition subspace into desired dimension.
Parameters: dimension (int) – Maximum dimension to use.
-
plot(stachans='all', size=(10, 7), show=True)[source]¶ Plot the output basis vectors for the detector at the given dimension.
Corresponds to the first n horizontal vectors of the V matrix.
Parameters: - stachans (list) – List of tuples of (station, channel) to plot. Can be set to ‘all’ to use all the station-channel pairs available. If the detector is multiplexed, will just plot that.
- size (tuple) – Figure size.
- show (bool) – Whether or not to show the figure.
Returns: Figure
Return type: matplotlib.pyplot.Figure
Note
See eqcorrscan.utils.plotting.subspace_detector_plot() for an example.
Functions¶
-
eqcorrscan.core.subspace.read_detector(filename)[source]¶ Read detector from a filename.
Parameters: filename (str) – Filename to read the detector from.
Returns: Detector object
Return type: eqcorrscan.core.subspace.Detector
-
eqcorrscan.core.subspace.multi(stream)[source]¶ Internal multiplexer for multiplex_detect.
Parameters: stream (obspy.core.stream.Stream) – Stream to multiplex
Returns: Trace of multiplexed data
Return type: obspy.core.trace.Trace
Maps a standard multiplexed stream of seismic data to a single trace of multiplexed data as follows:
Input: x = [x1, x2, x3, …] y = [y1, y2, y3, …] z = [z1, z2, z3, …]
Output: xyz = [x1, y1, z1, x2, y2, z2, x3, y3, z3, …]
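The interleaving is equivalent to stacking the channel vectors as rows, transposing and flattening; a toy numpy equivalent (not the internal code) is:
>>> import numpy as np
>>> x, y, z = np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])
>>> np.vstack((x, y, z)).T.flatten()
array([1, 4, 7, 2, 5, 8, 3, 6, 9])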
-
eqcorrscan.core.subspace.subspace_detect(detectors, stream, threshold, trig_int, moveout=0, min_trig=1, parallel=True, num_cores=None)[source]¶ Conduct subspace detection with chosen detectors.
Parameters: - detectors (list) – List of eqcorrscan.core.subspace.Detector to be used for detection.
- stream (obspy.core.stream.Stream) – Stream to detect within.
- threshold (float) – Threshold between 0 and 1 for detection, see Detector.detect()
- trig_int (float) – Minimum trigger interval in seconds.
- moveout (float) – Maximum allowable moveout window for non-multiplexed, network detection. See note.
- min_trig (int) – Minimum number of stations exceeding threshold for non-multiplexed, network detection. See note in Detector.detect().
- parallel (bool) – Whether to run detectors in parallel in groups.
- num_cores (int) – How many cpu cores to use if parallel==True. If set to None (default), will use all available cores.
Return type: list
Returns: List of eqcorrscan.core.match_filter.Detection detections.
Note
This will loop through your detectors using their detect method. If the detectors are multiplexed it will run groups of detectors with the same channels at the same time.
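A brief usage sketch (the detectors and stream are placeholders):
>>> from eqcorrscan.core.subspace import subspace_detect  # doctest: +SKIP
>>> detections = subspace_detect(
...     detectors=[detector_1, detector_2], stream=st, threshold=0.5,
...     trig_int=2.0, parallel=True, num_cores=4)  # doctest: +SKIP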
template_gen¶
Functions to generate template waveforms and information to go with them for the application of cross-correlation of seismic data for the detection of repeating events.
Note
All functions use obspy filters, which are implemented such that if both highcut and lowcut are set a bandpass filter will be used, but if highcut is not set (None) then a highpass filter will be used, and if only the highcut is set then a lowpass filter will be used.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
Routines for cutting waveforms around picks for use as templates for matched filtering.
Functions¶
-
eqcorrscan.core.template_gen.template_gen(method, lowcut, highcut, samp_rate, filt_order, length, prepick, swin='all', process_len=86400, all_horiz=False, delayed=True, plot=False, plotdir=None, return_event=False, min_snr=None, parallel=False, num_cores=False, save_progress=False, skip_short_chans=False, **kwargs)[source]¶ Generate processed and cut waveforms for use as templates.
Parameters: - method (str) – Template generation method, must be one of (‘from_client’, ‘from_seishub’, ‘from_sac’, ‘from_meta_file’). Each method requires associated arguments, see note below.
- lowcut (float) – Low cut (Hz), if set to None will not apply a lowcut.
- highcut (float) – High cut (Hz), if set to None will not apply a highcut.
- samp_rate (float) – New sampling rate in Hz.
- filt_order (int) – Filter level (number of corners).
- length (float) – Length of template waveform in seconds.
- prepick (float) – Pre-pick time in seconds.
- swin (str) – P, S, P_all, S_all or all, defaults to all: see note below.
- process_len (int) – Length of data in seconds to download and process.
- all_horiz (bool) – To use both horizontal channels even if there is only a pick on one of them. Defaults to False.
- delayed (bool) – If True, each channel will begin relative to its own pick-time, if set to False, each channel will begin at the same time.
- plot (bool) – Plot templates or not.
- plotdir (str) – The path to save plots to. If plotdir=None (default) then the figure will be shown on screen.
- return_event (bool) – Whether to return the event and process length or not.
- min_snr (float) – Minimum signal-to-noise ratio for a channel to be included in the template, where signal-to-noise ratio is calculated as the ratio of the maximum amplitude in the template window to the rms amplitude in the whole window given.
- parallel (bool) – Whether to process data in parallel or not.
- num_cores (int) – Number of cores to try and use, if False and parallel=True, will use either all your cores, or as many traces as in the data (whichever is smaller).
- save_progress (bool) – Whether to save the resulting templates at every data step or not. Useful for long-running processes.
- skip_short_chans (bool) – Whether to ignore channels that have insufficient length data or not. Useful when the quality of data is not known, e.g. when downloading old, possibly triggered data from a datacentre
Returns: List of obspy.core.stream.Stream templates
Return type: list
Note
By convention templates are generated with P-phases on the vertical channel and S-phases on the horizontal channels, normal seismograph naming conventions are assumed, where Z denotes vertical and N, E, R, T, 1 and 2 denote horizontal channels, either oriented or not. To this end we will only use Z channels if they have a P-pick, and will use one or other horizontal channels only if there is an S-pick on it.
Warning
If there is no phase_hint included in picks, and swin=all, all channels with picks will be used.
Note
If swin=all, then all picks will be used, not just phase-picks (e.g. it will use amplitude picks). If you do not want this then we suggest that you remove any picks you do not want to use in your templates before using the event.
Note
Method specific arguments:
- from_client requires:
param str client_id: String passable by obspy to generate Client, or any object with a get_waveforms method, including a Client instance.
param obspy.core.event.Catalog catalog: Catalog of events to generate templates for.
param float data_pad: Pad length for data-downloads in seconds.
- from_seishub requires:
param str url: url to seishub database.
param obspy.core.event.Catalog catalog: Catalog of events to generate templates for.
param float data_pad: Pad length for data-downloads in seconds.
- from_sac requires:
param list sac_files: obspy.core.stream.Stream of sac waveforms, or list of paths to sac waveforms.
Note
See eqcorrscan.utils.sac_util.sactoevent for details on how pick information is collected.
- from_meta_file requires:
param str meta_file: Path to obspy-readable event file, or an obspy Catalog.
param obspy.core.stream.Stream st: Stream containing waveform data for template. Note that this should be the same length of stream as you will use for the continuous detection, e.g. if you detect in day-long files, give this a day-long file!
param bool process: Whether to process the data or not, defaults to True.
Note
process_len should be set to the same length as used when computing detections using match_filter.match_filter, e.g. if you read in day-long data for match_filter, process_len should be 86400.
Example
>>> from obspy.clients.fdsn import Client
>>> from eqcorrscan.core.template_gen import template_gen
>>> client = Client('NCEDC')
>>> catalog = client.get_events(eventid='72572665', includearrivals=True)
>>> # We are only taking two picks for this example to speed up the
>>> # example, note that you don't have to!
>>> catalog[0].picks = catalog[0].picks[0:2]
>>> templates = template_gen(
...     method='from_client', catalog=catalog, client_id='NCEDC',
...     lowcut=2.0, highcut=9.0, samp_rate=20.0, filt_order=4, length=3.0,
...     prepick=0.15, swin='all', process_len=300, all_horiz=True)
>>> templates[0].plot(equal_scale=False, size=(800,600)) # doctest: +SKIP
Example
>>> from obspy import read
>>> from eqcorrscan.core.template_gen import template_gen
>>> # Get the path to the test data
>>> import eqcorrscan
>>> import os
>>> TEST_PATH = os.path.dirname(eqcorrscan.__file__) + '/tests/test_data'
>>> st = read(TEST_PATH + '/WAV/TEST_/' +
...     '2013-09-01-0410-35.DFDPC_024_00')
>>> quakeml = TEST_PATH + '/20130901T041115.xml'
>>> templates = template_gen(
...     method='from_meta_file', meta_file=quakeml, st=st, lowcut=2.0,
...     highcut=9.0, samp_rate=20.0, filt_order=3, length=2, prepick=0.1,
...     swin='S', all_horiz=True)
>>> print(len(templates[0]))
10
>>> templates = template_gen(
...     method='from_meta_file', meta_file=quakeml, st=st, lowcut=2.0,
...     highcut=9.0, samp_rate=20.0, filt_order=3, length=2, prepick=0.1,
...     swin='S_all', all_horiz=True)
>>> print(len(templates[0]))
15
Example
>>> from eqcorrscan.core.template_gen import template_gen
>>> import glob
>>> # Get all the SAC-files associated with one event.
>>> sac_files = glob.glob(TEST_PATH + '/SAC/2014p611252/*')
>>> templates = template_gen(
...     method='from_sac', sac_files=sac_files, lowcut=2.0, highcut=10.0,
...     samp_rate=25.0, filt_order=4, length=2.0, swin='all', prepick=0.1,
...     all_horiz=True)
>>> print(templates[0][0].stats.sampling_rate)
25.0
>>> print(len(templates[0]))
15
-
eqcorrscan.core.template_gen.extract_from_stack(stack, template, length, pre_pick, pre_pad, Z_include=False, pre_processed=True, samp_rate=None, lowcut=None, highcut=None, filt_order=3)[source]¶ Extract a multiplexed template from a stack of detections.
Function to extract a new template from a stack of previous detections. Requires the stack, the template used to make the detections for the stack, and we need to know if the stack has been pre-processed.
Function to extract a new template from a stack of previous detections. Requires the stack, the template used to make the detections for the stack, and we need to know if the stack has been pre-processed.
Parameters: - stack (obspy.core.stream.Stream) – Waveform stack from detections. Can be of any length and can have delays already included, or not.
- template (obspy.core.stream.Stream) – Template used to make the detections in the stack. Will use the delays of this for the new template.
- length (float) – Length of new template in seconds
- pre_pick (float) – Extract additional data before the detection, seconds
- pre_pad (float) – Pad used in seconds when extracting the data, e.g. the time before the detection extracted. If using clustering.extract_detections this is half the length of the extracted waveform.
- Z_include (bool) – If True will include any Z-channels even if there is no template for this channel, as long as there is a template for this station at a different channel. If this is False and Z channels are included in the template, Z channels will be included in the new_template anyway.
- pre_processed (bool) – Have the data been pre-processed, if True (default) then we will only cut the data here.
- samp_rate (float) – If pre_processed=False then this is required, desired sampling rate in Hz, defaults to None.
- lowcut (float) – If pre_processed=False then this is required, lowcut in Hz, defaults to None.
- highcut (float) – If pre_processed=False then this is required, highcut in Hz, defaults to None.
- filt_order (int) – If pre_processed=False then this is required, filter order, defaults to 3.
Returns: Newly cut template.
Return type: obspy.core.stream.Stream
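A brief usage sketch (the stack and template variables are assumed to come from a prior stacking and matched-filter run):
>>> from eqcorrscan.core.template_gen import extract_from_stack  # doctest: +SKIP
>>> new_template = extract_from_stack(
...     stack=stack, template=template, length=3.0, pre_pick=0.5,
...     pre_pad=30.0, pre_processed=True)  # doctest: +SKIP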
Utils¶
Various utility functions to help the core routines, and/or for use to analyse the results of the core routines.
archive_read¶
Helper functions for reading data from archives for the EQcorrscan package.
Note
These functions are tools to aid simplification of general scripts, they do not cover all use cases, however if you have a use case you want to see here, then let the authors know, or implement it yourself and contribute it back to the project.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
Private Functions¶
_check_available_data | Function to check what stations are available in the archive for a given day. |
_get_station_file | Helper function to find the correct file. |
catalog_to_dd¶
Functions to generate hypoDD input files from catalogs.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
Classes & Functions¶
compute_differential_times | Generate groups of differential times for a catalog. |
read_phase | Read hypoDD phase files into Obspy catalog class. |
write_catalog | Generate a dt.ct file for hypoDD for a series of events. |
write_correlations | Write a dt.cc file for hypoDD input for a given list of events. |
write_event | Write obspy.core.event.Catalog to a hypoDD format event.dat file. |
write_phase | Write a phase.dat formatted file from an obspy catalog. |
write_station | Write a hypoDD formatted station file. |
catalog_utils¶
Helper functions for common handling tasks for catalog objects.
Note
These functions are tools to aid simplification of general scripts, they do not cover all use cases, however if you have a use case you want to see here, then let the authors know, or implement it yourself and contribute it back to the project, or, if it's really good, give it to the obspy developers!
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
Classes & Functions¶
filter_picks | Filter events in the catalog based on a number of parameters. |
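For example, to keep only picks on the five most-picked stations in a catalog (a common step before template generation; a sketch assuming default arguments otherwise):
>>> from eqcorrscan.utils.catalog_utils import filter_picks  # doctest: +SKIP
>>> filtered_catalog = filter_picks(catalog, top_n_picks=5)  # doctest: +SKIP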
clustering¶
Functions to cluster seismograms by a range of constraints.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
Classes & Functions¶
svd | Compute the SVD of a number of templates. |
svd_to_stream | Convert the singular vectors output by SVD to streams. |
catalog_cluster | Cluster a catalog by distance only. |
cluster | Cluster template waveforms based on average correlations. |
corr_cluster | Group traces based on correlations above threshold with the stack. |
cross_chan_correlation | Calculate cross-channel correlation. |
dist_mat_km | Compute the distance matrix for a catalog using hypocentral separation. |
distance_matrix | Compute distance matrix for waveforms based on cross-correlations. |
empirical_svd | Empirical subspace detector generation function. |
extract_detections | Extract waveforms associated with detections. |
group_delays | Group template waveforms according to their arrival times (delays). |
re_thresh_csv | Remove detections by changing the threshold. |
space_time_cluster | Cluster detections in space and time. |
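As a sketch of waveform clustering (template_list pairs each obspy Stream with an arbitrary identifier; the streams and threshold are placeholders):
>>> from eqcorrscan.utils.clustering import cluster  # doctest: +SKIP
>>> groups = cluster(
...     template_list=[(st1, 0), (st2, 1), (st3, 2)],
...     show=False, corr_thresh=0.3)  # doctest: +SKIP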
correlate¶
Correlation functions for multi-channel cross-correlation of seismic data.
Various routines used mostly for testing, including links to a compiled routine using FFTW, a Numpy fft routine which uses bottleneck for normalisation, and a compiled time-domain routine. These have varying levels of efficiency, both in terms of overall speed and in memory usage. The time-domain routine is the most memory-efficient but slowest (although fast for small cases of less than a few hundred correlations); the Numpy routine is fast, but memory-inefficient due to the need to store large double-precision arrays for normalisation. The fftw compiled routine is faster and more memory-efficient than the Numpy routine.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
Classes & Functions¶
fftw_multi_normxcorr | Use a C loop rather than a Python loop - in some cases this will be fast. |
fftw_normxcorr | Normalised cross-correlation using the fftw library. |
numpy_normxcorr | Compute the normalized cross-correlation using numpy and bottleneck. |
time_multi_normxcorr | Compute cross-correlations in the time-domain using C routine. |
get_array_xcorr | Get a normalized cross correlation function that takes arrays as inputs. |
get_stream_xcorr | Return a function for performing normalized cross correlation on lists of streams. |
register_array_xcorr | Decorator for registering correlation functions. |
All functions within this module (and therefore all correlation functions used in EQcorrscan) are normalised cross-correlations, which follow the definition that Matlab uses for normxcorr2.
In the time-domain this is as follows:
\[c(t) = \frac{\sum_{y}(a_{y} - \overline{a})(b_{t+y} - \overline{b_t})}{\sqrt{\sum_{y}(a_{y} - \overline{a})(a_{y} - \overline{a})\sum_{y}(b_{t+y} - \overline{b_t})(b_{t+y} - \overline{b_t})}}\]
where \(c(t)\) is the normalised cross-correlation at time \(t\), \(a\) is the template window which has length \(y\), \(b\) is the continuous data, \(\overline{a}\) is the mean of the template and \(\overline{b_t}\) is the local mean of the continuous data.
In practice the correlations only remove the mean of the template once, but we do remove the mean from the continuous data at every time-step. Without doing this (just removing the global mean), correlations are affected by jumps in average amplitude, which is most noticeable during aftershock sequences, or when using short templates.
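A direct (slow) numpy transcription of this definition is given below for reference only; EQcorrscan's compiled routines compute the same quantity far more efficiently:
>>> import numpy as np
>>> def normxcorr_reference(a, b):
...     """Normalised cross-correlation following the definition above."""
...     y = len(a)
...     a0 = a - a.mean()  # template mean removed once
...     out = np.empty(len(b) - y + 1)
...     for t in range(len(out)):
...         bt = b[t:t + y] - b[t:t + y].mean()  # local mean at every step
...         out[t] = (a0 * bt).sum() / np.sqrt(
...             (a0 ** 2).sum() * (bt ** 2).sum())
...     return out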
In the frequency domain (functions eqcorrscan.utils.correlate.numpy_normxcorr(), eqcorrscan.utils.correlate.fftw_normxcorr() and fftw_multi_normxcorr()), correlations are computed using an equivalent method. All methods are tested against one-another to check for internal consistency, and checked against results computed using Matlab’s normxcorr2 function. These routines also give the same results (within 0.00001) as openCV cross-correlations.
Selecting a correlation function¶
EQcorrscan strives to use sensible default algorithms for calculating correlation values; however, you may want to change how correlations are calculated to better suit your specific needs.
There are currently three different correlation functions included in EQcorrscan:
- eqcorrscan.utils.correlate.numpy_normxcorr() known as “numpy”;
- eqcorrscan.utils.correlate.time_multi_normxcorr() known as “time_domain”;
- eqcorrscan.utils.correlate.fftw_normxcorr() known as “fftw”.
The third of these (“fftw”) is the default.
Setting FFT length¶
For version 0.4.0 onwards, the “fftw” backend allows the user to pass an fft_len keyword to it. This will set the length of the FFT internally. Smaller FFTs use less memory, but can be faster; larger FFTs use more memory and can be slower. By default this is set to the minimum of 2 ** 13, or the next fast length of the sum of the template length and the stream length. Issue #285 showed that 2 ** 13 was consistently fastest over a range of data shapes on an Intel i7 with 8 threads. Powers of two are generally fastest.
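Assuming keyword pass-through to the backend in your version (check your installed signature before relying on this), the FFT length might be set as:
>>> party = tribe.detect(
...     stream=st, threshold=8.0, threshold_type='MAD', trig_int=6.0,
...     xcorr_func='fftw', fft_len=2 ** 13)  # doctest: +SKIP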
Using Fast Matched Filter within EQcorrscan¶
Fast Matched Filter provides fast time-domain correlations for both CPU and GPU architectures. For massively multi-threaded environments this runs faster than the frequency-domain routines native to EQcorrscan (when more than 40 CPU cores, or an NVIDIA GPU card, is available) with less memory consumption. Documentation on how to call Fast Matched Filter from within EQcorrscan is provided here: fast_matched_filter
Switching which correlation function is used¶
You can switch between correlation functions using the xcorr_func parameter included in:
- eqcorrscan.core.match_filter.match_filter(),
- eqcorrscan.core.match_filter.Tribe.detect(),
- eqcorrscan.core.match_filter.Template.detect()
by:
- passing a string (e.g. “numpy”, “time_domain”, or “fftw”) or;
- passing a function.
For example:
>>> import obspy
>>> import numpy as np
>>> from eqcorrscan.utils.correlate import numpy_normxcorr, set_xcorr
>>> from eqcorrscan.core.match_filter import match_filter
>>> # generate some toy templates and stream
>>> random = np.random.RandomState(42)
>>> template = obspy.read()
>>> stream = obspy.read()
>>> for num, tr in enumerate(stream): # iter stream and embed templates
... data = tr.data
... tr.data = random.randn(6000) * 5
... tr.data[100: 100 + len(data)] = data
>>> # do correlation using numpy rather than fftw
>>> detections = match_filter(['1'], [template], stream, .5, 'absolute',
... 1, False, xcorr_func='numpy')
>>> # do correlation using a custom function
>>> def custom_normxcorr(templates, stream, pads, *args, **kwargs):
... # Just to keep example short call other xcorr function
... print('calling custom xcorr function')
... return numpy_normxcorr(templates, stream, pads, *args, **kwargs)
>>> detections = match_filter(
... ['1'], [template], stream, .5, 'absolute', 1, False,
... xcorr_func=custom_normxcorr)
calling custom xcorr function...
You can also use the set_xcorr object (eqcorrscan.utils.correlate.set_xcorr) to change which correlation function is used. This can be done permanently or within the scope of a context manager:
>>> # change the default xcorr function for all code in the with block
>>> with set_xcorr(custom_normxcorr):
... detections = match_filter(['1'], [template], stream, .5,
... 'absolute', 1, False)
calling custom xcorr function...
>>> # permanently set the xcorr function (until the python kernel restarts)
>>> set_xcorr(custom_normxcorr)
default changed to custom_normxcorr
>>> detections = match_filter(['1'], [template], stream, .5, 'absolute',
... 1, False)
calling custom xcorr function...
>>> set_xcorr.revert() # change it back to the previous state
Notes on accuracy¶
To cope with floating-point rounding errors, correlations may not be calculated for data with low variance. If you see a warning saying: “Some correlations not computed, are there zeros in the data? If not consider increasing gain”, check whether your data contain zeros. If they do not, but the amplitudes are low (though real), multiply your data by a constant gain before correlating.
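For example, a constant gain applied to every trace before correlating (the factor is arbitrary; any scaling that lifts the amplitudes clear of the variance check will do):
>>> for tr in st:  # doctest: +SKIP
...     tr.data = tr.data * 1e6  # up-scale low, but real, amplitudes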
Warning¶
If data are padded with zeros prior to filtering then correlations may be incorrectly calculated where there are zeros. You should always pad after filtering. If you see warnings saying “Low variance, possible zeros, check result”, you should check that you have padded correctly and check that correlations haven’t been calculated when you don’t expect them.
despike¶
Functions for despiking seismic data.
Warning
Despike functions are in beta, they do not work that well.
Todo
Deconvolve spike to remove it, find peaks in the f-domain.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
Classes & Functions¶
median_filter | Filter out spikes in data above a multiple of MAD of the data. |
template_remove | Looks for instances of template in the trace and removes the matches. |
findpeaks¶
Functions to find peaks in data above a certain threshold.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
Classes & Functions¶
coin_trig | Find network coincidence triggers within peaks of detection statistics. |
decluster | Decluster peaks based on an enforced minimum separation. |
find_peaks_compiled | Determine peaks in an array of data above a certain threshold. |
find_peaks2_short | Determine peaks in an array of data above a certain threshold. |
multi_find_peaks | Wrapper for find-peaks for multiple arrays. |
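A sketch of finding peaks in a detection statistic (arr is a placeholder array; here trig_int is in samples):
>>> import numpy as np
>>> from eqcorrscan.utils.findpeaks import find_peaks2_short  # doctest: +SKIP
>>> arr = np.random.randn(86400)  # toy detection statistic
>>> peaks = find_peaks2_short(arr=arr, thresh=3.0, trig_int=600)  # doctest: +SKIP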
mag_calc¶
Functions to aid magnitude estimation.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
Classes & Functions¶
amp_pick_event | Pick amplitudes for local magnitude for a single event. |
calc_b_value | Calculate the b-value for a range of completeness magnitudes. |
calc_max_curv | Calculate the magnitude of completeness using the maximum curvature method. |
dist_calc | Function to calculate the distance in km between two points. |
svd_moments | Calculate relative moments/amplitudes using singular-value decomposition. |
relative_amplitude | Compute the relative amplitudes between two streams. |
relative_magnitude | Compute the relative magnitudes between two events. |
Private Functions¶
Note that these functions are not designed for public use and may change at any point.
_sim_WA | Remove the instrument response from a trace and simulate a Wood-Anderson. |
_pairwise | Wrapper on itertools for SVD_magnitude. |
_max_p2t | Finds the maximum peak-to-trough amplitude and period. |
picker¶
Functions to pick earthquakes detected by EQcorrscan.
Designed primarily to locate stacks of detections to give family locations. These are simple wrappers, not tested for accuracy; extensions may be written later.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
Classes & Functions¶
cross_net | Generate picks using a simple envelope cross-correlation. |
stalta_pick | Basic sta/lta picker, suggest using alternative in obspy. |
plotting¶
Utility code for most of the plots used as part of the EQcorrscan package.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
All plotting functions accept the save, savefile, title and show keyword arguments.
chunk_data | Downsample data for plotting. |
cumulative_detections | Plot cumulative detections or detection rate in time. |
detection_multiplot | Plot a stream of data with a template on top of it at detection times. |
freq_mag | Plot a frequency-magnitude histogram and cumulative density plot. |
interev_mag | Plot inter-event times against magnitude. |
multi_event_singlechan | Plot data from a single channel for multiple events. |
obspy_3d_plot | Plot obspy Inventory and obspy Catalog classes in three dimensions. |
peaks_plot | Plot peaks to check that the peak finding routine is running correctly. |
plot_repicked | Plot a template over a detected stream, with picks corrected by lag-calc. |
plot_synth_real | Plot multiple channels of data for real data and synthetic. |
pretty_template_plot | Plot of a single template, possibly within background data. |
spec_trace | Plots seismic data with spectrogram behind. |
subspace_detector_plot | Plotting for the subspace detector class. |
svd_plot | Plot singular vectors from the eqcorrscan.utils.clustering routines. |
threeD_gridplot | Plot a series of grid points in 3D. |
threeD_seismplot | Plot seismicity and stations in a 3D, movable, zoomable space. |
triple_plot | Plot a seismogram, correlogram and histogram. |
xcorr_plot | Plot a template overlying an image aligned by correlation. |
noise_plot | Plot signal and noise fourier transforms and the difference. |
pre_processing¶
Pre-processing modules take care of ensuring all data are processed the same. Note that gaps should be padded after filtering if you decide not to use these routines (see notes in correlation_warnings).
Utilities module whose functions are designed to do the basic processing of the data using obspy modules (which also rely on scipy and numpy).
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
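A sketch of typical use (shortproc is the standard entry point in the 0.4.x series; the filter values are placeholders and must match those used to build your templates):
>>> from eqcorrscan.utils.pre_processing import shortproc  # doctest: +SKIP
>>> st = shortproc(
...     st, lowcut=2.0, highcut=9.0, filt_order=4,
...     samp_rate=20.0)  # doctest: +SKIP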
sac_util¶
Part of the EQcorrscan package: tools to convert SAC headers to obspy event objects.
Note
This functionality is not supported for obspy versions below 1.0.0 as reference times are not read in by SACIO; these are needed for defining pick times.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
Classes & Functions¶
sactoevent | Convert SAC headers (picks only) to obspy event class. |
stacking¶
Utility module of the EQcorrscan package to allow for different methods of stacking of seismic signal in one place.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
Classes & Functions¶
linstack | Compute the linear stack of a series of seismic streams of multiplexed data. |
PWS_stack | Compute the phase weighted stack of a series of streams. |
align_traces | Align traces relative to each other based on their cross-correlation value. |
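A sketch of stacking a list of pre-aligned obspy Streams (the streams variable is a placeholder):
>>> from eqcorrscan.utils.stacking import linstack, PWS_stack  # doctest: +SKIP
>>> stack = linstack(streams)  # doctest: +SKIP
>>> phase_weighted = PWS_stack(streams, weight=2)  # doctest: +SKIP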
synth_seis¶
Early development functions to do very basic simulations of seismograms to be used as general matched-filter templates and see how well a simple model would fit with real data. Mostly used in EQcorrscan for testing.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
Classes & Functions¶
generate_synth_data | Generate a synthetic dataset to be used for testing. |
seis_sim | Generate a simulated seismogram from a given S-P time. |
SVD_sim | Generate basis vectors of a set of simulated seismograms. |
template_grid | Generate a group of synthetic seismograms for a grid of sources. |
trigger¶
Functions to enable simple energy-based triggering in a network setting where different stations have different noise parameters.
copyright: | EQcorrscan developers. |
---|---|
license: | GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html) |
Classes & Functions¶
network_trigger | Main function to compute triggers for a network of stations. |
read_trigger_parameters | Read the trigger parameters into trigger_parameter classes. |
TriggerParameters | Base class for trigger parameters. |