4. Subspace Detection

EQcorrscan’s subspace detection methods are closely modelled on the method described by Harris (2006), Subspace Detectors: Theory. We offer options to multiplex data or leave as single-channels (multiplexing is significantly faster).

Subspace detection is implemented in an object-oriented style, whereby individual detectors are constructed from data, then used to detect within continuous data. At the core of the subspace routine is a Cython-based, static-typed routine to calculate the detection statistics. We do this to make use of numpy’s vectorized calculations, while taking advantage of the speed-ups afforded by compiling the sliding window loop.

4.1. WARNING

Subspace in EQcorrscan is in beta, you must check your results match what you expect - if you find errors please report them. Although our test-cases run correctly, changes in data quality may affect the routines in ways we have not accounted for.

4.2. Important

How you generate you detector is likely to be the most important thing, careful selection and alignment is key, and because of this we haven’t provided a total cookie-cutter system for doing this. You have freedom to chose your parameters, how to process, how you align, what traces to keep, whether you multiplex or not, etc. This also means you have a lot of freedom to get it wrong. You will have to do significant testing with your own dataset to work out what works and what doesn’t. Anything that you find that doesn’t work well in EQcorrscans system, it would be great to hear about so that we can make it better.

The following examples demonstrate some of the options, but not all of them. The advanced example is the example used to test and develop subspace and took a fair amount of effort over a number of weeks.

4.3. Simple example

To begin with you will need to create a Detector:

>>> from eqcorrscan.core import subspace
>>> detector = subspace.Detector()

This will create an empty detector object. These objects have various attributes, including the data to be used as a detector (detector.data), alongside the full input and output basis vector matrices (detector.u and detector.v respectively) and the vector of singular-values (detector.sigma). Meta-data are also included, including whether the detector is multiplexed or not (detector.multiplex), the filters applied (detector.lowcut, detector.highcut, detection.filt_order, detector.sampling_rate), the dimension of the subspace (detector.dimension), and the name of the detector, which you can use for book-keeping (detector.name).

To populate the empty detector you need a design set of streams that have been aligned (see clustering submodule for alignment methods).

>>> from obspy import read
>>> import glob
>>> import os
>>> from eqcorrscan import tests
>>> # Get the path for the test-data so we can test this
>>> TEST_PATH = os.path.dirname(tests.__file__)
>>> wavefiles = glob.glob(
...    TEST_PATH + '/test_data/similar_events_processed/*')
>>> wavefiles.sort()  # Sort the wavefiles to ensure reproducibility
>>> streams = [read(w) for w in wavefiles[0:3]]
>>> # Channels must all be the same length
>>> detector.construct(streams=streams, lowcut=2, highcut=9, filt_order=4,
...                    sampling_rate=20, multiplex=True, name='Test_1',
...                    align=True, shift_len=0.5, reject=0.2)
Detector: Test_1

This will populate all the attributes of your detector object, and fill the detector.data with the full input basis vector matrix.

You will want to reduce the dimensions of your subspace detector, such that you are just describing the signal, preferably with a lot of generality. Details for selecting dimensionality should be found in Harris (2006). To do this in EQcorrscan simply use the partition method:

>>> detector.partition(2)
Detector: Test_1

This will populate detector.data with the first four, left-most input basis vectors. You can test to see how much of your original design set is described by this detector by using the energy_capture method:

>>> percent_capture = detector.energy_capture()

This will return a percentage capture, you can run this for multiple dimensions to test what dimension best suits your application. Again, details for this selection can be found in Harris (2006).

Finally, to use your detector to detect within continuous data you should use the detect method. This requires a stream with the same stations and channels used in the detector, and a threshold from 0-1, where 0 is no signal, and 1 is totally described by your detector. You can extract streams for the detections at the same time as the detections by setting the extract_detections flag to True.

>>> stream = read(wavefiles[0])
>>> detections = detector.detect(st=stream, threshold=0.5, trig_int=3) 

4.4. Advanced Example

This example computes detections for a short data-period during an earthquake sequence in the Wairarapa region of New Zealand’s North Island. This example only shows one subspace detector, but could be extended, using the various clustering routines in EQcorrscan, to create many subspace detectors. These could be run using the subspace_detect function, which runs similar detectors in parallel through the given data.

"""
Advanced subspace tutorial to show some of the capabilities of the method.

This example uses waveforms from a known earthquake sequence (in the Wairarapa
region north of Wellington, New Zealand). The catalogue locations etc can
be downloaded from this link:

http://quakesearch.geonet.org.nz/services/1.0.0/csv?bbox=175.37956,-40.97912,175.53097,-40.84628&startdate=2015-7-18T2:00:00&enddate=2016-7-18T3:00:00

"""

import logging

from http.client import IncompleteRead
from obspy.clients.fdsn import Client
from obspy import UTCDateTime, Stream

from eqcorrscan.utils.catalog_utils import filter_picks
from eqcorrscan.utils.clustering import catalog_cluster
from eqcorrscan.core import subspace

# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s\t%(name)s\t%(levelname)s\t%(message)s")


def run_tutorial(plot=False, multiplex=True, return_streams=False, cores=4,
                 verbose=False):
    """
    Run the tutorial.

    :return: detections
    """
    client = Client("GEONET", debug=verbose)
    cat = client.get_events(
        minlatitude=-40.98, maxlatitude=-40.85, minlongitude=175.4,
        maxlongitude=175.5, starttime=UTCDateTime(2016, 5, 1),
        endtime=UTCDateTime(2016, 5, 20))
    print(f"Downloaded a catalog of {len(cat)} events")
    # This gives us a catalog of events - it takes a while to download all
    # the information, so give it a bit!
    # We will generate a five station, multi-channel detector.
    cat = filter_picks(catalog=cat, top_n_picks=5)
    stachans = list(set(
        [(pick.waveform_id.station_code, pick.waveform_id.channel_code)
         for event in cat for pick in event.picks]))
    # In this tutorial we will only work on one cluster, defined spatially.
    # You can work on multiple clusters, or try to whole set.
    clusters = catalog_cluster(
        catalog=cat, metric="distance", thresh=2, show=False)
    # We will work on the largest cluster
    cluster = sorted(clusters, key=lambda c: len(c))[-1]
    # This cluster contains 32 events, we will now download and trim the
    # waveforms.  Note that each chanel must start at the same time and be the
    # same length for multiplexing.  If not multiplexing EQcorrscan will
    # maintain the individual differences in time between channels and delay
    # the detection statistics by that amount before stacking and detection.
    client = Client('GEONET')
    design_set = []
    st = Stream()
    for event in cluster:
        print(f"Downloading for event {event.resource_id.id}")
        bulk_info = []
        t1 = event.origins[0].time
        t2 = t1 + 25.1  # Have to download extra data, otherwise GeoNet will
        # trim wherever suits.
        t1 -= 0.1
        for station, channel in stachans:
            try:
                st += client.get_waveforms(
                    'NZ', station, '*', channel[0:2] + '?', t1, t2)
            except IncompleteRead:
                print(f"Could not download for {station} {channel}")
    print(f"Downloaded {len(st)} channels")
    for event in cluster:
        t1 = event.origins[0].time
        t2 = t1 + 25
        design_set.append(st.copy().trim(t1, t2))
    # Construction of the detector will process the traces, then align them,
    # before multiplexing.
    print("Making detector")
    detector = subspace.Detector()
    detector.construct(
        streams=design_set, lowcut=2.0, highcut=9.0, filt_order=4,
        sampling_rate=20, multiplex=multiplex, name='Wairarapa1', align=True,
        reject=0.2, shift_len=6, plot=plot).partition(9)
    print("Constructed Detector")
    if plot:
        detector.plot()
    # We also want the continuous stream to detect in.
    t1 = UTCDateTime(2016, 5, 11, 19)
    t2 = UTCDateTime(2016, 5, 11, 20)
    # We are going to look in a single hour just to minimize cost, but you can
    # run for much longer.
    bulk_info = [('NZ', stachan[0], '*',
                  stachan[1][0] + '?' + stachan[1][-1],
                  t1, t2) for stachan in detector.stachans]
    print("Downloading continuous data")
    st = client.get_waveforms_bulk(bulk_info)
    st.merge().detrend('simple').trim(starttime=t1, endtime=t2)
    # We set a very low threshold because the detector is not that great, we
    # haven't aligned it particularly well - however, at this threshold we make
    # two real detections.
    print("Computing detections")
    detections, det_streams = detector.detect(
        st=st, threshold=0.4, trig_int=2, extract_detections=True,
        cores=cores)
    if return_streams:
        return detections, det_streams
    else:
        return detections


if __name__ == '__main__':
    run_tutorial()