6.2.4.1.4. eqcorrscan.utils.clustering.cluster

eqcorrscan.utils.clustering.cluster(template_list, show=True, corr_thresh=0.3, shift_len=0, allow_individual_trace_shifts=True, save_corrmat=False, replace_nan_distances_with=None, cores='all', **kwargs)

Cluster template waveforms based on average correlations.

Takes a set of templates, clusters them, and returns the groups as lists of streams. Clustering is done by computing the cross-channel correlation sum of each stream in template_list with every other stream in the list. scipy.cluster.hierarchy functions are then used to compute the complete distance matrix, where distance is 1 minus the normalised cross-correlation sum, so that larger distances correspond to less similar events. Groups are then created by clustering the distance matrix at distances less than 1 - corr_thresh.
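The relationship between correlation, distance, and corr_thresh can be sketched with SciPy directly. This is a minimal illustration, not the function's own implementation: the correlation values are made up, the integer template ids are hypothetical, and the exact SciPy calls used internally are an assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical normalised cross-channel correlations between four templates
# (illustrative values only, not output from EQcorrscan).
corr = np.array([
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.6],
    [0.2, 0.1, 0.6, 1.0],
])
corr_thresh = 0.3

# Distance is 1 minus correlation, so similar templates are close together.
dist_mat = 1.0 - corr

# linkage expects the condensed (upper-triangle) form of the distance matrix.
dist_vec = squareform(dist_mat)
Z = linkage(dist_vec)

# Cut the hierarchy at a distance of 1 - corr_thresh.
labels = fcluster(Z, t=1.0 - corr_thresh, criterion="distance")

# Collect template indices by cluster label.
groups = {}
for template_id, label in enumerate(labels):
    groups.setdefault(int(label), []).append(template_id)
grouped_ids = sorted(groups.values())
print(grouped_ids)
```

Here templates 0 and 1 (correlation 0.8) and templates 2 and 3 (correlation 0.6) each exceed the 0.3 threshold, while every cross-pair falls below it, so two groups result.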

When distance_matrix contains NaNs (event pairs that cannot be directly compared), then the mean correlation between templates is used instead of NaN (see https://github.com/eqcorrscan/EQcorrscan/issues/484).
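As a rough sketch of what replacing NaN distances with a mean could look like, the NumPy snippet below fills missing entries with the mean of the finite off-diagonal distances. The helper name fill_nan_with_mean and the choice to average over off-diagonal entries only are assumptions made for illustration; the library's own handling may differ.

```python
import numpy as np

# Hypothetical distance matrix: templates 0 and 2 share no channels,
# so their distance could not be computed.
dist_mat = np.array([
    [0.0, 0.2, np.nan],
    [0.2, 0.0, 0.6],
    [np.nan, 0.6, 0.0],
])


def fill_nan_with_mean(dist_mat):
    """Return a copy with NaNs replaced by the mean finite off-diagonal distance.

    Hypothetical helper, not part of EQcorrscan.
    """
    filled = dist_mat.copy()
    # Exclude the zero diagonal so self-distances do not bias the mean.
    off_diag = ~np.eye(len(filled), dtype=bool)
    mean_dist = np.nanmean(filled[off_diag])
    filled[np.isnan(filled)] = mean_dist
    return filled


filled = fill_nan_with_mean(dist_mat)
print(filled)
```

Because distance is 1 minus correlation, filling with the mean distance is equivalent to filling with the mean correlation over the same set of pairs.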

The distance matrix is computed in parallel, by default using all available cores (see cores). The method, metric, and order used to compute the linkage from the distance matrix can be controlled by passing scipy.cluster.hierarchy.linkage parameters as kwargs.
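To illustrate why the linkage method matters, the sketch below applies two scipy.cluster.hierarchy.linkage methods to the same made-up distances. It calls SciPy directly and does not call cluster itself; the helper n_groups and the distance values are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical distances forming a chain: 0 is near 1, 1 is near 2,
# but 0 and 2 are far apart.
dist_mat = np.array([
    [0.0, 0.5, 0.9],
    [0.5, 0.0, 0.6],
    [0.9, 0.6, 0.0],
])
dist_vec = squareform(dist_mat)
threshold = 0.7  # 1 - corr_thresh for corr_thresh = 0.3


def n_groups(**linkage_kwargs):
    """Count flat clusters for the given linkage keyword arguments."""
    Z = linkage(dist_vec, **linkage_kwargs)
    return len(set(fcluster(Z, t=threshold, criterion="distance")))


# Single linkage merges on the closest pair, so the chain joins into one group.
single = n_groups(method="single")
# Complete linkage merges on the farthest pair, so template 2 stays separate.
complete = n_groups(method="complete")
print(single, complete)
```

With these values single linkage yields one group and complete linkage yields two, so the choice of method passed through kwargs can change the grouping for the same corr_thresh.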

Parameters:
  • template_list (list) – List of tuples, each containing a template (obspy.core.stream.Stream) and its template id, to compute clustering for

  • show (bool) – plot linkage on screen if True, defaults to True

  • corr_thresh (float) – Cross-channel correlation threshold for grouping

  • shift_len (float) – How many seconds to allow the templates to shift

  • allow_individual_trace_shifts (bool) – Controls whether templates are shifted by shift_len in relation to the picks as a whole, or whether each trace can be shifted individually. Defaults to True.

  • save_corrmat (bool) – If True will save the distance matrix to dist_mat.npy in the local directory.

  • replace_nan_distances_with (None, 'mean', 'min', or float) – Controls how the clustering handles NaN distances in the distance matrix. None/False only performs a check, while other choices (e.g., 1, 'mean', 'min' or float) replace NaNs in the distance matrix.

  • cores (int or str) – Number of cores to use when computing the distance matrix. Defaults to 'all', which works out how many CPUs are available and uses all of them.

Returns:

List of groups. Each group is a list of obspy.core.stream.Stream making up that group.