The `clus` module

This module is a part of the pyOMA2 package and provides utility functions to support the implementation of clustering algorithms.

Functions:

kmeans(): Perform k-means clustering on the given feature array.
GMM(): Perform Gaussian Mixture Model (GMM) clustering on the given feature array.
hierarc(): Perform hierarchical clustering with specified parameters.
spectral(): Perform spectral clustering with the given similarity matrix.
affinity(): Perform affinity propagation clustering on the given similarity matrix.
optics(): Perform OPTICS clustering on the given pairwise distance matrix.
hdbscan(): Perform HDBSCAN clustering on the given pairwise distance matrix.
reorder_clusters(): Reorder cluster labels based on ascending frequency values.
post_freq_lim(): Filter clusters based on specified frequency range.
post_fn_med(): Filter clusters based on a median frequency threshold.
post_fn_IQR(): Filter clusters based on the interquartile range (IQR) of frequencies.
post_xi_IQR(): Filter clusters based on the interquartile range (IQR) of damping values.
post_min_size(): Filter clusters based on a minimum cluster size.
post_min_size_pctg(): Filter clusters based on a percentage of the largest cluster size.
post_min_size_kmeans(): Filter clusters based on size using k-means clustering.
post_min_size_gmm(): Filter clusters based on size using Gaussian Mixture Model (GMM).
post_merge_similar(): Merge clusters that are similar based on inter-medoid distances.
post_1xorder(): Ensure only one sample per order in each cluster.
post_MTT(): Remove outliers using the Modified Thompson Tau technique.
output_selection(): Select output results based on the specified selection method.
MTT(): Apply the Modified Thompson Tau technique to remove outliers.
filter_fl_list(): Filter and extract stable elements from a list of feature arrays.
vectorize_features(): Vectorize features by flattening them and indexing valid (non-NaN) entries.
build_tot_simil(): Compute a total similarity matrix by combining multiple distance matrices with weights.
build_tot_dist(): Compute a total distance matrix by combining multiple distance matrices with weights.
build_feature_array(): Build a feature array from multiple distance metrics with optional transformations.
oned_to_2d(): Convert a 1D array to a 2D array based on order and shape.
UnionFind: A Union-Find data structure for efficient disjoint set operations.
relative_difference_abs(): Compute the relative absolute difference between two values.
MAC_difference(): Compute the Modal Assurance Criterion (MAC) difference between two mode shapes.
dist_all_f(): Compute a pairwise distance matrix for a flattened 1D array using relative absolute difference.
dist_all_phi(): Compute a pairwise distance matrix for 3D mode shape data using the MAC difference.
dist_all_complex(): Compute pairwise relative distances for a 1D array of complex numbers.
dist_n_n1_f(): Compute distances between successive columns of a 2D array using relative differences.
dist_n_n1_phi(): Compute distances between successive columns of a 3D mode shape array using MAC differences.
dist_n_n1_f_complex(): Compute distances between successive columns of a 2D complex array using relative differences.
FuzzyCMeansClustering: Fuzzy C-Means clustering class implementation.
FCMeans(): Perform Fuzzy C-Means clustering on the given feature array.
post_adjusted_boxplot(): Filter clusters using the adjusted boxplot method.
adjusted_boxplot_bounds(): Compute adjusted boxplot bounds (used in outlier detection).

Created on Sun Nov 24 07:07:42 2024

@author: dagghe

pyoma2.functions.clus.FCMeans(feat_arr: ndarray) → ndarray[source]

Perform Fuzzy C-Means clustering on the given feature array.

Parameters:: feat_arr (ndarray of shape (n_samples, n_features)) – Input feature array for clustering.
Returns:: labels_all – Cluster labels for each sample. Labels are adjusted such that the first cluster corresponds to the smaller centroid (stable modes).
Return type:: ndarray of shape (n_samples,)

class pyoma2.functions.clus.FuzzyCMeansClustering(n_clusters: int = 2, m: float = 2.0, max_iter: int = 100, tol: float = 1e-05, random_state: int | None = None)[source]

Fuzzy C-Means clustering algorithm class.

Parameters:

n_clusters (int, default=2) – The number of clusters to form.
m (float, default=2.0) – Fuzziness parameter. Must be > 1.
max_iter (int, default=100) – Maximum number of iterations of the algorithm.
tol (float, default=1e-5) – Tolerance for convergence. The algorithm stops when the improvement between iterations is less than tol.
random_state (int or None, default=None) – Seed for membership matrix initialization.

__init__(n_clusters: int = 2, m: float = 2.0, max_iter: int = 100, tol: float = 1e-05, random_state: int | None = None) → None[source]

fit(X: ndarray, y: ndarray | None = None) → FuzzyCMeansClustering[source]

Compute fuzzy c-means clustering.

Parameters:: X (array-like of shape (n_samples, n_features)) – Training instances to cluster.
Return type:: self

fit_predict(X: ndarray, y: ndarray | None = None) → ndarray[source]

Compute cluster centers and predict cluster index for each sample.

Returns:: labels
Return type:: ndarray of shape (n_samples,)

predict(X: ndarray) → ndarray[source]

Predict the closest cluster each sample in X belongs to.

Parameters:: X (array-like of shape (n_samples, n_features)) – Samples to assign to clusters.
Returns:: labels – Index of the cluster each sample belongs to.
Return type:: ndarray of shape (n_samples,)
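As a rough illustration of the iteration that fuzzy c-means performs, the following pure-Python sketch alternates the standard membership and centre updates on 1D data. It is not the FuzzyCMeansClustering implementation: the function name fcm_1d, the explicit initial centres (the class instead seeds a membership matrix from random_state), and the centre-shift stopping rule are all assumptions made for the example.

```python
def fcm_1d(points, centers, m=2.0, max_iter=100, tol=1e-5):
    """Hypothetical 1D fuzzy c-means sketch; returns (centers, memberships, labels)."""
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)  # requires m > 1
    memberships = []
    for _ in range(max_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)).
        memberships = []
        for x in points:
            dists = [max(abs(x - c), 1e-12) for c in centers]  # guard against d = 0
            row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
            memberships.append(row)
        # Centre update: mean of the points weighted by u ** m.
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            weighted_sum = sum(w * x for w, x in zip(weights, points))
            new_centers.append(weighted_sum / sum(weights))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:  # assumed stopping rule: largest centre movement below tol
            break
    # Hard labels: index of the largest membership in each row.
    labels = [row.index(max(row)) for row in memberships]
    return centers, memberships, labels


points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centers, memberships, labels = fcm_1d(points, centers=[1.0, 4.0])
```

Each row of memberships sums to one, which is what distinguishes the fuzzy assignment from the hard labels returned by fit_predict().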

pyoma2.functions.clus.GMM(feat_arr: ndarray, dist: bool = False) → ndarray | tuple[numpy.ndarray, float][source]

Perform Gaussian Mixture Model (GMM) clustering on the given feature array.

Parameters:: feat_arr (ndarray of shape (n_samples, n_features)) – Input feature array for clustering.
Returns:: labels_all – Cluster labels for each sample. Labels are adjusted such that the first cluster corresponds to the smaller mean (stable modes).
Return type:: ndarray of shape (n_samples,)

pyoma2.functions.clus.MAC_difference(x: ndarray, y: ndarray) → float[source]

Compute the Modal Assurance Criterion (MAC) difference between two mode shapes.

Parameters:

x (ndarray) – First mode shape vector.
y (ndarray) – Second mode shape vector.

Returns:

The MAC difference between x and y, defined as 1 - MAC(x, y).

Return type:

float
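A minimal sketch of the quantity 1 - MAC(x, y), using the usual definition MAC = |xᴴy|² / ((xᴴx)(yᴴy)). The helper name mac_difference_sketch and the use of plain Python sequences instead of ndarrays are assumptions for illustration; the library function may handle shapes and edge cases differently.

```python
def mac_difference_sketch(x, y):
    """Return 1 - MAC(x, y) for two equal-length sequences of real or complex numbers."""
    # Hermitian inner product x^H y (the first vector is conjugated).
    cross = sum(complex(a).conjugate() * complex(b) for a, b in zip(x, y))
    norm_x = sum(abs(complex(a)) ** 2 for a in x)
    norm_y = sum(abs(complex(b)) ** 2 for b in y)
    mac = abs(cross) ** 2 / (norm_x * norm_y)
    return 1.0 - mac


scaled = mac_difference_sketch([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # same shape, different scale
orthogonal = mac_difference_sketch([1.0, 0.0], [0.0, 1.0])        # unrelated shapes
rotated = mac_difference_sketch([1, 1j], [1j, -1])                # y = 1j * x
```

Because MAC is insensitive to scaling and to a common phase rotation, the first and third calls give a difference of zero, while orthogonal vectors give one.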

pyoma2.functions.clus.MTT(arr: ndarray, indices: ndarray, alpha: float = 0.01) → ndarray[source]

Apply the Modified Thompson Tau technique to remove outliers.

Parameters:

arr (ndarray) – Array of values to filter.
indices (ndarray) – Indices of the values in the original dataset.
alpha (float, optional) – Significance level for outlier detection. Default is 0.01.

Returns:

ind – Indices of values that are not outliers.

Return type:

ndarray

class pyoma2.functions.clus.UnionFind(elements: list[int])[source]

A Union-Find data structure for efficient disjoint set operations.

parent

Maps each element to its parent in the disjoint-set forest.

Type:: dict

find(elem):: Find the root of the set containing elem with path compression.

union(elem1, elem2):: Merge the sets containing elem1 and elem2.

__init__(elements: list[int]) → None[source]
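The behaviour described above (a parent dict, find with path compression, and union) can be sketched as follows. The class name UnionFindSketch is hypothetical, and details such as which root survives a union are assumptions rather than the library's documented behaviour.

```python
class UnionFindSketch:
    """Illustrative disjoint-set forest backed by a parent dictionary."""

    def __init__(self, elements):
        # Every element starts as the root of its own singleton set.
        self.parent = {e: e for e in elements}

    def find(self, elem):
        # First pass: walk up to the root.
        root = elem
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every visited node at the root.
        while self.parent[elem] != root:
            next_elem = self.parent[elem]
            self.parent[elem] = root
            elem = next_elem
        return root

    def union(self, elem1, elem2):
        root1, root2 = self.find(elem1), self.find(elem2)
        if root1 != root2:
            self.parent[root2] = root1  # assumed choice: the first root survives


uf = UnionFindSketch([0, 1, 2, 3])
uf.union(0, 1)
uf.union(2, 3)
separate_before = uf.find(0) != uf.find(2)
uf.union(1, 3)
joined_after = uf.find(0) == uf.find(2)
```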

pyoma2.functions.clus.adjusted_boxplot_bounds(data: ndarray) → tuple[float, float][source]

Compute the lower and upper fences of the adjusted boxplot for skewed distributions.

For MC >= 0:
    lower_bound = Q1 - 1.5 * exp(-4 * MC) * IQR
    upper_bound = Q3 + 1.5 * exp(3 * MC) * IQR
For MC < 0:
    lower_bound = Q1 - 1.5 * exp(-3 * MC) * IQR
    upper_bound = Q3 + 1.5 * exp(4 * MC) * IQR

Parameters:: data (array-like) – 1D numeric data.
Returns:: (lower_bound, upper_bound)
Return type:: tuple
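The fence formulas above can be evaluated directly once the quartiles and MC are known. The sketch below assumes MC is the medcouple (the robust skewness measure used by the adjusted boxplot) and takes it as an input instead of computing it from data; the function name adjusted_fences is hypothetical.

```python
import math


def adjusted_fences(q1, q3, mc):
    """Evaluate the adjusted-boxplot fences for given quartiles and skewness MC."""
    iqr = q3 - q1
    if mc >= 0:
        lower = q1 - 1.5 * math.exp(-4.0 * mc) * iqr
        upper = q3 + 1.5 * math.exp(3.0 * mc) * iqr
    else:
        lower = q1 - 1.5 * math.exp(-3.0 * mc) * iqr
        upper = q3 + 1.5 * math.exp(4.0 * mc) * iqr
    return lower, upper


symmetric = adjusted_fences(10.0, 20.0, 0.0)     # MC = 0 reduces to the classic 1.5 * IQR fences
right_skewed = adjusted_fences(10.0, 20.0, 0.3)  # both fences shift towards the long right tail
```

With MC = 0 the exponentials equal one and the result is the standard Tukey boxplot interval, which is a quick sanity check on the formulas.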

pyoma2.functions.clus.affinity(dsim: ndarray) → ndarray[source]

Perform affinity propagation clustering on the given similarity matrix.

Parameters:: dsim (ndarray of shape (n_samples, n_samples)) – Precomputed similarity matrix for clustering.
Returns:: labels_clus – Cluster labels for each sample.
Return type:: ndarray of shape (n_samples,)

pyoma2.functions.clus.build_feature_array(distances: list[str], data_dict: dict[str, numpy.ndarray], ordmax: int, step: int, transform: str | None = None) → ndarray[source]

Build a feature array from multiple distance metrics with optional transformations.

Parameters:

distances (list of str) – A list of distance metrics to compute features (e.g., ‘dfn’, ‘dxi’, ‘dlambda’, ‘dMAC’, ‘dMPC’, ‘dMPD’).
data_dict (dict) – Dictionary containing data arrays corresponding to each distance metric. Expected keys include ‘Fns’, ‘Xis’, ‘Lambdas’, ‘Phis’, ‘MPC’, and ‘MPD’.
ordmax (int) – Maximum order to consider for feature computation.
step (int) – Step size for iterating through model orders.
transform (str, optional) – Transformation method for features, such as ‘box-cox’, by default None.

Returns:

A 2D feature array, where each column corresponds to a specific distance metric.

Return type:

np.ndarray

Raises:

AttributeError – If the transform is not ‘box-cox’ or None.

pyoma2.functions.clus.build_tot_dist(distances: list[str], data_dict: dict[str, numpy.ndarray], len_fl: int, weights: ndarray | str | None = None, sqrtsqr: bool = False) → ndarray[source]

Compute a total distance matrix by combining multiple distance matrices with weights.

Parameters:

distances (list of str) – A list of distance metrics (e.g., ‘dfn’, ‘dxi’, ‘dlambda’, ‘dMAC’, ‘dMPC’, ‘dMPD’).
data_dict (dict) – Dictionary containing data arrays corresponding to each distance metric. Expected keys include ‘Fn_fl’, ‘Xi_fl’, ‘Lambda_fl’, ‘Phi_fl’, ‘MPC_fl’, and ‘MPD_fl’.
len_fl (int) – The size of the resulting distance matrix (len_fl x len_fl).
weights (np.ndarray, optional) – Weights for each distance metric. By default, equal weights are assigned to all metrics.
sqrtsqr (bool, optional) – Whether to apply a squared-sum approach for combining distances, by default False.

Returns:

A total distance matrix (square, of shape (len_fl, len_fl)).

Return type:

np.ndarray

Raises:

AttributeError – If the lengths of distances and weights do not match.
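To make the idea of a weighted combination concrete, here is a small sketch that sums equally sized distance matrices element-wise. It works on nested lists instead of ndarrays, it does not cover the sqrtsqr option, and it interprets "equal weights" as 1/n per metric; all of these, along with the function name, are assumptions and not the library's documented behaviour.

```python
def combine_distance_matrices(matrices, weights=None):
    """Weighted element-wise sum of equally sized square matrices (nested lists)."""
    if weights is None:
        # Assumption: "equal weights" is taken to mean 1/n for each of the n metrics.
        weights = [1.0 / len(matrices)] * len(matrices)
    if len(weights) != len(matrices):
        # Mirrors the exception type listed under "Raises" above.
        raise AttributeError("distances and weights must have the same length")
    size = len(matrices[0])
    total = [[0.0] * size for _ in range(size)]
    for w, mat in zip(weights, matrices):
        for i in range(size):
            for j in range(size):
                total[i][j] += w * mat[i][j]
    return total


d_freq = [[0.0, 0.2], [0.2, 0.0]]  # made-up frequency distances
d_mac = [[0.0, 0.4], [0.4, 0.0]]   # made-up MAC distances
d_tot = combine_distance_matrices([d_freq, d_mac])
```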

pyoma2.functions.clus.build_tot_simil(distances: list[str], data_dict: dict[str, numpy.ndarray], len_fl: int, weights: ndarray | None = None) → ndarray[source]

Compute a total similarity matrix by combining multiple distance matrices with weights.

Parameters:

distances (list of str) – A list of distance metrics (e.g., ‘dfn’, ‘dxi’, ‘dlambda’, ‘dMAC’, ‘dMPC’, ‘dMPD’).
data_dict (dict) – Dictionary containing data arrays corresponding to each distance metric. Expected keys include ‘Fn_fl’, ‘Xi_fl’, ‘Lambda_fl’, ‘Phi_fl’, ‘MPC_fl’, and ‘MPD_fl’.
len_fl (int) – The size of the resulting similarity matrix (len_fl x len_fl).
weights (np.ndarray, optional) – Weights for each distance metric. Must sum to 1 if specified. By default None.

Returns:

A total similarity matrix (square, of shape (len_fl, len_fl)). Values are scaled between 0 and 1.

Return type:

np.ndarray

Raises:

AttributeError – If the weights do not sum to 1 or if the lengths of distances and weights do not match.

pyoma2.functions.clus.dist_all_complex(complex_array: ndarray) → ndarray[source]

Compute pairwise relative distances for a 1D array of complex numbers.

Parameters:: complex_array (np.ndarray) – Input array of complex numbers of shape (M,) or (M, N), flattened to 1D if 2D.
Returns:: Pairwise relative distance matrix of shape (P, P), where P is the number of valid (non-NaN) complex entries.
Return type:: np.ndarray

Notes

Relative distance is computed as the modulus of the difference divided by the maximum modulus.
Invalid values (NaNs or infinite values) are handled gracefully and set to 0.
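The note on relative distance can be sketched in pure Python as below. The sketch assumes that "maximum modulus" means the larger of the two moduli in each pair, drops NaN entries before building the matrix, and sets a pair with zero denominator to 0; the function name is hypothetical.

```python
import cmath


def pairwise_relative_distance(values):
    """Return a P x P nested list of |a - b| / max(|a|, |b|) over the non-NaN entries."""
    valid = [complex(v) for v in values if not cmath.isnan(v)]
    n = len(valid)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            denom = max(abs(valid[i]), abs(valid[j]))
            # A zero denominator means both values are zero; treat the distance as 0.
            d = abs(valid[i] - valid[j]) / denom if denom > 0 else 0.0
            dist[i][j] = dist[j][i] = d
    return dist


poles = [1 + 0j, 2 + 0j, complex(float("nan"), 0.0), 0 + 2j]
dist = pairwise_relative_distance(poles)  # 3 x 3: the NaN entry is dropped
```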

pyoma2.functions.clus.dist_all_f(array: ndarray) → ndarray[source]

Compute a pairwise distance matrix for a flattened 1D array using relative absolute difference.

Parameters:: array (np.ndarray) – Input array of shape (M,) or (M, N) to compute pairwise distances. If 2D, the array is flattened column-wise (Fortran order).
Returns:: Pairwise distance matrix of shape (P, P), where P is the number of non-NaN entries in array.
Return type:: np.ndarray

pyoma2.functions.clus.dist_all_phi(array: ndarray) → ndarray[source]

Compute a pairwise distance matrix for 3D mode shape data using the MAC difference.

Parameters:: array (np.ndarray) – Input 3D array of mode shapes with shape (M, N, K). Each slice array[i, :, :] represents the mode shape data for one observation.
Returns:: Pairwise distance matrix of shape (P, P), where P is the number of non-NaN rows in the reshaped array.
Return type:: np.ndarray

pyoma2.functions.clus.dist_n_n1_f(array: ndarray, ordmin: int, ordmax: int, step: int) → ndarray[source]

Compute distances between successive columns of a 2D array using relative differences.

Parameters:

array (np.ndarray) – Input 2D array of shape (M, N), where M is the number of rows and N is the number of columns.
ordmin (int) – Minimum order for computing distances.
ordmax (int) – Maximum order for computing distances.
step (int) – Step size for iterating through model orders.

Returns:

A 1D array of distances between successive columns, with NaN entries handled appropriately.

Return type:

np.ndarray

pyoma2.functions.clus.dist_n_n1_f_complex(array: ndarray, ordmin: int, ordmax: int, step: int) → ndarray[source]

Compute distances between successive columns of a 2D complex array using relative differences.

Parameters:

array (np.ndarray) – Input 2D array of complex numbers with shape (M, N). Each column represents a different order.
ordmin (int) – Minimum order for computing distances.
ordmax (int) – Maximum order for computing distances.
step (int) – Step size for iterating through model orders.

Returns:

A 1D array of relative distances between successive columns, with NaN entries handled appropriately.

Return type:

np.ndarray

pyoma2.functions.clus.dist_n_n1_phi(array: ndarray, ordmin: int, ordmax: int, step: int) → ndarray[source]

Compute distances between successive columns of a 3D mode shape array using MAC differences.

Parameters:

array (np.ndarray) – Input 3D array of mode shapes with shape (M, N, K). Each slice array[:, :, k] represents the mode shape data for one observation.
ordmin (int) – Minimum order for computing distances.
ordmax (int) – Maximum order for computing distances.
step (int) – Step size for iterating through model orders.

Returns:

A 1D array of distances between successive columns, with NaN entries handled appropriately.

Return type:

np.ndarray

pyoma2.functions.clus.filter_fl_list(fl_list: list[numpy.ndarray | None], stab_lab: ndarray) → list[numpy.ndarray | None][source]

Filter and extract stable elements from a list of feature arrays.

Parameters:

fl_list (list of ndarray) – List of feature arrays, where each array represents a specific feature.
stab_lab (ndarray) – Indices of stable elements in the feature arrays.

Returns:

List of extracted feature arrays, where only stable elements are retained.

Return type:

list of ndarray

pyoma2.functions.clus.hdbscan(dtot: ndarray, min_size: int) → ndarray[source]

Perform HDBSCAN clustering on the given pairwise distance matrix.

Parameters:

dtot (ndarray of shape (n_samples, n_samples)) – Pairwise distance matrix for clustering.
min_size (int) – Minimum cluster size and minimum number of samples for clustering.

Returns:

labels_clus – Cluster labels for each sample.

Return type:

ndarray of shape (n_samples,)

pyoma2.functions.clus.hierarc(dtot: ndarray, dc: float | str | None, linkage: str, n_clusters: int | str | None, ordmax: int, step: float, Fns: ndarray, Phis: ndarray) → ndarray[source]

Perform hierarchical clustering with specified parameters.

Parameters:

dtot (ndarray of shape (n_samples, n_samples)) – Pairwise distance matrix for clustering.
dc (float or str, optional) – Distance threshold for clustering. Special string options include:
- “mu+2sig”: Mean plus two standard deviations of distances.
- “95weib”: 95th percentile of a Weibull distribution fit to the distances.
- “auto”: Automatic threshold estimation based on kernel density estimation (KDE).
n_clusters (int or str, optional) – Number of clusters. If “auto”, it is calculated as 25% of the maximum order.
linkage ({'complete', 'average', 'single'}, optional) – Linkage criterion for hierarchical clustering.
ordmax (int) – Maximum order of clustering.
step (float) – Step size for computing distances.
Fns (ndarray) – Frequencies for distance calculation.
Phis (ndarray) – Mode shapes for distance calculation.

Returns:

labels_clus – Cluster labels for each sample.

Return type:

ndarray of shape (n_samples,)
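Two of the automatic settings described above are simple enough to sketch. The code below shows one way to evaluate a “mu+2sig” threshold and an “auto” cluster count; which distances the library feeds into the threshold, whether it uses the population or sample standard deviation, and how it rounds 25% of the maximum order are not stated here, so those choices (and both function names) are assumptions.

```python
from statistics import mean, pstdev


def threshold_mu_plus_2sigma(distances):
    """Mean plus two (population) standard deviations of a list of distances."""
    return mean(distances) + 2.0 * pstdev(distances)


def auto_cluster_count(ordmax):
    """25% of the maximum model order, truncated to an integer (assumed rounding)."""
    return int(0.25 * ordmax)


dc = threshold_mu_plus_2sigma([0.1, 0.2, 0.3])  # made-up distances
n_clusters = auto_cluster_count(40)
```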

pyoma2.functions.clus.kmeans(feat_arr: ndarray) → ndarray[source]

Perform k-means clustering on the given feature array.

Parameters:: feat_arr (ndarray of shape (n_samples, n_features)) – Input feature array for clustering.
Returns:: labels_all – Cluster labels for each sample. Labels are adjusted such that the first cluster corresponds to the smaller centroid (stable modes).
Return type:: ndarray of shape (n_samples,)

pyoma2.functions.clus.oned_to_2d(list_array1d: list[numpy.ndarray | None], order: ndarray, shape: tuple[int, int], step: int) → list[numpy.ndarray] | None[source]

Convert a 1D array to a 2D array based on order and shape.

Parameters:

list_array1d (list of np.ndarray or None) – The input 1D arrays to reshape. Entries may be None.
order (np.ndarray) – Model order array corresponding to the data points in each 1D array.
shape (tuple of int) – The desired shape of each output 2D array.
step (int) – Step size for iterating through model orders.

Returns:

A list of 2D arrays reshaped from the 1D inputs, with NaNs where no data is present.

Return type:

list of np.ndarray or None

pyoma2.functions.clus.optics(dtot: ndarray, min_size: int) → ndarray[source]

Perform OPTICS clustering on the given pairwise distance matrix.

Parameters:

dtot (ndarray of shape (n_samples, n_samples)) – Pairwise distance matrix for clustering.
min_size (int) – Minimum cluster size and minimum number of samples for clustering.

Returns:

labels_clus – Cluster labels for each sample.

Return type:

ndarray of shape (n_samples,)

pyoma2.functions.clus.output_selection(select: str, clusters: dict[int, numpy.ndarray], flattened_results: tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray], medoid_indices: ndarray | None) → tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray | None][source]

Select output results based on the specified selection method.

Parameters:

select (str) – Selection method. Options include:
- “medoid”: Select medoids of clusters.
- “avg”: Select average values of clusters.
- “fn_mean_close”: Select samples with frequency closest to cluster mean.
- “xi_med_close”: Select samples with damping closest to cluster median.
clusters (dict) – Dictionary of clusters where keys are cluster labels and values are arrays of indices.
flattened_results (tuple of ndarray) – Tuple containing:
- Fn_fl : ndarray of shape (n_samples,)
  Frequencies corresponding to each sample.
- Xi_fl : ndarray of shape (n_samples,)
  Damping values corresponding to each sample.
- Phi_fl : ndarray of shape (n_samples, n_channels)
  Mode shape corresponding to each sample.
- order_fl : ndarray of shape (n_samples,)
  Order values corresponding to each sample.
medoid_indices (ndarray, optional) – Indices of medoids for each cluster.

Returns:

Fn_out (ndarray) – Selected frequency values based on the chosen method.
Xi_out (ndarray) – Selected damping values based on the chosen method.
Phi_out (ndarray) – Selected mode shapes based on the chosen method.
order_out (ndarray) – Selected order values based on the chosen method.

pyoma2.functions.clus.post_1xorder(clusters: dict[int, numpy.ndarray], labels: ndarray, dtot: ndarray, order_fl: ndarray) → tuple[dict[int, numpy.ndarray], numpy.ndarray][source]

Ensure only one sample per order in each cluster.

Parameters:

clusters (dict) – Dictionary of clusters where keys are cluster labels and values are arrays of indices.
labels (ndarray of shape (n_samples,)) – Array of cluster labels for each sample.
dtot (ndarray of shape (n_samples, n_samples)) – Pairwise distance matrix.
order_fl (ndarray of shape (n_samples,)) – Order values corresponding to each sample.

Returns:

clusters (dict) – Updated dictionary of clusters with only one sample per order.
labels (ndarray of shape (n_samples,)) – Updated cluster labels reflecting refined clusters.

pyoma2.functions.clus.post_MTT(clusters: dict[int, numpy.ndarray], labels: ndarray, flattened_results: tuple[numpy.ndarray, numpy.ndarray]) → tuple[dict[int, numpy.ndarray], numpy.ndarray][source]

Remove outliers using the Modified Thompson Tau technique.

Parameters:

clusters (dict) – Dictionary of clusters where keys are cluster labels and values are arrays of indices.
labels (ndarray of shape (n_samples,)) – Array of cluster labels for each sample.
flattened_results (tuple of ndarray) – Tuple containing:
- Fn_fl : ndarray of shape (n_samples,)
  Frequencies corresponding to each sample.
- Xi_fl : ndarray of shape (n_samples,)
  Damping values corresponding to each sample.

Returns:

clusters (dict) – Updated dictionary of clusters with outliers removed.
labels (ndarray of shape (n_samples,)) – Updated cluster labels with outliers assigned -1.

pyoma2.functions.clus.post_adjusted_boxplot(clusters: dict[int, numpy.ndarray], labels: ndarray, flattened_results: tuple[numpy.ndarray, numpy.ndarray]) → tuple[dict[int, numpy.ndarray], numpy.ndarray][source]

Remove outliers using the adjusted boxplot method.

For each cluster, the function computes the adjusted boxplot boundaries for both frequency and damping, then marks as outliers those observations that do not lie within the respective inlier intervals for both measures. Outliers are assigned a label of -1. The clusters dictionary is updated to include only the remaining (non-outlier) indices.

Parameters:

clusters (dict) – Dictionary of clusters where keys are cluster labels and values are arrays of indices.
labels (ndarray of shape (n_samples,)) – Array of cluster labels for each sample.
flattened_results (tuple of ndarray) – Tuple containing:
- Fn_fl : ndarray of shape (n_samples,)
  Frequencies corresponding to each sample.
- Xi_fl : ndarray of shape (n_samples,)
  Damping values corresponding to each sample.

Returns:

clusters (dict) – Updated dictionary of clusters with outliers removed.
labels (ndarray of shape (n_samples,)) – Updated cluster labels with outliers assigned -1.

pyoma2.functions.clus.post_fn_IQR(clusters: dict[int, numpy.ndarray], labels: ndarray, Fn_fl: ndarray) → tuple[dict[int, numpy.ndarray], numpy.ndarray][source]

Filter clusters based on the interquartile range (IQR) of frequencies.

Parameters:

clusters (dict) – Dictionary of clusters where keys are cluster labels and values are arrays of indices.
labels (ndarray of shape (n_samples,)) – Array of cluster labels for each sample.
Fn_fl (ndarray of shape (n_samples,)) – Frequencies corresponding to each sample.

Returns:

clusters (dict) – Updated dictionary of clusters with outliers removed.
labels (ndarray of shape (n_samples,)) – Updated cluster labels with outliers assigned -1.

pyoma2.functions.clus.post_fn_med(clusters: dict[int, numpy.ndarray], labels: ndarray, flattened_results: tuple[numpy.ndarray, numpy.ndarray]) → tuple[dict[int, numpy.ndarray], numpy.ndarray][source]

Filter clusters based on a median frequency threshold.

Parameters:

clusters (dict) – Dictionary of clusters where keys are cluster labels and values are arrays of indices.
labels (ndarray of shape (n_samples,)) – Array of cluster labels for each sample.
flattened_results (tuple of ndarray) – Tuple containing:
- Fn_fl : ndarray of shape (n_samples,)
  Frequencies corresponding to each sample.
- Fn_std_fl : ndarray of shape (n_samples,)
  Standard deviations corresponding to frequencies.

Returns:

clusters (dict) – Updated dictionary of clusters with outliers removed.
labels (ndarray of shape (n_samples,)) – Updated cluster labels with outliers assigned -1.

pyoma2.functions.clus.post_freq_lim(clusters: dict[int, numpy.ndarray], labels: ndarray, freq_lim: tuple[float, float], Fn_fl: ndarray) → tuple[dict[int, numpy.ndarray], numpy.ndarray][source]

Filter clusters based on specified frequency range.

Parameters:

clusters (dict) – Dictionary of clusters where keys are cluster labels and values are arrays of indices.
labels (ndarray of shape (n_samples,)) – Array of cluster labels for each sample.
freq_lim (tuple of float) – Minimum and maximum allowable frequencies (inclusive).
Fn_fl (ndarray of shape (n_samples,)) – Frequencies corresponding to each sample.

Returns:

clusters (dict) – Updated dictionary of clusters with outliers removed.
labels (ndarray of shape (n_samples,)) – Updated cluster labels with outliers assigned -1.

pyoma2.functions.clus.post_merge_similar(clusters: dict[int, numpy.ndarray], labels: ndarray, dtot: ndarray, merge_dist: float) → tuple[dict[int, numpy.ndarray], numpy.ndarray][source]

Merge clusters that are similar based on inter-medoid distances.

Parameters:

clusters (dict) – Dictionary of clusters where keys are cluster labels and values are arrays of indices.
labels (ndarray of shape (n_samples,)) – Array of cluster labels for each sample.
dtot (ndarray of shape (n_samples, n_samples)) – Pairwise distance matrix.
merge_dist (float) – Maximum distance threshold for merging clusters.

Returns:

clusters (dict) – Updated dictionary of clusters after merging.
labels (ndarray of shape (n_samples,)) – Updated cluster labels reflecting merged clusters.
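A rough sketch of merging by inter-medoid distance is given below. It takes the medoid of a cluster to be the member with the smallest summed distance to the other members, merges two clusters when their medoids are strictly closer than merge_dist, and does not recompute medoids after a merge. These rules, the function names, and the use of nested lists in place of ndarrays are assumptions; the sketch also returns only the merged dictionary, not the updated labels.

```python
def medoid(indices, dtot):
    """Index in `indices` with the smallest total distance to the other members."""
    return min(indices, key=lambda i: sum(dtot[i][j] for j in indices))


def merge_similar_sketch(clusters, dtot, merge_dist):
    """Merge clusters whose medoids are closer than merge_dist (illustrative only)."""
    medoids = {lab: medoid(idx, dtot) for lab, idx in clusters.items()}
    group = {lab: lab for lab in clusters}  # cluster label -> representative label
    labs = sorted(clusters)
    for pos, a in enumerate(labs):
        for b in labs[pos + 1:]:
            if dtot[medoids[a]][medoids[b]] < merge_dist:
                keep, drop = group[a], group[b]
                # Relabel everything in b's group so it joins a's group.
                for lab in labs:
                    if group[lab] == drop:
                        group[lab] = keep
    merged = {}
    for lab in labs:
        merged.setdefault(group[lab], []).extend(clusters[lab])
    return merged


positions = [0.0, 0.1, 0.15, 0.25, 5.0, 5.1]  # made-up 1D samples
dtot = [[abs(a - b) for b in positions] for a in positions]
clusters = {0: [0, 1], 1: [2, 3], 2: [4, 5]}
merged = merge_similar_sketch(clusters, dtot, merge_dist=0.5)
```

In this toy data the first two clusters sit within 0.5 of each other and are merged, while the third stays separate.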

pyoma2.functions.clus.post_min_size(clusters: dict[int, numpy.ndarray], labels: ndarray, min_size: int) → tuple[dict[int, numpy.ndarray], numpy.ndarray][source]

Filter clusters based on a minimum cluster size.

Parameters:

clusters (dict) – Dictionary of clusters where keys are cluster labels and values are arrays of indices.
labels (ndarray of shape (n_samples,)) – Array of cluster labels for each sample.
min_size (int) – Minimum allowable cluster size.

Returns:

clusters (dict) – Updated dictionary of clusters with small clusters removed.
labels (ndarray of shape (n_samples,)) – Updated cluster labels with small clusters assigned -1.
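The effect of a minimum-size filter on the clusters dictionary and the labels array can be sketched as follows. The example uses plain lists, treats a cluster with exactly min_size members as large enough, and uses a hypothetical function name; whether the library's comparison is inclusive is an assumption.

```python
def drop_small_clusters(clusters, labels, min_size):
    """Keep clusters with at least min_size members; relabel the rest as -1."""
    kept = {}
    new_labels = list(labels)
    for label, indices in clusters.items():
        if len(indices) >= min_size:
            kept[label] = list(indices)
        else:
            # Samples of a discarded cluster are marked as noise.
            for i in indices:
                new_labels[i] = -1
    return kept, new_labels


clusters = {0: [0, 1, 2], 1: [3], 2: [4, 5]}
labels = [0, 0, 0, 1, 2, 2]
kept, new_labels = drop_small_clusters(clusters, labels, min_size=2)
```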

pyoma2.functions.clus.post_min_size_gmm(labels: ndarray) → tuple[dict[int, numpy.ndarray], numpy.ndarray][source]

Filter clusters based on size using Gaussian Mixture Model (GMM).

Parameters:

labels (ndarray of shape (n_samples,)) – Array of cluster labels for each sample.

Returns:

clusters (dict) – Updated dictionary of clusters with smaller clusters removed.
labels (ndarray of shape (n_samples,)) – Updated cluster labels with small clusters assigned -1.

pyoma2.functions.clus.post_min_size_kmeans(labels: ndarray) → tuple[dict[int, numpy.ndarray], numpy.ndarray][source]

Filter clusters based on size using k-means clustering.

Parameters:

labels (ndarray of shape (n_samples,)) – Array of cluster labels for each sample.

Returns:

clusters (dict) – Updated dictionary of clusters with smaller clusters removed.
labels (ndarray of shape (n_samples,)) – Updated cluster labels with small clusters assigned -1.

pyoma2.functions.clus.post_min_size_pctg(clusters: dict[int, numpy.ndarray], labels: ndarray, min_pctg: float) → tuple[dict[int, numpy.ndarray], numpy.ndarray][source]

Filter clusters based on a percentage of the largest cluster size.

Parameters:

clusters (dict) – Dictionary of clusters where keys are cluster labels and values are arrays of indices.
labels (ndarray of shape (n_samples,)) – Array of cluster labels for each sample.
min_pctg (float) – Minimum allowable cluster size as a percentage of the largest cluster.

Returns:

clusters (dict) – Updated dictionary of clusters with small clusters removed.
labels (ndarray of shape (n_samples,)) – Updated cluster labels with small clusters assigned -1.

pyoma2.functions.clus.post_xi_IQR(clusters: dict[int, numpy.ndarray], labels: ndarray, Xi_fl: ndarray) → tuple[dict[int, numpy.ndarray], numpy.ndarray][source]

Filter clusters based on the interquartile range (IQR) of damping values.

Parameters:

clusters (dict) – Dictionary of clusters where keys are cluster labels and values are arrays of indices.
labels (ndarray of shape (n_samples,)) – Array of cluster labels for each sample.
Xi_fl (ndarray of shape (n_samples,)) – Damping values corresponding to each sample.

Returns:

clusters (dict) – Updated dictionary of clusters with outliers removed.
labels (ndarray of shape (n_samples,)) – Updated cluster labels with outliers assigned -1.

pyoma2.functions.clus.relative_difference_abs(x: ndarray, y: ndarray) → float[source]

Compute the relative absolute difference between two values.

Parameters:

x (ndarray) – First input value, expected to be a single-element array.
y (ndarray) – Second input value, expected to be a single-element array.

Returns:

Relative absolute difference between x and y. Returns infinity if x is zero.

Return type:

float
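A minimal sketch of a relative absolute difference on plain scalars. Normalising by |x| is an assumption inferred from the remark that the result is infinite when x is zero; the library function operates on single-element arrays and may differ in detail.

```python
import math


def relative_difference_sketch(x, y):
    """Return |x - y| / |x|, or infinity when x is zero."""
    if x == 0:
        return math.inf
    return abs(x - y) / abs(x)


close = relative_difference_sketch(2.0, 2.5)      # 0.5 / 2.0
undefined = relative_difference_sketch(0.0, 1.0)  # division by zero avoided
```

Note that with this normalisation the measure is not symmetric: swapping x and y changes the denominator.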

pyoma2.functions.clus.reorder_clusters(clusters: dict[int, numpy.ndarray], labels: ndarray, Fn_fl: ndarray) → tuple[dict[int, numpy.ndarray], numpy.ndarray][source]

Reorder cluster labels based on ascending frequency values.

Parameters:

clusters (dict) – Dictionary of clusters where keys are cluster labels and values are arrays of indices.
labels (ndarray of shape (n_samples,)) – Array of cluster labels for each sample.
Fn_fl (ndarray of shape (n_samples,)) – Frequencies corresponding to each sample.

Returns:

new_clusters (dict) – Reordered dictionary of clusters with updated labels.
new_labels (ndarray of shape (n_samples,)) – Array of updated cluster labels.
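The relabelling can be sketched as below, using plain lists. Sorting clusters by their mean frequency is an assumption (the library may use another representative value such as the median), as are the function name and the choice to leave noise samples labelled -1 unchanged.

```python
from statistics import mean


def reorder_by_frequency(clusters, labels, freqs):
    """Renumber clusters 0, 1, 2, ... in order of ascending mean frequency."""
    ordered = sorted(clusters, key=lambda lab: mean(freqs[i] for i in clusters[lab]))
    mapping = {old: new for new, old in enumerate(ordered)}
    new_clusters = {mapping[old]: list(clusters[old]) for old in ordered}
    # Labels that are not cluster keys (for example -1 for noise) stay at -1.
    new_labels = [mapping.get(lab, -1) for lab in labels]
    return new_clusters, new_labels


freqs = [5.0, 5.1, 1.0, 1.1, 9.0]  # made-up frequencies in Hz
clusters = {0: [0, 1], 1: [2, 3]}
labels = [0, 0, 1, 1, -1]
new_clusters, new_labels = reorder_by_frequency(clusters, labels, freqs)
```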

pyoma2.functions.clus.spectral(dsim: ndarray, n_clusters: int | str | None, ordmax: int) → ndarray[source]

Perform spectral clustering with the given similarity matrix.

Parameters:

dsim (ndarray of shape (n_samples, n_samples)) – Similarity matrix for clustering.
n_clusters (int or str, optional) – Number of clusters. If “auto”, it is calculated as 25% of the maximum order.
ordmax (int) – Maximum order.

Returns:

labels_clus – Cluster labels for each sample.

Return type:

ndarray of shape (n_samples,)

pyoma2.functions.clus.vectorize_features(features: list[numpy.ndarray | None], non_nan_index: ndarray) → list[numpy.ndarray | None][source]

Vectorize features by flattening them and indexing valid (non-NaN) entries.

Parameters:

features (list of np.ndarray) – A list of 2D or 3D arrays where each array represents a feature.
non_nan_index (np.ndarray) – Indices of non-NaN entries in a flattened array.

Returns:

List in which each feature is vectorized with its NaN entries removed. If a feature is None, its corresponding output is None. For 2D features, the output is a 1D array. For 3D features, the output is a 2D array with shape (len(non_nan_index), feature.shape[2]).

Return type:

list of np.ndarray
