Table 1

List of common “mines” of model selection and clustering discussed in the paper


| Mine | Issue | Suggestion | Example |
| --- | --- | --- | --- |
| Mine #1 | Selecting models without noticing it | Be aware of the assumptions behind analysis methods; treat the choice among different algorithms as a model selection problem | Chari et al. (2021) |
| Mine #2 | Overfitting with overly complex models | Use statistical model selection tools that penalize models with too many parameters (see the information-criterion sketch below) | Polynomial fitting |
| Mine #3 | Selecting from a pool of poorly fitting models might lead to false confidence | Simulate data from each of the tested models multiple times and test whether the real data are sufficient to distinguish among the competing models (see the model-recovery sketch below) | Figure 1b |
| Mine #4 | Different information criteria might favor different models | Consider the strengths and limitations of the different approaches (Table 2); simulated data can be used to test which model selection method is the most reliable for the given problem | Figure 1c (AIC favors an overfitted model), Figure 1e (BIC chooses an oversimplified model), Figure 1f; Evans (2019) |
| Mine #5 | Model selection might be sensitive to parameters ignored by the tested models | Avoid model classes that are too restrictive to account for data heterogeneity | Chandrasekaran et al. (2018) |
| Mine #6 | Cross-validation techniques are prone to overfitting | Consider the data-splitting approach proposed by Genkin and Engel, in which the optimal model complexity is determined by calculating the Kullback-Leibler (KL) divergence | Genkin and Engel (2020) |
| Mine #7 | Agglomerative hierarchical clustering is sensitive to outliers | Consider divisive methods | Figure 2c; Varshavsky et al. (2008) |
| Mine #8 | K-means clustering might converge to local minima | Repeat the algorithm several times from different starting centroid locations (see the K-means sketch below) | Figure 2e, right |
| Mine #9 | The number of clusters is not known a priori | Use the elbow method, the gap statistic, or model selection approaches (see the elbow-method sketch below) | Figure 2e, left |
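For Mines #2 and #4, a minimal sketch of penalized model selection on the polynomial-fitting example: polynomials of increasing degree are fit by least squares and scored with AIC and BIC under an assumed Gaussian noise model. The data, degrees, and noise level are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: a quadratic trend with Gaussian noise (not from the paper).
x = np.linspace(-1, 1, 40)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.3, size=x.size)

def information_criteria(x, y, degree):
    """Fit a polynomial by least squares and return (AIC, BIC) under Gaussian noise."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    n = y.size
    k = degree + 1 + 1          # polynomial coefficients plus the noise variance
    sigma2 = np.mean(resid**2)  # maximum-likelihood estimate of the noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    aic = 2 * k - 2 * loglik
    bic = k * np.log(n) - 2 * loglik
    return aic, bic

for degree in range(1, 9):
    aic, bic = information_criteria(x, y, degree)
    print(f"degree {degree}: AIC = {aic:7.2f}, BIC = {bic:7.2f}")

# The degree minimizing each criterion is the selected model; AIC and BIC can
# disagree because BIC penalizes extra parameters more heavily for large n.
```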
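For Mine #3, a minimal model-recovery sketch: data are simulated many times from each candidate model, every candidate is refit to each simulated dataset, and a confusion matrix records how often the generating model is the one selected. The candidate models (linear vs. quadratic), coefficients, and noise level are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 40)
noise_sd = 0.5
n_sim = 200

# Candidate generating models (illustrative): coefficients in np.polyval order.
true_models = {
    1: np.array([2.0, 1.0]),        # linear: 2x + 1
    2: np.array([-3.0, 2.0, 1.0]),  # quadratic: -3x^2 + 2x + 1
}
degrees = sorted(true_models)

def bic_for_fit(x, y, degree):
    """BIC of a least-squares polynomial fit under a Gaussian noise model."""
    resid = y - np.polyval(np.polyfit(x, y, degree), x)
    n, k = y.size, degree + 2
    loglik = -0.5 * n * (np.log(2 * np.pi * np.mean(resid**2)) + 1)
    return k * np.log(n) - 2 * loglik

# Confusion matrix: rows = generating model, columns = selected model.
confusion = np.zeros((len(degrees), len(degrees)), dtype=int)
for i, gen_deg in enumerate(degrees):
    for _ in range(n_sim):
        y = np.polyval(true_models[gen_deg], x) + rng.normal(0, noise_sd, x.size)
        bics = [bic_for_fit(x, y, d) for d in degrees]
        confusion[i, int(np.argmin(bics))] += 1

print(confusion)
# A strong diagonal suggests the data can distinguish the candidate models;
# substantial off-diagonal mass warns that an apparent preference among
# poorly distinguishable models may be false confidence.
```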
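For Mine #8, a minimal K-means sketch comparing a single random initialization with repeated restarts, keeping the solution with the lowest within-cluster sum of squares. The use of scikit-learn and the synthetic blob data are assumptions for illustration; the paper does not prescribe a particular implementation.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Illustrative data: six Gaussian blobs (not from the paper).
X, _ = make_blobs(n_samples=600, centers=6, cluster_std=0.8, random_state=0)

# A single random initialization can converge to a poor local minimum.
single = KMeans(n_clusters=6, init="random", n_init=1, random_state=5).fit(X)

# Restarting from several random centroid locations and keeping the best
# solution (lowest within-cluster sum of squares, `inertia_`) mitigates this.
restarts = KMeans(n_clusters=6, init="random", n_init=25, random_state=5).fit(X)

print("inertia, single init :", round(single.inertia_, 1))
print("inertia, 25 restarts :", round(restarts.inertia_, 1))
```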
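For Mine #9, a minimal elbow-method sketch: the within-cluster sum of squares is computed over a range of cluster numbers, and the point where its decrease flattens (the "elbow") suggests a value of k. Again, scikit-learn and the synthetic data are illustrative assumptions; the gap statistic would additionally compare these values against a reference distribution with no cluster structure.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Illustrative data with three clusters (not from the paper).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

# Elbow method: within-cluster sum of squares (inertia) as a function of k.
for k in range(1, 9):
    inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(f"k = {k}: inertia = {inertia:10.1f}")

# Inertia drops sharply up to the true number of clusters and then flattens;
# the 'elbow' of this curve is taken as the suggested k.
```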