14.5 Exercises

  1. Repeat the external validation of the 2011 Gapminder dataset, using Ward linkage instead of complete linkage, and also for other clustering methods and external groupings (4 regions, 8 regions, etc.). Do the results change dramatically?

  2. Cluster the Iowa Housing, Vowel, Wisconsin Breast Cancer, and Wine datasets using \(k\)-means, for various distance metrics and algorithm parameters. What is your best estimation for the number of clusters in each case? Validate your results.

  3. Cluster the Iowa Housing, Vowel, Wisconsin Breast Cancer, and Wine datasets using hierarchical clustering, for various algorithm parameters. Validate your results.

  4. Cluster the Iowa Housing, Vowel, Wisconsin Breast Cancer, and Wine datasets using DBSCAN, for various algorithm parameters. Validate your results.

  5. Cluster the Iowa Housing, Vowel, Wisconsin Breast Cancer, and Wine datasets using spectral clustering, for various algorithm parameters. Validate your results.

  6. Cluster the Iowa Housing, Vowel, Wisconsin Breast Cancer, and Wine datasets using expectation-maximization clustering, for various algorithm parameters. Validate your results.

  7. Cluster the Iowa Housing, Vowel, Wisconsin Breast Cancer, and Wine datasets using affinity propagation clustering, for various algorithm parameters. Validate your results.

  8. Cluster the Iowa Housing, Vowel, Wisconsin Breast Cancer, and Wine datasets using fuzzy clustering, for various algorithm parameters. Validate your results.

  9. Cluster the Iowa Housing, Vowel, Wisconsin Breast Cancer, and Wine datasets by combining the results of exercises 2 to 8. Validate your results.

  10. Cluster the datasets GlobalCitiesPBI.csv, 2016collisionsfinal.csv, polls_us_election_2016.csv, UniversalBank.csv, HR_2016_Census_simple.csv (as described in the Exercises sections of Modules 8, 9, and 11), the Gapminder dataset, or any other datasets of interest, using the approaches discussed in this module (or other appropriate approaches). Validate your results. Where do difficulties arise? What decisions must you make along the way? How could you use the results?
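Several of these exercises call for external validation, that is, comparing a clustering with a known grouping such as the Gapminder regions or the Wine cultivar labels. As a possible starting point, here is a minimal sketch of one common external index, the adjusted Rand index (ARI), in plain Python. It is not necessarily the index used elsewhere in this module; the function name and the toy labels are hypothetical, and scikit-learn users can call `sklearn.metrics.adjusted_rand_score` instead.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same observations.

    Returns 1.0 when the partitions agree up to relabelling, values near 0.0
    for chance-level agreement, and possibly negative values below chance.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same observations")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivially identical

    # Contingency table: number of observations in each (group, cluster) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    group_sizes = Counter(labels_true)    # row sums: external groups
    cluster_sizes = Counter(labels_pred)  # column sums: clusters found

    # Count pairs of observations placed together in both partitions.
    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_groups = sum(comb(a, 2) for a in group_sizes.values())
    sum_clusters = sum(comb(b, 2) for b in cluster_sizes.values())

    # Expected index under random labelling with the same group/cluster sizes.
    expected = sum_groups * sum_clusters / comb(n, 2)
    max_index = (sum_groups + sum_clusters) / 2

    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one group.
        return 1.0
    return (index - expected) / (max_index - expected)


# Hypothetical toy example: six observations with an external "region" label.
regions = ["Africa", "Africa", "Asia", "Asia", "Europe", "Europe"]
print(adjusted_rand_index(regions, [2, 2, 0, 0, 1, 1]))  # same grouping, relabelled
print(adjusted_rand_index(regions, [0, 1, 0, 1, 0, 1]))  # disagrees with the regions
```

The first call prints 1.0 because the ARI ignores the cluster labels themselves and only compares which observations are grouped together; the second is negative, indicating less agreement than expected by chance. To compare linkages as in exercise 1, one would compute this index once for the Ward clustering and once for the complete-linkage clustering against the same external grouping.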