16.5 Exercises

  1. Use other metrics and parameter values to find distance-based anomalies in the artificial dataset.

  2. Use other metrics and parameter values to find LOF outliers in the artificial dataset.

  3. Use other metrics and parameter values to find DBSCAN/HDBSCAN/OPTICS outliers in the artificial dataset.

  4. Use other metrics and parameter values to find Isolation Forest outliers in the artificial dataset.

  5. Find distance-based anomalies in the datasets GlobalCitiesPBI.csv, 2016collisionsfinal.csv, polls_us_election_2016.csv, HR_2016_Census_simple.csv, UniversalBank.csv, and algae_bloom.csv (as described in the Exercises sections of Modules 8, 9, and 11).

  6. Find density-based anomalies in the datasets GlobalCitiesPBI.csv, 2016collisionsfinal.csv, polls_us_election_2016.csv, HR_2016_Census_simple.csv, UniversalBank.csv, and algae_bloom.csv.

  7. Find categorical anomalies in the datasets GlobalCitiesPBI.csv, 2016collisionsfinal.csv, UniversalBank.csv, and algae_bloom.csv.

  8. Find projection-based anomalies in the datasets GlobalCitiesPBI.csv, 2016collisionsfinal.csv, polls_us_election_2016.csv, HR_2016_Census_simple.csv, UniversalBank.csv, and algae_bloom.csv.

  9. Find subspace-based anomalies in the datasets GlobalCitiesPBI.csv, 2016collisionsfinal.csv, polls_us_election_2016.csv, HR_2016_Census_simple.csv, UniversalBank.csv, and algae_bloom.csv.

  10. Find ensemble-based anomalies in the datasets GlobalCitiesPBI.csv, 2016collisionsfinal.csv, polls_us_election_2016.csv, HR_2016_Census_simple.csv, UniversalBank.csv, and algae_bloom.csv.

  11. Conduct an analysis of anomalous observations in the 2011 Gapminder data (as described in Modules 12, 13, 14, and 15).

  12. Consider the dataset flights1_2019_1.csv.

    1. Explore and visualize the dataset.

    2. Do any observations appear to be anomalous or outlying? Justify your answer.

    3. If necessary, reduce the dimensionality of the dataset prior to analysis.

    4. Using at least four anomaly detection algorithms, identify anomalous observations in the dataset.

    5. Can you validate the results?
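As a possible starting point for Exercises 1 to 4 and 12.4, here is a minimal Python/scikit-learn sketch. It does not reproduce the module's artificial dataset. The function `make_artificial_data` is a hypothetical stand-in (two Gaussian blobs plus five planted far-away points), and the other helper names are illustrative too. The sketch varies the metric and the main parameter of a k-nearest-neighbour distance detector and of DBSCAN. It then combines four detectors (kNN distance, LOF, DBSCAN noise, Isolation Forest) with a simple "flagged by at least 3 of 4" vote. That voting rule is an assumption made for illustration, not a method prescribed by the text.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors


def make_artificial_data(n_inliers=200, seed=0):
    """Hypothetical stand-in for the artificial dataset: two Gaussian
    blobs plus five planted points far from both blobs."""
    rng = np.random.default_rng(seed)
    half = n_inliers // 2
    blob1 = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(half, 2))
    blob2 = rng.normal(loc=(4.0, 4.0), scale=0.5, size=(n_inliers - half, 2))
    planted = np.array([[-4.0, 6.0], [8.0, -2.0], [2.0, 10.0],
                        [10.0, 10.0], [-5.0, -5.0]])
    X = np.vstack([blob1, blob2, planted])
    planted_idx = set(range(n_inliers, n_inliers + len(planted)))
    return X, planted_idx


def knn_outliers(X, k=5, metric="euclidean", n_flag=5):
    """Distance-based: flag the n_flag points with the largest distance
    to their k-th nearest neighbour (the point itself is excluded)."""
    nn = NearestNeighbors(n_neighbors=k + 1, metric=metric).fit(X)
    dist, _ = nn.kneighbors(X)  # column 0 is each point's distance to itself
    return {int(i) for i in np.argsort(dist[:, -1])[-n_flag:]}


def lof_outliers(X, k=20, metric="euclidean", contamination=0.025):
    """Density-based: Local Outlier Factor labels outliers as -1."""
    lof = LocalOutlierFactor(n_neighbors=k, metric=metric,
                             contamination=contamination)
    return {int(i) for i in np.flatnonzero(lof.fit_predict(X) == -1)}


def dbscan_outliers(X, eps=1.0, min_samples=5, metric="euclidean"):
    """Density-based: DBSCAN noise points (label -1) are treated as outliers."""
    labels = DBSCAN(eps=eps, min_samples=min_samples, metric=metric).fit_predict(X)
    return {int(i) for i in np.flatnonzero(labels == -1)}


def iforest_outliers(X, n_estimators=100, contamination=0.025, seed=0):
    """Isolation Forest: points isolated after few random splits get label -1."""
    iso = IsolationForest(n_estimators=n_estimators,
                          contamination=contamination, random_state=seed)
    return {int(i) for i in np.flatnonzero(iso.fit_predict(X) == -1)}


def vote_counts(flag_sets, n):
    """Simple ensemble: count how many detectors flag each observation."""
    votes = np.zeros(n, dtype=int)
    for flagged in flag_sets:
        votes[list(flagged)] += 1
    return votes


X, planted = make_artificial_data()

# Exercises 1-4 style: vary the metric and the main parameter of a method.
for metric in ["euclidean", "manhattan", "chebyshev"]:
    for k in [3, 5, 10]:
        hits = len(knn_outliers(X, k=k, metric=metric) & planted)
        print(f"kNN metric={metric:<9} k={k:<2} planted points found: {hits}/5")
for eps in [0.5, 1.0, 2.0]:
    print(f"DBSCAN eps={eps}: {len(dbscan_outliers(X, eps=eps))} noise points")

# Exercise 12.4 style: four detectors, one vote each (illustrative rule).
flags = [knn_outliers(X), lof_outliers(X), dbscan_outliers(X), iforest_outliers(X)]
votes = vote_counts(flags, len(X))
consensus = {int(i) for i in np.flatnonzero(votes >= 3)}
print("flagged by at least 3 of 4 detectors:", sorted(consensus))
```

Because the planted points are known here, recovering them gives one rough check on the parameter choices. On real data such as flights1_2019_1.csv no such ground truth exists, so agreement between detectors is only supporting evidence, not validation.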