## 9.6 Exercises

1. Find examples of data presentations that you consider to be particularly insightful and/or powerful. Discuss their strengths and weaknesses.

2. Find examples of data presentations that you consider to be particularly misleading and/or useless. Discuss their strengths and weaknesses.

3. How do you think new technologies (e.g., virtual or augmented reality, 3D printing, wearable computing) will influence data presentations?

4. Consider the following datasets: GlobalCitiesPBI.csv, 2016collisionsfinal.csv, polls_us_election_2016.csv, and HR_2016_Census_simple.csv. For each dataset:

   1. Create a data dictionary for the dataset. Establish a list of variables that you think are crucial to a good understanding of the dataset. Justify your choices.

   2. Create (at least) 5 bivariate/univariate visualizations that can help you understand the dataset.

   3. Produce (at least) 3 “definitive” visualizations for the dataset. Use the principles discussed in class (including documentation, legends, annotations, Multiple I’s, etc.). Emphasis should be placed on content AND on presentation. Suggestion: consider creating a reasonably high number of charts, each using a random selection of a random number of variables, in order to minimize the odds of missing out on useful information.

5. Repeat the previous question with any dataset of your liking.

6. Identify a scenario for which a dashboard could prove useful. Determine specific questions that the dashboard could help answer or insights that it could provide. Identify data sources and data elements that could be fed into your dashboard. Design a display (with pen and paper) with mock charts. What are the strengths and limitations of your dashboard? Is it functional? Elegant?

7. The following questions use the Gapminder Tools (an offline version is also available).

   1. At what point in the data science workflow do you think that visualizations of this nature could be useful?

   2. What are the ways in which observations could be anomalous? Have you found any such anomalies? Do you have explanations for them? In particular, consider the case of South Africa in 2012, which appears to be a clear outlier. Follow the path of the South African bubble from 1975 to 2020, in relation to the general pattern. Does the apartheid/income inequity explanation suggested in the text still make sense?

   3. Pick 2+ “definitive” visualizations (methods, variables, etc.) other than the default configuration. What are some important insights?

   4. How would you describe the insights of step 3 without resorting to visual vocabulary?

   5. Can you think of ways in which the data of interest to you in your day-to-day activities could benefit from the same treatment? What situations could you explore in such a scenario? How would that help your team better understand the system under consideration?

8. Consider the following Australian population figures, by state (in thousands):

   | Year | NSW  | Vic. | Qld  | SA   | WA   | Tas. | NT  | ACT | Aust. |
   |------|------|------|------|------|------|------|-----|-----|-------|
   | 1917 | 1904 | 1409 | 683  | 440  | 306  | 193  | 5   | 3   | 4941  |
   | 1927 | 2402 | 1727 | 873  | 565  | 392  | 211  | 4   | 8   | 6182  |
   | 1937 | 2693 | 1853 | 993  | 589  | 457  | 233  | 6   | 11  | 6836  |
   | 1947 | 2985 | 2055 | 1106 | 646  | 502  | 257  | 11  | 17  | 7579  |
   | 1957 | 3625 | 2656 | 1413 | 873  | 688  | 326  | 21  | 38  | 9640  |
   | 1967 | 4295 | 3274 | 1700 | 1110 | 879  | 375  | 62  | 103 | 11799 |
   | 1977 | 5002 | 3837 | 2130 | 1286 | 1204 | 415  | 104 | 214 | 14192 |
   | 1987 | 5617 | 4210 | 2675 | 1393 | 1496 | 449  | 158 | 265 | 16264 |
   | 1997 | 6274 | 4605 | 3401 | 1480 | 1798 | 474  | 187 | 310 | 18532 |
   | 2007 | 6889 | 5205 | 4182 | 1585 | 2106 | 493  | 215 | 340 | 21017 |
   | 2017 | 7861 | 6324 | 4928 | 1723 | 2580 | 521  | 246 | 410 | 24599 |

   1. Graph the New South Wales (NSW) population with all defaults using plot(). Redo the graph by adding a title, a line to connect the points, and some colour.

   2. Compare the population of New South Wales (NSW) and the Australian Capital Territory (ACT) by using the functions plot() and lines(), then add a legend to appropriately display your graph.

   3. Use a bar chart to graph the population of Queensland (QLD), add an appropriate title to your graph, and display the years from 1917 to 2017 on the appropriate bars.

   4. Create a light blue histogram for the population of South Australia (SA).
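
As a possible starting point for the last exercise, the population figures can be entered in R as a data frame. This is only a sketch of the data setup, not a solution: the object name `aus_pop` and the column names without trailing periods (`Vic`, `Tas`, `Aust`) are illustrative choices, not names given in the text.

```r
# Australian population by state, in thousands, transcribed from the table
# in this exercise. The name aus_pop and the column names are assumptions.
aus_pop <- data.frame(
  Year = seq(1917, 2017, by = 10),
  NSW  = c(1904, 2402, 2693, 2985, 3625, 4295, 5002, 5617, 6274, 6889, 7861),
  Vic  = c(1409, 1727, 1853, 2055, 2656, 3274, 3837, 4210, 4605, 5205, 6324),
  Qld  = c(683, 873, 993, 1106, 1413, 1700, 2130, 2675, 3401, 4182, 4928),
  SA   = c(440, 565, 589, 646, 873, 1110, 1286, 1393, 1480, 1585, 1723),
  WA   = c(306, 392, 457, 502, 688, 879, 1204, 1496, 1798, 2106, 2580),
  Tas  = c(193, 211, 233, 257, 326, 375, 415, 449, 474, 493, 521),
  NT   = c(5, 4, 6, 11, 21, 62, 104, 158, 187, 215, 246),
  ACT  = c(3, 8, 11, 17, 38, 103, 214, 265, 310, 340, 410),
  Aust = c(4941, 6182, 6836, 7579, 9640, 11799, 14192, 16264, 18532, 21017, 24599)
)

# Inspect the structure before plotting: 11 rows (one per decade), 10 columns.
str(aus_pop)
```

With the data in this form, a single state's series is available as a column (for instance `aus_pop$NSW`), which is the kind of vector that plot(), lines(), and the other base graphics functions accept.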