15.1 Data Reduction for Insight

For small datasets, the benefits of data mining may not always be evident. Consider, for instance, the following excerpt from a lawn mowing instruction manual (which we consider to be data for the time being):

Before starting your mower inspect it carefully to ensure that there are no loose parts and that it is in good working order.

It is a fairly short and organized way to convey a message. It could be further shortened and organized, perhaps, but it’s not clear that one would gain much from the process.

15.1.1 Reduction of an NHL Game

For a meatier example, consider the NHL game that took place between the Ottawa Senators and the Toronto Maple Leafs on February 18, 2017 [291].

As a first approximation, we shall think of a hockey game as a series of sequential and non-overlapping “events” involving two teams of skaters. What does it mean to have extracted useful insights from such a series of events?

At some level, the most complete raw understanding of that night’s game belongs to the game’s active and passive participants (players, referees, coaches, general managers, official scorer and time-keeper, etc.).286

The larger group of individuals who attended the game in person, watched it on TV/Internet, or listened to it on the radio presumably also has many of the facts at its disposal, with some contamination, as it were, by commentators (in the latter two cases).

Presumably, the participants and the witnesses also possess insights into the specific game: how could that information best be relayed to members of the public who did not catch the game? There are many ways to do so, depending on the intended level of abstraction and on the target audience (see Figure 15.1).

Figure 15.1: A schematic diagram of data reduction as it could apply to a professional hockey game.

Play-by-Play Text File

If a hockey game is a series of events, why not simply list the events in the order in which they occurred? Of course, not everything that happens in the “raw” game requires reporting; it might be impressive to see Auston Matthews skate by Dion Phaneuf on his way to the Senators’ net at the 8:45 mark of the 2nd period, say, but reporting this “event” would only serve to highlight the fact that Matthews is a better skater than Phaneuf. That may well be true, but some level of filtering must be applied in order to retain only relevant (or “high-level”) information, such as:

blocked shots, face-off wins, giveaways, goals, hits, missed shots, penalties, power play events, saves, shorthanded events, shots on goal, stoppage (goalie stopped, icing, offside, puck in benches), takeaways, etc.
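The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to produce the actual play-by-play file: the event records, the field names (`period`, `elapsed_s`, `type`, `team`) and the event-type labels in `RELEVANT_TYPES` are all hypothetical.

```python
# Minimal sketch: reduce a hypothetical "raw" event stream to a play-by-play list.
# Field names and event-type labels are illustrative assumptions, not an official schema.

# Event types retained in the play-by-play (loosely following the list above).
RELEVANT_TYPES = {
    "BLOCKED_SHOT", "FACEOFF", "GIVEAWAY", "GOAL", "HIT", "MISSED_SHOT",
    "PENALTY", "SAVE", "SHOT", "STOPPAGE", "TAKEAWAY",
}


def to_play_by_play(raw_events):
    """Return the relevant events only, in chronological order."""
    # Sort by period, then by seconds elapsed in the period.
    ordered = sorted(raw_events, key=lambda e: (e["period"], e["elapsed_s"]))
    # Filtering: drop anything that is not a reportable event type.
    return [e for e in ordered if e["type"] in RELEVANT_TYPES]


# Hypothetical raw events (elapsed_s = seconds elapsed in the period).
raw = [
    {"period": 2, "elapsed_s": 525, "type": "SKATE_PAST", "team": "TOR"},
    {"period": 1, "elapsed_s": 158, "type": "BLOCKED_SHOT", "team": "TOR"},
    {"period": 1, "elapsed_s": 0, "type": "FACEOFF", "team": "OTT"},
]

pbp = to_play_by_play(raw)
print([e["type"] for e in pbp])  # ['FACEOFF', 'BLOCKED_SHOT']
```

The impressive skating play is simply dropped; what survives is governed entirely by the choice of `RELEVANT_TYPES`, which is itself an analytic decision.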

In a typical game, between 300 and 400 events are recorded (see Figure 15.2 for an extract of the play-by-play file for the game under consideration; the full list is found at [291]).

Figure 15.2: Play-by-play extract, Ottawa Senators @ Toronto Maple Leafs, February 18, 2017 [291].

A certain amount of knowledge about the sport is required to make sense of some of the entries (colouring, use of bold text, etc.), but if one has the patience, one can pretty much re-constitute the flow of the game. This approach is, of course, fully descriptive.

Boxscore

The play-by-play does convey the game’s events, but the relevance of its entries is sometimes questionable. In the general context of the game, how useful is it to know that Nikita Zaitsev blocked a shot by Erik Karlsson at the 2:38 mark of the 1st period? Had this blocked shot saved a certain Ottawa goal or led directly to a Toronto goal, one could have argued for its inclusion in the list of crucial events to report, but only the most fastidious observer (or a statistical analyst) would bemoan its removal from the game’s report.

The game’s boxscore provides relevant information, at the cost of completeness: it distills the play-by-play file into a series of meaningful statistics and summaries, providing insights into the game that even a fan in attendance might have missed while the game was going on (see Figure 15.3).
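As a rough sketch of how a boxscore distills the play-by-play, the Python fragment below tallies events by team and type with `collections.Counter`. The event records are hypothetical (they are not taken from the actual game), and the function name `boxscore_counts` is an illustrative assumption; real boxscores also involve per-player and time-on-ice statistics that this sketch ignores.

```python
from collections import Counter


def boxscore_counts(play_by_play):
    """Collapse an ordered event list into (team, event type) tallies.

    The order of events is discarded: this is the information the boxscore
    gives up in exchange for a compact summary.
    """
    return Counter((e["team"], e["type"]) for e in play_by_play)


# Hypothetical play-by-play fragment (not the actual game's events).
pbp = [
    {"period": 1, "team": "OTT", "type": "FACEOFF"},
    {"period": 1, "team": "OTT", "type": "SHOT"},
    {"period": 1, "team": "TOR", "type": "BLOCKED_SHOT"},
    {"period": 1, "team": "OTT", "type": "GOAL"},
    {"period": 2, "team": "TOR", "type": "FACEOFF"},
    {"period": 2, "team": "OTT", "type": "FACEOFF"},
]

box = boxscore_counts(pbp)
# Counter returns 0 for (team, type) pairs that never occurred.
print(box[("OTT", "FACEOFF")], box[("TOR", "FACEOFF")])  # 2 1
```

Once the events are tallied, questions such as “who won the faceoff battle?” become one-line lookups, but the sequence of play can no longer be recovered.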

Figure 15.3: Advanced Boxscore, Ottawa Senators @ Toronto Maple Leafs, February 18, 2017 [291].

Once again, a certain amount of knowledge about the sport is required to make sense of the statistics, and to place them in the right context: is it meaningful that the Senators won 36 faceoffs to the Maple Leafs’ 31? That Mark Stone was a +4 on the night? That both teams went 1-for-4 on the powerplay? One cannot re-constitute the full flow of the game from the boxscore alone, but the approach is not solely descriptive – questions can be asked, and answers provided… the analytical game is afoot!

Recap/Highlights

One of the boxscore’s shortcomings is that it does not provide much in the way of narrative, which has become a staple of sports reporting – what really happened during that game? How does it impact the current season for either team?

Associated Press, 19 February 2017 TORONTO
The Ottawa Senators have the Atlantic Division lead in their sights.

Mark Stone had a goal and four assists, Derick Brassard scored twice in the third period and the Senators recovered after blowing a two-goal lead to beat the Toronto Maple Leafs 6-3 on Saturday night.

The Senators pulled within two points of Montreal for first place in the Atlantic Division with three games in hand. “We like where we’re at. We’re in a good spot,” Stone said. “But there’s a little bit more that we want. Obviously, there’s teams coming and we want to try and create separation, so the only way to do that is keep winning hockey games.”

Ottawa led 2-0 after one period but trailed 3-2 in the third before getting a tying goal from Mike Hoffman and a power-play goal from Brassard. Stone and Brassard added empty-netters, and Chris Wideman and Ryan Dzingel also scored for the Senators. Ottawa has won four of five overall and three of four against the Leafs this season. Craig Anderson stopped 34 shots.

Morgan Rielly, Nazem Kadri and William Nylander scored and Auston Matthews had two assists for the Maple Leafs. Frederik Andersen allowed four goals on 40 shots. Toronto has lost eight of 11 and entered the night with a tenuous grip on the final wild-card spot in the Eastern Conference.

“The reality is we’re all big boys, we can read the standings. You’ve got to win hockey games,” Babcock said.

After Nylander made it 3-2 with a power-play goal 2:04 into the third, Hoffman tied it by rifling a shot from the right faceoff circle off the post and in. On a power play 54 seconds later, Andersen stopped Erik Karlsson’s point shot, but Brassard jumped on the rebound and put it in for a 4-3 lead.

Wideman started the scoring in the first, firing a point shot through traffic moments after Stone beat Nikita Zaitsev for a puck behind the Leafs goal. Dzingel added to the lead when he deflected Marc Methot’s point shot 20 seconds later.

Andersen stopped three shots during a lengthy 5-on-3 during the second period, and the Leafs got on the board about three minutes later. Rielly scored with 5:22 left in the second by chasing down a wide shot from Matthews, carrying it to the point and shooting through a crowd in front.

About three minutes later, Zaitsev fired a shot from the right point that sneaked through Anderson’s pads and slid behind the net. Kadri chased it down and banked it off Dzingel’s helmet and in for his 24th goal of the season. Dzingel had fallen in the crease trying to prevent Kadri from stuffing the rebound in.

“Our game plan didn’t change for the third period, and that’s just the maturity we’re gaining over time,” Senators coach Guy Boucher said. “Our leaders have been doing a great job, but collectively, the team has grown dramatically in terms of having poise, executing under pressure.”

Game notes: Mitch Marner sat out for Toronto with an upper-body injury. Marner leads Toronto with 48 points and is also expected to sit Sunday night against Carolina.

UP NEXT Senators: Host Winnipeg on Sunday night. Maple Leafs: Travel to Carolina for a game Sunday night.

Simple Boxscore

A hockey pool participant might be interested in the fact that Auston Matthews spent nearly 4 minutes on the powerplay (see Figure 15.3), but a casual observer is likely to find the full boxscore monstrous overkill. How much crucial information does Figure 15.4 lose, and how much does it still provide?

Figure 15.4: Simple Boxscore, Ottawa Senators @ Toronto Maple Leafs, February 18, 2017 [291].
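The reduction from a list of scoring plays to a simple line score can be sketched as follows. The goals (scorer, team, period) are read off the Associated Press recap above; exact times are omitted, and the recap does not say in which order the two empty-net goals came, which does not affect the per-period counts. The function name `score_by_period` is hypothetical.

```python
from collections import defaultdict

# (scorer, team, period) for each goal, as described in the AP recap above.
GOALS = [
    ("Wideman", "OTT", 1), ("Dzingel", "OTT", 1),
    ("Rielly", "TOR", 2), ("Kadri", "TOR", 2),
    ("Nylander", "TOR", 3), ("Hoffman", "OTT", 3), ("Brassard", "OTT", 3),
    ("Stone", "OTT", 3), ("Brassard", "OTT", 3),  # empty-net goals
]


def score_by_period(goals, teams=("OTT", "TOR"), periods=(1, 2, 3)):
    """Reduce a list of goals to goals per period, followed by the total."""
    counts = {team: defaultdict(int) for team in teams}
    for _scorer, team, period in goals:
        counts[team][period] += 1  # the scorer's name is discarded here
    return {
        team: [counts[team][p] for p in periods] + [sum(counts[team].values())]
        for team in teams
    }


for team, line in score_by_period(GOALS).items():
    print(team, *line)
# OTT 2 0 4 6
# TOR 0 2 1 3
```

Note what is lost: the scorers’ names, the power-play and empty-net context, and the fact that Ottawa trailed 3-2 in the third all disappear from the final line score.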

Headline

If one takes the view that human beings impose a narrative on sporting events (rather than unearth it), it could be argued that the only “true” informational content is found in the following headline:

Sens rally after blowing lead, beat Leafs, gain on Habs. [291]

Visualization

It is easy to get lost in row after row of statistics and event descriptions, or in large bodies of text (doubly so for a machine, in the latter case). Visualizations can help complement our understanding of any data analytic situation. While visualizations can be appealing on their own, a certain amount of external context is required to make sense of most of them (see Figure 15.5).

Figure 15.5: Visualizations, Ottawa Senators @ Toronto Maple Leafs, February 18, 2017: offensive zone unblocked shots heat map (top left), gameflow chart, Corsi +/-, all situations (top right), player shift chart (bottom left), shots and goals (bottom right) [292].

General Context

A document which is prepared for analysis is often part of a more general context or collection. Can the analysis of all the games between the Senators and the Maple Leafs shed some light on their rivalry on the ice? Obviously, the more arcane the representation method, the more in-depth knowledge of the game and its statistics is required, but to those in the know, summaries and visualizations can provide valuable insight (see Figure 15.6).

Figure 15.6: A schematic diagram of data reduction as it applies to a corpus of professional hockey games, with visualizations and summaries of regular season games between the Ottawa Senators and Toronto Maple Leafs (1993-2017).

There are thus various ways to understand a single hockey game (and a series of games), depending on the desired (or required) levels of abstraction and complexity. But like all quantitative methods, data reduction for insight is subject to analytic choices: you may have noticed that we conspicuously avoided reporting on playoff results, and on post-2017 results. Would the overall “understanding” of the game in question (and of the rivalry, in general) change if they were included?287
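The effect of such analytic choices can be made concrete with a small Python sketch that summarizes a corpus of games under different filters. The `Game` record and the `head_to_head` function are hypothetical; only the first row corresponds to the game discussed in this section, and the remaining rows are placeholders invented for illustration, not actual Senators-Leafs results.

```python
from dataclasses import dataclass


@dataclass
class Game:
    season_end_year: int  # e.g. 2017 for the 2016-17 season
    playoffs: bool
    ott_goals: int
    tor_goals: int


def head_to_head(games, include_playoffs=False, last_season=2017):
    """Summarize a filtered corpus of OTT-TOR games as win counts.

    Tied games (possible in older seasons) count toward "games" only.
    """
    kept = [
        g for g in games
        if (include_playoffs or not g.playoffs) and g.season_end_year <= last_season
    ]
    return {
        "games": len(kept),
        "OTT": sum(g.ott_goals > g.tor_goals for g in kept),
        "TOR": sum(g.tor_goals > g.ott_goals for g in kept),
    }


games = [
    Game(2017, False, 6, 3),  # the February 18, 2017 game discussed above
    Game(2010, False, 4, 1),  # placeholder, invented
    Game(2002, True, 0, 3),   # placeholder, invented
    Game(2019, False, 2, 4),  # placeholder, invented
]

print(head_to_head(games))
# {'games': 2, 'OTT': 2, 'TOR': 0}
print(head_to_head(games, include_playoffs=True, last_season=2030))
# {'games': 4, 'OTT': 2, 'TOR': 2}
```

With the default filters, the placeholder corpus suggests Ottawa dominance; including the playoff and post-2017 rows makes the series look even. The code is trivial, but the choice of `include_playoffs` and `last_season` is not.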


Clearly, the specific details of data reduction as applied to a hockey game are not always portable, but the main concept is.

15.1.2 Meaning in Macbeth

It is a tale told by an idiot, full of sound and fury, signifying nothing. [Macbeth, V.5, line 30]

In a sense, in order to extract the full meaning out of a document, said document needs to be read and understood in its entirety.288 But even if we have the luxury of doing so, some issues appear:

  • do all readers extract the same meaning?

  • does meaning stay constant over time?

  • is meaning retained by the language of the document?

  • do the author’s intentions constitute the true (baseline) meaning?

  • does re-reading the document change its meaning?  

Given the uncertain nature of what a document’s meaning actually is, it is counter-productive to talk about insight or meaning (in the singular); rather we look for insights and meanings (in the plural). Consider the following passage from Macbeth (Act I, Scene 5, Lines 45-52):

[Enter MACBETH]
LADY MACBETH: Great Glamis, worthy Cawdor,
Greater than both, by the all-hail hereafter,
Thy letters have transported me beyond
This ignorant present, and I feel now
The future in the instant
MACBETH: My dearest love, Duncan comes here tonight.
LADY MACBETH: And when goes hence?
MACBETH: Tomorrow, as he purposes.

What is the “meaning” of this scene? What is the “meaning” of Macbeth as a whole? As a starting point, it’s crucial to note that the “meaning” of the scene is likely not independent of the play’s context up to this scene (a description of the plot in modern prose is provided in [293]).

Does the plot description carry the same “meaning” as the play itself? What about TVTropes’s laconic description of Macbeth [294]:

Hen-pecked Scottish nobleman murders his king and spends the rest of the play regretting it.

Or Mister Apple’s haiku description (same site)?

Macbeth and his wife
      Want to become the royals
           So they kill ’em all.

Or this literary description, from an unknown author?

Macbeth dramatizes the battle between good and evil, exploring the psychological effects of King Duncan’s murder on Macbeth and Lady Macbeth. His conflicting feelings of guilt and ambition embody this timeless battle of good vs evil.

Or yet again the (fantastic) 2001 movie Scotland, PA, featuring James LeGros, Maura Tierney, and Christopher Walken [295]?

For non-native English speakers (and for a number of native speakers as well, it should be said…), the play (to say nothing of the quoted passage above) might prove difficult to parse and understand.

A modern translation (which is a form of data reduction) is available at No Fear Shakespeare, shedding some light on the semantic role of the scene:

MACBETH enters.
LADY MACBETH: Great thane of Glamis! Worthy thane of Cawdor! You’ll soon be greater than both those titles, once you become king! Your letter has transported me from the present moment, when who knows what will happen, and has made me feel like the future is already here.
MACBETH: My dearest love, Duncan is coming here tonight.
LADY MACBETH: And when is he leaving?
MACBETH: He plans to leave tomorrow.

Consider, also, the French translation by François-Victor Hugo:

Entre MACBETH.
LADY MACBETH, continuant: Grand Glamis! Digne Cawdor! plus grand que tout cela par le salut futur! Ta lettre m’a transportée au delà de ce présent ignorant, et je ne sens plus dans l’instant que l’avenir.
MACBETH: Mon cher amour, Duncan arrive ici ce soir.
LADY MACBETH: Et quand repart-il?
MACBETH: Demain… C’est son intention.

Do these all carry the same Macbeth essence? Do they all even carry a Macbeth essence? Are they all Macbeth? How much, if anything, of Macbeth do they preserve? The French translation, for instance, adds a very ominous tone to Macbeth’s last retort to his wife. Those of us who have read the rest of the play know that the tone is in keeping with the events that will eventually transpire, but does the translation add some foreshadowing that is simply not present up to that point in the original? If so, does it matter?


One way or another, similar questions must be addressed when investigating aspects of the universe through data analysis; we have already alluded to this problem in Data Science Basics (see Figure 7.2, in particular).

References

[291]
[292]
Natural Stat Trick, “Ottawa Senators @ Toronto Maple Leafs Game Log.” 2017.
[293]
Wikipedia, “Macbeth.”
[294]
TVTropes.org, “Laconic Macbeth.”
[295]
W. Morrissette, “Scotland, PA.” 2001.