Modern Data Science with R by Benjamin Baumer, Daniel Kaplan, Nicholas Horton


By comparing the percentage of the vote, we can control for the size of the voting population in each district. Similarly, it makes less sense to focus on the total amount of money spent than on the percentage of money spent. We present the same comparison, but with both axes scaled to percentages. First, there does appear to be a positive association between the percentage of money supporting a candidate and the percentage of votes that they earn. However, that relationship is of greatest interest towards the center of the plot, where elections are actually contested.
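The rescaling described above can be illustrated with a minimal Python sketch. The district names and all dollar and vote figures are invented for illustration and do not come from the book's data; `to_percent` is a hypothetical helper.

```python
# Hypothetical districts, invented for illustration:
# (name, money supporting candidate, total money, votes for candidate, total votes)
districts = [
    ("A", 1_200_000, 2_000_000, 150_000, 250_000),
    ("B", 300_000, 1_000_000, 40_000, 100_000),
    ("C", 50_000, 100_000, 9_000, 20_000),
]


def to_percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Scale both quantities to percentages so that districts of very
# different sizes can be compared on the same axes.
scaled = [
    (name, to_percent(money, total_money), to_percent(votes, total_votes))
    for name, money, total_money, votes, total_votes in districts
]

for name, pct_money, pct_votes in scaled:
    print(f"{name}: {pct_money:.1f}% of money, {pct_votes:.1f}% of votes")
```

In raw totals, district A dwarfs district C; after scaling, both sit near the contested center of the plot, which is where the comparison is most informative.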

Today, the manner in which we extract meaning from data is different in two ways, both due primarily to advances in computing: (1) we are able to compute many more things than we could before, and (2) we have a lot more data than we had before. The first change makes computationally intensive techniques practical (e.g., the bootstrap, permutation tests). The second change means that many of the data we now collect are observational: they don’t come from a designed experiment and they aren’t really sampled at random. This makes developing realistic probability models for these data much more challenging, which in turn makes formal statistical inference a more challenging (and perhaps less relevant) problem.
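To make the bootstrap concrete, here is a minimal Python sketch of a percentile bootstrap confidence interval for a mean. The sample values, the function name `bootstrap_ci`, and its default parameters are all assumptions chosen for illustration; this is one simple variant of the technique, not the book's own implementation.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` (a sketch, not a reference implementation)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    n = len(sample)
    # Resample with replacement, recompute the statistic each time, then sort.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lower_index = round((alpha / 2) * n_resamples)
    upper_index = round((1 - alpha / 2) * n_resamples) - 1
    return estimates[lower_index], estimates[upper_index]


# Hypothetical measurements, invented for illustration.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 3.0]
low, high = bootstrap_ci(sample)
print(f"observed mean: {statistics.mean(sample):.2f}")
print(f"95% bootstrap interval: ({low:.2f}, {high:.2f})")
```

The method needs no formula for the sampling distribution, only the ability to recompute the statistic thousands of times, which is why it depends on cheap computing.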

Our view is that it is not. First, over the last half century, a coherent set of simple data operations has been developed that can be used as the building blocks of sophisticated data wrangling processes. The trick is not mastering programming but rather learning to think in terms of these operations. Much of this book is intended to help you master such thinking. Second, it is possible to use recent developments in software to vastly reduce the amount of programming needed to use these data operations.
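As a rough illustration of composing simple data operations, the Python sketch below chains three small steps. The excerpt does not name the operations, so filtering, grouping, and summarizing are assumed here as typical examples; the helper names and the records are hypothetical.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration.
rows = [
    {"state": "MA", "party": "D", "votes": 120},
    {"state": "MA", "party": "R", "votes": 80},
    {"state": "NY", "party": "D", "votes": 200},
    {"state": "NY", "party": "R", "votes": 150},
    {"state": "NY", "party": "D", "votes": 50},
]


def filter_rows(rows, predicate):
    """Keep only the rows for which `predicate` is true."""
    return [row for row in rows if predicate(row)]


def group_by(rows, key):
    """Split rows into lists that share the same value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def summarize(groups, column, fn):
    """Reduce one column of each group to a single value with `fn`."""
    return {name: fn([row[column] for row in group]) for name, group in groups.items()}


# Each step is trivial on its own; the pipeline comes from chaining them.
democratic = filter_rows(rows, lambda row: row["party"] == "D")
totals = summarize(group_by(democratic, "state"), "votes", sum)
print(totals)
```

Each function does one thing, so the analysis reads as a sequence of operations rather than as a program to be debugged.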

