2.1 Descriptive Analysis
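Descriptive analysis is about getting to know your data: what kinds of values it contains, where they are centered, and how widely they are spread.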
Broadly speaking, we can categorize data into three classes:
- Continuous
- Ordinal
- Nominal
Continuous data are numerical values that can take on any value within a range, such as temperature or someone’s income. For continuous data, the magnitude of the numbers, as well as the difference between numbers, is meaningful.
Ordinal data are also numerical, but only the order of the values is important, not their difference. An example is the order in which people finish a race (e.g., 1st place, 2nd place, etc.). It doesn’t make sense to add these numbers (1st place + 2nd place does not equal 3rd place).
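Nominal data represent the names of things (e.g., countries, occupations, gender). These data may be represented numerically, but be careful: the numbers are only labels, so neither their order nor arithmetic on them (such as averaging country codes) is meaningful.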
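As a small illustration of the caution about numerically coded nominal data, the sketch below uses a made-up occupation coding (the codes, labels, and sample are all hypothetical). Averaging the codes runs without error but produces a number that corresponds to no occupation, whereas counting categories gives a sensible summary.

```python
from collections import Counter
from statistics import mean

# Hypothetical coding of a nominal variable (labels are illustrative only).
codes = {1: "teacher", 2: "nurse", 3: "engineer"}
sample = [1, 3, 3, 2, 3, 1]

# The arithmetic works, but the result does not describe any occupation.
meaningless_average = mean(sample)

# Counting how often each category occurs is the appropriate summary.
counts = Counter(codes[c] for c in sample)
most_common_label, most_common_count = counts.most_common(1)[0]

print(f"average of codes: {meaningless_average:.2f} (not interpretable)")
print(f"most common occupation: {most_common_label} ({most_common_count} times)")
```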
2.1.1 Measures of central tendency
- mean
- median
- mode
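The three measures listed above can be computed with Python's standard `statistics` module. The sketch below uses a made-up list of incomes (hypothetical values, in thousands) chosen so that one large value pulls the mean away from the median.

```python
from statistics import mean, median, mode

# Hypothetical incomes in thousands; the last value is an outlier.
incomes = [30, 32, 35, 35, 40, 250]

avg = mean(incomes)            # arithmetic average, sensitive to the outlier
mid = median(incomes)          # middle value of the sorted data
most_frequent = mode(incomes)  # most frequently occurring value

print(f"mean = {avg:.2f}, median = {mid}, mode = {most_frequent}")
```

In this sample the mean is roughly twice the median, which is one reason the median is often preferred for skewed data such as income.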
2.1.2 Measures of variability
- variance
- coefficient of variation
- range
- quantiles
- percentiles
\[ \mathrm{var}(x) = \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 \]
where \(\bar{x}\) is the mean of the \(N\) observations \(x_i\).
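The measures of variability listed above can be sketched with the standard `statistics` module. The data are made up for illustration. `pvariance` divides by N, matching the population variance formula; `variance` would divide by N - 1 instead. The coefficient of variation is taken here as the standard deviation divided by the mean.

```python
from statistics import mean, pstdev, pvariance, quantiles

# Hypothetical sample used only for illustration.
x = [2, 4, 4, 4, 5, 5, 7, 9]

var = pvariance(x)                # population variance (divides by N)
sd = pstdev(x)                    # population standard deviation
cv = sd / mean(x)                 # coefficient of variation (unitless)
value_range = max(x) - min(x)     # range: largest minus smallest value
quartiles = quantiles(x, n=4)     # three cut points splitting data into quarters
percentiles = quantiles(x, n=100) # 99 cut points; percentiles[k - 1] is the k-th

print(f"variance = {var}, sd = {sd}, cv = {cv}, range = {value_range}")
print(f"quartiles = {quartiles}")
```

Note that `quantiles` supports more than one interpolation method, so the cut points for small samples may differ slightly from those reported by other tools.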
2.1.3 Visualizations
- histogram
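A histogram groups values into bins of equal width and counts how many fall into each. The sketch below is a minimal text-based version using only the standard library; the data and the bin width of 1.0 are assumptions chosen for illustration, and a plotting library would normally draw the bars.

```python
from collections import Counter

# Hypothetical measurements and an assumed bin width.
values = [1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8, 4.7]
bin_width = 1.0

# Each value is assigned to the bin [k * width, (k + 1) * width).
bins = Counter(int(v // bin_width) for v in values)

# Print one row per bin, with one '#' per observation.
for k in sorted(bins):
    left = k * bin_width
    print(f"[{left:.0f}, {left + bin_width:.0f}) " + "#" * bins[k])
```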
2.1.4 Transformations
- logarithm
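A logarithmic transformation is often applied to strictly positive data that span several orders of magnitude. The sketch below uses made-up values (each ten times the previous) to show how taking base-10 logarithms brings the mean and median back together; the choice of base 10 is an assumption made for readability.

```python
import math
from statistics import mean, median

# Hypothetical, strongly right-skewed data (must be strictly positive).
incomes = [10, 100, 1000, 10000, 100000]

# Replace each value by its base-10 logarithm.
log_incomes = [math.log10(v) for v in incomes]

raw_mean, raw_median = mean(incomes), median(incomes)
log_mean, log_median = mean(log_incomes), median(log_incomes)

print(f"raw: mean = {raw_mean}, median = {raw_median}")
print(f"log10: mean = {log_mean:.2f}, median = {log_median:.2f}")
```

On the raw scale the mean is far above the median; on the log scale the two coincide, because the values become evenly spaced.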
2.1.5 Correlation
- correlation coefficient
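Assuming the correlation coefficient meant here is Pearson's r, it can be sketched directly from its definition: the sum of products of deviations from the means, divided by the product of the square roots of the summed squared deviations. The function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    mx, my = mean(x), mean(y)
    # Sum of products of deviations from the means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Square roots of the summed squared deviations of each variable.
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)


# Hypothetical data: hours studied and test score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

r = pearson_r(hours, score)
print(f"r = {r:.3f}")
```

Values of r near +1 or -1 indicate a strong linear relationship, while values near 0 indicate little linear relationship. The helper does not guard against a constant input, for which the denominator is zero.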