Understanding Statistical Concepts
Etymology of "Variable"
The term "variable" comes from the Latin word variābilis, consisting of:
- vari(us) - meaning "various" or "different"
- -ābilis - meaning "-able" or "capable of"
Together, these elements convey the meaning "capable of changing" - which perfectly describes the nature of variables in statistics.
Variables are classified into four types:
Continuous: Can take any value within a range. They are measured, not counted.
Statistical measures commonly used: Mean, Median, Standard Deviation
Discrete: Takes only specific values, often whole numbers. They are counted rather than measured.
Statistical measures commonly used: Mode, Frequency, Proportion
Nominal: Categories with no inherent order or ranking. Simply names or labels.
Statistical measures and tests commonly used: Mode, Frequency, Chi-square tests
Ordinal: Categories with a meaningful order or ranking, but the intervals between values may not be equal.
Statistical measures commonly used: Median, Mode, Percentiles
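As a rough illustration of how the measures for categorical variables can be computed, the sketch below uses Python's standard library. The blood-type and rating samples, and all variable names, are hypothetical and chosen only for this example.

```python
from collections import Counter

# Nominal data (hypothetical sample): labels with no inherent order.
blood_types = ["A", "O", "B", "O", "AB", "O", "A"]
counts = Counter(blood_types)                      # frequency of each label
total = len(blood_types)
proportions = {label: c / total for label, c in counts.items()}
mode_value = counts.most_common(1)[0][0]           # most frequent label

# Ordinal data (hypothetical sample): the category order must be stated
# explicitly, because the labels themselves do not sort meaningfully.
order = ["low", "medium", "high"]
ratings = ["low", "high", "medium", "medium", "high"]
ranks = sorted(order.index(r) for r in ratings)    # map labels to positions
median_rating = order[ranks[len(ranks) // 2]]      # middle rank, odd-length sample

print(counts["O"], mode_value)   # 3 O
print(median_rating)             # medium
```

Note that the median is taken over rank positions rather than over the labels, which is why it is available for ordinal data but not for nominal data.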
Descriptive statistics summarize and organize characteristics of a data set. These measures provide simple summaries about the sample and observations.
Measures of Central Tendency
Mean: The arithmetic average of a set of values
The sum of all values divided by the number of values. The mean is sensitive to extreme values (outliers).
Best used for: Normally distributed data without significant outliers
Median: The middle value in a set of ordered values
The middle value when data is arranged in ascending or descending order. With an even number of values, the median is the average of the two middle values. The median is robust to extreme values.
Best used for: Skewed distributions or when outliers are present
Mode: The most frequently occurring value
The value that appears most frequently in a dataset. A dataset may have no mode, one mode, or multiple modes.
Best used for: Categorical data or discrete values
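The three measures of central tendency can be compared with a short sketch using Python's built-in `statistics` module. The sample values below are hypothetical; the point is only to show how a single outlier shifts the mean while leaving the median and mode nearly unchanged.

```python
from statistics import mean, median, multimode

# Hypothetical sample, then the same sample with one extreme value added.
values = [2, 3, 3, 4, 5]
with_outlier = values + [100]

print(mean(values), median(values), multimode(values))
# 3.4 3 [3]
print(mean(with_outlier), median(with_outlier), multimode(with_outlier))
# 19.5 3.5 [3]

# multimode returns every most-common value, so it also covers
# datasets with more than one mode.
colors = ["red", "blue", "red", "blue", "green"]
print(multimode(colors))
# ['red', 'blue']
```

When every value occurs exactly once, `multimode` returns all of them, which in practice is usually read as the dataset having no mode.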
Measures of Dispersion
Range: The difference between the maximum and minimum values
The difference between the largest and smallest values in a dataset. It provides a simple measure of spread.
Best used for: Quick assessment of data spread
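Variance: The average of squared deviations from the mean
The average of squared differences from the mean. For a full population the sum of squared differences is divided by n; for a sample it is usually divided by n - 1. Variance provides a measure of how spread out the values are.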
Best used for: Understanding overall dispersion (though difficult to interpret in original units)
Standard Deviation: The square root of the variance
The square root of the variance. Standard deviation is expressed in the same units as the original data.
Best used for: Measuring dispersion in normally distributed data
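Standard Error: The standard deviation of a sampling distribution
The standard deviation of the sampling distribution of a statistic. For the sample mean, it is estimated as the sample standard deviation divided by the square root of the sample size, and it measures how precisely the sample mean estimates the population mean.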
Best used for: Estimating the precision of a sample mean
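The dispersion measures above can be sketched in a few lines with Python's `statistics` module. The data are hypothetical, and the standard error shown is the usual estimate for a sample mean (sample standard deviation divided by the square root of n), which is an assumption about which statistic is of interest.

```python
from math import sqrt
from statistics import pvariance, variance, stdev

data = [4, 8, 6, 5, 3, 7]  # hypothetical sample

data_range = max(data) - min(data)       # largest minus smallest value
pop_var = pvariance(data)                # population variance: divide by n
samp_var = variance(data)                # sample variance: divide by n - 1
samp_sd = stdev(data)                    # square root of the sample variance
std_error = samp_sd / sqrt(len(data))    # estimated standard error of the mean

print(data_range)   # 5
print(samp_var)     # 3.5
print(pop_var < samp_var)  # True: dividing by n gives the smaller value
```

The standard deviation and standard error are in the same units as the data, while both variances are in squared units.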
Measures of Shape
Skewness: A measure of asymmetry in the distribution
Skewness measures the asymmetry of a distribution. Any symmetric distribution, including the normal distribution, has a skewness of zero.
- Positive skewness: Right tail is longer (typically mean > median)
- Negative skewness: Left tail is longer (typically mean < median)
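One common way to quantify skewness is the moment-based coefficient: the third central moment divided by the second central moment raised to the power 1.5. The sketch below assumes that definition (statistical packages often apply additional sample-size corrections), and the function name and datasets are hypothetical.

```python
from statistics import mean, median

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 15]         # long right tail
left_skewed = [-x for x in right_skewed]         # mirror image: long left tail

print(skewness(symmetric))                       # 0.0
print(skewness(right_skewed) > 0)                # True
print(mean(right_skewed) > median(right_skewed)) # True
print(skewness(left_skewed) < 0)                 # True
```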
Kurtosis: A measure of the "tailedness" of a distribution
Kurtosis describes the shape of a probability distribution's tails. It measures whether the data are heavy-tailed or light-tailed relative to a normal distribution. The categories below refer to excess kurtosis, which is kurtosis minus 3 (the kurtosis of a normal distribution).
- Leptokurtic (positive excess kurtosis): Heavy tails, more outliers
- Mesokurtic (zero excess kurtosis): Tails like those of a normal distribution
- Platykurtic (negative excess kurtosis): Light tails, fewer outliers
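Excess kurtosis can be sketched from the moments of the data: the fourth central moment divided by the squared second central moment, minus 3 so that a normal distribution scores zero. This is the plain population formula with no small-sample correction; the function name and the two datasets are hypothetical.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - m) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                           # evenly spread, light tails
heavy_tailed = [-10, -1, 0, 0, 0, 0, 1, 10]      # mostly central, two extremes

print(excess_kurtosis(flat) < 0)          # True: platykurtic
print(excess_kurtosis(heavy_tailed) > 0)  # True: leptokurtic
```

For the evenly spread sample the value works out to about -1.3, and for the heavy-tailed sample to about 0.92.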
When to Use Each Statistic
Statistic | Variable Type | Best Used When |
---|---|---|
Mean | Continuous, Discrete (numeric) | Data is normally distributed, no extreme outliers |
Median | Continuous, Discrete (numeric), Ordinal | Data is skewed, outliers present |
Mode | All types (especially Nominal) | Finding most common category or value |
Standard Deviation | Continuous, Discrete (numeric) | Measuring spread in roughly normally distributed data |
Range | Continuous, Discrete (numeric) | Quick assessment of spread |
Percentiles | Continuous, Discrete (numeric), Ordinal | Understanding distribution position |
Variable Types Quiz
1. What type of variable is "Height measured in centimeters"?
2. Which measure of central tendency is most appropriate for nominal data?
3. If a distribution has a positive skew, which of the following is true?
4. What type of variable is "Satisfaction rating on a scale of 1-5"?
5. Which descriptive statistic is most affected by outliers?