PROF. RINO
  • About me
  • CV
  • Awards & Impact
  • Publications
  • Funding
  • News & Podcasts
  • Stats101
Interactive Statistical Concepts

Understanding Statistical Concepts

Understanding Variables in Statistics

Etymology of "Variable"

The term "variable" comes from the Latin word variābilis, consisting of:

  • vari(us) - meaning "various" or "different"
  • -ābilis - meaning "-able" or "capable of"

Together, these elements convey the meaning "capable of changing" - which perfectly describes the nature of variables in statistics.

Variables are classified into different types:

[Diagram: classification of variables into Continuous, Discrete, Nominal, and Ordinal types, with examples: height, weight, temperature, time; gender, blood type, eye color, nationality; education level, rating scales (1-5)]

Continuous Variables

Can take any value within a range. They are measured, not counted.

Examples: Height, Weight, Time, Distance, Temperature

Statistical measures commonly used: Mean, Median, Standard Deviation

Discrete Variables

Can take only specific, separate values, often whole numbers. They are counted rather than measured.

Examples: Number of children, Number of cars, Count of occurrences

Statistical measures commonly used: Mode, Frequency, Proportion

Nominal Variables

Categories with no inherent order or ranking. Simply names or labels.

Examples: Gender, Blood Type, Hair Color, Nationality

Statistical measures commonly used: Mode, Frequency, Chi-square tests

Ordinal Variables

Categories with a meaningful order or ranking, but the intervals between values may not be equal.

Examples: Education level (primary, secondary, tertiary), Satisfaction ratings (poor, fair, good, excellent)

Statistical measures commonly used: Median, Mode, Percentiles
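
As a rough illustration of why the median and mode suit ordinal data, the sketch below ranks a set of satisfaction ratings and reads off the middle label. The response list and the names LEVELS, responses, and ranks are hypothetical, chosen only for this example.

```python
from statistics import median_low, mode

# Ordinal scale: the order is meaningful, the spacing between levels is not.
LEVELS = ["poor", "fair", "good", "excellent"]

# Hypothetical survey responses (illustrative data only).
responses = ["good", "fair", "excellent", "good", "poor", "good", "fair"]

# Replace each label by its position on the scale so the data can be ordered.
ranks = [LEVELS.index(r) for r in responses]

# median_low always returns an observed value, so it maps back to a label.
median_label = LEVELS[median_low(ranks)]
most_common = mode(responses)

print("median:", median_label)
print("mode:", most_common)
```

A mean of the ranks is deliberately not computed here, since it would assume equal intervals between the levels.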

Descriptive Statistics

Descriptive statistics summarize and organize characteristics of a data set. These measures provide simple summaries about the sample and observations.

Measures of Central Tendency

Mean: The arithmetic average of a set of values

The sum of all values divided by the number of values. The mean is sensitive to extreme values (outliers).

Mean (x̄) = (x₁ + x₂ + ... + xₙ) / n = Σx / n

Best used for: Normally distributed data without significant outliers
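
A minimal sketch of the mean formula and its sensitivity to outliers, using Python's standard statistics module. The heights are made-up values, not data from this page.

```python
from statistics import mean

# Hypothetical sample of heights in centimeters.
heights_cm = [160, 165, 170, 175, 180]

# Sum of all values divided by the number of values.
manual_mean = sum(heights_cm) / len(heights_cm)
print(manual_mean, mean(heights_cm))

# A single extreme value pulls the mean upward noticeably.
with_outlier = heights_cm + [250]
print(mean(with_outlier))
```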

Median: The middle value in a set of ordered values

The middle value when data is arranged in ascending or descending order; with an even number of values, it is the average of the two middle values. The median is robust to extreme values.

Median = middle value of ordered data

Best used for: Skewed distributions or when outliers are present
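
The following sketch shows the median for odd and even sample sizes, and how little it moves when one value becomes extreme. All lists are hypothetical.

```python
from statistics import median

odd_sample = [3, 1, 7, 5, 9]    # sorted: 1, 3, 5, 7, 9
even_sample = [3, 1, 7, 5]      # sorted: 1, 3, 5, 7

# Odd count: the single middle value.
print(median(odd_sample))

# Even count: the average of the two middle values.
print(median(even_sample))

# Replacing the largest value with an outlier leaves the median unchanged.
odd_with_outlier = [3, 1, 7, 5, 900]
print(median(odd_with_outlier))
```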

Mode: The most frequently occurring value

The value that appears most frequently in a dataset. A dataset may have no mode, one mode, or multiple modes.

Mode = most frequent value

Best used for: Categorical data or discrete values
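
One way to find the mode, including ties, is statistics.multimode from the Python standard library (Python 3.8 or later). The blood-type list is invented for illustration. Note that when every value occurs equally often, multimode returns all of them; whether that counts as "no mode" is a matter of convention.

```python
from statistics import multimode

# Hypothetical nominal data: two categories tie for most frequent.
blood_types = ["A", "O", "B", "O", "A", "AB"]
modes = multimode(blood_types)
print(modes)

# Every value appears once, so all values are returned.
all_unique = multimode([1, 2, 3])
print(all_unique)
```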

Measures of Dispersion

Range: The difference between the maximum and minimum values

The difference between the largest and smallest values in a dataset. It provides a simple measure of spread, but because it depends only on the two extreme values, it is highly sensitive to outliers.

Range = Maximum - Minimum

Best used for: Quick assessment of data spread
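
A short sketch of the range calculation, with hypothetical values, showing how one extreme observation changes the result.

```python
# Hypothetical measurements.
values = [12, 15, 11, 19, 14]

# Range = Maximum - Minimum
data_range = max(values) - min(values)
print(data_range)

# Adding one extreme value inflates the range.
values_with_outlier = values + [95]
range_with_outlier = max(values_with_outlier) - min(values_with_outlier)
print(range_with_outlier)
```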

Variance: The average of squared deviations from the mean

The sum of squared differences from the mean, divided by n - 1 for a sample (dividing by n instead gives the population variance). Variance measures how spread out the values are.

Variance (s²) = Σ(x - x̄)² / (n-1)

Best used for: Understanding overall dispersion (though difficult to interpret in original units)
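
The sketch below works through the sample variance formula step by step and compares it with statistics.variance, which also divides by n - 1. The data values are hypothetical; statistics.pvariance is shown only to contrast the divide-by-n version.

```python
from statistics import mean, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
x_bar = mean(data)

# Sum of squared deviations from the mean.
sum_sq = sum((x - x_bar) ** 2 for x in data)

sample_var = sum_sq / (len(data) - 1)   # divides by n - 1 (sample variance)
population_var = sum_sq / len(data)     # divides by n (population variance)

print(sample_var, variance(data))
print(population_var, pvariance(data))
```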

Standard Deviation: The square root of the variance

The square root of the variance. Standard deviation is expressed in the same units as the original data.

Standard Deviation (s) = √(Σ(x - x̄)² / (n-1))

Best used for: Measuring dispersion in normally distributed data
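
As a quick check of the relationship between variance and standard deviation, this sketch takes the square root of the sample variance and compares it with statistics.stdev. The data are hypothetical.

```python
from math import sqrt
from statistics import stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

# Standard deviation is the square root of the sample variance,
# so it is expressed in the same units as the data.
s = sqrt(variance(data))
print(s, stdev(data))
```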

Standard Error: The standard deviation of a sampling distribution

The standard deviation of the sampling distribution of a statistic. For the sample mean, shown in the formula below, it measures how precisely the sample mean estimates the population mean.

Standard Error (SE) = s / √n

Best used for: Estimating the precision of a sample mean
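
A minimal sketch of the standard error of the mean, SE = s / √n. The helper name standard_error and the sample values are assumptions made for this example.

```python
from math import sqrt
from statistics import stdev


def standard_error(sample):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
se = standard_error(data)
print(se)

# Repeating the same values doubles n and shrinks the standard error.
se_larger_sample = standard_error(data * 2)
print(se_larger_sample)
```

The second call only illustrates the effect of sample size; duplicating observations is not a valid way to gain precision in practice.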

Measures of Shape

Skewness: A measure of asymmetry in the distribution

Skewness measures the asymmetry of the probability distribution. A normal distribution has a skewness of zero.

  • Positive skewness: Right tail is longer (typically mean > median)
  • Negative skewness: Left tail is longer (typically mean < median)
[Figure: example distribution curves labeled Positive Skew, Normal, and Negative Skew]
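
The page gives no formula for skewness, so the sketch below assumes one common definition, the moment coefficient m3 / m2^1.5, where m2 and m3 are the second and third central moments computed with a divisor of n. Other definitions apply sample-size corrections and give slightly different numbers. The function name and datasets are hypothetical.

```python
from statistics import mean, median


def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (divisor n)."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]   # long right tail
symmetric = [1, 2, 3, 4, 5]
left_skewed = [-x for x in right_skewed]   # mirror image: long left tail

print(skewness(right_skewed), mean(right_skewed), median(right_skewed))
print(skewness(symmetric))
print(skewness(left_skewed))
```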

Kurtosis: A measure of the "tailedness" of a distribution

Kurtosis describes the shape of a probability distribution's tails. It measures whether the data are heavy-tailed or light-tailed relative to a normal distribution.

  • Leptokurtic (positive excess kurtosis): Heavy tails, more outliers
  • Mesokurtic (zero excess kurtosis, i.e. kurtosis of 3): Tails like those of a normal distribution
  • Platykurtic (negative excess kurtosis): Light tails, fewer outliers
[Figure: example distribution curves labeled Leptokurtic, Mesokurtic, and Platykurtic]
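
No kurtosis formula appears on the page, so this sketch assumes a common moment-based definition of excess kurtosis, m4 / m2^2 - 3, with central moments computed using a divisor of n. Subtracting 3 makes a normal distribution score zero. The function name and both datasets are hypothetical.

```python
from statistics import mean


def excess_kurtosis(data):
    """Excess kurtosis: m4 / m2**2 - 3 (central moments, divisor n)."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3


# Evenly spread values: light tails (platykurtic).
flat = [1, 2, 3, 4, 5]

# Mostly identical values plus two extremes: heavy tails (leptokurtic).
heavy_tailed = [0] * 8 + [-10, 10]

print(excess_kurtosis(flat))
print(excess_kurtosis(heavy_tailed))
```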

When to Use Each Statistic

Statistic          | Variable Type                           | Best Used When
Mean               | Continuous, Discrete (numeric)          | Data is normally distributed, no extreme outliers
Median             | Continuous, Discrete (numeric), Ordinal | Data is skewed, outliers present
Mode               | All types (especially Nominal)          | Finding most common category or value
Standard Deviation | Continuous, Discrete (numeric)          | Measuring normal distribution spread
Range              | Continuous, Discrete (numeric)          | Quick assessment of spread
Percentiles        | Continuous, Discrete (numeric), Ordinal | Understanding distribution position

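
Percentiles are listed above but not worked through elsewhere on the page. As a hedged sketch, statistics.quantiles (Python 3.8 or later) can return the 25th, 50th, and 75th percentiles; it uses its default "exclusive" interpolation method here, and other software may use different conventions. The scores are hypothetical.

```python
from statistics import median, quantiles

scores = [1, 2, 3, 4, 5, 6, 7]  # hypothetical data

# n=4 splits the data into four equal-probability groups,
# returning the three cut points between them.
quartiles = quantiles(scores, n=4)
p25, p50, p75 = quartiles

print(p25, p50, p75)

# The 50th percentile coincides with the median.
print(p50 == median(scores))
```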
Test Your Knowledge

Variable Types Quiz

1. What type of variable is "Height measured in centimeters"?

2. Which measure of central tendency is most appropriate for nominal data?

3. If a distribution has a positive skew, which of the following is true?

4. What type of variable is "Satisfaction rating on a scale of 1-5"?

5. Which descriptive statistic is most affected by outliers?

Ruggiero Lovreglio - Copyrights 2024 - [email protected]