Eroxl's Notes
Histogram

A histogram is similar to a bar graph, but instead of showing categories it shows how often the values of a quantitative variable fall into each of a set of intervals.

Instructions

  1. Split the range of the variable's values into a set of contiguous, non-overlapping intervals (often called bins), typically of equal width.
  2. For each interval, count how many observations fall into it.
  3. Draw a bar over each interval whose height is the count from step 2.
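Steps 1 and 2 can be sketched in Python as below. This is a minimal sketch that assumes equal-width, half-open intervals [start + k·width, start + (k+1)·width). The helper `bin_counts` and its parameters are hypothetical names chosen for illustration.

```python
def bin_counts(values, start, width, n_bins):
    """Count the values in each half-open interval [start + k*width, start + (k+1)*width).

    `bin_counts` is a hypothetical helper for illustration, not a library function.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)  # index of the interval that contains v
        if not 0 <= k < n_bins:
            raise ValueError(f"{v} lies outside the chosen intervals")
        counts[k] += 1
    return counts

# Intervals [0, 5) and [5, 10)
print(bin_counts([1, 2, 2, 3, 7, 8], start=0, width=5, n_bins=2))  # [4, 2]
```

The intervals are half-open, so a value equal to the upper edge of the last interval raises an error in this sketch. Choose `start`, `width` and `n_bins` so that the intervals cover every observation.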

Properties

"Modality"

The modality of a histogram describes how many peaks it has and where they occur. It is usually described using one of three main categories:

  • Unimodal - One clear peak
  • Bimodal - Two clear peaks
  • Multimodal - More than 2 clear peaks

[Figure: Distribution Mounds]
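Whether a peak is "clear" is a judgment call. As a rough heuristic, you can count the local maxima in the bar heights. The sketch below is one such heuristic; the functions `count_peaks` and `modality` are hypothetical. Every local maximum counts as a peak, including small bumps caused by noise, so treat the result as a first guess rather than a definitive classification.

```python
from itertools import groupby

def count_peaks(counts):
    """Count the local maxima ("peaks") in a list of histogram bar heights.

    Runs of equal heights (plateaus) are collapsed first, and the histogram is
    padded with a zero-height bar on each side so that edge bars can be peaks.
    """
    heights = [h for h, _ in groupby([0, *counts, 0])]
    return sum(
        1
        for left, mid, right in zip(heights, heights[1:], heights[2:])
        if mid > left and mid > right
    )

def modality(counts):
    """Map the peak count onto the unimodal/bimodal/multimodal categories."""
    peaks = count_peaks(counts)
    if peaks == 1:
        return "unimodal"
    if peaks == 2:
        return "bimodal"
    return "multimodal" if peaks > 2 else "no clear peak"

print(modality([1, 3, 6, 3, 1]))     # unimodal
print(modality([2, 6, 2, 1, 5, 2]))  # bimodal
```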

Shape

The shape of a histogram describes how its "mass" is distributed. Generally, the shape falls into one of the following three categories:

  • Symmetric - The left and right sides are roughly mirror images of each other
  • Left Skewed - Most of the "mass" is on the right side, with a long, thin tail trailing off to the left
  • Right Skewed - Most of the "mass" is on the left side, with a long, thin tail trailing off to the right

As a rule of thumb, when a histogram is "left skewed", its median is typically greater than its mean, because the long left tail pulls the mean down. Conversely, when it is "right skewed", its median is typically smaller than its mean. If the median and mean are approximately the same, the histogram is often roughly symmetric.
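The rule of thumb above can be written as a small check using Python's `statistics` module. This is a sketch only. `skew_hint` is a hypothetical helper, and its `tolerance` parameter is an assumed threshold for deciding when the mean and median count as "approximately the same", measured as a fraction of the data's range.

```python
from statistics import mean, median

def skew_hint(data, tolerance=0.05):
    """Rule-of-thumb skew guess from comparing the mean and the median.

    `tolerance` is a hypothetical threshold: if the mean and median differ by
    at most this fraction of the data's range, call the data roughly symmetric.
    """
    m, med = mean(data), median(data)
    spread = max(data) - min(data)
    if spread == 0 or abs(m - med) <= tolerance * spread:
        return "roughly symmetric"
    # A long right tail pulls the mean above the median, and vice versa
    return "right skewed" if m > med else "left skewed"

print(skew_hint([1, 2, 2, 3, 3, 3, 4, 10, 20]))  # right skewed
```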

Centre

The centre of a histogram is the location around which the values tend to cluster.

Measures of Centre

Mean

The mean of a distribution is its long-run average value. Formally, the mean is defined as the expected value, which is calculated using the following formulae.

Formula

Discrete Case

For a discrete random variable $X$ (e.g. the result of rolling a fair six-sided die, which can only be one of the integers 1 to 6):

$$E[X] = \sum_{x} x \cdot P(X = x)$$

where the sum is taken over all possible values $x$ of $X$, and $P(X = x)$ is the probability that $X$ takes the value $x$.
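As an illustration, the discrete formula can be evaluated exactly for a fair die with Python's `fractions` module. Here `expected_value` is a hypothetical helper, and the distribution is assumed to be given as a dict that maps each value $x$ to $P(X = x)$. Fractions are used so that the check that the probabilities sum to 1 is exact.

```python
from fractions import Fraction

def expected_value(dist):
    """E[X] = sum of x * P(X = x) over every value x in `dist`.

    `dist` maps each possible value x to its probability P(X = x).
    """
    total = sum(dist.values())
    if total != 1:
        raise ValueError(f"probabilities sum to {total}, not 1")
    return sum(x * p for x, p in dist.items())

# A fair six-sided die: each face 1..6 has probability 1/6
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expected_value(die))  # 7/2
```

The result, $7/2 = 3.5$, is not a face the die can actually show. This is expected, because the mean is a long-run average rather than a typical single outcome.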

Continuous Case

For a continuous random variable $X$ with probability density function $f(x)$:

$$E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$$
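When the integral is awkward to do by hand, it can be approximated numerically. The sketch below uses the midpoint rule, which is a numerical technique swapped in here for illustration and not something these notes prescribe. It assumes the density is zero outside a known interval $[a, b]$. The function name `expected_value_continuous` is hypothetical.

```python
def expected_value_continuous(pdf, a, b, n=10_000):
    """Approximate E[X] = integral of x * f(x) dx using the midpoint rule.

    Assumes the density `pdf` is zero outside [a, b]; `n` is the number of slices.
    """
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h  # midpoint of the k-th slice
        total += x * pdf(x)
    return total * h

# Example density: f(x) = 2x on [0, 1] and 0 elsewhere; the exact E[X] is 2/3
approx = expected_value_continuous(lambda x: 2 * x, 0.0, 1.0)
print(approx)  # close to 2/3
```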

Estimations

The mean of a distribution can be estimated when the exact probabilities of its values are unknown.

Sample Survey

Given a sample survey with $n$ observations, where $x_i$ is the $i$-th observation, we can estimate the mean of the random variable with the sample mean:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
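The sketch below shows the sample mean acting as an estimate. It simulates a "survey" of die rolls, where the true mean is known to be 3.5, and compares the two. The helper `sample_mean` and the simulation setup (seed, number of rolls) are assumptions made for illustration.

```python
import random

def sample_mean(observations):
    """Estimate the mean as x-bar = (1/n) * (sum of the n observations)."""
    if not observations:
        raise ValueError("need at least one observation")
    return sum(observations) / len(observations)

# Simulated survey: 100,000 rolls of a fair six-sided die (true mean 3.5)
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(100_000)]
estimate = sample_mean(rolls)
print(estimate)  # should land close to 3.5
```

With more observations, the estimate tends to land closer to the true mean.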

Example

The numbers of hours spent studying by a sample of five students are 4, 6, 8, 7 and 5. Estimate the mean number of hours spent studying by students.

$$\bar{x} = \frac{4 + 6 + 8 + 7 + 5}{5} = \frac{30}{5} = 6 \text{ hours}$$

Properties

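Addition by a Constant

$$E[X + c] = E[X] + c$$

Multiplication by a Constant

$$E[cX] = c \, E[X]$$

Addition

$$E[X + Y] = E[X] + E[Y]$$

Multiplication

If $X$ and $Y$ are independent:

$$E[XY] = E[X] \, E[Y]$$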
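These properties can be checked exactly for a fair six-sided die using Python's `fractions` module. This is an illustrative sketch with a hypothetical helper `expected_value`. For the sum and product rules, it assumes the two dice $X$ and $Y$ are independent, so each joint probability is the product of the individual probabilities.

```python
from fractions import Fraction
from itertools import product

def expected_value(dist):
    """E[X] for a dict mapping each value x to P(X = x)."""
    return sum(x * p for x, p in dist.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die, E[X] = 7/2
c = 4

# Addition/multiplication by a constant: transform each outcome, keep its probability
shifted = {x + c: p for x, p in die.items()}
scaled = {c * x: p for x, p in die.items()}  # c must be nonzero so outcomes stay distinct

# Two independent dice X and Y: P(X = x, Y = y) = P(X = x) * P(Y = y)
joint = [(x, y, px * py) for (x, px), (y, py) in product(die.items(), repeat=2)]
e_sum = sum((x + y) * p for x, y, p in joint)
e_prod = sum(x * y * p for x, y, p in joint)

e_x = expected_value(die)
print(expected_value(shifted) == e_x + c)  # True
print(expected_value(scaled) == c * e_x)   # True
print(e_sum == e_x + e_x)                  # True
print(e_prod == e_x * e_x)                 # True
```

The product rule depends on independence. For example, taking $Y = X$ (the same die) gives $E[X^2] = 91/6$, which is not $E[X]^2 = 49/4$.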

Median

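The median of a distribution is another way of determining its centre: it is the middle value once the observations are sorted. Given a distribution where $x_i$ is the $i$-th observation in ascending order and $n$ is the total number of observations, the median is determined as follows.

If $n$ is odd (e.g. 1, 3, 5), the median is the $\frac{n+1}{2}$-th observation, $x_{(n+1)/2}$. If $n$ is even, the median is the mean of the $\frac{n}{2}$-th and $\left(\frac{n}{2} + 1\right)$-th observations:

$$\text{median} = \frac{x_{n/2} + x_{n/2 + 1}}{2}$$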
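The odd/even rule for the median can be sketched in Python as below. The function `median` here is a hypothetical stand-alone helper; Python's `statistics.median` gives the same results. The indices are shifted by one because Python lists are 0-based, while the notes number observations from 1.

```python
def median(observations):
    """Median via the odd/even rule: sort, then take the middle value(s)."""
    xs = sorted(observations)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]           # x_{(n+1)/2}, shifted to 0-based
    return (xs[n // 2 - 1] + xs[n // 2]) / 2  # mean of x_{n/2} and x_{n/2+1}

print(median([12, 14, 15, 17, 20, 24, 24, 27, 29]))      # 20
print(median([12, 14, 15, 17, 20, 24, 24, 27, 29, 30]))  # 22.0
```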

Examples

Odd

Given the numbers 12, 14, 15, 17, 20, 24, 24, 27, 29, find the median.

Here $n = 9$ is odd, so the median is the $\frac{9 + 1}{2} = 5$th observation, which is 20.

Even

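Given the numbers 12, 14, 15, 17, 20, 24, 24, 27, 29, 30, find the median.

Here $n = 10$ is even, so the median is the mean of the 5th and 6th observations: $\frac{20 + 24}{2} = 22$.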

Spread

The spread of a histogram describes how far the observations typically lie from its centre.

Outlier

An outlier is a single observation that is visibly separated from the main "mass" of the histogram. In other words, it lies unusually far from the centre.

Example

Construct a histogram of the following numbers:

175, 192, 207, 212, 213, 214, 218, 225, 229, 230, 231, 235, 235, 237, 240, 240, 242, 248, 250, 253, 257, 260, 265, 265
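One possible construction is sketched below in Python, which prints a text histogram. The intervals are an assumption: the notes don't specify any, so this sketch uses width-25 intervals starting at 175. A different width can change the picture noticeably.

```python
from collections import Counter

data = [175, 192, 207, 212, 213, 214, 218, 225, 229, 230, 231, 235,
        235, 237, 240, 240, 242, 248, 250, 253, 257, 260, 265, 265]

START, WIDTH = 175, 25  # assumed intervals: [175, 200), [200, 225), ...

# Steps 1-2: map each value to its interval index and count per interval
counts = Counter((x - START) // WIDTH for x in data)

# Step 3: draw one text "bar" per interval, one '#' per observation
for k in range(max(counts) + 1):
    low = START + k * WIDTH
    print(f"[{low}, {low + WIDTH}) {'#' * counts[k]}")
```

With these assumed intervals, the bar counts are 2, 5, 11 and 6. This gives a single peak in [225, 250), with the lowest values (175 and 192) forming a sparse left tail.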