Edexcel IGCSE · Maths

Statistics: Data, Averages & Diagrams

Averages, statistical diagrams, cumulative frequency, box plots and histograms.

Types of Data

Statistics begins with knowing what kind of data you are handling, because it decides which averages and diagrams are sensible.

    Qualitative (categorical) data describes qualities — colours, names, types. It cannot be averaged numerically.
    Quantitative data is numerical. It splits into discrete data (countable values such as the number of goals: $0, 1, 2, \dots$) and continuous data (measured on a scale such as height or time, which can take any value in a range).

Key terms: Primary data is collected first-hand by you; secondary data is taken from an existing source.

Population is the whole group of interest; a sample is a smaller part chosen to represent it. A good sample is large and unbiased.

Mean, Median, Mode and Range

For a simple list of values:

    Mean $= \dfrac{\text{sum of values}}{\text{number of values}}$. It uses every value, but is distorted by outliers.
    Median is the middle value when the data is in order. For $n$ values it lies at position $\frac{n+1}{2}$.

Worked example: For $4, 7, 7, 9, 13$: mean $= \frac{40}{5} = 8$, median $= 7$ (3rd value), mode $= 7$, range $= 13 - 4 = 9$.
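The worked example can be checked with Python's standard `statistics` module. This is a minimal sketch rather than part of the original note, and the variable names are illustrative.

```python
from statistics import mean, median, multimode

values = [4, 7, 7, 9, 13]

data_mean = mean(values)                 # sum of values / number of values
data_median = median(values)             # middle value once the list is sorted
data_modes = multimode(values)           # list of every most-frequent value
data_range = max(values) - min(values)   # largest minus smallest

print("mean:", data_mean)
print("median:", data_median)
print("mode(s):", data_modes)
print("range:", data_range)
```

`multimode` returns a list, which suits the case where a data set has several modes.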

Mean from a Frequency Table

When data is grouped by frequency, multiply each value $x$ by its frequency $f$, total the products, then divide by the total frequency.

$$\bar{x} = \frac{\sum fx}{\sum f}$$

| Goals $x$ | Frequency $f$ | $fx$ |
|---|---|---|
| 0 | 5 | 0 |
| 1 | 8 | 8 |
| 2 | 4 | 8 |
| 3 | 3 | 9 |
| Total | 20 | 25 |

Mean $= \dfrac{25}{20} = 1.25$ goals. The mode is $1$ (highest frequency). The median is the $\frac{20+1}{2} = 10.5$th value. The first 5 values are 0 and the next 8 are 1, so the 10th and 11th values both fall in the "1" group and the median is $1$.
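The frequency-table method above can be sketched in Python. The dictionary layout and the helper `median_from_table` are assumptions made for illustration; they are not from the note.

```python
# Goals x mapped to frequency f, as in the table above.
goals_table = {0: 5, 1: 8, 2: 4, 3: 3}

total_f = sum(goals_table.values())                      # sum of f
total_fx = sum(x * f for x, f in goals_table.items())    # sum of f * x
mean_goals = total_fx / total_f

# The mode is the value with the highest frequency.
mode_goals = max(goals_table, key=goals_table.get)

def median_from_table(table):
    """Median of a discrete frequency table, found from running totals."""
    n = sum(table.values())
    # 1-indexed positions of the middle value (odd n) or middle pair (even n).
    lower_pos, upper_pos = (n + 1) // 2, (n + 2) // 2

    def value_at(position):
        running = 0
        for x in sorted(table):
            running += table[x]
            if running >= position:
                return x

    return (value_at(lower_pos) + value_at(upper_pos)) / 2

median_goals = median_from_table(goals_table)
print(mean_goals, mode_goals, median_goals)
```

For an even total such as 20, the function averages the 10th and 11th values, which matches reading the 10.5th position by hand.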

Estimated Mean from Grouped Data

With grouped continuous data you do not know exact values, so use the midpoint of each class as a best estimate of $x$.

Worked example: Times $t$ (minutes) for $40$ runners:

| Time (min) | Freq $f$ | Midpoint $x$ | $fx$ |
|---|---|---|---|
| $0 < t \le 10$ | 6 | 5 | 30 |
| $10 < t \le 20$ | 14 | 15 | 210 |
| $20 < t \le 30$ | 13 | 25 | 325 |
| $30 < t \le 40$ | 7 | 35 | 245 |

Watch out: It is only an estimate, so say "estimated mean" and do not quote more accuracy than the data supports. Use midpoints, not class boundaries.
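The midpoint method for the runner times can be sketched as follows. The tuple layout `((lower, upper), frequency)` is an assumption chosen for illustration.

```python
# Each entry: ((lower bound, upper bound), frequency), from the runner table.
time_classes = [((0, 10), 6), ((10, 20), 14), ((20, 30), 13), ((30, 40), 7)]

total_f = sum(f for _, f in time_classes)

# Use the class midpoint as the best estimate of every value in the class.
total_fx = sum((lower + upper) / 2 * f for (lower, upper), f in time_classes)

estimated_mean = total_fx / total_f
print(f"sum f = {total_f}, sum fx = {total_fx}, estimated mean = {estimated_mean} min")
```

The result is an estimate because every runner in a class is treated as if they finished at the midpoint time.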

Statistical Diagrams

    Bar charts show frequencies of categories using bars of equal width with gaps; the height is the frequency.
    Pictograms use a symbol to represent a number of items; always read the key.
    Pie charts show proportions. Each category's angle $= \dfrac{\text{frequency}}{\text{total}} \times 360^\circ$. For a total of $20$ people, one person $= 18^\circ$.

Exam tip: In a pie chart question, to go back from an angle to a frequency, divide the angle by $360^\circ$ and multiply by the total.
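Both directions of the pie chart calculation can be sketched in Python. As an assumption for illustration, the category frequencies reuse the goals table from earlier (total 20); the function names are hypothetical.

```python
# Illustrative categories: the goals frequencies from earlier (total of 20).
frequencies = {"0 goals": 5, "1 goal": 8, "2 goals": 4, "3 goals": 3}
total = sum(frequencies.values())

def angle_for(frequency, total):
    """Angle in degrees for a category: frequency / total * 360."""
    return frequency * 360 / total

def frequency_for(angle, total):
    """Reverse step: angle / 360 * total gives the frequency."""
    return angle * total / 360

angles = {name: angle_for(f, total) for name, f in frequencies.items()}

for name, angle in angles.items():
    print(f"{name}: {angle} degrees")
print("check, angles sum to:", sum(angles.values()))
```

Checking that the angles total 360 degrees is a quick way to catch an arithmetic slip before drawing the chart.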

Scatter Graphs and Correlation

A scatter graph plots paired data to reveal a relationship.

    Positive correlation: as one variable rises, so does the other.
    Negative correlation: as one rises, the other falls.
    No correlation: no clear pattern.

A line of best fit is a straight line following the trend with roughly equal numbers of points on either side, passing through the mean point $(\bar{x}, \bar{y})$. Use it to estimate values: interpolation (within the range of the data) is reliable; extrapolation (beyond it) is risky.

Watch out: Correlation does not prove causation. Two things may rise together because of a third hidden factor.
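In an exam the line of best fit is drawn by eye. As a rough computational stand-in, the sketch below fits a least-squares line, which is a different technique but always passes through the mean point. The data set (hours revised against test score) is hypothetical and not taken from the note.

```python
# Hypothetical paired data: hours revised and test score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Least-squares slope and intercept (a calculated stand-in for a line drawn by eye).
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
s_xx = sum((x - mean_x) ** 2 for x in hours)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x  # makes the line pass through (mean_x, mean_y)

def estimate(x):
    """Estimate y from the line and say whether x lies inside the data range."""
    kind = "interpolation" if min(hours) <= x <= max(hours) else "extrapolation"
    return slope * x + intercept, kind

# The sign of the slope gives the direction of the trend, not its strength.
direction = "positive" if slope > 0 else "negative" if slope < 0 else "none"

print("mean point:", (mean_x, mean_y))
print("direction:", direction)
print(estimate(3.5))
print(estimate(10))
```

An estimate at 10 hours is labelled as extrapolation because it lies outside the 1 to 5 hour range of the data, so it should be treated with caution.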

Cumulative Frequency

Cumulative frequency is a running total of frequencies. Plot it against the upper class boundary of each group, then join the points with a smooth curve.

[Figure: Cumulative frequency curve for 40 runners, with quartiles marked. Horizontal axis: time (minutes), 0 to 40; vertical axis: cumulative frequency, 0 to 40; median ≈ 20, with Q1 and Q3 marked.]

To find the median, read across from $\frac{n}{2}$ (for a curve, use $\frac{n}{2}$, not $\frac{n+1}{2}$). The lower quartile $Q_1$ is read at $\frac{n}{4}$ and the upper quartile $Q_3$ at $\frac{3n}{4}$.

$$\text{Interquartile range (IQR)} = Q_3 - Q_1$$

Worked example: For $n = 40$: median at $\frac{40}{2} = 20 \Rightarrow \approx 20$ min; $Q_1$ at $10 \Rightarrow \approx 13$ min; $Q_3$ at $30 \Rightarrow \approx 27$ min.
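Reading values off the graph can be approximated in code. The sketch below assumes the curve is built from the runner table (cumulative frequencies 6, 20, 33, 40 at upper boundaries 10, 20, 30, 40) and joins the points with straight lines, so its readings are close to, but not identical to, those from a hand-drawn smooth curve. The function name `read_time_at` is hypothetical.

```python
# (upper class boundary, cumulative frequency), starting from (0, 0).
cf_points = [(0, 0), (10, 6), (20, 20), (30, 33), (40, 40)]
n = cf_points[-1][1]

def read_time_at(cf_target, points):
    """Read across from a cumulative frequency to a time, using straight-line joins."""
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= cf_target <= c1:
            return t0 + (cf_target - c0) / (c1 - c0) * (t1 - t0)
    raise ValueError("cumulative frequency is outside the plotted range")

median = read_time_at(n / 2, cf_points)
q1 = read_time_at(n / 4, cf_points)
q3 = read_time_at(3 * n / 4, cf_points)
iqr = q3 - q1

print(f"median ~ {median:.1f} min, Q1 ~ {q1:.1f} min, Q3 ~ {q3:.1f} min, IQR ~ {iqr:.1f} min")
```

Small differences from graph readings are expected, because a smooth curve bends between the plotted points while this sketch does not.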

Box Plots

A box plot (box-and-whisker) summarises five numbers: minimum, $Q_1$, median, $Q_3$, maximum. The box spans the IQR; the line inside is the median; whiskers reach the extremes.

[Figure: Box plot of runner times on a time (minutes) axis from 0 to 40: minimum 4, Q1 13, median 20, Q3 27, maximum 38.]

Box plots make it easy to compare two distributions: compare the medians to see which is higher on average, and the IQRs (box widths) to see which is more consistent (a smaller IQR means less spread).
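The comparison can be sketched by storing each five-number summary and comparing medians and IQRs. Group A uses the runner values from the box plot above; Group B is a hypothetical second group invented purely for illustration, as is the `FiveNumberSummary` class.

```python
from typing import NamedTuple

class FiveNumberSummary(NamedTuple):
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

    @property
    def iqr(self):
        """Width of the box: spread of the middle half of the data."""
        return self.q3 - self.q1

    @property
    def value_range(self):
        """Distance between the whisker ends."""
        return self.maximum - self.minimum

group_a = FiveNumberSummary(4, 13, 20, 27, 38)   # runner times from the note
group_b = FiveNumberSummary(8, 16, 22, 25, 33)   # hypothetical comparison group

higher_median = "A" if group_a.median > group_b.median else "B"
more_consistent = "A" if group_a.iqr < group_b.iqr else "B"

print(f"A: median {group_a.median}, IQR {group_a.iqr}, range {group_a.value_range}")
print(f"B: median {group_b.median}, IQR {group_b.iqr}, range {group_b.value_range}")
print(f"Higher median: group {higher_median}; more consistent: group {more_consistent}")
```

In a written answer, each comparison should be stated in context, for example that the group with the smaller IQR has more consistent times.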

Histograms with Unequal Class Widths

When class widths differ, bar heights must show frequency density, not frequency — otherwise wide classes look misleadingly large. The area of each bar equals the frequency.

$$\text{frequency density} = \frac{\text{frequency}}{\text{class width}}$$

Worked example:

| Mass $m$ (kg) | Freq | Width | Freq density |
|---|---|---|---|
| $0 < m \le 10$ | 8 | 10 | 0.8 |
| $10 < m \le 20$ | 18 | 10 | 1.8 |
| $20 < m \le 40$ | 24 | 20 | 1.2 |
| $40 < m \le 70$ | 9 | 30 | 0.3 |

[Figure: Histogram of mass using frequency density (unequal class widths). Horizontal axis: mass (kg), with class boundaries at 0, 10, 20, 40 and 70; vertical axis: frequency density, 0 to 2.0.]

Exam tip: The golden rule is frequency = area = frequency density $\times$ class width. If a question gives you density and width, multiply; if it gives frequency and width, divide. Bars in a histogram have no gaps.
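The golden rule can be sketched in Python using the mass table above. The helper names `frequency_density` and `estimate_frequency` are hypothetical. The second helper treats values as evenly spread within each class, which is the usual assumption when estimating from a histogram.

```python
# (lower bound, upper bound, frequency) for each mass class in kg.
mass_classes = [(0, 10, 8), (10, 20, 18), (20, 40, 24), (40, 70, 9)]

def frequency_density(lower, upper, frequency):
    """Bar height: frequency divided by class width."""
    return frequency / (upper - lower)

densities = [frequency_density(*c) for c in mass_classes]

def estimate_frequency(start, end):
    """Estimate the frequency between start and end as histogram area (density * width)."""
    total = 0.0
    for lower, upper, frequency in mass_classes:
        overlap = min(end, upper) - max(start, lower)
        if overlap > 0:
            total += frequency_density(lower, upper, frequency) * overlap
    return total

for (lower, upper, frequency), density in zip(mass_classes, densities):
    print(f"{lower} < m <= {upper}: density {density:.1f}, area {density * (upper - lower):.0f}")

print("estimated frequency for 20 < m <= 30:", round(estimate_frequency(20, 30), 1))
```

Multiplying each density back by its class width recovers the original frequency, which confirms that area, not height, represents frequency.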


Mode is the most frequent value (there can be none, one, or several).
Range $=$ largest $-$ smallest. It measures spread, not average.
For the runner times: $\sum f = 40$, $\sum fx = 810$. Estimated mean $= \dfrac{810}{40} = 20.25$ min.

For the runner times, the modal class is $10 < t \le 20$ (largest frequency). The median is the $20.5$th value; cumulative frequencies are $6, 20, 33, 40$, so position $20.5$ is just past the $20$ values covered by the first two classes. The median class is therefore $20 < t \le 30$, with the median very close to the 20-minute boundary.

Stem-and-leaf diagrams keep the actual data while showing shape. Always include a key such as $2 \mid 5 = 25$, and order the leaves.

For the runner times, $\text{IQR} = 27 - 13 = 14$ min. The IQR measures the spread of the middle half of the data, so it is not affected by outliers.

In the mass histogram, the bar for $20 < m \le 40$ is wide but only medium height. To find the frequency in part of a class, use area: in $20 < m \le 30$, frequency $\approx 1.2 \times 10 = 12$.