## Probability and statistics in context

### MSD

Maths
Many manufacturing processes produce items that vary slightly in some way (dimensions, mass, resistance etc.) but are generally within certain limits. Repeated physical measurements of sample items tend to follow what is called a normal distribution about the average or mean value. Variations can be due to the manufacturing processes or the measurement process.

The full lessons, along with a supporting toolkit, are available in three different formats: A4, A3 and as a PowerPoint deck.

Contains the full lesson along with a supporting toolkit, including teachers’ notes.

## Introduction

If you bought a mobile phone, you would expect that it had been fully tested before it went on sale. However, if you bought a carton of milk or a bar of chocolate you would not expect that it had been individually tested, although it might have been inspected in some way. Chemical or biological analysis of items often requires modifying them by, for example, adding some reagent and observing the effect, or measuring pH. So, instead of putting each item through a range of separate tests, random samples are assessed. If the samples are in line with expectations then all the other items are likely to be so as well.

## Testing items that vary discretely

If 10% of the items produced by a manufacturing process are defective and 90% are without defect, then the chance that an item chosen at random is not defective is 90%. However, if two items are chosen then the chance that no item is defective is $0.90^2 = 0.81$ (or 81%) and the chance that one or both are defective is therefore $1 - 0.81 = 0.19$ (or 19%). If five items are chosen then the chance that no item is defective is $0.90^5 = 0.59$ (or 59%) and the chance that one or more are defective is therefore $1 - 0.59 = 0.41$ (or 41%). In order to have a better than 50:50 chance of detecting a defective item from this production line, a sample of 7 (or more) would need to be selected ($1 - 0.90^7 = 0.52$, or 52%).
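The pattern behind these figures can be written generally. As a sketch, assuming each item is defective independently with the same probability $p$ (the symbols $p$ and $n$ are introduced here for illustration and are not part of the original lesson), the chance of finding at least one defective item in a sample of $n$ items is

$$
P(\text{at least one defective}) = 1 - (1 - p)^n .
$$

Requiring this chance to be better than 50:50 gives

$$
(1 - p)^n < \tfrac{1}{2} \quad\Longrightarrow\quad n > \frac{\ln 0.5}{\ln (1 - p)} ,
$$

where the inequality reverses because $\ln(1 - p)$ is negative. For $p = 0.10$ the right-hand side is about 6.58, so the smallest whole sample size is 7.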

If, however, 99.99% of the products are perfect, then to have a better than 50:50 chance of detecting at least one defective item, almost 7000 items would need to be checked ($1 - 0.9999^{6932} \approx 0.50004$, or 50.004%).
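The sample sizes quoted in this section can be checked with a few lines of Python. This is a minimal sketch, assuming each item is defective independently of the others; the function names `detection_probability` and `smallest_sample_for` are invented for this illustration.

```python
def detection_probability(defect_rate: float, sample_size: int) -> float:
    """Chance that a random sample contains at least one defective item."""
    return 1.0 - (1.0 - defect_rate) ** sample_size


def smallest_sample_for(defect_rate: float, target: float = 0.5) -> int:
    """Smallest sample size whose detection chance is better than `target`."""
    n = 1
    while detection_probability(defect_rate, n) <= target:
        n += 1
    return n


# The two defect rates discussed in the text: 10% and 0.01%.
for rate in (0.10, 0.0001):
    n = smallest_sample_for(rate)
    chance = detection_probability(rate, n)
    print(f"defect rate {rate:.2%}: sample of {n} gives {chance:.3%}")
```

The simple loop is slower than solving for the sample size with logarithms, but it mirrors the trial-and-error reasoning in the text and is fast enough for these values.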

## Testing items that vary continuously

In reality many manufacturing processes produce items that vary slightly in some way (dimensions, mass, resistance etc.) but are generally within certain limits. Repeated physical measurements of sample items tend to follow what is called a normal distribution about the average or mean value. The amount of variation in measurements could be due to:

1. variations in the manufacturing process(es)
2. variations in the measurement process.

The spread of values can be represented graphically as a histogram (if the values are given, for example, to the nearest millimetre). If more and more measurements are taken with finer intervals the shape of the histogram approaches that of the normal distribution curve.
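The link between a histogram and the normal curve can be explored with simulated data. The sketch below is illustrative only: the process (rods with a mean length of 100 mm and a standard deviation of 2 mm) and all variable names are invented here, and the measurements are generated by Python's `random.gauss` rather than taken from a real production line.

```python
import random
import statistics
from collections import Counter

random.seed(1)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical process: true mean 100 mm, standard deviation 2 mm.
lengths = [random.gauss(100.0, 2.0) for _ in range(10_000)]

mean = statistics.mean(lengths)
sd = statistics.stdev(lengths)

# Histogram with 1 mm intervals: count measurements to the nearest millimetre.
histogram = Counter(round(x) for x in lengths)
for mm in sorted(histogram):
    print(f"{mm:4d} mm | {'#' * (histogram[mm] // 50)}")

# Fraction of measurements within 2 standard deviations of the mean.
within_2sd = sum(abs(x - mean) <= 2 * sd for x in lengths) / len(lengths)
print(f"mean = {mean:.2f} mm, sd = {sd:.2f} mm, within 2 sd = {within_2sd:.1%}")
```

The printed bars should form a rough bell shape, and the fraction within two standard deviations should come out close to 95%.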

### True or False?

1. The average of a set of values is called the mean. (True)
2. The sum of the deviations from the mean is zero. (True)
3. The sum of the absolute deviations from the mean is called the standard deviation. (False)
4. About 95% of a set of data that has a normal distribution will be within 2 standard deviations of the mean. (True)
5. A 99% confidence level means that if the test were repeated many times 99% of them would give the same result. (True)
6. A confidence level is also called a ‘margin of error’. (False)
7. A ‘confidence interval’ is the range of values that contains the true value, at the specified confidence level. (True)
8. The main stages in drug development are discovery, production and approval. (False)
9. In a double blind test neither the patients nor the researchers know who receives the product being tested. (True)
10. Drug development usually takes about one year. (False)
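Statements 2 and 3 can be checked on a small data set. The values below are made up for illustration, and the sketch uses the population form of the standard deviation (dividing by the number of values), which is an assumption since the lesson does not say which form it intends.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up measurements

mean = statistics.mean(values)
deviations = [v - mean for v in values]

sum_of_deviations = sum(deviations)                      # statement 2
sum_of_abs_deviations = sum(abs(d) for d in deviations)  # statement 3

# Population variance: the mean of the squared deviations.
variance = sum(d ** 2 for d in deviations) / len(values)
std_dev = math.sqrt(variance)

print(f"mean = {mean}")                                    # 5
print(f"sum of deviations = {sum_of_deviations}")          # 0
print(f"sum of absolute deviations = {sum_of_abs_deviations}")  # 12
print(f"standard deviation = {std_dev}")                   # 2.0
```

The sum of the absolute deviations (12) is clearly not the standard deviation (2.0), which is why statement 3 is false.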

### Glossary of terms

absolute value
the numerical value of a number, ignoring any leading minus sign
confidence interval
the interval around an estimate which includes the margin of error above and below the estimate
double-blind test
a test in which neither the participants nor the researchers know which participants belong to the control group
histogram
a graphical representation of sample data using bars of different heights with no gaps between them
level of confidence
the probability that the interval around an estimate contains the true value; in the case of random samples the margin of error at a given level of confidence depends on the sample size n: e.g. for 99% level of confidence the margin of error for an estimated proportion is at most approximately 1.29/√n
margin of error
the range above and below an estimate in which one is confident that the true value lies; e.g. 55% +/- 2%
mean
the average; the sum of the values divided by the number of values
normal distribution
when many random samples of a measurement are made they tend to be distributed about the mean in a mathematically predictable way; sometimes called a 'bell curve'
objective
without bias; opposite to subjective
pH
a measure of the acidity or basicity of an aqueous solution; pH values are generally in the range 0 to 14, where 0 is a strong acid, 7 is neutral and 14 is a strong base. Negative values are possible; commercial HCl has a pH of about -1.1. Saturated NaOH has a pH of about 15.
pharmaceutical
relating to the production of medicines
placebo
an ineffective treatment that the patient assumes to be effective
production line
a manufacturing process in which a product is made in a sequence of steps, typically in adjoining locations
random sample
a sample chosen so that every item has the same chance of being selected, rather than being picked for any particular reason
reagent
a substance that takes part in a chemical reaction
standard deviation
a measure of how widely sample measurements are spread around the mean; it is equal to the square root of the variance
variance
a measure of how widely sample measurements are spread around the mean, calculated as the average of the squared deviations from the mean; it is equal to the square of the standard deviation
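The glossary entries for level of confidence and margin of error can be tied together numerically. The sketch below assumes the estimate is a proportion from a simple random sample and uses the standard normal approximation, with the worst case of a proportion near 50%; under those assumptions the 99% margin of error is about 1.29 divided by √n. The function names are invented for this illustration.

```python
import math
from statistics import NormalDist


def margin_of_error(n: int, confidence: float = 0.99, p: float = 0.5) -> float:
    """Approximate margin of error for a proportion p estimated from n items."""
    # z is the number of standard deviations that covers the middle
    # `confidence` fraction of a normal distribution (about 2.58 for 99%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / n)


def sample_size_for(margin: float, confidence: float = 0.99, p: float = 0.5) -> int:
    """Smallest n whose margin of error is no larger than `margin`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(p * (1 - p) * (z / margin) ** 2)


for n in (100, 1_000, 10_000):
    print(f"n = {n:6d}: margin of error = {margin_of_error(n):.2%}")

print("sample size for +/- 2% at 99%:", sample_size_for(0.02))
```

Because the margin shrinks with the square root of the sample size, halving the margin of error requires roughly four times as many items.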