Discussion #2
Topics covered: Description of Samples and Populations (Chapter 2, pp. 40-59 and 59-67).
2. The Trojans' scores for the last season are as follows: 43, 3, 14, 39, 24, 31, 28, 35, 38, 14, 27, 17. Draw a boxplot by hand and verify it using R.
Review Sheet
1. Median (ỹ): The value that lies closest to the middle of the sample once the observations are sorted. If the number of observations is odd, the median is the middle value; if it is even, the median is the mean of the two middle values.
Mean (ȳ): The sum of the observations divided by the number of observations, ȳ = (Σ yᵢ)/n.
Deviation: The difference between an observation and the mean of the data set: deviation = yᵢ − ȳ.
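The three definitions above can be sketched in a few lines of Python. The sample is hypothetical, chosen so the arithmetic is easy to check by hand. Python's standard `statistics.mean` and `statistics.median` return the same values.

```python
def sample_mean(values):
    # Sum of the observations divided by the number of observations.
    return sum(values) / len(values)

def sample_median(values):
    # Sort first: the median is read off the ordered sample.
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # odd n: the middle value
    return (s[mid - 1] + s[mid]) / 2     # even n: mean of the two middle values

def deviations(values):
    # Each observation minus the sample mean.
    m = sample_mean(values)
    return [y - m for y in values]

data = [2, 4, 4, 5, 10]                  # hypothetical sample
print(sample_mean(data))                 # 5.0
print(sample_median(data))               # 4
print(deviations(data))                  # [-3.0, -1.0, -1.0, 0.0, 5.0]
```

Deviations from the mean always sum to zero (up to floating-point rounding). This is why dispersion measures such as the standard deviation square them before adding.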
2. Quartiles: Values that divide the ordered data into four parts, each containing roughly a quarter of the observations:
• First quartile (Q1): the median of the values in the lower half of the data
• Second quartile (Q2): the median of all the values in the data
• Third quartile (Q3): the median of the values in the upper half of the data
3. Interquartile range (IQR): The difference between the third and first quartiles, IQR = Q3 − Q1.
4. Outlier: A data point that differs so much from the rest of the data that it doesn’t seem to belong with the other data. Outliers can be helpful in pinpointing a problem with the experimental protocol.
Lower fence: Q1 −1.5× IQR
Upper fence: Q3 +1.5× IQR
An outlier lies outside the fences: either below the lower fence or above the upper fence. That is, a data point is an outlier if:
data point < Q1 −1.5× IQR
or
data point > Q3 +1.5× IQR
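The quartile, IQR, and fence rules above are shown below as a minimal Python sketch. It runs on a hypothetical sample that contains one suspicious value. The sketch assumes the median-of-halves definition of Q1 and Q3. For an odd number of observations, it leaves the overall median out of both halves. Other textbooks and software (including R's `quantile()` defaults) use slightly different conventions, so hand and software answers can differ a little.

```python
def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    # Q1 and Q3 as medians of the lower and upper halves of the ordered data.
    # Assumption: for odd n the overall median is left out of both halves.
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return median(lower), median(s), median(upper)

def fences_and_outliers(values):
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Outlier: below the lower fence OR above the upper fence.
    outliers = [y for y in values if y < lower_fence or y > upper_fence]
    return iqr, lower_fence, upper_fence, outliers

data = [12, 15, 17, 18, 20, 21, 23, 60]   # hypothetical sample
print(quartiles(data))                    # (16.0, 19.0, 22.0)
print(fences_and_outliers(data))          # (6.0, 7.0, 31.0, [60])
```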
5. Boxplot: A visual representation of the five-number summary (minimum, Q1, median, Q3, and maximum).
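Figure 1: Boxplot with outliers. The whiskers extend to the most extreme observations that lie within the fences (Q1 − 1.5×IQR and Q3 + 1.5×IQR); points beyond the fences are plotted individually as outliers. If there are no outliers, the whiskers end at the minimum and the maximum of the data.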
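For problem 2, the sketch below computes everything a boxplot of the Trojans' scores needs: the five-number summary, the fences, the whisker ends, and any outliers. It uses the median-of-halves quartiles from the review sheet. The helper names are illustrative.

To verify in R, store the scores in a vector `scores` and call `fivenum(scores)`. It returns Tukey's hinges, which for these 12 scores coincide with the median-of-halves quartiles. `boxplot(scores)` draws the plot from the same hinges. `quantile(scores)` uses a different default method and gives somewhat different Q1 and Q3.

```python
def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def boxplot_stats(values):
    s = sorted(values)
    n = len(s)
    q1 = median(s[: n // 2])          # median of the lower half
    q3 = median(s[(n + 1) // 2 :])    # median of the upper half (middle value excluded if n is odd)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [y for y in s if lo_fence <= y <= hi_fence]
    return {
        "five_number": (s[0], q1, median(s), q3, s[-1]),
        "fences": (lo_fence, hi_fence),
        "whiskers": (inside[0], inside[-1]),   # most extreme points within the fences
        "outliers": [y for y in s if y < lo_fence or y > hi_fence],
    }

scores = [43, 3, 14, 39, 24, 31, 28, 35, 38, 14, 27, 17]  # Trojans' scores (problem 2)
stats = boxplot_stats(scores)
print(stats["five_number"])   # (3, 15.5, 27.5, 36.5, 43)
print(stats["fences"])        # (-16.0, 68.0)
print(stats["whiskers"])      # (3, 43)
print(stats["outliers"])      # []
```

No score falls outside the fences, so the whiskers run all the way to the minimum (3) and the maximum (43).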
6. Univariate summary: A graphical or numerical summary of a single variable.
Bivariate summary: A graphical or numerical summary of the relationship between pairs of variables:
• Bivariate frequency table: Used to understand the relationship between two categorical variables.
• Stacked bar chart: A visualization of the bivariate frequency table
• Stacked relative frequency chart: A visualization of the bivariate frequency table in which the counts within each category are converted to proportions. As a result, every bar has the same total height (1, or 100%).
• Side-by-side boxplot: Used to compare the center, spread, skewness and outliers of a dataset across different groups.
• Scatterplot: Used to examine the relationship between two numeric variables X and Y. It plots each observed pair (x, y) as a point in the xy-plane.
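A bivariate frequency table is a count of each (category, category) combination. The sketch below builds one in plain Python from hypothetical paired observations. It then divides each row by its total to get the proportions that a stacked relative frequency chart would display. In practice, `pandas.crosstab` (with `normalize="index"` for row proportions) does the same job.

```python
from collections import Counter

# Hypothetical paired observations: (treatment group, outcome) for each subject.
pairs = [
    ("A", "improved"), ("A", "improved"), ("A", "no change"),
    ("B", "improved"), ("B", "no change"), ("B", "no change"), ("B", "no change"),
]

counts = Counter(pairs)                   # frequency of each (group, outcome) combination
groups = sorted({g for g, _ in pairs})    # ['A', 'B']
outcomes = sorted({o for _, o in pairs})  # ['improved', 'no change']

# Bivariate frequency table: one row per group, one column per outcome.
for g in groups:
    print(g, [counts[(g, o)] for o in outcomes])   # A [2, 1], then B [1, 3]

# Row-normalized table (stacked relative frequency chart): each row sums to 1.
rel = {}
for g in groups:
    total = sum(counts[(g, o)] for o in outcomes)
    rel[g] = [counts[(g, o)] / total for o in outcomes]
print(rel)
```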
7. Measures of dispersion:
• Range: The difference between the maximum and minimum values in the data (max − min).
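• Interquartile range (IQR): The difference between the third and first quartiles, Q3 − Q1.
• Standard deviation (SD): Often denoted by s and defined as
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} $$
The numerator is the sum of squared deviations, abbreviated SS, so the formula can also be written as $s = \sqrt{SS/(n-1)}$. The denominator, n − 1, is called the degrees of freedom. A simple (but incomplete) explanation of dividing by n − 1 rather than n: with n in the denominator, a single observation (n = 1) would give s = 0, as if one value could show that there is no variability. With n − 1, s is undefined (0/0) when n = 1.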
• Coefficient of variation (CV): The ratio of the standard deviation to the mean, s/ȳ (sometimes expressed as a percentage). It is unitless. Unlike the measures above, it is not affected by a change of scale (multiplying every observation by a positive constant). This makes it useful for comparing the dispersion of two or more variables measured on different scales.
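The measures of dispersion above are computed below in a short Python sketch on a hypothetical sample. The quartiles again use the median-of-halves convention, which is an assumption; software defaults may differ. The standard deviation divides by n − 1, which matches Python's `statistics.stdev`.

```python
import math

def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def dispersion(values):
    s = sorted(values)
    n = len(s)
    mean = sum(s) / n
    ss = sum((y - mean) ** 2 for y in s)   # sum of squared deviations (SS)
    sd = math.sqrt(ss / (n - 1))           # divide by the degrees of freedom, n - 1
    q1 = median(s[: n // 2])               # median-of-halves quartiles (assumed convention)
    q3 = median(s[(n + 1) // 2 :])
    return {
        "range": s[-1] - s[0],
        "iqr": q3 - q1,
        "sd": sd,
        "cv": sd / mean,                   # unitless; multiply by 100 for a percentage
    }

data = [2, 4, 4, 5, 10]                    # hypothetical sample with mean 5
print(dispersion(data))                    # {'range': 8, 'iqr': 4.5, 'sd': 3.0, 'cv': 0.6}
```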
Measure of dispersion | Formula | Robust | Units | Effect of multiplicative transformation (a × Y) | Effect of additive transformation (Y + c)
Range (R) | max − min | No | Same as data | Scales by the factor: a × R | Unchanged: R
Interquartile range (IQR) | Q3 − Q1 | Yes | Same as data | Scales by the factor: a × IQR | Unchanged: IQR
Standard deviation (s) | √(Σ(yᵢ − ȳ)² / (n − 1)) | No | Same as data | Scales by the factor: a × s | Unchanged: s
Coefficient of variation (cv) | s / ȳ | No | Unitless | Unchanged: cv | Changes: s / (ȳ + c)
Table 1: Measures of dispersion (multiplicative effects shown for a > 0; for negative a, R, IQR, and s scale by the absolute value of a).
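The transformation effects in Table 1 can be checked numerically. This sketch applies a hypothetical multiplicative factor a = 3 and an additive constant c = 10 to made-up data. Multiplying by a positive constant scales the range and s but leaves the CV unchanged. Adding a constant leaves the range and s unchanged but shifts the mean, and therefore changes the CV.

```python
import math

def sd(values):
    # Sample standard deviation with n - 1 in the denominator.
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((y - m) ** 2 for y in values) / (n - 1))

def summary(values):
    m = sum(values) / len(values)
    return {"range": max(values) - min(values), "sd": sd(values), "cv": sd(values) / m}

y = [2, 4, 4, 5, 10]             # hypothetical data
scaled = [3 * v for v in y]      # multiplicative transformation, a = 3
shifted = [v + 10 for v in y]    # additive transformation, c = 10

print(summary(y))                # range 8, sd 3.0, cv 0.6
print(summary(scaled))           # range and sd triple; cv unchanged
print(summary(shifted))          # range and sd unchanged; cv drops to 3/15 = 0.2
```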
8. Linear transformation: A variable Y can be transformed into a new variable Y′. If the graph of Y′ against Y is a straight line, the transformation is linear. Two special cases are:
Y′ = aY (multiplicative transformation)
Y′ = Y + c (additive transformation)
Statistic | Effect of multiplicative transformation (a × Y) | Effect of additive transformation (Y + c)
Mean (ȳ) | Scales by the factor: a × ȳ | Shifts by the constant: ȳ + c
Median (ỹ) | Scales by the factor: a × ỹ | Shifts by the constant: ỹ + c
Mode (y_mode) | Scales by the factor: a × y_mode | Shifts by the constant: y_mode + c
Table 2: Effect of linear transformation on various descriptive statistics
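The effects in Table 2 can also be confirmed with Python's standard `statistics` module. The data and the constants a = 3 and c = 10 are hypothetical.

```python
from statistics import mean, median, mode

y = [2, 4, 4, 5, 10]                    # hypothetical data
a, c = 3, 10                            # hypothetical constants
ay = [a * v for v in y]                 # multiplicative transformation
yc = [v + c for v in y]                 # additive transformation

print(mean(y), median(y), mode(y))      # mean 5, median 4, mode 4
print(mean(ay), median(ay), mode(ay))   # each multiplied by a: 15, 12, 12
print(mean(yc), median(yc), mode(yc))   # each shifted by c: 15, 14, 14
```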



