CSCI3022 – 0.1 – Assignment Hub


The probability density function for a normal distribution with mean 𝜇 and SD 𝜎 is defined by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
The following picture depicts an often-quoted fact from statistics classes: roughly 68% of the probability for a normal distribution falls within one standard deviation of the mean, roughly 95% falls within two standard deviations of the mean, etc.
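The 68% and 95% figures can be checked numerically. The sketch below is an illustration, not part of the assignment: it uses `statistics.NormalDist` from the Python standard library, and `prob_within` is a hypothetical helper name introduced here.

```python
from statistics import NormalDist


def prob_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable lies within k SDs of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# Print the probability mass within 1, 2 and 3 standard deviations
for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```

The result does not depend on the particular 𝜇 and 𝜎, since the interval is expressed in units of the standard deviation.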
Use Calculus to prove that the inflection point(s) of the probability density function of the normal distribution occur at 𝑥 = 𝜇 ± 𝜎
Show all steps using LaTeX in the Markdown cell below:
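First Derivative:
$$f'(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\frac{d}{dx}\left(e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\right) = -\frac{x-\mu}{\sigma^3\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
Now, let's find the second derivative:
Second Derivative:
$$f''(x) = -\frac{1}{\sigma^3\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} + \frac{(x-\mu)^2}{\sigma^5\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} = \frac{1}{\sigma^3\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\left(\frac{(x-\mu)^2}{\sigma^2} - 1\right)$$
Now we set second derivative = 0 to find inflection points. The exponential factor is never zero, so
$$\frac{(x-\mu)^2}{\sigma^2} - 1 = 0 \quad\Longrightarrow\quad (x-\mu)^2 = \sigma^2 \quad\Longrightarrow\quad x = \mu \pm \sigma$$
Since $f''(x) < 0$ for $|x-\mu| < \sigma$ and $f''(x) > 0$ for $|x-\mu| > \sigma$, the concavity changes at both points, so they are inflection points.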
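The result 𝑥 = 𝜇 ± 𝜎 can also be sanity-checked numerically. The following sketch is not part of the original solution: the values `mu = 3.0` and `sigma = 2.0` are arbitrary illustrative choices, and the helper names are hypothetical. It evaluates the closed-form second derivative of the normal density, compares it with a finite-difference estimate, and confirms the sign change at 𝜇 + 𝜎.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and SD sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def second_derivative(x, mu, sigma):
    """Closed form: f''(x) = f(x) * ((x - mu)^2 / sigma^2 - 1) / sigma^2."""
    z = (x - mu) / sigma
    return normal_pdf(x, mu, sigma) * (z * z - 1) / sigma**2


def numeric_second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


mu, sigma = 3.0, 2.0  # arbitrary illustrative values

# f'' vanishes at mu - sigma and mu + sigma
print(second_derivative(mu - sigma, mu, sigma))
print(second_derivative(mu + sigma, mu, sigma))

# f'' is negative just inside mu + sigma and positive just outside
print(second_derivative(mu + sigma - 0.1, mu, sigma) < 0)
print(second_derivative(mu + sigma + 0.1, mu, sigma) > 0)
```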
0.2 Question 2:
A hardware store receives a shipment of 10,000 bolts that are supposed to be 12 cm long. The mean of this shipment of 10,000 bolts is indeed 12 cm, and the standard deviation is 0.2 cm.
For the following questions, determine if you have enough information to answer. If you do, then show all steps calculating the answer in the Markdown cell directly below. If you don’t, explain what additional information you would need. If you use any theorems, cite the theorem and specify which assumptions are necessary for the theorem to hold.
Question 2a). What is the probability that a randomly chosen bolt is less than 10 cm long?
Question 2b). For quality control, the hardware store chooses 100 bolts at random to measure. They will declare the shipment defective and return it to the manufacturer if the average length of 100 bolts is less than 11.97 cm or greater than 12.04 cm. Find the probability that the shipment is found satisfactory (i.e. not defective).
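2a) We do not have enough information to answer this. We know only the mean and standard deviation of the bolt lengths, not the shape of their distribution. To compute this probability we would need to know how individual bolt lengths are distributed (for example, that they are normally distributed).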
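2b) From the Central Limit Theorem, the mean length of a random sample of $n = 100$ bolts is approximately normally distributed with mean 12 cm and $SD = \frac{0.2}{\sqrt{100}} = 0.02$ cm. This assumes that the bolts are chosen at random and (approximately) independently, and that the sample size is large enough for the normal approximation to hold.
We find the z-scores: $z_1 = \frac{11.97 - 12}{0.02} = -1.5$, $z_2 = \frac{12.04 - 12}{0.02} = 2$
From the standard normal table, the probability that the shipment is found satisfactory is the probability that the sample mean is less than 12.04 cm minus the probability that it is less than 11.97 cm, which is 0.9772 - 0.0668 = 0.9104.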
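The table lookup in 2b can be reproduced in code. This is a minimal sketch under the same normal approximation, using `statistics.NormalDist` from the standard library; the variable names are illustrative and not from the assignment.

```python
from math import sqrt
from statistics import NormalDist

mu = 12.0     # shipment mean length (cm)
sigma = 0.2   # shipment standard deviation (cm)
n = 100       # number of bolts measured

# Standard deviation of the sample mean under the CLT approximation
se = sigma / sqrt(n)

# z-scores of the two acceptance limits
z_low = (11.97 - mu) / se
z_high = (12.04 - mu) / se

# Probability that the sample mean falls between the limits
sample_mean_dist = NormalDist(mu, se)
p_satisfactory = sample_mean_dist.cdf(12.04) - sample_mean_dist.cdf(11.97)

print(round(z_low, 2), round(z_high, 2), round(p_satisfactory, 4))
```

The computed probability agrees with the table value of 0.9104 to four decimal places.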
QUESTION 3A: Load the data into a pandas DataFrame called dfIncome, calculate the population mean of Income and then make a density histogram of the Distribution of the Income data with 15 bins. Include a title for your plot and label the x-axis (we have provided a label for the y-axis). Note we have included code to mark where the population mean lies on the histogram.
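In [5]: import pandas as pd
        import matplotlib.pyplot as plt

        # Load the data and compute the population mean of Income
        dfIncome = pd.read_csv('income_data.csv')
        mean_income = dfIncome['Income'].mean()
        print("Population income mean is", mean_income)

        # Density histogram of the population with 15 bins
        plt.hist(dfIncome['Income'], bins=15, density=True, alpha=0.7)
        plt.title('Income Distribution (Population)')
        plt.xlabel('Income')
        plt.ylabel('Probability Per Dollar')
        # Red triangle just below the x-axis marks the population mean
        plt.scatter(mean_income, -0.0000001, marker='^', color='red', s=300)
        plt.show()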
Population income mean is 60613.8492

QUESTION 3B: Describe the shape of the Income distribution (i.e. comment on modality and skew).
The Income distribution is unimodal, with one prominent peak, and right-skewed.
QUESTION 3C:
i). Write a function to collect a random sample of size sample_size with replacement from dfIncome and plot the density histogram of the empirical distribution of the income for your sample. Use 15 bins for your histogram and set the x-axis range to be from (0,210000). (Hint: use the dataframe method .sample()).
Include a title and label for both axes.
Then run the cells provided below to output 3 separate distributions for sample sizes of 10, 100 and 1000.
In [6]: def income_sample(df, sample_size):
            # Sample with replacement, as the question requires
            sample = df['Income'].sample(sample_size, replace=True)
            plt.hist(sample, bins=15, density=True)
            plt.xlim(0, 210000)
            plt.title(f'Income Distribution (Sample Size = {sample_size})')
            plt.xlabel('Income')
            plt.ylabel('Probability Per Dollar')
            plt.show()
In [7]: income_sample(dfIncome,10)

In [8]: income_sample(dfIncome,100)

In [9]: income_sample(dfIncome,1000)

Part 3cii). What happens to the shape of the empirical sample distributions of income as you increase the sample size?
As the sample size increases, the empirical distribution of the sample looks more and more like the population distribution of Income: unimodal and right-skewed.
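This convergence can be illustrated without the real data. The sketch below is an assumption-laden stand-in: it builds a synthetic right-skewed population with `random.lognormvariate` (not the actual `income_data.csv`), and the helpers `bin_proportions` and `histogram_gap` are hypothetical names. It measures how far a sample's 15-bin histogram is from the population's by summing the absolute differences in bin proportions.

```python
import random

rng = random.Random(0)

# Synthetic right-skewed "population" standing in for the Income column
population = [rng.lognormvariate(11, 0.5) for _ in range(20000)]


def bin_proportions(values, bins=15, lo=0, hi=210000):
    """Fraction of values falling in each of `bins` equal-width bins on [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(values) for c in counts]


def histogram_gap(sample_size):
    """Sum of absolute differences between sample and population bin proportions."""
    sample = rng.choices(population, k=sample_size)  # sampling with replacement
    pop = bin_proportions(population)
    smp = bin_proportions(sample)
    return sum(abs(p - s) for p, s in zip(pop, smp))


gaps = {n: histogram_gap(n) for n in (10, 100, 1000)}
for n, gap in gaps.items():
    print(f"sample size {n}: histogram gap = {gap:.3f}")
```

A smaller gap means the sample histogram is closer to the population histogram; the gap for a sample of 1000 should come out well below the gap for a sample of 10.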
QUESTION 3D:
If we want to estimate the mean of the population we can draw a sample from the population and compute the sample mean. As we learned in class, since samples can vary, the sample mean can vary and thus it is a random variable and has its own distribution.
i). Complete the function income_sample_mean below to randomly sample sample_size rows from dfIncome with replacement and return the sample mean of income for that sample.
ii). Complete the function income_sample_dist below to simulate num_simulations of randomly sampling sample_size rows from dfIncome with replacement and calculate the sample mean of income for each sample. Store the sample means in an np.array called means. The function should output a density histogram of the empirical sample mean income distribution. On the histogram, include two markers: a red one for the population mean (that you calculated in part 3A) and a yellow one for the mean of the num_simulations sample mean estimates. Include a title and labels for the x and y-axis.
Then run the cells provided below to output 3 separate distributions for num_simulations=1000 and sample_size = 10, 100 and 1000
In [10]: def income_sample_mean(df, sample_size):
             # Draw sample_size incomes with replacement and return their mean
             sample = df['Income'].sample(sample_size, replace=True)
             return sample.mean()
In [11]: import numpy as np

         def income_sample_dist(df, sample_size, num_simulations):
             means = np.array([income_sample_mean(df, sample_size) for _ in range(num_simulations)])
             # 'means' stores "num_simulations" means from samples of size "sample_size"
             # (the title and legend labels are cut off in the source; their endings below are assumptions)
             plt.hist(means, bins=30, density=True)
             plt.xlim([20000, 120000])
             plt.title(f'Sample Mean Income Distribution ({num_simulations} Simulations, Sample Size = {sample_size})')
             plt.xlabel('Income Mean')
             plt.ylabel('Probability Per Dollar')
             # Red marker: population mean from part 3A (uses the global mean_income)
             plt.scatter(mean_income, -0.0000001, marker='^', color='red', s=300, label='Population Mean')
             # Yellow marker: mean of the simulated sample means
             sample_mean_estimate = means.mean()
             plt.scatter(sample_mean_estimate, -0.0000001, marker='^', color='yellow', s=300, label='Sample Mean Estimate')
             plt.legend()
             plt.show()
             # Your code for part (ii) above
In [12]: income_sample_dist(dfIncome, 10, 1000)

In [13]: income_sample_dist(dfIncome, 100, 1000)

In [14]: income_sample_dist(dfIncome, 1000, 1000)

QUESTION 3E:
Describe the shapes of the empirical sample mean distributions (comment on their modality and skew compared to the modality and skew of the population distribution). What happens to the mean and standard deviations of these distributions as you increase the sample size? What is the name of the theorem that explains what you are observing?
The empirical sample mean distributions are all unimodal, like the population distribution, but they are less skewed and look more symmetric and bell-shaped as the sample size increases. The mean of each distribution stays close to the population mean, while the standard deviation decreases as the sample size grows. The Central Limit Theorem (CLT) explains what we are observing.
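The behaviour of the mean and standard deviation can be checked with a small simulation. The sketch below does not use the real income data: it assumes a synthetic right-skewed population drawn with `random.expovariate`, and `sample_mean_summary` is a hypothetical helper. It compares the standard deviation of 1000 simulated sample means with the value $\sigma/\sqrt{n}$ that the CLT predicts.

```python
import random
import statistics

rng = random.Random(1)

# Synthetic right-skewed population (a stand-in for the Income column)
population = [rng.expovariate(1 / 60000) for _ in range(10000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)


def sample_mean(sample_size):
    """Mean of one sample drawn with replacement from the population."""
    return statistics.fmean(rng.choices(population, k=sample_size))


def sample_mean_summary(sample_size, num_simulations=1000):
    """Mean and SD of num_simulations simulated sample means."""
    means = [sample_mean(sample_size) for _ in range(num_simulations)]
    return statistics.fmean(means), statistics.pstdev(means)


for n in (10, 100, 1000):
    mean_of_means, sd_of_means = sample_mean_summary(n)
    predicted_sd = pop_sd / n ** 0.5
    print(f"n={n}: mean of means={mean_of_means:.0f}, "
          f"SD of means={sd_of_means:.0f}, predicted SD={predicted_sd:.0f}")
```

In each row the mean of the sample means should sit near the population mean, and the simulated SD should track the predicted SD, shrinking by roughly a factor of $\sqrt{10}$ each time the sample size grows tenfold.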
QUESTION 4H.
Create an array called simulated_statistics that contains 50,000 simulated values of the test statistic under the null hypothesis. Assume that the original sample consisted of 210 experiments.
As usual, start by defining a function one_simulated_statistic() that simulates one value of the statistic. Your function should use np.random.DISTRIBUTION where DISTRIBUTION is the distribution you chose in part 4g. Your function should also use your statistic function from part 4e.
We have included the code that plots the distribution of the simulated values. The red dot represents the observed statistic you found in Question 4f.
In [29]: def one_simulated_statistic():
             # Simulate 210 experiments with success probability 0.5 and compute the statistic
             return statistic(0.50, np.random.binomial(210, 0.5) / 210)

         num_simulations = 50000

         simulated_statistics = np.array([one_simulated_statistic() for _ in range(num_simulations)])

         # Run this cell a few times to see how the simulated statistic changes
         one_simulated_statistic()
Out[29]: 2.3809523809523836
In [30]: # Run this cell to produce a histogram of the simulated statistics
         from matplotlib.ticker import PercentFormatter

         plt.hist(simulated_statistics, density=True, ec="white")
         plt.xlabel('Simulated Statistic')
         plt.ylabel('Percent per Unit')
         plt.title('Histogram of Simulated Statistics')
         plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
         # Red dot just below the x-axis marks the observed statistic from Question 4f
         plt.scatter(observed_statistic, -0.002, color='red', s=100);
         plt.show()
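The simulation step can be sketched in a self-contained form. Note the assumptions: the `statistic` function from part 4e is not shown in this document, so the version below (absolute difference in percentage points) is a hypothetical stand-in. It does reproduce the `Out[29]` value for 110 successes out of 210, but that is only circumstantial support. The sketch also uses the standard library instead of `np.random.binomial` and runs 5,000 simulations rather than 50,000 to keep it quick.

```python
import random

rng = random.Random(42)


def statistic(expected_prop, actual_prop):
    # Hypothetical stand-in for the part 4e statistic:
    # absolute difference between the proportions, in percentage points
    return abs(expected_prop - actual_prop) * 100


def one_simulated_statistic(num_experiments=210, null_prop=0.5):
    """Simulate one sample under the null hypothesis and return its statistic."""
    successes = sum(rng.random() < null_prop for _ in range(num_experiments))
    return statistic(null_prop, successes / num_experiments)


num_simulations = 5000  # reduced from 50,000 for speed
simulated_statistics = [one_simulated_statistic() for _ in range(num_simulations)]

print(statistic(0.50, 110 / 210))  # compare with Out[29]
print(sum(simulated_statistics) / num_simulations)  # average simulated statistic
```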
