4.6 Exercises

  1. Consider a sample of \(n=10\) observations displayed in ascending order: \[15, 16, 18, 18, 20, 20, 21, 22, 23, 75.\]

    1. Compute the sample mean and sample variance.

    2. Find the 5-point summary of the data. Is the distribution skewed?

    3. Are there any likely outliers in the sample? If so, indicate their values.

    4. Build and display the sample’s boxplot chart.

    5. Build and display a sample histogram.
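The five parts above follow a pattern that recurs in the next few exercises. A minimal R sketch of that workflow, applied to the ten observations of this exercise, might look as follows. It assumes base R only; note that `fivenum` uses Tukey's hinges, so its quartiles can differ slightly from those returned by `quantile` or from the convention used in the course text.

```r
# Data from this exercise (n = 10, already sorted).
x <- c(15, 16, 18, 18, 20, 20, 21, 22, 23, 75)

x_mean <- mean(x)   # sample mean
x_var  <- var(x)    # sample variance (divides by n - 1)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max.
five <- fivenum(x)

# Observations lying more than 1.5 * IQR beyond the hinges
# (the rule used for boxplot whiskers).
suspects <- boxplot.stats(x)$out

# Draw the charts only when running interactively.
if (interactive()) {
  boxplot(x, horizontal = TRUE, main = "Boxplot of x")
  hist(x, main = "Histogram of x", xlab = "x")
}
```

Comparing `x_mean` with the median in `five` gives a quick indication of skewness, and `suspects` lists the candidate outliers.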

  2. The daily numbers of accidents in Sydney over a 40-day period are provided below: \[\begin{aligned} &6, 3, 2, 24, 12, 3, 7, 14, 21, 9, 14, 22, 15, 2, 17, 10\\ &7, 7, 31, 7, 18, 6, 8, 2, 3, 2, 17, 7, 7, 21, 13, 23, 1, 11\\ &3, 9, 4, 9, 9, 25\end{aligned}\]

    1. Compute the sample mean and sample variance.

    2. Find the 5-point summary of the data. Is the distribution skewed?

    3. Are there any likely outliers in the sample? If so, indicate their values.

    4. Build and display the sample’s boxplot chart.

    5. Build and display a sample histogram.

  3. Repeat the previous question when the “31” is replaced by a “130”.

  4. The grades in a class are shown below. \[\begin{aligned} &80,73,83,60,49,96,87,87,60,53,66,83,32,80,66 \\ &90,72,55,76,46,48,69,45,48,77,52,59,97,76,89 \\ &73,73,48,59,55,76,87,55,80,90,83,66,80,97,80 \\ &55,94,73,49,32,76,57,42,94,80,90,90,62,85,87 \\ &97,50,73,77,66,35,66,76,90,73,80,70,73,94,59 \\ &52,81,90,55,73,76,90,46,66,76,69,76,80,42,66 \\ &83,80,46,55,80,76,94,69,57,55,66,46,87,83,49 \\ &82,93,47,59,68,65,66,69,76,38,99,61,46,73,90,\\ &66,100,83,48,97,69,62,80,66,55,28,83,59,48,61 \\ &87,72,46,94,48,59,69,97,83,80,66,76,25,55,69 \\ &76,38,21,87,52,90,62,73,73,89,25,94,27,66,66 \\ &76,90,83,52,52,83,66,48,62,80,35,59,72,97,69 \\ &62,90,48,83,55,58,66,100,82,78,62,73,55,84,83 \\ &66,49,76,73,54,55,87,50,73,54,52,62,36,87,80,80\end{aligned}\]

    1. Compute the sample mean and sample variance.

    2. Find the 5-point summary of the data. Is the distribution skewed?

    3. Are there any likely outliers in the sample? If so, indicate their values.

    4. Build and display the sample’s boxplot chart.

    5. Build and display a sample histogram.

    6. Based on your analysis, how well did the class do?

  5. Consider the following dataset: \[2.6, 3.7, 0.8, 9.6, 5.8, -0.8, 0.7, 0.6, 4.8, 1.2, 3.3, 5.0, 3.7, 0.1, -3.1, 0.3.\] What are the median and the interquartile range of the sample?

  6. The following charts show a histogram and a boxplot for two samples, \(A\) and \(B\). Based on these charts, which of \(A\) and/or \(B\) (or neither) is likely to arise from a normal population?

  1. Consider the following dataset: \[12, 14, 6, 10, 1, 20, 4, 8.\] What are its median and its first quartile?

  2. A manufacturer of fluoride toothpaste regularly measures the concentration of fluoride in the toothpaste to make sure that it is within the specifications of \(0.85-1.10\) mg/g.

    1. Build a relative frequency histogram of the data (a histogram with area \(=1\)).

    2. Compute the data’s mean \(\overline{x}\) and its standard deviation \(s_x\).

    3. The mean and the variance can also be approximated as follows. Let \(u_i\) be the class mark for each of the histogram’s classes (the midpoint along the rectangles’ widths), \(f_i\) be the number of observations in class \(i\), \(n\) be the total number of observations, and \(k\) be the number of classes. Then \[\overline{u}=\frac{1}{n}\sum_{i=1}^kf_iu_i\quad\text{and}\quad s^2_u=\frac{1}{n-1}\sum_{i=1}^kf_i(u_i-\overline{u})^2.\] Compute \(\overline{u}\) and \(s_u\). How do they compare with \(\overline{x}\) and \(s_x\)?

    4. Provide the \(5\)-point summary of the data, as well as the interquartile range \(\text{IQR}\).

    5. Display this information as a boxplot chart.

    6. Compute the midrange \(\frac{1}{2}(Q_0+Q_4)\), the trimean \(\frac{1}{4}(Q_1+2Q_2+Q_3)\), and the range \(Q_4-Q_0\) for the fluoride data.
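The class-mark approximation in part 3 can be sketched in R as below. The class marks `u` and class counts `f` are hypothetical values chosen only for illustration (they are not the fluoride measurements), and `grouped_stats` is a helper name introduced here, not a base R function.

```r
# Hypothetical class marks (midpoints) and class counts, for illustration only.
u <- c(0.875, 0.925, 0.975, 1.025, 1.075)
f <- c(2, 5, 8, 4, 1)

grouped_stats <- function(u, f) {
  n     <- sum(f)                            # total number of observations
  u_bar <- sum(f * u) / n                    # grouped mean
  s2_u  <- sum(f * (u - u_bar)^2) / (n - 1)  # grouped variance
  c(mean = u_bar, sd = sqrt(s2_u))
}

grouped_stats(u, f)
```

The result coincides with `mean` and `sd` applied to a sample in which every observation is replaced by its class mark, which is why \(\overline{u}\) and \(s_u\) are usually close to, but not equal to, \(\overline{x}\) and \(s_x\).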

  3. The compressive strength of concrete is normally distributed with mean \(\mu=2500\) and standard deviation \(\sigma=50\). A random sample of size \(5\) is taken. What is the standard error of the sample mean?

  4. A new cure has been developed for a certain type of cement that should change its mean compressive strength. The standard deviation of the compressive strength is known to be \(130 \text{ kg}/\text{cm}^2\), and we may assume that the strength follows a normal distribution. \(9\) chunks of cement have been tested and the observed sample mean is \(\overline{X}= 4970\). Find the \(95\%\) confidence interval for the mean of the compressive strength.

  5. Consider the same set-up as in the previous question, but now \(100\) chunks of cement have been tested and the observed sample mean is \(\overline{X}= 4970\). Find the \(95\%\) confidence interval for the mean of the compressive strength.
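When \(\sigma\) is known and the population is normal, the interval in the two preceding exercises has the form \(\overline{X}\pm z_{\alpha/2}\,\sigma/\sqrt{n}\). A small R sketch, assuming that form, makes the effect of the sample size visible; `z_interval` is a helper name introduced here for illustration.

```r
# Two-sided confidence interval for a normal mean with known sigma.
z_interval <- function(xbar, sigma, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)      # critical value z_{alpha/2}
  half_width <- z * sigma / sqrt(n)    # margin of error
  c(lower = xbar - half_width, upper = xbar + half_width)
}

ci_9   <- z_interval(xbar = 4970, sigma = 130, n = 9)
ci_100 <- z_interval(xbar = 4970, sigma = 130, n = 100)
```

The centre of the interval is unchanged between the two calls; only the half-width shrinks, by a factor of \(\sqrt{100/9}\).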

  6. Consider the same set-up as two questions ago, but now we do not know the standard deviation of the normal distribution. \(9\) chunks of cement have been tested, and the measurements are \[5001, 4945, 5008, 5018, 4991, 4990, 4968, 5020, 5003.\] Find the \(95\%\) confidence interval for the mean of the compressive strength.

  7. A steel bar is measured with a device which has a known precision of \(\sigma=0.5\)mm. Suppose we want to estimate the mean measurement with an error of at most \(0.2\)mm at a level of significance \(\alpha=0.05\). What sample size is required? Assume normality.
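For sample-size questions of this kind, the usual approach is to require \(z_{\alpha/2}\,\sigma/\sqrt{n}\leq E\), where \(E\) is the largest acceptable error, and to solve for \(n\). The R sketch below assumes that approach; `sample_size` is a helper name introduced here.

```r
# Smallest n such that z_{alpha/2} * sigma / sqrt(n) <= error.
sample_size <- function(sigma, error, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  ceiling((z * sigma / error)^2)   # round up to the next whole observation
}

n_bar <- sample_size(sigma = 0.5, error = 0.2)
```

Because \(n\) grows with \(1/E^2\), cutting the acceptable error to a third requires roughly nine times as many observations.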

  8. In a random sample of \(1000\) houses in the city, it is found that \(228\) are heated by oil. Find a \(99\%\) C.I. for the proportion of homes in the city that are heated by oil.
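For a large sample, one common choice is the normal-approximation (Wald) interval \(\hat{p}\pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\). The sketch below assumes that interval is the intended one; `wald_interval` is a helper name introduced here.

```r
# Normal-approximation (Wald) confidence interval for a proportion.
wald_interval <- function(successes, n, level = 0.99) {
  p_hat <- successes / n
  z     <- qnorm(1 - (1 - level) / 2)
  se    <- sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}

ci_oil <- wald_interval(successes = 228, n = 1000)
```

Other intervals, such as the one reported by `prop.test`, use a different construction and give slightly different endpoints.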

  9. Past experience indicates that the breaking strength of yarn used in manufacturing drapery material is normally distributed and that \(\sigma=2\) psi. A random sample of \(15\) specimens is tested and the average breaking strength is found to be \(\overline{x}=97.5\) psi.

    1. Find a \(95\%\) confidence interval on the true mean breaking strength.

    2. Find a \(99\%\) confidence interval on the true mean breaking strength.

  10. The diameters of holes for a cable harness follow a normal distribution with \(\sigma = 0.01\) inch. For a sample of size \(10\), the average diameter is \(1.5045\) inches.

    1. Find a \(99\%\) confidence interval on the mean hole diameter.

    2. Repeat this for \(n=100\).

  11. A journal article describes the effect of delamination on the natural frequency of beams made from composite laminates. The observations are as follows: \[230.66, 233.05, 232.58, 229.48, 232.58, 235.22.\] Assuming that the population is normal, find a \(95\%\) confidence interval on the mean natural frequency.

  12. A textile fibre manufacturer is investigating a new drapery yarn, which the company claims has a mean thread elongation of \(\mu=12\) kilograms with standard deviation of \(\sigma=0.5\) kilograms.

    1. What should be the sample size so that with probability \(0.95\) we will estimate the mean thread elongation with error at most \(0.15\) kg?

    2. What should be the sample size so that with probability \(0.95\) we will estimate the mean thread elongation with error at most \(0.05\) kg?

  13. An article in Computers and Electrical Engineering considered the speed-up of cellular neural networks (CNN) for a parallel general-purpose computing architecture. Various speed-ups are observed: \[3.77, 3.35, 4.21, 4.03, 4.03, 4.63, 4.63, 4.13, 4.39, 4.84, 4.26, 4.60.\] Assume that the population is normally distributed. Find a 99% C.I. for the mean speed-up.

  14. An engineer measures the weight of \(n=25\) pieces of steel, which follows a normal distribution with variance \(16\). The average observed weight for the sample is \(\overline{x}=6\). What is the two-sided 95% C.I. for the mean \(\mu\)?

  15. The brightness of a television picture tube can be evaluated by measuring the amount of current required to achieve a particular brightness level. An engineer thinks that one has to use 300 microamps of current to achieve the required brightness level. A sample of size \(n=20\) has been taken to verify the engineer’s hypothesis.

    1. Formulate the null and the alternative hypotheses (use a two-sided test alternative).

    2. For the sample of size \(n= 20\) we obtain \(\overline{x}=319.2\) and \(s=18.6\). Test the hypotheses from part 1 with \(\alpha=5\%\) by computing a critical region. Calculate the \(p\)-value.

    3. Use the data from part 2 to construct a \(95\%\) confidence interval for the mean required current.

  16. We say that a particular production process is stable if it produces at most \(2\%\) defective items. Let \(p\) be the true proportion of defective items.

    1. We sample \(n=200\) items at random and consider hypothesis testing about \(p\). Formulate null and alternative hypotheses.

    2. What is your conclusion of the above test, if one observes \(3\) defective items out of \(200\)? Note: you have to choose an appropriate significance level \(\alpha\).

  17. Ten engineers’ knowledge of basic statistical concepts was measured on a scale of \(0-100\), before and after a short course in statistical quality control. The results are:

    \[\begin{array}{lcccccccccc}\hline \mbox{Engineer} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10\\\hline \mbox{Before } X_{1i} & 43 & 82 & 77 & 39 & 51 & 66 & 55 & 61 & 79 & 43\\ \mbox{After } X_{2i} & 51 & 84 & 74 & 48 & 53 & 61 & 59 & 75 & 82 & 53\\\hline \end{array}\]

    Let \(\mu_1\) and \(\mu_2\) be the mean scores before and after the course. Perform the test \(H_0:\mu_1=\mu_2\) against \(H_A: \mu_1<\mu_2\). Use \(\alpha=0.05\).

  18. It is claimed that \(15\%\) of a certain population is left-handed, but a researcher doubts this claim. They decide to randomly sample \(200\) people and to use the anticipated small number of left-handed people in the sample as evidence against the claim of \(15\%\). Suppose \(22\) of the \(200\) are left-handed. Compute the \(p-\)value associated with the hypothesis (assuming a binomial distribution), and provide an interpretation.

  19. A child psychologist believes that nursery school attendance improves children’s social perceptiveness (SP). They use \(8\) pairs of twins, randomly choosing one to attend nursery school and the other to stay at home, and then obtain scores for all \(16\). In \(6\) of the \(8\) pairs, the twin attending nursery school scored better on the SP test. Compute the \(p-\)value associated with the hypothesis (assuming a binomial distribution), and provide an interpretation.
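Both of the preceding exercises reduce to a tail probability of a binomial distribution. The R sketch below shows one way to set this up with `pbinom`. It assumes a lower-tail alternative for the left-handedness claim (the researcher expects fewer than \(15\%\)) and an upper-tail alternative for the twins, with no tied pairs; the variable names are introduced here for illustration.

```r
# Left-handedness: under the claim, X ~ Binomial(200, 0.15).
# Lower-tail p-value: P(X <= 22).
p_left <- pbinom(22, size = 200, prob = 0.15)

# Twins: under "no effect", the number of pairs in which the nursery-school
# twin scores higher is Binomial(8, 0.5).
# Upper-tail p-value: P(X >= 6) = P(X > 5).
p_twins <- pbinom(5, size = 8, prob = 0.5, lower.tail = FALSE)
```

The function `binom.test` offers the same exact calculation with the alternative specified as `"less"` or `"greater"`.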

  20. A certain power supply is stated to provide a constant voltage output of \(10\)kV. Ten measurements are taken and yield the sample mean of \(11\)kV. Formulate a test for this situation. Should it be \(1-\)sided or \(2-\)sided? What value of \(\alpha\) should you use? What conclusion do the test and the sample yield?

  21. A company is currently using titanium alloy rods it purchases from supplier \(A\). A new supplier (supplier \(B\)) approaches the company and offers rods of the same quality (at least according to supplier B’s claim) at a lower price.

    The company’s decision makers are interested in the offer. At the same time, they want to make sure that the safety of their product is not compromised.

    They randomly select ten rods from each of the lots shipped by suppliers \(A\) and \(B\) and measure the yield strengths of the selected rods. The observed sample mean and sample standard deviation are \(651\) MPa and \(2\) MPa for supplier A’s rods, respectively, and the same parameters are \(657\) MPa and \(3\) MPa for supplier B’s rods.

    Perform the test \(H_0:\mu_A=\mu_B\) against \(H_A:\mu_A\not=\mu_B\). Use \(\alpha=0.05\). Assume that the variances are equal but unknown.

  22. The deflection temperature under load for two different types of plastic pipe is being investigated. Two random samples of \(15\) pipe specimens are tested, and the deflection temperatures observed are as follows:

    • \(206\), \(188\), \(205\), \(187\), \(194\), \(193\), \(207\), \(185\), \(189\), \(213\), \(192\), \(210\), \(194\), \(178\), \(205\).

    • \(177\), \(197\), \(206\), \(201\), \(180\), \(176\), \(185\), \(200\), \(197\), \(192\), \(198\), \(188\), \(189\), \(203\), \(192\).

    Does the data support the claim that the deflection temperature under load for type \(1\) pipes exceeds that of type \(2\)? Calculate the \(p\)-value, using \(\alpha=0.05\), and state your conclusion.

  23. It is claimed that the breaking strength of yarn used in manufacturing drapery material is normally distributed with mean \(97\) and \(\sigma=2\) psi. A random sample of nine specimens is tested and the average breaking strength is found to be \(\overline{X}=98\) psi. Formulate a test for this situation. Should it be \(1-\)sided or \(2-\)sided? What value of \(\alpha\) should you use? What conclusion do the test and the sample yield?

  24. A civil engineer is analyzing the compressive strength of concrete. It is claimed that its mean is \(80\) and variance is known to be \(2\). A random sample of size \(60\) yields the sample mean \(59\). Formulate a test for this situation. Should it be \(1-\)sided or \(2-\)sided? What value of \(\alpha\) should you use? What conclusion do the test and the sample yield?

  25. The sugar content of the syrup in canned peaches is claimed to be normally distributed with mean \(10\) and variance \(2\). A random sample of \(n=10\) cans yields a sample mean \(11\). Another random sample of \(n=10\) cans yields a sample mean \(9\). Formulate a test for this situation. Should it be \(1-\)sided or \(2-\)sided? What value of \(\alpha\) should you use? What conclusion do the test and the samples yield?

  26. The mean water temperature downstream from a power water plant cooling tower discharge pipe should be no more than \(100\)F. Past experience has indicated that the standard deviation is \(2\)F. The water temperature is measured on nine randomly chosen days, and the average temperature is found to be \(98\)F. Formulate a test for this situation. Should it be \(1-\)sided or \(2-\)sided? What value of \(\alpha\) should you use? What conclusion do the test and the sample yield?

  27. We are interested in the mean burning rate of a solid propellant used to power aircrew escape systems. We want to determine whether or not the mean burning rate is \(50\) cm/second. A sample of \(10\) specimens is tested and we observe \(\overline{X} =48.5\). Assume normality with \(\sigma=2.5\).

  28. Ten individuals have participated in a diet modification program to stimulate weight loss. Their weight both before and after participation in the program is shown below: \[\begin{array}{cl}\hline \mbox{Before} & 195, 213, 247, 201, 187, 210, 215, 246, 294, 310\\\hline \mbox{After} & 187, 195, 221, 190, 175, 197, 199, 221, 278, 285\\\hline \end{array}\] Is there evidence to support the claim that this particular diet-modification program is effective in producing mean weight reduction? Use \(\alpha=0.05\). Compute the associated \(p-\)value.

  29. We want to test the hypothesis that the average content of containers of a particular lubricant equals \(10\)L against the two-sided alternative. The contents of a random sample of \(10\) containers are

    \(10.2\) \(9.7\) \(10.1\) \(10.3\) \(10.1\)
    \(9.8\) \(9.9\) \(10.4\) \(10.3\) \(9.5\)

    Find the \(p-\)value of this two-sided test. Assume that the distribution of contents is normal. Note that if \(x_i\) represent the measurements, \(\sum_{i=1}^{10}x_i^2=1006.79\).
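The hint about \(\sum x_i^2\) suggests computing the test statistic from summary quantities. A minimal R sketch of that route, assuming the usual one-sample \(t\)-statistic \(t=(\overline{x}-\mu_0)/(s/\sqrt{n})\) with \(n-1\) degrees of freedom, is given below; the variable names are introduced here for illustration.

```r
# Lubricant contents (in L) for the 10 sampled containers.
x <- c(10.2, 9.7, 10.1, 10.3, 10.1,
       9.8, 9.9, 10.4, 10.3, 9.5)

n      <- length(x)
sum_x  <- sum(x)
sum_x2 <- sum(x^2)                        # the quantity given in the hint

x_bar <- sum_x / n
s2    <- (sum_x2 - n * x_bar^2) / (n - 1) # shortcut formula for the variance

t_stat  <- (x_bar - 10) / sqrt(s2 / n)    # test statistic under H0: mu = 10
p_value <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value
```

The same numbers are returned by `t.test(x, mu = 10)`, which is a convenient check on the hand calculation.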

  30. An engineer measures the weight of \(n=25\) pieces of steel, which follows a normal distribution with variance \(16\). The average weight for the sample is \(\overline{X}=6\). They want to test for \(H_{0}:\mu =5\) against \(H_{1}:\mu >5\). What is the \(p-\)value for the test?

  31. The thickness of a plastic film (in mm) on a substrate material is thought to be influenced by the temperature at which the coating is applied. A completely randomized experiment is carried out. \(11\) substrates are coated at \(125\)F, resulting in a sample mean coating thickness of \(\overline{x}_1=103.5\) and a sample standard deviation of \(s_1=10.2\). Another \(11\) substrates are coated at \(150\)F, for which \(\overline{x}_2=99.7\) and \(s_2=11.7\) are observed. We want to test equality of means against the two-sided alternative. Assume that population variances are unknown but equal. The value of the appropriate test statistic and the decision are \((\text{for } \alpha=0.05)\):

  32. The following output was produced with the t.test command in R.

     One Sample t-test
     data:  x
     t = 2.0128, df = 99, p-value = 0.02342
     alternative hypothesis: true mean is greater than 0

    Based on this output, which statement is correct?

    1. If the type I error is \(0.05\), then we reject \(H_0:\mu=0\) in favour of \(H_1:\mu>0\);

    2. If the type I error is \(0.05\), then we reject \(H_0:\mu=0\) in favour of \(H_1:\mu\not=0\);

    3. If the type I error is \(0.01\), then we reject \(H_0:\mu=0\) in favour of \(H_1:\mu>0\);

    4. If the type I error is \(0.01\), then we reject \(H_0:\mu=0\) in favour of \(H_1:\mu<0\);

    5. The type I error is \(0.02342\).

  33. A pharmaceutical company claims that a drug decreases blood pressure. A physician doubts this claim. They test \(10\) patients and record results before and after the drug treatment:

     > Before=c(140,135,122,150,126,
         138,141,155,128,130)
     > After=c(135,136,120,148,122,
         136,140,153,120,128)

    At the R command prompt, they type:

     > t.test(Before,After,alternative="greater")

         data:  Before and After
         t = 0.5499, p-value = 0.2946
         alternative hypothesis: true difference in means is greater than 0
         sample estimates:
         mean of x mean of y
             136.5     133.8

    Their assistant claims that the command should instead be:

     > t.test(Before,After,paired=TRUE,alternative="greater")

         data:  Before and After
         t = 3.4825, df = 9, p-value = 0.003456
         alternative hypothesis: true difference in means is greater than 0
         sample estimates:
         mean of the differences
             2.7

    Which answer is best?

    1. The assistant uses the correct command. There is not enough evidence to justify that the new drug decreases blood pressure;

    2. The assistant uses the correct command. There is enough evidence to justify that the new drug decreases blood pressure for any reasonable choice of \(\alpha\);

    3. The physician uses the correct command. There is not enough evidence to justify that the new drug decreases blood pressure;

    4. The physician uses the correct command. There is enough evidence to justify that the new drug decreases blood pressure for any reasonable choice of \(\alpha\);

    5. Nobody is correct, \(t-\)tests should not be used here.
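The two outputs quoted in this exercise can be reproduced with base R's `t.test`. The sketch below runs both calls on the data given above and also shows that the paired test is the same as a one-sample test on the within-patient differences; the object names `unpaired`, `paired` and `on_diffs` are introduced here for illustration.

```r
Before <- c(140, 135, 122, 150, 126, 138, 141, 155, 128, 130)
After  <- c(135, 136, 120, 148, 122, 136, 140, 153, 120, 128)

# Treats the two columns as independent samples (Welch test by default).
unpaired <- t.test(Before, After, alternative = "greater")

# Treats each patient as their own control.
paired <- t.test(Before, After, paired = TRUE, alternative = "greater")

# Equivalent formulation: one-sample test on the differences.
on_diffs <- t.test(Before - After, mu = 0, alternative = "greater")

c(unpaired = unname(unpaired$statistic), paired = unname(paired$statistic))
```

The mean difference is the same in both analyses; what changes is the standard error, because pairing removes the patient-to-patient variability.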

  34. A company claims that the mean deflection of a piece of steel which is \(10\)ft long is equal to \(0.012\)ft. A buyer suspects that it is bigger than \(0.012\)ft. The following data \(x_i\) has been collected: \[\begin{aligned} 0.0132, 0.0138, 0.0108, 0.0126, 0.0136, \\ 0.0112, 0.0124, 0.0116, 0.0127, 0.0131.\end{aligned}\] Assuming normality and that \(\sum_{i=1}^{10}x_i^2=0.0016\), what are the \(p-\)value for the appropriate one-sided test and the corresponding decision?

    1. \(p\in (0.05, 0.1)\) and reject \(H_0\) at \(\alpha=0.05\).

    2. \(p\in (0.05, 0.1)\) and do not reject \(H_0\) at \(\alpha=0.05\).

    3. \(p\in (0.1, 0.25)\) and reject \(H_0\) at \(\alpha=0.05\).

    4. \(p\in (0.1, 0.25)\) and do not reject \(H_0\) at \(\alpha=0.05\).

  35. In an effort to compare the durability of two different types of sandpaper, \(10\) pieces of type \(A\) sandpaper were subjected to treatment by a machine which measures abrasive wear; \(11\) pieces of type \(B\) sandpaper were subjected to the same treatment. We have the following observations:

    \(x_A\): 27, 26, 24, 29, 30, 26, 27, 23, 28, 27
    \(x_B\): 24, 23, 22, 27, 24, 21, 24, 25, 24, 23, 20

    Note that \(\sum x_{A,i}=267\), \(\sum x_{B,i}=257\), \(\sum x_{A,i}^2=7169\), \(\sum x_{B,i}^2=6041\). Assuming normality and equality of variances in abrasive wear for \(A\) and \(B\), we want to test for equality of mean abrasive wear for \(A\) and \(B\). What is the appropriate \(p-\)value for this test?

  36. The following output was produced with the t.test command in R.

     One Sample t-test
     data:  x
     t = 32.9198, df = 999, p-value < 2.2e-16
     alternative hypothesis: true mean is not 
         equal to 0

    Based on this output, which statement is correct?

    1. If the type I error is \(0.05\), then we reject \(H_0:\mu=0\) in favour of \(H_1:\mu>0\);

    2. If the type I error is \(0.05\), then we reject \(H_0:\mu=0\) in favour of \(H_1:\mu\not=0\);

    3. If the type I error is \(0.01\), then we reject \(H_0:\mu=0\) in favour of \(H_1:\mu>0\);

    4. If the type I error is \(0.01\), then we reject \(H_0:\mu=0\) in favour of \(H_1:\mu<0\);

    5. None of the above.

  37. A medical team wants to test whether a particular drug decreases diastolic blood pressure. Nine people have been tested. The team measured blood pressure before (\(X\)) and after (\(Y\)) applying the drug. The corresponding means were \(\overline{X}= 91\), \(\overline{Y} = 87\). The sample variance of the differences was \(S^2_D= 25\). What is the \(p-\)value for the appropriate one-sided test?

  38. A researcher studies the difference between two programming languages. Twelve experts familiar with both languages were asked to write code for a particular function in both languages, and the time taken to write each version was recorded. The observations are as follows.

     Expert 01 02 03 04 05 06 07 08 09 10 11 12
     Lang 1 17 16 21 14 18 24 16 14 21 23 13 18
     Lang 2 18 14 19 11 23 21 10 13 19 24 15 29

    Construct a 95% C.I. for the mean difference between the first and the second language. Do we have any evidence that one of the languages is preferable to the other (i.e., the average time to write a function is shorter)?

  39. Consider the proportions of recaptured moths in the light-coloured (\(p_1\)) and the dark-coloured (\(p_2\)) populations.

    Among the \(n_1=137\) light-coloured moths, \(y_1=18\) were recaptured; among the \(n_2=493\) dark-coloured moths, \(y_2=131\) were recaptured. Is there a significant difference between the proportions of recaptured moths in the two populations?
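One standard way to compare two proportions with samples of this size is the pooled two-sample \(z\)-test. The R sketch below assumes that test and a two-sided alternative; the variable names are introduced here for illustration.

```r
y <- c(light = 18, dark = 131)    # recaptured moths
n <- c(light = 137, dark = 493)   # released moths

p_hat  <- y / n                   # sample proportions
p_pool <- sum(y) / sum(n)         # pooled proportion under H0: p1 = p2

se <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z  <- unname((p_hat["light"] - p_hat["dark"]) / se)

p_value <- 2 * pnorm(-abs(z))     # two-sided p-value
```

Calling `prop.test(y, n, correct = FALSE)` reports the equivalent chi-squared statistic, which equals `z^2`, together with the same p-value.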