Final exam practice problems

Question 1: Multiple Choice Questions

(a) A factorial design to assess the effects of seven factors (each has two levels) in eight runs is an example of a

A. $2^{7}$ factorial design

B. $2^{3}$ factorial design

C. $2^{7-4}$ factorial design

D. $2^{8-5}$ factorial design

(b) A clinical trial comparing five treatment means using an ANOVA model at the 5% level found a significant F test. If all pairs of treatment means are compared, then the probability of not falsely declaring that at least one pair of treatment means is significantly different is:

A. 0.05

B. 0.5

C. 0.599

D. 0.401

(c) Three different washing solutions are being compared to study their effectiveness in retarding bacterial growth in 5-gallon milk containers. The analysis is done in a laboratory, and only three trials can be run on any given day. Because days may introduce variability, the experiment is conducted over four days. Which experimental design is most appropriate?

A. Randomized Block Design

B. Latin Square Design

C. Randomized Design

D. Graeco-Latin Square Design

(d) Suppose that a single-factor experiment with five levels of the factor has been conducted. There are three replicates and the experiment has been conducted as a completely randomized design. If the experiment had been conducted in blocks, the pure error degrees of freedom would be reduced by

A. 10

B. 2

C. 8

(e) A completely randomized design is conducted with 4 treatments (A, B, C, D) and 12 participants. If 4 participants are assigned to treatment A, 3 to treatment B, 3 to treatment C, and 2 to treatment D, how many possible treatment assignments are there?

A. 72

B. 495

C. 277200

D. None of the above.

(f) Suppose you are interested in the effect of the presence of vending machines in schools on childhood obesity. What randomized experiment would you want to do to evaluate this question?

A. Experiment

B. Observational Study

C. Block Randomized Design

D. Paired Randomized Design

E. Not enough information to determine.

(g) Suppose you are interested in the effect of smoking on lung cancer. What randomized experiment could you plausibly perform to evaluate this effect?

A. Experiment

B. Observational Study

C. Block Randomized Design

D. Paired Randomized Design

E. Not enough information to determine.

(h) Suppose we want to study the causal effect of receiving the measles vaccine on developing measles. Let $Y(0)$ be Zahra’s potential measles status if she is not vaccinated, and let $Y(1)$ be her potential measles status if she is vaccinated. Zahra has a sibling, Steph. According to the Stable Unit Treatment Value Assumption (SUTVA), which of the following must be true?

A. Zahra’s outcome depends on Steph’s vaccination status.

B. Zahra’s outcome depends only on Zahra’s vaccination status, not on Steph’s vaccination status.

C. Zahra and Steph must not receive different versions of the measles vaccine.


Question 2

(a) If each run in a $2^{4}$ full factorial design has been conducted only once, describe three methods that might be used to distinguish real effects from noise.

(b) How would you structure the runs for a $2^{4}$ factorial design in the following cases?

(i) Two blocks of eight runs. Justify any assumptions you make.

(ii) Four blocks of four runs. Justify any assumptions you make.

(c) If each run in a $2^{4}$ factorial design has been independently duplicated, how can the standard error of an effect be calculated? What is meant by “independently duplicated”?


Question 3

A metallurgical engineer is about to begin a comprehensive study to determine the effects of six variables on the strength of a certain type of alloy.

(a) If a $2^{6}$ factorial design is used, how many runs are required?

(b) If $\sigma^{2}$ represents the experimental error variance of an individual observation, what is the variance of a main effect?

(c) What is the usual formula for a 99% confidence interval for the main effect of a factor?

(d) Based on previous work, it is believed that $\sigma = 8000$ pounds. If the experimenter wants 99% confidence intervals for the main effects and interactions with lengths equal to 4000 pounds (i.e., the difference between the upper and lower limits is 4000 pounds), how many replications of the $2^{6}$ factorial design are required?


Question 4

Consider a $2^{8-4}$ fractional factorial design:

(a) How many factors are included in this design?

(b) How many runs are required for this design?

(c) How many levels does each factor have?

(d) How many independent generators are there for this design?

(e) How many words in the defining relation?


Question 5

(Cube Plot Data: C levels 1/-1, T levels 1/-1, K levels 1/-1. Values at corners: 92, 94, 105, 103, 116, 107, 92, 102)

(a) Determine the main effects of factors T, K, and C using the cube plot.

(b) Determine the two-factor interaction effects of TK, TC, and KC using the cube plot.


Question 6

Six burn treatments A, B, C, D, E, F were tested on six subjects (volunteers). Each subject has six sites on which a burn could be applied for testing (each arm with two below the elbow and one above). A standard burn was administered at each site, and the six treatments were arranged so that each treatment occurred once with every subject and once in every position. After treatment, each burn was covered by a clean gauze; treatment C was a control with clean gauze but without other treatment. The data are the number of hours for a clearly defined degree of partial healing to occur.

Table 1. Burn treatment Latin square: hours to partial healing by position

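| Subject (rows) / Position on arm (columns) | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| I | $A=32$ | $B=40$ | $C=72$ | $D=43$ | $E=35$ | $F=50$ |
| II | $B=29$ | $E=53$ | $D=32$ | $A=37$ | $F=59$ | $C=53$ |
| III | $C=40$ | $B=48$ | $F=37$ | $D=56$ | $A=53$ | $E=43$ |
| IV | $D=29$ | $A=56$ | $C=38$ | $E=67$ | $F=59$ | $B=42$ |
| V | $E=28$ | $C=50$ | $B=100$ | $F=46$ | $A=29$ | $D=56$ |
| VI | $F=37$ | $E=42$ | $D=67$ | $C=50$ | $B=33$ | $A=48$ |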

(a) Identify the type of experimental design used in this study. Describe its main characteristics.

(b) Explain how this design could be randomized while preserving its structure, and why randomization is important.

(c) Construct the ANOVA table for this design, including sources of variation, degrees of freedom, sums of squares, mean squares, and F-statistic.

(d) Perform the ANOVA F-test at the $\alpha=0.05$ significance level. State your conclusion about whether there is a significant difference among the treatment means.


Question 7

Consider an experiment to compare 7 treatments in blocks of size 5. Taking all possible combinations of five treatments from seven gives a balanced incomplete block design with $r=15$.

(a) How many blocks does the design have?

(b) Show that r must be a multiple of five for a balanced incomplete block design with $v=7$ treatments and blocks of size $k=5$ to exist.

(c) Show that the smallest balanced incomplete block design has $r=15$ observations per treatment.


Question 8

A study is conducted to compare $k=4$ treatments, based on $n=6$ replicates per treatment.

(a) Complete the following ANOVA table. Assume that the between treatments sum of squares accounts for 60% of the total variation in the sample data.

Table 2. ANOVA table (to be completed)

| Source of Variation | df | Sum of Squares | Mean Square | F |
|---|---|---|---|---|
| Between treatments | | | | |
| Within treatments | | | | |
| Total | | 10000 | | |

(b) Using the completed ANOVA table, test whether there are significant differences among the treatment means at the significance level $\alpha=0.05$.


Question 9

A study was conducted to compare 3 treatments for patients with HIV. The response was a measure of change in CD4+ counts. There were $n=90$ patients in each of the 3 treatment conditions. The means and MSE are given below:

$$\bar{y}_{1\cdot}=12.2, \quad \bar{y}_{2\cdot}=5.1, \quad \bar{y}_{3\cdot}=-0.3, \quad MSE=400.$$

(a) Compute simultaneous 95% confidence intervals for all pairs of treatment means using both the Bonferroni and Tukey methods:

Bonferroni: $\mu_{1}-\mu_{2}$; $\mu_{1}-\mu_{3}$; $\mu_{2}-\mu_{3}$

Tukey: $\mu_{1}-\mu_{2}$; $\mu_{1}-\mu_{3}$; $\mu_{2}-\mu_{3}$

(b) Based on the Bonferroni confidence intervals, indicate which pairs of treatments are significantly different.

(c) Based on the Tukey confidence intervals, indicate which pairs of treatments are significantly different.


Question 10

A study compared the antioxidant activity for 4 varieties of green tea. Five replicates were obtained from each tea variety and the total phenolic content was measured.

Table 3. Green tea: sample means and SDs of total phenolic content

| Variety | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Mean | 160 | 170 | 140 | 190 |
| SD | 15 | 18 | 12 | 15 |

(a) Complete the following ANOVA table.

(b) Test whether there are significant differences among the mean total phenolic contents at $\alpha=0.05$.


Question 11

Consider the following hypothetical experiment on 800 persons.

Table 4. Potential outcomes and treatment by category (800 persons)

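| Category | Persons | x | T | Y(0) | Y(1) |
|---|---|---|---|---|---|
| 1 | 100 | 0 | 0 | 2 | 3 |
| 2 | 100 | 1 | 0 | 2 | 3 |
| 3 | 200 | 0 | 1 | 2 | 3 |
| 4 | 200 | 1 | 1 | 2 | 3 |
| 5 | 50 | 0 | 0 | 5 | 6 |
| 6 | 50 | 1 | 0 | 5 | 6 |
| 7 | 50 | 0 | 1 | 5 | 6 |
| 8 | 50 | 1 | 1 | 5 | 6 |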

(a) Give an example of a context for this study. Define $x, T, Y(0), Y(1)$.

(b) Calculate the average causal treatment effect.

(c) Is the covariate $x$ balanced between the treatment groups?

(d) Is it plausible to believe that these data came from a randomized experiment?

(e) What is the relation between (Mean Y|T=1 - Mean Y|T=0) and the average causal treatment effect?

(f) Is it plausible to believe that treatment assignment is ignorable given the covariate $x$?


Midterm — Regular

Question 2 (20 marks)

A study is conducted to investigate whether memory recall differs depending on whether a person sleeps or ingests caffeine during a break. Six adults are randomly assigned equally to one of two groups. All participants are first given a list of six words to memorize. During a break, one group takes a 90-minute nap, while the other group ingests a caffeine pill. After the break, the number of words each participant correctly recalls is recorded. We are interested in determining whether there is a difference in the average number of words recalled between the sleep and caffeine groups. The data are shown below:

Table 5. Words recalled: sleep vs caffeine

| Adult | Number of Words Recalled | Group |
|---|---|---|
| 1 | 14 | Sleep |
| 2 | 12 | Caffeine |
| 3 | 18 | Sleep |
| 4 | 14 | Caffeine |
| 5 | 11 | Sleep |
| 6 | 6 | Caffeine |

(a) (3 marks) State the null and alternative hypotheses for this study. Specify the statistical parameters in terms of the experimental context.

Let $\theta = \mu_s - \mu_c$, where

$\mu_s$: mean number of words recalled by adults who take a 90-minute nap during the break;

$\mu_c$: mean number of words recalled by adults who ingest a caffeine pill during the break.

$H_0: \theta = 0$ vs $H_1: \theta \neq 0$


[Remark: R Code here, Analysis #1 and #2 calculating all possible randomization differences under the null hypothesis and returning the summary of observed results.]

(b) (2 marks) What assumption do we make in creating the randomization distribution? Explain.

Under $H_0$, the number of words recalled by each adult would be the same regardless of whether the adult was assigned to the sleep or the caffeine group. The group labels are therefore exchangeable, and every possible assignment of three adults to each group is equally likely.

(c) (5 marks) Describe how the randomization distribution for the difference in means is calculated. Write down all the values of this randomization distribution. (Hint: Look at the analyses done using R)

The randomization distribution is calculated by assuming that $H_0$ is true. There are $\binom{6}{3} = 20$ possible treatment assignments. The values of the randomization distribution are the 20 mean differences that could have arisen if $H_0$ is true. The values are obtained from Analysis #2: $3.667, 4.333, \dots, -3.667$.

(d) (5 marks) Calculate the appropriate randomization p-value. Is there a significant difference in the average number of words recalled between the sleep and caffeine groups at the 2% significance level? Explain your reasoning. (Hint: Look at the analyses done using R)

$$P\text{-value} = P(|\delta| \ge |\delta_{obs}|) = \frac{1}{20}\sum_{k=1}^{20} I(|\delta_k| \ge |\delta_{obs}|)$$

$\delta_{obs} = 3.667$

$\#\{|\delta_k| \ge |\delta_{obs}|\} = \#\{3.667, 4.333, 5.667, -4.333, -3.667, \dots, -3.667\} = 10$

$$P\text{-value} = \frac{10}{20} = 0.5 = 50\%$$

P-value > 2% $\Rightarrow$ Fail to reject $H_0$ at $\alpha=2\%$.

Therefore, there is not enough evidence to conclude that the sleep and caffeine groups differ at $\alpha=2\%$.
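The count behind this p-value can be reproduced directly from Table 5. The following is a minimal Python sketch, not the R analysis referred to above (whose code is not shown); the variable and function names are illustrative.

```python
from itertools import combinations

# Words recalled by adults 1..6 (Table 5); adults 1, 3, 5 were in the sleep group.
words = [14, 12, 18, 14, 11, 6]
sleep_observed = (0, 2, 4)  # zero-based indices of the observed sleep group


def mean_difference(sleep_idx):
    """Mean(sleep) - mean(caffeine) for a given choice of sleep-group indices."""
    sleep = [words[i] for i in sleep_idx]
    caffeine = [words[i] for i in range(len(words)) if i not in sleep_idx]
    return sum(sleep) / len(sleep) - sum(caffeine) / len(caffeine)


delta_obs = mean_difference(sleep_observed)

# All C(6, 3) = 20 equally likely assignments under H0.
deltas = [mean_difference(idx) for idx in combinations(range(6), 3)]

# Two-sided p-value; the small tolerance guards against floating-point ties.
extreme = sum(abs(d) >= abs(delta_obs) - 1e-9 for d in deltas)
p_value = extreme / len(deltas)

print(round(delta_obs, 3), extreme, p_value)
```

The sketch gives an observed difference of about 3.667, with 10 of the 20 assignments at least as extreme, which agrees with the p-value of 0.5 above.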

(e) (5 marks) A confused statistician calculated two t-tests: one is appropriate, and the other is inappropriate (see the R output below). If you use the results from the appropriate t-test, do you reach the same conclusion as you did using the randomization test (assume the 2% significance level)? What has to be assumed in order for the t-test to be valid?

The appropriate t-test is t-test #1.

The two-sample t-test has a p-value = 0.3101.

p-value > 2% $\Rightarrow$ Fail to reject $H_0$ at $\alpha=2\%$.

Same conclusion as the randomization test.

The t-test assumes that the two groups are independent and that the data in each group are independent and normally distributed with different means, and possibly different variances.


[Remark: R Code here, showing outputs for Welch Two Sample t-test (t-test #1) and Paired t-test (t-test #2)]


Question 3 (8 marks)

A business analyst believes that the average number of tasks completed per week is greater for employees who work in the office than for those who work remotely. Weekly productivity levels in the two groups are assumed independent and normally distributed with means $\mu_1$ (in-office) and $\mu_2$ (remote), and common variance $\sigma^2$. To test this belief, the following hypotheses are considered:

$H_0: \theta = 0$ versus $H_1: \theta > 0$ where $\theta = \mu_1 - \mu_2$

(a) (3 marks) Assuming the test statistic

$$Z = \frac{\overline{Y}_1 - \overline{Y}_2}{\sigma\sqrt{1/n_1 + 1/n_2}}$$

is used, show that the power of the test at level $\alpha$ is:

$$1 - \beta = 1 - \Phi\left(z_\alpha - \frac{\theta}{\sigma\sqrt{1/n_1 + 1/n_2}}\right), \theta > 0,$$

where $\Phi(\cdot)$ is the standard normal CDF, and $z_\alpha$ is the $100(1-\alpha)^{th}$ percentile of the standard normal distribution.

By definition, power = $P(\text{Reject } H_0 \mid H_1 \text{ true})$.

$$ \begin{aligned} 1 - \beta &= P\left(\frac{\overline{Y}_1 - \overline{Y}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} > z_\alpha \bigg| \theta > 0\right) \\ &= P\left(\frac{\overline{Y}_1 - \overline{Y}_2 - \theta}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} > z_\alpha - \frac{\theta}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \bigg| \theta > 0\right) \\ 1 - \beta &= P\left(Z > z_\alpha - \frac{\theta}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\right) \\ \Rightarrow\quad 1 - \beta &= 1 - \Phi\left(z_\alpha - \frac{\theta}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\right) \end{aligned} $$

(b) (5 marks) The analyst plans an experiment with equal group sizes, $n_1 = n_2 = \frac{n}{2}$, where $n_1$ is the number of in-office employees and $n_2$ is the number of remote employees. Show the total sample size $n = n_1 + n_2$ required for the experiment to achieve power $1-\beta$ is

$$n = \frac{4\sigma^2(z_\alpha + z_\beta)^2}{\theta^2}.$$

With $n_1 = n_2 = \frac{n}{2}$,

$$\frac{1}{n_1} + \frac{1}{n_2} = \frac{2}{n} + \frac{2}{n} = \frac{4}{n},$$

so the power expression from part (a) gives

$$\beta = \Phi\left(z_\alpha - \frac{\sqrt{n}\theta}{2\sigma}\right).$$

Since $\Phi(-z_\beta) = \beta$,

$$-z_\beta = z_\alpha - \frac{\sqrt{n}\theta}{2\sigma} \quad\Rightarrow\quad \frac{\sqrt{n}\theta}{2\sigma} = z_\alpha + z_\beta.$$

Thus,

$$n = \frac{4\sigma^2(z_\alpha + z_\beta)^2}{\theta^2}$$
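As a numerical check of this formula, the sketch below computes $n$ and substitutes it back into the power expression from part (a). The values chosen for $\theta$, $\sigma$, $\alpha$ and the target power are illustrative assumptions, not taken from the question.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1


def total_sample_size(theta, sigma, alpha, power):
    """n = 4 sigma^2 (z_alpha + z_beta)^2 / theta^2 for the one-sided Z-test."""
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(power)  # power = 1 - beta
    return 4 * sigma**2 * (z_alpha + z_beta) ** 2 / theta**2


def power_one_sided(theta, sigma, n1, n2, alpha):
    """1 - Phi(z_alpha - theta / (sigma * sqrt(1/n1 + 1/n2)))."""
    z_alpha = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(z_alpha - theta / (sigma * sqrt(1 / n1 + 1 / n2)))


# Hypothetical inputs, chosen only for illustration.
theta, sigma, alpha, target = 2.0, 5.0, 0.05, 0.80
n = total_sample_size(theta, sigma, alpha, target)
achieved = power_one_sided(theta, sigma, n / 2, n / 2, alpha)
print(n, achieved)
```

In practice $n$ would be rounded up to an even whole number so that the two groups have equal size.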

Question 4 (5 marks)

A statistician is designing a phase III clinical trial comparing a continuous outcome in two groups receiving experimental versus standard therapy with a total sample size of 168 patients. The team requires the study have 80% power at the 5% significance level to detect a difference of 1. Assume that the standard deviation of the outcome is 2. The design team would like to investigate whether it’s possible to have four times as many patients in the experimental group versus the control group without having to increase the total sample size.

[Placeholder: Graph of Power as a function of allocation ratio with fixed total sample size n=168 and significance level alpha=0.05. The curve peaks near ratio 1.0 (power ~0.9) and slowly descends as the ratio increases towards 5.]

(a) (2 marks) What is the power if there are four times as many patients in the experimental group?

Allocation ratio: $r = 4$

From the graph, when $r=4$, the power is slightly below 75%, likely around 73-74%.

(b) (3 marks) What should the statistician recommend to the team in order for the study to have at least 80% power?

Since the power with a 4:1 allocation is approximately $73\%-74\%$ which is below the required $80\%$, the statistician should recommend an allocation ratio between $r=0.3$ and $r=3$ in order to achieve at least $80\%$ power while keeping the total sample size fixed at 168.
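A curve like the one in the graph can be approximated with a two-sided two-sample Z-test. This is a sketch under that assumption; the method used to draw the original graph is not stated, so small differences from the plotted values are possible. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()


def power_by_ratio(r, n_total=168, theta=1.0, sigma=2.0, alpha=0.05):
    """Normal-approximation power when n_experimental = r * n_control."""
    n_control = n_total / (1 + r)
    n_experimental = n_total - n_control
    se = sigma * sqrt(1 / n_experimental + 1 / n_control)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = theta / se
    # Probability of rejecting in either tail of the two-sided test.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


for r in (1, 2, 3, 4, 5):
    print(r, round(power_by_ratio(r), 3))
```

Under these assumptions the power is about 0.90 at $r=1$, about 0.74 at $r=4$, and just above 0.80 at $r=3$, consistent with the values read from the graph.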


Question 5 (5 marks)

A study is being designed to compare how iOS and Android smartphone users view the use of biometric authentication (e.g., face or fingerprint recognition) as “secure.” The investigators estimate that it is feasible to enroll 200 participants per group. Emma, a statistician among the investigators, simulated 30 hypothetical studies using R, testing $H_0: p_1 - p_2 = 0$ versus $H_1: p_1 - p_2 = 0.14$. The simulation assumed that $p_1 = 0.79$ is the proportion of iOS users who consider biometric authentication secure, and $p_2 = 0.65$ is the proportion of Android users who consider it secure.

Table 6. Simulated two-sample $z$-statistics (30 studies)

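| Study Number | z-statistic | Study Number | z-statistic | Study Number | z-statistic | Study Number | z-statistic | Study Number | z-statistic |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.238 | 2 | 1.293 | 3 | 1.587 | 4 | 1.600 | 5 | 2.065 |
| 6 | 2.112 | 7 | 2.137 | 8 | 2.155 | 9 | 2.164 | 10 | 2.212 |
| 11 | 2.665 | 12 | 2.753 | 13 | 3.005 | 14 | 3.029 | 15 | 3.339 |
| 16 | 3.792 | 17 | 3.889 | 18 | 4.319 | 19 | 4.426 | 20 | 5.015 |
| 21 | 5.685 | 22 | 5.890 | 23 | 6.429 | 24 | 6.609 | 25 | 6.635 |
| 26 | 6.800 | 27 | 6.836 | 28 | 8.148 | 29 | 8.950 | 30 | 10.795 |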

Estimate the power of the test at the 1% significance level. Show your work and interpret the estimated power in the context of this study. (Hint: Use the R output below to answer the question.)

[Remark: R Code here, showing quantiles for standard normal distribution using qnorm(c(0.90, 0.95, 0.975, 0.990, 0.995))]

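$H_0: p_1 - p_2 = 0$ vs $H_1: p_1 - p_2 = 0.14$

Test decision: reject $H_0$ if $|z| \ge z_{\alpha/2}$, where $z$ is the observed test statistic.

$z_{\alpha/2} = z_{0.01/2} = z_{0.005} = \text{qnorm}(1-0.005) = \text{qnorm}(0.995) = 2.5758$

$$Power = \frac{1}{n}\sum_{i=1}^{n} I(|z_i| \ge z_{\alpha/2}) = \frac{\#\{|z_i| \ge 2.5758\}}{30}$$

$\#\{|z_i| \ge 2.5758\} = 20$ for $i=1,2,\dots,30$ (studies 11 through 30).

Thus, $Power = \frac{20}{30} = 66.67\%$.

Interpretation: if the true proportions are $p_1 = 0.79$ and $p_2 = 0.65$, a study with 200 participants per group is estimated to reject $H_0$ at the 1% significance level about 67% of the time.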
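The count can be checked mechanically. The sketch below is a Python illustration (the original analysis used R); it takes the 30 z-statistics from Table 6 and the critical value from the standard normal quantile function.

```python
from statistics import NormalDist

# The 30 simulated z-statistics from Table 6, in study order.
z_stats = [
    1.238, 1.293, 1.587, 1.600, 2.065,
    2.112, 2.137, 2.155, 2.164, 2.212,
    2.665, 2.753, 3.005, 3.029, 3.339,
    3.792, 3.889, 4.319, 4.426, 5.015,
    5.685, 5.890, 6.429, 6.609, 6.635,
    6.800, 6.836, 8.148, 8.950, 10.795,
]

alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 2.5758

# Estimated power = share of simulated studies that reject H0.
rejections = sum(abs(z) >= z_crit for z in z_stats)
estimated_power = rejections / len(z_stats)
print(rejections, round(estimated_power, 4))
```

With only 30 simulated studies the estimate is coarse; a larger number of simulations would give a more stable value.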


Question 6 (15 marks)

The table below lists two potential outcomes (Y(0), Y(1)), where 0 denotes the placebo and 1 denotes the active treatment, for a set of 6 individuals. The table also includes the observed treatment assignment T and a pre-treatment covariate X for each individual. Assume we are interested in estimating the additive treatment effect.

Table 7. Potential outcomes and assignment for six units

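| Unit | T | X | Y(0) | Y(1) |
|---|---|---|---|---|
| Alice | 1 | 1 | 81 | 93 |
| Alex | 0 | 0 | 76 | 87 |
| Beatrice | 1 | 1 | 66 | 71 |
| Bob | 0 | 0 | 72 | 80 |
| Karl | 0 | 1 | 65 | 74 |
| Sarah | 1 | 0 | 75 | 82 |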

(a) (2 marks) Explain the meaning of Y(0) and Y(1).

$Y(0)$ is the outcome that would be observed if the unit received the control treatment (placebo).

$Y(1)$ is the outcome that would be observed if the same unit received the active treatment.

(b) (1 mark) What is the causal effect of receiving treatment for Beatrice?

$$Y_3(1) - Y_3(0) = 71 - 66 = 5$$

(c) (2 marks) The Stable Unit Treatment Value Assumption (SUTVA) plays a central role in the potential outcomes approach to causal inference. What does this assumption say?

No interference: The potential outcomes for unit $i$ do not depend on the treatment assigned to other units.

No hidden variation of treatments: There is only one form of the treatment, so that $Y(1)$ is well defined and always means the same thing.

(d) (4 marks) Compute the average treatment effect in this population of 6 individuals.

$$ \begin{aligned} ATE &= \frac{1}{n}\sum_{i=1}^{n}\Delta_i = \frac{1}{n}\sum_{i=1}^{n}\{Y_i(1) - Y_i(0)\} \\ &= \frac{1}{6}\{12 + 11 + 5 + 8 + 9 + 7\} = \frac{52}{6} = 8.667 \end{aligned} $$
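The arithmetic can be checked with a few lines of Python. This is only an illustration using the potential outcomes from Table 7; the dictionary layout is an assumption made for the example.

```python
# Potential outcomes (Y(0), Y(1)) for each unit, copied from Table 7.
potential_outcomes = {
    "Alice": (81, 93),
    "Alex": (76, 87),
    "Beatrice": (66, 71),
    "Bob": (72, 80),
    "Karl": (65, 74),
    "Sarah": (75, 82),
}

# Unit-level causal effect: Y(1) - Y(0).
effects = {unit: y1 - y0 for unit, (y0, y1) in potential_outcomes.items()}

# Average treatment effect over the six units.
ate = sum(effects.values()) / len(effects)
print(effects, round(ate, 3))
```

This calculation is possible only because the table lists both potential outcomes for every unit; in a real study only one of them is observed per unit.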

(e) (2 marks) Describe what it means for the treatment assignment mechanism to be unconfounded given X.

The treatment assignment is unconfounded given $X$ if $T$ is independent of $Y(0)$ and $Y(1)$ conditional on $X$.

(f) (4 marks) Based on the table above, do you believe that treatment assignment is unconfounded given X? Justify your answer.

We need to verify that

$$P(T_i=1 \mid Y_i(0), Y_i(1), X_i) = P(T_i=1 \mid X_i)$$

$X=0$: $P(T_i=1 \mid X_i=0) = \frac{1}{3}$

Table 8. Units with $X=0$

| Unit | Y(0) | Y(1) | T |
|---|---|---|---|
| Alex | 76 | 87 | 0 |
| Bob | 72 | 80 | 0 |
| Sarah | 75 | 82 | 1 |

Treatment occurs with probability $\frac{1}{3}$ regardless of the potential outcomes.

$X=1$: $P(T_i=1 \mid X_i=1) = \frac{2}{3}$

Table 9. Units with $X=1$

| Unit | Y(0) | Y(1) | T |
|---|---|---|---|
| Alice | 81 | 93 | 1 |
| Beatrice | 66 | 71 | 1 |
| Karl | 65 | 74 | 0 |

Units with different potential outcomes still receive treatment with probability $\frac{2}{3}$.

Therefore, the treatment assignment is unconfounded given $X$.
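The stratum proportions used in this argument can be tabulated as follows. The sketch only computes the share of treated units within each level of $X$ from Table 7; it does not by itself establish unconfoundedness, and the names are illustrative.

```python
from fractions import Fraction

# (unit, T, X) triples copied from Table 7.
assignments = [
    ("Alice", 1, 1),
    ("Alex", 0, 0),
    ("Beatrice", 1, 1),
    ("Bob", 0, 0),
    ("Karl", 0, 1),
    ("Sarah", 1, 0),
]


def treated_fraction(x):
    """Share of units with X == x that received the active treatment."""
    stratum = [t for _, t, unit_x in assignments if unit_x == x]
    return Fraction(sum(stratum), len(stratum))


print(treated_fraction(0), treated_fraction(1))
```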


Midterm — Makeup

Question 2 (20 marks)

Two drugs A and B are to be tested on four subjects’ eyes. The drugs will be randomly assigned to each subject’s eyes based on the flip of a fair coin. If the coin toss is heads, then a subject will receive drug A in the left eye and drug B in the right eye; if the coin toss is tails, then the subject will receive drug A in the right eye and drug B in the left eye. The outcome of interest is intraocular pressure (IOP), measured in millimeters of mercury (mmHg). The table below shows the change in IOP for each eye after one week of treatment. The experimenters are interested in determining whether the changes in intraocular pressure under drugs A and B differ.

Table 10. Change in IOP (mmHg) by subject and eye

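| Subject | Drug A | Side A | Drug B | Side B |
|---|---|---|---|---|
| 1 | -4.1 | L | -2.7 | R |
| 2 | -3.2 | R | -3.5 | L |
| 3 | -2.8 | L | -1.9 | R |
| 4 | -3.6 | L | -2.4 | R |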

(a) (3 marks) State the null and alternative hypotheses for this study. Specify the statistical parameters in terms of the experimental context.

Let

$\mu_A$: mean change in IOP for drug A;

$\mu_B$: mean change in IOP for drug B;

$D_i$ = (IOP change from drug A) - (IOP change from drug B) for subject $i$.

$H_0: \mu_A - \mu_B = 0$ vs $H_1: \mu_A - \mu_B \neq 0$ or $H_0: \mu_D = 0$ vs $H_1: \mu_D \neq 0$


[Remark: R Code here, Analysis #1 and #2 calculating the randomization distribution of the mean differences and observed difference.]


(b) (2 marks) What type of design is used in this study? Explain your answer.

Type of design: Paired randomized design.

Explanation:

  • Each subject has two eyes (paired experimental units).

  • A fair coin flip determines which eye gets drug A and which eye gets drug B.

(c) (5 marks) Describe how the randomization distribution for the difference in means is calculated. What has been assumed in calculating this distribution? Write down all the values of this randomization distribution. (HINT: Look at the analyses done using R)

The randomization distribution is calculated by assuming that $H_0$ is true.

The randomization distribution assumes that, if $H_0$ is true, the two results obtained from each particular subject are exchangeable.

There are $2^4 = 16$ possible treatment assignments. The values of the randomization distribution are the 16 mean differences that could have arisen if $H_0$ is true. The values are obtained from Analysis #2: $0.80, \dots, -0.80$.

(d) (5 marks) Calculate the appropriate randomization p-value. Is there a significant difference in effectiveness between drugs A and B at the 3% significance level? Explain your reasoning. (Hint: Look at the analyses done using R.)

$$P\text{-value} = \frac{1}{16} \sum_{k=1}^{16} I(|\delta_k| \ge |\delta_{obs}|)$$

$$\delta_{obs} = \frac{-1.4 + 0.3 - 0.9 - 1.2}{4} = -0.8$$

$$\#\{|\delta_k| \ge |\delta_{obs}|\} = \#\{0.80, 0.95, -0.95, -0.80\} = 4$$

$$P\text{-value} = \frac{4}{16} = 0.25 = 25\%$$

p-value > 3% $\Rightarrow$ Fail to reject $H_0$ at $\alpha=3\%$.

Therefore, there is not enough evidence to conclude that the drugs differ at $\alpha=3\%$.
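For the paired design, the 16 values arise from flipping the sign of each within-subject difference. The following Python sketch (an illustration, not the R analysis referred to above) enumerates them from Table 10.

```python
from itertools import product

# Change in IOP (mmHg) from Table 10: one (drug A, drug B) pair per subject.
pairs = [(-4.1, -2.7), (-3.2, -3.5), (-2.8, -1.9), (-3.6, -2.4)]
diffs = [a - b for a, b in pairs]  # A minus B within each subject

delta_obs = sum(diffs) / len(diffs)

# Under H0 each difference keeps its magnitude but its sign is equally
# likely to be + or -, giving 2^4 = 16 equally likely mean differences.
deltas = [
    sum(sign * d for sign, d in zip(signs, diffs)) / len(diffs)
    for signs in product((1, -1), repeat=len(diffs))
]

# Two-sided p-value; the small tolerance guards against floating-point ties.
extreme = sum(abs(d) >= abs(delta_obs) - 1e-9 for d in deltas)
p_value = extreme / len(deltas)
print(round(delta_obs, 2), extreme, p_value)
```

The sketch gives an observed mean difference of -0.8 with 4 of the 16 sign patterns at least as extreme, matching the p-value of 0.25.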

(e) (5 marks) A confused statistician calculated two t-tests: one is appropriate, and the other is inappropriate (see the R output below). If you use the results from the appropriate t-test, do you reach the same conclusion as you did using the randomization test (assume the 3% significance level)? What has to be assumed in order for the t-test to be valid?

The appropriate t-test is t-test #1.

$P\text{-value} = 0.1265 = 12.65\%$.

P-value > 3% $\Rightarrow$ Fail to reject $H_0$ at $\alpha=3\%$.

Same conclusion as the randomization test.

Assumption for the paired t-test:

For each subject i,

$$D_i \sim \text{ind } N(\mu_D, \sigma_D^2) \text{ where } \mu_D = \mu_A - \mu_B$$

[Remark: R Code here, showing outputs for Paired t-test (t-test #1) and Welch Two Sample t-test (t-test #2)]


Question 3 (10 marks)

(a) (3 marks) A team of scientists believes they have found a drug that improves memory in older adults. The drug will be tested on pairs of twins. Within each twin pair, treatment assignment will be determined by flipping a fair coin. If the coin lands heads, the first twin receives the drug and the second twin receives a placebo; if the coin lands tails, the first twin receives the placebo and the second twin receives the drug. Memory performance is assessed by asking participants a standardized series of questions about their childhood, and a total memory score is recorded for each individual. The outcome of interest is the difference in memory scores between treated and untreated twins within each pair. The smallest clinically meaningful difference in memory score is 2.5 points. The standard deviation of memory scores under each condition is 3 points, and the standard deviation of the within-pair differences is 2 points. Circle the correct R output below that will allow you to determine how many twin pairs the researchers should enroll so that the study has 85% power to detect this difference at the 5% significance level.

[Remark: R Code here, showing Power calculation #1 using sd=3 and Power calculation #2 using sd=2]

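The correct answer is Power calculation #2: the outcome is the within-pair difference, so the relevant standard deviation is that of the differences (2 points), not that of the individual scores (3 points).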

(b) (4 marks) A clinical trial where an experimental drug is to be compared with the standard treatment for a continuous biomarker measurement is being planned. The biomarker measurements in each group are independent and normally distributed with different means $\mu_1$, $\mu_2$, and common variance $\sigma^2$. The power function of the test $H_0: \theta=0$ versus $H_1: \theta>0$, where $\theta = \mu_1 - \mu_2$ at level $\alpha$ is:

$$1-\beta=1-\Phi\left(z_{\alpha}-\frac{\theta}{\sigma\sqrt{1/n_1+1/n_2}}\right), \theta>0,$$

where, $\Phi(\cdot)$ is the standard normal CDF, and $z_{\alpha}$ is the $100(1-\alpha)^{th}$ percentile of the standard normal distribution. Show that if the numbers of patients in the two groups are the same with $n_1=n_2=\frac{n}{2}$ then the total sample size required so that the trial has power $1-\beta$ is:

$$n=\frac{4\sigma^2(z_{\alpha}+z_{\beta})^2}{\theta^2}.$$

With $n_1 = n_2 = \frac{n}{2}$,

$$\frac{1}{n_1} + \frac{1}{n_2} = \frac{2}{n} + \frac{2}{n} = \frac{4}{n}.$$

Thus,

$$\beta = \Phi\left(z_{\alpha} - \frac{\sqrt{n}\theta}{2\sigma}\right).$$

Since $\Phi(-z_{\beta}) = \beta$,

$$-z_{\beta} = z_{\alpha} - \frac{\sqrt{n}\theta}{2\sigma} \quad\Rightarrow\quad \frac{\sqrt{n}\theta}{2\sigma} = z_{\alpha} + z_{\beta} \quad\Rightarrow\quad n = \frac{4\sigma^2(z_{\alpha} + z_{\beta})^2}{\theta^2}.$$

(c) (3 marks) Consider again the situation in part (b). The smallest clinically meaningful treatment difference is $\theta=1$ and the variance is $\sigma^2=9$. If we conduct a one-sided Z-test at a significance level of $\alpha=0.05$, what is the power if the researchers enroll 80 subjects in each arm? The clinical trial statistician calculated the power using R (the value for power is below each R code statement). Which calculation is the correct power for the scenario in this question? Circle the correct answer.

[Remark: R Code here, multiple calculations for power using pnorm and dnorm to evaluate standard normal CDF/PDF]

The correct answer is 1 - pnorm(qnorm(1 - 0.05) - 1 / (3 * sqrt(1/80 + 1/80)))
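For readers without R at hand, the same expression can be evaluated in Python. This sketch uses `statistics.NormalDist` in place of `pnorm` and `qnorm`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

alpha, theta, sigma, n_per_arm = 0.05, 1.0, 3.0, 80

# Power formula from part (b) with n1 = n2 = 80 and sigma = sqrt(9) = 3.
z_alpha = std_normal.inv_cdf(1 - alpha)
se_factor = sqrt(1 / n_per_arm + 1 / n_per_arm)
power = 1 - std_normal.cdf(z_alpha - theta / (sigma * se_factor))
print(round(power, 3))
```

The result is a power of roughly 0.68 for this design.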


Question 4 (5 marks)

A clinical trial to test an experimental drug for prostate cancer is being designed by a group of university researchers. The response rate of the standard chemotherapy for prostate cancer is 25%, and the researchers expect that the experimental drug would increase the response rate of the standard treatment to 36%. The researchers would like to have an allocation ratio between the standard and experimental arms of 3:1. The graph below displays the total sample size required for 80% power at the 0.05 significance level.

[Placeholder: Line graph showing ’total sample size’ on the y-axis (from 450 to 750) versus ‘allocation ratio’ on the x-axis (from 1 to 5). A point is marked at an allocation ratio of 3, corresponding to a total sample size of approximately 575.]

If the type I error rate is $\alpha=0.05$ then approximately how many patients should the researchers recruit into the experimental arm so that their study has 80% power?

Let $n_S$ and $n_E$ be the numbers of subjects in the standard and experimental groups, respectively. Then $n_S = 3n_E$, and from the plot $n_S + n_E \approx 575$.

So,

$$3n_E + n_E \approx 575$$

$$\Rightarrow 4n_E \approx 575 \Rightarrow n_E \approx \frac{575}{4} = 143.75$$

Rounding up to a whole number of patients,

$$n_E \approx 144$$