Introduction:
In this study we examine the relationship between students’ understanding of size and scale and their achievement in science and mathematics, by giving one group of students extra teaching on size and scale.
Our objective is to find out whether extra teaching on size and scale improves students’ understanding of these topics and increases their interest in STEM education. The main evidence comes from the results of tests designed by the researchers, assessed with two metrics: the year-over-year academic performance of each group in the experiment, and the year-over-year difference in progress between the control and experiment groups.
Background:
Students’ ambitions for a successful career generally start in school. To build a career in science they need to choose science subjects, and whether they do so depends on how much interest in science they have developed. We therefore chose to start with students’ perception of size and scale, and to study how teaching these topics influences their understanding of size and scale and their achievement in science and mathematics.
Design:
The objective here is to find out whether extra teaching on size and scale influences students’ understanding of those topics. The experiment uses a randomized block design, which was chosen to control for variability among the selected students.
Students in the experimental group receive an extra 150 hours of teaching about size and scale, but they are not given the exact questions and answers that appear in their exams. Students in the control group do not receive any extra teaching.
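A minimal sketch of random assignment within blocks, assuming (hypothetically) that each student carries a block label such as a classroom. The report does not state the blocking variable, and the names assign_within_blocks, roster and groups are illustrative only.

```python
import random
from collections import defaultdict

def assign_within_blocks(students, seed=0):
    """Randomly split each block into experiment and control halves.

    `students` is a list of (student_id, block) pairs; the block label
    is a hypothetical blocking variable, not one named in the report.
    """
    rng = random.Random(seed)
    by_block = defaultdict(list)
    for student_id, block in students:
        by_block[block].append(student_id)

    assignment = {}
    for ids in by_block.values():
        rng.shuffle(ids)              # random order within the block
        half = len(ids) // 2
        for sid in ids[:half]:
            assignment[sid] = "experiment"
        for sid in ids[half:]:
            assignment[sid] = "control"
    return assignment

# Hypothetical roster: 8 students in two blocks of 4.
roster = [(i, "A" if i < 4 else "B") for i in range(8)]
groups = assign_within_blocks(roster)
```

Because the split happens inside each block, both groups contain the same number of students from every block, so block-to-block differences cannot be confused with the treatment effect.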
Data Collection:
Data were collected from 150 students who entered Grade 6 of middle school in 2013. Students were randomly assigned to the two groups: 90 students to the experiment group and 64 students to the control group.
The data consist of the scores obtained by students in both groups. Exams were conducted for 3 years, at the beginning and end of each academic year.
Two different types of exams were held. The first consisted of 26 abstract questions about the actual size of an object, and the second consisted of 31 picture cards that had to be sorted in relative order.
Two different grading systems were used to grade the answers. In the first, full credit was given to each answer that was correct or close to correct. In the second, full credit was given to the right answer and half credit to an answer that was close to the right answer. Data were labeled with student id, year, grading system, and the timing of the exam within the academic year (pre or post).
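The two grading systems can be sketched as follows. This is an illustration only: the rating labels 'correct', 'close' and 'wrong' and the function name score_answers are assumptions, not the researchers' actual rubric.

```python
def score_answers(ratings, half_credit_for_close):
    """Total score for a list of ratings: 'correct', 'close', or 'wrong'."""
    close_credit = 0.5 if half_credit_for_close else 1.0
    credit = {"correct": 1.0, "close": close_credit, "wrong": 0.0}
    return sum(credit[r] for r in ratings)

# Hypothetical ratings for one student's four answers.
ratings = ["correct", "close", "wrong", "close"]
lenient_total = score_answers(ratings, half_credit_for_close=False)  # first system
strict_total = score_answers(ratings, half_credit_for_close=True)    # second system
```

The gap between the two totals is half the number of "close" answers, so comparing the two scores shows how often a student was near the right answer without reaching it.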
One problem in the data collection that the researchers did not address is missing values. Many values are missing for control-group students in years 2 and 3, and the researchers did not state any reason for the missingness. Without knowing why the values are missing, we cannot justify a particular imputation method, so records with missing values are dropped during data cleaning.
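Dropping incomplete records (listwise deletion) can be sketched as below. The field names and the records are hypothetical, and None stands in for a missing score.

```python
def drop_incomplete(rows, required):
    """Keep only rows whose required fields are all present (not None)."""
    return [r for r in rows if all(r.get(k) is not None for k in required)]

# Hypothetical records; None marks a missing score.
records = [
    {"id": 1, "group": "E", "pre": 20.0, "post": 24.0},
    {"id": 2, "group": "C", "pre": 18.0, "post": None},
    {"id": 3, "group": "C", "pre": None, "post": 21.0},
    {"id": 4, "group": "E", "pre": 25.0, "post": 27.0},
]
complete = drop_incomplete(records, required=("pre", "post"))
```

Listwise deletion is simple, but it can bias the results if the missing scores are related to group membership or performance, which is a concern here because most of the gaps are in the control group.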
Hypothesis:
The null hypothesis is that extra teaching on size and scale will neither increase students’ interest in STEM education nor improve their scores in the exams. The alternative hypothesis is that students in the experiment group show a clear improvement in score, both year by year and across the three years of the experiment.
The researchers are also interested in how students’ interest has increased year by year. The two grading systems help show how close students were to the right answer and how many were confident of the right answer.
Data Analysis:
We will use three different tools to perform the analysis.
Linear Regression:
Linear regression models the joint effect of several independent variables on a dependent variable in the form:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon$
with the following assumptions:
a) All observations are independent.
b) $E(\varepsilon_j) = 0$: the mean of the error is 0.
c) $\mathrm{Var}(\varepsilon_j) = \sigma^2$: the variance of the error is constant.
d) $\mathrm{Cov}(\varepsilon_j, \varepsilon_k) = 0$ for $j \neq k$: the error terms are uncorrelated.
Here the dependent variable is the post score, which students achieve at the end of the semester, and the independent variables are the pre score and the group the student belongs to.
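A minimal sketch of fitting such a model by ordinary least squares, using only the Python standard library. The data are hypothetical and noise-free, generated from post = 2 + 0.5 * pre + 1 * group, so the fit should recover those coefficients. The function ols is illustrative and is not the routine used to produce the results in this report.

```python
def ols(X, y):
    """Least-squares coefficients for y = X b via the normal equations.

    X is a list of rows, each already including a leading 1 for the
    intercept; solves (X'X) b = X'y by Gauss-Jordan elimination.
    """
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data: post = 2 + 0.5 * pre + 1 * group, with no noise.
pre = [10.0, 14.0, 18.0, 22.0, 12.0, 20.0]
group = [0, 1, 0, 1, 1, 0]          # 1 = experiment, 0 = control
post = [2 + 0.5 * p + 1.0 * g for p, g in zip(pre, group)]
X = [[1.0, p, g] for p, g in zip(pre, group)]
b0, b_pre, b_group = ols(X, post)
```

In this setup the coefficient on group is the estimated difference in post score between experiment and control students who have the same pre score.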
ANOVA:
Compared to linear regression, ANOVA is better suited to a randomized block design. It shows how much of the variance in scores is associated with each group, and it lets us compare the means of the two groups.
The ANOVA model specifies that the mean for a given population, $\mu_l$, is a function of an overall mean $\mu$ and a population-specific effect $\tau_l$:
$\mu_l = \mu + \tau_l$
ANOVA is used to test the null hypothesis that the means of all populations are equal against the alternative hypothesis that at least one mean differs from the others.
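A one-way ANOVA F statistic can be computed as in the sketch below. The scores are hypothetical and the function name one_way_anova_f is illustrative. It divides the between-group mean square by the within-group mean square.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the overall mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for two groups.
control = [18.0, 20.0, 22.0]
experiment = [21.0, 23.0, 25.0]
f_stat, df_b, df_w = one_way_anova_f([control, experiment])
```

A large F means the group means differ by more than the within-group scatter would explain. The p-value comes from the F distribution with (df_between, df_within) degrees of freedom.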

A crossed-factor ANOVA is used to determine whether gender or ethnicity plays a role in the score:
$y_{ijk} = m + a_i + b_{j(i)} + \epsilon_{ijk}$
Paired T Test:
A paired-comparison study is a research design for investigating the effect of a single treatment. We use the paired t-test to check whether any difference exists between pre- and post-semester scores.
The data used here are the pre- and post-semester scores. The null hypothesis is that there is no difference between the pre and post scores.
Paired comparison tests are needed to assess the efficacy of a treatment or multiple treatments. The procedure is:
1. Calculate the difference $d_i = y_i - x_i$ between the two observations in each pair, keeping track of the sign of each difference.
2. Calculate the mean difference, $\bar{d}$.
3. Calculate the standard deviation of the differences, $s_d$, and use it to calculate the standard error of the mean difference, $SE(\bar{d}) = s_d / \sqrt{n}$.
4. Calculate the t-statistic, $T = \bar{d} / SE(\bar{d})$. Under the null hypothesis, this statistic follows a t-distribution with $n - 1$ degrees of freedom.
5. Use tables of the t-distribution to compare the value of $T$ with the $t_{n-1}$ distribution. This gives the p-value for the paired t-test.
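Steps 1 to 4 above can be sketched in a few lines of Python. The scores are hypothetical and the function name paired_t is illustrative. Step 5 needs the t-distribution, which the standard library does not provide; a statistics package such as SciPy (scipy.stats.ttest_rel) returns the p-value directly.

```python
import math
from statistics import mean, stdev

def paired_t(pre, post):
    """Return (t, degrees_of_freedom) for paired samples."""
    diffs = [y - x for x, y in zip(pre, post)]   # step 1: d_i = y_i - x_i
    d_bar = mean(diffs)                          # step 2: mean difference
    se = stdev(diffs) / math.sqrt(len(diffs))    # step 3: s_d / sqrt(n)
    return d_bar / se, len(diffs) - 1            # step 4: T and n - 1

# Hypothetical pre and post scores for four students.
pre = [10.0, 12.0, 9.0, 11.0]
post = [12.0, 14.0, 10.0, 13.0]
t_stat, df = paired_t(pre, post)
```

Here stdev is the sample standard deviation (divisor n - 1), which is the one the paired t-test requires.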
The assumptions of the paired t-test are:
1. The data are continuous (not discrete).
2. The data, i.e., the differences for the matched-pairs, follow a normal probability distribution.
3. The sample of pairs is a simple random sample from its population. Each individual in the population has an equal probability of being selected in the sample.
Summary and discussion:
The p-value from each of these techniques can be used to check the significance of each independent variable’s effect on the final score.
In linear regression
===========================================================================================================================
                      Dependent variable: yr1_post_std1
                      -----------------------------------------------------------------------------------------------------
                      (1)                       (2)                       (3)                       (4)
---------------------------------------------------------------------------------------------------------------------------
yr1_pre_std1          0.532***                  0.532***                  0.627***                  0.623***
                      (0.075)                   (0.075)                   (0.069)                   (0.068)
Group                 1.397                     0.958                     1.069                     0.658
                      (1.368)                   (1.335)                   (1.406)                   (1.364)
EthniticyAA           2.899                     2.137
                      (2.123)                   (2.045)
EthniticyAF           -6.090                    -7.328
                      (5.814)                   (5.747)
EthniticyAS           -1.326                    -1.913
                      (5.753)                   (5.719)
EthniticyH            -1.284                    -1.870
                      (2.356)                   (2.294)
EthniticyO            6.505                     5.917
                      (5.755)                   (5.722)
EthniticyW            5.157**                   4.480**
                      (1.984)                   (1.883)
GenderF               -3.444                                              -0.585
                      (4.847)                                             (4.792)
GenderFF              -1.089                                              3.770
                      (9.251)                                             (9.349)
GenderM               -5.183                                              -2.127
                      (4.880)                                             (4.779)
Constant              11.160**                  7.657***                  9.192*                    8.112***
                      (4.846)                   (2.105)                   (4.944)                   (1.791)
---------------------------------------------------------------------------------------------------------------------------
Observations          149                       149                       149                       149
R2                    0.437                     0.426                     0.372                     0.364
Adjusted R2           0.391                     0.393                     0.350                     0.356
Residual Std. Error   7.825 (df = 137)          7.812 (df = 140)          8.086 (df = 143)          8.050 (df = 146)
F Statistic           9.648*** (df = 11; 137)   12.989*** (df = 8; 140)   16.934*** (df = 5; 143)   41.854*** (df = 2; 146)
===========================================================================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
This analysis uses the SCS dataset for year 1 and compares 4 different models. Only yr1_pre_std1, the pre-semester score, and EthniticyW have a significant effect on the post-semester score.
The adjusted R2 values (0.350 to 0.393) show that the models explain well under half of the variance in post-semester scores, so the linear model fits the data only modestly.
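As a consistency check, the adjusted R2 values can be recomputed from the reported R2, the 149 observations, and the number of predictors in each model (the numerator degrees of freedom of the F statistic). This is a sketch: because the reported R2 values are rounded to three decimals, the recomputed figures may differ from the table in the last digit.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (plus intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# (R2, number of predictors, reported adjusted R2) for models (1) to (4).
models = [(0.437, 11, 0.391), (0.426, 8, 0.393),
          (0.372, 5, 0.350), (0.364, 2, 0.356)]
recomputed = [round(adjusted_r2(r2, 149, p), 3) for r2, p, _ in models]
max_gap = max(abs(a - rep) for a, (_, _, rep) in zip(recomputed, models))
```

The adjustment penalizes extra predictors, which is why model (1), with the highest R2, does not have the highest adjusted R2.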
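With ANOVA (each table below gives summary statistics of the columns of the ANOVA table for one of the four regression models, in the same order):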
===================================================================
Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
——————————————————————-
Sum Sq 5 2,524.234 3,491.530 63.855 156.016 3,051.787 8,387.801
Df 5 29.600 60.073 1 1 6 137
F value 4 13.589 24.184 0.849 0.995 14.425 49.846
Pr(> F) 4 0.199 0.229 0.000 0.015 0.349 0.469
——————————————————————-
===================================================================
Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
——————————————————————-
Sum Sq 4 3,144.558 3,821.342 31.406 695.811 4,450.252 8,543.818
Df 4 37.000 68.707 1 1 39.5 140
F value 3 17.861 28.338 0.515 1.510 26.534 50.563
Pr(> F) 3 0.166 0.267 0.000 0.012 0.250 0.474
——————————————————————-
===================================================================
Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
——————————————————————-
Sum Sq 4 3,736.445 4,518.546 37.793 93.137 6,422.547 9,349.512
Df 4 37.000 70.673 1 1 38 143
F value 3 28.152 47.768 0.569 0.573 41.944 83.310
Pr(> F) 3 0.362 0.327 0.000 0.224 0.542 0.636
——————————————————————-
====================================================================
Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
——————————————————————–
Sum Sq 3 4,962.785 4,739.013 15.085 2,713.629 7,436.635 9,461.097
Df 3 49.333 83.716 1 1 73.5 146
F value 2 41.876 58.892 0.233 21.054 62.697 83.519
Pr(> F) 2 0.315 0.446 0.000 0.158 0.473 0.630
——————————————————————–
None of the four models provides significant evidence of a relationship between group and post-semester score, which is consistent with the modest adjusted R2 of the linear models.
Crossed factor Anova
                                   Df Sum Sq Mean Sq F value   Pr(>F)
yr1_pre_std1                        1   5409    5409  84.175 3.76e-15 ***
Group                               1     15      15   0.235   0.6290
Gender                              3    112      37   0.579   0.6302
Ethniticy                           6    962     160   2.494   0.0268 *
yr1_pre_std1:std                    1      5       5   0.071   0.7903
yr1_pre_std1:Gender                 2     88      44   0.687   0.5054
std:Gender                          2     74      37   0.575   0.5644
yr1_pre_std1:Ethniticy              6    164      27   0.427   0.8598
std:Ethniticy                       3    108      36   0.558   0.6442
Gender:Ethniticy                    3     87      29   0.450   0.7181
yr1_pre_std1:std:Gender             1     42      42   0.650   0.4218
yr1_pre_std1:std:Ethniticy          3    385     128   1.998   0.1187
yr1_pre_std1:Gender:Ethniticy       3     80      27   0.413   0.7442
std:Gender:Ethniticy                3    477     159   2.473   0.0656 .
yr1_pre_std1:std:Gender:Ethniticy   3      3       1   0.015   0.9974
Residuals                         107   6876      64
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Again, the crossed-factor ANOVA supports the result from the linear regression: membership in the control or experiment group does not significantly affect the post-semester score.
We can also perform a paired t-test to check whether there is any significant difference between the pre- and post-semester scores.
Paired t-test
data: stem_ans_na_removed$yr1_pre_std1 and stem_ans_na_removed$yr1_post_std1
t = -1.4021, df = 148, p-value = 0.163
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.4417102 0.4148646
sample estimates:
mean of the differences
-1.013423
Based on the above output, we cannot reject the null hypothesis: there is no significant evidence that the paired population means differ.
Let’s now check whether the mean pre- and post-semester scores differ in the control group.
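The figures in the output above can be cross-checked with simple arithmetic. The sketch below uses only the reported numbers: the standard error follows from t = mean difference / SE, and a symmetric confidence interval should be centred on the mean difference.

```python
# Values copied from the paired t-test output for all students.
mean_diff = -1.013423
t_stat = -1.4021
ci_low, ci_high = -2.4417102, 0.4148646

se = mean_diff / t_stat                 # standard error implied by t = d_bar / SE
midpoint = (ci_low + ci_high) / 2       # a symmetric CI is centred on d_bar
t_crit = (ci_high - ci_low) / 2 / se    # implied critical value for df = 148
contains_zero = ci_low < 0 < ci_high    # zero inside the 95% CI means p > 0.05
```

The interval contains zero, which agrees with the reported p-value of 0.163 being above 0.05.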
Paired t-test
data: stem_ans_na_removed_C_group$yr1_pre_std1 and stem_ans_na_removed_C_group$yr1_post_std1
t = -0.1132, df = 56, p-value = 0.9103
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.624080 2.343379
sample estimates:
mean of the differences
-0.1403509
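We cannot reject the null hypothesis here either: the control group, which received no extra teaching, shows no significant change between pre- and post-semester scores. Together with the non-significant Group coefficient in the regression, this gives no evidence that the extra classes made a significant difference to the post-semester score.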
To add to these results, let’s draw the interaction plot.

The plot shows no trend that distinguishes the control group from the experiment group.
Concluding remarks:
Our findings suggest that there is no statistically significant difference in scores between students who took the extra classes and students who did not.
Further work is recommended to understand why there is no significant difference between pre- and post-semester scores. Students should have gained some knowledge of size and scale irrespective of the group they belong to.