Inference for a Single Mean

Loading Packages

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(mosaic)

Registered S3 method overwritten by 'mosaic':
  method                           from   
  fortify.SpatialPolygonsDataFrame ggplot2

The 'mosaic' package masks several functions from core packages in order to add 
additional features.  The original behavior of these functions should not be affected by this.

Attaching package: 'mosaic'

The following object is masked from 'package:Matrix':

    mean

The following objects are masked from 'package:dplyr':

    count, do, tally

The following object is masked from 'package:purrr':

    cross

The following object is masked from 'package:ggplot2':

    stat

The following objects are masked from 'package:stats':

    binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
    quantile, sd, t.test, var

The following objects are masked from 'package:base':

    max, mean, min, prod, range, sample, sum

library(ggformula)
library(infer)


Attaching package: 'infer'

The following objects are masked from 'package:mosaic':

    prop_test, t_test

library(broom) # Clean test results in tibble form
library(resampledata) # Datasets from Chihara and Hesterberg's book


Attaching package: 'resampledata'

The following object is masked from 'package:datasets':

    Titanic

library(openintro) # More datasets

Loading required package: airports
Loading required package: cherryblossom
Loading required package: usdata

Attaching package: 'openintro'

The following object is masked from 'package:mosaic':

    dotPlot

The following objects are masked from 'package:lattice':

    ethanol, lsegments

Dataset: Toy Data - Reading

Notes:

set.seed(): Makes the random number generation reproducible. With a fixed seed value (here, 40), the code produces the same random numbers every time it is run, which is essential for replicable research and analysis.
rnorm(n, mean, sd): Generates random numbers from a normal distribution.
In this case study the call is rnorm(50, mean = 2, sd = 2):

n = 50: 50 random numbers are generated.

mean = 2: the mean of the distribution is 2.

sd = 2: the standard deviation of the distribution is 2.

The result is a vector y of 50 random values drawn from a normal distribution with the specified mean and standard deviation. This data can be used for statistical analyses such as t-tests.
mydata <- tibble(y = y): Converts the vector y into a tibble (a modern version of a data frame, provided by the tibble package).

set.seed(40)  # for replication
# Data as an individual vector (for t-tests etc.)
y <- rnorm(50, mean = 2, sd = 2)

# And as a tibble too
mydata <- tibble(y = y)
mydata

# A tibble: 50 × 1
        y
    <dbl>
 1  2.96 
 2  2.99 
 3  0.281
 4  0.342
 5  1.36 
 6 -0.608
 7 -0.843
 8  5.49 
 9  1.42 
10 -0.618
# ℹ 40 more rows
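
The effect of set.seed() described in the notes can be checked directly. The following is a minimal sketch in base R; the variable names (first_draw, second_draw, third_draw) are illustrative and not part of the original analysis.

```r
# Same seed, same draws: reseeding restarts the random number stream.
set.seed(40)
first_draw <- rnorm(5, mean = 2, sd = 2)

set.seed(40)
second_draw <- rnorm(5, mean = 2, sd = 2)

same_seed_matches <- identical(first_draw, second_draw)   # TRUE

# Without reseeding, the stream simply continues, so the next draw differs.
third_draw <- rnorm(5, mean = 2, sd = 2)
continues_stream <- !identical(second_draw, third_draw)   # TRUE
```

This is why the seed is set once, immediately before the data are generated.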

Inspecting and Charting Data

Notes:

dist = "dnorm": This argument names the density function to fit; "dnorm" asks for a normal distribution.

Other density functions can be named in the same way, such as "dexp" for the Exponential distribution and "dpois" for the Poisson distribution.
gf_fitdistr: Fits the distribution named in dist to the data and overlays the fitted curve on the density plot, allowing a direct visual comparison.

mydata %>%
    gf_density(~y) %>%
    gf_fitdistr(dist = "dnorm") %>%
    gf_labs(title = "Densities of Original Data Variables", subtitle = "Compared with Normal Density")
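
To give a rough idea of what "fitting a normal distribution" involves, here is a sketch in base R. It assumes the fit is done by maximum likelihood, which is how the ggformula documentation describes gf_fitdistr (via MASS::fitdistr); the names sim_y, mu_hat, sigma_hat, grid and fitted_curve are hypothetical.

```r
set.seed(40)
sim_y <- rnorm(50, mean = 2, sd = 2)   # stand-in for the y column

# Maximum likelihood estimates for a normal distribution
mu_hat    <- mean(sim_y)
sigma_hat <- sqrt(mean((sim_y - mu_hat)^2))   # divides by n, not n - 1

# Height of the fitted normal curve over a grid of values;
# this is the kind of curve that gets overlaid on the density plot
grid         <- seq(min(sim_y), max(sim_y), length.out = 100)
fitted_curve <- dnorm(grid, mean = mu_hat, sd = sigma_hat)

# The ML estimate is slightly smaller than the usual sample sd
sigma_hat < sd(sim_y)
```

With 50 observations the difference between the two standard deviation estimates is small, so the overlaid curve looks essentially the same either way.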

t1 <- mosaic::t_test(
          y, # Name of variable
          mu = 0, # belief of population mean
          alternative = "two.sided") %>% # Check both sides
  broom::tidy() # Make results presentable, and plottable!!
t1

# A tibble: 1 × 8
  estimate statistic     p.value parameter conf.low conf.high method alternative
     <dbl>     <dbl>       <dbl>     <dbl>    <dbl>     <dbl> <chr>  <chr>      
1     2.05      6.79     1.43e-8        49     1.44      2.65 One S… two.sided
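
The statistic, parameter and p.value columns above follow from the one-sample t formula, t = (mean - mu) / (sd / sqrt(n)) with n - 1 degrees of freedom. The sketch below recomputes them by hand and compares against stats::t.test; the names sim_y, mu0, t_stat and p_val are illustrative, and the data are regenerated with the same call as in the toy example.

```r
set.seed(40)
sim_y <- rnorm(50, mean = 2, sd = 2)
mu0   <- 0                          # hypothesized population mean

n      <- length(sim_y)
se     <- sd(sim_y) / sqrt(n)       # standard error of the mean
t_stat <- (mean(sim_y) - mu0) / se  # test statistic
df     <- n - 1                     # the "parameter" column
p_val  <- 2 * pt(-abs(t_stat), df = df)   # two-sided p-value

# Compare with the built-in test
ref <- stats::t.test(sim_y, mu = mu0, alternative = "two.sided")
all.equal(t_stat, unname(ref$statistic))
all.equal(p_val, ref$p.value)
```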

t2 <- wilcox.test(y, # variable name
                  mu = 0, # belief
                  alternative = "two.sided",
                  conf.int = TRUE,
                  conf.level = 0.95) %>% 
  broom::tidy()
t2

# A tibble: 1 × 7
  estimate statistic    p.value conf.low conf.high method            alternative
     <dbl>     <dbl>      <dbl>    <dbl>     <dbl> <chr>             <chr>      
1     2.05      1144 0.00000104     1.38      2.72 Wilcoxon signed … two.sided
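
The statistic reported by the Wilcoxon signed rank test is the sum of the ranks of the absolute differences from mu, taken over the observations that lie above mu. A minimal base R sketch of that calculation follows; the names sim_y, mu0, v_stat and max_v are illustrative, and it assumes there are no ties or zero differences, as is the case for continuous simulated data.

```r
set.seed(40)
sim_y <- rnorm(50, mean = 2, sd = 2)
mu0   <- 0                       # hypothesized location

d      <- sim_y - mu0            # signed differences from mu0
ranks  <- rank(abs(d))           # rank the absolute differences
v_stat <- sum(ranks[d > 0])      # add up ranks of the positive differences

# Compare with the built-in test
ref <- stats::wilcox.test(sim_y, mu = mu0, alternative = "two.sided")
v_stat == unname(ref$statistic)

# Largest possible value, reached when every observation exceeds mu0
max_v <- length(d) * (length(d) + 1) / 2
```

A statistic close to max_v means nearly all of the large differences are positive, which is why the p-value is so small here.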

Exam Data

data("exam_grades")
exam_grades

# A tibble: 233 × 6
   semester sex   exam1 exam2 exam3 course_grade
   <chr>    <chr> <dbl> <dbl> <dbl>        <dbl>
 1 2000-1   Man    84.5  69.5  86.5         76.3
 2 2000-1   Man    80    74    67           75.4
 3 2000-1   Man    56    70    71.5         67.1
 4 2000-1   Man    64    61    67.5         63.5
 5 2000-1   Man    90.5  72.5  75           72.4
 6 2000-1   Man    74    78.5  84.5         71.4
 7 2000-1   Man    60.5  44    58           56.1
 8 2000-1   Man    89    82    88           78.0
 9 2000-1   Woman  87.5  86.5  95           82.9
10 2000-1   Man    91    98    88           89.1
# ℹ 223 more rows

exam_grades %>%
    gf_density(~course_grade) %>%
    gf_fitdistr(dist = "dnorm") %>%
    gf_labs(title = "Density of Course Grade", subtitle = "Compared with Normal Density")

stats::shapiro.test(x = exam_grades$course_grade) %>%
    broom::tidy()

# A tibble: 1 × 3
  statistic p.value method                     
      <dbl>   <dbl> <chr>                      
1     0.994   0.471 Shapiro-Wilk normality test
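
The null hypothesis of the Shapiro-Wilk test is that the data come from a normal distribution, so a large p-value such as the one above gives no evidence against normality. The sketch below contrasts a normal sample with a strongly skewed one; both samples are simulated and hypothetical, not taken from exam_grades.

```r
set.seed(40)
normal_sample <- rnorm(200, mean = 70, sd = 10)   # bell-shaped
skewed_sample <- rexp(200, rate = 1 / 10)         # strongly right-skewed

p_normal <- stats::shapiro.test(normal_sample)$p.value
p_skewed <- stats::shapiro.test(skewed_sample)$p.value

# The skewed sample should be rejected as non-normal
p_skewed < 0.05
```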

library(nortest)
# Especially useful with more than 5000 observations, which shapiro.test() does not accept
nortest::ad.test(x = exam_grades$course_grade) %>%
    broom::tidy()

# A tibble: 1 × 3
  statistic p.value method                         
      <dbl>   <dbl> <chr>                          
1     0.331   0.512 Anderson-Darling normality test
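
One practical reason to keep a second normality test at hand is that stats::shapiro.test only accepts between 3 and 5000 observations. The sketch below shows the error on a simulated sample of 5001 values; big_sample and shapiro_result are illustrative names.

```r
set.seed(40)
big_sample <- rnorm(5001)   # one observation over the limit

# Capture the error message instead of stopping the script
shapiro_result <- tryCatch(
  stats::shapiro.test(big_sample),
  error = function(e) conditionMessage(e)
)

is.character(shapiro_result)   # TRUE: an error message, not a test result
```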

t4 <- mosaic::t_test(
          exam_grades$course_grade, # Name of variable
          mu = 80, # belief
          alternative = "two.sided") %>% # Check both sides
  broom::tidy()
t4

# A tibble: 1 × 8
  estimate statistic  p.value parameter conf.low conf.high method    alternative
     <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl> <chr>     <chr>      
1     72.2     -12.1 2.19e-26       232     71.0      73.5 One Samp… two.sided
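
The conf.low and conf.high columns and the p-value tell the same story: a two-sided test at the 5% level rejects mu exactly when mu lies outside the 95% confidence interval. The sketch below builds the interval by hand and checks this. Because exam_grades needs the openintro package, it uses a simulated stand-in (sim_grades, a hypothetical vector of 233 values), so its numbers will not match the table above.

```r
set.seed(40)
sim_grades <- rnorm(233, mean = 72, sd = 10)   # hypothetical stand-in for course_grade
mu0        <- 80                               # hypothesized mean

n      <- length(sim_grades)
se     <- sd(sim_grades) / sqrt(n)
t_crit <- qt(0.975, df = n - 1)                # critical value for a 95% interval
ci     <- mean(sim_grades) + c(-1, 1) * t_crit * se

# Compare with the built-in test
ref <- stats::t.test(sim_grades, mu = mu0, conf.level = 0.95)
all.equal(ci, as.numeric(ref$conf.int))

# mu0 falls outside the interval exactly when the test rejects at 5%
outside_ci <- mu0 < ci[1] || mu0 > ci[2]
rejects    <- ref$p.value < 0.05
outside_ci == rejects
```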

t5 <- wilcox.test(
          exam_grades$course_grade, # Name of variable
          mu = 90, # belief (the t-test above used 80)
          alternative = "two.sided", # Check both sides
          conf.int = TRUE,
          conf.level = 0.95) %>%
  broom::tidy() # Make results presentable, and plottable!!
t5

# A tibble: 1 × 7
  estimate statistic  p.value conf.low conf.high method              alternative
     <dbl>     <dbl>    <dbl>    <dbl>     <dbl> <chr>               <chr>      
1     72.4        75 1.49e-39     71.2      73.7 Wilcoxon signed ra… two.sided