Summaries

Author

Sara S

Loading packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(mosaic)
Registered S3 method overwritten by 'mosaic':
  method                           from   
  fortify.SpatialPolygonsDataFrame ggplot2

The 'mosaic' package masks several functions from core packages in order to add 
additional features.  The original behavior of these functions should not be affected by this.

Attaching package: 'mosaic'

The following object is masked from 'package:Matrix':

    mean

The following objects are masked from 'package:dplyr':

    count, do, tally

The following object is masked from 'package:purrr':

    cross

The following object is masked from 'package:ggplot2':

    stat

The following objects are masked from 'package:stats':

    binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
    quantile, sd, t.test, var

The following objects are masked from 'package:base':

    max, mean, min, prod, range, sample, sum
library(skimr)

Attaching package: 'skimr'

The following object is masked from 'package:mosaic':

    n_missing
library(knitr)
library(kableExtra)

Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows
library(ggformula)
library(babynames)

Dataset: Mpg

mpg
# A tibble: 234 × 11
   manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
   <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
 1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
 2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
 3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
 4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
 5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
 6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
 7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
 8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
 9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
# ℹ 224 more rows

Viewing the first 10 rows

Retrieves the first 10 rows of the dataset in their stored order; no filtering condition is applied.

mpg %>% 
  head(10)
# A tibble: 10 × 11
   manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
   <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
 1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
 2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
 3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
 4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
 5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
 6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
 7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
 8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
 9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
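Since "top 10" can mean either the first ten rows or the ten highest values, here is a minimal sketch contrasting the two. It assumes only dplyr and the mpg data from ggplot2; the object names first_ten and top_hwy are made up for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# First 10 rows in stored order (equivalent to head(10))
first_ten <- mpg %>% slice_head(n = 10)

# 10 rows with the highest highway mileage;
# with_ties = FALSE keeps exactly 10 rows even if values tie
top_hwy <- mpg %>% slice_max(hwy, n = 10, with_ties = FALSE)

first_ten
top_hwy
```

slice_head() depends on row order, while slice_max() depends on the values in a column, so the two usually return different rows.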

Using glimpse, inspect and skim to summarize data

Notes:

  • Glimpse offers a brief overview of the dataset structure, showing the variable names and types for better understanding.

  • The inspect function from the mosaic package summarizes each column: the number of levels and the most frequent values for categorical variables, and the quartiles, mean, and standard deviation for quantitative variables.

  • Skim delivers a comprehensive summary of the dataset, detailing counts, means, and distribution metrics for each variable.

glimpse(mpg)
Rows: 234
Columns: 11
$ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi", "…
$ model        <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro", "…
$ displ        <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0, 2.…
$ year         <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, 200…
$ cyl          <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, …
$ trans        <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "auto…
$ drv          <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4", "4…
$ cty          <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17, 1…
$ hwy          <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25, 2…
$ fl           <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p…
$ class        <chr> "compact", "compact", "compact", "compact", "compact", "c…
inspect(mpg)

categorical variables:  
          name     class levels   n missing
1 manufacturer character     15 234       0
2        model character     38 234       0
3        trans character     10 234       0
4          drv character      3 234       0
5           fl character      5 234       0
6        class character      7 234       0
                                   distribution
1 dodge (15.8%), toyota (14.5%) ...            
2 caravan 2wd (4.7%) ...                       
3 auto(l4) (35.5%), manual(m5) (24.8%) ...     
4 f (45.3%), 4 (44%), r (10.7%)                
5 r (71.8%), p (22.2%), e (3.4%) ...           
6 suv (26.5%), compact (20.1%) ...             

quantitative variables:  
   name   class    min     Q1 median     Q3  max        mean       sd   n
1 displ numeric    1.6    2.4    3.3    4.6    7    3.471795 1.291959 234
2  year integer 1999.0 1999.0 2003.5 2008.0 2008 2003.500000 4.509646 234
3   cyl integer    4.0    4.0    6.0    8.0    8    5.888889 1.611534 234
4   cty integer    9.0   14.0   17.0   19.0   35   16.858974 4.255946 234
5   hwy integer   12.0   18.0   24.0   27.0   44   23.440171 5.954643 234
  missing
1       0
2       0
3       0
4       0
5       0
skimr::skim(mpg)
Data summary
Name mpg
Number of rows 234
Number of columns 11
_______________________
Column type frequency:
character 6
numeric 5
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
manufacturer 0 1 4 10 0 15 0
model 0 1 2 22 0 38 0
trans 0 1 8 10 0 10 0
drv 0 1 1 1 0 3 0
fl 0 1 1 1 0 5 0
class 0 1 3 10 0 7 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
displ 0 1 3.47 1.29 1.6 2.4 3.3 4.6 7 ▇▆▆▃▁
year 0 1 2003.50 4.51 1999.0 1999.0 2003.5 2008.0 2008 ▇▁▁▁▇
cyl 0 1 5.89 1.61 4.0 4.0 6.0 8.0 8 ▇▁▇▁▇
cty 0 1 16.86 4.26 9.0 14.0 17.0 19.0 35 ▆▇▃▁▁
hwy 0 1 23.44 5.95 12.0 18.0 24.0 27.0 44 ▅▅▇▁▁

Factorizing data

Converts the qualitative columns into factors.

Notes:

  • ‘something_modified <- something %>% ...’ stores the result in a new object, leaving the original dataset unchanged.

  • ‘dplyr::mutate’ is the mutate function from the dplyr package, which is used to create or modify columns in a dataframe.

  • ‘something = as_factor(something)’ replaces the column with a factor version of itself, here converting int or chr columns to fct.

mpg_modified <- mpg %>%
  dplyr::mutate(
    cyl = as_factor(cyl),
    fl = as_factor(fl),
    drv = as_factor(drv),
    class = as_factor(class),
    trans = as_factor(trans)
  )
glimpse(mpg_modified)
Rows: 234
Columns: 11
$ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi", "…
$ model        <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro", "…
$ displ        <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0, 2.…
$ year         <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, 200…
$ cyl          <fct> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, …
$ trans        <fct> auto(l5), manual(m5), manual(m6), auto(av), auto(l5), man…
$ drv          <fct> f, f, f, f, f, f, f, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, r, …
$ cty          <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17, 1…
$ hwy          <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25, 2…
$ fl           <fct> p, p, p, p, p, p, p, p, p, p, p, p, p, p, p, p, p, p, r, …
$ class        <fct> compact, compact, compact, compact, compact, compact, com…
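As an alternative way to write the conversion, the sketch below uses across() to apply as_factor() to several columns in one call and then inspects the resulting levels. The name mpg_modified_alt is made up for illustration; the column list mirrors the one used above.

```r
library(dplyr)
library(forcats)
library(ggplot2)  # provides the mpg dataset

# Convert the five columns in one step instead of listing each assignment
mpg_modified_alt <- mpg %>%
  mutate(across(c(cyl, fl, drv, class, trans), as_factor))

# as_factor() sorts numeric input, and keeps character input
# in order of first appearance
levels(mpg_modified_alt$cyl)
levels(mpg_modified_alt$drv)
```

Checking levels() after a conversion is a quick way to confirm the order in which groups will appear in later summaries.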

Grouping by cyl

This groups the modified dataset by the number of cylinders (cyl) and calculates the average highway miles per gallon (average_hwy) along with the count of cars in each group.

mpg_modified %>%
  group_by(cyl) %>%
  summarize(average_hwy = mean(hwy), count = n())
# A tibble: 4 × 3
  cyl   average_hwy count
  <fct>       <dbl> <int>
1 4            28.8    81
2 5            28.8     4
3 6            22.8    79
4 8            17.6    70

Grouping with cyl and fl

It groups the dataset by both the number of cylinders (cyl) and fuel type (fl), calculating the average highway miles per gallon (average_hwy) along with the count of cars in each group.

mpg_modified %>%
  group_by(cyl, fl) %>%
  summarize(average_hwy = mean(hwy), count = n())
`summarise()` has grouped output by 'cyl'. You can override using the `.groups`
argument.
# A tibble: 13 × 4
# Groups:   cyl [4]
   cyl   fl    average_hwy count
   <fct> <fct>       <dbl> <int>
 1 4     p            27.8    22
 2 4     r            28.3    55
 3 4     d            43       3
 4 4     c            36       1
 5 5     r            28.8     4
 6 6     p            25.3    17
 7 6     r            22.2    60
 8 6     e            17       1
 9 6     d            22       1
10 8     p            20.8    13
11 8     r            17.5    49
12 8     e            12.7     7
13 8     d            17       1
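The message about `.groups` appears because summarize() removes only the last grouping level, so the result is still grouped by cyl. A minimal sketch of one way to handle this is to set `.groups = "drop"`, which returns an ungrouped tibble and silences the message. The name by_cyl_fl is made up for illustration.

```r
library(dplyr)
library(forcats)
library(ggplot2)  # provides the mpg dataset

by_cyl_fl <- mpg %>%
  mutate(cyl = as_factor(cyl), fl = as_factor(fl)) %>%
  group_by(cyl, fl) %>%
  # .groups = "drop" removes all grouping from the result
  summarize(average_hwy = mean(hwy), count = n(), .groups = "drop")

is_grouped_df(by_cyl_fl)
by_cyl_fl
```

Dropping the groups avoids surprises when the summary is piped into further steps such as arrange() or mutate().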

Grouping by manufacturer

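It groups the mpg dataset by manufacturer, calculating the average city miles per gallon (average_cty) and the number of rows (count) for each manufacturer. Because count tallies rows, it can exceed the number of distinct model names: the dataset has 234 rows but only 38 models.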

mpg %>% 
  group_by(manufacturer) %>% 
  summarize(average_cty = mean(cty), count = n())
# A tibble: 15 × 3
   manufacturer average_cty count
   <chr>              <dbl> <int>
 1 audi                17.6    18
 2 chevrolet           15      19
 3 dodge               13.1    37
 4 ford                14      25
 5 honda               24.4     9
 6 hyundai             18.6    14
 7 jeep                13.5     8
 8 land rover          11.5     4
 9 lincoln             11.3     3
10 mercury             13.2     4
11 nissan              18.1    13
12 pontiac             17       5
13 subaru              19.3    14
14 toyota              18.5    34
15 volkswagen          20.9    27
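Here n() counts rows, not distinct model names. If both numbers are of interest, one possible sketch adds n_distinct(model) and sorts the result by city mileage. The names by_maker, rows and models are made up for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

by_maker <- mpg %>%
  group_by(manufacturer) %>%
  summarize(
    average_cty = mean(cty),
    rows = n(),                  # number of rows per manufacturer
    models = n_distinct(model)   # number of distinct model names
  ) %>%
  arrange(desc(average_cty))     # most efficient manufacturers first

by_maker
```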

Conclusion

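Grouping the mpg dataset shows how fuel efficiency varies with vehicle features: average highway mileage falls from 28.8 mpg for 4-cylinder cars to 17.6 mpg for 8-cylinder cars, and average city mileage by manufacturer ranges from 11.3 mpg (lincoln) to 24.4 mpg (honda). Converting cyl and the character columns to factors makes them explicit grouping variables for these summaries.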

Dataset: Babynames

babynames
# A tibble: 1,924,665 × 5
    year sex   name          n   prop
   <dbl> <chr> <chr>     <int>  <dbl>
 1  1880 F     Mary       7065 0.0724
 2  1880 F     Anna       2604 0.0267
 3  1880 F     Emma       2003 0.0205
 4  1880 F     Elizabeth  1939 0.0199
 5  1880 F     Minnie     1746 0.0179
 6  1880 F     Margaret   1578 0.0162
 7  1880 F     Ida        1472 0.0151
 8  1880 F     Alice      1414 0.0145
 9  1880 F     Bertha     1320 0.0135
10  1880 F     Sarah      1288 0.0132
# ℹ 1,924,655 more rows
babynames %>% filter(name == "Sara") %>% head(15)
# A tibble: 15 × 5
    year sex   name      n    prop
   <dbl> <chr> <chr> <int>   <dbl>
 1  1880 F     Sara    165 0.00169
 2  1881 F     Sara    147 0.00149
 3  1882 F     Sara    180 0.00156
 4  1883 F     Sara    183 0.00152
 5  1884 F     Sara    197 0.00143
 6  1885 F     Sara    215 0.00151
 7  1886 F     Sara    247 0.00161
 8  1887 F     Sara    214 0.00138
 9  1888 F     Sara    293 0.00155
10  1889 F     Sara    286 0.00151
11  1890 F     Sara    262 0.00130
12  1891 F     Sara    260 0.00132
13  1892 F     Sara    276 0.00123
14  1893 F     Sara    290 0.00129
15  1894 F     Sara    304 0.00129
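The filter above matches on name only, so the result may contain rows for both sexes even though the first 15 rows are all F. A minimal sketch for checking this and for finding the peak year per sex follows. The name sara is made up for illustration, and dplyr::count() is written with its prefix because mosaic masks count() in this document.

```r
library(dplyr)
library(babynames)

# All yearly records for the name, regardless of sex
sara <- babynames %>% filter(name == "Sara")

# Number of yearly records per sex
sara %>% dplyr::count(sex)

# Year with the highest count for each sex
sara %>%
  group_by(sex) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()
```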

Using inspect, glimpse and skim to summarize data

babynames%>% mosaic::inspect()

categorical variables:  
  name     class levels       n missing
1  sex character      2 1924665       0
2 name character  97310 1924665       0
                                   distribution
1 F (59.1%), M (40.9%)                         
2 Francis (0%), James (0%) ...                 

quantitative variables:  
  name   class      min        Q1    median        Q3          max         mean
1 year numeric 1.88e+03 1.951e+03 1.985e+03 2.003e+03 2.017000e+03 1.974851e+03
2    n integer 5.00e+00 7.000e+00 1.200e+01 3.200e+01 9.968600e+04 1.808733e+02
3 prop numeric 2.26e-06 3.870e-06 7.300e-06 2.288e-05 8.154561e-02 1.362963e-04
            sd       n missing
1 3.402948e+01 1924665       0
2 1.533337e+03 1924665       0
3 1.151693e-03 1924665       0
babynames %>% dplyr::glimpse()
Rows: 1,924,665
Columns: 5
$ year <dbl> 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880,…
$ sex  <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", …
$ name <chr> "Mary", "Anna", "Emma", "Elizabeth", "Minnie", "Margaret", "Ida",…
$ n    <int> 7065, 2604, 2003, 1939, 1746, 1578, 1472, 1414, 1320, 1288, 1258,…
$ prop <dbl> 0.07238359, 0.02667896, 0.02052149, 0.01986579, 0.01788843, 0.016…

Factorizing data

babynames_modified <- babynames %>%
  dplyr::mutate(
    sex = as_factor(sex)
  )
glimpse(babynames_modified)
Rows: 1,924,665
Columns: 5
$ year <dbl> 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880,…
$ sex  <fct> F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,…
$ name <chr> "Mary", "Anna", "Emma", "Elizabeth", "Minnie", "Margaret", "Ida",…
$ n    <int> 7065, 2604, 2003, 1939, 1746, 1578, 1472, 1414, 1320, 1288, 1258,…
$ prop <dbl> 0.07238359, 0.02667896, 0.02052149, 0.01986579, 0.01788843, 0.016…
babynames %>% skimr::skim()
Data summary
Name Piped data
Number of rows 1924665
Number of columns 5
_______________________
Column type frequency:
character 2
numeric 3
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
sex 0 1 1 1 0 2 0
name 0 1 2 15 0 97310 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1 1974.85 34.03 1880 1951 1985 2003 2017.00 ▁▂▃▅▇
n 0 1 180.87 1533.34 5 7 12 32 99686.00 ▇▁▁▁▁
prop 0 1 0.00 0.00 0 0 0 0 0.08 ▇▁▁▁▁

Conclusion

The first 15 rows for the name “Sara” (all female records) show yearly counts rising from 165 in 1880 to 304 in 1894, while the proportion of births stays between about 0.12% and 0.17%. Converting sex into a factor prepares it for use as a grouping variable.

Dataset: Math Anxiety

Notes:

  • ‘read_delim’ reads delimited text files. When no delimiter is given it guesses one, and here it detected the semicolon (;).

  • ‘read_csv’ reads comma-separated (csv) files.

math_an <- read_delim("../../data/MathAnxiety.csv")
Rows: 599 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr (2): Gender, Grade
dbl (3): AMAS, RCMAS, Arith,
num (1): Age

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
math_an
# A tibble: 599 × 6
     Age Gender Grade      AMAS RCMAS `Arith,`
   <dbl> <chr>  <chr>     <dbl> <dbl>    <dbl>
 1  1378 Boy    Secondary     9    20        6
 2  1407 Boy    Secondary    18     8        6
 3  1379 Girl   Secondary    23    26        5
 4  1428 Girl   Secondary    19    18        7
 5  1356 Boy    Secondary    23    20        1
 6  1350 Girl   Secondary    27    33        1
 7  1336 Boy    Secondary    22    23        4
 8  1393 Boy    Secondary    17    11        7
 9  1317 Girl   Secondary    28    32        2
10  1348 Boy    Secondary    20    30        6
# ℹ 589 more rows
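The column specification reports Age as num, and values such as 1378 look unusually large for an age. One possible explanation, offered only as an assumption, is that the file writes decimals with a comma (for example 137,8), which readr's default locale treats as a grouping mark. The sketch below uses a small made-up inline sample, not the real MathAnxiety.csv, to show how locale(decimal_mark = ",") changes the parse.

```r
library(readr)

# Hypothetical sample in the same layout; the values are illustrative only
raw <- I("Age;Gender;AMAS\n137,8;Boy;9\n140,7;Boy;18\n")

# Treat the comma as the decimal mark and state the delimiter explicitly
parsed <- read_delim(
  raw,
  delim = ";",
  locale = locale(decimal_mark = ","),
  show_col_types = FALSE
)

parsed
```

Stating delim and locale explicitly makes the import reproducible instead of relying on what readr guesses from the file.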

Organizing Data

Provides an overview of the dataset’s structure, including data types, variable details, and counts.

math_an %>% dplyr::glimpse()
Rows: 599
Columns: 6
$ Age      <dbl> 1378, 1407, 1379, 1428, 1356, 1350, 1336, 1393, 1317, 1348, 1…
$ Gender   <chr> "Boy", "Boy", "Girl", "Girl", "Boy", "Girl", "Boy", "Boy", "G…
$ Grade    <chr> "Secondary", "Secondary", "Secondary", "Secondary", "Secondar…
$ AMAS     <dbl> 9, 18, 23, 19, 23, 27, 22, 17, 28, 20, 16, 20, 21, 36, 16, 27…
$ RCMAS    <dbl> 20, 8, 26, 18, 20, 33, 23, 11, 32, 30, 10, 4, 23, 26, 24, 21,…
$ `Arith,` <dbl> 6, 6, 5, 7, 1, 1, 4, 7, 2, 6, 2, 5, 2, 6, 2, 7, 2, 4, 7, 3, 8…
math_an %>% mosaic::inspect()

categorical variables:  
    name     class levels   n missing
1 Gender character      2 599       0
2  Grade character      2 599       0
                                   distribution
1 Boy (53.9%), Girl (46.1%)                    
2 Primary (66.9%), Secondary (33.1%)           

quantitative variables:  
    name   class min     Q1 median     Q3  max       mean         sd   n
1    Age numeric  37 1061.5   1208 1418.5 1875 1246.49249 223.112183 599
2   AMAS numeric   4   18.0     22   26.5   45   21.98164   6.597962 599
3  RCMAS numeric   1   14.0     19   25.0   41   19.24040   7.566802 599
4 Arith, numeric   0    4.0      6    7.0    8    5.30217   2.105220 599
  missing
1       0
2       0
3       0
4       0
math_an %>% skimr::skim()
Data summary
Name Piped data
Number of rows 599
Number of columns 6
_______________________
Column type frequency:
character 2
numeric 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Gender 0 1 3 4 0 2 0
Grade 0 1 7 9 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Age 0 1 1246.49 223.11 37 1061.5 1208 1418.5 1875 ▁▁▇▇▃
AMAS 0 1 21.98 6.60 4 18.0 22 26.5 45 ▂▆▇▃▁
RCMAS 0 1 19.24 7.57 1 14.0 19 25.0 41 ▂▇▇▅▁
Arith, 0 1 5.30 2.11 0 4.0 6 7.0 8 ▂▃▃▇▇

Factorizing

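Converts the qualitative columns (Gender and Grade) into factors. Age is also converted here, although it is numeric, so each distinct age value becomes its own level.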

math_an_modified <- math_an %>%
  dplyr::mutate(
    Age = as_factor(Age),
    Gender = as_factor(Gender),
    Grade = as_factor(Grade),
  )
glimpse(math_an_modified)
Rows: 599
Columns: 6
$ Age      <fct> 1378, 1407, 1379, 1428, 1356, 1350, 1336, 1393, 1317, 1348, 1…
$ Gender   <fct> Boy, Boy, Girl, Girl, Boy, Girl, Boy, Boy, Girl, Boy, Boy, Bo…
$ Grade    <fct> Secondary, Secondary, Secondary, Secondary, Secondary, Second…
$ AMAS     <dbl> 9, 18, 23, 19, 23, 27, 22, 17, 28, 20, 16, 20, 21, 36, 16, 27…
$ RCMAS    <dbl> 20, 8, 26, 18, 20, 33, 23, 11, 32, 30, 10, 4, 23, 26, 24, 21,…
$ `Arith,` <dbl> 6, 6, 5, 7, 1, 1, 4, 7, 2, 6, 2, 5, 2, 6, 2, 7, 2, 4, 7, 3, 8…

Anxiety based on Gender

Groups the data by Gender and calculates the average scores for the AMAS and RCMAS, along with the count for each gender.

math_an_modified %>%
  group_by(Gender) %>%
  summarize(average_AMAS = mean(AMAS), average_RCMAS = mean(RCMAS), count = n())
# A tibble: 2 × 4
  Gender average_AMAS average_RCMAS count
  <fct>         <dbl>         <dbl> <int>
1 Boy            21.2          18.1   323
2 Girl           22.9          20.6   276

Boys average lower than girls on both scales (AMAS 21.2 vs 22.9, RCMAS 18.1 vs 20.6), which suggests that girls in this sample report somewhat higher anxiety. The gaps are about two points, and no significance test has been run.

Anxiety based on Age

Groups the data by Age and calculates the average scores for the AMAS and RCMAS, along with the count for each age group.

math_an_modified %>%
  group_by(Age) %>%
  summarize(average_AMAS = mean(AMAS), average_RCMAS = mean(RCMAS), count = n())
# A tibble: 391 × 4
   Age   average_AMAS average_RCMAS count
   <fct>        <dbl>         <dbl> <int>
 1 37            11             5       1
 2 837           26            23       1
 3 910           17            13       1
 4 925           28            21       1
 5 927           10            12       1
 6 931           16            20       1
 7 938           29.5          26.5     2
 8 940           13            15       1
 9 947           17            31       2
10 948           14            17       1
# ℹ 381 more rows
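Grouping by every distinct Age value produces hundreds of tiny groups. One possible alternative is to bin Age into a few bands before summarizing. The sketch below splits at the median of 1208 reported by inspect() and uses six rows copied from the outputs above, not the full dataset, so it illustrates the mechanics only. The names sample_rows, age_band and by_band are made up for illustration.

```r
library(dplyr)

# Six rows taken from the printed outputs above (not the full dataset)
sample_rows <- tibble(
  Age   = c(837, 910, 925, 1378, 1407, 1379),
  AMAS  = c(26, 17, 28, 9, 18, 23),
  RCMAS = c(23, 13, 21, 20, 8, 26)
)

by_band <- sample_rows %>%
  # Two bands split at the median Age (1208) from the inspect() output
  mutate(age_band = cut(Age,
                        breaks = c(-Inf, 1208, Inf),
                        labels = c("at or below median", "above median"))) %>%
  group_by(age_band) %>%
  summarize(average_AMAS = mean(AMAS),
            average_RCMAS = mean(RCMAS),
            count = n())

by_band
```

With only six rows the averages carry no meaning; applying the same steps to the full math_an data would give one row per band with enough students to compare.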

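With Age stored as a factor, every distinct value forms its own group: the 599 rows are split across 391 groups, about 1.5 students per group on average, so no age trend can be read from this table. Assessing whether anxiety changes with age would require keeping Age numeric or binning it into a few age bands.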

Anxiety based on Grade

Groups the data by Grade and calculates the average scores for the AMAS and RCMAS, along with the count for each grade level.

math_an_modified %>%
  group_by(Grade) %>%
  summarize(average_AMAS = mean(AMAS), average_RCMAS = mean(RCMAS), count = n())
# A tibble: 2 × 4
  Grade     average_AMAS average_RCMAS count
  <fct>            <dbl>         <dbl> <int>
1 Secondary         22.3          18.5   198
2 Primary           21.8          19.6   401

Primary school students average slightly higher on the RCMAS (19.6 vs 18.5), while secondary school students average slightly higher on the AMAS (22.3 vs 21.8). The two scales point in opposite directions and the differences are small, so these results do not show a clear grade effect.

Anxiety based on Gender & Grade

Groups the data by Gender and Grade, calculating the average AMAS and RCMAS scores, along with the count for each gender and grade combination.

math_an_modified %>%
  group_by(Gender, Grade) %>%
  summarize(average_AMAS = mean(AMAS), average_RCMAS = mean(RCMAS), count = n())
`summarise()` has grouped output by 'Gender'. You can override using the
`.groups` argument.
# A tibble: 4 × 5
# Groups:   Gender [2]
  Gender Grade     average_AMAS average_RCMAS count
  <fct>  <fct>            <dbl>         <dbl> <int>
1 Boy    Secondary         21.5          17.4   124
2 Boy    Primary           20.9          18.6   199
3 Girl   Secondary         23.5          20.3    74
4 Girl   Primary           22.7          20.6   202

Girls average higher than boys on both the AMAS and the RCMAS in both grades. Secondary school girls have the highest AMAS average (23.5), while primary school girls have the highest RCMAS average (20.6).

Conclusion

Girls report higher average anxiety than boys on both scales and in both grades. By grade, primary students score slightly higher on the RCMAS and secondary students slightly higher on the AMAS. The age breakdown is too fine-grained to support a conclusion, because Age was grouped as a factor with 391 distinct values.