弘前データ１-２：法定検診との関係の詳細検討

BP, FLCΣ の値の高低を用いた項目反応モデル

著者

司馬博文

日付

5/13/2025

はじめに

前々稿で得た知見を基に，さらに詳細な検討を行う．前稿では BP, FLCΣ の値を３段階で離散化して，層別して可視化した．ここではこの離散化に基づいて層別して回帰する，項目反応モデリングを考える．次節では LAB と BP，LOX-1 と FLCΣ の間の回帰関係の階層モデリングを目指す．

各項目に対する項目反応モデリングは最も面白い方向であろうが，「項目毎」「個人毎」では PC で簡単に実行できるモデルにはならない．代わりに，男女差調整済みの尺度である「活気」「抑うつ感」などに対して，「BP のレベル３段階毎」で違う係数を推定することを考える．

kable(head(df))

受付番号	受診日年齢	性別	Weight	BP	FLCΣ	BP備考	判別式	LOX-1_H	LAB_H	LOX-index_H	タイプ	多様性	item1	item2	item3	item4	item5	item6	item7	item8	item9	item10	item11	item12	item13	item14	item15	item16	item17	item18	年代	活気	イライラ感	疲労感	不安感	抑うつ感
4	46	1	76.3	1.10	1.95	NA	-2.974	136	3.8	517	B	2	2	3	3	2	2	2	1	1	2	2	2	1	2	2	1	1	1	1	40代	2	3	2	3	2
6	30	1	59.5	0.76	0.64	NA	-2.776	46	2.2	101	B	2	2	2	2	3	4	3	3	3	3	4	4	3	3	3	3	3	1	1	30代	3	5	4	5	4
7	51	1	76.6	1.69	2.58	NA	-2.377	78	3.2	250	B	2	2	2	2	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	50代	3	3	3	3	2
8	47	2	43.2	0.56	0.55	NA	-3.048	40	2.7	108	B	2	3	3	3	1	1	1	2	1	1	2	1	1	1	1	1	1	1	1	40代	2	1	2	2	1
10	39	1	61.9	0.52	0.64	NA	-3.165	54	2.4	130	B	2	2	2	2	2	2	2	2	2	2	2	2	2	2	2	2	2	2	2	30代	3	3	3	3	3
11	55	1	61.0	0.89	2.64	NA	-3.708	60	3.1	186	E	1	2	2	2	1	1	2	1	1	2	1	1	1	1	1	1	1	1	1	50代	3	2	2	1	1

1 各項目データに対する項目反応モデリング

1.1 データ長形式化

df_1PL <- df[, c("受付番号", "BP", "FLCΣ", paste0("item", 1:18))]
colnames(df_1PL) <- c("id", "BP", "FLCΣ", paste0("item", 1:18))

library(tidyr)

df_1PL_long <- df_1PL %>%
  pivot_longer(
    cols = starts_with("item"),
    names_to = "item",
    values_to = "response"
  )

df_1PL_long$res <- ifelse(df_1PL_long$response > 3, 1, 0)

kable(head(df_1PL_long[6:13,]))

id	BP	FLCΣ	item	response
4	1.1	1.95	item6	2
4	1.1	1.95	item7	1
4	1.1	1.95	item8	1
4	1.1	1.95	item9	2
4	1.1	1.95	item10	2
4	1.1	1.95	item11	2

1.2 2PL モデル

次のモデルは重すぎて実行できず．

library(brms)

formula_2PL <- bf(
  res ~ exp(loggamma) * eta,
  loggamma ~ 1 + (1|item),
  eta ~ 1 + (1|item) + (1|id),
  nl = TRUE
)

fit_2PL <- brm(
  formula = formula_2PL,
  data = df_1PL_long,
  family = brmsfamily("bernoulli", link = "logit"),
  chains = 4, cores = 4, iter = 5000
)

1.3 1PL モデル

library(brms)

Warning: パッケージ 'brms' はバージョン 4.3.1 の R の下で造られました

 要求されたパッケージ Rcpp をロード中です

Loading 'brms' package (version 2.21.0). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').


 次のパッケージを付け加えます: 'brms'

 以下のオブジェクトは 'package:stats' からマスクされています:

    ar

formula_1PL <- bf(
  res ~ 1 + (1|item) + (1|BP)
)

fit_1PL <- brm(
  formula = formula_1PL,
  data = df_1PL_long,
  family = brmsfamily("bernoulli", link = "logit"),
  chains = 4, cores = 4, iter = 5000
)

Warning: Rows containing NAs were excluded from the model.

Compiling Stan program...

Start sampling

summary(fit_1PL)

 Family: bernoulli 
  Links: mu = logit 
Formula: res ~ 1 + (1 | item) + (1 | BP) 
   Data: df_1PL_long (Number of observations: 10365) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Multilevel Hyperparameters:
~BP (Number of levels: 130) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.24      0.14     0.99     1.54 1.00     3054     4847

~item (Number of levels: 18) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.86      0.18     0.57     1.30 1.00     3752     5935

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -3.88      0.26    -4.41    -3.37 1.00     2903     4664

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

plot(fit_1PL)

1.3.1 項目パラメータ

library(ggplot2)
ranef_item <- ranef(fit_1PL)$item
posterior_means <- ranef_item[,1,1]
lower_bounds <- ranef_item[,3,1]
upper_bounds <- ranef_item[,4,1]
plot_df_item <- data.frame(
  item = rownames(ranef_item),
  mean = posterior_means,
  lower = lower_bounds,
  upper = upper_bounds
)
p_PL1 <- ggplot(plot_df_item, aes(x = mean, y = item)) +
  geom_point() +
  geom_errorbar(aes(xmin = lower, xmax = upper), width = 0.2) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Posterior Means and 95% Credible Intervals for Items",
       x = "Posterior Estimate",
       y = "Item")
p_PL1

item2_3 <- df_1PL_long[df_1PL_long$item %in% c("item2", "item3"),]
summary(item2_3$response)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   1.00    2.00    2.00    2.37    3.00    4.00       2

ほとんどの人が高い値を答えた．

item4_9 <- df_1PL_long[df_1PL_long$item %in% c("item4", "item5", "item6", "item7", "item8", "item9"),]
summary(item4_9$response)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   1.00    1.00    2.00    1.91    2.00    4.00       1

item15_18 <- df_1PL_long[df_1PL_long$item %in% c("item15", "item16", "item17", "item18"),]
summary(item15_18$response)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   1.000   1.405   2.000   4.000

ほとんどの人が低い値を答えた．

1.3.2 個人パラメータ

library(ggplot2)
ranef_BP <- ranef(fit_1PL)$BP
posterior_means <- ranef_BP[,1,1]
lower_bounds <- ranef_BP[,3,1]
upper_bounds <- ranef_BP[,4,1]
plot_df_BP <- data.frame(
  BP = rownames(ranef_BP),
  mean = posterior_means,
  lower = lower_bounds,
  upper = upper_bounds
)
p_PL1 <- ggplot(plot_df_BP, aes(x = mean, y = BP)) +
  geom_point() +
  geom_errorbar(aes(xmin = lower, xmax = upper), width = 0.2) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Posterior Means and 95% Credible Intervals for BP",
       x = "Posterior Estimate",
       y = "BP")
p_PL1

基本的に，BP は項目反応には関係ない．しかし，一部の人だけ高い回帰係数が推定されている．

2 BP の高低と項目反応

2.1 BP の層別

df$BP_level <- case_when(
  df$BP > 0.9 ~ 3,
  df$BP > 0.6 & df$BP <= 0.9 ~ 2,
  df$BP <= 0.6 ~ 1
)
length(df[df$BP_level == 3, ]$BP)/dim(df)[1]
length(df[df$BP_level == 2, ]$BP)/dim(df)[1]
length(df[df$BP_level == 1, ]$BP)/dim(df)[1]

2.2 1PL モデル

library(brms)

formula_1PL <- bf(
  活気 ~ 1 + (1 | BP_level)
)

fit_1PL <- brm(
  formula = formula_1PL,
  data = df,
  family = brmsfamily("categorical"),
  chains = 4, cores = 4, iter = 5000
)

summary(fit_1PL)

ranef(fit_1PL)