弘前データ１-１：法定検診との関係の詳細検討

BP, FLCΣ の値で層別して可視化

著者

司馬博文

日付

5/13/2025

はじめに

前稿で得た知見を基に，さらに詳細な検討を行う．

BP の生の値で回帰するよりも，「BP が高いかどうか」の２値変数（あるいは３値変数）に向けて，ロジスティック回帰を行うことが有望なのではないかと判明した．「疲労感」と「抑うつ感」で高 BP 群がスコア５を付けやすいことが簡単なバープロットから判る． \(\Sigma\)-FLC が低い群は「活気」と「抑うつ感」でスコア５を付けやすい・

次節では LAB と BP，LOX-1 と FLCΣ の間のロジスティック回帰関係の階層モデリングを目指す．

kable(head(df))

受診日年齢	性別	Weight	BP	FLCΣ	BP備考	判別式	LOX-1_H	LAB_H	LOX-index_H	タイプ	多様性	item1	item2	item3	item4	item5	item6	item7	item8	item9	item10	item11	item12	item13	item14	item15	item16	item17	item18	年代	活気	イライラ感	疲労感	不安感	抑うつ感
48	2	52.7	0.47	0.81	B	-3.343	75	2.2	165	B	2	2	2	2	2	2	2	4	3	2	3	2	2	2	2	1	2	3	2	40代	3	3	4	3	3
53	2	64.8	0.67	-0.30	BDG	-2.727	55	2.8	154	A	1	3	1	1	1	1	2	3	3	3	2	2	2	2	2	1	1	1	1	50代	4	2	4	3	2
46	1	76.3	1.10	1.95	NA	-2.974	136	3.8	517	B	2	2	3	3	2	2	2	1	1	2	2	2	1	2	2	1	1	1	1	40代	2	3	2	3	2
65	1	72.4	0.88	-8.96	D	-7.338	58	2.3	133	C	2	3	3	3	2	2	2	1	1	1	2	1	1	1	1	1	1	1	1	60代	2	3	1	2	1
30	1	59.5	0.76	0.64	NA	-2.776	46	2.2	101	B	2	2	2	2	3	4	3	3	3	3	4	4	3	3	3	3	3	1	1	30代	3	5	4	5	4
51	1	76.6	1.69	2.58	NA	-2.377	78	3.2	250	B	2	2	2	2	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	50代	3	3	3	3	2

BP 備考について

A: 採尿条件不良（測定条件を満たしていない）
B: BP 測定値が下限以下
C: kappa 下限値未満
D: kappa 10倍希釈では失敗（上限オーバー）
E: kappa が凍結融解を２回必要とした
F: lambda 下限値未満
G: lambda 10倍希釈では失敗（上限オーバー）
H: lambda が凍結融解を２回必要とした

1 BP の値が高い人だけの部分標本でモデリングをする

1.1 BP 分布の可視化

df <- df[!is.na(df$BP), ]

# 日本語フォントの設定
par(family = "HiraKakuPro-W3")  # macOSの場合
par(mar = c(5, 6, 2, 2))
# 基本的なヒストグラム
hist(df$BP,
     main = NULL,
     xlab = "BP値",
     ylab = "割合",
     col = "lightblue",
     border = "white",
     freq = FALSE,  # 頻度ではなく密度（割合）を表示
     breaks = 20,  # Sturgesの公式に基づくビン数の自動設定
     cex.lab = 2.9,   # 軸ラベルの文字サイズ
     cex.axis = 2.2)  # 軸の目盛りの文字サイズ

# 密度曲線を追加
lines(density(df$BP, na.rm = TRUE), 
      col = "red", 
      lwd = 2)

# 基本統計量を表示
summary(df$BP)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2300  0.5800  0.7500  0.8262  0.9700  4.2600

bin の大きさは約 \(0.2\) である．BP が \(3.0\) 以上の者は２人のみで，これは外れ値として扱った方が良いと思われる．

df[df$BP > 4.0, ]   # ２人

# A tibble: 2 × 37
  受診日年齢  性別 Weight    BP  FLCΣ BP備考 判別式 `LOX-1_H` LAB_H
       <dbl> <dbl>  <dbl> <dbl> <dbl> <chr>   <dbl>     <dbl> <dbl>
1         45     1   70.7  4.26  0.25 <NA>     3.12        26   3.6
2         49     1   84.6  4.18  3.13 <NA>     1.34       161   3.3
# ℹ 28 more variables: `LOX-index_H` <dbl>, タイプ <chr>, 多様性 <dbl>,
#   item1 <dbl>, item2 <dbl>, item3 <dbl>, item4 <dbl>, item5 <dbl>,
#   item6 <dbl>, item7 <dbl>, item8 <dbl>, item9 <dbl>, item10 <dbl>,
#   item11 <dbl>, item12 <dbl>, item13 <dbl>, item14 <dbl>, item15 <dbl>,
#   item16 <dbl>, item17 <dbl>, item18 <dbl>, BP_A_flag <dbl>, 年代 <ord>,
#   活気 <dbl>, イライラ感 <dbl>, 疲労感 <dbl>, 不安感 <dbl>, 抑うつ感 <dbl>

df[df$BP > 3.0, ]   # ２人

# A tibble: 2 × 37
  受診日年齢  性別 Weight    BP  FLCΣ BP備考 判別式 `LOX-1_H` LAB_H
       <dbl> <dbl>  <dbl> <dbl> <dbl> <chr>   <dbl>     <dbl> <dbl>
1         45     1   70.7  4.26  0.25 <NA>     3.12        26   3.6
2         49     1   84.6  4.18  3.13 <NA>     1.34       161   3.3
# ℹ 28 more variables: `LOX-index_H` <dbl>, タイプ <chr>, 多様性 <dbl>,
#   item1 <dbl>, item2 <dbl>, item3 <dbl>, item4 <dbl>, item5 <dbl>,
#   item6 <dbl>, item7 <dbl>, item8 <dbl>, item9 <dbl>, item10 <dbl>,
#   item11 <dbl>, item12 <dbl>, item13 <dbl>, item14 <dbl>, item15 <dbl>,
#   item16 <dbl>, item17 <dbl>, item18 <dbl>, BP_A_flag <dbl>, 年代 <ord>,
#   活気 <dbl>, イライラ感 <dbl>, 疲労感 <dbl>, 不安感 <dbl>, 抑うつ感 <dbl>

BP が 1 以上の人が上位 \(23\%\) に入る．そこで，0.9 以上の3割，4割，3割に分ける．

df$BP_level <- case_when(
  df$BP > 0.9 ~ 3,
  df$BP > 0.6 & df$BP <= 0.9 ~ 2,
  df$BP <= 0.6 ~ 1
)
length(df[df$BP_level == 3, ]$BP)/dim(df)[1]

[1] 0.3107143

length(df[df$BP_level == 2, ]$BP)/dim(df)[1]

[1] 0.4035714

length(df[df$BP_level == 1, ]$BP)/dim(df)[1]

[1] 0.2857143

1.2 \(\mathtt{BP}>1\) 部分標本の解析（活気で試す）

library(ggplot2)

# BP_levelを因子型に変換
df$BP_level <- factor(df$BP_level, 
                      levels = c(1, 2, 3),
                      labels = c("低 (BP ≤ 0.6)", "中 (0.6 < BP ≤ 0.9)", "高 (BP > 0.9)"))

# 箱ひげ図の作成（平均値付き）
ggplot(df, aes(x = BP_level, y = 活気, fill = BP_level)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 23, size = 3, fill = "white") +  # 平均値を追加
  scale_y_continuous(breaks = 1:5) +  # y軸を1から5までの整数に設定
  theme_minimal() +
  labs(title = "BP_level別の活気スコアの分布（平均値付き）",
       x = "BP_level",
       y = "活気スコア") +
  theme(text = element_text(family = "HiraKakuPro-W3"),
        plot.title = element_text(hjust = 0.5),
        legend.position = "none") +
  scale_fill_brewer(palette = "Pastel1")

Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).

Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_summary()`).

Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(count)` instead.

Warning: Removed 2 rows containing non-finite outside the scale range (`stat_count()`).
Removed 2 rows containing non-finite outside the scale range (`stat_count()`).

1.3 イライラ感

Warning: Removed 1 row containing non-finite outside the scale range (`stat_count()`).
Removed 1 row containing non-finite outside the scale range (`stat_count()`).

1.4 疲労感

Warning: Removed 1 row containing non-finite outside the scale range (`stat_count()`).
Removed 1 row containing non-finite outside the scale range (`stat_count()`).

1.5 不安感

Warning: Removed 1 row containing non-finite outside the scale range (`stat_count()`).
Removed 1 row containing non-finite outside the scale range (`stat_count()`).

1.6 抑うつ感

Warning: Removed 1 row containing non-finite outside the scale range (`stat_count()`).
Removed 1 row containing non-finite outside the scale range (`stat_count()`).

2 \(\Sigma\)-FLC で同じことをしてみる

2.1 \(\Sigma\)-FLC の可視化

df <- df[is.na(df$`BP備考`), ]

# 日本語フォントの設定
par(family = "HiraKakuPro-W3")  # macOSの場合
par(mar = c(5, 6, 2, 2))
# 基本的なヒストグラム
hist(df$FLCΣ,
     main = NULL,
     xlab = "FLCΣ",
     ylab = "割合",
     col = "lightblue",
     border = "white",
     freq = FALSE,  # 頻度ではなく密度（割合）を表示
     breaks = 20,  # Sturgesの公式に基づくビン数の自動設定
     cex.lab = 2.9,   # 軸ラベルの文字サイズ
     cex.axis = 2.2)  # 軸の目盛りの文字サイズ

# 密度曲線を追加
lines(density(df$FLCΣ, na.rm = TRUE), 
      col = "red", 
      lwd = 2)

# 基本統計量を表示
summary(df$FLCΣ)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1400  0.8675  1.4700  1.9678  2.5425 17.3500

次のように，30%, 30%, 40% に分ける．

df$FLC_level <- case_when(
  df$FLCΣ >= 2 ~ 3,
  df$FLCΣ >= 0.7 & df$FLCΣ < 2 ~ 2,
  df$FLCΣ < 0.7 ~ 1
)
length(df[df$FLC_level == 3, ]$FLCΣ)/dim(df)[1]

[1] 0.3819444

length(df[df$FLC_level == 2, ]$FLCΣ)/dim(df)[1]

[1] 0.4479167

length(df[df$FLC_level == 1, ]$FLCΣ)/dim(df)[1]

[1] 0.1701389

2.2 活気

Warning: Removed 2 rows containing non-finite outside the scale range (`stat_count()`).
Removed 2 rows containing non-finite outside the scale range (`stat_count()`).

2.3 イライラ感

Warning: Removed 1 row containing non-finite outside the scale range (`stat_count()`).
Removed 1 row containing non-finite outside the scale range (`stat_count()`).

2.4 疲労感

df[df$疲労感 == 5, ]$FLCΣ

 [1] 1.17 0.83 2.50 5.31 0.81 1.00 0.98 1.23 1.75 1.24 2.34 0.87 3.18 1.83