提问者:小点点

如何使用dplyr(最好)将yes/no行转换为比例?[关闭]


编辑问题以包括所需的行为、特定的问题或错误以及重现问题所需的最短代码。这将有助于其他人回答问题。

数据来自这个RData数据集

这是脚本:

library(dplyr)
library(ggplot2)
load("brfss2013.RData")

test <- brfss2013 %>%
  select(chcscncr,exract11) %>% 
  filter(chcscncr != "NA" , exract11 != "NA") %>% 
  group_by(exract11,chcscncr) %>% 
  summarise(count = n())

其结果如下表所示:

> head(test)
Source: local data frame [6 x 3]
Groups: exract11 [3]

                                                  exract11 chcscncr count
                                                    <fctr>   <fctr> <int>
1 Active Gaming Devices (Wii Fit, Dance, Dance revolution)      Yes    19
2 Active Gaming Devices (Wii Fit, Dance, Dance revolution)       No   287
3                                  Aerobics video or class      Yes   800
4                                  Aerobics video or class       No  7340
5                                              Backpacking      Yes     4
6                                              Backpacking       No    38

我想实现一个表格,给出每种运动的“是”比例,比如:

Type     Ans Count
Sport A  yes 45
Sport A  no  55
Sport B  yes 34
Sport B  no  66

到:

Type      p(yes)
Sport A   0.45
Sport B   0.34

共1个答案

匿名用户

prop. table将总计转换为比例(在本例中,每个组的值只需x/sum(x)),因此对于您的“From”表:

brfss2013 %>%
    select(chcscncr,exract11) %>% 
    na.omit() %>%    # `==` doesn't work for NA
    count(exract11, chcscncr) %>%    # equivalent to `group_by(...) %>% summarise(n = n())`
    group_by(exract11) %>%
    mutate(pct = prop.table(n) * 100)    # `* 100` to convert to percent

## Source: local data frame [144 x 4]
## Groups: exract11 [75]
## 
##                                                    exract11 chcscncr     n      pct
##                                                      <fctr>   <fctr> <int>    <dbl>
## 1  Active Gaming Devices (Wii Fit, Dance, Dance revolution)      Yes    19  6.20915
## 2  Active Gaming Devices (Wii Fit, Dance, Dance revolution)       No   287 93.79085
## 3                                   Aerobics video or class      Yes   800  9.82801
## 4                                   Aerobics video or class       No  7340 90.17199
## 5                                               Backpacking      Yes     4  9.52381
## 6                                               Backpacking       No    38 90.47619
## 7                                                 Badminton      Yes     4 10.52632
## 8                                                 Badminton       No    34 89.47368
## 9                                                Basketball      Yes    37  1.64664
## 10                                               Basketball       No  2210 98.35336
## # ... with 134 more rows

对于您的“to”表,过滤“Yes”行:

brfss2013 %>%
    select(chcscncr,exract11) %>% 
    na.omit() %>% 
    count(exract11, chcscncr) %>%
    group_by(exract11) %>%
    mutate(p_yes = prop.table(n)) %>%
    filter(chcscncr == "Yes")

## Source: local data frame [69 x 4]
## Groups: exract11 [69]
## 
##                                                                 exract11 chcscncr     n      p_yes
##                                                                   <fctr>   <fctr> <int>      <dbl>
## 1               Active Gaming Devices (Wii Fit, Dance, Dance revolution)      Yes    19 0.06209150
## 2                                                Aerobics video or class      Yes   800 0.09828010
## 3                                                            Backpacking      Yes     4 0.09523810
## 4                                                              Badminton      Yes     4 0.10526316
## 5                                                             Basketball      Yes    37 0.01646640
## 6                                             Bicycling machine exercise      Yes   987 0.13708333
## 7                                                              Bicycling      Yes   728 0.08519602
## 8  Boating (Canoeing, rowing, kayaking, sailing for pleasure or camping)      Yes    22 0.11518325
## 9                                                                Bowling      Yes    68 0.09985316
## 10                                                               Boxing       Yes     5 0.01633987
## # ... with 59 more rows

“是”值的比例非常小,从第一个表中可以看出。