For循环提取分散在另一个R数据帧中多个列上的数据 [英] For loop to extract data scattered across multiple columns in another R dataframe

查看:71
本文介绍了For循环提取分散在另一个R数据帧中多个列上的数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个调查问题,受访者可以选择多个答案(对于16种可能的组合,例如您喜欢哪种颜色?可能会导致回答红色,蓝色,绿色,黄色或红色,蓝色,绿色 ,黑色等。

I have a survey question in which respondents could select multiple answers (for 16 possible combinations, e.g. "Which color do you like?" can result in responses "red, blue, green, yellow" or "red, blue, green, black" etc.

这16种可能的组合包含在电子表格中:

These 16 possible combinations are contained in a spreadsheet:

图片1:电子表格的前两行(完整的电子表格有16行)

示例1:

structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("red", "ruby"), class = "factor"), 
V2 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L), .Label = c("blue", "violet"), class = "factor"), 
V3 = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 
2L, 2L, 1L, 1L, 2L, 2L), .Label = c("green", "turqoise"), class = "factor"), 
V4 = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L), .Label = c("black", "yellow"), class = "factor")), .Names = c("V1", 
"V2", "V3", "V4"), class = "data.frame", row.names = c(NA, -16L
))

带有答案的数据框有16列针对此问题(每简单一列颜色的组合)。如果受访者1选择了第一个组合,则只有第一列包含数据;否则,只有第一列包含数据。同样,如果受访者2选择了第二组合,则第二列包含数据。另一个为空:

The dataframe with responses has sixteen columns for this question (one column per every simple combination of colors). If respondent 1 selected the first combination, only the first column contains data; similarly, if respondent 2 selected the second combination, the second column contains data. The other are empty:

图片2 :数据框的前两列

示例2:

structure(list(respondentID = 1:16, v1 = c(1L, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v2 = c(NA, 1L, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v3 = c(NA, 
NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
v4 = c(NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, 1L, 1L, NA, 
NA, NA, NA), v5 = c(NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA), v6 = c(NA, 1L, NA, NA, NA, NA, NA, 
NA, NA, 1L, NA, NA, NA, NA, NA, NA), v7 = c(NA, NA, NA, NA, 
1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v8 = c(NA, 
NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), v9 = c(NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, 
NA, NA, NA, NA), v10 = c(NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA), v11 = c(NA, NA, NA, NA, 
NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA), v12 = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA
), v13 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 1L, NA, NA), v14 = c(NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA), v15 = c(NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v16 = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L
)), .Names = c("respondentID", "v1", "v2", "v3", "v4", "v5", 
"v6", "v7", "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15", 
"v16"), class = "data.frame", row.names = c(NA, -16L))

(当然,实际上,受访者1不一定选择组合1)。

(Of course, in practice respondent 1 didn't necessarily choose combination 1).

数据框中的所有信息都是数字 1,对应于电子表格中的适当组合。

All the information in the dataframe is the number "1", which corresponds to appropriate combination in the spreadsheet.

在订单中为了分析问题的答案,我需要从电子表格中提取组合,并将其导入到具有响应的数据框中,以便在数据框中获得四个新列,其中包含受访者选择的颜色(例如受访者1)的红色,蓝色,绿色,黄色。

In order to analyze responses to the question, I need to extract the combination from the spreadsheet and import it into the dataframe with responses, so that I get four new columns in the dataframe with the combination of colors chosen by a respondent (e.g. red, blue, green, yellow for respondent 1).

我认为没有任何方法可以使用apply进行操作,所以我想我需要写一个for循环以提取和导入数据。有关如何执行此操作的任何建议?

I don't think there's any way to do this using apply, so I guess I need to write a for loop to extract and import the data. Any advice on how to do this?

推荐答案

如果将第二个数据框放长形,则可以过滤每个人选择的组合,然后将第二个数据框与第一个数据框合并。这两个数据框具有组合标签,可以在这两个数据框之间进行协调。

If you put the second data frame into a long shape, you can filter for just the combinations each person chose, and then join the second data frame with the first. The two data frames have combination labels that can be reconciled between the two to join on.

请注意,我在第一个数据框中更改了列名, df1_with_id ,为 color1 ,依此类推,仅是因为否则您将拥有 v1 v2 ,...在一个数据帧中,以及 V1 V2 ,...代表彼此不同的东西。

Note that I changed the column names in the first data frame, df1_with_id, to be color1, etc, only because otherwise you would have v1, v2, ... in one data frame, and V1, V2, ... representing something different in the other. Not a necessary change, but it's good to keep from confusing what different variables mean.

library(tidyverse)

df1 <- structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("red", "ruby"), class = "factor"),V2 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L,1L, 1L, 2L, 2L, 2L, 2L), .Label = c("blue", "violet"), class = "factor"),V3 = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L,2L, 2L, 1L, 1L, 2L, 2L), .Label = c("green", "turqoise"), class = "factor"),V4 = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,2L, 1L, 2L, 1L, 2L, 1L), .Label = c("black", "yellow"), class = "factor")), .Names = c("V1","V2", "V3", "V4"), class = "data.frame", row.names = c(NA, -16L))

df2 <- structure(list(respondentID = 1:16, v1 = c(1L, NA, NA, NA, NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v2 = c(NA, 1L, NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v3 = c(NA,NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA),v4 = c(NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, 1L, 1L, NA,NA, NA, NA), v5 = c(NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA,NA, NA, NA, NA, NA, NA), v6 = c(NA, 1L, NA, NA, NA, NA, NA,NA, NA, 1L, NA, NA, NA, NA, NA, NA), v7 = c(NA, NA, NA, NA,1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v8 = c(NA,NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v9 = c(NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA,NA, NA, NA, NA), v10 = c(NA, NA, NA, NA, NA, NA, NA, NA,NA, NA, NA, NA, NA, NA, NA, NA), v11 = c(NA, NA, NA, NA,NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA), v12 = c(NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA), v13 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,NA, 1L, NA, NA), v14 = c(NA, NA, NA, NA, NA, NA, NA, NA,NA, NA, NA, NA, NA, NA, NA, NA), v15 = c(NA, NA, NA, NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v16 = c(NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L)), .Names = c("respondentID", "v1", "v2", "v3", "v4", "v5","v6", "v7", "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15","v16"), class = "data.frame", row.names = c(NA, -16L))

df1_with_id <- df1 %>% 
    setNames(paste0("color", 1:4)) %>%
    mutate(combo = paste0("v", row_number()))

head(df1_with_id)
#>   color1 color2   color3 color4 combo
#> 1    red   blue    green yellow    v1
#> 2    red   blue    green  black    v2
#> 3    red   blue turqoise yellow    v3
#> 4    red   blue turqoise  black    v4
#> 5    red violet    green yellow    v5
#> 6    red violet    green  black    v6

df2 %>%
    gather(key = combo, value = val, -respondentID) %>%
    filter(!is.na(val)) %>%
    left_join(df1_with_id, by = "combo")
#>    respondentID combo val color1 color2   color3 color4
#> 1             1    v1   1    red   blue    green yellow
#> 2             2    v2   1    red   blue    green  black
#> 3             7    v3   1    red   blue turqoise yellow
#> 4             4    v4   1    red   blue turqoise  black
#> 5            11    v4   1    red   blue turqoise  black
#> 6            12    v4   1    red   blue turqoise  black
#> 7             3    v5   1    red violet    green yellow
#> 8             2    v6   1    red violet    green  black
#> 9            10    v6   1    red violet    green  black
#> 10            5    v7   1    red violet turqoise yellow
#> 11            6    v8   1    red violet turqoise  black
#> 12            8    v9   1   ruby   blue    green yellow
#> 13            9   v11   1   ruby   blue turqoise yellow
#> 14           13   v12   1   ruby   blue turqoise  black
#> 15           14   v13   1   ruby violet    green yellow
#> 16           16   v16   1   ruby violet turqoise  black

reprex包(v0.2.0)。

这篇关于For循环提取分散在另一个R数据帧中多个列上的数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆