Check if values of one dataframe exist in another dataframe in exact order
I have one dataframe of data and multiple "reference" dataframes. I'm trying to automate checking whether the values of the dataframe match the values of the reference dataframes. Importantly, the values must also be in the same order as in the reference dataframes. The columns shown below are the ones that matter, but my real dataset contains many more columns.
Below is a toy dataset.
Dataframe
group type value
1 A Teddy
1 A William
1 A Lars
2 B Dolores
2 B Elsie
2 C Maeve
2 C Charlotte
2 C Bernard
Reference_A
type value
A Teddy
A William
A Lars
Reference_B
type value
B Elsie
B Dolores
Reference_C
type value
C Maeve
C Hale
C Bernard
For example, in the toy dataset, group 1 would score 1.0 (100% correct) for type A because all of its values match the values in Reference_A, in the same order. Group 2, however, would score 0.0 for type B because its values are out of order compared to Reference_B, and 0.66 for type C because 2 of its 3 values match Reference_C in both value and position.
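The scoring rule described here can be sketched in base R as a position-wise comparison followed by a mean. This is only an illustration of the arithmetic on the toy values; the variable names (observed, reference, score_c, score_b) are made up for the example.

```r
# Values observed for type C in group 2, and the reference order for type C
observed  <- c("Maeve", "Charlotte", "Bernard")
reference <- c("Maeve", "Hale", "Bernard")

# Element-wise comparison: TRUE only where the same value sits at the same position
matches <- observed == reference

# The mean of a logical vector is the proportion of TRUE values (2 of 3 here)
score_c <- mean(matches)

# Type B: the same two values, but swapped, so no position matches
score_b <- mean(c("Dolores", "Elsie") == c("Elsie", "Dolores"))
```

Because the comparison is by position, a value that exists in the reference but at a different position counts as a mismatch, which is why type B scores 0.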
Desired output
group type score
1 A 1.0
2 B 0.0
2 C 0.66
This was helpful, but does not take order into account: Check whether values in one data frame column exist in a second data frame
Update: Thank you to everyone who has provided solutions! They work well for the toy dataset, but I have not yet been able to adapt them to datasets with more columns. As I wrote above, the columns listed here are the important ones, and I'd prefer not to drop the other columns unless necessary.
We may also do this with mget to return a list of data.frames, bind them together, and take the grouped mean of a logical vector.
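library(dplyr)
# Assumes df1 is the data frame and Reference_A, Reference_B, ... exist in the
# workspace, with rows in the same order and number as the rows of df1.
mget(ls(pattern = '^Reference_[A-Z]$')) %>%   # named list of reference data.frames
  bind_rows() %>%                             # stack them: type, value
  bind_cols(df1) %>%                          # duplicated names become type...1, value...2, type...4, value...5
  group_by(group, type = type...1) %>%
  summarise(score = mean(value...2 == value...5))  # share of position-wise matches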
# Groups: group [2]
# group type score
# <int> <chr> <dbl>
#1 1 A 1
#2 2 B 0
#3 2 C 0.667
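For data with more columns, one possible variation is to join on an explicit position index instead of binding columns side by side. The following is a minimal sketch under stated assumptions, not the answer's method: the extra column and the helper names pos and ref_value are hypothetical, and the score is taken over the rows present in df1 for each group and type.

```r
library(dplyr)

# Toy data from the question, plus a hypothetical extra column
df1 <- tibble(
  group = c(1, 1, 1, 2, 2, 2, 2, 2),
  type  = c("A", "A", "A", "B", "B", "C", "C", "C"),
  value = c("Teddy", "William", "Lars", "Dolores", "Elsie",
            "Maeve", "Charlotte", "Bernard"),
  extra = seq_len(8)
)

Reference_A <- tibble(type = "A", value = c("Teddy", "William", "Lars"))
Reference_B <- tibble(type = "B", value = c("Elsie", "Dolores"))
Reference_C <- tibble(type = "C", value = c("Maeve", "Hale", "Bernard"))

# Stack the references and record each value's position within its type
refs <- bind_rows(Reference_A, Reference_B, Reference_C) %>%
  group_by(type) %>%
  mutate(pos = row_number()) %>%
  ungroup() %>%
  rename(ref_value = value)

# Record positions in df1 the same way, then match on type and position
scores <- df1 %>%
  group_by(group, type) %>%
  mutate(pos = row_number()) %>%
  ungroup() %>%
  left_join(refs, by = c("type", "pos")) %>%
  group_by(group, type) %>%
  # A row with no reference at that position counts as a mismatch
  summarise(score = mean(!is.na(ref_value) & value == ref_value),
            .groups = "drop")
```

Since the join only adds ref_value next to the existing columns, the other columns of df1 are still available before the summarise step, and the result no longer depends on the references being stacked in the same row order as df1.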