Joining two datasets using fuzzy logic
Question
I'm trying to do a fuzzy join in R between two datasets:
- The first dataset has the name of a location and a column called Config.
- The second dataset has the name of a location and two additional attributes that need to be summarized before they are joined to the first dataset.
I would like to join the two datasets on the Name column. However, the Name column may have additional or leading characters in either dataset, or one name may be contained inside a longer one. For example, given the two datasets in this question, I'd like OPAL to join to OPALAS, and SAUSALITO Y to join to SAUSALITO.
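To make the matching requirement concrete, here is a minimal base R sketch (not part of the original question) that compares the intended name pairs in two ways: edit distance via `adist()` and plain substring containment via `grepl()`. The vectors `left` and `right` are illustrative names, filled in from the example data in this question.

```r
# Name pairs from the example data that are expected to match.
left  <- c("ALTO D", "CONTRA", "EIGHT A", "OPALAS", "SAUSALITO Y", "SOLANO J")
right <- c("ALTO D", "CONTRA", "EIGHTH",  "OPAL",   "SAUSALITO",   "SOLANO")

# Generalized Levenshtein distance for each intended pair:
# adist() returns the full 6 x 6 matrix, the diagonal holds the pairs.
pair_dist <- diag(adist(left, right))

# Substring containment in either direction, for comparison.
contained <- mapply(
  function(a, b) grepl(b, a, fixed = TRUE) || grepl(a, b, fixed = TRUE),
  left, right, USE.NAMES = FALSE
)

data.frame(left, right, pair_dist, contained)
```

Every intended pair is within an edit distance of 2, while containment alone misses EIGHT A / EIGHTH, which suggests that a distance-based join is the better fit for this data.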
Dataset1:
Name         Config
ALTO D       BB
CONTRA       ST
EIGHT A      DD
OPALAS       BB
SAUSALITO Y  AA
SOLANO J     ST
Dataset2:
Name       Age  Rank
ALTO D     50   2
ALTO D     20   6
CONTRA     10   10
CONTRA     15   15
EIGHTH     18   21
OPAL       19   4
SAUSALITO  2    12
SOLANO     34   43
Summarization code for Dataset2:
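library(doBy)  # provides summaryBy()
# Mean Age and total Rank per Name
Data2a <- summaryBy(Age ~ Name, FUN = mean, data = Data2, na.rm = TRUE)
Data2b <- summaryBy(Rank ~ Name, FUN = sum, data = Data2, na.rm = TRUE)
# Name the columns explicitly so the result can be joined on Name
Data2 <- data.frame(Name = Data2a$Name, Age = Data2a$Age.mean, Rank = Data2b$Rank.sum)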
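For readers without the doBy package, the same summary can be sketched in base R with `aggregate()` and `merge()`. This is an alternative under stated assumptions, not the original poster's code: the input data frame is called `Dataset2` here and is typed in from the example data, and `age_mean` and `rank_sum` are illustrative names.

```r
# Example rows of the second dataset, as given in the question.
Dataset2 <- data.frame(
  Name = c("ALTO D", "ALTO D", "CONTRA", "CONTRA",
           "EIGHTH", "OPAL", "SAUSALITO", "SOLANO"),
  Age  = c(50, 20, 10, 15, 18, 19, 2, 34),
  Rank = c(2, 6, 10, 15, 21, 4, 12, 43)
)

# Mean Age and total Rank per Name; the formula interface
# drops rows with missing values by default.
age_mean <- aggregate(Age ~ Name, data = Dataset2, FUN = mean)
rank_sum <- aggregate(Rank ~ Name, data = Dataset2, FUN = sum)

# Combine the two summaries into one row per Name.
Data2 <- merge(age_mean, rank_sum, by = "Name")
Data2
```

The result has one row per distinct name with columns Name, Age and Rank, which is the shape the join step expects.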
Desired outcome:
Name         Config  Age   Rank
ALTO D       BB      35    8
CONTRA       ST      12.5  25
EIGHT A      DD      18    21
OPALAS       BB      19    4
SAUSALITO Y  AA      2     12
SOLANO J     ST      34    43
Recommended answer
I was able to join the two datasets using the fuzzyjoin package:
library(fuzzyjoin)
# Inner join on approximately matching Name values (string distance)
stringdist_inner_join(Dataset1, Data2,
                      by = "Name", distance_col = NULL)
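The call above relies on fuzzyjoin's default `max_dist = 2`. To show roughly what such a join does, here is a base R sketch that needs no extra packages. It is an illustration under assumptions, not fuzzyjoin's implementation: it uses `adist()` (generalized Levenshtein distance), whereas fuzzyjoin defaults to the "osa" method from stringdist, and the two differ when adjacent characters are transposed. The data frames are typed in from the example data, with `Data2` already summarized (mean Age, summed Rank), and `d`, `hits` and `joined` are illustrative names.

```r
Dataset1 <- data.frame(
  Name   = c("ALTO D", "CONTRA", "EIGHT A", "OPALAS", "SAUSALITO Y", "SOLANO J"),
  Config = c("BB", "ST", "DD", "BB", "AA", "ST")
)
# Second dataset after summarizing (mean Age, summed Rank per Name).
Data2 <- data.frame(
  Name = c("ALTO D", "CONTRA", "EIGHTH", "OPAL", "SAUSALITO", "SOLANO"),
  Age  = c(35, 12.5, 18, 19, 2, 34),
  Rank = c(8, 25, 21, 4, 12, 43)
)

max_dist <- 2  # assumed threshold, chosen to mirror fuzzyjoin's default

# Edit-distance matrix: rows are Dataset1 names, columns are Data2 names.
d <- adist(Dataset1$Name, Data2$Name)

# All (row, col) index pairs whose distance is within the threshold.
hits <- which(d <= max_dist, arr.ind = TRUE)

# Build the inner join from the matching index pairs.
joined <- data.frame(
  Name.x = Dataset1$Name[hits[, "row"]],
  Name.y = Data2$Name[hits[, "col"]],
  Config = Dataset1$Config[hits[, "row"]],
  Age    = Data2$Age[hits[, "col"]],
  Rank   = Data2$Rank[hits[, "col"]]
)
joined <- joined[order(joined$Name.x), ]
joined
```

With this example data each name in Dataset1 matches exactly one name in Data2. On larger data a threshold of 2 can pair short, unrelated names, so it may be worth lowering `max_dist` or inspecting the distances (in fuzzyjoin, by setting `distance_col` to a column name) before trusting the result.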