如果值在范围内,则合并 2 个数据帧 [英] Merge 2 dataframes if value within range

查看:24
本文介绍了如果值在范围内,则合并 2 个数据帧的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经为此苦苦挣扎了一段时间,但找不到任何方法,因此如果您能提供帮助,我将不胜感激!我是编程新手,我的代码可能效率低下,但这是我能想到的最好的方法.

基本上,我有 2 个 .csv 文件(fixes.csv 和 zone.csv),它们包含不同的变量并具有不同的行数和列数.第一个文件 fix.csv 包含在实验过程中记录的眼球运动数据,看起来像这样:

Order Participant Sentence Fixation StartPosition1 1 1 1 -6.892 1 1 2 -5.883 1 1 3 -5.334 1 1 4 -4.095 1 1 5 -5.36

这包含在句子阅读期间制作的眼动记录.所发生的情况是,20 名参与者每人阅读一组 40 个 12 个单词的句子,对每个句子中的不同单词进行多次注视,有时还会回顾以前读过的单词.StartPosition 列包含屏幕上注视开始的位置(以视角为单位).值通常在 -8deg 和 8deg 之间.

第二个文件 zone.csv 包含有关句子的信息.40 个句子中的每一个都包含 12 个单词,每个单词形成一个兴趣区.zone.csv 看起来像这样:

句区 ZoneStart ZoneEnd1 1 -8.86 -7.491 2 -7.49 -5.891 3 -5.88 -4.511 4 -4.51 -2.90

ZoneStart 和 ZoneEnd 表示屏幕上每个区域的开始和结束坐标(以视角为单位).因为每个句子中的单词不同,每个区域都有一个宽度.

我想要做的是同时使用这两个文件,以便将 zone.csv 中的区域编号分配到 fix.csv 中的注视点.因此,例如,如果句子 1 中的第一个注视起始位置落在区域 1 的范围内,我希望将值 1 分配给它,以便结束文件看起来像这样:

Order Participant Sentence Fixation StartPosition Zone1 1 1 1 -6.89 22 1 1 2 -5.88 23 1 1 3 -5.33 34 1 1 4 -4.09 35 1 1 5 -5.36 3

到目前为止我尝试过的是使用循环来自动化该过程.

zones = read.csv(file.choose(), header = TRUE, sep = ",")修复 = read.csv(file.choose(), header = TRUE, sep = ",")fixs$SentNo = as.factor(fixes$SentNo)zone$Sentence = as.factor(zones$Sentence)zone$Zone = as.factor(zones$Zone)nfix = nrow(fixes) ##fixes.csv 文件中的注视次数nsent = nlevels(fixes$Sentence) ##数据文件fixes.csv中的句子数nzs =​​ nlevels(zones1$Zone) ## 文件zones.csv 中每个句子的区域数nsz = nlevels(zones$Sentence) ##数据文件zones.csv中的句子数修复 $Zone = 0for (i in c(1:nfix)){for (j in c(1:nzs)){for (k in c(1:nsent){for (l in c(1:nsz)){while(fixes$Sentence[k] == zone$Sentence[l]){ifelse(fixes$StartPosition[i]>zones$ZoneStart[j]&修复$StratPosition[i] <zone1$ZoneEnd[j],修复$Zone[i] ->zone1$Zone[j], 0)返回(修复$区域)}}}}

但这只会返回大量的零,而不是为每个注视点分配一个区域编号.当它们具有不同的行数和列数时,甚至可以以这种方式使用 2 个单独的 .csv 文件吗?我尝试通过 Sentence 合并它们并从一个大的组合文件中工作,但这没有帮助,因为它似乎混淆了一个文件中的注视顺序和另一个文件中的区域顺序.

任何帮助将不胜感激!

谢谢!

解决方案

使用 v1.9.8 版本(CRAN 2016 年 11 月 25 日),data.table 获得了执行非-equi 连接范围连接:

library(data.table)setDT(fixes)[setDT(zones),on = .(Sentence, StartPosition >= ZoneStart, StartPosition 

<块引用>

 Order Participant Sentence Fixation StartPosition Zone1: 1 1 1 1 -6.89 22: 2 1 1 2 -5.88 33:3 1 1 3 -5.33 34:4 1 1 4 -4.09 45:5 1 1 5 -5.36 3

数据

修复了 <- readr::read_table("Order Participant Sentence Fixation StartPosition1 1 1 1 -6.892 1 1 2 -5.883 1 1 3 -5.334 1 1 4 -4.095 1 1 5 -5.36")区域 <- readr::read_table("句子区 ZoneStart ZoneEnd1 1 -8.86 -7.491 2 -7.49 -5.891 3 -5.88 -4.511 4 -4.51 -2.90")

I have been struggling with this for some time now and couldn't find any way of doing it, so I would be incredibly grateful if you could help! I am a novice in programming and my code is probably inefficient, but this was the best I could come up with.

Basically, I have 2 .csv files (fixes.csv and zones.csv) which contain different variables and have different numbers of rows and columns. The first file fixes.csv contains eye movement data recorded during an experiment and looks something like this:

Order Participant Sentence Fixation StartPosition
1       1          1         1       -6.89
2       1          1         2       -5.88
3       1          1         3       -5.33
4       1          1         4       -4.09
5       1          1         5       -5.36      

This contains eye movement recordings made during sentence reading. What happens is that each of 20 participants reads a set of 40 12-word sentences, making several fixations on different words in each sentence, and sometimes going back to look at previously read words. The StartPosition column contains the position on the screen (in degrees of visual angle) where the fixation started. Values are generally between -8deg and 8deg.

The second file zones.csv contains information about the sentences. Each of the 40 sentences contains 12 words, and each word forms one zone of interest. zones.csv looks something like this:

Sentence     Zone  ZoneStart   ZoneEnd
  1           1     -8.86      -7.49
  1           2     -7.49      -5.89
  1           3     -5.88      -4.51
  1           4     -4.51      -2.90

ZoneStart and ZoneEnd indicate the starting and ending coordinates of each zone on the screen (in deg of visual angle). Because the words in each sentence are different, each zone has a width.

What I would like to do is use both files simultaneously in order to assign zone numbers from zones.csv to fixations from fixes.csv. So for example, if the first fixation starting position in Sentence 1 falls within the range of Zone 1, I want the value 1 to be assigned to it so that the end file looks something like this:

Order Participant Sentence Fixation StartPosition Zone
1       1          1        1        -6.89          2
2       1          1        2        -5.88          2
3       1          1        3        -5.33          3
4       1          1        4        -4.09          3
5       1          1        5        -5.36          3   

What I have tried so far is using a loop to automate the process.

zones = read.csv(file.choose(), header = TRUE, sep = ",")
fixes = read.csv(file.choose(), header = TRUE, sep = ",")

fixes$SentNo = as.factor(fixes$SentNo)
zones$Sentence = as.factor(zones$Sentence)
zones$Zone = as.factor(zones$Zone)

nfix = nrow(fixes) ## number of fixations in file fixes.csv
nsent = nlevels(fixes$Sentence) ## number of sentences in data file fixes.csv
nzs = nlevels(zones1$Zone) ## number of zones per sentence from file zones.csv
nsz = nlevels(zones$Sentence) ## number of sentences in data file zones.csv

fixes$Zone = 0

for (i in c(1:nfix)){
  for (j in c(1:nzs)){
    for (k in c(1:nsent){
      for (l in c(1:nsz)){ 
        while(fixes$Sentence[k] == zones$Sentence[l]){
          ifelse(fixes$StartPosition[i] > zones$ZoneStart[j]  
          & fixes$StratPosition[i] < zones1$ZoneEnd[j], 
          fixes$Zone[i] -> zones1$Zone[j], 0)
        return(fixes$Zone)
}
}
}
}

But this just returns loads of zeros, rather than assigning a zone number to each fixation. Is it even possible to use 2 separate .csv files in this way when they have different numbers of rows and columns? I tried merging them by Sentence and working from a large combined file, but that didn't help, as it seemed to mess up the order of fixations in one file and the order of zones in the other.

Any help would be greatly appreciated!

Thank you!

解决方案

With version v1.9.8 (on CRAN 25 Nov 2016), data.table has gained the ability to perform non-equi joins and range joins:

library(data.table)
setDT(fixes)[setDT(zones), 
             on = .(Sentence, StartPosition >= ZoneStart, StartPosition < ZoneEnd), 
             Zone := Zone][]

   Order Participant Sentence Fixation StartPosition Zone
1:     1           1        1        1         -6.89    2
2:     2           1        1        2         -5.88    3
3:     3           1        1        3         -5.33    3
4:     4           1        1        4         -4.09    4
5:     5           1        1        5         -5.36    3

Data

fixes <- readr::read_table(
  "Order Participant Sentence Fixation StartPosition
  1       1          1         1       -6.89
  2       1          1         2       -5.88
  3       1          1         3       -5.33
  4       1          1         4       -4.09
  5       1          1         5       -5.36"
)
zones <- readr::read_table(
  "Sentence     Zone  ZoneStart   ZoneEnd
  1           1     -8.86      -7.49
  1           2     -7.49      -5.89
  1           3     -5.88      -4.51
  1           4     -4.51      -2.90"
)

这篇关于如果值在范围内,则合并 2 个数据帧的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆