Merge 2 dataframes if value within range


Problem description


I have been struggling with this for some time now and couldn't find any way of doing it, so I would be incredibly grateful if you could help! I am a novice in programming and my code is probably inefficient, but this was the best I could come up with.

Basically, I have 2 .csv files (fixes.csv and zones.csv) which contain different variables and have different numbers of rows and columns. The first file fixes.csv contains eye movement data recorded during an experiment and looks something like this:

Order Participant Sentence Fixation StartPosition
1       1          1         1       -6.89
2       1          1         2       -5.88
3       1          1         3       -5.33
4       1          1         4       -4.09
5       1          1         5       -5.36      

This contains eye movement recordings made during sentence reading. What happens is that each of 20 participants reads a set of 40 12-word sentences, making several fixations on different words in each sentence, and sometimes going back to look at previously read words. The StartPosition column contains the position on the screen (in degrees of visual angle) where the fixation started. Values are generally between -8deg and 8deg.

The second file zones.csv contains information about the sentences. Each of the 40 sentences contains 12 words, and each word forms one zone of interest. zones.csv looks something like this:

Sentence     Zone  ZoneStart   ZoneEnd
  1           1     -8.86      -7.49
  1           2     -7.49      -5.89
  1           3     -5.88      -4.51
  1           4     -4.51      -2.90

ZoneStart and ZoneEnd indicate the starting and ending coordinates of each zone on the screen (in degrees of visual angle). Because the words in each sentence are different, each zone has a different width.

What I would like to do is use both files simultaneously in order to assign zone numbers from zones.csv to fixations from fixes.csv. So for example, if the first fixation starting position in Sentence 1 falls within the range of Zone 1, I want the value 1 to be assigned to it so that the end file looks something like this:

Order Participant Sentence Fixation StartPosition Zone
1       1          1        1        -6.89          2
2       1          1        2        -5.88          2
3       1          1        3        -5.33          3
4       1          1        4        -4.09          3
5       1          1        5        -5.36          3   

What I have tried so far is using a loop to automate the process.

zones = read.csv(file.choose(), header = TRUE, sep = ",")
fixes = read.csv(file.choose(), header = TRUE, sep = ",")

fixes$SentNo = as.factor(fixes$SentNo)
zones$Sentence = as.factor(zones$Sentence)
zones$Zone = as.factor(zones$Zone)

nfix = nrow(fixes) ## number of fixations in file fixes.csv
nsent = nlevels(fixes$Sentence) ## number of sentences in data file fixes.csv
nzs = nlevels(zones1$Zone) ## number of zones per sentence from file zones.csv
nsz = nlevels(zones$Sentence) ## number of sentences in data file zones.csv

fixes$Zone = 0

for (i in c(1:nfix)){
  for (j in c(1:nzs)){
    for (k in c(1:nsent){
      for (l in c(1:nsz)){ 
        while(fixes$Sentence[k] == zones$Sentence[l]){
          ifelse(fixes$StartPosition[i] > zones$ZoneStart[j]  
          & fixes$StratPosition[i] < zones1$ZoneEnd[j], 
          fixes$Zone[i] -> zones1$Zone[j], 0)
        return(fixes$Zone)
}
}
}
}

But this just returns loads of zeros, rather than assigning a zone number to each fixation. Is it even possible to use 2 separate .csv files in this way when they have different numbers of rows and columns? I tried merging them by Sentence and working from a large combined file, but that didn't help, as it seemed to mess up the order of fixations in one file and the order of zones in the other.

Any help would be greatly appreciated!

Thank you!

Solution

There is a package in Bioconductor called IRanges that does what you want.

First, form an IRanges object for your zones:

zone.ranges <- with(zones, IRanges(ZoneStart, ZoneEnd))

Next, find the overlaps:

zone.ind <- findOverlaps(fixes$StartPosition, zone.ranges, select="arbitrary")

Now you have indices into the rows of the zones data frame, so you can merge:

fixes$Zone <- zones$Zone[zone.ind]
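
For comparison, the same range lookup can probably be sketched in base R without IRanges, which also makes it easy to restrict each fixation to the zones of its own sentence. The sketch below is only an illustration: the toy data copies the values shown in the question, the half-open interval [ZoneStart, ZoneEnd) is an assumption about how shared boundaries should be treated, and it assumes Order uniquely identifies a fixation so the original row order can be restored after the merge.

```r
# Toy data copied from the tables in the question
fixes <- data.frame(
  Order = 1:5,
  Participant = 1,
  Sentence = 1,
  Fixation = 1:5,
  StartPosition = c(-6.89, -5.88, -5.33, -4.09, -5.36)
)
zones <- data.frame(
  Sentence = 1,
  Zone = 1:4,
  ZoneStart = c(-8.86, -7.49, -5.88, -4.51),
  ZoneEnd = c(-7.49, -5.89, -4.51, -2.90)
)

# Pair every fixation with every zone of the same sentence
candidates <- merge(fixes, zones, by = "Sentence")

# Keep only the pairs where the fixation starts inside the zone
# (assumption: start is inclusive, end is exclusive)
in_zone <- candidates$StartPosition >= candidates$ZoneStart &
  candidates$StartPosition < candidates$ZoneEnd
result <- candidates[in_zone, ]

# merge() reorders rows, so sort back by Order and drop the helper columns
result <- result[order(result$Order),
                 c("Order", "Participant", "Sentence", "Fixation",
                   "StartPosition", "Zone")]
```

With the boundaries listed in the question, this places -5.88 in zone 3 and -4.09 in zone 4. Fixations that fall in no zone (for example in the small gap between -5.89 and -5.88) are dropped by the filter; a left join back onto fixes would be needed to keep them with a missing Zone.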

Edit: Just realized you have floating point values, while IRanges is integer-based. So you would need to multiply the coordinates by 100, given your precision.
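
The scaling step mentioned in the edit might look roughly like the sketch below. The helper name to_hundredths is hypothetical, the toy data copies the values from the question, and the IRanges part only runs if the package happens to be installed. Rounding before converting to integer is a precaution against binary floating-point error, since as.integer() truncates.

```r
# Hypothetical helper: convert degrees to integer hundredths of a degree.
# round() comes first because as.integer() truncates toward zero.
to_hundredths <- function(x) as.integer(round(x * 100))

# Toy data copied from the question (sentence 1 only)
zones <- data.frame(
  Zone = 1:4,
  ZoneStart = c(-8.86, -7.49, -5.88, -4.51),
  ZoneEnd = c(-7.49, -5.89, -4.51, -2.90)
)
fixes <- data.frame(StartPosition = c(-6.89, -5.88, -5.33, -4.09, -5.36))

start_int <- to_hundredths(fixes$StartPosition)

# Only attempt the overlap lookup when IRanges is available
if (requireNamespace("IRanges", quietly = TRUE)) {
  zone.ranges <- IRanges::IRanges(to_hundredths(zones$ZoneStart),
                                  to_hundredths(zones$ZoneEnd))
  # Represent each fixation as a width-1 range at its scaled position
  query <- IRanges::IRanges(start = start_int, width = 1L)
  zone.ind <- IRanges::findOverlaps(query, zone.ranges, select = "arbitrary")
  fixes$Zone <- zones$Zone[zone.ind]
}
```

Note that IRanges intervals are closed at both ends, so a position exactly on a shared boundary (such as -7.49 here) overlaps two zones and select = "arbitrary" picks one of them. This sketch also covers a single sentence only; with 40 sentences the lookup would presumably have to be repeated per sentence.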
