R: Trying to count the number of occurrences in one data frame based on the positions of the other data frame


Problem description

I have two data frames, X and Y.

X <- data.frame(V1 = c("chr1", "chr1", "chr1", "chr2", "chr2", "ch2"),
                Start = c(0, 540, 920, 0, 582, 715 ),
                Stop = c(230, 720, 1270, 350, 635, 950))

Y <- data.frame(V1 = c("chr1", "chr1", "chr1", "chr2", "chr2", "ch2"),
                Start = c(3, 16, 180,
                          15, 585, 800 ),
                Stop = c(15, 24, 201,
                         102, 612, 850))

I want to obtain a new data.frame Z that contains the information from X plus, for each row of "X", the number of rows of "Y" that fall within that row's range. For example, three rows of "Y" in chr1 fall within the range of the first row of "X", so that row of "Z" has a count of 3.

Z <- data.frame(V1 = c("chr1", "chr1", "chr1", "chr2", "chr2", "ch2"),
                Start = c(0, 540, 920, 0, 582, 715 ),
                Stop = c(230, 720, 1270, 350, 635, 950),
                Count = c(3, 0, 0, 1, 1, 1))

I would appreciate some help. So far I have only managed to print the number of matching rows when the "X" dataset has a single row, and I don't know how to extend that to my goal. I suppose I need some conditional statements plus a for loop that iterates over the rows of "X", but I don't know how to write it.

Things I have tried:

  1. Tried to count the rows of "Y" that match the criteria when "X" has only one row:

nrow(Y[Y$Start >= X$Start & Y$Stop <= X$Stop, ])

This worked when "X" had only one row, but not when I tried to use it inside a for loop.
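The expression above compares `Y$Start` with `X$Start` element by element (row i of "Y" against row i of "X"), so once "X" has more than one row it no longer tests every row of "Y" against a single row of "X". A minimal base R sketch of the for loop described above follows. It assumes that a row of "Y" counts only when its V1 matches and its interval lies fully inside the "X" interval, which is what the expected Z suggests; the names `counts` and `inside` are illustrative and not from the original post.

```r
X <- data.frame(V1 = c("chr1", "chr1", "chr1", "chr2", "chr2", "ch2"),
                Start = c(0, 540, 920, 0, 582, 715),
                Stop = c(230, 720, 1270, 350, 635, 950),
                stringsAsFactors = FALSE)

Y <- data.frame(V1 = c("chr1", "chr1", "chr1", "chr2", "chr2", "ch2"),
                Start = c(3, 16, 180, 15, 585, 800),
                Stop = c(15, 24, 201, 102, 612, 850),
                stringsAsFactors = FALSE)

# One counter per row of X
counts <- integer(nrow(X))

for (i in seq_len(nrow(X))) {
  # Compare every row of Y against the i-th row of X only
  inside <- Y$V1 == X$V1[i] &
    Y$Start >= X$Start[i] &
    Y$Stop <= X$Stop[i]
  counts[i] <- sum(inside)  # number of TRUE values
}

Z <- cbind(X, Count = counts)
print(Z)
```

Indexing "X" with `[i]` inside the loop is the step the one-line attempt was missing.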

Recommended answer

You can do this using the tidyverse package.

First, I would recommend setting the option stringsAsFactors = FALSE when creating the data frames.
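As a small illustration of what this option changes (the data frames `df_factor` and `df_text` are hypothetical, not from the original post): with `stringsAsFactors = TRUE` the V1 column is stored as a factor, while with `FALSE` it stays a plain character vector, which is simpler to compare across two data frames. Since R 4.0.0, `FALSE` is already the default for `data.frame()`.

```r
# Same input, two settings of stringsAsFactors
df_factor <- data.frame(V1 = c("chr1", "chr2"), stringsAsFactors = TRUE)
df_text   <- data.frame(V1 = c("chr1", "chr2"), stringsAsFactors = FALSE)

class(df_factor$V1)  # "factor"
class(df_text$V1)    # "character"
```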

X <- data.frame(V1 = c("chr1", "chr1", "chr1", "chr2", "chr2", "ch2"),
                Start = c(0, 540, 920, 0, 582, 715 ),
                Stop = c(230, 720, 1270, 350, 635, 950), stringsAsFactors = FALSE)

Y <- data.frame(V1 = c("chr1", "chr1", "chr1", "chr2", "chr2", "ch2"),
                Start = c(3, 16, 180,
                          15, 585, 800 ),
                Stop = c(15, 24, 201,
                         102, 612, 850), stringsAsFactors = FALSE)



library(tidyverse)

# For each row of X, count the rows of Y with the same V1 that lie within [Start, Stop]
X %>%
  mutate(count = pmap_int(list(V1, Start, Stop),
                          ~ filter(Y, V1 == ..1, Start >= ..2, Stop <= ..3) %>% nrow()))

    V1 Start Stop count
1 chr1     0  230     3
2 chr1   540  720     0
3 chr1   920 1270     0
4 chr2     0  350     1
5 chr2   582  635     1
6  ch2   715  950     1
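If the tidyverse is not available, the same counts can be obtained in base R. The sketch below is an alternative, not the answer's method: it pairs every row of X with every row of Y that shares the same V1 using `merge()`, keeps the pairs where the Y interval lies inside the X interval, and tallies them per X row with `tabulate()`. The helper column `id` and the names `pairs` and `inside` are illustrative. Because it builds all same-V1 pairs, it may use a lot of memory on large inputs.

```r
X <- data.frame(V1 = c("chr1", "chr1", "chr1", "chr2", "chr2", "ch2"),
                Start = c(0, 540, 920, 0, 582, 715),
                Stop = c(230, 720, 1270, 350, 635, 950),
                stringsAsFactors = FALSE)

Y <- data.frame(V1 = c("chr1", "chr1", "chr1", "chr2", "chr2", "ch2"),
                Start = c(3, 16, 180, 15, 585, 800),
                Stop = c(15, 24, 201, 102, 612, 850),
                stringsAsFactors = FALSE)

# Tag each row of X with its row number, then pair it with all rows of Y on the same V1.
# Columns present in both inputs get the suffixes .x (from X) and .y (from Y).
pairs <- merge(cbind(X, id = seq_len(nrow(X))), Y,
               by = "V1", suffixes = c(".x", ".y"))

# Keep the pairs where the Y interval lies inside the X interval
inside <- pairs$Start.y >= pairs$Start.x & pairs$Stop.y <= pairs$Stop.x

# Count surviving pairs per X row; rows of X with no match get 0
Z <- cbind(X, Count = tabulate(pairs$id[inside], nbins = nrow(X)))
print(Z)
```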
