Subset of a data frame in R
Question
I have two data frames, df2 and DF.
> DF
date tickers
1 2000-01-01 B
2 2000-01-01 GOOG
3 2000-01-01 V
4 2000-01-01 YHOO
5 2000-01-02 XOM
> df2
date tickers quantities
1 2000-01-01 BB 11
2 2000-01-01 XOM 23
3 2000-01-01 GOOG 42
4 2000-01-01 YHOO 21
5 2000-01-01 V 2112
6 2000-01-01 B 13
7 2000-01-02 XOM 24
8 2000-01-02 BB 422
I need the rows of df2 whose date and ticker combination is present in DF. That means I require the following output:
3 2000-01-01 GOOG 42
4 2000-01-01 YHOO 21
5 2000-01-01 V 2112
6 2000-01-01 B 13
7 2000-01-02 XOM 24
So I used the following code:
> subset(df2,df2$date %in% DF$date & df2$tickers %in% DF$tickers)
date tickers quantities
2 2000-01-01 XOM 23
3 2000-01-01 GOOG 42
4 2000-01-01 YHOO 21
5 2000-01-01 V 2112
6 2000-01-01 B 13
7 2000-01-02 XOM 24
But the output contains one extra row (row 2). That is because the ticker XOM is present on both days in df2, so both rows are selected. What modification is needed in my code?
The dput output is as follows:
> dput(DF)
structure(list(date = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("2000-01-01",
"2000-01-02"), class = "factor"), tickers = structure(c(4L, 5L,
6L, 8L, 7L), .Label = c("A", "AA", "AAPL", "B", "GOOG", "V",
"XOM", "YHOO", "Z"), class = "factor")), .Names = c("date", "tickers"
), row.names = c(NA, -5L), class = "data.frame")
> dput(df2)
structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L), .Label = c("2000-01-01", "2000-01-02"), class = "factor"),
tickers = structure(c(2L, 5L, 3L, 6L, 4L, 1L, 5L, 2L), .Label = c("B",
"BB", "GOOG", "V", "XOM", "YHOO"), class = "factor"), quantities = c(11,
23, 42, 21, 2112, 13, 24, 422)), .Names = c("date", "tickers",
"quantities"), row.names = c(NA, -8L), class = "data.frame")
Recommended answer
This is not very different from my answer to this post of yours, but it requires a small modification:
df2[duplicated(rbind(DF, df2[,1:2]))[-seq_len(nrow(DF))], ]
# date tickers quantities
# 3 2000-01-01 GOOG 42
# 4 2000-01-01 YHOO 21
# 5 2000-01-01 V 2112
# 6 2000-01-01 B 13
# 7 2000-01-02 XOM 24
Note: This returns the rows in the same order as they appear in df2.
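To make the one-liner above easier to follow, here is a step-by-step sketch of the same idea. The data frames are rebuilt with character columns (an assumption made for the sketch; the dput in the question uses factors), and the intermediate names stacked, is_dup and keep are illustrative only.

```r
# Rebuild the example data (character columns are an assumption of this sketch)
DF <- data.frame(
  date = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-01", "2000-01-02"),
  tickers = c("B", "GOOG", "V", "YHOO", "XOM"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  date = c(rep("2000-01-01", 6), rep("2000-01-02", 2)),
  tickers = c("BB", "XOM", "GOOG", "YHOO", "V", "B", "XOM", "BB"),
  quantities = c(11, 23, 42, 21, 2112, 13, 24, 422),
  stringsAsFactors = FALSE
)

# 1. Stack the key columns of df2 underneath DF
stacked <- rbind(DF, df2[, c("date", "tickers")])

# 2. duplicated() flags a row if an identical row appeared earlier,
#    so a df2 row is flagged when its (date, tickers) pair is in DF
is_dup <- duplicated(stacked)

# 3. Drop the flags that belong to the DF rows at the top
keep <- is_dup[-seq_len(nrow(DF))]

# 4. Use the remaining flags to index df2
result <- df2[keep, ]
print(result)
```

One caveat with this technique: if df2 itself contained the same (date, tickers) pair twice, the second occurrence would be flagged as a duplicate even when the pair is absent from DF. That does not happen with the data shown here.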
Alternatively, as Ben suggests, using merge:
merge(df2, DF, by=c("date", "tickers"))
will also give the same result (but not necessarily in the same order).
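For comparison, here is a sketch of a third option that stays close to the code in the question. The original subset() call tests date and tickers independently, which is why the row for XOM on 2000-01-01 slips through. Pasting the two columns into a single key and testing that key with %in% checks the pair as a whole and keeps the row order of df2. The names key_df2 and key_DF and the "|" separator are choices made for this sketch, and it assumes the separator never occurs in the data.

```r
# Rebuild the example data (character columns are an assumption of this sketch)
DF <- data.frame(
  date = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-01", "2000-01-02"),
  tickers = c("B", "GOOG", "V", "YHOO", "XOM"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  date = c(rep("2000-01-01", 6), rep("2000-01-02", 2)),
  tickers = c("BB", "XOM", "GOOG", "YHOO", "V", "B", "XOM", "BB"),
  quantities = c(11, 23, 42, 21, 2112, 13, 24, 422),
  stringsAsFactors = FALSE
)

# Independent tests, as in the question: matches 6 rows, one too many
naive <- df2[df2$date %in% DF$date & df2$tickers %in% DF$tickers, ]

# Combined key: each row is matched on its (date, tickers) pair
key_df2 <- paste(df2$date, df2$tickers, sep = "|")
key_DF <- paste(DF$date, DF$tickers, sep = "|")
result <- df2[key_df2 %in% key_DF, ]

# merge() selects the same rows, though possibly in a different order
merged <- merge(df2, DF, by = c("date", "tickers"))

print(result)
```

The paste approach also works when the columns are factors with different level sets, since paste converts them to character before the comparison.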