Filtering a dataset after grepl in R?
I have the following dataset:
USERNAME  API_TRACK_EVENT        TIME
userA     Viewed pic             1454941960
userA     Order/payment          1454941972
userA     Order/Changed Address  1454941976
userB     Viewed pic             1454941983
userB     Order/guestlogin       1454941986
userB     Order/Changed Address  1454941992
I want to keep only the earliest "Order" event for each user, which means "Order/payment" for userA and "Order/guestlogin" for userB.
All other non-order events should remain unchanged.
So, the output dataset would be:
USERNAME  API_TRACK_EVENT   TIME
userA     Viewed pic        1454941960
userA     Order/payment     1454941972
userB     Viewed pic        1454941983
userB     Order/guestlogin  1454941986
How should I do this? (I'm open to using dplyr as well.)
Here's an option with base R:
0) Order the data by USERNAME and TIME:
df <- df[order(df$USERNAME, df$TIME),]
a) Check whether rows contain order-information:
idx <- grepl("Order", df$API_TRACK_EVENT, ignore.case = TRUE)
b) Subset within each USERNAME group, keeping a row if it is not an order row or if it is the first order row of its group:
subset(df, ave(idx, USERNAME, FUN = cumsum) <= 1L | !idx)
# USERNAME API_TRACK_EVENT TIME
#1 userA Viewed_pic 1454941960
#2 userA Order/payment 1454941972
#4 userB Viewed_pic 1454941983
#5 userB Order/guestlogin 1454941986
This keeps only the first order row for each user, plus every row that contains no order information.
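Since the question mentions dplyr, the same idea can probably be expressed there as well. The following is only a sketch, not part of the original answer: it rebuilds the sample data by hand (with "Viewed pic" written with a space, as in the question) and introduces a helper column named is_order, which is a made-up name for illustration.

```r
library(dplyr)

# Sample data rebuilt from the question (assumed column types).
df <- data.frame(
  USERNAME = c("userA", "userA", "userA", "userB", "userB", "userB"),
  API_TRACK_EVENT = c("Viewed pic", "Order/payment", "Order/Changed Address",
                      "Viewed pic", "Order/guestlogin", "Order/Changed Address"),
  TIME = c(1454941960, 1454941972, 1454941976,
           1454941983, 1454941986, 1454941992),
  stringsAsFactors = FALSE
)

result <- df %>%
  arrange(USERNAME, TIME) %>%          # earliest events first within each user
  group_by(USERNAME) %>%
  # is_order is a hypothetical helper column flagging order rows
  mutate(is_order = grepl("Order", API_TRACK_EVENT, ignore.case = TRUE)) %>%
  # keep non-order rows, and order rows while the running count is still 1
  filter(!is_order | cumsum(is_order) <= 1L) %>%
  ungroup() %>%
  select(-is_order)

print(result)
```

As in the base R version, cumsum() counts order rows within each user, so only the first one passes the filter. Note that the pattern "Order" with ignore.case = TRUE matches the text anywhere in the event name; if that is too loose for the real data, a stricter pattern such as "^Order/" might be a better fit.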