Filtering a dataset after grepl in R?

Problem description

I have the following dataset:

USERNAME API_TRACK_EVENT         TIME
userA    Viewed pic              1454941960
userA    Order/payment           1454941972
userA    Order/Changed Address   1454941976
userB    Viewed pic              1454941983
userB    Order/guestlogin        1454941986
userB    Order/Changed Address   1454941992

I want to keep only the earliest "Order" event for each user, which means "Order/payment" for userA and "Order/guestlogin" for userB.

All the other non-order events should be kept as they are.

So, the output dataset would be:

USERNAME API_TRACK_EVENT         TIME
userA    Viewed pic              1454941960
userA    Order/payment           1454941972
userB    Viewed pic              1454941983
userB    Order/guestlogin        1454941986

So, how should I do this? [I'm open to using dplyr too.]

Solution

Here's an option with base R:

0) Order the data by USERNAME and TIME, so that the first order row per user is also the earliest one:

df <- df[order(df$USERNAME, df$TIME),]

a) Check whether rows contain order-information:

idx <- grepl("Order", df$API_TRACK_EVENT, ignore.case = TRUE)

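b) Subset within each USERNAME group, keeping rows whose running count of order rows is at most 1 or that are not order rows at all: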

subset(df, ave(idx, USERNAME, FUN = cumsum) <= 1L | !idx)

#  USERNAME  API_TRACK_EVENT       TIME
#1    userA       Viewed_pic 1454941960
#2    userA    Order/payment 1454941972
#4    userB       Viewed_pic 1454941983
#5    userB Order/guestlogin 1454941986
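
To see why the condition in step b) picks the right rows, the sketch below rebuilds the question's data by hand and prints the intermediate running count. The data frame construction is an assumption (the post does not show how `df` was created, and the rows are shuffled here on purpose), and the names `running` and `result` are made up for this illustration.

```r
# Rebuild the question's data by hand, deliberately out of order
# (assumption: the original post does not show how df was created)
df <- data.frame(
  USERNAME = c("userB", "userA", "userA", "userB", "userA", "userB"),
  API_TRACK_EVENT = c("Order/Changed Address", "Order/payment", "Viewed pic",
                      "Viewed pic", "Order/Changed Address", "Order/guestlogin"),
  TIME = c(1454941992, 1454941972, 1454941960,
           1454941983, 1454941976, 1454941986),
  stringsAsFactors = FALSE
)

# Step 0: sort so that "first" means "earliest" within each user
df <- df[order(df$USERNAME, df$TIME), ]

# Step a: flag order rows
idx <- grepl("Order", df$API_TRACK_EVENT, ignore.case = TRUE)
# idx is now: FALSE TRUE TRUE FALSE TRUE TRUE

# Step b, unpacked: running count of order rows within each user
running <- ave(as.integer(idx), df$USERNAME, FUN = cumsum)
# running is now: 0 1 2 0 1 2

# Show the helper values next to the data
print(cbind(df, idx = idx, running = running))

# Keep all non-order rows, plus order rows while the running count is still 1
result <- df[!idx | running <= 1L, ]
print(result)
```

The `!idx` part matters because a non-order row that comes after two or more order rows has a running count above 1 and would otherwise be dropped along with the later orders.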

For each user, this keeps only the first order row together with every row that carries no order info.
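
Since the question is open to dplyr, here is one possible equivalent, offered as a sketch rather than as part of the original answer. It assumes the dplyr package is installed, rebuilds the question's data by hand, and uses a helper column named `is_order` that is made up for this example.

```r
library(dplyr)

# Rebuild the question's data by hand
# (assumption: the original post does not show how df was created)
df <- data.frame(
  USERNAME = c("userA", "userA", "userA", "userB", "userB", "userB"),
  API_TRACK_EVENT = c("Viewed pic", "Order/payment", "Order/Changed Address",
                      "Viewed pic", "Order/guestlogin", "Order/Changed Address"),
  TIME = c(1454941960, 1454941972, 1454941976,
           1454941983, 1454941986, 1454941992),
  stringsAsFactors = FALSE
)

result <- df %>%
  arrange(USERNAME, TIME) %>%            # earliest events first within each user
  group_by(USERNAME) %>%
  mutate(is_order = grepl("Order", API_TRACK_EVENT, ignore.case = TRUE)) %>%
  # keep non-order rows, and order rows only while the per-user count is 1
  filter(!is_order | cumsum(is_order) <= 1L) %>%
  ungroup() %>%
  select(-is_order)                      # drop the hypothetical helper column

print(result)
```

The logic mirrors the base R steps: `arrange()` plays the role of `order()`, and `cumsum()` inside a grouped `filter()` replaces the `ave()` call.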
