Data table - apply the same function on several columns to create new data table columns

Question

I am working with the data.table package. I have a data table that represents users' actions on a website. Every user can visit the website and perform multiple actions on it. My original data table holds actions (every row is one action), and I want to aggregate it into a new data table grouped by user visit (every visit has a unique ID). Some fields are shared by all actions of the same visit, for example the user name, the user status and the visit number. At least one action of each visit contains this info, but not necessarily all of them. For each visit (= group of actions with the same visit ID), I want to retrieve the value of each such field and store it in the new visits data table. For example, if I have the following original data table:

VisitID     ActionNum    UserName   UserStatus    VisitNum   ActionType
aaaaaaa        1           John        Active        5           x
aaaaaaa        2                       Active                    y
aaaaaaa        3           John                      5           z
bbbbbbb        1                      NonActive                  w
bbbbbbb        2           Dan                       7           t

I want a visits data table like the following:

VisitID  UserName   UserStatus   VisitNum
aaaaaaa   John       Active        5
bbbbbbb   Dan        NonActive     7

I wrote a function that takes a subset of the data table (only the rows of one visit) and a field name. It should be applied to several fields (UserName, UserStatus, VisitNum).

# Return the first distinct non-blank value of `field` within one visit's rows
getGeneralField <- function(visitDT, field) {
  vec <- visitDT[, get(field)]
  return(unique(vec[vec != ""])[1])
}

The problem is that every attempt to apply this function to .SD with by = VisitID gives a result different from what I intended. What is the best way to do it? I used != "" to skip blank cells.

Recommended answer

We specify the columns of interest in .SDcols, group by 'VisitID', loop over those columns with lapply(.SD, ...), and take the first non-blank element of each. Here .SDcols = 3:5 selects UserName, UserStatus and VisitNum by position.

dt[, lapply(.SD, function(x) x[nzchar(x)][1]), by = VisitID, .SDcols = 3:5]
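To try the one-liner above end to end, here is a minimal sketch that rebuilds the sample table from the question. It assumes blank cells are stored as empty strings (so VisitNum is a character column), and it selects the columns by name rather than by position. The helper name first_non_blank is made up for illustration. The additional is.na() check is an addition to the answer: nzchar() by default treats NA as non-empty, so it matters only if the real data has NA instead of "".

```r
library(data.table)

# Sample data reconstructed from the question; blanks are assumed to be "".
dt <- data.table(
  VisitID    = c("aaaaaaa", "aaaaaaa", "aaaaaaa", "bbbbbbb", "bbbbbbb"),
  ActionNum  = c(1L, 2L, 3L, 1L, 2L),
  UserName   = c("John", "", "John", "", "Dan"),
  UserStatus = c("Active", "Active", "", "NonActive", ""),
  VisitNum   = c("5", "", "5", "", "7"),
  ActionType = c("x", "y", "z", "w", "t")
)

# Hypothetical helper: first value that is neither NA nor an empty string.
# Returns NA if every value in the group is blank.
first_non_blank <- function(x) {
  x <- as.character(x)
  x[!is.na(x) & nzchar(x)][1]
}

# Columns shared by all actions of a visit, selected by name.
cols <- c("UserName", "UserStatus", "VisitNum")

# One row per VisitID, one column per shared field.
visits <- dt[, lapply(.SD, first_non_blank), by = VisitID, .SDcols = cols]
print(visits)
```

Selecting by name keeps the call working if the column order changes. Under the character-column assumption VisitNum comes back as text, so it may need as.integer() afterwards.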
