Data table - apply the same function on several columns to create new data table columns

Views: 52 | Published: 2020/4/27 5:13:19 | Tags: r, user-defined-functions, aggregation, lapply


Problem description

I am working with the data.table package. I have a data table that represents user actions on a website. Every user can visit the website and perform multiple actions during a visit. My original data table holds actions (every row is one action), and I want to aggregate it into a new data table grouped by user visit (every visit has a unique ID). Some fields are shared by all actions of the same visit, for example the user name, the user status and the visit number. At least one action of each visit contains this information, but not necessarily all of them. For each visit (that is, each group of actions with the same visit ID) I want to retrieve the value of each such field and store it in the row for that visit in the new visits data table. For example, given the following original data table:

VisitID     ActionNum    UserName   UserStatus    VisitNum   ActionType
aaaaaaa        1           John        Active        5           x
aaaaaaa        2                       Active                    y
aaaaaaa        3           John                      5           z
bbbbbbb        1                      NonActive                  w
bbbbbbb        2           Dan                       7           t

I want to end up with a visits data table like the following:

VisitID  UserName   UserStatus   VisitNum
aaaaaaa   John       Active        5
bbbbbbb   Dan        NonActive     7

I wrote a function that takes a subset of the data table (only the rows of one visit) and a field name. It should be applied to several fields (UserName, UserStatus, VisitNum).

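# Return the first non-blank value of `field` among the rows of one visit.
getGeneralField <- function(visitDT, field) {
  vec <- visitDT[[field]]
  vec <- vec[!is.na(vec) & vec != ""]  # drop missing and empty values
  unique(vec)[1]
}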
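
One detail of a `!= ""` filter may be worth checking. The sketch below assumes a numeric column (such as VisitNum) where blank cells were read in as NA rather than as empty strings; the vector `visitNum` is made up for illustration. In that case the comparison alone does not drop the missing value.

```r
# Hypothetical VisitNum values for one visit, with the blank cell read as NA.
visitNum <- c(NA, 7)

# NA != "" evaluates to NA, and indexing with NA keeps an NA element,
# so the missing value survives the filter and comes first.
naive <- unique(visitNum[visitNum != ""])[1]

# Dropping NA explicitly before taking the first value avoids this.
safe <- unique(visitNum[!is.na(visitNum) & visitNum != ""])[1]

print(naive)
print(safe)
```

If every blank in the table really is an empty string, the two expressions give the same result.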

The problem is that every attempt to apply this function to .SD with by=VisitID produces something different from what I intended. What is the best way to do this? I used != "" in order to skip blank cells.
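
One possible approach, sketched here under some assumptions, is to make the helper work on a single column vector and let `lapply(.SD, ...)` call it once per column, with `.SDcols` restricting `.SD` to the shared fields. The helper name `firstNonBlank` is hypothetical, and the example table is rebuilt from the question with blank cells stored as empty strings (VisitNum is kept as character for that reason).

```r
library(data.table)

# Example data rebuilt from the question; blank cells are empty strings.
actions <- data.table(
  VisitID    = c("aaaaaaa", "aaaaaaa", "aaaaaaa", "bbbbbbb", "bbbbbbb"),
  ActionNum  = c(1L, 2L, 3L, 1L, 2L),
  UserName   = c("John", "", "John", "", "Dan"),
  UserStatus = c("Active", "Active", "", "NonActive", ""),
  VisitNum   = c("5", "", "5", "", "7"),
  ActionType = c("x", "y", "z", "w", "t")
)

# Hypothetical helper: first value of a vector that is neither NA nor "".
# Indexing an empty vector with [1] yields an NA of the same type,
# so every group returns a value of a consistent type.
firstNonBlank <- function(vec) {
  vec[!is.na(vec) & vec != ""][1L]
}

# Columns shared by all actions of a visit.
fields <- c("UserName", "UserStatus", "VisitNum")

# Within each VisitID group, .SD contains only the columns named in .SDcols;
# lapply applies the helper to each of them and returns one row per visit.
visits <- actions[, lapply(.SD, firstNonBlank), by = VisitID, .SDcols = fields]
print(visits)
```

Because the helper receives a plain vector instead of a data table plus a field name, the same call extends to more fields by adding names to `fields`.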
