Split a column into N groups by value differences (timestamp)

Views: 47  Published: 2021/6/14 18:34:34  Tags: python pandas pandas-groupby

Problem description

Sample data in .csv format:

| No. | IP       | Unix_time  |    # integer Unix time
| 1   | 1.1.1.1  | 1563552000 |    # equivalent to 12:00:00 AM
| 2   | 1.1.1.1  | 1563552030 |    # equivalent to 12:00:30 AM
| 3   | 1.1.1.1  | 1563552100 |    # equivalent to 12:01:40 AM
| 4   | 1.1.1.1  | 1563552110 |    # equivalent to 12:01:50 AM
| 5   | 1.1.1.1  | 1563552180 |    # equivalent to 12:03:00 AM
| 6   | 1.2.3.10 | 1563552120 |

Here is my current working code, which uses the pandas groupby() and get_group() functions:

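import pandas as pd

data = pd.read_csv(some_path, header=0)  # some_path: path to the .csv file
root = data.groupby('IP')

for a in root.groups.keys():
    t = root.get_group(a)['Unix_time']
    # An f-string avoids the TypeError raised by concatenating str and int
    print(f'{a} has {t.count()} record')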

You will see the following result:

1.1.1.1 has 5 record
1.2.3.10 has 1 record
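
As an aside, the same per-IP count can be obtained without an explicit loop. The following is only a minimal sketch: it assumes the sample rows above are inlined as CSV text (the `csv_text` variable is illustrative and not part of the original post) and uses `size()` to count the rows in each group.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch runs without a file.
# csv_text is an illustrative stand-in for the real .csv file.
csv_text = """No.,IP,Unix_time
1,1.1.1.1,1563552000
2,1.1.1.1,1563552030
3,1.1.1.1,1563552100
4,1.1.1.1,1563552110
5,1.1.1.1,1563552180
6,1.2.3.10,1563552120
"""
data = pd.read_csv(io.StringIO(csv_text), header=0)

# size() returns the number of rows in each IP group as a Series.
counts = data.groupby('IP').size()

for ip, n in counts.items():
    print(f'{ip} has {n} record')
```

Unlike `count()`, `size()` also counts rows whose value is missing; for this sample data the two give the same numbers.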

Now I want to improve on the code above.

For the same IP value (e.g., 1.1.1.1), I want to split the rows into further sub-groups based on a maximum time interval (e.g., 60 seconds), and count the number of elements in each sub-group. For example, in the sample data above:

Start from row 1: the Unix_time value of row 2 is within 60 seconds of it, but row 3 is more than 60 seconds later.

Thus, rows 1-2 form one group, rows 3-4 form a separate group, and row 5 is a group of its own. In other words, group '1.1.1.1' now has 3 sub-groups. The result should be:

1.1.1.1 start time 1563552000 has 2 record within 60 secs
1.1.1.1 start time 1563552100 has 2 record within 60 secs
1.1.1.1 start time 1563552180 has 1 record within 60 secs
1.2.3.10 start time 1563552120 has 1 record within 60 secs

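How can I do this?

Recommended answer

Use groupby on the IP together with the integer (floor) division of Unix_time by 60: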

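# df is the DataFrame read from the CSV (named data in the question).
# Unix_time // 60 assigns each row to a fixed 60-second bucket.
root = df.groupby(['IP', df['Unix_time'] // 60])

for n, g in root:
    # n is the (IP, bucket) key; g holds the rows of that sub-group.
    print(f'{n[0]} start time {g["Unix_time"].iloc[0]} has {len(g)} records within 60 secs')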

Output:

1.1.1.1 start time 1563552000 has 2 records within 60 secs
1.1.1.1 start time 1563552100 has 2 records within 60 secs
1.1.1.1 start time 1563552180 has 1 records within 60 secs
1.2.3.10 start time 1563552120 has 1 records within 60 secs
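
Note that floor division places rows in fixed buckets aligned to multiples of 60, not in windows that start at the first row of each sub-group. For instance, 1563552050 // 60 is 26059200 while 1563552070 // 60 is 26059201, so two records only 20 seconds apart would land in different buckets. If the windows should instead be anchored at the first record of each sub-group, as the question describes, one possible approach is sketched below. This is an illustration and not part of the original answer: the helper `window_ids`, the `sub` column, and the choice to treat a difference of exactly 60 seconds as still inside the window are all assumptions.

```python
import pandas as pd


def window_ids(times, max_span=60):
    """Label each timestamp in a sorted Series with a sub-group id.

    A new sub-group starts whenever a timestamp is more than max_span
    seconds after the first timestamp of the current sub-group.
    (Hypothetical helper, written for this illustration.)
    """
    ids = []
    start = None
    current = -1
    for t in times:
        if start is None or t - start > max_span:
            start = t          # this row anchors a new window
            current += 1
        ids.append(current)
    return pd.Series(ids, index=times.index)


# Sample data from the question, built in memory.
df = pd.DataFrame({
    'IP': ['1.1.1.1'] * 5 + ['1.2.3.10'],
    'Unix_time': [1563552000, 1563552030, 1563552100,
                  1563552110, 1563552180, 1563552120],
})

# Windows are only meaningful on timestamps sorted within each IP.
df = df.sort_values(['IP', 'Unix_time'])
df['sub'] = df.groupby('IP')['Unix_time'].transform(window_ids)

# One row per (IP, sub-group): first timestamp and number of records.
summary = (df.groupby(['IP', 'sub'])['Unix_time']
             .agg(start='min', n='size')
             .reset_index())

for row in summary.itertuples(index=False):
    print(f'{row.IP} start time {row.start} has {row.n} record(s) within 60 secs')
```

On the sample data this produces the same four sub-groups as the floor-division answer. The two approaches differ only when records close in time straddle a 60-second bucket boundary. The explicit loop is slower than a vectorized operation, which is the price of anchoring each window on its first record.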
