How to efficiently count the number of smaller elements for every element in another column?

Problem description

I have the following df:

    name        created_utc
0   t1_cqug90j  1430438400
1   t1_cqug90k  1430438400
2   t1_cqug90z  1430438400
3   t1_cqug91c  1430438401
4   t1_cqug91e  1430438401
... ...         ...

in which column name contains only unique values. I would like to create a dictionary whose keys are the elements of column name. The value for each key is the number of elements in column created_utc that are strictly smaller than that key's own created_utc value. My expected result is something like

{'t1_cqug90j': 6, 't1_cqug90k': 0, 't1_cqug90z': 3, ...} 

In this case, there are 6 elements in column created_utc strictly smaller than 1430438400, which is the corresponding value of t1_cqug90j. I could write a loop to generate such a dictionary. However, a loop is not efficient in my case, with more than 3 million rows.

Could you please elaborate on how to do this?

import pandas as pd
import numpy as np
df = pd.read_csv('https://raw.githubusercontent.com/leanhdung1994/WebMining/main/df1.csv', header = 0)[['name', 'created_utc']]
df

Update: I posted the question How to efficiently count the number of larger elements for every element in another column? and received a great answer there. However, I have not been able to adapt that code to this case. It would be great to have efficient code that can be adapted to both cases, i.e. "strictly larger" and "strictly smaller".

Recommended answer

I think you need sort_index for ...
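The answer text is cut off here, so the following is only a sketch of one possible vectorized approach, not necessarily the sort_index-based method the answerer had in mind. It assumes that sorting created_utc once and looking each value up with np.searchsorted is acceptable. The sample data and the names sorted_values, n_smaller, and n_larger are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in for the real data (values are invented for illustration).
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "created_utc": [1430438401, 1430438400, 1430438402, 1430438400, 1430438401],
})

values = df["created_utc"].to_numpy()
sorted_values = np.sort(values)

# side="left" gives the position of the first element >= v in the sorted array,
# which equals the number of elements strictly smaller than v.
n_smaller = np.searchsorted(sorted_values, values, side="left")

# side="right" gives the position just past the last element <= v,
# so the remainder of the array holds the elements strictly larger than v.
n_larger = len(values) - np.searchsorted(sorted_values, values, side="right")

# name is unique, so it can serve directly as the dictionary key.
smaller = dict(zip(df["name"], n_smaller.tolist()))
larger = dict(zip(df["name"], n_larger.tolist()))

print(smaller)
print(larger)
```

The sort costs O(n log n) and each lookup is a binary search, so this avoids the quadratic cost of comparing every row with every other row. Switching between "strictly smaller" and "strictly larger" only changes the side argument and the subtraction from the length.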
