Interval intersection in pandas

Question

This feature has been released as part of pandas 0.20.1 (on my birthday :] )

The PR has been merged!

The PR has been moved here.

It seems like this question may have contributed to re-opening the PR for IntervalIndex in pandas.

I no longer have this problem, since I'm actually now querying for overlapping ranges from A and B, not points from B which fall within ranges in A, which is a full interval tree problem. I won't delete the question though, because I think it's still a valid question, and I don't have a good answer.

I have two dataframes.

In dataframe A, two of the integer columns taken together represent an interval.

In dataframe B, one integer column represents a position.

I'd like to do a sort of join, such that points are assigned to each interval they fall within.

Intervals are rarely but occasionally overlapping. If a point falls within that overlap, it should be assigned to both intervals. About half of points won't fall within an interval, but nearly every interval will have at least one point within its range.
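To make the desired result concrete, here is a minimal sketch of that join using plain NumPy broadcasting. The column names (`name`, `start`, `end`, `pos`), the sample values, and the choice to treat both interval ends as inclusive are assumptions for illustration; none of them come from the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
A = pd.DataFrame({"name": ["a", "b", "c"],
                  "start": [0, 5, 20],
                  "end": [10, 15, 30]})
B = pd.DataFrame({"pos": [3, 7, 17, 25]})

# mask[i, j] is True when point j lies inside interval i (both ends inclusive).
pos = B["pos"].to_numpy()
start = A["start"].to_numpy()[:, None]
end = A["end"].to_numpy()[:, None]
mask = (start <= pos) & (pos <= end)

# Positions of every (interval, point) match; a point inside an overlap
# shows up once per interval that contains it.
interval_idx, point_idx = np.nonzero(mask)

joined = pd.concat(
    [A.iloc[interval_idx].reset_index(drop=True),
     B.iloc[point_idx].reset_index(drop=True)],
    axis=1,
)
print(joined)
```

Here point 7 is paired with both `a` and `b`, and point 17 is dropped because no interval contains it. The boolean matrix needs len(A) × len(B) entries, so this only suits modest sizes; it is meant to pin down the semantics, not to be the fast solution.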

I was initially going to dump my data out of pandas, and use intervaltree or banyan or maybe bx-python but then I came across this gist. It turns out that the ideas shoyer has in there never made it into pandas, but it got me thinking -- it might be possible to do this within pandas, and since I want this code to be as fast as python can possibly go, I'd rather not dump my data out of pandas until the very end. I also get the feeling that this is possible with bins and pandas cut function, but I'm a total newbie to pandas, so I could use some guidance! Thanks!
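On the `cut` hunch: a small sketch, assuming a recent pandas version and intervals that do not overlap. When `bins` is an `IntervalIndex`, `pd.cut` requires non-overlapping bins, so on its own it would not handle the occasional overlaps described above. The bin edges and the `pos` column are made up for illustration.

```python
import pandas as pd

# Hypothetical positions and non-overlapping bins (right-closed by default).
B = pd.DataFrame({"pos": [3, 7, 17, 25]})
bins = pd.IntervalIndex.from_tuples([(0, 10), (20, 30)])

# Each position is labelled with the bin that contains it, or NaN if none does.
B["interval"] = pd.cut(B["pos"], bins=bins)
print(B)
```

Points outside every bin come back as NaN, which matches the "about half of points won't fall within an interval" case.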

Possibly related? Pandas DataFrame groupby overlapping intervals of variable length

Recommended answer

This feature was released as part of pandas 0.20.1.
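The answer gives no code, so the following is only a sketch of how `IntervalIndex` might be applied to the question. It assumes a pandas version that provides `IntervalIndex.contains` (added after the release mentioned above), and the frames, column names, and inclusive interval ends are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
A = pd.DataFrame({"name": ["a", "b", "c"],
                  "start": [0, 5, 20],
                  "end": [10, 15, 30]})
B = pd.DataFrame({"pos": [3, 7, 17, 25]})

# Build one Interval per row of A; closed="both" makes the ends inclusive.
intervals = pd.IntervalIndex.from_arrays(A["start"], A["end"], closed="both")

rows = []
for point in B["pos"]:
    # contains() returns one boolean per interval, so overlaps are handled.
    hits = np.flatnonzero(intervals.contains(point))
    for i in hits:
        rows.append({"name": A["name"].iloc[i],
                     "interval": intervals[i],
                     "pos": point})

assigned = pd.DataFrame(rows)
print(assigned)
```

This loops over the points in Python, so it illustrates the API rather than the fastest route. For intervals that never overlap, `intervals.get_indexer(B["pos"])` returns one interval position per point (or -1 for no match) in a single vectorized call.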