How to extract ranges with a specific length from a DataFrame row in Python?

Problem description

Here are the first 11 columns of my dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    '0': [373.60],
    '1': [442.83],
    '2': [259.21],
    '3': [293.05],
    '4': [332.79],
    '5': [360.03],
    '6': [676.55],
    '7': [481.67],
    '8': [486.59],
    '9': [561.65],
    '10': [491.75]})

And so on; my actual df contains 100000 columns. The minimum value is 109.59 and the maximum is 1703.35.

I want to slice the df into ranges of length 3.98 and then find the range that contains the largest number of values. That is, the ranges should look like this:

# converting df to array
df_array = np.array(df)

# defining the upper edge of each range, starting from the minimum:
range_length = 3.98
range_1 = df_array.min() + range_length
range_2 = range_1 + range_length
...
# range_n = range_(n-1) + range_length, continuing until df_array.max() is covered

Then I would see that some range, say range_150, contains about 1200 values. That most populated range is the one I need.

After that I need to find the index of each value from that range in my df.

I really have no idea how to do this. It looks like I need to write several functions. Can somebody help, please?

Recommended answer

This gives you the number of entries in each range:

# pad the min and max by 5 so that both extremes are sure to fall inside a bin
ranges = np.arange(df.T.min()[0] - 5, df.T.max()[0] + 5, 3.98)
# observed=False keeps the empty bins in the result
df_count = df.T.groupby(pd.cut(df.T[0], ranges), observed=False).count()
df_count

                  0
0                  
(254.21, 258.19]  0
(258.19, 262.17]  1
(262.17, 266.15]  0
(266.15, 270.13]  0
(270.13, 274.11]  0
              ..
(660.17, 664.15]  0
(664.15, 668.13]  0
(668.13, 672.11]  0
(672.11, 676.09]  0
(676.09, 680.07]  1
[107 rows x 1 columns]
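
The question describes ranges that start exactly at the minimum, whereas the bins above start 5 below it. A minimal sketch of that variant, assuming `np.histogram` is an acceptable substitute for `pd.cut` plus `groupby`; the names `row`, `n_bins`, `edges` and `counts` are illustrative and not from the original answer. Note that `np.histogram` bins are closed on the left (the last bin is closed on both sides), while `pd.cut` bins are closed on the right by default.

```python
import numpy as np
import pandas as pd

# Sample row from the question (the real frame has 100000 columns).
values = [373.60, 442.83, 259.21, 293.05, 332.79, 360.03,
          676.55, 481.67, 486.59, 561.65, 491.75]
df = pd.DataFrame([values], columns=[str(i) for i in range(len(values))])

range_length = 3.98
row = df.iloc[0].to_numpy()

# Edges start at the minimum; the "+ 1" adds one more bin so that the
# last edge always lies beyond the maximum.
n_bins = int(np.floor((row.max() - row.min()) / range_length)) + 1
edges = row.min() + range_length * np.arange(n_bins + 1)

# counts[i] is the number of values in [edges[i], edges[i + 1])
counts, edges = np.histogram(row, bins=edges)
```

`counts.argmax()` then gives the position of the most populated bin, and `edges[i]`, `edges[i + 1]` are its bounds.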

This gives you the index (the range) with the most hits:

df_count.idxmax()

0    (258.19, 262.17]
dtype: object
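
With the 11 sample values every occupied range holds exactly one entry, so `idxmax()` simply reports the first of several tied ranges. A small sketch for listing all ranges that reach the maximum, assuming the same padded bin edges as above; the variable names `s`, `counts`, `best` and `tied` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample row from the question as a Series keyed by column label.
values = [373.60, 442.83, 259.21, 293.05, 332.79, 360.03,
          676.55, 481.67, 486.59, 561.65, 491.75]
s = pd.Series(values, index=[str(i) for i in range(len(values))])

edges = np.arange(s.min() - 5, s.max() + 5, 3.98)

# sort=False keeps the bins in ascending order, empty bins included.
counts = pd.cut(s, edges).value_counts(sort=False)

best = counts.idxmax()                 # first bin that reaches the maximum
tied = counts[counts == counts.max()]  # every bin that reaches the maximum
```

On the full 100000-column data a single range will probably dominate, but checking `len(tied)` is a cheap way to confirm that.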

You can then get the entries that fall in this range like this:

df.T[df.T[0].between(258.19, 262.17)]

        0
2  259.21
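
The bounds above are typed in by hand. A sketch that reads them from the `Interval` object instead and returns the matching column labels, which is what the question asks for. The columns "11" and "12" and their values are made up here so that one range clearly wins, and the names `row`, `binned`, `best`, `in_best` and `labels` are illustrative. The mask uses `>` on the left and `<=` on the right to match the right-closed bins of `pd.cut`; `between` is inclusive on both sides by default.

```python
import numpy as np
import pandas as pd

# Hypothetical single-row frame: columns "11" and "12" are invented
# additions so that one range holds more values than the others.
df = pd.DataFrame({"0": [373.60], "1": [442.83], "2": [259.21],
                   "3": [293.05], "11": [260.00], "12": [261.50]})

row = df.iloc[0]  # Series: column label -> value
edges = np.arange(row.min() - 5, row.max() + 5, 3.98)

binned = pd.cut(row, edges)             # right-closed interval of each value
best = binned.value_counts().idxmax()   # most populated interval

# Values inside that interval, still keyed by their column label.
in_best = row[(row > best.left) & (row <= best.right)]
labels = in_best.index.tolist()
```

`labels` holds the column names of the original df whose values fall in the most populated range, and `in_best` holds the values themselves.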

Maybe this helps.
