With `pandas.cut()`, how do I get integer bins and avoid getting a negative lowest bound?


Question

My dataframe has zero as its lowest value. I am trying to use the precision and include_lowest parameters of pandas.cut(), but I can't get the intervals to consist of integers rather than floats with one decimal place. I also can't get the leftmost interval to stop at zero.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style='white', font_scale=1.3)

df = pd.DataFrame(range(0,389,8)[:-1], columns=['value'])
df['binned_df_pd'] = pd.cut(df.value, bins=7, precision=0, include_lowest=True)
sns.pointplot(x='binned_df_pd', y='value', data=df)
plt.xticks(rotation=30, ha='right')

I have tried setting precision to -1, 0 and 1, but all of them produce floats with one decimal place. The pandas.cut() help does mention that the range of x is extended by 0.1% on each side to include the minimum and maximum values of x, but I thought include_lowest might suppress this behaviour somehow. My current workaround involves importing numpy:

import numpy as np

bin_counts, edges = np.histogram(df.value, bins=7)
edges = [int(x) for x in edges]
df['binned_df_np'] = pd.cut(df.value, bins=edges, include_lowest=True)

sns.pointplot(x='binned_df_np', y='value', data=df)
plt.xticks(rotation=30, ha='right')

Is there a way to obtain non-negative integers as the interval boundaries directly with pandas.cut(), without using numpy?

Edit: I just noticed that specifying right=False shifts the lowest interval bound to 0 rather than -0.4. It seems to take precedence over include_lowest, since changing the latter has no visible effect in combination with right=False. The remaining interval bounds are still shown with one decimal place.
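The two observations above (the 0.1% extension and the effect of right=False) can be checked by asking pd.cut() for the numeric edges it computed, using retbins=True. This is only a small sketch on the same values as the question; the variable names are illustrative, and the printed precision may differ between pandas versions.

```python
import pandas as pd

# Same values as in the question: 0, 8, ..., 376
values = pd.Series(range(0, 389, 8)[:-1])

# Default right=True: the lowest edge is pushed below the minimum
# by 0.1% of the data range so that the minimum falls inside a bin.
_, edges_right = pd.cut(values, bins=7, retbins=True)

# right=False: the extension is applied to the highest edge instead,
# so the lowest edge stays exactly at the minimum.
_, edges_left = pd.cut(values, bins=7, right=False, retbins=True)

print(edges_right[0], edges_right[-1])
print(edges_left[0], edges_left[-1])
```

With right=True the first edge is slightly negative, which is where the negative lowest bound in the labels comes from; with right=False the first edge is 0 and the last edge lies slightly above the maximum.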

Recommended answer

You should set the labels parameter explicitly.

lower, higher = df['value'].min(), df['value'].max()
n_bins = 7

Build the labels:

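step = (higher - lower) / n_bins  # bin width; a float in Python 3, so it cannot be passed to range()
edges = [int(round(lower + i * step)) for i in range(n_bins + 1)]  # n_bins + 1 = 8 integer edges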
lbs = ['(%d, %d]'%(edges[i], edges[i+1]) for i in range(len(edges)-1)]

Set the labels:

df['binned_df_pd'] = pd.cut(df.value, bins=n_bins, labels=lbs, include_lowest=True)
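Note that with this approach the labels are only rounded display strings; the underlying bins still use the float edges that pd.cut() computes. If the bin boundaries themselves should be integers, one possible alternative (an assumption on my part, not part of the answer above) is to pass explicit integer edges together with right=False, so that no 0.1% adjustment is applied. The column name binned and the ceiling-division step are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({'value': range(0, 377, 8)})  # 0, 8, ..., 376
n_bins = 7
lower, higher = int(df['value'].min()), int(df['value'].max())

# Ceiling division, so that n_bins integer-width steps are guaranteed
# to reach past the maximum (the last bin is left-closed, right-open).
span = higher - lower + 1
step = -(-span // n_bins)
edges = [lower + i * step for i in range(n_bins + 1)]

# Explicit edges plus right=False: the intervals are [a, b) and the
# lowest edge is exactly the minimum of the data.
df['binned'] = pd.cut(df['value'], bins=edges, right=False)
print(df['binned'].cat.categories)
```

The trade-off is that the bins are no longer an exact seven-way split of the data range: the width is rounded up to a whole number, so the last bin extends slightly past the maximum.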
