将字符串的Pandas DataFrame转换为直方图 [英] Turn Pandas DataFrame of strings into histogram

查看:353
本文介绍了将字符串的Pandas DataFrame转换为直方图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有一个这样创建的DataFrame:

Suppose I have a DataFrame of created like this:

import pandas as pd
s1 = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
s2 = pd.Series(['a', 'f', 'a', 'd', 'a', 'f', 'f'])
d = pd.DataFrame({'s1': s1, 's2', s2})

实际数据中的字符串有很多稀疏性.我想为字符串的出现创建直方图,看起来像是d.hist()(例如,带有子图)针对s1和s2(每个子图一个)生成的.

There is quite a lot of sparsity in the strings in the real data. I would like to create histograms of the occurrence of strings that looks like what is generated by d.hist() (eg. with subplots) for s1 and s2 (one per subplot).

仅执行d.hist()就会出现此错误:

Just doing d.hist() gives this error:

/Library/Python/2.7/site-packages/pandas/tools/plotting.pyc in hist_frame(data, column, by, grid, xlabelsize, xrot, ylabelsize, yrot, ax, sharex, sharey, **kwds)
   1725         ax.xaxis.set_visible(True)
   1726         ax.yaxis.set_visible(True)
-> 1727         ax.hist(data[col].dropna().values, **kwds)
   1728         ax.set_title(col)
   1729         ax.grid(grid)

/Library/Python/2.7/site-packages/matplotlib/axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
   8099             # this will automatically overwrite bins,
   8100             # so that each histogram uses the same bins
-> 8101             m, bins = np.histogram(x[i], bins, weights=w[i], **hist_kwargs)
   8102             if mlast is None:
   8103                 mlast = np.zeros(len(bins)-1, m.dtype)

/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/lib/function_base.pyc in histogram(a, bins, range, normed, weights, density)
    167             else:
    168                 range = (a.min(), a.max())
--> 169         mn, mx = [mi+0.0 for mi in range]
    170         if mn == mx:
    171             mn -= 0.5

TypeError: cannot concatenate 'str' and 'float' objects

我想我可以手动遍历每个系列,执行value_counts(),然后将其绘制为条形图,然后手动创建子图.我想检查是否有更简单的方法.

I suppose I could manually go through each series, do a value_counts(), then plot it as a bar plot, and manually create the subplots. I wanted to check if there is a simpler way.

推荐答案

重新创建数据框:

import pandas as pd
s1 = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
s2 = pd.Series(['a', 'f', 'a', 'd', 'a', 'f', 'f'])
d = pd.DataFrame({'s1': s1, 's2': s2})

要根据需要获得带有子图的直方图:

To get the histogram with subplots as desired:

d.apply(pd.value_counts).plot(kind='bar', subplots=True)

OP在问题中提到了pd.value_counts.我认为所缺少的只是没有理由手动"创建所需的条形图.

The OP mentioned pd.value_counts in the question. I think the missing piece is just that there is no reason to "manually" create the desired bar plot.

d.apply(pd.value_counts)的输出是一个熊猫数据框.我们可以像绘制其他任何数据框一样绘制值,然后选择选项subplots=True即可提供所需的内容.

The output from d.apply(pd.value_counts) is a pandas dataframe. We can plot the values like any other dataframe, and selecting the option subplots=True gives us what we want.

这篇关于将字符串的Pandas DataFrame转换为直方图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆