Turn Pandas DataFrame of strings into histogram
Question
Suppose I have a DataFrame created like this:
import pandas as pd
s1 = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
s2 = pd.Series(['a', 'f', 'a', 'd', 'a', 'f', 'f'])
d = pd.DataFrame({'s1': s1, 's2': s2})
There is quite a lot of sparsity in the strings in the real data. I would like to create histograms of the occurrences of each string that look like what d.hist() generates (i.e. with subplots), with one subplot each for s1 and s2.
Just calling d.hist() gives this error:
/Library/Python/2.7/site-packages/pandas/tools/plotting.pyc in hist_frame(data, column, by, grid, xlabelsize, xrot, ylabelsize, yrot, ax, sharex, sharey, **kwds)
1725 ax.xaxis.set_visible(True)
1726 ax.yaxis.set_visible(True)
-> 1727 ax.hist(data[col].dropna().values, **kwds)
1728 ax.set_title(col)
1729 ax.grid(grid)
/Library/Python/2.7/site-packages/matplotlib/axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
8099 # this will automatically overwrite bins,
8100 # so that each histogram uses the same bins
-> 8101 m, bins = np.histogram(x[i], bins, weights=w[i], **hist_kwargs)
8102 if mlast is None:
8103 mlast = np.zeros(len(bins)-1, m.dtype)
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/lib/function_base.pyc in histogram(a, bins, range, normed, weights, density)
167 else:
168 range = (a.min(), a.max())
--> 169 mn, mx = [mi+0.0 for mi in range]
170 if mn == mx:
171 mn -= 0.5
TypeError: cannot concatenate 'str' and 'float' objects
I suppose I could manually go through each series, call value_counts(), plot the result as a bar plot, and create the subplots by hand. I wanted to check whether there is a simpler way.
Recommended answer
Recreating the dataframe:
import pandas as pd
s1 = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
s2 = pd.Series(['a', 'f', 'a', 'd', 'a', 'f', 'f'])
d = pd.DataFrame({'s1': s1, 's2': s2})
To get the histogram with subplots as desired:
d.apply(pd.value_counts).plot(kind='bar', subplots=True)
The OP mentioned pd.value_counts in the question. I think the missing piece is just that there is no reason to "manually" create the desired bar plot.
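For comparison, here is a rough sketch of the manual route the question describes: loop over the columns, count the values, and draw one bar plot per subplot. It is not part of the original answer. The vertical layout and the Agg backend (chosen so the sketch runs without a display) are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

s1 = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
s2 = pd.Series(['a', 'f', 'a', 'd', 'a', 'f', 'f'])
d = pd.DataFrame({'s1': s1, 's2': s2})

# One subplot per column, stacked vertically (layout is an arbitrary choice)
fig, axes = plt.subplots(nrows=len(d.columns), ncols=1, squeeze=False)

for ax, col in zip(axes.ravel(), d.columns):
    # value_counts() drops NaN by default, so the padding row in s1 is ignored
    counts = d[col].value_counts().sort_index()
    counts.plot(kind='bar', ax=ax)
    ax.set_title(col)

fig.tight_layout()
```

This produces a comparable figure, but it takes several lines of bookkeeping that the one-liner avoids.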
The output of d.apply(pd.value_counts) is a pandas dataframe. We can plot its values like any other dataframe, and selecting the option subplots=True gives us what we want.
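To see why this works, it can help to look at the intermediate table before plotting. The sketch below is an illustration, not part of the original answer. It uses the method form Series.value_counts through a lambda, since the top-level pd.value_counts is, as far as I know, deprecated in recent pandas releases. The names counts and counts_filled are made up for this example.

```python
import pandas as pd

s1 = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
s2 = pd.Series(['a', 'f', 'a', 'd', 'a', 'f', 'f'])
d = pd.DataFrame({'s1': s1, 's2': s2})

# Count occurrences per column; the row index is the union of all strings seen
counts = d.apply(lambda col: col.value_counts())

# Strings that never occur in a column show up as NaN ('d' and 'f' for s1,
# 'b' and 'c' for s2); filling with 0 gives plain integer counts
counts_filled = counts.fillna(0).astype(int)

print(counts_filled)
```

Plotting counts_filled with .plot(kind='bar', subplots=True) should give the same kind of figure, with empty categories drawn as zero-height bars.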