Plot two histograms on the same graph and have their columns sum to 100

Question

I have two sets of different sizes that I'd like to plot on the same histogram. However, since one set has ~330,000 values and the other has ~16,000 values, their frequency histograms are hard to compare. I'd like to plot a histogram comparing the two sets such that the y-axis is the % of occurrences in that bin. My code below gets close to this, except that rather than the individual bin values summing to 1.0, the integral of the histogram sums to 1.0 (this is because of the normed=True parameter).

How can I achieve my goal? I've already tried manually calculating the % frequency and using plt.bar(), but rather than overlaying the plots, it places them side by side. I want to keep the alpha=0.5 effect.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

if plt.get_fignums():
    plt.close('all')

electric = pd.read_csv('electric.tsv', sep='\t')
gas = pd.read_csv('gas.tsv', sep='\t')

electric_df = pd.DataFrame(electric)
gas_df = pd.DataFrame(gas)  # was an undefined name (ngma_nonheat)

electric = electric_df['avg_daily']*30
gas = gas_df['avg_daily']*30


## Create a plot for NGMA gas usage
plt.figure("Usage Comparison")

weights_electric = np.ones_like(electric)/float(len(electric))
weights_gas = np.ones_like(gas)/float(len(gas))

bins=np.linspace(0, 200, num=50)

n, bins, rectangles = plt.hist(electric, bins, alpha=0.5, label='electric usage', normed=True, weights=weights_electric)
plt.hist(gas, bins, alpha=0.5, label='gas usage', normed=True, weights=weights_gas)

plt.legend(loc='upper right')
plt.xlabel('Average 30 day use in therms')
plt.ylabel('% of customers')
plt.title('NGMA Customer Usage Comparison')
plt.show()

Recommended answer

It sounds like you don't want the normed/density kwarg in this case. You're already using weights. If you multiply your weights by 100 and leave out the normed=True option, you should get exactly what you had in mind.

For example:

import matplotlib.pyplot as plt
import numpy as np
np.random.seed(1)

x = np.random.normal(5, 2, 10000)
y = np.random.normal(2, 1, 3000000)

xweights = 100 * np.ones_like(x) / x.size
yweights = 100 * np.ones_like(y) / y.size

fig, ax = plt.subplots()
ax.hist(x, weights=xweights, color='lightblue', alpha=0.5)
ax.hist(y, weights=yweights, color='salmon', alpha=0.5)

ax.set(title='Histogram Comparison', ylabel='% of Dataset in Bin')
ax.margins(0.05)
ax.set_ylim(bottom=0)
plt.show()
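
The same idea can be checked numerically without drawing anything. This is a minimal sketch: `np.histogram` accepts a `weights` argument just as `ax.hist` does, so it can confirm that each dataset's bar heights add up to 100. The shared `bins` array, the `rng` generator and the sample sizes are assumptions chosen for illustration and are not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two samples of very different sizes (sizes are illustrative)
x = rng.normal(5, 2, 10_000)
y = rng.normal(2, 1, 300_000)

# Each observation contributes 100 / N percentage points to its bin
xweights = 100 * np.ones_like(x) / x.size
yweights = 100 * np.ones_like(y) / y.size

# Shared bin edges spanning both samples, so the bars line up when overlaid
bins = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), 51)

x_pct, _ = np.histogram(x, bins=bins, weights=xweights)
y_pct, _ = np.histogram(y, bins=bins, weights=yweights)

print(x_pct.sum(), y_pct.sum())  # both are 100 up to floating-point rounding
```

Passing the same `bins` array to both `ax.hist` calls should likewise make the two sets of bars share edges, which tends to make the overlaid plot easier to read.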

On the other hand, what you're currently doing (weights and normed) would result in (note the units on the y-axis):

import matplotlib.pyplot as plt
import numpy as np
np.random.seed(1)

x = np.random.normal(5, 2, 10000)
y = np.random.normal(2, 1, 3000000)

xweights = 100 * np.ones_like(x) / x.size
yweights = 100 * np.ones_like(y) / y.size

fig, ax = plt.subplots()
ax.hist(x, weights=xweights, color='lightblue', alpha=0.5, normed=True)
ax.hist(y, weights=yweights, color='salmon', alpha=0.5, normed=True)

ax.set(title='Histogram Comparison', ylabel='% of Dataset in Bin')
ax.margins(0.05)
ax.set_ylim(bottom=0)
plt.show()
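
To see why combining the two options does not give percentages, here is a small sketch using `np.histogram`, which takes `density` and `weights` arguments much like `ax.hist`. With `density=True` the weighted counts are rescaled so that the area under the histogram is 1, so the factor of 100 in uniform weights simply cancels. Note that, to my knowledge, `normed` has been removed from recent Matplotlib releases in favor of `density`. The sample data and bin edges below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(5, 2, 10_000)  # illustrative sample

bins = np.linspace(x.min(), x.max(), 21)  # 20 equal-width bins
widths = np.diff(bins)

# Uniform weights scaled to percentages, as in the answer above
weights = 100 * np.ones_like(x) / x.size

# density=True rescales the weighted counts so that sum(height * width) == 1
dens_weighted, _ = np.histogram(x, bins=bins, weights=weights, density=True)
dens_plain, _ = np.histogram(x, bins=bins, density=True)

area = (dens_weighted * widths).sum()          # 1 up to rounding
height_sum = dens_weighted.sum()               # 1 / bin width here, not 100
same = np.allclose(dens_weighted, dens_plain)  # uniform weights cancel out

print(area, height_sum, same)
```

In other words, the y-axis in the combined case is a probability density (per unit of x), which is why its values depend on the bin width and do not read as percentages.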
