PDF plotting concern


Question

I tried the following manual approach:

import pandas as pd
import matplotlib.pyplot as plt

data = {'id': ['a','b','c','d'], 'testers_time': [10, 30, 15, None], 'stage_1_to_2_time': [30, None, 30, None], 'activated_time' : [40, None, 45, None],'stage_2_to_3_time' : [30, None, None, None],'engaged_time' : [70, None, None, None]}
df = pd.DataFrame(data, columns=['id', 'testers_time', 'stage_1_to_2_time', 'activated_time', 'stage_2_to_3_time', 'engaged_time'])

# Drop rows without a testers_time and sort by it
df = df.dropna(subset=['testers_time']).sort_values('testers_time')

# Relative frequency of each distinct value, ordered by value
prob = df['testers_time'].value_counts(normalize=True).sort_index()
print(prob)
# 0.333333,  0.333333,  0.333333
plt.plot(prob.index, prob.values, marker='.', linestyle='-')

plt.show()

And I tried the following approach, which I found on Stack Overflow:

import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt

data = {'id': ['a','b','c','d'], 'testers_time': [10, 30, 15, None], 'stage_1_to_2_time': [30, None, 30, None], 'activated_time' : [40, None, 45, None],'stage_2_to_3_time' : [30, None, None, None],'engaged_time' : [70, None, None, None]}
df = pd.DataFrame(data, columns=['id', 'testers_time', 'stage_1_to_2_time', 'activated_time', 'stage_2_to_3_time', 'engaged_time'])

df = df.dropna(subset=['testers_time']).sort_values('testers_time')

# Normal density evaluated only at the three sample points
fit = stats.norm.pdf(df['testers_time'], np.mean(df['testers_time']), np.std(df['testers_time']))
print(fit)
# [0.02902547 0.04346777 0.01829513]
plt.plot(df['testers_time'], fit, marker='.', linestyle='-')
# density=True replaces the old normed argument, which newer Matplotlib releases no longer accept
plt.hist(df['testers_time'], density=True)

plt.show()

As you can see, I get completely different values: the probabilities are correct for #1, but for #2 they aren't (nor do they add up to 100%), and the y axis (%) of the histogram is based on 6 bins, not 3.

Can you explain how I can get the right probability for #2?

Recommended answer

The first approach gives you a probability mass function. The second gives you a probability density, hence the name probability density function (pdf). Both are correct; they just show different things. Density values are not probabilities and need not sum to 1; it is the area under the density curve that equals 1.
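The difference can be checked numerically. The following is a minimal sketch, not the answerer's code: it swaps in the standard library's `statistics.NormalDist` for `scipy.stats.norm` so that it runs without extra dependencies, and the names `times`, `pmf`, `dist`, `grid` and `area` are illustrative. It assumes the population standard deviation (ddof=0), which is what `np.std` computes in the question.

```python
from collections import Counter
from statistics import NormalDist, fmean, pstdev

# The non-missing testers_time values from the question
times = [10, 15, 30]

# Probability mass function: relative frequency of each observed value
counts = Counter(times)
pmf = {value: n / len(times) for value, n in counts.items()}
pmf_total = sum(pmf.values())

# Normal distribution fitted with the population standard deviation (ddof=0)
dist = NormalDist(mu=fmean(times), sigma=pstdev(times))

# Density at the sample points: these are heights of the curve, not probabilities
density_at_samples = [dist.pdf(t) for t in times]

# Trapezoidal integration of the density over mean +/- 5 standard deviations
n_steps = 2000
lo = dist.mean - 5 * dist.stdev
step = 10 * dist.stdev / n_steps
grid = [lo + i * step for i in range(n_steps + 1)]
area = sum((dist.pdf(a) + dist.pdf(b)) / 2 * step for a, b in zip(grid, grid[1:]))

print(pmf_total)            # sum of the probability masses
print(density_at_samples)   # density heights at 10, 15 and 30
print(area)                 # area under the density curve
```

The masses sum to 1, the three density heights do not, and the integrated area comes out very close to 1, which is the sense in which a density is normalized.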

If you evaluate the pdf over a larger range (e.g. 10 times the standard deviation), it will look much like the expected Gaussian curve.

import pandas as pd
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt

data = {'id': ['a','b','c','d'], 'testers_time': [10, 30, 15, None], 'stage_1_to_2_time': [30, None, 30, None], 'activated_time' : [40, None, 45, None],'stage_2_to_3_time' : [30, None, None, None],'engaged_time' : [70, None, None, None]}
df = pd.DataFrame(data, columns=['id', 'testers_time', 'stage_1_to_2_time', 'activated_time', 'stage_2_to_3_time', 'engaged_time'])

df = df.dropna(subset=['testers_time']).sort_values('testers_time')

mean = np.mean(df['testers_time'])
std = np.std(df['testers_time'])
# Evenly spaced grid spanning 5 standard deviations on each side of the mean
x = np.linspace(mean - 5*std, mean + 5*std)

# Evaluate the density on the grid instead of only at the sample points
fit = stats.norm.pdf(x, mean, std)
print(fit)

plt.plot(x, fit, marker='.', linestyle='-')
# density=True replaces the old normed argument, which newer Matplotlib releases no longer accept
plt.hist(df['testers_time'], density=True)

plt.show()
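If actual probabilities are wanted from the fitted normal, one option is to take differences of the cumulative distribution function over an interval rather than reading heights off the density. The sketch below is an illustration under assumptions, not part of the original answer: it uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm` (whose `cdf` plays the same role), and `interval_probability` is a hypothetical helper name.

```python
from statistics import NormalDist, fmean, pstdev

# The non-missing testers_time values from the question
times = [10, 15, 30]

# Normal distribution fitted with the population standard deviation (ddof=0)
dist = NormalDist(mu=fmean(times), sigma=pstdev(times))


def interval_probability(low, high):
    """Return P(low < X <= high) under the fitted normal distribution."""
    return dist.cdf(high) - dist.cdf(low)


# Probability that a value falls between the smallest and largest observation
p_10_to_30 = interval_probability(10, 30)

# Probability of falling below the mean; a normal distribution is symmetric
p_below_mean = dist.cdf(dist.mean)

print(p_10_to_30)
print(p_below_mean)
```

Unlike density heights, interval probabilities always lie between 0 and 1, and the probabilities of disjoint intervals covering the whole real line add up to 1.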
