Probability Distribution Function Python


Question

I have a set of raw data and I need to identify its distribution. What is the easiest way to plot a probability distribution function? I have tried fitting a normal distribution to it.

But I am more curious to know which distribution the data itself follows.

I have no code to show my progress, as I have failed to find any function in Python that lets me test the distribution of a dataset. I do not want to slice the data and force it to fit a normal or skewed distribution.

Is there any way to determine the distribution of the dataset? Any suggestions are appreciated.

Is this a correct approach? Example
This is close to what I am looking for, but again it fits the data to a normal distribution. Example

The input has a million rows; a short sample is given below.

Hashtag,Frequency
#Car,45
#photo,4
#movie,6
#life,1

The frequency ranges from 1 to 20,000, and I am trying to identify the distribution of the keyword frequencies. I tried plotting a simple histogram, but the output is a single bar.

Code:

import pandas
import matplotlib.pyplot as plt


df = pandas.read_csv('Paris_random_hash.csv', sep=',')
plt.hist(df['Frequency'])
plt.show()

Output

Recommended answer

This is a minimal working example for showing a histogram. It only solves part of your question, but it can be a step towards your goal. Note that the histogram function returns the edges of each bin, so you have to take the midpoint of each pair of adjacent edges to get the bin centers.

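import numpy as np
import matplotlib.pyplot as pl

# Sample data: 10,000 draws from a standard normal distribution
x = np.random.randn(10000)

nbins = 20

# density=True normalizes the counts so that the histogram integrates to 1
n, bins = np.histogram(x, nbins, density=True)

# np.histogram returns nbins + 1 edges; use the midpoints as x-coordinates
pdfx = 0.5 * (bins[:-1] + bins[1:])
pdfy = n

pl.plot(pdfx, pdfy)
pl.show()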
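For data like the hashtag frequencies in the question, which range from 1 to 20,000, equal-width bins put almost every value into the first bin. That is one likely reason the histogram shows up as a single bar. A minimal sketch of the same histogram idea with logarithmically spaced bins follows. The helper name `log_binned_density` is hypothetical, and the Zipf-distributed sample is synthetic stand-in data, not the asker's file.

```python
import numpy as np


def log_binned_density(values, nbins=20):
    """Return (bin centers, density) using logarithmically spaced bins.

    Assumes all values are strictly positive, as counts >= 1 are.
    """
    values = np.asarray(values, dtype=float)
    # Edges are evenly spaced on a log scale between the min and max value
    edges = np.geomspace(values.min(), values.max(), nbins + 1)
    density, edges = np.histogram(values, bins=edges, density=True)
    # Geometric midpoints are the natural centers for log-spaced bins
    centers = np.sqrt(edges[:-1] * edges[1:])
    return centers, density


# Synthetic heavy-tailed counts (an assumption standing in for 'Frequency')
rng = np.random.default_rng(0)
freq = rng.zipf(2.0, size=100_000)
freq = freq[freq <= 20_000]

centers, density = log_binned_density(freq, nbins=20)
```

Plotting `centers` against `density` on log-log axes (for example with `pl.loglog(centers, density, 'o')`) makes the shape of a heavy-tailed distribution visible; with the real data, `df['Frequency']` would take the place of `freq`.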

You can fit your data using the example shown in:

Fitting empirical distribution to theoretical ones with Scipy (Python)?
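As a rough sketch of that idea (not the code from the linked question), one can fit several candidate distributions from `scipy.stats` and rank them by the Kolmogorov-Smirnov statistic. The helper name `rank_distributions`, the candidate list, and the synthetic log-normal sample are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats


def rank_distributions(data, candidates=("norm", "expon", "lognorm")):
    """Fit each candidate by maximum likelihood and sort by KS statistic."""
    results = []
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data)  # shape parameters (if any), then loc, scale
        ks_stat, p_value = stats.kstest(data, name, args=params)
        results.append((name, ks_stat, p_value))
    # A smaller KS statistic means the fitted CDF is closer to the data
    return sorted(results, key=lambda r: r[1])


# Synthetic sample (an assumption, used only to exercise the helper)
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=1.0, sigma=0.5, size=2000)

ranking = rank_distributions(sample)
for name, ks_stat, p_value in ranking:
    print(f"{name:8s} KS={ks_stat:.4f} p={p_value:.4f}")
```

The ranking is only a guide: the p-values are optimistic because the parameters were estimated from the same data, and these continuous distributions are at best approximations for integer counts such as hashtag frequencies.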
