How to fit a column of a dataframe into poisson distribution in Python


Question

I have been trying to find a way to fit some of my columns (which contain user click data) to a Poisson distribution in Python. These columns (e.g., click_website_1, click_website_2) may contain values ranging from 1 to thousands. I am trying to do this because some resources recommend it:

We recommend that count data should not be analysed by log-transforming it, but instead models based on Poisson and negative binomial distributions should be used.
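The two models named in the quote differ in how the variance relates to the mean: a Poisson variable has variance equal to its mean, while a negative binomial allows the variance to be larger. A common first check is therefore the ratio of sample variance to sample mean. This is a minimal sketch, assuming a DataFrame with a `click_website_1` column; the helper name `dispersion_index` and the toy counts are hypothetical, for illustration only.

```python
import pandas as pd


def dispersion_index(counts: pd.Series) -> float:
    """Return sample variance / sample mean for a column of counts.

    A value near 1 is consistent with a Poisson distribution; values much
    larger than 1 suggest overdispersion (e.g. a negative binomial model).
    """
    return counts.var(ddof=1) / counts.mean()


# hypothetical click counts, for illustration only
df = pd.DataFrame({"click_website_1": [1, 3, 2, 8, 1, 40, 2, 5]})
print(dispersion_index(df["click_website_1"]))
```

Click data with a few very heavy users often gives a ratio far above 1, in which case the negative binomial option from the quote is probably the better fit.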

I found some methods in scipy and numpy, but they seem to generate random numbers that follow a Poisson distribution. What I am interested in is fitting my own data to a Poisson distribution. Any library suggestions for doing this in Python?

Answer

Here is a quick way to check whether your data follows a Poisson distribution. You plot the PMF under the assumption that the data follows a Poisson distribution with rate parameter lambda = data.mean().

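import numpy as np
from scipy.special import gammaln  # scipy.misc.factorial was removed in SciPy 1.3


def poisson(k, lamb):
    """Poisson pmf P(X = k); parameter lamb is the fit parameter.

    Evaluated in log space, because lamb**k and k! overflow for large counts.
    """
    return np.exp(k * np.log(lamb) - lamb - gammaln(k + 1))

# let's collect the clicks, since we are going to need them later
# (df is your existing DataFrame)
clicks = df["click_website_1"]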

Here we use the PMF of the Poisson distribution.
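SciPy also provides the Poisson PMF directly as `scipy.stats.poisson.pmf(k, mu)`, so a hand-written version can be checked against it. The sketch below assumes SciPy 1.3 or later (where `scipy.misc.factorial` no longer exists) and computes the PMF in log space: `factorial(k)` overflows double precision for k above 170, so the naive `lamb**k / factorial(k)` can come out as `0` or `nan` for the large counts mentioned in the question. The name `poisson_pmf` and the sample values are illustrative.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import poisson as poisson_dist


def poisson_pmf(k, lamb):
    """Poisson pmf computed via logs: exp(k*log(lamb) - lamb - log(k!))."""
    k = np.asarray(k, dtype=float)
    return np.exp(k * np.log(lamb) - lamb - gammaln(k + 1))


# counts in the thousands, where the naive formula breaks down
k = np.arange(1400, 1601, 50)
lamb = 1500.0

ours = poisson_pmf(k, lamb)
reference = poisson_dist.pmf(k, lamb)
print(np.allclose(ours, reference))
```

In practice `poisson_dist.pmf` can simply replace the hand-written function.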

Now let's do some modeling. From the data (click_website_1) we'll estimate the Poisson parameter using the maximum-likelihood estimate (MLE), which turns out to be just the mean:
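Why the MLE is the mean: with observed counts k_1, ..., k_n assumed to be independent draws from Poisson(lambda), the log-likelihood and its derivative are

```latex
\begin{aligned}
\ell(\lambda) &= \sum_{i=1}^{n} \log\frac{\lambda^{k_i} e^{-\lambda}}{k_i!}
               = \Big(\sum_{i=1}^{n} k_i\Big)\log\lambda - n\lambda - \sum_{i=1}^{n}\log k_i! \\
\frac{d\ell}{d\lambda} &= \frac{1}{\lambda}\sum_{i=1}^{n} k_i - n = 0
  \;\Longrightarrow\; \hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} k_i = \bar{k}
\end{aligned}
```

The second derivative, -(k_1 + ... + k_n) / lambda^2, is negative for any data with at least one nonzero count, so this stationary point is a maximum. That is why `lamb = clicks.mean()` is the fitted rate.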

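import matplotlib.pyplot as plt

lamb = clicks.mean()

# plot the pmf using lamb as an estimate for `lambda`.
# sort the counts in the column first (Series.sort() was removed from pandas).
sorted_clicks = clicks.sort_values()
# pass lamb via args=; a bare second positional argument is not forwarded to poisson
pmf = sorted_clicks.apply(poisson, args=(lamb,))
plt.plot(sorted_clicks.values, pmf.values)
plt.show()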
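Plotting the fitted PMF on its own does not show whether the data actually follow it; the check comes from putting the observed relative frequencies next to the fitted probabilities. A minimal sketch, assuming a pandas Series of non-negative integer counts; the helper name `observed_vs_poisson` and the toy counts are hypothetical, and `scipy.stats.poisson.pmf` stands in for the hand-written PMF.

```python
import pandas as pd
from scipy.stats import poisson


def observed_vs_poisson(clicks: pd.Series) -> pd.DataFrame:
    """Tabulate observed relative frequencies against the fitted Poisson pmf.

    lambda is estimated by maximum likelihood, i.e. the sample mean.
    """
    lamb = clicks.mean()
    # share of rows taking each distinct count, ordered by count
    observed = clicks.value_counts(normalize=True).sort_index()
    table = pd.DataFrame({"observed": observed})
    # fitted probability of each observed count under Poisson(lamb)
    table["poisson"] = poisson.pmf(table.index, lamb)
    return table


# hypothetical click counts, for illustration only
clicks = pd.Series([0, 1, 1, 2, 2, 2, 3, 3, 4, 6])
table = observed_vs_poisson(clicks)
print(table)
```

`table.plot(style=["o", "-"])` then draws the data as dots and the fitted PMF as a line. With counts in the thousands most values occur only once, so grouping the counts into bins first (for example with `pd.cut`) usually makes the comparison easier to read.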
