How to implement conflation of probability distributions in Python?

I looked online for a way to combine several continuous probability distributions into one continuous probability distribution. This method is called conflation, and it is described in the following article: An Optimal Method for Consolidating Data from Different Experiments. According to the article, conflation is a better way to combine distributions than averaging.

As I understand the article, for continuous distributions the conflation is the product of the probability density values of the individual distributions, divided by the integral of that product. For discrete distributions it is the product of the probability mass values of the individual distributions, divided by the sum of that product over all values. (Details can be found on page 5 of the article.)
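Written out, the two definitions described above take roughly the following form. This is a sketch of how I read the conflation definition; the symbols $f_i$, $p_i$, and $n$ are my own notation and are not taken from the article.

```latex
% Continuous case: densities f_1, ..., f_n
f(x) = \frac{f_1(x)\, f_2(x) \cdots f_n(x)}
            {\int_{-\infty}^{\infty} f_1(y)\, f_2(y) \cdots f_n(y)\, dy}

% Discrete case: probability mass functions p_1, ..., p_n
p(x) = \frac{p_1(x)\, p_2(x) \cdots p_n(x)}
            {\sum_{y} p_1(y)\, p_2(y) \cdots p_n(y)}
```

In both cases the denominator is a single number, so the conflation has the shape of the pointwise product and is only rescaled to integrate (or sum) to 1.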

Say I have four lists drawn from, for example, four normal distributions:

list_1 = [5, 8, 6, 2, 1]
list_2 = [2, 6, 1, 3, 8]
list_3 = [1, 9, 2, 7, 5]
list_4 = [3, 2, 4, 1, 6]

and after implementing the conflation the resulting list becomes

Con_list = [2.73, 34.56, 3.69, 3.23, 12]

(Correct me if I am wrong)

How can both equations in the photo be implemented in Python to get the conflation of the input PDFs?

I previously found a Stack Overflow question about averaging lists, with the following code:

def average(l):
    llen = len(l)
    def divide(x):
        return x / llen
    # return map(divide, map(sum, zip(*l)))
    return map(divide, map(sum, zip(l)))
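
For reference, the commented-out `zip(*l)` variant is the one that averages position by position: `zip(l)` yields each inner list wrapped in a one-element tuple instead of grouping the values by position. A minimal Python 3 sketch, assuming all lists have the same length (the list comprehension is my own rewrite, not the code from the linked question):

```python
def average(rows):
    """Return the element-wise mean of several equal-length lists."""
    n = len(rows)
    # zip(*rows) transposes the input, yielding one tuple per position.
    return [sum(column) / n for column in zip(*rows)]


list_1 = [5, 8, 6, 2, 1]
list_2 = [2, 6, 1, 3, 8]
list_3 = [1, 9, 2, 7, 5]
list_4 = [3, 2, 4, 1, 6]

averaged = average([list_1, list_2, list_3, list_4])
print(averaged)
```

Returning a list rather than a `map` object also makes the result printable directly in Python 3.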

I have been trying to recode this function to follow the equation above, but I can't find a way to get the conflated PDF for a continuous distribution.

Edit 1:

Based on the answer from @Josh Purtell, I rewrote the code, but I keep getting the following error message:

Error Message:

Traceback (most recent call last):
  File "/tmp/sessions/c903d99d60f20c3b/main.py", line 72, in <module>
    graph=conflate_pdf(domain, dists,lb,ub)
  File "/tmp/sessions/c903d99d60f20c3b/main.py", line 58, in conflate_pdf
    denom = quad(prod_pdf, lb, ub, args=(dists))[0]
  File "/usr/local/lib/python3.6/dist-packages/scipy/integrate/quadpack.py", line 341, in quad
    points)
  File "/usr/local/lib/python3.6/dist-packages/scipy/integrate/quadpack.py", line 448, in _quad
    return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit)
TypeError: only size-1 arrays can be converted to Python scalars

Code:

def prod_pdf(x,pdfs):
    prod=np.ones(pdfs[0].shape[0])
    for pdf in pdfs:
        prod=prod*pdf
    return prod

def conflate_pdf(x,dists,lb,ub):
    denom = quad(prod_pdf, lb, ub, args=(dists))[0]
    return prod_pdf(x,dists)/denom

lb=-10
ub=10
domain=np.arange(lb,ub,.01)

dist_1 = stats.norm.pdf(domain, 2,1)
dist_2 = stats.norm.pdf(domain, 2.5,1.5)
dist_3 = stats.norm.pdf(domain, 2.2,1.6)
dist_4 = stats.norm.pdf(domain, 2.4,1.3)
dist_5 = stats.norm.pdf(domain, 2.7,1.5)

dists=[dist_1, dist_2, dist_3, dist_4, dist_5]
graph=conflate_pdf(domain, dists,lb,ub)

from matplotlib import pyplot as plt
plt.plot(domain, dist_1)
plt.plot(domain, dist_2)
plt.plot(domain, dist_3)
plt.plot(domain, dist_4)
plt.plot(domain, dist_5)
plt.plot(domain,graph)
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("Conflated PDF")
plt.show()

From the code, what causes this error?

Edit 2:

I rewrote the product function from Edit 1 so that it loops over the lists of distribution values element by element instead of multiplying the PDF arrays directly, but I still get the same error as in Edit 1.

Code:

def prod_pdf(x,pdfs):
    prod=np.ones(np.array(pdfs)[0].shape)
    for pdf in pdfs:
        print(prod)
        for c,y in enumerate(pdf):
            prod[c]=prod[c]*y
        print('final:', prod)
    return prod

def conflate_pdf(x,dists,lb,ub):
    denom = quad(prod_pdf, lb, ub, args=(dists))[0]
    print('Denom: ',denom)
    print('product pdf: ', prod_pdf(x,dists))
    conflated_pdf=prod_pdf(x,dists)/denom
    print(conflated_pdf)
    return conflated_pdf

lb=-10
ub=10
domain=np.arange(lb,ub,.01)

dist_1 = st.norm.pdf(domain, 2,1)
dist_2 = st.norm.pdf(domain, 2.5,1.5)
dist_3 = st.norm.pdf(domain, 2.2,1.6)
dist_4 = st.norm.pdf(domain, 2.4,1.3)
dist_5 = st.norm.pdf(domain, 2.7,1.5)

from matplotlib import pyplot as plt
plt.plot(domain, dist_1, 'r')
plt.plot(domain, dist_2, 'g')
plt.plot(domain, dist_3, 'b')
plt.plot(domain, dist_4, 'y')
plt.plot(domain, dist_5, 'c')

dists=[dist_1, dist_2, dist_3, dist_4, dist_5]
graph=conflate_pdf(domain, dists,lb,ub)


plt.plot(domain,graph, 'm')
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("Conflated PDF")
plt.show()

Edit 3:

I tried to run the following code (based on the answer from @Josh Purtell), but the product function still returns the whole array rather than a single value, and it produces the same error message about size-1 arrays. See the following code with a portion of the output:

Code:

from scipy.integrate import quad
import scipy.stats as st
import numpy as np

def prod_pdf(x,dists):
    p_pdf=1
    print('Incoming Array:', p_pdf)
    for dist in dists:
        p_pdf=p_pdf*dist
        print('final:', p_pdf)
    return p_pdf

def conflate_pdf(x,dists,lb,ub):
    print('Input product pdf: ', prod_pdf(x,dists))
    denom = quad(prod_pdf, lb, ub, args=(dists,))[0]
    # denom = simps(prod_pdf)
    # denom = nquad(func=(prod_pdf), ranges=([lb, ub]), args=(dists,))[0]
    print('Denom: ', denom)
    conflated_pdf=prod_pdf(x,dists)/denom
    print('Conflated PDF: ', conflated_pdf)
    return conflated_pdf

lb=-10
ub=10
domain=np.arange(lb,ub,.01)

dist_1 = st.norm.pdf(domain, 2,1)
dist_2 = st.norm.pdf(domain, 2.5,1.5)
dist_3 = st.norm.pdf(domain, 2.2,1.6)
dist_4 = st.norm.pdf(domain, 2.4,1.3)
dist_5 = st.norm.pdf(domain, 2.7,1.5)

from matplotlib import pyplot as plt
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("Conflated PDF")
plt.legend()
plt.plot(domain, dist_1, 'r', label='Dist. 1')
plt.plot(domain, dist_2, 'g', label='Dist. 2')
plt.plot(domain, dist_3, 'b', label='Dist. 3')
plt.plot(domain, dist_4, 'y', label='Dist. 4')
plt.plot(domain, dist_5, 'c', label='Dist. 5')

dists=[dist_1, dist_2, dist_3, dist_4, dist_5]
print('distribution list: \n', dists)
graph=conflate_pdf(domain, dists,lb,ub)

plt.plot(domain,graph, 'm', label='Conflated Dist.')
plt.show()

Here is a small portion of the output:

Incoming Array: 1
final: [2.14638374e-32 2.41991991e-32 2.72804284e-32 ... 6.41980576e-15
 5.92770938e-15 5.47278628e-15]
final: [4.75178372e-48 5.66328097e-48 6.74864868e-48 ... 7.03075979e-21
 6.27970218e-21 5.60806584e-21]
final: [2.80912097e-61 3.51131870e-61 4.38823989e-61 ... 1.32670185e-26
 1.14952951e-26 9.95834610e-27]
final: [1.51005552e-81 2.03116529e-81 2.73144352e-81 ... 1.76466623e-34
 1.46198598e-34 1.21092834e-34]
final: [1.09076800e-97 1.55234627e-97 2.20861552e-97 ... 3.72095218e-40
 2.98464396e-40 2.39335035e-40]
Input product pdf:  [1.09076800e-97 1.55234627e-97 2.20861552e-97 ... 3.72095218e-40
 2.98464396e-40 2.39335035e-40]
Incoming Array: 1
final: [2.14638374e-32 2.41991991e-32 2.72804284e-32 ... 6.41980576e-15
 5.92770938e-15 5.47278628e-15]
final: [4.75178372e-48 5.66328097e-48 6.74864868e-48 ... 7.03075979e-21
 6.27970218e-21 5.60806584e-21]
final: [2.80912097e-61 3.51131870e-61 4.38823989e-61 ... 1.32670185e-26
 1.14952951e-26 9.95834610e-27]
final: [1.51005552e-81 2.03116529e-81 2.73144352e-81 ... 1.76466623e-34
 1.46198598e-34 1.21092834e-34]
final: [1.09076800e-97 1.55234627e-97 2.20861552e-97 ... 3.72095218e-40
 2.98464396e-40 2.39335035e-40]

Following the same approach as in Edit 3, I changed the code so that it indexes into each distribution. However, it only picks up one value from each distribution, keeps printing the same values on every call instead of moving on to the next values in the lists, and the conflated distribution comes out as a single number. See the following code with a portion of the output:

Code:

from scipy.integrate import quad
import scipy.stats as st
import numpy as np

def prod_pdf(x,dists):
    p_pdf=1
    print('Incoming Array:', p_pdf)
    for c,dist in enumerate(dists):
        p_pdf=p_pdf*dist[c]
        print('final:', p_pdf)
    return p_pdf

def conflate_pdf(x,dists,lb,ub):
    print('Input product pdf: ', prod_pdf(x,dists))
    denom = quad(prod_pdf, lb, ub, args=(dists,))[0]
    # denom = simps(prod_pdf)
    # denom = nquad(func=(prod_pdf), ranges=([lb, ub]), args=(dists,))[0]
    print('Denom: ', denom)
    conflated_pdf=prod_pdf(x,dists)/denom
    print('Conflated PDF: ', conflated_pdf)
    return conflated_pdf

lb=-10
ub=10
domain=np.arange(lb,ub,.01)

dist_1 = st.norm.pdf(domain, 2,1)
dist_2 = st.norm.pdf(domain, 2.5,1.5)
dist_3 = st.norm.pdf(domain, 2.2,1.6)
dist_4 = st.norm.pdf(domain, 2.4,1.3)
dist_5 = st.norm.pdf(domain, 2.7,1.5)

from matplotlib import pyplot as plt
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("Conflated PDF")
plt.legend()
plt.plot(domain, dist_1, 'r', label='Dist. 1')
plt.plot(domain, dist_2, 'g', label='Dist. 2')
plt.plot(domain, dist_3, 'b', label='Dist. 3')
plt.plot(domain, dist_4, 'y', label='Dist. 4')
plt.plot(domain, dist_5, 'c', label='Dist. 5')

dists=[dist_1, dist_2, dist_3, dist_4, dist_5]
print('distribution list: \n', dists)
graph=conflate_pdf(domain, dists,lb,ub)

plt.plot(domain,graph, 'm', label='Conflated Dist.')
plt.show()

A portion of the output:

Incoming Array: 1
final: 2.1463837356630605e-32
final: 5.0231307782193034e-48
final: 3.266239495519432e-61
final: 2.187514996217005e-81
final: 1.979657878680375e-97
Incoming Array: 1
final: 2.1463837356630605e-32
final: 5.0231307782193034e-48
final: 3.266239495519432e-61
final: 2.187514996217005e-81
final: 1.979657878680375e-97
Denom:  3.95931575736075e-96
Incoming Array: 1
final: 2.1463837356630605e-32
final: 5.0231307782193034e-48
final: 3.266239495519432e-61
final: 2.187514996217005e-81
final: 1.979657878680375e-97
Conflated PDF:  0.049999999999999996

Edit 4:

I implemented the following code and it seems to work. I also managed to work around the problem with quad: if I replace quad with fixed_quad and normalise the PDF lists, I get the same result. Here is the code:

import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
from scipy.integrate import quad, fixed_quad

def user_prod_pdf(x,dists):
    p_list=[]
    p_pdf=1
    print('Incoming Array:', p_pdf)
    for dist in dists:
        print('Incoming Distribution Array:', dist.pdf(x))
        p_pdf=p_pdf*dist.pdf(x)
        print('Product PDF:', p_pdf)
        p_list.append(p_pdf)
    print('final Product PDF:', p_pdf)
    print('Product PDF list: ', p_list)
    return p_pdf

def user_conflate_pdf(x,dists,lb,ub):
    print('Input product pdf: ', user_prod_pdf(x,dists))
    denom = quad(user_prod_pdf, lb, ub, args=(dists,))[0]
    print('Denom: ', denom)
    conflated_pdf=user_prod_pdf(x,dists)/denom
    print('Conflated PDF: ', conflated_pdf)
    return conflated_pdf

def user_conflate_pdf_2(pdfs):
    """
    Compute conflation of given pdfs.

    [ARGS]
    - pdfs: PDFs numpy array of shape (n, x)
            where n is the number of PDFs
            and x is the variable space.

    [RETURN]
    A 1d-array of normalized conflated PDF.
    """
    # conflate
    conflation = np.array(pdfs).prod(axis=0)
    # normalize
    conflation /= conflation.sum()
    return conflation

def my_product_pdf(x,dists):
    p_list=[]
    p_pdf=1
    print('Incoming Array:', p_pdf)
    list_full_size=np.array(dists).shape
    print('Full list size: ', list_full_size)
    print('list size: ', list_full_size[0])
    for x in range(list_full_size[1]):
        p_pdf=1
        for y in range(list_full_size[0]):
            p_pdf=float(p_pdf)*dists[y][x]
            print('Product value: ', p_pdf)
        print('Product PDF:', p_pdf)
        p_list.append(p_pdf)
    print('final Product PDF:', p_pdf)
    print('Product PDF list: ', p_list)
    # return p_pdf
    return p_list
    # return np.array(p_list)

def my_conflate_pdf(x,dists,lb,ub):
    print('\n')
    # print('product pdf: ', prod_pdf(x,dists))
    print('product pdf: ', my_product_pdf(x,dists))
    denom = fixed_quad(my_product_pdf, lb, ub, args=(dists,), n=1)[0]
    print('Denom: ', denom)
    # conflated_pdf=prod_pdf(x,dists)/denom
    conflated_pdf=my_product_pdf(x,dists)/denom
    # conflated_pdf=[i / j for i,j in zip(my_product_pdf(x,dists), denom)]
    print('Conflated PDF: ', conflated_pdf)
    return conflated_pdf

lb=-10
ub=10
domain=np.arange(lb,ub,.01)

# dist_1 = st.norm(2,1)
# dist_2 = st.norm(2.5,1.5)
# dist_3 = st.norm(2.2,1.6)
# dist_4 = st.norm(2.4,1.3)
# dist_5 = st.norm(2.7,1.5)

# dist_1_pdf = st.norm.pdf(domain, 2,1)
# dist_2_pdf = st.norm.pdf(domain, 2.5,1.5)
# dist_3_pdf = st.norm.pdf(domain, 2.2,1.6)
# dist_4_pdf = st.norm.pdf(domain, 2.4,1.3)
# dist_5_pdf = st.norm.pdf(domain, 2.7,1.5)

# dist_1_pdf /= dist_1_pdf.sum()
# dist_2_pdf /= dist_2_pdf.sum()
# dist_3_pdf /= dist_3_pdf.sum()
# dist_4_pdf /= dist_4_pdf.sum()
# dist_5_pdf /= dist_5_pdf.sum()

dist_1 = st.norm(2,1)
dist_2 = st.norm(4,2)
dist_3 = st.norm(7,4)
dist_4 = st.norm(2.4,1.3)
dist_5 = st.norm(2.7,1.5)

dist_1_pdf = st.norm.pdf(domain, 2,1)
dist_2_pdf = st.norm.pdf(domain, 4,2)
dist_3_pdf = st.norm.pdf(domain, 7,4)
dist_4_pdf = st.norm.pdf(domain, 2.4,1.3)
dist_5_pdf = st.norm.pdf(domain, 2.7,1.5)

# dist_1_pdf /= dist_1_pdf.sum()
# dist_2_pdf /= dist_2_pdf.sum()
# dist_3_pdf /= dist_3_pdf.sum()
# dist_4_pdf /= dist_4_pdf.sum()
# dist_5_pdf /= dist_5_pdf.sum()

# User:
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("User Conflated PDF")
plt.plot(domain, dist_1_pdf, 'r', label='Dist. 1')
plt.plot(domain, dist_2_pdf, 'g', label='Dist. 2')
plt.plot(domain, dist_3_pdf, 'b', label='Dist. 3')
plt.plot(domain, dist_4_pdf, 'y', label='Dist. 4')
plt.plot(domain, dist_5_pdf, 'c', label='Dist. 5')

dists=[dist_1, dist_2, dist_3, dist_4, dist_5]
user_graph=user_conflate_pdf(domain,dists,lb,ub)
print('Final Conflated PDF: ', user_graph)

# user_graph /= user_graph.sum()

plt.plot(domain, user_graph, 'm', label='Conflated PDF')
plt.legend()
plt.show()

# User 2:
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("User Conflated PDF 2")
plt.plot(domain, dist_1_pdf, 'r', label='Dist. 1')
plt.plot(domain, dist_2_pdf, 'g', label='Dist. 2')
plt.plot(domain, dist_3_pdf, 'b', label='Dist. 3')
plt.plot(domain, dist_4_pdf, 'y', label='Dist. 4')
plt.plot(domain, dist_5_pdf, 'c', label='Dist. 5')

dists=[dist_1_pdf, dist_2_pdf, dist_3_pdf, dist_4_pdf, dist_5_pdf]
user_graph=user_conflate_pdf_2(dists)
print('Final User Conflated PDF 2 : ', user_graph)

# user_graph /= user_graph.sum()

plt.plot(domain, user_graph, 'm', label='Conflated PDF')
plt.legend()
plt.show()

# My Code:
# from matplotlib import pyplot as plt
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("My Conflated PDF Code")
plt.plot(domain, dist_1_pdf, 'r', label='Dist. 1')
plt.plot(domain, dist_2_pdf, 'g', label='Dist. 2')
plt.plot(domain, dist_3_pdf, 'b', label='Dist. 3')
plt.plot(domain, dist_4_pdf, 'y', label='Dist. 4')
plt.plot(domain, dist_5_pdf, 'c', label='Dist. 5')

dists=[dist_1_pdf, dist_2_pdf, dist_3_pdf, dist_4_pdf, dist_5_pdf]
my_graph=my_conflate_pdf(domain,dists,lb,ub)
print('Final Conflated PDF: ', my_graph)

my_graph /= np.array(my_graph).sum()

# my_graph = inverse_normalise(my_graph)

plt.plot(domain, my_graph, 'm', label='Conflated PDF')
plt.legend()
plt.show()

# Conflated PDF:
print('User Conflated PDF: ', user_graph)
print('My Conflated PDF: ', np.array(my_graph))

Here is the output:

My question is this: I understand that I need to normalise the PDF lists. But suppose I do not normalise the PDFs; how can I modify my conflation code to get the following plot?

To get the plot above and my conflated code:

# user_graph /= user_graph.sum()
# dist_1_pdf /= dist_1_pdf.sum()
# dist_2_pdf /= dist_2_pdf.sum()
# dist_3_pdf /= dist_3_pdf.sum()
# dist_4_pdf /= dist_4_pdf.sum()
# dist_5_pdf /= dist_5_pdf.sum()

My conflated code plot with no normalisation:
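
One possible way to handle PDFs that are already sampled as arrays, without normalising each input first, is to divide the pointwise product by a numerical estimate of its integral over the grid instead of by its plain sum. The sketch below assumes all arrays were evaluated on the same `domain`; `conflate_on_grid` is a hypothetical helper name, and the hand-written trapezoidal rule is my choice of integrator, not something taken from the thread.

```python
import numpy as np
from scipy import stats


def conflate_on_grid(domain, pdf_arrays):
    """Conflate PDFs sampled on a common grid (hypothetical helper)."""
    product = np.prod(np.asarray(pdf_arrays, dtype=float), axis=0)
    # Trapezoidal rule written out by hand: it approximates the integral of
    # the product over the grid, which is the denominator of the conflation.
    area = np.sum(0.5 * (product[1:] + product[:-1]) * np.diff(domain))
    return product / area


lb, ub = -10, 10
domain = np.arange(lb, ub, 0.01)
params = [(2, 1), (4, 2), (7, 4), (2.4, 1.3), (2.7, 1.5)]
pdf_arrays = [stats.norm.pdf(domain, mu, sigma) for mu, sigma in params]

conflated = conflate_on_grid(domain, pdf_arrays)
# Area under the conflated curve, using the same trapezoidal rule.
check_area = np.sum(0.5 * (conflated[1:] + conflated[:-1]) * np.diff(domain))
print(check_area)
```

Dividing by `product.sum()` instead yields values that sum to 1, which is roughly the density multiplied by the grid spacing (0.01 here). That would make the curve look almost flat when plotted on the same axes as the input densities, which may be what the unnormalised plot shows.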

Solution

Disclaimer: there's a good chance I'm misunderstanding either you or the paper authors, in which case please suggest an edit to this answer.

Here is a trivial, not especially performant implementation of what I think conflation might look like:

##define pdfs for discrete RV X = {1,2,3,4}
import numpy as np

def mult_list(pdfs):
    prod=np.ones(pdfs[0].shape[0])
    for pdf in pdfs:
        prod=prod*pdf
    return prod

def conflate(pdfs):
    return mult_list(pdfs)/sum(mult_list(pdfs))

pdf_1=np.array([.25,.25,.25,.25])
pdf_2=np.array([.33,.33,.33,.00])
pdf_3=np.array([.25,.12,.13,.50])

print(conflate([pdf_1,pdf_2,pdf_3]))

which yields the resulting conflated pdf

>>> [0.5  0.24 0.26 0.  ]

which passes a cursory sniff test.
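
The same discrete computation can be sketched in vectorised form with `np.prod`. The name `conflate_discrete` and the zero-total check are my own additions for illustration; they assume every pmf is given on the same support.

```python
import numpy as np


def conflate_discrete(pmfs):
    """Conflate several pmfs defined on the same support (hypothetical helper)."""
    # Multiply the pmfs position by position.
    product = np.prod(np.asarray(pmfs, dtype=float), axis=0)
    total = product.sum()
    if total == 0:
        # No value has positive probability under every input pmf.
        raise ValueError("pmfs share no common support; conflation is undefined")
    return product / total


pmf_1 = np.array([.25, .25, .25, .25])
pmf_2 = np.array([.33, .33, .33, .00])
pmf_3 = np.array([.25, .12, .13, .50])

result = conflate_discrete([pmf_1, pmf_2, pmf_3])
print(result)
```

Stacking the inputs into one array avoids the explicit loop and computes the product only once.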

On the continuous side of things, the above translates to

from scipy.integrate import quad
from scipy import stats
import numpy as np

def prod_pdf(x,dists):
    p_pdf=1
    for dist in dists:
        p_pdf=p_pdf*dist.pdf(x)
    return p_pdf

def conflate_pdf(x,dists,lb,ub):
    denom = quad(prod_pdf, lb, ub, args=(dists,))[0]
    return prod_pdf(x,dists)/denom

dists=[stats.norm(2,1),stats.norm(4,2)]
lb=-10
ub=10
domain=np.arange(lb,ub,.01)
graph=conflate_pdf(domain,dists,lb,ub)

from matplotlib import pyplot as plt
plt.plot(domain,graph)
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("Conflated PDF")
plt.savefig("conflatedpdf.png")
plt.show()

which gives

As you can see, the distribution is not bimodal, just as one would hope.
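
As a sanity check on the continuous version, the product of normal densities is itself proportional to a normal density, with precision (inverse variance) equal to the sum of the input precisions and a precision-weighted mean. The sketch below compares that closed form with the `quad`-based result; `conflate_normals` is a hypothetical helper name. It also shows why the integrand must evaluate the distributions at `x`: `quad` passes a scalar `x` and expects a scalar back, so an integrand that returns pre-computed arrays would likely trigger the size-1 array `TypeError` shown in the question.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad


def prod_pdf(x, dists):
    """Product of the densities of frozen scipy.stats distributions at x."""
    p = 1.0
    for dist in dists:
        p = p * dist.pdf(x)
    return p


def conflate_pdf(x, dists, lb, ub):
    # quad calls prod_pdf with a scalar x, so the integrand returns a scalar.
    denom = quad(prod_pdf, lb, ub, args=(dists,))[0]
    return prod_pdf(x, dists) / denom


def conflate_normals(params):
    """Closed-form conflation of normals given (mean, std) pairs."""
    precisions = np.array([1.0 / sigma**2 for _, sigma in params])
    means = np.array([mu for mu, _ in params], dtype=float)
    var = 1.0 / precisions.sum()
    mean = var * np.sum(precisions * means)
    return mean, np.sqrt(var)


params = [(2, 1), (4, 2)]
dists = [stats.norm(mu, sigma) for mu, sigma in params]
domain = np.arange(-10, 10, 0.01)

numeric = conflate_pdf(domain, dists, -10, 10)
mean, std = conflate_normals(params)
analytic = stats.norm(mean, std).pdf(domain)
print(mean, std**2)
print(np.max(np.abs(numeric - analytic)))
```

For the two distributions used in the answer this gives a single normal curve, which is consistent with the unimodal plot.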
