Correlating weighted data between different data sources

Question

Hey everyone,

I was wondering whether there is any open source software that helps correlate data based on specified values with weights.

I am working on a project where we have many different data sources that can sometimes be incorrect due to clerical errors. To correlate the data, we will associate weights with certain fields, and those weights will be used to calculate how likely it is that two records relate to each other. The comparison will take into account the differences between the records and estimate how likely it is that they come from a clerical error rather than from an actually different source.

What I have tried:

If there is none, I would be interested in creating an open source solution that can accomplish this. I just don't want to reinvent the wheel if one already exists.

Thanks

P.S. If anyone feels like they might have seen this question before, I posted it on SO yesterday, but I only got downvotes and no comments, so if I'm breaking some kind of forum etiquette, feel free to let me know.

Recommended answer

I guess you could search for a C# statistical library; see, for instance, the Stack Overflow question "What is a good statistical math package for .Net?".
You could also opt for a statistics-oriented programming language, like R.
Finally, if you like scripting then you may find many statistical libraries available for Python.
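
As a rough illustration of the weighting idea described in the question, here is a minimal Python sketch that uses only the standard library (`difflib`). The field names, the weights, and the sample records are all made up for the example, and a generic string similarity stands in for whatever clerical-error model the project ends up using, so treat it as a starting point rather than a finished approach.

```python
from difflib import SequenceMatcher

# Hypothetical weights: how much each field counts toward a match.
WEIGHTS = {"name": 0.5, "birth_date": 0.3, "city": 0.2}


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; small typos keep the value close to 1."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0  # assumption: a missing value counts as no evidence of a match
    return SequenceMatcher(None, a, b).ratio()


def match_score(rec_a: dict, rec_b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of the per-field similarities, in [0, 1]."""
    total = sum(weights.values())
    weighted = sum(
        w * field_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, w in weights.items()
    )
    return weighted / total


# Made-up records: b differs from a by a single typo, c is a different person.
a = {"name": "Jonathan Smith", "birth_date": "1984-03-12", "city": "Boston"}
b = {"name": "Jonathon Smith", "birth_date": "1984-03-12", "city": "Boston"}
c = {"name": "Maria Lopez", "birth_date": "1991-11-02", "city": "Denver"}

score_ab = match_score(a, b)
score_ac = match_score(a, c)
print(f"a vs b: {score_ab:.3f}")
print(f"a vs c: {score_ac:.3f}")
```

A pair whose score is above some threshold you choose would be treated as the same entity with a probable clerical error; where to put that threshold depends on your data. This general problem is usually called record linkage or entity resolution, which may be a useful search term when looking for existing open source tools.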


Quote:

P.S if anyone feels like they might have seen this question before, I posted on SO yesterday, but I only got downvotes and no comments so

Personally, I think it is their fault, not yours.

