创建适合数据集的高效函数 [英] Creating an efficient function to fit a dataset

查看：77 发布时间：2020/5/6 11:33:38 math interpolation data-fitting

本文介绍了创建适合数据集的高效函数的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

基本上，我有一个4字节输入及其对应的4字节输出的大型数据集(最多可以得到100,000-150,000个值).不能保证输入是唯一的(这并不是一个真正的问题，因为我认为我可以生成伪随机数来对输入进行加法运算或异或运算，以使它们成为唯一的)，但是并不能保证输出结果是唯一的两者都是唯一的(因此两组不同的输入可能具有相同的输出).

Basically I have a large (could get as large as 100,000-150,000 values) data set of 4-byte inputs and their corresponding 4-byte outputs. The inputs aren't guaranteed to be unique (which isn't really a problem because I figure I can generate pseudo-random numbers to add or xor the inputs with so that they do become unique), but the outputs aren't guaranteed to be unique either (so two different sets of inputs might have the same output).

我正在尝试创建一个对我的数据集中的值进行有效建模的函数.我不需要它进行有效的插值，甚至根本不需要插值(这就是说，我永远不会向它提供此静态数据集中未包含的输入).但是，它确实需要尽可能高效.我研究了插值法，发现它确实不符合我的需求.例如，大量的值意味着样条曲线插值将不起作用，因为它会为每个间隔创建一个多项式.

I'm trying to create a function that effectively models the values in my data-set. I don't need it to interpolate efficiently, or even at all (by this I mean that I'm never going to feed it an input that isn't contained in this static data-set). However it does need to be as efficient as possible. I've looked into interpolation and found that it doesn't really fit what I'm looking for. For example, the large number of values means that spline interpolation won't do since it creates a polynomial per interval.

另外，根据我的理解，多项式插值在计算上会太昂贵(n值表示多项式可以包含高达pow(x，n-1)的项.对于x = 4字节数且n = 100,000只是不可行).我已经尝试在线查找一段时间了，但是我对数学的了解不是很强，并且一定不知道要搜索的正确术语，因为到目前为止我还没有遇到类似的事情.

Also, from my understanding polynomial interpolation would be way too computationally expensive (n values means that the polynomial could include terms as high as pow(x,n-1). For x= a 4-byte number and n=100,000 it's just not feasible). I've tried looking online for a while now, but I'm not very strong with math and must not know the right terms to search with because I haven't come across anything similar so far.

我可以看到这并不是一个完整的编程问题，我事先表示歉意.我不是在寻找确切的解决方案，甚至不是完整的答案.我只需要指向需要阅读的主题的指针，这样我就可以自行解决此问题.谢谢！

I can see that this is not completely (to put it mildly) a programming question and I apologize in advance. I'm not looking for the exact solution or even a complete answer. I just need pointers on the topics that I would need to read up on so I can solve this problem on my own. Thanks!

TL; DR-我需要插值的一种变体，它只需要适合最初给定的数据点，但是计算效率很高.

TL;DR - I need a variant of interpolation that only needs to fit the initially given data-points, but which is computationally efficient.

一些澄清-我确实需要输出准确，而不是近似值.这是对我当前正在做的一些研究工作的一种优化，我需要在没有程序中实际输出字节的情况下实现此查找.目前我还不能说太多，但是我要说的是，出于我的工作目的，加密(或压缩或其他任何形式的混淆处理)不是隐藏表格的一种选择.我需要一个数学函数，只要它可以访问输入，就可以重新创建输出.我希望这能使事情变得更清楚.

Some clarification - I do need the output to be exact and not an approximation. This is sort of an optimization of some research work I'm currently doing and I need to have this look-up implemented without the actual bytes of the outputs being present in my program. I can't really say a whole lot about it at the moment, but I will say that for the purposes of my work, encryption (or compression or any other other form of obfuscation) is not an option to hide the table. I need a mathematical function that can recreate the output so long as it has access to the input. I hope that clears things up a bit.

创建适合数据集的高效函数 [英] Creating an efficient function to fit a dataset

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

创建适合数据集的高效函数 [英] Creating an efficient function to fit a dataset

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭