Determining regression coefficients for data - MATLAB


Problem description

I am doing a project involving scientific computing. The following are three variables and their values I got after some experiments:

x        y        z
9.98     7.9      7.1
8.3      7.5      5.6
8.0      7.4      5.9
7        6.09     5.8
1        0.9     -1.8
12.87    11.23    10.8

There is also an equation with three unknowns, a, b and c:

x=(a+0.98)/y+(b+0.7)/z+c

How do I get values of a,b,c using the above? Is this possible in MATLAB?

Recommended answer

This sounds like a regression problem. Assuming that the unexplained errors in measurements are Gaussian distributed, you can find the parameters via least squares. Basically, you'd have to rewrite the equation so that you get this to the form of ma + nb + oc = p and then you have 6 equations with 3 unknowns (a, b, c) and these parameters can be found through optimization by least squares. Therefore, with some algebra, we get:

za + yb + yzc = xyz - 0.98z - 0.7z

As such, m = z, n = y, o = yz, p = xyz - 0.98z - 0.7z. I'll leave that for you as an exercise to verify that my algebra is right. You can then form the matrix equation:

Ax = d

We would have 6 equations and we want to solve for x where x = [a b c]^{T}. To solve for x, you can employ what is known as the pseudoinverse to retrieve the parameters that best minimize the error between the true output and the output that is generated by these parameters if you were to use the same input data.

换句话说:

x = A^{+}d

A^{+} is the pseudoinverse of the matrix A and is matrix-vector multiplied with the vector d.

To put our thoughts into code, we would define our input data, form the A matrix and d vector, where each corresponding row of A and d is one equation, and then employ the pseudoinverse to find our parameters. You can use the mldivide (\) operator to do the job:

%// Define x y and z
x = [9.98; 8.3; 8.0; 7; 1; 12.87];
y = [7.9; 7.5; 7.4; 6.09; 0.9; 11.23];
z = [7.1; 5.6; 5.9; 5.8; -1.8; 10.8];

%// Define A matrix
A = [z y y.*z];
%// Define d vector
d = x.*y.*z - 0.98*z - 0.7*z;

%// Find parameters via least-squares
params = A\d;

params stores the parameters a, b and c, and we get:

params =

  -37.7383
  -37.4008
   19.5625
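
If you would rather compute the pseudoinverse explicitly, as in the x = A^{+}d expression above, pinv should recover essentially the same parameters as the backslash operator. A minimal check, assuming A, d and params from the script above are still in the workspace:

params_pinv = pinv(A)*d;              %// explicit pseudoinverse solution
disp(max(abs(params_pinv - params)))  %// difference should be negligible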


If you want to double-check how close the values are, you can simply use the above expression in your post and compare with each of the values in x:

a = params(1); b = params(2); c = params(3);
out = (a+0.98)./y+(b+0.7)./z+c;
disp([x out])

9.9800    9.7404
8.3000    8.1077
8.0000    8.3747
7.0000    7.1989
1.0000   -0.8908
12.8700   12.8910

You can see that it's not exactly close, but the parameters you got would be the best in a least-squares error sense.
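
To put a number on "best in a least-squares error sense", you can inspect the residual of the linear system itself. A small sketch, assuming A, d, params and out from above are still defined:

%// Residual of the linear system A*params = d
res = A*params - d;
disp(sum(res.^2))        %// sum of squared residuals that the fit minimizes
disp(sum(abs(x - out)))  %// total absolute error between true and predicted x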

You can see that some of the predicted values (right column in the output) are more off than others. This is because we used all points in your data to find the appropriate model. One technique that is used to minimize error and increase the robustness of the model estimation is to use something called RANSAC, or RANdom SAmple Consensus. The basic methodology behind RANSAC is that for a certain number of iterations, you take your data and randomly sample the minimum number of points necessary to fit a model. Once you have this model, you measure the overall error you would get if you used these parameters to describe your data. You keep randomly choosing points, fitting the model, and computing the error; the iteration that produces the least error gives the parameters you keep to define the overall model.

As you can see above, one error measure we can define is the sum of absolute differences between the true x points and the predicted x points. There are many other measures, such as the sum of squared errors, but let's stick with something simple for now. If you take a look at the above formulation, we need a minimum of three equations in order to define a, b and c, so for each iteration we'd randomly select three points (without replacement, I might add), find our model, determine the error, and keep iterating, retaining the parameters that produce the least amount of error.

Therefore, you could write a RANSAC algorithm like so:

%// Define cost and number of iterations
cost = Inf;
iterations = 50;

%// Set seed for reproducibility
rng(123);

%// Define x y and z
x = [9.98; 8.3; 8.0; 7; 1; 12.87];
y = [7.9; 7.5; 7.4; 6.09; 0.9; 11.23];
z = [7.1; 5.6; 5.9; 5.8; -1.8; 10.8];

for idx = 1 : iterations
    %// Determine where we would need to sample
    ind = randperm(numel(x), 3);

    xs = x(ind); ys = y(ind); zs = z(ind); %// Sample

    %// Define A matrix
    A = [zs ys ys.*zs];
    %// Define d vector
    d = xs.*ys.*zs - 0.98*zs - 0.7*zs;

    %// Find parameters via least-squares
    params = A\d;

    %// Determine error
    a = params(1); b = params(2); c = params(3);
    out = (a+0.98)./y+(b+0.7)./z+c;
    err = sum(abs(x - out));

    %// If error produced is less than current error
    %// then save parameters
    if err < cost
        cost = err;
        final_params = params;
    end
end
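
If you later prefer the sum of squared errors mentioned above over the sum of absolute differences, only the error line inside the loop needs to change, for example:

    err = sum((x - out).^2); %// squared-error cost instead of absolute differences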


When I run the above code, I get for my parameters:

final_params =

  -38.1519
  -39.1988
   19.7472

Comparing this with our x, we get:

a = final_params(1); b = final_params(2); c = final_params(3);
out = (a+0.98)./y+(b+0.7)./z+c;
disp([x out])

9.9800    9.6196
8.3000    7.9162
8.0000    8.1988
7.0000    7.0057
1.0000   -0.1667
12.8700   12.8725

As you can see, the values have improved - especially the fourth and sixth points. Compare this with the previous version:

9.9800    9.7404
8.3000    8.1077
8.0000    8.3747
7.0000    7.1989
1.0000   -0.8908
12.8700   12.8910

You can see that the first and second values are somewhat worse than in the previous version, but the other numbers are much closer to the true values.
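
If you want to compare the two fits numerically rather than by eye, you can recompute both and look at the total absolute error of each. A short sketch, assuming x, y, z and final_params are still in the workspace (params from the first script is overwritten inside the RANSAC loop, so it is recomputed here as params_ls):

%// Recompute the plain least-squares fit on all six points
A_all = [z y y.*z];
d_all = x.*y.*z - 0.98*z - 0.7*z;
params_ls = A_all \ d_all;

%// Predicted x for both parameter sets
out_ls     = (params_ls(1) + 0.98)./y + (params_ls(2) + 0.7)./z + params_ls(3);
out_ransac = (final_params(1) + 0.98)./y + (final_params(2) + 0.7)./z + final_params(3);

%// Total absolute error for each fit
disp([sum(abs(x - out_ls)) sum(abs(x - out_ransac))])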

Have fun!
