A problem on linear regression. Algorithm needed!

Views: 86 · Posted: 2019/6/11 13:38:29 · Java


Problem description

Linear Regression:

Linear regression draws a straight line through a group of data points such that the position and slope of the line minimize the sum of the squares of the vertical distances between the data points and the straight line. It fits the data in an intuitively satisfying and yet mathematically reproducible way. For linear regression to be valid, all data points should vary in exactly the same random way, and that variation should have a normal or "Gaussian" distribution, the familiar bell-shaped distribution.

To illustrate the application of linear regression, this project uses it to generate a trend line for the effect of nitrogen fertilizer on the yield of a crop of corn (maize). To guarantee that the required assumptions are met, we have created the data artificially, by adding a normally distributed random variable to a sloping straight line, with the same variance for all data points. Specifically, we added a normal random number having a standard deviation of 25 to the straight line. Here’s the equation:

y = 50 + 100 * x + randomNumber

The following plot shows one set of 10 data points, and the linear-regression fit to those data points:

[Plot not included in this copy.]

Sample session:

Enter number of data points or 0 for default: 0
Fert Yield
81 131
14 71
60 112
12 53
99 115
35 92
4 71
23 65
45 104
14 25
slope = 0.8486061764042895
yieldAt0 = 51.058940973154
yieldAtMax = 135.91955861358295
residual error = 18.87483162574109

Another sample session:

Enter number of data points or 0 for default: 10000
Fert Yield
64 139
1 52
86 121
31 97
95 126
86 166
67 118
26 95
89 179
39 95
slope = 1.0051707825618592
yieldAt0 = 50.025474774097034
yieldAtMax = 150.54255303028296
residual error = 25.0921873778027

The first sample session prints all of the data points used in the figure. The second sample session prints just the first 10 of the 10,000 points used as the basis of the regression. Of course, your random number generator will not generate the same data values as those shown above, but the four values at the bottom of your output should be close to the four values we generated, which in turn are close to the parameters used to generate the random data.
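The text above does not say how the "residual error" is computed. One reading that reproduces the value in the first sample session is the square root of the sum of squared vertical residuals divided by n - 2. The sketch below assumes that reading; the class name `ResidualErrorSketch` and the method `residualError` are hypothetical, and the slope and intercept passed in `main` are simply copied from the first sample session.

```java
// Hypothetical sketch: residual error as sqrt(sum of squared residuals / (n - 2)).
// This formula is an assumption; it is not stated in the assignment text.
class ResidualErrorSketch {

    // points[i][0] is the fertilizer value, points[i][1] is the yield.
    static double residualError(double[][] points, double slope, double y0) {
        double sumOfSquares = 0.0;
        for (double[] p : points) {
            double residual = p[1] - (y0 + slope * p[0]); // vertical distance to the line
            sumOfSquares += residual * residual;
        }
        // Two parameters (slope and intercept) were estimated, hence n - 2.
        return Math.sqrt(sumOfSquares / (points.length - 2));
    }

    public static void main(String[] args) {
        // The 10 default data points from the first sample session.
        double[][] points = {
            {81, 131}, {14, 71}, {60, 112}, {12, 53}, {99, 115},
            {35, 92}, {4, 71}, {23, 65}, {45, 104}, {14, 25}
        };
        double slope = 0.8486061764042895; // copied from the sample output
        double y0 = 51.058940973154;       // copied from the sample output
        System.out.println("residual error = " + residualError(points, slope, y0));
    }
}
```

With the default data this should print a value close to the 18.87 shown in the first sample session. For the 10,000-point run the same quantity should land near 25, the standard deviation of the added noise.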

Your job is to write the program that produces these results. To generate the first sample session above, initialize a two-dimensional array with the 10 sets of output values shown. To generate the second sample session above, import the java.util.Random class, use the zero-parameter constructor to instantiate a random-number generator, and have that generator call its nextGaussian method to generate a random variable with a Gaussian distribution whose mean value is zero and whose standard deviation is 1.0. (See Section 5.8 for more information.)
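A minimal sketch of the data generation follows, under some assumptions that the text does not spell out: x is taken to be a uniform fraction in [0, 1) from `nextDouble()`, the Fert column is 100 * x (the sample output shows fertilizer values between 0 and 100 and a fitted slope near 1), and the noise is `nextGaussian()` multiplied by 25. The class name `FertilizerData` and the method `generate` are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch of generating n artificial (fertilizer, yield) points.
class FertilizerData {
    static final double NOISE_SD = 25.0;   // standard deviation given in the text
    static final double MAX_FERT = 100.0;  // assumed from the sample output

    // Returns points[i][0] = fertilizer, points[i][1] = yield.
    static double[][] generate(int n, Random random) {
        double[][] points = new double[n][2];
        for (int i = 0; i < n; i++) {
            double x = random.nextDouble();                  // assumed: uniform in [0, 1)
            double noise = NOISE_SD * random.nextGaussian(); // mean 0, standard deviation 25
            points[i][0] = MAX_FERT * x;
            points[i][1] = 50 + 100 * x + noise;             // y = 50 + 100 * x + randomNumber
        }
        return points;
    }

    public static void main(String[] args) {
        double[][] points = generate(10000, new Random()); // zero-parameter constructor
        System.out.println("Fert Yield");
        for (int i = 0; i < 10; i++) {                     // print only the first 10 points
            System.out.printf("%d %d%n", Math.round(points[i][0]), Math.round(points[i][1]));
        }
    }
}
```

Passing the `Random` instance in as a parameter is a design choice for this sketch: the program can use `new Random()` as the assignment asks, while a seeded instance makes a run repeatable when checking results.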

Here is the basic algorithm for linear regression:

1. Find the average x (avgX) and the average y (avgY).

2. Find the x_variance, which is the sum of the squares of (x[i] - avgX), divided by the number of data points.

3. Find the x_y_covariance, which is the sum of the products (x[i] - avgX) * (y[i] - avgY), divided by the number of data points.

4. The slope of the desired regression line is slope ← x_y_covariance / x_variance

5. The y-axis intercept (value of y at x=0) of the straight line is
y0 ← avgY - slope * avgX
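The five steps can be sketched in Java roughly as follows. The slope is computed as the covariance divided by the variance, since that ordering reproduces the slope printed in the first sample session. The class name `RegressionSketch`, the method `fit`, and the constant `DEFAULT_POINTS` are hypothetical, and the maximum fertilizer value of 100 used for `yieldAtMax` is an assumption inferred from the sample output.

```java
// Hypothetical sketch of the five-step least-squares fit described above.
class RegressionSketch {

    // The 10 default data points from the first sample session: {fertilizer, yield}.
    static final double[][] DEFAULT_POINTS = {
        {81, 131}, {14, 71}, {60, 112}, {12, 53}, {99, 115},
        {35, 92}, {4, 71}, {23, 65}, {45, 104}, {14, 25}
    };

    // Returns {slope, y0} for points[i][0] = x and points[i][1] = y.
    static double[] fit(double[][] points) {
        int n = points.length;

        // Step 1: averages of x and y.
        double sumX = 0.0, sumY = 0.0;
        for (double[] p : points) {
            sumX += p[0];
            sumY += p[1];
        }
        double avgX = sumX / n;
        double avgY = sumY / n;

        // Steps 2 and 3: variance of x and covariance of x and y.
        double xVariance = 0.0, xyCovariance = 0.0;
        for (double[] p : points) {
            double dx = p[0] - avgX;
            xVariance += dx * dx;
            xyCovariance += dx * (p[1] - avgY);
        }
        xVariance /= n;
        xyCovariance /= n;

        // Steps 4 and 5: slope and y-axis intercept.
        double slope = xyCovariance / xVariance;
        double y0 = avgY - slope * avgX;
        return new double[] {slope, y0};
    }

    public static void main(String[] args) {
        double[] line = fit(DEFAULT_POINTS);
        double maxFert = 100.0; // assumed maximum fertilizer value
        System.out.println("slope = " + line[0]);
        System.out.println("yieldAt0 = " + line[1]);
        System.out.println("yieldAtMax = " + (line[1] + line[0] * maxFert));
    }
}
```

Run on the default data, this should give a slope near 0.8486 and an intercept near 51.06, in line with the first sample session.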

Now listen folks, I agree that the problem looks easy to implement with the algorithm given. My first and most important question is how to generate those 10,000 data points. My second is how to get the random number: `nextGaussian()` returns a value with standard deviation 1.0, whereas I need one with standard deviation 25.0.

I hope that is clear. Any help is much appreciated.

What I have tried:

I have tried understanding the question and even that didn't go well for me.
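On the standard-deviation question: multiplying a standard normal value by a constant scales its standard deviation by that constant, and adding a constant shifts its mean. So `25.0 * random.nextGaussian()` has mean 0 and standard deviation 25. The sketch below illustrates this by drawing many values and measuring their spread; the class name `ScaledGaussianSketch` and both helper methods are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch: scaling nextGaussian() to a chosen mean and standard deviation.
class ScaledGaussianSketch {

    // nextGaussian() has mean 0 and standard deviation 1, so this has the requested values.
    static double scaledGaussian(Random random, double mean, double standardDeviation) {
        return mean + standardDeviation * random.nextGaussian();
    }

    // Sample standard deviation (n - 1 in the denominator).
    static double sampleStandardDeviation(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;
        double sumOfSquares = 0.0;
        for (double v : values) {
            sumOfSquares += (v - mean) * (v - mean);
        }
        return Math.sqrt(sumOfSquares / (values.length - 1));
    }

    public static void main(String[] args) {
        Random random = new Random();
        double[] noise = new double[10000];
        for (int i = 0; i < noise.length; i++) {
            noise[i] = scaledGaussian(random, 0.0, 25.0);
        }
        // The printed value varies from run to run but should be close to 25.
        System.out.println("sample standard deviation = " + sampleStandardDeviation(noise));
    }
}
```

Generating 10,000 points then only needs a loop that calls this once per point and adds the result to the straight-line value for that point.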
