Non-linear regression models in PostgreSQL using R

Question

I have climate data (temperature, precipitation, snow depth) for all of Canada between 1900 and 2009. I have written a basic website and the simplest page allows users to choose category and city. They then get back a very simple report (without the parameters and calculations section):

The primary purpose of the web application is to provide a simple user interface so that the general public can explore the data in meaningful ways. (A list of numbers is not meaningful to the general public, nor is a website that provides too many inputs.) The secondary purpose of the application is to provide climatologists and other scientists with deeper ways to view the data. (Using too many inputs, of course.)

The database is PostgreSQL with R (mostly) installed. The reports are written using iReport and generated using JasperReports.

Currently, a linear regression model is applied against annual averages of daily data. The linear regression model is calculated within a PostgreSQL function as follows:

SELECT 
  regr_slope( amount, year_taken ),
  regr_intercept( amount, year_taken ),
  corr( amount, year_taken )
FROM
  temp_regression
INTO STRICT slope, intercept, correlation;

The results are returned to JasperReports using:

SELECT
  year_taken,
  amount,
  year_taken * slope + intercept,
  slope,
  intercept,
  correlation,
  total_measurements
INTO result;

JasperReports calls into PostgreSQL using the following parameterized analysis function:

SELECT
  year_taken,
  amount,
  measurements,
  regression_line,
  slope,
  intercept,
  correlation,
  total_measurements,
  execute_time
FROM
  climate.analysis(
    $P{CityId},
    $P{Elevation1},
    $P{Elevation2},
    $P{Radius},
    $P{CategoryId},
    $P{Year1},
    $P{Year2}
  )
ORDER BY year_taken

This is not an optimal solution because it gives the false impression that the climate is changing at a slow, but steady rate.

Using functions that take two parameters (e.g., year [X] and amount [Y]), such as PostgreSQL's regr_slope:

  • What is a better regression model to apply?
  • What CRAN (R) packages provide such models? (Installable, ideally, using apt-get.)
  • How can the R functions be called within a PostgreSQL function?

If no such functions exist:


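  • What parameters should I try in order to obtain a function that produces the desired fit?
  • How would you recommend showing the line of best fit?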

Keep in mind that this is a web app for use by the general public. If the only way to analyse the data is from an R shell, then the purpose has been defeated. (I know this is not the case for most R functions I have looked at so far.)

Thanks!

Recommended answer

The awesome PL/R package allows you to run R inside PostgreSQL as a procedural language. There are some gotchas, because R likes to think about data in terms of vectors, which is not what an RDBMS does. It is still a very useful package, as it gives you R inside of PostgreSQL and saves you some of the roundtrips in your architecture.

And pl/r is apt-get-able for you as it has been part of Debian / Ubuntu for a while. Start with apt-cache show postgresql-8.4-plr (that is on testing, other versions/flavours have it too).
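Once the package is installed, calling R from a PostgreSQL function could look roughly like the sketch below. It assumes PostgreSQL 9.1 or later, where PL/R can be enabled with CREATE EXTENSION (on 8.4 the plr.sql script shipped with the package is loaded instead). The function name r_median is hypothetical and only illustrates how an array argument reaches R as a vector.

```sql
-- Assumption: PostgreSQL 9.1+ with the PL/R package already installed.
CREATE EXTENSION IF NOT EXISTS plr;

-- Hypothetical helper: the function body is plain R.
-- The named argument "vals" is visible inside R as a numeric vector,
-- and the value of the last R expression becomes the return value.
CREATE OR REPLACE FUNCTION r_median(vals float8[])
RETURNS float8 AS $$
  median(vals)
$$ LANGUAGE plr STRICT;

-- Called like any other SQL function:
SELECT r_median(ARRAY[1.0, 3.0, 2.0]::float8[]);
```

Because the function is ordinary SQL from the caller's side, it can be used inside an existing PL/pgSQL function in the same way as regr_slope.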

As for the appropriate modeling: that is a whole different ballgame. loess is a fair suggestion for something non-parametric, and you probably also want some sort of dynamic model, either ARMA/ARIMA or lagged regression. The choice of modeling is pretty critical given how politicized the topic is.
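As an illustration of the loess suggestion, here is a minimal sketch of a PL/R function that returns the smoothed value for each input point. The function name climate_loess and the parameter smoothing_span are assumptions made for this example; temp_regression, year_taken and amount come from the question. It assumes the arrays contain no NULL elements, since R's loess drops missing values by default and the result would then be shorter than the input.

```sql
-- Hypothetical PL/R wrapper around R's loess() (stats package, part of base R).
-- x and y arrive in R as numeric vectors of equal length.
CREATE OR REPLACE FUNCTION climate_loess(
  x float8[],
  y float8[],
  smoothing_span float8
) RETURNS float8[] AS $$
  fit <- loess(y ~ x, span = smoothing_span)
  # fitted() returns one smoothed value per input point, in input order
  as.numeric(fitted(fit))
$$ LANGUAGE plr STRICT;

-- Sketch of a call: aggregate the yearly rows into arrays, fit once,
-- then expand the arrays back into rows.
-- Assumes PostgreSQL 9.0+ for ORDER BY inside array_agg.
SELECT
  unnest(s.years)  AS year_taken,
  unnest(s.fitted) AS loess_line
FROM (
  SELECT
    array_agg(year_taken ORDER BY year_taken) AS years,
    climate_loess(
      array_agg(year_taken::float8 ORDER BY year_taken),
      array_agg(amount::float8 ORDER BY year_taken),
      0.75
    ) AS fitted
  FROM temp_regression
) s;
```

The span of 0.75 is simply R's default for loess; a smaller value follows the data more closely, and a larger one produces a smoother curve. The resulting loess_line column could take the place of the straight regression line in the report.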
