Is there a specific way to handle timestamp columns in R when pulling data using RPostgreSQL?


Problem description

I'm trying to pull data from a PostgreSQL database, and the results for a timestamp field are inconsistent. Either I'm not handling the POSIXct results properly, or I have found a bug in the RPostgreSQL package. Here is how to reproduce the issue:

Suppose a postgres database contains a table with a single field (run this in PostgreSQL):

CREATE DATABASE mydb;
-- Connect to mydb before creating the table (in psql: \c mydb)
CREATE TABLE test_table
(   
  "DateTime" timestamp without time zone NOT NULL,
  CONSTRAINT "pk_test_table" PRIMARY KEY ("DateTime")
)
WITH (
  OIDS=FALSE
);
ALTER TABLE test_table
  OWNER TO postgres;

Let's say the table holds a few hundred records. I populate them from R with the following code:

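library(RPostgreSQL)
library(chron)  # provides as.chron(), years(), days(), hours(), minutes()

# sQuote()/dQuote() may emit curly quotes by default, which SQL does not accept
options(useFancyQuotes = FALSE)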

# Let's feed the table with some sequence of date/time values
date_values <-  as.chron(seq(10000, 10500, 1/24))

format.chron <- function(z)  {
  sprintf("%04.0f-%02.0f-%02.0f %02.0f:%02.0f:00", 
            as.numeric(as.character(years(z))), 
            as.numeric(months(z)),  # months() returns an ordered factor; use its integer code
            as.numeric(as.character(days(z))), 
            as.numeric(as.character(hours(z))), 
            as.numeric(as.character(minutes(z))))
}

.generateInsertQuery <- function(date_values, field_name, table_name) {
  insert_val  <- paste(paste0("(", sQuote(format(date_values)), ")"), collapse=',')
  qry         <- paste("INSERT INTO", dQuote(table_name), paste0("(", dQuote(field_name), ")"), "VALUES", insert_val)
  qry
}

drv <- dbDriver('PostgreSQL')
con <- dbConnect(drv, user='postgres', dbname='mydb')
qry <- .generateInsertQuery(date_values, "DateTime", "test_table")
# Release the result set returned by the INSERT before issuing further queries
dbClearResult(dbSendQuery(con, qry))
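For comparison, the same INSERT statement can be assembled with base R only, which avoids both the chron dependency and the quoting behaviour of sQuote()/dQuote(). This is a minimal sketch, not the original poster's method: the helper name build_insert, the start date, the UTC time zone, and the five-row length are all assumptions made for illustration.

```r
# Hourly timestamps; start value, time zone, and length are arbitrary choices
date_values <- seq(as.POSIXct("1997-05-19 00:00:00", tz = "UTC"),
                   by = "hour", length.out = 5)

# Hypothetical helper (not part of RPostgreSQL): builds one multi-row INSERT
build_insert <- function(values, field_name, table_name) {
  # Explicit format string, so midnight values keep their time component
  stamps <- format(values, "%Y-%m-%d %H:%M:%S", tz = "UTC")
  # Plain ASCII single quotes around each literal, one tuple per row
  rows <- paste0("('", stamps, "')", collapse = ", ")
  sprintf('INSERT INTO "%s" ("%s") VALUES %s', table_name, field_name, rows)
}

qry <- build_insert(date_values, "DateTime", "test_table")
```

The resulting string can be passed to dbSendQuery() exactly as above. Building SQL by pasting strings is acceptable here only because the values are generated locally.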

If I try to get the values back, the time component is stripped from the resulting data:

res <- dbGetQuery(con, "SELECT * FROM test_table")
res[1:20,1]

The class of the result, however, is POSIXct:

class(res[,1])

If the result is fetched one record at a time, the values whose hour:minute equals 00:00 lose the time component:

rs <- dbSendQuery(con, "SELECT \"DateTime\" FROM test_table")
res_list <- list()
for(i in 1:100) res_list[i]  <- fetch(rs,1)
res_list

As a workaround, I'm fetching the result one record at a time, fixing each value, and aggregating them into a data.frame. But this is very time-consuming, especially for large data sets. Any ideas why this is happening and how to deal with it?

Thanks in advance!

Recommended answer

First off, the RPostgreSQL project has a mailing list; I suggest you post there.

PostgreSQL has two datetime types: timestamp with and without time zone. As I recall, R only maps the latter. I wrote some early regression tests for this (see the package source) but have not been very involved with the project lately. I do recall that POSIXct maps back and forth to the PostgreSQL datetime type just fine.
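One thing that can be checked on the R side, without touching the database, is whether the time component is really gone or only hidden when printing. Base R's default formatting of a POSIXct vector omits the time when every element falls exactly on midnight, which would match the symptom seen when fetching one record at a time. Whether that is what happens in the question is an assumption. The sketch below uses made-up values standing in for a fetched "DateTime" column.

```r
# Hypothetical values standing in for a fetched "DateTime" column
x <- as.POSIXct(c("1997-05-19 00:00:00", "1997-05-19 01:00:00"), tz = "UTC")

# Default formatting drops the time only when every element is at midnight
only_midnight <- format(x[1])  # a single midnight value: date without time
mixed <- format(x)             # not all midnight: time shown for every element

# An explicit format string always shows the time component
shown <- format(x, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# The stored value (seconds since the epoch) is unaffected by how it prints
diff_secs <- as.numeric(x[2]) - as.numeric(x[1])
```

If an explicit format shows the expected times for the fetched column, the data came back intact and only the printed representation differs.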
