Applying strptime to a local data frame


Problem description

I think I have a problem related to \ that I fail to handle.

Here is an excerpt from a DateTime column of a data.frame I have read with read_csv:

earthquakes[1:20,1]
Source: local data frame [20 x 1]
                 DateTime
                    (chr)
1  1964/01/01 12:21:55.40
2  1964/01/01 14:16:27.60
3  1964/01/01 14:18:53.90
4  1964/01/01 15:49:47.90
5  1964/01/01 17:26:43.50

My goal is to extract the years here. Manually doing

> format(strptime(c("1964/01/01 12:21:55.40","1964/01/01 12:21:55.40","1964/01/01 14:16:27.60"), "%Y/%m/%d %H:%M:%OS"), "%Y")
[1] "1964" "1964" "1964"

works as intended. However,

> strptime(earthquakes[1:5,1], "%Y/%m/%d %H:%M:%OS")
DateTime 
      NA 

My hunch is that the problem is related to

as.character(earthquakes[1:5,1])
[1] "c(\"1964/01/01 12:21:55.40\", \"1964/01/01 14:16:27.60\", \"1964/01/01 14:18:53.90\", \"1964/01/01 15:49:47.90\", \"1964/01/01 17:26:43.50\")"

So it seems that the column in the data frame also contains the " characters, escaped as \". But I do not know how to proceed from here.

Given that the years are the first four entries, it would also seem OK (but less elegant, imho) to do

substr(earthquakes[1:5,1],1,4)

but that then accordingly just gives

[1] "c(\"1"

Clearly, I could do

substr(earthquakes[1:5,1],4,7)

but that would only work for the first row.

Solution

Apparently you have a dplyr::tbl_df, and by default [ never simplifies a single column of a tbl_df to an atomic vector (in contrast to [ applied to a base R data.frame). Hence, you can use either [[ or $ to extract the column, which then gives you an atomic vector.

Some examples:

data(iris)
library(dplyr)
x <- tbl_df(iris)
x[1:5, 1]
#Source: local data frame [5 x 1]
#
#  Sepal.Length
#         (dbl)
#1          5.1
#2          4.9
#3          4.7
#4          4.6
#5          5.0
iris[1:5, 1]
#[1] 5.1 4.9 4.7 4.6 5.0
x[[1]][1:5]
#[1] 5.1 4.9 4.7 4.6 5.0
x$Sepal.Length[1:5]
#[1] 5.1 4.9 4.7 4.6 5.0
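
Applied to the question, the same idea might look like the sketch below. The earthquakes object here is a hypothetical stand-in built from the first five rows of the excerpt, not the original file, and tibble() is used in place of tbl_df() as an assumption about what is available in a current dplyr. The escaped quotes in the question's as.character() output most likely come from R deparsing the whole column as a single list element, not from the data itself.

```r
library(dplyr)

# Hypothetical stand-in for the question's data (first five rows of the excerpt)
earthquakes <- tibble(
  DateTime = c("1964/01/01 12:21:55.40", "1964/01/01 14:16:27.60",
               "1964/01/01 14:18:53.90", "1964/01/01 15:49:47.90",
               "1964/01/01 17:26:43.50")
)

# [[ (or $) returns the character vector itself, not a one-column data frame,
# so strptime() receives five separate strings
dt <- strptime(earthquakes[["DateTime"]][1:5], "%Y/%m/%d %H:%M:%OS")
years <- format(dt, "%Y")
years

# The substr() shortcut from the question also works on the atomic vector
years_substr <- substr(earthquakes$DateTime[1:5], 1, 4)
years_substr
```

Either route should give one year per row. The strptime() route has the advantage of failing visibly (with NA) if a row does not match the expected format.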
