Applying strptime to a local data frame
Problem description
I think I have a problem related to the \" escape that I fail to handle.

Here is an excerpt from the DateTime column of a data.frame that I read with read_csv:

earthquakes[1:20,1]
Source: local data frame [20 x 1]

                 DateTime
                    (chr)
1  1964/01/01 12:21:55.40
2  1964/01/01 14:16:27.60
3  1964/01/01 14:18:53.90
4  1964/01/01 15:49:47.90
5  1964/01/01 17:26:43.50
My goal is to extract the years here. Doing it manually,

> format(strptime(c("1964/01/01 12:21:55.40","1964/01/01 12:21:55.40","1964/01/01 14:16:27.60"), "%Y/%m/%d %H:%M:%OS"), "%Y")
[1] "1964" "1964" "1964"
works as intended. However,

> strptime(earthquakes[1:5,1], "%Y/%m/%d %H:%M:%OS")
DateTime
      NA
My hunch is that the problem is related to

as.character(earthquakes[1:5,1])
[1] "c(\"1964/01/01 12:21:55.40\", \"1964/01/01 14:16:27.60\", \"1964/01/01 14:18:53.90\", \"1964/01/01 15:49:47.90\", \"1964/01/01 17:26:43.50\")"
So the column in the data frame apparently also contains the " character, escaped as \". But I do not know how to handle this from here.

Given that the year is the first four characters of each entry, it would also seem OK (but less elegant, imho) to do
substr(earthquakes[1:5,1],1,4)
but that then accordingly just gives
[1] "c(\"1"
Clearly, I could do
substr(earthquakes[1:5,1],4,7)
but that would only work for the first row.
Solution

Apparently you have a dplyr::tbl_df. By default, [ never simplifies a single column of a tbl_df to an atomic vector (in contrast to [ applied to a base R data.frame). Hence, you can use either [[ or $ to extract the column, which is then returned as an atomic vector.

Some examples:
data(iris)
library(dplyr)
x <- tbl_df(iris)

x[1:5, 1]
#Source: local data frame [5 x 1]
#
#  Sepal.Length
#         (dbl)
#1          5.1
#2          4.9
#3          4.7
#4          4.6
#5          5.0

iris[1:5, 1]
#[1] 5.1 4.9 4.7 4.6 5.0

x[[1]][1:5]
#[1] 5.1 4.9 4.7 4.6 5.0

x$Sepal.Length[1:5]
#[1] 5.1 4.9 4.7 4.6 5.0
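
Applied to the question's data, the same idea might look like the sketch below. It makes a few assumptions: the earthquakes object here is a small hypothetical stand-in built from the timestamps quoted in the question, and a base data.frame subset with drop = FALSE is used to imitate the non-simplifying behaviour of [ on a tbl_df, so that the example runs without dplyr. It also suggests that the \" in the question is not stored in the data: the one-column data frame is deparsed into a single string, and the backslashes only show up when that string is printed.

```r
# Hypothetical stand-in for the question's data; only the DateTime values
# are taken from the original post.
earthquakes <- data.frame(
  DateTime = c("1964/01/01 12:21:55.40",
               "1964/01/01 14:16:27.60",
               "1964/01/01 14:18:53.90"),
  stringsAsFactors = FALSE
)

# drop = FALSE keeps a one-column data frame, imitating what `[` returns
# for a tbl_df (assumption: this mirrors the situation in the question).
sub_df <- earthquakes[1:3, 1, drop = FALSE]

# A data frame is a list, so as.character() deparses the whole column into
# one string; the escaped quotes are only part of how that string prints.
deparsed <- as.character(sub_df)
length(deparsed)        # a single string rather than one per row
substr(deparsed, 1, 4)  # the first four characters of the deparsed string

# Extracting the column with [[ (or $) yields a plain character vector,
# which strptime() can parse element by element.
dt <- earthquakes[["DateTime"]]
years <- format(strptime(dt, "%Y/%m/%d %H:%M:%OS"), "%Y")
years
```

With a real tbl_df, earthquakes$DateTime or earthquakes[[1]] should play the role of dt above.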