转换年龄,在R中输入为“ X周,Y天,Z小时” [英] Convert age entered as 'X Weeks, Y Days, Z hours' in R

查看:88
本文介绍了转换年龄,在R中输入为“ X周,Y天,Z小时”的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个年龄变量,其中包含遵循以下(不一致)格式的观察结果:

I have an age variable containing observations that follow this (inconsistent) format:

3 weeks, 2 days, 4 hours
4 weeks, 6 days, 12 hours
3 days, 18 hours
4 days, 3 hours
7 hours
8 hours

我需要使用R将每个观察值转换为小时。

I need to convert each observation to hours using R.

strsplit(vector,',')在每个逗号处拆分变量。

I have used strsplit(vector, ',') to split the variable at each comma.

我遇到了麻烦,因为将每个观察结果拆分为,每个观察结果会产生1到3个条目。我不知道如何正确地为这些条目建立索引,以至于每个观察结果最后一行。

I am running trouble because splitting each observation at the ',' yields anywhere from 1 to 3 entries for each observation. I do not know how to properly index these entries so that I end up with one row for each observation.

我猜想一旦能够将这些值存储在合理的行,我可以从一行的每一列中提取数值数据并进行相应的转换,然后对整个行求和。

I am guessing that once I am able to store these values in sensible rows, I can extract the numeric data from each column in a row and convert accordingly, then sum the entire row.

我也欢迎采用任何其他方法这个问题。

I am also open to any different methods of approaching this problem.

推荐答案

Hadleyverse再次求助:

Hadleyverse comes to the rescue again:

library(lubridate)
library(stringr)

dat <- readLines(textConnection(" 3   weeks,   2  days,  4 hours
 4 week,  6 days,  12 hours 
3 days, 18 hours
4 day, 3 hours
 7 hours
8  hour"))

sapply(str_split(str_trim(dat), ",[ ]*"), function(x) {
  sum(sapply(x, function(y) {
    bits <- str_split(str_trim(y), "[ ]+")[[1]]
    duration(as.numeric(bits[1]), bits[2])
  })) / 3600
})

## [1] 556 828  90  99   7   8

我对数据进行了一些修改,以表明它在解析事物方面也比较灵活。我不认为第二个 str_trim 绝对不是必需的,但是没有验证周期。

I whacked the data a bit to show it's also somewhat flexible in how it parses things. I rly don't think the second str_trim is absolutely necessary but didn't have cycles to verify.

解释是它会修剪原始向量,然后将其拆分为多个分量(从而形成向量列表)。然后对该列表进行迭代,并进一步修剪各个矢量元素,并将其分为#和单位持续时间。传递给lubridate并返回值,并通过调用 sum 自动将其转换为数字秒,然后将其换算为小时。

The exposition is that it trims the original vector then splits it into components (which makes a list of vectors). That list is then iterated over and the individual vector elements are further trimmed and split into # and unit duration. That's passed to lubridate and the value is returned and automatically converted to numeric seconds by the call to sum and we then make it into hours.

这篇关于转换年龄,在R中输入为“ X周,Y天,Z小时”的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆