Loop to create columns/variables by extracting characters from given column in R


Problem description

My data set looks like this:

                   key      date   census  j
 1: 01_35004_10-14_+_M 11NOV2001 2.934397 01
 2: 01_35004_10-14_+_M 06JAN2002 3.028231 01
 3: 01_35004_10-14_+_M 07APR2002 3.180712 01
 4: 01_35004_10-14_+_M 02JUN2002 3.274546 01
 5: 01_35004_10-14_+_M 28JUL2002 3.368380 01
 6: 01_35004_10-14_+_M 22SEP2002 3.462214 01
 7: 01_35004_10-14_+_M 22DEC2002 3.614694 01
 8: 01_35004_10-14_+_M 16FEB2003 3.708528 01
 9: 01_35004_10-14_+_M 13JUL2003 3.954843 01
10: 01_35004_10-14_+_M 07SEP2003 4.048677 01

Certain substrings of the "key" column correspond to different variables. For instance, 01 is the State, 35004 is the Zip Code, 10-14 is the Age Group, + is the Race, and M is the Gender.

I want to extract each of these pieces into its own variable (e.g. a column for State filled with 01, a column for Zip Code filled with 35004, etc.).

Here is my code:

Var = c("State","Zip_Code", "Age_Group", "Race", "Gender")
for(j in Var){
play$j = gsub("_.*$","",play$key) 
}

Clearly this is not correct. I would like the loop to iterate through each observation in the "key" column and produce a variable with the extracted character associated with the variable.

Solution

A basic solution (not tuned for performance) uses read.csv:

# excerpt of your data (only the "key" column)
x <- c("01_35004_10-14_+_M", "01_35004_10-14_+_M")

# a simple way is to treat each string as a CSV row with "_" as the separator;
# colClasses = "character" keeps leading zeros such as "01"
y <- read.csv(text = x, sep = "_", header = FALSE, colClasses = "character")

# replace the default column names (V1..V5)
names(y) <- c("State", "Zip_Code", "Age_Group", "Race", "Gender")

# now recode one example column by using a translation ("lookup") table
gender.lookup <- data.frame(gender.code = c("M", "F"),
                            gender.name = c("Male", "Female"),
                            stringsAsFactors = FALSE)

# add the recoded value as a new column; codes missing from the lookup table become NA
y$GenderName <- gender.lookup$gender.name[match(y$Gender, gender.lookup$gender.code)]
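If you would rather keep the loop from the question, the following is a minimal sketch of one way to do it, not the only one. It assumes every key has exactly five "_"-separated pieces; the two-row `play` table is a hypothetical stand-in for the real data. The loop in the question fails for two reasons: `play$j` always refers to a column literally named "j", regardless of the loop variable, and the `gsub` pattern strips everything after the first underscore, so every iteration yields the state code.

```r
# Hypothetical stand-in for the asker's `play` table (only the key column).
play <- data.frame(
  key = c("01_35004_10-14_+_M", "01_35004_10-14_+_M"),
  stringsAsFactors = FALSE
)

Var <- c("State", "Zip_Code", "Age_Group", "Race", "Gender")

# Split every key on the literal underscore; each list element holds five pieces.
parts <- strsplit(play$key, "_", fixed = TRUE)

# play[[name]] uses the value of the loop variable as the column name,
# and the i-th piece of every key fills the i-th variable.
for (i in seq_along(Var)) {
  play[[Var[i]]] <- vapply(parts, `[`, character(1), i)
}
```

Because `strsplit` returns character vectors, codes such as "01" keep their leading zero.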

I am leaving the implementation of the loop to your imagination, since your question does not include lookup data for the other columns (e.g. use lapply and a list of lookup tables with the same index positions as the column indices).
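As a rough sketch of that idea, the lookup tables can be kept in a list and applied with lapply. This version matches tables to columns by name rather than by index position, which is a swapped-in design choice. Only the gender codes come from the example above; the state table, the "99" code, and the names `lookups`, `code` and `name` are illustrative assumptions.

```r
# Small example table; the second row is made up to show a failed lookup.
y <- data.frame(
  State  = c("01", "99"),
  Gender = c("M", "F"),
  stringsAsFactors = FALSE
)

# One lookup table per column to recode, keyed by column name.
# The State entry is an illustrative assumption, not data from the question.
lookups <- list(
  Gender = data.frame(code = c("M", "F"), name = c("Male", "Female"),
                      stringsAsFactors = FALSE),
  State  = data.frame(code = "01", name = "Alabama",
                      stringsAsFactors = FALSE)
)

# For each column with a lookup table, translate codes to names via match();
# codes absent from the table become NA.
y[paste0(names(lookups), "Name")] <- lapply(names(lookups), function(col) {
  tab <- lookups[[col]]
  tab$name[match(y[[col]], tab$code)]
})
```

Keying the list by column name means a column without a lookup table is simply skipped, and reordering the columns of `y` does not break the recoding.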
