Using gsub on columns in R


Question

I have a data frame (data) in R with thousands of rows and 10 columns. Nine of the columns contain factors with several levels.

Here is a small portion of the data frame.

   gr1
10 303.90
10 303.90
11 304.1
12 303.6
13 303.90磅
13 303.90 obs
14 303.90k
14 303.90k

As an example, one factor has a level "303.90" and another level "303.90 obs". I want to change "303.90 obs" to "303.90". I am using the following command to edit the level names.

data[] = as.data.frame(lapply(data, function(x) {x = gsub("303.90 obs","303.90", fixed = T, x)}))

But this does not change the level "303.90 obs" to "303.90". It just stays the same. The same command does work for other strings, e.g. "303.9" gets changed to "303.90" when I use:

data[] = as.data.frame(lapply(data, function(x) {x = gsub("303.9 obs","303.90", fixed = T, x)}))

Any suggestions as to why this might be?

Recommended answer

I'm not that familiar with lapply, so my solution simply loops over the columns of the data frame. This works as it should.

col1 <- 1:10
col2 <- 21:30
col3 <- c("503.90", "303.90 obs", "803.90sfsdf sf", "203.90 obs", "303.90", "103.90 obs", "303.90", "403.90 obs", "803.90sfsdf sf", "303.90 obs")
col4 <- c("303.90", "303.90 obs", "303.90", "203.90 obs", "303.90", "107.40fghfg", "303.90", "303.90 obs", "303.90", "303.90 obs")

data <- data.frame(col1, col2, col3, col4)

# Store the two text columns as factors, as in the question
data$col3 <- as.factor(data$col3)
data$col4 <- as.factor(data$col4)

# For each factor column, keep only the first number (digits, dot, digits)
for (i in 3:4) {
  values <- as.character(data[, i])
  matchedExpression <- regexpr(pattern = "\\d+\\.\\d+", text = values)
  # regmatches() returns only the matched substrings, so every value must contain a number
  data[, i] <- as.factor(regmatches(x = values, m = matchedExpression))
}
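
For readers who prefer the lapply style used in the question, the same idea can be sketched by editing the factor levels directly instead of looping over column indices. This is only a minimal sketch: the helper name clean_levels and the small example data frame are hypothetical, and it assumes every level that should be cleaned contains a number of the form digits, dot, digits. Assigning duplicate names to levels() merges those levels, which is what collapses "303.90 obs" into "303.90".

```r
# Hypothetical example data, modelled on col3/col4 from the answer
data <- data.frame(
  col1 = 1:4,
  col3 = factor(c("503.90", "303.90 obs", "303.90", "803.90sfsdf sf")),
  col4 = factor(c("303.90", "303.90 obs", "107.40fghfg", "303.90"))
)

# Hypothetical helper: reduce each factor level to the first number it contains.
# Non-factor columns are returned unchanged.
clean_levels <- function(x) {
  if (is.factor(x)) {
    # Duplicate level names produced here are merged by `levels<-`
    levels(x) <- sub("^[^0-9]*([0-9]+\\.[0-9]+).*$", "\\1", levels(x))
  }
  x
}

# data[] keeps the data frame structure while replacing every column
data[] <- lapply(data, clean_levels)
```

Unlike the regmatches() loop, a level without any number is left as it is rather than dropped, so the column length can never change.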

EDIT

OP changed the description. To change all factor levels to 303.90, a regex is the better solution. However, more information from the OP is needed to give a general solution, e.g. is it only 303.90 that should be changed?
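
To illustrate the difference between a literal and a regex replacement, here is a small sketch on a made-up character vector. The values in x are assumptions chosen for illustration, including one with two spaces before "obs"; whether the OP's data contains such variants is not known.

```r
# Made-up values for illustration; the third one has two spaces before "obs"
x <- c("303.90", "303.90 obs", "303.90  obs", "304.1")

# Literal replacement: only the exact string "303.90 obs" (one space) is changed
fixed_result <- gsub("303.90 obs", "303.90", x, fixed = TRUE)

# Regex replacement: remove any whitespace followed by "obs" at the end of a value
regex_result <- sub("\\s*obs$", "", x)
```

With fixed = TRUE the pattern has to match character for character, so the two-space variant is left untouched, while the regex version handles both.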

EDIT2

Updated the script since the OP provided more information, e.g. columns can have factor levels other than 303.90.
