Removing Whitespace From a Whole Data Frame in R

Problem description

I've been trying to remove the white space in a data frame (using R). The data frame is large (>1 GB) and has multiple columns that contain white space in every data entry.

Is there a quick way to remove the white space from the whole data frame? I've been trying to do this on a subset of the first 10 rows of data using:

gsub( " ", "", mydata) 

This didn't seem to work, although R returned an output which I have been unable to interpret.

str_replace( " ", "", mydata)

R returned 47 warnings and did not remove the white space.

erase_all(mydata, " ")

R returned an error saying 'Error: could not find function "erase_all"'

I would really appreciate some help with this, as I've spent the last 24 hours trying to tackle this problem.

Thanks!

Solution

If I understood you correctly, you want to remove all the white space from the entire data frame. I guess the code you are using is suited to removing spaces from a character vector such as the column names, not from a whole data frame. I think you should try this:

apply(mydata, 2, function(x) gsub('\\s+', '', x))

Hope this works.

This returns a matrix, however, and apply converts every column, numeric ones included, to character. If you want a data frame back, do:

as.data.frame(apply(mydata, 2, function(x) gsub('\\s+', '', x)))
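To make the two points above concrete, here is a minimal sketch on a small, made-up data frame (`toy` is a hypothetical name, not the OP's data). It shows what `gsub` does when handed a whole data frame, and what `apply` returns. It is an illustration under these assumptions, not a diagnosis of the OP's exact output.

```r
# Toy data (hypothetical; the OP's real data was not provided)
toy <- data.frame(val = c(" abc", " kl m", "dfsd "),
                  num = 1:3,
                  stringsAsFactors = FALSE)

# gsub() on the whole data frame: the data frame is coerced with
# as.character(), so the result has one string per COLUMN, not per cell
whole <- gsub(" ", "", toy)
length(whole)
print(whole)

# apply() works cell by cell, but only after the data frame has been
# converted to a character matrix, so num is no longer an integer column
m <- apply(toy, 2, function(x) gsub("\\s+", "", x))
is.matrix(m)
typeof(m)

cleaned <- as.data.frame(m, stringsAsFactors = FALSE)
sapply(cleaned, class)
```

If the numeric columns need to stay numeric, a column-wise approach that touches only the character columns avoids this coercion.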

EDIT in 2020:

Using lapply with the trimws function (whose default is which = "both") removes leading and trailing whitespace but not whitespace inside a string. Since the OP provided no input data, I am adding a dummy example to produce the results.
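As a quick illustration of the difference on a plain character vector, before applying anything to whole columns, here is a small sketch. The vector `x` and the result names are made up for this example; `trimws` and its `which` argument are base R.

```r
x <- c(" abc", " kl m", "dfsd ")

both_ends  <- trimws(x)                   # default: which = "both"
left_only  <- trimws(x, which = "left")   # leading whitespace only
right_only <- trimws(x, which = "right")  # trailing whitespace only
everywhere <- gsub("\\s+", "", x)         # regex: every run of whitespace

print(both_ends)    # "kl m" keeps its inner space
print(everywhere)   # "kl m" becomes "klm"
```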

DATA (each situation below starts from this original df):

df <- data.frame(val = c(" abc"," kl m","dfsd "),val1 = c("klm ","gdfs","123"),num=1:3,num1=2:4,stringsAsFactors = FALSE)

# Situation 1 (using base R): when we want to remove spaces only at the leading and trailing ends, NOT inside the string values, we can use trimws

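# names of the character columns only; numeric columns are left untouched
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
# df[cols] (list-style indexing) also works when there is a single character column
df[cols_to_be_rectified] <- lapply(df[cols_to_be_rectified], trimws)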

# Situation 2 (using base R): when we want to remove spaces everywhere in the character columns of the data frame (inside a string as well as at the leading and trailing ends).

(This is what the initial solution using apply did. Please note that a solution using apply seems to work but would be very slow on a large data frame. Also, it is not very clear from the question whether the OP wanted to remove only leading/trailing blanks or every blank in the data.)

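# names of the character columns only; numeric columns are left untouched
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
# '\\s+' matches any run of whitespace, so inner spaces are removed as well
df[cols_to_be_rectified] <- lapply(df[cols_to_be_rectified], function(x) gsub('\\s+', '', x))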
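The remark about apply being slow can be checked with a rough timing sketch like the one below. The data frame `big` and the helper `strip` are made up for illustration, and the timings depend on the machine and the data, so no numbers are claimed here. The sketch also confirms that both approaches clean the character columns identically while only the lapply version keeps the integer column as an integer.

```r
set.seed(1)
n <- 20000

# Synthetic data (hypothetical; stands in for the OP's large data frame)
big <- data.frame(
  a  = paste0(" ", sample(letters, n, replace = TRUE), " x "),
  b  = paste0(sample(LETTERS, n, replace = TRUE), " y"),
  id = seq_len(n),
  stringsAsFactors = FALSE
)

strip <- function(x) gsub("\\s+", "", x)

# apply(): converts the whole data frame to a character matrix first
t_apply <- system.time(
  via_apply <- as.data.frame(apply(big, 2, strip), stringsAsFactors = FALSE)
)

# lapply(): touches only the character columns and leaves id alone
chr <- names(big)[vapply(big, is.character, logical(1))]
via_lapply <- big
t_lapply <- system.time(
  via_lapply[chr] <- lapply(big[chr], strip)
)

# Elapsed seconds for each approach (values vary from run to run)
print(c(apply = t_apply[["elapsed"]], lapply = t_lapply[["elapsed"]]))

# Same cleaned strings, different column types for id
identical(via_apply$a, via_lapply$a)
class(via_apply$id)
class(via_lapply$id)
```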

## Situation 1 (using data.table, removing only leading and trailing blanks)

library(data.table)
setDT(df)
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,c(cols_to_be_rectified) := lapply(.SD, trimws), .SDcols = cols_to_be_rectified]

Output from situation 1:

    val val1 num num1
1:  abc  klm   1    2
2: kl m gdfs   2    3
3: dfsd  123   3    4

## Situation 2 (using data.table, removing every blank inside as well as leading/trailing blanks)

cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,c(cols_to_be_rectified) := lapply(.SD, function(x)gsub('\\s+', '', x)), .SDcols = cols_to_be_rectified]

Output from situation 2:

    val val1 num num1
1:  abc  klm   1    2
2:  klm gdfs   2    3
3: dfsd  123   3    4

Note the difference between the outputs of the two situations in row 2: with trimws we remove only the leading and trailing blanks ("kl m"), whereas the regex solution removes every blank ("klm").
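The two base R situations can be folded into one small helper. This is only a sketch: `strip_whitespace` and its `mode` argument are hypothetical names introduced here, not part of base R or data.table, and the function assumes a plain data.frame rather than a data.table.

```r
# Hypothetical helper: clean the character columns of a plain data.frame.
#   mode = "ends" -> trimws(): leading/trailing whitespace only
#   mode = "all"  -> gsub('\\s+', ''): every run of whitespace
strip_whitespace <- function(data, mode = c("ends", "all")) {
  mode <- match.arg(mode)
  chr <- names(data)[vapply(data, is.character, logical(1))]
  clean <- if (mode == "ends") trimws else function(x) gsub("\\s+", "", x)
  if (length(chr) > 0) {
    data[chr] <- lapply(data[chr], clean)
  }
  data
}

# Same dummy data as in the answer
df <- data.frame(val = c(" abc", " kl m", "dfsd "),
                 val1 = c("klm ", "gdfs", "123"),
                 num = 1:3, num1 = 2:4,
                 stringsAsFactors = FALSE)

ends_only  <- strip_whitespace(df, "ends")
everything <- strip_whitespace(df, "all")

ends_only$val[2]    # inner space kept
everything$val[2]   # inner space removed
```

Because the helper returns a modified copy, the original `df` is left unchanged, which makes it easy to compare both modes side by side.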

I hope this helps. Thanks!
