Removing Whitespace From a Whole Data Frame in R


Question

I've been trying to remove the whitespace in a data frame (using R). The data frame is large (>1 GB) and has multiple columns that contain whitespace in every data entry.

Is there a quick way to remove the white space from the whole data frame? I've been trying to do this on a subset of the first 10 rows of data using:

gsub( " ", "", mydata) 

This didn't seem to work, although R returned an output which I have been unable to interpret.

str_replace( " ", "", mydata)

R returned 47 warnings and did not remove the white space.

erase_all(mydata, " ")

R returned an error saying 'Error: could not find function "erase_all"'

I would really appreciate some help with this as I've spent the last 24hrs trying to tackle this problem.

Thanks!

Solution

If I understand correctly, you want to remove all whitespace from the entire data frame. I suspect the code you are using is better suited to removing spaces from column names. Try this instead:

 apply(myData, 2, function(x) gsub('\\s+', '', x))

Hope this works.

Note that this returns a matrix. To convert the result back to a data frame:

as.data.frame(apply(myData, 2, function(x) gsub('\\s+', '', x)))
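As a rough illustration of why the original `gsub(" ", "", mydata)` call produced confusing output, and of what the matrix conversion costs, here is a minimal base R sketch. The toy `mydata` values are made up for the example; the asker's real data was not shown.

```r
# Toy data standing in for the asker's `mydata` (hypothetical values).
mydata <- data.frame(name = c(" a b", "c d "), n = c(1, 10),
                     stringsAsFactors = FALSE)

# gsub() coerces its input with as.character(). For a data frame that
# deparses each column into one string, so the result has one element
# per column rather than one per cell.
whole <- gsub(" ", "", mydata)
length(whole) == ncol(mydata)

# apply() first converts the data frame to a matrix. With mixed column
# types that is a character matrix, so `n` is no longer numeric.
m <- apply(mydata, 2, function(x) gsub("\\s+", "", x))
is.character(m)

# Converting back gives character columns; numeric ones must be restored.
cleaned <- as.data.frame(m, stringsAsFactors = FALSE)
cleaned$n <- as.numeric(cleaned$n)
```

On a data frame with numeric columns, the `apply` route therefore needs an extra type-restoring step, which is one reason to prefer a column-wise approach.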

EDIT In 2020:

Using lapply with the trimws function (whose default is which = "both") removes leading and trailing whitespace but not whitespace inside the strings. Since the OP provided no input data, I am adding a dummy example to produce the results.

DATA:

df <- data.frame(val = c(" abc"," kl m","dfsd "),val1 = c("klm ","gdfs","123"),num=1:3,num1=2:4,stringsAsFactors = FALSE)

# situation: 1 (Using Base R): to remove whitespace only at the leading and trailing ends, NOT inside the string values, use trimws

cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,cols_to_be_rectified] <- lapply(df[,cols_to_be_rectified], trimws)

# situation: 2 (Using Base R): to remove whitespace everywhere in the character columns of the data frame (inside the strings as well as at the leading and trailing ends)

(This is the approach of the initial solution, which used apply. The apply version works but is slow. Also, the question does not make clear whether the OP wanted to remove only leading/trailing blanks or every blank in the data.)

cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,cols_to_be_rectified] <- lapply(df[,cols_to_be_rectified], function(x) gsub('\\s+', '', x))
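The two base R situations above could be wrapped in a single helper, sketched below. The name `strip_ws` and its `inner` argument are hypothetical and not part of the original answer; the sketch also uses list-style indexing (`df[is_chr]`), which, unlike `df[, cols]`, still returns a data frame when only one character column is present.

```r
# Hypothetical helper: strip whitespace from character columns only,
# leaving numeric and other column types untouched.
strip_ws <- function(df, inner = FALSE) {
  is_chr <- vapply(df, is.character, logical(1))
  if (!any(is_chr)) return(df)          # nothing to clean
  clean <- if (inner) {
    function(x) gsub("\\s+", "", x)     # remove every run of whitespace
  } else {
    trimws                              # leading/trailing only
  }
  df[is_chr] <- lapply(df[is_chr], clean)
  df
}

# Small example with a single character column (made-up values).
df_demo <- data.frame(val = c(" abc", " kl m", "dfsd "), num = 1:3,
                      stringsAsFactors = FALSE)
trimmed  <- strip_ws(df_demo)                # situation 1
stripped <- strip_ws(df_demo, inner = TRUE)  # situation 2
```

Because only the character columns are touched, `num` keeps its integer type in both results.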

## situation: 1 (Using data.table, removing only leading and trailing blanks)

library(data.table)
setDT(df)
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,c(cols_to_be_rectified) := lapply(.SD, trimws), .SDcols = cols_to_be_rectified]

Output from situation 1:

    val val1 num num1
1:  abc  klm   1    2
2: kl m gdfs   2    3
3: dfsd  123   3    4

## situation: 2 (Using data.table, removing every blank inside as well as leading/trailing blanks)

cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,c(cols_to_be_rectified) := lapply(.SD, function(x) gsub('\\s+', '', x)), .SDcols = cols_to_be_rectified]

Output from situation 2:

    val val1 num num1
1:  abc  klm   1    2
2:  klm gdfs   2    3
3: dfsd  123   3    4

Note the difference between the two outputs in row 2: trimws removes only the leading and trailing blanks ("kl m"), whereas the regex solution removes every blank ("klm").

I hope this helps. Thanks!
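The same difference can be seen on a plain character vector, without any data frame. This is a small sketch using only base R; the vector reuses the `val` column from the dummy data above. The `fixed = TRUE` variant is an addition here, not part of the original answer: it matches the literal space character only and skips the regex engine, which may help on large data.

```r
x <- c(" abc", " kl m", "dfsd ")

both_ends <- trimws(x)                       # leading and trailing only
left_only <- trimws(x, which = "left")       # leading only
all_ws    <- gsub("\\s+", "", x)             # every whitespace character
spaces    <- gsub(" ", "", x, fixed = TRUE)  # literal spaces only, no regex
```

Here `both_ends` keeps the inner space in "kl m", while `all_ws` and `spaces` both yield "klm". The two differ only when the data contains tabs or other non-space whitespace.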
