Removing Whitespace From a Whole Data Frame in R
Question
I've been trying to remove the white space in a data frame (using R). The data frame is large (>1 GB) and has multiple columns that contain white space in every data entry.
Is there a quick way to remove the white space from the whole data frame? I've been trying to do this on a subset of the first 10 rows of data using:
gsub( " ", "", mydata)
This didn't seem to work, although R returned an output which I have been unable to interpret.
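A likely reason for the hard-to-read output: gsub() works on character vectors, so when it is handed a data frame it first calls as.character() on it, which turns each whole column into a single deparsed string. A minimal sketch on a made-up data frame (toy is a hypothetical stand-in for mydata):

```r
# Hypothetical toy data frame standing in for 'mydata'
toy <- data.frame(a = c("x y", "z"), b = c("p q", "r"), stringsAsFactors = FALSE)

# gsub() coerces the data frame with as.character(): one deparsed string per column
res <- gsub(" ", "", toy)
print(res)
length(res)  # one element per column, not one per cell
```

So the result is a plain character vector, not a cleaned data frame; the answers below apply gsub() or trimws() to each column separately instead.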
str_replace( " ", "", mydata)
R returned 47 warnings and did not remove the white space.
erase_all(mydata, " ")
R returned an error saying 'Error: could not find function "erase_all"'
I would really appreciate some help with this, as I've spent the last 24 hours trying to tackle this problem.
Thanks!
If I understood you correctly, you want to remove all the white space from the entire data frame; I guess the code you are using would be suitable for removing spaces from the column names. I think you should try this:
apply(mydata, 2, function(x) gsub('\\s+', '', x))
Hope this works.
This returns a matrix, however (apply() converts the data frame to a matrix first, so every column becomes character). If you want a data frame back, do:
as.data.frame(apply(mydata, 2, function(x) gsub('\\s+', '', x)))
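A minimal sketch, on a made-up data frame with one character and one integer column, of why the result is a matrix and why numeric columns come back as character even after as.data.frame():

```r
# Hypothetical example: one character column and one integer column
df <- data.frame(a = c(" x y", "z "), n = 1:2, stringsAsFactors = FALSE)

# apply() calls as.matrix() on the data frame, so the whole thing becomes a character matrix
m <- apply(df, 2, function(x) gsub("\\s+", "", x))
is.matrix(m)

# Converting back yields a data frame, but 'n' is still character, not integer
out <- as.data.frame(m, stringsAsFactors = FALSE)
sapply(out, class)
```

If the numeric columns must keep their type, they would need converting back (for example with as.integer()), which is one reason the lapply-based versions below only touch the character columns.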
EDIT in 2020:
Using lapply with the trimws function (with its default which = "both") removes leading and trailing spaces, but not spaces inside a string. Since no input data was provided by the OP, I am adding a dummy example to produce the results.
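To see which ends trimws() touches, here is a small sketch of its which argument on a made-up vector; spaces inside a string are never removed:

```r
x <- c("  a b  ", "c ")
trimws(x)                   # both ends (the default): "a b" "c"
trimws(x, which = "left")   # leading whitespace only: "a b  " "c "
trimws(x, which = "right")  # trailing whitespace only: "  a b" "c"
```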
DATA:
df <- data.frame(val = c(" abc"," kl m","dfsd "),val1 = c("klm ","gdfs","123"),num=1:3,num1=2:4,stringsAsFactors = FALSE)
# Situation 1 (base R): to remove spaces only at the leading and trailing ends, NOT inside the string values, use trimws
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[cols_to_be_rectified] <- lapply(df[cols_to_be_rectified], trimws)
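The same idea can be wrapped into a reusable function. This is a sketch, not the answer's code: strip_ws and its everywhere argument are hypothetical names, and it assumes the columns to clean are plain character columns (factor columns would be left untouched).

```r
# Hypothetical helper: trim (or fully remove) whitespace in every character column
strip_ws <- function(d, everywhere = FALSE) {
  chr <- vapply(d, is.character, logical(1))   # which columns are character
  fix <- if (everywhere) function(x) gsub("\\s+", "", x) else trimws
  d[chr] <- lapply(d[chr], fix)                # non-character columns are left as they are
  d
}

df <- data.frame(val = c(" abc", " kl m", "dfsd "), val1 = c("klm ", "gdfs", "123"),
                 num = 1:3, num1 = 2:4, stringsAsFactors = FALSE)
strip_ws(df)$val                     # leading/trailing blanks removed
strip_ws(df, everywhere = TRUE)$val  # every blank removed
```

Indexing with d[chr] rather than d[, chr] keeps the selection a data frame even when only one column is character.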
# Situation 2 (base R): remove spaces everywhere in the character columns (inside a string as well as at the leading and trailing ends).
(This corresponds to my initial apply-based solution. The apply version seems to work but would be very slow on large data, because apply() first converts the whole data frame to a matrix, coercing every column to character. Also, the question does not make clear whether the OP wants to remove only leading/trailing blanks or every blank in the data.)
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[cols_to_be_rectified] <- lapply(df[cols_to_be_rectified], function(x) gsub('\\s+', '', x))
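Whether to use a literal " " or the regex \\s+ matters: the literal pattern matches only the space character, while \\s also matches tabs and newlines. For a large data frame where only spaces matter, fixed = TRUE treats the pattern as a plain string, which is typically faster than a regex. A small sketch on a made-up vector:

```r
x <- c("a b", "c\td", "e\nf")   # space, tab, newline

gsub(" ", "", x, fixed = TRUE)  # literal space only: tab and newline survive
gsub("\\s+", "", x)             # any whitespace character is removed
```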
## Situation 1 (data.table): remove only leading and trailing blanks
library(data.table)
setDT(df)
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,c(cols_to_be_rectified) := lapply(.SD, trimws), .SDcols = cols_to_be_rectified]
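If you are already working in data.table, an alternative sketch (not the answer's method) loops over the character columns with set(), which the data.table documentation describes as a low-overhead, loopable version of :=. The table dt is a made-up copy of the dummy data:

```r
library(data.table)

dt <- data.table(val = c(" abc", " kl m", "dfsd "), val1 = c("klm ", "gdfs", "123"),
                 num = 1:3, num1 = 2:4)
cols <- names(dt)[vapply(dt, is.character, logical(1))]

# set() replaces each character column by reference, one column at a time
for (col in cols) set(dt, j = col, value = trimws(dt[[col]]))
dt
```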
Output from situation 1:
    val val1 num num1
1:  abc  klm   1    2
2: kl m gdfs   2    3
3: dfsd  123   3    4
## Situation 2 (data.table): remove every blank inside as well as leading/trailing blanks
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[, c(cols_to_be_rectified) := lapply(.SD, function(x) gsub('\\s+', '', x)), .SDcols = cols_to_be_rectified]
Output from situation 2:
    val val1 num num1
1:  abc  klm   1    2
2:  klm gdfs   2    3
3: dfsd  123   3    4
Note the difference between the two outputs in row 2: trimws removes only the leading and trailing blanks ("kl m" keeps its inner space), whereas the regex solution removes every blank ("klm").
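Because it is not clear which situation applies to the OP's data, it may help to check how many cells still contain whitespace after cleaning. A minimal sketch with a hypothetical helper count_ws, run here on a made-up frame in the state situation 1 leaves it:

```r
# Hypothetical helper: per character column, count cells that still contain any whitespace
count_ws <- function(d) {
  chr <- vapply(d, is.character, logical(1))
  vapply(d[chr], function(x) sum(grepl("\\s", x)), integer(1))
}

df <- data.frame(val = c("abc", "kl m", "dfsd"), val1 = c("klm", "gdfs", "123"),
                 num = 1:3, stringsAsFactors = FALSE)
count_ws(df)  # val still has one cell with an inner space ("kl m")
```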
I hope this helps. Thanks!