Removing Whitespace From a Whole Data Frame in R
I've been trying to remove the white space that I have in a data frame (using R). The data frame is large (>1 GB) and has multiple columns that contain white space in every data entry.
Is there a quick way to remove the white space from the whole data frame? I've been trying to do this on a subset of the first 10 rows of data using:
gsub( " ", "", mydata)
This didn't seem to work, although R returned an output which I have been unable to interpret.
str_replace( " ", "", mydata)
R returned 47 warnings and did not remove the white space.
erase_all(mydata, " ")
R returned an error saying 'Error: could not find function "erase_all"'
I would really appreciate some help with this, as I've spent the last 24 hours trying to tackle this problem.
Thanks!
If I understood you correctly, you want to remove all the white space from the entire data frame. I guess the code you are using is good for removing spaces in the column names. I think you should try this:
apply(mydata, 2, function(x) gsub('\\s+', '', x))
Hope this works.
This will return a matrix, however. If you want to change it to a data frame, do:
as.data.frame(apply(mydata, 2, function(x) gsub('\\s+', '', x)))
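One caveat worth checking before relying on apply() here: it converts the data frame to a matrix first, so in a data frame with mixed types every column comes back as character. Below is a small sketch on a made-up two-column data frame (the names df, m and res are illustrative only, not from the question):

```r
# Made-up example data: one character column, one integer column
df <- data.frame(val = c(" abc", " kl m", "dfsd "),
                 num = 1:3,
                 stringsAsFactors = FALSE)

# apply() works on as.matrix(df), which is a character matrix here
m <- apply(df, 2, function(x) gsub("\\s+", "", x))

# Converting back gives a data frame, but the integer column is now character
res <- as.data.frame(m, stringsAsFactors = FALSE)

is.matrix(m)        # the intermediate result is a matrix
sapply(res, class)  # both columns are reported as character
```

If numeric columns must stay numeric, a column-wise lapply() over only the character columns avoids this coercion.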
Edit (2020):
Using lapply with the trimws function (which = "both", the default) removes leading and trailing spaces but not spaces inside the strings. Since the OP provided no input data, I am adding a dummy example to produce the results.
DATA:
df <- data.frame(val = c(" abc"," kl m","dfsd "),val1 = c("klm ","gdfs","123"),num=1:3,num1=2:4,stringsAsFactors = FALSE)
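With this dummy data it is fairly easy to see why the gsub() call in the question produced puzzling output: gsub() coerces a non-character input with as.character(), so a whole data frame is collapsed to one string per column instead of being cleaned cell by cell. (The str_replace() call probably failed for a different reason: stringr expects the string first, then the pattern, then the replacement.) The sketch below uses a two-column subset of the data, and the name out is only for illustration:

```r
# Two-column subset of the dummy data, for illustration only
df <- data.frame(val = c(" abc", " kl m", "dfsd "),
                 num = 1:3,
                 stringsAsFactors = FALSE)

# gsub() is handed the data frame itself, as in the question
out <- gsub(" ", "", df)

length(out)  # one element per column, not one per cell
class(out)   # a plain character vector, no longer a data frame
```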
# Situation 1 (base R): to remove spaces only at the leading and trailing ends, NOT inside the string values, we can use trimws
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
# df[cols] (list-style indexing) keeps a data frame even when only one column is selected
df[cols_to_be_rectified] <- lapply(df[cols_to_be_rectified], trimws)
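For reference, trimws() picks the side to trim through its which argument, which accepts "both" (the default), "left" or "right". A minimal sketch on the val column of the dummy data (the variable names are illustrative):

```r
x <- c(" abc", " kl m", "dfsd ")

both  <- trimws(x)                   # trims both ends (the default)
left  <- trimws(x, which = "left")   # trims leading whitespace only
right <- trimws(x, which = "right")  # trims trailing whitespace only

both   # "abc"  "kl m" "dfsd"
left   # "abc"  "kl m" "dfsd "
right  # " abc" " kl m" "dfsd"
```

In every case the space inside "kl m" is left alone.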
# Situation 2 (base R): to remove spaces everywhere in the character columns of the data frame (inside the strings as well as at the leading and trailing ends).
(This uses the regex from the initial solution, which relied on apply. Note that the apply version seems to work but would be very slow. It is also not entirely clear from the question whether the OP wanted to remove only leading/trailing blanks or every blank in the data.)
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
# df[cols] (list-style indexing) keeps a data frame even when only one column is selected
df[cols_to_be_rectified] <- lapply(df[cols_to_be_rectified], function(x) gsub('\\s+', '', x))
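Since the data in the question is large, it may be worth knowing that gsub() can also match a literal string with fixed = TRUE, which skips the regex engine and is often faster. The trade-off, sketched below on made-up strings, is that a fixed " " pattern removes only plain spaces, while '\\s+' also removes tabs and newlines:

```r
# Made-up strings; the last one contains a tab rather than a space
x <- c(" abc", " kl m", "dfsd ", "a\tb")

regex_out <- gsub("\\s+", "", x)             # any whitespace, including the tab
fixed_out <- gsub(" ", "", x, fixed = TRUE)  # literal spaces only, tab is kept

regex_out
fixed_out
```

Whether the speed difference matters would need to be measured on the actual data.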
## Situation 1 (data.table): remove only leading and trailing blanks
library(data.table)
setDT(df)
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,c(cols_to_be_rectified) := lapply(.SD, trimws), .SDcols = cols_to_be_rectified]
Output from situation 1:
    val val1 num num1
1:  abc  klm   1    2
2: kl m gdfs   2    3
3: dfsd  123   3    4
## Situation 2 (data.table): remove every blank inside the strings as well as leading/trailing blanks
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,c(cols_to_be_rectified) := lapply(.SD, function(x)gsub('\\s+', '', x)), .SDcols = cols_to_be_rectified]
Output from situation 2:
    val val1 num num1
1:  abc  klm   1    2
2:  klm gdfs   2    3
3: dfsd  123   3    4
Note the difference between the two outputs in row 2: with trimws we remove only the leading and trailing blanks ("kl m"), while the regex solution removes every blank ("klm").
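The two base R situations can be wrapped in one small helper, as a sketch. The function name strip_ws and its internal argument are hypothetical and not part of any package:

```r
# Hypothetical helper: cleans whitespace in character columns only.
# internal = FALSE trims leading/trailing whitespace (trimws);
# internal = TRUE removes every run of whitespace (regex).
strip_ws <- function(data, internal = FALSE) {
  is_chr <- vapply(data, is.character, logical(1))
  if (!any(is_chr)) {
    return(data)  # nothing to clean
  }
  clean <- if (internal) {
    function(x) gsub("\\s+", "", x)
  } else {
    trimws
  }
  data[is_chr] <- lapply(data[is_chr], clean)
  data
}

df <- data.frame(val = c(" abc", " kl m", "dfsd "),
                 val1 = c("klm ", "gdfs", "123"),
                 num = 1:3, num1 = 2:4,
                 stringsAsFactors = FALSE)

trimmed  <- strip_ws(df)                   # row 2 of val stays "kl m"
stripped <- strip_ws(df, internal = TRUE)  # row 2 of val becomes "klm"
```

Numeric columns are left untouched in both cases, because only the columns flagged by is.character are reassigned.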
I hope this helps. Thanks!