trimws bug? leading whitespace not removed

Question

Edit: Thanks to R Yoda, I was finally able to create a reproducible example of the issue I am facing:

x = rawToChar(as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32)))
trimws(x)

=> Question: How can I trim x?

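Old text of the question:
Please see the attached screenshot. Unfortunately, I could not create a reproducible example, because dput alters the result...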

Does anyone have an idea how to investigate what's going wrong with x? The leading whitespace doesn't seem to be a standard one!

charToRaw(x) gives a0 31 31 2e 31 33 32 35 39 32
dput(charToRaw(x)) gives as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32))
Encoding(x) gives "unknown" (same as Encoding(" 11.132592"))
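
One way to investigate the leading byte is to convert the string to UTF-8 and look at the code point of its first character. This is only a sketch: it assumes the bytes are Latin-1 (ISO-8859-1), which the question does not state, and the names y and first_cp are made up for illustration.

```r
# Same bytes as in the question.
x <- rawToChar(as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32)))

# Assumption: the input is Latin-1; adjust `from` if the data came from elsewhere.
y <- iconv(x, from = "latin1", to = "UTF-8")

# Code point of the first character. An ordinary space is 32 (0x20);
# 160 (0xa0, U+00A0) is the no-break space.
first_cp <- utf8ToInt(y)[1]
print(first_cp)
```

If first_cp is anything other than 32, 9, 10 or 13, the default trimws pattern will not touch it.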

Recommended answer

0xa0 is the byte for a different kind of space, the non-breaking space (in Latin-1 and similar single-byte encodings), while 0x20 is the ordinary space.
By default, trimws only looks for spaces, tabs, line feeds and carriage returns (the pattern [ \t\r\n]+), not for non-breaking spaces, so it leaves x unchanged.
You can use sub (to strip either leading or trailing whitespace) or gsub (to strip both) with \\s to remove any kind of leading or trailing whitespace, including the one represented by 0xa0, at least when the regex engine treats that byte as whitespace in the current locale:

sub("^\\s+", "", x)
[1] "11.132592"

To remove both leading and trailing whitespace:

gsub("(^\\s+)|(\\s+$)", "", x)
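
Whether \\s matches 0xa0 can depend on the locale and on the string being valid in it. A possible alternative that is less locale-dependent is to convert to UTF-8 first and pass an explicit whitespace class to trimws. This is a sketch, not the answerer's method: it assumes the input is Latin-1 and R 3.6.0 or later (where trimws has the whitespace argument); y and trimmed are illustrative names.

```r
# Same bytes as in the question.
x <- rawToChar(as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32)))

# Assumption: the input is Latin-1. After conversion the no-break space is U+00A0.
y <- iconv(x, from = "latin1", to = "UTF-8")

# "[\\h\\v]" is the PCRE class for horizontal and vertical whitespace,
# which covers U+00A0 in addition to space, tab and line breaks.
trimmed <- trimws(y, whitespace = "[\\h\\v]")
print(trimmed)

# The cleaned string can now be parsed as a number.
value <- as.numeric(trimmed)
```

The "[\\h\\v]" pattern is the one suggested in the trimws help page for Unicode whitespace; the conversion step is what makes the result independent of the session's native encoding.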
