Remove (non-breaking) space character in string in R on Linux

Views: 141 Published: 2018/6/25 13:32:44 Tags: html r regex linux string


Problem description

This question makes it look easy to remove space characters from a string in R. However, when I load the following table I am not able to remove the space between two digits (e.g. 11 846.4):
require(XML)
library(RCurl)
link2fetch = 'https://www.destatis.de/DE/ZahlenFakten/Wirtschaftsbereiche/LandForstwirtschaftFischerei/FeldfruechteGruenland/Tabellen/AckerlandHauptfruchtgruppenFruchtarten.html'

theurl = getURL(link2fetch, .opts = list(ssl.verifypeer = FALSE) ) # important!
area_cult10 = readHTMLTable(theurl, stringsAsFactors = FALSE)
area_cult10 = data.table::rbindlist(area_cult10)

test = sub(',', '.', area_cult10$V5) # change , to . 
test = gsub('(.+)\\s([A-Z]{1})*', '\\1', test) # remove LETTERS
gsub('\\s', '', test) # remove white space?
Why can't I remove the space in test[1]?
Thanks for any advice! Could this be something other than a space character? Maybe the answer is really easy and I'm overlooking something.
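One way to check whether the character really is an ordinary space is to look at its code point. The sketch below uses a hypothetical sample value (not the scraped data) in which the separator is assumed to be a non-breaking space, U+00A0:

```r
# Hypothetical sample value: the separator is assumed to be U+00A0 (non-breaking space).
x <- "11\u00a0846,4"

# Code point of every character: 160 is U+00A0, an ordinary space would be 32.
utf8ToInt(x)
# 49 49 160 56 52 54 44 52

# Removing the character as a literal string avoids any dependence on how \s is defined.
stripped <- gsub("\u00a0", "", x, fixed = TRUE)
stripped
# "11846,4"
```

If the code point printed for the real data is 160, the character is a non-breaking space rather than a plain space.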
Solution
The space in test[1] is presumably a non-breaking space (U+00A0), which a plain \\s does not match here. You can build test in just 2 steps with a single PCRE regex (note the perl=TRUE argument):
test = sub(",", ".", gsub("(*UCP)[\\s\\p{L}]+|\\W+$", "", area_cult10$V5, perl=TRUE), fixed=TRUE)
Result:
 [1] "11846.4" "6529.2"  "3282.7"  "616.0"   "1621.8"  "125.7"   "14.2"   
 [8] "401.6"   "455.5"   "11.7"    "160.4"   "79.1"    "37.6"    "29.6"   
[15] ""        "13.9"    "554.1"   "236.7"   "312.8"   "4.6"     "136.9"  
[22] "1374.4"  "1332.3"  "1281.8"  "3.7"     "5.0"     "18.4"    "23.4"   
[29] "42.0"    "2746.2"  "106.6"   "2100.4"  "267.8"   "258.4"   "13.1"   
[36] "23.5"    "11.6"    "310.2"  
The gsub regex is worth special attention:

(*UCP) - the PCRE verb that makes the pattern Unicode-aware
[\\s\\p{L}]+ - matches 1 or more whitespace or letter characters
| - or (the alternation operator)
\\W+$ - matches 1 or more non-word characters at the end of the string.
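The pieces above can be tried without fetching the page. This is a minimal sketch on hypothetical sample values that imitate the scraped column (U+00A0 as thousands separator, a decimal comma, a trailing footnote letter, and a lone dash); the names v5, cleaned and result are illustrative only:

```r
# Hypothetical sample values, not the real table contents.
v5 <- c("11\u00a0846,4", "6\u00a0529,2 A", "616,0", "-")

# Step 1: with (*UCP), \s also matches U+00A0 and \p{L} matches any Unicode letter;
# the second alternative, \W+$, strips trailing non-word characters such as "-".
cleaned <- gsub("(*UCP)[\\s\\p{L}]+|\\W+$", "", v5, perl = TRUE)

# Step 2: replace the first literal comma with a dot.
result <- sub(",", ".", cleaned, fixed = TRUE)
result
# "11846.4" "6529.2" "616.0" ""
```

The empty string for the dash-only entry mirrors the empty element that appears in the result above.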


Then, sub(",", ".", x, fixed=TRUE) replaces the first , with a . and treats both as literal strings. Using fixed=TRUE also saves time, because no regex has to be compiled.
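To illustrate the difference between sub and gsub, and what fixed=TRUE changes, here is a small sketch on made-up strings (the values are hypothetical and chosen only to show the behaviour):

```r
x <- "1,5,6"  # hypothetical value with two commas

first_only <- sub(",", ".", x, fixed = TRUE)   # only the first comma changes
all_commas <- gsub(",", ".", x, fixed = TRUE)  # every comma changes
first_only
# "1.5,6"
all_commas
# "1.5.6"

# Without fixed = TRUE the pattern is a regex, where "." matches any character.
as_regex   <- sub(".", "", "a.b")                # drops the first character
as_literal <- sub(".", "", "a.b", fixed = TRUE)  # drops the literal dot
as_regex
# ".b"
as_literal
# "ab"
```

For a comma the two modes give the same result, so fixed=TRUE is mainly a matter of clarity and speed there; for characters such as . it changes the outcome.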
