Conditionally Remove Character of a Vector Element in R


Problem description


I have (sometimes incomplete) data on addresses that looks like this:

data <- c("1600 Pennsylvania Avenue, Washington DC", 
          ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")  

I need to remove the first and/or last character if either one of them is a comma.

So far, I have:

for(i in 1:length(data)){
  lastchar <- nchar(data[i])
  sec2last <- nchar(data[i]) - 1
  if(regexpr(",",data[i])[1] == 1){
    data[i] <- substr(data[i],2, lastchar)
  }
  if(regexpr(",",data[i])[1] == nchar(data[i])){
    data[i] <- substr(data[i],1, sec2last)
  }
}

data

which works for the first character, but not the last character. How can I modify the second if statement or otherwise accomplish my goal?

Solution

You could try the code below, which removes a comma at the start or at the end of each element:

> data <- c("1600 Pennsylvania Avenue, Washington DC", 
+           ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")
> gsub("(?<=^),|,(?=$)", "", data, perl=TRUE)
[1] "1600 Pennsylvania Avenue, Washington DC"
[2] "Siem Reap,FC"                           
[3] "11 Wall Street, New York, NY"           
[4] "Addis Ababa,FC" 

Pattern explanation:

  • (?<=^), : (?<=...) is a positive lookbehind. Here it asserts that what precedes the comma is the start of the string (^), so it matches a leading comma.
  • | : the alternation (OR) operator, which combines the two patterns.
  • ,(?=$) : (?=...) is a positive lookahead. It asserts that what follows the comma is the end of the string ($), so it matches a trailing comma.
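As a side note, and only as a sketch: the lookarounds are not strictly required, because `^` and `$` are already zero-width anchors, so a plain `^,|,$` should give the same result without `perl=TRUE`. The loop from the question can also be repaired. It seems to fail for two reasons: `regexpr` returns the position of the first comma, which is no longer the last character once the leading comma is gone, and `sec2last` is computed before the string is shortened. The variable names `cleaned` and `fixed` below are illustrative, and `startsWith`/`endsWith` assume R 3.3.0 or later.

```r
data <- c("1600 Pennsylvania Avenue, Washington DC",
          ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")

# Option 1: anchors only, no lookarounds and no perl = TRUE.
cleaned <- gsub("^,|,$", "", data)

# Option 2: the original loop, repaired.
# Test the actual first/last character and recompute the length
# after each modification instead of reusing a stale value.
fixed <- data
for (i in seq_along(fixed)) {
  if (startsWith(fixed[i], ",")) {
    fixed[i] <- substr(fixed[i], 2, nchar(fixed[i]))
  }
  if (endsWith(fixed[i], ",")) {
    fixed[i] <- substr(fixed[i], 1, nchar(fixed[i]) - 1)
  }
}

print(cleaned)
print(identical(cleaned, fixed))
```

The vectorised `gsub` call is usually preferable to the loop; the loop is shown only because the question asked how to fix the second `if` statement.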
