Regex/grep strings containing US currency

Question

I have a list of strings, some of which contain dollar figures. For example:

'$34232 foo    \n  bar'

Is there an R command that returns only the strings that contain dollar amounts?

Thanks!

Recommended answer

Use \\$ to protect the $, which otherwise means "end of string":

   grep("\\$[0-9]+",c("123","$567","abc $57","$abc"),value=TRUE)

This will select strings that contain a dollar sign followed by one or more digits (but not e.g. $abc). grep with value=FALSE (the default) returns the indices of the matching elements instead of the strings themselves. grepl returns a logical vector. One R-specific point is that you need to specify \\$, not just \$: R processes the backslash in the string literal before the regex engine sees it, so an additional backslash is required for protection, and a bare \$ will give you an "unrecognized escape" error.
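To make the three return types concrete, here is a small sketch that reuses the example vector from the answer. The variable names x and pattern are only illustrative, and the "[$]" character-class form is an alternative spelling added here, not something the original answer uses.

```r
# Example vector from the answer above; `x` and `pattern` are illustrative names.
x <- c("123", "$567", "abc $57", "$abc")
pattern <- "\\$[0-9]+"   # literal dollar sign followed by one or more digits

matched_strings <- grep(pattern, x, value = TRUE)  # the matching strings
matched_indices <- grep(pattern, x)                # positions of the matches
matched_flags   <- grepl(pattern, x)               # one TRUE/FALSE per element

# Alternative spelling (an addition, not from the answer): inside a character
# class the dollar sign is literal, so no backslashes are needed.
matched_flags_alt <- grepl("[$][0-9]+", x)

print(matched_strings)
print(matched_indices)
print(matched_flags)
```

The logical vector from grepl is usually the most convenient form for subsetting, e.g. x[grepl(pattern, x)].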

@Cerbrus's answer, '\\$[0-9,.]+', will match slightly more broadly (e.g. it will match $456.89 or $367,245,100). It will also match some implausible currency strings, e.g. $45.13.89 or $467.43,2,1 (commas should be allowed only for groupings of 3 digits in the dollars segment, and there should be only one decimal point, separating dollars and cents). Both of our answers will (incorrectly?) match $45abc. If you're lucky, your data don't contain any of these tricky possibilities. Getting this right in general is hard; the answer referred to in the comments (What is "The Best" U.S. Currency RegEx?) tries to do this and as a result has significantly more complex answers, but they could be useful if you adapt them to R by protecting $ appropriately.
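As a rough illustration of what a stricter pattern might look like, the sketch below rejects the implausible strings mentioned above. It is not the regex from the linked question, just one possible tightening under these assumptions: commas may only separate groups of exactly three digits, cents (if present) are exactly two digits, and the amount must not run straight into a letter or further digits. The names strict_pattern and candidates are hypothetical, and perl = TRUE is needed because the pattern uses a lookahead.

```r
# A hedged sketch of a stricter currency pattern (assumptions listed above).
strict_pattern <- paste0(
  "\\$",                                    # literal dollar sign
  "(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)",    # 1,234,567 style or plain digits
  "(?:\\.[0-9]{2})?",                       # optional cents, exactly two digits
  "(?![0-9A-Za-z]|[.,][0-9])"               # not followed by more amount-like text
)

# Illustrative test strings taken from the discussion above.
candidates <- c("$456.89", "$367,245,100", "$45.13.89",
                "$467.43,2,1", "$45abc", "abc $57")

strict_flags <- grepl(strict_pattern, candidates, perl = TRUE)
print(data.frame(candidates, strict_flags))
```

Whether $45abc should count as a dollar amount depends on your data; drop the final lookahead if it should.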
