Regular Expressions - how to replace a character within quotes

Question

Hello regular expression experts,

Until now, there has never been a string manipulation problem I couldn't resolve with regular expressions, at least in an elegant manner using just one step. Here is the sample data I'm working with:

0,"section1","(7) Delivery of 'certificate' outside the United States prohibited. Since both section 339 of the 1940 statute, 68/ and section 341 of the present law are explicit in their statement that the certificate shall be furnished the citizen, only if such individual is at the time within the United States, it is clear that the document could not and cannot be delivered outside the United States.",http://www.google.com/

1,"section2",,http://www.google.com/

2,"section3",",,",http://www.google.com/

This is a section of a much larger CSV file. With one elegant regular expression, I'd like to replace every comma that occurs within double quotes with an underscore character (_). It is important that the regular expression does NOT replace any commas outside the quotes, because that would break the CSV data structure.

Thanks, Tom

--

Clarification:

Sorry guys, I posted the question without fully clarifying my situation, so let me summarize below:

  • Assume that quotes within quotes are already escaped (quotes within quotes in a CSV file saved by Excel are represented by "" or """ etc., so they are easily replaced beforehand).
  • I am working within JavaScript.

Using the sample text above, here is what it SHOULD look like after running the regular expression replacement (there should be a total of 5 replacements):

0,"section1","(7) Delivery of 'certificate' outside the United States prohibited. Since both section 339 of the 1940 statute_ 68/ and section 341 of the present law are explicit in their statement that the certificate shall be furnished the citizen_ only if such individual is at the time within the United States_ it is clear that the document could not and cannot be delivered outside the United States.",http://www.google.com/

1,"section2",,http://www.google.com/

2,"section3","__",http://www.google.com/

Recommended answer

I'll help you, but you have to promise to stop using the word "elegant". It's been working too hard lately, and deserves a rest. :P

(?m),(?=[^"]*"(?:[^"\r\n]*"[^"]*")*[^"\r\n]*$)

This matches a comma if, between the comma and the end of the record, there's an odd number of quotation marks. I'm assuming a standard CSV format, in which a record ends at the next line separator that isn't enclosed in quotes. Line separators are legal inside quoted fields, as are quotes if they're escaped with another quote.
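Since the question specifies JavaScript, here is a minimal sketch of applying the pattern there. JavaScript regex literals do not accept a leading (?m), so the sketch asks for multiline behaviour with the m flag instead and adds g so that every match is replaced. The name replaceQuotedCommas and the shortened sample rows are illustrative and not part of the original post, and the sketch assumes that no quoted field spans a line break.

```javascript
// The answer's pattern, with the inline (?m) swapped for JavaScript's m flag.
// g makes replace() act on every match rather than only the first.
const QUOTED_COMMA = /,(?=[^"]*"(?:[^"\r\n]*"[^"]*")*[^"\r\n]*$)/gm;

// Hypothetical helper name, used only for this illustration.
function replaceQuotedCommas(csvText) {
  return csvText.replace(QUOTED_COMMA, "_");
}

// Shortened stand-ins for the three sample rows in the question.
const sample = [
  '0,"section1","statute, 68/ and the citizen, only if within the United States, it is clear",http://www.google.com/',
  '1,"section2",,http://www.google.com/',
  '2,"section3",",,",http://www.google.com/',
].join("\n");

const result = replaceQuotedCommas(sample);
console.log(result);
```

On this input, the three commas inside the long quoted field and the two inside ",," become underscores, while every field-separating comma is left alone.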

Depending on which regex flavor you're using, you may have to use \r?$ instead of just $. In .NET, for example, only the linefeed (\n) is considered a line separator. In Java, by contrast, $ matches before the \r in \r\n, but not between the \r and the \n (unless you set UNIX_LINES mode).
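For the JavaScript case in the question, a quick check along these lines shows how $ behaves on \r\n input. As far as I know, with the m flag JavaScript treats both \r and \n as line terminators, so $ can match just before the \r and the \r?$ variant should not be needed there. The sketch simply compares the two forms; the sample rows are made up for illustration.

```javascript
// Same pattern twice: once ending in $, once ending in \r?$.
const endsWithDollar = /,(?=[^"]*"(?:[^"\r\n]*"[^"]*")*[^"\r\n]*$)/gm;
const endsWithCrDollar = /,(?=[^"]*"(?:[^"\r\n]*"[^"]*")*[^"\r\n]*\r?$)/gm;

// Illustrative two-record input using Windows-style \r\n line endings.
const crlfSample = '1,"a,b",x\r\n2,"c,d",y\r\n';

const viaDollar = crlfSample.replace(endsWithDollar, "_");
const viaCrDollar = crlfSample.replace(endsWithCrDollar, "_");

// JSON.stringify makes the \r\n sequences visible in the console.
console.log(JSON.stringify(viaDollar));
console.log(viaDollar === viaCrDollar);
```

If both forms give the same string, the plain $ version is enough for that engine; in a flavor where they differ, the \r?$ form is the one to use.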
