Split a string each time a Deterministic Finite Automaton reaches a final state?
Problem description
I have a problem that can be solved by iteration, but I'm wondering if there's a more elegant solution using regular expressions and split().
I have a string (which Excel puts on the clipboard) that is, in essence, comma-delimited. The caveat is that when a cell value contains a comma, the whole cell is surrounded with quotation marks (presumably to escape the commas within that cell). An example string is as follows:
123,12,"12,345",834,54,"1,111","98,273","1,923,002",23,"1,243"
Now, I want to elegantly split this string into individual cells, but the catch is that I cannot use a normal split expression with a comma as the delimiter, because it would also divide cells that contain a comma in their value. Another way of looking at this problem is that I can ONLY split on a comma if an EVEN number of quotation marks precedes it.
This is easy to solve with a loop, but I'm wondering if there's a regular-expression split function capable of capturing this logic. In an attempt to solve this problem, I constructed the Deterministic Finite Automaton (DFA) for the logic.
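For reference, the loop-based approach can be sketched roughly as follows in Python. This is a simplified two-state scanner (inside quotes / outside quotes), not the DFA from the question; the function name split_cells_loop is made up for illustration, and the sketch assumes that quotes are balanced and never appear escaped inside a cell.

```python
def split_cells_loop(line):
    """Split on commas that are preceded by an even number of quotes."""
    cells = []
    current = []          # characters of the cell being built
    in_quotes = False     # True while an odd number of quotes has been seen

    for ch in line:
        if ch == '"':
            # Toggle the state; the quote itself is not kept in the cell.
            in_quotes = not in_quotes
        elif ch == ',' and not in_quotes:
            # A comma outside quotes ends the current cell.
            cells.append(''.join(current))
            current = []
        else:
            current.append(ch)

    # The last cell has no trailing comma, so flush it here.
    cells.append(''.join(current))
    return cells
```

Each time the scanner sees a comma while outside quotes, it emits one cell, which is the same "even number of preceding quotes" rule stated above.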
The question now reduces to the following: is there a way to split this string such that a new array element (corresponding to /s) is produced each time the final state (state 4 here) is reached in the DFA?
Recommended answer
Using regex (unescaped): (?:(?:"[^"]*")|(?:[^,]*))
Use that and call Regex.Matches() (which is .NET), or its analog on other platforms.
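As one possible analog outside .NET, here is a minimal Python sketch using re.finditer. It is an adaptation rather than the exact pattern above: applied unchanged, the [^,]* alternative can also match the empty string at each comma (at least on recent Python versions), so this variant adds a leading (?:^|,) that consumes the delimiter. The name split_cells_regex and the two capture groups are assumptions made for the example, and doubled quotes inside a cell are not handled.

```python
import re

# Either the start of the string or a comma, followed by one cell:
#   group 1: contents of a quoted cell (quotes stripped)
#   group 2: an unquoted cell (may be empty)
CELL = re.compile(r'(?:^|,)(?:"([^"]*)"|([^,]*))')

def split_cells_regex(line):
    cells = []
    for match in CELL.finditer(line):
        quoted, bare = match.group(1), match.group(2)
        # Exactly one of the two groups participates in each match.
        cells.append(quoted if quoted is not None else bare)
    return cells

sample = '123,12,"12,345",834,54,"1,111","98,273","1,923,002",23,"1,243"'
print(split_cells_regex(sample))
```

Consuming the comma as part of each match keeps genuinely empty cells (as in a,,b) while avoiding spurious empty matches between cells.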
You could further expand the above to this: ^(?:(?:"(?<Value>[^"]*)")|(?<Value>[^,]*))(?:,(?:(?:"(?<Value>[^"]*)")|(?<Value>[^,]*)))*$
This parses the whole string in one shot, but it only works if the regex engine supports named groups and multiple captures per group (.NET supports both).