Regex: Replace every Comma with Tab Not within quotes
Question
I have a huge data set of entries like these:
(21, 2, '23.5R25 ETADT', 'description, with a comma'),
(22, 1, '26.5R25 ETADT', 'Description without a comma'),
(23, 5, '20.5R20.5', 'Another description with ; semicolumn'),
I'm trying to replace every comma in the list with a tab, excluding the commas within the single quotes and also excluding the trailing comma at the end of each line.
So the example entries should become:
(21[TAB]2[TAB]'23.5R25 ETADT'[TAB]'description, with a comma'),
(22[TAB]1[TAB]'26.5R25 ETADT'[TAB]'Description without a comma'),
(23[TAB]5[TAB]'20.5R20.5'[TAB]'Another description with ; semicolumn'),
I've got like 6000 rows of data like this. The tabs allow me to tell Excel to import the elements of these entries into different columns.
The Regex I've tried was this: [ ]*,[ ]*
But this Regex selects all the commas, even the ones within the single quotes.
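To make the problem concrete, here is a minimal sketch that applies the pattern from the question to the first sample row. It uses Python's `re` module rather than a text editor, which is an assumption made only for illustration; the variable names are hypothetical.

```python
import re

line = "(21, 2, '23.5R25 ETADT', 'description, with a comma'),"

# The pattern from the question: a comma with optional spaces on either side.
# It has no notion of quotes, so every comma on the line is replaced.
naive = re.sub(r"[ ]*,[ ]*", "\t", line)

print(repr(naive))
```

The comma inside the quoted description and the trailing comma after the closing parenthesis are both turned into tabs, which is the behavior the question wants to avoid.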
Recommended answer
It looks as though each of your lines has 4 elements within parentheses, and it looks like only the last 2 elements use single quotes. If those assumptions can be made, I've tested the following in Notepad++:
- "Find what :"
^\(([^,]*),\s*([^,]*),\s*'([^']*)'\s*,\s*
- "Replace with :"
\(\1\t\2\t'\3'\t
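For anyone who would rather script this than use Notepad++, the same search pattern can be sketched in Python. This is only an illustration under the same 4-column assumption; the function name `commas_to_tabs` and the sample data are hypothetical. A replacement function is used instead of a replacement string, because Python's replacement-string escapes differ from Notepad++'s.

```python
import re

# Same search pattern as "Find what" above.
# re.MULTILINE lets ^ match at the start of every line, not only the first.
PATTERN = re.compile(r"^\(([^,]*),\s*([^,]*),\s*'([^']*)'\s*,\s*", re.MULTILINE)


def commas_to_tabs(text):
    """Replace the three separating commas at the start of each row with tabs."""
    def rebuild(m):
        # Re-emit the opening parenthesis and the three captured elements,
        # joined by tabs; the rest of the line is left untouched.
        return "(" + m.group(1) + "\t" + m.group(2) + "\t'" + m.group(3) + "'\t"
    return PATTERN.sub(rebuild, text)


sample = "\n".join([
    "(21, 2, '23.5R25 ETADT', 'description, with a comma'),",
    "(22, 1, '26.5R25 ETADT', 'Description without a comma'),",
    "(23, 5, '20.5R20.5', 'Another description with ; semicolumn'),",
])

converted = commas_to_tabs(sample)
print(converted)
```

Rows that do not fit the 4-column shape simply fail to match and pass through unchanged, so it is worth spot-checking the output before importing it into Excel.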
Edit:
The search regex is dependent upon the 4 column model with only the last two elements having single quotes. Visually this is how it works:
- ^\( : Finds an opening parenthesis
- ([^,]*) : Captures non-comma characters which will be all of element 1
- ,\s* : Matches a comma and any trailing spaces
- ([^,]*) : Captures non-comma characters which will be all of element 2
- ,\s* : Matches a comma and any trailing spaces
- '([^']*)' : Captures the string in single quotes which will be all of element 3
- \s*,\s* : Matches a comma and all surrounding spaces
- Ignore the rest of the string. There are no more commas to be replaced; we only want to replace the part of the line we just read in.
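The breakdown above can also be checked piece by piece. The sketch below writes the same pattern with Python's `re.VERBOSE` flag so each part carries its own comment, then shows what each group captures for one sample row. Python and the variable names are assumptions made for illustration, not part of the original Notepad++ answer.

```python
import re

ANNOTATED = re.compile(r"""
    ^\(          # opening parenthesis at the start of the line
    ([^,]*)      # element 1: everything up to the first comma
    ,\s*         # a comma and any trailing spaces
    ([^,]*)      # element 2: everything up to the next comma
    ,\s*         # a comma and any trailing spaces
    '([^']*)'    # element 3: the contents of the single-quoted string
    \s*,\s*      # a comma and all surrounding spaces
""", re.VERBOSE)

row = "(23, 5, '20.5R20.5', 'Another description with ; semicolumn'),"
match = ANNOTATED.match(row)

captured = match.groups()      # the three captured elements
untouched = row[match.end():]  # the part of the line the regex never reads

print(captured)
print(untouched)
```

Element 4 and the trailing comma fall entirely outside the match, which is why any commas inside the final quoted description are never touched.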