Regex: Replace every comma not within quotes with a tab


Question

I have a huge data set of entries like these:


(21, 2, '23.5R25 ETADT', 'description, with a comma'),
(22, 1, '26.5R25 ETADT', 'Description without a comma'),
(23, 5, '20.5R20.5', 'Another description with ; semicolumn'),

I'm trying to replace every comma in the list with a tab, excluding the commas within the single quotes and the trailing comma at the end of each line.

So the example entries should become:


(21[TAB]2[TAB]'23.5R25 ETADT'[TAB]'description, with a comma'),
(22[TAB]1[TAB]'26.5R25 ETADT'[TAB]'Description without a comma'),
(23[TAB]5[TAB]'20.5R20.5'[TAB]'Another description with ; semicolumn'),

I've got about 6000 rows of data like this. The tabs let me tell Excel to import the elements of each entry into separate columns.

The regex I tried was this: [ ]*,[ ]* but it matches every comma, including the ones inside the single quotes.

Recommended answer

It looks as though each of your lines has 4 elements within parentheses, and only the last 2 elements use single quotes. If those assumptions hold, I've tested the following in Notepad++ (Search Mode set to Regular expression):


  • Find what: ^\(([^,]*),\s*([^,]*),\s*'([^']*)'\s*,\s*
  • Replace with: \(\1\t\2\t'\3'\t
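
For anyone who would rather script this than run it in an editor, the same find/replace can be sketched with Python's re module. This is only a minimal sketch under the same 4-column assumption; the names PATTERN, REPLACEMENT and commas_to_tabs and the sample string are illustrative and not part of the original answer. One difference from Notepad++: Python's replacement string should not have the backslash before the opening parenthesis.

```python
import re

# Same pattern as the "Find what" field; MULTILINE lets ^ match at every line start.
PATTERN = re.compile(r"^\(([^,]*),\s*([^,]*),\s*'([^']*)'\s*,\s*", re.MULTILINE)

# In a Python replacement template "(" is literal and \t is turned into a tab.
REPLACEMENT = r"(\1\t\2\t'\3'\t"


def commas_to_tabs(text: str) -> str:
    """Replace the three separating commas at the start of each row with tabs."""
    return PATTERN.sub(REPLACEMENT, text)


sample = "\n".join([
    "(21, 2, '23.5R25 ETADT', 'description, with a comma'),",
    "(22, 1, '26.5R25 ETADT', 'Description without a comma'),",
    "(23, 5, '20.5R20.5', 'Another description with ; semicolumn'),",
])

converted = commas_to_tabs(sample)
print(converted)
```

The comma inside the last quoted element and the trailing comma after the closing parenthesis are never part of the match, so they are left as they are.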

Edit:

The search regex depends on the 4-column model in which only the last two elements have single quotes. Step by step, this is how it works:


  1. ^\(: Matches an opening parenthesis at the start of the line
  2. ([^,]*): Captures non-comma characters, which will be all of element 1
  3. ,\s*: Matches a comma and any trailing whitespace
  4. ([^,]*): Captures non-comma characters, which will be all of element 2
  5. ,\s*: Matches a comma and any trailing whitespace
  6. '([^']*)': Captures the string inside the single quotes, which will be all of element 3
  7. \s*,\s*: Matches a comma and all surrounding whitespace
  8. The rest of the line is ignored. No more commas need replacing, so only the part of the line matched so far is rewritten.
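
The steps above can be mirrored one-to-one in a commented pattern. The sketch below uses Python's re.VERBOSE flag purely for illustration (the answer itself was tested in Notepad++), and the name ROW_PREFIX is hypothetical. It prints the three captured elements and the part of the line the regex never touches.

```python
import re

# Hypothetical name; the pattern is the answer's regex, split up per step.
ROW_PREFIX = re.compile(
    r"""
    ^\(           # 1. opening parenthesis at the start of the line
    ([^,]*)       # 2. element 1: everything up to the first comma
    ,\s*          # 3. comma plus trailing whitespace
    ([^,]*)       # 4. element 2: everything up to the next comma
    ,\s*          # 5. comma plus trailing whitespace
    '([^']*)'     # 6. element 3: the text between the single quotes
    \s*,\s*       # 7. comma plus surrounding whitespace
    """,
    re.VERBOSE,
)

line = "(21, 2, '23.5R25 ETADT', 'description, with a comma'),"
match = ROW_PREFIX.match(line)

elements = match.groups()        # the three captured elements
untouched = line[match.end():]   # 8. the rest of the line is left alone

print(elements)
print(untouched)
```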
