Remove double quotes and comma from a numeric value of a .CSV file
Question
I have a .CSV file in which a few records contain numbers that are enclosed in double quotes and have commas between the quotes (such as "455,365.44"). I need to remove the quotes and commas from these numeric values ("455,365.44" should become 455365.44 after processing) so that I can use them in further processing of the file.
Here is a sample of the file:
column 1, column 2, column 3, column 4, column 5, column 6, column 7
12,"455,365.44","string with quotes, and with a comma in between","4,432",6787,890,88
432,"222,267.87","another, string with quotes, and with two comma in between","1,890",88,12,455
11,"4,324,653.22","simple string",77,777,333,22
The result I need is as follows:
column 1, column 2, column 3, column 4, column 5, column 6, column 7
12,455365.44,"string with quotes, and with a comma in between",4432,6787,890,88
432,222267.87,"another, string with quotes, and with two comma in between",1890,88,12,455
11,4324653.22,"simple string",77,777,333,22
P.S.: Only the numeric values should be converted like this; the string values should remain unchanged.
Please help...
Recommended answer
To remove the quotes (replace the quoted number with the same number without the quotes):
s/"(\d[\d.,]*)"/\1/g
For the commas, I could only think of a lookahead and a lookbehind, if that is supported by your regex implementation (replace a comma with nothing if what comes before and after it is a number within quotes). Note that the lookbehind here is variable-length, which many regex engines do not accept:
s/(?<="[\d,]+),(?=[\d,.]+")//g
You would have to execute this before removing the quotes.
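Whether the lookbehind version is usable depends on the engine. As a quick check, here is a minimal sketch in Python, whose standard `re` module only accepts fixed-width lookbehind. The helper name `lookbehind_supported` is made up for this illustration.

```python
import re

# The lookbehind/lookahead pattern from the answer, as a Python raw string.
LOOKBEHIND_PATTERN = r'(?<="[\d,]+),(?=[\d,.]+")'
# The lookahead-only alternative, for comparison.
LOOKAHEAD_PATTERN = r',(?=[\d,.]*\d")'


def lookbehind_supported(pattern: str) -> bool:
    """Return True if Python's re module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        # Raised for "look-behind requires fixed-width pattern".
        return False
    return True


print(lookbehind_supported(LOOKBEHIND_PATTERN))  # False
print(lookbehind_supported(LOOKAHEAD_PATTERN))   # True
```

If the pattern does not compile in your engine, the lookahead-only form is the one to try.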
It might also work without the lookbehind:
s/,(?=[\d,.]*\d")//g
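To see what the lookahead-only substitution does on its own, here is a small Python sketch applied to the first data row of the sample. The constant name `COMMA_IN_QUOTED_NUMBER` is an assumption for this example; the pattern itself is the one shown above.

```python
import re

# A comma is dropped when it is followed by digits, commas, or dots,
# then a final digit and a closing double quote.
COMMA_IN_QUOTED_NUMBER = re.compile(r',(?=[\d,.]*\d")')

row = ('12,"455,365.44","string with quotes, and with a comma in between",'
       '"4,432",6787,890,88')

without_commas = COMMA_IN_QUOTED_NUMBER.sub('', row)
print(without_commas)
# 12,"455365.44","string with quotes, and with a comma in between","4432",6787,890,88
```

The field separators and the comma inside the quoted string are left alone, and the numbers are still quoted at this point; stripping the quotes is a separate step.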
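In a shell script you might want to use perl, e.g. execute:
perl -pe 's/,(?=[\d,.]*\d")//g; s/"(\d[\d,.]*)"/\1/g' test.csv
The two substitutions are separated by ; so that the quotes are also removed on lines where there is no comma to remove (for example a quoted "77").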
Explanation of the regexes:
First execute:
s/,(?=[\d,.]*\d")//g
This removes every comma that is followed by a number ([\d,.]*\d) and a closing quote, so only the commas inside quoted numbers are removed.
Next execute:
s/"(\d[\d,.]*)"/\1/g
This replaces every number within quotes with its value without the quotes.
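The two steps can be put together and checked against the sample from the question. The following is a rough Python sketch of the same idea, not the Perl one-liner itself; the function name `clean_line` and the constant names are assumptions made for this example.

```python
import re

# Step 1: commas inside quoted numbers (must run while the quotes are still there).
COMMA_IN_QUOTED_NUMBER = re.compile(r',(?=[\d,.]*\d")')
# Step 2: quotes around a number.
QUOTED_NUMBER = re.compile(r'"(\d[\d,.]*)"')


def clean_line(line: str) -> str:
    """Remove commas from quoted numbers, then remove the quotes."""
    line = COMMA_IN_QUOTED_NUMBER.sub('', line)
    return QUOTED_NUMBER.sub(r'\1', line)


sample = [
    '12,"455,365.44","string with quotes, and with a comma in between","4,432",6787,890,88',
    '432,"222,267.87","another, string with quotes, and with two comma in between","1,890",88,12,455',
    '11,"4,324,653.22","simple string",77,777,333,22',
]

cleaned = [clean_line(line) for line in sample]
for line in cleaned:
    print(line)
# 12,455365.44,"string with quotes, and with a comma in between",4432,6787,890,88
# 432,222267.87,"another, string with quotes, and with two comma in between",1890,88,12,455
# 11,4324653.22,"simple string",77,777,333,22
```

One caveat with this regex approach: a quoted string that ends in a comma followed directly by digits is also affected, for instance "item 1,2" becomes "item 12". If the data can contain such strings, a real CSV parser is probably the safer choice.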