How to create a CSV file without unnecessary spaces


Problem description

I use the xls2csv binary to convert XLS documents to CSV on my Red Hat Linux machine.

Example (from the man page):

 xls2csv -x "1252spreadsheet.xls" -b WINDOWS-1252 -c "ut8csvfile.csv" -a UTF-8

However, I noticed the following two problems (1 and 2), which cause a lot of trouble in my bash script.

The problems are:

(1) The CSV file includes unnecessary spaces (to the left or to the right of a word).

Example of wrong syntax in CSV

 ,"/var/adm/sys ldd/all  /Comm/logs   ","WORD "," WORD"

Example of right syntax in CSV

 ,"/var/adm/sys ldd/all  /Comm/logs",WORD,WORD

(2) Quotation marks appear in the CSV even when a field is a single word. Quotation marks are not needed around a SINGLE word between separators (the separator is ",").

Example of wrong syntax in CSV

 ," WORD ",

Example of right syntax in CSV

 ,WORD,

Please advise how to solve problems 1 and 2 described here, in order to create a "clean" CSV file.

The implementation could be an awk, sed, or perl one-liner, or any other solution that can be used from a bash script.

Example of CSV file before the fix

 1,"/var/adm/sys ldd/all  /Comm/logs",34356,"234245 ",24245
 2,"/var/adm/sys ldd/all
 /Comm/debugs.txt"," 45356",435,"  578 58976  "
 3,"   add this line in crontab    :",34356,"234245 ",24245
 4,"1.0348    54 35.5"," 45356","   435","578 "
 4,"1 2 "," 45356 95857 ","   435","578 "
 5,"1 2 "," 45356 95857 ","   "435","578" "
 6,"1.0348    54 35.5"," 45356"," "4"""    ""35","578 "
 7,"1.0348    54 35.5",""45356",""4"""""35,"578 "

Example of the correct CSV file (after the fix)

 1,"/var/adm/sys ldd/all  /Comm/logs",34356,234245,24245
 2,"/var/adm/sys ldd/all
 /Comm/debugs.txt",45356,435,"578 58976"
 3,"add this line in crontab    :",34356,234245,24245
 4,"1.0348    54 35.5",45356,435,578 
 4,"1 2","45356 95857",435,578
 5,"1 2","45356 95857","435,578" 
 6,"1.0348    54 35.5",45356,"4"""    ""35,578
 7,"1.0348    54 35.5",""45356",""4"""""35,578

Commas cannot appear within fields.

Note the explicit newline contained within a field of line 2.

When a field is within double quotes and contains no whitespace (e.g. ""45356" on line 7), those double quotes must not be removed, because the whole field, including the quotes, is an encoded password.

Solution

awk -F, -v OFS=, '{ for (i = 1; i <= NF; ++i) { gsub(/(^"?[[:space:]]*|[[:space:]]*"?$)/, "", $i); if ($i ~ /[[:space:]]/) $i = "\"" $i "\"" } } 1' file

Output:

1,"/var/adm/sys ldd/all  /Comm/logs",34356,234245,24245
2,"/var/adm/sys ldd/all  /Comm/debugs.txt",45356,435,"578 58976"
3,"add this line in crontab    :",34356,234245,24245
4,"1.0348    54 35.5",45356,435,578
4,"1 2","45356 95857",435,578
5,"1 2","45356 95857","435,578"
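For readability, here is the same awk program spread over several lines with comments. This is only a sketch: the wrapper name clean_csv is a hypothetical name chosen for illustration, and the awk logic is unchanged from the one-liner.

```shell
# clean_csv is a hypothetical wrapper name; the awk program is the one-liner
# above, reformatted with comments. It reads the files given as arguments,
# or standard input when no file is given.
clean_csv() {
  awk -F, -v OFS=, '
  {
    for (i = 1; i <= NF; ++i) {
      # Remove an optional opening quote plus leading whitespace, and
      # trailing whitespace plus an optional closing quote.
      gsub(/(^"?[[:space:]]*|[[:space:]]*"?$)/, "", $i)
      # Put quotes back only when the trimmed field still contains whitespace.
      if ($i ~ /[[:space:]]/)
        $i = "\"" $i "\""
    }
  }
  1   # always-true pattern: print the rebuilt record
  ' "$@"
}

# Usage example on one record from the question:
printf '%s\n' '4,"1 2 "," 45356 95857 ","   435","578 "' | clean_csv
```

Because awk reads its input one line at a time, a field that contains a literal newline (such as record 2 in the question) is seen as two separate lines by this program.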

The only limitation is that values cannot contain commas, e.g. "This is, a value.".
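If commas inside quoted values do need to survive, one possible workaround is to stop using -F, and split each line by hand, treating a comma as a separator only when it falls outside double quotes. The sketch below is not part of the original answer. The function name clean_csv_quoted is hypothetical, and it assumes every quoted field has balanced quotes and sits on a single line, so it does not address the stray quotes in records 5 to 7 of the question or the embedded newline in record 2.

```shell
# clean_csv_quoted is a hypothetical name. Instead of splitting with -F, it
# walks each line character by character and splits only on commas that are
# outside double quotes. Assumes balanced quotes and one record per line.
clean_csv_quoted() {
  awk '
  # Same trimming rule as the one-liner; also re-quote fields containing a comma.
  function clean(f) {
    gsub(/(^"?[[:space:]]*|[[:space:]]*"?$)/, "", f)
    if (f ~ /[[:space:],]/)
      f = "\"" f "\""
    return f
  }
  {
    out = ""; field = ""; in_quotes = 0
    n = length($0)
    for (p = 1; p <= n; p++) {
      c = substr($0, p, 1)
      if (c == "\"")
        in_quotes = !in_quotes          # toggle on every double quote
      if (c == "," && !in_quotes) {     # separator: finish the current field
        out = out clean(field) ","
        field = ""
      } else {
        field = field c
      }
    }
    print out clean(field)              # last field has no trailing comma
  }
  ' "$@"
}

# Usage example with a comma inside a quoted value:
printf '%s\n' '1," This is, a value. "," WORD "' | clean_csv_quoted
```

With this input the quoted value keeps its comma and its quotes, while the single-word field is trimmed and unquoted.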
