How do I format a date field of a .CSV file with multiple commas in a string field


Question

I have a .CSV file (file.csv) in which every data value is enclosed in double quotes. A sample of the file is shown below:

column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","13-OCT-11","232"

The 9th field is a date in the format "DD-MMM-YY". I need to convert it to the format YYYY/MM/DD. I tried the code below, but it does not work.

awk -F, '
 BEGIN {
 split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
 for (i=1; i<=12; i++) mdigit[month[i]]=i
 }
 { m=substr($9,4,3)
 $9 = sprintf("%02d/%02d/"20"%02d",mdigit[m],substr($9,1,2),substr($9,8,20))
 print
 }' OFS="," file.csv > temp_file.csv

The content of temp_file.csv after running the above code is shown below.

column1,column2,column3,column4,column5,column6,column7,Column8,00/00/2000,Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1,00/00/2000,"890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455",00/00/2002, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3,00/00/2000,"333","22","13-OCT-11","232"

As far as I understand, the problem is the commas inside the double-quoted strings: my code treats them as field separators too. Please advise on the questions below:

1) Does double-quoting all the values in all the fields make any difference? If it does, how do I remove the quotes from every value except the strings that contain commas?
2) How should I modify my code so that it converts the 9th field from the format "DD-MMM-YY" to YYYY/MM/DD?
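
The diagnosis above can be checked by printing how many fields awk sees when it splits on every comma. This is only a minimal sketch that feeds the first sample row through a here-document; the helper name `count_fields` is made up for illustration and is not part of the original code.

```shell
# Hypothetical helper: report the field count and the 9th field that
# awk sees when every comma is treated as a separator.
count_fields() {
    awk -F, '{ print NF ": [" $9 "]" }'
}

# First data row of the sample file.
count_fields <<'EOF'
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","11-OCT-11","12"
EOF
```

The row has 10 logical fields, yet awk counts 13, and `$9` is the tail of the author name rather than the date. That explains why the date column is left untouched in temp_file.csv.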

Recommended answer

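You can try the following awk script. It addresses the date as $(NF-1), counting from the end of the line, because the commas inside the quoted strings shift any field position counted from the start: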

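awk -v q='"' '
BEGIN {
    FS = OFS = ","
    # Map month abbreviations to zero-padded month numbers (JAN -> 01).
    split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
    for (i = 1; i <= 12; i++) {
        mm[month[i]] = sprintf("%02d", i)
    }
}
NR > 1 {
    # The date is the second-to-last field; the two trailing fields
    # contain no embedded commas, so NF-1 is reliable.
    gsub(/"/, "", $(NF-1))
    split($(NF-1), d, "-")
    # Rebuild the field as "20YY/MM/DD", restoring the quotes.
    $(NF-1) = q "20" d[3] "/" mm[d[2]] "/" d[1] q
}
1' file.csv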

Output:

column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","2011/10/11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","2011/10/12","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","2011/10/13","232"

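If the date column is not always second to last, one possible variation is to split on the three-character sequence `","` instead of on a bare comma, so that `$9` really is the 9th column. The sketch below is not from the answer above; it assumes every data field is double-quoted, that no field contains the sequence `","` itself, and that all years fall in 20YY. The helper name `convert_dates` is hypothetical.

```shell
# Hypothetical helper: convert the 9th field from DD-MMM-YY to YYYY/MM/DD.
# Assumptions: all data fields are quoted, no field contains ",", years are 20YY.
convert_dates() {
    awk '
    BEGIN {
        FS = OFS = "\",\""    # split and rejoin on the sequence ","
        n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
        for (i = 1; i <= n; i++) mm[month[i]] = sprintf("%02d", i)
    }
    # Skip the unquoted header; only touch fields that look like D-M-Y.
    NR > 1 && split($9, d, "-") == 3 {
        $9 = "20" d[3] "/" mm[toupper(d[2])] "/" d[1]
    }
    { print }
    '
}

# Demo on the header and the first sample row.
convert_dates <<'EOF'
column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","11-OCT-11","12"
EOF
```

Because the outer quotes of the first and last fields are never part of a separator, they stay attached to `$1` and `$10` and the row is rebuilt with its quoting intact. On a real file the function would be used as `convert_dates < file.csv > temp_file.csv`.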