Remove linefeeds from a CSV while preserving rows


Problem description

I have an exported CSV in which some records contain a linefeed (ASCII 012 in octal, i.e. \n) in the middle. I need to replace each of these linefeeds with a space, but keep the newline at the end of every record so the file can still be loaded.

Most of the lines are fine; however, quite a few look like this:

Input:

10 , ,"2007-07-30 13.26.21.598000" ,1922 ,0 , , , ,"Special Needs List Rows updated :
Row 1 : Instruction: other :Comment: pump runs all of the water for the insd's home" ,10003 ,524 ,"cc:2023" , , ,2023 , , ,"CCR" ,"INSERT" ,"2011-12-03 01.25.39.759555" ,"2011-12-03 01.25.39.759555"

Output:

10 , ,"2007-07-30 13.26.21.598000" ,1922 ,0 , , , ,"Special Needs List Rows updated :Row 1 : Instruction: other :Comment: pump runs all of the water for the insd's home" ,10003 ,524 ,"cc:2023" , , ,2023 , , ,"CCR" ,"INSERT" ,"2011-12-03 01.25.39.759555" ,"2011-12-03 01.25.39.759555"

I have been looking into awk, but I cannot work out how to preserve the actual rows.

Another Example:

Input:

9~~"2007-08-01 16.14.45.099000"~2215~0~~~~"Exposure closed (Unnecessary) : Garage door working
Claim Withdrawn"~~701~"cc:6007"~~564~6007~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
4~~"2007-08-01 16.14.49.333000"~1923~0~~~~"Assigned to user Leanne Hamshere in group GIO Home Processing (Team 3)"~~912~"cc:6008"~~~6008~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"

Output:

9~~"2007-08-01 16.14.45.099000"~2215~0~~~~"Exposure closed (Unnecessary) : Garage door working Claim Withdrawn"~~701~"cc:6007"~~564~6007~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
4~~"2007-08-01 16.14.49.333000"~1923~0~~~~"Assigned to user Leanne Hamshere in group GIO Home Processing (Team 3)"~~912~"cc:6008"~~~6008~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
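The solution below relies on counting fields per line. You can see that count for each physical line by splitting on the same [,~] pattern and printing NF. This is a minimal sketch; field_counts is a hypothetical helper name, not something from the original post:

```shell
# field_counts FILE
# Hypothetical helper: print "line-number: field-count" for every physical
# line, splitting on either a comma or a tilde (the same FS the answer uses).
field_counts() {
    awk -F '[,~]' '{ print NR ": " NF }' "$1"
}
```

On the second example above, this reports 9 and 13 for the two halves of the broken record and 21 for the intact one.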

Solution

One way using GNU awk:

awk -f script.awk file.txt

Contents of script.awk:

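BEGIN {
    # Split on either a comma or a tilde, so both example files work
    FS = "[,~]"
}

# A fragment of a broken record: append it to the pending line, joined by
# OFS (a single space by default), and add its fields to the running total
NF < 21 {
    line = (line ? line OFS : line) $0
    fields = fields + NF
}

# The pending fragments now add up to a full record: print it and reset
fields >= 21 {
    print line
    line=""
    fields=0
}

# An intact record: print it unchanged
NF == 21 {
    print
}

# Flush any fragment left at end of input instead of silently dropping it
END {
    if (line != "") print line
}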

Alternatively, you can use this one-liner:

awk -F "[,~]" 'NF < 21 { line = (line ? line OFS : line) $0; fields = fields + NF } fields >= 21 { print line; line=""; fields=0 } NF == 21 { print }' file.txt


Explanation:

I noticed something about your expected output: each record should contain exactly 21 fields. So if a line contains fewer than 21 fields, store the line and its number of fields. When the next line is read, it is joined to the stored line with a space (the default OFS), and its field count is added to the running total. Once that total reaches 21 or more, print the stored line. For a record broken across two lines the total is 22, not 21, because the field containing the linefeed is counted once on each line (9 + 13 in the second example). Otherwise, if a line already contains 21 fields (NF == 21), print it as it is. HTH.
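The field-count rule works for these samples, but it depends on every record having exactly 21 fields. It also breaks if a comma or tilde appears inside a free-text field, because that changes the count. A different signal is whether the double quotes seen so far are balanced. This is an alternative technique, not the answer's method. In both examples the linefeed falls inside a quoted field, so a record is complete once its running quote count is even. The sketch assumes that every embedded linefeed is inside double quotes and that a literal quote is written as "" (which keeps the count even). join_quoted_lines is a hypothetical helper name:

```shell
# join_quoted_lines FILE
# Hypothetical helper (not from the answer above). Assumes embedded
# linefeeds only occur inside double-quoted fields and quotes are balanced.
join_quoted_lines() {
    awk '
        {
            # gsub() returns how many matches it replaced; replacing each
            # double quote with itself ("&") counts them without altering the line
            quotes += gsub(/"/, "&")
            # Append this physical line to the pending record, separated by
            # OFS (a single space by default)
            if (rec == "") rec = $0
            else rec = rec OFS $0
            # An even running total means no quoted field is still open,
            # so the pending record is complete
            if (quotes % 2 == 0) {
                print rec
                rec = ""
                quotes = 0
            }
        }
        END {
            # Print an unterminated record rather than dropping it
            if (rec != "") print rec
        }
    ' "$1"
}
```

Usage: join_quoted_lines file.txt > fixed.txt. On the second example above, this produces the two expected output lines. It does not rely on knowing the field count, so it also copes with records split across more than two lines.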
