How to split a delimited text file in Linux by number of records when data fields contain the end-of-record separator

Problem description

Problem statement:

I have a delimited text file exported from Teradata that happens to contain "\n" (newline characters, i.e. EOL markers) inside some data fields.

The same EOL marker also appears at the end of each complete line, i.e. at the end of each record.

I need to split this file into two or more files (based on a number of records that I specify), keeping the newline characters inside data fields intact and splitting only on the line breaks that end each record.

Example:

1|Alan
Wake|15
2|Nathan
Drake|10
3|Gordon
Freeman|11

Expected:

file1.txt

1|Alan
Wake|15
2|Nathan
Drake|10


file2.txt

3|Gordon
Freeman|11

What I have tried:

 awk 'BEGIN{RS="\n"}NR%2==1{x="SplitF"++i;}{print > x}' inputfile.txt

The code can't distinguish newlines inside data fields from the newlines that end records. Is there a way to achieve this?

Edit: I have changed the problem statement to include an example. Please share your thoughts on the new example.

Recommended answer

If you are using GNU awk you can do this by setting RS appropriately, e.g.:

parse.awk

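# A record starts with its numeric key followed by "|". [0-9]+ (rather
# than a single digit) so that multi-digit keys such as 10| are matched
# whole. This assumes digits followed by "|" only occur at record starts.
BEGIN { RS="[0-9]+\\|" }

# Skip the empty first record by checking NF (Note: this will also skip
# any empty records later in the input)
NF {
  # Send record with the appropriate key to a numbered file
  printf("%s", d $0) > ("file" i ".txt")
}

# When we have found enough records, close the current file and
# prepare i for opening the next one
#
# Note: NR-1 because of the empty first record
(NR-1)%n == 0 {
  close("file" i ".txt")
  i++
}

# Remember the record key (the text RS matched, stored by gawk in RT)
# in d, again because of the empty first record
{ d=RT }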

Run it like this:

gawk -f parse.awk n=2 infile

Where n is the number of records to put into each file.

Output:

file1.txt

1|Alan
Wake|15
2|Nathan
Drake|10

file2.txt

3|Gordon
Freeman|11
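The script above depends on GNU awk: the RT variable is a gawk extension, and POSIX leaves a multi-character RS unspecified. If only a POSIX awk is available, one possible alternative (not part of the original answer) is to join physical lines until the pending record contains the expected number of "|" delimiters. This is a sketch under stated assumptions: every record has exactly the same number of fields (3 in the example), no field contains a literal "|", and the shell function name split_records and its argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split_records RECORDS_PER_FILE FIELDS_PER_RECORD INPUT
# Assumes a fixed field count per record and no literal "|" inside fields.
split_records() {
  awk -v n="$1" -v nfields="$2" '
  {
    # Append this physical line (and its newline) to the pending record
    rec = rec $0 "\n"
    # Count the "|" delimiters on this line; gsub returns the number replaced
    line = $0
    pipes += gsub(/[|]/, "", line)
    # A record with nfields fields contains nfields-1 delimiters
    if (pipes >= nfields - 1) {
      count++
      out = "file" (int((count - 1) / n) + 1) ".txt"
      printf "%s", rec > out
      # Close each output file once it holds n records
      if (count % n == 0) close(out)
      rec = ""
      pipes = 0
    }
  }
  END {
    # Anything left over never reached nfields fields: malformed input
    if (rec != "") printf "incomplete trailing record\n" > "/dev/stderr"
  }' "$3"
}
```

For the sample input, `split_records 2 3 inputfile.txt` should write the same file1.txt and file2.txt shown above. Counting delimiters avoids any guess about what a record key looks like, at the cost of requiring a known, fixed field count.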
