Concatenate Multiple Data Files

Question

I have several data files that look like this:

HR0
012312010
001230202

HR1
012031020
012320102
012323222
012321010

HR2
321020202
...

To explain: each field starts with a header line (HR"n"), followed by a variable number of lines of quaternary numbers (e.g. 321020202), and fields are separated by a blank line. I want to combine the equivalent HR fields across files, so in a sense I want to zipper the files into one large file. I think sed is the answer, but I don't know where to start.

I'm leaning toward a shell script rather than a Python or C++ program, because I feel it might be faster both to write and to run. Thoughts?

Recommended answer

This is pretty easy to do in C++, and easier still if you have C++17. You can write a function that reads a stream into a multimap<int, int>, something like:

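#include <algorithm>  // transform
#include <cstdlib>    // atoi
#include <istream>
#include <iterator>   // istream_iterator, inserter, data (C++17)
#include <map>
#include <string>
#include <utility>    // make_pair
using namespace std;

// Reads "HRn" headers, each followed by its numbers, into a multimap keyed by n.
multimap<int, int> read(istream& input) {
    multimap<int, int> output;
    string i;

    while(input >> i) {
        // Skip the "HR" prefix and parse the field number.
        const auto key = std::atoi(data(i) + 2);
        // Read ints until the next header (or end of file) makes extraction fail.
        transform(istream_iterator<int>(input), istream_iterator<int>(), inserter(output, begin(output)), [key](const auto value){ return make_pair(key, value); });
        input.clear(); // reset the fail state so the next header can be read
    }
    return output;
}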

You then call that function with each file's ifstream and use merge (a multimap member function since C++17) to move the returned elements into your accumulating multimap<int, int> output.
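The accumulation step described above could be sketched roughly as follows. The helper name read_all and its pointer-to-stream parameter are assumptions made for illustration, not part of the original answer; read is repeated here only so the sketch compiles on its own. It assumes a C++17 compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <istream>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Same logic as the answer's read(): an "HRn" header, then its numbers, repeated.
std::multimap<int, int> read(std::istream& input) {
    std::multimap<int, int> output;
    std::string header;

    while (input >> header) {
        const int key = std::atoi(std::data(header) + 2);  // skip "HR"
        std::transform(std::istream_iterator<int>(input), std::istream_iterator<int>(),
                       std::inserter(output, std::begin(output)),
                       [key](const auto value) { return std::make_pair(key, value); });
        input.clear();  // the next header made int extraction fail; reset the state
    }
    return output;
}

// Hypothetical helper: folds any number of already-open streams into one multimap.
std::multimap<int, int> read_all(const std::vector<std::istream*>& inputs) {
    std::multimap<int, int> merged;
    for (std::istream* in : inputs) {
        assert(in != nullptr);
        merged.merge(read(*in));  // C++17: moves the nodes over without copying
    }
    return merged;
}

// Hypothetical convenience for trying the parser on in-memory text.
std::multimap<int, int> read_text(const std::string& text) {
    std::istringstream in(text);
    return read(in);
}
```

With real files you would open one std::ifstream per file and pass their addresses to read_all. Values that share an HR key end up grouped together, with the first stream's values ahead of the second's.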

Then you just dump output to your output file. Assuming it has been opened as an ofstream named filep, you could write it out like this:

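// Needs <iomanip> for setfill/setw; assumes output is not empty.
auto key = cbegin(output)->first;

// Pad with '0' so the leading zeros dropped when parsing as int are restored.
filep << key << ":\n" << setfill('0');

for(const auto& it : output) {
    if(it.first == key) {
        filep << '\t' << setw(9) << it.second << endl;
    } else {
        // A new key starts a new group.
        key = it.first;
        filep << key << ":\n\t" << setw(9) << it.second << endl;
    }
}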
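The snippet above prints each group as "n:" with tab-indented values. If the merged file should instead keep the layout of the input files (an "HRn" header, the numbers, then a blank line between fields), a variant along these lines might work. The names write_fields and to_text are hypothetical, and the sketch assumes every number is nine digits wide, as in the sample data, because leading zeros are lost when the values are parsed as int.

```cpp
#include <iomanip>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes the merged fields back in the input layout.
// Assumes each number was originally nine digits wide.
void write_fields(std::ostream& out, const std::multimap<int, int>& fields) {
    const auto old_fill = out.fill('0');
    for (auto it = fields.cbegin(); it != fields.cend();) {
        const auto last = fields.upper_bound(it->first);  // end of this key's group
        if (it != fields.cbegin()) {
            out << '\n';  // blank line between fields
        }
        out << "HR" << it->first << '\n';
        for (; it != last; ++it) {
            out << std::setw(9) << it->second << '\n';
        }
    }
    out.fill(old_fill);  // leave the stream's fill character as we found it
}

// Hypothetical convenience: renders the fields to a string, handy for checking.
std::string to_text(const std::multimap<int, int>& fields) {
    std::ostringstream out;
    write_fields(out, fields);
    return out.str();
}
```

Unlike the snippet above, this version also handles an empty multimap, since it never dereferences the begin iterator without checking it first.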

I've written a live example involving only one file here: http://ideone.com/n47MnS
