How can I transform rows into repeated column based data?


Problem description

I'm trying to take a dataset that looks like this:

And transform the records into this format:

[Image: destination format, https://i.stack.imgur.com/CjVT4.jpg]

The resulting format would have two columns: one for the old column names and one for the values. If there are 10,000 rows, then there should be 10,000 groups of data in the new format.

I'm open to any method: Excel formulas, SQL (MySQL), or straight Ruby code would all work for me. What is the best way to tackle this problem?

Recommended answer

Just for fun:

# Input file format is tab separated values
# (columns are aligned with spaces here for readability)

# name  search_term  address       code
# Jim   jim          jim_address   123
# Bob   bob          bob_address   124
# Lisa  lisa         lisa_address  126
# Mona  mona         mona_address  129


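infile = File.open("inputfile.tsv")

# The first line holds the column names
headers = infile.readline.chomp.split("\t")
puts headers.inspect
of = File.new("outputfile.tsv", "w")
infile.each_line do |line|
  # chomp drops the trailing newline so it does not end up in the last field
  row = line.chomp.split("\t")
  # Write one "column name<TAB>value" line per field
  headers.each_with_index do |key, index|
    of.puts "#{key}\t#{row[index]}"
  end
end

of.close
infile.close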



# A nicer way, on my machine it does 1.6M rows in about 17 sec
# The block form of File.open closes both files automatically

File.open("inputfile.tsv") do |in_file|
  headers = in_file.readline.chomp.split("\t")
  File.open("outputfile.tsv", "w") do |out_file|
    in_file.each_line do |line|
      row = line.chomp.split("\t")
      headers.each_with_index do |key, index|
        # The explicit "\n" puts each name/value pair on its own line
        out_file << key << "\t" << row[index] << "\n"
      end
    end
  end
end
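
To check the transformation without touching any files, the same idea can be sketched as a small in-memory function. This is only an illustration under a few assumptions: the helper name `rows_to_pairs` is hypothetical and not part of the original answer, the input is plain tab-separated text with no quoted fields, and the sample rows are taken from the comment at the top of the script.

```ruby
# Hypothetical helper (not from the original answer): turns tab-separated
# text into one "column name<TAB>value" string per field.
def rows_to_pairs(tsv_text)
  lines = tsv_text.lines.map(&:chomp).reject(&:empty?)
  return [] if lines.empty?

  # The first line holds the column names
  headers = lines.first.split("\t")
  lines.drop(1).flat_map do |line|
    # The -1 limit keeps trailing empty fields instead of dropping them
    values = line.split("\t", -1)
    # zip pairs each header with the value in the same position
    headers.zip(values).map { |key, value| "#{key}\t#{value}" }
  end
end

sample = "name\tsearch_term\taddress\tcode\n" \
         "Jim\tjim\tjim_address\t123\n" \
         "Bob\tbob\tbob_address\t124\n"

# Prints eight lines: four name/value pairs for Jim, then four for Bob
puts rows_to_pairs(sample)
```

Using `zip` instead of `each_with_index` avoids manual indexing. The whole input is held in memory here, so for very large files the streaming version above is probably the better fit.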
