CSV read vs. temp table read from the database: optimizing the loop and ActiveRecord usage (Ruby)


Problem description

CSV parsing of the file was very slow, so I tried loading the file directly into a temp table in the database and doing the computation from there, as shown below.

Previously the code looked like this, and adding the entries took 13 minutes:

CSV.foreach(fileName) do |line|
    completePath = line[0]
    num_of_bps = line[1]

    completePath = cluster_path + '/' + completePath
    # One SELECT per CSV row to resolve the path to a FileOrFolder id
    inode = FileOrFolder.find_by_fullpath(completePath, :select=>"id")

    # A second SELECT per row to find the matching MetricInstance
    metric_instance = MetricInstance.find(:first, :conditions=>["file_or_folder_id = ? AND dataset_id = ?", inode.id, dataset_id])
    # add_entry saves one Bp record, i.e. one INSERT per row
    add_entry(metric_instance.id, num_of_bps, num_of_bp_tests)
end



def self.add_entry(metaid, num_of_bps, num_of_bp_tests)
    entry = Bp.new
    entry.metric_instance_id = metaid
    entry.num_of_bps = num_of_bps
    entry.num_of_bp_tests = num_of_bp_tests
    entry.save
    return entry
end

I then changed the method to the following, and it now takes 52 minutes :(

@bps = TempTable.all

@bps.each do |bp|
    completePath = bp.first_column
    num_of_bps = bp.second_column
    num_of_bps3 = bp.third_column

    completePath = cluster_path + '/' + completePath
    inode = FileOrFolder.find_by_fullpath(completePath, :select=>"id")
    num_of_bp_tests = 0
    unless inode.nil?
        if num_of_bps != '0'
            num_of_bp_tests = 1
        end

        metric_instance = MetricInstance.find(:first, :conditions=>["file_or_folder_id = ? AND dataset_id = ?", inode.id, dataset_id])
        add_entry(metric_instance.id, num_of_bps, num_of_bp_tests)
    end
end

Please help me optimize this code, or let me know if you think iterating with CSV.foreach is faster than reading from the database!

Recommended answer

When you load the CSV into the database first, you:

  • load N CSV rows
  • insert N records into the DB
  • select and instantiate N ActiveRecord models
  • iterate over them

When you work with the raw CSV, you only:

  • load N CSV rows
  • iterate over them
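The two step lists above can be mimicked in plain Ruby to make the extra work visible. This is only a sketch under stated assumptions: `CSV_TEXT` is made-up sample data, an array of hashes stands in for the temp table, and the `TempRow` struct stands in for an ActiveRecord model. None of these names come from the original code, and no real database is involved.

```ruby
require 'csv'

# Hypothetical sample data standing in for the real file.
CSV_TEXT = "a/b.txt,3\nc/d.txt,0\n"

# Hypothetical stand-in for an ActiveRecord model of the temp table.
TempRow = Struct.new(:first_column, :second_column)

# Raw CSV path: load N rows, then iterate over them.
def via_csv(text)
  steps = Hash.new(0)
  rows = CSV.parse(text)
  steps[:csv_rows_loaded] = rows.size
  rows.each { |_line| steps[:iterations] += 1 }
  steps
end

# Temp table path: load N rows, "insert" N records, instantiate N models,
# and only then iterate. The array of hashes simulates the table.
def via_temp_table(text)
  steps = Hash.new(0)
  rows = CSV.parse(text)
  steps[:csv_rows_loaded] = rows.size

  table = rows.map { |path, count| { first_column: path, second_column: count } }
  steps[:records_inserted] = table.size

  models = table.map { |rec| TempRow.new(rec[:first_column], rec[:second_column]) }
  steps[:models_instantiated] = models.size

  models.each { |_bp| steps[:iterations] += 1 }
  steps
end
```

Comparing `via_csv(CSV_TEXT)` with `via_temp_table(CSV_TEXT)` shows the same number of loaded rows and iterations, with two additional N-sized steps in the temp table variant.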

Of course it's faster.
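Whichever source feeds the loop, the question's code also runs two lookups (`find_by_fullpath` and `MetricInstance.find`) and one `save` per row, so the number of database round trips grows with N. One possible direction, sketched here in plain Ruby rather than Rails, is to build the lookups once as hashes and keep the loop in memory. Everything below is an assumption for illustration: `FileRow`, `MetricRow`, `FILES`, `METRICS`, and `build_entries` are hypothetical names, and the arrays stand in for tables that a real application would load with one query each. Persisting the resulting entries is not shown.

```ruby
# Hypothetical in-memory stand-ins for the two lookup tables.
FileRow   = Struct.new(:id, :fullpath)
MetricRow = Struct.new(:id, :file_or_folder_id, :dataset_id)

FILES   = [FileRow.new(1, '/cluster/a/b.txt'), FileRow.new(2, '/cluster/c/d.txt')]
METRICS = [MetricRow.new(10, 1, 7), MetricRow.new(11, 2, 7), MetricRow.new(12, 1, 8)]

# rows is an array of [relative_path, num_of_bps] pairs, e.g. parsed CSV lines.
def build_entries(rows, cluster_path, dataset_id, files: FILES, metrics: METRICS)
  # Build both indexes once, before the loop, instead of querying per row.
  file_id_by_path = files.each_with_object({}) { |f, h| h[f.fullpath] = f.id }
  metric_id_by_file = metrics
    .select { |m| m.dataset_id == dataset_id }
    .each_with_object({}) { |m, h| h[m.file_or_folder_id] = m.id }

  rows.each_with_object([]) do |(path, num_of_bps), entries|
    file_id = file_id_by_path[cluster_path + '/' + path]
    next if file_id.nil? # same skip as the inode.nil? check in the question

    metric_id = metric_id_by_file[file_id]
    next if metric_id.nil? # extra guard, not present in the original code

    num_of_bp_tests = num_of_bps != '0' ? 1 : 0
    entries << { metric_instance_id: metric_id,
                 num_of_bps: num_of_bps,
                 num_of_bp_tests: num_of_bp_tests }
  end
end
```

With rows such as `[['a/b.txt', '3'], ['missing.txt', '5']]`, the second row is skipped because its path has no match, mirroring the `inode.nil?` branch. Whether this helps in practice depends on whether the lookup tables fit comfortably in memory.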
