我该如何将这些数据导入我的数据库? [英] How should I import this data into my database?

查看:128
本文介绍了我该如何将这些数据导入我的数据库?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有数千条记录的数据库

I have database with thousands of records

Code  | Name  | Price
00106 | Water | 9.99
00107 | Onion | 8.99

哪个编码在 GES 文件中下面:

Which is coded in GES file like below:


  • 00F 表示列标题

  • 00I 表示插入行

  • 00F means column header
  • 00I means insert a row

还有其他人喜欢( 00D 用于删除行或 00U 用于更新)

there are also others like(00D for delete row or 00U for update)

00F
0101
02Code
031
00F
0102
02Name
031
00F
0103
02Price
030
00I
0100106
02Water
030999
00I
0100107
02Onion
030899

我想创建处理此文件并将其推送到我的数据库的导入程序。所以我开始实现:

I want to create importer which process this file and push it into my database. So I started implemented that:

class Importer
  CONN = ActiveRecord::Base.connection
  F = "00F"
  I = "00I"

  def extract_to_database(collection)
    add       = true
    tmp       = []
    type      = F
    inserts   = []

    collection.each_with_index do |line, i|
      _type    = line.strip
      _changed = [F,I].include? _type

      if _changed && i > 0
        case type
        when F then @f << tmp
        when I
          group_id = Group.find_by(code: tmp[1]).id
          inserts.push "(group_id,'#{tmp[2]}','#{tmp[3]}')"
        end

        tmp  = []
        type = _type
      end

      tmp << line
    end
    sql = "INSERT INTO products (`group_id`, `name`, `price`) VALUES #{inserts.join(", ")}"
    CONN.execute sql
  end
end

有一个问题,我想使用函数式编程的重构。

There is one problem with that, I would like to refactor that using functional programming.

我必须通过代码找到其他型号并放入产品表相关 some_model_id 列,这样可能会使整个过程复杂化。因为现在导入这些数据需要几个小时。

And I will have to find other model by code and put to products table related some_model_id column so this can complicate whole process. Because right now importing this data takes me few hours.

使用Ruby可能不是最佳选择。

Maybe using Ruby is not the best option.

推荐答案

这里没有Ruby无法处理的东西。目前还不清楚函数式编程是如何帮助这一点的,因为这是一种经典的状态机问题,正在进行一些简单的数据转换。

There's nothing here that Ruby can't handle. It's not clear how "functional programming" would help this any, either, as this is a classic state-machine sort of problem with some simple data transformation going on.

示例脚手架:

class SomethingImporter
  FIELD_MARKER = "00F"
  INSERT_MARKER = "00I"

  COLUMNS = %w[ group_id name price ]

  # Performs the insert into a given model. This should probably be a class
  # method on the model itself.
  def bulk_insert(model, rows)
    sql = [
      "INSERT INTO `#{model.table_name}` (#{columns.collect { |c| }}"
    ]

    # Append the placeholders: (?,?,?),(?,?,?),...
    sql[0] += ([ '(%s)' % ([ '?' ] * COLUMNS.length).join(',') ] * rows.length).join(',')

    sql += rows.flatten

    model.connection.execute(model.send(:sanitize_sql, sql))
  end

  # Resolve a group code to a group_id value, and cache the result so that
  # subsequent look-ups for the same code are valid.
  def group_id(group_code)
    @find_group ||= { }

    # This tests if any value has been cached for this code, including one
    # that might be nil.
    if (@find_group.key?(group_code))
      return @find_group[group_code]
    end

    group = Group.find_by(code: group_code)

    @find_group[group_code] = group && group.id
  end

  # Call this with the actual collection, lines stripped, and with any header
  # lines removed (e.g. collection.shift)
  def extract_rows(collection)
    state = nil
    rows = [ ]
    row = [ ]

    collection.each_with_index do |line|
      case (line)
      when FIELD_MARKER
        # Indicates field data to follow
        state = :field
      when INSERT_MARKER
        case (state)
        when :insert
          rows << [ row[0], row[1], (row[2].sub(/^0+/, '').to_f / 100) ]
        end

        state = :insert
        row = [ ]
      else
        case (state)
        when :field
          # Presumably you'd pay attention to the data here and establish
          # a mapping table.
        when :insert
          row << line.sub(/^\d\d/, '')
          # puts row.inspect
        end
      end
    end

    case (state)
    when :insert
      rows << [ row[0], row[1], (row[2].sub(/^0+/, '').to_f / 100) ]
    end

    rows
  end
end


data = <<END
00F
0101
02Code
031
00F
0102
02Name
031
00F
0103
02Price
030
00I
0100106
02Water
030999
00I
0100107
02Onion
030899
END

importer = SomethingImporter.new

puts importer.extract_rows(data.split(/\n/)).inspect

根据您的数据,此示例输出如下所示:

The example output from this, based on your data, looks like:

[["00106", "Water", 9.99], ["00107", "Onion", 8.99]]

写这样的代码时,一定要确定公开中间结果以便能够测试正在发生的事情。您的实现会一次性获取数据并将其直接转储到数据库中,如果它无法正常运行,很难分辨哪些内容出错。这个版本由几种方法组成,每种方法都有更具体的用途。

When writing code like this, be sure to expose intermediate results in order to be able to test what's happening. Your implementation takes data and dumps it directly in the database in one shot, making it very hard to tell where something's going wrong if it doesn't work out properly. This version is composed of several methods, each of which has a more specific purpose.

原始例子中不清楚为什么要解析 group_id 根本没有,你的示例输出与此无关,但作为一个例子,我已经包含了一个方法来解决它们并保持它们被缓存,避免重复查找相同的东西。对于更大规模的导入,您可能会加载许多行,提取不同的group_id值,一次加载它们,并在插入之前重新映射它们。

It's not clear in your original example why you're resolving group_id at all, your sample output has nothing to do with that, but as an example I've included a method that resolves them and keeps them cached, avoiding repeated lookups of the same thing. For a larger scale import you'd probably load in many rows, extract out the distinct group_id values, load them all at once, and remap them before inserting.

这篇关于我该如何将这些数据导入我的数据库?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆