损坏的 PL/ruby 的替代方案:转换仓库日志表 [英] Alternatives to broken PL/ruby: convert a warehouse journal table

查看:64
本文介绍了损坏的 PL/ruby 的替代方案:转换仓库日志表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

升级 Postgres 8.4 后 ->9.3 和 Ruby 1.8 ->2.1、PL/ruby运行失败.在第一次执行任何 PL/ruby 函数时,我会立即获得 Postgres 服务器核心转储.我正在分析堆栈跟踪,但它看起来不太好.另外PL/ruby的维护状态也不好.

After upgrading Postgres 8.4 -> 9.3 and Ruby 1.8 -> 2.1, PL/ruby fails to run. I get immediate Postgres server coredump on first execution of any PL/ruby function. I am in the process of analyzing stacktraces, but it does not look good. Also, the maintenance state of PL/ruby does not look good.

因此,将重点转移到我使用 PL/ruby 解决的实际数据库问题上,并考虑替代方案.

So, changing focus to the actual database problem for which I use PL/ruby and think about alternatives.

问题的简化示例:给定一个仓库日志作为具有以下字段的数据库表:

Simplified example of problem: Given a warehouse journal as a database table with following fields:

  • 日期(日期)
  • 商品类型(外键)
  • 计数(数字)

考虑到仓库在严格的 FIFO 模式下运行,我需要一个列表来显示每堆物品在仓库中停留了多长时间(以及仍在仓库中的剩余物品列表):

Considering the warehouse runs in strict FIFO mode, I need a list that shows how long each bunch of items has stayed in the warehouse (and a list of remainders which are still in the warehouse):

  • journal_recno_in(外键)
  • journal_recno_out(外键)
  • 计数(数字)

所有其他信息都可以轻松加入.

All other info can easily be joined to this.

我需要在当前 SQL 查询中动态创建它,以便包含最新的数据;因此排除了外部程序.我认为用普通的 SQL 查询语言不可能解决这个问题,所以过程语言似乎是唯一的选择.

I need this dynamically created within the current SQL query, so that the most recent data gets included; therefore an external procedure is ruled out. I consider it impossible to solve this with plain SQL query language, so a procedural language seems the only option.

我尝试过 PL/pgSQL,这绝对可行,但它看起来粗糙和丑陋.

I tried with PL/pgSQL, this is definitely possible, but it looked crude and ugly.

现在我正在寻找痛苦最少的方法,并考虑未来的扩展.Ruby 显然是我的最爱,因为这种语言似乎几乎按照我的想法编写代码.但是,如果 PL/ruby 无法实现稳定的行为(而且目前看起来需要大量额外的工作和学习),那就毫无意义了.

Now I am searching for the way of least pain, having in mind future extensions. Ruby would be my clear favourite, as this language seems to almost code itself along my thinking. But if PL/ruby cannot be brought to a solid behaviour (and that looks currently like a lot additional work and learning), that is rather pointless.

建议?我可能忽略的事情?

Suggestions? Things that I may have overlooked?

第一个问题是 PL/ruby 将 ruby​​ SAFE_LEVEL 设置为 12,而 ruby​​ 2.1 最多接受 3,否则会提高.那很容易纠正,然后就可以做简单的功能了.但是在执行 RETURNS SETOF 函数时,它再次崩溃,这次是在 ruby​​ 库中的 rb_iterate() 附近.我在这里放弃了,得出的结论是 PL/ruby 可能需要从头到尾查看(即 5000+ loc).

The first issue was that PL/ruby sets ruby SAFE_LEVEL to 12, while ruby 2.1 accepts a maximum of 3 and does raise otherwise. That is easy to correct, and then it becomes possible to do simple functions. But when doing RETURNS SETOF functions, it crashes again, this time from near rb_iterate() in the ruby lib. I gave up here, with the conclusion that PL/ruby might need a lookover from beginning to end (i.e. 5000+ loc).

@Erwin:这是您需要的数据:

@Erwin: here is your req'd data:

输入表:

CREATE TABLE events (
  id serial PRIMARY KEY,
  datum date NOT NULL,
  name_id integer,
  count numeric(12,4),
  created_at timestamp without time zone,
  updated_at timestamp without time zone,
);

输出格式:

SELECT * FROM ev_fifo() AS (id_in int, id_out int, 
                           datum_in date, datum_out date, 
                           name_id int, 
                           count numeric)

输入示例:

  id  |   datum    | name_id |  count     |      created_at     |      updated_at
------+------------+---------+------------+---------------------+---------------------
  1   | 23.04.2008 |       1 |     1.0000 | 23.04.2008 02:11:45 | 06.06.2008 02:11:45
  2   | 28.04.2008 |       2 |    50.0000 | 29.04.2008 07:17:24 | 16.12.2008 04:32:43
  3   | 03.07.2008 |       2 |   250.0000 | 21.07.2008 01:26:15 | 16.12.2008 04:36:20
  4   | 03.07.2008 |       2 |    -1.0000 | 21.07.2008 01:31:00 | 16.12.2008 04:37:22
  5   | 03.07.2008 |       1 |    -1.0000 | 21.07.2008 01:28:19 | 16.12.2008 04:36:50
  6   | 04.07.2008 |       2 |   -60.0000 | 21.07.2008 01:32:26 | 16.12.2008 04:37:50

所需的输出:

  id_in  |  id_out  |  datum_in  |  datum_out  |  name_id |    count
---------+----------+------------+-------------+----------+-----------
  2      |  4       | 28.04.2008 | 03.07.2008  |        2 |    1.0000
  1      |  5       | 23.04.2009 | 03.07.2008  |        1 |    1.0000
  2      |  6       | 28.04.2008 | 04.07.2008  |        2 |   49.0000
  3      |  6       | 03.07.2008 | 04.07.2008  |        2 |   11.0000
  3      |  NULL    | 03.07.2008 | NULL        |        2 |  239.0000

推荐答案

纯SQL,单查询

作为概念证明,因为我有点挑战你,这个单一的 SQL 查询完成了所有工作:

Pure SQL, single query

As proof of concept, since I kind of challenged you, this single SQL query does it all:

WITH i AS (  -- input summed up
   SELECT id_in, datum_in, name_id, count, numrange(sum - count, sum) AS rng
   FROM  (
      SELECT id AS id_in, datum AS datum_in, name_id, count
           , sum(count) OVER (PARTITION BY name_id ORDER BY datum, id) AS sum
      FROM   events
      WHERE  count > 0
      ) sub
   )
,    o AS (  -- output summed up
   SELECT id_out, datum_out, name_id, count, numrange(sum + count, sum) AS rng
   FROM  (
      SELECT id AS id_out, datum AS datum_out, name_id, count
           , sum(count) OVER (PARTITION BY name_id ORDER BY datum, id) * -1 AS sum
      FROM   events
      WHERE  count < 0
      ) sub

   UNION ALL  -- add ghost range for items still in store
   SELECT NULL AS id_out, NULL AS datum_out, name_id, sum_in - sum_out AS count
        , numrange(sum_out, sum_in) AS rng
   FROM   (
      SELECT name_id, sum(CASE WHEN count > 0 THEN count END)          AS sum_in
           , COALESCE(sum(CASE WHEN count < 0 THEN count END) * -1, 0) AS sum_out
      FROM   events
      GROUP  BY 1
      ) sub
   WHERE  sum_in > sum_out  -- only where items are left
   )
SELECT i.id_in, o.id_out, i.datum_in::text, datum_out::text, i.name_id
     , upper(i.rng * o.rng) - lower(i.rng * o.rng) AS count  -- range intersect operator *
FROM   i
JOIN   o USING (name_id)
WHERE  i.rng && o.rng  -- range overlaps operator &&
ORDER  BY datum_out, id_out, datum_in, id_in;

假设基础表一致:不能扣除比之前添加的更多的项目.即,输出总和 <= 每个 name_id 的输入总和.

Assuming that the underlying table is consistent: no more items can be deducted than have been added previously. I.e., sum of output <= sum of input per name_id.

使用 Postgres 9.3 测试.准确地产生你的结果.并且应该表现不错.

Tested with Postgres 9.3. Produces your result exactly. And should perform decently.

使用范围类型范围运算符 以简化任务.

SQL Fiddle 带有扩展数据以显示角落案例.

SQL Fiddle with extended data to show corner cases.

我希望这种方法明显更快,不过:并行运行两个游标,一个用于输入,另一个用于输出列.所以我们只穿过桌子一次.

I expect this approach to be substantially faster, though: Run two cursors in parallel, one over input, the other over output columns. So we walk through the table only once.

此相关答案实现了基本逻辑(FNC - 函数"一章):

This related answer implements the basic logic (chapter "FNC - Function"):

这篇关于损坏的 PL/ruby 的替代方案:转换仓库日志表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆