解析&使用Coldfusion& MySQL [英] Parse & Compare Data using Coldfusion & MySQL

查看:155
本文介绍了解析&使用Coldfusion& MySQL的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

首先,我将解释我需要做什么,然后我会如何来实现它。我现在的计划在理论上似乎非常低效,所以我的问题是是否有一个更好的方法来完成它。



我有2个表 - 叫他们'产品' 'Products_Temp',都是相同的。我需要下载大量的文件(XML或XLS),其中包含来自供应商的产品详细信息(股票,定价等)。然后将它们解析到Products_Temp表中。现在,我计划使用CF计划任务来处理下载,并且Navicat做实际的解析 - 我很高兴这是足够和足够有效。



下一步是我正在努力 - 一旦文件已经下载并解析,我需要查找数据中的任何更改。这将与Products表进行比较。如果发现更改,那么应该添加或更新该行(如果应该删除,那么我需要标记它,而不是删除它)。一旦所有数据被比较,products_temp表应该被清空。



我知道比较表和同步它们的方法,但是我有的问题是事实上,我将处理来自不同来源的多个文件。我认为只使用products表和append / update,但是我不确定如何管理'flag deleted'的要求。



现在,唯一的方法我知道我可以让它工作是循环通过products_temp表,做各种cfquerys和删除行一次完成。然而,这似乎是令人难以置信的低效率,并且考虑到事实,我们可能处理成千上万的行,如果我们每天更新一切,不太可能有效。



解决方案

两个响应都有可能。



选项#1



如果 > mySQL支持某种散列,在每行的基础,您可以使用 comodoro的建议避免硬删除。



识别更改



,在主键上执行内部联接并检查哈希值。如果它们不同,则更改产品并更新:

 更新产品p INNER JOIN Products_Temp tmp ON tmp.ProductID = p.ProductID 
SET p.ProductName = tmp.ProductName
,p.Stock = tmp.Stock
,...
,p.DateLastChanged = now()
,p.IsDiscontinued = 0
WHERE tmpTheRowHash<> p.TheRowHash

识别已删除



使用简单的外连接来标识临时表中不存在的记录,并将它们标记为已删除。

  UPDATE产品p LEFT JOIN Products_Temp tmp ON tmp.ProductID = p.ProductID 
SET p.DateLastChanged = now()
,p.IsDiscontinued = 1
WHERE tmp.ProductID IS NULL

识别新



最后,使用类似的外连接插入任何新产品。

  INSERT INTO产品(ProductName,库存,DateLastChanged,IsDiscontinued,..)
SELECT tmp.ProductName,tmp。库存,现在()AS DateLastChanged,0 AS IsDiscontinued,...
从Products_Temp tmp LEFT JOIN产品p ON tmp.ProductID = p.ProductID
WHERE p.ProductID是NULL



选项#2



如果每行哈希不可行,替代方法是 Sharondio的建议的变体。



在临时表中添加状态列,并通过一系列连接将所有导入的记录标记为新,已更改或未更改。 (默认值应为已更改)。



识别联合国变更



,以识别未更改的产品。 (注意,如果您的表包含任何可空字段,请记住使用 coalesce 否则,结果可能会倾斜,因为 null 值不等于任何值。

 更新Products_Temp tmp INNER JOIN产品p ON tmp.ProductID = p.ProductID 
SET tmp.Status ='Unchanged'
WHERE p.ProductName = tmp.ProductName
AND p.Stock = tmp.Stock
...



识别新



加入以标识新记录。

 更新Products_Temp tmp LEFT JOIN产品p ON tmp.ProductID = p.ProductID 
SET tmp.Status ='New'
WHERE p.ProductID is NULL

一旦计算出状态,就可以更新Products表:

  / *更新更改的产品* / 
更新产品p INNER JOIN Products_Temp tmp ON tmp.ProductID = p.ProductID
SET p.ProductName = tmp.ProductName
,p.Stock = tmp.Stock
,...
,p.DateLastChanged = now()
,p.IsDiscontinued = 0
WHERE tmp.status ='Changed'

/ *插入新产品* /
INSERT INTO产品(ProductName,库存,DateLastChanged,IsDiscontinued,..)
SELECT tmp.ProductName,tmp.Stock,now()AS DateLastChanged,0 AS IsDiscontinued,...
FROM Products_Temp tmp
WHERE tmp.Status ='New'

/ *标志已删除记录* /
更新产品p LEFT JOIN Products_Temp tmp ON tmp.ProductID = p.ProductID
SET p.DateLastChanged = now()
,p.IsDiscontinued = 1
WHERE tmp.ProductID IS NULL


First, I'll explain what I need to do, then how I think I can achieve it. My current plan seems very inefficient in theory, so my question is whether there is a better way of accomplishing it.

I have 2 Tables - lets call them 'Products' and 'Products_Temp', both are identical. I need to download a large number of files (XML or XLS) which contain product details (stock, pricing etc) from suppliers. These are then parsed into the Products_Temp table. Right now, I plan to use CF Scheduled Tasks to handle the downloading, and Navicat to do the actual parsing - I'm happy enough this is adequate and efficient enough.

The next step is where I'm struggling - once the file has been downloaded and parsed, I need to look for any changes in the data. This will be compared against the Products table. If a change is found, then that row should be added or updated (if it should be removed, then I'll need to flag it rather than just delete it). Once all the data has been compared, the products_temp table should be emptied.

I'm aware of methods to compare tables and sync them accordingly, however the issue I have is the fact I'll be handling multiple files from different sources. I had considered using only the products table and append/update, but I'm unsure how I could manage the 'flag deleted' requirement.

Right now, the only way I know I can make it work is to loop through the products_temp table, do various cfquerys and delete the row once complete. However, that seems incredibly inefficient, and given the fact we're likely to be dealing with hundreds of thousands of rows, unlikely to be effective if we update everything daily.

Any pointers or advice on a better route would be appreciated!

解决方案

Both responses have possibilities. Just to expand on your options a little ..

Option #1

IF mySQL supports some sort of hashing, on a per row basis, you could use a variation of comodoro's suggestion to avoid hard deletes.

Identify Changed

To identify changes, do an inner join on the primary key and check the hash values. If they are different, the product was changed and should be updated:

    UPDATE Products p INNER JOIN Products_Temp tmp ON tmp.ProductID = p.ProductID
    SET    p.ProductName = tmp.ProductName
           , p.Stock = tmp.Stock
           , ...
           , p.DateLastChanged = now()
           , p.IsDiscontinued  = 0
    WHERE  tmp.TheRowHash <> p.TheRowHash

Identify Deleted

Use a simple outer join to identify records that do not exist in the temp table, and flag them as "deleted"

    UPDATE Products p LEFT JOIN Products_Temp tmp ON tmp.ProductID = p.ProductID
    SET    p.DateLastChanged = now()
           , p.IsDiscontinued = 1
    WHERE  tmp.ProductID IS NULL

Identify New

Finally, use a similar outer join to insert any "new" products.

    INSERT INTO Products ( ProductName, Stock, DateLastChanged, IsDiscontinued, .. )
    SELECT tmp.ProductName, tmp.Stock, now() AS DateLastChanged, 0 AS IsDiscontinued, ...
    FROM   Products_Temp tmp LEFT JOIN Products p ON tmp.ProductID = p.ProductID
    WHERE  p.ProductID IS NULL

Option #2

If per row hashing is not feasible, an alternate approach is a variation of Sharondio's suggestion.

Add a "status" column to the temp table and flag all imported records as "new", "changed" or "unchanged" through a series of joins. (The default should be "changed").

Identify UN-Changed

First use an inner join, on all fields, to identify products that have NOT changed. (Note, if your table contains any nullable fields, remember to use something like coalesce Otherwise, the results may be skewed because null values are not equal to anything.

    UPDATE  Products_Temp tmp INNER JOIN Products p ON tmp.ProductID = p.ProductID
    SET     tmp.Status = 'Unchanged'
    WHERE   p.ProductName = tmp.ProductName
    AND     p.Stock = tmp.Stock
    ... 

Identify New

Like before, use an outer join to identify "new" records.

    UPDATE  Products_Temp tmp LEFT JOIN Products p ON tmp.ProductID = p.ProductID
    SET     tmp.Status = 'New'
    WHERE   p.ProductID IS NULL

By process of elimination, all other records in the temp table are "changed". Once you have calculated the statuses, you can update the Products table:

    /*  update changed products */
    UPDATE Products p INNER JOIN Products_Temp tmp ON tmp.ProductID = p.ProductID
    SET    p.ProductName = tmp.ProductName
           , p.Stock = tmp.Stock
           , ...
           , p.DateLastChanged = now()
           , p.IsDiscontinued = 0
    WHERE  tmp.status = 'Changed'

    /*  insert new products */
    INSERT INTO Products ( ProductName, Stock, DateLastChanged, IsDiscontinued, .. )
    SELECT tmp.ProductName, tmp.Stock, now() AS DateLastChanged, 0 AS IsDiscontinued, ...
    FROM   Products_Temp tmp
    WHERE  tmp.Status = 'New'

    /* flag deleted records */
    UPDATE Products p LEFT JOIN Products_Temp tmp ON tmp.ProductID = p.ProductID
    SET    p.DateLastChanged = now()
           , p.IsDiscontinued = 1
    WHERE  tmp.ProductID IS NULL

这篇关于解析&amp;使用Coldfusion&amp; MySQL的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆