doctrine2 - How to improve flush efficiency?


Problem description

I have to update my Doctrine entities to match records inside a (potentially very large) XML file. I also have to update the ManyToMany associations according to the data in the XML. This is what I do inside a loop:

  1. get the data from the XML
  2. get the entity from the DB (create a new one if it does not exist)
  3. set the new entity properties
  4. get the entity's current associations (the getter returns an ArrayCollection object)
  5. clear all associations (by calling ArrayCollection::clear())
  6. set the new associations (by calling ArrayCollection::add() in a sub-loop)
  7. persist the entity via the EntityManager

After the loop I call EntityManager::flush().
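
A minimal sketch of that loop, just to make the steps concrete (the Product entity, its field names, and the $tagResolver helper are hypothetical, not from the question):

```php
<?php
// Hypothetical import loop illustrating steps 1-7 above.
foreach ($xml->product as $row) {                         // 1. data from XML
    $product = $em->getRepository(Product::class)
                  ->findOneBy(['sku' => (string) $row->sku]);
    if ($product === null) {                              // 2. fetch or create
        $product = new Product();
    }
    $product->setName((string) $row->name);               // 3. set properties

    $tags = $product->getTags();                          // 4. current associations
    $tags->clear();                                       // 5. clear them
    foreach ($row->tag as $tagNode) {                     // 6. add new ones
        $tags->add($tagResolver->resolve((string) $tagNode));
    }

    $em->persist($product);                               // 7. persist
}
$em->flush();                                             // single flush after the loop
```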

The problem is that flushing generates a large number of queries instead of updating/inserting/deleting multiple rows at once. For every entity, the following queries are executed:

  • SELECT to get entity from DB
  • UPDATE to update entity properties (this is actually skipped now as no properties changed ... yet)
  • DELETE to clear previous associations
  • INSERT to insert new associations

So in total, for 305 records in the XML I get 915 queries (I guess it could go up to 1,220 queries if all entities changed), which makes the import very slow.

I could take advantage of the IdentityMap and pre-fetch the entities before the loop, but the UPDATE/DELETE/INSERT queries would still remain.
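
For what it's worth, the pre-fetch could look like this (entity and field names again hypothetical): one SELECT ... WHERE IN replaces the 305 individual SELECTs, and an index built from the result replaces the per-record findOneBy() calls:

```php
<?php
// Collect all keys referenced by the XML, fetch the matching entities
// in one query, and index them locally for O(1) lookup inside the loop.
$skus = [];
foreach ($xml->product as $row) {
    $skus[] = (string) $row->sku;
}

$found = $em->createQuery(
    'SELECT p FROM Product p WHERE p.sku IN (:skus)'
)->setParameter('skus', $skus)->getResult();

$products = [];
foreach ($found as $p) {
    $products[$p->getSku()] = $p;
}

// Inside the import loop:
$product = $products[$sku] ?? new Product();
```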

  • Is there a way to make the flush method optimize the queries better (use multi-inserts, WHERE IN instead of multiple DELETE queries, etc.)?
  • Is this the normal behaviour of the flush method, or am I doing something wrong?
  • Perhaps there is a problem in the way I update the entity's associations. Is there a better way to do this (instead of the "get/clear/add" approach)?
  • I am aware that Doctrine is not intended for mass batch processing, but I think using it for XML imports is the best way to avoid the DB inconsistencies which could appear with a non-ORM approach. Is that right?
  • If the approach above is wrong, how should I solve the problem?

Solution

You're doing it right -- it's just slow, because the added abstraction of the ORM means you can't make the sorts of optimizations you'd like.

That said, the EntityManager does get slow on transactions that large. If you don't absolutely need them all in one big transaction, you can probably get more performant code by flush()ing and then clear()ing the EM every 20-200 iterations of your loop.
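
For example (the batch size of 50 is arbitrary; tune it within the 20-200 range mentioned above):

```php
<?php
$batchSize = 50;
$i = 0;
foreach ($xml->product as $row) {
    // ... build/update the entity and its associations as before ...
    $em->persist($product);

    if ((++$i % $batchSize) === 0) {
        $em->flush();  // execute the queries accumulated so far
        $em->clear();  // detach managed entities so memory stays flat
    }
}
$em->flush(); // flush the last, partial batch
$em->clear();
```

Note that clear() detaches everything the EntityManager knows about, so any entities pre-fetched before the loop would have to be re-fetched (or re-merged) after each clear().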

If that doesn't get you enough performance, the only alternative that I can think of is to revert to custom code that runs custom SQL directly against your DBMS.
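
If you do go that route, the DBAL connection behind the EntityManager can run the raw SQL. A sketch of collapsing the per-entity association DELETE/INSERT pairs into one DELETE ... WHERE IN plus one multi-row INSERT (the product_tag join table and its columns are hypothetical, API names as in recent DBAL versions):

```php
<?php
use Doctrine\DBAL\Connection;

$conn = $em->getConnection();

// One DELETE for every touched entity instead of one DELETE per entity.
$conn->executeStatement(
    'DELETE FROM product_tag WHERE product_id IN (?)',
    [$productIds],
    [Connection::PARAM_INT_ARRAY]
);

// One multi-row INSERT for all new association rows.
$placeholders = implode(', ', array_fill(0, count($rows), '(?, ?)'));
$params = [];
foreach ($rows as [$productId, $tagId]) {
    $params[] = $productId;
    $params[] = $tagId;
}
$conn->executeStatement(
    "INSERT INTO product_tag (product_id, tag_id) VALUES $placeholders",
    $params
);
```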

I know this isn't a great answer, but at least I can tell you that you're not crazy.

------ edit ------

From the official Doctrine2 article on batch processing:

Some people seem to be wondering why Doctrine does not use multi-inserts (insert into (...) values (...), (...), (...), ...).

First of all, this syntax is only supported on mysql and newer postgresql versions. Secondly, there is no easy way to get hold of all the generated identifiers in such a multi-insert when using AUTO_INCREMENT or SERIAL and an ORM needs the identifiers for identity management of the objects. Lastly, insert performance is rarely the bottleneck of an ORM. Normal inserts are more than fast enough for most situations and if you really want to do fast bulk inserts, then a multi-insert is not the best way anyway, i.e. Postgres COPY or Mysql LOAD DATA INFILE are several orders of magnitude faster.

These are the reasons why it is not worth the effort to implement an abstraction that performs multi-inserts on mysql and postgresql in an ORM.
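
For reference, the bulk-load path the article points to bypasses the ORM entirely; e.g. on MySQL (the CSV path, table, and columns are placeholders, and LOAD DATA LOCAL must be enabled on the connection):

```php
<?php
// Write the import data to a CSV first, then let MySQL bulk-load it.
$em->getConnection()->executeStatement(
    "LOAD DATA LOCAL INFILE '/tmp/products.csv'
     INTO TABLE product
     FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
     LINES TERMINATED BY '\\n'
     (sku, name)"
);
```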

Also, there is a significant difference in performance between a remote and a local database, as the overhead of sending each query to a remote server is quite large. The overhead is much lower with a local database, thanks to transactions and DB optimizations (e.g. 70 sec lowered to 300 ms in the case of the example in the question).
