大量插入和截断的Oracle性能问题(已附加AWR) [英] Oracle performance issue with massive inserts and truncates (AWR attached)

查看:82
本文介绍了大量插入和截断的Oracle性能问题(已附加AWR)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用Oracle在应用服务器的两个节点之间同步事件(不要介意为什么/如果这是最好的方法/等.这是给定的).

I'm using Oracle to synchronize events between two nodes of my application server (never mind why/if that's the best way/etc. This is a given).

为此,我使用一个事件"表,一个节点(活动")向其中写入新事件,而另一个节点(被动")从中读取新事件.该表如下所示:

To do so, I'm using an "events" table that one node (the "active") writes new events to and the other node (the "passive") reads from. The table looks like:

事件UUID(UUID)||事件ID(长)||事件数据(不同类型的几列)

Event UUID (UUID) || Event ID (long) || Event Data (several columns of different types)

事件ID是一个不断增加的数字(由应用程序控制,而不是序列),表示应用事件数据后内部模型的修订版本.事件UUID具有唯一约束.我在事件ID上有一个索引,以方便进行选择SQL-选择...,其中Event_ID>通过Event_ID进行XXX排序",其中XXX是被动节点的内部修订号.偶尔我会截断表(使用截断重用存储").
[实际上,我以循环方式使用了三个这样的表,因此我总是可以截断要写入到我的被动节点的表,这样我就可以有时间追赶".]

The event ID is a number constantly increasing (application controlled, not a sequence) that signifies the revision the internal model would be at after applying the event data. The Event UUID has unique constraint. I have a single index on the event ID to facilitate the select SQL - "Select ... where Event_ID > XXX order by Event_ID" where the XXX is the internal revision number of the passive node. Once in a while I truncate the table (using "truncate reuse storage").
[Actually, I use three such tables in a round-robin order so I could always truncate the one I'm about to write to the my passive node can have time to "catch up"].

在插入并截断所有内容都可以的地方数小时之后,我开始从数据库中获得大量噪音",并且响应时间急剧下降.这可能持续一两个小时(甚至更多),然后突然停止,响应时间恢复到正常水平. AWR报告指向截断语句,有点指向插入语句.我怀疑DBWR发生了什么-但我无法查明.请注意,即使辅助节点(运行"SELECT"语句的节点)处于关闭状态,也会发生这种性能下降-因此它是纯插入/截断的.注意2:此问题不会在MSSQL上重现.

After several hours of inserting and truncating where everything is fine I start getting a lot of "noise" from the database and response time drops dramatically. This can go on for an hour or two (or even more), then all of the sudden it stops and response time return to its normal level. The AWR reports point toward the truncate statement and a bit toward the insert statements. I suspect something is going on with DBWR - but I can't pinpoint. Note that this performance degradation happens even when the secondary node (the one running the "SELECT" statements) is off - so it's pure insert/truncate. Note2: This issue does NOT reproduce on MSSQL.

问题:为什么会这样?我该怎么做才能阻止它?此设计是否有替代方案(与当前设计的替代方案越接近越好).

The question: why is this happening? What can I do to stop it? Are there alternatives to this design (the closer the alternative to the current design the better).

更新1:我可能会误导标题.这不是单个的大量插入,而是随着事件在应用程序服务器中生成而发生的一系列插入.

Update 1: I might have mislead with the title. This is not a single massive insert but a trickle of inserts as the events are generated in the application server.

更新2:第一个时期(好)和第二个时期(不好)的样本的AWR比较位于

Update 2: AWR compare of a sample from the first period (good) and the second period (bad) is at http://pastehtml.com/view/1eirn20.html

更新3:位于 http://pastehtml.com/view/1eirn20.html 的新AWR差异

Update 3: new AWR diff at http://pastehtml.com/view/1eirn20.html

更新4:已解决.显然它是存储设备(感谢ik_zelf!).只是展示一下-抽象并不是真正的抽象.最后是一块磁化的旋转板.

Update 4: Solved. Apparently it WAS the storage (thanks ik_zelf!). Just goes to show - abstractions aren't really abstract. At the end, it's a magnetized spinning plate.

推荐答案

在awr报告中清楚地表明,在不良时期内,与第一时期相比,io时间翻了一番.检查存储空间使用情况.很有可能存储是在系统之间共享的,例如,另一个系统正在备份并造成不良时期.检查所有已连接系统/备份的日志,尝试将时间连接到测试结果.

In the awr reports is a clear indication that the io time doubles in the bad period compared to the first period. Check the storage usage. It could very well be that the storage is shared between systems and that - for example - an other system is taking a backup and causes a bad period. Check the logs of all connected systems/backups an try to connect the times to your test findings.

这篇关于大量插入和截断的Oracle性能问题(已附加AWR)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆