排除Oracle - 挂起进程 [英] Troubleshoot Oracle - hung process

查看:205
本文介绍了排除Oracle - 挂起进程的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想了解我们正在使用的Java进程挂起的问题。这个过程已经在生产中运行了大约4个月,并且本周早些时候它开始挂起。当我看一个进程的线程转储所有相关线程(3)有如下堆栈:

 TxnParser_1 prio = 6 tid = 0x69bd3400 nid = 0x2534 runnable [0x6aa2f000] 
java.lang.Thread.State:RUNNABLE
在java.net.SocketInputStream.socketRead0(本地方法)
在java。 net.SocketInputStream.read(SocketInputStream.java:129)
at oracle.net.ns.Packet.receive(未知源)
at oracle.net.ns.DataPacket.receive(未知源)
at oracle.net.ns.NetInputStream.getNextPacket(未知源)
at oracle.net.ns.NetInputStream.read(未知源)
at oracle.net.ns.NetInputStream.read(未知)源)
在oracle.net.ns.NetInputStream.read(未知源)
在oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1099)
在oracle.jdbc。 driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1070)
在oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:478)
在oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement。 java:207)
at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:790)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1039)
在oracle.jdbc.driver.T4CStatement.executeMaybeDescribe(T4CStatement.java:830)
在oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1132)
在oracle.jdbc.driver.OracleStatement .executeInternal(OracleStatement.java:1687)
at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1653)
- locked< 0x40e22f88> (a oracle.jdbc.driver.T4CStatement)
- locked< 0x28f8d398> (a oracle.jdbc.driver.T4CConnection)
at com.gcg.data.LogParsingInfo.initFromDB(LogParsingInfo.java:262)
at com.gcg.om.OmQueueEntry.initParseInfoFromDB(OmQueueEntry.java: 104)
at com.gcg.om.GenericQueueEntry.run(GenericQueueEntry.java:237)
at java.util.concurrent.ThreadPoolExecutor $ Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor $ Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

没有线程等待锁,所以进程没有死锁。这些工作的3个线程只是阻塞等待Oracle的响应,至少这是我看起来像这样。



查看Oracle,当我查询v $ session,它看起来像与这些线程相关联的连接当前正在执行一个查询,虽然我看不到sql。

 code> select ... from v $ session where ...; 
SQL_ADDRESS SQL_HASH_VALUE SQL_ID SQL_CHILD_NUMBER SQL_EXEC_START SQL_EXEC_ID PREV_SQL_ADDR PREV_HASH_VALUE PREV_SQL_ID PREV_CHILD_NUMBER PREV_EXEC_START PREV_EXEC_ID
---------------- -------------- ------------- ---------------- -------------- ------- ---- ---------------- --------------- ------------- - --------------- --------------- ------------
00 0 0000000239F59EE8 1483377872 fqr8pndc6p36h 5 26-JUL-12 32080545
00 0 0000000239F59EE8 1483377872 fqr8pndc6p36h 5 26-JUL-12 32080546
0000000148CABD88 1784444892 a16hxxtp5sxyw 0000000239F59EE8 1483377872 fqr8pndc6p36h 5 26-JUL-12 32080544

select * from v $ sql其中sql_id ='a16hxxtp5sxyw';

未选择行

我的问题是:


  1. 在我的分析中,我是否纠正了该进程被阻塞等待Oracle响应?



  2. 更新:

    p>基于关于在DBA_WAITERS和DBA_LOCKS中查找的注释

      select * from dba_waiters; 

    未选择行

    select * from dba_locks where BLOCKING_OTHERS<> '不阻止';

    未选择行

    dba_locks中有98行, 不阻塞我不认为这是一个锁定问题?这个进程已经在这个状态超过3小时,所以任何死锁现在已经被检测到。



    我的理论,Oracle实例不是健康,但我很失望,看看什么。我有一个请求重新启动Oracle服务器,但是还没有完成。



    后续问题:v $ session是否正常在v $ sql中不存在的sql_id,如果是,在什么条件下?

    解决方案

    问题解决了,答案是正确的在v $会话表。显然,Oracle会话可以阻止其他原因,而不仅仅是锁定。请注意列FINAL_BLOCKING_SESSION - 它标识是阻止的根本原因的会话。
    我们调查会话845,发现客户端进程(由MACHINE和PORT标识)不再存在。

      SID SERIAL#状态程式类型SQL_ID PREV_SQL_ID BLOCKING_SESSION_STATUS BLOCKING_INSTANCE BLOCKING_SESSION FINAL_BLOCKING_SESSION_STATUS FINAL_BLOCKING_INSTANCE FINAL_BLOCKING_SESSION EVENT 
    ------- ------- --------- ---------------- ---- --- ---------- -------------- ----------------------- --- -------------- ---------------- -------------------- --------- ----------------------- ------------------ ---- ----------------------------
    108 22447 ACTIVE Gcg log parser 1 USER fqr8pndc6p36h VALID 1 1581 VALID 1 845库缓存:mutex X
    639 40147 ACTIVE gcg log parser 3 USER fqr8pndc6p36h VALID 1 1581 VALID 1 845 library cache:mutex X
    742 34683 ACTIVE Gcg log parser 2 USER a16hxxtp5sxyw fqr8pndc6p36h VALID 1 1581 VALID 1 845库缓存:mutex X


    I'm trying to understand a problem we're having with a Java process that is hanging. This process has been running in production for about 4 months and earlier this week it started hanging. When I look at a thread dump of the process all of the relevant threads (3) have stacks like the following:

        "TxnParser_1" prio=6 tid=0x69bd3400 nid=0x2534 runnable [0x6aa2f000]
       java.lang.Thread.State: RUNNABLE
            at java.net.SocketInputStream.socketRead0(Native Method)
            at java.net.SocketInputStream.read(SocketInputStream.java:129)
            at oracle.net.ns.Packet.receive(Unknown Source)
            at oracle.net.ns.DataPacket.receive(Unknown Source)
            at oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)
            at oracle.net.ns.NetInputStream.read(Unknown Source)
            at oracle.net.ns.NetInputStream.read(Unknown Source)
            at oracle.net.ns.NetInputStream.read(Unknown Source)
            at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1099)
            at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1070)
            at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:478)
            at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:207)
            at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:790)
            at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1039)
            at oracle.jdbc.driver.T4CStatement.executeMaybeDescribe(T4CStatement.java:830)
            at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1132)
            at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1687)
            at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1653)
            - locked <0x40e22f88> (a oracle.jdbc.driver.T4CStatement)
            - locked <0x28f8d398> (a oracle.jdbc.driver.T4CConnection)
            at com.gcg.data.LogParsingInfo.initFromDB(LogParsingInfo.java:262)
            at com.gcg.om.OmQueueEntry.initParseInfoFromDB(OmQueueEntry.java:104)
            at com.gcg.om.GenericQueueEntry.run(GenericQueueEntry.java:237)
            at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
            at java.lang.Thread.run(Thread.java:619)
    

    There are no threads waiting for locks so the process isn't deadlocked. These 3 threads that are doing the work are just blocked waiting for a response from Oracle, at least that is what it looks like to me.

    Looking at Oracle, when I query v$session, it looks like one of the connections associated with these threads is currently executing a query, although I can't see the sql.

    select ... from v$session where ...;
    SQL_ADDRESS      SQL_HASH_VALUE SQL_ID        SQL_CHILD_NUMBER SQL_EXEC_START SQL_EXEC_ID PREV_SQL_ADDR    PREV_HASH_VALUE PREV_SQL_ID   PREV_CHILD_NUMBER PREV_EXEC_START PREV_EXEC_ID
    ---------------- -------------- ------------- ---------------- -------------- ----------- ---------------- --------------- ------------- ----------------- --------------- ------------
                  00              0                                                           0000000239F59EE8      1483377872 fqr8pndc6p36h                 5 26-JUL-12           32080545
                  00              0                                                           0000000239F59EE8      1483377872 fqr8pndc6p36h                 5 26-JUL-12           32080546
    0000000148CABD88     1784444892 a16hxxtp5sxyw                                             0000000239F59EE8      1483377872 fqr8pndc6p36h                 5 26-JUL-12           32080544
    
    select * from v$sql where sql_id = 'a16hxxtp5sxyw';
    
    no rows selected
    

    My questions are:

    1. Am I correct in my analysis that the process is simply blocked waiting for a response from Oracle?
    2. What should I be looking for in Oracle to understand why this process is blocking?

    Updated:

    Based on the comment regarding looking in DBA_WAITERS and DBA_LOCKS

    select * from dba_waiters;
    
    no rows selected
    
    select * from dba_locks where BLOCKING_OTHERS <> 'Not Blocking';
    
    no rows selected 
    

    There were 98 rows in dba_locks but since all are 'Not Blocking' I don't think it is a locking issue? The process in question has been in this state for over 3 hours so any deadlock would have been detected by now.

    I'm of the theory that the Oracle instance is not "healthy", but I'm at a loss as to what to look at. I have a request in to reboot the Oracle server, but that hasn't been done yet.

    Follow-up question: Is it normal for the v$session to contain a sql_id that doesn't exist in v$sql and if so, under what conditions?

    解决方案

    The problem was solved and the answer was right in the v$session table. Apparently Oracle sessions can block for reasons other than just locking. Notice the column FINAL_BLOCKING_SESSION - it identifies the session that is the root cause of the blocking. We investigated session 845 and found that the client process (identified by MACHINE and PORT) no longer existed. The DBA killed session 845 and all returned to normal.

    SID     SERIAL# STATUS    PROGRAM          TYPE SQL_ID        PREV_SQL_ID    BLOCKING_SESSION_STATUS BLOCKING_INSTANCE BLOCKING_SESSION FINAL_BLOCKING_SESSION_STATUS FINAL_BLOCKING_INSTANCE FINAL_BLOCKING_SESSION EVENT
    ------- ------- --------- ---------------- ---- ------------- -------------- ----------------------- ----------------- ---------------- ----------------------------- ----------------------- ---------------------- ----------------------------
     108    22447   ACTIVE    Gcg log parser 1 USER               fqr8pndc6p36h  VALID                   1                 1581             VALID                         1                       845                    library cache: mutex X
     639    40147   ACTIVE    Gcg log parser 3 USER               fqr8pndc6p36h  VALID                   1                 1581             VALID                         1                       845                    library cache: mutex X
     742    34683   ACTIVE    Gcg log parser 2 USER a16hxxtp5sxyw fqr8pndc6p36h  VALID                   1                 1581             VALID                         1                       845                    library cache: mutex X
    

    这篇关于排除Oracle - 挂起进程的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆