Problems restarting after database crashed (signal 11).


Question

Yesterday, while attempting to access a database, I received errors
saying that the database was inaccessible. After investigating a
little, I found the following in the PostgreSQL log files:

2004-06-30 08:30:19 [24119] LOG: checkpoint process (PID 28423) was
terminated by signal 11
2004-06-30 08:30:19 [24119] LOG: terminating any other active server
processes
2004-06-30 08:30:19 [28383] WARNING: terminating connection because of
crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
2004-06-30 08:30:19 [28362] WARNING: terminating connection because of
crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.

The last bit then repeated a few more times, and then:

2004-06-30 08:30:20 [24119] LOG: all server processes terminated;
reinitializing
2004-06-30 08:30:20 [28424] LOG: database system was interrupted at 2004-06-30
08:22:23 CDT
2004-06-30 08:30:20 [28424] LOG: checkpoint record is at 8/77703F9C
2004-06-30 08:30:20 [28424] LOG: redo record is at 8/775B1D38; undo
record is at 0/0; shutdown FALSE
2004-06-30 08:30:20 [28424] LOG: next transaction ID: 1638554; next
OID: 1058492
2004-06-30 08:30:20 [28424] LOG: database system was not properly shut
down; automatic recovery in progress
2004-06-30 08:30:20 [28424] LOG: redo starts at 8/775B1D38
2004-06-30 08:30:21 [28430] LOG: connection received: host=[local] port=
2004-06-30 08:30:21 [28430] FATAL: the database system is starting up
2004-06-30 08:30:38 [28424] LOG: record with zero length at 8/78855F38
2004-06-30 08:30:38 [28424] LOG: redo done at 8/78853EE0
2004-06-30 08:31:40 [28449] LOG: connection received: host=[local] port=
2004-06-30 08:31:40 [28449] FATAL: the database system is starting up
2004-06-30 08:31:48 [28452] LOG: connection received: host=[local] port=
2004-06-30 08:31:48 [28452] FATAL: the database system is starting up
2004-06-30 08:31:53 [28459] LOG: connection received: host=[local] port=
2004-06-30 08:31:53 [28459] FATAL: the database system is starting up

And this then continues on and on. Even 20 minutes later, attempts to
connect to the database were met with the same FATAL error.
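
One way to check whether crash recovery ever finished is to look for a "database system is ready" entry after the last "redo done" entry. The following is a rough sketch, not part of the original post. It assumes the log is a plain text file, and it assumes the server logs "database system is ready" once startup completes (the usual wording for servers of this era). The function name is made up for illustration.

```shell
#!/bin/sh
# Report whether a PostgreSQL log shows startup finishing after the
# most recent "redo done" entry.
# Prints one of: no-redo-done, ready, stuck-after-redo
recovery_status() {
    log_file=$1
    # Line number of the last "redo done" entry (empty if there is none).
    redo_line=$(grep -n 'redo done at' "$log_file" | tail -n 1 | cut -d: -f1)
    if [ -z "$redo_line" ]; then
        echo "no-redo-done"
        return 0
    fi
    # Search only the lines that follow that entry for a "ready" message.
    if tail -n +"$((redo_line + 1))" "$log_file" | grep -q 'database system is ready'; then
        echo "ready"
    else
        echo "stuck-after-redo"
    fi
}
```

For example, `recovery_status /var/log/postgresql/postgres.log` (a hypothetical path) would be expected to print `stuck-after-redo` for a log like the one above, where "redo done" is followed only by "starting up" errors.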

Eventually I attempted to shut it down and restart it, however that
failed too. When I attempted to shut it down, I discovered a hung
'startup subprocess' that can't be killed.

nexus:~# ps aux | grep postgres
postgres 28424 0.0 1.5 16804 3044 pts/313 D 08:35 0:06 postgres:
startup subprocess
nexus:~# kill -9 28424
nexus:~# ps aux | grep postgres
postgres 28424 0.0 1.5 16804 3044 pts/313 D 08:35 0:06 postgres:
startup subprocess
nexus:~#

As soon as I can get physical access to the machine, I'm planning to
reboot it, as I can't think of anything else to do to kill a process
that can't be kill -KILL'ed.

I'm worried that attempting to start the database after rebooting will
fail in the same way, however. Has anyone seen anything like this
before, or have any ideas on how to proceed?

I'm running on an Intel Pentium Pro box, with Debian/GNU Linux, running
'unstable'. I'm using PostgreSQL 7.4.3.

Thank you for your help.

--
| Christopher
+------------------------------------------------+
| Here I stand. I can do no other. |
+------------------------------------------------+
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Recommended answer

Christopher Cashell <to**********@zyp.org> writes:
> Eventually I attempted to shut it down and restart it, however that
> failed too. When I attempted to shut it down, I discovered a hung
> 'startup subprocess' that can't be killed.

This is interesting because it seems just about exactly like this
recent Red Hat bug report:
https://bugzilla.redhat.com/bugzilla....cgi?id=126885

As I commented there, I think that it must be a kernel or hardware
issue --- Postgres itself can surely not make an unkillable process.
However it's common to see processes that don't respond to kill if
they are stuck inside a kernel I/O request. That could mean either
unresponsive hardware or a kernel bug.
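
To see which processes are stuck this way, and roughly where in the kernel they are waiting, one can filter `ps` output on the state column. This is a minimal sketch, not something from the thread. It assumes the procps `ps` found on Linux, and `filter_d_state` is a helper name invented here.

```shell
#!/bin/sh
# Keep the header row plus any row whose STAT column (2nd field) starts
# with "D", i.e. processes in uninterruptible sleep.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Live usage on Linux (procps): pid, state, kernel wait channel, command.
# ps -eo pid,stat,wchan:32,comm | filter_d_state
```

The WCHAN column names the kernel function the process is sleeping in, which may hint at whether a disk, a filesystem, or something else is holding it.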

I wonder whether you have any similarities in hardware or Linux kernel
to the person who filed the above report?

regards, tom lane

At Wed, 30 Jun 04, Unidentified Flying Banana Tom Lane, said:
> Christopher Cashell <to**********@zyp.org> writes:
> > Eventually I attempted to shut it down and restart it, however that
> > failed too. When I attempted to shut it down, I discovered a hung
> > 'startup subprocess' that can't be killed.
>
> This is interesting because it seems just about exactly like this
> recent Red Hat bug report:
> https://bugzilla.redhat.com/bugzilla....cgi?id=126885

Hrm. Yes, it does appear to be a very similar, if not identical, issue.

> As I commented there, I think that it must be a kernel or hardware
> issue --- Postgres itself can surely not make an unkillable process.
> However it's common to see processes that don't respond to kill if
> they are stuck inside a kernel I/O request. That could mean either
> unresponsive hardware or a kernel bug.

That is somewhat along the lines of what I was thinking, although I have
had no problems like this before. The machine has been running for over
100 days, and the database as well, without issue.

28424 postgres 18 0 16804 3044 15m D 0.0 1.6 0:06.72 postmaster

Note that it does have a process status of 'D', or uninterruptible
sleep. That would explain the unkillable part, though I'm curious how
it ended up there. Unless it just happened to be in a really bad spot
when Postgres segfaulted... although I wouldn't expect that to
affect the 'startup subprocess'.

> I wonder whether you have any similarities in hardware or Linux kernel
> to the person who filed the above report?

Here's all the information I can provide for this machine:

IBM IntelliStation Z Pro
Model: 6899-12U
Dual Pentium Pro 200
192MB RAM
4.5 GB IBM SCSI HDD
9 GB IBM SCSI HDD
6.4 GB WD HDD

The database resides on the 4.5 GB SCSI, with the pg_xlog directory
symlinked from there, and actually existing on the 9GB SCSI.
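
With pg_xlog on a different disk from the rest of the data directory, it can help to confirm which physical path (and so which device) the write-ahead log really lives on. A small sketch under stated assumptions: the data directory path is not given in the post, so it is passed in as an argument, and `xlog_location` is a name made up for this example.

```shell
#!/bin/sh
# Print the physical directory behind <data_dir>/pg_xlog, following the
# symlink. $1 is the PostgreSQL data directory (location is an assumption).
xlog_location() {
    data_dir=$1
    # cd -P resolves symlinks; pwd -P prints the resulting physical path.
    # The subshell keeps the caller's working directory unchanged.
    (cd -P "$data_dir/pg_xlog" 2>/dev/null && pwd -P)
}
```

Passing the result to `df -P`, for instance `df -P "$(xlog_location /path/to/data)"` with a placeholder path, shows the filesystem and device that hold the log.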

nexus:~# uname -a
Linux nexus.zyp.org 2.6.4 #1 SMP Thu Mar 11 14:04:49 CST 2004 i686 GNU/Linux
nexus:~#

