System is not terminated in scala application in docker on GKE
Question
I have a Scala application that uses Akka Streams and runs as a cronjob in Google Kubernetes Engine. But the pod stays in the "Running" state (it never becomes "Completed"), and the Java process is still running inside the container.
Here's what I do:
I build the docker image with sbt-native-packager and sbt docker:publish.
When the job is done, I terminate it with a regular system.terminate call.
implicit val system: ActorSystem = ActorSystem("actor-system")

/* doing actual stuff */
stream.runWith(
  Sink. // whatever's here
).onComplete { _ ⇒
  println("finished!!!")
  system.terminate()
}
I see finished!!! in the logs, so system.terminate must be called.
If I bash into the pod and run ps aux, I still see the Java process running.
demiourgos728@crawler-manual-f9tdf-mjcvn:/opt/docker$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
demiour+ 1 0.1 17.1 2418900 296532 ? Ssl Oct13 4:22 /usr/local/openjdk-8/bin/java -cp /opt/docker/lib/crawler.crawler-2.0.0.jar
demiour+ 212 0.0 0.2 5752 3656 pts/0 Ss 15:25 0:00 /bin/bash
demiour+ 218 0.0 0.1 9392 3064 pts/0 R+ 15:25 0:00 ps aux
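Since the JVM only exits once every non-daemon thread has finished, a quick way to see what keeps the process alive is to list the live non-daemon threads. This is a diagnostic sketch using only the standard library (it assumes Scala 2.13+ for scala.jdk.CollectionConverters; on 2.12 use scala.collection.JavaConverters instead), runnable e.g. as a Scala script:

```scala
import scala.jdk.CollectionConverters._

// The JVM exits only when all non-daemon threads have finished.
// Listing the live non-daemon threads shows what is keeping the
// process alive after system.terminate() has completed.
def liveNonDaemonThreads(): List[Thread] =
  Thread.getAllStackTraces.keySet.asScala
    .filter(t => t.isAlive && !t.isDaemon)
    .toList

liveNonDaemonThreads().foreach(t => println(s"${t.getName} (${t.getState})"))
```

Calling this right after system.terminate() completes would show which threads (and therefore, usually, which library) prevent the JVM from exiting.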
This works when I run it locally as a standalone application, and it also terminates when I run it locally with docker.
How do I make sure that the application is terminated, so that the pod's status becomes "Completed"?
Here's how the cronjob is defined:
# cronjob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: crawler
spec:
  schedule: "49 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: crawler-cronjob
              image: eu.gcr.io/myawesomeproject/crawler:latest
          restartPolicy: OnFailure
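As a safety net (not a fix for the underlying termination problem), the Kubernetes Job API can forcibly end a Job's pods after a deadline via activeDeadlineSeconds. A sketch of how the jobTemplate above could be extended — the 600-second value is an assumption to tune to the actual crawl duration, and note that a Job killed this way is marked Failed, not Completed:

```yaml
# Sketch: same jobTemplate as above, with a hard deadline added.
jobTemplate:
  spec:
    activeDeadlineSeconds: 600   # assumed value; adjust to your crawl duration
    template:
      spec:
        containers:
          - name: crawler-cronjob
            image: eu.gcr.io/myawesomeproject/crawler:latest
        restartPolicy: OnFailure
```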
Update
I updated my code as suggested, adding these lines:
stream.runWith(
  Sink. // whatever's here
).onComplete { _ ⇒
  println("finished!!!")
  system.terminate()
  Await.ready(system.whenTerminated, 300.seconds)
  println("and terminated!!!")
}
And this is what I see in the logs:
2020-10-27 09:52:19.375 CEST finished!!!
2020-10-27 09:52:19.489 CEST [ERROR] [10/27/2020 07:52:19.476] [actor-system-akka.actor.default-dispatcher-8] [akka://actor-system/system/pool-master] connection pool for Pool(shared->https://some-api-url.com:443) has shut down unexpectedly
2020-10-27 09:52:19.489 CEST java.lang.IllegalStateException: Pool shutdown unexpectedly
at akka.http.impl.engine.client.PoolInterface$Logic.postStop(PoolInterface.scala:214)
at akka.stream.impl.fusing.GraphInterpreter.finalizeStage(GraphInterpreter.scala:599)
at akka.stream.impl.fusing.GraphInterpreter.finish(GraphInterpreter.scala:324)
...
2020-10-27 09:52:19.508 CEST and terminated!!!
But the next time I run it (with updated messaging), this happens:
system.registerOnTermination({
  println("\n\n really terminated!!! \n\n")
})
This is the jstack output:
demiourgos728@crawler-manual-7nkq9-lklv8:/opt/docker$ jstack 1
2020-10-27 11:38:15
Full thread dump OpenJDK 64-Bit Server VM (25.265-b01 mixed mode):
"Attach Listener" #158 daemon prio=9 os_prio=0 tid=0x00007f1150089800 nid=0xe3 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"async-channel-group-0-timeout-thread" #45 daemon prio=5 os_prio=0 tid=0x00007f115c008000 nid=0x59 waiting on condition [0x00007f1143f3f000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000eee416a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"DestroyJavaVM" #42 prio=5 os_prio=0 tid=0x00007f117400b800 nid=0x2c waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"async-channel-group-0-handler-executor" #41 prio=5 os_prio=0 tid=0x00007f11581e1800 nid=0x56 waiting on condition [0x00007f1144040000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000eee39c58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"cluster-5f93dae7deb88a3b7a33ab63" #40 daemon prio=5 os_prio=0 tid=0x00007f11748d0000 nid=0x55 waiting on condition [0x00007f1144341000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e599e360> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
at com.mongodb.internal.connection.BaseCluster$WaitQueueHandler.run(BaseCluster.java:491)
at java.lang.Thread.run(Thread.java:748)
"cluster-rtt-ClusterId{value='5f93dae7deb88a3b7a33ab63', description='null'}-yatta-shard-00-02.z3l5v.gcp.mongodb.net:27017" #39 daemon prio=5 os_prio=0 tid=0x00007f11748bd800 nid=0x54 waiting on condition [0x00007f1144442000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.mongodb.internal.connection.DefaultServerMonitor.waitForNext(DefaultServerMonitor.java:435)
at com.mongodb.internal.connection.DefaultServerMonitor.access$1300(DefaultServerMonitor.java:57)
at com.mongodb.internal.connection.DefaultServerMonitor$RoundTripTimeRunnable.run(DefaultServerMonitor.java:409)
at java.lang.Thread.run(Thread.java:748)
"cluster-ClusterId{value='5f93dae7deb88a3b7a33ab63', description='null'}-yatta-shard-00-02.z3l5v.gcp.mongodb.net:27017" #38 daemon prio=5 os_prio=0 tid=0x00007f1174670800 nid=0x53 waiting on condition [0x00007f1144543000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000eee507c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForSignalOrTimeout(DefaultServerMonitor.java:294)
at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForNext(DefaultServerMonitor.java:275)
at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:170)
at java.lang.Thread.run(Thread.java:748)
"cluster-rtt-ClusterId{value='5f93dae7deb88a3b7a33ab63', description='null'}-yatta-shard-00-01.z3l5v.gcp.mongodb.net:27017" #37 daemon prio=5 os_prio=0 tid=0x00007f117465d000 nid=0x52 waiting on condition [0x00007f1144644000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.mongodb.internal.connection.DefaultServerMonitor.waitForNext(DefaultServerMonitor.java:435)
at com.mongodb.internal.connection.DefaultServerMonitor.access$1300(DefaultServerMonitor.java:57)
at com.mongodb.internal.connection.DefaultServerMonitor$RoundTripTimeRunnable.run(DefaultServerMonitor.java:409)
at java.lang.Thread.run(Thread.java:748)
"cluster-ClusterId{value='5f93dae7deb88a3b7a33ab63', description='null'}-yatta-shard-00-01.z3l5v.gcp.mongodb.net:27017" #36 daemon prio=5 os_prio=0 tid=0x00007f117465b800 nid=0x51 waiting on condition [0x00007f1144745000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000eee50b40> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForSignalOrTimeout(DefaultServerMonitor.java:294)
at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForNext(DefaultServerMonitor.java:275)
at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:170)
at java.lang.Thread.run(Thread.java:748)
"cluster-rtt-ClusterId{value='5f93dae7deb88a3b7a33ab63', description='null'}-yatta-shard-00-00.z3l5v.gcp.mongodb.net:27017" #35 daemon prio=5 os_prio=0 tid=0x00007f1174656000 nid=0x50 waiting on condition [0x00007f1144846000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.mongodb.internal.connection.DefaultServerMonitor.waitForNext(DefaultServerMonitor.java:435)
at com.mongodb.internal.connection.DefaultServerMonitor.access$1300(DefaultServerMonitor.java:57)
at com.mongodb.internal.connection.DefaultServerMonitor$RoundTripTimeRunnable.run(DefaultServerMonitor.java:409)
at java.lang.Thread.run(Thread.java:748)
"cluster-ClusterId{value='5f93dae7deb88a3b7a33ab63', description='null'}-yatta-shard-00-00.z3l5v.gcp.mongodb.net:27017" #34 daemon prio=5 os_prio=0 tid=0x00007f11743f6800 nid=0x4f waiting on condition [0x00007f1144947000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000eee50ec0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForSignalOrTimeout(DefaultServerMonitor.java:294)
at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForNext(DefaultServerMonitor.java:275)
at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:170)
at java.lang.Thread.run(Thread.java:748)
"Thread-1" #32 daemon prio=5 os_prio=0 tid=0x00007f1175354800 nid=0x4d runnable [0x00007f1144b49000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000000eee0c2f8> (a sun.nio.ch.Util$3)
- locked <0x00000000eee0c308> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000eee0c2b0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
at com.mongodb.connection.TlsChannelStreamFactoryFactory$SelectorMonitor.lambda$start$0(TlsChannelStreamFactoryFactory.java:136)
at com.mongodb.connection.TlsChannelStreamFactoryFactory$SelectorMonitor$$Lambda$698/375074687.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
"async-channel-group-0-selector" #31 prio=5 os_prio=0 tid=0x00007f1175350000 nid=0x4c runnable [0x00007f1144c4a000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000000eee0c548> (a sun.nio.ch.Util$3)
- locked <0x00000000eee0c558> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000eee0c500> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
at com.mongodb.internal.connection.tlschannel.async.AsynchronousTlsChannelGroup.loop(AsynchronousTlsChannelGroup.java:392)
at com.mongodb.internal.connection.tlschannel.async.AsynchronousTlsChannelGroup$$Lambda$696/548795052.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
"AsyncAppender-Worker-ASYNC" #14 daemon prio=5 os_prio=0 tid=0x00007f11580ad000 nid=0x3b waiting on condition [0x00007f11475fa000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000ee865d38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
at ch.qos.logback.core.AsyncAppenderBase$Worker.run(AsyncAppenderBase.java:289)
"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00007f11740b6000 nid=0x33 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f11740b3000 nid=0x32 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f11740b1000 nid=0x31 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f117409e800 nid=0x30 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f117407d800 nid=0x2f in Object.wait() [0x00007f1178498000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
- locked <0x00000000ee59ad48> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f1174079000 nid=0x2e in Object.wait() [0x00007f1178599000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
- locked <0x00000000ee59af00> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"VM Thread" os_prio=0 tid=0x00007f117406f800 nid=0x2d runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00007f11740b9000 nid=0x34 waiting on condition
JNI global references: 3679
Accepted Answer
I don't think your problem is related to GKE in particular, or to Kubernetes in general: your CronJob definition looks fine and, at first glance, everything should work.
I assume your Dockerfile is correct too, and defines an entry point and/or command that just starts the JVM process.
Your problem is probably that the ActorSystem is never actually terminated for some reason: maybe some actor could not be stopped, or something goes wrong while the actor system is shutting down.
To test this assumption and help debug the problem, you can register a simple registerOnTermination callback and await system termination, something like:
implicit val system: ActorSystem = ActorSystem("actor-system")

system.registerOnTermination {
  println("the actor system is terminated")
}

/* doing actual stuff */
stream.runWith(
  Sink. // whatever's here
).onComplete { _ ⇒
  println("finished!!!")
  system.terminate()
  // Use Await.ready or Await.result, with whatever timeout you consider
  // appropriate, and analyze the result (timeout, error, ...).
  // See, for instance, https://stackoverflow.com/questions/41170280/difference-await-ready-and-await-result
  // Note: whenTerminated is a val (a Future[Terminated]), not a method.
  Await.ready(system.whenTerminated, 100.seconds)
}
Pay special attention if you are using third-party resources like a database or some other storage system for your crawl results. In fact, reviewing your update, and as @TomerShetah has also pointed out, that is very likely the case: it looks like you may have some kind of connection leak in your code related to Mongo. Please review your code carefully. Also, it would be great if you could further process the result of the Await call, in order to rule out a timeout as the cause of the leak.
I think it would also help if you could update your question with more information about your setup. For instance, where is Mongo? Are you deploying it in k8s as well?