System is not terminated in Scala application in Docker on GKE


Problem description

I have a Scala application that uses Akka Streams and runs as a CronJob in Google Kubernetes Engine. But the pod stays in the "Running" state (it never reaches "Completed"), and the Java process keeps running inside the container.

Here's what I do:

I build the Docker image with sbt-native-packager and sbt docker:publish.

When the job is done, I shut the actor system down with a regular system.terminate call.

import akka.actor.ActorSystem

implicit val system: ActorSystem = ActorSystem("actor-system")
import system.dispatcher // ExecutionContext for onComplete

/* doing actual stuff */

stream.runWith(
    Sink. // whatever's here
  ).onComplete { _ ⇒
    println("finished!!!")
    system.terminate()
  }

I see finished!!! in the logs, so system.terminate must have been called.

If I bash into the pod and run ps aux, I can still see the Java process running.

demiourgos728@crawler-manual-f9tdf-mjcvn:/opt/docker$ ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
demiour+       1  0.1 17.1 2418900 296532 ?      Ssl  Oct13   4:22 /usr/local/openjdk-8/bin/java -cp /opt/docker/lib/crawler.crawler-2.0.0.jar
demiour+     212  0.0  0.2   5752  3656 pts/0    Ss   15:25   0:00 /bin/bash
demiour+     218  0.0  0.1   9392  3064 pts/0    R+   15:25   0:00 ps aux

This works when I run it locally as a standalone application, and it also terminates when I run it locally with Docker.

How do I make sure that the application is terminated, so that the pod's status becomes "Completed"?

Here's how the cronjob is defined:

# cronjob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: crawler
spec:
  schedule: "49 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: crawler-cronjob
              image: eu.gcr.io/myawesomeproject/crawler:latest
          restartPolicy: OnFailure

Update

I updated my code as suggested, adding these lines:

import scala.concurrent.Await
import scala.concurrent.duration._

stream.runWith(
    Sink. // whatever's here
  ).onComplete { _ ⇒
    println("finished!!!")
    system.terminate()
    Await.ready(system.whenTerminated, 300.second)
    println("and terminated!!!")
  }

And this is what I see in the logs:

2020-10-27 09:52:19.375 CEST finished!!!
2020-10-27 09:52:19.489 CEST [ERROR] [10/27/2020 07:52:19.476] [actor-system-akka.actor.default-dispatcher-8] [akka://actor-system/system/pool-master] connection pool for Pool(shared->https://some-api-url.com:443) has shut down unexpectedly
2020-10-27 09:52:19.489 CEST java.lang.IllegalStateException: Pool shutdown unexpectedly
                                at akka.http.impl.engine.client.PoolInterface$Logic.postStop(PoolInterface.scala:214)
                                at akka.stream.impl.fusing.GraphInterpreter.finalizeStage(GraphInterpreter.scala:599)
                                at akka.stream.impl.fusing.GraphInterpreter.finish(GraphInterpreter.scala:324)
                                ...
2020-10-27 09:52:19.508 CEST and terminated!!!

But the next time I run it (with this updated messaging added), the process hangs again:

  system.registerOnTermination({
    println("\n\n really terminated!!! \n\n")
  })

This is the jstack output:

demiourgos728@crawler-manual-7nkq9-lklv8:/opt/docker$ jstack 1
2020-10-27 11:38:15
Full thread dump OpenJDK 64-Bit Server VM (25.265-b01 mixed mode):

"Attach Listener" #158 daemon prio=9 os_prio=0 tid=0x00007f1150089800 nid=0xe3 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"async-channel-group-0-timeout-thread" #45 daemon prio=5 os_prio=0 tid=0x00007f115c008000 nid=0x59 waiting on condition [0x00007f1143f3f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000eee416a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

"DestroyJavaVM" #42 prio=5 os_prio=0 tid=0x00007f117400b800 nid=0x2c waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"async-channel-group-0-handler-executor" #41 prio=5 os_prio=0 tid=0x00007f11581e1800 nid=0x56 waiting on condition [0x00007f1144040000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000eee39c58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

"cluster-5f93dae7deb88a3b7a33ab63" #40 daemon prio=5 os_prio=0 tid=0x00007f11748d0000 nid=0x55 waiting on condition [0x00007f1144341000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000e599e360> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
    at com.mongodb.internal.connection.BaseCluster$WaitQueueHandler.run(BaseCluster.java:491)
    at java.lang.Thread.run(Thread.java:748)

"cluster-rtt-ClusterId{value='5f93dae7deb88a3b7a33ab63', description='null'}-yatta-shard-00-02.z3l5v.gcp.mongodb.net:27017" #39 daemon prio=5 os_prio=0 tid=0x00007f11748bd800 nid=0x54 waiting on condition [0x00007f1144442000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at com.mongodb.internal.connection.DefaultServerMonitor.waitForNext(DefaultServerMonitor.java:435)
    at com.mongodb.internal.connection.DefaultServerMonitor.access$1300(DefaultServerMonitor.java:57)
    at com.mongodb.internal.connection.DefaultServerMonitor$RoundTripTimeRunnable.run(DefaultServerMonitor.java:409)
    at java.lang.Thread.run(Thread.java:748)

"cluster-ClusterId{value='5f93dae7deb88a3b7a33ab63', description='null'}-yatta-shard-00-02.z3l5v.gcp.mongodb.net:27017" #38 daemon prio=5 os_prio=0 tid=0x00007f1174670800 nid=0x53 waiting on condition [0x00007f1144543000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000eee507c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForSignalOrTimeout(DefaultServerMonitor.java:294)
    at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForNext(DefaultServerMonitor.java:275)
    at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:170)
    at java.lang.Thread.run(Thread.java:748)

"cluster-rtt-ClusterId{value='5f93dae7deb88a3b7a33ab63', description='null'}-yatta-shard-00-01.z3l5v.gcp.mongodb.net:27017" #37 daemon prio=5 os_prio=0 tid=0x00007f117465d000 nid=0x52 waiting on condition [0x00007f1144644000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at com.mongodb.internal.connection.DefaultServerMonitor.waitForNext(DefaultServerMonitor.java:435)
    at com.mongodb.internal.connection.DefaultServerMonitor.access$1300(DefaultServerMonitor.java:57)
    at com.mongodb.internal.connection.DefaultServerMonitor$RoundTripTimeRunnable.run(DefaultServerMonitor.java:409)
    at java.lang.Thread.run(Thread.java:748)

"cluster-ClusterId{value='5f93dae7deb88a3b7a33ab63', description='null'}-yatta-shard-00-01.z3l5v.gcp.mongodb.net:27017" #36 daemon prio=5 os_prio=0 tid=0x00007f117465b800 nid=0x51 waiting on condition [0x00007f1144745000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000eee50b40> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForSignalOrTimeout(DefaultServerMonitor.java:294)
    at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForNext(DefaultServerMonitor.java:275)
    at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:170)
    at java.lang.Thread.run(Thread.java:748)

"cluster-rtt-ClusterId{value='5f93dae7deb88a3b7a33ab63', description='null'}-yatta-shard-00-00.z3l5v.gcp.mongodb.net:27017" #35 daemon prio=5 os_prio=0 tid=0x00007f1174656000 nid=0x50 waiting on condition [0x00007f1144846000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at com.mongodb.internal.connection.DefaultServerMonitor.waitForNext(DefaultServerMonitor.java:435)
    at com.mongodb.internal.connection.DefaultServerMonitor.access$1300(DefaultServerMonitor.java:57)
    at com.mongodb.internal.connection.DefaultServerMonitor$RoundTripTimeRunnable.run(DefaultServerMonitor.java:409)
    at java.lang.Thread.run(Thread.java:748)

"cluster-ClusterId{value='5f93dae7deb88a3b7a33ab63', description='null'}-yatta-shard-00-00.z3l5v.gcp.mongodb.net:27017" #34 daemon prio=5 os_prio=0 tid=0x00007f11743f6800 nid=0x4f waiting on condition [0x00007f1144947000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000eee50ec0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForSignalOrTimeout(DefaultServerMonitor.java:294)
    at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForNext(DefaultServerMonitor.java:275)
    at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:170)
    at java.lang.Thread.run(Thread.java:748)

"Thread-1" #32 daemon prio=5 os_prio=0 tid=0x00007f1175354800 nid=0x4d runnable [0x00007f1144b49000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000eee0c2f8> (a sun.nio.ch.Util$3)
    - locked <0x00000000eee0c308> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000eee0c2b0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
    at com.mongodb.connection.TlsChannelStreamFactoryFactory$SelectorMonitor.lambda$start$0(TlsChannelStreamFactoryFactory.java:136)
    at com.mongodb.connection.TlsChannelStreamFactoryFactory$SelectorMonitor$$Lambda$698/375074687.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

"async-channel-group-0-selector" #31 prio=5 os_prio=0 tid=0x00007f1175350000 nid=0x4c runnable [0x00007f1144c4a000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000eee0c548> (a sun.nio.ch.Util$3)
    - locked <0x00000000eee0c558> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000eee0c500> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
    at com.mongodb.internal.connection.tlschannel.async.AsynchronousTlsChannelGroup.loop(AsynchronousTlsChannelGroup.java:392)
    at com.mongodb.internal.connection.tlschannel.async.AsynchronousTlsChannelGroup$$Lambda$696/548795052.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

"AsyncAppender-Worker-ASYNC" #14 daemon prio=5 os_prio=0 tid=0x00007f11580ad000 nid=0x3b waiting on condition [0x00007f11475fa000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000ee865d38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
    at ch.qos.logback.core.AsyncAppenderBase$Worker.run(AsyncAppenderBase.java:289)

"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00007f11740b6000 nid=0x33 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f11740b3000 nid=0x32 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f11740b1000 nid=0x31 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f117409e800 nid=0x30 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f117407d800 nid=0x2f in Object.wait() [0x00007f1178498000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
    - locked <0x00000000ee59ad48> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f1174079000 nid=0x2e in Object.wait() [0x00007f1178599000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    - locked <0x00000000ee59af00> (a java.lang.ref.Reference$Lock)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"VM Thread" os_prio=0 tid=0x00007f117406f800 nid=0x2d runnable 

"VM Periodic Task Thread" os_prio=0 tid=0x00007f11740b9000 nid=0x34 waiting on condition 

JNI global references: 3679

Recommended answer

I don't think your problem is related to GKE in particular, or to Kubernetes in general: your CronJob definition looks fine and, at first glance, everything should work.

I assume your Dockerfile is correct too, and defines an entry point and/or command that just starts the JVM process.

Probably your problem is that the ActorSystem is never actually terminated: maybe some actor could not be stopped, or something goes wrong while the actor system is shutting down.

To test this assumption and help debug the problem, you can register a simple registerOnTermination callback and await system termination, something like:

import akka.actor.ActorSystem
import scala.concurrent.Await
import scala.concurrent.duration._

implicit val system: ActorSystem = ActorSystem("actor-system")

system.registerOnTermination {
  println("the actor system is terminated")
}

/* doing actual stuff */

stream.runWith(
    Sink. // whatever's here
  ).onComplete { _ ⇒
    println("finished!!!")
    system.terminate()
    // Use ready or result, with the amount of time you consider
    // appropriate, and analyze the result (timeout, error, ...).
    // See, for instance, https://stackoverflow.com/questions/41170280/difference-await-ready-and-await-result
    Await.ready(system.whenTerminated, 100.seconds)
  }
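
If "the actor system is terminated" does get printed but the JVM still does not exit, the culprit is a non-daemon thread outside Akka (your jstack dump shows several of those). As a hedged, last-resort sketch, not a fix, you could force the process down once Akka itself has terminated, so the Job pod can at least reach "Completed" while you hunt the leak:

// Assumes the same system/Await imports as above. sys.exit masks the
// leak rather than fixing it, so treat it as a temporary safety net.
system.terminate()
Await.ready(system.whenTerminated, 100.seconds)
// Akka is fully stopped here; if the JVM would otherwise keep running,
// some other library's non-daemon thread is still alive.
sys.exit(0)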

Pay special attention if you are using third-party resources, like a database or some other storage system, for the crawl results. In fact, reviewing your update, and as @TomerShetah has also pointed out, that is very likely the case: it looks like you have some kind of connection leak in your code related to Mongo. Please review your code carefully. Also, it would be great if you could further process the result of the Await call, in order to rule out a timeout as the cause of the leak.
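
For instance, if the crawler talks to Mongo through the mongo-scala-driver, the client owns the cluster-* and async-channel-group-* threads visible in your jstack dump, and they only go away when the client is closed explicitly. A minimal sketch, assuming a hypothetical MongoClient value named mongoClient (adapt it to however your code actually builds the connection):

import org.mongodb.scala.MongoClient

// Hypothetical client; your code presumably creates one somewhere.
val mongoClient: MongoClient = MongoClient("mongodb://...")

stream.runWith(
    Sink. // whatever's here
  ).onComplete { _ ⇒
    println("finished!!!")
    mongoClient.close() // releases the driver's non-daemon threads
    system.terminate()
  }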

I also think it would help if you could update your question with more information about your setup. For instance, where is Mongo? Are you deploying it in k8s as well?
