Spark job (Java) cannot write data to Elasticsearch cluster

Problem description

I am using Docker 17.04.0-ce and Compose 1.12.0 on Windows to deploy an Elasticsearch cluster (version 5.4.0) in Docker. So far I have done the following:

1) I have created a single Elasticsearch node via Compose with the following configuration:

  elasticsearch1:
    build: elasticsearch/
    container_name: es_1
    cap_add:
      - IPC_LOCK
    environment:
      - cluster.name=cp-es-cluster
      - node.name=cloud1
      - node.master=true
      - http.cors.enabled=true
      - http.cors.allow-origin="*"
      - bootstrap.memory_lock=true
      - discovery.zen.minimum_master_nodes=1
      - xpack.security.enabled=false
      - xpack.monitoring.enabled=false
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      docker_elk:
        aliases:
          - elasticsearch

This results in the node being deployed, but it is not accessible from Spark. I write data as

JavaEsSparkSQL.saveToEs(aggregators.toDF(), collectionName + "/record");

and I get the following error, although the node is running:

I/O exception (java.net.ConnectException) caught when processing request: Connection timed out: connect
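
For context, such a job is typically wired up along these lines. This is only a minimal sketch under my own assumptions (the SparkSession setup, the Docker host address, the input data, and the index name are placeholders, not the asker's code); per the question, even with es.nodes and es.port pointing at the Docker host, the write still timed out until the fix in the answer below.

import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL;

public class EsWriteJob {
    public static void main(String[] args) {
        // Connector settings; the IP is a placeholder for the Docker host.
        SparkConf conf = new SparkConf()
                .setAppName("es-write")
                .set("es.nodes", "192.168.99.100")
                .set("es.port", "9200");

        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();

        // Stand-in for the asker's aggregators DataFrame.
        Dataset<Row> aggregators = spark.read().json("aggregators.json");

        String collectionName = "aggregators"; // hypothetical index name
        JavaEsSparkSQL.saveToEs(aggregators, collectionName + "/record");
    }
}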

2) I found out that this problem is solved if I add the following line to the node configuration:

- network.publish_host=${ENV_IP}
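
For ${ENV_IP} to be expanded, Compose must see the variable at deploy time. One way to supply it (my assumption; the question does not show this part) is a .env file next to docker-compose.yml, which Compose reads automatically:

# .env — the address is a placeholder for the host IP reachable from Spark
ENV_IP=192.168.99.100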

3) Then I created similar configurations for two additional nodes:

  elasticsearch1:
    build: elasticsearch/
    container_name: es_1
    cap_add:
      - IPC_LOCK
    environment:
      - cluster.name=cp-es-cluster
      - node.name=cloud1
      - node.master=true
      - http.cors.enabled=true
      - http.cors.allow-origin="*"
      - bootstrap.memory_lock=true
      - discovery.zen.minimum_master_nodes=1
      - xpack.security.enabled=false
      - xpack.monitoring.enabled=false
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
      - network.publish_host=${ENV_IP}
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      docker_elk:
        aliases:
          - elasticsearch

  elasticsearch2:
    build: elasticsearch/
    container_name: es_2
    cap_add:
      - IPC_LOCK
    environment:
      - cluster.name=cp-es-cluster
      - node.name=cloud2
      - http.cors.enabled=true
      - http.cors.allow-origin="*"
      - bootstrap.memory_lock=true
      - discovery.zen.minimum_master_nodes=2
      - xpack.security.enabled=false
      - xpack.monitoring.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.zen.ping.unicast.hosts=elasticsearch1"
      - node.master=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - esdata2:/usr/share/elasticsearch/data
    ports:
      - 9201:9200
      - 9301:9300
    networks:
      - docker_elk

  elasticsearch3:
    build: elasticsearch/
    container_name: es_3
    cap_add:
      - IPC_LOCK
    environment:
      - cluster.name=cp-es-cluster
      - node.name=cloud3
      - http.cors.enabled=true
      - http.cors.allow-origin="*"
      - bootstrap.memory_lock=true
      - discovery.zen.minimum_master_nodes=2
      - xpack.security.enabled=false
      - xpack.monitoring.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.zen.ping.unicast.hosts=elasticsearch1"
      - node.master=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - esdata3:/usr/share/elasticsearch/data
    ports:
      - 9202:9200
      - 9302:9300
    networks:
      - docker_elk

This results in a cluster of 3 nodes being created successfully. However, the same error reappears in Spark and data cannot be written to the cluster. I get the same behavior even if I add network.publish_host to all nodes.

Regarding Spark, I use elasticsearch-spark-20_2.11 version 5.4.0 (the same as the ES version). Any ideas on how to solve this issue?

Answer

I managed to solve this. Apart from setting es.nodes and es.port in Spark, the problem goes away if I set es.nodes.wan.only to true.
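
In connector terms, the accepted fix amounts to something like this minimal sketch (the host IP, app name, input data, and index name are placeholders, not the asker's actual job):

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;
import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL;

public class EsWriteFixed {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("es-write")
                .set("es.nodes", "192.168.99.100") // Docker host IP (placeholder)
                .set("es.port", "9200")
                // The fix: talk only to the declared nodes and skip node
                // discovery, which would otherwise return the containers'
                // internal publish addresses that Spark cannot reach.
                .set("es.nodes.wan.only", "true");

        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        JavaEsSparkSQL.saveToEs(spark.read().json("aggregators.json"),
                "aggregators/record"); // index/type are placeholders
    }
}

With es.nodes.wan.only set to true, the connector routes every request through the hosts listed in es.nodes and skips node discovery; discovery otherwise hands back the containers' internal addresses, which a Spark job outside the Docker network cannot reach.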
