Spark job (Java) cannot write data to Elasticsearch cluster

Problem description
I am using Docker 17.04.0-ce and Compose 1.12.0 on Windows to deploy an Elasticsearch cluster (version 5.4.0) over Docker. So far I have done the following:
1) I have created a single Elasticsearch node via Compose with the following configuration:
elasticsearch1:
  build: elasticsearch/
  container_name: es_1
  cap_add:
    - IPC_LOCK
  environment:
    - cluster.name=cp-es-cluster
    - node.name=cloud1
    - node.master=true
    - http.cors.enabled=true
    - http.cors.allow-origin="*"
    - bootstrap.memory_lock=true
    - discovery.zen.minimum_master_nodes=1
    - xpack.security.enabled=false
    - xpack.monitoring.enabled=false
    - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
  ulimits:
    memlock:
      soft: -1
      hard: -1
    nofile:
      soft: 65536
      hard: 65536
  volumes:
    - esdata1:/usr/share/elasticsearch/data
  ports:
    - 9200:9200
    - 9300:9300
  networks:
    docker_elk:
      aliases:
        - elasticsearch
This results in the node being deployed, but it is not accessible from Spark. I write data as
JavaEsSparkSQL.saveToEs(aggregators.toDF(), collectionName +"/record");
and I get the following error, although the node is running:
I/O exception (java.net.ConnectException) caught when processing request: Connection timed out: connect
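One quick way to narrow down a ConnectException like this is to probe the published HTTP port from the machine running the Spark job; the host name below is an assumption for a local setup, and the output depends on your environment:

```shell
# Probe the HTTP port (9200) that the compose file publishes.
# A JSON cluster-health response means the port is reachable;
# a timeout points at Docker networking, not at Spark.
curl -s http://localhost:9200/_cluster/health?pretty

# Note: 9300 is the binary transport port; the ES-Hadoop connector
# used by Spark talks HTTP on 9200, so that is the port to verify.
```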
2) I found out that this problem is solved if I add the following line to the node configuration:
- network.publish_host=${ENV_IP}
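For context, ${ENV_IP} is substituted by Compose itself, from the shell environment or from an .env file next to the compose file; the address below is an illustrative assumption, not a value from the original setup:

```
# .env (same directory as docker-compose.yml)
ENV_IP=192.168.1.10
```

With this, the node advertises the host's address instead of its container-internal IP, which is what makes it reachable from outside Docker.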
3) Then I created similar configurations for 2 additional nodes:
elasticsearch1:
  build: elasticsearch/
  container_name: es_1
  cap_add:
    - IPC_LOCK
  environment:
    - cluster.name=cp-es-cluster
    - node.name=cloud1
    - node.master=true
    - http.cors.enabled=true
    - http.cors.allow-origin="*"
    - bootstrap.memory_lock=true
    - discovery.zen.minimum_master_nodes=1
    - xpack.security.enabled=false
    - xpack.monitoring.enabled=false
    - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    - network.publish_host=${ENV_IP}
  ulimits:
    memlock:
      soft: -1
      hard: -1
    nofile:
      soft: 65536
      hard: 65536
  volumes:
    - esdata1:/usr/share/elasticsearch/data
  ports:
    - 9200:9200
    - 9300:9300
  networks:
    docker_elk:
      aliases:
        - elasticsearch

elasticsearch2:
  build: elasticsearch/
  container_name: es_2
  cap_add:
    - IPC_LOCK
  environment:
    - cluster.name=cp-es-cluster
    - node.name=cloud2
    - http.cors.enabled=true
    - http.cors.allow-origin="*"
    - bootstrap.memory_lock=true
    - discovery.zen.minimum_master_nodes=2
    - xpack.security.enabled=false
    - xpack.monitoring.enabled=false
    - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    - "discovery.zen.ping.unicast.hosts=elasticsearch1"
    - node.master=false
  ulimits:
    memlock:
      soft: -1
      hard: -1
    nofile:
      soft: 65536
      hard: 65536
  volumes:
    - esdata2:/usr/share/elasticsearch/data
  ports:
    - 9201:9200
    - 9301:9300
  networks:
    - docker_elk

elasticsearch3:
  build: elasticsearch/
  container_name: es_3
  cap_add:
    - IPC_LOCK
  environment:
    - cluster.name=cp-es-cluster
    - node.name=cloud3
    - http.cors.enabled=true
    - http.cors.allow-origin="*"
    - bootstrap.memory_lock=true
    - discovery.zen.minimum_master_nodes=2
    - xpack.security.enabled=false
    - xpack.monitoring.enabled=false
    - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    - "discovery.zen.ping.unicast.hosts=elasticsearch1"
    - node.master=false
  ulimits:
    memlock:
      soft: -1
      hard: -1
    nofile:
      soft: 65536
      hard: 65536
  volumes:
    - esdata3:/usr/share/elasticsearch/data
  ports:
    - 9202:9200
    - 9302:9300
  networks:
    - docker_elk
This results in a cluster of 3 nodes being created successfully. However, the same error reappears in Spark and data cannot be written to the cluster. I get the same behavior even if I add network.publish_host to all nodes.
Regarding Spark, I use elasticsearch-spark-20_2.11 version 5.4.0 (the same as the ES version). Any ideas how to solve this issue?
Answer

I managed to solve this. Apart from setting es.nodes and es.port in Spark, the problem goes away if I set es.nodes.wan.only to true.
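A minimal sketch of that configuration in Java, assuming the job is set up through a SparkConf; the app name and host address are placeholders, not values from the original job:

```java
import org.apache.spark.SparkConf;

public class EsSparkConfig {
    public static SparkConf build() {
        SparkConf conf = new SparkConf()
                .setAppName("es-writer")
                // Published HTTP address of any cluster node (placeholder).
                .set("es.nodes", "192.168.1.10")
                .set("es.port", "9200")
                // WAN-only mode: the connector talks only to the nodes
                // listed above and skips node discovery, so it never tries
                // the container-internal addresses that are unreachable
                // from outside Docker.
                .set("es.nodes.wan.only", "true");
        return conf;
    }
}
```

This is essentially a configuration fragment: es.nodes.wan.only trades the connector's usual node-discovery parallelism for routing every request through the declared front-end addresses, which is what Dockerized clusters with published ports need.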