How to move Elasticsearch data from one server to another

Problem Description

How do I move Elasticsearch data from one server to another?

I have server A running Elasticsearch 1.1.1 on one local node with multiple indices. I would like to copy that data to server B running Elasticsearch 1.3.4.

Procedure so far

  1. Shut down ES on both servers and
  2. scp all the data to the correct data dir on the new server. (data seems to be located at /var/lib/elasticsearch/ on my debian boxes)
  3. change permissions and ownership to elasticsearch:elasticsearch
  4. start up the new ES server

When I look at the cluster with the ES head plugin, no indices appear.

It seems that the data is not loaded. Am I missing something?
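For reference, the data-directory copy described above would look roughly like the sketch below; the user and hostname are placeholders, and /var/lib/elasticsearch is the Debian default path noted above. Note that in Elasticsearch 1.x the index data sits under a subdirectory named after the cluster, so a different cluster.name on server B is one common reason copied indices would not appear.

# on server A (source): stop Elasticsearch, then push the data directory to server B
sudo service elasticsearch stop
scp -r /var/lib/elasticsearch user@server-b:/var/lib/

# on server B (destination): stop Elasticsearch, fix ownership, then start it again
sudo service elasticsearch stop
sudo chown -R elasticsearch:elasticsearch /var/lib/elasticsearch
sudo service elasticsearch start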

Solution

The selected answer makes it sound slightly more complex than it is; the following is what you need (install npm first on your system).

npm install -g elasticdump
elasticdump --input=http://mysrc.com:9200/my_index --output=http://mydest.com:9200/my_index --type=mapping
elasticdump --input=http://mysrc.com:9200/my_index --output=http://mydest.com:9200/my_index --type=data

You can skip the first elasticdump command for subsequent copies if the mappings remain constant.

I have just done a migration from AWS to Qbox.io with the above without any problems.

More details over at:

https://www.npmjs.com/package/elasticdump

Help page (as of Feb 2016) included for completeness:

elasticdump: Import and export tools for elasticsearch

Usage: elasticdump --input SOURCE --output DESTINATION [OPTIONS]

--input
                    Source location (required)
--input-index
                    Source index and type
                    (default: all, example: index/type)
--output
                    Destination location (required)
--output-index
                    Destination index and type
                    (default: all, example: index/type)
--limit
                    How many objects to move in bulk per operation
                    limit is approximate for file streams
                    (default: 100)
--debug
                    Display the elasticsearch commands being used
                    (default: false)
--type
                    What are we exporting?
                    (default: data, options: [data, mapping])
--delete
                    Delete documents one-by-one from the input as they are
                    moved.  Will not delete the source index
                    (default: false)
--searchBody
                    Perform a partial extract based on search results
                    (when ES is the input,
                    (default: '{"query": { "match_all": {} } }'))
--sourceOnly
                    Output only the json contained within the document _source
                    Normal: {"_index":"","_type":"","_id":"", "_source":{SOURCE}}
                    sourceOnly: {SOURCE}
                    (default: false)
--all
                    Load/store documents from ALL indexes
                    (default: false)
--bulk
                    Leverage elasticsearch Bulk API when writing documents
                    (default: false)
--ignore-errors
                    Will continue the read/write loop on write error
                    (default: false)
--scrollTime
                    Time the nodes will hold the requested search in order.
                    (default: 10m)
--maxSockets
                    How many simultaneous HTTP requests can we make?
                    (default:
                      5 [node <= v0.10.x] /
                      Infinity [node >= v0.11.x] )
--bulk-mode
                    The mode can be index, delete or update.
                    'index': Add or replace documents on the destination index.
                    'delete': Delete documents on destination index.
                    'update': Use 'doc_as_upsert' option with bulk update API to do partial update.
                    (default: index)
--bulk-use-output-index-name
                    Force use of destination index name (the actual output URL)
                    as destination while bulk writing to ES. Allows
                    leveraging Bulk API copying data inside the same
                    elasticsearch instance.
                    (default: false)
--timeout
                    Integer containing the number of milliseconds to wait for
                    a request to respond before aborting the request. Passed
                    directly to the request library. If used in bulk writing,
                    it will result in the entire batch not being written.
                    Mostly used when you don't care too much if you lose some
                    data when importing but rather have speed.
--skip
                    Integer containing the number of rows you wish to skip
                    ahead from the input transport.  When importing a large
                    index, things can go wrong, be it connectivity, crashes,
                    someone forgetting to `screen`, etc.  This allows you
                    to start the dump again from the last known line written
                    (as logged by the `offset` in the output).  Please be
                    advised that since no sorting is specified when the
                    dump is initially created, there's no real way to
                    guarantee that the skipped rows have already been
                    written/parsed.  This is more of an option for when
                    you want to get most data as possible in the index
                    without concern for losing some rows in the process,
                    similar to the `timeout` option.
--inputTransport
                    Provide a custom js file to us as the input transport
--outputTransport
                    Provide a custom js file to us as the output transport
--toLog
                    When using a custom outputTransport, should log lines
                    be appended to the output stream?
                    (default: true, except for `$`)
--help
                    This page

Examples:

# Copy an index from production to staging with mappings:
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=mapping
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=data

# Backup index data to a file:
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index_mapping.json \
  --type=mapping
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index.json \
  --type=data

# Backup an index to a gzip using stdout:
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=$ \
  | gzip > /data/my_index.json.gz

# Backup ALL indices, then use Bulk API to populate another ES cluster:
elasticdump \
  --all=true \
  --input=http://production-a.es.com:9200/ \
  --output=/data/production.json
elasticdump \
  --bulk=true \
  --input=/data/production.json \
  --output=http://production-b.es.com:9200/

# Backup the results of a query to a file
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=query.json \
  --searchBody '{"query":{"term":{"username": "admin"}}}'

------------------------------------------------------------------------------
Learn more @ https://github.com/taskrabbit/elasticsearch-dump
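As one more illustration of the options listed above (a sketch only, with placeholder localhost URLs and index names; verify against the elasticdump version you have installed), an index can be copied to a differently named index on the same cluster by dumping it to a file and bulk-loading it back with the output index name forced:

# dump the source index to a file
elasticdump \
  --input=http://localhost:9200/old_index \
  --output=/data/old_index.json \
  --type=data
# bulk-load it into a new index on the same cluster, forcing the index name
# from the output URL rather than the source index recorded in the dump
elasticdump \
  --bulk=true \
  --bulk-use-output-index-name=true \
  --input=/data/old_index.json \
  --output=http://localhost:9200/new_index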
