How to index a GeoJSON file in Elasticsearch?
Question
I am trying to store spatial data (GeoJSON files, CSV files, and shapefiles) in Elasticsearch using Python. I am new to Elasticsearch, and even after following the documentation I have not been able to index the data successfully. Any help would be appreciated.
Sample GeoJSON file:
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "ID_0": 105,
        "ISO": "IND",
        "NAME_0": "India",
        "ID_1": 1288,
        "NAME_1": "Telangana",
        "ID_2": 15715,
        "NAME_2": "Telangana",
        "VARNAME_2": null,
        "NL_NAME_2": null,
        "HASC_2": "IN.TS.AD",
        "CC_2": null,
        "TYPE_2": "State",
        "ENGTYPE_2": "State",
        "VALIDFR_2": "Unknown",
        "VALIDTO_2": "Present",
        "REMARKS_2": null,
        "Shape_Leng": 8.103535,
        "Shape_Area": 127258717496
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [79.14429367552918, 19.500257885106404],
            [79.14582245808431, 19.498859172536427],
            [79.14600496956801, 19.498823981691853],
            [79.14966523737327, 19.495821705263914]
          ]
        ]
      }
    }
  ]
}
Recommended answer
Code
import geojson
from datetime import datetime
from elasticsearch import Elasticsearch, helpers


def geojson_to_es(gj):
    """Yield each feature after normalizing its date fields."""
    for feature in gj['features']:
        # Build a date from the day and month of event_date plus the year field.
        date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
        feature["properties"]["timestamp"] = int(date.timestamp())
        feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
        yield feature


# Load the GeoJSON file.
with open("GeoObs.json") as f:
    gj = geojson.load(f)

# Connect to a local node (8.x clients also require a 'scheme' key in this dict).
es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])

# One bulk action per feature, built lazily by a generator expression.
k = ({
    "_index": "YOUR_INDEX",
    "_source": feature,
} for feature in geojson_to_es(gj))

helpers.bulk(es, k)
Explanation
with open("GeoObs.json") as f:
    gj = geojson.load(f)
es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])
This portion of the code loads an external geojson file, then connects to Elasticsearch.
k = ({
    "_index": "YOUR_INDEX",
    "_source": feature,
} for feature in geojson_to_es(gj))
helpers.bulk(es, k)
The parentheses here create a generator expression, which we feed to helpers.bulk(es, k). Remember that _source is, in Elasticsearch terms, the original data as is, i.e. our raw JSON. _index is just the index in which we want to put our data. You'll see other examples that also specify _doc here. That is part of mapping types, which were deprecated in Elasticsearch 7.x and removed in 8.0.
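To see what helpers.bulk actually receives, the generator expression can be exercised without a running cluster. The sketch below uses a made-up two-feature collection, a hypothetical helper named build_actions, and the placeholder index name YOUR_INDEX; all three are assumptions for illustration only.

```python
import types

# Hypothetical miniature FeatureCollection, standing in for the loaded file.
gj = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"NAME_1": "Telangana"},
         "geometry": {"type": "Point", "coordinates": [79.14, 19.50]}},
        {"type": "Feature", "properties": {"NAME_1": "Example"},
         "geometry": {"type": "Point", "coordinates": [78.47, 17.38]}},
    ],
}


def build_actions(features, index_name):
    """Lazily wrap each feature in the action dict passed to helpers.bulk."""
    return ({"_index": index_name, "_source": feature} for feature in features)


actions = build_actions(gj["features"], "YOUR_INDEX")
print(isinstance(actions, types.GeneratorType))  # True: no action built yet
first = next(actions)                            # actions are produced on demand
print(first["_index"], first["_source"]["properties"]["NAME_1"])
```

Because the actions are produced one at a time, the whole file never has to be duplicated in memory as a list of action dicts.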
def geojson_to_es(gj):
    for feature in gj['features']:
        date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
        feature["properties"]["timestamp"] = int(date.timestamp())
        feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
        yield feature
The function geojson_to_es is a generator that produces the events (features) one at a time. Instead of returning and finishing, a generator function pauses at the yield keyword and resumes from that point on the next call. In this case, we are generating our GeoJSON features. In my code you also see:
date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
feature["properties"]["timestamp"] = int(date.timestamp())
feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
This is just an example of manipulating the data in the JSON before sending it out to Elasticsearch.
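As a worked example of that transformation, the sketch below applies the same steps to a single properties dict. The input values are hypothetical (the answer does not show the real data); the format "%d-%b-%Y" suggests an event_date such as "12-Jan-18" and a year such as "2018". The helper name normalize_event_date is made up for illustration, and it wraps year in str() in case the field is stored as a number.

```python
from datetime import datetime


def normalize_event_date(properties):
    """Return a copy of properties with an ISO event_date and a Unix timestamp."""
    day_month = "-".join(properties["event_date"].split("-")[0:2])  # e.g. "12-Jan"
    date = datetime.strptime(day_month + "-" + str(properties["year"]), "%d-%b-%Y")
    updated = dict(properties)
    # The datetime is naive, so timestamp() interprets it in the local time zone.
    updated["timestamp"] = int(date.timestamp())
    updated["event_date"] = date.strftime("%Y-%m-%d")
    return updated


# Hypothetical input values, assumed from the format string in the answer.
sample = {"event_date": "12-Jan-18", "year": "2018"}
print(normalize_event_date(sample)["event_date"])  # 2018-01-12 (English/C locale)
```

Note that %b parses month abbreviations according to the current locale, so the abbreviations in the data need to match it.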
The key point is that your mapping file must tag a field as geo_point or geo_shape. These data types are how Elasticsearch recognizes geo data. Example from my mapping file:
...
{
  "properties": {
    "geometry": {
      "properties": {
        "coordinates": {
          "type": "geo_point"
        },
        "type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    },
...
That is to say, before uploading your GeoJSON data with Python, you need to create your index and then apply a mapping file that includes either geo_shape or geo_point, using something like:
curl -X PUT "localhost:9200/YOUR_INDEX?pretty"
curl -X PUT "localhost:9200/YOUR_INDEX/_mapping?pretty" -H 'Content-Type: application/json' -d @mapping.json
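The mapping excerpt above types geometry.coordinates as geo_point, which suits point features. For polygon features like the sample in the question, the geometry object would more likely be mapped as geo_shape, and Elasticsearch expects every polygon ring to be closed (first and last position identical). The sketch below is one possible way to prepare both. The helper name close_rings is made up for illustration, and the mapping body assumes Elasticsearch 7 or later, where mappings carry no type name.

```python
import json

# Body for PUT /YOUR_INDEX/_mapping: the whole GeoJSON geometry object is
# indexed as a shape instead of mapping its coordinates as points.
mapping = {
    "properties": {
        "geometry": {"type": "geo_shape"},
    }
}


def close_rings(geometry):
    """Return a copy of a Polygon geometry whose rings end where they start."""
    if geometry.get("type") != "Polygon":
        return geometry  # other geometry types are passed through untouched
    closed = []
    for ring in geometry["coordinates"]:
        ring = [list(position) for position in ring]
        if ring and ring[0] != ring[-1]:
            ring.append(list(ring[0]))
        closed.append(ring)
    return {"type": "Polygon", "coordinates": closed}


# The (truncated) ring from the sample file in the question.
geometry = {
    "type": "Polygon",
    "coordinates": [[
        [79.14429367552918, 19.500257885106404],
        [79.14582245808431, 19.498859172536427],
        [79.14600496956801, 19.498823981691853],
        [79.14966523737327, 19.495821705263914],
    ]],
}

fixed = close_rings(geometry)
print(len(fixed["coordinates"][0]))   # 5: the first position was appended
print(json.dumps(mapping, indent=2))  # could be saved as mapping.json
```

Whether closing a ring automatically is appropriate depends on the data; for a file that was cut short, as the sample here was, re-exporting the full geometry is the safer route.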