How to index a GeoJSON file in Elasticsearch?


Problem description

I am trying to store spatial data, in the form of GeoJSON, CSV, and shapefiles, in Elasticsearch using Python. I am new to Elasticsearch, and even after following the documentation I have not been able to index the data successfully. Any help would be appreciated.

Sample GeoJSON file:

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "ID_0": 105,
        "ISO": "IND",
        "NAME_0": "India",
        "ID_1": 1288,
        "NAME_1": "Telangana",
        "ID_2": 15715,
        "NAME_2": "Telangana",
        "VARNAME_2": null,
        "NL_NAME_2": null,
        "HASC_2": "IN.TS.AD",
        "CC_2": null,
        "TYPE_2": "State",
        "ENGTYPE_2": "State",
        "VALIDFR_2": "Unknown",
        "VALIDTO_2": "Present",
        "REMARKS_2": null,
        "Shape_Leng": 8.103535,
        "Shape_Area": 127258717496
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [
              79.14429367552918,
              19.500257885106404
            ],
            [
              79.14582245808431,
              19.498859172536427
            ],
            [
              79.14600496956801,
              19.498823981691853
            ],
            [
              79.14966523737327,
              19.495821705263914
            ]
          ]
        ]
      }
    }
  ]
}


Recommended answer

Code

import geojson
from datetime import datetime
from elasticsearch import Elasticsearch, helpers


def geojson_to_es(gj):
    # Yield each feature after normalizing its date fields.
    for feature in gj['features']:
        # Combine the day and month from event_date with the separate year field.
        date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
        feature["properties"]["timestamp"] = int(date.timestamp())
        feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
        yield feature


with open("GeoObs.json") as f:
    gj = geojson.load(f)

    es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])

    # One bulk action per feature, built lazily by the generator expression.
    k = ({
        "_index": "YOUR_INDEX",
        "_source": feature,
    } for feature in geojson_to_es(gj))

    helpers.bulk(es, k)



Explanation

with open("GeoObs.json") as f:
    gj = geojson.load(f)

    es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])

This portion of the code loads an external GeoJSON file, then connects to Elasticsearch.
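As a rough illustration of the loading step, the sketch below uses the standard-library json module in place of the geojson package and checks that each polygon ring is closed, since Elasticsearch's geo_shape type expects the first and last point of a ring to be identical. The helper names (load_features, close_ring) and the inline sample are made up for this example and are not part of the original answer.

```python
import json

# Hypothetical inline sample standing in for the contents of GeoObs.json.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"NAME_1": "Telangana"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[79.144, 19.500], [79.145, 19.498], [79.149, 19.495]]]
      }
    }
  ]
}
"""


def close_ring(ring):
    """Return the ring with its first point repeated at the end if needed."""
    if ring and ring[0] != ring[-1]:
        return ring + [ring[0]]
    return ring


def load_features(text):
    """Parse GeoJSON text and return its features with closed polygon rings."""
    collection = json.loads(text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    features = collection["features"]
    for feature in features:
        geometry = feature["geometry"]
        if geometry["type"] == "Polygon":
            geometry["coordinates"] = [close_ring(r) for r in geometry["coordinates"]]
    return features


features = load_features(SAMPLE)
```

With a real file, the text would come from open("GeoObs.json").read() rather than from the inline string.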

    k = ({
        "_index": "YOUR_INDEX",
        "_source": feature,
    } for feature in geojson_to_es(gj))

    helpers.bulk(es, k)

The parentheses here create a generator expression, which we feed to helpers.bulk(es, k). In Elasticsearch terms, _source is the original document as supplied, i.e. our raw JSON. _index is simply the index we want to put the data into. You'll see other examples here that include _doc. That is part of mapping types, which are deprecated in Elasticsearch 7.x and removed in 8.0.
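To make the shape of those bulk actions concrete, here is a minimal sketch that builds them without touching Elasticsearch. The function name make_actions and the two placeholder features are assumptions for illustration; only the _index and _source keys come from the answer.

```python
def make_actions(features, index_name):
    """Yield one bulk action per feature, lazily."""
    for feature in features:
        yield {"_index": index_name, "_source": feature}


# Hypothetical features; real ones would come from the GeoJSON file.
features = [
    {"type": "Feature", "properties": {"name": "a"}, "geometry": None},
    {"type": "Feature", "properties": {"name": "b"}, "geometry": None},
]

actions = make_actions(features, "YOUR_INDEX")
first = next(actions)  # nothing is built until an action is requested
```

Because helpers.bulk accepts any iterable of such dictionaries, the generator can be passed to it directly and the full list of actions never has to sit in memory.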

def geojson_to_es(gj):

    for feature in gj['features']:

        date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
        feature["properties"]["timestamp"] = int(date.timestamp())
        feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
        yield feature

The function geojson_to_es is a generator that produces the features one at a time. Instead of returning and finishing, a generator function pauses at the yield keyword and resumes from there on the next call. In this case, we are generating our GeoJSON features. In my code you also see:

date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
feature["properties"]["timestamp"] = int(date.timestamp())
feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')

This is just an example of manipulating the data in the JSON before sending it to Elasticsearch.
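To see what that date handling does on a single record, the sketch below applies the same steps to one made-up properties dictionary. The input format ("12-Jan-19" plus a separate year) is an assumption inferred from the format string, the str() call around the year is an addition so that numeric years also work, and the name normalize_date is hypothetical. Note that %b depends on the locale providing English month abbreviations.

```python
from datetime import datetime


def normalize_date(properties):
    """Return a copy of properties with an ISO event_date and a Unix timestamp."""
    day_month = "-".join(properties["event_date"].split("-")[0:2])  # e.g. "12-Jan"
    date = datetime.strptime(day_month + "-" + str(properties["year"]), "%d-%b-%Y")
    updated = dict(properties)
    updated["timestamp"] = int(date.timestamp())  # naive datetime, read as local time
    updated["event_date"] = date.strftime("%Y-%m-%d")
    return updated


# Hypothetical values; the field formats are assumed, not taken from real data.
result = normalize_date({"event_date": "12-Jan-19", "year": 2019})
```

Here result["event_date"] becomes "2019-01-12", a format that Elasticsearch's date detection can recognize. The timestamp value depends on the machine's time zone.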

The key is that your mapping file must have a field typed as geo_point or geo_shape. These data types are how Elasticsearch recognizes geo data. Example from my mapping file:

...
{
  "properties": {
    "geometry": {
      "properties": {
        "coordinates": {
          "type": "geo_point"
        },
        "type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    },
...
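The excerpt above maps geometry.coordinates as geo_point, which fits features whose geometry is a Point. For polygon data such as the sample in the question, mapping the whole geometry object as geo_shape is probably the better fit. The following is a minimal sketch of such a mapping, built as a Python dictionary. It is an assumed alternative, not the answerer's actual mapping file.

```python
import json

# Assumed mapping for features whose geometry is a Polygon (or any other
# non-point shape): the whole GeoJSON geometry object is one geo_shape field.
mapping = {
    "properties": {
        "geometry": {"type": "geo_shape"},
    }
}

mapping_json = json.dumps(mapping, indent=2)
```

Saved to a file, this JSON could serve as the mapping for an index that holds polygon features.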

That is to say, before uploading your GeoJSON data with Python, you need to create your index and then apply a mapping that includes either geo_shape or geo_point, using something like:

curl -X PUT "localhost:9200/YOUR_INDEX?pretty"
curl -X PUT "localhost:9200/YOUR_INDEX/_mapping?pretty" -H "Content-Type: application/json" -d @mapping.json
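Since the question asks for Python, the same two requests can be sketched with the standard library. The code below only builds the requests and does not send them. The function name build_setup_requests, the index name your_index, and the geo_shape mapping are assumptions for illustration. Elasticsearch index names must be lowercase, so the YOUR_INDEX placeholder needs a lowercase replacement in practice.

```python
import json
import urllib.request


def build_setup_requests(base_url, index_name, mapping):
    """Build the two PUT requests: create the index, then apply the mapping."""
    create = urllib.request.Request(f"{base_url}/{index_name}", method="PUT")
    put_mapping = urllib.request.Request(
        f"{base_url}/{index_name}/_mapping",
        data=json.dumps(mapping).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    return create, put_mapping


# Hypothetical index name and mapping; nothing is sent to the server here.
create, put_mapping = build_setup_requests(
    "http://localhost:9200",
    "your_index",
    {"properties": {"geometry": {"type": "geo_shape"}}},
)
# To send them: urllib.request.urlopen(create), then urllib.request.urlopen(put_mapping)
```

The order matters: the mapping has to be in place before the bulk upload, otherwise Elasticsearch infers field types on its own from the first documents it receives and will not treat the coordinates as geo data.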
