How to Load Spatial Data Using the Hadoop GIS Framework


Problem description



I am trying to use the Hadoop GIS Framework to add spatial support to Hive. One of the things I want to do is create a spatial table from external data (from PostGIS). Unfortunately, the serializer provided by ESRI maps to an ESRI JSON format rather than to standards such as WKT or GeoJSON. What I ended up doing was a bit of a workaround.

The first step was to export my PostGIS data as a tab-separated file, converting the geometry field to GeoJSON.

\COPY (select id, ST_AsGeoJSON(geom) from grid_10) TO '/tmp/grid_10.geojson';
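Before uploading, it can be worth checking that every line of the export has the two-column shape the Hive table expects. This is a minimal Python sketch, assuming the file is in psql's default text format (one row per line, columns separated by a tab) and that the GeoJSON contains no tabs or backslash escapes; `parse_export_line` is a hypothetical helper name, not part of any library.

```python
import json


def parse_export_line(line: str) -> tuple[int, dict]:
    """Split one line of the \\COPY output into (id, GeoJSON geometry).

    Assumes psql's default text format: a tab between columns, a newline
    after each row, and no escaped characters inside the GeoJSON.
    """
    # Unpacking raises ValueError if the line does not have exactly two columns.
    id_field, geojson_field = line.rstrip("\n").split("\t")
    # json.JSONDecodeError is a subclass of ValueError.
    geometry = json.loads(geojson_field)
    if "type" not in geometry or "coordinates" not in geometry:
        raise ValueError(f"not a GeoJSON geometry: {geojson_field!r}")
    return int(id_field), geometry


sample = '1\t{"type":"Polygon","coordinates":[[[0,0],[1,0],[1,1],[0,1],[0,0]]]}\n'
row_id, geom = parse_export_line(sample)
print(row_id, geom["type"])  # prints "1 Polygon"
```

Running this over every line of the file before the upload catches malformed rows early, instead of surfacing later as NULL geometries in Hive.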

Then I uploaded it to S3 and loaded it using the delimited-text (CSV-style) serializer. This created a table with two fields: an integer and a text field containing the GeoJSON.

CREATE EXTERNAL TABLE grid_10 (id bigint, json STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3://some-url/data/grids/geojson';

I can generate geometries correctly from this GeoJSON using this query:

SELECT ID, ST_AsText(ST_GeomFromGeoJSON(json)) from grid_10 limit 3;

Which outputs:

Next, I wanted to convert this table into an actual spatial table, where the geometry is stored as a BLOB rather than as text. I did it with the following query:

create table new_grid as SELECT ID, ST_GeomFromGeoJSON(json) as geom from grid_10;  

To my surprise, every row of this table contains the same geometry, repeated over and over.

I tried the same approach (creating a geometry from WKT or GeoJSON and writing it into a table) and got the same results. Is this a bug? Does it mean I am condemned to run spatial queries with on-the-fly conversions, like the one below? And isn't that much more costly, computationally, than manipulating BLOBs?

create table grid_cnt as
SELECT grid_10.id, count(grid_10.id) as ptcnt
FROM grid_10 JOIN tweets
WHERE ST_Contains(ST_GeomFromGeoJSON(grid_10.json), ST_Point(tweets.longitude, tweets.latitude)) = true
GROUP BY grid_10.id;
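For reference, what the grid_cnt query computes can be sketched outside Hive. This is a minimal Python illustration, not how the ESRI library implements ST_Contains: it substitutes a plain even-odd ray-casting test, handles only GeoJSON Polygons, and treats points lying exactly on a cell boundary arbitrarily, whereas ST_Contains excludes boundary points. `count_points_per_cell` and the other names are hypothetical. It also illustrates the usual answer to the cost concern: parse each cell's geometry once up front, rather than once per (cell, tweet) pair, which is what a conversion placed inside the join predicate may end up doing.

```python
import json


def point_in_ring(x: float, y: float, ring: list) -> bool:
    """Even-odd ray-casting test against a closed ring of [x, y] pairs."""
    inside = False
    for i in range(len(ring) - 1):  # GeoJSON rings repeat the first vertex at the end
        x1, y1 = ring[i][0], ring[i][1]
        x2, y2 = ring[i + 1][0], ring[i + 1][1]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


def point_in_polygon(x: float, y: float, coordinates: list) -> bool:
    """Inside the outer ring and outside every hole (GeoJSON Polygon coordinates)."""
    outer, *holes = coordinates
    return point_in_ring(x, y, outer) and not any(point_in_ring(x, y, h) for h in holes)


def count_points_per_cell(cells: dict, points: list) -> dict:
    """cells: id -> GeoJSON Polygon string; points: (longitude, latitude) pairs."""
    # Parse each cell's GeoJSON once, not once per (cell, point) pair.
    parsed = {cid: json.loads(text)["coordinates"] for cid, text in cells.items()}
    counts = {}
    for cid, coords in parsed.items():
        n = sum(1 for lon, lat in points if point_in_polygon(lon, lat, coords))
        if n:  # like the inner join + GROUP BY, cells with no points get no row
            counts[cid] = n
    return counts


cells = {
    1: '{"type":"Polygon","coordinates":[[[0,0],[1,0],[1,1],[0,1],[0,0]]]}',
    2: '{"type":"Polygon","coordinates":[[[1,0],[2,0],[2,1],[1,1],[1,0]]]}',
}
points = [(0.5, 0.5), (0.25, 0.75), (1.5, 0.5), (3.0, 3.0)]
print(count_points_per_cell(cells, points))  # prints "{1: 2, 2: 1}"
```

A real workload this size would also want a spatial index (or at least a bounding-box prefilter) instead of testing every point against every cell; the sketch skips that for brevity.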

I was wondering if anybody has experienced the same issues.

Update: This problem occurred with Hive 0.11 running on Amazon's Hadoop distribution 3.3.1. I was also pulling the ESRI jars from this link:

https://github.com/Esri/gis-tools-for-hadoop/archive/master.zip

When I switched to version 2.0 of the jars and the latest Hive (0.13), the problem disappeared.

You can find my issue report here. Hope this helps someone experiencing the same issues.

Solution

I went through the same issues you described above. The solution I got from an expert was to store your geometry information as WKT, i.e. in text format, instead of the geometry format you tried.
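Following that advice, the simplest route is probably to have PostGIS emit WKT directly at export time, using `ST_AsText(geom)` in place of `ST_AsGeoJSON(geom)` in the `\COPY` query; the text column can then be turned into a geometry in Hive with the ESRI `ST_GeomFromText` function. If the file already contains GeoJSON, a minimal Python sketch of the conversion is below. It assumes 2D coordinates and covers only Point, LineString, Polygon and MultiPolygon; `geojson_to_wkt` is a hypothetical helper, and its number formatting (Python's `str`) may differ cosmetically from PostGIS output.

```python
import json


def _coords(points: list) -> str:
    # [[x, y], ...] -> "x y, x y, ..." (2D only; any Z/M ordinates are dropped)
    return ", ".join(f"{p[0]} {p[1]}" for p in points)


def _rings(rings: list) -> str:
    # [ring, ring, ...] -> "(x y, ...), (x y, ...)"
    return ", ".join(f"({_coords(r)})" for r in rings)


def geojson_to_wkt(text: str) -> str:
    """Convert a GeoJSON geometry string to WKT for the common 2D types."""
    geom = json.loads(text)
    kind, c = geom["type"], geom["coordinates"]
    if kind == "Point":
        return f"POINT ({c[0]} {c[1]})"
    if kind == "LineString":
        return f"LINESTRING ({_coords(c)})"
    if kind == "Polygon":
        return f"POLYGON ({_rings(c)})"
    if kind == "MultiPolygon":
        return "MULTIPOLYGON (" + ", ".join(f"({_rings(p)})" for p in c) + ")"
    raise ValueError(f"unsupported geometry type: {kind}")


# Convert one tab-separated line of the export, keeping the id column.
line = '7\t{"type":"Point","coordinates":[-9.14,38.71]}'
row_id, geojson = line.split("\t")
converted = f"{row_id}\t{geojson_to_wkt(geojson)}"
print(converted)  # prints 7, a tab, then POINT (-9.14 38.71)
```

Applied line by line to the exported file, this yields an id/WKT file that can be loaded into the same kind of two-column STRING table as above.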
