Write GeoDataFrame into SQL Database

Problem Description

I hope that my question is not ridiculous since, surprisingly, this question has apparently not really been asked yet (to the best of my knowledge) on the popular websites.

The situation is that I have several csv files containing more than 1 million observations in total. Each observation contains, among other things, a postal address. I am planning to read all files into a single GeoDataFrame, geocode the addresses, perform a spatial join against a given shapefile and save some information from the matching polygon for each row. Quite standard, I suppose. This is part of a one-time data cleaning process.
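
Something along these lines (a rough sketch, not the asker's actual code) illustrates the pipeline described above; the data/*.csv glob, the 'address' column and regions.shp are hypothetical placeholders, and geocoding a million rows would need a provider and rate-limit strategy of its own:

import glob
import pandas as pd
import geopandas as gpd
from geopandas.tools import geocode

# Read all CSV files into a single DataFrame (assumes each file has an 'address' column)
df = pd.concat((pd.read_csv(path) for path in glob.glob('data/*.csv')), ignore_index=True)

# Geocode the addresses; geopandas.tools.geocode returns a GeoDataFrame of points in EPSG:4326
gdf = geocode(df['address'])
gdf = gdf.join(df.drop(columns=['address']))

# Spatial join against a polygon shapefile, keeping the polygon attributes for each point
polygons = gpd.read_file('regions.shp').to_crs(gdf.crs)
gdf = gpd.sjoin(gdf, polygons, how='left', predicate='within')  # older GeoPandas: op='within'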

My goal is to set up a database with this final dataset. This is because it allows me to share and search the data quite easily as well as e.g. plot some observations on a website. Also, it makes it quite easy to select observations based on some criteria and then run some analyses.

My problem is that the feature of inserting a GeoDataFrame into a database seems not to be implemented yet - apparently because GeoPandas is supposed to be a substitute for databases ("GeoPandas enables you to easily do operations in python that would otherwise require a spatial database such as PostGIS").

Of course, I could iterate through each line and insert each data point "manually", but I am looking for the best solution here. For any workaround I would also be afraid that the datatype may conflict with that of the database. Is there "a best way" to take here?

Thanks for your help.

Recommended Answer

So, I just implemented this for a PostGIS database, and I can paste my method here. For MySQL, you'll have to adapt the code.

The first step was to convert the geocoded columns into WKB hex strings, because I use SQLAlchemy with an engine based on psycopg2, and neither of those packages understands geo-types natively. The next step is to write that data into the SQL DB as usual (note that all geometry columns should be converted to text columns holding the WKB hex string), and finally to change the type of the columns to Geometry by executing a query. Refer to the following pseudocode:

# Imports
import sqlalchemy as sal
import geopandas as gpd

# Function to generate WKB hex
def wkb_hexer(line):
    return line.wkb_hex

# Convert the `'geom'` column in GeoDataFrame `gdf` to hex.
# Note that following this step, the GeoDataFrame is just a regular DataFrame
# because it no longer has a geometry column. Also note that
# it is assumed the `'geom'` column is correctly datatyped.
gdf['geom'] = gdf['geom'].apply(wkb_hexer)

# Create SQL connection engine
engine = sal.create_engine('postgresql://username:password@host:port/database')

# Connect to database using a context manager
with engine.connect() as conn, conn.begin():
    # Note use of regular Pandas `to_sql()` method.
    gdf.to_sql(table_name, con=conn, schema=schema_name,
               if_exists='append', index=False)
    # Convert the `'geom'` column back to Geometry datatype, from text
    sql = """ALTER TABLE schema_name.table_name
               ALTER COLUMN geom TYPE Geometry(LINESTRING, <SRID>)
                 USING ST_SetSRID(geom::Geometry, <SRID>)"""
    conn.execute(sal.text(sql))
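
As a quick sanity check (not part of the original answer), the freshly typed table can be read back into a GeoDataFrame with GeoPandas, using the same placeholder schema/table names as above:

import geopandas as gpd

# Read the table back; the PostGIS geometries come back as shapely objects in the 'geom' column
check = gpd.read_postgis("SELECT * FROM schema_name.table_name", con=engine, geom_col='geom')
print(check.head())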
