Parsing osm.pbf data using the GDAL/OGR Python module
Question
I'm trying to extract data from an OSM.PBF file using the Python GDAL/OGR module.
Currently my code looks like this:
from osgeo import gdal, ogr

osm = ogr.Open('file.osm.pbf')

# Select the multipolygons layer (index 3)
layer = osm.GetLayer(3)

# Create list to store pubs
pubs = []
for feat in layer:
    if feat.GetField('amenity') == 'pub':
        pubs.append(feat)
This bit of code works fine with small .pbf files (15 MB). However, when parsing files larger than 50 MB I get the following error:
ERROR 1: Too many features have accumulated in points layer. Use OGR_INTERLEAVED_READING=YES MODE
When I turn this mode on with:
gdal.SetConfigOption('OGR_INTERLEAVED_READING', 'YES')
ogr no longer returns any features at all, even when parsing small files.
Does anyone know what is going on here?
Recommended answer
Thanks to scai's answer I was able to figure it out.
The special reading pattern required for interleaved reading, described at gdal.org/1.11/ogr/drv_osm.html, is translated into a working Python example below.
This is an example of how to extract all features in an .osm.pbf file that have the 'amenity=pub' tag:
from osgeo import gdal, ogr

gdal.SetConfigOption('OGR_INTERLEAVED_READING', 'YES')
osm = ogr.Open('file.osm.pbf')

# Number of layers available in the file
nLayerCount = osm.GetLayerCount()
thereIsDataInLayer = True
pubs = []

# Keep cycling until a full pass over all layers yields no feature
while thereIsDataInLayer:
    thereIsDataInLayer = False
    # Cycle through available layers
    for iLayer in range(nLayerCount):
        lyr = osm.GetLayer(iLayer)
        # Whether a layer has an 'amenity' field depends on osmconf.ini
        hasAmenity = lyr.GetLayerDefn().GetFieldIndex('amenity') >= 0
        # Get first feature from layer
        feat = lyr.GetNextFeature()
        while feat is not None:
            thereIsDataInLayer = True
            # Do something with the feature, in this case store it in a list
            if hasAmenity and feat.GetField('amenity') == 'pub':
                # Store a copy, because feat itself is destroyed below
                pubs.append(feat.Clone())
            # The destroy method is necessary for interleaved reading
            feat.Destroy()
            feat = lyr.GetNextFeature()
As far as I understand it, a while-loop is needed instead of a for-loop because, when using the interleaved reading method, it is impossible to obtain the feature count of a collection.
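The round-robin pattern can also be wrapped in a small generator, which keeps the reading logic separate from the filtering. This is only a sketch: `iter_interleaved` is a hypothetical helper name, not part of GDAL, and it assumes nothing about its argument beyond the `GetLayerCount()`, `GetLayer(i)` and `GetNextFeature()` methods already used above. It keeps cycling over the layers until one full pass returns no feature from any of them.

```python
def iter_interleaved(datasource):
    """Yield (layer, feature) pairs using the round-robin reading pattern.

    `datasource` is assumed to expose GetLayerCount() and GetLayer(i), and
    each layer is assumed to expose GetNextFeature(), which returns None
    when the layer has no feature available right now.
    """
    layers = [datasource.GetLayer(i) for i in range(datasource.GetLayerCount())]

    found_any = True
    while found_any:
        # Stop once a complete pass over all layers produced nothing.
        found_any = False
        for layer in layers:
            feat = layer.GetNextFeature()
            while feat is not None:
                found_any = True
                yield layer, feat
                # Fetch the next feature only after the caller is done
                # with the current one.
                feat = layer.GetNextFeature()
```

With GDAL installed and OGR_INTERLEAVED_READING set to YES, the data source returned by ogr.Open could be passed in directly, for example `for lyr, feat in iter_interleaved(osm): ...`. The caller remains responsible for copying anything it wants to keep and for releasing each feature before asking for the next one.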
More clarification on why this piece of code works the way it does would be greatly appreciated.