Python: suggestions to improve a chunk-by-chunk code to read several millions of points

Views: 514 Posted: 2016/12/14 14:16:41 Tags: python performance coding-style matplotlib chunked-encoding


Problem description

I wrote some code to read a *.las file in Python. *.las files are special ASCII files where each line holds the x, y, z values of a point.

My function reads N points at a time and checks whether they are inside a polygon with points_inside_poly.

I have the following questions:

1) When I arrive at the end of the file I get this message: LASException: LASError in "LASReader_GetPointAt": point subscript out of range, because the number of remaining points is smaller than the chunk size. I cannot figure out how to resolve this problem.
2) a = [file_out.write(c[m]) for m in xrange(len(c))] I use a = in order to avoid video print. Is it correct?
3) In c = [chunk[l] for l in index] I create a new list c because I am not sure that replacing chunk with a new list is the smart solution (e.g. chunk = [chunk[l] for l in index]).
4) In the if...else statement I use pass. Is this the right choice?

Thank you very much for your help. Suggestions from experts are important for improving!

import shapefile
import numpy
import numpy as np
from numpy import nonzero
from liblas import file as lasfile
from shapely.geometry import Polygon
from matplotlib.nxutils import points_inside_poly  


# open shapefile (polygon)
sf = shapefile.Reader(poly)
shapes = sf.shapes()
# extract vertices
verts = np.array(shapes[0].points,float)

# open las file
f = lasfile.File(inFile,None,'r') # open LAS
# read "header"
h = f.header

# create a file in which to store the points
file_out = lasfile.File(outFile,mode='w',header= h)


chunkSize = 100000
for i in xrange(0,len(f), chunkSize):
    chunk = f[i:i+chunkSize]

    x,y = [],[]

    # extract the x and y values of each point
    for p in xrange(len(chunk)):
        x.append(chunk[p].x)
        y.append(chunk[p].y)

    # zip all points 
    points = np.array(zip(x,y))
    # build an index of the points that lie inside the polygon
    index = nonzero(points_inside_poly(points, verts))[0]

    # if index is not empty do this, otherwise "pass"
    if len(index) != 0:
        c = [chunk[l] for l in index] # Is it correct to create a new list, or can I replace chunk?
        # save points
        a = [file_out.write(c[m]) for m in xrange(len(c))] # use a = in order to avoid video print. Is it correct?
    else:
        pass # Is it correct to use pass?

f.close()
file_out.close()

Code proposed by @Roland Smith and changed by Gianni:

f = lasfile.File(inFile,None,'r') # open LAS
h = f.header
# change the software id to libLAS
h.software_id = "Gianni"
file_out = lasfile.File(outFile,mode='w',header= h)
f.close()
sf = shapefile.Reader(poly) #open shpfile
shapes = sf.shapes()
for i in xrange(len(shapes)):
    verts = np.array(shapes[i].points,float)
    inside_points = [p for p in lasfile.File(inFile,None,'r') if pnpoly(p.x, p.y, verts)]
    for p in inside_points:
        file_out.write(p)
f.close()
file_out.close()

I used this approach:
1) read f = lasfile.File(inFile,None,'r') and then read the header, because I need it in the *.las output file
2) close the file
3) use inside_points = [p for p in lasfile.File(inFile,None,'r') if pnpoly(p.x, p.y, verts)] instead of
with lasfile.File(inFile, None, 'r') as f:
...     inside_points = [p for p in f if pnpoly(p.x, p.y, verts)]
...     
because I always get this error message:

Traceback (most recent call last):
File "", line 1, in 
AttributeError: __exit__
Solution
Regarding (1):

First, why are you using chunks? Just use the lasfile as an iterator (as shown in the tutorial), and process the points one at a time. The following should write all the points inside the polygon to the output file, by using the pnpoly function in a generator expression instead of points_inside_poly.
from liblas import file as lasfile
import numpy as np
from matplotlib.nxutils import pnpoly

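# verts and h (the input file's header) are assumed to be defined as in the question
with lasfile.File(inFile, None, 'r') as f:
    # lazily select the points that fall inside the polygon
    inside_points = (p for p in f if pnpoly(p.x, p.y, verts))
    with lasfile.File(outFile, mode='w', header=h) as file_out:
        for p in inside_points:
            file_out.write(p)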
The five lines of code directly above should replace the whole big for-loop. Let's go over them one by one:


with lasfile.File(inFile...: Using this construction means that the file will be closed automatically when the with block finishes.
Now comes the good part, the generator expression that does all the work (the part between ()). It iterates over the input file (for p in f). Every point that is inside the polygon (if pnpoly(p.x, p.y, verts)) is yielded by the generator.
We use another with block for the output file,
and all the points (for p in inside_points, this is where the generator is used)
are written to the output file (file_out.write(p)).


Because the generator yields only the points that are inside the polygon, one at a time, and never builds a full list, you don't waste memory on points that you don't need!
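
As a rough illustration of the generator-based filtering described above, here is a self-contained sketch that runs without libLAS or matplotlib. The Point tuple and the point_in_polygon helper (a simple ray-casting test) are hypothetical stand-ins for the LAS point objects and for pnpoly; they are assumptions made for illustration, not the libraries' own code.

```python
from collections import namedtuple

# Hypothetical stand-in for a LAS point; only x, y, z are modelled.
Point = namedtuple("Point", "x y z")


def point_in_polygon(x, y, verts):
    """Ray-casting test; an assumed stand-in for pnpoly, not matplotlib's code."""
    inside = False
    n = len(verts)
    j = n - 1
    for i in range(n):
        xi, yi = verts[i]
        xj, yj = verts[j]
        # Does the edge (j, i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            # x coordinate where the edge crosses that line
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def points_inside(points, verts):
    """Lazily yield only the points that fall inside the polygon."""
    return (p for p in points if point_in_polygon(p.x, p.y, verts))


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
cloud = [Point(1.0, 1.0, 5.0), Point(20.0, 3.0, 1.0), Point(9.5, 9.5, 2.0)]

# Materialised here only to inspect the result; when writing to a file,
# iterate over the generator directly so no list is ever built.
kept = list(points_inside(cloud, square))
```

Iterating over points_inside(...) in a for loop keeps only one point in memory at a time, which is the property the answer relies on.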

You should only use chunks if the method shown above doesn't work. 
When using chunks you should handle the exception properly. For example:
from liblas import LASException

chunkSize = 100000
for i in xrange(0,len(f), chunkSize):
    try:
        chunk = f[i:i+chunkSize]
    except LASException:
        rem = len(f)-i
        chunk = f[i:i+rem]
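
An alternative to catching the exception is never to ask for indices past the end of the file. The sketch below assumes only that the reader supports len() and slicing. FakeReader is a hypothetical stand-in that raises IndexError on an out-of-range slice, imitating the behaviour described in the question; it is not the libLAS API.

```python
class FakeReader(object):
    """Hypothetical stand-in for a LAS reader that rejects out-of-range slices."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Imitate the strict behaviour reported in the question.
            if key.stop is not None and key.stop > len(self._items):
                raise IndexError("point subscript out of range")
        return self._items[key]


def iter_chunks(reader, chunk_size):
    """Yield consecutive chunks; the last one may be shorter than chunk_size."""
    total = len(reader)
    for start in range(0, total, chunk_size):
        stop = min(start + chunk_size, total)  # clamp to the end of the data
        yield reader[start:stop]


reader = FakeReader(range(10))
chunks = list(iter_chunks(reader, 4))
```

Clamping the stop index with min() means the final, shorter chunk is requested with valid bounds, so no exception handling is needed for that case.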
Regarding (2): Sorry, but I fail to understand what you are trying to accomplish here. What do you mean by "video print"? 
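
If "video print" refers to the interactive interpreter echoing the list of return values that the comprehension produces (an assumption, since the question does not say), a plain for loop avoids the problem without a dummy variable. In the sketch below, Sink is a hypothetical stand-in for the output file.

```python
class Sink(object):
    """Hypothetical stand-in for the output file; records what is written."""

    def __init__(self):
        self.written = []

    def write(self, item):
        self.written.append(item)


file_out = Sink()
c = ["p1", "p2", "p3"]

# A plain loop: no list of return values is built, so nothing is echoed.
for point in c:
    file_out.write(point)
```

A list comprehension is meant for building a list; when only the side effect matters, the loop states the intent more directly.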

Regarding (3): since you are not using the original chunk anymore, you can re-use the name. Keep in mind that in Python a "variable" is just a nametag attached to an object.
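
A small sketch of the nametag idea, using a plain list in place of a real chunk (the values are made up for illustration):

```python
chunk = ["a", "b", "c", "d"]
original = chunk  # a second name for the same list object
index = [0, 2]

# Rebinding: the name 'chunk' now refers to a new, filtered list.
# The old list is untouched and stays alive only while 'original' names it.
chunk = [chunk[l] for l in index]
```

Re-using the name does not modify the old list in place; once no name refers to it, Python can reclaim it.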

Regarding (4): you aren't using the else, so leave it out completely.
