Optimizing non-regularized data reading to image

Question

I have some source data that isn't regularized (a sample is shown in the csv variable in the code below). I can't guarantee any minimum, maximum, or step values in this data, so I have to determine them from the source data itself.

After reading the data and defining the values needed to plot my image, I came up with the loop below. Running this code on about 150k lines showed that it is very slow: it took around 110 seconds (!!!) to render the whole image (a very small image).

Any hints are welcome, even if I have to use other libraries or data types. My main objective is to show "heat maps" from CSV sources like this one, which can span a million lines. Reading the file into the dataset and plotting the graph are both fast. The issue is creating the image map from the CSV.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import io

csv = """
"X","Y","V"
1001,1001,909.630432
1001,1003,940.660156
1001,1005,890.571594
1001,1007,999.651062
1001,1009,937.775513
1003,1002,937.601074
1003,1004,950.006897
1003,1006,963.458923
1003,1008,878.646851
1003,1012,956.835938
1005,1001,882.472656
1005,1003,857.491028
1005,1005,907.293335
1005,1007,877.087891
1005,1009,852.005554
1007,1002,880.791931
1007,1004,862.990967
1007,1006,882.135864
1007,1008,896.634521
1007,1010,888.916626
1013,1001,853.410583
1013,1003,863.324341
1013,1005,843.284607
1013,1007,852.712097
1013,1009,882.543640
"""

data=io.StringIO(csv)

columns = [ "X" , "Y", "V" ]

df = pd.read_csv(data, sep=',', skip_blank_lines=True, quoting=2, skipinitialspace=True, usecols = columns, index_col=[0,1] ) 

# Fields
x_axis="X"
y_axis="Y"
val="V"

# Unique values on the X-Y axis
x_ind=df.index.get_level_values(x_axis).unique()
y_ind=df.index.get_level_values(y_axis).unique()

# Size of each axis
nx = len(x_ind)
ny = len(y_ind)

# Maxima and minima
xmin = x_ind.min()
xmax = x_ind.max()
ymin = y_ind.min()
ymax = y_ind.max()

img = np.zeros((nx,ny))

print("Entering in loop")
for ix in range(0, nx):
    print("Mapping {0} {1}".format(x_axis, ix))
    for iy in range(0, ny):
        try:
            img[ix, iy] = df.loc[ix + xmin, iy + ymin][val]
        except KeyError:
            img[ix, iy] = np.nan

plt.imshow(img, extent=[xmin, xmax, ymin, ymax], cmap=plt.cm.jet, interpolation=None)
plt.colorbar()
plt.show()

I tried to use pcolormesh, but was not able to fit the values into the mesh correctly without using a similar loop. I could not create the z_mesh without the loop:

x_mesh,y_mesh = np.mgrid[xmin:xmax,ymin:ymax]
z_mesh = ?? hints ?? ;-)

Recommended answer

I think your code is not even doing what you want: I ran it and got only 14 valid points in the image.
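One way to see where the 14 may come from, as a rough sketch rather than a reproduction of the original run: the loop sizes the image by the number of unique X and Y values, yet looks up xmin + ix and ymin + iy, so it only probes a 5 x 11 window starting at the minimum coordinates. The coordinates below are copied from the sample CSV; the helper name count_hits is hypothetical and exists only for this illustration.

```python
# Coordinates from the sample CSV in the question
xs = [1001] * 5 + [1003] * 5 + [1005] * 5 + [1007] * 5 + [1013] * 5
ys = ([1001, 1003, 1005, 1007, 1009]
      + [1002, 1004, 1006, 1008, 1012]
      + [1001, 1003, 1005, 1007, 1009]
      + [1002, 1004, 1006, 1008, 1010]
      + [1001, 1003, 1005, 1007, 1009])
points = set(zip(xs, ys))


def count_hits(points):
    """Hypothetical helper: count the lookups that succeed in the question's loop."""
    x_vals = {x for x, _ in points}
    y_vals = {y for _, y in points}
    nx, ny = len(x_vals), len(y_vals)   # counts of unique values, not coordinate spans
    xmin, ymin = min(x_vals), min(y_vals)
    return sum((xmin + ix, ymin + iy) in points
               for ix in range(nx)
                for iy in range(ny))


hits = count_hits(points)

# Shape the image would need to cover every integer coordinate (X span, Y span)
full_shape = (max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
print(hits, full_shape)
```

Under these assumptions the window misses every point with X above 1005 or Y above 1011, which is why most of the 25 samples never reach the image.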

You can use pivot() or unstack() and then reindex() to create the image. Is this what you want?

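# Re-read the CSV without an index so X, Y and V stay ordinary columns
# (csv and columns are defined in the question's code above)
data = io.StringIO(csv)
df = pd.read_csv(data, sep=',', skip_blank_lines=True, quoting=2,
                 skipinitialspace=True, usecols=columns)

# One row per Y, one column per X; missing (X, Y) pairs become NaN
img = df.pivot(index='Y', columns='X', values='V')

# int() guards against the coordinates having been parsed as floats
xmin, xmax = int(df['X'].min()), int(df['X'].max())
ymin, ymax = int(df['Y'].min()), int(df['Y'].max())

# Add the coordinates absent from the data so the grid has a uniform step of 1
img = img.reindex(index=range(ymin, ymax + 1),
                  columns=range(xmin, xmax + 1))

# Pad the extent by half a cell so each pixel is centred on its coordinate
extent = [xmin - 0.5, xmax + 0.5, ymin - 0.5, ymax + 0.5]
plt.imshow(img, origin='lower', extent=extent)
plt.colorbar()
plt.show()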
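If the pandas reshaping itself becomes the bottleneck on very large files, the same grid can probably be built with plain NumPy index arithmetic. This is not part of the answer above, only a minimal sketch under the assumptions that X and Y are integers, that a grid step of 1 is wanted, and that each (X, Y) pair appears at most once (with duplicates, a later row would overwrite an earlier one). The name csv_text and the shortened sample are made up for the illustration; the values are taken from the question's CSV.

```python
import io

import numpy as np
import pandas as pd

# Shortened sample in the same layout as the question's CSV
csv_text = """X,Y,V
1001,1001,909.630432
1001,1003,940.660156
1003,1002,937.601074
1005,1001,882.472656
1013,1009,882.543640
"""
df = pd.read_csv(io.StringIO(csv_text))

x = df["X"].to_numpy(dtype=int)
y = df["Y"].to_numpy(dtype=int)
v = df["V"].to_numpy(dtype=float)

xmin, xmax = x.min(), x.max()
ymin, ymax = y.min(), y.max()

# One cell per integer coordinate, NaN where the data has no sample
img = np.full((ymax - ymin + 1, xmax - xmin + 1), np.nan)

# Vectorized scatter: row index is the Y offset, column index is the X offset
img[y - ymin, x - xmin] = v

valid = int(np.count_nonzero(~np.isnan(img)))
print(img.shape, valid)
```

The resulting array can be passed to plt.imshow(img, origin='lower', extent=extent) exactly like the pivoted frame. For pcolormesh, wrapping it as np.ma.masked_invalid(img) is one way to supply the z_mesh the question asks about.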
