Find an easier way to cluster 2-D scatter data into grid array data


Problem description

I have figured out a method to cluster scattered point data into a structured 2-D array (similar to a rasterize function), and I hope there is a better way to achieve this.

My work

1. Intro

  • point data: 1000 points, each with three properties (lon, lat, emission), representing a factory located at (x, y) that emits a certain amount of CO2 into the atmosphere
  • grid network: a predefined 2-D array with shape 20x20

The code is reproduced here:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

#### define the map area
xc1,xc2,yc1,yc2 = 113.49805889531724,115.5030664238035,37.39995194888143,38.789235929357105
map = Basemap(llcrnrlon=xc1,llcrnrlat=yc1,urcrnrlon=xc2,urcrnrlat=yc2)

#### read the point data and scatter-plot it by position
df = pd.read_csv("xxxxx.csv")
px,py = map(df.lon, df.lat)
map.scatter(px, py, color="red", s=5, zorder=3)

#### predefine the grid network: 21 edges per axis enclose 20x20 cells
lon_grid,lat_grid = np.linspace(xc1,xc2,21), np.linspace(yc1,yc2,21)
lon_x,lat_y = np.meshgrid(lon_grid,lat_grid)
grids = np.zeros(20*20).reshape(20,20)
plt.pcolormesh(lon_x,lat_y,grids,cmap='gray',facecolor='none',edgecolor='k',zorder=3)

2. My target

  1. Find the nearest grid point for each factory
  2. Add the factory's emission data to that grid point

3. Algorithm realization

3.1 Raster grid

Note: the 20x20 grid points distributed over this area are represented by blue dots.

3.2 KD-tree

Find the nearest blue dot for each red point:

from scipy.spatial import KDTree

sh = (20*20,2)
grids = np.zeros(20*20*2).reshape(*sh)

sh_emission = (20*20)
grids_em = np.zeros(20*20).reshape(sh_emission)

# fill grids with the (lon, lat) of each grid point, row by row
# (xx and yy are not defined in the post; presumably the 20x20 grid-point arrays)
k = 0
for j in range(0,yy.shape[0],1):
    for i in range(0,xx.shape[0],1):
        grids[k] = np.array([lon_grid[i],lat_grid[j]])
        k+=1

T = KDTree(grids)

# search radius: the diagonal of one grid cell
x_delta = (lon_grid[2] - lon_grid[1])
y_delta = (lat_grid[2] - lat_grid[1])
R = np.sqrt(x_delta**2 + y_delta**2)

for i in range(0,len(df.lon),1):
    idx = T.query_ball_point([df.lon.iloc[i],df.lat.iloc[i]], r=R)
    # Sometimes more than one blue dot is found,
    # so calculate the distances between the factory (red point)
    # and all the blue dots that are listed, and keep the closest
    if len(idx) > 1:
        distance = []
        for k in range(0,len(idx),1):
            # idx[k] is the row in grids of the k-th candidate
            distance.append(np.sqrt((df.lon.iloc[i] - grids[idx[k]][0])**2 + (df.lat.iloc[i] - grids[idx[k]][1])**2))
        pos_index = distance.index(min(distance))
        pos = idx[pos_index]
    else:
        # Only 1 point found
        pos = idx[0]
    grids_em[pos] += df.so2.iloc[i]

4. Result

co2 = grids_em.reshape(20,20)
plt.pcolormesh(lon_x,lat_y,co2,cmap =plt.cm.Spectral_r,zorder=3)

5. My question

  • Can someone point out any drawbacks or errors in this method?
  • Are there algorithms better suited to my target?

Thanks a lot!

Solution

There are many for-loops in your code, which is not the NumPy way.

Make some sample data first:

import numpy as np
import pandas as pd
from scipy.spatial import KDTree
import pylab as pl

xc1, xc2, yc1, yc2 = 113.49805889531724, 115.5030664238035, 37.39995194888143, 38.789235929357105       

N = 1000
GSIZE = 20
x, y = np.random.multivariate_normal([(xc1 + xc2)*0.5, (yc1 + yc2)*0.5], [[0.1, 0.02], [0.02, 0.1]], size=N).T
value = np.ones(N)

df_points = pd.DataFrame({"x":x, "y":y, "v":value})

For equally spaced grids you can use hist2d():

pl.hist2d(df_points.x, df_points.y, weights=df_points.v, bins=20, cmap="viridis");

Here is the output: (image omitted)
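If the goal is the 20x20 array itself rather than a plot, the same binning can be sketched with np.histogram2d. This is only an illustration under assumptions: the lon, lat and emission arrays below are made-up stand-ins for the CSV columns, and the bin edges reuse the 21-point linspace from the question. Note that this assigns each point to the cell that contains it, which is not quite the same as "nearest grid point".

```python
import numpy as np

# Hypothetical stand-in data (not the asker's CSV): random factories inside the map area.
rng = np.random.default_rng(0)
xc1, xc2, yc1, yc2 = 113.49805889531724, 115.5030664238035, 37.39995194888143, 38.789235929357105
n_points, n_cells = 1000, 20
lon = rng.uniform(xc1, xc2, n_points)
lat = rng.uniform(yc1, yc2, n_points)
emission = rng.uniform(0.0, 10.0, n_points)

# 21 edges per axis define 20 cells, as in the question's linspace grid.
lon_edges = np.linspace(xc1, xc2, n_cells + 1)
lat_edges = np.linspace(yc1, yc2, n_cells + 1)

# histogram2d sums the weights that fall in each cell; the result is indexed [lon_bin, lat_bin].
grid_sum, _, _ = np.histogram2d(lon, lat, bins=[lon_edges, lat_edges], weights=emission)

# Transpose to (lat, lon) order so it matches np.meshgrid(lon_edges, lat_edges),
# which is the layout pcolormesh expects in the question.
co2 = grid_sum.T
```

The resulting co2 array could then be passed to plt.pcolormesh(lon_x, lat_y, co2, ...) as in the question's "Result" step.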

Here is the code using KDTree:

X, Y = np.mgrid[x.min():x.max():GSIZE*1j, y.min():y.max():GSIZE*1j]

grid = np.c_[X.ravel(), Y.ravel()]
points = np.c_[df_points.x, df_points.y]

tree = KDTree(grid)
dist, indices = tree.query(points)

grid_values = df_points.groupby(indices).v.sum()

df_grid = pd.DataFrame(grid, columns=["x", "y"])
df_grid["v"] = grid_values

fig, ax = pl.subplots(figsize=(10, 8))
ax.plot(df_points.x, df_points.y, "kx", alpha=0.2)
mapper = ax.scatter(df_grid.x, df_grid.y, c=df_grid.v, 
                    cmap="viridis", 
                    linewidths=0, 
                    s=100, marker="o")
pl.colorbar(mapper, ax=ax);

The output is: (image omitted)
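With the groupby approach above, grid points that receive no data end up as NaN in df_grid. As a possible variation (a sketch with made-up sample points, not part of the original answer), np.bincount can accumulate the values per nearest grid point and return a dense GSIZE x GSIZE array with zeros for empty grid points:

```python
import numpy as np
from scipy.spatial import KDTree

# Hypothetical sample points, roughly centred in the question's map area.
rng = np.random.default_rng(1)
gsize = 20
px = rng.normal(114.5, 0.3, 1000)
py = rng.normal(38.1, 0.3, 1000)
values = np.ones(1000)

# Regular grid of gsize x gsize points spanning the data, flattened to (gsize*gsize, 2).
gx, gy = np.mgrid[px.min():px.max():gsize * 1j, py.min():py.max():gsize * 1j]
grid = np.c_[gx.ravel(), gy.ravel()]

# Index of the nearest grid point for every data point (Euclidean distance in degrees).
_, nearest = KDTree(grid).query(np.c_[px, py])

# Sum the values per grid point; minlength keeps empty grid points as 0 instead of dropping them.
grid_sum = np.bincount(nearest, weights=values, minlength=len(grid)).reshape(gsize, gsize)
```

Because the grid was flattened with ravel(), grid_sum[i, j] corresponds to the grid point (gx[i, j], gy[i, j]).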
