Aggregating (x,y) coordinate point clouds in PostgreSQL


Problem description

I have a PostgreSQL database table with the following simplified structure:


  • Device ID (varchar)

  • Pos_X(int)

  • Pos_Y(int)

Basically, this table contains a lot of two-dimensional waypoint data for devices. Now I want to design a query that reduces the number of coordinates in the output. It should aggregate nearby coordinates (within a certain x,y threshold). An example:

row 1: DEVICE1;603;1205

row 2: DEVICE1;604;1204

If the threshold is 5, these two rows should be aggregated, since they differ by less than 5 in each coordinate. Any idea how to do this in PostgreSQL, or in SQL in general?

Recommended answer

Use the often overlooked built-in function width_bucket() in combination with your aggregation:

If your coordinates run from, say, 0 to 2000 and you want to consolidate everything within squares of 5 to single points, I would lay out a grid of 10 (5*2) like this:

SELECT device_id
     , width_bucket(pos_x, 0, 2000, 2000/10) * 10 AS pos_x
     , width_bucket(pos_y, 0, 2000, 2000/10) * 10 AS pos_y
     , count(*) AS ct -- or any other aggregate
FROM   tbl
GROUP  BY 1,2,3
ORDER  BY 1,2,3;
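
To see what the grid does to concrete rows, here is a minimal self-contained sketch of the same query. It feeds in the two rows from the question plus one extra row (618, 1204) that is made up for illustration, using an inline VALUES list in place of the real table:

```sql
-- Inline sample data standing in for the real table.
-- The third row is hypothetical, added to show a neighboring cell.
WITH tbl(device_id, pos_x, pos_y) AS (
   VALUES ('DEVICE1', 603, 1205)
        , ('DEVICE1', 604, 1204)
        , ('DEVICE1', 618, 1204)
   )
SELECT device_id
     , width_bucket(pos_x, 0, 2000, 2000/10) * 10 AS pos_x  -- 200 buckets of width 10
     , width_bucket(pos_y, 0, 2000, 2000/10) * 10 AS pos_y
     , count(*) AS ct
FROM   tbl
GROUP  BY 1,2,3
ORDER  BY 1,2,3;
```

The first two rows land in the same cell and come back as one row (610, 1210) with ct = 2; the extra row ends up in the next cell (620, 1210). Because width_bucket() numbers its buckets starting at 1, multiplying by 10 labels each cell with its upper edge rather than its lower one.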

To minimize the error, you could GROUP BY the grid as demonstrated, but save the actual average coordinates:

SELECT device_id
     , avg(pos_x)::int AS pos_x   -- save actual averages to minimize error
     , avg(pos_y)::int AS pos_y   -- cast if you need to
     , count(*)        AS ct      -- or any other aggregate
FROM   tbl
GROUP  BY
       device_id
     , width_bucket(pos_x, 0, 2000, 2000/10) * 10  -- aggregate by grid
     , width_bucket(pos_y, 0, 2000, 2000/10) * 10
ORDER  BY 1,2,3;

An sqlfiddle demonstrates both side by side.

In this particular case, it could be simpler:

...
GROUP  BY
       device_id
     , (pos_x / 10) * 10          -- truncates last digit of an integer
     , (pos_y / 10) * 10
...

But that's just because the demo grid size of 10 conveniently matches the decimal system. Try the same with a grid size of 17 or something ...
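
With width_bucket(), a cell width that does not divide the coordinate range evenly can be handled by stretching the upper bound to the next multiple of that width. A rough sketch, assuming a cell width of 17 and the 0 to 2000 range used above (118 * 17 = 2006 is the first multiple of 17 past 2000), again with the question's two rows supplied inline:

```sql
-- Sample rows from the question, supplied inline instead of a real table.
WITH tbl(device_id, pos_x, pos_y) AS (
   VALUES ('DEVICE1', 603, 1205)
        , ('DEVICE1', 604, 1204)
   )
SELECT device_id
     , width_bucket(pos_x, 0, 118 * 17, 118) * 17 AS pos_x  -- 118 buckets, each 17 wide
     , width_bucket(pos_y, 0, 118 * 17, 118) * 17 AS pos_y
     , count(*) AS ct
FROM   tbl
GROUP  BY 1,2,3
ORDER  BY 1,2,3;
```

The bucket count and the upper bound are the only things that change; the rest of the query stays as in the width-10 version.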

You can expand this approach to cover date and timestamp values by converting them to unix epoch (number of seconds since '1970-1-1') with extract().

SELECT extract(epoch FROM '2012-10-01 21:06:38+02'::timestamptz);

When you are done, convert the result back to timestamp with time zone:

SELECT timestamptz 'epoch' + 1349118398 * interval '1s';

Or simply use to_timestamp():

SELECT to_timestamp(1349118398);
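
Putting the two conversions together, timestamps can be bucketed much like the coordinates. The following is only a sketch under stated assumptions: the table name log, the column ts, the sample rows and the 300-second (five-minute) bucket width are all made up for illustration, and it uses plain floor division on the epoch value rather than width_bucket():

```sql
-- Hypothetical sample data; "log" and "ts" are illustrative names.
WITH log(device_id, ts) AS (
   VALUES ('DEVICE1', timestamptz '2012-10-01 21:06:38+02')
        , ('DEVICE1', timestamptz '2012-10-01 21:08:10+02')
        , ('DEVICE1', timestamptz '2012-10-01 21:11:00+02')
   )
SELECT device_id
       -- epoch seconds, truncated to a multiple of 300, converted back to timestamptz
     , to_timestamp(floor(extract(epoch FROM ts) / 300) * 300) AS bucket_start
     , count(*) AS ct
FROM   log
GROUP  BY 1, 2
ORDER  BY 1, 2;
```

Here the first two rows share the five-minute bucket starting at 21:05:00+02, while the third falls into the bucket starting at 21:10:00+02.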
