Aggregating (x,y) coordinate point clouds in PostgreSQL
Question
I have a PostgreSQL database table with the following simplified structure:
- Device ID (varchar)
- Pos_X(int)
- Pos_Y(int)
Basically, this table contains a lot of two-dimensional waypoint data for devices. Now I want to design a query that reduces the number of coordinates in the output. It should aggregate nearby coordinates (within a certain x,y threshold). An example:
row 1: DEVICE1;603;1205
row 2: DEVICE1;604;1204
If the threshold is 5, these two rows should be aggregated, since their coordinates differ by less than 5. Any idea how to do this in PostgreSQL, or in SQL in general?
Recommended answer
Use the often overlooked built-in function width_bucket() in combination with your aggregation:
If your coordinates run from, say, 0 to 2000 and you want to consolidate everything within squares of 5 to single points, I would lay out a grid of 10 (5*2) like this:
SELECT device_id
, width_bucket(pos_x, 0, 2000, 2000/10) * 10 AS pos_x
, width_bucket(pos_y, 0, 2000, 2000/10) * 10 AS pos_y
, count(*) AS ct -- or any other aggregate
FROM tbl
GROUP BY 1,2,3
ORDER BY 1,2,3;
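As a quick check against the two rows from the question, here is a minimal, self-contained sketch. The inline `tbl` CTE and its third row are assumptions added for illustration; they stand in for the real table.

```sql
-- Hypothetical sample rows standing in for the table from the question.
WITH tbl(device_id, pos_x, pos_y) AS (
   VALUES ('DEVICE1', 603, 1205)
        , ('DEVICE1', 604, 1204)
        , ('DEVICE1', 650, 1300)   -- assumed extra point, outside the first cell
   )
SELECT device_id
     , width_bucket(pos_x, 0, 2000, 2000/10) * 10 AS pos_x  -- 200 buckets of width 10
     , width_bucket(pos_y, 0, 2000, 2000/10) * 10 AS pos_y
     , count(*) AS ct
FROM   tbl
GROUP  BY 1,2,3
ORDER  BY 1,2,3;
-- DEVICE1 | 610 | 1210 | 2
-- DEVICE1 | 660 | 1310 | 1
```

Bucket numbers start at 1, so the reported coordinate is the upper edge of each grid cell. The upper bound is exclusive: a value of exactly 2000 or more lands in the overflow bucket 201, so pick bounds that safely enclose your data.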
To minimize the error, you could GROUP BY the grid as demonstrated, but save the actual average coordinates:
SELECT device_id
, avg(pos_x)::int AS pos_x -- save actual averages to minimize error
, avg(pos_y)::int AS pos_y -- cast if you need to
, count(*) AS ct -- or any other aggregate
FROM tbl
GROUP BY
device_id
, width_bucket(pos_x, 0, 2000, 2000/10) * 10 -- aggregate by grid
, width_bucket(pos_y, 0, 2000, 2000/10) * 10
ORDER BY 1,2,3;
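A small sketch of how the averaged variant behaves, again on assumed sample rows (the inline `tbl` CTE is hypothetical, not the real table). The three nearby points share one grid cell, and the output carries their mean instead of the cell label:

```sql
-- Hypothetical sample rows; three of them fall into the same 10x10 grid cell.
WITH tbl(device_id, pos_x, pos_y) AS (
   VALUES ('DEVICE1', 603, 1205)
        , ('DEVICE1', 604, 1204)
        , ('DEVICE1', 605, 1203)
        , ('DEVICE1', 650, 1300)
   )
SELECT device_id
     , avg(pos_x)::int AS pos_x   -- mean of the points inside the cell
     , avg(pos_y)::int AS pos_y
     , count(*) AS ct
FROM   tbl
GROUP  BY device_id
     , width_bucket(pos_x, 0, 2000, 2000/10)   -- grid cell in x
     , width_bucket(pos_y, 0, 2000, 2000/10)   -- grid cell in y
ORDER  BY 1,2,3;
-- DEVICE1 | 604 | 1204 | 3
-- DEVICE1 | 650 | 1300 | 1
```

The multiplication by 10 is omitted in GROUP BY here because the bucket number alone already identifies the cell.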
This particular case could be simpler:
...
GROUP BY
device_id
, (pos_x / 10) * 10 -- truncates last digit of an integer
, (pos_y / 10) * 10
...
But that's just because the demo grid size of 10 conveniently matches the decimal system. Try the same with a grid size of 17 or something ...
You can expand this approach to cover date and timestamp values by converting them to Unix epoch (number of seconds since '1970-1-1') with extract().
SELECT extract(epoch FROM '2012-10-01 21:06:38+02'::timestamptz);
When you are done, convert the result back to timestamp with time zone:
SELECT timestamptz 'epoch' + 1349118398 * interval '1s';
Or simply use to_timestamp():
SELECT to_timestamp(1349118398);
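To tie the epoch conversion back to the grid idea, here is a minimal sketch that groups timestamps into 5-minute buckets. The `events` CTE, its rows, and the 300-second bucket size are assumptions for illustration. It uses plain floor division on the epoch value (the same trick as the integer-division shortcut) rather than width_bucket().

```sql
-- Hypothetical sample timestamps; the first is the one used in the text.
WITH events(ts) AS (
   VALUES (timestamptz '2012-10-01 21:06:38+02')
        , (timestamptz '2012-10-01 21:08:10+02')
        , (timestamptz '2012-10-01 21:12:00+02')
   )
SELECT to_timestamp(floor(extract(epoch FROM ts) / 300) * 300) AS bucket_start  -- 300 s = 5 min
     , count(*) AS ct
FROM   events
GROUP  BY 1
ORDER  BY 1;
```

The first two timestamps fall into the same 5-minute bucket and the third into the next one, so the query returns two rows with counts 2 and 1. How bucket_start is displayed depends on the session's time zone setting.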