Getting last rows by date on a large PostgreSQL database


Problem description

I have a large database (6000 rows are inserted per minute) on a partitioned table. Everything worked fine while the database was small, but now it has grown large. I was using the solution from a previous question, SQL joined by date, but it uses 250 MB of disk and grows as my table grows, so I decided to change it to an iteration of simple queries. That works well for 10 rows, but gets slow with more than 10 vehicles (15 seconds for a response) and uses more than 200 MB of disk.

My question is: how do I build a faster query to resolve this issue?

Extra information


  1. The query is called from a Django application via Ajax.

  2. I am considering iterated Ajax calls instead of a single call that returns the complete list of items.

  3. The table is partitioned by day.

My actual query is:

CREATE OR REPLACE FUNCTION gps_get_last_positions (
    _plates varchar(8)
)
RETURNS TABLE (
    plate varchar,
    device_id integer,
    date_time_process timestamp with time zone, 
    latitude double precision, 
    longitude double precision, 
    course smallint, 
    speed smallint, 
    mileage integer,
    gps_signal smallint,
    gsm_signal smallint,
    alarm_status boolean,
    gsm_status boolean,
    vehicle_status boolean,
    alarm_over_speed boolean,
    other text,
    realtime  double precision
) AS $func$
DECLARE 
    arr varchar[];

BEGIN
    arr := regexp_split_to_array(_plates, E'\\s+');
    FOR i IN 1..array_length(arr, 1) LOOP
        RETURN QUERY SELECT 
        gpstracking_vehicles.registration,
        gpstracking_device_tracks.device_id, 
        gpstracking_device_tracks.date_time_process,
        gpstracking_device_tracks.latitude,
        gpstracking_device_tracks.longitude,
        gpstracking_device_tracks.course,
        gpstracking_device_tracks.speed,
        gpstracking_device_tracks.mileage,
        gpstracking_device_tracks.gps_signal,
        gpstracking_device_tracks.gsm_signal,
        gpstracking_device_tracks.alarm_status,
        gpstracking_device_tracks.gps_status,
        gpstracking_device_tracks.vehicle_status,
        gpstracking_device_tracks.alarm_over_speed,
        gpstracking_device_tracks.other,
        EXTRACT(EPOCH FROM current_timestamp - gpstracking_device_tracks.date_time_process)/60 AS realtime
        FROM (
        gpstracking_devices INNER JOIN (
        gpstracking_vehicles INNER JOIN gpstracking_vehicles_devices ON ( gpstracking_vehicles.id = gpstracking_vehicles_devices.vehicle_id AND gpstracking_vehicles_devices.is_joined = TRUE )
        ) ON ( gpstracking_devices.id = gpstracking_vehicles_devices.device_id AND gpstracking_vehicles_devices.is_joined = TRUE )
        ) INNER JOIN gpstracking_device_tracks ON ( gpstracking_devices.id = gpstracking_device_tracks.device_id )
        WHERE gpstracking_vehicles.registration = arr[i]::VARCHAR
        ORDER BY gpstracking_device_tracks.date_time_process DESC
        LIMIT 1;
    END LOOP;
    RETURN;
END;
$func$ 
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;

Configuration parameters

application_name            phpPgAdmin_5.0.4                    client
constraint_exclusion        on                                  configuration file
DateStyle                   ISO, MDY                            session
default_text_search_config  pg_catalog.english                  configuration file
external_pid_file           /var/run/postgresql/9.1-main.pid    configuration file
lc_messages                 en_US.UTF-8                         configuration file
lc_monetary                 en_US.UTF-8                         configuration file
lc_numeric                  en_US.UTF-8                         configuration file
lc_time                     en_US.UTF-8                         configuration file
log_line_prefix             %t                                  configuration file
log_timezone                localtime                           environment variable
max_connections             100                                 configuration file
max_stack_depth             2MB                                 environment variable
port                        5432                                configuration file
shared_buffers              24MB                                configuration file
ssl                         on                                  configuration file
TimeZone                    localtime                           environment variable
unix_socket_directory       /var/run/postgresql                 configuration file

My first slow query was:

CREATE OR REPLACE VIEW view_vehicle_devices AS
SELECT 
gpstracking_vehicles_devices.id AS id,
gpstracking_devices.id AS device_id,
gpstracking_vehicles.id AS vehicle_id,
gpstracking_drivers.id AS driver_id,
gpstracking_device_protocols.name AS protocol,
gpstracking_vehicles.registration AS plate,
gpstracking_drivers.firstname as first_name,
gpstracking_drivers.lastname as last_name,
gpstracking_devices.imei,
gpstracking_devices.simcard, 
gpstracking_device_tracks.date_time_process,
gpstracking_device_tracks.latitude,
gpstracking_device_tracks.longitude,
gpstracking_device_tracks.course,
gpstracking_device_tracks.speed,
gpstracking_device_tracks.mileage,
gpstracking_device_tracks.gps_signal,
gpstracking_device_tracks.gsm_signal,
gpstracking_device_tracks.alarm_status,
gpstracking_device_tracks.gps_status,
gpstracking_device_tracks.vehicle_status,
gpstracking_device_tracks.alarm_over_speed,
gpstracking_device_tracks.other,
gpstracking_device_tracks.point,
EXTRACT(EPOCH FROM current_timestamp - gpstracking_device_tracks.date_time_process)/60 realtime,
gpstracking_devices.created,
gpstracking_devices.updated,
gpstracking_devices.is_connected

FROM (
gpstracking_vehicles LEFT JOIN (
gpstracking_drivers  LEFT JOIN gpstracking_vehicles_drivers ON gpstracking_drivers.id = gpstracking_vehicles_drivers.driver_id AND gpstracking_vehicles_drivers.is_joined = TRUE
) ON gpstracking_vehicles.id = gpstracking_vehicles_drivers.vehicle_id AND gpstracking_vehicles_drivers.is_joined = TRUE
) LEFT JOIN (((
gpstracking_device_protocols RIGHT JOIN gpstracking_devices ON gpstracking_device_protocols.id = gpstracking_devices.device_protocol_id
) LEFT JOIN (
SELECT DISTINCT ON (gpstracking_device_tracks.device_id) gpstracking_device_tracks.device_id, 
gpstracking_device_tracks.date_time_process,
gpstracking_device_tracks.latitude,
gpstracking_device_tracks.longitude,
gpstracking_device_tracks.course,
gpstracking_device_tracks.speed,
gpstracking_device_tracks.mileage,
gpstracking_device_tracks.gps_signal,
gpstracking_device_tracks.gsm_signal,
gpstracking_device_tracks.alarm_status,
gpstracking_device_tracks.gps_status,
gpstracking_device_tracks.vehicle_status,
gpstracking_device_tracks.alarm_over_speed,
gpstracking_device_tracks.other,
gpstracking_device_tracks.point
FROM gpstracking_device_tracks 
ORDER BY gpstracking_device_tracks.device_id, gpstracking_device_tracks.date_time_process DESC
) AS gpstracking_device_tracks ON gpstracking_devices.id = gpstracking_device_tracks.device_id  
) LEFT JOIN gpstracking_vehicles_devices ON ( gpstracking_devices.id = gpstracking_vehicles_devices.device_id AND gpstracking_vehicles_devices.is_joined = TRUE )
) ON ( gpstracking_vehicles.id = gpstracking_vehicles_devices.vehicle_id AND gpstracking_vehicles_devices.is_joined = TRUE )

I have changed it to the loop that starts my post; my loop runs faster, but it is still not as fast as I need.

Answer

Your problem is that the planner cannot know which partition holds the answer to your query; it only has statistics. So you do not benefit from partitioning your data by day at all.

To benefit from it, you can modify your query so it looks for the latest coordinates in the current day's partition first; if nothing is found there, then in yesterday's, and if not there, in the day before, and so on. I suppose 99% of answers would be found in today's partition alone.
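That day-by-day fallback could be sketched as a function like the one below. This is hypothetical code, not from the question: the function name, the 30-day cutoff, and the per-device lookup are all illustrative, and the table and column names are assumed from the schema shown above.

```sql
-- Probe partitions newest-first: today's day-range first, then walk back.
-- The explicit date-range predicate is what lets constraint exclusion
-- (and the per-day indexes) limit the scan to one child table at a time.
CREATE OR REPLACE FUNCTION gps_last_track(_device_id integer)
RETURNS gpstracking_device_tracks AS $$
DECLARE
    r gpstracking_device_tracks%ROWTYPE;
    d date := current_date;
BEGIN
    FOR i IN 0..30 LOOP  -- give up after 30 days of silence
        SELECT * INTO r
        FROM gpstracking_device_tracks t
        WHERE t.device_id = _device_id
          AND t.date_time_process >= d - i
          AND t.date_time_process <  d - i + 1
        ORDER BY t.date_time_process DESC
        LIMIT 1;
        IF FOUND THEN
            RETURN r;
        END IF;
    END LOOP;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql STABLE;
```

In the common case the first iteration finds a row in today's partition and returns immediately, so the older, larger partitions are never touched.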

Or you can partition by, for example, device_id % 256 instead.
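Under 9.1-era inheritance partitioning, that might look roughly like this. The child-table names are illustrative, and the insert-routing trigger is abridged:

```sql
-- One child table per remainder; the CHECK constraint lets constraint
-- exclusion skip the other 255 children when the query repeats it.
CREATE TABLE gpstracking_device_tracks_p000 (
    CHECK (device_id % 256 = 0)
) INHERITS (gpstracking_device_tracks);

CREATE INDEX gpstracking_device_tracks_p000_idx
    ON gpstracking_device_tracks_p000 (device_id, date_time_process DESC);

-- ...repeat for remainders 1..255, and route inserts with a BEFORE INSERT
-- trigger on the parent. Queries must include the same expression for
-- exclusion to kick in, e.g.:
--   SELECT * FROM gpstracking_device_tracks
--   WHERE device_id = 42 AND device_id % 256 = 42 % 256
--   ORDER BY date_time_process DESC LIMIT 1;
```

The upside is that the latest row for a device is always in a known child; the downside is that each child keeps growing, unlike day partitions that can be dropped.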

But even better would be to create an additional table with only a few recent coordinates per device. It would be maintained by a trigger on gpstracking_device_tracks that does roughly the following (pseudocode):

if random()*64 < 1.0 then
  -- statistically, do a cleanup once per ~64 runs
  with todelete as (
    -- lock rows in a consistent order to avoid possible deadlocks
    -- when run concurrently
    select id from gpstracking_device_tracks_recent where device_id = ?
      order by id for share
  )
  delete from gpstracking_device_tracks_recent
    where id in (select id from todelete);
end if;
insert into gpstracking_device_tracks_recent (...) values (...);


And then look for latest coordinates in this much smaller table.
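A fuller sketch of that trigger follows. The gpstracking_device_tracks_recent table and its column list are assumptions kept to a minimum here; in practice you would mirror whichever columns your dashboard query needs:

```sql
CREATE TABLE IF NOT EXISTS gpstracking_device_tracks_recent (
    id bigserial PRIMARY KEY,
    device_id integer NOT NULL,
    date_time_process timestamp with time zone NOT NULL,
    latitude double precision,
    longitude double precision
);

CREATE OR REPLACE FUNCTION gps_copy_to_recent() RETURNS trigger AS $$
BEGIN
    IF random() * 64 < 1.0 THEN
        -- statistically once per ~64 inserts, purge this device's old rows;
        -- lock in a consistent order to avoid deadlocks under concurrency
        DELETE FROM gpstracking_device_tracks_recent
        WHERE id IN (
            SELECT id FROM gpstracking_device_tracks_recent
            WHERE device_id = NEW.device_id
            ORDER BY id FOR SHARE
        );
    END IF;
    INSERT INTO gpstracking_device_tracks_recent
        (device_id, date_time_process, latitude, longitude)
    VALUES (NEW.device_id, NEW.date_time_process, NEW.latitude, NEW.longitude);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_copy_to_recent
AFTER INSERT ON gpstracking_device_tracks
FOR EACH ROW EXECUTE PROCEDURE gps_copy_to_recent();
```

The latest-position query then becomes a cheap scan of the small table, for example `SELECT DISTINCT ON (device_id) * FROM gpstracking_device_tracks_recent ORDER BY device_id, date_time_process DESC;`.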
