How to filter/reduce a QuerySet to every nth row?


Question

I am storing time-series data from a number of sensors in a MySQL database. Each sensor is associated with a device, and each device can have multiple sensors.

The sensors are polled every 10 seconds, so for long periods (a day, week, month, or year) the dataset becomes unnecessarily large and fetching it is problematic.

I would like to resample the QuerySet prior to evaluation so that it only fetches every nth row. Is this possible?

If not, is there a smarter approach I can take? I suppose I could work out a WHERE clause that matches 1/n of the possible values of the timestamp's microseconds?

device_name = request.GET['device']
device = Datalogger.objects.get(device_name=device_name)

sensors = Sensor.objects.filter(datalogger=device).order_by('pk').select_related('type')
# get all models of sensor used by this controller
sensor_models = sensors.values_list('type', flat=True)
# get all datum types relating to all models of sensor used
sensor_datum_types = list(
    SensorModelDatumType.objects.filter(sensor__in=sensor_models).order_by('sensor', 'datum_type'))

# assign each trace (sensor/datum_type combination) an index for the tuples (zero is used for time/x-axis)
bulk_queryset = SensorDatum.objects.filter(sensor__datalogger__device_name=device_name,
                                           timestamp__gte=get_filter_start_time(request),
                                           timestamp__lte=get_filter_end_time(request))
chart_traces = []
chart_trace_indices = {}
chart_trace_data = [None]
chart_trace_queryset = SensorDatum.objects.none()
next_free_idx = 1
for sensor in sensors:
    for datum_type in sensor_datum_types:
        if datum_type.sensor == sensor.type:
            chart_trace_name = get_chart_trace_name(sensor.sensor_name, datum_type.datum_type.description)
            chart_traces.append({'sensor': sensor.sensor_name,
                                 'datum_type': datum_type.datum_type.description,
                                 'chart_trace_name': chart_trace_name})
            chart_trace_indices.update({chart_trace_name: next_free_idx})
            chart_trace_queryset = chart_trace_queryset | bulk_queryset.filter(
                sensor_id=sensor.id, type_id=datum_type.datum_type.id)
            next_free_idx += 1

# process data into timestamp-grouped tuples accessible by chart_trace_index ([0] is timestamp)
raw_data = list(chart_trace_queryset.order_by('timestamp', 'sensor_id', 'type_id'))
row_count = len(raw_data)
   row_count = len(raw_data)


Recommended answer

You could perhaps use .annotate() with a modulus to retrieve only every Nth row. I'm using this answer as my reference.

from django.db.models import F

Foo.objects.annotate(idmod4=F('id') % 4).filter(idmod4=0)

This should return approximately every 4th row. If you're applying other filters as well, though, you might not get an exact subsample: those filters might exclude many of the rows that match the modulus, so you could be unlucky and find that the sensor you're filtering for doesn't have many ids that are a multiple of 4. That said, you mentioned you are generating a lot of rows, so for a subsample this may be sufficient.
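To make the caveat concrete, here is a small pure-Python simulation (no database involved). It assumes, purely for illustration, that four sensors insert rows in strict round-robin order into a table with an auto-increment id; the helper names make_rows and every_nth_by_id are hypothetical and do not come from the question. Under that assumption the id modulus picks rows from only one sensor, whereas sampling by position within one sensor's rows stays even.

```python
from itertools import cycle


def make_rows(sensor_ids, row_count):
    """Simulate an auto-increment table filled by sensors in round-robin order (an assumption)."""
    sensors = cycle(sensor_ids)
    return [{"id": pk, "sensor_id": next(sensors)} for pk in range(1, row_count + 1)]


def every_nth_by_id(rows, n):
    """Mimic annotate(idmod=F('id') % n).filter(idmod=0) on in-memory rows."""
    return [row for row in rows if row["id"] % n == 0]


sensor_ids = [1, 2, 3, 4]
rows = make_rows(sensor_ids, row_count=40)

# Modulus on the id: 10 rows survive, but they all belong to the same sensor.
sampled = every_nth_by_id(rows, 4)
per_sensor = {s: sum(1 for r in sampled if r["sensor_id"] == s) for s in sensor_ids}
print(per_sensor)

# Alternative: filter to one sensor first, then keep every 4th row by position.
sensor_1_rows = [r for r in rows if r["sensor_id"] == 1]
by_position = sensor_1_rows[::4]
print([r["id"] for r in by_position])
```

The positional slice here runs in Python after the rows are already loaded, so it only illustrates the sampling behaviour; it does not by itself reduce what the database returns.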
