How to optimize a sort based on the "latest" related model
Question
Say we have two models:
from django.db import models


class Product(models.Model):
    """ A model representing a product in a website. Has new datapoints referencing this as a foreign key daily """
    name = models.CharField(null=False, max_length=1024, default="To be Scraped")
    url = models.URLField(null=False, blank=False, max_length=10000)


class DataPoint(models.Model):
    """ A model representing a datapoint in a Product's timeline. A new one is created for every product daily """
    product = models.ForeignKey(Product, null=False)
    price = models.FloatField(null=False, default=0.0)
    inventory_left = models.BigIntegerField(null=False, default=0)
    inventory_sold = models.BigIntegerField(null=False, default=0)
    date_created = models.DateField(auto_now_add=True)

    def __unicode__(self):
        # Accessing self.product follows the foreign key to the related Product
        return "%s - %s" % (self.product.name, self.inventory_sold)
The goal is to sort a QuerySet of products based on the inventory_sold value of the latest datapoint attached to the product. Here's what I have so far:
products = Product.objects.all()
datapoints = DataPoint.objects.filter(product__in=products)
datapoints = list(datapoints.values("product__id", "inventory_sold", "date_created"))
products_d = {}
# Loop over the datapoints values array
for i in datapoints:
    # If a datapoint for the product doesn't exist in the products_d, add the datapoint
    if str(i["product__id"]) not in products_d.keys():
        products_d[str(i["product__id"])] = {"inventory_sold": i["inventory_sold"], "date_created": i["date_created"]}
    # Otherwise, if the current datapoint was created after the existing datapoint, overwrite the datapoint in products_d
    else:
        if products_d[str(i["product__id"])]["date_created"] < i["date_created"]:
            products_d[str(i["product__id"])] = {"inventory_sold": i["inventory_sold"], "date_created": i["date_created"]}
# Sort the products queryset based on the value of inventory_sold in the products_d dictionary
products = sorted(products, key=lambda x: products_d.get(str(x.id), {}).get("inventory_sold", 0), reverse=True)
This works alright, but it's extremely slow with a high (500,000~) number of products and datapoints. Is there any better way to do this?
And on a side note (unimportant), since I haven't been able to find anything about this, it seems like the __unicode__ method of the DataPoint model is also making needless SQL queries. Is this something that's a default characteristic of Django models once they get passed to templates?
Recommended answer
I think you can use a subquery here to annotate the value of the most recent datapoint, then sort on that.
Based on the example in the Django documentation for Subquery expressions, it would be something like:
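from django.db.models import OuterRef, Subquery

# Datapoints belonging to the product in the outer query, newest first
newest = DataPoint.objects.filter(product=OuterRef('pk')).order_by('-date_created')
products = Product.objects.annotate(
    # [:1] keeps only the most recent datapoint's inventory_sold
    newest_inventory_sold=Subquery(newest.values('inventory_sold')[:1])
).order_by('newest_inventory_sold')  # use '-newest_inventory_sold' for highest first, as in the question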
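To see what this annotation asks of the database, here is a minimal sketch that uses Python's built-in sqlite3 module instead of Django. The table and column names (product, datapoint, product_id) and the sample rows are assumptions that only mimic the models above, and the SQL is a hand-written approximation of a correlated subquery, not the exact statement Django generates. It sorts highest first, as the question's sorted(..., reverse=True) does.

```python
import sqlite3

# In-memory database with tables that loosely mirror the two models (assumed schema)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE datapoint (
        id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES product(id),
        inventory_sold INTEGER,
        date_created TEXT
    );
""")
conn.executemany(
    "INSERT INTO product (id, name) VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "C")],
)
conn.executemany(
    "INSERT INTO datapoint (product_id, inventory_sold, date_created) VALUES (?, ?, ?)",
    [
        (1, 50, "2018-01-01"),
        (1, 10, "2018-01-02"),  # latest datapoint for A
        (2, 5, "2018-01-01"),
        (2, 30, "2018-01-02"),  # latest datapoint for B
        (3, 20, "2018-01-02"),  # latest (and only) datapoint for C
    ],
)

# For each product, a correlated subquery picks inventory_sold from its newest
# datapoint; the outer query then sorts on that value inside the database.
rows = conn.execute("""
    SELECT p.name,
           (SELECT d.inventory_sold
              FROM datapoint d
             WHERE d.product_id = p.id
             ORDER BY d.date_created DESC
             LIMIT 1) AS newest_inventory_sold
      FROM product p
     ORDER BY newest_inventory_sold DESC
""").fetchall()

print(rows)  # [('B', 30), ('C', 20), ('A', 10)]
```

The point of the design is that the "latest datapoint" lookup and the sort both happen in one query, so no per-product dictionary has to be built in Python.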
For your side point, to avoid the extra queries when outputting DataPoints you will need to use select_related in the original query:
datapoints = DataPoint.objects.filter(...).select_related('product')
This will do a JOIN so that getting the product name doesn't cause a new db lookup.
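The difference can be illustrated without Django. The sketch below again uses the built-in sqlite3 module with an assumed schema and sample rows, and a hypothetical run() helper that counts each statement as one database round trip. The first loop looks up the product name once per datapoint, which is roughly what happens when __unicode__ touches self.product on objects fetched without select_related; the second fetches everything with a single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE datapoint (
        id INTEGER PRIMARY KEY,
        product_id INTEGER,
        inventory_sold INTEGER
    );
    INSERT INTO product VALUES (1, 'A'), (2, 'B');
    INSERT INTO datapoint VALUES (1, 1, 10), (2, 2, 30), (3, 1, 15);
""")

query_count = 0


def run(sql, params=()):
    """Hypothetical helper: execute one statement and count it as one round trip."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# Lazy pattern: one query for the datapoints, then one more per row for the name
query_count = 0
lazy = []
for product_id, sold in run("SELECT product_id, inventory_sold FROM datapoint ORDER BY id"):
    (name,) = run("SELECT name FROM product WHERE id = ?", (product_id,))[0]
    lazy.append("%s - %s" % (name, sold))
lazy_queries = query_count

# JOIN pattern: the product name arrives together with each datapoint
query_count = 0
joined = [
    "%s - %s" % (name, sold)
    for name, sold in run(
        "SELECT p.name, d.inventory_sold FROM datapoint d "
        "JOIN product p ON p.id = d.product_id ORDER BY d.id"
    )
]
joined_queries = query_count

print(lazy_queries, joined_queries)  # 4 1
```

Both patterns produce the same strings; only the number of statements differs, and the gap grows with the number of datapoints rendered.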