Django Query Natural Sort

Problem Description

Let's say I have this Django model:

from django.db import models

class Question(models.Model):
    question_code = models.CharField(max_length=10)

and I have 15k questions in the database.

I want to sort the questions by question_code, which is alphanumeric. This is a classic problem that has been discussed in:

  • http://blog.codinghorror.com/sorting-for-humans-natural-sort-order/
  • Does Python have a built in function for string natural sort?
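
To make the problem concrete, here is a minimal sketch of why plain string sorting misorders codes like these, assuming a simple regex-based key in the spirit of the linked answers (the sample codes are made up):

```python
import re


def natural_key(s):
    """Split s into digit and non-digit runs; digit runs compare as integers."""
    return [int(part) if part.isdigit() else part for part in re.split(r'(\d+)', s)]


codes = ['Q10', 'Q2', 'Q1', 'Q1a', 'Q20']

# Plain string comparison goes character by character, so 'Q10' < 'Q2'.
lexicographic = sorted(codes)
# Natural order compares the numeric runs by value instead.
natural = sorted(codes, key=natural_key)

print(lexicographic)  # ['Q1', 'Q10', 'Q1a', 'Q2', 'Q20']
print(natural)        # ['Q1', 'Q1a', 'Q2', 'Q10', 'Q20']
```

Because re.split with a capturing group always starts with a (possibly empty) non-digit run, strings and ints sit at the same positions in every key, so the key lists compare without type errors in Python 3.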

I tried the code from the second link (copied below, slightly modified) and noticed that it takes up to 3 seconds to sort the data. To check the function's performance, I wrote a test that creates a list of 100k random alphanumeric strings. Sorting that list takes only 0.76s. So what's happening?

Here is what I think: the function needs to get the question_code of each question to compare them, so sorting 15k values means querying MySQL 15k separate times, and that is why it takes so long. Any ideas? And is there a general solution for natural sorting in Django? Thanks a lot!

import re

def natural_sort(l, ascending, key=lambda s: s):
    def get_alphanum_key_func(key):
        # Digit runs become ints so that 'Q2' sorts before 'Q10'
        convert = lambda text: int(text) if text.isdigit() else text
        return lambda s: [convert(c) for c in re.split('([0-9]+)', key(s))]
    sort_key = get_alphanum_key_func(key)
    # reverse=True means descending, so invert the ascending flag
    return sorted(l, key=sort_key, reverse=not ascending)
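
One detail relevant to the 15k-queries theory: when sorted() is given key=, Python computes the key once per element and reuses it for every comparison. The sketch below counts key calls, using a hypothetical FakeQuestion dataclass in place of the model so no database is involved. Whether each of those calls would reach MySQL depends on what the key function accesses.

```python
import re
from dataclasses import dataclass


@dataclass
class FakeQuestion:
    # Hypothetical stand-in for the Django Question model (no database involved).
    id: int
    question_code: str


calls = 0


def code_key(question):
    """Natural-sort key on question_code that also counts how often it runs."""
    global calls
    calls += 1
    return [int(p) if p.isdigit() else p for p in re.split(r'(\d+)', question.question_code)]


questions = [FakeQuestion(i, code) for i, code in enumerate(['B7', 'A10', 'A2', 'B12'])]
ordered = sorted(questions, key=code_key)

print([q.question_code for q in ordered])  # ['A2', 'A10', 'B7', 'B12']
print(calls)  # 4: once per element, not once per comparison
```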

Recommended Answer

As far as I'm aware, there isn't a generic Django solution to this. You can reduce memory usage and limit your DB queries by building an id/question_code lookup structure:

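from natsort import natsorted

# One query that returns lightweight dicts instead of full Question instances
question_code_lookup = Question.objects.values('id', 'question_code')
# Natural-sort the (id, question_code) dicts in Python
ordered_question_codes = natsorted(question_code_lookup, key=lambda i: i['question_code'])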
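
If installing natsort isn't an option, the same lookup step can be sketched with the standard library alone. This assumes rows shaped like the dicts that .values('id', 'question_code') yields (the sample rows below are hypothetical), and natural_key is a simplified key that may not match natsort in every edge case.

```python
import re


def natural_key(s):
    # Digit runs compare as integers, everything else as strings.
    return [int(p) if p.isdigit() else p for p in re.split(r'(\d+)', s)]


# Hypothetical rows shaped like Question.objects.values('id', 'question_code').
question_code_lookup = [
    {'id': 1, 'question_code': 'Q10'},
    {'id': 2, 'question_code': 'Q2'},
    {'id': 3, 'question_code': 'Q1'},
]

ordered_question_codes = sorted(
    question_code_lookup, key=lambda row: natural_key(row['question_code'])
)
print([row['id'] for row in ordered_question_codes])  # [3, 2, 1]
```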

Assuming you want to paginate the results, you can then slice ordered_question_codes, perform another query to retrieve the questions you need, and order them according to their position in that slice:

# Get the first 20 questions (keep the full ordered list for later pages)
first_page = ordered_question_codes[:20]
question_ids = [q['id'] for q in first_page]
questions = Question.objects.filter(id__in=question_ids)
# id__in does not preserve order, so put them back into question_code order
id_to_pos = {qid: pos for pos, qid in enumerate(question_ids)}
questions = sorted(questions, key=lambda x: id_to_pos[x.id])
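
The slicing step generalizes to any page. This is a sketch with hypothetical helper names (page_ids, restore_order); in a real view, fetched would come from Question.objects.filter(id__in=ids), and here SimpleNamespace objects stand in for model instances returned in arbitrary order.

```python
from types import SimpleNamespace


def page_ids(ordered_rows, page, per_page=20):
    """Return the ids on a 1-based page of naturally ordered lookup rows."""
    start = (page - 1) * per_page
    return [row['id'] for row in ordered_rows[start:start + per_page]]


def restore_order(objects, ids):
    """Re-sort objects fetched with id__in=ids back into the order of ids."""
    position = {obj_id: pos for pos, obj_id in enumerate(ids)}
    return sorted(objects, key=lambda obj: position[obj.id])


# Hypothetical data: rows already natural-sorted, objects in arbitrary DB order.
ordered_rows = [{'id': i} for i in [7, 3, 9, 1, 4]]
ids = page_ids(ordered_rows, page=2, per_page=2)
fetched = [SimpleNamespace(id=1), SimpleNamespace(id=9)]
print([obj.id for obj in restore_order(fetched, ids)])  # [9, 1]
```

Only the ids for one page are sent to the database, so the second query stays small no matter which page is requested.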

If the lookup structure still uses too much memory or takes too long to sort, you'll have to come up with something more advanced. This approach certainly won't scale well to a huge dataset.
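
One possible direction (not from the original answer): keep a precomputed sort key in an extra indexed column, for example a hypothetical question_code_sort CharField filled whenever a question is saved, so the database can order_by it and paginate without loading every row. A minimal sketch of such a key, assuming no digit run exceeds the pad width and that the database collation orders these strings the way Python does:

```python
import re


def padded_sort_key(code, width=10):
    """Zero-pad every digit run so plain string order matches natural order.

    Assumes no digit run in code is longer than `width` characters.
    """
    return re.sub(r'\d+', lambda m: m.group(0).zfill(width), code)


codes = ['Q10', 'Q2', 'Q1a', 'Q1']
print(sorted(codes, key=padded_sort_key))  # ['Q1', 'Q1a', 'Q2', 'Q10']
```

The extra column would need a larger max_length than question_code to hold the padded value.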
