Implementing accent insensitive search on django using sqlite


Problem description

This question is related to my earlier question, "Accent insensitive search django sqlite".

As mentioned in the response there is no direct way to do so. I have come up with a solution, but I am not sure if it is a good one:

Use Case: Assume that the database has a table NewsArticles, with one of its columns being ArticleText. As the name implies, ArticleText contains the text of the news articles, which includes several words with accented characters. Let's say one such word, present in the ArticleText of the article with primary key aid123, is Puerto Aisén. A user can search for either Puerto Aisén or Puerto Aisen, and in both cases should get back the article with PK aid123 with the accented word that was found shown in bold (<b>Puerto Aisén</b>).

Solution: I add one more column, normalizedArticleText, to the table and make it contain the unicodedata.normalize (accent-removed) version of the text. Whenever a search query comes in, I first determine whether it contains accented characters by using s.decode('ascii'), and then search in the corresponding column.
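The approach described above can be sketched roughly as follows. This is only an illustration under assumptions: strip_accents, has_non_ascii and pick_column are hypothetical helper names, the column names are taken from the question, and has_non_ascii stands in for the s.decode('ascii') check, which is a Python 2 idiom.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining marks removed, e.g. 'Aisén' -> 'Aisen'."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


def has_non_ascii(text):
    """True if text cannot be encoded as plain ASCII (e.g. it has accents)."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        return True
    return False


def pick_column(query):
    # Column names follow the question; they are assumptions about the schema.
    return 'ArticleText' if has_non_ascii(query) else 'normalizedArticleText'
```

The value stored in normalizedArticleText would be strip_accents(ArticleText), computed whenever an article is saved.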

Problem: I am duplicating the whole data. Also, there is no way for me to bold the accented keyword if the search query was the non-accented version of the keyword.
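One possible way around the bolding problem is to strip accents from the text while remembering, for every stripped character, where it came from in the original. A match found in the stripped text can then be mapped back to the accented original. The sketch below assumes this idea; fold_with_offsets and bold_matches are hypothetical names, matching is case-sensitive, and HTML escaping of the article text is left out.

```python
import unicodedata


def fold_with_offsets(text):
    """Strip accents and record the original index of each kept character."""
    folded, offsets = [], []
    for index, char in enumerate(text):
        for piece in unicodedata.normalize('NFD', char):
            if not unicodedata.combining(piece):
                folded.append(piece)
                offsets.append(index)
    return ''.join(folded), offsets


def bold_matches(text, query):
    """Wrap accent-insensitive matches of query in <b> tags, keeping accents."""
    folded_text, offsets = fold_with_offsets(text)
    folded_query, _ = fold_with_offsets(query)
    if not folded_query:
        return text
    out = []
    cursor = 0  # how much of the original text has been emitted so far
    start = folded_text.find(folded_query)
    while start != -1:
        end = start + len(folded_query)
        orig_start = offsets[start]
        orig_end = offsets[end - 1] + 1
        # Keep combining marks that directly follow the match (NFD input).
        while orig_end < len(text) and unicodedata.combining(text[orig_end]):
            orig_end += 1
        if orig_start >= cursor:
            out.append(text[cursor:orig_start])
            out.append('<b>' + text[orig_start:orig_end] + '</b>')
            cursor = orig_end
        start = folded_text.find(folded_query, end)
    out.append(text[cursor:])
    return ''.join(out)
```

With this, bold_matches(article_text, 'Puerto Aisen') and bold_matches(article_text, 'Puerto Aisén') both highlight the accented spelling as it appears in the article.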

Any brilliant suggestions? I am using Django with SQLite.

Solution

Try using the unicodedata package. Here's an example for Python 3:

import unicodedata

unicodedata.normalize('NFD', 'répertoire').encode('ascii', 'ignore').decode('ascii')

Or, for Python 2.7:

import unicodedata

unicodedata.normalize('NFD', u'répertoire').encode('ascii', 'ignore')

Either of these will output:

'repertoire'

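Simply replace répertoire with your string. NFD is a Unicode normalization form that decomposes each accented character into a base character followed by combining marks; encoding to ASCII with 'ignore' then drops those marks. You can read more on the different forms of normalization here: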

https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize
https://docs.python.org/2/library/unicodedata.html#unicodedata.normalize
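If duplicating the column is a concern, the same normalization could in principle be applied inside SQLite itself by registering a Python function on the connection. The following is a minimal sketch using the standard sqlite3 module rather than Django's ORM; the table layout, the sample row and the search helper are assumptions made for illustration, and in Django the function would have to be registered on each new database connection.

```python
import sqlite3
import unicodedata


def strip_accents(text):
    """Remove combining marks; pass NULL values through unchanged."""
    if text is None:
        return None
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


conn = sqlite3.connect(':memory:')
# Make strip_accents callable from SQL as a one-argument function.
conn.create_function('strip_accents', 1, strip_accents)

# Hypothetical schema and sample row based on the question.
conn.execute('CREATE TABLE NewsArticles (aid TEXT PRIMARY KEY, ArticleText TEXT)')
conn.execute(
    'INSERT INTO NewsArticles VALUES (?, ?)',
    ('aid123', 'Ferry to Puerto Aisén departs daily'),
)


def search(conn, query):
    """Return primary keys of articles matching query, ignoring accents."""
    pattern = '%' + strip_accents(query) + '%'
    rows = conn.execute(
        'SELECT aid FROM NewsArticles WHERE strip_accents(ArticleText) LIKE ?',
        (pattern,),
    )
    return [row[0] for row in rows]
```

The trade-off is that the function runs on every row at query time, so this avoids the extra column at the cost of a full table scan.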

Good luck!
