Implementing accent insensitive search on django using sqlite
Problem description
This question is related to my earlier question, Accent insensitive search django sqlite.

As mentioned in the response, there is no direct way to do this. I have come up with a solution, but I am not sure whether it is a good one:

Use Case: Assume that the database has a table `NewsArticles`, one of whose columns is `ArticleText`. As the name implies, `ArticleText` contains the text of the news articles, which includes several words with accented characters. Let's say one such word in the `ArticleText` of the article with primary key `aid123` is `Puerto Aisén`. A user can search for either `Puerto Aisén` or `Puerto Aisen` and should get back the article with PK `aid123`, with the matched accented word in bold (`<b>Puerto Aisén</b>`).
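The behaviour the use case asks for (match `Puerto Aisen` or `Puerto Aisén`, then bold the spelling that is actually stored) can be sketched in plain Python 3, independent of the database. This is not code from the thread: `fold` and `highlight_match` are hypothetical helpers. The idea is to fold the text one character at a time while remembering which original character each folded character came from, so that a match found in the folded text can be mapped back onto the original, accented text.

```python
import unicodedata


def fold(text):
    """Accent- and case-insensitive form of text, e.g. 'Aisén' -> 'aisen'."""
    nfd = unicodedata.normalize('NFD', text)
    stripped = ''.join(ch for ch in nfd if not unicodedata.combining(ch))
    return stripped.casefold()


def highlight_match(text, query):
    """Wrap the first accent-insensitive match of query in <b>...</b>.

    Hypothetical helper: the original (possibly accented) spelling is kept
    inside the tag. Returns None when the query does not occur in the text.
    """
    needle = fold(query)
    if not needle:
        return None

    # Fold the text one character at a time, recording for every folded
    # character the index of the original character it came from.
    folded, origin = [], []
    for i, ch in enumerate(text):
        for f in fold(ch):
            folded.append(f)
            origin.append(i)

    pos = ''.join(folded).find(needle)
    if pos == -1:
        return None

    start = origin[pos]
    end = origin[pos + len(needle) - 1] + 1
    # If the stored text is already decomposed ('e' + U+0301), keep the
    # trailing combining marks inside the tag as well.
    while end < len(text) and unicodedata.combining(text[end]):
        end += 1
    return text[:start] + '<b>' + text[start:end] + '</b>' + text[end:]
```

For example, `highlight_match('Floods hit Puerto Aisén', 'puerto aisen')` returns `'Floods hit <b>Puerto Aisén</b>'`. A real page would also need to HTML-escape the article text before inserting the tags.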
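Solution: I add one more column, `normalizedArticleText`, to the table and make it contain the accent-removed version of the text produced with `unicodedata.normalize`. Whenever a search query comes in, I first determine whether the query contains accented characters by using `s.decode('ascii')` (which fails on non-ASCII input), and then search in the corresponding column.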
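The shadow-column idea from the Solution paragraph can be sketched with Python's built-in `sqlite3` module, outside Django. The table layout (an `id` primary key next to `ArticleText` and `normalizedArticleText`), the sample text, and the `strip_accents` and `search` helpers are assumptions for illustration. One deliberate difference from the description: the query is normalized too, so the normalized column is searched every time instead of first checking whether the query contains accents.

```python
import sqlite3
import unicodedata


def strip_accents(text):
    """Hypothetical helper: accent-removed copy of text, 'Aisén' -> 'Aisen'."""
    nfd = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in nfd if not unicodedata.combining(ch))


conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE NewsArticles ('
    ' id TEXT PRIMARY KEY,'
    ' ArticleText TEXT,'
    ' normalizedArticleText TEXT)'
)

text = 'Floods were reported in Puerto Aisén on Monday.'
# The shadow column is filled at write time with the accent-removed text.
conn.execute(
    'INSERT INTO NewsArticles VALUES (?, ?, ?)',
    ('aid123', text, strip_accents(text)),
)


def search(query):
    """Primary keys of articles matching query, with or without accents."""
    # Normalizing the query as well lets every search use the same column.
    pattern = '%' + strip_accents(query) + '%'
    rows = conn.execute(
        'SELECT id FROM NewsArticles WHERE normalizedArticleText LIKE ?',
        (pattern,),
    )
    return [aid for (aid,) in rows]
```

Both `search('Puerto Aisén')` and `search('puerto aisen')` find `aid123` here. Because the wildcards are wrapped around the raw query, a `%` or `_` typed by the user also acts as a wildcard; a real implementation would escape them (for example with `LIKE ... ESCAPE`). SQLite's built-in `LIKE` is case-insensitive only for ASCII characters, which is usually enough once the accents are gone.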
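Problem: I am duplicating the whole data. Also, there is no way for me to bold the accented keyword if the search query was the non-accented version of the keyword.

Any brilliant suggestions? I am using django with sqlite.

Answer

Try using the `unicodedata` package. Here's an example for Python 3:

```python
import unicodedata
# encode() returns bytes in Python 3; decode() turns the result back into a str
unicodedata.normalize('NFD', 'répertoire').encode('ascii', 'ignore').decode('ascii')
```

Or, for Python 2.7:

```python
# -*- coding: utf-8 -*-
import unicodedata
unicodedata.normalize('NFD', u'répertoire').encode('ascii', 'ignore')
```

Evaluated in an interactive session, either of these will output:

```
'repertoire'
```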
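Simply replace `répertoire` with your string. `NFD` is one of the Unicode normalization forms (canonical decomposition). You can read more about the different normalization forms here:

https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize
https://docs.python.org/2/library/unicodedata.html#unicodedata.normalize

Good luck!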
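To see what `NFD` actually does, and why the `encode('ascii', 'ignore')` step can be lossy, here is a small Python 3 sketch; `strip_accents` is a hypothetical helper, not part of `unicodedata`. NFD splits a precomposed letter into its base letter plus combining marks. Removing only the marks (the characters for which `unicodedata.combining()` is non-zero) keeps letters such as `ø` or `ß`, which have no canonical decomposition and would otherwise be dropped entirely.

```python
import unicodedata

word = 'répertoire'
decomposed = unicodedata.normalize('NFD', word)

# NFD splits the precomposed 'é' (U+00E9) into 'e' + U+0301 COMBINING ACUTE ACCENT,
# so the decomposed string is one code point longer.
print(len(word), len(decomposed))               # 10 11
print([hex(ord(ch)) for ch in decomposed[:3]])  # ['0x72', '0x65', '0x301']


def strip_accents(text):
    """Hypothetical helper: remove combining marks, keep every other character."""
    nfd = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in nfd if not unicodedata.combining(ch))


sample = 'Aisén, Ærø, Straße'
# encode('ascii', 'ignore') drops every non-ASCII character, including letters
# such as 'Æ', 'ø' and 'ß' that have no accent to strip.
print(unicodedata.normalize('NFD', sample).encode('ascii', 'ignore').decode('ascii'))  # Aisen, r, Strae
print(strip_accents(sample))  # Aisen, Ærø, Straße
```

For accent-insensitive search on mostly Latin-script text, filtering combining marks is usually the safer default; the ASCII round trip is only appropriate when the output genuinely has to be pure ASCII.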
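Another option, not proposed in the thread, targets the duplicated-data concern: register the normalization as an SQLite user-defined function with `sqlite3.Connection.create_function` and apply it at query time instead of storing a second copy of the text. The SQL function name `unaccent`, the table layout and the `search` helper are assumptions for this sketch, which uses plain `sqlite3` rather than Django.

```python
import sqlite3
import unicodedata


def strip_accents(text):
    """Hypothetical helper: 'Aisén' -> 'Aisen'; SQL NULL (None) passes through."""
    if text is None:
        return None
    nfd = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in nfd if not unicodedata.combining(ch))


conn = sqlite3.connect(':memory:')
# Make strip_accents callable from SQL as unaccent(x); no extra column is stored.
conn.create_function('unaccent', 1, strip_accents)

conn.execute('CREATE TABLE NewsArticles (id TEXT PRIMARY KEY, ArticleText TEXT)')
conn.execute(
    'INSERT INTO NewsArticles VALUES (?, ?)',
    ('aid123', 'Floods were reported in Puerto Aisén on Monday.'),
)


def search(query):
    """Primary keys of articles whose accent-stripped text contains the query."""
    rows = conn.execute(
        'SELECT id FROM NewsArticles WHERE unaccent(ArticleText) LIKE ?',
        ('%' + strip_accents(query) + '%',),
    )
    return [row[0] for row in rows]
```

The trade-off is speed: an ordinary index on `ArticleText` cannot serve `unaccent(ArticleText) LIKE ...`, so every search normalizes every row. Inside Django the function would have to be registered on each new database connection, for example from a `connection_created` signal handler; treat that wiring as an assumption to verify against your Django version.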