Python unicode escape for RethinkDB match (regex) query

Problem description

I am trying to run a RethinkDB match query with an escaped, user-provided Unicode search parameter:

import re
from rethinkdb import RethinkDB

r = RethinkDB()

search_value = u"\u05e5"  # provided by user via flask
search_value_escaped = re.escape(search_value)  # results in u'\\\u05e5',
    # which, when encoded with "utf-8", gives "\ץ" as expected.

conn = r.connect(...)  # connect through the RethinkDB() instance created above

results_cursor_a = r.db(...).table(...).order_by(index="id").filter(
    lambda doc: doc.coerce_to("string").match(search_value)
).run(conn)  # search_value works fine

results_cursor_b = r.db(...).table(...).order_by(index="id").filter(
    lambda doc: doc.coerce_to("string").match(search_value_escaped)
).run(conn)  # search_value_escaped spits an error

The error for search_value_escaped is the following:

ReqlQueryLogicError: Error in regexp `\ץ` (portion `\ץ`): invalid escape sequence: \ץ in:
r.db(...).table(...).order_by(index="id").filter(lambda var_1: var_1.coerce_to('string').match(u'\\\u05e5m'))
                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^         

I tried encoding with "utf-8" before and after re.escape(), but the query still fails, only with different errors. What am I missing? Is it something in my code or some kind of bug?

.coerce_to('string') converts the document to a "utf-8" encoded string. RethinkDB also converts the query to "utf-8" and then matches the two, which is why the first query works even though it looks like a unicode match inside a string.
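
To see where the stray backslash comes from, the escaped value can be inspected without touching the database. The following is only a minimal sketch: `describe_escape` is a hypothetical helper name, not part of `re` or of the RethinkDB driver. Note that `re.escape` changed in Python 3.7 to escape only characters that can have special meaning in a regular expression, so on older interpreters (including Python 2) the non-ASCII character gets a backslash prefix, while on 3.7 and later it should come back unchanged.

```python
import re


def describe_escape(value):
    """Return (escaped, changed) for a user-supplied search value.

    Hypothetical helper, for illustration only.
    """
    escaped = re.escape(value)
    # changed is True when re.escape altered the string in any way
    return escaped, escaped != value


search_value = u"\u05e5"
escaped, changed = describe_escape(search_value)
# The output depends on the Python version, so no expected value is shown.
print(repr(escaped), changed)
```

If `changed` is true for a purely non-ASCII input, the interpreter is escaping characters that the server-side regex engine then rejects, which matches the error shown above.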

Recommended answer

From the looks of it, RethinkDB rejects escaped unicode characters, so I wrote a simple workaround: a custom escape that still relies on re.escape() rather than my own character-replacement logic (for fear that I might miss a character and create a security issue).

import re

def no_unicode_escape(u):
    escaped_list = []

    for i in u:
        if ord(i) < 128:
            escaped_list.append(re.escape(i))
        else:
            escaped_list.append(i)

    rv = "".join(escaped_list)
    return rv

Or as a one-liner:

import re

def no_unicode_escape(u):
    return "".join(re.escape(i) if ord(i) < 128 else i for i in u)

This yields the required result of escaping "dangerous" characters and works with RethinkDB as I wanted.
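
As a quick check of the workaround, the sketch below runs the one-liner on a made-up input that mixes regex metacharacters with the Hebrew character from the question. The sample string is an assumption chosen for illustration, and the final check uses Python's own `re` module, not RethinkDB's regex engine, so it only confirms that the escaped pattern still matches the original text literally.

```python
import re


def no_unicode_escape(u):
    # Escape ASCII characters with re.escape; pass non-ASCII through as-is.
    return u"".join(re.escape(i) if ord(i) < 128 else i for i in u)


sample = u"a.b*\u05e5"  # hypothetical user input
escaped = no_unicode_escape(sample)
print(repr(escaped))

# Sanity check with Python's re: the escaped pattern matches the raw input.
assert re.search(escaped, sample) is not None
```

The escaped value can then be passed to .match(...) in place of search_value_escaped from the question.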
