Is there an implementation of a search term sanitizer for Elasticsearch in Scala?
Question
I'm looking for a method that sanitizes search terms passed to Elasticsearch, i.e. escapes all the special characters. Something like what is described for Ruby in this answer. Is there such a thing for Scala?
Recommended answer
I've translated the Ruby solution found in this answer to Scala:
package util

import java.util.regex.Pattern

trait ElasticSearchSanitizer {

  /** Sanitizes special characters and set operators in Elasticsearch search terms. */
  def sanitize(term: String): String = (
    escapeSpecialCharacters _ andThen
    escapeSetOperators andThen
    collapseWhiteSpaces andThen
    escapeOddQuote
  )(term)

  // Prefixes every special query character with a backslash.
  private def escapeSpecialCharacters(term: String): String = {
    val escapedCharacters = Pattern.quote("""\/+-&|!(){}[]^~*?:""")
    term.replaceAll(s"([$escapedCharacters])", "\\\\$1")
  }

  // Escapes the upper-case operators AND, OR and NOT when they appear as whole words.
  private def escapeSetOperators(term: String): String = {
    val operators = Set("AND", "OR", "NOT")
    operators.foldLeft(term) { case (accTerm, op) =>
      val escapedOp = escapeEachCharacter(op)
      accTerm.replaceAll(s"""\\b($op)\\b""", escapedOp)
    }
  }

  // Builds a regex replacement string that puts a literal backslash before each character.
  private def escapeEachCharacter(op: String): String =
    op.toCharArray.map(ch => s"""\\\\$ch""").mkString

  private def collapseWhiteSpaces(term: String): String = term.replaceAll("""\s+""", " ")

  // With an odd number of double quotes, escapes the last one so the quotes stay balanced.
  private def escapeOddQuote(term: String): String = {
    if (term.count(_ == '"') % 2 == 1) term.replaceAll("""(.*)"(.*)""", """$1\\"$2""") else term
  }
}
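The order of the steps in sanitize matters: special characters are escaped before the set operators, so the backslashes added for AND/OR/NOT are not escaped a second time. The following is only a minimal, standalone sketch of that effect. The object CompositionOrderDemo and its helpers are hypothetical names that are not part of the answer, and the helpers are cut down to a single special character (the backslash) and a single operator (AND).

```scala
object CompositionOrderDemo {
  // Simplified stand-in for escapeSpecialCharacters: doubles every backslash.
  def escapeBackslash(s: String): String = s.replace("\\", "\\\\")

  // Simplified stand-in for escapeSetOperators: handles the whole word AND only.
  // The replacement string "\\\\A\\\\N\\\\D" is read by replaceAll as \A\N\D.
  def escapeAnd(s: String): String = s.replaceAll("\\bAND\\b", "\\\\A\\\\N\\\\D")

  // Same order as in sanitize: special characters first, operators second.
  def specialsFirst(term: String): String =
    ((escapeBackslash _) andThen (escapeAnd _))(term)

  // Reversed order: the backslashes added for AND get escaped again.
  def operatorsFirst(term: String): String =
    ((escapeAnd _) andThen (escapeBackslash _))(term)

  def main(args: Array[String]): Unit = {
    println(specialsFirst("gin AND tonic"))  // gin \A\N\D tonic
    println(operatorsFirst("gin AND tonic")) // gin \\A\\N\\D tonic
  }
}
```

With the reversed order each operator character ends up behind two backslashes, which the query parser would presumably read as a literal backslash followed by an unescaped letter.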
Here are the tests:
package util

import org.specs2.matcher.Matchers
import org.specs2.mutable.Specification

class ElasticSearchSanitizerSpec extends Specification with Matchers {

  "sanitize" should {

    object S extends ElasticSearchSanitizer

    "escape special characters" in {
      S.sanitize("""back\slash""") mustEqual """back\\slash"""
      S.sanitize("""sl/ash""") mustEqual """sl\/ash"""
      S.sanitize("""pl+us""") mustEqual """pl\+us"""
      S.sanitize("""mi-nus""") mustEqual """mi\-nus"""
      S.sanitize("""amper&sand""") mustEqual """amper\&sand"""
      S.sanitize("""pi|pe""") mustEqual """pi\|pe"""
      S.sanitize("""ba!ng""") mustEqual """ba\!ng"""
      S.sanitize("""open(parenthesis""") mustEqual """open\(parenthesis"""
      S.sanitize("""close)parenthesis""") mustEqual """close\)parenthesis"""
      S.sanitize("""open{curly""") mustEqual """open\{curly"""
      S.sanitize("""close}curly""") mustEqual """close\}curly"""
      S.sanitize("""open[bracket""") mustEqual """open\[bracket"""
      S.sanitize("""close]bracket""") mustEqual """close\]bracket"""
      S.sanitize("""circum^flex""") mustEqual """circum\^flex"""
      S.sanitize("""til~de""") mustEqual """til\~de"""
      S.sanitize("""aste*risk""") mustEqual """aste\*risk"""
      S.sanitize("""ques?tion""") mustEqual """ques\?tion"""
      S.sanitize("""co:lon""") mustEqual """co\:lon"""
    }

    "escape set operators" in {
      S.sanitize("gin AND tonic") mustEqual """gin \A\N\D tonic"""
      S.sanitize("now OR never") mustEqual """now \O\R never"""
      S.sanitize("NOT never") mustEqual """\N\O\T never"""
    }

    "not escape set operators if part of words" in {
      S.sanitize("MANDATE") mustEqual "MANDATE"
      S.sanitize("NOTORIOUS") mustEqual "NOTORIOUS"
    }

    "not escape set operators if lowercase" in {
      S.sanitize("and or not") mustEqual "and or not"
    }

    "collapse excess whitespaces" in {
      S.sanitize("Y u no use single \t space??") mustEqual """Y u no use single space\?\?"""
    }

    "escape last quote if number of quotes is odd" in {
      S.sanitize("""Che "Guevarra" wears me" on his t shirt""") mustEqual """Che "Guevarra" wears me\" on his t shirt"""
    }

    "not escape any quotes if number of quotes even" in {
      S.sanitize("""Using these "lasers", we punch a hole in the "ozone layer"... """) mustEqual
        """Using these "lasers", we punch a hole in the "ozone layer"... """
    }
  }
}
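A sanitized term still contains backslashes and possibly double quotes, so it needs one more level of escaping when it is embedded in a JSON request body. The sketch below assumes the term is sent in a query_string query. The names QueryBodyDemo, jsonString and queryStringBody are hypothetical, and the hand-rolled JSON escaping covers only backslashes and double quotes; a real project would more likely rely on a JSON library or an Elasticsearch client.

```scala
object QueryBodyDemo {
  // Minimal JSON string escaping (assumption: only '\' and '"' need handling here).
  def jsonString(s: String): String =
    "\"" + s.flatMap {
      case '"'  => "\\\""   // a double quote becomes \"
      case '\\' => "\\\\"   // a backslash becomes \\
      case c    => c.toString
    } + "\""

  // Wraps an already sanitized term in a query_string query body.
  def queryStringBody(sanitizedTerm: String): String =
    s"""{"query":{"query_string":{"query":${jsonString(sanitizedTerm)}}}}"""

  def main(args: Array[String]): Unit = {
    // pl\+us is what sanitize returns for pl+us according to the tests.
    println(queryStringBody("""pl\+us"""))
    // {"query":{"query_string":{"query":"pl\\+us"}}}
  }
}
```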