Exclude HTML tags and some Unicode characters while searching in SQLite

Question


UPDATE # 4

I've successfully run the firstchar example, but now the problem is using regex. Even after including the header file, the regex operator is not recognized. Any clue how this can be resolved?

UPDATE # 2

I've compiled the sqlite3 library in my project. I am now looking for someone to help me write a function for my regex, attach it to the database, and call it from a query.

UPDATE # 3

I've written some code based on this example. Here it is:

extern "C"
void
Java_com_kfmwa916_testapp_DatabaseHandler_createFunction() {
    sqlite3 *db;
    //Open database
    sqlite3_open("MyDBName.db", &db);

    //Attach function to database
    sqlite3_create_function(db, "firstchar", 1, SQLITE_UTF8, NULL, &firstchar, NULL, NULL);
}

And the firstchar function is:

static void firstchar(sqlite3_context *context, int argc, sqlite3_value **argv) {
    if (argc == 1) {
        char *text = (char *) sqlite3_value_text(argv[0]);
        if (text && text[0]) {
            char result[2];
            result[0] = text[0]; result[1] = '\0';
            sqlite3_result_text(context, result, -1, SQLITE_TRANSIENT);
            return;
        }
    }
    sqlite3_result_null(context);
}

Then I used it in my query like this:

SELECT firstchar(text) FROM dummy

But it gives this error:

no such function firstchar()

Any help is highly appreciated.

Original Question

I am working with Arabic-language text stored as Unicode in SQLite. I want to implement a search, but there's a problem.

Let's say the text is

<html>
  <head>
    <style>
      @font-face {
        font-family: "Al_Mushaf";
        src: url('fonts/al_mushaf.ttf');
      }
      @font-face {
        font-family: "Jameel Noori Nastaleeq";
        src: url('fonts/jameel_noori.ttf');
      }
    </style>
  </head>
  
  <body>
    <h3 style='font-family:"Al_Mushaf"'>
            صحابہ کرام کا انبیائے کرام کی سنّت پر عمل
            میٹھے میٹھے اسلامی بھائیو!صدائے مدینہ لگانا انبیائے کِرام عَلَیْہِمُ السَّلَام کی اس قَدْر پیاری سنّت ہے کہ صحابۂ کِرام عَلَیْہِمُ الرِّضْوَان نے بھی اسے خُوب اپنایا اور وہ بھی حضرت سَیِّدُنا داؤد عَلَیْہِ السَّلَام کی طرح اپنے گھر والوں کو جگایا کرتے جیسا کہ حضرت سَیِّدُنا عبد اللہ بن عُمَر رَضِیَاللّٰہُ تَعَالٰی عَنْہُما فرماتے ہیں کہ میرے والِدِ مُحْتَرَم اَمِیرُ الْمُوْمِنِین حضرت سَیِّدُنا عُمَر فَارُوقِ اَعْظَم رَضِیَاللّٰہُ تَعَالٰی عَنْہ رات میں جس قَدْر ربّ تعالیٰ چاہتا،نَماز پڑھتے رہتے،یہاں تک کہ جب رات کا آخری وَقْت ہوتا تو اپنے گھر والوں کو بھی نَماز کے لیے جگا دیتے اور ان سے فرماتے: اَلصَّلٰوة یعنی نماز۔ پھر یہ آیت مُبارَکہ تِلاوَت فرماتے:
            وَاۡمُرْ اَہۡلَکَ بِالصَّلٰوۃِ وَ اصْطَبِرْ عَلَیۡہَا ؕ لَا نَسْـَٔلُکَ رِزْقًا ؕ نَحْنُ نَرْزُقُکَ ؕ وَالْعٰقِبَۃُ لِلتَّقْوٰی (پ۱۶،طٰهٰ:۱۳۲)
    </h3>
  </body>
  
</html>

This text is stored in an SQLite database. If I search for html, it returns a result, and if I search for مبارکہ it returns nothing, because in the actual text the word is مُبارَکہ (with these extra Unicode characters).

I want to ignore all HTML tags and these extra Unicode characters while searching, so that html does not return a result while مبارکہ does.
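One way to handle the extra characters is to remove them from the stored text (and from the search term) before comparing. Below is a minimal sketch, assuming the marks to ignore are the Arabic diacritics in the range U+064B to U+065F plus U+0670; that range, and the helper names is_arabic_mark and strip_arabic_marks, are assumptions for illustration and may need extending for your texts.

```cpp
#include <string>

// Assumption: the "extra" characters are U+064B..U+065F (harakat such as
// fatha, damma, kasra, shadda, sukun) and U+0670 (superscript alef).
static bool is_arabic_mark(unsigned cp) {
    return (cp >= 0x064B && cp <= 0x065F) || cp == 0x0670;
}

// Returns a copy of a UTF-8 string with those marks removed.
// All of them are two-byte UTF-8 sequences, so only two-byte sequences
// are decoded; every other byte is copied through unchanged.
std::string strip_arabic_marks(const std::string &utf8) {
    std::string out;
    out.reserve(utf8.size());
    for (std::size_t i = 0; i < utf8.size();) {
        unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        // Two-byte sequence: 110xxxxx 10xxxxxx
        if ((b0 & 0xE0) == 0xC0 && i + 1 < utf8.size()) {
            unsigned char b1 = static_cast<unsigned char>(utf8[i + 1]);
            unsigned cp = ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
            if (!is_arabic_mark(cp)) {
                out.append(utf8, i, 2);
            }
            i += 2;
        } else {
            out.push_back(utf8[i]);
            ++i;
        }
    }
    return out;
}
```

Applying the same function to both the stored text and the search term keeps the comparison symmetric, so a term typed with or without diacritics matches either way.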

What I have found so far:

  1. Add an extra column, store the stripped text in it, and search that column (I can't do this because there are thousands of books and the copies would take a lot of space)

  2. A user-defined function (UDF) called from SQL (I couldn't find any suitable example or tutorial for implementing one)

  3. Using REGEXP (I haven't figured out how to do this yet; I only know that I have to implement it myself)

  4. An SQL query using the LIKE and GLOB operators with wildcard characters.

I've been stuck for two days and couldn't find a working solution. Option #4 is preferable, but any working solution will do.

At the same time, I have to keep the application memory-efficient and the search fast.

Any help is highly appreciated.

UPDATE

I've written a regex that ignores HTML tags and the text between style tags, and used it in a query with REGEXP.

Now there are two problems:

  1. I want to ignore these extra characters too. I know their Unicode code points; I just need to know how to add them to the regex. This is my regex:

    (?![^<]*>)(?!<style[^>]*?>)(TEXT)(?![^<]*?<\/style>)

  2. I've used it in a query like this:

    SELECT text FROM dummy WHERE text REGEXP <myregex>

    It doesn't give an error, but it doesn't return the desired result either.
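For context, SQLite treats X REGEXP Y as a call to a user function regexp(Y, X), which has to be registered by the application; its result is only used to decide whether a row passes the WHERE clause. The sketch below shows the kind of yes/no test such a function would compute, using std::regex and the first lookahead from the regex above. The helper name matches_outside_tags is hypothetical, and the sketch assumes the search term contains no regex metacharacters.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `term` occurs somewhere in `text`
// outside of an HTML tag.
// Assumption: `term` contains no regex metacharacters.
bool matches_outside_tags(const std::string &text, const std::string &term) {
    // (?![^<]*>) rejects a position from which a '>' is reached before
    // any '<', which is the case inside a tag such as <html> or </body>.
    std::regex re("(?![^<]*>)" + term);
    return std::regex_search(text, re);
}
```

With the text `<html><body>hello</body></html>`, a search for hello passes this test while a search for html or body does not. On very long documents this lookahead can be slow with std::regex, so stripping the markup first and then using a plain comparison may be the more practical route.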

Solution

Answer to Original Question

NOTE: I have only recently learned this, so I may be wrong in many places. Please correct my mistakes.

There are two solutions

  1. Use REGEXP Operator with SQLite Query
  2. Implement your own User Defined Function using NDK

The problem with the first one is that it returns only true or false, but I need the data. And the problem with both methods is that you have to use a C/C++ library in your Android project. So I decided to create my own user-defined function.

You can find many tutorials on how to use the NDK in your project, but you won't find a complete example of using third-party libraries with it.

After a lot of searching and studying, I combined pieces from many different places and was able to complete my task. Below are the steps. I also intend to write a complete step-by-step tutorial.

Getting things ready

  1. First you need the libraries that you want to use in your project. In my case, I need the sqlite3 amalgamation, which can be downloaded from here. Extract the files into the cpp folder of your project.
  2. You are probably familiar with the CMakeLists.txt file from when you added the NDK to your project. Now add these libraries to CMakeLists.txt. To do that, go to your Project pane, open External Build Files, and you'll find the file there. Open it and edit it as follows:

# Sets the minimum version of CMake required to build the native
# library. You should either keep the default value or only pass a
# value of 3.4.0 or lower.

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
cmake_minimum_required(VERSION 3.4.1)

# Creates and names a library, sets it as either STATIC
# or SHARED, and provides the relative paths to its source code.
# You can define multiple libraries, and CMake builds it for you.
# Gradle automatically packages shared libraries with your APK.

add_library( # Sets the name of the library.
             native-lib

             # Sets the library as a shared library.
             SHARED

             # Provides a relative path to your source file(s).
             # Associated headers in the same location as their source
             # file are automatically included.
             src/main/cpp/native-lib.cpp )

include_directories(${CMAKE_SOURCE_DIR}/src)
add_library(sqlite3 STATIC src/main/cpp/sqlite3.c src/main/cpp/sqlite3.h src/main/cpp/sqlite3ext.h)
add_executable(sqlite src/main/cpp/sqlite3.c src/main/cpp/sqlite3.h src/main/cpp/sqlite3ext.h)
set_target_properties(sqlite PROPERTIES OUTPUT_NAME sqlite3)

# Searches for a specified prebuilt library and stores the path as a
# variable. Because system libraries are included in the search path by
# default, you only need to specify the name of the public NDK library
# you want to add. CMake verifies that the library exists before
# completing its build.


find_library( # Sets the name of the path variable.
              log-lib

              # Specifies the name of the NDK library that
              # you want CMake to locate.
              log )

# Specifies libraries CMake should link to your target library. You
# can link multiple libraries, such as libraries you define in the
# build script, prebuilt third-party libraries, or system libraries.

target_link_libraries( # Specifies the target library.
                       native-lib

                       # Links the target library to the log library
                       # included in the NDK.
                       sqlite3
                       log )

You first have to add each library with add_library and then link it, with target_link_libraries, to the native-lib target that is built from your own source file (named native-lib.cpp by default).

  3. Build your project and you are ready to go.

Implementing Function

Now comes the main part. Open native-lib.cpp and include the required headers (the code below uses jni.h, android/log.h, sqlite3.h, string, and regex). Here is what you have to do:

  1. Write the function that you will call from YourActivity.java. You'll recognize the naming pattern once you see your file. In my case it is Java_com_kfmwa916_testapp_SearchResult_createFunction(JNIEnv * env, jobject object, jstring search), where

Java is the fixed prefix

com_kfmwa916_testapp is your package name, with dots replaced by underscores

SearchResult is your Java class

createFunction is the name of the function.
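The naming pattern described above can be sketched as a small function that builds the expected native symbol from the package, class, and method names. The helper names append_mangled and jni_symbol_name are hypothetical; the sketch covers only the two most common JNI escapes ('.' becomes '_' and a literal '_' becomes '_1') and ignores overloaded methods and non-ASCII names.

```cpp
#include <string>

// Appends `part` to `out` using the basic JNI name escapes:
// '.' (package separator) -> "_" and '_' -> "_1".
// Other escapes defined by JNI (for ';', '[', non-ASCII) are omitted here.
static void append_mangled(std::string &out, const std::string &part) {
    for (char c : part) {
        if (c == '.') {
            out += '_';
        } else if (c == '_') {
            out += "_1";
        } else {
            out += c;
        }
    }
}

// Builds the symbol name the JVM looks up for a native method.
std::string jni_symbol_name(const std::string &package,
                            const std::string &cls,
                            const std::string &method) {
    std::string name = "Java_";
    append_mangled(name, package);
    name += '_';
    append_mangled(name, cls);
    name += '_';
    append_mangled(name, method);
    return name;
}
```

For the package com.kfmwa916.testapp, class SearchResult, and method createFunction, this yields the function name used in this answer. If the native function's name does not match what the JVM expects, the call fails at runtime with an UnsatisfiedLinkError.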

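  1. Create your function. In my case, I have to apply a certain regex to the text being searched. Here is mine:

    static void strip_text(sqlite3_context *context, int argc, sqlite3_value **argv) {
        if (argc == 1) {
            __android_log_print(ANDROID_LOG_VERBOSE, "TAG", "inside strip_text");
            // sqlite3_value_text() returns NULL for an SQL NULL, so check
            // the pointer before building a std::string from it.
            const char *raw = reinterpret_cast<const char *>(sqlite3_value_text(argv[0]));
            if (raw != NULL && raw[0] != '\0') {
                // Compiled once and reused for every row.
                static const std::regex regex_head("YOUR REGEX");
                std::string text = std::regex_replace(std::string(raw), regex_head, "");

                // SQLITE_TRANSIENT makes SQLite copy the buffer before we return.
                sqlite3_result_text(context, text.c_str(), -1, SQLITE_TRANSIENT);
                __android_log_print(ANDROID_LOG_VERBOSE, "STRIPPED TEXT", "%s", text.c_str());
                return;
            }
        }
        // Wrong argument count, NULL, or empty input.
        sqlite3_result_null(context);
    }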
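As an illustration of what could replace the "YOUR REGEX" placeholder, here is a minimal sketch of a stripping step that first drops whole style blocks and then any remaining tags. The function name strip_html and both patterns are assumptions for illustration, not the patterns used in the original project, and they do not handle a '>' inside an attribute value.

```cpp
#include <regex>
#include <string>

// Removes <style>...</style> blocks (including their content) and then
// every remaining tag, leaving only the visible text.
// Assumption: tags are well formed and attribute values contain no '>'.
std::string strip_html(const std::string &html) {
    static const std::regex style_block("<style[^>]*>[\\s\\S]*?</style>",
                                        std::regex::icase);
    static const std::regex any_tag("<[^>]*>");
    std::string text = std::regex_replace(html, style_block, "");
    return std::regex_replace(text, any_tag, "");
}
```

The style block has to go first; otherwise only the tags would be removed and the CSS rules between them would remain in the searchable text.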
    

  2. Create an sqlite3 instance, open the database, attach this function to the database, and use it in your query. Here is a code snippet:

    extern "C"
    JNIEXPORT void JNICALL
    Java_com_kfmwa916_testapp_SearchResult_createFunction(JNIEnv *env, jobject object, jstring search) {
        // Copy the search term before releasing the JNI buffer;
        // reading search_term after the release is undefined behavior.
        const char *search_term = env->GetStringUTFChars(search, 0);
        if (search_term == NULL) return;
        std::string q(search_term);
        env->ReleaseStringUTFChars(search, search_term);

        // %q doubles any single quote in q so the term stays inside the SQL string literal.
        char *query = sqlite3_mprintf(
                "SELECT text FROM dummy WHERE LIKE('%%%q%%', strip_text(text))=1", q.c_str());
        if (query == NULL) return;
        __android_log_print(ANDROID_LOG_VERBOSE, "TAG", "%s", query);

        sqlite3 *db = NULL;
        //Open database
        __android_log_print(ANDROID_LOG_VERBOSE, "TAG", "Opening database");
        int rc = sqlite3_open("/data/data/com.kfmwa916.testapp/databases/MyDBName.db", &db);

        //Attach function to database (only this connection can see it)
        if (rc == SQLITE_OK) {
            __android_log_print(ANDROID_LOG_VERBOSE, "TAG", "Attaching function");
            rc = sqlite3_create_function(db, "strip_text", 1, SQLITE_UTF8, NULL, &strip_text, NULL, NULL);
        }
        if (rc == SQLITE_OK) {
            __android_log_print(ANDROID_LOG_VERBOSE, "TAG", "Executing query");
            rc = sqlite3_exec(db, query, callback, NULL, NULL);
        }
        if (rc != SQLITE_OK) {
            __android_log_print(ANDROID_LOG_ERROR, "TAG", "SQLite error: %s", sqlite3_errmsg(db));
        }
        sqlite3_close(db);
        sqlite3_free(query);
    }
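Because the search term ends up inside a LIKE pattern, characters such as %, _ and ' in the term change the meaning of the query. A minimal sketch of a helper that produces a safely quoted "contains" pattern follows; the name like_contains_literal and the choice of backslash as the escape character are assumptions for illustration.

```cpp
#include <string>

// Builds an SQL fragment such as '%term%' ESCAPE '\' in which
// - a single quote is doubled (SQL string-literal escaping), and
// - the LIKE wildcards % and _ (and the escape character itself)
//   are prefixed with a backslash so they match literally.
std::string like_contains_literal(const std::string &term) {
    std::string out = "'%";
    for (char c : term) {
        if (c == '\'') {
            out += "''";
        } else if (c == '%' || c == '_' || c == '\\') {
            out += '\\';
            out += c;
        } else {
            out += c;
        }
    }
    out += "%' ESCAPE '\\'";
    return out;
}
```

It could be used as "SELECT text FROM dummy WHERE strip_text(text) LIKE " + like_contains_literal(q). Binding the term as a parameter with sqlite3_prepare_v2 and sqlite3_bind_text is generally the more robust choice, since it avoids building SQL from user input at all.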
    

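  3. Implement the callback function that processes the result; sqlite3_exec calls it once per result row. Define it above createFunction in the file so that it is declared before use. It should look like this:

    static int callback(void *NotUsed, int argc, char **argv, char **azColName) {
        __android_log_print(ANDROID_LOG_VERBOSE, "TAG", "FOUND");
        // Log every column of the current row as "name = value".
        for (int i = 0; i < argc; ++i) {
            __android_log_print(ANDROID_LOG_VERBOSE, "TAG", "%s = %s\n", azColName[i], argv[i] ? argv[i] : "NULL");
        }
        // Returning non-zero would make sqlite3_exec stop.
        return 0;
    }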
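Logging is enough for a first test, but to get the data back you would normally collect the rows. Here is a minimal sketch of a callback with the same signature that stores the first column of each row in a std::vector; the name collect_first_column is hypothetical, and the vector is passed through the user-data pointer.

```cpp
#include <string>
#include <vector>

// Callback with the signature sqlite3_exec expects. `user` is whatever
// pointer was given as the fourth argument of sqlite3_exec; here it is
// assumed to point at a std::vector<std::string>.
static int collect_first_column(void *user, int argc, char **argv,
                                char ** /*col_names*/) {
    auto *rows = static_cast<std::vector<std::string> *>(user);
    if (argc > 0) {
        // A NULL column value arrives as a null pointer.
        rows->push_back(argv[0] ? argv[0] : "");
    }
    return 0;  // keep iterating
}
```

Declare a std::vector<std::string> rows before the query and pass &rows as the fourth argument of sqlite3_exec, with collect_first_column as the third. After the call returns, rows holds one entry per matching record and can be converted to a Java array to return through JNI.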
    

  4. Finally, go to your Java class (in my case, SearchResult), load the library, and declare the native function.

    static {
        System.loadLibrary("native-lib");
    }
    
    public native void createFunction(String search);
    

Then call it wherever you need it, for example from a button's onClick handler: createFunction(searchterm).

This post is open to correction and modification.
