Exclude HTML tags and some UNICODE Characters while searching in SQLite
Problem description
UPDATE # 4
I've successfully run the firstchar example, but now the problem is with using regex. Even after including the header file, it does not recognize the regex operator. Any clue how this can be resolved?
UPDATE # 2
I've compiled the sqlite3 library into my project. I am now looking for someone to help me write a function for my regex, attach it to the database and call it from a query.
UPDATE # 3
I've written some code based on this example. Here it is:
extern "C"
void
Java_com_kfmwa916_testapp_DatabaseHandler_createFunction() {
sqlite3 *db;
//Open database
sqlite3_open("MyDBName.db", &db);
//Attach function to database
sqlite3_create_function(db, "firstchar", 1, SQLITE_UTF8, NULL, &firstchar, NULL, NULL);
}
And the firstchar function is:
static void firstchar(sqlite3_context *context, int argc, sqlite3_value **argv) {
if (argc == 1) {
char *text = (char *) sqlite3_value_text(argv[0]);
if (text && text[0]) {
char result[2];
result[0] = text[0]; result[1] = '\0';
sqlite3_result_text(context, result, -1, SQLITE_TRANSIENT);
return;
}
}
sqlite3_result_null(context);
}
And then I used it in my query like this:
SELECT firstchar(text) FROM dummy
But it gives the error:
no such function firstchar()
Any help is highly appreciated.
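Two notes on the firstchar example, offered as a hedged sketch. First, a function registered with sqlite3_create_function() exists only on the sqlite3* connection it was registered on, so a query run through any other connection (for example one opened by Android's Java SQLiteDatabase) will still report that the function does not exist. Second, firstchar copies only the first byte, which for Arabic text stored as UTF-8 is only half of a two-byte character. The helper below (first_utf8_char is a hypothetical name, not part of SQLite) returns the whole first character instead, assuming UTF-8 input as returned by sqlite3_value_text().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the first UTF-8 encoded character of `text`
// (1 to 4 bytes), or an empty string for NULL or empty input.
// A malformed lead byte is returned as a single byte.
std::string first_utf8_char(const char* text) {
    if (text == nullptr || text[0] == '\0') return "";
    unsigned char b0 = static_cast<unsigned char>(text[0]);
    std::size_t len = 1;
    if ((b0 & 0xE0) == 0xC0) len = 2;        // 110xxxxx: 2-byte sequence
    else if ((b0 & 0xF0) == 0xE0) len = 3;   // 1110xxxx: 3-byte sequence
    else if ((b0 & 0xF8) == 0xF0) len = 4;   // 11110xxx: 4-byte sequence
    // Take continuation bytes (10xxxxxx) only; stops at '\0' or bad input.
    std::size_t n = 1;
    while (n < len && (static_cast<unsigned char>(text[n]) & 0xC0) == 0x80) ++n;
    return std::string(text, n);
}
```

Inside firstchar, the result could then be returned with std::string r = first_utf8_char(text); followed by sqlite3_result_text(context, r.c_str(), (int) r.size(), SQLITE_TRANSIENT);.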
Original Question
I am working with Arabic text stored in Unicode format in SQLite. I want to implement a search, but there's a problem.
Let's say the text is:
<html>
<head>
<style>
@font-face {
font-family: "Al_Mushaf";
src: url('fonts/al_mushaf.ttf');
}
@font-face {
font-family: "Jameel Noori Nastaleeq";
src: url('fonts/jameel_noori.ttf');
}
</style>
</head>
<body>
<h3 style='font-family:"Al_Mushaf"'>
صحابہ کرام کا انبیائے کرام کی سنّت پر عمل
میٹھے میٹھے اسلامی بھائیو!صدائے مدینہ لگانا انبیائے کِرام عَلَیْہِمُ السَّلَام کی اس قَدْر پیاری سنّت ہے کہ صحابۂ کِرام عَلَیْہِمُ الرِّضْوَان نے بھی اسے خُوب اپنایا اور وہ بھی حضرت سَیِّدُنا داؤد عَلَیْہِ السَّلَام کی طرح اپنے گھر والوں کو جگایا کرتے جیسا کہ حضرت سَیِّدُنا عبد اللہ بن عُمَر رَضِیَاللّٰہُ تَعَالٰی عَنْہُما فرماتے ہیں کہ میرے والِدِ مُحْتَرَم اَمِیرُ الْمُوْمِنِین حضرت سَیِّدُنا عُمَر فَارُوقِ اَعْظَم رَضِیَاللّٰہُ تَعَالٰی عَنْہ رات میں جس قَدْر ربّ تعالیٰ چاہتا،نَماز پڑھتے رہتے،یہاں تک کہ جب رات کا آخری وَقْت ہوتا تو اپنے گھر والوں کو بھی نَماز کے لیے جگا دیتے اور ان سے فرماتے: اَلصَّلٰوة یعنی نماز۔ پھر یہ آیت مُبارَکہ تِلاوَت فرماتے:
وَاۡمُرْ اَہۡلَکَ بِالصَّلٰوۃِ وَ اصْطَبِرْ عَلَیۡہَا ؕ لَا نَسْـَٔلُکَ رِزْقًا ؕ نَحْنُ نَرْزُقُکَ ؕ وَالْعٰقِبَۃُ لِلتَّقْوٰی (پ۱۶،طٰهٰ:۱۳۲)
</h3>
</body>
</html>
It is stored in an SQLite database. Now if I search for html, it returns the result, and if I search for مبارکہ, it doesn't return a result, because in the actual text it is مُبارَکہ (with these extra Unicode characters).
I want to ignore all HTML tags and these extra Unicode characters while searching, so that html doesn't return a result while مبارکہ does.
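The extra characters in مُبارَکہ are combining marks (here U+064F DAMMA and U+064E FATHA) placed after the letters, so a plain substring search for مبارکہ cannot match. A minimal sketch of removing them, assuming the text is UTF-8 and that the marks to ignore are the Arabic combining-mark ranges listed in is_arabic_mark (an assumption; swap in the code points you already know). All names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed set of "extra" characters: Arabic combining marks (harakat,
// superscript alef and Quranic annotation marks). Adjust as needed.
static bool is_arabic_mark(char32_t cp) {
    return (cp >= 0x0610 && cp <= 0x061A) ||
           (cp >= 0x064B && cp <= 0x065F) ||
           cp == 0x0670 ||
           (cp >= 0x06D6 && cp <= 0x06DC) ||
           (cp >= 0x06DF && cp <= 0x06E4) ||
           (cp >= 0x06E7 && cp <= 0x06E8) ||
           (cp >= 0x06EA && cp <= 0x06ED);
}

// Decodes the UTF-8 sequence starting at s[i] into cp and returns its length.
// Malformed input yields length 1 and the raw byte (overlong forms are not checked).
static std::size_t decode_utf8(const std::string& s, std::size_t i, char32_t& cp) {
    unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len;
    if (b0 < 0x80) { cp = b0; return 1; }
    if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else { cp = b0; return 1; }
    if (i + len > s.size()) { cp = b0; return 1; }
    for (std::size_t k = 1; k < len; ++k) {
        unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) { cp = b0; return 1; }
        cp = (cp << 6) | (b & 0x3F);
    }
    return len;
}

// Copies every character except the assumed combining marks.
std::string strip_arabic_marks(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        char32_t cp;
        std::size_t len = decode_utf8(in, i, cp);
        if (!is_arabic_mark(cp)) out.append(in, i, len);
        i += len;
    }
    return out;
}
```

For the search to work, the same normalization has to be applied to both sides: the stored text and the search term.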
What I've found so far:
1. Make an extra column, put the stripped text into it and then search that (I can't do this because there are thousands of books and they would take a lot of memory)
2. A user-defined function (UDF), like in SQL (I couldn't find any suitable example/tutorial to implement it)
3. Using REGEXP (I haven't figured out how to do this yet; I just know that I have to implement it myself)
4. An SQL query using the LIKE and GLOB operators and wildcard characters.
I've been stuck for two days and couldn't find a working solution. Option #4 is preferable, but any working solution will do.
At the same time, I have to keep the application memory-efficient and the search optimized.
Any help is highly appreciated.
UPDATE
I've written a regex that ignores HTML tags and the text between style tags, and used it in a query with REGEXP.
Now there are two problems:
- I want to ignore these extra characters too. I know their Unicode code points; I just need to know how to add them to the regex. This is my regex:
(?![^<]*>)(?!<style[^>]*?>)(TEXT)(?![^<]*?<\/style>)
- I've used it in a query like this:
SELECT text FROM dummy WHERE text REGEXP <myregex>
It doesn't give an error, but it doesn't return the desired result either.
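For reference, the SQLite library defines no regexp() function by default; if the application registers an SQL function named regexp, SQLite evaluates X REGEXP Y as regexp(Y, X), so the pattern arrives as the first argument, and in SQL it must be a quoted string literal. The sketch below shows only the matching core such a function could use (regexp_matches is a hypothetical name); a real function would read the pattern from sqlite3_value_text(argv[0]) and the text from argv[1]. std::regex in ECMAScript mode supports the (?!...) lookaheads used above but matches bytes, so it still won't bridge the diacritics between the letters of مبارکہ.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical core of a regexp(pattern, text) SQL function.
// Throws std::regex_error for an invalid pattern; a real SQL function
// should catch it and report it with sqlite3_result_error().
bool regexp_matches(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern, std::regex::ECMAScript);
    // regex_search: the pattern may match anywhere in the text.
    return std::regex_search(text, re);
}
```

With the tag-skipping lookahead from the question, "html" inside <html> is rejected while "html" as body text is found.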
Answer to Original Question
NOTE: As I have only recently learned this, I may be wrong in many places; kindly correct my mistakes.
There are two solutions:
- Use the REGEXP operator in the SQLite query
- Implement your own user-defined function using the NDK
The problem with the first one is that it only returns true or false (match or no match), but I need the data. And the problem with both methods is that you have to use a C/C++ library in your Android project. So I decided to create my own user-defined function.
You can find many tutorials on how to use the NDK in your project, but you won't find a complete example of using third-party libraries with it.
After a lot of searching/studying, I combined things from many different places and was able to complete my task. Below are some steps on how you can do it. I also intend to write a complete step-by-step tutorial.
Getting things ready
- First you need the libraries you want to use in your project. In my case, I need the sqlite3 amalgamation, which can be downloaded from here. Extract it into the cpp folder of your project.
- You might be familiar with the CMakeLists.txt file by now, since you included the NDK in your project. It's time to add these libraries to CMakeLists.txt. For that, go to your Project pane; you'll see External Build Files there, and inside it the desired file. Open it and edit it as follows:
# Sets the minimum version of CMake required to build the native
# library. You should either keep the default value or only pass a
# value of 3.4.0 or lower.
cmake_minimum_required(VERSION 3.4.1)
# std::regex, used in native-lib.cpp, needs C++11 or later.
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
# Creates and names a library, sets it as either STATIC
# or SHARED, and provides the relative paths to its source code.
# You can define multiple libraries, and CMake builds it for you.
# Gradle automatically packages shared libraries with your APK.
add_library( # Sets the name of the library.
native-lib
# Sets the library as a shared library.
SHARED
# Provides a relative path to your source file(s).
# Associated headers in the same location as their source
# file are automatically included.
src/main/cpp/native-lib.cpp )
# sqlite3.h and sqlite3ext.h sit next to the sources in src/main/cpp.
include_directories(${CMAKE_SOURCE_DIR}/src/main/cpp)
# Builds the SQLite amalgamation as a static library for native-lib to link against.
add_library(sqlite3 STATIC src/main/cpp/sqlite3.c src/main/cpp/sqlite3.h src/main/cpp/sqlite3ext.h)
# The next two lines would build the sqlite3 command-line shell, whose main()
# lives in shell.c (not listed here); native-lib does not need them.
# add_executable(sqlite src/main/cpp/sqlite3.c src/main/cpp/sqlite3.h src/main/cpp/sqlite3ext.h)
# set_target_properties(sqlite PROPERTIES OUTPUT_NAME sqlite3)
# Searches for a specified prebuilt library and stores the path as a
# variable. Because system libraries are included in the search path by
# default, you only need to specify the name of the public NDK library
# you want to add. CMake verifies that the library exists before
# completing its build.
find_library( # Sets the name of the path variable.
log-lib
# Specifies the name of the NDK library that
# you want CMake to locate.
log )
# Specifies libraries CMake should link to your target library. You
# can link multiple libraries, such as libraries you define in the
# build script, prebuilt third-party libraries, or system libraries.
target_link_libraries( # Specifies the target library.
native-lib
# Links the target library to the log library
# included in the NDK.
sqlite3
log )
You first have to add the libraries using add_library and then link them with target_link_libraries to your own native library, which by default is named native-lib (built from native-lib.cpp).
- Build your project and you are ready to go.
Implementing Function
Now here comes the main part. Open native-lib.cpp and include the required headers (the code below uses <jni.h>, <string>, <regex>, <android/log.h> and "sqlite3.h"). Here is what you have to do:
- Make a function which you will call from YourActivity.java. You'll know the pattern once you see your file. In my case it is Java_com_kfmwa916_testapp_SearchResult_createFunction(JNIEnv * env, jobject object, jstring search), where
Java is a fixed prefix,
com_kfmwa916_testapp is your package (dots replaced by underscores),
SearchResult is your Java class, and
createFunction is the name of the function.
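The naming rule above can be made concrete with a small helper. This is a sketch of the JNI "short name" rule for plain ASCII identifiers: Java_, then the fully qualified class name with dots replaced by underscores, then an underscore and the method name, where a literal underscore inside a name is escaped as _1. Non-ASCII characters and nested classes, which JNI also escapes, are not handled, and jni_symbol is a hypothetical helper, not a JNI API. Because native-lib.cpp is C++, the real function must also be declared extern "C" so the compiler does not mangle the symbol.

```cpp
#include <cassert>
#include <string>

// Appends `name` using JNI short-name escaping for ASCII identifiers:
// '.' or '/' becomes '_', and a literal '_' becomes "_1".
static void append_mangled(std::string& out, const std::string& name) {
    for (char c : name) {
        if (c == '.' || c == '/') out += '_';
        else if (c == '_') out += "_1";
        else out += c;
    }
}

// Builds the C symbol name the JVM looks up for a (non-overloaded) native method.
std::string jni_symbol(const std::string& package, const std::string& cls,
                       const std::string& method) {
    std::string out = "Java_";
    append_mangled(out, package);
    out += '_';
    append_mangled(out, cls);
    out += '_';
    append_mangled(out, method);
    return out;
}
```

For the package, class and method used in this answer it produces Java_com_kfmwa916_testapp_SearchResult_createFunction.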
Create your function. In my case, I have to apply a certain regex in the search. Here is mine:
static void strip_text(sqlite3_context *context, int argc, sqlite3_value **argv) {
    if (argc == 1) {
        __android_log_print(ANDROID_LOG_VERBOSE, "TAG", "inside strip_text");
        // sqlite3_value_text() returns NULL for an SQL NULL; never build a std::string from it.
        const char *result = (const char *) sqlite3_value_text(argv[0]);
        if (result && result[0]) {
            std::string text(result);
            // Compile the pattern once instead of once per row.
            static const std::regex regex_head("YOUR REGEX");
            text = std::regex_replace(text, regex_head, "");
            sqlite3_result_text(context, text.c_str(), -1, SQLITE_TRANSIENT);
            __android_log_print(ANDROID_LOG_VERBOSE, "STRIPPED TEXT", "%s", text.c_str());
            return;
        }
    }
    sqlite3_result_null(context);
}
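Since tags are plain ASCII, HTML can also be removed without a regex by scanning the bytes, which sidesteps std::regex's byte-oriented handling of UTF-8. A minimal sketch, assuming well-formed markup in which < and > appear only as tag delimiters (strip_html is a hypothetical name): it drops every tag and everything inside <style>...</style>, which is what the question asks for.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True if s contains `word` (lowercase ASCII) at pos, ignoring ASCII case.
static bool starts_with_ci(const std::string& s, std::size_t pos, const char* word) {
    for (std::size_t k = 0; word[k] != '\0'; ++k) {
        if (pos + k >= s.size()) return false;
        unsigned char c = static_cast<unsigned char>(s[pos + k]);
        if (std::tolower(c) != word[k]) return false;
    }
    return true;
}

// Removes HTML tags and the contents of <style>...</style> blocks.
std::string strip_html(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != '<') { out += in[i]; ++i; continue; }
        if (starts_with_ci(in, i, "<style")) {
            // Skip the stylesheet up to the closing </style tag.
            std::size_t j = i + 6;
            while (j < in.size() && !starts_with_ci(in, j, "</style")) ++j;
            i = j;
        }
        // Skip the current tag up to and including its '>'.
        std::size_t close = in.find('>', i);
        i = (close == std::string::npos) ? in.size() : close + 1;
    }
    return out;
}
```

Inside strip_text, text = strip_html(text); could take the place of the std::regex_replace call. HTML entities such as &nbsp; are left untouched.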
Create a sqlite3 instance, open the database, attach this function to the database and use it in your query. The callback used here is the one from the next step and has to be defined above this function. Here is a code snippet:
extern "C"
void Java_com_kfmwa916_testapp_SearchResult_createFunction(JNIEnv *env, jobject object, jstring search) {
    // Copy the search term first: the buffer must not be used after ReleaseStringUTFChars().
    const char *search_term = env->GetStringUTFChars(search, 0);
    std::string q(search_term);
    env->ReleaseStringUTFChars(search, search_term);
    std::string query = "SELECT text FROM dummy WHERE LIKE('%" + q + "%', strip_text(text))=1";
    __android_log_print(ANDROID_LOG_VERBOSE, "TAG", "%s", query.c_str());
    sqlite3 *db;
    //Open database
    __android_log_print(ANDROID_LOG_VERBOSE, "TAG", "Opening database");
    int rc = sqlite3_open("/data/data/com.kfmwa916.testapp/databases/MyDBName.db", &db);
    if (rc != SQLITE_OK) {
        __android_log_print(ANDROID_LOG_ERROR, "TAG", "open failed: %s", sqlite3_errmsg(db));
        sqlite3_close(db);
        return;
    }
    //Attach function to database (it is visible only on this connection);
    //SQLITE_UTF8 because strip_text reads its argument as UTF-8
    __android_log_print(ANDROID_LOG_VERBOSE, "TAG", "Attaching function");
    rc = sqlite3_create_function(db, "strip_text", 1, SQLITE_UTF8, NULL, &strip_text, NULL, NULL);
    if (rc == SQLITE_OK) {
        __android_log_print(ANDROID_LOG_VERBOSE, "TAG", "Executing query");
        rc = sqlite3_exec(db, query.c_str(), callback, NULL, NULL);
    }
    if (rc != SQLITE_OK) {
        __android_log_print(ANDROID_LOG_ERROR, "TAG", "error: %s", sqlite3_errmsg(db));
    }
    sqlite3_close(db);
}
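Concatenating the search term into the SQL breaks as soon as the term contains a quote, and any % or _ in the term acts as a wildcard. One alternative, sketched under the assumption that the query is prepared with sqlite3_prepare_v2() as SELECT text FROM dummy WHERE strip_text(text) LIKE ?1 ESCAPE '\' and the pattern is bound with sqlite3_bind_text(), is to escape only the LIKE wildcards and let parameter binding take care of quoting. like_contains_pattern is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Builds a LIKE pattern that matches `term` literally anywhere in the text.
// Assumes the SQL uses ESCAPE '\' so that \%, \_ and \\ are literal.
// Quotes need no escaping because the pattern is bound, not pasted into SQL.
std::string like_contains_pattern(const std::string& term) {
    std::string p = "%";
    for (char c : term) {
        if (c == '%' || c == '_' || c == '\\') p += '\\';
        p += c;
    }
    p += '%';
    return p;
}
```

Rows are then read by calling sqlite3_step() until it stops returning SQLITE_ROW, using sqlite3_column_text() for the values, and sqlite3_finalize() releases the statement.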
Implement the callback function to process the results; sqlite3_exec() calls it once for every result row. It should look like this:
static int callback(void *NotUsed, int argc, char **argv, char **azColName) {
    __android_log_print(ANDROID_LOG_VERBOSE, "TAG", "FOUND");
    int i;
    for (i = 0; i < argc; ++i) {
        __android_log_print(ANDROID_LOG_VERBOSE, "TAG", "%s = %s", azColName[i], argv[i] ? argv[i] : "NULL");
    }
    return 0;
}
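Logging each row is fine for testing, but the first parameter of the callback (NotUsed above) receives whatever pointer is passed as the fourth argument of sqlite3_exec(), so it can carry a container for the results. A sketch, assuming the query selects the text column first; collect_rows is a hypothetical name, and the call would become sqlite3_exec(db, query.c_str(), collect_rows, &rows, NULL) with a std::vector<std::string> rows.

```cpp
#include <cassert>
#include <string>
#include <vector>

// sqlite3_exec() callback that appends the first column of each row to the
// std::vector<std::string> passed through sqlite3_exec()'s 4th argument.
static int collect_rows(void* user_data, int argc, char** argv, char** /*azColName*/) {
    auto* rows = static_cast<std::vector<std::string>*>(user_data);
    if (argc > 0) rows->push_back(argv[0] ? argv[0] : "");  // SQL NULL -> ""
    return 0;  // a non-zero value would make sqlite3_exec() stop with SQLITE_ABORT
}
```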
Finally, go to your Java class (in my case, SearchResult), load the library and declare the native function:
static {
    System.loadLibrary("native-lib");
}
public native void createFunction(String search);
Then call it wherever you need it, for example in the onClick handler of a button: createFunction(searchterm)
Post is open for correction and modification.