全文不索引varbinary列(带有html) [英] Full Text not indexing varbinary column (with html)
问题描述
我有一个包含HTML数据的表,我想通过html过滤器使用全文索引进行搜索
所以我创建了一个索引:
CREATE FULLTEXT CATALOG myCatalog AS DEFAULT
CREATE FULLTEXT INDEX ON myTable (Body TYPE COLUMN Filetype)
KEY INDEX PK_myTable
Body
是带有HTML的varbinary(max)
列. Filetype
列是返回.html
的计算列.
没有返回结果.
我确认已安装.html
过滤器.如果将列转换为nvarchar
并仅创建纯文本"索引(不是html),则FullText索引也已正确安装,并且工作正常.
SQL日志或FTS日志中没有错误.
关键字表为空!
SELECT *
FROM sys.dm_fts_index_keywords
(DB_ID('myDatabase'), OBJECT_ID('myTable'))
它返回的只是文件结束"符号.
显示文档数35",表示已处理文档,但未提取任何关键字.
PS.我有SQL Server Express 版(具有所有高级功能,包括全文).这可能是原因吗?但是,纯文本"全文搜索还是可以的!
PPS.让我的同事在SQL Express 2016上进行测试-相同的结果...在我们的生产服务器"Enterprise"版本上尝试过-相同.
更新
确定,全文索引不支持UNICODE!在varbinary列中.当我将列转换为非unicode时(通过将其转换为nvarchar
然后转换为varchar
然后又返回到varbinary
),它开始工作.
有人知道任何解决方法吗?
确定,因此事实证明全文索引确实支持varbinary
中的unicode数据,但请注意:
如果您的varbinary
列是从Nvarchar
创建的,请确保在开头添加0xFFFE
unicode签名
例如,我将计算列用于全文索引,因此我不得不将计算列更改为:
alter table myTable
add FTS_Body as 0xFFFE + (CAST(HtmlBody as VARBINARY(MAX)))
--HtmlBody is my nvarchar column that contains html
I have a table with HTML data, that I want to search using the Full Text Index via an html-filter
So I created an index:
CREATE FULLTEXT CATALOG myCatalog AS DEFAULT
CREATE FULLTEXT INDEX ON myTable (Body TYPE COLUMN Filetype)
KEY INDEX PK_myTable
Body
is a varbinary(max)
column with HTML. The Filetype
column is a computed column returns .html
.
No results are being returned.
I verified that .html
filter is installed. FullText index is also installed properly and works fine if I convert the column to nvarchar
and create just a "plain text" index (not html).
No errors in the SQL log or FTS log.
The keywords table is just empty!
SELECT *
FROM sys.dm_fts_index_keywords
(DB_ID('myDatabase'), OBJECT_ID('myTable'))
All it returns is "END OF FILE" symbol.
It says "document count 35" which mean the documents were processed, but no keywords were extracted.
PS. I have SQL Server Express Edition 2012 (with all advanced features including full text). Can this be the reason? But again, the "plain" full text search works just fine!
PPS. Asked my coworker to test this on SQL Express 2016 - same result... Tried on our production server "Enterprise" edition - same.
UPDATE
OK it turns out the full text index DOES NOT SUPPORT UNICODE!! in varbinary columns. When I converted the column to non-unicode (by converting it to nvarchar
then to varchar
and then back to varbinary
) It started working.
Anyone knows any workarounds?
OK, so it turns out fulltext index DOES support unicode data in varbinary
but pay attention to this:
If your varbinary
column is created from Nvarchar
be sure to include the 0xFFFE
unicode signature at the beginning
For example, I'm using a computed column for full text index, so I had to change my computed column to this:
alter table myTable
add FTS_Body as 0xFFFE + (CAST(HtmlBody as VARBINARY(MAX)))
--HtmlBody is my nvarchar column that contains html
这篇关于全文不索引varbinary列(带有html)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!