Full text indexing on large files (more than 32k)

Question

Is it possible to use Azure Search on blobs larger than 32kB? I have around 500GB of text files stored as blobs on Azure, with an average blob size of about 1MB. I was excited to try Azure Search for full text search on these files. However, it looks like an index field of type Edm.String cannot hold more than 32kB. I couldn't find this exact limit documented anywhere; I inferred it from an error message in the portal.

Is there an out-of-the-box solution on Azure that I can use to add full text search over blobs? Does the Azure team plan to remove the 32kB field size limit?

Recommended answer

Two different limits are potentially relevant here:

  1. Azure Search limits how many characters it will extract from a blob, and the limit depends on the pricing tier. For the free tier, the limit is 32*1024 characters. For the Standard S1 and S2 pricing tiers, it is 4 million characters.
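To make the first limit concrete, here is a minimal Python sketch that estimates how many characters of a blob's text would fall beyond the extraction limit for a given tier. The tier keys, the function name, and the reading of "4 million" as exactly 4,000,000 are assumptions made for illustration; the sketch also counts characters with Python's `len`, which may not match how the service counts them.

```python
# Extraction limits as quoted in the answer (characters per blob).
# The dictionary keys are hypothetical labels, not official tier identifiers.
EXTRACTION_LIMITS = {
    "free": 32 * 1024,
    "standard_s1": 4_000_000,  # assumption: "4 million" taken as 4,000,000
    "standard_s2": 4_000_000,
}


def chars_beyond_limit(text: str, tier: str) -> int:
    """Return how many characters of `text` exceed the tier's limit (0 if none)."""
    return max(0, len(text) - EXTRACTION_LIMITS[tier])


# A blob of roughly 1 MB of single-byte text, as described in the question.
sample = "a" * 1_000_000
over_free = chars_beyond_limit(sample, "free")
over_s1 = chars_beyond_limit(sample, "standard_s1")
```

Under these assumptions, a 1MB blob is well over the free-tier limit but fits within the Standard S1 and S2 limit.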

Separately, there is a limit on the size of a single term in the search index, which also happens to be 32KB. If the content field in your search index is marked as filterable, facetable or sortable, you will hit this limit (regardless of whether the field is also marked as searchable). For large searchable content, you typically want to enable searchable, and sometimes retrievable, but none of the other attributes. That way you won't hit content length limits on the index side.
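The field configuration described above could look roughly like the following Python sketch of an index definition. The index name, the `id` key field, and the `risky_attributes` helper are hypothetical; only the `content` field and the attribute names (searchable, retrievable, filterable, facetable, sortable) come from the answer. Every attribute is set explicitly so that the sketch does not depend on service defaults.

```python
import json

# Hypothetical index definition; names other than "content" are illustrative.
index_definition = {
    "name": "blob-text-index",
    "fields": [
        {
            "name": "id",
            "type": "Edm.String",
            "key": True,
            "searchable": False,
            "retrievable": True,
            "filterable": False,
            "facetable": False,
            "sortable": False,
        },
        {
            "name": "content",
            "type": "Edm.String",
            "searchable": True,    # full text search over the blob text
            "retrievable": True,   # optional: return the text in results
            "filterable": False,   # these three would store the whole value
            "facetable": False,    # as a single term and hit the 32KB limit
            "sortable": False,
        },
    ],
}


def risky_attributes(field: dict) -> list:
    """List the attributes on a field that expose it to the single-term size limit.

    Attributes that are missing are treated conservatively as enabled (assumption).
    """
    return [a for a in ("filterable", "facetable", "sortable") if field.get(a, True)]


content_field = index_definition["fields"][1]
problems = risky_attributes(content_field)  # empty list for the definition above
payload = json.dumps(index_definition, indent=2)  # JSON form of the definition
```

A check like `risky_attributes` can be run over each large text field before the index is created, so that a misconfigured attribute is caught before indexing fails.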

We realize that the first limit in particular is not currently documented; we will reflect it on our Quotas and Limits page soon.
