What is the best way to implement simple text search


Question

Currently, I have a Firestore database set up with a users collection and a funkoPops collection. The funkoPops collection has about 750 documents, each of which is a genre/series of Funko Pops. An example document would be document.id = "2014 Funko Pop Marvel Thor Series 2 Vinyl Figures", and that document has the fields funkoData and genre. funkoData is an array of objects, one per Funko Pop in that 2014 series, as you can see here. What is the best way to query by, say, name, where I want to search every document in the database for each Funko Pop in the funkoData array by its name field? From what I have read about Firestore, there is no way to search the database and match just part of a string. As you can see in the picture, if I searched for Loki, Firebase would not return the Funko Pop with name = "Loki - with helmet", because it is not an exact match. So for now I am using a custom function, which I wrote here:

const getFunkoPopQuery = async (req, res, next) => {
  try {
    console.log(req.params);
    const query = req.params.query.trim().toLowerCase();
    const funkoPops = firestore.collection("funkoPops"); // collection() is synchronous, no await needed
    const data = await funkoPops.get();
    const funkoArr = [];
    if (data.empty) {
      res.status(404).send("No Funko Pop records exist");
    } else {
      data.forEach((doc) => {
        const funkoObj = new FunkoPop(doc.data().genre, doc.data().funkoData);
        funkoArr.push(funkoObj);
      });

      // genre matching
      let genreMatches = funkoArr.filter((funko) =>
        funko.genre.toLowerCase().includes(query)
      );
      if (genreMatches.length === 0) {
        genreMatches = `No funko pop genres with search: ${query}`;
      }
      // name & number matching
      let nameMatches = [];
      let numbMatches = [];
      funkoArr.forEach((funko) => {
        const genre = funko.genre;
        const funkoData = funko.funkoData;
        const name = funkoData.filter((data) =>
          data.name.toLowerCase().includes(query)
        );
        const number = funkoData.filter((data) =>
          data.number.toLowerCase().includes(query)
        );

        if (name.length > 0) {
          nameMatches.push({
            genre,
            name,
          });
        }
        if (number.length > 0) {
          numbMatches.push({
            genre,
            number,
          });
        }
      });

      if (nameMatches.length === 0) {
        nameMatches = `No funko pops found with search name: ${query}`;
      }
      if (numbMatches.length === 0) {
        numbMatches = `No funko pop numbers found with search: ${query}`;
      }

      const searchFinds = {
        genre: genreMatches,
        name: nameMatches,
        number: numbMatches,
      };

      res.send(searchFinds);
    }
  } catch (error) {
    res.status(400).send(error.message);
  }
};

This function works for both full strings and substrings of the data in the database, and it handles several kinds of search at once: series/genre, name, and/or number of the Funko Pop. However, it blows through the 50k read quota, because it fetches all the documents every time you search for something. Is there a better way of doing this, or of searching through Firestore, than my solution? I would still need to search the database to match the series/genre, the name, and the number of the Funko Pop. That way, if a person searches for marvel, they get all Funko Pops with marvel in the genre; iron man retrieves all Iron Man pops and all genres matching iron man; and searching for a number yields all Funko Pops that have that number. Any help would be awesome!

Recommended answer

After Algolia raised their prices tenfold and presented the change as a simplification of their pricing model, it is worth thinking harder before implementing it.

  • If you don't want to use a third-party API.
  • If matching on the presence of whole words is enough for you.
  • If you don't need a phonetic algorithm to be part of your search.

Then you can use just Firebase.

Say you want your funkoData[].name to be searchable.

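  1. Concatenate all funkoData[].name values together and lowercase the result.

funkoData.map(v => v.name).join(" ").toLowerCase()

  2. Split the string on whitespace and special characters.

funkoData.map(v => v.name).join(" ").toLowerCase().split(/[\s\-.,!?]/)

  3. Filter out short words and empty strings. Note that the threshold used here drops every word of four characters or fewer, so a name like "loki" would not be indexed; lower the threshold if short names must be searchable.

funkoData.map(v => v.name).join(" ").toLowerCase().split(/[\s\-.,!?]/).filter(v => v.length > 4)

  4. Use a Cloud Function to compute this.

Full example: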

let searchable = documentData.funkoData
                    .map(v => v.name)
                    .join(" ")
                    .toLowerCase()
                    .split(/[\s\-.,!?]/)      // split on whitespace and punctuation
                    .filter(v => v.length > 4); // drop short words and empty strings
searchable = Array.from(new Set(searchable)); // remove duplicates
await documentRef.update({_searchArray: searchable});
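
The steps above can also be folded into a small pure function, which makes the tokenization easy to check before it goes into a Cloud Function. This is only a minimal sketch, not the answer's exact code: the name buildSearchArray and the minLength parameter are assumptions introduced here, and the default of 5 simply mirrors the v.length > 4 filter.

```javascript
// Hypothetical helper (not part of the original answer): turns an array of
// { name } objects into a de-duplicated list of lowercase search words.
function buildSearchArray(funkoData, minLength = 5) {
  const words = funkoData
    .map((v) => v.name)
    .join(" ")
    .toLowerCase()
    .split(/[\s\-.,!?]+/) // split on runs of whitespace and basic punctuation
    .filter((w) => w.length >= minLength); // drops short words and empty strings
  return Array.from(new Set(words)); // remove duplicates, keep first-seen order
}

const sample = [{ name: "Loki - with helmet" }, { name: "Thor" }, { name: "Loki" }];
console.log(buildSearchArray(sample));    // default threshold: [ 'helmet' ]
console.log(buildSearchArray(sample, 3)); // [ 'loki', 'with', 'helmet', 'thor' ]
```

With the default threshold only "helmet" survives, which is why a lower minLength is probably what you want if four-letter names such as "loki" or "thor" have to be findable.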

Now you can search using array-contains or array-contains-any:

const query = "Loki";
const normalizedQuery = query.trim().toLowerCase();
firestoreRef.collection("funkoPops").where('_searchArray', 'array-contains', normalizedQuery).get()
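
For a search box that accepts several words, the same idea extends to array-contains-any. The sketch below only builds the query and leaves running it to the caller; buildSearchQuery and maxTerms are hypothetical names introduced here, and the cap of 10 terms is an assumption chosen to stay well within Firestore's limit on the number of values in such a query. Keep in mind that array-contains-any has OR semantics: a document matches if any one of the words is in its _searchArray.

```javascript
// Hypothetical helper: builds (but does not run) a Firestore query for a
// free-text search string. `collectionRef` is assumed to be a Firestore
// CollectionReference such as firestore.collection("funkoPops").
function buildSearchQuery(collectionRef, rawQuery, maxTerms = 10) {
  const terms = Array.from(
    new Set(
      rawQuery
        .toLowerCase()
        .split(/[\s\-.,!?]+/) // same separators as the indexing step
        .filter((w) => w.length > 0)
    )
  ).slice(0, maxTerms); // assumed cap on the number of search terms
  if (terms.length === 0) return null; // nothing to search for
  return terms.length === 1
    ? collectionRef.where("_searchArray", "array-contains", terms[0])
    : collectionRef.where("_searchArray", "array-contains-any", terms);
}

// Usage (inside an async handler):
// const q = buildSearchQuery(firestore.collection("funkoPops"), "Loki helmet");
// const snapshot = q ? await q.get() : null;
```

Only the documents that match are read, so the cost of a search scales with the number of hits rather than with the size of the collection.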

PS: If your searchable array could grow very large, you can move it to a separate collection that uses the same document ids, and run the search through an HTTPS callable Cloud Function that returns only the ids of the documents matching the search criteria. Why? Because with the Admin SDK you can choose which fields are returned in a query response, and so leave the large array out of it.
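
As a rough sketch of that last point, assuming the Node.js Admin SDK, where a query's select() called with no field paths returns only document references: buildIdOnlyQuery and the collection name funkoPopsSearch are made up for illustration and are not part of the original answer.

```javascript
// Hypothetical helper for the Admin SDK. `indexRef` is assumed to be a
// CollectionReference for a separate index collection (called
// "funkoPopsSearch" here, an invented name) whose documents share their ids
// with the documents in funkoPops.
function buildIdOnlyQuery(indexRef, word) {
  return indexRef
    .where("_searchArray", "array-contains", word.trim().toLowerCase())
    .select(); // no field paths: only document references are returned
}

// Usage (inside an HTTPS callable Cloud Function, with `admin` initialized):
// const snapshot = await buildIdOnlyQuery(
//   admin.firestore().collection("funkoPopsSearch"), "Loki").get();
// const ids = snapshot.docs.map((doc) => doc.id);
```

The client can then fetch the full funkoPops documents for just those ids.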
