Correct JSON structure to filter through data


Problem description

What's the "best" JSON structure when you need to "filter" through data in Firebase (in Swift)?

I'm having users sort their questions into:

Business
Entertainment
Other

Is it better to have a separate child for each question genre? If so, how do I get all of the data (when I want it), and then filter it down to only "Business" when I want to?

Solution

In NoSQL databases you usually end up modeling your data structure for the use-cases you want to allow in your app.

It's a bit of a learning path, so I'll explain it below in four steps:

  1. Tree by category: Storing the data in a tree by its category, as you seem to be most interested in already.
  2. Flat list of questions, and querying: Storing the data in a flat list, and then using queries to filter.
  3. Flat list and indexes: Combining the above two approaches, to make the result more scalable.
  4. Duplicating data: By duplicating data on top of that, you can reduce code complexity and improve performance further.


Tree by category

If you only want to get the questions by their category, you're best off simply storing each question under its category. In a simple model that'd look like this:

questionsByCategory: {
  Business: {
    question1: { ... },
    question4: { ... }
  },
  Entertainment: {
    question2: { ... },
    question5: { ... }
  },
  Other: {
    question3: { ... },
    question6: { ... }
  }
}

With the above structure, loading the list of questions for a category is a simple, direct-access read for that category: firebase.database().ref("questionsByCategory").child("Business").once("value"....

But if you need a list of all questions, you have to read all categories and flatten them client-side. If you need the full list anyway, that is not a real problem, since you have to load every question regardless. But if you want to filter on some condition other than category, this structure may be wasteful.
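The client-side flattening could be sketched roughly as follows. This is plain JavaScript with no Firebase calls: the function name flattenByCategory and the text field are hypothetical, and the input is assumed to be the object you would get from snapshot.val() on the questionsByCategory node.

```javascript
// Flatten a { category: { questionId: question } } tree into one flat
// { questionId: question } map, recording the category on each question.
function flattenByCategory(questionsByCategory) {
  const flat = {};
  for (const [category, questions] of Object.entries(questionsByCategory)) {
    for (const [id, question] of Object.entries(questions)) {
      // Copy the question and remember which category it came from.
      flat[id] = { ...question, category };
    }
  }
  return flat;
}

// Sample data shaped like the tree above (the "text" field is made up).
const tree = {
  Business: { question1: { text: "Q1" }, question4: { text: "Q4" } },
  Entertainment: { question2: { text: "Q2" } },
  Other: { question3: { text: "Q3" } }
};

const allQuestions = flattenByCategory(tree);
console.log(Object.keys(allQuestions).sort());
console.log(allQuestions.question4.category); // prints "Business"
```

Note that the whole tree still has to be downloaded before it can be flattened, which is the cost described above.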


Flat list of questions, and querying

An alternative is to create a flat list of all questions, and use queries to filter the data. In that case your JSON would look like this:

questions: {
  question1: { category: "Business", difficulty: 1, ... },
  question2: { category: "Entertainment", difficulty: 1, ... },
  question3: { category: "Other", difficulty: 2, ... },
  question4: { category: "Business", difficulty: 2, ... },
  question5: { category: "Entertainment", difficulty: 3, ... },
  question6: { category: "Other", difficulty: 1, ... }
}

Now, getting a list of all questions is easy, as you can just read them and loop over the results:

firebase.database().ref("questions").once("value").then(function(result) {
  result.forEach(function(snapshot) {
    console.log(snapshot.key+": "+snapshot.val().category);
  })
})

If we want to get all questions for a specific category, we use a query instead of just the ref("questions"). So:

  • Get all Business questions:

    firebase.database().ref("questions").orderByChild("category").equalTo("Business").once("value")...
    

  • Get all questions with difficulty 3:

    firebase.database().ref("questions").orderByChild("difficulty").equalTo(3).once("value")...
    

This approach works quite well, unless you have huge numbers of questions.
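If you have already loaded the full flat list (for example to show all questions), you may not need a second query at all: the same filter can be applied in memory. The sketch below is an assumption about how that might look, not a Firebase API. filterQuestions is a hypothetical helper, and the sample object stands in for the value of snapshot.val() on the questions node.

```javascript
// Return the subset of a flat { questionId: question } map whose
// `property` equals `value`. Plain in-memory filtering, no Firebase call.
function filterQuestions(questions, property, value) {
  return Object.fromEntries(
    Object.entries(questions).filter(([, question]) => question[property] === value)
  );
}

// Sample data matching the flat list above.
const questions = {
  question1: { category: "Business", difficulty: 1 },
  question2: { category: "Entertainment", difficulty: 1 },
  question3: { category: "Other", difficulty: 2 },
  question4: { category: "Business", difficulty: 2 },
  question5: { category: "Entertainment", difficulty: 3 },
  question6: { category: "Other", difficulty: 1 }
};

const business = filterQuestions(questions, "category", "Business");
const hard = filterQuestions(questions, "difficulty", 3);
console.log(Object.keys(business)); // question1 and question4
console.log(Object.keys(hard));     // question5
```

This only makes sense when the full list is small enough to download; otherwise the server-side queries shown above avoid transferring questions you do not need.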


Flat list and indexes

If you have millions of questions, Firebase database queries may no longer perform well enough for you. In that case you may need to combine the two approaches above: a flat list to store the questions, plus so-called (self-made) secondary indexes to perform the filtered lookups.

If you think you'll ever reach this number of questions, I'd consider using Cloud Firestore, as that does not have the inherent scalability limits that the Realtime Database has. In fact, Cloud Firestore has the unique guarantee that retrieving a certain amount of data takes a fixed amount of time, no matter how much data there is in the database/collection.

In this scenario, your JSON would look like:

questions: {
  question1: { category: "Business", difficulty: 1, ... },
  question2: { category: "Entertainment", difficulty: 1, ... },
  question3: { category: "Other", difficulty: 2, ... },
  question4: { category: "Business", difficulty: 2, ... },
  question5: { category: "Entertainment", difficulty: 3, ... },
  question6: { category: "Other", difficulty: 1, ... }
},
questionsByCategory: {
  Business: {
    question1: true,
    question4: true
  },
  Entertainment: {
    question2: true,
    question5: true
  },
  Other: {
    question3: true,
    question6: true
  }
},
questionsByDifficulty: {
  "1": {
    question1: true,
    question2: true,
    question6: true
  },
  "2": {
    question3: true,
    question4: true
  },
  "3": {
    question5: true
  }
}

You can see that we have a single flat list of the questions, plus a separate list for each property we want to filter on. Each of those lists holds, per property value, the IDs of the matching questions. These secondary lists are often called (secondary) indexes, since they really serve as indexes on your data.
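To make the relationship between the flat list and its indexes concrete, here is a minimal sketch that derives such an index from the flat list. The helper name buildIndex is hypothetical, and in a real app you would more likely maintain the index on every write rather than rebuild it from scratch.

```javascript
// Build a secondary index { propertyValue: { questionId: true } }
// from a flat { questionId: question } map.
function buildIndex(questions, property) {
  const index = {};
  for (const [id, question] of Object.entries(questions)) {
    // Database keys are strings, so numeric values such as difficulty become "1", "2", ...
    const value = String(question[property]);
    if (!index[value]) index[value] = {};
    index[value][id] = true;
  }
  return index;
}

// Sample data matching the flat list above.
const questions = {
  question1: { category: "Business", difficulty: 1 },
  question2: { category: "Entertainment", difficulty: 1 },
  question3: { category: "Other", difficulty: 2 },
  question4: { category: "Business", difficulty: 2 },
  question5: { category: "Entertainment", difficulty: 3 },
  question6: { category: "Other", difficulty: 1 }
};

const questionsByCategory = buildIndex(questions, "category");
const questionsByDifficulty = buildIndex(questions, "difficulty");
console.log(JSON.stringify(questionsByCategory.Business));
console.log(JSON.stringify(questionsByDifficulty["3"]));
```

The value true is only a placeholder: the index entry matters for its key (the question ID), not for what is stored under it.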

To load the hard questions (difficulty 3) in the above, we take a two-step approach:

  1. Load the question IDs with a direct lookup.
  2. Load each question by its ID.

In code:

firebase.database().ref("questionsByDifficulty/3").once("value").then(function(result) {
  result.forEach(function(snapshot) {
    firebase.database().ref("questions").child(snapshot.key).once("value").then(function(questionSnapshot) {
      console.log(questionSnapshot.key+": "+questionSnapshot.val().category);
    });
  })
})

If you need to wait for all questions before logging (or otherwise processing) them, you'd use Promise.all:

firebase.database().ref("questionsByDifficulty/3").once("value").then(function(result) {
  var promises = [];
  result.forEach(function(snapshot) {
    promises.push(firebase.database().ref("questions").child(snapshot.key).once("value"));
  })
  Promise.all(promises).then(function(questionSnapshots) {
    questionSnapshots.forEach(function(questionSnapshot) {
      console.log(questionSnapshot.key+": "+questionSnapshot.val().category);
    })
  })
})

Many developers assume that this approach is slow, since it needs a separate call for each question. But it's actually quite fast, since Firebase pipelines the requests over its existing connection. For more on this, see "Speed up fetching posts for my social network app by using query instead of observing a single event repeatedly".


Duplicating data

The code for the nested load/client-side join is a bit tricky to read at times. If you'd prefer only performing a single load, you could consider duplicating the data for each question into each secondary index too.

In this scenario, the secondary index would look like this:

questionsByCategory: {
  Business: {
    question1: { category: "Business", difficulty: 1, ... },
    question4: { category: "Business", difficulty: 2, ... }
  },
  ...
}

If you come from a background in relational data modeling, this may look quite unnatural, since we're now duplicating data between the main list and the secondary indexes.

To an experienced NoSQL data modeler, however, this looks completely normal. We're trading off storing some extra data against the extra time/code it takes to load the data.

This trade-off is common in all areas of computer science, and in NoSQL data modeling you'll fairly often see folks choosing to sacrifice space (and thus store duplicate data) to get an easier and more scalable data model.
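One consequence of duplicating data is that each write has to touch every copy. A common way to handle this in the Realtime Database is a multi-location update, where a single update() call receives an object whose keys are paths. The sketch below only builds that object, so it runs without the SDK. The helper name buildFanOutUpdate and the question7 ID are hypothetical, and the paths assume the node names used in this answer.

```javascript
// Build one update object that writes a question to the main list and to
// both duplicated indexes. Keys are database paths, values are the data to write.
function buildFanOutUpdate(id, question) {
  const updates = {};
  updates["questions/" + id] = question;
  updates["questionsByCategory/" + question.category + "/" + id] = question;
  updates["questionsByDifficulty/" + question.difficulty + "/" + id] = question;
  return updates;
}

const updates = buildFanOutUpdate("question7", { category: "Business", difficulty: 2 });
console.log(Object.keys(updates));

// With the Firebase SDK loaded, the object could then be written in one call:
// firebase.database().ref().update(updates);
```

Writing all copies through a single update() keeps them from drifting apart, since the paths in one such call are applied together.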
