JArray-如何删除另一个列表中不包含的元素-最快/最佳性能的方式 [英] JArray - How to remove elements that are not contained in another list -fastest/best performance way

查看：143 发布时间：2021/2/14 18:34:17 c# json performance json.net

本文介绍了JArray-如何删除另一个列表中不包含的元素-最快/最佳性能的方式的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

这是我的问题的背景:

有一个JArray，包含成千上万个元素.它是样本中的雨伞(JArray).

There is an JArray with thousands of elements. It is umbrellas (JArray) in the sample.

还有另一个只有数百个元素的小列表.它称为 umbrellasToBeRemovedIds (List<string>).

There is another small list with only hundred of elements. It is called umbrellasToBeRemovedIds (List<string>).

我正在尝试这种方法来移除这些保护伞:

I am trying this approach to remove these umbrellas:

foreach (string umbrellaToRemoveId in umbrellasToBeRemovedIds)
{
    umbrellas.FirstOrDefault(o => o["Id"].Value<string>() == umbrellaToRemoveId)?.Remove();
}

考虑伞伞的尺寸要小于伞伞的尺寸. 从此伞JArray中删除元素的最快方法(从性能角度考虑)是什么?

Consider umbrellasToBeRemovedIds have smaller size than umbrellas. What is the fastest way (performance wise) to remove elements from this umbrellas JArray ?

推荐答案

JArray有两个特定的问题需要牢记在性能上:

There are two issues specific to JArray to keep in mind with respect to performance:

JArray 没有API等效于 List<T>.RemoveAll(Predicate<T>) ，该方法使用shift- List<T>参考资料来源.相反，它具有 Clear() 来删除所有项目， RemoveAt() 删除单个项目，然后

JArray does not have an API equivalent to List<T>.RemoveAll(Predicate<T>) that removes multiple entries efficiently using the shift-down-and-resize-once algorithm shown in the List<T> reference source. Instead it has Clear() to remove all items, RemoveAt() to remove a single item, and ReplaceAll() to efficiently replace the current contents with different contents.

由于实施的RemoveAt()从内部List<JToken>中删除了一个条目并向下移动了后续列表项，因此从大小n的JArray中删除k项将是O(n*k)，这很可能是对于您的应用程序还不够好.

Since RemoveAt() as implemented removes an entry from an internal List<JToken> and shifts the subsequent list items down, removing k items from a JArray of size n would be O(n*k), which is probably not good enough for your application.

在JToken层次结构中，父母与子女之间存在双向引用:

There is a bi-directional reference between parents and children in the JToken hierarchy:

JToken.Children() 遍历所有子标记给定令牌的；

JToken.Children() iterates through all child tokens of a given token;

JToken.Parent 获取a的父项给定令牌.

JToken.Parent gets the parent of a given token.

结果，JToken不能有两个父级，并且如果您尝试将已经有一个父级的JToken添加到另一个父级中，则会被克隆. IE.如果array是JArray，则

As a result, a JToken cannot have two parents, and if you attempt to add a JToken that already has a parent to another parent, it is cloned. I.e. if array is a JArray then

array[i] = array[i+1];

将克隆索引为i+1的数组条目，而不是简单地将其两次添加到数组中. (请参阅此答案以了解发生这种情况的详细信息.)

Will clone the array entry at index i+1 rather than simply adding it twice to the array. (See this answer for details why this happens.)

结果，由于在该过程中克隆了许多数组项，因此在应用程序代码中天真地实现一次下移并重新调整大小"算法将具有糟糕的性能.

As a result, naively implementing the shift-down-and-resize-once algorithm in applications code will have terrible performance as many array entries get cloned in the process.

将以上几点放在一起，以下扩展方法应具有最佳算法性能，可从大小为n的JArray中删除k个项目:

Putting the above points together, the following extension methods should have the best algorithmic performance in removing k items from a JArray of size n:

public static partial class JTokenExtensions
{
    /// <summary>
    /// Removes all the elements whose values, as defined by `selector`, belong to the collection of incoming values
    /// </summary>
    public static void RemoveAll<T>(this JArray array, IEnumerable<T> values, Func<JToken, T> selector, IEqualityComparer<T> comparer = null)
    {
        if (array == null || values == null || selector == null)
            throw new ArgumentNullException();
        var set = new HashSet<T>(values, comparer);
        array.RemoveAll(i => set.Contains(selector(i)));
    }

    /// <summary>
    /// Removes all the elements that match the conditions defined by the specified predicate.
    /// </summary>
    public static void RemoveAll(this JArray array, Predicate<JToken> match)
    {
        if (array == null || match == null)
            throw new ArgumentNullException();
        array.ReplaceAll(array.Where(i => !match(i)).ToList());
    }
}

您将按以下方式致电:

umbrellas.RemoveAll(umbrellasToBeRemovedIds, i => (string)i["id"]);

这应该具有O(n + k + k*log(k))的性能，其中k*log(k)项是构建值的哈希集的(假定的)复杂性.注意我假设"id"属性是唯一的.如果两个列表中都可以有重复的ID，并且您只想从array中删除与匹配的重复ID数量相等的项，那么将需要更复杂的算法.

This should have performance of O(n + k + k*log(k)) where the k*log(k) term is the (presumed) complexity of building the hash set of values. Note I am assuming that the "id" properties are unique. If there can be duplicated ids in both lists and you only want to remove a number of items from the array equal to the number of matching duplicate ids, then a more complex algorithm would be required.

对于潜在的点优化，例如是否更快地执行(string)i["id"]或i["Id"].Value<string>() ，我将带您到Erik Lippert的文章

As for potential point-optimizations, such as is it faster to do (string)i["id"] or i["Id"].Value<string>(), I direct you to Erik Lippert's article Which is faster, QueryLightBulbFrobStatusEx() or __WGetBulbFrobberState2()?

这篇关于JArray-如何删除另一个列表中不包含的元素-最快/最佳性能的方式的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

JArray-如何删除另一个列表中不包含的元素-最快/最佳性能的方式 [英] JArray - How to remove elements that are not contained in another list -fastest/best performance way

问题描述

推荐答案

相关文章

C#/.NET最新文章

热门教程

热门工具

登录关闭

JArray-如何删除另一个列表中不包含的元素-最快/最佳性能的方式 [英] JArray - How to remove elements that are not contained in another list -fastest/best performance way

问题描述

推荐答案

相关文章

C#/.NET最新文章

热门教程

热门工具

登录 关闭

登录关闭