JArray-如何删除另一个列表中不包含的元素-最快/最佳性能的方式 [英] JArray - How to remove elements that are not contained in another list -fastest/best performance way

查看:143
本文介绍了JArray-如何删除另一个列表中不包含的元素-最快/最佳性能的方式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是我的问题的背景:

有一个JArray,包含成千上万个元素.它是样本中的雨伞(JArray).

There is an JArray with thousands of elements. It is umbrellas (JArray) in the sample.

还有另一个只有数百个元素的小列表.它称为 umbrellasToBeRemovedIds (List<string>).

There is another small list with only hundred of elements. It is called umbrellasToBeRemovedIds (List<string>).

我正在尝试这种方法来移除这些保护伞:

I am trying this approach to remove these umbrellas:

foreach (string umbrellaToRemoveId in umbrellasToBeRemovedIds)
{
    umbrellas.FirstOrDefault(o => o["Id"].Value<string>() == umbrellaToRemoveId)?.Remove();
}

考虑伞伞的尺寸要小于伞伞的尺寸. 从此伞JArray中删除元素的最快方法(从性能角度考虑)是什么?

Consider umbrellasToBeRemovedIds have smaller size than umbrellas. What is the fastest way (performance wise) to remove elements from this umbrellas JArray ?

推荐答案

JArray有两个特定的问题需要牢记在性能上:

There are two issues specific to JArray to keep in mind with respect to performance:

  1. JArray 没有API等效于 List<T>.RemoveAll(Predicate<T>) ,该方法使用shift- List<T>参考资料来源.相反,它具有 Clear() 来删除所有项目, RemoveAt() 删除单个项目,然后

  1. JArray does not have an API equivalent to List<T>.RemoveAll(Predicate<T>) that removes multiple entries efficiently using the shift-down-and-resize-once algorithm shown in the List<T> reference source. Instead it has Clear() to remove all items, RemoveAt() to remove a single item, and ReplaceAll() to efficiently replace the current contents with different contents.

由于实施的RemoveAt()从内部List<JToken>中删除了一个条目并向下移动了后续列表项,因此从大小nJArray中删除k项将是O(n*k),这很可能是对于您的应用程序还不够好.

Since RemoveAt() as implemented removes an entry from an internal List<JToken> and shifts the subsequent list items down, removing k items from a JArray of size n would be O(n*k), which is probably not good enough for your application.

JToken层次结构中,父母与子女之间存在双向引用:

There is a bi-directional reference between parents and children in the JToken hierarchy:

  • JToken.Children() iterates through all child tokens of a given token;

JToken.Parent 获取a的父项给定令牌.

JToken.Parent gets the parent of a given token.


结果,JToken不能有两个父级,并且如果您尝试将已经有一个父级的JToken添加到另一个父级中,则会被克隆. IE.如果arrayJArray,则


As a result, a JToken cannot have two parents, and if you attempt to add a JToken that already has a parent to another parent, it is cloned. I.e. if array is a JArray then

array[i] = array[i+1];

将克隆索引为i+1的数组条目,而不是简单地将其两次添加到数组中. (请参阅此答案以了解发生这种情况的详细信息.)

Will clone the array entry at index i+1 rather than simply adding it twice to the array. (See this answer for details why this happens.)

结果,由于在该过程中克隆了许多数组项,因此在应用程序代码中天真地实现一次下移并重新调整大小"算法将具有糟糕的性能.

As a result, naively implementing the shift-down-and-resize-once algorithm in applications code will have terrible performance as many array entries get cloned in the process.

将以上几点放在一起,以下扩展方法应具有最佳算法性能,可从大小为nJArray中删除k个项目:

Putting the above points together, the following extension methods should have the best algorithmic performance in removing k items from a JArray of size n:

public static partial class JTokenExtensions
{
    /// <summary>
    /// Removes all the elements whose values, as defined by `selector`, belong to the collection of incoming values
    /// </summary>
    public static void RemoveAll<T>(this JArray array, IEnumerable<T> values, Func<JToken, T> selector, IEqualityComparer<T> comparer = null)
    {
        if (array == null || values == null || selector == null)
            throw new ArgumentNullException();
        var set = new HashSet<T>(values, comparer);
        array.RemoveAll(i => set.Contains(selector(i)));
    }

    /// <summary>
    /// Removes all the elements that match the conditions defined by the specified predicate.
    /// </summary>
    public static void RemoveAll(this JArray array, Predicate<JToken> match)
    {
        if (array == null || match == null)
            throw new ArgumentNullException();
        array.ReplaceAll(array.Where(i => !match(i)).ToList());
    }
}

您将按以下方式致电:

umbrellas.RemoveAll(umbrellasToBeRemovedIds, i => (string)i["id"]);

这应该具有O(n + k + k*log(k))的性能,其中k*log(k)项是构建值的哈希集的(假定的)复杂性.注意我假设"id"属性是唯一的.如果两个列表中都可以有重复的ID,并且您只想从array中删除与匹配的重复ID数量相等的项,那么将需要更复杂的算法.

This should have performance of O(n + k + k*log(k)) where the k*log(k) term is the (presumed) complexity of building the hash set of values. Note I am assuming that the "id" properties are unique. If there can be duplicated ids in both lists and you only want to remove a number of items from the array equal to the number of matching duplicate ids, then a more complex algorithm would be required.

对于潜在的点优化,例如是否更快地执行(string)i["id"]i["Id"].Value<string>() ,我将带您到Erik Lippert的文章

As for potential point-optimizations, such as is it faster to do (string)i["id"] or i["Id"].Value<string>(), I direct you to Erik Lippert's article Which is faster, QueryLightBulbFrobStatusEx() or __WGetBulbFrobberState2()?

这篇关于JArray-如何删除另一个列表中不包含的元素-最快/最佳性能的方式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆