JArray-如何删除另一个列表中不包含的元素-最快/最佳性能的方式 [英] JArray - How to remove elements that are not contained in another list -fastest/best performance way
问题描述
这是我的问题的背景:
有一个JArray
,包含成千上万个元素.它是样本中的雨伞(JArray
).
There is an JArray
with thousands of elements. It is umbrellas (JArray
) in the sample.
还有另一个只有数百个元素的小列表.它称为 umbrellasToBeRemovedIds (List<string>
).
There is another small list with only hundred of elements. It is called umbrellasToBeRemovedIds (List<string>
).
我正在尝试这种方法来移除这些保护伞:
I am trying this approach to remove these umbrellas:
foreach (string umbrellaToRemoveId in umbrellasToBeRemovedIds)
{
umbrellas.FirstOrDefault(o => o["Id"].Value<string>() == umbrellaToRemoveId)?.Remove();
}
考虑伞伞的尺寸要小于伞伞的尺寸.
从此伞JArray
中删除元素的最快方法(从性能角度考虑)是什么?
Consider umbrellasToBeRemovedIds have smaller size than umbrellas.
What is the fastest way (performance wise) to remove elements from this umbrellas JArray
?
推荐答案
JArray
有两个特定的问题需要牢记在性能上:
There are two issues specific to JArray
to keep in mind with respect to performance:
-
JArray
没有API等效于List<T>.RemoveAll(Predicate<T>)
,该方法使用shift-List<T>
参考资料来源.相反,它具有Clear()
来删除所有项目,RemoveAt()
删除单个项目,然后
JArray
does not have an API equivalent toList<T>.RemoveAll(Predicate<T>)
that removes multiple entries efficiently using the shift-down-and-resize-once algorithm shown in theList<T>
reference source. Instead it hasClear()
to remove all items,RemoveAt()
to remove a single item, andReplaceAll()
to efficiently replace the current contents with different contents.
由于实施的RemoveAt()
从内部List<JToken>
中删除了一个条目并向下移动了后续列表项,因此从大小n
的JArray
中删除k
项将是O(n*k)
,这很可能是对于您的应用程序还不够好.
Since RemoveAt()
as implemented removes an entry from an internal List<JToken>
and shifts the subsequent list items down, removing k
items from a JArray
of size n
would be O(n*k)
, which is probably not good enough for your application.
在JToken
层次结构中,父母与子女之间存在双向引用:
There is a bi-directional reference between parents and children in the JToken
hierarchy:
-
JToken.Children()
遍历所有子标记给定令牌的;
JToken.Children()
iterates through all child tokens of a given token;
JToken.Parent
获取a的父项给定令牌.
JToken.Parent
gets the parent of a given token.
结果,JToken
不能有两个父级,并且如果您尝试将已经有一个父级的JToken
添加到另一个父级中,则会被克隆. IE.如果array
是JArray
,则
As a result, a JToken
cannot have two parents, and if you attempt to add a JToken
that already has a parent to another parent, it is cloned. I.e. if array
is a JArray
then
array[i] = array[i+1];
将克隆索引为i+1
的数组条目,而不是简单地将其两次添加到数组中. (请参阅此答案以了解发生这种情况的详细信息.)
Will clone the array entry at index i+1
rather than simply adding it twice to the array. (See this answer for details why this happens.)
结果,由于在该过程中克隆了许多数组项,因此在应用程序代码中天真地实现一次下移并重新调整大小"算法将具有糟糕的性能.
As a result, naively implementing the shift-down-and-resize-once algorithm in applications code will have terrible performance as many array entries get cloned in the process.
将以上几点放在一起,以下扩展方法应具有最佳算法性能,可从大小为n
的JArray
中删除k
个项目:
Putting the above points together, the following extension methods should have the best algorithmic performance in removing k
items from a JArray
of size n
:
public static partial class JTokenExtensions
{
/// <summary>
/// Removes all the elements whose values, as defined by `selector`, belong to the collection of incoming values
/// </summary>
public static void RemoveAll<T>(this JArray array, IEnumerable<T> values, Func<JToken, T> selector, IEqualityComparer<T> comparer = null)
{
if (array == null || values == null || selector == null)
throw new ArgumentNullException();
var set = new HashSet<T>(values, comparer);
array.RemoveAll(i => set.Contains(selector(i)));
}
/// <summary>
/// Removes all the elements that match the conditions defined by the specified predicate.
/// </summary>
public static void RemoveAll(this JArray array, Predicate<JToken> match)
{
if (array == null || match == null)
throw new ArgumentNullException();
array.ReplaceAll(array.Where(i => !match(i)).ToList());
}
}
您将按以下方式致电:
umbrellas.RemoveAll(umbrellasToBeRemovedIds, i => (string)i["id"]);
这应该具有O(n + k + k*log(k))
的性能,其中k*log(k)
项是构建值的哈希集的(假定的)复杂性.注意我假设"id"
属性是唯一的.如果两个列表中都可以有重复的ID,并且您只想从array
中删除与匹配的重复ID数量相等的项,那么将需要更复杂的算法.
This should have performance of O(n + k + k*log(k))
where the k*log(k)
term is the (presumed) complexity of building the hash set of values. Note I am assuming that the "id"
properties are unique. If there can be duplicated ids in both lists and you only want to remove a number of items from the array
equal to the number of matching duplicate ids, then a more complex algorithm would be required.
对于潜在的点优化,例如是否更快地执行(string)i["id"]
或i["Id"].Value<string>()
,我将带您到Erik Lippert的文章
As for potential point-optimizations, such as is it faster to do (string)i["id"]
or i["Id"].Value<string>()
, I direct you to Erik Lippert's article Which is faster, QueryLightBulbFrobStatusEx()
or __WGetBulbFrobberState2()
?
这篇关于JArray-如何删除另一个列表中不包含的元素-最快/最佳性能的方式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!