Check if two pyspark Rows are equal
Question
I am writing unit tests for a Spark job, and some of the outputs are named tuples: pyspark.sql.Row
How can I assert their equality?
actual = get_data(df)
expected = Row(total=4, unique_ids=2)
self.assertEqual(actual, expected)
When I do this, the values are rearranged in an order I cannot determine.
Recommended answer
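Your code should work as written because, according to the docs for a Row created with keyword arguments:
"the fields will be sorted by names."
Note that this sorting applies to Spark versions before 3.0. From Spark 3.0 on, fields keep the order in which they were passed, and since a Row compares like a tuple, two Rows built with the same fields in a different order can compare unequal.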
Nevertheless, another way is to use the asDict() method of pyspark.sql.Row and compare them as dictionaries:
actual = get_data(df)
expected = Row(total=4, unique_ids=2)
self.assertEqual(actual.asDict(), expected.asDict())