Maximum number of columns we can have in a DataFrame (Spark Scala)

Question

I'd like to know the maximum number of columns I can have in a DataFrame. Are there any limitations on the number of columns a DataFrame can hold? Thanks.

Answer

Sparing you the details, the answer is yes: there is a limit on the number of columns in Apache Spark.

Theoretically speaking, this limit depends on the platform and on the size of the elements in each column.

Don't forget that Java is limited by the size of the JVM, and an executor is limited by that size as well, i.e. by the largest object Java can allocate on the heap.
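
To make the size argument concrete, here is a rough back-of-the-envelope sketch, not Spark's actual limit computation. It assumes a single row must fit in one JVM byte array (whose length is capped at roughly Int.MaxValue), and it uses a row layout loosely modeled on Spark's UnsafeRow: a null bitmap of one bit per field rounded up to 8-byte words, an 8-byte slot per field, plus some variable-length bytes per column. The names WideRowEstimate, rowSizeBytes and maxColumns are hypothetical.

```scala
// Rough estimate of how many columns fit in one row if the row must live in a
// single JVM byte array. Layout is an ASSUMPTION loosely modeled on UnsafeRow.
object WideRowEstimate {
  // Estimated row size: null bitmap + 8-byte fixed slot per field + variable-length data.
  def rowSizeBytes(numColumns: Long, varLenBytesPerColumn: Long): Long = {
    val nullBitmap = ((numColumns + 63) / 64) * 8 // one bit per field, 8-byte words
    val fixedSlots = numColumns * 8               // one 8-byte slot per field
    val varLen     = numColumns * varLenBytesPerColumn
    nullBitmap + fixedSlots + varLen
  }

  // Largest column count whose estimated row still fits within limitBytes
  // (binary search; rowSizeBytes grows monotonically with numColumns).
  def maxColumns(varLenBytesPerColumn: Long, limitBytes: Long = Int.MaxValue.toLong): Long = {
    var lo = 0L         // always feasible: an empty row takes 0 bytes
    var hi = limitBytes // always too big: 8 bytes per field exceeds the limit
    while (lo < hi) {
      val mid = lo + (hi - lo + 1) / 2
      if (rowSizeBytes(mid, varLenBytesPerColumn) <= limitBytes) lo = mid else hi = mid - 1
    }
    lo
  }

  def main(args: Array[String]): Unit = {
    println(s"numeric-only columns:        ${maxColumns(0)}")
    println(s"~100 extra bytes per column: ${maxColumns(100)}")
  }
}
```

This only bounds a single row by array size; in practice executor memory, GC pressure and per-column overhead in the query planner bite long before this theoretical ceiling.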

I would go back and refer to the question "Why does Spark RDD partition has 2GB limit for HDFS?", which discusses the HDFS limitation on block/partition size.

So there are actually lots of restrictions to take into account.

This means that you can easily find a hard limit (Int.MaxValue, for example), but, more importantly, Spark scales well only with long and relatively thin data (as pault stated).

Finally, remember that, fundamentally, you cannot split a single record across executors/partitions. There are also a number of practical limitations (GC, disk I/O) that make very wide data impractical, not to mention some known bugs.
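
A tiny model of why long/thin data scales and short/wide data does not: rows are the unit of distribution, so adding partitions shrinks the largest partition only while there are more rows than partitions. This is a hypothetical sketch, not Spark code; it assumes fixed-size cells, round-robin placement of whole rows, and made-up sizes, and the name LongVsWide is illustrative only.

```scala
// Hypothetical model: rows are assigned whole (never split) to partitions.
object LongVsWide {
  // Bytes in the fullest partition when numRows rows of numColumns cells,
  // each cell bytesPerCell bytes (ASSUMED fixed size), are spread round-robin.
  def largestPartitionBytes(numRows: Long, numColumns: Long, bytesPerCell: Long, numPartitions: Int): Long = {
    require(numRows > 0 && numPartitions > 0)
    val rowBytes = numColumns * bytesPerCell
    // Round-robin: the fullest partition receives ceil(numRows / numPartitions) rows.
    val rowsInFullest = (numRows + numPartitions - 1) / numPartitions
    rowsInFullest * rowBytes
  }

  def main(args: Array[String]): Unit = {
    val cells = 1000000L // same total number of cells in both layouts
    for (p <- Seq(1, 10, 100)) {
      val thin = largestPartitionBytes(cells, 1, 8, p) // 1,000,000 rows x 1 column
      val wide = largestPartitionBytes(1, cells, 8, p) // 1 row x 1,000,000 columns
      println(s"$p partitions: long/thin -> $thin bytes, short/wide -> $wide bytes")
    }
  }
}
```

With these made-up numbers, the largest long/thin partition drops from 8,000,000 to 80,000 bytes as the partition count goes from 1 to 100, while the single wide row stays at 8,000,000 bytes no matter how many partitions you add.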

Note: I mention @pault and @RameshMaharjan because this answer is actually the fruit of the discussion we had (and, of course, @zero323 for his comment on the other answer).