Unable to increase hive dynamic partitions in spark using spark-sql


Problem description

I am running a Hive query which selects data from a table and inserts the result into another Hive partitioned table using spark-sql. The insert requires 1536 dynamic partitions, but Spark fails to insert the data even though I increased the maximum number of partitions to 2000.

Below is the command:

spark-sql --master yarn --num-executors 14 --executor-memory 45G --executor-cores 30 --driver-memory 10G --conf spark.dynamicAllocation.enabled=false -e "SET hive.exec.dynamic.partition = true;SET hive.exec.dynamic.partition.mode = nonstrict;SET hive.exec.max.dynamic.partitions = 2000; insert into table weatherdata_part_rv.weather_data_daily_model_location_mapping_rv partition (model_id,record_date) select y.rec_id,x.municipal_id,x.model_id,y.record_date from (select * from weatherdata_part_rv.model_location_xref) x left outer join weatherdata_part_rv.weather_data_daily y on x.municipal_id=y.weather_station_id;"

Error stack:

 spark-sql --master yarn --num-executors 14 --executor-memory 45G --executor-cores 30 --driver-memory 10G --conf spark.dynamicAllocation.enabled=false -e "SET hive.exec.dynamic.partition = true;SET hive.exec.dynamic.partition.mode = nonstrict;SET hive.exec.max.dynamic.partitions = 2000;
> insert into table weatherdata_part_rv.weather_data_daily_model_location_mapping_rv partition (model_id,record_date) select y.rec_id,x.municipal_id,y.temprature_min_in_celcius,y.temprature_max_in_celcius,y.rainfall_in_mm,y.relative_humidity_min,y.relative_humidity_max,y.radiation_max,y.wind_intensity,y.wind_direction,y.cloud_coverage,y.soil_temprature_in_celcius,y.water_quantity_in_soil,y.lmdt,y.icon,y.probablity_of_rainfall,y.rain_acc_20feb_onwards,x.model_id,y.record_date from (select * from weatherdata_part_rv.model_location_xref) x left outer join weatherdata_part_rv.weather_data_daily y on x.municipal_id=y.weather_station_id;"
17/05/12 09:44:05 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/05/12 09:44:05 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
17/05/12 09:44:08 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
hive.exec.dynamic.partition     true
Time taken: 1.874 seconds, Fetched 1 row(s)
hive.exec.dynamic.partition.mode        nonstrict
Time taken: 0.67 seconds, Fetched 1 row(s)
hive.exec.max.dynamic.partitions        2000
Time taken: 0.047 seconds, Fetched 1 row(s)
17/05/12 09:58:30 ERROR SparkSQLDriver: Failed in [
insert into table weatherdata_part_rv.weather_data_daily_model_location_mapping_rv partition (model_id,record_date) select y.rec_id,x.municipal_id,y.temprature_min_in_celcius,y.temprature_max_in_celcius,y.rainfall_in_mm,y.relative_humidity_min,y.relative_humidity_max,y.radiation_max,y.wind_intensity,y.wind_direction,y.cloud_coverage,y.soil_temprature_in_celcius,y.water_quantity_in_soil,y.lmdt,y.icon,y.probablity_of_rainfall,y.rain_acc_20feb_onwards,x.model_id,y.record_date from (select * from weatherdata_part_rv.model_location_xref) x left outer join weatherdata_part_rv.weather_data_daily y on x.municipal_id=y.weather_station_id]
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.sql.hive.client.Shim_v1_2.loadDynamicPartitions(HiveShim.scala:823)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveClientImpl.scala:689)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:687)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:687)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
        at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:230)
        at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:229)
        at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:272)
        at org.apache.spark.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:687)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveExternalCatalog.scala:796)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:784)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:784)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95)
        at org.apache.spark.sql.hive.HiveExternalCatalog.loadDynamicPartitions(HiveExternalCatalog.scala:784)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:268)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:170)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:347)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:335)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:168)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1536, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1536.
        at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1578)
        ... 48 more
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.sql.hive.client.Shim_v1_2.loadDynamicPartitions(HiveShim.scala:823)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveClientImpl.scala:689)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:687)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:687)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
        at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:230)
        at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:229)
        at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:272)
        at org.apache.spark.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:687)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveExternalCatalog.scala:796)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:784)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:784)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95)
        at org.apache.spark.sql.hive.HiveExternalCatalog.loadDynamicPartitions(HiveExternalCatalog.scala:784)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:268)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:170)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:347)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:335)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:168)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1536, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1536.
        at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1578)
        ... 48 more

Is there any limit on the maximum number of Hive partitions in Spark?

If so, is there any way to increase the maximum number of partitions?

Answer

Can you add the below property to hive-site.xml, both at spark_home/conf/hive-site.xml and hive-home/conf/hive-site.xml?

hive.exec.max.dynamic.partitions=2000

<property>
    <name>hive.exec.max.dynamic.partitions</name>
    <value>2000</value>
    <description></description>
</property>
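
If editing hive-site.xml on every node is not convenient, the same limits can also be passed when launching spark-sql. This is only a sketch under assumptions: it assumes the spark-sql CLI in this Spark build accepts --hiveconf options and forwards spark.hadoop.* settings to its embedded Hive client, which can vary by Spark version.

# Sketch (not verified on this cluster): set the limits at launch time instead of
# via SET inside the session, so the embedded Hive client already sees them when
# it calls loadDynamicPartitions. Replace the -e query with the original insert.
spark-sql --master yarn \
  --hiveconf hive.exec.dynamic.partition=true \
  --hiveconf hive.exec.dynamic.partition.mode=nonstrict \
  --hiveconf hive.exec.max.dynamic.partitions=2000 \
  --conf spark.hadoop.hive.exec.max.dynamic.partitions=2000 \
  -e "insert into table weatherdata_part_rv.weather_data_daily_model_location_mapping_rv partition (model_id,record_date) select ..."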

Hope this resolves the issue.

If the value is not being picked up, try restarting the HiveServer2 (hs2) process.
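
After the change (and a restart if needed), the new limit can be echoed back from a spark-sql session, the same way the settings are echoed in the log above; the exact output formatting may vary by Spark version.

spark-sql -e "SET hive.exec.max.dynamic.partitions;"
# expected to print something like:
# hive.exec.max.dynamic.partitions        2000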

