在 pyspark 中获取 OutofMemoryError-GC 开销限制超出 [英] Getting OutofMemoryError- GC overhead limit exceed in pyspark

查看：34 发布时间：2021/11/14 22:24:40 apache-spark pyspark apache-spark-sql udf pyspark-sql

本文介绍了在 pyspark 中获取 OutofMemoryError-GC 开销限制超出的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

在项目的中间，我在我的 spark sql 查询中调用了一个函数后出现了以下错误

in the middle of project i am getting bellow error after invoking a function in my spark sql query

我编写了一个用户定义的函数，它将接受两个字符串并在连接后将它们连接起来，它将取最右边的子字符串长度为 5 取决于总字符串长度(sql server 的 right(string,integer) 的替代方法)

i have written a user define function which will take two string and concat them after concatenation it will take right most substring length of 5 depend on total string length(alternate method of right(string,integer) of sql server )

  from pyspark.sql.types import*


def concatstring(xstring, ystring):
            newvalstring = xstring+ystring
            print newvalstring
            if(len(newvalstring)==6):
                stringvalue=newvalstring[1:6]
                return stringvalue
            if(len(newvalstring)==7):
                stringvalue1=newvalstring[2:7]
                return stringvalue1
            else:
                return '99999'


spark.udf.register ('rightconcat', lambda x,y:concatstring(x,y), StringType())

它单独工作正常.现在，当我在我的 spark sql 查询中将它作为列传递时，发生了此异常查询是

it works fine individually. now when i pass it in my spark sql query as column this exception occured the query is

书面查询是

spark.sql("select d.BldgID,d.LeaseID,d.SuiteID,coalesce(BLDG.BLDGNAME,('select EmptyDefault from EmptyDefault')) as LeaseBldgName,coalesce(l.OCCPNAME,('select EmptyDefault from EmptyDefault'))as LeaseOccupantName, coalesce(l.DBA, ('select EmptyDefault from EmptyDefault')) as LeaseDBA, coalesce(l.CONTNAME, ('select EmptyDefault from EmptyDefault')) as LeaseContact,coalesce(l.PHONENO1, '')as LeasePhone1,coalesce(l.PHONENO2, '')as LeasePhone2,coalesce(l.NAME, '') as LeaseName,coalesce(l.ADDRESS, '') as LeaseAddress1,coalesce(l.ADDRESS2,'') as LeaseAddress2,coalesce(l.CITY, '')as LeaseCity, coalesce(l.STATE, ('select EmptyDefault from EmptyDefault'))as LeaseState,coalesce(l.ZIPCODE, '')as LeaseZip, coalesce(l.ATTENT, '') as LeaseAttention,coalesce(l.TTYPID, ('select EmptyDefault from EmptyDefault'))as LeaseTenantType,coalesce(TTYP.TTYPNAME, ('select EmptyDefault from EmptyDefault'))as LeaseTenantTypeName,l.OCCPSTAT as LeaseCurrentOccupancyStatus,l.EXECDATE as LeaseExecDate, l.RENTSTRT as LeaseRentStartDate,l.OCCUPNCY as LeaseOccupancyDate,l.BEGINDATE as LeaseBeginDate,l.EXPIR as LeaseExpiryDate,l.VACATE as LeaseVacateDate,coalesce(l.STORECAT, (select EmptyDefault from EmptyDefault)) as LeaseStoreCategory ,rightconcat('00000',cast(coalesce(SCAT.SORTSEQ,99999) as string)) as LeaseStoreCategorySortID from Dim_CMLease_primer d join LEAS l on l.BLDGID=d.BldgID and l.LEASID=d.LeaseID left outer join SUIT on SUIT.BLDGID=l.BLDGID and SUIT.SUITID=l.SUITID left outer join BLDG on BLDG.BLDGID= l.BLDGID left outer join SCAT on SCAT.STORCAT=l.STORECAT left outer join TTYP on TTYP.TTYPID = l.TTYPID").show()

我在这里上传了查询和查询后的状态.我怎么能解决这个问题.请指导我

i have uploaded the the query and after query state here. how could i solve this problem. Kindly guide me

在 pyspark 中获取 OutofMemoryError-GC 开销限制超出 [英] Getting OutofMemoryError- GC overhead limit exceed in pyspark

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

在 pyspark 中获取 OutofMemoryError-GC 开销限制超出 [英] Getting OutofMemoryError- GC overhead limit exceed in pyspark

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭