'Column' object is not callable in Spark

Question

I installed Spark and ran the commands from the quick start tutorial (https://spark.apache.org/docs/latest/quick-start.html), but I get the following error:

P-MBP:spark-2.0.2-bin-hadoop2.4 prem$ ./bin/pyspark 
Python 2.7.13 (default, Apr  4 2017, 08:44:49) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/09/12 17:26:53 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/

Using Python version 2.7.13 (default, Apr  4 2017 08:44:49)
SparkSession available as 'spark'.
>>> textFile = spark.read.text("README.md")
>>> textFile.count()
99
>>> textFile.first()
Row(value=u'# Apache Spark')
>>> linesWithSpark = textFile.filter(textFile.value.contains("Spark"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Column' object is not callable


>>> dir(textFile.value)
['__add__', '__and__', '__bool__', '__class__', '__contains__', '__delattr__', '__dict__', '__div__', '__doc__', '__eq__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__getitem__', '__getslice__', '__gt__', '__hash__', '__init__', '__invert__', '__iter__', '__le__', '__lt__', '__mod__', '__module__', '__mul__', '__ne__', '__neg__', '__new__', '__nonzero__', '__or__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rsub__', '__rtruediv__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__weakref__', '_jc', 'alias', 'asc', 'astype', 'between', 'bitwiseAND', 'bitwiseOR', 'bitwiseXOR', 'cast', 'desc', 'endswith', 'getField', 'getItem', 'isNotNull', 'isNull', 'isin', 'like', 'name', 'otherwise', 'over', 'rlike', 'startswith', 'substr', 'when']

Recommended answer

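The Column.contains method was added in Spark 2.2 (SPARK-19706). You are using Spark 2.0.2, where it does not exist; the dir() output in the question confirms that contains is not among the attributes. Because the method is missing, the attribute access textFile.value.contains falls through to __getattr__ (dot syntax) and is resolved as a nested Column. Calling that Column with ("Spark") is what raises the TypeError.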
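The mechanism can be reproduced without Spark. The sketch below uses a simplified stand-in class (it is not the real pyspark.sql.Column, and its __getattr__ only imitates the nested-field behaviour described above) to show why a missing method name produces a Column that then fails when called:

```python
class Column(object):
    """Simplified stand-in for pyspark.sql.Column (hypothetical, not the real class)."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, item):
        # Python calls __getattr__ only when normal attribute lookup fails.
        # Any unknown name is treated as a nested field and wrapped in a new Column.
        if item.startswith("__"):
            raise AttributeError(item)
        return Column("%s.%s" % (self.name, item))

    def like(self, pattern):
        # A real method: found by normal lookup, so __getattr__ is never involved.
        return Column("%s LIKE '%s'" % (self.name, pattern))


value = Column("value")

# "contains" is not defined on this class, so this is a nested Column, not a method.
nested = value.contains

try:
    nested("Spark")  # calling a Column instance
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(nested.name)
print(error_message)
```

The key point is that the attribute access itself succeeds silently; the error appears only at the call, which is why the message mentions Column rather than a missing attribute.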

You can use like instead:

textFile.filter(textFile.value.like("%Spark%"))
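If the same code has to run on both older and newer Spark versions, one option is a small helper that picks contains when the Column class defines it and falls back to like otherwise. This is only a sketch: contains_filter is a hypothetical name, not part of PySpark. The check is made on the class rather than the instance, because on an instance any unknown name would be resolved through __getattr__ and the check would always pass.

```python
def contains_filter(column, needle):
    """Build a 'column contains needle' condition on old and new Spark versions.

    Hypothetical helper, not part of PySpark.
    """
    # Look on the class: instance lookup would fall through to __getattr__
    # and report every name as present.
    if hasattr(type(column), "contains"):
        return column.contains(needle)
    # Fallback for versions without Column.contains.
    # Note: % and _ inside needle act as wildcards in a LIKE pattern.
    return column.like("%" + needle + "%")
```

With the session from the question, this would be used as textFile.filter(contains_filter(textFile.value, "Spark")). The fallback is not an exact equivalent when the search string itself contains % or _.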
