How do I import classes from one or more local .jar files into a Spark/Scala Notebook?
Problem description
I am struggling to load classes from JARs into my Scala-Spark kernel Jupyter notebook. I have jars at this location:
/home/hadoop/src/main/scala/com/linkedin/relevance/isolationforest/
with the following contents:
-rwx------ 1 hadoop hadoop 7170 Sep 11 20:54 BaggedPoint.scala
-rw-rw-r-- 1 hadoop hadoop 186719 Sep 11 21:36 isolation-forest_2.3.0_2.11-1.0.1.jar
-rw-rw-r-- 1 hadoop hadoop 1482 Sep 11 21:36 isolation-forest_2.3.0_2.11-1.0.1-javadoc.jar
-rw-rw-r-- 1 hadoop hadoop 20252 Sep 11 21:36 isolation-forest_2.3.0_2.11-1.0.1-sources.jar
-rwx------ 1 hadoop hadoop 16133 Sep 11 20:54 IsolationForestModelReadWrite.scala
-rwx------ 1 hadoop hadoop 5740 Sep 11 20:54 IsolationForestModel.scala
-rwx------ 1 hadoop hadoop 4057 Sep 11 20:54 IsolationForestParams.scala
-rwx------ 1 hadoop hadoop 11301 Sep 11 20:54 IsolationForest.scala
-rwx------ 1 hadoop hadoop 7990 Sep 11 20:54 IsolationTree.scala
drwxrwxr-x 2 hadoop hadoop 157 Sep 11 21:35 libs
-rwx------ 1 hadoop hadoop 1731 Sep 11 20:54 Nodes.scala
-rwx------ 1 hadoop hadoop 854 Sep 11 20:54 Utils.scala
When I attempt to load the IsolationForest class like so:
import com.linkedin.relevance.isolationforest.IsolationForest
I get the following error in my notebook:
<console>:33: error: object linkedin is not a member of package com
import com.linkedin.relevance.isolationforest.IsolationForest
I've been Googling for several hours to get to this point, but am unable to progress further. What is the next step?
By the way, I am attempting to use this package: https://github.com/linkedin/isolation-forest
Thanks.
Recommended answer
For Scala:
If you're using spylon-kernel, you can specify additional jars in the %%init_spark section, as described in the docs (the first line is for a jar file, the second is for a package, as explained below):
%%init_spark
launcher.jars = ["/some/local/path/to/a/file.jar"]
launcher.packages = ["com.acme:super:1.0.1"]
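For the jar from the question, a minimal %%init_spark cell would look like the sketch below (the path is the one shown in the question). Note that this cell configures the launcher, so it has to run before Spark is initialized; restart the kernel first if Spark has already started:

%%init_spark
launcher.jars = ["/home/hadoop/src/main/scala/com/linkedin/relevance/isolationforest/isolation-forest_2.3.0_2.11-1.0.1.jar"]

After that, the import com.linkedin.relevance.isolationforest.IsolationForest from the question should resolve in a subsequent cell.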
For Python:
Do the following:
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars <full_path_to>/isolation-forest_2.3.0_2.11-1.0.1.jar pyspark-shell'
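The key detail is that PYSPARK_SUBMIT_ARGS is only read when the JVM is launched, so it must be set before the SparkSession is created. A minimal sketch using the path from the question (the app name is just an illustrative placeholder):

import os

# Must be set before the SparkSession (and its JVM) is created.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--jars /home/hadoop/src/main/scala/com/linkedin/relevance/isolationforest/'
    'isolation-forest_2.3.0_2.11-1.0.1.jar pyspark-shell'
)

from pyspark.sql import SparkSession

# The jar is now on the classpath of this session.
spark = SparkSession.builder.appName("isolation-forest-demo").getOrCreate()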
This will add the jars into the PySpark context. But it's better to use --packages instead of --jars, because it will also fetch all necessary dependencies and put everything into the internal cache. For example:
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.linkedin.isolation-forest:isolation-forest_2.3.0_2.11:1.0.0 pyspark-shell'
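If you create the SparkSession yourself, the same coordinates can equivalently be passed through the spark.jars.packages config on the builder; a sketch, with the app name again a placeholder:

from pyspark.sql import SparkSession

# spark.jars.packages resolves the artifact and its transitive
# dependencies from Maven Central before the session starts.
spark = (SparkSession.builder
         .appName("isolation-forest-demo")
         .config("spark.jars.packages",
                 "com.linkedin.isolation-forest:isolation-forest_2.3.0_2.11:1.0.0")
         .getOrCreate())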
You only need to select the version that matches your PySpark and Scala versions (2.3.x and 2.4 are Scala 2.11, 3.0 is Scala 2.12), as listed in the Git repo.
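If you're not sure which build you're running, a quick check from PySpark (spark.version is public API; reading the Scala version goes through the internal _jvm handle, so treat that line as a best-effort sketch):

print(spark.version)  # e.g. '2.4.4' -> use the _2.11 artifacts
print(spark.sparkContext._jvm.scala.util.Properties.versionString())  # e.g. 'version 2.11.12'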