How do I import classes from one or more local .jar files into a Spark/Scala Notebook?

Problem description

I am struggling to load classes from JARs into my Scala-Spark kernel Jupyter notebook. I have jars at this location:

/home/hadoop/src/main/scala/com/linkedin/relevance/isolationforest/

with the following contents:

-rwx------ 1 hadoop hadoop   7170 Sep 11 20:54 BaggedPoint.scala
-rw-rw-r-- 1 hadoop hadoop 186719 Sep 11 21:36 isolation-forest_2.3.0_2.11-1.0.1.jar
-rw-rw-r-- 1 hadoop hadoop   1482 Sep 11 21:36 isolation-forest_2.3.0_2.11-1.0.1-javadoc.jar
-rw-rw-r-- 1 hadoop hadoop  20252 Sep 11 21:36 isolation-forest_2.3.0_2.11-1.0.1-sources.jar
-rwx------ 1 hadoop hadoop  16133 Sep 11 20:54 IsolationForestModelReadWrite.scala
-rwx------ 1 hadoop hadoop   5740 Sep 11 20:54 IsolationForestModel.scala
-rwx------ 1 hadoop hadoop   4057 Sep 11 20:54 IsolationForestParams.scala
-rwx------ 1 hadoop hadoop  11301 Sep 11 20:54 IsolationForest.scala
-rwx------ 1 hadoop hadoop   7990 Sep 11 20:54 IsolationTree.scala
drwxrwxr-x 2 hadoop hadoop    157 Sep 11 21:35 libs
-rwx------ 1 hadoop hadoop   1731 Sep 11 20:54 Nodes.scala
-rwx------ 1 hadoop hadoop    854 Sep 11 20:54 Utils.scala

When I attempt to load the IsolationForest class like so:

import com.linkedin.relevance.isolationforest.IsolationForest

I get the following error in my notebook:

<console>:33: error: object linkedin is not a member of package com
       import com.linkedin.relevance.isolationforest.IsolationForest

I've been Googling for several hours now to get to this point but am unable to progress further. What is the next step?

By the way, I am attempting to use this package: https://github.com/linkedin/isolation-forest

Thank you.

Answer

For Scala:

If you're using spylon-kernel, you can specify additional jars in the %%init_spark section, as described in the docs (the first line is for a jar file, the second for a package, as described below):

%%init_spark
launcher.jars = ["/some/local/path/to/a/file.jar"]
launcher.packages = ["com.acme:super:1.0.1"]
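
As a concrete sketch for the jar in this question (the path is taken from the directory listing above; adjust it if your layout differs):

%%init_spark
launcher.jars = ["/home/hadoop/src/main/scala/com/linkedin/relevance/isolationforest/isolation-forest_2.3.0_2.11-1.0.1.jar"]

%%init_spark runs before the Spark session is created, so restart the kernel if Spark has already started; after that, the import from the question should resolve in a regular cell:

import com.linkedin.relevance.isolationforest.IsolationForest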

For Python:

do the following:

import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars <full_path_to>/isolation-forest_2.3.0_2.11-1.0.1.jar pyspark-shell'
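
One caveat: PYSPARK_SUBMIT_ARGS is only read when the JVM is launched, so it must be set before the first SparkSession is created. A minimal end-to-end sketch, using the jar path from the question's directory listing:

import os

# Must run before the first SparkSession/SparkContext is created;
# the launcher reads this variable only once, at JVM start-up.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--jars /home/hadoop/src/main/scala/com/linkedin/relevance/isolationforest/'
    'isolation-forest_2.3.0_2.11-1.0.1.jar pyspark-shell'
)

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()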

Either way, this adds the jar to the PySpark context. But it's better to use --packages instead of --jars, because --packages also fetches all the necessary dependencies and puts everything into the local Ivy cache. For example:

import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.linkedin.isolation-forest:isolation-forest_2.3.0_2.11:1.0.0 pyspark-shell'
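
Once the session is up, a quick way to confirm that the JVM can actually see the class is to look it up through PySpark's py4j gateway. Note that spark._jvm is an internal attribute, so treat this purely as a debugging check:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Raises a Py4JJavaError wrapping ClassNotFoundException if the jar
# was not picked up; prints the class name if it was.
clazz = spark._jvm.java.lang.Class.forName(
    "com.linkedin.relevance.isolationforest.IsolationForest")
print(clazz.getName())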

You only need to pick the version that matches your PySpark and Scala versions (2.3.x and 2.4 use Scala 2.11, 3.0 uses Scala 2.12), as listed in the Git repo.
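
If you're not sure which build you're running, the Spark version is one line away in the notebook, and the Scala suffix follows from the mapping above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 2.3.x / 2.4.x -> use the _2.11 artifacts; 3.0.x -> use the _2.12 artifacts
print(spark.version)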
