How to turn off INFO logging in Spark?
Problem description
I installed Spark using the AWS EC2 guide and I can launch the program fine using the bin/pyspark script to get to the Spark prompt, and I can also do the Quick Start guide successfully. However, I cannot for the life of me figure out how to stop all of the verbose INFO logging after each command.

I have tried nearly every possible scenario in the code below (commenting out, setting to OFF) in my log4j.properties file in the conf folder where I launch the application from, as well as on each node, and nothing is doing anything. I still get the INFO statements printing after executing each statement. I am very confused about how this is supposed to work.
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
Here is my full classpath when I use SPARK_PRINT_LAUNCH_COMMAND:

Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/bin/java -cp :/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/lib/spark-assembly-1.0.1-hadoop2.2.0.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main
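For reference, the launcher scripts print this command when the variable is set before starting the shell; a minimal sketch, assuming you run it from the Spark installation directory:

# have the launcher echo the underlying java command it is about to run
SPARK_PRINT_LAUNCH_COMMAND=1 ./bin/pyspark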
Contents of spark-env.sh:

#!/usr/bin/env bash
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.

# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
# - SPARK_CLASSPATH=/root/spark-1.0.1-bin-hadoop2/conf/

# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos

# Options read in YARN client mode
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_EXECUTOR_INSTANCES, Number of workers to start (Default: 2)
# - SPARK_EXECUTOR_CORES, Number of cores for the workers (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: 'default')
# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.

# Options for the daemons used in the standalone deploy mode:
# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers

export SPARK_SUBMIT_CLASSPATH="$FWDIR/conf"
Solution

Just execute this command in the Spark directory:
cp conf/log4j.properties.template conf/log4j.properties
Edit log4j.properties:
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
In the first line, replace:
log4j.rootCategory=INFO, console
with:
log4j.rootCategory=WARN, console
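The same edit can be scripted; a minimal sketch, assuming a standard sed and that you run it from the Spark directory:

# switch the root logger from INFO to WARN in place (keeps a .bak copy)
sed -i.bak 's/^log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/' conf/log4j.properties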
Save and restart your shell. It works for me for Spark 1.1.0 and Spark 1.5.1 on OS X.
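If you are on Spark 1.4 or later, there is also a runtime alternative: SparkContext gained a setLogLevel method, so you can change the level from inside the pyspark shell without editing any files:

sc.setLogLevel("WARN")

Valid level names include ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, and WARN. Note this only takes effect once the SparkContext is up; the startup INFO messages still follow log4j.properties.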