How to turn off INFO logging in PySpark?
Question
I installed Spark using the AWS EC2 guide and I can launch the program fine using the bin/pyspark script to get to the Spark prompt, and I can also complete the Quick Start guide successfully.

However, I cannot for the life of me figure out how to stop all of the verbose INFO logging after each command.

I have tried nearly every possible scenario in the code below (commenting out, setting to OFF) in my log4j.properties file in the conf folder where I launch the application from, as well as on each node, and nothing does anything. I still get the INFO logging statements printing after each statement executes.

I am very confused about how this is supposed to work.
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
Here is my full classpath when I use SPARK_PRINT_LAUNCH_COMMAND:

Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/bin/java -cp :/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/lib/spark-assembly-1.0.1-hadoop2.2.0.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main
Contents of spark-env.sh:

#!/usr/bin/env bash
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.

# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
# - SPARK_CLASSPATH=/root/spark-1.0.1-bin-hadoop2/conf/

# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos

# Options read in YARN client mode
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_EXECUTOR_INSTANCES, Number of workers to start (Default: 2)
# - SPARK_EXECUTOR_CORES, Number of cores for the workers (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: 'default')
# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.

# Options for the daemons used in the standalone deploy mode:
# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers

export SPARK_SUBMIT_CLASSPATH="$FWDIR/conf"
Solution

Just execute this command in the Spark directory:
cp conf/log4j.properties.template conf/log4j.properties
Edit log4j.properties:
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
Then, on the first line, replace:

log4j.rootCategory=INFO, console

with:
log4j.rootCategory=WARN, console
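The copy-and-edit steps above can also be scripted with sed. A minimal sketch, demonstrated on a throwaway temp file here; in practice you would point the sed command at your own conf/log4j.properties (the path depends on your install):

```shell
# Demonstration on a temporary copy; run the same sed line against
# your real conf/log4j.properties to apply the change for good.
tmp=$(mktemp)
printf 'log4j.rootCategory=INFO, console\n' > "$tmp"

# Demote the root logger from INFO to WARN in place
# (-i.bak keeps a backup and works with both GNU and BSD/OS X sed)
sed -i.bak 's/^log4j.rootCategory=INFO, console$/log4j.rootCategory=WARN, console/' "$tmp"

cat "$tmp"   # prints: log4j.rootCategory=WARN, console
```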
Save the file and restart your shell. This works for me with Spark 1.1.0 and Spark 1.5.1 on OS X.