Apache Spark 2.3.0 fails when launched with --master yarn: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster

I have installed Apache Hadoop 2.7.5 and Apache Spark 2.3.0.
When I submit my job with --master local[*], it runs fine. But when I run it with --master yarn, the logs in the web UI show this error:

Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster

Here is the command I am running:

spark-submit --class com.spark.SparkTest --master yarn --deploy-mode cluster /root/Downloads/SimpleSpark-0.0.1-SNAPSHOT.jar

And the console output is:

[root@localhost sbin]# spark-submit --class com.spark.SparkTest --master yarn --deploy-mode cluster /root/Downloads/SimpleSpark-0.0.1-SNAPSHOT.jar
2018-05-12 17:24:37 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-05-12 17:24:39 INFO  RMProxy:98 - Connecting to ResourceManager at /0.0.0.0:8032
2018-05-12 17:24:40 INFO  Client:54 - Requesting a new application from cluster with 1 NodeManagers
2018-05-12 17:24:40 INFO  Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
2018-05-12 17:24:40 INFO  Client:54 - Will allocate AM container, with 1408 MB memory including 384 MB overhead
2018-05-12 17:24:40 INFO  Client:54 - Setting up container launch context for our AM
2018-05-12 17:24:40 INFO  Client:54 - Setting up the launch environment for our AM container
2018-05-12 17:24:40 INFO  Client:54 - Preparing resources for our AM container
2018-05-12 17:24:43 INFO  Client:54 - Uploading resource file:/opt/spark-2.3.0/yarn/spark-2.3.0-yarn-shuffle.jar -> hdfs://localhost:9000/user/root/.sparkStaging/application_1526143826498_0001/spark-2.3.0-yarn-shuffle.jar
2018-05-12 17:24:45 INFO  Client:54 - Uploading resource file:/root/Downloads/SimpleSpark-0.0.1-SNAPSHOT.jar -> hdfs://localhost:9000/user/root/.sparkStaging/application_1526143826498_0001/SimpleSpark-0.0.1-SNAPSHOT.jar
2018-05-12 17:24:45 WARN  DFSClient:611 - Caught exception
java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1252)
        at java.lang.Thread.join(Thread.java:1326)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
2018-05-12 17:24:45 WARN  Client:66 - Same name resource file:/opt/spark-2.3.0/yarn/spark-2.3.0-yarn-shuffle.jar added multiple times to distributed cache
2018-05-12 17:24:45 INFO  Client:54 - Uploading resource file:/tmp/spark-6db13382-d02d-4e8a-b5bf-5aafd535ba1e/__spark_conf__789951835863303071.zip -> hdfs://localhost:9000/user/root/.sparkStaging/application_1526143826498_0001/__spark_conf__.zip
2018-05-12 17:24:46 WARN  DFSClient:611 - Caught exception
java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1252)
        at java.lang.Thread.join(Thread.java:1326)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
2018-05-12 17:24:46 INFO  SecurityManager:54 - Changing view acls to: root
2018-05-12 17:24:46 INFO  SecurityManager:54 - Changing modify acls to: root
2018-05-12 17:24:46 INFO  SecurityManager:54 - Changing view acls groups to:
2018-05-12 17:24:46 INFO  SecurityManager:54 - Changing modify acls groups to:
2018-05-12 17:24:46 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
2018-05-12 17:24:46 INFO  Client:54 - Submitting application application_1526143826498_0001 to ResourceManager
2018-05-12 17:24:46 INFO  YarnClientImpl:273 - Submitted application application_1526143826498_0001
2018-05-12 17:24:47 INFO  Client:54 - Application report for application_1526143826498_0001 (state: ACCEPTED)
2018-05-12 17:24:47 INFO  Client:54 -
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1526145886541
         final status: UNDEFINED
         tracking URL: http://localhost.localdomain:8088/proxy/application_1526143826498_0001/
         user: root
2018-05-12 17:24:48 INFO  Client:54 - Application report for application_1526143826498_0001 (state: ACCEPTED)
2018-05-12 17:24:49 INFO  Client:54 - Application report for application_1526143826498_0001 (state: ACCEPTED)
2018-05-12 17:24:50 INFO  Client:54 - Application report for application_1526143826498_0001 (state: ACCEPTED)
2018-05-12 17:24:51 INFO  Client:54 - Application report for application_1526143826498_0001 (state: ACCEPTED)
2018-05-12 17:24:52 INFO  Client:54 - Application report for application_1526143826498_0001 (state: ACCEPTED)
2018-05-12 17:24:53 INFO  Client:54 - Application report for application_1526143826498_0001 (state: ACCEPTED)
2018-05-12 17:24:54 INFO  Client:54 - Application report for application_1526143826498_0001 (state: FAILED)
2018-05-12 17:24:54 INFO  Client:54 -
         client token: N/A
         diagnostics: Application application_1526143826498_0001 failed 2 times due to AM Container for appattempt_1526143826498_0001_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://localhost.localdomain:8088/cluster/app/application_1526143826498_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1526143826498_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
        at org.apache.hadoop.util.Shell.run(Shell.java:482)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1526145886541
         final status: FAILED
         tracking URL: http://localhost.localdomain:8088/cluster/app/application_1526143826498_0001
         user: root
2018-05-12 17:24:54 INFO  Client:54 - Deleted staging directory hdfs://localhost:9000/user/root/.sparkStaging/application_1526143826498_0001
Exception in thread "main" org.apache.spark.SparkException: Application application_1526143826498_0001 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1159)
        at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1518)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-05-12 17:24:55 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-05-12 17:24:55 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-6db13382-d02d-4e8a-b5bf-5aafd535ba1e
2018-05-12 17:24:55 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-1218ca67-7fae-4c0b-b678-002963a1cf08

The diagnostics say:

Application application_1526143826498_0001 failed 2 times due to AM Container for appattempt_1526143826498_0001_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://localhost.localdomain:8088/cluster/app/application_1526143826498_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1526143826498_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
at org.apache.hadoop.util.Shell.run(Shell.java:482)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.

When I click through to the logs for details, I see:

Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster

Here is my spark-defaults.conf:

spark.master                     spark://localhost.localdomain:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://localhost.localdomain:8021/user/spark/logs
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.memory              1g
spark.executor.memory            1g
spark.yarn.dist.jars             /opt/spark-2.3.0/yarn/spark-2.3.0-yarn-shuffle.jar
spark.yarn.jars                  /opt/spark-2.3.0/yarn/spark-2.3.0-yarn-shuffle.jar
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

My spark-env.sh:

SPARK_MASTER_HOST=localhost.localdomain
SPARK_MASTER_PORT=7077
SPARK_LOCAL_IP=localhost.localdomain
SPARK_CONF_DIR=${SPARK_HOME}/conf
HADOOP_CONF_DIR=/opt/hadoop-2.7.5/etc/hadoop
YARN_CONF_DIR=/opt/hadoop-2.7.5/etc/hadoop
SPARK_EXECUTOR_CORES=2
SPARK_EXECUTOR_MEMORY=500M
SPARK_DRIVER_MEMORY=500M

And my yarn-site.xml:

<configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
                <name>yarn.nodemanager.vmem-check-enabled</name>
                <value>false</value>
        </property>
        <property>
                <name>yarn.application.classpath</name>
                <value>
                /opt/hadoop-2.7.5/etc/hadoop,
                /opt/hadoop-2.7.5/*,
                /opt/hadoop-2.7.5/lib/*,
                /opt/hadoop-2.7.5/share/hadoop/common/*,
                /opt/hadoop-2.7.5/share/hadoop/common/lib/*
                /opt/hadoop-2.7.5/share/hadoop/hdfs/*,
                /opt/hadoop-2.7.5/share/hadoop/hdfs/lib/*,
                /opt/hadoop-2.7.5/share/hadoop/mapreduce/*,
                /opt/hadoop-2.7.5/share/hadoop/mapreduce/lib/*,
                /opt/hadoop-2.7.5/share/hadoop/tools/lib/*,
                /opt/hadoop-2.7.5/share/hadoop/yarn/*,
                /opt/hadoop-2.7.5/share/hadoop/yarn/lib/*
                </value>
        </property>
</configuration>

I copied spark-yarn_2.11-2.3.0.jar to /opt/hadoop-2.7.5/share/hadoop/yarn/*.
I went through several Stack Overflow solutions that suggested passing --conf "spark.driver.extraJavaOptions=-Diop.version=4.1.0.0", but that did not work in my case.
One solution mentioned missing jars for logging, but I am not sure which jars it meant. Am I missing something?

1 Answer

Try running the command with the required jars added via --jars:

spark-submit --class com.spark.SparkTest --master yarn --jars /fullpath/first.jar,/fullpath/second.jar --deploy-mode cluster /root/Downloads/SimpleSpark-0.0.1-SNAPSHOT.jar

Or add the jars in conf/spark-defaults.conf with lines such as:

spark.driver.extraClassPath /fullpath/yarn-jar.jar:/fullpath/second.jar
spark.executor.extraClassPath /fullpath/yarn-jar.jar:/fullpath/second.jar
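
The /fullpath/... placeholders above still leave open which jar to pass. The error means the JVM in the AM container could not find org.apache.spark.deploy.yarn.ApplicationMaster on its classpath, so one way to narrow it down is to look for the jar that actually contains that class. Below is a minimal sketch in Python 3, not a definitive tool. The function names jars_containing and format_for_submit are made up for illustration, and the directory /opt/spark-2.3.0/jars is only an assumption based on the paths in the question.

```python
import zipfile
from pathlib import Path

# The class named in the error message, written as an entry path inside a jar.
AM_CLASS_ENTRY = "org/apache/spark/deploy/yarn/ApplicationMaster.class"


def jars_containing(jar_dir, entry=AM_CLASS_ENTRY):
    """Return the sorted paths of the .jar files in jar_dir that contain entry."""
    matches = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    matches.append(str(jar))
        except zipfile.BadZipFile:
            # Not a readable archive; skip it instead of aborting the scan.
            continue
    return matches


def format_for_submit(jars):
    """Join jar paths for the two options used in the answer.

    --jars takes a comma-separated list, while the extraClassPath
    settings take a colon-separated classpath (on Linux).
    """
    return {"--jars": ",".join(jars), "extraClassPath": ":".join(jars)}


if __name__ == "__main__":
    # Assumed location of the Spark jars; adjust to your installation.
    found = jars_containing("/opt/spark-2.3.0/jars")
    print(found or "no jar in this directory contains the class")
    print(format_for_submit(found))
```

If the scan finds a jar, the two printed strings can replace the /fullpath/... placeholders in the --jars option and in the extraClassPath lines respectively. If it finds nothing, point it at another directory that might hold the Spark jars.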