Selecting data from a Hive table
I have time-series data in a Hive table (21.1 GB), stored as a large number of Parquet part files on HDFS. The table is partitioned by sid, year, and date. I need to run a SELECT query on this table and then do some computations. The job fails at the SELECT step: it is killed without any error message.
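For reference, the table layout implied by the query and the partitioning is roughly the following (a hypothetical DDL sketch; the column types are my assumptions, only the column names and partition keys come from the question):

// Hypothetical DDL illustrating the layout described above; types are assumed.
hiveContext.sql(
  """CREATE TABLE IF NOT EXISTS qf_testcase_1_5_movingvariance (
    |  timeStamp BIGINT,
    |  movingVariance DOUBLE
    |)
    |PARTITIONED BY (sid STRING, year INT, date_col STRING)
    |STORED AS PARQUET""".stripMargin)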
Query:
hiveContext.sql(s"SELECT timeStamp, movingVariance, sid, year, date_col FROM qf_testcase_1_5_movingvariance WHERE sid = '$stationId' ORDER BY timeStamp")
P.S. I can run the same kind of SELECT on a 1.6 TB Hive table with the same configuration without the job failing. In both cases I select iteratively, one query per sid, to limit the number of records (a sketch of that loop follows).
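The per-sid iteration looks roughly like this (a minimal sketch; stationIds and the computation step are placeholders, not my exact code):

// Minimal sketch of the per-sid iteration; stationIds: Seq[String] is assumed.
stationIds.foreach { stationId =>
  val df = hiveContext.sql(
    s"""SELECT timeStamp, movingVariance, sid, year, date_col
       |FROM qf_testcase_1_5_movingvariance
       |WHERE sid = '$stationId'
       |ORDER BY timeStamp""".stripMargin)
  // ... per-station computations on df ...
}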
I am using Spark 1.6 and Hive 1.2. Please advise.
Console output:
2017-09-08 17:08:35,725 INFO org.apache.spark.scheduler.TaskSetManager (Logging.scala:logInfo(58)) - Finished task 112235.0 in stage 3.0 (TID 112596) in 584 ms on 165-51.sc1.xxxxxx.com (112275/177852)
Killed
My configuration setup:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf()
  .setAppName("MyApp_name")
  .setMaster("yarn-client")
  .set("spark.shuffle.compress", "true")
  .set("spark.sql.shuffle.partitions", "1900")
  // legacy memory settings; in Spark 1.6 these are only honored
  // when spark.memory.useLegacyMode is set to true
  .set("spark.storage.memoryFraction", "0.2")
  .set("spark.shuffle.memoryFraction", "0.8")
  .set("spark.yarn.executor.memoryOverhead", "3000")
  .set("spark.shuffle.manager", "SORT")
  .set("spark.eventLog.enabled", "false")
  .set("spark.network.timeout", "10000000")
  .set("spark.executor.heartbeatInterval", "10000000")
  .set("spark.driver.maxResultSize", "5g")
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)
val sqlContext = new SQLContext(sc)
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
hiveContext.setConf("hive.exec.stagingdir", "/tmp/.hive-staging")
hiveContext.setConf("spark.sql.parquet.compression.codec", "gzip")
// reads the value back (returns "gzip" after the setConf above)
hiveContext.getConf("spark.sql.parquet.compression.codec", "uncompressed")
The spark-submit command:
spark-submit $SPARK_SUBMIT_ARGS --verbose \
  --class org.zz.spark.class_name \
  --master yarn --deploy-mode client \
  --num-executors 15 --driver-memory 5g --executor-memory 30g --executor-cores 10 \
  --conf "spark.yarn.nodemanager.resource.memory-mb=123880" \
  --conf "spark.yarn.nodemanager.resource.cpu-vcores=43" \
  xxxxx.jar
My Hadoop log:
Log Type: stderr
Showing 4096 bytes of 736206 total.
/application_1505465764561017_0136/asm-tree-3.1.jar#asm-tree-3.1.jar,
hdfs://nn-yyyyy.s3s.xxxx.com:8020/user/username/.sparkStaging/application_1505465764561017_0136/hive-common-1.2.1.jar#hive-common-1.2.1.jar,
hdfs://nn-yyyyy.s3s.xxxxx.com:8020/user/username/.sparkStaging/application_1505465764561017_0136/httpcore-4.4.jar#httpcore-4.4.jar,
hdfs://nn-yyyyy.s3s.xxxx.com:8020/user/username/.sparkStaging/application_1505465764561017_0136/ant-launcher-1.9.1.jar#ant-launcher-1.9.1.jar,
hdfs://nn-yyyyy.s3s.xxxx.com:8020/user/username/.sparkStaging/application_1505465764561017_0136/commons-vfs2-2.0.jar#commons-vfs2-2.0.jar,
hdfs://nn-yyyyy.s3s.xxxxx.com:8020/user/username/.sparkStaging/application_1505465764561017_0136/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar#pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar,
hdfs://nn-yyyyy.s3s.xxxxx.com:8020/user/username/.sparkStaging/application_1503335684017_0136/calcite-avatica-1.2.0-incubating.jar#calcite-avatica-1.2.0-incubating.jar,
hdfs://nn-yyyy.s3s.xxxxxxx.com:8020/user/username/.sparkStaging/application_1505465764561017_0136/commons-beanutils-core-1.8.0.jar#commons-beanutils-core-1.8.0.jar,
hdfs://nn-yyyyy.s3s.xxxx.com:8020/user/username/.sparkStaging/application_1505465764561017_0136/hive-jdbc-1.2.1-standalone.jar#hive-jdbc-1.2.1-standalone.jar,
hdfs://nn-yyyyy.s3s.xxxxx.com:8020/user/username/.sparkStaging/application_1505465764561017_0136/netty-3.7.0.Final.jar#netty-3.7.0.Final.jar,
hdfs://nn-yyyyyy.s3s.xxxx.com:8020/user/username/.sparkStaging/application_1505465764561017_0136/commons-digester-1.8.jar#commons-digester-1.8.jar,
hdfs://nn-xxxx.s3s.yyyyy.com:8020/user/username/.sparkStaging/application_1505465764561017_0136/commons-configuration-1.6.jar#commons-configuration-1.6.jar,
hdfs://nn-yyyy.s3s.xxxx.com:8020/user/username/.sparkStaging/application_1505465764561017_0136/hive-shims-common-1.2.1.jar#hive-shims-common-1.2.1.jar
command:
{{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms30720m -Xmx30720m -Djava.io.tmpdir={{PWD}}/tmp
'-Dspark.executor.port=45250' '-Dspark.replClassServer.port=45070' '-Dspark.broadcast.port=45200' '-Dspark.history.ui.port=18080'
'-Dspark.ui.port=45100' '-Dspark.port.maxRetries=999' '-Dspark.blockManager.port=45300' '-Dspark.fileserver.port=45090'
'-Dspark.driver.port=45055' -Dspark.yarn.app.container.log.dir=<LOG_DIR> -XX:MaxPermSize=256m
org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.252.18.244:45055
--executor-id 12 --hostname 105-47.sc1.xxxx.com --cores 10 --app-id application_1505465764561017_0136
--user-class-path file:$PWD/__app__.jar 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
===============================================================================
2017-09-08 00:27:38,617 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy (ContainerManagementProtocolProxy.java:newProxy(260)) - Opening proxy : 105-47.sc1.xxxx.com:64033
2017-09-08 00:48:55,813 INFO org.apache.spark.deploy.yarn.ApplicationMaster$AMEndpoint (Logging.scala:logInfo(58)) - Driver terminated or disconnected! Shutting down. desktop-yyyy.service.xxxx.com:45055
2017-09-08 00:48:55,817 INFO org.apache.spark.deploy.yarn.ApplicationMaster (Logging.scala:logInfo(58)) - Final app status: SUCCEEDED, exitCode: 0
2017-09-08 00:48:55,825 INFO org.apache.spark.deploy.yarn.ApplicationMaster (Logging.scala:logInfo(58)) - Unregistering ApplicationMaster with SUCCEEDED
2017-09-08 00:48:55,836 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl (AMRMClientImpl.java:unregisterApplicationMaster(382)) - Waiting for application to be successfully unregistered.
2017-09-08 00:48:55,938 INFO org.apache.spark.deploy.yarn.ApplicationMaster (Logging.scala:logInfo(58)) - Deleting staging directory .sparkStaging/application_1505465764561017_0136
2017-09-08 00:48:56,089 INFO org.apache.spark.util.ShutdownHookManager (Logging.scala:logInfo(58)) - Shutdown hook called