Selecting data from a Hive table

I have time-series data in a Hive table (21.1 GB) stored as many Parquet part files on HDFS. The table is partitioned by sid, year, and date. I need to run a SELECT query against this table and do some computation, but the job fails at the SELECT step: it is killed without any error message.

The query:

  hiveContext.sql(s"SELECT timeStamp, movingVariance, sid, year, date_col FROM qf_testcase_1_5_movingvariance WHERE sid = '$stationId' ORDER BY timeStamp") 

PS: I was able to run a SELECT against a 1.6 TB Hive table with the same configuration without the job failing. In both cases I select iteratively (one query per sid) to limit the number of records.
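For context, the per-sid iteration described above presumably looks something like the following sketch. The `stationIds` collection and the output path are assumptions for illustration; they are not from the original post:

```scala
// Hypothetical sketch of the per-sid loop; names other than the table and
// columns shown in the question are invented for illustration.
val stationIds: Seq[String] = hiveContext
  .sql("SELECT DISTINCT sid FROM qf_testcase_1_5_movingvariance")
  .collect()
  .map(_.getString(0))

for (stationId <- stationIds) {
  val df = hiveContext.sql(
    s"""SELECT timeStamp, movingVariance, sid, year, date_col
       |FROM qf_testcase_1_5_movingvariance
       |WHERE sid = '$stationId'
       |ORDER BY timeStamp""".stripMargin)

  // Writing each per-sid result back to storage (rather than collecting
  // it to the driver) keeps the driver heap and
  // spark.driver.maxResultSize out of the picture.
  df.write.parquet(s"/tmp/out/sid=$stationId")
}
```

Note that if any per-sid slice is collected to the driver instead of written out, a silent "Killed" with no executor-side error is consistent with the driver process being killed by the OS OOM killer.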

I am using Spark 1.6 and Hive 1.2. Please advise.

Console output:

2017-09-08 17:08:35,725 INFO  org.apache.spark.scheduler.TaskSetManager (Logging.scala:logInfo(58)) - Finished task 112235.0 in stage 3.0 (TID 112596) in 584 ms on 165-51.sc1.xxxxxx.com (112275/177852)
Killed

My configuration settings:

  val conf = new SparkConf()
    .setAppName("MyApp_name")
    .setMaster("yarn-client")
    .set("spark.shuffle.compress", "true")
    .set("spark.sql.shuffle.partitions", "1900")
    .set("spark.storage.memoryFraction", "0.2")
    // note: the post had "spark.shuffle.memory.fraction", which is not a
    // valid Spark key and is silently ignored; the correct key in 1.6 is:
    .set("spark.shuffle.memoryFraction", "0.8")
    .set("spark.yarn.executor.memoryOverhead", "3000")
    .set("spark.shuffle.manager", "SORT")
    .set("spark.eventLog.enabled", "false")
    .set("spark.network.timeout", "10000000")
    .set("spark.executor.heartbeatInterval", "10000000")
    .set("spark.driver.maxResultSize", "5g")

  val sc = new SparkContext(conf)

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
hiveContext.setConf("hive.exec.stagingdir", "/tmp/.hive-staging")
hiveContext.setConf("spark.sql.parquet.compression.codec", "gzip")
// note: getConf only reads the value (falling back to "uncompressed") and
// discards the result; it does not change any setting
hiveContext.getConf("spark.sql.parquet.compression.codec", "uncompressed")
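To confirm which of these settings actually took effect at runtime, the effective configuration can be read back after the contexts are created; this is a sketch using only the keys already set above:

```scala
// Print the effective values to verify the settings were applied as intended.
println(sc.getConf.get("spark.sql.shuffle.partitions"))
println(sc.getConf.get("spark.driver.maxResultSize"))
println(hiveContext.getConf("spark.sql.parquet.compression.codec", "uncompressed"))
```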

The spark-submit command:

spark-submit $SPARK_SUBMIT_ARGS --verbose --class org.zz.spark.class_name --master yarn --deploy-mode client --num-executors 15 --driver-memory 5g --executor-memory 30g --executor-cores 10 --conf "spark.yarn.nodemanager.resource.memory-mb=123880" --conf "spark.yarn.nodemanager.resource.cpu-vcores=43" xxxxx.jar

My Hadoop log:

Log Type: stderr
Showing 4096 bytes of 736206 total.

 /application_1505465764561017_0136/asm-tree-3.1.jar#asm-tree-3.1.jar,hdfs://nn-yyyyy.s3s.xxxx.com:8020/user/username/.sparkStaging/       
application_1505465764561017_0136/hive-common-1.2.1.jar#hive-common-1.2.1.jar,hdfs://nn-yyyyy.s3s.xxxxx.com:8020/user/username/.sparkStaging/ 
application_1505465764561017_0136/httpcore-4.4.jar#httpcore-4.4.jar,hdfs://nn-yyyyy.s3s.xxxx.com:8020/user/username/.sparkStaging/
application_1505465764561017_0136/ant-launcher-1.9.1.jar#ant-launcher-1.9.1.jar,hdfs://nn-yyyyy.s3s.xxxx.com:8020/user/username/.sparkStaging/
application_1505465764561017_0136/commons-vfs2-2.0.jar#commons-vfs2-2.0.jar,hdfs://nn-yyyyy.s3s.xxxxx.com:8020/user/username/.sparkStaging/
application_1505465764561017_0136/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar#pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar,hdfs://nn- 
yyyyy.s3s.xxxxx.com:8020/user/username/.sparkStaging/application_1503335684017_0136/calcite-avatica-1.2.0-incubating.jar#calcite-avatica-1.2.0-
incubating.jar,hdfs://nn-yyyy.s3s.xxxxxxx.com:8020/user/username/.sparkStaging/application_1505465764561017_0136/commons-beanutils-
core-1.8.0.jar#commons-beanutils-core-1.8.0.jar,hdfs://nn-yyyyy.s3s.xxxx.com:8020/user/username/.sparkStaging/
application_1505465764561017_0136/hive-jdbc-1.2.1-standalone.jar#hive-jdbc-1.2.1-standalone.jar,hdfs://nn-yyyyy.s3s.xxxxx.com:8020/user/
username/.sparkStaging/application_1505465764561017_0136/netty-3.7.0.Final.jar#netty-3.7.0.Final.jar,hdfs://nn-yyyyyy.s3s.xxxx.com:8020/user/
username/.sparkStaging/application_1505465764561017_0136/commons-digester-1.8.jar#commons-digester-1.8.jar,hdfs://nn-xxxx.s3s.yyyyy.com:8020/
user/username/.sparkStaging/application_1505465764561017_0136/commons-configuration-1.6.jar#commons-configuration-1.6.jar,hdfs://nn-
yyyy.s3s.xxxx.com:8020/user/username/.sparkStaging/application_1505465764561017_0136/hive-shims-common-1.2.1.jar#hive-shims-common-1.2.1.jar

  command:

    {{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms30720m -Xmx30720m -Djava.io.tmpdir={{PWD}}/tmp '- 
Dspark.executor.port=45250' '-Dspark.replClassServer.port=45070' '-Dspark.broadcast.port=45200' '-Dspark.history.ui.port=18080' '   
-Dspark.ui.port=45100' '-Dspark.port.maxRetries=999' '-Dspark.blockManager.port=45300' '-Dspark.fileserver.port=45090' '-     
Dspark.driver.port=45055' -Dspark.yarn.app.container.log.dir=<LOG_DIR> -XX:MaxPermSize=256m       
org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.252.18.244:45055 --executor-id 12 --   
hostname 105-47.sc1.xxxx.com --cores 10 --app-id application_1505465764561017_0136 --user-class-path file:$PWD/__app__.jar 1> <LOG_DIR>/    
stdout   2> <LOG_DIR>/stderr
===============================================================================

2017-09-08 00:27:38,617 INFO  org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy      
(ContainerManagementProtocolProxy.java:newProxy(260)) - Opening proxy : 105-47.sc1.xxxx.com:64033
2017-09-08 00:48:55,813 INFO  org.apache.spark.deploy.yarn.ApplicationMaster$AMEndpoint (Logging.scala:logInfo(58)) - Driver terminated or     
disconnected! Shutting down. desktop-yyyy.service.xxxx.com:45055
2017-09-08 00:48:55,817 INFO  org.apache.spark.deploy.yarn.ApplicationMaster (Logging.scala:logInfo(58)) - Final app status: SUCCEEDED,   
exitCode: 0
2017-09-08 00:48:55,825 INFO  org.apache.spark.deploy.yarn.ApplicationMaster (Logging.scala:logInfo(58)) - Unregistering ApplicationMaster    
with   SUCCEEDED
2017-09-08 00:48:55,836 INFO  org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl (AMRMClientImpl.java:unregisterApplicationMaster(382)) -   
Waiting for application to be successfully unregistered.
2017-09-08 00:48:55,938 INFO  org.apache.spark.deploy.yarn.ApplicationMaster (Logging.scala:logInfo(58)) - Deleting staging    
directory .sparkStaging/application_1505465764561017_0136
2017-09-08 00:48:56,089 INFO  org.apache.spark.util.ShutdownHookManager (Logging.scala:logInfo(58)) - Shutdown hook called

0 answers
