Ошибка в MR (карта = карта, уменьшить = уменьшить, объединить = объединить, vectorized.reduce,: потоковая передача hadoop не удалось с кодом ошибки 1 Вызовы: mapreduce -> MR

Я бегу ниже Rscript GDP.R

#!/usr/bin/env Rscript

Sys.getenv(c("HADOOP_HOME", "HADOOP_CMD", "HADOOP_STREAMING", "HADOOP_CONF_DIR"))

library(rmr2)
library(rhdfs)

setwd("/root/somnath/GDP_data/")
gdp <- read.csv("GDP.csv")
head(gdp)

hdfs.init()
gdp.values <- to.dfs(gdp)

aaplRevenue = 156508

gdp.map.fn <- function(k,v) {
key <- ifelse(v[4] < aaplRevenue, "less", "greater")
keyval(key, 1)
}

count.reduce.fn <- function(k,v) {
keyval(k, length(v))
}

count <- mapreduce(input=gdp.values, map=gdp.map.fn, reduce=count.reduce.fn)

from.dfs(count)

$val

и не смог преодолеть приведенную ниже ошибку в функции mapreduce:

Сбой потоковой команды! Ошибка в MR (карта = карта, уменьшить = уменьшить, объединить = объединить, vectorized.reduce,: потоковая передача hadoop не удалось с кодом ошибки 1 Вызовы: mapreduce -> MR

[root@kkws029 RHadoop_scripts]# Rscript gdp.R
Loading required package: methods
Loading required package: rJava

HADOOP_CMD=/opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop

Be sure to run hdfs.init()
  CountryCode Number   CountryName     GDP
1         USA      1  UnitedStates  168000
2         CHN      2         China 9240270
3         JPN      3         Japan 4901530
4         DEU      4       Germany 3634823
5         FRA      5        France 2734949
6         GBR      6 UnitedKingdom 2522261
14/07/08 16:57:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
14/07/08 16:57:02 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/07/08 16:57:02 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Warning messages:
1: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
2: In rmr.options("hdfs.tempdir") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
3: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
packageJobJar: [/tmp/hadoop-root/hadoop-unjar4163972761639211537/] [] /tmp/streamjob5583656452134995821.jar tmpDir=null
14/07/08 16:57:04 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/07/08 16:57:13 INFO mapred.FileInputFormat: Total input paths to process : 1
14/07/08 16:57:20 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-root/mapred/local]
14/07/08 16:57:20 INFO streaming.StreamJob: Running job: job_201407071547_0024
14/07/08 16:57:20 INFO streaming.StreamJob: To kill this job, run:
14/07/08 16:57:20 INFO streaming.StreamJob: /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=kkws030.mara-ison.com:8021 -kill job_201407071547_0024
14/07/08 16:57:20 INFO streaming.StreamJob: Tracking URL: http://kkws030.mara-ison.com:50030/jobdetails.jsp?jobid=job_201407071547_0024
14/07/08 16:57:21 INFO streaming.StreamJob:  map 0%  reduce 0%
14/07/08 16:57:26 INFO streaming.StreamJob:  map 50%  reduce 0%
14/07/08 16:57:49 INFO streaming.StreamJob:  map 100%  reduce 100%
14/07/08 16:57:49 INFO streaming.StreamJob: To kill this job, run:
14/07/08 16:57:49 INFO streaming.StreamJob: /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=kkws030.mara-ison.com:8021 -kill job_201407071547_0024
14/07/08 16:57:49 INFO streaming.StreamJob: Tracking URL: http://kkws030.mara-ison.com:50030/jobdetails.jsp?jobid=job_201407071547_0024
14/07/08 16:57:49 ERROR streaming.StreamJob: Job not successful. Error: NA
14/07/08 16:57:49 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  :
  hadoop streaming failed with error code 1
Calls: mapreduce -> mr
In addition: Warning messages:
1: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
2: In rmr.options("hdfs.tempdir") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
3: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
4: In rmr.options("backend.parameters") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
Execution halted
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
Warning messages:
1: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
2: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)

Мой журнал stderr выглядит следующим образом:

Loading objects:
  .Random.seed
  aaplRevenue
  count.reduce.fn
  gdp
  gdp.map.fn
  gdp.values
Loading objects:
  backend.parameters
  combine
  combine.file
  combine.line
  debug
  default.input.format
  default.output.format
  in.folder
  in.memory.combine
  input.format
  libs
  map
  map.file
  map.line
  out.folder
  output.format
  pkg.opts
  postamble
  preamble
  profile.nodes
  reduce
  reduce.file
  reduce.line
  rmr.global.env
  rmr.local.env
  save.env
  tempfile
  vectorized.reduce
  verbose
  work.dir
Loading required package: rhdfs
Loading required package: methods
Loading required package: rJava
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
  call: fun(libname, pkgname)
error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Warning in FUN(c("rhdfs", "rJava", "methods", "rmr2", "stats", "graphics",  :
  can't load rhdfs
Loading required package: rmr2
Loading objects:
  backend.parameters
  combine
  combine.file
  combine.line
  debug
  default.input.format
  default.output.format
  in.folder
  in.memory.combine
  input.format
  libs
  map
  map.file
  map.line
  out.folder
  output.format
  pkg.opts
  postamble
  preamble
  profile.nodes
  reduce
  reduce.file
  reduce.line
  rmr.global.env
  rmr.local.env
  save.env
  tempfile
  vectorized.reduce
  verbose
  work.dir
Warning in Ops.factor(left, right) : < not meaningful for factors
Error in split.default(a.list, ceiling(seq_along(a.list)/every.so.many),  : 
  first argument must be a vector
Calls: <Anonymous> ... <Anonymous> -> do.call -> mapply -> split -> split.default
No traceback available 
Error during wrapup: 
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)

Любые предложения будут высоко оценены

Спасибо -S

Примечание. У меня есть блок-кавычки Ошибка в журнале stderr, где указано, что системная переменная HADOOP_CMD не найдена. Есть ли способ сделать переменные окружения HADOOP System экспортированными в R? Также обратите внимание, что я использую Sys.getenv(c("HADOOP_HOME", ...)) в начале моего скрипта, но это, похоже, не работает, как предлагает stderr

Обратите внимание, что я уже добавил следующие команды экспорта для переменных среды HADOOP в моем ~/.bash_profile

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

export JRI_PATH=/usr/lib64/R
export R_HOME=/usr/lib64/R
export R_SHARE_DIR=/usr/share/R
#export JRI_LD_PATH=${R_HOME}/library/rJava/jri:${R_HOME}/lib:${R_HOME}/bin
export LD_LIBRARY_PATH=${R_HOME}/library/rJava/jri

export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CMD=${HADOOP_HOME}/bin/hadoop
export HADOOP_STREAMING=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.3.0.jar
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

export JAVA_HOME=/var/jdk1.7.0_25
#PATH=$PATH:$HOME/bin:$JAVA_HOME/bin

PATH=$PATH:$R_HOME/bin:$JAVA_HOME/bin:$LD_LIBRARY_PATH:/opt/cloudera/parcels/CDH/lib/mahout:/opt/cloudera/parcels/CDH/lib/hadoop:/opt/cloudera/parcels/CDH/lib/hadoop-hdfs:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce:/var/lib/storm-0.9.0-rc2/lib:$HADOOP_CMD:$HADOOP_STREAMING:$HADOOP_CONF_DIR

export PATH

2 ответа

Сбой потоковой команды!

Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 1

count <- mapreduce(input=gdp.values, map = gdp.map.fn, reduce = count.reduce.fn, #output = output, input.format = "text",
combine = T)

Я работал над одним и тем же кодом... но мне не дали желаемого ответа... поэтому я попрактиковался в применении каждой вещи к каждому черному пятну карты... и, наконец, я обнаружил ошибку, допущенную автором. изменить код.. и вы можете запустить его

Другие вопросы по тегам