Azure / R Server - create an xdf from a tsv/csv file

I am trying to build a predictive model in Azure / R Server, but I can't work out how to get the data into the right format, specifically how to convert a character variable into a factor variable. The data currently sits in a CSV file, which I read in as an RxTextData source. When I try to convert it to an XDF file (so that I can change the variable from character to factor), I get the error message shown at the end of the code below:

# Use HDFS as the default file system
rxSetFileSystem(RxHdfsFileSystem())
# Point to the ADL store
myNameNode <- "adl://mydatalake.net"
myPort <- 0

# Location of the data 
dataRoot <- "/result_output"  

# Define Spark compute context
sparkCluster <- rxSparkConnect(consoleOutput=TRUE, nameNode=myNameNode, port=myPort, reset = T)
Parameter 'reset' is set to TRUE. Shutting down existing Spark applications (scaleR-spark-won2r0FHOA-sargow-60879-7D2545A13EC347E393FD0EDEA85F43B1).
It may take 1 to 2 minutes to launch a new Spark application.

# Set compute context
rxSetComputeContext(sparkCluster)

# Define HDFS file system
hdfsFS <- RxHdfsFileSystem(hostName=myNameNode, port=myPort)

#---------------------------------------------------

inTxtFile <- file.path(dataRoot,"result.csv")
df <- RxTextData(inTxtFile, fileSystem = hdfsFS, firstRowIsColNames = T)

rxGetInfo(df, getVarInfo = TRUE)
======  ed0-myRserverr (Master HPA Process) has started run at Tue Nov 21 21:43:48 2017  ====== 
Picked up JAVA_TOOL_OPTIONS: -Xss4m 
17/11/21 21:43:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
======  ed0-myRserverr (Master HPA Process) has completed run at Tue Nov 21 21:43:52 2017  ====== 
File name: /result_output/result.csv 
Data Source: Text 
Number of variables: 4 
Variable information: 
Var 1: 1, Type: integer
Var 2: 2010-01-01, Type: character
Var 3: TagName1, Type: character
Var 4: 30, Type: integer
head(df)
======  ed0-myRserverr (Master HPA Process) has started run at Tue Nov 21 21:44:00 2017  ====== 
Picked up JAVA_TOOL_OPTIONS: -Xss4m 
17/11/21 21:44:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
======  ed0-myRserverr (Master HPA Process) has completed run at Tue Nov 21 21:44:16 2017  ====== 
  1 2010-01-01 TagName1 30
1 2 2010-01-01 TagName2  5
2 2 2010-01-02 TagName2  7
3 2 2010-01-02 TagName3  6
4 3 2010-01-03 TagName2 15
5 1 2010-01-01 TagName2  2
6 1 2010-01-01 TagName3  1

#---------------------------------------------------

outXdfFile <- file.path(dataRoot,"result_xdf.xdf")
dfOUT <- RxXdfData(outXdfFile, fileSystem = hdfsFS)

#---------------------------------------------------

rxImport(inData = df, outFile = dfOUT)
Error: /result_output/result_xdf.xdf has extension '.xdf', which is considered as single XDF and not supported in RxHadoopMR and RxSpark compute context
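The error text says a single .xdf file is not supported under the RxSpark / RxHadoopMR compute contexts, so the HDFS output would apparently have to be a composite XDF set, i.e. a directory path without the .xdf extension. A minimal sketch of that variant (dfComposite and result_composite are only placeholder names, not from the code above):

# Composite XDF: point RxXdfData at a directory (no .xdf extension)
outXdfDir <- file.path(dataRoot, "result_composite")
dfComposite <- RxXdfData(outXdfDir, fileSystem = hdfsFS, createCompositeSet = TRUE)

# stringsAsFactors = TRUE turns character columns into factors on import
rxImport(inData = df, outFile = dfComposite, stringsAsFactors = TRUE, overwrite = TRUE)
rxGetInfo(dfComposite, getVarInfo = TRUE)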

EDIT: The source data is generated by this U-SQL script:

@t = SELECT *
     FROM(
        VALUES
        ( 1, "2010-01-01","TagName1", 30 ),
        ( 2, "2010-01-01","TagName2", 5 ),
        ( 2, "2010-01-02","TagName2", 7 ),
        ( 2, "2010-01-02","TagName3", 6 ),
        ( 3, "2010-01-03","TagName2", 15 ),
        ( 1, "2010-01-01","TagName2", 2 ),
        ( 1, "2010-01-01","TagName3", 1),
        ( 3, "2010-01-04","TagName1", 2 ),
        ( 3, "2010-01-04","TagName2", 4 )
     ) AS T(DeviceID, Date, TagName, dv);

OUTPUT @t
TO "result_output/result.csv"
USING Outputters.Csv(quoting:false);
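Note that Outputters.Csv with these settings does not write a header row, which would explain why rxGetInfo above reports the first data row (1, 2010-01-01, TagName1, 30) as the variable names. A sketch of declaring the schema explicitly on the text source instead, using the column names from the U-SQL statement (DeviceID, Date, TagName, dv); the exact colInfo spelling may differ between RevoScaleR versions:

# No header in the CSV, so supply names and types via colInfo instead of
# firstRowIsColNames = TRUE; explicit factor levels could also be listed
# here if they are known in advance
colInfo <- list(
  list(index = 1, newName = "DeviceID", type = "integer"),
  list(index = 2, newName = "Date",     type = "character"),
  list(index = 3, newName = "TagName",  type = "factor"),
  list(index = 4, newName = "dv",       type = "integer")
)
df <- RxTextData(inTxtFile, fileSystem = hdfsFS,
                 firstRowIsColNames = FALSE, colInfo = colInfo)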

EDIT_2:

> rxTextToXdf(inFile = df, outFile = dfOUT)
Error in rxClusterStop() : 
    The computeContext option is currently set to a distributed computing context. 'rxTextToXdf' calls
    are not inherently distributable. You can, however, use 'rxExec' to process a call in the distributed
    context. Please type '?rxExec' for more information. To execute the function locally, enter
    rxOptions(computeContext = 'local')

EDIT_3:

> rxDataStep(inFile = df, outFile = dfOUT)
Error: /result_output/result_xdf.xdf has extension '.xdf', which is considered as single XDF and not supported in RxHadoopMR and RxSpark compute context.
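rxDataStep hits the same single-XDF restriction, so whichever import function is used, the HDFS target seems to need the composite form sketched after the first error. Assuming that import works, the character-to-factor conversion the question is ultimately after could also be done as a separate step with rxFactors; dfFactors and result_factors are placeholder names, and depending on the RevoScaleR version this call may need to run in a local compute context:

# Convert TagName to a factor in a new composite XDF set
dfFactors <- RxXdfData(file.path(dataRoot, "result_factors"),
                       fileSystem = hdfsFS, createCompositeSet = TRUE)
rxFactors(inData = dfComposite, outFile = dfFactors,
          factorInfo = c("TagName"), overwrite = TRUE)
rxGetVarInfo(dfFactors)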
