U-SQL: How do I process an Avro file containing multiple JSON arrays with multiple objects?
I receive an Avro file in my Data Lake Store via Stream Analytics and Event Hubs using Capture.
The structure of the file looks like this:
[{"id":1,"pid":"ABC","value":"1","utctimestamp":1537805867},{"id":6569,"pid":"1E014000","value":"-5.8","utctimestamp":1537805867}] [{"id":2,"pid":"cde","value":"77","utctimestamp":1537772095},{"id":6658,"pid":"02002001","value":"77","utctimestamp":1537772095}]
I used this script:
@rs =
    EXTRACT
        SequenceNumber long,
        Offset string,
        EnqueuedTimeUtc string,
        Body byte[]
    FROM @input_file
    USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"
{
    ""type"": ""record"",
    ""name"": ""EventData"",
    ""namespace"": ""Microsoft.ServiceBus.Messaging"",
    ""fields"": [
        {
            ""name"": ""SequenceNumber"",
            ""type"": ""long""
        },
        {
            ""name"": ""Offset"",
            ""type"": ""string""
        },
        {
            ""name"": ""EnqueuedTimeUtc"",
            ""type"": ""string""
        },
        {
            ""name"": ""SystemProperties"",
            ""type"": {
                ""type"": ""map"",
                ""values"": [
                    ""long"",
                    ""double"",
                    ""string"",
                    ""bytes""
                ]
            }
        },
        {
            ""name"": ""Properties"",
            ""type"": {
                ""type"": ""map"",
                ""values"": [
                    ""long"",
                    ""double"",
                    ""string"",
                    ""bytes"",
                    ""null""
                ]
            }
        },
        {
            ""name"": ""Body"",
            ""type"": [
                ""null"",
                ""bytes""
            ]
        }
    ]
}
");
@jsonify =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body)) AS message
    FROM @rs;
@cnt =
    SELECT message["id"] AS id,
           message["id2"] AS pid,
           message["value"] AS value,
           message["utctimestamp"] AS utctimestamp,
           message["extra"] AS extra
    FROM @jsonify;
OUTPUT @cnt TO @output_file USING Outputters.Text(quoting: false);
The script produces a file, but it contains only comma delimiters and no values.
How do I extract/transform this structure so that I can output it as a flattened 4-column CSV file?
1 Answer
I got this to work by exploding the JSON column and then applying JsonTuple
a second time (although I suspect it can be simplified):
@jsonify =
    SELECT JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body)) AS message
    FROM @rs;
// Explode the tuple as key-value pair;
@working =
    SELECT key,
           JsonFunctions.JsonTuple(value) AS value
    FROM @jsonify
         CROSS APPLY
             EXPLODE(message) AS y(key, value);
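To see why the second pass is needed, here is a rough Python sketch of the same two steps. It is not U-SQL and not the library's implementation: `json_tuple` is a hypothetical stand-in for JsonTuple, and the exact text of the positional keys it produces is an assumption. The point it illustrates is that the top level of each Body is an array, so the first pass yields one entry per array element rather than the fields `id`, `pid` and so on.

```python
import csv
import io
import json

# One Body payload from the sample in the question: a JSON array of objects.
body = (
    '[{"id":1,"pid":"ABC","value":"1","utctimestamp":1537805867},'
    '{"id":6569,"pid":"1E014000","value":"-5.8","utctimestamp":1537805867}]'
)


def json_tuple(text):
    """Hypothetical stand-in for JsonTuple: map each top-level key
    (or array position) to its child rendered as a string."""
    parsed = json.loads(text)
    items = parsed.items() if isinstance(parsed, dict) else enumerate(parsed)
    return {
        str(key): child if isinstance(child, str) else json.dumps(child)
        for key, child in items
    }


# First pass: the keys are array positions, so looking up "id" finds nothing.
message = json_tuple(body)
first_pass_id = message.get("id")  # None: the reason the CSV held only delimiters

# Second pass (the EXPLODE + JsonTuple step): one map per array element.
rows = [json_tuple(element) for element in message.values()]

# Write the four fields of each element as one CSV line.
columns = ["id", "pid", "value", "utctimestamp"]
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for row in rows:
    writer.writerow([row.get(column) for column in columns])
flattened_csv = buffer.getvalue()
print(flattened_csv)
```

Each array element becomes one row, which is what CROSS APPLY EXPLODE does to the map in the U-SQL version.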
Full script:
REFERENCE ASSEMBLY Avro;
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
DECLARE @input_file string = @"\input\input21.avro";
DECLARE @output_file string = @"\output\output.csv";
@rs =
    EXTRACT
        Body byte[]
    FROM @input_file
    USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"{
    ""type"": ""record"",
    ""name"": ""EventData"",
    ""namespace"": ""Microsoft.ServiceBus.Messaging"",
    ""fields"": [
        {
            ""name"": ""SequenceNumber"",
            ""type"": ""long""
        },
        {
            ""name"": ""Offset"",
            ""type"": ""string""
        },
        {
            ""name"": ""EnqueuedTimeUtc"",
            ""type"": ""string""
        },
        {
            ""name"": ""SystemProperties"",
            ""type"": {
                ""type"": ""map"",
                ""values"": [
                    ""long"",
                    ""double"",
                    ""string"",
                    ""bytes""
                ]
            }
        },
        {
            ""name"": ""Properties"",
            ""type"": {
                ""type"": ""map"",
                ""values"": [
                    ""long"",
                    ""double"",
                    ""string"",
                    ""bytes"",
                    ""null""
                ]
            }
        },
        {
            ""name"": ""Body"",
            ""type"": [
                ""null"",
                ""bytes""
            ]
        }
    ]
}");
@jsonify =
    SELECT JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body)) AS message
    FROM @rs;
// Explode the tuple as key-value pair;
@working =
    SELECT key,
           JsonFunctions.JsonTuple(value) AS value
    FROM @jsonify
         CROSS APPLY
             EXPLODE(message) AS y(key, value);
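@cnt =
    SELECT value["id"] AS id,
           // The sample records use the key "pid"; "id2" does not occur in them.
           value["pid"] AS pid,
           value["value"] AS value,
           value["utctimestamp"] AS utctimestamp,
           // "extra" is absent from the sample records, so this column is empty there.
           value["extra"] AS extra
    FROM @working;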
OUTPUT @cnt TO @output_file USING Outputters.Text(quoting: false);
My results: