Загрузка вложенных XML-данных в Hive с использованием SerDe

Я пытаюсь загрузить вложенные данные XML в Hive. Пример данных выглядит следующим образом...

<CustomerOrders>
  <Customers>
    <CustID>ALFKI</CustID>
    <Orders>
      <OrderID>10643</OrderID>
      <CustomerID>ALFKI</CustomerID>
      <OrderDate>1997-08-25</OrderDate>
    </Orders>
    <Orders>
      <OrderID>10692</OrderID>
      <CustomerID>ALFKI</CustomerID>
      <OrderDate>1997-10-03</OrderDate>
    </Orders>
    <CompanyName>Alfreds Futterkiste</CompanyName>
  </Customers>
  <Customers>
    <CustID>ANATR</CustID>
    <Orders>
      <OrderID>10308</OrderID>
      <CustomerID>ANATR</CustomerID>
      <OrderDate>1996-09-18</OrderDate>
    </Orders>
    <CompanyName>Ana Trujillo Emparedados y helados</CompanyName>
  </Customers>
</CustomerOrders>

Ниже приведена команда, которую я использую:

CREATE TABLE CUSTOMERORDERS(
          CustID STRING,
          Orders ARRAY<STRUCT<OrderID:STRING,CustomerID:STRING,OrderDate:STRING>>,
          CompanyName STRING)
          ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
          WITH SERDEPROPERTIES (
          "column.xpath.CustID"="/Customers/CustID/text()",
          "column.xpath.Orders"="/Customers/Orders",
          "column.xpath.OrderID"="/Customers/Orders/OrderID",
          "column.xpath.CustomerID"="/Customers/Orders/CustomerID",
          "column.xpath.OrderDate"="/Customers/Orders/OrderDate",
          "column.xpath.CompanyName"="/Customers/CompanyName/text()")
          STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
          OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
          TBLPROPERTIES ("xmlinput.start"="<Customers>","xmlinput.end"= "</Customers>");

Вывод, который я получаю:

    hive> select * from customerorders;
OK
ALFKI   [{"orderid":null,"customerid":null,"orderdate":null},{"orderid":null,"customerid":null,"orderdate":null}]       Alfreds Futterkiste
ANATR   [{"orderid":null,"customerid":null,"orderdate":null}]   Ana Trujillo Emparedados y helados
Time taken: 0.039 seconds, Fetched: 2 row(s)

я собираюсь null значения для OrderID, CustomerID а также OrderDate, Может ли кто-нибудь помочь мне в решении этой проблемы?

Спасибо

1 ответ

Решение
create external table customerorders
(
    custid      string
   ,orders      array<struct<Orders:struct<OrderID:string,CustomerID:string,OrderDate:string>>>
   ,companyname string
)
row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
with serdeproperties 
(
    "column.xpath.CustID"       = "/Customers/CustID/text()"
   ,"column.xpath.Orders"       = "/Customers/Orders"
   ,"column.xpath.CompanyName"  = "/Customers/CompanyName/text()"
)

stored as 
inputformat     'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
outputformat    'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
tblproperties 
(
    "xmlinput.start"    = "<Customers>"
   ,"xmlinput.end"      = "</Customers>"
);

-

select * from customerorders
;

-

+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+
| custid |                                                                           orders                                                                            |            companyname             |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+
| ALFKI  | [{"orders":{"orderid":"10643","customerid":"ALFKI","orderdate":"1997-08-25"}},{"orders":{"orderid":"10692","customerid":"ALFKI","orderdate":"1997-10-03"}}] | Alfreds Futterkiste                |
| ANATR  | [{"orders":{"orderid":"10308","customerid":"ANATR","orderdate":"1996-09-18"}}]                                                                              | Ana Trujillo Emparedados y helados |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+          
Другие вопросы по тегам