JSON Extractor for an array of strings

Question

In Riak I have this basic user schema with an accompanying user index (I have omitted Riak-specific fields such as _yz_id and so on):

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="user" version="1.5">

 <fields>
   <field name="email"    type="string"   indexed="true"  stored="false"/>   
   <field name="name"     type="string"   indexed="true"  stored="false"/>   
   <field name="groups"   type="string"   indexed="true"  stored="false" multiValued="true"/>

   <dynamicField name="*" type="ignored"  indexed="false" stored="false" multiValued="true"/>

   ..riak-specific fields.. 

 </fields>

 <uniqueKey>_yz_id</uniqueKey>                                                 

 <types>                                                                       
   <fieldType name="string"  class="solr.StrField"     sortMissingLast="true"/>
   <fieldType name="_yz_str" class="solr.StrField"     sortMissingLast="true"/>
   <fieldtype name="ignored" class="solr.StrField"/>                           
 </types>

</schema>

My user JSON looks like this:

{
   "name" : "John Smith",
   "email" : "jsmith@gmail.com",
   "groups" : [
      "3304cf79",
      "abe155cf"
   ]
}

When I try to search using this query:

curl 'http://localhost:10018/search/query/user?wt=json&q=groups:3304cf79'

I get no docs back.

Why is that? Does the JSON extractor create index entries for the groups?

json riak riak-search

Asked Mar 22 '15 at 19:27

2 answers

Answer 1 · Mar 23 '15 at 02:03

The schema is correct. The problem was that it was not the original schema I had used when setting the bucket properties. This issue on the Yokozuna GitHub was the culprit. I updated the schema after inserting new data, thinking the indexes would reload. Currently they do not.
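
To illustrate why the schema itself was fine, here is a minimal sketch of how a JSON document can be flattened into field/value pairs for indexing. This is not Yokozuna's code; it assumes the behavior described in the Riak Search documentation, where nested objects are joined with a separator and each array element becomes another value of the same field. The helper name flatten and its parameters are hypothetical.

```python
import json

def flatten(value, prefix="", sep="."):
    """Yield (field_name, value) pairs for a parsed JSON value.

    Hypothetical helper: it mimics the assumed extractor behavior,
    it is not part of Riak or Yokozuna.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            # Nested object keys are joined with the separator.
            name = f"{prefix}{sep}{key}" if prefix else key
            yield from flatten(child, name, sep)
    elif isinstance(value, list):
        # Each array element is emitted under the same field name,
        # which is why the schema field needs multiValued="true".
        for child in value:
            yield from flatten(child, prefix, sep)
    else:
        yield prefix, value

doc = json.loads("""
{
   "name" : "John Smith",
   "email" : "jsmith@gmail.com",
   "groups" : ["3304cf79", "abe155cf"]
}
""")

pairs = list(flatten(doc))
for name, val in pairs:
    print(name, val)
```

Under that assumption the groups field receives two values, so a query such as groups:3304cf79 should match once the index is actually built with this schema.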

Answer 2 · user4719725 · Aug 5 '23 at 21:25

How about this? You can extract everything at once; this is a generic approach.

import json
import numpy as np
import pandas as pd
from jsonpath_ng import parse

def explode_list(df, col):
    s = df[col]
    i = np.arange(len(s)).repeat(s.str.len())
    return df.iloc[i].assign(**{col: np.concatenate(s)})

def process_json_data(data_file, mapping_file, root):
    # Load the JSON data
    with open(data_file) as f:
        data = json.load(f)

    # Load the mapping
    with open(mapping_file) as f:
        mapping = json.load(f)

    # Prepare an empty dataframe to hold the results
    df = pd.DataFrame()

    # Iterate over each datapoint in the data file
    for i, datapoint in enumerate(data[root]):
        # Prepare an empty dictionary to hold the results for this datapoint
        datapoint_dict = {}
        # Iterate over each field in the mapping file
        for field, path in mapping.items():
            # Prepare the JSONPath expression
            jsonpath_expr = parse(path)
            # Find all matches in the datapoint
            match = jsonpath_expr.find(datapoint)
            if match:
                # If a match was found, add it to the dictionary
                datapoint_dict[field] = [m.value for m in match]
            else:
                # If no match was found, add 'no path' to the dictionary
                datapoint_dict[field] = ['no path']

        # Create a temporary dataframe for this datapoint
        frames = [pd.DataFrame({k: np.repeat(v, max(map(len, datapoint_dict.values())))}) for k, v in datapoint_dict.items()]
        temp_df = pd.concat(frames, axis=1)

        # Identify list-like columns and explode them
        while True:
            list_cols = [col for col in temp_df.columns if any(isinstance(i, list) for i in temp_df[col])]
            if not list_cols:
                break
            for col in list_cols:
                temp_df = explode_list(temp_df, col)

        # Append the temporary dataframe to the main dataframe
        # (DataFrame.append was removed in pandas 2.0, so use pd.concat)
        df = pd.concat([df, temp_df])

    df.reset_index(drop=True, inplace=True)
    return df.style.set_properties(**{'border': '1px solid black'})

# Calling the function
df = process_json_data('/content/jsonShredd/data.json', '/content/jsonShredd/mapping.json', 'datapoints')
df