How can I merge broken text from a list and add it to a dictionary?

I am using a Python module to convert a PDF to text: the PDF file is cleaned and the data is extracted. During cleaning, some of the data ends up split into two separate items (for instance, one question spread over two list elements). How can I merge these pieces and extract the result as a list of dictionaries?
For example:

content = ['Sample Questions Set 1 ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '01  Which function among the following can’t be accessed outside ', 'the class in java in same package? ', 'A. public void show()。 ', 'B. void show()。 ', 'C. protected show()。 ', 'D. static void show()。 ', '02  How many private member functions are allowed in a class ? ', 'A. Only 1 ', 'B. Only 7 ', 'C. Only 255 ', 'D. As many as required ', '03  Can main() function be made private? ', 'A. Yes, always。 ', 'B. Yes, if program doesn’t contain any classes。 ', 'C. No, because main function is user defined。 ', 'D. No, never。 ', '04  If private member functions are to be declared in C++ then_________。 ', 'A. private:  ', 'B. private ', 'C. private(private member list) ', 'D. private :- <private members> ', '05  If a function in java is declared private then it _________。 ', 'A. Can’t access the standard output ', 'B. Can access the standard output。 ', 'C. Can’t access any output stream。 ', 'D. Can access only the output streams。 ']

Expected output:

questions = [{'Qid':01,'Qtext':'Which function among the following can’t be accessed outside the class in java in same package?','A.':'public void show()。','B.':' void show()。','C.':'protected show()。','D.':'static void show()'},{'Qid':02,....},{...},{...},{...}]

2 Answers

Solution

The following will do it:

questions = []
for s in content:
    s = s.lstrip()
    if s:
        if s[0].isdigit():
            questions.append({'Qid': len(questions) + 1, 'Qtext': s.split(maxsplit=1)[1]})
        elif s[0].isalpha() and s[1] == '.':
            questions[-1][s[:2]] = s.split(maxsplit=1)[1]
        elif questions:
            questions[-1]['Qtext'] += s

questions becomes:

[{'Qid': 1, 'Qtext': 'Which function among the following can’t be accessed outside the class in java in same package? ', 'A.': 'public void show()。 ', 'B.': 'void show()。 ', 'C.': 'protected show()。 ', 'D.': 'static void show()。 '}, {'Qid': 2, 'Qtext': 'How many private member functions are allowed in a class ? ', 'A.': 'Only 1 ', 'B.': 'Only 7 ', 'C.': 'Only 255 ', 'D.': 'As many as required '}, {'Qid': 3, 'Qtext': 'Can main() function be made private? ', 'A.': 'Yes, always。 ', 'B.': 'Yes, if program doesn’t contain any classes。 ', 'C.': 'No, because main function is user defined。 ', 'D.': 'No, never。 '}, {'Qid': 4, 'Qtext': 'If private member functions are to be declared in C++ then_________。 ', 'A.': 'private:  ', 'B.': 'private ', 'C.': 'private(private member list) ', 'D.': 'private :- <private members> '}, {'Qid': 5, 'Qtext': 'If a function in java is declared private then it _________。 ', 'A.': 'Can’t access the standard output ', 'B.': 'Can access the standard output。 ', 'C.': 'Can’t access any output stream。 ', 'D.': 'Can access only the output streams。 '}]
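
Note that the values above keep the trailing spaces from the extracted text, whereas the desired output in the question has none. Below is a minimal sketch of one way to tidy them afterwards, assuming it is acceptable to collapse every run of whitespace into a single space. `clean_question` is a hypothetical helper name, and the sample data is a shortened copy of the result above:

```python
def clean_question(question):
    """Return a copy of one question dict with whitespace normalized.

    Every string value has its whitespace runs collapsed to a single
    space and its leading/trailing whitespace removed. Non-string
    values (such as the integer Qid) are left untouched.
    """
    return {
        key: " ".join(value.split()) if isinstance(value, str) else value
        for key, value in question.items()
    }


# Shortened sample shaped like the parsed result shown above
questions = [
    {
        'Qid': 1,
        'Qtext': 'Which function among the following can’t be accessed '
                 'outside the class in java in same package? ',
        'A.': 'public void show()。 ',
        'B.': 'void show()。 ',
    },
    {
        'Qid': 4,
        'Qtext': 'If private member functions are to be declared in C++ then_________。 ',
        'A.': 'private:  ',
    },
]

cleaned = [clean_question(q) for q in questions]
```

If internal spacing must be preserved exactly, `value.strip()` could be used in place of `" ".join(value.split())`.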

This will merge them into a list of questions:

import re

questions = []
loc = 0

for res in content:
    prefix = res[0]
    if prefix.isalpha() and res[1] == '.':
        # Option line such as "A. ...": strip the leading label and store the text
        questions[loc][prefix + "."] = re.sub(r"^[ABCD]\.\s*", '', res)
        if prefix == "D":
            loc += 1  # "D." is assumed to be the last option of each question
    elif prefix.isdigit():
        # Question line such as "01  ...": strip the leading number
        questions.append({'Qid': loc + 1, 'Qtext': re.sub(r"^\d+\s+", '', res)})
    elif questions:
        # Continuation of a question that was cut across two list items
        questions[loc]['Qtext'] += res

Result:

[{'Qid': 1, 'Qtext': 'Which function among the following can’t be accessed outside the class in java in same package? ', 'A.': 'public void show()。 ', 'B.': 'void show()。 ', 'C.': 'protected show()。 ', 'D.': 'static void show()。 '}, {'Qid': 2, 'Qtext': 'How many private member functions are allowed in a class ? ', 'A.': 'Only 1 ', 'B.': 'Only 7 ', 'C.': 'Only 255 ', 'D.': 'As many as required '}, {'Qid': 3, 'Qtext': 'Can main() function be made private? ', 'A.': 'Yes, always。 ', 'B.': 'Yes, if program doesn’t contain any classes。 ', 'C.': 'No, because main function is user defined。 ', 'D.': 'No, never。 '}, {'Qid': 4, 'Qtext': 'If private member functions are to be declared in C++ then_________。 ', 'A.': 'private:  ', 'B.': 'private ', 'C.': 'private(private member list) ', 'D.': 'private :- <private members> '}, {'Qid': 5, 'Qtext': 'If a function in java is declared private then it _________。 ', 'A.': 'Can’t access the standard output ', 'B.': 'Can access the standard output。 ', 'C.': 'Can’t access any output stream。 ', 'D.': 'Can access only the output streams。 '}]
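
The loop above moves to the next question only when it sees a "D." line, so it assumes every question has exactly four options, and it numbers questions by position rather than by the number printed in the text. Below is a hedged sketch of a variant that avoids both assumptions: it always writes to the most recent question and reads the id from the line itself. The names `parse_questions`, `QUESTION_RE`, `OPTION_RE` and `last_key` are illustrative and not part of either answer, and the sample is a shortened copy of `content` from the question:

```python
import re

QUESTION_RE = re.compile(r"^(\d+)\s+(.*)$")    # e.g. "01  Which function ..."
OPTION_RE = re.compile(r"^([A-Z]\.)\s*(.*)$")  # e.g. "A. public void show()"


def parse_questions(lines):
    """Group extracted lines into a list of question dictionaries."""
    questions = []
    last_key = None  # field filled most recently, used for wrapped lines
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank filler items
        question_match = QUESTION_RE.match(line)
        option_match = OPTION_RE.match(line)
        if question_match:
            number, text = question_match.groups()
            questions.append({'Qid': int(number), 'Qtext': text})
            last_key = 'Qtext'
        elif option_match and questions:
            label, text = option_match.groups()
            questions[-1][label] = text
            last_key = label
        elif questions:
            # Wrapped line: append it to whichever field was filled last
            questions[-1][last_key] += ' ' + line
        # Anything before the first question (e.g. a title) is ignored
    return questions


content = [
    'Sample Questions Set 1 ', ' ', ' ',
    '01  Which function among the following can’t be accessed outside ',
    'the class in java in same package? ',
    'A. public void show()。 ',
    'B. void show()。 ',
    '02  How many private member functions are allowed in a class ? ',
    'A. Only 1 ',
    'B. Only 7 ',
]

parsed = parse_questions(content)
```

Like both answers, this sketch treats any line that starts with a digit as a new question, so a wrapped line that happens to begin with a number would be misread.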