How do I correct a (sometimes incorrect) list using a dictionary?

Question

I am still trying to get the commented-out sections in the for loop to work as Klasske suggested below. I have included my own test script, with comments added to explain my thinking. Any advice is appreciated.

I have a file (infile1.tsv) that contains the correct pairs to use as a dictionary, and a file (infile2.tsv) whose 1st column is sometimes wrong. I can't simply add 1 to the number in the row above.

replace the 1st column entry if it does not have the form "Title_#", where # can be of any length
an unwanted name can be any string of characters and digits that does not match the Title_# pattern
each name to be replaced in infile2.tsv is listed in infile1.tsv (1st column) together with its corrected name (2nd column)
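
The Title_# rule described above can be written as a regular expression. This is a minimal Python 3 sketch, assuming that # means one or more digits and that the whole name must match. The helper name `is_valid_title` is hypothetical and not part of the original script.

```python
import re

# Assumption: "#" stands for one or more digits, and nothing may follow them.
TITLE_PATTERN = re.compile(r'Title_\d+')

def is_valid_title(name):
    """Return True when the whole name has the form Title_<digits>."""
    return TITLE_PATTERN.fullmatch(name) is not None

# Print each sample name next to the result of the check.
for name in ['Title_1', 'Title_184', 'junk_name1', 'junk.name3', 'Title_']:
    print(name, is_valid_title(name))
```

This check is stricter than testing `'Title_' in item`, which would also accept names that merely contain the text Title_ somewhere inside them.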

My test script:

#!/usr/bin/python
# import dictionary of pairs, then replace those which aren't Title_
# USAGE:  python import_dict_then_replace.py
import csv
import string

infile1_dict = {} # use dict for the library to replace as (key, value)
with open('infile1.tsv','rb') as tsvin:
    tsvin = csv.reader(tsvin, delimiter='\t') # imports the file as a tsv
    for row in tsvin: # each row in the tab-delimited input file
        infile1_dict.update({row[0]:row[1]}) # use 1st row (0) as key, use 2nd row (1) as value in the dictionary

#print infile1_dict # to verify dictionary is read in correctly

outputstring = [] # use list because it's ordered and can iterate through each item just once (linear search) to check against the dictionary for replacement
outlist = [] # make another empty list
with open('infile2.tsv','rb') as inlist:
    inputlist = inlist.read().replace('\n','') # gets rid of all newlines to import the rows as list; rstrip('\n') just removes newlines from last lines
    for index, item in enumerate(inputlist): # iterate over each item in inputlist; need the index to reference for each item in the list
        if 'Title_' in item: # when Title_ is in the list's item
            outputstring.append(item) # just put the Title_ item into the list
        if not 'Title_' in item: # when Title_ is not in the list's item
            correction = infile1_dict.get(item, "key was not in dictionary")
            outlist = inlist.replace(item, correction, 1) # replaces the unmatching item with the corresponding value in the dictionary
            outputstring = inlist.append(outlist) # adds to the list
print outputstring

infile1.tsv

junk_name1  Title_3
junk-name2  Title_184
junk.name3  Title_122
junkname4   Title_96

infile2.tsv

Title_1
Title_2
junk_name1
Title_94
Title_95
junkname4
Title_121
junk.name3

out-complete.txt

Title_1
Title_2
Title_3
Title_94
Title_95
Title_96
Title_121
Title_122

out-replaced.tsv

junk_name1  Title_3
junkname4   Title_96
junk.name3  Title_122
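
To make the expected result concrete, here is a minimal in-memory sketch in Python 3 that turns the sample infile2.tsv names into the two outputs shown above. The function name `correct_names` is hypothetical. Leaving unknown names unchanged is an assumption, since the question does not say what should happen to them.

```python
def correct_names(names, corrections):
    """Return (complete, replaced): the corrected list and the pairs that were applied."""
    complete = []
    replaced = []
    for name in names:
        if name.startswith('Title_'):
            # Already in the wanted form, keep it as is.
            complete.append(name)
        elif name in corrections:
            # Swap the unwanted name for its corrected name and record the pair.
            complete.append(corrections[name])
            replaced.append((name, corrections[name]))
        else:
            # Assumption: names missing from the dictionary pass through unchanged.
            complete.append(name)
    return complete, replaced

# Sample data copied from infile1.tsv and infile2.tsv.
corrections = {
    'junk_name1': 'Title_3',
    'junk-name2': 'Title_184',
    'junk.name3': 'Title_122',
    'junkname4': 'Title_96',
}
names = ['Title_1', 'Title_2', 'junk_name1', 'Title_94',
         'Title_95', 'junkname4', 'Title_121', 'junk.name3']

complete, replaced = correct_names(names, corrections)
print('\n'.join(complete))
for old, new in replaced:
    print(old + '\t' + new)
```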

python regex dictionary replace tsv

user3862987 Aug 23 '14 at 18:03

2 answers

Solution

It took me a while to understand what you were asking, but I think what you want to do is read the first file into a dictionary of key-value pairs:

import csv

infile1_dict = {}

with open('infile1.tsv','rb') as tsvin:
    tsvin = csv.reader(tsvin, delimiter='\t')
    for row in tsvin:
        infile1_dict[row[0]] = row[1]  # first column is the key, second column is the value

Then you will want to read the second file, replace the unwanted names, and keep track of the values that were replaced:

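replaced_values_dict = {}
with open('infile2.tsv','rb') as tsvin, open('out-complete.tsv', 'wb') as tsvout:
    tsvin = csv.reader(tsvin, delimiter='\t')
    tsvout = csv.writer(tsvout, delimiter='\t')

    for row in tsvin:
        if not row:  # skip blank lines
            continue
        if 'Title_' not in row[0]:
            # assumption: a name that is missing from infile1_dict is left unchanged
            corrected = infile1_dict.get(row[0], row[0])
            # save pair in replaced_values_dict
            replaced_values_dict[row[0]] = corrected
            # write to new file with item found in infile1_dict
            tsvout.writerow([corrected] + row[1:])
        else:
            # write as is
            tsvout.writerow(row)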

Is this what you meant?
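
The 'rb' and 'wb' modes above follow the Python 2 convention for the csv module. Under Python 3 the files are opened in text mode with newline='' instead. Here is a hedged sketch of the reading and writing steps for Python 3; the helper names `load_pairs` and `write_rows` are hypothetical.

```python
import csv

def load_pairs(path):
    """Read a two-column TSV file into a dict mapping column 1 to column 2."""
    pairs = {}
    with open(path, newline='') as handle:
        for row in csv.reader(handle, delimiter='\t'):
            if len(row) >= 2:  # skip blank or incomplete lines
                pairs[row[0]] = row[1]
    return pairs

def write_rows(path, rows):
    """Write an iterable of rows (lists or tuples) as tab-separated lines."""
    with open(path, 'w', newline='') as handle:
        csv.writer(handle, delimiter='\t').writerows(rows)
```

With these helpers, the recorded replacements could be saved with something like `write_rows('out-replaced.tsv', replaced_values_dict.items())`.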

user3929902 Aug 23 '14 at 18:48

user3862987 Sep 8 '14 at 00:59 · Accepted Answer

This works, but it is probably not very elegant:

#!/usr/bin/python
#imports dictionary of pairs, then replaces those which aren't Title_

import csv

infile1dict = {} # dict for the library to replace as (key, value)
with open('infile1.tsv','r') as tsvin:
    tsvin = csv.reader(tsvin, delimiter='\t') # imports the file as a tsv
    for row in tsvin: # each row in the tab-delimited input file
        infile1dict[row[0]] = row[1] # use 1st column (0) as key, use 2nd column (1) as value in the dictionary
    print (infile1dict)

with open('infile2.tsv','r') as infile: # file whose names are tested against the dict library
    inlist = [line.strip() for line in infile] # strips off the \n, grabs each line as an item
    print (inlist)

outlist = [] # list for each item to be added into
for item in inlist:
    if 'Title_' in item:
        print('Item %s will go directly into the output list.' % item)
        outlist.append(item) # item is already a string, so it is added to the output list unchanged
    else:
        print('Item %s will be replaced with the dictionary.' % item)
        correction = infile1dict.get(item, "The inlist's item was not a key in the dictionary.")
        outlist.append(correction) # adds the correction string to the list
        print('\t'.join(outlist)) # shows the output list so far
with open('out-complete.txt', 'w') as outobj:
    outobj.write('\n'.join(outlist))