How do I make Python process text sentence by sentence instead of word by word?

I have a list of strings, and when creating the tuples I want Python to keep each sentence whole. For example, given this input, I want this output:

string = [("I am a good boy"), ("I am a good girl")]
tuple = [("I am a good boy", -1), ("I am a good girl", -1)]

But instead it produces this:

tuple = [("I", -1), ("am", -1), ("a", -1), ("good", -1), ("boy", -1).....]

What went wrong, and how can I fix it?

import re

def cleanedthings(trainset):
    cleanedtrain = []
    specialch = "!@#$%^&*-=_+:;\".,/?`~][}{|)("
    for line in trainset:
        for word in line.split():
            lowword = word.lower()
            for ch in specialch:
                if ch in lowword:
                    lowword = lowword.replace(ch,"")
            if len(lowword) >= 3:
                cleanedtrain.append(lowword)
    return cleanedtrain

poslinesTrain = [('I just wanted to drop you a note to let you know how happy I am with my cabinet'), ('The end result is a truly amazing transformation!'), ('Who can I thank for this?'), ('For without his artistry and craftmanship this transformation would not have been possible.')]

neglinesTrain = [('I have no family and no friends, very little food, no viable job and very poor future prospects.'), ('I have therefore decided that there is no further point in continuing my life.'), ('It is my intention to drive to a secluded area, near my home, feed the car exhaust into the car, take some sleeping pills and use the remaining gas in the car to end my life.')]

poslinesTest = [('Another excellent resource from Teacher\'s Clubhouse!'), ('This cake tastes awesome! It\'s almost like I\'m in heaven already oh God!'), ('Don\'t worry too much, I\'ll always be here for you when you need me. We will be playing games or watching movies together everytime to get your mind off things!'), ('Hey, this is just a simple note for you to tell you that you\'re such a great friend to be around. You\'re always being the listening ear to us, and giving us good advices. Thanks!')]

neglinesTest = [('Mum, I could write you for days, but I know nothing would actually make a difference to you.'), ('You are much too ignorant and self-concerned to even attempt to listen or understand. Everyone knows that.'), ('If I were, your BITCHY comments that I\'m assuming were your attempt to help, wouldn\'t have.'), ('If I have stayed another minute I would have painted the walls and stained the carpets with my blood, so you could clean it up... I wish I were never born.')]

clpostrain = cleanedthings(poslinesTrain)
clnegtrain = cleanedthings(neglinesTrain)

clpostest = cleanedthings(poslinesTest)
clnegtest = cleanedthings(neglinesTest)


trainset= [(x,1) for x in clpostrain] + [(x,-1) for x in clnegtrain]
testset= [(x,1) for x in clpostest] + [(x,-1) for x in clnegtest]

print(testset)

1 answer

Accepted solution

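You are appending the cleaned words to the final result one at a time, so the sentence boundaries are lost. Collect the cleaned words of each sentence in their own list, then join them back into a single string before appending it to the result: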

def cleanedthings(trainset):
    cleanedtrain = []
    specialch = "!@#$%^&*-=_+:;\".,/?`~][}{|)("
    for line in trainset:
        # collect the cleaned words of the current sentence here
        sentence = []
        for word in line.split():
            lowword = word.lower()
            for ch in specialch:
                if ch in lowword:
                    lowword = lowword.replace(ch,"")
            if len(lowword) >= 3:
                sentence.append(lowword)
        # once all words are checked, rebuild the sentence by joining them with spaces
        # and append it to the list of cleaned sentences
        cleanedtrain.append(' '.join(sentence))
    return cleanedtrain
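A minimal sketch of the same per-sentence fix in a more compact form, assuming Python 3 (the question's original print statement is Python 2 syntax). It swaps the inner character loop for str.translate with a deletion table built by str.maketrans, which removes the same special characters, and then builds the labelled tuples exactly as the question does. The sample sentences are shortened from the question's data for illustration.

```python
def cleanedthings(trainset):
    """Return one cleaned string per input sentence (not one per word)."""
    cleanedtrain = []
    specialch = "!@#$%^&*-=_+:;\".,/?`~][}{|)("
    # A third argument to str.maketrans maps each listed character to None, i.e. deletes it
    table = str.maketrans("", "", specialch)
    for line in trainset:
        # Lowercase and strip special characters from every word of this sentence
        words = (word.lower().translate(table) for word in line.split())
        # Keep words of 3+ characters and rejoin them into a single sentence string
        cleanedtrain.append(" ".join(w for w in words if len(w) >= 3))
    return cleanedtrain


poslinesTrain = ["I just wanted to drop you a note", "Who can I thank for this?"]
neglinesTrain = ["I have therefore decided that there is no further point"]

# Each element is now a whole sentence, so each tuple pairs a sentence with its label
trainset = [(x, 1) for x in cleanedthings(poslinesTrain)] + \
           [(x, -1) for x in cleanedthings(neglinesTrain)]
print(trainset)
# [('just wanted drop you note', 1), ('who can thank for this', 1), ('have therefore decided that there further point', -1)]
```

Because each input sentence produces exactly one output string, the list comprehensions that attach the labels now yield one tuple per sentence instead of one per word.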