How to improve cosine similarity in SBERT
I am trying to use the SBERT model multi-qa-mpnet-base-dot-v1 on the Yahoo Answers QA corpus. Why are some of the results poor? My code follows.
!pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer, util
import tensorflow as tf
model=SentenceTransformer('multi-qa-mpnet-base-dot-v1')
import numpy as np
import pandas as pd
nRowsRead = 1000
cnames=["id","q","q2","a"]
df1 = pd.read_csv('test.csv', delimiter=',', names=cnames, nrows = nRowsRead)
df1.dataframeName = 'test.csv'
nRow, nCol = df1.shape
answers=[]
questions=[]
questions=df1.q.to_list()
answers=df1.a.to_list()
answer_embeddings=model.encode(answers,convert_to_tensor=True)
import torch
import random
for i in range(5):
    question = questions[random.randrange(0, nRowsRead)]
    question_embeddings = model.encode(question, convert_to_tensor=True)
    sim = util.cos_sim(question_embeddings, answer_embeddings)
    print("Question:", question)
    print("Answer :", answers[torch.argmax(sim)])
    print("Score :", torch.max(sim))
Output
Question: Is Julia Roberts' doing a movie for next year?
Answer : Kindly ignore the above answer. So far, for 2006, she has Charlotte's Web ( http://imdb.com/title/tt0413895/ ) and Ant Bully ( http://imdb.com/title/tt0429589/ ).
Score : tensor(0.6032)
Question: how do i become a citizen of new zealand?
Answer : This depends mainly on which state you are . Usually the more immigrants like Texas, California , Florida could take 7 to 15 months. Other states with few cases could be shorter. If you've got the labor permit I believe you are more than halfway in the process, once you start the i485 ( not sure if the correct number) you are done , just a matter of wait. The more important document you need in this stage is the travel document or travel parole that is only valid for one year and take slike two months to renew so you need to apply every 10 months. with thia paper you are free to move outside of the US and comeback. You do not need a visa anymore just wait until your residence card is ready.
Score : tensor(0.4470)
Question: What is the importance of learning Computer networking. ???
Answer : Anyone that uses the web or other internet technologies can benefit from learning at least the very basics of networking, for example, the difference between a telephone cable and a Cat5 ethernet cable and the places they plug into on the back of your computer. \n\nKnowing the difference between public and private networks, how data travels from one computer to another, and how to troubleshoot problems can save you money and frustration, but for many people it's easier to pay someone like me $80 an hour to fix it for them (fast).\n\nIf you work in IT or use networks on a regular basis, it would be very beneficial to study up on TCP/IP and Ethernet basics. Once you understand how things work it makes it much easier to figure out why it's not doing what you want it to do.\n\nHere's a good quick tutorial on TCP/IP:\nhttp://www.w3schools.com/tcpip/default.asp
Score : tensor(0.7319)
Question: If you had to give up one of your constitutional rights, which would it be?
Answer : My right to not have soldiers quartered in my home during peacetime, because I don't think they'd ever actually do it anyway.
Score : tensor(0.7427)
Question: How do I configure DNS server?
Answer : 1. Start the DNS Manager (Start - Programs - Administrative Tools - DNS Manager)\n 2. From the DNS menu, select New Server and enter the IP address of the DNS Server, e.g. 200.200.200.3, and click OK\n 3. The server will now be displayed with a CACHE sub part\n 4. Next we want to add the domain, e.g. savilltech.com, from the DNS menu, select New Zone\n 5. Select Primary and click Next\n 6. Enter the name, e.g. savilltech.com, and then press tab, and it will fill in the Zone File Name and click Next\n 7. Click Finish\n 8. Next a zone for reverse lookups has to be created, so select New Zone from the DNS menu\n 9. Select Primary and click Next, enter the name of the first 3 parts of the domain IP + in-addr.arpa, e.g. if the domain was 158.234.26, the entry would be 26.234.158.in-addr.arpa, in my example it would be 200.200.200.in-addr.arpa, click tab for the file name to be filled and click Next, then click Finish\n 10. Add a record for the DNS server, by right clicking on the domain and select "New Record"\n 11. Enter the name of the machine, e.g. BUGSBUNNY (I had a strange upbringing :-) ), and enter and IP address, e.g. 200.200.200.3 and click OK\n 12. If you click F5 and examine the 200.200.200.in-addr.arpa a record has been added for BUGSBUNNY there as well
Score : tensor(0.7067)
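What the code does:
1: First I import test.csv (the Yahoo Answers file).
2: The q column (the second column) is assigned to the questions list.
3: The a column (the fourth column) is assigned to the answers list.
4: The answers are embedded with the multi-qa-mpnet-base-dot-v1 model.
5: Random questions are picked from the questions list and embedded.
6: The cosine similarity between each question and all answers is computed, and the highest-scoring answer is printed.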
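For reference, steps 5 and 6 boil down to the following. This is only a minimal sketch with tiny hand-made vectors standing in for real embeddings; the vectors and the helper names (dot, cos_sim, best_match) are made up for illustration and are not part of sentence-transformers. It also prints the dot-product ranking next to the cosine one, because the model name ends in dot-v1 and the two scores do not have to agree when vectors have different lengths.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cos_sim(u, v):
    """Cosine similarity: dot product of the length-normalised vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def best_match(query, candidates, score_fn):
    """Return (index, score) of the candidate with the highest score."""
    scores = [score_fn(query, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy stand-ins for embeddings (hypothetical values, not model output).
question_vec = [1.0, 1.0]
answer_vecs = [
    [1.0, 0.9],  # almost the same direction as the question, small norm
    [5.0, 1.0],  # different direction, but a much larger norm
]

cos_idx, cos_score = best_match(question_vec, answer_vecs, cos_sim)
dot_idx, dot_score = best_match(question_vec, answer_vecs, dot)

print("cosine picks answer", cos_idx, "with score", round(cos_score, 4))
print("dot    picks answer", dot_idx, "with score", round(dot_score, 4))
```

In this toy case the two scoring functions pick different answers. With the real model, util.dot_score(question_embeddings, answer_embeddings) could be tried in place of util.cos_sim to see whether the ranking changes; I have not confirmed that this accounts for the weak matches above.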
How can I improve the similarity scores?