Running localGPT with Pipenv instead of Conda
Goal
I would like to use Pipenv instead of Conda to run localGPT on an Ubuntu 22.04.3 machine.
Reason: Pipenv is already installed on the server where I want to deploy localGPT, but Conda is not, and I do not have permission to install it.
Approach
I translated the existing, up-to-date requirements.txt file:
# Natural Language Processing
langchain==0.0.267
chromadb==0.4.6
pdfminer.six==20221105
InstructorEmbedding
sentence-transformers
faiss-cpu
huggingface_hub
transformers
protobuf==3.20.2; sys_platform != 'darwin'
protobuf==3.20.2; sys_platform == 'darwin' and platform_machine != 'arm64'
protobuf==3.20.3; sys_platform == 'darwin' and platform_machine == 'arm64'
auto-gptq==0.2.2
docx2txt
unstructured
unstructured[pdf]
# Utilities
urllib3==1.26.6
accelerate
bitsandbytes ; sys_platform != 'win32'
bitsandbytes-windows ; sys_platform == 'win32'
click
flask
requests
# Streamlit related
streamlit
Streamlit-extras
# Excel File Manipulation
openpyxl
into a Pipfile (located in the project root), which looks like this:
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true
[packages]
langchain = "==0.0.267"
chromadb = "==0.4.6"
pdfminer.six = "==20221105"
InstructorEmbedding = "*"
sentence-transformers = "*"
faiss-cpu = "*"
huggingface_hub = "*"
transformers = "*"
protobuf = "==3.20.3"
auto-gptq = "==0.2.2"
docx2txt = "*"
unstructured = {extras = ["pdf"], version = "*"}
urllib3 = "==1.26.6"
accelerate = "*"
bitsandbytes = "*"
click = "*"
flask = "*"
requests = "*"
streamlit = "*"
Streamlit-extras = "*"
openpyxl = "*"
jmespath = "==1.0.1"
llama-cpp-python = "==0.2.11"
[requires]
python_version = "3.10"
Note that I added jmespath and llama-cpp-python because, when I set this up with Conda, I had to install these two packages separately via pip.
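For comparison, the manual translation above can be approximated mechanically. The following is a minimal sketch, not part of localGPT or Pipenv: the helper name to_pipfile_line is hypothetical, it handles only the simple name / [extras] / version / marker forms that occur in this requirements.txt, and it assumes Pipenv's documented markers key for environment markers. It quotes keys that contain a dot, because in TOML an unquoted dotted key such as pdfminer.six is parsed as a nested table rather than as one package name. Whether that is related to the error below is an assumption I have not verified.

```python
import re

# name, optional [extras], then whatever version specifier follows (may be empty)
_REQ = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(?:\[([^\]]*)\])?\s*([^;]*)$")


def to_pipfile_line(requirement):
    """Convert one requirements.txt line into a Pipfile [packages] line.

    Returns None for blank lines and comments. Hypothetical helper that only
    covers the simple forms used in this post.
    """
    line = requirement.split("#", 1)[0].strip()
    if not line:
        return None

    # Everything after ';' is a PEP 508 environment marker.
    spec_part, _, marker = line.partition(";")
    match = _REQ.match(spec_part)
    if match is None:
        raise ValueError(f"unsupported requirement: {requirement!r}")

    name = match.group(1)
    extras = match.group(2)
    version = match.group(3).strip() or "*"
    # Markers accept either quote style; use single quotes inside the TOML string.
    marker = marker.strip().replace('"', "'")

    # Bare TOML keys may only contain letters, digits, '_' and '-'.
    key = name if re.fullmatch(r"[A-Za-z0-9_-]+", name) else f'"{name}"'

    if not extras and not marker:
        return f'{key} = "{version}"'

    fields = []
    if extras:
        items = ", ".join(f'"{e.strip()}"' for e in extras.split(","))
        fields.append(f"extras = [{items}]")
    fields.append(f'version = "{version}"')
    if marker:
        fields.append(f'markers = "{marker}"')
    return key + " = {" + ", ".join(fields) + "}"


if __name__ == "__main__":
    for req in [
        "pdfminer.six==20221105",
        "unstructured[pdf]",
        "bitsandbytes ; sys_platform != 'win32'",
    ]:
        print(to_pipfile_line(req))
```

Lines that repeat a package with different markers (the three protobuf lines) would produce duplicate TOML keys, so they still have to be merged by hand. Note that on Linux the requirements.txt markers select protobuf==3.20.2, whereas my Pipfile pins 3.20.3.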
Problem
So, in theory, running
pipenv install
pipenv shell
python ingest.py
should perform the ingestion, but unfortunately I get this error:
python ingest.py
2023-10-17 11:16:53,102 - INFO - ingest.py:121 - Loading documents from /home/*********/Documents/my-chatbot/SOURCE_DOCUMENTS
2023-10-17 11:16:53,131 - INFO - ingest.py:34 - Loading document batch
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/home/*******/Documents/mylocal-chatbot/ingest.py", line 40, in load_document_batch
data_list = [future.result() for future in futures]
File "/home/*******/Documents/mylocal-chatbot/ingest.py", line 40, in <listcomp>
data_list = [future.result() for future in futures]
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/*******/Documents/mylocal-chatbot/ingest.py", line 30, in load_single_document
return loader.load()[0]
File "/home/*******/.local/share/virtualenvs/mylocal-chatbot-j8q8_E0e/lib/python3.10/site-packages/langchain/document_loaders/unstructured.py", line 86, in load
elements = self._get_elements()
File "/home/*******/.local/share/virtualenvs/mylocal-chatbot-j8q8_E0e/lib/python3.10/site-packages/langchain/document_loaders/unstructured.py", line 169, in _get_elements
from unstructured.partition.auto import partition
File "/home/*******/.local/share/virtualenvs/mylocal-chatbot-j8q8_E0e/lib/python3.10/site-packages/unstructured/partition/auto.py", line 80, in <module>
from unstructured.partition.pdf import partition_pdf
File "/home/*******/.local/share/virtualenvs/mylocal-chatbot-j8q8_E0e/lib/python3.10/site-packages/unstructured/partition/pdf.py", line 12, in <module>
from pdfminer.converter import PDFPageAggregator, PDFResourceManager
ImportError: cannot import name 'PDFResourceManager' from 'pdfminer.converter' (/home/*******/.local/share/virtualenvs/mylocal-chatbot-j8q8_E0e/lib/python3.10/site-packages/pdfminer/converter.py)
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/*******/Documents/mylocal-chatbot/ingest.py", line 159, in <module>
main()
File "/home/*******/.local/share/virtualenvs/mylocal-chatbot-j8q8_E0e/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/*******/.local/share/virtualenvs/mylocal-chatbot-j8q8_E0e/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/*******/.local/share/virtualenvs/mylocal-chatbot-j8q8_E0e/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/*******/.local/share/virtualenvs/mylocal-chatbot-j8q8_E0e/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/*******/Documents/mylocal-chatbot/ingest.py", line 122, in main
documents = load_documents(SOURCE_DIRECTORY)
File "/home/*******/Documents/mylocal-chatbot/ingest.py", line 71, in load_documents
contents, _ = future.result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
ImportError: cannot import name 'PDFResourceManager' from 'pdfminer.converter' (/home/*******/.local/share/virtualenvs/mylocal-chatbot-j8q8_E0e/lib/python3.10/site-packages/pdfminer/converter.py)
Manually (re)installing the related packages via pipenv install pdfminer, pipenv install pdfminer.six, or pipenv install unstructured did not help.
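To narrow this down, it may help to check from inside the Pipenv virtualenv which pdfminer-related distributions are actually installed and whether the name from the traceback can be imported. The following is a minimal diagnostic sketch using only the standard library; the helper names find_distributions and can_import are hypothetical. The original pdfminer package and the pdfminer.six fork both install a top-level package called pdfminer, so having both in one environment could leave a mixed package directory. That this is the cause here is an assumption, not something the traceback proves.

```python
import importlib
from importlib import metadata


def find_distributions(prefix):
    """Return {name: version} for installed distributions whose name starts with prefix."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or ""
        if name.lower().startswith(prefix.lower()):
            found[name] = dist.version
    return found


def can_import(module_name, attr):
    """Return True if module_name imports cleanly and exposes attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # Run inside the virtualenv, e.g. after `pipenv shell`.
    print("installed:", find_distributions("pdfminer"))
    print(
        "PDFResourceManager importable:",
        can_import("pdfminer.converter", "PDFResourceManager"),
    )
```

If both pdfminer and pdfminer.six show up in the first line of output, removing both and reinstalling only the pinned pdfminer.six would be one thing to try.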
Any ideas on how to get rid of this import error?