llama.cpp: LLAMA_CUBLAS is enabled, but ./main uses only 75 MB of 6 GB VRAM
I enabled LLAMA_CUBLAS so that llama.cpp is built against the NVIDIA CUDA toolkit:
make LLAMA_CUBLAS=1
It compiled without errors.
But when I run the model and watch memory usage with nvidia-smi, only 75 MB of VRAM is used. See the log below.
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 13189.99 MB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/43 layers to GPU
llm_load_tensors: VRAM used: 0.00 MB
....................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 400.00 MB
llama_new_context_with_model: compute buffer total size = 81.13 MB
llama_new_context_with_model: VRAM scratch buffer: 75.00 MB
llama_new_context_with_model: total VRAM used: 75.00 MB (model: 0.00 MB, context: 75.00 MB)
nvidia-smi output:
Tue Oct 24 10:53:17 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.113.01             Driver Version: 535.113.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4050 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   42C    P8               5W /  30W |     89MiB /  6141MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1991      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+
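
One thing that stands out in the log above is `offloaded 0/43 layers to GPU`: no layers were requested for offload, which would be consistent with only the 75 MB scratch buffer landing in VRAM. llama.cpp's `main` takes `-ngl N` (`--n-gpu-layers N`) to set that count. The following is only a rough sketch for guessing a starting value from the numbers in the log. It assumes the model's memory is spread evenly over the 43 layers (it is not exactly), treats the log's MB and nvidia-smi's MiB as the same unit, and uses a hypothetical `reserve_mb` margin for the KV cache, scratch buffers, and the desktop; the function name and defaults are made up for illustration.

```python
import re

# Relevant lines copied from the log in this post.
LOG = """\
llm_load_tensors: mem required = 13189.99 MB
llm_load_tensors: offloaded 0/43 layers to GPU
llama_new_context_with_model: kv self size = 400.00 MB
"""


def estimate_gpu_layers(log: str, vram_mb: float, reserve_mb: float = 1024.0) -> int:
    """Rough guess at a value for -ngl, assuming equally sized layers."""
    # Total model memory reported while nothing is offloaded.
    mem_required = float(re.search(r"mem required\s*=\s*([\d.]+) MB", log).group(1))
    # Denominator of "offloaded X/Y layers" is the number of offloadable layers.
    total_layers = int(re.search(r"offloaded \d+/(\d+) layers", log).group(1))

    per_layer_mb = mem_required / total_layers
    # reserve_mb is an assumed safety margin, not a value llama.cpp reports.
    usable_mb = max(vram_mb - reserve_mb, 0.0)
    return min(total_layers, int(usable_mb // per_layer_mb))


if __name__ == "__main__":
    n = estimate_gpu_layers(LOG, vram_mb=6141)  # 6141 MiB from nvidia-smi
    # <model> is a placeholder for the actual model path.
    print(f"./main -m <model> -ngl {n}")
```

With the figures from this post the estimate comes out to 16 layers. That is only a starting point: if loading fails with an out-of-memory error, lower the number; if nvidia-smi still shows free VRAM, raise it.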