Multiple AMD GPUs with TensorFlow and OpenCL on Ubuntu 16.04
After a long struggle:
I successfully built TensorFlow with OpenCL on a fresh Ubuntu 16.04 with amdgpu 17.50.
I installed 5 identical GPUs (rx580), and all of them are reported by clinfo and computecpp_info, as expected.
When I run the MNIST convnet example, TF works but uses only GPU0 and does not see the other GPUs.
dmesg reports no errors for any of the cards; they all appear to be ready at the lowest level, so I don't know why SYCL seems to ignore some of them.
Here is the output of computecpp_info:
********************************************************************************
ComputeCpp Info (CE 1.0.1)
SYCL 1.2.1 revision 3
********************************************************************************
Toolchain information:
GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.
********************************************************************************
Device Info:
Discovered 5 devices matching:
platform : <any>
device type : <any>
--------------------------------------------------------------------------------
Device 0:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2527.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 1:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2527.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 2:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2527.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 3:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2527.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 4:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2527.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v1.0.1/platform-support-notes
********************************************************************************
Here is the device list from TensorFlow:
$ python3 list_gpus.py
2018-10-17 23:52:44.268968: I ./tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-17 23:52:44.385308: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:70] Found following OpenCL devices:
2018-10-17 23:52:44.385342: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 0, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5429869323017416982
, name: "/device:SYCL:0"
device_type: "SYCL"
memory_limit: 268435456
locality {
}
incarnation: 7347791393919061653
physical_device_desc: "id: 0, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE"
]
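The contents of list_gpus.py are not shown in the question. A script producing output of this shape might look like the sketch below, assuming a TensorFlow 1.x build where `device_lib.list_local_devices()` is available. The helper name `sycl_devices` is hypothetical and only there to make the count explicit.

```python
def sycl_devices(devices):
    """Return the entries whose device_type is "SYCL" (hypothetical helper)."""
    return [d for d in devices if d.device_type == "SYCL"]


def main():
    # Imported lazily so the helper above can be used without TensorFlow installed.
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not installed")
        return

    devices = device_lib.list_local_devices()
    print(devices)
    print("SYCL devices visible to TensorFlow:", len(sycl_devices(devices)))


if __name__ == "__main__":
    main()
```

With the output above, the count would be 1, even though computecpp_info discovers 5 devices.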
EDIT: after reboot
I really don't know whether these warnings are relevant, because they disappear after the first run.
$ python3 list_gpus.py
2018-10-18 00:47:13.943021: I ./tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-18 00:47:13.952909: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:45] No OpenCL accelerator nor GPU found that is supported by ComputeCpp/triSYCL trying OpenCL CPU
2018-10-18 00:47:13.952930: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:52] No OpenCL CPU found that is supported by ComputeCpp/triSYCL, checking for host sycl device
2018-10-18 00:47:13.952936: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:59] Found SYCL host device
2018-10-18 00:47:13.953004: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:70] Found following OpenCL devices:
2018-10-18 00:47:13.953014: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 0, type: Host, name: Host Device, vendor: Codeplay Software Ltd., profile: FULL_PROFILE
EDIT: dmesg details
[ 0.000000] Linux version 4.15.0-36-generic (buildd@lcy01-amd64-017) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #39~16.04.1-Ubuntu SMP Tue Sep 25 08:59:23 UTC 2018 (Ubuntu 4.15.0-36.39~16.04.1-generic 4.15.18)
[ 0.688885] pcie_mp2_amd: AMD(R) PCI-E MP2 Communication Driver Version: 1.0
[ 1.143085] [drm] amdgpu kernel modesetting enabled.
[ 1.173931] amdgpu 0000:03:00.0: enabling device (0000 -> 0003)
[ 1.564757] amdgpu 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 2.280211] amdgpu 0000:03:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 2.280212] amdgpu 0000:03:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[ 2.280322] [drm] amdgpu: 4096M of VRAM memory ready
[ 2.280323] [drm] amdgpu: 4096M of GTT memory ready.
[ 2.280427] amdgpu 0000:03:00.0: amdgpu: using MSI.
[ 2.280439] [drm] amdgpu: irq initialized.
[ 2.280452] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[ 2.280690] amdgpu 0000:03:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x (ptrval)
[ 2.280758] amdgpu 0000:03:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x (ptrval)
[ 2.280784] amdgpu 0000:03:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x (ptrval)
[ 2.280842] amdgpu 0000:03:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x (ptrval)
[ 2.280903] amdgpu 0000:03:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x (ptrval)
[ 2.280965] amdgpu 0000:03:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x (ptrval)
[ 2.280985] amdgpu 0000:03:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x (ptrval)
[ 2.281001] amdgpu 0000:03:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x (ptrval)
[ 2.281015] amdgpu 0000:03:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x (ptrval)
[ 2.281028] amdgpu 0000:03:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x (ptrval)
[ 2.281332] amdgpu 0000:03:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x (ptrval)
[ 2.281348] amdgpu 0000:03:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x (ptrval)
[ 2.285039] amdgpu 0000:03:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x (ptrval)
[ 2.285056] amdgpu 0000:03:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x (ptrval)
[ 2.285069] amdgpu 0000:03:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x (ptrval)
[ 2.285578] amdgpu 0000:03:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x (ptrval)
[ 2.285594] amdgpu 0000:03:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x (ptrval)
[ 2.980155] amdgpu 0000:03:00.0: kfd not supported on this ASIC
[ 2.980163] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:03:00.0 on minor 0
[ 2.980215] amdgpu 0000:06:00.0: enabling device (0000 -> 0003)
[ 4.068205] amdgpu 0000:06:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 4.068206] amdgpu 0000:06:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[ 4.068220] [drm] amdgpu: 4096M of VRAM memory ready
[ 4.068221] [drm] amdgpu: 4096M of GTT memory ready.
[ 4.068331] amdgpu 0000:06:00.0: amdgpu: using MSI.
[ 4.068344] [drm] amdgpu: irq initialized.
[ 4.068357] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[ 4.068444] amdgpu 0000:06:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x (ptrval)
[ 4.068509] amdgpu 0000:06:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x (ptrval)
[ 4.068571] amdgpu 0000:06:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x (ptrval)
[ 4.068639] amdgpu 0000:06:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x (ptrval)
[ 4.068665] amdgpu 0000:06:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x (ptrval)
[ 4.068718] amdgpu 0000:06:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x (ptrval)
[ 4.068740] amdgpu 0000:06:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x (ptrval)
[ 4.068759] amdgpu 0000:06:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x (ptrval)
[ 4.068774] amdgpu 0000:06:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x (ptrval)
[ 4.068787] amdgpu 0000:06:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x (ptrval)
[ 4.069074] amdgpu 0000:06:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x (ptrval)
[ 4.069094] amdgpu 0000:06:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x (ptrval)
[ 4.072854] amdgpu 0000:06:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x (ptrval)
[ 4.072868] amdgpu 0000:06:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x (ptrval)
[ 4.072881] amdgpu 0000:06:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x (ptrval)
[ 4.073362] amdgpu 0000:06:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x (ptrval)
[ 4.073376] amdgpu 0000:06:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x (ptrval)
[ 4.771466] amdgpu 0000:06:00.0: kfd not supported on this ASIC
[ 4.771476] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:06:00.0 on minor 2
[ 4.771515] amdgpu 0000:07:00.0: enabling device (0000 -> 0003)
[ 5.856168] amdgpu 0000:07:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 5.856169] amdgpu 0000:07:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[ 5.856178] [drm] amdgpu: 4096M of VRAM memory ready
[ 5.856179] [drm] amdgpu: 4096M of GTT memory ready.
[ 5.856284] amdgpu 0000:07:00.0: amdgpu: using MSI.
[ 5.856297] [drm] amdgpu: irq initialized.
[ 5.856311] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[ 5.856402] amdgpu 0000:07:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x (ptrval)
[ 5.856441] amdgpu 0000:07:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x (ptrval)
[ 5.856464] amdgpu 0000:07:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x (ptrval)
[ 5.856541] amdgpu 0000:07:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x (ptrval)
[ 5.856569] amdgpu 0000:07:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x (ptrval)
[ 5.856641] amdgpu 0000:07:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x (ptrval)
[ 5.856668] amdgpu 0000:07:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x (ptrval)
[ 5.856690] amdgpu 0000:07:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x (ptrval)
[ 5.856707] amdgpu 0000:07:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x (ptrval)
[ 5.856722] amdgpu 0000:07:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x (ptrval)
[ 5.857007] amdgpu 0000:07:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x (ptrval)
[ 5.857027] amdgpu 0000:07:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x (ptrval)
[ 5.860789] amdgpu 0000:07:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x (ptrval)
[ 5.860803] amdgpu 0000:07:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x (ptrval)
[ 5.860817] amdgpu 0000:07:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x (ptrval)
[ 5.861298] amdgpu 0000:07:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x (ptrval)
[ 5.861313] amdgpu 0000:07:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x (ptrval)
[ 6.563837] amdgpu 0000:07:00.0: kfd not supported on this ASIC
[ 6.563845] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:07:00.0 on minor 3
[ 6.563887] amdgpu 0000:08:00.0: enabling device (0000 -> 0003)
[ 7.648177] amdgpu 0000:08:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 7.648178] amdgpu 0000:08:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[ 7.648188] [drm] amdgpu: 4096M of VRAM memory ready
[ 7.648188] [drm] amdgpu: 4096M of GTT memory ready.
[ 7.648292] amdgpu 0000:08:00.0: amdgpu: using MSI.
[ 7.648306] [drm] amdgpu: irq initialized.
[ 7.648322] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[ 7.648406] amdgpu 0000:08:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x (ptrval)
[ 7.648470] amdgpu 0000:08:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x (ptrval)
[ 7.648530] amdgpu 0000:08:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x (ptrval)
[ 7.648593] amdgpu 0000:08:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x (ptrval)
[ 7.648649] amdgpu 0000:08:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x (ptrval)
[ 7.648707] amdgpu 0000:08:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x (ptrval)
[ 7.648733] amdgpu 0000:08:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x (ptrval)
[ 7.648751] amdgpu 0000:08:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x (ptrval)
[ 7.648769] amdgpu 0000:08:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x (ptrval)
[ 7.648782] amdgpu 0000:08:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x (ptrval)
[ 7.649069] amdgpu 0000:08:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x (ptrval)
[ 7.649087] amdgpu 0000:08:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x (ptrval)
[ 7.652849] amdgpu 0000:08:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x (ptrval)
[ 7.652862] amdgpu 0000:08:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x (ptrval)
[ 7.652874] amdgpu 0000:08:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x (ptrval)
[ 7.653353] amdgpu 0000:08:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x (ptrval)
[ 7.653366] amdgpu 0000:08:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x (ptrval)
[ 8.355909] amdgpu 0000:08:00.0: kfd not supported on this ASIC
[ 8.355916] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:08:00.0 on minor 4
[ 8.355957] amdgpu 0000:09:00.0: enabling device (0000 -> 0003)
[ 9.440257] amdgpu 0000:09:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 9.440258] amdgpu 0000:09:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[ 9.440268] [drm] amdgpu: 4096M of VRAM memory ready
[ 9.440268] [drm] amdgpu: 4096M of GTT memory ready.
[ 9.440376] amdgpu 0000:09:00.0: amdgpu: using MSI.
[ 9.440390] [drm] amdgpu: irq initialized.
[ 9.440406] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[ 9.440499] amdgpu 0000:09:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x (ptrval)
[ 9.440563] amdgpu 0000:09:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x (ptrval)
[ 9.440625] amdgpu 0000:09:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x (ptrval)
[ 9.440690] amdgpu 0000:09:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x (ptrval)
[ 9.440753] amdgpu 0000:09:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x (ptrval)
[ 9.440808] amdgpu 0000:09:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x (ptrval)
[ 9.440831] amdgpu 0000:09:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x (ptrval)
[ 9.440849] amdgpu 0000:09:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x (ptrval)
[ 9.440865] amdgpu 0000:09:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x (ptrval)
[ 9.440880] amdgpu 0000:09:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x (ptrval)
[ 9.441167] amdgpu 0000:09:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x (ptrval)
[ 9.441184] amdgpu 0000:09:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x (ptrval)
[ 9.444946] amdgpu 0000:09:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x (ptrval)
[ 9.444964] amdgpu 0000:09:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x (ptrval)
[ 9.444976] amdgpu 0000:09:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x (ptrval)
[ 9.445456] amdgpu 0000:09:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x (ptrval)
[ 9.445469] amdgpu 0000:09:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x (ptrval)
[ 10.147558] amdgpu 0000:09:00.0: kfd not supported on this ASIC
[ 10.147564] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:09:00.0 on minor 5
[ 10.147606] amdgpu 0000:0a:00.0: enabling device (0000 -> 0003)
[ 11.232197] amdgpu 0000:0a:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 11.232198] amdgpu 0000:0a:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[ 11.232207] [drm] amdgpu: 4096M of VRAM memory ready
[ 11.232207] [drm] amdgpu: 4096M of GTT memory ready.
[ 11.232309] amdgpu 0000:0a:00.0: amdgpu: using MSI.
[ 11.232322] [drm] amdgpu: irq initialized.
[ 11.232337] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[ 11.232427] amdgpu 0000:0a:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x (ptrval)
[ 11.232488] amdgpu 0000:0a:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x (ptrval)
[ 11.232551] amdgpu 0000:0a:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x (ptrval)
[ 11.232615] amdgpu 0000:0a:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x (ptrval)
[ 11.232675] amdgpu 0000:0a:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x (ptrval)
[ 11.232699] amdgpu 0000:0a:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x (ptrval)
[ 11.232717] amdgpu 0000:0a:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x (ptrval)
[ 11.232735] amdgpu 0000:0a:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x (ptrval)
[ 11.232749] amdgpu 0000:0a:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x (ptrval)
[ 11.232763] amdgpu 0000:0a:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x (ptrval)
[ 11.233048] amdgpu 0000:0a:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x (ptrval)
[ 11.233067] amdgpu 0000:0a:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x (ptrval)
[ 11.236830] amdgpu 0000:0a:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x (ptrval)
[ 11.236848] amdgpu 0000:0a:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x (ptrval)
[ 11.236860] amdgpu 0000:0a:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x (ptrval)
[ 11.237341] amdgpu 0000:0a:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x (ptrval)
[ 11.237355] amdgpu 0000:0a:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x (ptrval)
[ 11.939330] amdgpu 0000:0a:00.0: kfd not supported on this ASIC
[ 11.939336] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:0a:00.0 on minor 6
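To compare what the kernel initialized with what SYCL enumerates, the per-card "Initialized amdgpu" lines can be pulled out of a saved dmesg log. This is a minimal sketch, not part of the original setup: the function names are hypothetical, and the regular expression assumes the line format shown in the log above.

```python
import re

# Matches lines such as:
# [ 2.980163] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:03:00.0 on minor 0
INIT_RE = re.compile(r"\[drm\] Initialized amdgpu \S+ \S+ for (\S+) on minor (\d+)")


def initialized_cards(dmesg_text):
    """Return (pci_address, drm_minor) pairs for every initialized amdgpu card."""
    return [(m.group(1), int(m.group(2))) for m in INIT_RE.finditer(dmesg_text)]


def lowest_bus_card(cards):
    """Return the PCI address that sorts first, or None if there are no cards."""
    # Addresses like 0000:0a:00.0 are fixed-width lowercase hex,
    # so plain string order matches bus order.
    return min((addr for addr, _ in cards), default=None)
```

Feeding the log text to `initialized_cards` lists every card the driver brought up, and `lowest_bus_card` picks the address with the lowest bus number, which can then be checked against the single device TensorFlow reports.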
EDIT: This is not tied to any particular card; only the first card on the bus is available.
I tried disabling some of the cards, and after all the tests it seems clear that SYCL always enumerates only the first GPU, whichever one that is: it is always the one with the lowest available bus number.
This also confirms that there is no difference between the cards and that all of them can be used (at least individually), so I think the OS is fine, and I assume the problem is in SYCL.
Please help!
1 Answer
As of today, multiple GPUs with TensorFlow and OpenCL are not supported, even though the documentation does not say so.
You can track the details here; I opened an issue on GitHub: https://github.com/codeplaysoftware/tensorflow/issues/16
I will update this answer if anything changes but, as the developer said, this is not a priority for them!