CPU and GPU identification data#6
Conversation
Signed-off-by: Adrien Taberner <adrien.taberner@cea.fr>
Signed-off-by: Adrien Taberner <adrien.taberner@cea.fr>
- add a CEXA_ARCHINFO_ENABLE_TESTS option - try to find gtest before calling FetchContent - link to gtest instead of gtest_main - Use the PRE_TEST test discovery mode
- Use macros to choose between the platform specific functions instead of doing it at configure time
- Remove the compilInfo.hpp
- Properly install the library - Do not install gtest alongside the library - Add an install test subdirectory
There was a problem hiding this comment.
I kept the tests as is (just removed the prints in the main function), we might want to do something more specific than checking if the string is non-empty. For example checking that if there is no GPU backend, the GPU functions return "N/A".
| std::string get_gpu_driver_version() { | ||
| if (NVML_SUCCESS != nvmlInit_v2()) { | ||
| std::cerr << "get_gpu_driver_version: failed to initialize nvml\n"; | ||
| return "ERROR"; | ||
| } | ||
| char buffer[NVML_SYSTEM_NVML_VERSION_BUFFER_SIZE]; | ||
| nvmlSystemGetDriverVersion(buffer, NVML_SYSTEM_NVML_VERSION_BUFFER_SIZE); | ||
| nvmlShutdown(); | ||
| return buffer; | ||
| } |
There was a problem hiding this comment.
According to the documentation, nvml can be initialized/shutdown multiple times (they do a noop if the library was already initialized) so there shouldn't be any conflicts with a user code using nvml and our library.
For the error checking, nvmlSystemGetDriverVersion and nvmlShutdown cannot fail in our use case (nvml is initialized, buffer is not NULL, buffer size is sufficient), that's why I only check for errors in the call to nvmlInit_v2.
Also, I don't know if we should keep the std::cerr with the error message, as it is the only place in the code where we actually print an error message to the console.
There was a problem hiding this comment.
I am in favor of removing the cerr and simply returning "N/A".
There was a problem hiding this comment.
I would still check the return value of nvmlSystemGetDriverVersion. I trust you that with current specification we do not expect it to fail but:
- we can't guarantee that new condition for failure won't appear in the future
- we have no guarantee that the implementation contains no bug and doesn't throw error in other cases too
Co-authored-by: Paul Zehner <paul.zehner@cea.fr>
Co-authored-by: Paul Zehner <paul.zehner@cea.fr>
Co-authored-by: Paul Zehner <paul.zehner@cea.fr>
Co-authored-by: Paul Zehner <paul.zehner@cea.fr>
| std::string get_gpu_driver_version() { | ||
| if (NVML_SUCCESS != nvmlInit_v2()) { | ||
| std::cerr << "get_gpu_driver_version: failed to initialize nvml\n"; | ||
| return "ERROR"; | ||
| } | ||
| char buffer[NVML_SYSTEM_NVML_VERSION_BUFFER_SIZE]; | ||
| nvmlSystemGetDriverVersion(buffer, NVML_SYSTEM_NVML_VERSION_BUFFER_SIZE); | ||
| nvmlShutdown(); | ||
| return buffer; | ||
| } |
There was a problem hiding this comment.
I would still check the return value of nvmlSystemGetDriverVersion. I trust you that with current specification we do not expect it to fail but:
- we can't guarantee that new condition for failure won't appear in the future
- we have no guarantee that the implementation contains no bug and doesn't throw error in other cases too
| std::string get_gpu_name() { | ||
| hipDeviceProp_t device_prop; | ||
| int device_id = Kokkos::device_id(); | ||
| KOKKOS_IMPL_HIP_SAFE_CALL(hipGetDeviceProperties(&device_prop, device_id)); | ||
| return std::string{device_prop.name}; | ||
| } | ||
|
|
||
| std::string get_gpu_arch() { | ||
| hipDeviceProp_t device_prop; | ||
| int device_id = Kokkos::device_id(); | ||
| KOKKOS_IMPL_HIP_SAFE_CALL(hipGetDeviceProperties(&device_prop, device_id)); | ||
| return std::string{device_prop.gcnArchName}; | ||
| } |
There was a problem hiding this comment.
I would consider using a macro here to avoid code duplication (this is only a suggestion).
There was a problem hiding this comment.
I think it's fine since it's only used twice
pzehner
left a comment
There was a problem hiding this comment.
Some remarks about the CMake files.
Co-authored-by: PaulGannay <paul.gannay@cea.fr>
Co-authored-by: Paul Zehner <paul.zehner@alumni.enseeiht.fr>
Signed-off-by: Adrien Taberner <adrien.taberner@cea.fr>
Signed-off-by: Adrien Taberner <adrien.taberner@cea.fr>
Signed-off-by: Adrien Taberner <adrien.taberner@cea.fr>
| std::string get_gpu_name() { | ||
| hipDeviceProp_t device_prop; | ||
| int device_id = Kokkos::device_id(); | ||
| KOKKOS_IMPL_HIP_SAFE_CALL(hipGetDeviceProperties(&device_prop, device_id)); | ||
| return std::string{device_prop.name}; | ||
| } | ||
|
|
||
| std::string get_gpu_arch() { | ||
| hipDeviceProp_t device_prop; | ||
| int device_id = Kokkos::device_id(); | ||
| KOKKOS_IMPL_HIP_SAFE_CALL(hipGetDeviceProperties(&device_prop, device_id)); | ||
| return std::string{device_prop.gcnArchName}; | ||
| } |
There was a problem hiding this comment.
I think it's fine since it's only used twice
Signed-off-by: Adrien Taberner <adrien.taberner@cea.fr>
Signed-off-by: Adrien Taberner <adrien.taberner@cea.fr>
No description provided.