## DeepSpeed Docker image build

```shell
docker-compose -f docker-compose_pytorch1.13.yml build
docker-compose -f docker-compose_pytorch2.3.yml build
```

## Test command

```shell
docker run -it --gpus all --name deepspeed_test --shm-size=1gb --rm hotwa/deepspeed:latest /bin/bash
```

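Once the container shell is open, a quick smoke check can confirm that the expected tools are on the `PATH`. The sketch below is only an illustration: the helper name `check_tools` is hypothetical, and it assumes the image ships `nvidia-smi`, `python`, and DeepSpeed's `ds_report` CLI, which this README does not state.

```shell
# Hypothetical helper: report whether each named command is on PATH.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
    fi
  done
}

# Assumed tool names; adjust to what the image actually contains.
check_tools nvidia-smi python ds_report
```

If all three are reported as `ok`, running `nvidia-smi` and `ds_report` by hand shows the visible GPUs and the DeepSpeed op compatibility report.
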
## [Query the GPU architecture and assign it to a variable](https://blog.csdn.net/zong596568821xp/article/details/106411024)

```shell
git clone https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps.git
cd deepstream_tlt_apps/TRT-OSS/x86
nvcc deviceQuery.cpp -o deviceQuery
./deviceQuery
```

Output on an H100 node:

```shell
(base) root@node19:~/bgpt/deepstream_tlt_apps/TRT-OSS/x86# ./deviceQuery
Detected 8 CUDA Capable device(s)

Device 0: "NVIDIA H100 80GB HBM3"
CUDA Driver Version / Runtime Version 12.4 / 10.1
CUDA Capability Major/Minor version number: 9.0

Device 1: "NVIDIA H100 80GB HBM3"
CUDA Driver Version / Runtime Version 12.4 / 10.1
CUDA Capability Major/Minor version number: 9.0

Device 2: "NVIDIA H100 80GB HBM3"
CUDA Driver Version / Runtime Version 12.4 / 10.1
CUDA Capability Major/Minor version number: 9.0

Device 3: "NVIDIA H100 80GB HBM3"
CUDA Driver Version / Runtime Version 12.4 / 10.1
CUDA Capability Major/Minor version number: 9.0

Device 4: "NVIDIA H100 80GB HBM3"
CUDA Driver Version / Runtime Version 12.4 / 10.1
CUDA Capability Major/Minor version number: 9.0

Device 5: "NVIDIA H100 80GB HBM3"
CUDA Driver Version / Runtime Version 12.4 / 10.1
CUDA Capability Major/Minor version number: 9.0

Device 6: "NVIDIA H100 80GB HBM3"
CUDA Driver Version / Runtime Version 12.4 / 10.1
CUDA Capability Major/Minor version number: 9.0

Device 7: "NVIDIA H100 80GB HBM3"
CUDA Driver Version / Runtime Version 12.4 / 10.1
CUDA Capability Major/Minor version number: 9.0
```
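
The heading above mentions assigning the architecture to a variable, but the variable itself is not named here. A minimal sketch, assuming the target is PyTorch's `TORCH_CUDA_ARCH_LIST` build variable (an assumption, not stated in this README) and using a hypothetical helper `parse_cuda_arch`, could extract the capability from the `deviceQuery` output like this:

```shell
# Hypothetical helper: read deviceQuery output on stdin and print the
# unique compute capabilities joined by ';'.
parse_cuda_arch() {
  grep 'CUDA Capability Major/Minor version number' \
    | awk '{print $NF}' \
    | sort -u \
    | paste -sd ';' -
}

# Assumption: TORCH_CUDA_ARCH_LIST is the variable to set, and
# ./deviceQuery was built as shown earlier.
# export TORCH_CUDA_ARCH_LIST="$(./deviceQuery | parse_cuda_arch)"
```

For the eight H100 devices listed above, the helper prints `9.0`. On a machine with mixed GPUs it would print each distinct capability once, separated by semicolons.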