# Base Jupyter Notebook Stack

## ds_report

```shell
[2024-07-17 02:25:56,956] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] using untested triton version (3.0.0), only 1.0.0 is known to be compatible

(deepspeed) root@ubuntu-finetune:~/binbbt/train/pretrain# cat .deepspeed_env
CUDA_HOME=/usr/local/cuda/
TORCH_USE_CUDA_DSA=1
CUTLASS_PATH=/opt/cutlass
TORCH_CUDA_ARCH_LIST="80;89;90;90a"
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/lib:/usr/local/mpi/lib:/usr/local/mpi/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
NCCL_DEBUG=WARN
NCCL_SOCKET_IFNAME=bond0
NCCL_IB_HCA=mlx5_0:1,mlx5_2:1,mlx5_4:1,mlx5_6:1
NCCL_IB_GID_INDEX=3
NCCL_NET_GDR_LEVEL=2
NCCL_P2P_DISABLE=0
NCCL_IB_DISABLE=0
```
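The DeepSpeed launcher picks up `KEY=VALUE` lines from `.deepspeed_env` for its worker processes, but an interactive shell does not see them. To reproduce the same settings by hand (for example, before running `ds_report` manually), a minimal sketch could look like this. The `load_env_file` helper and the throwaway demo file are hypothetical and not part of DeepSpeed; the sketch assumes one unquoted `KEY=VALUE` pair per line.

```shell
# Hypothetical helper: export every KEY=VALUE line of an env file into the current shell.
# Blank lines, comment lines, and lines without '=' are skipped.
load_env_file() {
    env_file="$1"
    [ -f "$env_file" ] || { echo "missing env file: $env_file" >&2; return 1; }
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) continue ;;
            *=*) ;;
            *) continue ;;
        esac
        key="${line%%=*}"      # text before the first '='
        value="${line#*=}"     # text after the first '='
        export "$key=$value"
    done < "$env_file"
}

# Demo with a throwaway file holding two of the variables listed above.
demo_env="$(mktemp)"
printf '%s\n' 'NCCL_DEBUG=WARN' 'NCCL_SOCKET_IFNAME=bond0' > "$demo_env"
load_env_file "$demo_env"
echo "NCCL_DEBUG=$NCCL_DEBUG NCCL_SOCKET_IFNAME=$NCCL_SOCKET_IFNAME"
rm -f "$demo_env"
```

Because the loop reads from a redirect rather than a pipe, the exports persist in the calling shell. Quoted values such as `TORCH_CUDA_ARCH_LIST="80;89;90;90a"` would keep their quotes under this sketch.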

## test command

Start a container with one of the following commands:

- `docker run -it --rm --network=host --privileged --ipc=host --ulimit memlock=-1 --gpus all hotwa/notebook:ngc`
- `docker run --rm -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/pytorch:24.06-py3 /bin/bash`
- `docker run --rm -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 hotwa/notebook:ngc /bin/bash`

Then run the checks inside the container:

```shell
nvidia-smi
nvcc -V
ninja --version
ds_report
python -c "import torch; print('torch:', torch.__version__, torch)"
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
python -c "import deepspeed; deepspeed.ops.op_builder.CPUAdamBuilder().load()"
python -c "from flash_attn import flash_attn_func, flash_attn_varlen_func"
python -c "import apex.amp; print('Apex is installed and the amp module is available.')"
python -c "from xformers import ops as xops"
ibstat
ofed_info -s # If the output shows an OFED version number, the OFED driver is installed.
mst version
mpirun --version
```
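Before running the checks above one by one, it can help to see which of the command-line tools are on `PATH` at all. A minimal sketch follows; `check_tools` is a hypothetical helper, and the tool list simply mirrors the commands used above.

```shell
# Hypothetical helper: report which of the given commands are available on PATH.
# Prints one "ok"/"missing" line per tool and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools taken from the checks above; adjust the list to match the image.
check_tools nvidia-smi nvcc ninja ds_report ibstat ofed_info mst mpirun \
    || echo "some tools are missing from this image"
```

This only confirms that the binaries exist. The Python import checks still need to be run separately, since a tool can be present while its libraries fail to load.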

> **Images hosted on Docker Hub are no longer updated. Please use the [quay.io image](https://quay.io/repository/jupyter/base-notebook).**

GitHub Actions in the <https://github.com/jupyter/docker-stacks> project builds and pushes this image to the Registry.

Please visit the project documentation site for help using and contributing to this image and others.

- [Jupyter Docker Stacks on ReadTheDocs](https://jupyter-docker-stacks.readthedocs.io/en/latest/index.html)
- [Selecting an Image :: Core Stacks :: jupyter/base-notebook](https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#jupyter-base-notebook)

# Dockerfile for building the docker-compose spawner image

For building the base image, refer to [jupyter/docker-stacks](https://github.com/jupyter/docker-stacks).

```shell
cp docker-stacks/images/base-notebook/* ./spawnerdockerfile/
cd spawnerdockerfile
docker buildx build -t hotwa/notebook:latest . -f Dockerfile.base-notebook --load
```

# Adding a virtual environment to JupyterHub

```shell
# Create a new virtual environment
micromamba create -n plot -c conda-forge scienceplots autopep8 python=3 ipykernel pandas numpy matplotlib scipy seaborn orange3 -y
micromamba run -n plot python -m pip install bamboolib
# Activate the environment to be added
micromamba activate plot
# Install ipykernel in the environment
micromamba install -c conda-forge ipykernel -y
# Register the new environment as a Jupyter kernel
micromamba run -n plot python -m ipykernel install --user --name="sciplot" --display-name="SCIPlot Environment"
```

# Fixing the micromamba "shell needs to be initialized" prompt

```shell
eval "$(micromamba shell hook --shell bash)"
```

Alternatively, use:

```shell
micromamba shell init --shell bash
source ~/.bashrc
```

This adds the activation script `etc/profile.d/micromamba.sh` from the micromamba (mamba, conda) installation directory to the bash initialization file `.bashrc`, so that new shells are initialized automatically.

# Docker image

The image that the JupyterLab spawner uses to launch notebooks is derived from: quay.io/jupyter/docker-stacks-foundation

The Dockerfile used to build that image is at: https://github.com/jupyter/docker-stacks/blob/main/images/docker-stacks-foundation/Dockerfile

You can change the `ROOT_CONTAINER` ARG of this Dockerfile to `nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04` (fetch it with `docker pull nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04`).

The build order for pytorch-notebook is:

- docker-stacks-foundation
- base-notebook
- minimal-notebook
- scipy-notebook
- pytorch-notebook

Step 1:

```shell
git clone https://github.com/jupyter/docker-stacks.git
cd docker-stacks/images/docker-stacks-foundation
docker buildx build --build-arg ROOT_CONTAINER=nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04 -t quay.io/hotwa/docker-stacks-foundation:latest . --load # docker pull nvidia/cuda:12.4.1-devel-ubuntu22.04
cd ../base-notebook
docker buildx build --build-arg OWNER=hotwa -t quay.io/hotwa/base-notebook:latest . --load
cd ../minimal-notebook/
docker buildx build --build-arg OWNER=hotwa -t quay.io/hotwa/minimal-notebook:latest . --load
cd ../scipy-notebook
docker buildx build --build-arg OWNER=hotwa -t quay.io/hotwa/scipy-notebook:latest . --load
cd ../pytorch-notebook
docker buildx build --build-arg OWNER=hotwa -t quay.io/hotwa/pytorch-notebook:latest . --load
```
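Since each image builds on the previous one, the same sequence can be scripted as a loop. The following is a minimal sketch, assuming it is run from the `docker-stacks/images` directory. `build_chain` and the `DRY_RUN` switch are hypothetical names introduced here; with `DRY_RUN=1` (the default in this sketch) the commands are only printed, not executed.

```shell
OWNER="${OWNER:-hotwa}"
ROOT_CONTAINER="${ROOT_CONTAINER:-nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 prints the commands, anything else runs them

# Hypothetical helper: build the image chain in dependency order.
build_chain() {
    for image in docker-stacks-foundation base-notebook minimal-notebook scipy-notebook pytorch-notebook; do
        # Only the first image takes ROOT_CONTAINER; the rest select their parent via OWNER.
        if [ "$image" = "docker-stacks-foundation" ]; then
            build_arg="ROOT_CONTAINER=$ROOT_CONTAINER"
        else
            build_arg="OWNER=$OWNER"
        fi
        set -- docker buildx build --build-arg "$build_arg" \
            -t "quay.io/$OWNER/$image:latest" --load "$image"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$*"
        else
            "$@" || return 1   # stop at the first failed build
        fi
    done
}

build_chain
```

Run it once with the default `DRY_RUN=1` to review the printed commands, then set `DRY_RUN=0` to perform the builds. Stopping at the first failure avoids building later images on top of a stale parent.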

# Then build your own base image

```shell
docker buildx build --build-arg OWNER=hotwa -t quay.io/hotwa/notebook:latest -f Dockerfile.base-notebook . --load
# Export and save

```