Installing Docker and docker-compose
Ubuntu:
Reference article
sudo apt install apt-transport-https ca-certificates curl software-properties-common gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update -y
sudo apt-get install docker-ce docker-ce-cli docker-compose -y
CentOS:
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum clean all
sudo yum makecache
sudo yum install docker-ce docker-ce-cli docker-compose -y
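After either install path, it can help to confirm that the expected commands are actually on PATH before going further. The following is a minimal sketch; the helper name missing_cmds is hypothetical and not part of Docker.

```shell
# Print each named command that is NOT found on PATH, one per line.
# Prints nothing when every command is available.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}
```

Running missing_cmds docker docker-compose should print nothing when both tools are installed; any name it prints still needs attention.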
Troubleshooting
- On Ubuntu, the steps above may fail at the lsb_release command with ModuleNotFoundError: No module named 'lsb_release'. In that case, copy the system's complete lsb_release.py into the directory of the failing script:
sudo cp /usr/lib/python3/dist-packages/lsb_release.py /usr/bin/
- If docker-compose does not work after installation, download the latest version from the GitHub releases page and place it in /usr/bin.
- If docker-compose up -d fails with error getting credentials - err: exit status GDBus.Error:org.freedesktop.DBus.Error.ServiceU, install these packages: sudo apt install gnupg2 pass
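The manual docker-compose download could look roughly like the sketch below. It assumes that release assets are named docker-compose-<os>-<arch> (for example docker-compose-linux-x86_64) and that GitHub's releases/latest/download path resolves to the newest release; check the release page before relying on either. The function names are hypothetical.

```shell
# Build the download URL for the standalone docker-compose binary.
# ASSUMPTION: assets are named docker-compose-<os>-<arch>, with a lowercase OS name.
compose_asset_url() {
    os=$(printf '%s' "${1:-$(uname -s)}" | tr '[:upper:]' '[:lower:]')
    arch=${2:-$(uname -m)}
    printf 'https://github.com/docker/compose/releases/latest/download/docker-compose-%s-%s\n' "$os" "$arch"
}

# Download to a temporary file first, then install it into /usr/bin.
install_compose() {
    tmp=$(mktemp) || return 1
    curl -fSL "$(compose_asset_url)" -o "$tmp" || { rm -f "$tmp"; return 1; }
    sudo install -m 0755 "$tmp" /usr/bin/docker-compose
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Downloading to a temporary file first means a failed or partial download never overwrites an existing /usr/bin/docker-compose.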
Adjusting user permissions
sudo groupadd docker
sudo gpasswd -a your_username docker
sudo service docker restart
Then reconnect your SSH session or restart the terminal so the new group membership takes effect.
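To check whether the current session has picked up the new group, something like the following sketch may be useful. The helper name session_has_group is hypothetical; it relies on id -nG, which, when called without a user name, lists the groups of the running shell rather than the groups recorded in the group database.

```shell
# Succeed if the CURRENT session's group list contains the given group.
# A session opened before the gpasswd call will fail this check until you log in again.
session_has_group() {
    id -nG | tr ' ' '\n' | grep -qx -- "$1"
}
```

For example, session_has_group docker && echo "docker group active" prints the message only after you have logged in again.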
Installing the NVIDIA Container Toolkit
See the official guide.
Ubuntu:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list \
&& \
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
CentOS:
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo yum install -y nvidia-container-toolkit
After installation, register the NVIDIA runtime with Docker:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Finally, verify the setup:
docker run --gpus all nvidia/cuda:11.7.1-devel-ubuntu20.04 nvidia-smi
If nvidia-smi prints its usual output, the setup works.
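If the verification fails, one thing worth checking is whether the runtime configuration step actually wrote an nvidia entry into the Docker daemon config. The sketch below assumes the config lives at /etc/docker/daemon.json (the usual default); the helper name is hypothetical, and it does a plain text match rather than a real JSON parse.

```shell
# Succeed if the Docker daemon config mentions an "nvidia" runtime entry.
# ASSUMPTION: the config path defaults to /etc/docker/daemon.json.
daemon_has_nvidia_runtime() {
    config=${1:-/etc/docker/daemon.json}
    [ -r "$config" ] && grep -q '"nvidia"' "$config"
}
```

If daemon_has_nvidia_runtime fails, rerunning the nvidia-ctk step and restarting Docker is a reasonable first thing to try.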
Writing the Dockerfile
...
!!! Note !!!
Prefer using the pytorch image directly ... there is no need to configure conda; if you hit an error, try restarting first ...