docs/installation.md
# Installation and Running Your First Prediction

You will need a machine running Linux; AlphaFold 3 does not support other
operating systems. Full installation requires up to 1 TB of disk space to store
the genetic databases (SSD storage is recommended) and an NVIDIA GPU with
Compute Capability 8.0 or greater (GPUs with more memory can predict larger
protein structures). We have verified that inputs with up to 5,120 tokens can
fit on a single NVIDIA A100 80 GB, or a single NVIDIA H100 80 GB. We have
verified numerical accuracy on both NVIDIA A100 and H100 GPUs.

The genetic search stage can consume a lot of RAM, especially for long targets;
we recommend running with at least 64 GB of RAM.

We provide installation instructions for a machine with an NVIDIA A100 80 GB GPU
and a clean Ubuntu 22.04 LTS installation, and expect that these instructions
should aid others with different setups.

The instructions provided below describe how to:

1. Provision a machine on GCP.
1. Install Docker.
1. Install NVIDIA drivers for an A100.
1. Obtain genetic databases.
1. Obtain model parameters.
1. Build the AlphaFold 3 Docker container or Singularity image.
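
Before starting, it can help to compare a machine against the RAM and disk figures quoted at the top of this page. The following is a minimal sketch, not part of AlphaFold 3: the function names are our own, it assumes a Linux host with `/proc/meminfo` and a POSIX `df`, and it reports whole GiB.

```sh
# Hypothetical pre-flight helpers; names and thresholds are illustrative.

# at_least ACTUAL REQUIRED: succeed when ACTUAL >= REQUIRED (whole numbers).
at_least() {
    [ "$1" -ge "$2" ]
}

# total_ram_gib: total physical memory in GiB, read from /proc/meminfo.
total_ram_gib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# free_disk_gib DIR: free space in GiB on the filesystem that holds DIR.
free_disk_gib() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# report LABEL ACTUAL REQUIRED: print one OK/WARN line for a requirement.
report() {
    if at_least "$2" "$3"; then
        echo "OK   $1: $2 GiB (want >= $3 GiB)"
    else
        echo "WARN $1: $2 GiB (want >= $3 GiB)"
    fi
}
```

For example, `report RAM "$(total_ram_gib)" 64` checks memory and `report disk "$(free_disk_gib "$HOME")" 1000` checks free space under your home directory. The kernel reports slightly less memory than is physically installed, so treat a near miss as a prompt to look closer rather than a hard failure.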

## Provisioning a Machine

Clean Ubuntu images are available on Google Cloud, AWS, Azure, and other major
platforms.

We first provisioned a new machine in Google Cloud Platform using the command
below. We were using a Google Cloud project that was already set up.

* We recommend using `--machine-type a2-ultragpu-1g` but feel free to use
  `--machine-type a2-highgpu-1g` for smaller predictions.
* If desired, replace `--zone us-central1-a` with a zone that has quota for
  the machine you have selected. See
  [gpu-regions-zones](https://cloud.google.com/compute/docs/gpus/gpu-regions-zones).

```sh
gcloud compute instances create alphafold3 \
    --machine-type a2-ultragpu-1g \
    --zone us-central1-a \
    --image-family ubuntu-2204-lts \
    --image-project ubuntu-os-cloud \
    --maintenance-policy TERMINATE \
    --boot-disk-size 1000 \
    --boot-disk-type pd-balanced
```

This provisions a bare Ubuntu 22.04 LTS image on an
[A2 Ultra](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a2-vms)
machine with 12 CPUs, 170 GB RAM, a 1 TB disk and an NVIDIA A100 80 GB GPU
attached. We verified the following installation steps from this point.
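
If you expect to vary the machine type or zone, a small helper along these lines can print the provisioning command for review before you run it. This is only a sketch: the function name `create_cmd` and its argument order are our own invention, and it prints the command rather than executing it.

```sh
# create_cmd INSTANCE MACHINE_TYPE ZONE: print (do not run) the gcloud command.
# The fixed flags are copied from the command shown on this page.
create_cmd() {
    printf 'gcloud compute instances create %s' "$1"
    printf ' --machine-type %s --zone %s' "$2" "$3"
    printf ' --image-family ubuntu-2204-lts --image-project ubuntu-os-cloud'
    printf ' --maintenance-policy TERMINATE'
    printf ' --boot-disk-size 1000 --boot-disk-type pd-balanced\n'
}

# Example: the smaller machine type in the default zone.
create_cmd alphafold3 a2-highgpu-1g us-central1-a
```

Once the instance exists, `gcloud compute ssh alphafold3 --zone us-central1-a` opens a shell on it.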

## Installing Docker

These instructions are for rootless Docker.

### Installing Docker on Host

Note that these instructions apply only to Ubuntu 22.04 LTS images (see above).

Add Docker's official GPG key. Official Docker instructions are
[here](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository).
The commands we ran are:

```sh
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
```

Add the repository to apt sources, install Docker, and check that it runs:

```sh
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo docker run hello-world
```

### Enabling Rootless Docker

Official Docker instructions are
[here](https://docs.docker.com/engine/security/rootless/#distribution-specific-hint).
The commands we ran are shown below. The `1001` in the `DOCKER_HOST` value is
the numeric user ID on our machine; replace it with the output of `id -u` if
yours differs.

```sh
sudo apt-get install -y uidmap systemd-container

sudo machinectl shell $(whoami)@ /bin/bash -c 'dockerd-rootless-setuptool.sh install && sudo loginctl enable-linger $(whoami) && DOCKER_HOST=unix:///run/user/1001/docker.sock docker context use rootless'
```
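
The rootless Docker socket lives under the per-user runtime directory, so the `DOCKER_HOST` value depends on the UID of the account that runs Docker. A minimal sketch of deriving it, assuming the runtime directory is `/run/user/<UID>` as in the command above (the function name is hypothetical):

```sh
# rootless_docker_host UID: print the DOCKER_HOST value for that user's
# rootless daemon, assuming the socket is /run/user/<UID>/docker.sock.
rootless_docker_host() {
    printf 'unix:///run/user/%s/docker.sock\n' "$1"
}

# Print the value for the current user.
rootless_docker_host "$(id -u)"
```

You could then run `export DOCKER_HOST="$(rootless_docker_host "$(id -u)")"` in the shell that invokes `docker`.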

## Installing GPU Support

### Installing NVIDIA Drivers

Official Ubuntu instructions are
[here](https://documentation.ubuntu.com/server/how-to/graphics/install-nvidia-drivers/).
The commands we ran are:

```sh
sudo apt-get -y install alsa-utils ubuntu-drivers-common
sudo ubuntu-drivers install

sudo nvidia-smi --gpu-reset

nvidia-smi  # Check that the drivers are installed.
```

Accept the "Pending kernel upgrade" dialog if it appears.

You will need to reboot the instance with `sudo reboot now` to reset the GPU if
you see the following warning:

```text
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.
Make sure that the latest NVIDIA driver is installed and running.
```

Proceed only once `nvidia-smi` lists your GPU and reports no errors.
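
With the driver working, you can also confirm that the GPU meets the Compute Capability 8.0 requirement stated at the top of this page. The sketch below is ours, not part of AlphaFold 3: the function names are hypothetical, and it assumes your `nvidia-smi` supports the `compute_cap` query field (older drivers may not).

```sh
# cap_at_least CAP MIN: succeed when compute capability CAP (e.g. 8.6)
# is greater than or equal to MIN, comparing major then minor numbers.
cap_at_least() {
    awk -v cap="$1" -v min="$2" 'BEGIN {
        split(cap, c, "."); split(min, m, ".")
        exit !((c[1] + 0 > m[1] + 0) || (c[1] + 0 == m[1] + 0 && c[2] + 0 >= m[2] + 0))
    }'
}

# check_gpus: read one capability per line on stdin and flag GPUs below 8.0.
# Returns non-zero if any GPU is below the threshold.
check_gpus() {
    status=0
    while read -r cap; do
        if cap_at_least "$cap" 8.0; then
            echo "OK: compute capability $cap"
        else
            echo "Too old: compute capability $cap"
            status=1
        fi
    done
    return "$status"
}
```

A typical invocation would be `nvidia-smi --query-gpu=compute_cap --format=csv,noheader | check_gpus`.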

### Installing NVIDIA Support for Docker

Official NVIDIA instructions are
[here](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
The commands we ran are:

```sh
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
systemctl --user restart docker
sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
```
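
If the GPU is not visible from containers later on, one quick thing to inspect is whether the rootless `daemon.json` was actually updated. The sketch below assumes that `nvidia-ctk runtime configure` registers a runtime whose path is `nvidia-container-runtime`; it is a plain text match rather than a JSON parse, and the function name is our own.

```sh
# has_nvidia_runtime FILE: succeed when FILE is readable and mentions the
# NVIDIA container runtime (assumed to appear as "nvidia-container-runtime").
has_nvidia_runtime() {
    [ -r "$1" ] && grep -q 'nvidia-container-runtime' "$1"
}

config="$HOME/.config/docker/daemon.json"
if has_nvidia_runtime "$config"; then
    echo "NVIDIA runtime is registered in $config"
else
    echo "NVIDIA runtime not found in $config"
fi
```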

Check that your container can see the GPU:

```sh
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi
```

The output should look similar to this:

```text
Mon Nov 11 12:00:00 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-80GB          Off |   00000000:00:05.0 Off |                    0 |
| N/A   34C    P0             51W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```

## Obtaining AlphaFold 3 Source Code

You will need to have `git` installed to download the AlphaFold 3 repository:

```sh
git clone https://github.com/google-deepmind/alphafold3.git
```

## Obtaining Genetic Databases

This step requires `curl` and `zstd` to be installed on your machine.

AlphaFold 3 needs multiple genetic (sequence) protein and RNA databases to run:

* [BFD small](https://bfd.mmseqs.com/)
* [MGnify](https://www.ebi.ac.uk/metagenomics/)
* [PDB](https://www.rcsb.org/) (structures in the mmCIF format)
* [PDB seqres](https://www.rcsb.org/)
* [UniProt](https://www.uniprot.org/uniprot/)
* [UniRef90](https://www.uniprot.org/help/uniref)
* [NT](https://www.ncbi.nlm.nih.gov/nucleotide/)
* [RFam](https://rfam.org/)
* [RNACentral](https://rnacentral.org/)

We provide a Python program `fetch_databases.py` that can be used to download
and set up all of these databases. This process takes around 45 minutes when not
installing on local SSD. We recommend running the following in a `screen` or
`tmux` session as downloading and decompressing the databases takes some time.

```sh
cd alphafold3  # Navigate to the directory with the cloned AlphaFold 3 repository.
python3 fetch_databases.py --download_destination=<DATABASES_DIR>
```
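
As the notes in this section explain, the download destination should sit outside the repository directory. A minimal sketch of checking that before a long download is shown here; the function name `is_inside` is hypothetical, and both arguments must be directories that already exist.

```sh
# is_inside CHILD PARENT: succeed when CHILD is PARENT itself or lies
# beneath it. `pwd -P` resolves symlinks so that aliases are compared fairly.
# Returns 2 if either directory cannot be entered.
is_inside() {
    child="$(cd "$1" && pwd -P)" || return 2
    parent="$(cd "$2" && pwd -P)" || return 2
    case "$child/" in
        "$parent"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (both paths are placeholders for your own directories):
#   is_inside "$HOME/public_databases" "$HOME/alphafold3" \
#       && echo "Choose a destination outside the repository."
```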

This script downloads the databases from a mirror hosted on GCS; all versions
are the same as those used in the AlphaFold 3 paper.

:ledger: **Note: The download directory `<DATABASES_DIR>` should *not* be a
subdirectory in the AlphaFold 3 repository directory.** If it is, the Docker
build will be slow as the large databases will be copied during the image
creation.

:ledger: **Note: The total download size for the full databases is around 252 GB
and the total size when decompressed is 630 GB. Please make sure you have
sufficient hard drive space, bandwidth, and time to download. We recommend using
an SSD for better genetic search performance, and faster runtime of
`fetch_databases.py`.**

:ledger: **Note: If the download directory and datasets don't have full read and
write permissions, it can cause errors with the MSA tools, with opaque
(external) error messages. Please ensure the required permissions are applied,
e.g. with the `sudo chmod 755 --recursive <DATABASES_DIR>` command.**

Once the script has finished, you should have the following directory structure:

```sh
pdb_2022_09_28_mmcif_files.tar  # ~200k PDB mmCIF files in this tar.
bfd-first_non_consensus_sequences.fasta
mgy_clusters_2022_05.fa
nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta
pdb_seqres_2022_09_28.fasta
rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta
rnacentral_active_seq_id_90_cov_80_linclust.fasta
uniprot_all_2021_04.fa
uniref90_2022_05.fa
```
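
To confirm that the download completed, you can compare the destination against the file list above. The helper below is a sketch of ours (the name `missing_databases` is hypothetical); it checks only that each file exists, not that its contents are intact.

```sh
# missing_databases DIR: print each expected database file absent from DIR.
# The file names are taken from the directory listing on this page.
missing_databases() {
    for f in \
        pdb_2022_09_28_mmcif_files.tar \
        bfd-first_non_consensus_sequences.fasta \
        mgy_clusters_2022_05.fa \
        nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta \
        pdb_seqres_2022_09_28.fasta \
        rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta \
        rnacentral_active_seq_id_90_cov_80_linclust.fasta \
        uniprot_all_2021_04.fa \
        uniref90_2022_05.fa
    do
        [ -e "$1/$f" ] || echo "$f"
    done
}
```

Running `missing_databases <DATABASES_DIR>` prints nothing when every file is present, and otherwise lists the missing names one per line.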

## Obtaining Model Parameters

To request access to the AlphaFold 3 model parameters, please complete
[this form](https://forms.gle/svvpY4u2jsHEwWYS6). Access will be granted at
Google DeepMind’s sole discretion. We will aim to respond to requests within 2–3
business days. You may only use AlphaFold 3 model parameters if received
directly from Google. Use is subject to these
[terms of use](https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md).

## Building the Docker Container That Will Run AlphaFold 3

Next, build the Docker container from the root of the cloned repository. This
builds a container with all the required Python dependencies:

```sh
docker build -t alphafold3 -f docker/Dockerfile .
```

You can now run AlphaFold 3!

```sh
docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <MODEL_PARAMETERS_DIR>:/root/models \
    --volume <DATABASES_DIR>:/root/public_databases \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --model_dir=/root/models \
    --output_dir=/root/af_output
```

:ledger: **Note: In the example above the databases have been placed on the
persistent disk, which is slow.** If you want better genetic and template search
performance, make sure all databases are placed on a local SSD.

If you get an error like the following, make sure the model parameters and
databases exist at the host paths given in the `--volume` flags above, and that
your user can access them.

```text
docker: Error response from daemon: error while creating mount source path '/srv/alphafold3_data/models': mkdir /srv/alphafold3_data/models: permission denied.
```
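
Because Docker tries to create a missing mount source itself, which is what triggers the error above, it can be easier to check the host directories before calling `docker run`. A minimal sketch follows; the function name `check_mounts` is hypothetical.

```sh
# check_mounts DIR...: report every host directory that is missing or that
# the current user cannot read and traverse. Returns non-zero on any failure.
check_mounts() {
    rc=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            rc=1
        elif [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
            echo "not readable: $dir"
            rc=1
        fi
    done
    return "$rc"
}
```

For the command above you would pass the four host paths, for example `check_mounts "$HOME/af_input" "$HOME/af_output" <MODEL_PARAMETERS_DIR> <DATABASES_DIR>`, and create or fix any directory it reports.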

## Running Using Singularity Instead of Docker

You may prefer to run AlphaFold 3 within Singularity. You'll still need to
*build* the Singularity image from the Docker container. Afterwards, you will
not have to depend on Docker (at structure prediction time).

### Install Singularity

Official Singularity instructions are
[here](https://docs.sylabs.io/guides/3.3/user-guide/installation.html). The
commands we ran are:

```sh
wget https://github.com/sylabs/singularity/releases/download/v4.2.1/singularity-ce_4.2.1-jammy_amd64.deb
sudo dpkg --install singularity-ce_4.2.1-jammy_amd64.deb
sudo apt-get install -f
```

### Build the Singularity Container From the Docker Image

After building the *Docker* container above with `docker build -t`, start a
local Docker registry and upload your image `alphafold3` to it. Singularity's
instructions are [here](https://github.com/apptainer/singularity/issues/1537).
The commands we ran are:

```sh
docker run -d -p 5000:5000 --restart=always --name registry registry:2
docker tag alphafold3 localhost:5000/alphafold3
docker push localhost:5000/alphafold3
```

Then build the Singularity container:

```sh
SINGULARITY_NOHTTPS=1 singularity build alphafold3.simg docker://localhost:5000/alphafold3:latest
```

You can confirm your build by starting a shell and inspecting the environment.
For example, you may want to ensure the Singularity image can access your GPU.
You may want to restart your computer if you have issues with this.

```sh
singularity exec --nv alphafold3.simg sh -c 'nvidia-smi'
```

You can now run AlphaFold 3!

```sh
singularity exec --nv alphafold3.simg <args>
```

For example (note that all `singularity exec` options, including `--bind`, must
come before the image name; everything after the image is the command to run):

```sh
singularity exec \
    --nv \
    --bind $HOME/af_input:/root/af_input \
    --bind $HOME/af_output:/root/af_output \
    --bind <MODEL_PARAMETERS_DIR>:/root/models \
    --bind <DATABASES_DIR>:/root/public_databases \
    alphafold3.simg \
    python alphafold3/run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --model_dir=/root/models \
    --db_dir=/root/public_databases \
    --output_dir=/root/af_output
```