Reproducing AD-NeRF: synthesizing talking-head video from audio and face images with a natural speaking style
flyfish
Synthesizing facial video driven by high-fidelity audio is an important and challenging problem in applications such as digital humans, chatbots, and virtual video conferencing.
Talking-head generation is treated as a cross-modal mapping from audio to a visual face. The synthesized face is expected to show a natural speaking style and to produce photo-realistic results synchronized with the original video.
Environment:
Ubuntu 18.04
NVIDIA Driver Version: 440.33.01
CUDA Version: 10.2 (cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64)
cuDNN: libcudnn8_8.0.3.33-1+cuda10.2_amd64
Python: 3.7
PyTorch: torch-1.8.2+cu102-cp37-cp37m-linux_x86_64
PyTorch3D: pytorch3d-0.6.0-py37_cu102_pyt181
The code processes the data and trains the models correctly in the environment above.
The procedure in this article differs slightly from the official instructions.
Paper: https://arxiv.org/abs/2103.11078
Code: https://github.com/YudongGuo/AD-NeRF
AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis
Download link for the AD-NeRF environment, models, and code used in this article:
Link: https://pan.baidu.com/s/1tvZOwFM8XAnJONPXNvkGYg Extraction code: 1l53
The implementation procedure is as follows.
First, download the provided AD-NeRF environment packages: CUDA, cuDNN, PyTorch, and PyTorch3D, all in mutually matching versions. Installing from local files in this way avoids most version problems.
The contents include:
cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
libcudnn8_8.0.3.33-1+cuda10.2_amd64.deb
libcudnn8-dev_8.0.3.33-1+cuda10.2_amd64.deb
pytorch3d-0.6.0-py37_cu102_pyt181.tar.bz2
torch-1.8.2+cu102-cp37-cp37m-linux_x86_64.whl
Steps
1. Install CUDA and cuDNN
Install the CUDA and cuDNN packages downloaded above on a machine running NVIDIA driver version 440; other driver versions have not been tested. A typical install sequence is sketched below.
If you download another version, see here.
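A minimal install sketch using the packages listed above. The repository directory under /var is the one NVIDIA's local installer creates for this exact release; if your package version differs, adjust the path accordingly:

sudo dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub  # register NVIDIA's repo key
sudo apt-get update
sudo apt-get -y install cuda  # installs the CUDA 10.2 toolkit (bundled driver is 440.33.01)
sudo dpkg -i libcudnn8_8.0.3.33-1+cuda10.2_amd64.deb libcudnn8-dev_8.0.3.33-1+cuda10.2_amd64.deb  # cuDNN runtime + dev packages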
2. Create an Anaconda virtual environment called adnerf
conda env create -f environment.yml
conda activate adnerf
3. Install PyTorch for GPU locally
pip install torch-1.8.2+cu102-cp37-cp37m-linux_x86_64.whl
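An optional quick check that the wheel installed correctly and can see the GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"  # expect: 1.8.2+cu102 True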
4. Install PyTorch3D locally
conda install --use-local pytorch3d-0.6.0-py37_cu102_pyt181.tar.bz2
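Likewise, an optional import check for PyTorch3D:

python -c "import pytorch3d; print(pytorch3d.__version__)"  # expect: 0.6.0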
5. Place the third-party face model
Put "01_MorphableModel.mat" in data_util/face_tracking/3DMM/
Then run:
cd data_util/face_tracking
python convert_BFM.py
The converted model is already provided in the network-disk download above.
6. Build the dataset
bash process_data.sh Obama
This generates the dataset from dataset/vids/Obama.mp4. The network disk above already contains the generated dataset; a quick sanity check is sketched below.
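To confirm processing finished, list the output directory. HeadNeRF_config.txt and TorsoNeRF_config.txt are the files used by the training commands below; the other names are what my run produced and are listed only as a rough guide:

ls dataset/Obama
# should include HeadNeRF_config.txt and TorsoNeRF_config.txt (used below);
# my run also produced head_imgs/, com_imgs/, aud.npy, and transforms_*.json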
The rest of the training follows the commands provided in the official repository.
Training is divided into two parts:
(1) Train HeadNeRF
If you run into the same out-of-memory problem I did, skip ahead to the fix under "Training process" below.
python NeRFs/HeadNeRF/run_nerf.py --config dataset/Obama/HeadNeRF_config.txt
(2) Train TorsoNeRF
In AD-NeRF/dataset/Obama/logs/Obama_head, find the latest checkpoint, e.g. 030000_head.tar, and rename it to head.tar.
Put head.tar into AD-NeRF/dataset/Obama/logs/Obama_com, for example as sketched below.
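One way to script that copy. The directory names follow the layout described above; 030000_head.tar is just an example, so substitute the latest checkpoint from your own run:

cd AD-NeRF
mkdir -p dataset/Obama/logs/Obama_com  # create the combined-model log directory if it does not exist yet
cp dataset/Obama/logs/Obama_head/030000_head.tar dataset/Obama/logs/Obama_com/head.tar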
Then run:
python NeRFs/TorsoNeRF/run_nerf.py --config dataset/Obama/TorsoNeRF_config.txt
Training process
The original code does not fit in 12 GB of GPU memory, so the number of coarse samples per ray is reduced from 64 to 32 in both training scripts:

AD-NeRF/NeRFs/HeadNeRF/run_nerf.py
AD-NeRF/NeRFs/TorsoNeRF/run_nerf.py

parser.add_argument("--N_samples", type=int, default=32, ...
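If AD-NeRF keeps the stock NeRF configargparse setup, where command-line flags override config-file values (an assumption I have not verified against every revision of the code), the same reduction can also be applied per run without editing the source:

python NeRFs/HeadNeRF/run_nerf.py --config dataset/Obama/HeadNeRF_config.txt --N_samples 32
python NeRFs/TorsoNeRF/run_nerf.py --config dataset/Obama/TorsoNeRF_config.txt --N_samples 32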