Dense Interspecies Face Embedding NeurIPS 2022
- Sejong Yang Yonsei University
- Subin Jeon Yonsei University
- Seonghyeon Nam York University
- Seon Joo Kim Yonsei University
Abstract
Dense Interspecies Face Embedding (DIFE) is a new direction for understanding the faces of various animals by extracting features common to animal faces, including human faces. There are three main obstacles to interspecies face understanding: (1) the lack of animal data compared to human data, (2) the ambiguous connection between the faces of different animals, and (3) extreme variance in shape and style.
Presentation (Korean)
Our Method
To cope with the lack of data, we use multi-teacher knowledge distillation from CSE and StyleGAN2, which requires no additional data or labels. We then synthesize pseudo-pair images by exploring the latent space of StyleGAN2 to find implicit associations between different animal faces.
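As a rough illustration of the multi-teacher distillation idea, the sketch below combines one regression term per teacher into a single student loss. It is not the loss used in DIFE: the mean squared error, the per-teacher weights, the feature shapes, and the names `multi_teacher_loss`, `cse`, and `stylegan2` are all assumptions made for this example, and random arrays stand in for real teacher features.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


def multi_teacher_loss(student_outputs, teacher_targets, weights):
    """Weighted sum of per-teacher regression losses (assumed form).

    student_outputs: dict mapping teacher name -> student prediction
    teacher_targets: dict mapping teacher name -> frozen teacher feature
    weights:         dict mapping teacher name -> scalar loss weight
    """
    total = 0.0
    for name, target in teacher_targets.items():
        total += weights[name] * mse(student_outputs[name], target)
    return total


# Toy stand-ins for teacher features (hypothetical shapes: channels x H x W).
rng = np.random.default_rng(0)
targets = {
    "cse": rng.normal(size=(16, 8, 8)),
    "stylegan2": rng.normal(size=(32, 8, 8)),
}
# A student that is off by a constant 0.1 on every feature value.
outputs = {name: feat + 0.1 for name, feat in targets.items()}
weights = {"cse": 1.0, "stylegan2": 0.5}

loss = multi_teacher_loss(outputs, targets, weights)
print(f"distillation loss: {loss:.4f}")
```

Because the teachers are only queried for targets, a setup of this kind needs no labels beyond what the teachers already encode, which matches the motivation stated above.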
Interspecies Dense Keypoint Detection, Interspecies Face Parsing
Even though DIFE is trained without dense annotations of animal faces, it can find interspecies dense keypoints. In the interspecies face parsing results, the eyes, nose, mouth, and hairy regions are discovered, which indicates that DIFE captures proper semantic information for interspecies faces.
The examples of synthesized pseudo-pair data
The synthesized images look natural, and their face geometry is the same as that of the corresponding original images. We can therefore use the synthesized pseudo-pair data for semantic matching between faces of different species.
Interspecies Keypoint Transfer
To quantitatively compare our method with applicable previous approaches such as unsupervised keypoint detection, we perform interspecies facial keypoint transfer on MAFL and AP-10K.
This table shows the quantitative results of interspecies keypoint transfer on WFLW and AnimalWeb with 9 landmarks, including the corners of the eyes and mouth. Our method outperforms previous methods on every domain pair.
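Keypoint transfer with a dense embedding is commonly done by nearest-neighbor lookup: the embedding at each source keypoint is compared against every pixel embedding of the target image, and the best match is taken as the transferred keypoint. The sketch below shows that generic procedure under stated assumptions. The cosine similarity, the `(H, W, D)` embedding layout, and the function name `transfer_keypoints` are choices made for this example, and the random arrays stand in for real DIFE embeddings.

```python
import numpy as np


def transfer_keypoints(src_emb, tgt_emb, src_kps):
    """Transfer keypoints by nearest neighbor in embedding space.

    src_emb, tgt_emb: (H, W, D) dense per-pixel embeddings
    src_kps:          (K, 2) integer (row, col) keypoints in the source
    returns:          (K, 2) integer (row, col) keypoints in the target
    """
    h, w, d = tgt_emb.shape
    # L2-normalize so the dot product below is cosine similarity.
    tgt_flat = tgt_emb.reshape(-1, d)
    tgt_flat = tgt_flat / (np.linalg.norm(tgt_flat, axis=1, keepdims=True) + 1e-8)
    queries = src_emb[src_kps[:, 0], src_kps[:, 1]]
    queries = queries / (np.linalg.norm(queries, axis=1, keepdims=True) + 1e-8)
    similarity = queries @ tgt_flat.T          # (K, H*W)
    best = similarity.argmax(axis=1)           # flat index of best match
    rows, cols = np.unravel_index(best, (h, w))
    return np.stack([rows, cols], axis=1)


# Toy check: the "source" is the target shifted by (2, 3) pixels,
# so every keypoint should map back by the same offset.
rng = np.random.default_rng(0)
tgt_emb = rng.normal(size=(16, 16, 16))
src_emb = np.roll(tgt_emb, shift=(2, 3), axis=(0, 1))
src_kps = np.array([[5, 7], [10, 4]])

transferred = transfer_keypoints(src_emb, tgt_emb, src_kps)
print(transferred)
```

In an evaluation setting, the transferred locations would then be compared with annotated landmarks in the target image. The error metric used for the table is not stated here, so none is assumed in this sketch.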
The comparison with SOTA methods on human keypoint detection
Even though our embedder is trained on synthesized datasets rather than the target dataset, DIFE shows performance comparable to the results of early studies in each category, which suggests that DIFE is a suitable baseline for cross-domain face understanding.
Citation
Acknowledgements
The website template was borrowed from Mip-NeRF.