Dense Interspecies Face Embedding
NeurIPS 2022



Dense Interspecies Face Embedding (DIFE) is a new direction for understanding the faces of various animals by extracting features that are common across animal faces, including human faces. There are three main obstacles to interspecies face understanding: (1) the lack of animal data compared to human data, (2) the ambiguous connection between the faces of various animals, and (3) extreme variance in shape and style.

Presentation (Korean)

Our Method


To cope with the lack of data, we use multi-teacher knowledge distillation from CSE and StyleGAN2, which requires no additional data or labels. We then synthesize pseudo-pair images by exploring the latent space of StyleGAN2 to find implicit associations between different animal faces.
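The multi-teacher distillation idea can be illustrated with a minimal sketch. It assumes a generic objective in which the student has one output per teacher and each output is regressed onto that teacher's frozen feature with a weighted mean squared error. The exact loss used by DIFE is not given on this page, so the function names, the weights, and the toy feature values below are all hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_teacher_distillation_loss(student_heads, teacher_feats, weights):
    """Weighted sum of per-teacher MSE terms (a generic formulation, assumed here).

    student_heads: teacher name -> student's prediction of that teacher's feature
    teacher_feats: teacher name -> frozen teacher feature for the same pixel
    weights:       teacher name -> scalar weight of that teacher's term
    """
    return sum(
        weights[name] * mse(student_heads[name], teacher_feats[name])
        for name in teacher_feats
    )


# Toy features for a single pixel; the values are made up for illustration.
teacher_feats = {"cse": [1.0, 0.0, 0.0], "stylegan2": [0.0, 2.0]}
student_heads = {"cse": [1.0, 0.0, 3.0], "stylegan2": [0.0, 0.0]}
weights = {"cse": 1.0, "stylegan2": 0.5}

loss = multi_teacher_distillation_loss(student_heads, teacher_feats, weights)
print(loss)  # 1.0 * 3.0 + 0.5 * 2.0 = 4.0
```

Because both teachers are pretrained and frozen, an objective of this kind needs only unlabeled images, which matches the claim that no additional data or labels are required.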

Interspecies Dense Keypoint Detection, Interspecies Face Parsing


Even though DIFE is trained without dense annotations of animal faces, it can find dense keypoints across species. In the interspecies face parsing results, the eyes, nose, mouth, and hairy regions are discovered, which indicates that DIFE captures proper semantic information about interspecies faces.

Examples of synthesized pseudo-pair data


The synthesized images look natural, and each one preserves the face geometry of its original image. We can therefore use the synthesized pseudo-pair data for semantic matching between faces of different species.
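Since a pseudo pair shares its face geometry, the same pixel coordinates in both images can be treated as corresponding points. The sketch below uses that assumption to score an embedder: for every pixel in image A it looks up the most similar pixel in image B and counts a hit when the coordinates agree. The function names and the toy embeddings are hypothetical, and this is not necessarily how DIFE uses the pairs during training.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pseudo_pair_matching_accuracy(emb_a, emb_b):
    """Fraction of pixels in A whose nearest neighbour in B has the same coordinates.

    emb_a, emb_b: dicts mapping a pixel (row, col) to its embedding vector.
    Both dicts are assumed to share the same set of pixel coordinates.
    """
    hits = 0
    for pixel, vec in emb_a.items():
        best = max(emb_b, key=lambda q: cosine(vec, emb_b[q]))
        hits += best == pixel
    return hits / len(emb_a)


# Toy embeddings for three pixels; the values are made up for illustration.
emb_a = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0], (1, 0): [1.0, 1.0]}
emb_b = {(0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8], (1, 0): [-1.0, 0.0]}

accuracy = pseudo_pair_matching_accuracy(emb_a, emb_b)
print(accuracy)  # two of the three pixels match their own location
```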

Interspecies Keypoint Transfer


To quantitatively evaluate our method against applicable prior approaches such as unsupervised keypoint detection, we perform interspecies facial keypoint transfer on MAFL and AP-10K.


This table shows the quantitative results of interspecies keypoint transfer on WFLW and AnimalWeb with 9 landmarks, including the corners of the eyes and mouth. Our method achieves the best performance among the compared methods on every domain pair.
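Keypoint transfer with a dense embedding can be sketched as a nearest-neighbour lookup: take the embedding at each annotated source keypoint and pick the target pixel whose embedding is most similar. The sketch below assumes dot-product similarity (equivalent to cosine similarity when embeddings are L2-normalized) and an exhaustive search over a small grid. The exact transfer and scoring procedure behind the table is not described on this page, and `transfer_keypoints` and the toy grids are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def transfer_keypoints(src_emb, tgt_emb, src_keypoints):
    """Map each source keypoint to the most similar target pixel.

    src_emb, tgt_emb: 2D grids (lists of rows) of per-pixel embedding vectors.
    src_keypoints:    keypoint name -> (row, col) in the source image.
    Returns:          keypoint name -> (row, col) in the target image.
    """
    transferred = {}
    for name, (r, c) in src_keypoints.items():
        query = src_emb[r][c]
        best, best_score = None, float("-inf")
        for tr, row in enumerate(tgt_emb):
            for tc, vec in enumerate(row):
                score = dot(query, vec)
                if score > best_score:
                    best, best_score = (tr, tc), score
        transferred[name] = best
    return transferred


# Toy 2x2 embedding grids; the values are made up for illustration.
src_emb = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
           [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]]
tgt_emb = [[[0.0, 0.9, 0.1], [0.1, 0.0, 0.9]],
           [[0.9, 0.1, 0.0], [0.5, 0.5, 0.5]]]
src_keypoints = {"left_eye": (0, 0), "nose": (1, 0)}

result = transfer_keypoints(src_emb, tgt_emb, src_keypoints)
print(result)  # {'left_eye': (1, 0), 'nose': (0, 1)}
```

The transferred coordinates can then be compared with the ground-truth landmarks of the target image to obtain an error for each domain pair.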

Comparison with SOTA methods on human keypoint detection


Even though our embedder is trained on synthesized datasets rather than the target dataset, DIFE achieves performance comparable to the early studies in each category, which suggests that DIFE is an apposite baseline for cross-domain face understanding.



The website template was borrowed from Mip-NeRF.