SemiHand: Semi-supervised Hand Pose Estimation with Consistency

Linlin Yang1,2   Shicheng Chen1   Angela Yao1
1National University of Singapore   2University of Bonn


Sources

[PDF] [Supplementary] [Poster] [Slides] [Quantitative Results] [Code]

Abstract

We present SemiHand, a semi-supervised framework for 3D hand pose estimation from monocular images. We pre-train the model on labelled synthetic data and fine-tune it on unlabelled real-world data by pseudo-labelling with consistency training. By design, we introduce data augmentation of differing difficulties, a consistency regularizer, label correction and sample selection for RGB-based 3D hand pose estimation. In particular, by approximating hand masks from hand poses, we propose a cross-modal consistency loss and leverage semantic predictions to guide the predicted poses. Meanwhile, we introduce pose registration as label correction to guarantee the biomechanical feasibility of hand bone lengths. Experiments show that our method yields favorable improvements on real-world datasets after fine-tuning.
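
As a rough illustration of this recipe, a single fine-tuning step on unlabelled real images could look like the sketch below. This is not the released code: the model interface, the augmentation and label-correction hooks, the loss forms and the weighting are placeholders chosen for illustration only.

import torch

def finetune_step(model, images, optimizer, augment_easy, augment_hard,
                  correct_pose, conf_threshold=0.5, lam_consistency=1.0):
    """One sketched fine-tuning step on a batch of unlabelled real images.

    augment_easy / augment_hard stand in for augmentations of differing
    difficulties, and correct_pose for a label-correction step (e.g. pose
    registration enforcing feasible bone lengths) that returns a corrected
    pose and a per-sample confidence. All are placeholder hooks.
    """
    with torch.no_grad():
        easy_pred = model(augment_easy(images))      # (B, J, 3) pose prediction
        pseudo, conf = correct_pose(easy_pred)       # corrected pseudo-label

    hard_pred = model(augment_hard(images))          # prediction to be supervised

    # Sample selection: only confident pseudo-labels contribute.
    # (Assumes photometric augmentations; geometric ones would require
    # mapping the pseudo-label into the augmented frame.)
    keep = (conf > conf_threshold).float()                       # (B,)
    per_sample = (hard_pred - pseudo).pow(2).mean(dim=(1, 2))    # (B,)
    loss_pseudo = (keep * per_sample).mean()

    # Consistency regularizer between the easy- and hard-view predictions.
    loss_cons = (hard_pred - easy_pred).pow(2).mean()

    loss = loss_pseudo + lam_consistency * loss_cons
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()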

Pipeline


Overview of SemiHand. The model is pre-trained on labelled synthetic data and fine-tuned on unlabelled real-world data with consistency training under perturbation augmentations (orange double-headed arrow), together with label correction and sample selection (blue dash-dotted arrow) and augmentations of differing difficulties.
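
The exact augmentations are not spelled out on this page; as an assumption, the "easy" branch could apply mild photometric perturbations and the "hard" branch stronger ones. The placeholders below match the augment_easy / augment_hard hooks used in the sketch above and are not the paper's choices.

import torch

def augment_easy(images, noise_std=0.01):
    # Weak perturbation: mild additive noise on images in [0, 1] (placeholder).
    return (images + noise_std * torch.randn_like(images)).clamp(0.0, 1.0)

def augment_hard(images, noise_std=0.05, jitter=0.2):
    # Stronger perturbation: heavier noise plus random per-image
    # brightness/contrast changes (placeholder).
    b = images.size(0)
    scale = 1.0 + jitter * (2.0 * torch.rand(b, 1, 1, 1, device=images.device) - 1.0)
    shift = jitter * (2.0 * torch.rand(b, 1, 1, 1, device=images.device) - 1.0)
    out = scale * images + shift + noise_std * torch.randn_like(images)
    return out.clamp(0.0, 1.0)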

Modules


Pseudo-labelling in SemiHand. The pseudo-label and its confidence are generated from the prediction on the original input (blue pose), the prediction on the perturbed input (green pose) and the corrected prediction (red pose).
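
A minimal sketch of how a confidence-weighted pseudo-label could be assembled from the three poses in the figure. The hand skeleton indexing, the agreement-based confidence and the simple averaging are illustrative assumptions, not the paper's exact definitions.

import torch

# Hypothetical parent index per joint for a 21-joint hand (wrist = 0);
# a real setup would use the dataset's own joint ordering.
PARENT = [0, 0, 1, 2, 3, 0, 5, 6, 7, 0, 9, 10, 11, 0, 13, 14, 15, 0, 17, 18, 19]

def correct_bone_lengths(pose, template_lengths):
    # Rescale every bone to a template length while keeping its direction,
    # a simple stand-in for pose-registration-based label correction.
    corrected = pose.clone()
    for j, p in enumerate(PARENT):
        if j == 0:
            continue
        bone = pose[:, j] - pose[:, p]
        norm = bone.norm(dim=-1, keepdim=True).clamp(min=1e-8)
        corrected[:, j] = corrected[:, p] + template_lengths[j] * bone / norm
    return corrected

def make_pseudo_label(pred_orig, pred_pert, template_lengths, sigma=0.02):
    # pred_orig: prediction on the original input (blue pose), (B, 21, 3)
    # pred_pert: prediction on the perturbed input (green pose), (B, 21, 3)
    corrected = correct_bone_lengths(pred_orig, template_lengths)      # red pose
    disagreement = (pred_orig - pred_pert).norm(dim=-1).mean(dim=-1)   # (B,)
    confidence = torch.exp(-disagreement / sigma)                      # agreement-based
    pseudo = 0.5 * (corrected + pred_pert)                             # placeholder fusion
    return pseudo, confidence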


Overview of the cross-modal consistency loss. (uv, d) denotes the 2.5D hand pose output; w denotes the hand mask.
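
To give the idea some shape, the sketch below approximates a soft hand mask from the 2D keypoints uv by rendering Gaussian blobs and compares it with the predicted mask w. The blob rendering and the L2 penalty are assumptions standing in for the paper's mask approximation from hand poses.

import torch

def pose_to_soft_mask(uv, height, width, sigma=8.0):
    # Crude soft hand mask from 2D keypoints uv (B, J, 2) in pixel coordinates:
    # the maximum over per-joint Gaussian blobs.
    ys = torch.arange(height, device=uv.device, dtype=uv.dtype).view(1, 1, height, 1)
    xs = torch.arange(width, device=uv.device, dtype=uv.dtype).view(1, 1, 1, width)
    u = uv[..., 0].view(uv.size(0), uv.size(1), 1, 1)
    v = uv[..., 1].view(uv.size(0), uv.size(1), 1, 1)
    dist2 = (xs - u) ** 2 + (ys - v) ** 2
    blobs = torch.exp(-dist2 / (2.0 * sigma ** 2))          # (B, J, H, W)
    return blobs.max(dim=1).values                           # (B, H, W)

def cross_modal_consistency(uv, pred_mask):
    # pred_mask: predicted hand mask w with shape (B, H, W), values in [0, 1].
    approx_mask = pose_to_soft_mask(uv, pred_mask.size(-2), pred_mask.size(-1))
    return (approx_mask - pred_mask).pow(2).mean()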


Overview of the view consistency loss for the 2.5D representation.
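
As an illustration, taking the view change to be an in-plane rotation of the crop: the uv coordinates should rotate accordingly while the (relative) depth d stays unchanged. The transform choice and the stop-gradient targets below are assumptions; the paper's exact formulation may differ.

import math
import torch

def rotate_uv(uv, angle, center):
    # Rotate 2D keypoints (B, J, 2) by `angle` radians around `center` = (x, y).
    cos, sin = math.cos(angle), math.sin(angle)
    rot = torch.tensor([[cos, -sin], [sin, cos]], device=uv.device, dtype=uv.dtype)
    ctr = torch.tensor(center, device=uv.device, dtype=uv.dtype)
    return (uv - ctr) @ rot.T + ctr

def view_consistency(uv_rot_pred, d_rot_pred, uv_pred, d_pred, angle, center):
    # Compare the prediction on the rotated image (uv_rot_pred, d_rot_pred)
    # with the rotated prediction on the original image: uv should be
    # equivariant to the rotation, d invariant to it.
    uv_target = rotate_uv(uv_pred, angle, center).detach()   # stop-gradient target
    loss_uv = (uv_rot_pred - uv_target).pow(2).mean()
    loss_d = (d_rot_pred - d_pred.detach()).pow(2).mean()
    return loss_uv + loss_d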

Results


Gradual convergence from the pre-trained model's prediction to our final prediction. The arrows indicate the direction and distance that predictions move during fine-tuning. By the 10th iteration the optimization has converged, as the arrow lengths become almost zero. Red boxes highlight the remaining differences between our stable predictions and the ground-truth poses.

AUC comparison to the state of the art on the STB and DO datasets.


Comparison of the baseline, consistency training only, pseudo-labelling only, and our full SemiHand.

BibTeX

@inproceedings{yang2021semihand,
  title={SemiHand: Semi-supervised Hand Pose Estimation with Consistency},
  author={Yang, Linlin and Chen, Shicheng and Yao, Angela},
  booktitle={ICCV},
  year={2021}
}