1 Beijing University of Posts and Telecommunications   2 Tsinghua University   3 Beijing Normal University
We present Ins-HOI, a novel method for instance-level human/hand and object interaction recovery. Ins-HOI is an implicit-based approach that achieves accurate reconstruction of individual geometries and non-visible contact areas.
Recovering detailed interactions between humans/hands and objects is an appealing yet challenging task. Existing methods typically use template-based representations to track humans/hands and objects in interaction. Despite this progress, they fail to recover the invisible contact surfaces. In this paper, we propose Ins-HOI, an end-to-end solution that recovers human/hand and object geometry via instance-level implicit reconstruction. To this end, we introduce an instance-level occupancy field to support simultaneous human/hand and object representation, and a complementary training strategy to handle the lack of instance-level ground truth. Such a representation enables learning a contact prior implicitly from sparse observations. During the complementary training, we augment the real-captured data with synthesized data by randomly composing individual scans of humans/hands and objects and intentionally allowing penetration. In this way, our network learns to recover individual shapes as completely as possible from the synthesized data, while being aware of the contact constraints and overall plausibility based on real-captured scans. As demonstrated in experiments, Ins-HOI can produce reasonable and realistic non-visible contact surfaces even in cases of extremely close interaction. To facilitate research on this task, we collect a large-scale, high-fidelity 3D scan dataset, including 5.2k high-quality scans with real-world human-chair and hand-object interactions. We will release our dataset and source code.
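To make the instance-level occupancy field concrete, below is a minimal PyTorch sketch under our own assumptions: the class name, layer sizes, and the pixel-aligned feature input are illustrative placeholders, not the released implementation. The idea it shows is that a single query network predicts one occupancy value per instance (human/hand and object) for every 3D point.

```python
import torch
import torch.nn as nn


class InstanceOccupancyField(nn.Module):
    """Toy sketch of an instance-level occupancy field.

    Given a 3D query point and an (assumed) pixel-aligned feature vector
    pooled from the sparse input views, it predicts one occupancy value
    per instance: channel 0 = human/hand, channel 1 = object.
    Layer sizes are illustrative only.
    """

    def __init__(self, feat_dim=256, hidden=256, num_instances=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, num_instances),  # one occupancy logit per instance
        )

    def forward(self, points, feats):
        # points: (B, N, 3) query locations; feats: (B, N, feat_dim)
        x = torch.cat([points, feats], dim=-1)
        return torch.sigmoid(self.mlp(x))      # (B, N, num_instances) in [0, 1]


if __name__ == "__main__":
    net = InstanceOccupancyField()
    pts = torch.rand(1, 1024, 3)                # random query points
    feats = torch.rand(1, 1024, 256)            # placeholder image features
    occ = net(pts, feats)
    human_occ, object_occ = occ[..., 0], occ[..., 1]
    print(human_occ.shape, object_occ.shape)
```

Separate surfaces for the human/hand and the object can then be extracted by running marching cubes on each occupancy channel independently.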
Overview of the method: (a) showcases the synthetic data augmentation process, which composes the THuman and Ins-Sit datasets to form the training set. (b) highlights how the training components provide complementary guidance (blue and pink denote the human and chair meshes; purple and red indicate their union and intersection). (c) depicts our benchmark Ins-HOI, which, given sparse-view inputs, produces instance-level human-object recovery in an end-to-end manner.
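As a rough illustration of the complementary guidance in (b), here is a hypothetical PyTorch loss sketch: per-instance supervision from the synthesized compositions, union supervision from real-captured scans (where only the fused surface is observable), and an intersection penalty that discourages penetration at contact. The weights and the soft union/intersection operators are assumptions for illustration, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F


def complementary_losses(pred_occ, gt_instance_occ=None, gt_union_occ=None,
                         w_instance=1.0, w_union=1.0, w_inter=0.1):
    """Hypothetical loss terms illustrating the complementary training idea.

    pred_occ:        (B, N, 2) predicted occupancies (human/hand, object).
    gt_instance_occ: (B, N, 2) per-instance labels, available only for the
                     synthesized compositions (individual scans are known).
    gt_union_occ:    (B, N)    union labels from real-captured scans.
    """
    losses = {}

    # Synthesized data: supervise each instance so shapes are completed,
    # even where they become invisible after composition.
    if gt_instance_occ is not None:
        losses["instance"] = w_instance * F.binary_cross_entropy(
            pred_occ, gt_instance_occ)

    # Real data: only the union of human and object is observed, so
    # supervise a soft union of the two predicted occupancies.
    if gt_union_occ is not None:
        union = torch.clamp(pred_occ.sum(dim=-1), max=1.0)  # soft union
        losses["union"] = w_union * F.binary_cross_entropy(union, gt_union_occ)

    # Penetration penalty: discourage both instances from occupying the
    # same point, keeping the recovered contact surfaces plausible.
    intersection = pred_occ[..., 0] * pred_occ[..., 1]
    losses["intersection"] = w_inter * intersection.mean()

    return losses
```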
The Ins-Sit dataset comprises 4,700 scans of 72 subjects (12 females and 60 males). Each subject sat on 2 of the 11 distinct chair types and performed 60 unique poses.
The Ins-Grasp dataset comprises 500 scans covering 50 object types, each with 10 distinct interactions.
Results gallery (showing 1 of the 6 input views): Ins-HOI results on THuman-Sit data, decomposed into the human part and the chair part.
Results gallery: Ins-HOI results on THuman-Grasp data, decomposed into the hand part and the object part.
@article{zhang2023inshoi,
title={Ins-HOI: Instance Aware Human-Object Interactions Recovery},
author={Zhang, Jiajun and Zhang, Yuxiang and Zhang, Hongwen and Zhou, Boyao and Shao, Ruizhi and Hu, Zonghai and Liu, Yebin},
  journal={arXiv preprint arXiv:2312.09641},
year={2023}
}