3D human-object interaction synthesis results by InteractAnything. Our method generates diverse, detailed, and novel interactions for open-set 3D objects. Given a simple text description of the goal interaction and any object mesh as input, it synthesizes varied, natural HOI results without training on 3D assets. The orange and green boxes in (b) show detailed contact poses from different views.
Framework of InteractAnything. Given a text description and an open-set object mesh as input, our approach first queries an LLM to infer precise human-object relationships, which are used to initialize the object properties (Section 3.2). Next, we analyze contact affordances on the object geometry (Section 3.3). The human pose is then synthesized with a pre-trained 2D diffusion model, guided by an SSDS loss and a designed spatial constraint (Section 3.4). Finally, given the targeted object contact areas and a plausible human pose, we perform expressive HOI optimization to synthesize realistic, contact-accurate 3D human-object interactions (Section 3.5).
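The pose-synthesis stage above distills guidance from a pre-trained 2D diffusion model via a score-distillation-style loss. As a toy, hedged illustration (not the paper's implementation), the sketch below performs score-distillation-style updates on a 2D "pose parameter," using the score of a Gaussian as a stand-in for the diffusion prior; `MU`, `prior_score`, and `sds_step` are hypothetical names introduced here for illustration only:

```python
import numpy as np

# Toy stand-in for a diffusion prior: the score (grad of log-density) of an
# isotropic Gaussian centered at a "plausible pose" MU. In the real method,
# this role is played by a pre-trained 2D diffusion model's denoiser.
MU = np.array([1.0, -2.0])

def prior_score(x):
    # grad_x log N(x; MU, I) = MU - x
    return MU - x

def sds_step(theta, lr=0.1):
    # Score-distillation-style update: move the parameters along the
    # prior's score so they become more likely under the prior.
    return theta + lr * prior_score(theta)

theta = np.zeros(2)
for _ in range(200):
    theta = sds_step(theta)
print(theta)  # converges toward MU
```

The update is a gradient ascent on the prior's log-density, which is the essence of distilling a score-based model's guidance into optimizable parameters; the paper's full objective additionally incorporates the spatial constraint described above.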
Qualitative comparison of Magic3D, DreamFusion, DreamFusion*, DreamHOI, and Ours across eleven action-object pairs: hold baby, lift backpack, stand on basketball, lie on bed, ride bicycle, hold car, hold chair, print keyboard, hold knife, ride motorcycle, and hug robot.