I'm interested in embodied AI (embodied question answering, multi-modal navigation, task planning, etc.), 3D computer vision (3D reconstruction, 3D generative models, etc.), SLAM (object SLAM, radiance-field-based SLAM, etc.), world models, and 3D vision foundation models.
We propose ObsGraph, an observation-centric hierarchical scene graph that unifies scene representation, retrieval, and exploration, organizing visual evidence into room–view–object layers to enable coarse-to-fine hierarchical retrieval and targeted multi-scale exploration.
We present HERE, an active 3D scene reconstruction framework based on neural radiance fields. It generates camera trajectories driven by epistemic uncertainty, estimated through evidential deep learning, enabling efficient data acquisition and high-fidelity implicit mapping for precise scene reconstruction.
We introduce category-level neural fields that learn meaningful 3D information shared among objects of the same category in the scene.
We propose new evaluation methods tailored to gauge the performance of an object remover. We confirm that the proposed methods make judgments consistent with those of human evaluators on the COCO dataset, and that their measurements align with those obtained using object-removal ground truth on a self-acquired dataset.