My research interests lie in computer vision and graphics, with a recent focus on 3D vision and 3D scene generation. I previously worked on action localization and detection in videos.
Imagine360 lifts a standard perspective video into a 360-degree video with rich, structured motion, letting viewers experience the dynamic scene across the full 360 degrees.
LayerPano3D generates a full-view, explorable panoramic 3D scene from a single text prompt.
Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
Jing Tan, Yuhong Wang, Gangshan Wu, Limin Wang
T-PAMI, 2023
arXiv / code / blog
We present Temporal Perceiver (TP), a general architecture based on Transformer decoders as a unified solution to detect arbitrary generic boundaries, including shot-level, event-level and scene-level temporal boundaries.
PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points
Jing Tan, Xiaotong Zhao, Xintian Shi, Bin Kang, Limin Wang
NeurIPS, 2022
arXiv / code / blog
PointTAD effectively tackles multi-label TAD by introducing a set of learnable query points to represent the action keyframes.
The first transformer-based framework for temporal action proposal generation.
Professional Services
• Regular conference reviewer for CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR.
• Journal reviewer for IJCV.
• Teaching assistant for IERG4998 and IERG4999 at CUHK.