My research interest lies in computer vision and graphics, with recent focus on 3D vision and 3D scene generation. I previously worked on action localization and detection in videos.
We propose the camera-controllable human image animation task, which aims to generate video clips similar to real movie footage. To this end, we collect a dataset named HumanVid and build a baseline model combining Animate Anyone and CameraCtrl. Without any tricks, we show that this simple baseline trained on our dataset can generate movie-level video clips.
A new dual-level query-based TAD framework that precisely detects actions at both the instance level and the boundary level.
Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
Jing Tan, Yuhong Wang, Gangshan Wu, Limin Wang T-PAMI, 2023
arXiv /
code /
blog
We present Temporal Perceiver (TP), a general architecture based on Transformer decoders that serves as a unified solution for detecting arbitrary generic boundaries, including shot-level, event-level, and scene-level temporal boundaries.
PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points
Jing Tan, Xiaotong Zhao, Xintian Shi, Bin Kang, Limin Wang NeurIPS, 2022
arXiv /
code /
blog
PointTAD effectively tackles multi-label TAD by introducing a set of learnable query points to represent action keyframes.