Jing Tan

I'm a Ph.D. student in MMLab at The Chinese University of Hong Kong, supervised by Prof. Dahua Lin. I also work closely with Dr. Tong Wu. Previously, I obtained my B.Sc. degree from Nanjing University in 2020 and my Master's degree from Nanjing University in 2023, under the supervision of Prof. Limin Wang.

My research interests lie in computer vision and graphics, with a recent focus on 3D vision and 3D scene generation. Previously, I worked on action localization and detection in videos.

Email  /  CV  /  Scholar  /  Twitter  /  Github


Research

LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation
Shuai Yang*, Jing Tan*, Mengchen Zhang, Tong Wu, Yixuan Li, Gordon Wetzstein, Ziwei Liu, Dahua Lin
arXiv, 2024 (* equal contribution)
project page / video / arXiv

LayerPano3D generates full-view, explorable panoramic 3D scenes from a single text prompt.

HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
Zhenzhi Wang, Yixuan Li, Yanhong Zeng, Youqing Fang, Yuwei Guo, Wenran Liu, Jing Tan, Kai Chen, Tianfan Xue, Bo Dai, Dahua Lin
NeurIPS D&B Track, 2024
project page / arXiv / code

We propose the camera-controllable human image animation task for generating video clips that resemble real movie clips. To this end, we collect a dataset named HumanVid and build a baseline model that combines Animate Anyone and CameraCtrl. Without any tricks, we show that this simple baseline trained on our dataset can generate movie-level video clips.

Dual DETRs for Multi-Label Temporal Action Detection
Yuhan Zhu, Guozhen Zhang, Jing Tan, Gangshan Wu, Limin Wang
CVPR, 2024
arXiv / code

A new dual-level, query-based TAD framework that precisely detects actions at both the instance level and the boundary level.

Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
Jing Tan, Yuhong Wang, Gangshan Wu, Limin Wang
T-PAMI, 2023
arXiv / code / blog

We present Temporal Perceiver (TP), a general architecture built on Transformer decoders that serves as a unified solution for detecting arbitrary generic boundaries, including shot-level, event-level, and scene-level temporal boundaries.

PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points
Jing Tan, Xiaotong Zhao, Xintian Shi, Bin Kang, Limin Wang
NeurIPS, 2022
arXiv / code / blog

PointTAD effectively tackles multi-label TAD by introducing a set of learnable query points to represent the action keyframes.

Relaxed Transformer Decoders for Direct Action Proposal Generation
Jing Tan*, Jiaqi Tang*, Limin Wang, Gangshan Wu
ICCV, 2021 (* equal contribution)
pdf / code / blog

The first transformer-based framework for temporal action proposal generation.

Professional Services

• Conference reviewer for CVPR, ICCV, ECCV, NeurIPS, and ICML.
• Journal reviewer for IJCV.
• Teaching Assistant for IERG4998 and IERG4999 at CUHK.




Thanks to Jon Barron for sharing the source code of this website template.