I am currently a third-year Ph.D. student in the School of Automation, Southeast University, under the supervision of Prof. Wankou Yang.

My research interests include multimodal large language models (MLLMs), agentic reinforcement learning (RL), visual grounding, and video understanding.

I have published more than 10 papers in top-tier international AI journals and conferences, including TPAMI, TIP, TCSVT, PR, ICCV, CVPR, NeurIPS, and AAAI.

🔥 News

2026.02: 🎉 One first-author paper has been accepted to PR 2026.
2025.11: 🎉 One first-author paper has been accepted to TCSVT 2025.
2025.09: 🎉 One first-author paper has been accepted to TPAMI 2025.
2025.07: 🎉 Two first-author papers have been accepted to ICCV 2025.
2024.12: 🎉 One first-author paper has been accepted to AAAI 2025.
2024.09: 🎉 One first-author paper has been accepted to NeurIPS 2024.
2023.12: 🎉 One first-author paper has been accepted to TIP 2023.
2021.09: 🎉 One first-author paper has been accepted to TCSVT 2021.

📝 Publications

Research Direction 1: Referring Video Object Segmentation (RefVOS)

arXiv 2025

MomentSeg: Moment-Centric Sampling for Enhanced Video Pixel Understanding
Ming Dai, Sen Yang, Boqiang Duan, Wankou Yang, Jingdong Wang

Paper | Code | Project

Highlights: MomentSeg is an MLLM-based method that unifies temporal grounding and segmentation, enabling key-frame extraction without relying on any external models. It also introduces a novel [FIND] token that lets the model perform temporal grounding without any additional timestamp encoding.

Research Direction 2: Visual Grounding (REC, RES, GREC, GRES)

TPAMI 2025

Improving Generalized Visual Grounding with Instance-aware Joint Learning
Ming Dai, Wenxuan Cheng, Jiang-jiang Liu, Lingfeng Yang, Zhenhua Feng, Wankou Yang, Jingdong Wang

Paper | Code | Chinese Write-up

Highlights: InstanceVG supports instance-level referring segmentation in generalized scenarios with no, single, or multiple targets, and produces consistent predictions across points, boxes, and masks.

ICCV 2025

PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination
Ming Dai, Wenxuan Cheng, Jiedong Zhuang, Jiang-jiang Liu, Hongshen Zhao, Zhenhua Feng, Wankou Yang

Paper | Code | Chinese Write-up

Highlights: PropVG achieves end-to-end two-stage visual grounding, overcoming the drawbacks of previous two-stage approaches, which relied on external detectors and typically suffered from slow inference and limited performance.

ICCV 2025

DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy
Ming Dai, Wenxuan Cheng, Jiang-jiang Liu, Sen Yang, Wenxiao Cai, Yanpeng Sun, Wankou Yang

Paper | Code

Highlights: DeRIS analyzes a key bottleneck in visual grounding: cognition. It decouples the VG task into perception and cognition components and integrates them effectively through a loopback synergy mechanism.

TCSVT 2025

GC3VG: Generalized Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Ming Dai, Kai Chen, Wenxuan Cheng, Jiedong Zhuang, Zhenhua Feng, Pengfei Zhu, Wankou Yang

Paper

Highlights: GC3VG generalizes the C3VG architecture and incorporates UCRM, which implicitly captures region and instance features and explicitly aligns them via an IoU-based relational constraint. The GHA strategy further enforces feature-space consistency and strengthens the discriminability of multi-modal representations.

AAAI 2025 (Selected as Oral)

Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Ming Dai, Jian Li, Jiedong Zhuang, Xian Zhang, Wankou Yang

Paper | Code

Highlights: C3VG investigates the problem of prediction consistency between REC and RIS, introducing a coarse-to-fine architecture that enforces consistency through both implicit and explicit constraints.

NeurIPS 2024

SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
Ming Dai, Lingfeng Yang, Yihao Xu, Zhenhua Feng, Wankou Yang

Paper | Code | Chinese Write-up

Highlights: SimVG examines the importance of multi-modal understanding for the VG task and proposes a simple yet effective framework with decoupled multi-modal fusion. It also adopts a synchronized distillation strategy between the teacher and student branches, which improves the performance of the student branch.

Research Direction 3: Cross-View Geo-Localization

PR 2026

Drone Referring Localization: An Efficient Heterogeneous Spatial Feature Interaction Method For UAV Self-Localization
Ming Dai, Enhui Zheng, Jiahao Chen, Lei Qi, Zhenhua Feng, Wankou Yang

Paper | Code

Highlights: DRL adopts an end-to-end training and inference paradigm to address common issues in image-retrieval-based UAV self-localization, including complex preprocessing, inherent localization errors, and slow inference.

TIP 2023

Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments
Ming Dai, Enhui Zheng, Zhenhua Feng, Jiedong Zhuang, Wankou Yang

Paper | Code | Chinese Write-up

Highlights: DenseUAV introduces a dataset sampled from real-world scenes for vision-based UAV self-localization and provides a comprehensive benchmark for the task.

TCSVT 2021

A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization
Ming Dai, Jianhong Hu, Jiedong Zhuang, Enhui Zheng

Paper | Code

Highlights: FSRA is the first successful application of Transformer models to cross-view geo-localization. It introduces an attention-map-based region partitioning and alignment strategy that alleviates performance degradation caused by viewpoint shifts.

🎖 Honors and Awards

Competition

2023.12 National First Prize, 5th Global Campus AI Algorithm Elite Competition (Zero-Shot Referring Expression Understanding)
2023.10 National First Prize (Champion), 4th “Space Cup” National Innovation and Creativity Competition (Multispectral Object Detection), Team Leader
2022.08 National Second Prize (Runner-up), China Postgraduate Smart City Technology and Creative Design Competition (Object Detection), Team Leader
2018.09 Zhejiang Provincial Robotics Competition: 2nd Prize (Shopping Track), 2nd Prize (Tourism Track), 3rd Prize (Transportation Track)
2017.09 1st Prize (East China Division) and 2nd Prize (National Division), Siemens Cup China Intelligent Manufacturing Challenge, Team Leader

Scholarships and Honors

2025 National Scholarship for Doctoral Students, Advanced Academic Individual, Southeast University
2022 National Scholarship for Graduate Students
2020 Outstanding Graduate of Zhejiang Province, Outstanding Undergraduate Graduate of China Jiliang University
2018 Zhejiang Provincial Government Scholarship

📖 Education

2023.09 – present Ph.D. Student, School of Automation, Southeast University, Nanjing, China.
2020.09 – 2023.06 Master’s Student, China Jiliang University, Hangzhou, China.
2016.09 – 2020.06 Undergraduate Student, China Jiliang University, Hangzhou, China.

💻 Internships

2026.03 – present Ant Group, Agent Research, Hangzhou, China
2024.12 – 2026.02 Baidu, LMM Research, Shanghai, China
2022.11 – 2023.05 NIO, Autonomous Driving – Algorithm, Beijing, China
2022.03 – 2022.08 ByteDance, E-commerce – Algorithm, Hangzhou, China

💬 Services

Reviewers

TIP, TNNLS, TCSVT, ISPRS, PR
NeurIPS 2025, CVPR 2025, ICCV 2025, AAAI 2026, ICLR 2026, CVPR 2026, ICML 2026, ECCV 2026

Leadership

2018–2019 President, 1st AI and Robotics Association, China Jiliang University

Ming Dai (戴铭)
