I am currently a third-year Ph.D. student in the School of Automation, Southeast University, under the supervision of Prof. Wankou Yang.

My research interests include multimodal large language models (MLLMs), visual grounding, video understanding, and reinforcement learning (RL).

I have published 10+ papers in top international AI conferences and journals such as TPAMI, TIP, TCSVT, PR, ICCV, NeurIPS, and AAAI.

🔥 News

  • 2025.11: 🎉 One first-author paper has been accepted to TCSVT 2025.
  • 2025.09: 🎉 One first-author paper has been accepted to TPAMI 2025.
  • 2025.07: 🎉 Two first-author papers have been accepted to ICCV 2025.
  • 2024.12: 🎉 One first-author paper has been accepted to AAAI 2025.
  • 2024.09: 🎉 One first-author paper has been accepted to NeurIPS 2024.
  • 2023.12: 🎉 One first-author paper has been accepted to TIP 2023.
  • 2021.09: 🎉 One first-author paper has been accepted to TCSVT 2021.

📝 Publications

Research Direction 1: Referring Video Object Segmentation (RefVOS)

arXiv 2025

arXiv 2025 MomentSeg: Moment-Centric Sampling for Enhanced Video Pixel Understanding
Ming Dai, Sen Yang, Boqiang Duan, Wankou Yang, Jingdong Wang

Paper | Code | Project

Highlights: MomentSeg is an MLLM-based method that unifies temporal grounding and segmentation, enabling key-frame extraction without relying on any external models. It also introduces a novel [FIND] token, which allows the model to perform temporal grounding without any additional timestamp encoding.

Research Direction 2: Visual Grounding (REC, RES, GREC, GRES)

TPAMI 2025

TPAMI 2025 Improving Generalized Visual Grounding with Instance-aware Joint Learning
Ming Dai, Wenxuan Cheng, Jiang-jiang Liu, Lingfeng Yang, Zhenhua Feng, Wankou Yang, Jingdong Wang

Paper | Code | 中文解读

Highlights: InstanceVG supports instance-level referring segmentation across general scenarios (no, single, or multiple targets). It also provides consistent predictions across points, boxes, and masks.

ICCV 2025

ICCV 2025 PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination
Ming Dai, Wenxuan Cheng, Jiedong Zhuang, Jiang-jiang Liu, Hongshen Zhao, Zhenhua Feng, Wankou Yang

Paper | Code | 中文解读

Highlights: PropVG achieves end-to-end two-stage visual grounding, overcoming the drawbacks of earlier two-stage approaches, which relied on external detectors and often suffered from slow inference and limited performance.

ICCV 2025

ICCV 2025 DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy
Ming Dai, Wenxuan Cheng, Jiang-jiang Liu, Sen Yang, Wenxiao Cai, Yanpeng Sun, Wankou Yang

Paper | Code

Highlights: DeRIS identifies cognition as a key bottleneck in visual grounding. It decouples the VG task into perception and cognition components and integrates them effectively through a loopback synergy mechanism.

AAAI 2025 (Selected as Oral)

AAAI 2025 Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Ming Dai, Jian Li, Jiedong Zhuang, Xian Zhang, Wankou Yang

Paper | Code

Highlights: C3VG investigates the problem of prediction consistency between REC and RIS, introducing a coarse-to-fine architecture that enforces consistency through both implicit and explicit constraints.

NeurIPS 2024

NeurIPS 2024 SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
Ming Dai, Lingfeng Yang, Yihao Xu, Zhenhua Feng, Wankou Yang

Paper | Code | 中文解读

Highlights: SimVG explores the importance of multi-modal understanding for the VG task, proposing a simple yet effective framework. It also adopts a synchronized distillation learning strategy between the teacher and student branches, enhancing the performance of the student branch.

Research Direction 3: Cross-View Geo-Localization

arXiv 2024

arXiv 2024 Drone Referring Localization: An Efficient Heterogeneous Spatial Feature Interaction Method For UAV Self-Localization
Ming Dai, Enhui Zheng, Jiahao Chen, Lei Qi, Zhenhua Feng, Wankou Yang

Paper | Code

Highlights: DRL adopts an end-to-end training and inference paradigm to address common issues in image-retrieval-based UAV self-localization, including complex preprocessing, inherent localization errors, and slow inference.

TIP 2023

TIP 2023 Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments
Ming Dai, Enhui Zheng, Zhenhua Feng, Jiedong Zhuang, Wankou Yang

Paper | Code | 中文解读

Highlights: DenseUAV introduces a real-world sampled dataset for vision-based UAV self-localization and provides a comprehensive benchmark for the task.

TCSVT 2021

TCSVT 2021 A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization
Ming Dai, Jianhong Hu, Jiedong Zhuang, Enhui Zheng

Paper | Code

Highlights: FSRA is the first successful application of Transformer models to cross-view geo-localization. It introduces an attention-map-based region partitioning and alignment strategy that alleviates performance degradation caused by viewpoint shifts.

🎖 Honors and Awards

Competition

  • 2023.12 National First Prize, 5th Global Campus AI Algorithm Elite Competition (Zero-Shot Referring Expression Understanding)
  • 2023.10 National First Prize (Champion), 4th “Space Cup” National Innovation and Creativity Competition (Multispectral Object Detection), Team Leader
  • 2022.08 National Second Prize (Runner-up), China Postgraduate Smart City Technology and Creative Design Competition (Object Detection), Team Leader
  • 2018.09 Zhejiang Provincial Robotics Competition: 2nd Prize (Shopping Track), 2nd Prize (Tourism Track), 3rd Prize (Transportation Track)
  • 2017.09 1st Prize (East China Division) and 2nd Prize (National Division), Siemens Cup China Intelligent Manufacturing Challenge, Team Leader

Scholarships and Honors

  • 2025 National Scholarship for Doctoral Students, Advanced Academic Individual, Southeast University
  • 2022 National Scholarship for Graduate Students
  • 2020 Outstanding Graduate of Zhejiang Province, Outstanding Undergraduate Graduate of China Jiliang University
  • 2018 Zhejiang Provincial Government Scholarship

📖 Education

  • 2023.09 – present Ph.D. Student, School of Automation, Southeast University, Nanjing, China.
  • 2020.09 – 2023.06 Master’s Student, China Jiliang University, Hangzhou, China.
  • 2016.09 – 2020.06 Undergraduate Student, China Jiliang University, Hangzhou, China.

💻 Internships

  • 2024.12 – present Baidu, LMMs Research, Shanghai, China
  • 2022.11 – 2023.05 NIO, Autonomous Driving – Algorithm, Beijing, China
  • 2022.03 – 2022.08 ByteDance, E-commerce – Algorithm, Hangzhou, China

💬 Services

Reviewer

  • TNNLS, TCSVT, ISPRS, PR
  • NeurIPS 2025, CVPR 2025, ICCV 2025, AAAI 2026, ICLR 2026, CVPR 2026

Leadership

  • 2018–2019 President, 1st AI and Robotics Association, China Jiliang University