I am currently a fourth-year Ph.D. student in the School of Automation at Southeast University, advised by Prof. Wankou Yang.

I have gained algorithm and research internship experience at ByteDance (2022), NIO (2022-2023), Baidu (2024-2026), and Ant Group (2026). During my internship at Baidu, I was fortunate to work under the guidance of Dr. Jingdong Wang.

I have published 10+ first-authored papers in top international AI conferences and journals, including TPAMI, TIP, ICML, ICCV, CVPR, and NeurIPS. My work has received 1,000+ citations on Google Scholar. My research interests include MLLMs, Agentic RL, Visual Grounding, Video/Image Referring Segmentation, and Agentic Search.

Please feel free to contact me at 869906992@qq.com for questions, discussions, or potential collaborations.

🔥 News

  • 2026.07: 🚀 Our technical report SimpleSearch-VL is released, exploring efficient, reliable, and practical multimodal agentic search. The project page and repository are available.
  • 2026.06: 🎉 MomentSeg has been accepted to ECCV 2026. The project page and code are publicly available.
  • 2026.03: 🎉 VideoSEG-O3 has been accepted to ICML 2026. The code is publicly available.
  • 2026.02: 🎉 DeRVOS has been accepted to CVPR 2026.
  • 2026.01: 🎉 DRL has been accepted to Pattern Recognition. The code is publicly available.
  • 2025.11: 🎉 GC3VG, an extension of C3VG, has been accepted to TCSVT 2025.
  • 2025.09: 🎉 InstanceVG has been accepted to TPAMI 2025. The code is publicly available.
  • 2025.07: 🎉 Two papers have been accepted to ICCV 2025: PropVG and DeRIS. The code for PropVG and DeRIS is publicly available.
  • 2024.12: 🎉 C3VG has been accepted to AAAI 2025 as an oral presentation. The code is publicly available.
  • 2024.09: 🎉 SimVG has been accepted to NeurIPS 2024. The code is publicly available.
  • 2023.12: 🎉 DenseUAV has been accepted to TIP 2023. The code is publicly available.
  • 2021.09: 🎉 FSRA has been accepted to TCSVT 2021. The code is publicly available.

📝 Publications

arXiv 2026 SimpleSearch-VL: A Simple Recipe for Multimodal Agentic Deep Search
Ming Dai, Zhihong Lu, Jinjie Gu, Jiedong Zhuang, Yefeng Liu, Wankou Yang, Jian Wang, Chunhua Shen

Research Direction 2: Referring/Reasoning Video Object Segmentation (RVOS)

ICML 2026 VideoSEG-O3: A Multi-turn Reinforcement Learning Framework for Reasoning Video Object Segmentation
Ming Dai, Sen Yang, Boqiang Duan, Boyuan Tong, Jiedong Zhuang, Wankou Yang, Jingdong Wang
CVPR 2026 DeRVOS: Decoupling Consistent Trajectory Generation and Multimodal Understanding for Referring Video Object Segmentation
Wenxuan Cheng*, Ming Dai*, Huimin Lu, Wankou Yang
ECCV 2026 MomentSeg: Moment-Centric Sampling for Enhanced Video Pixel Understanding
Ming Dai, Sen Yang, Boqiang Duan, Wankou Yang, Jingdong Wang

Research Direction 3: Visual Grounding (REC, RES, GREC, GRES)

TPAMI 2025 Improving Generalized Visual Grounding with Instance-aware Joint Learning
Ming Dai, Wenxuan Cheng, Jiang-jiang Liu, Lingfeng Yang, Zhenhua Feng, Wankou Yang, Jingdong Wang
ICCV 2025 PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination
Ming Dai, Wenxuan Cheng, Jiedong Zhuang, Jiang-jiang Liu, Hongshen Zhao, Zhenhua Feng, Wankou Yang
ICCV 2025 DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy
Ming Dai, Wenxuan Cheng, Jiang-jiang Liu, Sen Yang, Wenxiao Cai, Yanpeng Sun, Wankou Yang
TCSVT 2025 GC3VG: Generalized Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Ming Dai, Kai Chen, Wenxuan Cheng, Jiedong Zhuang, Zhenhua Feng, Pengfei Zhu, Wankou Yang
AAAI 2025 (Selected as Oral) Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Ming Dai, Jian Li, Jiedong Zhuang, Xian Zhang, Wankou Yang
NeurIPS 2024 SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
Ming Dai, Lingfeng Yang, Yihao Xu, Zhenhua Feng, Wankou Yang

Research Direction 4: Cross-View Geo-Localization

PR 2026 Drone Referring Localization: An Efficient Heterogeneous Spatial Feature Interaction Method For UAV Self-Localization
Ming Dai, Enhui Zheng, Jiahao Chen, Lei Qi, Zhenhua Feng, Wankou Yang
TIP 2023 Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments
Ming Dai, Enhui Zheng, Zhenhua Feng, Jiedong Zhuang, Wankou Yang
TCSVT 2021 A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization
Ming Dai, Jianhong Hu, Jiedong Zhuang, Enhui Zheng

🎖 Honors and Awards

Competition

  • 2023.12 National First Prize, 5th Global Campus AI Algorithm Elite Competition (Zero-Shot Referring Expression Understanding)
  • 2023.10 National First Prize (Champion), 4th “Space Cup” National Innovation and Creativity Competition (Multispectral Object Detection), Team Leader
  • 2022.08 National Second Prize (Runner-up), China Postgraduate Smart City Technology and Creative Design Competition (Object Detection), Team Leader
  • 2018.09 Zhejiang Provincial Robotics Competition: 2nd Prize (Shopping Track), 2nd Prize (Tourism Track), 3rd Prize (Transportation Track)
  • 2017.09 1st Prize (East China Division) and 2nd Prize (National Division), Siemens Cup China Intelligent Manufacturing Challenge, Team Leader

Scholarships and Honors

  • 2025 National Scholarship for Doctoral Students, Advanced Academic Individual, Southeast University
  • 2022 National Scholarship for Graduate Students
  • 2020 Outstanding Graduate of Zhejiang Province, Outstanding Undergraduate Graduate of China Jiliang University
  • 2018 Zhejiang Provincial Government Scholarship

📖 Education

  • 2023.09 – present Ph.D. Student, School of Automation, Southeast University, Nanjing, China.
  • 2020.09 – 2023.06 Master’s Student, China Jiliang University, Hangzhou, China.
  • 2016.09 – 2020.06 Undergraduate Student, China Jiliang University, Hangzhou, China.

💻 Internships

  • 2026.03 – present Ant Group, Agent Research, Hangzhou, China
  • 2024.12 – 2026.02 Baidu, LMMs Research, Shanghai, China
  • 2022.11 – 2023.05 NIO, Autonomous Driving – Algorithm, Beijing, China
  • 2022.03 – 2022.08 ByteDance, E-commerce – Algorithm, Hangzhou, China

💬 Services

Reviewer

  • TIP, TNNLS, TCSVT, ISPRS, PR
  • NeurIPS 2025, CVPR 2025, ICCV 2025, AAAI 2026, ICLR 2026, CVPR 2026, ICML 2026, ECCV 2026

Leadership

  • 2018–2019 President, 1st AI and Robotics Association, China Jiliang University