👨🏻‍💻 About

I am Xuchen Li (李旭宸), a first-year Ph.D. student at the Institute of Automation, Chinese Academy of Sciences (CASIA), supervised by Prof. Kaiqi Huang and co-supervised by Dr. Shiyu Hu. I am also a member of the Visual Intelligence Interest Group (VIIG).

Before that, I received my B.E. degree in Computer Science and Technology, with an overall ranking of 1/449 (0.22%), from the School of Computer Science (SCS) at Beijing University of Posts and Telecommunications (BUPT) in Jun. 2024.

I am grateful to work with Dr. Shiyu Hu, who has had a significant impact on me. I am also grateful to be growing up and studying with my twin brother Xuzhao Li, which is a truly unique and special experience for me.

My research focuses on Visual Language Tracking, Multi-modal Learning, Data-centric AI and Large Language Models. If you are interested in my work or would like to collaborate, please feel free to contact me.

🔥 News

  • 2024.09: 📝 Two papers (MemVLT and CPDTrack) have been accepted by the 38th Conference on Neural Information Processing Systems (NeurIPS, CCF-A Conference)!
  • 2024.08: 📣 Start my Ph.D. life at University of Chinese Academy of Sciences (UCAS), which is located in Huairou District, Beijing, near the beautiful Yanqi Lake.
  • 2024.06: 👨‍💻 Work as a research intern at Ant Group (ANT), studying Multi-modal Large Language Model Agents.
  • 2024.06: 👨‍🎓 Obtain my B.E. degree from Beijing University of Posts and Telecommunications (BUPT). I will always remember the wonderful 4 years I spent here. Thanks to all!
  • 2024.05: 🏆 Obtain Beijing Outstanding Graduates (北京市优秀毕业生) (Top 5%, only 38 students at SCS, BUPT received this honor)!
  • 2024.05: 📣 Present our work during the 14th Vision and Learning Seminar (VALSE), see our poster for more information!
  • 2024.04: 📝 One paper (DTLLM-VLT) has been accepted as an Oral Presentation and received the Best Paper Honorable Mention Award at the 3rd CVPR Workshop on Vision Datasets Understanding (CVPRW, CCF-A Conference Workshop)!
  • 2023.12: 🏆 Obtain College Scholarship of University of Chinese Academy of Sciences (中国科学院大学大学生奖学金) (only 17 students at CASIA won this scholarship)!
  • 2023.12: 🏆 Obtain China National Scholarship (国家奖学金) with a rank of 1/455 (0.22%) (Top 1%, the highest honor for undergraduates in China)!
  • 2023.11: 🏆 Obtain Beijing Merit Student (北京市三好学生) (Top 1%, only 36 students at BUPT received this honor)!
  • 2023.09: 📝 One paper (MGIT) has been accepted by the 37th Conference on Neural Information Processing Systems (NeurIPS, CCF-A Conference)!
  • 2022.12: 🏆 Obtain China National Scholarship (国家奖学金) with a rank of 2/430 (0.47%) (Top 1%, the highest honor for undergraduates in China)!

📖 Education

2024.08 - Now, Ph.D. student
Pattern Recognition and Intelligent System
Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing

2020.09 - 2024.06, B.E. degree
Computer Science and Technology, Overall Ranking 1/449 (0.22%)
School of Computer Science
Beijing University of Posts and Telecommunications (BUPT), Beijing

💻 Experiences

📝 Publications

✅ Acceptance

CVPRW 2024

DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM
Xuchen Li, Xiaokun Feng, Shiyu Hu, Meiqi Wu, Dailing Zhang, Jing Zhang, Kaiqi Huang
CVPRW 2024 (CCF-A Conference Workshop): the 3rd CVPR Workshop on Vision Datasets Understanding
Oral Presentation, Best Paper Honorable Mention Award
[Paper] [PDF] [Code] [Website] [Award] [Poster] [Slides] [BibTeX]
📌 Visual Language Tracking 📌 LLM 📌 Evaluation Technique

NeurIPS 2024

MemVLT: Visual-Language Tracking with Adaptive Memory-based Prompts
Xiaokun Feng, Xuchen Li, Shiyu Hu, Dailing Zhang, Meiqi Wu, Jing Zhang, Xiaotang Chen, Kaiqi Huang
NeurIPS 2024 (CCF-A Conference): the 38th Conference on Neural Information Processing Systems
📌 Visual Language Tracking 📌 Human-like Modeling 📌 Adaptive Prompts

NeurIPS 2024

Beyond Accuracy: Tracking more like Human through Visual Search
Dailing Zhang, Shiyu Hu, Xiaokun Feng, Xuchen Li, Meiqi Wu, Jing Zhang, Kaiqi Huang
NeurIPS 2024 (CCF-A Conference): the 38th Conference on Neural Information Processing Systems
📌 Visual Object Tracking 📌 Visual Search Mechanism 📌 Visual Turing Test

NeurIPS 2023

A Multi-modal Global Instance Tracking Benchmark (MGIT): Better Locating Target in Complex Spatio-temporal and Causal Relationship
Shiyu Hu, Dailing Zhang, Meiqi Wu, Xiaokun Feng, Xuchen Li, Xin Zhao, Kaiqi Huang
NeurIPS 2023 (CCF-A Conference): the 37th Conference on Neural Information Processing Systems
[Paper] [PDF] [Code] [Website] [Poster] [Slides] [BibTeX]
📌 Visual Language Tracking 📌 Video Understanding 📌 Hierarchical Annotation

☑️ Ongoing

CAAI-A

DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM
Xuchen Li, Shiyu Hu, Xiaokun Feng, Dailing Zhang, Meiqi Wu, Jing Zhang, Kaiqi Huang
Submitted to a CAAI-A conference, Under Review
[Preprint] [PDF] [Website] [BibTeX]
📌 Visual Language Tracking 📌 LLM 📌 Benchmark Construction

Preprint

Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark
Xuchen Li, Shiyu Hu, Xiaokun Feng, Dailing Zhang, Meiqi Wu, Jing Zhang, Kaiqi Huang
arXiv Preprint
[Preprint] [PDF] [Website] [BibTeX]
📌 Visual Language Tracking 📌 Multi-modal Interaction 📌 Evaluation Technology

CAAI-A

Can LVLMs Describe Videos like Humans? A Five-in-One Video Annotations Benchmark for Better Human-Machine Comparison
Shiyu Hu*, Xuchen Li*, Xuzhao Li, Jing Zhang, Yipei Wang, Xin Zhao, Kang Hao Cheong (*Equal Contributions)
Submitted to a CAAI-A conference, Under Review
📌 LVLM 📌 Evaluation Technology 📌 Human-Machine Comparison

CCF-A

Sat-LLM: Multi-View Retrieval-Augmented Satellite Commonsense Multi-Modal Iterative Alignment LLM
Qian Li*, Xuchen Li*, Zongyu Chang, Yuzheng Zhang, Cheng Ji, Shangguang Wang (*Equal Contributions)
Submitted to a CCF-A conference, Under Review
📌 LLM 📌 Satellite Commonsense 📌 Retrieval Augmented Generation

CCF-A

ATCTrack: Leveraging Aligned Target-Context Cues for Robust Vision-Language Tracking
Xiaokun Feng, Shiyu Hu, Xuchen Li, Dailing Zhang, Meiqi Wu, Jing Zhang, Xiaotang Chen, Kaiqi Huang
Submitted to a CCF-A conference, Under Review
📌 Visual Language Tracking 📌 Multi-modal Alignment 📌 Feature Awareness

🏆 Honors

  • Best Paper Honorable Mention Award (最佳论文荣誉提名奖), at CVPR Workshop on Vision Datasets Understanding, 2024
  • China National Scholarship (国家奖学金), My Rank: 1/455 (0.22%), Top 1%, at BUPT, by Ministry of Education of China, 2023
  • China National Scholarship (国家奖学金), My Rank: 2/430 (0.47%), Top 1%, at BUPT, by Ministry of Education of China, 2022
  • Beijing Merit Student (北京市三好学生), Top 1%, at BUPT, by Beijing Municipal Education Commission, 2023
  • Beijing Outstanding Graduates (北京市优秀毕业生), Top 5%, at BUPT, by Beijing Municipal Education Commission, 2024
  • College Scholarship of University of Chinese Academy of Sciences (中国科学院大学大学生奖学金), at CASIA, by University of Chinese Academy of Sciences, 2023

🎤 Talks

  • Oral presentation at the CVPR 2024 Workshop on Vision Datasets Understanding, Seattle, WA, USA (Slides)

🔗 Services

  • Reviewer

    International Conference on Learning Representations (ICLR)

    International Conference on Pattern Recognition (ICPR)

🌟 Projects

VideoCube / MGIT / DTVLT Platform

VideoCube / MGIT / DTVLT: A Large-scale Multi-dimensional Multi-modal Global Instance Tracking Intelligent Evaluation Platform

  • Visual Object Tracking / Visual Language Tracking / Environment Construction
  • As of Sept. 2024, the platform has received 440k+ page views, 1.2k+ downloads, and 420+ trackers from 220+ countries and regions worldwide.
  • VideoCube / MGIT is the supporting platform for research accepted by IEEE TPAMI 2023 and NeurIPS 2023.

SOTVerse Platform

SOTVerse: A User-defined Single Object Tracking Task Space

  • Visual Object Tracking / Environment Construction / Evaluation Technique
  • As of Sept. 2024, the platform has received 126k+ page views from 150+ countries and regions worldwide.
  • SOTVerse is the supporting platform for research accepted by IJCV 2024.

GOT-10k Platform

GOT-10k: A Large High-diversity Benchmark and Evaluation Platform for Single Object Tracking

  • Visual Object Tracking / Environment Construction / Evaluation Technique
  • As of Sept. 2024, the platform has received 3.92M+ page views, 7.5k+ downloads, and 21.5k+ trackers from 290+ countries and regions worldwide.
  • GOT-10k is the supporting platform for research accepted by IEEE TPAMI 2021.

BioDrone Platform

BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision

  • UAV Tracking / Environment Construction / Evaluation Technique
  • As of Sept. 2024, the platform has received 170k+ page views from 200+ countries and regions worldwide.
  • BioDrone is the supporting platform for research accepted by IJCV 2024.


© Xuchen Li | Last updated: Oct. 2024