👨🏻💻 About
I am Xuchen Li (李旭宸), a first-year Ph.D. student at the Institute of Automation, Chinese Academy of Sciences (CASIA), supervised by Prof. Kaiqi Huang and co-supervised by Dr. Shiyu Hu. I am also a member of the Visual Intelligence Interest Group (VIIG).
Before that, I received my B.E. degree in Computer Science and Technology from the School of Computer Science (SCS), Beijing University of Posts and Telecommunications (BUPT), in Jun. 2024, with an overall ranking of 1/449 (0.22%).
I am grateful to work with Dr. Shiyu Hu, who has had a significant impact on me. I am also grateful to be growing up and studying with my twin brother Xuzhao Li, which is a truly unique and special experience for me.
My research focuses on Visual Language Tracking, Multi-modal Learning, Data-centric AI, and Large Language Models. If you are interested in my work or would like to collaborate, please feel free to contact me.
🔥 News
- 2024.09: 📝 Two papers (MemVLT and CPDTrack) have been accepted by the 38th Conference on Neural Information Processing Systems (NeurIPS, CCF-A Conference, Poster)!
- 2024.08: 📣 Start my Ph.D. life at the University of Chinese Academy of Sciences (UCAS), located in Huairou District, Beijing, near the beautiful Yanqi Lake.
- 2024.06: 👨💻 Work as a research intern at Ant Group (ANT), studying Multi-modal Large Language Model Agents.
- 2024.06: 👨🎓 Obtain my B.E. degree from Beijing University of Posts and Telecommunications (BUPT). I will always remember the wonderful 4 years I spent here. Thanks to all!
- 2024.05: 🏆 Obtain Beijing Outstanding Graduates (北京市优秀毕业生) (Top 5%, only 38 students at SCS, BUPT received this honor)!
- 2024.05: 📣 Present our work during the 14th Vision and Learning Seminar (VALSE); see our poster for more information!
- 2024.04: 📝 One paper (DTLLM-VLT) has been accepted as an Oral Presentation and received the Best Paper Honorable Mention Award at the 3rd CVPR Workshop on Vision Datasets Understanding (CVPRW, CCF-A Conference Workshop)!
- 2023.12: 🏆 Obtain College Scholarship of University of Chinese Academy of Sciences (中国科学院大学大学生奖学金) (only 17 students at CASIA won this scholarship)!
- 2023.12: 🏆 Obtain China National Scholarship (国家奖学金) with a rank of 1/455 (0.22%) (Top 1%, the highest honor for undergraduates in China)!
- 2023.11: 🏆 Obtain Beijing Merit Student (北京市三好学生) (Top 1%, only 36 students at BUPT received this honor)!
- 2023.09: 📝 One paper (MGIT) has been accepted by the 37th Conference on Neural Information Processing Systems (NeurIPS, CCF-A Conference, Poster)!
- 2022.12: 🏆 Obtain China National Scholarship (国家奖学金) with a rank of 2/430 (0.47%) (Top 1%, the highest honor for undergraduates in China)!
📖 Education
2024.08 - Now, Ph.D. student
Pattern Recognition and Intelligent System
Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing
2020.09 - 2024.06, B.E. degree
Computer Science and Technology, Overall Ranking 1/449 (0.22%)
School of Computer Science
Beijing University of Posts and Telecommunications (BUPT), Beijing
💻 Experience
- 2024.06 - 2024.10: Research intern on Multi-modal Large Language Model Agents at Ant Group (ANT), advised by Dr. Jian Wang and Dr. Ming Yang.
- 2023.05 - 2024.04: Member of the Artificial Intelligence Elites Class at the Institute of Automation, Chinese Academy of Sciences (CASIA), advised by Dr. Shiyu Hu and Prof. Kaiqi Huang.
- 2023.01 - 2023.05: Research intern on 3D Reconstruction at Tsinghua University (THU), advised by Prof. Haoqian Wang.
📝 Publications
✅ Accepted
DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM
Xuchen Li, Xiaokun Feng, Shiyu Hu, Meiqi Wu, Dailing Zhang, Jing Zhang, Kaiqi Huang
CVPRW 2024 (CCF-A Conference Workshop, Oral, Best Paper Honorable Mention Award): the 3rd CVPR Workshop on Vision Datasets Understanding
[Paper]
[PDF]
[Code]
[Website]
[Award]
[Poster]
[Slides]
[BibTeX]
📌 Visual Language Tracking 📌 LLM 📌 Evaluation Technique
MemVLT: Visual-Language Tracking with Adaptive Memory-based Prompts
Xiaokun Feng, Xuchen Li, Shiyu Hu, Dailing Zhang, Meiqi Wu, Jing Zhang, Xiaotang Chen, Kaiqi Huang
NeurIPS 2024 (CCF-A Conference, Poster): the 38th Conference on Neural Information Processing Systems
📌 Visual Language Tracking 📌 Human-like Modeling 📌 Adaptive Prompts
Beyond Accuracy: Tracking more like Human through Visual Search
Dailing Zhang, Shiyu Hu, Xiaokun Feng, Xuchen Li, Meiqi Wu, Jing Zhang, Kaiqi Huang
NeurIPS 2024 (CCF-A Conference, Poster): the 38th Conference on Neural Information Processing Systems
📌 Visual Object Tracking 📌 Visual Search Mechanism 📌 Visual Turing Test
A Multi-modal Global Instance Tracking Benchmark (MGIT): Better Locating Target in Complex Spatio-temporal and Causal Relationship
Shiyu Hu, Dailing Zhang, Meiqi Wu, Xiaokun Feng, Xuchen Li, Xin Zhao, Kaiqi Huang
NeurIPS 2023 (CCF-A Conference, Poster): the 37th Conference on Neural Information Processing Systems
[Paper]
[PDF]
[Code]
[Website]
[Poster]
[Slides]
[BibTeX]
📌 Visual Language Tracking 📌 Video Understanding 📌 Hierarchical Annotation
☑️ Ongoing
DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM
Xuchen Li, Shiyu Hu, Xiaokun Feng, Dailing Zhang, Meiqi Wu, Jing Zhang, Kaiqi Huang
Submitted to a CAAI-A conference, Under Review
📌 Visual Language Tracking 📌 LLM 📌 Benchmark Construction
Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark
Xuchen Li, Shiyu Hu, Xiaokun Feng, Dailing Zhang, Meiqi Wu, Jing Zhang, Kaiqi Huang
Submitted to a CCF-A conference workshop, Under Review
📌 Visual Language Tracking 📌 Multi-modal Interaction 📌 Evaluation Technology
Can LVLMs Describe Videos like Humans? A Five-in-One Video Annotations Benchmark for Better Human-Machine Comparison
Shiyu Hu*, Xuchen Li*, Xuzhao Li, Jing Zhang, Yipei Wang, Xin Zhao, Kang Hao Cheong (*Equal Contributions)
Submitted to a CAAI-A conference, Under Review
📌 LVLM 📌 Evaluation Technology 📌 Human-Machine Comparison
Sat-LLM: EveryOne Can Expert in Satellite
Qian Li*, Xuchen Li*, Zongyu Chang, Yuzheng Zhang, Cheng Ji, Shangguang Wang (*Equal Contributions)
Submitted to a CCF-A conference, Under Review
📌 LLM 📌 Satellite Commonsense 📌 Retrieval Augmented Generation
ATCTrack: Leveraging Aligned Target-Context Cues for Robust Vision-Language Tracking
Xiaokun Feng, Shiyu Hu, Xuchen Li, Dailing Zhang, Meiqi Wu, Jing Zhang, Xiaotang Chen, Kaiqi Huang
Submitted to a CCF-A conference, Under Review
📌 Visual Language Tracking 📌 Multi-modal Alignment 📌 Feature Awareness
🏆 Honors
- Best Paper Honorable Mention Award (最佳论文荣誉提名奖), at CVPR Workshop on Vision Datasets Understanding, 2024
- China National Scholarship (国家奖学金), My Rank: 1/455 (0.22%), Top 1%, at BUPT, by Ministry of Education of China, 2023
- China National Scholarship (国家奖学金), My Rank: 2/430 (0.47%), Top 1%, at BUPT, by Ministry of Education of China, 2022
- Beijing Merit Student (北京市三好学生), Top 1%, at BUPT, by Beijing Municipal Education Commission, 2023
- Beijing Outstanding Graduates (北京市优秀毕业生), Top 5%, at BUPT, by Beijing Municipal Education Commission, 2024
- College Scholarship of University of Chinese Academy of Sciences (中国科学院大学大学生奖学金), at CASIA, by University of Chinese Academy of Sciences, 2023
🎤 Talks
- Oral presentation at the CVPR 2024 Workshop on Vision Datasets Understanding, Seattle, WA, USA (Slides)
🔗 Services
- Reviewer: ICLR 2025, ICPR 2024
🌟 Projects
GOT-10k: A Large High-diversity Benchmark and Evaluation Platform for Single Object Tracking
- Visual Object Tracking / Evaluation Technology / Large High-diversity Benchmark
- As of Sept. 2024, the platform has received 3.92M+ page views, 7.5k+ downloads, and 21.5k+ trackers from 290+ countries and regions worldwide.
- GOT-10k is the supporting platform for research accepted by IEEE TPAMI 2021.
VideoCube / MGIT
- Visual Object Tracking / Visual Language Tracking / Long Video Understanding
- As of Sept. 2024, the platform has received 440k+ page views, 1.2k+ downloads, and 420+ trackers from 220+ countries and regions worldwide.
- VideoCube / MGIT is the supporting platform for research accepted by IEEE TPAMI 2023 and NeurIPS 2023.
SOTVerse: A User-defined Single Object Tracking Task Space
- Visual Object Tracking / Dynamic Open Environment Construction / Visual Evaluation Technique
- As of Sept. 2024, the platform has received 126k+ page views from 150+ countries and regions worldwide.
- SOTVerse is the supporting platform for research accepted by IJCV 2024.
BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision
- Visual Object Tracking / Robust Vision Challenges / Bionic-based UAV Tracking
- As of Sept. 2024, the platform has received 170k+ page views from 200+ countries and regions worldwide.
- BioDrone is the supporting platform for research accepted by IJCV 2024.
© Xuchen Li | Last updated: Oct. 2024