Updated on 2026.03.24

Single Object & Visual Language Tracking

Publish Date	Title	Authors	PDF	Code
2025-07-22	Explicit Context Reasoning with Supervision for Visual Tracking	Fansheng Zeng et.al.	2507.16191	null
2025-07-21	Is Tracking really more challenging in First Person Egocentric Vision?	Matteo Dunnhofer et.al.	2507.16015	null
2025-07-23	EndoControlMag: Robust Endoscopic Vascular Motion Magnification with Periodic Reference Resetting and Hierarchical Tissue-aware Dual-Mask Control	An Wang et.al.	2507.15292	null
2025-07-11	SAM2RL: Towards Reinforcement Learning Memory Control in Segment Anything Model 2	Alen Adamyan et.al.	2507.08548	null
2025-07-10	Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking	Qiangqiang Wu et.al.	2507.07483	null
2025-07-09	Token Bottleneck: One Token to Remember Dynamics	Taekyung Kim et.al.	2507.06543	null
2025-07-08	What You Have is What You Track: Adaptive and Robust Multimodal Tracking	Yuedong Tan et.al.	2507.05899	null
2025-07-08	Stable Tracking-in-the-Loop Control of Cable-Driven Surgical Manipulators under Erroneous Kinematic Chains	Neelay Joglekar et.al.	2507.05663	null
2025-07-07	Spatial and Semantic Embedding Integration for Stereo Sound Event Localization and Detection in Regular Videos	Davide Berghi et.al.	2507.04845	null
2025-07-05	Sensitive and accurate femtosecond pulse characterization via two-photon absorption in Fabry-Pérot laser diodes	Adrian F. Chlebowski et.al.	2507.03978	null
2025-07-01	UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions	Siyuan Yao et.al.	2507.00648	null
2025-07-01	ATSTrack: Enhancing Visual-Language Tracking by Aligning Temporal and Spatial Scales	Yihao Zhen et.al.	2507.00454	null
2025-06-30	Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking	Shiao Wang et.al.	2506.23783	null
2025-07-22	R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning	Biao Wang et.al.	2506.21980	null
2025-06-25	Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking	Ben Kang et.al.	2506.20381	null
2025-06-17	Comparison of Two Methods for Stationary Incident Detection Based on Background Image	Deepak Ghimire et.al.	2506.14256	null
2025-06-03	MVTD: A Benchmark Dataset for Maritime Visual Object Tracking	Ahsan Baidar Bakht et.al.	2506.02866	null
2025-05-31	Towards Effective and Efficient Adversarial Defense with Diffusion Models for Robust Visual Tracking	Long Xu et.al.	2506.00325	link
2025-05-29	CLDTracker: A Comprehensive Language Description for Visual Tracking	Mohamad Alansari et.al.	2505.23704	link
2025-05-29	TrackVLA: Embodied Visual Tracking in the Wild	Shaoan Wang et.al.	2505.23189	null
2025-05-28	TwinTrack: Bridging Vision and Contact Physics for Real-Time Tracking of Unknown Dynamic Objects	Wen Yang et.al.	2505.22882	null
2025-05-27	Fully Spiking Neural Networks for Unified Frame-Event Object Tracking	Jingjun Yang et.al.	2505.20834	null
2025-05-28	VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models	Kui Wu et.al.	2505.20718	null
2025-05-27	Hierarchical Instruction-aware Embodied Visual Tracking	Kui Wu et.al.	2505.20710	null
2025-06-01	HAND Me the Data: Fast Robot Adaptation via Hand Path Retrieval	Matthew Hong et.al.	2505.20455	null
2025-05-28	Progressive Scaling Visual Object Tracking	Jack Hong et.al.	2505.19990	null
2025-05-23	Adapting SAM 2 for Visual Object Tracking: 1st Place Solution for MMVPR Challenge Multi-Modal Tracking	Cheng-Yen Yang et.al.	2505.18111	null
2025-05-22	Efficient Motion Prompt Learning for Robust Visual Tracking	Jie Zhao et.al.	2505.16321	link
2025-05-19	Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach	Shiao Wang et.al.	2505.12903	link
2025-05-13	Towards Adaptive Meta-Gradient Adversarial Examples for Visual Tracking	Wei-Long Tian et.al.	2505.08999	link
2025-05-11	DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems	Tong Zhang et.al.	2505.07110	null
2025-05-09	CGTrack: Cascade Gating Network with Hierarchical Feature Aggregation for UAV Tracking	Weihong Li et.al.	2505.05936	link
2025-05-07	Predicting Road Surface Anomalies by Visual Tracking of a Preceding Vehicle	Petr Jahoda et.al.	2505.04392	null
2025-04-19	Adversarial Attack for RGB-Event based Visual Object Tracking	Qiang Chen et.al.	2504.14423	link
2025-05-05	SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation	Junjie Jiang et.al.	2504.04519	link
2025-03-24	SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking	Wenrui Cai et.al.	2503.18338	link
2025-03-22	MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking	Haolin Qin et.al.	2503.17699	link
2025-03-21	Dynamic Attention Mechanism in Spatiotemporal Memory Networks for Object Tracking	Meng Zhou et.al.	2503.16768	null
2025-03-17	UncTrack: Reliable Visual Object Tracking with Uncertainty-Aware Prototype Memory Network	Siyuan Yao et.al.	2503.12888	link
2025-03-16	A Plug-and-Play Learning-based IMU Bias Factor for Robust Visual-Inertial Odometry	Yang Yi et.al.	2503.12527	null
2025-03-14	Towards General Multimodal Visual Tracking	Andong Lu et.al.	2503.11218	null
2025-03-09	Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking	Chaocan Xue et.al.	2503.06625	link
2025-03-09	Dynamic Updates for Language Adaptation in Visual-Language Tracking	Xiaohai Li et.al.	2503.06621	link
2025-02-28	Technical Report for ReID-SAM on SkiTB Visual Tracking Challenge 2025	Kunjun Li et.al.	2503.01907	null
2025-03-01	Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking	Jiawen Zhu et.al.	2503.00516	link
2025-02-27	MITracker: Multi-View Integration for Visual Object Tracking	Mengjie Xu et.al.	2502.20111	null
2025-02-27	CFTrack: Enhancing Lightweight Visual Tracking through Contrastive Learning and Feature Matching	Juntao Liang et.al.	2502.19705	null
2025-02-26	Enhanced Transformer-Based Tracking for Skiing Events: Overcoming Multi-Camera Challenges, Scale Variations and Rapid Motion – SkiTB Visual Tracking Challenge 2025	Akhil Penta et.al.	2502.18867	null
2025-02-25	UASTrack: A Unified Adaptive Selection Framework with Modality-Customization in Single Object Tracking	He Wang et.al.	2502.18220	null
2025-02-08	Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark	Shiao Wang et.al.	2502.05574	link
2025-01-13	Robust Single Object Tracking in LiDAR Point Clouds under Adverse Weather Conditions	Xiantong Zhao et.al.	2501.07133	null
2025-01-05	DeTrack: In-model Latent Denoising Learning for Visual Object Tracking	Xinyu Zhou et.al.	2501.02467	null
2025-01-13	FusionSORT: Fusion Methods for Online Multi-object Visual Tracking	Nathanael L. Baisa et.al.	2501.00843	link
2025-01-01	Less is More: Token Context-aware Learning for Object Tracking	Chenlong Xu et.al.	2501.00758	link
2024-12-28	Learning Adaptive and View-Invariant Vision Transformer with Multi-Teacher Knowledge Distillation for Real-Time UAV Tracking	You Wu et.al.	2412.20002	link
2024-12-26	SUTrack: Towards Simple and Unified Single Object Tracking	Xin Chen et.al.	2412.19138	link
2024-12-15	Exploring Enhanced Contextual Information for Video-Level Object Tracking	Ben Kang et.al.	2412.11023	link
2024-12-13	Visual Object Tracking across Diverse Data Modalities: A Review	Mengmeng Wang et.al.	2412.09991	null
2025-03-07	MVCTrack: Boosting 3D Point Cloud Tracking via Multimodal-Guided Virtual Cues	Zhaofeng Hu et.al.	2412.02734	link
2024-12-03	GSOT3D: Towards Generic 3D Single Object Tracking in the Wild	Yifan Jiao et.al.	2412.02129	link
2025-02-06	Improving Accuracy and Generalization for Efficient Visual Tracking	Ram Zaveri et.al.	2411.18855	null
2024-11-27	A comparison of extended object tracking with multi-modal sensors in indoor environment	Jiangtao Shuai et.al.	2411.18476	null
2024-12-04	A Distractor-Aware Memory for Visual Object Tracking with SAM2	Jovana Videnovic et.al.	2411.17576	link
2024-11-23	How Texts Help? A Fine-grained Evaluation to Reveal the Role of Language in Vision-Language Tracking	Xuchen Li et.al.	2411.15600	null
2024-11-24	ClickTrack: Towards Real-time Interactive Single Object Tracking	Kuiran Wang et.al.	2411.13183	null
2024-11-30	SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory	Cheng-Yen Yang et.al.	2411.11922	link
2024-12-09	Vision Eagle Attention: a new lens for advancing image classification	Mahmudul Hasan et.al.	2411.10564	link
2024-11-14	MFTIQ: Multi-Flow Tracker with Independent Matching Quality Estimation	Jonas Serych et.al.	2411.09551	link
2024-11-12	Visual Tracking with Intermittent Visibility: Switched Control Design and Implementation	Yangge Li et.al.	2411.08144	null
2024-12-16	ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model	Yiming Sun et.al.	2411.01756	null
2024-10-30	IP-MOT: Instance Prompt Learning for Cross-Domain Multi-Object Tracking	Run Luo et.al.	2410.23907	null
2024-10-27	NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tracking	Yu Liu et.al.	2410.20421	link
2024-10-19	The Solution for Single Object Tracking Task of Perception Test Challenge 2024	Zhiqiang Zhong et.al.	2410.16329	null
2024-10-13	Gaussian Splatting Visual MPC for Granular Media Manipulation	Wei-Cheng Tseng et.al.	2410.09740	null
2024-10-09	DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM	Xuchen Li et.al.	2410.02492	null
2024-09-30	Opt-in Camera: Person Identification in Video via UWB Localization and Its Application to Opt-in Systems	Matthew Ishige et.al.	2409.19891	null
2024-09-27	Improving Visual Object Tracking through Visual Prompting	Shih-Fang Chen et.al.	2409.18901	link
2024-09-26	General Compression Framework for Efficient Transformer Object Tracking	Lingyi Hong et.al.	2409.17564	null
2024-09-25	Towards Underwater Camouflaged Object Tracking: An Experimental Evaluation of SAM and SAM 2	Chunhui Zhang et.al.	2409.16902	link
2024-09-25	Conditional Generative Denoiser for Nighttime UAV Tracking	Yucheng Wang et.al.	2409.16834	link
2024-09-25	Progressive Representation Learning for Real-Time UAV Tracking	Changhong Fu et.al.	2409.16652	link
2024-09-25	Enhancing Nighttime UAV Tracking with Light Distribution Suppression	Liangliang Yao et.al.	2409.16631	link
2024-09-19	WeHelp: A Shared Autonomy System for Wheelchair Users	Abulikemu Abuduweili et.al.	2409.12159	link
2024-09-18	Distilling Channels for Efficient Deep Tracking	Shiming Ge et.al.	2409.11785	null
2024-09-13	Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark	Xuchen Li et.al.	2409.08887	null
2024-09-10	VBIT: Towards Enhancing Privacy Control Over IoT Devices	Jad Al Aaraj et.al.	2409.06233	null
2024-09-03	Ultra-broadband room-temperature Fourier transform spectrometer with watt-level power consumption	Jakub Mnich et.al.	2409.01875	null
2024-08-25	Camouflaged Object Tracking: A Benchmark	Xiaoyu Guo et.al.	2408.13877	link
2024-08-21	Low-Light Object Tracking: A Benchmark	Pengzhi Zhong et.al.	2408.11463	link
2024-08-20	MambaEVT: Event Stream based Visual Object Tracking using State Space Model	Xiao Wang et.al.	2408.10487	link
2024-08-05	VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking	Yuxuan Lu et.al.	2408.02263	null
2024-09-06	3D Single-object Tracking in Point Clouds with High Temporal Variation	Qiao Wu et.al.	2408.02049	null
2024-09-09	SiamMo: Siamese Motion-Centric 3D Object Tracking	Yuxiang Yang et.al.	2408.01688	link
2024-08-02	Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach	Yabin Zhu et.al.	2408.00969	link
2024-08-06	Broadband THz wave generation and detection in organic crystal PNPA at MHz repetition rates	Lukasz A. Sterczewski et.al.	2407.20745	null
2024-07-16	Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers	Zhengbo Zhang et.al.	2407.08394	null
2024-07-11	PINN-Ray: A Physics-Informed Neural Network to Model Soft Robotic Fin Ray Fingers	Xing Wang et.al.	2407.08222	null
2024-07-07	Addressing single object tracking in satellite imagery through prompt-engineered solutions	Athena Psalta et.al.	2407.05518	null
2024-07-07	Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking	You Wu et.al.	2407.05383	null
2024-07-09	P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds	Jiahao Nie et.al.	2407.05238	link
2024-07-07	Tracking Reflected Objects: A Benchmark	Xiaoyu Guo et.al.	2407.05235	null
2024-07-04	TrackPGD: A White-box Attack using Binary Masks against Robust Transformer Trackers	Fatemeh Nourilenjan Nokabadi et.al.	2407.03946	link
2024-07-02	FlowTrack: Point-level Flow Network for 3D Single Object Tracking	Shuo Li et.al.	2407.01959	null
2024-09-07	eMoE-Tracker: Environmental MoE-based Transformer for Robust Event-guided Object Tracking	Yucheng Chen et.al.	2406.20024	null
2024-06-14	Constrained Motion Planning for a Robotic Endoscope Holder based on Hierarchical Quadratic Programming	Jacinto Colan et.al.	2406.09982	null
2024-06-14	Robust compressive tracking via online weighted multiple instance learning	Sandeep Singh Sengar et.al.	2406.09914	null
2024-07-01	Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking	Xiangyang Yang et.al.	2406.08037	null
2024-06-07	Multi-Granularity Language-Guided Multi-Object Tracking	Yuhao Li et.al.	2406.04844	link
2024-06-02	Robust Visual Tracking via Iterative Gradient Descent and Threshold Selection	Zhuang Qi et.al.	2406.00589	null
2024-05-28	Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion	Hongze Sun et.al.	2405.17903	link
2024-05-27	LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking	Shaohua Dong et.al.	2405.17660	null
2024-05-31	Awesome Multi-modal Object Tracking	Chunhui Zhang et.al.	2405.14200	link
2024-05-20	DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM	Xuchen Li et.al.	2405.12139	null
2024-05-16	A Novel Bounding Box Regression Method for Single Object Tracking	Omar Abdelaziz et.al.	2405.10444	null
2024-05-16	Beyond Traditional Single Object Tracking: A Survey	Omar Abdelaziz et.al.	2405.10439	null
2024-05-08	TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking	Pengcheng Shao et.al.	2405.05004	link
2024-04-22	360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos	Yinzhe Xu et.al.	2404.13953	link
2024-05-25	An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training	Jin Gao et.al.	2404.12210	link
2024-04-16	Attention-Aware Visualization: Tracking and Responding to User Perception Over Time	Arvind Srinivasan et.al.	2404.10732	null
2024-04-15	Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL	Fangwei Zhong et.al.	2404.09857	null
2024-04-15	Learning Tracking Representations from Single Point Annotations	Qiangqiang Wu et.al.	2404.09504	null
2024-04-11	PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds	Weisheng Xu et.al.	2404.07495	link
2024-05-02	Longitudinal Analysis and Quantitative Assessment of Child Development through Mobile Interaction	Juan Carlos Ruiz-Garcia et.al.	2404.06919	link
2024-04-09	LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks	Jianlang Chen et.al.	2404.06247	link
2024-04-08	Semi-Supervised Novelty Detection for Precise Ultra-Wideband Error Signal Prediction	Umberto Albertin et.al.	2404.05351	null
2024-03-29	Context-Aware Integration of Language and Visual References for Natural Language Tracking	Yanyan Shao et.al.	2403.19975	null
2024-03-27	TAFormer: A Unified Target-Aware Transformer for Video and Motion Joint Prediction in Aerial Scenes	Liangyu Xu et.al.	2403.18238	null
2024-03-26	OmniVid: A Generative Framework for Universal Video Understanding	Junke Wang et.al.	2403.17935	link
2024-03-26	Exploring Dynamic Transformer for Efficient Object Tracking	Jiawen Zhu et.al.	2403.17651	null
2024-03-29	Elysium: Exploring Object-level Perception in Videos via MLLM	Han Wang et.al.	2403.16558	link
2024-03-25	Multi-attention Associate Prediction Network for Visual Tracking	Xinglong Sun et.al.	2403.16395	null
2024-03-28	SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking	Xiaojun Hou et.al.	2403.16002	link
2024-03-23	Spatio-Temporal Bi-directional Cross-frame Memory for Distractor Filtering Point Cloud Single Object Tracking	Shaoyu Sun et.al.	2403.15831	null
2024-03-19	TON-VIO: Online Time Offset Modeling Networks for Robust Temporal Alignment in High Dynamic Motion VIO	Chaoran Xiong et.al.	2403.12504	link
2024-03-18	Pedestrian Tracking with Monocular Camera using Unconstrained 3D Motion Model	Jan Krejčí et.al.	2403.11978	null
2024-03-16	A Spectrum-based Image Denoising Method with Edge Feature Enhancement	Peter Luvton et.al.	2403.11036	null
2024-03-15	Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers	Jinxia Xie et.al.	2403.10574	null
2024-03-14	OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning	Lingyi Hong et.al.	2403.09634	null
2024-02-27	ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking	Yushan Han et.al.	2403.07914	null
2024-04-03	Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline	Xiao Wang et.al.	2403.05839	link
2024-03-08	Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance	Liting Lin et.al.	2403.05231	link
2024-03-08	Motion-Guided Dual-Camera Tracker for Low-Cost Skill Evaluation of Gastric Endoscopy	Yuelin Zhang et.al.	2403.05146	link
2024-03-06	VastTrack: Vast Category Visual Object Tracking	Liang Peng et.al.	2403.03493	link
2024-02-28	Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks	Zhewei Wu et.al.	2402.17976	null
2024-02-26	SeqTrack3D: Exploring Sequence Information for Robust 3D Point Cloud Tracking	Yu Lin et.al.	2402.16249	link
2024-02-26	Reading Relevant Feature from Global Representation Memory for Visual Object Tracking	Xinyu Zhou et.al.	2402.14392	null
2024-02-13	Optimized Information Flow for Transformer Tracking	Janani Kugarajeevan et.al.	2402.08195	link
2024-02-07	BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision	Xin Zhao et.al.	2402.04519	null
2024-02-04	Spatio-temporal Prompting Network for Robust Video Feature Extraction	Guanxiong Sun et.al.	2402.02574	link
2024-01-24	Small Object Tracking in LiDAR Point Cloud: Learning the Target-awareness Prototype and Fine-grained Search Region	Shengjing Tian et.al.	2401.13285	null
2024-01-23	Correlation-Embedded Transformer Tracking: A Single-Branch Framework	Fei Xie et.al.	2401.12743	link
2024-01-20	Unifying Visual and Vision-Language Tracking via Contrastive Learning	Yinchao Ma et.al.	2401.11228	link
2024-01-20	Towards Category Unification of 3D Single Object Tracking on Point Clouds	Jiahao Nie et.al.	2401.11204	null
2024-01-18	Multi-task Learning for Joint Re-identification, Team Affiliation, and Role Classification for Sports Visual Tracking	Amir M. Mansourian et.al.	2401.09942	null
2024-01-12	Dense Optical Flow Estimation Using Sparse Regularizers from Reduced Measurements	Muhammad Wasim Nawaz et.al.	2401.06396	null
2024-01-18	Hold ’em and Fold ’em: Towards Human-scale, Feedback-Controlled Soft Origami Robots	Immanuel Ampomah Mensah et.al.	2401.04650	null
2024-01-06	Explicit Visual Prompts for Visual Object Tracking	Liangtao Shi et.al.	2401.03142	link
2024-01-03	ODTrack: Online Dense Temporal Token Learning for Visual Tracking	Yaozong Zheng et.al.	2401.01686	link
2023-12-27	X Modality Assisting RGBT Object Tracking	Zhaisheng Ding et.al.	2312.17273	null
2023-12-22	Cross-Modal Object Tracking via Modality-Aware Fusion Network and A Large-Scale Dataset	Lei Liu et.al.	2312.14446	link
2023-12-18	Multi-Correlation Siamese Transformer Network with Dense Connection for 3D Single Object Tracking	Shihao Feng et.al.	2312.11051	link
2023-12-17	Robust 3D Tracking with Quality-Aware Shape Completion	Jingwen Zhang et.al.	2312.10608	null
2023-12-15	Tracking Skiers from the Top to the Bottom	Matteo Dunnhofer et.al.	2312.09723	null
2023-12-11	M3SOT: Multi-frame, Multi-field, Multi-space 3D Single Object Tracking	Jiaming Liu et.al.	2312.06117	link
2023-12-07	Instance Tracking in 3D Scenes from Egocentric Videos	Yunhan Zhao et.al.	2312.04117	link
2024-02-19	Beyond Visual Cues: Synchronously Exploring Target-Centric Semantics for Vision-Language Tracking	Jiawei Ge et.al.	2311.17085	null
2023-11-21	Visual tracking brain computer interface	Changxing Huang et.al.	2311.12592	null
2024-01-10	ViKi-HyCo: A Hybrid-Control approach for complex car-like maneuvers	Edison P. Velasco Sánchez et.al.	2311.07268	null

Large Language Model

Publish Date	Title	Authors	PDF	Code
2025-07-23	Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks	Linbo Cao et.al.	2507.17747	null
2025-07-23	Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains	Anisha Gunjal et.al.	2507.17746	null
2025-07-23	Megrez2 Technical Report	Boxun Li et.al.	2507.17728	null
2025-07-23	BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems	Malsha Ashani Mahawatta Dona et.al.	2507.17722	null
2025-07-23	AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer	Danny D. Leybzon et.al.	2507.17718	null
2025-07-23	HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging	Taha Ceritli et.al.	2507.17706	null
2025-07-23	Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models	Changxin Tian et.al.	2507.17702	null
2025-07-23	Thinking Isn’t an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations	Zhao Song et.al.	2507.17699	null
2025-07-23	Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks	Ilias Chatzistefanidis et.al.	2507.17695	null
2025-07-23	Simulating multiple human perspectives in socio-ecological systems using large language models	Yongchao Zeng et.al.	2507.17680	null
2025-07-23	See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering	Junjie Wang et.al.	2507.17659	null
2025-07-23	Who Attacks, and Why? Using LLMs to Identify Negative Campaigning in 18M Tweets across 19 Countries	Victor Hartman et.al.	2507.17636	null
2025-07-23	A Hybrid Early-Exit Algorithm for Large Language Models Based on Space Alignment Decoding (SPADE)	Bowen Zheng et.al.	2507.17618	null
2025-07-23	Decoding Consumer Preferences Using Attention-Based Language Models	Joshua Foster et.al.	2507.17564	null
2025-07-23	BoSS: Beyond-Semantic Speech	Qing Wang et.al.	2507.17563	null
2025-07-23	CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning	Lingxiao Tang et.al.	2507.17548	null
2025-07-23	Anticipate, Simulate, Reason (ASR): A Comprehensive Generative AI Framework for Combating Messaging Scams	Xue Wen Tan et.al.	2507.17543	null
2025-07-23	AssertFlip: Reproducing Bugs via Inversion of LLM-Generated Passing Tests	Lara Khatib et.al.	2507.17542	null
2025-07-23	Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning	Xinyao Liu et.al.	2507.17539	null
2025-07-23	InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation	Shuai Yang et.al.	2507.17520	null
2025-07-22	Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning	Junhao Shen et.al.	2507.16814	null
2025-07-22	LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs	Da-Chen Lian et.al.	2507.16809	null
2025-07-22	Rethinking LLM-Based RTL Code Optimization Via Timing Logic Metamorphosis	Zhihao Xu et.al.	2507.16808	null
2025-07-22	Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty	Mehul Damani et.al.	2507.16806	null
2025-07-23	Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning	Yanjun Zheng et.al.	2507.16802	null
2025-07-23	Test-Time-Matching: Decouple Personality, Memory, and Linguistic Style in LLM-based Role-Playing Language Agent	Xiaoyu Zhan et.al.	2507.16799	null
2025-07-22	Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning	Helena Casademunt et.al.	2507.16795	null
2025-07-22	ChatChecker: A Framework for Dialogue System Testing and Evaluation Through Non-cooperative User Simulation	Roman Mayr et.al.	2507.16792	null
2025-07-22	Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning	Hongyin Luo et.al.	2507.16784	null
2025-07-22	Cooling Matters: Benchmarking Large Language Models and Vision-Language Models on Liquid-Cooled Versus Air-Cooled H100 GPU Systems	Imran Latif et.al.	2507.16781	null
2025-07-22	When LLMs Copy to Think: Uncovering Copy-Guided Attacks in Reasoning LLMs	Yue Li et.al.	2507.16773	null
2025-07-22	WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding	Ran Wang et.al.	2507.16768	null
2025-07-22	Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support	Fangjian Lei et.al.	2507.16754	null
2025-07-22	CMP: A Composable Meta Prompt for SAM-Based Cross-Domain Few-Shot Segmentation	Shuai Chen et.al.	2507.16753	null
2025-07-22	Collaborative Inference and Learning between Edge SLMs and Cloud LLMs: A Survey of Algorithms, Execution, and Open Challenges	Senyao Li et.al.	2507.16731	null
2025-07-23	Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints	Zhenyun Yin et.al.	2507.16727	null
2025-07-22	SALM: Spatial Audio Language Model with Structured Embeddings for Understanding and Editing	Jinbo Hu et.al.	2507.16724	null
2025-07-22	Enhancing Remote Sensing Vision-Language Models Through MLLM and LLM-Based High-Quality Image-Text Dataset Generation	Yiguo He et.al.	2507.16716	null
2025-07-22	Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory	Guowei Lan et.al.	2507.16713	null
2025-07-22	Advancing Risk and Quality Assurance: A RAG Chatbot for Improved Regulatory Compliance	Lars Hillebrand et.al.	2507.16711	null
2025-07-21	Diffusion Beats Autoregressive in Data-Constrained Settings	Mihir Prabhudesai et.al.	2507.15857	null
2025-07-21	Gemini 2.5 Pro Capable of Winning Gold at IMO 2025	Yichen Huang et.al.	2507.15855	null
2025-07-22	SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction	Zhixiong Zhang et.al.	2507.15852	null
2025-07-21	The Other Mind: How Language Models Exhibit Human Temporal Cognition	Lingyu Li et.al.	2507.15851	null
2025-07-21	3LM: Bridging Arabic, STEM, and Code through Benchmarking	Basma El Amel Boussaha et.al.	2507.15850	null
2025-07-21	The Impact of Language Mixing on Bilingual LLM Reasoning	Yihao Li et.al.	2507.15849	null
2025-07-21	FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs	Anh Nguyen et.al.	2507.15839	null
2025-07-21	Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation	Alessandro B. Melchiorre et.al.	2507.15826	null
2025-07-21	ACS: An interactive framework for conformal selection	Yu Gui et.al.	2507.15825	null
2025-07-21	Can Your Model Separate Yolks with a Water Bottle? Benchmarking Physical Commonsense Understanding in Video Generation Models	Enes Sanli et.al.	2507.15824	null
2025-07-21	Do AI models help produce verified bug fixes?	Li Huang et.al.	2507.15822	null
2025-07-21	LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra	Seth Karten et.al.	2507.15815	null
2025-07-21	True Multimodal In-Context Learning Needs Attention to the Visual Context	Shuo Chen et.al.	2507.15807	null
2025-07-21	ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction	Danhui Chen et.al.	2507.15803	null
2025-07-21	Regularized Low-Rank Adaptation for Few-Shot Organ Segmentation	Ghassen Baklouti et.al.	2507.15793	null
2025-07-21	Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning	Sneheel Sarangi et.al.	2507.15788	null
2025-07-21	Reservoir Computing as a Language Model	Felix Köster et.al.	2507.15779	null
2025-07-21	Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR	Jiakang Wang et.al.	2507.15778	null
2025-07-21	Left Leaning Models: AI Assumptions on Economic Policy	Maxim Chupilkin et.al.	2507.15771	null
2025-07-21	A Framework for Analyzing Abnormal Emergence in Service Ecosystems Through LLM-based Agent Intention Mining	Yifan Shen et.al.	2507.15770	null
2025-07-18	Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning	Shashanka Venkataramanan et.al.	2507.14137	null
2025-07-18	CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning	Xiaoya Li et.al.	2507.14111	null
2025-07-18	Automated Interpretation of Non-Destructive Evaluation Contour Maps Using Large Language Models for Bridge Condition Assessment	Viraj Nishesh Darji et.al.	2507.14107	null
2025-07-18	Generative AI-Driven High-Fidelity Human Motion Simulation	Hari Iyer et.al.	2507.14097	null
2025-07-18	Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track	Brian Ondov et.al.	2507.14096	null
2025-07-18	DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration	Xiyun Li et.al.	2507.14088	null
2025-07-18	DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits	Garapati Keerthana et.al.	2507.14079	null
2025-07-18	VLA-Mark: A cross modal watermark for large vision-language alignment model	Shuliang Liu et.al.	2507.14067	null
2025-07-18	Foundation Models as Class-Incremental Learners for Dermatological Image Classification	Mohamed Elkhayat et.al.	2507.14050	null
2025-07-18	EdgeVLA: Efficient Vision-Language-Action Models	Paweł Budzianowski et.al.	2507.14049	null
2025-07-18	Evaluating the Effectiveness of Cost-Efficient Large Language Models in Benchmark Biomedical Tasks	Israt Jahan et.al.	2507.14045	null
2025-07-18	Architecting Human-AI Cocreation for Technical Services – Interaction Modes and Contingency Factors	Jochen Wulf et.al.	2507.14034	null
2025-07-18	KROMA: Ontology Matching with Knowledge Retrieval and Large Language Models	Lam Nguyen et.al.	2507.14032	null
2025-07-18	Moodifier: MLLM-Enhanced Emotion-Driven Image Editing	Jiarong Ye et.al.	2507.14024	null
2025-07-18	Efficient Temporal Tokenization for Mobility Prediction with Large Language Models	Haoyu He et.al.	2507.14017	null
2025-07-18	OrthoInsight: Rib Fracture Diagnosis and Report Generation Based on Multi-Modal Large Models	Ningyong Wu et.al.	2507.13993	null
2025-07-18	Leveraging Pathology Foundation Models for Panoptic Segmentation of Melanoma in H&E Images	Jiaqi Lv et.al.	2507.13974	null
2025-07-18	Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need	Bhishma Dedhia et.al.	2507.13966	null
2025-07-18	DUALRec: A Hybrid Sequential and Language Model Framework for Context-Aware Movie Recommendation	Yitong Li et.al.	2507.13957	null
2025-07-18	Cross-modal Causal Intervention for Alzheimer’s Disease Prediction	Yutao Jin et.al.	2507.13956	null
2025-07-17	VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding	Shihao Wang et.al.	2507.13353	null
2025-07-17	VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning	Senqiao Yang et.al.	2507.13348	null
2025-07-17	Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes	Tyler Loakman et.al.	2507.13335	null
2025-07-17	A Survey of Context Engineering for Large Language Models	Lingrui Mei et.al.	2507.13334	null
2025-07-17	The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner	Zhouqi Hua et.al.	2507.13332	null
2025-07-17	Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It	Yulu Qin et.al.	2507.13328	null
2025-07-17	GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM	Kyeongjin Ahn et.al.	2507.13323	null
2025-07-17	HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals	Guimin Hu et.al.	2507.13318	null
2025-07-17	Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark	Junsu Kim et.al.	2507.13314	null
2025-07-17	The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations	Carlos Arriaga et.al.	2507.13302	null
2025-07-17	AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research	Yilun Zhao et.al.	2507.13300	null
2025-07-17	Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management	Luis Gasco et.al.	2507.13275	null
2025-07-17	Automating Steering for Safe Multimodal Large Language Models	Lyucheng Wu et.al.	2507.13255	null
2025-07-17	HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models	Ashray Gupta et.al.	2507.13238	null
2025-07-17	Enhancing Cross-task Transfer of Large Language Models via Activation Steering	Xinyu Tang et.al.	2507.13236	null
2025-07-18	MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling	Etienne Le Naour et.al.	2507.13207	null
2025-07-18	Automatically assessing oral narratives of Afrikaans and isiXhosa children	Retief Louw et.al.	2507.13205	null
2025-07-17	GEMMAS: Graph-based Evaluation Metrics for Multi Agent Systems	Jisoo Lee et.al.	2507.13190	null
2025-07-17	Black Box Deployed – Functional Criteria for Artificial Moral Agents in the LLM Era	Matthew E. Brophy et.al.	2507.13175	null
2025-07-17	Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities	Hao Sun et.al.	2507.13158	null
2025-07-16	Language Models Improve When Pretraining Data Matches Target Tasks	David Mizrahi et.al.	2507.12466	null
2025-07-16	PhysX: Physical-Grounded 3D Asset Generation	Ziang Cao et.al.	2507.12465	null
2025-07-16	CytoSAE: Interpretable Cell Embeddings for Hematology	Muhammed Furkan Dasdelen et.al.	2507.12464	null
2025-07-16	Mitigating Object Hallucinations via Sentence-Level Early Intervention	Shangpin Peng et.al.	2507.12455	null
2025-07-16	Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length	Saptarshi Mitra et.al.	2507.12442	null
2025-07-16	Describe Anything Model for Visual Question Answering on Text-rich Images	Yen-Linh Vu et.al.	2507.12441	null
2025-07-16	Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models	Yik Siu Chan et.al.	2507.12428	null
2025-07-16	Advancing Retrieval-Augmented Generation for Structured Enterprise and Internal Data	Chandana Cheerla et.al.	2507.12425	null
2025-07-16	SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?	Xinyi He et.al.	2507.12415	null
2025-07-16	AutoVDC: Automated Vision Data Cleaning Using Vision-Language Models	Santosh Vasa et.al.	2507.12414	null
2025-07-16	ROC-n-reroll: How verifier imperfection affects test-time scaling	Florian E. Dorner et.al.	2507.12399	null
2025-07-16	Assessing the Value of Visual Input: A Benchmark of Multimodal Large Language Models for Robotic Path Planning	Jacinto Colan et.al.	2507.12391	null
2025-07-16	Probing for Arithmetic Errors in Language Models	Yucheng Sun et.al.	2507.12379	null
2025-07-16	Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker	Rachna Saxena et.al.	2507.12378	null
2025-07-16	Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics	Meysam Alizadeh et.al.	2507.12372	null
2025-07-16	Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate	Ana Davila et.al.	2507.12370	null
2025-07-16	GitChameleon: Evaluating AI Code Generation Against Python Library Version Incompatibilities	Diganta Misra et.al.	2507.12367	null
2025-07-16	Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models	Samuel Lavoie et.al.	2507.12318	null
2025-07-16	Thought Purity: Defense Paradigm For Chain-of-Thought Attack	Zihao Xue et.al.	2507.12314	null
2025-07-16	Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization	Prashanth Vijayaraghavan et.al.	2507.12308	null
2025-07-15	Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation	Zhen Xu et.al.	2507.11540	null
2025-07-15	Streaming 4D Visual Geometry Transformer	Dong Zhuo et.al.	2507.11539	null
2025-07-15	DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering	Yinsheng Li et.al.	2507.11527	null
2025-07-15	LLM-based ambiguity detection in natural language instructions for collaborative surgical robots	Ana Davila et.al.	2507.11525	null
2025-07-15	AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air	Shiyi Yang et.al.	2507.11515	null
2025-07-15	LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer	Yaoxian Dong et.al.	2507.11457	null
2025-07-16	Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize?	Yanjian Zhang et.al.	2507.11423	null
2025-07-15	Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations	Miray Özcan et.al.	2507.11417	null
2025-07-15	Seq vs Seq: An Open Suite of Paired Encoders and Decoders	Orion Weller et.al.	2507.11412	null
2025-07-15	KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?	Soumadeep Saha et.al.	2507.11408	null
2025-07-15	EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes	LG AI Research et.al.	2507.11407	null
2025-07-15	DCR: Quantifying Data Contamination in LLMs Evaluation	Cheng Xu et.al.	2507.11405	null
2025-07-15	Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs	Gabriel Bo et.al.	2507.11371	null
2025-07-15	From Chaos to Automation: Enabling the Use of Unstructured Data for Robotic Process Automation	Kelly Kurowski et.al.	2507.11364	null
2025-07-15	What is the Best Process Model Representation? A Comparative Analysis for Process Modeling with Large Language Models	Alexis Brissard et.al.	2507.11356	null
2025-07-15	Foundation Models for Logistics: Toward Certifiable, Conversational Planning Interfaces	Yunhao Yang et.al.	2507.11352	null
2025-07-15	RefModel: Detecting Refactorings using Foundation Models	Pedro Simões et.al.	2507.11346	null
2025-07-15	Guiding LLM Decision-Making with Fairness Reward Models	Zara Hall et.al.	2507.11344	null
2025-07-15	MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network	Jianfei Jiang et.al.	2507.11333	null
2025-07-16	Automated Novelty Evaluation of Academic Paper: A Collaborative Approach Integrating Human and Large Language Model Knowledge	Wenqing Wu et.al.	2507.11330	null
2025-07-14	EmbRACE-3K: Embodied Reasoning and Action in Complex Environments	Mingxian Lin et.al.	2507.10548	null
2025-07-14	Fusing LLM Capabilities with Routing Data	Tao Feng et.al.	2507.10540	null
2025-07-14	Graph World Model	Tao Feng et.al.	2507.10539	null
2025-07-14	CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks	Hongchao Jiang et.al.	2507.10535	null
2025-07-14	Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination	Mingqi Wu et.al.	2507.10532	null
2025-07-14	Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation	Sangmin Bae et.al.	2507.10524	null
2025-07-14	Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI	Jiangkai Wu et.al.	2507.10510	null
2025-07-14	Scene-Aware Conversational ADAS with Generative AI for Real-Time Driver Assistance	Kyungtae Han et.al.	2507.10500	null
2025-07-14	Can You Detect the Difference?	İsmail Tarım et.al.	2507.10475	null
2025-07-14	MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking	Mohamed T. Younes et.al.	2507.10472	null
2025-07-14	An Empirical Evaluation of AI-Powered Non-Player Characters’ Perceived Realism and Performance in Virtual Reality Environments	Mikko Korkiakoski et.al.	2507.10469	null
2025-07-14	Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems	Hammad Atta et.al.	2507.10457	null
2025-07-14	CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding	Hongyong Han et.al.	2507.10449	null
2025-07-15	Text-Visual Semantic Constrained AI-Generated Image Quality Assessment	Qiang Li et.al.	2507.10432	null
2025-07-14	Towards Emotion Co-regulation with LLM-powered Socially Assistive Robots: Integrating LLM Prompts and Robotic Behaviors to Support Parent-Neurodivergent Child Dyads	Jing Li et.al.	2507.10427	null
2025-07-14	Multiple Choice Learning of Low Rank Adapters for Language Modeling	Victor Letzelter et.al.	2507.10419	null
2025-07-14	Zorse: Optimizing LLM Training Efficiency on Heterogeneous GPU Clusters	Runsheng Benson Guo et.al.	2507.10392	null
2025-07-14	Extracting Important Tokens in E-Commerce Queries with a Tag Interaction-Aware Transformer Model	Md. Ahsanul Kabir et.al.	2507.10385	null
2025-07-14	Test-Time Canonicalization by Foundation Models for Robust Perception	Utkarsh Singhal et.al.	2507.10375	null
2025-07-14	Beyond Graph Model: Reliable VLM Fine-Tuning via Random Graph Adapter	Bo Jiang et.al.	2507.10355	null
2025-07-11	The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?	Denis Sutter et.al.	2507.08802	null
2025-07-11	Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective	Hangjie Yuan et.al.	2507.08801	null
2025-07-11	KV Cache Steering for Inducing Reasoning in Small Language Models	Max Belitsky et.al.	2507.08799	null
2025-07-11	One Token to Fool LLM-as-a-Judge	Yulai Zhao et.al.	2507.08794	null
2025-07-11	From One to More: Contextual Part Latents for 3D Generation	Shaocong Dong et.al.	2507.08772	null
2025-07-11	BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity	Chenyang Song et.al.	2507.08771	null
2025-07-11	EqualMotion: Accessible Motion Capture for the Creative Industries	Clarice Hilton et.al.	2507.08744	null
2025-07-11	Multilingual Multimodal Software Developer for Code Generation	Linzheng Chai et.al.	2507.08719	null
2025-07-11	Unreal is all you need: Multimodal ISAC Data Simulation with Only One Engine	Kongwu Huang et.al.	2507.08716	null
2025-07-11	KG-Attention: Knowledge Graph-Guided Attention at Test-Time via Bidirectional Information Aggregation	Songlin Zhai et.al.	2507.08704	null
2025-07-11	ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way	Rajarshi Roy et.al.	2507.08679	null
2025-07-11	LLMCup: Ranking-Enhanced Comment Updating with LLMs	Hua Ge et.al.	2507.08671	null
2025-07-11	KELPS: A Framework for Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment	Jiyao Zhang et.al.	2507.08665	null
2025-07-11	Introspection of Thought Helps AI Agents	Haoran Sun et.al.	2507.08664	null
2025-07-11	Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning	Xingguang Ji et.al.	2507.08649	null
2025-07-11	DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images	Haoran Sun et.al.	2507.08648	null
2025-07-11	NL in the Middle: Code Translation with LLMs and Intermediate Representations	Chi-en Amy Tai et.al.	2507.08627	null
2025-07-11	Adaptive Framework for Ambient Intelligence in Rehabilitation Assistance	Gábor Baranyi et.al.	2507.08624	null
2025-07-11	A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1	Marcin Pietroń et.al.	2507.08621	null
2025-07-11	Agentic Large Language Models for Conceptual Systems Engineering and Design	Soheyl Massoudi et.al.	2507.08619	null
2025-07-10	Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs	Ziyue Li et.al.	2507.07996	null
2025-07-10	Multigranular Evaluation for Brain Visual Decoding	Weihao Xia et.al.	2507.07993	null
2025-07-10	Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs	Jeongseok Hyun et.al.	2507.07990	null
2025-07-10	Automating Expert-Level Medical Reasoning Evaluation of Large Language Models	Shuang Zhou et.al.	2507.07988	null
2025-07-10	CLIP Won’t Learn Object-Attribute Binding from Natural Data and Here is Why	Bijay Gurung et.al.	2507.07985	null
2025-07-10	OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding	JingLi Lin et.al.	2507.07984	null
2025-07-10	Performance and Practical Considerations of Large and Small Language Models in Clinical Decision Support in Rheumatology	Sabine Felde et.al.	2507.07983	null
2025-07-10	Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling	Haoyu Wu et.al.	2507.07982	null
2025-07-10	Why is Your Language Model a Poor Implicit Reward Model?	Noam Razin et.al.	2507.07981	null
2025-07-10	Defending Against Prompt Injection With a Few DefensiveTokens	Sizhe Chen et.al.	2507.07974	null
2025-07-10	Scaling RL to Long Videos	Yukang Chen et.al.	2507.07966	null
2025-07-10	MIRIX: Multi-Agent Memory System for LLM-Based Agents	Yu Wang et.al.	2507.07957	null
2025-07-10	Dynamic Chunking for End-to-End Hierarchical Sequence Modeling	Sukjun Hwang et.al.	2507.07955	null
2025-07-10	Input Conditioned Layer Dropping in Speech Foundation Models	Abdul Hannan et.al.	2507.07954	null
2025-07-10	SAGE: A Visual Language Model for Anomaly Detection via Fact Enhancement and Entropy-aware Alignment	Guoxin Zang et.al.	2507.07939	null
2025-07-10	Can Large Language Models Improve Phishing Defense? A Large-Scale Controlled Experiment on Warning Dialogue Explanations	Federico Maria Cau et.al.	2507.07916	null
2025-07-10	MIRA: A Novel Framework for Fusing Modalities in Medical RAG	Jinhong Wang et.al.	2507.07902	null
2025-07-10	An Integrated Framework of Prompt Engineering and Multidimensional Knowledge Graphs for Legal Dispute Analysis	Mingda Zhang et.al.	2507.07893	null
2025-07-10	Automating MD simulations for Proteins using Large language Models: NAMD-Agent	Achuth Chandrasekhar et.al.	2507.07887	null
2025-07-10	Opting Out of Generative AI: a Behavioral Experiment on the Role of Education in Perplexity AI Avoidance	Roberto Ulloa et.al.	2507.07881	null
2025-07-09	Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor	Vatsal Agarwal et.al.	2507.07106	null
2025-07-09	4KAgent: Agentic Any Image to 4K Super-Resolution	Yushen Zuo et.al.	2507.07105	null
2025-07-09	Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models	Tiezheng Zhang et.al.	2507.07104	null
2025-07-09	Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful	Martin Marek et.al.	2507.07101	null
2025-07-09	Evaluating Attribute Confusion in Fashion Text-to-Image Generation	Ziyue Liu et.al.	2507.07079	null
2025-07-09	5C Prompt Contracts: A Minimalist, Creative-Friendly, Token-Efficient Design Framework for Individual and SME LLM Usage	Ugur Ari et.al.	2507.07045	null
2025-07-09	UniConv: Unifying Retrieval and Response Generation for Large Language Models in Conversations	Fengran Mo et.al.	2507.07030	null
2025-07-09	FlexOlmo: Open Language Models for Flexible Data Use	Weijia Shi et.al.	2507.07024	null
2025-07-09	First Return, Entropy-Eliciting Explore	Tianyu Zheng et.al.	2507.07017	null
2025-07-09	Integrating Pathology Foundation Models and Spatial Transcriptomics for Cellular Decomposition from Histology Images	Yutong Sun et.al.	2507.07013	null
2025-07-09	GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning	S M Taslim Uddin Raju et.al.	2507.07006	null
2025-07-09	Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs	Yahan Yu et.al.	2507.06999	null
2025-07-09	MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation	Qilong Xing et.al.	2507.06992	null
2025-07-09	Are They All Good? Evaluating the Quality of CoTs in LLM-based Code Generation	Binquan Zhang et.al.	2507.06980	null
2025-07-09	Free on the Fly: Enhancing Flexibility in Test-Time Adaptation with Online EM	Qiyuan Dai et.al.	2507.06973	null
2025-07-09	Scaling Towards the Information Boundary of Instruction Set: InfinityInstruct-Subject Technical Report	Li Du et.al.	2507.06968	null
2025-07-09	CheXPO: Preference Optimization for Chest X-ray VLMs with Counterfactual Rationale	Xiao Liang et.al.	2507.06959	null
2025-07-09	Investigating the Robustness of Retrieval-Augmented Generation at the Query Level	Sezen Perçin et.al.	2507.06956	null
2025-07-10	What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models	Keyon Vafa et.al.	2507.06952	null
2025-07-10	Rethinking Verification for LLM Code Generation: From Generation to Testing	Zihan Ma et.al.	2507.06920	null
2025-07-08	RSRefSeg 2: Decoupling Referring Remote Sensing Image Segmentation with Foundation Models	Keyan Chen et.al.	2507.06231	null
2025-07-08	Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers	Zhiyuan Peng et.al.	2507.06223	null
2025-07-08	Aligned Textual Scoring Rules	Yuxuan Lu et.al.	2507.06221	null
2025-07-08	Is Diversity All You Need for Scalable Robotic Manipulation?	Modi Shi et.al.	2507.06219	null
2025-07-08	CultureCLIP: Empowering CLIP with Cultural Awareness through Synthetic Images and Contextualized Captions	Yuchen Huang et.al.	2507.06210	null
2025-07-08	Ontological differentiation as a measure of semantic accuracy	Pablo Garcia-Cuadrillero et.al.	2507.06208	null
2025-07-08	Differential Mamba	Nadav Schneider et.al.	2507.06204	null
2025-07-08	A Survey on Latent Reasoning	Rui-Jie Zhu et.al.	2507.06203	null
2025-07-08	UQLM: A Python Package for Uncertainty Quantification in Large Language Models	Dylan Bouchard et.al.	2507.06196	null
2025-07-08	SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads	Jiale Lao et.al.	2507.06192	null
2025-07-08	The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains	Scott Geng et.al.	2507.06187	null
2025-07-08	Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review	Zhicheng Lin et.al.	2507.06185	null
2025-07-08	Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling	Prahitha Movva et.al.	2507.06183	null
2025-07-08	Data-Semantics-Aware Recommendation of Diverse Pivot Tables	Whanhee Cho et.al.	2507.06171	null
2025-07-09	Skywork-R1V3 Technical Report	Wei Shen et.al.	2507.06167	null
2025-07-08	Evaluation of Habitat Robotics using Large Language Models	William Li et.al.	2507.06157	null
2025-07-08	Large Language Models Predict Human Well-being – But Not Equally Everywhere	Pat Pataranutaporn et.al.	2507.06141	null
2025-07-08	LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models	Zhihao Chen et.al.	2507.06140	null
2025-07-08	Coding Triangle: How Does Large Language Model Understand Code?	Taolin Zhang et.al.	2507.06138	null
2025-07-08	PrefixAgent: An LLM-Powered Design Framework for Efficient Prefix Adder Optimization	Dongsheng Zuo et.al.	2507.06127	null
2025-07-07	Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing	Chun-Hsiao Yeh et.al.	2507.05259	null
2025-07-07	Spatio-Temporal LLM: Reasoning about Environments and Actions	Haozhen Zheng et.al.	2507.05258	null
2025-07-07	Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions	Yuanzhe Hu et.al.	2507.05257	null
2025-07-07	Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning	Yana Wei et.al.	2507.05255	null
2025-07-07	Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models	Ziqi Miao et.al.	2507.05248	null
2025-07-07	When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors	Scott Emmons et.al.	2507.05246	null
2025-07-07	StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling	Meng Wei et.al.	2507.05240	null
2025-07-07	Logit Reweighting for Topic-Focused Summarization	Joschka Braun et.al.	2507.05235	null
2025-07-07	NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving	Qucheng Peng et.al.	2507.05227	null
2025-07-07	QuEst: Enhancing Estimates of Quantile-Based Distributional Measures Using Model Predictions	Zhun Deng et.al.	2507.05220	null
2025-07-07	All in One: Visual-Description-Guided Unified Point Cloud Segmentation	Zongyan Han et.al.	2507.05211	null
2025-07-07	MedGemma Technical Report	Andrew Sellergren et.al.	2507.05201	null
2025-07-07	Train-before-Test Harmonizes Language Model Rankings	Guanhua Zhang et.al.	2507.05195	null
2025-07-07	CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale	Jonathan Hyun et.al.	2507.05178	null
2025-07-08	OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model	Chen Wang et.al.	2507.05177	null
2025-07-07	Differential Attention for Multimodal Crisis Event Analysis	Nusrat Munia et.al.	2507.05165	null
2025-07-07	InfoSteer: Steering Information Utility in Language Model Post-Training	Chunyuan Deng et.al.	2507.05158	null
2025-07-07	AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models	Chinnappa Guggilla et.al.	2507.05157	null
2025-07-07	Interpretable Mnemonic Generation for Kanji Learning via Expectation-Maximization	Jaewook Lee et.al.	2507.05137	null
2025-07-07	LERa: Replanning with Visual Feedback in Instruction Following	Svyatoslav Pchelintsev et.al.	2507.05135	null
2025-07-03	Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation	Jiaer Xia et.al.	2507.02859	null
2025-07-03	Requirements Elicitation Follow-Up Question Generation	Yuchen Shen et.al.	2507.02858	null
2025-07-03	Answer Matching Outperforms Multiple Choice for Language Model Evaluation	Nikhil Chandak et.al.	2507.02856	null
2025-07-03	MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs	Purbesh Mitra et.al.	2507.02851	null
2025-07-03	LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users	Almog Hilel et.al.	2507.02850	null
2025-07-03	Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection	Ziqi Miao et.al.	2507.02844	null
2025-07-03	LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding	Yuchen Ma et.al.	2507.02843	null
2025-07-03	StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason	Kaiyi Zhang et.al.	2507.02841	null
2025-07-03	ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning	Ruiyang Zhou et.al.	2507.02834	null
2025-07-03	Generalizing Verifiable Instruction Following	Valentina Pyatkin et.al.	2507.02833	null
2025-07-03	SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model	Wencheng Zhang et.al.	2507.02822	null
2025-07-03	Multimodal Mathematical Reasoning with Diverse Solving Perspective	Wenhao Shi et.al.	2507.02804	null
2025-07-03	Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models	Riccardo Cantini et.al.	2507.02799	null
2025-07-03	No time to train! Training-Free Reference-Based Instance Segmentation	Miguel Espinosa et.al.	2507.02798	null
2025-07-03	From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding	Xiangfeng Wang et.al.	2507.02790	null
2025-07-03	Moral Responsibility or Obedience: What Do We Want from AI?	Joseph Boland et.al.	2507.02788	null
2025-07-03	Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs	Ken Tsui et.al.	2507.02778	null
2025-07-03	KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs	Yuzhang Xie et.al.	2507.02773	null
2025-07-03	DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment	Ke-Han Lu et.al.	2507.02768	null
2025-07-03	Knowledge Protocol Engineering: A New Paradigm for AI in Domain-Specific Knowledge Work	Guangwei Zhang et.al.	2507.02760	null
2025-07-02	How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks	Rahul Ramachandran et.al.	2507.01955	null
2025-07-02	Kwai Keye-VL Technical Report	Kwai Keye Team et.al.	2507.01949	null
2025-07-02	SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars	Xiaosheng Zhao et.al.	2507.01939	null
2025-07-02	The Thin Line Between Comprehension and Persuasion in LLMs	Adrian de Wynter et.al.	2507.01936	null
2025-07-03	Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations	Wenhao Wang et.al.	2507.01930	null
2025-07-02	A Survey on Vision-Language-Action Models: An Action Tokenization Perspective	Yifan Zhong et.al.	2507.01925	null
2025-07-03	Decision-Oriented Text Evaluation	Yu-Shiang Huang et.al.	2507.01923	null
2025-07-02	Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models	Chengao Li et.al.	2507.01915	null
2025-07-02	Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning	Qingdong He et.al.	2507.01908	null
2025-07-02	AI4Research: A Survey of Artificial Intelligence for Scientific Research	Qiguang Chen et.al.	2507.01903	null
2025-07-02	High-Layer Attention Pruning with Rescaling	Songtao Liu et.al.	2507.01900	null
2025-07-02	MiCoTA: Bridging the Learnability Gap with Intermediate CoT and Teacher Assistants	Dongyi Ding et.al.	2507.01887	null
2025-07-02	A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs	Niccolò McConnell et.al.	2507.01881	null
2025-07-02	Towards Foundation Auto-Encoders for Time-Series Anomaly Detection	Gastón García González et.al.	2507.01875	null
2025-07-02	DIY-MKG: An LLM-Based Polyglot Language Learning System	Kenan Tang et.al.	2507.01872	null
2025-07-02	Bridging UI Design and chatbot Interactions: Applying Form-Based Principles to Conversational Agents	Sanjay Krishna Anbalagan et.al.	2507.01862	null
2025-07-02	TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types	Yuhao Lin et.al.	2507.01857	null
2025-07-02	Eka-Eval: A Comprehensive Evaluation Framework for Large Language Models in Indian Languages	Samridhi Raj Sinha et.al.	2507.01853	null
2025-07-02	Low-Perplexity LLM-Generated Sequences and Where To Find Them	Arthur Wuhrmann et.al.	2507.01844	null
2025-07-02	MoIRA: Modular Instruction Routing Architecture for Multi-Task Robotics	Dmytro Kuzmenko et.al.	2507.01843	null
2025-07-01	Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives	Sixun Dong et.al.	2506.24124	null
2025-06-30	Calligrapher: Freestyle Text Image Customization	Yue Ma et.al.	2506.24123	null
2025-06-30	Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime	Yuqing Wang et.al.	2506.24120	null
2025-07-01	SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning	Bo Liu et.al.	2506.24119	null
2025-07-01	Intertextual Parallel Detection in Biblical Hebrew: A Transformer-Based Benchmark	David M. Smiley et.al.	2506.24117	null
2025-06-30	On the Predictive Power of Representation Dispersion in Language Models	Yanhong Li et.al.	2506.24106	null
2025-06-30	DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World	Xiangtai Li et.al.	2506.24102	null
2025-06-30	MotionGPT3: Human Motion as a Second Modality	Bingfan Zhu et.al.	2506.24086	null
2025-06-30	Logit-Gap Steering: Efficient Short-Suffix Jailbreaks for Aligned Large Language Models	Tung-Ling Li et.al.	2506.24056	null
2025-06-30	Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC	Xinming Wei et.al.	2506.24045	null
2025-06-30	A Survey on Vision-Language-Action Models for Autonomous Driving	Sicong Jiang et.al.	2506.24044	null
2025-06-30	Foundation Models for Zero-Shot Segmentation of Scientific Images without AI-Ready Data	Shubhabrata Mukherjee et.al.	2506.24039	null
2025-06-30	Ella: Embodied Social Agents with Lifelong Memory	Hongxin Zhang et.al.	2506.24019	null
2025-06-30	EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations	Hyunjong Kim et.al.	2506.24016	null
2025-06-30	Large Language Models Don’t Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective	Anselm R. Strohmaier et.al.	2506.24006	null
2025-06-30	The Illusion of Progress? A Critical Look at Test-Time Adaptation for Vision-Language Models	Lijun Sheng et.al.	2506.24000	null
2025-06-30	Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning	Seungjun Yi et.al.	2506.23998	null
2025-06-30	StyleDrive: Towards Driving-Style Aware Benchmarking of End-To-End Autonomous Driving	Ruiyang Hao et.al.	2506.23982	null
2025-06-30	TaP: A Taxonomy-Guided Framework for Automated and Scalable Preference Data Generation	Renren Jin et.al.	2506.23979	null
2025-06-30	Visual and Memory Dual Adapter for Multi-Modal Object Tracking	Boyue Xu et.al.	2506.23972	null
2025-06-27	MiCo: Multi-image Contrast for Reinforcement Visual Reasoning	Xi Chen et.al.	2506.22434	null
2025-06-27	The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements	Bingchen Zhao et.al.	2506.22419	null
2025-06-27	Sequential Diagnosis with Language Models	Harsha Nori et.al.	2506.22405	null
2025-06-27	HyperCLOVA X THINK Technical Report	NAVER Cloud HyperCLOVA X Team et.al.	2506.22403	null
2025-06-27	Refining Czech GEC: Insights from a Multi-Experiment Approach	Petr Pechman et.al.	2506.22402	null
2025-06-27	QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization	Danush Khanna et.al.	2506.22396	null
2025-06-27	Test-Time Consistency in Vision Language Models	Shih-Han Chou et.al.	2506.22395	null
2025-06-27	What Makes ChatGPT Effective for Software Issue Resolution? An Empirical Study of Developer-ChatGPT Conversations in GitHub	Ramtin Ehsani et.al.	2506.22390	null
2025-06-27	Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment	Yue Zhang et.al.	2506.22385	null
2025-06-27	Probabilistic Optimality for Inference-time Scaling	Youkang Wang et.al.	2506.22376	null
2025-06-27	Exploiting Vision Language Model for Training-Free 3D Point Cloud OOD Detection via Graph Score Propagation	Tiankai Chen et.al.	2506.22375	null
2025-06-27	Towards Fair Rankings: Leveraging LLMs for Gender Bias Detection and Measurement	Maryam Mousavian et.al.	2506.22372	null
2025-06-27	Can Large Language Models Help Students Prove Software Correctness? An Experimental Study with Dafny	Carolina Carreira et.al.	2506.22370	null
2025-06-27	DiffSoundStream: Efficient Speech Tokenization via Diffusion Decoding	Yang Yang et.al.	2506.22362	null
2025-06-27	Concept-Level AI for Telecom: Moving Beyond Large Language Models	Viswanath Kumarskandpriya et.al.	2506.22359	null
2025-06-27	Optimal Estimation of Watermark Proportions in Hybrid AI-Human Texts	Xiang Li et.al.	2506.22343	null
2025-06-27	Evaluating Scoring Bias in LLM-as-a-Judge	Qingquan Li et.al.	2506.22316	null
2025-06-27	Detection of Personal Data in Structured Datasets Using a Large Language Model	Albert Agisha Ntwali et.al.	2506.22305	null
2025-06-27	Rethinking Visual Token Reduction in LVLMs under Cross-modal Misalignment	Rui Xu et.al.	2506.22283	null
2025-06-27	COOCO – Common Objects Out-of-Context – Semantic Violation in Scenes: Investigating Multimodal Context in Referential Communication	Filippo Merlo et.al.	2506.22274	null
2025-06-26	Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test	Ziyue Li et.al.	2506.21551	null
2025-06-26	mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale	Xiaona Zhou et.al.	2506.21550	null
2025-06-26	SAM4D: Segment Anything in Camera and LiDAR Streams	Jianyun Xu et.al.	2506.21547	null
2025-06-26	Data Efficacy for Language Model Training	Yalun Dai et.al.	2506.21545	null
2025-06-26	PsyLite Technical Report	Fangjun Ding et.al.	2506.21536	null
2025-06-26	Exploring the Design Space of 3D MLLMs for CT Report Generation	Mohammed Baharoon et.al.	2506.21535	null
2025-06-26	“What’s Up, Doc?”: Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets	Akshay Paruchuri et.al.	2506.21532	null
2025-06-26	Potemkin Understanding in Large Language Models	Marina Mancoridis et.al.	2506.21521	null
2025-06-26	Assessing an evolutionary search engine for small language models, prompts, and evaluation metrics	Cláudio Lúcio do Val Lopes et.al.	2506.21512	null
2025-06-26	Mitigating Hallucination of Large Vision-Language Models via Dynamic Logits Calibration	Jiahe Chen et.al.	2506.21509	null
2025-06-26	skLEP: A Slovak General Language Understanding Benchmark	Marek Šuppa et.al.	2506.21508	null
2025-06-26	Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge	Boyu Gou et.al.	2506.21506	null
2025-06-26	Bridging Offline and Online Reinforcement Learning for LLMs	Jack Lanchantin et.al.	2506.21495	null
2025-06-26	Global and Local Entailment Learning for Natural World Imagery	Srikumar Sastry et.al.	2506.21476	null
2025-06-26	TopK Language Models	Ryosuke Takahashi et.al.	2506.21468	null
2025-06-26	Efficient and Reuseable Cloud Configuration Search Using Discovery Spaces	Michael Johnston et.al.	2506.21467	null
2025-06-26	Aligning Spoken Dialogue Models from User Interactions	Anne Wu et.al.	2506.21463	null
2025-06-26	Spatial Mental Modeling from Limited Views	Baiqiao Yin et.al.	2506.21458	null
2025-06-26	ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing	Huadai Liu et.al.	2506.21448	null
2025-06-26	Text2Cypher Across Languages: Evaluating Foundational Models Beyond English	Makbule Gulcin Ozsoy et.al.	2506.21445	null
2025-06-25	The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind	Andrei Lupu et.al.	2506.20664	null
2025-06-25	Memento: Note-Taking for Your Future Self	Chao Wan et.al.	2506.20642	null
2025-06-25	Towards Community-Driven Agents for Machine Learning Engineering	Sijie Li et.al.	2506.20640	null
2025-06-26	DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation	Shansan Gong et.al.	2506.20639	null
2025-06-25	Shape2Animal: Creative Animal Generation from Natural Silhouettes	Quoc-Duy Tran et.al.	2506.20616	null
2025-06-25	AI Assistants to Enhance and Exploit the PETSc Knowledge Base	Barry Smith et.al.	2506.20608	null
2025-06-25	Model Editing as a Double-Edged Sword: Steering Agent Ethical Behavior Toward Beneficence or Harm	Baixiang Huang et.al.	2506.20606	null
2025-06-25	Video Perception Models for 3D Scene Synthesis	Rui Huang et.al.	2506.20601	null
2025-06-25	HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction	Zhonghao Shi et.al.	2506.20566	null
2025-06-25	Large Language Model-Driven Code Compliance Checking in Building Information Modeling	Soumya Madireddy et.al.	2506.20551	null
2025-06-25	When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs	Ammar Khairi et.al.	2506.20544	null
2025-06-25	WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads	Hongzhen Huang et.al.	2506.20535	null
2025-06-25	Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios	Wenbin Gan et.al.	2506.20531	null
2025-06-25	Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards	Charles Arnal et.al.	2506.20520	null
2025-06-25	OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling	Zengzhi Wang et.al.	2506.20512	null
2025-06-25	BotHash: Efficient and Training-Free Bot Detection Through Approximate Nearest Neighbor	Edoardo Di Paolo et.al.	2506.20503	null
2025-06-25	ReCode: Updating Code API Knowledge with Reinforcement Learning	Haoze Wu et.al.	2506.20495	null
2025-06-25	Brains and language models converge on a shared conceptual space across different languages	Zaid Zada et.al.	2506.20489	null
2025-06-25	Behavior Foundation Model: Towards Next-Generation Whole-Body Control System of Humanoid Robots	Mingqi Yuan et.al.	2506.20487	null
2025-06-25	Counterfactual Influence as a Distributional Quantity	Matthieu Meeus et.al.	2506.20481	null
2025-06-24	Unified Vision-Language-Action Model	Yuqi Wang et.al.	2506.19850	null
2025-06-24	Orthogonal Finetuning Made Scalable	Zeju Qiu et.al.	2506.19847	null
2025-06-24	JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning	Ai Han et.al.	2506.19846	null
2025-06-24	MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration	Yucheng Zhou et.al.	2506.19835	null
2025-06-24	Evaluating Compliance with Visualization Guidelines in Diagrams for Scientific Publications Using Large Vision Language Models	Johannes Rückert et.al.	2506.19825	null
2025-06-24	Persona Features Control Emergent Misalignment	Miles Wang et.al.	2506.19823	null
2025-06-24	CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation	Hao Li et.al.	2506.19816	null
2025-06-24	Curating art exhibitions using machine learning	Eurico Covas et.al.	2506.19813	null
2025-06-24	KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality	Baochang Ren et.al.	2506.19807	null
2025-06-24	LLM-Based Social Simulations Require a Boundary	Zengqing Wu et.al.	2506.19806	null
2025-06-24	KnowML: Improving Generalization of ML-NIDS with Attack Knowledge Graphs	Xin Fan Guo et.al.	2506.19802	null
2025-06-24	Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study	Yuqi Zhu et.al.	2506.19794	null
2025-06-24	SAGE: Strategy-Adaptive Generation Engine for Query Rewriting	Teng Wang et.al.	2506.19783	null
2025-06-24	Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment	Yuhui Sun et.al.	2506.19780	null
2025-06-24	SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning	Yuqian Fu et.al.	2506.19767	null
2025-06-24	Arabic Dialect Classification using RNNs, Transformers, and Large Language Models: A Comparative Analysis	Omar A. Essameldin et.al.	2506.19753	null
2025-06-24	Breaking Barriers: Do Reinforcement Post Training Gains Transfer To Unseen Domains?	Chuxuan Hu et.al.	2506.19733	null
2025-06-24	LLM-Driven Medical Document Analysis: Enhancing Trustworthy Pathology and Differential Diagnosis	Lei Kang et.al.	2506.19702	null
2025-06-24	Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models	Jungwoo Park et.al.	2506.19697	null
2025-06-24	UltraAD: Fine-Grained Ultrasound Anomaly Classification via Few-Shot CLIP Adaptation	Yue Zhou et.al.	2506.19694	null
2025-06-23	Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations	Jiaming Han et.al.	2506.18898	null
2025-06-23	ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs	Jiaru Zou et.al.	2506.18896	null
2025-06-23	Steering Conceptual Bias via Transformer Latent-Subspace Activation	Vansh Sharma et.al.	2506.18887	null
2025-06-23	Universal Video Temporal Grounding with Generative Multi-modal Large Language Models	Zeqian Li et.al.	2506.18883	null
2025-06-23	OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization	Yiyou Sun et.al.	2506.18880	null
2025-06-23	CommVQ: Commutative Vector Quantization for KV Cache Compression	Junyan Li et.al.	2506.18879	null
2025-06-23	OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation	Qijun Gan et.al.	2506.18866	null
2025-06-23	TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting	Zhongbin Guo et.al.	2506.18862	null
2025-06-23	LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning	Yuhao Wu et.al.	2506.18841	null
2025-06-23	STU-PID: Steering Token Usage via PID Controller for Efficient Large Language Model Reasoning	Aryasomayajula Ram Bharadwaj et.al.	2506.18831	null
2025-06-23	Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories	Islem Bouzenia et.al.	2506.18824	null
2025-06-23	RWESummary: A Framework and Test for Choosing Large Language Models to Summarize Real-World Evidence (RWE) Studies	Arjun Mukerji et.al.	2506.18819	null
2025-06-23	Context-Aware CodeLLM Eviction for AI-assisted Coding	Kishanthan Thangarajah et.al.	2506.18796	null
2025-06-23	TRIZ Agents: A Multi-Agent LLM Approach for TRIZ-Based Innovation	Kamil Szczepanik et.al.	2506.18783	null
2025-06-23	Existing LLMs Are Not Self-Consistent For Simple Tasks	Zhenru Lin et.al.	2506.18781	null
2025-06-23	Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training	Jonathan Cook et.al.	2506.18777	null
2025-06-23	Towards Group Fairness with Multiple Sensitive Attributes in Federated Foundation Models	Yuning Yang et.al.	2506.18732	null
2025-06-23	PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries	Steven Kolawole et.al.	2506.18728	null
2025-06-23	Multi-modal Anchor Gated Transformer with Knowledge Distillation for Emotion Recognition in Conversation	Jie Li et.al.	2506.18716	link
2025-06-23	LLM-enhanced Interactions in Human-Robot Collaborative Drawing with Older Adults	Marianne Bossema et.al.	2506.18711	null
2025-06-20	VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning	Zhangyang Qi et.al.	2506.17221	null
2025-06-20	No Free Lunch: Rethinking Internal Feedback for LLM Reasoning	Yanzhi Zhang et.al.	2506.17219	null
2025-06-20	Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens	Zeyuan Yang et.al.	2506.17218	link
2025-06-20	BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning	Xuechen Zhang et.al.	2506.17211	null
2025-06-20	Fine-Tuning Lowers Safety and Disrupts Evaluation Consistency	Kathleen C. Fraser et.al.	2506.17209	null
2025-06-20	Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems	Matias Martinez et.al.	2506.17208	null
2025-06-20	DreamCube: 3D Panorama Generation via Multi-plane Synchronization	Yukun Huang et.al.	2506.17206	null
2025-06-20	Confidence Scoring for LLM-Generated SQL in Supply Chain Data Extraction	Jiekai Ma et.al.	2506.17203	null
2025-06-20	Detecting LLM-Generated Short Answers and Effects on Learner Performance	Shambhavi Bhushan et.al.	2506.17196	link
2025-06-20	CLEAR-3K: Assessing Causal Explanatory Capabilities in Language Models	Naiming Liu et.al.	2506.17180	null
2025-06-20	The MedPerturb Dataset: What Non-Content Perturbations Reveal About Human and Clinical LLM Decision Making	Abinitha Gourabathina et.al.	2506.17163	null
2025-06-20	Analyzing PDFs like Binaries: Adversarially Robust PDF Malware Analysis via Intermediate Representation and Language Model	Side Liu et.al.	2506.17162	null
2025-06-20	Do We Need Large VLMs for Spotting Soccer Actions?	Ritabrata Chakraborty et.al.	2506.17144	null
2025-06-20	MeDi: Metadata-Guided Diffusion Models for Mitigating Biases in Tumor Classification	David Jacob Drexlin et.al.	2506.17140	null
2025-06-20	Large Language Model Unlearning for Source Code	Xue Jiang et.al.	2506.17125	null
2025-06-20	When Can Model-Free Reinforcement Learning be Enough for Thinking?	Josiah P. Hanna et.al.	2506.17124	null
2025-06-20	Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?	Adithya Bhaskar et.al.	2506.17121	link
2025-06-20	Reassessing Code Authorship Attribution in the Era of Language Models	Atish Kumar Dipongkor et.al.	2506.17120	null
2025-06-20	Are Bias Evaluation Methods Biased?	Lina Berrayana et.al.	2506.17111	null
2025-06-20	Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving	Chuxue Cao et.al.	2506.17104	null
2025-06-18	PhantomHunter: Detecting Unseen Privately-Tuned LLM-Generated Text via Family-Aware Learning	Yuhui Shi et.al.	2506.15683	null
2025-06-18	GenRecal: Generation after Recalibration from Large to Small Vision-Language Models	Byung-Kwan Lee et.al.	2506.15681	null
2025-06-18	Dense SAE Latents Are Features, Not Bugs	Xiaoqing Sun et.al.	2506.15679	null
2025-06-18	SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence	Yao Zhang et.al.	2506.15672	null
2025-06-18	CC-LEARN: Cohort-based Consistency Learning	Xiao Ye et.al.	2506.15662	null
2025-06-18	PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection	Wenhao Li et.al.	2506.15656	null
2025-06-18	AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning	Tevin Wang et.al.	2506.15651	null
2025-06-18	Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning	Ankan Deria et.al.	2506.15649	null
2025-06-18	deepSURF: Detecting Memory Safety Vulnerabilities in Rust Through Fuzzing LLM-Augmented Harnesses	Georgios Androutsopoulos et.al.	2506.15648	null
2025-06-18	Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement	Weixiang Zhao et.al.	2506.15647	null
2025-06-18	Demystifying the Visual Quality Paradox in Multimodal Large Language Models	Shuo Xing et.al.	2506.15645	null
2025-06-18	FindingDory: A Benchmark to Evaluate Memory in Embodied Agents	Karmesh Yadav et.al.	2506.15635	null
2025-06-18	Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability	Yusuke Sakai et.al.	2506.15629	null
2025-06-18	The Effect of State Representation on LLM Agent Behavior in Dynamic Routing Games	Lyle Goodyear et.al.	2506.15624	null
2025-06-18	The Compositional Architecture of Regret in Large Language Models	Xiangxiang Cui et.al.	2506.15617	null
2025-06-18	BoxFusion: Reconstruction-Free Open-Vocabulary 3D Object Detection via Real-Time Multi-View Box Fusion	Yuqing Lan et.al.	2506.15610	null
2025-06-18	LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning	Gabrel J. Perin et.al.	2506.15606	link
2025-06-18	LiteGD: Lightweight and dynamic GPU Dispatching for Large-scale Heterogeneous Clusters	Kunming Zhang et.al.	2506.15595	null
2025-06-18	WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts	Negar Foroutan et.al.	2506.15594	link
2025-06-18	DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement	Shaoqing Lin et.al.	2506.15583	link
2025-06-17	A Variational Framework for Improving Naturalness in Generative Spoken Language Models	Li-Wei Chen et.al.	2506.14767	link
2025-06-17	ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM	Yujun Wang et.al.	2506.14766	null
2025-06-17	Scaling-Up the Pretraining of the Earth Observation Foundation Model PhilEO to the MajorTOM Dataset	Nikolaos Dionelis et.al.	2506.14765	link
2025-06-17	RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skills	Chunru Lin et.al.	2506.14763	null
2025-06-17	From Bytes to Ideas: Language Modeling with Autoregressive U-Nets	Mathurin Videau et.al.	2506.14761	link
2025-06-17	Reasoning with Exploration: An Entropy Perspective	Daixuan Cheng et.al.	2506.14758	null
2025-06-17	Large Language Models – the Future of Fundamental Physics?	Caroline Heneka et.al.	2506.14757	null
2025-06-17	Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs	Ring Team et.al.	2506.14731	null
2025-06-17	AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes	Jiahao Qiu et.al.	2506.14728	null
2025-06-17	Casper: Inferring Diverse Intents for Assistive Teleoperation with Vision Language Models	Huihan Liu et.al.	2506.14727	null
2025-06-17	Capacity Matters: a Proof-of-Concept for Transformer Memorization on Real-World Data	Anton Changalidis et.al.	2506.14704	link
2025-06-17	AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions	Aishan Liu et.al.	2506.14697	null
2025-06-17	Unified Software Engineering agent as AI Software Engineer	Leonhard Applis et.al.	2506.14683	null
2025-06-17	AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models	Ads Dawson et.al.	2506.14682	link
2025-06-17	Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality	Yuto Harada et.al.	2506.14681	null
2025-06-17	Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models	Ling Li et.al.	2506.14674	null
2025-06-17	StreetLens: Enabling Human-Centered AI Agents for Neighborhood Assessment from Street View Imagery	Jina Kim et.al.	2506.14670	null
2025-06-17	GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors	Hengyuan Zhang et.al.	2506.14646	link
2025-06-17	Passing the Turing Test in Political Discourse: Fine-Tuning LLMs to Mimic Polarized Social Media Comments	. Pazzaglia et.al.	2506.14645	null
2025-06-17	Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shot	Xiang Cheng et.al.	2506.14641	null
2025-06-16	Touch begins where vision ends: Generalizable policies for contact-rich manipulation	Zifan Zhao et.al.	2506.13762	null
2025-06-16	Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins	Chuanruo Ning et.al.	2506.13761	null
2025-06-16	Discrete Diffusion in Large Language and Multimodal Models: A Survey	Runpeng Yu et.al.	2506.13759	link
2025-06-16	AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning	Zewei Zhou et.al.	2506.13757	link
2025-06-16	Steering LLM Thinking with Budget Guidance	Junyan Li et.al.	2506.13752	link
2025-06-16	Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability	Shova Kuikel et.al.	2506.13746	link
2025-06-16	Instruction Following by Boosting Attention of Large Language Models	Vitoria Guardieiro et.al.	2506.13734	null
2025-06-16	Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs	Sayed Mohammad Vakilzadeh Hatefi et.al.	2506.13727	link
2025-06-16	Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models	Arjun Krishna et.al.	2506.13726	null
2025-06-16	OTFusion: Bridging Vision-only and Vision-Language Models via Optimal Transport for Transductive Zero-Shot Learning	Qiyu Xu et.al.	2506.13723	null
2025-06-16	TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning	Junru Zhang et.al.	2506.13705	link
2025-06-16	Value-Free Policy Optimization via Reward Partitioning	Bilal Faye et.al.	2506.13702	link
2025-06-16	Balancing Knowledge Delivery and Emotional Comfort in Healthcare Conversational Systems	Shang-Chi Tsai et.al.	2506.13692	null
2025-06-16	What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers	Pulkit Gopalani et.al.	2506.13688	link
2025-06-16	An LLM’s Apology: Outsourcing Awkwardness in the Age of AI	Twm Stone et.al.	2506.13685	link
2025-06-16	Turning Down the Heat: A Critical Analysis of Min-p Sampling in Language Models	Rylan Schaeffer et.al.	2506.13681	null
2025-06-16	ROSA: Harnessing Robot States for Vision-Language and Action Alignment	Yuqing Wen et.al.	2506.13679	null
2025-06-16	Prefix-Tuning+: Modernizing Prefix-Tuning through Attention Independent Prefix Data	Haonan Wang et.al.	2506.13674	null
2025-06-16	We Should Identify and Mitigate Third-Party Safety Risks in MCP-Powered Agent Systems	Junfeng Fang et.al.	2506.13666	link
2025-06-16	DesignCoder: Hierarchy-Aware and Self-Correcting UI Code Generation with Large Language Models	Yunnong Chen et.al.	2506.13663	null
2025-06-13	EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction	Hsi-Che Lin et.al.	2506.12015	null
2025-06-13	code_transformed: The Influence of Large Language Models on Code	Yuliang Xu et.al.	2506.12014	null
2025-06-13	Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making	Xiaopeng Yuan et.al.	2506.12012	null
2025-06-13	Affogato: Learning Open-Vocabulary Affordance Grounding with Automated Data Generation at Scale	Junha Lee et.al.	2506.12009	null
2025-06-13	Generative Representational Learning of Foundation Models for Recommendation	Zheli Zhou et.al.	2506.11999	null
2025-06-13	pLSTM: parallelizable Linear Source Transition Mark networks	Korbinian Pöppel et.al.	2506.11997	null
2025-06-13	VGR: Visual Grounded Reasoning	Jiacong Wang et.al.	2506.11991	null
2025-06-13	How Visual Representations Map to Language Feature Space in Multimodal LLMs	Constantin Venhoff et.al.	2506.11976	null
2025-06-13	Improving Large Language Model Safety with Contrastive Representation Learning	Samuel Simko et.al.	2506.11938	link
2025-06-13	Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback	Dongwei Jiang et.al.	2506.11930	null
2025-06-13	LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?	Zihan Zheng et.al.	2506.11928	null
2025-06-13	GeistBERT: Breathing Life into German NLP	Raphael Scheible-Schmitt et.al.	2506.11903	null
2025-06-13	Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache	Xiaoran Liu et.al.	2506.11886	null
2025-06-13	Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment	Alejandro Peña et.al.	2506.11880	null
2025-06-13	A Short Survey on Formalising Software Requirements using Large Language Models	Arshad Beg et.al.	2506.11874	null
2025-06-13	Post Persona Alignment for Multi-Session Dialogue Generation	Yi-Pei Chen et.al.	2506.11857	null
2025-06-13	TrustGLM: Evaluating the Robustness of GraphLLMs Against Prompt, Text, and Structure Attacks	Qihai Zhang et.al.	2506.11844	null
2025-06-13	Your Ride, Your Rules: Psychology and Cognition Enabled Automated Driving Systems	Zhipeng Bao et.al.	2506.11842	null
2025-06-13	CLEAN-MI: A Scalable and Efficient Pipeline for Constructing High-Quality Neurodata in Motor Imagery Paradigm	Dingkun Liu et.al.	2506.11830	null
2025-06-13	Revealing Political Bias in LLMs through Structured Multi-Agent Debate	Aishwarya Bandaru et.al.	2506.11825	link
2025-06-12	AutoMind: Adaptive Knowledgeable Agent for Automated Data Science	Yixin Ou et.al.	2506.10974	link
2025-06-12	Farseer: A Refined Scaling Law in Large Language Models	Houyi Li et.al.	2506.10972	link
2025-06-12	Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs	Qizhe Zhang et.al.	2506.10967	link
2025-06-12	GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation	Ning Gao et.al.	2506.10966	null
2025-06-12	ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark	Kangwei Liu et.al.	2506.10960	link
2025-06-12	Distillation of atomistic foundation models across architectures and chemical domains	John L. A. Gardner et.al.	2506.10956	link
2025-06-12	SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks	Lianghong Guo et.al.	2506.10954	link
2025-06-12	Build the web for agents, not agents for the web	Xing Han Lù et.al.	2506.10953	null
2025-06-12	Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training	Mozhi Zhang et.al.	2506.10952	null
2025-06-12	Execution Guided Line-by-Line Code Generation	Boaz Lavon et.al.	2506.10948	link
2025-06-12	GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models	Evelyn Ma et.al.	2506.10946	null
2025-06-12	Self-Adapting Language Models	Adam Zweiger et.al.	2506.10943	null
2025-06-12	Dynamic Epistemic Friction in Dialogue	Timothy Obiso et.al.	2506.10934	null
2025-06-12	The Role of Generative AI in Facilitating Social Interactions: A Scoping Review	T. T. J. E. Arets et.al.	2506.10927	null
2025-06-12	Robustly Improving LLM Fairness in Realistic Settings via Interpretability	Adam Karvonen et.al.	2506.10922	link
2025-06-12	Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization	Or Shafran et.al.	2506.10920	link
2025-06-12	Sequential-Parallel Duality in Prefix Scannable Models	Morris Yau et.al.	2506.10918	null
2025-06-12	Foundation Models for Causal Inference via Prior-Data Fitted Networks	Yuchen Ma et.al.	2506.10914	null
2025-06-12	Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?	Fei Lin et.al.	2506.10912	null
2025-06-12	NoLoCo: No-all-reduce Low Communication Training Method for Large Models	Jari Kolehmainen et.al.	2506.10911	link
2025-06-11	Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling	Tim Z. Xiao et.al.	2506.09998	null
2025-06-11	From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring	Yang Li et.al.	2506.09996	null
2025-06-11	Large Language Models for Toxic Language Detection in Low-Resource Balkan Languages	Amel Muminovic et.al.	2506.09992	link
2025-06-11	Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation	Xinyu Yang et.al.	2506.09991	null
2025-06-11	EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits	Ron Yosef et.al.	2506.09988	null
2025-06-11	A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs	Benno Krojer et.al.	2506.09987	null
2025-06-11	V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning	Mido Assran et.al.	2506.09985	link
2025-06-11	Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs	Hiroshi Matsuda et.al.	2506.09983	link
2025-06-11	AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation	Zijie Wu et.al.	2506.09982	null
2025-06-11	SRLAgent: Enhancing Self-Regulated Learning Skills through Gamification and LLM Assistance	Wentao Ge et.al.	2506.09968	null
2025-06-11	Resa: Transparent Reasoning Models via SAEs	Shangshang Wang et.al.	2506.09967	link
2025-06-11	Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing	Junfei Wu et.al.	2506.09965	link
2025-06-11	Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy	Sushant Gautam et.al.	2506.09958	null
2025-06-11	LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge	Sahar Abdelnabi et.al.	2506.09956	link
2025-06-11	Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking	Wuwei Zhang et.al.	2506.09944	link
2025-06-11	VerIF: Verification Engineering for Reinforcement Learning in Instruction Following	Hao Peng et.al.	2506.09942	link
2025-06-11	From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models	Irving Fang et.al.	2506.09930	null
2025-06-11	PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants	Zheng Zhao et.al.	2506.09902	link
2025-06-11	The Emergence of Abstract Thought in Large Language Models Beyond Any Language	Yuxin Chen et.al.	2506.09890	null
2025-06-11	Attention Head Embeddings with Trainable Deep Kernels for Hallucination Detection in LLMs	Rodion Oblovatny et.al.	2506.09886	null
2025-06-10	VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning	Li Kang et.al.	2506.09049	null
2025-06-10	Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs	Yaniv Nikankin et.al.	2506.09047	link
2025-06-10	Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation	Xiaowen Ma et.al.	2506.09046	null
2025-06-10	Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models	Xuanchi Ren et.al.	2506.09042	link
2025-06-10	Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better	Dianyi Wang et.al.	2506.09040	link
2025-06-10	AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions	Polina Kirichenko et.al.	2506.09038	link
2025-06-10	FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed	Sizhe Dang et.al.	2506.09034	null
2025-06-10	Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning	Haozhen Zhang et.al.	2506.09033	link
2025-06-10	Do MIL Models Transfer?	Daniel Shao et.al.	2506.09022	link
2025-06-10	SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning	Ruiqi Zhang et.al.	2506.09016	link
2025-06-10	Learning to Reason Across Parallel Samples for LLM Reasoning	Jianing Qi et.al.	2506.09014	null
2025-06-10	Boosting Rust Unit Test Coverage through Hybrid Program Analysis and Large Language Models	Bei Chu et.al.	2506.09002	null
2025-06-10	Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models	Chenyu Lian et.al.	2506.08990	link
2025-06-10	SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning	Xiao Liang et.al.	2506.08989	link
2025-06-10	On Finetuning Tabular Foundation Models	Ivan Rubachev et.al.	2506.08982	link
2025-06-10	AdaDec: Uncertainty-Guided Adaptive Decoding for LLM-based Code Generation	Kaifeng He et.al.	2506.08980	null
2025-06-10	Propositional Logic for Probing Generalization in Neural Networks	Anna Langedijk et.al.	2506.08978	null
2025-06-10	Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System	Yuan Guo et.al.	2506.08972	null
2025-06-10	ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations	Amirreza Rouhi et.al.	2506.08968	null
2025-06-10	Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model	Ailin Huang et.al.	2506.08967	null
2025-06-09	GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior	Penghao Wu et.al.	2506.08012	null
2025-06-09	Play to Generalize: Learning to Reason Through Game Play	Yunfei Xie et.al.	2506.08011	link
2025-06-09	Vision Transformers Don’t Need Trained Registers	Nick Jiang et.al.	2506.08010	link
2025-06-09	Hidden in plain sight: VLMs overlook their visual representations	Stephanie Fu et.al.	2506.08008	null
2025-06-09	Reinforcement Pre-Training	Qingxiu Dong et.al.	2506.08007	null
2025-06-09	Reparameterized LLM Training via Orthogonal Equivalence Transformation	Zeju Qiu et.al.	2506.08001	null
2025-06-09	Supporting Construction Worker Well-Being with a Multi-Agent Conversational AI System	Fan Yang et.al.	2506.07997	null
2025-06-09	HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization	Hongzheng Chen et.al.	2506.07972	link
2025-06-09	CyberV: Cybernetics for Test-time Scaling in Video Understanding	Jiahao Meng et.al.	2506.07971	link
2025-06-09	SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence	Ziyang Gong et.al.	2506.07966	link
2025-06-09	Reinforcing Multimodal Understanding and Generation with Dual Self-rewards	Jixiang Hong et.al.	2506.07963	null
2025-06-09	Correlated Errors in Large Language Models	Elliot Kim et.al.	2506.07962	null
2025-06-09	BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models	Peiyan Li et.al.	2506.07961	null
2025-06-09	Language Models over Canonical Byte-Pair Encodings	Tim Vieira et.al.	2506.07956	null
2025-06-09	TokenBreak: Bypassing Text Classification Models Through Token Manipulation	Kasimir Schulz et.al.	2506.07948	null
2025-06-09	Statistical Hypothesis Testing for Auditing Robustness in Language Models	Paulius Rauba et.al.	2506.07947	null
2025-06-09	ProtocolLLM: RTL Benchmark for SystemVerilog Generation of Communication Protocols	Arnav Sheth et.al.	2506.07945	link
2025-06-09	Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations	Yizhen Li et.al.	2506.07943	null
2025-06-09	Adversarial Attack Classification and Robustness Testing for Large Language Models for Code	Yang Liu et.al.	2506.07942	null
2025-06-09	Gradients: When Markets Meet Fine-tuning – A Distributed Approach to Model Optimisation	Christopher Subia-Waud et.al.	2506.07940	null
2025-06-06	TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation	Muhammad Sohail Danish et.al.	2506.06281	null
2025-06-06	Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias	Yuanzhe Hu et.al.	2506.06280	null
2025-06-06	CoMemo: LVLMs Need Image Context with Image Memory	Shi Liu et.al.	2506.06279	null
2025-06-06	Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding	Emmanouil Zaranis et.al.	2506.06275	null
2025-06-06	AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization	Mukur Gupta et.al.	2506.06273	null
2025-06-06	RecGPT: A Foundation Model for Sequential Recommendation	Yangqin Jiang et.al.	2506.06270	link
2025-06-06	Cartridges: Lightweight and general-purpose long context representations via self-study	Sabri Eyuboglu et.al.	2506.06266	null
2025-06-06	PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time	Weizhi Zhang et.al.	2506.06254	null
2025-06-06	DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation	Jingyu Xiao et.al.	2506.06251	link
2025-06-06	Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models	Zahra Babaiee et.al.	2506.06242	null
2025-06-06	Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge	Yi Sui et.al.	2506.06240	null
2025-06-06	Explaining Matters: Leveraging Definitions and Semantic Expansion for Sexism Detection	Sahrish Khan et.al.	2506.06238	null
2025-06-06	Challenging Vision-Language Models with Surgical Data: A New Dataset and Broad Benchmarking Study	Leon Mayer et.al.	2506.06232	null
2025-06-06	CompilerGPT: Leveraging Large Language Models for Analyzing and Acting on Compiler Optimization Reports	Peter Pirkelbauer et.al.	2506.06227	null
2025-06-06	PROVSYN: Synthesizing Provenance Graphs for Data Augmentation in Intrusion Detection Systems	Yi Huang et.al.	2506.06226	null
2025-06-06	GenIR: Generative Visual Feedback for Mental Image Retrieval	Diji Yang et.al.	2506.06220	null
2025-06-06	STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving	Christian Fruhwirth-Reisinger et.al.	2506.06218	link
2025-06-06	Corrector Sampling in Language Models	Itai Gat et.al.	2506.06215	null
2025-06-06	Can Theoretical Physics Research Benefit from Language Agents?	Sirui Lu et.al.	2506.06214	null
2025-06-06	PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts	Hengzhi Li et.al.	2506.06211	null
2025-06-05	Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets	Lei Hsiung et.al.	2506.05346	null
2025-06-05	SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs	Jiahui Wang et.al.	2506.05344	link
2025-06-05	Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning	Xingjian Ran et.al.	2506.05341	null
2025-06-05	Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models	Anirudh Bharadwaj et.al.	2506.05339	link
2025-06-05	VideoMolmo: Spatio-Temporal Grounding Meets Pointing	Ghazi Shazan Ahmad et.al.	2506.05336	link
2025-06-05	Search Arena: Analyzing Search-Augmented LLMs	Mihran Miroyan et.al.	2506.05334	link
2025-06-05	Unleashing Hour-Scale Video Training for Long Video-Language Understanding	Jingyang Lin et.al.	2506.05332	null
2025-06-05	MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning	Xinyan Chen et.al.	2506.05331	link
2025-06-05	LSM-2: Learning from Incomplete Wearable Sensor Data	Maxwell A. Xu et.al.	2506.05321	null
2025-06-06	Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs	Haoyuan Li et.al.	2506.05318	null
2025-06-05	Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay	Yifan Sun et.al.	2506.05316	null
2025-06-05	Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models	Taha Entesari et.al.	2506.05314	null
2025-06-05	ProRefine: Inference-time Prompt Refinement with Textual Feedback	Deepak Pandita et.al.	2506.05305	null
2025-06-05	Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos	Weifeng Lin et.al.	2506.05302	null
2025-06-05	Power Law Guided Dynamic Sifting for Efficient Attention	Nirav Koley et.al.	2506.05300	null
2025-06-05	Control Tax: The Price of Keeping AI in Check	Mikhail Terekhov et.al.	2506.05296	null
2025-06-05	Sample Complexity and Representation Ability of Test-time Scaling Paradigms	Baihe Huang et.al.	2506.05295	null
2025-06-05	EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?	Yuqian Yuan et.al.	2506.05287	null
2025-06-05	Micro-Act: Mitigate Knowledge Conflict in Question Answering via Actionable Self-Reasoning	Nan Huo et.al.	2506.05278	null
2025-06-05	Teaming in the AI Era: AI-Augmented Frameworks for Forming, Simulating, and Optimizing Human Teams	Mohammed Almutairi et.al.	2506.05265	null
2025-06-04	OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis	Junting Chen et.al.	2506.04217	link
2025-06-04	Language-Image Alignment with Fixed Text Encoders	Jingfeng Yang et.al.	2506.04209	null
2025-06-04	Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning	Shuang Chen et.al.	2506.04207	null
2025-06-04	EPiC: Towards Lossless Speedup for Reasoning Training through Edge-Preserving CoT Condensation	Jinghan Jia et.al.	2506.04205	link
2025-06-04	Cascadia: A Cascade Serving System for Large Language Models	Youhe Jiang et.al.	2506.04203	null
2025-06-04	TracLLM: A Generic Framework for Attributing Long Context LLMs	Yanting Wang et.al.	2506.04202	link
2025-06-04	R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning	Qingfei Zhao et.al.	2506.04185	link
2025-06-04	SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models	Yuhao Wu et.al.	2506.04180	null
2025-06-04	SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling	Anhao Zhao et.al.	2506.04179	null
2025-06-04	Does Prompt Design Impact Quality of Data Imputation by LLMs?	Shreenidhi Srinivasan et.al.	2506.04172	null
2025-06-04	VISCA: Inferring Component Abstractions for Automated End-to-End Testing	Parsa Alian et.al.	2506.04161	null
2025-06-04	Image Editing As Programs with Diffusion Models	Yujia Hu et.al.	2506.04158	null
2025-06-04	A Dataset for Addressing Patient’s Information Needs related to Clinical Course of Hospitalization	Sarvesh Soni et.al.	2506.04156	null
2025-06-04	Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis	Kejian Zhu et.al.	2506.04142	null
2025-06-04	MMR-V: What’s Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos	Kejian Zhu et.al.	2506.04141	null
2025-06-04	TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems	Shaina Raza et.al.	2506.04133	null
2025-06-04	Recent Advances in Medical Image Classification	Loan Dao et.al.	2506.04129	null
2025-06-04	Guided Speculative Inference for Efficient Test-Time Alignment of LLMs	Jonathan Geuter et.al.	2506.04118	link
2025-06-05	Rectified Sparse Attention	Yutao Sun et.al.	2506.04108	null
2025-06-04	TextAtari: 100K Frames Game Playing with Language Agents	Wenhao Li et.al.	2506.04098	link
2025-06-03	Causal Estimation of Tokenisation Bias	Pietro Lesci et.al.	2506.03149	null
2025-06-03	UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation	Bin Lin et.al.	2506.03147	null
2025-06-03	Entity-Augmented Neuroscience Knowledge Retrieval Using Ontology and Semantic Understanding Capability of LLM	Pralaypati Ta et.al.	2506.03145	null
2025-06-03	Not All Tokens Are Meant to Be Forgotten	Xiangyu Zhou et.al.	2506.03142	null
2025-06-03	SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation	Siqi Chen et.al.	2506.03139	null
2025-06-03	OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models	Mengdi Jia et.al.	2506.03135	null
2025-06-03	Native-Resolution Image Synthesis	Zidong Wang et.al.	2506.03131	null
2025-06-03	AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation	Lu Qiu et.al.	2506.03126	null
2025-06-03	AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation	Prashanth Vijayaraghavan et.al.	2506.03122	null
2025-06-03	Targeted Forgetting of Image Subgroups in CLIP Models	Zeliang Zhang et.al.	2506.03117	null
2025-06-04	Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback	Xiaoying Zhang et.al.	2506.03106	null
2025-06-03	Beyond Text Compression: Evaluating Tokenizers Across Scales	Jonas F. Lotz et.al.	2506.03101	null
2025-06-03	TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models	Chetwin Low et.al.	2506.03099	null
2025-06-03	EgoVLM: Policy Optimization for Egocentric Video Understanding	Ashwin Vinod et.al.	2506.03097	link
2025-06-03	DPO Learning with LLMs-Judge Signal for Computer Use Agents	Man Luo et.al.	2506.03095	null
2025-06-03	From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit	Valérie Costa et.al.	2506.03093	null
2025-06-03	Literary Evidence Retrieval via Long-Context Language Models	Katherine Thai et.al.	2506.03090	null
2025-06-03	StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs	Qijun Luo et.al.	2506.03077	null
2025-06-03	LEG-SLAM: Real-Time Language-Enhanced Gaussian Splatting for SLAM	Roman Titkov et.al.	2506.03073	null
2025-06-03	EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models	Mingzhe Li et.al.	2506.03067	null
2025-05-30	ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL	Yu Zhang et.al.	2505.24875	null
2025-05-30	The Road to Generalizable Neuro-Symbolic Learning Should be Paved with Foundation Models	Adam Stein et.al.	2505.24874	link
2025-05-30	ProxyThinker: Test-Time Guidance through Small Visual Reasoners	Zilin Xiao et.al.	2505.24872	link
2025-05-30	MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning	Yiqing Liang et.al.	2505.24871	null
2025-05-30	GenSpace: Benchmarking Spatially-Aware Image Generation	Zehan Wang et.al.	2505.24870	null
2025-05-30	SiLVR: A Simple Language-based Video Reasoning Framework	Ce Zhang et.al.	2505.24869	link
2025-05-30	Time Blindness: Why Video-Language Models Can’t See What Humans Can?	Ujjwal Upadhyay et.al.	2505.24867	null
2025-05-30	ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models	Mingjie Liu et.al.	2505.24864	link
2025-05-30	Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization	Joschka Braun et.al.	2505.24859	null
2025-05-30	Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking	Heli Ben-Hamu et.al.	2505.24857	null
2025-05-30	MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning	Jingyan Shen et.al.	2505.24846	null
2025-05-30	Chameleon: A Flexible Data-mixing Framework for Language Model Pretraining and Finetuning	Wanyun Xie et.al.	2505.24844	link
2025-05-30	Cascading Adversarial Bias from Injection to Distillation in Language Models	Harsh Chaudhari et.al.	2505.24842	null
2025-05-30	Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck	Yuwen Tan et.al.	2505.24840	null
2025-05-30	VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software	Brandon Man et.al.	2505.24838	link
2025-06-02	How much do language models memorize?	John X. Morris et.al.	2505.24832	null
2025-05-30	Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs	Juraj Vladika et.al.	2505.24830	null
2025-05-30	LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text	Li yunhan et.al.	2505.24826	link
2025-05-30	PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models	Yinggan Xu et.al.	2505.24823	null
2025-05-30	Bi-Manual Joint Camera Calibration and Scene Representation	Haozhan Tang et.al.	2505.24819	null
2025-05-29	TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models	Yao Xiao et.al.	2505.23769	link
2025-05-29	Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought	Yunze Man et.al.	2505.23766	null
2025-05-29	From Chat Logs to Collective Insights: Aggregative Question Answering	Wentao Zhang et.al.	2505.23765	null
2025-05-29	MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence	Sihan Yang et.al.	2505.23764	null
2025-05-29	ZeroGUI: Automating Online GUI Learning at Zero Human Cost	Chenyu Yang et.al.	2505.23762	link
2025-05-29	Differential Information: An Information-Theoretic Perspective on Preference Optimization	Yunjae Won et.al.	2505.23761	null
2025-05-29	Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint	Heekyung Lee et.al.	2505.23759	link
2025-05-29	DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning	Ziyin Zhang et.al.	2505.23754	link
2025-05-29	ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks	Akashah Shabbir et.al.	2505.23752	link
2025-05-29	Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences?	Paul Gölz et.al.	2505.23749	null
2025-05-29	Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence	Diankun Wu et.al.	2505.23747	null
2025-05-29	To Trust Or Not To Trust Your Vision-Language Model’s Prediction	Hao Dong et.al.	2505.23745	link
2025-05-29	LayerPeeler: Autoregressive Peeling for Layer-wise Image Vectorization	Ronghuan Wu et.al.	2505.23740	null
2025-05-29	ATLAS: Learning to Optimally Memorize the Context at Test Time	Ali Behrouz et.al.	2505.23735	null
2025-05-29	Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time	Mohamad Chehade et.al.	2505.23729	null
2025-05-29	PixelThink: Towards Efficient Chain-of-Pixel Reasoning	Song Wang et.al.	2505.23727	null
2025-05-29	FMG-Det: Foundation Model Guided Robust Object Detection	Darryl Hannan et.al.	2505.23726	null
2025-05-29	MuLoCo: Muon is a practical inner optimizer for DiLoCo	Benjamin Thérien et.al.	2505.23725	null
2025-05-29	SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA	Minrui Luo et.al.	2505.23724	null
2025-05-29	ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering	Zexi Liu et.al.	2505.23723	link
2025-05-28	Zero-Shot Vision Encoder Grafting via LLM Surrogates	Kaiyu Yue et.al.	2505.22664	link
2025-05-28	Training Free Stylized Abstraction	Aimon Rahman et.al.	2505.22663	null
2025-05-28	AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models	Feng Luo et.al.	2505.22662	null
2025-05-28	GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning	Qingchen Yu et.al.	2505.22661	null
2025-05-28	Maximizing Confidence Alone Improves Reasoning	Mihir Prabhudesai et.al.	2505.22660	null
2025-05-28	3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model	Wenbo Hu et.al.	2505.22657	null
2025-05-28	Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents	Michael Kirchhof et.al.	2505.22655	null
2025-05-28	VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models	Ce Zhang et.al.	2505.22654	null
2025-05-28	The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason	Ang Lv et.al.	2505.22653	null
2025-05-28	Sherlock: Self-Correcting Reasoning in Vision-Language Models	Yi Ding et.al.	2505.22651	null
2025-05-28	Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese	Hanjia Lyu et.al.	2505.22645	link
2025-05-28	Understanding (Un)Reliability of Steering Vectors in Language Models	Joschka Braun et.al.	2505.22637	null
2025-05-28	Learning Composable Chains-of-Thought	Fangcong Yin et.al.	2505.22635	null
2025-05-28	Spatial Knowledge Graph-Guided Multimodal Synthesis	Yida Xue et.al.	2505.22633	null
2025-05-28	Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs	Ziling Cheng et.al.	2505.22630	null
2025-05-28	Principled Out-of-Distribution Generalization via Simplicity	Jiawei Ge et.al.	2505.22622	null
2025-05-28	Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding	Chengyue Wu et.al.	2505.22618	null
2025-05-28	The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models	Ganqu Cui et.al.	2505.22617	null
2025-05-28	RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction	Yuchi Wang et.al.	2505.22613	null
2025-05-28	Effective and Efficient One-pass Compression of Speech Foundation Models Using Sparsity-aware Self-pinching Gates	Haoning Xu et.al.	2505.22608	null
2025-05-27	Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making	Yihan Wang et.al.	2505.21503	null
2025-05-27	ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models	Dingming Li et.al.	2505.21500	null
2025-05-27	AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery	Haowei Wang et.al.	2505.21499	link
2025-05-27	Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment	Xiaojun Jia et.al.	2505.21494	link
2025-05-27	Reinforcing General Reasoning without Verifiers	Xiangxin Zhou et.al.	2505.21493	link
2025-05-27	Robust Hypothesis Generation: LLM-Automated Language Bias for Inductive Logic Programming	Yang Yang et.al.	2505.21486	null
2025-05-27	Are Language Models Consequentialist or Deontological Moral Reasoners?	Keenan Samway et.al.	2505.21479	null
2025-05-27	Policy Optimized Text-to-Image Pipeline Design	Uri Gadot et.al.	2505.21478	null
2025-05-27	Mitigating Hallucination in Large Vision-Language Models via Adaptive Attention Calibration	Mehrdad Fazli et.al.	2505.21472	null
2025-05-27	Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration	Zijun Liu et.al.	2505.21471	link
2025-05-27	Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion	Zhanqiu Hu et.al.	2505.21467	null
2025-05-27	ID-Align: RoPE-Conscious Position Remapping for Dynamic High-Resolution Adaptation in Vision-Language Models	Bozhou Li et.al.	2505.21465	null
2025-05-27	LazyVLM: Neuro-Symbolic Approach to Video Analytics	Xiangru Jian et.al.	2505.21459	null
2025-05-27	Do LLMs Need to Think in One Language? Correlation between Latent Language and Task Performance	Shintaro Ozaki et.al.	2505.21458	null
2025-05-27	Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO	Muzhi Zhu et.al.	2505.21457	null
2025-05-27	Can Large Reasoning Models Self-Train?	Sheikh Shafayat et.al.	2505.21444	null
2025-05-27	Towards Better Instruction Following Retrieval Models	Yuchen Zhuang et.al.	2505.21439	null
2025-05-27	Hume: Introducing System-2 Thinking in Visual-Language-Action Model	Haoming Song et.al.	2505.21432	null
2025-05-27	Policy Induction: Predicting Startup Success via Explainable Memory-Augmented In-Context Learning	Xianling Mu et.al.	2505.21427	null
2025-05-27	GUARD: Dual-Agent based Backdoor Defense on Chain-of-Thought in Neural Code Generation	Naizhu Jin et.al.	2505.21425	null
2025-05-26	Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs	Hanting Chen et.al.	2505.20155	null
2025-05-26	UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter-Efficient Fine-Tuning of Large Models	Xueyan Zhang et.al.	2505.20154	null
2025-05-26	MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents	Ziming Wei et.al.	2505.20148	link
2025-05-26	FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities	Jin Wang et.al.	2505.20147	null
2025-05-26	SeMe: Training-Free Language Model Merging via Semantic Alignment	Jian Gu et.al.	2505.20144	null
2025-05-26	StructEval: Benchmarking LLMs’ Capabilities to Generate Structural Outputs	Jialin Yang et.al.	2505.20139	null
2025-05-26	AweDist: Attention-aware Embedding Distillation for New Input Token Embeddings	Konstantin Dobler et.al.	2505.20133	null
2025-05-26	Agentic 3D Scene Generation with Spatially Contextualized VLMs	Xinhang Liu et.al.	2505.20129	null
2025-05-26	Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers	Zhengliang Shi et.al.	2505.20128	link
2025-05-26	Agentic AI Process Observability: Discovering Behavioral Variability	Fabiana Fournier et.al.	2505.20127	null
2025-05-26	MEBench: A Novel Benchmark for Understanding Mutual Exclusivity Bias in Vision-Language Models	Anh Thai et.al.	2505.20122	null
2025-05-27	TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent	Dominik Meier et.al.	2505.20118	link
2025-05-26	Named Entity Recognition in Historical Italian: The Case of Giacomo Leopardi’s Zibaldone	Cristian Santini et.al.	2505.20113	null
2025-05-26	ResSVD: Residual Compensated SVD for Large Language Model Compression	Haolei Bai et.al.	2505.20112	null
2025-05-26	Language-Agnostic Suicidal Risk Detection Using Large Language Models	June-Woo Kim et.al.	2505.20109	null
2025-05-26	Adaptive Deep Reasoning: Triggering Deep Thinking When Needed	Yunhao Wang et.al.	2505.20101	null
2025-05-26	AdaTP: Attention-Debiased Token Pruning for Video Large Language Models	Fengyuan Sun et.al.	2505.20100	null
2025-05-26	Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities	Chuangtao Ma et.al.	2505.20099	link
2025-05-26	S2LPP: Small-to-Large Prompt Prediction across LLMs	Liang Cheng et.al.	2505.20097	null
2025-05-26	Multi-Domain Explainability of Preferences	Nitay Calderon et.al.	2505.20088	null
2025-05-26	Safety Through Reasoning: An Empirical Study of Reasoning Guardrail Models	Makesh Narsimhan Sreedhar et.al.	2505.20087	null
2025-05-26	Inference-time Alignment in Continuous Space	Yige Yuan et.al.	2505.20081	link
2025-05-23	Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs	Wafa Alghallabi et.al.	2505.18152	link
2025-05-23	First Finish Search: Efficient Test-Time Scaling in Large Language Models	Aradhye Agarwal et.al.	2505.18149	null
2025-05-23	Lost in the Haystack: Smaller Needles are More Difficult for LLMs to Find	Owen Bianchi et.al.	2505.18148	null
2025-05-23	Graph-Linguistic Fusion: Using Language Models for Wikidata Vandalism Detection	Mykola Trokhymovych et.al.	2505.18136	null
2025-05-23	Gaming Tool Preferences in Agentic LLMs	Kazem Faghih et.al.	2505.18135	link
2025-05-23	VideoGameBench: Can Vision-Language Models complete popular video games?	Alex L. Zhang et.al.	2505.18134	null
2025-05-23	One RL to See Them All: Visual Triple Unified Reinforcement Learning	Yan Ma et.al.	2505.18129	null
2025-05-23	Reward Model Overoptimisation in Iterated RLHF	Lorenz Wolf et.al.	2505.18126	null
2025-05-23	TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations	Alan Arazi et.al.	2505.18125	null
2025-05-23	UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification	Poojah Ganesan et.al.	2505.18122	null
2025-05-23	ProgRM: Build Better GUI Agents with Progress Rewards	Danyang Zhang et.al.	2505.18121	null
2025-05-23	Bidirectional Knowledge Distillation for Enhancing Sequential Recommendation with Large Language Models	Jiongran Wu et.al.	2505.18120	null
2025-05-23	Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM	Zinuo Li et.al.	2505.18110	null
2025-05-23	ManuSearch: Democratizing Deep Search in Large Language Models with a Transparent and Open Multi-Agent Framework	Lisheng Huang et.al.	2505.18105	link
2025-05-23	How Can I Publish My LLM Benchmark Without Giving the True Answers Away?	Takashi Ishida et.al.	2505.18102	null
2025-05-23	Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL	Joey Hong et.al.	2505.18098	null
2025-05-23	QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization	Weizhou Shen et.al.	2505.18092	null
2025-05-23	Data Mixing Can Induce Phase Transitions in Knowledge Acquisition	Xinran Gu et.al.	2505.18091	null
2025-05-23	CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays	Hyungyung Lee et.al.	2505.18087	link
2025-05-23	Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding	Xiaoyi Zhang et.al.	2505.18079	null
2025-05-22	CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms	Shilin Yan et.al.	2505.17020	link
2025-05-22	Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework	Chenhao Zhang et.al.	2505.17019	link
2025-05-22	SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward	Kaixuan Fan et.al.	2505.17018	link
2025-05-22	Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO	Chengzhuo Tong et.al.	2505.17017	link
2025-05-22	Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models	Runsen Xu et.al.	2505.17015	null
2025-05-22	SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding	Haoning Wu et.al.	2505.17012	link
2025-05-22	R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning	Huatong Song et.al.	2505.17005	link
2025-05-22	Do Large Language Models Excel in Complex Logical Reasoning with Formal Language?	Jin Jiang et.al.	2505.16998	link
2025-05-22	DecoupledESC: Enhancing Emotional Support Generation via Strategy-Response Decoupled Preference Optimization	Chao Zhang et.al.	2505.16995	null
2025-05-22	Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding	Runpeng Yu et.al.	2505.16990	link
2025-05-22	T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning	Amartya Chakraborty et.al.	2505.16986	null
2025-05-22	UFT: Unifying Supervised and Reinforcement Fine-Tuning	Mingyang Liu et.al.	2505.16984	link
2025-05-22	LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding	Junlong Tong et.al.	2505.16983	link
2025-05-22	Beyond Correlation: Towards Causal Large Language Model Agents in Biomedicine	Adib Bazgir et.al.	2505.16982	null
2025-05-22	HyGenar: An LLM-Driven Hybrid Genetic Algorithm for Few-Shot Grammar Generation	Weizhi Tang et.al.	2505.16978	link
2025-05-22	SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development	Yaxin Du et.al.	2505.16975	link
2025-05-22	CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark	Ahmed Heakl et.al.	2505.16968	link
2025-05-22	Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models	Junjie Xiong et.al.	2505.16957	null
2025-05-22	On Multilingual Encoder Language Model Compression for Low-Resource Languages	Daniil Gurgurov et.al.	2505.16956	null
2025-05-22	A Comprehensive Evaluation of Contemporary ML-Based Solvers for Combinatorial Optimization	Shengyu Feng et.al.	2505.16952	null
2025-05-21	InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition	Yijie Zheng et.al.	2505.15818	link
2025-05-21	On the creation of narrow AI: hierarchy and nonlocality of neural network skills	Eric J. Michaud et.al.	2505.15811	link
2025-05-21	MMaDA: Multimodal Large Diffusion Language Models	Ling Yang et.al.	2505.15809	link
2025-05-21	The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation	Patrick Kahardipraja et.al.	2505.15807	link
2025-05-21	Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering	Hwan Chang et.al.	2505.15805	link
2025-05-21	STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs	Zongzhao Li et.al.	2505.15804	link
2025-05-21	VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models	Yuchen Yan et.al.	2505.15801	null
2025-05-21	Model Merging is Secretly Certifiable: Non-Vacuous Generalisation Bounds for Low-Shot Learning	Taehoon Kim et.al.	2505.15798	null
2025-05-21	Reverse Engineering Human Preferences with Reinforcement Learning	Lisa Alazraki et.al.	2505.15795	null
2025-05-21	HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving	Zhiwen Chen et.al.	2505.15793	null
2025-05-21	Large Language Models as Computable Approximations to Solomonoff Induction	Jun Wan et.al.	2505.15784	null
2025-05-21	dKV-Cache: The Cache for Diffusion Language Models	Xinyin Ma et.al.	2505.15781	link
2025-05-21	ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning	Changtai Zhu et.al.	2505.15776	link
2025-05-21	Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention	Huanxuan Liao et.al.	2505.15774	link
2025-05-21	MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling	Cheng Yifan et.al.	2505.15772	null
2025-05-21	An Empirical Analysis of Vulnerability Detection Tools for Solidity Smart Contracts Using Line Level Manually Annotated Vulnerabilities	Francesco Salzano et.al.	2505.15756	null
2025-05-21	Exploring The Visual Feature Space for Multimodal Neural Decoding	Weihao Xia et.al.	2505.15755	null
2025-05-21	Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval	Taiye Chen et.al.	2505.15753	null
2025-05-21	Multi-modal Integration Analysis of Alzheimer’s Disease Using Large Language Models and Knowledge Graphs	Kanan Kiguchi et.al.	2505.15747	null
2025-05-21	Evolutionary Computation and Large Language Models: A Survey of Methods, Synergies, and Applications	Dikshit Chauhan et.al.	2505.15741	null
2025-05-20	Language Models use Lookbacks to Track Beliefs	Nikhil Prakash et.al.	2505.14685	null
2025-05-20	Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning	Haolei Xu et.al.	2505.14684	null
2025-05-20	Emerging Properties in Unified Multimodal Pretraining	Chaorui Deng et.al.	2505.14683	null
2025-05-20	UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation	Rui Tian et.al.	2505.14682	null
2025-05-20	UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Large Language Models	Xiaojie Gu et.al.	2505.14679	link
2025-05-20	Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning	Jiaer Xia et.al.	2505.14677	null
2025-05-20	Reward Reasoning Model	Jiaxin Guo et.al.	2505.14674	null
2025-05-20	UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens	Ruichuan An et.al.	2505.14671	link
2025-05-20	Quartet: Native FP4 Training Can Be Optimal for Large Language Models	Roberto L. Castro et.al.	2505.14669	link
2025-05-20	ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions	Bufang Yang et.al.	2505.14668	null
2025-05-20	Beyond Words: Multimodal LLM Knows When to Speak	Zikai Liao et.al.	2505.14654	null
2025-05-20	General-Reasoner: Advancing LLM Reasoning Across All Domains	Xueguang Ma et.al.	2505.14652	null
2025-05-20	Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits	Tiantian Feng et.al.	2505.14648	link
2025-05-20	CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation	Anna C. Doris et.al.	2505.14646	link
2025-05-20	Think Only When You Need with Large Hybrid-Reasoning Models	Lingjie Jiang et.al.	2505.14631	null
2025-05-20	KERL: Knowledge-Enhanced Personalized Recipe Recommendation using Large Language Models	Fnu Mohbat et.al.	2505.14629	link
2025-05-20	Debating for Better Reasoning: An Unsupervised Multimodal Approach	Ashutosh Adhikari et.al.	2505.14627	null
2025-05-20	TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning	Zhangchen Xu et.al.	2505.14625	link
2025-05-20	Enhancing Learned Knowledge in LoRA Adapters Through Efficient Contrastive Decoding on Ascend NPUs	Morgan Lindsay Heisler et.al.	2505.14620	null
2025-05-20	Linear Control of Test Awareness Reveals Differential Compliance in Reasoning Models	Sahar Abdelnabi et.al.	2505.14617	link
2025-05-19	CIE: Controlling Language Model Text Generations Using Continuous Signals	Vinay Samuel et.al.	2505.13448	link
2025-05-19	Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards	Xiaoyuan Liu et.al.	2505.13445	link
2025-05-19	ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models	Liyan Tang et.al.	2505.13444	null
2025-05-19	GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation	Abhay Deshpande et.al.	2505.13441	null
2025-05-19	Optimizing Anytime Reasoning via Budget Relative Policy Optimization	Penghui Qi et.al.	2505.13438	link
2025-05-19	SMOTExT: SMOTE meets Large Language Models	Mateusz Bystroński et.al.	2505.13434	null
2025-05-19	Fine-tuning Quantized Neural Networks with Zeroth-order Optimization	Sifeng Shang et.al.	2505.13430	link
2025-05-19	MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision	Lingxiao Du et.al.	2505.13427	link
2025-05-19	G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning	Liang Chen et.al.	2505.13426	link
2025-05-19	Learnware of Language Models: Specialized Small Language Models Can Do Big	Zhi-Hao Tan et.al.	2505.13425	link
2025-05-19	Make Still Further Progress: Chain of Thoughts for Tabular Data Leaderboard	Si-Yang Liu et.al.	2505.13421	null
2025-05-19	FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning	Zhuozhao Hu et.al.	2505.13419	link
2025-05-19	CoT-Kinetics: A Theoretical Modeling Assessing LRM Reasoning Process	Jinhe Bi et.al.	2505.13408	null
2025-05-19	AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database	Rong Bian et.al.	2505.13406	null
2025-05-19	MR. Judge: Multimodal Reasoner as a Judge	Renjie Pi et.al.	2505.13403	null
2025-05-19	R3: Robust Rubric-Agnostic Reward Models	David Anugraha et.al.	2505.13388	link
2025-05-19	CompeteSMoE – Statistically Guaranteed Mixture of Experts Training via Competition	Nam V. Nguyen et.al.	2505.13380	link
2025-05-19	Thinkless: LLM Learns When to Think	Gongfan Fang et.al.	2505.13379	link
2025-05-19	Seeing, Saying, Solving: An LLM-to-TL Framework for Cooperative Robots	Dan BW Choe et.al.	2505.13376	null
2025-05-19	Multi-Armed Bandits Meet Large Language Models	Djallel Bouneffouf et.al.	2505.13355	null
2025-05-16	Modeling cognitive processes of natural reading with transformer-based Language Models	Bruno Bianchi et.al.	2505.11485	null
2025-05-16	msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML	Zhaolan Huang et.al.	2505.11483	link
2025-05-16	Improving Assembly Code Performance with Large Language Models via Reinforcement Learning	Anjiang Wei et.al.	2505.11480	null
2025-05-16	HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages	Zhilin Wang et.al.	2505.11475	null
2025-05-16	Disentangling Reasoning and Knowledge in Medical Large Language Models	Rahul Thapa et.al.	2505.11462	null
2025-05-16	ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks	Zhixiong Zhuang et.al.	2505.11459	null
2025-05-16	LLMs unlock new paths to monetizing exploits	Nicholas Carlini et.al.	2505.11449	null
2025-05-16	Is Compression Really Linear with Code Intelligence?	Xianzhen Luo et.al.	2505.11441	null
2025-05-16	GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art	Chenkai Zhang et.al.	2505.11436	link
2025-05-16	MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production	Chao Jin et.al.	2505.11432	null
2025-05-16	Mergenetic: a Simple Evolutionary Model Merging Library	Adrian Robert Minut et.al.	2505.11427	link
2025-05-16	When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs	Xiaomin Li et.al.	2505.11423	null
2025-05-16	Towards Cultural Bridge by Bahnaric-Vietnamese Translation Using Transfer Learning of Sequence-To-Sequence Pre-training Language Model	Phan Tran Minh Dat et.al.	2505.11421	null
2025-05-16	EdgeWisePersona: A Dataset for On-Device User Profiling from Natural Language Interactions	Patryk Bartkowiak et.al.	2505.11417	link
2025-05-16	MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems	Yinsicheng Jiang et.al.	2505.11415	null
2025-05-16	CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs	Sijia Chen et.al.	2505.11413	null
2025-05-16	Visual Planning: Let’s Think Only with Images	Yi Xu et.al.	2505.11409	link
2025-05-16	Large Language Model Use Impact Locus of Control	Jenny Xiyu Fu et.al.	2505.11406	null
2025-05-16	EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models	Bohao Xing et.al.	2505.11405	link
2025-05-16	Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner	Wenchuan Zhang et.al.	2505.11404	link
2025-05-15	End-to-End Vision Tokenizer Tuning	Wenxuan Wang et.al.	2505.10562	null
2025-05-15	Neural Thermodynamic Laws for Large Language Model Training	Ziming Liu et.al.	2505.10559	null
2025-05-15	Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data	Yiwen Liu et.al.	2505.10551	link
2025-05-15	Real-Time Out-of-Distribution Failure Prevention via Multi-Modal Reasoning	Milan Ganai et.al.	2505.10547	null
2025-05-15	Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models	Annie Wong et.al.	2505.10543	link
2025-05-15	Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis	Pengfei Wang et.al.	2505.10541	link
2025-05-15	S3C2 Summit 2024-09: Industry Secure Software Supply Chain Summit	Imranur Rahman et.al.	2505.10538	null
2025-05-15	WorldPM: Scaling Human Preference Modeling	Binghai Wang et.al.	2505.10527	link
2025-05-15	MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models	Mugilan Ganesan et.al.	2505.10526	null
2025-05-15	Multi-Token Prediction Needs Registers	Anastasios Gerontopoulos et.al.	2505.10518	link
2025-05-15	RouteNator: A Router-Based Multi-Modal Architecture for Generating Synthetic Training Data for Function Calling LLMs	Vibha Belavadi et.al.	2505.10495	null
2025-05-15	Can You Really Trust Code Copilots? Evaluating Large Language Models from a Code Security Perspective	Yutao Mou et.al.	2505.10494	link
2025-05-15	CL-RAG: Bridging the Gap in Retrieval-Augmented Generation with Curriculum Learning	Shaohan Wang et.al.	2505.10493	null
2025-05-15	Campus AI vs Commercial AI: A Late-Breaking Study on How LLM As-A-Service Customizations Shape Trust and Usage Patterns	Leon Hannig et.al.	2505.10490	null
2025-05-15	Parallel Scaling Law for Language Models	Mouxiang Chen et.al.	2505.10475	link
2025-05-15	Large Language Models for Cancer Communication: Evaluating Linguistic Quality, Safety, and Accessibility in Generative AI	Agnik Saha et.al.	2505.10472	null
2025-05-15	AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges	Ranjan Sapkota et.al.	2505.10468	null
2025-05-15	Superposition Yields Robust Neural Scaling	Yizhou Liu et.al.	2505.10465	link
2025-05-15	Vision language models have difficulty recognizing virtual objects	Tyler Tran et.al.	2505.10453	null
2025-05-15	Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models	Zemin Huang et.al.	2505.10446	null
2025-05-14	Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?	Anthony GX-Chen et.al.	2505.09614	null
2025-05-14	Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors	Nicolas Dupuis et.al.	2505.09610	null
2025-05-14	Adversarial Suffix Filtering: a Defense Pipeline for LLMs	David Khachaturov et.al.	2505.09602	null
2025-05-14	How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference	Nidhal Jegham et.al.	2505.09598	null
2025-05-14	WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models	Abdullah Mushtaq et.al.	2505.09595	null
2025-05-14	Variational Visual Question Answering	Tobias Jan Wieczorek et.al.	2505.09591	null
2025-05-15	Beyond Likes: How Normative Feedback Complements Engagement Signals on Social Media	Yuchen Wu et.al.	2505.09583	null
2025-05-14	VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation	Chaofan Zhang et.al.	2505.09577	null
2025-05-14	Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach	Shannon Lodoen et.al.	2505.09576	null
2025-05-14	MIGRATION-BENCH: Repository-Level Code Migration Benchmark from Java 8	Linbo Liu et.al.	2505.09569	link
2025-05-14	Using Foundation Models as Pseudo-Label Generators for Pre-Clinical 4D Cardiac CT Segmentation	Anne-Marie Rickmann et.al.	2505.09564	null
2025-05-14	WavReward: Spoken Dialogue Models With Generalist Reward Evaluators	Shengpeng Ji et.al.	2505.09558	link
2025-05-14	PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning	Zongqian Li et.al.	2505.09519	link
2025-05-15	Towards Fair In-Context Learning with Tabular Foundation Models	Patrik Kenfack et.al.	2505.09503	null
2025-05-14	Layered Unlearning for Adversarial Relearning	Timothy Qian et.al.	2505.09500	link
2025-05-14	Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput	Bo Zhang et.al.	2505.09498	null
2025-05-14	Card Sorting Simulator: Augmenting Design of Logical Information Architectures with Large Language Models	Eduard Kuric et.al.	2505.09478	null
2025-05-14	Deploying Foundation Model-Enabled Air and Ground Robots in the Field: Challenges and Opportunities	Zachary Ravichandran et.al.	2505.09477	null
2025-05-14	Evaluating GPT- and Reasoning-based Large Language Models on Physics Olympiad Problems: Surpassing Human Performance and Implications for Educational Assessment	Paul Tschisgale et.al.	2505.09438	null
2025-05-14	CXMArena: Unified Dataset to benchmark performance in realistic CXM Scenarios	Raghav Garg et.al.	2505.09436	link
2025-05-13	CodePDE: An Inference Framework for LLM-driven PDE Solver Generation	Shanda Li et.al.	2505.08783	link
2025-05-13	HealthBench: Evaluating Large Language Models Towards Improved Human Health	Rahul K. Arora et.al.	2505.08775	link
2025-05-14	Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology	Yatai Ji et.al.	2505.08765	null
2025-05-13	Aya Vision: Advancing the Frontier of Multilingual Multimodality	Saurabh Dash et.al.	2505.08751	null
2025-05-13	AC-Reason: Towards Theory-Guided Actual Causality Reasoning with Large Language Models	Yanxi Zhang et.al.	2505.08750	link
2025-05-13	DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models	Xiaoyang Chen et.al.	2505.08744	link
2025-05-13	Probability Consistency in Large Language Models: Theoretical Foundations Meet Empirical Discrepancies	Xiaoliang Luo et.al.	2505.08739	link
2025-05-13	Towards Foundation Models for Experimental Readout Systems Combining Discrete and Continuous Data	James Giroux et.al.	2505.08736	link
2025-05-13	NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context	Ben Yao et.al.	2505.08734	null
2025-05-13	Securing RAG: A Risk Assessment and Mitigation Framework	Lukas Ammann et.al.	2505.08728	null
2025-05-13	Memorization-Compression Cycles Improve Generalization	Fangyuan Yu et.al.	2505.08727	null
2025-05-13	Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving	Zongchuang Zhao et.al.	2505.08725	link
2025-05-13	TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series	Xiaolei Qin et.al.	2505.08723	link
2025-05-13	PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts	Yang Su et.al.	2505.08719	null
2025-05-13	Controllable Image Colorization with Instance-aware Texts and Masks	Yanru An et.al.	2505.08705	null
2025-05-13	LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs	K M Sajjadul Islam et.al.	2505.08704	null
2025-05-14	Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities	George Saon et.al.	2505.08699	null
2025-05-13	VizCV: AI-assisted visualization of researchers’ publications tracks	Vladimír Lazárik et.al.	2505.08691	null
2025-05-13	Adaptive Schema-aware Event Extraction with Retrieval-Augmented Generation	Sheng Liang et.al.	2505.08690	null
2025-05-13	A Social Robot with Inner Speech for Dietary Guidance	Valerio Belcamino et.al.	2505.08664	link
2025-05-12	DanceGRPO: Unleashing GRPO on Visual Generation	Zeyue Xue et.al.	2505.07818	null
2025-05-12	Imagine, Verify, Execute: Memory-Guided Agentic Exploration with Vision-Language Models	Seungjae Lee et.al.	2505.07815	null
2025-05-12	Learning Dynamics in Continual Pre-Training for Large Language Models	Xingjin Wang et.al.	2505.07796	null
2025-05-12	Domain Regeneration: How well do LLMs match syntactic properties of text domains?	Da Ju et.al.	2505.07784	null
2025-05-12	Relative Overfitting and Accept-Reject Framework	Yanxin Liu et.al.	2505.07783	null
2025-05-12	MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering	Rushi Qiang et.al.	2505.07782	link
2025-05-12	Must Read: A Systematic Survey of Computational Persuasion	Nimet Beyza Bozdag et.al.	2505.07775	link
2025-05-12	Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving	Xinji Mai et.al.	2505.07773	link
2025-05-12	Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding	Yifeng Di et.al.	2505.07768	link
2025-05-12	BodyGPS: Anatomical Positioning System	Halid Ziya Yerebakan et.al.	2505.07744	null
2025-05-12	Assessing the Chemical Intelligence of Large Language Models	Nicholas T. Runcie et.al.	2505.07735	link
2025-05-12	Spoken Language Understanding on Unseen Tasks With In-Context Learning	Neeraj Agrawal et.al.	2505.07731	null
2025-05-12	Reproducibility, Replicability, and Insights into Visual Document Retrieval with Late Interaction	Jingfen Qiao et.al.	2505.07730	link
2025-05-12	Circuit Partitioning Using Large Language Models for Quantum Compilation and Simulations	Pranav Sinha et.al.	2505.07711	null
2025-05-12	Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images	Elisei Rykov et.al.	2505.07704	null
2025-05-12	PatchTrack: A Comprehensive Analysis of ChatGPT’s Influence on Pull Request Outcomes	Daniel Ogenrwot et.al.	2505.07700	null
2025-05-12	Beyond CLIP Generalization: Against Forward&Backward Forgetting Adapter for Continual Learning of Vision-Language Models	Songlin Dong et.al.	2505.07690	null
2025-05-12	S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models	Muzhi Dai et.al.	2505.07686	null
2025-05-12	Multimodal Survival Modeling in the Age of Foundation Models	Steven Song et.al.	2505.07683	link
2025-05-12	SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models	Hang Wu et.al.	2505.07680	null
2025-05-09	Towards a Unified Representation Evaluation Framework Beyond Downstream Tasks	Christos Plachouras et.al.	2505.06224	link
2025-05-09	Adapting a Segmentation Foundation Model for Medical Image Classification	Pengfei Gu et.al.	2505.06217	null
2025-05-09	From Millions of Tweets to Actionable Insights: Leveraging LLMs for User Profiling	Vahid Rahimzadeh et.al.	2505.06184	null
2025-05-09	A Large Language Model-Enhanced Q-learning for Capacitated Vehicle Routing Problem with Time Windows	Linjiang Cao et.al.	2505.06178	null
2025-05-09	MonetGPT: Solving Puzzles Enhances MLLMs’ Image Retouching Skills	Niladri Shekhar Dutt et.al.	2505.06176	null
2025-05-09	Turbo-ICL: In-Context Learning-Based Turbo Equalization	Zihang Song et.al.	2505.06175	null
2025-05-09	MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks	Wenqi Zeng et.al.	2505.06152	link
2025-05-09	A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets	Ryan Lagasse et.al.	2505.06150	null
2025-05-09	Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study	Faeze Ghorbanpour et.al.	2505.06149	null
2025-05-09	LLMs Get Lost In Multi-Turn Conversation	Philippe Laban et.al.	2505.06120	link
2025-05-09	LLMs Outperform Experts on Challenging Biology Benchmarks	Lennart Justen et.al.	2505.06108	null
2025-05-09	Free and Fair Hardware: A Pathway to Copyright Infringement-Free Verilog Generation using LLMs	Sam Bush et.al.	2505.06096	null
2025-05-09	Assessing Tenstorrent’s RISC-V MatMul Acceleration Capabilities	Hiari Pizzini Cavagna et.al.	2505.06085	null
2025-05-09	Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information	Joshua Harris et.al.	2505.06046	null
2025-05-09	Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification	Leon Eshuijs et.al.	2505.06032	link
2025-05-09	Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation	Stefan Vasilev et.al.	2505.06027	null
2025-05-09	ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding	Shuai Wang et.al.	2505.06020	null
2025-05-09	Exploring the Feasibility of Multilingual Grammatical Error Correction with a Single LLM up to 9B parameters: A Comparative Study of 17 Models	Dawid Wisniewski et.al.	2505.06004	link
2025-05-09	Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition	Congqi Cao et.al.	2505.06002	link
2025-05-09	Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models	Lennart Stöpler et.al.	2505.05970	null
2025-05-08	Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation	Chao Liao et.al.	2505.05472	null
2025-05-08	Generating Physically Stable and Buildable LEGO Designs from Text	Ava Pun et.al.	2505.05469	link
2025-05-08	StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant	Haibo Wang et.al.	2505.05467	null
2025-05-08	ComPO: Preference Alignment via Comparison Oracles	Peter Chen et.al.	2505.05465	null
2025-05-08	Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging	Shiqi Chen et.al.	2505.05464	link
2025-05-08	UKElectionNarratives: A Dataset of Misleading Narratives Surrounding Recent UK General Elections	Fatima Haouari et.al.	2505.05459	null
2025-05-08	SITE: towards Spatial Intelligence Thorough Evaluation	Wenqi Wang et.al.	2505.05456	null
2025-05-08	Conversational Process Model Redesign	Nataliia Klievtsova et.al.	2505.05453	null
2025-05-08	clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations	Chalamalasetti Kranti et.al.	2505.05445	null
2025-05-08	GesPrompt: Leveraging Co-Speech Gestures to Augment LLM-Based Interaction in Virtual Reality	Xiyun Hu et.al.	2505.05441	null
2025-05-09	EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation	Biao Yi et.al.	2505.05440	null
2025-05-08	Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data	Yudong Wang et.al.	2505.05427	null
2025-05-09	LiTransProQA: an LLM-based Literary Translation evaluation metric with Professional Question Answering	Ran Zhang et.al.	2505.05423	link
2025-05-08	Crosslingual Reasoning through Test-Time Scaling	Zheng-Xin Yong et.al.	2505.05408	link
2025-05-08	Frame In, Frame Out: Do LLMs Generate More Biased News Headlines than Humans?	Valeria Pastorino et.al.	2505.05406	null
2025-05-08	A Pain Assessment Framework based on multimodal data and Deep Machine Learning methods	Stefanos Gkikas et.al.	2505.05396	null
2025-05-08	DSDrive: Distilling Large Language Model for Lightweight End-to-End Autonomous Driving with Unified Reasoning and Planning	Wenru Liu et.al.	2505.05360	null
2025-05-08	Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization	Sooyoung Park et.al.	2505.05343	link
2025-05-08	FLAM: Frame-Wise Language-Audio Modeling	Yusong Wu et.al.	2505.05335	null
2025-05-08	ICon: In-Context Contribution for Automatic Data Selection	Yixin Yang et.al.	2505.05327	null
2025-05-07	EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning	Zhenghao Xing et.al.	2505.04623	link
2025-05-07	On Path to Multimodal Generalist: General-Level and General-Bench	Hao Fei et.al.	2505.04620	null
2025-05-07	OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution	Lianghong Guo et.al.	2505.04606	link
2025-05-07	OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning	Xianhang Li et.al.	2505.04601	null
2025-05-08	MonoCoP: Chain-of-Prediction for Monocular 3D Object Detection	Zhihao Zhang et.al.	2505.04594	null
2025-05-07	ZeroSearch: Incentivize the Search Capability of LLMs without Searching	Hao Sun et.al.	2505.04588	link
2025-05-07	SlideItRight: Using AI to Find Relevant Slides and Provide Feedback for Open-Ended Questions	Chloe Qianhui Zhao et.al.	2505.04584	link
2025-05-07	Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization	Wenjun Cao et.al.	2505.04578	null
2025-05-07	Communication-Efficient Federated Fine-Tuning of Language Models via Dynamic Update Schedules	Michail Theologitis et.al.	2505.04535	link
2025-05-07	Overcoming Data Scarcity in Generative Language Modelling for Low-Resource Languages: A Systematic Review	Josh McGiff et.al.	2505.04531	null
2025-05-07	Comparative Analysis of Carbon Footprint in Manual vs. LLM-Assisted Code Development	Kuen Sum Cheung et.al.	2505.04521	null
2025-05-07	Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs	Yehui Tang et.al.	2505.04519	null
2025-05-07	“I Can See Forever!”: Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments	Ziyi Zhang et.al.	2505.04488	null
2025-05-07	CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation	Jiahao Li et.al.	2505.04481	null
2025-05-07	TrajEvo: Designing Trajectory Prediction Heuristics via LLM-driven Evolution	Zhikai Zhao et.al.	2505.04480	link
2025-05-07	Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration	Shigeki Karita et.al.	2505.04457	link
2025-05-07	M2Rec: Multi-scale Mamba for Efficient Sequential Recommendation	Qianru Zhang et.al.	2505.04445	null
2025-05-07	Towards Effectively Leveraging Execution Traces for Program Repair with Code LLMs	Mirazul Haque et.al.	2505.04441	null
2025-05-07	OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models	Xiaoyu Xu et.al.	2505.04416	null
2025-05-07	DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception	Junjie Wang et.al.	2505.04410	link
2025-05-06	VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model	Zuwei Long et.al.	2505.03739	link
2025-05-06	Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence	Shuhua Yu et.al.	2505.03736	null
2025-05-06	Meta-Optimization and Program Search using Language Models for Task and Motion Planning	Denis Shcherba et.al.	2505.03725	null
2025-05-06	Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning	François Role et.al.	2505.03703	null
2025-05-06	Fairness of Automatic Speech Recognition in Cleft Lip and Palate Speech	Susmita Bhattacharjee et.al.	2505.03697	null
2025-05-06	Graph Drawing for LLMs: An Empirical Evaluation	Walter Didimo et.al.	2505.03678	null
2025-05-06	Distribution-Conditional Generation: From Class Distribution to Creative Generation	Fu Feng et.al.	2505.03667	null
2025-05-06	Binding threshold units with artificial oscillatory neurons	Vladimir Fanaskov et.al.	2505.03648	link
2025-05-06	PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing	Yiping Xie et.al.	2505.03621	null
2025-05-06	Learning Unknown Spoof Prompts for Generalized Face Anti-Spoofing Using Only Real Face Images	Fangling Jiang et.al.	2505.03611	null
2025-05-06	Learning Knowledge-based Prompts for Robust 3D Mask Presentation Attack Detection	Fangling Jiang et.al.	2505.03610	null
2025-05-06	DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes	Sergey Linok et.al.	2505.03581	link
2025-05-06	LlamaFirewall: An open source guardrail system for building secure AI agents	Sahana Chennabasappa et.al.	2505.03574	null
2025-05-06	Say It Another Way: A Framework for User-Grounded Paraphrasing	Cléa Chataigner et.al.	2505.03563	null
2025-05-06	A Comprehensive Survey of Large AI Models for Future Communications: Foundations, Applications and Challenges	Feibo Jiang et.al.	2505.03556	link
2025-05-06	A Hashgraph-Inspired Consensus Mechanism for Reliable Multi-Model Reasoning	Kolawole E. Ogunsina et.al.	2505.03553	null
2025-05-06	STORY2GAME: Generating (Almost) Everything in an Interactive Fiction Game	Eric Zhou et.al.	2505.03547	null
2025-05-06	Faster MoE LLM Inference for Extremely Large Models	Haoqi Yang et.al.	2505.03531	null
2025-05-06	Ruled by the Representation Space: On the University’s Embrace of Large Language Models	Katia Schwerzmann et.al.	2505.03513	null
2025-05-06	BadLingual: A Novel Lingual-Backdoor Attack against Large Language Models	Zihan Wang et.al.	2505.03501	null
2025-05-05	Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation	Lu Ling et.al.	2505.02836	null
2025-05-05	R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning	Yi-Fan Zhang et.al.	2505.02835	link
2025-05-05	No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves	Dengyang Jiang et.al.	2505.02831	link
2025-05-05	LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery	Jerome Quenum et.al.	2505.02829	null
2025-05-05	ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations	Dmitriy Shopkhoev et.al.	2505.02819	link
2025-05-05	Knowing You Don’t Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing	Diji Yang et.al.	2505.02811	link
2025-05-05	Towards Quantifying the Hessian Structure of Neural Networks	Zhaorui Dong et.al.	2505.02809	link
2025-05-05	Generating HomeAssistant Automations Using an LLM-based Chatbot	Mathyas Giudici et.al.	2505.02802	null
2025-05-05	HSplitLoRA: A Heterogeneous Split Parameter-Efficient Fine-Tuning Framework for Large Language Models	Zheng Lin et.al.	2505.02795	null
2025-05-05	Giving Simulated Cells a Voice: Evolving Prompt-to-Intervention Models for Cellular Control	Nam H. Le et.al.	2505.02766	null
2025-05-05	Bye-bye, Bluebook? Automating Legal Procedure with Large Language Models	Matthew Dahl et.al.	2505.02763	null
2025-05-05	Using Knowledge Graphs to harvest datasets for efficient CLIP model training	Simon Ging et.al.	2505.02746	link
2025-05-06	Knowledge Graphs for Enhancing Large Language Models in Entity Disambiguation	Gerard Pons et.al.	2505.02737	null
2025-05-05	FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models	Zhouliang Yu et.al.	2505.02735	link
2025-05-05	Enhancing LLMs’ Clinical Reasoning with Real-World Data from a Nationwide Sepsis Registry	Junu Kim et.al.	2505.02722	link
2025-05-05	Less is More: Efficient Weight Farcasting with 1-Layer Neural Network	Xiao Shou et.al.	2505.02714	null
2025-05-05	Technical Report: Evaluating Goal Drift in Language Model Agents	Rauno Arike et.al.	2505.02709	null
2025-05-05	Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play	Yemin Shi et.al.	2505.02707	link
2025-05-05	AI Standardized Patient Improves Human Conversations in Advanced Cancer Care	Kurtis Haut et.al.	2505.02694	link
2025-05-05	Predicting Movie Hits Before They Happen with LLMs	Shaghayegh Agah et.al.	2505.02693	null
2025-05-02	How Effective are Large Time Series Models in Hydrology? A Study on Water Level Forecasting in Everglades	Rahuul Rangaraj et.al.	2505.01415	null
2025-05-02	Dynamic Robot Tool Use with Vision Language Models	Noah Trupin et.al.	2505.01399	null
2025-05-02	FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors	Chenxi Li et.al.	2505.01322	null
2025-05-02	Helping Big Language Models Protect Themselves: An Enhanced Filtering and Summarization System	Sheikh Samit Muhaimin et.al.	2505.01315	null
2025-05-02	Enhancing SPARQL Query Rewriting for Complex Ontology Alignments	Anicet Lepetit Ondo et.al.	2505.01309	null
2025-05-02	Document Retrieval Augmented Fine-Tuning (DRAFT) for safety-critical software assessments	Regan Bolton et.al.	2505.01307	null
2025-05-02	FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing	Gaoxiang Cong et.al.	2505.01263	null
2025-05-02	Digital Pathway Curation (DPC): a comparative pipeline to assess the reproducibility, consensus and accuracy across Gemini, PubMed, and scientific reviewers in biomedical research	Flavio Lichtenstein et.al.	2505.01259	null
2025-05-02	Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging	Elena Mulero Ayllón et.al.	2505.01239	null
2025-05-02	CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning	Tsai-Ning Wang et.al.	2505.01199	null
2025-05-02	Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods	Mahdi Dhaini et.al.	2505.01198	link
2025-05-02	TSTMotion: Training-free Scene-aware Text-to-motion Generation	Ziyan Guo et.al.	2505.01182	null
2025-05-02	LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures	Francisco Aguilera-Martínez et.al.	2505.01177	null
2025-05-02	On the Limitations of Steering in Language Model Alignment	Chebrolu Niranjan et.al.	2505.01162	null
2025-05-02	Methodological Foundations for AI-Driven Survey Question Generation	Ted K. Mburu et.al.	2505.01150	null
2025-05-02	Retrieval-Augmented Generation in Biomedicine: A Survey of Technologies, Datasets, and Clinical Applications	Jiawei He et.al.	2505.01146	null
2025-05-02	MateICL: Mitigating Attention Dispersion in Large-Scale In-Context Learning	Murtadha Ahmed et.al.	2505.01110	null
2025-05-02	Self-Supervision Enhances Instance-based Multiple Instance Learning Methods in Digital Pathology: A Benchmark Study	Ali Mammadov et.al.	2505.01109	link
2025-05-02	Nesterov Method for Asynchronous Pipeline Parallel Optimization	Thalaiyasingam Ajanthan et.al.	2505.01099	link
2025-05-02	Evaluating Vision Language Model Adaptations for Radiology Report Generation in Low-Resource Languages	Marco Salmè et.al.	2505.01096	null
2025-05-01	T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT	Dongzhi Jiang et.al.	2505.00703	link
2025-05-01	Robotic Visual Instruction	Yanbang Li et.al.	2505.00693	null
2025-05-01	Visual Test-time Scaling for GUI Agent Grounding	Tiange Luo et.al.	2505.00684	link
2025-05-01	Steering Large Language Models with Register Analysis for Arbitrary Style Transfer	Xinchen Yang et.al.	2505.00679	null
2025-05-01	Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions	Yiming Du et.al.	2505.00675	link
2025-05-01	DeepCritic: Deliberate Critique with Large Language Models	Wenkai Yang et.al.	2505.00662	link
2025-05-01	On the generalization of language models from in-context learning and finetuning: a controlled study	Andrew K. Lampinen et.al.	2505.00661	null
2025-05-01	Large Language Models Understanding: an Inherent Ambiguity Barrier	Daniel N. Nissani et.al.	2505.00654	null
2025-05-01	Open-Source LLM-Driven Federated Transformer for Predictive IoV Management	Yazan Otoum et.al.	2505.00651	null
2025-05-01	Investigating Task Arithmetic for Zero-Shot Information Retrieval	Marco Braga et.al.	2505.00649	link
2025-05-01	Brain Foundation Models with Hypergraph Dynamic Adapter for Brain Disease Analysis	Zhongying Deng et.al.	2505.00627	null
2025-05-01	The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them)	Zihao Wang et.al.	2505.00626	null
2025-05-01	FineScope: Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation	Chaitali Bhattacharyya et.al.	2505.00624	null
2025-05-01	Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction	Simon Giebenhain et.al.	2505.00615	null
2025-05-01	Combining LLMs with Logic-Based Framework to Explain MCTS	Ziyan An et.al.	2505.00610	null
2025-05-01	Can LLMs Help Improve Analogical Reasoning For Strategic Decisions? Experimental Evidence from Humans and GPT-4	Phanish Puranam et.al.	2505.00603	null
2025-05-02	Fast and Low-Cost Genomic Foundation Models via Outlier Removal	Haozheng Luo et.al.	2505.00598	link
2025-05-01	Block Circulant Adapter for Large Language Models	Xinyu Ding et.al.	2505.00582	null
2025-05-01	Parameter-Efficient Fine-Tuning with Circulant and Diagonal Vectors	Xinyu Ding et.al.	2505.00580	null
2025-05-01	FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension	Jushi Kai et.al.	2505.00570	null
2025-04-30	TRUST: An LLM-Based Dialogue System for Trauma Understanding and Structured Assessments	Sichang Tu et.al.	2504.21851	null
2025-04-30	COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning	Xindi Wu et.al.	2504.21850	null
2025-04-30	Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization	Anas Anwarul Haq Khan et.al.	2504.21831	null
2025-04-30	Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields	Yixin Gao et.al.	2504.21814	null
2025-04-30	A simple and effective approach for body part recognition on CT scans based on projection estimation	Franko Hrzic et.al.	2504.21810	null
2025-04-30	An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding	Xiuwei Shang et.al.	2504.21803	null
2025-04-30	DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition	Z. Z. Ren et.al.	2504.21801	link
2025-04-30	SWE-smith: Scaling Data for Software Engineering Agents	John Yang et.al.	2504.21798	null
2025-04-30	MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness	Junsheng Huang et.al.	2504.21773	null
2025-04-30	LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs	Baleegh Ahmad et.al.	2504.21770	null
2025-04-30	LLM-based Interactive Imitation Learning for Robotic Manipulation	Jonas Werner et.al.	2504.21769	link
2025-04-30	Investigating Literary Motifs in Ancient and Medieval Novels with Large Language Models	Emelie Hallenberg et.al.	2504.21742	null
2025-04-30	TheraQuest: A Gamified, LLM-Powered Simulation for Massage Therapy Training	Shengqian Wang et.al.	2504.21735	null
2025-04-30	XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs	Marco Arazzi et.al.	2504.21700	null
2025-04-30	Visual Text Processing: A Comprehensive Review and Unified Evaluation	Yan Shu et.al.	2504.21682	link
2025-04-30	Hoist with His Own Petard: Inducing Guardrails to Facilitate Denial-of-Service Attacks on Retrieval-Augmented Generation of LLMs	Pan Suo et.al.	2504.21680	null
2025-04-30	Traceback of Poisoning Attacks to Retrieval-Augmented Generation	Baolei Zhang et.al.	2504.21668	null
2025-04-30	From Precision to Perception: User-Centred Evaluation of Keyword Extraction Algorithms for Internet-Scale Contextual Advertising	Jingwen Cai et.al.	2504.21667	null
2025-04-30	AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization	Haotian Luo et.al.	2504.21659	link
2025-04-30	Sadeed: Advancing Arabic Diacritization Through Small Language Model	Zeina Aldallal et.al.	2504.21635	null
2025-04-29	Toward Efficient Exploration by Large Language Model Agents	Dilip Arumugam et.al.	2504.20997	null
2025-04-29	X-Fusion: Introducing New Modality to Frozen Large Language Models	Sicheng Mo et.al.	2504.20996	null
2025-04-29	ACE: A Security Architecture for LLM-Integrated App Systems	Evan Li et.al.	2504.20984	null
2025-04-29	Real-Time Wayfinding Assistant for Blind and Low-Vision Users	Dabbrata Das et.al.	2504.20976	null
2025-04-29	SetKE: Knowledge Editing for Knowledge Elements Overlap	Yifan Wei et.al.	2504.20972	null
2025-04-29	OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification	Shangyu Li et.al.	2504.20964	link
2025-04-29	Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models	Maryna Vyshnyvetska et.al.	2504.20951	null
2025-04-29	Trace-of-Thought: Enhanced Arithmetic Problem Solving via Reasoning Distillation From Large to Small Language Models	Tyler McDonald et.al.	2504.20946	null
2025-04-29	ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification	Ziqing Fan et.al.	2504.20930	link
2025-04-29	An Empirical Study on the Capability of LLMs in Decomposing Bug Reports	Zhiyuan Chen et.al.	2504.20911	null
2025-04-29	Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers	Quentin Guimard et.al.	2504.20902	null
2025-04-29	LELANTE: LEveraging LLM for Automated ANdroid TEsting	Shamit Fatin et.al.	2504.20896	null
2025-04-29	FedMVP: Federated Multi-modal Visual Prompt Tuning for Vision-Language Models	Mainak Singha et.al.	2504.20860	null
2025-04-29	X-Cross: Dynamic Integration of Language Models for Cross-Domain Sequential Recommendation	Guy Hadad et.al.	2504.20859	null
2025-04-29	JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry	Anum Afzal et.al.	2504.20849	null
2025-04-29	Language Model for Large-Text Transmission in Noisy Quantum Communications	Yuqi Li et.al.	2504.20842	null
2025-04-29	Universal language model with the intervention of quantum theory	D.-F. Qin et.al.	2504.20839	null
2025-04-29	Enhancing Non-Core Language Instruction-Following in Speech LLMs via Semi-Implicit Cross-Lingual CoT Reasoning	Hongfei Xue et.al.	2504.20835	null
2025-04-29	Reinforcement Learning for LLM Reasoning Under Memory Constraints	Alan Lee et.al.	2504.20834	null
2025-04-30	Ascendra: Dynamic Request Prioritization for Efficient LLM Serving	Azam Ikram et.al.	2504.20828	null
2025-04-28	Learning Streaming Video Representation via Multitask Training	Yibin Yan et.al.	2504.20041	null
2025-04-28	AutoJudge: Judge Decoding Without Manual Annotation	Roman Garipov et.al.	2504.20039	null
2025-04-28	SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning	Wufei Ma et.al.	2504.20024	null
2025-04-28	Better To Ask in English? Evaluating Factual Accuracy of Multilingual LLMs in English and Low-Resource Languages	Pritika Rohera et.al.	2504.20022	null
2025-04-28	Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models	Xin Wang et.al.	2504.20020	null
2025-04-28	LLM-Generated Fake News Induces Truth Decay in News Ecosystem: A Case Study on Neural News Recommendation	Beizhe Hu et.al.	2504.20013	null
2025-04-28	Towards Automated Scoping of AI for Social Good Projects	Jacob Emmerson et.al.	2504.20010	null
2025-04-28	Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom	Rishika Sen et.al.	2504.20000	null
2025-04-28	HJRNO: Hamilton-Jacobi Reachability with Neural Operators	Yankai Li et.al.	2504.19989	null
2025-04-28	TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons	Emre Can Acikgoz et.al.	2504.19982	null
2025-04-28	Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets	Adam Younsi et.al.	2504.19981	null
2025-04-29	From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification	Junhao Ye et.al.	2504.19959	null
2025-04-28	Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI	Hugo Georgenthum et.al.	2504.19918	null
2025-04-28	Can AI Agents Design and Implement Drug Discovery Pipelines?	Khachik Smbatyan et.al.	2504.19912	null
2025-04-28	GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets	Mingqian He et.al.	2504.19898	null
2025-04-28	CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition	Quynh Phung et.al.	2504.19894	null
2025-04-28	semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage	Ke Hong et.al.	2504.19867	null
2025-04-28	CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback	Chenhan Jiang et.al.	2504.19860	null
2025-04-28	Efficient Domain-adaptive Continual Pretraining for the Process Industry in the German Language	Anastasia Zhukova et.al.	2504.19856	null
2025-04-29	The Automation Advantage in AI Red Teaming	Rob Mulla et.al.	2504.19855	null
2025-04-25	Generalization Capability for Imitation Learning	Yixiao Wang et.al.	2504.18538	null
2025-04-25	TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation	Gwen Yidou Weng et.al.	2504.18535	null
2025-04-25	Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation	Shivam Duggal et.al.	2504.18509	null
2025-04-25	Investigating Co-Constructive Behavior of Large Language Models in Explanation Dialogues	Leandra Fichtel et.al.	2504.18483	null
2025-04-25	Generative Induction of Dialogue Task Schemas with Streaming Refinement and Simulated Interactions	James D. Finch et.al.	2504.18474	null
2025-04-25	Fast-Slow Thinking for Large Vision-Language Model Reasoning	Wenyi Xiao et.al.	2504.18458	null
2025-04-25	Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training	Hiroki Naganuma et.al.	2504.18454	null
2025-04-25	Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation	Peiyuan Jing et.al.	2504.18453	null
2025-04-25	Kimi-Audio Technical Report	KimiTeam et.al.	2504.18425	link
2025-04-25	LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection	Rajesh Yarra et.al.	2504.18423	null
2025-04-25	BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs	Hongyu Wang et.al.	2504.18415	null
2025-04-25	An Empirical Study of Evaluating Long-form Question Answering	Ning Xian et.al.	2504.18413	link
2025-04-25	Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers	Jared Moore et.al.	2504.18412	link
2025-04-25	HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?	Yusen Zhang et.al.	2504.18406	null
2025-04-25	Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization	Kesen Zhao et.al.	2504.18397	link
2025-04-25	Bridge the Domains: Large Language Models Enhanced Cross-domain Sequential Recommendation	Qidong Liu et.al.	2504.18383	null
2025-04-25	Pushing the boundary on Natural Language Inference	Pablo Miralles-González et.al.	2504.18376	null
2025-04-25	Auto-SLURP: A Benchmark Dataset for Evaluating Multi-Agent Frameworks in Smart Personal Assistant	Lei Shen et.al.	2504.18373	link
2025-04-25	ThreMoLIA: Threat Modeling of Large Language Model-Integrated Applications	Felix Viktor Jedrzejewski et.al.	2504.18369	null
2025-04-25	Testing Individual Fairness in Graph Neural Networks	Roya Nasiri et.al.	2504.18353	null
2025-04-24	Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models	Xu Ma et.al.	2504.17789	null
2025-04-24	Replay to Remember: Retaining Domain Knowledge in Streaming Language Models	Sneh Pillai et.al.	2504.17780	null
2025-04-24	Conversational Assistants to support Heart Failure Patients: comparing a Neurosymbolic Architecture with ChatGPT	Anuja Tayal et.al.	2504.17753	null
2025-04-24	Towards Robust LLMs: an Adversarial Robustness Measurement Framework	Natan Levy et.al.	2504.17723	null
2025-04-24	Multilingual Performance Biases of Large Language Models in Education	Vansh Gupta et.al.	2504.17720	null
2025-04-24	PICO: Reconstructing 3D People In Contact with Objects	Alpár Cseke et.al.	2504.17695	null
2025-04-24	Ensemble Bayesian Inference: Leveraging Small Language Models to Achieve LLM-level Accuracy in Profile Matching Tasks	Haru-Tada Sato et.al.	2504.17685	null
2025-04-24	INSIGHT: Bridging the Student-Teacher Gap in Times of Large Language Models	Jarne Thys et.al.	2504.17677	null
2025-04-24	Energy Considerations of Large Language Model Inference and Efficiency Optimizations	Jared Fernandez et.al.	2504.17674	null
2025-04-24	Cross-region Model Training with Communication-Computation Overlapping and Delay Compensation	Ying Zhu et.al.	2504.17672	null
2025-04-25	Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction	Yuanchang Ye et.al.	2504.17671	null
2025-04-24	Towards a HIPAA Compliant Agentic AI System in Healthcare	Subash Neupane et.al.	2504.17669	null
2025-04-24	Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics	Zena Al-Khalili et.al.	2504.17665	null
2025-04-24	Effortless, Simulation-Efficient Bayesian Inference using Tabular Foundation Models	Julius Vetter et.al.	2504.17660	null
2025-04-24	Portability of Optimizations from SC to TSO	Akshay Gopalakrishnan et.al.	2504.17646	null
2025-04-24	L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference	Qingyuan Liu et.al.	2504.17584	null
2025-04-25	DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training	Xiaoyu Tian et.al.	2504.17565	null
2025-04-24	When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars	Rei Higuchi et.al.	2504.17562	null
2025-04-24	HalluLens: LLM Hallucination Benchmark	Yejin Bang et.al.	2504.17550	null
2025-04-24	A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task	Jiaqi Deng et.al.	2504.17547	null
2025-04-23	Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light	Ali Hassani et.al.	2504.16922	link
2025-04-23	IberBench: LLM Evaluation on Iberian Languages	José Ángel González et.al.	2504.16921	null
2025-04-23	Tracing Thought: Using Chain-of-Thought Reasoning to Identify the LLM Behind AI-Generated Text	Shifali Agrahari et.al.	2504.16913	null
2025-04-23	Do Large Language Models know who did what to whom?	Joseph M. Denning et.al.	2504.16884	null
2025-04-23	Enhancing Critical Thinking with AI: A Tailored Warning System for RAG Models	Xuyang Zhu et.al.	2504.16883	null
2025-04-23	Context-Enhanced Vulnerability Detection Based on Large Language Model	Yixin Yang et.al.	2504.16877	null
2025-04-23	Exploring How LLMs Capture and Represent Domain-Specific Knowledge	Mirian Hipolito Garcia et.al.	2504.16871	null
2025-04-23	Common Functional Decompositions Can Mis-attribute Differences in Outcomes Between Populations	Manuel Quintero et.al.	2504.16864	null
2025-04-23	Planning with Diffusion Models for Target-Oriented Dialogue Systems	Hanwen Du et.al.	2504.16858	null
2025-04-23	Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification	Alexander Shvets et.al.	2504.16856	null
2025-04-23	Monte Carlo Planning with Large Language Model for Text-Based Game Agents	Zijing Shi et.al.	2504.16855	null
2025-04-23	Improving Significant Wave Height Prediction Using Chronos Models	Yilin Zhai et.al.	2504.16834	null
2025-04-23	LRASGen: LLM-based RESTful API Specification Generation	Sida Deng et.al.	2504.16833	null
2025-04-23	GreenMind: A Next-Generation Vietnamese Large Language Model for Structured and Logical Reasoning	Luu Quy Tung et.al.	2504.16832	null
2025-04-23	Decoupled Global-Local Alignment for Improving Compositional Understanding	Xiaoxing Hu et.al.	2504.16801	null
2025-04-23	MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores	Fengwei Zhou et.al.	2504.16786	null
2025-04-23	Graph2Nav: 3D Object-Relation Graph Generation to Robot Navigation	Tixiao Shan et.al.	2504.16782	null
2025-04-23	How Effective are Generative Large Language Models in Performing Requirements Classification?	Waad Alhoshan et.al.	2504.16768	null
2025-04-23	Lightweight Latent Verifiers for Efficient Meta-Generation Strategies	Bartosz Piotrowski et.al.	2504.16760	null
2025-04-23	HEMA: A Hippocampus-Inspired Extended Memory Architecture for Long-Context AI Conversations	Kwangseob Ahn et.al.	2504.16754	null
2025-04-22	TTRL: Test-Time Reinforcement Learning	Yuxin Zuo et.al.	2504.16084	link
2025-04-22	MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention	Yucheng Li et.al.	2504.16083	null
2025-04-22	MR. Video: “MapReduce” is the Principle for Long Video Understanding	Ziqi Pang et.al.	2504.16082	null
2025-04-22	From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning	Le Zhuo et.al.	2504.16080	null
2025-04-22	LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities	Thomas Schmied et.al.	2504.16078	null
2025-04-22	PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models	Shi Qiu et.al.	2504.16074	null
2025-04-22	Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation	Zhiyuan Hu et.al.	2504.16073	null
2025-04-22	Describe Anything: Detailed Localized Image and Video Captioning	Long Lian et.al.	2504.16072	null
2025-04-22	A Python Tool for Reconstructing Full News Text from GDELT	A. Fronzetti Colladon et.al.	2504.16063	link
2025-04-22	Vision language models are unreliable at trivial spatial cognition	Sangeet Khemlani et.al.	2504.16061	null
2025-04-22	Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation	Ziqiao Ma et.al.	2504.16060	link
2025-04-22	Automated Static Vulnerability Detection via a Holistic Neuro-symbolic Approach	Penghui Li et.al.	2504.16057	null
2025-04-22	Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability	Daniel Hendriks et.al.	2504.16056	null
2025-04-22	LongMamba: Enhancing Mamba’s Long Context Capabilities via Training-Free Receptive Field Enlargement	Zhifan Ye et.al.	2504.16053	link
2025-04-22	Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis	Frank Li et.al.	2504.16047	null
2025-04-22	Certified Mitigation of Worst-Case LLM Copyright Infringement	Jingyu Zhang et.al.	2504.16046	null
2025-04-22	LLMs meet Federated Learning for Scalable and Secure IoT Management	Yazan Otoum et.al.	2504.16032	null
2025-04-22	LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale	Joya Chen et.al.	2504.16030	null
2025-04-22	Benchmarking LLM for Code Smells Detection: OpenAI GPT-4.0 vs DeepSeek-V3	Ahmed R. Sadik et.al.	2504.16027	null
2025-04-22	Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework	Xinyuan Song et.al.	2504.16016	null
2025-04-21	Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs	Chun-Hsiao Yeh et.al.	2504.15280	link
2025-04-21	VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models	Weiye Xu et.al.	2504.15279	null
2025-04-21	Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning	Jie Cheng et.al.	2504.15275	link
2025-04-21	Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models	Guo Chen et.al.	2504.15271	null
2025-04-21	Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction	Vaishnavh Nagarajan et.al.	2504.15266	link
2025-04-21	Interpretable Locomotion Prediction in Construction Using a Memory-Driven LLM Agent With Chain-of-Thought Reasoning	Ehsan Ahmadi et.al.	2504.15263	null
2025-04-21	Leveraging Language Models for Automated Patient Record Linkage	Mohammad Beheshti et.al.	2504.15261	null
2025-04-21	CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation	Anirudh Khatry et.al.	2504.15254	link
2025-04-21	Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators	Yilun Zhou et.al.	2504.15253	link
2025-04-21	MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning	Yahan Yang et.al.	2504.15241	null
2025-04-21	Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions	Saffron Huang et.al.	2504.15236	null
2025-04-21	A Self-Improving Coding Agent	Maxime Robeyns et.al.	2504.15228	null
2025-04-21	EvalAgent: Discovering Implicit Evaluation Criteria from the Web	Manya Wadhwa et.al.	2504.15219	null
2025-04-21	Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs	Marina Sakharova et.al.	2504.15210	null
2025-04-21	Compute-Optimal LLMs Provably Generalize Better With Scale	Marc Finzi et.al.	2504.15208	null
2025-04-21	Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges	Nandan Thakur et.al.	2504.15205	null
2025-04-22	Synergistic Weak-Strong Collaboration by Aligning Preferences	Yizhu Jiao et.al.	2504.15188	null
2025-04-21	DSPO: Direct Semantic Preference Optimization for Real-World Image Super-Resolution	Miaomiao Cai et.al.	2504.15176	null
2025-04-21	The Synthetic Imputation Approach: Generating Optimal Synthetic Texts For Underrepresented Categories In Supervised Classification Tasks	Joan C. Timoneda et.al.	2504.15160	null
2025-04-21	KGMEL: Knowledge Graph-Enhanced Multimodal Entity Linking	Juyeon Kim et.al.	2504.15135	link
2025-04-18	Generative AI Act II: Test Time Scaling Drives Cognition Engineering	Shijie Xia et.al.	2504.13828	link
2025-04-18	Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models	Junjie Yang et.al.	2504.13825	null
2025-04-18	CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning	Yang Yue et.al.	2504.13820	link
2025-04-18	Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning	Yixuan Even Xu et.al.	2504.13818	null
2025-04-18	BadApex: Backdoor Attack Based on Adaptive Optimization Mechanism of Black-box Large Language Models	Zhengxian Wu et.al.	2504.13775	null
2025-04-18	DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs	Tamim Al Mahmud et.al.	2504.13774	link
2025-04-18	Detecting Malicious Source Code in PyPI Packages with LLMs: Does RAG Come in Handy?	Motunrayo Ibiyo et.al.	2504.13769	null
2025-04-18	Decoding Vision Transformers: the Diffusion Steering Lens	Ryota Takatsuki et.al.	2504.13763	link
2025-04-18	Scaling sparse feature circuit finding for in-context learning	Dmitrii Kharlapenko et.al.	2504.13756	null
2025-04-18	Learning to Attribute with Attention	Benjamin Cohen-Wang et.al.	2504.13752	link
2025-04-18	Controlled Territory and Conflict Tracking (CONTACT): (Geo-)Mapping Occupied Territory from Open Source Intelligence	Paul K. Mandal et.al.	2504.13730	link
2025-04-18	OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation	Yichen Wu et.al.	2504.13707	null
2025-04-18	Exploring Multimodal Prompt for Visualization Authoring with Large Language Models	Zhen Wen et.al.	2504.13700	null
2025-04-18	Analysing the Robustness of Vision-Language-Models to Common Corruptions	Muhammad Usama et.al.	2504.13690	null
2025-04-18	Intelligent Interaction Strategies for Context-Aware Cognitive Augmentation	Xiangrong et.al.	2504.13684	null
2025-04-18	Revisiting Uncertainty Quantification Evaluation in Language Models: Spurious Interactions with Response Length Bias Results	Andrea Santilli et.al.	2504.13677	null
2025-04-18	Large Language Models Will Change The Way Children Think About Technology And Impact Every Interaction Paradigm	Russell Beale et.al.	2504.13667	null
2025-04-18	Do Prompt Patterns Affect Code Quality? A First Empirical Assessment of ChatGPT-Generated Code	Antonio Della Porta et.al.	2504.13656	null
2025-04-18	EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model	Sijing Li et.al.	2504.13650	link
2025-04-18	Exploring the Potential for Large Language Models to Demonstrate Rational Probabilistic Beliefs	Gabriel Freedman et.al.	2504.13644	link
2025-04-17	Perception Encoder: The best visual embeddings are not at the output of the network	Daniel Bolya et.al.	2504.13181	null
2025-04-17	PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding	Jang Hyun Cho et.al.	2504.13180	link
2025-04-17	It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization	Ali Behrouz et.al.	2504.13173	null
2025-04-17	Sleep-time Compute: Beyond Inference Scaling at Test-time	Kevin Lin et.al.	2504.13171	link
2025-04-17	Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling	Tsung-Han Wu et.al.	2504.13169	link
2025-04-17	CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training	Shizhe Diao et.al.	2504.13161	null
2025-04-17	Digital Twin Generation from Visual Data: A Survey	Andrew Melnik et.al.	2504.13159	link
2025-04-17	MIB: A Mechanistic Interpretability Benchmark	Aaron Mueller et.al.	2504.13151	link
2025-04-17	Exploring Expert Failures Improves LLM Agent Tuning	Li-Cheng Lan et.al.	2504.13145	null
2025-04-17	Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo	João Loula et.al.	2504.13139	null
2025-04-17	Energy-Based Reward Models for Robust Language Model Alignment	Anamika Lochab et.al.	2504.13134	link
2025-04-17	LLMs Meet Finance: Fine-Tuning Foundation Models for the Open FinLLM Leaderboard	Varun Rao et.al.	2504.13125	null
2025-04-17	Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training	Xinsong Zhang et.al.	2504.13123	null
2025-04-17	VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models	Haojian Huang et.al.	2504.13122	link
2025-04-17	Probing and Inducing Combinational Creativity in Vision-Language Models	Yongqian Peng et.al.	2504.13120	null
2025-04-17	Object-Driven Narrative in AR: A Scenario-Metaphor Framework with VLM Integration	Yusi Sun et.al.	2504.13119	null
2025-04-17	Uncertainty-Aware Trajectory Prediction via Rule-Regularized Heteroscedastic Deep Classification	Kumar Manas et.al.	2504.13111	link
2025-04-17	EventVAD: Training-Free Event-Aware Video Anomaly Detection	Yihua Shao et.al.	2504.13092	null
2025-04-17	Retrieval-Augmented Generation with Conflicting Evidence	Han Wang et.al.	2504.13079	link
2025-04-18	SkyReels-V2: Infinite-length Film Generative Model	Guibin Chen et.al.	2504.13074	link
2025-04-16	BitNet b1.58 2B4T Technical Report	Shuming Ma et.al.	2504.12285	null
2025-04-16	HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks	Stefan Abi-Karam et.al.	2504.12268	link
2025-04-16	FLIP Reasoning Challenge	Andreas Plesner et.al.	2504.12256	link
2025-04-16	AnomalyGen: An Automated Semantic Log Sequence Generation Framework with LLM for Anomaly Detection	Xinyu Li et.al.	2504.12250	null
2025-04-16	MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models	Hang Yuan et.al.	2504.12234	null
2025-04-16	Watermarking Needs Input Repetition Masking	David Khachaturov et.al.	2504.12229	null
2025-04-16	d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning	Siyan Zhao et.al.	2504.12216	null
2025-04-16	What Do Large Language Models Know? Tacit Knowledge as a Potential Causal-Explanatory Structure	Céline Budding et.al.	2504.12187	null
2025-04-16	SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented Data	Suyoung Bae et.al.	2504.12185	null
2025-04-16	Trusting CHATGPT: how minor tweaks in the prompts lead to major differences in sentiment classification	Jaime E. Cuellar et.al.	2504.12180	null
2025-04-16	Multilingual Contextualization of Large Language Models for Document-Level Machine Translation	Miguel Moura Ramos et.al.	2504.12140	null
2025-04-16	Efficient Contrastive Decoding with Probabilistic Hallucination Detection - Mitigating Hallucinations in Large Vision Language Models -	Laura Fieback et.al.	2504.12137	null
2025-04-16	Clarifying Ambiguities: on the Role of Ambiguity Types in Prompting Methods for Clarification Generation	Anfu Tang et.al.	2504.12113	null
2025-04-16	Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation	Shizhan Cai et.al.	2504.12108	null
2025-04-16	Logits DeConfusion with CLIP for Few-Shot Learning	Shuo Li et.al.	2504.12104	link
2025-04-16	Gauging Overprecision in LLMs: An Empirical Study	Adil Bahaj et.al.	2504.12098	null
2025-04-16	Reasoning-Based AI for Startup Evaluation (R.A.I.S.E.): A Memory-Augmented, Multi-Step Decision Framework	Jack Preuveneers et.al.	2504.12090	null
2025-04-16	Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization	Pritam Sarkar et.al.	2504.12083	null
2025-04-16	Selective Demonstration Retrieval for Improved Implicit Hate Speech Detection	Yumin Kim et.al.	2504.12082	null
2025-04-16	Subitizing-Inspired Large Language Models for Floorplanning	Shao-Chien Lu et.al.	2504.12076	null
2025-04-16	Elucidating the Design Space of Multimodal Protein Language Models	Cheng-Yen Hsieh et.al.	2504.11454	null
2025-04-15	TextArena	Leon Guertler et.al.	2504.11442	link
2025-04-15	Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models	Maria Teleki et.al.	2504.11431	link
2025-04-15	A Dual-Space Framework for General Knowledge Distillation of Large Language Models	Xue Zhang et.al.	2504.11426	null
2025-04-15	Reinforcing Compositional Retrieval: Retrieving Step-by-Step for Composing Informative Contexts	Quanyu Long et.al.	2504.11420	null
2025-04-15	Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning	Ali Taghibakhshi et.al.	2504.11409	null
2025-04-15	DataDecide: How to Predict Best Pretraining Data with Small Experiments	Ian Magnusson et.al.	2504.11393	null
2025-04-15	RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models	Juan Diego Rodriguez et.al.	2504.11381	link
2025-04-15	Cancer-Myth: Evaluating AI Chatbot on Patient Questions with False Presuppositions	Wang Bill Zhu et.al.	2504.11373	link
2025-04-15	OpenTuringBench: An Open-Model-based Benchmark and Framework for Machine-Generated Text Detection and Attribution	Lucio La Cava et.al.	2504.11369	null
2025-04-15	From Gaze to Insight: Bridging Human Visual Attention and Vision Language Model Explanation for Weakly-Supervised Medical Image Segmentation	Jingkun Chen et.al.	2504.11368	null
2025-04-15	Teaching Large Language Models to Reason through Learning and Forgetting	Tianwei Ni et.al.	2504.11364	link
2025-04-15	Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning	Haiming Wang et.al.	2504.11354	link
2025-04-15	Seedream 3.0 Technical Report	Yu Gao et.al.	2504.11346	null
2025-04-15	A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce	Wei Xiong et.al.	2504.11343	link
2025-04-15	REWARD CONSISTENCY: Improving Multi-Objective Alignment from a Data-Centric Perspective	Zhihao Xu et.al.	2504.11337	null
2025-04-15	Looking beyond the next token	Abitha Thankaraj et.al.	2504.11336	null
2025-04-15	Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints	Ruicheng Ao et.al.	2504.11320	link
2025-04-15	Learning to Be A Doctor: Searching for Effective Medical Agent Architectures	Yangyang Zhuang et.al.	2504.11301	null
2025-04-15	Automated Python Translation	Joshua Otten et.al.	2504.11290	null
2025-04-14	InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models	Jinguo Zhu et.al.	2504.10479	link
2025-04-14	Weight Ensembling Improves Reasoning in Language Models	Xingyu Dang et.al.	2504.10478	null
2025-04-14	MIEB: Massive Image Embedding Benchmark	Chenghao Xiao et.al.	2504.10471	link
2025-04-14	Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding	Tao Zhang et.al.	2504.10465	link
2025-04-14	The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer	Weixian Lei et.al.	2504.10462	link
2025-04-14	GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents	Xiaobo Xia et.al.	2504.10458	null
2025-04-14	M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models	Junxiong Wang et.al.	2504.10449	link
2025-04-14	Multimodal Long Video Modeling Based on Temporal Dynamic Context	Haoran Hao et.al.	2504.10443	link
2025-04-14	LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models	Minqian Liu et.al.	2504.10430	null
2025-04-14	Foundation models for electronic health records: representation dynamics and transferability	Michael C. Burkhart et.al.	2504.10422	link
2025-04-14	Can We Edit LLMs for Long-Tail Biomedical Knowledge?	Xinhao Yi et.al.	2504.10421	link
2025-04-15	Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA	Michał Turski et.al.	2504.10419	link
2025-04-14	CliniChat: A Multi-Source Knowledge-Driven Framework for Clinical Interview Dialogue Reconstruction and Evaluation	Jing Chen et.al.	2504.10418	null
2025-04-14	LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models	Parshin Shojaee et.al.	2504.10415	link
2025-04-14	Performance of Large Language Models in Supporting Medical Diagnosis and Treatment	Diogo Sousa et.al.	2504.10405	null
2025-04-14	Satellite Federated Fine-Tuning for Foundation Models in Space Computing Power Networks	Yan Zhu et.al.	2504.10403	null
2025-04-14	Can LLMs Assist Expert Elicitation for Probabilistic Causal Modeling?	Olha Shaposhnyk et.al.	2504.10397	null
2025-04-14	SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning	Yiting Wang et.al.	2504.10369	null
2025-04-14	DICE: A Framework for Dimensional and Contextual Evaluation of Language Models	Aryan Shrivastava et.al.	2504.10359	null
2025-04-14	Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis	Yifan Yang et.al.	2504.10352	null
2025-04-11	Quantum Large Language Model Fine-Tuning	Sang Hyub Kim et.al.	2504.08732	null
2025-04-11	DocAgent: A Multi-Agent System for Automated Code Documentation Generation	Dayu Yang et.al.	2504.08725	link
2025-04-11	SWAN-GPT: An Efficient and Scalable Approach for Long-Context Language Modeling	Krishna C. Puvvada et.al.	2504.08719	null
2025-04-11	SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents	Muhammad Shihab Rashid et.al.	2504.08703	link
2025-04-11	Large Language Models as Span Annotators	Zdeněk Kasner et.al.	2504.08697	null
2025-04-11	TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning	Hang Ni et.al.	2504.08694	null
2025-04-11	Fast-Slow-Thinking: Complex Task Solving with Large Language Models	Yiliu Sun et.al.	2504.08690	null
2025-04-11	Voice Interaction With Conversational AI Could Facilitate Thoughtful Reflection and Substantive Revision in Writing	Jiho Kim et.al.	2504.08687	null
2025-04-11	Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model	Team Seawead et.al.	2504.08685	null
2025-04-11	Variability-Driven User-Story Generation using LLM and Triadic Concept Analysis	Alexandre Bazin et.al.	2504.08666	null
2025-04-11	Quality evaluation of Tabby coding assistant using real source code snippets	Marta Borek et.al.	2504.08650	link
2025-04-11	Do LLMs trust AI regulation? Emerging behaviour of game-theoretic LLM agents	Alessio Buscemi et.al.	2504.08640	null
2025-04-11	Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging	Gabriele Lozupone et.al.	2504.08635	link
2025-04-11	MooseAgent: A LLM Based Multi-agent Framework for Automating Moose Simulation	Tao Zhang et.al.	2504.08621	link
2025-04-11	Analyzing 16,193 LLM Papers for Fun and Profits	Zhiqiu Xia et.al.	2504.08619	null
2025-04-11	Playpen: An Environment for Exploring Learning Through Conversational Interaction	Nicola Horst et.al.	2504.08590	link
2025-04-11	AstroLLaVA: towards the unification of astronomical data and natural language	Sharaf Zaman et.al.	2504.08583	null
2025-04-11	UoB-NLP at SemEval-2025 Task 11: Leveraging Adapters for Multilingual and Cross-Lingual Emotion Detection	Frances Laureano De Leon et.al.	2504.08543	null
2025-04-11	Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions	Tommaso Galliena et.al.	2504.08531	null
2025-04-11	On The Landscape of Spoken Language Models: A Comprehensive Survey	Siddhant Arora et.al.	2504.08528	null
2025-04-10	Cat, Rat, Meow: On the Alignment of Language Model and Human Term-Similarity Judgments	Lorenz Linhardt et.al.	2504.07965	null
2025-04-10	C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing	Zhongyang Li et.al.	2504.07964	link
2025-04-10	GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation	Lang Lin et.al.	2504.07962	null
2025-04-10	Detect Anything 3D in the Wild	Hanxue Zhang et.al.	2504.07958	link
2025-04-10	MM-IFEngine: Towards Multimodal Instruction Following	Shengyuan Ding et.al.	2504.07957	link
2025-04-10	VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning	Yukun Qi et.al.	2504.07956	null
2025-04-10	Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory	Mirac Suzgun et.al.	2504.07952	link
2025-04-10	We Are All Creators: Generative AI, Collective Knowledge, and the Path Towards Human-AI Synergy	Jordi Linares-Pellicer et.al.	2504.07936	null
2025-04-10	Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining	Rosie Zhao et.al.	2504.07912	link
2025-04-10	Porting an LLM based Application from ChatGPT to an On-Premise Environment	Teemu Paloniemi et.al.	2504.07907	null
2025-04-10	Redefining Machine Translation on Social Network Services with Large Language Models	Hongcheng Guo et.al.	2504.07901	link
2025-04-10	How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective	Qi Liu et.al.	2504.07898	link
2025-04-10	Fast Adaptation with Behavioral Foundation Models	Harshit Sikchi et.al.	2504.07896	null
2025-04-10	Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge	Riccardo Cantini et.al.	2504.07887	link
2025-04-11	An LLM-Driven Multi-Agent Debate System for Mendelian Diseases	Xinyang Zhou et.al.	2504.07881	null
2025-04-10	Token Level Routing Inference System for Edge Devices	Jianshu She et.al.	2504.07878	null
2025-04-10	SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos	Joshua Li et.al.	2504.07867	null
2025-04-11	Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs	Yichun Yin et.al.	2504.07866	null
2025-04-10	Robust Hallucination Detection in LLMs via Adaptive Token Selection	Mengjia Niu et.al.	2504.07863	null
2025-04-10	2D-Curri-DPO: Two-Dimensional Curriculum Learning for Direct Preference Optimization	Mengyang Li et.al.	2504.07856	null
2025-04-09	Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning	Nikhil Shivakumar Nayak et.al.	2504.07097	link
2025-04-09	OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens	Jiacheng Liu et.al.	2504.07096	null
2025-04-09	Are We Done with Object-Centric Learning?	Alexander Rubinstein et.al.	2504.07092	link
2025-04-09	KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs	Elan Markowitz et.al.	2504.07087	null
2025-04-09	A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility	Andreas Hochlehnert et.al.	2504.07086	null
2025-04-09	Self-Steering Language Models	Gabriel Grand et.al.	2504.07081	null
2025-04-09	DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning	Atharva Pandey et.al.	2504.07080	null
2025-04-09	Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation	Israfel Salazar et.al.	2504.07072	null
2025-04-09	A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models	Zhouhang Xie et.al.	2504.07070	null
2025-04-09	HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification	Bibek Paudel et.al.	2504.07069	null
2025-04-09	Teaching pathology foundation models to accurately predict gene expression with parameter efficient knowledge transfer	Shi Pan et.al.	2504.07061	null
2025-04-09	TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling	Liang-Hsuan Tseng et.al.	2504.07053	link
2025-04-09	To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning	Tian Qin et.al.	2504.07052	null
2025-04-09	Evaluating Retrieval Augmented Generative Models for Document Queries in Transportation Safety	Chad Melton et.al.	2504.07022	null
2025-04-09	LLM-IFT: LLM-Powered Information Flow Tracking for Secure Hardware	Nowfel Mashnoor et.al.	2504.07015	null
2025-04-09	Towards LLMs Robustness to Changes in Prompt Format Styles	Lilian Ngweta et.al.	2504.06969	null
2025-04-09	Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation	Thomas Kerdreux et.al.	2504.06962	null
2025-04-09	VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning	Xinhao Li et.al.	2504.06958	null
2025-04-09	Adaptive Computation Pruning for the Forgetting Transformer	Zhixuan Lin et.al.	2504.06949	null
2025-04-09	RuOpinionNE-2024: Extraction of Opinion Tuples from Russian News Texts	Natalia Loukachevitch et.al.	2504.06947	link
2025-04-08	GOLLuM: Gaussian Process Optimized LLMs – Reframing LLM Finetuning through Bayesian Optimization	Bojana Ranković et.al.	2504.06265	link
2025-04-08	OmniSVG: A Unified Scalable Vector Graphics Generation Model	Yiying Yang et.al.	2504.06263	null
2025-04-08	Hogwild! Inference: Parallel LLM Generation via Concurrent Attention	Gleb Rodionov et.al.	2504.06261	link
2025-04-08	FEABench: Evaluating Language Models on Multiphysics Reasoning Ability	Nayantara Mudur et.al.	2504.06260	link
2025-04-08	Orb-v3: atomistic simulation at scale	Benjamin Rhodes et.al.	2504.06231	link
2025-04-08	LExT: Towards Evaluating Trustworthiness of Natural Language Explanations	Krithi Shailya et.al.	2504.06227	null
2025-04-08	Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation	Biao Zhang et.al.	2504.06225	null
2025-04-09	Earth-Adapter: Bridge the Geospatial Domain Gaps with Mixture of Frequency Adaptation	Xiaoxing Hu et.al.	2504.06220	link
2025-04-08	Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs	Dongyang Fan et.al.	2504.06219	null
2025-04-08	From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models	Chejian Xu et.al.	2504.06214	null
2025-04-08	TxGemma: Efficient and Agentic LLMs for Therapeutics	Eric Wang et.al.	2504.06196	null
2025-04-08	A Self-Supervised Framework for Space Object Behaviour Characterisation	Ian Groves et.al.	2504.06176	null
2025-04-08	Assessing how hyperparameters impact Large Language Models’ sarcasm detection performance	Montgomery Gole et.al.	2504.06166	null
2025-04-09	Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups	Rijul Magu et.al.	2504.06160	null
2025-04-08	A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning	Akash Kumar et.al.	2504.06153	null
2025-04-08	V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models	Xiangxi Zheng et.al.	2504.06148	link
2025-04-08	ARLO: A Tailorable Approach for Transforming Natural Language Software Requirements into Architecture using LLMs	Tooraj Helmi et.al.	2504.06143	null
2025-04-08	Adversarial Training of Reward Models	Alexander Bukharin et.al.	2504.06141	null
2025-04-08	A Multimedia Analytics Model for the Foundation Model Era	Marcel Worring et.al.	2504.06138	null
2025-04-08	QGen Studio: An Adaptive Question-Answer Generation, Training and Evaluation Platform	Movina Moses et.al.	2504.06136	null
2025-04-07	URECA: Unique Region Caption Anything	Sangbeom Lim et.al.	2504.05305	null
2025-04-07	InteractVLM: 3D Interaction Reasoning from 2D Foundational Models	Sai Kumar Dwivedi et.al.	2504.05303	link
2025-04-07	SmolVLM: Redefining small and efficient multimodal models	Andrés Marafioti et.al.	2504.05299	null
2025-04-07	Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations	Pedro Ferreira et.al.	2504.05294	null
2025-04-07	The challenge of uncertainty quantification of large language models in medicine	Zahra Atf et.al.	2504.05278	null
2025-04-07	Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation	Yucheng Chu et.al.	2504.05276	null
2025-04-07	Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models	Yang Yan et.al.	2504.05262	null
2025-04-07	Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models	Adrián Bazaga et.al.	2504.05258	null
2025-04-07	Explaining Low Perception Model Competency with High-Competency Counterfactuals	Sara Pohland et.al.	2504.05254	null
2025-04-07	LLM-based Automated Grading with Human-in-the-Loop	Hang Li et.al.	2504.05239	null
2025-04-07	NoveltyBench: Evaluating Creativity and Diversity in Language Models	Yiming Zhang et.al.	2504.05228	null
2025-04-07	A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text?	Julio Silva-Rodríguez et.al.	2504.05227	null
2025-04-07	Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation	Jiaming Chen et.al.	2504.05225	link
2025-04-08	Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG	Hengran Zhang et.al.	2504.05220	null
2025-04-07	Unleashing the Power of LLMs in Dense Retrieval with Query Likelihood Modeling	Hengran Zhang et.al.	2504.05216	null
2025-04-07	Post-Training Language Models for Continual Relation Extraction	Sefika Efeoglu et.al.	2504.05214	null
2025-04-07	Quantum Program Linting with LLMs: Emerging Results from a Comparative Study	Seung Yeob Shin et.al.	2504.05204	null
2025-04-07	Training state-of-the-art pathology foundation models with orders of magnitude less data	Mikhail Karasikov et.al.	2504.05186	null
2025-04-07	Concise Reasoning via Reinforcement Learning	Mehdi Fatemi et.al.	2504.05185	link
2025-04-07	BRIDGES: Bridging Graph Modality and Large Language Models within EDA Tasks	Wei Li et.al.	2504.05180	null
2025-04-04	Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions	Ting-Hsuan Liao et.al.	2504.03639	null
2025-04-04	Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning	Xinyi Wang et.al.	2504.03635	null
2025-04-04	Align to Structure: Aligning Large Language Models with Structural Information	Zae Myung Kim et.al.	2504.03622	null
2025-04-04	VISTA-OCR: Towards generative and interactive end to end OCR models	Laziz Hamdi et.al.	2504.03621	null
2025-04-04	Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task	Leonardo Ranaldi et.al.	2504.03616	null
2025-04-04	AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset	Bingxiang He et.al.	2504.03612	null
2025-04-04	MedSAM2: Segment Anything in 3D Medical Images and Videos	Jun Ma et.al.	2504.03600	link
2025-04-04	EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline	Peter Baile Chen et.al.	2504.03598	null
2025-04-04	PF3Det: A Prompted Foundation Feature Assisted Visual LiDAR 3D Detector	Kaidong Li et.al.	2504.03563	null
2025-04-04	Agentic Knowledgeable Self-awareness	Shuofei Qiao et.al.	2504.03553	link
2025-04-04	RANa: Retrieval-Augmented Navigation	Gianluca Monaci et.al.	2504.03524	null
2025-04-04	Neutralizing the Narrative: AI-Powered Debiasing of Online News Articles	Chen Wei Kuo et.al.	2504.03520	null
2025-04-04	SpectR: Dynamically Composing LM Experts with Spectral Routing	William Fleshman et.al.	2504.03454	null
2025-04-04	Optimizing Specific and Shared Parameters for Efficient Parameter Tuning	Van-Anh Nguyen et.al.	2504.03450	null
2025-04-04	LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications	Botao Zhu et.al.	2504.03444	null
2025-04-04	Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models	Mirko Borszukovszki et.al.	2504.03440	null
2025-04-04	Locations of Characters in Narratives: Andersen and Persuasion Datasets	Batuhan Ozyurt et.al.	2504.03434	link
2025-04-04	Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning	Sanghwan Bae et.al.	2504.03380	null
2025-04-04	MultiClear: Multimodal Soft Exoskeleton Glove for Transparent Object Grasping Assistance	Chen Hu et.al.	2504.03379	null
2025-04-04	Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency	Erik Johannes Husom et.al.	2504.03360	null
2025-04-03	STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection	Divya Velayudhan et.al.	2504.02823	null
2025-04-03	Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models	Mateusz Pach et.al.	2504.02821	link
2025-04-03	Generative Evaluation of Complex Reasoning in Large Language Models	Haowei Lin et.al.	2504.02810	link
2025-04-03	MegaMath: Pushing the Limits of Open Math Corpora	Fan Zhou et.al.	2504.02807	link
2025-04-03	F-ViTA: Foundation Model Guided Visible to Thermal Translation	Jay N. Paranjape et.al.	2504.02801	link
2025-04-04	A Survey of Large Language Models in Mental Health Disorder Detection on Social Media	Zhuohan Ge et.al.	2504.02800	null
2025-04-03	Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence	Anita Rau et.al.	2504.02799	null
2025-04-03	A Framework for Situating Innovations, Opportunities, and Challenges in Advancing Vertical Systems with Large AI Models	Gaurav Verma et.al.	2504.02793	null
2025-04-03	Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets	Chuning Zhu et.al.	2504.02792	null
2025-04-03	A Framework for Robust Cognitive Evaluation of LLMs	Karin de Langis et.al.	2504.02789	null
2025-04-03	From Consumption to Collaboration: Measuring Interaction Patterns to Augment Human Cognition in Open-Ended Tasks	Joshua Holstein et.al.	2504.02780	null
2025-04-03	BT-ACTION: A Test-Driven Approach for Modular Understanding of User Instruction Leveraging Behaviour Trees and LLMs	Alexander Leszczynski et.al.	2504.02779	link
2025-04-03	How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices?	Andres Algaba et.al.	2504.02767	link
2025-04-03	Robot-Led Vision Language Model Wellbeing Assessment of Children	Nida Itrat Abbasi et.al.	2504.02765	null
2025-04-03	Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study	Aryan Agrawal et.al.	2504.02733	link
2025-04-04	Why do LLMs attend to the first token?	Federico Barbero et.al.	2504.02732	null
2025-04-03	ERPO: Advancing Safety Alignment via Ex-Ante Reasoning Preference Optimization	Kehua Feng et.al.	2504.02725	null
2025-04-03	TeleMoM: Consensus-Driven Telecom Intelligence via Mixture of Models	Xinquan Wang et.al.	2504.02712	null
2025-04-03	The Hidden Space of Safety: Understanding Preference-Tuned LLMs in Multilingual context	Nikhil Verma et.al.	2504.02708	null
2025-04-03	LLM for Complex Reasoning Task: An Exploratory Study in Fermi Problems	Zishuo Liu et.al.	2504.02671	null
2025-04-02	Slot-Level Robotic Placement via Visual Imitation from Single Human Video	Dandan Shan et.al.	2504.01959	null
2025-04-02	Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities	Jing Liu et.al.	2504.01954	null
2025-04-02	The LLM Wears Prada: Analysing Gender Bias and Stereotypes through Online Shopping Data	Massimiliano Luca et.al.	2504.01951	null
2025-04-02	Efficient Federated Learning Tiny Language Models for Mobile Network Feature Prediction	Daniel Becking et.al.	2504.01947	null
2025-04-02	OpenCodeReasoning: Advancing Data Distillation for Competitive Coding	Wasi Uddin Ahmad et.al.	2504.01943	null
2025-04-02	Critical Thinking: Which Kinds of Complexity Govern Optimal Reasoning Length?	Celine Lee et.al.	2504.01935	link
2025-04-02	A thorough benchmark of automatic text classification: From traditional approaches to large language models	Washington Cunha et.al.	2504.01930	link
2025-04-02	Gen-C: Populating Virtual Worlds with Generative Crowds	Andreas Panayiotou et.al.	2504.01924	null
2025-04-02	Is Less Really More? Fake News Detection with Limited Information	Zhaoyang Cao et.al.	2504.01922	link
2025-04-02	Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation	Baban Gain et.al.	2504.01919	null
2025-04-02	FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs	Mothilal Asokan et.al.	2504.01916	link
2025-04-02	Advancing AI-Scientist Understanding: Making LLM Think Like a Physicist with Interpretable Reasoning	Yinggan Xu et.al.	2504.01911	null
2025-04-02	Is Temporal Prompting All We Need For Limited Labeled Action Recognition?	Shreyank N Gowda et.al.	2504.01890	null
2025-04-02	TransientTables: Evaluating LLMs’ Reasoning on Temporally Evolving Semi-structured Tables	Abhilash Shankarampeta et.al.	2504.01879	null
2025-04-02	From Code Generation to Software Testing: AI Copilot with Context-Based RAG	Yuchen Wang et.al.	2504.01866	null
2025-04-02	Cross-Lingual Consistency: A Novel Inference Framework for Advancing Reasoning in Large Language Models	Zhiwei Yu et.al.	2504.01857	null
2025-04-02	Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks	Ali Al-Kaswan et.al.	2504.01850	null
2025-04-02	LARGE: Legal Retrieval Augmented Generation Evaluation Tool	Minhu Park et.al.	2504.01840	link
2025-04-02	Prompting Medical Vision-Language Models to Mitigate Diagnosis Bias by Generating Realistic Dermoscopic Images	Nusrat Munia et.al.	2504.01838	link
2025-04-02	YourBench: Easy Custom Evaluation Sets for Everyone	Sumuk Shashidhar et.al.	2504.01833	link
2025-03-31	Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation	Shengqiong Wu et.al.	2503.24379	null
2025-03-31	ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning	Harsha Kokel et.al.	2503.24378	null
2025-03-31	Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models	Rui Wang et.al.	2503.24377	link
2025-03-31	Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1	Yi Chen et.al.	2503.24376	link
2025-03-31	Effectively Controlling Reasoning Models through Thinking Intervention	Tong Wu et.al.	2503.24370	null
2025-03-31	Adapting Vision Foundation Models for Real-time Ultrasound Image Segmentation	Xiaoran Zhang et.al.	2503.24368	null
2025-03-31	ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion	Rana Muhammad Shahroz Khan et.al.	2503.24354	null
2025-03-31	PathOrchestra: A Comprehensive Foundation Model for Computational Pathology with Over 100 Diverse Clinical-Grade Tasks	Fang Yan et.al.	2503.24345	null
2025-03-31	Can Test-Time Scaling Improve World Foundation Model?	Wenyan Cong et.al.	2503.24320	link
2025-03-31	BEATS: Bias Evaluation and Assessment Test Suite for Large Language Models	Alok Abhishek et.al.	2503.24310	null
2025-03-31	A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG	Arshia Kermani et.al.	2503.24307	null
2025-03-31	Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning	Jiacheng Lin et.al.	2503.24289	link
2025-03-31	Style Quantization for Data-Efficient GAN Training	Jian Wang et.al.	2503.24282	null
2025-03-31	Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality	Sewoong Lee et.al.	2503.24277	link
2025-03-31	Enhancing Large Language Models (LLMs) for Telecommunications using Knowledge Graphs and Retrieval-Augmented Generation	Dun Yuan et.al.	2503.24245	null
2025-03-31	What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models	Qiyuan Zhang et.al.	2503.24235	link
2025-03-31	Synthetic News Generation for Fake News Classification	Abdul Sittar et.al.	2503.24206	null
2025-03-31	TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers’ Guidance	Jingxian Xu et.al.	2503.24198	null
2025-03-31	Text2Tracks: Prompt-based Music Recommendation via Generative Retrieval	Enrico Palumbo et.al.	2503.24193	null
2025-03-31	Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms	Shuoming Zhang et.al.	2503.24191	null
2025-03-28	Q-Insight: Understanding Image Quality via Visual Reinforcement Learning	Weiqi Li et.al.	2503.22679	link
2025-03-28	QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?	Belinda Z. Li et.al.	2503.22674	link
2025-03-28	Exploring the Effectiveness of Multi-stage Fine-tuning for Cross-encoder Re-rankers	Francesca Pezzuti et.al.	2503.22672	link
2025-03-28	Understanding Co-speech Gestures in-the-wild	Sindhu B Hegde et.al.	2503.22668	null
2025-03-28	Unicorn: Text-Only Data Synthesis for Vision Language Model Training	Xiaomin Yu et.al.	2503.22655	link
2025-03-28	Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users	Antonia Karamolegkou et.al.	2503.22610	null
2025-03-28	On the Alignment of Post-Publication Reviews & Bibliometric and Altmetric Impact – A Case Study on Expert Statements from the Science Media Center Germany	Dirk Tunger et.al.	2503.22594	null
2025-03-28	LLM-enabled Instance Model Generation	Fengjunjie Pan et.al.	2503.22587	null
2025-03-28	Historical Ink: Exploring Large Language Models for Irony Detection in 19th-Century Spanish	Kevin Cohen et.al.	2503.22585	link
2025-03-28	Beyond Vanilla Fine-Tuning: Leveraging Multistage, Multilingual, and Domain-Specific Methods for Low-Resource Machine Translation	Sarubi Thillainathan et.al.	2503.22582	null
2025-03-28	Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization	Iñigo Pikabea et.al.	2503.22577	null
2025-03-28	Niyama: Breaking the Silos of LLM Inference Serving	Kanishk Goel et.al.	2503.22562	null
2025-03-28	Bridging the Dimensional Chasm: Uncover Layer-wise Dimensional Reduction in Transformers through Token Correlation	Zhuo-Yang Song et.al.	2503.22547	null
2025-03-28	Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities	Raman Dutt et.al.	2503.22517	null
2025-03-28	Assessing Foundation Models for Sea Ice Type Segmentation in Sentinel-1 SAR Imagery	Samira Alkaee Taleghan et.al.	2503.22516	null
2025-03-28	Probabilistic Uncertain Reward Model: A Natural Generalization of Bradley-Terry Reward Model	Wangtao Sun et.al.	2503.22480	null
2025-03-28	WorkTeam: Constructing Workflows from Natural Language with Multi-Agents	Hanchao Liu et.al.	2503.22473	null
2025-03-28	Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey	Shengyue Guan et.al.	2503.22458	null
2025-03-28	Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning	Abdullah Vanlioglu et.al.	2503.22456	null
2025-03-28	STADE: Standard Deviation as a Pruning Metric	Diego Coello de Portugal Mecke et.al.	2503.22451	link
2025-03-27	Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model	Abdelrahman Shaker et.al.	2503.21782	link
2025-03-27	Video-R1: Reinforcing Video Reasoning in MLLMs	Kaituo Feng et.al.	2503.21776	link
2025-03-27	Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence	Haolin Liu et.al.	2503.21766	null
2025-03-27	Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video	David Yifan Yao et.al.	2503.21761	link
2025-03-27	MemInsight: Autonomous Memory Augmentation for LLM Agents	Rana Salama et.al.	2503.21760	null
2025-03-27	Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck	Adrian Bulat et.al.	2503.21757	null
2025-03-27	GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics	Arsham Gholamzadeh Khoee et.al.	2503.21735	null
2025-03-27	Effective Skill Unlearning through Intervention and Abstention	Yongce Li et.al.	2503.21730	link
2025-03-27	Collab: Controlled Decoding using Mixture of Agents for LLM Alignment	Souradip Chakraborty et.al.	2503.21720	null
2025-03-27	Outlier dimensions favor frequent tokens in language model	Iuri Macocco et.al.	2503.21718	null
2025-03-27	As easy as PIE: understanding when pruning causes language models to disagree	Pietro Tropeano et.al.	2503.21714	link
2025-03-27	Enhancing Repository-Level Software Repair via Repository-Aware Knowledge Graphs	Boyang Yang et.al.	2503.21710	null
2025-03-27	LLM-Gomoku: A Large Language Model-Based System for Strategic Gomoku with Self-Play and Reinforcement Learning	Hui Wang et.al.	2503.21683	null
2025-03-27	JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models’ Detection of Human Self-Destructive Behavior Content in Jirai Community	Yunze Xiao et.al.	2503.21679	null
2025-03-27	How do language models learn facts? Dynamics, curricula and hallucinations	Nicolas Zucchet et.al.	2503.21676	null
2025-03-27	Intelligent IoT Attack Detection Design via ODLLM with Feature Ranking-based Knowledge Base	Satvik Verma et.al.	2503.21674	link
2025-03-27	Model Assembly Learning with Heterogeneous Layer Weight Merging	Yi-Kai Zhang et.al.	2503.21657	null
2025-03-27	UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning	Zhengxi Lu et.al.	2503.21620	link
2025-03-27	Leveraging Language Models for Analyzing Longitudinal Experiential Data in Education	Ahatsham Hayat et.al.	2503.21617	null
2025-03-27	Evaluating book summaries from internal knowledge in Large Language Models: a cross-model and semantic consistency approach	Javier Coronado-Blázquez et.al.	2503.21613	null
2025-03-26	Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark	Sondos Mahmoud Bsharat et.al.	2503.20786	link
2025-03-26	Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency	Tianqi Liu et.al.	2503.20785	link
2025-03-26	Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields	Shijie Zhou et.al.	2503.20776	null
2025-03-26	ASGO: Adaptive Structured Gradient Optimization	Kang An et.al.	2503.20762	null
2025-03-26	MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search	Yunhai Hu et.al.	2503.20757	null
2025-03-27	Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning	Huajie Tan et.al.	2503.20752	null
2025-03-26	UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines	Chen Tang et.al.	2503.20748	null
2025-03-26	MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams	Yanpeng Sun et.al.	2503.20745	null
2025-03-26	Dynamic Motion Blending for Versatile Motion Editing	Nan Jiang et.al.	2503.20724	null
2025-03-26	From Annotation to Adaptation: Metrics, Synthetic Data, and Aspect Extraction for Aspect-Based Sentiment Analysis with Large Language Models	Nikita Neveditsin et.al.	2503.20715	null
2025-03-26	MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion	Saron Samuel et.al.	2503.20698	null
2025-03-26	Graph-Enhanced Model-Free Reinforcement Learning Agents for Efficient Power Grid Topological Control	Eloy Anguiano Batanero et.al.	2503.20688	null
2025-03-27	Flip Learning: Weakly Supervised Erase to Segment Nodules in Breast Ultrasound	Yuhao Huang et.al.	2503.20685	null
2025-03-27	Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy	Yinan Sun et.al.	2503.20673	null
2025-03-26	TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews	Huimin Xu et.al.	2503.20666	null
2025-03-26	AutoRad-Lung: A Radiomic-Guided Prompting Autoregressive Vision-Language Model for Lung Nodule Malignancy Prediction	Sadaf Khademi et.al.	2503.20662	null
2025-03-26	AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports	Xiangwen Zhang et.al.	2503.20654	null
2025-03-26	Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging	Han Wu et.al.	2503.20641	link
2025-03-26	Collaborative Storytelling and LLM: A Linguistic Analysis of Automatically-Generated Role-Playing Game Sessions	Alessandro Maisto et.al.	2503.20623	null
2025-03-26	IAP: Improving Continual Learning of Vision-Language Models via Instance-Aware Prompting	Hao Fu et.al.	2503.20612	link
2025-03-25	SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining	Xiang Xu et.al.	2503.19912	link
2025-03-25	CoLLM: A Large Language Model for Composed Image Retrieval	Chuong Huynh et.al.	2503.19910	link
2025-03-25	FullDiT: Multi-Task Video Generative Foundation Model with Full Attention	Xuan Ju et.al.	2503.19907	null
2025-03-25	CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning	Hao Yu et.al.	2503.19900	link
2025-03-25	A Multi-Agent Framework Integrating Large Language Models and Generative AI for Accelerated Metamaterial Design	Jie Tian et.al.	2503.19889	null
2025-03-25	CausalRAG: Integrating Causal Graphs into Retrieval-Augmented Generation	Nengbo Wang et.al.	2503.19878	null
2025-03-25	Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators	Seungone Kim et.al.	2503.19877	null
2025-03-25	SLA-Awareness for AI-assisted coding	Kishanthan Thangarajah et.al.	2503.19876	null
2025-03-25	Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking	Xiaoyu Tian et.al.	2503.19855	null
2025-03-25	Towards Online Multi-Modal Social Interaction Understanding	Xinpeng Li et.al.	2503.19851	link
2025-03-25	FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs	Carlos Plou et.al.	2503.19850	null
2025-03-25	A Comparative Analysis of Word Segmentation, Part-of-Speech Tagging, and Named Entity Recognition for Historical Chinese Sources, 1900-1950	Zhao Fang et.al.	2503.19844	null
2025-03-25	FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model	Jun Zhou et.al.	2503.19839	null
2025-03-25	Domain-incremental White Blood Cell Classification with Privacy-aware Continual Learning	Pratibha Kumari et.al.	2503.19819	null
2025-03-25	SeLIP: Similarity Enhanced Contrastive Language Image Pretraining for Multi-modal Head MRI	Zhiyang Liu et.al.	2503.19801	null
2025-03-25	SemEval-2025 Task 9: The Food Hazard Detection Challenge	Korbinian Randl et.al.	2503.19800	null
2025-03-25	PAVE: Patching and Adapting Video Large Language Models	Zhuoming Liu et.al.	2503.19794	link
2025-03-25	Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models	Kartik Thakral et.al.	2503.19783	null
2025-03-25	LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation	Vladan Stojnić et.al.	2503.19777	link
2025-03-25	OpenLex3D: A New Evaluation Benchmark for Open-Vocabulary 3D Scene Representations	Christina Kassab et.al.	2503.19764	null
2025-03-24	DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation	Karim Abou Zeid et.al.	2503.18944	link
2025-03-24	SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding	Mingze Xu et.al.	2503.18943	null
2025-03-24	Video-T1: Test-Time Scaling for Video Generation	Fangfu Liu et.al.	2503.18942	null
2025-03-24	Exploring Training and Inference Scaling Laws in Generative Retrieval	Hongru Cai et.al.	2503.18941	link
2025-03-24	CoMP: Continual Multimodal Pre-training for Vision Foundation Models	Yitong Chen et.al.	2503.18931	link
2025-03-24	Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training	Brian R. Bartoldson et.al.	2503.18929	null
2025-03-24	Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models	Meng Cao et.al.	2503.18923	null
2025-03-24	FFN Fusion: Rethinking Sequential Computation in Large Language Models	Akhiad Bercovich et.al.	2503.18908	null
2025-03-24	xKV: Cross-Layer SVD for KV-Cache Compression	Chi-Chih Chang et.al.	2503.18893	link
2025-03-24	AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration	Zhexuan Wang et.al.	2503.18891	link
2025-03-24	Toward building next-generation Geocoding systems: a systematic review	Zhengcong Yin et.al.	2503.18888	null
2025-03-24	I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders	Andrey Galichin et.al.	2503.18878	link
2025-03-24	Efficient Self-Supervised Adaptation for Medical Image Analysis	Moein Sorkhei et.al.	2503.18873	link
2025-03-24	Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design	Rui Xie et.al.	2503.18869	null
2025-03-24	Reasoning to Learn from Latent Thoughts	Yangjun Ruan et.al.	2503.18866	null
2025-03-24	Structuring Scientific Innovation: A Framework for Modeling and Discovering Impactful Knowledge Combinations	Junlan Chen et.al.	2503.18865	null
2025-03-24	MC-LLaVA: Multi-Concept Personalized Vision-Language Model	Ruichuan An et.al.	2503.18854	link
2025-03-24	Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations	Jeonghyeon Kim et.al.	2503.18817	link
2025-03-24	Defeating Prompt Injections by Design	Edoardo Debenedetti et.al.	2503.18813	null
2025-03-24	SKDU at De-Factify 4.0: Vision Transformer with Data Augmentation for AI-Generated Image Detection	Shrikant Malviya et.al.	2503.18812	link
2025-03-21	Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique	Yansi Li et.al.	2503.17363	null
2025-03-21	HCAST: Human-Calibrated Autonomy Software Tasks	David Rein et.al.	2503.17354	link
2025-03-21	NdLinear Is All You Need for Representation Learning	Alex Reneau et.al.	2503.17353	link
2025-03-21	OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement	Yihe Deng et.al.	2503.17352	link
2025-03-21	Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models	Jianing Qi et.al.	2503.17349	null
2025-03-21	Capturing Individual Human Preferences with Reward Features	André Barreto et.al.	2503.17338	null
2025-03-21	Efficient Intent-Based Filtering for Multi-Party Conversations Using Knowledge Distillation from LLMs	Reem Gody et.al.	2503.17336	null
2025-03-21	CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities	Yuxuan Zhu et.al.	2503.17332	link
2025-03-21	LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language	Kun Chu et.al.	2503.17309	link
2025-03-21	Bugdar: AI-Augmented Secure Code Review for GitHub Pull Requests	John Naulty et.al.	2503.17302	null
2025-03-21	FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models	Mingyang Song et.al.	2503.17287	link
2025-03-21	CASE – Condition-Aware Sentence Embeddings for Conditional Semantic Textual Similarity Measurement	Gaifan Zhang et.al.	2503.17279	null
2025-03-21	Revisiting End To End Sparse Autoencoder Training – A Short Finetune is All You Need	Adam Karvonen et.al.	2503.17272	link
2025-03-21	SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging	Aladin Djuhera et.al.	2503.17239	link
2025-03-21	Slide-Level Prompt Learning with Vision Language Models for Few-Shot Multiple Instance Learning in Histopathology	Devavrat Tomar et.al.	2503.17238	link
2025-03-21	FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs	Albert Sawczyn et.al.	2503.17229	null
2025-03-21	Automating Adjudication of Cardiovascular Events Using Large Language Models	Sonish Sivarajkumar et.al.	2503.17222	null
2025-03-21	A Language Anchor-Guided Method for Robust Noisy Domain Generalization	Zilin Dai et.al.	2503.17211	null
2025-03-21	TreeSynth: Synthesizing Diverse Data from Scratch via Tree-Guided Subspace Partitioning	Sheng Wang et.al.	2503.17195	null
2025-03-21	LLMs Love Python: A Study of LLMs’ Bias for Programming Languages and Libraries	Lukas Twist et.al.	2503.17181	link
2025-03-20	DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding	Keyan Chen et.al.	2503.16426	link
2025-03-20	Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models	Yang Sui et.al.	2503.16419	link
2025-03-20	M3: 3D-Spatial MultiModal Memory	Xueyan Zou et.al.	2503.16413	link
2025-03-20	The Emperor’s New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination	Yifan Sun et.al.	2503.16402	link
2025-03-20	Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them	Guanyu Chen et.al.	2503.16401	null
2025-03-20	Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation	Yijia Luo et.al.	2503.16385	link
2025-03-20	LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images	Leyang Wang et.al.	2503.16376	null
2025-03-20	JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse	Muyao Li et.al.	2503.16365	null
2025-03-20	CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners	Yunzhi Yao et.al.	2503.16356	link
2025-03-20	Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences	Krithik Ramesh et.al.	2503.16351	null
2025-03-20	LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates	Ying Shen et.al.	2503.16334	null
2025-03-20	OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence	Long Yuan et.al.	2503.16326	null
2025-03-20	Issue2Test: Generating Reproducing Test Cases from Issue Reports	Noor Nashid et.al.	2503.16320	null
2025-03-20	Bridging Technology and Humanities: Evaluating the Impact of Large Language Models on Social Sciences Research with DeepSeek-R1	Peiran Gu et.al.	2503.16304	null
2025-03-20	Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model	Zhaochong An et.al.	2503.16282	link
2025-03-20	Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens	Shuqi Lu et.al.	2503.16278	link
2025-03-20	Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data	Zijian Li et.al.	2503.16260	null
2025-03-20	Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models	Keda Tao et.al.	2503.16257	null
2025-03-21	Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning	Zhaowei Liu et.al.	2503.16252	link
2025-03-20	Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t	Quy-Anh Dang et.al.	2503.16219	link
2025-03-19	TULIP: Towards Unified Language-Image Pretraining	Zineng Tang et.al.	2503.15485	null
2025-03-19	SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks	Yifei Zhou et.al.	2503.15478	link
2025-03-19	What Makes a Reward Model a Good Teacher? An Optimization Perspective	Noam Razin et.al.	2503.15477	link
2025-03-19	Cube: A Roblox View of 3D Intelligence	Foundation AI Team et.al.	2503.15475	link
2025-03-19	EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining	Boshen Xu et.al.	2503.15470	link
2025-03-19	From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment	Jia-Nan Li et.al.	2503.15463	link
2025-03-19	SkyLadder: Better and Faster Pretraining via Context Window Scheduling	Tongyao Zhu et.al.	2503.15450	link
2025-03-19	VenusFactory: A Unified Platform for Protein Engineering Data Retrieval and Language Model Fine-Tuning	Yang Tan et.al.	2503.15438	link
2025-03-19	Visual Position Prompt for MLLM based Visual Grounding	Wei Tang et.al.	2503.15426	link
2025-03-19	Probing the topology of the space of tokens with structured prompts	Michael Robinson et.al.	2503.15421	null
2025-03-19	Visual Persona: Foundation Model for Full-Body Human Customization	Jisu Nam et.al.	2503.15406	null
2025-03-19	FedSCA: Federated Tuning with Similarity-guided Collaborative Aggregation for Heterogeneous Medical Image Segmentation	Yumin Zhang et.al.	2503.15390	null
2025-03-19	EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models	Yinan Liang et.al.	2503.15369	null
2025-03-19	SemEval-2025 Task 1: AdMIRe – Advancing Multimodal Idiomaticity Representation	Thomas Pickard et.al.	2503.15358	null
2025-03-19	SPILL: Domain-Adaptive Intent Clustering based on Selection and Pooling with Large Language Models	I-Fan Lin et.al.	2503.15351	null
2025-03-19	TruthLens: A Training-Free Paradigm for DeepFake Detection	Ritabrata Chakraborty et.al.	2503.15342	null
2025-03-19	Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs	Yuqi Zhu et.al.	2503.15341	null
2025-03-19	Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context	Junyi Ao et.al.	2503.15338	link
2025-03-19	Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport	Hao Tan et.al.	2503.15337	link
2025-03-19	Euclid Quick Data Release (Q1). Exploring galaxy properties with a multi-modal foundation model	Euclid Collaboration et.al.	2503.15312	link
2025-03-18	Aligning Multimodal LLM with Human Preference: A Survey	Tao Yu et.al.	2503.14504	link
2025-03-18	Engineering Scientific Assistants using Interactive Structured Induction of Programs	Shraddha Surana et.al.	2503.14488	null
2025-03-18	Gricean Norms as a Basis for Effective Collaboration	Fardin Saad et.al.	2503.14484	link
2025-03-18	Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM	Xinyu Fang et.al.	2503.14478	link
2025-03-18	Characterizing Data Visualization Literacy: a Systematic Literature Review	Sara Beschi et.al.	2503.14468	null
2025-03-18	RWKV-7 “Goose” with Expressive Dynamic State Evolution	Bo Peng et.al.	2503.14456	link
2025-03-18	EnvBench: A Benchmark for Automated Environment Setup	Aleksandra Eliseeva et.al.	2503.14443	link
2025-03-18	LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers	Nikhil Abhyankar et.al.	2503.14434	link
2025-03-18	PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play	Wei Fang et.al.	2503.14432	null
2025-03-18	ExDDV: A New Dataset for Explainable Deepfake Detection in Video	Vlad Hondru et.al.	2503.14421	link
2025-03-18	Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models	Siwei Zhang et.al.	2503.14411	null
2025-03-18	Large Language Models for Virtual Human Gesture Selection	Parisa Ghanad Torshizi et.al.	2503.14408	null
2025-03-18	DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers	Mert Bulent Sariyildiz et.al.	2503.14405	null
2025-03-18	From “Hallucination” to “Suture”: Insights from Language Philosophy to Enhance Large Language Models	Qiantong Wang et.al.	2503.14392	null
2025-03-18	How much do LLMs learn from negative examples?	Shadi Hamdan et.al.	2503.14391	link
2025-03-18	Good/Evil Reputation Judgment of Celebrities by LLMs via Retrieval Augmented Generation	Rikuto Tsuchida et.al.	2503.14382	null
2025-03-18	On the Standard Performance Criteria for Applied Control Design: PID, MPC or Machine Learning Controller?	Pouria Sarhadi et.al.	2503.14379	link
2025-03-18	Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels	Maximilian Beck et.al.	2503.14376	link
2025-03-18	MAST-Pro: Dynamic Mixture-of-Experts for Adaptive Segmentation of Pan-Tumors with Knowledge-Driven Prompts	Runqi Meng et.al.	2503.14355	null
2025-03-19	MoonCast: High-Quality Zero-Shot Podcast Generation	Zeqian Ju et.al.	2503.14345	link
2025-03-17	MetaScale: Test-Time Scaling with Evolving Meta-Thoughts	Qin Liu et.al.	2503.13447	null
2025-03-17	MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation	Zhenyu Wu et.al.	2503.13446	null
2025-03-17	Faithfulness of LLM Self-Explanations for Commonsense Tasks: Larger Is Better, and Instruction-Tuning Allows Trade-Offs but Not Pareto Dominance	Noah Y. Siegel et.al.	2503.13445	null
2025-03-17	VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning	Ye Liu et.al.	2503.13444	link
2025-03-17	DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models	Haoyang Li et.al.	2503.13443	link
2025-03-18	MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling	Yingyue Li et.al.	2503.13440	link
2025-03-17	xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference	Maximilian Beck et.al.	2503.13427	link
2025-03-17	SuperBPE: Space Travel for Language Models	Alisa Liu et.al.	2503.13423	null
2025-03-17	A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives	Weiqiang Jin et.al.	2503.13415	null
2025-03-18	DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective	Dengyun Peng et.al.	2503.13413	link
2025-03-17	Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis	Alexander Ku et.al.	2503.13401	null
2025-03-17	MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research	James Burgess et.al.	2503.13399	link
2025-03-17	Aligned Probing: Relating Toxic Behavior and Model Internals	Andreas Waldis et.al.	2503.13390	null
2025-03-17	Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning	Mengyao Lyu et.al.	2503.13383	null
2025-03-17	Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions	Wan Ju Kang et.al.	2503.13369	null
2025-03-17	Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning	Hai-Long Sun et.al.	2503.13360	null
2025-03-17	Agents Play Thousands of 3D Video Games	Zhongwen Xu et.al.	2503.13356	null
2025-03-17	Valid Text-to-SQL Generation with Unification-based DeepStochLog	Ying Jiao et.al.	2503.13342	link
2025-03-17	LearnMate: Enhancing Online Education with LLM-Powered Personalized Learning Plans and Support	Xinyu Jessica Wang et.al.	2503.13340	null
2025-03-17	Reliable and Efficient Amortized Model-based Evaluation	Sang Truong et.al.	2503.13335	null
2025-03-14	Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense	Shuyang Hao et.al.	2503.11619	null
2025-03-14	ASMA-Tune: Unlocking LLMs’ Assembly Code Comprehension via Structural-Semantic Instruction Tuning	Xinyi Wang et.al.	2503.11617	link
2025-03-14	Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages	Matteo Farina et.al.	2503.11609	link
2025-03-14	Do Construction Distributions Shape Formal Language Learning In German BabyLMs?	Bastian Bunzeck et.al.	2503.11593	null
2025-03-14	Pathology Image Compression with Pre-trained Autoencoders	Srikar Yellapragada et.al.	2503.11591	null
2025-03-14	Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs using Semantic Space	Zhiliang Chen et.al.	2503.11586	link
2025-03-14	SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion	Ahmed Nassar et.al.	2503.11576	null
2025-03-14	Synthesizing Access Control Policies using Large Language Models	Adarsh Vatsa et.al.	2503.11573	null
2025-03-14	Implicit Bias-Like Patterns in Reasoning Models	Messi H. J. Lee et.al.	2503.11572	null
2025-03-14	VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity	Jing Bi et.al.	2503.11557	null
2025-03-14	Similarity-Aware Token Pruning: Your VLM but Faster	Ahmadreza Jeddi et.al.	2503.11549	link
2025-03-14	Potential of large language model-powered nudges for promoting daily water and energy conservation	Zonghan Li et.al.	2503.11531	null
2025-03-14	Exploring Typographic Visual Prompts Injection Threats in Cross-Modality Generation Models	Hao Cheng et.al.	2503.11519	null
2025-03-14	HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models	Ziqin Zhou et.al.	2503.11513	null
2025-03-14	V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning	Zixu Cheng et.al.	2503.11495	null
2025-03-14	A Review of DeepSeek Models’ Key Innovative Techniques	Chengen Wang et.al.	2503.11486	null
2025-03-14	Integrating LLMs in Gamified Systems	Carlos J. Costa et.al.	2503.11458	null
2025-03-14	D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning	Jia Zhang et.al.	2503.11441	null
2025-03-14	Text Compression for Efficient Language Generation	David Gu et.al.	2503.11426	null
2025-03-14	Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models	Xu Liu et.al.	2503.11411	null
2025-03-13	GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing	Rongyao Fang et.al.	2503.10639	link
2025-03-13	A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1	Zhaoyi Li et.al.	2503.10635	link
2025-03-13	HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model	Jiaming Liu et.al.	2503.10631	null
2025-03-13	UniGoal: Towards Universal Zero-shot Goal-oriented Navigation	Hang Yin et.al.	2503.10630	null
2025-03-13	Transformers without Normalization	Jiachen Zhu et.al.	2503.10622	null
2025-03-13	From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM	Kshitij Ambilduke et.al.	2503.10620	link
2025-03-13	Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search	Andy Zhou et.al.	2503.10619	null
2025-03-13	Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models	Andy Zhou et.al.	2503.10617	null
2025-03-13	R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization	Yi Yang et.al.	2503.10615	link
2025-03-13	CoSTA$\ast$: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing	Advait Gupta et.al.	2503.10613	link
2025-03-13	TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention	Jinhao Duan et.al.	2503.10602	link
2025-03-13	GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding	Rui Hu et.al.	2503.10596	link
2025-03-13	Unlock the Power of Unlabeled Data in Language Driving Model	Chaoqun Wang et.al.	2503.10586	null
2025-03-13	VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search	Yiming Jia et.al.	2503.10582	null
2025-03-13	Unveiling the Mathematical Reasoning in DeepSeek Models: A Comparative Study of Large Language Models	Afrar Jahin et.al.	2503.10573	null
2025-03-13	ASIDE: Architectural Separation of Instructions and Data in Language Models	Egor Zverev et.al.	2503.10566	null
2025-03-13	Short-term AI literacy intervention does not reduce over-reliance on incorrect ChatGPT recommendations	Brett Puppart et.al.	2503.10556	null
2025-03-13	KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation	Zixian Liu et.al.	2503.10546	null
2025-03-13	DP-GPL: Differentially Private Graph Prompt Learning	Jing Xu et.al.	2503.10544	null
2025-03-13	Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More	Arvid Frydenlund et.al.	2503.10542	null
2025-03-12	MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System	Jihao Zhao et.al.	2503.09600	link
2025-03-12	How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation	Ruohao Guo et.al.	2503.09598	link
2025-03-12	SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment	Katrin Renz et.al.	2503.09594	null
2025-03-12	BIMBA: Selective-Scan Compression for Long-Range Video Question Answering	Md Mohaiminul Islam et.al.	2503.09590	link
2025-03-12	Cost-Optimal Grouped-Query Attention for Long-Context LLMs	Yingfa Chen et.al.	2503.09579	link
2025-03-12	Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models	Marianne Arriola et.al.	2503.09573	link
2025-03-12	Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks	Lutfi Eren Erdogan et.al.	2503.09572	null
2025-03-13	Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models	Qiguang Chen et.al.	2503.09567	null
2025-03-12	PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs	Oskar van der Wal et.al.	2503.09543	link
2025-03-13	Large Language Models for Multi-Facility Location Mechanism Design	Nguyen Thach et.al.	2503.09533	null
2025-03-13	SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability	Adam Karvonen et.al.	2503.09532	null
2025-03-12	Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning	Bowen Jin et.al.	2503.09516	link
2025-03-12	Reinforcement Learning is all You Need	Yongsheng Lian et.al.	2503.09512	null
2025-03-12	ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning	Ziyu Wan et.al.	2503.09501	link
2025-03-12	MindGYM: Enhancing Vision-Language Models via Synthetic Self-Challenging Questions	Zhe Xu et.al.	2503.09499	link
2025-03-12	Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection	Romain Thoreau et.al.	2503.09493	null
2025-03-12	Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness	Beier Zhu et.al.	2503.09487	null
2025-03-12	BAMBI: Developing Baby Language Models for Italian	Alice Suozzi et.al.	2503.09481	null
2025-03-12	SurgicalVLM-Agent: Towards an Interactive AI Co-Pilot for Pituitary Surgery	Jiayuan Huang et.al.	2503.09474	null
2025-03-12	Explicit Learning and the LLM in Machine Translation	Malik Marmonier et.al.	2503.09454	link
2025-03-11	QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension	Yongdong Luo et.al.	2503.08689	link
2025-03-11	Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs	Ariba Khan et.al.	2503.08688	link
2025-03-11	Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents	Haoyu Wang et.al.	2503.08684	link
2025-03-11	Self-Taught Self-Correction for Small Language Models	Viktor Moskvoretskii et.al.	2503.08681	null
2025-03-11	Understanding and Mitigating Distribution Shifts For Machine Learning Force Fields	Tobias Kreiman et.al.	2503.08674	null
2025-03-11	Generating Robot Constitutions & Benchmarks for Semantic Safety	Pierre Sermanet et.al.	2503.08663	null
2025-03-11	Exploring the Word Sense Disambiguation Capabilities of Large Language Models	Pierpaolo Basile et.al.	2503.08662	null
2025-03-11	YuE: Scaling Open Foundation Models for Long-Form Music Generation	Ruibin Yuan et.al.	2503.08638	link
2025-03-11	LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization	Xianfeng Wu et.al.	2503.08619	link
2025-03-11	EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments	Dongping Li et.al.	2503.08604	link
2025-03-11	NSF-SciFy: Mining the NSF Awards Database for Scientific Claims	Delip Rao et.al.	2503.08600	null
2025-03-11	Proc4Gem: Foundation models for physical agency through procedural generation	Yixin Lin et.al.	2503.08593	null
2025-03-11	BiasEdit: Debiasing Stereotyped Language Models via Model Editing	Xin Xu et.al.	2503.08588	link
2025-03-11	HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding	Shehreen Azad et.al.	2503.08585	null
2025-03-11	RAG-Adapter: A Plug-and-Play RAG-enhanced Framework for Long Video Understanding	Xichen Tan et.al.	2503.08576	null
2025-03-11	DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process	Minjun Zhu et.al.	2503.08569	null
2025-03-11	Reasoning and Sampling-Augmented MCQ Difficulty Prediction via LLMs	Wanyong Feng et.al.	2503.08551	null
2025-03-11	Transferring Extreme Subword Style Using Ngram Model-Based Logit Scaling	Craig Messner et.al.	2503.08550	null
2025-03-11	Graph of AI Ideas: Leveraging Knowledge Graphs and LLMs for AI Research Idea Generation	Xian Gao et.al.	2503.08549	null
2025-03-11	TLA: Tactile-Language-Action Model for Contact-Rich Manipulation	Peng Hao et.al.	2503.08548	null
2025-03-10	Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru	Dunant Cusipuma et.al.	2503.07587	null
2025-03-10	Talking to GDELT Through Knowledge Graphs	Audun Myers et.al.	2503.07584	null
2025-03-10	VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models	Jen-tse Huang et.al.	2503.07575	link
2025-03-10	AutoSpatial: Visual-Language Reasoning for Social Robot Navigation through Efficient Spatial Reasoning Learning	Yangzhe Kong et.al.	2503.07557	null
2025-03-10	Junior Software Developers’ Perspectives on Adopting LLMs for Software Engineering: a Systematic Literature Review	Samuel Ferino et.al.	2503.07556	null
2025-03-10	KSOD: Knowledge Supplement for LLMs On Demand	Haoran Li et.al.	2503.07550	null
2025-03-10	Bi-Directional Mental Model Reconciliation for Human-Robot Interaction with Large Language Models	Nina Moorman et.al.	2503.07547	null
2025-03-10	Queueing, Predictions, and LLMs: Challenges and Open Problems	Michael Mitzenmacher et.al.	2503.07545	null
2025-03-10	XIFBench: Evaluating Large Language Models on Multilingual Instruction Following	Zhenyu Li et.al.	2503.07539	null
2025-03-10	Building English ASR model with regional language support	Purvi Agrawal et.al.	2503.07522	null
2025-03-10	GRITHopper: Decomposition-Free Multi-Hop Dense Retrieval	Justus-Jonas Erker et.al.	2503.07519	link
2025-03-10	TokenButler: Token Importance is Predictable	Yash Akhauri et.al.	2503.07518	link
2025-03-10	Language Models Fail to Introspect About Their Knowledge of Language	Siyuan Song et.al.	2503.07513	link
2025-03-10	Plume: Scaffolding Text Composition in Dashboards	Maxim Lisnic et.al.	2503.07512	null
2025-03-10	Sometimes the Model doth Preach: Quantifying Religious Bias in Open LLMs through Demographic Analysis in Asian Nations	Hari Shankar et.al.	2503.07510	link
2025-03-10	Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts	Shiu-hong Kao et.al.	2503.07503	null
2025-03-10	V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation	Guiwei Zhang et.al.	2503.07493	link
2025-03-10	LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition?	Bangyan Li et.al.	2503.07487	null
2025-03-10	Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction	Zongzheng Zhang et.al.	2503.07485	link
2025-03-10	VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models	Jiacheng Ruan et.al.	2503.07478	link
2025-03-10	Advancing Vietnamese Information Retrieval with Learning Objective and Benchmark	Phu-Vinh Nguyen et.al.	2503.07470	null
2025-03-10	YOLOE: Real-Time Seeing Anything	Ao Wang et.al.	2503.07465	link
2025-03-10	GenAIReading: Augmenting Human Cognition with Interactive Digital Textbooks Using Large Language Models and Image Generation Models	Ryugo Morita et.al.	2503.07463	null
2025-03-10	MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning	Xiangru Tang et.al.	2503.07459	link
2025-03-10	LLMs syntactically adapt their language use to their conversational partner	Florian Kandra et.al.	2503.07457	null
2025-03-10	Is a Good Foundation Necessary for Efficient Reinforcement Learning? The Computational Role of the Base Model in Exploration	Dylan J. Foster et.al.	2503.07453	null
2025-03-10	From Idea to Implementation: Evaluating the Influence of Large Language Models in Software Development – An Opinion Paper	Sargam Yadav et.al.	2503.07450	null
2025-03-10	From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics	Jaewook Lee et.al.	2503.07429	null
2025-03-10	RePO: ReLU-based Preference Optimization	Junkang Wu et.al.	2503.07426	link
2025-03-10	REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding	Yan Tai et.al.	2503.07413	link
2025-03-10	Towards Safe Robot Foundation Models	Maximilian Tölle et.al.	2503.07404	null
2025-03-10	Keeping Representation Similarity in Finetuning for Medical Image Analysis	Wenqiang Zu et.al.	2503.07399	null
2025-03-10	Revisiting Noise in Natural Language Processing for Computational Social Science	Nadav Borenstein et.al.	2503.07395	null
2025-03-10	Is My Text in Your AI Model? Gradient-based Membership Inference Test applied to LLMs	Gonzalo Mancera et.al.	2503.07384	null
2025-03-10	Process-Supervised LLM Recommenders via Flow-guided Tuning	Chongming Gao et.al.	2503.07377	link
2025-03-10	Artificial Utopia: Simulation and Intelligent Agents for a Democratised Future	Yannick Oswald et.al.	2503.07364	null
2025-03-07	Fairness-Aware Low-Rank Adaptation Under Demographic Privacy Constraints	Parameswaran Kamalaruban et.al.	2503.05684	null
2025-03-07	Understanding the Limits of Lifelong Knowledge Editing in LLMs	Lukas Thede et.al.	2503.05683	null
2025-03-07	A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval	Yu Zhang et.al.	2503.05659	link
2025-03-07	Learning LLM Preference over Intra-Dialogue Pairs: A Framework for Utterance-level Understandings	Xuanqing Liu et.al.	2503.05620	null
2025-03-07	A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models	Dong Shu et.al.	2503.05613	null
2025-03-07	From Theory to Application: A Practical Introduction to Neural Operators in Scientific Computing	Prashant K. Jha et.al.	2503.05598	link
2025-03-07	R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning	Huatong Song et.al.	2503.05592	null
2025-03-07	Quantifying the Robustness of Retrieval-Augmented Language Models Against Spurious Features in Grounding Data	Shiping Yang et.al.	2503.05587	null
2025-03-07	Evaluating open-source Large Language Models for automated fact-checking	Nicolò Fontana et.al.	2503.05565	null
2025-03-07	Revitalizing Saturated Benchmarks: A Weighted Metric Approach for Differentiating Large Language Model Performance	Bryan Etzine et.al.	2503.05551	null
2025-03-07	Leveraging Approximate Caching for Faster Retrieval-Augmented Generation	Shai Bergman et.al.	2503.05530	null
2025-03-07	PoSSUM: A Protocol for Surveying Social-media Users with Multimodal LLMs	Roberto Cerina et.al.	2503.05529	null
2025-03-07	Cognitive Bias Detection Using Advanced Prompt Engineering	Frederic Lemieux et.al.	2503.05516	null
2025-03-07	Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?	Qingyuan Liang et.al.	2503.05507	null
2025-03-07	Statistical Guarantees of Correctness Coverage for Medical Multiple-Choice Question Answering	Yusong Ke et.al.	2503.05505	null
2025-03-07	Benchmarking LLMs in Recommendation Tasks: A Comparative Evaluation with Conventional Recommenders	Qijiong Liu et.al.	2503.05493	null
2025-03-07	Maximum Hallucination Standards for Domain-Specific Large Language Models	Tingmingke Lu et.al.	2503.05481	null
2025-03-07	The Society of HiveMind: Multi-Agent Optimization of Foundation Model Swarms to Unlock the Potential of Collective Intelligence	Noah Mamie et.al.	2503.05473	null
2025-03-07	Soft Policy Optimization: Online Off-Policy RL for Sequence Models	Taco Cohen et.al.	2503.05453	null
2025-03-07	LLM-based Iterative Approach to Metamodeling in Automotive	Nenad Petrovic et.al.	2503.05449	null
2025-03-06	L$^2$M: Mutual Information Scaling Law for Long-Context Language Modeling	Zhuo Chen et.al.	2503.04725	link
2025-03-06	LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM	Sambal Shikhar et.al.	2503.04724	null
2025-03-07	Shifting Long-Context LLMs Research from Input to Output	Yuhao Wu et.al.	2503.04723	null
2025-03-06	Enough Coin Flips Can Make LLMs Act Bayesian	Ritwik Gupta et.al.	2503.04722	null
2025-03-06	Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities	Guan-Ting Lin et.al.	2503.04721	link
2025-03-06	Predictable Scale: Part I – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining	Houyi Li et.al.	2503.04715	null
2025-03-06	Scaling Rich Style-Prompted Text-to-Speech Datasets	Anuj Diwan et.al.	2503.04713	link
2025-03-06	Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size	Alireza Behtash et.al.	2503.04704	null
2025-03-06	L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning	Pranjal Aggarwal et.al.	2503.04697	null
2025-03-06	UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets	Wenyu Wang et.al.	2503.04693	null
2025-03-06	Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases	Pengcheng Qiu et.al.	2503.04691	null
2025-03-06	LLM-guided Plan and Retrieval: A Strategic Alignment for Interpretable User Satisfaction Estimation in Dialogue	Sangyeop Kim et.al.	2503.04675	null
2025-03-06	An Information-theoretic Multi-task Representation Learning Framework for Natural Language Understanding	Dou Hu et.al.	2503.04667	link
2025-03-06	CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models	Shengzhuang Chen et.al.	2503.04655	link
2025-03-06	Transferable Foundation Models for Geometric Tasks on Point Cloud Representations: Geometric Neural Operators	Blaine Quackenbush et.al.	2503.04649	link
2025-03-06	Implicit Cross-Lingual Rewarding for Efficient Multilingual Preference Alignment	Wen Yang et.al.	2503.04647	link
2025-03-06	Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation	Aishik Konwer et.al.	2503.04639	null
2025-03-06	Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking	Yijie Xu et.al.	2503.04636	null
2025-03-06	Better Process Supervision with Bi-directional Rewarding Signals	Wenxiang Chen et.al.	2503.04618	null
2025-03-06	Towards Data-Efficient Language Models: A Child-Inspired Approach to Language Learning	Mohammad Amin Ghanizadeh et.al.	2503.04611	null
2025-03-05	The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems	Richard Ren et.al.	2503.03750	null
2025-03-05	Process-based Self-Rewarding Language Models	Shimao Zhang et.al.	2503.03746	link
2025-03-05	CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning	Yuqi Zhou et.al.	2503.03743	link
2025-03-05	Towards Understanding Distilled Reasoning Models: A Representational Approach	David D. Baek et.al.	2503.03730	null
2025-03-05	Improving LLM Safety Alignment with Dual-Objective Optimization	Xuandong Zhao et.al.	2503.03710	link
2025-03-05	Effective LLM Knowledge Learning via Model Generalization	Mingkang Zhu et.al.	2503.03705	null
2025-03-05	A Practical Memory Injection Attack against LLM Agents	Shen Dong et.al.	2503.03704	null
2025-03-05	Developing and Utilizing a Large-Scale Cantonese Dataset for Multi-Tasking in Large Language Models	Jiyue Jiang et.al.	2503.03702	null
2025-03-05	Addressing Overprescribing Challenges: Fine-Tuning Large Language Models for Medication Recommendation Tasks	Zihao Zhao et.al.	2503.03687	link
2025-03-05	Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models	Bar Karov et.al.	2503.03669	link
2025-03-05	Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction	Gustaw Opiełka et.al.	2503.03666	link
2025-03-05	Robust Learning of Diverse Code Edits	Tushar Aggarwal et.al.	2503.03656	null
2025-03-05	Improving Neutral Point of View Text Generation through Parameter-Efficient Reinforcement Learning and a Small-Scale High-Quality Dataset	Jessica Hoffmann et.al.	2503.03654	null
2025-03-05	Token-Level Privacy in Large Language Models	Re’em Harel et.al.	2503.03652	null
2025-03-05	Psy-Copilot: Visual Chain of Thought for Counseling	Keqi Chen et.al.	2503.03645	null
2025-03-05	Large language models in finance: estimating financial sentiment for stock prediction	Kemal Kirtac et.al.	2503.03612	null
2025-03-05	Enhancing the Accuracy and Comprehensibility in Architectural Tactics Detection via Small Model-Augmented Prompt Engineering	Lingli Cao et.al.	2503.03609	link
2025-03-05	Psy-Insight: Explainable Multi-turn Bilingual Dataset for Mental Health Counseling	Keqi Chen et.al.	2503.03607	null
2025-03-05	Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders	Kristian Kuznetsov et.al.	2503.03601	null
2025-03-05	Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs	Haoran Fan et.al.	2503.03594	link
2025-03-04	Wikipedia in the Era of LLMs: Evolution and Risks	Siming Huang et.al.	2503.02879	link
2025-03-04	Language Models can Self-Improve at State-Value Estimation for Better Search	Ethan Mendes et.al.	2503.02878	link
2025-03-04	SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models	Dmitry Nechaev et.al.	2503.02876	link
2025-03-04	The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models	Ke Ji et.al.	2503.02875	null
2025-03-04	Prompting Generative AI with Interaction-Augmented Instructions	Leixian Shen et.al.	2503.02874	null
2025-03-04	FairSense-AI: Responsible AI Meets Sustainability	Shaina Raza et.al.	2503.02865	null
2025-03-04	Calibrating LLM Confidence with Semantic Steering: A Multi-Prompt Aggregation Framework	Ziang Zhou et.al.	2503.02863	null
2025-03-04	Privacy and Accuracy-Aware AI/ML Model Deduplication	Hong Guan et.al.	2503.02862	null
2025-03-04	(How) Do Language Models Track State?	Belinda Z. Li et.al.	2503.02854	null
2025-03-04	Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs’ Decoding Layers	Zicong He et.al.	2503.02851	link
2025-03-04	Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs	Yuzhe Gu et.al.	2503.02846	link
2025-03-04	Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training	Paul Janson et.al.	2503.02844	null
2025-03-04	AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation	Songming Zhang et.al.	2503.02832	null
2025-03-04	Developing a PET/CT Foundation Model for Cross-Modal Anatomical and Functional Imaging	Yujin Oh et.al.	2503.02824	null
2025-03-04	“What If Smart Homes Could See Our Homes?”: Exploring DIY Smart Home Building Experiences with VLM-Based Camera Sensors	Sojeong Yun et.al.	2503.02816	null
2025-03-04	Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression	Nathan Godey et.al.	2503.02812	link
2025-03-04	RAAD-LLM: Adaptive Anomaly Detection Using LLMs and RAG Integration	Alicia Russell-Gilbert et.al.	2503.02800	null
2025-03-04	Multimodal AI predicts clinical outcomes of drug combinations from preclinical data	Yepeng Huang et.al.	2503.02781	link
2025-03-04	Implicit Bias in LLMs: A Survey	Xinru Lin et.al.	2503.02776	null
2025-03-04	InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training	Dingdong Wang et.al.	2503.02769	null
2025-02-28	LLM Post-Training: A Deep Dive into Reasoning Large Language Models	Komal Kumar et.al.	2502.21321	link
2025-02-28	Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos	Zhiyu Tan et.al.	2502.21314	null
2025-02-28	FANformer: Improving Large Language Models Through Effective Periodicity Modeling	Yihong Dong et.al.	2502.21309	link
2025-02-28	Contextualizing biological perturbation experiments through language	Menghua Wu et.al.	2502.21290	link
2025-02-28	Adaptive Keyframe Sampling for Long Video Understanding	Xi Tang et.al.	2502.21271	null
2025-03-03	Foundation Models – A Panacea for Artificial Intelligence in Pathology?	Nita Mulliqi et.al.	2502.21264	null
2025-02-28	Modeling Human Beliefs about AI Behavior for Scalable Oversight	Leon Lang et.al.	2502.21262	null
2025-02-28	PET Image Denoising via Text-Guided Diffusion: Integrating Anatomical Priors through Text Prompts	Boxiao Yu et.al.	2502.21260	null
2025-02-28	RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete	Yuheng Ji et.al.	2502.21257	null
2025-02-28	TimesBERT: A BERT-Style Foundation Model for Time Series Understanding	Haoran Zhang et.al.	2502.21245	null
2025-03-04	Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs	Xiaomin Li et.al.	2502.21239	null
2025-02-28	Transforming Tuberculosis Care: Optimizing Large Language Models For Enhanced Clinician-Patient Communication	Daniil Filienko et.al.	2502.21236	null
2025-02-28	ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs	Hao Ge et.al.	2502.21231	null
2025-03-03	ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer	Omer Goldman et.al.	2502.21228	null
2025-02-28	Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought	Jianhao Huang et.al.	2502.21212	null
2025-02-28	Chronologically Consistent Large Language Models	Songrun He et.al.	2502.21206	null
2025-02-28	$Δ$-model correction of Foundation Model based on the models own understanding	Mads-Peter Verner Christiansen et.al.	2502.21179	null
2025-03-03	Causality Is Key to Understand and Balance Multiple Goals in Trustworthy ML and Foundation Models	Ruta Binkyte et.al.	2502.21123	null
2025-02-28	Optimizing Large Language Models for ESG Activity Detection in Financial Texts	Mattia Birti et.al.	2502.21112	link
2025-02-28	Large Language Model-Based Benchmarking Experiment Settings for Evolutionary Multi-Objective Optimization	Lie Meng Pang et.al.	2502.21108	null
2025-02-27	R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts	Zhongyang Li et.al.	2502.20395	link
2025-02-27	Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis	Jeffrey Yang Fan Chiang et.al.	2502.20383	null
2025-02-27	Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers	Shalev Lifshitz et.al.	2502.20379	null
2025-02-27	PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation	Albert Gong et.al.	2502.20377	link
2025-02-27	Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization	Ryan C. Barron et.al.	2502.20364	link
2025-02-27	Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs	Kuan Lok Zhou et.al.	2502.20356	null
2025-02-27	KEDRec-LM: A Knowledge-distilled Explainable Drug Recommendation Large Language Model	Kai Zhang et.al.	2502.20350	null
2025-02-27	Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models	Yi Jing et.al.	2502.20344	null
2025-02-27	Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners	Daniele Paliotta et.al.	2502.20339	null
2025-02-27	Expertise Is What We Want	Alan Ashworth et.al.	2502.20335	null
2025-02-27	Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models	Yukang Yang et.al.	2502.20332	null
2025-02-27	Long-Context Inference with Retrieval-Augmented Speculative Decoding	Guanzheng Chen et.al.	2502.20330	link
2025-02-27	LangProBe: a Language Programs Benchmark	Shangyin Tan et.al.	2502.20315	null
2025-02-27	EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants	Franck Cappello et.al.	2502.20309	link
2025-02-27	M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging	Jinghao Feng et.al.	2502.20301	null
2025-02-27	An exploration of features to improve the generalisability of fake news detection models	Nathaniel Hoy et.al.	2502.20299	null
2025-02-27	Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription	Benjamin Gutteridge et.al.	2502.20295	link
2025-02-27	Visual Adaptive Prompting for Compositional Zero-Shot Learning	Kyle Stein et.al.	2502.20292	null
2025-02-27	Conformal Tail Risk Control for Large Language Model Alignment	Catherine Yu-Chi Chen et.al.	2502.20285	null
2025-02-27	Evaluating Human Trust in LLM-Based Planners: A Preliminary Study	Shenghui Chen et.al.	2502.20284	null
2025-02-26	Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models	Lucy Xiaoyang Shi et.al.	2502.19417	null
2025-02-26	Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing	Akshat Gupta et.al.	2502.19416	null
2025-02-26	Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation	Shiven Sinha et.al.	2502.19414	link
2025-02-26	Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs	Christoph Schuhmann et.al.	2502.19413	null
2025-02-26	Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs	Dayu Yang et.al.	2502.19411	link
2025-02-26	Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices	Xinru Wang et.al.	2502.19410	null
2025-02-26	ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models	Danae Sánchez Villegas et.al.	2502.19409	null
2025-02-26	Learning Code-Edit Embedding to Model Student Debugging Behavior	Hasnain Heickal et.al.	2502.19407	null
2025-02-26	General Reasoning Requires Learning to Reason from the Get-go	Seungwook Han et.al.	2502.19402	null
2025-02-26	TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding	Max Ku et.al.	2502.19400	null
2025-02-26	LiDAR Registration with Visual Foundation Models	Niclas Vödisch et.al.	2502.19374	null
2025-02-26	Deep Learning For Time Series Analysis With Application On Human Motion	Ali Ismail-Fawaz et.al.	2502.19364	null
2025-02-26	DataMan: Data Manager for Pre-training Large Language Models	Ru Peng et.al.	2502.19363	null
2025-02-26	Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?	Yancheng He et.al.	2502.19361	link
2025-02-26	Controlled Diversity: Length-optimized Natural Language Generation	Diana Marie Schenke et.al.	2502.19347	null
2025-02-26	Evaluating LLMs and Pre-trained Models for Text Summarization Across Diverse Datasets	Tohida Rehman et.al.	2502.19339	null
2025-02-26	I Know What I Don’t Know: Improving Model Cascades Through Confidence Tuning	Stephan Rabanser et.al.	2502.19335	null
2025-02-26	Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems	Hao Peng et.al.	2502.19328	link
2025-02-26	Shh, don’t say that! Domain Certification in LLMs	Cornelius Emde et.al.	2502.19320	null
2025-02-26	Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond	Qizhou Wang et.al.	2502.19301	null
2025-02-25	DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers	Xueguang Ma et.al.	2502.18460	link
2025-02-25	LLM-Based Design Pattern Detection	Christian Schindler et.al.	2502.18458	null
2025-02-25	Evaluating the Effectiveness of Small Language Models in Detecting Refactoring Bugs	Rohit Gheyi et.al.	2502.18454	null
2025-02-25	FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response	Mollie Shichman et.al.	2502.18452	null
2025-02-25	SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution	Yuxiang Wei et.al.	2502.18449	null
2025-02-25	olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models	Jake Poznanski et.al.	2502.18443	link
2025-02-25	MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning	Chanwoo Park et.al.	2502.18439	null
2025-02-25	Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions	Yizhe Zhang et.al.	2502.18435	null
2025-02-25	Exploring Gender Disparities in Automatic Speech Recognition Technology	Hend ElGhazaly et.al.	2502.18434	null
2025-02-25	TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning	Frederikus Hudi et.al.	2502.18431	link
2025-02-25	PyEvalAI: AI-assisted evaluation of Jupyter Notebooks for immediate personalized feedback	Nils Wandel et.al.	2502.18425	null
2025-02-25	Compressing Language Models for Specialized Domains	Miles Williams et.al.	2502.18424	null
2025-02-25	Rank1: Test-Time Compute for Reranking in Information Retrieval	Orion Weller et.al.	2502.18418	link
2025-02-25	OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference	Xiangyu Zhao et.al.	2502.18411	link
2025-02-25	Enhancing DNA Foundation Models to Address Masking Inefficiencies	Monireh Safari et.al.	2502.18405	null
2025-02-25	Monte Carlo Temperature: a robust sampling strategy for LLM’s uncertainty quantification methods	Nicola Cecere et.al.	2502.18389	null
2025-02-25	How Far are LLMs from Real Search? A Comprehensive Study on Efficiency, Completeness, and Inherent Capabilities	Minhua Lin et.al.	2502.18387	null
2025-02-25	MindMem: Multimodal for Predicting Advertisement Memorability Using LLMs and Deep Learning	Sepehr Asgarian et.al.	2502.18371	null
2025-02-25	Responsible AI Agents	Deven R. Desai et.al.	2502.18359	null
2025-02-25	Which Contributions Deserve Credit? Perceptions of Attribution in Human-AI Co-Creation	Jessica He et.al.	2502.18357	null
2025-02-24	Introducing Visual Perception Token into Multimodal Large Language Model	Runpeng Yu et.al.	2502.17425	link
2025-02-24	MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs	Jiarui Zhang et.al.	2502.17422	link
2025-02-24	LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification	Penghui Yang et.al.	2502.17421	link
2025-02-24	The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence	Tom Wollschläger et.al.	2502.17420	null
2025-02-24	From System 1 to System 2: A Survey of Reasoning Large Language Models	Zhong-Zhi Li et.al.	2502.17419	link
2025-02-24	Reasoning with Latent Thoughts: On the Power of Looped Transformers	Nikunj Saunshi et.al.	2502.17416	null
2025-02-24	COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs	Liming Liu et.al.	2502.17410	link
2025-02-24	Large Language Models are Powerful EHR Encoders	Stefan Hegselmann et.al.	2502.17403	link
2025-02-24	Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models	Alon Albalak et.al.	2502.17387	link
2025-02-24	Bridging Gaps in Natural Language Processing for Yorùbá: A Systematic Review of a Decade of Progress and Prospects	Toheeb A. Jimoh et.al.	2502.17364	null
2025-02-24	A Closer Look at TabPFN v2: Strength, Limitation, and Extension	Han-Jia Ye et.al.	2502.17361	null
2025-02-24	RELICT: A Replica Detection Framework for Medical Image Generation	Orhun Utku Aydin et.al.	2502.17360	link
2025-02-24	DIS-CO: Discovering Copyrighted Content in VLMs Training Data	André V. Duarte et.al.	2502.17358	link
2025-02-24	Distributional Scaling Laws for Emergent Capabilities	Rosie Zhao et.al.	2502.17356	null
2025-02-24	On Relation-Specific Neurons in Large Language Models	Yihong Liu et.al.	2502.17355	link
2025-02-24	How Scientists Use Large Language Models to Program	Gabrielle O’Brien et.al.	2502.17348	null
2025-02-24	Time series forecasting based on optimized LLM for fault prediction in distribution power grid insulators	João Pedro Matos-Carvalho et.al.	2502.17341	null
2025-02-24	Tokenized SAEs: Disentangling SAE Reconstructions	Thomas Dooms et.al.	2502.17332	null
2025-02-24	HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization	Zhenghao Liu et.al.	2502.17315	link
2025-02-24	‘Generalization is hallucination’ through the lens of tensor completions	Liang Ze Wong et.al.	2502.17305	null
2025-02-21	ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval	Guanqi Zhan et.al.	2502.15682	null
2025-02-21	Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training	Jaydeep Borkar et.al.	2502.15680	link
2025-02-21	BOSS: Benchmark for Observation Space Shift in Long-Horizon Task	Yue Yang et.al.	2502.15679	null
2025-02-21	Testing the limits of fine-tuning to improve reasoning in vision language models	Luca M. Schulze Buschoff et.al.	2502.15678	null
2025-02-21	FLEKE: Federated Locate-then-Edit Knowledge Editing	Zongkai Zhao et.al.	2502.15677	link
2025-02-21	AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind	Zhining Zhang et.al.	2502.15676	link
2025-02-21	Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing	Shoumik Saha et.al.	2502.15666	link
2025-02-21	Machine-generated text detection prevents language model collapse	George Drayson et.al.	2502.15654	link
2025-02-21	Empowering LLMs with Logical Reasoning: A Comprehensive Survey	Fengxiang Cheng et.al.	2502.15652	null
2025-02-21	Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models	Anirudh Sundar et.al.	2502.15639	null
2025-02-21	Mantis: Lightweight Calibrated Foundation Model for User-Friendly Time Series Classification	Vasilii Feofanov et.al.	2502.15637	link
2025-02-21	The Relationship Between Reasoning and Performance in Large Language Models – o3 (mini) Thinks Harder, Not Longer	Marthe Ballon et.al.	2502.15631	link
2025-02-21	Extraction multi-étiquettes de relations en utilisant des couches de Transformer	Ngoc Luyen Le et.al.	2502.15619	null
2025-02-21	Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing	Qi Le et.al.	2502.15618	link
2025-02-21	PDeepPP: A Deep learning framework with Pretrained Protein language for peptide classification	Jixiu Zhai et.al.	2502.15610	link
2025-02-21	On the Robustness of Transformers against Context Hijacking for Linear Classification	Tianle Li et.al.	2502.15609	null
2025-02-21	Cross-Format Retrieval-Augmented Generation in XR with LLMs for Context-Aware Maintenance Assistance	Akos Nagy et.al.	2502.15604	null
2025-02-21	Do Multilingual LLMs Think In English?	Lisa Schut et.al.	2502.15603	null
2025-02-21	WorldCraft: Photo-Realistic 3D World Creation and Customization via LLM Agents	Xinhang Liu et.al.	2502.15601	null
2025-02-21	SafeInt: Shielding Large Language Models from Jailbreak Attacks via Safety-Aware Representation Intervention	Jiaqi Wu et.al.	2502.15594	null
2025-02-20	LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention	Shang Yang et.al.	2502.14866	link
2025-02-20	Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning	Shuyue Stella Li et.al.	2502.14860	link
2025-02-20	FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling	Weilin Zhao et.al.	2502.14856	null
2025-02-20	Prompt-to-Leaderboard	Evan Frick et.al.	2502.14855	link
2025-02-20	GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks	Jianwen Luo et.al.	2502.14848	link
2025-02-20	Red-Teaming LLM Multi-Agent Systems via Communication Attacks	Pengfei He et.al.	2502.14847	null
2025-02-20	Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation	Yue Yang et.al.	2502.14846	null
2025-02-20	Revealing and Mitigating Over-Attention in Knowledge Editing	Pinzheng Wang et.al.	2502.14838	link
2025-02-20	LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models	Shangqing Tu et.al.	2502.14834	link
2025-02-20	Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs	Danni Liu et.al.	2502.14830	link
2025-02-20	Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps	Martin Tutek et.al.	2502.14829	link
2025-02-20	Exploring Advanced Techniques for Visual Question Answering: A Comprehensive Comparison	Aiswarya Baby et.al.	2502.14827	null
2025-02-20	A Survey of Model Architectures in Information Retrieval	Zhichao Xu et.al.	2502.14822	null
2025-02-20	eC-Tab2Text: Aspect-Based Text Generation from e-Commerce Product Tables	Luis Antonio Gutiérrez Guanilo et.al.	2502.14820	null
2025-02-20	Dynamic Low-Rank Sparse Adaptation for Large Language Models	Weizhong Huang et.al.	2502.14816	link
2025-02-20	FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis	Fadillah Maani et.al.	2502.14807	link
2025-02-20	From RAG to Memory: Non-Parametric Continual Learning for Large Language Models	Bernal Jiménez Gutiérrez et.al.	2502.14802	link
2025-02-20	A Multi-Agent Perspective on Modern Information Retrieval	Haya Nachimovsky et.al.	2502.14796	null
2025-02-20	Rapid Word Learning Through Meta In-Context Learning	Wentao Wang et.al.	2502.14791	null
2025-02-20	SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features	Michael Tschannen et.al.	2502.14786	link
2025-02-19	Where’s the Bug? Attention Probing for Scalable Fault Localization	Adam Stein et.al.	2502.13966	null
2025-02-19	Autellix: An Efficient Serving Engine for LLM Agents as General Programs	Michael Luo et.al.	2502.13965	null
2025-02-19	MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads	Weihao Liu et.al.	2502.13963	link
2025-02-19	Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering	William Jurayj et.al.	2502.13962	null
2025-02-19	LIDDIA: Language-based Intelligent Drug Discovery Agent	Reza Averly et.al.	2502.13959	null
2025-02-19	Neurosymbolic artificial intelligence via large language models and coherence-driven inference	Steve Huntsman et.al.	2502.13953	null
2025-02-19	Why Safeguarded Ships Run Aground? Aligned Large Language Models’ Safety Mechanisms Tend to Be Anchored in The Template Region	Chak Tou Leong et.al.	2502.13946	null
2025-02-19	A Chain-of-Thought Subspace Meta-Learning for Few-shot Image Captioning with Large Vision and Language Models	Hao Huang et.al.	2502.13942	null
2025-02-19	Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images	Shengguang Wu et.al.	2502.13928	null
2025-02-19	Beyond Single Frames: Can LMMs Comprehend Temporal and Contextual Narratives in Image Sequences?	Xiaochen Wang et.al.	2502.13925	null
2025-02-19	LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization	Guanzheng Chen et.al.	2502.13922	link
2025-02-19	Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis	Jiahao Gai et.al.	2502.13921	null
2025-02-19	Exploring Personalized Health Support through Data-Driven, Theory-Guided LLMs: A Case Study in Sleep Health	Xingbo Wang et.al.	2502.13920	link
2025-02-19	TESS 2: A Large-Scale Generalist Diffusion Language Model	Jaesung Tae et.al.	2502.13917	link
2025-02-19	How Do LLMs Perform Two-Hop Reasoning in Context?	Tianyu Guo et.al.	2502.13913	null
2025-02-19	Lost in Sequence: Do Large Language Models Understand Sequential Recommendation?	Sein Kim et.al.	2502.13909	link
2025-02-19	Judging the Judges: A Collection of LLM-Generated Relevance Judgements	Hossein A. Rahmani et.al.	2502.13908	link
2025-02-19	DataSciBench: An LLM Agent Benchmark for Data Science	Dan Zhang et.al.	2502.13897	link
2025-02-19	NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants	Yiran Qin et.al.	2502.13894	null
2025-02-19	Refining embeddings with fill-tuning: data-efficient generalised performance improvements for materials foundation models	Matthew P. Wilson et.al.	2502.13886	link
2025-02-18	Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization	Shuo Xing et.al.	2502.13146	link
2025-02-18	Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation	Bencheng Liao et.al.	2502.13145	link
2025-02-18	Pre-training Auto-regressive Robotic Models with 4D Representations	Dantong Niu et.al.	2502.13142	null
2025-02-18	UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models	Huawei Lin et.al.	2502.13141	link
2025-02-18	AIDE: AI-Driven Exploration in the Space of Code	Zhengyao Jiang et.al.	2502.13138	link
2025-02-18	Theorem Prover as a Judge for Synthetic Data Generation	Joshua Ong Jun Leang et.al.	2502.13137	null
2025-02-18	Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions	Taedong Yun et.al.	2502.13135	null
2025-02-18	Learning to Defer for Causal Discovery with Imperfect Experts	Oscar Clivio et.al.	2502.13132	null
2025-02-18	Rethinking Diverse Human Preference Learning through Principal Component Analysis	Feng Luo et.al.	2502.13131	null
2025-02-18	Magma: A Foundation Model for Multimodal AI Agents	Jianwei Yang et.al.	2502.13130	link
2025-02-18	Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning	Jingyang Lin et.al.	2502.13127	null
2025-02-18	RuozhiBench: Evaluating LLMs with Logical Fallacies and Misleading Premises	Zenan Zhai et.al.	2502.13125	link
2025-02-18	Adapting Psycholinguistic Research for LLMs: Gender-inclusive Language in a Coreference Context	Marion Bartl et.al.	2502.13120	null
2025-02-18	STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models	Narun Raman et.al.	2502.13119	null
2025-02-18	Performance Evaluation of Large Language Models in Statistical Programming	Xinyi Song et.al.	2502.13117	link
2025-02-18	MatterChat: A Multi-Modal LLM for Material Science	Yingheng Tang et.al.	2502.13107	null
2025-02-18	Understanding and Rectifying Safety Perception Distortion in VLMs	Xiaohan Zou et.al.	2502.13095	null
2025-02-18	Text2World: Benchmarking Large Language Models for Symbolic World Model Generation	Mengkang Hu et.al.	2502.13092	null
2025-02-18	KAPPA: A Generic Patent Analysis Framework with Keyphrase-Based Portraits	Xin Xia et.al.	2502.13076	null
2025-02-18	Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity	Yuri Kuratov et.al.	2502.13063	link
2025-02-17	Idiosyncrasies in Large Language Models	Mingjie Sun et.al.	2502.12150	link
2025-02-17	HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation	Ling Yang et.al.	2502.12148	link
2025-02-17	Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control	Jinyan Su et.al.	2502.12145	link
2025-02-17	Small Models Struggle to Learn from Strong Reasoners	Yuetai Li et.al.	2502.12143	null
2025-02-17	SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs	Yige Xu et.al.	2502.12134	link
2025-02-17	Transformer Dynamics: A neuroscientific approach to interpretability of large language models	Jesseba Fernando et.al.	2502.12131	null
2025-02-17	Scaling Autonomous Agents via Automatic Reward Modeling And Planning	Zhenfang Chen et.al.	2502.12130	null
2025-02-17	On the Query Complexity of Verifier-Assisted Language Generation	Edoardo Botta et.al.	2502.12123	null
2025-02-17	Minimal Ranks, Maximum Confidence: Parameter-efficient Uncertainty Quantification for LoRA	Patryk Marszałek et.al.	2502.12122	link
2025-02-17	LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws	Prasanna Mayilvahanan et.al.	2502.12120	null
2025-02-17	PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection	Jinhe Bi et.al.	2502.12119	null
2025-02-17	A-MEM: Agentic Memory for LLM Agents	Wujiang Xu et.al.	2502.12110	link
2025-02-17	Personality Structured Interview for Large Language Model Simulation in Personality Research	Pengda Wang et.al.	2502.12109	null
2025-02-17	Relational Norms for Human-AI Cooperation	Brian D. Earp et.al.	2502.12102	null
2025-02-17	Token Communications: A Unified Framework for Cross-modal Context-aware Semantic Communications	Li Qiao et.al.	2502.12096	null
2025-02-17	Discriminative-Generative Custom Tokens for Vision-Language Models	Pramuditha Perera et.al.	2502.12095	null
2025-02-17	Meta-Statistical Learning: Supervised Learning of Statistical Inference	Maxime Peyrard et.al.	2502.12088	null
2025-02-17	APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs	Yuxiang Huang et.al.	2502.12085	link
2025-02-17	VLM$^2$-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues	Jianshu Zhang et.al.	2502.12084	null
2025-02-17	AdaSplash: Adaptive Sparse Flash Attention	Nuno Gonçalves et.al.	2502.12082	link
2025-02-14	MM-RLHF: The Next Step Forward in Multimodal LLM Alignment	Yi-Fan Zhang et.al.	2502.10391	null
2025-02-14	Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction	WonJin Yoon et.al.	2502.10388	null
2025-02-14	Unknown Word Detection for English as a Second Language (ESL) Learners Using Gaze and Pre-trained Language Models	Jiexin Ding et.al.	2502.10378	null
2025-02-14	Robustness tests for biomedical foundation models should tailor to specification	R. Patrick Xian et.al.	2502.10374	link
2025-02-14	Enhancing Multilingual LLM Pretraining with Model-Based Data Selection	Bettina Messmer et.al.	2502.10361	null
2025-02-14	Organize the Web: Constructing Domains Enhances Pre-Training Data Curation	Alexander Wettig et.al.	2502.10341	null
2025-02-14	Evaluating the Meta- and Object-Level Reasoning of Large Language Models for Question Answering	Nick Ferguson et.al.	2502.10338	null
2025-02-14	LLM-Powered Preference Elicitation in Combinatorial Assignment	Ermis Soumalias et.al.	2502.10308	null
2025-02-14	SPIRIT: Short-term Prediction of solar IRradIance for zero-shot Transfer learning using Foundation Models	Aditya Mishra et.al.	2502.10307	null
2025-02-14	Open-Source AI-Powered Optimization in Scalene: Advancing Python Performance Profiling with DeepSeek-R1 and LLaMA 3.2	Saem Hasan et.al.	2502.10299	null
2025-02-14	DeltaProduct: Increasing the Expressivity of DeltaNet Through Products of Householders	Julien Siems et.al.	2502.10297	link
2025-02-14	Probing Perceptual Constancy in Large Vision Language Models	Haoran Sun et.al.	2502.10273	null
2025-02-14	Are Large Language Models the future crowd workers of Linguistics?	Iris Ferrazzo et.al.	2502.10266	null
2025-02-14	Large Language Models and Synthetic Data for Monitoring Dataset Mentions in Research Papers	Aivin V. Solatorio et.al.	2502.10263	link
2025-02-14	VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models	Gokul Karthik Kumar et.al.	2502.10250	null
2025-02-14	Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model	Guoqing Ma et.al.	2502.10248	link
2025-02-14	Efficient Zero-Order Federated Finetuning of Language Models for Resource-Constrained Devices	Mohamed Aboelenien Ahmed et.al.	2502.10239	null
2025-02-14	AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting	Abdelhakim Benechehab et.al.	2502.10235	link
2025-02-14	Do Large Language Models Reason Causally Like Us? Even Better?	Hanna M. Dettki et.al.	2502.10215	null
2025-02-14	Can Post-Training Quantization Benefit from an Additional QLoRA Integration?	Xiliang Zhu et.al.	2502.10202	null
2025-02-13	Theoretical Benefit and Limitation of Diffusion Language Model	Guhao Feng et.al.	2502.09622	null
2025-02-13	MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency	Dongzhi Jiang et.al.	2502.09621	null
2025-02-13	Exploring the Potential of Encoder-free Architectures in 3D LMMs	Yiwen Tang et.al.	2502.09620	link
2025-02-13	Human-LLM Coevolution: Evidence from Academic Writing	Mingmeng Geng et.al.	2502.09606	null
2025-02-13	SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models	Yung-Sung Chuang et.al.	2502.09604	link
2025-02-13	GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis	Angelos Zavras et.al.	2502.09598	link
2025-02-13	Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs	Siyan Zhao et.al.	2502.09597	link
2025-02-13	KIMAs: A Configurable Knowledge Integrated Multi-Agent System	Zitao Li et.al.	2502.09596	null
2025-02-13	Logical forms complement probability in understanding language model (and human) performance	Yixuan Wang et.al.	2502.09589	null
2025-02-13	Polymind: Parallel Visual Diagramming with Large Language Models to Support Prewriting Through Microtasks	Qian Wan et.al.	2502.09577	null
2025-02-13	MorphNLI: A Stepwise Approach to Natural Language Inference Using Text Morphing	Vlad Andrei Negru et.al.	2502.09567	null
2025-02-13	Zero-shot generation of synthetic neurosurgical data with large language models	Austin A. Barr et.al.	2502.09566	link
2025-02-13	MDCrow: Automating Molecular Dynamics Workflows with Large Language Models	Quintina Campbell et.al.	2502.09565	link
2025-02-13	EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents	Rui Yang et.al.	2502.09560	null
2025-02-13	Explainable AI-assisted Optimization for Feynman Integral Reduction	Zhuo-Yang Song et.al.	2502.09544	null
2025-02-13	Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages	Shreyan Biswas et.al.	2502.09532	null
2025-02-13	When and How Does CLIP Enable Domain and Compositional Generalization?	Elias Kempf et.al.	2502.09507	link
2025-02-13	Improve LLM-based Automatic Essay Scoring with Linguistic Features	Zhaoyi Joey Hou et.al.	2502.09497	null
2025-02-13	Foundation Neural-Network Quantum States	Riccardo Rende et.al.	2502.09488	null
2025-02-13	Objective quantification of mood states using large language models	Jakub Onysk et.al.	2502.09487	null
2025-02-12	SwiftSketch: A Diffusion Model for Image-to-Vector Sketch Generation	Ellie Arar et.al.	2502.08642	null
2025-02-12	Examining Multilingual Embedding Models Cross-Lingually Through LLM-Generated Adversarial Examples	Andrianos Michail et.al.	2502.08638	null
2025-02-12	Ensemble based approach to quantifying uncertainty of LLM based classifications	Srijith Rajamohan et.al.	2502.08631	null
2025-02-12	Continuous Cardiac Arrest Prediction in ICU using PPG Foundation Model	Saurabh Kataria et.al.	2502.08612	null
2025-02-12	Causal Analysis of ASR Errors for Children: Quantifying the Impact of Physiological, Cognitive, and Extrinsic Factors	Vishwanath Pratap Singh et.al.	2502.08587	null
2025-02-12	Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks	Ang Li et.al.	2502.08586	null
2025-02-12	COAST: Intelligent Time-Adaptive Neural Operators	Zhikai Wu et.al.	2502.08574	null
2025-02-12	QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval	Wonduk Seo et.al.	2502.08557	null
2025-02-12	Human-Centric Foundation Models: Perception, Generation and Agentic Modeling	Shixiang Tang et.al.	2502.08556	link
2025-02-12	Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies	Sunnie S. Y. Kim et.al.	2502.08554	null
2025-02-12	LLMs can implicitly learn from mistakes in-context	Lisa Alazraki et.al.	2502.08550	null
2025-02-12	Representation Learning to Advance Multi-institutional Studies with Electronic Health Record Data	Doudou Zhou et.al.	2502.08547	null
2025-02-12	Moment of Untruth: Dealing with Negative Queries in Video Moment Retrieval	Kevin Flanagan et.al.	2502.08544	link
2025-02-12	LLM Pretraining with Continuous Concepts	Jihoon Tack et.al.	2502.08524	null
2025-02-12	The Paradox of Stochasticity: Limited Creativity and Computational Decoupling in Temperature-Varied LLM Outputs of Structured Fictional Data	Evgenii Evstafev et.al.	2502.08515	null
2025-02-12	Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation	Mahnaz Koupaee et.al.	2502.08514	link
2025-02-12	Measuring Diversity in Synthetic Datasets	Yuchang Zhu et.al.	2502.08512	link
2025-02-12	Explanation based In-Context Demonstrations Retrieval for Multilingual Grammatical Error Correction	Wei Li et.al.	2502.08507	link
2025-02-12	Salamandra Technical Report	Aitor Gonzalez-Agirre et.al.	2502.08489	link
2025-02-12	One-Shot Federated Learning with Classifier-Free Diffusion Models	Obaidullah Zaland et.al.	2502.08488	null
2025-02-11	DarwinLM: Evolutionary Structured Pruning of Large Language Models	Shengkun Tang et.al.	2502.07780	link
2025-02-11	Auditing Prompt Caching in Language Model APIs	Chenchen Gu et.al.	2502.07776	link
2025-02-11	Automatic Robot Task Planning by Integrating Large Language Model with Genetic Programming	Azizjon Kobilov et.al.	2502.07772	null
2025-02-11	Breaking Down Bias: On The Limits of Generalizable Pruning Strategies	Sibo Ma et.al.	2502.07771	null
2025-02-11	Great Power Brings Great Responsibility: Personalizing Conversational AI for Diverse Problem-Solvers	Italo Santos et.al.	2502.07763	null
2025-02-11	Scalable Fingerprinting of Large Language Models	Anshul Nasery et.al.	2502.07760	null
2025-02-11	Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension	Wenbo Gong et.al.	2502.07752	null
2025-02-11	WHODUNIT: Evaluation benchmark for culprit detection in mystery stories	Kshitij Gupta et.al.	2502.07747	link
2025-02-11	The Economics of Large Language Models: Token Allocation, Fine-Tuning, and Optimal Pricing	Dirk Bergemann et.al.	2502.07736	null
2025-02-11	Economics of Sourcing Human Data	Sebastin Santy et.al.	2502.07732	null
2025-02-11	Verifying LLM-Generated Code in the Context of Software Verification with Ada/SPARK	Marcos Cramer et.al.	2502.07728	null
2025-02-11	Making Language Models Robust Against Negation	MohammadHossein Rezaei et.al.	2502.07717	link
2025-02-11	Magic 1-For-1: Generating One Minute Video Clips within One Minute	Hongwei Yi et.al.	2502.07701	link
2025-02-11	A Framework for LLM-powered Design Assistants	Swaroop Panda et.al.	2502.07698	null
2025-02-11	Large Language Models as Proxies for Theories of Human Linguistic Cognition	Imry Ziv et.al.	2502.07687	null
2025-02-11	SymGPT: Auditing Smart Contracts via Combining Symbolic Execution with Large Language Models	Shihao Xia et.al.	2502.07644	null
2025-02-11	FoQA: A Faroese Question-Answering Dataset	Annika Simonsen et.al.	2502.07642	null
2025-02-11	Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving	Yong Lin et.al.	2502.07640	link
2025-02-11	Exploring Mobile Touch Interaction with Large Language Models	Tim Zindulka et.al.	2502.07629	null
2025-02-11	Scaling Pre-training to One Hundred Billion Data for Vision Language Models	Xiao Wang et.al.	2502.07617	null
2025-02-10	EVEv2: Improved Baselines for Encoder-Free Vision-Language Models	Haiwen Diao et.al.	2502.06788	link
2025-02-10	Visual Agentic AI for Spatial Reasoning with a Dynamic API	Damiano Marsili et.al.	2502.06787	null
2025-02-10	DeepCrossAttention: Supercharging Transformer Residual Connections	Mike Heddes et.al.	2502.06785	null
2025-02-10	Towards Internet-Scale Training For Agents	Brandon Trabucco et.al.	2502.06776	null
2025-02-10	Enhancing Trust in Language Model-Based Code Optimization through RLHF: A Research Design	Jingzhi Gong et.al.	2502.06769	null
2025-02-10	Exploiting Sparsity for Long Context Inference: Million Token Contexts on Commodity GPUs	Ryan Synk et.al.	2502.06766	link
2025-02-10	Rationalization Models for Text-to-SQL	Gaetano Rossiello et.al.	2502.06759	null
2025-02-10	Accelerating Data Processing and Benchmarking of AI Models for Pathology	Andrew Zhang et.al.	2502.06750	link
2025-02-10	Gradient Multi-Normalization for Stateless and Scalable LLM Training	Meyer Scetbon et.al.	2502.06742	null
2025-02-10	VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data	Thomas Zeng et.al.	2502.06737	null
2025-02-10	Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining	Daouda Sow et.al.	2502.06733	null
2025-02-10	Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling	Runze Liu et.al.	2502.06703	link
2025-02-10	EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks	Michael Arbel et.al.	2502.06684	null
2025-02-10	Boosting Self-Efficacy and Performance of Large Language Models via Verbal Efficacy Stimulations	Rui Chen et.al.	2502.06669	null
2025-02-10	Automatic Evaluation of Healthcare LLMs Beyond Question-Answering	Anna Arias-Duart et.al.	2502.06666	null
2025-02-10	Evaluation of Deep Audio Representations for Hearables	Fabian Gröger et.al.	2502.06664	null
2025-02-10	EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models	Xingrun Xing et.al.	2502.06663	null
2025-02-10	Unbiased Evaluation of Large Language Models from a Causal Perspective	Meilin Chen et.al.	2502.06655	null
2025-02-10	In-Context Learning (and Unlearning) of Length Biases	Stephanie Schoch et.al.	2502.06653	null
2025-02-10	Transparent NLP: Using RAG and LLM Alignment for Privacy Q&A	Anna Leschanowsky et.al.	2502.06652	null
2025-02-07	Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy	Yunhang Shen et.al.	2502.05177	link
2025-02-07	Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach	Jonas Geiping et.al.	2502.05171	link
2025-02-07	NoLiMa: Long-Context Evaluation Beyond Literal Matching	Ali Modarressi et.al.	2502.05167	link
2025-02-07	Multitwine: Multi-Object Compositing with Text and Layout Control	Gemma Canet Tarrés et.al.	2502.05165	null
2025-02-07	DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails	Yihe Deng et.al.	2502.05163	link
2025-02-07	A Lightweight Method to Disrupt Memorized Sequences in LLM	Parjanya Prajakta Prashant et.al.	2502.05159	null
2025-02-07	Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation	Steffen Eger et.al.	2502.05151	link
2025-02-07	CodeSCM: Causal Analysis for Multi-Modal Code Generation	Mukur Gupta et.al.	2502.05150	link
2025-02-07	An Annotated Reading of ‘The Singer of Tales’ in the LLM Era	Kush R. Varshney et.al.	2502.05148	null
2025-02-07	Chest X-ray Foundation Model with Global and Local Representations Integration	Zefan Yang et.al.	2502.05142	link
2025-02-07	Refining Integration-by-Parts Reduction of Feynman Integrals with Machine Learning	Matt von Hippel et.al.	2502.05121	null
2025-02-07	Flexible and Efficient Grammar-Constrained Decoding	Kanghee Park et.al.	2502.05111	null
2025-02-07	Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs	Rohit Saxena et.al.	2502.05092	null
2025-02-07	DCFormer: Efficient 3D Vision-Language Modeling with Decomposed Convolutions	Gorkem Can Ates et.al.	2502.05091	null
2025-02-07	Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs	Thierry Bossy et.al.	2502.05087	link
2025-02-07	Causality can systematically address the monsters under the bench(marks)	Felix Leeb et.al.	2502.05085	null
2025-02-07	ChallengeMe: An Adversarial Learning-enabled Text Summarization Framework	Xiaoyu Deng et.al.	2502.05084	null
2025-02-07	Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures	Tushar Pandey et.al.	2502.05078	link
2025-02-07	nvAgent: Automated Data Visualization from Natural Language via Collaborative Agent Workflow	Geliang Ouyang et.al.	2502.05036	link
2025-02-07	EnseSmells: Deep ensemble and programming language models for automated code smells detection	Anh Ho et.al.	2502.05012	link
2025-02-06	Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment	Zuyan Liu et.al.	2502.04328	link
2025-02-06	Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions	Yik Siu Chan et.al.	2502.04322	link
2025-02-06	ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features	Alec Helbling et.al.	2502.04320	link
2025-02-06	sshELF: Single-Shot Hierarchical Extrapolation of Latent Features for 3D Reconstruction from Sparse-Views	Eyvaz Najafli et.al.	2502.04318	null
2025-02-06	ChamaleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters	Kamer Ali Yuksel et.al.	2502.04315	link
2025-02-06	Great Models Think Alike and this Undermines AI Oversight	Shashwat Goel et.al.	2502.04313	link
2025-02-06	ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization	Yinjie Wang et.al.	2502.04306	link
2025-02-06	Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization	Yuanye Liu et.al.	2502.04295	link
2025-02-06	PILAF: Optimal Human Preference Sampling for Reward Modeling	Yunzhen Feng et.al.	2502.04270	null
2025-02-06	How does a Multilingual LM Handle Multiple Languages?	Santhosh Kakarla et.al.	2502.04269	null
2025-02-06	Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion	Marco Mistretta et.al.	2502.04263	link
2025-02-06	Efficient Randomized Experiments Using Foundation Models	Piersilvio De Bartolomeis et.al.	2502.04262	link
2025-02-06	MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion	Xintong Hao et.al.	2502.04235	null
2025-02-06	Can LLMs Hack Enterprise Networks? Autonomous Assumed Breach Penetration-Testing Active Directory Networks	Andreas Happe et.al.	2502.04227	link
2025-02-06	Keep It Light! Simplifying Image Clustering Via Text-Free Adapters	Yicen Li et.al.	2502.04226	null
2025-02-06	Éclair – Extracting Content and Layout with Integrated Reading Order for Documents	Ilia Karmanov et.al.	2502.04223	null
2025-02-06	Sports and Women’s Sports: Gender Bias in Text Generation with Olympic Data	Laura Biester et.al.	2502.04218	null
2025-02-06	Algorithmic causal structure emerging through compression	Liang Wendong et.al.	2502.04210	null
2025-02-06	“Short-length” Adversarial Training Helps LLMs Defend “Long-length” Jailbreak Attacks: Theoretical and Empirical Evidence	Shaopeng Fu et.al.	2502.04204	link
2025-02-06	The Best Instruction-Tuning Data are Those That Fit	Dylan Zhang et.al.	2502.04194	null
2025-02-05	Do Large Language Model Benchmarks Test Reliability?	Joshua Vendrow et.al.	2502.03461	link
2025-02-05	Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training	Boyao Wang et.al.	2502.03460	null
2025-02-05	SKI Models: Skeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living	Arkaprava Sinha et.al.	2502.03459	null
2025-02-05	A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs)	Yiye Chen et.al.	2502.03450	null
2025-02-05	BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving	Ran Xin et.al.	2502.03438	null
2025-02-05	On Fairness of Unified Multimodal Large Language Model for Image Generation	Ming Liu et.al.	2502.03429	null
2025-02-05	Harnessing Large Language Models for Curated Code Reviews	Oussama Ben Sghaier et.al.	2502.03425	link
2025-02-05	Think or Step-by-Step? UnZIPping the Black Box in Zero-Shot Prompts	Nikta Gohari Sadr et.al.	2502.03418	null
2025-02-05	SPRI: Aligning Large Language Models with Context-Situated Principles	Hongli Zhan et.al.	2502.03397	null
2025-02-05	Benchmarking Time Series Forecasting Models: From Statistical Techniques to Foundation Models in Real-World Applications	Issar Arab et.al.	2502.03395	null
2025-02-05	LIMO: Less is More for Reasoning	Yixin Ye et.al.	2502.03387	link
2025-02-05	Transformers and Their Roles as Time Series Foundation Models	Dennis Wu et.al.	2502.03383	null
2025-02-05	High-Fidelity Simultaneous Speech-To-Speech Translation	Tom Labiausse et.al.	2502.03382	link
2025-02-05	Demystifying Long Chain-of-Thought Reasoning in LLMs	Edward Yeo et.al.	2502.03373	link
2025-02-05	PalimpChat: Declarative and Interactive AI analytics	Chunwei Liu et.al.	2502.03368	null
2025-02-05	Minerva: A Programmable Memory Test Benchmark for Language Models	Menglin Xia et.al.	2502.03358	null
2025-02-05	RadVLM: A Multitask Conversational Vision-Language Model for Radiology	Nicolas Deperrois et.al.	2502.03333	null
2025-02-05	ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model	Qiguang Chen et.al.	2502.03325	null
2025-02-05	Out-of-Distribution Detection using Synthetic Data Generation	Momin Abbas et.al.	2502.03323	null
2025-02-05	Simplifying Formal Proof-Generating Models with ChatGPT and Basic Searching Techniques	Sangjun Han et.al.	2502.03321	null
2025-02-04	Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling	Xiaowen Qiu et.al.	2502.02590	null
2025-02-04	COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation	Xueqing Deng et.al.	2502.02589	null
2025-02-04	A comparison of translation performance between DeepL and Supertext	Alex Flückiger et.al.	2502.02577	link
2025-02-04	Are Language Models Up to Sequential Optimization Problems? From Evaluation to a Hegelian-Inspired Enhancement	Soheil Abbasloo et.al.	2502.02573	null
2025-02-04	Learning the RoPEs: Better 2D and 3D Position Encodings with STRING	Connor Schenck et.al.	2502.02562	null
2025-02-04	Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation	Junha Lee et.al.	2502.02548	null
2025-02-04	LLMs for Generation of Architectural Components: An Exploratory Empirical Study in the Serverless World	Shrikara Arun et.al.	2502.02539	null
2025-02-04	Adaptive Self-improvement LLM Agentic System for ML Library Development	Genghan Zhang et.al.	2502.02534	link
2025-02-04	Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies	Han Zhou et.al.	2502.02533	null
2025-02-04	Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search	Maohao Shen et.al.	2502.02508	null
2025-02-04	Analyzing Similarity Metrics for Data Selection for Language Model Pretraining	Dylan Sam et.al.	2502.02494	null
2025-02-04	EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization	Yize Wu et.al.	2502.02493	null
2025-02-04	Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study	Menglong Cui et.al.	2502.02481	null
2025-02-04	Mind the Gap: Evaluating Patch Embeddings from General-Purpose and Histopathology Foundation Models for Cell Segmentation and Classification	Valentina Vadori et.al.	2502.02471	link
2025-02-04	Modular Training of Neural Networks aids Interpretability	Satvik Golechha et.al.	2502.02470	null
2025-02-04	SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency	Qianhao Yuan et.al.	2502.02458	link
2025-02-04	IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning	Quan Zhang et.al.	2502.02454	null
2025-02-04	Personalization Toolkit: Training Free Personalization of Large Vision Language Models	Soroush Seifi et.al.	2502.02452	null
2025-02-04	Beyond English: Evaluating Automated Measurement of Moral Foundations in Non-English Discourse with a Chinese Case Study	Calvin Yixiang Cheng et.al.	2502.02451	link
2025-02-04	Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models	Haoran Ye et.al.	2502.02444	null
2025-01-31	Low-Rank Adapting Models for Sparse Autoencoders	Matthew Chen et.al.	2501.19406	link
2025-01-31	Vintix: Action Model via In-Context Reinforcement Learning	Andrey Polubarov et.al.	2501.19400	link
2025-01-31	Scalable-Softmax Is Superior for Attention	Ken M. Nakanishi et.al.	2501.19399	null
2025-01-31	Do LLMs Strategically Reveal, Conceal, and Infer Information? A Theoretical and Empirical Analysis in The Chameleon Game	Mustafa O. Karabag et.al.	2501.19398	link
2025-02-03	s1: Simple test-time scaling	Niklas Muennighoff et.al.	2501.19393	link
2025-01-31	Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models	Alina Shutova et.al.	2501.19392	link
2025-01-31	Federated Sketching LoRA: On-Device Collaborative Fine-Tuning of Large Language Models	Wenzhi Fang et.al.	2501.19389	link
2025-01-31	Decoding-based Regression	Xingyou Song et.al.	2501.19383	link
2025-01-31	TableMaster: A Recipe to Advance Table Understanding with Language Models	Lang Cao et.al.	2501.19378	null
2025-02-03	SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions	Dominik Wagner et.al.	2501.19377	null
2025-01-31	We’re Different, We’re the Same: Creative Homogeneity Across LLMs	Emily Wenger et.al.	2501.19361	null
2025-01-31	Mechanical Properties of the Meninges: Large Language Model Assisted Systematic Review of over 25,000 Studies	Brandon P. Chelstrom et.al.	2501.19359	null
2025-01-31	The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking	Yuchun Miao et.al.	2501.19358	null
2025-01-31	Towards Adaptive Self-Improvement for Smarter Energy Systems	Alexander Sommer et.al.	2501.19340	null
2025-01-31	PixelWorld: Towards Perceiving Everything as Pixels	Zhiheng Lyu et.al.	2501.19339	null
2025-01-31	Homogeneity Bias as Differential Sampling Uncertainty in Language Models	Messi H. J. Lee et.al.	2501.19337	null
2025-01-31	Reward-Guided Speculative Decoding for Efficient LLM Reasoning	Baohao Liao et.al.	2501.19324	null
2025-01-31	MINDSTORES: Memory-Informed Neural Decision Synthesis for Task-Oriented Reinforcement in Embodied Systems	Anirudh Chari et.al.	2501.19318	null
2025-01-31	LLM-based Affective Text Generation Quality Based on Different Quantization Values	Yarik Menchaca Resendiz et.al.	2501.19317	null
2025-01-31	An Efficient Approach for Machine Translation on Low-resource Languages: A Case Study in Vietnamese-Chinese	Tran Ngoc Son et.al.	2501.19314	null
2025-01-30	Foundational Models for 3D Point Clouds: A Survey and Outlook	Vishal Thengane et.al.	2501.18594	null
2025-01-30	Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models	Hao Dong et.al.	2501.18592	link
2025-01-30	Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs	Yue Wang et.al.	2501.18585	null
2025-01-30	Prediction-Powered Inference with Imputed Covariates and Nonuniform Sampling	Dan M. Kluger et.al.	2501.18577	link
2025-01-30	Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH	Evgenii Evstafev et.al.	2501.18576	null
2025-01-30	BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos	Lehao Lin et.al.	2501.18565	null
2025-01-30	SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation	Haoquan Fang et.al.	2501.18564	link
2025-01-30	Semantic Web and Creative AI – A Technical Report from ISWS 2023	Raia Abu Ahmad et.al.	2501.18542	null
2025-01-30	Loss Functions and Operators Generated by f-Divergences	Vincent Roulet et.al.	2501.18537	null
2025-01-30	Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges	Manveer Singh Tamber et.al.	2501.18536	link
2025-01-30	Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models	Yi Ding et.al.	2501.18533	null
2025-01-30	Differentially Private Steering for Large Language Model Alignment	Anmol Goel et.al.	2501.18532	link
2025-01-30	Learn from the Past: Language-conditioned Object Rearrangement with Large Language Models	Guanqun Cao et.al.	2501.18516	null
2025-01-30	Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch	Arthur Douillard et.al.	2501.18512	null
2025-01-30	WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training	Benjamin Feuer et.al.	2501.18511	link
2025-01-30	CLEAR: Cue Learning using Evolution for Accurate Recognition Applied to Sustainability Data Extraction	Peter J. Bentley et.al.	2501.18504	null
2025-01-30	A Tool for In-depth Analysis of Code Execution Reasoning of Large Language Models	Changshu Liu et.al.	2501.18482	null
2025-01-30	CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization	Yanxia Deng et.al.	2501.18475	null
2025-01-30	Tuning Vision Foundation Model via Test-Time Prompt-Guided Training for VFSS Segmentations	Chengxi Zeng et.al.	2501.18474	null
2025-01-30	A Benchmark and Evaluation for Real-World Out-of-Distribution Detection Using Vision-Language Models	Shiho Noda et.al.	2501.18463	link
2025-01-29	Learning Beyond the Surface: How Far Can Continual Pre-Training with LoRA Enhance LLMs’ Domain-Specific Insight Learning?	Pouya Pezeshkpour et.al.	2501.17840	link
2025-01-29	Matrix Product Sketching via Coordinated Sampling	Majid Daliri et.al.	2501.17836	null
2025-01-29	Aggregation Schemes for Single-Vector WSI Representation Learning in Digital Pathology	Sobhan Hemati et.al.	2501.17822	null
2025-01-29	Leveraging Multimodal LLM for Inspirational User Interface Search	Seokhyeon Park et.al.	2501.17799	link
2025-01-29	BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation – Challenges and Insights	Chan-Jan Hsu et.al.	2501.17790	null
2025-01-29	Reasoning Over the Glyphs: Evaluation of LLM’s Decipherment of Rare Scripts	Yu-Fei Shih et.al.	2501.17785	null
2025-01-29	AdditiveLLM: Large Language Models Predict Defects in Additive Manufacturing	Peter Pak et.al.	2501.17784	null
2025-01-29	2SSP: A Two-Stage Framework for Structured Pruning of LLMs	Fabrizio Sandri et.al.	2501.17771	link
2025-01-29	Hybrid Graphs for Table-and-Text based Question Answering using LLMs	Ankush Agarwal et.al.	2501.17767	null
2025-01-29	On the Partitioning of GPU Power among Multi-Instances	Tirth Vamja et.al.	2501.17752	null
2025-01-29	Early External Safety Testing of OpenAI’s o3-mini: Insights from the Pre-Deployment Evaluation	Aitor Arrieta et.al.	2501.17749	null
2025-01-29	A technical review of multi-omics data integration methods: from classical statistical to deep generative approaches	Ana R. Baião et.al.	2501.17729	null
2025-01-29	Using Code Generation to Solve Open Instances of Combinatorial Design Problems	Christopher D. Rosin et.al.	2501.17725	link
2025-01-29	RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts	Eujeong Choi et.al.	2501.17715	link
2025-01-29	Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate	Yubo Wang et.al.	2501.17703	null
2025-01-29	Planning with Vision-Language Models and a Use Case in Robot-Assisted Teaching	Xuzhe Dang et.al.	2501.17665	null
2025-01-29	Exploring Vision Language Models for Multimodal and Multilingual Stance Detection	Jake Vasilakes et.al.	2501.17654	null
2025-01-29	Tonguescape: Exploring Language Models Understanding of Vowel Articulation	Haruki Sakajo et.al.	2501.17643	link
2025-01-29	Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation	Lin Chen et.al.	2501.17642	null
2025-01-29	In-Context Meta LoRA Generation	Yihua Shao et.al.	2501.17635	null
2025-01-28	SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training	Tianzhe Chu et.al.	2501.17161	null
2025-01-28	AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders	Zhengxuan Wu et.al.	2501.17148	link
2025-01-28	FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data	Deren Lei et.al.	2501.17144	link
2025-01-28	ASTRAL: Automated Safety Testing of Large Language Models	Miriam Ugarte et.al.	2501.17132	null
2025-01-28	Scenario Understanding of Traffic Scenes Through Large Visual Language Models	Rivera Esteban et.al.	2501.17131	null
2025-01-28	Histoires Morales: A French Dataset for Assessing Moral Alignment	Thibaud Leteno et.al.	2501.17117	link
2025-01-28	Optimizing Large Language Model Training Using FP4 Quantization	Ruizhe Wang et.al.	2501.17116	null
2025-01-28	Unlocking Transparent Alignment Through Enhanced Inverse Constitutional AI for Principle Extraction	Carl-Leander Henneking et.al.	2501.17112	null
2025-01-28	COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models	Tobias Materzok et.al.	2501.17104	null
2025-01-28	Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving	Evgenii Evstafev et.al.	2501.17084	null
2025-01-28	Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding	Akash Kumar et.al.	2501.17053	null
2025-01-28	How Linguistics Learned to Stop Worrying and Love the Language Models	Richard Futrell et.al.	2501.17047	null
2025-01-28	Enhanced Retrieval of Long Documents: Leveraging Fine-Grained Block Representations with Large Language Models	Minghan Li et.al.	2501.17039	null
2025-01-28	Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies	Manojkumar Parmar et.al.	2501.17030	null
2025-01-28	Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs	Alessandro Midolo et.al.	2501.17024	link
2025-01-28	Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement	Kei Katsumata et.al.	2501.17022	link
2025-01-28	Large Language Models for Code Generation: The Practitioners Perspective	Zeeshan Rasheed et.al.	2501.16998	link
2025-01-28	Artificial Intelligence Clones	Annie Liang et.al.	2501.16996	null
2025-01-28	FedEFM: Federated Endovascular Foundation Model with Unseen Data	Tuong Do et.al.	2501.16992	null
2025-01-28	Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection	Xiangyu Gao et.al.	2501.16981	null
2025-01-27	LUCY: Linguistic Understanding and Control Yielding Early Stage of Her	Heting Gao et.al.	2501.16327	link
2025-01-27	Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology	Meiyun Cao et.al.	2501.16309	null
2025-01-27	RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval	Long Nguyen et.al.	2501.16303	null
2025-01-27	Matryoshka Re-Ranker: A Flexible Re-Ranking Architecture With Configurable Depth and Width	Zheng Liu et.al.	2501.16302	null
2025-01-27	Large Models in Dialogue for Active Perception and Anomaly Detection	Tzoulio Chamiti et.al.	2501.16300	link
2025-01-27	FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers	Renshan Zhang et.al.	2501.16297	null
2025-01-27	Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models	Jing Zhang et.al.	2501.16282	null
2025-01-27	Do LLMs Have Visualization Literacy? An Evaluation on Modified Visualizations to Test Generalization in Data Interpretation	Jiayi Hong et.al.	2501.16277	link
2025-01-27	URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots – A Case Study at HCMUT	Long Nguyen et.al.	2501.16276	null
2025-01-27	Return of the Encoder: Maximizing Parameter Efficiency for SLMs	Mohamed Elfeki et.al.	2501.16273	link
2025-01-27	A foundation model for human-AI collaboration in medical literature mining	Zifeng Wang et.al.	2501.16255	null
2025-01-27	Multi-Agent Geospatial Copilots for Remote Sensing Workflows	Chaehong Lee et.al.	2501.16254	null
2025-01-27	Zero-Shot Decision Tree Construction via Large Language Models	Lucas Carrasco et.al.	2501.16247	null
2025-01-27	CLISC: Bridging clip and sam by enhanced cam for unsupervised brain tumor segmentation	Xiaochuan Ma et.al.	2501.16246	null
2025-01-27	Phase Transitions in Large Language Models and the $O(N)$ Model	Youran Sun et.al.	2501.16241	null
2025-01-27	AiGet: Transforming Everyday Moments into Hidden Knowledge Discovery with AI Assistance on Smart Glasses	Runze Cai et.al.	2501.16240	link
2025-01-27	Distilling foundation models for robust and efficient models in digital pathology	Alexandre Filiot et.al.	2501.16239	null
2025-01-27	Language-Based Bayesian Optimization Research Assistant (BORA)	Abdoulatif Cissé et.al.	2501.16224	null
2025-01-27	Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models	Huayu Li et.al.	2501.16215	link
2025-01-27	Provence: efficient and robust context pruning for retrieval-augmented generation	Nadezhda Chirkova et.al.	2501.16214	null
2025-01-24	HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation	Xin Zhou et.al.	2501.14729	link
2025-01-24	Do LLMs Provide Consistent Answers to Health-Related Questions across Languages?	Ipek Baris Schlicht et.al.	2501.14719	null
2025-01-24	Towards Better Understanding Table Instruction Tuning: Decoupling the Effects from Data versus Models	Naihao Deng et.al.	2501.14717	null
2025-01-24	FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing	James Seale Smith et.al.	2501.14713	null
2025-01-24	The Karp Dataset	Mason DiCicco et.al.	2501.14705	null
2025-01-24	Rethinking Table Instruction Tuning	Naihao Deng et.al.	2501.14693	null
2025-01-24	Rethinking Foundation Models for Medical Image Classification through a Benchmark Study on MedMNIST	Fuping Wu et.al.	2501.14685	null
2025-01-24	An Empirical Study on LLM-based Classification of Requirements-related Provisions in Food-safety Regulations	Shabnam Hassani et.al.	2501.14683	null
2025-01-24	Diffusion based Text-to-Music Generation with Global and Local Text based Conditioning	Jisi Zhang et.al.	2501.14680	null
2025-01-24	MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications	Yixing Jiang et.al.	2501.14654	link
2025-01-24	Investigating the (De)Composition Capabilities of Large Language Models in Natural-to-Formal Language Conversion	Ziyao Xu et.al.	2501.14649	link
2025-01-24	Recommending Actionable Strategies: A Semantic Approach to Integrating Analytical Frameworks with Decision Heuristics	Renato Ghisellini et.al.	2501.14634	null
2025-01-24	Extracting Problem Structure with LLMs for Optimized SAT Local Search	André Schilder et.al.	2501.14630	null
2025-01-24	ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations	Tianming Liang et.al.	2501.14607	null
2025-01-24	Knowledge Graphs Construction from Criminal Court Appeals: Insights from the French Cassation Court	Alexander V. Belikov et.al.	2501.14579	null
2025-01-24	ZETA: Leveraging Z-order Curves for Efficient Top-k Attention	Qiuhao Zeng et.al.	2501.14577	null
2025-01-24	Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding	Zhongyi Shui et.al.	2501.14548	link
2025-01-24	Leveraging ChatGPT’s Multimodal Vision Capabilities to Rank Satellite Images by Poverty Level: Advancing Tools for Social Science Research	Hamid Sarmadi et.al.	2501.14546	null
2025-01-24	VERUS-LM: a Versatile Framework for Combining LLMs with Symbolic Reasoning	Benjamin Callewaert et.al.	2501.14540	null
2025-01-24	Design and Implementation of a Psychiatry Resident Training System Based on Large Language Models	Zhenguang Zhong et.al.	2501.14530	link
2025-01-23	CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation	Guofeng Cui et.al.	2501.13927	null
2025-01-23	The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on Llama with Vision-Aware and Function-Calling Capabilities	Chan-Jan Hsu et.al.	2501.13921	link
2025-01-23	Analysis of Indic Language Capabilities in LLMs	Aatman Vaidya et.al.	2501.13912	null
2025-01-23	Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models	Linh Tran et.al.	2501.13904	null
2025-01-23	Exploring Finetuned Audio-LLM on Heart Murmur Features	Adrian Florea et.al.	2501.13884	null
2025-01-23	The machine learning platform for developers of large systems	Alexey Naikov et.al.	2501.13881	null
2025-01-23	A RAG-Based Institutional Assistant	Gustavo Kuratomi et.al.	2501.13880	null
2025-01-23	Dual-Modal Prototype Joint Learning for Compositional Zero-Shot Learning	Shiyu Zhang et.al.	2501.13859	null
2025-01-23	Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes	Shiling Deng et.al.	2501.13851	link
2025-01-23	Think Outside the Data: Colonial Biases and Systemic Issues in Automated Moderation Pipelines for Low-Resource Languages	Farhana Shahid et.al.	2501.13836	null
2025-01-23	On the Reasoning Capacity of AI Models and How to Quantify It	Santosh Kumar Radha et.al.	2501.13833	null
2025-01-23	Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing	Hao Zhang et.al.	2501.13831	null
2025-01-23	Hallucinations Can Improve Large Language Models in Drug Discovery	Shuzhou Yuan et.al.	2501.13824	null
2025-01-23	Large Language Model driven Policy Exploration for Recommender Systems	Jie Wang et.al.	2501.13816	null
2025-01-23	Enhancing LLMs for Governance with Human Oversight: Evaluating and Aligning LLMs on Expert Classification of Climate Misinformation for Detecting False or Misleading Claims about Climate Change	Mowafak Allaham et.al.	2501.13802	null
2025-01-23	PromptMono: Cross Prompting Attention for Self-Supervised Monocular Depth Estimation in Challenging Environments	Changhao Wang et.al.	2501.13796	null
2025-01-23	Training-Free Zero-Shot Temporal Action Detection with Vision-Language Models	Chaolei Han et.al.	2501.13795	link
2025-01-23	Parameter-Efficient Fine-Tuning for Foundation Models	Dan Zhang et.al.	2501.13787	link
2025-01-23	Not Every AI Problem is a Data Problem: We Should Be Intentional About Data Scaling	Tanya Rodchenko et.al.	2501.13779	null
2025-01-23	Explainable XR: Understanding User Behaviors of XR Environments using LLM-assisted Analytics Framework	Yoonsang Kim et.al.	2501.13778	link
2025-01-22	VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding	Boqiang Zhang et.al.	2501.13106	link
2025-01-22	Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment	Melissa Kazemi Rad et.al.	2501.13080	null
2025-01-22	Autonomy-of-Experts Models	Ang Lv et.al.	2501.13074	null
2025-01-22	Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning	Bohao Yang et.al.	2501.13042	link
2025-01-22	Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament	Yantao Liu et.al.	2501.13007	link
2025-01-22	Large Language Model-Based Semantic Communication System for Image Transmission	Soheyb Ribouh et.al.	2501.12988	null
2025-01-22	LLM4WM: Adapting LLM for Wireless Multi-Tasking	Xuanyu Liu et.al.	2501.12983	null
2025-01-22	OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models	Chongren Sun et.al.	2501.12975	link
2025-01-22	Accessible Smart Contracts Verification: Synthesizing Formal Models with Tamed LLMs	Jan Corazza et.al.	2501.12972	link
2025-01-22	It’s complicated. The relationship of algorithmic fairness and non-discrimination regulations in the EU AI Act	Kristof Meding et.al.	2501.12962	null
2025-01-22	Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference	Weizhi Fei et.al.	2501.12959	null
2025-01-22	GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models	Pengxiang Zhao et.al.	2501.12956	null
2025-01-22	Correctness Assessment of Code Generated by Large Language Models Using Internal Representations	Tuan-Dung Bui et.al.	2501.12934	link
2025-01-22	DynamicEarth: How Far are We from Open-Vocabulary Change Detection?	Kaiyu Li et.al.	2501.12931	null
2025-01-22	A Functional Software Reference Architecture for LLM-Integrated Systems	Alessio Bucaioni et.al.	2501.12904	null
2025-01-22	Architectural Fusion Through Contextual Partitioning in Large Language Models: A Novel Approach to Parameterized Knowledge Integration	Offa Kingsleigh et.al.	2501.12901	null
2025-01-22	Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback	Yafu Li et.al.	2501.12895	link
2025-01-22	Generative AI Misuse Potential in Cyber Security Education: A Case Study of a UK Degree Program	Carlton Shepherd et.al.	2501.12883	null
2025-01-22	WisdomBot: Tuning Large Language Models with Artificial Intelligence Knowledge	Jingyuan Chen et.al.	2501.12877	null
2025-01-22	HierPromptLM: A Pure PLM-based Framework for Representation Learning on Heterogeneous Text-rich Networks	Qiuyu Zhu et.al.	2501.12857	null
2025-01-21	InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling	Yi Wang et.al.	2501.12386	link
2025-01-21	MMVU: Measuring Expert-Level Multi-Discipline Video Understanding	Yilun Zhao et.al.	2501.12380	link
2025-01-21	Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists	Thomas F. Eisenmann et.al.	2501.12374	link
2025-01-21	Is Long Context All You Need? Leveraging LLM’s Extended Context for NL2SQL	Yeounoh Chung et.al.	2501.12372	link
2025-01-21	Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models	Samira Abnar et.al.	2501.12370	null
2025-01-21	InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model	Yuhang Zang et.al.	2501.12368	link
2025-01-21	Vision-Language Models for Automated Chest X-ray Interpretation: Leveraging ViT and GPT-2	Md. Rakibul Islam et.al.	2501.12356	null
2025-01-21	Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration	Thomas Walshe et.al.	2501.12332	null
2025-01-21	Cinepro: Robust Training of Foundation Models for Cancer Detection in Prostate Ultrasound Cineloops	Mohamed Harmanani et.al.	2501.12331	link
2025-01-21	VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model	Xianwei Zhuang et.al.	2501.12327	link
2025-01-21	LLM-Assisted Knowledge Graph Completion for Curriculum and Domain Modelling in Personalized Higher Education Recommendations	Hasan Abu-Rasheed et.al.	2501.12300	null
2025-01-21	MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks	Qishen Zhou et.al.	2501.12281	link
2025-01-21	Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement	Maosong Cao et.al.	2501.12273	link
2025-01-21	CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification	Cristiano Patrício et.al.	2501.12266	null
2025-01-21	FOCUS: First Order Concentrated Updating Scheme	Yizhou Liu et.al.	2501.12243	null
2025-01-21	InsTALL: Context-aware Instructional Task Assistance with Multi-modal Large Language Models	Pha Nguyen et.al.	2501.12231	null
2025-01-21	CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning	Yuanheng Fang et.al.	2501.12226	null
2025-01-21	Leveraging Large Language Models for Realizing Truly Intelligent User Interfaces	Allard Oelen et.al.	2501.12221	null
2025-01-21	You Can’t Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense	Wuyuao Mai et.al.	2501.12210	null
2025-01-21	Fixing Imbalanced Attention to Mitigate In-Context Hallucination of Large Vision-Language Model	Kazi Hasan Ibn Arif et.al.	2501.12206	link
2025-01-17	FaceXBench: Evaluating Multimodal LLMs on Face Understanding	Kartik Narayan et.al.	2501.10360	link
2025-01-17	Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems	Weibo Gao et.al.	2501.10332	link
2025-01-17	BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation	Suvodip Dey et.al.	2501.10328	link
2025-01-17	Large language models for automated scholarly paper review: A survey	Zhenzhen Zhuang et.al.	2501.10326	null
2025-01-17	Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models	Pit Neitemeier et.al.	2501.10322	null
2025-01-17	HiMix: Reducing Computational Complexity in Large Vision-Language Models	Xuange Zhang et.al.	2501.10318	null
2025-01-17	Addressing Popularity Bias in Third-Party Library Recommendations Using LLMs	Claudio Di Sipio et.al.	2501.10313	null
2025-01-17	Computational Protein Science in the Era of Large Language Models (LLMs)	Wenqi Fan et.al.	2501.10282	null
2025-01-17	Test Wars: A Comparative Study of SBST, Symbolic Execution, and LLM-Based Approaches to Unit Test Generation	Azat Abdullin et.al.	2501.10200	null
2025-01-17	Generative Artificial Intelligence: Implications for Biomedical and Health Professions Education	William Hersh et.al.	2501.10186	null
2025-01-17	Multi-stage Training of Bilingual Islamic LLM for Neural Passage Retrieval	Vera Pavlova et.al.	2501.10175	null
2025-01-17	Dual Debiasing: Remove Stereotypes and Keep Factual Gender for Fair Language Modeling and Translation	Tomasz Limisiewicz et.al.	2501.10150	null
2025-01-17	A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features	Enes Karanfil et.al.	2501.10144	null
2025-01-17	Exploring the Impact of Generative Artificial Intelligence in Education: A Thematic Analysis	Abhishek Kaushik et.al.	2501.10134	null
2025-01-17	ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario	Lucen Zhong et.al.	2501.10132	link
2025-01-17	PaSa: An LLM Agent for Comprehensive Academic Paper Search	Yichen He et.al.	2501.10120	link
2025-01-17	LLM Reasoner and Automated Planner: A new NPC approach	Israel Puerta-Merino et.al.	2501.10106	null
2025-01-17	Universal Actions for Enhanced Embodied Foundation Models	Jinliang Zheng et.al.	2501.10105	link
2025-01-17	Few-shot Structure-Informed Machinery Part Segmentation with Foundation Models and Graph Neural Networks	Michael Schwingshackl et.al.	2501.10080	link
2025-01-17	SpatialCoT: Advancing Spatial Reasoning through Coordinate Alignment and Chain-of-Thought for Embodied Task Planning	Yuecheng Liu et.al.	2501.10074	null
2025-01-16	Distilling Multi-modal Large Language Models for Autonomous Driving	Deepti Hegde et.al.	2501.09757	null
2025-01-16	Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues	Youngjoon Jang et.al.	2501.09754	null
2025-01-16	OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking	Zekun Xi et.al.	2501.09751	link
2025-01-16	Enhancing Lexicon-Based Text Embeddings with Large Language Models	Yibin Lei et.al.	2501.09749	null
2025-01-16	Suggesting Code Edits in Interactive Machine Learning Notebooks Using Large Language Models	Bihui Jin et.al.	2501.09745	null
2025-01-16	Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps	Nanye Ma et.al.	2501.09732	null
2025-01-16	A Simple Aerial Detection Baseline of Multimodal Language Models	Qingyun Li et.al.	2501.09720	link
2025-01-16	CyberMentor: AI Powered Learning Tool Platform to Address Diverse Student Needs in Cybersecurity Education	Tianyu Wang et.al.	2501.09709	link
2025-01-16	Domain Adaptation of Foundation LLMs for e-Commerce	Christian Herold et.al.	2501.09706	null
2025-01-16	Cueless EEG imagined speech for subject identification: dataset and benchmarks	Ali Derakhshesh et.al.	2501.09700	link
2025-01-16	Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key	Zhihe Yang et.al.	2501.09695	link
2025-01-16	Simulated Interactive Debugging	Yannic Noller et.al.	2501.09694	null
2025-01-16	Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models	Fengli Xu et.al.	2501.09686	null
2025-01-16	Reward-Guided Controlled Generation for Inference-Time Alignment in Diffusion Models: Tutorial and Review	Masatoshi Uehara et.al.	2501.09685	null
2025-01-16	Robin: a Suite of Multi-Scale Vision-Language Models and the CHIRP Evaluation Benchmark	Alexis Roger et.al.	2501.09672	null
2025-01-16	A Survey of Research in Large Language Models for Electronic Design Automation	Jingyu Pan et.al.	2501.09655	null
2025-01-16	The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models	Jonathan Katzy et.al.	2501.09653	null
2025-01-16	CarMem: Enhancing Long-Term Memory in LLM Voice Assistants through Category-Bounding	Johannes Kirmayr et.al.	2501.09645	link
2025-01-16	LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading	Kuan-Ming Liu et.al.	2501.09636	null
2025-01-16	Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework	Yushen Lin et.al.	2501.09631	null
2025-01-15	Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians	Ishan Amin et.al.	2501.09009	link
2025-01-15	Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails	Shaona Ghosh et.al.	2501.09004	null
2025-01-15	Vision Foundation Models for Computed Tomography	Suraj Pai et.al.	2501.09001	link
2025-01-15	CityLoc: 6 DoF Localization of Text Descriptions in Large-Scale Scenes with Gaussian Representation	Qi Ma et.al.	2501.08982	null
2025-01-15	Development and Validation of the Provider Documentation Summarization Quality Instrument for Large Language Models	Emma Croxford et.al.	2501.08977	null
2025-01-15	Learning to Extract Cross-Domain Aspects and Understanding Sentiments Using Large Language Models	Karukriti Kaushik Ghosh et.al.	2501.08974	null
2025-01-15	Analyzing the Ethical Logic of Six Large Language Models	W. Russell Neuman et.al.	2501.08951	null
2025-01-15	Applying General Turn-taking Models to Conversational Human-Robot Interaction	Gabriel Skantze et.al.	2501.08946	null
2025-01-15	Disentangling Exploration of Large Language Models by Optimal Exploitation	Tim Grams et.al.	2501.08925	null
2025-01-15	GenAI Content Detection Task 3: Cross-Domain Machine-Generated Text Detection Challenge	Liam Dugan et.al.	2501.08913	link
2025-01-15	Leveraging Large Language Models as Knowledge-Driven Agents for Reliable Retrosynthesis Planning	Qinyu Ma et.al.	2501.08897	link
2025-01-15	Generative Planning with 3D-vision Language Pre-training for End-to-End Autonomous Driving	Tengpeng Li et.al.	2501.08861	link
2025-01-15	Exploring Task-Level Optimal Prompts for Visual In-Context Learning	Yan Zhu et.al.	2501.08841	null
2025-01-15	IDEA: Image Description Enhanced CLIP-Adapter	Zhipeng Ye et.al.	2501.08816	link
2025-01-15	How Developers Interact with AI: A Taxonomy of Human-AI Collaboration in Software Engineering	Christoph Treude et.al.	2501.08774	null
2025-01-15	Admitting Ignorance Helps the Video Question Answering Models to Answer	Haopeng Li et.al.	2501.08771	null
2025-01-15	Enhanced Large Language Models for Effective Screening of Depression and Anxiety	June M. Liu et.al.	2501.08769	null
2025-01-15	Leveraging LLM Agents for Translating Network Configurations	Yunze Wei et.al.	2501.08760	null
2025-01-15	Expanding Vietnamese SentiWordNet to Improve Performance of Vietnamese Sentiment Analysis Models	Hong-Viet Tran et.al.	2501.08758	null
2025-01-15	The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities	Irina Bigoulaeva et.al.	2501.08716	link
2025-01-14	PokerBench: Training Large Language Models to become Professional Poker Players	Richard Zhuang et.al.	2501.08328	link
2025-01-14	Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks	Miran Heo et.al.	2501.08326	null
2025-01-14	ADAM-1: AI and Bioinformatics for Alzheimer’s Detection and Microbiome-Clinical Data Integrations	Ziyuan Huang et.al.	2501.08324	null
2025-01-14	Exploring Robustness of Multilingual LLMs on Real-World Noisy Data	Amirhossein Aliakbarzadeh et.al.	2501.08322	link
2025-01-14	Enhancing Automated Interpretability with Output-Centric Feature Descriptions	Yoav Gur-Arieh et.al.	2501.08319	link
2025-01-14	MiniMax-01: Scaling Foundation Models with Lightning Attention	MiniMax et.al.	2501.08313	null
2025-01-14	HALoGEN: Fantastic LLM Hallucinations and Where to Find Them	Abhilasha Ravichander et.al.	2501.08292	null
2025-01-14	LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding	Hongyu Li et.al.	2501.08282	link
2025-01-14	Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing	Pulkit Arora et.al.	2501.08276	null
2025-01-14	Addressing the sustainable AI trilemma: a case study on LLM agents and RAG	Hui Wu et.al.	2501.08262	link
2025-01-14	Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models	Yifu Qiu et.al.	2501.08248	null
2025-01-14	Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints	Jonathan Nöther et.al.	2501.08246	null
2025-01-14	Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings	Paul Joe Maliakel et.al.	2501.08219	null
2025-01-14	ASTRID – An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems	Mohita Chowdhury et.al.	2501.08208	null
2025-01-14	ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving	Zain Ul Abedin et.al.	2501.08203	null
2025-01-14	CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation	Jinjun Peng et.al.	2501.08200	link
2025-01-14	OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training	Yijiong Yu et.al.	2501.08197	link
2025-01-14	PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving	Ahmet Caner Yüzügüler et.al.	2501.08192	null
2025-01-14	A Critical Synthesis of Uncertainty Quantification and Foundation Models in Monocular Depth Estimation	Steven Landgraf et.al.	2501.08188	null
2025-01-14	A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following	Yin Fang et.al.	2501.08187	link
2025-01-13	Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss	Xinyu Zhang et.al.	2501.07563	null
2025-01-13	SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing	Varun Biyyala et.al.	2501.07554	link
2025-01-13	Imagine while Reasoning in Space: Multimodal Visualization-of-Thought	Chengzu Li et.al.	2501.07542	null
2025-01-13	ML Mule: Mobile-Driven Context-Aware Collaborative Learning	Haoxiang Yu et.al.	2501.07536	null
2025-01-13	Investigating Large Language Models in Inferring Personality Traits from User Conversations	Jianfeng Zhu et.al.	2501.07532	null
2025-01-13	RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment	Difei Gu et.al.	2501.07525	link
2025-01-13	Parallel Key-Value Cache Fusion for Position Invariant RAG	Philhoon Oh et.al.	2501.07523	null
2025-01-13	Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards	Yangsibo Huang et.al.	2501.07493	null
2025-01-13	TiEBe: A Benchmark for Assessing the Current Knowledge of Large Language Models	Thales Sales Almeida et.al.	2501.07482	link
2025-01-13	A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities	Yihao Liu et.al.	2501.07468	null
2025-01-13	Understanding and Benchmarking Artificial Intelligence: OpenAI’s o3 Is Not AGI	Rolf Pfister et.al.	2501.07458	null
2025-01-13	Enhancing LLM’s Ability to Generate More Repository-Aware Unit Tests Through Precise Contextual Information Injection	Xin Yin et.al.	2501.07425	null
2025-01-13	Initial Findings on Sensor based Open Vocabulary Activity Recognition via Text Embedding Inversion	Lala Shakti Swarup Ray et.al.	2501.07408	null
2025-01-13	Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models	Yasiru Ranasinghe et.al.	2501.07396	null
2025-01-13	Enhancing Retrieval-Augmented Generation: A Study of Best Practices	Siran Li et.al.	2501.07391	link
2025-01-13	Extracting Participation in Collective Action from Social Media	Arianna Pera et.al.	2501.07368	null
2025-01-13	Emergent effects of scaling on the functional hierarchies within large language models	Paul C. Bogdan et.al.	2501.07359	null
2025-01-13	Evaluating Pre-Trained Models for Multi-Language Vulnerability Patching	Zanis Ali Khan et.al.	2501.07339	null
2025-01-13	TempoGPT: Enhancing Temporal Reasoning via Quantizing Embedding	Haochuan Zhang et.al.	2501.07335	null
2025-01-13	Foundation Models at Work: Fine-Tuning for Fairness in Algorithmic Hiring	Buse Sibel Korkmaz et.al.	2501.07324	link
2025-01-10	LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs	Omkar Thawakar et.al.	2501.06186	link
2025-01-10	PEACE: Empowering Geologic Map Holistic Understanding with MLLMs	Yangyu Huang et.al.	2501.06184	null
2025-01-10	VideoAuteur: Towards Long Narrative Video Generation	Junfei Xiao et.al.	2501.06173	null
2025-01-10	Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories	Gerd Kortemeyer et.al.	2501.06143	null
2025-01-10	Supervision policies can shape long-term risk management in general-purpose AI models	Manuel Cebrian et.al.	2501.06137	link
2025-01-10	CoDriveVLM: VLM-Enhanced Urban Cooperative Dispatching and Motion Planning for Future Autonomous Mobility on Demand Systems	Haichao Liu et.al.	2501.06132	link
2025-01-10	Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI	Yuya Asano et.al.	2501.06129	null
2025-01-10	Merging Feed-Forward Sublayers for Compressed Transformers	Neha Verma et.al.	2501.06126	link
2025-01-10	Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding	Fabian David Schmidt et.al.	2501.06117	link
2025-01-10	From Conversation to Automation: Leveraging Large Language Models to Analyze Strategies in Problem Solving Therapy	Elham Aghakhani et.al.	2501.06101	null
2025-01-10	Personalized Language Model Learning on Text Data Without User Identifiers	Yucheng Ding et.al.	2501.06062	link
2025-01-10	AI-powered virtual tissues from spatial proteomics for clinical diagnostics and biomedical discovery	Johann Wenckstern et.al.	2501.06039	link
2025-01-10	Generate, Transduct, Adapt: Iterative Transduction with VLMs	Oindrila Saha et.al.	2501.06031	null
2025-01-10	Addressing speaker gender bias in large scale speech translation systems	Shubham Bansal et.al.	2501.05989	null
2025-01-10	Comparing Self-Supervised Learning Models Pre-Trained on Human Speech and Animal Vocalizations for Bioacoustics Processing	Eklavya Sarkar et.al.	2501.05987	link
2025-01-10	Exploring LLMs for Automated Pre-Testing of Cross-Cultural Surveys	Divya Mani Adhikari et.al.	2501.05985	null
2025-01-10	Hermit Kingdom Through the Lens of Multiple Perspectives: A Case Study of LLM Hallucination on North Korea	Eunjung Cho et.al.	2501.05981	null
2025-01-10	Model Inversion in Split Learning for Personalized LLMs: New Insights from Information Bottleneck Theory	Yunmeng Shu et.al.	2501.05965	null
2025-01-10	Effective faking of verbal deception detection with target-aligned adversarial attacks	Bennett Kleinberg et.al.	2501.05962	null
2025-01-10	Scalable Vision Language Model Training via High Quality Data Curation	Hongyuan Dong et.al.	2501.05952	null
2025-01-09	An Empirical Study of Autoregressive Pre-training from Videos	Jathushan Rajasegaran et.al.	2501.05453	null
2025-01-09	ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding	Xingyu Fu et.al.	2501.05452	null
2025-01-09	Relative Pose Estimation through Affine Corrections of Monocular Depth Priors	Yifan Yu et.al.	2501.05446	link
2025-01-09	Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark	Yunzhuo Hao et.al.	2501.05444	link
2025-01-09	A survey of textual cyber abuse detection using cutting-edge language models and large language models	Jose A. Diaz-Garcia et.al.	2501.05443	null
2025-01-09	Using LLMs to Infer Non-Binary COVID-19 Sentiments of Chinese Micro-bloggers	Jerry Chongyi Hu et.al.	2501.05423	null
2025-01-09	LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation	Xi Ye et.al.	2501.05414	null
2025-01-09	Seeing Sound: Assembling Sounds from Visuals for Audio-to-Image Generation	Darius Petermann et.al.	2501.05413	null
2025-01-09	A Novel Pathology Foundation Model by Mayo Clinic, Charité, and Aignostics	Maximilian Alber et.al.	2501.05409	null
2025-01-09	Mechanistic understanding and validation of large AI models with SemanticLens	Maximilian Dreyer et.al.	2501.05398	link
2025-01-09	FairCode: Evaluating Social Bias of LLMs in Code Generation	Yongkang Du et.al.	2501.05396	link
2025-01-09	Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models	Kristian G. Barman et.al.	2501.05382	null
2025-01-09	Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance	Dimitrios Gerogiannis et.al.	2501.05379	null
2025-01-09	Accelerated Diffusion Models via Speculative Sampling	Valentin De Bortoli et.al.	2501.05370	null
2025-01-09	Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction	Hantao Lou et.al.	2501.05336	link
2025-01-09	“What’s Happening”- A Human-centered Multimodal Interpreter Explaining the Actions of Autonomous Vehicles	Xuewen Luo et.al.	2501.05322	null
2025-01-09	Comparison Study: Glacier Calving Front Delineation in Synthetic Aperture Radar Images With Deep Learning	Nora Gourmelon et.al.	2501.05281	link
2025-01-09	CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models	Fabian Hörst et.al.	2501.05269	link
2025-01-09	Enhancing Plagiarism Detection in Marathi with a Weighted Ensemble of TF-IDF and BERT Embeddings for Low-Resource Language Processing	Atharva Mutsaddi et.al.	2501.05260	link
2025-01-09	CallNavi: A Study and Challenge on Function Calling Routing and Invocation in Large Language Models	Yewei Song et.al.	2501.05255	null
2025-01-08	EditAR: Unified Conditional Generation with Autoregressive Models	Jiteng Mu et.al.	2501.04699	null
2025-01-08	Re-ranking the Context for Multimodal Retrieval Augmented Generation	Matin Mortaheb et.al.	2501.04695	null
2025-01-08	URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics	Ruilin Luo et.al.	2501.04686	link
2025-01-08	Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations	Archita Srivastava et.al.	2501.04675	null
2025-01-08	DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests	Charles Corbière et.al.	2501.04671	null
2025-01-08	On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena	Tarek Naous et.al.	2501.04662	link
2025-01-08	Assessing Language Comprehension in Large Language Models Using Construction Grammar	Wesley Scivetti et.al.	2501.04661	null
2025-01-08	Multi-task retriever fine-tuning for domain-specific and efficient RAG	Patrice Béchard et.al.	2501.04652	null
2025-01-08	FlairGPT: Repurposing LLMs for Interior Designs	Gabrielle Littlefair et.al.	2501.04648	null
2025-01-08	A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI	Kazusato Oko et.al.	2501.04641	link
2025-01-08	Knowledge Retrieval Based on Generative AI	Te-Lun Yang et.al.	2501.04635	null
2025-01-08	“Can you be my mum?”: Manipulating Social Robots in the Large Language Models Era	Giulio Antonio Abbo et.al.	2501.04633	null
2025-01-08	MedCoDi-M: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation	Daniele Molino et.al.	2501.04614	null
2025-01-08	Quantum-inspired Embeddings Projection and Similarity Metrics for Representation Learning	Ivan Kankeu et.al.	2501.04591	link
2025-01-08	Boosting Salient Object Detection with Knowledge Distillated from Large Foundation Models	Miaoyang He et.al.	2501.04582	null
2025-01-08	InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection	Yuhang Liu et.al.	2501.04575	link
2025-01-08	Supervision-free Vision-Language Alignment	Giorgio Giannone et.al.	2501.04568	null
2025-01-08	OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis	Run Luo et.al.	2501.04561	link
2025-01-08	The Impostor is Among Us: Can Large Language Models Capture the Complexity of Human Personas?	Christopher Lazik et.al.	2501.04543	null
2025-01-08	rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking	Xinyu Guan et.al.	2501.04519	null
2025-01-07	LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving	Lingdong Kong et.al.	2501.04005	null
2025-01-07	Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives	Shaoyuan Xie et.al.	2501.04003	link
2025-01-07	Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos	Haobo Yuan et.al.	2501.04001	link
2025-01-07	RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance	Matin Mortaheb et.al.	2501.03995	null
2025-01-07	Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles	Yuxi Xia et.al.	2501.03991	null
2025-01-07	(De)-Indexing and the Right to be Forgotten	Salvatore Vilella et.al.	2501.03989	null
2025-01-07	VLM-driven Behavior Tree for Context-aware Task Planning	Naoki Wake et.al.	2501.03968	link
2025-01-07	Vision Language Models as Values Detectors	Giulio Antonio Abbo et.al.	2501.03957	null
2025-01-07	Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States	Jurgita Kapočiūtė-Dzikienė et.al.	2501.03952	null
2025-01-07	Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection	Pablo Miralles-González et.al.	2501.03940	null
2025-01-07	Visual question answering: from early developments to recent advances – a survey	Ngoc Dung Huynh et.al.	2501.03939	null
2025-01-07	Exploring the Potential of Large Language Models in Public Transportation: San Antonio Case Study	Ramya Jonnala et.al.	2501.03904	null
2025-01-07	LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token	Shaolei Zhang et.al.	2501.03895	link
2025-01-07	AlphaPO – Reward shape matters for LLM alignment	Aman Gupta et.al.	2501.03884	null
2025-01-07	CL3DOR: Contrastive Learning for 3D Large Multimodal Models via Odds Ratio on High-Resolution Point Clouds	Keonwoo Kim et.al.	2501.03879	null
2025-01-07	Improving Dialectal Slot and Intent Detection with Auxiliary Tasks: A Multi-Dialectal Bavarian Case Study	Xaver Maria Krückl et.al.	2501.03863	link
2025-01-07	Progressive Document-level Text Simplification via Large Language Models	Dengzhao Fang et.al.	2501.03857	null
2025-01-07	BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context	Alexis Matzopoulos et.al.	2501.03855	null
2025-01-07	OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints	Mingjie Pan et.al.	2501.03841	null
2025-01-07	MedFocusCLIP: Improving few shot classification in medical datasets using pixel wise attention	Aadya Arora et.al.	2501.03839	null
2025-01-06	BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning	Beichen Zhang et.al.	2501.03226	link
2025-01-06	Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation	Yuhui Zhang et.al.	2501.03225	link
2025-01-06	Leveraging Explainable AI for LLM Text Attribution: Differentiating Human-Written and Multiple LLMs-Generated Text	Ayat Najjar et.al.	2501.03212	null
2025-01-06	Detecting AI-Generated Text in Educational Content: Leveraging Machine Learning and Explainable AI for Academic Integrity	Ayat A. Najjar et.al.	2501.03203	null
2025-01-06	The FACTS Grounding Leaderboard: Benchmarking LLMs’ Ability to Ground Responses to Long-Form Input	Alon Jacovi et.al.	2501.03200	null
2025-01-06	CLIX: Cross-Lingual Explanations of Idiomatic Expressions	Aaron Gluck et.al.	2501.03191	null
2025-01-06	Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text	Ali Al-Lawati et.al.	2501.03166	link
2025-01-06	Segment Anything Model for Zero-shot Single Particle Tracking in Liquid Phase Transmission Electron Microscopy	Risha Goel et.al.	2501.03153	link
2025-01-06	Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches	Alhassan Mumuni et.al.	2501.03151	null
2025-01-06	VicSim: Enhancing Victim Simulation with Emotional and Linguistic Fidelity	Yerong Li et.al.	2501.03139	null
2025-01-06	PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models	Mingyang Song et.al.	2501.03124	link
2025-01-06	CAT: Content-Adaptive Image Tokenization	Junhong Shen et.al.	2501.03120	null
2025-01-06	LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases	Dylan Bouchard et.al.	2501.03112	link
2025-01-06	Sentiment-guided Commonsense-aware Response Generation for Mental Health Counseling	Aseem Srivastava et.al.	2501.03088	null
2025-01-06	Retrieval-Augmented TLAPS Proof Generation with Large Language Models	Yuhao Zhou et.al.	2501.03073	null
2025-01-06	Trust Modeling in Counseling Conversations: A Benchmark Study	Aseem Srivastava et.al.	2501.03064	null
2025-01-06	ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events	Duygu Sezen Islakoglu et.al.	2501.03040	null
2025-01-06	Piano Transcription by Hierarchical Language Modeling with Pretrained Roll-based Encoders	Dichucheng Li et.al.	2501.03038	null
2025-01-06	Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning	Zhen Li et.al.	2501.03035	null
2025-01-06	CALM: Curiosity-Driven Auditing for Large Language Models	Xiang Zheng et.al.	2501.02997	link
2025-01-03	VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction	Chaoyou Fu et.al.	2501.01957	link
2025-01-03	Metadata Conditioning Accelerates Language Model Pre-training	Tianyu Gao et.al.	2501.01956	link
2025-01-03	Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap	Weizhi Zhang et.al.	2501.01945	link
2025-01-03	Abstractive Text Summarization for Contemporary Sanskrit Prose: Issues and Challenges	Shagun Sinha et.al.	2501.01933	null
2025-01-03	Bridging Classification and Segmentation in Osteosarcoma Assessment via Foundation and Discrete Diffusion Models	Manh Duong Nguyen et.al.	2501.01932	link
2025-01-03	Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding	Jiaming Li et.al.	2501.01926	link
2025-01-03	Virgo: A Preliminary Exploration on Reproducing o1-like MLLM	Yifan Du et.al.	2501.01904	link
2025-01-03	QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture	Shvetank Prakash et.al.	2501.01892	null
2025-01-03	Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions	Rachneet Sachdeva et.al.	2501.01872	link
2025-01-03	Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification	Xiangxiang Dai et.al.	2501.01849	link
2025-01-03	MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning	Pu Yang et.al.	2501.01834	null
2025-01-03	Time Series Language Model for Descriptive Caption Generation	Mohamed Trabelsi et.al.	2501.01832	null
2025-01-03	Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models	Yanjiang Liu et.al.	2501.01830	null
2025-01-03	SDPO: Segment-Level Direct Preference Optimization for Social Agents	Aobo Kong et.al.	2501.01821	link
2025-01-03	BERT4MIMO: A Foundation Model using BERT Architecture for Massive MIMO Channel State Information Prediction	Ferhat Ozgur Catak et.al.	2501.01802	link
2025-01-03	Reading Between the Lines: A dataset and a study on why some texts are tougher than others	Nouran Khallaf et.al.	2501.01796	link
2025-01-03	Creating Artificial Students that Never Existed: Leveraging Large Language Models and CTGANs for Synthetic Data Generation	Mohammad Khalil et.al.	2501.01793	link
2025-01-03	Efficient LLM Inference with Activation Checkpointing and Hybrid Caching	Sanghyeon Lee et.al.	2501.01792	null
2025-01-03	LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction	Er Jin et.al.	2501.01767	null
2025-01-03	SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation	Mingjie Li et.al.	2501.01765	null
2025-01-02	GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models	Zhangyang Qi et.al.	2501.01428	link
2025-01-02	Unifying Specialized Visual Encoders for Video Language Models	Jihoon Chung et.al.	2501.01426	link
2025-01-02	Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models	Jingfeng Yao et.al.	2501.01423	link
2025-01-02	OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios	Xize Cheng et.al.	2501.01384	null
2025-01-02	Training Medical Large Vision-Language Models with Abnormal-Aware Feedback	Yucheng Zhou et.al.	2501.01377	null
2025-01-02	ScarNet: A Novel Foundation Model for Automated Myocardial Scar Quantification from LGE in Cardiac MRI	Neda Tavakoli et.al.	2501.01372	link
2025-01-02	CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering	Ben Vardi et.al.	2501.01371	null
2025-01-02	Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability	Dong Shu et.al.	2501.01346	null
2025-01-02	Aligning Large Language Models for Faithful Integrity Against Opposing Argument	Yong Zhao et.al.	2501.01336	link
2025-01-02	CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models	Johan Wahréus et.al.	2501.01335	link
2025-01-02	Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension	Yanbo Fang et.al.	2501.01332	null
2025-01-02	The Prompt Alchemist: Automated LLM-Tailored Prompt Optimization for Test Case Generation	Shuzheng Gao et.al.	2501.01329	null
2025-01-02	Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking	Xiaoxue Cheng et.al.	2501.01306	null
2025-01-02	Large Language Models for Mental Health Diagnostic Assessments: Exploring The Potential of Large Language Models for Assisting with Mental Health Diagnostic Assessments – The Depression and Anxiety Case	Kaushik Roy et.al.	2501.01305	null
2025-01-02	NeutraSum: A Language Model can help a Balanced Media Diet by Neutralizing News Summaries	Xi Luo et.al.	2501.01284	null
2025-01-02	CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries	Shudong Liu et.al.	2501.01282	null
2025-01-02	Language Models for Code Optimization: Survey, Challenges and Future Directions	Jingzhi Gong et.al.	2501.01277	link
2025-01-02	Does a Large Language Model Really Speak in Human-Like Language?	Mose Park et.al.	2501.01273	null
2025-01-02	ProgCo: Program Helps Self-Correction of Large Language Models	Xiaoshuai Song et.al.	2501.01264	link
2025-01-02	CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings	Shanghaoran Quan et.al.	2501.01257	null
2024-12-30	Distributed Mixture-of-Agents for Edge Inference with Large Language Models	Purbesh Mitra et.al.	2412.21200	link
2024-12-31	HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation	Zhaojian Yu et.al.	2412.21199	link
2024-12-30	Aviary: training language agents on challenging scientific tasks	Siddharth Narayanan et.al.	2412.21154	link
2024-12-30	Facilitating large language model Russian adaptation with Learned Embedding Propagation	Mikhail Tikhomirov et.al.	2412.21140	link
2024-12-30	Training Software Engineering Agents and Verifiers with SWE-Gym	Jiayi Pan et.al.	2412.21139	link
2024-12-30	Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism	Tim Tsz-Kit Lau et.al.	2412.21124	null
2024-12-30	ExpShield: Safeguarding Web Text from Unauthorized Crawling and Language Modeling Exploitation	Ruixuan Liu et.al.	2412.21123	null
2024-12-30	Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model	Yifei Huang et.al.	2412.21080	link
2024-12-30	Efficient Multi-Task Inferencing with a Shared Backbone and Lightweight Task-Specific Adapters for Automatic Scoring	Ehsan Latif et.al.	2412.21065	null
2024-12-30	Toward Intelligent and Secure Cloud: Large Language Model Empowered Proactive Defense	Yuyang Zhou et.al.	2412.21051	link
2024-12-30	Visual Style Prompt Learning Using Diffusion Models for Blind Face Restoration	Wanglong Lu et.al.	2412.21042	link
2024-12-30	TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization	Chia-Yu Hung et.al.	2412.21037	link
2024-12-30	GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models	Shangyu Xing et.al.	2412.21036	null
2024-12-30	MapQaTor: A System for Efficient Annotation of Map Query Datasets	Mahir Labib Dihan et.al.	2412.21015	link
2024-12-31	Verbosity-Aware Rationale Reduction: Effective Reduction of Redundant Rationale via Principled Criteria	Joonwon Jang et.al.	2412.21006	null
2024-12-30	Plug-and-Play Training Framework for Preference Optimization	Jingyuan Ma et.al.	2412.20996	null
2024-12-30	KARPA: A Training-free Method of Adapting Knowledge Graph as References for Large Language Model’s Reasoning Path Aggregation	Siyuan Fang et.al.	2412.20995	null
2024-12-30	Efficiently Serving LLM Reasoning Programs with Certaindex	Yichao Fu et.al.	2412.20993	null
2024-12-30	AlignAb: Pareto-Optimal Energy Alignment for Designing Nature-Like Antibodies	Yibo Wen et.al.	2412.20984	null
2024-12-30	UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI	Fangwei Zhong et.al.	2412.20977	null
2024-12-27	MVTamperBench: Evaluating Robustness of Vision-Language Models	Amit Agarwal et.al.	2412.19794	null
2024-12-27	InfAlign: Inference-aware language model alignment	Ananth Balashankar et.al.	2412.19792	null
2024-12-27	Enhancing Whisper’s Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization	Kumud Tripathi et.al.	2412.19785	null
2024-12-27	Can AI Help with Your Personal Finances?	Oudom Hean et.al.	2412.19784	null
2024-12-27	Fortran2CPP: Automating Fortran-to-C++ Migration using LLMs via Multi-Turn Dialogue and Dual-Agent Integration	Le Chen et.al.	2412.19770	link
2024-12-27	On dual-projectively equivalent connections associated to second order superintegrable systems	Andreas Vollmer et.al.	2412.19739	null
2024-12-27	Can Large Language Models Adapt to Other Agents In-Context?	Matthew Riemer et.al.	2412.19726	null
2024-12-27	OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis	Qiushi Sun et.al.	2412.19723	null
2024-12-27	Toward Adaptive Reasoning in Large Language Models with Thought Rollback	Sijia Chen et.al.	2412.19707	link
2024-12-27	A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization	Jingchun Lian et.al.	2412.19685	null
2024-12-27	Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework	Jiang Liu et.al.	2412.19684	null
2024-12-27	CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs	Siyu Wang et.al.	2412.19663	null
2024-12-27	Asymmetrical Reciprocity-based Federated Learning for Resolving Disparities in Medical Diagnosis	Jiaqi Wang et.al.	2412.19654	link
2024-12-27	FreStega: A Plug-and-Play Method for Boosting Imperceptibility and Capacity in Generative Linguistic Steganography for Real-World Scenarios	Kaiyi Pang et.al.	2412.19652	null
2024-12-27	Xmodel-2 Technical Report	Wang Qun et.al.	2412.19638	link
2024-12-27	IMTP: Search-based Code Generation for In-memory Tensor Programs	Yongwon Shin et.al.	2412.19630	null
2024-12-27	Signatures of prediction during natural listening in MEG data?	Sahel Azizpour et.al.	2412.19622	null
2024-12-27	Gradient Weight-normalized Low-rank Projection for Efficient LLM Training	Jia-Hong Huang et.al.	2412.19616	link
2024-12-27	Let Watermarks Speak: A Robust and Unforgeable Watermark for Language Models	Minhao Bai et.al.	2412.19603	null
2024-12-27	SocRATES: Towards Automated Scenario-based Testing of Social Navigation Algorithms	Shashank Rao Marpally et.al.	2412.19595	null
2024-12-24	Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models	Jinhui Yi et.al.	2412.18609	link
2024-12-24	Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models	Zehan Wang et.al.	2412.18605	link
2024-12-24	Explaining in Diffusion: Explaining a Classifier Through Hierarchical Semantics with Text-to-Image Diffusion Models	Tahira Kazimi et.al.	2412.18604	null
2024-12-24	Long-Form Speech Generation with Spoken Language Models	Se Jin Park et.al.	2412.18603	link
2024-12-24	Decentralized Intelligence in GameFi: Embodied AI Agents and the Convergence of DeFi and Virtual Ecosystems	Fernando Jia et.al.	2412.18601	link
2024-12-24	A Paragraph is All It Takes: Rich Robot Behaviors from Interacting, Trusted LLMs	OpenMind et.al.	2412.18588	null
2024-12-24	Exploring Embedding Priors in Prompt-Tuning for Improved Interpretability and Control	Sergey Sedov et.al.	2412.18582	null
2024-12-24	Zero-resource Speech Translation and Recognition with LLMs	Karel Mundnich et.al.	2412.18566	null
2024-12-24	Distilling Fine-grained Sentiment Understanding from Large Language Models	Yice Zhang et.al.	2412.18552	link
2024-12-24	Token-Budget-Aware LLM Reasoning	Tingxu Han et.al.	2412.18547	link
2024-12-24	Consistency Checks for Language Model Forecasters	Daniel Paleka et.al.	2412.18544	null
2024-12-24	PLD-Tree: Persistent Laplacian Decision Tree for Protein-Protein Binding Free Energy Prediction	Xingjian Xu et.al.	2412.18541	null
2024-12-24	Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation	Derong Xu et.al.	2412.18537	link
2024-12-24	Automated Code Review In Practice	Umut Cihan et.al.	2412.18531	null
2024-12-24	The Key of Understanding Vision Tasks: Explanatory Instructions	Yang Shen et.al.	2412.18525	link
2024-12-24	Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving	Hao Pang et.al.	2412.18511	null
2024-12-24	Think or Remember? Detecting and Directing LLMs Towards Memorization or Generalization	Yi-Fu Fu et.al.	2412.18497	null
2024-12-24	Generating event descriptions under syntactic and semantic constraints	Angela Cao et.al.	2412.18496	link
2024-12-24	Segment-Based Attention Masking for GPTs	Shahar Katz et.al.	2412.18487	link
2024-12-24	3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding	Tatiana Zemskova et.al.	2412.18450	link
2024-12-23	ChatGarment: Garment Estimation, Generation and Editing via Large Language Models	Siyuan Bian et.al.	2412.17811	null
2024-12-23	Reconstructing People, Places, and Cameras	Lea Müller et.al.	2412.17806	link
2024-12-23	Examining Imbalance Effects on Performance and Demographic Fairness of Clinical Language Models	Precious Jones et.al.	2412.17803	null
2024-12-23	Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection	Yitong Chen et.al.	2412.17800	link
2024-12-23	Automating the Search for Artificial Life with Foundation Models	Akarsh Kumar et.al.	2412.17799	link
2024-12-23	Memory makes computation universal, remember?	Erik Garrison et.al.	2412.17794	null
2024-12-23	Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective	Xinmiao Yu et.al.	2412.17787	null
2024-12-23	PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion	Sophia Tang et.al.	2412.17780	null
2024-12-23	ResearchTown: Simulator of Human Research Community	Haofei Yu et.al.	2412.17767	link
2024-12-23	Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy	Priyaranjan Pattnayak et.al.	2412.17759	null
2024-12-23	ADC: Enhancing Function Calling Via Adversarial Datasets and Code Line-Level Feedback	Wei Zhang et.al.	2412.17754	null
2024-12-23	Deliberation in Latent Space via Differentiable Cache Augmentation	Luyang Liu et.al.	2412.17747	null
2024-12-23	YuLan-Mini: An Open Data-efficient Language Model	Yiwen Hu et.al.	2412.17743	link
2024-12-23	Reasoning to Attend: Try to Understand How Token Works	Rui Qian et.al.	2412.17741	link
2024-12-23	Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization	Ermo Hua et.al.	2412.17739	link
2024-12-23	Knowledge Editing through Chain-of-Thought	Changyue Wang et.al.	2412.17727	link
2024-12-23	From Models to Microtheories: Distilling a Model’s Topical Knowledge for Grounded Question Answering	Nathaniel Weir et.al.	2412.17701	link
2024-12-23	Understanding the Logic of Direct Preference Alignment through Logic	Kyle Richardson et.al.	2412.17696	null
2024-12-23	FedTLU: Federated Learning with Targeted Layer Updates	Jong-Ik Park et.al.	2412.17692	null
2024-12-23	Large Language Model Safety: A Holistic Survey	Dan Shi et.al.	2412.17686	link
2024-12-20	HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding	Chenxin Tao et.al.	2412.16158	null
2024-12-20	Frequency Is What You Need: Word-frequency Masking Benefits Vision-Language Model Pre-training	Mingliang Liang et.al.	2412.16148	link
2024-12-20	Offline Reinforcement Learning for LLM Multi-Step Reasoning	Huaijie Wang et.al.	2412.16145	link
2024-12-20	Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation	Seyedreza Mohseni et.al.	2412.16135	null
2024-12-20	Data-Driven Mechanism Design: Jointly Eliciting Preferences and Information	Dirk Bergemann et.al.	2412.16132	null
2024-12-20	PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics	Daniil Larionov et.al.	2412.16120	null
2024-12-20	Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts	Muhammad Abdullah Sohail et.al.	2412.16119	link
2024-12-20	PruneVid: Visual Token Pruning for Efficient Video Large Language Models	Xiaohu Huang et.al.	2412.16117	link
2024-12-20	The Content Moderator’s Dilemma: Removal of Toxic Content and Distortions to Online Discourse	Mahyar Habibi et.al.	2412.16114	null
2024-12-20	Demystifying the Potential of ChatGPT-4 Vision for Construction Progress Monitoring	Ahmet Bahaddin Ersoz et.al.	2412.16108	null
2024-12-20	Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers	Yifan Yang et.al.	2412.16102	null
2024-12-20	Logical Consistency of Large Language Models in Fact-checking	Bishwamittra Ghosh et.al.	2412.16100	null
2024-12-20	The Evolution of LLM Adoption in Industry Data Curation Practices	Crystal Qian et.al.	2412.16089	null
2024-12-20	Efficient MedSAMs: Segment Anything in Medical Images on Laptop	Jun Ma et.al.	2412.16085	link
2024-12-20	Formal Mathematical Reasoning: A New Frontier in AI	Kaiyu Yang et.al.	2412.16075	null
2024-12-20	A Framework for Streaming Event-Log Prediction in Business Processes	Benedikt Bollig et.al.	2412.16032	null
2024-12-20	The Only Way is Ethics: A Guide to Ethical Research with Large Language Models	Eddie L. Ungless et.al.	2412.16022	link
2024-12-20	Fearful Falcons and Angry Llamas: Emotion Category Annotations of Arguments by Humans and LLMs	Lynn Greschner et.al.	2412.15993	null
2024-12-20	BabyHGRN: Exploring RNNs for Sample-Efficient Training of Language Models	Patrick Haller et.al.	2412.15978	null
2024-12-20	Legommenders: A Comprehensive Content-Based Recommendation Library with LLM Support	Qijiong Liu et.al.	2412.15973	link
2024-12-19	PRIMA: Multi-Image Vision-Language Models for Reasoning Segmentation	Muntasir Wahed et.al.	2412.15209	null
2024-12-19	OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving	Shuo Xing et.al.	2412.15208	link
2024-12-19	AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving	Shuo Xing et.al.	2412.15206	link
2024-12-19	MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark	Qihao Zhao et.al.	2412.15194	link
2024-12-19	EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues	Sagar Soni et.al.	2412.15190	null
2024-12-19	LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation	Weijia Shi et.al.	2412.15188	null
2024-12-19	Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning	Simon Frieder et.al.	2412.15184	null
2024-12-19	STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning	Marius Memmel et.al.	2412.15182	null
2024-12-19	HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages	Aman Chaturvedi et.al.	2412.15178	null
2024-12-19	Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying	Federico Castagna et.al.	2412.15177	link
2024-12-19	Rethinking Uncertainty Estimation in Natural Language Generation	Lukas Aichberger et.al.	2412.15176	null
2024-12-19	Language Models as Continuous Self-Evolving Data Engineers	Peidong Wang et.al.	2412.15151	null
2024-12-19	Adaptive Pruning for Large Language Models with Structural Importance Awareness	Haotian Zheng et.al.	2412.15127	null
2024-12-19	Outcome-Refining Process Supervision for Code Generation	Zhuohao Yu et.al.	2412.15118	link
2024-12-19	Qwen2.5 Technical Report	Qwen et.al.	2412.15115	link
2024-12-19	Associative memory inspires improvements for in-context learning using a novel attention residual stream architecture	Thomas F Burns et.al.	2412.15113	link
2024-12-19	Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search	Lei Tan et.al.	2412.15106	null
2024-12-19	Review-Then-Refine: A Dynamic Framework for Multi-Hop Question Answering with Temporal Adaptability	Xiangsen Chen et.al.	2412.15101	null
2024-12-19	Nano-ESG: Extracting Corporate Sustainability Information from News Articles	Fabian Billert et.al.	2412.15093	link
2024-12-19	ScamChatBot: An End-to-End Analysis of Fake Account Recovery on Social Media via Chatbots	Bhupendra Acharya et.al.	2412.15072	null
2024-12-18	Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces	Jihan Yang et.al.	2412.14171	link
2024-12-18	TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks	Frank F. Xu et.al.	2412.14161	link
2024-12-18	Advanced Reasoning and Transformation Engine for Multi-Step Insight Synthesis in Data Analytics with Large Language Models	Atin Sakkeer Hussain et.al.	2412.14146	null
2024-12-18	Incorporating Feature Pyramid Tokenization and Open Vocabulary Semantic Segmentation	Jianyu Zhang et.al.	2412.14145	null
2024-12-18	LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research	Tianyang Gu et.al.	2412.14141	null
2024-12-18	Design choices made by LLM-based test generators prevent them from finding bugs	Noble Saji Mathews et.al.	2412.14137	null
2024-12-18	Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models	Ido Cohen et.al.	2412.14133	link
2024-12-18	Foundation Models Meet Low-Cost Sensors: Test-Time Adaptation for Rescaling Disparity for Zero-Shot Metric Depth Estimation	Rémi Marsal et.al.	2412.14103	null
2024-12-18	Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts	Jihye Choi et.al.	2412.14097	null
2024-12-18	Alignment faking in large language models	Ryan Greenblatt et.al.	2412.14093	link
2024-12-18	Future Research Avenues for Artificial Intelligence in Digital Gaming: An Exploratory Report	Markus Dablander et.al.	2412.14085	null
2024-12-18	Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification	Kyle Thompson et.al.	2412.14063	link
2024-12-18	Understanding and Evaluating Trust in Generative AI and Large Language Models for Spreadsheets	Simon Thorne et.al.	2412.14062	null
2024-12-18	Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models	Xinghang Li et.al.	2412.14058	null
2024-12-18	A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future	Shilin Sun et.al.	2412.14056	link
2024-12-18	Digestion Algorithm in Hierarchical Symbolic Forests: A Fast Text Normalization Algorithm and Semantic Parsing Framework for Specific Scenarios and Lightweight Deployment	Kevin You et.al.	2412.14054	link
2024-12-18	Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation	Vera Neplenbroek et.al.	2412.14050	link
2024-12-18	CAD-Recode: Reverse Engineering CAD Code from Point Clouds	Danila Rukhovich et.al.	2412.14042	link
2024-12-18	Hansel: Output Length Controlling Framework for Large Language Models	Seoha Song et.al.	2412.14033	null
2024-12-18	Discovering maximally consistent distribution of causal tournaments with Large Language Models	Federico Baldo et.al.	2412.14019	null
2024-12-17	Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents	Yifei Zhou et.al.	2412.13194	null
2024-12-17	GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding	Haoyi Jiang et.al.	2412.13193	link
2024-12-17	HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction	Chen Bao et.al.	2412.13187	null
2024-12-17	Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration	Mark Endo et.al.	2412.13180	null
2024-12-17	SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents	Sheng Yin et.al.	2412.13178	link
2024-12-17	DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation	Miriam Wanner et.al.	2412.13175	null
2024-12-17	Locate n’ Rotate: Two-stage Openable Part Detection with Foundation Model Priors	Siqi Li et.al.	2412.13173	link
2024-12-17	Compressed Chain of Thought: Efficient Reasoning Through Dense Representations	Jeffrey Cheng et.al.	2412.13171	null
2024-12-17	Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study	Bolei Ma et.al.	2412.13169	link
2024-12-17	C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System	Parker Addison et.al.	2412.13163	null
2024-12-17	SWAN: Preprocessing SGD Enables Adam-Level Performance On LLM Training With Significant Memory Reduction	Chao Ma et.al.	2412.13148	null
2024-12-17	Are Your LLMs Capable of Stable Reasoning?	Junnan Liu et.al.	2412.13147	link
2024-12-17	A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis	Xiao Zhou et.al.	2412.13126	null
2024-12-17	AI PERSONA: Towards Life-long Personalization of LLMs	Tiannan Wang et.al.	2412.13103	null
2024-12-17	AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark	Jianlyu Chen et.al.	2412.13102	link
2024-12-17	Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election	Roberto Mondini et.al.	2412.13098	null
2024-12-17	LMUnit: Fine-grained Evaluation with Natural Language Unit Tests	Jon Saad-Falcon et.al.	2412.13091	null
2024-12-17	Taming Multi-Domain, -Fidelity Data: Towards Foundation Models for Atomistic Scale Simulations	Tomoya Shiota et.al.	2412.13088	link
2024-12-17	Modality-Inconsistent Continual Learning of Multimodal Large Language Models	Weiguo Pian et.al.	2412.13050	null
2024-12-17	Harnessing Event Sensory Data for Error Pattern Prediction in Vehicles: A Language Model Approach	Hugo Math et.al.	2412.13041	link
2024-12-16	SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator	Guoxuan Chen et.al.	2412.12094	link
2024-12-16	Instruction-based Image Manipulation by Watching How Things Move	Mingdeng Cao et.al.	2412.12087	null
2024-12-16	CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology	Yuxuan Sun et.al.	2412.12077	null
2024-12-16	CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding	Guo Chen et.al.	2412.12075	null
2024-12-16	Making FETCH! Happen: Finding Emergent Dog Whistles Through Common Habitats	Kuleen Sasse et.al.	2412.12072	link
2024-12-16	How Private are Language Models in Abstractive Summarization?	Anthony Hughes et.al.	2412.12040	null
2024-12-16	Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection	Ira Ceka et.al.	2412.12039	null
2024-12-16	FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning	Gaojian Wang et.al.	2412.12032	link
2024-12-16	SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval	Yueqian Lin et.al.	2412.12009	link
2024-12-16	The Open Source Advantage in Large Language Models (LLMs)	Jiya Manchanda et.al.	2412.12004	null
2024-12-16	LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts	Zhuhao Wang et.al.	2412.12001	link
2024-12-16	SAMIC: Segment Anything with In-Context Spatial Prompt Engineering	Savinay Nagendra et.al.	2412.11998	null
2024-12-16	Combining Large Language Models with Tutoring System Intelligence: A Case Study in Caregiver Homework Support	Devika Venugopalan et.al.	2412.11995	link
2024-12-16	ExecRepoBench: Multi-level Executable Code Completion Evaluation	Jian Yang et.al.	2412.11990	null
2024-12-16	SciFaultyQA: Benchmarking LLMs on Faulty Science Question Detection with a GAN-Inspired Approach to Synthetic Dataset Generation	Debarshi Kundu et.al.	2412.11988	link
2024-12-16	Cost-Effective Label-free Node Classification with LLMs	Taiyan Zhang et.al.	2412.11983	link
2024-12-16	AlphaZero Neural Scaling and Zipf’s Law: a Tale of Board Games and Power Laws	Oren Neumann et.al.	2412.11979	link
2024-12-16	Speech Foundation Models and Crowdsourcing for Efficient, High-Quality Data Collection	Beomseok Lee et.al.	2412.11978	null
2024-12-16	Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning	Qi Sun et.al.	2412.11974	link
2024-12-16	DARWIN 1.5: Large Language Models as Materials Science Adapted Learners	Tong Xie et.al.	2412.11970	link
2024-12-13	UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities	Muhammad Uzair Khattak et.al.	2412.10372	link
2024-12-13	A Grounded Typology of Word Classes	Coleman Haley et.al.	2412.10369	null
2024-12-13	Robust image classification with multi-modal large language models	Francesco Villani et.al.	2412.10353	null
2024-12-13	Towards a foundation model for heavy-ion collision experiments through point cloud diffusion	Manjunath Omana Kuttan et.al.	2412.10352	null
2024-12-13	A dual contrastive framework	Yuan Sun et.al.	2412.10348	null
2024-12-13	COMET: Benchmark for Comprehensive Biological Multi-omics Evaluation Tasks and Language Models	Yuchen Ren et.al.	2412.10347	null
2024-12-13	Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining	Zhiqi Ge et.al.	2412.10342	null
2024-12-13	AdvPrefix: An Objective for Nuanced LLM Jailbreaks	Sicheng Zhu et.al.	2412.10321	link
2024-12-13	BrushEdit: All-In-One Image Inpainting and Editing	Yaowei Li et.al.	2412.10316	null
2024-12-13	DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding	Zhiyu Wu et.al.	2412.10302	link
2024-12-13	Still “Talking About Large Language Models”: Some Clarifications	Murray Shanahan et.al.	2412.10291	null
2024-12-13	One world, one opinion? The superstar effect in LLM responses	Sofie Goethals et.al.	2412.10281	null
2024-12-13	Benchmarking Linguistic Diversity of Large Language Models	Yanzhu Guo et.al.	2412.10271	link
2024-12-13	Cultural Evolution of Cooperation among LLM Agents	Aron Vallinder et.al.	2412.10270	null
2024-12-13	Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT	Danielle R. Thomas et.al.	2412.10267	link
2024-12-13	Reasoner Outperforms: Generative Stance Detection with Rationalization for Social Media	Jiaqing Yuan et.al.	2412.10266	null
2024-12-13	Targeted Angular Reversal of Weights (TARS) for Knowledge Removal in Large Language Models	Harry J. Davies et.al.	2412.10257	null
2024-12-13	Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Unanswerable Questions and Ambiguous Prompts	Hazel Kim et.al.	2412.10246	null
2024-12-13	Efficient Continual Pre-training of LLMs for Low-resource Languages	Arijit Nag et.al.	2412.10244	null
2024-12-13	Retrieval-Augmented Semantic Parsing: Using Large Language Models to Improve Generalization	Xiao Zhang et.al.	2412.10207	null
2024-12-12	EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM	Zhuofan Zong et.al.	2412.09618	null
2024-12-12	V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding	Junqi Ge et.al.	2412.09616	link
2024-12-12	PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models	Chenyu Yang et.al.	2412.09613	null
2024-12-12	Olympus: A Universal Task Router for Computer Vision Tasks	Yuanze Lin et.al.	2412.09612	link
2024-12-12	Feat2GS: Probing Visual Foundation Models with Gaussian Splatting	Yue Chen et.al.	2412.09606	null
2024-12-12	AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials	Yiheng Xu et.al.	2412.09605	null
2024-12-12	SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding	Hao Li et.al.	2412.09604	null
2024-12-12	Do Multimodal Large Language Models See Like Humans?	Jiaying Lin et.al.	2412.09603	null
2024-12-12	Hidden Biases of End-to-End Driving Datasets	Julian Zimmerlin et.al.	2412.09602	link
2024-12-12	InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions	Pan Zhang et.al.	2412.09596	link
2024-12-12	OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages	Chester Palen-Michel et.al.	2412.09587	null
2024-12-12	DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction	Yu Feng et.al.	2412.09572	null
2024-12-12	Does Representation Matter? Exploring Intermediate Layers in Large Language Models	Oscar Skean et.al.	2412.09563	null
2024-12-12	Foundational Large Language Models for Materials Research	Vaibhav Mishra et.al.	2412.09560	link
2024-12-12	Video Creation by Demonstration	Yihong Sun et.al.	2412.09551	null
2024-12-12	Exemplar Masking for Multimodal Incremental Learning	Yi-Lun Lee et.al.	2412.09549	link
2024-12-12	Capturing the Temporal Dependence of Training Data Influence	Jiachen T. Wang et.al.	2412.09538	null
2024-12-12	Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM	Han Wang et.al.	2412.09530	link
2024-12-12	Can Modern LLMs Act as Agent Cores in Radiology Environments?	Qiaoyu Zheng et.al.	2412.09529	link
2024-12-12	Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Clinical Pathology Analysis	Shengxuming Zhang et.al.	2412.09521	null
2024-12-11	Generative Semantic Communication: Architectures, Technologies, and Applications	Jinke Ren et.al.	2412.08642	null
2024-12-11	Fast Prompt Alignment for Text-to-Image Generation	Khalil Mrini et.al.	2412.08639	link
2024-12-11	Multimodal Latent Language Modeling with Next-Token Diffusion	Yutao Sun et.al.	2412.08635	link
2024-12-11	Synthetic Vision: Training Vision-Language Models to Understand Physics	Vahid Balazadeh et.al.	2412.08619	null
2024-12-11	Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models	Jiahui Li et.al.	2412.08615	link
2024-12-11	Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning	Fan Lu et.al.	2412.08614	link
2024-12-11	Competition and Diversity in Generative AI	Manish Raghavan et.al.	2412.08610	link
2024-12-11	AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models	Mintong Kang et.al.	2412.08608	null
2024-12-11	Preference Discerning with LLM-Enhanced Generative Retrieval	Fabian Paischer et.al.	2412.08604	null
2024-12-11	Empirical Measurements of AI Training Power Demand on a GPU-Accelerated Node	Imran Latif et.al.	2412.08602	null
2024-12-11	Leveraging Graph-RAG and Prompt Engineering to Enhance LLM-Based Automated Requirement Traceability and Compliance Checks	Arsalan Masoudifard et.al.	2412.08593	null
2024-12-11	Advancing Single- and Multi-task Text Classification through Large Language Model Fine-tuning	Hang Zhao et.al.	2412.08587	null
2024-12-11	TURBOATTENTION: Efficient Attention Approximation For High Throughputs LLMs	Hao Kang et.al.	2412.08585	null
2024-12-11	LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations	Zejian Li et.al.	2412.08580	link
2024-12-11	Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning	Rongzhe Wei et.al.	2412.08559	null
2024-12-11	MaestroMotif: Skill Design from Artificial Intelligence Feedback	Martin Klissarov et.al.	2412.08542	null
2024-12-11	SenCLIP: Enhancing zero-shot land-use mapping for Sentinel-2 with ground-level prompting	Pallavi Jain et.al.	2412.08536	link
2024-12-11	Continual Learning for Encoder-only Language Models via a Discrete Key-Value Bottleneck	Andor Diera et.al.	2412.08528	null
2024-12-11	EMS: Adaptive Evict-then-Merge Strategy for Head-wise KV Cache Compression Based on Global-Local Importance	Yingxin Li et.al.	2412.08521	null
2024-12-11	Bridging Relevance and Reasoning: Rationale Distillation in Retrieval-Augmented Generation	Pengyue Jia et.al.	2412.08519	null
2024-12-10	Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences	Alan Nawzad Amin et.al.	2412.07763	link
2024-12-10	SAT: Spatial Aptitude Training for Multimodal Language Models	Arijit Ray et.al.	2412.07755	null
2024-12-10	LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models	Ziqi Lu et.al.	2412.07746	null
2024-12-10	Zero-Shot ATC Coding with Large Language Models for Clinical Assessments	Zijian Chen et.al.	2412.07743	null
2024-12-10	AI Expands Scientists’ Impact but Contracts Science’s Focus	Qianyue Hao et.al.	2412.07727	link
2024-12-10	Granite Guardian	Inkit Padhi et.al.	2412.07724	link
2024-12-10	Leveraging Content and Context Cues for Low-Light Image Enhancement	Igor Morawski et.al.	2412.07693	link
2024-12-10	DriveMM: All-in-One Large Multimodal Model for Autonomous Driving	Zhijian Huang et.al.	2412.07689	link
2024-12-10	Privacy-Preserving Customer Support: A Framework for Secure and Scalable Interactions	Anant Prakash Awasthi et.al.	2412.07687	null
2024-12-10	TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation	Alfredo Garrachón Ruiz et.al.	2412.07682	null
2024-12-10	RADIO Amplified: Improved Baselines for Agglomerative Vision Foundation Models	Greg Heinrich et.al.	2412.07679	link
2024-12-10	Ask Humans or AI? Exploring Their Roles in Visualization Troubleshooting	Shuyu Shen et.al.	2412.07673	link
2024-12-10	FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks	Bocheng Chen et.al.	2412.07672	null
2024-12-10	Automating Business Intelligence Requirements with Generative AI and Semantic Search	Nimrod Busany et.al.	2412.07668	null
2024-12-10	Searching for Structure: Investigating Emergent Communication with Large Language Models	Tom Kouwenhoven et.al.	2412.07646	null
2024-12-10	TrojanWhisper: Evaluating Pre-trained LLMs to Detect and Localize Hardware Trojans	Md Omar Faruque et.al.	2412.07636	null
2024-12-10	ChocoLlama: Lessons Learned From Teaching Llamas Dutch	Matthieu Meeus et.al.	2412.07633	null
2024-12-10	Piece of Table: A Divide-and-Conquer Approach for Selecting Sub-Tables in Table Question Answering	Wonjin Lee et.al.	2412.07629	null
2024-12-10	OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations	Linke Ouyang et.al.	2412.07626	link
2024-12-10	DRUM: Learning Demonstration Retriever for Large MUlti-modal Models	Ellen Yi-Ge et.al.	2412.07619	null
2024-12-09	Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models	Yi-Lun Lee et.al.	2412.06775	link
2024-12-09	Visual Lexicon: Rich Image Features in Language Space	XuDong Wang et.al.	2412.06774	null
2024-12-09	Training Large Language Models to Reason in a Continuous Latent Space	Shibo Hao et.al.	2412.06769	link
2024-12-09	Ranking-aware adapter for text-driven image ordering with CLIP	Wei-Hsiang Yu et.al.	2412.06760	link
2024-12-09	Why Do Developers Engage with ChatGPT in Issue-Tracker? Investigating Usage and Reliance on ChatGPT-Generated Code	Joy Krishan Das et.al.	2412.06757	null
2024-12-09	Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models	Neel Jain et.al.	2412.06748	null
2024-12-09	ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities	Adhiraj Ghosh et.al.	2412.06745	null
2024-12-09	JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM	Takuro Fujii et.al.	2412.06738	link
2024-12-09	AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark	Lan Li et.al.	2412.06724	link
2024-12-09	How to Merge Your Multimodal Models Over Time?	Sebastian Dziadzio et.al.	2412.06712	link
2024-12-09	OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions	Yi-Kai Zhang et.al.	2412.06693	null
2024-12-09	Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach	Weichao Xu et.al.	2412.06684	null
2024-12-09	Toward LLM-Agent-Based Modeling of Transportation Systems: A Conceptual Framework	Tianming Liu et.al.	2412.06681	null
2024-12-09	I Don’t Know: Explicit Modeling of Uncertainty with an [IDK] Token	Roi Cohen et.al.	2412.06676	null
2024-12-09	ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance	Chunwei Wang et.al.	2412.06673	null
2024-12-09	MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models	Shansong Liu et.al.	2412.06660	link
2024-12-09	Chatbots im Schulunterricht: Wir testen das Fobizz-Tool zur automatischen Bewertung von Hausaufgaben	Rainer Mühlhoff et.al.	2412.06651	null
2024-12-09	The Narrow Gate: Localized Image-Text Communication in Vision-Language Models	Alessandro Serra et.al.	2412.06646	null
2024-12-09	MAVias: Mitigate any Visual Bias	Ioannis Sarridis et.al.	2412.06632	null
2024-12-09	Copyright-Protected Language Generation via Adaptive Model Fusion	Javier Abad et.al.	2412.06619	link
2024-12-06	Birth and Death of a Rose	Chen Geng et.al.	2412.05278	null
2024-12-06	Sparse autoencoders reveal selective remapping of visual concepts during adaptation	Hyesu Lim et.al.	2412.05276	link
2024-12-06	Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	Zhe Chen et.al.	2412.05271	link
2024-12-06	APOLLO: SGD-like Memory, AdamW-level Performance	Hanqing Zhu et.al.	2412.05270	link
2024-12-06	Uncertainty Quantification for Transformer Models for Dark-Pattern Detection	Javier Muñoz et.al.	2412.05251	null
2024-12-06	Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization	Luca Masserano et.al.	2412.05244	null
2024-12-06	CompCap: Improving Multimodal Large Language Models with Composite Captions	Xiaohui Chen et.al.	2412.05243	null
2024-12-06	MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale	Jarvis Guo et.al.	2412.05237	null
2024-12-06	BEExformer: A Fast Inferencing Transformer Architecture via Binarization with Multiple Early Exits	Wazib Ansar et.al.	2412.05225	null
2024-12-06	100% Hallucination Elimination Using Acurai	Michael C. Wood et.al.	2412.05223	link
2024-12-06	Evaluating and Aligning CodeLLMs on Human Preference	Jian Yang et.al.	2412.05210	null
2024-12-06	A Survey of Large Language Model-Based Generative AI for Text-to-SQL: Benchmarks, Applications, Use Cases, and Challenges	Aditi Singh et.al.	2412.05208	null
2024-12-06	Are Frontier Large Language Models Suitable for Q&A in Science Centres?	Jacob Watson et.al.	2412.05200	null
2024-12-06	SurgBox: Agent-Driven Operating Room Sandbox with Surgery Copilot	Jinlin Wu et.al.	2412.05187	link
2024-12-06	LinVT: Empower Your Image-level Large Language Model to Understand Videos	Lishuai Gao et.al.	2412.05185	link
2024-12-06	QueEn: A Large Language Model for Quechua-English Translation	Junhao Chen et.al.	2412.05184	null
2024-12-06	Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models	Kuofeng Gao et.al.	2412.05167	null
2024-12-06	Enhancing Cross-Language Code Translation via Task-Specific Embedding Alignment in Retrieval-Augmented Generation	Manish Bhattarai et.al.	2412.05159	null
2024-12-06	Multimodal Fact-Checking with Vision Language Models: A Probing Classifier based Solution with Embedding Strategies	Recep Firat Cekinel et.al.	2412.05155	link
2024-12-06	A text-to-tabular approach to generate synthetic patient data using LLMs	Margaux Tornqvist et.al.	2412.05153	link
2024-12-05	Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail	Luca Bartolomei et.al.	2412.04472	link
2024-12-05	NVILA: Efficient Frontier Visual Language Models	Zhijian Liu et.al.	2412.04468	null
2024-12-05	VisionZip: Longer is Better but Not Necessary in Vision Language Models	Senqiao Yang et.al.	2412.04467	link
2024-12-05	Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection	Enshen Zhou et.al.	2412.04455	null
2024-12-05	p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay	Jun Zhang et.al.	2412.04449	link
2024-12-05	EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios	Lu Qiu et.al.	2412.04447	null
2024-12-05	DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models	Yizhuo Li et.al.	2412.04446	null
2024-12-05	Moto: Latent Motion Token as the Bridging Language for Robot Manipulation	Yi Chen et.al.	2412.04445	link
2024-12-05	Towards Real-Time Open-Vocabulary Video Instance Segmentation	Bin Yan et.al.	2412.04434	null
2024-12-05	Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation	Yuying Ge et.al.	2412.04432	link
2024-12-05	Grounding Descriptions in Images informs Zero-Shot Visual Recognition	Shaunak Halbe et.al.	2412.04429	link
2024-12-05	Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion	Jiuhai Chen et.al.	2412.04424	link
2024-12-05	Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation	Xuying Li et.al.	2412.04415	null
2024-12-05	Establishing Task Scaling Laws via Compute-Efficient Model Ladders	Akshita Bhagia et.al.	2412.04403	null
2024-12-05	SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding	Rong Li et.al.	2412.04383	null
2024-12-05	Discriminative Fine-tuning of LVLMs	Yassine Ouali et.al.	2412.04378	null
2024-12-05	Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting	Edoardo Cetin et.al.	2412.04368	null
2024-12-05	Approximate Top-$k$ for Increased Parallelism	Oscar Key et.al.	2412.04358	null
2024-12-05	Retrieval-Augmented Machine Translation with Unstructured Knowledge	Jiaan Wang et.al.	2412.04342	link
2024-12-05	Liquid: Language Models are Scalable Multi-modal Generators	Junfeng Wu et.al.	2412.04332	link
2024-12-04	From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents	Xinyi Mou et.al.	2412.03563	link
2024-12-04	FLAIR: VLM with Fine-grained Language-informed Image Representations	Rui Xiao et.al.	2412.03561	link
2024-12-04	Best-of-N Jailbreaking	John Hughes et.al.	2412.03556	link
2024-12-04	PaliGemma 2: A Family of Versatile VLMs for Transfer	Andreas Steiner et.al.	2412.03555	null
2024-12-04	SPICE: Smart Projection Interface for Cooking Enhancement	Vera Prohaska et.al.	2412.03551	link
2024-12-04	Perception Tokens Enhance Visual Reasoning in Multimodal Language Models	Mahtab Bigverdi et.al.	2412.03548	null
2024-12-04	Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models	Natalie Mackraz et.al.	2412.03537	null
2024-12-04	A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences	Gabriel Lino Garcia et.al.	2412.03531	null
2024-12-04	FANAL – Financial Activity News Alerting Language Modeling Framework	Urjitkumar Patel et.al.	2412.03527	null
2024-12-04	You’re (Not) My Type – Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks?	Dominic Lohr et.al.	2412.03516	null
2024-12-04	Distillation of Diffusion Features for Semantic Correspondence	Frank Fundel et.al.	2412.03512	null
2024-12-04	Tight PAC-Bayesian Risk Certificates for Contrastive Learning	Anna van Elst et.al.	2412.03486	link
2024-12-04	Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning	Neale Ratzlaff et.al.	2412.03467	null
2024-12-04	Pre-trained Multiple Latent Variable Generative Models are good defenders against Adversarial Attacks	Dario Serez et.al.	2412.03453	link
2024-12-04	From Words to Workflows: Automating Business Processes	Laura Minkova et.al.	2412.03446	null
2024-12-04	Assessing Foundation Models’ Transferability to Physiological Signals in Precision Medicine	Matthias Christenson et.al.	2412.03427	null
2024-12-04	PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation	Ao Wang et.al.	2412.03409	link
2024-12-04	RedStone: Curating General, Code, Math, and QA Data for Large Language Models	Yaoyao Chang et.al.	2412.03398	null
2024-12-04	Enhancing Supply Chain Visibility with Generative AI: An Exploratory Case Study on Relationship Prediction in Knowledge Graphs	Ge Zheng et.al.	2412.03390	null
2024-12-04	WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis	Chengwei Hu et.al.	2412.03359	null
2024-12-03	T-REG: Preference Optimization with Token-Level Reward Regularization	Wenxuan Zhou et.al.	2412.02685	null
2024-12-03	Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models	Yuda Song et.al.	2412.02674	null
2024-12-03	LLM-Enhanced Path Planning: Safe and Efficient Autonomous Navigation with Instructional Inputs	Pranav Doma et.al.	2412.02655	null
2024-12-03	Time-Reversal Provides Unsupervised Feedback to LLMs	Yerram Varun et.al.	2412.02626	null
2024-12-03	Medical Multimodal Foundation Models in Clinical Diagnosis and Treatment: Applications, Challenges, and Future Directions	Kai Sun et.al.	2412.02621	null
2024-12-03	Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback	Hiroki Furuta et.al.	2412.02617	null
2024-12-03	GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot	Aohan Zeng et.al.	2412.02612	link
2024-12-03	AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?	Kaixiong Gong et.al.	2412.02611	null
2024-12-03	Interpretable Company Similarity with Sparse Autoencoders	Marco Molinari et.al.	2412.02605	null
2024-12-03	CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs	Abhas Kumar et.al.	2412.02602	null
2024-12-03	PrefixLLM: LLM-aided Prefix Circuit Design	Weihua Xiao et.al.	2412.02594	link
2024-12-03	OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation	Junyuan Zhang et.al.	2412.02592	link
2024-12-03	Explainable CTR Prediction via LLM Reasoning	Xiaohan Yu et.al.	2412.02588	null
2024-12-03	Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey	Chenyang Liu et.al.	2412.02573	link
2024-12-03	SJTU: Spatial judgments in multimodal models towards unified segmentation through coordinate detection	Joongwon Chae et.al.	2412.02565	link
2024-12-03	Semantic Tokens in Retrieval Augmented Generation	Joel Suro et.al.	2412.02563	null
2024-12-03	Patent-CR: A Dataset for Patent Claim Revision	Lekang Jiang et.al.	2412.02549	null
2024-12-03	Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks	Jinjin Cai et.al.	2412.02531	null
2024-12-03	LLMForecaster: Improving Seasonal Event Forecasts with Unstructured Textual Data	Hanyu Zhang et.al.	2412.02525	null
2024-12-03	OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations	Caixin Kang et.al.	2412.02479	null
2024-12-02	T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs	Shukang Yin et.al.	2411.19951	link
2024-12-02	Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability	Zicheng Lin et.al.	2411.19943	link
2024-11-29	VLSBench: Unveiling Visual Leakage in Multimodal Safety	Xuhao Hu et.al.	2411.19939	link
2024-11-29	On Domain-Specific Post-Training for Multimodal Large Language Models	Daixuan Cheng et.al.	2411.19930	null
2024-11-29	SIMS: Simulating Human-Scene Interactions with Real World Script Planning	Wenjia Wang et.al.	2411.19921	null
2024-11-29	FlowCLAS: Enhancing Normalizing Flow Via Contrastive Learning For Anomaly Segmentation	Chang Won Lee et.al.	2411.19888	null
2024-11-29	PDDLFuse: A Tool for Generating Diverse Planning Domains	Vedant Khandelwal et.al.	2411.19886	null
2024-12-02	LUMIA: Linear probing for Unimodal and MultiModal Membership Inference Attacks leveraging internal LLM states	Luis Ibanez-Lissen et.al.	2411.19876	null
2024-11-29	DeMo: Decoupled Momentum Optimization	Bowen Peng et.al.	2411.19870	link
2024-11-29	AIDetx: a compression-based method for identification of machine-learning generated text	Leonardo Almeida et.al.	2411.19869	link
2024-11-29	Reverse Thinking Makes LLMs Stronger Reasoners	Justin Chih-Yao Chen et.al.	2411.19865	null
2024-11-29	Cross-Domain Recommendation Meets Large Language Models	Ajay Krishna Vajjala et.al.	2411.19862	link
2024-11-29	What fifty-one years of Linguistics and Artificial Intelligence research tell us about their correlation: A scientometric review	Mohammed Q. Shormani et.al.	2411.19858	null
2024-11-29	Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation	Dimosthenis Antypas et.al.	2411.19832	null
2024-11-29	Advanced System Integration: Analyzing OpenAPI Chunking for Retrieval-Augmented Generation	Robin D. Pesl et.al.	2411.19804	null
2024-11-29	INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge	Angelika Romanou et.al.	2411.19799	null
2024-11-29	MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks	Yiming Wu et.al.	2411.19786	null
2024-11-29	PerLA: Perceptive 3D Language Assistant	Guofeng Mei et.al.	2411.19774	null
2024-11-29	LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos	Tiantian Geng et.al.	2411.19772	link
2024-11-29	Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models	Kaican Li et.al.	2411.19757	link
2024-11-27	Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation	Yueru Jia et.al.	2411.18623	null
2024-11-27	Cross-modal Information Flow in Multimodal Large Language Models	Zhi Zhang et.al.	2411.18620	link
2024-11-27	Diffusion Self-Distillation for Zero-Shot Customized Image Generation	Shengqu Cai et.al.	2411.18616	null
2024-11-27	Automated Literature Review Using NLP Techniques and LLM-Based Retrieval-Augmented Generation	Nurshat Fateh Ali et.al.	2411.18583	null
2024-11-27	Challenges in Adapting Multilingual LLMs to Low-Resource Languages using LoRA PEFT Tuning	Omkar Khade et.al.	2411.18571	null
2024-11-27	A Pipeline of Neural-Symbolic Integration to Enhance Spatial Reasoning in Large Language Models	Rong Wang et.al.	2411.18564	null
2024-11-27	DexDiffuser: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation	Zhixuan Liang et.al.	2411.18562	null
2024-11-27	Retrofitting (Large) Language Models with Dynamic Tokenization	Darius Feher et.al.	2411.18553	null
2024-11-27	AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans	Dillon Loh et.al.	2411.18539	link
2024-11-27	Emergence of Self-Identity in AI: A Mathematical Framework and Empirical Study with Generative Large Language Models	Minhyeok Lee et.al.	2411.18530	link
2024-11-27	LLM-ABBA: Understand time series via symbolic approximation	Erin Carson et.al.	2411.18506	null
2024-11-27	GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation	Pengfei Zhou et.al.	2411.18499	null
2024-11-27	Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS	Jinyang Wu et.al.	2411.18478	null
2024-11-27	Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding	Ziyin Zhang et.al.	2411.18462	link
2024-11-27	Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator	Frederic Kirstein et.al.	2411.18444	null
2024-11-27	An AI-Assisted Multi-Agent Dual Dialogue System to Support Mental Health Care Providers	Onno P. Kampman et.al.	2411.18429	null
2024-11-27	FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving	Ao Shen et.al.	2411.18424	null
2024-11-27	Politicians vs ChatGPT. A study of presuppositions in French and Italian political communication	Davide Garassino et.al.	2411.18403	null
2024-11-27	Topic Modeling and Sentiment Analysis on Japanese Online Media’s Coverage of Nuclear Energy	Yifan Sun et.al.	2411.18383	null
2024-11-27	ChatGPT as speechwriter for the French presidents	Dominique Labbé et.al.	2411.18382	null
2024-11-26	Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats	Jiaxin Wen et.al.	2411.17693	null
2024-11-26	Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens	Xu Ouyang et.al.	2411.17691	null
2024-11-26	Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration	Yuhang Han et.al.	2411.17686	null
2024-11-26	Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning	Zhu Xu et.al.	2411.17679	link
2024-11-26	Instance-Aware Graph Prompt Learning	Jiazheng Li et.al.	2411.17676	null
2024-11-26	Push the Limit of Multi-modal Emotion Recognition by Prompting LLMs with Receptive-Field-Aware Attention Weighting	Liyun Zhang et.al.	2411.17674	null
2024-11-26	SketchAgent: Language-Driven Sequential Sketch Generation	Yael Vinker et.al.	2411.17673	null
2024-11-26	Synthetic Data Generation with LLM for Improved Depression Prediction	Andrea Kang et.al.	2411.17672	null
2024-11-26	How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations	Hyunji Lee et.al.	2411.17666	null
2024-11-26	Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism	Yi-Chien Lin et.al.	2411.17651	link
2024-11-26	On Limitations of LLM as Annotator for Low Resource Languages	Suramya Jadhav et.al.	2411.17637	null
2024-11-26	MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation	Harsh Singh et.al.	2411.17636	null
2024-11-26	Data-driven development of cycle prediction models for lithium metal batteries using multi modal mining	Jaewoong Lee et.al.	2411.17625	null
2024-11-26	Scaling Speech-Text Pre-training with Synthetic Interleaved Data	Aohan Zeng et.al.	2411.17607	null
2024-11-26	HyperSeg: Towards Universal Visual Segmentation with Large Language Model	Cong Wei et.al.	2411.17606	link
2024-11-26	Making History Readable	Bipasha Banerjee et.al.	2411.17600	null
2024-11-26	Agentic AI for Improving Precision in Identifying Contributions to Sustainable Development Goals	William A. Ingram et.al.	2411.17598	null
2024-11-26	Can artificial intelligence predict clinical trial outcomes?	Shuyi Jin et.al.	2411.17595	null
2024-11-26	RTL-Breaker: Assessing the Security of LLMs against Backdoor Attacks on HDL Code Generation	Lakshmi Likhitha Mankali et.al.	2411.17569	null
2024-11-26	Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey	Jiayi Kuang et.al.	2411.17558	null
2024-11-25	Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?	Sohee Yang et.al.	2411.16679	null
2024-11-25	Diffusion Features for Zero-Shot 6DoF Object Pose Estimation	Bernd Von Gimborn et.al.	2411.16668	null
2024-11-25	DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation	Zun Wang et.al.	2411.16657	null
2024-11-25	Self-Generated Critiques Boost Reward Modeling for Language Models	Yue Yu et.al.	2411.16646	null
2024-11-25	Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective	Jean Marie Tshimula et.al.	2411.16642	null
2024-11-25	StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training	Kaustubh Ponkshe et.al.	2411.16618	null
2024-11-25	Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models	Ronghuan Wu et.al.	2411.16602	null
2024-11-25	From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge	Dawei Li et.al.	2411.16594	link
2024-11-25	Large Language Model-based Decision-making for COLREGs and the Control of Autonomous Surface Vehicles	Klinsmann Agyei et.al.	2411.16587	link
2024-11-25	MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series	Aaron Wheeler et.al.	2411.16585	link
2024-11-25	Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision	Zhiheng Xi et.al.	2411.16579	null
2024-11-25	Predictive Power of LLMs in Financial Markets	Jerick Shi et.al.	2411.16569	null
2024-11-25	EnStack: An Ensemble Stacking Framework of Large Language Models for Enhanced Vulnerability Detection in Source Code	Shahriyar Zaman Ridoy et.al.	2411.16561	null
2024-11-25	Generating Out-Of-Distribution Scenarios Using Language Models	Erfan Aasi et.al.	2411.16554	null
2024-11-25	Representation Collapsing Problems in Vector Quantization	Wenhao Zhao et.al.	2411.16550	null
2024-11-25	RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics	Chan Hee Song et.al.	2411.16537	null
2024-11-25	Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings	Carolin M. Schuster et.al.	2411.16527	link
2024-11-25	Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency	Jerry Yao-Chieh Hu et.al.	2411.16525	null
2024-11-25	LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation	Steven Song et.al.	2411.16523	link
2024-11-25	Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis	Boming Miao et.al.	2411.16503	null
2024-11-22	Measuring Bullshit in the Language Games played by ChatGPT	Alessandro Trevisan et.al.	2411.15129	null
2024-11-22	Health AI Developer Foundations	Atilla P. Kiraly et.al.	2411.15128	null
2024-11-22	TÜLU 3: Pushing Frontiers in Open Language Model Post-Training	Nathan Lambert et.al.	2411.15124	link
2024-11-22	RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts	Hjalmar Wijk et.al.	2411.15114	link
2024-11-22	Efficient Pruning of Text-to-Image Models: Insights from Pruning Stable Diffusion	Samarth N Ramesh et.al.	2411.15113	null
2024-11-22	AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution	Fengyuan Liu et.al.	2411.15102	link
2024-11-22	What You See is Not What You Get: Neural Partial Differential Equations and The Illusion of Learning	Arvind Mohan et.al.	2411.15101	null
2024-11-22	XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models	Yixin Dong et.al.	2411.15100	null
2024-11-22	Context-Aware Multimodal Pretraining	Karsten Roth et.al.	2411.15099	null
2024-11-22	mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA	Tao Zhang et.al.	2411.15041	null
2024-11-22	One to rule them all: natural language to bind communication, perception and action	Simone Colombani et.al.	2411.15033	null
2024-11-22	Time is on my sight: scene graph filtering for dynamic environment perception in an LLM-driven robot	Simone Colombani et.al.	2411.15027	null
2024-11-22	DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models	Keda Tao et.al.	2411.15024	link
2024-11-22	FTA generation using GenAI with an Autonomy sensor Usecase	Sneha Sudhir Shetiya et.al.	2411.15007	null
2024-11-22	ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data	Junhong Shen et.al.	2411.15004	link
2024-11-22	Generative AI may backfire for counterspeech	Dominik Bär et.al.	2411.14986	null
2024-11-22	Exploring Foundation Models Fine-Tuning for Cytology Classification	Manon Dausort et.al.	2411.14975	link
2024-11-22	Open-Amp: Synthetic Data Framework for Audio Effect Foundation Models	Alec Wright et.al.	2411.14972	link
2024-11-22	SwissADT: An Audio Description Translation System for Swiss Languages	Lukas Fischer et.al.	2411.14967	null
2024-11-22	LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement	Jieming Bian et.al.	2411.14961	null
2024-11-21	Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models	Yuhao Dong et.al.	2411.14432	link
2024-11-21	Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation	Zhuoman Liu et.al.	2411.14423	null
2024-11-21	From RNNs to Foundation Models: An Empirical Study on Commercial Building Energy Consumption	Shourya Bose et.al.	2411.14421	null
2024-11-21	Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding	Yiming Zhang et.al.	2411.14401	null
2024-11-21	Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings	Aaron Zheng et.al.	2411.14398	null
2024-11-21	UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages	Bethel Melesse Tessema et.al.	2411.14343	link
2024-11-21	SplatR: Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching	Arjun P S et.al.	2411.14322	link
2024-11-21	Velocitune: A Velocity-based Dynamic Domain Reweighting Method for Continual Pre-training	Zheheng Luo et.al.	2411.14318	null
2024-11-21	Automated Generation of Code Debugging Exercises	Victor-Alexandru Pădurean et.al.	2411.14303	null
2024-11-21	Auto-SPICE: Leveraging LLMs for Dataset Creation via Automated SPICE Netlist Extraction from Analog Circuit Diagrams	Jitendra Bhandari et.al.	2411.14299	link
2024-11-21	EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild	Yumeng Liu et.al.	2411.14280	link
2024-11-21	Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance	Haozhe Zhao et.al.	2411.14279	null
2024-11-21	Efficient Aspect-Based Summarization of Climate Change Reports with Small Language Models	Iacopo Ghinassi et.al.	2411.14272	link
2024-11-21	Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective	Ernests Lavrinovics et.al.	2411.14258	null
2024-11-21	Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models	Javier Ferrando et.al.	2411.14257	null
2024-11-21	Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs	Zeyu Dong et.al.	2411.14256	null
2024-11-21	Intent-Aware Dialogue Generation and Multi-Task Contrastive Learning for Multi-Turn Intent Classification	Junhua Liu et.al.	2411.14252	null
2024-11-21	Natural Language Reinforcement Learning	Xidong Feng et.al.	2411.14251	link
2024-11-21	FocusLLaVA: A Coarse-to-Fine Approach for Efficient and Effective Visual Token Compression	Yuke Zhu et.al.	2411.14228	null
2024-11-21	Towards Context-Rich Automated Biodiversity Assessments: Deriving AI-Powered Insights from Camera Trap Data	Paul Fergus et.al.	2411.14219	null
2024-11-20	Find Any Part in 3D	Ziqi Ma et.al.	2411.13550	null
2024-11-20	SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs	Shirley Kokane et.al.	2411.13547	null
2024-11-20	Promoting User Data Autonomy During the Dissolution of a Monopolistic Firm	Rushabh Solanki et.al.	2411.13546	null
2024-11-20	BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games	Davide Paglieri et.al.	2411.13543	null
2024-11-20	Metacognition for Unknown Situations and Environments (MUSE)	Rodolfo Valiente et.al.	2411.13537	null
2024-11-20	Predictive Insights into LGBTQ+ Minority Stress: A Transductive Exploration of Social Media Discourse	S. Chapagain et.al.	2411.13534	link
2024-11-20	Advancing Complex Medical Communication in Arabic with Sporo AraSum: Surpassing Existing Large Language Models	Chanseo Lee et.al.	2411.13518	null
2024-11-20	Disentangling Memory and Reasoning Ability in Large Language Models	Mingyu Jin et.al.	2411.13504	link
2024-11-20	Neural machine translation of seismic waves for petrophysical inversion	José Cunha Teixeira et.al.	2411.13491	null
2024-11-20	Utilizing Large Language Models to Synthesize Product Desirability Datasets	John D. Hastings et.al.	2411.13485	null
2024-11-20	PatentEdits: Framing Patent Novelty as Textual Entailment	Ryan Lee et.al.	2411.13477	null
2024-11-20	When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training	Haonan Wang et.al.	2411.13476	link
2024-11-20	SoK: A Systems Perspective on Compound AI Threats and Countermeasures	Sarbartha Banerjee et.al.	2411.13459	null
2024-11-20	LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models	Salvatore Mario Carta et.al.	2411.13453	null
2024-11-20	AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations	Gaurav Verma et.al.	2411.13451	null
2024-11-20	WaterPark: A Robustness Assessment of Language Model Watermarking	Jiacheng Liang et.al.	2411.13425	link
2024-11-20	Unleashing the Power of Large Language Models for Group POI Recommendations	Jing Long et.al.	2411.13415	null
2024-11-20	A Survey On Enhancing Reinforcement Learning in Complex Environments: Insights from Human and LLM Feedback	Alireza Rashidi Laleh et.al.	2411.13410	null
2024-11-20	Unification of Balti and trans-border sister dialects in the essence of LLMs and AI Technology	Muhammad Sharif et.al.	2411.13409	null
2024-11-20	Transformer-Based Contextualized Language Models Joint with Neural Networks for Natural Language Inference in Vietnamese	Dat Van-Thanh Nguyen et.al.	2411.13407	null
2024-11-19	ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models	Salma Kharrat et.al.	2411.12736	link
2024-11-19	Information Theory of Meaningful Communication	Doron Sivan et.al.	2411.12728	link
2024-11-19	CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs	Zhehan Kan et.al.	2411.12713	null
2024-11-19	Enhancing Multi-Class Disease Classification: Neoplasms, Cardiovascular, Nervous System, and Digestive Disorders Using Advanced LLMs	Ahmed Akib Jawad Karim et.al.	2411.12712	null
2024-11-19	Strengthening Fake News Detection: Leveraging SVM and Sophisticated Text Vectorization Techniques. Defying BERT?	Ahmed Akib Jawad Karim et.al.	2411.12703	null
2024-11-19	When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations	Huaizhi Ge et.al.	2411.12701	null
2024-11-19	SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference	Jiho Shin et.al.	2411.12692	null
2024-11-19	Neurosymbolic Graph Enrichment for Grounded World Models	Stefano De Giorgis et.al.	2411.12671	null
2024-11-19	DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models	Vinay Kumar Sankarapu et.al.	2411.12643	link
2024-11-19	Improving Controllability and Editability for Pretrained Text-to-Music Generation Models	Yixiao Zhang et.al.	2411.12641	null
2024-11-19	Provable unlearning in topic modeling and downstream tasks	Stanley Wei et.al.	2411.12600	null
2024-11-19	AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction	Yuanbin Man et.al.	2411.12593	null
2024-11-19	Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models	Laura Ruis et.al.	2411.12580	link
2024-11-19	Large Language Models for Combinatorial Optimization of Design Structure Matrix	Shuo Jiang et.al.	2411.12571	null
2024-11-19	Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues	Riccardo Grazzi et.al.	2411.12537	link
2024-11-19	Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution	Yang Zou et.al.	2411.12530	link
2024-11-19	Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus	Terufumi Morishita et.al.	2411.12498	link
2024-11-19	AI Flow at the Network Edge	Jiawei Shao et.al.	2411.12469	null
2024-11-19	Guide-to-Explain for Controllable Summarization	Sangwon Ryu et.al.	2411.12460	null
2024-11-19	Neon: News Entity-Interaction Extraction for Enhanced Question Answering	Sneha Singhania et.al.	2411.12449	null
2024-11-18	Bi-Mamba: Towards Accurate 1-Bit State Space Models	Shengkun Tang et.al.	2411.11843	null
2024-11-18	Tackling prediction tasks in relational databases with LLMs	Marek Wydmuch et.al.	2411.11829	null
2024-11-18	Exploring adversarial robustness of JPEG AI: methodology, comparison and new methods	Egor Kovalev et.al.	2411.11795	null
2024-11-18	LLM-IE: A Python Package for Generative Information Extraction with Large Language Models	Enshuo Hsu et.al.	2411.11779	null
2024-11-18	sMoRe: Enhancing Object Manipulation and Organization in Mixed Reality Spaces with LLMs and Generative AI	Yunhao Xing et.al.	2411.11752	null
2024-11-18	BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration	Yuzong Chen et.al.	2411.11745	link
2024-11-18	Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment	Allison Huang et.al.	2411.11731	link
2024-11-18	Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation	Mingchao Qi et.al.	2411.11714	link
2024-11-18	FedCoLLM: A Parameter-Efficient Federated Co-tuning Framework for Large and Small Language Models	Tao Fan et.al.	2411.11707	null
2024-11-18	MC-LLaVA: Multi-Concept Personalized Vision-Language Model	Ruichuan An et.al.	2411.11706	link
2024-11-18	Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search	Jinhao Jiang et.al.	2411.11694	null
2024-11-18	TrojanRobot: Backdoor Attacks Against Robotic Manipulation in the Physical World	Xianlong Wang et.al.	2411.11683	null
2024-11-18	PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment	Jiawei Li et.al.	2411.11681	link
2024-11-18	Dissecting Misalignment of Multimodal Large Language Models via Influence Function	Lijie Hu et.al.	2411.11667	null
2024-11-18	TSINR: Capturing Temporal Continuity via Implicit Neural Representations for Time Series Anomaly Detection	Mengxuan Li et.al.	2411.11641	link
2024-11-18	Chapter 7 Review of Data-Driven Generative AI Models for Knowledge Extraction from Scientific Literature in Healthcare	Leon Kopitar et.al.	2411.11635	null
2024-11-18	Signaling and Social Learning in Swarms of Robots	Leo Cazenille et.al.	2411.11616	null
2024-11-18	Leveraging Computational Pathology AI for Noninvasive Optical Imaging Analysis Without Retraining	Danny Barash et.al.	2411.11613	null
2024-11-18	VLN-Game: Vision-Language Equilibrium Search for Zero-Shot Semantic Navigation	Bangguo Yu et.al.	2411.11609	null
2024-11-18	Exploring LLMs for Verifying Technical System Specifications Against Requirements	Lasse M. Reinpold et.al.	2411.11582	null
2024-11-15	VeriGraph: Scene Graphs for Execution Verifiable Robot Planning	Daniel Ekpo et.al.	2411.10446	null
2024-11-15	Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization	Weiyun Wang et.al.	2411.10442	null
2024-11-15	LLaVA-o1: Let Vision Language Models Reason Step-by-Step	Guowei Xu et.al.	2411.10440	link
2024-11-15	MARS: Unleashing the Power of Variance Reduction for Training Large Models	Huizhuo Yuan et.al.	2411.10438	link
2024-11-15	Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization	Yuhan Fu et.al.	2411.10436	null
2024-11-15	Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent Balderdash	Parsa Hejabi et.al.	2411.10422	link
2024-11-15	On the Foundation Model for Cardiac MRI Reconstruction	Chi Zhang et.al.	2411.10403	null
2024-11-15	Interactive Cycle Model – The Linkage Combination among Automatic Speech Recognition, Large Language Models and Smart Glasses	Libo Wang et.al.	2411.10362	link
2024-11-15	Bias Unveiled: Investigating Social Bias in LLM-Generated Code	Lin Ling et.al.	2411.10351	null
2024-11-15	Y-MAP-Net: Real-time depth, normals, segmentation, multi-label captioning and 2D human pose in RGB images	Ammar Qammaz et.al.	2411.10334	null
2024-11-15	Number it: Temporal Grounding Videos like Flipping Manga	Yongliang Wu et.al.	2411.10332	link
2024-11-15	Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting	Ziqi Xie et.al.	2411.10309	link
2024-11-15	Static network structure cannot stabilize cooperation among Large Language Model agents	Jin Han et.al.	2411.10294	null
2024-11-15	Scaling Law for Post-training after Model Pruning	Xiaodong Chen et.al.	2411.10272	null
2024-11-15	Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning	Jingru Yang et.al.	2411.10252	null
2024-11-15	Measuring Non-Adversarial Reproduction of Training Data in Large Language Models	Michael Aerni et.al.	2411.10242	null
2024-11-15	Generative AI in Multimodal User Interfaces: Trends, Challenges, and Cross-Platform Adaptability	J. Bieniek et.al.	2411.10234	null
2024-11-15	An Empirical Study on LLM-based Agents for Automated Bug Fixing	Xiangxin Meng et.al.	2411.10213	null
2024-11-15	Agentic LLMs in the Supply Chain: Towards Autonomous Multi-Agent Consensus-Seeking	Valeria Jannelli et.al.	2411.10184	null
2024-11-15	CART: Compositional Auto-Regressive Transformer for Image Generation	Siddharth Roheda et.al.	2411.10180	null
2024-11-14	MagicQuill: An Intelligent Interactive Image Editing System	Zichen Liu et.al.	2411.09703	link
2024-11-14	Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models	Wei Wang et.al.	2411.09691	null
2024-11-14	Squeezed Attention: Accelerating Long Context Length LLM Inference	Coleman Hooper et.al.	2411.09688	link
2024-11-14	Adaptive Decoding via Latent Preference Optimization	Shehzaad Dhuliawala et.al.	2411.09661	null
2024-11-14	On the Limits of Language Generation: Trade-Offs Between Hallucination and Mode Collapse	Alkis Kalavasis et.al.	2411.09642	null
2024-11-14	Local deployment of large-scale music AI models on commodity hardware	Xun Zhou et.al.	2411.09625	null
2024-11-14	PTR: Precision-Driven Tool Recommendation for Large Language Models	Hang Gao et.al.	2411.09613	null
2024-11-14	The Moral Foundations Weibo Corpus	Renjie Cao et.al.	2411.09612	null
2024-11-14	Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework	Ronak Pradeep et.al.	2411.09607	null
2024-11-14	Accelerating Knowledge Graph and Ontology Engineering with Large Language Models	Cogan Shimizu et.al.	2411.09601	null
2024-11-14	Assessing the Performance of the DINOv2 Self-supervised Learning Vision Transformer Model for the Segmentation of the Left Atrium from MRI Images	Bipasha Kundu et.al.	2411.09598	null
2024-11-14	LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models	Zhengyi Wang et.al.	2411.09595	null
2024-11-14	Adopting RAG for LLM-Aided Future Vehicle Design	Vahid Zolfaghari et.al.	2411.09590	null
2024-11-14	BabyLM Challenge: Exploring the Effect of Variation Sets on Language Model Training Efficiency	Akari Haga et.al.	2411.09587	null
2024-11-14	Software Performance Engineering for Foundation Model-Powered Software (FMware)	Haoxiang Zhang et.al.	2411.09580	null
2024-11-14	Piecing It All Together: Verifying Multi-Hop Multimodal Claims	Haoran Wang et.al.	2411.09547	null
2024-11-14	A Practical Guide to Fine-tuning Language Models with Limited Data	Márton Szép et.al.	2411.09539	null
2024-11-14	Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents	Yuyou Gan et.al.	2411.09523	null
2024-11-14	Communication Compression for Tensor Parallel LLM Inference	Jan Hansen-Palmus et.al.	2411.09510	null
2024-11-14	Spider: Any-to-Many Multimodal LLM	Jinxiang Lai et.al.	2411.09439	link
2024-11-13	Large Wireless Model (LWM): A Foundation Model for Wireless Channels	Sadjad Alikhani et.al.	2411.08872	link
2024-11-13	The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models	Daniel P. Jeong et.al.	2411.08870	link
2024-11-13	CamemBERT 2.0: A Smarter French Language Model Aged to Perfection	Wissam Antoun et.al.	2411.08868	null
2024-11-13	LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs	Piyush Jha et.al.	2411.08862	null
2024-11-13	Multimodal Instruction Tuning with Hybrid State Space Models	Jianing Zhou et.al.	2411.08840	null
2024-11-13	FinRobot: AI Agent for Equity Research and Valuation with Large Language Models	Tianyu Zhou et.al.	2411.08804	link
2024-11-13	Evaluating World Models with LLM for Decision Making	Chang Yang et.al.	2411.08794	null
2024-11-13	Can sparse autoencoders be used to decompose and interpret steering vectors?	Harry Mayne et.al.	2411.08790	link
2024-11-13	Sharingan: Extract User Action Sequence from Desktop Recordings	Yanting Chen et.al.	2411.08768	null
2024-11-13	Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers	Clément Dumas et.al.	2411.08745	link
2024-11-13	A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models	Dingdong Wang et.al.	2411.08742	null
2024-11-13	Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models	Somanshu Singla et.al.	2411.08733	link
2024-11-13	Polymetis: Large Language Modeling for Multiple Material Domains	Chao Huang et.al.	2411.08728	null
2024-11-13	Voxeland: Probabilistic Instance-Aware Semantic Mapping with Evidence-based Uncertainty Quantification	Jose-Luis Matez-Bandera et.al.	2411.08727	link
2024-11-13	Theoretical Analysis of Byte-Pair Encoding	László Kozma et.al.	2411.08671	null
2024-11-13	OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Geometric and Semantic Guidances	Youqi Liao et.al.	2411.08665	link
2024-11-13	UniMat: Unifying Materials Embeddings through Multi-modal Learning	Janghoon Ock et.al.	2411.08664	null
2024-11-13	Accelerating Quasi-Static Time Series Simulations with Foundation Models	Alban Puech et.al.	2411.08652	null
2024-11-13	A System Level Performance Evaluation for Superconducting Digital Systems	Joyjit Kundu et.al.	2411.08645	null
2024-11-13	Towards Secure Intelligent O-RAN Architecture: Vulnerabilities, Threats and Promising Technical Solutions using LLMs	Mojdeh Karbalaee Motalleb et.al.	2411.08640	null
2024-11-12	Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data	Juanhui Li et.al.	2411.08028	null
2024-11-12	LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models	Anoop Cherian et.al.	2411.08027	null
2024-11-12	Language Models as Causal Effect Generators	Lucius E. J. Bynum et.al.	2411.08019	link
2024-11-12	ExpressivityArena: Can LLMs Express Information Implicitly?	Joshua Tint et.al.	2411.08010	null
2024-11-12	Can adversarial attacks by large language models be attributed?	Manuel Cebrian et.al.	2411.08003	null
2024-11-12	Derivational Morphology Reveals Analogical Generalization in Large Language Models	Valentin Hofmann et.al.	2411.07990	null
2024-11-12	JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation	Yiyang Ma et.al.	2411.07975	link
2024-11-12	From General to Specific: Utilizing General Hallucination to Automatically Measure the Role Relationship Fidelity for Specific Role-Play Agents	Chuyi Kong et.al.	2411.07965	null
2024-11-12	Towards Low-bit Communication for Tensor Parallel LLM Inference	Harry Dong et.al.	2411.07942	null
2024-11-12	Leveraging Multimodal Models for Enhanced Neuroimaging Diagnostics in Alzheimer’s Disease	Francesco Chiumento et.al.	2411.07871	null
2024-11-12	Trustful LLMs: Customizing and Grounding Text Generation with Knowledge Bases and Dual Decoders	Xiaofeng Zhu et.al.	2411.07870	null
2024-11-12	Verbosity $\neq$ Veracity: Demystify Verbosity Compensation Behavior of Large Language Models	Yusen Zhang et.al.	2411.07858	link
2024-11-12	Tucano: Advancing Neural Text Generation for Portuguese	Nicholas Kluge Corrêa et.al.	2411.07854	link
2024-11-12	NL-SLAM for OC-VLN: Natural Language Grounded SLAM for Object-Centric VLN	Sonia Raychaudhuri et.al.	2411.07848	null
2024-11-12	Chain Association-based Attacking and Shielding Natural Language Processing Systems	Jiacheng Huang et.al.	2411.07843	null
2024-11-12	FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training	Philip Zmushko et.al.	2411.07837	link
2024-11-12	Efficient Federated Finetuning of Tiny Transformers with Resource-Constrained Devices	Kilian Pfeiffer et.al.	2411.07826	null
2024-11-12	Query Optimization for Parametric Knowledge Refinement in Retrieval-Augmented Large Language Models	Youan Cong et.al.	2411.07820	null
2024-11-12	Federated Low-Rank Adaptation with Differential Privacy over Wireless Networks	Tianqu Kang et.al.	2411.07806	null
2024-11-12	Likelihood as a Performance Gauge for Retrieval-Augmented Generation	Tianyu Liu et.al.	2411.07773	link
2024-11-11	UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts	Bo Yang et.al.	2411.07240	link
2024-11-11	OpenThaiGPT 1.5: A Thai-Centric Open Source Large Language Model	Sumeth Yuenyong et.al.	2411.07238	null
2024-11-11	Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations	Chaitanya Malaviya et.al.	2411.07237	null
2024-11-11	Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving	Botao Yu et.al.	2411.07228	null
2024-11-11	TempCharBERT: Keystroke Dynamics for Continuous Access Control Based on Pre-trained Language Models	Matheus Simão et.al.	2411.07224	null
2024-11-11	Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks	Madeline Brumley et.al.	2411.07213	null
2024-11-11	General Geospatial Inference with a Population Dynamics Foundation Model	Mohit Agarwal et.al.	2411.07207	link
2024-11-11	DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID	Nyle Siddiqui et.al.	2411.07205	link
2024-11-11	The Super Weight in Large Language Models	Mengxia Yu et.al.	2411.07191	link
2024-11-11	NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics	David Robinson et.al.	2411.07186	null
2024-11-11	SAMPart3D: Segment Any Part in 3D Objects	Yunhan Yang et.al.	2411.07184	link
2024-11-11	Counterfactual Generation from Language Models	Shauli Ravfogel et.al.	2411.07180	link
2024-11-11	More Expressive Attention with Negative Weights	Ang Lv et.al.	2411.07176	link
2024-11-11	Continual Memorization of Factoids in Large Language Models	Howard Chen et.al.	2411.07175	link
2024-11-11	A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19	Vedant Khandelwal et.al.	2411.07163	null
2024-11-11	Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models	Yancheng He et.al.	2411.07140	null
2024-11-11	Stronger Models are NOT Stronger Teachers for Instruction Tuning	Zhangchen Xu et.al.	2411.07133	null
2024-11-11	Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis	Taihang Hu et.al.	2411.07132	link
2024-11-11	Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation	Kaijian Zou et.al.	2411.07130	link
2024-11-11	Benchmarking LLMs’ Judgments with No Gold Standard	Shengwei Xu et.al.	2411.07127	link
2024-11-08	Recycled Attention: Efficient inference for long-context language models	Fangyuan Xu et.al.	2411.05787	link
2024-11-08	Using Language Models to Disambiguate Lexical Choices in Translation	Josh Barua et.al.	2411.05781	link
2024-11-08	Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths?	Veronica Chatrath et.al.	2411.05775	null
2024-11-08	Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024	Christopher Malon et.al.	2411.05762	null
2024-11-08	End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering	Dylan Goetting et.al.	2411.05755	link
2024-11-08	Aioli: A Unified Optimization Framework for Language Model Data Mixing	Mayee F. Chen et.al.	2411.05735	link
2024-11-08	Poze: Sports Technique Feedback under Data Constraints	Agamdeep Singh et.al.	2411.05734	null
2024-11-08	STARS: Sensor-agnostic Transformer Architecture for Remote Sensing	Ethan King et.al.	2411.05714	null
2024-11-08	Unmasking the Limits of Large Language Models: A Systematic Evaluation of Masked Text Processing Ability through MskQA and MskCal	Fuka Matsuzaki et.al.	2411.05665	link
2024-11-08	The influence of persona and conversational task on social interactions with a LLM-controlled embodied conversational agent	Leon O. H. Kroczek et.al.	2411.05653	null
2024-11-08	LightVA: Lightweight Visual Analytics with LLM Agent-Based Task Planning and Execution	Yuheng Zhao et.al.	2411.05651	null
2024-11-08	Harnessing High-Level Song Descriptors towards Natural Language-Based Music Recommendation	Elena V. Epure et.al.	2411.05649	link
2024-11-08	Evaluating Large Language Model Capability in Vietnamese Fact-Checking Data Generation	Long Truong To et.al.	2411.05641	null
2024-11-08	Assessing Open-Source Large Language Models on Argumentation Mining Subtasks	Mohammad Yeghaneh Abkenar et.al.	2411.05639	null
2024-11-08	A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis	Cristiano Patrício et.al.	2411.05609	link
2024-11-08	Evaluating and Adapting Large Language Models to Represent Folktales in Low-Resource Languages	JA Meaney et.al.	2411.05593	null
2024-11-08	Open-set object detection: towards unified problem formulation and benchmarking	Hejer Ammar et.al.	2411.05564	null
2024-11-08	Training objective drives the consistency of representational similarity across datasets	Laure Ciernik et.al.	2411.05561	link
2024-11-08	AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality	Ilias Bournias et.al.	2411.05555	null
2024-11-08	Assessing the Answerability of Queries in Retrieval-Augmented Code Generation	Geonmin Kim et.al.	2411.05547	null
2024-11-07	SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models	Muyang Li et.al.	2411.05007	link
2024-11-07	Analyzing The Language of Visual Tokens	David M. Chan et.al.	2411.05001	null
2024-11-07	Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?	Jonathan Roberts et.al.	2411.05000	null
2024-11-07	DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation	Peiqi Liu et.al.	2411.04999	link
2024-11-07	LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation	Weiquan Huang et.al.	2411.04997	link
2024-11-07	Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models	Weixin Liang et.al.	2411.04996	null
2024-11-07	Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives	Hao Sun et.al.	2411.04991	link
2024-11-07	The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities	Zhaofeng Wu et.al.	2411.04986	link
2024-11-07	Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries	Dylan Manuel et.al.	2411.04981	null
2024-11-07	SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference	Gabriele Oliaro et.al.	2411.04975	link
2024-11-07	BitNet a4.8: 4-bit Activations for 1-bit LLMs	Hongyu Wang et.al.	2411.04965	null
2024-11-07	Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability	Yanjun Gao et.al.	2411.04962	null
2024-11-07	CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM	Jingwei Xu et.al.	2411.04954	null
2024-11-07	M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding	Jaemin Cho et.al.	2411.04952	null
2024-11-07	A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model	Panwen Hu et.al.	2411.04942	null
2024-11-07	VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos	Shehan Munasinghe et.al.	2411.04923	null
2024-11-07	GPTKB: Building Very Large Knowledge Bases from Language Models	Yujia Hu et.al.	2411.04920	link
2024-11-07	OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models	Siming Huang et.al.	2411.04905	null
2024-11-07	In the Era of Prompt Learning with Vision-Language Models	Ankit Jha et.al.	2411.04892	null
2024-11-07	GUI Agents with Foundation Models: A Comprehensive Survey	Shuai Wang et.al.	2411.04890	null
2024-11-06	Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?	Daniel P. Jeong et.al.	2411.04118	link
2024-11-06	How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis	Guan Zhe Hong et.al.	2411.04105	null
2024-11-06	RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models	Maya Varma et.al.	2411.04097	link
2024-11-06	Textual Decomposition Then Sub-motion-space Scattering for Open-Vocabulary Motion Generation	Ke Fan et.al.	2411.04079	null
2024-11-06	H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models	Nhi Pham et.al.	2411.04077	null
2024-11-06	M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models	Chuhan Li et.al.	2411.04075	null
2024-11-06	Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning	Ping Li et.al.	2411.04059	link
2024-11-06	Beemo: Benchmark of Expert-edited Machine-generated Outputs	Ekaterina Artemova et.al.	2411.04032	link
2024-11-06	Prompt Engineering Using GPT for Word-Level Code-Mixed Language Identification in Low-Resource Dravidian Languages	Aniket Deroy et.al.	2411.04025	null
2024-11-06	Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval	Davide Buoso et.al.	2411.04006	null
2024-11-06	Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning	Jiawei Yao et.al.	2411.03978	link
2024-11-06	What Really is Commonsense Knowledge?	Quyet V. Do et.al.	2411.03964	null
2024-11-06	How Does A Text Preprocessing Pipeline Affect Ontology Syntactic Matching?	Zhangcheng Qiang et.al.	2411.03962	link
2024-11-06	Face Reconstruction from Face Embeddings using Adapter to a Face Foundation Model	Hatef Otroshi Shahreza et.al.	2411.03960	null
2024-11-06	Fine-Grained Guidance for Retrievers: Leveraging LLMs’ Feedback in Retrieval-Augmented Generation	Yuhang Liu et.al.	2411.03957	null
2024-11-06	Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks	Felipe Marra et.al.	2411.03948	link
2024-11-06	Interactions Across Blocks in Post-Training Quantization of Large Language Models	Khasmamad Shabanovi et.al.	2411.03934	null
2024-11-06	Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models	Minh Duc Bui et.al.	2411.03888	link
2024-11-06	Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models	Zhijian Zhuo et.al.	2411.03884	link
2024-11-06	MEG: Medical Knowledge-Augmented Large Language Models for Question Answering	Laura Cabello et.al.	2411.03883	link
2024-11-05	Inference Optimal VLMs Need Only One Visual Token but Larger Models	Kevin Y. Li et.al.	2411.03312	link
2024-11-05	LLMs for Domain Generation Algorithm Detection	Reynier Leyva La O et.al.	2411.03307	null
2024-11-05	VERITAS: A Unified Approach to Reliability Evaluation	Rajkumar Ramamurthy et.al.	2411.03300	null
2024-11-05	Examining Human-AI Collaboration for Co-Writing Constructive Comments Online	Farhana Shahid et.al.	2411.03295	null
2024-11-05	Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation?	Jingyu Xiao et.al.	2411.03292	link
2024-11-05	The Future of Intelligent Healthcare: A Systematic Analysis and Discussion on the Integration and Impact of Robots Using Large Language Models for Healthcare	Souren Pashangpour et.al.	2411.03287	null
2024-11-05	SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents	Dawei Li et.al.	2411.03284	link
2024-11-05	Spontaneous Emergence of Agent Individuality through Social Interactions in LLM-Based Communities	Ryosuke Takata et.al.	2411.03252	null
2024-11-05	DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models	Ying Zhou et.al.	2411.03250	null
2024-11-05	From Pen to Prompt: How Creative Writers Integrate AI into their Writing Practice	Alicia Guo et.al.	2411.03137	null
2024-11-05	“Create a Fear of Missing Out” – ChatGPT Implements Unsolicited Deceptive Designs in Generated Websites Without Warning	Veronika Krauß et.al.	2411.03108	null
2024-11-05	Utilizing Precise and Complete Code Context to Guide LLM in Automatic False Positive Mitigation	Jinbao Chen et.al.	2411.03079	null
2024-11-05	Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning	Bei Li et.al.	2411.03042	null
2024-11-05	HumanVLM: Foundation for Human-Scene Vision-Language Model	Dawei Dai et.al.	2411.03034	null
2024-11-05	Leveraging Large Language Models in Code Question Answering: Baselines and Issues	Georgy Andryushchenko et.al.	2411.03012	link
2024-11-05	Controlling for Unobserved Confounding with Large Language Model Classification of Patient Smoking Status	Samuel Lee et.al.	2411.03004	null
2024-11-05	Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation	Junchen Fu et.al.	2411.02992	null
2024-11-05	Growing a Tail: Increasing Output Diversity in Large Language Models	Michal Shur-Ofry et.al.	2411.02989	null
2024-11-05	[Vision Paper] PRObot: Enhancing Patient-Reported Outcome Measures for Diabetic Retinopathy using Chatbots and Generative AI	Maren Pielka et.al.	2411.02973	null
2024-11-05	Multi-modal NeRF Self-Supervision for LiDAR Semantic Segmentation	Xavier Timoneda et.al.	2411.02969	null
2024-11-04	Training-free Regional Prompting for Diffusion Transformers	Anthony Chen et.al.	2411.02395	link
2024-11-04	Adaptive Length Image Tokenization via Recurrent Allocation	Shivam Duggal et.al.	2411.02393	link
2024-11-04	Attacking Vision-Language Computer Agents via Pop-ups	Yanzhe Zhang et.al.	2411.02391	link
2024-11-04	Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models	Guangzhi Xiong et.al.	2411.02382	null
2024-11-04	Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI	Ramneet Kaur et.al.	2411.02381	null
2024-11-04	Learning General-Purpose Biomedical Volume Representations using Randomized Synthesis	Neel Dey et.al.	2411.02372	link
2024-11-04	DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution	Yang Yue et.al.	2411.02359	link
2024-11-04	“Give Me BF16 or Give Me Death”? Accuracy-Performance Trade-Offs in LLM Quantization	Eldar Kurtic et.al.	2411.02355	null
2024-11-04	Machine learning identification of maternal inflammatory response and histologic chorioamnionitis from placental membrane whole slide images	Abhishek Sharma et.al.	2411.02354	null
2024-11-04	Social-RAG: Retrieving from Group Interactions to Socially Ground Proactive AI Generation to Group Preferences	Ruotong Wang et.al.	2411.02353	null
2024-11-04	Can Large Language Models generalize analogy solving like people can?	Claire E. Stevenson et.al.	2411.02348	null
2024-11-04	WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning	Zehan Qi et.al.	2411.02337	link
2024-11-04	Sparsing Law: Towards Large Language Models with Greater Activation Sparsity	Yuqi Luo et.al.	2411.02335	link
2024-11-04	Disrupting Test Development with AI Assistants	Vijay Joshi et.al.	2411.02328	null
2024-11-04	PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance	Ruyang Liu et.al.	2411.02327	link
2024-11-04	An Empirical Study on the Code Refactoring Capability of Large Language Models	Jonathan Cordeiro et.al.	2411.02320	null
2024-11-04	Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast	Marilyn Rego et.al.	2411.02318	null
2024-11-04	Defining and Evaluating Physical Safety for Large Language Models	Yung-Chen Tang et.al.	2411.02317	null
2024-11-04	Evaluating Creative Short Story Generation in Humans and Large Language Models	Mete Ismayilzada et.al.	2411.02316	link
2024-11-04	Taking AI Welfare Seriously	Robert Long et.al.	2411.00986	null
2024-10-31	P-Masking: Power Law Masking Improves Multi-attribute Controlled Generation	Mohamed Elgaar et.al.	2410.24201	null
2024-11-01	SelfCodeAlign: Self-Alignment for Code Generation	Yuxiang Wei et.al.	2410.24198	link
2024-10-31	DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models	Heng-Jui Chang et.al.	2410.24177	null
2024-10-31	Constraint Back-translation Improves Complex Instruction Following of Large Language Models	Yunjia Qi et.al.	2410.24175	link
2024-10-31	$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control	Kevin Black et.al.	2410.24164	null
2024-10-31	GPT or BERT: why not both?	Lucas Georges Gabriel Charpentier et.al.	2410.24159	link
2024-10-31	Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning	Jinghan Zhang et.al.	2410.24155	null
2024-10-31	Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning	Jiaqi Liu et.al.	2410.24152	null
2024-10-31	Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age	Nouar AlDahoul et.al.	2410.24148	null
2024-10-31	Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing	Akash Dhruv et.al.	2410.24119	link
2024-10-31	Repository-Level Compositional Code Translation and Validation	Ali Reza Ibrahimzada et.al.	2410.24117	link
2024-10-31	Matchmaker: Self-Improving Large Language Model Programs for Schema Matching	Nabeel Seedat et.al.	2410.24105	null
2024-10-31	Progressive Safeguards for Safe and Model-Agnostic Reinforcement Learning	Nabil Omi et.al.	2410.24096	null
2024-10-31	In-Context Fine-Tuning for Time-Series Foundation Models	Abhimanyu Das et.al.	2410.24087	null
2024-10-31	Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs	Muhammed Saeed et.al.	2410.24049	null
2024-10-31	Handwriting Recognition in Historical Documents with Multimodal LLM	Lucian Li et.al.	2410.24034	null
2024-10-31	Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks	Yingzhe Peng et.al.	2410.24032	null
2024-10-31	AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents	Yifan Xu et.al.	2410.24024	link
2024-10-31	SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation	Liang He et.al.	2410.24022	null
2024-10-31	Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?	Ioannis Tsiamas et.al.	2410.24019	null
2024-10-30	ReferEverything: Towards Segmenting Everything We Can Speak of in Videos	Anurag Bagchi et.al.	2410.23287	null
2024-10-30	A Monte Carlo Framework for Calibrated Uncertainty Estimation in Sequence Prediction	Qidong Yang et.al.	2410.23272	null
2024-10-30	TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models	Ziyao Shangguan et.al.	2410.23266	link
2024-10-30	EMMA: End-to-End Multimodal Model for Autonomous Driving	Jyh-Jing Hwang et.al.	2410.23262	null
2024-10-30	Keypoint Abstraction using Large Models for Object-Relative Imitation Learning	Xiaolin Fang et.al.	2410.23254	null
2024-10-30	Evaluating Cultural and Social Awareness of LLM Web Agents	Haoyi Qiu et.al.	2410.23252	null
2024-10-30	Carrot and Stick: Eliciting Comparison Data and Beyond	Yiling Chen et.al.	2410.23243	null
2024-10-30	A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment	Matteo G. Mecattaf et.al.	2410.23242	link
2024-10-30	EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning	Peide Huang et.al.	2410.23234	null
2024-10-30	COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences	Yixin Liu et.al.	2410.23223	link
2024-10-30	Partial Channel Dependence with Channel Masks for Time Series Foundation Models	Seunghan Lee et.al.	2410.23222	null
2024-10-30	OS-ATLAS: A Foundation Action Model for Generalist GUI Agents	Zhiyong Wu et.al.	2410.23218	link
2024-10-31	Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval	Sheryl Hsu et.al.	2410.23214	null
2024-10-30	ProTransformer: Robustify Transformers via Plug-and-Play Paradigm	Zhichao Hou et.al.	2410.23182	link
2024-10-30	ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning	Millennium Bismay et.al.	2410.23180	link
2024-10-30	TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters	Haiyang Wang et.al.	2410.23168	link
2024-10-30	SciPIP: An LLM-based Scientific Paper Idea Proposer	Wenxiao Wang et.al.	2410.23166	link
2024-10-30	FlexTSF: A Universal Forecasting Model for Time Series with Variable Regularities	Jingge Xiao et.al.	2410.23160	link
2024-10-30	VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning	Yichao Liang et.al.	2410.23156	null
2024-10-30	Public Domain 12M: A Highly Aesthetic Image-Text Dataset with Novel Governance Mechanisms	Jordan Meyer et.al.	2410.23144	null
2024-10-29	Local Policies Enable Zero-shot Long-horizon Manipulation	Murtaza Dalal et.al.	2410.22332	null
2024-10-29	Task Vectors are Cross-Modal	Grace Luo et.al.	2410.22330	null
2024-10-29	Enhancing Code Annotation Reliability: Generative AI’s Role in Comment Quality Assessment Models	Seetharam Killivalavan et.al.	2410.22323	null
2024-10-29	Online Detecting LLM-Generated Texts via Sequential Hypothesis Testing by Betting	Can Chen et.al.	2410.22318	link
2024-10-29	Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier	Kai Wang et.al.	2410.22317	link
2024-10-29	Natural Language Inference Improves Compositionality in Vision-Language Models	Paola Cascante-Bonilla et.al.	2410.22315	null
2024-10-29	Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving	Bo Jiang et.al.	2410.22313	link
2024-10-29	GPT-4o reads the mind in the eyes	James W. A. Strachan et.al.	2410.22309	null
2024-10-29	SVIP: Towards Verifiable Inference of Open-source Large Language Models	Yifan Sun et.al.	2410.22307	null
2024-10-29	Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning	Yihe Deng et.al.	2410.22304	null
2024-10-29	LLMs are Highly-Constrained Biophysical Sequence Optimizers	Angelica Chen et.al.	2410.22296	null
2024-10-29	Fine-Tuning LLMs for Code Mutation: A New Era of Cyber Threats	Mohammad Setak et.al.	2410.22293	null
2024-10-29	From melodic note sequences to pitches using word2vec	Daniel Defays et.al.	2410.22285	null
2024-10-29	Embedding-based classifiers can detect prompt injection attacks	Md. Ahsan Ayub et.al.	2410.22284	link
2024-10-29	Whose ChatGPT? Unveiling Real-World Educational Inequalities Introduced by Large Language Models	Renzhe Yu et.al.	2410.22282	null
2024-10-29	Fourier Head: Helping Large Language Models Learn Complex Probability Distributions	Nate Gillman et.al.	2410.22269	null
2024-10-29	Meta-Learning Adaptable Foundation Models	Jacob L. Block et.al.	2410.22264	null
2024-10-29	FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation	Farima Fatahi Bayat et.al.	2410.22257	null
2024-10-29	Abrupt Learning in Transformers: A Case Study on Matrix Completion	Pulkit Gopalani et.al.	2410.22244	null
2024-10-29	Are Decoder-Only Large Language Models the Silver Bullet for Code Search?	Yuxuan Chen et.al.	2410.22240	link
2024-10-28	Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics	Yaniv Nikankin et.al.	2410.21272	link
2024-10-28	LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior	Hanyu Wang et.al.	2410.21264	null
2024-10-28	BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference	Changwoo Lee et.al.	2410.21262	link
2024-10-28	AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?	Han Bao et.al.	2410.21259	link
2024-10-28	Multi-modal AI for comprehensive breast cancer prognostication	Jan Witowski et.al.	2410.21256	null
2024-10-28	LongReward: Improving Long-context Large Language Models with AI Feedback	Jiajie Zhang et.al.	2410.21252	link
2024-10-28	Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback	Nour Jedidi et.al.	2410.21242	null
2024-10-28	Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce	Zhantao Yang et.al.	2410.21237	null
2024-10-28	Flaming-hot Initiation with Regular Execution Sampling for Large Language Models	Weizhe Chen et.al.	2410.21236	null
2024-10-28	LoRA vs Full Fine-tuning: An Illusion of Equivalence	Reece Shuttleworth et.al.	2410.21228	null
2024-10-28	Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines	Zhixin Zhang et.al.	2410.21220	link
2024-10-28	Lifting the Veil on the Large Language Model Supply Chain: Composition, Risks, and Mitigations	Kaifeng Huang et.al.	2410.21218	null
2024-10-28	BongLLaMA: LLaMA for Bangla Language	Abdullah Khan Zehady et.al.	2410.21200	null
2024-10-28	Belief in the Machine: Investigating Epistemological Blind Spots of Language Models	Mirac Suzgun et.al.	2410.21195	link
2024-10-29	Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction	Qintong Zhang et.al.	2410.21169	null
2024-10-28	M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation	Jiaheng Liu et.al.	2410.21157	null
2024-10-28	Palisade – Prompt Injection Detection Framework	Sahasra Kokkula et.al.	2410.21146	null
2024-10-28	LLM-initialized Differentiable Causal Discovery	Shiv Kampani et.al.	2410.21141	null
2024-10-28	Do LLMs generate test oracles that capture the actual or the expected program behaviour?	Michael Konstantinou et.al.	2410.21136	null
2024-10-28	Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments	Marharyta Domnich et.al.	2410.21131	link
2024-10-25	The Potential and Value of AI Chatbot in Personalized Cognitive Training	Zilong Wang et.al.	2410.19733	null
2024-10-25	Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models	Yucheng Zhou et.al.	2410.19732	null
2024-10-25	Counting Ability of Large Language Models and Impact of Tokenization	Xiang Zhang et.al.	2410.19730	link
2024-10-25	FISHNET: Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning	Nicole Cho et.al.	2410.19727	null
2024-10-25	2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision	Shilong Li et.al.	2410.19720	null
2024-10-25	Multi-view biomedical foundation models for molecule-target and property prediction	Parthasarathy Suryanarayanan et.al.	2410.19704	link
2024-10-25	TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	Xiangyu Zeng et.al.	2410.19702	null
2024-10-25	IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation	Kaixian Qu et.al.	2410.19697	null
2024-10-25	Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs	Yifei Zhang et.al.	2410.19694	null
2024-10-25	APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs	Huaxiaoyue Wang et.al.	2410.19656	null
2024-10-25	Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models	Shenghao Fu et.al.	2410.19635	null
2024-10-25	Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina	Yuan Gao et.al.	2410.19599	null
2024-10-25	Diverse Sign Language Translation	Xin Shen et.al.	2410.19586	link
2024-10-25	ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems	Ritvik Aggarwal et.al.	2410.19572	null
2024-10-25	GeoLLaVA: Efficient Fine-Tuned Vision-Language Models for Temporal Change Detection in Remote Sensing	Hosam Elgendy et.al.	2410.19552	link
2024-10-25	Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?	Antonia Wüst et.al.	2410.19546	link
2024-10-25	Brain-like Functional Organization within Large Language Models	H. Sun et.al.	2410.19542	null
2024-10-25	Detection of Human and Machine-Authored Fake News in Urdu	Muhammad Zain Ali et.al.	2410.19517	link
2024-10-25	SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models	Jahyun Koo et.al.	2410.19503	null
2024-10-25	Introducing MAPO: Momentum-Aided Gradient Descent Prompt Optimization	Anthony Cui et.al.	2410.19499	null
2024-10-24	Unbounded: A Generative Infinite Game of Character Life Simulation	Jialu Li et.al.	2410.18975	null
2024-10-24	Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques	David Ortiz-Perez et.al.	2410.18972	null
2024-10-24	ConceptDrift: Uncovering Biases through the Lens of Foundational Models	Cristian Daniel Păduraru et.al.	2410.18970	null
2024-10-24	Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms	Zhangheng Li et.al.	2410.18967	null
2024-10-24	Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions	Yujuan Fu et.al.	2410.18966	null
2024-10-24	On the Crucial Role of Initialization for Matrix Factorization	Bingcong Li et.al.	2410.18965	null
2024-10-24	OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning	Xiaoqiang Wang et.al.	2410.18963	null
2024-10-24	Context is Key: A Benchmark for Forecasting with Essential Textual Information	Andrew Robert Williams et.al.	2410.18959	link
2024-10-24	Bridge-Coder: Unlocking LLMs’ Potential to Overcome Language Gaps in Low-Resource Code	Jipeng Zhang et.al.	2410.18957	null
2024-10-24	BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning	Yujuan Velvin Fu et.al.	2410.18955	null
2024-10-24	Dynamic Vocabulary Pruning in Early-Exit LLMs	Jort Vincenti et.al.	2410.18952	link
2024-10-24	SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models	Zonghao Ying et.al.	2410.18927	null
2024-10-24	From Blind Solvers to Logical Thinkers: Benchmarking LLMs’ Logical Integrity on Faulty Mathematical Problems	A M Muntasir Rahman et.al.	2410.18921	null
2024-10-25	A Survey on Speech Large Language Models	Jing Peng et.al.	2410.18908	null
2024-10-24	PRISM: A Methodology for Auditing Biases in Large Language Models	Leif Azzopardi et.al.	2410.18906	link
2024-10-24	LLMs for Extremely Low-Resource Finno-Ugric Languages	Taido Purason et.al.	2410.18902	link
2024-10-24	Creating and Repairing Robot Programs in Open-World Domains	Claire Schlesinger et.al.	2410.18893	null
2024-10-24	Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks	Graziano A. Manduzio et.al.	2410.18890	null
2024-10-24	Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance	Omer Nahum et.al.	2410.18889	null
2024-10-24	Provably Robust Watermarks for Open-Source Language Models	Miranda Christ et.al.	2410.18861	null
2024-10-23	TP-Eval: Tap Multimodal LLMs’ Potential in Evaluation by Customizing Prompts	Yuxuan Xie et.al.	2410.18071	null
2024-10-23	CLEAR: Character Unlearning in Textual and Visual Modalities	Alexey Dontsov et.al.	2410.18057	null
2024-10-23	LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering	Qingfei Zhao et.al.	2410.18050	link
2024-10-23	Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases	Anna Glazkova et.al.	2410.18040	null
2024-10-23	MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning	Jingfan Zhang et.al.	2410.18035	null
2024-10-23	GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration	Xin Li et.al.	2410.18032	link
2024-10-23	MiniFed: Integrating LLM-based Agentic-Workflow for Simulating FOMC Meeting	Sungil Seok et.al.	2410.18012	null
2024-10-23	Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation	Suho Kang et.al.	2410.18001	link
2024-10-23	MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers	Zebin Yang et.al.	2410.17957	null
2024-10-23	ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference	Xin He et.al.	2410.17954	null
2024-10-23	SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains	Ran Xu et.al.	2410.17952	null
2024-10-23	Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling	Nirav Bhan et.al.	2410.17950	null
2024-10-23	Toward path-invariant embeddings for local distance source characterization	Lisa Linville et.al.	2410.17937	null
2024-10-23	Guide for Defense (G4D): Dynamic Guidance for Robust and Balanced Defense in Large Language Models	He Cao et.al.	2410.17922	link
2024-10-23	Scaling Diffusion Language Models via Adaptation from Autoregressive Models	Shansan Gong et.al.	2410.17891	link
2024-10-23	R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models	Linger Deng et.al.	2410.17885	link
2024-10-23	Lightweight Neural App Control	Filippos Christianos et.al.	2410.17883	null
2024-10-23	AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning	Yehonathan Refael et.al.	2410.17881	null
2024-10-23	Understanding Layer Significance in LLM Alignment	Guangyuan Shi et.al.	2410.17875	null
2024-10-23	DataTales: A Benchmark for Real-World Intelligent Data Narration	Yajing Yang et.al.	2410.17859	link
2024-10-22	PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction	Long Xing et.al.	2410.17247	link
2024-10-22	Towards Reliable Evaluation of Behavior Steering Interventions in LLMs	Itamar Pres et.al.	2410.17245	null
2024-10-22	Frontiers in Intelligent Colonoscopy	Ge-Peng Ji et.al.	2410.17241	link
2024-10-22	Large Language Models Empowered Personalized Web Agents	Hongru Cai et.al.	2410.17236	null
2024-10-22	Automated Spinal MRI Labelling from Reports Using a Large Language Model	Robin Y. Park et.al.	2410.17235	link
2024-10-22	Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy	Benedict Aaron Tjandra et.al.	2410.17234	null
2024-10-22	Few-shot In-Context Preference Learning Using Large Language Models	Chao Yu et.al.	2410.17233	null
2024-10-22	Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods	Tsachi Blau et.al.	2410.17222	null
2024-10-22	MiniPLM: Knowledge Distillation for Pre-Training Language Models	Yuxian Gu et.al.	2410.17215	link
2024-10-22	Exploring Possibilities of AI-Powered Legal Assistance in Bangladesh through Large Language Modeling	Azmine Toushik Wasi et.al.	2410.17210	link
2024-10-22	VoiceBench: Benchmarking LLM-Based Voice Assistants	Yiming Chen et.al.	2410.17196	link
2024-10-23	Non-myopic Generation of Language Model for Reasoning and Planning	Chang Ma et.al.	2410.17195	link
2024-10-22	Remote Timing Attacks on Efficient Language Model Inference	Nicholas Carlini et.al.	2410.17175	null
2024-10-22	From Attention to Activation: Unravelling the Enigmas of Large Language Models	Prannay Kaul et.al.	2410.17174	null
2024-10-22	Self-calibration for Language Model Quantization and Pruning	Miles Williams et.al.	2410.17170	null
2024-10-22	Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence	İlker Işık et.al.	2410.17161	null
2024-10-22	Improving Pinterest Search Relevance Using Large Language Models	Han Wang et.al.	2410.17152	null
2024-10-22	Are Visual-Language Models Effective in Action Recognition? A Comparative Study	Mahmoud Ali et.al.	2410.17149	null
2024-10-22	Can General-Purpose Large Language Models Generalize to English-Thai Machine Translation?	Jirat Chiaranaipanich et.al.	2410.17145	null
2024-10-22	Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements	Isamu Isozaki et.al.	2410.17141	link
2024-10-21	Reflection-Bench: probing AI intelligence with reflection	Lingyu Li et.al.	2410.16270	link
2024-10-21	SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree	Shuangrui Ding et.al.	2410.16268	link
2024-10-21	xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs	Michael S. Ryoo et.al.	2410.16267	null
2024-10-22	Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance	Zhangwei Gao et.al.	2410.16261	link
2024-10-21	Elucidating the design space of language models for image generation	Xuantong Liu et.al.	2410.16257	link
2024-10-21	CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution	Maosong Cao et.al.	2410.16256	link
2024-10-21	Can Knowledge Editing Really Correct Hallucinations?	Baixiang Huang et.al.	2410.16251	link
2024-10-21	Analyzing Context Contributions in LLM-based Machine Translation	Emmanouil Zaranis et.al.	2410.16246	null
2024-10-21	IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems	Yihuan Mao et.al.	2410.16237	null
2024-10-21	LLaVA-KD: A Framework of Distilling Multimodal Large Language Models	Yuxuan Cai et.al.	2410.16236	link
2024-10-21	ToW: Thoughts of Words Improve Reasoning in Large Language Models	Zhikun Xu et.al.	2410.16235	link
2024-10-21	Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping	Ryan Li et.al.	2410.16232	null
2024-10-21	Building A Coding Assistant via the Retrieval-Augmented Language Model	Xinze Li et.al.	2410.16229	link
2024-10-21	A Realistic Threat Model for Large Language Model Jailbreaks	Valentyn Boreiko et.al.	2410.16222	link
2024-10-21	Pre-training Distillation for Large Language Models: A Design Space Exploration	Hao Peng et.al.	2410.16215	null
2024-10-21	Comprehensive benchmarking of large language models for RNA secondary structure prediction	L. I. Zablocki et.al.	2410.16212	link
2024-10-21	CoT-TL: Low-Resource Temporal Knowledge Representation of Planning Instructions Using Chain-of-Thought Reasoning	Kumar Manas et.al.	2410.16207	link
2024-10-21	Improve Vision Language Model Chain-of-thought Reasoning	Ruohong Zhang et.al.	2410.16198	link
2024-10-22	LASER: Script Execution by Autonomous Agents for On-demand Traffic Simulation	Hao Gao et.al.	2410.16197	link
2024-10-21	Contamination Report for Multilingual Benchmarks	Sanchit Ahuja et.al.	2410.16186	null
2024-10-18	Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts	German Gritsai et.al.	2410.14677	link
2024-10-18	SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment	Qin Liu et.al.	2410.14676	null
2024-10-18	Enhancing Large Language Models’ Situated Faithfulness to External Contexts	Yukun Huang et.al.	2410.14675	link
2024-10-18	Decomposing The Dark Matter of Sparse Autoencoders	Joshua Engels et.al.	2410.14670	link
2024-10-18	NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples	Baiqi Li et.al.	2410.14669	null
2024-10-18	MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps	Xiongtao Zhou et.al.	2410.14668	link
2024-10-18	A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning	Shengjie Sun et.al.	2410.14660	null
2024-10-18	Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens	Zhepeng Cen et.al.	2410.14655	null
2024-10-18	EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search	Oliver Sieberling et.al.	2410.14649	link
2024-10-18	Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs	Runchu Tian et.al.	2410.14641	link
2024-10-18	GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings	Raghuveer Thirukovalluru et.al.	2410.14635	link
2024-10-18	Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning	Yuxiang Lu et.al.	2410.14633	null
2024-10-18	On the Regularization of Learnable Embeddings for Time Series Processing	Luca Butera et.al.	2410.14630	null
2024-10-18	CELI: Controller-Embedded Language Model Interactions	Jan-Samuel Wagner et.al.	2410.14627	null
2024-10-18	DiSCo Meets LLMs: A Unified Approach for Sparse Retrieval and Contextual Distillation in Conversational Search	Simon Lupart et.al.	2410.14609	link
2024-10-18	Teaching Models to Balance Resisting and Accepting Persuasion	Elias Stengel-Eskin et.al.	2410.14596	link
2024-10-18	Neuro-Symbolic Traders: Assessing the Wisdom of AI Crowds in Markets	Namid R. Stillman et.al.	2410.14587	null
2024-10-18	Do LLMs estimate uncertainty well in instruction-following?	Juyeon Heo et.al.	2410.14582	link
2024-10-18	Large Language Models Are Overparameterized Text Encoders	Thennal D K et.al.	2410.14578	null
2024-10-18	MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts	Rachel S. Y. Teo et.al.	2410.14574	link
2024-10-17	Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens	Lijie Fan et.al.	2410.13863	null
2024-10-17	PUMA: Empowering Unified MLLM with Multi-granular Visual Generation	Rongyao Fang et.al.	2410.13861	link
2024-10-17	VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding	Runsen Xu et.al.	2410.13860	link
2024-10-17	γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models	Yaxin Luo et.al.	2410.13859	null
2024-10-17	How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs	Guhao Feng et.al.	2410.13857	null
2024-10-17	Can MLLMs Understand the Deep Implication Behind Chinese Images?	Chenhao Zhang et.al.	2410.13854	link
2024-10-17	Retrospective Learning from Interactions	Zizhao Chen et.al.	2410.13852	null
2024-10-17	Differentiable Robot Rendering	Ruoshi Liu et.al.	2410.13851	null
2024-10-17	SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction	Xuan Zhang et.al.	2410.13846	link
2024-10-17	A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models	Qiaoyu Tang et.al.	2410.13841	null
2024-10-17	Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs	Tianyu Guo et.al.	2410.13835	link
2024-10-17	A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement	Hui Yuan et.al.	2410.13828	link
2024-10-17	Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models	Mazda Moayeri et.al.	2410.13826	null
2024-10-17	AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents	Ke Yang et.al.	2410.13825	null
2024-10-18	Harnessing Webpage UIs for Text-Rich Visual Understanding	Junpeng Liu et.al.	2410.13824	null
2024-10-17	Deep Generative Models Unveil Patterns in Medical Images Through Vision-Language Conditioning	Xiaodan Xing et.al.	2410.13823	link
2024-10-17	Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance	Mitsuhiko Nakamoto et.al.	2410.13816	null
2024-10-17	De-mark: Watermark Removal in Large Language Models	Ruibo Chen et.al.	2410.13808	null
2024-10-17	A Watermark for Order-Agnostic Language Models	Ruibo Chen et.al.	2410.13805	null
2024-10-18	BenTo: Benchmark Task Reduction with In-Context Transferability	Hongyu Zhao et.al.	2410.13804	link
2024-10-16	Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models	Ce Zhang et.al.	2410.12790	link
2024-10-16	Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception	Jihao Zhao et.al.	2410.12788	link
2024-10-16	In-Context Learning Enables Robot Action Prediction in LLMs	Yida Yin et.al.	2410.12782	null
2024-10-16	Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information	Yingya Li et.al.	2410.12774	null
2024-10-16	Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions	Zhenyu Jiang et.al.	2410.12773	null
2024-10-16	Towards Zero-Shot Camera Trap Image Categorization	Jiří Vyskočil et.al.	2410.12769	null
2024-10-16	The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse	Ekansh Sharma et.al.	2410.12766	null
2024-10-16	StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples	Ajay Patel et.al.	2410.12757	null
2024-10-17	CREAM: Consistency Regularized Self-Rewarding Language Models	Zhaoyang Wang et.al.	2410.12735	link
2024-10-16	WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation	João Matos et.al.	2410.12722	link
2024-10-16	FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression	Zhenheng Tang et.al.	2410.12707	null
2024-10-16	WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines	Genta Indra Winata et.al.	2410.12705	link
2024-10-16	Sarcasm Detection in a Less-Resourced Language	Lazar Đoković et.al.	2410.12704	link
2024-10-16	Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization	Xingqi Wang et.al.	2410.12700	link
2024-10-16	VividMed: Vision Language Model with Versatile Visual Grounding for Medicine	Lingxiao Luo et.al.	2410.12694	link
2024-10-16	Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2	Mohamad Abdi et.al.	2410.12686	null
2024-10-16	3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation	Dewei Zhou et.al.	2410.12669	link
2024-10-16	Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models	Shicheng Xu et.al.	2410.12662	null
2024-10-16	Evaluating Morphological Compositional Generalization in Large Language Models	Mete Ismayilzada et.al.	2410.12656	link
2024-10-16	Beyond Speech and More: Investigating the Emergent Ability of Speech Foundation Models for Classifying Physiological Time-Series Signals	Orchid Chetia Phukan et.al.	2410.12645	null
2024-10-15	GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation	Fei Tang et.al.	2410.11841	link
2024-10-15	A Hitchhiker’s Guide to Scaling Law Estimation	Leshem Choshen et.al.	2410.11840	link
2024-10-15	MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding	Yue Cao et.al.	2410.11829	link
2024-10-15	Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws	Yiding Jiang et.al.	2410.11820	link
2024-10-15	Improving Long-Text Alignment for Text-to-Image Diffusion Models	Luping Liu et.al.	2410.11817	link
2024-10-15	SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing	Zhiyuan Zhang et.al.	2410.11815	null
2024-10-15	NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models	Han Han et.al.	2410.11805	link
2024-10-15	FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting	Zhe Li et.al.	2410.11802	null
2024-10-15	Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability	Tsz Ting Chung et.al.	2410.11786	null
2024-10-15	Latent BKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty	Joey Wilson et.al.	2410.11783	link
2024-10-15	G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks	Guibin Zhang et.al.	2410.11782	null
2024-10-15	Language Models Encode Numbers Using Digit Representations in Base 10	Amit Arnold Levy et.al.	2410.11781	link
2024-10-15	MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation	Chenxi Wang et.al.	2410.11779	link
2024-10-15	Time-Series Foundation Model for Value-at-Risk	Anubha Goel et.al.	2410.11773	link
2024-10-15	Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models	Kai Yao et.al.	2410.11772	link
2024-10-15	SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding	Ying Chen et.al.	2410.11761	null
2024-10-15	Latent Action Pretraining from Videos	Seonghyeon Ye et.al.	2410.11758	null
2024-10-15	Personas with Attitudes: Controlling LLMs for Diverse Data Annotation	Leon Fröhling et.al.	2410.11745	link
2024-10-15	DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure	Yunfan Xiong et.al.	2410.11744	null
2024-10-15	Light-Weight Fault Tolerant Attention for Large Language Model Training	Yuhang Liang et.al.	2410.11720	null
2024-10-14	DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads	Guangxuan Xiao et.al.	2410.10819	link
2024-10-14	Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free	Ziyue Li et.al.	2410.10814	link
2024-10-14	LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory	Di Wu et.al.	2410.10813	link
2024-10-14	Local and Global Decoding in Text Generation	Daniel Gareev et.al.	2410.10810	link
2024-10-14	Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning	Aakanksha et.al.	2410.10801	null
2024-10-14	Towards Foundation Models for 3D Vision: How Close Are We?	Yiming Zuo et.al.	2410.10799	link
2024-10-15	MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling	Jian Yang et.al.	2410.10798	null
2024-10-14	Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance	Sachin Goyal et.al.	2410.10796	link
2024-10-15	LiveXiv – A Multi-Modal Live Benchmark Based on Arxiv Papers Content	Nimrod Shabtay et.al.	2410.10783	link
2024-10-14	When Attention Sink Emerges in Language Models: An Empirical View	Xiangming Gu et.al.	2410.10781	link
2024-10-14	Focused ReAct: Improving ReAct through Reiterate and Early Stop	Shuoqiu Li et.al.	2410.10779	null
2024-10-14	AFlow: Automating Agentic Workflow Generation	Jiayi Zhang et.al.	2410.10762	link
2024-10-14	Denial-of-Service Poisoning Attacks against Large Language Models	Kuofeng Gao et.al.	2410.10760	link
2024-10-14	SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization	Akrit Mudvari et.al.	2410.10759	null
2024-10-14	Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification	Jan Cegin et.al.	2410.10756	link
2024-10-14	NT-LLM: A Novel Node Tokenizer for Integrating Graph Structure into Large Language Models	Yanbiao Ji et.al.	2410.10743	null
2024-10-14	SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing	Pengrui Quan et.al.	2410.10741	link
2024-10-14	Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs	Ishan Jindal et.al.	2410.10739	null
2024-10-14	Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning	Kuofeng Gao et.al.	2410.10735	null
2024-10-14	Towards LLM-guided Efficient and Interpretable Multi-linear Tensor Network Rank Selection	Giorgos Iacovides et.al.	2410.10728	null
2024-10-11	Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models	Qin Liu et.al.	2410.09047	null
2024-10-11	AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation	Zijun Wang et.al.	2410.09040	link
2024-10-11	Semi-Supervised Learning of Noisy Mixture of Experts Models	Oh-Ran Kwon et.al.	2410.09039	null
2024-10-11	SimpleStrat: Diversifying Language Model Generation with Stratification	Justin Wong et.al.	2410.09038	null
2024-10-11	Mentor-KD: Making Small Language Models Better Multi-step Reasoners	Hojae Lee et.al.	2410.09037	link
2024-10-11	PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents	Xiangyu Yin et.al.	2410.09034	link
2024-10-11	MedMobile: A mobile-sized language model with expert-level clinical capabilities	Krithik Vishwanath et.al.	2410.09019	link
2024-10-11	Parameter-Efficient Fine-Tuning of State Space Models	Kevin Galim et.al.	2410.09016	link
2024-10-11	The Impact of Visual Information in Chinese Characters: Evaluating Large Models’ Ability to Recognize and Utilize Radicals	Xiaofeng Wu et.al.	2410.09013	null
2024-10-11	Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models	Hao Li et.al.	2410.09012	link
2024-10-11	SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights	Ling Yang et.al.	2410.09008	link
2024-10-11	From Interaction to Impact: Towards Safer AI Agents Through Understanding and Evaluating UI Operation Impacts	Zhuohao Jerry Zhang et.al.	2410.09006	null
2024-10-11	DA-Ada: Learning Domain-Aware Adapter for Domain Adaptive Object Detection	Haochen Li et.al.	2410.09004	link
2024-10-11	Hypothesis-only Biases in Large Language Model-Elicited Natural Language Inference	Grace Proebsting et.al.	2410.08996	null
2024-10-11	The structure of the token space for large language models	Michael Robinson et.al.	2410.08993	null
2024-10-11	Science is Exploration: Computational Frontiers for Conceptual Metaphor Theory	Rebecca M. M. Hicke et.al.	2410.08991	link
2024-10-11	SubZero: Random Subspace Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning	Ziming Yu et.al.	2410.08989	link
2024-10-11	Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective	Bo Ni et.al.	2410.08985	null
2024-10-11	NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models	Zheng Yi Ho et.al.	2410.08970	null
2024-10-11	Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements	Jingyu Zhang et.al.	2410.08968	null
2024-10-10	DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models	Xiaoxiao He et.al.	2410.08207	null
2024-10-10	Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training	Gen Luo et.al.	2410.08202	null
2024-10-10	Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity	Shuo Xie et.al.	2410.08198	link
2024-10-10	From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions	Changle Qu et.al.	2410.08197	link
2024-10-10	MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code	Zimu Lu et.al.	2410.08196	link
2024-10-10	Features are fate: a theory of transfer learning in high-dimensional regression	Javan Tahir et.al.	2410.08194	null
2024-10-10	GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment	Yuancheng Xu et.al.	2410.08193	null
2024-10-10	MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models	Wenbo Hu et.al.	2410.08182	null
2024-10-10	Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models	Qingni Wang et.al.	2410.08174	null
2024-10-10	On the Evaluation of Generative Robotic Simulations	Feng Chen et.al.	2410.08172	null
2024-10-10	Visual Scratchpads: Enabling Global Reasoning in Vision	Aryo Lotfi et.al.	2410.08165	null
2024-10-10	Agent S: An Open Agentic Framework that Uses Computers Like a Human	Saaket Agashe et.al.	2410.08164	link
2024-10-10	The Effect of Surprisal on Reading Times in Information Seeking and Repeated Reading	Keren Gruteke Klein et.al.	2410.08162	link
2024-10-10	DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation	Jiatao Gu et.al.	2410.08159	null
2024-10-10	Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning	Amrith Setlur et.al.	2410.08146	null
2024-10-10	Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs	Xiaoyuan Liu et.al.	2410.08145	link
2024-10-10	DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory	Yutong Wang et.al.	2410.08143	link
2024-10-10	Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction	Jarrid Rector-Brooks et.al.	2410.08134	null
2024-10-10	Think Beyond Size: Dynamic Prompting for More Effective Reasoning	Kamesh R et.al.	2410.08130	null
2024-10-10	Mars: Situated Inductive Reasoning in an Open-World Environment	Xiaojuan Tang et.al.	2410.08126	null
2024-10-09	MM-Ego: Towards Building Egocentric Multimodal LLMs	Hanrong Ye et.al.	2410.07177	null
2024-10-09	Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models	Fei Wang et.al.	2410.07176	null
2024-10-09	Do better language models have crisper vision?	Jona Ruthardt et.al.	2410.07173	null
2024-10-09	One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation	Fabian Paischer et.al.	2410.07170	link
2024-10-09	Sylber: Syllabic Embedding Representation of Speech from Raw Audio	Cheol Jun Cho et.al.	2410.07168	link
2024-10-09	Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate	Qidong Huang et.al.	2410.07167	link
2024-10-09	Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making	Manling Li et.al.	2410.07166	link
2024-10-09	Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning	Chongyu Fan et.al.	2410.07163	link
2024-10-09	Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis	Bohan Zeng et.al.	2410.07155	link
2024-10-09	Towards Interpreting Visual Information Processing in Vision-Language Models	Clement Neo et.al.	2410.07149	link
2024-10-09	Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling	Yingfa Chen et.al.	2410.07145	null
2024-10-09	Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates	Xiaosen Zheng et.al.	2410.07137	link
2024-10-10	EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models	Rui Zhao et.al.	2410.07133	link
2024-10-09	Mental Disorders Detection in the Era of Large Language Models	Gleb Kuzmin et.al.	2410.07129	null
2024-10-09	Exploring the Readiness of Prominent Small Language Models for the Democratization of Financial Literacy	Tagore Rao Kosireddy et.al.	2410.07118	link
2024-10-09	Personalized Visual Instruction Tuning	Renjie Pi et.al.	2410.07113	link
2024-10-09	VHELM: A Holistic Evaluation of Vision Language Models	Tony Lee et.al.	2410.07112	link
2024-10-09	I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy	Gian Maria Campedelli et.al.	2410.07109	link
2024-10-09	Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context	Sangwon Yu et.al.	2410.07103	null
2024-10-09	MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering	Jun Shern Chan et.al.	2410.07095	link
2024-10-07	Fine-Tuning CLIP’s Last Visual Projector: A Few-Shot Cornucopia	Mohammad Fahes et.al.	2410.05270	link
2024-10-07	Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models	Fei Wang et.al.	2410.05269	link
2024-10-07	PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs	Mengzhao Chen et.al.	2410.05265	link
2024-10-07	TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles	Qingchen Yu et.al.	2410.05262	link
2024-10-07	TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens	Ya-Qi Yu et.al.	2410.05261	null
2024-10-07	Differential Transformer	Tianzhu Ye et.al.	2410.05258	link
2024-10-07	GLEE: A Unified Framework and Benchmark for Language-based Economic Environments	Eilam Shapira et.al.	2410.05254	link
2024-10-07	Causal Micro-Narratives	Mourad Heddaya et.al.	2410.05252	null
2024-10-07	SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe	Yuxin Xiao et.al.	2410.05248	null
2024-10-07	Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents	Boyu Gou et.al.	2410.05243	link
2024-10-08	TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models	Rabin Adhikari et.al.	2410.05239	link
2024-10-07	GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models	Iman Mirzadeh et.al.	2410.05229	null
2024-10-07	Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates	Avanika Narayan et.al.	2410.05224	null
2024-10-07	Precise Model Benchmarking with Only a Few Observations	Riccardo Fogliato et.al.	2410.05222	null
2024-10-07	Density estimation with LLMs: a geometric investigation of in-context learning trajectories	Toni J. B. Liu et.al.	2410.05218	null
2024-10-07	Organizing Unstructured Image Collections using Natural Language	Mingxuan Liu et.al.	2410.05217	null
2024-10-07	Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality	Youngtaek Oh et.al.	2410.05210	link
2024-10-07	RevisEval: Improving LLM-as-a-Judge via Response-Adapted References	Qiyuan Zhang et.al.	2410.05193	null
2024-10-07	Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective	Kaiyue Wen et.al.	2410.05192	null
2024-10-07	LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation	Zhijie Wang et.al.	2410.05191	null
2024-10-04	Enhance Reasoning by Learning from Mistakes: Peer-Review Knowledge Distillation from Multiple Large Language Models	Zhuochun Li et.al.	2410.03663	link
2024-10-04	Unraveling Cross-Modality Knowledge Conflict in Large Vision-Language Models	Tinghui Zhu et.al.	2410.03659	link
2024-10-04	RAFT: Realistic Attacks to Fool Text Detectors	James Wang et.al.	2410.03658	link
2024-10-04	Aligning LLMs with Individual Preferences via Interaction	Shujin Wu et.al.	2410.03642	link
2024-10-04	Conditional Enzyme Generation Using Protein Language Models with Adapters	Jason Yang et.al.	2410.03634	null
2024-10-04	Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation	Jie Xiao et.al.	2410.03613	null
2024-10-04	TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation	Jonathan Cook et.al.	2410.03608	null
2024-10-04	LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Videos	Noriaki Hirose et.al.	2410.03603	null
2024-10-04	Efficiently Identifying Watermarked Segments in Mixed-Source Texts	Xuandong Zhao et.al.	2410.03600	null
2024-10-04	Understanding Reasoning in Chain-of-Thought from the Hopfieldian View	Lijie Hu et.al.	2410.03595	null
2024-10-04	Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models	Xin Zou et.al.	2410.03577	link
2024-10-04	Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs)	Abrar Rahman et.al.	2410.03568	null
2024-10-04	Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding	Wei Wu et.al.	2410.03553	null
2024-10-04	Re-examining Sexism and Misogyny Classification with Annotator Attitudes	Aiqi Jiang et.al.	2410.03543	null
2024-10-04	No Need to Talk: Asynchronous Mixture of Language Models	Anastasiia Filippova et.al.	2410.03529	null
2024-10-04	Steering Large Language Models between Code Execution and Textual Reasoning	Yongchao Chen et.al.	2410.03524	null
2024-10-04	A Probabilistic Perspective on Unlearning and Alignment for Large Language Models	Yan Scholten et.al.	2410.03523	link
2024-10-04	CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios	Zetian Ouyang et.al.	2410.03502	link
2024-10-04	FedStein: Enhancing Multi-Domain Federated Learning Through James-Stein Estimator	Sunny Gupta et.al.	2410.03499	link
2024-10-04	Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores	Robert E. Blackwell et.al.	2410.03492	null
2024-10-03	Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations	Nick Jiang et.al.	2410.02762	link
2024-10-03	FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models	Zhipei Xu et.al.	2410.02761	link
2024-10-03	Erasing Conceptual Knowledge from Language Models	Rohit Gandikota et.al.	2410.02760	link
2024-10-03	Loong: Generating Minute-level Long Videos with Autoregressive Language Models	Yuqing Wang et.al.	2410.02757	null
2024-10-03	SIEVE: General Purpose Data Filtering System Matching GPT-4o Accuracy at 1% the Cost	Jifan Zhang et.al.	2410.02755	null
2024-10-03	Training Language Models on Synthetic Edit Sequences Improves Code Synthesis	Ulyana Piterbarg et.al.	2410.02749	link
2024-10-03	CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation	Han He et.al.	2410.02748	link
2024-10-03	Contrastive Localized Language-Image Pre-Training	Hong-You Chen et.al.	2410.02746	null
2024-10-03	Neutral residues: revisiting adapters for model extension	Franck Signe Talla et.al.	2410.02744	null
2024-10-03	MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions	Yekun Chai et.al.	2410.02743	link
2024-10-03	Grounding Large Language Models In Embodied Environment With Imperfect World Models	Haolan Liu et.al.	2410.02742	null
2024-10-03	Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization	Lei Xu et.al.	2410.02741	link
2024-10-03	Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models	Zhengfeng Lai et.al.	2410.02740	null
2024-10-04	Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge	Jiayi Ye et.al.	2410.02736	null
2024-10-03	DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects	Zhaowei Wang et.al.	2410.02730	link
2024-10-03	Unified Multi-Modal Interleaved Document Representation for Information Retrieval	Jaewoo Lee et.al.	2410.02729	null
2024-10-03	Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation	Rohin Manvi et.al.	2410.02725	null
2024-10-03	Large Language Models as Markov Chains	Oussama Zekri et.al.	2410.02724	null
2024-10-03	Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization	Ryan C. Barron et.al.	2410.02721	null
2024-10-03	UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation	Zixuan Li et.al.	2410.02719	null
2024-10-02	Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads	Yuxiang Huang et.al.	2410.01805	link
2024-10-02	Efficient $1$-bit tensor approximations	Alex W. Neal Riasanovsky et.al.	2410.01799	null
2024-10-02	Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models	Joseph Lee et.al.	2410.01795	link
2024-10-02	When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1	R. Thomas McCoy et.al.	2410.01792	null
2024-10-02	Investigating on RLHF methodology	Alexey Kutalev et.al.	2410.01789	null
2024-10-02	OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models	Heng Yang et.al.	2410.01784	link
2024-10-02	Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models	Shayekh Bin Islam et.al.	2410.01782	link
2024-10-03	Quantifying Generalization Complexity for Large Language Models	Zhenting Qi et.al.	2410.01769	link
2024-10-02	Integrating Protein Sequence and Expression Level to Analysis Molecular Characterization of Breast Cancer Subtypes	Hossein Sholehrasa et.al.	2410.01755	null
2024-10-03	Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks	Mengzhao Jia et.al.	2410.01744	link
2024-10-02	VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models	Kailai Feng et.al.	2410.01738	link
2024-10-02	Visual Perception in Text Strings	Qi Jia et.al.	2410.01733	link
2024-10-02	Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing	Yilmazcan Ozyurt et.al.	2410.01727	link
2024-10-02	Auto-Demo Prompting: Leveraging Generated Outputs as Demonstrations for Enhanced Batch Prompting	Longyu Feng et.al.	2410.01724	null
2024-10-02	Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective	Zeyu Gan et.al.	2410.01720	link
2024-10-02	Examining the Role of Relationship Alignment in Large Language Models	Kristen M. Altenburger et.al.	2410.01708	null
2024-10-02	Interpretable Contrastive Monte Carlo Tree Search Reasoning	Zitian Gao et.al.	2410.01707	link
2024-10-02	An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings	Soham Govande et.al.	2410.01704	link
2024-10-02	CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs	Kangsheng Wang et.al.	2410.01696	null
2024-10-02	U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models	Tung-Yu Wu et.al.	2410.01692	null
2024-09-30	MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning	Haotian Zhang et.al.	2409.20566	null
2024-09-30	LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner	Xiaopan Zhang et.al.	2409.20560	null
2024-09-30	Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos	Md Mohaiminul Islam et.al.	2409.20557	null
2024-09-30	UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models	Qiaojun Yu et.al.	2409.20551	null
2024-09-30	LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation	Ziyao Zhang et.al.	2409.20550	link
2024-09-30	Robi Butler: Remote Multimodal Interactions with Household Robot Assistant	Anxing Xiao et.al.	2409.20548	null
2024-09-30	Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models	Arpan Mukherjee et.al.	2409.20512	null
2024-09-30	COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models	Divyanshu Daiya et.al.	2409.20502	null
2024-09-30	A Weakly Supervised Data Labeling Framework for Machine Lexical Normalization in Vietnamese Social Media	Dung Ha Nguyen et.al.	2409.20467	null
2024-09-30	Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments	Mohamed Elnoor et.al.	2409.20445	null
2024-10-01	Instance-adaptive Zero-shot Chain-of-Thought Prompting	Xiaosong Yuan et.al.	2409.20441	null
2024-09-30	HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding	Fan Yuan et.al.	2409.20429	link
2024-09-30	World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering	Jiacong Wang et.al.	2409.20424	link
2024-09-30	Anti-stereotypical Predictive Text Suggestions Do Not Reliably Yield Anti-stereotypical Writing	Connor Baumler et.al.	2409.20390	null
2024-09-30	Wait, but Tylenol is Acetaminophen… Investigating and Improving Language Models’ Ability to Resist Requests for Misinformation	Shan Chen et.al.	2409.20385	null
2024-09-30	Word-wise intonation model for cross-language TTS systems	Tomilov A. A. et.al.	2409.20374	null
2024-09-30	The Perfect Blend: Redefining RLHF with Mixture of Judges	Tengyu Xu et.al.	2409.20370	null
2024-09-30	VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs	Ruotong Liao et.al.	2409.20365	link
2024-09-30	Efficient Driving Behavior Narration and Reasoning on Edge Device Using Large Language Models	Yizhou Huang et.al.	2409.20364	null
2024-09-30	Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference	Ke Yi et.al.	2409.20361	null
2024-09-27	Exploring Token Pruning in Vision State Space Models	Zheng Zhan et.al.	2409.18962	null
2024-09-27	LML: Language Model Learning a Dataset for Data-Augmented Prediction	Praneeth Vadlapati et.al.	2409.18957	link
2024-09-27	Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models	Jiaming Li et.al.	2409.18943	link
2024-09-27	From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding	Heqing Zou et.al.	2409.18938	link
2024-09-27	Social Media Bot Policies: Evaluating Passive and Active Enforcement	Kristina Radivojevic et.al.	2409.18931	null
2024-09-27	AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow	Huizi Yu et.al.	2409.18924	null
2024-09-27	Soft Measures for Extracting Causal Collective Intelligence	Maryam Berijanian et.al.	2409.18911	link
2024-09-27	Improving Visual Object Tracking through Visual Prompting	Shih-Fang Chen et.al.	2409.18901	link
2024-09-27	IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation	Fan Lin et.al.	2409.18892	link
2024-09-27	Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models	Zehan Li et.al.	2409.18878	null
2024-09-27	Predicting and analyzing memorization within fine-tuned Large Language Models	Jérémie Dentan et.al.	2409.18858	null
2024-09-27	Mitigating Selection Bias with Node Pruning and Auxiliary Options	Hyeong Kyu Choi et.al.	2409.18857	null
2024-09-27	LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis	Hamed Babaei Giglou et.al.	2409.18812	link
2024-09-27	Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs	Yanyuan Qiao et.al.	2409.18794	null
2024-09-27	A Survey on the Honesty of Large Language Models	Siheng Li et.al.	2409.18786	link
2024-09-27	Enhancing Explainability in Multimodal Large Language Models Using Ontological Context	Jihen Amara et.al.	2409.18753	null
2024-09-27	OpenObject-NAV: Open-Vocabulary Object-Oriented Navigation Based on Dynamic Carrier-Relationship Scene Graph	Yujie Tang et.al.	2409.18743	null
2024-09-27	Scalable Cross-Entropy Loss for Sequential Recommendations with Large Item Catalogs	Gleb Mezentsev et.al.	2409.18721	link
2024-09-27	Read Over the Lines: Attacking LLMs and Toxicity Detection Systems with ASCII Art to Mask Profanity	Sergey Berezin et.al.	2409.18708	link
2024-09-27	Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models	Yiming Chen et.al.	2409.18680	link
2024-09-26	EgoLM: Multi-Modal Language Model of Egocentric Motions	Fangzhou Hong et.al.	2409.18127	null
2024-09-26	Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction	Jing He et.al.	2409.18124	null
2024-09-26	Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography	Yuexi Du et.al.	2409.18119	link
2024-09-26	E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding	Ye Liu et.al.	2409.18111	link
2024-09-26	Open-World Evaluation for Retrieving Diverse Perspectives	Hung-Ting Chen et.al.	2409.18110	null
2024-09-26	MALPOLON: A Framework for Deep Species Distribution Modeling	Theo Larcher et.al.	2409.18102	link
2024-09-26	SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation	Xin Li et.al.	2409.18082	null
2024-09-26	Infer Human’s Intentions Before Following Natural Language Instructions	Yanming Wan et.al.	2409.18073	link
2024-09-26	Infering Alt-text For UI Icons With Large Language Models During App Development	Sabrina Haque et.al.	2409.18060	null
2024-09-26	DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving	Dingrui Wang et.al.	2409.18053	link
2024-09-26	EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions	Kai Chen et.al.	2409.18042	null
2024-09-26	Compositional Hardness of Code in Large Language Models – A Probabilistic Perspective	Yotam Wolf et.al.	2409.18028	null
2024-09-26	An Adversarial Perspective on Machine Unlearning for AI Safety	Jakub Łucki et.al.	2409.18025	link
2024-09-26	DARE: Diverse Visual Question Answering with Robustness Evaluation	Hannah Sterz et.al.	2409.18023	null
2024-09-26	Role-RL: Online Long-Context Processing with Role Reinforcement Learning for Distinct LLMs in Their Optimal Roles	Lewei He et.al.	2409.18014	null
2024-09-26	Control Industrial Automation System with Large Language Models	Yuchen Xia et.al.	2409.18009	link
2024-09-26	Multilingual Evaluation of Long Context Retrieval and Reasoning	Ameeta Agrawal et.al.	2409.18006	link
2024-09-26	Enhancing Tourism Recommender Systems for Sustainable City Trips Using Retrieval-Augmented Generation	Ashmi Banerjee et.al.	2409.18003	null
2024-09-26	Extracting Affect Aggregates from Longitudinal Social Media Data with Temporal Adapters for Large Language Models	Georg Ahnert et.al.	2409.17990	link
2024-09-26	LLM4Brain: Training a Large Language Model for Brain Video Understanding	Ruizhe Zheng et.al.	2409.17987	null
2024-09-25	Attention Prompting on Image for Large Vision-Language Models	Runpeng Yu et.al.	2409.17143	link
2024-09-25	FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression	Fazal Mittu et.al.	2409.17141	link
2024-09-25	Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents	Junting Lu et.al.	2409.17140	null
2024-09-25	Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset	Andrew Goldberg et.al.	2409.17126	null
2024-09-25	Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale	Fan Zhou et.al.	2409.17115	link
2024-09-25	Unveiling Ontological Commitment in Multi-Modal Foundation Models	Mert Keser et.al.	2409.17109	null
2024-09-25	Accumulator-Aware Post-Training Quantization	Ian Colbert et.al.	2409.17092	null
2024-09-25	Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning?	Bowen Zhao et.al.	2409.17080	link
2024-09-25	VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models	Yifei Liu et.al.	2409.17066	link
2024-09-25	Benchmarking Domain Generalization Algorithms in Computational Pathology	Neda Zamanitajeddin et.al.	2409.17063	link
2024-09-25	Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia	Azmul Asmar Irfan et.al.	2409.17054	null
2024-09-25	GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design	Phillip Mueller et.al.	2409.17045	null
2024-09-25	How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not	Francesco Verdini et.al.	2409.17044	null
2024-09-25	Counterfactual Token Generation in Large Language Models	Ivi Chatzi et.al.	2409.17027	link
2024-09-25	LLM-CARD: Towards a Description and Landscape of Large Language Models	Shengwei Tian et.al.	2409.17011	link
2024-09-25	Models Can and Should Embrace the Communicative Nature of Human-Generated Math	Sasha Boguraev et.al.	2409.17005	null
2024-09-26	INT-FlashAttention: Enabling Flash Attention for INT8 Quantization	Shimao Chen et.al.	2409.16997	link
2024-09-25	Harnessing Diversity for Important Data Selection in Pretraining Large Language Models	Chi Zhang et.al.	2409.16986	null
2024-09-25	AXCEL: Automated eXplainable Consistency Evaluation using LLMs	P Aditya Sreekar et.al.	2409.16984	null
2024-09-25	Decoding Large-Language Models: A Systematic Overview of Socio-Technical Impacts, Constraints, and Emerging Questions	Zeyneb N. Kaya et.al.	2409.16974	null
2024-09-24	Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation	Yong Xien Chng et.al.	2409.16278	null
2024-09-24	LLM Echo Chamber: personalized and automated disinformation	Tony Ma et.al.	2409.16241	link
2024-09-24	EuroLLM: Multilingual Language Models for Europe	Pedro Henrique Martins et.al.	2409.16235	null
2024-09-24	Fine-Tuning is Fine, if Calibrated	Zheda Mai et.al.	2409.16223	link
2024-09-24	Towards Enhancing Linked Data Retrieval in Conversational UIs using Large Language Models	Omar Mussa et.al.	2409.16220	link
2024-09-24	LLMCount: Enhancing Stationary mmWave Detection with Multimodal-LLM	Boyan Li et.al.	2409.16209	null
2024-09-25	CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data	Qian-Wen Zhang et.al.	2409.16202	link
2024-09-24	Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking	Jun Bai et.al.	2409.16198	link
2024-09-24	HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models	Haoran Que et.al.	2409.16191	link
2024-09-24	Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation	Xiaohong Liu et.al.	2409.16183	null
2024-09-24	SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image	Dimitrije Antić et.al.	2409.16178	null
2024-09-24	Cyber Knowledge Completion Using Large Language Models	Braden K Webb et.al.	2409.16176	null
2024-09-24	Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering	Ziyu Zhao et.al.	2409.16167	null
2024-09-24	EnIGMA: Enhanced Interactive Generative Model Agent for CTF Challenges	Talor Abramovich et.al.	2409.16165	link
2024-09-24	ComiCap: A VLMs pipeline for dense captioning of Comic Panels	Emanuele Vivoli et.al.	2409.16159	link
2024-09-24	Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework	Lu Chen et.al.	2409.16146	link
2024-09-24	Evaluation of state-of-the-art ASR Models in Child-Adult Interactions	Aditya Ashvin et.al.	2409.16135	null
2024-09-24	MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents	Ming Zhu et.al.	2409.16120	link
2024-09-25	Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration	Pin-Jui Ku et.al.	2409.16117	link
2024-09-24	Exploring Hint Generation Approaches in Open-Domain Question Answering	Jamshid Mozafari et.al.	2409.16096	link
2024-09-20	Gender Representation and Bias in Indian Civil Service Mock Interviews	Somonnoy Banerjee et.al.	2409.12194	null
2024-09-18	Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution	Peng Wang et.al.	2409.12191	link
2024-09-18	To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning	Zayne Sprague et.al.	2409.12183	link
2024-09-23	A Controlled Study on Long Context Extension and Generalization in LLMs	Yi Lu et.al.	2409.12181	link
2024-09-18	Finetuning Language Models to Emit Linguistic Expressions of Uncertainty	Arslan Chaudhry et.al.	2409.12180	null
2024-09-18	Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference	Najmeh Forouzandehmehr et.al.	2409.12150	null
2024-09-18	MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning	Justin Chih-Yao Chen et.al.	2409.12147	link
2024-09-18	MoRAG – Multi-Fusion Retrieval Augmented Generation for Human Motion	Kalakonda Sai Shashank et.al.	2409.12140	link
2024-09-24	Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models	Sijing Chen et.al.	2409.12139	null
2024-09-18	GRIN: GRadient-INformed MoE	Liyuan Liu et.al.	2409.12136	null
2024-09-18	Linguini: A benchmark for language-agnostic linguistic reasoning	Eduardo Sánchez et.al.	2409.12126	link
2024-09-18	Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement	An Yang et.al.	2409.12122	null
2024-09-18	Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference	Edresson Casanova et.al.	2409.12117	null
2024-09-18	Measuring Human and AI Values based on Generative Psychometrics with Large Language Models	Haoran Ye et.al.	2409.12106	link
2024-09-19	Skill matching at scale: freelancer-project alignment for efficient multilingual candidate retrieval	Warren Jouanneau et.al.	2409.12097	null
2024-09-19	The Impact of Element Ordering on LM Agent Performance	Wayne Chi et.al.	2409.12089	link
2024-09-18	Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking	Ningyuan Xi et.al.	2409.12059	null
2024-09-19	Using Large Language Models to Generate Clinical Trial Tables and Figures	Yumeng Yang et.al.	2409.12046	null
2024-09-18	All-in-one foundational models learning across quantum chemical levels	Yuxinxin Chen et.al.	2409.12015	link
2024-09-18	Mixture of Prompt Learning for Vision Language Models	Yu Du et.al.	2409.12011	null
2024-09-17	AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs	Basel Mousi et.al.	2409.11404	null
2024-09-17	NVLM: Open Frontier-Class Multimodal LLMs	Wenliang Dai et.al.	2409.11402	null
2024-09-17	Says Who? Effective Zero-Shot Annotation of Focalization	Rebecca M. M. Hicke et.al.	2409.11390	null
2024-09-17	Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement	Simon Yu et.al.	2409.11378	link
2024-09-17	Towards Time Series Reasoning with LLMs	Winnie Chow et.al.	2409.11376	null
2024-09-17	Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification	Fatema-E-Jannat et.al.	2409.11375	null
2024-09-17	Learning Spatially-Aware Language and Audio Embedding	Bhavika Devnani et.al.	2409.11369	null
2024-09-17	CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration	Jiahui Gao et.al.	2409.11365	null
2024-09-17	CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark	Zachary S. Siegel et.al.	2409.11363	link
2024-09-17	AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances	Dhruv Agarwal et.al.	2409.11360	null
2024-09-17	THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models	Mengfei Liang et.al.	2409.11353	link
2024-09-17	LPT++: Efficient Training on Mixture of Long-tailed Experts	Bowen Dong et.al.	2409.11323	null
2024-09-17	SOAP: Improving and Stabilizing Shampoo using Adam	Nikhil Vyas et.al.	2409.11321	link
2024-09-17	Beyond LoRA: Exploring Efficient Fine-Tuning Techniques for Time Series Foundational Models	Divij Gupta et.al.	2409.11302	null
2024-09-17	Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5	Marcel Lamott et.al.	2409.11282	null
2024-09-17	P-RAG: Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task	Weiye Xu et.al.	2409.11279	null
2024-09-17	Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments	Maria Rigaki et.al.	2409.11276	null
2024-09-17	Task Arithmetic for Language Expansion in Speech Translation	Yao-Fei Cheng et.al.	2409.11274	null
2024-09-17	LOLA – An Open-Source Massively Multilingual Large Language Model	Nikit Srivastava et.al.	2409.11272	link
2024-09-17	Bio-Inspired Mamba: Temporal Locality and Bioplausible Learning in Selective State Space Models	Jiahao Qin et.al.	2409.11263	null
2024-09-16	RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval	Di Liu et.al.	2409.10516	link
2024-09-16	Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models	Momoko Shiraishi et.al.	2409.10506	null
2024-09-16	DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction	John Wu et.al.	2409.10504	null
2024-09-16	Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles	Kulin Shah et.al.	2409.10502	link
2024-09-16	Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models	Shaznin Sultana et.al.	2409.10490	null
2024-09-16	Do Pre-trained Vision-Language Models Encode Object States?	Kaleb Newman et.al.	2409.10488	link
2024-09-16	XLM for Autonomous Driving Systems: A Comprehensive Review	Sonda Fourati et.al.	2409.10484	null
2024-09-16	Schrodinger’s Memory: Large Language Models	Wei Wang et.al.	2409.10482	null
2024-09-16	Towards Semantic Versioning of Open Pre-trained Language Model Releases on Hugging Face	Adekunle Ajibode et.al.	2409.10472	link
2024-09-16	LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning	Jicong Ao et.al.	2409.10444	link
2024-09-16	CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera	Jingpei Lu et.al.	2409.10441	null
2024-09-16	HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models	Vineet Bhat et.al.	2409.10419	link
2024-09-16	A Large-Scale Privacy Assessment of Android Third-Party SDKs	Mark Huasong Meng et.al.	2409.10411	null
2024-09-16	A Knowledge-Enhanced Disease Diagnosis Method Based on Prompt Learning and BERT Integration	Zhang Zheng et.al.	2409.10403	null
2024-09-17	Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot	Bhuvan Sachdeva et.al.	2409.10354	null
2024-09-16	Large Language Model Enhanced Hard Sample Identification for Denoising Recommendation	Tianrui Song et.al.	2409.10343	null
2024-09-16	The 20 questions game to distinguish large language models	Gurvan Richardeau et.al.	2409.10338	null
2024-09-16	MGSA: Multi-granularity Graph Structure Attention for Knowledge Graph-to-Text Generation	Shanshan Wang et.al.	2409.10294	null
2024-09-16	ReflectDiffu: Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework	Jiahao Yuan et.al.	2409.10289	link
2024-09-16	ComplexCodeEval: A Benchmark for Evaluating Large Code Models on More Complex Code	Jia Feng et.al.	2409.10280	link
2024-09-13	Agents in Software Engineering: Survey, Landscape, and Vision	Yanxian Huang et.al.	2409.09030	link
2024-09-13	Contri(e)ve: Context + Retrieve for Scholarly Question Answering	Kanchan Shivashankar et.al.	2409.09010	null
2024-09-13	Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance	Lucio La Cava et.al.	2409.08963	null
2024-09-13	Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions	Zahra Ashktorab et.al.	2409.08937	null
2024-09-13	SynSUM – Synthetic Benchmark with Structured and Unstructured Medical Records	Paloma Rabaey et.al.	2409.08936	link
2024-09-13	LLM-based Weak Supervision Framework for Query Intent Classification in Video Search	Farnoosh Javadi et.al.	2409.08931	null
2024-09-13	Affective Computing Has Changed: The Foundation Model Disruption	Björn Schuller et.al.	2409.08907	null
2024-09-13	AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models	Yifei Yao et.al.	2409.08904	link
2024-09-13	A Market for Lemons? Strategic Directions for a Vigilant Application of Artificial Intelligence in Entrepreneurship Research	Martin Obschonka et.al.	2409.08890	null
2024-09-13	Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark	Xuchen Li et.al.	2409.08887	null
2024-09-13	Exploring Graph Structure Comprehension Ability of Multimodal Large Language Models: Case Studies	Zhiqiang Zhong et.al.	2409.08864	null
2024-09-13	FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition	Zhenhua Xu et.al.	2409.08846	null
2024-09-13	AIPO: Improving Training Objective for Iterative Preference Optimization	Yaojie Shen et.al.	2409.08845	link
2024-09-13	A RAG Approach for Generating Competency Questions in Ontology Engineering	Xueli Pan et.al.	2409.08820	null
2024-09-13	Your Weak LLM is Secretly a Strong Teacher for Alignment	Leitian Tao et.al.	2409.08813	null
2024-09-13	Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task	Shao Zhang et.al.	2409.08811	null
2024-09-13	LLaQo: Towards a Query-Based Coach in Expressive Music Performance Assessment	Huan Zhang et.al.	2409.08795	link
2024-09-13	Optimizing Ingredient Substitution Using Large Language Models to Enhance Phytochemical Content in Recipes	Luis Rita et.al.	2409.08792	null
2024-09-13	Electrocardiogram Report Generation and Question Answering via Retrieval-Augmented Self-Supervised Modeling	Jialu Tang et.al.	2409.08788	null
2024-09-13	Uncertainty and Generalizability in Foundation Models for Earth Observation	Raul Ramos-Pollan et.al.	2409.08744	null
2024-09-12	Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale	Rogerio Bonatti et.al.	2409.08264	link
2024-09-12	OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering	Jiahao Nick Li et.al.	2409.08250	null
2024-09-12	Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources	Alisia Lupidi et.al.	2409.08239	null
2024-09-12	LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems	Hakan T. Otal et.al.	2409.08234	link
2024-09-12	Adaptive Language-Guided Abstraction from Contrastive Explanations	Andi Peng et.al.	2409.08212	null
2024-09-12	ComAlign: Compositional Alignment in Vision-Language Models	Ali Abdollah et.al.	2409.08206	null
2024-09-12	What Makes a Maze Look Like a Maze?	Joy Hsu et.al.	2409.08202	null
2024-09-12	AudioBERT: Audio Knowledge Augmented Language Model	Hyunjong Ok et.al.	2409.08199	link
2024-09-12	Fine-tuning Large Language Models for Entity Matching	Aaron Steiner et.al.	2409.08185	link
2024-09-12	On the Role of Context in Reading Time Prediction	Andreas Opedal et.al.	2409.08160	link
2024-09-12	Faster Speech-LLaMA Inference with Multi-token Prediction	Desh Raj et.al.	2409.08148	null
2024-09-12	LLM-POTUS Score: A Framework of Analyzing Presidential Debates with Large Language Models	Zhengliang Liu et.al.	2409.08147	null
2024-09-12	Towards a graph-based foundation model for network traffic analysis	Louis Van Langendonck et.al.	2409.08111	null
2024-09-12	The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language	Michael Ong et.al.	2409.08103	null
2024-09-12	The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal	Huiyuan Xie et.al.	2409.08098	null
2024-09-12	Securing Large Language Models: Addressing Bias, Misinformation, and Prompt Attacks	Benji Peng et.al.	2409.08087	null
2024-09-12	SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality	Chenyang Lei et.al.	2409.08083	link
2024-09-12	SoVAR: Building Generalizable Scenarios from Accident Reports for Autonomous Driving Testing	An Guo et.al.	2409.08081	link
2024-09-12	TravelAgent: An AI Assistant for Personalized Travel Planning	Aili Chen et.al.	2409.08069	null
2024-09-12	An Evaluation Framework for Attributed Information Retrieval using Large Language Models	Hanane Djeddal et.al.	2409.08014	link
2024-09-11	“My Grade is Wrong!”: A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays	Shengxin Hong et.al.	2409.07453	null
2024-09-11	StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos	Sijie Zhao et.al.	2409.07447	null
2024-09-11	SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories	Ben Bogin et.al.	2409.07440	link
2024-09-11	A Suite for Acoustic Language Model Evaluation	Gallil Maimon et.al.	2409.07437	link
2024-09-11	Synthetic continued pretraining	Zitong Yang et.al.	2409.07431	link
2024-09-11	Agent Workflow Memory	Zora Zhiruo Wang et.al.	2409.07429	link
2024-09-11	CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification	Zeqing Qin et.al.	2409.07407	null
2024-09-11	AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge	Han Wang et.al.	2409.07394	link
2024-09-11	Awaking the Slides: A Tuning-free and Knowledge-regulated AI Tutoring System via Language Model Coordination	Daniel Zhang-Li et.al.	2409.07372	null
2024-09-11	Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code	Khiem Ton et.al.	2409.07368	null
2024-09-11	Think Together and Work Better: Combining Humans’ and LLMs’ Think-Aloud Outcomes for Effective Text Evaluation	SeongYeub Chu et.al.	2409.07355	link
2024-09-11	Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks	Md Zarif Hossain et.al.	2409.07353	link
2024-09-11	Explanation, Debate, Align: A Weak-to-Strong Framework for Language Model Generalization	Mehrdad Zakershahrak et.al.	2409.07335	null
2024-09-11	Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering	Weixi Weng et.al.	2409.07331	null
2024-09-11	MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications	Praveen K Kanithi et.al.	2409.07314	null
2024-09-11	Exploring User-level Gradient Inversion with a Diffusion Prior	Zhuohang Li et.al.	2409.07291	null
2024-09-11	STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM	Qijiong Liu et.al.	2409.07276	null
2024-09-11	MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving	Enming Zhang et.al.	2409.07267	link
2024-09-11	Alignment of Diffusion Models: Fundamentals, Challenges, and Future	Buhua Liu et.al.	2409.07253	link
2024-09-11	PiTe: Pixel-Temporal Alignment for Large Video-Language Model	Yang Liu et.al.	2409.07239	link
2024-09-10	Benchmarking Sub-Genre Classification For Mainstage Dance Music	Hongzhi Shu et.al.	2409.06690	null
2024-09-10	E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning	Zihan Liao et.al.	2409.06679	null
2024-09-10	LLaMA-Omni: Seamless Speech Interaction with Large Language Models	Qingkai Fang et.al.	2409.06666	link
2024-09-10	Human Perception of LLM-generated Text Content in Social Media Environments	Kristina Radivojevic et.al.	2409.06653	null
2024-09-10	Optimal Workload Placement on Multi-Instance GPUs	Bekir Turkkan et.al.	2409.06646	null
2024-09-10	EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis	Danli Shi et.al.	2409.06644	link
2024-09-11	Segmenting sea ice floes in close-range optical imagery with active contour and foundation models	Giulio Passerotti et.al.	2409.06641	null
2024-09-10	TeXBLEU: Automatic Metric for Evaluate LaTeX Format	Kyudan Jung et.al.	2409.06639	link
2024-09-10	MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders	Wenyu Zhang et.al.	2409.06635	null
2024-09-10	A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio	Ningyuan Xi et.al.	2409.06624	null
2024-09-10	Exploring Italian sentence embeddings properties through multi-tasking	Vivi Nastase et.al.	2409.06622	link
2024-09-10	Alleviating Hallucinations in Large Language Models with Scepticism Modeling	Yetao Wu et.al.	2409.06601	null
2024-09-10	GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering	Sacha Muller et.al.	2409.06595	link
2024-09-10	Quantifying and Enabling the Interpretability of CLIP-like Models	Avinash Madasu et.al.	2409.06579	null
2024-09-10	Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement	Vivi Nastase et.al.	2409.06567	null
2024-09-10	MAPS: Energy-Reliability Tradeoff Management in Autonomous Vehicles Through LLMs Penetrated Science	Mahdieh Aliazam et.al.	2409.06558	null
2024-09-10	Questioning Internal Knowledge Structure of Large Language Models Through the Lens of the Olympic Games	Juhwan Choi et.al.	2409.06518	link
2024-09-10	Aligning Machine and Human Visual Representations across Abstraction Levels	Lukas Muttenthaler et.al.	2409.06509	null
2024-09-10	Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding	Xiaoyu Liang et.al.	2409.06485	null
2024-09-10	Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles	Qiujing Lu et.al.	2409.06450	null
2024-09-09	MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct	Run Luo et.al.	2409.05840	null
2024-09-09	Are Large Language Models a Threat to Programming Platforms? An Exploratory Study	Md Mustakim Billah et.al.	2409.05824	null
2024-09-09	VFA: Vision Frequency Analysis of Foundation Models and Human	Mohammad-Javad Darvishi-Bayazi et.al.	2409.05817	null
2024-09-09	Improving Pretraining Data Using Perplexity Correlations	Tristan Thrush et.al.	2409.05816	null
2024-09-09	Benchmarking Chinese Knowledge Rectification in Large Language Models	Tianhe Lu et.al.	2409.05806	link
2024-09-09	Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models	Emily Cheng et.al.	2409.05771	null
2024-09-09	Model Input Verification of Large Scale Simulations	Rumyana Neykova et.al.	2409.05768	null
2024-09-09	A Novel Idea Generation Tool using a Structured Conversational AI (CAI) System	B. Sankar et.al.	2409.05747	null
2024-09-09	LLMs Will Always Hallucinate, and We Need to Live With This	Sourav Banerjee et.al.	2409.05746	null
2024-09-09	A System and Benchmark for LLM-based Q&A on Heterogeneous Data	Achille Fokoue et.al.	2409.05735	null
2024-09-09	Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach	Meng Zhou et.al.	2409.05732	null
2024-09-09	The Influence of Task and Group Disparities over Users’ Attitudes Toward Using Large Language Models for Psychotherapy	Qihang He et.al.	2409.05703	null
2024-09-09	Segmentation by Factorization: Unsupervised Semantic Segmentation for Pathology by Factorizing Foundation Model Features	Jacob Gildenblat et.al.	2409.05697	null
2024-09-09	Zero-shot Outlier Detection via Prior-data Fitted Networks: Model Selection Bygone!	Yuchen Shen et.al.	2409.05672	null
2024-09-09	Revisiting English Winogender Schemas for Consistency, Coverage, and Grammatical Case	Vagrant Gautam et.al.	2409.05653	link
2024-09-10	MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery	Hongjin Qian et.al.	2409.05591	link
2024-09-09	Leveraging Content and Acoustic Representations for Efficient Speech Emotion Recognition	Soumya Dutta et.al.	2409.05566	link
2024-09-09	CauseJudger: Identifying the Cause with LLMs for Abductive Logical Reasoning	Jinwei He et.al.	2409.05559	null
2024-09-09	SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning	Alireza Ghafarollahi et.al.	2409.05556	link
2024-09-09	Harmonic Reasoning in Large Language Models	Anna Kruspe et.al.	2409.05521	null
2024-09-06	VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation	Yecheng Wu et.al.	2409.04429	link
2024-09-06	Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques	Davide Clode da Silva et.al.	2409.04424	null
2024-09-06	RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs	Jiaxing Wu et.al.	2409.04421	null
2024-09-06	Question-Answering Dense Video Events	Hangyu Qin et.al.	2409.04388	link
2024-09-06	Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs	Aliakbar Nafar et.al.	2409.04318	link
2024-09-06	An optically accelerated extreme learning machine using hot atomic vapors	Pierre Azam et.al.	2409.04312	null
2024-09-06	Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets	Desiree Heim et.al.	2409.04286	null
2024-09-06	Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models	Yuxiao Huang et.al.	2409.04270	null
2024-09-06	An overview of domain-specific foundation model: key technologies, applications and challenges	Haolong Chen et.al.	2409.04267	null
2024-09-06	UniDet3D: Multi-dataset Indoor 3D Object Detection	Maksim Kolodiazhnyi et.al.	2409.04234	link
2024-09-06	Fast Forwarding Low-Rank Training	Adir Rahamim et.al.	2409.04206	null
2024-09-06	Residual Stream Analysis with Multi-Layer SAEs	Tim Lawson et.al.	2409.04185	link
2024-09-06	GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding	Ziyin Zhang et.al.	2409.04183	null
2024-09-06	Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering	Larissa Pusch et.al.	2409.04181	null
2024-09-06	From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks	Andreas Stephan et.al.	2409.04168	null
2024-09-06	Can OpenSource beat ChatGPT? – A Comparative Study of Large Language Models for Text-to-Code Generation	Luis Mayer et.al.	2409.04164	null
2024-09-06	Prompt-based Personality Profiling: Reinforcement Learning for Relevance Filtering	Jan Hofmann et.al.	2409.04122	null
2024-09-06	Multi-Programming Language Ensemble for Code Generation in Large Language Model	Tengfei Xue et.al.	2409.04114	link
2024-09-06	Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers	Chenglei Si et.al.	2409.04109	link
2024-09-06	UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity	Yicheng Fu et.al.	2409.04081	null
2024-09-05	Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding	Yunze Man et.al.	2409.03757	link
2024-09-05	Foundation Model or Finetune? Evaluation of few-shot semantic segmentation for river pollution	Marga Don et.al.	2409.03754	link
2024-09-05	Attention Heads of Large Language Models: A Survey	Zifan Zheng et.al.	2409.03752	link
2024-09-05	LLM-CI: Assessing Contextual Integrity Norms in Language Models	Yan Shvartzshnaider et.al.	2409.03735	null
2024-09-05	Safety vs. Performance: How Multi-Objective Learning Reduces Barriers to Market Entry	Meena Jagadeesan et.al.	2409.03734	null
2024-09-05	Planning In Natural Language Improves LLM Search For Code Generation	Evan Wang et.al.	2409.03733	link
2024-09-06	RAG based Question-Answering for Contextual Response Prediction System	Sriram Veturi et.al.	2409.03708	null
2024-09-05	LAST: Language Model Aware Speech Tokenization	Arnon Turetzky et.al.	2409.03701	null
2024-09-05	TRACE-cs: Trustworthy Reasoning for Contrastive Explanations in Course Scheduling Problems	Stylianos Loukas Vasileiou et.al.	2409.03671	link
2024-09-05	A Fused Large Language Model for Predicting Startup Success	Abdurahman Maarouf et.al.	2409.03668	null
2024-09-05	The representation landscape of few-shot learning and fine-tuning in large language models	Diego Doimo et.al.	2409.03662	link
2024-09-06	LLM-based multi-agent poetry generation in non-cooperative environments	Ran Zhang et.al.	2409.03659	link
2024-09-05	On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization	Yong Lin et.al.	2409.03650	null
2024-09-05	Text-Guided Mixup Towards Long-Tailed Image Categorization	Richard Franklin et.al.	2409.03583	link
2024-09-05	FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation	Xi Chen et.al.	2409.03525	null
2024-09-05	Have Large Vision-Language Models Mastered Art History?	Ombretta Strafforello et.al.	2409.03521	null
2024-09-05	Tissue Concepts: supervised foundation models in computational pathology	Till Nicke et.al.	2409.03519	link
2024-09-05	From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents	Jifan Yu et.al.	2409.03512	null
2024-09-05	LLM-based event abstraction and integration for IoT-sourced logs	Mohsen Shirali et.al.	2409.03478	link
2024-09-05	How Much Data is Enough Data? Fine-Tuning Large Language Models for In-House Translation: Performance Evaluation Across Multiple Dataset Sizes	Inacio Vieira et.al.	2409.03454	null
2024-09-04	RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)	Yao Mu et.al.	2409.02920	null
2024-09-04	Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving	Yuhang Lu et.al.	2409.02914	null
2024-09-04	Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling	Kaiwen Zheng et.al.	2409.02908	null
2024-09-05	LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA	Jiajie Zhang et.al.	2409.02897	link
2024-09-04	LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture	Xidong Wang et.al.	2409.02889	link
2024-09-04	CanvOI, an Oncology Intelligence Foundation Model: Scaling FLOPS Differently	Jonathan Zalach et.al.	2409.02885	null
2024-09-04	Benchmarking Spurious Bias in Few-Shot Image Classifiers	Guangtao Zheng et.al.	2409.02882	link
2024-09-04	Configurable Foundation Models: Building LLMs from a Modular Perspective	Chaojun Xiao et.al.	2409.02877	null
2024-09-04	Historical German Text Normalization Using Type- and Token-Based Language Modeling	Anton Ehrmanntraut et.al.	2409.02841	null
2024-09-04	Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models	Moein Shahiki Tash et.al.	2409.02836	null
2024-09-04	CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models	Wentao Liu et.al.	2409.02834	link
2024-09-04	ExpLLM: Towards Chain of Thought for Facial Expression Recognition	Xing Lan et.al.	2409.02828	null
2024-09-04	Design Contradictions: Help or Hindrance?	Aron E. Owen et.al.	2409.02823	null
2024-09-04	Language Understanding as a Constraint on Consensus Size in LLM Societies	Giordano De Marzo et.al.	2409.02822	null
2024-09-04	Towards a Unified View of Preference Learning for Large Language Models: A Survey	Bofei Gao et.al.	2409.02795	link
2024-09-05	Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models?	Yixuan Tang et.al.	2409.02727	link
2024-09-04	Pre-training data selection for biomedical domain adaptation using journal impact metrics	Mathieu Laï-king et.al.	2409.02725	null
2024-09-04	Alignment-Aware Model Extraction Attacks on Large Language Models	Zi Liang et.al.	2409.02718	link
2024-09-04	Creating a Gen-AI based Track and Trace Assistant MVP (SuperTracy) for PostNL	Mohammad Reshadati et.al.	2409.02711	null
2024-09-04	LLM-Assisted Visual Analytics: Opportunities and Challenges	Maeve Hutchinson et.al.	2409.02691	null
2024-08-30	SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists	Raoyuan Zhao et.al.	2408.17437	link
2024-08-30	DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model	Mona Sheikh Zeinoddin et.al.	2408.17433	link
2024-08-30	Advancing Multi-talker ASR Performance with Large Language Models	Mohan Shi et.al.	2408.17431	null
2024-08-30	CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models	Jonathan Bourne et.al.	2408.17428	link
2024-09-03	Open-vocabulary Temporal Action Localization using VLMs	Naoki Wake et.al.	2408.17422	null
2024-08-30	Getting Inspiration for Feature Elicitation: App Store- vs. LLM-based Approach	Jialiang Wei et.al.	2408.17404	link
2024-08-30	EMPOWER: Embodied Multi-role Open-vocabulary Planning with Online Grounding and Execution	Francesco Argenziano et.al.	2408.17379	null
2024-08-30	NDP: Next Distribution Prediction as a More Broad Target	Junhao Ruan et.al.	2408.17377	null
2024-08-30	Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain	Francesca Grasso et.al.	2408.17362	link
2024-08-30	Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage	Md Rafi Ur Rashid et.al.	2408.17354	null
2024-09-02	LSMS: Language-guided Scale-aware MedSegmentor for Medical Image Referring Segmentation	Shuyi Ouyang et.al.	2408.17347	null
2024-08-30	Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering	Nicholas Pochinkov et.al.	2408.17322	link
2024-08-30	Bridging Domain Knowledge and Process Discovery Using Large Language Models	Ali Norouzifar et.al.	2408.17316	link
2024-08-30	Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts	Rhui Dih Lee et.al.	2408.17280	null
2024-08-30	Joint Estimation and Prediction of City-wide Delivery Demand: A Large Language Model Empowered Graph-based Learning Approach	Tong Nie et.al.	2408.17258	null
2024-08-30	VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters	Mouxiang Chen et.al.	2408.17253	link
2024-08-30	Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study	Shubham Agarwal et.al.	2408.17181	null
2024-08-30	Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model	Zhen Ye et.al.	2408.17175	link
2024-08-30	Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning	Xiaoye Qu et.al.	2408.17150	link
2024-08-30	Reasoning AI Performance Degradation in 6G Networks with Large Language Models	Liming Huang et.al.	2408.17097	null
2024-08-29	PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning	Noor Hussein et.al.	2408.16769	link
2024-08-29	How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models	Jiyue Jiang et.al.	2408.16756	link
2024-08-29	Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models	Alec Solway et.al.	2408.16753	null
2024-08-29	A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models	Yi-Lin Tuan et.al.	2408.16751	null
2024-08-29	Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge	Beidi Dong et.al.	2408.16749	null
2024-08-29	Theoretical and Methodological Framework for Studying Texts Produced by Large Language Models	Jiří Milička et.al.	2408.16740	null
2024-08-29	Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling	Hritik Bansal et.al.	2408.16737	null
2024-08-29	VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation	Shiwei Wu et.al.	2408.16730	null
2024-08-30	Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming	Zhifei Xie et.al.	2408.16725	link
2024-08-29	GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models	Moreno D’Incà et.al.	2408.16700	link
2024-08-29	Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity	Ziniu Li et.al.	2408.16673	null
2024-08-29	Space3D-Bench: Spatial 3D Question Answering Benchmark	Emilia Szymanska et.al.	2408.16662	null
2024-08-29	DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving	Yongjie Fu et.al.	2408.16647	null
2024-08-29	Examination of Code generated by Large Language Models	Robin Beer et.al.	2408.16601	link
2024-08-29	Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies	Zhiyang Qi et.al.	2408.16586	null
2024-08-29	WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling	Shengpeng Ji et.al.	2408.16532	link
2024-08-29	CNIMA: A Universal Evaluation Framework and Automated Approach for Assessing Second Language Dialogues	Rena Gao et.al.	2408.16518	link
2024-08-29	LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?	Jan Cegin et.al.	2408.16502	null
2024-08-29	CogVLM2: Visual Language Models for Image and Video Understanding	Wenyi Hong et.al.	2408.16500	link
2024-08-29	A Survey on Evaluating Large Language Models in Code Generation Tasks	Liguo Chen et.al.	2408.16498	null
2024-08-28	Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders	Min Shi et.al.	2408.15998	link
2024-08-29	Spatio-Temporal Context Prompting for Zero-Shot Action Detection	Wei-Jhe Huang et.al.	2408.15996	null
2024-08-28	Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration	Xu Zhang et.al.	2408.15994	null
2024-08-28	BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems	Wei Wang et.al.	2408.15971	null
2024-08-28	More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding	Yuan Tang et.al.	2408.15966	link
2024-08-28	Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games	Nicholas R. Waytowich et.al.	2408.15950	null
2024-08-28	DeMoBot: Deformable Mobile Manipulation with Vision-based Sub-goal Retrieval	Yuying Zhang et.al.	2408.15919	null
2024-08-28	Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models	Yuncheng Yang et.al.	2408.15915	link
2024-08-28	Decentralized LLM Inference over Edge Networks with Energy Harvesting	Aria Khoshsirat et.al.	2408.15907	null
2024-08-28	LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments	Ruirui Chen et.al.	2408.15903	null
2024-08-28	Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts	Nikolas Gritsch et.al.	2408.15901	null
2024-08-28	Bias in LLMs as Annotators: The Effect of Party Cues on Labelling Decision by Large Language Models	Sebastian Vallejo Vera et.al.	2408.15895	null
2024-08-28	LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation	Fangxun Shu et.al.	2408.15881	link
2024-08-28	Persuasion Games using Large Language Models	Ganesh Prasath Ramani et.al.	2408.15879	null
2024-08-28	Retrieval-Augmented Instruction Tuning for Automated Process Engineering Calculations: A Tool-Chaining Problem-Solving Framework with Attributable Reflection	Sagar Srinivas Sakhinana et.al.	2408.15866	null
2024-08-28	Benchmarking foundation models as feature extractors for weakly-supervised computational pathology	Peter Neidlinger et.al.	2408.15823	null
2024-08-28	Visual Prompt Engineering for Medical Vision Language Models in Radiology	Stefan Denner et.al.	2408.15802	null
2024-08-28	Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization	Léo Hemamou et.al.	2408.15801	null
2024-08-28	Evaluating Named Entity Recognition Using Few-Shot Prompting with Large Language Models	Hédi Zhegidi et.al.	2408.15796	link
2024-08-28	Efficient LLM Scheduling by Learning to Rank	Yichao Fu et.al.	2408.15792	link
2024-08-27	Generative Verifiers: Reward Modeling as Next-Token Prediction	Lunjun Zhang et.al.	2408.15240	null
2024-08-27	The Mamba in the Llama: Distilling and Accelerating Hybrid Models	Junxiong Wang et.al.	2408.15237	link
2024-08-27	Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations	Yucheng Jiang et.al.	2408.15232	null
2024-08-27	LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet	Nathaniel Li et.al.	2408.15221	null
2024-08-27	Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks	Shide Zhou et.al.	2408.15207	null
2024-08-27	Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation	Jian Hu et.al.	2408.15205	link
2024-08-27	Can Unconfident LLM Annotations Be Used for Confident Conclusions?	Kristina Gligorić et.al.	2408.15204	link
2024-08-27	Infusing Acoustic Pause Context into Text-Based Dementia Assessment	Franziska Braun et.al.	2408.15188	null
2024-08-27	Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement	Longshen Ou et.al.	2408.15176	null
2024-08-27	X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation	Hanjia Lyu et.al.	2408.15172	null
2024-08-27	Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation	N. E. Kriman et.al.	2408.15171	null
2024-08-27	How transformers learn structured data: insights from hierarchical filtering	Jerome Garnier-Brun et.al.	2408.15138	link
2024-08-27	CLIP-AGIQA: Boosting the Performance of AI-Generated Image Quality Assessment with CLIP	Zhenchen Tang et.al.	2408.15098	null
2024-08-27	Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models	Xiyu Liu et.al.	2408.15091	null
2024-08-27	BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline	Guosheng Dong et.al.	2408.15079	null
2024-08-27	Constraining Participation: Affordances of Feedback Features in Interfaces to Large Language Models	Ned Cooper et.al.	2408.15066	null
2024-08-27	The Benefits of Balance: From Information Projections to Variance Reduction	Lang Liu et.al.	2408.15065	null
2024-08-28	DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding	Wenhui Liao et.al.	2408.15045	link
2024-08-28	A Survey of Large Language Models for European Languages	Wazir Ali et.al.	2408.15040	null
2024-08-27	Speech Recognition Transformers: Topological-lingualism Perspective	Shruti Singh et.al.	2408.14991	null
2024-08-26	A Practitioner’s Guide to Continual Multimodal Pretraining	Karsten Roth et.al.	2408.14471	link
2024-08-27	Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models	Aradhye Agarwal et.al.	2408.14470	link
2024-08-26	Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos	Qirui Chen et.al.	2408.14469	null
2024-08-26	Explicit Inductive Inference using Large Language Models	Tianyang Liu et.al.	2408.14467	null
2024-08-26	Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study	Liuchang Xu et.al.	2408.14438	null
2024-08-26	Social perception of faces in a vision-language model	Carina I. Hausladen et.al.	2408.14435	link
2024-08-26	CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models	Shubham Bharti et.al.	2408.14419	null
2024-08-26	MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues	Kuluhan Binici et.al.	2408.14418	null
2024-08-26	Hyperdimensional Computing Empowered Federated Foundation Model over Wireless Networks for Metaverse	Yahao Ding et.al.	2408.14416	null
2024-08-26	Language-specific Calibration for Pruning Multilingual Language Models	Simon Kurz et.al.	2408.14398	null
2024-08-26	Reprogramming Foundational Large Language Models (LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning	Sakhinana Sagar Srinivas et.al.	2408.14387	null
2024-08-26	Probing Causality Manipulation of Large Language Models	Chenyang Zhang et.al.	2408.14380	link
2024-08-26	An Embedding is Worth a Thousand Noisy Labels	Francesco Di Salvo et.al.	2408.14358	link
2024-08-26	SWE-bench-java: A GitHub Issue Resolving Benchmark for Java	Daoguang Zan et.al.	2408.14354	link
2024-08-26	Assessing Contamination in Large Language Models: Introducing the LogProber method	Nicolas Yax et.al.	2408.14352	null
2024-08-26	Foundation Models for Music: A Survey	Yinghao Ma et.al.	2408.14340	link
2024-08-26	Claim Verification in the Age of Large Language Models: A Survey	Alphaeus Dmonte et.al.	2408.14317	null
2024-08-26	LLM-3D Print: Large Language Models To Monitor and Control 3D Printing	Yayati Jadhav et.al.	2408.14307	null
2024-08-26	Investigating the Effectiveness of Bayesian Spam Filters in Detecting LLM-modified Spam Mails	Malte Josten et.al.	2408.14293	link
2024-08-26	Predictability and Causality in Spanish and English Natural Language Generation	Andrea Busto-Castiñeira et.al.	2408.14283	null
2024-08-23	MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?	Yi-Fan Zhang et.al.	2408.13257	null
2024-08-23	Domain-specific long text classification from sparse relevant information	Célia D’Cruz et.al.	2408.13253	null
2024-08-23	Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption	Sakhinana Sagar Srinivas et.al.	2408.13248	null
2024-08-23	Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time	Yingyu Liang et.al.	2408.13233	null
2024-08-23	EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods	Hongcheng Ding et.al.	2408.13214	null
2024-08-23	DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation	Qiming Zhu et.al.	2408.13204	null
2024-08-23	Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning	Hourui Deng et.al.	2408.13184	null
2024-08-23	IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models	Zhihao Yu et.al.	2408.13073	link
2024-08-23	Guiding IoT-Based Healthcare Alert Systems with Large Language Models	Yulan Gao et.al.	2408.13071	null
2024-08-23	SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks	Kai-Wei Chang et.al.	2408.13040	null
2024-08-23	VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models	Wentao Wu et.al.	2408.13031	link
2024-08-23	In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting	Haowei Du et.al.	2408.13028	null
2024-08-23	A Web-Based Solution for Federated Learning with LLM-Based Automation	Chamith Mawela et.al.	2408.13010	null
2024-08-23	Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates	Hui Wei et.al.	2408.13006	link
2024-08-23	CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution	Ruiyang Xu et.al.	2408.13001	null
2024-08-23	Open Llama2 Model for the Lithuanian Language	Artūras Nakvosas et.al.	2408.12963	null
2024-08-23	Multimodal Contrastive In-Context Learning	Yosuke Miyanishi et.al.	2408.12959	null
2024-08-23	Image Segmentation in Foundation Model Era: A Survey	Tianfei Zhou et.al.	2408.12957	link
2024-08-23	E-code: Mastering Efficient Code Generation through Pretrained Models and Expert Encoder Group	Yue Pan et.al.	2408.12948	null
2024-08-23	Causal-Guided Active Learning for Debiasing Large Language Models	Zhouhao Sun et.al.	2408.12942	link
2024-08-22	Controllable Text Generation for Large Language Models: A Survey	Xun Liang et.al.	2408.12599	link
2024-08-23	Non-Homophilic Graph Pre-Training and Prompt Learning	Xingtong Yu et.al.	2408.12594	link
2024-08-22	RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment	Xiaohan Wang et.al.	2408.12579	null
2024-08-22	MuMA-ToM: Multi-modal Multi-Agent Theory of Mind	Haojun Shi et.al.	2408.12574	link
2024-08-22	Jamba-1.5: Hybrid Transformer-Mamba Models at Scale	Jamba Team et.al.	2408.12570	null
2024-08-22	ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation	Lujia Zhong et.al.	2408.12561	link
2024-08-22	Towards Evaluating and Building Versatile Large Language Models for Medicine	Chaoyi Wu et.al.	2408.12547	link
2024-08-22	Show-o: One Single Transformer to Unify Multimodal Understanding and Generation	Jinheng Xie et.al.	2408.12528	null
2024-08-22	MEDCO: Medical Education Copilots Based on A Multi-Agent Framework	Hao Wei et.al.	2408.12496	null
2024-08-22	GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models	Kunsheng Tang et.al.	2408.12494	link
2024-08-23	Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese	Khang T. Doan et.al.	2408.12480	null
2024-08-22	Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition	Bozheng Li et.al.	2408.12475	null
2024-08-22	DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems	Jiaju Chen et.al.	2408.12470	link
2024-08-22	Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning	Mushui Liu et.al.	2408.12469	null
2024-08-22	Enhancing Multi-hop Reasoning through Knowledge Erasure in Large Language Model Editing	Mengqi Zhang et.al.	2408.12456	null
2024-08-22	Positional Description for Numerical Normalization	Deepanshu Gupta et.al.	2408.12430	null
2024-08-22	FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing	Jue Wang et.al.	2408.12429	link
2024-08-22	Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification	Sudi Murindanyi et.al.	2408.12426	null
2024-08-22	Unlearning Trojans in Large Language Models: A Comparison Between Natural Language and Source Code	Mahdi Kazemi et.al.	2408.12416	null
2024-08-22	Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes	Sota Kato et.al.	2408.12406	link
2024-08-21	Great Memory, Shallow Reasoning: Limits of $k$NN-LMs	Shangyi Geng et.al.	2408.11815	link
2024-08-21	SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs	Yuanyang Yin et.al.	2408.11813	null
2024-08-21	EmbodiedSAM: Online Segment Any 3D Thing in Real Time	Xiuwei Xu et.al.	2408.11811	null
2024-08-21	Approaching Deep Learning through the Spectral Dynamics of Weights	David Yunis et.al.	2408.11804	link
2024-08-21	Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models	Yuzhou Huang et.al.	2408.11801	null
2024-08-21	PermitQA: A Benchmark for Retrieval Augmented Generation in Wind Siting and Permitting domain	Rounak Meyur et.al.	2408.11800	null
2024-08-21	Practical token pruning for foundation models in few-shot conversational virtual assistant systems	Haode Qi et.al.	2408.11799	null
2024-08-21	EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model	Feipeng Ma et.al.	2408.11795	null
2024-08-21	Leveraging Chemistry Foundation Models to Facilitate Structure Focused Retrieval Augmented Generation in Multi-Agent Workflows for Catalyst and Materials Design	Nathaniel H. Park et.al.	2408.11793	null
2024-08-21	Critique-out-Loud Reward Models	Zachary Ankner et.al.	2408.11791	link
2024-08-21	DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework	Zhifei Xie et.al.	2408.11788	null
2024-08-21	Personality Alignment of Large Language Models	Minjun Zhu et.al.	2408.11779	link
2024-08-21	Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards	Omar Erak et.al.	2408.11775	link
2024-08-21	Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks	Yiyi Chen et.al.	2408.11749	link
2024-08-21	DH-Bench: Probing Depth and Height Perception of Large Visual-Language Models	Shehreen Azad et.al.	2408.11748	link
2024-08-21	Open-Ended 3D Point Cloud Instance Segmentation	Phuc D. A. Nguyen et.al.	2408.11747	null
2024-08-21	Mixed Sparsity Training: Achieving 4$\times$ FLOP Reduction for Transformer Pretraining	Pihe Hu et.al.	2408.11746	null
2024-08-21	FocusLLM: Scaling LLM’s Context by Parallel Decoding	Zhenyu Li et.al.	2408.11745	link
2024-08-21	MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models	Elias Frantar et.al.	2408.11743	link
2024-08-21	CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering	Yuliang Cai et.al.	2408.11742	link
2024-08-20	Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement	Satoshi Kosugi et.al.	2408.11055	link
2024-08-20	Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks	Nathaniel Pinckney et.al.	2408.11053	link
2024-08-20	FLAME: Learning to Navigate with Multimodal LLM in Urban Environments	Yunzhe Xu et.al.	2408.11051	link
2024-08-20	MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding	Jian Chen et.al.	2408.11049	link
2024-08-20	Inside the Black Box: Detecting Data Leakage in Pre-trained Language Encoders	Yuan Xin et.al.	2408.11046	null
2024-08-20	Reconciling Methodological Paradigms: Employing Large Language Models as Novice Qualitative Research Assistants in Talent Management Research	Sreyoshi Bhaduri et.al.	2408.11043	null
2024-08-20	Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model	Chunting Zhou et.al.	2408.11039	null
2024-08-20	Scaling Law with Learning Rate Annealing	Howe Tissue et.al.	2408.11029	null
2024-08-20	Athena: Safe Autonomous Agents with Verbal Contrastive Learning	Tanmana Sadhu et.al.	2408.11021	null
2024-08-20	While GitHub Copilot Excels at Coding, Does It Ensure Responsible Output?	Wen Cheng et.al.	2408.11006	link
2024-08-20	SenPa-MAE: Sensor Parameter Aware Masked Autoencoder for Multi-Satellite Self-Supervised Pretraining	Jonathan Prexl et.al.	2408.11000	link
2024-08-20	CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models	Michael Reinisch et.al.	2408.10995	null
2024-08-20	Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models	Yuyan Chen et.al.	2408.10947	null
2024-08-20	Large Language Model Driven Recommendation	Anton Korikov et.al.	2408.10946	null
2024-08-20	HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments	Kazi Hasan Ibn Arif et.al.	2408.10945	link
2024-08-20	SysBench: Can Large Language Models Follow System Messages?	Yanzhao Qin et.al.	2408.10943	link
2024-08-20	Proxona: Leveraging LLM-Driven Personas to Enhance Creators’ Understanding of Their Audience	Yoonseo Choi et.al.	2408.10937	null
2024-08-20	LBC: Language-Based-Classifier for Out-Of-Variable Generalization	Kangjun Noh et.al.	2408.10923	link
2024-08-21	BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model	Yeyong Yu et.al.	2408.10903	link
2024-08-20	Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs	John Mendonça et.al.	2408.10902	link
2024-08-19	SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP	Yusuke Hirota et.al.	2408.10202	null
2024-08-19	Demystifying the Communication Characteristics for Distributed Transformer Models	Quentin Anthony et.al.	2408.10197	null
2024-08-19	Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models	Aviv Bick et.al.	2408.10189	null
2024-08-19	LongVILA: Scaling Long-Context Visual Language Models for Long Videos	Fuzhao Xue et.al.	2408.10188	link
2024-08-19	SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models	Anke Tang et.al.	2408.10174	link
2024-08-19	Customizing Language Models with Instance-wise LoRA for Sequential Recommendation	Xiaoyu Kong et.al.	2408.10159	link
2024-08-19	Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models	Amey Hengle et.al.	2408.10151	link
2024-08-19	In-Context Learning with Representations: Contextual Generalization of Trained Transformers	Tong Yang et.al.	2408.10147	null
2024-08-19	Instruction Finetuning for Leaderboard Generation from Empirical AI Research	Salomon Kabongo et.al.	2408.10141	null
2024-08-19	Rhyme-aware Chinese lyric generator based on GPT	Yixiao Yuan et.al.	2408.10130	null
2024-08-19	Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track	Feiyu Pan et.al.	2408.10125	null
2024-08-19	Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models	Tianyu Zhang et.al.	2408.10124	link
2024-08-19	Geometry Informed Tokenization of Molecules for Language Model Generation	Xiner Li et.al.	2408.10120	null
2024-08-19	GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization	Ran Liu et.al.	2408.10115	link
2024-08-20	PLUTUS: A Well Pre-trained Large Unified Transformer can Unveil Financial Time Series Regularities	Yuanjian Xu et.al.	2408.10111	null
2024-08-19	ARMADA: Attribute-Based Multimodal Data Augmentation	Xiaomeng Jin et.al.	2408.10086	null
2024-08-19	Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning	Sriyash Poddar et.al.	2408.10075	null
2024-08-19	FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant	Zhengchao Huang et.al.	2408.10072	link
2024-08-19	Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory	Haoran Li et.al.	2408.10053	null
2024-08-19	Defense Priorities in the Open-Source AI Debate: A Preliminary Assessment	Masao Dahlgren et.al.	2408.10026	null
2024-08-16	SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation	Xinyu Xiong et.al.	2408.08870	link
2024-08-16	PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars	Sumanth Prabhu et.al.	2408.08869	null
2024-08-16	A Hassle-free Algorithm for Private Learning in Practice: Don’t Use Tree Aggregation, Use BLTs	H. Brendan McMahan et.al.	2408.08868	null
2024-08-16	Visual Agents as Fast and Slow Thinkers	Guangyan Sun et.al.	2408.08862	link
2024-08-16	DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models	Eman Ali et.al.	2408.08855	link
2024-08-16	GeoTransformer: Enhancing Urban Forecasting with Geospatial Attention Mechanisms	Yuhao Jia et.al.	2408.08852	null
2024-08-16	ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis	Yubao Zhao et.al.	2408.08849	link
2024-08-16	PsychoLex: Unveiling the Psychological Mind of Large Language Models	Mohammad Amin Abbasi et.al.	2408.08848	null
2024-08-16	FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats	Xuanliang Zhang et.al.	2408.08841	link
2024-08-16	EasyRec: Simple yet Effective Language Models for Recommendation	Xubin Ren et.al.	2408.08821	link
2024-08-16	Retrieval-augmented Few-shot Medical Image Segmentation with Foundation Models	Lin Zhao et.al.	2408.08813	null
2024-08-16	Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors	Felipe A. Csaszar et.al.	2408.08811	null
2024-08-16	Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge	Ravi Raju et.al.	2408.08808	null
2024-08-16	CIKMar: A Dual-Encoder Approach to Prompt-Based Reranking in Educational Dialogue Systems	Joanito Agili Lopo et.al.	2408.08805	null
2024-08-16	A Disease-Specific Foundation Model Using Over 100K Fundus Images: Release and Validation for Abnormality and Multi-Disease Classification on Downstream Tasks	Boa Jang et.al.	2408.08790	link
2024-08-16	EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics	Chenwei Wan et.al.	2408.08782	link
2024-08-16	Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions	Chenming Tang et.al.	2408.08780	null
2024-08-16	DAC: Decomposed Automation Correction for Text-to-SQL	Dingzirui Wang et.al.	2408.08779	link
2024-08-16	Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused	Dingwei Chen et.al.	2408.08769	null
2024-08-16	Rethinking Generative Semantic Communication for Multi-User Systems with Multi-Modal LLM	Wanting Yang et.al.	2408.08765	null
2024-08-15	Can Large Language Models Understand Symbolic Graphics Programs?	Zeju Qiu et.al.	2408.08313	null
2024-08-15	ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws	Ruihang Li et.al.	2408.08310	null
2024-08-15	Towards Flexible Visual Relationship Segmentation	Fangrui Zhu et.al.	2408.08305	null
2024-08-15	Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors	Usman Syed et.al.	2408.08302	null
2024-08-15	VLPG-Nav: Object Navigation Using Visual Language Pose Graph and Object Localization Probability Maps	Senthil Hariharan Arul et.al.	2408.08301	null
2024-08-15	HELP: Hierarchical Embeddings-based Log Parsing	Andy Xu et.al.	2408.08300	null
2024-08-15	The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community	Shachar Don-Yehiya et.al.	2408.08291	null
2024-08-15	Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model	Jin Wang et.al.	2408.08282	null
2024-08-15	BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts	Qizhen Zhang et.al.	2408.08274	null
2024-08-15	DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System	Xihong Yang et.al.	2408.08231	null
2024-08-15	RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science	David Farr et.al.	2408.08217	null
2024-08-15	Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models	Javier González et.al.	2408.08210	null
2024-08-15	LLM4DSR: Leveraging Large Language Model for Denoising Sequential Recommendation	Bohao Wang et.al.	2408.08208	null
2024-08-15	Heavy Labels Out! Dataset Distillation with Label Space Lightening	Ruonan Yu et.al.	2408.08201	null
2024-08-15	Scaling Up Natural Language Understanding for Multi-Robots Through the Lens of Hierarchy	Shaojun Xu et.al.	2408.08188	null
2024-08-15	General-purpose Clothes Manipulation with Semantic Keypoints	Yuhong Deng et.al.	2408.08160	null
2024-08-15	EmBARDiment: an Embodied AI Agent for Productivity in XR	Riccardo Bovo et.al.	2408.08158	null
2024-08-15	DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search	Huajian Xin et.al.	2408.08152	link
2024-08-15	P/D-Serve: Serving Disaggregated Large Language Model at Scale	Yibo Jin et.al.	2408.08147	null
2024-08-15	KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning	Kaiqi Zhang et.al.	2408.08146	null
2024-08-14	The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models	Karime Maamari et.al.	2408.07702	null
2024-08-15	Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities	Enneng Yang et.al.	2408.07666	link
2024-08-14	Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models	Yi-Cheng Lin et.al.	2408.07665	link
2024-08-14	Alignment-Enhanced Decoding: Defending via Token-Level Adaptive Refining of Probability Distributions	Quan Liu et.al.	2408.07663	link
2024-08-14	WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs	Weijian Xie et.al.	2408.07611	null
2024-08-14	Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey	Hamza Kheddar et.al.	2408.07583	null
2024-08-15	MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark	Minxuan Zhou et.al.	2408.07543	link
2024-08-15	Usefulness of data flow diagrams and large language models for security threat validation: a registered report	Winnie Bahati Mbaka et.al.	2408.07537	null
2024-08-14	Development of a Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments	Seungjun Han et.al.	2408.07531	null
2024-08-14	Large Language Models Know What Makes Exemplary Contexts	Quanyu Long et.al.	2408.07505	null
2024-08-14	Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach	Shizhou Zhang et.al.	2408.07500	link
2024-08-14	QirK: Question Answering via Intermediate Representation on Knowledge Graphs	Jan Luca Scheerer et.al.	2408.07494	null
2024-08-14	Training Overhead Ratio: A Practical Reliability Metric for Large Language Model Training Systems	Ning Lu et.al.	2408.07482	null
2024-08-14	Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization	Yuxin Jiang et.al.	2408.07471	link
2024-08-14	Domain-invariant Representation Learning via Segment Anything Model for Blood Cell Classification	Yongcheng Li et.al.	2408.07467	link
2024-08-14	Large Language Models Prompting With Episodic Memory	Dai Do et.al.	2408.07465	null
2024-08-14	From Brazilian Portuguese to European Portuguese	João Sanches et.al.	2408.07457	null
2024-08-14	Fact or Fiction? Improving Fact Verification with Knowledge Graphs through Simplified Subgraph Retrievals	Tobias A. Opsahl et.al.	2408.07453	link
2024-08-15	BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning	Asif Hanif et.al.	2408.07440	link
2024-08-14	Beyond Inter-Item Relations: Dynamic Adaptive Mixture-of-Experts for LLM-Based Sequential Recommendation	CanYi Liu et.al.	2408.07427	null
2024-08-13	Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents	Kexun Zhang et.al.	2408.07060	null
2024-08-13	LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs	Yushi Bai et.al.	2408.07055	link
2024-08-13	Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models	Chun Jie Chong et.al.	2408.07004	null
2024-08-13	LLMs can Schedule	Henrik Abgaryan et.al.	2408.06993	link
2024-08-13	DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs	Dongyuan Li et.al.	2408.06966	null
2024-08-13	Towards Holistic Disease Risk Prediction using Small Language Models	Liv Björkdahl et.al.	2408.06943	null
2024-08-13	OpenResearcher: Unleashing AI for Accelerated Scientific Research	Yuxiang Zheng et.al.	2408.06941	link
2024-08-13	The advantages of context specific language models: the case of the Erasmian Language Model	João Gonçalves et.al.	2408.06931	link
2024-08-13	Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas	Louis Kwok et.al.	2408.06929	link
2024-08-13	SceneGPT: A Language Model for 3D Scene Understanding	Shivam Chandhok et.al.	2408.06926	null
2024-08-13	Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives	Zhihu Wang et.al.	2408.06904	null
2024-08-13	Leveraging Language Models for Emotion and Behavior Analysis in Education	Kaito Tanaka et.al.	2408.06874	null
2024-08-13	LoRA$^2$: Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models	Jia-Chen Zhang et.al.	2408.06854	null
2024-08-13	Causal Agent based on Large Language Model	Kairong Han et.al.	2408.06849	link
2024-08-13	DracoGPT: Extracting Visualization Design Preferences from Large Language Models	Huichen Will Wang et.al.	2408.06845	null
2024-08-13	How Aligned are Human Chart Takeaways and LLM Predictions? A Case Study on Bar Charts with Varying Layouts	Huichen Will Wang et.al.	2408.06837	null
2024-08-13	Efficient Search for Customized Activation Functions with Gradient Descent	Lukas Strack et.al.	2408.06820	link
2024-08-13	MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty	Yongjin Yang et.al.	2408.06816	link
2024-08-13	HLSPilot: LLM-based High-Level Synthesis	Chenwei Xiong et.al.	2408.06810	link
2024-08-13	Layerwise Recurrent Router for Mixture-of-Experts	Zihan Qiu et.al.	2408.06793	link
2024-08-12	FastFiD: Improve Inference Efficiency of Open Domain Question Answering via Sentence Selection	Yufei Huang et.al.	2408.06333	link
2024-08-12	Animate, or Inanimate, That is the Question for Large Language Models	Leonardo Ranaldi et.al.	2408.06332	null
2024-08-12	Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let’s Take TravelPlanner as an Example	Yanan Chen et.al.	2408.06318	null
2024-08-12	Long-Form Answers to Visual Questions from Blind and Low Vision People	Mina Huh et.al.	2408.06303	null
2024-08-12	The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery	Chris Lu et.al.	2408.06292	link
2024-08-12	MovieSum: An Abstractive Summarization Dataset for Movie Screenplays	Rohit Saxena et.al.	2408.06281	link
2024-08-13	Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation	Jieyong Kim et.al.	2408.06276	link
2024-08-12	FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data	Haoran Sun et.al.	2408.06273	link
2024-08-12	A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution	Sampath Rajapaksha et.al.	2408.06272	null
2024-08-12	Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment	Karel D’Oosterlinck et.al.	2408.06266	link
2024-08-12	Context-aware Visual Storytelling with Visual Prefix Tuning and Contrastive Learning	Yingjin Song et.al.	2408.06259	null
2024-08-12	On Effects of Steering Latent Representation for Large Language Model Unlearning	Dang Huu-Tien et.al.	2408.06223	link
2024-08-12	Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers	Zhenting Qi et.al.	2408.06195	link
2024-08-12	FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework	Lukas Meyer et.al.	2408.06190	link
2024-08-12	Improving Structural Diversity of Blackbox LLMs via Chain-of-Specification Prompting	Halley Young et.al.	2408.06186	null
2024-08-12	OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning	Mushui Liu et.al.	2408.06158	link
2024-08-12	LipidBERT: A Lipid Language Model Pre-trained on METiS de novo Lipid Library	Tianhao Yu et.al.	2408.06150	null
2024-08-12	Self-Supervised Learning on MeerKAT Wide-Field Continuum Images	Erica Lastufka et.al.	2408.06147	link
2024-08-12	Med42-v2: A Suite of Clinical LLMs	Clément Christophe et.al.	2408.06142	null
2024-08-12	Utilize Transformers for translating Wikipedia category names	Hoang-Thang Ta et.al.	2408.06124	null
2024-08-10	Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions	Michele Miranda et.al.	2408.05212	link
2024-08-09	VITA: Towards Open-Source Interactive Omni Multimodal LLM	Chaoyou Fu et.al.	2408.05211	link
2024-08-09	Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners	Michael Vaccaro Jr et.al.	2408.05204	null
2024-08-09	TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning	Yujie Feng et.al.	2408.05200	link
2024-08-09	ECG-FM: An Open Electrocardiogram Foundation Model	Kaden McKeen et.al.	2408.05178	link
2024-08-09	Weak-Annotation of HAR Datasets using Vision Foundation Models	Marius Bock et.al.	2408.05169	link
2024-08-09	AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset	Pritam Deka et.al.	2408.05149	null
2024-08-09	A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning	Ye Yuan et.al.	2408.05141	null
2024-08-09	Is ChatGPT a Good Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations	Jasmine Latendresse et.al.	2408.05128	null
2024-08-09	Large Language Models and Thematic Analysis: Human-AI Synergy in Researching Hate Speech on Social Media	Petre Breazu et.al.	2408.05126	null
2024-08-09	Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video	Chunggi Lee et.al.	2408.05123	null
2024-08-09	A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?	Xinyu Liu et.al.	2408.05109	link
2024-08-09	Depth Helps: Improving Pre-trained RGB-based Policy with Depth Information Injection	Xincheng Pang et.al.	2408.05107	null
2024-08-09	How Well Do LLMs Identify Cultural Unity in Diversity?	Jialin Li et.al.	2408.05102	link
2024-08-09	Hyperbolic Learning with Multimodal Large Language Models	Paolo Mandica et.al.	2408.05097	null
2024-08-09	Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts	Tingchen Fu et.al.	2408.05094	null
2024-08-09	Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models	Zikai Xie et.al.	2408.05093	link
2024-08-09	Generating novel experimental hypotheses from language models: A case study on cross-dative generalization	Kanishka Misra et.al.	2408.05086	link
2024-08-09	RT-Surv: Improving Mortality Prediction After Radiotherapy with Large Language Model Structuring of Large-Scale Unstructured Electronic Health Records	Sangjoon Park et.al.	2408.05074	null
2024-08-09	Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil	Marcelo Sartori Locatelli et.al.	2408.05035	null
2024-08-08	Better Alignment with Instruction Back-and-Forth Translation	Thao Nguyen et.al.	2408.04614	null
2024-08-08	Code-switching in text and speech reveals information-theoretic audience design	Debasmita Bhattacharya et.al.	2408.04596	null
2024-08-09	Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models	Qirui Jiao et.al.	2408.04594	link
2024-08-08	Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness	Xiaojing Fan et.al.	2408.04585	null
2024-08-08	SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More	Tianrun Chen et.al.	2408.04579	null
2024-08-08	SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals	Haoran Zheng et.al.	2408.04575	null
2024-08-08	Learning Fine-Grained Grounded Citations for Attributed Large Language Models	Lei Huang et.al.	2408.04568	link
2024-08-08	Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models	Yupeng Chang et.al.	2408.04556	link
2024-08-08	Depth Any Canopy: Leveraging Depth Foundation Models for Canopy Height Estimation	Daniele Rege Cambrin et.al.	2408.04523	link
2024-08-08	Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models	Fabio Pernisi et.al.	2408.04522	null
2024-08-08	What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant	Jonan Richards et.al.	2408.04477	null
2024-08-08	Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate	Yiqun Zhang et.al.	2408.04472	link
2024-08-08	RiskAwareBench: Towards Evaluating Physical Risk Awareness for High-level Planning of LLM-based Embodied Agents	Zihao Zhu et.al.	2408.04449	link
2024-08-08	Large Language Models for cross-language code clone detection	Micheline Bénédicte Moumoula et.al.	2408.04430	link
2024-08-08	Recognizing Emotion Regulation Strategies from Human Behavior with Large Language Models	Philipp Müller et.al.	2408.04420	null
2024-08-08	Enhancing Robustness of Retrieval-Augmented Language Models with In-Context Learning	Seong-Il Park et.al.	2408.04414	null
2024-08-08	Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers	Moritz Scherer et.al.	2408.04413	null
2024-08-08	Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset	Kentaro Ozeki et.al.	2408.04403	link
2024-08-08	Automated Educational Question Generation at Different Bloom’s Skill Levels using Large Language Models: Strategies and Evaluation	Nicy Scaria et.al.	2408.04394	link
2024-08-08	Open-domain Implicit Format Control for Large Language Model Generation	Yiqun Yao et.al.	2408.04392	link
2024-08-07	How Well Can Vision Language Models See Image Details?	Chenhui Gou et.al.	2408.03940	null
2024-08-07	SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature	Vinícius Di Oliveira et.al.	2408.03936	null
2024-08-07	CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases	Xiangyan Liu et.al.	2408.03910	link
2024-08-07	Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models	Shachi H Kumar et.al.	2408.03907	null
2024-08-07	Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond	Beomseok Lee et.al.	2408.03900	link
2024-08-07	Simplifying Scholarly Abstracts for Accessible Digital Libraries	Haining Wang et.al.	2408.03899	link
2024-08-07	From Data to Story: Towards Automatic Animated Data Video Creation with LLM-based Multi-Agent Systems	Leixian Shen et.al.	2408.03876	null
2024-08-07	PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training	Haoran Xu et.al.	2408.03865	null
2024-08-07	GAIA – A Large Language Model for Advanced Power Dispatch	Yuheng Cheng et.al.	2408.03847	null
2024-08-07	MaxMind: A Memory Loop Network to Enhance Software Productivity based on Large Language Models	Yuchen Dong et.al.	2408.03841	null
2024-08-07	WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models	Prannaya Gupta et.al.	2408.03837	link
2024-08-07	Target Prompting for Information Extraction with Vision Language Model	Dipankar Medhi et.al.	2408.03834	null
2024-08-07	Leveraging Variation Theory in Counterfactual Data Augmentation for Optimized Active Learning	Simret Araya Gebreegziabher et.al.	2408.03819	null
2024-08-07	Generative Language Models with Retrieval Augmented Generation for Automated Short Answer Scoring	Zifan Wang et.al.	2408.03811	null
2024-08-07	‘Finance Wizard’ at the FinLLM Challenge Task: Financial Text Summarization	Meisin Lee et.al.	2408.03762	null
2024-08-07	MMSummary: Multimodal Summary Generation for Fetal Ultrasound Video	Xiaoqing Guo et.al.	2408.03761	null
2024-08-07	Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation	Jingjing Xie et.al.	2408.03735	link
2024-08-07	Question Rephrasing for Quantifying Uncertainty in Large Language Models: Applications in Molecular Chemistry Tasks	Zizhang Chen et.al.	2408.03732	null
2024-08-07	A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models	Pengxiang Zhao et.al.	2408.03728	null
2024-08-07	Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction	Benjamin Matthias Ruppik et.al.	2408.03706	null
2024-08-06	CoverBench: A Challenging Benchmark for Complex Claim Verification	Alon Jacovi et.al.	2408.03325	null
2024-08-06	Segment Anything in Medical Images and Videos: Benchmark and Deployment	Jun Ma et.al.	2408.03322	link
2024-08-06	TextIM: Part-aware Interactive Motion Synthesis from Text	Siyuan Fan et.al.	2408.03302	null
2024-08-06	KaPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models	Ruizhe Zhang et.al.	2408.03297	null
2024-08-06	Biomedical SAM 2: Segment Anything in Biomedical Images and Videos	Zhiling Yan et.al.	2408.03286	link
2024-08-07	StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation	Boxi Cao et.al.	2408.03281	link
2024-08-06	Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments	Angie Boggust et.al.	2408.03274	null
2024-08-06	Synthesizing Text-to-SQL Data from Weak and Strong LLMs	Jiaxi Yang et.al.	2408.03256	null
2024-08-06	Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons	Yifei Wang et.al.	2408.03247	link
2024-08-06	Making Long-Context Language Models Better Multi-Hop Reasoners	Yanyang Li et.al.	2408.03246	link
2024-08-06	Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi	Pranita Deshmukh et.al.	2408.03172	null
2024-08-06	Conditioning LLMs with Emotion in Neural Machine Translation	Charles Brazier et.al.	2408.03150	null
2024-08-06	Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization	Yanghai Zhang et.al.	2408.03149	link
2024-08-06	Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations	Leo Donisch et.al.	2408.03130	null
2024-08-06	Lisbon Computational Linguists at SemEval-2024 Task 2: Using A Mistral 7B Model and Data Augmentation	Artur Guimarães et.al.	2408.03127	link
2024-08-06	Evaluating the Translation Performance of Large Language Models Based on Euas-20	Yan Huang et.al.	2408.03119	null
2024-08-06	Topic Modeling with Fine-tuning LLMs and Bag of Sentences	Johannes Schneider et.al.	2408.03099	link
2024-08-07	TestART: Improving LLM-based Unit Test via Co-evolution of Automated Generation and Repair Iteration	Siqi Gu et.al.	2408.03095	null
2024-08-06	500xCompressor: Generalized Prompt Compression for Large Language Models	Zongqian Li et.al.	2408.03094	link
2024-08-06	Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement	Le Yu et.al.	2408.03092	link
2024-08-05	Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining	Dongyang Liu et.al.	2408.02657	link
2024-08-05	Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?	Mohammad Bahrami Karkevandi et.al.	2408.02651	null
2024-08-05	Command-line Obfuscation Detection using Small Language Models	Vojtech Outrata et.al.	2408.02637	null
2024-08-05	SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models	Muxi Diao et.al.	2408.02632	null
2024-08-05	Language Model Can Listen While Speaking	Ziyang Ma et.al.	2408.02622	null
2024-08-05	Progressively Selective Label Enhancement for Language Model Alignment	Biao Liu et.al.	2408.02599	null
2024-08-05	Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection	Sajal Aggarwal et.al.	2408.02595	null
2024-08-05	Leveraging the Power of LLMs: A Fine-Tuning Approach for High-Quality Aspect-Based Summarization	Ankan Mullick et.al.	2408.02584	null
2024-08-05	DanModCap: Designing a Danmaku Moderation Tool for Video-Sharing Platforms that Leverages Impact Captions	Siying Hu et.al.	2408.02574	null
2024-08-05	Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information	Yauwai Yim et.al.	2408.02559	null
2024-08-05	Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning	Hao Zhou et.al.	2408.02549	null
2024-08-05	RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation	Daniel Fleischer et.al.	2408.02545	link
2024-08-05	Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions	Xinbei Ma et.al.	2408.02544	link
2024-08-05	Towards Coarse-grained Visual Language Navigation Task Planning Enhanced by Event Knowledge Graph	Zhao Kaichen et.al.	2408.02535	null
2024-08-05	Practical Attacks against Black-box Code Completion Engines	Slobodan Jenko et.al.	2408.02509	null
2024-08-05	UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model	Zhaowei Li et.al.	2408.02503	link
2024-08-05	Context Conquers Parameters: Outperforming Proprietary LLM in Commit Message Generation	Aaron Imani et.al.	2408.02502	link
2024-08-05	A First Look at License Compliance Capability of LLMs in Code Generation	Weiwei Xu et.al.	2408.02487	link
2024-08-05	Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection	Ting Lei et.al.	2408.02484	link
2024-08-05	From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future	Haolin Jin et.al.	2408.02479	null
2024-08-02	Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting	Xiangyu Zhao et.al.	2408.01423	null
2024-08-02	Mission Impossible: A Statistical Perspective on Jailbreaking LLMs	Jingtong Su et.al.	2408.01420	null
2024-08-02	DebateQA: Evaluating Question Answering on Debatable Knowledge	Rongwu Xu et.al.	2408.01419	link
2024-08-02	Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs	Yilun Hua et.al.	2408.01417	null
2024-08-02	Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer	Yu Yang et.al.	2408.01402	null
2024-08-02	Coalitions of Large Language Models Increase the Robustness of AI Agents	Prattyush Mangal et.al.	2408.01380	null
2024-08-02	Toward Automatic Relevance Judgment using Vision–Language Models for Image–Text Retrieval Evaluation	Jheng-Hong Yang et.al.	2408.01363	null
2024-08-02	Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs	Peng Ding et.al.	2408.01355	link
2024-08-02	MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code	Kaiwen Ning et.al.	2408.01354	link
2024-08-02	Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks	Anders Giovanni Møller et.al.	2408.01346	null
2024-08-02	MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models	Benno Weck et.al.	2408.01337	link
2024-08-02	A Backbone for Long-Horizon Robot Task Understanding	Xiaoshuai Chen et.al.	2408.01334	null
2024-08-02	FANNO: Augmenting High-Quality Instruction Data with Open-Sourced LLMs Only	He Zhu et.al.	2408.01323	null
2024-08-02	A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks	Jiaqi Wang et.al.	2408.01319	null
2024-08-02	Reconsidering Token Embeddings with the Definitions for Pre-trained Language Models	Ying Zhang et.al.	2408.01308	null
2024-08-02	The Mismeasure of Man and Models: Evaluating Allocational Harms in Large Language Models	Hannah Chen et.al.	2408.01285	null
2024-08-02	RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework	Kunlun Zhu et.al.	2408.01262	link
2024-08-02	The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models	Simone Caldarella et.al.	2408.01228	null
2024-08-02	High-Throughput Phenotyping of Clinical Text Using Large Language Models	Daniel B. Hier et.al.	2408.01214	null
2024-08-02	Misinforming LLMs: vulnerabilities, challenges and opportunities	Bo Zhou et.al.	2408.01168	null
2024-08-01	AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation	Mengkang Hu et.al.	2408.00764	link
2024-08-01	UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model	Xiangyu Fan et.al.	2408.00762	null
2024-08-01	Tamper-Resistant Safeguards for Open-Weight LLMs	Rishub Tamirisa et.al.	2408.00761	link
2024-08-01	Thermal Conductivity Predictions with Foundation Atomistic Models	Balázs Póta et.al.	2408.00755	link
2024-08-01	Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model	Benlin Liu et.al.	2408.00754	null
2024-08-01	Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation	Siyu Jiao et.al.	2408.00744	link
2024-08-01	DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency	Jovan Stojkovic et.al.	2408.00741	null
2024-08-01	Virchow 2: Scaling Self-Supervised Mixed Magnification Models in Pathology	Eric Zimmermann et.al.	2408.00738	null
2024-08-01	Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions	Guangzhi Xiong et.al.	2408.00727	link
2024-08-01	An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models	Yangzhen Wu et.al.	2408.00724	null
2024-08-01	Pathway to Secure and Trustworthy 6G for LLMs: Attacks, Defense, and Opportunities	Sunder Ali Khowaja et.al.	2408.00722	null
2024-08-01	SAM 2: Segment Anything in Images and Videos	Nikhila Ravi et.al.	2408.00714	link
2024-08-01	Point-supervised Brain Tumor Segmentation with Box-prompted MedSAM	Xiaofeng Liu et.al.	2408.00706	null
2024-08-01	Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning	Trapoom Ukarapol et.al.	2408.00690	link
2024-08-01	Can Developers Prompt? A Controlled Experiment for Code Documentation Generation	Hans-Alexander Kruse et.al.	2408.00686	null
2024-08-01	ExpertAF: Expert Actionable Feedback from Video	Kumar Ashutosh et.al.	2408.00672	null
2024-08-01	AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models	Daqin Luo et.al.	2408.00665	link
2024-08-01	Disentangling Dense Embeddings with Sparse Autoencoders	Charles O’Neill et.al.	2408.00657	null
2024-08-02	SentenceVAE: Faster, Longer and More Accurate Inference with Next-sentence Prediction for Large Language Models	Hongjun An et.al.	2408.00655	link
2024-08-01	Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning	Xuri Ge et.al.	2408.00644	null
2024-07-31	Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey	Atsuyuki Miyai et.al.	2407.21794	null
2024-07-31	Vision-Language Model Based Handwriting Verification	Mihir Chauhan et.al.	2407.21788	null
2024-07-31	Large Language Monkeys: Scaling Inference Compute with Repeated Sampling	Bradley Brown et.al.	2407.21787	link
2024-07-31	The Llama 3 Herd of Models	Abhimanyu Dubey et.al.	2407.21783	null
2024-07-31	Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs	Shi Liu et.al.	2407.21771	null
2024-07-31	MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts	Xi Victoria Lin et.al.	2407.21770	null
2024-07-31	ReplanVLM: Replanning Robotic Tasks with Visual Language Models	Aoran Mei et.al.	2407.21762	null
2024-07-31	Learning Video Context as Interleaved Multimodal Sequences	Kevin Qinghong Lin et.al.	2407.21757	link
2024-07-31	A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation	Mothilal Asokan et.al.	2407.21739	null
2024-07-31	Open-Vocabulary Audio-Visual Semantic Segmentation	Ruohao Guo et.al.	2407.21721	null
2024-07-31	Adaptive Retrieval-Augmented Generation for Conversational Systems	Xi Wang et.al.	2407.21712	null
2024-07-31	CEAR: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature	Stefan Langer et.al.	2407.21708	null
2024-07-31	TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities	Ming Zhang et.al.	2407.21693	link
2024-07-31	Synth-Empathy: Towards High-Quality Synthetic Empathy Data	Hao Liang et.al.	2407.21669	link
2024-08-01	Defending Jailbreak Attack in VLMs via Cross-modality Information Detector	Yue Xu et.al.	2407.21659	link
2024-07-31	MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment	Anurag Das et.al.	2407.21654	null
2024-07-31	Zero-Shot Cross-Domain Dialogue State Tracking via Dual Low-Rank Adaptation	Xiang Luo et.al.	2407.21633	link
2024-07-31	TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization Methods	Gabriel Loiseau et.al.	2407.21630	link
2024-07-31	LLM-for-X: Application-agnostic Integration of Large Language Models to Support Personal Writing Workflows	Lukas Teufelberger et.al.	2407.21593	null
2024-07-31	A Performance Study of LLM-Generated Code on Leetcode	Tristan Coignion et.al.	2407.21579	null
2024-07-30	ThinK: Thinner Key Cache by Query-Driven Pruning	Yuhui Xu et.al.	2407.21018	null
2024-07-30	CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning	Yuexi Du et.al.	2407.21011	link
2024-07-30	GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models	Ali Abdollahi et.al.	2407.21001	link
2024-07-30	MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning	Yupeng Chen et.al.	2407.20999	null
2024-07-30	From Feature Importance to Natural Language Explanations Using LLMs with RAG	Sule Tekkesinoglu et.al.	2407.20990	link
2024-07-30	Large Language Models (LLMs) for Semantic Communication in Edge-based IoT Networks	Alakesh Kalita et.al.	2407.20970	null
2024-07-30	MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions	Xiaowei Chi et.al.	2407.20962	link
2024-07-30	UniProcessor: A Text-induced Unified Low-level Image Processor	Huiyu Duan et.al.	2407.20928	link
2024-07-30	SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition	Hao Tan et.al.	2407.20920	null
2024-07-30	Automated Review Generation Method Based on Large Language Models	Shican Wu et.al.	2407.20906	link
2024-07-30	Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach	Adam Wojciechowski et.al.	2407.20899	link
2024-07-30	ThinkRepair: Self-Directed Automated Program Repair	Xin Yin et.al.	2407.20898	link
2024-07-30	Effective Black Box Testing of Sentiment Analysis Classification Networks	Parsa Karbasizadeh et.al.	2407.20884	null
2024-07-30	Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification	Boyang Zhang et.al.	2407.20859	null
2024-07-30	Learn by Selling: Equipping Large Language Models with Product Knowledge for Context-Driven Recommendations	Sarthak Anand et.al.	2407.20856	null
2024-07-30	Large Language Model (LLM)-enabled Graphs in Dynamic Networking	Geng Sun et.al.	2407.20840	null
2024-07-30	How to Measure the Intelligence of Large Language Models?	Nils Körber et.al.	2407.20828	null
2024-07-30	Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning	Norman Di Palo et.al.	2407.20798	null
2024-07-30	Interpretable Pre-Trained Transformers for Heart Time-Series Data	Harry J. Davies et.al.	2407.20775	link
2024-07-30	OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance	Yongqiang Yao et.al.	2407.20761	link
2024-07-29	Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing	Ekaterina Iakovleva et.al.	2407.20232	null
2024-07-29	Improving 2D Feature Representations by 3D-Aware Fine-Tuning	Yuanwen Yue et.al.	2407.20229	null
2024-07-29	FlexAttention for Efficient High-Resolution Vision-Language Models	Junyan Li et.al.	2407.20228	null
2024-07-29	Can Editing LLMs Inject Harm?	Canyu Chen et.al.	2407.20224	null
2024-07-29	SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction	Çağhan Köksal et.al.	2407.20214	null
2024-07-29	QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval	Hongming Tan et.al.	2407.20207	null
2024-07-29	MindSearch: Mimicking Human Minds Elicits Deep AI Searcher	Zehui Chen et.al.	2407.20183	link
2024-07-29	Theia: Distilling Diverse Vision Foundation Models for Robot Learning	Jinghuan Shang et.al.	2407.20179	link
2024-07-29	AutoScale: Automatic Prediction of Compute-optimal Data Composition for Training LLMs	Feiyang Kang et.al.	2407.20177	link
2024-07-29	Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning	Xingchen Zeng et.al.	2407.20174	link
2024-07-29	Diffusion Feedback Helps CLIP See Better	Wenxuan Wang et.al.	2407.20171	link
2024-07-29	Language-Conditioned Offline RL for Multi-Robot Navigation	Steven Morad et.al.	2407.20164	null
2024-07-29	rLLM: Relational Table Learning with LLMs	Weichen Li et.al.	2407.20157	link
2024-07-29	ByteCheckpoint: A Unified Checkpointing System for LLM Development	Borui Wan et.al.	2407.20143	null
2024-07-29	Strong Copyright Protection for Language Models via Adaptive Model Fusion	Javier Abad et.al.	2407.20105	null
2024-07-29	Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models	Zhe Li et.al.	2407.20053	null
2024-07-29	Exploring Large Language Models to generate Easy to Read content	Paloma Martínez et.al.	2407.20046	null
2024-07-29	MaskInversion: Localized Embeddings via Optimization of Explainability Maps	Walid Bousselham et.al.	2407.20034	null
2024-07-29	Efficient Training of Large Language Models on Distributed Infrastructures: A Survey	Jiangfei Duan et.al.	2407.20018	null
2024-07-29	Rosetta Statements: Lowering the Barrier for Semantic Parsing and Increasing the Cognitive Interoperability of Knowledge Graphs	Lars Vogt et.al.	2407.20007	null
2024-07-26	Wolf: Captioning Everything with a World Summarization Framework	Boyi Li et.al.	2407.18908	null
2024-07-26	SHIC: Shape-Image Correspondences with no Keypoint Supervision	Aleksandar Shtedritski et.al.	2407.18907	null
2024-07-26	A Flexible and Scalable Approach for Collecting Wildlife Advertisements on the Web	Juliana Barbosa et.al.	2407.18898	link
2024-07-26	Small Molecule Optimization with Large Language Models	Philipp Guevorguian et.al.	2407.18897	link
2024-07-26	Human-artificial intelligence teaming for scientific information extraction from data-driven additive manufacturing research using large language models	Mutahar Safdar et.al.	2407.18827	null
2024-07-26	Automatic Detection of Moral Values in Music Lyrics	Vjosa Preniqi et.al.	2407.18787	link
2024-07-26	The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs	Aleix Sant et.al.	2407.18786	null
2024-07-26	Foundation Models for the Digital Twin Creation of Cyber-Physical Systems	Shaukat Ali et.al.	2407.18779	null
2024-07-26	TAGIFY: LLM-powered Tagging Interface for Improved Data Findability on OGD portals	Kevin Kliimask et.al.	2407.18764	null
2024-07-26	Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery	Yuni Susanti et.al.	2407.18752	link
2024-07-26	Towards Effective and Efficient Continual Pre-training of Large Language Models	Jie Chen et.al.	2407.18743	null
2024-07-26	Towards Generalized Offensive Language Identification	Alphaeus Dmonte et.al.	2407.18738	null
2024-07-26	LLASP: Fine-tuning Large Language Models for Answer Set Programming	Erica Coppolillo et.al.	2407.18723	null
2024-07-26	Neurosymbolic AI for Enhancing Instructability in Generative AI	Amit Sheth et.al.	2407.18722	null
2024-07-26	Cluster-norm for Unsupervised Probing of Knowledge	Walter Laurito et.al.	2407.18712	link
2024-07-26	Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text Generation	Esteban Garces Arias et.al.	2407.18698	link
2024-07-26	Collaborative Evolving Strategy for Automatic Data-Centric Development	Xu Yang et.al.	2407.18690	null
2024-07-26	The BIAS Detection Framework: Bias Detection in Word Embeddings and Language Models for European Languages	Alexandre Puttick et.al.	2407.18689	link
2024-07-26	Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift	Seongho Son et.al.	2407.18676	null
2024-07-26	Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models	Xiang Shi et.al.	2407.18626	link
2024-07-25	Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning	Tianduo Wang et.al.	2407.18248	link
2024-07-25	LoRA-Pro: Are Low-Rank Adapters Properly Optimized?	Zhengbo Wang et.al.	2407.18242	link
2024-07-25	Recursive Introspection: Teaching Language Model Agents How to Self-Improve	Yuxiao Qu et.al.	2407.18219	null
2024-07-26	Exploring Scaling Trends in LLM Robustness	Nikolaus Howe et.al.	2407.18213	link
2024-07-25	AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction	Chunan Liu et.al.	2407.18184	link
2024-07-25	Gene Regulatory Network Inference from Pre-trained Single-Cell Transcriptomics Transformer with Joint Graph Learning	Sindhura Kommu et.al.	2407.18181	null
2024-07-25	Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models	Sanae Lotfi et.al.	2407.18158	null
2024-07-25	$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs	Vlad Sobal et.al.	2407.18134	null
2024-07-25	Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic	Fakhraddin Alwajih et.al.	2407.18129	null
2024-07-25	Efficient Inference of Vision Instruction-Following Models with Elastic Cache	Zuyan Liu et.al.	2407.18121	link
2024-07-25	Multi-Resolution Histopathology Patch Graphs for Ovarian Cancer Subtyping	Jack Breen et.al.	2407.18105	link
2024-07-25	Fine-Tuning Large Language Models for Stock Return Prediction Using Newsflow	Tian Guo et.al.	2407.18103	null
2024-07-25	PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization	Christopher Clarke et.al.	2407.18078	link
2024-07-25	C2P: Featuring Large Language Models with Causal Reasoning	Abdolmahdi Bagheri et.al.	2407.18069	null
2024-07-25	ComPeer: A Generative Conversational Agent for Proactive Peer Support	Tianjian Liu et.al.	2407.18064	link
2024-07-25	Audio Entailment: Assessing Deductive Reasoning for Audio Understanding	Soham Deshmukh et.al.	2407.18062	link
2024-07-25	Difficulty Estimation and Simplification of French Text Using LLMs	Henri Jamet et.al.	2407.18061	null
2024-07-25	The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation	Eric Yang et.al.	2407.18044	null
2024-07-25	RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models	Haoyu Chen et.al.	2407.18035	null
2024-07-25	GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy	Jan Batzner et.al.	2407.18008	null
2024-07-24	I Could’ve Asked That: Reformulating Unanswerable Questions	Wenting Zhao et.al.	2407.17469	link
2024-07-24	WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries	Wenting Zhao et.al.	2407.17468	null
2024-07-24	CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models	Jiawei Gu et.al.	2407.17467	null
2024-07-24	$VILA^2$: VILA Augmented VILA	Yunhao Fang et.al.	2407.17453	null
2024-07-24	Fluent Student-Teacher Redteaming	T. Ben Thompson et.al.	2407.17447	link
2024-07-24	Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?	Michael-Andrei Panaitescu-Liess et.al.	2407.17417	null
2024-07-24	(PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork	Tianjin Huang et.al.	2407.17412	null
2024-07-24	Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models	Yida Zhao et.al.	2407.17406	link
2024-07-24	Grammar-based Game Description Generation using Large Language Models	Tsunehiko Tanaka et.al.	2407.17404	link
2024-07-24	3D Question Answering for City Scene Understanding	Penglei Sun et.al.	2407.17398	null
2024-07-24	PERSONA: A Reproducible Testbed for Pluralistic Alignment	Louis Castricato et.al.	2407.17387	null
2024-07-24	A Comprehensive Approach to Misspelling Correction with BERT and Levenshtein Distance	Amirreza Naziri et.al.	2407.17383	null
2024-07-24	MMRA: A Benchmark for Multi-granularity Multi-image Relational Association	Siwei Wu et.al.	2407.17379	link
2024-07-24	ViPer: Visual Personalization of Generative Models via Individual Preference Learning	Sogand Salehi et.al.	2407.17365	null
2024-07-24	Gradient-based inference of abstract task representations for generalization in neural networks	Ali Hummos et.al.	2407.17356	null
2024-07-24	Scalify: scale propagation for efficient low-precision LLM training	Paul Balança et.al.	2407.17353	link
2024-07-24	Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching	Yuyang Ding et.al.	2407.17349	link
2024-07-24	DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation	Qian Feng et.al.	2407.17348	null
2024-07-24	Label Alignment and Reassignment with Generalist Large Language Model for Enhanced Cross-Domain Named Entity Recognition	Ke Bao et.al.	2407.17344	null
2024-07-24	How Good (Or Bad) Are LLMs at Detecting Misleading Visualizations?	Leo Yu-Ho Lo et.al.	2407.17291	null
2024-07-23	PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects	Junyi Li et.al.	2407.16696	link
2024-07-23	Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack	Xiaoyue Xu et.al.	2407.16695	link
2024-07-23	Can Large Language Models Automatically Jailbreak GPT-4V?	Yuanwei Wu et.al.	2407.16686	null
2024-07-23	SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation	Pengfei Chen et.al.	2407.16682	null
2024-07-23	RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent	Huiyu Xu et.al.	2407.16667	null
2024-07-23	Course-Correction: Safety Alignment Using Synthetic Preferences	Rongwu Xu et.al.	2407.16637	link
2024-07-23	Lawma: The Power of Specialization for Legal Tasks	Ricardo Dominguez-Olmedo et.al.	2407.16615	null
2024-07-23	Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?	Jonathan Hayase et.al.	2407.16607	link
2024-07-23	Shared Imagination: LLMs Hallucinate Alike	Yilun Zhou et.al.	2407.16604	null
2024-07-23	A Comparative Study on Patient Language across Therapeutic Domains for Effective Patient Voice Classification in Online Health Discussions	Giorgos Lysandrou et.al.	2407.16593	null
2024-07-23	Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs	Yifan Xia et.al.	2407.16576	null
2024-07-23	TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback	Eunseop Yoon et.al.	2407.16574	link
2024-07-23	Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models	Ioana Buhnila et.al.	2407.16565	link
2024-07-23	Patched RTC: evaluating LLMs for diverse software development tasks	Asankhaya Sharma et.al.	2407.16557	link
2024-07-24	MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues	Liyun Zhang et.al.	2407.16552	null
2024-07-23	Quantifying the Role of Textual Predictability in Automatic Speech Recognition	Sean Robertson et.al.	2407.16537	null
2024-07-23	Imperfect Vision Encoders: Efficient and Robust Tuning for Vision-Language Models	Aristeidis Panos et.al.	2407.16526	null
2024-07-23	AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game	Yizhou Chi et.al.	2407.16521	link
2024-07-23	Language-Based Security for Low-Level MPC	Christian Skalka et.al.	2407.16504	null
2024-07-23	Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models	Kenza Benkirane et.al.	2407.16470	link
2024-07-22	AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description	Junyu Xie et.al.	2407.15850	link
2024-07-22	LLMmap: Fingerprinting For Large Language Models	Dario Pasquini et.al.	2407.15847	link
2024-07-22	SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models	Mingze Xu et.al.	2407.15841	link
2024-07-22	MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity	Yangzhou Liu et.al.	2407.15838	link
2024-07-22	dMel: Speech Tokenization made Simple	He Bai et.al.	2407.15835	link
2024-07-22	J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling	Wataru Nakata et.al.	2407.15828	null
2024-07-22	Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight	Ziyuan Huang et.al.	2407.15819	null
2024-07-22	Perceptions of Linguistic Uncertainty by Language Models and Humans	Catarina G Belem et.al.	2407.15814	link
2024-07-22	AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection	Yunkang Cao et.al.	2407.15795	link
2024-07-22	CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning	Emanuele Frascaroli et.al.	2407.15793	link
2024-07-22	Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach	Rian Dolphin et.al.	2407.15788	null
2024-07-22	Concept-Based Interpretable Reinforcement Learning with Limited to No Human Labels	Zhuorui Ye et.al.	2407.15786	null
2024-07-22	Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning	Kaiwen Wang et.al.	2407.15762	null
2024-07-22	MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation	Marco Simoni et.al.	2407.15748	null
2024-07-22	OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context	Steffen Kleinle et.al.	2407.15736	null
2024-07-22	TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON	John Chong Min Tan et.al.	2407.15734	link
2024-07-22	Zero-Shot Embeddings Inform Learning and Forgetting with Vision-Language Encoders	Laura Niss et.al.	2407.15731	null
2024-07-22	SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 Detection	Dimitrios Kollias et.al.	2407.15728	null
2024-07-22	DStruct2Design: Data and Benchmarks for Data Structure Driven Generative Floor Plan Design	Zhi Hao Luo et.al.	2407.15723	link
2024-07-22	Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability	Zhuoyan Xu et.al.	2407.15720	link
2024-07-19	Internal Consistency and Self-Feedback in Large Language Models: A Survey	Xun Liang et.al.	2407.14507	link
2024-07-19	On Pre-training of Multimodal Language Models Customized for Chart Understanding	Wan-Cyuan Fan et.al.	2407.14506	null
2024-07-19	PD-TPE: Parallel Decoder with Text-guided Position Encoding for 3D Visual Grounding	Chenshu Hou et.al.	2407.14491	null
2024-07-19	Evaluating the Reliability of Self-Explanations in Large Language Models	Korbinian Randl et.al.	2407.14487	link
2024-07-19	Data-Centric Human Preference Optimization with Rationales	Hoang Anh Just et.al.	2407.14477	link
2024-07-19	Contrastive Learning with Counterfactual Explanations for Radiology Report Generation	Mingjie Li et.al.	2407.14474	null
2024-07-19	Check-Eval: A Checklist-based Approach for Evaluating Text Quality	Jayr Pereira et.al.	2407.14467	null
2024-07-19	Undermining Mental Proof: How AI Can Make Cooperation Harder by Making Thinking Easier	Zachary Wojtowicz et.al.	2407.14452	null
2024-07-19	Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding	Renshan Zhang et.al.	2407.14439	link
2024-07-19	Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders	Senthooran Rajamanoharan et.al.	2407.14435	null
2024-07-19	Mixture of Experts with Mixture of Precisions for Tuning Quality of Service	HamidReza Imani et.al.	2407.14417	null
2024-07-19	System-1.x: Learning to Balance Fast and Slow Planning with Language Models	Swarnadeep Saha et.al.	2407.14414	link
2024-07-19	DEAL: Disentangle and Localize Concept-level Explanations for VLMs	Tang Li et.al.	2407.14412	link
2024-07-19	The Vision of Autonomic Computing: Can LLMs Make It a Reality?	Zhiyang Zhang et.al.	2407.14402	null
2024-07-19	Frontiers of Deep Learning: From Novel Application to Real-World Deployment	Rui Xie et.al.	2407.14386	null
2024-07-19	Open Artificial Knowledge	Vadim Borisov et.al.	2407.14371	null
2024-07-19	Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models	Xuenan Xu et.al.	2407.14355	link
2024-07-19	Improving Retrieval in Sponsored Search by Leveraging Query Context Signals	Akash Kumar Mohankumar et.al.	2407.14346	null
2024-07-19	LLMs left, right, and center: Assessing GPT’s capabilities to label political bias from web domains	Raphael Hernandes et.al.	2407.14344	null
2024-07-19	Multimodal Misinformation Detection using Large Vision-Language Models	Sahar Tahmasebi et.al.	2407.14321	null
2024-07-18	Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data	Charles Jin et.al.	2407.13765	null
2024-07-18	SegPoint: Segment Any Point Cloud via Large Language Model	Shuting He et.al.	2407.13761	null
2024-07-18	Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models	Zhuo Chen et.al.	2407.13757	null
2024-07-18	CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications	Mirza Masfiqur Rahman et.al.	2407.13742	null
2024-07-18	Baba Is AI: Break the Rules to Beat the Benchmark	Nathan Cloos et.al.	2407.13729	null
2024-07-18	CoDefeater: Using LLMs To Find Defeaters in Assurance Cases	Usman Gohar et.al.	2407.13717	link
2024-07-18	Understanding Reference Policies in Direct Preference Optimization	Yixin Liu et.al.	2407.13709	link
2024-07-18	A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice	Shaina Raza et.al.	2407.13699	link
2024-07-18	Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation	Yotam Perlitz et.al.	2407.13696	link
2024-07-18	Prover-Verifier Games improve legibility of LLM outputs	Jan Hendrik Kirchner et.al.	2407.13692	null
2024-07-18	Shaded Route Planning Using Active Segmentation and Identification of Satellite Images	Longchao Da et.al.	2407.13689	null
2024-07-18	FuLG: 150B Romanian Corpus for Language Model Pretraining	Vlad-Andrei Bădoiu et.al.	2407.13657	null
2024-07-18	COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization	Skyler Grandel et.al.	2407.13648	null
2024-07-18	Weak-to-Strong Reasoning	Yuqing Yang et.al.	2407.13647	link
2024-07-18	Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies	Chaofan Tao et.al.	2407.13623	link
2024-07-18	KNOWNET: Guided Health Information Seeking from LLMs via Knowledge Graph Integration	Youfu Yan et.al.	2407.13598	null
2024-07-18	PLANTS: A Novel Problem and Dataset for Summarization of Planning-Like (PL) Tasks	Vishal Pallagani et.al.	2407.13597	null
2024-07-18	EarthMarker: A Visual Prompt Learning Framework for Region-level and Point-level Remote Sensing Imagery Comprehension	Wei Zhang et.al.	2407.13596	link
2024-07-18	Robust Calibration of Large Vision-Language Adapters	Balamurali Murugesan et.al.	2407.13588	link
2024-07-18	Towards Zero-Shot Multimodal Machine Translation	Matthieu Futeral et.al.	2407.13579	link
2024-07-17	LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models	Kaichen Zhang et.al.	2407.12772	link
2024-07-17	EchoSight: Advancing Visual-Language Models with Wiki Knowledge	Yibin Yan et.al.	2407.12735	null
2024-07-17	NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model	Zhongqun Zhang et.al.	2407.12727	null
2024-07-17	Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?	Ben Yao et.al.	2407.12725	null
2024-07-17	The Future of Learning: Large Language Models through the Lens of Students	He Zhang et.al.	2407.12723	null
2024-07-17	MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models	Leyang Shen et.al.	2407.12709	link
2024-07-17	Subgraph-Aware Training of Text-based Methods for Knowledge Graph Completion	Youmin Ko et.al.	2407.12703	link
2024-07-17	Patch-Level Training for Large Language Models	Chenze Shao et.al.	2407.12665	link
2024-07-17	Zero-shot Text-guided Infinite Image Synthesis with LLM guidance	Soyeong Kwon et.al.	2407.12642	null
2024-07-17	Domain-specific or Uncertainty-aware models: Does it really make a difference for biomedical text classification?	Aman Sinha et.al.	2407.12626	null
2024-07-17	Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences	Claudio Pinhanez et.al.	2407.12620	null
2024-07-17	AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism	William Brannon et.al.	2407.12613	link
2024-07-17	VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding	Ofir Abramovich et.al.	2407.12594	link
2024-07-18	Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks	Antoni Kowalczuk et.al.	2407.12588	link
2024-07-17	E5-V: Universal Embeddings with Multimodal Large Language Models	Ting Jiang et.al.	2407.12580	link
2024-07-17	Audio Conditioning for Music Generation via Discrete Bottleneck Features	Simon Rouard et.al.	2407.12563	null
2024-07-17	Conspiracy theories and where to find them on TikTok	Francesco Corso et.al.	2407.12545	null
2024-07-17	Abstraction Alignment: Comparing Model and Human Conceptual Relationships	Angie Boggust et.al.	2407.12543	link
2024-07-17	Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models	Xihe Qiu et.al.	2407.12532	null
2024-07-17	Crafting the Path: Robust Query Rewriting for Information Retrieval	Ingeol Baek et.al.	2407.12529	null
2024-07-16	UrbanWorld: An Urban World Model for 3D City Generation	Yu Shang et.al.	2407.11965	link
2024-07-16	NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?	Mo Li et.al.	2407.11963	link
2024-07-16	Code Documentation and Analysis to Secure Software Development	Paul Attie et.al.	2407.11934	null
2024-07-16	What’s Wrong? Refining Meeting Summaries with LLM Feedback	Frederic Kirstein et.al.	2407.11919	null
2024-07-16	GraphFM: A Scalable Framework for Multi-Graph Pretraining	Divyansha Lachi et.al.	2407.11907	null
2024-07-16	Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads	Aritra Dhar et.al.	2407.11888	null
2024-07-16	Zero-shot Cross-Lingual Transfer for Synthetic Data Generation in Grammatical Error Detection	Gaetan Lopez Latouche et.al.	2407.11854	null
2024-07-16	Schema Matching with Large Language Models: an Experimental Study	Marcel Parciak et.al.	2407.11852	link
2024-07-16	LoFTI: Localization and Factuality Transfer to Indian Locales	Sona Elza Simon et.al.	2407.11833	link
2024-07-16	GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text	Kyle Hamilton et.al.	2407.11827	null
2024-07-16	PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation	Branden Butler et.al.	2407.11798	null
2024-07-16	Large Language Models as Misleading Assistants in Conversation	Betty Li Hou et.al.	2407.11789	null
2024-07-16	SwitchCIT: Switching for Continual Instruction Tuning of Large Language Models	Xinbo Wu et.al.	2407.11780	null
2024-07-16	Sharif-MGTD at SemEval-2024 Task 8: A Transformer-Based Approach to Detect Machine Generated Text	Seyedeh Fatemeh Ebrahimi et.al.	2407.11774	null
2024-07-16	Educational Personalized Learning Path Planning with Large Language Models	Chee Ng et.al.	2407.11773	null
2024-07-16	XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach	Truong Thanh Hung Nguyen et.al.	2407.11771	link
2024-07-16	Robust Utility-Preserving Text Anonymization Based on Large Language Models	Tianyu Yang et.al.	2407.11770	link
2024-07-16	Vectoring Languages	Joseph Chen et.al.	2407.11766	null
2024-07-16	Exploring Quantization for Efficient Pre-Training of Transformer Language Models	Kamran Chitsaz et.al.	2407.11722	link
2024-07-16	Harnessing Large Language Models for Multimodal Product Bundling	Xiaohao Liu et.al.	2407.11712	link
2024-07-15	VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation	Bocheng Zou et.al.	2407.10972	link
2024-07-15	Q-Sparse: All Large Language Models can be Fully Sparsely-Activated	Hongyu Wang et.al.	2407.10969	null
2024-07-15	Fast Matrix Multiplications for Lookup Table-Quantized LLMs	Han Guo et.al.	2407.10960	link
2024-07-15	Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?	Ruisheng Cao et.al.	2407.10956	link
2024-07-15	MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models	Chengguang Gan et.al.	2407.10953	null
2024-07-15	Can Textual Semantics Mitigate Sounding Object Segmentation Preference?	Yaoting Wang et.al.	2407.10947	link
2024-07-15	Learning from Naturally Occurring Feedback	Shachar Don-Yehiya et.al.	2407.10944	link
2024-07-15	GRUtopia: Dream General Robots in a City at Scale	Hanqing Wang et.al.	2407.10943	link
2024-07-15	Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together	Dilara Soylu et.al.	2407.10930	null
2024-07-15	Benchmarking Vision Language Models for Cultural Understanding	Shravan Nayak et.al.	2407.10920	null
2024-07-15	FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets	Xiaohui Victor Li et.al.	2407.10909	link
2024-07-15	Hey, That’s My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique	Mark Russinovich et.al.	2407.10887	null
2024-07-15	SLIP: Securing LLMs IP Using Weights Decomposition	Yehonathan Refael et.al.	2407.10886	null
2024-07-15	Understanding the Importance of Evolutionary Search in Automated Heuristic Design with Large Language Models	Rui Zhang et.al.	2407.10873	null
2024-07-15	GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM	Keshav Bimbraw et.al.	2407.10870	null
2024-07-15	Physics-Inspired Generative Models in Medical Imaging: A Review	Dennis Hein et.al.	2407.10856	null
2024-07-15	Weighted Grouped Query Attention in Transformers	Sai Sena Chinnakonduru et.al.	2407.10855	null
2024-07-15	An Actionable Framework for Assessing Bias and Fairness in Large Language Model Use Cases	Dylan Bouchard et.al.	2407.10853	link
2024-07-15	MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs	Quang H. Nguyen et.al.	2407.10834	link
2024-07-15	BiasScanner: Automatic Detection and Classification of News Bias to Strengthen Democracy	Tim Menzner et.al.	2407.10829	null
2024-07-12	FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3	Georgios Makridis et.al.	2407.09467	null
2024-07-12	Human-like Episodic Memory for Infinite Context LLMs	Zafeirios Fountas et.al.	2407.09450	link
2024-07-12	ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts	Amelia F. Hardy et.al.	2407.09447	link
2024-07-12	MUSCLE: A Model Update Strategy for Compatible LLM Evolution	Jessica Echterhoff et.al.	2407.09435	null
2024-07-12	A Perspective on Foundation Models for the Electric Power Grid	Hendrik F. Hamann et.al.	2407.09434	null
2024-07-12	Open (Clinical) LLMs are Sensitive to Instruction Phrasings	Alberto Mario Ceballos Arroyo et.al.	2407.09429	link
2024-07-12	TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models	Hang Zou et.al.	2407.09424	null
2024-07-12	Mitigating Entity-Level Hallucination in Large Language Models	Weihang Su et.al.	2407.09417	link
2024-07-12	SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers	Shraman Pramanick et.al.	2407.09413	link
2024-07-12	Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce	Zhe Lin et.al.	2407.09395	null
2024-07-12	PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents	Saber Zerhoudi et.al.	2407.09394	link
2024-07-12	GAVEL: Generating Games Via Evolution and Language Models	Graham Todd et.al.	2407.09388	link
2024-07-12	Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text	Lucio La Cava et.al.	2407.09364	null
2024-07-12	Good Intentions, Risky Inventions: A Method for Assessing the Risks and Benefits of AI in Mobile and Wearable Uses	Marios Constantinides et.al.	2407.09322	link
2024-07-12	Scalability of Bayesian Network Structure Elicitation with Large Language Models: a Novel Methodology and Comparative Analysis	Nikolay Babakov et.al.	2407.09311	null
2024-07-12	Transformer Layers as Painters	Qi Sun et.al.	2407.09298	link
2024-07-12	Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study	Yulong Yang et.al.	2407.09295	null
2024-07-12	CEIPA: Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models	Dong Shu et.al.	2407.09292	null
2024-07-12	Structuring Authenticity Assessments on Historical Documents using LLMs	Andrea Schimmenti et.al.	2407.09290	null
2024-07-12	WSESeg: Introducing a Dataset for the Segmentation of Winter Sports Equipment with a Baseline for Interactive Segmentation	Robin Schön et.al.	2407.09288	link
2024-07-11	MAVIS: Mathematical Visual Instruction Tuning	Renrui Zhang et.al.	2407.08739	link
2024-07-11	Real-Time Anomaly Detection and Reactive Planning with Large Language Models	Rohan Sinha et.al.	2407.08735	null
2024-07-11	Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist	Zihao Zhou et.al.	2407.08733	null
2024-07-11	A Taxonomy for Data Contamination in Large Language Models	Medha Palavalli et.al.	2407.08716	null
2024-07-11	GTA: A Benchmark for General Tool Agents	Jize Wang et.al.	2407.08713	link
2024-07-11	eyeballvul: a future-proof benchmark for vulnerability detection in the wild	Timothee Chauvin et.al.	2407.08708	link
2024-07-11	Extracting Training Data from Document-Based VQA Models	Francesco Pinto et.al.	2407.08707	null
2024-07-11	HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models	Runhui Huang et.al.	2407.08706	null
2024-07-11	Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models	Zhening Xing et.al.	2407.08701	null
2024-07-11	Mitigating Catastrophic Forgetting in Language Transfer via Model Merging	Anton Alexandrov et.al.	2407.08699	null
2024-07-11	Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight	Zhiqiang Xie et.al.	2407.08694	null
2024-07-11	Robotic Control via Embodied Chain-of-Thought Reasoning	Zawalski Michał et.al.	2407.08693	null
2024-07-11	SEED-Story: Multimodal Long Story Generation with Large Language Model	Shuai Yang et.al.	2407.08683	link
2024-07-11	NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning	Yi Zhang et.al.	2407.08672	null
2024-07-11	Uncertainty Estimation of Large Language Models in Medical Question Answering	Jiaxin Wu et.al.	2407.08662	null
2024-07-11	Towards Building Specialized Generalist AI with System 1 and System 2 Fusion	Kaiyan Zhang et.al.	2407.08642	null
2024-07-11	$β$-DPO: Direct Preference Optimization with Dynamic $β$	Junkang Wu et.al.	2407.08639	link
2024-07-11	RoboMorph: Evolving Robot Morphology using Large Language Models	Kevin Qiu et.al.	2407.08626	null
2024-07-11	Tamil Language Computing: the Present and the Future	Kengatharaiyer Sarveswaran et.al.	2407.08618	null
2024-07-11	FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision	Jay Shah et.al.	2407.08608	link
2024-07-10	Training on the Test Task Confounds Evaluation and Emergence	Ricardo Dominguez-Olmedo et.al.	2407.07890	link
2024-07-10	Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization	Junkang Wu et.al.	2407.07880	link
2024-07-11	Toto: Time Series Optimized Transformer for Observability	Ben Cohen et.al.	2407.07874	null
2024-07-10	FACTS About Building Retrieval Augmented Generation-based Chatbots	Rama Akkiraju et.al.	2407.07858	null
2024-07-10	OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training	Sami Jaghouar et.al.	2407.07852	link
2024-07-10	Natural Language Mechanisms via Self-Resolution with Foundation Models	Nicolas Della Penna et.al.	2407.07845	null
2024-07-10	Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective	Shengjia Chen et.al.	2407.07841	link
2024-07-10	Decompose and Compare Consistency: Measuring VLMs’ Answer Reliability via Task-Decomposition Consistency Comparison	Qian Yang et.al.	2407.07840	null
2024-07-10	Transformer Alignment in Large Language Models	Murdock Aubry et.al.	2407.07810	null
2024-07-11	AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning	Jongsuk Kim et.al.	2407.07801	link
2024-07-10	Attribute or Abstain: Large Language Models as Long Document Assistants	Jan Buchmann et.al.	2407.07799	link
2024-07-11	Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard	Oguzhan Topsakal et.al.	2407.07796	link
2024-07-10	Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities	Tianjie Ju et.al.	2407.07791	link
2024-07-10	WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment	Jiefu Ou et.al.	2407.07778	null
2024-07-10	Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs	Hao-Tien Lewis Chiang et.al.	2407.07775	null
2024-07-10	Can ChatGPT Pass a Theory of Computing Course?	Matei A. Golesteanu et.al.	2407.07757	null
2024-07-10	Fine-Tuning Large Language Models with User-Level Differential Privacy	Zachary Charles et.al.	2407.07737	null
2024-07-10	PaliGemma: A versatile 3B VLM for transfer	Lucas Beyer et.al.	2407.07726	link
2024-07-10	Why should we ever automate moral decision making?	Vincent Conitzer et.al.	2407.07671	null
2024-07-10	A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models: Safety, Consensus, Objectivity, Reproducibility and Explainability	Ting Fang Tan et.al.	2407.07666	null
2024-07-09	AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning	Jiaxi Cui et.al.	2407.07094	link
2024-07-09	FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation	Liqun Ma et.al.	2407.07093	link
2024-07-09	CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation	Tong Chen et.al.	2407.07087	link
2024-07-09	Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models	Logan Cross et.al.	2407.07086	link
2024-07-09	Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities	Shaltiel Shmidman et.al.	2407.07080	null
2024-07-09	Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps	Yung-Sung Chuang et.al.	2407.07071	link
2024-07-09	Prompting Techniques for Secure Code Generation: A Systematic Investigation	Catherine Tony et.al.	2407.07064	null
2024-07-09	Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence	Weize Chen et.al.	2407.07061	link
2024-07-09	Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model	Wenqi Zhang et.al.	2407.07053	link
2024-07-09	ProtoSAM – One Shot Medical Image Segmentation With Foundational Models	Lev Ayzenberg et.al.	2407.07042	link
2024-07-09	Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models	Yue Zhang et.al.	2407.07035	link
2024-07-09	Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization	Jeongseok Hyun et.al.	2407.07024	link
2024-07-09	Using Large Language Models for Generating Smart Contracts for Health Insurance from Textual Policies	Inwon Kang et.al.	2407.07019	null
2024-07-09	End-To-End Causal Effect Estimation from Unstructured Natural Language Data	Nikita Dhawan et.al.	2407.07018	null
2024-07-09	Is Large Language Model All You Need to Predict the Synthesizability and Precursors of Crystal Structures?	Zhilong Song et.al.	2407.07016	null
2024-07-09	Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning	J. Crosbie et.al.	2407.07011	null
2024-07-09	Metron: Holistic Performance Evaluation Framework for LLM Inference Systems	Amey Agrawal et.al.	2407.07000	link
2024-07-09	Robust Neural Information Retrieval: An Adversarial and Out-of-distribution Perspective	Yu-An Liu et.al.	2407.06992	link
2024-07-09	Segment-Based Interactive Machine Translation for Pre-trained Models	Angel Navarro et.al.	2407.06990	null
2024-07-09	Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models	Yi-Cheng Lin et.al.	2407.06957	link
2024-07-08	Multi-Object Hallucination in Vision-Language Models	Xuweiyi Chen et.al.	2407.06192	link
2024-07-08	4D Contrastive Superflows are Dense 3D Representation Learners	Xiang Xu et.al.	2407.06190	link
2024-07-08	Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision	Orr Zohar et.al.	2407.06189	link
2024-07-08	CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation	Xinying Guo et.al.	2407.06188	null
2024-07-08	JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation	Yu Zeng et.al.	2407.06187	null
2024-07-08	Vision-Language Models under Cultural and Inclusive Considerations	Antonia Karamolegkou et.al.	2407.06177	null
2024-07-08	On Speeding Up Language Model Evaluation	Jin Peng Zhou et.al.	2407.06172	null
2024-07-08	What’s Wrong with Your Code Generated by Large Language Models? An Extensive Study	Shihan Dou et.al.	2407.06153	null
2024-07-09	Using Grammar Masking to Ensure Syntactic Validity in LLM-based Modeling Tasks	Lukas Netz et.al.	2407.06146	null
2024-07-08	ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation	Ethan Chern et.al.	2407.06135	link
2024-07-08	Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization	Hannah K. Bako et.al.	2407.06129	link
2024-07-08	Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities	Avinash Anand et.al.	2407.06125	null
2024-07-08	Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning	Yadong Zhang et.al.	2407.06112	null
2024-07-08	Artificial Intuition: Efficient Classification of Scientific Abstracts	Harsh Sakhrani et.al.	2407.06093	null
2024-07-08	Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models	Jinliang Lu et.al.	2407.06089	null
2024-07-08	From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty	Maor Ivgi et.al.	2407.06071	link
2024-07-08	Variational Best-of-N Alignment	Afra Amini et.al.	2407.06057	null
2024-07-08	MST5 – Multilingual Question Answering over Knowledge Graphs	Nikit Srivastava et.al.	2407.06041	link
2024-07-08	PAS: Data-Efficient Plug-and-Play Prompt Augmentation System	Miao Zheng et.al.	2407.06027	null
2024-07-08	iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement	Aoyu Pang et.al.	2407.06025	link
2024-07-05	Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs	Rudolf Laine et.al.	2407.04694	link
2024-07-05	ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models	Yuzhe Gu et.al.	2407.04693	link
2024-07-05	Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge	Yuanze Lin et.al.	2407.04681	null
2024-07-05	Lost in Translation: The Algorithmic Gap Between LMs and the Brain	Tommaso Tosato et.al.	2407.04680	null
2024-07-05	Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition	Ye Bai et.al.	2407.04675	null
2024-07-05	Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement	Yongji Wu et.al.	2407.04656	null
2024-07-05	Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models	Bolaji Yusuf et.al.	2407.04641	null
2024-07-05	Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework	Reza Averly et.al.	2407.04629	null
2024-07-05	On scalable oversight with weak LLMs judging strong LLMs	Zachary Kenton et.al.	2407.04622	null
2024-07-05	CountGD: Multi-Modal Open-World Counting	Niki Amini-Naieni et.al.	2407.04619	null
2024-07-05	ARM: Efficient Guided Decoding with Autoregressive Reward Models	Sergey Troshin et.al.	2407.04615	null
2024-07-05	AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation	Yuhan Zhu et.al.	2407.04603	link
2024-07-05	Written Term Detection Improves Spoken Term Detection	Bolaji Yusuf et.al.	2407.04601	link
2024-07-05	Testing learning hypotheses using neural networks by manipulating learning data	Cara Su-Yi Leong et.al.	2407.04593	null
2024-07-05	Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions	Shumaila Javaid et.al.	2407.04581	null
2024-07-05	VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models	Hang Gao et.al.	2407.04573	null
2024-07-05	Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition	Aditya K Surikuchi et.al.	2407.04559	link
2024-07-05	Spontaneous Reward Hacking in Iterative Self-Refinement	Jane Pan et.al.	2407.04549	null
2024-07-05	PoPreRo: A New Dataset for Popularity Prediction of Romanian Reddit Posts	Ana-Cristina Rogoz et.al.	2407.04541	link
2024-07-05	GPT vs RETRO: Exploring the Intersection of Retrieval and Parameter-Efficient Fine-Tuning	Aleksander Ficek et.al.	2407.04528	null
2024-07-03	Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages	Max Zuo et.al.	2407.03321	link
2024-07-03	InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output	Pan Zhang et.al.	2407.03320	link
2024-07-03	BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations	Zhantao Yang et.al.	2407.03314	null
2024-07-03	Universal Length Generalization with Turing Programs	Kaiying Hou et.al.	2407.03310	null
2024-07-03	Large Language Models for JSON Schema Discovery	Michael J. Mior et.al.	2407.03286	null
2024-07-03	LLM Internal States Reveal Hallucination Risk Faced With a Query	Ziwei Ji et.al.	2407.03282	link
2024-07-03	STF: Sentence Transformer Fine-Tuning For Topic Categorization With Limited Data	Kheir Eddine Daouadi et.al.	2407.03253	null
2024-07-03	Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning	Zhili Shen et.al.	2407.03227	null
2024-07-03	How Does Quantization Affect Multilingual LLMs?	Kelly Marchisio et.al.	2407.03211	null
2024-07-03	TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts	Ruida Wang et.al.	2407.03203	link
2024-07-03	Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models	Haritz Puerto et.al.	2407.03181	link
2024-07-03	Investigating Decoder-only Large Language Models for Speech-to-text Translation	Chao-Wei Huang et.al.	2407.03169	null
2024-07-03	SOS! Soft Prompt Attack Against Open-Source Large Language Models	Ziqing Yang et.al.	2407.03160	null
2024-07-03	Let the Code LLM Edit Itself When You Edit the Code	Zhenyu He et.al.	2407.03157	null
2024-07-03	Reinforcement Learning for Sequence Design Leveraging Protein Language Models	Jithendaraa Subramanian et.al.	2407.03154	null
2024-07-03	Enhancing Translation Accuracy of Large Language Models through Continual Pre-Training on Parallel Data	Minato Kondo et.al.	2407.03145	null
2024-07-03	Social Bias Evaluation for Large Language Models Requires Prompt Variations	Rem Hida et.al.	2407.03129	link
2024-07-03	KeyVideoLLM: Towards Large-scale Video Keyframe Selection	Hao Liang et.al.	2407.03104	null
2024-07-03	Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory	Suyeon Lee et.al.	2407.03103	link
2024-07-03	ScreenTK: Seamless Detection of Time-Killing Moments Using Continuous Mobile Screen Text Monitoring	Le Fang et.al.	2407.03063	null
2024-07-02	MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention	Huiqiang Jiang et.al.	2407.02490	link
2024-07-02	Neurocache: Efficient Vector Retrieval for Long-range Language Modeling	Ali Safaya et.al.	2407.02486	link
2024-07-02	RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs	Yue Yu et.al.	2407.02485	null
2024-07-02	MMedAgent: Learning to Use Medical Tools with Multi-modal Agent	Binxu Li et.al.	2407.02483	link
2024-07-02	Understanding Alignment in Multimodal LLMs: A Comprehensive Study	Elmira Amirloo et.al.	2407.02477	null
2024-07-02	Open Scene Graphs for Open World Object-Goal Navigation	Joel Loo et.al.	2407.02473	null
2024-07-02	ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions	Chan Young Park et.al.	2407.02472	link
2024-07-02	Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I.	Harrie Oosterhuis et.al.	2407.02464	null
2024-07-02	Ensemble of pre-trained language models and data augmentation for hate speech detection from Arabic tweets	Kheir Eddine Daouadi et.al.	2407.02448	null
2024-07-03	Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs	Jinmin Li et.al.	2407.02411	null
2024-07-02	CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models	Song Wang et.al.	2407.02408	null
2024-07-02	Assessing the Code Clone Detection Capability of Large Language Models	Zixian Zhang et.al.	2407.02402	null
2024-07-02	Learning to Refine with Fine-Grained Natural Language Feedback	Manya Wadhwa et.al.	2407.02397	link
2024-07-02	Is Your AI-Generated Code Really Secure? Evaluating Large Language Models on Secure Code Generation with CodeSecEval	Jiexin Wang et.al.	2407.02395	null
2024-07-02	TokenPacker: Efficient Visual Projector for Multimodal LLM	Wentong Li et.al.	2407.02392	link
2024-07-02	Talking to Machines: do you read me?	Lina M. Rojas-Barahona et.al.	2407.02354	null
2024-07-02	Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification	Pritish Sahu et.al.	2407.02352	null
2024-07-02	Generative Large Language Models in Automated Fact-Checking: A Survey	Ivan Vykopal et.al.	2407.02351	null
2024-07-02	Conceptual Codebook Learning for Vision-Language Models	Yi Zhang et.al.	2407.02350	null
2024-07-02	MORPHEUS: Modeling Role from Personalized Dialogue History by Exploring and Utilizing Latent Space	Yihong Tang et.al.	2407.02345	null
2024-06-28	Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs	Sukmin Yun et.al.	2406.20098	link
2024-06-28	LLaRA: Supercharging Robot Learning Data for Vision-Language Policy	Xiang Li et.al.	2406.20095	link
2024-06-28	Scaling Synthetic Data Creation with 1,000,000,000 Personas	Xin Chan et.al.	2406.20094	link
2024-06-28	LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression	Jieneng Chen et.al.	2406.20092	link
2024-06-28	ProgressGym: Alignment with a Millennium of Moral Progress	Tianyi Qiu et.al.	2406.20087	link
2024-06-28	Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language	Yicheng Chen et.al.	2406.20085	null
2024-06-28	Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification	Anisha Gunjal et.al.	2406.20079	link
2024-06-28	EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model	Yuxuan Zhang et.al.	2406.20076	link
2024-06-28	To Word Senses and Beyond: Inducing Concepts with Contextualized Language Models	Bastien Liétard et.al.	2406.20054	null
2024-06-28	Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation	Danny Halawi et.al.	2406.20053	null
2024-07-01	BMW Agents – A Framework For Task Automation Through Multi-Agent Collaboration	Noel Crawford et.al.	2406.20041	null
2024-06-28	BioMNER: A Dataset for Biomedical Method Entity Recognition	Chen Tang et.al.	2406.20038	null
2024-06-28	LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models	Renzhi Wang et.al.	2406.20030	null
2024-06-28	ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models	Yuxiang Zhang et.al.	2406.20015	link
2024-06-28	The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models	Xinyi Chen et.al.	2406.19999	link
2024-06-28	Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model	Habib Hajimolahoseini et.al.	2406.19995	null
2024-06-28	ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting	Rui Pan et.al.	2406.19976	null
2024-06-28	STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical	Guohao Sun et.al.	2406.19973	link
2024-06-28	Into the Unknown: Generating Geospatial Descriptions for New Environments	Tzuf Paz-Argaman et.al.	2406.19967	null
2024-06-28	Simulating Financial Market via Large Language Model based Agents	Shen Gao et.al.	2406.19966	null
2024-06-27	ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos	Jr-Jen Chen et.al.	2406.19392	link
2024-06-27	The Remarkable Robustness of LLMs: Stages of Inference?	Vedang Lad et.al.	2406.19384	link
2024-06-27	The Model Arena for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models	Xiliang Zhu et.al.	2406.19358	null
2024-06-27	DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions	Nigel Fernandez et.al.	2406.19356	link
2024-06-27	Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?	Peter Hase et.al.	2406.19354	null
2024-06-27	IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language	Lucky Susanto et.al.	2406.19349	null
2024-06-27	Jump Starting Bandits with LLM-Generated Prior Knowledge	Parand A. Alamdari et.al.	2406.19317	link
2024-06-27	MCNC: Manifold Constrained Network Compression	Chayne Thrash et.al.	2406.19301	null
2024-06-27	From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data	Zheyang Xiong et.al.	2406.19292	link
2024-06-27	PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models	Cathy Mengying Fang et.al.	2406.19283	null
2024-06-27	HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale	Junying Chen et.al.	2406.19280	link
2024-06-27	VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation	Yixiao Song et.al.	2406.19276	link
2024-06-27	AutoPureData: Automated Filtering of Web Data for LLM Fine-tuning	Praneeth Vadlapati et.al.	2406.19271	link
2024-06-27	Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding	Yue Fan et.al.	2406.19263	link
2024-06-27	Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment	Hao Fei et.al.	2406.19255	null
2024-06-27	AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation	Jia Fu et.al.	2406.19251	null
2024-06-27	Revealing Fine-Grained Values and Opinions in Large Language Models	Dustin Wright et.al.	2406.19238	link
2024-06-28	FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts	Shubhankar Singh et.al.	2406.19237	null
2024-06-27	Seeing Is Believing: Black-Box Membership Inference Attacks Against Retrieval Augmented Generation	Yuying Li et.al.	2406.19234	null
2024-06-28	RuBLiMP: Russian Benchmark of Linguistic Minimal Pairs	Ekaterina Taktasheva et.al.	2406.19232	link
2024-06-26	Towards Compositionality in Concept Learning	Adam Stein et.al.	2406.18534	link
2024-06-26	Symbolic Learning Enables Self-Evolving Agents	Wangchunshu Zhou et.al.	2406.18532	link
2024-06-26	PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation	Christoph Leiter et.al.	2406.18528	link
2024-06-26	CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs	Zirui Wang et.al.	2406.18521	link
2024-06-26	“Is ChatGPT a Better Explainer than My Professor?”: Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline	Grace Li et.al.	2406.18512	null
2024-06-26	WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models	Liwei Jiang et.al.	2406.18510	link
2024-06-26	Mental Modeling of Reinforcement Learning Agents by Language Models	Wenhao Lu et.al.	2406.18505	null
2024-06-26	Is In-Context Learning a Type of Gradient-Based Learning? Evidence from the Inverse Frequency Effect in Structural Priming	Zhenghao Zhou et.al.	2406.18501	null
2024-06-26	Role-Play Zero-Shot Prompting with Large Language Models for Open-Domain Human-Machine Conversation	Ahmed Njifenjou et.al.	2406.18460	null
2024-06-26	Cascading Large Language Models for Salient Event Graph Generation	Xingwei Tan et.al.	2406.18449	link
2024-06-26	New intelligent empowerment for digital transformation	Peng Yifeng et.al.	2406.18440	null
2024-06-26	IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons	Dan Shi et.al.	2406.18406	link
2024-06-26	Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers	Yibo Jiang et.al.	2406.18400	null
2024-06-26	Adversarial Search Engine Optimization for Large Language Models	Fredrik Nestaas et.al.	2406.18382	null
2024-06-26	MALSIGHT: Exploring Malicious Source Code and Benign Pseudocode for Iterative Binary Malware Summarization	Haolang Lu et.al.	2406.18379	null
2024-06-26	Themis: Towards Flexible and Interpretable NLG Evaluation	Xinyu Hu et.al.	2406.18365	link
2024-06-26	AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations	Adam Dahlgren Lindström et.al.	2406.18346	null
2024-06-26	PDFA Distillation via String Probability Queries	Robert Baumgartner et.al.	2406.18328	link
2024-06-26	PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models	Huixuan Zhang et.al.	2406.18326	null
2024-06-26	MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data	Meng Fang et.al.	2406.18321	null
2024-06-25	MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning	Xiangyu Zhao et.al.	2406.17770	link
2024-06-25	EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data	Jesse Zhang et.al.	2406.17768	null
2024-06-25	BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning	Ercong Nie et.al.	2406.17764	null
2024-06-25	CaLMQA: Exploring culturally specific long-form question answering across 23 languages	Shane Arora et.al.	2406.17761	link
2024-06-25	Accelerating Clinical Evidence Synthesis with Large Language Models	Zifeng Wang et.al.	2406.17755	null
2024-06-25	Measuring and Benchmarking Large Language Models’ Capabilities to Generate Persuasive Language	Amalie Brogaard Pauli et.al.	2406.17753	null
2024-06-25	Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon	USVSN Sai Prashanth et.al.	2406.17746	link
2024-06-25	Point-SAM: Promptable 3D Segmentation Model for Point Clouds	Yuchen Zhou et.al.	2406.17741	link
2024-06-25	Find Parent then Label Children: A Two-stage Taxonomy Completion Method with Pre-trained Language Model	Fei Xia et.al.	2406.17739	null
2024-06-25	LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users	Elinor Poole-Dayan et.al.	2406.17737	null
2024-06-25	FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model	Feijie Wu et.al.	2406.17706	link
2024-06-25	From Distributional to Overton Pluralism: Investigating Large Language Model Alignment	Thom Lake et.al.	2406.17692	link
2024-06-25	VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation	Kun Qian et.al.	2406.17681	link
2024-06-25	Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models	Yuan Li et.al.	2406.17675	null
2024-06-25	LaTable: Towards Large Tabular Models	Boris van Breugel et.al.	2406.17673	null
2024-06-25	LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic	Aditya Kalyanpur et.al.	2406.17663	null
2024-06-25	Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients	Aashiq Muhamed et.al.	2406.17660	link
2024-06-25	DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning	Xiaohan Zhang et.al.	2406.17659	null
2024-06-25	Leveraging Large Language Models for Software Model Completion: Results from Industrial and Public Datasets	Christof Tinnes et.al.	2406.17651	link
2024-06-25	Variationist: Exploring Multifaceted Variation and Bias in Written Language Data	Alan Ramponi et.al.	2406.17647	link
2024-06-24	Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs	Shengbang Tong et.al.	2406.16860	link
2024-06-24	EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees	Yuhui Li et.al.	2406.16858	link
2024-06-24	Long Context Transfer from Language to Vision	Peiyuan Zhang et.al.	2406.16852	link
2024-06-24	Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts	Aditya Sharma et.al.	2406.16851	null
2024-06-24	RaTEScore: A Metric for Radiology Report Generation	Weike Zhao et.al.	2406.16845	link
2024-06-24	From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models	Sean Welleck et.al.	2406.16838	null
2024-06-24	USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations	Mounika Marreddy et.al.	2406.16833	null
2024-06-24	Understanding and Mitigating Tokenization Bias in Language Models	Buu Phan et.al.	2406.16829	null
2024-06-24	Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track	Ronak Pradeep et.al.	2406.16828	link
2024-06-24	GPT-4V Explorations: Mining Autonomous Driving	Zixuan Li et.al.	2406.16817	null
2024-06-24	RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale	Beck LaBash et.al.	2406.16801	link
2024-06-24	Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs	Ashwinee Panda et.al.	2406.16797	link
2024-06-24	Adam-mini: Use Fewer Learning Rates To Gain More	Yushun Zhang et.al.	2406.16793	link
2024-06-24	M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models	Rishabh Maheshwary et.al.	2406.16783	null
2024-06-24	It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension	Sagi Shaier et.al.	2406.16779	null
2024-06-24	Finding Transformer Circuits with Edge Pruning	Adithya Bhaskar et.al.	2406.16778	link
2024-06-24	Blending LLMs into Cascaded Speech Translation: KIT’s Offline Speech Translation System for IWSLT 2024	Sai Koneru et.al.	2406.16777	null
2024-06-24	WARP: On the Benefits of Weight Averaged Rewarded Policies	Alexandre Ramé et.al.	2406.16768	null
2024-06-24	The GPT-WritingPrompts Dataset: A Comparative Analysis of Character Portrayal in Short Stories	Xi Yu Huang et.al.	2406.16767	link
2024-06-24	Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters	Euiin Yi et.al.	2406.16758	link
2024-06-21	GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians	Haoyang Liu et.al.	2406.15341	link
2024-06-21	Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance	Haoling Li et.al.	2406.15330	null
2024-06-21	Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks	Hokyung Lee et.al.	2406.15325	link
2024-06-21	Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model	Doyoung Kim et.al.	2406.15275	link
2024-06-21	Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics	Weijia Zhang et.al.	2406.15264	null
2024-06-21	Unsupervised Morphological Tree Tokenizer	Qingyang Zhu et.al.	2406.15245	null
2024-06-21	Large Batch Analysis for Adagrad Under Anisotropic Smoothness	Yuxing Liu et.al.	2406.15244	null
2024-06-21	Detecting Synthetic Lyrics with Few-Shot Inference	Yanis Labrak et.al.	2406.15231	null
2024-06-21	A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation	Irune Zubiaga et.al.	2406.15227	link
2024-06-21	Unsupervised Extraction of Dialogue Policies from Conversations	Makesh Narsimhan Sreedhar et.al.	2406.15214	null
2024-06-21	Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding	Mohan Li et.al.	2406.15209	null
2024-06-21	Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms	Santiago Berrezueta-Guzman et.al.	2406.15198	null
2024-06-21	UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis	Yulong Hui et.al.	2406.15187	link
2024-06-21	Hybrid Alignment Training for Large Language Models	Chenglong Wang et.al.	2406.15178	link
2024-06-21	EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot	Hao Fei et.al.	2406.15177	link
2024-06-21	Enhancing Idiomatic Representation in Multiple Languages via an Adaptive Contrastive Triplet Loss	Wei He et.al.	2406.15175	null
2024-06-21	Évaluation des capacités de réponse de larges modèles de langage (LLM) pour des questions d’historiens	Mathieu Chartier et.al.	2406.15173	null
2024-06-21	Assessing Good, Bad and Ugly Arguments Generated by ChatGPT: a New Dataset, its Methodology and Associated Tasks	Victor Hugo Nascimento Rocha et.al.	2406.15130	link
2024-06-21	Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network	Badr AlKhamissi et.al.	2406.15109	link
2024-06-21	PARIKSHA: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data	Ishaan Watts et.al.	2406.15053	null
2024-06-20	Model Merging and Safety Alignment: One Bad Model Spoils the Bunch	Hasan Abed Al Kader Hammoud et.al.	2406.14563	null
2024-06-20	Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities	Sachit Menon et.al.	2406.14562	null
2024-06-20	How to Compute the Probability of a Word	Tiago Pimentel et.al.	2406.14561	link
2024-06-21	Asynchronous Large Language Model Enhanced Planner for Autonomous Driving	Yuan Chen et.al.	2406.14556	link
2024-06-20	GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models	Shilong Li et.al.	2406.14550	null
2024-06-20	Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models	Sunny Duan et.al.	2406.14549	null
2024-06-20	Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data	Johannes Treutlein et.al.	2406.14546	link
2024-06-20	Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems	Đorđe Klisura et.al.	2406.14545	null
2024-06-20	Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs	Yuxuan Qiao et.al.	2406.14544	link
2024-06-20	Are LLMs Naturally Good at Synthetic Tabular Data Generation?	Shengzhe Xu et.al.	2406.14541	link
2024-06-20	PostMark: A Robust Blackbox Watermark for Large Language Models	Yapei Chang et.al.	2406.14517	link
2024-06-20	MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding	Xinyu Fang et.al.	2406.14515	link
2024-06-20	Evidence of a log scaling law for political persuasion with large language models	Kobi Hackenburg et.al.	2406.14508	link
2024-06-20	Overview of the CAIL 2023 Argument Mining Track	Jingcong Liang et.al.	2406.14503	null
2024-06-20	Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary	Xingmeng Zhao et.al.	2406.14500	null
2024-06-20	LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors	Sheikh Asif Imran et.al.	2406.14498	link
2024-06-20	CodeRAG-Bench: Can Retrieval Augment Code Generation?	Zora Zhiruo Wang et.al.	2406.14497	link
2024-06-20	African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification	Gregor Geigle et.al.	2406.14496	link
2024-06-20	Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?	Gregor Geigle et.al.	2406.14492	null
2024-06-20	Instruction Pre-Training: Language Models are Supervised Multitask Learners	Daixuan Cheng et.al.	2406.14491	link
2024-06-18	DrVideo: Document Retrieval Based Long Video Understanding	Ziyu Ma et.al.	2406.12846	null
2024-06-18	Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts	Haoxiang Wang et.al.	2406.12845	link
2024-06-18	Synergizing Foundation Models and Federated Learning: A Survey	Shenghui Li et.al.	2406.12844	null
2024-06-18	GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation	Ci-Siang Lin et.al.	2406.12834	null
2024-06-18	LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation	Seyedarmin Azizi et.al.	2406.12832	link
2024-06-18	What Are the Odds? Language Models Are Capable of Probabilistic Reasoning	Akshay Paruchuri et.al.	2406.12830	link
2024-06-18	From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries	Hitesh Wadhwa et.al.	2406.12824	null
2024-06-18	Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?	Pinzhen Chen et.al.	2406.12822	null
2024-06-18	Adversarial Attacks on Multimodal Agents	Chen Henry Wu et.al.	2406.12814	link
2024-06-18	Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?	Zhe Yang et.al.	2406.12809	link
2024-06-18	Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents	Zehao Wang et.al.	2406.12806	null
2024-06-18	Supporting Human Raters with the Detection of Harmful Content using Large Language Models	Kurt Thomas et.al.	2406.12800	null
2024-06-18	ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools	Team GLM et.al.	2406.12793	link
2024-06-18	In-Context Learning of Energy Functions	Rylan Schaeffer et.al.	2406.12785	null
2024-06-18	UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions	Xunzhi Wang et.al.	2406.12784	link
2024-06-18	Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries	Eden Biran et.al.	2406.12775	link
2024-06-18	Towards Exact Gradient-based Training on Analog In-memory Computing	Zhaoxian Wu et.al.	2406.12774	null
2024-06-18	GFM4MPM: Towards Geospatial Foundation Models for Mineral Prospectivity Mapping	Angel Daruna et.al.	2406.12756	null
2024-06-18	OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI	Zhen Huang et.al.	2406.12753	link
2024-06-18	Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning	Bingchen Zhao et.al.	2406.12742	link
2024-06-17	LLaNA: Large Language and NeRF Assistant	Andrea Amaduzzi et.al.	2406.11840	null
2024-06-17	mDPO: Conditional Preference Optimization for Multimodal Large Language Models	Fei Wang et.al.	2406.11839	link
2024-06-17	MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs	Ziyu Liu et.al.	2406.11833	link
2024-06-17	Unveiling Encoder-Free Vision-Language Models	Haiwen Diao et.al.	2406.11832	link
2024-06-17	Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models	Bingqi Ma et.al.	2406.11831	null
2024-06-17	Language Modeling with Editable External Knowledge	Belinda Z. Li et.al.	2406.11830	link
2024-06-17	WPO: Enhancing RLHF with Weighted Preference Optimization	Wenxuan Zhou et.al.	2406.11827	link
2024-06-17	On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning	Geewook Kim et.al.	2406.11823	link
2024-06-17	MegaScenes: Scene-Level View Synthesis at Scale	Joseph Tung et.al.	2406.11819	link
2024-06-17	Embodied Instruction Following in Unknown Environments	Zhenyu Wu et.al.	2406.11818	null
2024-06-17	Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level	Jie Liu et.al.	2406.11817	null
2024-06-17	VideoLLM-online: Online Video Large Language Model for Streaming Video	Joya Chen et.al.	2406.11816	null
2024-06-17	How Do Large Language Models Acquire Factual Knowledge During Pretraining?	Hoyeon Chang et.al.	2406.11813	link
2024-06-17	RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content	Joao Monteiro et.al.	2406.11811	link
2024-06-17	Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations	Rima Hazra et.al.	2406.11801	link
2024-06-17	DataComp-LM: In search of the next generation of training sets for language models	Jeffrey Li et.al.	2406.11794	null
2024-06-17	CELL your Model: Contrastive Explanation Methods for Large Language Models	Ronny Luss et.al.	2406.11785	null
2024-06-17	Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs	Swanand Ravindra Kadhe et.al.	2406.11780	null
2024-06-17	Improving Multi-Agent Debate with Sparse Communication Topology	Yunxuan Li et.al.	2406.11776	null
2024-06-17	Task Me Anything	Jieyu Zhang et.al.	2406.11775	link
2024-06-14	Quantifying Variance in Evaluation Benchmarks	Lovish Madaan et.al.	2406.10229	null
2024-06-14	EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models	Julian Straub et.al.	2406.10224	link
2024-06-14	Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding	Ridouane Ghermi et.al.	2406.10221	link
2024-06-14	Semantic Membership Inference Attack against Large Language Models	Hamid Mozaffari et.al.	2406.10218	null
2024-06-14	Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs	Rui Yang et.al.	2406.10216	link
2024-06-14	DevBench: A multimodal developmental benchmark for language learning	Alvin Wei Ming Tan et.al.	2406.10215	link
2024-06-14	Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs	Abhimanyu Hans et.al.	2406.10209	link
2024-06-14	A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors	Naaman Tan et.al.	2406.10203	link
2024-06-14	TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners	Tomas de la Rosa et.al.	2406.10196	null
2024-06-14	Detecting and Evaluating Medical Hallucinations in Large Vision Language Models	Jiawei Chen et.al.	2406.10185	null
2024-06-14	Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors	Siyuan Chen et.al.	2406.10181	link
2024-06-14	Let the Poem Hit the Rhythm: Using a Byte-Based Transformer for Beat-Aligned Poetry Generation	Mohamad Elzohbi et.al.	2406.10174	link
2024-06-14	IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce	Wenxuan Ding et.al.	2406.10173	link
2024-06-14	Datasets for Multilingual Answer Sentence Selection	Matteo Gabburo et.al.	2406.10172	null
2024-06-14	CarLLaVA: Vision language models for camera-only closed-loop driving	Katrin Renz et.al.	2406.10165	null
2024-06-14	Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models	Carson Denison et.al.	2406.10162	link
2024-06-14	RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model	Hantao Zhou et.al.	2406.10157	null
2024-06-14	BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack	Yuri Kuratov et.al.	2406.10149	link
2024-06-14	Evaluation of Large Language Models: STEM education and Gender Stereotypes	Smilla Due et.al.	2406.10133	null
2024-06-14	The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models	Yan Liu et.al.	2406.10130	link
2024-06-13	VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	Muhammad Maaz et.al.	2406.09418	link
2024-06-13	Explore the Limits of Omni-modal Pretraining at Scale	Yiyuan Zhang et.al.	2406.09412	link
2024-06-13	4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities	Roman Bachmann et.al.	2406.09406	null
2024-06-13	Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models	Yushi Hu et.al.	2406.09403	null
2024-06-13	OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation	Junke Wang et.al.	2406.09399	link
2024-06-13	Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms	Miaosen Zhang et.al.	2406.09397	null
2024-06-13	Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA	Jongwoo Park et.al.	2406.09396	link
2024-06-13	Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition	Youngtaek Oh et.al.	2406.09388	link
2024-06-13	Towards Vision-Language Geo-Foundation Model: A Survey	Yue Zhou et.al.	2406.09385	link
2024-06-13	Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models	Lukas Thede et.al.	2406.09384	null
2024-06-13	Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs	Zijia Zhao et.al.	2406.09367	link
2024-06-13	ElicitationGPT: Text Elicitation Mechanisms via Language Models	Yifan Wu et.al.	2406.09363	null
2024-06-13	Enhancing Domain Adaptation through Prompt Gradient Alignment	Hoang Phan et.al.	2406.09353	link
2024-06-13	Separations in the Representational Capabilities of Transformers and Recurrent Architectures	Satwik Bhattamishra et.al.	2406.09347	null
2024-06-13	DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding	Suwon Shon et.al.	2406.09345	null
2024-06-13	ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models	David Anugraha et.al.	2406.09334	link
2024-06-13	REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space	Tomer Ashuach et.al.	2406.09325	null
2024-06-13	Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs	Zhao Xu et.al.	2406.09324	link
2024-06-13	JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models	Delong Ran et.al.	2406.09321	link
2024-06-13	Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases	Meng Wang et.al.	2406.09317	link
2024-06-12	What If We Recaption Billions of Web Images with LLaMA-3?	Xianhang Li et.al.	2406.08478	null
2024-06-12	Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens	Ting-Ji Huang et.al.	2406.08477	null
2024-06-12	Real2Code: Reconstruct Articulated Objects via Code Generation	Zhao Mandi et.al.	2406.08474	null
2024-06-12	PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences	Daiwei Chen et.al.	2406.08469	link
2024-06-12	Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing	Zhangchen Xu et.al.	2406.08464	link
2024-06-12	AToM-Bot: Embodied Fulfillment of Unspoken Human Needs with Affective Theory of Mind	Wei Ding et.al.	2406.08455	null
2024-06-12	OLMES: A Standard for Language Model Evaluations	Yuling Gu et.al.	2406.08446	null
2024-06-12	SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models	Chun Yin et.al.	2406.08445	null
2024-06-12	TasTe: Teaching Large Language Models to Translate through Self-Reflection	Yutong Wang et.al.	2406.08434	link
2024-06-12	Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL	Zijin Hong et.al.	2406.08426	null
2024-06-12	OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text	Qingyun Li et.al.	2406.08418	link
2024-06-12	Discovering Preference Optimization Algorithms with and for Large Language Models	Chris Lu et.al.	2406.08414	link
2024-06-12	Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference	Christopher Wolters et.al.	2406.08413	null
2024-06-13	MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos	Xuehai He et.al.	2406.08407	link
2024-06-12	Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models	Chun-Yi Kuan et.al.	2406.08402	link
2024-06-12	cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers	Anirudh Sundar et.al.	2406.08398	null
2024-06-12	VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks	Jiannan Wu et.al.	2406.08394	link
2024-06-12	Large Language Models Must Be Taught to Know What They Don’t Know	Sanyam Kapoor et.al.	2406.08391	link
2024-06-12	Banal Deception Human-AI Ecosystems: A Study of People’s Perceptions of LLM-generated Deceptive Behaviour	Xiao Zhan et.al.	2406.08386	null
2024-06-13	APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation	Weizhao He et.al.	2406.08372	null
2024-06-11	A3VLM: Actionable Articulation-Aware Vision Language Model	Siyuan Huang et.al.	2406.07549	link
2024-06-11	Image and Video Tokenization with Binary Spherical Quantization	Yue Zhao et.al.	2406.07548	link
2024-06-11	Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena	Aidar Myrzakhan et.al.	2406.07545	link
2024-06-11	QuickLLaMA: Query-aware Inference Acceleration for Large Language Models	Jingyao Li et.al.	2406.07528	link
2024-06-11	Simple and Effective Masked Diffusion Language Models	Subham Sekhar Sahoo et.al.	2406.07524	link
2024-06-11	Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling	Liliang Ren et.al.	2406.07522	link
2024-06-11	Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement	Yunzhen Feng et.al.	2406.07515	null
2024-06-11	THaLLE: Text Hyperlocally Augmented Large Language Extension – Technical Report	KBTG Labs et.al.	2406.07505	null
2024-06-11	Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions	Renjie Pi et.al.	2406.07502	link
2024-06-11	TextGrad: Automatic “Differentiation” via Text	Mert Yuksekgonul et.al.	2406.07496	link
2024-06-11	CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization	Frederic Kirstein et.al.	2406.07494	null
2024-06-11	Paraphrasing in Affirmative Terms Improves Negation Understanding	MohammadHossein Rezaei et.al.	2406.07492	null
2024-06-11	PITCH: Productivity and Mental Well-being Coaching through Daily Conversational Interaction	Adnan Abbas et.al.	2406.07485	null
2024-06-11	Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing	Mao Li et.al.	2406.07483	null
2024-06-11	VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs	Zesen Cheng et.al.	2406.07476	link
2024-06-11	Anomaly Detection on Unstable Logs with GPT Models	Fatemeh Hadadi et.al.	2406.07467	null
2024-06-11	Estimating the Hallucination Rate of Generative AI	Andrew Jesson et.al.	2406.07457	null
2024-06-11	Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis	Qining Zhang et.al.	2406.07455	null
2024-06-11	On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations	Shiao Meng et.al.	2406.07444	link
2024-06-11	McEval: Massively Multilingual Code Evaluation	Linzheng Chai et.al.	2406.07436	null
2024-06-10	Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation	Peize Sun et.al.	2406.06525	link
2024-06-10	UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor	Shivani Upadhyay et.al.	2406.06519	link
2024-06-10	Merlin: A Vision Language Foundation Model for 3D Computed Tomography	Louis Blankemeier et.al.	2406.06512	null
2024-06-10	NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative	Asmar Nadeem et.al.	2406.06499	null
2024-06-10	Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation	Oishi Banerjee et.al.	2406.06496	null
2024-06-10	Can Language Models Serve as Text-Based World Simulators?	Ruoyao Wang et.al.	2406.06485	null
2024-06-10	Parallelizing Linear Transformers with the Delta Rule over Sequence Length	Songlin Yang et.al.	2406.06484	link
2024-06-10	Towards a Personal Health Large Language Model	Justin Cosentino et.al.	2406.06474	null
2024-06-10	AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction	Zhen Xing et.al.	2406.06465	null
2024-06-10	Transforming Wearable Data into Health Insights using Large Language Model Agents	Mike A. Merrill et.al.	2406.06464	null
2024-06-10	VCR: Visual Caption Restoration	Tianyu Zhang et.al.	2406.06462	link
2024-06-11	Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies	Junlin Wang et.al.	2406.06461	null
2024-06-10	Evaluating the Retrieval Component in LLM-Based Question Answering Systems	Ashkan Alinejad et.al.	2406.06458	null
2024-06-10	A Large Language Model Pipeline for Breast Cancer Oncology	Tristen Pool et.al.	2406.06455	null
2024-06-10	Insights from Social Shaping Theory: The Appropriation of Large Language Models in an Undergraduate Programming Course	Aadarsh Padiyath et.al.	2406.06451	null
2024-06-10	LLM Dataset Inference: Did you train on my dataset?	Pratyush Maini et.al.	2406.06443	link
2024-06-10	Interpretability of Language Models via Task Spaces	Lucas Weber et.al.	2406.06441	null
2024-06-10	Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain	Brian Hu et.al.	2406.06435	link
2024-06-10	Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking	Gabriel Rioux et.al.	2406.06425	null
2024-06-10	An Empirical Design Justice Approach to Identifying Ethical Considerations in the Intersection of Large Language Models and Social Robotics	Alva Markelius et.al.	2406.06400	null
2024-06-07	3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs	Jianing Yang et.al.	2406.05132	link
2024-06-07	An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models	Xiongtao Zhou et.al.	2406.05130	link
2024-06-07	Towards Semantic Equivalence of Tokenization in Multimodal LLM	Shengqiong Wu et.al.	2406.05127	null
2024-06-07	Large Generative Graph Models	Yu Wang et.al.	2406.05109	null
2024-06-07	LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration	Tavor Lipman et.al.	2406.05107	null
2024-06-07	Corpus Poisoning via Approximate Greedy Gradient Descent	Jinyan Su et.al.	2406.05087	link
2024-06-07	Multi-Head RAG: Solving Multi-Aspect Problems with LLMs	Maciej Besta et.al.	2406.05085	link
2024-06-07	SUMIE: A Synthetic Benchmark for Incremental Entity Summarization	Eunjeong Hwang et.al.	2406.05079	null
2024-06-07	Are Large Language Models More Empathetic than Humans?	Anuradha Welivita et.al.	2406.05063	null
2024-06-07	Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions	Shi-Yu Tian et.al.	2406.05055	null
2024-06-07	Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation	Nachiket Kotalwar et.al.	2406.05053	null
2024-06-07	Bootstrapping Referring Multi-Object Tracking	Yani Zhang et.al.	2406.05039	link
2024-06-07	Scenarios and Approaches for Situated Natural Language Explanations	Pengshuo Qiu et.al.	2406.05035	null
2024-06-07	CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search	Fengran Mo et.al.	2406.05013	link
2024-06-07	Compositional Generalization with Grounded Language Models	Sondre Wold et.al.	2406.04989	link
2024-06-07	Language models emulate certain cognitive profiles: An investigation of how predictability measures interact with individual differences	Patrick Haller et.al.	2406.04988	link
2024-06-07	MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter	Jitai Hao et.al.	2406.04984	link
2024-06-07	CityCraft: A Real Crafter for 3D City Generation	Jie Deng et.al.	2406.04983	null
2024-06-07	Quantifying Geospatial in the Common Crawl Corpus	Ilya Ilyankou et.al.	2406.04952	null
2024-06-07	BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense	Baktash Ansari et.al.	2406.04947	link
2024-06-06	Verbalized Machine Learning: Revisiting Machine Learning with Language Models	Tim Z. Xiao et.al.	2406.04344	null
2024-06-06	Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image	Stanislaw Szymanowicz et.al.	2406.04343	link
2024-06-06	Learning 1D Causal Visual Representation with De-focus Attention Networks	Chenxin Tao et.al.	2406.04342	link
2024-06-06	RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation	Jiaming Liu et.al.	2406.04339	null
2024-06-06	Coherent Zero-Shot Visual Instruction Generation	Quynh Phung et.al.	2406.04337	null
2024-06-06	DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs	Lingchen Meng et.al.	2406.04334	null
2024-06-06	PaCE: Parsimonious Concept Engineering for Large Language Models	Jinqi Luo et.al.	2406.04331	link
2024-06-06	Parameter-Inverted Image Pyramid Networks	Xizhou Zhu et.al.	2406.04330	link
2024-06-06	Simplified and Generalized Masked Diffusion for Discrete Data	Jiaxin Shi et.al.	2406.04329	link
2024-06-06	Causal Estimation of Memorisation Profiles	Pietro Lesci et.al.	2406.04327	link
2024-06-06	ShareGPT4Video: Improving Video Understanding and Generation with Better Captions	Lin Chen et.al.	2406.04325	null
2024-06-06	Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step	Zhanhao Liang et.al.	2406.04314	link
2024-06-06	Improving Alignment and Robustness with Short Circuiting	Andy Zou et.al.	2406.04313	link
2024-06-06	Semantically Diverse Language Generation for Uncertainty Estimation in Language Models	Lukas Aichberger et.al.	2406.04306	link
2024-06-06	Quixer: A Quantum Transformer Model	Nikhil Khatri et.al.	2406.04305	null
2024-06-06	Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models	Phat Nguyen et.al.	2406.04300	null
2024-06-06	VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval	Junjie Zhou et.al.	2406.04292	link
2024-06-06	Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation	Adam Fisch et.al.	2406.04291	null
2024-06-07	What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages	Nadav Borenstein et.al.	2406.04289	null
2024-06-06	Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People	Dun-Ming Huang et.al.	2406.04278	link
2024-06-05	Wings: Learning Multimodal LLMs without Text-only Forgetting	Yi-Kai Zhang et.al.	2406.03496	null
2024-06-06	Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training	Ao Sun et.al.	2406.03488	link
2024-06-05	Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends	Sanjana Ramprasad et.al.	2406.03487	null
2024-06-05	BIPED: Pedagogically Informed Tutoring System for ESL Education	Soonwoo Kwon et.al.	2406.03486	null
2024-06-05	Does your data spark joy? Performance gains from domain upsampling at the end of training	Cody Blakeney et.al.	2406.03476	null
2024-06-05	AD-H: Autonomous Driving with Hierarchical Agents	Zaibin Zhang et.al.	2406.03474	null
2024-06-05	What is the Best Way for ChatGPT to Translate Poetry?	Shanshan Wang et.al.	2406.03450	null
2024-06-05	Pre-trained Large Language Models Use Fourier Features to Compute Addition	Tianyi Zhou et.al.	2406.03445	null
2024-06-05	Are language models rational? The case of coherence norms and belief revision	Thomas Hofweber et.al.	2406.03442	null
2024-06-05	Cycles of Thought: Measuring LLM Confidence through Stable Explanations	Evan Becker et.al.	2406.03441	null
2024-06-05	Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis	Moein Heidari et.al.	2406.03430	link
2024-06-05	Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach	Saehyung Lee et.al.	2406.03411	link
2024-06-05	Automating Turkish Educational Quiz Generation Using Large Language Models	Kamyar Zeinalipour et.al.	2406.03397	link
2024-06-05	Log Parsing with Self-Generated In-Context Learning and Self-Correction	Yifan Wu et.al.	2406.03376	null
2024-06-05	IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models	David Ifeoluwa Adelani et.al.	2406.03368	null
2024-06-05	CLMASP: Coupling Large Language Models with Answer Set Programming for Robotic Task Planning	Xinrui Lin et.al.	2406.03367	null
2024-06-05	LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback	Timon Ziegenbein et.al.	2406.03363	null
2024-06-05	Save It for the “Hot” Day: An LLM-Empowered Visual Analytics System for Heat Risk Management	Haobo Li et.al.	2406.03317	null
2024-06-05	The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games	Mikhail Mozikov et.al.	2406.03299	null
2024-06-05	SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms	Xingrun Xing et.al.	2406.03287	link
2024-06-04	Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks	Tianyu He et.al.	2406.02550	link
2024-06-04	Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation	Mohamed El Amine Boudjoghra et.al.	2406.02548	link
2024-06-04	Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning	Alex Jinpeng Wang et.al.	2406.02547	link
2024-06-04	To Believe or Not to Believe Your LLM	Yasin Abbasi Yadkori et.al.	2406.02543	null
2024-06-04	Loki: Low-Rank Keys for Efficient Sparse Attention	Prajwal Singhania et.al.	2406.02542	link
2024-06-04	Parrot: Multilingual Visual Instruction Tuning	Hai-Long Sun et.al.	2406.02539	link
2024-06-04	TopViewRS: Vision-Language Models as Top-View Spatial Reasoners	Chengzu Li et.al.	2406.02537	link
2024-06-04	Mitigate Position Bias in Large Language Models via Scaling a Single Dimension	Yijiong Yu et.al.	2406.02536	link
2024-06-04	SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices	Ruslan Svirschevski et.al.	2406.02532	link
2024-06-04	Scalable MatMul-free Language Modeling	Rui-Jie Zhu et.al.	2406.02528	link
2024-06-04	CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks	Maciej Besta et.al.	2406.02524	link
2024-06-04	RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots	Soroush Nasiriany et.al.	2406.02523	null
2024-06-04	Demystifying the Compression of Mixture-of-Experts Through a Unified Framework	Shwai He et.al.	2406.02500	link
2024-06-04	Hiding Text in Large Language Models: Introducing Unconditional Token Forcing Confusion	Jakub Hoscilowicz et.al.	2406.02481	link
2024-06-04	Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding	Zhihan Zhang et.al.	2406.02472	link
2024-06-04	Meta-Designing Quantum Experiments with Language Models	Sören Arlt et.al.	2406.02470	null
2024-06-04	Seed-TTS: A Family of High-Quality Versatile Speech Generation Models	Philip Anastassiou et.al.	2406.02430	link
2024-06-04	Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion	Ruiqi Li et.al.	2406.02429	null
2024-06-04	GrootVL: Tree Topology is All You Need in State Space Model	Yicheng Xiao et.al.	2406.02395	link
2024-06-04	Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data	Maxime Griot et.al.	2406.02394	link
2024-05-31	Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis	Chaoyou Fu et.al.	2405.21075	null
2024-05-31	Code Pretraining Improves Entity Tracking Abilities of Language Models	Najoung Kim et.al.	2405.21068	null
2024-05-31	Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality	Tri Dao et.al.	2405.21060	link
2024-05-31	RydbergGPT	David Fitzek et.al.	2405.21052	link
2024-05-31	Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling	Jiatao Gu et.al.	2405.21048	null
2024-05-31	Grammar-Aligned Decoding	Kanghee Park et.al.	2405.21047	null
2024-05-31	Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF	Tengyang Xie et.al.	2405.21046	null
2024-05-31	Direct Alignment of Language Models via Quality-Aware Self-Refinement	Runsheng Yu et.al.	2405.21040	null
2024-05-31	Standards for Belief Representations in LLMs	Daniel A. Herrmann et.al.	2405.21030	null
2024-05-31	LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models	Elias Stengel-Eskin et.al.	2405.21028	link
2024-05-31	You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet	Zhen Qin et.al.	2405.21022	null
2024-05-31	Improved Techniques for Optimization-Based Jailbreaking on Large Language Models	Xiaojun Jia et.al.	2405.21018	link
2024-06-03	StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond	Pengyuan Lyu et.al.	2405.21013	null
2024-05-31	Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models	Yi Yang et.al.	2405.20991	link
2024-05-31	DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models	Linli Yao et.al.	2405.20985	link
2024-05-31	Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training	Feiteng Fang et.al.	2405.20978	link
2024-05-31	SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales	Tianyang Xu et.al.	2405.20974	link
2024-05-31	LCQ: Low-Rank Codebook based Quantization for Large Language Models	Wen-Pu Cai et.al.	2405.20973	null
2024-06-03	Large Language Models are Zero-Shot Next Location Predictors	Ciro Beneduce et.al.	2405.20962	link
2024-06-03	A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs’ Humour Alignment with Comedians	Piotr Wojciech Mirowski et.al.	2405.20956	null
2024-05-30	MotionLLM: Understanding Human Behaviors from Human Motions and Videos	Ling-Hao Chen et.al.	2405.20340	link
2024-05-30	Visual Perception by Large Language Model’s Weights	Feipeng Ma et.al.	2405.20339	link
2024-05-30	Xwin-LM: Strong and Scalable Alignment Practice for LLMs	Bolin Ni et.al.	2405.20335	link
2024-05-31	ParSEL: Parameterized Shape Editing with Language	Aditya Ganeshan et.al.	2405.20319	null
2024-05-30	CausalQuest: Collecting Natural Causal Questions for AI Agents	Roberto Ceraolo et.al.	2405.20318	link
2024-05-30	ANAH: Analytical Annotation of Hallucinations in Large Language Models	Ziwei Ji et.al.	2405.20315	link
2024-05-30	Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation	Guillaume Huguet et.al.	2405.20313	link
2024-05-30	Large Language Models Can Self-Improve At Web Agent Tasks	Ajay Patel et.al.	2405.20309	link
2024-05-30	Can’t make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models	Himangi Mittal et.al.	2405.20305	null
2024-05-30	Group Robust Preference Optimization in Reward-free RLHF	Shyam Sundhar Ramesh et.al.	2405.20304	link
2024-05-30	Who Writes the Review, Human or AI?	Panagiotis C. Theocharopoulos et.al.	2405.20285	null
2024-05-30	ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections	Massimo Bini et.al.	2405.20271	link
2024-05-30	Evaluating Large Language Model Biases in Persona-Steered Generation	Andy Liu et.al.	2405.20253	link
2024-05-30	Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization	Yuchi Liu et.al.	2405.20252	link
2024-05-30	Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use	Franz Louis Cesista et.al.	2405.20245	null
2024-05-30	Context Injection Attacks on Large Language Models	Cheng’an Wei et.al.	2405.20234	null
2024-05-30	Data-efficient fine-tuning of foundational models for first-principles quality sublimation enthalpies	Harveen Kaur et.al.	2405.20217	null
2024-05-30	TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models	Chen Zhang et.al.	2405.20215	null
2024-05-30	One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient Deployments	Ke Yi et.al.	2405.20202	null
2024-05-31	Using Large Language Models for Humanitarian Frontline Negotiation: Opportunities and Considerations	Zilin Ma et.al.	2405.20195	null
2024-05-29	X-VILA: Cross-Modality Alignment for Large Language Model	Hanrong Ye et.al.	2405.19335	null
2024-05-29	LLMs Meet Multimodal Generation and Editing: A Survey	Yingqing He et.al.	2405.19334	link
2024-05-29	Multi-Modal Generative Embedding Model	Feipeng Ma et.al.	2405.19333	null
2024-05-29	Self-Exploring Language Models: Active Preference Elicitation for Online Alignment	Shenao Zhang et.al.	2405.19332	link
2024-05-29	Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation	Atrisha Sarkar et.al.	2405.19328	null
2024-05-29	MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series	Ge Zhang et.al.	2405.19327	link
2024-05-29	Reasoning3D – Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models	Tianrun Chen et.al.	2405.19326	null
2024-05-29	Nearest Neighbor Speculative Decoding for LLM Generation and Attribution	Minghan Li et.al.	2405.19325	null
2024-05-29	Are Large Language Models Chameleons?	Mingmeng Geng et.al.	2405.19323	null
2024-05-29	Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF	Shicong Cen et.al.	2405.19320	null
2024-05-29	Robust Preference Optimization through Reward Model Distillation	Adam Fisch et.al.	2405.19316	null
2024-05-29	Matryoshka Query Transformer for Large Vision-Language Models	Wenbo Hu et.al.	2405.19315	link
2024-05-29	Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice	Jian-Qiao Zhu et.al.	2405.19313	null
2024-05-29	Expert-Guided Extinction of Toxic Tokens for Debiased Generation	Xueyao Sun et.al.	2405.19299	null
2024-05-29	MASSIVE Multilingual Abstract Meaning Representation: A Dataset and Baselines for Hallucination Detection	Michael Regan et.al.	2405.19285	null
2024-05-29	Optimizing Foundation Model Inference on a Many-tiny-core Open-source RISC-V Platform	Viviane Potocnik et.al.	2405.19284	null
2024-05-29	Programmable Motion Generation for Open-Set Motion Control Tasks	Hanchao Liu et.al.	2405.19283	null
2024-05-29	PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications	Dingkang Yang et.al.	2405.19266	link
2024-05-29	AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data	Zifan Song et.al.	2405.19265	link
2024-05-29	Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models	Zhanhui Zhou et.al.	2405.19262	link
2024-05-28	Why are Visually-Grounded Language Models Bad at Image Classification?	Yuhui Zhang et.al.	2405.18415	link
2024-05-28	Don’t Forget to Connect! Improving RAG with Graph-based Reranking	Jialin Dong et.al.	2405.18414	null
2024-05-28	WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization	Jiawei Ma et.al.	2405.18405	null
2024-05-29	Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass	Ethan Shen et.al.	2405.18400	link
2024-05-28	Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning	Yixiao Zhang et.al.	2405.18386	link
2024-05-28	OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning	Pengxiang Li et.al.	2405.18380	link
2024-05-28	LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models	Anthony Sarah et.al.	2405.18377	null
2024-05-28	Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning	Dongjie Chen et.al.	2405.18376	link
2024-05-28	Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning	Phakphum Artkaew et.al.	2405.18375	link
2024-05-28	PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework	Eshaan Agarwal et.al.	2405.18369	null
2024-05-28	Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?	Yifan Bai et.al.	2405.18361	null
2024-05-28	Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs	Somnath Kumar et.al.	2405.18359	null
2024-05-28	MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning	Somnath Kumar et.al.	2405.18358	null
2024-05-28	Faithful Logical Reasoning via Symbolic Chain-of-Thought	Jundong Xu et.al.	2405.18357	link
2024-05-28	Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography	Jie Liu et.al.	2405.18356	link
2024-05-28	Intelligent Clinical Documentation: Harnessing Generative AI for Patient-Centric Clinical Note Generation	Anjanava Biswas et.al.	2405.18346	null
2024-05-28	The Battle of LLMs: A Comparative Study in Conversational QA Tasks	Aryan Rangapur et.al.	2405.18344	null
2024-05-28	Frustratingly Easy Test-Time Adaptation of Vision-Language Models	Matteo Farina et.al.	2405.18330	link
2024-05-28	Multi-modal Generation via Cross-Modal In-Context Learning	Amandeep Kumar et.al.	2405.18304	link
2024-05-28	Semantic are Beacons: A Semantic Perspective for Unveiling Parameter-Efficient Fine-Tuning in Knowledge Learning	Renzhi Wang et.al.	2405.18292	null
2024-05-27	Matryoshka Multimodal Models	Mu Cai et.al.	2405.17430	null
2024-05-27	NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models	Chankyu Lee et.al.	2405.17428	null
2024-05-27	Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model	Kuan-Chih Huang et.al.	2405.17427	link
2024-05-27	LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence	Zhuoling Li et.al.	2405.17424	null
2024-05-27	Privacy-Aware Visual Language Models	Laurens Samson et.al.	2405.17423	null
2024-05-27	Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation	Jiaming Liu et.al.	2405.17418	null
2024-05-27	THREAD: Thinking Deeper with Recursive Spawning	Philip Schroeder et.al.	2405.17402	link
2024-05-27	The Expressive Capacity of State Space Models: A Formal Language Perspective	Yash Sarrof et.al.	2405.17394	null
2024-05-27	MindMerger: Efficient Boosting LLM Reasoning in non-English Languages	Zixian Huang et.al.	2405.17386	link
2024-05-27	Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective	Zhen Qin et.al.	2405.17383	null
2024-05-27	ReMoDetect: Reward Models Recognize Aligned LLM’s Generations	Hyunseok Lee et.al.	2405.17382	link
2024-05-27	Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention	Zhen Qin et.al.	2405.17381	link
2024-05-27	RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects	Ahmed Allam et.al.	2405.17378	link
2024-05-28	Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models	ShengYun Peng et.al.	2405.17374	link
2024-05-27	Prompt Optimization with Human Feedback	Xiaoqiang Lin et.al.	2405.17346	link
2024-05-27	Exploring and steering the moral compass of Large Language Models	Alejandro Tlaie et.al.	2405.17345	link
2024-05-27	Cost-efficient Knowledge-based Question Answering with Large Language Models	Junnan Dong et.al.	2405.17337	null
2024-05-27	XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser	Xianfu Cheng et.al.	2405.17336	link
2024-05-27	FedHPL: Efficient Heterogeneous Federated Learning with Prompt Tuning and Logit Distillation	Yuting Ma et.al.	2405.17267	null
2024-05-27	On the Noise Robustness of In-Context Learning for Text Generation	Hongfu Gao et.al.	2405.17264	link
2024-05-24	Scaling Laws for Discriminative Classification in Large Language Models	Dean Wyatte et.al.	2405.15765	null
2024-05-24	Filtered Corpus Training (FiCT) Shows that Language Models can Generalize from Indirect Evidence	Abhinav Patil et.al.	2405.15750	link
2024-05-24	Sparse maximal update parameterization: A holistic approach to sparse training dynamics	Nolan Dey et.al.	2405.15743	link
2024-05-24	Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias	Andres Algaba et.al.	2405.15739	link
2024-05-24	LM4LV: A Frozen Large Language Model for Low-level Vision Tasks	Boyang Zheng et.al.	2405.15734	link
2024-05-24	Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks	Jerome Sieber et.al.	2405.15731	link
2024-05-24	Optimizing Large Language Models for OpenAPI Code Completion	Bohdan Petryshyn et.al.	2405.15729	link
2024-05-24	Disease-informed Adaptation of Vision-Language Models	Jiajin Zhang et.al.	2405.15728	link
2024-05-24	The Impact of Geometric Complexity on Neural Collapse in Transfer Learning	Michael Munn et.al.	2405.15706	null
2024-05-24	Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models	Yue Zhang et.al.	2405.15684	null
2024-05-24	VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap	Sreyan Ghosh et.al.	2405.15683	link
2024-05-24	What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models	Abdelrahman Abdelhamed et.al.	2405.15668	null
2024-05-24	Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning	Wenhan Chang et.al.	2405.15662	null
2024-05-24	$\mathbf{L^2\cdot M = C^2}$ Large Language Models as Covert Channels… a Systematic Analysis	Simen Gaure et.al.	2405.15652	null
2024-05-24	LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots	Ruoyu Wang et.al.	2405.15646	null
2024-05-24	GECKO: Generative Language Model for English, Code and Korean	Sungwoo Oh et.al.	2405.15640	null
2024-05-24	M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models	Hongyu Wang et.al.	2405.15638	link
2024-05-24	GPTZoo: A Large-scale Dataset of GPTs for the Research Community	Xinyi Hou et.al.	2405.15630	link
2024-05-24	A Comparative Analysis of Distributed Training Strategies for GPT-2	Ishan Patwardhan et.al.	2405.15628	null
2024-05-24	Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment	Hao Sun et.al.	2405.15624	null
2024-05-23	PuzzleAvatar: Assembling 3D Avatars from Personal Albums	Yuliang Xiu et.al.	2405.14869	link
2024-05-23	A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns	Asaf Yehudai et.al.	2405.14863	null
2024-05-23	Bitune: Bidirectional Instruction-Tuning	Dawid J. Kopiczko et.al.	2405.14862	null
2024-05-23	Not All Language Model Features Are Linear	Joshua Engels et.al.	2405.14860	link
2024-05-23	PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression	Vladimir Malinovskii et.al.	2405.14852	link
2024-05-23	A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis	Yue Yang et.al.	2405.14839	null
2024-05-23	From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step	Yuntian Deng et.al.	2405.14838	link
2024-05-23	HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models	Bernal Jiménez Gutiérrez et.al.	2405.14831	link
2024-05-23	Designing A Sustainable Marine Debris Clean-up Framework without Human Labels	Raymond Wang et.al.	2405.14815	link
2024-05-23	As an AI Language Model, “Yes I Would Recommend Calling the Police”: Norm Inconsistency in LLM Decision-Making	Shomik Jain et.al.	2405.14812	null
2024-05-23	Implicit Personalization in Language Models: A Systematic Study	Zhijing Jin et.al.	2405.14808	link
2024-05-23	Can LLMs Solve longer Math Word Problems Better?	Xin Xu et.al.	2405.14804	link
2024-05-23	Lessons from the Trenches on Reproducible Evaluation of Language Models	Stella Biderman et.al.	2405.14782	null
2024-05-23	WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models	Peng Wang et.al.	2405.14768	link
2024-05-23	FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models	Hongyang Yang et.al.	2405.14767	link
2024-05-23	Evaluating Large Language Models for Public Health Classification and Extraction Tasks	Joshua Harris et.al.	2405.14766	null
2024-05-23	Large language models can be zero-shot anomaly detectors for time series?	Sarah Alnegheimish et.al.	2405.14755	link
2024-05-23	A Transformer-Based Approach for Smart Invocation of Automatic Code Completion	Aral de Moor et.al.	2405.14753	link
2024-05-23	MultiCast: Zero-Shot Multivariate Time Series Forecasting Using LLMs	Georgios Chatzigeorgakidis et.al.	2405.14748	null
2024-05-23	Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View	Xuan Liu et.al.	2405.14744	null
2024-05-21	Reducing Transformer Key-Value Cache Size with Cross-Layer Attention	William Brandon et.al.	2405.12981	null
2024-05-21	OmniGlue: Generalizable Feature Matching with Foundation Model Guidance	Hanwen Jiang et.al.	2405.12979	link
2024-05-21	BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once	Theodore Zhao et.al.	2405.12971	null
2024-05-21	Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale	Shriram Chennakesavalu et.al.	2405.12961	link
2024-05-21	Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models	Zhangyue Yin et.al.	2405.12939	link
2024-05-21	Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs	Bilgehan Sel et.al.	2405.12933	null
2024-05-21	Code-mixed Sentiment and Hate-speech Prediction	Anjali Yadav et.al.	2405.12929	link
2024-05-21	Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples	Tim Menzies et.al.	2405.12920	link
2024-05-21	G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation	Xingyuan Pan et.al.	2405.12915	link
2024-05-21	An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation	Zhiyu Tan et.al.	2405.12914	link
2024-05-21	Topic Modelling Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment	Holli Sargeant et.al.	2405.12910	link
2024-05-21	Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents	San Kim et.al.	2405.12900	null
2024-05-21	Investigating Persuasion Techniques in Arabic: An Empirical Study Leveraging Large Language Models	Abdurahmman Alzahrani et.al.	2405.12884	null
2024-05-21	LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language	James Requeima et.al.	2405.12856	link
2024-05-21	OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models	Zhaojian Yu et.al.	2405.12843	link
2024-05-21	SmartFlow: Robotic Process Automation using LLMs	Arushi Jain et.al.	2405.12842	null
2024-05-21	Large Language Models Meet NLP: A Survey	Libo Qin et.al.	2405.12819	link
2024-05-21	Test Oracle Automation in the era of LLMs	Facundo Molina et.al.	2405.12766	null
2024-05-21	C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning	Ji Ma et.al.	2405.12752	null
2024-05-21	Generative AI and Large Language Models for Cyber Security: All Insights You Need	Mohamed Amine Ferrag et.al.	2405.12750	null
2024-05-20	Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning	Guanglin Zhou et.al.	2405.12217	link
2024-05-20	MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark	Hongwei Liu et.al.	2405.12209	link
2024-05-20	Developers’ Perceptions on the Impact of ChatGPT in Software Development: A Survey	Thiago S. Vaillant et.al.	2405.12195	link
2024-05-20	CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models	Haoxiang Shi et.al.	2405.12174	null
2024-05-20	Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging	Xiaobo Liang et.al.	2405.12163	link
2024-05-20	Eliciting Problem Specifications via Large Language Models	Robert E. Wray et.al.	2405.12147	null
2024-05-20	DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM	Xuchen Li et.al.	2405.12139	null
2024-05-20	MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning	Ting Jiang et.al.	2405.12130	link
2024-05-20	Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation	Zhankui He et.al.	2405.12119	null
2024-05-20	Imp: Highly Capable Large Multimodal Models for Mobile Devices	Zhenwei Shao et.al.	2405.12107	link
2024-05-20	DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction	Hao Chen et.al.	2405.12100	null
2024-05-20	Distributional Semantics, Holism, and the Instability of Meaning	Jumbly Grindrod et.al.	2405.12084	null
2024-05-20	PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation	Zhuobin Huang et.al.	2405.12079	null
2024-05-20	CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models	Tong Zhang et.al.	2405.12063	link
2024-05-20	STYLE: Improving Domain Transferability of Asking Clarification Questions in Large Language Model Powered Conversational Agents	Yue Chen et.al.	2405.12059	null
2024-05-20	KG-RAG: Bridging the Gap Between Knowledge and Creativity	Diego Sanmartin et.al.	2405.12035	null
2024-05-20	Can AI Relate: Testing Large Language Model Response for Mental Health Support	Saadia Gabriel et.al.	2405.12021	link
2024-05-20	MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering	Jingqun Tang et.al.	2405.11985	link
2024-05-20	A review on the use of large language models as virtual tutors	Silvia García-Méndez et.al.	2405.11983	null
2024-05-20	Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays	Zhichao Sun et.al.	2405.11976	link
2024-05-17	Observational Scaling Laws and the Predictability of Language Model Performance	Yangjun Ruan et.al.	2405.10938	link
2024-05-17	A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers	Kaiyu Huang et.al.	2405.10936	link
2024-05-17	The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks	Lucius Bushnaq et.al.	2405.10928	link
2024-05-17	Blackbox Adaptation for Medical Image Segmentation	Jay N. Paranjape et.al.	2405.10913	link
2024-05-17	COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain	Dimitrios P. Panagoulias et.al.	2405.10893	null
2024-05-17	Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review	Hongyi Yang et.al.	2405.10883	null
2024-05-17	ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains	Zhaopei Huang et.al.	2405.10860	link
2024-05-17	The Future of Large Language Model Pre-training is Federated	Lorenzo Sani et.al.	2405.10853	null
2024-05-17	Open-Vocabulary Spatio-Temporal Action Detection	Tao Wu et.al.	2405.10832	null
2024-05-17	Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities	Hao Zhou et.al.	2405.10825	null
2024-05-17	ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios	Markus Bayer et.al.	2405.10808	null
2024-05-17	The Relational Machine Calculus	Chris Barrett et.al.	2405.10801	null
2024-05-17	Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings	Albert Sawczyn et.al.	2405.10745	null
2024-05-17	Efficient Multimodal Large Language Models: A Survey	Yizhang Jin et.al.	2405.10739	link
2024-05-17	INDUS: Effective and Efficient Language Models for Scientific Applications	Bishwaranjan Bhattacharjee et.al.	2405.10725	null
2024-05-17	SignLLM: Sign Languages Production Large Language Models	Sen Fang et.al.	2405.10718	null
2024-05-17	Persian Pronoun Resolution: Leveraging Neural Networks and Language Models	Hassan Haji Mohammadi et.al.	2405.10714	null
2024-05-17	SynDy: Synthetic Dynamic Dataset Generation Framework for Misinformation Tasks	Michael Shliselberg et.al.	2405.10700	null
2024-05-17	Revolutionizing Process Mining: A Novel Architecture for ChatGPT Integration and Enhanced User Experience through Optimized Prompt Engineering	Mehrdad Agha Mohammad Ali Kermani et.al.	2405.10689	null
2024-05-17	Realistic Evaluation of Toxicity in Large Language Models	Tinh Son Luong et.al.	2405.10659	null
2024-05-16	UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models	Sahel Sharifymoghaddam et.al.	2405.10311	link
2024-05-16	4D Panoptic Scene Graph Generation	Jingkang Yang et.al.	2405.10305	link
2024-05-16	Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees	Yu Gui et.al.	2405.10301	link
2024-05-16	HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models	Rhea Sanjay Sukthanker et.al.	2405.10299	link
2024-05-17	Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning	Yuexiang Zhai et.al.	2405.10292	null
2024-05-16	Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction	Jianhao Chen et.al.	2405.10288	link
2024-05-16	FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models	Adrian Bulat et.al.	2405.10286	null
2024-05-16	Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers	Tuo Zhang et.al.	2405.10276	null
2024-05-16	Keep It Private: Unsupervised Privatization of Online Text	Calvin Bao et.al.	2405.10260	link
2024-05-16	When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models	Xianzheng Ma et.al.	2405.10255	link
2024-05-16	PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology	George Shaikovski et.al.	2405.10254	null
2024-05-16	A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks	Xuanfan Ni et.al.	2405.10251	null
2024-05-16	IntelliExplain: Enhancing Interactive Code Generation through Natural Language Explanations for Non-Professional Programmers	Hao Yan et.al.	2405.10250	null
2024-05-16	A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts	Xinru Zhang et.al.	2405.10246	link
2024-05-16	DocuMint: Docstring Generation for Python using Small Language Models	Bibek Poudel et.al.	2405.10243	link
2024-05-16	Low-Rank Adaptation of Time Series Foundational Models for Out-of-Domain Modality Forecasting	Divij Gupta et.al.	2405.10216	null
2024-05-16	CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations	Jiahao Zhao et.al.	2405.10212	link
2024-05-16	LFED: A Literary Fiction Evaluation Dataset for Large Language Models	Linhao Yu et.al.	2405.10166	link
2024-05-16	PIR: Remote Sensing Image-Text Retrieval with Prior Instruction Representation Learning	Jiancheng Pan et.al.	2405.10160	link
2024-05-16	Speaker Verification in Agent-Generated Conversations	Yizhe Yang et.al.	2405.10150	null
2024-05-15	Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming	Bushi Xiao et.al.	2405.09508	null
2024-05-15	Constrained Learning for Causal Inference and Semiparametric Statistics	Tiffany Tianhui Cai et.al.	2405.09493	null
2024-05-15	Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts	Donya Rooein et.al.	2405.09482	null
2024-05-15	Tell Me Why: Explainable Public Health Fact-Checking with Large Language Models	Majid Zarharan et.al.	2405.09454	link
2024-05-15	M$^4$oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts	Yufeng Jiang et.al.	2405.09446	link
2024-05-15	Facilitating Opinion Diversity through Hybrid NLP Approaches	Michiel van der Meer et.al.	2405.09439	null
2024-05-15	A Survey On Text-to-3D Contents Generation In The Wild	Chenhan Jiang et.al.	2405.09431	null
2024-05-15	MicroPython Testbed for Federated Learning Algorithms	Miroslav Popovic et.al.	2405.09423	link
2024-05-15	Matching domain experts by training from scratch on domain knowledge	Xiaoliang Luo et.al.	2405.09395	null
2024-05-15	Compositional imprecise probability	Jack Liell-Cock et.al.	2405.09391	null
2024-05-15	PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models	Devansh Jain et.al.	2405.09373	link
2024-05-15	SARATR-X: A Foundation Model for Synthetic Aperture Radar Images Target Recognition	Weijie L et.al.	2405.09365	link
2024-05-15	Large Language Model Bias Mitigation from the Perspective of Knowledge Editing	Ruizhe Chen et.al.	2405.09341	null
2024-05-15	Prompting-based Synthetic Data Generation for Few-Shot Question Answering	Maximilian Schmidt et.al.	2405.09335	link
2024-05-15	Transfer Learning in Pre-Trained Large Language Models for Malware Detection Based on System Calls	Pedro Miguel Sánchez Sánchez et.al.	2405.09318	null
2024-05-15	Comparing the Efficacy of GPT-4 and Chat-GPT in Mental Health Care: A Blind Assessment of Large Language Models for Psychological Support	Birger Moell et.al.	2405.09300	null
2024-05-15	Do language models capture implied discourse meanings? An investigation with exhaustivity implicatures of Korean morphology	Hagyeong Shin et.al.	2405.09293	null
2024-05-15	Sign of the Times: Evaluating the use of Large Language Models for Idiomaticity Detection	Dylan Phelps et.al.	2405.09279	null
2024-05-15	Dynamic Activation Pitfalls in LLaMA Models: An Empirical Study	Chi Ma et.al.	2405.09274	null
2024-05-15	New Textual Corpora for Serbian Language Modeling	Mihailo Škorić et.al.	2405.09250	null
2024-05-14	Efficient Vision-Language Pre-training by Cluster Masking	Zihao Wei et.al.	2405.08815	link
2024-05-14	Towards Enhanced RAC Accessibility: Leveraging Datasets and LLMs	Edison Jair Bejarano Sepulveda et.al.	2405.08792	link
2024-05-14	Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring	Tiantian Zhang et.al.	2405.08786	link
2024-05-14	Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs	Akhila Yerukola et.al.	2405.08760	link
2024-05-14	Distributed Threat Intelligence at the Edge Devices: A Large Language Model-Driven Approach	Syed Mhamudul Hasan et.al.	2405.08755	null
2024-05-14	Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding	Zhimin Li et.al.	2405.08748	link
2024-05-14	Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory	Xueyan Niu et.al.	2405.08707	null
2024-05-14	EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera	Beilei Cui et.al.	2405.08672	link
2024-05-14	Promoting AI Equity in Science: Generalized Domain Prompt Learning for Accessible VLM Research	Qinglong Cao et.al.	2405.08668	link
2024-05-14	Thinking Tokens for Language Modeling	David Herel et.al.	2405.08644	null
2024-05-15	ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation	Dimitris Gkoumas et.al.	2405.08619	null
2024-05-14	A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine	Hanguang Xiao et.al.	2405.08603	null
2024-05-15	EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark	Xiaohui Zhang et.al.	2405.08596	link
2024-05-14	Open-Vocabulary Object Detection via Neighboring Region Attention Alignment	Sunyuan Qiang et.al.	2405.08593	null
2024-05-14	Improving Transformers with Dynamically Composable Multi-Head Attention	Da Xiao et.al.	2405.08553	link
2024-05-14	Self-Distillation Improves DNA Sequence Inference	Tong Yu et.al.	2405.08538	link
2024-05-14	Falcon 7b for Software Mention Detection in Scholarly Documents	AmeerAli Khan et.al.	2405.08514	null
2024-05-14	Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil Procedure	Odysseas S. Chlapanis et.al.	2405.08502	link
2024-05-14	Is Less More? Quality, Quantity and Context in Idiom Processing with Natural Language Models	Agne Knietaite et.al.	2405.08497	link
2024-05-14	Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language Models	Andrea Piergentili et.al.	2405.08477	null
2024-05-13	Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots	Chengyue Wu et.al.	2405.07990	null
2024-05-13	A Generalist Learner for Multifaceted Medical Image Interpretation	Hong-Yu Zhou et.al.	2405.07988	null
2024-05-13	The Platonic Representation Hypothesis	Minyoung Huh et.al.	2405.07987	link
2024-05-13	Investigating the Semantic Robustness of CLIP-based Zero-Shot Anomaly Segmentation	Kevin Stangl et.al.	2405.07969	null
2024-05-13	PyZoBot: A Platform for Conversational Information Extraction and Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented Generation	Suad Alshammari et.al.	2405.07963	link
2024-05-13	AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments	Samuel Schmidgall et.al.	2405.07960	null
2024-05-13	EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning	Yinzhu Quan et.al.	2405.07938	link
2024-05-13	PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition	Ziyang Zhang et.al.	2405.07932	link
2024-05-13	Stable Diffusion-based Data Augmentation for Federated Learning with Non-IID Data	Mahdi Morafah et.al.	2405.07925	null
2024-05-13	Can Better Text Semantics in Prompt Tuning Improve VLM Generalization?	Hari Chandana Kuchibhotla et.al.	2405.07921	null
2024-05-13	A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking	Ferdinand Schlatt et.al.	2405.07920	link
2024-05-13	PLUTO: Pathology-Universal Transformer	Dinkar Juyal et.al.	2405.07905	null
2024-05-13	Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers	Alena Tsanda et.al.	2405.07886	link
2024-05-13	Zero-Shot Tokenizer Transfer	Benjamin Minixhofer et.al.	2405.07883	link
2024-05-13	RLHF Workflow: From Reward Modeling to Online RLHF	Hanze Dong et.al.	2405.07863	link
2024-05-13	Can LLMs Help Predict Elections? (Counter)Evidence from the World’s Largest Democracy	Pratik Gujral et.al.	2405.07828	null
2024-05-13	A View of How Language Models Will Transform Law	Frank Fagan et.al.	2405.07826	null
2024-05-13	FreeVA: Offline MLLM as Training-Free Video Assistant	Wenhao Wu et.al.	2405.07798	link
2024-05-13	DEPTH: Discourse Education through Pre-Training Hierarchically	Zachary Bamberger et.al.	2405.07788	link
2024-05-13	Generating Human Motion in 3D Scenes from Text Descriptions	Zhi Cen et.al.	2405.07784	null
2024-05-10	Linearizing Large Language Models	Jean Mercat et.al.	2405.06640	link
2024-05-10	Value Augmented Sampling for Language Model Alignment and Personalization	Seungwook Han et.al.	2405.06639	link
2024-05-10	Multimodal LLMs Struggle with Basic Visual Network Analysis: a VNA Benchmark	Evan M. Williams et.al.	2405.06634	link
2024-05-10	Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models	Chakshu Moar et.al.	2405.06626	null
2024-05-10	Explaining Text Similarity in Transformer Models	Alexandros Vasileiou et.al.	2405.06604	link
2024-05-10	Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach	Elham Ravanbakhsh et.al.	2405.06586	null
2024-05-10	What Can Natural Language Processing Do for Peer Review?	Ilia Kuznetsov et.al.	2405.06563	link
2024-05-10	Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval	Mengjia Niu et.al.	2405.06545	null
2024-05-10	Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts	Wenyu Huang et.al.	2405.06524	null
2024-05-10	UniDM: A Unified Framework for Data Manipulation with Large Language Models	Yichen Qian et.al.	2405.06510	null
2024-05-10	Storypark: Leveraging Large Language Models to Enhance Children Story Learning Through Child-AI collaboration Storytelling	Lyumanshan Ye et.al.	2405.06495	null
2024-05-10	Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-Label Medical Image Classification	Yaoqin Ye et.al.	2405.06468	link
2024-05-10	Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation	JoonHo Lee et.al.	2405.06424	link
2024-05-10	Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions?	Hunter McNichols et.al.	2405.06414	link
2024-05-10	Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL	Ning Cheng et.al.	2405.06410	null
2024-05-10	Program Synthesis using Inductive Logic Programming for the Abstraction and Reasoning Corpus	Filipe Marinho Rocha et.al.	2405.06399	null
2024-05-10	Memory Mosaics	Jianyu Zhang et.al.	2405.06394	link
2024-05-10	LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play	Li-Chun Lu et.al.	2405.06373	link
2024-05-10	LMD3: Language Model Data Density Dependence	John Kirchenbauer et.al.	2405.06331	null
2024-05-10	Correlation Dimension of Natural Language in a Statistical Manifold	Xin Du et.al.	2405.06321	null
2024-05-09	Natural Language Processing RELIES on Linguistics	Juri Opitz et.al.	2405.05966	null
2024-05-09	OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning	Dan Qiao et.al.	2405.05957	link
2024-05-09	Probing Multimodal LLMs as World Models for Driving	Shiva Sreeram et.al.	2405.05956	link
2024-05-09	Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning	Junzhi Chen et.al.	2405.05955	link
2024-05-09	CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts	Jiachen Li et.al.	2405.05949	link
2024-05-09	DOLOMITES: Domain-Specific Long-Form Methodical Tasks	Chaitanya Malaviya et.al.	2405.05938	null
2024-05-09	Trustworthy AI-Generative Content in Intelligent 6G Network: Adversarial, Privacy, and Fairness	Siyuan Li et.al.	2405.05930	null
2024-05-09	Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?	Zorik Gekhman et.al.	2405.05904	null
2024-05-09	Co-driver: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes	Ziang Guo et.al.	2405.05885	link
2024-05-09	FlockGPT: Guiding UAV Flocking with Linguistic Orchestration	Artem Lykov et.al.	2405.05872	null
2024-05-09	Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control	Gunshi Gupta et.al.	2405.05852	link
2024-05-09	Robots Can Feel: LLM-based Framework for Robot Ethical Reasoning	Artem Lykov et.al.	2405.05824	link
2024-05-09	Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference	Zhihang Lin et.al.	2405.05803	link
2024-05-09	Towards a More Inclusive AI: Progress and Perspectives in Large Language Model Training for the Sámi Language	Ronny Paul et.al.	2405.05777	null
2024-05-09	Experimental Pragmatics with Machines: Testing LLM Predictions for the Inferences of Plain and Embedded Disjunctions	Polina Tsvilodub et.al.	2405.05776	null
2024-05-09	Large Language Model-Aided Evolutionary Search for Constrained Multiobjective Optimization	Zeyi Wang et.al.	2405.05767	null
2024-05-09	Similarity Guided Multimodal Fusion Transformer for Semantic Location Prediction in Social Media	Zhizhen Zhang et.al.	2405.05760	null
2024-05-09	Exploring the Potential of Human-LLM Synergy in Advancing Qualitative Analysis: A Case Study on Mental-Illness Stigma	Han Meng et.al.	2405.05758	null
2024-05-09	Can large language models understand uncommon meanings of common words?	Jinyang Wu et.al.	2405.05741	null
2024-05-09	Evaluating Dialect Robustness of Language Models via Conversation Understanding	Dipankar Srirag et.al.	2405.05688	link
2024-05-08	THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models	Prannay Kaul et.al.	2405.05256	null
2024-05-08	You Only Cache Once: Decoder-Decoder Architectures for Language Models	Yutao Sun et.al.	2405.05254	link
2024-05-08	Open Source Language Models Can Provide Feedback: Evaluating LLMs’ Ability to Help Students Using GPT-4-As-A-Judge	Charles Koutcheme et.al.	2405.05253	link
2024-05-09	LLMs with Personalities in Multi-issue Negotiation Games	Sean Noh et.al.	2405.05248	null
2024-05-08	EVA-X: A Foundation Model for General Chest X-ray Analysis with Self-supervised Learning	Jingfeng Yao et.al.	2405.05237	link
2024-05-08	SuFIA: Language-Guided Augmented Dexterity for Robotic Surgical Assistants	Masoud Moghani et.al.	2405.05226	null
2024-05-08	Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers	Jiuxiang Gu et.al.	2405.05219	null
2024-05-08	FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models	Jinglin Xu et.al.	2405.05216	link
2024-05-08	MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning	Inderjeet Nair et.al.	2405.05189	link
2024-05-08	Encoder-Decoder Framework for Interactive Free Verses with Generation with Controllable High-Quality Rhyming	Tommaso Pasini et.al.	2405.05176	null
2024-05-08	Air Gap: Protecting Privacy-Conscious Conversational Agents	Eugene Bagdasaryan et.al.	2405.05175	null
2024-05-08	XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples	Peiqin Lin et.al.	2405.05116	link
2024-05-08	QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs	Weijia Zhang et.al.	2405.05109	null
2024-05-08	Concerns on Bias in Large Language Models when Creating Synthetic Personae	Helena A. Haxvig et.al.	2405.05080	null
2024-05-08	Impact of Tone-Aware Explanations in Recommender Systems	Ayano Okoso et.al.	2405.05061	null
2024-05-08	Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models	Aylin Gunal et.al.	2405.05060	null
2024-05-08	Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources	Lasse Hyldig Hansen et.al.	2405.05049	null
2024-05-08	${M^2D}$ NeRF: Multi-Modal Decomposition NeRF with 3D Feature Fields	Ning Wang et.al.	2405.05010	null
2024-05-08	ADELIE: Aligning Large Language Models on Information Extraction	Yunjia Qi et.al.	2405.05008	link
2024-05-08	NAVRepair: Node-type Aware C/C++ Code Vulnerability Repair	Ruoke Wang et.al.	2405.04994	null
2024-05-07	ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning	Jing Lin et.al.	2405.04533	null
2024-05-07	QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving	Yujun Lin et.al.	2405.04532	link
2024-05-07	NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts	Shudan Zhang et.al.	2405.04520	null
2024-05-07	xLSTM: Extended Long Short-Term Memory	Maximilian Beck et.al.	2405.04517	link
2024-05-07	A Transformer with Stack Attention	Jiaoda Li et.al.	2405.04515	link
2024-05-08	Unveiling Disparities in Web Task Handling Between Human and Web Agent	Kihoon Son et.al.	2405.04497	null
2024-05-07	Toward In-Context Teaching: Adapting Examples to Students’ Misconceptions	Alexis Ross et.al.	2405.04495	null
2024-05-07	Representation Learning of Daily Movement Data Using Text Encoders	Alexander Capstick et.al.	2405.04494	link
2024-05-08	DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model	DeepSeek-AI et.al.	2405.04434	link
2024-05-07	The Silicone Ceiling: Auditing GPT’s Race and Gender Biases in Hiring	Lena Armstrong et.al.	2405.04412	null
2024-05-07	Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks	Georgios Pantazopoulos et.al.	2405.04403	link
2024-05-07	Large Language Models Cannot Explain Themselves	Advait Sarkar et.al.	2405.04382	null
2024-05-07	A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI	Hannah Chafetz et.al.	2405.04333	null
2024-05-07	Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation	Atharvan Dogra et.al.	2405.04325	null
2024-05-07	Granite Code Models: A Family of Open Foundation Models for Code Intelligence	Mayank Mishra et.al.	2405.04324	link
2024-05-07	Accelerating Speculative Decoding using Dynamic Speculation Length	Jonathan Mamou et.al.	2405.04304	null
2024-05-07	Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework	Xiangpeng Wan et.al.	2405.04294	link
2024-05-07	Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore	Junchao Wu et.al.	2405.04286	null
2024-05-07	On the Foundations of Earth and Climate Foundation Models	Xiao Xiang Zhu et.al.	2405.04285	null
2024-05-07	Semantic API Alignment: Linking High-level User Goals to APIs	Robert Feldt et.al.	2405.04236	null
2024-05-06	Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs	Muhammad Uzair Khattak et.al.	2405.03690	null
2024-05-06	Pose Priors from Language Models	Sanjay Subramanian et.al.	2405.03689	null
2024-05-06	Large Language Models Reveal Information Operation Goals, Tactics, and Narrative Frames	Keith Burghardt et.al.	2405.03688	link
2024-05-06	Language-Image Models with 3D Understanding	Jang Hyun Cho et.al.	2405.03685	null
2024-05-06	AtomGPT: Atomistic Generative Pre-trained Transformer for Forward and Inverse Materials Design	Kamal Choudhary et.al.	2405.03680	link
2024-05-06	When LLMs Meet Cybersecurity: A Systematic Literature Review	Jie Zhang et.al.	2405.03644	link
2024-05-06	A Controlled Experiment on the Energy Efficiency of the Source Code Generated by Code Llama	Vlad-Andrei Cursaru et.al.	2405.03616	null
2024-05-06	GREEN: Generative Radiology Report Evaluation and Error Notation	Sophie Ostmeier et.al.	2405.03595	null
2024-05-06	Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment	Abhinav Agarwalla et.al.	2405.03594	null
2024-05-06	Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing	Han Liu et.al.	2405.03565	null
2024-05-07	ID-centric Pre-training for Recommendation	Yiqing Wu et.al.	2405.03562	null
2024-05-06	AlphaMath Almost Zero: process Supervision without process	Guoxin Chen et.al.	2405.03553	link
2024-05-06	MAmmoTH2: Scaling Instructions from the Web	Xiang Yue et.al.	2405.03548	null
2024-05-06	Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions	Xingyou Song et.al.	2405.03547	null
2024-05-06	Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning	Yubo Mai et.al.	2405.03509	null
2024-05-06	UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images	Yiting Qu et.al.	2405.03486	null
2024-05-06	LGTM: Local-to-Global Text-Driven Human Motion Diffusion Model	Haowen Sun et.al.	2405.03485	link
2024-05-06	Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search	Hideaki Joko et.al.	2405.03480	link
2024-05-07	Large Language Models (LLMs) as Agents for Augmented Democracy	Jairo Gudiño-Rosero et.al.	2405.03452	null
2024-05-06	SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence	Hangyuan Ji et.al.	2405.03446	link
2024-05-03	Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models	Piotr Padlewski et.al.	2405.02287	link
2024-05-03	Structural Pruning of Pre-trained Language Models via Neural Architecture Search	Aaron Klein et.al.	2405.02267	link
2024-05-03	On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?	Maxime Zanella et.al.	2405.02266	link
2024-05-03	Leveraging Large Language Models to Enhance Domain Expert Inclusion in Data Science Workflows	Jasmine Y. Shih et.al.	2405.02260	null
2024-05-03	What matters when building vision-language models?	Hugo Laurençon et.al.	2405.02246	null
2024-05-03	REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs	Deepa Tilwani et.al.	2405.02228	null
2024-05-03	Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks	Lujing Zhang et.al.	2405.02225	null
2024-05-03	FairEvalLLM. A Comprehensive Framework for Benchmarking Fairness in Large Language Model Recommender Systems	Yashar Deldjoo et.al.	2405.02219	null
2024-05-03	Automatic Programming: Large Language Models and Beyond	Michael R. Lyu et.al.	2405.02213	null
2024-05-03	Assessing and Verifying Task Utility in LLM-Powered Applications	Negar Arabzadeh et.al.	2405.02178	null
2024-05-03	Hoaxpedia: A Unified Wikipedia Hoax Articles Dataset	Hsuvas Borkakoty et.al.	2405.02175	link
2024-05-03	Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models	Mohamad Al Mdfaa et.al.	2405.02162	null
2024-05-03	Neural Context Flows for Learning Generalizable Dynamical Systems	Roussel Desmond Nzoyem et.al.	2405.02154	link
2024-05-03	The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates	Giuseppe Russo Latona et.al.	2405.02150	link
2024-05-03	MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain	Chao Jiang et.al.	2405.02144	null
2024-05-03	Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection	Guillem Ramírez et.al.	2405.02134	null
2024-05-03	Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets	Xuelong Geng et.al.	2405.02132	link
2024-05-03	Evaluating Large Language Models for Structured Science Summarization in the Open Research Knowledge Graph	Vladyslav Nechakhin et.al.	2405.02105	null
2024-05-03	Argumentative Large Language Models for Explainable and Contestable Decision-Making	Gabriel Freedman et.al.	2405.02079	link
2024-05-03	Comparative Analysis of Retrieval Systems in the Real World	Dmytro Mozolevskyi et.al.	2405.02048	null
2024-05-02	Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models	Seungone Kim et.al.	2405.01535	link
2024-05-02	Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks	Murtaza Dalal et.al.	2405.01534	null
2024-05-02	OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning	Shihao Wang et.al.	2405.01533	link
2024-05-02	FLAME: Factuality-Aware Alignment for Large Language Models	Sheng-Chieh Lin et.al.	2405.01525	null
2024-05-02	A separability-based approach to quantifying generalization: which layer is best?	Luciano Dyballa et.al.	2405.01524	link
2024-05-02	Transformer-Aided Semantic Communications	Matin Mortaheb et.al.	2405.01521	null
2024-05-02	D2PO: Discriminator-Guided DPO with Response Evaluation Models	Prasann Singhal et.al.	2405.01511	link
2024-05-02	Analyzing the Role of Semantic Representations in the Era of Large Language Models	Zhijing Jin et.al.	2405.01502	link
2024-05-02	Supporting Business Document Workflows via Collection-Centric Information Foraging with Large Language Models	Raymond Fok et.al.	2405.01501	null
2024-05-02	Controllable Text Generation in the Instruction-Tuning Era	Dhananjay Ashok et.al.	2405.01490	null
2024-05-02	MANTIS: Interleaved Multi-Image Instruction Tuning	Dongfu Jiang et.al.	2405.01483	link
2024-05-02	NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment	Gerald Shen et.al.	2405.01481	link
2024-05-02	V-FLUTE: Visual Figurative Language Understanding with Textual Explanations	Arkadiy Saakyan et.al.	2405.01474	link
2024-05-02	Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning	Théo Moutakanni et.al.	2405.01469	null
2024-05-02	Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models	Yifei Ming et.al.	2405.01468	null
2024-05-02	A Systematic Literature Review on Large Language Models for Automated Program Repair	Quanjun Zhang et.al.	2405.01466	link
2024-05-02	Natural Language to Verilog: Design of a Recurrent Spiking Neural Network using Large Language Models and ChatGPT	Paola Vitolo et.al.	2405.01419	null
2024-05-02	MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors	Yuan Tang et.al.	2405.01413	link
2024-05-02	Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving	Xin Quan et.al.	2405.01379	link
2024-05-02	GAIA: A General AI Assistant for Intelligent Accelerator Operations	Frank Mayet et.al.	2405.01359	null
2024-05-01	Self-Play Preference Optimization for Language Model Alignment	Yue Wu et.al.	2405.00675	link
2024-05-01	Is Bigger Edit Batch Size Always Better? – An Empirical Study on Model Editing with Llama-3	Junsang Yoon et.al.	2405.00664	link
2024-05-01	HalluVault: A Novel Logic Programming-aided Metamorphic Testing Framework for Detecting Fact-Conflicting Hallucinations in Large Language Models	Ningke Li et.al.	2405.00648	null
2024-05-01	When Quantization Affects Confidence of Large Language Models?	Irina Proskurina et.al.	2405.00632	link
2024-05-01	“I’m Not Sure, But…”: Examining the Impact of Large Language Models’ Uncertainty Expression on User Reliance and Trust	Sunnie S. Y. Kim et.al.	2405.00623	null
2024-05-01	Causal Evaluation of Language Models	Sirui Chen et.al.	2405.00622	link
2024-05-01	Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling	Yida Mu et.al.	2405.00611	link
2024-05-01	Investigating Automatic Scoring and Feedback using Large Language Models	Gloria Ashiya Katuka et.al.	2405.00602	null
2024-05-01	Are Models Biased on Text without Gender-related Language?	Catarina G Belém et.al.	2405.00588	link
2024-05-01	The Real, the Better: Aligning Large Language Models with Online Human Behaviors	Guanying Jiang et.al.	2405.00578	null
2024-05-01	EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model	Deng Li et.al.	2405.00574	null
2024-05-01	NumLLM: Numeric-Sensitive Large Language Model for Chinese Finance	Huan-Yi Su et.al.	2405.00566	null
2024-05-01	Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment	Zhili Liu et.al.	2405.00557	null
2024-05-01	Long-Term Human Trajectory Prediction using 3D Dynamic Scene Graphs	Nicolas Gorlo et.al.	2405.00552	link
2024-05-01	ChatBI: Towards Natural Language to Complex Business Intelligence SQL	Jinqing Lian et.al.	2405.00527	null
2024-05-01	CookingSense: A Culinary Knowledgebase with Multidisciplinary Assertions	Donghee Choi et.al.	2405.00523	null
2024-05-01	Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning	Lucas-Andreï Thil et.al.	2405.00516	null
2024-05-01	GOLD: Geometry Problem Solver with Natural Language Description	Jiaxin Zhang et.al.	2405.00494	link
2024-05-01	Is Temperature the Creativity Parameter of Large Language Models?	Max Peeperkorn et.al.	2405.00492	link
2024-05-01	The Pyramid of Captions	Delong Chen et.al.	2405.00485	null
2024-04-30	Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation	Yunhao Ge et.al.	2404.19752	null
2024-04-30	PrivComp-KG : Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification	Leon Garza et.al.	2404.19744	null
2024-04-30	Better & Faster Large Language Models via Multi-token Prediction	Fabian Gloeckle et.al.	2404.19737	null
2024-04-30	A Framework for Leveraging Human Computation Gaming to Enhance Knowledge Graphs for Accuracy Critical Generative AI Applications	Steph Buongiorno et.al.	2404.19729	null
2024-04-30	PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games	Steph Buongiorno et.al.	2404.19721	null
2024-04-30	Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns	Constantinos Patsakis et.al.	2404.19715	null
2024-04-30	Automated Generation of High-Quality Medical Simulation Scenarios Through Integration of Semi-Structured Data and Large Language Models	Scott Sumpter et.al.	2404.19713	null
2024-04-30	When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively	Tiziano Labruna et.al.	2404.19705	link
2024-04-30	Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners	Chun Feng et.al.	2404.19696	null
2024-04-30	Towards Generalist Robot Learning from Internet Video: A Survey	Robert McCarthy et.al.	2404.19664	null
2024-04-30	MetaCoCo: A New Few-Shot Classification Benchmark with Spurious Correlation	Min Zhang et.al.	2404.19644	link
2024-04-30	On Training a Neural Network to Explain Binaries	Alexander Interrante-Grant et.al.	2404.19631	null
2024-04-30	Seeing Through the Clouds: Cloud Gap Imputation with Prithvi Foundation Model	Denys Godwin et.al.	2404.19609	null
2024-04-30	Transferring Troubles: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning	Xuanli He et.al.	2404.19597	null
2024-04-30	RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing	Yucheng Hu et.al.	2404.19543	link
2024-04-30	MoST: Multi-modality Scene Tokenization for Motion Prediction	Norman Mu et.al.	2404.19531	null
2024-04-30	Do Large Language Models Understand Conversational Implicature – A case study with a Chinese sitcom	Shisen Yue et.al.	2404.19509	link
2024-04-30	More Compute Is What You Need	Zhen Guo et.al.	2404.19484	null
2024-05-01	Neuro-Vision to Language: Image Reconstruction and Language enabled Interaction via Brain Recordings	Guobin Shen et.al.	2404.19438	null
2024-04-30	Can Large Language Models put 2 and 2 together? Probing for Entailed Arithmetical Relationships	D. Panas et.al.	2404.19432	null
2024-04-29	Hallucination of Multimodal Large Language Models: A Survey	Zechen Bai et.al.	2404.18930	link
2024-04-29	Holmes: Benchmark the Linguistic Competence of Language Models	Andreas Waldis et.al.	2404.18923	null
2024-04-29	DPO Meets PPO: Reinforced Token Optimization for RLHF	Han Zhong et.al.	2404.18922	link
2024-04-29	TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation	Junhao Cheng et.al.	2404.18919	link
2024-04-29	Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting	Fangcheng Liu et.al.	2404.18911	link
2024-04-29	Human-in-the-Loop Synthetic Text Data Inspection with Provenance Tracking	Hong Jin Kang et.al.	2404.18881	link
2024-04-29	More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness	Aaron J. Li et.al.	2404.18870	link
2024-04-29	Truth-value judgment in language models: belief directions are context sensitive	Stefan F. Schouten et.al.	2404.18865	null
2024-04-29	Performance-Aligned LLMs for Generating Fast Code	Daniel Nichols et.al.	2404.18864	null
2024-04-29	A Survey on Vision Mamba: Models, Applications and Challenges	Rui Xu et.al.	2404.18861	link
2024-04-29	VERT: Verified Equivalent Rust Transpilation with Few-Shot Learning	Aidan Z. H. Yang et.al.	2404.18852	null
2024-04-29	FeDeRA: Efficient Fine-tuning of Language Models in Federated Learning Leveraging Weight Decomposition	Yuxuan Yan et.al.	2404.18848	null
2024-04-29	It’s Difficult to be Neutral – Human and LLM-based Sentiment Annotation of Patient Comments	Petter Mæhlum et.al.	2404.18832	null
2024-04-29	Benchmarking Benchmark Leakage in Large Language Models	Ruijie Xu et.al.	2404.18824	link
2024-04-29	AppPoet: Large Language Model based Android malware detection via multi-view prompt engineering	Wenxiang Zhao et.al.	2404.18816	null
2024-04-29	Unknown Script: Impact of Script on Cross-Lingual Transfer	Wondimagegnhue Tsegaye Tufa et.al.	2404.18810	link
2024-04-29	Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models	Pat Verga et.al.	2404.18796	null
2024-04-29	PECC: Problem Extraction and Coding Challenges	Patrick Haller et.al.	2404.18766	link
2024-04-29	Transitive Vision-Language Prompt Learning for Domain Generalization	Liyuan Wang et.al.	2404.18758	null
2024-04-29	Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models	Hongyi Zhu et.al.	2404.18746	null
2024-04-26	Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo	Stephen Zhao et.al.	2404.17546	link
2024-04-26	Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models	Yuhang Huang et.al.	2404.17534	null
2024-04-26	Large Language Model Agent as a Mechanical Designer	Yayati Jadhav et.al.	2404.17525	null
2024-04-26	On the Use of Large Language Models to Generate Capability Ontologies	Luis Miguel Vieira da Silva et.al.	2404.17524	link
2024-04-26	Enhancing Legal Compliance and Regulation Analysis with Large Language Models	Shabnam Hassani et.al.	2404.17522	null
2024-04-26	A Comprehensive Evaluation on Event Reasoning of Large Language Models	Zhengwei Tao et.al.	2404.17513	link
2024-04-26	CEval: A Benchmark for Evaluating Counterfactual Text Generation	Van Bach Nguyen et.al.	2404.17475	link
2024-04-26	Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System	Robin Schmucker et.al.	2404.17460	null
2024-04-26	“ChatGPT Is Here to Help, Not to Replace Anybody” – An Evaluation of Students’ Opinions On Integrating ChatGPT In CS Courses	Bruno Pereira Cipriano et.al.	2404.17443	null
2024-04-26	PromptCIR: Blind Compressed Image Restoration with Prompt Learning	Bingchen Li et.al.	2404.17433	link
2024-04-26	Evaluation of Geographical Distortions in Language Models: A Crucial Step Towards Equitable Representations	Rémy Decoupes et.al.	2404.17401	null
2024-04-26	UniRGB-IR: A Unified Framework for Visible-Infrared Downstream Tasks via Adapter Tuning	Maoxun Yuan et.al.	2404.17360	link
2024-04-26	InspectorRAGet: An Introspection Platform for RAG Evaluation	Kshitij Fadnis et.al.	2404.17347	link
2024-04-26	Introducing cosmosGPT: Monolingual Training for Turkish Language Models	H. Toprak Kesgin et.al.	2404.17336	null
2024-04-26	A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation	Xin Zhang et.al.	2404.17335	null
2024-04-26	An Extendable Cloud-Native Alloy Property Explorer	Zhuoyuan Li et.al.	2404.17330	link
2024-04-26	When to Trust LLMs: Aligning Confidence with Response Quality	Shuchang Tao et.al.	2404.17287	link
2024-04-26	Reinforcement Retrieval Leveraging Fine-grained Feedback for Fact Checking News Claims with Black-Box LLM	Xuan Zhang et.al.	2404.17283	link
2024-04-26	Prompting Towards Alleviating Code-Switched Data Scarcity in Under-Resourced Languages with GPT as a Pivot	Michelle Terblanche et.al.	2404.17216	null
2024-04-26	Low-Rank Knowledge Decomposition for Medical Foundation Models	Yuhang Zhou et.al.	2404.17184	link
2024-04-25	The Third Monocular Depth Estimation Challenge	Jaime Spencer et.al.	2404.16831	null
2024-04-25	Make-it-Real: Unleashing Large Multimodal Model’s Ability for Painting 3D Objects with Realistic Materials	Ye Fang et.al.	2404.16829	null
2024-04-25	V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection	Xuanyu Zhang et.al.	2404.16824	null
2024-04-25	How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites	Zhe Chen et.al.	2404.16821	link
2024-04-25	IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages	Harman Singh et.al.	2404.16816	link
2024-04-26	Make Your LLM Fully Utilize the Context	Shengnan An et.al.	2404.16811	link
2024-04-25	Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning	Tianhui Zhang et.al.	2404.16807	link
2024-04-25	AAPL: Adding Attributes to Prompt Learning for Vision-Language Models	Gahyeon Kim et.al.	2404.16804	link
2024-04-25	Weak-to-Strong Extrapolation Expedites Alignment	Chujie Zheng et.al.	2404.16792	link
2024-04-25	SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension	Bohao Li et.al.	2404.16790	link
2024-04-25	Continual Learning of Large Language Models: A Comprehensive Survey	Haizhou Shi et.al.	2404.16789	link
2024-04-25	Modeling Selective Feature Attention for Representation-based Siamese Text Matching	Jianxiang Zang et.al.	2404.16776	link
2024-04-25	REBEL: Reinforcement Learning via Regressing Relative Rewards	Zhaolin Gao et.al.	2404.16767	link
2024-04-25	Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model	Runzhe Zhan et.al.	2404.16766	null
2024-04-25	RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis	Xiaoman Zhang et.al.	2404.16754	link
2024-04-25	Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class	Mazda Moayeri et.al.	2404.16717	null
2024-04-25	Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding	Mostafa Elhoushi et.al.	2404.16710	link
2024-04-25	Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents	Giorgio Piatti et.al.	2404.16698	link
2024-04-25	Influence of Solution Efficiency and Valence of Instruction on Additive and Subtractive Solution Strategies in Humans and GPT-4	Lydia Uhler et.al.	2404.16692	null
2024-04-25	EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning	Hongxia Xie et.al.	2404.16670	link
2024-04-24	Hybrid LLM/Rule-based Approaches to Business Insights Generation from Structured Data	Aliaksei Vertsel et.al.	2404.15604	null
2024-04-24	ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction	Henry Peng Zou et.al.	2404.15592	link
2024-04-24	MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis	Jiaxin Zhuang et.al.	2404.15580	null
2024-04-24	Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations?	Hossein Salami et.al.	2404.15578	null
2024-04-24	Retrieval Head Mechanistically Explains Long-Context Factuality	Wenhao Wu et.al.	2404.15574	link
2024-04-23	PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models	Shashi Kant Gupta et.al.	2404.15549	null
2024-04-23	BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis	Shuhang Lin et.al.	2404.15532	link
2024-04-23	Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models	Mihir Parmar et.al.	2404.15522	link
2024-04-23	Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval	Young Kyun Jang et.al.	2404.15516	null
2024-04-23	ToM-LM: Delegating Theory Of Mind Reasoning to External Symbolic Executors in Large Language Models	Weizhi Tang et.al.	2404.15515	null
2024-04-23	IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task On the Shoulders of Medical Agents	Jean-Philippe Corbeil et.al.	2404.15488	link
2024-04-23	Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance	Het Patel et.al.	2404.15485	null
2024-04-23	Can Large Language Models Learn the Physics of Metamaterials? An Empirical Study with ChatGPT	Darui Lu et.al.	2404.15458	null
2024-04-23	XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference	João Monteiro et.al.	2404.15420	null
2024-04-23	Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs	Davide Caffagni et.al.	2404.15406	null
2024-04-23	Aligning LLM Agents by Learning Latent Preference from User Edits	Ge Gao et.al.	2404.15269	link
2024-04-23	XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts	Yifeng Ding et.al.	2404.15247	link
2024-04-23	CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies	Weiyan Shi et.al.	2404.15238	link
2024-04-23	Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models	Aidan Z. H. Yang et.al.	2404.15236	null
2024-04-23	Re-Thinking Inverse Graphics With Large Language Models	Peter Kulits et.al.	2404.15228	null
2024-04-23	Does Instruction Tuning Make LLMs More Consistent?	Constanza Fierro et.al.	2404.15206	null
2024-04-23	Setting up the Data Printer with Improved English to Ukrainian Machine Translation	Yurii Paniv et.al.	2404.15196	link
2024-04-23	Regressive Side Effects of Training Language Models to Mimic Student Misconceptions	Shashank Sonkar et.al.	2404.15156	null
2024-04-23	Bias patterns in the application of LLMs for clinical decision support: A comprehensive study	Raphael Poulain et.al.	2404.15149	link
2024-04-23	Rethinking LLM Memorization through the Lens of Adversarial Compression	Avi Schwarzschild et.al.	2404.15146	null
2024-04-23	MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning	Sunan He et.al.	2404.15127	link
2024-04-23	Identifying Fairness Issues in Automatically Generated Testing Content	Kevin Stowe et.al.	2404.15104	null
2024-04-23	Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation	Xun Wu et.al.	2404.15100	null
2024-04-23	Detection of circular permutations by Protein Language Models	Yue Hu et.al.	2404.15087	link
2024-04-23	Multi-Head Mixture-of-Experts	Xun Wu et.al.	2404.15045	link
2024-04-23	TAXI: Evaluating Categorical Knowledge Editing for Language Models	Derek Powell et.al.	2404.15004	link
2024-04-23	Transformers Can Represent $n$-gram Language Models	Anej Svete et.al.	2404.14994	null
2024-04-23	A Short Review for Ontology Learning from Text: Stride from Shallow Learning, Deep Learning to Large Language Models Trend	Rick Du et.al.	2404.14991	null
2024-04-23	$\texttt{MiniMol}$: A Parameter-Efficient Foundation Model for Molecular Learning	Kerstin Kläser et.al.	2404.14986	null
2024-04-23	Social Media and Artificial Intelligence for Sustainable Cities and Societies: A Water Quality Analysis Use-case	Muhammad Asif Auyb et.al.	2404.14977	null
2024-04-22	AutoAD III: The Prequel – Back to the Pixels	Tengda Han et.al.	2404.14412	null
2024-04-22	SpaceByte: Towards Deleting Tokenization from Large Language Modeling	Kevin Slagle et.al.	2404.14408	link
2024-04-22	RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?	Adrian de Wynter et.al.	2404.14397	link
2024-04-22	SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation	Yuying Ge et.al.	2404.14396	link
2024-04-22	PARAMANU-GANITA: Language Model with Mathematical Capabilities	Mitodru Niyogi et.al.	2404.14395	null
2024-04-22	A Multimodal Automated Interpretability Agent	Tamar Rott Shaham et.al.	2404.14394	null
2024-04-22	A Survey on Self-Evolution of Large Language Models	Zhengwei Tao et.al.	2404.14387	link
2024-04-22	Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph	Xiaochen Kev Gao et.al.	2404.14372	link
2024-04-23	Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data	Fahim Tajwar et.al.	2404.14367	link
2024-04-22	Better Synthetic Data by Retrieving and Transforming Existing Datasets	Saumya Gandhi et.al.	2404.14361	link
2024-04-22	Rethinking Legal Compliance Automation: Opportunities with Large Language Models	Shabnam Hassani et.al.	2404.14356	null
2024-04-22	Calc-CMU at SemEval-2024 Task 7: Pre-Calc – Learning to Use the Calculator Improves Numeracy in Language Models	Vishruth Veerendranath et.al.	2404.14355	link
2024-04-22	Automated Long Answer Grading with RiceChem Dataset	Shashank Sonkar et.al.	2404.14316	link
2024-04-22	Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels	Jan-Philipp Fränken et.al.	2404.14313	link
2024-04-22	Explaining Arguments’ Strength: Unveiling the Role of Attacks and Supports (Technical Report)	Xiang Yin et.al.	2404.14304	link
2024-04-22	Marking: Visual Grading with Highlighting Errors and Annotating Missing Bits	Shashank Sonkar et.al.	2404.14301	null
2024-04-22	Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach	Yao Wan et.al.	2404.14296	link
2024-04-22	A Survey on Efficient Inference for Large Language Models	Zixuan Zhou et.al.	2404.14294	null
2024-04-22	LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots	Dongge Han et.al.	2404.14285	null
2024-04-22	Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback	Wenyi Xiao et.al.	2404.14233	link
2024-04-19	MoVA: Adapting Mixture of Vision Experts to Multimodal Context	Zhuofan Zong et.al.	2404.13046	link
2024-04-19	Unified Scene Representation and Reconstruction for 3D Large Language Models	Tao Chu et.al.	2404.13044	null
2024-04-19	Data Alignment for Zero-Shot Concept Generation in Dermatology AI	Soham Gadgil et.al.	2404.13043	null
2024-04-19	Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs	Biyang Guo et.al.	2404.13033	link
2024-04-19	When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering	Stephen Choi et.al.	2404.13028	null
2024-04-19	Stronger Random Baselines for In-Context Learning	Gregory Yauney et.al.	2404.13020	link
2024-04-19	Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models	Chuofan Ma et.al.	2404.13013	link
2024-04-19	Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs	Clemencia Siro et.al.	2404.12994	link
2024-04-19	FineRec: Exploring Fine-grained Sequential Recommendation	Xiaokun Zhang et.al.	2404.12975	link
2024-04-19	Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models	Yian Li et.al.	2404.12966	null
2024-04-19	Towards Reliable Latent Knowledge Estimation in LLMs: In-Context Learning vs. Prompting Based Factual Knowledge Extraction	Qinyuan Wu et.al.	2404.12957	link
2024-04-19	Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models	Konstantinos Vilouras et.al.	2404.12920	link
2024-04-19	Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models	Zhenyang Ni et.al.	2404.12916	link
2024-04-19	Large Language Models for Networking: Workflow, Advances and Challenges	Chang Liu et.al.	2404.12901	null
2024-04-19	Enabling Natural Zero-Shot Prompting on Encoder Models via Statement-Tuning	Ahmed Elshabrawy et.al.	2404.12897	null
2024-04-19	Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation	Guanhua Chen et.al.	2404.12879	null
2024-04-19	LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency	Zhaodonghui Li et.al.	2404.12872	link
2024-04-19	How Does the Textual Information Affect the Retrieval of Multimodal In-Context Learning?	Yang Luo et.al.	2404.12866	link
2024-04-19	Foundation Model assisted Weakly Supervised LiDAR Semantic Segmentation	Yilong Chen et.al.	2404.12861	null
2024-04-19	TartuNLP @ SIGTYP 2024 Shared Task: Adapting XLM-RoBERTa for Ancient and Historical Languages	Aleksei Dorkin et.al.	2404.12845	null
2024-04-18	BLINK: Multimodal Large Language Models Can See but Not Perceive	Xingyu Fu et.al.	2404.12390	null
2024-04-18	Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models	Aitor Ormazabal et.al.	2404.12387	null
2024-04-18	MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale	Xiaotang Gai et.al.	2404.12372	null
2024-04-18	When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes	Asaf Yehudai et.al.	2404.12365	link
2024-04-18	From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function	Rafael Rafailov et.al.	2404.12358	null
2024-04-18	Towards a Foundation Model for Partial Differential Equation: Multi-Operator Learning and Extrapolation	Jingmin Sun et.al.	2404.12355	link
2024-04-18	V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning	Hang Hua et.al.	2404.12353	null
2024-04-18	Evaluating AI for Law: Bridging the Gap with Open-Source Solutions	Rohan Bhambhoria et.al.	2404.12349	null
2024-04-18	Large Language Models in Targeted Sentiment Analysis	Nicolay Rusnachenko et.al.	2404.12342	link
2024-04-18	Normative Requirements Operationalization with Large Language Models	Nick Feng et.al.	2404.12335	null
2024-04-18	Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment	Zhaofeng Wu et.al.	2404.12318	null
2024-04-18	Large Language Models for Synthetic Participatory Planning of Shared Automated Electric Mobility Systems	Jiangbo Yu et.al.	2404.12317	null
2024-04-18	Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair	Yusuke Sakai et.al.	2404.12299	null
2024-04-18	Augmenting emotion features in irony detection with Large language modeling	Yucheng Lin et.al.	2404.12291	null
2024-04-18	Performance Evaluation of Segment Anything Model with Variational Prompting for Application to Non-Visible Spectrum Imagery	Yona Falinie A. Gaus et.al.	2404.12285	null
2024-04-18	Enhancing Embedding Performance through Large Language Model-based Text Enrichment and Rewriting	Nicholas Harris et.al.	2404.12283	null
2024-04-18	Advancing the Robustness of Large Language Models through Self-Denoised Smoothing	Jiabao Ji et.al.	2404.12274	link
2024-04-18	FedEval-LLM: Federated Evaluation of Large Language Models on Downstream Tasks with Collective Wisdom	Yuanqin He et.al.	2404.12273	null
2024-04-18	Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences	Shreya Shankar et.al.	2404.12272	null
2024-04-18	Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM	Michelle S. Lam et.al.	2404.12259	link
2024-04-17	Private federated discovery of out-of-vocabulary words for Gboard	Ziteng Sun et.al.	2404.11607	null
2024-04-17	VG4D: Vision-Language Model Goes 4D Video Recognition	Zhichao Deng et.al.	2404.11605	link
2024-04-17	A Deep Dive into Large Language Models for Automated Bug Localization and Repair	Soneya Binta Hossain et.al.	2404.11595	null
2024-04-17	Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding	Zezhong Fan et.al.	2404.11589	null
2024-04-17	LLMTune: Accelerate Database Knob Tuning with Large Language Models	Xinmei Huang et.al.	2404.11581	link
2024-04-17	On the Scalability of GNNs for Molecular Graphs	Maciej Sypetkowski et.al.	2404.11568	null
2024-04-17	MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation	Kuan-Chieh et.al.	2404.11565	null
2024-04-17	Quantifying Multilingual Performance of Large Language Models Across Languages	Zihao Li et.al.	2404.11553	link
2024-04-17	Evaluating Span Extraction in Generative Paradigm: A Reflection on Aspect-Based Sentiment Analysis	Soyoung Yang et.al.	2404.11539	null
2024-04-17	FedPFT: Federated Proxy Fine-Tuning of Foundation Models	Zhaopeng Peng et.al.	2404.11536	link
2024-04-17	Select and Reorder: A Novel Approach for Neural Sign Language Production	Harry Walsh et.al.	2404.11532	null
2024-04-17	Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization	Costas Mavromatis et.al.	2404.11531	link
2024-04-17	Embedding Privacy in Computational Social Science and Artificial Intelligence Research	Keenan Jones et.al.	2404.11515	null
2024-04-17	Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models	Yushuo Chen et.al.	2404.11502	link
2024-04-17	Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models	Yue Zhou et.al.	2404.11500	link
2024-04-18	Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent	Wei Chen et.al.	2404.11459	null
2024-04-17	Unifying Bias and Unfairness in Information Retrieval: A Survey of Challenges and Opportunities with Large Language Models	Sunhao Dai et.al.	2404.11457	link
2024-04-17	AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts	Meng Jiang et.al.	2404.11449	link
2024-04-17	Open-Ended Wargames with Large Language Models	Daniel P. Hogan et.al.	2404.11446	link
2024-04-17	DUPE: Detection Undermining via Prompt Engineering for Deepfake Text	James Weichert et.al.	2404.11408	null
2024-04-16	Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback	Qiwei Di et.al.	2404.10776	null
2024-04-16	COMBO: Compositional World Models for Embodied Multi-Agent Cooperation	Hongxin Zhang et.al.	2404.10775	null
2024-04-16	Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification	Yu-Yang Li et.al.	2404.10757	link
2024-04-16	Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study	Shusheng Xu et.al.	2404.10719	link
2024-04-16	Dual Modalities of Text: Visual and Textual Generative Pre-training	Yekun Chai et.al.	2404.10710	link
2024-04-16	Question Difficulty Ranking for Multiple-Choice Reading Comprehension	Vatsal Raina et.al.	2404.10704	null
2024-04-16	An empirical study on code review activity prediction in practice	Doriane Olewicki et.al.	2404.10703	null
2024-04-16	Automating REST API Postman Test Cases Using LLM	S Deepika Sri et.al.	2404.10678	null
2024-04-16	Self-playing Adversarial Language Game Enhances LLM Reasoning	Pengyu Cheng et.al.	2404.10642	link
2024-04-16	HLAT: High-quality Large Language Model Pre-trained on AWS Trainium	Haozheng Fan et.al.	2404.10630	link
2024-04-16	Private Attribute Inference from Images with Vision-Language Models	Batuhan Tömekçe et.al.	2404.10618	link
2024-04-16	Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases	Yanze Li et.al.	2404.10595	null
2024-04-16	Construction of Domain-specified Japanese Large Language Model for Finance through Continual Pre-training	Masanori Hirano et.al.	2404.10555	null
2024-04-16	Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning	Xiao Wang et.al.	2404.10552	null
2024-04-16	Capturing the Macroscopic Behaviour of Molecular Dynamics with Membership Functions	Alexander Sikorski et.al.	2404.10523	link
2024-04-16	CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity	Moshe Berchansky et.al.	2404.10513	null
2024-04-16	White Men Lead, Black Women Help: Uncovering Gender, Racial, and Intersectional Bias in Language Agency	Yixin Wan et.al.	2404.10508	null
2024-04-16	Self-Supervised Visual Preference Alignment	Ke Zhu et.al.	2404.10501	link
2024-04-16	When Emotional Stimuli meet Prompt Designing: An Auto-Prompt Graphical Paradigm	Chenggian Ma et.al.	2404.10500	null
2024-04-16	Spiral of Silences: How is Large Language Model Killing Information Retrieval? – A Case Study on Open Domain Question Answering	Xiaoyang Chen et.al.	2404.10496	link
2024-04-15	KG-CTG: Citation Generation through Knowledge Graph-guided Large Language Models	Avinash Anand et.al.	2404.09763	null
2024-04-15	Resilience of Large Language Models for Noisy Instructions	Bin Wang et.al.	2404.09754	null
2024-04-15	Personalized Collaborative Fine-Tuning for On-Device Large Language Models	Nicolas Wagner et.al.	2404.09753	link
2024-04-15	AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides	Kewei Li et.al.	2404.09738	link
2024-04-15	Quantization of Large Language Models with an Overdetermined Basis	Daniil Merkulov et.al.	2404.09737	null
2024-04-15	Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models	Ziwei Luo et.al.	2404.09732	link
2024-04-15	Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model	Hyunsoo Cho et.al.	2404.09717	null
2024-04-15	Enhancing Robot Explanation Capabilities through Vision-Language Models: a Preliminary Study by Interpreting Visual Inputs for Improved Human-Robot Interaction	David Sobrín-Hidalgo et.al.	2404.09705	null
2024-04-15	Generative AI for Game Theory-based Mobile Networking	Long He et.al.	2404.09699	null
2024-04-15	Are Large Language Models Reliable Argument Quality Annotators?	Nailia Mirzakhmedova et.al.	2404.09696	link
2024-04-15	LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models	Guangyan Li et.al.	2404.09695	null
2024-04-15	Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation	Juhwan Choi et.al.	2404.09682	link
2024-04-15	Learn Your Reference Model for Real Good Alignment	Alexey Gorbatovski et.al.	2404.09656	null
2024-04-15	Do LLMs Understand Visual Anomalies? Uncovering LLM Capabilities in Zero-shot Anomaly Detection	Jiaqi Zhu et.al.	2404.09654	null
2024-04-15	Bridging Vision and Language Spaces with Assignment Prediction	Jungin Park et.al.	2404.09632	link
2024-04-15	AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception	Yipo Huang et.al.	2404.09624	link
2024-04-15	UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark	Zhaokun Zhou et.al.	2404.09619	null
2024-04-15	A Self-feedback Knowledge Elicitation Approach for Chemical Reaction Predictions	Pengfei Liu et.al.	2404.09606	link
2024-04-15	Improving Recall of Large Language Models: A Model Collaboration Approach for Relational Triple Extraction	Zepeng Ding et.al.	2404.09593	null
2024-04-15	Modelling Language	Jumbly Grindrod et.al.	2404.09579	null
2024-04-15	Transformers, Contextualism, and Polysemy	Jumbly Grindrod et.al.	2404.09577	link
2024-04-15	Large language models and linguistic intentionality	Jumbly Grindrod et.al.	2404.09576	null
2024-04-12	Probing the 3D Awareness of Visual Foundation Models	Mohamed El Banani et.al.	2404.08636	link
2024-04-12	Pre-training Small Base LMs with Fewer Tokens	Sunny Sanyal et.al.	2404.08634	link
2024-04-12	FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models	Yanting Wang et.al.	2404.08631	link
2024-04-12	Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation	Yanhao Zheng et.al.	2404.08603	link
2024-04-12	Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts	Övgü Özdemir et.al.	2404.08589	link
2024-04-12	Pathological Primitive Segmentation Based on Visual Foundation Model with Zero-Shot Mask Generation	Abu Bakor Hayat Arnob et.al.	2404.08584	link
2024-04-12	FashionFail: Addressing Failure Cases in Fashion Object Detection and Segmentation	Riza Velioglu et.al.	2404.08582	link
2024-04-12	Lossy Image Compression with Foundation Diffusion Models	Lucas Relic et.al.	2404.08580	null
2024-04-12	Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation	Hanlin Tian et.al.	2404.08570	link
2024-04-12	RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs	Shreyas Chaudhari et.al.	2404.08555	null
2024-04-12	Memory Traces: Are Transformers Tulving Machines?	Jean-Marie Chauvet et.al.	2404.08543	null
2024-04-12	Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward	Xuan Xie et.al.	2404.08517	null
2024-04-12	ChatGPT and general-purpose AI count fruits in pictures surprisingly well	Konlavach Mengsuwan et.al.	2404.08515	null
2024-04-12	Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction	Haoran Qiu et.al.	2404.08509	link
2024-04-12	LaSagnA: Language-based Segmentation Assistant for Complex Queries	Cong Wei et.al.	2404.08506	link
2024-04-12	Strategic Interactions between Large Language Models-based Agents in Beauty Contests	Siting Lu et.al.	2404.08492	null
2024-04-12	Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation	Haozhe Zhao et.al.	2404.08491	link
2024-04-12	Thematic Analysis with Large Language Models: does it work with languages other than English? A targeted test in Italian	Stefano De Paoli et.al.	2404.08488	null
2024-04-12	Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task	Hassan Ali et.al.	2404.08424	null
2024-04-12	Adapting the Segment Anything Model During Usage in Novel Situations	Robin Schön et.al.	2404.08421	null
2024-04-11	OpenBias: Open-set Bias Detection in Text-to-Image Generative Models	Moreno D’Incà et.al.	2404.07990	link
2024-04-11	Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding	Yiwen Tang et.al.	2404.07989	link
2024-04-11	Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Representation Learning	Simon Schrodi et.al.	2404.07983	link
2024-04-11	Language Imbalance Can Boost Cross-lingual Generalisation	Anton Schäfer et.al.	2404.07982	link
2024-04-11	Manipulating Large Language Models to Increase Product Visibility	Aounon Kumar et.al.	2404.07981	link
2024-04-11	LLoCO: Learning Long Contexts Offline	Sijun Tan et.al.	2404.07979	link
2024-04-11	Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models	Haotian Zhang et.al.	2404.07973	null
2024-04-11	Rho-1: Not All Tokens Are What You Need	Zhenghao Lin et.al.	2404.07965	link
2024-04-11	On Unified Prompt Tuning for Request Quality Assurance in Public Code Review	Xinyu Chen et.al.	2404.07942	null
2024-04-11	Leveraging Large Language Models (LLMs) to Support Collaborative Human-AI Online Risk Data Annotation	Jinkyung Park et.al.	2404.07926	null
2024-04-11	LaVy: Vietnamese Multimodal Large Language Model	Chi Tran et.al.	2404.07922	link
2024-04-11	AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs	Zeyi Liao et.al.	2404.07921	link
2024-04-11	DesignQA: A Multimodal Benchmark for Evaluating Large Language Models’ Understanding of Engineering Documentation	Anna C. Doris et.al.	2404.07917	link
2024-04-11	HGRN2: Gated Linear RNNs with State Expansion	Zhen Qin et.al.	2404.07904	link
2024-04-11	High-Dimension Human Value Representation in Large Language Models	Samuel Cahyawijaya et.al.	2404.07900	link
2024-04-11	Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations	Dayeon Ki et.al.	2404.07851	link
2024-04-11	On Training Data Influence of GPT Models	Qingyi Liu et.al.	2404.07840	link
2024-04-11	RecurrentGemma: Moving Past Transformers for Efficient Open Language Models	Aleksandar Botev et.al.	2404.07839	link
2024-04-11	Streamlined Photoacoustic Image Processing with Foundation Models: A Training-Free Solution	Handi Deng et.al.	2404.07833	null
2024-04-11	Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese	Yuichi Inoue et.al.	2404.07824	link
2024-04-10	BRAVE: Broadening the visual encoding of vision-language models	Oğuzhan Fatih Kar et.al.	2404.07204	null
2024-04-10	UMBRAE: Unified Multimodal Decoding of Brain Signals	Weihao Xia et.al.	2404.07202	link
2024-04-10	Scaling Laws for Data Filtering – Data Curation cannot be Compute Agnostic	Sachin Goyal et.al.	2404.07177	link
2024-04-10	Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention	Tsendsuren Munkhdalai et.al.	2404.07143	null
2024-04-10	Open reaction-diffusion systems: bridging probabilistic theory across scales	Mauricio J. del Razo et.al.	2404.07119	link
2024-04-10	Continuous Language Model Interpolation for Dynamic and Controllable Text Generation	Sara Kangaslahti et.al.	2404.07117	link
2024-04-11	From Model-centered to Human-Centered: Revision Distance as a Metric for Text Evaluation in LLMs-based Applications	Yongqiang Ma et.al.	2404.07108	null
2024-04-10	Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs	Bowen Jin et.al.	2404.07103	link
2024-04-10	Dynamic Generation of Personalities with Large Language Models	Jianzhi Liu et.al.	2404.07084	link
2024-04-10	VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning	Alexandros Xenos et.al.	2404.07078	link
2024-04-10	Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?	Mingyu Jin et.al.	2404.07066	link
2024-04-10	Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study	Alessandro Stolfo et.al.	2404.07060	null
2024-04-10	Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation	Elisa Sanchez-Bayona et.al.	2404.07053	link
2024-04-10	ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling	Ege Özsoy et.al.	2404.07031	link
2024-04-10	Improving Language Model Reasoning with Self-motivated Learning	Yunlong Feng et.al.	2404.07017	null
2024-04-10	A Mathematical Theory for Learning Semantic Languages by Abstract Learners	Kuo-Yu Liao et.al.	2404.07009	null
2024-04-10	WordDecipher: Enhancing Digital Workspace Communication with Explainable AI for Non-native English Speakers	Yuexi Chen et.al.	2404.07005	null
2024-04-10	LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models	Igor Tufanov et.al.	2404.07004	null
2024-04-10	Event Grounded Criminal Court View Generation with Cooperative (Large) Language Models	Linan Yue et.al.	2404.07001	link
2024-04-10	Advancing Real-time Pandemic Forecasting Using Large Language Models: A COVID-19 Case Study	Hongru Du et.al.	2404.06962	link
2024-04-09	InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD	Xiaoyi Dong et.al.	2404.06512	link
2024-04-09	Can Feedback Enhance Semantic Grounding in Large Vision-Language Models?	Yuan-Hong Liao et.al.	2404.06510	null
2024-04-09	On the Effect of (Near) Duplicate Subwords in Language Modelling	Anton Schäfer et.al.	2404.06508	link
2024-04-09	Pitfalls of Conversational LLMs on News Debiasing	Ipek Baris Schlicht et.al.	2404.06488	null
2024-04-10	Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks	Chonghua Wang et.al.	2404.06480	link
2024-04-10	Text-Based Reasoning About Vector Graphics	Zhenhailong Wang et.al.	2404.06479	null
2024-04-09	Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models	Zihan Fang et.al.	2404.06448	null
2024-04-09	Large Language Models to the Rescue: Deadlock Resolution in Multi-Robot Systems	Kunal Garg et.al.	2404.06413	null
2024-04-09	AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents	Luca Gioacchini et.al.	2404.06411	link
2024-04-09	Take a Look at it! Rethinking How to Evaluate Language Model Jailbreak	Hongyu Cai et.al.	2404.06407	link
2024-04-09	Apprentices to Research Assistants: Advancing Research with Large Language Models	M. Namvarpour et.al.	2404.06404	null
2024-04-09	MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies	Shengding Hu et.al.	2404.06395	link
2024-04-09	MuPT: A Generative Symbolic Music Pretrained Transformer	Xingwei Qu et.al.	2404.06393	null
2024-04-09	Event Extraction in Basque: Typologically motivated Cross-Lingual Transfer-Learning Analysis	Mikel Zubillaga et.al.	2404.06392	null
2024-04-09	Latent Distance Guided Alignment Training for Large Language Models	Haotian Luo et.al.	2404.06390	null
2024-04-09	Model Generation from Requirements with LLMs: an Exploratory Study	Alessio Ferrari et.al.	2404.06371	null
2024-04-09	Enhancing Decision Analysis with a Large Language Model: pyDecision a Comprehensive Library of MCDA Methods in Python	Valdecy Pereira et.al.	2404.06370	link
2024-04-09	VISION2UI: A Real-World Dataset with Layout for Code Generation from UI Designs	Yi Gui et.al.	2404.06369	null
2024-04-09	ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish	Fernando Gallego et.al.	2404.06367	null
2024-04-09	Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero shot Medical Image Segmentation	Sidra Aleem et.al.	2404.06362	link
2024-04-08	MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding	Bo He et.al.	2404.05726	link
2024-04-08	Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs	Keen You et.al.	2404.05719	null
2024-04-08	Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding	Ahmad Idrissi-Yaghir et.al.	2404.05694	null
2024-04-08	Evaluating Mathematical Reasoning Beyond Accuracy	Shijie Xia et.al.	2404.05692	link
2024-04-08	Retrieval-Augmented Open-Vocabulary Object Detection	Jooyeon Kim et.al.	2404.05687	link
2024-04-08	MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation	Kunpeng Song et.al.	2404.05674	link
2024-04-08	CoReS: Orchestrating the Dance of Reasoning and Segmentation	Xiaoyi Bao et.al.	2404.05673	link
2024-04-08	Fighting crime with Transformers: Empirical analysis of address parsing methods in payment data	Haitham Hammami et.al.	2404.05632	link
2024-04-08	LTNER: Large Language Model Tagging for Named Entity Recognition with Contextualized Entity Marking	Faren Yan et.al.	2404.05624	null
2024-04-08	MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning	Matteo Farina et.al.	2404.05621	link
2024-04-08	SpeechAlign: Aligning Speech Generation to Human Preferences	Dong Zhang et.al.	2404.05600	link
2024-04-08	MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering	Iñigo Alonso et.al.	2404.05590	null
2024-04-08	Enhancing Software Related Information Extraction with Generative Language Models through Single-Choice Question Answering	Wolfgang Otto et.al.	2404.05587	null
2024-04-08	Towards More General Video-based Deepfake Detection through Facial Feature Guided Adaptation for Foundation Model	Yue-Hua Han et.al.	2404.05583	null
2024-04-08	360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System	Shen Gao et.al.	2404.05569	link
2024-04-08	Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models	Bowen Pan et.al.	2404.05567	null
2024-04-08	Chinese Sequence Labeling with Semi-Supervised Boundary-Aware Language Model Pre-training	Longhui Zhang et.al.	2404.05560	link
2024-04-08	Evaluating Interventional Reasoning Capabilities of Large Language Models	Tejas Kasetty et.al.	2404.05545	null
2024-04-08	OPSD: an Offensive Persian Social media Dataset and its baseline evaluations	Mehran Safayani et.al.	2404.05540	null
2024-04-08	Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data	Tim Baumgärtner et.al.	2404.05530	null
2024-04-05	Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)	Michael Saxon et.al.	2404.04251	link
2024-04-05	Physical Property Understanding from Language-Embedded Feature Fields	Albert J. Zhai et.al.	2404.04242	null
2024-04-05	Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents	Harsh Kohli et.al.	2404.04237	null
2024-04-05	player2vec: A Language Modeling Approach to Understand Player Behavior in Games	Tianze Wang et.al.	2404.04234	null
2024-04-05	Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation	Ji-Jia Wu et.al.	2404.04231	link
2024-04-05	Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation	Tong Su et.al.	2404.04212	null
2024-04-05	Social Skill Training with Large Language Models	Diyi Yang et.al.	2404.04204	null
2024-04-05	Do Sentence Transformers Learn Quasi-Geospatial Concepts from General Text?	Ilya Ilyankou et.al.	2404.04169	null
2024-04-05	Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model	Xinrun Du et.al.	2404.04167	null
2024-04-05	Dwell in the Beginning: How Language Models Embed Long Documents for Dense Retrieval	João Coelho et.al.	2404.04163	link
2024-04-05	BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models	Jacek Wiland et.al.	2404.04113	link
2024-04-05	Large language models as oracles for instantiating ontologies with domain-specific knowledge	Giovanni Ciatto et.al.	2404.04108	link
2024-04-05	Robust Preference Optimization with Provable Noise Tolerance for LLMs	Xize Liang et.al.	2404.04102	null
2024-04-05	Label Propagation for Zero-shot Classification with Vision-Language Models	Vladan Stojnić et.al.	2404.04072	link
2024-04-05	Assessing the quality of information extraction	Filip Seitl et.al.	2404.04068	null
2024-04-05	CLUE: A Clinical Language Understanding Evaluation for LLMs	Amin Dada et.al.	2404.04067	link
2024-04-05	VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots	Akhil Padmanabha et.al.	2404.04066	null
2024-04-05	A Comparison of Methods for Evaluating Generative IR	Negar Arabzadeh et.al.	2404.04044	link
2024-04-05	Teaching Llama a New Language Through Cross-Lingual Knowledge Transfer	Hele-Andra Kuulmets et.al.	2404.04042	link
2024-04-05	Willkommens-Merkel, Chaos-Johnson, and Tore-Klose: Modeling the Evaluative Meaning of German Personal Name Compounds	Annerose Eichel et.al.	2404.04031	link
2024-04-04	OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views	Francis Engelmann et.al.	2404.03650	null
2024-04-04	AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent	Hanyu Lai et.al.	2404.03648	link
2024-04-04	Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra	Darioush Kevian et.al.	2404.03647	null
2024-04-04	Locating and Editing Factual Associations in Mamba	Arnab Sen Sharma et.al.	2404.03646	link
2024-04-04	Training LLMs over Neurally Compressed Text	Brian Lester et.al.	2404.03626	null
2024-04-04	Standardizing Knowledge Engineering Practices with a Reference Architecture	Bradley P. Allen et.al.	2404.03624	null
2024-04-04	Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph	Marco Bronzini et.al.	2404.03623	link
2024-04-04	Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models	Wenshan Wu et.al.	2404.03622	null
2024-04-04	DeViDe: Faceted medical knowledge for improved medical vision-language pre-training	Haozhe Luo et.al.	2404.03618	null
2024-04-04	Sailor: Open Language Models for South-East Asia	Longxu Dou et.al.	2404.03608	link
2024-04-04	Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization	Aniruddha Nrusimha et.al.	2404.03605	link
2024-04-04	Evaluating LLMs at Detecting Errors in LLM Responses	Ryo Kamoi et.al.	2404.03602	link
2024-04-04	Intent Detection and Entity Extraction from BioMedical Literature	Ankan Mullick et.al.	2404.03598	link
2024-04-04	ReFT: Representation Finetuning for Language Models	Zhengxuan Wu et.al.	2404.03592	link
2024-04-04	SemGrasp: Semantic Grasp Generation via Language Aligned Discretization	Kailin Li et.al.	2404.03590	null
2024-04-04	Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models	Yantao Liu et.al.	2404.03577	link
2024-04-04	Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity	Jake Varley et.al.	2404.03570	null
2024-04-04	Personalized LLM Response Generation with Parameterized Memory Injection	Kai Zhang et.al.	2404.03565	link
2024-04-04	Select and Summarize: Scene Saliency for Movie Script Summarization	Rohit Saxena et.al.	2404.03561	link
2024-04-04	How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes	Harmon Bhasin et.al.	2404.03558	link
2024-04-03	ALOHa: A New Measure for Hallucination in Captioning Models	Suzanne Petryk et.al.	2404.02904	null
2024-04-03	MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment	Duygu Ceylan et.al.	2404.02899	null
2024-04-03	ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline	Yifan Xu et.al.	2404.02893	link
2024-04-03	MODNO: Multi Operator Learning With Distributed Neural Operators	Zecheng Zhang et.al.	2404.02892	null
2024-04-03	Linear Attention Sequence Parallelism	Weigao Sun et.al.	2404.02882	link
2024-04-03	Integrating Explanations in Learning LTL Specifications from Demonstrations	Ashutosh Gupta et.al.	2404.02872	null
2024-04-03	Toward Inference-optimal Mixture-of-Expert Large Language Models	Longfei Yun et.al.	2404.02852	null
2024-04-03	I-Design: Personalized LLM Interior Designer	Ata Çelen et.al.	2404.02838	null
2024-04-03	Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models	Wanyun Cui et.al.	2404.02837	null
2024-04-03	Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison	Maxime Bouthors et.al.	2404.02835	null
2024-04-03	Empowering Biomedical Discovery with AI Agents	Shanghua Gao et.al.	2404.02831	null
2024-04-03	BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models	Qijun Luo et.al.	2404.02827	link
2024-04-03	Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models	Haoran Sun et.al.	2404.02823	link
2024-04-03	A Survey of Optimization-based Task and Motion Planning: From Classical To Learning Approaches	Zhigen Zhao et.al.	2404.02817	null
2024-04-03	The RealHumanEval: Evaluating Large Language Models’ Abilities to Support Programmers	Hussein Mozannar et.al.	2404.02806	link
2024-04-03	Efficient Multi-Vector Dense Retrieval Using Bit Vectors	Franco Maria Nardini et.al.	2404.02805	link
2024-04-03	AI and personalized learning: bridging the gap with modern educational goals	Kristjan-Julius Laak et.al.	2404.02798	null
2024-04-03	CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech	Jaehyeon Kim et.al.	2404.02781	null
2024-04-03	FPT: Feature Prompt Tuning for Few-shot Readability Assessment	Ziyang Wang et.al.	2404.02772	link
2024-04-03	DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement	Hao Wu et.al.	2404.02755	null
2024-04-02	Segment Any 3D Object with Language	Seungjun Lee et.al.	2404.02157	null
2024-04-02	Iterated Learning Improves Compositionality in Large Vision-Language Models	Chenhao Zheng et.al.	2404.02145	null
2024-04-02	Topic-based Watermarks for LLM-Generated Text	Alexander Nemecek et.al.	2404.02138	null
2024-04-02	ViTamin: Designing Scalable Vision Models in the Vision-Language Era	Jieneng Chen et.al.	2404.02132	link
2024-04-02	FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning	Joel Niklaus et.al.	2404.02127	link
2024-04-02	Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models	Wanyong Feng et.al.	2404.02124	link
2024-04-02	GINopic: Topic Modeling with Graph Isomorphism Network	Suman Adhya et.al.	2404.02115	link
2024-04-02	CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems	Sara Rosenthal et.al.	2404.02103	link
2024-04-02	Advancing LLM Reasoning Generalists with Preference Trees	Lifan Yuan et.al.	2404.02078	link
2024-04-02	Red-Teaming Segment Anything Model	Krzysztof Jankowski et.al.	2404.02067	link
2024-04-02	Digital Forgetting in Large Language Models: A Survey of Unlearning Methods	Alberto Blanco-Justicia et.al.	2404.02062	null
2024-04-02	Long-context LLMs Struggle with Long In-context Learning	Tianle Li et.al.	2404.02060	link
2024-04-02	IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT	Junchen Fu et.al.	2404.02059	link
2024-04-02	Deconstructing In-Context Learning: Understanding Prompts via Corruption	Namrata Shivagunde et.al.	2404.02054	link
2024-04-02	A Survey on Large Language Model-Based Game Agents	Sihao Hu et.al.	2404.02039	link
2024-04-02	MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages	Daryna Dementieva et.al.	2404.02037	null
2024-04-02	Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts	Zhuo Chen et.al.	2404.02022	link
2024-04-02	Large Language Models for Orchestrating Bimanual Robots	Kun Chu et.al.	2404.02018	link
2024-04-02	MuxServe: Flexible Multiplexing for Efficient Multiple LLM Serving	Jiangfei Duan et.al.	2404.02015	link
2024-04-02	Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models	Stephan Linzbach et.al.	2404.01992	null
2024-03-29	Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models	Atsuyuki Miyai et.al.	2403.20331	link
2024-03-29	Are We on the Right Way for Evaluating Large Vision-Language Models?	Lin Chen et.al.	2403.20330	link
2024-03-29	ReALM: Reference Resolution As Language Modeling	Joel Ruben Antony Moniz et.al.	2403.20329	null
2024-03-29	Gecko: Versatile Text Embeddings Distilled from Large Language Models	Jinhyuk Lee et.al.	2403.20327	null
2024-03-29	Convolutional Prompting meets Language Models for Continual Learning	Anurag Roy et.al.	2403.20317	null
2024-03-29	Learn “No” to Say “Yes” Better: Improving Vision-Language Models via Negations	Jaisidh Singh et.al.	2403.20312	link
2024-03-29	Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference	Jovan Stojkovic et.al.	2403.20306	null
2024-03-29	Can LLMs Correct Physicians, Yet? Investigating Effective Interaction Methods in the Medical Domain	Burcu Sayin et.al.	2403.20288	link
2024-03-29	LUQ: Long-text Uncertainty Quantification for LLMs	Caiqi Zhang et.al.	2403.20279	link
2024-04-01	Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want	Weifeng Lin et.al.	2403.20271	link
2024-03-29	Latxa: An Open Language Model and Evaluation Suite for Basque	Julen Etxaniz et.al.	2403.20266	link
2024-03-29	ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models	Thibaut Thonet et.al.	2403.20262	link
2024-03-29	MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation	Taha Koleilat et.al.	2403.20253	link
2024-03-29	Using LLMs to Model the Beliefs and Preferences of Targeted Populations	Keiichi Namikoshi et.al.	2403.20252	null
2024-03-29	Long-Tailed Anomaly Detection with Learnable Class Names	Chih-Hui Ho et.al.	2403.20236	null
2024-03-29	H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model	Chao Pang et.al.	2403.20213	link
2024-03-29	Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science	Yazheng Yang et.al.	2403.20208	null
2024-03-29	The Future of Combating Rumors? Retrieval, Discrimination, and Generation	Junhao Xu et.al.	2403.20204	null
2024-03-29	ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models	Shuo Liu et.al.	2403.20194	null
2024-03-29	HARMamba: Efficient Wearable Sensor Human Activity Recognition Based on Bidirectional Selective SSM	Shuangjian Li et.al.	2403.20183	null
2024-03-28	RSMamba: Remote Sensing Image Classification with State Space Model	Keyan Chen et.al.	2403.19654	link
2024-03-28	InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction	Sirui Xu et.al.	2403.19652	null
2024-03-28	MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions	Kai Zhang et.al.	2403.19651	link
2024-03-28	Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models	Samuel Marks et.al.	2403.19647	link
2024-03-28	Change-Agent: Towards Interactive Comprehensive Change Interpretation and Analysis from Change Detection and Change Captioning	Chenyang Liu et.al.	2403.19646	link
2024-03-28	Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models	Yucheng Shi et.al.	2403.19631	link
2024-03-28	RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents	Zeren Chen et.al.	2403.19622	null
2024-03-28	SAID-NeRF: Segmentation-AIDed NeRF for Depth Completion of Transparent Objects	Avinash Ummadisingu et.al.	2403.19607	null
2024-03-28	Img2Loc: Revisiting Image Geolocalization using Multi-modality Foundation Models and Image-based Retrieval-Augmented Generation	Zhongliang Zhou et.al.	2403.19584	link
2024-03-28	Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics	Norman Di Palo et.al.	2403.19578	null
2024-03-28	WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models	Piotr Molenda et.al.	2403.19548	null
2024-03-28	Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models	Ang Lv et.al.	2403.19521	link
2024-03-28	Improving Clinical NLP Performance through Language Model-Generated Synthetic Clinical Data	Shan Chen et.al.	2403.19511	link
2024-03-28	LLMs as Academic Reading Companions: Extending HCI Through Synthetic Personae	Celia Chen et.al.	2403.19506	null
2024-03-28	Evolving Assembly Code in an Adversarial Environment	Irina Maliukov et.al.	2403.19489	link
2024-03-28	JDocQA: Japanese Document Question Answering Dataset for Generative Language Models	Eri Onami et.al.	2403.19454	link
2024-03-28	Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model	Qi Gou et.al.	2403.19443	null
2024-03-28	OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion	Xinyu Zhan et.al.	2403.19417	null
2024-03-28	BP4ER: Bootstrap Prompting for Explicit Reasoning in Medical Dialogue Generation	Yuhong He et.al.	2403.19414	null
2024-03-28	Checkpoint Merging via Bayesian Optimization in LLM Pretraining	Deyuan Liu et.al.	2403.19390	null
2024-03-27	Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models	Yanwei Li et.al.	2403.18814	link
2024-03-27	ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation	Suraj Patni et.al.	2403.18807	link
2024-03-27	Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation	Mateusz Klimaszewski et.al.	2403.18804	link
2024-03-27	Projective Methods for Mitigating Gender Bias in Pre-trained Language Models	Hillary Dawkins et.al.	2403.18803	link
2024-03-27	Long-form factuality in large language models	Jerry Wei et.al.	2403.18802	link
2024-03-27	Towards a World-English Language Model for On-Device Virtual Assistants	Rricha Jalota et.al.	2403.18783	null
2024-03-27	3P-LLM: Probabilistic Path Planning using Large Language Model for Autonomous Robot Navigation	Ehsan Latif et.al.	2403.18778	null
2024-03-27	ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object	Chenshuang Zhang et.al.	2403.18775	link
2024-03-27	CheckEval: Robust Evaluation Framework using Large Language Model via Checklist	Yukyung Lee et.al.	2403.18771	null
2024-03-27	MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model	Yike Wu et.al.	2403.18760	link
2024-03-27	CYCLE: Learning to Self-Refine the Code Generation	Yangruibo Ding et.al.	2403.18746	link
2024-03-27	Understanding the Learning Dynamics of Alignment with Human Feedback	Shawn Im et.al.	2403.18742	link
2024-03-27	PhysicsAssistant: An LLM-Powered Interactive Learning Robot for Physics Lab Investigations	Ehsan Latif et.al.	2403.18721	null
2024-03-27	Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding	Xintong Wang et.al.	2403.18715	link
2024-03-27	The Invalsi Benchmark: measuring Language Models Mathematical and Language understanding in Italian	Andrea Esuli et.al.	2403.18697	null
2024-03-27	NL-ITI: Optimizing Probing and Intervention for Improvement of ITI Method	Jakub Hoscilowicz et.al.	2403.18680	link
2024-03-27	An Exploratory Study on Upper-Level Computing Students’ Use of Large Language Models as Tools in a Semester-Long Project	Ben Arie Tanay et.al.	2403.18679	null
2024-03-27	SDSAT: Accelerating LLM Inference through Speculative Decoding with Semantic Adaptive Tokens	Chengbo Liu et.al.	2403.18647	link
2024-03-27	To Recommend or Not: Recommendability Identification in Conversations with Pre-trained Language Models	Zhefan Wang et.al.	2403.18628	link
2024-03-27	Vulnerability Detection with Code Language Models: How Far Are We?	Yangruibo Ding et.al.	2403.18624	link
2024-03-26	OmniVid: A Generative Framework for Universal Video Understanding	Junke Wang et.al.	2403.17935	link
2024-03-26	Track Everything Everywhere Fast and Robustly	Yunzhou Song et.al.	2403.17931	null
2024-03-26	MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution	Wei Tao et.al.	2403.17927	null
2024-03-26	LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning	Rui Pan et.al.	2403.17919	link
2024-03-26	Large scale paired antibody language models	Henry Kenlay et.al.	2403.17889	null
2024-03-26	Compressed Multi-task embeddings for Data-Efficient Downstream training and inference in Earth Observation	Carlos Gomes et.al.	2403.17886	link
2024-03-26	MIND Your Language: A Multilingual Dataset for Cross-lingual News Recommendation	Andreea Iana et.al.	2403.17876	link
2024-03-26	Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach	Andrea Ferrario et.al.	2403.17873	null
2024-03-26	Exploring LLMs as a Source of Targeted Synthetic Textual Data to Minimize High Confidence Misclassifications	Philip Lippmann et.al.	2403.17860	null
2024-03-26	ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages	Bhawna Piryani et.al.	2403.17859	link
2024-03-26	Verbing Weirds Language (Models): Evaluation of English Zero-Derivation in Five LLMs	David R. Mortensen et.al.	2403.17856	null
2024-03-26	ArabicaQA: A Comprehensive Dataset for Arabic Question Answering	Abdelrahman Abdallah et.al.	2403.17848	link
2024-03-26	Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation	Abdelrhman Werby et.al.	2403.17846	null
2024-03-26	Mechanistic Design and Scaling of Hybrid Architectures	Michael Poli et.al.	2403.17844	link
2024-03-26	ReMamber: Referring Image Segmentation with Mamba Twister	Yuhuan Yang et.al.	2403.17839	link
2024-03-26	A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities	Ibrahim Ethem Hamamci et.al.	2403.17834	link
2024-03-26	Assessment of Multimodal Large Language Models in Alignment with Human Values	Zhelun Shi et.al.	2403.17830	null
2024-03-26	Accelerating Radio Spectrum Regulation Workflows with Large Language Models (LLMs)	Amir Ghasemi et.al.	2403.17819	null
2024-03-26	Graph Language Model (GLM): A new graph-based approach to detect social instabilities	Wallyson Lemes de Oliveira et.al.	2403.17816	null
2024-03-26	Are Compressed Language Models Less Subgroup Robust?	Leonidas Gee et.al.	2403.17811	link
2024-03-25	Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making	Shuai Ma et.al.	2403.16812	null
2024-03-25	An LLM-Based Digital Twin for Optimizing Human-in-the Loop Systems	Hanqing Yang et.al.	2403.16809	link
2024-03-25	Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback	Zhangqian Bi et.al.	2403.16792	link
2024-03-25	All Artificial, Less Intelligence: GenAI through the Lens of Formal Verification	Deepak Narayan Gadde et.al.	2403.16750	null
2024-03-25	A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models	Nils Ingelhag et.al.	2403.16730	null
2024-03-25	ProCQA: A Large-scale Community-based Programming Question Answering Dataset for Code Search	Zehan Li et.al.	2403.16702	link
2024-03-25	Synapse: Learning Preferential Concepts from Visual Demonstrations	Sadanand Modak et.al.	2403.16689	null
2024-03-25	Investigation of the effectiveness of applying ChatGPT in Dialogic Teaching Using Electroencephalography	Jiayue Zhang et.al.	2403.16687	null
2024-03-25	RU22Fact: Optimizing Evidence for Multilingual Explainable Fact-Checking on Russia-Ukraine Conflict	Yirong Zeng et.al.	2403.16662	link
2024-03-25	Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT	Rohit Raju et.al.	2403.16655	null
2024-03-25	CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment	Feiteng Fang et.al.	2403.16649	link
2024-03-25	Virtual Co-Pilot: Multimodal Large Language Model-enabled Quick-access Procedures for Single Pilot Operations	Fan Li et.al.	2403.16645	null
2024-03-25	Semantically Enriched Cross-Lingual Sentence Embeddings for Crisis-related Social Media Texts	Rabindra Lamsal et.al.	2403.16614	null
2024-03-25	Conversational Grounding: Annotation and Analysis of Grounding Acts and Grounding Units	Biswesh Mohapatra et.al.	2403.16609	null
2024-03-25	TrustAI at SemEval-2024 Task 8: A Comprehensive Analysis of Multi-domain Machine Generated Text Detection Techniques	Ashok Urlana et.al.	2403.16592	null
2024-03-25	Can Large Language Models (or Humans) Distill Text?	Nicolas Audinet de Pieuchon et.al.	2403.16584	link
2024-03-25	NSINA: A News Corpus for Sinhala	Hansi Hettiarachchi et.al.	2403.16571	link
2024-03-25	Elysium: Exploring Object-level Perception in Videos via MLLM	Han Wang et.al.	2403.16558	link
2024-03-25	DOrA: 3D Visual Grounding with Order-Aware Referring	Tung-Yu Wu et.al.	2403.16539	null
2024-03-25	Open-Set Recognition in the Age of Vision-Language Models	Dimity Miller et.al.	2403.16528	link
2024-03-25	Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art	Neeloy Chakraborty et.al.	2403.16527	null
2024-03-25	Harnessing the power of LLMs for normative reasoning in MASs	Bastin Tony Roy Savarimuthu et.al.	2403.16524	null
2024-03-25	Norm Violation Detection in Multi-Agent Systems using Large Language Models: A Pilot Study	Shawn He et.al.	2403.16517	null
2024-03-25	Linguistically Differentiating Acts and Recalls of Racial Microaggressions on Social Media	Uma Sushmitha Gunturi et.al.	2403.16514	null
2024-03-22	LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models	Yuzhang Shang et.al.	2403.15388	null
2024-03-22	Long-CLIP: Unlocking the Long-Text Capability of CLIP	Beichen Zhang et.al.	2403.15378	link
2024-03-22	InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding	Yi Wang et.al.	2403.15377	link
2024-03-22	Can large language models explore in-context?	Akshay Krishnamurthy et.al.	2403.15371	null
2024-03-22	CoLLEGe: Concept Embedding Generation for Large Language Models	Ryan Teehan et.al.	2403.15362	null
2024-03-22	Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities	Zhitong Xiong et.al.	2403.15356	link
2024-03-22	Controlled Training Data Generation with Diffusion Models	Teresa Yeo et.al.	2403.15309	null
2024-03-22	Sphere Neural-Networks for Rational Reasoning	Tiansi Dong et.al.	2403.15297	null
2024-03-22	Measuring Gender and Racial Biases in Large Language Models	Jiafu An et.al.	2403.15281	null
2024-03-22	Bioinformatics and Biomedical Informatics with ChatGPT: Year One Review	Jinge Wang et.al.	2403.15274	null
2024-03-22	Event Temporal Relation Extraction based on Retrieval-Augmented on LLMs	Xiaobin Zhang et.al.	2403.15273	null
2024-03-22	Imagination Augmented Generation: Learning to Imagine Richer Context for Question Answering over Large Language Models	Huanxuan Liao et.al.	2403.15268	link
2024-03-22	AI Exposure and Strategic Positioning on an Online Work Platform	Shun Yiu et.al.	2403.15262	null
2024-03-22	FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions	Orion Weller et.al.	2403.15246	link
2024-03-22	Shadow Generation for Composite Image Using Diffusion model	Qingyang Liu et.al.	2403.15234	link
2024-03-22	An Exploratory Investigation into Code License Infringements in Large Language Model Training Datasets	Jonathan Katzy et.al.	2403.15230	link
2024-03-22	Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models	Qiong Wu et.al.	2403.15226	link
2024-03-22	Anytime, Anywhere, Anyone: Investigating the Feasibility of Segment Anything Model for Crowd-Sourcing Medical Image Annotations	Pranav Kulkarni et.al.	2403.15218	link
2024-03-22	InstaSynth: Opportunities and Challenges in Generating Synthetic Instagram Data with ChatGPT for Sponsored Content Detection	Thales Bertaglia et.al.	2403.15214	link
2024-03-22	MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection	Taeheon Kim et.al.	2403.15209	null
2024-03-21	MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?	Renrui Zhang et.al.	2403.14624	null
2024-03-21	Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey	Zeyu Han et.al.	2403.14608	null
2024-03-21	MyVLM: Personalizing VLMs for User-Specific Queries	Yuval Alaluf et.al.	2403.14599	null
2024-03-21	ReAct Meets ActRe: Autonomous Annotations of Agent Trajectories for Contrastive Self-Training	Zonghan Yang et.al.	2403.14589	null
2024-03-21	Large Language Models for Multi-Choice Question Classification of Medical Subjects	Víctor Ponce-López et.al.	2403.14582	null
2024-03-21	RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain	William James Bolton et.al.	2403.14578	link
2024-03-21	A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students’ Formative Assessment Responses in Science	Clayton Cohn et.al.	2403.14565	null
2024-03-21	The Era of Semantic Decoding	Maxime Peyrard et.al.	2403.14562	null
2024-03-21	Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling	Chengxu Zhuang et.al.	2403.14551	null
2024-03-21	EDT: Improving Large Language Models’ Generation by Entropy-based Dynamic Temperature Sampling	Shimao Zhang et.al.	2403.14541	link
2024-03-21	Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference	Han Zhao et.al.	2403.14520	link
2024-03-21	The Ethics of ChatGPT in Medicine and Healthcare: A Systematic Review on Large Language Models (LLMs)	Joschka Haltaufderheide et.al.	2403.14473	null
2024-03-21	Detoxifying Large Language Models via Knowledge Editing	Mengru Wang et.al.	2403.14472	link
2024-03-21	ChatGPT Alternative Solutions: Large Language Models Survey	Hanieh Alipour et.al.	2403.14469	null
2024-03-21	Recourse for reclamation: Chatting with generative language models	Jennifer Chien et.al.	2403.14467	null
2024-03-21	Towards Single-System Illusion in Software-Defined Vehicles – Automated, AI-Powered Workflow	Krzysztof Lebioda et.al.	2403.14460	null
2024-03-21	Multi-Level Explanations for Generative Language Models	Lucas Monteiro Paes et.al.	2403.14459	null
2024-03-21	gTBLS: Generating Tables from Text by Conditional Question Answering	Anirudh Sundar et.al.	2403.14457	null
2024-03-21	Language Models Can Reduce Asymmetry in Information Markets	Nasim Rahaman et.al.	2403.14443	null
2024-03-21	A Multimodal Approach to Device-Directed Speech Detection with Large Language Models	Dominik Wagner et.al.	2403.14438	null
2024-03-20	RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition	Ziyu Liu et.al.	2403.13805	link
2024-03-20	Learning from Models and Data for Visual Grounding	Ruozhen He et.al.	2403.13804	null
2024-03-20	Reverse Training to Nurse the Reversal Curse	Olga Golovneva et.al.	2403.13799	null
2024-03-20	Bridge the Modality and Capacity Gaps in Vision-Language Model Selection	Chao Yi et.al.	2403.13797	null
2024-03-20	RewardBench: Evaluating Reward Models for Language Modeling	Nathan Lambert et.al.	2403.13787	link
2024-03-20	Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts	Guangzeng Han et.al.	2403.13786	link
2024-03-20	Information-Theoretic Distillation for Reference-less Summarization	Jaehun Jung et.al.	2403.13780	null
2024-03-20	Embedding Pose Graph, Enabling 3D Foundation Model Capabilities with a Compact Representation	Hugues Thomas et.al.	2403.13777	null
2024-03-20	Describe-and-Dissect: Interpreting Neurons in Vision Networks with Language Models	Nicholas Bai et.al.	2403.13771	link
2024-03-20	Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge Augmentation in Vision Language Model	Diwei Wang et.al.	2403.13756	null
2024-03-20	Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement	Catherine Arnett et.al.	2403.13754	null
2024-03-20	EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation	Atnafu Lambebo Tonja et.al.	2403.13737	null
2024-03-20	Large Language Models meet Network Slicing Management and Orchestration	Abdulhalim Dandoush et.al.	2403.13721	null
2024-03-20	SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning	Hongjun Wang et.al.	2403.13684	null
2024-03-20	PARAMANU-AYN: An Efficient Novel Generative and Instruction-tuned Language Model for Indian Legal Case Documents	Mitodru Niyogi et.al.	2403.13681	null
2024-03-20	RoleInteract: Evaluating the Social Interaction of Role-Playing Agents	Hongzhan Chen et.al.	2403.13679	link
2024-03-20	Grounding Spatial Relations in Text-Only Language Models	Gorka Azkune et.al.	2403.13666	link
2024-03-20	Do Not Worry if You Do Not Have Data: Building Pretrained Language Models Using Translationese	Meet Doshi et.al.	2403.13638	null
2024-03-20	VL-Mamba: Exploring State Space Models for Multimodal Learning	Yanyuan Qiao et.al.	2403.13600	null
2024-03-20	No more optimization rules: LLM-enabled policy-based multi-modal query optimizer (version 1)	Yifan Wang et.al.	2403.13597	null
2024-03-19	LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression	Zhuoshi Pan et.al.	2403.12968	link
2024-03-19	Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models	Zuyan Liu et.al.	2403.12966	link
2024-03-19	Negative Yields Positive: Unified Dual-Path Adapter for Vision-Language Models	Ce Zhang et.al.	2403.12964	link
2024-03-19	Dated Data: Tracing Knowledge Cutoffs in Large Language Models	Jeffrey Cheng et.al.	2403.12958	link
2024-03-19	Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models	Elaine Sui et.al.	2403.12952	link
2024-03-19	Automatic Information Extraction From Employment Tribunal Judgements Using Large Language Models	Joana Ribeiro de Faria et.al.	2403.12936	null
2024-03-19	Segment Anything for comprehensive analysis of grapevine cluster architecture and berry properties	Efrain Torres-Lomas et.al.	2403.12935	null
2024-03-19	Rapid AIdeation: Generating Ideas With the Self and in Collaboration With Large Language Models	Gionnieve Lim et.al.	2403.12928	null
2024-03-19	Supporting Energy Policy Research with Large Language Models	Grant Buster et.al.	2403.12924	null
2024-03-19	Contextual AD Narration with Interleaved Multimodal Sequence	Hanlin Wang et.al.	2403.12922	link
2024-03-19	Semantic Layering in Room Segmentation via LLMs	Taehyeon Kim et.al.	2403.12920	null
2024-03-19	Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts	Sai Ashish Somayajula et.al.	2403.12918	link
2024-03-19	Yell At Your Robot: Improving On-the-Fly from Language Corrections	Lucy Xiaoyang Shi et.al.	2403.12910	null
2024-03-19	Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference	Baolin Li et.al.	2403.12900	null
2024-03-19	mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding	Anwen Hu et.al.	2403.12895	link
2024-03-20	MEDBind: Unifying Language and Multimodal Medical Data Embeddings	Yuan Gao et.al.	2403.12894	null
2024-03-19	HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning	Fucai Ke et.al.	2403.12884	link
2024-03-19	Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models	Zehui Chen et.al.	2403.12881	link
2024-03-19	Epistemology of Language Models: Do Language Models Have Holistic Knowledge?	Minsu Kim et.al.	2403.12862	null
2024-03-19	RASP: A Drone-based Reconfigurable Actuation and Sensing Platform Towards Ambient Intelligent Systems	Minghui Zhao et.al.	2403.12853	null
2024-03-18	Modality-Agnostic fMRI Decoding of Vision and Language	Mitja Nikolaus et.al.	2403.11771	null
2024-03-18	Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs	M. Jehanzeb Mirza et.al.	2403.11755	link
2024-03-18	Revisiting The Classics: A Study on Identifying and Rectifying Gender Stereotypes in Rhymes and Poems	Aditya Narayan Sankaran et.al.	2403.11752	link
2024-03-18	Embedded Named Entity Recognition using Probing Classifiers	Nicholas Popovič et.al.	2403.11747	link
2024-03-18	TTT-KD: Test-Time Training for 3D Semantic Segmentation through Knowledge Distillation from Foundation Models	Lisa Weijler et.al.	2403.11691	null
2024-03-18	HDLdebugger: Streamlining HDL debugging with Large Language Models	Xufeng Yao et.al.	2403.11671	null
2024-03-18	Prioritized Semantic Learning for Zero-shot Instance Navigation	Xander Sun et.al.	2403.11650	link
2024-03-18	Arc2Face: A Foundation Model of Human Faces	Foivos Paraperas Papantoniou et.al.	2403.11641	link
2024-03-18	Compositional Kronecker Context Optimization for Vision-Language Models	Kun Ding et.al.	2403.11631	null
2024-03-18	Let’s Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model	Haoyun Xu et.al.	2403.11621	null
2024-03-18	CRS-Diff: Controllable Generative Remote Sensing Foundation Model	Datao Tang et.al.	2403.11614	link
2024-03-18	Linguacodus: A Synergistic Framework for Transformative Code Generation in Machine Learning Pipelines	Ekaterina Trofimova et.al.	2403.11585	null
2024-03-18	Reinforcement Learning with Token-level Feedback for Controllable Text Generation	Wendi Li et.al.	2403.11558	link
2024-03-18	LLM^3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning	Shu Wang et.al.	2403.11552	link
2024-03-18	Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters	Jiazuo Yu et.al.	2403.11549	link
2024-03-18	DEE: Dual-stage Explainable Evaluation Method for Text Generation	Shenyu Zhang et.al.	2403.11509	null
2024-03-18	Do CLIPs Always Generalize Better than ImageNet Models?	Qizhou Wang et.al.	2403.11497	null
2024-03-18	VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding	Yue Fan et.al.	2403.11481	null
2024-03-18	HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models	Huy Nghiem et.al.	2403.11456	link
2024-03-18	Zero-shot Compound Expression Recognition with Visual Language Model at the 6th ABAW Challenge	Jiahe Wang et.al.	2403.11450	null
2024-03-18	LLM Guided Evolution - The Automation of Models Advancing Models	Clint Morris et.al.	2403.11446	link
2024-03-18	StyleChat: Learning Recitation-Augmented Memory in LLMs for Stylized Dialogue Generation	Jinpeng Li et.al.	2403.11439	null
2024-03-18	InsCL: A Data-efficient Continual Learning Paradigm for Fine-tuning Large Language Models with Instructions	Yifan Wang et.al.	2403.11435	null
2024-03-18	A Novel Paradigm Boosting Translation Capabilities of Large Language Models	Jiaxin Guo et.al.	2403.11430	null
2024-03-15	VideoAgent: Long-form Video Understanding with Large Language Model as Agent	Xiaohan Wang et.al.	2403.10517	null
2024-03-15	Demystifying Faulty Code with LLM: Step-by-Step Reasoning for Explainable Fault Localization	Ratnadira Widyasari et.al.	2403.10507	null
2024-03-15	ATOM: Asynchronous Training of Massive Models for Deep Learning in a Decentralized Environment	Xiaofeng Wu et.al.	2403.10504	null
2024-03-15	Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study	Chenguang Wang et.al.	2403.10499	link
2024-03-15	Reconfigurable Robot Identification from Motion Data	Yuhang Hu et.al.	2403.10496	null
2024-03-15	Can a GPT4-Powered AI Agent Be a Good Enough Performance Attribution Analyst?	Bruno de Melo et.al.	2403.10482	null
2024-03-15	Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases	Jiarui Li et.al.	2403.10446	link
2024-03-15	Optimal Block-Level Draft Verification for Accelerating Speculative Decoding	Ziteng Sun et.al.	2403.10444	null
2024-03-15	Using an LLM to Turn Sign Spottings into Spoken Language Sentences	Ozge Mercanoglu Sincan et.al.	2403.10434	null
2024-03-15	SocialGenPod: Privacy-Friendly Generative AI Social Web Applications with Decentralised Personal Data Stores	Vidminas Vizgirda et.al.	2403.10408	link
2024-03-15	A Thorough Comparison of Cross-Encoders and LLMs for Reranking SPLADE	Hervé Déjean et.al.	2403.10407	null
2024-03-15	Monotonic Representation of Numeric Properties in Language Models	Benjamin Heinzerling et.al.	2403.10381	link
2024-03-15	EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models	Rocktim Jyoti Das et.al.	2403.10378	link
2024-03-15	TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale	Pengcheng Jiang et.al.	2403.10351	null
2024-03-15	Investigating grammatical abstraction in language models using few-shot learning of novel noun gender	Priyanka Sukumaran et.al.	2403.10338	null
2024-03-15	CDGP: Automatic Cloze Distractor Generation based on Pre-trained Language Model	Shang-Hsuan Chiang et.al.	2403.10326	link
2024-03-15	NetBench: A Large-Scale and Comprehensive Network Traffic Benchmark Dataset for Foundation Models	Chen Qian et.al.	2403.10319	link
2024-03-15	Uni-SMART: Universal Science Multimodal Analysis and Research Transformer	Hengxing Cai et.al.	2403.10301	null
2024-03-15	Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models	Tian Meng et.al.	2403.10287	null
2024-03-15	Team Trifecta at Factify5WQA: Setting the Standard in Fact Verification with Fine-Tuning	Shang-Hsuan Chiang et.al.	2403.10281	link
2024-03-14	GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping	Yuhang Zheng et.al.	2403.09637	link
2024-03-14	Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference	Piotr Nawrot et.al.	2403.09636	null
2024-03-14	Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models	Akhil Kedia et.al.	2403.09635	link
2024-03-14	OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning	Lingyi Hong et.al.	2403.09634	null
2024-03-14	3D-VLA: A 3D Vision-Language-Action Generative World Model	Haoyu Zhen et.al.	2403.09631	null
2024-03-14	Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking	Eric Zelikman et.al.	2403.09629	link
2024-03-14	Explore In-Context Segmentation via Latent Diffusion Models	Chaoyang Wang et.al.	2403.09616	null
2024-03-14	MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training	Brandon McKinzie et.al.	2403.09611	null
2024-03-14	Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey	Xiaoyu Liu et.al.	2403.09606	null
2024-03-14	Logical Discrete Graphical Models Must Supplement Large Language Models for Information Synthesis	Gregory Coppola et.al.	2403.09599	null
2024-03-14	Renovating Names in Open-Vocabulary Segmentation Benchmarks	Haiwen Huang et.al.	2403.09593	null
2024-03-14	ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models	Runyu Ma et.al.	2403.09583	null
2024-03-14	Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation	Yunhao Gou et.al.	2403.09572	null
2024-03-14	Enhancing Trust in Autonomous Agents: An Architecture for Accountability and Explainability through Blockchain and Large Language Models	Laura Fernández-Becerra et.al.	2403.09567	null
2024-03-14	Welcome Your New AI Teammate: On Safety Analysis by Leashing Large Language Models	Ali Nouri et.al.	2403.09565	null
2024-03-14	PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps	Ruixuan Liu et.al.	2403.09562	null
2024-03-14	Less is More: Data Value Estimation for Visual Instruction Tuning	Zikang Liu et.al.	2403.09559	null
2024-03-15	Logits of API-Protected LLMs Leak Proprietary Information	Matthew Finlayson et.al.	2403.09539	null
2024-03-14	VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding	Chris Kelly et.al.	2403.09530	null
2024-03-15	WavCraft: Audio Editing and Generation with Natural Language Prompts	Jinhua Liang et.al.	2403.09527	link
2024-03-13	Simple and Scalable Strategies to Continually Pre-train Large Language Models	Adam Ibrahim et.al.	2403.08763	link
2024-03-13	Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework	Jingling Li et.al.	2403.08743	null
2024-03-13	The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models	Carlo Nicolini et.al.	2403.08739	null
2024-03-13	ILCiteR: Evidence-grounded Interpretable Local Citation Recommendation	Sayar Ghosh Roy et.al.	2403.08737	link
2024-03-13	Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization	Renjie Pi et.al.	2403.08730	null
2024-03-14	SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents	Ruiyi Wang et.al.	2403.08715	link
2024-03-13	Review of Generative AI Methods in Cybersecurity	Yagmur Yigit et.al.	2403.08701	null
2024-03-13	TeaMs-RL: Teaching LLMs to Teach Themselves Better Instructions via Reinforcement Learning	Shangding Gu et.al.	2403.08694	link
2024-03-13	Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages	Rik van Noord et.al.	2403.08693	null
2024-03-13	Zero-shot and Few-shot Generation Strategies for Artificial Clinical Records	Erlend Frayling et.al.	2403.08664	null
2024-03-13	Self-Supervised Learning for Covariance Estimation	Tzvi Diskin et.al.	2403.08662	null
2024-03-13	Human Alignment of Large Language Models through Online Preference Optimisation	Daniele Calandriello et.al.	2403.08635	null
2024-03-13	MedInsight: A Multi-Source Context Augmentation Framework for Generating Patient-Centric Medical Responses using Large Language Models	Subash Neupane et.al.	2403.08607	null
2024-03-13	Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation	Daniel Honerkamp et.al.	2403.08605	link
2024-03-13	DevBench: A Comprehensive Benchmark for Software Development	Bowen Li et.al.	2403.08604	link
2024-03-13	Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments	Sitao Cheng et.al.	2403.08593	null
2024-03-13	Non-discrimination Criteria for Generative Language Models	Sara Sterlie et.al.	2403.08564	link
2024-03-13	AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models	Yifei Gao et.al.	2403.08542	link
2024-03-13	Language models scale reliably with over-training and on downstream tasks	Samir Yitzhak Gadre et.al.	2403.08540	link
2024-03-13	Masked Generative Story Transformer with Character Guidance and Caption Augmentation	Christos Papadimitriou et.al.	2403.08502	link
2024-03-12	Beyond Text: Frozen Large Language Models in Visual Signal Comprehension	Lei Zhu et.al.	2403.07874	link
2024-03-12	Rethinking Generative Large Language Model Evaluation for Semantic Comprehension	Fangyun Wei et.al.	2403.07872	null
2024-03-12	Exploring Safety Generalization Challenges of Large Language Models via Code	Qibing Ren et.al.	2403.07865	link
2024-03-12	Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation	Shihao Zhao et.al.	2403.07860	link
2024-03-12	MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric	Haokun Lin et.al.	2403.07839	null
2024-03-12	DeliGrasp: Inferring Object Mass, Friction, and Compliance with LLMs for Adaptive and Minimally Deforming Grasp Policies	William Xie et.al.	2403.07832	null
2024-03-12	The Missing Piece in Model Editing: A Deep Dive into the Hidden Damage Brought By Model Editing	Jianchen Wang et.al.	2403.07825	null
2024-03-12	Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM	Sainbayar Sukhbaatar et.al.	2403.07816	null
2024-03-12	Chronos: Learning the Language of Time Series	Abdul Fatir Ansari et.al.	2403.07815	link
2024-03-12	Beyond Memorization: The Challenge of Random Memory Access in Language Models	Tongyao Zhu et.al.	2403.07805	link
2024-03-12	Fine-tuning Large Language Models with Sequential Instructions	Hanxu Hu et.al.	2403.07794	link
2024-03-12	Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations	Carlos Jose Xavier Cruz et.al.	2403.07769	link
2024-03-12	Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings	Sahand Sharifzadeh et.al.	2403.07750	null
2024-03-12	FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models	Yan Liu et.al.	2403.07747	null
2024-03-12	Multi-modal Auto-regressive Modeling via Visual Words	Tianshuo Peng et.al.	2403.07720	link
2024-03-12	WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?	Alexandre Drouin et.al.	2403.07718	link
2024-03-12	StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models	Zhicheng Guo et.al.	2403.07714	link
2024-03-12	Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards	Wei Shen et.al.	2403.07708	null
2024-03-12	Large, Small or Both: A Novel Data Augmentation Framework Based on Language Models for Debiasing Opinion Summarization	Yanyue Zhang et.al.	2403.07693	null
2024-03-12	Reference-free Monolithic Preference Optimization with Odds Ratio	Jiwoo Hong et.al.	2403.07691	link
2024-03-11	Hybrid Human-LLM Corpus Construction and LLM Evaluation for Rare Linguistic Phenomena	Leonie Weissweiler et.al.	2403.06965	null
2024-03-11	Materials science in the era of large language models: a perspective	Ge Lei et.al.	2403.06949	null
2024-03-11	Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation	Xinyao Li et.al.	2403.06946	link
2024-03-11	Naming, Describing, and Quantifying Visual Objects in Humans and LLMs	Alberto Testoni et.al.	2403.06935	link
2024-03-11	ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis	Yanming Liu et.al.	2403.06932	link
2024-03-11	MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning	Yichuan Li et.al.	2403.06914	link
2024-03-11	Application of Quantum Tensor Networks for Protein Classification	Debarshi Kundu et.al.	2403.06890	null
2024-03-11	Exploring Large Language Models and Hierarchical Frameworks for Classification of Large Unstructured Legal Documents	Nishchal Prasad et.al.	2403.06872	link
2024-03-11	Semantic Residual Prompts for Continual Learning	Martin Menabue et.al.	2403.06870	link
2024-03-11	Learning with Noisy Foundation Models	Hao Chen et.al.	2403.06869	null
2024-03-11	A Geospatial Approach to Predicting Desert Locust Breeding Grounds in Africa	Ibrahim Salihu Yusuf et.al.	2403.06860	null
2024-03-11	Development of a Reliable and Accessible Caregiving Language Model (CaLM)	Bambang Parmanto et.al.	2403.06857	null
2024-03-11	DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation	Guosheng Zhao et.al.	2403.06845	null
2024-03-11	RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback	Yanming Liu et.al.	2403.06840	link
2024-03-11	ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts	Lyuye Zhang et.al.	2403.06838	null
2024-03-11	Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?	Egor Zverev et.al.	2403.06833	link
2024-03-11	The Power of Noise: Toward a Unified Multi-modal Knowledge Graph Representation Framework	Zhuo Chen et.al.	2403.06832	link
2024-03-11	ConspEmoLLM: Conspiracy Theory Detection Using an Emotion-Based Large Language Model	Zhiwei Liu et.al.	2403.06765	link
2024-03-11	An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models	Liang Chen et.al.	2403.06764	link
2024-03-11	ALaRM: Align Language Models via Hierarchical Rewards Modeling	Yuhang Lai et.al.	2403.06754	link
2024-03-08	Bayesian Preference Elicitation with Language Models	Kunal Handa et.al.	2403.05534	null
2024-03-08	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	Machel Reid et.al.	2403.05530	null
2024-03-08	GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM	Hao Kang et.al.	2403.05527	link
2024-03-08	DeepSeek-VL: Towards Real-World Vision-Language Understanding	Haoyu Lu et.al.	2403.05525	link
2024-03-08	Beyond Finite Data: Towards Data-free Out-of-distribution Generalization via Extrapola	Yijiang Li et.al.	2403.05523	null
2024-03-08	Authorship Attribution in Bangla Literature (AABL) via Transfer Learning using ULMFiT	Aisha Khatun et.al.	2403.05519	null
2024-03-08	Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought	James Chua et.al.	2403.05518	link
2024-03-08	To Err Is Human, but Llamas Can Learn It Too	Agnes Luhtaru et.al.	2403.05493	link
2024-03-08	Will GPT-4 Run DOOM?	Adrian de Wynter et.al.	2403.05468	null
2024-03-08	Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs	Arijit Nag et.al.	2403.05434	null
2024-03-08	Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition	Bingbing Wang et.al.	2403.05428	null
2024-03-08	FedFMS: Exploring Federated Foundation Models for Medical Image Segmentation	Yuxi Liu et.al.	2403.05408	link
2024-03-08	Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery	Xavier Bou et.al.	2403.05381	link
2024-03-08	VLM-PL: Advanced Pseudo Labeling approach Class Incremental Object Detection with Vision-Language Model	Junsu Kim et.al.	2403.05346	null
2024-03-08	Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings	Wei Zhou et.al.	2403.05338	null
2024-03-08	ChatASU: Evoking LLM’s Reflexion to Truly Understand Aspect Sentiment in Dialogues	Yiding Liu et.al.	2403.05326	null
2024-03-08	RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation	Zihao Wang et.al.	2403.05313	null
2024-03-08	Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents	Jinyang Li et.al.	2403.05307	link
2024-03-08	ACLSum: A New Dataset for Aspect-based Summarization of Scientific Publications	Sotaro Takeshita et.al.	2403.05303	link
2024-03-08	Modeling Dynamic (De)Allocations of Local Memory for Translation Validation	Abhishek Rose et.al.	2403.05302	null
2024-03-07	iScore: Visual Analytics for Interpreting How Language Models Automatically Score Summaries	Adam Coscia et.al.	2403.04760	link
2024-03-07	KnowledgeVIS: Interpreting Language Models by Comparing Fill-in-the-Blank Prompts	Adam Coscia et.al.	2403.04758	link
2024-03-07	LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error	Boshi Wang et.al.	2403.04746	link
2024-03-08	How Far Are We from Intelligent Visual Deductive Reasoning?	Yizhe Zhang et.al.	2403.04732	link
2024-03-07	Common 7B Language Models Already Possess Strong Math Capabilities	Chen Li et.al.	2403.04706	link
2024-03-07	ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes	Hashmat Shadab Malik et.al.	2403.04701	link
2024-03-07	Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification	Ekaterina Fadeeva et.al.	2403.04696	link
2024-03-07	Telecom Language Models: Must They Be Large?	Nicola Piovesan et.al.	2403.04666	null
2024-03-07	Yi: Open Foundation Models by 01.AI	01.AI et.al.	2403.04652	link
2024-03-07	Teaching Large Language Models to Reason with Reinforcement Learning	Alex Havrilla et.al.	2403.04642	null
2024-03-07	CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios	Qilang Ye et.al.	2403.04640	link
2024-03-07	A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds	Xuenan Xu et.al.	2403.04594	link
2024-03-07	Embodied Understanding of Driving Scenarios	Yunsong Zhou et.al.	2403.04593	link
2024-03-07	Wiki-TabNER: Advancing Table Interpretation Through Named Entity Recognition	Aneta Koleva et.al.	2403.04577	link
2024-03-07	Reducing self-supervised learning complexity improves weakly-supervised classification performance in computational pathology	Tim Lenz et.al.	2403.04558	null
2024-03-07	Enhancing Data Quality in Federated Fine-Tuning of Foundation Models	Wanru Zhao et.al.	2403.04529	null
2024-03-07	Where does In-context Translation Happen in Large Language Models	Suzanna Sia et.al.	2403.04510	null
2024-03-07	GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability	Zihan Luo et.al.	2403.04483	link
2024-03-08	Do Large Language Model Understand Multi-Intent Spoken Language?	Shangjian Yin et.al.	2403.04481	link
2024-03-08	Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset	Minjin Kim et.al.	2403.04460	link
2024-03-06	Backtracing: Retrieving the Cause of the Query	Rose E. Wang et.al.	2403.03956	link
2024-03-06	Bridging Language and Items for Retrieval and Recommendation	Yupeng Hou et.al.	2403.03952	link
2024-03-06	The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models	Adithya Bhaskar et.al.	2403.03942	link
2024-03-06	Did Translation Models Get More Robust Without Anyone Even Noticing?	Ben Peters et.al.	2403.03923	null
2024-03-06	Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing	Asmita et.al.	2403.03897	link
2024-03-06	IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators	Indraneil Paul et.al.	2403.03894	link
2024-03-06	From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models	Luiza Pozzobon et.al.	2403.03893	link
2024-03-06	FaaF: Facts as a Function for the evaluation of RAG systems	Vasileios Katranidis et.al.	2403.03888	link
2024-03-06	SaulLM-7B: A pioneering Large Language Model for Law	Pierre Colombo et.al.	2403.03883	null
2024-03-06	Learning to Decode Collaboratively with Multiple Language Models	Shannon Zejiang Shen et.al.	2403.03870	link
2024-03-06	On the Origins of Linear Representations in Large Language Models	Yibo Jiang et.al.	2403.03867	null
2024-03-06	KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions	Fangyuan Xu et.al.	2403.03866	null
2024-03-06	Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning	Deepanway Ghosal et.al.	2403.03864	link
2024-03-06	X-Shot: A Unified System to Handle Frequent, Few-shot and Zero-shot Learning Simultaneously in Classification	Hanzi Xu et.al.	2403.03863	link
2024-03-06	Designing Informative Metrics for Few-Shot Example Selection	Rishabh Adiga et.al.	2403.03861	null
2024-03-06	Emojinize: Enriching Any Text with Emoji Translations	Lars Henning Klein et.al.	2403.03857	null
2024-03-06	ShortGPT: Layers in Large Language Models are More Redundant Than You Expect	Xin Men et.al.	2403.03853	null
2024-03-06	Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ	Carolin Holtermann et.al.	2403.03814	link
2024-03-06	Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery	Wei Zhang et.al.	2403.03790	null
2024-03-06	PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion	Zekai Zhang et.al.	2403.03788	link
2024-03-05	The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning	Nathaniel Li et.al.	2403.03218	null
2024-03-05	CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments	Savitha Sam Abraham et.al.	2403.03203	null
2024-03-05	Towards Democratized Flood Risk Management: An Advanced AI Assistant Enabled by GPT-4 for Enhanced Interpretability and Public Engagement	Rafaela Martelo et.al.	2403.03188	link
2024-03-05	Reliable, Adaptable, and Attributable Language Models with Retrieval	Akari Asai et.al.	2403.03187	null
2024-03-05	MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting	Fangchen Liu et.al.	2403.03174	null
2024-03-05	SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection	Peng Qi et.al.	2403.03170	null
2024-03-05	PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset	Arda Uzunoğlu et.al.	2403.03167	link
2024-03-05	Quantum Many-Body Physics Calculations with Large Language Models	Haining Pan et.al.	2403.03154	null
2024-03-05	Language Guided Exploration for RL Agents in Text Environments	Hitesh Golchha et.al.	2403.03141	null
2024-03-05	CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following	Kaiyan Zhang et.al.	2403.03129	null
2024-03-05	Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution	Flor Miriam Plaza-del-Arco et.al.	2403.03121	link
2024-03-05	“In Dialogues We Learn”: Towards Personalized Dialogue Without Pre-defined Profiles through In-Dialogue Learning	Chuanqi Cheng et.al.	2403.03102	null
2024-03-05	KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents	Yuqi Zhu et.al.	2403.03101	link
2024-03-05	Learning to Use Tools via Cooperative and Interactive Agents	Zhengliang Shi et.al.	2403.03031	link
2024-03-05	Socratic Reasoning Improves Positive Text Rewriting	Anmol Goel et.al.	2403.03029	null
2024-03-05	Word Importance Explains How Prompts Affect Language Model Outputs	Stefan Hackmann et.al.	2403.03028	null
2024-03-05	OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following	Haochen Shi et.al.	2403.03017	null
2024-03-05	Knowledge Graphs as Context Sources for LLM-Based Explanations of Learning Recommendations	Hasan Abu-Rasheed et.al.	2403.03008	null
2024-03-05	Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models	Gen Luo et.al.	2403.03003	link
2024-03-05	Localized Zeroth-Order Prompt Optimization	Wenyang Hu et.al.	2403.02993	null
2024-03-02	LM4OPT: Unveiling the Potential of Large Language Models in Formulating Mathematical Optimization Problems	Tasnim Ahmed et.al.	2403.01342	null
2024-03-02	Making Hybrid Languages: A Recipe	Leif Andersen et.al.	2403.01335	null
2024-03-02	Chaining thoughts and LLMs to learn DNA structural biophysics	Tyler D. Ross et.al.	2403.01332	link
2024-03-02	VBART: The Turkish LLM	Meliksah Turker et.al.	2403.01308	null
2024-03-02	ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation	Moran Yanuka et.al.	2403.01306	link
2024-03-02	Improving the Validity of Automatically Generated Feedback via Reinforcement Learning	Alexander Scarlatos et.al.	2403.01304	link
2024-03-02	NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention	Tianyi Zhang et.al.	2403.01273	link
2024-03-02	Employing LLMs for Incident Response Planning and Review	Sam Hays et.al.	2403.01271	null
2024-03-02	Dissecting Language Models: Machine Unlearning via Selective Pruning	Nicholas Pochinkov et.al.	2403.01267	link
2024-03-02	Accelerating Greedy Coordinate Gradient via Probe Sampling	Yiran Zhao et.al.	2403.01251	link
2024-03-02	SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code	Ziniu Hu et.al.	2403.01248	null
2024-03-02	Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal	Jianheng Huang et.al.	2403.01244	link
2024-03-02	IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact	Ruikang Liu et.al.	2403.01241	link
2024-03-02	Inexact Unlearning Needs More Careful Evaluations to Avoid a False Sense of Privacy	Jamie Hayes et.al.	2403.01218	null
2024-03-02	API Is Enough: Conformal Prediction for Large Language Models Without Logit-Access	Jiayuan Su et.al.	2403.01216	null
2024-03-02	Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning	Shuo Yang et.al.	2403.01209	null
2024-03-02	The Case for Animal-Friendly AI	Sankalpa Ghose et.al.	2403.01199	null
2024-03-02	DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling	Shanghaoran Quan et.al.	2403.01197	link
2024-03-02	RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots	Philip Feldman et.al.	2403.01193	null
2024-03-02	Balancing Exploration and Exploitation in LLM using Soft RLLF for Enhanced Negation Understanding	Ha-Thanh Nguyen et.al.	2403.01185	null
2024-02-29	The Counterfeit Conundrum: Can Code Language Models Grasp the Nuances of Their Incorrect Generations?	Alex Gu et.al.	2402.19475	null
2024-02-29	The All-Seeing Project V2: Towards General Relation Comprehension of the Open World	Weiyun Wang et.al.	2402.19474	link
2024-02-29	Retrieval-Augmented Generation for AI-Generated Content: A Survey	Penghao Zhao et.al.	2402.19473	link
2024-02-29	Loose LIPS Sink Ships: Asking Questions in Battleship with Language-Informed Program Sampling	Gabriel Grand et.al.	2402.19471	null
2024-03-01	TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning	Kate Sanders et.al.	2402.19467	null
2024-02-29	Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models	Chen Qian et.al.	2402.19465	link
2024-02-29	Curiosity-driven Red-teaming for Large Language Models	Zhang-Wei Hong et.al.	2402.19464	link
2024-02-29	Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap	Saurabh Srivastava et.al.	2402.19450	link
2024-02-29	Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models	Frederik Kunstner et.al.	2402.19449	null
2024-02-29	ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL	Yifei Zhou et.al.	2402.19446	link
2024-02-29	Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation	Jonathan Yang et.al.	2402.19432	null
2024-02-29	Compositional API Recommendation for Library-Oriented Code Generation	Zexiong Ma et.al.	2402.19431	null
2024-02-29	Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models	Soham De et.al.	2402.19427	null
2024-02-29	Crafting Knowledge: Exploring the Creative Mechanisms of Chat-Based Search Engines	Lijia Ma et.al.	2402.19421	null
2024-02-29	PaECTER: Patent-level Representation Learning using Citation-informed Transformers	Mainak Ghosh et.al.	2402.19411	null
2024-02-29	On the Scaling Laws of Geographical Representation in Language Models	Nathan Godey et.al.	2402.19406	null
2024-02-29	Entity-Aware Multimodal Alignment Framework for News Image Captioning	Junzhe Zhang et.al.	2402.19404	null
2024-02-29	Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Match Human Crowd Accuracy	Philipp Schoenegger et.al.	2402.19379	null
2024-02-29	OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models	Jenish Maharjan et.al.	2402.19371	null
2024-02-29	SoK: Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency	Akila Wickramasekara et.al.	2402.19366	null
2024-02-28	Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards	Haoxiang Wang et.al.	2402.18571	link
2024-02-28	Diffusion Language Models Are Versatile Protein Learners	Xinyou Wang et.al.	2402.18567	link
2024-02-28	A Categorization of Complexity Classes for Information Retrieval and Synthesis Using Natural Logic	Gregory Coppola et.al.	2402.18566	null
2024-02-28	Approaching Human-Level Forecasting with Language Models	Danny Halawi et.al.	2402.18563	null
2024-02-28	Implicit Bias of Next-Token Prediction	Christos Thrampoulidis et.al.	2402.18551	null
2024-02-28	Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling	Mahdi Karami et.al.	2402.18508	null
2024-02-28	Few-Shot Fairness: Unveiling LLM’s Potential for Fairness-Aware Classification	Garima Chhikara et.al.	2402.18502	null
2024-02-28	Language Models Represent Beliefs of Self and Others	Wentao Zhu et.al.	2402.18496	null
2024-02-28	IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding	Lanyun Zhu et.al.	2402.18476	null
2024-02-28	Meta-Task Prompting Elicits Embedding from Large Language Models	Yibin Lei et.al.	2402.18458	link
2024-02-28	Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization	Deng Li et.al.	2402.18447	null
2024-02-28	Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication	Weize Chen et.al.	2402.18439	link
2024-02-28	A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision Language Models	Xiujie Song et.al.	2402.18409	link
2024-02-28	Balanced Similarity with Auxiliary Prompts: Towards Alleviating Text-to-Image Retrieval Bias for CLIP in Zero-shot Learning	Hanyao Wang et.al.	2402.18400	null
2024-02-28	Decomposed Prompting: Unveiling Multilingual Linguistic Structure Knowledge in English-Centric Large Language Models	Ercong Nie et.al.	2402.18397	null
2024-02-28	The First Place Solution of WSDM Cup 2024: Leveraging Large Language Models for Conversational Multi-Doc QA	Yiming Li et.al.	2402.18385	link
2024-02-28	Large Language Models As Evolution Strategies	Robert Tjarko Lange et.al.	2402.18381	null
2024-02-28	Tokenization Is More Than Compression	Craig W. Schmidt et.al.	2402.18376	link
2024-02-28	VerifiNER: Verification-augmented NER via Knowledge-grounded Reasoning with Large Language Models	Seoyeon Kim et.al.	2402.18374	link
2024-02-28	Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning	Jiachun Li et.al.	2402.18344	link
2024-02-27	ShapeLLM: Universal 3D Object Understanding for Embodied Interaction	Zekun Qi et.al.	2402.17766	link
2024-02-27	The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits	Shuming Ma et.al.	2402.17764	null
2024-02-27	Massive Activations in Large Language Models	Mingjie Sun et.al.	2402.17762	link
2024-02-27	Towards Optimal Learning of Language Models	Yuxian Gu et.al.	2402.17759	null
2024-02-27	Evaluating Very Long-Term Conversational Memory of LLM Agents	Adyasha Maharana et.al.	2402.17753	null
2024-02-27	Tower: An Open Multilingual Large Language Model for Translation-Related Tasks	Duarte M. Alves et.al.	2402.17733	link
2024-02-27	AmbigNLG: Addressing Task Ambiguity in Instruction for NLG	Ayana Niwa et.al.	2402.17717	link
2024-02-27	Case-Based or Rule-Based: How Do Transformers Do the Math?	Yi Hu et.al.	2402.17709	link
2024-02-27	RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations	Jing Huang et.al.	2402.17700	link
2024-02-27	NextLevelBERT: Investigating Masked Language Modeling with Higher-Level Representations for Long Documents	Tamara Czinczoll et.al.	2402.17682	link
2024-02-27	The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks	Ashwin Prasad Shivarpatna Venkatesh et.al.	2402.17679	null
2024-02-27	CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention	Mohammad Sadil Khan et.al.	2402.17678	null
2024-02-27	Securing Reliability: A Brief Overview on Enhancing In-Context Learning for Foundation Models	Yunpeng Huang et.al.	2402.17671	null
2024-02-27	Beyond prompt brittleness: Evaluating the reliability and consistency of political worldviews in LLMs	Tanise Ceron et.al.	2402.17649	null
2024-02-27	SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation	Shuangrui Ding et.al.	2402.17645	link
2024-02-27	Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data	Xiao Liu et.al.	2402.17644	link
2024-02-27	Variational Learning is Effective for Large Deep Networks	Yuesong Shen et.al.	2402.17641	link
2024-02-27	Masked Gamma-SSL: Learning Uncertainty Estimation via Masked Image Modeling	David S. W. Williams et.al.	2402.17622	null
2024-02-27	Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization	Wenqi Zhang et.al.	2402.17574	link
2024-02-27	Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers	Xinyu Tang et.al.	2402.17564	link
2024-02-26	Integrating Large Language Models with Graphical Session-Based Recommendation	Naicheng Guo et.al.	2402.16539	null
2024-02-26	LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments	Junzhe Chen et.al.	2402.16499	link
2024-02-26	On Languaging a Simulation Engine	Han Liu et.al.	2402.16482	null
2024-02-26	Unveiling ChatGPT’s Usage in Open Source Projects: A Mining-based Study	Rosalia Tufano et.al.	2402.16480	null
2024-02-26	mEdIT: Multilingual Text Editing via Instruction Tuning	Vipul Raheja et.al.	2402.16472	link
2024-02-26	Unveiling Vulnerability of Self-Attention	Khai Jiet Liong et.al.	2402.16470	link
2024-02-26	Defending LLMs against Jailbreaking Attacks via Backtranslation	Yihan Wang et.al.	2402.16459	link
2024-02-26	ProLLaMA: A Protein Large Language Model for Multi-Task Protein Language Processing	Liuzhenghao Lv et.al.	2402.16445	link
2024-02-26	ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors	Zhexin Zhang et.al.	2402.16444	link
2024-02-26	Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models	Tianyi Tang et.al.	2402.16438	link
2024-02-26	RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions	Yuansen Zhang et.al.	2402.16431	null
2024-02-26	Predicting Sustainable Development Goals Using Course Descriptions – from LLMs to Conventional Foundation Models	Lev Kharlashkin et.al.	2402.16420	null
2024-02-26	From RAGs to riches: Using large language models to write documents for clinical trials	Nigel Markey et.al.	2402.16406	null
2024-02-26	MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property	Shiwen Ni et.al.	2402.16389	link
2024-02-26	Immunization against harmful fine-tuning attacks	Domenic Rosati et.al.	2402.16382	null
2024-02-26	Improving LLM-based Machine Translation with Systematic Self-Correction	Zhaopeng Feng et.al.	2402.16379	link
2024-02-26	Unraveling Babel: Exploring Multilingual Activation Patterns within Large Language Models	Weize Liu et.al.	2402.16367	null
2024-02-26	LLM Inference Unveiled: Survey and Roofline Model Insights	Zhihang Yuan et.al.	2402.16363	link
2024-02-26	Layer-wise Regularized Dropout for Neural Language Models	Shiwen Ni et.al.	2402.16361	null
2024-02-26	An Integrated Data Processing Framework for Pretraining Foundation Models	Yiding Sun et.al.	2402.16358	link
2024-02-26	Language-guided Skill Learning with Temporal Variational Inference	Haotian Fu et.al.	2402.16354	null
2024-02-23	AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning	Jianguo Zhang et.al.	2402.15506	link
2024-02-23	API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs	Kinjal Basu et.al.	2402.15491	link
2024-02-23	Prejudice and Caprice: A Statistical Framework for Measuring Social Discrimination in Large Language Models	Yiran Liu et.al.	2402.15481	null
2024-02-23	Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization	Swaroop Nath et.al.	2402.15473	link
2024-02-23	Repetition Improves Language Model Embeddings	Jacob Mitchell Springer et.al.	2402.15449	link
2024-02-23	A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models	Stefan Hegselmann et.al.	2402.15422	link
2024-02-23	PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning	Simon Holk et.al.	2402.15420	null
2024-02-23	Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy?	Nader Asadi et.al.	2402.15414	null
2024-02-23	Grasp, See and Place: Efficient Unknown Object Rearrangement with Policy Structure Prior	Kechun Xu et.al.	2402.15402	link
2024-02-23	Explorations of Self-Repair in Language Models	Cody Rushing et.al.	2402.15390	link
2024-02-23	Safe Task Planning for Language-Instructed Multi-Robot Systems using Conformal Prediction	Jun Wang et.al.	2402.15368	null
2024-02-23	Farsight: Fostering Responsible AI Awareness During AI Application Prototyping	Zijie J. Wang et.al.	2402.15350	link
2024-02-23	NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data	Sergei Bogdanov et.al.	2402.15343	link
2024-02-23	Ranking Entities along Conceptual Space Dimensions with LLMs: An Analysis of Fine-Tuning Strategies	Nitesh Kumar et.al.	2402.15337	link
2024-02-23	GPTVQ: The Blessing of Dimensionality for LLM Quantization	Mart van Baalen et.al.	2402.15319	null
2024-02-23	ArabianGPT: Native Arabic GPT-based Large Language	Anis Koubaa et.al.	2402.15313	null
2024-02-23	Counterfactual Generation with Identifiability Guarantees	Hanqi Yan et.al.	2402.15309	link
2024-02-23	Representing Online Handwriting for Recognition in Large Vision-Language Models	Anastasiia Fadeeva et.al.	2402.15307	null
2024-02-23	How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries	Somnath Banerjee et.al.	2402.15302	link
2024-02-23	Causal Graph Discovery with Retrieval-Augmented Generation based Large Language Models	Yuzhe Zhang et.al.	2402.15301	null
2024-02-22	PALO: A Polyglot Large Multimodal Model for 5B People	Muhammad Maaz et.al.	2402.14818	link
2024-02-22	Demographic Bias of Expert-Level Vision-Language Foundation Models in Medical Imaging	Yuzhe Yang et.al.	2402.14815	link
2024-02-22	WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition	Lianghui Zhu et.al.	2402.14812	link
2024-02-22	Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking	Nikhil Prakash et.al.	2402.14811	null
2024-02-22	CriticBench: Benchmarking LLMs for Critique-Correct Reasoning	Zicheng Lin et.al.	2402.14809	link
2024-02-22	RelayAttention for Efficient Large Language Model Serving with Long System Prompts	Lei Zhu et.al.	2402.14808	link
2024-02-22	A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health	Nikhil Behari et.al.	2402.14807	null
2024-02-22	Identifying Multiple Personalities in Large Language Models with External Evaluation	Xiaoyang Song et.al.	2402.14805	null
2024-02-22	Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models	Xudong Lu et.al.	2402.14800	link
2024-02-22	Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic	Nathaniel Weir et.al.	2402.14798	null
2024-02-22	Zero-shot cross-lingual transfer in instruction tuning of large language model	Nadezhda Chirkova et.al.	2402.14778	null
2024-02-22	2D Matryoshka Sentence Embeddings	Xianming Li et.al.	2402.14776	link
2024-02-22	DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models	Yuhang Cao et.al.	2402.14767	link
2024-02-22	MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues	Ge Bai et.al.	2402.14762	link
2024-02-22	Generalizing Reward Modeling for Out-of-Distribution Preference Learning	Chen Jia et.al.	2402.14760	link
2024-02-22	Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation	Jiawei Wang et.al.	2402.14744	link
2024-02-22	Dependency Annotation of Ottoman Turkish with Multilingual BERT	Şaziye Betül Özateş et.al.	2402.14743	null
2024-02-22	Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs	Arash Ahmadian et.al.	2402.14740	null
2024-02-22	Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models	Seungduk Kim et.al.	2402.14714	link
2024-02-22	IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus	Honghao Gui et.al.	2402.14710	link
2024-02-21	Coercing LLMs to do and reveal (almost) anything	Jonas Geiping et.al.	2402.14020	link
2024-02-21	Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment	Vyas Raina et.al.	2402.14016	link
2024-02-21	OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems	Chaoqun He et.al.	2402.14008	link
2024-02-21	Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models	Zhiwei He et.al.	2402.14007	link
2024-02-21	Hallucinations or Attention Misdirection? The Path to Strategic Value Extraction in Business Using Large Language Models	Aline Ioste et.al.	2402.14002	null
2024-02-21	Analysing The Impact of Sequence Composition on Language Model Pre-Training	Yu Zhao et.al.	2402.13991	link
2024-02-21	Towards Building Multilingual Language Model for Medicine	Pengcheng Qiu et.al.	2402.13963	link
2024-02-21	Measuring Social Biases in Masked Language Models by Proxy of Prediction Quality	Rahul Zalkikar et.al.	2402.13954	link
2024-02-21	Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning	Debjit Paul et.al.	2402.13950	null
2024-02-21	Do Efficient Transformers Really Save Computation?	Kai Yang et.al.	2402.13934	null
2024-02-21	Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content	Federico Bianchi et.al.	2402.13926	null
2024-02-21	SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization	Prakamya Mishra et.al.	2402.13919	link
2024-02-21	What Linguistic Features and Languages are Important in LLM Translation?	Ryandito Diandaru et.al.	2402.13917	null
2024-02-21	Calibrating Large Language Models with Sample Consistency	Qing Lyu et.al.	2402.13904	null
2024-02-21	Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models	Chenyang Lyu et.al.	2402.13887	null
2024-02-21	$\texttt{Se}^2$: $\textit{Se}$quential Example $\textit{Se}$lection for In-Context Learning	Haoyu Liu et.al.	2402.13874	link
2024-02-21	An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach	Mohammad Amaz Uddin et.al.	2402.13871	null
2024-02-21	Kuaiji: the First Chinese Accounting Large Language Model	Jiayuan Luo et.al.	2402.13866	null
2024-02-21	RealDex: Towards Human-like Grasping for Robotic Dexterous Hand	Yumeng Liu et.al.	2402.13853	null
2024-02-21	VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models	Jiawei Liang et.al.	2402.13851	null
2024-02-20	Towards audio language modeling – an overview	Haibin Wu et.al.	2402.13236	null
2024-02-20	Unlocking Insights: Semantic Search in Jupyter Notebooks	Lan Li et.al.	2402.13234	null
2024-02-20	A Touch, Vision, and Language Dataset for Multimodal Alignment	Letian Fu et.al.	2402.13232	link
2024-02-20	Investigating Cultural Alignment of Large Language Models	Badr AlKhamissi et.al.	2402.13231	link
2024-02-20	Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive	Arka Pal et.al.	2402.13228	link
2024-02-20	AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning	Qiao Jin et.al.	2402.13225	null
2024-02-20	RoCode: A Dataset for Measuring Code Intelligence from Problem Definitions in Romanian	Adrian Cosma et.al.	2402.13222	link
2024-02-20	How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts	Yusu Qian et.al.	2402.13220	link
2024-02-20	Softmax Probabilities (Mostly) Predict Large Language Model Correctness on Multiple-Choice Q&A	Benjamin Plaut et.al.	2402.13213	link
2024-02-20	Soft Self-Consistency Improves Language Model Agents	Han Wang et.al.	2402.13212	link
2024-02-20	Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation	Dongjin Kang et.al.	2402.13211	null
2024-02-20	Bayesian Reward Models for LLM Alignment	Adam X. Yang et.al.	2402.13210	null
2024-02-20	How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena	Marco Gaido et.al.	2402.13208	link
2024-02-20	Question Calibration and Multi-Hop Modeling for Temporal Question Answering	Chao Xue et.al.	2402.13188	null
2024-02-20	What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents	Mingyu Jin et.al.	2402.13184	link
2024-02-20	DINOBot: Robot Manipulation via Retrieval and Alignment with Vision Foundation Models	Norman Di Palo et.al.	2402.13181	null
2024-02-20	Benchmarking Retrieval-Augmented Generation for Medicine	Guangzhi Xiong et.al.	2402.13178	link
2024-02-20	Defending Jailbreak Prompts via In-Context Adversarial Game	Yujun Zhou et.al.	2402.13148	null
2024-02-20	OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog	Adnen Abdessaied et.al.	2402.13146	null
2024-02-20	The Hidden Space of Transformer Language Adapters	Jesujoba O. Alabi et.al.	2402.13137	link
2024-02-19	Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding	Zhuoming Chen et.al.	2402.12374	link
2024-02-19	AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies	Xiao Ye et.al.	2402.12370	link
2024-02-19	A Critical Evaluation of AI Feedback for Aligning Large Language Models	Archit Sharma et.al.	2402.12366	link
2024-02-19	Emergent Word Order Universals from Cognitively-Motivated Language Models	Tatsuki Kuribayashi et.al.	2402.12363	link
2024-02-19	Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge	Julien Delile et.al.	2402.12352	null
2024-02-19	GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations	Jinhao Duan et.al.	2402.12348	link
2024-02-19	Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!	Zhanhui Zhou et.al.	2402.12343	link
2024-02-19	Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models	Christian Schlarmann et.al.	2402.12336	link
2024-02-19	Query-Based Adversarial Prompt Generation	Jonathan Hayase et.al.	2402.12329	null
2024-02-19	Shall We Talk: Exploring Spontaneous Collaborations of Competing LLM Agents	Zengqing Wu et.al.	2402.12327	link
2024-02-19	ARKS: Active Retrieval in Knowledge Soup for Code Generation	Hongjin Su et.al.	2402.12317	link
2024-02-19	Is Open-Source There Yet? A Comparative Study on Commercial and Open-Source LLMs in Their Ability to Label Chest X-Ray Reports	Felix J. Dorfner et.al.	2402.12298	null
2024-02-19	KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students	Matthew Shu et.al.	2402.12291	null
2024-02-19	DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models	Xiaoyu Tian et.al.	2402.12289	null
2024-02-19	Adaptive Skeleton Graph Decoding	Shuowei Jin et.al.	2402.12280	null
2024-02-19	Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks	Nadezhda Chirkova et.al.	2402.12279	null
2024-02-19	Explain then Rank: Scale Calibration of Neural Rankers Using Natural Language Explanations from Large Language Models	Puxuan Yu et.al.	2402.12276	link
2024-02-19	High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models	Michela Lorandi et.al.	2402.12267	link
2024-02-19	Uncertainty quantification in fine-tuned LLMs using LoRA ensembles	Oleksandr Balabanov et.al.	2402.12264	link
2024-02-19	NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms	Jonathan Zheng et.al.	2402.12261	link
2024-02-16	PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter	Junfei Xiao et.al.	2402.10896	null
2024-02-16	RLVF: Learning from Verbal Feedback without Overgeneralization	Moritz Stephan et.al.	2402.10893	link
2024-02-16	Instruction Diversity Drives Generalization To Unseen Tasks	Dylan Zhang et.al.	2402.10891	null
2024-02-16	When is Tree Search Useful for LLM Planning? It Depends on the Discriminator	Ziru Chen et.al.	2402.10890	link
2024-02-16	Multi-modal preference alignment remedies regression of visual instruction tuning on language model	Shengzhi Li et.al.	2402.10884	link
2024-02-16	EcoRank: Budget-Constrained Text Re-ranking Using Large Language Models	Muhammad Shihab Rashid et.al.	2402.10866	link
2024-02-16	Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities	Mingyu Jin et.al.	2402.10835	link
2024-02-16	RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model	Jianhao Yuan et.al.	2402.10828	null
2024-02-16	Quantifying the Persona Effect in LLM Simulations	Tiancheng Hu et.al.	2402.10811	link
2024-02-16	Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond	Yongqi Li et.al.	2402.10805	null
2024-02-16	EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge	Xuan Shen et.al.	2402.10787	link
2024-02-16	A Condensed Transition Graph Framework for Zero-shot Link Prediction with Large Language Models	Mingchen Li et.al.	2402.10779	null
2024-02-16	AutoGPT+P: Affordance-based Task Planning with Large Language Models	Timo Birr et.al.	2402.10778	null
2024-02-16	How Reliable Are Automatic Evaluation Methods for Instruction-Tuned LLMs?	Ehsan Doostmohammadi et.al.	2402.10770	null
2024-02-16	Distillation Enhanced Generative Retrieval	Yongqi Li et.al.	2402.10769	null
2024-02-16	Inference to the Best Explanation in Large Language Models	Dhairya Dalal et.al.	2402.10767	null
2024-02-16	When Dataflow Analysis Meets Large Language Models	Chengpeng Wang et.al.	2402.10754	link
2024-02-16	ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages	Junjie Ye et.al.	2402.10753	link
2024-02-16	GenRES: Rethinking Evaluation for Generative Relation Extraction in the Era of Large Language Models	Pengcheng Jiang et.al.	2402.10744	link
2024-02-16	Let’s Learn Step by Step: Enhancing In-Context Learning Ability with Curriculum Learning	Yinpeng Liu et.al.	2402.10738	link
2024-02-15	Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation	Huizhuo Yuan et.al.	2402.10210	null
2024-02-15	Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment	Rui Yang et.al.	2402.10207	link
2024-02-15	Chain-of-Thought Reasoning Without Prompting	Xuezhi Wang et.al.	2402.10200	null
2024-02-15	A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents	Lingbo Mo et.al.	2402.10196	link
2024-02-15	BitDelta: Your Fine-Tune May Only Be Worth One Bit	James Liu et.al.	2402.10193	link
2024-02-15	Uncertainty Decomposition and Quantification for In-Context Learning of Large Language Models	Chen Ling et.al.	2402.10189	link
2024-02-15	Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective	Tianyi Qiu et.al.	2402.10184	null
2024-02-15	TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation	Yaoxiang Wang et.al.	2402.10178	link
2024-02-15	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	Shubham Toshniwal et.al.	2402.10176	link
2024-02-15	Unlocking Structure Measuring: Introducing PDD, an Automatic Metric for Positional Discourse Coherence	Yinhong Liu et.al.	2402.10175	link
2024-02-15	OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models	Ali AhmadiTeshnizi et.al.	2402.10172	link
2024-02-15	Data Engineering for Scaling Language Models to 128K Context	Yao Fu et.al.	2402.10171	link
2024-02-15	Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients	Mahyar Abbasian et.al.	2402.10153	null
2024-02-15	ControlLM: Crafting Diverse Personalities for Language Models	Yixuan Weng et.al.	2402.10151	link
2024-02-15	TOAD: Task-Oriented Automatic Dialogs with Diverse Response Styles	Yinhong Liu et.al.	2402.10137	null
2024-02-15	Zero-Shot Reasoning: Personalized Content Generation Without the Cold Start Problem	Davor Hafnar et.al.	2402.10133	link
2024-02-15	Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning	Ming Li et.al.	2402.10110	link
2024-02-15	Quantized Embedding Vectors for Controllable Diffusion Language Models	Cheng Kang et.al.	2402.10107	null
2024-02-15	GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving	Jiaxin Zhang et.al.	2402.10104	link
2024-02-15	Any-Shift Prompting for Generalization over Distributions	Zehao Xiao et.al.	2402.10099	null
2024-02-14	AQA-Bench: An Interactive Benchmark for Evaluating LLMs’ Sequential Reasoning Ability	Siwei Yang et.al.	2402.09404	link
2024-02-14	Reinforcement Learning from Human Feedback with Active Queries	Kaixuan Ji et.al.	2402.09401	null
2024-02-14	Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference	Harry Dong et.al.	2402.09398	link
2024-02-14	LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset	Botao Yu et.al.	2402.09391	link
2024-02-14	HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation	Yihao Fang et.al.	2402.09390	link
2024-02-14	Transformers Can Achieve Length Generalization But Not Robustly	Yongchao Zhou et.al.	2402.09371	null
2024-02-14	Pseudorandom Error-Correcting Codes	Miranda Christ et.al.	2402.09370	null
2024-02-14	Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking	Yi Fung et.al.	2402.09369	link
2024-02-14	Copyright Traps for Large Language Models	Matthieu Meeus et.al.	2402.09363	link
2024-02-14	HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference	Yashas Samaga B L et.al.	2402.09360	null
2024-02-14	Developing a Framework for Auditing Large Language Models Using Human-in-the-Loop	Maryam Amirizaniani et.al.	2402.09346	null
2024-02-14	Mitigating Reward Hacking via Information-Theoretic Reward Modeling	Yuchun Miao et.al.	2402.09345	link
2024-02-14	AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach	Maryam Amirizaniani et.al.	2402.09334	null
2024-02-14	ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization	Feifan Song et.al.	2402.09320	link
2024-02-14	Embracing the black box: Heading towards foundation models for causal discovery from time series data	Gideon Stein et.al.	2402.09305	link
2024-02-14	Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code	Vahid Majdinasab et.al.	2402.09299	link
2024-02-14	Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey	Zhichen Dong et.al.	2402.09283	link
2024-02-14	Leveraging Large Language Models for Enhanced NLP Task Performance through Knowledge Distillation and Optimized Training Strategies	Yining Huang et.al.	2402.09282	null
2024-02-14	Personalized Large Language Models	Stanisław Woźniak et.al.	2402.09269	null
2024-02-14	Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation	Xiaoying Zhang et.al.	2402.09267	null
2024-02-13	Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance	Linxi Zhao et.al.	2402.08680	null
2024-02-13	COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability	Xingang Guo et.al.	2402.08679	link
2024-02-13	Human Curriculum Effects Emerge with In-Context Learning in Neural Networks	Jacob Russin et.al.	2402.08674	link
2024-02-13	Rec-GPT4V: Multimodal Recommendation with Large Vision-Language Models	Yuqing Liu et.al.	2402.08670	null
2024-02-13	Improving Generalization in Semantic Parsing by Increasing Natural Language Variation	Irina Saparina et.al.	2402.08666	link
2024-02-13	The Last JITAI? The Unreasonable Effectiveness of Large Language Models in Issuing Just-in-Time Adaptive Interventions: Fostering Physical Activity in a Prospective Cardiac Rehabilitation Setting	David Haag et.al.	2402.08658	null
2024-02-13	PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs	Michael Dorkenwald et.al.	2402.08657	null
2024-02-13	Tandem Transformers for Inference Efficient LLMs	Aishwarya P S et.al.	2402.08644	null
2024-02-13	SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 14 Languages	Nedjma Ousidhoum et.al.	2402.08638	null
2024-02-13	Knowledge Editing on Black-box Large Language Models	Xiaoshuai Song et.al.	2402.08631	link
2024-02-13	Bayesian Multi-Task Transfer Learning for Soft Prompt Tuning	Haeju Lee et.al.	2402.08594	link
2024-02-13	Test-Time Backdoor Attacks on Multimodal Large Language Models	Dong Lu et.al.	2402.08577	link
2024-02-13	Online Foundation Model Selection in Robotics	Po-han Li et.al.	2402.08570	null
2024-02-13	Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast	Xiangming Gu et.al.	2402.08567	link
2024-02-13	Artificial Intelligence for Literature Reviews: Opportunities and Challenges	Francisco Bolanos et.al.	2402.08565	null
2024-02-13	Higher Layers Need More LoRA Experts	Chongyang Gao et.al.	2402.08562	link
2024-02-13	Grounding LLMs For Robot Task Planning Using Closed-loop State Feedback	Vineet Bhat et.al.	2402.08546	null
2024-02-13	The Application of ChatGPT in Responding to Questions Related to the Boston Bowel Preparation Scale	Xiaoqiang Liu et.al.	2402.08492	null
2024-02-13	Intriguing Differences Between Zero-Shot and Systematic Evaluations of Vision-Language Transformer Models	Shaeke Salman et.al.	2402.08473	null
2024-02-13	Large Language Models for the Automated Analysis of Optimization Algorithms	Camilo Chacón Sartori et.al.	2402.08472	link
2024-02-12	A systematic investigation of learnability from single child linguistic input	Yulu Qin et.al.	2402.07899	link
2024-02-12	Suppressing Pink Elephants with Direct Principle Feedback	Louis Castricato et.al.	2402.07896	null
2024-02-12	WildfireGPT: Tailored Large Language Model for Wildfire Analysis	Yangxinyu Xie et.al.	2402.07877	link
2024-02-12	Policy Improvement using Language Feedback Models	Victor Zhong et.al.	2402.07876	link
2024-02-12	PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs	Soroush Nasiriany et.al.	2402.07872	null
2024-02-12	Scaling Laws for Fine-Grained Mixture of Experts	Jakub Krajewski et.al.	2402.07871	link
2024-02-12	PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models	Wei Zou et.al.	2402.07867	link
2024-02-12	Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models	Siddharth Karamcheti et.al.	2402.07865	link
2024-02-12	AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy	Philipp Schoenegger et.al.	2402.07862	null
2024-02-12	Lissard: Long and Simple Sequential Reasoning Datasets	Mirelle Bueno et.al.	2402.07859	link
2024-02-12	Mercury: An Efficiency Benchmark for LLM Code Synthesis	Mingzhe Du et.al.	2402.07844	link
2024-02-12	Do Membership Inference Attacks Work on Large Language Models?	Michael Duan et.al.	2402.07841	link
2024-02-12	Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model	Ahmet Üstün et.al.	2402.07827	null
2024-02-12	Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning	Z Liu et.al.	2402.07818	null
2024-02-12	Injecting Wiktionary to improve token-level contextual representations using contrastive learning	Anna Mosolova et.al.	2402.07817	null
2024-02-12	Retrieval-Augmented Thought Process as Sequential Decision Making	Thomas Pouplin et.al.	2402.07812	null
2024-02-12	Empowering Federated Learning for Massive Models with NVIDIA FLARE	Holger R. Roth et.al.	2402.07792	null
2024-02-12	TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection	Hui Liu et.al.	2402.07776	link
2024-02-12	Quantitative knowledge retrieval from large language models	David Selby et.al.	2402.07770	link
2024-02-12	Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model	Mikail Khona et.al.	2402.07757	null
2024-02-09	Feedback Loops With Language Models Drive In-Context Reward Hacking	Alexander Pan et.al.	2402.06627	link
2024-02-09	Understanding the Effects of Iterative Prompting on Truthfulness	Satyapriya Krishna et.al.	2402.06625	null
2024-02-09	Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning	Shivalika Singh et.al.	2402.06619	null
2024-02-09	FaBERT: Pre-training BERT on Persian Blogs	Mostafa Masumi et.al.	2402.06617	null
2024-02-09	On the Out-Of-Distribution Generalization of Multimodal Large Language Models	Xingxuan Zhang et.al.	2402.06599	null
2024-02-09	CigaR: Cost-efficient Program Repair with LLMs	Dávid Hidvégi et.al.	2402.06598	link
2024-02-09	Understanding the Weakness of Large Language Model Agents within a Complex Android Environment	Mingzhe Xing et.al.	2402.06596	link
2024-02-09	Self-consistent context aware conformer transducer for speech recognition	Konstantin Kolokolov et.al.	2402.06592	null
2024-02-09	G-SciEdBERT: A Contextualized LLM for Science Assessment Tasks in German	Ehsan Latif et.al.	2402.06584	link
2024-02-09	Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning	Amir Ziai et.al.	2402.06560	link
2024-02-09	The Quantified Boolean Bayesian Network: Theory and Experiments with a Logical Graphical Model	Gregory Coppola et.al.	2402.06557	link
2024-02-09	Bryndza at ClimateActivism 2024: Stance, Target and Hate Event Detection via Retrieval-Augmented GPT-4 and LLaMA	Marek Šuppa et.al.	2402.06549	link
2024-02-09	Calibrating Long-form Generations from Large Language Models	Yukun Huang et.al.	2402.06544	link
2024-02-09	Introspective Planning: Guiding Language-Enabled Agents to Refine Their Own Uncertainty	Kaiqu Liang et.al.	2402.06529	link
2024-02-09	Multimodal Clinical Trial Outcome Prediction with Large Language Models	Wenhao Zheng et.al.	2402.06512	link
2024-02-09	Iris-SAM: Iris Segmentation Using a Foundational Model	Parisa Farmanifard et.al.	2402.06497	link
2024-02-09	Large Language Models for Captioning and Retrieving Remote Sensing Images	João Daniel Silva et.al.	2402.06475	null
2024-02-09	V-STaR: Training Verifiers for Self-Taught Reasoners	Arian Hosseini et.al.	2402.06457	null
2024-02-09	StruQ: Defending Against Prompt Injection with Structured Queries	Sizhe Chen et.al.	2402.06363	link
2024-02-09	CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models	Peiyuan Gong et.al.	2402.06360	link
2024-02-08	SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models	Peng Gao et.al.	2402.05935	link
2024-02-08	Driving Everywhere with Large Language Model Policy Adaptation	Boyi Li et.al.	2402.05932	null
2024-02-08	WebLINX: Real-World Website Navigation with Multi-Turn Dialogue	Xing Han Lù et.al.	2402.05930	link
2024-02-08	An Interactive Agent Foundation Model	Zane Durante et.al.	2402.05929	null
2024-02-08	On the Convergence of Zeroth-Order Federated Tuning in Large Language Models	Zhenqing Ling et.al.	2402.05926	link
2024-02-08	Efficient Stagewise Pretraining via Progressive Subnetworks	Abhishek Panigrahi et.al.	2402.05913	null
2024-02-08	FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs	Eun Cheol Choi et.al.	2402.05904	link
2024-02-08	Large Language Model Meets Graph Neural Network in Knowledge Distillation	Shengxiang Hu et.al.	2402.05894	null
2024-02-08	Generative Echo Chamber? Effects of LLM-Powered Search Systems on Diverse Information Seeking	Nikhil Sharma et.al.	2402.05880	null
2024-02-08	PromptCrypt: Prompt Encryption for Secure Communication with Large Language Models	Guo Lin et.al.	2402.05868	link
2024-02-08	How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis	Federico Bianchi et.al.	2402.05863	link
2024-02-08	Let Your Graph Do the Talking: Encoding Structured Data for LLMs	Bryan Perozzi et.al.	2402.05862	link
2024-02-08	Learning to Route Among Specialized Experts for Zero-Shot Generalization	Mohammed Muqeeth et.al.	2402.05859	link
2024-02-08	Limitations of Agents Simulated by Predictive Models	Raymond Douglas et.al.	2402.05829	null
2024-02-08	Is it Possible to Edit Large Language Models Robustly?	Xinbei Ma et.al.	2402.05827	link
2024-02-08	Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models	Lingzhi Wang et.al.	2402.05813	null
2024-02-08	Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning	Zhiheng Xi et.al.	2402.05808	link
2024-02-08	How do Transformers perform In-Context Autoregressive Learning?	Michael E. Sander et.al.	2402.05787	null
2024-02-08	Limits of Transformer Language Models on Algorithmic Learning	Jonathan Thomm et.al.	2402.05785	link
2024-02-08	Text-to-Code Generation with Modality-relative Pre-training	Fenia Christopoulou et.al.	2402.05783	null
2024-02-07	Opening the AI black box: program synthesis via mechanistic interpretability	Eric J. Michaud et.al.	2402.05110	link
2024-02-07	You Can REST Now: Automated Specification Inference and Black-Box Testing of RESTful APIs with Large Language Models	Alix Decrop et.al.	2402.05102	null
2024-02-07	Hydragen: High-Throughput LLM Inference with Shared Prefixes	Jordan Juravsky et.al.	2402.05099	link
2024-02-07	Language-Based Augmentation to Address Shortcut Learning in Object Goal Navigation	Dennis Hoftijzer et.al.	2402.05090	link
2024-02-07	A Roadmap to Pluralistic Alignment	Taylor Sorensen et.al.	2402.05070	link
2024-02-07	SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models	Lijun Li et.al.	2402.05044	link
2024-02-07	How BERT Speaks Shakespearean English? Evaluating Historical Bias in Contextual Language Models	Miriam Cuscito et.al.	2402.05034	null
2024-02-07	A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?	Agustinus Kristiadi et.al.	2402.05015	link
2024-02-07	Pedagogical Alignment of Large Language Models	Shashank Sonkar et.al.	2402.05000	link
2024-02-07	An Enhanced Prompt-Based LLM Reasoning Scheme via Knowledge Graph-Integrated Collaboration	Yihao Li et.al.	2402.04978	null
2024-02-07	ChatScratch: An AI-Augmented System Toward Autonomous Visual Programming Learning for Children Aged 6-12	Liuqing Chen et.al.	2402.04975	null
2024-02-07	Reconfidencing LLMs from the Grouping Loss Perspective	Lihu Chen et.al.	2402.04957	null
2024-02-07	Chatbots in Knowledge-Intensive Contexts: Comparing Intent and LLM-Based Systems	Samuel Kernan Freire et.al.	2402.04955	null
2024-02-07	Prompting Implicit Discourse Relation Annotation	Frances Yung et.al.	2402.04918	null
2024-02-07	Personalized Text Generation with Fine-Grained Linguistic Control	Bashar Alhafni et.al.	2402.04914	link
2024-02-07	L4Q: Parameter Efficient Quantization-Aware Training on Large Language Models via LoRA-wise LSQ	Hyesung Jeon et.al.	2402.04902	null
2024-02-07	Detecting Generated Native Ads in Conversational Search	Sebastian Schmidt et.al.	2402.04889	link
2024-02-07	Multimodal Query Suggestion with Multi-Agent Reinforcement Learning from Human Feedback	Zheng Wang et.al.	2402.04867	null
2024-02-07	Automated Smart Contract Summarization via LLMs	Yingjie Mao et.al.	2402.04863	null
2024-02-07	CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay	Natasha Butt et.al.	2402.04858	link
2024-02-06	AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls	Yu Du et.al.	2402.04253	link
2024-02-06	HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal	Mantas Mazeika et.al.	2402.04249	link
2024-02-06	Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks	Jongho Park et.al.	2402.04248	link
2024-02-06	Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science	Xiangru Tang et.al.	2402.04247	null
2024-02-06	CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations	Ji Qi et.al.	2402.04236	link
2024-02-06	Can Generative Agents Predict Emotion?	Ciaran Regan et.al.	2402.04232	null
2024-02-06	“Task Success” is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors	Lin Guan et.al.	2402.04210	null
2024-02-06	Explaining Autonomy: Enhancing Human-Robot Interaction through Explanation Generation with Large Language Models	David Sobrín-Hidalgo et.al.	2402.04206	link
2024-02-06	SHIELD: An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models	Yichen Shi et.al.	2402.04178	link
2024-02-06	Scaling Laws for Downstream Task Performance of Large Language Models	Berivan Isik et.al.	2402.04177	null
2024-02-06	Harnessing the Plug-and-Play Controller by Prompting	Hao Wang et.al.	2402.04160	null
2024-02-06	Multi-line AI-assisted Code Authoring	Omer Dunay et.al.	2402.04141	null
2024-02-06	Advancing Legal Reasoning: The Integration of AI to Navigate Complexities and Biases in Global Jurisprudence with Semi-Automated Arbitration Processes (SAAPs)	Michael De’Shazer et.al.	2402.04140	null
2024-02-06	Scientific Language Modeling: A Quantitative Review of Large Language Models in Molecular Science	Pengfei Liu et.al.	2402.04119	link
2024-02-06	Measuring Implicit Bias in Explicitly Unbiased Large Language Models	Xuechunzi Bai et.al.	2402.04105	link
2024-02-06	The Use of a Large Language Model for Cyberbullying Detection	Bayode Ogunleye et.al.	2402.04088	null
2024-02-06	A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation	Zhengbo Wang et.al.	2402.04087	link
2024-02-06	Provably learning a multi-head attention layer	Sitan Chen et.al.	2402.04084	null
2024-02-06	Iterative Prompt Refinement for Radiation Oncology Symptom Extraction Using Teacher-Student Large Language Models	Reza Khanmohammadi et.al.	2402.04075	null
2024-02-06	Retrieve to Explain: Evidence-driven Predictions with Language Models	Ravi Patel et.al.	2402.04068	link

Video Understanding

Publish Date	Title	Authors	PDF	Code
2025-07-22	Toward Scalable Video Narration: A Training-free Approach Using Multimodal Large Language Models	Tz-Ying Wu et.al.	2507.17050	null
2025-07-22	Controllable Hybrid Captioner for Improved Long-form Video Understanding	Kuleen Sasse et.al.	2507.17047	null
2025-07-22	SPACT18: Spiking Human Action Recognition Benchmark Dataset with Complementary RGB and Thermal Modalities	Yasser Ashraf et.al.	2507.16151	null
2025-07-21	DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding	Xiaoyi Bao et.al.	2507.15569	null
2025-07-20	Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction	Ce Zhang et.al.	2507.15130	null
2025-07-20	Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding	Yuanhan Zhang et.al.	2507.15028	null
2025-07-20	LeAdQA: LLM-Driven Context-Aware Temporal Grounding for Video Question Answering	Xinxin Dong et.al.	2507.14784	null
2025-07-19	InterAct-Video: Reasoning-Rich Video QA for Urban Traffic	Joseph Raj Vishal et.al.	2507.14743	null
2025-07-18	Generalist Forecasting with Frozen Video Models via Latent Diffusion	Jacob C Walker et.al.	2507.13942	null
2025-07-18	Team of One: Cracking Complex Video QA with Model Synergy	Jun Xie et.al.	2507.13820	null
2025-07-18	CoTasks: Chain-of-Thought based Video Instruction Tuning Tasks	Yanan Wang et.al.	2507.13609	null
2025-07-17	VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding	Shihao Wang et.al.	2507.13353	null
2025-07-17	FIQ: Fundamental Question Generation with the Integration of Question Embeddings for Video Question Answering	Ju-Young Oh et.al.	2507.12816	null
2025-07-18	DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition	Hayat Ullah et.al.	2507.12426	null
2025-07-16	Calisthenics Skills Temporal Video Segmentation	Antonio Finocchiaro et.al.	2507.12245	null
2025-07-15	UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks	Peiran Wu et.al.	2507.11336	null
2025-07-14	EmbRACE-3K: Embodied Reasoning and Action in Complex Environments	Mingxian Lin et.al.	2507.10548	null
2025-07-14	Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI	Jiangkai Wu et.al.	2507.10510	null
2025-07-14	DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs	Jiahe Zhao et.al.	2507.10302	null
2025-07-14	ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models	Yongheng Zhang et.al.	2507.09876	null
2025-07-22	VRU-Accident: A Vision-Language Benchmark for Video Question Answering and Dense Captioning for Accident Scene Understanding	Younggun Kim et.al.	2507.09815	null
2025-07-13	ExpStar: Towards Automatic Commentary Generation for Multi-discipline Scientific Experiments	Jiali Chen et.al.	2507.09693	null
2025-07-13	GLIMPSE: Do Large Vision-Language Models Truly Think With Videos or Just Glimpse at Them?	Yiyang Zhou et.al.	2507.09491	null
2025-07-11	Infinite Video Understanding	Dell Zhang et.al.	2507.09068	null
2025-07-10	Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs	Jeongseok Hyun et.al.	2507.07990	null
2025-07-18	Hierarchical Multi-Stage Transformer Architecture for Context-Aware Temporal Action Localization	Hayat Ullah et.al.	2507.06411	null
2025-07-09	Omni-Video: Democratizing Unified Video Understanding and Generation	Zhiyu Tan et.al.	2507.06119	null
2025-07-08	MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding	Tongtong Cheng et.al.	2507.06072	null
2025-07-14	Beyond Appearance: Geometric Cues for Robust Video Instance Segmentation	Quanzhu Niu et.al.	2507.05948	null
2025-07-08	Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models	Léa Dubois et.al.	2507.05822	null
2025-07-07	Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models	Eunseop Yoon et.al.	2507.04976	null
2025-07-07	HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding	Yuxuan Cai et.al.	2507.04909	null
2025-07-07	From Vision To Language through Graph of Events in Space and Time: An Explainable Self-supervised Approach	Mihai Masala et.al.	2507.04815	null
2025-07-07	Tempo-R0: A Video-MLLM for Temporal Video Grounding through Efficient Temporal Sensing Reinforcement Learning	Feng Yue et.al.	2507.04702	null
2025-07-07	VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents	Rui Meng et.al.	2507.04590	null
2025-07-06	M$^3$-Med: A Benchmark for Multi-lingual, Multi-modal, and Multi-hop Reasoning in Medical Instructional Video Understanding	Shenxi Liu et.al.	2507.04289	null
2025-07-04	Multimodal Alignment with Cross-Attentive GRUs for Fine-Grained Video Understanding	Namho Kim et.al.	2507.03531	null
2025-07-07	AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding	Weili Xu et.al.	2507.02591	null
2025-07-02	Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges	Sanjeda Akter et.al.	2507.02074	null
2025-07-01	Temporal Chain of Thought: Long-Video Understanding by Thinking in Frames	Anurag Arnab et.al.	2507.02001	null
2025-07-02	Kwai Keye-VL Technical Report	Kwai Keye Team et.al.	2507.01949	null
2025-07-09	LongAnimation: Long Animation Generation with Dynamic Global-Local Memory	Nan Chen et.al.	2507.01945	null
2025-07-02	AVC-DPO: Aligned Video Captioning via Direct Preference Optimization	Jiyang Tang et.al.	2507.01492	null
2025-06-30	Embedding-based Retrieval in Multimodal Content Moderation	Hanzhong Liang et.al.	2507.01066	null
2025-07-02	GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning	GLM-V Team et.al.	2507.01006	null
2025-07-01	CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs	Jiaming Zhang et.al.	2507.00817	null
2025-07-01	Bisecle: Binding and Separation in Continual Learning for Video Language Understanding	Yue Tan et.al.	2507.00469	null
2025-06-28	MANTA: Cross-Modal Semantic Alignment and Information-Theoretic Optimization for Long-form Multimodal Understanding	Ziqi Zhong et.al.	2507.00068	null
2025-06-30	Flash-VStream: Efficient Real-Time Understanding for Long Video Streams	Haoji Zhang et.al.	2506.23825	null
2025-06-29	MoMa: Modulating Mamba for Adapting Image Foundation Models to Video Recognition	Yuhuan Yang et.al.	2506.23283	null
2025-06-28	ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment	Amir Aghdam et.al.	2506.22967	null
2025-06-28	Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder	Dang Jisheng et.al.	2506.22880	null
2025-06-27	Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs	Shaojie Zhang et.al.	2506.22139	null
2025-06-27	DIVE: Deep-search Iterative Video Exploration A Technical Report for the CVRR Challenge at CVPR 2025	Umihiro Kamoto et.al.	2506.21891	null
2025-06-27	LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs	Boyuan Sun et.al.	2506.21862	null
2025-06-26	Task-Aware KV Compression For Cost-Effective Long Video Understanding	Minghao Qin et.al.	2506.21184	null
2025-06-26	IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes	Yujia Liang et.al.	2506.21116	null
2025-06-25	Dense Video Captioning using Graph-based Sentence Summarization	Zhiwang Zhang et.al.	2506.20583	null
2025-06-25	Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization	Zhiwang Zhang et.al.	2506.20567	null
2025-06-24	PEVLM: Parallel Encoding for Vision-Language Models	Letian Kang et.al.	2506.19651	null
2025-06-24	Video-XL-2: Towards Very Long-Video Understanding Through Task-Aware KV Sparsification	Minghao Qin et.al.	2506.19225	null
2025-06-23	Universal Video Temporal Grounding with Generative Multi-modal Large Language Models	Zeqian Li et.al.	2506.18883	null
2025-06-27	MUPA: Towards Multi-Path Agentic Reasoning for Grounded Video Question Answering	Jisheng Dang et.al.	2506.18071	null
2025-06-22	SurgVidLM: Towards Multi-grained Surgical Video Understanding with Large Language Model	Guankun Wang et.al.	2506.17873	null
2025-06-21	CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning	Kailing Li et.al.	2506.17629	null
2025-06-19	How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering?	Giuseppe Lando et.al.	2506.16450	null
2025-06-19	GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning	Yi Chen et.al.	2506.16141	link
2025-06-19	PR-DETR: Injecting Position and Relation Prior for Dense Video Captioning	Yizhe Li et.al.	2506.16082	null
2025-06-18	Weakly-supervised VLM-guided Partial Contrastive Learning for Visual Language Navigation	Ruoyu Wang et.al.	2506.15757	null
2025-06-18	InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding	Minsoo Kim et.al.	2506.15745	null
2025-06-18	video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models	Changli Tang et.al.	2506.15220	null
2025-06-17	SIRI-Bench: Challenging VLMs’ Spatial Intelligence through Complex Reasoning Tasks	Zijian Song et.al.	2506.14512	null
2025-06-17	EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization	Xiaoqi Wang et.al.	2506.14356	link
2025-06-17	GHz spiking neuromorphic photonic chip with in-situ training	Jinlong Xiang et.al.	2506.14272	null
2025-06-18	AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding	Zhucun Xue et.al.	2506.13589	null
2025-06-16	MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models	Geewook Kim et.al.	2506.13564	null
2025-06-14	Understanding and Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding	Youze Wang et.al.	2506.12336	null
2025-06-13	Self-supervised Learning of Echocardiographic Video Representations via Online Cluster Distillation	Divyanshu Mishra et.al.	2506.11777	link
2025-06-11	Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search	Linhao Yu et.al.	2506.11155	null
2025-06-15	VideoDeepResearch: Long Video Understanding With Agentic Tool Using	Huaying Yuan et.al.	2506.10821	link
2025-06-12	CogStream: Context-guided Streaming Video Question Answering	Zicheng Zhao et.al.	2506.10516	null
2025-06-11	V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning	Mido Assran et.al.	2506.09985	link
2025-06-11	CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models	Aaron Foss et.al.	2506.09943	link
2025-06-11	HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios	Kunyu Peng et.al.	2506.09650	link
2025-06-11	TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision	Ayush Gupta et.al.	2506.09445	null
2025-06-11	Synthetic Human Action Video Data Generation with Pose Transfer	Vaclav Knapp et.al.	2506.09411	null
2025-06-10	Seedance 1.0: Exploring the Boundaries of Video Generation Models	Yu Gao et.al.	2506.09113	null
2025-06-10	VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks	Xinlong Chen et.al.	2506.09079	null
2025-06-10	MLVTG: Mamba-Based Feature Alignment and LLM-Driven Purification for Multi-Modal Video Temporal Grounding	Zhiyi Zhu et.al.	2506.08512	null
2025-06-09	CyberV: Cybernetics for Test-time Scaling in Video Understanding	Jiahao Meng et.al.	2506.07971	link
2025-06-09	Looking Beyond Visible Cues: Implicit Video Question Answering via Dual-Clue Reasoning	Tieyuan Chen et.al.	2506.07811	link
2025-06-16	SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis	Jianhui Wei et.al.	2506.07603	null
2025-06-09	SceneRAG: Scene-level Retrieval-Augmented Generation for Video Understanding	Nianbo Zeng et.al.	2506.07600	null
2025-06-09	Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding	Boyu Chen et.al.	2506.07576	null
2025-06-10	ARGUS: Hallucination and Omission Evaluation in Video-LLMs	Ruchit Rawal et.al.	2506.07371	null
2025-06-08	A Culturally-diverse Multilingual Multimodal Video Benchmark & Model	Bhuiyan Sanjid Shafique et.al.	2506.07032	null
2025-06-13	MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks	Sanjoy Chowdhury et.al.	2506.07016	null
2025-06-07	How Important are Videos for Training Video LLMs?	George Lydakis et.al.	2506.06928	null
2025-06-06	Bridging Audio and Vision: Zero-Shot Audiovisual Segmentation by Connecting Pretrained Models	Seung-jae Lee et.al.	2506.06537	null
2025-06-06	ExAct: A Video-Language Benchmark for Expert Action Analysis	Han Yi et.al.	2506.06277	null
2025-06-06	Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision	Yuping He et.al.	2506.06253	null
2025-06-06	VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning	Zikang Wang et.al.	2506.06097	null
2025-06-06	EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs	Ivan Rodin et.al.	2506.05787	null
2025-06-05	FRAME: Pre-Training Video Feature Representations via Anticipation and Memory	Sethuraman TV et.al.	2506.05543	null
2025-06-05	VideoMolmo: Spatio-Temporal Grounding Meets Pointing	Ghazi Shazan Ahmad et.al.	2506.05336	link
2025-06-05	AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs	Lidong Lu et.al.	2506.05328	null
2025-06-05	Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos	Weifeng Lin et.al.	2506.05302	null
2025-06-05	TextVidBench: A Benchmark for Long Video Scene Text Understanding	Yangyang Zhong et.al.	2506.04983	null
2025-06-05	APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval	Hong Gao et.al.	2506.04953	null
2025-06-05	DualX-VSR: Dual Axial Spatial $\times$ Temporal Transformer for Real-World Video Super-Resolution without Motion Compensation	Shuo Cao et.al.	2506.04830	null
2025-06-04	DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding	Hongzhi Zhang et.al.	2506.03990	null
2025-06-04	Video, How Do Your Tokens Merge?	Sam Pollard et.al.	2506.03885	null
2025-06-04	Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning	Daeun Lee et.al.	2506.03525	null
2025-06-03	Seeing the Arrow of Time in Large Multimodal Models	Zihui Xue et.al.	2506.03340	null
2025-06-03	EgoVLM: Policy Optimization for Egocentric Video Understanding	Ashwin Vinod et.al.	2506.03097	link
2025-06-03	HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation	Yicheng Xiao et.al.	2506.02975	null
2025-06-03	METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding	Mengyue Wang et.al.	2506.02850	link
2025-06-04	Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments	Di Wen et.al.	2506.02845	null
2025-06-04	InterRVOS: Interaction-aware Referring Video Object Segmentation	Woojeong Jin et.al.	2506.02356	null
2025-06-02	Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency	Hongyu Li et.al.	2506.01908	link
2025-06-02	VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking	Desen Meng et.al.	2506.01725	null
2025-06-02	ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding	Yiyang Zhou et.al.	2506.01300	null
2025-06-01	MOOSE: Pay Attention to Temporal Dynamics for Video Understanding via Optical Flows	Hong Nguyen et.al.	2506.01119	null
2025-06-01	Keystep Recognition using Graph Neural Networks	Julia Lee Romero et.al.	2506.01102	null
2025-06-01	FlexSelect: Flexible Token Selection for Efficient Long Video Understanding	Yunzhu Zhang et.al.	2506.00993	null
2025-06-01	SynPO: Synergizing Descriptiveness and Preference Optimization for Video Detailed Captioning	Jisheng Dang et.al.	2506.00835	null
2025-05-31	Scene Detection Policies and Keyframe Extraction Strategies for Large-Scale Video Analysis	Vasilii Korolkov et.al.	2506.00667	null
2025-05-31	Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning	Sara Ghazanfari et.al.	2506.00318	null
2025-05-30	PerFormer: A Permutation Based Vision Transformer for Remaining Useful Life Prediction	Zhengyang Fan et.al.	2506.00259	null
2025-05-30	SiLVR: A Simple Language-based Video Reasoning Framework	Ce Zhang et.al.	2505.24869	link
2025-05-30	Time Blindness: Why Video-Language Models Can’t See What Humans Can?	Ujjwal Upadhyay et.al.	2505.24867	null
2025-05-30	VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software	Brandon Man et.al.	2505.24838	link
2025-05-30	Learning reusable concepts across different egocentric video understanding tasks	Simone Alberto Peirone et.al.	2505.24690	null
2025-06-04	Grid-LOGAT: Grid Based Local and Global Area Transcription for Video Question Answering	Md Intisar Chowdhury et.al.	2505.24371	null
2025-05-30	VUDG: A Dataset for Video Understanding Domain Generalization	Ziyi Wang et.al.	2505.24346	null
2025-05-30	DisTime: Distribution-based Time Representation for Video Large Language Models	Yingsen Zeng et.al.	2505.24329	link
2025-05-30	Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders	Bo Fang et.al.	2505.24158	null
2025-05-29	Multi-RAG: A Multimodal Retrieval-Augmented Generation System for Adaptive Video Understanding	Mingyang Mao et.al.	2505.23990	null
2025-05-29	ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding	David Ma et.al.	2505.23922	link
2025-05-29	MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection	Yixian Shen et.al.	2505.23870	null
2025-05-29	VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos	Tingyu Song et.al.	2505.23693	link
2025-05-29	VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models	Xiangdong Zhang et.al.	2505.23656	link
2025-05-29	One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory	Chenhao Zheng et.al.	2505.23617	null
2025-05-29	VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation	Shi-Xue Zhang et.al.	2505.23484	link
2025-05-29	VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?	Yuanxin Liu et.al.	2505.23359	link
2025-05-29	PreFM: Online Audio-Visual Event Parsing via Predictive Future Modeling	Xiao Yu et.al.	2505.23155	link
2025-05-28	VidText: Towards Comprehensive Evaluation for Video Text Understanding	Zhoufaran Yang et.al.	2505.22810	link
2025-05-28	Universal Visuo-Tactile Video Understanding for Embodied Interaction	Yifan Xie et.al.	2505.22566	null
2025-05-28	Fostering Video Reasoning via Next-Event Prediction	Haonan Wang et.al.	2505.22457	null
2025-05-27	HuMoCon: Concept Discovery for Human Motion Understanding	Qihang Fang et.al.	2505.20920	null
2025-05-27	MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding	Fuwen Luo et.al.	2505.20715	link
2025-05-27	HCQA-1.5 @ Ego4D EgoSchema Challenge 2025	Haoyu Zhang et.al.	2505.20644	null
2025-05-26	HoPE: Hybrid of Position Embedding for Length Generalization in Vision-Language Models	Haoran Li et.al.	2505.20444	null
2025-05-26	TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos	Fanheng Kong et.al.	2505.20124	link
2025-05-26	AdaTP: Attention-Debiased Token Pruning for Video Large Language Models	Fengyuan Sun et.al.	2505.20100	null
2025-05-26	Two Causally Related Needles in a Video Haystack	Miaoyu Li et.al.	2505.19853	null
2025-05-25	Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs	Xuan Zhang et.al.	2505.19155	null
2025-05-28	Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding	Xiaoyi Zhang et.al.	2505.18079	null
2025-05-23	VIBE: Video-to-Text Information Bottleneck Evaluation for TL;DR	Shenghui Chen et.al.	2505.17423	link
2025-05-22	Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning	Fanrui Zhang et.al.	2505.16836	link
2025-05-22	Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles	Jun Xie et.al.	2505.16784	null
2025-05-22	SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding	Sushant Gautam et.al.	2505.16630	null
2025-05-22	Temporal Object Captioning for Street Scene Videos from LiDAR Tracks	Vignesh Gopinathan et.al.	2505.16594	null
2025-05-22	QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design	Benjamin Schneider et.al.	2505.16175	link
2025-05-21	ViQAgent: Zero-Shot Video Question Answering via Agent with Open-Vocabulary Grounding Validation	Tony Montes et.al.	2505.15928	link
2025-05-21	Clapper: Compact Learning and Video Representation in VLMs	Lingyu Kong et.al.	2505.15529	null
2025-05-21	ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning	Ziqiang Xu et.al.	2505.15447	null
2025-05-21	LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval	Zhenyu Ning et.al.	2505.15269	null
2025-05-21	Leveraging Foundation Models for Multimodal Graph-Based Action Recognition	Fatemeh Ziaeetabar et.al.	2505.15192	null
2025-05-20	VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation	Wentao Ma et.al.	2505.14640	null
2025-05-20	Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models	Xuyang Liu et.al.	2505.14454	link
2025-05-20	Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding?	Bo Feng et.al.	2505.14321	null
2025-05-20	LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts	Qifeng Cai et.al.	2505.13928	link
2025-05-20	Domain Adaptation of VLM for Soccer Video Understanding	Tiancheng Jiang et.al.	2505.13860	null
2025-05-20	A Challenge to Build Neuro-Symbolic Video Agents	Sahil Shah et.al.	2505.13851	link
2025-05-19	Understanding Complexity in VideoQA via Visual Program Generation	Cristobal Eyzaguirre et.al.	2505.13429	null
2025-05-19	FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks	Zihua Wang et.al.	2505.12728	link
2025-05-19	Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding	Thong Nguyen et.al.	2505.12605	null
2025-05-19	SurveillanceVQA-589K: A Benchmark for Comprehensive Surveillance Video-Language Understanding with Large Models	Bo Liu et.al.	2505.12589	null
2025-05-18	From Shots to Stories: LLM-Assisted Video Editing with Unified Language Representations	Yuzhi Li et.al.	2505.12237	null
2025-05-16	Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models	Keunwoo Peter Yu et.al.	2505.11326	link
2025-05-13	SkillFormer: Unified Multi-View Video Understanding for Proficiency Estimation	Edoardo Bianchi et.al.	2505.08665	null
2025-05-13	VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models	Pritam Sarkar et.al.	2505.08455	link
2025-05-12	Pixel Motion as Universal Representation for Robot Control	Kanchana Ranasinghe et.al.	2505.07817	null
2025-05-12	Gameplay Highlights Generation	Vignesh Edithal et.al.	2505.07721	null
2025-05-11	Seed1.5-VL Technical Report	Dong Guo et.al.	2505.07062	null
2025-05-11	Overview of the NLPCC 2025 Shared Task 4: Multi-modal, Multilingual, and Multi-hop Medical Instructional Video Question Answering Challenge	Bin Li et.al.	2505.06814	null
2025-05-08	StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant	Haibo Wang et.al.	2505.05467	null
2025-05-03	VideoLLM Benchmarks and Evaluation: A Survey	Yogesh Kumar et.al.	2505.03829	null
2025-05-06	RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph	Sameer Malik et.al.	2505.03173	null
2025-05-08	Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection	Sungheon Jeong et.al.	2505.02393	link
2025-05-03	An LLM-Empowered Low-Resolution Vision System for On-Device Human Behavior Understanding	Siyang Jiang et.al.	2505.01743	null
2025-05-02	TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action	Jen-Hao Cheng et.al.	2505.01583	link
2025-05-16	VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding	Zongxia Li et.al.	2505.01481	link
2025-05-16	Empowering Agentic Video Analytics Systems with Video Language Models	Yuxuan Yan et.al.	2505.00254	null
2025-05-13	SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding	Chenkai Zhang et.al.	2504.21435	link
2025-04-30	Static or Dynamic: Towards Query-Adaptive Token Selection for Video Question Answering	Yumeng Shi et.al.	2504.21403	null
2025-04-29	FiLA-Video: Spatio-Temporal Compression for Fine-Grained Long Video Understanding	Yanan Guo et.al.	2504.20384	null
2025-04-30	VideoMultiAgents: A Multi-Agent Framework for Video Question Answering	Noriyuki Kugo et.al.	2504.20091	link
2025-04-28	Learning Streaming Video Representation via Multitask Training	Yibin Yan et.al.	2504.20041	null
2025-04-25	ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding	Yi-Xing Peng et.al.	2504.18152	null
2025-04-24	VEU-Bench: Towards Comprehensive Understanding of Video Editing	Bozheng Li et.al.	2504.17828	null
2025-04-29	TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation	Ling You et.al.	2504.17365	null
2025-04-24	TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos	Linli Yao et.al.	2504.17343	link
2025-04-28	MASR: Self-Reflective Reasoning through Multimodal Hierarchical Attention Focusing for Agent-based Video Understanding	Shiwen Cao et.al.	2504.17213	null
2025-05-10	DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs	Zhenhailong Wang et.al.	2504.17040	null
2025-04-22	MR. Video: “MapReduce” is the Principle for Long Video Understanding	Ziqi Pang et.al.	2504.16082	null
2025-04-22	Describe Anything: Detailed Localized Image and Video Captioning	Long Lian et.al.	2504.16072	null
2025-04-22	ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting	Jian Hu et.al.	2504.15921	null
2025-04-24	Vidi: Large Multimodal Models for Video Understanding and Editing	Vidi Team et.al.	2504.15681	null
2025-04-21	IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs	David Ma et.al.	2504.15415	link
2025-04-21	Towards Understanding Camera Motions in Any Video	Zhiqiu Lin et.al.	2504.15376	null
2025-04-21	Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models	Guo Chen et.al.	2504.15271	null
2025-04-21	An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes	Ji Qi et.al.	2504.15270	null
2025-04-23	Fast Adversarial Training with Weak-to-Strong Spatial-Temporal Consistency in the Frequency Domain on Videos	Songping Wang et.al.	2504.14921	null
2025-04-20	OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding	Songtao Jiang et.al.	2504.14692	null
2025-04-20	Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection	Weijun Zhuang et.al.	2504.14553	null
2025-04-20	Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding	Tong Zeng et.al.	2504.14526	link
2025-04-20	ResNetVLLM – Multi-modal Vision LLM for the Video Understanding Task	Ahmad Khalil et.al.	2504.14432	null
2025-04-19	How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?	Rahul Thapa et.al.	2504.14391	null
2025-04-17	Perception Encoder: The best visual embeddings are not at the output of the network	Daniel Bolya et.al.	2504.13181	null
2025-04-17	PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding	Jang Hyun Cho et.al.	2504.13180	link
2025-04-17	VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models	Haojian Huang et.al.	2504.13122	link
2025-04-21	SkyReels-V2: Infinite-length Film Generative Model	Guibin Chen et.al.	2504.13074	link
2025-04-17	Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval	WonJun Moon et.al.	2504.13035	null
2025-04-22	FocusedAD: Character-centric Movie Audio Description	Xiaojun Ye et.al.	2504.12157	link
2025-04-16	Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization	Pritam Sarkar et.al.	2504.12083	null
2025-04-21	PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild	Henghui Ding et.al.	2504.11326	null
2025-04-15	Video Summarization with Large Language Models	Min Jung Lee et.al.	2504.11199	null
2025-04-15	OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding	Dianbing Xi et.al.	2504.10825	null
2025-04-14	Multimodal Long Video Modeling Based on Temporal Dynamic Context	Haoran Hao et.al.	2504.10443	link
2025-04-14	Mavors: Multi-granularity Video Representation for Multimodal Large Language Model	Yang Shi et.al.	2504.10068	null
2025-04-13	TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning	Xingjian Zhang et.al.	2504.09641	link
2025-04-12	VideoAds for Fast-Paced Video Understanding: Where Opensource Foundation Models Beat GPT-4o & Gemini-1.5 Pro	Zheyuan Zhang et.al.	2504.09282	null
2025-04-11	Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking	Huu-Loc Tran et.al.	2504.08384	null
2025-04-15	F$^3$Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos	Zhaoyu Liu et.al.	2504.08222	link
2025-04-10	SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding	Yangliu Hu et.al.	2504.07745	null
2025-04-10	VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding	Henghao Zhao et.al.	2504.07519	null
2025-04-10	How Can Objects Help Video-Language Understanding?	Zitian Tang et.al.	2504.07454	null
2025-04-13	VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning	Xinhao Li et.al.	2504.06958	null
2025-04-09	LVC: A Lightweight Compression Framework for Enhancing VLMs in Long Video Understanding	Ziyi Wang et.al.	2504.06835	null
2025-04-08	From Broadcast to Minimap: Achieving State-of-the-Art SoccerNet Game State Reconstruction	Vladimir Golovkin et.al.	2504.06357	null
2025-04-08	From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models	Chejian Xu et.al.	2504.06214	null
2025-04-08	Video Flow as Time Series: Discovering Temporal Consistency and Variability for VideoQA	Zijie Song et.al.	2504.05783	null
2025-04-09	Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting	Yunlong Tang et.al.	2504.05541	link
2025-04-07	REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding	Sakib Reza et.al.	2504.05491	null
2025-04-07	REVEAL: Relation-based Video Representation Learning for Video-Question-Answering	Sofian Chaybouti et.al.	2504.05463	null
2025-04-07	PvNeXt: Rethinking Network Design and Temporal Motion for Point Cloud Video Recognition	Jie Wang et.al.	2504.05075	null
2025-04-07	InstructionBench: An Instructional Video Understanding Benchmark	Haiwan Wei et.al.	2504.05040	null
2025-04-12	Unsupervised Ego- and Exo-centric Dense Procedural Activity Captioning via Gaze Consensus Adaptation	Zhaofeng Shi et.al.	2504.04840	null
2025-04-06	Advancing Egocentric Video Question Answering with Multimodal Large Language Models	Alkesh Patel et.al.	2504.04550	null
2025-04-06	VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT	Zhuo Zhi et.al.	2504.04471	null
2025-04-10	VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models	Dahun Kim et.al.	2504.03970	link
2025-04-08	Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation	Chuanqi Cheng et.al.	2504.02438	link
2025-04-03	Leveraging Static Relationships for Intra-Type and Inter-Type Message Passing in Video Question Answering	Lili Liang et.al.	2504.02417	null
2025-04-03	Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval	Boseung Jeong et.al.	2504.02397	null
2025-04-03	Moment Quantization for Video Temporal Grounding	Xiaolong Sun et.al.	2504.02286	null
2025-04-06	Re-thinking Temporal Search for Long-Form Video Understanding	Jinhui Ye et.al.	2504.02259	link
2025-04-02	Aligned Better, Listen Better for Audio-Visual Large Language Models	Yuxin Guo et.al.	2504.02061	null
2025-04-07	Is Temporal Prompting All We Need For Limited Labeled Action Recognition?	Shreyank N Gowda et.al.	2504.01890	null
2025-04-02	Spatial-R1: Enhancing MLLMs in Video Spatial Reasoning	Kun Ouyang et.al.	2504.01805	link
2025-04-02	TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding	Junwen Pan et.al.	2504.01407	null
2025-04-02	Slow-Fast Architecture for Video Multi-Modal Large Language Models	Min Shi et.al.	2504.01328	link
2025-04-01	Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation	Junyu Xie et.al.	2504.01020	null
2025-03-31	Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1	Yi Chen et.al.	2503.24376	link
2025-03-31	DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description	Adrienne Deganutti et.al.	2503.24096	null
2025-03-31	H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding	Qi Wu et.al.	2503.24008	null
2025-03-31	A SAT-centered XAI method for Deep Learning based Video Understanding	Hojer Key et.al.	2503.23870	null
2025-03-31	The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning	Mingkai Tian et.al.	2503.23679	null
2025-03-30	CA$^2$ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition	Jongseo Lee et.al.	2503.23447	null
2025-03-29	OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts	Yuxuan Wang et.al.	2503.22952	null
2025-03-28	EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos	Yuxuan Li et.al.	2503.22152	link
2025-03-27	Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model	Abdelrahman Shaker et.al.	2503.21782	link
2025-03-27	BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding	Shuming Liu et.al.	2503.21483	link
2025-03-27	Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering	Erika Mori et.al.	2503.21190	null
2025-03-26	BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation	Yulu Pan et.al.	2503.20781	link
2025-03-26	From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment	Yucheng Suo et.al.	2503.20472	null
2025-03-26	Self-ReS: Self-Reflection in Large Vision-Language Models for Long Video Understanding	Joao Pereira et.al.	2503.20362	null
2025-03-25	ACVUBench: Audio-Centric Video Understanding Benchmark	Yudong Yang et.al.	2503.19951	link
2025-03-25	PAVE: Patching and Adapting Video Large Language Models	Zhuoming Liu et.al.	2503.19794	link
2025-03-31	Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations	Jungin Park et.al.	2503.19706	link
2025-03-25	Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation	Hongcheng Gao et.al.	2503.19622	link
2025-03-27	SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding	Mingze Xu et.al.	2503.18943	null
2025-03-24	CRCL: Causal Representation Consistency Learning for Anomaly Detection in Surveillance Videos	Yang Liu et.al.	2503.18808	null
2025-03-24	Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks	Nina Shvetsova et.al.	2503.18637	null
2025-03-24	Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding	Xiangrui Liu et.al.	2503.18478	null
2025-03-24	Breaking the Encoder Barrier for Seamless Video-Language Understanding	Handong Li et.al.	2503.18422	null
2025-03-25	VTD-CLIP: Video-to-Text Discretization via Prompting CLIP	Wencheng Zhu et.al.	2503.18407	null
2025-03-23	MammAlps: A multi-view video behavior monitoring dataset of wild mammals in the Swiss Alps	Valentin Gabeff et.al.	2503.18223	link
2025-03-22	4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding	Wenxuan Zhu et.al.	2503.17827	link
2025-03-22	V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction	Yiming Zhao et.al.	2503.17736	link
2025-03-22	Collaborative Temporal Consistency Learning for Point-supervised Natural Language Video Localization	Zhuo Tao et.al.	2503.17651	null
2025-03-21	PVChat: Personalized Video Chat with One-Shot Learning	Yufei Shi et.al.	2503.17069	null
2025-03-21	Temporal Action Detection Model Compression by Progressive Block Drop	Xiaoyong Chen et.al.	2503.16916	null
2025-03-20	XAttention: Block Sparse Attention with Antidiagonal Scoring	Ruyi Xu et.al.	2503.16428	link
2025-03-20	Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models	Zhihang Liu et.al.	2503.16036	link
2025-03-20	Agentic Keyframe Search for Video Question Answering	Sunqi Fan et.al.	2503.16032	link
2025-03-25	STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding	Zichen Liu et.al.	2503.15973	link
2025-03-20	DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering	Haochen Wang et.al.	2503.15887	null
2025-03-20	MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations	Kyungho Bae et.al.	2503.15871	null
2025-03-20	What can Off-the-Shelves Large Multi-Modal Models do for Dynamic Scene Graph Generation?	Xuanming Cui et.al.	2503.15846	null
2025-03-19	Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering	Thanh-Son Nguyen et.al.	2503.14957	null
2025-03-19	FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding	Chongjun Tu et.al.	2503.14935	null
2025-03-18	Impossible Videos	Zechen Bai et.al.	2503.14378	null
2025-03-18	SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability	Jiankang Wang et.al.	2503.13983	null
2025-03-18	Improving LLM Video Understanding with 16 Frames Per Second	Yixuan Li et.al.	2503.13956	null
2025-03-17	Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition	Shristi Das Biswas et.al.	2503.13724	null
2025-03-17	Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory	Saket Gurukar et.al.	2503.13707	null
2025-03-17	Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos	Chiara Plizzari et.al.	2503.13646	link
2025-03-17	VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning	Ye Liu et.al.	2503.13444	link
2025-03-17	Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding	Weiyu Guo et.al.	2503.13139	null
2025-03-17	Efficient Motion-Aware Video MLLM	Zijia Zhao et.al.	2503.13016	null
2025-03-17	VITED: Video Temporal Evidence Distillation	Yujie Lu et.al.	2503.12855	null
2025-03-17	ViSpeak: Visual Instruction Feedback in Streaming Videos	Shenghao Fu et.al.	2503.12769	null
2025-03-16	AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding	Xiao Wang et.al.	2503.12559	link
2025-03-16	Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?	Tianyuan Qu et.al.	2503.12496	link
2025-03-16	Causality Model for Semantic Understanding on Videos	Li Yicong et.al.	2503.12447	null
2025-03-16	VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining	Yunze Liu et.al.	2503.12332	null
2025-03-14	Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers	Weiming Ren et.al.	2503.11579	null
2025-03-14	V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning	Zixu Cheng et.al.	2503.11495	null
2025-03-14	Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding	David Gastager et.al.	2503.11392	null
2025-03-14	LLaVA-MLB: Mitigating and Leveraging Attention Bias for Training-Free Video LLMs	Leqi Shen et.al.	2503.11205	null
2025-03-13	Large-scale Pre-training for Grounded Video Caption Generation	Evangelos Kazakos et.al.	2503.10781	link
2025-03-13	Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing	Yudong Liu et.al.	2503.10742	link
2025-03-13	4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models	Wanhua Li et.al.	2503.10437	link
2025-03-13	LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents	Boyu Chen et.al.	2503.10200	null
2025-03-13	TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs	Yunxiao Wang et.al.	2503.09994	null
2025-03-14	On the Limitations of Vision-Language Models in Understanding Image Transforms	Ahmad Mustafa Anis et.al.	2503.09837	null
2025-03-13	BIMBA: Selective-Scan Compression for Long-Range Video Question Answering	Md Mohaiminul Islam et.al.	2503.09590	link
2025-03-12	VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary	Kevin Qinghong Lin et.al.	2503.09402	link
2025-03-12	VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers	Ruanjun Li et.al.	2503.09387	null
2025-03-12	Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption	Luozheng Qin et.al.	2503.09279	null
2025-03-13	FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models	Fufangchen Zhao et.al.	2503.09158	null
2025-03-12	Memory-enhanced Retrieval Augmentation for Long Video Understanding	Huaying Yuan et.al.	2503.09149	null
2025-03-12	Generative Frame Sampler for Long Video Understanding	Linli Yao et.al.	2503.09146	null
2025-03-12	Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding	Haoyu Zhang et.al.	2503.09143	null
2025-03-12	Everything Can Be Described in Words: A Simple Unified Multi-Modal Framework with Semantic and Temporal Alignment	Xiaowei Bi et.al.	2503.09081	null
2025-03-12	Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization	Zongshang Pang et.al.	2503.09027	null
2025-03-11	QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension	Yongdong Luo et.al.	2503.08689	link
2025-03-11	HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding	Shehreen Azad et.al.	2503.08585	null
2025-03-11	RAG-Adapter: A Plug-and-Play RAG-enhanced Framework for Long Video Understanding	Xichen Tan et.al.	2503.08576	null
2025-03-11	Prompt2LVideos: Exploring Prompts for Understanding Long-Form Multimodal Videos	Soumya Shamarao Jahagirdar et.al.	2503.08335	null
2025-03-10	BEARCUBS: A benchmark for computer-using web agents	Yixiao Song et.al.	2503.07919	null
2025-03-10	ALLVB: All-in-One Long Video Understanding Benchmark	Xichen Tan et.al.	2503.07298	null
2025-03-10	Towards Fine-Grained Video Question Answering	Wei Dai et.al.	2503.06820	null
2025-03-09	TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos	Chen-Lin Zhang et.al.	2503.06526	link
2025-03-08	Get In Video: Add Anything You Want to the Video	Shaobin Zhuang et.al.	2503.06268	null
2025-03-07	Unified Reward Model for Multimodal Understanding and Generation	Yibin Wang et.al.	2503.05236	null
2025-03-06	Token-Efficient Long Video Understanding for Multimodal LLMs	Jindong Jiang et.al.	2503.04130	null
2025-03-06	EVE: Towards End-to-End Video Subtitle Extraction with Vision-Language Models	Haiyang Yu et.al.	2503.04058	null
2025-03-05	EgoLife: Towards Egocentric Life Assistant	Jingkang Yang et.al.	2503.03803	link
2025-03-06	Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection	Wenqiao Li et.al.	2503.03562	null
2025-03-03	Parameter-free Video Segmentation for Vision and Language Understanding	Louis Mahon et.al.	2503.01201	null
2025-03-02	Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning	Baoqi Pei et.al.	2503.00986	link
2025-03-04	An Efficient 3D Convolutional Neural Network with Channel-wise, Spatial-grouped, and Temporal Convolutions	Zhe Wang et.al.	2503.00796	null
2025-03-01	Streaming Video Question-Answering with In-context Video KV-Cache Retrieval	Shangzhe Di et.al.	2503.00540	link
2025-02-28	PreMind: Multi-Agent Video Understanding for Advanced Indexing of Presentation-style Videos	Kangda Wei et.al.	2503.00162	null
2025-02-25	An Analysis of Segment Anything 2	Clayton Bromley et.al.	2503.00042	null
2025-02-28	Adaptive Keyframe Sampling for Long Video Understanding	Xi Tang et.al.	2502.21271	null
2025-02-28	HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models	Xiao Wang et.al.	2502.20811	null
2025-02-27	OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection	Shuming Liu et.al.	2502.20361	link
2025-02-27	M-LLM Based Video Frame Selection for Efficient Video Understanding	Kai Hu et.al.	2502.19680	null
2025-02-26	InternVQA: Advancing Compressed Video Quality Assessment with Distilling Large Foundation Model	Fengbin Guan et.al.	2502.19026	null
2025-02-26	Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos	Luigi Seminara et.al.	2502.17753	link
2025-02-23	Fine-Grained Video Captioning through Scene Graph Consolidation	Sanghyeok Chu et.al.	2502.16427	null
2025-03-01	LongCaptioning: Unlocking the Power of Long Video Caption Generation in Large Multimodal Models	Hongchen Wei et.al.	2502.15393	null
2025-02-21	Weakly Supervised Video Scene Graph Generation via Natural Language Supervision	Kibum Kim et.al.	2502.15370	link
2025-02-20	Can Hallucination Correction Improve Video-Language Alignment?	Lingjun Zhao et.al.	2502.15079	null
2025-03-04	AVD2: Accident Video Diffusion for Accident Video Description	Cheng Li et.al.	2502.14801	null
2025-02-19	Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning	Caihua Liu et.al.	2502.13754	null
2025-02-19	Pretrained Image-Text Models are Secretly Video Captioners	Chunhui Zhang et.al.	2502.13363	link
2025-02-18	VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation	Xinlong Chen et.al.	2502.12782	link
2025-02-18	MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos	Huaying Yuan et.al.	2502.12558	null
2025-02-17	video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model	Guangzhi Sun et.al.	2502.11775	link
2025-02-18	Open-Ended and Knowledge-Intensive Video Question Answering	Md Zarif Ul Alam et.al.	2502.11747	null
2025-02-17	VRoPE: Rotary Position Embedding for Video Large Language Models	Zikang Liu et.al.	2502.11664	link
2025-02-18	iMOVE: Instance-Motion-Aware Video Understanding	Jiaze Li et.al.	2502.11594	null
2025-02-15	SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding	Zhenyu Yang et.al.	2502.10810	null
2025-02-15	Semantics-aware Test-time Adaptation for 3D Human Pose Estimation	Qiuxia Lin et.al.	2502.10724	null
2025-02-14	Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering	Mark Beliaev et.al.	2502.09573	null
2025-02-11	EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering	Sheng Zhou et.al.	2502.07411	link
2025-02-11	Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis	Amir Hosein Fadaei et.al.	2502.07277	null
2025-02-11	A Survey on Mamba Architecture for Vision Applications	Fady Ibrahim et.al.	2502.07161	null
2025-02-12	A Survey on Video Analytics in Cloud-Edge-Terminal Collaborative Systems	Linxiao Gong et.al.	2502.06581	null
2025-02-11	CoS: Chain-of-Shot Prompting for Long Video Understanding	Jian Hu et.al.	2502.06428	null
2025-02-09	Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding	Xingjian Diao et.al.	2502.06020	link
2025-02-07	Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy	Yunhang Shen et.al.	2502.05177	link
2025-02-07	VideoRoPE: What Makes for Good Video Rotary Position Embedding?	Xilin Wei et.al.	2502.05173	link
2025-02-06	WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs	Jack Hong et.al.	2502.04326	null
2025-02-05	SKI Models: Skeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living	Arkaprava Sinha et.al.	2502.03459	null
2025-02-05	MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding	Pengyi Li et.al.	2502.03183	null
2025-02-05	A Decade of Action Quality Assessment: Largest Systematic Survey of Trends, Challenges, and Future Directions	Hao Yin et.al.	2502.02817	null
2025-02-04	Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives	Simone Alberto Peirone et.al.	2502.02487	link
2025-02-04	TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes	Xingcheng Zhou et.al.	2502.02449	null
2025-02-06	LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models	Tzu-Tao Chang et.al.	2502.02406	null
2025-02-03	VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos	Xubin Ren et.al.	2502.01549	link
2025-02-04	AIN: The Arabic INclusive Large Multimodal Model	Ahmed Heakl et.al.	2502.00094	link
2025-01-31	$\infty$-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation	Saul Santos et.al.	2501.19098	link
2025-01-30	MAMS: Model-Agnostic Module Selection Framework for Video Captioning	Sangho Lee et.al.	2501.18269	null
2025-01-28	Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding	Yun Li et.al.	2501.16786	null
2025-01-27	AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models	Zheng Lian et.al.	2501.16566	link
2025-01-27	Understanding Long Videos via LLM-Powered Entity Relation Graphs	Meng Chu et.al.	2501.15953	null
2025-01-26	TinyLLaVA-Video: A Simple Framework of Small-scale Large Multimodal Models for Video Understanding	Xingjian Zhang et.al.	2501.15513	link
2025-01-25	HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding	Jiaxing Zhao et.al.	2501.15111	null
2025-01-25	VideoPure: Diffusion-based Adversarial Purification for Video Recognition	Kaixun Jiang et.al.	2501.14999	link
2025-01-24	ENTER: Event Based Interpretable Reasoning for VideoQA	Hammad Ayyubi et.al.	2501.14194	null
2025-01-30	Temporal Preference Optimization for Long-Form Video Understanding	Rui Li et.al.	2501.13919	null
2025-01-23	ReasVQA: Advancing VideoQA with Imperfect Reasoning Process	Jianxin Liang et.al.	2501.13536	null
2025-01-23	Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge	Haomiao Xiong et.al.	2501.13468	link
2025-01-28	VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding	Boqiang Zhang et.al.	2501.13106	link
2025-01-22	InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling	Yi Wang et.al.	2501.12386	link
2025-01-21	MMVU: Measuring Expert-Level Multi-Discipline Video Understanding	Yilun Zhao et.al.	2501.12380	link
2025-01-21	InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model	Yuhang Zang et.al.	2501.12368	link
2025-02-03	HFGCN: Hypergraph Fusion Graph Convolutional Networks for Skeleton-Based Action Recognition	Pengcheng Dong et.al.	2501.11007	null
2025-01-15	Admitting Ignorance Helps the Video Question Answering Models to Answer	Haopeng Li et.al.	2501.08771	null
2025-01-14	Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks	Miran Heo et.al.	2501.08326	null
2025-01-14	Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness	Jiaxing Zhao et.al.	2501.07978	link
2025-01-24	Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding	Liping Yuan et.al.	2501.07888	link
2025-01-14	AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation	Sitong Gong et.al.	2501.07810	link
2025-01-17	MECD+: Unlocking Event-Level Causal Graph Discovery for Video Reasoning	Tieyuan Chen et.al.	2501.07227	null
2025-01-13	TimeLogic: A Temporal Logic Benchmark for Video QA	Sirnam Swetha et.al.	2501.07214	null
2025-01-13	Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling	Jiebin Yan et.al.	2501.07087	null
2025-01-12	X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding	Wenqi Zhou et.al.	2501.06835	null
2025-01-12	VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning	Ji Soo Lee et.al.	2501.06761	link
2025-01-13	Valley2: Exploring Multimodal Models with Scalable Vision-Language Design	Ziheng Wu et.al.	2501.05901	link
2025-01-10	Zero-shot Shark Tracking and Biometrics from Aerial Imagery	Chinmay K Lalgudi et.al.	2501.05717	null
2025-01-10	From My View to Yours: Ego-Augmented Learning in Large Vision Language Models for Understanding Exocentric Daily Living Activities	Dominick Reilly et.al.	2501.05711	link
2025-01-09	OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?	Yifei Li et.al.	2501.05510	link
2025-01-09	Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning	Huabin Liu et.al.	2501.05069	null
2025-01-09	LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding	Jiaxing Zhao et.al.	2501.05067	null
2025-01-09	LongViTU: Instruction Tuning for Long-Form Video Understanding	Rujie Wu et.al.	2501.05037	null
2025-01-09	ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark	Ronghao Dang et.al.	2501.05031	link
2025-01-08	Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs	Zeyi Huang et.al.	2501.04336	null
2025-01-08	H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving	Siran Chen et.al.	2501.04302	null
2025-01-03	Classifier-Guided Captioning Across Modalities	Ariel Shaulov et.al.	2501.03183	null
2025-01-06	MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models	Wenyi Hong et.al.	2501.02955	null
2024-12-30	FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models	Tianyu Fu et.al.	2501.01986	link
2025-01-03	HLV-1K: A Large-scale Hour-Long Video Benchmark for Time-Specific Long Video Understanding	Heqing Zou et.al.	2501.01645	link
2025-01-02	Unifying Specialized Visual Encoders for Video Language Models	Jihoon Chung et.al.	2501.01426	link
2025-01-02	Multi-Modal Video Feature Extraction for Popularity Prediction	Haixu Liu et.al.	2501.01422	null
2025-01-08	VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM	Yuqian Yuan et.al.	2501.00599	link
2024-12-31	Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method	Zhenpeng Huang et.al.	2501.00584	null
2024-12-31	OV-HHIR: Open Vocabulary Human Interaction Recognition Using Cross-modal Integration of Large Language Models	Lala Shakti Swarup Ray et.al.	2501.00432	null
2025-01-09	Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding	Yue Fan et.al.	2501.00358	null
2024-12-30	Detection-Fusion for Knowledge Graph Extraction from Videos	Taniya Das et.al.	2501.00136	link
2024-12-30	Hierarchical Banzhaf Interaction for General Video-Language Representation Learning	Peng Jin et.al.	2412.20964	link
2025-01-05	ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding	Xiao Wang et.al.	2412.20504	link
2024-12-28	DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments	Xijun Wang et.al.	2412.20042	null
2024-12-30	MVTamperBench: Evaluating Robustness of Vision-Language Models	Amit Agarwal et.al.	2412.19794	null
2024-12-26	Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries	Roberto Amoroso et.al.	2412.19304	null
2024-12-24	Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models	Jinhui Yi et.al.	2412.18609	link
2024-12-23	HumanVBench: Exploring Human-Centric Video Understanding Capabilities of MLLMs with Synthetic Benchmark Data	Ting Zhou et.al.	2412.17574	link
2024-12-23	VidCtx: Context-aware Video Question Answering with Image Models	Andreas Goulas et.al.	2412.17415	link
2024-12-22	FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story Videos	Zhengqian Wu et.al.	2412.17022	link
2024-12-22	Video Domain Incremental Learning for Human Action Recognition in Home Environments	Yuanda Hu et.al.	2412.16946	null
2024-12-20	PruneVid: Visual Token Pruning for Efficient Video Large Language Models	Xiaohu Huang et.al.	2412.16117	link
2024-12-20	PolySmart @ TRECVid 2024 Medical Video Question Answering	Jiaxin Wu et.al.	2412.15514	null
2024-12-19	HiCM$^2$: Hierarchical Compact Memory Modeling for Dense Video Captioning	Minkuk Kim et.al.	2412.14585	null
2024-12-18	Learning from Massive Human Videos for Universal Humanoid Pose Control	Jiageng Mao et.al.	2412.14172	null
2024-12-18	InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models	Cong Wei et.al.	2412.14006	link
2024-12-18	Do Language Models Understand Time?	Xi Ding et.al.	2412.13845	link
2024-12-19	G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o	Tony Cheng Tong et.al.	2412.13647	link
2024-12-17	FocusChat: Text-guided Long Video Understanding via Spatiotemporal Information Filtering	Zheng Cheng et.al.	2412.12833	null
2024-12-17	Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning	Shiping Ge et.al.	2412.12791	link
2024-12-17	ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries	Wangyu Xue et.al.	2412.12675	null
2024-12-16	CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding	Guo Chen et.al.	2412.12075	null
2024-12-16	VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting	Muhammet Furkan Ilaslan et.al.	2412.11621	link
2024-12-16	Exploring Temporal Event Cues for Dense Video Captioning in Cyclic Co-learning	Zhuyang Xie et.al.	2412.11467	null
2024-12-15	Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition	Yulin Wang et.al.	2412.11228	link
2024-12-15	Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track	Deepak Gupta et.al.	2412.11056	null
2024-12-14	Bridging Vision and Language: Modeling Causality and Temporality in Video Narratives	Ji-jun Park et.al.	2412.10720	null
2024-12-12	VCA: Video Curious Agent for Long Video Understanding	Zeyuan Yang et.al.	2412.10471	null
2024-12-11	COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework	Xin Dong et.al.	2412.10435	null
2024-12-13	Apollo: An Exploration of Video Understanding in Large Multimodal Models	Orr Zohar et.al.	2412.10360	null
2024-12-13	B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens	Zhuqiang Lu et.al.	2412.09919	link
2024-12-16	IQViC: In-context, Question Adaptive Vision Compressor for Long-term Video Understanding LMMs	Sosuke Yamao et.al.	2412.09907	null
2024-12-17	ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation	Ali Athar et.al.	2412.09754	null
2024-12-12	PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models	Chenyu Yang et.al.	2412.09613	null
2024-12-12	Neptune: The Long Orbit to Benchmarking Long Video Understanding	Arsha Nagrani et.al.	2412.09582	link
2024-12-12	Agent-based Video Trimming	Lingfeng Yang et.al.	2412.09513	null
2024-12-12	InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption	Tiehan Fan et.al.	2412.09283	null
2024-12-12	Foundation Models and Adaptive Feature Selection: A Synergistic Approach to Video Question Answering	Sai Bhargav Rongali et.al.	2412.09230	null
2024-12-10	3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark	Wufei Ma et.al.	2412.07825	null
2024-12-10	GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning	Yicheng Wang et.al.	2412.07704	null
2024-12-10	Multi-Scale Contrastive Learning for Video Temporal Grounding	Thong Thanh Nguyen et.al.	2412.07157	null
2024-12-09	VidMusician: Video-to-Music Generation with Semantic-Rhythmic Alignment via Hierarchical Visual Features	Sifei Li et.al.	2412.06296	null
2024-12-11	Towards Long Video Understanding via Fine-detailed Video Story Generation	Zeng You et.al.	2412.06182	null
2024-12-06	Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	Zhe Chen et.al.	2412.05271	link
2024-12-11	LinVT: Empower Your Image-level Large Language Model to Understand Videos	Lishuai Gao et.al.	2412.05185	link
2024-12-06	Beyond Boxes: Mask-Guided Spatio-Temporal Feature Aggregation for Video Object Detection	Khurram Azeem Hashmi et.al.	2412.04915	null
2024-12-12	Espresso: High Compression For Rich Extraction From Videos for Your Vision-Language Model	Keunwoo Peter Yu et.al.	2412.04729	null
2024-12-05	VisionZip: Longer is Better but Not Necessary in Vision Language Models	Senqiao Yang et.al.	2412.04467	link
2024-12-04	VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding	Chaoyu Li et.al.	2412.03735	null
2024-12-04	Streaming Detection of Queried Event Start	Cristobal Eyzaguirre et.al.	2412.03567	link
2024-12-04	Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning	Wujian Peng et.al.	2412.03565	link
2024-12-04	AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning	Yiwu Zhong et.al.	2412.03248	link
2024-12-04	Video LLMs for Temporal Reasoning in Long Videos	Fawad Javed Fateh et.al.	2412.02930	null
2024-12-03	VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding	Kangsan Kim et.al.	2412.02186	link
2024-12-03	Progress-Aware Video Frame Captioning	Zihui Xue et.al.	2412.02071	null
2024-12-04	Towards Universal Soccer Video Understanding	Jiayuan Rao et.al.	2412.01820	link
2024-12-02	PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos	Meng Cao et.al.	2412.01800	null
2024-12-05	SEAL: Semantic Attention Learning for Long Video Representation	Lan Wang et.al.	2412.01798	null
2024-12-02	Unlocking Video-LLM via Agent-of-Thoughts Distillation	Yudi Shi et.al.	2412.01694	null
2024-12-02	Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation	Xin Yan et.al.	2412.01316	null
2024-12-02	Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks	Joseph Raj Vishal et.al.	2412.01132	link
2024-12-01	VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation	Weiming Ren et.al.	2412.00927	null
2024-12-01	VideoSAVi: Self-Aligned Video Language Models without Human Supervision	Yogesh Kulkarni et.al.	2412.00624	null
2024-11-30	Empowering the Deaf and Hard of Hearing Community: Enhancing Video Captions Using Large Language Models	Nadeen Fathallah et.al.	2412.00342	null
2024-11-29	STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training	Haiyi Qiu et.al.	2412.00161	null
2024-12-02	T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs	Shukang Yin et.al.	2411.19951	link
2024-11-29	Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark	Joseph Heyward et.al.	2411.19941	null
2024-11-29	LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos	Tiantian Geng et.al.	2411.19772	link
2024-11-29	Look Every Frame All at Once: Video-Ma$^2$mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing	Hosu Lee et.al.	2411.19460	null
2024-11-29	Actions and Objects Pathways for Domain Adaptation in Video Question Answering	Safaa Abdullahi Moallim Mohamud et.al.	2411.19434	null
2024-11-27	AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans	Dillon Loh et.al.	2411.18539	link
2024-11-27	TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability	Shimin Chen et.al.	2411.18211	link
2024-11-27	HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation	Trong-Thuan Nguyen et.al.	2411.18042	null
2024-11-27	VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format	Yueqian Wang et.al.	2411.17991	link
2024-11-25	Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding	Andong Deng et.al.	2411.16932	null
2024-11-25	SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context	Jungang Li et.al.	2411.16213	null
2024-11-25	VideoOrion: Tokenizing Object Dynamics in Videos	Yicheng Feng et.al.	2411.16156	null
2024-11-23	ReWind: Understanding Long Videos with Instructed Learnable Memory	Anxhelo Diko et.al.	2411.15556	null
2024-11-23	FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity	Hang Hua et.al.	2411.15411	null
2024-11-22	VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection	Songhao Han et.al.	2411.14794	link
2024-11-22	Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning	AJ Piergiovanni et.al.	2411.14688	null
2024-11-21	Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding	Yiming Zhang et.al.	2411.14401	null
2024-11-20	Extending Video Masked Autoencoders to 128 frames	Nitesh Bharadwaj Gundavarapu et.al.	2411.13683	null
2024-11-20	Principles of Visual Tokens for Efficient Video Understanding	Xinyue Hao et.al.	2411.13626	null
2024-11-20	Teaching VLMs to Localize Specific Objects from In-context Examples	Sivan Doveh et.al.	2411.13317	link
2024-11-20	VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation	Ziyang Luo et.al.	2411.13281	null
2024-11-20	Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension	Yongdong Luo et.al.	2411.13093	link
2024-11-19	AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction	Yuanbin Man et.al.	2411.12593	null
2024-11-19	DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding	Yudong Han et.al.	2411.12355	null
2024-11-17	TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models	Tingyu Qu et.al.	2411.11066	link
2024-11-16	ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models	Vipula Rawte et.al.	2411.10867	null
2024-11-13	Can MLLMs Guide Weakly-Supervised Temporal Action Localization Tasks?	Quan Zhang et.al.	2411.08466	null
2024-11-12	Grounded Video Caption Generation	Evangelos Kazakos et.al.	2411.07584	null
2024-11-11	EVQAScore: Efficient Video Question Answering Data Evaluation	Hao Liang et.al.	2411.06908	null
2024-11-11	Multi-Modal interpretable automatic video captioning	Antoine Hanna-Asaad et.al.	2411.06872	null
2024-11-08	Poze: Sports Technique Feedback under Data Constraints	Agamdeep Singh et.al.	2411.05734	null
2024-11-08	Video RWKV: Video Action Recognition Based RWKV	Zhuowen Yin et.al.	2411.05636	null
2024-11-06	Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning	Ping Li et.al.	2411.04059	link
2024-11-06	StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding	Junming Lin et.al.	2411.03628	link
2024-11-05	Personalized Video Summarization by Multimodal Video Understanding	Brian Chen et.al.	2411.03531	null
2024-11-05	PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance	Ruyang Liu et.al.	2411.02327	link
2024-11-04	SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities	Ehsan Faghihi et.al.	2411.01975	null
2024-11-02	Designing a Robust Radiology Report Generation System	Sonit Singh et.al.	2411.01153	null
2024-10-31	Technical Report for Soccernet 2023 – Dense Video Captioning	Zheng Ruan et.al.	2411.00882	null
2024-10-31	Video Token Merging for Long-form Video Understanding	Seon-Ho Lee et.al.	2410.23782	null
2024-10-30	TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models	Ziyao Shangguan et.al.	2410.23266	link
2024-10-30	Situational Scene Graph for Structured Human-centric Situation Understanding	Chinthani Sugandhika et.al.	2410.22829	link
2024-10-29	Standardization Trends on Safety and Trustworthiness Technology for Advanced AI	Jonghong Jeon et.al.	2410.22151	null
2024-10-28	Zero-Shot Action Recognition in Surveillance Videos	Joao Pereira et.al.	2410.21113	null
2024-10-26	Adaptive Video Understanding Agent: Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning	Sullam Jeoung et.al.	2410.20252	null
2024-10-25	FLAASH: Flow-Attention Adaptive Semantic Hierarchical Fusion for Multi-Modal Tobacco Content Analysis	Naga VS Raviteja Chappa et.al.	2410.19896	null
2024-10-25	TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	Xiangyu Zeng et.al.	2410.19702	null
2024-10-24	VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks	Lawrence Jang et.al.	2410.19100	null
2024-10-24	CAMEL-Bench: A Comprehensive Arabic LMM Benchmark	Sara Ghaboura et.al.	2410.18976	link
2024-10-22	LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding	Xiaoqian Shen et.al.	2410.17434	link
2024-10-22	Order Matters: Exploring Order Sensitivity in Multimodal Large Language Models	Zhijie Tan et.al.	2410.16983	null
2024-10-22	EVC-MF: End-to-end Video Captioning Network with Multi-scale Features	Tian-Zi Niu et.al.	2410.16624	null
2024-10-21	xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs	Michael S. Ryoo et.al.	2410.16267	null
2024-10-20	EVA: An Embodied World Model for Future Video Anticipation	Xiaowei Chi et.al.	2410.15461	null
2024-10-20	ContextDet: Temporal Action Detection with Adaptive Context Aggregation	Ning Wang et.al.	2410.15279	null
2024-10-20	Can LVLMs Describe Videos like Humans? A Five-in-One Video Annotations Benchmark for Better Human-Machine Comparison	Shiyu Hu et.al.	2410.15270	null
2024-10-19	Making Every Frame Matter: Continuous Video Understanding for Large Models via Adaptive State Modeling	Hao Wu et.al.	2410.14993	null
2024-10-18	Zero-shot Action Localization via the Confidence of Large Vision-Language Models	Josiah Aklilu et.al.	2410.14340	null
2024-10-15	It’s Just Another Day: Unique Video Captioning by Discriminative Prompting	Toby Perrett et.al.	2410.11702	null
2024-10-15	VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI	Sijie Cheng et.al.	2410.11623	null
2024-10-15	VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models	Xiaohan Lan et.al.	2410.11417	null
2024-10-15	TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models	Mu Cai et.al.	2410.10818	link
2024-10-14	LVD-2M: A Long-take Video Dataset with Temporally Dense Captions	Tianwei Xiong et.al.	2410.10816	link
2024-10-14	MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer	Minghao Zhu et.al.	2410.10589	link
2024-10-16	Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs	Kai Han et.al.	2410.10441	link
2024-10-13	ViFi-ReID: A Two-Stream Vision-WiFi Multimodal Approach for Person Re-identification	Chen Mao et.al.	2410.09875	null
2024-10-13	MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models	Hang Hua et.al.	2410.09733	null
2024-10-12	Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering	Ting Yu et.al.	2410.09380	null
2024-10-12	Multi-granularity Contrastive Cross-modal Collaborative Generation for End-to-End Long-term Video Question Answering	Ting Yu et.al.	2410.09379	link
2024-10-11	VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding	Houlun Chen et.al.	2410.08593	link
2024-10-10	Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models	Qingni Wang et.al.	2410.08174	null
2024-10-10	Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations	Yiyuan Zhang et.al.	2410.08049	link
2024-10-10	TVBench: Redesigning Video-Language Evaluation	Daniel Cores et.al.	2410.07752	null
2024-10-09	MM-Ego: Towards Building Egocentric Multimodal LLMs	Hanrong Ye et.al.	2410.07177	null
2024-10-11	Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization	Changli Tang et.al.	2410.06682	null
2024-10-15	ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition	Mohammadreza Salehi et.al.	2410.05774	null
2024-10-08	Enhancing Temporal Modeling of Video LLMs via Time Gating	Zi-Yuan Hu et.al.	2410.05714	link
2024-10-08	TRACE: Temporal Grounding Video LLM via Causal Event Modeling	Yongxin Guo et.al.	2410.05643	link
2024-10-09	SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference	Yuan Zhang et.al.	2410.04417	link
2024-10-04	SONIQUE: Video Background Music Generation Using Unpaired Audio-Visual Data	Liqian Zhang et.al.	2410.03879	link
2024-10-04	Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models	Haibo Wang et.al.	2410.03290	link
2024-10-07	Frame-Voyager: Learning to Query Frames for Video Large Language Models	Sicheng Yu et.al.	2410.03226	null
2024-10-04	AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark	Wenhao Chai et.al.	2410.03051	null
2024-10-03	AirLetters: An Open Video Dataset of Characters Drawn in the Air	Rishit Dagli et.al.	2410.02921	null
2024-10-01	YouTube Video Analytics for Patient Engagement: Evidence from Colonoscopy Preparation Videos	Yawen Guo et.al.	2410.02830	null
2024-10-03	Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos	Jianrui Zhang et.al.	2410.02763	null
2024-10-09	DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM	Xuchen Li et.al.	2410.02492	null
2024-10-02	Deep learning for action spotting in association football videos	Silvio Giancola et.al.	2410.01304	null
2024-10-02	UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark	Hasnat Md Abdullah et.al.	2410.01180	link
2024-10-01	ScVLM: a Vision-Language Model for Driving Safety Critical Event Understanding	Liang Shi et.al.	2410.00982	link
2024-10-01	Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting	Chen Cai et.al.	2410.00771	link
2024-09-30	MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning	Haotian Zhang et.al.	2409.20566	null
2024-10-04	VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs	Ruotong Liao et.al.	2409.20365	link
2024-09-30	Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs	Zicheng Zhang et.al.	2409.20063	null
2024-10-02	Visual Context Window Extension: A New Perspective for Long Video Understanding	Hongchen Wei et.al.	2409.20018	null
2024-09-29	Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding	Xiao Wang et.al.	2409.19532	null
2024-09-27	From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding	Heqing Zou et.al.	2409.18938	link
2024-09-27	Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks	Min Yang et.al.	2409.18478	null
2024-09-26	E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding	Ye Liu et.al.	2409.18111	link
2024-09-26	IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning	Soeun Lee et.al.	2409.18046	link
2024-09-26	LLM4Brain: Training a Large Language Model for Brain Video Understanding	Ruizhe Zheng et.al.	2409.17987	null
2024-09-26	EAGLE: Egocentric AGgregated Language-video Engine	Jing Bi et.al.	2409.17523	null
2024-09-23	Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP	Zeliang Zhang et.al.	2409.15035	null
2024-09-24	Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding	Yan Shu et.al.	2409.14485	link
2024-09-22	Scene-Text Grounding for Text-Based Video Question Answering	Sheng Zhou et.al.	2409.14319	link
2024-09-20	ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation	Abrar Anwar et.al.	2409.13682	link
2024-09-20	Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder	Aditya Kommineni et.al.	2409.13606	null
2024-09-20	First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge	Yingzhe Peng et.al.	2409.13538	null
2024-09-19	Interpretable Action Recognition on Hard to Classify Actions	Anastasia Anichenko et.al.	2409.13091	null
2024-09-17	AMEGO: Active Memory from long EGOcentric videos	Gabriele Goletto et.al.	2409.10917	null
2024-09-16	HAVANA: Hierarchical stochastic neighbor embedding for Accelerated Video ANnotAtions	Alexandru Bobe et.al.	2409.10641	null
2024-09-16	SoccerNet 2024 Challenges Results	Anthony Cioppa et.al.	2409.10587	link
2024-09-14	QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems	Zhixian He et.al.	2409.09348	null
2024-09-12	Top-down Activity Representation Learning for Video Question Answering	Yanan Wang et.al.	2409.07748	null
2024-09-12	Multi-object event graph representation learning for Video Question Answering	Yanan Wang et.al.	2409.07747	null
2024-09-10	Enhancing Long Video Understanding via Hierarchical Event-Based Memory	Dingxin Cheng et.al.	2409.06299	null
2024-09-11	VidLPRO: A Video-Language Pre-training Framework for Robotic and Laparoscopic Surgery	Mohammadmahdi Honarmand et.al.	2409.04732	null
2024-09-06	Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment	Keyne Oei et.al.	2409.04607	link
2024-09-05	TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations	Mingze Gao et.al.	2409.03206	null
2024-09-04	LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture	Xidong Wang et.al.	2409.02889	link
2024-09-03	A Novel Audio-Visual Information Fusion System for Mental Disorders Detection	Yichun Li et.al.	2409.02243	null
2024-09-02	VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges	Yuxuan Wang et.al.	2409.01071	null
2024-08-31	Streamlining Forest Wildfire Surveillance: AI-Enhanced UAVs Utilizing the FLAME Aerial Video Dataset for Lightweight and Efficient Monitoring	Lemeng Zhao et.al.	2409.00510	null
2024-08-31	StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models	Yuxiang Guo et.al.	2409.00304	null
2024-09-20	HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics	Gueter Josmy Faure et.al.	2408.17443	link
2024-08-29	CogVLM2: Visual Language Models for Image and Video Understanding	Wenyi Hong et.al.	2408.16500	link
2024-08-29	DLM-VMTL: A Double Layer Mapper for heterogeneous data video Multi-task prompt learning	Zeyi Bo et.al.	2408.16195	null
2024-08-28	Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input	Jiajun Liu et.al.	2408.15542	null
2024-08-27	Fine-grained length controllable video captioning with ordinal embeddings	Tomoya Nitta et.al.	2408.15447	null
2024-08-27	GenRec: Unifying Video Generation and Recognition with Diffusion Models	Zejia Weng et.al.	2408.15241	link
2024-08-27	Sec2Sec Co-attention for Video-Based Apparent Affective Prediction	Mingwei Sun et.al.	2408.15209	link
2024-08-26	Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos	Qirui Chen et.al.	2408.14469	null
2024-08-26	Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification	Mahrukh Awan et.al.	2408.14441	null
2024-08-26	Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos	Jiajun Fei et.al.	2408.14023	link
2024-08-26	LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models	Qihang Ge et.al.	2408.14008	null
2024-08-23	Cap2Sum: Learning to Summarize Videos by Generating Captions	Cairong Zhao et.al.	2408.12800	null
2024-08-22	Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models	Jean Park et.al.	2408.12763	null
2024-08-21	Audio Description Customization	Rosiana Natalie et.al.	2408.11406	null
2024-08-21	LongVILA: Scaling Long-Context Visual Language Models for Long Videos	Fuzhao Xue et.al.	2408.10188	link
2024-08-17	Flatten: Video Action Recognition is an Image Classification task	Junlin Chen et.al.	2408.09220	null
2024-07-31	Segment Anything for Videos: A Systematic Survey	Chunhui Zhang et.al.	2408.08315	link
2024-08-15	VLPG-Nav: Object Navigation Using Visual Language Pose Graph and Object Localization Probability Maps	Senthil Hariharan Arul et.al.	2408.08301	null
2024-08-15	LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning	Jiajie Li et.al.	2408.07981	null
2024-08-15	Continuous Perception Benchmark	Zeyu Wang et.al.	2408.07867	null
2024-08-14	Disentangle and denoise: Tackling context misalignment for video moment retrieval	Kaijing Ma et.al.	2408.07600	null
2024-08-12	HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization	Sakib Reza et.al.	2408.06437	link
2024-08-12	OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning	Mushui Liu et.al.	2408.06158	link
2024-08-12	CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer	Zhuoyi Yang et.al.	2408.06072	link
2024-08-09	Spherical World-Locking for Audio-Visual Localization in Egocentric Videos	Heeseung Yun et.al.	2408.05364	null
2024-08-08	VideoQA in the Era of LLMs: An Empirical Study	Junbin Xiao et.al.	2408.04223	link
2024-08-06	LLaVA-OneVision: Easy Visual Task Transfer	Bo Li et.al.	2408.03326	link
2024-08-06	Dual-path Collaborative Generation Network for Emotional Video Captioning	Cheng Ye et.al.	2408.03006	link
2024-08-05	Towards Coarse-grained Visual Language Navigation Task Planning Enhanced by Event Knowledge Graph	Zhao Kaichen et.al.	2408.02535	null
2024-08-05	FE-Adapter: Adapting Image-based Emotion Classifiers to Videos	Shreyank N Gowda et.al.	2408.02421	null
2024-08-05	COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark	Koki Maeda et.al.	2408.02272	link
2024-08-01	Text-Guided Video Masked Autoencoder	David Fan et.al.	2408.00759	null
2024-08-01	Multimodal Fusion and Coherence Modeling for Video Topic Segmentation	Hai Yu et.al.	2408.00365	null
2024-07-31	Learning Video Context as Interleaved Multimodal Sequences	Kevin Qinghong Lin et.al.	2407.21757	link
2024-07-30	Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos	Dhruv Verma et.al.	2407.20642	link
2024-07-23	Causal Understanding For Video Question Answering	Bhanu Prakash Reddy Guda et.al.	2407.20257	null
2024-07-29	Adversarial Robustness in RGB-Skeleton Action Recognition: Leveraging Attention Modality Reweighter	Chao Liu et.al.	2407.19981	null
2024-07-28	Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation	Tz-Ying Wu et.al.	2407.19520	null
2024-07-26	Wolf: Captioning Everything with a World Summarization Framework	Boyi Li et.al.	2407.18908	null
2024-07-26	Harnessing Temporal Causality for Advanced Temporal Action Detection	Shuming Liu et.al.	2407.17792	link
2024-07-23	EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval	Thomas Hummel et.al.	2407.16658	link
2024-07-22	LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding	Haoning Wu et.al.	2407.15754	link
2024-07-23	End-to-End Video Question Answering with Frame Scoring Mechanisms and Adaptive Sampling	Jianxin Liang et.al.	2407.15047	null
2024-07-21	Audio-visual training for improved grounding in video-text LLMs	Shivprasad Sagare et.al.	2407.15046	null
2024-07-19	EVLM: An Efficient Vision-Language Model for Visual Understanding	Kaibing Chen et.al.	2407.14177	null
2024-07-19	Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding by Provenance	Changye Li et.al.	2407.13982	null
2024-07-18	Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data	Wufei Ma et.al.	2407.13094	null
2024-07-17	Goldfish: Vision-Language Understanding of Arbitrarily Long Videos	Kirolos Ataallah et.al.	2407.12679	null
2024-07-16	Scaling Sign Language Translation	Biao Zhang et.al.	2407.11855	null
2024-07-23	Video-Language Alignment via Spatio-Temporal Graph Transformer	Shi-Xue Zhang et.al.	2407.11677	link
2024-07-04	Purification Of Contaminated Convolutional Neural Networks Via Robust Recovery: An Approach with Theoretical Guarantee in One-Hidden-Layer Case	Hanxiao Lu et.al.	2407.11031	null
2024-07-15	TripletViNet: Mitigating Misinformation Video Spread Across Platforms	Petar Smolovic et.al.	2407.10644	null
2024-07-12	Open Vocabulary Multi-Label Video Classification	Rohit Gupta et.al.	2407.09073	null
2024-07-11	VideoMamba: Spatio-Temporal Selective State Space Model	Jinyoung Park et.al.	2407.08476	link
2024-07-16	Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding	Minghui Wu et.al.	2407.08150	link
2024-07-10	Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems	Chashi Mahiul Islam et.al.	2407.07392	null
2024-07-09	Rethinking Image-to-Video Adaptation: An Object-centric Perspective	Rui Qian et.al.	2407.06871	null
2024-07-09	VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model	Xinhao Li et.al.	2407.06491	link
2024-07-08	Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision	Orr Zohar et.al.	2407.06189	link
2024-07-06	OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding	Tiancheng Zhao et.al.	2407.04923	null
2024-07-20	Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning	Thong Nguyen et.al.	2407.03788	link
2024-07-04	VDMA: Video Question Answering with Dynamically Generated Multi-Agents	Noriyuki Kugo et.al.	2407.03610	null
2024-07-03	InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output	Pan Zhang et.al.	2407.03320	link
2024-07-03	KeyVideoLLM: Towards Large-scale Video Keyframe Selection	Hao Liang et.al.	2407.03104	null
2024-07-03	Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering	Zhaohe Liao et.al.	2407.03008	null
2024-07-03	PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition	Yanbin Hao et.al.	2407.02934	link
2024-07-03	Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs	Jinmin Li et.al.	2407.02411	null
2024-07-02	The Solution for the ICCV 2023 Perception Test Challenge 2023 – Task 6 – Grounded videoQA	Hailiang Zhang et.al.	2407.01907	null
2024-07-10	Referring Atomic Video Action Recognition	Kunyu Peng et.al.	2407.01872	link
2024-06-30	Tarsier: Recipes for Training and Evaluating Large Video Description Models	Jiawei Wang et.al.	2407.00634	link
2024-06-30	Hierarchical Memory for Long Video QA	Yiqin Wang et.al.	2407.00603	null
2024-06-28	InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding	Kirolos Ataallah et.al.	2406.19875	link
2024-06-27	Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads	Ali Khaleghi Rahimian et.al.	2406.19391	link
2024-06-27	OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding	Tao Zhang et.al.	2406.19389	null
2024-06-27	VideoMambaPro: A Leap Forward for Mamba in Video Understanding	Hui Lu et.al.	2406.19006	link
2024-06-25	Zero-Shot Long-Form Video Understanding through Screenplay	Yongliang Wu et.al.	2406.17309	null
2024-06-24	PVUW 2024 Challenge on Complex Video Understanding: Methods and Results	Henghui Ding et.al.	2406.17005	link
2024-06-25	OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer	Lu Zhang et.al.	2406.16620	link
2024-06-24	Directed Domain Fine-Tuning: Tailoring Separate Modalities for Specific Training Tasks	Daniel Wen et.al.	2406.16346	null
2024-06-24	VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models	Yuxuan Wang et.al.	2406.16338	null
2024-06-22	HCQA @ Ego4D EgoSchema Challenge 2024	Haoyu Zhang et.al.	2406.15771	link
2024-06-22	video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models	Guangzhi Sun et.al.	2406.15704	link
2024-06-20	MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding	Xinyu Fang et.al.	2406.14515	link
2024-06-20	Live Video Captioning	Eduardo Blanco-Fernández et.al.	2406.14206	link
2024-06-20	Towards Event-oriented Long Video Understanding	Yifan Du et.al.	2406.14129	link
2024-06-19	Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset	Yuchen Yang et.al.	2406.13809	null
2024-06-21	AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding	Alessandro Suglia et.al.	2406.13807	link
2024-06-19	GUI Action Narrator: Where and When Did That Action Take Place?	Qinchen Wu et.al.	2406.13719	null
2024-06-19	GVT2RPM: An Empirical Study for General Video Transformer Adaptation to Remote Physiological Measurement	Hao Wang et.al.	2406.13136	null
2024-06-18	DrVideo: Document Retrieval Based Long Video Understanding	Ziyu Ma et.al.	2406.12846	null
2024-06-18	VoCo-LLaMA: Towards Vision Compression with Large Language Models	Xubing Ye et.al.	2406.12275	link
2024-06-26	Slot State Space Models	Jindong Jiang et.al.	2406.12272	link
2024-06-18	Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM	Huaxin Zhang et.al.	2406.12235	link
2024-06-17	Task Me Anything	Jieyu Zhang et.al.	2406.11775	link
2024-06-17	Hallucination Mitigation Prompts Long-term Video Understanding	Yiwei Sun et.al.	2406.11333	null
2024-06-17	VideoVista: A Versatile Benchmark for Video Understanding and Reasoning	Yunxin Li et.al.	2406.11303	null
2024-06-17	i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgment	Daechul Ahn et.al.	2406.11280	link
2024-06-16	VELOCITI: Can Video-Language Models Bind Semantic Concepts through Time?	Darshana Saravanan et.al.	2406.10889	null
2024-06-15	EchoGuide: Active Acoustic Guidance for LLM-Based Eating Event Analysis from Egocentric Videos	Vineet Parikh et.al.	2406.10750	null
2024-06-15	Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model	Lu Xu et.al.	2406.10484	link
2024-06-14	Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding	Ridouane Ghermi et.al.	2406.10221	link
2024-06-22	Localizing Events in Videos with Multimodal Queries	Gengyuan Zhang et.al.	2406.10079	null
2024-06-14	GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding	Yiqi Wu et.al.	2406.09781	null
2024-06-14	A Survey of Video Datasets for Grounded Event Understanding	Kate Sanders et.al.	2406.09646	link
2024-06-13	VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	Muhammad Maaz et.al.	2406.09418	link
2024-06-17	Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA	Jongwoo Park et.al.	2406.09396	link
2024-06-13	Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs	Zijia Zhao et.al.	2406.09367	link
2024-06-13	MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos	Xuehai He et.al.	2406.08407	link
2024-06-12	Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams	Haoji Zhang et.al.	2406.08085	link
2024-06-12	LVBench: An Extreme Long Video Understanding Benchmark	Weihan Wang et.al.	2406.08035	link
2024-06-12	Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models	Shimin Chen et.al.	2406.08024	null
2024-06-12	Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model	Elaheh Baharlouei et.al.	2406.07841	link
2024-06-17	VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs	Zesen Cheng et.al.	2406.07476	link
2024-06-11	MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD	Ioanna Ntinou et.al.	2406.07191	null
2024-06-10	NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative	Asmar Nadeem et.al.	2406.06499	null
2024-06-10	Vript: A Video Is Worth Thousands of Words	Dongjie Yang et.al.	2406.06040	link
2024-06-08	1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR’24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation	Qingfeng Liu et.al.	2406.05352	null
2024-06-07	Semantic Segmentation on VSPW Dataset through Masked Video Consistency	Chen Liang et.al.	2406.04979	null
2024-06-06	ShareGPT4Video: Improving Video Understanding and Generation with Better Captions	Lin Chen et.al.	2406.04325	null
2024-06-06	MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding	Junjie Zhou et.al.	2406.04264	link
2024-06-07	3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation	Ruipu Wu et.al.	2406.04002	null
2024-06-04	Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges	Daniel A. P. Oliveira et.al.	2406.02748	null
2024-06-04	Contrastive Language Video Time Pre-training	Hengyue Liu et.al.	2406.02631	null
2024-05-21	Backpropagation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration	Wei Ji et.al.	2406.01601	null
2024-06-03	Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos	Luigi Seminara et.al.	2406.01486	link
2024-06-02	Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering	Xingrui Wang et.al.	2406.00622	link
2024-06-01	2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation	Biao Wu et.al.	2406.00500	null
2024-06-06	HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model	Khoa Vo et.al.	2406.00307	null
2024-05-31	Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization	Richard Luo et.al.	2405.20648	link
2024-05-30	Video Question Answering for People with Visual Impairments Using an Egocentric 360-Degree Camera	Inpyo Song et.al.	2405.19794	null
2024-05-30	Encoding and Controlling Global Semantics for Long-form Video Question Answering	Thong Thanh Nguyen et.al.	2405.19723	link
2024-05-30	EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos	Ryo Fujii et.al.	2405.19644	link
2024-05-29	VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos	Ziyang Wang et.al.	2405.19209	link
2024-05-28	MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning	Somnath Kumar et.al.	2405.18358	null
2024-05-28	Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions	Rui Zhang et.al.	2405.17729	null
2024-05-27	Video Enriched Retrieval Augmented Generation Using Aligned Video Captions	Kevin Dela Rosa et.al.	2405.17706	link
2024-05-25	Streaming Long Video Understanding with Large Language Models	Rui Qian et.al.	2405.16009	null
2024-05-23	MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models	Jiuming Liu et.al.	2405.14338	null
2024-05-22	Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline	Dingyi Yang et.al.	2405.14040	null
2024-05-22	TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment	Wei Li et.al.	2405.13911	link
2024-05-22	Dense Connector for MLLMs	Huanjin Yao et.al.	2405.13800	link
2024-05-22	VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding	Yongxin Guo et.al.	2405.13382	link
2024-05-21	Anticipating Object State Changes	Victoria Manousaki et.al.	2405.12789	null
2024-05-17	Open-Vocabulary Spatio-Temporal Action Detection	Tao Wu et.al.	2405.10832	null
2024-05-14	Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis	Yao Fu et.al.	2405.08944	null
2024-05-14	CinePile: A Long Video Question Answering Dataset and Benchmark	Ruchit Rawal et.al.	2405.08813	null
2024-05-14	No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding	Yingjie Zhai et.al.	2405.08344	link
2024-05-13	FreeVA: Offline MLLM as Training-Free Video Assistant	Wenhao Wu et.al.	2405.07798	link
2024-05-11	Memory-Maze: Scenario Driven Benchmark and Visual Language Navigation Model for Guiding Blind People	Masaki Kuribayashi et.al.	2405.07060	null
2024-05-11	Retrieval Enhanced Zero-Shot Video Captioning	Yunchuan Ma et.al.	2405.07046	null
2024-05-11	Global Motion Understanding in Large-Scale Video Object Segmentation	Volodymyr Fedynyak et.al.	2405.07031	null
2024-05-09	A Survey on Backbones for Deep Video Action Recognition	Zixuan Tang et.al.	2405.05584	null
2024-05-08	Transfer-LMR: Heavy-Tail Driving Behavior Recognition in Diverse Traffic Scenarios	Chirag Parikh et.al.	2405.05354	null
2024-05-07	Vision Mamba: A Comprehensive Survey and Taxonomy	Xiao Liu et.al.	2405.04404	link
2024-05-06	Foundation Models for Video Understanding: A Survey	Neelu Madan et.al.	2405.03770	link
2024-05-08	How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs	Muhammad Uzair Khattak et.al.	2405.03690	null
2024-05-06	WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning	Yuanhan Zhang et.al.	2405.03272	null
2024-04-30	Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action Recognition	Zhendong Liu et.al.	2404.19383	null
2024-05-01	Capabilities of Gemini Models in Medicine	Khaled Saab et.al.	2404.18416	null
2024-04-26	Learning text-to-video retrieval from image captioning	Lucas Ventura et.al.	2404.17498	null
2024-04-26	MovieChat+: Question-aware Sparse Memory for Long Video Question Answering	Enxin Song et.al.	2404.17176	link
2024-04-26	Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting	Yuanyuan Liu et.al.	2404.17100	null
2024-04-29	PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning	Lin Xu et.al.	2404.16994	link
2024-04-25	SFMViT: SlowFast Meet ViT in Chaotic World	Jiaying Lin et.al.	2404.16609	link
2024-04-23	IPAD: Industrial Process Anomaly Detection Dataset	Jinfan Liu et.al.	2404.15033	null
2024-04-23	Pegasus-v1 Technical Report	Raehyuk Jung et.al.	2404.14687	null
2024-04-26	Narrative Action Evaluation with Prompt-Guided Multimodal Interaction	Shiyi Zhang et.al.	2404.14471	link
2024-04-20	Movie101v2: Improved Movie Narration Benchmark	Zihao Yue et.al.	2404.13370	null
2024-04-18	Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models	Reka Team et.al.	2404.12387	null
2024-04-18	From Image to Video, what do we need in multimodal LLMs?	Suyuan Huang et.al.	2404.11865	null
2024-04-17	VG4D: Vision-Language Model Goes 4D Video Recognition	Zhichao Deng et.al.	2404.11605	link
2024-04-15	Leveraging Temporal Contextualization for Video Action Recognition	Minji Kim et.al.	2404.09490	link
2024-04-15	The 8th AI City Challenge	Shuo Wang et.al.	2404.09432	null
2024-04-16	Human-in-the-Loop Segmentation of Multi-species Coral Imagery	Scarlett Raine et.al.	2404.09406	link
2024-04-14	In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition	Wiktor Mucha et.al.	2404.09308	link
2024-04-14	TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning	Quang Minh Dinh et.al.	2404.09275	link
2024-04-14	Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection	Jin Yang et.al.	2404.09263	link
2024-04-12	Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis	Maged Shoman et.al.	2404.08229	link
2024-04-11	Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval	Minkuk Kim et.al.	2404.07610	link
2024-04-10	A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos	Suleyman Ozdel et.al.	2404.07351	null
2024-04-10	Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention	Suleyman Ozdel et.al.	2404.07347	null
2024-04-09	MoReVQA: Exploring Modular Reasoning Models for Video Question Answering	Juhong Min et.al.	2404.06511	null
2024-04-07	X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model	Jan Held et.al.	2404.06332	null
2024-04-24	MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding	Bo He et.al.	2404.05726	link
2024-04-06	SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos	Tao Wu et.al.	2404.04565	link
2024-04-19	Koala: Key frame-conditioned long video-LLM	Reuben Tan et.al.	2404.04346	null
2024-04-05	Neural-Symbolic VideoQA: Learning Compositional Spatio-Temporal Reasoning for Real-world Video Question Answering	Lili Liang et.al.	2404.04007	null
2024-04-04	OW-VISCap: Open-World Video Instance Segmentation and Captioning	Anwesa Choudhuri et.al.	2404.03657	null
2024-04-04	MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens	Kirolos Ataallah et.al.	2404.03413	link
2024-04-10	LongVLM: Efficient Long Video Understanding via Large Language Models	Yuetian Weng et.al.	2404.03384	link
2024-04-03	DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement	Hao Wu et.al.	2404.02755	null
2024-04-05	SnAG: Scalable and Accurate Video Grounding	Fangzhou Mu et.al.	2404.02257	null
2024-04-01	TraveLER: A Multi-LMM Agent Framework for Video Question-Answering	Chuyi Shang et.al.	2404.01476	link
2024-04-01	CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes	Ting En Lam et.al.	2404.01299	link
2024-04-01	Streaming Dense Video Captioning	Xingyi Zhou et.al.	2404.01297	link
2024-04-02	Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward	Ruohong Zhang et.al.	2404.01258	link
2024-04-01	VideoDistill: Language-aware Vision Distillation for Video Question Answering	Bo Zou et.al.	2404.00973	null
2024-03-31	$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding	Ye Liu et.al.	2404.00801	link
2024-03-30	Instrument-tissue Interaction Detection Framework for Surgical Video Understanding	Wenjun Lin et.al.	2404.00322	null
2024-03-30	ST-LLM: Large Language Models Are Effective Temporal Learners	Ruyang Liu et.al.	2404.00308	link
2024-03-29	A Unified Framework for Human-centric Point Cloud Video Understanding	Yiteng Xu et.al.	2403.20031	null
2024-03-28	Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality	Sishuo Chen et.al.	2403.19221	link
2024-03-27	An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM	Wonkyun Kim et.al.	2403.18406	link
2024-03-26	OmniVid: A Generative Framework for Universal Video Understanding	Junke Wang et.al.	2403.17935	link
2024-03-25	Understanding Long Videos in One Multimodal Language Model Pass	Kanchana Ranasinghe et.al.	2403.16998	link
2024-03-24	AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue	Yunlong Tang et.al.	2403.16276	null
2024-03-22	InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding	Yi Wang et.al.	2403.15377	link
2024-03-25	VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding	Ahmad Mahmood et.al.	2403.14743	link
2024-03-21	Language Repository for Long Video Understanding	Kumara Kahatapitiya et.al.	2403.14622	link
2024-03-21	Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels	Tianming Liang et.al.	2403.14430	null
2024-03-18	Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation	Zixin Zhu et.al.	2403.12042	link
2024-03-18	Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation	Wangbo Zhao et.al.	2403.11808	link
2024-03-27	LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model	Yuxin Cao et.al.	2403.11656	null
2024-03-18	VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding	Yue Fan et.al.	2403.11481	null
2024-03-15	VideoAgent: Long-form Video Understanding with Large Language Model as Agent	Xiaohan Wang et.al.	2403.10517	null
2024-03-14	Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding	Guo Chen et.al.	2403.09626	link
2024-03-25	Don’t Judge by the Look: Towards Motion Coherent Video Representation	Yitian Zhang et.al.	2403.09506	link
2024-03-13	DAM: Dynamic Adapter Merging for Continual Video QA Learning	Feng Cheng et.al.	2403.08755	link
2024-03-11	Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions	Lan Wang et.al.	2403.07198	null
2024-03-12	VideoMamba: State Space Model for Efficient Video Understanding	Kunchang Li et.al.	2403.06977	link
2024-03-25	An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models	Liang Chen et.al.	2403.06764	link
2024-03-08	Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation	Joseph Cho et.al.	2403.05131	null
2024-03-11	Beyond MOT: Semantic Multi-Object Tracking	Yunhao Li et.al.	2403.05021	link
2024-03-08	Pix2Gif: Motion-Guided Diffusion for GIF Generation	Hitesh Kandala et.al.	2403.04634	link
2024-03-05	A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives	Simone Alberto Peirone et.al.	2403.03037	null
2024-03-03	MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies	Zhende Song et.al.	2403.01422	null
2024-03-01	Abductive Ego-View Accident Video Understanding for Safe Driving Perception	Jianwu Fang et.al.	2403.00436	null
2024-02-29	Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers	Tsai-Shien Chen et.al.	2402.19479	null
2024-03-11	TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning	Kate Sanders et.al.	2402.19467	null
2024-02-29	Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition	Boyu Chen et.al.	2402.18951	null
2024-02-27	MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning	Huiyu Xiong et.al.	2402.17680	null
2024-02-25	LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding	Yuxuan Wang et.al.	2402.16050	link
2024-02-22	Think before You Leap: Content-Aware Low-Cost Edge-Assisted Video Semantic Segmentation	Mingxuan Yan et.al.	2402.14326	null
2024-02-21	LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs	Yunxin Li et.al.	2402.13546	null
2024-02-28	Video ReCap: Recursive Captioning of Hour-Long Videos	Md Mohaiminul Islam et.al.	2402.13250	link
2024-02-20	VideoPrism: A Foundational Visual Encoder for Video Understanding	Long Zhao et.al.	2402.13217	null
2024-02-20	Slot-VLM: SlowFast Slots for Video-Language Modeling	Jiaqi Xu et.al.	2402.13088	null
2024-02-19	System Identification of Neural Systems: Going Beyond Images to Modelling Dynamics	Mai Gamal et.al.	2402.12519	null
2024-02-19	LVCHAT: Facilitating Long Video Comprehension	Yu Wang et.al.	2402.12079	link
2024-02-28	Are you Struggling? Dataset and Baselines for Struggle Determination in Assembly Videos	Shijia Feng et.al.	2402.11057	link
2024-02-16	Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering	David Romero et.al.	2402.10698	link
2024-02-13	World Model on Million-Length Video And Language With RingAttention	Hao Liu et.al.	2402.08268	link
2024-02-12	BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind	Yuanyuan Mao et.al.	2402.07402	null
2024-02-09	Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning	Amir Ziai et.al.	2402.06560	link
2024-02-09	Dynamic swarms regulate the morphology and distribution of soft membrane domains	Aakanksha Gubbala et.al.	2402.06518	null
2024-02-08	Memory Consolidation Enables Long-Context Video Understanding	Ivana Balažević et.al.	2402.05861	null
2024-02-06	Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization	Yang Jin et.al.	2402.03161	null
2024-02-04	Spatio-temporal Prompting Network for Robust Video Feature Extraction	Guanxiong Sun et.al.	2402.02574	link
2024-02-02	Simulator-Free Visual Domain Randomization via Video Games	Chintan Trivedi et.al.	2402.01335	link
2024-01-30	YTCommentQA: Video Question Answerability in Instructional Videos	Saelyne Yang et.al.	2401.17343	link
2024-01-30	Multi-granularity Correspondence Learning from Long-term Noisy Videos	Yijie Lin et.al.	2401.16702	null
2024-01-29	Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model	Till Grutschus et.al.	2401.16280	null
2024-01-25	Knowledge Graph Supported Benchmark and Video Captioning for Basketball	Zeyu Xi et.al.	2401.13888	null
2024-01-22	ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition	Jiaming Zhou et.al.	2401.11654	null
2024-01-21	Exploring Missing Modality in Multimodal Egocentric Datasets	Merey Ramazanova et.al.	2401.11470	null
2024-01-19	Learning to Visually Connect Actions and their Effects	Eric Peh et.al.	2401.10805	null
2024-01-28	Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering	Haibo Wang et.al.	2401.10711	link
2024-01-17	CrossVideo: Self-supervised Cross-modal Contrastive Learning for Point Cloud Video Understanding	Yunze Liu et.al.	2401.09057	null
2024-01-16	Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data	Yuhui Zhang et.al.	2401.08567	link
2024-01-16	Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization	Chongzhi Zhang et.al.	2401.08232	null
2024-01-11	Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition	Yukun Zuo et.al.	2401.06287	link
2024-01-10	HaltingVT: Adaptive Token Halting Transformer for Efficient Video Recognition	Qian Wu et.al.	2401.04975	link
2024-01-10	SnapCap: Efficient Snapshot Compressive Video Captioning	Jianqiao Sun et.al.	2401.04903	null
2024-01-08	Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification	Wentao Zhu et.al.	2401.04154	null
2024-01-08	Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning	Chen Zhao et.al.	2401.04105	link
2024-01-08	STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering	Yueqian Wang et.al.	2401.03901	link

Publish Date	Title	Authors	PDF	Code
2025-07-23	HLFormer: Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning	Li Jun et.al.	2507.17402	null
2025-07-23	IONext: Unlocking the Next Era of Inertial Odometry	Shanshan Zhang et.al.	2507.17089	null
2025-07-22	Controllable Hybrid Captioner for Improved Long-form Video Understanding	Kuleen Sasse et.al.	2507.17047	null
2025-07-22	Explicit Context Reasoning with Supervision for Visual Tracking	Fansheng Zeng et.al.	2507.16191	null
2025-07-22	PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation	Yaofang Liu et.al.	2507.16116	null
2025-07-21	Nonlinear Framework for Speech Bandwidth Extension	Tarikul Islam Tamiti et.al.	2507.15970	null
2025-07-21	Dissociating model architectures from inference computations	Noor Sajid et.al.	2507.15776	null
2025-07-20	Time-Aware Attention for Enhanced Electronic Health Records Modeling	Junhan Yu et.al.	2507.14847	null
2025-07-18	DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits	Garapati Keerthana et.al.	2507.14079	null
2025-07-18	DUALRec: A Hybrid Sequential and Language Model Framework for Context-Aware Movie Recommendation	Yitong Li et.al.	2507.13957	null
2025-07-18	Team of One: Cracking Complex Video QA with Model Synergy	Jun Xie et.al.	2507.13820	null
2025-07-18	Bi-GRU Based Deception Detection using EEG Signals	Danilo Avola et.al.	2507.13718	null
2025-07-18	ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle	Mihran Miroyan et.al.	2507.12674	null
2025-07-16	A Bayesian Spatio-Temporal Model of Temperature- and Humidity-Related Mortality Using High-Resolution Climate Data	Corinna Perchtold et.al.	2507.12643	null
2025-07-15	SurgeryLSTM: A Time-Aware Neural Model for Accurate and Explainable Length of Stay Prediction After Spine Surgery	Ha Na Cho et.al.	2507.11570	null
2025-07-14	Reprogramming Vision Foundation Models for Spatio-Temporal Forecasting	Changlu Chen et.al.	2507.11558	null
2025-07-13	Continental scale habitat modelling with artificial intelligence and multimodal earth observation	Sara Si-Moussi et.al.	2507.09732	null
2025-07-11	Predictive Causal Inference via Spatio-Temporal Modeling and Penalized Empirical Likelihood	Byunghee Lee et.al.	2507.08896	null
2025-07-09	M$^2$-MFP: A Multi-Scale and Multi-Level Memory Failure Prediction Framework for Reliable Cloud Infrastructure	Hongyi Xie et.al.	2507.07144	null
2025-07-08	What’s Making That Sound Right Now? Video-centric Audio-Visual Localization	Hahyeon Choi et.al.	2507.04667	null
2025-07-07	FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation	Maolin Wang et.al.	2507.04651	null
2025-07-05	Transformer with Koopman-Enhanced Graph Convolutional Network for Spatiotemporal Dynamics Forecasting	Zekai Wang et.al.	2507.03855	null
2025-06-29	Fractional Policy Gradients: Reinforcement Learning with Long-Term Memory	Urvi Pawar et.al.	2507.00073	null
2025-06-29	MoMa: Modulating Mamba for Adapting Image Foundation Models to Video Recognition	Yuhuan Yang et.al.	2506.23283	null
2025-06-27	Linking climate and dengue in the Philippines using a two-stage Bayesian spatio-temporal model	Stephen Jun Villejo et.al.	2506.22334	null
2025-06-27	An Efficient Class of Bayesian Generalized Quadratic Nonlinear Dynamic Models with Application to Birth Rate Monitoring	Madelyn Clinch et.al.	2506.22188	null
2025-06-26	AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification	Galvin Brice S. Lim et.al.	2506.21338	null
2025-06-25	A Modular Multitask Reasoning Framework Integrating Spatio-temporal Models and LLMs	Kethmi Hirushini Hettige et.al.	2506.20073	null
2025-06-24	A Batch-Insensitive Dynamic GNN Approach to Address Temporal Discontinuity in Graph Streams	Yang Zhou et.al.	2506.19282	null
2025-06-23	VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning	Xuanyu Zhang et.al.	2506.18564	null
2025-06-22	Memba: Membrane-driven Parameter-Efficient Fine-Tuning for Mamba	Donghyun Lee et.al.	2506.18184	null
2025-06-19	AutoHFormer: Efficient Hierarchical Autoregressive Transformer for Time Series Prediction	Qianru Zhang et.al.	2506.16001	link
2025-06-17	EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization	Xiaoqi Wang et.al.	2506.14356	link
2025-06-16	Online-Optimized Gated Radial Basis Function Neural Network-Based Adaptive Control	Mingcong Li et.al.	2506.13168	null
2025-06-16	C-TLSAN: Content-Enhanced Time-Aware Long- and Short-Term Attention Network for Personalized Recommendation	Siqi Liang et.al.	2506.13021	link
2025-06-15	Leveraging MIMIC Datasets for Better Digital Health: A Review on Open Problems, Progress Highlights, and Future Promises	Afifa Khaled et.al.	2506.12808	null
2025-06-15	Large Scalable Cross-Domain Graph Neural Networks for Personalized Notification at LinkedIn	Shihai He et.al.	2506.12700	null
2025-06-14	DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification	Darryl Ho et.al.	2506.12585	null
2025-06-13	AgriPotential: A Novel Multi-Spectral and Multi-Temporal Remote Sensing Dataset for Agricultural Potentials	Mohammad El Sakka et.al.	2506.11740	null
2025-06-10	Multivariate Long-term Time Series Forecasting with Fourier Neural Filter	Chenheng Xu et.al.	2506.09174	null
2025-06-10	LiftVSR: Lifting Image Diffusion to Video Super-Resolution via Hybrid Temporal Modeling with Only 4$\times$RTX 4090s	Xijun Wang et.al.	2506.08529	null
2025-06-10	MLVTG: Mamba-Based Feature Alignment and LLM-Driven Purification for Multi-Modal Video Temporal Grounding	Zhiyi Zhu et.al.	2506.08512	null
2025-06-10	Large Deviations for Markovian Graphon Processes and Associated Dynamical Systems on Networks	Shankar Bhamidi et.al.	2506.08333	null
2025-06-09	A Temporal FRBR/FRBRoo-Based Model for Component-Level Versioning of Legal Norms	Hudson de Martim et.al.	2506.07853	null
2025-06-08	FANVID: A Benchmark for Face and License Plate Recognition in Low-Resolution Videos	Kavitha Viswanathan et.al.	2506.07304	null
2025-06-08	Technical Report: A Practical Guide to Kaldi ASR Optimization	Mengze Hong et.al.	2506.07149	null
2025-06-07	Polar Hierarchical Mamba: Towards Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences	Mellon M. Zhang et.al.	2506.06944	null
2025-05-30	State Estimation and Control of Dynamic Systems from High-Dimensional Image Data	Ashik E Rasul et.al.	2506.05375	null
2025-06-03	Large-scale Self-supervised Video Foundation Model for Intelligent Surgery	Shu Yang et.al.	2506.02692	null
2025-06-01	MOOSE: Pay Attention to Temporal Dynamics for Video Understanding via Optical Flows	Hong Nguyen et.al.	2506.01119	null
2025-06-01	3D Skeleton-Based Action Recognition: A Review	Mengyuan Liu et.al.	2506.00915	null
2025-05-30	Entanglement for Pattern Learning in Temporal Data with Logarithmic Complexity: Benchmarking on IBM Quantum Hardware	Mostafizur Rahaman Laskar et.al.	2506.00097	null
2025-05-30	TalkingHeadBench: A Multi-Modal Benchmark & Analysis of Talking-Head DeepFake Detection	Xinqi Xiong et.al.	2505.24866	null
2025-06-03	Binary Cumulative Encoding meets Time Series Forecasting	Andrei Chernov et.al.	2505.24595	null
2025-05-30	Two-stage MCMC for Fast Bayesian Inference of Large Spatio-temporal Ordinal Data, with Application to US Drought	Staci Hepler et.al.	2505.24594	null
2025-05-30	Enhancing the Accuracy of Spatio-Temporal Models for Wind Speed Prediction by Incorporating Bias-Corrected Crowdsourced Data	Eamonn Organ et.al.	2505.24506	link
2025-05-30	Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model	Yuting Zhang et.al.	2505.24476	link
2025-05-30	Bayesian Inference for Spatially-Temporally Misaligned Data Using Predictive Stacking	Soumyakanti Pan et.al.	2505.24397	null
2025-05-30	DisTime: Distribution-based Time Representation for Video Large Language Models	Yingsen Zeng et.al.	2505.24329	link
2025-05-29	CLDTracker: A Comprehensive Language Description for Visual Tracking	Mohamad Alansari et.al.	2505.23704	link
2025-05-29	RiverMamba: A State Space Model for Global River Discharge and Flood Forecasting	Mohamad Hakam Shams Eddin et.al.	2505.22535	null
2025-05-27	Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion	Yang Yang et.al.	2505.21593	null
2025-05-26	UltraVSR: Achieving Ultra-Realistic Video Super-Resolution with Efficient One-Step Diffusion Space	Yong Liu et.al.	2505.19958	null
2025-05-26	CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features	X. Feng et.al.	2505.19434	link
2025-05-25	SETransformer: A Hybrid Attention-Based Architecture for Robust Human Activity Recognition	Yunbo Liu et.al.	2505.19369	null
2025-05-25	Advancing Video Self-Supervised Learning via Image Foundation Models	Jingwei Wu et.al.	2505.19218	link
2025-05-23	Efficient Algorithms for Electing Successive Committees	Pallavi Jain et.al.	2505.18287	null
2025-05-22	Temporal Differential Fields for 4D Motion Modeling via Image-to-Video Synthesis	Xin You et.al.	2505.17333	null
2025-05-22	Cog-TiPRO: Iterative Prompt Refinement with LLMs to Detect Cognitive Decline via Longitudinal Voice Assistant Commands	Kristin Qi et.al.	2505.17137	null
2025-05-22	Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers	Yuzhu Wang et.al.	2505.16607	null
2025-05-21	EEG-Based Inter-Patient Epileptic Seizure Detection Combining Domain Adversarial Training with CNN-BiLSTM Network	Rina Tazaki et.al.	2505.15203	null
2025-06-03	AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection	Zhipei Xu et.al.	2505.15173	null
2025-05-20	StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning	Huaijie Wang et.al.	2505.13997	null
2025-05-19	MAGI-1: Autoregressive Video Generation at Scale	Sand.ai et.al.	2505.13211	link
2025-05-16	ASRC-SNN: Adaptive Skip Recurrent Connection Spiking Neural Network	Shang Xu et.al.	2505.11455	link
2025-05-16	STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking	Sicheng Shen et.al.	2505.11151	link
2025-05-15	ChronoSteer: Bridging Large Language Model and Time Series Foundation Model via Synthetic Data	Chengsen Wang et.al.	2505.10083	null
2025-05-14	Mission Balance: Generating Under-represented Class Samples using Video Diffusion Models	Danush Kumar Venkatesh et.al.	2505.09858	link
2025-05-19	Learning Long-Context Diffusion Policies via Past-Token Prediction	Marcel Torne et.al.	2505.09561	null
2025-05-14	WSCIF: A Weakly-Supervised Color Intelligence Framework for Tactical Anomaly Detection in Surveillance Keyframes	Wei Meng et.al.	2505.09129	null
2025-05-14	Modeling Interdependent Cybersecurity Threats Using Bayesian Networks: A Case Study on In-Vehicle Infotainment Systems	Sangita Sridar et.al.	2505.09048	null
2025-05-13	Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People	Haoshuai Zhou et.al.	2505.08215	null
2025-05-12	Joint Graph Convolution and Sequential Modeling for Scalable Network Traffic Estimation	Nan Jiang et.al.	2505.07674	null
2025-05-06	PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing	Yiping Xie et.al.	2505.03621	null
2025-04-30	AnimalMotionCLIP: Embedding motion in CLIP for Animal Behavior Analysis	Enmin Zhong et.al.	2505.00569	null
2025-04-25	STP4D: Spatio-Temporal-Prompt Consistent Modeling for Text-to-4D Gaussian Splatting	Yunze Deng et.al.	2504.18318	null
2025-04-23	Subject-driven Video Generation via Disentangled Identity and Motion	Daneul Kim et.al.	2504.17816	null
2025-04-25	Improving Significant Wave Height Prediction Using Chronos Models	Yilin Zhai et.al.	2504.16834	null
2025-04-21	Topological model selection: a case-study in tumour-induced angiogenesis	Robert A McDonald et.al.	2504.15442	null
2025-04-18	RoPETR: Improving Temporal Camera-Only 3D Detection by Integrating Enhanced Rotary Position Embedding	Hang Ji et.al.	2504.12643	null
2025-04-16	Multimodal Spatio-temporal Graph Learning for Alignment-free RGBT Video Object Detection	Qishun Wang et.al.	2504.11779	null
2025-04-14	Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis	Yifan Yang et.al.	2504.10352	null
2025-04-14	Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition	Hongyu Qu et.al.	2504.10079	null
2025-04-14	Dual-Path Enhancements in Event-Based Eye Tracking: Augmented Robustness and Adaptive Temporal Modeling	Hoang M. Truong et.al.	2504.09960	null
2025-04-12	VideoAds for Fast-Paced Video Understanding: Where Opensource Foundation Models Beat GPT-4o & Gemini-1.5 Pro	Zheyuan Zhang et.al.	2504.09282	null
2025-04-08	Video Flow as Time Series: Discovering Temporal Consistency and Variability for VideoQA	Zijie Song et.al.	2504.05783	null
2025-04-10	LATTE: Lightweight Attention-based Traffic Accident Anticipation Engine	Jiaxun Zhang et.al.	2504.04103	null
2025-04-05	Multi-resolution Score-Based Variational Graphical Diffusion for Causal Disaster System Modeling and Inference	Xuechun Li et.al.	2504.04015	link
2025-04-04	Crash Time Matters: HybridMamba for Fine-Grained Temporal Localization in Traffic Surveillance Footage	Ibne Farabi Shihab et.al.	2504.03235	null
2025-04-03	EvMic: Event-based Non-contact sound recovery from effective spatial-temporal modeling	Hao Yin et.al.	2504.02402	null
2025-04-07	Is Temporal Prompting All We Need For Limited Labeled Action Recognition?	Shreyank N Gowda et.al.	2504.01890	null
2025-04-02	Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation	Mingrui Ye et.al.	2504.01764	link
2025-04-01	AttentiveGRU: Recurrent Spatio-Temporal Modeling for Advanced Radar-Based BEV Object Detection	Loveneet Saini et.al.	2504.00559	null
2025-03-31	Near-surface coherent structures in an intense tropical cyclone: conditional eddies and vertical momentum fluxes	Chibueze N. Oguejiofor et.al.	2504.00293	null
2025-03-31	Order Matters: On Parameter-Efficient Image-to-Video Probing for Recognizing Nearly Symmetric Actions	Thinesh Thiyakesan Ponbagavathi et.al.	2503.24298	null
2025-04-15	Frequency-Aware Attention-LSTM for PM$_{2.5}$ Time Series Forecasting	Jiahui Lu et.al.	2503.24043	null
2025-03-27	Advancing Spatiotemporal Prediction using Artificial Intelligence: Extending the Framework of Geographically and Temporally Weighted Neural Network (GTWNN) for Differing Geographical and Temporal Contexts	Nicholas Robert Fisk et.al.	2503.22751	null
2025-03-28	Long-Term Electricity Demand Prediction Using Non-negative Tensor Factorization and Genetic Algorithm-Driven Temporal Modeling	Toma Masaki et.al.	2503.22132	null
2025-03-27	Comparative Analysis of Image, Video, and Audio Classifiers for Automated News Video Segmentation	Jonathan Attard et.al.	2503.21848	null
2025-03-27	Video-R1: Reinforcing Video Reasoning in MLLMs	Kaituo Feng et.al.	2503.21776	link
2025-03-25	VTD-CLIP: Video-to-Text Discretization via Prompting CLIP	Wencheng Zhu et.al.	2503.18407	null
2025-03-23	TransAnimate: Taming Layer Diffusion to Generate RGBA Video	Xuewei Chen et.al.	2503.17934	null
2025-03-22	A Temporal Modeling Framework for Video Pre-Training on Video Instance Segmentation	Qing Zhong et.al.	2503.17672	null
2025-03-21	Enhancing Steering Estimation with Semantic-Aware GNNs	Fouad Makiyeh et.al.	2503.17153	null
2025-03-21	PE-CLIP: A Parameter-Efficient Fine-Tuning of Vision Language Models for Dynamic Facial Expression Recognition	Ibtissam Saadi et.al.	2503.16945	null
2025-03-20	Spatial-temporal models for forest inventory data	Paul B. May et.al.	2503.16691	link
2025-03-20	Copula-based spatio-temporal modeling of air pollutant data incorporating covariate dependencies	Soyun Jeon et.al.	2503.15935	null
2025-03-27	Dynamic Bi-Elman Attention Networks: A Dual-Directional Context-Aware Test-Time Learning for Text Classification	ZhengLin Lai et.al.	2503.15469	link
2025-03-19	Text-Derived Relational Graph-Enhanced Network for Skeleton-Based Action Segmentation	Haoyu Ji et.al.	2503.15126	null
2025-04-02	Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset	Yiqun Mei et.al.	2503.14485	null
2025-03-14	Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding	David Gastager et.al.	2503.11392	null
2025-03-25	Lightweight Models for Emotional Analysis in Video	Quoc-Tien Nguyen et.al.	2503.10530	link
2025-03-13	Interactive Multimodal Fusion with Temporal Modeling	Jun Yu et.al.	2503.10523	null
2025-03-13	Mamba-VA: A Mamba-based Approach for Continuous Emotion Recognition in Valence-Arousal Space	Yuheng Liang et.al.	2503.10104	link
2025-03-09	Future-Aware Interaction Network For Motion Forecasting	Shijie Li et.al.	2503.06565	null
2025-03-06	STX-Search: Explanation Search for Continuous Dynamic Spatio-Temporal Models	Saif Anwar et.al.	2503.04509	null
2025-03-06	Token-Efficient Long Video Understanding for Multimodal LLMs	Jindong Jiang et.al.	2503.04130	null
2025-03-05	IC-Mapper: Instance-Centric Spatio-Temporal Modeling for Online Vectorized Map Construction	Jiangtong Zhu et.al.	2503.03882	null
2025-03-05	TrafficKAN-GCN: Graph Convolutional-based Kolmogorov-Arnold Network for Traffic Flow Optimization	Jiayi Zhang et.al.	2503.03276	link
2025-03-04	Deep Learning-Enhanced Visual Monitoring in Hazardous Underwater Environments with a Swarm of Micro-Robots	Shuang Chen et.al.	2503.02752	link
2025-03-03	Bayesian spatio-temporal modelling for infectious disease outbreak detection	Matthew Adeoye et.al.	2503.01456	link
2025-03-03	STGAN: Spatial-temporal Graph Autoregression Network for Pavement Distress Deterioration Prediction	Shilin Tong et.al.	2503.01152	null
2025-03-02	MoSFormer: Augmenting Temporal Context with Memory of Surgery for Surgical Phase Recognition	Hao Ding et.al.	2503.00695	null
2025-02-28	JiTTER: Jigsaw Temporal Transformer for Event Reconstruction for Self-Supervised Sound Event Detection	Hyeonuk Nam et.al.	2502.20857	link
2025-03-18	Towards Practical Real-Time Neural Video Compression	Zhaoyang Jia et.al.	2502.20762	link
2025-02-26	Arctic teleconnection on climate and ozone pollution in the polar jet stream path of eastern US	K Shuvo Bakar et.al.	2502.19234	null
2025-02-24	MambaFlow: A Novel and Flow-guided State Space Model for Scene Flow Estimation	Jiehao Luo et.al.	2502.16907	link
2025-02-21	Beyond Fixed Variables: Expanding-variate Time Series Forecasting via Flat Scheme and Spatio-temporal Focal Learning	Minbo Ma et.al.	2502.15296	null
2025-02-19	Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning	Caihua Liu et.al.	2502.13754	null
2025-02-17	Unhackable Temporal Rewarding for Scalable Video MLLMs	En Yu et.al.	2502.12081	null
2025-02-17	Deep Spatio-Temporal Neural Network for Air Quality Reanalysis	Ammar Kheder et.al.	2502.11941	link
2025-02-16	ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models	Shixuan Li et.al.	2502.11059	null
2025-02-15	Learning semantical dynamics and spatiotemporal collaboration for human pose estimation in video	Runyang Feng et.al.	2502.10616	null
2025-02-13	Non-Markovian Discrete Diffusion with Causal Language Models	Yangtian Zhang et.al.	2502.09767	null
2025-02-09	Temporal Model On Quantum Logic	Francesco D’Agostino et.al.	2502.07817	null
2025-02-09	Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding	Xingjian Diao et.al.	2502.06020	link
2025-02-08	4D VQ-GAN: Synthesising Medical Scans at Any Time Point for Personalised Disease Progression Modelling of Idiopathic Pulmonary Fibrosis	An Zhao et.al.	2502.05713	null
2025-02-06	MedGNN: Towards Multi-resolution Spatiotemporal Graph Learning for Medical Time Series Classification	Wei Fan et.al.	2502.04515	link
2025-02-06	MD-BERT: Action Recognition in Dark Videos via Dynamic Multi-Stream Fusion and Temporal Modeling	Sharana Dharshikgan Suresh Dass et.al.	2502.03724	link
2025-02-10	Kronecker Mask and Interpretive Prompts are Language-Action Video Learners	Jingyi Yang et.al.	2502.03549	link
2025-01-27	Foundation for unbiased cross-validation of spatio-temporal models for species distribution modeling	Diana Koldasbayeva et.al.	2502.03480	link
2025-02-04	Robust and Conjugate Spatio-Temporal Gaussian Processes	William Laplante et.al.	2502.02450	link
2025-01-31	GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling	Pinxin Liu et.al.	2501.18898	link
2025-01-30	Track-On: Transformer-based Online Point Tracking with Memory	Görkay Aydemir et.al.	2501.18487	link
2025-01-30	Free-T2M: Frequency Enhanced Text-to-Motion Diffusion Model With Consistency Loss	Wenshuo Chen et.al.	2501.18232	link
2025-01-28	Extending Information Bottleneck Attribution to Video Sequences	Veronika Solopova et.al.	2501.16889	link
2025-01-28	Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding	Yun Li et.al.	2501.16786	null
2025-01-24	Causal-Inspired Multitask Learning for Video-Based Human Pose Estimation	Haipeng Chen et.al.	2501.14356	null
2025-01-23	Training-Free Zero-Shot Temporal Action Detection with Vision-Language Models	Chaolei Han et.al.	2501.13795	link
2025-01-21	Efficient Dynamic Image Reconstruction with motion estimation	Toluwani Okunola et.al.	2501.12497	null
2025-01-22	Budget-constrained Collaborative Renewable Energy Forecasting Market	Carla Goncalves et.al.	2501.12367	link
2025-01-21	DSTSA-GCN: Advancing Skeleton-Based Gesture Recognition with Semantic-Aware Spatio-Temporal Topology Modeling	Hu Cui et.al.	2501.12086	link
2025-01-20	Leveraging graph neural networks and mobility data for COVID-19 forecasting	Fernando H. O. Duarte et.al.	2501.11711	link
2025-01-17	Gamma-ray burst prompt emission spectra at high energies	Samanta Macera et.al.	2501.10507	null
2025-01-15	MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation	Olga Zatsarynna et.al.	2501.08837	null
2025-01-15	FlexiClip: Locality-Preserving Free-Form Character Animation	Anant Khandelwal et.al.	2501.08676	null
2025-01-13	Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling	Jiebin Yan et.al.	2501.07087	null
2025-01-12	Kolmogorov-Arnold Recurrent Network for Short Term Load Forecasting Across Diverse Consumers	Muhammad Umair Danish et.al.	2501.06965	null
2025-01-10	MS-Temba: Multi-Scale Temporal Mamba for Efficient Temporal Action Detection	Arkaprava Sinha et.al.	2501.06138	link
2025-01-15	Multi-Context Temporal Consistent Modeling for Referring Video Object Segmentation	Sun-Hyuk Choi et.al.	2501.04939	link
2025-01-07	Three-dimensional attention Transformer for state evaluation in real-time strategy games	Yanqing Ye et.al.	2501.03832	null
2025-01-06	STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution	Rui Xie et.al.	2501.02976	null
2025-01-03	Innate behavioural mechanisms and defensive traits in ecological models of predator-prey types	Sangeeta Saha et.al.	2501.01687	null
2024-12-24	Multi-View Fusion Neural Network for Traffic Demand Prediction	Dongran Zhang et.al.	2412.19839	null
2024-12-26	Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries	Roberto Amoroso et.al.	2412.19304	null
2024-12-24	Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models	Jinhui Yi et.al.	2412.18609	link
2024-12-20	Mask-RadarNet: Enhancing Transformer With Spatial-Temporal Semantic Context for Radar Object Detection in Autonomous Driving	Yuzhi Wu et.al.	2412.15595	null
2024-12-19	DroughtSet: Understanding Drought Through Spatial-Temporal Learning	Xuwei Tan et.al.	2412.15075	link
2024-12-19	Efficient Self-Supervised Video Hashing with Selective State Spaces	Jinpeng Wang et.al.	2412.14518	link
2024-12-19	Diffusion and Discrete Temporal Models of the Growth of Free-Ranging Cats in Urban Areas	Rodrigo Perusquía Cortés et.al.	2412.14445	null
2024-12-18	TAUDiff: Improving statistical downscaling for extreme weather events using generative diffusion models	Rahul Sundar et.al.	2412.13627	null
2024-12-16	STDHL: Spatio-Temporal Dynamic Hypergraph Learning for Wind Power Forecasting	Xiaochong Dong et.al.	2412.11393	null
2024-12-11	Hierarchical Context Alignment with Disentangled Geometric and Temporal Modeling for Semantic Occupancy Prediction	Bohan Li et.al.	2412.08243	null
2024-12-10	Modeling High-Resolution Spatio-Temporal Wind with Deep Echo State Networks and Stochastic Partial Differential Equations	Kesen Wang et.al.	2412.07265	null
2025-01-07	LMS-AutoTSF: Learnable Multi-Scale Decomposition and Integrated Autocorrelation for Time Series Forecasting	Ibrahim Delibasoglu et.al.	2412.06866	link
2024-12-09	How to Merge Your Multimodal Models Over Time?	Sebastian Dziadzio et.al.	2412.06712	link
2024-12-05	MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation	Longtao Zheng et.al.	2412.04448	null
2024-12-03	Towards the efficacy of federated prediction for epidemics on networks	Chengpeng Fu et.al.	2412.02161	link
2024-12-02	Navigating Challenges in Spatio-temporal Modelling of Antarctic Krill Abundance: Addressing Zero-inflated Data and Misaligned Covariates	André Victor Ribeiro Amaral et.al.	2412.01399	link
2024-11-30	PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation	Qiyao Xue et.al.	2412.00596	link
2024-11-27	Predicting Extubation Failure in Intensive Care: The Development of a Novel, End-to-End Actionable and Interpretable Prediction System	Akram Yoosoofsah et.al.	2412.00105	null
2024-11-27	TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video	Jinyuan Qu et.al.	2411.18671	null
2024-11-26	Temporal Models for Demographic and Global Health Outcomes in Multiple Populations: Introducing the Normal-with-Optional-Shrinkage Data Model Class	Leontine Alkema et.al.	2411.18646	null
2024-11-26	SAMWISE: Infusing wisdom in SAM2 for Text-Driven Video Segmentation	Claudia Cuttano et.al.	2411.17646	link
2024-11-25	GAST: Sequential Gaussian Avatars with Hierarchical Spatio-temporal Context	Wangze Xu et.al.	2411.16768	null
2024-11-20	MambaDETR: Query-based Temporal Modeling using State Space Model for Multi-View 3D Object Detection	Tong Ning et.al.	2411.13628	null
2024-11-19	Hierarchical Spatio-Temporal Uncertainty Quantification for Distributed Energy Adoption	Wenbin Zhou et.al.	2411.12193	null
2024-11-15	TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding	Quang P. M. Pham et.al.	2411.10509	link
2024-11-15	MDHP-Net: Detecting Injection Attacks on In-vehicle Network using Multi-Dimensional Hawkes Process and Temporal Model	Qi Liu et.al.	2411.10258	null
2024-11-11	HSTrack: Bootstrap End-to-End Multi-Camera 3D Multi-object Tracking with Hybrid Supervision	Shubo Lin et.al.	2411.06780	null
2024-11-14	Gaussian process modelling of infectious diseases using the Greta software package and GPUs	Eva Gunn et.al.	2411.05556	null
2024-11-07	Multi-temporal crack segmentation in concrete structure using deep learning approaches	Said Harb et.al.	2411.04620	null
2024-11-07	TrajGPT: Controlled Synthetic Trajectory Generation Using a Multitask Transformer-Based Spatiotemporal Model	Shang-Ling Hsu et.al.	2411.04381	link
2024-11-05	FilterNet: Harnessing Frequency Filters for Time Series Forecasting	Kun Yi et.al.	2411.01623	link
2024-10-31	Self-Ensembling Gaussian Splatting for Few-shot Novel View Synthesis	Chen Zhao et.al.	2411.00144	link
2024-10-30	LGU-SLAM: Learnable Gaussian Uncertainty Matching with Deformable Correlation Sampling for Deep Visual SLAM	Yucheng Huang et.al.	2410.23231	link
2024-10-27	Neural rendering enables dynamic tomography	Ivan Grega et.al.	2410.20558	null
2024-10-25	UbiHR: Resource-efficient Long-range Heart Rate Sensing on Ubiquitous Devices	Haoyu Bian et.al.	2410.19279	null
2024-10-24	Classifying Bicycle Infrastructure Using On-Bike Street-Level Images	Kal Backman et.al.	2410.19194	null
2024-10-24	Spatio-spectral-temporal Modelling of Two Young Pulsar Wind Nebulae	A. Kundu et.al.	2410.18386	null
2024-10-25	Beyond position: how rotary embeddings shape representations and memory in autoregressive transformers	Valeria Ruscio et.al.	2410.18067	null
2024-10-22	A Survey on Deep Learning-based Gaze Direction Regression: Searching for the State-of-the-art	Franko Šikić et.al.	2410.17082	null
2024-11-27	Spectrum and location of ongoing extreme particle acceleration in Cassiopeia A	Jooyun Woo et.al.	2410.16522	null
2024-10-18	Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models	Tangwen Qian et.al.	2410.13196	null
2024-10-14	Fed-piLot: Optimizing LoRA Assignment for Efficient Federated Foundation Model Fine-Tuning	Zikai Zhang et.al.	2410.10200	null
2024-10-09	Causal Representation Learning in Temporal Data via Single-Parent Decoding	Philippe Brouillard et.al.	2410.07013	link
2024-10-08	Enhancing Temporal Modeling of Video LLMs via Time Gating	Zi-Yuan Hu et.al.	2410.05714	link
2024-10-04	Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models	Haibo Wang et.al.	2410.03290	link
2024-10-04	Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach	Yaofang Liu et.al.	2410.03160	link
2024-10-04	AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark	Wenhao Chai et.al.	2410.03051	null
2024-10-03	A Spatio-Temporal Machine Learning Model for Mortgage Credit Risk: Default Probabilities and Loan Portfolios	Pascal Kündig et.al.	2410.02846	link
2024-09-30	Masked Autoregressive Model for Weather Forecasting	Doyi Kim et.al.	2409.20117	null
2024-09-30	SurgPETL: Parameter-Efficient Image-to-Surgical-Video Transfer Learning for Surgical Phase Recognition	Shu Yang et.al.	2409.20083	null
2024-09-29	PPLNs: Parametric Piecewise Linear Networks for Event-Based Temporal Modeling and Beyond	Chen Song et.al.	2409.19772	link
2024-09-26	PGN: The RNN’s New Successor is Effective for Long-Range Time Series Forecasting	Yuxin Jia et.al.	2409.17703	link
2024-09-26	MoGenTS: Motion Generation based on Spatial-Temporal Joint Modeling	Weihao Yuan et.al.	2409.17686	null
2024-09-23	Automated Spatio-Temporal Weather Modeling for Load Forecasting	Julie Keisler et.al.	2409.16326	null
2024-09-24	Self-Supervised Representation Learning with Augmentations of Continuous Training Data Improves the Feel and Performance of Myoelectric Control	Shriram Tallam Puranam Raghu et.al.	2409.16015	null
2024-09-24	DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection	Jiaxin Ye et.al.	2409.15936	link
2024-09-18	SPRMamba: Surgical Phase Recognition for Endoscopic Submucosal Dissection with Mamba	Xiangning Zhang et.al.	2409.12108	link
2024-09-18	DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech	Xin Qi et.al.	2409.11835	null
2024-09-21	Self-Supervised Learning via VICReg Enables Training of EMG Pattern Recognition Using Continuous Data with Unclear Labels	Shriram Tallam Puranam Raghu et.al.	2409.11632	null
2024-09-14	QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems	Zhixian He et.al.	2409.09348	null
2024-09-08	Estimating velocities of infectious disease spread through spatio-temporal log-Gaussian Cox point processes	Fernando Rodriguez Avellaneda et.al.	2409.05036	null
2024-09-05	TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations	Mingze Gao et.al.	2409.03206	null
2024-09-01	Searching for MeV-scale Axion-like Particles and Dark Photons with PandaX-4T	PandaX Collaboration et.al.	2409.00773	null
2024-09-17	Robo-GS: A Physics Consistent Spatial-Temporal Model for Robotic Arm with Hybrid Representation	Haozhe Lou et.al.	2408.14873	null
2024-08-23	Multivariate Time-Series Anomaly Detection based on Enhancing Graph Attention Networks with Topological Analysis	Zhe Liu et.al.	2408.13082	link
2024-08-23	Animal Identification with Independent Foreground and Background Modeling	Lukas Picek et.al.	2408.12930	null
2024-08-22	Deep Analysis of Time Series Data for Smart Grid Startup Strategies: A Transformer-LSTM-PSO Model Approach	Zecheng Zhang et.al.	2408.12129	null
2024-08-20	TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning	Bin Wang et.al.	2408.10688	link
2024-08-20	DemMamba: Alignment-free Raw Video Demoireing with Frequency-assisted Spatio-Temporal Mamba	Shuning Xu et.al.	2408.10679	null
2024-08-20	Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended?	Chen Liang et.al.	2408.10627	null
2024-08-19	Uncertainty Quantification of Pre-Trained and Fine-Tuned Surrogate Models using Conformal Prediction	Vignesh Gopakumar et.al.	2408.09881	link
2024-08-14	Limit Theorems for Weakly Dependent Non-stationary Random Field Arrays and Asymptotic Inference of Dynamic Spatio-temporal Models	Yue Pan et.al.	2408.07429	null
2024-08-12	OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning	Mushui Liu et.al.	2408.06158	link
2024-08-12	Spacetime $E(n)$-Transformer: Equivariant Attention for Spatio-temporal Graphs	Sergio G. Charles et.al.	2408.06039	link
2024-08-16	Performance and Non-adversarial Robustness of the Segment Anything Model 2 in Surgical Video Segmentation	Yiqing Shen et.al.	2408.04098	null
2024-08-07	Surgformer: Surgical Transformer with Hierarchical Temporal Attention for Surgical Phase Recognition	Shu Yang et.al.	2408.03867	link
2024-08-07	PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model	Yunlong Huang et.al.	2408.03540	link
2024-09-09	SiamMo: Siamese Motion-Centric 3D Object Tracking	Yuxiang Yang et.al.	2408.01688	link
2024-09-11	RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining	Hongtao Wu et.al.	2407.21773	link
2024-08-03	Unveiling land use dynamics: Insights from a hierarchical Bayesian spatio-temporal modelling of Compositional Data	Mario Figueira et.al.	2407.21695	null
2024-07-30	Autogenic Language Embedding for Coherent Point Tracking	Zikai Song et.al.	2407.20730	link
2024-07-26	UniForensics: Face Forgery Detection via General Facial Representation	Ziyuan Fang et.al.	2407.19079	null
2024-07-26	Harnessing Temporal Causality for Advanced Temporal Action Detection	Shuming Liu et.al.	2407.17792	link
2024-07-24	PrevPredMap: Exploring Temporal Modeling with Previous Predictions for Online Vectorized HD Map Construction	Nan Peng et.al.	2407.17378	link
2024-07-24	DVPE: Divided View Position Embedding for Multi-View 3D Object Detection	Jiasen Wang et.al.	2407.16955	link
2024-07-22	A divide-and-conquer approach for spatio-temporal analysis of large house price data from Greater London	Kapil Gupta et.al.	2407.15905	null
2024-07-03	Digital Twin-based Driver Risk-Aware Intelligent Mobility Analytics for Urban Transportation Management	Tao Li et.al.	2407.15025	null
2024-08-06	Physics-guided Active Sample Reweighting for Urban Flow Prediction	Wei Jiang et.al.	2407.13605	link
2024-07-15	Human-Centric Transformer for Domain Adaptive Action Recognition	Kun-Yu Lin et.al.	2407.10860	null
2024-07-15	Spatio-temporal neural distance fields for conditional generative modeling of the heart	Kristine Sørensen et.al.	2407.10663	link
2024-07-12	Open Vocabulary Multi-Label Video Classification	Rohit Gupta et.al.	2407.09073	null
2024-07-09	Rethinking Image-to-Video Adaptation: An Object-centric Perspective	Rui Qian et.al.	2407.06871	null
2024-07-07	Efficient Bayesian dynamic closed skew-normal model preserving mean and covariance for spatio-temporal data	Hajime Kuno et.al.	2407.05288	link
2024-07-03	Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation	Mengmeng Cui et.al.	2407.02990	null
2024-07-03	PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition	Yanbin Hao et.al.	2407.02934	link
2024-07-16	Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion	Bohan Li et.al.	2407.02077	link
2024-07-29	Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition	Guanghao Zhu et.al.	2406.17538	link
2024-06-23	Multi-Scale Temporal Difference Transformer for Video-Text Retrieval	Ni Wang et.al.	2406.16111	null
2024-06-20	ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning	Zhongjie Duan et.al.	2406.14130	link
2024-06-20	LGmap: Local-to-Global Mapping Network for Online Long-Range Vectorized HD Map Construction	Kuang Wu et.al.	2406.13988	null
2024-06-18	RIGL: A Unified Reciprocal Approach for Tracing the Independent and Group Learning Processes	Xiaoshan Yu et.al.	2406.12465	link
2024-06-18	Translation Equivariant Transformer Neural Processes	Matthew Ashman et.al.	2406.12409	null
2024-06-18	LiCAF: LiDAR-Camera Asymmetric Fusion for Gait Recognition	Yunze Deng et.al.	2406.12355	null
2024-06-15	X-Ray spectral and temporal properties of LMXB 4U 1608-52 - observed with AstroSat and NICER	Sree Bhattacherjee et.al.	2406.10666	null
2024-06-13	OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation	Junke Wang et.al.	2406.09399	link
2024-06-13	Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs	Zijia Zhao et.al.	2406.09367	link
2024-06-17	VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs	Zesen Cheng et.al.	2406.07476	link
2024-06-11	RecMoDiffuse: Recurrent Flow Diffusion for Human Motion Generation	Mirgahney Mohamed et.al.	2406.07169	null
2024-06-11	AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding	Xing Zhang et.al.	2406.07091	null
2024-06-07	Joint Spatial-Temporal Modeling and Contrastive Learning for Self-supervised Heart Rate Measurement	Wei Qian et.al.	2406.04942	null
2024-06-07	Bayesian inference of Latent Spectral Shapes	Hiu Ching Yip et.al.	2406.04915	null
2024-06-07	MTS-Net: Dual-Enhanced Positional Multi-Head Self-Attention for 3D CT Diagnosis of May-Thurner Syndrome	Yixin Huang et.al.	2406.04680	link
2024-06-05	Non-stationary Spatio-Temporal Modeling Using the Stochastic Advection-Diffusion Equation	Martin Outzen Berild et.al.	2406.03400	link
2024-06-04	I4VGen: Image as Stepping Stone for Text-to-Video Generation	Xiefan Guo et.al.	2406.02230	null
2024-06-03	UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation	Xiang Wang et.al.	2406.01188	null
2024-06-01	DSCA: A Digital Subtraction Angiography Sequence Dataset and Spatio-Temporal Model for Cerebral Artery Segmentation	Qihang Xie et.al.	2406.00341	null
2024-06-01	A Review of Pulse-Coupled Neural Network Applications in Computer Vision and Image Processing	Nurul Rafi et.al.	2406.00239	null
2024-05-31	Streamflow Prediction with Uncertainty Quantification for Water Management: A Constrained Reasoning and Learning Approach	Mohammed Amine Gharsallaoui et.al.	2406.00133	link
2024-05-31	4Diffusion: Multi-view Video Diffusion Model for 4D Generation	Haiyu Zhang et.al.	2405.20674	null
2024-05-30	Streaming Video Diffusion: Online Video Editing with Diffusion Models	Feng Chen et.al.	2405.19726	link
2024-05-30	Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training	Jinxia Yang et.al.	2405.19654	link
2024-05-30	FTS: A Framework to Find a Faithful TimeSieve	Songning Lai et.al.	2405.19647	null
2024-05-24	Dynamical Analysis of a Cocaine-Heroin Epidemiological Model with Spatial Distributions	Achraf Zinihi et.al.	2405.15532	null
2024-05-20	Biomarker Selection for Adaptive Systems	Joshua Pickard et.al.	2405.09809	null
2024-05-14	No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding	Yingjie Zhai et.al.	2405.08344	link
2024-05-13	Improved Bound for Robust Causal Bandits with Linear Models	Zirui Yan et.al.	2405.07795	null
2024-05-10	Residual-based Attention Physics-informed Neural Networks for Efficient Spatio-Temporal Lifetime Assessment of Transformers Operated in Renewable Power Plants	Ibai Ramirez et.al.	2405.06443	null
2024-05-10	A Multi-Channel Spatial-Temporal Transformer Model for Traffic Flow Forecasting	Jianli Xiao et.al.	2405.06266	null
2024-05-07	DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving	Chen Min et.al.	2405.04390	null
2024-05-07	Non-rigid Structure-from-Motion: Temporally-smooth Procrustean Alignment and Spatially-variant Deformation Modeling	Jiawei Shi et.al.	2405.04309	null
2024-05-06	Hierarchical Space-Time Attention for Micro-Expression Recognition	Haihong Hao et.al.	2405.03202	link
2024-05-21	RSCaMa: Remote Sensing Image Change Captioning with State Space Model	Chenyang Liu et.al.	2404.18895	link
2024-04-24	Deep Predictive Model Learning with Parametric Bias: Handling Modeling Difficulties and Temporal Model Changes	Kento Kawaharazuka et.al.	2404.15726	null
2024-04-19	MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model	Kang Zeng et.al.	2404.12794	link
2024-04-13	Understanding Human-COVID-19 Dynamics using Geospatial Big Data: A Systematic Literature Review	Binbin Lin et.al.	2404.10013	null
2024-04-15	A spatio-temporal model to detect potential outliers in disease mapping	Victoire Michal et.al.	2404.09882	null
2024-04-11	Simba: Mamba augmented U-ShiftGCN for Skeletal Action Recognition in Videos	Soumyabrata Chaudhuri et.al.	2404.07645	link
2024-04-05	Low-Rank Robust Subspace Tensor Clustering for Metro Passenger Flow Modeling	Jiuyun Hu et.al.	2404.04403	null
2024-04-03	Spatio-temporal Modeling of Count Data	Steffen Maletz et.al.	2404.02982	link
2024-03-31	$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding	Ye Liu et.al.	2404.00801	link
2024-03-30	ST-LLM: Large Language Models Are Effective Temporal Learners	Ruyang Liu et.al.	2404.00308	link
2024-03-28	X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization	Anna Kukleva et.al.	2403.19811	link
2024-03-25	TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models	Zhongwei Zhang et.al.	2403.17005	null
2024-04-13	Recursive Joint Cross-Modal Attention for Multimodal Fusion in Dimensional Emotion Recognition	R. Gnana Praveen et.al.	2403.13659	link
2024-03-19	SUN Team’s Contribution to ABAW 2024 Competition: Audio-visual Valence-Arousal Estimation and Expression Recognition	Denis Dresvyanskiy et.al.	2403.12609	null
2024-03-18	Bayesian Optimization Sequential Surrogate (BOSS) Algorithm: Fast Bayesian Inference for a Broad Class of Bayesian Hierarchical Models	Dayi Li et.al.	2403.12250	null
2024-03-19	Exploring Facial Expression Recognition through Semi-Supervised Pretraining and Temporal Modeling	Jun Yu et.al.	2403.11942	null
2024-03-15	Spatio-temporal Occupancy Models with INLA	Jafet Belmont et.al.	2403.10680	null
2024-03-15	Multivariate Bayesian models with flexible shared interactions for analyzing spatio-temporal patterns of rare cancers	Garazi Retegui et.al.	2403.10440	link
2024-03-13	Leveraging Non-Decimated Wavelet Packet Features and Transformer Models for Time Series Forecasting	Guy P Nason et.al.	2403.08630	null
2024-03-10	Coherent Temporal Synthesis for Incremental Action Segmentation	Guodong Ding et.al.	2403.06102	null
2024-04-26	Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention	R. Gnana Praveen et.al.	2403.04654	link
