AI Firehose
@ai-firehose.column.social
490 followers 570 following 3.7K posts
Daily-updated stream of AI news || Monitoring research blog sites || Research articles from ArXiv
ai-firehose.column.social
Researchers launched BMC-LongCLIP, a biomedical vision-language model with a 512-token context, cutting token waste from 55% to 2.2%. This leads to significant retrieval improvements, achieving up to 30% gain in Recall@1 and better zero-shot classification accuracy. https://arxiv.org/abs/2510.03978
No Tokens Wasted: Leveraging Long Context in Biomedical Vision-Language Models
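The "token waste" figure is truncation loss: caption tokens beyond the text encoder's window are simply discarded. A toy calculation (the caption token counts below are made up, not from the paper) shows how widening the window from CLIP's usual 77 tokens to 512 changes the discarded fraction:

```python
# Toy truncation-loss calculation; the caption token counts are hypothetical.
def wasted_fraction(caption_lengths, limit):
    """Fraction of caption tokens discarded when each is cut to `limit` tokens."""
    total = sum(caption_lengths)
    kept = sum(min(n, limit) for n in caption_lengths)
    return (total - kept) / total

lengths = [60, 150, 300, 600, 90]  # hypothetical biomedical caption lengths
print(f"77-token window:  {wasted_fraction(lengths, 77):.1%} wasted")
print(f"512-token window: {wasted_fraction(lengths, 512):.1%} wasted")
```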
ai-firehose.column.social
A study introduces SIFToM, a neurosymbolic model that helps robots interpret human spoken instructions under noisy conditions, achieving near-human accuracy on benchmark tasks. This bridges AI and real-world human-robot collaboration, improving reliability in dynamic environments. https://arxiv.org/abs/2409.10849
Pragmatic Embodied Spoken Instruction Following in Human-Robot Collaboration with Theory of Mind
ai-firehose.column.social
SketchPlan translates 2D sketches drawn on depth images into 3D drone flight paths. Trained on synthetic data, it transfers to real-world flights with a high success rate, pointing toward more intuitive human-robot communication. https://arxiv.org/abs/2510.03545
SketchPlan: Diffusion Based Drone Planning From Human Sketches
ai-firehose.column.social
Stanford and Red Hat researchers introduced PG-DLM, a particle Gibbs sampling algorithm for discrete diffusion models. This enhances inference-time control in text generation, allowing trajectory-level refinement without retraining and boosting reward-guided outputs. https://arxiv.org/abs/2507.08390
Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling
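Reward-guided resampling is the building block behind particle methods like this. A generic sketch of that step (an illustration of the idea, not PG-DLM's actual algorithm): keep a population of candidate sequences and resample them in proportion to the exponentiated reward.

```python
import math, random

# Generic particle-resampling step (not PG-DLM's algorithm): candidates with
# higher reward survive with higher probability into the next population.
def resample(particles, reward_fn, temperature=1.0, rng=random):
    rewards = [reward_fn(p) for p in particles]
    rmax = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp((r - rmax) / temperature) for r in rewards]
    return rng.choices(particles, weights=weights, k=len(particles))

drafts = ["weak draft", "good draft", "great draft"]
reward = {"weak draft": -2.0, "good draft": 1.0, "great draft": 2.0}
survivors = resample(drafts, reward.get)  # high-reward drafts tend to dominate
```

In a diffusion language model the particles would be partially denoised token sequences and the reward a task-specific scorer; the resampling step is what allows trajectory-level refinement without retraining.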
ai-firehose.column.social
A study presents AMAS, a dynamic LLM-driven framework for multi-agent systems, enhancing problem-solving efficiency by optimizing communication. Early results indicate it outperforms previous methods across tasks, paving the way for flexible AI collaborations. https://arxiv.org/abs/2510.01617
AMAS: Adaptively Determining Communication Topology for LLM-based Multi-Agent System
ai-firehose.column.social
ACE (Agentic Context Engineering) adapts language models through evolving context "playbooks" rather than weight updates. It improves performance on agent tasks and domain reasoning while cutting adaptation latency, setting a benchmark for efficient adaptation. https://arxiv.org/abs/2510.04618
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
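The core loop is easy to sketch: carry a playbook of lessons in the prompt and grow it after each task, so all adaptation lives in the context. This is a minimal sketch of that loop; the update rule is an assumption, and ACE's actual playbook curation is more involved.

```python
# Context-evolution loop, sketched (not ACE's exact method): lessons accumulate
# in a playbook that is prepended to every prompt -- no weight updates needed.
class Playbook:
    def __init__(self):
        self.lessons = []

    def prompt(self, task):
        header = "\n".join(f"- {lesson}" for lesson in self.lessons)
        return f"Playbook:\n{header}\n\nTask: {task}"

    def update(self, lesson):
        if lesson not in self.lessons:  # avoid duplicate entries
            self.lessons.append(lesson)

pb = Playbook()
pb.update("Always validate JSON output before returning.")
print(pb.prompt("Extract the fields from this invoice."))
```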
ai-firehose.column.social
A study reveals LLM-judged benchmarks can undermine validity due to design flaws, leading to misleading rankings. Researchers introduce innovative metrics to diagnose these issues, advocating for more reliable and transparent evaluation. https://arxiv.org/abs/2509.20293
When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity
ai-firehose.column.social
An innovative initiative has created an open-source dataset of 215,670 documents from Sri Lanka, covering law, government, and media in Sinhala, Tamil, and English. This project boosts civic engagement and supports NLP research, enhancing access to vital information. https://arxiv.org/abs/2510.04124
Sri Lanka Document Datasets: A Large-Scale, Multilingual Resource for Law, News, and Policy (v20251005)
ai-firehose.column.social
GQR enhances multimodal retrieval by refining query representations at test time, achieving high performance while being 14x faster and requiring 54x less memory than traditional methods. This approach enables efficient visual document retrieval. https://arxiv.org/abs/2510.05038
Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization
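Test-time refinement of a query vector can be sketched with a few gradient steps. The objective below is an assumption for illustration, not GQR's exact formulation: nudge the dense query embedding so its document scores move toward those of a second, "guiding" retriever over the same candidates.

```python
import numpy as np

# Sketch of test-time query refinement (assumed least-squares objective, not
# GQR's exact method): align dense scores with a guiding retriever's scores.
def refine_query(q, docs, guide_scores, steps=50, lr=0.05):
    q = q.astype(float).copy()
    for _ in range(steps):
        scores = docs @ q                            # dense score per document
        q += lr * docs.T @ (guide_scores - scores)   # gradient step toward guide
    return q
```

In practice `q` would come from the dense encoder and `guide_scores` from a complementary retriever (e.g. a lexical one) over the same candidate shortlist.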
ai-firehose.column.social
Researchers created Verbalized Sampling (VS), a prompting method that mitigates mode collapse in language models by having them verbalize a distribution of candidate responses with probabilities. This improves output diversity and creativity while maintaining accuracy and safety. https://arxiv.org/abs/2510.01171
Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity
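The trick is purely prompt-level: instead of asking for one answer, ask the model to verbalize several answers with probabilities, then sample from that verbalized distribution. A sketch of the scaffolding (the prompt wording and output format here are assumptions, not the paper's exact template):

```python
import random
import re

# Verbalized Sampling, sketched: parse "<response> (probability=<p>)" lines
# from a model reply and sample from the verbalized distribution. The output
# format is a hypothetical convention for this illustration.
PROMPT = ("Give 3 different responses to the question, each on its own line "
          "formatted as '<response> (probability=<p>)'.")

def parse_verbalized(text):
    pairs = []
    for line in text.splitlines():
        m = re.match(r"(.+?)\s*\(probability=([0-9.]+)\)", line.strip())
        if m:
            pairs.append((m.group(1), float(m.group(2))))
    return pairs

def sample_verbalized(text, rng=random):
    responses, probs = zip(*parse_verbalized(text))
    return rng.choices(responses, weights=probs, k=1)[0]

# Hypothetical model reply:
reply = ("A haiku about rain (probability=0.5)\n"
         "A limerick (probability=0.3)\n"
         "Free verse (probability=0.2)")
```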
ai-firehose.column.social
This study introduces Gaussian Partial Information Decomposition, boosting efficiency in analyzing multimodal data interactions while preserving optimal information quantification. Their method surpasses existing techniques, enhancing model selection and data fusion. https://arxiv.org/abs/2510.04417
Partial Information Decomposition via Normalizing Flows in Latent Gaussian Distributions
ai-firehose.column.social
CLOUDANOBENCH is a benchmark for context-aware anomaly detection in cloud systems, combining metrics and logs to improve detection accuracy. Paired with CLOUDANOAGENT, an LLM-based detector, this research advances cloud infrastructure management. https://arxiv.org/abs/2508.01844
Towards Generalizable Context-aware Anomaly Detection: A Large-scale Benchmark in Cloud Environments
ai-firehose.column.social
DLLM is a framework that uses large language models to improve cognitive diagnosis in web-based education systems, addressing noise and data imbalance. It maintains strong predictive accuracy across a range of noise levels. https://arxiv.org/abs/2510.04093
Harnessing LLM for Noise-Robust Cognitive Diagnosis in Web-Based Intelligent Education Systems
ai-firehose.column.social
Researchers present MedLog, a proposed protocol for event-level logging of clinical AI systems, analogous to syslog. It enables real-time monitoring and transparency, improving patient safety through continuous auditing of rapidly evolving medical AI systems. https://arxiv.org/abs/2510.04033
A global log for medical AI
ai-firehose.column.social
Researchers unveiled SLM-MUX, a multi-model architecture that orchestrates small language models, surpassing larger models with accuracy gains up to 13.4%. This method avoids groupthink, showcasing efficient collaboration over mere scaling. https://arxiv.org/abs/2510.05077
SLM-MUX: Orchestrating Small Language Models for Reasoning
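The orchestration idea can be sketched in a few lines. The selection rule below (plain majority vote across models) is an assumption for illustration; SLM-MUX's actual routing and answer selection are more involved.

```python
from collections import Counter

# Multi-model orchestration, sketched: query several small models and keep the
# answer they agree on most (majority vote; ties keep the first-seen answer).
def mux(question, models):
    answers = [model(question) for model in models]
    return Counter(answers).most_common(1)[0][0]

small_models = [lambda q: "42", lambda q: "42", lambda q: "41"]  # stand-ins
print(mux("6 * 7 = ?", small_models))  # -> 42
```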
ai-firehose.column.social
SylCipher, a syllable-level unsupervised speech recognition system, avoids costly phoneme resources and achieves a 40% drop in character error rate on LibriSpeech. Working at the syllable level suits languages such as Mandarin, where phoneme annotations are scarce, paving the way for broader voice technology. https://arxiv.org/abs/2510.03639
Towards Unsupervised Speech Recognition at the Syllable-Level
ai-firehose.column.social
A framework to align AI with human preferences, leveraging population-proportional alignment from social choice theory, aims to reduce bias and manipulation in preference learning, showing promise in recommendation systems and language model alignment. https://arxiv.org/abs/2506.05619
Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework
ai-firehose.column.social
Humanoid-COA combines foundation model reasoning with an Embodied Chain-of-Action mechanism for zero-shot loco-manipulation. It surpasses existing methods in real-world scenarios, showcasing notable adaptability and better success rates in complex environments. https://arxiv.org/abs/2504.09532
Humanoid Agent via Embodied Chain-of-Action Reasoning with Multimodal Foundation Models for Zero-Shot Loco-Manipulation
ai-firehose.column.social
VirDA introduces visual reprogramming layers enabling a single pre-trained backbone to adapt to new domains without fine-tuning, achieving 92.8% accuracy with just 1.5 million parameters—a remarkable boost in efficiency and performance for image classification. https://arxiv.org/abs/2510.01660
VirDA: Reusing Backbone for Unsupervised Domain Adaptation with Visual Reprogramming
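Visual reprogramming in its generic form is simple to state: freeze the backbone and train only a small input-space "program" that shifts images into a form the backbone already handles. A minimal sketch of that idea (not VirDA's exact layers):

```python
import numpy as np

# Generic visual reprogramming (a sketch, not VirDA's architecture): the
# backbone stays frozen; only the additive input program P is trained.
def reprogram(x, P):
    """Adapted input for the frozen backbone: x shifted by the learned program."""
    return np.clip(x + P, 0.0, 1.0)  # keep pixels in the valid [0, 1] range

x = np.random.default_rng(0).random((32, 32, 3))  # hypothetical target image
P = np.zeros_like(x)  # learnable parameters, zero-initialized before training
adapted = reprogram(x, P)
```

Because only `P` (plus, typically, a lightweight output mapping) is trained, the parameter count stays small, which is the efficiency angle the post highlights.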
ai-firehose.column.social
This AI preference alignment framework reduces biases in traditional methods, aligning policies with actual preferences. By merging social choice theory with algorithms, it boosts AI's resistance to manipulation, improving alignment in various systems. https://arxiv.org/abs/2506.05619
Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework
ai-firehose.column.social
VirDA is a new UDA method that reuses a pretrained backbone, achieving 92.8% accuracy with 1.5M parameters. It uses visual reprogramming layers to adapt across domains without full fine-tuning, enhancing AI application efficiency. https://arxiv.org/abs/2510.01660
VirDA: Reusing Backbone for Unsupervised Domain Adaptation with Visual Reprogramming
ai-firehose.column.social
The Humanoid-COA framework combines multimodal foundation model reasoning and an Embodied Chain-of-Action mechanism to achieve remarkable zero-shot loco-manipulation. Extensive tests show it significantly outperforms existing methods in complex tasks. https://arxiv.org/abs/2504.09532
Humanoid Agent via Embodied Chain-of-Action Reasoning with Multimodal Foundation Models for Zero-Shot Loco-Manipulation
ai-firehose.column.social
VirDA presents a new approach for unsupervised domain adaptation via visual reprogramming to adapt pretrained models without fine-tuning. It attains high accuracy with just 1.5M parameters—less than its nearest competitors—potentially redefining ML efficiency. https://arxiv.org/abs/2510.01660
VirDA: Reusing Backbone for Unsupervised Domain Adaptation with Visual Reprogramming
ai-firehose.column.social
MIT researchers unveil a preference learning framework that aligns AI policies with diverse human opinions, addressing biases in traditional methods. By integrating social choice theory, this approach enhances AI decision-making in complex human feedback scenarios. https://arxiv.org/abs/2506.05619
Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework
ai-firehose.column.social
Humanoid-COA merges multimodal foundation model reasoning with a Chain-of-Action mechanism, enabling zero-shot loco-manipulation. Demonstrated on robots, it outperforms prior methods in complex, long-horizon tasks, reshaping how robots interpret human instructions. https://arxiv.org/abs/2504.09532
Humanoid Agent via Embodied Chain-of-Action Reasoning with Multimodal Foundation Models for Zero-Shot Loco-Manipulation