AI Firehose
@ai-firehose.column.social
490 followers 570 following 3.7K posts
Daily-updated stream of AI news || Monitoring research blog sites || Research articles from ArXiv
ai-firehose.column.social
Researchers launched BMC-LongCLIP, a biomedical vision-language model with a 512-token context, cutting token waste from 55% to 2.2%. This leads to significant retrieval improvements, achieving up to 30% gain in Recall@1 and better zero-shot classification accuracy. https://arxiv.org/abs/2510.03978
No Tokens Wasted: Leveraging Long Context in Biomedical Vision-Language Models
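The "token waste" figure is truncation loss: caption tokens beyond the text encoder's window are simply discarded. A toy calculation (the caption token counts below are made up, not from the paper) shows how widening the window from CLIP's usual 77 tokens to 512 changes the discarded fraction:

```python
# Toy truncation-loss calculation; the caption token counts are hypothetical.
def wasted_fraction(caption_lengths, limit):
    """Fraction of caption tokens discarded when each is cut to `limit` tokens."""
    total = sum(caption_lengths)
    kept = sum(min(n, limit) for n in caption_lengths)
    return (total - kept) / total

lengths = [60, 150, 300, 600, 90]  # hypothetical biomedical caption lengths
print(f"77-token window:  {wasted_fraction(lengths, 77):.1%} wasted")
print(f"512-token window: {wasted_fraction(lengths, 512):.1%} wasted")
```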
ai-firehose.column.social
A study introduces SIFToM, a neurosymbolic model that helps robots interpret human spoken instructions under noisy conditions, achieving near-human accuracy on benchmark tasks. This bridges AI and real-world human-robot collaboration, improving reliability in dynamic environments. https://arxiv.org/abs/2409.10849
Pragmatic Embodied Spoken Instruction Following in Human-Robot Collaboration with Theory of Mind
ai-firehose.column.social
SketchPlan translates 2D sketches drawn on depth images into 3D drone flight paths. Trained on synthetic data, it transfers to real-world flights with a high success rate, pointing toward more intuitive human-robot communication. https://arxiv.org/abs/2510.03545
SketchPlan: Diffusion Based Drone Planning From Human Sketches
ai-firehose.column.social
Stanford and Red Hat researchers introduced PG-DLM, a particle Gibbs sampling algorithm for discrete diffusion models. This enhances inference-time control in text generation, allowing trajectory-level refinement without retraining and boosting reward-guided outputs. https://arxiv.org/abs/2507.08390
Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling
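Reward-guided resampling is the building block behind particle methods like this. A generic sketch of that step (an illustration of the idea, not PG-DLM's actual algorithm): keep a population of candidate sequences and resample them in proportion to the exponentiated reward.

```python
import math, random

# Generic particle-resampling step (not PG-DLM's algorithm): candidates with
# higher reward survive with higher probability into the next population.
def resample(particles, reward_fn, temperature=1.0, rng=random):
    rewards = [reward_fn(p) for p in particles]
    rmax = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp((r - rmax) / temperature) for r in rewards]
    return rng.choices(particles, weights=weights, k=len(particles))

drafts = ["weak draft", "good draft", "great draft"]
reward = {"weak draft": -2.0, "good draft": 1.0, "great draft": 2.0}
survivors = resample(drafts, reward.get)  # high-reward drafts tend to dominate
```

In a diffusion language model the particles would be partially denoised token sequences and the reward a task-specific scorer; the resampling step is what allows trajectory-level refinement without retraining.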
ai-firehose.column.social
A study presents AMAS, a dynamic LLM-driven framework for multi-agent systems, enhancing problem-solving efficiency by optimizing communication. Early results indicate it outperforms previous methods across tasks, paving the way for flexible AI collaborations. https://arxiv.org/abs/2510.01617
AMAS: Adaptively Determining Communication Topology for LLM-based Multi-Agent System
ai-firehose.column.social
ACE (Agentic Context Engineering) adapts language models through evolving context "playbooks" rather than weight updates. It improves performance on agent tasks and domain reasoning while cutting adaptation latency, setting a benchmark for efficient adaptation. https://arxiv.org/abs/2510.04618
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
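The core loop is easy to sketch: carry a playbook of lessons in the prompt and grow it after each task, so all adaptation lives in the context. This is a minimal sketch of that loop; the update rule is an assumption, and ACE's actual playbook curation is more involved.

```python
# Context-evolution loop, sketched (not ACE's exact method): lessons accumulate
# in a playbook that is prepended to every prompt -- no weight updates needed.
class Playbook:
    def __init__(self):
        self.lessons = []

    def prompt(self, task):
        header = "\n".join(f"- {lesson}" for lesson in self.lessons)
        return f"Playbook:\n{header}\n\nTask: {task}"

    def update(self, lesson):
        if lesson not in self.lessons:  # avoid duplicate entries
            self.lessons.append(lesson)

pb = Playbook()
pb.update("Always validate JSON output before returning.")
print(pb.prompt("Extract the fields from this invoice."))
```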
ai-firehose.column.social
A study reveals LLM-judged benchmarks can undermine validity due to design flaws, leading to misleading rankings. Researchers introduce innovative metrics to diagnose these issues, advocating for more reliable and transparent evaluation. https://arxiv.org/abs/2509.20293
When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity
ai-firehose.column.social
An innovative initiative has created an open-source dataset of 215,670 documents from Sri Lanka, covering law, government, and media in Sinhala, Tamil, and English. This project boosts civic engagement and supports NLP research, enhancing access to vital information. https://arxiv.org/abs/2510.04124
Sri Lanka Document Datasets: A Large-Scale, Multilingual Resource for Law, News, and Policy (v20251005)
ai-firehose.column.social
GQR enhances multimodal retrieval by refining query representations at test time, achieving high performance while being 14x faster and requiring 54x less memory than traditional methods. This approach enables efficient visual document retrieval. https://arxiv.org/abs/2510.05038
Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization
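Test-time refinement of a query vector can be sketched with a few gradient steps. The objective below is an assumption for illustration, not GQR's exact formulation: nudge the dense query embedding so its document scores move toward those of a second, "guiding" retriever over the same candidates.

```python
import numpy as np

# Sketch of test-time query refinement (assumed least-squares objective, not
# GQR's exact method): align dense scores with a guiding retriever's scores.
def refine_query(q, docs, guide_scores, steps=50, lr=0.05):
    q = q.astype(float).copy()
    for _ in range(steps):
        scores = docs @ q                            # dense score per document
        q += lr * docs.T @ (guide_scores - scores)   # gradient step toward guide
    return q
```

In practice `q` would come from the dense encoder and `guide_scores` from a complementary retriever (e.g. a lexical one) over the same candidate shortlist.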
ai-firehose.column.social
Researchers created Verbalized Sampling (VS), a prompting method that mitigates mode collapse in language models by having them verbalize a distribution of candidate responses with probabilities. This improves output diversity and creativity while maintaining accuracy and safety. https://arxiv.org/abs/2510.01171
Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity
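The trick is purely prompt-level: instead of asking for one answer, ask the model to verbalize several answers with probabilities, then sample from that verbalized distribution. A sketch of the scaffolding (the prompt wording and output format here are assumptions, not the paper's exact template):

```python
import random
import re

# Verbalized Sampling, sketched: parse "<response> (probability=<p>)" lines
# from a model reply and sample from the verbalized distribution. The output
# format is a hypothetical convention for this illustration.
PROMPT = ("Give 3 different responses to the question, each on its own line "
          "formatted as '<response> (probability=<p>)'.")

def parse_verbalized(text):
    pairs = []
    for line in text.splitlines():
        m = re.match(r"(.+?)\s*\(probability=([0-9.]+)\)", line.strip())
        if m:
            pairs.append((m.group(1), float(m.group(2))))
    return pairs

def sample_verbalized(text, rng=random):
    responses, probs = zip(*parse_verbalized(text))
    return rng.choices(responses, weights=probs, k=1)[0]

# Hypothetical model reply:
reply = ("A haiku about rain (probability=0.5)\n"
         "A limerick (probability=0.3)\n"
         "Free verse (probability=0.2)")
```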
ai-firehose.column.social
This study introduces Gaussian Partial Information Decomposition, boosting efficiency in analyzing multimodal data interactions while preserving optimal information quantification. Their method surpasses existing techniques, enhancing model selection and data fusion. https://arxiv.org/abs/2510.04417
Partial Information Decomposition via Normalizing Flows in Latent Gaussian Distributions
ai-firehose.column.social
CLOUDANOBENCH is a benchmark for context-aware anomaly detection in cloud systems, combining metrics and logs to improve detection accuracy. Paired with CLOUDANOAGENT, an LLM-based detector, this research advances cloud infrastructure management. https://arxiv.org/abs/2508.01844
Towards Generalizable Context-aware Anomaly Detection: A Large-scale Benchmark in Cloud Environments
ai-firehose.column.social
DLLM is a framework that uses large language models to improve cognitive diagnosis in web-based education systems, addressing noise and data imbalance. It maintains strong predictive accuracy across a range of noise levels. https://arxiv.org/abs/2510.04093
Harnessing LLM for Noise-Robust Cognitive Diagnosis in Web-Based Intelligent Education Systems
ai-firehose.column.social
Researchers present MedLog, a proposed protocol for event-level logging of clinical AI systems, analogous to syslog. It enables real-time monitoring and transparency, improving patient safety through continuous auditing of rapidly evolving medical AI systems. https://arxiv.org/abs/2510.04033
A global log for medical AI
ai-firehose.column.social
Researchers unveiled SLM-MUX, a multi-model architecture that orchestrates small language models, surpassing larger models with accuracy gains up to 13.4%. This method avoids groupthink, showcasing efficient collaboration over mere scaling. https://arxiv.org/abs/2510.05077
SLM-MUX: Orchestrating Small Language Models for Reasoning
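The orchestration idea can be sketched in a few lines. The selection rule below (plain majority vote across models) is an assumption for illustration; SLM-MUX's actual routing and answer selection are more involved.

```python
from collections import Counter

# Multi-model orchestration, sketched: query several small models and keep the
# answer they agree on most (majority vote; ties keep the first-seen answer).
def mux(question, models):
    answers = [model(question) for model in models]
    return Counter(answers).most_common(1)[0][0]

small_models = [lambda q: "42", lambda q: "42", lambda q: "41"]  # stand-ins
print(mux("6 * 7 = ?", small_models))  # -> 42
```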
ai-firehose.column.social
SylCipher, a syllable-level unsupervised speech recognition system, avoids costly phoneme resources and achieves a 40% drop in character error rate on LibriSpeech. Working at the syllable level suits languages such as Mandarin, where phoneme annotations are scarce, paving the way for broader voice technology. https://arxiv.org/abs/2510.03639
Towards Unsupervised Speech Recognition at the Syllable-Level
ai-firehose.column.social
A framework to align AI with human preferences, leveraging population-proportional alignment from social choice theory, aims to reduce bias and manipulation in preference learning, showing promise in recommendation systems and language model alignment. https://arxiv.org/abs/2506.05619
Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework
ai-firehose.column.social
Humanoid-COA combines foundation model reasoning with an Embodied Chain-of-Action mechanism for zero-shot loco-manipulation. It surpasses existing methods in real-world scenarios, showcasing notable adaptability and better success rates in complex environments. https://arxiv.org/abs/2504.09532
Humanoid Agent via Embodied Chain-of-Action Reasoning with Multimodal Foundation Models for Zero-Shot Loco-Manipulation
ai-firehose.column.social
VirDA introduces visual reprogramming layers enabling a single pre-trained backbone to adapt to new domains without fine-tuning, achieving 92.8% accuracy with just 1.5 million parameters—a remarkable boost in efficiency and performance for image classification. https://arxiv.org/abs/2510.01660
VirDA: Reusing Backbone for Unsupervised Domain Adaptation with Visual Reprogramming
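Visual reprogramming in its generic form is simple to state: freeze the backbone and train only a small input-space "program" that shifts images into a form the backbone already handles. A minimal sketch of that idea (not VirDA's exact layers):

```python
import numpy as np

# Generic visual reprogramming (a sketch, not VirDA's architecture): the
# backbone stays frozen; only the additive input program P is trained.
def reprogram(x, P):
    """Adapted input for the frozen backbone: x shifted by the learned program."""
    return np.clip(x + P, 0.0, 1.0)  # keep pixels in the valid [0, 1] range

x = np.random.default_rng(0).random((32, 32, 3))  # hypothetical target image
P = np.zeros_like(x)  # learnable parameters, zero-initialized before training
adapted = reprogram(x, P)
```

Because only `P` (plus, typically, a lightweight output mapping) is trained, the parameter count stays small, which is the efficiency angle the post highlights.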
ai-firehose.column.social
This AI preference alignment framework reduces biases in traditional methods, aligning policies with actual preferences. By merging social choice theory with algorithms, it boosts AI's resistance to manipulation, improving alignment in various systems. https://arxiv.org/abs/2506.05619
Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework
ai-firehose.column.social
VirDA is a new UDA method that reuses a pretrained backbone, achieving 92.8% accuracy with 1.5M parameters. It uses visual reprogramming layers to adapt across domains without full fine-tuning, enhancing AI application efficiency. https://arxiv.org/abs/2510.01660
VirDA: Reusing Backbone for Unsupervised Domain Adaptation with Visual Reprogramming
ai-firehose.column.social
The Humanoid-COA framework combines multimodal foundation model reasoning and an Embodied Chain-of-Action mechanism to achieve remarkable zero-shot loco-manipulation. Extensive tests show it significantly outperforms existing methods in complex tasks. https://arxiv.org/abs/2504.09532
Humanoid Agent via Embodied Chain-of-Action Reasoning with Multimodal Foundation Models for Zero-Shot Loco-Manipulation
ai-firehose.column.social
VirDA presents a new approach for unsupervised domain adaptation via visual reprogramming to adapt pretrained models without fine-tuning. It attains high accuracy with just 1.5M parameters—less than its nearest competitors—potentially redefining ML efficiency. https://arxiv.org/abs/2510.01660
VirDA: Reusing Backbone for Unsupervised Domain Adaptation with Visual Reprogramming
ai-firehose.column.social
MIT researchers unveil a preference learning framework that aligns AI policies with diverse human opinions, addressing biases in traditional methods. By integrating social choice theory, this approach enhances AI decision-making in complex human feedback scenarios. https://arxiv.org/abs/2506.05619
Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework
ai-firehose.column.social
Humanoid-COA merges multimodal foundation model reasoning with a Chain-of-Action mechanism, enabling zero-shot loco-manipulation. Demonstrated on robots, it outperforms prior methods in complex, long-horizon tasks, reshaping how robots interpret human instructions. https://arxiv.org/abs/2504.09532
Humanoid Agent via Embodied Chain-of-Action Reasoning with Multimodal Foundation Models for Zero-Shot Loco-Manipulation