- Introduces HoneSet, a dataset of 930 queries across six categories for evaluating LLM honesty
- Proposes curiosity-driven prompting and a two-stage fine-tuning approach to improve honesty and helpfulness
- Demonstrates improvements of up to 124.7% in combined honesty and helpfulness for models such as Mistral-7B
- VisionPrefer dataset captures diverse preferences (prompt-following, aesthetics, fidelity, harmlessness) using multimodal LLMs
- VP-Score model matches human accuracy in preference prediction, guiding generative model tuning
Zeroing even a single weight can make various LLMs go from generating coherent text to outputting gibberish.
arxiv.org/abs/2411.07191
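The idea can be illustrated with a toy model. This is only a hedged sketch, not the paper's method or architecture: a tiny hand-built layer where one deliberately oversized entry carries most of the signal, loosely mirroring the "super weight" phenomenon the paper reports for real LLMs. All names and values here are invented for illustration.

```python
import math

n = 8

# Toy weight matrix: small entries everywhere except one dominant
# "super weight" at position (3, 5). This is a contrived stand-in for
# the outlier weights the paper identifies inside real transformers.
W = [[0.01 * ((7 * i + 3 * j) % 5 - 2) for j in range(n)] for i in range(n)]
W[3][5] = 50.0  # the "super weight"

x = [0.5] * n  # fixed input so the demo is deterministic

def forward(weights, inp):
    # One linear layer followed by tanh.
    return [math.tanh(sum(w * v for w, v in zip(row, inp))) for row in weights]

baseline = forward(W, x)

# Locate the single largest-magnitude weight and zero it.
i, j = max(
    ((a, b) for a in range(n) for b in range(n)),
    key=lambda ab: abs(W[ab[0]][ab[1]]),
)
W_ablated = [row[:] for row in W]
W_ablated[i][j] = 0.0
ablated = forward(W_ablated, x)

# Ablating that one entry swings its output unit from saturation
# (~1.0) to near zero, while the other units barely move.
delta = max(abs(a - b) for a, b in zip(baseline, ablated))
print((i, j), round(delta, 3))
```

The same one-line ablation on any of the small weights leaves the outputs essentially unchanged, which is what makes the single-weight sensitivity surprising.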