Michael Günther
michael-g-u.bsky.social
ML @jina-ai.bsky.social
https://github.com/guenthermi
Some examples for instructions:
- How to translate named entities and technical terms (e.g., "Big Data," "Embeddings")
- Specifying date formats (MM/DD/YY, DD/MM/YY, YYYY-MM-DD)
- Defining the tone of a text (e.g., formal vs. informal)
Nevertheless, the latency of an LLM might be much higher.
January 26, 2025 at 4:28 PM
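A minimal sketch of how such instructions could be handed to an LLM, assuming they are simply listed in a translation prompt. The function name `build_translation_prompt`, the prompt wording, and the sample sentence are illustrative assumptions, not part of any specific API:

```python
def build_translation_prompt(text, target_language, instructions):
    """Assemble a plain-text translation prompt (hypothetical helper)."""
    rules = "\n".join(f"- {rule}" for rule in instructions)
    return (
        f"Translate the following text into {target_language}.\n"
        f"Follow these instructions:\n{rules}\n\n"
        f"Text:\n{text}"
    )


prompt = build_translation_prompt(
    "The meeting on 01/26/25 covers Big Data and Embeddings.",
    "German",
    [
        'Keep the technical terms "Big Data" and "Embeddings" untranslated.',
        "Write dates as YYYY-MM-DD.",
        "Use a formal tone.",
    ],
)
print(prompt)
```

The resulting string would then be sent to whichever LLM is used for the translation.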
Whether to use late chunking also depends on the chunk size: for smaller chunks, late chunking is generally more useful than for larger ones.
December 5, 2024 at 8:49 AM
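A minimal sketch of the pooling step behind late chunking, assuming token-level embeddings for the whole document are already available. Here they are replaced by random numbers, and the function name `late_chunk_pool` and the chunk spans are illustrative assumptions:

```python
import numpy as np


def late_chunk_pool(token_embeddings, spans):
    """Mean-pool token embeddings over each (start, end) token span."""
    return np.stack(
        [token_embeddings[start:end].mean(axis=0) for start, end in spans]
    )


# Stand-in for the token embeddings a long-context model would produce
# when run once over the full document (12 tokens, 4 dimensions).
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(12, 4))

# Chunk boundaries as token index ranges (end is exclusive).
spans = [(0, 5), (5, 9), (9, 12)]

chunk_embeddings = late_chunk_pool(token_embeddings, spans)
print(chunk_embeddings.shape)  # (3, 4)
```

With a real model, each token embedding is computed with the whole document in view, so every pooled chunk vector keeps some surrounding context. That context is what a small chunk embedded on its own would lack.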
Chunking improves performance on fact retrieval tasks but can actually harm performance on other retrieval tasks. Late chunking is useful for coherent datasets and is often a good compromise that helps embeddings retain context information while still focusing on details:
December 5, 2024 at 8:49 AM
First, more input helps, but not equally for all retrieval tasks:
December 5, 2024 at 8:49 AM