Claude 4 Sonnet solved 26% of Kotlin-bench tasks, outperforming OpenAI's o3.
Claude 4 Sonnet & Opus are available in Firebender today for all users of JetBrains IDEs. Try them out and let us know what you think!
TL;DR: Grok 3 is a very capable coding model for Android & Kotlin development. GPT-4.1 shows improvement but still trails other major competitors.
See the full leaderboard here:
firebender.com/leaderboard
Gemini 2.5 topped the leaderboard, solving 14% of issues; Claude 3.7 thinking placed 2nd with 12%.
Code, datasets, and results here: firebender.com/blog/kotlin-...