Results
This section presents the results of the evaluation.
We evaluated popular AI coding agents in representative scenarios, varying the agent, the programming language, and the codebase, to show that the results are not specific to a single setup.
All evaluations were conducted using the JetBrains-powered version of Serena, as it is the more powerful backend with a broader set of refactoring and navigation capabilities. The evaluation can easily be repeated with the LSP-based backend to assess its subset of capabilities.
You can run your own evaluation on a project of your choice by reusing our evaluation prompt.