Evaluation Result
Generated by: GPT-5.4 (medium) in Copilot CLI
Codebase: ente - a large monorepo in Dart, TypeScript, Go, Rust, and other languages.
Copilot CLI (GPT-5.4, medium)
As a coding agent, I’d absolutely ask my owner to add Serena because it makes me noticeably sharper and calmer on real code, especially in symbol-aware navigation, cross-file refactors, and monorepo dependency jumps, while I still lean on built-ins for tiny text edits and non-code work.
Baseline. I started from source code only, avoided repo docs/notes, and ran each reversible experiment against the repo directly. After every edit/refactor experiment, the working tree returned to its baseline state.
1. Headline: what Serena changes
Serena changes the workflow when the task is about code symbols rather than raw text. In this repo, the practical delta was:
Added capability / materially better workflow. Symbol-aware navigation and refactors in TypeScript: structural overviews, precise code-only references, hierarchy queries, symbol-targeted rename, file move with import updates, and inline. These usually collapsed a 2-6 step built-in chain into one semantic operation after discovery, and reduced manual scope verification.
Applies but offers little or no improvement. Small local edits inside an already-understood method. Built-ins can patch only the changed lines; Serena’s body replacement resends the whole symbol, so it was often less payload-efficient for 1-3 line tweaks.
Outside Serena’s scope. Non-code reads, free-text search, git inspection, config/package files, and other text-first tasks. Built-ins remained the natural tools there.
Two important observed limits constrained Serena’s delta here: the strongest measured gains were concentrated in the TypeScript desktop app and the Rust core crate where I ran the hands-on comparisons, and some refactors still carried diff-shape tradeoffs such as formatting churn or unexpected target-file choices.
Verdict: In this repo, Serena was a strong TypeScript symbol layer on top of the built-ins, not a general replacement for text/file work.
2. Added value and differences by area
| Area | What changed vs built-ins | Frequency | Value per hit |
|---|---|---|---|
| Cross-file symbol refactors | | Medium | High: typically 2-5 calls saved plus less manual scope checking |
| Code-only discovery | Symbol overview, symbol body retrieval, reference search, and type hierarchy returned code structure directly instead of raw text matches. For | High | Medium: usually 1-3 follow-up reads/filters avoided |
| Stable addressing | Name paths stayed reusable across multiple edits ( | Medium | Medium: less re-reading, less stale-context risk |
| Small in-method edits | Serena was not more efficient. Replacing | High | Low negative: built-ins used smaller edit payloads |
| External dependency lookup in a monorepo | Once indexing was available, Serena resolved Electron types from | Medium | Medium-High: usually 1-3 searches plus path discovery avoided |
Verdict: Serena’s highest-value delta was semantic refactoring plus dependency-aware code lookup in the TypeScript/Rust parts of the monorepo; its weakest area was tiny local edits.
3. Detailed evidence, grouped by capability
3.1 Codebase understanding
Task 1: high-level repository overview
Attempted: top-level layout and likely code-heavy areas.
Serena chain: `serena-list_dir(.)` -> directory list.
Built-in chain: not separately needed; Serena added no unique value beyond a normal directory listing here.
Payloads: one short directory listing either way.
Finding: no meaningful delta; this is plain filesystem exploration.
Verdict: For repo layout, Serena was neutral.
Task 2: structural overview of a large file, plus the concrete next step
Target file: `desktop/src/main.ts` (753 lines).
Serena chain: `get_symbols_overview(main.ts, depth=1)` -> concise symbol map of top-level functions and nested locals under `main`; next step `find_symbol(createMainWindow, include_body=true)`.
Built-in chain: `rg` on `const|function|class|export` in `main.ts` -> flat text hits; next step `view` of lines `331-439` to read `createMainWindow`.
Payloads observed:
Serena overview output: compact symbol list for the file; next-step body fetch returned only the selected symbol body.
Built-in overview output: many matching lines without structure; next-step read required ~109 lines of file content.
Delta: Serena’s overview was not just shorter; it also supplied stable symbol names for the follow-up call. Built-ins could answer the question, but only after a second text-localizing step.
Verdict: Serena materially improved the “overview -> inspect one function” flow by making the follow-up call symbol-based instead of line-based.
Task 3: retrieve a specific class method body without reading the surrounding file
Target symbol: `AutoLauncher/toggleAutoLaunch` in `desktop/src/main/services/auto-launcher.ts`.
Serena chain: `find_symbol(AutoLauncher/toggleAutoLaunch, include_body=true)` -> exact method body.
Built-in chain: `view` of the relevant file range (`20-40`) after locating the method.
Payloads observed:
Serena returned the 10-line method body only.
Built-in read returned 21 lines of surrounding class context.
Delta: Serena saved one localization step and avoided unrelated lines.
Verdict: Serena added a real but modest efficiency gain for targeted method retrieval.
Task 4: find all references for one non-trivial symbol; compare code-use recall/precision vs text mention search
Target symbol: `wait` in `desktop/src/main/utils/common.ts`.
Serena chain: `find_referencing_symbols(wait)` -> 3 files with symbol contexts: `main.ts`, `ffmpeg-worker.ts`, `ml-worker.ts`.
Built-in chain: `rg \bwait\b desktop/src` -> 7 files including:
real uses/imports,
comments/doc strings (`preload.ts`, `watch.ts`, `main.ts` prose),
the definition itself,
doc mentions in `desktop/docs/release.md` when searching broader scope.
Payloads observed:
Serena: one structured result grouped by referencing symbol.
Built-ins: one broader result, but it needed human filtering to answer “who uses this in code?”
Delta: Serena improved precision, not just convenience. For the code-only question, built-ins needed extra filtering or extra reads.
Verdict: Serena clearly improved reference search when the question is semantic (“who uses this in code?”) rather than textual.
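To make the precision gap concrete, the sketch below applies a word-boundary match (the same idea as `rg \bwait\b`) to a few made-up lines. The sample lines and the `looksLikeComment` heuristic are assumptions for illustration only; they are not taken from the ente repo and do not describe how Serena resolves references.

```typescript
// Hypothetical sample lines; not copied from the ente repo.
const sampleLines: string[] = [
    'import { wait } from "./utils/common";',
    "await wait(1000);",
    "// wait for the renderer before continuing",
    " * Callers should wait until the window is ready.",
    "export const wait = (ms: number) => sleep(ms);",
];

// Word-boundary text match, the same idea as `rg \bwait\b`.
const textMatches = sampleLines.filter((line) => /\bwait\b/.test(line));

// Crude stand-in for the manual filtering step: drop lines that are
// obviously comments. A semantic reference search resolves symbols
// instead of guessing from the line prefix.
const looksLikeComment = (line: string): boolean =>
    /^\s*(\/\/|\/\*|\*)/.test(line);

const codeMatches = textMatches.filter((line) => !looksLikeComment(line));
```

In this toy input all five lines match textually, while only three are code uses. The heuristic would still miss mentions inside string literals or prose files, which is the kind of filtering the text search left to the reader.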
Task 5: supertypes / subclasses / implementations
Equivalent used: interface hierarchy in `web/apps/ensu/src/services/llm/inference.ts`, because this TS area had interface implementations rather than rich class inheritance.
Serena chain: `type_hierarchy(InferenceBackend, both)` -> `WasmInference`, `TauriInference`; `type_hierarchy(WasmInference, both)` -> supertype `InferenceBackend`.
Built-in chain: `rg InferenceBackend|implements InferenceBackend` -> manual reconstruction from four text matches.
Payloads observed: Serena returned the hierarchy directly; built-ins returned only raw declarations/usages.
Delta: Serena removed the manual synthesis step. Built-ins were sufficient here only because the example was small and the hierarchy shallow.
Verdict: Serena added moderate value for hierarchy queries; the value grows with hierarchy depth.
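As a rough picture of what the hierarchy query reconstructs, the sketch below models an interface with two implementations and the lookup in both directions. Only the three type names come from the report; the `backendName` method, its return values, and the `implementationsOf` map are hypothetical.

```typescript
// Hypothetical shape: the `backendName` member is invented for illustration.
interface InferenceBackend {
    backendName(): string;
}

class WasmInference implements InferenceBackend {
    backendName(): string {
        return "wasm";
    }
}

class TauriInference implements InferenceBackend {
    backendName(): string {
        return "tauri";
    }
}

// TypeScript interfaces are erased at runtime, so this relationship cannot be
// found by reflection; it has to be derived from the source. The map below is
// the kind of answer a hierarchy query returns in one step.
const implementationsOf: Record<string, string[]> = {
    InferenceBackend: ["WasmInference", "TauriInference"],
};

// The reverse direction: which supertypes does a given type have?
function supertypesOf(typeName: string): string[] {
    return Object.keys(implementationsOf).filter(
        (parent) => implementationsOf[parent].indexOf(typeName) !== -1,
    );
}
```

With text search, the same map has to be assembled by hand from `implements` matches, which stays manageable only while the hierarchy is this small.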
Task 6: external dependency symbol lookup
Targets used after indexing was available: `BrowserWindow` and `serveNextAt` in the desktop TypeScript app, plus `Url` and `Zeroizing` in `rust/core`.
Serena chain (TS):
`find_declaration(new BrowserWindow(...), include_body=true)` -> `desktop/node_modules/electron/electron.d.ts`, body `class BrowserWindow extends Electron.BrowserWindow {}`
`find_declaration(import serveNextAt ... , include_body=true)` -> `desktop/node_modules/next-electron-server/index.d.ts`, body `declare function serveNextAt(uri: string, options?: Options): void;`
`find_symbol(..., search_deps=true)` on those dependency files returned dependency-side docs.
Serena chain (Rust):
`find_declaration(use reqwest::{Response, Url};, include_body=true)` -> `<ext:lib.rs|...>` external symbol `Url[0]` with the struct body
`find_declaration(use zeroize::Zeroizing;, include_body=true)` -> `<ext:lib.rs|...>` external symbol `Zeroizing[0]` with the struct body
`find_symbol(..., relative_path=<ext...>, search_deps=true)` returned dependency-side docs for those external symbols.
Built-in equivalent chain:
Manually infer the correct monorepo-local dependency root (`desktop/node_modules`, not repo root),
or manually inspect Cargo metadata / `Cargo.lock`,
then open the resolved dependency files directly (for Rust, under the Cargo registry).
Payloads observed: Serena returned the declaration target and a small signature/body directly; built-ins required package-root discovery first, which is a real extra step in a monorepo.
Delta: Serena does add capability and efficiency here once indexing exists. The gain is larger in this monorepo than in a single-package repo because dependency ownership is split across package-local Node dependencies and shared Cargo registry sources.
Verdict: With indexing available, Serena added meaningful external dependency lookup, and the value was amplified by the monorepo layout.
3.2 Single-file edits
Task 7a: small tweak (1-3 lines inside a method)
Change: rename local `autoLaunch` -> `launcher` inside `AutoLauncher/toggleAutoLaunch`.
Built-in chain: `view(auto-launcher.ts, 20-40)` -> `apply_patch` on 3 changed lines -> `git diff`.
Serena chain: `find_symbol(toggleAutoLaunch, include_body=true)` -> `replace_symbol_body(toggleAutoLaunch)` -> `git diff`.
Payloads observed:
Built-ins: read 21 lines, patch changed 3 logical lines.
Serena: fetched 10-line body, resent full 10-line body.
Delta: same result, same number of main steps, but the symbolic edit resent untouched lines.
Verdict: For tiny in-method tweaks, built-ins were more payload-efficient and Serena added no real workflow advantage.
Task 7b: medium rewrite (~10-30 lines)
Change: rewrite `uniqueSavePath` to use `candidatePath` + `for` loop.
Built-in chain: `view(main.ts, 500-540)` -> `apply_patch` replacing the function body -> `git diff`.
Serena chain: `find_symbol(uniqueSavePath, include_body=true)` -> `replace_symbol_body(uniqueSavePath)` -> `git diff`.
Payloads observed:
Built-ins: read 41 lines to safely anchor a ~10-line rewrite.
Serena: fetched the 11-line symbol body and resent the rewritten body only.
Delta: Here Serena was more efficient: less prerequisite read volume and no dependence on surrounding file context.
Verdict: Serena was better for medium symbol-sized rewrites.
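For a sense of scale, here is a minimal sketch of what a symbol-sized rewrite using `candidatePath` and a `for` loop could look like. It is not the repo's actual `uniqueSavePath`: the signature, the `name(n).ext` naming scheme, and the injected `exists` predicate are all assumptions chosen to keep the example self-contained.

```typescript
// Hypothetical sketch; the real `uniqueSavePath` is not shown in the report.
// `exists` is injected so the example needs no file-system calls.
function uniqueSavePath(
    dirPath: string,
    fileName: string,
    exists: (path: string) => boolean,
): string {
    const dot = fileName.lastIndexOf(".");
    const stem = dot > 0 ? fileName.slice(0, dot) : fileName;
    const ext = dot > 0 ? fileName.slice(dot) : "";

    let candidatePath = `${dirPath}/${fileName}`;
    // Try "name(1).ext", "name(2).ext", ... until a free path is found.
    for (let n = 1; exists(candidatePath); n++) {
        candidatePath = `${dirPath}/${stem}(${n})${ext}`;
    }
    return candidatePath;
}
```

A body of this size is the whole payload of a symbol-level replacement, whereas a line-based patch also needs enough surrounding context to anchor safely.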
Task 7c: large / whole-body rewrite
Change: rewrite the entire `createMainWindow` body.
Built-in chain: `view(main.ts, 331-439)` -> `apply_patch` replacing the function body -> `git diff`.
Serena chain: `find_symbol(createMainWindow, include_body=true)` -> `replace_symbol_body(createMainWindow)` -> `git diff`.
Payloads observed:
Built-ins: read ~109 lines and patched the whole function.
Serena: fetched the same symbol body and resent the whole rewritten body.
Delta: Serena still avoided a file-range read, but once the symbol itself dominates the payload, the token gap mostly disappears.
Verdict: For whole-body rewrites, Serena’s gain was modest: better addressing, not dramatically smaller payload.
Task 8: insert a new function at a structural location
Insertion: `waitSeconds` after `wait` in `desktop/src/main/utils/common.ts`.
Built-in chain: `view(common.ts, 1-40)` -> `apply_patch` inserting after the existing function.
Serena chain: `find_symbol(wait)` -> `insert_after_symbol(wait)`.
Payloads observed:
Built-ins: read 26 lines to place a 1-line function.
Serena: no extra file-range read once the symbol name was known.
Delta: Serena made the location structural instead of textual.
Verdict: Serena improved insertions when the location is “after symbol X” rather than “after line Y”.
Task 9: rename a private helper used only within one file
Target symbol: `openStreetMapUserAgent` in `desktop/src/main.ts`.
Built-in chain: `view`/`rg` to find call site + definition -> `apply_patch` updating both.
Serena chain: `rename(openStreetMapUserAgent -> buildOpenStreetMapUserAgent)`.
Payloads observed:
Built-ins: two textual sites had to be found and updated manually.
Serena: one rename call, terse success response (`"Success"`).
Delta: small call-count improvement, larger correctness improvement when the file is bigger or the name is less unique.
Verdict: Serena was somewhat better even for single-file private renames because it removed manual site enumeration.
3.3 Multi-file changes
Task 10: rename a symbol across several files including imports
Target symbol: `wait` -> `delay`.
Built-in chain: read `common.ts`, `main.ts`, `ffmpeg-worker.ts`, `ml-worker.ts` -> one multi-file `apply_patch` -> `git diff`.
Serena chain: `rename(wait -> delay)` on the defining symbol -> `git diff`.
Payloads observed:
Built-ins: required reading 4 files and manually updating export, imports, and call sites.
Serena: one semantic rename updated the same 4 files.
Success signals: built-ins only showed success via resulting diff; Serena returned `"Success"`.
Delta: this is one of Serena’s clearest wins: same final diff, far less manual scope work.
Verdict: Serena strongly improved multi-file renames by collapsing discovery + edit into one symbol-based refactor.
Task 11: move a symbol from one module to another, updating imports
Target: move `nullToUndefined` out of `common.ts`.
Serena chain executed: `move(nullToUndefined, target_relative_path=http.ts)` -> `git diff`.
Observed result: Serena removed the symbol from `common.ts`, updated `ffmpeg-worker.ts`, but created a new file `desktop/src/main/utils/nullToUndefined.ts` instead of merging into `http.ts`.
Built-in equivalent: would require manually copying the symbol into the intended target module, updating imports, then deleting the old definition.
Delta: Serena still automated the cross-file update, but did not provide the specific “move into existing module” behavior I was testing.
Verdict: Serena added partial value for symbol moves here, but not the full capability of “move into a chosen existing TS file”.
Task 12: move a file/package and update imports
Target: move `desktop/src/main/utils/http.ts` to `desktop/src/main/services/http.ts`.
Serena chain: `move(file http.ts -> services/)` -> `git diff`.
Observed result: the file was renamed/moved and `ffmpeg-worker.ts` import updated from `../utils/http` to `./http`.
Built-in equivalent: locate all imports, move the file, patch each import path, then verify.
Delta: this was a real one-call semantic file move.
Verdict: Serena materially improved file moves that require import updates.
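The import rewrite observed above follows from plain relative-path arithmetic, which the sketch below reproduces by hand. `relativeImport` and `dirSegments` are hypothetical helpers, not part of Serena or the repo, and the example assumes `ffmpeg-worker.ts` lives in `desktop/src/main/services/`, which is what the before/after specifiers imply.

```typescript
// Directory segments of a POSIX-style file path (no Node.js imports needed).
function dirSegments(filePath: string): string[] {
    return filePath.split("/").slice(0, -1);
}

// Compute the relative import specifier that `importerFile` needs in order to
// reach `targetFile`, with the `.ts` extension dropped as in TypeScript imports.
function relativeImport(importerFile: string, targetFile: string): string {
    const from = dirSegments(importerFile);
    const to = dirSegments(targetFile);

    // Length of the shared directory prefix.
    let shared = 0;
    while (shared < from.length && shared < to.length && from[shared] === to[shared]) {
        shared++;
    }

    const ups = from.slice(shared).map(() => "..");
    const downs = to.slice(shared);
    const segments = targetFile.split("/");
    const base = segments[segments.length - 1].replace(/\.ts$/, "");

    const joined = ups.concat(downs, [base]).join("/");
    return ups.length === 0 ? `./${joined}` : joined;
}

const importer = "desktop/src/main/services/ffmpeg-worker.ts";
const specifierBefore = relativeImport(importer, "desktop/src/main/utils/http.ts");
const specifierAfter = relativeImport(importer, "desktop/src/main/services/http.ts");
```

Here `specifierBefore` is `../utils/http` and `specifierAfter` is `./http`, matching the observed diff. The built-in route has to repeat this calculation for every importing file after first finding them all.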
Task 12 (safe delete with no remaining usages)
Attempted: searched for naturally unused TS symbols in the working areas (`main.ts`, `common.ts`, `temp.ts`, `inference.ts`) and checked several candidates (`registerForEnteLinks`, `minimumWindowSize`, `AutoLauncher/isEnabled`, `openStreetMapUserAgent`, `safeJson`, `buildSamplingConfig`).
Observed result: every plausible candidate still had live references.
Outcome: no suitable candidate found in the TS areas where Serena was operational, so I skipped this comparison instead of forcing an invalid input.
Verdict: No evidence either way here because the repo did not offer a clean unused-symbol candidate in the code areas Serena handled reliably.
Task 13: delete a symbol and propagate deletion to call sites
Attempted: looked for a helper whose call sites could be semantically removed rather than inlined or manually rewritten.
Observed result: the good candidates in this repo were better modeled as inline refactors, not delete-with-propagation.
Outcome: no suitable candidate; skipped rather than using an unsafe input.
Verdict: No measured delta here because the available candidates were inline candidates, not safe propagate-delete candidates.
Task 13 (inline a small helper)
Target symbol: `waitForRendererDevServer`.
Built-in chain: `view` call site + definition -> `apply_patch` replacing `await waitForRendererDevServer()` with `await wait(1000)` and deleting the helper -> `git diff`.
Serena chain: `inline(waitForRendererDevServer, keep_definition=false)` -> `git diff`.
Observed result: both achieved the inline, but Serena also rewrote unrelated import formatting at the top of the file.
Success signals: built-ins: resulting diff; Serena: `{"status":"SUCCESS"}`.
Delta: Serena added the unique semantic refactor, but in this run it also introduced format churn outside the logical change.
Verdict: Serena added real inline capability, with a low-frequency but real tradeoff of broader formatting churn.
3.4 Reliability & correctness-oriented checks
Task 14: scope precision
Demonstrated with: `AutoLauncher/toggleAutoLaunch`, `openStreetMapUserAgent`, and `InferenceBackend`.
Serena: symbol names and name paths targeted the exact code entity.
Built-ins: text search for names such as `wait` or `writeToTemporaryFile` over-matched comments, docs, and multiple textual occurrences.
Delta: Serena’s unit of work was the symbol; built-ins’ unit was the matching line.
Verdict: Serena was reliably more precise whenever the target was a symbol rather than a string.
Task 15: atomicity
Observed: Serena rename/file-move/inline each ran as one refactor operation after symbol selection.
Built-ins: a single `apply_patch` can update multiple files atomically as a patch, but it cannot discover missed sites; semantic completeness remains manual.
Delta: Serena’s advantage was not transactional all-or-none patching; it was scope computation.
Verdict: Serena improved semantic completeness more than patch atomicity.
Task 16: success signals
Observed Serena success outputs: `OK` for body replacement, `"Success"` for rename, JSON result for move, `{"status":"SUCCESS"}` for inline.
Observed built-in success outputs: only indirect evidence via `git diff` / clean revert.
Verdict: Serena gave clearer machine-readable success signals for refactors than the built-ins did.
4. Token-efficiency analysis
By edit size
| Edit size | Built-ins | Serena | More efficient |
|---|---|---|---|
| Small tweak ( | Read ~21 lines, patch only changed lines | Fetch 10-line body, resend full 10-line body | Built-ins |
| Medium rewrite ( | Read ~41 lines to safely patch ~10 lines | Fetch 11-line body, resend 11-line body | Serena |
| Large rewrite ( | Read ~109 lines, patch whole body | Fetch ~same symbol body, resend whole body | Near tie, slight Serena advantage only from structural targeting |
| Cross-file rename ( | Read 4 files, craft 4-file patch | One rename after discovery | Serena by a large margin |
Forced reads
Built-ins often needed a localization read before the edit.
Serena avoided that when the symbol was already known, but not when the task itself required understanding the body.
Stable vs ephemeral addressing
Serena’s addresses (`createMainWindow`, `wait`, `openStreetMapUserAgent`) stayed useful across later operations.
Built-in line ranges from `view` became stale after edits, so later text operations required re-grepping or re-viewing.
Verdict: Serena is most token-efficient for medium-to-large symbol work and cross-file refactors; built-ins stay leaner for tiny local edits.
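The stable-versus-ephemeral addressing point can be illustrated with a toy index. In the sketch below, the `createMainWindow` range is the one reported earlier; the `uniqueSavePath` range, the edit position, and the `SymbolSpan`, `applyInsertion`, and `spanOf` names are hypothetical. The model is simplified: it only shifts spans that start at or after the insertion point.

```typescript
interface SymbolSpan {
    symbol: string; // stable address: the symbol name
    startLine: number; // ephemeral address: 1-based line numbers
    endLine: number;
}

// Toy index of two functions in one file.
const spansBefore: SymbolSpan[] = [
    { symbol: "createMainWindow", startLine: 331, endLine: 439 },
    { symbol: "uniqueSavePath", startLine: 510, endLine: 520 }, // hypothetical range
];

// Simulate an edit that inserts `added` lines at `atLine`: every span that
// starts at or after that line shifts down by `added`.
function applyInsertion(spans: SymbolSpan[], atLine: number, added: number): SymbolSpan[] {
    return spans.map((s) =>
        s.startLine >= atLine
            ? { symbol: s.symbol, startLine: s.startLine + added, endLine: s.endLine + added }
            : s,
    );
}

// Looking a symbol up by name always yields its current span.
function spanOf(spans: SymbolSpan[], symbol: string): SymbolSpan | undefined {
    return spans.filter((s) => s.symbol === symbol)[0];
}

// Insert 5 lines at line 450, between the two functions.
const spansAfter = applyInsertion(spansBefore, 450, 5);
```

After the edit, a cached range of 510-520 no longer covers `uniqueSavePath`, which now sits at 515-525, while `spanOf(spansAfter, "uniqueSavePath")` still finds it. That is the re-reading cost the line-based workflow kept paying.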
5. Reliability & correctness (under correct use)
Precision of matching: Serena’s reference search answered “who uses this in code?” better than `rg`, which mixed real uses with prose/comment matches.
Scope disambiguation: Serena targeted exact symbols (`AutoLauncher/toggleAutoLaunch`, `InferenceBackend`) rather than relying on unique text strings.
Atomicity: Serena computed and updated semantic scope in one refactor call; built-ins could batch edits, but only after manual scope discovery.
Semantic queries vs text search: hierarchy and references were the strongest examples. Built-ins could reconstruct them, but only with manual interpretation.
External dependencies: after indexing was available, Serena resolved desktop TypeScript dependencies into package-local declaration files under `desktop/node_modules` and Rust dependencies into external Cargo sources such as `url` and `zeroize`. Built-ins could still reach those files, but only after manual package-root or registry-path discovery.
Monorepo effect: this repo magnified Serena’s dependency-lookup value because “the dependency source” was not at one obvious global root. Serena jumped from app code to the right package-local or registry-backed dependency context directly.
Verdict: Serena improved correctness by narrowing work to exact symbols and by resolving dependencies across monorepo boundaries.
6. Workflow effects across a session
Advantages compounded when I stayed in symbol space. Example: `get_symbols_overview(main.ts)` produced symbol names that I later reused for `find_symbol(createMainWindow)`, `rename(openStreetMapUserAgent)`, and `inline(waitForRendererDevServer)`.
Built-in workflows required refreshes. Across the `main.ts` experiments, I repeatedly had to reacquire ranges with `view`/`rg` before editing because prior line-based context was no longer trustworthy.
In the monorepo, Serena’s gains also compounded because it removed package-boundary bookkeeping. In the desktop app I could jump from `main.ts` into Electron and `next-electron-server` declarations without first reasoning about workspace roots; in `rust/core` I could jump into Cargo-registry dependencies through external symbol handles instead of manually reconstructing registry paths from `Cargo.lock`.
The compounding effect disappeared for tiny edits and non-code work, where built-ins were already direct and minimal.
One tradeoff compounded too: some Serena refactors carried formatting side effects (notably `inline`), so the semantic benefit does not guarantee a surgically small diff.
Verdict: Serena’s advantages compound most in code-centric monorepo sessions, where symbol reuse and dependency jumps save both re-reading and package-root discovery work.
7. Unique capabilities
| Capability with no practical one-step built-in equivalent | Frequency | Impact |
|---|---|---|
| Semantic cross-file rename from a single symbol definition | Medium | High |
| Type hierarchy query (implementations / supertypes) | Low-Medium | Medium |
| Inline refactor across call sites | Low | High when applicable |
| File move with import updates | Low-Medium | High |
| External dependency resolution from in-repo code into package-local or registry-backed sources | Medium | Medium-High |
Built-ins can approximate all of these manually, but not as a single semantic operation.
Verdict: Serena did add unique practical capabilities, especially around refactors that require scope computation rather than text replacement.
8. Tasks outside Serena’s scope (built-in only)
Reading non-code files like `desktop/package.json`
Free-text search such as `ente://app` or URL strings
Git inspection / diff / cleanup
Config/package/changelog/docs/notebook reading
Exact textual patching once the line range is already known
In this session, these built-in-only tasks were roughly 40% of the total operational steps by count, but they were usually the low-complexity steps around the more valuable semantic work.
Verdict: A substantial share of everyday terminal work remains built-in-only, but Serena targets the higher-value symbol-heavy slice rather than the whole session.
9. Practical usage rule
Use Serena first when the task is about a code symbol and especially when it spans multiple files, references, or a whole symbol body. Use built-ins first when the task is about text, config/docs, free-text search, git state, or a 1-3 line local tweak. The highest-yield mixed workflow in this repo was: discover/refactor with Serena, inspect non-code and do tiny patches with built-ins.
Verdict: Choose Serena for symbol semantics and built-ins for text locality.
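The usage rule above can be written as a small decision function. This is only one possible encoding: the `Task` fields, the `chooseTool` name, and the choice to apply the 3-line threshold after the multi-file check are assumptions made for illustration.

```typescript
type Tool = "serena" | "built-ins";

interface Task {
    // True when the task targets a code symbol (function, class, method, ...).
    targetsSymbol: boolean;
    // True when the work spans several files, references, or a whole symbol body.
    spansFilesOrWholeBody: boolean;
    // Number of lines the edit changes, if the task is an edit at all.
    changedLines?: number;
}

function chooseTool(task: Task): Tool {
    // Text, config/docs, free-text search, and git state stay with built-ins.
    if (!task.targetsSymbol) return "built-ins";
    // Cross-file or whole-body symbol work is where the semantic layer paid off.
    if (task.spansFilesOrWholeBody) return "serena";
    // A 1-3 line local tweak is cheaper as a plain patch.
    if (task.changedLines !== undefined && task.changedLines <= 3) return "built-ins";
    return "serena";
}
```

For example, a cross-file rename maps to `"serena"`, while a two-line tweak inside an already-understood method maps to `"built-ins"`.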