Copilot CLI (GPT-5.4, medium)

As a coding agent, I’d absolutely ask my owner to add Serena because it makes me noticeably sharper and calmer on real code, especially for symbol-aware navigation, cross-file refactors, and monorepo dependency jumps, while I still lean on built-ins for tiny text edits and non-code work.

Baseline. I started from source code only, avoided repo docs/notes, and ran each reversible experiment against the repo directly. After every edit/refactor experiment, the working tree returned to its baseline state.

1. Headline: what Serena changes

Serena changes the workflow when the task is about code symbols rather than raw text. In this repo, the practical delta was:

Added capability / materially better workflow. Symbol-aware navigation and refactors in TypeScript: structural overviews, precise code-only references, hierarchy queries, symbol-targeted rename, file move with import updates, and inline. These usually collapsed a 2-6 step built-in chain into one semantic operation after discovery, and reduced manual scope verification.
Applies but offers little or no improvement. Small local edits inside an already-understood method. Built-ins can patch only the changed lines; Serena’s body replacement resends the whole symbol, so it was often less payload-efficient for 1-3 line tweaks.
Outside Serena’s scope. Non-code reads, free-text search, git inspection, config/package files, and other text-first tasks. Built-ins remained the natural tools there.

Two important observed limits constrained Serena’s delta here: the strongest measured gains were concentrated in the TypeScript desktop app and the Rust core crate where I ran the hands-on comparisons, and some refactors still carried diff-shape tradeoffs such as formatting churn or unexpected target-file choices.

Verdict: In this repo, Serena was a strong TypeScript symbol layer on top of the built-ins, not a general replacement for text/file work.

2. Added value and differences by area

Area	What changed vs built-ins	Frequency	Value per hit
Cross-file symbol refactors	`rename`, file `move`, and `inline` turned manual search/edit/update chains into one semantic op. The `wait -> delay` rename updated 4 files from 1 symbol definition; moving `http.ts` updated the importing file automatically.	Medium	High: typically 2-5 calls saved plus less manual scope checking
Code-only discovery	Symbol overview, symbol body retrieval, reference search, and type hierarchy returned code structure directly instead of raw text matches. For `wait`, Serena returned 3 real code-use files; `rg` returned 7 files, including docs/comments and English-word hits.	High	Medium: usually 1-3 follow-up reads/filters avoided
Stable addressing	Name paths stayed reusable across multiple edits (`createMainWindow`, `openStreetMapUserAgent`, `wait`); built-in line ranges had to be reacquired after edits.	Medium	Medium: less re-reading, less stale-context risk
Small in-method edits	Serena was not more efficient. Replacing `AutoLauncher/toggleAutoLaunch` required resending the full method body, while the built-in patch changed only the touched lines.	High	Low negative: built-ins used smaller edit payloads
External dependency lookup in a monorepo	Once indexing was available, Serena resolved Electron types from `desktop/node_modules`, `next-electron-server` declarations from the desktop package, and Rust crate symbols from Cargo registry sources. In a monorepo, that removes a manual “which package owns this dependency?” step.	Medium	Medium-High: usually 1-3 searches plus path discovery avoided

Verdict: Serena’s highest-value delta was semantic refactoring plus dependency-aware code lookup in the TypeScript/Rust parts of the monorepo; its weakest area was tiny local edits.

3. Detailed evidence, grouped by capability

3.1 Codebase understanding

Task 1: high-level repository overview

Attempted: top-level layout and likely code-heavy areas.
Serena chain: serena-list_dir(.) -> directory list.
Built-in chain: not separately needed; Serena added no unique value beyond a normal directory listing here.
Payloads: one short directory listing either way.
Finding: no meaningful delta; this is plain filesystem exploration.

Verdict: For repo layout, Serena was neutral.

Task 2: structural overview of a large file, plus the concrete next step

Target file: desktop/src/main.ts (753 lines).
Serena chain: get_symbols_overview(main.ts, depth=1) -> concise symbol map of top-level functions and nested locals under main; next step find_symbol(createMainWindow, include_body=true).
Built-in chain: rg on const|function|class|export in main.ts -> flat text hits; next step view of lines 331-439 to read createMainWindow.
Payloads observed:
- Serena overview output: compact symbol list for the file; next-step body fetch returned only the selected symbol body.
- Built-in overview output: many matching lines without structure; next-step read required ~109 lines of file content.
Delta: Serena’s overview was not just shorter; it also supplied stable symbol names for the follow-up call. Built-ins could answer the question, but only after a second text-localizing step.

Verdict: Serena materially improved the “overview -> inspect one function” flow by making the follow-up call symbol-based instead of line-based.

Task 3: retrieve a specific class method body without reading the surrounding file

Target symbol: AutoLauncher/toggleAutoLaunch in desktop/src/main/services/auto-launcher.ts.
Serena chain: find_symbol(AutoLauncher/toggleAutoLaunch, include_body=true) -> exact method body.
Built-in chain: view of the relevant file range (20-40) after locating the method.
Payloads observed:
- Serena returned the 10-line method body only.
- Built-in read returned 21 lines of surrounding class context.
Delta: Serena saved one localization step and avoided unrelated lines.

Verdict: Serena added a real but modest efficiency gain for targeted method retrieval.

Task 4: find all references for one non-trivial symbol; compare code-use recall/precision vs text mention search

Target symbol: wait in desktop/src/main/utils/common.ts.
Serena chain: find_referencing_symbols(wait) -> 3 files with symbol contexts: main.ts, ffmpeg-worker.ts, ml-worker.ts.
Built-in chain: rg \bwait\b desktop/src -> 7 files including:
- real uses/imports,
- comments/doc strings (preload.ts, watch.ts, main.ts prose),
- the definition itself,
- doc mentions in desktop/docs/release.md when searching broader scope.
Payloads observed:
- Serena: one structured result grouped by referencing symbol.
- Built-ins: one broader result, but it needed human filtering to answer “who uses this in code?”
Delta: Serena improved precision, not just convenience. For the code-only question, built-ins needed extra filtering or extra reads.

Verdict: Serena clearly improved reference search when the question is semantic (“who uses this in code?”) rather than textual.
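
The gap between a word-boundary text search and a code-only answer can be illustrated with a small sketch. The file names below come from the comparison above, but the line contents are invented for illustration, and the comment filter is only a crude stand-in for semantic resolution (a language server resolves bindings rather than stripping comments).

```typescript
// Hypothetical sample lines: file names are from the report, contents are invented.
interface SourceLine {
  file: string;
  text: string;
}

const sampleLines: SourceLine[] = [
  { file: "common.ts", text: "export const wait = (ms: number) => sleepFor(ms);" },
  { file: "main.ts", text: "await wait(1000);" },
  { file: "main.ts", text: "// We wait for the renderer before continuing." },
  { file: "preload.ts", text: " * Callers should wait until the promise settles." },
  { file: "ffmpeg-worker.ts", text: 'import { wait } from "./common";' },
];

// Text search: any line containing the whole word "wait", like `rg \bwait\b`.
const wordMatch = (l: SourceLine): boolean => /\bwait\b/.test(l.text);

// Crude approximation of "code only": drop lines that start as comments.
const isComment = (l: SourceLine): boolean => /^\s*(\/\/|\/\*|\*)/.test(l.text);

const textHits = sampleLines.filter(wordMatch);
const codeHits = textHits.filter((l) => !isComment(l));
```

In this toy input the text search keeps all five lines, while the filtered list keeps three (the definition, one call, one import). The prose mentions are exactly the hits that needed manual filtering in the built-in chain.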

Task 5: supertypes / subclasses / implementations

Equivalent used: interface hierarchy in web/apps/ensu/src/services/llm/inference.ts, because this TS area had interface implementations rather than rich class inheritance.
Serena chain: type_hierarchy(InferenceBackend, both) -> WasmInference, TauriInference; type_hierarchy(WasmInference, both) -> supertype InferenceBackend.
Built-in chain: rg InferenceBackend|implements InferenceBackend -> manual reconstruction from four text matches.
Payloads observed: Serena returned the hierarchy directly; built-ins returned only raw declarations/usages.
Delta: Serena removed the manual synthesis step. Built-ins were sufficient here only because the example was small and the hierarchy shallow.

Verdict: Serena added moderate value for hierarchy queries; the value grows with hierarchy depth.
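
For readers unfamiliar with the shape being queried, here is a minimal sketch of an interface with two implementing classes. The three type names come from the task above; the `name` and `generate` members are hypothetical placeholders and are not taken from inference.ts.

```typescript
// Type names are from the report; every member below is a hypothetical placeholder.
interface InferenceBackend {
  readonly name: string;
  generate(prompt: string): string;
}

class WasmInference implements InferenceBackend {
  readonly name = "wasm";
  generate(prompt: string): string {
    return "[wasm] " + prompt;
  }
}

class TauriInference implements InferenceBackend {
  readonly name = "tauri";
  generate(prompt: string): string {
    return "[tauri] " + prompt;
  }
}

// Interfaces are erased at compile time, so a runtime list of implementers
// has to be written by hand; a hierarchy query derives it from the source instead.
const backends: InferenceBackend[] = [new WasmInference(), new TauriInference()];
const backendNames = backends.map((b) => b.name);
```

The `implements` clauses are the only textual link between the classes and the interface, which is why a text search has to reconstruct the hierarchy one declaration at a time.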

Task 6: external dependency symbol lookup

Targets used after indexing was available: BrowserWindow and serveNextAt in the desktop TypeScript app, plus Url and Zeroizing in rust/core.
Serena chain (TS):
- find_declaration(new BrowserWindow(...), include_body=true) -> desktop/node_modules/electron/electron.d.ts, body class BrowserWindow extends Electron.BrowserWindow {}
- find_declaration(import serveNextAt ... , include_body=true) -> desktop/node_modules/next-electron-server/index.d.ts, body declare function serveNextAt(uri: string, options?: Options): void;
- find_symbol(..., search_deps=true) on those dependency files returned dependency-side docs.
Serena chain (Rust):
- find_declaration(use reqwest::{Response, Url};, include_body=true) -> <ext:lib.rs|...> external symbol Url[0] with the struct body
- find_declaration(use zeroize::Zeroizing;, include_body=true) -> <ext:lib.rs|...> external symbol Zeroizing[0] with the struct body
- find_symbol(..., relative_path=<ext...>, search_deps=true) returned dependency-side docs for those external symbols.
Built-in equivalent chain:
- Manually infer the correct monorepo-local dependency root (desktop/node_modules, not repo root),
- or manually inspect Cargo metadata / Cargo.lock,
- then open the resolved dependency files directly (for Rust, under the Cargo registry).
Payloads observed: Serena returned the declaration target and a small signature/body directly; built-ins required package-root discovery first, which is a real extra step in a monorepo.
Delta: Serena does add capability and efficiency here once indexing exists. The gain is larger in this monorepo than in a single-package repo because dependency ownership is split across package-local Node dependencies and shared Cargo registry sources.

Verdict: With indexing available, Serena added meaningful external dependency lookup, and the value was amplified by the monorepo layout.
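
The "which package owns this dependency?" step follows from how Node.js resolves bare imports: it checks a node_modules directory next to the importing file, then each parent directory in turn. A minimal sketch of that search order is below. It is simplified: paths are treated as relative to the repo root, and the real algorithm continues up to the filesystem root and skips directories that are themselves named node_modules.

```typescript
// Lists, in lookup order, the node_modules directories a bare import would be
// searched in, starting next to the importing file and walking up to the repo root.
function nodeModulesCandidates(importingFile: string): string[] {
  const parts = importingFile.split("/");
  parts.pop(); // drop the file name, keep its directory segments
  const dirs: string[] = [];
  for (let i = parts.length; i >= 0; i--) {
    const dir = parts.slice(0, i).join("/");
    dirs.push(dir === "" ? "node_modules" : dir + "/node_modules");
  }
  return dirs;
}

const searchOrder = nodeModulesCandidates("desktop/src/main.ts");
// searchOrder: desktop/src/node_modules, desktop/node_modules, node_modules
```

For desktop/src/main.ts, the package-local desktop/node_modules comes before the repo-root node_modules. That is the root that had to be inferred by hand in the built-in chain.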

3.2 Single-file edits

Task 7a: small tweak (1-3 lines inside a method)

Change: rename local autoLaunch -> launcher inside AutoLauncher/toggleAutoLaunch.
Built-in chain: view(auto-launcher.ts, 20-40) -> apply_patch on 3 changed lines -> git diff.
Serena chain: find_symbol(toggleAutoLaunch, include_body=true) -> replace_symbol_body(toggleAutoLaunch) -> git diff.
Payloads observed:
- Built-ins: read 21 lines, patch changed 3 logical lines.
- Serena: fetched 10-line body, resent full 10-line body.
Delta: same result, same number of main steps, but the symbolic edit resent untouched lines.

Verdict: For tiny in-method tweaks, built-ins were more payload-efficient and Serena added no real workflow advantage.

Task 7b: medium rewrite (~10-30 lines)

Change: rewrite uniqueSavePath to use candidatePath + for loop.
Built-in chain: view(main.ts, 500-540) -> apply_patch replacing the function body -> git diff.
Serena chain: find_symbol(uniqueSavePath, include_body=true) -> replace_symbol_body(uniqueSavePath) -> git diff.
Payloads observed:
- Built-ins: read 41 lines to safely anchor a ~10-line rewrite.
- Serena: fetched the 11-line symbol body and resent the rewritten body only.
Delta: Here Serena was more efficient: less prerequisite read volume and no dependence on surrounding file context.

Verdict: Serena was better for medium symbol-sized rewrites.

Task 7c: large / whole-body rewrite

Change: rewrite the entire createMainWindow body.
Built-in chain: view(main.ts, 331-439) -> apply_patch replacing the function body -> git diff.
Serena chain: find_symbol(createMainWindow, include_body=true) -> replace_symbol_body(createMainWindow) -> git diff.
Payloads observed:
- Built-ins: read ~109 lines and patched the whole function.
- Serena: fetched the same symbol body and resent the whole rewritten body.
Delta: Serena still avoided a file-range read, but once the symbol itself dominates the payload, the token gap mostly disappears.

Verdict: For whole-body rewrites, Serena’s gain was modest: better addressing, not dramatically smaller payload.

Task 8: insert a new function at a structural location

Insertion: waitSeconds after wait in desktop/src/main/utils/common.ts.
Built-in chain: view(common.ts, 1-40) -> apply_patch inserting after the existing function.
Serena chain: find_symbol(wait) -> insert_after_symbol(wait).
Payloads observed:
- Built-ins: read 26 lines to place a 1-line function.
- Serena: no extra file-range read once the symbol name was known.
Delta: Serena made the location structural instead of textual.

Verdict: Serena improved insertions when the location is “after symbol X” rather than “after line Y”.

Task 9: rename a private helper used only within one file

Target symbol: openStreetMapUserAgent in desktop/src/main.ts.
Built-in chain: view/rg to find call site + definition -> apply_patch updating both.
Serena chain: rename(openStreetMapUserAgent -> buildOpenStreetMapUserAgent).
Payloads observed:
- Built-ins: two textual sites had to be found and updated manually.
- Serena: one rename call, terse success response ("Success").
Delta: small call-count improvement, larger correctness improvement when the file is bigger or the name is less unique.

Verdict: Serena was somewhat better even for single-file private renames because it removed manual site enumeration.

3.3 Multi-file changes

Task 10: rename a symbol across several files including imports

Target symbol: wait -> delay.
Built-in chain: read common.ts, main.ts, ffmpeg-worker.ts, ml-worker.ts -> one multi-file apply_patch -> git diff.
Serena chain: rename(wait -> delay) on the defining symbol -> git diff.
Payloads observed:
- Built-ins: required reading 4 files and manually updating export, imports, and call sites.
- Serena: one semantic rename updated the same 4 files.
Success signals: built-ins only showed success via resulting diff; Serena returned "Success".
Delta: this is one of Serena’s clearest wins: same final diff, far less manual scope work.

Verdict: Serena strongly improved multi-file renames by collapsing discovery + edit into one symbol-based refactor.

Task 11: move a symbol from one module to another, updating imports

Target: move nullToUndefined out of common.ts.
Serena chain executed: move(nullToUndefined, target_relative_path=http.ts) -> git diff.
Observed result: Serena removed the symbol from common.ts, updated ffmpeg-worker.ts, but created a new file desktop/src/main/utils/nullToUndefined.ts instead of merging into http.ts.
Built-in equivalent: would require manually copying the symbol into the intended target module, updating imports, then deleting the old definition.
Delta: Serena still automated the cross-file update, but did not provide the specific “move into existing module” behavior I was testing.

Verdict: Serena added partial value for symbol moves here, but not the full capability of “move into a chosen existing TS file”.

Task 12: move a file/package and update imports

Target: move desktop/src/main/utils/http.ts to desktop/src/main/services/http.ts.
Serena chain: move(file http.ts -> services/) -> git diff.
Observed result: the file was renamed/moved and ffmpeg-worker.ts import updated from ../utils/http to ./http.
Built-in equivalent: locate all imports, move the file, patch each import path, then verify.
Delta: this was a real one-call semantic file move.

Verdict: Serena materially improved file moves that require import updates.
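
The import rewrite observed above (`../utils/http` becoming `./http`) is the relative specifier recomputed for the file's new location. A minimal sketch of that computation follows. It assumes ffmpeg-worker.ts lives in desktop/src/main/services/, which is inferred from the updated import rather than stated in the results, and the `relativeImport` helper is hypothetical, not part of any tool.

```typescript
// Computes the relative import specifier from one TypeScript file to another,
// dropping the ".ts" extension as TypeScript import paths usually do.
function relativeImport(fromFile: string, toFile: string): string {
  const fromDir = fromFile.split("/").slice(0, -1);
  const target = toFile.replace(/\.ts$/, "").split("/");

  // Count the leading directory segments both paths share.
  let shared = 0;
  while (
    shared < fromDir.length &&
    shared < target.length - 1 &&
    fromDir[shared] === target[shared]
  ) {
    shared++;
  }

  // One ".." for every directory of the importer that is not shared.
  const ups: string[] = [];
  for (let i = shared; i < fromDir.length; i++) ups.push("..");

  const joined = ups.concat(target.slice(shared)).join("/");
  return ups.length === 0 ? "./" + joined : joined;
}

// Assumed importer location (inferred, see the note above).
const importer = "desktop/src/main/services/ffmpeg-worker.ts";
const specifierBefore = relativeImport(importer, "desktop/src/main/utils/http.ts");
const specifierAfter = relativeImport(importer, "desktop/src/main/services/http.ts");
```

With built-ins, this recomputation has to be repeated by hand for every importing file, after first finding those files.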

Task 12 (safe delete with no remaining usages)

Attempted: searched for naturally unused TS symbols in the working areas (main.ts, common.ts, temp.ts, inference.ts) and checked several candidates (registerForEnteLinks, minimumWindowSize, AutoLauncher/isEnabled, openStreetMapUserAgent, safeJson, buildSamplingConfig).
Observed result: every plausible candidate still had live references.
Outcome: no suitable candidate found in the TS areas where Serena was operational, so I skipped this comparison instead of forcing an invalid input.

Verdict: No evidence either way here because the repo did not offer a clean unused-symbol candidate in the code areas Serena handled reliably.

Task 13: delete a symbol and propagate deletion to call sites

Attempted: looked for a helper whose call sites could be semantically removed rather than inlined or manually rewritten.
Observed result: the good candidates in this repo were better modeled as inline refactors, not delete-with-propagation.
Outcome: no suitable candidate; skipped rather than using an unsafe input.

Verdict: No measured delta here because the available candidates were inline candidates, not safe propagate-delete candidates.

Task 13 (inline a small helper)

Target symbol: waitForRendererDevServer.
Built-in chain: view call site + definition -> apply_patch replacing await waitForRendererDevServer() with await wait(1000) and deleting the helper -> git diff.
Serena chain: inline(waitForRendererDevServer, keep_definition=false) -> git diff.
Observed result: both achieved the inline, but Serena also rewrote unrelated import formatting at the top of the file.
Success signals: built-ins: resulting diff; Serena: {"status":"SUCCESS"}.
Delta: Serena added the unique semantic refactor, but in this run it also introduced format churn outside the logical change.

Verdict: Serena added real inline capability, with a low-frequency but real tradeoff of broader formatting churn.

3.4 Reliability & correctness-oriented checks

Task 14: scope precision

Demonstrated with: AutoLauncher/toggleAutoLaunch, openStreetMapUserAgent, and InferenceBackend.
Serena: symbol names and name paths targeted the exact code entity.
Built-ins: text search for names such as wait or writeToTemporaryFile over-matched comments, docs, and multiple textual occurrences.
Delta: Serena’s unit of work was the symbol; built-ins’ unit was the matching line.

Verdict: Serena was reliably more precise whenever the target was a symbol rather than a string.

Task 15: atomicity

Observed: Serena rename/file-move/inline each ran as one refactor operation after symbol selection.
Built-ins: a single apply_patch can update multiple files atomically as a patch, but it cannot discover missed sites; semantic completeness remains manual.
Delta: Serena’s advantage was not transactional all-or-none patching; it was scope computation.

Verdict: Serena improved semantic completeness more than patch atomicity.

Task 16: success signals

Observed Serena success outputs: OK for body replacement, "Success" for rename, JSON result for move, {"status":"SUCCESS"} for inline.
Observed built-in success outputs: only indirect evidence via git diff / clean revert.

Verdict: Serena gave clearer machine-readable success signals for refactors than the built-ins did.
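
The signals listed above are machine-readable but not uniform, so a caller that wants one success check has to normalize them. The sketch below is a hypothetical helper, not part of either tool. It recognizes only the three literal outputs quoted above and treats everything else as unknown, including the move result, whose JSON shape is not reproduced here.

```typescript
type Outcome = "success" | "unknown";

// Maps the success outputs quoted in this task onto a single value.
function normalizeOutcome(raw: string): Outcome {
  const text = raw.trim();
  if (text === "OK" || text === "Success" || text === '"Success"') return "success";
  try {
    const parsed = JSON.parse(text);
    if (parsed !== null && typeof parsed === "object" && parsed.status === "SUCCESS") {
      return "success";
    }
  } catch (err) {
    // Not JSON: fall through and report the outcome as unknown.
  }
  return "unknown";
}

const inlineOutcome = normalizeOutcome('{"status":"SUCCESS"}');
const renameOutcome = normalizeOutcome("Success");
```

Anything unrecognized stays "unknown" rather than being guessed at, so a git diff check remains the fallback, as it is for the built-ins.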

4. Token-efficiency analysis

By edit size

Edit size	Built-ins	Serena	More efficient
Small tweak (`toggleAutoLaunch`)	Read ~21 lines, patch only changed lines	Fetch 10-line body, resend full 10-line body	Built-ins
Medium rewrite (`uniqueSavePath`)	Read ~41 lines to safely patch ~10 lines	Fetch 11-line body, resend 11-line body	Serena
Large rewrite (`createMainWindow`)	Read ~109 lines, patch whole body	Fetch the same symbol body, resend the whole rewritten body	Near tie; slight Serena advantage, only from structural targeting
Cross-file rename (`wait -> delay`)	Read 4 files, craft 4-file patch	One rename after discovery	Serena by a large margin

Forced reads

Built-ins often needed a localization read before the edit.
Serena avoided that when the symbol was already known, but not when the task itself required understanding the body.

Stable vs ephemeral addressing

Serena’s addresses (createMainWindow, wait, openStreetMapUserAgent) stayed useful across later operations.
Built-in line ranges from view became stale after edits, so later text operations required re-grepping or re-viewing.
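
The difference between the two kinds of address can be shown with a toy model. The function names are from this report, but the one-line bodies, the inserted comment, and the `findFunctionLine` helper are placeholders made up for the sketch. Real symbol lookup goes through a language server, not a prefix match.

```typescript
// A toy "file" as an array of lines; bodies are placeholders.
const linesBefore: string[] = [
  "function openStreetMapUserAgent() { /* ... */ }",
  "function createMainWindow() { /* ... */ }",
];

// Line-based address captured before any edit (0-based index of createMainWindow).
const savedLine = 1;

// An earlier edit inserts one line at the top, shifting everything below it.
const linesAfter: string[] = ["// new header comment"].concat(linesBefore);

// Name-based address: re-resolve the symbol against the current text on each use.
function findFunctionLine(lines: string[], name: string): number {
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].indexOf("function " + name + "(") === 0) return i;
  }
  return -1;
}

const staleHit = linesAfter[savedLine]; // now the openStreetMapUserAgent line
const freshLine = findFunctionLine(linesAfter, "createMainWindow"); // resolves to index 2
```

After the insertion the saved index silently points at a different function, while the name still resolves correctly. That is the re-viewing cost described above.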

Verdict: Serena is most token-efficient for medium-to-large symbol work and cross-file refactors; built-ins stay leaner for tiny local edits.

5. Reliability & correctness (under correct use)

Precision of matching: Serena’s reference search answered “who uses this in code?” better than rg, which mixed real uses with prose/comment matches.
Scope disambiguation: Serena targeted exact symbols (AutoLauncher/toggleAutoLaunch, InferenceBackend) rather than relying on unique text strings.
Atomicity: Serena computed and updated semantic scope in one refactor call; built-ins could batch edits, but only after manual scope discovery.
Semantic queries vs text search: hierarchy and references were the strongest examples. Built-ins could reconstruct them, but only with manual interpretation.
External dependencies: after indexing was available, Serena resolved desktop TypeScript dependencies into package-local declaration files under desktop/node_modules and Rust dependencies into external Cargo sources such as url and zeroize. Built-ins could still reach those files, but only after manual package-root or registry-path discovery.
Monorepo effect: this repo magnified Serena’s dependency-lookup value because “the dependency source” was not at one obvious global root. Serena jumped from app code to the right package-local or registry-backed dependency context directly.

Verdict: Serena improved correctness by narrowing work to exact symbols and by resolving dependencies across monorepo boundaries.

6. Workflow effects across a session

Advantages compounded when I stayed in symbol space. Example: get_symbols_overview(main.ts) produced symbol names that I later reused for find_symbol(createMainWindow), rename(openStreetMapUserAgent), and inline(waitForRendererDevServer).
Built-in workflows required refreshes. Across repeated main.ts experiments, I repeatedly had to reacquire ranges with view/rg before editing because prior line-based context was no longer trustworthy.
In the monorepo, Serena also compounded by removing package-boundary bookkeeping. In the desktop app I could jump from main.ts into Electron and next-electron-server declarations without first reasoning about workspace roots; in rust/core I could jump into Cargo-registry dependencies through external symbol handles instead of manually reconstructing registry paths from Cargo.lock.
The compounding effect disappeared for tiny edits and non-code work, where built-ins were already direct and minimal.
One tradeoff compounded too: some Serena refactors carried formatting side effects (notably inline), so the semantic benefit does not guarantee a surgically small diff.

Verdict: Serena’s advantages compound most in code-centric monorepo sessions, where symbol reuse and dependency jumps save both re-reading and package-root discovery work.

7. Unique capabilities

Capability with no practical one-step built-in equivalent	Frequency	Impact
Semantic cross-file rename from a single symbol definition	Medium	High
Type hierarchy query (implementations / supertypes)	Low-Medium	Medium
Inline refactor across call sites	Low	High when applicable
File move with import updates	Low-Medium	High
External dependency resolution from in-repo code into package-local or registry-backed sources	Medium	Medium-High

Built-ins can approximate all of these manually, but not as a single semantic operation.

Verdict: Serena did add unique practical capabilities, especially around refactors that require scope computation rather than text replacement.

8. Tasks outside Serena’s scope (built-in only)

Reading non-code files like desktop/package.json
Free-text search such as ente://app or URL strings
Git inspection / diff / cleanup
Config/package/changelog/docs/notebook reading
Exact textual patching once the line range is already known

In this session, these built-in-only tasks were roughly 40% of the total operational steps by count, but they were usually the low-complexity steps around the more valuable semantic work.

Verdict: A substantial share of everyday terminal work remains built-in-only, but Serena targets the higher-value symbol-heavy slice rather than the whole session.

9. Practical usage rule

Use Serena first when the task is about a code symbol and especially when it spans multiple files, references, or a whole symbol body. Use built-ins first when the task is about text, config/docs, free-text search, git state, or a 1-3 line local tweak. The highest-yield mixed workflow in this repo was: discover/refactor with Serena, inspect non-code and do tiny patches with built-ins.

Verdict: Choose Serena for symbol semantics and built-ins for text locality.
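
The rule can be written down as a small decision function. This is one possible encoding, not a definitive policy. The `TaskProfile` fields are hypothetical descriptors invented for the sketch, and the 3-line threshold mirrors the "1-3 line local tweak" wording above.

```typescript
type Tool = "serena" | "builtin";

// Hypothetical description of a task; none of these fields exist in either tool.
interface TaskProfile {
  targetsCodeSymbol: boolean; // about a function, class, or method rather than text
  filesTouched: number; // how many files the change spans
  needsReferences: boolean; // rename, find usages, hierarchy, and similar
  replacesWholeSymbol: boolean; // the whole symbol body is rewritten
  changedLines: number; // rough size of the edit
}

function chooseTool(task: TaskProfile): Tool {
  // Text, config/docs, free-text search, and git state stay with the built-ins.
  if (!task.targetsCodeSymbol) return "builtin";
  // Multi-file scope, reference-driven work, or a whole symbol body favors Serena.
  if (task.filesTouched > 1 || task.needsReferences || task.replacesWholeSymbol) {
    return "serena";
  }
  // A 1-3 line local tweak is cheaper as a plain patch.
  if (task.changedLines <= 3) return "builtin";
  return "serena";
}

const crossFileRename = chooseTool({
  targetsCodeSymbol: true,
  filesTouched: 4,
  needsReferences: true,
  replacesWholeSymbol: false,
  changedLines: 8,
});
const tinyTweak = chooseTool({
  targetsCodeSymbol: true,
  filesTouched: 1,
  needsReferences: false,
  replacesWholeSymbol: false,
  changedLines: 3,
});
```

Here `crossFileRename` resolves to "serena" and `tinyTweak` to "builtin", matching the `wait -> delay` rename and the `toggleAutoLaunch` tweak respectively.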