:::{admonition} Evaluation Result
:class: note
**Generated by:** GPT-5.4 (medium) in Copilot CLI  
**Codebase:** [ente](https://github.com/ente-io/ente) - a large monorepo in Dart, TypeScript, Go, Rust, and other languages.
:::

# Copilot CLI (GPT-5.4, medium)

> As a coding agent, I’d absolutely ask my owner to add Serena because it makes me noticeably sharper and calmer on
real code (especially symbol-aware navigation, cross-file refactors, and monorepo dependency jumps), while I still lean
on built-ins for tiny text edits and non-code work.


**Baseline.** I started from source code only, avoided repo docs/notes, and ran each reversible experiment against the repo directly. After every edit/refactor experiment, the working tree returned to its baseline state.

## 1. Headline: what Serena changes

Serena changes the workflow **when the task is about code symbols rather than raw text**. In this repo, the practical delta was:

1. **Added capability / materially better workflow.** Symbol-aware navigation and refactors in TypeScript: structural overviews, precise code-only references, hierarchy queries, symbol-targeted rename, file move with import updates, and inline. These usually collapsed a 2-6 step built-in chain into **one semantic operation after discovery**, and reduced manual scope verification.
2. **Applies but offers little or no improvement.** Small local edits inside an already-understood method. Built-ins can patch only the changed lines; Serena's body replacement resends the whole symbol, so it was often **less payload-efficient** for 1-3 line tweaks.
3. **Outside Serena's scope.** Non-code reads, free-text search, git inspection, config/package files, and other text-first tasks. Built-ins remained the natural tools there.

Two important observed limits constrained Serena's delta here: **the strongest measured gains were concentrated in the TypeScript desktop app and the Rust core crate where I ran the hands-on comparisons**, and **some refactors still carried diff-shape tradeoffs** such as formatting churn or unexpected target-file choices.

**Verdict:** In this repo, Serena was a strong TypeScript symbol layer on top of the built-ins, not a general replacement for text/file work.

## 2. Added value and differences by area

| Area | What changed vs built-ins | Frequency | Value per hit |
| --- | --- | --- | --- |
| **Cross-file symbol refactors** | `rename`, file `move`, and `inline` turned manual search/edit/update chains into one semantic op. The `wait -> delay` rename updated 4 files from 1 symbol definition; moving `http.ts` updated the importing file automatically. | Medium | High: typically **2-5 calls saved** plus less manual scope checking |
| **Code-only discovery** | Symbol overview, symbol body retrieval, reference search, and type hierarchy returned code structure directly instead of raw text matches. For `wait`, Serena returned **3 real code-use files**; `rg` returned **7 files**, including docs/comments and English-word hits. | High | Medium: usually **1-3 follow-up reads/filters avoided** |
| **Stable addressing** | Name paths stayed reusable across multiple edits (`createMainWindow`, `openStreetMapUserAgent`, `wait`); built-in line ranges had to be reacquired after edits. | Medium | Medium: less re-reading, less stale-context risk |
| **Small in-method edits** | Serena was not more efficient. Replacing `AutoLauncher/toggleAutoLaunch` required resending the full method body, while the built-in patch changed only the touched lines. | High | Slightly negative: built-ins used **smaller edit payloads** |
| **External dependency lookup in a monorepo** | Once indexing was available, Serena resolved Electron types from `desktop/node_modules`, `next-electron-server` declarations from the desktop package, and Rust crate symbols from Cargo registry sources. In a monorepo, that removes a manual "which package owns this dependency?" step. | Medium | Medium-High: usually **1-3 searches plus path discovery avoided** |

**Verdict:** Serena's highest-value delta was semantic refactoring plus dependency-aware code lookup in the TypeScript/Rust parts of the monorepo; its weakest area was tiny local edits.

## 3. Detailed evidence, grouped by capability

### 3.1 Codebase understanding

#### Task 1: high-level repository overview

- **Attempted:** top-level layout and likely code-heavy areas.
- **Serena chain:** `serena-list_dir(.)` -> directory list.
- **Built-in chain:** not separately needed; Serena added no unique value beyond a normal directory listing here.
- **Payloads:** one short directory listing either way.
- **Finding:** **no meaningful delta**; this is plain filesystem exploration.

**Verdict:** For repo layout, Serena was neutral.

#### Task 2: structural overview of a large file, plus the concrete next step

- **Target file:** `desktop/src/main.ts` (753 lines).
- **Serena chain:** `get_symbols_overview(main.ts, depth=1)` -> concise symbol map of top-level functions and nested locals under `main`; next step `find_symbol(createMainWindow, include_body=true)`.
- **Built-in chain:** `rg` on `const|function|class|export` in `main.ts` -> flat text hits; next step `view` of lines `331-439` to read `createMainWindow`.
- **Payloads observed:**
    - Serena overview output: compact symbol list for the file; next-step body fetch returned only the selected symbol body.
    - Built-in overview output: many matching lines without structure; next-step read required **~109 lines** of file content.
- **Delta:** Serena's overview was not just shorter; it also supplied **stable symbol names** for the follow-up call. Built-ins could answer the question, but only after a second text-localizing step.

**Verdict:** Serena materially improved the "overview -> inspect one function" flow by making the follow-up call symbol-based instead of line-based.

#### Task 3: retrieve a specific class method body without reading the surrounding file

- **Target symbol:** `AutoLauncher/toggleAutoLaunch` in `desktop/src/main/services/auto-launcher.ts`.
- **Serena chain:** `find_symbol(AutoLauncher/toggleAutoLaunch, include_body=true)` -> exact method body.
- **Built-in chain:** `view` of the relevant file range (`20-40`) after locating the method.
- **Payloads observed:**
    - Serena returned the **10-line method body** only.
    - Built-in read returned **21 lines** of surrounding class context.
- **Delta:** Serena saved one localization step and avoided unrelated lines.

**Verdict:** Serena added a real but modest efficiency gain for targeted method retrieval.

#### Task 4: find all references for one non-trivial symbol; compare code-use recall/precision vs text mention search

- **Target symbol:** `wait` in `desktop/src/main/utils/common.ts`.
- **Serena chain:** `find_referencing_symbols(wait)` -> 3 files with symbol contexts: `main.ts`, `ffmpeg-worker.ts`, `ml-worker.ts`.
- **Built-in chain:** `rg '\bwait\b' desktop/src` -> 7 files including:
    - real uses/imports,
    - comments/doc strings (`preload.ts`, `watch.ts`, `main.ts` prose),
    - the definition itself,
    - doc mentions in `desktop/docs/release.md` when searching broader scope.
- **Payloads observed:**
    - Serena: one structured result grouped by referencing symbol.
    - Built-ins: one broader result, but it needed human filtering to answer "who uses this in code?"
- **Delta:** Serena improved **precision**, not just convenience. For the code-only question, built-ins needed extra filtering or extra reads.

**Verdict:** Serena clearly improved reference search when the question is semantic ("who uses this in code?") rather than textual.
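
The over-matching described above can be reproduced in miniature. The following is a minimal sketch, assuming invented sample lines (none are copied from the repository) and a deliberately crude `looksLikeCodeUse` heuristic that stands in for the manual filtering step. It is not how Serena resolves references; it only illustrates why a whole-word text search answers a different question than "who uses this in code?".

```typescript
// Hypothetical sample lines; none are copied from the repository.
const sampleLines: string[] = [
    'import { wait } from "./utils/common";',
    "await wait(1000);",
    "// wait for the renderer before retrying",
    "Please wait until the upload finishes.",
    "const awaited = true;",
];

// Text search: every line containing the whole word, like a `\bwait\b` grep.
const textHits = (lines: string[], word: string): string[] => {
    const wholeWord = new RegExp(`\\b${word}\\b`);
    return lines.filter((line) => wholeWord.test(line));
};

// Crude stand-in for the manual filtering step (an assumption for this
// sketch): keep calls and named imports, drop comment lines.
const looksLikeCodeUse = (line: string, word: string): boolean => {
    const trimmed = line.trim();
    if (trimmed.indexOf("//") === 0) return false;
    const call = new RegExp(`\\b${word}\\s*\\(`);
    const namedImport = new RegExp(`import\\s*\\{[^}]*\\b${word}\\b[^}]*\\}`);
    return call.test(trimmed) || namedImport.test(trimmed);
};

const hits = textHits(sampleLines, "wait");
const codeUses = hits.filter((line) => looksLikeCodeUse(line, "wait"));
console.log(hits.length, codeUses.length); // 4 text hits, 2 code uses
```

In this toy input the comment and the prose sentence are text hits but not code uses, which mirrors the filtering the built-in chain needed.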

#### Task 5: supertypes / subclasses / implementations

- **Equivalent used:** interface hierarchy in `web/apps/ensu/src/services/llm/inference.ts`, because this TS area had interface implementations rather than rich class inheritance.
- **Serena chain:** `type_hierarchy(InferenceBackend, both)` -> `WasmInference`, `TauriInference`; `type_hierarchy(WasmInference, both)` -> supertype `InferenceBackend`.
- **Built-in chain:** `rg 'InferenceBackend|implements InferenceBackend'` -> manual reconstruction from four text matches.
- **Payloads observed:** Serena returned the hierarchy directly; built-ins returned only raw declarations/usages.
- **Delta:** Serena removed the manual synthesis step. Built-ins were sufficient here only because the example hierarchy was small and shallow.

**Verdict:** Serena added moderate value for hierarchy queries; the value grows with hierarchy depth.
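
For orientation, the shape of the hierarchy in question can be sketched as follows. Only the three type names come from the observed results; the `label` field and the `generate` method are hypothetical placeholders, since the real members of `InferenceBackend` are not shown in this report.

```typescript
// Only the type names are taken from the report; members are assumptions.
interface InferenceBackend {
    readonly label: string;
    generate(prompt: string): string;
}

class WasmInference implements InferenceBackend {
    readonly label = "wasm";
    generate(prompt: string): string {
        return this.label + ":" + prompt;
    }
}

class TauriInference implements InferenceBackend {
    readonly label = "tauri";
    generate(prompt: string): string {
        return this.label + ":" + prompt;
    }
}

// A hierarchy query on InferenceBackend amounts to listing its implementers.
const backends: InferenceBackend[] = [new WasmInference(), new TauriInference()];
const implementerLabels = backends.map((backend) => backend.label);
console.log(implementerLabels.join(", "));
```

With text search, each `implements InferenceBackend` clause has to be found and read separately before this picture can be assembled by hand.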

#### Task 6: external dependency symbol lookup

- **Targets used after indexing was available:** `BrowserWindow` and `serveNextAt` in the desktop TypeScript app, plus `Url` and `Zeroizing` in `rust/core`.
- **Serena chain (TS):**
    - `find_declaration(new BrowserWindow(...), include_body=true)` -> `desktop/node_modules/electron/electron.d.ts`, body `class BrowserWindow extends Electron.BrowserWindow {}`
    - `find_declaration(import serveNextAt ... , include_body=true)` -> `desktop/node_modules/next-electron-server/index.d.ts`, body `declare function serveNextAt(uri: string, options?: Options): void;`
    - `find_symbol(..., search_deps=true)` on those dependency files returned dependency-side docs.
- **Serena chain (Rust):**
    - `find_declaration(use reqwest::{Response, Url};, include_body=true)` -> `<ext:lib.rs|...>` external symbol `Url[0]` with the struct body
    - `find_declaration(use zeroize::Zeroizing;, include_body=true)` -> `<ext:lib.rs|...>` external symbol `Zeroizing[0]` with the struct body
    - `find_symbol(..., relative_path=<ext...>, search_deps=true)` returned dependency-side docs for those external symbols.
- **Built-in equivalent chain:**
    - Manually infer the correct monorepo-local dependency root (`desktop/node_modules`, not repo root),
    - or manually inspect Cargo metadata / `Cargo.lock`,
    - then open the resolved dependency files directly (for Rust, under the Cargo registry).
- **Payloads observed:** Serena returned the declaration target and a small signature/body directly; built-ins required **package-root discovery first**, which is a real extra step in a monorepo.
- **Delta:** Serena **does add capability and efficiency here** once indexing exists. The gain is larger in this monorepo than in a single-package repo because dependency ownership is split across package-local Node dependencies and shared Cargo registry sources.

**Verdict:** With indexing available, Serena added meaningful external dependency lookup, and the value was amplified by the monorepo layout.

### 3.2 Single-file edits

#### Task 7a: small tweak (1-3 lines inside a method)

- **Change:** rename local `autoLaunch` -> `launcher` inside `AutoLauncher/toggleAutoLaunch`.
- **Built-in chain:** `view(auto-launcher.ts, 20-40)` -> `apply_patch` on 3 changed lines -> `git diff`.
- **Serena chain:** `find_symbol(toggleAutoLaunch, include_body=true)` -> `replace_symbol_body(toggleAutoLaunch)` -> `git diff`.
- **Payloads observed:**
    - Built-ins: read **21 lines**, patch changed **3 logical lines**.
    - Serena: fetched **10-line body**, resent **full 10-line body**.
- **Delta:** same result, same number of main steps, but the symbolic edit resent untouched lines.

**Verdict:** For tiny in-method tweaks, built-ins were more payload-efficient and Serena added no real workflow advantage.

#### Task 7b: medium rewrite (~10-30 lines)

- **Change:** rewrite `uniqueSavePath` to use `candidatePath` + `for` loop.
- **Built-in chain:** `view(main.ts, 500-540)` -> `apply_patch` replacing the function body -> `git diff`.
- **Serena chain:** `find_symbol(uniqueSavePath, include_body=true)` -> `replace_symbol_body(uniqueSavePath)` -> `git diff`.
- **Payloads observed:**
    - Built-ins: read **41 lines** to safely anchor a **~10-line** rewrite.
    - Serena: fetched the **11-line symbol body** and resent the rewritten body only.
- **Delta:** Here Serena was more efficient: less prerequisite read volume and no dependence on surrounding file context.

**Verdict:** Serena was better for medium symbol-sized rewrites.
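
To make the size of this rewrite concrete, here is a hedged sketch of what a `candidatePath` + `for` loop version of such a helper could look like. The signature, the `name(n).ext` suffix scheme, and the injected `exists` predicate are all assumptions made so the example runs without a filesystem; this is not the repository's actual function.

```typescript
// Hypothetical reconstruction; signature and suffix scheme are assumptions.
const uniqueSavePath = (
    dirPath: string,
    fileName: string,
    exists: (path: string) => boolean,
): string => {
    // Split "report.txt" into "report" and ".txt" (no extension -> "").
    const dot = fileName.lastIndexOf(".");
    const name = dot > 0 ? fileName.slice(0, dot) : fileName;
    const ext = dot > 0 ? fileName.slice(dot) : "";

    let candidatePath = dirPath + "/" + fileName;
    // Try name(1).ext, name(2).ext, ... until a free path is found.
    for (let n = 1; exists(candidatePath); n++) {
        candidatePath = dirPath + "/" + name + "(" + n + ")" + ext;
    }
    return candidatePath;
};

// Usage with an in-memory list of taken paths instead of a real disk check.
const taken = ["/tmp/a.txt", "/tmp/a(1).txt"];
const isTaken = (path: string): boolean => taken.indexOf(path) >= 0;
console.log(uniqueSavePath("/tmp", "a.txt", isTaken)); // /tmp/a(2).txt
```

A function of roughly this size is the case where replacing the whole symbol body costs about the same as a line-anchored patch, without the surrounding-context read.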

#### Task 7c: large / whole-body rewrite

- **Change:** rewrite the entire `createMainWindow` body.
- **Built-in chain:** `view(main.ts, 331-439)` -> `apply_patch` replacing the function body -> `git diff`.
- **Serena chain:** `find_symbol(createMainWindow, include_body=true)` -> `replace_symbol_body(createMainWindow)` -> `git diff`.
- **Payloads observed:**
    - Built-ins: read **~109 lines** and patched the whole function.
    - Serena: fetched the **same symbol body** and resent the whole rewritten body.
- **Delta:** Serena still avoided a file-range read, but once the symbol itself dominates the payload, the token gap mostly disappears.

**Verdict:** For whole-body rewrites, Serena's gain was modest: better addressing, not dramatically smaller payload.

#### Task 8: insert a new function at a structural location

- **Insertion:** `waitSeconds` after `wait` in `desktop/src/main/utils/common.ts`.
- **Built-in chain:** `view(common.ts, 1-40)` -> `apply_patch` inserting after the existing function.
- **Serena chain:** `find_symbol(wait)` -> `insert_after_symbol(wait)`.
- **Payloads observed:**
    - Built-ins: read **26 lines** to place a **1-line** function.
    - Serena: no extra file-range read once the symbol name was known.
- **Delta:** Serena made the location structural instead of textual.

**Verdict:** Serena improved insertions when the location is "after symbol X" rather than "after line Y".

#### Task 9: rename a private helper used only within one file

- **Target symbol:** `openStreetMapUserAgent` in `desktop/src/main.ts`.
- **Built-in chain:** `view`/`rg` to find call site + definition -> `apply_patch` updating both.
- **Serena chain:** `rename(openStreetMapUserAgent -> buildOpenStreetMapUserAgent)`.
- **Payloads observed:**
    - Built-ins: two textual sites had to be found and updated manually.
    - Serena: one rename call, terse success response (`"Success"`).
- **Delta:** small call-count improvement, larger correctness improvement when the file is bigger or the name is less unique.

**Verdict:** Serena was somewhat better even for single-file private renames because it removed manual site enumeration.

### 3.3 Multi-file changes

#### Task 10: rename a symbol across several files including imports

- **Target symbol:** `wait` -> `delay`.
- **Built-in chain:** read `common.ts`, `main.ts`, `ffmpeg-worker.ts`, `ml-worker.ts` -> one multi-file `apply_patch` -> `git diff`.
- **Serena chain:** `rename(wait -> delay)` on the defining symbol -> `git diff`.
- **Payloads observed:**
    - Built-ins: required reading **4 files** and manually updating export, imports, and call sites.
    - Serena: one semantic rename updated the same 4 files.
- **Success signals:** built-ins only showed success via resulting diff; Serena returned `"Success"`.
- **Delta:** this is one of Serena's clearest wins: same final diff, far less manual scope work.

**Verdict:** Serena strongly improved multi-file renames by collapsing discovery + edit into one symbol-based refactor.

#### Task 11: move a symbol from one module to another, updating imports

- **Target:** move `nullToUndefined` out of `common.ts`.
- **Serena chain executed:** `move(nullToUndefined, target_relative_path=http.ts)` -> `git diff`.
- **Observed result:** Serena removed the symbol from `common.ts`, updated `ffmpeg-worker.ts`, but created a **new file** `desktop/src/main/utils/nullToUndefined.ts` instead of merging into `http.ts`.
- **Built-in equivalent:** would require manually copying the symbol into the intended target module, updating imports, then deleting the old definition.
- **Delta:** Serena still automated the cross-file update, but **did not provide the specific "move into existing module" behavior I was testing**.

**Verdict:** Serena added partial value for symbol moves here, but not the full capability of "move into a chosen existing TS file".

#### Task 12: move a file/package and update imports

- **Target:** move `desktop/src/main/utils/http.ts` to `desktop/src/main/services/http.ts`.
- **Serena chain:** `move(file http.ts -> services/)` -> `git diff`.
- **Observed result:** the file was renamed/moved and `ffmpeg-worker.ts` import updated from `../utils/http` to `./http`.
- **Built-in equivalent:** locate all imports, move the file, patch each import path, then verify.
- **Delta:** this was a real one-call semantic file move.

**Verdict:** Serena materially improved file moves that require import updates.
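
The import rewrite observed above follows from plain relative-path arithmetic, which is what a manual move has to get right for every importer. A minimal sketch, assuming POSIX-style paths and assuming the importing file lives at `desktop/src/main/services/ffmpeg-worker.ts` (inferred from the `../utils/http` -> `./http` change, not stated directly in the results); `relativeImport` is a hypothetical helper, not a Serena or TypeScript API.

```typescript
// Hypothetical helper: compute the import specifier from one file to a
// module path (given without extension), using POSIX-style separators.
const relativeImport = (fromFile: string, toModule: string): string => {
    const fromDir = fromFile.split("/").slice(0, -1);
    const target = toModule.split("/");

    // Count the leading directories that both paths share.
    let shared = 0;
    while (
        shared < fromDir.length &&
        shared < target.length - 1 &&
        fromDir[shared] === target[shared]
    ) {
        shared++;
    }

    // One "../" per directory we must climb out of; "./" if none.
    const ups = fromDir.length - shared;
    let prefix = "";
    for (let i = 0; i < ups; i++) prefix += "../";
    if (ups === 0) prefix = "./";

    return prefix + target.slice(shared).join("/");
};

// Assumed importer location (inferred, see the note above).
const importer = "desktop/src/main/services/ffmpeg-worker.ts";
const beforeMove = relativeImport(importer, "desktop/src/main/utils/http");
const afterMove = relativeImport(importer, "desktop/src/main/services/http");
console.log(beforeMove, "->", afterMove); // ../utils/http -> ./http
```

The arithmetic is simple for one importer; the manual cost comes from first finding every importer and then applying it consistently.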

#### Task 12b: safe delete with no remaining usages

- **Attempted:** searched for naturally unused TS symbols in the working areas (`main.ts`, `common.ts`, `temp.ts`, `inference.ts`) and checked several candidates (`registerForEnteLinks`, `minimumWindowSize`, `AutoLauncher/isEnabled`, `openStreetMapUserAgent`, `safeJson`, `buildSamplingConfig`).
- **Observed result:** every plausible candidate still had live references.
- **Outcome:** **no suitable candidate found in the TS areas where Serena was operational**, so I skipped this comparison instead of forcing an invalid input.

**Verdict:** No evidence either way here because the repo did not offer a clean unused-symbol candidate in the code areas Serena handled reliably.

#### Task 13: delete a symbol and propagate deletion to call sites

- **Attempted:** looked for a helper whose call sites could be semantically removed rather than inlined or manually rewritten.
- **Observed result:** the good candidates in this repo were better modeled as **inline** refactors, not delete-with-propagation.
- **Outcome:** **no suitable candidate**; skipped rather than using an unsafe input.

**Verdict:** No measured delta here because the available candidates were inline candidates, not safe propagate-delete candidates.

#### Task 13b: inline a small helper

- **Target symbol:** `waitForRendererDevServer`.
- **Built-in chain:** `view` call site + definition -> `apply_patch` replacing `await waitForRendererDevServer()` with `await wait(1000)` and deleting the helper -> `git diff`.
- **Serena chain:** `inline(waitForRendererDevServer, keep_definition=false)` -> `git diff`.
- **Observed result:** both achieved the inline, but Serena also rewrote unrelated import formatting at the top of the file.
- **Success signals:** built-ins: resulting diff; Serena: `{"status":"SUCCESS"}`.
- **Delta:** Serena added the unique semantic refactor, but in this run it also introduced **format churn outside the logical change**.

**Verdict:** Serena added real inline capability, with a low-frequency but real tradeoff of broader formatting churn.

### 3.4 Reliability & correctness-oriented checks

#### Task 14: scope precision

- **Demonstrated with:** `AutoLauncher/toggleAutoLaunch`, `openStreetMapUserAgent`, and `InferenceBackend`.
- **Serena:** symbol names and name paths targeted the exact code entity.
- **Built-ins:** text search for names such as `wait` or `writeToTemporaryFile` over-matched comments, docs, and multiple textual occurrences.
- **Delta:** Serena's unit of work was the symbol; built-ins' unit was the matching line.

**Verdict:** Serena was reliably more precise whenever the target was a symbol rather than a string.

#### Task 15: atomicity

- **Observed:** Serena rename/file-move/inline each ran as one refactor operation after symbol selection.
- **Built-ins:** a single `apply_patch` can update multiple files atomically as a patch, but it **cannot discover missed sites**; semantic completeness remains manual.
- **Delta:** Serena's advantage was not transactional all-or-none patching; it was **scope computation**.

**Verdict:** Serena improved semantic completeness more than patch atomicity.

#### Task 16: success signals

- **Observed Serena success outputs:** `OK` for body replacement, `"Success"` for rename, JSON result for move, `{"status":"SUCCESS"}` for inline.
- **Observed built-in success outputs:** only indirect evidence via `git diff` / clean revert.

**Verdict:** Serena gave clearer machine-readable success signals for refactors than the built-ins did.

## 4. Token-efficiency analysis

### By edit size

| Edit size | Built-ins | Serena | More efficient |
| --- | --- | --- | --- |
| **Small tweak** (`toggleAutoLaunch`) | Read ~21 lines, patch only changed lines | Fetch 10-line body, resend full 10-line body | **Built-ins** |
| **Medium rewrite** (`uniqueSavePath`) | Read ~41 lines to safely patch ~10 lines | Fetch 11-line body, resend 11-line body | **Serena** |
| **Large rewrite** (`createMainWindow`) | Read ~109 lines, patch whole body | Fetch ~same symbol body, resend whole body | **Near tie**, slight Serena advantage only from structural targeting |
| **Cross-file rename** (`wait -> delay`) | Read 4 files, craft 4-file patch | One rename after discovery | **Serena by a large margin** |

### Forced reads

- Built-ins often needed a **localization read** before the edit.
- Serena avoided that when the symbol was already known, but not when the task itself required understanding the body.

### Stable vs ephemeral addressing

- Serena's addresses (`createMainWindow`, `wait`, `openStreetMapUserAgent`) stayed useful across later operations.
- Built-in line ranges from `view` became stale after edits, so later text operations required re-grepping or re-viewing.
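
The staleness problem is easy to demonstrate on a toy file. This sketch uses an invented four-line "file" (the line contents are placeholders, not repository code) and a hypothetical `findLine` lookup to show that a remembered line index stops pointing at the right declaration after an insertion, while a name-based lookup still resolves.

```typescript
// Hypothetical name-based lookup over an in-memory list of lines.
const findLine = (lines: string[], symbol: string): number => {
    const prefix = "export const " + symbol + " ";
    for (let i = 0; i < lines.length; i++) {
        if (lines[i].indexOf(prefix) === 0) return i;
    }
    return -1;
};

// Invented file contents; only the symbol names echo the report.
const before: string[] = [
    "export const wait = /* ... */ 0;",
    "",
    "export const nullToUndefined = /* ... */ 0;",
];

// A line-based tool remembers where the symbol was.
const rememberedLine = findLine(before, "nullToUndefined"); // index 2

// An edit inserts a helper after `wait`, shifting everything below it.
const after: string[] = before
    .slice(0, 1)
    .concat(["export const waitSeconds = /* ... */ 0;"], before.slice(1));

const staleContent = after[rememberedLine]; // now the blank line
const freshLine = findLine(after, "nullToUndefined"); // index 3
console.log(JSON.stringify(staleContent), freshLine);
```

The remembered index now lands on the wrong line, so a line-based workflow has to re-read before the next edit; the symbol name needs no refresh.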

**Verdict:** Serena is most token-efficient for medium-to-large symbol work and cross-file refactors; built-ins stay leaner for tiny local edits.

## 5. Reliability & correctness (under correct use)

- **Precision of matching:** Serena's reference search answered "who uses this in code?" better than `rg`, which mixed real uses with prose/comment matches.
- **Scope disambiguation:** Serena targeted exact symbols (`AutoLauncher/toggleAutoLaunch`, `InferenceBackend`) rather than relying on unique text strings.
- **Atomicity:** Serena computed and updated semantic scope in one refactor call; built-ins could batch edits, but only after manual scope discovery.
- **Semantic queries vs text search:** hierarchy and references were the strongest examples. Built-ins could reconstruct them, but only with manual interpretation.
- **External dependencies:** after indexing was available, Serena resolved desktop TypeScript dependencies into package-local declaration files under `desktop/node_modules` and Rust dependencies into external Cargo sources such as `url` and `zeroize`. Built-ins could still reach those files, but only after manual package-root or registry-path discovery.
- **Monorepo effect:** this repo magnified Serena's dependency-lookup value because "the dependency source" was not at one obvious global root. Serena jumped from app code to the right package-local or registry-backed dependency context directly.

**Verdict:** Serena improved correctness by narrowing work to exact symbols and by resolving dependencies across monorepo boundaries.

## 6. Workflow effects across a session

- **Advantages compounded** when I stayed in symbol space. Example: `get_symbols_overview(main.ts)` produced symbol names that I later reused for `find_symbol(createMainWindow)`, `rename(openStreetMapUserAgent)`, and `inline(waitForRendererDevServer)`.
- **Built-in workflows required refreshes**. Across repeated `main.ts` experiments, I repeatedly had to reacquire ranges with `view`/`rg` before editing because prior line-based context was no longer trustworthy.
- **In the monorepo, Serena also compounded by removing package-boundary bookkeeping.** In the desktop app I could jump from `main.ts` into Electron and `next-electron-server` declarations without first reasoning about workspace roots; in `rust/core` I could jump into Cargo-registry dependencies through external symbol handles instead of manually reconstructing registry paths from `Cargo.lock`.
- **The compounding effect disappeared** for tiny edits and non-code work, where built-ins were already direct and minimal.
- **One tradeoff compounded too:** some Serena refactors carried formatting side effects (notably `inline`), so the semantic benefit does not guarantee a surgically small diff.

**Verdict:** Serena's advantages compound most in code-centric monorepo sessions, where symbol reuse and dependency jumps save both re-reading and package-root discovery work.

## 7. Unique capabilities

| Capability with no practical one-step built-in equivalent | Frequency | Impact |
| --- | --- | --- |
| **Semantic cross-file rename from a single symbol definition** | Medium | High |
| **Type hierarchy query (implementations / supertypes)** | Low-Medium | Medium |
| **Inline refactor across call sites** | Low | High when applicable |
| **File move with import updates** | Low-Medium | High |
| **External dependency resolution from in-repo code into package-local or registry-backed sources** | Medium | Medium-High |

Built-ins can approximate all of these manually, but not as a single semantic operation.

**Verdict:** Serena did add unique practical capabilities, especially around refactors that require scope computation rather than text replacement.

## 8. Tasks outside Serena's scope (built-in only)

- Reading non-code files like `desktop/package.json`
- Free-text search such as `ente://app` or URL strings
- Git inspection / diff / cleanup
- Config/package/changelog/docs/notebook reading
- Exact textual patching once the line range is already known

In this session, these built-in-only tasks were **roughly 40% of the total operational steps by count**, but they were usually the low-complexity steps around the more valuable semantic work.

**Verdict:** A substantial share of everyday terminal work remains built-in-only, but Serena targets the higher-value symbol-heavy slice rather than the whole session.

## 9. Practical usage rule

Use **Serena first** when the task is about a **code symbol** and especially when it spans **multiple files, references, or a whole symbol body**. Use **built-ins first** when the task is about **text, config/docs, free-text search, git state, or a 1-3 line local tweak**. The highest-yield mixed workflow in this repo was: **discover/refactor with Serena, inspect non-code and do tiny patches with built-ins**.

**Verdict:** Choose Serena for symbol semantics and built-ins for text locality.
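
The rule above can be restated as a small decision function. This is only a sketch of the heuristic as I applied it in this session: the `AgentTask` shape, its field names, and the tool labels are hypothetical, and the 3-line threshold is taken from the "1-3 line local tweak" wording rather than from any tuned measurement.

```typescript
// Hypothetical task description; field names are assumptions for this sketch.
interface AgentTask {
    target: "symbol" | "text"; // code symbol vs. config/docs/git/free text
    operation: "lookup" | "edit" | "refactor";
    linesChanged: number; // 0 for lookups
}

type ToolChoice = "serena" | "built-ins";

const chooseTool = (task: AgentTask): ToolChoice => {
    // Text, config, docs, and git work stay with the built-ins.
    if (task.target !== "symbol") return "built-ins";
    // Tiny local tweaks: a line patch is leaner than resending a symbol body.
    if (task.operation === "edit" && task.linesChanged <= 3) return "built-ins";
    // Lookups, refactors, and symbol-sized rewrites go to Serena.
    return "serena";
};

const crossFileRename: AgentTask = { target: "symbol", operation: "refactor", linesChanged: 8 };
const tinyTweak: AgentTask = { target: "symbol", operation: "edit", linesChanged: 3 };
const configRead: AgentTask = { target: "text", operation: "lookup", linesChanged: 0 };

console.log(chooseTool(crossFileRename), chooseTool(tinyTweak), chooseTool(configRead));
// serena built-ins built-ins
```

Refactors are routed to Serena regardless of diff size because their cost is in scope discovery, not in the number of changed lines.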
