:::{admonition} Evaluation Result
:class: note
**Generated by:** Claude Opus 4.6 (coding AI agent in JetBrains Junie Plugin)  
**Codebase:** [Tianshou](https://github.com/thu-ml/tianshou), a Python reinforcement learning library (~26K lines, 43 source files)  
**Date:** 2026-04-17
:::

# JetBrains Junie Plugin (Opus 4.6)

> Serena gives me what my built-in tools can't — the ability to move a function between modules with all imports updated atomically, trace a class hierarchy into dependencies, and safely delete symbols with usage guards — and I'd ask my owner to add it for the move-refactoring and semantic navigation capabilities alone.

---

## 1. Headline: What Serena Changes

Relative to the built-in tools, tasks fall into three categories:

**(a) Tasks where Serena adds capability:**
- **Move symbol between modules** (with automatic import updates) — no built-in equivalent.
- **Move file/package** (with automatic import updates) — no built-in equivalent.
- **Type hierarchy traversal** — transitive supertype/subtype chains in 1 call, including into external dependencies. No built-in equivalent.
- **External dependency symbol lookup** — search and inspect symbols in installed packages (site-packages) without manual path discovery. Built-ins cannot do this.
- **Safe delete with usage checking** — refuses deletion if usages exist, reports exact locations. Built-ins require a manual search-then-decide workflow.
- **Propagated delete** — delete a symbol and automatically remove all call sites. No built-in equivalent.
- **Inline symbol** — replace all call sites with the function body. No built-in equivalent.
- **Scope-precise symbol targeting** — name paths like `AsyncCollector/_collect` disambiguate overrides that text search cannot distinguish.

**(b) Tasks where Serena applies but offers no meaningful improvement:**
- **Cross-file rename** — both Serena (`rename`) and built-in (`rename_element`) perform semantic, cross-file renames in 1 call. Functionally equivalent.
- **Single-file rename** — same as above; both are 1-call semantic renames.
- **Structural overview of a single file** — both toolsets produce comparable results in 1 call.
- **Small edits (1-3 lines)** — Serena's `replace_symbol_body` must send the entire method body; built-in `search_replace` sends only the changed line. Built-in is more token-efficient here.

**(c) Tasks outside Serena's scope (built-in only):**
- Reading non-code files (configs, docs, changelogs)
- Free-text search across the repo (log strings, URLs, magic constants)
- Terminal commands, test execution, git operations
- Creating new files from scratch
- Repository-level directory listing

**Verdict:** Serena's primary contribution is move-refactoring (symbol and file moves with import updates), semantic code navigation (type hierarchy, dependency lookup), and safe/propagated delete: capabilities with no built-in equivalent.

---

## 2. Added Value and Differences by Area

- **Move symbol/file with import updates (positive).** Frequency: occasional in refactoring work. Value per hit: replaces ~7 built-in calls with 1 and eliminates manual import tracking (measured on moving `get_stddev_from_dist` between modules). No built-in equivalent.

- **Type hierarchy and reference finding (positive).** Frequency: a few times per exploration session. Value per hit: saves 2-4 search-and-read cycles and provides information (transitive supertypes, external dependency chains) that built-ins cannot produce at all. `BaseCollector`'s full hierarchy (ABC→object up, Collector→AsyncCollector down) was returned in 1 call.

- **Targeted symbol retrieval by name path (positive).** Frequency: many times per session. Value per hit: saves 1 prerequisite read (no line-number lookup needed). `find_symbol` with `include_body=true` returns the exact method body in 1 call, vs. 2 built-in calls (`get_file_structure` + `open`).

- **Safe delete with usage guard (positive).** Frequency: occasional. Value per hit: eliminates the risk of orphaned references. Serena refused to delete `_dict_of_arr_to_arr_of_dicts` and reported its 2 exact usages; the built-in workflow requires a manual search first.

- **Cross-file rename (neutral).** Frequency: several times per session. Value per hit: zero — both Serena's `rename` and built-in `rename_element` perform semantic cross-file renames in 1 call with equivalent results.

- **Single-file edits (neutral to slightly negative for small edits).** Frequency: very high. Value per hit: for small edits, Serena's `replace_symbol_body` sends more tokens (entire method body) than `search_replace` (just the changed lines). For full method rewrites, comparable. For inserts, Serena's `insert_after_symbol` saves 1 prerequisite read.

- **Structural overview (neutral).** Frequency: a few times per session. Value per hit: both toolsets return comparable information in 1 call.

**Verdict:** Serena's value concentrates in move-refactoring and semantic navigation; rename is matched by the built-in `rename_element`.

---

## 3. Detailed Evidence, Grouped by Capability

### 3.1 Structural Overview (Task 2)

**Attempted:** Get structural overview of `collector.py` (1552 lines, 15+ classes).

| Axis | Serena (`get_symbols_overview`) | Built-in (`get_file_structure`) |
|------|------|------|
| Calls | 1 | 1 |
| Output | JSON tree: class names, method names, field names | Flat list: class/method names with line ranges and signatures |
| Unique info | Field/attribute names | Line numbers, full parameter signatures |
| Follow-up to read a method | `find_symbol` by name path (1 call) | `open` at line number (1 call) |

**Verdict:** Functionally equivalent; each includes information the other omits. No meaningful delta.

### 3.2 Targeted Symbol Retrieval (Task 3)

**Attempted:** Retrieve the body of `CollectStats/refresh_all_sequence_stats` without reading the surrounding file.

| Axis | Serena | Built-in |
|------|------|------|
| Calls | 1 (`find_symbol` with `include_body`) | 2 (`get_file_structure` + `open`) |
| Prerequisite | None (name path is stable) | Must know line number |
| Output payload | 4 lines (exact method body) | 100 lines (open window) |

**Verdict:** Serena saves 1 call and returns a smaller, more precise payload. Minor but consistent advantage.

### 3.3 Reference Finding (Task 4)

**Attempted:** Find all references to `CollectStats` across the codebase.

| Axis | Serena (`find_referencing_symbols`) | Built-in (`search_project`) |
|------|------|------|
| Calls | 1 | 1 |
| Output | Summarized narrative: grouped by file, annotated with context (imports, instantiations, type hints) | Raw list: 100+ text matches, includes docs, comments, strings |
| Precision | Code references only | All text mentions |
| Recall | Code references only | Everything including non-code |

**Verdict:** Serena provides higher precision (code-only references with semantic context). Built-in provides higher recall (includes docs, comments). Different tools for different questions.
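The precision/recall split can be reproduced with Python's standard library alone. This is a minimal sketch, not Serena's mechanism (Serena queries a language server); the sample module and its uses of `CollectStats` are invented for illustration. A grep-style scan counts every mention, while an `ast` walk counts only syntactic references.

```python
import ast
import re

# Hypothetical module: an import, a return annotation and an instantiation,
# plus a comment and a string literal that merely mention the name.
SOURCE = '''
from tianshou.data.stats import CollectStats

# CollectStats is refreshed after every episode
def make_stats() -> CollectStats:
    label = "creating CollectStats"
    return CollectStats()
'''


def text_matches(source: str, name: str) -> list[int]:
    """Line numbers of every textual mention (grep-style: high recall)."""
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    return [i for i, line in enumerate(source.splitlines(), start=1) if pattern.search(line)]


def code_references(source: str, name: str) -> list[int]:
    """Line numbers of syntactic references only (high precision)."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id == name:
            hits.add(node.lineno)
        elif isinstance(node, ast.ImportFrom) and any(a.name == name for a in node.names):
            hits.add(node.lineno)
    return sorted(hits)


print(text_matches(SOURCE, "CollectStats"))     # [2, 4, 5, 6, 7]
print(code_references(SOURCE, "CollectStats"))  # [2, 5, 7]
```

The comment (line 4) and the string literal (line 6) show up only in the text scan, which is exactly the difference the table describes.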

### 3.4 Type Hierarchy (Task 5)

**Attempted:** Full type hierarchy of `BaseCollector` — supertypes and subtypes, transitively.

| Axis | Serena (`type_hierarchy`) | Built-in |
|------|------|------|
| Calls | 1 | 3+ (grep for class declarations, read each, trace manually) |
| Result | `BaseCollector → ABC → object` (up), `→ Collector → AsyncCollector` (down) | Partial: can find direct subclasses by grep, but supertypes of supertypes require reading external files |
| External deps | Included (ABC from `abc` module, object from `builtins`) | Not accessible |

**Verdict:** Unique capability. No practical built-in equivalent for transitive hierarchy, especially into dependencies.
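For intuition, Python's runtime can produce the same kind of transitive chain for classes that are importable: `__mro__` lists every supertype, and a recursive walk over `__subclasses__()` lists every subtype. The sketch below mirrors the reported shape with stand-in classes (they are not Tianshou's real classes). Unlike Serena, this approach only works on code that can be imported and executed, not on static source.

```python
from abc import ABC


# Stand-ins that mirror the reported hierarchy; bodies are omitted.
class BaseCollector(ABC):
    pass


class Collector(BaseCollector):
    pass


class AsyncCollector(Collector):
    pass


def supertypes(cls: type) -> list[str]:
    """Transitive supertypes, nearest first (the MRO minus the class itself)."""
    return [c.__name__ for c in cls.__mro__[1:]]


def subtypes(cls: type) -> list[str]:
    """Transitive subtypes via a depth-first walk over __subclasses__()."""
    result = []
    for sub in cls.__subclasses__():
        result.append(sub.__name__)
        result.extend(subtypes(sub))
    return result


print(supertypes(BaseCollector))  # ['ABC', 'object']
print(subtypes(BaseCollector))    # ['Collector', 'AsyncCollector']
```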

### 3.5 External Dependency Lookup (Task 6)

**Attempted:** Find `Distribution` class from `torch.distributions`.

| Axis | Serena (`find_symbol` with `search_deps`) | Built-in |
|------|------|------|
| Calls | 1 | Not possible without manual site-packages navigation |
| Result | Found 41 classes named `Distribution` across all installed packages, each with an ext-identifier | N/A |
| Follow-up | Can use ext-identifier to read body/info | Would need to find virtualenv path, navigate to package, read file |

**Verdict:** Unique capability. Built-ins have no access to dependency source code without manual path discovery.
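The manual path discovery mentioned in the built-in column looks roughly like this in plain Python: import the package, then ask `inspect` where the class's source lives. This is a sketch assuming the dependency is importable from the active interpreter; `locate_class_source` is a hypothetical helper, and the stdlib `json.decoder.JSONDecoder` is used in the usage line only so the snippet runs without PyTorch installed.

```python
import importlib
import inspect


def locate_class_source(module_name: str, class_name: str) -> tuple[str, int]:
    """Return (source file, first line) for a class in an importable module."""
    module = importlib.import_module(module_name)  # must be installed in this interpreter
    cls = getattr(module, class_name)
    path = inspect.getsourcefile(cls)  # None if only bytecode is available
    if path is None:
        raise FileNotFoundError(f"no source available for {module_name}.{class_name}")
    _, first_line = inspect.getsourcelines(cls)  # 1-based line of the class statement
    return path, first_line


# With PyTorch installed: locate_class_source("torch.distributions", "Distribution")
path, line = locate_class_source("json.decoder", "JSONDecoder")
print(path, line)
```

Even this shortcut requires knowing the exact module and running inside the right environment, which is the friction the evaluation attributes to the built-in side.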

### 3.6 Small Edit — 1 Line Change (Task 7a)

**Attempted:** Change an error message in `_validate_buffer` (21-line method).

| Axis | Serena | Built-in |
|------|------|------|
| Calls | 2 (`find_symbol` + `replace_symbol_body`) | 1 (`search_replace`) |
| Input payload | ~21 lines (full method body) | ~1 line (search) + ~1 line (replace) |
| Prerequisite reads | 1 (find_symbol to get current body) | 0 (if search string is known) |

**Verdict:** Built-in is more efficient for small, targeted edits. Serena's method-granularity addressing forces sending the entire body.
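The built-in advantage comes from text-anchored replacement: only the old and new snippets travel, and the edit is safe as long as the anchor is unique. A minimal sketch of that contract follows; `replace_unique` and the sample method body are hypothetical, and the real `search_replace` tool may report ambiguity differently.

```python
def replace_unique(source: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old`; refuse if it is missing or ambiguous."""
    count = source.count(old)
    if count != 1:
        raise ValueError(f"anchor must match exactly once, found {count}")
    return source.replace(old, new, 1)


# Hypothetical method body; only the two short strings below form the edit payload.
method = '''def _validate_buffer(self) -> None:
    if self.buffer is None:
        raise ValueError("buffer missing")
'''
edited = replace_unique(method, '"buffer missing"', '"Collector requires a replay buffer"')
print(edited)
```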

### 3.7 Medium Rewrite — ~20 Lines (Task 7b)

**Attempted:** Rewrite `update_at_step_batch` (20 lines), changing variable names and adding logging.

| Axis | Serena | Built-in |
|------|------|------|
| Calls | 2 (`find_symbol` + `replace_symbol_body`) | 1 (`search_replace` with old body → new body) |
| Input payload | ~20 lines (new body) | ~40 lines (old + new body) |
| Prerequisite reads | 1 (find_symbol) | 1 (open to see current code) |

**Verdict:** Comparable. Serena sends less payload (new body only vs. old + new body), and each side needs one prerequisite read (`find_symbol` vs. `open`).

### 3.8 Insert New Method (Task 8)

**Attempted:** Insert `summary_string` method after `refresh_all_sequence_stats` in `CollectStats`.

| Axis | Serena (`insert_after_symbol`) | Built-in (`search_replace`) |
|------|------|------|
| Calls | 1 | 1-2 (need to find anchor text, then insert) |
| Addressing | By name path (stable) | By text anchor or line number (fragile) |
| Prerequisite | None | May need `open` to find insertion point |

**Verdict:** Serena's stable addressing is a minor advantage — eliminates the need to find an anchor.
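Name-based insertion can be approximated with Python's `ast` module, which records where each definition ends (`end_lineno`, Python 3.8+). The sketch below inserts a method after `CollectStats/refresh_all_sequence_stats`; the class body, the `summary_string` implementation, and the helper `insert_after_method` are invented for illustration, and the new code is assumed to be pre-indented for the class body.

```python
import ast

SOURCE = '''class CollectStats:
    def __init__(self):
        self.n = 0

    def refresh_all_sequence_stats(self):
        self.n += 1
'''

NEW_METHOD = '''
    def summary_string(self) -> str:
        return f"n={self.n}"
'''


def insert_after_method(source: str, class_name: str, method_name: str, code: str) -> str:
    """Insert `code` right after class_name.method_name, located by name, not line number."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for item in node.body:
                if isinstance(item, ast.FunctionDef) and item.name == method_name:
                    lines = source.splitlines(keepends=True)
                    end = item.end_lineno  # 1-based last line of the method
                    return "".join(lines[:end]) + code + "".join(lines[end:])
    raise LookupError(f"{class_name}/{method_name} not found")


print(insert_after_method(SOURCE, "CollectStats", "refresh_all_sequence_stats", NEW_METHOD))
```

No anchor text or line number is supplied by the caller; the insertion point is recomputed from the parse tree on every call.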

### 3.9 Single-File Rename (Task 9)

**Attempted:** Rename `_nullable_slice` → `_slice_if_not_none` (5 occurrences in 1 file).

| Axis | Serena (`rename`) | Built-in (`rename_element`) |
|------|------|------|
| Calls | 1 | 1 |
| Scope | Semantic (only code references) | Semantic (only code references) |

**Verdict:** Functionally equivalent. Both perform semantic renames in 1 call.

### 3.10 Cross-File Rename (Task 10)

**Attempted:** Rename `CollectStatsBase` → `CollectStatsFoundation` (used in 4 files, 10 occurrences including imports and `__all__`).

| Axis | Serena (`rename`) | Built-in (`rename_element`) |
|------|------|------|
| Calls | 1 | 1 |
| Atomicity | All-or-nothing | All-or-nothing |
| Import handling | Automatic | Automatic |
| Result | 4 files, 10 replacements | 4 files, 10 replacements |

**Verdict:** No delta. Both toolsets provide atomic, semantic cross-file rename in a single call.

### 3.11 Move Symbol Between Modules (Task 11)

**Attempted:** Move `get_stddev_from_dist` from `collector.py` to `stats.py`.

| Axis | Serena (`move`) | Built-in |
|------|------|------|
| Calls | 1 | ~7 (read source, copy to target, delete from source, find imports, update each import, verify) |
| Files modified | 3 (source, target, test file) | Same 3, but manually |
| Import updates | Automatic | Manual |
| Atomicity | Atomic | Non-atomic |

**Verdict:** Unique capability at this level of automation; no practical built-in equivalent at comparable effort. This is the highest-value delta in the evaluation.
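Much of the ~7-call built-in workflow is import bookkeeping. A rough sketch of just that step, using `ast` to find and rewrite `from <old module> import <symbol>` statements, is shown below. The module paths are assumptions for illustration (the report names only the files `collector.py` and `stats.py`), `rewrite_imports` is a hypothetical helper, and the sketch ignores aliases, relative imports, and `import module` style access that a real move refactoring must also handle.

```python
import ast

OLD_MODULE = "tianshou.data.collector"  # assumed module path
NEW_MODULE = "tianshou.data.stats"  # assumed module path
SYMBOL = "get_stddev_from_dist"


def rewrite_imports(source: str, symbol: str, old_module: str, new_module: str) -> str:
    """Repoint absolute `from old_module import symbol` statements to new_module.

    Other names in the same statement stay on old_module; aliases are dropped.
    """
    lines = source.splitlines(keepends=True)
    hits = [
        node
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.ImportFrom)
        and node.module == old_module
        and any(alias.name == symbol for alias in node.names)
    ]
    # Edit bottom-up so the line numbers of earlier imports stay valid.
    for node in sorted(hits, key=lambda n: n.lineno, reverse=True):
        indent = lines[node.lineno - 1][: node.col_offset]
        replacement = [f"{indent}from {new_module} import {symbol}\n"]
        others = [alias.name for alias in node.names if alias.name != symbol]
        if others:
            replacement.insert(0, f"{indent}from {old_module} import {', '.join(others)}\n")
        lines[node.lineno - 1 : node.end_lineno] = replacement
    return "".join(lines)


test_file = (
    "from tianshou.data.collector import Collector, get_stddev_from_dist\n"
    "\n"
    "print(get_stddev_from_dist)\n"
)
print(rewrite_imports(test_file, SYMBOL, OLD_MODULE, NEW_MODULE))
```

Run per file across the project, plus the copy and delete of the definition itself, this is the work the single `move` call performs atomically.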

### 3.12 Move File (Task 12)

**Attempted:** Move `segtree.py` from `tianshou/data/utils/` to `tianshou/utils/`.

| Axis | Serena (`move`) | Built-in |
|------|------|------|
| Calls | 1 | 3+ (bash mv, find imports, update each) |
| Import updates | Automatic (updated `__init__.py`) | Manual |

**Verdict:** Serena advantage — single call with automatic import updates.

### 3.13 Safe Delete (Task 13)

**Attempted:** Delete `_dict_of_arr_to_arr_of_dicts` (has 2 usages).

| Axis | Serena (`safe_delete`) | Built-in |
|------|------|------|
| Calls | 1 (refused, reported usages) | 2 (search_project + manual decision) |
| Safety | Refuses deletion, reports exact usages with file:line and enclosing function | Must manually verify search results |

**Verdict:** Serena provides a safety guard with precise diagnostics. Minor but useful advantage.
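The usage guard amounts to: find references first, and delete only when there are none. A minimal single-module sketch with `ast` follows; `safe_delete` here is a hypothetical helper and the sample code is invented. Unlike Serena, it scans one string rather than the whole project and does not follow imports or attribute access.

```python
import ast

SOURCE = '''def _dict_of_arr_to_arr_of_dicts(d):
    return [dict(zip(d, t)) for t in zip(*d.values())]


def unpack(batch):
    return _dict_of_arr_to_arr_of_dicts(batch)


def unused_helper():
    return None
'''


def safe_delete(source: str, func_name: str) -> str:
    """Remove a top-level function, or raise listing the lines that still use it."""
    tree = ast.parse(source)
    usages = sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and node.id == func_name
    )
    if usages:
        raise RuntimeError(f"{func_name} still used at lines {usages}")
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            lines = source.splitlines(keepends=True)
            # Include any decorators in the deleted span.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            del lines[start - 1 : node.end_lineno]
            return "".join(lines)
    raise LookupError(f"{func_name} not found")


try:
    safe_delete(SOURCE, "_dict_of_arr_to_arr_of_dicts")
except RuntimeError as err:
    print(err)  # _dict_of_arr_to_arr_of_dicts still used at lines [6]

cleaned = safe_delete(SOURCE, "unused_helper")
```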

### 3.14 Scope Precision (Task 14)

**Attempted:** Target `AsyncCollector/_collect` specifically (3 classes define `_collect`).

Serena's name path `AsyncCollector/_collect` unambiguously selects the override. A text search for `_collect` matches all three definitions plus dozens of call sites. Built-in `get_file_structure` can disambiguate by line number, but that requires a prerequisite read and the line number goes stale after edits.

**Verdict:** Serena's name-path addressing provides persistent, unambiguous symbol targeting. Meaningful advantage when multiple overrides exist.
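A name path is a walk down the nesting of definitions. The sketch below resolves `AsyncCollector/_collect` against a stand-in module in which three classes define `_collect`; the classes are simplified placeholders rather than Tianshou's code, and the hypothetical resolver handles only classes and functions.

```python
import ast

SOURCE = '''class BaseCollector:
    def _collect(self):
        raise NotImplementedError


class Collector(BaseCollector):
    def _collect(self):
        return "sync"


class AsyncCollector(Collector):
    def _collect(self):
        return "async"
'''

DEF_TYPES = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def resolve_name_path(source: str, name_path: str) -> tuple[int, int]:
    """Return the (start, end) lines of the definition addressed by e.g. 'Cls/method'."""
    scope = ast.parse(source).body
    node = None
    for part in name_path.split("/"):
        node = next((n for n in scope if isinstance(n, DEF_TYPES) and n.name == part), None)
        if node is None:
            raise LookupError(f"no symbol {part!r} in {name_path!r}")
        scope = node.body  # descend into the matched class or function
    return node.lineno, node.end_lineno


text_hits = [i for i, line in enumerate(SOURCE.splitlines(), 1) if "def _collect" in line]
print(text_hits)                                            # [2, 7, 12]
print(resolve_name_path(SOURCE, "AsyncCollector/_collect"))  # (12, 13)
```

Because the path is re-resolved against the current parse tree, it stays valid after edits that shift line numbers.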

### 3.15 Chained Edits (Task 17)

**Attempted:** Three sequential `replace_symbol_body` calls on different methods in the same file.

All three succeeded without re-reading the file. Name paths remained stable across edits. Built-in `search_replace` also chains without re-reading (text anchors remain valid if edits are in different methods), but line numbers from `open` go stale after each edit.

**Verdict:** Both toolsets chain well. Serena's name-path stability is a minor advantage over line-number-based addressing.

---

## 4. Token-Efficiency Analysis

| Edit size | Serena payload | Built-in payload | Winner |
|-----------|---------------|-----------------|--------|
| Small (1-3 lines in 20-line method) | ~20 lines (full body) | ~2 lines (search+replace) | Built-in (10x less) |
| Medium (rewrite ~20 lines) | ~20 lines (new body) + prerequisite find | ~40 lines (old+new) | Comparable |
| Large (rewrite 50+ lines) | ~50+ lines (new body) | ~100+ lines (old+new) | Serena (2x less, no old body needed) |
| Insert | ~N lines (new code only) | ~N lines + anchor context | Comparable |
| Cross-file rename | ~1 line (old name + new name) | ~1 line (old name + new name) | Tie (both semantic) |
| Move symbol | ~1 line (source + target) | ~7 calls with full bodies | Serena (>>10x less) |

**Forced reads:** Serena's `replace_symbol_body` requires a `find_symbol` call to get the current body before editing. Built-in `search_replace` needs no prerequisite if the search text is known, but often requires `open` to discover it.

**Stable vs. ephemeral addressing:** Serena's name paths survive edits; built-in line numbers do not. This matters in multi-edit sessions — Serena never needs to re-read for addressing purposes, while built-in may need to re-open after edits that shift line numbers.

**Verdict:** For single-file small edits, built-ins are more token-efficient. For move-refactoring, Serena is substantially more efficient. For rename, both are equivalent. The crossover point for edits is roughly at method-level rewrites.

---

## 5. Reliability & Correctness (Under Correct Use)

- **Precision of matching:** Serena matches by semantic identity (name path in the symbol tree). Built-in `search_replace` matches by text (can over-match). However, built-in `rename_element` also matches semantically. For rename operations, precision is equivalent.

- **Scope disambiguation:** Serena's `AsyncCollector/_collect` is unambiguous. Built-in text search for `def _collect` returns 3 matches in collector.py alone. Disambiguation requires line numbers or surrounding context. Built-in `rename_element` also requires a file path and line number to disambiguate.

- **Atomicity:** Both Serena's and built-in's rename operations are atomic. Serena's move operations are also atomic; built-ins have no equivalent move, so the comparison is moot. For sequential text edits (`search_replace`), partial failure is possible.

- **Semantic queries vs. text search:** Serena's `find_referencing_symbols` returns only code references with semantic context. Built-in `search_project` returns all text matches. Neither is strictly better — they answer different questions.

- **External dependency lookup:** Serena can search and inspect symbols in installed packages via `search_deps=true`. Built-ins have no access to dependency source without manual virtualenv/site-packages navigation. This depends on the IDE having indexed the project's interpreter.

**Verdict:** For rename, correctness guarantees are equivalent between toolsets. Serena's unique correctness advantage is in move-refactoring atomicity and external dependency access.

---

## 6. Workflow Effects Across a Session

- **Multi-edit stability:** Serena's name-path addressing remains valid across edits within a session. Built-in line numbers go stale after any edit that changes line counts. In a session with 10+ edits to the same file, Serena eliminates ~5-10 re-read calls that built-ins would need to refresh line numbers.

- **Exploration → edit transition:** After exploring with Serena (`find_symbol`, `type_hierarchy`), the same name paths can be used directly for edits (`replace_symbol_body`, `insert_after_symbol`). Built-in exploration (`get_file_structure`) produces line numbers that must be used immediately, before they go stale.

- **Cross-file refactoring chains:** A rename is equivalent on both sides (1 call each). A move followed by a delete is 2 Serena calls vs. ~10+ built-in calls. The advantage compounds for move-heavy refactoring but not for rename-heavy refactoring.

- **Session overhead:** Serena requires project activation (1 call at session start). The project can deactivate unexpectedly (observed once during this evaluation), requiring reactivation. This is a minor but real friction cost.

**Verdict:** Serena's advantages compound for move-refactoring and exploration-heavy sessions; for rename-heavy sessions, both toolsets are equivalent.

---

## 7. Unique Capabilities (No Built-In Equivalent)

| Capability | Frequency | Impact per use |
|-----------|-----------|---------------|
| Move symbol between modules (with import updates) | Occasional | Replaces ~7 calls with 1, eliminates manual import tracking |
| Move file with import updates | Occasional | Replaces 3+ calls with 1 |
| Transitive type hierarchy | A few times per exploration session | Provides information unavailable to built-ins |
| External dependency symbol lookup | Occasional | Provides information unavailable to built-ins |
| Safe delete with usage guard | Occasional | Prevents accidental breakage |
| Propagated delete (delete + remove call sites) | Rare | Saves multiple manual edits |
| Inline symbol | Rare | Saves multiple manual edits |
| Scope-precise symbol targeting by name path | Continuous (every symbol interaction) | Eliminates disambiguation overhead |

**Verdict:** The table lists eight capabilities with no practical built-in equivalent, concentrated in move-refactoring, semantic navigation, and safe/propagated delete.

---

## 8. Tasks Outside Serena's Scope (Built-In Only)

| Task | Share of typical session |
|------|------------------------|
| Reading/editing non-code files (configs, docs, changelogs, notebooks) | ~15-20% |
| Free-text search (log strings, URLs, magic constants) | ~10% |
| Terminal commands (git, test runners, build tools) | ~15-20% |
| Creating new files from scratch | ~5-10% |
| Directory listing and repo navigation | ~5% |
| Running and debugging applications | ~10-15% |

Estimated share of session where Serena's augmentation applies: **~35-50%** (code navigation, symbol editing, refactoring). The remainder is built-in-only territory.

**Verdict:** Serena augments roughly a third to half of typical session work; the rest is inherently outside its scope and handled by built-ins.

---

## 9. Practical Usage Rule

| Task type | Use |
|-----------|-----|
| Cross-file rename | Either (both provide semantic rename in 1 call) |
| Move symbol/file with import updates | Serena (unique capability) |
| Safe/propagated delete | Serena (unique capability) |
| Inline symbol | Serena (unique capability) |
| Type hierarchy, dependency lookup | Serena (unique capability) |
| Find code references to a symbol | Serena (`find_referencing_symbols`) |
| Find text mentions anywhere | Built-in (`search_project` / Grep) |
| Small edit (1-3 lines) | Built-in (`search_replace`) |
| Full method rewrite | Either (comparable) |
| Insert at structural location | Serena (`insert_after_symbol`) if name path known; built-in otherwise |
| Read non-code files | Built-in (`open` / Read) |
| Terminal operations | Built-in (bash) |
| New file creation | Built-in (`create`) |
| Chained edits in one file | Either; Serena if line-number staleness is a concern |

**Verdict:** Use Serena for move-refactoring, semantic navigation, and safe delete; use either toolset for rename; use built-ins for text-level edits, non-code files, and terminal operations.
