Evaluation Result

Generated by: Claude Opus 4.6 (coding AI agent in JetBrains Junie Plugin)
Codebase: Tianshou — a Python reinforcement learning library (~26K lines, 43 source files)
Date: 2026-04-17

JetBrains Junie Plugin (Opus 4.6)

Serena gives me what my built-in tools can’t — the ability to move a function between modules with all imports updated atomically, trace a class hierarchy into dependencies, and safely delete symbols with usage guards — and I’d ask my owner to add it for the move-refactoring and semantic navigation capabilities alone.


1. Headline: What Serena Changes

Relative to the built-in tools, tasks fall into three categories:

(a) Tasks where Serena adds capability:

  • Move symbol between modules (with automatic import updates) — no built-in equivalent.

  • Move file/package (with automatic import updates) — no built-in equivalent.

  • Type hierarchy traversal — transitive supertype/subtype chains in 1 call, including into external dependencies. No built-in equivalent.

  • External dependency symbol lookup — search and inspect symbols in installed packages (site-packages) without manual path discovery. Built-ins cannot do this.

  • Safe delete with usage checking — refuses deletion if usages exist, reports exact locations. Built-ins require a manual search-then-decide workflow.

  • Propagated delete — delete a symbol and automatically remove all call sites. No built-in equivalent.

  • Inline symbol — replace all call sites with the function body. No built-in equivalent.

  • Scope-precise symbol targeting — name paths like AsyncCollector/_collect disambiguate overrides that text search cannot distinguish.

(b) Tasks where Serena applies but offers no meaningful improvement:

  • Cross-file rename — both Serena (rename) and built-in (rename_element) perform semantic, cross-file renames in 1 call. Functionally equivalent.

  • Single-file rename — same as above; both are 1-call semantic renames.

  • Structural overview of a single file — both toolsets produce comparable results in 1 call.

  • Small edits (1-3 lines) — Serena’s replace_symbol_body must send the entire method body; built-in search_replace sends only the changed line. Built-in is more token-efficient here.

(c) Tasks outside Serena’s scope (built-in only):

  • Reading non-code files (configs, docs, changelogs)

  • Free-text search across the repo (log strings, URLs, magic constants)

  • Terminal commands, test execution, git operations

  • Creating new files from scratch

  • Repository-level directory listing

Verdict: Serena’s primary contribution is move-refactoring (symbol and file moves with import updates), semantic code navigation (type hierarchy, dependency lookup), and safe/propagated delete — capabilities with no built-in equivalent.


2. Added Value and Differences by Area

  • Move symbol/file with import updates (positive). Frequency: occasional in refactoring work. Value per hit: replaces ~7 built-in calls with 1 and eliminates manual import tracking. Moving get_stddev_from_dist between modules took 1 Serena call vs. ~7 built-in calls. No built-in equivalent.

  • Type hierarchy and reference finding (positive). Frequency: a few times per exploration session. Value per hit: saves 2-4 search-and-read cycles and provides information (transitive supertypes, external dependency chains) that built-ins cannot produce at all. BaseCollector’s full hierarchy (ABC→object up, Collector→AsyncCollector down) returned in 1 call.

  • Targeted symbol retrieval by name path (positive). Frequency: many times per session. Value per hit: saves 1 prerequisite read (no line-number lookup needed). find_symbol with include_body=true returns exact method body in 1 call vs. 2 calls (structure + open).

  • Safe delete with usage guard (positive). Frequency: occasional. Value per hit: eliminates risk of orphaned references. Serena refused to delete _dict_of_arr_to_arr_of_dicts and reported 2 exact usages. Built-in: requires manual search first.

  • Cross-file rename (neutral). Frequency: several times per session. Value per hit: zero — both Serena’s rename and built-in rename_element perform semantic cross-file renames in 1 call with equivalent results.

  • Single-file edits (neutral to slightly negative for small edits). Frequency: very high. Value per hit: for small edits, Serena’s replace_symbol_body sends more tokens (entire method body) than search_replace (just the changed lines). For full method rewrites, comparable. For inserts, Serena’s insert_after_symbol saves 1 prerequisite read.

  • Structural overview (neutral). Frequency: a few times per session. Value per hit: both toolsets return comparable information in 1 call.

Verdict: Serena’s value concentrates in move-refactoring and semantic navigation; rename is matched by the built-in rename_element.


3. Detailed Evidence, Grouped by Capability

3.1 Structural Overview (Task 2)

Attempted: Get structural overview of collector.py (1552 lines, 15+ classes).

| Axis | Serena (get_symbols_overview) | Built-in (get_file_structure) |
|---|---|---|
| Calls | 1 | 1 |
| Output | JSON tree: class names, method names, field names | Flat list: class/method names with line ranges and signatures |
| Unique info | Field/attribute names | Line numbers, full parameter signatures |
| Follow-up to read a method | find_symbol by name path (1 call) | open at line number (1 call) |

Verdict: Functionally equivalent; each includes information the other omits. No meaningful delta.

3.2 Targeted Symbol Retrieval (Task 3)

Attempted: Retrieve body of CollectStats/refresh_all_sequence_stats without reading surrounding file.

| Axis | Serena | Built-in |
|---|---|---|
| Calls | 1 (find_symbol with include_body) | 2 (get_file_structure + open) |
| Prerequisite | None (name path is stable) | Must know line number |
| Output payload | 4 lines (exact method body) | 100 lines (open window) |

Verdict: Serena saves 1 call and returns a smaller, more precise payload. Minor but consistent advantage.

3.3 Reference Finding (Task 4)

Attempted: Find all references to CollectStats across the codebase.

| Axis | Serena (find_referencing_symbols) | Built-in (search_project) |
|---|---|---|
| Calls | 1 | 1 |
| Output | Summarized narrative: grouped by file, annotated with context (imports, instantiations, type hints) | Raw list: 100+ text matches, includes docs, comments, strings |
| Precision | Code references only | All text mentions |
| Recall | Code references only | Everything including non-code |

Verdict: Serena provides higher precision (code-only references with semantic context). Built-in provides higher recall (includes docs, comments). Different tools for different questions.
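
To make the precision/recall split concrete, here is a minimal sketch that contrasts a plain text search with an AST-based search for code references. It is an illustration of the two kinds of query, not the implementation of either toolset; the sample source is invented, and only the name CollectStats comes from Task 4.

```python
import ast
import re

# Hypothetical module: one docstring, one comment, one string, three real code uses.
SOURCE = '''
"""Docs mention CollectStats here."""
from stats import CollectStats  # comment: CollectStats

def make() -> CollectStats:
    msg = "CollectStats is logged"
    return CollectStats()
'''


def text_matches(source: str, name: str) -> list[int]:
    """Line number of every whole-word occurrence, including docs, comments, strings."""
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    return [i for i, line in enumerate(source.splitlines(), start=1)
            for _ in pattern.finditer(line)]


def code_references(source: str, name: str) -> list[int]:
    """Line numbers of code-level uses only: imports of the name and Name nodes."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id == name:
            lines.append(node.lineno)
        elif isinstance(node, ast.ImportFrom) and any(a.name == name for a in node.names):
            lines.append(node.lineno)
    return sorted(lines)


print(text_matches(SOURCE, "CollectStats"))     # [2, 3, 3, 5, 6, 7]
print(code_references(SOURCE, "CollectStats"))  # [3, 5, 7]
```

The text search answers "where is this string mentioned?"; the AST walk answers "where does code depend on this symbol?". Neither replaces the other.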

3.4 Type Hierarchy (Task 5)

Attempted: Full type hierarchy of BaseCollector — supertypes and subtypes, transitively.

| Axis | Serena (type_hierarchy) | Built-in |
|---|---|---|
| Calls | 1 | 3+ (grep for class declarations, read each, trace manually) |
| Result | BaseCollector → ABC → object (up); Collector → AsyncCollector (down) | Partial: can find direct subclasses by grep, but supertypes of supertypes require reading external files |
| External deps | Included (ABC from abc module, object from builtins) | Not accessible |

Verdict: Unique capability. No practical built-in equivalent for transitive hierarchy, especially into dependencies.
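
For importable code, Python itself exposes the raw data behind a transitive hierarchy: `__mro__` for supertypes and `__subclasses__()` for subtypes. The sketch below walks both directions over stand-in classes that only mirror the names from Task 5 (they are not Tianshou’s real classes). Unlike a static type_hierarchy query, it has to import the code and only sees subclasses that are already loaded.

```python
from abc import ABC


# Hypothetical stand-ins mirroring the BaseCollector hierarchy from Task 5.
class BaseCollector(ABC):
    pass


class Collector(BaseCollector):
    pass


class AsyncCollector(Collector):
    pass


def supertypes(cls: type) -> list[str]:
    """Transitive supertypes in method-resolution order, excluding cls itself."""
    return [base.__name__ for base in cls.__mro__[1:]]


def subtypes(cls: type) -> list[str]:
    """Transitive subtypes, depth-first, among classes that have been defined."""
    result = []
    for sub in cls.__subclasses__():
        result.append(sub.__name__)
        result.extend(subtypes(sub))
    return result


print(supertypes(BaseCollector))  # ['ABC', 'object']
print(subtypes(BaseCollector))    # ['Collector', 'AsyncCollector']
```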

3.5 External Dependency Lookup (Task 6)

Attempted: Find Distribution class from torch.distributions.

| Axis | Serena (find_symbol with search_deps) | Built-in |
|---|---|---|
| Calls | 1 | Not possible without manual site-packages navigation |
| Result | Found 41 Distribution classes across all installed packages with ext-identifiers | N/A |
| Follow-up | Can use ext-identifier to read body/info | Would need to find virtualenv path, navigate to package, read file |

Verdict: Unique capability. Built-ins have no access to dependency source code without manual path discovery.
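
The manual path discovery that the built-ins would need can be sketched with the standard `importlib` and `inspect` modules. The sketch uses the standard-library `json.decoder.JSONDecoder` as a stand-in so it runs anywhere; with torch installed, passing `torch.distributions.Distribution` should resolve into site-packages the same way. Note the limits: it imports the target package and finds only the one object you name, whereas the Task 6 search returned all 41 `Distribution` classes by name.

```python
import importlib
import inspect


def locate_source(qualified_name: str) -> tuple[str, int, str]:
    """Return (source file, first line, header line) for a dotted 'module.Symbol' name."""
    module_name, _, attr = qualified_name.rpartition(".")
    obj = getattr(importlib.import_module(module_name), attr)  # imports the package
    path = inspect.getsourcefile(obj)
    if path is None:
        raise LookupError(f"no Python source available for {qualified_name}")
    lines, start = inspect.getsourcelines(obj)
    return path, start, lines[0].rstrip()


# Standard-library stand-in so the sketch runs without torch installed.
path, line, header = locate_source("json.decoder.JSONDecoder")
print(path, line, header)
```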

3.6 Small Edit: 1-Line Change (Task 7a)

Attempted: Change error message in _validate_buffer (21-line method).

| Axis | Serena | Built-in |
|---|---|---|
| Calls | 2 (find_symbol + replace_symbol_body) | 1 (search_replace) |
| Input payload | ~21 lines (full method body) | ~1 line (search) + ~1 line (replace) |
| Prerequisite reads | 1 (find_symbol to get current body) | 0 (if search string is known) |

Verdict: Built-in is more efficient for small, targeted edits. Serena’s method-granularity addressing forces sending the entire body.
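
The payload gap comes from how a text-anchored edit works: only the changed span, plus enough text to locate it, is sent. Below is a minimal sketch of such an edit, an assumption about the general technique rather than the built-in tool’s actual matching rules. It also shows the guard that keeps a text anchor from over-matching: the search string must occur exactly once. The method body is a hypothetical stand-in for `_validate_buffer`.

```python
def search_replace(source: str, search: str, replace: str) -> str:
    """Replace a unique occurrence of `search`; refuse if it is missing or ambiguous."""
    count = source.count(search)
    if count != 1:
        raise ValueError(f"expected exactly 1 match, found {count}")
    return source.replace(search, replace, 1)


# Hypothetical stand-in for the method edited in Task 7a.
method = '''def _validate_buffer(self):
    buf = self.buffer
    if len(buf) == 0:
        raise ValueError("Buffer is empty.")
    return buf
'''

# A one-line edit sends two short strings instead of the whole body.
edited = search_replace(method, '"Buffer is empty."', '"Buffer must not be empty."')
print(edited)

try:
    search_replace(method, "buf", "buffer")  # "buf" occurs several times
except ValueError as err:
    ambiguous_error = str(err)
    print(ambiguous_error)  # expected exactly 1 match, found 5
```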

3.7 Medium Rewrite: ~20 Lines (Task 7b)

Attempted: Rewrite update_at_step_batch (20 lines), changing variable names and adding logging.

| Axis | Serena | Built-in |
|---|---|---|
| Calls | 2 (find_symbol + replace_symbol_body) | 1 (search_replace with old body → new body) |
| Input payload | ~20 lines (new body) | ~40 lines (old + new body) |
| Prerequisite reads | 1 (find_symbol) | 1 (open to see current code) |

Verdict: Comparable. Serena sends less payload (new body only vs. old+new), but requires a prerequisite find_symbol call.

3.8 Insert New Method (Task 8)

Attempted: Insert summary_string method after refresh_all_sequence_stats in CollectStats.

| Axis | Serena (insert_after_symbol) | Built-in (search_replace) |
|---|---|---|
| Calls | 1 | 1-2 (need to find anchor text, then insert) |
| Addressing | By name path (stable) | By text anchor or line number (fragile) |
| Prerequisite | None | May need open to find insertion point |

Verdict: Serena’s stable addressing is a minor advantage — eliminates the need to find an anchor.

3.9 Single-File Rename (Task 9)

Attempted: Rename _nullable_slice_slice_if_not_none (5 occurrences in 1 file).

| Axis | Serena (rename) | Built-in (rename_element) |
|---|---|---|
| Calls | 1 | 1 |
| Scope | Semantic (only code references) | Semantic (only code references) |

Verdict: Functionally equivalent. Both perform semantic renames in 1 call.

3.10 Cross-File Rename (Task 10)

Attempted: Rename CollectStatsBase → CollectStatsFoundation (used in 4 files, 10 occurrences including imports and __all__).

| Axis | Serena (rename) | Built-in (rename_element) |
|---|---|---|
| Calls | 1 | 1 |
| Atomicity | All-or-nothing | All-or-nothing |
| Import handling | Automatic | Automatic |
| Result | 4 files, 10 replacements | 4 files, 10 replacements |

Verdict: No delta. Both toolsets provide atomic, semantic cross-file rename in a single call.

3.11 Move Symbol Between Modules (Task 11)

Attempted: Move get_stddev_from_dist from collector.py to stats.py.

| Axis | Serena (move) | Built-in |
|---|---|---|
| Calls | 1 | ~7 (read source, copy to target, delete from source, find imports, update each import, verify) |
| Files modified | 3 (source, target, test file) | Same 3, but manually |
| Import updates | Automatic | Manual |
| Atomicity | Atomic | Non-atomic |

Verdict: Unique capability at this level of automation, with no practical built-in equivalent at comparable effort. This is the highest-value delta in the evaluation.
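
Most of the ~7 built-in calls go to the import-update half of a move. The sketch below automates only that half, for one file: it splits `from <old> import ..., <symbol>` so the symbol is imported from the new module, using Python’s `ast` module (3.9+ for `ast.unparse`). The module paths `tianshou.data.collector` and `tianshou.data.stats` are assumed for illustration. Unlike a real move refactoring, it does not move the definition, ignores `import module` plus attribute access and relative imports, and `ast.unparse` drops comments and original formatting.

```python
import ast


class _RetargetImport(ast.NodeTransformer):
    """Turn `from old import a, symbol` into `from old import a` + `from new import symbol`."""

    def __init__(self, symbol: str, old_module: str, new_module: str) -> None:
        self.symbol = symbol
        self.old_module = old_module
        self.new_module = new_module

    def visit_ImportFrom(self, node: ast.ImportFrom):
        if node.level != 0 or node.module != self.old_module:
            return node  # relative or unrelated import: leave untouched
        moved = [a for a in node.names if a.name == self.symbol]
        if not moved:
            return node
        kept = [a for a in node.names if a.name != self.symbol]
        replacement = []
        if kept:
            replacement.append(ast.ImportFrom(module=self.old_module, names=kept, level=0))
        replacement.append(ast.ImportFrom(module=self.new_module, names=moved, level=0))
        return replacement  # returning a list replaces one statement with several


def retarget_imports(source: str, symbol: str, old_module: str, new_module: str) -> str:
    tree = _RetargetImport(symbol, old_module, new_module).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


before = (
    "from tianshou.data.collector import Collector, get_stddev_from_dist\n"
    "\n"
    "std = get_stddev_from_dist(dist)\n"
)
after = retarget_imports(before, "get_stddev_from_dist",
                         "tianshou.data.collector", "tianshou.data.stats")
print(after)
```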

3.12 Move File (Task 12)

Attempted: Move segtree.py from tianshou/data/utils/ to tianshou/utils/.

Axis

Serena (move)

Built-in

Calls

1

3+ (bash mv, find imports, update each)

Import updates

Automatic (updated __init__.py)

Manual

Verdict: Serena advantage — single call with automatic import updates.
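
Moving a file changes its dotted module path, so every import of the old path has to be found before it can be rewritten. A minimal sketch of that "find imports" step follows, using `ast` and `pathlib`. The file paths come from Task 12; the importing lines (including the `SegmentTree` name) are a hypothetical sample, and only absolute imports in a single source string are checked.

```python
import ast
from pathlib import PurePosixPath
from typing import Optional


def module_name(path: str) -> str:
    """Map a repository-relative .py path to its dotted module name."""
    parts = list(PurePosixPath(path).with_suffix("").parts)
    if parts[-1] == "__init__":
        parts.pop()  # a package's __init__.py is imported as the package itself
    return ".".join(parts)


def affected_import_lines(source: str, module: str) -> list[int]:
    """Line numbers of absolute imports that reference `module` or one of its submodules."""

    def refers(name: Optional[str]) -> bool:
        return name is not None and (name == module or name.startswith(module + "."))

    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level == 0:
            # `from pkg import mod` also counts when pkg.mod is the moved module
            if refers(node.module) or any(refers(f"{node.module}.{a.name}") for a in node.names):
                lines.append(node.lineno)
        elif isinstance(node, ast.Import) and any(refers(a.name) for a in node.names):
            lines.append(node.lineno)
    return sorted(lines)


old = module_name("tianshou/data/utils/segtree.py")  # 'tianshou.data.utils.segtree'
new = module_name("tianshou/utils/segtree.py")       # 'tianshou.utils.segtree'

sample = """import numpy as np
from tianshou.data.utils.segtree import SegmentTree
from tianshou.data.utils import segtree
import tianshou.data.utils.segtree as st
"""
print(affected_import_lines(sample, old))  # [2, 3, 4]
```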

3.13 Safe Delete (Task 13)

Attempted: Delete _dict_of_arr_to_arr_of_dicts (has 2 usages).

| Axis | Serena (safe_delete) | Built-in |
|---|---|---|
| Calls | 1 (refused, reported usages) | 2 (search_project + manual decision) |
| Safety | Refuses deletion, reports exact usages with file:line and enclosing function | Must manually verify search results |

Verdict: Serena provides a safety guard with precise diagnostics. Minor but useful advantage.
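
The guard itself is easy to picture: collect every read of the name outside its own definition, report each with its enclosing function, and refuse if the list is non-empty. The sketch below does this for a single source string with `ast`. It is illustrative only, not Serena’s implementation. Cross-module imports and attribute access are not handled, and the helper body and both call sites are invented; only the name and the count of 2 usages come from Task 13.

```python
import ast


class UsagesExistError(Exception):
    """Raised when a symbol still has usages and deletion is refused."""


def find_usages(source: str, name: str) -> list[tuple[int, str]]:
    """Return (line, enclosing function) for each read of `name` outside its own def."""
    found: list[tuple[int, str]] = []

    def visit(node: ast.AST, scope: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if child.name != name:  # skip the definition that is about to be deleted
                    visit(child, child.name)
            elif isinstance(child, ast.Name) and child.id == name and isinstance(child.ctx, ast.Load):
                found.append((child.lineno, scope))
            else:
                visit(child, scope)

    visit(ast.parse(source), "<module>")
    return found


def check_safe_delete(source: str, name: str) -> None:
    """Refuse deletion (raise) if any usage remains; otherwise return silently."""
    usages = find_usages(source, name)
    if usages:
        where = ", ".join(f"line {line} in {scope}" for line, scope in usages)
        raise UsagesExistError(f"refusing to delete {name!r}: used at {where}")


# Hypothetical module: helper body and call sites are invented for illustration.
SAMPLE = '''def _dict_of_arr_to_arr_of_dicts(d):
    return [dict(zip(d, vals)) for vals in zip(*d.values())]

def summarize(stats):
    rows = _dict_of_arr_to_arr_of_dicts(stats)
    return len(rows)

def dump(stats):
    return _dict_of_arr_to_arr_of_dicts(stats)
'''

try:
    check_safe_delete(SAMPLE, "_dict_of_arr_to_arr_of_dicts")
except UsagesExistError as err:
    message = str(err)
    print(message)  # refusing to delete '_dict_of_arr_to_arr_of_dicts': used at line 5 in summarize, line 9 in dump
```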

3.14 Scope Precision (Task 14)

Attempted: Target AsyncCollector/_collect specifically (3 classes define _collect).

Serena’s name path AsyncCollector/_collect unambiguously selects the override. A text search for _collect matches all three definitions plus dozens of call sites. Built-in get_file_structure can disambiguate by line number, but that requires a prerequisite read and the line number goes stale after edits.

Verdict: Serena’s name-path addressing provides persistent, unambiguous symbol targeting. Meaningful advantage when multiple overrides exist.
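
A name path behaves like a walk down the syntax tree, one segment per nesting level. The sketch below resolves the slash-separated format used in this report with Python’s `ast`. The resolver is hypothetical and is not Serena’s implementation, and the three-class sample only mirrors the Task 14 situation. It shows why `AsyncCollector/_collect` is unambiguous while a text search for `def _collect` is not.

```python
import ast
from typing import Optional

# Hypothetical module with three definitions of _collect, as in Task 14.
SAMPLE = '''class BaseCollector:
    def _collect(self):
        raise NotImplementedError

class Collector(BaseCollector):
    def _collect(self):
        return "sync"

class AsyncCollector(Collector):
    def _collect(self):
        return "async"
'''


def resolve_name_path(source: str, name_path: str) -> Optional[ast.AST]:
    """Follow 'Outer/Inner' segments through nested class and function definitions."""
    node: Optional[ast.AST] = ast.parse(source)
    for segment in name_path.split("/"):
        node = next(
            (child for child in ast.iter_child_nodes(node)
             if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
             and child.name == segment),
            None,
        )
        if node is None:
            return None
    return node


# Text search: every line containing "def _collect" matches.
text_hits = [n for n, line in enumerate(SAMPLE.splitlines(), start=1) if "def _collect" in line]
print(text_hits)  # [2, 6, 10]

# Name path: exactly one definition.
target = resolve_name_path(SAMPLE, "AsyncCollector/_collect")
print(target.lineno)  # 10
print(ast.get_source_segment(SAMPLE, target))  # body of the AsyncCollector override only
```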

3.15 Chained Edits (Task 17)

Attempted: Three sequential replace_symbol_body calls on different methods in the same file.

All three succeeded without re-reading the file. Name paths remained stable across edits. Built-in search_replace also chains without re-reading (text anchors remain valid if edits are in different methods), but line numbers from open go stale after each edit.

Verdict: Both toolsets chain well. Serena’s name-path stability is a minor advantage over line-number-based addressing.
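
Why line numbers go stale while name paths do not can be shown in a few lines: an edit that grows one method shifts the `def` line of every later method, but re-resolving by name path still finds the right place. The class and method names in this sketch are hypothetical.

```python
import ast


def def_lines(source: str) -> dict[str, int]:
    """Map 'Class/method' name paths to the current line of each method's def."""
    result: dict[str, int] = {}
    for cls in ast.parse(source).body:
        if isinstance(cls, ast.ClassDef):
            for item in cls.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    result[f"{cls.name}/{item.name}"] = item.lineno
    return result


# Hypothetical class; only the shape of the edit matters.
before = '''class CollectStats:
    def first(self):
        return 1

    def second(self):
        return 2
'''

# Edit 1 grows `first` by two lines.
after = before.replace("        return 1\n", "        x = 1\n        y = x\n        return y\n")

stale = def_lines(before)["CollectStats/second"]  # 5: recorded before the edit
fresh = def_lines(after)["CollectStats/second"]   # 7: re-resolved by name path
print(after.splitlines()[stale - 1])  # line 5 is now "return y" inside first()
```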


4. Token-Efficiency Analysis

| Edit size | Serena payload | Built-in payload | Winner |
|---|---|---|---|
| Small (1-3 lines in 20-line method) | ~20 lines (full body) | ~2 lines (search+replace) | Built-in (10x less) |
| Medium (rewrite ~20 lines) | ~20 lines (new body) + prerequisite find | ~40 lines (old+new) | Comparable |
| Large (rewrite 50+ lines) | ~50+ lines (new body) | ~100+ lines (old+new) | Serena (2x less, no old body needed) |
| Insert | ~N lines (new code only) | ~N lines + anchor context | Comparable |
| Cross-file rename | ~1 line (name + new name) | ~1 line (name + new name) | Tie (both semantic) |
| Move symbol | ~1 line (source + target) | ~7 calls with full bodies | Serena (>>10x less) |

Forced reads: Serena’s replace_symbol_body requires a find_symbol call to get the current body before editing. Built-in search_replace needs no prerequisite if the search text is known, but often requires open to discover it.

Stable vs. ephemeral addressing: Serena’s name paths survive edits; built-in line numbers do not. This matters in multi-edit sessions — Serena never needs to re-read for addressing purposes, while built-in may need to re-open after edits that shift line numbers.

Verdict: For single-file small edits, built-ins are more token-efficient. For move-refactoring, Serena is substantially more efficient. For rename, both are equivalent. The crossover point for edits is roughly at method-level rewrites.
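
The payload rows above follow from a simple model, stated here as an assumption: replace_symbol_body sends the full new body of an M-line method, while search_replace sends the old and new text of a k-line changed region, about 2k lines. Prerequisite reads are ignored. Under this model the two payloads are equal at k = M/2, which fits the verdict that the crossover sits around method-level rewrites.

```python
def serena_payload(method_lines: int) -> int:
    """Model of replace_symbol_body: the whole new method body is sent."""
    return method_lines


def builtin_payload(changed_lines: int) -> int:
    """Model of search_replace: old span plus new span, assumed to be equal length."""
    return 2 * changed_lines


def crossover(method_lines: int) -> float:
    """Changed-line count k at which the payloads match: 2k == M."""
    return method_lines / 2


for label, method, changed in [("small edit", 20, 1), ("large rewrite", 50, 50)]:
    s, b = serena_payload(method), builtin_payload(changed)
    print(f"{label}: Serena {s} lines, built-in {b} lines")
# small edit: Serena 20 lines, built-in 2 lines
# large rewrite: Serena 50 lines, built-in 100 lines

print(crossover(20))  # 10.0
```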


5. Reliability & Correctness (Under Correct Use)

  • Precision of matching: Serena matches by semantic identity (name path in the symbol tree). Built-in search_replace matches by text (can over-match). However, built-in rename_element also matches semantically. For rename operations, precision is equivalent.

  • Scope disambiguation: Serena’s AsyncCollector/_collect is unambiguous. Built-in text search for def _collect returns 3 matches in collector.py alone. Disambiguation requires line numbers or surrounding context. Built-in rename_element also requires a file path and line number to disambiguate.

  • Atomicity: Both Serena’s rename and the built-in rename_element are atomic. Serena’s move operations are also atomic; built-ins have no equivalent move, so there is nothing to compare. A sequence of text edits (search_replace) is not atomic, so a failure partway through can leave partial changes.

  • Semantic queries vs. text search: Serena’s find_referencing_symbols returns only code references with semantic context. Built-in search_project returns all text matches. Neither is strictly better — they answer different questions.

  • External dependency lookup: Serena can search and inspect symbols in installed packages via search_deps=true. Built-ins have no access to dependency source without manual virtualenv/site-packages navigation. This depends on the IDE having indexed the project’s interpreter.

Verdict: For rename, correctness guarantees are equivalent between toolsets. Serena’s unique correctness advantage is in move-refactoring atomicity and external dependency access.


6. Workflow Effects Across a Session

  • Multi-edit stability: Serena’s name-path addressing remains valid across edits within a session. Built-in line numbers go stale after any edit that changes line counts. In a session with 10+ edits to the same file, Serena eliminates ~5-10 re-read calls that built-ins would need to refresh line numbers.

  • Exploration → edit transition: After exploring with Serena (find_symbol, type_hierarchy), the same name paths can be used directly for edits (replace_symbol_body, insert_after_symbol). Built-in exploration (get_file_structure) produces line numbers that must be used immediately before they go stale.

  • Cross-file refactoring chains: A rename is equivalent on both sides (1 call each). A move followed by a delete is 2 Serena calls vs. ~10+ built-in calls. The advantage compounds for move-heavy refactoring but not for rename-heavy refactoring.

  • Session overhead: Serena requires project activation (1 call at session start). The project can deactivate unexpectedly (observed once during this evaluation), requiring reactivation. This is a minor but real friction cost.

Verdict: Serena’s advantages compound for move-refactoring and exploration-heavy sessions; for rename-heavy sessions, both toolsets are equivalent.


7. Unique Capabilities (No Built-In Equivalent)

| Capability | Frequency | Impact per use |
|---|---|---|
| Move symbol between modules (with import updates) | Occasional | Replaces ~7 calls with 1, eliminates manual import tracking |
| Move file with import updates | Occasional | Replaces 3+ calls with 1 |
| Transitive type hierarchy | A few times per exploration session | Provides information unavailable to built-ins |
| External dependency symbol lookup | Occasional | Provides information unavailable to built-ins |
| Safe delete with usage guard | Occasional | Prevents accidental breakage |
| Propagated delete (delete + remove call sites) | Rare | Saves multiple manual edits |
| Inline symbol | Rare | Saves multiple manual edits |
| Scope-precise symbol targeting by name path | Continuous (every symbol interaction) | Eliminates disambiguation overhead |

Verdict: Serena provides 5-6 capabilities with no practical built-in equivalent, concentrated in move-refactoring, semantic navigation, and safe/propagated delete.


8. Tasks Outside Serena’s Scope (Built-In Only)

| Task | Share of typical session |
|---|---|
| Reading/editing non-code files (configs, docs, changelogs, notebooks) | ~15-20% |
| Free-text search (log strings, URLs, magic constants) | ~10% |
| Terminal commands (git, test runners, build tools) | ~15-20% |
| Creating new files from scratch | ~5-10% |
| Directory listing and repo navigation | ~5% |
| Running and debugging applications | ~10-15% |

Estimated share of session where Serena’s augmentation applies: ~35-50% (code navigation, symbol editing, refactoring). The remainder is built-in-only territory.

Verdict: Serena augments roughly a third to half of typical session work; the rest is inherently outside its scope and handled by built-ins.


9. Practical Usage Rule

| Task type | Use |
|---|---|
| Cross-file rename | Either (both provide semantic rename in 1 call) |
| Move symbol/file with import updates | Serena (unique capability) |
| Safe/propagated delete | Serena (unique capability) |
| Inline symbol | Serena (unique capability) |
| Type hierarchy, dependency lookup | Serena (unique capability) |
| Find code references to a symbol | Serena (find_referencing_symbols) |
| Find text mentions anywhere | Built-in (search_project / Grep) |
| Small edit (1-3 lines) | Built-in (search_replace) |
| Full method rewrite | Either (comparable) |
| Insert at structural location | Serena (insert_after_symbol) if name path known; built-in otherwise |
| Read non-code files | Built-in (open / Read) |
| Terminal operations | Built-in (bash) |
| New file creation | Built-in (create) |
| Chained edits in one file | Either; Serena if line-number staleness is a concern |

Verdict: Use Serena for move-refactoring, semantic navigation, and safe delete; use either toolset for rename; use built-ins for text-level edits, non-code files, and terminal operations.