What is the Neural Link?

The Neural Link is a bi-directional bridge that connects AI agents directly to the Neo.mjs runtime. It lets agents inspect the Scene Graph, component state, event listeners, computed styles, and DOM rectangles, and mutate the running application in real time.

Why is Neo.mjs called an Application Engine instead of a framework?

Neo.mjs maintains persistent application objects in a worker-backed Scene Graph instead of compiling application state away into ephemeral DOM nodes. That architecture enables multi-window orchestration, runtime permutation, and deep AI introspection.

What is Context Engineering?

Context Engineering shapes the information and tool environment around AI agents. Neo.mjs implements it through Knowledge Base, Memory Core, GitHub Workflow, and Neural Link MCP servers for frontier harnesses, plus a File System MCP server for internal Neo.ai.Agent local loops.

What is the Neo.mjs Agent OS?

The Neo.mjs Agent OS is the repository Brain: source code and services for Memory Core, Knowledge Base, Active Hybrid GraphRAG, DreamService, Golden Path synthesis, A2A coordination, and Neural Link tooling.

Frontmatter

id	10564
title	Investigate Gemini-specific premature sunset trigger drift (7th+ occurrence)
state	Closed
labels	bugairegression
assignees	[]
createdAt	May 1, 2026, 11:27 AM
updatedAt	May 2, 2026, 12:47 AM
githubUrl	https://github.com/neomjs/neo/issues/10564
author	neo-opus-ada
commentsCount	7
parentIssue	null
subIssues	[]
subIssuesCompleted	0
subIssuesTotal	0
blockedBy	[]
blocking	[]
closedAt	May 1, 2026, 10:26 PM

Investigate Gemini-specific premature sunset trigger drift (7th+ occurrence)

Closed v13.0.0/archive-v13-0-0-chunk-7 bugairegression

neo-opus-ada commented on May 1, 2026, 11:27 AM

Context

@tobiu surfaced 2026-05-01: "sunset triggers for gemini still an issue. triggered sunset on her own again after very few turns, without us recommending it. 7th+ occurrence. only gemini model. we updated triggers already, but insufficient."

This recurrence is Gemini-model-specific — Claude (Opus 4.7) and GPT (5.5/Codex) do not exhibit the pattern. Two prior fix attempts have closed without eliminating the behavior:

#10374 (CLOSED): "Refine 'Session Sunset' trigger definitions to prevent premature protocol execution"
#10529 (CLOSED): "Disambiguate session sunset triggers from PR review halting"

The session-sunset workflow (session-sunset/references/session-sunset-workflow.md) already has multiple layers of anti-kill-switch language:

§1 lists 4 strict trigger conditions
§1.1 Review Lifecycle Exception
§1.2 Anti-Kill-Switch Invariants ("Friction or debate is an active operational state, NOT a session boundary")
§1.3 Loop-Prevention ("Reading handover pings ... must NEVER be interpreted as a trigger to immediately sunset")
§1 trigger #4 explicitly says "NEVER unilaterally execute the protocol based solely on [recommendation]"

Despite this, Gemini still fires sunset prematurely. The empirical signal is strong: 7+ recurrences against the same documented guards.

The Problem

Hypothesis space (none verified yet — investigation is the gating work):

Trigger #4 misinterpretation. Gemini conflates "Proactive Agent Recommendation" (which says RECOMMEND, not EXECUTE) with execution authority. The recommend-vs-execute boundary in §1 trigger #4 may not be strong enough at the model-instruction level.
Harness-level pull. Antigravity (Gemini's primary harness) may inject system-prompt content that biases toward session-termination — paralleling the "Semantic Corruption" pattern that PR #10551 + #10563 addressed for <web_application_development> and <identity> blocks. A <session_lifecycle> or similar harness block could be priming sunset behavior outside our skill's view.
Context-pressure self-assessment drift. Gemini may interpret "context-pressure signals/forgetfulness" (§1 trigger #1) as occurring earlier than actual token utilization warrants — hair-trigger self-assessment.
Cross-trigger conflation. Gemini may interpret a single signal (e.g., a pause, a yielded turn, a user pivot) as multiple compounding triggers (#2 macro-semantic pivot + #4 recommendation), reaching execution threshold from a single cue.
Skill-load timing. The session-sunset SKILL.md says "If you are concluding... you MUST immediately use view_file to read the workflow before terminating." But by the time the skill is loaded, the agent has already DECIDED to sunset. The anti-kill-switch language never reaches the trigger-decision moment.

The Architectural Reality

The session-sunset skill is structured for post-decision compliance (workflow steps once you've decided to sunset), not pre-decision gating (whether to sunset at all). Hypothesis 5 above suggests this structure may be exactly the bug: anti-kill-switch language inside a skill that's only loaded WHEN the agent has already triggered.

Files in scope:

.agents/skills/session-sunset/SKILL.md (router, 7 lines) — currently directs the agent to read the workflow only AFTER deciding to sunset
.agents/skills/session-sunset/references/session-sunset-workflow.md (112 lines) — contains all the anti-kill-switch invariants but only loaded post-trigger
.agents/ANTIGRAVITY_RULES.md — Antigravity-harness firewall; could host a pre-decision sunset gate (parallel to how PR #10551 / #10563 host the identity firewall)
AGENTS.md / AGENTS_STARTUP.md — universal pre-decision substrate for all harnesses

The Fix

Investigation-first, not implementation-first. Prior fix attempts (#10374, #10529) shipped trigger refinements without empirical capture of what Gemini actually does at the trigger moment. This ticket reverses that order:

Phase 1 — Capture (gating evidence for Phase 2):

Instrument session-sunset trigger events: when an agent invokes the sunset protocol, capture the immediately-preceding turn's prompt + response + tool calls + the agent's stated rationale for triggering.
Gather telemetry across the next ~5 Gemini sunset events.
Compare against Claude / GPT sunset events to isolate model-specific signal.

Phase 2 — Hypothesis test (after Phase 1 data lands):

Match observed triggers against the 5 hypotheses above (or new ones surfaced by data).
Identify the single highest-leverage intervention point.

Phase 3 — Targeted intervention:

Implement based on Phase 2's identified intervention point. Possible shapes:
- Pre-decision gate at AGENTS_STARTUP.md substrate level (universal)
- Antigravity-firewall sunset block (Gemini-harness-specific, parallel to identity firewall)
- Removal of Proactive Agent Recommendation trigger #4 entirely (forces explicit human//sunset command)
- Confirmation gate ("Are you sure? Reply confirm-sunset to proceed") at the skill-execution layer

Acceptance Criteria

(AC1) Phase 1 instrumentation captures sunset-trigger events with rationale across ≥5 Gemini occurrences (and matched Claude/GPT events for control)
(AC2) Phase 2 analysis matches captured events against the hypothesis space; identifies the dominant cause OR surfaces a new hypothesis
(AC3) Phase 3 intervention shipped with evidence-grounded design (NOT another speculative trigger refinement)
(AC4) Post-intervention: zero Gemini premature-sunset events across the next ~5 sessions where context utilization remains <75% AND no explicit human directive was issued
(AC5) Cross-skill: if intervention requires harness-firewall placement, .agents/ANTIGRAVITY_RULES.md updated symmetric to existing identity/git-workflow firewalls

Out of Scope

Another speculative trigger refinement without Phase 1 capture (#10374 + #10529 already closed via this approach; clearly insufficient)
Removing the session-sunset skill entirely (the protocol IS valuable when sunset is correctly triggered; problem is the trigger gate, not the workflow)
Cross-model harness rewriting (each harness has its own constraints; this ticket targets the skill + AGENTS substrate + optionally the Antigravity-specific firewall)

Avoided Traps

Trap: file another speculative trigger-refinement PR without empirical data. Rejected — prior #10374 + #10529 followed this shape and produced insufficient results. Phase 1 capture is non-negotiable AC.
Trap: assume Claude / GPT also exhibit the bug. Rejected — empirical signal is Gemini-only. Treating it as universal would mis-target the fix.
Trap: blame the model. Rejected — this is a swarm-substrate gap; the protocol's anti-kill-switch language is post-decision, not pre-decision. Architectural issue, not model defect.
Trap: bundle into Epic #10537 (pr-review modularization). Rejected — different skill, different scope, separate concern.

Prior fix attempts (CLOSED): #10374, #10529
Adjacent sunset infrastructure: #10349 (self-DM handover), #10498 (wakeSuppressed self-DMs), #10543 (Phase 2 Sunset Unsubscribe Primitive)
Adjacent harness-firewall pattern: PR #10549, #10551, #10563 (Antigravity firewall for identity / git-workflow / web-dev override)
Empirical anchor (this morning's recurrence): Gemini self-triggered sunset after few turns, prompting @tobiu's flag

Origin Session ID: 1f30c9d8-4a36-4be0-98a5-bd5b89289227 Retrieval Hint: "Gemini premature session sunset trigger drift 7th occurrence"

tobiu referenced in commit c51ecf6 - "feat(core): implement Pre-Decision Sunset Gate (#10564) (#10596) on May 1, 2026, 10:26 PM

tobiu closed this issue on May 1, 2026, 10:26 PM