fix(templates): give TypeScript templates per-session in-process short-term memory#1666
Conversation
…t-term memory The TypeScript Strands no-memory template cached a single global Agent shared across every session, so a process serving multiple sessions (e.g. `agentcore dev`) leaked conversation history between them and grew without bound. The VercelAI template was stateless and could not recall earlier turns even within one session. Key both templates' in-process history by sessionId, bounded by an insertion-ordered Map acting as an LRU(128) — mirroring the Python templates. On AgentCore Runtime each microVM serves a single session, so this holds one entry; the bound only matters for many-session processes like local dev. No AWS service is provisioned; this is free, best-effort short-term memory that resets on cold start. Verified on deployed AgentCore Runtime: same-session recall works, distinct sessions stay isolated, and bare invokes are unaffected.
|
Claude Security Review: no high-confidence findings. (run) |
Package TarballHow to installgh release download pr-1666-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.21.1.tgz |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Thanks for fixing the cross-session bleed — the approach mirrors the Python templates cleanly and the bounded LRU is a nice touch.
I think there is one correctness issue in the Vercel AI template around error handling that should be addressed before merging: a failure during streamText (synchronous throw, async error, or mid-stream failure) leaves the user turn appended to messages without a corresponding assistant turn, which corrupts subsequent turns on the same session (Bedrock/Anthropic reject consecutive user messages). See the inline comment for options.
Minor concurrency note (non-blocking): two concurrent requests against the same sessionId will interleave pushes into the shared messages array. Unlikely on AgentCore Runtime (one session per microVM) and the feature is documented as best-effort, so I'd be fine leaving this as-is — flagging only in case you want to call it out in the comment.
| assistant += chunk; | ||
| yield { data: chunk }; | ||
| } | ||
| messages.push({ role: 'assistant', content: assistant }); |
There was a problem hiding this comment.
Failed turns leave the history in a broken state. The user message is pushed at line 38 before streamText runs, and the assistant message is only pushed at line 52 after the stream completes successfully. If streamText throws, the for await errors mid-stream, or the generator is aborted by the runtime, the user turn stays in messages with no matching assistant turn. On the next invocation for the same sessionId, the history sent to the model will contain two consecutive user messages, which providers like Bedrock/Anthropic strict-validate and reject — so a single transient error effectively poisons the session for the rest of the process's lifetime.
A few ways to fix:
- Stage the user message and only commit on success. Build the request as
[...messages, { role: 'user', content: prompt }], and onlymessages.push(userMsg, assistantMsg)after the stream drains successfully. - Wrap in try/finally and roll back on failure. Push the user message up front, then in a
catch/finallyeither pop the trailing user message or push a placeholder assistant message before re-throwing. - Snapshot-and-restore on error. Capture
messages.lengthbefore the call; on error, truncate back to that length.
Option 1 is the cleanest IMO and matches how most history-managing examples in the ai SDK docs do it.
There was a problem hiding this comment.
Fixed in ccb52f3 with Option 1 (stage, commit on success), with one adjustment based on what I observed testing it.
I reproduced the failure against Bedrock and the mechanism is slightly different from a throw: a provider-level streamText failure surfaces as a silently-empty textStream — the for await completes with zero chunks and does not throw. So a plain try/catch around the loop never fires; the old code committed the user turn plus an empty-content assistant turn, and the next turn for that session was rejected with The content field in the Message object at messages.N is empty.
The fix builds the request from a staged copy and only commits once a non-empty reply has streamed:
const userMessage: ModelMessage = { role: 'user', content: payload.prompt ?? '' };
const result = streamText({ model, system: SYSTEM_PROMPT, messages: [...history, userMessage] });
let assistant = '';
for await (const chunk of result.textStream) { assistant += chunk; yield { data: chunk }; }
if (assistant.length > 0) {
history.push(userMessage, { role: 'assistant', content: assistant });
}The assistant.length > 0 guard covers both the silent-empty case and a genuine mid-stream throw (the push is simply never reached), so a failed turn is dropped cleanly instead of poisoning the session. Verified end-to-end: a failed turn leaves history empty, and a subsequent good turn on the same session works and recalls correctly.
On the concurrency note — agreed it's not worth locking given one-session-per-microVM on Runtime and the best-effort framing; the inline comment now states the history-consistency reasoning.
Coverage Report
|
| return cachedAgent; | ||
| if (agentCache.size >= AGENT_CACHE_LIMIT) { | ||
| const oldest = agentCache.keys().next().value; | ||
| if (oldest !== undefined) agentCache.delete(oldest); |
There was a problem hiding this comment.
qq: do we need to do any clean up here aside from just purging oldest agent?
There was a problem hiding this comment.
Good question — no, purging the oldest agent is sufficient; there's nothing else to tear down.
I checked what a Strands TS Agent holds and what the template gives it:
- The
Agenthas noclose/dispose/[Symbol.asyncDispose]— it's a plain object holding in-memory state (messages,modelState, the tool/conversation registries). Dropping the last reference makes it GC-eligible; there are no per-agent sockets, file handles, or timers to release. - The only things that could hold a connection — the MCP clients and the model — are module-level singletons in the template (
const mcpClients = [...],const tools = [...]at top scope), shared by reference across every cached agent. They're intentionally not per-agent, so eviction must not close them (other live sessions are still using them); they persist for the process lifetime by design.
So eviction is just delete of the map entry, matching the Python agent_factory (cache.popitem() with no teardown). If a future change gave each agent its own MCP client, that's when we'd need a disposal hook — but that's not the case here.
streamText errors surface as a silently-empty textStream (the for-await completes with zero chunks rather than throwing), so the previous code committed the user turn plus an empty assistant turn. On the next invocation for the same sessionId, the provider rejected the empty assistant content, poisoning the session for the rest of the process. Build the request from a staged copy ([...history, userMessage]) and only push the exchange into the persistent history after a non-empty reply, so a failed turn is dropped cleanly. Addresses PR review feedback.
|
Claude Security Review: no high-confidence findings. (run) |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Nice fix overall — keying in-process history by sessionId and bounding the cache via Map-as-LRU mirrors the Python templates cleanly, and the VercelAI commit-on-non-empty refinement in ccb52f3 is solid.
One thing that I think still needs to be addressed before merge: the same "failed turn poisons the session" hazard that the prior review surfaced for VercelAI also applies to the no-memory Strands branch, and the current change doesn't guard against it. Details inline.
| {{else}} | ||
| const agent = await getOrCreateAgent(); | ||
| const sessionId = context?.sessionId ?? 'default-session'; | ||
| const agent = await getOrCreateAgent(sessionId); |
There was a problem hiding this comment.
Mid-stream failure on a cached Strands Agent poisons the session — same class of bug as the one we just fixed in VercelAI, but on this branch it survives because nothing rolls back the agent's internal state.
Looking at @strands-agents/sdk 1.7.0 (dist/src/agent/agent.js), Agent.stream(prompt) appends the user message to this.messages before invoking the model:
// Normalize input and append user messages on first invocation only
if (currentArgs !== undefined) {
const messagesToAppend = this._normalizeInput(currentArgs);
for (const message of messagesToAppend) {
yield this._appendMessage(message, invocationState); // <-- user msg committed here
}
...
}
...
const modelResult = yield* this._invokeModel(invocationState, structuredOutputChoice);The outer catch (error) only does special handling for CancelledError / InterruptError; a generic error from _invokeModel (e.g. Bedrock throttling, transient provider error) just re-throws after closing spans — this.messages keeps the dangling user turn. Because we now cache one Agent per sessionId, the next invocation on that session calls _normalizeInput again and appends another user message, so the agent's history ends with two consecutive user messages and Bedrock/Anthropic providers strict-reject the request from then on. Single transient error → session permanently broken for the lifetime of the process.
Before this PR, cachedAgent was a single global, so this was already latent — but limited to "the whole process gets stuck" rather than per-session. Now that we're explicitly selling "per-session in-process short-term memory", the failure mode is more visible and worth closing.
A few ways to fix:
- Snapshot/restore via the SDK.
AgentexposestakeSnapshot({ include: ['messages'] })andloadSnapshot(snapshot)(seedist/src/agent/agent.d.ts). Beforeagent.stream(...), take a messages snapshot; in acatch,loadSnapshotit and rethrow. Cleanest because it uses the supported public API. - Length-based truncation. Capture
agent.messages.lengthbefore the loop; on error, splice back to that length and rethrow. Lower-level but doesn't depend on snapshot semantics. - Evict on error. Wrap the
for awaitin try/catch; on erroragentCache.delete(sessionId)and rethrow. Simpler, but the session loses all prior history on any transient failure — fine as a stopgap if the SDK snapshot API isn't stable yet.
Option 1 matches the SDK's own "keep agent.messages reinvokable at all times" design intent (per the deferred-append comment in agent.js) and is what I'd lean toward. Option 3 is the minimal viable fix if you'd rather not introduce a snapshot dependency in the template.
Worth verifying by reproducing the same way the VercelAI case was: force a provider error on turn 1 (e.g. invalid model id, or temporarily throttled), then send turn 2 on the same --session-id and confirm it isn't rejected with a "consecutive user messages" / role-validation error from Bedrock.
There was a problem hiding this comment.
Fixed in 1540ed6 with Option 1 (snapshot/restore via the SDK's takeSnapshot/loadSnapshot).
I reproduced it as you suggested and want to share the nuance I found, because it changes the severity (not the fix):
The dangling user turn is real — after a forced turn-1 failure on a cached agent, agent.messages is left as ["user"], and a subsequent turn appends another, giving ["user","user","assistant"]. Strands' model adapters don't merge consecutive same-role messages (AnthropicModel._formatMessages is a plain .map), so the doubled user turn is sent as-is.
But whether it actually breaks is provider-dependent:
- Bedrock (the template's default
modelProvider): I tested both via the Agent and a rawConverseCommandwith two consecutive user messages — both accepted, and the model answered correctly. So on the default provider the session is not poisoned. - Anthropic / OpenAI direct: those APIs require strict user/assistant alternation, so the doubled user turn would be rejected from then on.
Since the template ships all four providers, I fixed it regardless rather than rely on Bedrock's leniency.
The guard snapshots messages before the stream and restores on error, scoped to the no-memory branch only — the hasMemory branch persists through AgentCoreMemorySessionManager (server-side events via hooks), where rolling back local agent.messages wouldn't undo the server write, so it deliberately keeps its existing try/finally … flush() shape.
const snapshot = agent.takeSnapshot({ include: ['messages'] });
try {
for await (const event of agent.stream(payload.prompt ?? '')) { /* … */ }
} catch (error) {
agent.loadSnapshot(snapshot);
throw error;
}Verified end-to-end through the BedrockAgentCoreApp handler with a model that throws on turn 1: turn 1 returns an SSE error and rolls back, and turn 2 on the same sessionId runs cleanly with ["user","assistant"] history instead of a poisoned ["user","user",…].
Agent.stream() appends the user message to the cached agent's history before invoking the model, and a generic mid-stream error (e.g. throttling or a transient provider error) re-throws without rolling that back. Since the no-memory branch now caches one Agent per sessionId, the next turn on that session would append a second user message, leaving consecutive user turns that providers requiring strict role alternation (e.g. Anthropic) reject — poisoning the session for the process lifetime. Snapshot the agent's messages before streaming and restore them in a catch before re-throwing, using the SDK's takeSnapshot/loadSnapshot. Scoped to the no-memory branch; the memory branch persists via the session manager and must not roll back local state. Verified end-to-end through the runtime handler: a forced turn-1 failure leaves history clean and the next turn on the same session succeeds. Addresses PR review feedback.
|
Claude Security Review: no high-confidence findings. (run) |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
LGTM — all serious issues raised in earlier reviews have been addressed in the latest two commits, and I don't have anything new to flag.
What I checked:
- VercelAI failed/empty-turn handling (
ccb52f30): Building the streamText request from[...history, userMessage](rather than pushing first) and only committing onassistant.length > 0is correct. A mid-stream throw propagates out of thefor awaitbefore the commit, and a silent-empty stream is dropped — either wayhistoryis never mutated for a failed turn, so the next turn on the same session won't be poisoned with a dangling user message or empty assistant message. - Strands snapshot/restore (
1540ed67): I confirmedAgent#takeSnapshot/loadSnapshotare part of the public API in@strands-agents/sdk1.5.0 (the pinned^1.5.0floor), and thatloadSnapshotwith a{ include: ['messages'] }snapshot truncates and re-pushes deep copies of the messages array (agent.messages.length = 0+Message.fromJSON(...)). So a mid-_invokeModelerror after Strands appends the user turn is correctly rolled back, and the cachedAgentstays reusable. Restricting the guard to the{{else}}(no-memory) branch is the right call — onhasMemory, server-side memory writes via the session manager wouldn't be undone by a local rollback. - LRU bound:
Map-as-LRU with delete+set on hit andkeys().next().valueeviction on overflow is correct insertion-ordered LRU semantics. - Snapshot test: the snapshot file matches the template sources verbatim, so
npm run test:snapshotswill pass.
Pre-existing/non-blocking (not introduced by this PR, just noting):
- The
hasMemoryStrands branch'sagentCacheis still unbounded, unlike the no-memory branch's LRU(128). Consistent with the Python templates'agent_factory, so fine to leave for now. - The PR description says this "mirrors the Python templates", but the Python
http/strands/base/main.pyno-memory branch actually keeps a single global_agent, not a per-session LRU. The TS change here is strictly better than the Python equivalent in the no-memory path; just calling out for accuracy.
Nothing blocking from me — approving.
|
Thanks for the thorough re-review. One factual correction on the non-blocking note: the Python def agent_factory():
cache = OrderedDict()
def get_or_create_agent(session_id, ...):
if session_id in cache:
cache.move_to_end(session_id); return cache[session_id]
if len(cache) >= 128:
cache.popitem(last=False)
cache[session_id] = Agent(...)
return cache[session_id]The single-global Agreed on the other note: the |
Description
The TypeScript agent templates did not provide correct per-session short-term memory — in-process, turn-to-turn conversation recall, which is distinct from the AgentCore Memory service.
Strands (
memory: none) cached a single globalAgent, so a process serving more than one session shared one conversation history across all of them. On AgentCore Runtime the 1:1 session→microVM model masks this, but inagentcore dev(one Node process, many sessions) it caused cross-session history bleed and unbounded growth. Reproduced: session A sets "favorite color is teal"; a different session B then asks and is told "teal".VercelAI called
streamText({ prompt })with no history, so it could not recall earlier turns even within a single session.This change keys each template's in-process history by
sessionId, bounded by an insertion-orderedMapacting as an LRU(128) — mirroring the Python templates. On AgentCore Runtime each microVM serves a single session, so this holds one entry; the bound only matters for many-session processes such as local dev. No AWS service is provisioned: this is free, best-effort short-term memory that resets on cold start. ThelongAndShortTerm(memory-service) Strands branch is unchanged.A prior change had disabled TypeScript memory because the Strands TS SDK lacked the AgentCore Memory session manager; that correctly removed the durable service integration but also removed the free in-process continuity, which never needed the session manager. This restores it.
Related Issue
Closes #1665
Documentation PR
N/A — templates are self-documenting via inline comments; no user-facing docs change.
Type of Change
Testing
How have you tested the change?
npm run test:unitandnpm run test:integnpm run typechecknpm run lintsrc/assets/, I rannpm run test:update-snapshotsand committed the updated snapshotsAdditionally, both templates were rendered, type-checked (
tsc --noEmit), and exercised end-to-end against deployed AgentCore Runtime and a local server (account-isolated test account):sessionIdacross two turns → second turn recalls the first (Strands and VercelAI).sessionIds in one process → histories stay isolated (the bleed is gone).--session-idrecall still works.Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.