fix(daytona): fix execute implementation#1756
fix(daytona): fix execute implementation#1756Eugene Yurtsev (eyurtsev) merged 7 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Updates the Daytona sandbox backend to work around Daytona’s exec timeout propagation issue by executing commands via session APIs and polling for completion, and adjusts Deep Agents’ heredoc command templates to always terminate with a trailing newline (with a unit test to enforce it).
Changes:
- Replace Daytona
process.exec(...)usage with session-based execution + log polling to enforce timeouts. - Ensure heredoc-based command templates (
write/edit/read) end with a trailing newline. - Add a unit test asserting heredoc templates end with
\n.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| libs/partners/daytona/langchain_daytona/sandbox.py | Implements session-based command execution with manual timeout enforcement and log retrieval. |
| libs/deepagents/deepagents/backends/sandbox.py | Adds trailing newline to heredoc terminators in command templates. |
| libs/deepagents/tests/unit_tests/backends/test_sandbox_backend.py | Adds a regression test ensuring heredoc templates end with a newline. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| payload = json.dumps({"path": "/test/file.txt", "offset": 0, "limit": 100}) | ||
| payload_b64 = base64.b64encode(payload.encode("utf-8")).decode("ascii") | ||
|
|
||
| write_cmd = _WRITE_COMMAND_TEMPLATE.format(payload_b64=payload_b64) | ||
| edit_cmd = _EDIT_COMMAND_TEMPLATE.format(payload_b64=payload_b64, replace_all=False) | ||
| read_cmd = _READ_COMMAND_TEMPLATE.format(payload_b64=payload_b64) | ||
|
|
There was a problem hiding this comment.
This test reuses a read-style payload (path/offset/limit) to format the write/edit templates, even though those templates expect different JSON schemas at runtime. Since the test is only checking for a trailing newline, consider using minimal payloads that match each template’s expected schema to avoid confusing future maintainers.
| payload = json.dumps({"path": "/test/file.txt", "offset": 0, "limit": 100}) | |
| payload_b64 = base64.b64encode(payload.encode("utf-8")).decode("ascii") | |
| write_cmd = _WRITE_COMMAND_TEMPLATE.format(payload_b64=payload_b64) | |
| edit_cmd = _EDIT_COMMAND_TEMPLATE.format(payload_b64=payload_b64, replace_all=False) | |
| read_cmd = _READ_COMMAND_TEMPLATE.format(payload_b64=payload_b64) | |
| write_payload = json.dumps({"path": "/test/file.txt", "content": "test content"}) | |
| write_payload_b64 = base64.b64encode(write_payload.encode("utf-8")).decode("ascii") | |
| edit_payload = json.dumps({"path": "/test/file.txt", "old": "foo", "new": "bar"}) | |
| edit_payload_b64 = base64.b64encode(edit_payload.encode("utf-8")).decode("ascii") | |
| read_payload = json.dumps({"path": "/test/file.txt", "offset": 0, "limit": 100}) | |
| read_payload_b64 = base64.b64encode(read_payload.encode("utf-8")).decode("ascii") | |
| write_cmd = _WRITE_COMMAND_TEMPLATE.format(payload_b64=write_payload_b64) | |
| edit_cmd = _EDIT_COMMAND_TEMPLATE.format(payload_b64=edit_payload_b64, replace_all=False) | |
| read_cmd = _READ_COMMAND_TEMPLATE.format(payload_b64=read_payload_b64) |
| output = logs.stdout or "" | ||
|
|
||
| if logs.stderr is not None and logs.stderr.strip(): | ||
| output += f"\n<stderr>{logs.stderr.strip()}</stderr>" | ||
|
|
||
| return ExecuteResponse( | ||
| output=result.result, | ||
| exit_code=result.exit_code, | ||
| output=output, | ||
| exit_code=command_result.exit_code, |
There was a problem hiding this comment.
logs.stdout is assigned directly to output, but Daytona SDK log fields may be None (and ExecuteResponse.output expects a string). Consider normalizing with output = logs.stdout or "" before concatenation to avoid TypeError and ensure consistent return type. Also, wrapping stderr in <stderr>...</stderr> is inconsistent with other sandbox backends in this repo, which append stderr as plain text; this could break consumers that expect the same formatting across backends.
| finally: | ||
| self._sandbox.process.delete_session(session_id) | ||
|
|
There was a problem hiding this comment.
delete_session(session_id) in the finally block can raise and mask the original exception (including TimeoutError from the loop). Consider suppressing cleanup errors (or logging them) so that command execution failures/timeouts aren’t replaced by a session-deletion error.
| if command_result.exit_code is not None: | ||
| break | ||
| # sleep for 100 ms | ||
| time.sleep(0.1) |
There was a problem hiding this comment.
The loop polls get_session_command(...) every 100ms. With the default 30-minute timeout this can generate ~180,000 API calls for a single long-running command, which is likely to be expensive and may hit rate limits. Consider increasing the polling interval and/or using a backoff strategy (or a blocking/wait API if Daytona provides one).
Fixes the Daytona sandbox execute path so `timeout` is actually honored instead of being dropped by the underlying SDK call. The backend now runs commands through a session, polls for completion, and raises a timeout when the requested limit is exceeded, while preserving the existing execute API and collecting stdout/stderr for the final response. This PR also adds regression coverage in the core sandbox tests for the heredoc-based command templates and payload formatting used by `BaseSandbox`. Created with [Deep Agents CLI](https://docs.langchain.com/oss/python/deepagents/cli/overview).
Fixes the Daytona sandbox execute path so
timeoutis actually honored instead of being dropped by the underlying SDK call. The backend now runs commands through a session, polls for completion, and raises a timeout when the requested limit is exceeded, while preserving the existing execute API and collecting stdout/stderr for the final response. This PR also adds regression coverage in the core sandbox tests for the heredoc-based command templates and payload formatting used byBaseSandbox.Created with Deep Agents CLI.