test(sdk): add evals for llm judge, tool selection, followup quality, and multi-turn memory by mdrxy · Pull Request #1838 · langchain-ai/deepagents

Mason Daugherty (mdrxy) · 2026-03-12T18:07:01Z

Port from Agent Builder: LLM-as-judge assertion (LLMJudge) and three new eval suites — tool selection, followup question quality, and multi-turn memory behavior.

The judge assertion fills a gap left by substring matching, which can't evaluate semantic correctness. It uses a second LLM to grade agent responses against human-readable criteria, with pass/fail reported per criterion.
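To make the per-criterion grading idea concrete, here is a minimal sketch. It is not the `LLMJudge` implementation from this PR: the names `judge`, `assert_all_pass`, `CriterionResult`, and `make_stub_grader` are hypothetical, and the grader is a lookup-table stub standing in for the second LLM call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical grader signature (an assumption, not the PR's API):
# takes (response, criterion) and returns True when the criterion is met.
Grader = Callable[[str, str], bool]


@dataclass
class CriterionResult:
    criterion: str
    passed: bool


def judge(response: str, criteria: list[str], grader: Grader) -> list[CriterionResult]:
    """Grade one response against each criterion independently."""
    return [CriterionResult(c, grader(response, c)) for c in criteria]


def assert_all_pass(results: list[CriterionResult]) -> None:
    """Raise AssertionError naming every failed criterion."""
    failed = [r.criterion for r in results if not r.passed]
    if failed:
        raise AssertionError("Judge failed criteria: " + "; ".join(failed))


def make_stub_grader(verdicts: dict[str, bool]) -> Grader:
    """Stand-in for a model call: looks up a canned verdict per criterion.

    A real grader would prompt a second LLM with the response and the
    criterion and parse its pass/fail answer.
    """
    def grader(response: str, criterion: str) -> bool:
        return verdicts.get(criterion, False)
    return grader


if __name__ == "__main__":
    grader = make_stub_grader({
        "Names the correct city": True,
        "Explains its reasoning": False,
    })
    results = judge(
        "It's Paris.",
        ["Names the correct city", "Explains its reasoning"],
        grader,
    )
    for r in results:
        print(r.criterion, "->", "pass" if r.passed else "fail")
```

Grading each criterion separately means a failure message can name exactly which expectation the response missed, which a single overall score would hide.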

Eugene Yurtsev (eyurtsev)

very nice! do you think you could run the evals against this branch to just get a sense of what it looks like?

Mason Daugherty (mdrxy) · 2026-03-12T19:50:32Z

https://github.com/langchain-ai/deepagents/actions/runs/23020875594

Mason Daugherty (mdrxy) added 3 commits March 12, 2026 14:01

test(sdk): add llm judge assertion and tool selection evals

2c88669

.

e451388

.

7d5392d

github-actions bot added the labels deepagents (Related to the `deepagents` SDK / agent harness), internal (User is a member of the `langchain-ai` GitHub organization), size: L (500-999 LOC), and tests (Adding tests or correcting existing) Mar 12, 2026

Eugene Yurtsev (eyurtsev) reviewed Mar 12, 2026

test(sdk): add multi-turn memory, tool selection, and followup evals

cd52c1b

Mason Daugherty (mdrxy) changed the title ~~test(sdk): add llm judge assertion and tool selection evals~~ test(sdk): add evals for llm judge, tool selection, followup quality, and multi-turn memory Mar 12, 2026

Merge branch 'main' into mdrxy/ab-evals

eaee525

Merge branch 'main' into mdrxy/ab-evals

1e62174

Merge branch 'main' into mdrxy/ab-evals

d182dfa

Mason Daugherty (mdrxy) wants to merge 7 commits into `main` from `mdrxy/ab-evals`

2 participants

Conversation

Mason Daugherty (mdrxy) commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Eugene Yurtsev (eyurtsev) left a comment

Choose a reason for hiding this comment

Uh oh!

This comment was marked as outdated.

Mason Daugherty (mdrxy) commented Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Mason Daugherty (mdrxy) commented Mar 12, 2026 •

edited

Loading