Braintrust
Braintrust is an AI evaluation platform that helps engineering teams test, score, and monitor large language model applications.
Updated June 29, 2026
Overall Pulse Score
-4 over this period
A 0-100 index summarizing the tone of 100 relevant public mentions gathered from public online communities across 10 weeks in the selected period. It measures online sentiment, not a rating of the product's quality.
Weekly Sentiment Trend
Pulse Score by week over the selected period. Each point is one complete week of mentions.
This week in public discussion
Commenters discussing Braintrust over the recent period reflected a slightly cautious tone, with bug reports representing the most prominent thread of conversation. Several mentions flagged reliability and integration concerns, including specific issues around Zod schema mismatches, gevent incompatibility, and tracing leaks. Praise focused on certain features and comparisons to competitors, though complaints about missing features and integration gaps outnumbered positive remarks across the discussion overall.
AI-generated summary of public online discussion during this period. It reflects the tone of that discussion, not facts about the product or our views.
Sentiment mix by week
How the tone of public discussion splits each week.
Ringed points mark weeks with unusually high discussion volume, more than double this product's typical week.
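The ringed-point rule above can be sketched in a few lines. This is only an illustration, not the site's actual method: it assumes that a "typical week" means the median weekly mention count, and the weekly counts are hypothetical numbers chosen to sum to 100, not data from this page.

```python
from statistics import median


def high_volume_weeks(weekly_counts, factor=2.0):
    """Return indices of weeks whose mention count exceeds factor x the typical week.

    Assumption: "typical" is taken to be the median weekly count.
    """
    typical = median(weekly_counts)
    return [i for i, n in enumerate(weekly_counts) if n > factor * typical]


# Hypothetical weekly mention counts for 10 weeks (illustrative only).
counts = [6, 8, 7, 5, 9, 8, 30, 10, 9, 8]
ringed = high_volume_weeks(counts)
print(ringed)
```

With these made-up counts the median week has 8 mentions, so only the week with 30 mentions passes the "more than double" threshold. A mean-based definition of "typical" would be pulled upward by the spike itself, which is why the median is used in this sketch.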
Most-discussed praise
Most-discussed complaints
Themes across the selected period, with mention counts.
How Braintrust compares
Pulse Score over the selected period versus the top tracked competitors in Coding.
Where the mentions come from
Share of the 100 relevant public mentions in the selected period, by source.
Sample public mentions
Showing 5 of 100 analyzed public mentions in this period, with links to the original source. We do not reproduce full threads.
“gevent incompatibility. braintrust.Eval() appears to be unsafe under gevent monkey patching because the sync API drives the async evaluator with asyncio.run(). While that loop is active, sibling gevent greenlets on the same OS thread can observe it via asyncio.get_running_loop()....”
“list-experiments MCP tool throws: repo_info.dirty is required in Zod schema but absent in API responses. ## Description The list-experiments MCP tool fails entirely when any returned experiment has a repo_info object that lacks the dirty field. This happens whenever evals are run...”
“Evals: automatically compare runs against prior or selected baseline. ## Context AgentV should make eval comparisons first-class. During WTG Braintrust/Phoenix UX benchmarking, Braintrust automatically compared a new experiment against the prior compatible experiment, and Phoenix...”
“[Bug] UseBraintrustTracing leaks Microsoft Agent Framework local history sentinel with RequirePerServiceCallChatHistoryPersistence. When using Braintrust.Sdk.AgentFramework with Microsoft Agent Framework and RequirePerServiceCallChatHistoryPersistence = true, UseBraintrustTracing...”
“feat(eleatic): generic per-row trace_json + drawer Trace panel. ## Context eleatic's drill-down drawer shows the structured verdict (output_json/expected_json) but has no **trace** of how a row was produced (the LLM call's input/output, token usage, latency) — Braintrust's trace ...”
Deeper analysis
- Bug reports and missing features dominated discussion volume and set a predominantly frustrated tone across the selected period.
- Sentiment followed a volatile trajectory, dipping sharply in mid-May, recovering briefly during a high-volume week in early June, then softening again through late June.
- Opinion on integrations was divided, with some commenters praising compatibility and others citing integration gaps as a core complaint.
- Several mentions framed Braintrust as a conditional or secondary tool in evaluation workflows, reflecting qualified rather than confident adoption sentiment.
| Praise theme | Mentions |
|---|---|
| Strong features | 19 |
| Good integrations | 7 |
| Compared to rivals | 7 |
| Feature requests | 3 |
| AI quality | 2 |

| Complaint theme | Mentions |
|---|---|
| Bugs | 29 |
| Missing features | 14 |
| Reliability | 12 |
| Lacking integrations | 10 |
| Compared to rivals | 9 |
Discussion about Braintrust over the selected period has been dominated by a persistent undercurrent of frustration, with bug reports and missing features generating the bulk of conversation volume. Commenters surfaced a range of technical friction points, including schema validation failures in MCP tooling, async runtime conflicts under gevent monkey patching, and tracing leaks tied to specific SDK and framework combinations. The five listed complaint themes total 74 mentions against 38 for the five listed praise themes, and the tone across bug-related mentions leaned toward practical urgency rather than casual disappointment, suggesting that the voices driving discussion skew toward active developers hitting real blockers.
Sentiment moved in a jagged pattern across the window rather than in a clean direction. The trajectory opened in a moderate range before dipping sharply around mid-May, recovered somewhat in the final days of May and into early June when discussion volume also spiked heavily, then softened again through late June. The high-volume week in early June coincided with a moderate score, suggesting that broader attention did not translate into more positive tone. The most recent readings trended downward, leaving the overall arc slightly negative relative to where the window started.
Praise themes were present but quieter in volume. Feature appreciation and favorable competitor comparisons did surface across a handful of mentions, and some commenters acknowledged integration capabilities positively. However, integration concerns appeared on the complaint side as well, pointing to divided opinion on that specific dimension.
Where opinion was most visibly split was around Braintrust's positioning relative to competing and complementary tools. Several mentions framed it as a deferred or secondary layer in evaluation architectures rather than a primary gate, reflecting a tone of conditional or qualified confidence rather than outright endorsement. This architectural hesitancy, combined with the reliability and missing-feature complaints, shaped a discussion that felt more exploratory and problem-oriented than enthusiastic.
Compare with another tool

| Tool | Score |
|---|---|
| Braintrust | 43 |
| Trainual | 88 |

Score-level preview from live weekly tracking.
Compare with similar tools
Bubble
No-code platform that lets users design, build, and deploy full-stack web applications without writing code.
Free tier; paid plans available
RevenueCat
A platform that manages in-app purchases, subscriptions, and revenue analytics for iOS and Android app developers.
Free tier; paid plans available