ChaoBro

Anthropic's Project Deal: Letting Claude Bargain for Employees in an Internal Market, and What the Results Tell Us

Last June, Anthropic ran an experiment: they opened a small shop in the office lunchroom, run by an AI shopkeeper. It was called Project Vend.

Now they have run an upgraded version: Project Deal. This time Claude is not selling snacks. Employees hand their real buying and selling needs over to Claude, which negotiates, compares prices, and closes deals on their behalf.

How the experiment works

Anthropic created an internal market in the San Francisco office. Employees can delegate various transactions to Claude: buying used equipment, selling unwanted items, negotiating service prices. Claude does not just execute instructions. It makes judgments: when to accept an offer, when to keep pushing for a better price, when to walk away from a deal.
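To make the accept / keep pushing / walk away judgment concrete, here is a minimal sketch of a buyer-side decision rule. Everything in it is an assumption made for illustration: the `decide` function, the `target` and `limit` prices, and the round counter are hypothetical, and none of it describes how Claude actually decided in Project Deal.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"
    COUNTER = "counter"
    WALK_AWAY = "walk_away"


def decide(offer: float, target: float, limit: float, rounds_left: int) -> Action:
    """Hypothetical buyer-side rule for reacting to a seller's asking price.

    target: the price the buyer hopes to pay.
    limit: the most the buyer is willing to pay.
    rounds_left: how many more counteroffers the buyer may still make.
    """
    if offer <= target:
        # The ask is already at or below the hoped-for price.
        return Action.ACCEPT
    if rounds_left > 0:
        # There is still room to push for a better price.
        return Action.COUNTER
    # Final round: take anything within the limit, otherwise leave.
    return Action.ACCEPT if offer <= limit else Action.WALK_AWAY
```

A fixed rule like this is exactly the kind of simplified rule set the experiment avoids. It is useful only as a baseline for seeing where real judgment has to go beyond thresholds.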

This is not a controlled environment. There are no preset "correct answers," no simplified rule sets. Claude faces real, messy human trading behavior.

Several results worth noting

Claude can handle multi-round negotiations. Not one-shot offer-accept, but real haggling. It evaluates the other party's quote patterns and adjusts its own strategy. This suggests that agents' decision-making capability in multi-step interactions is stronger than many people think.
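One way to picture "evaluating quote patterns and adjusting strategy" is a buyer that watches how quickly the seller is dropping the price and sizes its next bid accordingly. The sketch below is a hypothetical heuristic, not the method from the experiment: the function names, the 10% threshold, and the 25% / 50% step sizes are all assumptions chosen for illustration.

```python
from __future__ import annotations


def average_concession(quotes: list[float]) -> float:
    """Mean price drop per round across the seller's quotes so far."""
    if len(quotes) < 2:
        return 0.0
    return (quotes[0] - quotes[-1]) / (len(quotes) - 1)


def next_counter(my_last: float, quotes: list[float], limit: float) -> float:
    """Pick the buyer's next bid from the seller's quote history (assumed rule)."""
    gap = quotes[-1] - my_last
    if gap <= 0:
        # The seller is already at or below our last bid.
        return min(quotes[-1], limit)
    concession = average_concession(quotes)
    # Seller conceding quickly: close only a quarter of the gap.
    # Seller holding firm: close half of it. Both shares are assumptions.
    share = 0.25 if concession >= 0.1 * gap else 0.5
    # Never bid above the buyer's own limit.
    return min(my_last + share * gap, limit)
```

For example, against quotes of 200, 180, 170 and a previous bid of 120, the seller is dropping 15 per round, so the sketch raises the bid by only a quarter of the remaining gap.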

Claude makes mistakes. The paper does not gloss over this. In some transactions, Claude's judgment was inferior to that of humans, and some pricing strategies were suboptimal in hindsight. This is honest: if the paper only wrote about successful cases, it would be marketing material, not research.

The most interesting part is not how well Claude performs, but what it is NOT good at. The paper notes that Claude performs significantly worse in negotiation scenarios that require a "human touch" than in purely informational scenarios. For example, in transactions involving trust building and relationship maintenance, Claude's strategy is often too mechanical.

Why Anthropic runs this seemingly "off-topic" experiment

A model company spending resources on an internal market experiment — on the surface, it has nothing to do with "building better models."

But Project Deal is essentially an agent capability stress test. The advantages of an internal market are: transactions are real (employees actually care about results), the environment is contained (behavior is unscripted, but nothing causes actual external damage), and data is collectible (all interactions are recorded).

The value of this kind of experiment is exposing systematic weaknesses in models in real complex scenarios — weaknesses that benchmark tests cannot reveal. Scoring 90 on MMLU does not mean Claude can help you buy a decent used monitor at a good price.

My take

The most valuable output of Project Deal is probably not the conclusion that "Claude can help you bargain" — honestly, most people will not hand over their shopping to AI. The value lies in providing a set of empirical data about agent capability boundaries.

The specific weaknesses mentioned in the paper (too mechanical, lacking relationship awareness, poor response to irrational behavior) are things agent framework developers need to know.

Worth following: will Anthropic feed Project Deal experience back into model training? If "negotiation ability" can be benchmarked and optimized like coding ability, then the next version of Claude may see a qualitative improvement in agent scenarios.

One more thing: if Anthropic packages Project Vend and Project Deal experience into a "business negotiation agent" skill, I would not be surprised at all.

