TL;DR
Google’s internal AI coding agent, Agent Smith, now generates over 25% of the company’s new production code. Unlike autocomplete tools like Copilot, Agent Smith works asynchronously in the background, writing, testing, and iterating on code without real-time human interaction. For API teams, this raises questions about contract stability, test coverage, documentation drift, and review workflows when a quarter of your new code is machine-generated.
Introduction
During a March 2026 earnings call, Google CEO Sundar Pichai dropped a number that made the entire software industry pause: AI-generated code now accounts for more than 25% of new code produced at Google.
This isn’t autocomplete. This isn’t Copilot suggestions accepted by developers. This is code that ships in production after AI generation. The tool behind it, internally called Agent Smith (a nod to the self-replicating antagonist from The Matrix), became so popular among Google’s 180,000+ employees that the company had to throttle access to manage infrastructure strain.
Agent Smith represents a different category from the AI coding tools most developers use today. Where Copilot and Claude Code assist in real time, Agent Smith works in the background. Engineers assign tasks, walk away, and return later to review completed work.
For API development teams, this shift from “AI-assisted” to “AI-generated” code raises practical questions. When a quarter of your new code is written by an autonomous agent, how do you keep API contracts stable? How do you ensure tests cover machine-generated endpoints? How do you prevent documentation from drifting?
This article breaks down what Agent Smith does, how it differs from other AI coding tools, and what API teams should prepare for.
What Agent Smith does
Asynchronous autonomous coding
Agent Smith doesn’t sit in your IDE waiting for you to type. It operates asynchronously in the background. Here’s the workflow:
- An engineer describes a task in natural language
- Agent Smith breaks the task into subtasks
- It writes code across multiple files
- It runs tests and iterates on failures
- The engineer reviews the completed work
This is fundamentally different from Copilot’s inline suggestions or Claude Code’s interactive sessions. Agent Smith is closer to a junior developer who takes a ticket, disappears for a few hours, and comes back with a pull request.
Engineers can delegate tasks and check progress via Google’s internal chat platform, even from mobile devices. The tool accesses employee profiles and relevant documentation automatically, pulling context from Google’s internal knowledge base.
Built on Gemini and Antigravity
Agent Smith runs on Google’s Gemini model family, augmented with retrieval systems that give it access to Google’s vast internal codebase and documentation. It’s built on top of Antigravity, Google’s existing agentic coding platform, but extends it with autonomous task decomposition and execution.
The retrieval augmentation is key. Agent Smith doesn’t generate code in isolation. It searches Google’s internal codebase for similar patterns, references existing implementations, and follows internal coding conventions. This context awareness is what enables production-quality output at the 25% scale.
What “25% of new code” means
Pichai’s figure needs context. “25% of new code” refers to code that:
- Is generated by AI, not autocompleted
- Passes code review (human engineers still review Agent Smith’s output)
- Ships in production systems
- Is measured across all of Google’s engineering output
This doesn’t mean 25% of Google’s total codebase is AI-generated. It means more than 25% of the new code being written today is AI-generated. The distinction matters because new code is added on top of a large, existing human-written codebase, so the AI share of the whole is much smaller than the headline number. But the trajectory is clear: the percentage is growing, and Pichai highlighted it as a strategic advantage.
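A rough back-of-the-envelope sketch makes the gap between the two numbers concrete. The codebase size and the amount of new code below are made-up illustrations, not figures from Google; only the 25% share comes from Pichai’s statement.

```javascript
// Share of the *total* codebase that is AI-generated after one period,
// given the AI share of *new* code. All inputs are illustrative assumptions.
function aiShareOfCodebase(existingLines, newLines, aiShareOfNew) {
  const aiLines = newLines * aiShareOfNew;
  return aiLines / (existingLines + newLines);
}

// Hypothetical: 1,000,000 existing human-written lines, 100,000 new lines,
// 25% of the new lines written by an agent.
const share = aiShareOfCodebase(1_000_000, 100_000, 0.25);
console.log(`${(share * 100).toFixed(1)}% of the codebase`); // prints "2.3% of the codebase"
```

Under these assumptions, a quarter of new code translates to roughly 2% of the whole. The share climbs only as new code accumulates over many periods.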
How Agent Smith differs from other AI coding tools
The AI coding tool spectrum
| Tool | Mode | Interaction | Scope | Production code? |
|---|---|---|---|---|
| GitHub Copilot | Real-time autocomplete | Inline in IDE | Line/function level | After human acceptance |
| Claude Code | Interactive session | Conversational | Multi-file changes | After human review |
| Cursor Agent | Background + interactive | IDE-embedded | Project-level | After human review |
| Agent Smith | Asynchronous autonomous | Task delegation | Full feature implementation | After human review |
| KAIROS (unreleased) | Always-on daemon | Background monitoring | Repository-wide | TBD |
Agent Smith sits at the autonomous end of this spectrum. The only step further would be fully autonomous deployment without human review, which no major tool does yet (and shouldn’t).
Why asynchronous matters for API teams
Real-time AI coding tools (Copilot, Claude Code) work within the developer’s flow. The developer sees what the AI writes, understands the context, and makes corrections in the moment.
Asynchronous agents change this dynamic. When Agent Smith writes an API endpoint, the developer reviews it after the fact. The review is separated from the creation context. This means:
- The developer may not understand why the agent chose a particular response format
- API contract changes may not be obvious in a standard code review
- Related artifacts (tests, docs, mocks) may not have been updated
- Breaking changes might slip through if the reviewer doesn’t check the full impact
What breaks when AI writes your API code
API contract drift
An API contract is the agreement between your service and its consumers: endpoints, request/response schemas, status codes, error formats. When a human developer modifies an API, they typically update the OpenAPI spec, notify consumers, and version the change.
When an autonomous agent modifies an API, those coordination steps don’t happen automatically. Agent Smith writes code that passes tests. But tests only cover what was previously written. If the agent changes a response schema in a way that passes existing tests but breaks downstream consumers, the breakage shows up in production.
Example scenario:
- Agent Smith is tasked with “Add user preferences to the profile endpoint”
- It adds a `preferences` field to the `GET /api/users/{id}` response
- Existing tests pass because they don’t assert on the absence of extra fields
- Frontend team’s TypeScript types don’t include `preferences`
- Mobile app’s strict JSON parsing throws on the unexpected field
The code is correct. The tests pass. The contract is broken.
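Seen from the consumer’s side, the failure could look like the sketch below. The `decodeUserStrict` function is a hypothetical stand-in for a client with strict parsing, and the four known field names are assumptions chosen for illustration.

```javascript
// Hypothetical strict decoder, standing in for a mobile client that rejects
// any field it does not know about. The field names are assumed.
const KNOWN_USER_FIELDS = new Set(["id", "name", "email", "created_at"]);

function decodeUserStrict(json) {
  const body = JSON.parse(json);
  for (const key of Object.keys(body)) {
    if (!KNOWN_USER_FIELDS.has(key)) {
      throw new Error(`Unexpected field in user response: ${key}`);
    }
  }
  return body;
}

// Response before the agent's change: decodes without error.
const oldResponse =
  '{"id":"123","name":"Ada","email":"ada@example.com","created_at":"2026-03-01T00:00:00Z"}';
decodeUserStrict(oldResponse);

// Response after the agent adds `preferences`: the same client now throws.
const newResponse =
  '{"id":"123","name":"Ada","email":"ada@example.com","created_at":"2026-03-01T00:00:00Z","preferences":{"theme":"dark"}}';
try {
  decodeUserStrict(newResponse);
} catch (err) {
  console.log(err.message); // prints "Unexpected field in user response: preferences"
}
```

Nothing in the server’s own test suite exercises this client, which is why the breakage only surfaces after release.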
Test coverage gaps
AI-generated code comes with AI-generated tests, and AI agents tend to write tests that validate what they built, not tests that guard against regressions. This creates a blind spot: the tests confirm the new behavior works, but they don’t confirm that existing behavior is preserved.
For API endpoints, this means:
- Response time benchmarks may not be tested
- Error response formats may diverge from your standard error schema
- Rate limiting behavior may not be validated
- Authentication edge cases may not be covered
- Pagination behavior may differ from existing endpoints
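One of these gaps, error format divergence, can be guarded with a small regression check. The sketch below assumes a house standard where every error body looks like `{ error: { code, message } }`. Both that envelope and the sample data are hypothetical.

```javascript
// Hypothetical house standard: every error body is { error: { code, message } }.
function isStandardError(body) {
  if (typeof body !== "object" || body === null) return false;
  const err = body.error;
  return (
    typeof err === "object" &&
    err !== null &&
    typeof err.code === "string" &&
    typeof err.message === "string"
  );
}

// Sample error bodies captured from a test run (made-up data).
const capturedErrors = [
  {
    endpoint: "GET /api/users/{id}",
    body: { error: { code: "not_found", message: "User not found" } },
  },
  {
    endpoint: "GET /api/users/{id}/preferences",
    body: { message: "Not found", status: 404 },
  },
];

// Endpoints whose error bodies do not follow the standard envelope.
const offenders = capturedErrors
  .filter((sample) => !isStandardError(sample.body))
  .map((sample) => sample.endpoint);

console.log(offenders); // prints [ 'GET /api/users/{id}/preferences' ]
```

Running a check like this across all endpoints, not only the ones the agent touched, is what turns it into a regression guard.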
Documentation drift
If your API documentation is generated from code annotations or OpenAPI specs, agent-modified code should propagate to docs automatically. But many teams maintain documentation separately. When Agent Smith adds an endpoint or modifies a response schema, the documentation update is a separate task that the agent may or may not perform.
Even with auto-generated docs, the descriptions, examples, and usage notes require human context that an AI agent doesn’t have. The agent can document what an endpoint does. It can’t document why it exists, who uses it, or what trade-offs led to its design.
Review fatigue
When 25% of new code is AI-generated, roughly a quarter of review effort goes to AI output. AI-generated code is syntactically consistent and well-structured, which makes it look “fine” at a glance. But looking fine isn’t the same as being correct in context.
Reviewers face a new challenge: the code reads well but may not align with architectural decisions, team conventions, or unstated requirements that exist in the reviewer’s head but not in the agent’s prompt. Over time, review fatigue for AI-generated code can lead to rubber-stamping, which is the point where bugs start shipping.
How to build agent-proof API workflows
1. Make API contracts the source of truth
Design-first API development is the strongest defense against agent-induced drift. When the OpenAPI spec is the source of truth, any code change that breaks the contract is detectable.
Without design-first:
Code change → Tests pass → Ship → Contract broken
With design-first:
Spec defines contract → Code must match spec → Contract validation catches drift
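As a minimal sketch of what “code must match spec” can mean in practice, the check below compares the operations declared in an OpenAPI-style `paths` object with the routes a service registers. The spec fragment, the route list, and the `findRouteDrift` helper are all hypothetical. In a real project the route list would come from your framework’s router.

```javascript
// Minimal OpenAPI-like fragment; only the `paths` object matters here.
const spec = {
  paths: {
    "/api/users/{id}": { get: {}, patch: {} },
  },
};

// Routes the service actually registers (hypothetical sample).
const implementedRoutes = [
  "GET /api/users/{id}",
  "PATCH /api/users/{id}",
  "GET /api/users/{id}/preferences",
];

// Path items may also carry non-operation keys such as `parameters`,
// so only recognized HTTP methods are treated as operations.
const HTTP_METHODS = new Set([
  "get", "put", "post", "delete", "options", "head", "patch", "trace",
]);

function findRouteDrift(spec, routes) {
  const declared = new Set();
  for (const [path, pathItem] of Object.entries(spec.paths)) {
    for (const key of Object.keys(pathItem)) {
      if (HTTP_METHODS.has(key)) declared.add(`${key.toUpperCase()} ${path}`);
    }
  }
  const implemented = new Set(routes);
  return {
    undeclared: routes.filter((route) => !declared.has(route)),
    unimplemented: [...declared].filter((route) => !implemented.has(route)),
  };
}

const drift = findRouteDrift(spec, implementedRoutes);
console.log(drift.undeclared); // prints [ 'GET /api/users/{id}/preferences' ]
```

An endpoint that exists in code but not in the spec is exactly the kind of change an agent can introduce without any test failing.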
Apidog’s visual API designer lets you define endpoints, schemas, and response formats before any code is written. When Agent Smith (or any agent) generates code, you validate it against the spec, not against existing tests that may be incomplete.
2. Use contract testing, not just unit tests
Unit tests validate internal behavior. Contract tests validate the agreement between services. When an AI agent modifies your API, contract tests catch changes that unit tests miss.
Contract test example:
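```javascript
// Assumes Jest with the supertest and jest-json-schema packages installed.
const request = require("supertest");
const { matchers } = require("jest-json-schema");
const app = require("../app"); // hypothetical path to your app module

expect.extend(matchers); // registers the toMatchSchema matcher

// This test fails if the response shape changes,
// even if the new shape is "valid"
describe("GET /api/users/:id contract", () => {
  it("returns expected schema", async () => {
    const response = await request(app).get("/api/users/123");
    expect(response.body).toMatchSchema({
      type: "object",
      required: ["id", "name", "email", "created_at"],
      properties: {
        id: { type: "string" },
        name: { type: "string" },
        email: { type: "string", format: "email" },
        created_at: { type: "string", format: "date-time" }
      },
      additionalProperties: false // This catches unexpected fields
    });
  });
});
```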
The `additionalProperties: false` line is critical. Without it, an agent that adds fields to the response still passes every test. With it, any added field fails the test until someone explicitly updates the contract.
Apidog automates contract testing from your API spec. Define your schema once, and Apidog validates every response against it in both manual testing and CI/CD runs.
3. Gate deployments on spec validation
Add API spec validation to your CI/CD pipeline. Before any code (human or AI-generated) deploys, verify it matches the declared contract:
```yaml
# CI/CD pipeline step
- name: Validate API contract
  run: |
    # Diff the current spec against the running implementation.
    # Testing the command directly (instead of checking $? afterwards)
    # still works when the shell exits on the first failing command.
    if ! apidog run --test-scenario-id CONTRACT_TESTS; then
      # Fail if any contract violations found
      echo "API contract violation detected. Review changes."
      exit 1
    fi
```
This catches Agent Smith’s contract-breaking changes before they reach production.
4. Require spec updates for API changes
Create a development rule: any PR that modifies API behavior must include a corresponding OpenAPI spec update. For AI-generated PRs, this means the agent must update the spec, or a human must do it before merging.
In Apidog, spec changes automatically propagate to:
- API documentation
- Mock server responses
- Test assertions
- Client SDK types
This cascade ensures no artifact drifts when the contract changes.
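The “spec update required” rule can also be enforced mechanically, independent of any particular platform. The sketch below is a coarse heuristic under an assumed repository layout (handlers under `src/api/`, spec at `openapi.yaml`). Both paths and the `specUpdateMissing` helper are hypothetical.

```javascript
// Hypothetical repository layout: handlers under src/api/, spec at openapi.yaml.
const API_SOURCE_PREFIX = "src/api/";
const SPEC_FILE = "openapi.yaml";

// Returns true when a change set touches API code but leaves the spec alone.
function specUpdateMissing(changedFiles) {
  const touchesApi = changedFiles.some((file) => file.startsWith(API_SOURCE_PREFIX));
  const touchesSpec = changedFiles.includes(SPEC_FILE);
  return touchesApi && !touchesSpec;
}

// Example input: a file list such as `git diff --name-only` would produce
// against the base branch.
const changed = ["src/api/users.js", "src/api/users.test.js"];
if (specUpdateMissing(changed)) {
  console.log("API code changed without a spec update; blocking merge.");
}
```

In a CI job, the logging branch would exit with a non-zero status instead. Because a pure refactor would also trip this check, teams may want an explicit override label for PRs that do not change behavior.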
5. Monitor API behavior in production
Even with contract tests and spec validation, production monitoring catches what pre-production tests miss. Track:
- Response schema violations: Log when responses don’t match the declared schema
- New fields appearing: Alert on response fields that aren’t in the spec
- Error rate changes: AI-generated endpoints may have different error distributions
- Latency shifts: Agent-written code may have different performance characteristics
- Traffic pattern changes: New endpoints may receive unexpected traffic patterns
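The first two items can be approximated with a lightweight response sampler. This is a minimal sketch, assuming a flat user resource whose declared fields are hard-coded here. In practice they would be read from the spec, and the counts would feed whatever alerting system you already use.

```javascript
// Fields declared for the user resource (assumed; normally read from the spec).
const declaredFields = new Set(["id", "name", "email", "created_at"]);
const requiredFields = ["id", "name", "email", "created_at"];

// Running tally of violations, keyed by "kind:field".
const violationCounts = new Map();

function recordViolations(responseBody) {
  const found = [];
  for (const key of Object.keys(responseBody)) {
    if (!declaredFields.has(key)) found.push(`undeclared:${key}`);
  }
  for (const key of requiredFields) {
    if (!Object.hasOwn(responseBody, key)) found.push(`missing:${key}`);
  }
  for (const violation of found) {
    violationCounts.set(violation, (violationCounts.get(violation) ?? 0) + 1);
  }
  return found;
}

// Sampled production responses (made-up data).
recordViolations({ id: "1", name: "Ada", email: "ada@example.com", created_at: "2026-03-01T00:00:00Z" });
recordViolations({ id: "2", name: "Lin", email: "lin@example.com", created_at: "2026-03-02T00:00:00Z", preferences: {} });
recordViolations({ id: "3", name: "Sam", created_at: "2026-03-03T00:00:00Z", preferences: {} });

console.log(Object.fromEntries(violationCounts));
// prints { 'undeclared:preferences': 2, 'missing:email': 1 }
```

Unlike a contract test, this logs and counts rather than failing, so it can run on live traffic without affecting consumers.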
6. Separate API review from code review
Standard code review asks: “Does this code work?” API review asks: “Does this change affect consumers?”
For AI-generated API changes, create a separate review checklist:
- Does this change break any existing consumer?
- Is the OpenAPI spec updated?
- Are backward-incompatible changes versioned?
- Are error responses consistent with the existing error format?
- Are new endpoints documented with examples?
- Have downstream teams been notified?
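The first checklist question can be partly automated by comparing the response schema before and after a change. The following is a sketch for flat, JSON Schema style objects only. `findBreakingChanges` and both schema versions are hypothetical, and nested objects, enums, and required-field changes are left out.

```javascript
// Compare two versions of a flat response schema and list changes that
// could break existing consumers.
function findBreakingChanges(oldSchema, newSchema) {
  const findings = [];
  const oldProps = oldSchema.properties ?? {};
  const newProps = newSchema.properties ?? {};

  for (const [name, oldProp] of Object.entries(oldProps)) {
    if (!Object.hasOwn(newProps, name)) {
      findings.push(`removed field: ${name}`);
    } else if (newProps[name].type !== oldProp.type) {
      findings.push(`type changed: ${name} (${oldProp.type} -> ${newProps[name].type})`);
    }
  }
  // An added response field only breaks strict consumers, but it is still
  // reported so a reviewer can decide.
  for (const name of Object.keys(newProps)) {
    if (!Object.hasOwn(oldProps, name)) findings.push(`added field: ${name}`);
  }
  return findings;
}

// Made-up schema versions for illustration.
const previous = {
  properties: { id: { type: "string" }, name: { type: "string" }, age: { type: "integer" } },
};
const proposed = {
  properties: { id: { type: "integer" }, name: { type: "string" }, preferences: { type: "object" } },
};

const findings = findBreakingChanges(previous, proposed);
console.log(findings.join("\n"));
// Findings, in order: the type change on id, the removed age field,
// and the added preferences field.
```

A report like this gives the API reviewer a concrete list to check against the rest of the checklist, instead of inferring contract changes from a code diff.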
The trajectory: where autonomous coding is heading
Agent Smith today vs. tomorrow
Agent Smith at 25% is the starting point. Sergey Brin called AI agents a “big focus” during a March 2026 sales town hall. The 25% figure will grow as the tool improves, access restrictions loosen, and workflows adapt.
Other companies are building similar systems:
- Claude Code’s KAIROS (leaked in the source code): always-on daemon with GitHub webhook subscriptions and background workers
- GitHub Copilot Agent Mode: multi-step coding tasks with autonomous file editing
- Amazon Q Developer (formerly CodeWhisperer): expanding from autocomplete toward agentic workflows
The industry trend is clear: AI coding tools are moving from “assistant” to “autonomous contributor” to “background infrastructure.” Within a few years, the question won’t be whether AI writes your API code, but how much of it.
What API teams should prepare for now
Design-first is no longer optional. When agents write code, the API spec is the only stable artifact. Make it the source of truth now, before agent adoption makes it urgent.
Invest in contract testing infrastructure. Unit tests aren’t enough when the code author doesn’t understand your unwritten conventions. Contract tests encode those conventions explicitly.
Choose tools that keep artifacts in sync. Disconnected tools (separate API client, separate test runner, separate mock server, separate doc generator) create gaps where agent-generated changes can drift unnoticed. Integrated platforms like Apidog keep everything synchronized.
Build review processes for AI-generated code. Standard code review doesn’t catch API contract violations. Create checklists and automated validation specifically for API changes.
Try Apidog free to build API workflows that stay consistent whether your next code change comes from a human developer, Agent Smith, or whatever autonomous coding tool comes next.
FAQ
What is Google Agent Smith?
Agent Smith is Google’s internal AI coding agent built on the Gemini model family and the Antigravity platform. It works asynchronously in the background: engineers assign tasks, and Agent Smith writes, tests, and iterates on code without real-time human interaction. It generated over 25% of Google’s new production code as of March 2026.
Is Agent Smith available outside Google?
No. Agent Smith is an internal tool restricted to Google employees. Google has not announced plans for a public release. The technology is similar to Copilot Agent Mode and Claude Code, but it’s more deeply integrated with Google’s internal codebase and documentation systems.
Does AI-generated code break API contracts?
It can. AI agents write code that passes tests, but tests may not cover all aspects of your API contract. Schema changes, new response fields, different error formats, and behavioral modifications can slip through testing while breaking downstream consumers. Contract testing and design-first development prevent this.
Should API teams worry about Agent Smith?
Not about Agent Smith specifically, since it’s Google-internal. But about the trend it represents, yes. Similar autonomous coding tools (Copilot Agent Mode, KAIROS, and others) will reach your team. Preparing your API workflow now, with design-first development, contract testing, and integrated tooling, positions you to adopt autonomous agents safely.
How do I prevent AI agents from breaking my APIs?
Use design-first development with the OpenAPI spec as the source of truth. Add contract testing with additionalProperties: false to catch unexpected schema changes. Gate deployments on spec validation. Use an integrated platform like Apidog that synchronizes specs, tests, mocks, and docs automatically.
What’s the difference between AI-assisted and AI-generated code?
AI-assisted code (Copilot suggestions, Claude Code sessions) is written in real time with human oversight. The developer sees and approves each change. AI-generated code (Agent Smith) is produced asynchronously without real-time human involvement. The developer reviews completed work after the fact. This difference changes review dynamics and increases the risk of undetected contract violations.
Will AI agents replace API developers?
No. Agent Smith still requires human task definition, code review, and deployment approval. A March 2026 MIT study confirmed that AI augments developer productivity but doesn’t replace the judgment, context awareness, and architectural thinking that humans provide. The role shifts from writing code to defining tasks, reviewing output, and maintaining system coherence.
Key takeaways
- Google’s Agent Smith generates over 25% of new production code through asynchronous, autonomous operation
- This represents a shift from AI-assisted to AI-generated code, changing review dynamics for API teams
- API contract drift is the primary risk when autonomous agents modify endpoints and schemas
- Design-first development with OpenAPI specs as the source of truth prevents contract breakage
- Contract testing with strict schema validation catches changes unit tests miss
- Integrated platforms like Apidog synchronize specs, tests, mocks, and docs to prevent drift
- The trend toward autonomous coding agents is accelerating; prepare your API workflows now
Agent Smith at 25% is the beginning. The companies that build agent-proof API workflows today will be the ones that adopt autonomous coding tools safely tomorrow.



