TL;DR
On March 31, 2026, H Company launched Holo3, a mixture-of-experts model that scores 78.85% on OSWorld-Verified, the highest result yet recorded on the leading desktop computer use benchmark. It beats GPT-5.4 and Opus 4.6 at a fraction of the cost. The API is live now, and the 35B variant is open-weight on HuggingFace under Apache 2.0.
The computer use gap most developers haven't solved
You've automated your APIs. Your CI/CD pipeline runs clean. But there's still a class of task that breaks every automation: legacy enterprise software with no API, desktop apps that predate REST, multi-step workflows that cross five different UIs.
Traditional RPA tools (UiPath, Automation Anywhere) handle this with brittle screen-coordinate scripts that break every time the UI changes. The alternative has been manual work.
Computer use AI changes that equation. Models that see screenshots and issue click, type, and scroll actions can navigate any GUI without needing an API. Holo3, released March 31, 2026 by Paris-based H Company, is currently the strongest publicly available model for this class of task.
What is Holo3?
Holo3 is a computer use model: you give it a screenshot of a desktop or browser, tell it what task to complete, and it returns actions (clicks, keystrokes, scroll commands) to execute on that screen. You capture the result, screenshot again, and repeat until the task is done.

H Company ships two variants:
- Holo3-122B-A10B — the flagship. 122B total parameters, 10B active (sparse MoE). Hosted API only at hcompany.ai/holo-models-api. Sets the current benchmark record.
- Holo3-35B-A3B — 35B total, 3B active. Open-weight on HuggingFace under Apache 2.0. Free tier on H Company's inference API. Self-hostable.
The MoE (mixture of experts) architecture means only a fraction of the parameters fire per token, so the model is significantly cheaper to run than its total parameter count suggests. H Company states Holo3-122B-A10B costs less than GPT-5.4 and Opus 4.6 on a per-task basis.
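To make the sparsity concrete, here's a back-of-the-envelope sketch. The parameter counts come from the model names; the assumption that per-token compute tracks active parameters is a simplification (it ignores router overhead and the memory cost of holding all experts).

```python
# Back-of-the-envelope: per-token compute in a sparse MoE scales roughly
# with ACTIVE parameters, not total parameters. This is an approximation,
# not H Company's pricing model.

def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of parameters that fire per token."""
    return active_b / total_b

# Holo3-122B-A10B: 122B total, 10B active
flagship = active_fraction(122, 10)
# Holo3-35B-A3B: 35B total, 3B active
small = active_fraction(35, 3)

print(f"122B-A10B: ~{flagship:.1%} of parameters active per token")
print(f"35B-A3B:   ~{small:.1%} of parameters active per token")
```

Under that approximation, the flagship runs roughly like an 8%-dense model of its size, which is why its per-task cost can undercut dense frontier models.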
OSWorld-Verified: what the benchmark actually measures
OSWorld-Verified is the leading benchmark for evaluating AI computer use. Unlike benchmarks that score on output text, OSWorld tests execution: the agent must complete real tasks on a real computer, and success is verified by checking the actual state of the system afterward.
Tasks span the full complexity range:
- Single-app tasks (open a file, fill a form, copy data between cells)
- Cross-app workflows (retrieve a value from a PDF, update a spreadsheet, send a confirmation email)
- Long-horizon multi-app sequences that require reasoning across several systems without losing context
Holo3-122B-A10B scores 78.85% on OSWorld-Verified. To put that in context: scores above 40% were considered state-of-the-art until recently. Previous leading models from Anthropic and OpenAI sat in the 60-65% range.

The gap matters most at the hard end of the benchmark. H Company's internal H Corporate Benchmarks (486 tasks across E-commerce, Business software, Collaboration, and Multi-App workflows) show Holo3 especially pulling ahead on multi-app tasks — the ones that require coordinating data across several applications simultaneously.
How Holo3 was trained: the Agentic Learning Flywheel
Most computer use models are trained on static demonstrations. H Company built a continuous training loop they call the Agentic Learning Flywheel:
- Synthetic Navigation Data — Human and generated instructions produce scenario-specific navigation examples.
- Out-of-Domain Augmentation — The scenarios are programmatically extended to cover unexpected UI states and edge cases.
- Curated Reinforcement Learning — Each data sample is filtered and used in an RL pipeline to directly maximize task completion rates.
The training data comes from the Synthetic Environment Factory — a system where coding agents build complete enterprise web applications from scratch based on scenario specs. These environments include verifiable tasks with end-to-end validation scripts, so the model trains on realistic business workflows rather than toy examples.
The result: Holo3 outperforms base Qwen3.5 models with larger parameter counts on the same benchmark tasks. Architecture alone doesn't explain the gap; the training methodology does.
How to call the Holo3 API
The Holo3 API follows a standard screenshot-action loop pattern. Here's the basic flow:
1. Set up authentication
```
# H Company Inference API base URL
https://api.hcompany.ai/v1

# Headers
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
```
Get your API key at hcompany.ai/holo-models-api. The free tier covers Holo3-35B-A3B.
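In practice you'll want the key out of your source code. A minimal sketch, assuming you store it in an environment variable (`HOLO_API_KEY` is an arbitrary name chosen for this example, not an official convention):

```python
import os

def auth_headers() -> dict[str, str]:
    """Build the request headers shown above, reading the key from the
    environment instead of hardcoding it."""
    api_key = os.environ.get("HOLO_API_KEY")
    if not api_key:
        raise RuntimeError("Set HOLO_API_KEY before calling the API")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```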
2. Send a screenshot with a task
```python
import base64

import httpx
import pyautogui

# Capture the screen and save it to disk
screenshot = pyautogui.screenshot()
screenshot.save("/tmp/screen.png")

# Base64-encode the image for the JSON payload
with open("/tmp/screen.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = httpx.post(
    "https://api.hcompany.ai/v1/computer-use",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "holo3-122b-a10b",
        "task": "Open the invoice folder and find the most recent PDF",
        "screenshot": image_b64,
        "screen_width": 1920,
        "screen_height": 1080,
    },
)

action = response.json()
print(action)
```
3. Parse and execute the action
The API returns structured actions you execute on the host machine:
```json
{
  "action_type": "click",
  "coordinate": [245, 380],
  "reasoning": "The invoice folder icon is visible at this position"
}
```
Action types include: click, double_click, right_click, type, key, scroll, screenshot_request (when the model needs a fresh view), and task_complete.
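One way to wire these action types to handlers is a small dispatcher. A sketch, assuming the schema from the JSON example above; the extra fields (`text`, `key`, `delta`) and the handler mapping are assumptions, not an official client. In production the handlers would be GUI calls such as `pyautogui.click` or `pyautogui.write`.

```python
# Hypothetical executor mapping a Holo3 action dict to handler callables.
# Field names beyond "action_type" and "coordinate" are assumed, not
# documented API; adjust to the actual response schema.

def execute_action(action: dict, handlers: dict) -> None:
    action_type = action["action_type"]
    handler = handlers.get(action_type)
    if handler is None:
        raise ValueError(f"Unsupported action type: {action_type}")
    if action_type in ("click", "double_click", "right_click"):
        x, y = action["coordinate"]
        handler(x, y)
    elif action_type == "type":
        handler(action["text"])          # assumed field name
    elif action_type == "key":
        handler(action["key"])           # assumed field name
    elif action_type == "scroll":
        handler(action.get("delta", -3)) # assumed field name
    else:
        # screenshot_request / task_complete are loop-level signals
        handler()
```

With pyautogui the mapping might be `{"click": pyautogui.click, "type": pyautogui.write, "key": pyautogui.press, "scroll": pyautogui.scroll, ...}`.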
4. Loop until completion
```python
def run_computer_use_task(task: str, max_steps: int = 20):
    for step in range(max_steps):
        # capture_screen, call_holo3_api, and execute_action are the
        # helpers from the previous steps
        screenshot = capture_screen()
        response = call_holo3_api(task, screenshot)
        action = response["action"]
        if action["action_type"] == "task_complete":
            print(f"Done in {step + 1} steps")
            return response["result"]
        execute_action(action)
    raise TimeoutError("Task not completed within step limit")
```
Testing Holo3 API calls with Apidog
Once you're calling the Holo3 API, you need to validate that your integration works reliably, especially for production automation. Apidog handles this cleanly.
Import the endpoint: In Apidog, create a new HTTP request to https://api.hcompany.ai/v1/computer-use. Add your Authorization header as an environment variable so you don't hardcode keys.
Set up request validation: Apidog's test assertions let you check the response structure automatically:
```javascript
// In Apidog's post-response script
pm.test("Action type is valid", () => {
  const validActions = ["click", "type", "key", "scroll", "task_complete", "screenshot_request"];
  pm.expect(validActions).to.include(pm.response.json().action.action_type);
});

pm.test("Coordinates are within screen bounds", () => {
  const action = pm.response.json().action;
  if (action.coordinate) {
    pm.expect(action.coordinate[0]).to.be.within(0, 1920);
    pm.expect(action.coordinate[1]).to.be.within(0, 1080);
  }
});
```
Mock the API during development: Use Apidog's Smart Mock to generate realistic Holo3 responses without hitting the live API. This saves credits during integration testing and lets your frontend or orchestration layer develop in parallel.
Run test scenarios: Chain multiple Holo3 requests in an Apidog Test Scenario to simulate a full multi-step task loop. You can validate that the action sequence is coherent across steps before running it on a live machine.
Holo3 vs Claude Computer Use vs OpenAI Operator
| | Holo3-122B | Holo3-35B | Claude Computer Use | OpenAI Operator |
|---|---|---|---|---|
| OSWorld-Verified | 78.85% | ~55% (est.) | ~65% | ~62% |
| API access | Yes | Yes (free tier) | Yes | Yes |
| Open weights | No | Yes (Apache 2.0) | No | No |
| Self-hostable | No | Yes | No | No |
| Cost vs GPT-5.4 | Lower | Much lower | Comparable | GPT-5.4 pricing |
| Best for | Production enterprise | Dev/testing/OSS | Anthropic ecosystem | OpenAI ecosystem |
The practical choice depends on your stack:
- Holo3-122B if you need peak accuracy on complex multi-app workflows and cost is secondary to reliability.
- Holo3-35B for development, testing, open-source projects, or when you want to self-host.
- Claude Computer Use if you're already deep in the Anthropic ecosystem and want unified API billing.
- OpenAI Operator if you're using GPT-5.4 elsewhere and want a single vendor relationship.
Enterprise use cases
Holo3 covers workflows that have no clean API-based solution:
Legacy system data entry — ERP and CRM systems from the 2000s with no REST API. Holo3 can navigate the desktop UI and enter or extract data without requiring a modernization project.
Cross-platform reconciliation — Pull a figure from a PDF, check it against an internal spreadsheet, update a third-party dashboard. Holo3 handles the full sequence autonomously.
Regression testing for web apps — Instead of maintaining brittle Selenium scripts tied to element IDs, point Holo3 at your staging environment with a plain-language task description. It adapts to UI changes without selector updates.
Competitive intelligence — Systematically browse and extract structured data from websites that block standard scraping.
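The regression-testing case above can be sketched as a plain request builder. The payload shape follows the `/computer-use` example earlier in this article; the staging URL and task phrasing are placeholders, not an official template.

```python
def build_regression_task(base_url: str, check: str) -> dict:
    """Build a Holo3 request body for a plain-language UI regression check.

    Field names mirror the API example earlier in the article; the task
    wording is illustrative only.
    """
    return {
        "model": "holo3-35b-a3b",  # cheaper open-weight variant for test runs
        "task": f"Open {base_url}, then: {check}. Report pass or fail.",
        "screen_width": 1920,
        "screen_height": 1080,
    }

payload = build_regression_task(
    "https://staging.example.com",
    "add an item to the cart and confirm the cart badge shows 1",
)
```

Because the check is expressed in plain language rather than CSS selectors, a renamed button or moved element doesn't break the test the way it would in Selenium.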
H Company's H Corporate Benchmarks show Holo3 achieving strong results across all four categories: E-commerce, Business software, Collaboration, and Multi-App. Multi-App workflows show the biggest performance gap over competitors — tasks that require reasoning across several applications without losing state are where the training methodology pays off most.
What's next: Adaptive Agency
H Company is direct about what comes after Holo3. Their current work centers on Adaptive Agency — models that don't just navigate software they've seen before, but learn to navigate entirely new, bespoke enterprise software in real-time.
Current computer use models, including Holo3, are still trained on a finite set of software environments. An agent hitting a custom internal tool it has never seen will have lower success rates than on standard apps. Adaptive Agency aims to close that gap: the model would reason about the software structure on first contact, build a working model of how it operates, and execute tasks without prior training data.
If H Company delivers on this, it removes the main remaining limitation of computer use AI for enterprise deployment.
Conclusion
Holo3 sets a new bar for desktop computer use. At 78.85% on OSWorld-Verified, it's measurably better than Claude and GPT-based alternatives on complex multi-step tasks. The free tier on Holo3-35B-A3B and Apache 2.0 open weights make it accessible for developers to test without upfront cost.
The integration pattern is straightforward: screenshot, POST to the API, execute the returned action, repeat. Where Apidog helps is making that integration reliable — validating response structures, mocking during development, and running test scenarios before you deploy against live systems.
If you're building anything that touches desktop GUIs, try Apidog free and test your Holo3 integration before it hits production.
FAQ
What is Holo3?
Holo3 is a computer use AI model from H Company that takes screenshots as input and returns actions (clicks, keystrokes, scrolls) to complete tasks on a desktop or browser. It scores 78.85% on the OSWorld-Verified benchmark, the highest result recorded on that test.

Is Holo3 open source?
The smaller variant, Holo3-35B-A3B, is open-weight under Apache 2.0 and downloadable from HuggingFace. The flagship Holo3-122B-A10B is API-only. Both are available through H Company's inference API, with a free tier for the 35B model.

How does the OSWorld benchmark work?
OSWorld tests AI agents on real computer tasks — web navigation, file management, cross-app workflows. Success is verified by checking the actual system state after the agent runs, not by evaluating output text. Tasks range from single-app operations to long-horizon multi-application sequences.

How does Holo3 compare to Claude Computer Use?
Holo3-122B scores higher on OSWorld-Verified (78.85% vs approximately 65% for Claude). It's also cheaper per task. Claude Computer Use remains a strong option for teams already using the Anthropic API who want a single billing relationship.

Can I run Holo3 locally?
Yes, if you use Holo3-35B-A3B. The weights are on HuggingFace under Apache 2.0. The 122B model is inference API only.

What are the main use cases for computer use APIs?
Legacy system automation (no REST API available), cross-app data workflows, web app regression testing without brittle selectors, competitive intelligence scraping, and any desktop workflow that currently requires manual human interaction.

How do I test my Holo3 API integration?
Use Apidog to import the endpoint, set up response validation assertions, mock the API during development, and chain requests into test scenarios. This catches integration issues before you run automation on live machines.

What is "Adaptive Agency" in Holo3's roadmap?
H Company is working on models that can navigate enterprise software they have never seen before, learning the UI structure in real-time rather than relying on prior training data. This would remove the main remaining limitation of computer use AI for fully custom enterprise deployments.



