How to Use Amazon Nova Act API and SDK

Amazon Nova Act is a research preview released by Amazon Artificial General Intelligence (AGI) that enables developers to build agents capable of taking actions within web browsers. This technology combines natural language instructions with Python scripting and Playwright automation to navigate websites, click buttons, fill forms, and extract data dynamically.

Unlike traditional web automation tools that rely on brittle, website-specific scripts, Nova Act uses an AI model to interpret the page and decide how to interact with it. This can help it cope with some changes in web interfaces without constant script maintenance.


Are you looking to streamline your API development workflow? Apidog stands out as the ultimate Postman alternative, offering a comprehensive suite of tools for API design, debugging, testing, and documentation—all in one unified platform.


With its intuitive interface, collaborative features, and powerful automation capabilities, Apidog significantly reduces development time while improving API quality.


Prerequisites

Before getting started with Amazon Nova Act, you need:

Operating System: macOS or Ubuntu (Windows is not currently supported)
Python: Version 3.10 or above
Amazon Account: For generating an API key
Location Requirement: Amazon Nova Act is currently only available as a research preview in the US

Getting Your Amazon Nova Act API Key

To get access and an API key:

Navigate to nova.amazon.com/act and sign in with your Amazon account
Select "Act" in the Labs section of the navigation pane
Generate an API key
If access isn't immediate, you may be placed on a waitlist and notified by email when granted access

Installation

Once you have your API key:

# Install the SDK
pip install nova-act

# Set your API key as an environment variable
export NOVA_ACT_API_KEY="your_api_key"

Note: The first time you run Nova Act, it may take 1-2 minutes to start as it installs Playwright modules. Subsequent runs will start more quickly.
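Before the first run, it can help to check the prerequisites listed above programmatically. This is a minimal stdlib-only sketch, assuming exactly those requirements (Python 3.10+, macOS or Ubuntu, NOVA_ACT_API_KEY exported); preflight_problems is a hypothetical helper, not part of the SDK.

```python
import os
import sys
from typing import Mapping


def preflight_problems(
    env: Mapping[str, str] = os.environ,
    version: tuple[int, int] = sys.version_info[:2],
    platform: str = sys.platform,
) -> list[str]:
    """Return human-readable problems that would block a Nova Act run (hypothetical helper)."""
    problems = []
    if version < (3, 10):
        problems.append(f"Python {version[0]}.{version[1]} found; 3.10+ required")
    if not env.get("NOVA_ACT_API_KEY"):
        problems.append("NOVA_ACT_API_KEY is not set")
    # sys.platform is "darwin" on macOS and "linux" on Ubuntu (it cannot tell Ubuntu from other distros)
    if platform not in ("darwin", "linux"):
        problems.append(f"platform {platform!r} is not macOS or Ubuntu")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print("Problem:", problem)
```

The parameters default to the real environment but can be overridden, which keeps the check easy to test.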

Basic Usage

Let's start with a simple example directly from the documentation:

from nova_act import NovaAct

with NovaAct(starting_page="https://www.amazon.com") as nova:
    nova.act("search for a coffee maker")
    nova.act("select the first result")
    nova.act("scroll down or up until you see 'add to cart' and then click 'add to cart'")

This script will:

Open Chrome and navigate to Amazon
Search for coffee makers
Select the first result
Find and click the "Add to Cart" button

Interactive Mode

Nova Act can also be used interactively for experimentation. Without a with block, call start() yourself before the first act():

# Start Python shell
$ python
>>> from nova_act import NovaAct
>>> nova = NovaAct(starting_page="https://www.amazon.com")
>>> nova.start()
>>> nova.act("search for a coffee maker")

After the first action completes, continue with the next step:

>>> nova.act("select the first result")

Note that, according to the documentation, Nova Act does not currently support IPython; use the standard Python shell instead.
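When you manage the session by hand instead of using a with block, it is worth making sure the browser is closed even if a step fails. A minimal sketch, assuming the SDK exposes a stop() method that pairs with start(); run_interactive_steps is a hypothetical helper that accepts any object with start(), act(), and stop().

```python
from typing import Any, Iterable, Protocol


class SessionLike(Protocol):
    """Anything with NovaAct-style start/act/stop methods (assumed interface)."""

    def start(self) -> None: ...
    def act(self, prompt: str) -> Any: ...
    def stop(self) -> None: ...


def run_interactive_steps(nova: SessionLike, steps: Iterable[str]) -> list[Any]:
    """Start the session, run each step in order, and always stop it afterwards."""
    nova.start()
    try:
        return [nova.act(step) for step in steps]
    finally:
        # Close the browser even when a step raises (assumed to match what a with block does on exit)
        nova.stop()


# Example wiring (requires the SDK):
# from nova_act import NovaAct
# nova = NovaAct(starting_page="https://www.amazon.com")
# run_interactive_steps(nova, ["search for a coffee maker", "select the first result"])
```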

Effective Prompting Strategies

The official documentation emphasizes breaking tasks into smaller steps for reliability:

1. Be Specific and Clear

❌ DON'T

nova.act("From my order history, find my most recent order from India Palace and reorder it")

✅ DO

nova.act("Click the hamburger menu icon, go to Order History, find my most recent order from India Palace and reorder it")

2. Break Complex Tasks into Smaller Steps

❌ DON'T

nova.act("book me a hotel that costs less than $100 with the highest star rating")

✅ DO

nova.act(f"search for hotels in Houston between {startdate} and {enddate}")
nova.act("sort by avg customer review")
nova.act("hit book on the first hotel that is $100 or less")
nova.act(f"fill in my name, address, and DOB according to {blob}")
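The DO example above leaves startdate, enddate, and blob undefined. Here is a minimal sketch of how they might be supplied, assuming the dates are datetime.date values and blob is a JSON string of the traveler's details; book_cheap_hotel and the profile fields are hypothetical. The function accepts any object with an act() method, such as the nova object inside a with NovaAct(...) block.

```python
import json
from datetime import date
from typing import Any, Protocol


class Actor(Protocol):
    def act(self, prompt: str) -> Any: ...


def book_cheap_hotel(nova: Actor, city: str, start: date, end: date, profile: dict[str, str]) -> None:
    """Issue the small, concrete steps from the DO example, one act() call each (hypothetical helper)."""
    # Serialize the traveler details so the prompt carries them as plain text
    blob = json.dumps(profile)
    nova.act(f"search for hotels in {city} between {start.isoformat()} and {end.isoformat()}")
    nova.act("sort by avg customer review")
    nova.act("hit book on the first hotel that is $100 or less")
    nova.act(f"fill in my name, address, and DOB according to {blob}")
```

Keeping each act() call to one concrete action is the point; the helper only makes the inputs explicit.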

Extracting Data from Web Pages

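Nova Act can return structured data. Pass a JSON Schema as the schema argument of act() (here generated from a Pydantic model with model_json_schema()), then check matches_schema and validate parsed_response: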

from pydantic import BaseModel
from nova_act import NovaAct, BOOL_SCHEMA

class Book(BaseModel):
    title: str
    author: str

class BookList(BaseModel):
    books: list[Book]

def get_books(year: int) -> BookList | None:
    """Get top NYT books of the year and return as a BookList."""
    with NovaAct(starting_page=f"https://en.wikipedia.org/wiki/List_of_The_New_York_Times_number-one_books_of_{year}#Fiction") as nova:
        result = nova.act(
            "Return the books in the Fiction list",
            schema=BookList.model_json_schema()
        )
        
        if not result.matches_schema:
            # Act response did not match the schema
            return None
            
        # Parse the JSON into the pydantic model
        book_list = BookList.model_validate(result.parsed_response)
        return book_list
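The check-then-validate step in get_books repeats for every schema you use, so it can be factored out. A minimal sketch, assuming only the two result attributes shown above (matches_schema and parsed_response) and a model class with Pydantic's model_validate classmethod; parse_result is a hypothetical helper.

```python
from typing import Any, Optional, Protocol, TypeVar

T = TypeVar("T")


class ResultLike(Protocol):
    # The two act() result attributes used in get_books
    matches_schema: bool
    parsed_response: Any


def parse_result(result: ResultLike, model_cls: type[T]) -> Optional[T]:
    """Return a validated model instance, or None when the response missed the schema."""
    if not result.matches_schema:
        return None
    # model_validate is the Pydantic v2 classmethod used in get_books
    return model_cls.model_validate(result.parsed_response)  # type: ignore[attr-defined]
```

Inside get_books, the last lines would then collapse to return parse_result(result, BookList).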

For simple boolean responses, use the built-in BOOL_SCHEMA:

result = nova.act("Am I logged in?", schema=BOOL_SCHEMA)
if result.matches_schema:
    if result.parsed_response:
        print("You are logged in")
    else:
        print("You are not logged in")
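A yes/no question really has three outcomes: yes, no, or an answer that did not match the schema. A minimal sketch that keeps those apart, assuming you pass BOOL_SCHEMA in as an argument (so the sketch does not import the SDK); ask_yes_no is a hypothetical helper.

```python
from typing import Any, Optional, Protocol


class Asker(Protocol):
    def act(self, prompt: str, schema: Any = None) -> Any: ...


def ask_yes_no(nova: Asker, question: str, schema: Any) -> Optional[bool]:
    """True/False from a boolean-schema act() call, or None when the answer was unusable."""
    result = nova.act(question, schema=schema)
    if not result.matches_schema:
        return None
    return bool(result.parsed_response)
```

For example, logged_in = ask_yes_no(nova, "Am I logged in?", BOOL_SCHEMA) lets the caller retry or stop on None instead of silently treating it as "no".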

Parallel Processing with Multiple Browser Sessions

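Each NovaAct instance controls its own browser, so you can run several sessions in parallel from a thread pool. This example reuses the get_books function from the previous section; max_workers caps how many browsers are open at once: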

from concurrent.futures import ThreadPoolExecutor, as_completed
from nova_act import NovaAct, ActError

# Accumulate results here
all_books = []

# Set maximum concurrent browser sessions
with ThreadPoolExecutor(max_workers=10) as executor:
    # Get books from multiple years in parallel
    future_to_books = {
        executor.submit(get_books, year): year
        for year in range(2010, 2025)
    }
    
    # Collect results
    for future in as_completed(future_to_books.keys()):
        try:
            year = future_to_books[future]
            book_list = future.result()
            if book_list is not None:
                all_books.extend(book_list.books)
        except ActError as exc:
            print(f"Skipping year {year} due to error: {exc}")
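The same fan-out can be written as a reusable function that records which input produced each result or error. This is a minimal stdlib sketch; collect_parallel is a hypothetical helper, and unlike the example above it catches every exception rather than only ActError, which is a deliberate choice so one failing year does not hide the others.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def collect_parallel(
    fn: Callable[[K], V], keys: list[K], max_workers: int = 4
) -> tuple[dict[K, V], dict[K, Exception]]:
    """Run fn(key) in a thread pool; split successes and failures by key."""
    results: dict[K, V] = {}
    errors: dict[K, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_key = {executor.submit(fn, key): key for key in keys}
        # as_completed yields futures in completion order, so we look the key back up
        for future in as_completed(future_to_key):
            key = future_to_key[future]
            try:
                results[key] = future.result()
            except Exception as exc:  # keep going; record the failure for this key
                errors[key] = exc
    return results, errors
```

With the functions above, books_by_year, failures = collect_parallel(get_books, list(range(2010, 2025)), max_workers=10) would map each year to its BookList (or None when the schema did not match).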

Authentication and Browser State

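For websites that require signing in, you can point Nova Act at a Chrome user data directory. With clone_user_data_dir=False, the session uses that directory directly instead of a temporary copy, so any logins you complete are saved for later runs: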

import os
from nova_act import NovaAct

user_data_dir = "path/to/my/chrome_profile"
os.makedirs(user_data_dir, exist_ok=True)

with NovaAct(
    starting_page="https://amazon.com/", 
    user_data_dir=user_data_dir,
    clone_user_data_dir=False
) as nova:
    input("Log into your websites, then press enter...")

There's also a built-in helper script for this purpose:

python -m nova_act.samples.setup_chrome_user_data_dir

Handling Sensitive Information

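The documentation advises against putting passwords or other sensitive data in act() prompts. Instead, let Nova Act reach the right field, then type the secret with Playwright directly: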

from getpass import getpass

# Let Nova Act enter the non-sensitive username and focus the password field
nova.act("enter username janedoe and click on the password field")

# Type the password with Playwright directly so it never appears in a prompt
nova.page.keyboard.type(getpass())  # getpass() reads the password without echoing it

# Continue after credentials are entered
nova.act("sign in")

Security Warning: The documentation notes that screenshots taken during execution will capture any visible sensitive information.
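For unattended runs there is no terminal to prompt, so one common pattern is to read the secret from an environment variable and fall back to getpass() only when it is missing. A minimal sketch; read_secret and the SITE_PASSWORD variable name are hypothetical, and the screenshot warning above still applies to whatever ends up visible on the page.

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def read_secret(
    env_var: str,
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the secret from env_var if set; otherwise prompt for it without echoing."""
    value = env.get(env_var)
    if value:
        return value
    return prompt(f"{env_var} not set; enter it now: ")
```

It slots into the sign-in flow as nova.page.keyboard.type(read_secret("SITE_PASSWORD")).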

Additional Features

Working with Captchas

result = nova.act("Is there a captcha on the screen?", schema=BOOL_SCHEMA)
if result.matches_schema and result.parsed_response:
    input("Please solve the captcha and hit return when done")
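A single check can miss a captcha that reappears after a wrong answer. A minimal sketch of a bounded wait loop, assuming has_captcha wraps the BOOL_SCHEMA check shown above; wait_for_human is a hypothetical helper, and it treats an answer that does not match the schema as "no captcha".

```python
from typing import Callable


def wait_for_human(
    has_captcha: Callable[[], bool],
    pause: Callable[[str], object] = input,
    max_rounds: int = 3,
) -> bool:
    """Pause until has_captcha() reports False; return False if it never clears."""
    for _ in range(max_rounds):
        if not has_captcha():
            return True
        pause("Please solve the captcha and hit return when done")
    # One last check after the final pause
    return not has_captcha()


# Example wiring (assumes an open nova session and BOOL_SCHEMA imported from nova_act):
# def has_captcha() -> bool:
#     result = nova.act("Is there a captcha on the screen?", schema=BOOL_SCHEMA)
#     return result.matches_schema and bool(result.parsed_response)
# if not wait_for_human(has_captcha):
#     raise RuntimeError("captcha was not solved")
```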

Downloading Files

with nova.page.expect_download() as download_info:
    nova.act("click on the download button")
    
# Save permanently
download_info.value.save_as("my_downloaded_file")
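A fixed name like "my_downloaded_file" loses the file extension and overwrites earlier downloads. A minimal sketch that builds a non-clobbering path from the browser's suggested name, assuming download_info.value is Playwright's Download object (which provides suggested_filename and save_as()); unique_destination is a hypothetical helper.

```python
from pathlib import Path


def unique_destination(directory: Path, suggested_filename: str) -> Path:
    """Pick a path in directory for the file without overwriting an existing one."""
    directory.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component in case the suggested name contains separators
    name = Path(suggested_filename).name
    if name in ("", ".."):
        name = "download"
    candidate = directory / name
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = directory / f"{stem} ({counter}){suffix}"
        counter += 1
    return candidate


# Example wiring with Playwright's Download object:
# download = download_info.value
# download.save_as(unique_destination(Path("downloads"), download.suggested_filename))
```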

Recording Sessions

nova = NovaAct(
    starting_page="https://example.com",
    logs_directory="/path/to/logs",
    record_video=True
)
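Reusing one logs directory mixes the logs and videos of separate runs. A minimal sketch that creates a fresh timestamped folder per run, assuming NovaAct writes its logs and recording into whatever logs_directory it is given; run_logs_directory is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def run_logs_directory(base: Path, now: Optional[datetime] = None) -> Path:
    """Create and return a per-run folder such as base/20250401-093000."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    path = base / stamp
    path.mkdir(parents=True, exist_ok=True)
    return path
```

It would be passed in as logs_directory=str(run_logs_directory(Path("nova_logs"))) alongside record_video=True.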

Real-World Example: Apartment Search

The dev.to article demonstrates a real-world example of finding apartments near a train station. Here's the core structure of that example:

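from concurrent.futures import ThreadPoolExecutor, as_completed
import pandas as pd
from pydantic import BaseModel
from nova_act import NovaAct

# Example search settings (placeholders; adjust to your search)
caltrain_city = "Redwood City"
bedrooms = 2
baths = 1
headless = False

class Apartment(BaseModel):
    address: str
    price: str
    beds: str
    baths: str

class ApartmentList(BaseModel):
    apartments: list[Apartment]

class CaltrainBiking(BaseModel):
    biking_time_hours: int
    biking_time_minutes: int
    biking_distance_miles: float

def add_biking_distance(apartment: Apartment, caltrain_city: str, headless: bool) -> CaltrainBiking | None:
    # Illustrative helper: the maps site and prompt wording are assumptions, not the article's exact code
    with NovaAct(starting_page="https://maps.google.com/", headless=headless) as client:
        client.act(
            f"Search for {caltrain_city} Caltrain station and click Directions. "
            f"Enter '{apartment.address}' as the starting point, then select cycling directions."
        )
        result = client.act(
            "Return the shortest time and distance for biking",
            schema=CaltrainBiking.model_json_schema()
        )
        if not result.matches_schema:
            return None
        return CaltrainBiking.model_validate(result.parsed_response)

# First find apartments
all_apartments: list[Apartment] = []
with NovaAct(starting_page="https://zumper.com/", headless=headless) as client:
    client.act(
        "Close any cookie banners. "
        f"Search for apartments near {caltrain_city}, CA, "
        f"then filter for {bedrooms} bedrooms and {baths} bathrooms."
    )

    # Extract apartment listings with schema
    result = client.act(
        "Return the currently visible list of apartments",
        schema=ApartmentList.model_json_schema()
    )
    if result.matches_schema:
        all_apartments = ApartmentList.model_validate(result.parsed_response).apartments

# Then check biking distances in parallel (one browser session per apartment)
apartments_with_biking = []
with ThreadPoolExecutor() as executor:
    # Submit parallel tasks to check biking distance to train station
    future_to_apartment = {
        executor.submit(add_biking_distance, apartment, caltrain_city, headless): apartment
        for apartment in all_apartments
    }

    # Merge each apartment with its biking data, when a route was found
    for future in as_completed(future_to_apartment.keys()):
        row = future_to_apartment[future].model_dump()
        biking = future.result()
        if biking is not None:
            row.update(biking.model_dump())
        apartments_with_biking.append(row)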
# Sort and display results
apartments_df = pd.DataFrame(apartments_with_biking)
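The "Sort and display results" step only builds the DataFrame. One way to do the sorting, sketched in plain Python under the assumption that each row is a dict merging the Apartment fields with the CaltrainBiking fields (which are absent when no route was found); biking_minutes and sort_by_biking_time are hypothetical helpers.

```python
from typing import Any


def biking_minutes(row: dict[str, Any]) -> float:
    """Total biking time in minutes; rows without biking data sort last."""
    if "biking_time_hours" not in row or "biking_time_minutes" not in row:
        return float("inf")
    return row["biking_time_hours"] * 60 + row["biking_time_minutes"]


def sort_by_biking_time(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # sorted() is stable, so apartments with equal times keep their original order
    return sorted(rows, key=biking_minutes)
```

Sorting before building the frame, as in pd.DataFrame(sort_by_biking_time(apartments_with_biking)), puts the closest apartments first.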

This example demonstrates how Nova Act can:

Extract structured data from websites
Process multiple browser sessions in parallel
Combine information from different sources

Known Limitations

According to the documentation, Nova Act has these limitations:

Browser-Only: Cannot interact with non-browser applications
Limited Reliability: May struggle with high-level prompts
UI Constraints: Cannot interact with elements hidden behind mouseovers
Browser Modals: Cannot interact with browser window modals like permission requests
Geography Limitation: Currently only available in the US
Research Status: This is an experimental preview, not a production service

NovaAct Constructor Options

The documentation lists these parameters for initializing NovaAct:

from nova_act import NovaAct

NovaAct(
    starting_page="https://example.com",  # Required: URL to start at
    headless=False,                       # True runs the browser without a visible window
    quiet=False,                          # True suppresses logs
    user_data_dir=None,                   # Path to a Chrome user data directory (profile)
    clone_user_data_dir=True,             # Work on a temporary copy of user_data_dir
    nova_act_api_key=None,                # API key (can use the NOVA_ACT_API_KEY env var instead)
    logs_directory=None,                  # Where to store logs
    record_video=False,                   # Whether to record a video of the session into logs_directory
    # Other options as documented
)

Conclusion

Amazon Nova Act represents an innovative approach to browser automation by combining AI with traditional automation techniques. While still in research preview with some limitations, it offers a promising direction for making web automation more reliable and adaptable.

The key advantage of Nova Act is that complex browser interactions can be expressed as discrete natural-language steps, one act() call each, and combined with ordinary Python code for flexible, powerful automation workflows.

As this is a research preview available only in the US, expect ongoing changes and improvements. For the most current information, always refer to the official documentation on GitHub and at nova.amazon.com/act.