How to Use Amazon Nova Act API and SDK

Emmanuel Mumba

3 April 2025

Amazon Nova Act is a research preview released by Amazon Artificial General Intelligence (AGI) that enables developers to build agents capable of taking actions within web browsers. This technology combines natural language instructions with Python scripting and Playwright automation to navigate websites, click buttons, fill forms, and extract data dynamically.

Unlike traditional web automation tools that rely on brittle scripts and website-specific code, Nova Act uses AI to interact with websites more adaptively, helping it handle changes in web interfaces without requiring constant maintenance.

Prerequisites

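Before getting started with Amazon Nova Act, you need an Amazon account, a Nova Act API key (see the next section), and a Python environment with pip. The research preview is currently available only in the US.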

Getting Your Amazon Nova Act API Key

To use Amazon Nova Act:

  1. Navigate to nova.amazon.com/act and sign in with your Amazon account
  2. Select "Act" in the Labs section of the navigation pane
  3. Generate an API key
  4. If access isn't immediate, you may be placed on a waitlist and notified by email when granted access

Installation

Once you have your API key:

# Install the SDK
pip install nova-act

# Set your API key as an environment variable
export NOVA_ACT_API_KEY="your_api_key"

Note: The first time you run Nova Act, it may take 1-2 minutes to start as it installs Playwright modules. Subsequent runs will start more quickly.
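
Before launching a session, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch, assuming only that the SDK reads NOVA_ACT_API_KEY from the environment as described above. The helper name api_key_is_set is hypothetical and not part of the SDK:

```python
import os
from collections.abc import Mapping


def api_key_is_set(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if NOVA_ACT_API_KEY is present and not blank."""
    return bool(env.get("NOVA_ACT_API_KEY", "").strip())


# Report the status without printing the key itself
print("API key found" if api_key_is_set() else "NOVA_ACT_API_KEY is not set")
```

Passing the environment as a parameter keeps the check easy to test with a plain dictionary.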

Basic Usage

Let's start with a simple example directly from the documentation:

from nova_act import NovaAct

with NovaAct(starting_page="https://www.amazon.com") as nova:
    nova.act("search for a coffee maker")
    nova.act("select the first result")
    nova.act("scroll down or up until you see 'add to cart' and then click 'add to cart'")

This script will:

  1. Open Chrome and navigate to Amazon
  2. Search for coffee makers
  3. Select the first result
  4. Find and click the "Add to Cart" button

Interactive Mode

Nova Act can be used interactively for experimentation:

# Start Python shell
$ python
>>> from nova_act import NovaAct
>>> nova = NovaAct(starting_page="https://www.amazon.com")
>>> nova.start()
>>> nova.act("search for a coffee maker")

After the first action completes, continue with the next step:

>>> nova.act("select the first result")

Note that according to the documentation, Nova Act does not currently support IPython; use the standard Python shell instead.

Effective Prompting Strategies

The official documentation emphasizes breaking tasks into smaller steps for reliability:

1. Be Specific and Clear

DON'T

nova.act("From my order history, find my most recent order from India Palace and reorder it")

DO

nova.act("Click the hamburger menu icon, go to Order History, find my most recent order from India Palace and reorder it")

2. Break Complex Tasks into Smaller Steps

DON'T

nova.act("book me a hotel that costs less than $100 with the highest star rating")

DO

nova.act(f"search for hotels in Houston between {startdate} and {enddate}")
nova.act("sort by avg customer review")
nova.act("hit book on the first hotel that is $100 or less")
nova.act(f"fill in my name, address, and DOB according to {blob}")
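
One way to keep a multi-step flow readable is to hold the prompts in a list and issue them in order. The sketch below assumes only that the agent object has an act(prompt) method, as in the examples above. The names run_steps and EchoAgent are hypothetical, and EchoAgent is a stand-in so the snippet runs without a browser:

```python
from typing import Protocol


class Actor(Protocol):
    """Anything with an act(prompt) method, such as a NovaAct instance."""

    def act(self, prompt: str) -> object: ...


def run_steps(agent: Actor, steps: list[str]) -> list[object]:
    """Send each prompt to the agent in order and collect the results."""
    results = []
    for number, prompt in enumerate(steps, start=1):
        print(f"step {number}/{len(steps)}: {prompt}")
        results.append(agent.act(prompt))
    return results


class EchoAgent:
    """Stand-in for NovaAct that records prompts instead of driving a browser."""

    def __init__(self) -> None:
        self.seen: list[str] = []

    def act(self, prompt: str) -> str:
        self.seen.append(prompt)
        return f"done: {prompt}"


agent = EchoAgent()
results = run_steps(agent, [
    "search for hotels in Houston",
    "sort by avg customer review",
    "hit book on the first hotel that is $100 or less",
])
```

If any step raises, the loop stops at that step, so later prompts never run against a page in an unexpected state.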

Extracting Data from Web Pages

Nova Act supports structured data extraction using Pydantic models:

from pydantic import BaseModel
from nova_act import NovaAct, BOOL_SCHEMA

class Book(BaseModel):
    title: str
    author: str

class BookList(BaseModel):
    books: list[Book]

def get_books(year: int) -> BookList | None:
    """Get top NYT books of the year and return as a BookList."""
    with NovaAct(starting_page=f"https://en.wikipedia.org/wiki/List_of_The_New_York_Times_number-one_books_of_{year}#Fiction") as nova:
        result = nova.act(
            "Return the books in the Fiction list",
            schema=BookList.model_json_schema()
        )
        
        if not result.matches_schema:
            # Act response did not match the schema
            return None
            
        # Parse the JSON into the pydantic model
        book_list = BookList.model_validate(result.parsed_response)
        return book_list

For simple boolean responses, use the built-in BOOL_SCHEMA:

result = nova.act("Am I logged in?", schema=BOOL_SCHEMA)
if result.matches_schema:
    if result.parsed_response:
        print("You are logged in")
    else:
        print("You are not logged in")
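
The snippet above does nothing when the response fails to match the schema. A small helper can make that third case explicit. This is a sketch that assumes only the matches_schema and parsed_response attributes shown above. The names as_bool and FakeResult are hypothetical, and FakeResult stands in for a real act() result:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeResult:
    """Stand-in with the two attributes used in the examples above."""
    matches_schema: bool
    parsed_response: Any = None


def as_bool(result: Any) -> Optional[bool]:
    """Return the boolean answer, or None if the response did not match the schema."""
    if not result.matches_schema:
        return None
    return bool(result.parsed_response)


print(as_bool(FakeResult(True, True)))    # True
print(as_bool(FakeResult(True, False)))   # False
print(as_bool(FakeResult(False)))         # None
```

Callers can then treat None as "unknown" and retry or ask the user, rather than silently assuming a negative answer.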

Parallel Processing with Multiple Browser Sessions

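The GitHub documentation confirms that Nova Act supports parallel processing with multiple browser sessions. The example below reuses the get_books function from the previous section, running one browser session per year: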

from concurrent.futures import ThreadPoolExecutor, as_completed
from nova_act import NovaAct, ActError

# Accumulate results here
all_books = []

# Set maximum concurrent browser sessions
with ThreadPoolExecutor(max_workers=10) as executor:
    # Get books from multiple years in parallel
    future_to_books = {
        executor.submit(get_books, year): year
        for year in range(2010, 2025)
    }
    
    # Collect results
    for future in as_completed(future_to_books):
        # Look up the year first so it is always defined in the except branch
        year = future_to_books[future]
        try:
            book_list = future.result()
            if book_list is not None:
                all_books.extend(book_list.books)
        except ActError as exc:
            print(f"Skipping year {year} due to error: {exc}")
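
The same fan-out pattern can be tried without a browser. The sketch below keeps failures per year instead of only printing them. The names fan_out and fake_fetch are hypothetical, and fake_fetch is a stand-in for get_books that fails for one year on purpose:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def fan_out(fetch: Callable[[int], list[str]], years: range, max_workers: int = 4):
    """Run fetch(year) concurrently; return (results_by_year, errors_by_year)."""
    results: dict[int, list[str]] = {}
    errors: dict[int, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fetch, year): year for year in years}
        for future in as_completed(futures):
            year = futures[future]  # look up the key before result() can raise
            try:
                results[year] = future.result()
            except Exception as exc:  # in real use, narrow this (e.g. to ActError)
                errors[year] = exc
    return results, errors


def fake_fetch(year: int) -> list[str]:
    """Stand-in for get_books: fails for one year to exercise the error path."""
    if year == 2012:
        raise RuntimeError("page failed to load")
    return [f"Book A ({year})", f"Book B ({year})"]


results, errors = fan_out(fake_fetch, range(2010, 2015))
print(sorted(results))  # [2010, 2011, 2013, 2014]
print(sorted(errors))   # [2012]
```

Keeping the errors in a dictionary makes it straightforward to retry only the years that failed.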

Authentication and Browser State

For websites requiring authentication, Nova Act provides options to use existing Chrome profiles:

import os
from nova_act import NovaAct

user_data_dir = "path/to/my/chrome_profile"
os.makedirs(user_data_dir, exist_ok=True)

with NovaAct(
    starting_page="https://amazon.com/", 
    user_data_dir=user_data_dir,
    clone_user_data_dir=False
) as nova:
    input("Log into your websites, then press enter...")

There's also a built-in helper script for this purpose:

python -m nova_act.samples.setup_chrome_user_data_dir

Handling Sensitive Information

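The documentation warns against including passwords or other sensitive data in prompts. Instead, ask Nova Act to focus the field, then type the secret directly through Playwright:

from getpass import getpass

# Let Nova Act enter the username and focus the password field
nova.act("enter username janedoe and click on the password field")

# Use Playwright directly for sensitive data
nova.page.keyboard.type(getpass())  # getpass() reads the password without echoing it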

# Continue after credentials are entered
nova.act("sign in")

Security Warning: The documentation notes that screenshots taken during execution will capture any visible sensitive information.
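
For unattended runs, an interactive getpass() prompt may not be practical. One possible approach, sketched here under assumptions, is to read the secret from an environment variable and fall back to a prompt. The helper read_secret and the variable name SITE_PASSWORD are hypothetical and not part of Nova Act:

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def read_secret(
    name: str,
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the secret from the environment, or prompt for it without echoing."""
    value = env.get(name)
    if value:
        return value
    return prompt(f"{name}: ")


# Demonstration with an explicit mapping instead of the real environment
password = read_secret("SITE_PASSWORD", env={"SITE_PASSWORD": "example-only"})
# In a session: nova.page.keyboard.type(read_secret("SITE_PASSWORD"))
```

Either way, the secret goes to Playwright and never appears in a prompt sent to the model.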

Additional Features

Working with Captchas

result = nova.act("Is there a captcha on the screen?", schema=BOOL_SCHEMA)
if result.matches_schema and result.parsed_response:
    input("Please solve the captcha and hit return when done")
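
A single check can miss a captcha that reappears or was not actually solved. The sketch below re-checks after each prompt, up to a limit. The name wait_for_human is hypothetical. In real use, captcha_present would wrap the BOOL_SCHEMA query shown above; here an iterator stands in for it so the snippet runs on its own:

```python
from typing import Callable


def wait_for_human(
    captcha_present: Callable[[], bool],
    ask_user: Callable[[str], object] = input,
    max_attempts: int = 3,
) -> bool:
    """Prompt the user until the captcha check clears; return False if it never does."""
    for _ in range(max_attempts):
        if not captcha_present():
            return True
        ask_user("Please solve the captcha and hit return when done")
    return not captcha_present()


# Stand-in check: captcha visible twice, then gone
checks = iter([True, True, False])
prompts: list[str] = []
cleared = wait_for_human(lambda: next(checks), ask_user=prompts.append)
print(cleared, len(prompts))  # True 2
```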

Downloading Files

with nova.page.expect_download() as download_info:
    nova.act("click on the download button")
    
# Save permanently
download_info.value.save_as("my_downloaded_file")

Recording Sessions

nova = NovaAct(
    starting_page="https://example.com",
    logs_directory="/path/to/logs",
    record_video=True
)
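
When recording many sessions, giving each run its own directory keeps logs and videos from mixing. A minimal sketch follows, assuming logs_directory accepts a path string as in the snippet above. The helper make_run_dir and the run-<timestamp> naming are illustrative choices, not SDK behavior:

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_run_dir(base: Path, now: Optional[datetime] = None) -> Path:
    """Create and return a timestamped subdirectory of base for one session's logs."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    run_dir = base / f"run-{stamp}"
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


# Demonstrated in a temporary directory; in real use, pass your own base path
with tempfile.TemporaryDirectory() as tmp:
    run_dir = make_run_dir(Path(tmp), now=datetime(2025, 4, 3, 12, 0, 0))
    print(run_dir.name)  # run-20250403-120000
    # NovaAct(..., logs_directory=str(run_dir), record_video=True)
```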

Real-World Example

The dev.to article demonstrates a real-world example of finding apartments near a train station. Here's the core structure of that example:

from concurrent.futures import ThreadPoolExecutor, as_completed
import pandas as pd
from pydantic import BaseModel
from nova_act import NovaAct

class Apartment(BaseModel):
    address: str
    price: str
    beds: str
    baths: str

class ApartmentList(BaseModel):
    apartments: list[Apartment]

class CaltrainBiking(BaseModel):
    biking_time_hours: int
    biking_time_minutes: int
    biking_distance_miles: float

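# Illustrative inputs; the full example takes these as parameters
caltrain_city = "Redwood City"
bedrooms = 2
baths = 1
headless = False

# First find apartments
with NovaAct(starting_page="https://zumper.com/", headless=headless) as client:
    client.act(
        "Close any cookie banners. "
        f"Search for apartments near {caltrain_city}, CA, "
        f"then filter for {bedrooms} bedrooms and {baths} bathrooms."
    )

    # Extract apartment listings with schema
    result = client.act(
        "Return the currently visible list of apartments",
        schema=ApartmentList.model_json_schema()
    )

# Parse the listings, falling back to an empty list if the schema did not match
all_apartments = (
    ApartmentList.model_validate(result.parsed_response).apartments
    if result.matches_schema
    else []
)

# Then check biking distances in parallel.
# add_biking_distance(apartment, caltrain_city, headless) belongs to the full
# example and is not shown here.
apartments_with_biking = []  # filled by the result-processing loop in the full example
with ThreadPoolExecutor() as executor:
    # Submit parallel tasks to check biking distance to train station
    future_to_apartment = {
        executor.submit(add_biking_distance, apartment, caltrain_city, headless): apartment
        for apartment in all_apartments
    }

    # Process results
    for future in as_completed(future_to_apartment):
        # Collect and process results
        pass

# Sort and display results
apartments_df = pd.DataFrame(apartments_with_biking)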
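
The final sorting step can be illustrated without a browser or pandas. The sketch below ranks listings by total biking time, using a plain dataclass that mirrors the CaltrainBiking fields above. The addresses and numbers are made-up sample data, and total_minutes is a hypothetical helper:

```python
from dataclasses import dataclass


@dataclass
class BikingInfo:
    """Plain-Python mirror of the CaltrainBiking fields above."""
    biking_time_hours: int
    biking_time_minutes: int
    biking_distance_miles: float


def total_minutes(info: BikingInfo) -> int:
    """Combine the hours and minutes fields into one sortable number."""
    return info.biking_time_hours * 60 + info.biking_time_minutes


# Made-up sample rows for illustration only
rows = [
    {"address": "1 Example St", "biking": BikingInfo(0, 25, 4.1)},
    {"address": "2 Sample Ave", "biking": BikingInfo(1, 5, 11.3)},
    {"address": "3 Demo Rd", "biking": BikingInfo(0, 9, 1.6)},
]
ranked = sorted(rows, key=lambda row: total_minutes(row["biking"]))
print([row["address"] for row in ranked])
# ['3 Demo Rd', '1 Example St', '2 Sample Ave']
```

Sorting on a single combined value avoids mistakes that come from comparing the hours and minutes fields separately.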

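This example demonstrates how Nova Act can combine natural-language navigation, schema-based data extraction, parallel browser sessions, and ordinary Python data processing (here, pandas) in a single workflow.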

Known Limitations

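According to the documentation, Nova Act has limitations. Those touched on in this article: it is a research preview available only in the US, it does not support IPython, captchas are handed to a person to solve, and screenshots taken during execution capture any sensitive information visible on screen.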

NovaAct Constructor Options

The documentation lists these parameters for initializing NovaAct:

NovaAct(
    starting_page="https://example.com",  # Required: URL to start at
    headless=False,                       # Set True to run without a visible browser window
    quiet=False,                          # Whether to suppress logs
    user_data_dir=None,                   # Path to Chrome profile
    nova_act_api_key=None,                # API key (can use env var instead)
    logs_directory=None,                  # Where to store logs
    record_video=False,                   # Whether to record session
    # Other options as documented
)
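
These options tend to travel together: a visible, verbose, recorded browser while debugging, and a headless, quiet one for unattended runs. The sketch below builds the keyword arguments for either case using only the parameter names listed above. The helper session_options is hypothetical, and tying record_video to logs_directory is an assumption that a video needs somewhere to be written:

```python
from typing import Any, Optional


def session_options(
    starting_page: str,
    debug: bool = False,
    logs_directory: Optional[str] = None,
) -> dict[str, Any]:
    """Build NovaAct keyword arguments for a debug or an unattended run."""
    options: dict[str, Any] = {
        "starting_page": starting_page,
        "headless": not debug,  # show the browser only while debugging
        "quiet": not debug,     # keep logs verbose only while debugging
        # Assumption: only record when there is a directory to write into
        "record_video": debug and logs_directory is not None,
    }
    if logs_directory is not None:
        options["logs_directory"] = logs_directory
    return options


options = session_options("https://example.com", debug=True, logs_directory="/tmp/nova-logs")
print(options)
# In real use: NovaAct(**options)
```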

Conclusion

Amazon Nova Act approaches browser automation by combining AI with traditional automation techniques. It is still a research preview with some limitations, but it offers a promising direction for making web automation more reliable and adaptable.

Its key advantage is that complex browser interactions can be expressed as discrete natural-language steps and combined with Python code to build flexible automation workflows.

As this is a research preview available only in the US, expect ongoing changes and improvements. For the most current information, always refer to the official documentation at GitHub and nova.amazon.com/act.
