How to Use Qwen3.5 API for Free with NVIDIA?

Learn how to use Qwen3.5 VLM API for free with NVIDIA GPU-accelerated endpoints. Step-by-step tutorial with code examples for multimodal AI integration.

Ashley Innocent

28 February 2026

TL;DR

Qwen3.5 is Alibaba's groundbreaking 397-billion parameter vision-language model with Mixture of Experts (MoE) architecture. You can access it for free through NVIDIA's GPU-accelerated endpoints by registering for the NVIDIA Developer Program. This guide walks you through obtaining your API key, making your first calls, and integrating Qwen3.5's multimodal capabilities into your applications.

Introduction

Alibaba's Qwen3.5 represents a significant leap in multimodal AI. This 397-billion parameter model combines Mixture of Experts (MoE) architecture with Gated Delta Networks, delivering powerful reasoning capabilities while keeping active parameters at just 17 billion. The result is a model that can understand images, navigate user interfaces, and handle complex multimodal tasks, all accessible through a free API.

The best part? You can start using Qwen3.5 for free right now through NVIDIA's developer platform. Whether you're building AI agents, developing visual reasoning applications, or exploring multimodal AI, this guide will walk you through every step.

💡
If you're building applications that integrate with Qwen3.5 or any other AI API, you'll need robust testing tools. Apidog provides a comprehensive API testing platform that makes it easy to validate your AI API integrations, manage environment variables, and automate testing workflows.

What is Qwen3.5 VLM?

Qwen3.5 is Alibaba's first native vision-language model in the Qwen3.5 series, designed specifically for building autonomous agents. Unlike previous VLMs that were adapted from text-only models, Qwen3.5 was built from the ground up for multimodal reasoning and UI navigation.

Qwen3.5 benchmark results

Key Specifications

Total Parameters: 397 billion
Active Parameters: 17 billion
Activation Rate: 4.28%
Expert Count: 512 experts
Experts per Token: 11 (10 routed + 1 shared)
Input Context: 256K tokens (extensible to 1M)
Languages Supported: 200+
Architecture: MoE + Gated Delta Networks

Gated Delta Networks Architecture

What Makes Qwen3.5 Special

Mixture of Experts (MoE) architecture means only a subset of the model's parameters are active for any given input. This makes the model computationally efficient while maintaining the capacity for complex reasoning across all 397B parameters.
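The activation rate in the specifications above is simply the ratio of active to total parameters:

```python
# Activation rate = active parameters / total parameters
total_params = 397e9   # 397 billion
active_params = 17e9   # 17 billion

activation_rate = active_params / total_params
print(f"{activation_rate:.2%}")  # 4.28%
```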

Native multimodal agent capabilities set Qwen3.5 apart from other VLMs: it can interpret images and screenshots, navigate user interfaces, call external tools, and expose its reasoning through a dedicated thinking mode.

Ideal Use Cases

  - AI agents that operate software through its user interface
  - Visual reasoning applications working with screenshots, diagrams, and documents
  - Multimodal assistants that combine text, images, and tool calls

NVIDIA Developer Program: Get Your Free API Key

NVIDIA provides free access to Qwen3.5 through their GPU-accelerated endpoints. Here's how to get started:

Step 1: Join NVIDIA Developer Program

  1. Visit build.nvidia.com
  2. Click Sign In or Create Account
  3. Register for the NVIDIA Developer Program (free)
  4. Verify your email address
NVIDIA Developer Program

Step 2: Get Your API Key

  1. After logging in, navigate to your account settings
  2. Find API Keys or NVIDIA API Key
  3. Copy your API key (starts with nvapi-)
  4. Store it securely (you'll need it for authentication)
Important: Never expose your API key in client-side code. Use environment variables or a backend server to store it securely.
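For example, in Python you might read the key from an environment variable at startup. This is a minimal sketch; the variable name NVIDIA_API_KEY is just a convention:

```python
import os

def load_api_key() -> str:
    """Read the NVIDIA API key from the environment instead of hard-coding it."""
    key = os.environ.get("NVIDIA_API_KEY")
    if not key:
        raise RuntimeError("NVIDIA_API_KEY is not set; export it before running.")
    return key

# Only touch the key if it is actually configured
if os.environ.get("NVIDIA_API_KEY"):
    print("Loaded key prefix:", load_api_key()[:6])
```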

Step 3: Test Your Access

You can test Qwen3.5 directly in your browser at build.nvidia.com/qwen/qwen3.5-397b-a17b. This lets you experiment with prompts and evaluate the model with your own data before writing any code.

Your First Qwen3.5 API Call

Now let's make your first API call to Qwen3.5. The API is compatible with OpenAI's format, making it easy to integrate into existing applications.

Basic API Call

import requests

# Configuration
invoke_url = "https://integrate.api.nvidia.com/v1/chat/completions"
api_key = "YOUR_NVIDIA_API_KEY"  # Replace with your API key

headers = {
    "Authorization": f"Bearer {api_key}",
    "Accept": "application/json",
}

# Payload - simple text-only request
payload = {
    "messages": [
        {
            "role": "user",
            "content": "What are the key features of Qwen3.5 VLM?"
        }
    ],
    "model": "qwen/qwen3.5-397b-a17b",
    "max_tokens": 1024,
    "temperature": 0.7,
}

# Make the request
session = requests.Session()
response = session.post(invoke_url, headers=headers, json=payload)
response.raise_for_status()

# Print the response
result = response.json()
print(result['choices'][0]['message']['content'])
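Because the endpoint follows OpenAI's chat-completions format, streaming should also work by setting "stream": true and reading server-sent events. The exact event framing may vary by deployment, so treat this as a sketch:

```python
import json

def parse_sse_line(line: bytes):
    """Extract the delta text from one 'data: ...' SSE line.
    Returns None for keep-alives, empty deltas, and the [DONE] sentinel."""
    if not line.startswith(b"data: "):
        return None
    data = line[len(b"data: "):].strip()
    if data == b"[DONE]":
        return None
    chunk = json.loads(data)
    return chunk["choices"][0]["delta"].get("content")

# Streaming request (reuses invoke_url, headers, payload from above):
# payload["stream"] = True
# with requests.post(invoke_url, headers=headers, json=payload, stream=True) as r:
#     for line in r.iter_lines():
#         text = parse_sse_line(line)
#         if text:
#             print(text, end="", flush=True)
```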

Making Multimodal Requests (With Images)

To use Qwen3.5's vision capabilities, include image data in your request:

import requests
import base64

# Function to encode image to base64
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Encode your image
image_base64 = encode_image("screenshot.png")

invoke_url = "https://integrate.api.nvidia.com/v1/chat/completions"
api_key = "YOUR_NVIDIA_API_KEY"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Accept": "application/json",
}

# Multimodal request with image
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_base64}"}
                },
                {
                    "type": "text",
                    "text": "What do you see in this image? Describe the UI elements."
                }
            ]
        }
    ],
    "model": "qwen/qwen3.5-397b-a17b",
    "max_tokens": 1024,
}

response = requests.post(invoke_url, headers=headers, json=payload)
response.raise_for_status()
result = response.json()
print(result['choices'][0]['message']['content'])
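The data-URL prefix must match the actual image format (the examples above assume PNG). A small helper keeps the MIME type consistent for whatever file you send:

```python
import base64
import mimetypes

def image_to_data_url(image_path: str) -> str:
    """Encode an image file as a data URL with the matching MIME type."""
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {image_path}")
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```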

Code Examples in Python and JavaScript

Python: Complete Integration Example

import os
import requests
from requests.exceptions import RequestException

class QwenClient:
    """Python client for Qwen3.5 API"""

    def __init__(self, api_key=None):
        self.api_key = api_key or os.getenv("NVIDIA_API_KEY")
        self.endpoint = "https://integrate.api.nvidia.com/v1/chat/completions"
        self.model = "qwen/qwen3.5-397b-a17b"

    def chat(self, message, system_prompt=None, **kwargs):
        """Send a chat message to Qwen3.5"""
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": message})

        payload = {
            "messages": messages,
            "model": self.model,
            "max_tokens": kwargs.get("max_tokens", 2048),
            "temperature": kwargs.get("temperature", 0.7),
            "top_p": kwargs.get("top_p", 0.9),
        }

        # Enable thinking mode if requested
        if kwargs.get("thinking", False):
            payload["chat_template_kwargs"] = {"thinking": True}

        try:
            response = requests.post(
                self.endpoint,
                headers=headers,
                json=payload,
                timeout=kwargs.get("timeout", 60)
            )
            response.raise_for_status()
            return response.json()
        except RequestException as e:
            return {"error": str(e)}

    def chat_with_image(self, message, image_path, **kwargs):
        """Send a chat message with image to Qwen3.5"""
        import base64

        with open(image_path, "rb") as f:
            image_base64 = base64.b64encode(f.read()).decode("utf-8")

        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

        payload = {
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_base64}"}},
                    {"type": "text", "text": message}
                ]
            }],
            "model": self.model,
            "max_tokens": kwargs.get("max_tokens", 2048),
            "temperature": kwargs.get("temperature", 0.7),
        }

        response = requests.post(self.endpoint, headers=headers, json=payload)
        response.raise_for_status()
        return response.json()


# Usage example
client = QwenClient(api_key="YOUR_NVIDIA_API_KEY")

# Text-only chat
result = client.chat("Explain Mixture of Experts architecture in simple terms")
print(result['choices'][0]['message']['content'])

# Multimodal chat
result = client.chat_with_image(
    "What UI elements are in this screenshot?",
    "screenshot.png"
)
print(result['choices'][0]['message']['content'])

JavaScript/Node.js: Complete Integration Example

const axios = require('axios');

class QwenClient {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.endpoint = 'https://integrate.api.nvidia.com/v1/chat/completions';
    this.model = 'qwen/qwen3.5-397b-a17b';
  }

  async chat(message, options = {}) {
    const { systemPrompt, temperature = 0.7, maxTokens = 2048, thinking = false } = options;

    const messages = [];
    if (systemPrompt) {
      messages.push({ role: 'system', content: systemPrompt });
    }
    messages.push({ role: 'user', content: message });

    const payload = {
      messages,
      model: this.model,
      temperature,
      max_tokens: maxTokens,
      ...(thinking && { chat_template_kwargs: { thinking: true } })
    };

    try {
      const response = await axios.post(this.endpoint, payload, {
        headers: {
          'Authorization': `Bearer ${this.apiKey}`,
          'Content-Type': 'application/json'
        },
        timeout: 60000
      });

      return response.data;
    } catch (error) {
      console.error('API Error:', error.response?.data || error.message);
      throw error;
    }
  }

  async chatWithImage(message, imageBase64, options = {}) {
    const { temperature = 0.7, maxTokens = 2048 } = options;

    const payload = {
      messages: [{
        role: 'user',
        content: [
          { type: 'image_url', image_url: { url: `data:image/png;base64,${imageBase64}` } },
          { type: 'text', text: message }
        ]
      }],
      model: this.model,
      temperature,
      max_tokens: maxTokens
    };

    const response = await axios.post(this.endpoint, payload, {
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      }
    });

    return response.data;
  }
}

// Usage (wrapped in an async IIFE, since top-level await is not
// available in CommonJS modules)
const client = new QwenClient(process.env.NVIDIA_API_KEY);

(async () => {
  // Text chat
  const result = await client.chat('What is the advantage of MoE architecture?');
  console.log(result.choices[0].message.content);

  // With thinking enabled
  const deepResult = await client.chat('Explain how reasoning works in LLMs', {
    thinking: true
  });
  console.log(deepResult.choices[0].message.content);
})();

Advanced Features: Thinking Mode and Tool Calling

Thinking Mode

Qwen3.5 supports an advanced "thinking" mode that enables the model to show its reasoning process. This is particularly useful for complex problem-solving tasks.

payload = {
    "messages": [{"role": "user", "content": "Solve this step by step: If a train travels 120km in 2 hours, what is its speed?"}],
    "model": "qwen/qwen3.5-397b-a17b",
    "chat_template_kwargs": {"thinking": True},
    "max_tokens": 4096,
}

response = session.post(invoke_url, headers=headers, json=payload)
result = response.json()
print(result['choices'][0]['message']['content'])

Tool Calling

Qwen3.5 supports function calling through OpenAI-compatible tools. This enables you to build agentic applications that can execute real actions.

import json

# Define tools for the model to use
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }
]

payload = {
    "messages": [
        {"role": "user", "content": "What's the weather like in Tokyo?"}
    ],
    "model": "qwen/qwen3.5-397b-a17b",
    "tools": tools,
    "tool_choice": "auto"
}

response = session.post(invoke_url, headers=headers, json=payload)
result = response.json()

# Check if model wants to call a tool
if 'tool_calls' in result['choices'][0]['message']:
    tool_call = result['choices'][0]['message']['tool_calls'][0]
    print(f"Model wants to call: {tool_call['function']['name']}")
    print(f"Arguments: {tool_call['function']['arguments']}")
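In a full agent loop you would then execute the requested function locally, append its result as a "role": "tool" message, and call the API again so the model can finish its answer. A sketch of that second leg follows; get_weather here is a hypothetical local implementation, not part of the API:

```python
import json

def get_weather(location: str) -> dict:
    """Hypothetical local implementation of the declared tool."""
    return {"location": location, "forecast": "sunny", "temp_c": 21}

def run_tool_call(tool_call: dict) -> str:
    """Dispatch the model's requested call and serialize the result as JSON."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "get_weather":
        return json.dumps(get_weather(**args))
    raise ValueError(f"Unknown tool: {name}")

# Continue the conversation with the tool result:
# messages.append(result['choices'][0]['message'])  # assistant turn with tool_calls
# messages.append({
#     "role": "tool",
#     "tool_call_id": tool_call["id"],
#     "content": run_tool_call(tool_call),
# })
# ...then POST the payload again with the extended messages list.
```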

Understanding Rate Limits and Pricing

Current Free Tier (NVIDIA Developer Program)

API Access: Free with registration
GPU-Accelerated Endpoints: Included
Browser Testing: Unlimited
Rate Limits: Check your developer dashboard

What This Means for You

You can prototype and evaluate Qwen3.5 at no cost: registration is free, no credit card is required, and the free tier runs on the same GPU-accelerated endpoints. The main constraint is the rate limit, which you can check in your developer dashboard.

Scaling to Production

When you're ready to move beyond free tier:

  1. NVIDIA NIM: Deploy containerized models anywhere (cloud, on-premises, hybrid)
  2. NeMo: Customize the model for your specific domain
  3. Enterprise support: Contact NVIDIA for dedicated infrastructure

Production Deployment with NVIDIA NIM

NVIDIA NIM (NVIDIA Inference Microservices) makes it easy to take Qwen3.5 from development to production.

NVIDIA NIM

What is NIM?

NIM provides pre-built, optimized containers for AI inference: each microservice packages the model weights with an optimized inference runtime and exposes the same OpenAI-compatible API used throughout this guide.

Deploying Qwen3.5 with NIM

# Pull the Qwen3.5 NIM container
docker pull nvcr.io/nim/qwen/qwen3.5-397b-a17b:latest

# Run the container
docker run --gpus all --rm -p 8000:8000 \
  -e NVIDIA_API_KEY=$NVIDIA_API_KEY \
  nvcr.io/nim/qwen/qwen3.5-397b-a17b:latest

Now your model is running locally at http://localhost:8000/v1/chat/completions.
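Because the local container serves the same OpenAI-compatible route, a client needs nothing more than a different base URL. A minimal sketch (a local deployment typically needs no bearer token, but check your NIM configuration):

```python
import requests

LOCAL_BASE_URL = "http://localhost:8000/v1"

def chat_local(prompt: str, base_url: str = LOCAL_BASE_URL) -> str:
    """Send one chat turn to a locally running NIM container."""
    resp = requests.post(
        f"{base_url}/chat/completions",
        json={
            "model": "qwen/qwen3.5-397b-a17b",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 512,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```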

Benefits of NIM

Because NIM serves the same OpenAI-compatible API as the hosted endpoints, moving from the free tier to a self-hosted deployment usually means changing only the base URL. You also control where the model runs: on-premises, in your own cloud, or in a hybrid environment.

Customization with NVIDIA NeMo

For domain-specific applications, you can fine-tune Qwen3.5 using NVIDIA NeMo.

NeMo Framework Capabilities

NeMo provides tools for adapting Qwen3.5 to domain-specific data, including LoRA for memory-efficient fine-tuning and multinode support for large-scale training.

Example: Fine-tuning for Medical VQA

NVIDIA provides a technical tutorial for fine-tuning Qwen3.5 on radiological datasets for medical Visual Question Answering. This demonstrates how to adapt the model for specialized domains like healthcare.

Conclusion

Qwen3.5 represents an exciting opportunity to use a cutting-edge multimodal AI model at no cost through NVIDIA's developer platform. With its 397B-parameter MoE architecture, native vision capabilities, and free API access, it's an excellent choice for building AI agents, developing visual reasoning applications, and exploring multimodal AI.

Getting started is simple: register for the NVIDIA Developer Program, get your API key, and start building.

If you're building applications that integrate with Qwen3.5 or other AI APIs, Apidog provides the testing infrastructure you need. Test your API integrations, validate responses, manage environment variables, and automate your testing workflows with Apidog's comprehensive platform.


FAQ

Is Qwen3.5 really free to use?

Yes, NVIDIA provides free access to Qwen3.5 GPU-accelerated endpoints through their Developer Program. No credit card is required. Simply register at build.nvidia.com to get your API key.

What makes Qwen3.5 different from other VLMs?

Qwen3.5 was built specifically for autonomous agents, not adapted from a text-only model. Its Mixture of Experts architecture (397B total, 17B active) provides powerful reasoning while remaining computationally efficient. It's particularly good at UI navigation and visual reasoning tasks.

Can I use Qwen3.5 for commercial projects?

Check the current licensing terms on NVIDIA's platform. For production use, consider NVIDIA NIM for deployment or contact NVIDIA about enterprise options.

What's the difference between the free tier and NIM?

The free tier (Developer Program) uses NVIDIA-hosted endpoints. NIM lets you deploy the model yourself using containers, whether on-premises, in your cloud, or hybrid environments. NIM is designed for production-scale deployments.

How do I handle rate limiting?

The free tier has certain rate limits. For higher limits, consider upgrading to production access through NVIDIA NIM or contacting NVIDIA about enterprise options.
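A common client-side mitigation is to retry on HTTP 429 with exponential backoff. A sketch follows; the status code and delay cap are assumptions, so adjust them to what the endpoint actually returns:

```python
import time
import requests

def backoff_delay(attempt: int, cap: float = 30.0) -> float:
    """Exponential backoff: 1s, 2s, 4s, ... capped at `cap` seconds."""
    return min(2.0 ** attempt, cap)

def post_with_retry(url, headers, payload, max_retries=5):
    """Retry the request when the endpoint answers 429 Too Many Requests."""
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload, timeout=60)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        time.sleep(backoff_delay(attempt))
    raise RuntimeError("Still rate-limited after retries")
```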

Can I fine-tune Qwen3.5?

Yes! NVIDIA NeMo framework provides tools for fine-tuning Qwen3.5 on your domain-specific data. This includes LoRA for memory-efficient customization and multinode support for large-scale training.
