Apple Silicon Macs now make it possible for developers to run cutting-edge large language models (LLMs) locally, without cloud dependencies or privacy trade-offs. The latest Deepseek V3 0323 model brings state-of-the-art AI capabilities—including code generation, advanced reasoning, and high accuracy—directly to your Mac, thanks to Apple's MLX framework optimized for M-series chips.
In this step-by-step guide, you'll learn how to set up and run Deepseek V3 0323 on Apple Silicon Macs, review real-world performance benchmarks, and discover how this open-source model compares to commercial leaders like Claude Sonnet 3.7. You'll also find practical optimization tips and integration examples for API-driven workflows.
💡 API development teams who want to accelerate API design, testing, and documentation can streamline their processes with Apidog—a unified platform that automates repetitive API tasks and integrates seamlessly into modern development pipelines.
What is Deepseek V3 0323?
Deepseek V3 0323 is an advanced open-source large language model from DeepSeek AI Lab, renowned for high performance in language understanding, code generation, and reasoning tasks. As part of the Deepseek V3 family, the 0323 model (released March 23) features notable improvements over prior versions and is MIT-licensed for unrestricted commercial and personal use.
Deepseek V3 0323 vs Previous Versions
Compared to earlier Deepseek V3 releases, 0323 offers:
- Superior code generation (especially for polyglot tasks)
- Enhanced context retention and instruction following
- Lower hallucination rates and more accurate factual output
Benchmark Performance: Deepseek V3 0323 vs. Alternatives
Recent benchmarks show Deepseek V3 0323 rivaling or surpassing many commercial LLMs. For example, the model scored 55% on the aider polyglot benchmark, placing it second among non-reasoning models—behind only Claude Sonnet 3.7.
Key Performance Highlights
- Strong multi-step reasoning: Handles complex logic and tasks
- Code generation: Excels in polyglot programming and syntax
- Instruction adherence: Follows prompts with high precision
- Context awareness: Maintains relevant context for coherent responses
- Factual reliability: Low hallucination rate and accurate knowledge
Deepseek V3 vs. Claude Sonnet 3.7 vs. o3-mini
While Claude Sonnet 3.7 leads on some benchmarks, Deepseek V3’s ability to run fully offline on Apple Silicon hardware is a major advantage for developers and teams prioritizing privacy, cost control, and latency.
Why Run Deepseek V3 Locally on Apple Silicon?
Running Deepseek V3 0323 on your own Mac (M1, M2, M3, or M4) with MLX brings several tangible benefits:
- Full privacy: Data never leaves your device
- Zero API costs: No token usage fees or quotas
- Customization: Freedom to tune and fine-tune models
- Offline access: Use advanced AI tools without internet
- Ultra-low latency: Instant responses, ideal for local apps and rapid prototyping
- Apple Silicon optimization: MLX runs inference efficiently on the GPU via Metal and Apple's unified memory architecture
Real-World Speed:
“The new Deep Seek V3 0324 in 4-bit runs at >20 tokens/sec on a 512GB M3 Ultra with mlx-lm!”
— Awni Hannun, March 24, 2025
Hardware Requirements
Minimum recommended specs for Deepseek V3 0323:
- Apple Silicon Mac (M1, M2, M3, or M4)
- 16GB RAM (32GB+ preferred)
- 700GB+ free storage (full model ~641GB; 4-bit quantized model ~350GB)
Optimal performance:
- 64GB+ RAM
- M2 Ultra, M3 Ultra, or M4 chips
Performance will scale with hardware—quantized models make local deployment accessible even on “prosumer” setups.
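The storage figures above follow from simple arithmetic: weight memory scales with bytes per parameter, and DeepSeek V3 has roughly 671 billion total parameters. A quick back-of-envelope sketch (activations and KV cache add overhead on top of these numbers):

```python
# Rough weight-memory estimate for a ~671B-parameter model
# at different precisions. Real on-disk sizes vary slightly
# due to metadata and mixed-precision layers.
params = 671e9

for name, bytes_per_param in [("fp16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name}: ~{gb:.0f} GB")
```

The 8-bit and 4-bit estimates land close to the ~641GB full model and ~350GB quantized figures cited above.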
Step-by-Step: Running Deepseek V3 0323 on Mac with MLX
1. Set Up Your Python Environment
Organize dependencies using a virtual environment:
mkdir deepseek-mlx
cd deepseek-mlx
python3 -m venv env
source env/bin/activate
2. Install MLX and Supporting Libraries
MLX and MLX-LM are required for running LLMs on Apple Silicon:
pip install mlx mlx-lm
# (Optional) Suppress warnings with PyTorch nightly:
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
3. Install the LLM Command-Line Tool
This tool simplifies model management and usage:
pip install llm
pip install llm-mlx
4. Download Deepseek V3 0323
Choose between the full or quantized version:
Option A: Full Model (Highest Quality)
llm mlx download-model deepseek-ai/DeepSeek-V3-0323
(Requires substantial disk space; best for top-end Macs)
Option B: 4-bit Quantized Model (Most Practical)
llm mlx download-model mlx-community/DeepSeek-V3-0323-4bit
(Recommended for most developers; ~350GB storage)
5. Test the Model in Chat Mode
Interact with Deepseek V3 via the terminal:
llm chat -m mlx-community/DeepSeek-V3-0323-4bit
6. Run Deepseek V3 as a Local API Server
Expose the model as an OpenAI-compatible endpoint:
python -m mlx_lm.server --model mlx-community/DeepSeek-V3-0323-4bit --port 8080
- API endpoint:
http://localhost:8080/v1/chat/completions
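Once the server is up, you can sanity-check the endpoint from another terminal with curl; the request body follows the standard OpenAI chat-completions shape (the port matches the `--port 8080` flag above):

```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Say hello in five words"}],
        "temperature": 0.7,
        "max_tokens": 50
      }'
```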
7. Integrate with Your Applications
Example: Python script to query your local Deepseek V3 API
import requests

def chat_with_model(prompt):
    url = "http://localhost:8080/v1/chat/completions"
    headers = {"Content-Type": "application/json"}
    data = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 500
    }
    response = requests.post(url, headers=headers, json=data)
    return response.json()["choices"][0]["message"]["content"]

# Usage example
response = chat_with_model("Explain quantum computing in simple terms")
print(response)
Optimization Tips for Performance
- Close other apps: Free up RAM for model inference
- Reduce context window: Smaller prompts use less memory
- Use quantization: 4-bit models run faster on modest hardware
- Monitor cooling: Prolonged usage may heat your Mac—ensure airflow
- Tune parameters: Adjust temperature, top_p, etc., for best results
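To build intuition for the last tip: temperature rescales the model's logits before sampling, so values below 1 sharpen the distribution toward the top token while values above 1 flatten it. A minimal standalone sketch (no model required):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/temperature, then apply a numerically
    # stable softmax (subtracting the max before exponentiating).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At t=0.5 the top token dominates; at t=2.0 the probabilities spread out, producing more varied (and more error-prone) output.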
Fine-Tuning Deepseek V3 for Custom Tasks
For teams needing domain adaptation or specialized outputs, MLX supports LoRA-style fine-tuning. The script name and flags below are illustrative; adapt them to your own training setup (mlx-lm also ships a built-in LoRA trainer, invoked as python -m mlx_lm.lora):
pip install datasets peft trl
python fine_tune_mlx.py \
  --model mlx-community/DeepSeek-V3-0323-4bit \
  --dataset your_dataset.json \
  --output-dir fine_tuned_model \
  --epochs 3
Embedding Deepseek V3 in Your Applications
Direct MLX integration example:
from mlx_lm import load, generate

# Load model and tokenizer
model, tokenizer = load("mlx-community/DeepSeek-V3-0323-4bit")

prompt = "Explain the theory of relativity"
# generate() takes the tokenizer and the raw prompt string;
# it handles tokenization and decoding internally
response = generate(model, tokenizer, prompt=prompt, max_tokens=500)
print(response)
Troubleshooting Common Issues
- Out of Memory: Try further quantization or reduce context length.
- Slow Generation: Close background tasks, check cooling.
- Install Failures: Use Python 3.9+, latest pip.
- Model Load Errors: Ensure enough disk space and correct download.
- API Connection Issues: Confirm server is running and port availability.
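For the last item, a quick way to check from Python whether anything is listening on the server port (8080 in the setup above):

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    # Try a TCP connection; success means a server is listening there.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_open("localhost", 8080))
```

If this prints False while the server process is running, check that it started on the port you expect and that no other process claimed it first.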
Conclusion: Local AI, Developer Control
Running Deepseek V3 0323 locally unlocks advanced AI for Mac developers—combining privacy, flexibility, and speed with open-source accessibility. With performance close to top commercial LLMs and seamless Apple Silicon integration, it's now practical to deploy state-of-the-art AI on your own hardware.
For API-focused teams, using tools like Apidog ensures that your API design, testing, and integration workflows are as efficient as your AI infrastructure.