Introduction
Configuring hundreds of AI agents for a social media simulation sounds daunting. Each agent needs activity schedules, posting frequencies, response delays, influence weights, and stance positions. Doing this manually would take hours.
MiroFish automates this with LLM-powered configuration generation. The system analyzes your documents, knowledge graph, and simulation requirements, then generates detailed configs for every agent.
The challenge: LLMs can fail. Outputs get truncated. JSON breaks. Token limits bite.
This guide covers the complete implementation:
- Step-by-step generation (time → events → agents → platforms)
- Batch processing to avoid context limits
- JSON repair strategies for truncated outputs
- Rule-based fallback configs when LLM fails
- Agent activity patterns by type (Student vs Official vs Media)
- Validation and correction logic
All code comes from production use in MiroFish.
Architecture Overview
The config generator uses a pipelined approach:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Context │ ──► │ Time Config │ ──► │ Event Config │
│ Builder │ │ Generator │ │ Generator │
│ │ │ │ │ │
│ - Simulation │ │ - Total hours │ │ - Initial posts │
│ requirement │ │ - Minutes/round │ │ - Hot topics │
│ - Entity summary│ │ - Peak hours │ │ - Narrative │
│ - Document text │ │ - Activity mult │ │ direction │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│
▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Final Config │ ◄── │ Platform │ ◄── │ Agent Config │
│ Assembly │ │ Config │ │ Batches │
│ │ │ │ │ │
│ - Merge all │ │ - Twitter params│ │ - 15 agents │
│ - Validate │ │ - Reddit params │ │ per batch │
│ - Save JSON │ │ - Viral threshold│ │ - N batches │
└─────────────────┘ └─────────────────┘ └─────────────────┘
File Structure
backend/app/services/
├── simulation_config_generator.py # Main config generation logic
├── ontology_generator.py # Ontology generation (shared)
└── zep_entity_reader.py # Entity filtering
backend/app/models/
├── task.py # Task tracking
└── project.py # Project state
Step-by-Step Generation Strategy
Generating all configs at once would exceed token limits. Instead, the system generates in stages:
class SimulationConfigGenerator:
# Each batch generates configs for 15 agents
AGENTS_PER_BATCH = 15
# Context limits
MAX_CONTEXT_LENGTH = 50000
TIME_CONFIG_CONTEXT_LENGTH = 10000
EVENT_CONFIG_CONTEXT_LENGTH = 8000
ENTITY_SUMMARY_LENGTH = 300
AGENT_SUMMARY_LENGTH = 300
ENTITIES_PER_TYPE_DISPLAY = 20
def generate_config(
self,
simulation_id: str,
project_id: str,
graph_id: str,
simulation_requirement: str,
document_text: str,
entities: List[EntityNode],
enable_twitter: bool = True,
enable_reddit: bool = True,
progress_callback: Optional[Callable[[int, int, str], None]] = None,
) -> SimulationParameters:
# Calculate total steps
num_batches = math.ceil(len(entities) / self.AGENTS_PER_BATCH)
total_steps = 3 + num_batches # Time + Events + N Agent Batches + Platform
current_step = 0
def report_progress(step: int, message: str):
nonlocal current_step
current_step = step
if progress_callback:
progress_callback(step, total_steps, message)
logger.info(f"[{step}/{total_steps}] {message}")
# Build context
context = self._build_context(
simulation_requirement=simulation_requirement,
document_text=document_text,
entities=entities
)
reasoning_parts = []
# Step 1: Generate time config
report_progress(1, "Generating time configuration...")
time_config_result = self._generate_time_config(context, len(entities))
time_config = self._parse_time_config(time_config_result, len(entities))
reasoning_parts.append(f"Time config: {time_config_result.get('reasoning', 'Success')}")
# Step 2: Generate event config
report_progress(2, "Generating event config and hot topics...")
event_config_result = self._generate_event_config(context, simulation_requirement, entities)
event_config = self._parse_event_config(event_config_result)
reasoning_parts.append(f"Event config: {event_config_result.get('reasoning', 'Success')}")
# Steps 3-N: Generate agent configs in batches
all_agent_configs = []
for batch_idx in range(num_batches):
start_idx = batch_idx * self.AGENTS_PER_BATCH
end_idx = min(start_idx + self.AGENTS_PER_BATCH, len(entities))
batch_entities = entities[start_idx:end_idx]
report_progress(
3 + batch_idx,
f"Generating agent config ({start_idx + 1}-{end_idx}/{len(entities)})..."
)
batch_configs = self._generate_agent_configs_batch(
context=context,
entities=batch_entities,
start_idx=start_idx,
simulation_requirement=simulation_requirement
)
all_agent_configs.extend(batch_configs)
reasoning_parts.append(f"Agent config: Generated {len(all_agent_configs)} agents")
# Assign initial post publishers
event_config = self._assign_initial_post_agents(event_config, all_agent_configs)
# Final step: Platform config
report_progress(total_steps, "Generating platform configuration...")
twitter_config = PlatformConfig(platform="twitter", ...) if enable_twitter else None
reddit_config = PlatformConfig(platform="reddit", ...) if enable_reddit else None
# Assemble final config
params = SimulationParameters(
simulation_id=simulation_id,
project_id=project_id,
graph_id=graph_id,
simulation_requirement=simulation_requirement,
time_config=time_config,
agent_configs=all_agent_configs,
event_config=event_config,
twitter_config=twitter_config,
reddit_config=reddit_config,
generation_reasoning=" | ".join(reasoning_parts)
)
return params
This staged approach:
- Keeps each LLM call focused and manageable
- Provides progress updates to the user
- Allows partial recovery if one stage fails
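To see how the step count and batch boundaries fall out for a given entity count, the arithmetic can be reproduced in isolation (a sketch, not part of the class):

```python
import math

# Standalone sketch of the batching arithmetic; AGENTS_PER_BATCH and the
# three fixed stages (time, events, platform) come from the class above.
AGENTS_PER_BATCH = 15

def plan_batches(num_entities: int):
    num_batches = math.ceil(num_entities / AGENTS_PER_BATCH)
    total_steps = 3 + num_batches  # time + events + N batches + platform
    ranges = [
        (b * AGENTS_PER_BATCH, min((b + 1) * AGENTS_PER_BATCH, num_entities))
        for b in range(num_batches)
    ]
    return total_steps, ranges

print(plan_batches(47))  # (7, [(0, 15), (15, 30), (30, 45), (45, 47)])
```

The final batch is allowed to be short, which is why `end_idx` is clamped with `min()` in the real loop.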
Building Context
The context builder assembles relevant information while respecting token limits:
def _build_context(
self,
simulation_requirement: str,
document_text: str,
entities: List[EntityNode]
) -> str:
# Entity summary
entity_summary = self._summarize_entities(entities)
context_parts = [
f"## Simulation Requirement\n{simulation_requirement}",
f"\n## Entity Information ({len(entities)} entities)\n{entity_summary}",
]
# Add document text if space allows
current_length = sum(len(p) for p in context_parts)
remaining_length = self.MAX_CONTEXT_LENGTH - current_length - 500 # 500 char buffer
if remaining_length > 0 and document_text:
doc_text = document_text[:remaining_length]
if len(document_text) > remaining_length:
doc_text += "\n...(document truncated)"
context_parts.append(f"\n## Original Document\n{doc_text}")
return "\n".join(context_parts)
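The truncation budget is easiest to see with shrunken limits. This standalone sketch mirrors the method above with `MAX_CONTEXT_LENGTH` reduced from 50,000 to 200 characters:

```python
MAX_CONTEXT_LENGTH = 200  # shrunk from 50000 for illustration

def build_context(requirement: str, document_text: str, entity_summary: str) -> str:
    parts = [
        f"## Simulation Requirement\n{requirement}",
        f"\n## Entity Information\n{entity_summary}",
    ]
    # Budget left for the document after the mandatory sections plus a buffer
    remaining = MAX_CONTEXT_LENGTH - sum(len(p) for p in parts) - 50
    if remaining > 0 and document_text:
        doc = document_text[:remaining]
        if len(document_text) > remaining:
            doc += "\n...(document truncated)"
        parts.append(f"\n## Original Document\n{doc}")
    return "\n".join(parts)

ctx = build_context("test", "x" * 500, "entities here")
print("(document truncated)" in ctx)  # True
```

The requirement and entity summary always make it in; only the raw document is sacrificed when space runs out.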
Entity Summarization
Entities are summarized by type:
def _summarize_entities(self, entities: List[EntityNode]) -> str:
lines = []
# Group by type
by_type: Dict[str, List[EntityNode]] = {}
for e in entities:
t = e.get_entity_type() or "Unknown"
if t not in by_type:
by_type[t] = []
by_type[t].append(e)
for entity_type, type_entities in by_type.items():
lines.append(f"\n### {entity_type} ({len(type_entities)} entities)")
# Display limited number with limited summary length
display_count = self.ENTITIES_PER_TYPE_DISPLAY
summary_len = self.ENTITY_SUMMARY_LENGTH
for e in type_entities[:display_count]:
            summary = e.summary or ""  # guard: entities may arrive without a summary
            summary_preview = (summary[:summary_len] + "...") if len(summary) > summary_len else summary
lines.append(f"- {e.name}: {summary_preview}")
if len(type_entities) > display_count:
lines.append(f" ... and {len(type_entities) - display_count} more")
return "\n".join(lines)
This produces output like:
### Student (45 entities)
- Zhang Wei: Active in student union, frequently posts about campus events and academic pressure...
- Li Ming: Graduate student researching AI ethics, often shares technology news...
... and 43 more
### University (3 entities)
- Wuhan University: Official account, posts announcements and news...
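A self-contained re-creation with a stand-in entity class (the real code takes `EntityNode`) shows the grouping and the "... and N more" overflow line, with limits shrunk for illustration:

```python
from dataclasses import dataclass
from typing import Dict, List

ENTITIES_PER_TYPE_DISPLAY = 2  # shrunk from 20 for illustration
ENTITY_SUMMARY_LENGTH = 40     # shrunk from 300

@dataclass
class FakeEntity:  # stand-in for EntityNode
    name: str
    entity_type: str
    summary: str

def summarize(entities: List[FakeEntity]) -> str:
    by_type: Dict[str, List[FakeEntity]] = {}
    for e in entities:
        by_type.setdefault(e.entity_type, []).append(e)
    lines = []
    for etype, group in by_type.items():
        lines.append(f"\n### {etype} ({len(group)} entities)")
        for e in group[:ENTITIES_PER_TYPE_DISPLAY]:
            preview = (e.summary[:ENTITY_SUMMARY_LENGTH] + "...") if len(e.summary) > ENTITY_SUMMARY_LENGTH else e.summary
            lines.append(f"- {e.name}: {preview}")
        if len(group) > ENTITIES_PER_TYPE_DISPLAY:
            lines.append(f"  ... and {len(group) - ENTITIES_PER_TYPE_DISPLAY} more")
    return "\n".join(lines)

students = [FakeEntity(f"Student {i}", "Student", "Posts about campus life") for i in range(5)]
print(summarize(students))
```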
Time Configuration Generation
The time config determines simulation duration and activity patterns:
def _generate_time_config(self, context: str, num_entities: int) -> Dict[str, Any]:
# Truncate context for this specific step
context_truncated = context[:self.TIME_CONFIG_CONTEXT_LENGTH]
# Calculate max allowed value (90% of agent count)
max_agents_allowed = max(1, int(num_entities * 0.9))
prompt = f"""Based on the following simulation requirements, generate time configuration.
{context_truncated}
## Task
Generate time configuration JSON.
### Basic Principles (adjust based on event type and participant groups):
- User base is Chinese, must follow Beijing timezone habits
- 0-5 AM: Almost no activity (coefficient 0.05)
- 6-8 AM: Gradually waking up (coefficient 0.4)
- 9-18: Work hours, moderate activity (coefficient 0.7)
- 19-22: Evening peak, most active (coefficient 1.5)
- 23: Activity declining (coefficient 0.5)
### Return JSON format (no markdown):
Example:
{{
"total_simulation_hours": 72,
"minutes_per_round": 60,
"agents_per_hour_min": 5,
"agents_per_hour_max": 50,
"peak_hours": [19, 20, 21, 22],
"off_peak_hours": [0, 1, 2, 3, 4, 5],
"morning_hours": [6, 7, 8],
"work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
"reasoning": "Time configuration explanation"
}}
Field descriptions:
- total_simulation_hours (int): 24-168 hours, shorter for breaking news, longer for ongoing topics
- minutes_per_round (int): 30-120 minutes, recommend 60
- agents_per_hour_min (int): Range 1-{max_agents_allowed}
- agents_per_hour_max (int): Range 1-{max_agents_allowed}
- peak_hours (int array): Adjust based on participant groups
- off_peak_hours (int array): Usually late night/early morning
- morning_hours (int array): Morning hours
- work_hours (int array): Work hours
- reasoning (string): Brief explanation"""
system_prompt = "You are a social media simulation expert. Return pure JSON format."
try:
return self._call_llm_with_retry(prompt, system_prompt)
except Exception as e:
logger.warning(f"Time config LLM generation failed: {e}, using default")
return self._get_default_time_config(num_entities)
Parsing and Validating Time Config
def _parse_time_config(self, result: Dict[str, Any], num_entities: int) -> TimeSimulationConfig:
# Get raw values
agents_per_hour_min = result.get("agents_per_hour_min", max(1, num_entities // 15))
agents_per_hour_max = result.get("agents_per_hour_max", max(5, num_entities // 5))
# Validate and correct: ensure not exceeding total agent count
if agents_per_hour_min > num_entities:
logger.warning(f"agents_per_hour_min ({agents_per_hour_min}) exceeds total agents ({num_entities}), corrected")
agents_per_hour_min = max(1, num_entities // 10)
if agents_per_hour_max > num_entities:
logger.warning(f"agents_per_hour_max ({agents_per_hour_max}) exceeds total agents ({num_entities}), corrected")
agents_per_hour_max = max(agents_per_hour_min + 1, num_entities // 2)
# Ensure min < max
if agents_per_hour_min >= agents_per_hour_max:
agents_per_hour_min = max(1, agents_per_hour_max // 2)
logger.warning(f"agents_per_hour_min >= max, corrected to {agents_per_hour_min}")
return TimeSimulationConfig(
total_simulation_hours=result.get("total_simulation_hours", 72),
minutes_per_round=result.get("minutes_per_round", 60),
agents_per_hour_min=agents_per_hour_min,
agents_per_hour_max=agents_per_hour_max,
        peak_hours=result.get("peak_hours", [19, 20, 21, 22]),
        off_peak_hours=result.get("off_peak_hours", [0, 1, 2, 3, 4, 5]),
        morning_hours=result.get("morning_hours", [6, 7, 8]),
        work_hours=result.get("work_hours", list(range(9, 19))),
off_peak_activity_multiplier=0.05,
morning_activity_multiplier=0.4,
work_activity_multiplier=0.7,
peak_activity_multiplier=1.5
)
Default Time Config (Chinese Timezone)
def _get_default_time_config(self, num_entities: int) -> Dict[str, Any]:
return {
"total_simulation_hours": 72,
"minutes_per_round": 60, # 1 hour per round
"agents_per_hour_min": max(1, num_entities // 15),
"agents_per_hour_max": max(5, num_entities // 5),
"peak_hours": [19, 20, 21, 22],
"off_peak_hours": [0, 1, 2, 3, 4, 5],
"morning_hours": [6, 7, 8],
"work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
"reasoning": "Using default Chinese timezone configuration"
}
Event Configuration Generation
Event config defines initial posts, hot topics, and narrative direction:
def _generate_event_config(
self,
context: str,
simulation_requirement: str,
entities: List[EntityNode]
) -> Dict[str, Any]:
# Get available entity types for LLM reference
entity_types_available = list(set(
e.get_entity_type() or "Unknown" for e in entities
))
# Show examples per type
type_examples = {}
for e in entities:
etype = e.get_entity_type() or "Unknown"
if etype not in type_examples:
type_examples[etype] = []
if len(type_examples[etype]) < 3:
type_examples[etype].append(e.name)
type_info = "\n".join([
f"- {t}: {', '.join(examples)}"
for t, examples in type_examples.items()
])
context_truncated = context[:self.EVENT_CONFIG_CONTEXT_LENGTH]
prompt = f"""Based on the following simulation requirements, generate event configuration.
Simulation Requirement: {simulation_requirement}
{context_truncated}
## Available Entity Types and Examples
{type_info}
## Task
Generate event configuration JSON:
- Extract hot topic keywords
- Describe narrative direction
- Design initial posts, **each post must specify poster_type**
**Important**: poster_type must be selected from "Available Entity Types" above, so initial posts can be assigned to appropriate agents.
For example: Official statements should be posted by Official/University types, news by MediaOutlet, student opinions by Student.
Return JSON format (no markdown):
{{
"hot_topics": ["keyword1", "keyword2", ...],
"narrative_direction": "<narrative direction description>",
"initial_posts": [
{{"content": "Post content", "poster_type": "Entity Type (must match available types)"}},
...
],
"reasoning": "<brief explanation>"
}}"""
system_prompt = "You are an opinion analysis expert. Return pure JSON format."
try:
return self._call_llm_with_retry(prompt, system_prompt)
except Exception as e:
logger.warning(f"Event config LLM generation failed: {e}, using default")
return {
"hot_topics": [],
"narrative_direction": "",
"initial_posts": [],
"reasoning": "Using default configuration"
}
Assigning Initial Post Publishers
After generating initial posts, match them to actual agents:
def _assign_initial_post_agents(
self,
event_config: EventConfig,
agent_configs: List[AgentActivityConfig]
) -> EventConfig:
if not event_config.initial_posts:
return event_config
# Index agents by type
agents_by_type: Dict[str, List[AgentActivityConfig]] = {}
for agent in agent_configs:
etype = agent.entity_type.lower()
if etype not in agents_by_type:
agents_by_type[etype] = []
agents_by_type[etype].append(agent)
# Type alias mapping (handles LLM variations)
type_aliases = {
"official": ["official", "university", "governmentagency", "government"],
"university": ["university", "official"],
"mediaoutlet": ["mediaoutlet", "media"],
"student": ["student", "person"],
"professor": ["professor", "expert", "teacher"],
"alumni": ["alumni", "person"],
"organization": ["organization", "ngo", "company", "group"],
"person": ["person", "student", "alumni"],
}
# Track used indices to avoid reusing same agent
used_indices: Dict[str, int] = {}
updated_posts = []
for post in event_config.initial_posts:
poster_type = post.get("poster_type", "").lower()
content = post.get("content", "")
matched_agent_id = None
# 1. Direct match
if poster_type in agents_by_type:
agents = agents_by_type[poster_type]
idx = used_indices.get(poster_type, 0) % len(agents)
matched_agent_id = agents[idx].agent_id
used_indices[poster_type] = idx + 1
else:
# 2. Alias match
for alias_key, aliases in type_aliases.items():
if poster_type in aliases or alias_key == poster_type:
for alias in aliases:
if alias in agents_by_type:
agents = agents_by_type[alias]
idx = used_indices.get(alias, 0) % len(agents)
matched_agent_id = agents[idx].agent_id
used_indices[alias] = idx + 1
break
if matched_agent_id is not None:
break
# 3. Fallback: use highest influence agent
if matched_agent_id is None:
logger.warning(f"No matching agent for type '{poster_type}', using highest influence agent")
if agent_configs:
sorted_agents = sorted(agent_configs, key=lambda a: a.influence_weight, reverse=True)
matched_agent_id = sorted_agents[0].agent_id
else:
matched_agent_id = 0
updated_posts.append({
"content": content,
"poster_type": post.get("poster_type", "Unknown"),
"poster_agent_id": matched_agent_id
})
logger.info(f"Initial post assignment: poster_type='{poster_type}' -> agent_id={matched_agent_id}")
event_config.initial_posts = updated_posts
return event_config
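The three-tier matching (direct, alias, highest-influence fallback) plus the round-robin can be checked with hypothetical stand-in agents, here reduced to `(agent_id, entity_type, influence_weight)` tuples and an excerpt of the alias table:

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for AgentActivityConfig
agents: List[Tuple[int, str, float]] = [
    (0, "student", 0.8),
    (1, "student", 0.9),
    (2, "university", 3.0),
]

agents_by_type: Dict[str, List[Tuple[int, str, float]]] = {}
for a in agents:
    agents_by_type.setdefault(a[1], []).append(a)

type_aliases = {"official": ["official", "university"]}  # excerpt of the full table
used: Dict[str, int] = {}

def match(poster_type: str) -> int:
    poster_type = poster_type.lower()
    # 1. Direct match, round-robin within the type
    if poster_type in agents_by_type:
        group = agents_by_type[poster_type]
        idx = used.get(poster_type, 0) % len(group)
        used[poster_type] = idx + 1
        return group[idx][0]
    # 2. Alias match
    for key, aliases in type_aliases.items():
        if poster_type == key or poster_type in aliases:
            for alias in aliases:
                if alias in agents_by_type:
                    group = agents_by_type[alias]
                    idx = used.get(alias, 0) % len(group)
                    used[alias] = idx + 1
                    return group[idx][0]
    # 3. Fallback: highest influence agent
    return max(agents, key=lambda a: a[2])[0]

results = [match("Student"), match("Student"), match("Official"), match("MediaOutlet")]
print(results)  # [0, 1, 2, 2]
```

Two "Student" posts land on different students, "Official" resolves through the alias to the university account, and the unknown "MediaOutlet" falls back to the most influential agent.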
Batch Agent Configuration Generation
Generating configs for hundreds of agents at once would exceed token limits. The system processes in batches of 15:
def _generate_agent_configs_batch(
self,
context: str,
entities: List[EntityNode],
start_idx: int,
simulation_requirement: str
) -> List[AgentActivityConfig]:
# Build entity info with limited summary length
entity_list = []
summary_len = self.AGENT_SUMMARY_LENGTH
for i, e in enumerate(entities):
entity_list.append({
"agent_id": start_idx + i,
"entity_name": e.name,
"entity_type": e.get_entity_type() or "Unknown",
"summary": e.summary[:summary_len] if e.summary else ""
})
prompt = f"""Based on the following information, generate social media activity configuration for each entity.
Simulation Requirement: {simulation_requirement}
## Entity List
```json
{json.dumps(entity_list, ensure_ascii=False, indent=2)}
```

## Task
Generate activity configuration for each entity. Note:
- Time must follow Chinese habits: 0-5 almost no activity, 19-22 most active
- Official institutions (University/GovernmentAgency): Low activity (0.1-0.3), work hours (9-17), slow response (60-240 min), high influence (2.5-3.0)
- Media (MediaOutlet): Moderate activity (0.4-0.6), all-day activity (8-23), fast response (5-30 min), high influence (2.0-2.5)
- Individuals (Student/Person/Alumni): High activity (0.6-0.9), mainly evening (18-23), fast response (1-15 min), low influence (0.8-1.2)
- Public figures/Experts: Moderate activity (0.4-0.6), medium-high influence (1.5-2.0)
Return JSON format (no markdown):
{{"agent_configs": [{{"agent_id": <int>, "activity_level": <float>, "posts_per_hour": <float>, "comments_per_hour": <float>, "active_hours": [<int>, ...], "response_delay_min": <int>, "response_delay_max": <int>, "sentiment_bias": <float>, "stance": "<supportive|opposing|neutral|observer>", "influence_weight": <float>}}, ...]}}"""
system_prompt = "You are a social media behavior analysis expert. Return pure JSON format."
try:
result = self._call_llm_with_retry(prompt, system_prompt)
llm_configs = {cfg["agent_id"]: cfg for cfg in result.get("agent_configs", [])}
except Exception as e:
logger.warning(f"Agent config batch LLM generation failed: {e}, using rule-based generation")
llm_configs = {}
# Build AgentActivityConfig objects
configs = []
for i, entity in enumerate(entities):
agent_id = start_idx + i
cfg = llm_configs.get(agent_id, {})
# Use rule-based fallback if LLM failed
if not cfg:
cfg = self._generate_agent_config_by_rule(entity)
config = AgentActivityConfig(
agent_id=agent_id,
entity_uuid=entity.uuid,
entity_name=entity.name,
entity_type=entity.get_entity_type() or "Unknown",
activity_level=cfg.get("activity_level", 0.5),
posts_per_hour=cfg.get("posts_per_hour", 0.5),
comments_per_hour=cfg.get("comments_per_hour", 1.0),
active_hours=cfg.get("active_hours", list(range(9, 23))),
response_delay_min=cfg.get("response_delay_min", 5),
response_delay_max=cfg.get("response_delay_max", 60),
sentiment_bias=cfg.get("sentiment_bias", 0.0),
stance=cfg.get("stance", "neutral"),
influence_weight=cfg.get("influence_weight", 1.0)
)
configs.append(config)
return configs
Rule-Based Fallback Configs
When LLM fails, use predefined patterns:
def _generate_agent_config_by_rule(self, entity: EntityNode) -> Dict[str, Any]:
entity_type = (entity.get_entity_type() or "Unknown").lower()
if entity_type in ["university", "governmentagency", "ngo"]:
# Official institution: work hours, low frequency, high influence
return {
"activity_level": 0.2,
"posts_per_hour": 0.1,
"comments_per_hour": 0.05,
"active_hours": list(range(9, 18)), # 9:00-17:59
"response_delay_min": 60,
"response_delay_max": 240,
"sentiment_bias": 0.0,
"stance": "neutral",
"influence_weight": 3.0
}
elif entity_type in ["mediaoutlet"]:
# Media: all-day activity, moderate frequency, high influence
return {
"activity_level": 0.5,
"posts_per_hour": 0.8,
"comments_per_hour": 0.3,
"active_hours": list(range(7, 24)), # 7:00-23:59
"response_delay_min": 5,
"response_delay_max": 30,
"sentiment_bias": 0.0,
"stance": "observer",
"influence_weight": 2.5
}
elif entity_type in ["professor", "expert", "official"]:
# Expert/Professor: work + evening, moderate frequency
return {
"activity_level": 0.4,
"posts_per_hour": 0.3,
"comments_per_hour": 0.5,
"active_hours": list(range(8, 22)), # 8:00-21:59
"response_delay_min": 15,
"response_delay_max": 90,
"sentiment_bias": 0.0,
"stance": "neutral",
"influence_weight": 2.0
}
elif entity_type in ["student"]:
# Student: evening peak, high frequency
return {
"activity_level": 0.8,
"posts_per_hour": 0.6,
"comments_per_hour": 1.5,
"active_hours": [8, 9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23],
"response_delay_min": 1,
"response_delay_max": 15,
"sentiment_bias": 0.0,
"stance": "neutral",
"influence_weight": 0.8
}
elif entity_type in ["alumni"]:
# Alumni: evening focused
return {
"activity_level": 0.6,
"posts_per_hour": 0.4,
"comments_per_hour": 0.8,
"active_hours": [12, 13, 19, 20, 21, 22, 23], # Lunch + evening
"response_delay_min": 5,
"response_delay_max": 30,
"sentiment_bias": 0.0,
"stance": "neutral",
"influence_weight": 1.0
}
else:
# Default person: evening peak
return {
"activity_level": 0.7,
"posts_per_hour": 0.5,
"comments_per_hour": 1.2,
"active_hours": [9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23],
"response_delay_min": 2,
"response_delay_max": 20,
"sentiment_bias": 0.0,
"stance": "neutral",
"influence_weight": 1.0
}
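A condensed re-creation of two of these branches illustrates the dispatch on entity type (the real function has six branches; anything unmatched gets the default-person pattern):

```python
def agent_config_by_rule(entity_type: str) -> dict:
    """Condensed sketch of two branches from the rule table above."""
    entity_type = entity_type.lower()
    if entity_type in ["university", "governmentagency", "ngo"]:
        # Official institution: low frequency, high influence
        return {"activity_level": 0.2, "active_hours": list(range(9, 18)),
                "influence_weight": 3.0, "stance": "neutral"}
    if entity_type == "student":
        # Student: evening peak, high frequency, low influence
        return {"activity_level": 0.8,
                "active_hours": [8, 9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23],
                "influence_weight": 0.8, "stance": "neutral"}
    # Everything else falls through to the default-person pattern
    return {"activity_level": 0.7, "influence_weight": 1.0, "stance": "neutral"}

print(agent_config_by_rule("University")["influence_weight"])  # 3.0
```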
LLM Call with Retry and JSON Repair
LLM calls fail. Outputs get truncated. JSON breaks. The system handles all of this:
def _call_llm_with_retry(self, prompt: str, system_prompt: str) -> Dict[str, Any]:
max_attempts = 3
last_error = None
for attempt in range(max_attempts):
try:
response = self.client.chat.completions.create(
model=self.model_name,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"},
temperature=0.7 - (attempt * 0.1) # Lower temp on retry
)
content = response.choices[0].message.content
finish_reason = response.choices[0].finish_reason
# Check if truncated
if finish_reason == 'length':
logger.warning(f"LLM output truncated (attempt {attempt+1})")
content = self._fix_truncated_json(content)
# Try parsing JSON
try:
return json.loads(content)
except json.JSONDecodeError as e:
logger.warning(f"JSON parse failed (attempt {attempt+1}): {str(e)[:80]}")
# Try repairing JSON
fixed = self._try_fix_config_json(content)
if fixed:
return fixed
last_error = e
except Exception as e:
logger.warning(f"LLM call failed (attempt {attempt+1}): {str(e)[:80]}")
last_error = e
import time
time.sleep(2 * (attempt + 1))
raise last_error or Exception("LLM call failed")
Fixing Truncated JSON
def _fix_truncated_json(self, content: str) -> str:
content = content.strip()
# Count unclosed brackets
open_braces = content.count('{') - content.count('}')
open_brackets = content.count('[') - content.count(']')
# Check for unclosed string
if content and content[-1] not in '",}]':
content += '"'
# Close brackets
content += ']' * open_brackets
content += '}' * open_braces
return content
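Feeding the bracket-balancing heuristic a mid-string truncation shows why the closing quote is appended before the brackets:

```python
import json

def fix_truncated_json(content: str) -> str:
    # Same heuristic as above: close the dangling string first, then the brackets
    content = content.strip()
    open_braces = content.count('{') - content.count('}')
    open_brackets = content.count('[') - content.count(']')
    if content and content[-1] not in '",}]':
        content += '"'
    content += ']' * open_brackets
    content += '}' * open_braces
    return content

truncated = '{"hot_topics": ["exam reform", "campus poli'
print(fix_truncated_json(truncated))  # {"hot_topics": ["exam reform", "campus poli"]}
```

The repaired output parses cleanly; the last topic is cut short, but a partial list beats a total parse failure.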
Advanced JSON Repair
def _try_fix_config_json(self, content: str) -> Optional[Dict[str, Any]]:
import re
# Fix truncation
content = self._fix_truncated_json(content)
# Extract JSON portion
json_match = re.search(r'\{[\s\S]*\}', content)
if json_match:
json_str = json_match.group()
# Remove newlines in strings
def fix_string(match):
s = match.group(0)
s = s.replace('\n', ' ').replace('\r', ' ')
s = re.sub(r'\s+', ' ', s)
return s
json_str = re.sub(r'"[^"\\]*(?:\\.[^"\\]*)*"', fix_string, json_str)
try:
return json.loads(json_str)
        except json.JSONDecodeError:
# Try removing control characters
json_str = re.sub(r'[\x00-\x1f\x7f-\x9f]', ' ', json_str)
json_str = re.sub(r'\s+', ' ', json_str)
try:
return json.loads(json_str)
            except json.JSONDecodeError:
pass
return None
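The string-literal regex is the workhorse here. Isolated, it repairs raw newlines inside strings that `json.loads` would otherwise reject:

```python
import json
import re

def fix_strings_with_newlines(json_str: str) -> str:
    # Same string-literal regex as above (tolerates escaped characters)
    def fix_string(match):
        s = match.group(0).replace('\n', ' ').replace('\r', ' ')
        return re.sub(r'\s+', ' ', s)
    return re.sub(r'"[^"\\]*(?:\\.[^"\\]*)*"', fix_string, json_str)

broken = '{"reasoning": "line one\nline two"}'
print(json.loads(fix_strings_with_newlines(broken)))  # {'reasoning': 'line one line two'}
```

`json.loads(broken)` would raise "Invalid control character"; after the rewrite the newline becomes an ordinary space.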
Configuration Data Structures
Agent Activity Config
@dataclass
class AgentActivityConfig:
"""Single agent activity configuration"""
agent_id: int
entity_uuid: str
entity_name: str
entity_type: str
# Activity level (0.0-1.0)
activity_level: float = 0.5
# Posting frequency (per hour)
posts_per_hour: float = 1.0
comments_per_hour: float = 2.0
# Active hours (24-hour format, 0-23)
active_hours: List[int] = field(default_factory=lambda: list(range(8, 23)))
# Response speed (reaction delay in simulated minutes)
response_delay_min: int = 5
response_delay_max: int = 60
# Sentiment tendency (-1.0 to 1.0, negative to positive)
sentiment_bias: float = 0.0
# Stance on specific topics
stance: str = "neutral" # supportive, opposing, neutral, observer
# Influence weight (affects probability of being seen)
influence_weight: float = 1.0
Time Simulation Config
@dataclass
class TimeSimulationConfig:
"""Time simulation configuration (Chinese timezone)"""
total_simulation_hours: int = 72 # Default 72 hours (3 days)
minutes_per_round: int = 60 # 60 minutes per round
# Agents activated per hour
agents_per_hour_min: int = 5
agents_per_hour_max: int = 20
# Peak hours (evening 19-22, Chinese most active)
peak_hours: List[int] = field(default_factory=lambda: [19, 20, 21, 22])
peak_activity_multiplier: float = 1.5
# Off-peak hours (early morning 0-5, almost no activity)
off_peak_hours: List[int] = field(default_factory=lambda: [0, 1, 2, 3, 4, 5])
off_peak_activity_multiplier: float = 0.05
# Morning hours
morning_hours: List[int] = field(default_factory=lambda: [6, 7, 8])
morning_activity_multiplier: float = 0.4
# Work hours
work_hours: List[int] = field(default_factory=lambda: [9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
work_activity_multiplier: float = 0.7
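The dataclass only stores the hour buckets; a hypothetical consumer (not part of MiroFish's code) might map a simulated hour to its multiplier like this:

```python
from types import SimpleNamespace

# Stand-in for a TimeSimulationConfig instance with its default buckets
cfg = SimpleNamespace(
    peak_hours=[19, 20, 21, 22], peak_activity_multiplier=1.5,
    off_peak_hours=[0, 1, 2, 3, 4, 5], off_peak_activity_multiplier=0.05,
    morning_hours=[6, 7, 8], morning_activity_multiplier=0.4,
    work_activity_multiplier=0.7,  # work_hours cover the remaining daytime slots
)

def activity_multiplier(cfg, hour: int) -> float:
    """Hypothetical helper: resolve an hour to its activity multiplier."""
    if hour in cfg.peak_hours:
        return cfg.peak_activity_multiplier
    if hour in cfg.off_peak_hours:
        return cfg.off_peak_activity_multiplier
    if hour in cfg.morning_hours:
        return cfg.morning_activity_multiplier
    return cfg.work_activity_multiplier

print(activity_multiplier(cfg, 20))  # 1.5
```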
Complete Simulation Parameters
@dataclass
class SimulationParameters:
"""Complete simulation parameter configuration"""
simulation_id: str
project_id: str
graph_id: str
simulation_requirement: str
time_config: TimeSimulationConfig = field(default_factory=TimeSimulationConfig)
agent_configs: List[AgentActivityConfig] = field(default_factory=list)
event_config: EventConfig = field(default_factory=EventConfig)
twitter_config: Optional[PlatformConfig] = None
reddit_config: Optional[PlatformConfig] = None
llm_model: str = ""
llm_base_url: str = ""
generated_at: str = field(default_factory=lambda: datetime.now().isoformat())
generation_reasoning: str = ""
def to_dict(self) -> Dict[str, Any]:
time_dict = asdict(self.time_config)
return {
"simulation_id": self.simulation_id,
"project_id": self.project_id,
"graph_id": self.graph_id,
"simulation_requirement": self.simulation_requirement,
"time_config": time_dict,
"agent_configs": [asdict(a) for a in self.agent_configs],
"event_config": asdict(self.event_config),
"twitter_config": asdict(self.twitter_config) if self.twitter_config else None,
"reddit_config": asdict(self.reddit_config) if self.reddit_config else None,
"llm_model": self.llm_model,
"llm_base_url": self.llm_base_url,
"generated_at": self.generated_at,
"generation_reasoning": self.generation_reasoning,
}
Summary Table: Agent Type Patterns
| Agent Type | Activity | Active Hours | Posts/Hour | Comments/Hour | Response (min) | Influence |
|---|---|---|---|---|---|---|
| University | 0.2 | 9-17 | 0.1 | 0.05 | 60-240 | 3.0 |
| GovernmentAgency | 0.2 | 9-17 | 0.1 | 0.05 | 60-240 | 3.0 |
| MediaOutlet | 0.5 | 7-23 | 0.8 | 0.3 | 5-30 | 2.5 |
| Professor | 0.4 | 8-21 | 0.3 | 0.5 | 15-90 | 2.0 |
| Student | 0.8 | 8-12, 18-23 | 0.6 | 1.5 | 1-15 | 0.8 |
| Alumni | 0.6 | 12-13, 19-23 | 0.4 | 0.8 | 5-30 | 1.0 |
| Person (default) | 0.7 | 9-13, 18-23 | 0.5 | 1.2 | 2-20 | 1.0 |
Conclusion
LLM-powered configuration generation requires careful handling of:
- Step-by-step generation: Break into manageable stages (time → events → agents → platforms)
- Batch processing: Process 15 agents per batch to avoid context limits
- JSON repair: Handle truncation with bracket matching and string escaping
- Rule-based fallbacks: Provide sensible defaults when LLM fails
- Type-specific patterns: Different agent types have different activity patterns
- Validation and correction: Check generated values and fix issues (e.g., agents_per_hour > total_agents)