Introduction
Configuring hundreds of AI agents for a social media simulation sounds daunting. Each agent needs activity schedules, posting frequencies, response delays, influence weights, and stance positions. Doing this manually would take hours.
MiroFish automates this with LLM-powered configuration generation. The system analyzes your documents, knowledge graph, and simulation requirements, then generates detailed configs for every agent.
The challenge: LLMs can fail. Outputs get truncated. JSON breaks. Token limits bite.
This guide covers the complete implementation:
- Step-by-step generation (time → events → agents → platforms)
- Batch processing to avoid context limits
- JSON repair strategies for truncated outputs
- Rule-based fallback configs when LLM fails
- Agent activity patterns by type (Student vs Official vs Media)
- Validation and correction logic
All code comes from production use in MiroFish.
Architecture Overview
The config generator uses a pipelined approach:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Context │ ──► │ Time Config │ ──► │ Event Config │
│ Builder │ │ Generator │ │ Generator │
│ │ │ │ │ │
│ - Simulation │ │ - Total hours │ │ - Initial posts │
│ requirement │ │ - Minutes/round │ │ - Hot topics │
│ - Entity summary│ │ - Peak hours │ │ - Narrative │
│ - Document text │ │ - Activity mult │ │ direction │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│
▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Final Config │ ◄── │ Platform │ ◄── │ Agent Config │
│ Assembly │ │ Config │ │ Batches │
│ │ │ │ │ │
│ - Merge all │ │ - Twitter params│ │ - 15 agents │
│ - Validate │ │ - Reddit params │ │ per batch │
│ - Save JSON │ │ - Viral threshold│ │ - N batches │
└─────────────────┘ └─────────────────┘ └─────────────────┘
File Structure
backend/app/services/
├── simulation_config_generator.py # Main config generation logic
├── ontology_generator.py # Ontology generation (shared)
└── zep_entity_reader.py # Entity filtering
backend/app/models/
├── task.py # Task tracking
└── project.py # Project state
Step-by-Step Generation Strategy
Generating all configs at once would exceed token limits. Instead, the system generates in stages:
class SimulationConfigGenerator:
# Each batch generates configs for 15 agents
AGENTS_PER_BATCH = 15
# Context limits
MAX_CONTEXT_LENGTH = 50000
TIME_CONFIG_CONTEXT_LENGTH = 10000
EVENT_CONFIG_CONTEXT_LENGTH = 8000
ENTITY_SUMMARY_LENGTH = 300
AGENT_SUMMARY_LENGTH = 300
ENTITIES_PER_TYPE_DISPLAY = 20
def generate_config(
self,
simulation_id: str,
project_id: str,
graph_id: str,
simulation_requirement: str,
document_text: str,
entities: List[EntityNode],
enable_twitter: bool = True,
enable_reddit: bool = True,
progress_callback: Optional[Callable[[int, int, str], None]] = None,
) -> SimulationParameters:
# Calculate total steps
num_batches = math.ceil(len(entities) / self.AGENTS_PER_BATCH)
total_steps = 3 + num_batches # Time + Events + N Agent Batches + Platform
current_step = 0
def report_progress(step: int, message: str):
nonlocal current_step
current_step = step
if progress_callback:
progress_callback(step, total_steps, message)
logger.info(f"[{step}/{total_steps}] {message}")
# Build context
context = self._build_context(
simulation_requirement=simulation_requirement,
document_text=document_text,
entities=entities
)
reasoning_parts = []
# Step 1: Generate time config
report_progress(1, "Generating time configuration...")
time_config_result = self._generate_time_config(context, len(entities))
time_config = self._parse_time_config(time_config_result, len(entities))
reasoning_parts.append(f"Time config: {time_config_result.get('reasoning', 'Success')}")
# Step 2: Generate event config
report_progress(2, "Generating event config and hot topics...")
event_config_result = self._generate_event_config(context, simulation_requirement, entities)
event_config = self._parse_event_config(event_config_result)
reasoning_parts.append(f"Event config: {event_config_result.get('reasoning', 'Success')}")
# Steps 3-N: Generate agent configs in batches
all_agent_configs = []
for batch_idx in range(num_batches):
start_idx = batch_idx * self.AGENTS_PER_BATCH
end_idx = min(start_idx + self.AGENTS_PER_BATCH, len(entities))
batch_entities = entities[start_idx:end_idx]
report_progress(
3 + batch_idx,
f"Generating agent config ({start_idx + 1}-{end_idx}/{len(entities)})..."
)
batch_configs = self._generate_agent_configs_batch(
context=context,
entities=batch_entities,
start_idx=start_idx,
simulation_requirement=simulation_requirement
)
all_agent_configs.extend(batch_configs)
reasoning_parts.append(f"Agent config: Generated {len(all_agent_configs)} agents")
# Assign initial post publishers
event_config = self._assign_initial_post_agents(event_config, all_agent_configs)
# Final step: Platform config
report_progress(total_steps, "Generating platform configuration...")
twitter_config = PlatformConfig(platform="twitter", ...) if enable_twitter else None
reddit_config = PlatformConfig(platform="reddit", ...) if enable_reddit else None
# Assemble final config
params = SimulationParameters(
simulation_id=simulation_id,
project_id=project_id,
graph_id=graph_id,
simulation_requirement=simulation_requirement,
time_config=time_config,
agent_configs=all_agent_configs,
event_config=event_config,
twitter_config=twitter_config,
reddit_config=reddit_config,
generation_reasoning=" | ".join(reasoning_parts)
)
return params
This staged approach:
- Keeps each LLM call focused and manageable
- Provides progress updates to the user
- Allows partial recovery if one stage fails
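To see how the step count and batch boundaries fall out for a given entity count, the arithmetic can be reproduced in isolation (a sketch, not part of the class):

```python
import math

# Standalone sketch of the batching arithmetic; AGENTS_PER_BATCH and the
# three fixed stages (time, events, platform) come from the class above.
AGENTS_PER_BATCH = 15

def plan_batches(num_entities: int):
    num_batches = math.ceil(num_entities / AGENTS_PER_BATCH)
    total_steps = 3 + num_batches  # time + events + N batches + platform
    ranges = [
        (b * AGENTS_PER_BATCH, min((b + 1) * AGENTS_PER_BATCH, num_entities))
        for b in range(num_batches)
    ]
    return total_steps, ranges

print(plan_batches(47))  # (7, [(0, 15), (15, 30), (30, 45), (45, 47)])
```

The final batch is allowed to be short, which is why `end_idx` is clamped with `min()` in the real loop.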
Building Context
The context builder assembles relevant information while respecting token limits:
def _build_context(
self,
simulation_requirement: str,
document_text: str,
entities: List[EntityNode]
) -> str:
# Entity summary
entity_summary = self._summarize_entities(entities)
context_parts = [
f"## Simulation Requirement\n{simulation_requirement}",
f"\n## Entity Information ({len(entities)} entities)\n{entity_summary}",
]
# Add document text if space allows
current_length = sum(len(p) for p in context_parts)
remaining_length = self.MAX_CONTEXT_LENGTH - current_length - 500 # 500 char buffer
if remaining_length > 0 and document_text:
doc_text = document_text[:remaining_length]
if len(document_text) > remaining_length:
doc_text += "\n...(document truncated)"
context_parts.append(f"\n## Original Document\n{doc_text}")
return "\n".join(context_parts)
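The truncation budget is easiest to see with shrunken limits. This standalone sketch mirrors the method above with `MAX_CONTEXT_LENGTH` reduced from 50,000 to 200 characters:

```python
MAX_CONTEXT_LENGTH = 200  # shrunk from 50000 for illustration

def build_context(requirement: str, document_text: str, entity_summary: str) -> str:
    parts = [
        f"## Simulation Requirement\n{requirement}",
        f"\n## Entity Information\n{entity_summary}",
    ]
    # Budget left for the document after the mandatory sections plus a buffer
    remaining = MAX_CONTEXT_LENGTH - sum(len(p) for p in parts) - 50
    if remaining > 0 and document_text:
        doc = document_text[:remaining]
        if len(document_text) > remaining:
            doc += "\n...(document truncated)"
        parts.append(f"\n## Original Document\n{doc}")
    return "\n".join(parts)

ctx = build_context("test", "x" * 500, "entities here")
print("(document truncated)" in ctx)  # True
```

The requirement and entity summary always make it in; only the raw document is sacrificed when space runs out.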
Entity Summarization
Entities are summarized by type:
def _summarize_entities(self, entities: List[EntityNode]) -> str:
lines = []
# Group by type
by_type: Dict[str, List[EntityNode]] = {}
for e in entities:
t = e.get_entity_type() or "Unknown"
if t not in by_type:
by_type[t] = []
by_type[t].append(e)
for entity_type, type_entities in by_type.items():
lines.append(f"\n### {entity_type} ({len(type_entities)} entities)")
# Display limited number with limited summary length
display_count = self.ENTITIES_PER_TYPE_DISPLAY
summary_len = self.ENTITY_SUMMARY_LENGTH
for e in type_entities[:display_count]:
            summary = e.summary or ""  # guard: entities may arrive without a summary
            summary_preview = (summary[:summary_len] + "...") if len(summary) > summary_len else summary
lines.append(f"- {e.name}: {summary_preview}")
if len(type_entities) > display_count:
lines.append(f" ... and {len(type_entities) - display_count} more")
return "\n".join(lines)
This produces output like:
### Student (45 entities)
- Zhang Wei: Active in student union, frequently posts about campus events and academic pressure...
- Li Ming: Graduate student researching AI ethics, often shares technology news...
... and 43 more
### University (3 entities)
- Wuhan University: Official account, posts announcements and news...
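A self-contained re-creation with a stand-in entity class (the real code takes `EntityNode`) shows the grouping and the "... and N more" overflow line, with limits shrunk for illustration:

```python
from dataclasses import dataclass
from typing import Dict, List

ENTITIES_PER_TYPE_DISPLAY = 2  # shrunk from 20 for illustration
ENTITY_SUMMARY_LENGTH = 40     # shrunk from 300

@dataclass
class FakeEntity:  # stand-in for EntityNode
    name: str
    entity_type: str
    summary: str

def summarize(entities: List[FakeEntity]) -> str:
    by_type: Dict[str, List[FakeEntity]] = {}
    for e in entities:
        by_type.setdefault(e.entity_type, []).append(e)
    lines = []
    for etype, group in by_type.items():
        lines.append(f"\n### {etype} ({len(group)} entities)")
        for e in group[:ENTITIES_PER_TYPE_DISPLAY]:
            preview = (e.summary[:ENTITY_SUMMARY_LENGTH] + "...") if len(e.summary) > ENTITY_SUMMARY_LENGTH else e.summary
            lines.append(f"- {e.name}: {preview}")
        if len(group) > ENTITIES_PER_TYPE_DISPLAY:
            lines.append(f"  ... and {len(group) - ENTITIES_PER_TYPE_DISPLAY} more")
    return "\n".join(lines)

students = [FakeEntity(f"Student {i}", "Student", "Posts about campus life") for i in range(5)]
print(summarize(students))
```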
Time Configuration Generation
The time config determines simulation duration and activity patterns:
def _generate_time_config(self, context: str, num_entities: int) -> Dict[str, Any]:
# Truncate context for this specific step
context_truncated = context[:self.TIME_CONFIG_CONTEXT_LENGTH]
# Calculate max allowed value (90% of agent count)
max_agents_allowed = max(1, int(num_entities * 0.9))
prompt = f"""Based on the following simulation requirements, generate time configuration.
{context_truncated}
## Task
Generate time configuration JSON.
### Basic Principles (adjust based on event type and participant groups):
- User base is Chinese, must follow Beijing timezone habits
- 0-5 AM: Almost no activity (coefficient 0.05)
- 6-8 AM: Gradually waking up (coefficient 0.4)
- 9-18: Work hours, moderate activity (coefficient 0.7)
- 19-22: Evening peak, most active (coefficient 1.5)
- 23: Activity declining (coefficient 0.5)
### Return JSON format (no markdown):
Example:
{{
"total_simulation_hours": 72,
"minutes_per_round": 60,
"agents_per_hour_min": 5,
"agents_per_hour_max": 50,
"peak_hours": [19, 20, 21, 22],
"off_peak_hours": [0, 1, 2, 3, 4, 5],
"morning_hours": [6, 7, 8],
"work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
"reasoning": "Time configuration explanation"
}}
Field descriptions:
- total_simulation_hours (int): 24-168 hours, shorter for breaking news, longer for ongoing topics
- minutes_per_round (int): 30-120 minutes, recommend 60
- agents_per_hour_min (int): Range 1-{max_agents_allowed}
- agents_per_hour_max (int): Range 1-{max_agents_allowed}
- peak_hours (int array): Adjust based on participant groups
- off_peak_hours (int array): Usually late night/early morning
- morning_hours (int array): Morning hours
- work_hours (int array): Work hours
- reasoning (string): Brief explanation"""
system_prompt = "You are a social media simulation expert. Return pure JSON format."
try:
return self._call_llm_with_retry(prompt, system_prompt)
except Exception as e:
logger.warning(f"Time config LLM generation failed: {e}, using default")
return self._get_default_time_config(num_entities)
Parsing and Validating Time Config
def _parse_time_config(self, result: Dict[str, Any], num_entities: int) -> TimeSimulationConfig:
# Get raw values
agents_per_hour_min = result.get("agents_per_hour_min", max(1, num_entities // 15))
agents_per_hour_max = result.get("agents_per_hour_max", max(5, num_entities // 5))
# Validate and correct: ensure not exceeding total agent count
if agents_per_hour_min > num_entities:
logger.warning(f"agents_per_hour_min ({agents_per_hour_min}) exceeds total agents ({num_entities}), corrected")
agents_per_hour_min = max(1, num_entities // 10)
if agents_per_hour_max > num_entities:
logger.warning(f"agents_per_hour_max ({agents_per_hour_max}) exceeds total agents ({num_entities}), corrected")
agents_per_hour_max = max(agents_per_hour_min + 1, num_entities // 2)
# Ensure min < max
if agents_per_hour_min >= agents_per_hour_max:
agents_per_hour_min = max(1, agents_per_hour_max // 2)
logger.warning(f"agents_per_hour_min >= max, corrected to {agents_per_hour_min}")
return TimeSimulationConfig(
total_simulation_hours=result.get("total_simulation_hours", 72),
minutes_per_round=result.get("minutes_per_round", 60),
agents_per_hour_min=agents_per_hour_min,
agents_per_hour_max=agents_per_hour_max,
        peak_hours=result.get("peak_hours", [19, 20, 21, 22]),
        off_peak_hours=result.get("off_peak_hours", [0, 1, 2, 3, 4, 5]),
        morning_hours=result.get("morning_hours", [6, 7, 8]),
        work_hours=result.get("work_hours", list(range(9, 19))),
off_peak_activity_multiplier=0.05,
morning_activity_multiplier=0.4,
work_activity_multiplier=0.7,
peak_activity_multiplier=1.5
)
Default Time Config (Chinese Timezone)
def _get_default_time_config(self, num_entities: int) -> Dict[str, Any]:
return {
"total_simulation_hours": 72,
"minutes_per_round": 60, # 1 hour per round
"agents_per_hour_min": max(1, num_entities // 15),
"agents_per_hour_max": max(5, num_entities // 5),
"peak_hours": [19, 20, 21, 22],
"off_peak_hours": [0, 1, 2, 3, 4, 5],
"morning_hours": [6, 7, 8],
"work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
"reasoning": "Using default Chinese timezone configuration"
}
Event Configuration Generation
Event config defines initial posts, hot topics, and narrative direction:
def _generate_event_config(
self,
context: str,
simulation_requirement: str,
entities: List[EntityNode]
) -> Dict[str, Any]:
# Get available entity types for LLM reference
entity_types_available = list(set(
e.get_entity_type() or "Unknown" for e in entities
))
# Show examples per type
type_examples = {}
for e in entities:
etype = e.get_entity_type() or "Unknown"
if etype not in type_examples:
type_examples[etype] = []
if len(type_examples[etype]) < 3:
type_examples[etype].append(e.name)
type_info = "\n".join([
f"- {t}: {', '.join(examples)}"
for t, examples in type_examples.items()
])
context_truncated = context[:self.EVENT_CONFIG_CONTEXT_LENGTH]
prompt = f"""Based on the following simulation requirements, generate event configuration.
Simulation Requirement: {simulation_requirement}
{context_truncated}
## Available Entity Types and Examples
{type_info}
## Task
Generate event configuration JSON:
- Extract hot topic keywords
- Describe narrative direction
- Design initial posts, **each post must specify poster_type**
**Important**: poster_type must be selected from "Available Entity Types" above, so initial posts can be assigned to appropriate agents.
For example: Official statements should be posted by Official/University types, news by MediaOutlet, student opinions by Student.
Return JSON format (no markdown):
{{
"hot_topics": ["keyword1", "keyword2", ...],
"narrative_direction": "<narrative direction description>",
"initial_posts": [
{{"content": "Post content", "poster_type": "Entity Type (must match available types)"}},
...
],
"reasoning": "<brief explanation>"
}}"""
system_prompt = "You are an opinion analysis expert. Return pure JSON format."
try:
return self._call_llm_with_retry(prompt, system_prompt)
except Exception as e:
logger.warning(f"Event config LLM generation failed: {e}, using default")
return {
"hot_topics": [],
"narrative_direction": "",
"initial_posts": [],
"reasoning": "Using default configuration"
}
Assigning Initial Post Publishers
After generating initial posts, match them to actual agents:
def _assign_initial_post_agents(
self,
event_config: EventConfig,
agent_configs: List[AgentActivityConfig]
) -> EventConfig:
if not event_config.initial_posts:
return event_config
# Index agents by type
agents_by_type: Dict[str, List[AgentActivityConfig]] = {}
for agent in agent_configs:
etype = agent.entity_type.lower()
if etype not in agents_by_type:
agents_by_type[etype] = []
agents_by_type[etype].append(agent)
# Type alias mapping (handles LLM variations)
type_aliases = {
"official": ["official", "university", "governmentagency", "government"],
"university": ["university", "official"],
"mediaoutlet": ["mediaoutlet", "media"],
"student": ["student", "person"],
"professor": ["professor", "expert", "teacher"],
"alumni": ["alumni", "person"],
"organization": ["organization", "ngo", "company", "group"],
"person": ["person", "student", "alumni"],
}
# Track used indices to avoid reusing same agent
used_indices: Dict[str, int] = {}
updated_posts = []
for post in event_config.initial_posts:
poster_type = post.get("poster_type", "").lower()
content = post.get("content", "")
matched_agent_id = None
# 1. Direct match
if poster_type in agents_by_type:
agents = agents_by_type[poster_type]
idx = used_indices.get(poster_type, 0) % len(agents)
matched_agent_id = agents[idx].agent_id
used_indices[poster_type] = idx + 1
else:
# 2. Alias match
for alias_key, aliases in type_aliases.items():
if poster_type in aliases or alias_key == poster_type:
for alias in aliases:
if alias in agents_by_type:
agents = agents_by_type[alias]
idx = used_indices.get(alias, 0) % len(agents)
matched_agent_id = agents[idx].agent_id
used_indices[alias] = idx + 1
break
if matched_agent_id is not None:
break
# 3. Fallback: use highest influence agent
if matched_agent_id is None:
logger.warning(f"No matching agent for type '{poster_type}', using highest influence agent")
if agent_configs:
sorted_agents = sorted(agent_configs, key=lambda a: a.influence_weight, reverse=True)
matched_agent_id = sorted_agents[0].agent_id
else:
matched_agent_id = 0
updated_posts.append({
"content": content,
"poster_type": post.get("poster_type", "Unknown"),
"poster_agent_id": matched_agent_id
})
logger.info(f"Initial post assignment: poster_type='{poster_type}' -> agent_id={matched_agent_id}")
event_config.initial_posts = updated_posts
return event_config
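The three-tier matching (direct, alias, highest-influence fallback) plus the round-robin can be checked with hypothetical stand-in agents, here reduced to `(agent_id, entity_type, influence_weight)` tuples and an excerpt of the alias table:

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for AgentActivityConfig
agents: List[Tuple[int, str, float]] = [
    (0, "student", 0.8),
    (1, "student", 0.9),
    (2, "university", 3.0),
]

agents_by_type: Dict[str, List[Tuple[int, str, float]]] = {}
for a in agents:
    agents_by_type.setdefault(a[1], []).append(a)

type_aliases = {"official": ["official", "university"]}  # excerpt of the full table
used: Dict[str, int] = {}

def match(poster_type: str) -> int:
    poster_type = poster_type.lower()
    # 1. Direct match, round-robin within the type
    if poster_type in agents_by_type:
        group = agents_by_type[poster_type]
        idx = used.get(poster_type, 0) % len(group)
        used[poster_type] = idx + 1
        return group[idx][0]
    # 2. Alias match
    for key, aliases in type_aliases.items():
        if poster_type == key or poster_type in aliases:
            for alias in aliases:
                if alias in agents_by_type:
                    group = agents_by_type[alias]
                    idx = used.get(alias, 0) % len(group)
                    used[alias] = idx + 1
                    return group[idx][0]
    # 3. Fallback: highest influence agent
    return max(agents, key=lambda a: a[2])[0]

results = [match("Student"), match("Student"), match("Official"), match("MediaOutlet")]
print(results)  # [0, 1, 2, 2]
```

Two "Student" posts land on different students, "Official" resolves through the alias to the university account, and the unknown "MediaOutlet" falls back to the most influential agent.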
Batch Agent Configuration Generation
Generating configs for hundreds of agents at once would exceed token limits. The system processes in batches of 15:
def _generate_agent_configs_batch(
self,
context: str,
entities: List[EntityNode],
start_idx: int,
simulation_requirement: str
) -> List[AgentActivityConfig]:
# Build entity info with limited summary length
entity_list = []
summary_len = self.AGENT_SUMMARY_LENGTH
for i, e in enumerate(entities):
entity_list.append({
"agent_id": start_idx + i,
"entity_name": e.name,
"entity_type": e.get_entity_type() or "Unknown",
"summary": e.summary[:summary_len] if e.summary else ""
})
prompt = f"""Based on the following information, generate social media activity configuration for each entity.
Simulation Requirement: {simulation_requirement}
## Entity List
```json
{json.dumps(entity_list, ensure_ascii=False, indent=2)}
```

## Task
Generate activity configuration for each entity. Note:
- Time must follow Chinese habits: 0-5 almost no activity, 19-22 most active
- Official institutions (University/GovernmentAgency): Low activity (0.1-0.3), work hours (9-17), slow response (60-240 min), high influence (2.5-3.0)
- Media (MediaOutlet): Moderate activity (0.4-0.6), all-day activity (8-23), fast response (5-30 min), high influence (2.0-2.5)
- Individuals (Student/Person/Alumni): High activity (0.6-0.9), mainly evening (18-23), fast response (1-15 min), low influence (0.8-1.2)
- Public figures/Experts: Moderate activity (0.4-0.6), medium-high influence (1.5-2.0)
Return JSON format (no markdown):
{{"agent_configs": [{{"agent_id": <int>, "activity_level": <float>, "posts_per_hour": <float>, "comments_per_hour": <float>, "active_hours": [<int>, ...], "response_delay_min": <int>, "response_delay_max": <int>, "sentiment_bias": <float>, "stance": "<supportive|opposing|neutral|observer>", "influence_weight": <float>}}, ...]}}"""
system_prompt = "You are a social media behavior analysis expert. Return pure JSON format."
try:
result = self._call_llm_with_retry(prompt, system_prompt)
llm_configs = {cfg["agent_id"]: cfg for cfg in result.get("agent_configs", [])}
except Exception as e:
logger.warning(f"Agent config batch LLM generation failed: {e}, using rule-based generation")
llm_configs = {}
# Build AgentActivityConfig objects
configs = []
for i, entity in enumerate(entities):
agent_id = start_idx + i
cfg = llm_configs.get(agent_id, {})
# Use rule-based fallback if LLM failed
if not cfg:
cfg = self._generate_agent_config_by_rule(entity)
config = AgentActivityConfig(
agent_id=agent_id,
entity_uuid=entity.uuid,
entity_name=entity.name,
entity_type=entity.get_entity_type() or "Unknown",
activity_level=cfg.get("activity_level", 0.5),
posts_per_hour=cfg.get("posts_per_hour", 0.5),
comments_per_hour=cfg.get("comments_per_hour", 1.0),
active_hours=cfg.get("active_hours", list(range(9, 23))),
response_delay_min=cfg.get("response_delay_min", 5),
response_delay_max=cfg.get("response_delay_max", 60),
sentiment_bias=cfg.get("sentiment_bias", 0.0),
stance=cfg.get("stance", "neutral"),
influence_weight=cfg.get("influence_weight", 1.0)
)
configs.append(config)
return configs
Rule-Based Fallback Configs
When LLM fails, use predefined patterns:
def _generate_agent_config_by_rule(self, entity: EntityNode) -> Dict[str, Any]:
entity_type = (entity.get_entity_type() or "Unknown").lower()
if entity_type in ["university", "governmentagency", "ngo"]:
# Official institution: work hours, low frequency, high influence
return {
"activity_level": 0.2,
"posts_per_hour": 0.1,
"comments_per_hour": 0.05,
"active_hours": list(range(9, 18)), # 9:00-17:59
"response_delay_min": 60,
"response_delay_max": 240,
"sentiment_bias": 0.0,
"stance": "neutral",
"influence_weight": 3.0
}
elif entity_type in ["mediaoutlet"]:
# Media: all-day activity, moderate frequency, high influence
return {
"activity_level": 0.5,
"posts_per_hour": 0.8,
"comments_per_hour": 0.3,
"active_hours": list(range(7, 24)), # 7:00-23:59
"response_delay_min": 5,
"response_delay_max": 30,
"sentiment_bias": 0.0,
"stance": "observer",
"influence_weight": 2.5
}
elif entity_type in ["professor", "expert", "official"]:
# Expert/Professor: work + evening, moderate frequency
return {
"activity_level": 0.4,
"posts_per_hour": 0.3,
"comments_per_hour": 0.5,
"active_hours": list(range(8, 22)), # 8:00-21:59
"response_delay_min": 15,
"response_delay_max": 90,
"sentiment_bias": 0.0,
"stance": "neutral",
"influence_weight": 2.0
}
elif entity_type in ["student"]:
# Student: evening peak, high frequency
return {
"activity_level": 0.8,
"posts_per_hour": 0.6,
"comments_per_hour": 1.5,
"active_hours": [8, 9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23],
"response_delay_min": 1,
"response_delay_max": 15,
"sentiment_bias": 0.0,
"stance": "neutral",
"influence_weight": 0.8
}
elif entity_type in ["alumni"]:
# Alumni: evening focused
return {
"activity_level": 0.6,
"posts_per_hour": 0.4,
"comments_per_hour": 0.8,
"active_hours": [12, 13, 19, 20, 21, 22, 23], # Lunch + evening
"response_delay_min": 5,
"response_delay_max": 30,
"sentiment_bias": 0.0,
"stance": "neutral",
"influence_weight": 1.0
}
else:
# Default person: evening peak
return {
"activity_level": 0.7,
"posts_per_hour": 0.5,
"comments_per_hour": 1.2,
"active_hours": [9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23],
"response_delay_min": 2,
"response_delay_max": 20,
"sentiment_bias": 0.0,
"stance": "neutral",
"influence_weight": 1.0
}
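A condensed re-creation of two of these branches illustrates the dispatch on entity type (the real function has six branches; anything unmatched gets the default-person pattern):

```python
def agent_config_by_rule(entity_type: str) -> dict:
    """Condensed sketch of two branches from the rule table above."""
    entity_type = entity_type.lower()
    if entity_type in ["university", "governmentagency", "ngo"]:
        # Official institution: low frequency, high influence
        return {"activity_level": 0.2, "active_hours": list(range(9, 18)),
                "influence_weight": 3.0, "stance": "neutral"}
    if entity_type == "student":
        # Student: evening peak, high frequency, low influence
        return {"activity_level": 0.8,
                "active_hours": [8, 9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23],
                "influence_weight": 0.8, "stance": "neutral"}
    # Everything else falls through to the default-person pattern
    return {"activity_level": 0.7, "influence_weight": 1.0, "stance": "neutral"}

print(agent_config_by_rule("University")["influence_weight"])  # 3.0
```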
LLM Call with Retry and JSON Repair
LLM calls fail. Outputs get truncated. JSON breaks. The system handles all of this:
def _call_llm_with_retry(self, prompt: str, system_prompt: str) -> Dict[str, Any]:
max_attempts = 3
last_error = None
for attempt in range(max_attempts):
try:
response = self.client.chat.completions.create(
model=self.model_name,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"},
temperature=0.7 - (attempt * 0.1) # Lower temp on retry
)
content = response.choices[0].message.content
finish_reason = response.choices[0].finish_reason
# Check if truncated
if finish_reason == 'length':
logger.warning(f"LLM output truncated (attempt {attempt+1})")
content = self._fix_truncated_json(content)
# Try parsing JSON
try:
return json.loads(content)
except json.JSONDecodeError as e:
logger.warning(f"JSON parse failed (attempt {attempt+1}): {str(e)[:80]}")
# Try repairing JSON
fixed = self._try_fix_config_json(content)
if fixed:
return fixed
last_error = e
except Exception as e:
logger.warning(f"LLM call failed (attempt {attempt+1}): {str(e)[:80]}")
last_error = e
import time
time.sleep(2 * (attempt + 1))
raise last_error or Exception("LLM call failed")
Fixing Truncated JSON
def _fix_truncated_json(self, content: str) -> str:
content = content.strip()
# Count unclosed brackets
open_braces = content.count('{') - content.count('}')
open_brackets = content.count('[') - content.count(']')
# Check for unclosed string
if content and content[-1] not in '",}]':
content += '"'
# Close brackets
content += ']' * open_brackets
content += '}' * open_braces
return content
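Feeding the bracket-balancing heuristic a mid-string truncation shows why the closing quote is appended before the brackets:

```python
import json

def fix_truncated_json(content: str) -> str:
    # Same heuristic as above: close the dangling string first, then the brackets
    content = content.strip()
    open_braces = content.count('{') - content.count('}')
    open_brackets = content.count('[') - content.count(']')
    if content and content[-1] not in '",}]':
        content += '"'
    content += ']' * open_brackets
    content += '}' * open_braces
    return content

truncated = '{"hot_topics": ["exam reform", "campus poli'
print(fix_truncated_json(truncated))  # {"hot_topics": ["exam reform", "campus poli"]}
```

The repaired output parses cleanly; the last topic is cut short, but a partial list beats a total parse failure.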
Advanced JSON Repair
def _try_fix_config_json(self, content: str) -> Optional[Dict[str, Any]]:
import re
# Fix truncation
content = self._fix_truncated_json(content)
# Extract JSON portion
json_match = re.search(r'\{[\s\S]*\}', content)
if json_match:
json_str = json_match.group()
# Remove newlines in strings
def fix_string(match):
s = match.group(0)
s = s.replace('\n', ' ').replace('\r', ' ')
s = re.sub(r'\s+', ' ', s)
return s
json_str = re.sub(r'"[^"\\]*(?:\\.[^"\\]*)*"', fix_string, json_str)
try:
return json.loads(json_str)
        except json.JSONDecodeError:
# Try removing control characters
json_str = re.sub(r'[\x00-\x1f\x7f-\x9f]', ' ', json_str)
json_str = re.sub(r'\s+', ' ', json_str)
try:
return json.loads(json_str)
            except json.JSONDecodeError:
pass
return None
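The string-literal regex is the workhorse here. Isolated, it repairs raw newlines inside strings that `json.loads` would otherwise reject:

```python
import json
import re

def fix_strings_with_newlines(json_str: str) -> str:
    # Same string-literal regex as above (tolerates escaped characters)
    def fix_string(match):
        s = match.group(0).replace('\n', ' ').replace('\r', ' ')
        return re.sub(r'\s+', ' ', s)
    return re.sub(r'"[^"\\]*(?:\\.[^"\\]*)*"', fix_string, json_str)

broken = '{"reasoning": "line one\nline two"}'
print(json.loads(fix_strings_with_newlines(broken)))  # {'reasoning': 'line one line two'}
```

`json.loads(broken)` would raise "Invalid control character"; after the rewrite the newline becomes an ordinary space.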
Configuration Data Structures
Agent Activity Config
@dataclass
class AgentActivityConfig:
"""Single agent activity configuration"""
agent_id: int
entity_uuid: str
entity_name: str
entity_type: str
# Activity level (0.0-1.0)
activity_level: float = 0.5
# Posting frequency (per hour)
posts_per_hour: float = 1.0
comments_per_hour: float = 2.0
# Active hours (24-hour format, 0-23)
active_hours: List[int] = field(default_factory=lambda: list(range(8, 23)))
# Response speed (reaction delay in simulated minutes)
response_delay_min: int = 5
response_delay_max: int = 60
# Sentiment tendency (-1.0 to 1.0, negative to positive)
sentiment_bias: float = 0.0
# Stance on specific topics
stance: str = "neutral" # supportive, opposing, neutral, observer
# Influence weight (affects probability of being seen)
influence_weight: float = 1.0
Time Simulation Config
@dataclass
class TimeSimulationConfig:
"""Time simulation configuration (Chinese timezone)"""
total_simulation_hours: int = 72 # Default 72 hours (3 days)
minutes_per_round: int = 60 # 60 minutes per round
# Agents activated per hour
agents_per_hour_min: int = 5
agents_per_hour_max: int = 20
# Peak hours (evening 19-22, Chinese most active)
peak_hours: List[int] = field(default_factory=lambda: [19, 20, 21, 22])
peak_activity_multiplier: float = 1.5
# Off-peak hours (early morning 0-5, almost no activity)
off_peak_hours: List[int] = field(default_factory=lambda: [0, 1, 2, 3, 4, 5])
off_peak_activity_multiplier: float = 0.05
# Morning hours
morning_hours: List[int] = field(default_factory=lambda: [6, 7, 8])
morning_activity_multiplier: float = 0.4
# Work hours
work_hours: List[int] = field(default_factory=lambda: [9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
work_activity_multiplier: float = 0.7
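The dataclass only stores the hour buckets; a hypothetical consumer (not part of MiroFish's code) might map a simulated hour to its multiplier like this:

```python
from types import SimpleNamespace

# Stand-in for a TimeSimulationConfig instance with its default buckets
cfg = SimpleNamespace(
    peak_hours=[19, 20, 21, 22], peak_activity_multiplier=1.5,
    off_peak_hours=[0, 1, 2, 3, 4, 5], off_peak_activity_multiplier=0.05,
    morning_hours=[6, 7, 8], morning_activity_multiplier=0.4,
    work_activity_multiplier=0.7,  # work_hours cover the remaining daytime slots
)

def activity_multiplier(cfg, hour: int) -> float:
    """Hypothetical helper: resolve an hour to its activity multiplier."""
    if hour in cfg.peak_hours:
        return cfg.peak_activity_multiplier
    if hour in cfg.off_peak_hours:
        return cfg.off_peak_activity_multiplier
    if hour in cfg.morning_hours:
        return cfg.morning_activity_multiplier
    return cfg.work_activity_multiplier

print(activity_multiplier(cfg, 20))  # 1.5
```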
Complete Simulation Parameters
@dataclass
class SimulationParameters:
"""Complete simulation parameter configuration"""
simulation_id: str
project_id: str
graph_id: str
simulation_requirement: str
time_config: TimeSimulationConfig = field(default_factory=TimeSimulationConfig)
agent_configs: List[AgentActivityConfig] = field(default_factory=list)
event_config: EventConfig = field(default_factory=EventConfig)
twitter_config: Optional[PlatformConfig] = None
reddit_config: Optional[PlatformConfig] = None
llm_model: str = ""
llm_base_url: str = ""
generated_at: str = field(default_factory=lambda: datetime.now().isoformat())
generation_reasoning: str = ""
def to_dict(self) -> Dict[str, Any]:
time_dict = asdict(self.time_config)
return {
"simulation_id": self.simulation_id,
"project_id": self.project_id,
"graph_id": self.graph_id,
"simulation_requirement": self.simulation_requirement,
"time_config": time_dict,
"agent_configs": [asdict(a) for a in self.agent_configs],
"event_config": asdict(self.event_config),
"twitter_config": asdict(self.twitter_config) if self.twitter_config else None,
"reddit_config": asdict(self.reddit_config) if self.reddit_config else None,
"llm_model": self.llm_model,
"llm_base_url": self.llm_base_url,
"generated_at": self.generated_at,
"generation_reasoning": self.generation_reasoning,
}
Summary Table: Agent Type Patterns
| Agent Type | Activity | Active Hours | Posts/Hour | Comments/Hour | Response (min) | Influence |
|---|---|---|---|---|---|---|
| University | 0.2 | 9-17 | 0.1 | 0.05 | 60-240 | 3.0 |
| GovernmentAgency | 0.2 | 9-17 | 0.1 | 0.05 | 60-240 | 3.0 |
| MediaOutlet | 0.5 | 7-23 | 0.8 | 0.3 | 5-30 | 2.5 |
| Professor | 0.4 | 8-21 | 0.3 | 0.5 | 15-90 | 2.0 |
| Student | 0.8 | 8-12, 18-23 | 0.6 | 1.5 | 1-15 | 0.8 |
| Alumni | 0.6 | 12-13, 19-23 | 0.4 | 0.8 | 5-30 | 1.0 |
| Person (default) | 0.7 | 9-13, 18-23 | 0.5 | 1.2 | 2-20 | 1.0 |
Conclusion
LLM-powered configuration generation requires careful handling of:
- Step-by-step generation: Break into manageable stages (time → events → agents → platforms)
- Batch processing: Process 15 agents per batch to avoid context limits
- JSON repair: Handle truncation with bracket matching and string escaping
- Rule-based fallbacks: Provide sensible defaults when LLM fails
- Type-specific patterns: Different agent types have different activity patterns
- Validation and correction: Check generated values and fix issues (e.g., agents_per_hour > total_agents)