Generating 100+ Agent Configurations in Batches with an LLM

Introduction

Configuring hundreds of AI agents for a social media simulation sounds daunting. Each agent needs an activity schedule, a posting frequency, a response delay, an influence weight, and a stance. Doing this by hand takes hours.

MiroFish automates the process with LLM-assisted configuration generation. The system analyzes your documents, knowledge graph, and simulation requirements, then produces a detailed configuration for each agent.

The challenge: LLMs fail. Outputs get truncated. JSON comes back malformed. Token limits get in the way.

This guide walks through the complete implementation:

Step-by-step generation (time → events → agents → platforms)
Batching to stay within context limits
JSON repair strategies for truncated output
Rule-based fallback configurations for when the LLM fails
Agent activity patterns by type (Student vs. Official vs. Media)
Validation and correction logic

The configuration pipeline handles 100+ agents through a series of API calls. Apidog is used to validate the request/response schemas at each stage, catch malformed JSON before it reaches production, and generate test cases for edge cases such as truncated LLM output.

All code comes from production use in MiroFish.

Architecture Overview

The configuration generator uses a pipelined approach:

┌───────────────────┐     ┌───────────────────┐     ┌───────────────────┐
│  Context Builder  │ ──► │    Time Config    │ ──► │   Event Config    │
│                   │     │                   │     │                   │
│ - Simulation      │     │ - Total hours     │     │ - Initial posts   │
│   requirement     │     │ - Minutes/round   │     │ - Hot topics      │
│ - Entity summary  │     │ - Peak hours      │     │ - Narrative       │
│ - Document text   │     │ - Activity        │     │   direction       │
│                   │     │   multipliers     │     │                   │
└───────────────────┘     └───────────────────┘     └───────────────────┘
                                                              │
                                                              ▼
┌───────────────────┐     ┌───────────────────┐     ┌───────────────────┐
│   Final Config    │ ◄── │  Platform Config  │ ◄── │   Agent Config    │
│     Assembly      │     │                   │     │     Batches       │
│                   │     │                   │     │                   │
│ - Merge all       │     │ - Twitter params  │     │ - 15 agents per   │
│ - Validate        │     │ - Reddit params   │     │   batch           │
│ - Save JSON       │     │ - Virality        │     │ - N batches       │
│                   │     │   threshold       │     │                   │
└───────────────────┘     └───────────────────┘     └───────────────────┘

File structure

backend/app/services/
├── simulation_config_generator.py  # Main configuration generation logic
├── ontology_generator.py           # Ontology generation (shared)
└── zep_entity_reader.py            # Entity filtering

backend/app/models/
├── task.py                         # Task tracking
└── project.py                      # Project state

Step-by-step generation strategy

Generating the entire configuration in a single call would exceed token limits. Instead, the system generates it in stages:

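import json
import logging
import math
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger(__name__)

class SimulationConfigGenerator:
    # Each batch generates configurations for 15 agents
    AGENTS_PER_BATCH = 15

    # Context limits (in characters)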
    MAX_CONTEXT_LENGTH = 50000
    TIME_CONFIG_CONTEXT_LENGTH = 10000
    EVENT_CONFIG_CONTEXT_LENGTH = 8000
    ENTITY_SUMMARY_LENGTH = 300
    AGENT_SUMMARY_LENGTH = 300
    ENTITIES_PER_TYPE_DISPLAY = 20

    def generate_config(
        self,
        simulation_id: str,
        project_id: str,
        graph_id: str,
        simulation_requirement: str,
        document_text: str,
        entities: List[EntityNode],
        enable_twitter: bool = True,
        enable_reddit: bool = True,
        progress_callback: Optional[Callable[[int, int, str], None]] = None,
    ) -> SimulationParameters:

        # Compute the total number of steps
        num_batches = math.ceil(len(entities) / self.AGENTS_PER_BATCH)
        total_steps = 3 + num_batches  # Time + Events + N agent batches + Platforms
        current_step = 0

        def report_progress(step: int, message: str):
            nonlocal current_step
            current_step = step
            if progress_callback:
                progress_callback(step, total_steps, message)
            logger.info(f"[{step}/{total_steps}] {message}")

        # Build the context
        context = self._build_context(
            simulation_requirement=simulation_requirement,
            document_text=document_text,
            entities=entities
        )

        reasoning_parts = []

        # Step 1: Generate the time configuration
        report_progress(1, "Generating time configuration...")
        time_config_result = self._generate_time_config(context, len(entities))
        time_config = self._parse_time_config(time_config_result, len(entities))
        reasoning_parts.append(f"Time config: {time_config_result.get('reasoning', 'Success')}")

        # Step 2: Generate the event configuration
        report_progress(2, "Generating event configuration and hot topics...")
        event_config_result = self._generate_event_config(context, simulation_requirement, entities)
        event_config = self._parse_event_config(event_config_result)
        reasoning_parts.append(f"Event config: {event_config_result.get('reasoning', 'Success')}")

        # Steps 3 to N: Generate agent configurations in batches
        all_agent_configs = []
        for batch_idx in range(num_batches):
            start_idx = batch_idx * self.AGENTS_PER_BATCH
            end_idx = min(start_idx + self.AGENTS_PER_BATCH, len(entities))
            batch_entities = entities[start_idx:end_idx]

            report_progress(
                3 + batch_idx,
                f"Generating agent configurations ({start_idx + 1}-{end_idx}/{len(entities)})..."
            )

            batch_configs = self._generate_agent_configs_batch(
                context=context,
                entities=batch_entities,
                start_idx=start_idx,
                simulation_requirement=simulation_requirement
            )
            all_agent_configs.extend(batch_configs)

        reasoning_parts.append(f"Agent config: generated {len(all_agent_configs)} agents")

        # Assign posters to the initial posts
        event_config = self._assign_initial_post_agents(event_config, all_agent_configs)

        # Final step: platform configuration
        report_progress(total_steps, "Generating platform configuration...")
        twitter_config = PlatformConfig(platform="twitter", ...) if enable_twitter else None
        reddit_config = PlatformConfig(platform="reddit", ...) if enable_reddit else None

        # Assemble the final configuration
        params = SimulationParameters(
            simulation_id=simulation_id,
            project_id=project_id,
            graph_id=graph_id,
            simulation_requirement=simulation_requirement,
            time_config=time_config,
            agent_configs=all_agent_configs,
            event_config=event_config,
            twitter_config=twitter_config,
            reddit_config=reddit_config,
            generation_reasoning=" | ".join(reasoning_parts)
        )

        return params

This staged approach:

Keeps each LLM call focused and manageable
Gives the user progress updates
Allows partial recovery if one stage fails
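
To make the step arithmetic concrete, here is a minimal sketch of how an entity list splits into batches and how the progress step numbers line up. `plan_batches` is a hypothetical helper written for illustration; it is not part of MiroFish.

```python
import math
from typing import List, Tuple

AGENTS_PER_BATCH = 15  # same batch size as the generator


def plan_batches(
    num_entities: int, batch_size: int = AGENTS_PER_BATCH
) -> Tuple[int, List[Tuple[int, int, int]]]:
    """Return (total_steps, [(step, start_idx, end_idx), ...]).

    Steps 1 and 2 are the time and event configs, each agent batch takes
    one step starting at 3, and the platform config is the final step.
    """
    num_batches = math.ceil(num_entities / batch_size)
    total_steps = 3 + num_batches
    batches = []
    for batch_idx in range(num_batches):
        start = batch_idx * batch_size
        end = min(start + batch_size, num_entities)  # end index is exclusive
        batches.append((3 + batch_idx, start, end))
    return total_steps, batches


total, batches = plan_batches(100)
print(total)        # 10
print(batches[-1])  # (9, 90, 100)
```

With 100 entities the last batch holds only 10 agents, and the platform step is reported as step 10 of 10.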

Building the context

The context builder assembles the relevant information while staying within a character budget that keeps prompts under the token limit:

def _build_context(
    self,
    simulation_requirement: str,
    document_text: str,
    entities: List[EntityNode]
) -> str:

    # Summarize the entities
    entity_summary = self._summarize_entities(entities)

    context_parts = [
        f"## Simulation requirement\n{simulation_requirement}",
        f"\n## Entity information ({len(entities)} entities)\n{entity_summary}",
    ]

    # Add the document text if there is room left
    current_length = sum(len(p) for p in context_parts)
    remaining_length = self.MAX_CONTEXT_LENGTH - current_length - 500  # 500-character buffer

    if remaining_length > 0 and document_text:
        doc_text = document_text[:remaining_length]
        if len(document_text) > remaining_length:
            doc_text += "\n...(document truncated)"
        context_parts.append(f"\n## Source document\n{doc_text}")

    return "\n".join(context_parts)
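
The budget arithmetic is easy to check in isolation. The sketch below pulls it into a standalone function; `fit_document` and its parameter names are assumptions made for this example, and the limits are counted in characters, as in the code above.

```python
from typing import List, Optional

TRUNCATION_MARKER = "\n...(document truncated)"


def fit_document(
    fixed_parts: List[str],
    document_text: str,
    max_length: int = 50000,
    reserve: int = 500,
) -> Optional[str]:
    """Return the slice of the document that fits the remaining budget.

    The fixed parts (requirement and entity summary) are always kept; the
    document only gets whatever is left after a safety reserve.
    """
    used = sum(len(part) for part in fixed_parts)
    remaining = max_length - used - reserve
    if remaining <= 0 or not document_text:
        return None  # no room, or nothing to add
    excerpt = document_text[:remaining]
    if len(document_text) > remaining:
        excerpt += TRUNCATION_MARKER
    return excerpt


# 700-character budget, 100 already used, 500 reserved: 100 characters remain.
excerpt = fit_document(["a" * 100], "x" * 1000, max_length=700)
print(len(excerpt) - len(TRUNCATION_MARKER))  # 100
```

Note that the marker is appended after slicing, so the final context can exceed the budget by the length of the marker; the 500-character reserve absorbs that.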

Summarizing entities

Entities are summarized by type:

def _summarize_entities(self, entities: List[EntityNode]) -> str:
    lines = []

    # Group by type
    by_type: Dict[str, List[EntityNode]] = {}
    for e in entities:
        t = e.get_entity_type() or "Unknown"
        if t not in by_type:
            by_type[t] = []
        by_type[t].append(e)

    for entity_type, type_entities in by_type.items():
        lines.append(f"\n### {entity_type} ({len(type_entities)} entities)")

        # Show a capped number of entities, each with a capped summary length
        display_count = self.ENTITIES_PER_TYPE_DISPLAY
        summary_len = self.ENTITY_SUMMARY_LENGTH

        for e in type_entities[:display_count]:
            summary = e.summary or ""  # tolerate entities without a summary
            summary_preview = (summary[:summary_len] + "...") if len(summary) > summary_len else summary
            lines.append(f"- {e.name}: {summary_preview}")

        if len(type_entities) > display_count:
            lines.append(f"  ... and {len(type_entities) - display_count} more")

    return "\n".join(lines)

This produces output like:

### Student (45 entities)
- Zhang Wei: Active in the student union, frequently posts about campus events and academic pressure...
- Li Ming: Graduate student researching AI ethics, often shares technology news...
  ... and 43 more

### University (3 entities)
- Wuhan University: Official account, posts announcements and news...

Generating the time configuration

The time configuration defines the simulation duration and the activity patterns:

def _generate_time_config(self, context: str, num_entities: int) -> Dict[str, Any]:
    # Truncate the context for this step
    context_truncated = context[:self.TIME_CONFIG_CONTEXT_LENGTH]

    # Upper bound for the per-hour agent counts (90% of all agents)
    max_agents_allowed = max(1, int(num_entities * 0.9))

    prompt = f"""Based on the following simulation requirements, generate a time configuration.

{context_truncated}

## Task
Generate the time configuration JSON.

### Ground rules (adjust for the event type and the participant groups):
- The user base is Chinese, so activity must follow Beijing-time daily routines
- Hours 0-5: almost no activity (multiplier 0.05)
- Hours 6-8: gradually waking up (multiplier 0.4)
- Hours 9-18: working hours, moderate activity (multiplier 0.7)
- Hours 19-22: evening peak, most active (multiplier 1.5)
- Hour 23: activity tapers off (multiplier 0.5)

### Return JSON (no markdown):

Example:
{{
    "total_simulation_hours": 72,
    "minutes_per_round": 60,
    "agents_per_hour_min": 5,
    "agents_per_hour_max": 50,
    "peak_hours": [19, 20, 21, 22],
    "off_peak_hours": [0, 1, 2, 3, 4, 5],
    "morning_hours": [6, 7, 8],
    "work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
    "reasoning": "Explanation of the time configuration"
}}

Field descriptions:
- total_simulation_hours (int): 24-168 hours; shorter for breaking news, longer for ongoing topics
- minutes_per_round (int): 30-120 minutes, 60 recommended
- agents_per_hour_min (int): range 1-{max_agents_allowed}
- agents_per_hour_max (int): range 1-{max_agents_allowed}
- peak_hours (int array): adjust for the participant groups
- off_peak_hours (int array): usually late night/early morning
- morning_hours (int array): morning hours
- work_hours (int array): working hours
- reasoning (string): brief explanation"""

    system_prompt = "You are a social media simulation expert. Return pure JSON."

    try:
        return self._call_llm_with_retry(prompt, system_prompt)
    except Exception as e:
        logger.warning(f"LLM time config generation failed: {e}, using defaults")
        return self._get_default_time_config(num_entities)
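
The article does not show how the simulator consumes these hour lists, so the following is only a sketch of one plausible lookup: map an hour of the day to its activity multiplier using the default values. `activity_multiplier` and `LATE_EVENING_MULTIPLIER` are hypothetical names; the 0.5 value for hour 23 comes from the prompt, since no config field covers that hour.

```python
from typing import Dict, List

# Default hour groups and multipliers, copied from the prompt and defaults.
DEFAULT_HOURS: Dict[str, List[int]] = {
    "off_peak": [0, 1, 2, 3, 4, 5],
    "morning": [6, 7, 8],
    "work": list(range(9, 19)),  # hours 9 through 18
    "peak": [19, 20, 21, 22],
}
MULTIPLIERS: Dict[str, float] = {
    "off_peak": 0.05,
    "morning": 0.4,
    "work": 0.7,
    "peak": 1.5,
}
LATE_EVENING_MULTIPLIER = 0.5  # hour 23 belongs to none of the lists


def activity_multiplier(hour: int) -> float:
    """Return the multiplier for an hour of the day (wraps past 23)."""
    hour %= 24
    for group, hours in DEFAULT_HOURS.items():
        if hour in hours:
            return MULTIPLIERS[group]
    return LATE_EVENING_MULTIPLIER


for h in (3, 7, 12, 20, 23):
    print(h, activity_multiplier(h))
```

A simulation that advances in 60-minute rounds could call a lookup like this once per round, using the round index modulo 24 as the hour.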

Parsing and validating the time configuration

def _parse_time_config(self, result: Dict[str, Any], num_entities: int) -> TimeSimulationConfig:
    # Read the raw values
    agents_per_hour_min = result.get("agents_per_hour_min", max(1, num_entities // 15))
    agents_per_hour_max = result.get("agents_per_hour_max", max(5, num_entities // 5))

    # Validate and correct: neither bound may exceed the total number of agents
    if agents_per_hour_min > num_entities:
        logger.warning(f"agents_per_hour_min ({agents_per_hour_min}) exceeds the total number of agents ({num_entities}), corrected")
        agents_per_hour_min = max(1, num_entities // 10)

    if agents_per_hour_max > num_entities:
        logger.warning(f"agents_per_hour_max ({agents_per_hour_max}) exceeds the total number of agents ({num_entities}), corrected")
        agents_per_hour_max = max(agents_per_hour_min + 1, num_entities // 2)

    # Ensure min < max
    if agents_per_hour_min >= agents_per_hour_max:
        agents_per_hour_min = max(1, agents_per_hour_max // 2)
        logger.warning(f"agents_per_hour_min >= max, corrected to {agents_per_hour_min}")

    return TimeSimulationConfig(
        total_simulation_hours=result.get("total_simulation_hours", 72),
        minutes_per_round=result.get("minutes_per_round", 60),
        agents_per_hour_min=agents_per_hour_min,
        agents_per_hour_max=agents_per_hour_max,
        peak_hours=result.get("peak_hours", [19, 20, 21, 22]),
        off_peak_hours=result.get("off_peak_hours", [0, 1, 2, 3, 4, 5]),
        off_peak_activity_multiplier=0.05,
        morning_activity_multiplier=0.4,
        work_activity_multiplier=0.7,
        peak_activity_multiplier=1.5
    )
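
The three correction rules are easier to follow with numbers. This sketch restates them as a pure function (`clamp_agents_per_hour` is a hypothetical name, and logging is omitted) and runs a few out-of-range inputs through it for a simulation with 100 agents.

```python
from typing import Tuple


def clamp_agents_per_hour(raw_min: int, raw_max: int, num_entities: int) -> Tuple[int, int]:
    """Apply the same corrections as the parser, in the same order."""
    lo, hi = raw_min, raw_max
    if lo > num_entities:            # min larger than the population
        lo = max(1, num_entities // 10)
    if hi > num_entities:            # max larger than the population
        hi = max(lo + 1, num_entities // 2)
    if lo >= hi:                     # inverted or equal bounds
        lo = max(1, hi // 2)
    return lo, hi


print(clamp_agents_per_hour(5, 500, 100))    # (5, 50)
print(clamp_agents_per_hour(200, 300, 100))  # (10, 50)
print(clamp_agents_per_hour(30, 20, 100))    # (10, 20)
```

The order matters: the max is corrected relative to the already-corrected min, and the final check only touches the min.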

Default time configuration (China time zone)

def _get_default_time_config(self, num_entities: int) -> Dict[str, Any]:
    return {
        "total_simulation_hours": 72,
        "minutes_per_round": 60,  # 1 hour per round
        "agents_per_hour_min": max(1, num_entities // 15),
        "agents_per_hour_max": max(5, num_entities // 5),
        "peak_hours": [19, 20, 21, 22],
        "off_peak_hours": [0, 1, 2, 3, 4, 5],
        "morning_hours": [6, 7, 8],
        "work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
        "reasoning": "Using the default China time zone configuration"
    }

Generating the event configuration

The event configuration defines the initial posts, the hot topics, and the narrative direction:

def _generate_event_config(
    self,
    context: str,
    simulation_requirement: str,
    entities: List[EntityNode]
) -> Dict[str, Any]:

    # Collect the available entity types for the LLM to reference
    entity_types_available = list(set(
        e.get_entity_type() or "Unknown" for e in entities
    ))

    # Collect up to three example names per type
    type_examples = {}
    for e in entities:
        etype = e.get_entity_type() or "Unknown"
        if etype not in type_examples:
            type_examples[etype] = []
        if len(type_examples[etype]) < 3:
            type_examples[etype].append(e.name)

    type_info = "\n".join([
        f"- {t}: {', '.join(examples)}"
        for t, examples in type_examples.items()
    ])

    context_truncated = context[:self.EVENT_CONFIG_CONTEXT_LENGTH]

    prompt = f"""Based on the following simulation requirements, generate an event configuration.

Simulation requirement: {simulation_requirement}

{context_truncated}

## Available entity types and examples
{type_info}

## Task
Generate the event configuration JSON:
- Extract hot-topic keywords
- Describe the narrative direction
- Design the initial posts; **every post must specify a poster_type**

**Important**: poster_type must be chosen from "Available entity types" above so that each initial post can be assigned to a suitable agent.

For example: official statements should be posted by Official/University types, news by MediaOutlet, and student opinions by Student.

Return JSON (no markdown):
{{
    "hot_topics": ["keyword1", "keyword2", ...],
    "narrative_direction": "<description of the narrative direction>",
    "initial_posts": [
        {{"content": "Post content", "poster_type": "Entity type (must match an available type)"}},
        ...
    ],
    "reasoning": "<brief explanation>"
}}"""

    system_prompt = "You are a public opinion analyst. Return pure JSON."

    try:
        return self._call_llm_with_retry(prompt, system_prompt)
    except Exception as e:
        logger.warning(f"LLM event config generation failed: {e}, using defaults")
        return {
            "hot_topics": [],
            "narrative_direction": "",
            "initial_posts": [],
            "reasoning": "Using the default configuration"
        }
        }
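
The source does not show `_parse_event_config`, so what follows is not its implementation. It is a minimal sketch of the kind of normalization an LLM-produced event config might need before use: fill in missing keys, drop empty posts, and flag poster types the LLM invented. `normalize_event_config` and the `type_is_known` flag are hypothetical.

```python
from typing import Any, Dict, List, Set


def normalize_event_config(raw: Dict[str, Any], available_types: Set[str]) -> Dict[str, Any]:
    """Coerce a raw LLM event config into a predictable shape."""
    known = {t.lower() for t in available_types}
    posts: List[Dict[str, Any]] = []
    for post in raw.get("initial_posts") or []:
        if not isinstance(post, dict):
            continue  # skip anything that is not a JSON object
        content = str(post.get("content", "")).strip()
        if not content:
            continue  # an empty post gives the simulation nothing to seed
        poster_type = str(post.get("poster_type", "Unknown"))
        posts.append({
            "content": content,
            "poster_type": poster_type,
            "type_is_known": poster_type.lower() in known,
        })
    return {
        "hot_topics": [str(t) for t in raw.get("hot_topics") or []],
        "narrative_direction": str(raw.get("narrative_direction") or ""),
        "initial_posts": posts,
    }


raw = {
    "hot_topics": ["tuition"],
    "initial_posts": [
        {"content": "Official statement", "poster_type": "University"},
        {"content": "   ", "poster_type": "Student"},
        {"content": "Breaking news", "poster_type": "Blogger"},
    ],
}
clean = normalize_event_config(raw, {"University", "Student", "MediaOutlet"})
print([p["type_is_known"] for p in clean["initial_posts"]])  # [True, False]
```

Posts flagged as unknown are not discarded here, because the assignment step can still place them through aliases or the highest-influence fallback.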

Assigning posters to the initial posts

Once the initial posts have been generated, match each one to an actual agent:

def _assign_initial_post_agents(
    self,
    event_config: EventConfig,
    agent_configs: List[AgentActivityConfig]
) -> EventConfig:

    if not event_config.initial_posts:
        return event_config

    # Index agents by type
    agents_by_type: Dict[str, List[AgentActivityConfig]] = {}
    for agent in agent_configs:
        etype = agent.entity_type.lower()
        if etype not in agents_by_type:
            agents_by_type[etype] = []
        agents_by_type[etype].append(agent)

    # Type alias map (absorbs variations in LLM output)
    type_aliases = {
        "official": ["official", "university", "governmentagency", "government"],
        "university": ["university", "official"],
        "mediaoutlet": ["mediaoutlet", "media"],
        "student": ["student", "person"],
        "professor": ["professor", "expert", "teacher"],
        "alumni": ["alumni", "person"],
        "organization": ["organization", "ngo", "company", "group"],
        "person": ["person", "student", "alumni"],
    }

    # Track the next index per type so posts rotate across agents instead of reusing one
    used_indices: Dict[str, int] = {}

    updated_posts = []
    for post in event_config.initial_posts:
        poster_type = post.get("poster_type", "").lower()
        content = post.get("content", "")

        matched_agent_id = None

        # 1. Direct match
        if poster_type in agents_by_type:
            agents = agents_by_type[poster_type]
            idx = used_indices.get(poster_type, 0) % len(agents)
            matched_agent_id = agents[idx].agent_id
            used_indices[poster_type] = idx + 1
        else:
            # 2. Alias match
            for alias_key, aliases in type_aliases.items():
                if poster_type in aliases or alias_key == poster_type:
                    for alias in aliases:
                        if alias in agents_by_type:
                            agents = agents_by_type[alias]
                            idx = used_indices.get(alias, 0) % len(agents)
                            matched_agent_id = agents[idx].agent_id
                            used_indices[alias] = idx + 1
                            break
                    if matched_agent_id is not None:
                        break

        # 3. Fallback: use the agent with the highest influence
        if matched_agent_id is None:
            logger.warning(f"No matching agent found for type '{poster_type}', using the highest-influence agent")
            if agent_configs:
                sorted_agents = sorted(agent_configs, key=lambda a: a.influence_weight, reverse=True)
                matched_agent_id = sorted_agents[0].agent_id
            else:
                matched_agent_id = 0

        updated_posts.append({
            "content": content,
            "poster_type": post.get("poster_type", "Unknown"),
            "poster_agent_id": matched_agent_id
        })

        logger.info(f"Assigned initial post: poster_type='{poster_type}' -> agent_id={matched_agent_id}")

    event_config.initial_posts = updated_posts
    return event_config
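
The rotation in the direct-match branch is a per-type round robin. The sketch below isolates just that part, leaving out aliases and the influence fallback; `round_robin_assign` is a hypothetical name, and agents are represented by their integer IDs.

```python
from typing import Dict, List


def round_robin_assign(poster_types: List[str], agents_by_type: Dict[str, List[int]]) -> List[int]:
    """Assign each post to the next agent of its type, wrapping around."""
    next_index: Dict[str, int] = {}
    assigned: List[int] = []
    for poster_type in poster_types:
        agents = agents_by_type[poster_type]
        idx = next_index.get(poster_type, 0) % len(agents)
        assigned.append(agents[idx])
        next_index[poster_type] = idx + 1
    return assigned


print(round_robin_assign(
    ["student", "student", "media", "student"],
    {"student": [3, 7], "media": [12]},
))  # [3, 7, 12, 3]
```

The third student post wraps back to agent 3 because only two student agents exist, so an agent is reused only once every agent of that type has posted.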

Generating agent configurations in batches

Generating configurations for hundreds of agents in a single call would exceed token limits. The system processes them in batches of 15:

def _generate_agent_configs_batch(
    self,
    context: str,
    entities: List[EntityNode],
    start_idx: int,
    simulation_requirement: str
) -> List[AgentActivityConfig]:

    # Build the entity info with a capped summary length
    entity_list = []
    summary_len = self.AGENT_SUMMARY_LENGTH
    for i, e in enumerate(entities):
        entity_list.append({
            "agent_id": start_idx + i,
            "entity_name": e.name,
            "entity_type": e.get_entity_type() or "Unknown",
            "summary": e.summary[:summary_len] if e.summary else ""
        })

    prompt = f"""Based on the following information, generate a social media activity configuration for each entity.

Simulation requirement: {simulation_requirement}

## Entity list
```json
{json.dumps(entity_list, ensure_ascii=False, indent=2)}
```

## Task
Generate an activity configuration for each entity. Notes:
- Timing must follow Chinese daily routines: almost no activity during hours 0-5, most active during hours 19-22
- Official institutions (University/GovernmentAgency): low activity (0.1-0.3), working hours (9-17), slow responses (60-240 minutes), high influence (2.5-3.0)
- Media (MediaOutlet): moderate activity (0.4-0.6), active all day (8-23), fast responses (5-30 minutes), high influence (2.0-2.5)
- Individuals (Student/Person/Alumni): high activity (0.6-0.9), mostly evenings (18-23), fast responses (1-15 minutes), low influence (0.8-1.2)
- Public figures/Experts: moderate activity (0.4-0.6), medium-high influence (1.5-2.0)

Return JSON (no markdown) with a top-level "agent_configs" array containing one object per entity, each carrying its agent_id."""

    system_prompt = "You are a social media behavior analyst. Return pure JSON."

    try:
        result = self._call_llm_with_retry(prompt, system_prompt)
        llm_configs = {cfg["agent_id"]: cfg for cfg in result.get("agent_configs", [])}
    except Exception as e:
        logger.warning(f"LLM agent config batch generation failed: {e}, using rule-based generation")
        llm_configs = {}

    # Build the AgentActivityConfig objects
    configs = []
    for i, entity in enumerate(entities):
        agent_id = start_idx + i
        cfg = llm_configs.get(agent_id, {})

        # Fall back to the rule-based config if the LLM returned nothing for this agent
        if not cfg:
            cfg = self._generate_agent_config_by_rule(entity)

        config = AgentActivityConfig(
            agent_id=agent_id,
            entity_uuid=entity.uuid,
            entity_name=entity.name,
            entity_type=entity.get_entity_type() or "Unknown",
            activity_level=cfg.get("activity_level", 0.5),
            posts_per_hour=cfg.get("posts_per_hour", 0.5),
            comments_per_hour=cfg.get("comments_per_hour", 1.0),
            active_hours=cfg.get("active_hours", list(range(9, 23))),
            response_delay_min=cfg.get("response_delay_min", 5),
            response_delay_max=cfg.get("response_delay_max", 60),
            sentiment_bias=cfg.get("sentiment_bias", 0.0),
            stance=cfg.get("stance", "neutral"),
            influence_weight=cfg.get("influence_weight", 1.0)
        )
        configs.append(config)

    return configs
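
The batch code copies LLM values into `AgentActivityConfig` as they arrive. A light sanitizing pass could guard against out-of-range values first. This is a sketch under the ranges documented on the dataclass (activity 0.0-1.0, sentiment -1.0 to 1.0, hours 0-23); `sanitize_agent_config` is a hypothetical helper, not part of MiroFish.

```python
from typing import Any, Dict


def sanitize_agent_config(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of cfg with values forced into their documented ranges."""
    clean = dict(cfg)
    if "activity_level" in clean:
        clean["activity_level"] = min(1.0, max(0.0, float(clean["activity_level"])))
    if "sentiment_bias" in clean:
        clean["sentiment_bias"] = min(1.0, max(-1.0, float(clean["sentiment_bias"])))
    if "active_hours" in clean:
        # Keep valid hours only, de-duplicated and sorted
        hours = {int(h) for h in clean["active_hours"] if 0 <= int(h) <= 23}
        clean["active_hours"] = sorted(hours)
    lo = clean.get("response_delay_min")
    hi = clean.get("response_delay_max")
    if lo is not None and hi is not None and lo > hi:
        # Swap inverted delay bounds instead of discarding them
        clean["response_delay_min"], clean["response_delay_max"] = hi, lo
    return clean


print(sanitize_agent_config({
    "activity_level": 1.4,
    "active_hours": [25, 9, 9, 22],
    "response_delay_min": 60,
    "response_delay_max": 5,
}))
```

Keys that are absent stay absent, so the `cfg.get(..., default)` calls in the batch code still supply their defaults.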


Rule-based fallback configuration

When the LLM fails, fall back to predefined templates:

def _generate_agent_config_by_rule(self, entity: EntityNode) -> Dict[str, Any]:
    entity_type = (entity.get_entity_type() or "Unknown").lower()

    if entity_type in ["university", "governmentagency", "ngo"]:
        # Official institutions: working hours, low frequency, high influence
        return {
            "activity_level": 0.2,
            "posts_per_hour": 0.1,
            "comments_per_hour": 0.05,
            "active_hours": list(range(9, 18)),  # 9:00-17:59
            "response_delay_min": 60,
            "response_delay_max": 240,
            "sentiment_bias": 0.0,
            "stance": "neutral",
            "influence_weight": 3.0
        }

    elif entity_type in ["mediaoutlet"]:
        # Media: active all day, moderate frequency, high influence
        return {
            "activity_level": 0.5,
            "posts_per_hour": 0.8,
            "comments_per_hour": 0.3,
            "active_hours": list(range(7, 24)),  # 7:00-23:59
            "response_delay_min": 5,
            "response_delay_max": 30,
            "sentiment_bias": 0.0,
            "stance": "observer",
            "influence_weight": 2.5
        }

    elif entity_type in ["professor", "expert", "official"]:
        # Experts/professors: working hours plus evenings, moderate frequency
        return {
            "activity_level": 0.4,
            "posts_per_hour": 0.3,
            "comments_per_hour": 0.5,
            "active_hours": list(range(8, 22)),  # 8:00-21:59
            "response_delay_min": 15,
            "response_delay_max": 90,
            "sentiment_bias": 0.0,
            "stance": "neutral",
            "influence_weight": 2.0
        }

    elif entity_type in ["student"]:
        # Students: evening peak, high frequency
        return {
            "activity_level": 0.8,
            "posts_per_hour": 0.6,
            "comments_per_hour": 1.5,
            "active_hours": [8, 9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23],
            "response_delay_min": 1,
            "response_delay_max": 15,
            "sentiment_bias": 0.0,
            "stance": "neutral",
            "influence_weight": 0.8
        }

    elif entity_type in ["alumni"]:
        # Alumni: concentrated in the evening
        return {
            "activity_level": 0.6,
            "posts_per_hour": 0.4,
            "comments_per_hour": 0.8,
            "active_hours": [12, 13, 19, 20, 21, 22, 23],  # Lunch break + evening
            "response_delay_min": 5,
            "response_delay_max": 30,
            "sentiment_bias": 0.0,
            "stance": "neutral",
            "influence_weight": 1.0
        }

    else:
        # Default individual: evening peak
        return {
            "activity_level": 0.7,
            "posts_per_hour": 0.5,
            "comments_per_hour": 1.2,
            "active_hours": [9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23],
            "response_delay_min": 2,
            "response_delay_max": 20,
            "sentiment_bias": 0.0,
            "stance": "neutral",
            "influence_weight": 1.0
        }
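
Fallback is decided per agent, not per batch: if the LLM returns configs for only some of the requested agents, the rest get rule-based values. The sketch below shows that merge in isolation. `merge_with_fallback` is a hypothetical helper, and unlike the original dictionary comprehension it skips items that lack an `agent_id` instead of failing the whole batch.

```python
from typing import Any, Callable, Dict, List


def merge_with_fallback(
    agent_ids: List[int],
    llm_items: List[Dict[str, Any]],
    fallback: Callable[[int], Dict[str, Any]],
) -> Dict[int, Dict[str, Any]]:
    """Use the LLM config where one exists, otherwise the rule-based one."""
    by_id = {item["agent_id"]: item for item in llm_items if "agent_id" in item}
    return {agent_id: by_id.get(agent_id) or fallback(agent_id) for agent_id in agent_ids}


def rule_config(agent_id: int) -> Dict[str, Any]:
    # Stand-in for a rule-based template; "source" is only for the demo.
    return {"agent_id": agent_id, "stance": "neutral", "source": "rule"}


merged = merge_with_fallback(
    [0, 1, 2],
    [{"agent_id": 0, "stance": "observer"}, {"agent_id": 2, "stance": "neutral"}],
    rule_config,
)
print(merged[1])  # {'agent_id': 1, 'stance': 'neutral', 'source': 'rule'}
```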

Calling the LLM with retries and JSON repair

LLM calls can fail. Outputs get truncated. JSON comes back malformed. The system handles all of these:

def _call_llm_with_retry(self, prompt: str, system_prompt: str) -> Dict[str, Any]:
    import time

    max_attempts = 3
    last_error = None

    for attempt in range(max_attempts):
        try:
            response = self.client.chat.completions.create(
                model=self.model_name,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": prompt}
                ],
                response_format={"type": "json_object"},
                temperature=0.7 - (attempt * 0.1)  # Lower the temperature on each retry
            )

            content = response.choices[0].message.content
            finish_reason = response.choices[0].finish_reason

            # Check whether the output was cut off by the token limit
            if finish_reason == 'length':
                logger.warning(f"LLM output truncated (attempt {attempt+1})")
                content = self._fix_truncated_json(content)

            # Try to parse the JSON
            try:
                return json.loads(content)
            except json.JSONDecodeError as e:
                logger.warning(f"JSON parsing failed (attempt {attempt+1}): {str(e)[:80]}")

                # Try to repair the JSON
                fixed = self._try_fix_config_json(content)
                if fixed:
                    return fixed

                last_error = e

        except Exception as e:
            logger.warning(f"LLM call failed (attempt {attempt+1}): {str(e)[:80]}")
            last_error = e
            time.sleep(2 * (attempt + 1))  # Linear backoff: 2s, 4s, 6s

    raise last_error or Exception("LLM call failed")
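
The retry loop can be exercised without an API key by injecting the call and the sleep function. This sketch keeps the three attempts and the linear backoff but simplifies the rest: it sleeps after every failure, including JSON parse failures, whereas the method above sleeps only when the call itself raises. `call_with_retry` and `fetch` are hypothetical names.

```python
import json
import time
from typing import Any, Callable, Dict


def call_with_retry(
    fetch: Callable[[int], str],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch(attempt) until it returns parseable JSON or attempts run out."""
    last_error: Exception = RuntimeError("no attempts made")
    for attempt in range(max_attempts):
        try:
            return json.loads(fetch(attempt))
        except Exception as exc:  # malformed JSON or a transport error
            last_error = exc
            sleep(2 * (attempt + 1))  # linear backoff: 2s, 4s, 6s
    raise last_error


# Simulated responses: two bad outputs, then a good one.
responses = ['{"a": ', "oops", '{"a": 1}']
waits = []
result = call_with_retry(lambda attempt: responses[attempt], sleep=waits.append)
print(result, waits)  # {'a': 1} [2, 4]
```

Passing `waits.append` as the sleeper records the backoff schedule instead of actually waiting, which keeps tests fast.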

Repairing truncated JSON

def _fix_truncated_json(self, content: str) -> str:
    content = content.strip()

    # Count the brackets that were opened but never closed
    open_braces = content.count('{') - content.count('}')
    open_brackets = content.count('[') - content.count(']')

    # Heuristic: assume the cut fell inside a string unless the text
    # ends with a quote, a comma, or a closing bracket
    if content and content[-1] not in '",}]':
        content += '"'

    # Close the brackets
    content += ']' * open_brackets
    content += '}' * open_braces

    return content
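
Counting characters is quick, but it miscounts braces that appear inside string values and closes brackets in a fixed order. A string-aware variant can track nesting with a stack. The following is a sketch of that alternative technique, not the MiroFish implementation; `close_truncated_json` is a hypothetical name. It still cannot rescue output cut right after a key or a colon.

```python
import json


def close_truncated_json(text: str) -> str:
    """Close any unterminated string and open brackets, innermost first."""
    stack = []          # closers still owed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()
    if escaped:
        text = text[:-1]  # drop a dangling backslash
    if in_string:
        text += '"'
    # A trailing comma before the closers would be invalid JSON
    return text.rstrip().rstrip(",") + "".join(reversed(stack))


cut = '{"agent_configs": [{"agent_id": 0, "stance": "neu'
print(json.loads(close_truncated_json(cut)))
```

Because the closers are emitted in reverse nesting order, an object truncated inside an array inside an object is closed as `}`, `]`, `}`.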

Advanced JSON repair

def _try_fix_config_json(self, content: str) -> Optional[Dict[str, Any]]:
    import re

    # Repair truncation first
    content = self._fix_truncated_json(content)

    # Extract the JSON portion
    json_match = re.search(r'\{[\s\S]*\}', content)
    if json_match:
        json_str = json_match.group()

        # Replace line breaks inside string literals with spaces
        def fix_string(match):
            s = match.group(0)
            s = s.replace('\n', ' ').replace('\r', ' ')
            s = re.sub(r'\s+', ' ', s)
            return s

        json_str = re.sub(r'"[^"\\]*(?:\\.[^"\\]*)*"', fix_string, json_str)

        try:
            return json.loads(json_str)
        except json.JSONDecodeError:
            # Try stripping control characters
            json_str = re.sub(r'[\x00-\x1f\x7f-\x9f]', ' ', json_str)
            json_str = re.sub(r'\s+', ' ', json_str)
            try:
                return json.loads(json_str)
            except json.JSONDecodeError:
                pass

    return None
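
For the specific case of raw line breaks inside string values, Python's standard `json` module offers a lenient mode: `json.loads(..., strict=False)` accepts control characters inside strings. The sketch below shows the difference. It is an alternative worth knowing about, not what the method above does, and it keeps the newline in the value instead of replacing it with a space.

```python
import json

# A string value containing a raw newline, as LLMs sometimes emit.
raw = '{"reasoning": "Evening peak\nfollows class hours", "total_simulation_hours": 72}'

try:
    json.loads(raw)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False  # strict mode rejects control characters in strings

lenient = json.loads(raw, strict=False)

print(strict_ok)                          # False
print(lenient["total_simulation_hours"])  # 72
```

Lenient parsing does not help with truncation or unbalanced brackets, so it complements the bracket repair instead of replacing it.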

Configuration data structures

Agent activity configuration

from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional

@dataclass
class AgentActivityConfig:
    """Activity configuration for a single agent"""
    agent_id: int
    entity_uuid: str
    entity_name: str
    entity_type: str

    # Activity level (0.0-1.0)
    activity_level: float = 0.5

    # Posting and commenting frequency (per hour)
    posts_per_hour: float = 1.0
    comments_per_hour: float = 2.0

    # Active hours (24-hour format, 0-23)
    active_hours: List[int] = field(default_factory=lambda: list(range(8, 23)))

    # Response speed (reaction delay in simulated minutes)
    response_delay_min: int = 5
    response_delay_max: int = 60

    # Sentiment bias (-1.0 to 1.0, negative to positive)
    sentiment_bias: float = 0.0

    # Stance on specific topics
    stance: str = "neutral"  # supportive, opposing, neutral, observer

    # Influence weight (affects how likely posts are to be seen)
    influence_weight: float = 1.0

Time simulation configuration

@dataclass
class TimeSimulationConfig:
    """Time simulation configuration (China time zone)"""
    total_simulation_hours: int = 72  # Default 72 hours (3 days)
    minutes_per_round: int = 60  # 60 minutes per round

    # Number of agents activated per hour
    agents_per_hour_min: int = 5
    agents_per_hour_max: int = 20

    # Peak hours (evenings 19-22, when Chinese users are most active)
    peak_hours: List[int] = field(default_factory=lambda: [19, 20, 21, 22])
    peak_activity_multiplier: float = 1.5

    # Off-peak hours (early morning 0-5, almost no activity)
    off_peak_hours: List[int] = field(default_factory=lambda: [0, 1, 2, 3, 4, 5])
    off_peak_activity_multiplier: float = 0.05

    # Morning hours
    morning_hours: List[int] = field(default_factory=lambda: [6, 7, 8])
    morning_activity_multiplier: float = 0.4

    # Working hours
    work_hours: List[int] = field(default_factory=lambda: [9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
    work_activity_multiplier: float = 0.7

Complete simulation parameters

@dataclass
class SimulationParameters:
    """Complete simulation parameter configuration"""
    simulation_id: str
    project_id: str
    graph_id: str
    simulation_requirement: str

    time_config: TimeSimulationConfig = field(default_factory=TimeSimulationConfig)
    agent_configs: List[AgentActivityConfig] = field(default_factory=list)
    event_config: EventConfig = field(default_factory=EventConfig)
    twitter_config: Optional[PlatformConfig] = None
    reddit_config: Optional[PlatformConfig] = None

    llm_model: str = ""
    llm_base_url: str = ""

    generated_at: str = field(default_factory=lambda: datetime.now().isoformat())
    generation_reasoning: str = ""

    def to_dict(self) -> Dict[str, Any]:
        time_dict = asdict(self.time_config)
        return {
            "simulation_id": self.simulation_id,
            "project_id": self.project_id,
            "graph_id": self.graph_id,
            "simulation_requirement": self.simulation_requirement,
            "time_config": time_dict,
            "agent_configs": [asdict(a) for a in self.agent_configs],
            "event_config": asdict(self.event_config),
            "twitter_config": asdict(self.twitter_config) if self.twitter_config else None,
            "reddit_config": asdict(self.reddit_config) if self.reddit_config else None,
            "llm_model": self.llm_model,
            "llm_base_url": self.llm_base_url,
            "generated_at": self.generated_at,
            "generation_reasoning": self.generation_reasoning,
        }
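
The "Save JSON" step from the architecture diagram is not shown in the code above. The sketch below illustrates one way a `to_dict`-style payload could be written to disk and read back, using a trimmed-down stand-in dataclass. `MiniAgent`, `save_config`, `load_config`, and the file name are all hypothetical.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MiniAgent:  # hypothetical, trimmed-down stand-in for AgentActivityConfig
    agent_id: int
    entity_name: str
    active_hours: List[int] = field(default_factory=lambda: list(range(8, 23)))


def save_config(agents: List[MiniAgent], path: str) -> None:
    """Serialize the agents the same way to_dict does: asdict per item."""
    payload = {"agent_configs": [asdict(a) for a in agents]}
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII entity names readable on disk
        json.dump(payload, f, ensure_ascii=False, indent=2)


def load_config(path: str) -> List[MiniAgent]:
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    return [MiniAgent(**item) for item in payload["agent_configs"]]


agents = [MiniAgent(0, "Wuhan University", list(range(9, 18))), MiniAgent(1, "Zhang Wei")]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "simulation_config.json")
    save_config(agents, path)
    restored = load_config(path)

print(restored == agents)  # True
```

Nested dataclasses would need their own reconstruction on load, since `asdict` flattens them to plain dictionaries.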

Summary table: agent templates by type

Agent type	Activity	Active hours	Posts/hour	Comments/hour	Response (min)	Influence
University	0.2	9-17	0.1	0.05	60-240	3.0
Government agency	0.2	9-17	0.1	0.05	60-240	3.0
Media outlet	0.5	7-23	0.8	0.3	5-30	2.5
Professor	0.4	8-21	0.3	0.5	15-90	2.0
Student	0.8	8-13, 18-23	0.6	1.5	1-15	0.8
Alumni	0.6	12-13, 19-23	0.4	0.8	5-30	1.0
Individual (default)	0.7	9-13, 18-23	0.5	1.2	2-20	1.0

Conclusion

LLM-assisted configuration generation needs careful handling of the following:

Step-by-step generation: Split the work into manageable stages (time → events → agents → platforms)
Batching: Process 15 agents per batch to stay within context limits
JSON repair: Handle truncation by balancing brackets and repairing strings
Rule-based fallback: Provide sensible defaults when the LLM fails
Type-specific templates: Different agent types have different activity patterns
Validation and correction: Check generated values and fix problems (for example, agents_per_hour exceeding the total number of agents)