张倩, 李天皓, 白春光. 基于多智能体强化学习的分层决策优化方法[J]. 电子科技大学学报社科版, 2022, 24(6): 90-96. DOI: 10.14071/j.1008-8105(2022)-1056
引用本文: 张倩, 李天皓, 白春光. 基于多智能体强化学习的分层决策优化方法[J]. 电子科技大学学报社科版, 2022, 24(6): 90-96. DOI: 10.14071/j.1008-8105(2022)-1056
ZHANG Qian, LI Tian-hao, BAI Chun-guang. Hierarchical Decision Optimization Method Based on Multi-Agent Reinforcement Learning[J]. Journal of University of Electronic Science and Technology of China(SOCIAL SCIENCES EDITION), 2022, 24(6): 90-96. DOI: 10.14071/j.1008-8105(2022)-1056
Citation: ZHANG Qian, LI Tian-hao, BAI Chun-guang. Hierarchical Decision Optimization Method Based on Multi-Agent Reinforcement Learning[J]. Journal of University of Electronic Science and Technology of China(SOCIAL SCIENCES EDITION), 2022, 24(6): 90-96. DOI: 10.14071/j.1008-8105(2022)-1056

基于多智能体强化学习的分层决策优化方法

Hierarchical Decision Optimization Method Based on Multi-Agent Reinforcement Learning

  • 摘要:
    目的/意义随着信息技术和人工智能的发展,大数据驱动的辅助决策方法让决策更加科学准确。强化学习作为序贯决策的经典方法,在决策优化方面有着明显的优势。但传统方法无法解决多层次、多目标的决策优化问题,尤其是在长周期决策优化问题中,学习奖励的滞后性严重制约着强化学习的效率。
    设计/方法提出基于多智能体强化学习的分层决策优化方法,应用目标分解的思想解决长期决策优化问题。该方法基于强化学习理论使具有层级关系的多智能体相互合作,利用神经网络进行建模,上层智能体学习目标的分解策略,下层智能体学习完成目标的行动策略,智能体参数交替更新,共同学习完成团队任务的最佳策略,实现决策优化。
    结论/发现在临床医疗决策优化的实验中验证了该方法的有效性与优越性,可为解决长周期序列决策优化问题提供理论与方法支持。

     

    Abstract: Purpose/Significance With the development of information technology and artificial intelligence, big data-driven auxiliary decision-making methods are more scientific and accurate. Reinforcement learning, as a classical method of sequential decision making, has obvious advantages in decision optimization. However, the traditional methods cannot solve the multi-level and multi-objective decision-making optimization problem, especially in the long-term decision-making optimization problem, the lag of learning reward seriously restricts the efficiency of reinforcement learning. Design/Methodology This paper proposes a hierarchical decision-making optimization method based on multi-agent reinforcement learning. The idea of goal decomposition is applied to solve the long-term decision-making optimization problem. Based on reinforcement learning theory, multi-agent with hierarchical relationship cooperates with each other and uses neural network to build models. The upper agent learns the decomposition strategy of goals, the lower agent learns the action strategy to complete goals. And the agent parameters are updated alternately, learn the best strategy to complete team tasks together and realize decision optimization. Conclusions/Findings The effectiveness and superiority of this method are verified in the experiment of clinical medical decision optimization, which can provide theoretical and methodological support for solving the long-term sequence decision optimization problem.

     

/

返回文章
返回