• Overview of Chinese core journals
  • Chinese Science Citation Database (CSCD)
  • Chinese Scientific and Technological Paper and Citation Database (CSTPCD)
  • China National Knowledge Infrastructure (CNKI)
  • Chinese Science Abstracts Database (CSAD)
  • JST China
  • SCOPUS
ZHANG Qian, LI Tian-hao, BAI Chun-guang. Hierarchical Decision Optimization Method Based on Multi-Agent Reinforcement Learning[J]. Journal of University of Electronic Science and Technology of China (SOCIAL SCIENCES EDITION), 2022, 24(6): 90-96. DOI: 10.14071/j.1008-8105(2022)-1056

Hierarchical Decision Optimization Method Based on Multi-Agent Reinforcement Learning

  • Purpose/Significance: With the development of information technology and artificial intelligence, big-data-driven decision-support methods have become more scientific and accurate. Reinforcement learning, a classical approach to sequential decision-making, has clear advantages in decision optimization. However, traditional methods cannot handle multi-level, multi-objective decision optimization problems. In long-term decision optimization in particular, delayed rewards severely limit the learning efficiency of reinforcement learning.
  • Design/Methodology: This paper proposes a hierarchical decision optimization method based on multi-agent reinforcement learning, which applies the idea of goal decomposition to long-term decision optimization problems. Building on reinforcement learning theory, multiple agents in a hierarchical relationship cooperate with one another, and each is modeled with a neural network. The upper-level agent learns a policy for decomposing goals, and the lower-level agent learns an action policy for completing them. The agents' parameters are updated alternately, so that together they learn the best policy for completing the team task and thereby optimize decisions.
  • Conclusions/Findings: Experiments on clinical medical decision optimization verify the effectiveness and superiority of the method. The method can provide theoretical and methodological support for long-term sequential decision optimization problems.
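The abstract describes the method only at a high level. The sketch below is a hypothetical, heavily simplified illustration of the two-level structure it names:

- an upper agent that decomposes the task by proposing subgoals;
- a lower agent that is rewarded intrinsically for reaching each subgoal;
- alternating updates of the two agents.

It is not the authors' implementation, and tabular Q-learning stands in for their neural networks. The 10-state chain task, the waypoints `SUBGOALS`, the per-subgoal step cap `WORKER_BUDGET`, the phase length and all hyperparameters are invented for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy task (not the paper's clinical experiment): a 1-D chain of
# states 0..9 whose only extrinsic reward is 1.0 on reaching state 9.
N_STATES = 10
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # lower agent: move left / move right
SUBGOALS = (3, 6, 9)    # hypothetical waypoints the upper agent may propose
WORKER_BUDGET = 4       # hypothetical cap on primitive steps per subgoal


def env_step(state, action):
    """Deterministic transition; the extrinsic reward is delayed to the end."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def choose(q, key, options, eps, rng):
    """Epsilon-greedy choice over options, breaking ties at random."""
    if rng.random() < eps:
        return rng.choice(options)
    best = max(q[key + (o,)] for o in options)
    return rng.choice([o for o in options if q[key + (o,)] == best])


def forward_subgoals(state):
    """Subgoals still ahead of the current state (never empty before GOAL)."""
    return [g for g in SUBGOALS if g > state]


def run_episode(q_hi, q_lo, rng, eps=0.2, learn_hi=False, learn_lo=False,
                alpha=0.5, gamma=0.9, max_steps=100):
    """Run one episode; return (extrinsic return, primitive steps taken)."""
    state, steps, total, done = 0, 0, 0.0, False
    while not done and steps < max_steps:
        # Upper agent: goal decomposition, i.e. propose the next subgoal.
        start = state
        g = choose(q_hi, (start,), forward_subgoals(start), eps, rng)
        ext = 0.0
        for _ in range(WORKER_BUDGET):
            # Lower agent: primitive action conditioned on (state, subgoal).
            a = choose(q_lo, (state, g), ACTIONS, eps, rng)
            nxt, r, done = env_step(state, a)
            steps += 1
            ext += r
            reached = nxt == g
            if learn_lo:
                # Intrinsic reward 1.0 on reaching the subgoal gives the lower
                # agent a dense signal instead of the delayed extrinsic one.
                if reached:
                    target = 1.0
                else:
                    target = gamma * max(q_lo[(nxt, g, b)] for b in ACTIONS)
                q_lo[(state, g, a)] += alpha * (target - q_lo[(state, g, a)])
            state = nxt
            if reached or done or steps >= max_steps:
                break
        total += ext
        if learn_hi:
            # Upper agent: one-step Q-learning over subgoal choices, treating
            # each lower-agent segment as a single step with extrinsic reward.
            future = 0.0 if done else gamma * max(
                q_hi[(state, h)] for h in forward_subgoals(state))
            q_hi[(start, g)] += alpha * (ext + future - q_hi[(start, g)])
    return total, steps


def train(episodes=4000, phase_len=200, seed=0):
    """Alternate updates: lower agent learns in even phases, upper in odd."""
    rng = random.Random(seed)
    q_hi, q_lo = defaultdict(float), defaultdict(float)
    for ep in range(episodes):
        lower_phase = (ep // phase_len) % 2 == 0
        run_episode(q_hi, q_lo, rng,
                    learn_hi=not lower_phase, learn_lo=lower_phase)
    return q_hi, q_lo


if __name__ == "__main__":
    q_hi, q_lo = train()
    ret, steps = run_episode(q_hi, q_lo, random.Random(1), eps=0.0)
    print(f"greedy return={ret}, primitive steps={steps}")
```

The lower agent gets at most `WORKER_BUDGET` steps per subgoal, so it cannot reach state 9 from state 0 in one segment. The upper agent therefore has to issue several subgoals in sequence. Meanwhile the lower agent learns from the dense intrinsic signal rather than waiting for the delayed extrinsic reward. After training, a greedy run should reach state 9 in nine primitive steps.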