Analysis of High-Performance Computing Characteristics of GRAPES_MESO and WRF Models on Kunpeng Platform
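Funding:

Jointly supported by the National Natural Science Foundation of China (42475038, 42030610), the Zhejiang Province Science and Technology Program "Jianbing Lingyan + X" R&D Program (2024C03256), the Zhejiang Provincial Natural Science Foundation (LY21D050001, LGF21D010001), and the Key Project of the Zhejiang Meteorological Science and Technology Program (2022ZD14).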


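    Abstract (translated from the Chinese):

    This paper analyses the computational characteristics of the GRAPES_MESO (Global/Regional Assimilation PrEdiction System-Mesoscale version) and WRF (Weather Research and Forecasting Model) models on the domestic KUNPENG platform, compares them with the Intel (X86) platform, and explores the room for improvement on the KUNPENG platform in resource usage, computational bottlenecks, and hotspot functions. The results show that, after adaptation, both models produce results on the KUNPENG platform that are consistent with those on the Intel X86 platform, and both exhibit good parallel scalability. Both models have high CPU utilisation, their computational bottlenecks are concentrated mainly in the back-end CPU, and overall node memory usage is appropriate, so subsequent optimisation should focus on code efficiency, algorithms, and memory access. On the KUNPENG platform, the MPI communication efficiency of GRAPES_MESO could be improved by optimising the Collective Sync, Allreduce, and Wait algorithms of collective communication. GRAPES_MESO could also be optimised by targeting hotspots such as the GCR algorithm, the collective communication hotspots represented by uct and ucg, mathematical functions such as expf and powf, and malloc memory operations.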

    Abstract:

    The computational characteristics of the GRAPES_MESO and WRF numerical models are analysed on the KUNPENG platform and compared with those on the Intel (X86) platform, in order to identify room for improvement on the KUNPENG platform in resource utilisation, computational bottlenecks, and hotspot functions. The results indicate that: (1) After adaptation, both models produce results on the domestic KUNPENG platform that are consistent with those on the X86 platform. (2) Both models exhibit good parallel scalability on both the X86 and KUNPENG platforms. With the same number of processes, the computing efficiency of the KUNPENG platform is 65% to 90% of that of the X86 platform; with the same number of nodes, it exceeds that of the X86 platform by 22% to 45%. (3) In terms of hardware resource utilisation, both models spend the most time on computation, followed by communication and then IO. CPU usage is high and node memory usage is appropriate, so subsequent optimisation should focus on code efficiency, algorithms, and memory access. (4) In terms of MPI communication, the communication efficiency of the GRAPES_MESO model on the KUNPENG platform could be improved by optimising the Collective Sync, Allreduce, and Wait algorithms of collective communication. (5) Top-down analysis shows that the computing bottlenecks of the two models on both platforms are concentrated mainly in the back-end CPU and the back-end memory subsystem. Thanks to the optimisation of multiple memory channels and the Bisheng compiler, the memory access efficiency, branch prediction rate, and cache hit rate of the GRAPES_MESO model are higher on the KUNPENG platform than on the X86 platform. The memory subsystem bottleneck information shows that TLB misses and L1/L2 misses are generally low, so memory access is already efficient and leaves limited room for optimisation. The instruction distribution shows a relatively high proportion of memory-read and integer instructions, together with a certain share of floating-point instructions, which reflects the high memory bandwidth advantage of the KUNPENG architecture. The share of vectorised instructions is low, so vectorisation is worth considering as an optimisation. (6) Hotspot analysis suggests that, on the KUNPENG platform, the GRAPES_MESO model could be optimised by targeting the GCR algorithm, the collective communication hotspots represented by uct and ucg, mathematical functions such as expf and powf, and hot functions such as malloc memory operations.
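
    Result (2) can look contradictory at first: KUNPENG is slower per process yet faster per node. The following minimal sketch shows one way the two figures can coexist, under the simplifying assumption (not stated in the abstract) that each core hosts one process and that per-process efficiency stays the same when a node is fully populated. The core counts are hypothetical placeholders, not the hardware used in the paper, and the function names are illustrative only.

```python
def per_node_ratio(per_process_efficiency: float,
                   cores_kunpeng: int,
                   cores_x86: int) -> float:
    """Estimated KUNPENG/X86 throughput ratio at an equal node count.

    Assumes one process per core and ideal scaling inside a node,
    so the per-process efficiency is simply scaled by the core ratio.
    """
    return per_process_efficiency * cores_kunpeng / cores_x86


def break_even_core_ratio(per_process_efficiency: float) -> float:
    """Core-count ratio (KUNPENG/X86) above which KUNPENG wins per node."""
    return 1.0 / per_process_efficiency


# Hypothetical core counts per node, chosen only for illustration.
HYPOTHETICAL_KUNPENG_CORES = 128
HYPOTHETICAL_X86_CORES = 64

# 0.65 and 0.90 are the per-process efficiency bounds quoted in the abstract.
for eff in (0.65, 0.90):
    ratio = per_node_ratio(eff, HYPOTHETICAL_KUNPENG_CORES, HYPOTHETICAL_X86_CORES)
    print(f"efficiency {eff:.2f}: per-node ratio {ratio:.2f}, "
          f"break-even core ratio {break_even_core_ratio(eff):.2f}")
```

    Under these assumptions a node only needs somewhat more cores than its X86 counterpart to come out ahead per node. The 22% to 45% range reported in the abstract is a measured result and will also reflect non-ideal scaling, which this sketch ignores.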

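Cite this article

Chen Feng, He Mingyang, Chen Yefeng, Wu Bingcheng, Xu Cheng. Analysis of High-Performance Computing Characteristics of GRAPES_MESO and WRF Models on Kunpeng Platform[J]. 气象科技 (Meteorological Science and Technology), 2025, 53(3): 347-361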

History
  • Received: 2024-04-11
  • Final manuscript: 2025-01-07
  • Published online: 2025-06-27