A Resource Reuse Strategy for Large-Scale Matrix Operations in HLS-Based FPGA Design
| Authors | Zhihang Lei · Chubo Liu · Zheng Xiao · Baixuan Wu · Anthony Theodore Chronopoulos · Kenli Li |
| Journal | IEEE Transactions on Industrial Informatics |
| Published | December 2025 |
| Volume/Issue | Vol. 22, No. 2 |
| Category | Control & Algorithms |
| Tags | Model Predictive Control (MPC) · Machine Learning · Deep Learning · AI Applications |
| Relevance Score | ★★ 2.0 / 5.0 |
| Keywords | |
Chinese Abstract (translated)
This paper addresses the resource-versus-latency tradeoff that arises when the large-scale matrix operations of deep learning models are implemented through FPGA high-level synthesis (HLS). It proposes a resource-reuse optimization strategy based on integer linear programming (ILP), achieving a 2.21× latency reduction and a 2.14× energy reduction on ResNet, and offering a new optimization paradigm for AI-accelerator hardware design.
English Abstract
Matrix operations (MOPs) are essential for various computational tasks, particularly in deep learning models, which have grown increasingly complex. As these models expand, their demand for computational resources increases significantly, making deployment on resource-limited hardware platforms, such as field-programmable gate arrays, more challenging, especially in balancing resource allocation and computation latency. Reuse-control techniques have been employed to optimize resource allocation by enabling multiple operations to share the same computational units, such as digital signal processors. Yet, this approach presents a tradeoff between resource utilization and latency. In this study, we tackle this challenge by thoroughly analyzing existing reuse-control mechanisms and introducing a novel integer linear programming (ILP)-based strategy. Our experimental results demonstrate that the proposed approach not only improves resource utilization for large-scale MOPs but also significantly reduces latency compared to existing methods. In the best case, on the ResNet model, our ILP-based method achieves up to 2.21× lower latency and 2.14× lower energy consumption per inference, demonstrating significantly improved performance and energy efficiency. In addition, our work provides a new optimization perspective for hardware design based on high-level synthesis.
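To make the reuse/latency tradeoff concrete, the following is a minimal illustrative sketch, not the paper's actual ILP formulation: each matrix-operation kernel is assigned a reuse factor that time-multiplexes its multiply-accumulates over fewer DSPs, and we pick the factors that minimize idealized latency under a shared DSP budget. The kernel names, MAC counts, DSP counts, and budget are all assumed for illustration; a real flow would hand the ILP to a solver, while here a tiny exhaustive search stands in for one.

```python
# Illustrative sketch (assumed numbers, not from the paper): choose a DSP
# reuse factor per matrix-op kernel to minimize total latency under a
# shared DSP budget. Exhaustive search substitutes for an ILP solver.
from itertools import product

# Hypothetical kernels: (name, MAC count, DSPs needed at full parallelism)
kernels = [("conv1", 1024, 64), ("fc1", 512, 32), ("fc2", 256, 16)]
DSP_BUDGET = 48  # assumed total DSPs available for sharing

best = None
# Reuse factor r folds a kernel's MACs onto (full_dsps // r) units,
# trading latency for resources; r = 1 means fully parallel.
for factors in product([1, 2, 4, 8], repeat=len(kernels)):
    dsps_used = sum(d // r for (_, _, d), r in zip(kernels, factors))
    if dsps_used > DSP_BUDGET:
        continue  # violates the resource constraint
    # Idealized latency: MACs divided by allocated DSPs (unit-time MACs)
    latency = sum(macs / (d // r) for (_, macs, d), r in zip(kernels, factors))
    if best is None or latency < best[0]:
        best = (latency, factors, dsps_used)

latency, factors, dsps_used = best
print(f"reuse factors={factors}, DSPs used={dsps_used}, latency={latency}")
# → reuse factors=(4, 2, 1), DSPs used=48, latency=112.0
```

With these assumed numbers, sharing DSPs unevenly (heavier reuse on the large conv kernel, none on the small one) beats any uniform reuse setting, which is exactly the kind of allocation decision the paper's ILP automates across a full model.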
SunView In-Depth Analysis
This paper focuses on HLS-level resource-scheduling optimization for AI matrix operations on FPGAs. Although it does not directly concern PV or energy-storage power conversion, its ILP-driven low-latency, energy-efficient compute architecture offers potential reference value for edge intelligent diagnostics on Sungrow's iSolarCloud platform (e.g., string-level fault identification), real-time BMS state prediction in PowerTitan energy storage systems, and lightweight inference acceleration for embedded AI coprocessors in ST-series PCS. We recommend exploring a co-optimized HLS + AI compilation path for next-generation intelligent PCS and PV-storage coordination controllers.