AI engine
AI engine is a computing architecture created by AMD (formerly by Xilinx, which AMD acquired in 2022[1]). It is commonly used for accelerating linear algebra operations,[2] such as matrix multiplication, used in artificial intelligence algorithms,[3][4] digital signal processing,[5] and more generally, high-performance computing.[6][7] The first products containing AI engines were the Versal adaptive compute acceleration platforms,[8] which combine scalar, adaptable, and intelligent engines connected through a Network on Chip (NoC).[9]
AI engines have evolved significantly as computing workloads have changed, with later generations directed toward accelerating AI applications. The basic architecture of a single AI engine integrates vector processors and scalar processors to implement Single Instruction Multiple Data (SIMD)[10][11] capabilities. AI engines are integrated with other architectures such as FPGAs, CPUs, and GPUs, yielding a range of platforms for high-performance, heterogeneous computation with applications in different domains.[12][13][14]
Etymology
AMD says that "AI" in AI Engine is not an acronym for artificial intelligence or any other term.[15]
History
The AMD AI engines were originally released by Xilinx, Inc., an American company active in the creation of field-programmable gate arrays (FPGAs).[16] Their initial goal was to accelerate signal processing and, more generally, applications where data parallelism could offer significant improvements. Initially, AI engines were released combined with an FPGA layer in the Versal platforms.[8] The initial systems, the VCK190 and VCK5000, both based on the VC1902 device, contained 400 AI engines in their AI engine layer. For connectivity, this architecture class relied on a Network on Chip, a high-performance interconnect devised to become the core connectivity of modern FPGA fabric.[9]
In 2022, the AI engine project changed when Xilinx was officially acquired by AMD,[1] an American company active in the computing architecture market. The AI engines were integrated with other computing systems to target a wider range of applications, AI workloads in particular. Indeed, even though the Versal architecture proved powerful, it was complicated and unfamiliar to a large segment of the academic and industrial community.[12] For this reason, AMD, along with third-party developers, began releasing improved toolsets and software stacks aimed at simplifying the programming challenges posed by the platform, targeting productivity and programmability.[17][18][19][20]
Aware of the needs of AI workloads, in 2023 AMD announced the AI engine ML (AIE-ML),[21] the second generation of the architecture. It added support for AI-specific data types like bfloat16,[22] a common data type for deep learning applications. This version retained the vector processing capabilities of the previous generation, but enlarged the memory to support more intermediate computations.[23] From this generation on, AMD has integrated AI engines with other processing units like CPUs and GPUs in its Ryzen AI processors. In such systems, AI engines are usually referred to as Compute Tiles, self-contained processing blocks designed to efficiently execute AI and signal processing workloads. These blocks are integrated with two other types of tiles,[17][24] namely the Memory Tile and the Shim Tile. The structure interconnecting the three kinds of tiles is named XDNA,[25] and its first generation, XDNA 1, was released on Ryzen AI Phoenix PCs. Along with this release, AMD continued its work on programmability, releasing Riallto as an open-source tool.[26]
Similarly, around the end of 2023 and early 2024, AMD announced XDNA 2, along with the Strix series of Ryzen AI architectures.[27][28] Unlike the first generation of XDNA, the second offers more compute units to target the massive workloads of ML systems. To continue its efforts on programmability, AMD released the open-source Ryzen AI SW toolchain, which includes the tools and runtime libraries for optimizing and deploying AI inference on Ryzen AI PCs.[25]
Lastly, as neural processing and deep learning applications spread across different domains, researchers and industry have begun referring to XDNA architectures as Neural Processing Units (NPUs). However, the term covers all architectures specifically designed for deep learning workloads,[29] and several companies, such as Huawei[30] and Tesla,[31] have proposed their own alternatives.
Hardware architecture
AI engine tile

A single AI engine is a 7-way VLIW[11][32] processor that offers vector and scalar capabilities, enabling parallel execution of multiple operations per clock cycle. The architecture includes a 128-bit wide vector unit capable of SIMD (Single Instruction, Multiple Data) execution, a scalar unit for control and sequential logic, and a set of load/store units for memory access. The maximum vector register size is 1024 bits, so the number of elements in a vector depends on the vector data type.[32]
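The relationship between register size and vector length is plain division. The sketch below is illustrative arithmetic only, assuming the 1024-bit maximum register size quoted above; the list of data types is chosen for illustration and is not a statement of which types the hardware supports.

```python
# Illustrative arithmetic only: lanes = register width / element width.
REGISTER_BITS = 1024  # maximum vector register size quoted in the text

# Element widths in bits for a few common data types (chosen for illustration).
ELEMENT_BITS = {"int8": 8, "int16": 16, "int32": 32, "float32": 32}


def lanes_per_register(element_bits: int, register_bits: int = REGISTER_BITS) -> int:
    """Return how many elements of the given width fit in one register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return register_bits // element_bits


lanes = {name: lanes_per_register(bits) for name, bits in ELEMENT_BITS.items()}
for name, count in lanes.items():
    print(f"{name:>8}: {count} lanes")
```

Narrower element types therefore pack more values into the same register, which is why the vector size varies with the data type.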
In the first generation, each AI engine tile has 32 KB of data memory for partial computations and 16 KB of program memory.[32]
AI engines are statically scheduled architectures. As widely studied in the literature, static scheduling suffers from code explosion, so writing an AI engine kernel requires manual code optimizations to handle this side effect.[20][11]
The main programming language for a single AI engine is C++, used for both the connection declaration among multiple engines and the kernel logic executed by a specific AI engine tile.[33] However, different toolchains can offer support for other programming languages, targeting specific applications or offering automation.[20]
First generation - the AI engine layer

In the first generation of Versal systems, each AI engine is connected to multiple other engines through three main interfaces, namely cascade, memory and stream interfaces. Each one represents a possible communication mechanism of each AI engine with the others.[6]
The AI engine layer of the first Versal systems combined 400 AI engines.[34] Each AI engine has 32 KB of memory, which can be extended up to 128 KB by using the memory of neighbouring engines. This reduces the number of engines actually available for computation but provides a larger data memory.[8][20]
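The trade-off between memory per kernel and usable compute cores can be illustrated with back-of-the-envelope arithmetic. This is a sketch that uses only the figures quoted above; the bound in the last step rests on a deliberately simple assumption (every kernel claims the full 128 KB and no memory is shared), which real designs need not follow.

```python
# Figures quoted in the text for the first-generation AI engine layer.
TILE_MEMORY_KB = 32    # data memory local to one AI engine
EXTENDED_KB = 128      # memory reachable when neighbouring engines' memory is used
NUM_ENGINES = 400      # AI engines in the layer

# How many 32 KB tile memories make up one 128 KB extended region.
tiles_per_extended_region = EXTENDED_KB // TILE_MEMORY_KB

# Total data memory in the layer.
total_memory_kb = NUM_ENGINES * TILE_MEMORY_KB

# Assumption for illustration: each kernel uses a full, unshared 128 KB region.
# Under that assumption this is the largest number of kernels the memory allows.
max_full_kernels = total_memory_kb // EXTENDED_KB

print(tiles_per_extended_region, total_memory_kb, max_full_kernels)
```

Under this simplified assumption, only a quarter of the engines could run kernels at once, which is the sense in which larger data memory comes at the cost of compute cores.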
Each AI engine can execute an independent function, or multiple functions by leveraging time multiplexing. The programming structure used to describe the AI engine instantiation, placement and connection is named AIE graph. The official programming model suggested by AMD requires writing such a file in C++. However, different programming toolchains, from both companies and research, can support different alternatives to improve programmability and/or performance.[20][24]
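The kind of information an AIE graph carries (which kernels exist, where they are placed, and how they are connected) can be pictured with a toy data structure. The sketch below is purely conceptual: `ToyGraph`, its methods, and the kernel names are hypothetical and are not the AMD graph API, which is written in C++.

```python
from dataclasses import dataclass, field

# The three interface kinds named in the text for the first generation.
INTERFACES = {"cascade", "memory", "stream"}


@dataclass
class ToyGraph:
    """Hypothetical stand-in for an AIE graph: instantiation, placement, connection."""
    kernels: dict = field(default_factory=dict)      # kernel name -> (column, row) tile
    connections: list = field(default_factory=list)  # (source, destination, interface)

    def add_kernel(self, name, tile):
        self.kernels[name] = tile

    def connect(self, src, dst, interface="stream"):
        if src not in self.kernels or dst not in self.kernels:
            raise KeyError("both endpoints must be instantiated first")
        if interface not in INTERFACES:
            raise ValueError(f"unknown interface: {interface}")
        self.connections.append((src, dst, interface))

    def time_multiplexed_tiles(self):
        """Return the tiles that host more than one kernel."""
        by_tile = {}
        for name, tile in self.kernels.items():
            by_tile.setdefault(tile, []).append(name)
        return {tile: names for tile, names in by_tile.items() if len(names) > 1}


graph = ToyGraph()
graph.add_kernel("fir", (0, 0))
graph.add_kernel("scale", (0, 0))        # shares tile (0, 0) with "fir"
graph.add_kernel("accumulate", (1, 0))
graph.connect("fir", "scale", "memory")
graph.connect("scale", "accumulate", "stream")
shared = graph.time_multiplexed_tiles()
print(shared)
```

Here two kernels placed on the same tile stand in for time multiplexing, while the third kernel runs independently on its own tile.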
To compile the application, the original toolchain relies on a closed-source AI engine compiler that automatically performs placement and routing, although custom constraints can be given when writing the AIE graph.[35]
As AI engines were initially integrated in Versal systems only, combined with FPGA capabilities and Network on Chip connectivity, this architectural layer also offers a limited number of direct connections to both. Such connections need to be specified both in the AIE graph, to ensure a correct placement of the AI engines, and during the system-level design.[20][7]
Second generation - the AI engine ML
The second generation of AMD's AI engines, or AI engine ML (AIE-ML), provides some architectural modifications with respect to the first generation, focusing on performance and efficiency for machine-learning workloads.[23]
AIE-ML offers almost twice the compute density per tile, improved memory bandwidth, and native support for data types optimized for AI inference workloads, such as INT8 and bfloat16. These optimizations allow the second-generation engine to deliver up to three times more TOPS per watt than the first-generation AI engine, which was primarily built for DSP-heavy workloads and required explicit SIMD programming and hand-coded data partitioning.[3]
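For readers unfamiliar with bfloat16, the format keeps the sign bit and the 8 exponent bits of an IEEE 754 float32 and shortens the mantissa to 7 bits, so a bfloat16 bit pattern is the upper half of a float32. The sketch below shows this with simple truncation; it is a software illustration only, the function names are hypothetical, and hardware implementations typically round rather than truncate.

```python
import struct


def float32_to_bfloat16_bits(value: float) -> int:
    """Keep the upper 16 bits of the float32 encoding (truncation, no rounding)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", value))
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a float by zero-filling the low bits."""
    (value,) = struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))
    return value


one = float32_to_bfloat16_bits(1.0)
approx_pi = bfloat16_bits_to_float(float32_to_bfloat16_bits(3.14159))
print(hex(one), approx_pi)
```

The round trip keeps the full float32 dynamic range but only about two to three significant decimal digits, a trade-off that deep learning workloads generally tolerate.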
Recent publications from researchers and institutions[36] confirm that AIE-ML offers better scalability, more on-chip memory, and more computational power,[3] making it better suited for modern edge ML inference workloads. Together, these advances address the limitations of the first generation.[23]
According to AMD's official documentation, the two architectures share some key similarities and differ in several respects.[23]
| Similarities between AIE-ML and AIE | Differences between AIE-ML and AIE | 
|---|---|
| Same process, voltage, frequency, clock and power distribution | AIE-ML features doubled compute/memory. AIE-ML features a processor bus for direct read/write accesses to local tile memory-mapped registers. | 
| One VLIW SIMD processor per tile | AIE-ML features an increased memory capacity (64 KB) | 
| Same debug functionality | AIE-ML features an improved power efficiency (TOPS/W). | 
| Same connectivity with PL and NoC | AIE-ML features an improved stream switch functionality, performing source to destination parity check and deterministic merge | 
| Same bandwidth for stream interconnect | AIE-ML features a grid-array architecture supporting both vertical (top to bottom) and horizontal (left to right) 512-bit cascade, versus the 384-bit horizontal cascade only of AIE. | 
XDNA 1

The XDNA is the hardware layer combining three types of tiles:[24][25]
- The Compute Tile (AI engine ML) is responsible for executing vector and scalar operations.
- The Memory Tile provides 512 KB of local memory and performs pattern-specific data movements to serve fetch requests from the Compute Tiles.
- The Shim Tile handles the interaction with host memory and controls the data exchanges between Memory and Compute Tiles.
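The division of labour among the three tile types can be pictured as a two-level chunking of a host buffer: pieces sized for a Memory Tile, each split again into pieces sized for a Compute Tile. The sketch below is a conceptual illustration only, assuming the 512 KB Memory Tile size quoted above and the 64 KB AIE-ML memory capacity from the comparison table; the function names are hypothetical, and real data movement would reserve space for buffering and follow access patterns that are not modelled here.

```python
MEMORY_TILE_BYTES = 512 * 1024   # Memory Tile capacity quoted in the text
COMPUTE_TILE_BYTES = 64 * 1024   # AIE-ML memory capacity from the comparison table


def split(total_bytes, chunk_bytes):
    """Yield (offset, length) pairs that cover total_bytes in chunk_bytes pieces."""
    for offset in range(0, total_bytes, chunk_bytes):
        yield offset, min(chunk_bytes, total_bytes - offset)


def plan_transfers(total_bytes):
    """Return [(memory_tile_chunk, [compute_tile_chunks...]), ...] for a host buffer."""
    plan = []
    for mem_off, mem_len in split(total_bytes, MEMORY_TILE_BYTES):
        sub_chunks = [(mem_off + off, length)
                      for off, length in split(mem_len, COMPUTE_TILE_BYTES)]
        plan.append(((mem_off, mem_len), sub_chunks))
    return plan


# A host buffer of 1 MiB plus 4 KiB, chosen so the last chunk is partial.
plan = plan_transfers(1024 * 1024 + 4096)
for memory_chunk, compute_chunks in plan:
    print(memory_chunk, len(compute_chunks))
```

In this picture the outer loop corresponds to transfers between host memory and a Memory Tile, and the inner split corresponds to what a Memory Tile hands to a Compute Tile.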
The XDNA architecture is combined with other architectural layers such as CPUs and GPUs, for Ryzen AI Phoenix architectures, composing the AMD product line for energy-efficient inference and AI workloads.[24]
XDNA 2
The second generation of XDNA is integrated within the Ryzen AI Strix architecture, and AMD describes it as specifically tailored for LLM inference workloads.[25]
Tools and programming model
The main programming environment for AI engines officially supported by AMD is the Vitis flow, which uses the Vitis toolchain to program the hardware accelerator.[33][37][7]

Vitis offers support for both hardware and software developers using a unified development environment, including high-level synthesis, RTL-based flows, and domain-specific libraries.[38] Vitis enables applications to be deployed onto heterogeneous platforms, including AI engines, FPGAs, and scalar processors.[38]
Newer designs are moving towards an approach that uses Vitis for hardware and IP design while relying on Vivado for system integration and hardware setup. Vivado,[39] also part of the AMD toolchain ecosystem, is primarily used for RTL design and IP integration and offers a GUI-based environment to create block designs and manage synthesis, implementation, and bitstream generation.[39]
For the AI engine layer, the main programming language for a single AI engine is C++, used for both the connection declaration among multiple engines and the kernel logic executed by a specific AI engine tile.[33]
Research toolchains
In parallel with AMD's efforts on programming models, design flows, and tools, researchers have proposed their own toolchains targeting programmability, performance, or simpler development for a subset of applications.[20][40][24][19]
Some of the main research toolchains are briefly described below.[41][20][40][19]
- IRON is an open-source toolchain developed by AMD in collaboration with several researchers. The IRON toolchain uses MLIR as its intermediate representation.[41] At the user level, IRON provides a Python API for placing and orchestrating multiple AI engines. Such Python code is then translated into MLIR using one of two possible backends: a Vitis-based backend and an open-source backend using the Peano compiler.[24] IRON still relies on C++ for kernel development, supporting all the APIs of the standard AI engine kernel development flow.[24]
- ARIES (An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI engines) presents a high-level, tile-based programming model and shared MLIR intermediate representation encompassing both AI engines and FPGA fabric. It represents task-level, tile-level, and instruction-level parallelism in MLIR and accommodates global and local optimization passes. ARIES generates compact C++ code for AI engine kernels and data-movement logic, allowing kernel specification through Python.[20]
- EA4RCA is aimed at a specialized subclass of algorithms, regular Communication-Avoiding algorithms. EA4RCA introduces a design environment optimized for the Versal heterogeneity, emphasizing AI engine performance and high-speed data streaming abstractions. EA4RCA is aimed at algorithms exhibiting regular communication patterns to make the most out of parallelism and hierarchies of memory in the Versal platform.[40]
- CHARM is a framework to compose multiple diverse matrix multiplication accelerators working concurrently towards different layers within one application. CHARM includes analytical models which guide design space exploration to determine accelerator partitions and layer scheduling.[19]
See also
- Central processing unit
- Field programmable gate arrays
- Flynn's taxonomy
- Hardware acceleration
- Neural processing unit
- NVIDIA deep learning accelerator
- Vivado
References
- ^ a b "AMD Completes Acquisition of Xilinx". Advanced Micro Devices, Inc. 2022-02-14. Retrieved 2025-07-08.
- ^ "Developing a BLAS library for the AMD AI Engine Extended Abstract". arxiv.org. Retrieved 2025-07-09.
- ^ a b c Mhatre, Kaustubh; Taka, Endri; Arora, Aman (2025-04-15), GAMA: High-Performance GEMM Acceleration on AMD Versal ML-Optimized AI Engines, arXiv:2504.09688
- ^ Chen, Paul; Manjunath, Pavan; Wijeratne, Sasindu; Zhang, Bingyi; Prasanna, Viktor (2023-09-04). "Exploiting On-Chip Heterogeneity of Versal Architecture for GNN Inference Acceleration". 2023 33rd International Conference on Field-Programmable Logic and Applications (FPL). IEEE. pp. 219–227. doi:10.1109/FPL60245.2023.00038. ISBN 979-8-3503-4151-5.
- ^ Flores, Fernando; Peña, María Dolores Valdés; Sánchez, José Manuel Villapún; Pazo, Jesús Manuel Costa; Graña, Camilo Quintáns (2024-11-13). "Evaluation of the Versal Intelligent Engines for Digital Signal Processing Basic Core Units". 2024 39th Conference on Design of Circuits and Integrated Systems (DCIS). IEEE. pp. 1–6. doi:10.1109/DCIS62603.2024.10769170. ISBN 979-8-3503-6439-2.
- ^ a b "AI Engine: Meeting the Compute Demands of Next-Generation Applications".
- ^ a b c Menzel, Johannes; Plessl, Christian (2025-05-04). "Efficient and Distributed Computation of Electron Repulsion Integrals on AMD AI Engines". 2025 IEEE 33rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). pp. 95–104. doi:10.1109/FCCM62733.2025.00044. ISBN 979-8-3315-0281-2.
- ^ a b c Vissers, Kees (2019-02-20). "Versal: The Xilinx Adaptive Compute Acceleration Platform (ACAP)". Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. FPGA '19. New York, NY, USA: Association for Computing Machinery. p. 83. doi:10.1145/3289602.3294007. ISBN 978-1-4503-6137-8.
- ^ a b Swarbrick, Ian; Gaitonde, Dinesh; Ahmad, Sagheer; Gaide, Brian; Arbel, Ygal (2019-02-20). "Network-on-Chip Programmable Platform in Versal™ ACAP Architecture". Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. FPGA '19. New York, NY, USA: Association for Computing Machinery. pp. 212–221. doi:10.1145/3289602.3293908. ISBN 978-1-4503-6137-8.
- ^ Chhugani, Jatin; Nguyen, Anthony D.; Lee, Victor W.; Macy, William; Hagog, Mostafa; Chen, Yen-Kuang; Baransi, Akram; Kumar, Sanjeev; Dubey, Pradeep (2008-08-01). "Efficient implementation of sorting on multi-core SIMD CPU architecture". Proc. VLDB Endow. 1 (2): 1313–1324. doi:10.14778/1454159.1454171. ISSN 2150-8097.
- ^ a b c Hennessy, John L.; Patterson, David A. (2019). Computer architecture: a quantitative approach. Krste Asanović (Sixth ed.). Cambridge, Mass: Morgan Kaufmann Publishers, an imprint of Elsevier. ISBN 978-0-12-811905-1.
- ^ a b Brown, Nick (2023-02-12). "Exploring the Versal AI Engines for Accelerating Stencil-based Atmospheric Advection Simulation". Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays. FPGA '23. New York, NY, USA: Association for Computing Machinery. pp. 91–97. arXiv:2301.13016. doi:10.1145/3543622.3573047. ISBN 978-1-4503-9417-8.
- ^ Shimamura, Kotaro; Ohno, Ayumi; Takamaeda-Yamazaki, Shinya (2025-02-17), Exploring the Versal AI Engine for 3D Gaussian Splatting, arXiv:2502.11782
- ^ Brown, Nick; Canal, Gabriel Rodríguez (2025-02-14), "Seamless Acceleration of Fortran Intrinsics via AMD AI Engines", Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, p. 185, arXiv:2502.10254, doi:10.1145/3706628.3708854, ISBN 979-8-4007-1396-5
- ^ "AMD Customer Community - AI engine name". adaptivesupport.amd.com. Retrieved 2025-07-10. We do not define it so any time you see it defined as Artificial Intelligence Engine (I have seen this in many literature paper from Universities) this is incorrect. We kind of imply that AI is for Artificial Intelligence as the AI Engine is very well suited for Artificial Intelligence but it is also well suited for other application such as DSP or image processing. This is why you can also see that this is an Adaptable Intelligent Engine. But any way the only official full name is AI Engine, AI not officially standing for anything specific 
- ^ Mehta, Nick (2014). "UltraScale Architecture: Highest Device Utilization, Performance, and Scalability" (PDF).
- ^ a b Levental, Maksim; Khan, Arham; Chard, Ryan; Chard, Kyle; Neuendorffer, Stephen; Foster, Ian (2024-06-19). "An End-to-End Programming Model for AI Engine Architectures". 14th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART'24). HEART '24. New York, NY, USA: Association for Computing Machinery. pp. 135–136. doi:10.1145/3665283.3665294. ISBN 979-8-4007-1727-7.
- ^ Nguyen, Tan; Blair, Zachary; Neuendorffer, Stephen; Wawrzynek, John (2023-09-04). "SPADES: A Productive Design Flow for Versal Programmable Logic". 2023 33rd International Conference on Field-Programmable Logic and Applications (FPL). pp. 65–71. doi:10.1109/FPL60245.2023.00017. ISBN 979-8-3503-4151-5.
- ^ a b c d Zhuang, Jinming; Lau, Jason; Ye, Hanchen; Yang, Zhuoping; Du, Yubo; Lo, Jack; Denolf, Kristof; Neuendorffer, Stephen; Jones, Alex; Hu, Jingtong; Chen, Deming; Cong, Jason; Zhou, Peipei (2023-02-12). "CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture". Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays. FPGA '23. New York, NY, USA: Association for Computing Machinery. pp. 153–164. doi:10.1145/3543622.3573210. ISBN 978-1-4503-9417-8.
- ^ a b c d e f g h i Zhuang, Jinming; Xiang, Shaojie; Chen, Hongzheng; Zhang, Niansong; Yang, Zhuoping; Mao, Tony; Zhang, Zhiru; Zhou, Peipei (2025-02-27). "ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines". Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays. FPGA '25. New York, NY, USA: Association for Computing Machinery. pp. 92–102. doi:10.1145/3706628.3708870. ISBN 979-8-4007-1396-5.
- ^ Delaye, Elliott (2022-05-30). "CGRA4HPC 2022 Invited Speaker: Mapping ML to the AMD/Xilinx AIE-ML architecture". 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). p. 628. doi:10.1109/IPDPSW55747.2022.00109. ISBN 978-1-6654-9747-3.
- ^ Kalamkar, Dhiraj; Mudigere, Dheevatsa; Mellempudi, Naveen; Das, Dipankar; Banerjee, Kunal; Avancha, Sasikanth; Vooturi, Dharma Teja; Jammalamadaka, Nataraj; Huang, Jianyu (2019-06-13), A Study of BFLOAT16 for Deep Learning Training, arXiv:1905.12322
- ^ a b c d e "AMD Technical Information Portal - AIE-ML comparison with AIE". docs.amd.com. Retrieved 2025-07-09.
- ^ a b c d e f g h Hunhoff, Erika; Melber, Joseph; Denolf, Kristof; Bisca, Andra; Bayliss, Samuel; Neuendorffer, Stephen; Fifield, Jeff; Lo, Jack; Vasireddy, Pranathi; James-Roxby, Phil; Keller, Eric (2025-05-04). "Efficiency, Expressivity, and Extensibility in a Close-to-Metal NPU Programming Interface". 2025 IEEE 33rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE. pp. 85–94. doi:10.1109/FCCM62733.2025.00043. ISBN 979-8-3315-0281-2.
- ^ a b c d Rico, Alejandro; Pareek, Satyaprakash; Cabezas, Javier; Clarke, David; Ozgul, Baris; Barat, Francisco; Fu, Yao; Münz, Stephan; Stuart, Dylan; Schlangen, Patrick; Duarte, Pedro; Date, Sneha; Paul, Indrani; Weng, Jian; Santan, Sonal (2024-07-10). "AMD XDNA NPU in Ryzen AI Processors". IEEE Micro. 44 (6): 73–82. Bibcode:2024IMicr..44f..73R. doi:10.1109/MM.2024.3423692. ISSN 1937-4143.
- ^ Schmidt, Andrew (2024-05-27). "RAW 2024 Invited Talk-9: Riallto: An Open-Source Exploratory Framework for Ryzen AI™". 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE. p. 91. doi:10.1109/IPDPSW63119.2024.00030. ISBN 979-8-3503-6460-6.
- ^ Alcorn, Paul (July 15, 2024). "AMD deep-dives Zen 5 architecture — Ryzen 9000 and AI 300 benchmarks, RDNA 3.5 GPU, XDNA 2, and more". TomsHardware. Archived from the original on July 16, 2024.
- ^ Bonshor, Gavin. "The AMD Zen 5 Microarchitecture: Powering Ryzen AI 300 Series For Mobile and Ryzen 9000 for Desktop". www.anandtech.com. Archived from the original on July 15, 2024. Retrieved 2025-07-09.
- ^ Lee, Kyuho J. (2021-01-01), Kim, Shiho; Deka, Ganesh Chandra (eds.), "Chapter Seven - Architecture of neural processing unit for deep neural networks", Advances in Computers, Hardware Accelerator Systems for Artificial Intelligence and Machine Learning, vol. 122, Elsevier, pp. 217–245, doi:10.1016/bs.adcom.2020.11.001, retrieved 2025-07-08
- ^ a b Liao, Heng; Tu, Jiajin; Xia, Jing; Liu, Hu; Zhou, Xiping; Yuan, Honghui; Hu, Yuxing (2021-02-27). "Ascend: A Scalable and Unified Architecture for Ubiquitous Deep Neural Network Computing : Industry Track Paper". 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). pp. 789–801. doi:10.1109/HPCA51647.2021.00071. ISBN 978-1-6654-2235-2.
- ^ a b Talpes, Emil; Sarma, Debjit Das; Venkataramanan, Ganesh; Bannon, Peter; McGee, Bill; Floering, Benjamin; Jalote, Ankit; Hsiong, Christopher; Arora, Sahil; Gorti, Atchyuth; Sachdev, Gagandeep S. (2020-03-24). "Compute Solution for Tesla's Full Self-Driving Computer". IEEE Micro. 40 (2): 25–35. Bibcode:2020IMicr..40b..25T. doi:10.1109/MM.2020.2975764. ISSN 1937-4143.
- ^ a b c "Very Long Instruction Word (VLIW) Architecture". GeeksforGeeks. 2020-12-01. Retrieved 2025-07-07.
- ^ a b c "AMD Technical Information Portal - Tools". docs.amd.com. Retrieved 2025-07-08.
- ^ "VCK5000 Versal Development Card - Documentation". AMD. Retrieved 2025-07-11.
- ^ "AMD Technical Information Portal - AI engine compiler". docs.amd.com. Retrieved 2025-07-09.
- ^ "Design Rationale of Two Generations of AI Engines" (PDF). indico.cern.ch. Archived (PDF) from the original on 2024-12-17. Retrieved 2025-07-08.
- ^ "AMD Technical Information Portal - AI Engine programming model". docs.amd.com. Retrieved 2025-07-09.
- ^ a b Kathail, Vinod (2020-02-24). "Xilinx Vitis Unified Software Platform". Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. FPGA '20. New York, NY, USA: Association for Computing Machinery. pp. 173–174. doi:10.1145/3373087.3375887. ISBN 978-1-4503-7099-8.
- ^ a b Zhao, Zhipeng; Hoe, James C. (2017-02-22). "Using Vivado-HLS for Structural Design: A NoC Case Study (Abstract Only)". Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. FPGA '17. New York, NY, USA: Association for Computing Machinery. p. 289. doi:10.1145/3020078.3021772. ISBN 978-1-4503-4354-1.
- ^ a b c Zhang, Wenbo; Liu, Yiqi; Zang, Tianhao; Bao, Zhenshan (2024-11-19). "EA4RCA: Efficient AIE accelerator design framework for regular Communication-Avoiding Algorithm". ACM Trans. Archit. Code Optim. 21 (4): 71:1–71:24. doi:10.1145/3678010. ISSN 1544-3566.
- ^ a b Lattner, Chris; Amini, Mehdi; Bondhugula, Uday; Cohen, Albert; Davis, Andy; Pienaar, Jacques; Riddle, River; Shpeisman, Tatiana; Vasilache, Nicolas; Zinenko, Oleksandr (2021-02-21). "MLIR: Scaling Compiler Infrastructure for Domain Specific Computation". 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). pp. 2–14. doi:10.1109/CGO51591.2021.9370308. ISBN 978-1-7281-8613-9.
Further reading
- Bonilla, Joel Lopez; Fond, Benoit; Graichen, Henrik; Hamann, Jan; Beyrau, Frank; Boye, Gunar (2021-09-30). "Thermal characterization of high-performance battery cells during charging and discharging using optical temperature measurement methods". FISITA World Congress 2021 - Technical Programme. FISITA. doi:10.46720/f2021-adm-145. ISBN 978-1-9160259-2-9.
- Perryman, Noah; George, Alan; Goodwill, Justin; Sabogal, Sebastian; Wilson, David; Wilson, Christopher (2025). "Comparative Analysis of Next-Generation Space Computing Applications on AMD-Xilinx Versal Architecture". Journal of Aerospace Information Systems. 22 (2): 103–115. doi:10.2514/1.I011455. ISSN 1940-3151.
- Silvano, Cristina; Ielmini, Daniele; Ferrandi, Fabrizio; Fiorin, Leandro; Curzel, Serena; Benini, Luca; Conti, Francesco; Garofalo, Angelo; Zambelli, Cristian; Calore, Enrico; Schifano, Sebastiano; Palesi, Maurizio; Ascia, Giuseppe; Patti, Davide; Petra, Nicola (2025-06-13). "A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms". ACM Comput. Surv. 57 (11): 286:1–286:39. doi:10.1145/3729215. ISSN 0360-0300.
External links
- "IRON API and MLIR-based AI Engine Toolchain". GitHub.
- "ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines (FPGA'25)". GitHub.
- "AI Engine Development - User Guide".
- "CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture (FPGA'23)". GitHub.
- "AI Engine Intrinsics - documentation".
- "Vitis Tutorials - AI Engine development". GitHub.