The wmma module and the cuf_macros.CUF macro file provide Tensor Core–specific data types, along with routines to load and store data and to perform warp-based matrix multiplications using those data types. Both are required to use Tensor Cores through the WMMA API in CUDA Fortran.

[Table: CUDA Fortran Tensor Core data precisions and WMMA tile sizes]

In this post, I focus on the WMMA interface for double-precision, or real(8), data. The mapping of threads to matrix elements is opaque: the WMMA submatrix datatype (equivalent to the fragment in CUDA C) represents the elements each thread holds of the matrix owned by the warp of threads, along with other metadata. Before the WMMA operation can take place, the operand matrices must be loaded into registers and distributed amongst the threads in the warp. Note that the Volta and Turing architectures support only the cases where the multiplicands are real(2) data.
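To make the load/compute/store flow concrete, here is a minimal sketch of a single-tile kernel. It assumes the wmmaLoadMatrix, wmmaMatMul, and wmmaStoreMatrix routine names from the wmma module, an 8x8x4 tile shape for real(8) operands, and the WMMASubMatrix declaration macro from cuf_macros.CUF; the precision and layout tokens shown are assumptions, so check the wmma module documentation for your compiler version before using them.

```fortran
#include "cuf_macros.CUF"

module mod_wmma_d
contains
  ! Sketch: one warp computes an 8x8 tile of C = A*B + C in real(8).
  ! Tile shape (8x8x4) and type tokens are assumptions for illustration.
  attributes(global) subroutine wmma_dgemm_tile(a, b, c)
    use wmma
    implicit none
    real(8), intent(in)    :: a(8,4), b(4,8)
    real(8), intent(inout) :: c(8,8)
    ! WMMA submatrices: each thread in the warp holds a slice of the
    ! tile plus metadata; the mapping of threads to elements is opaque.
    WMMASubMatrix(WMMAMatrixA, 8, 8, 4, Real8, WMMAColMajor) :: sa
    WMMASubMatrix(WMMAMatrixB, 8, 8, 4, Real8, WMMAColMajor) :: sb
    WMMASubMatrix(WMMAMatrixC, 8, 8, 4, Real8, WMMAKind8)    :: sc

    sc = 0.0_8                            ! zero the accumulator fragments
    call wmmaLoadMatrix(sa, a(1,1), 8)    ! load A tile (leading dimension 8)
    call wmmaLoadMatrix(sb, b(1,1), 4)    ! load B tile (leading dimension 4)
    call wmmaMatMul(sc, sa, sb, sc)       ! warp-wide sc = sa*sb + sc
    call wmmaStoreMatrix(c(1,1), sc, 8)   ! store the 8x8 result tile
  end subroutine wmma_dgemm_tile
end module mod_wmma_d
```

Because the WMMA operation is collective across a warp, the kernel would be launched with at least 32 threads per block, e.g. `call wmma_dgemm_tile<<<1,32>>>(a_d, b_d, c_d)` with device arrays of the matching shapes.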