Da-Yeh University Library

Decoupled Vector-Fetch Architecture with a Scalarizing Compiler.

Record type:	Bibliographic record, electronic resource : Monograph/item
Title/Author:	Decoupled Vector-Fetch Architecture with a Scalarizing Compiler.
Author:	Lee, Yunsup.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2016
Physical description:	157 p.
Note:	Source: Dissertation Abstracts International, Volume: 78-01(E), Section: B.
Contained By:	Dissertation Abstracts International 78-01B(E).
Subject:	Computer science.
ISBN:	9781369057706
Abstract:	As we approach the end of conventional technology scaling, computer architects are forced to incorporate specialized and heterogeneous accelerators into general-purpose processors for greater energy efficiency. Among the prominent accelerators that have recently become more popular are data-parallel processing units, such as classic vector units, SIMD units, and graphics processing units (GPUs). Surveying a wide range of data-parallel architectures and their parallel programming models and compilers reveals an opportunity to construct a new data-parallel machine that is highly performant and efficient, yet a favorable compiler target that maintains the same level of programmability as the others.

Thesis (Ph.D.)--University of California, Berkeley, 2016.

ISBN: 9781369057706

Subjects--Topical Terms:
182962 Computer science.

LDR:02466nmm a2200277 4500
001 476250
005 20170614101409.5
008 181208s2016 ||||||||||||||||| ||eng d
020 $a 9781369057706
035 $a (MiAaPQ)AAI10151006
035 $a AAI10151006
040 $a MiAaPQ $c MiAaPQ
100 1 $a Lee, Yunsup. $3 686842
245 1 0 $a Decoupled Vector-Fetch Architecture with a Scalarizing Compiler.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2016
300 $a 157 p.
500 $a Source: Dissertation Abstracts International, Volume: 78-01(E), Section: B.
500 $a Adviser: Krste Asanovic.
502 $a Thesis (Ph.D.)--University of California, Berkeley, 2016.
520 $a As we approach the end of conventional technology scaling, computer architects are forced to incorporate specialized and heterogeneous accelerators into general-purpose processors for greater energy efficiency. Among the prominent accelerators that have recently become more popular are data-parallel processing units, such as classic vector units, SIMD units, and graphics processing units (GPUs). Surveying a wide range of data-parallel architectures and their parallel programming models and compilers reveals an opportunity to construct a new data-parallel machine that is highly performant and efficient, yet a favorable compiler target that maintains the same level of programmability as the others.
520 $a In this thesis, I present the Hwacha decoupled vector-fetch architecture as the basis of a new data-parallel machine. I reason through the design decisions while describing its programming model, microarchitecture, and LLVM-based scalarizing compiler that efficiently maps OpenCL kernels to the architecture. The Hwacha vector unit is implemented in Chisel as an accelerator attached to a RISC-V Rocket control processor within the open-source Rocket Chip SoC generator. Using complete VLSI implementations of Hwacha, including a cache-coherent memory hierarchy in a commercial 28 nm process and simulated LPDDR3 DRAM modules, I quantify the area, performance, and energy consumption of the Hwacha accelerator. These numbers are then validated against an ARM Mali-T628 MP6 GPU, also built in a 28 nm process, using a set of OpenCL microbenchmarks compiled from the same source code with our custom compiler and ARM's stock OpenCL compiler.
590 $a School code: 0028.
650 4 $a Computer science. $3 182962
690 $a 0984
710 2 0 $a University of California, Berkeley. $b Electrical Engineering and Computer Sciences. $3 686837
773 0 $t Dissertation Abstracts International $g 78-01B(E).
790 $a 0028
791 $a Ph.D.
792 $a 2016
793 $a English