Heterogeneous Computing in ARM Architecture
Alan Tsai, Business Development Manager
Sept 2012

Agenda
- Trends in heterogeneous computing
- GPU computing, with the ARM Mali-T600 series as an example
- Heterogeneous System Architecture (HSA) is the future

Trends in the industry
- Heterogeneous multiprocessing
  - Established approach for SoC design
  - Mix of many specialized accelerators implementing different ISAs
  - Diverse programming approaches lead to a lack of portability
- Parallel computation for performance and efficiency
  - Endorsed at all levels of computer architecture
  - Parallel programming is traditionally difficult
- General-purpose programmability of GPUs
  - Massive parallel computation potential
  - Increasing programmability

What is parallel computing?
- Simply, doing multiple tasks simultaneously
- Task-parallel computing does different tasks concurrently
  - Reading email, playing music and surfing the web are all separate tasks
  - In a multicore system, these can execute simultaneously
- Data-parallel computing does the same operation on a collection of data concurrently
  - Example: adjusting the contrast of the pixels of an image
  - Each thread executes the same code, but on different data
  - Classic SIMD (single instruction, multiple data)
- GPU computing is perfect for data-parallel applications

What is heterogeneous computing?
- Diagram: CPU + GPU
- GPUs used as computational accelerators or companion processors
- Massively parallel architecture gives great computational capabilities
- Cost-effective and efficient, with great floating-point performance

Complementary processor architectures
- The CPU: serial workloads and task-parallel workloads
- The GPU: 2D/3D graphics and stream processing; 50 pipeline stages, very high latency, high throughput

GPU compute making the difference
- Use cases: computer vision; real-time still and moving image perfection; upscaling; multi-perspective vision; 2D to 3D; information extraction; multi-user interaction; light-field photography; computational photography
- Benefits: more efficient processing; improved accuracy/quality; BOM reduction; unlocks new use cases; improves existing use cases
- Trends: heterogeneous computing, portability, parallel computation, hardware acceleration, GPU computing

GPU computing: Mali-T600 as an example

Mali-T600 GPU series overview
- Innovation and market leadership
  - Tri-pipe ALU design, optimal for both graphics and GPU compute
  - Native 64-bit integer and floating point (IEEE 754-2008), scalar and SIMD
- Flexibility and scalability
  - Mali-T624 and Mali-T628 for smartphones and smart TVs
  - Mali-T678 for the best compute and graphics in tablets
- Software compatibility and comprehensive API support
  - DirectX 11, OpenGL ES 3.0, OpenCL Full Profile and RenderScript compute
- Performance
  - Hundreds of GFLOPS of arithmetic performance
- (Image: Mali-T628)

What about OpenCL?
- OpenCL is an API for heterogeneous computing
  - Write one source, deploy on many types of processors
  - Currently, it is targeted at data-parallel applications
- Applications use kernels to process data provided to the OpenCL runtime
  - Kernels are written in OpenCL C: a subset of C99 with the addition of vector data types (e.g. float4)
- The application:
  - initializes the OpenCL runtime
  - compiles and links the kernel
  - creates and initializes data buffers
  - executes the kernel and collects the results

GPU computing with no compromises
- The Embedded Profile is a subset of the Full Profile, with reduced features and precision
- All shipping processors openly programmable with OpenCL 1.1 are Full Profile
- All mainstream developers are producing for Full Profile
- All existing software in the industry has been developed for Full Profile
- With Mali-T600, ARM is the first IP vendor to pass conformance for OpenCL 1.1 Full Profile
- Features and benefits:
  - Native support for 64-bit integer maths (scalar and SIMD): radically faster and more efficient than software emulation; beneficial for multimedia encoders/decoders and encryption software, pointer arithmetic for the post-4GB world, and large counters
  - IEEE 754-2008 compliance: the same floating-point accuracy on a Mali-T600 series GPU as on any other Full Profile conformant platform
  - Hardware-accelerated support for 3D images: great for volumetric modelling; useful in physics and games
  - Built-in atomic operations, accelerated in hardware on Mali-T600: no need for expensive external memory synchronization or emulation; a cornerstone of parallel computation

OpenCL platform model on Mali-T600
- Diagram: ARM host; Mali-T600 MP4 GPU in the ARM compute subsystem, with multiple hardware execution queues, cores and threads
- Work-items run as a single thread on a core
- A whole work-group executes on a single core
- Each thread has its own registers, PC, SP and private stack
- The job manager handles everything in hardware:
  - issuing all tasks to available cores
  - handling out-of-order execution queues
  - continually spawning work-items (threads) to keep the cores busy
  - providing work-item IDs
  - per-job completion interrupts can be requested

OpenCL programming model
- Diagram: application program, runtime, compiler, kernel object (OpenCL kernel or native kernel), index space (NDRange), execute command
- Static compilation can be used; binaries are cached
- The kernel is executed over each element of the n-dimensional index space

OpenCL execution model on Mali-T600
- Diagram: work-items grouped into work-groups within an NDRange, running on a core group of cores that each have an L1 cache
- Work-item (thread): registers, PC, SP, private stack
- Work-group (one core with its L1): barriers, local memory/atomics, constants
- NDRange (core group): global atomics, cached global memory

OpenCL execution model on Mali-T600 (continued)
- Diagram: an OpenCL queue (task graph) feeding hardware queue 1 and hardware queue 2, which dispatch to a core group of four cores, each with its own L1 cache
- Multiple hardware queues are supported (whilst one is being executed, the other is being built)
- The job manager handles everything in hardware
- Applications make driver calls to queue tasks/jobs to the target compute device

Coherency
- Allows the sharing of on-chip data
- Reduces external memory access
- Saves power
- Compute subsystems for SoC designed and optimized by …
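The data-parallel example in "What is parallel computing?" (each thread runs the same code on a different pixel) and the n-dimensional index space in "OpenCL programming model" can be sketched in plain C. Below, `contrast_kernel` plays the role of a kernel body for one work-item at global ID (x, y), and `run_ndrange_2d` stands in for the runtime by visiting the whole 2D index space serially. Both names, the 8-bit grayscale row-major layout and the contrast formula (scale about mid-grey 128, then clamp) are illustrative assumptions, not OpenCL or Mali APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-work-item "kernel": adjusts the contrast of one pixel.
 * In OpenCL C, x and y would come from get_global_id(0) and get_global_id(1). */
static void contrast_kernel(size_t x, size_t y, size_t width,
                            const uint8_t *src, uint8_t *dst, float gain)
{
    size_t i = y * width + x;                            /* row-major pixel index */
    float v = 128.0f + gain * ((float)src[i] - 128.0f);  /* scale about mid-grey */
    if (v < 0.0f)   v = 0.0f;                            /* clamp to the 8-bit range */
    if (v > 255.0f) v = 255.0f;
    dst[i] = (uint8_t)(v + 0.5f);                        /* round to nearest */
}

/* Serial stand-in for enqueuing a 2D NDRange of width x height work-items.
 * Every iteration runs the same code on different data, which is what makes
 * the work data-parallel; a GPU would run the iterations concurrently. */
static void run_ndrange_2d(size_t width, size_t height,
                           const uint8_t *src, uint8_t *dst, float gain)
{
    assert(src != NULL && dst != NULL);
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            contrast_kernel(x, y, width, src, dst, gain);
}
```

On a GPU the two loops in `run_ndrange_2d` would not exist: the runtime launches one work-item per (x, y), and each work-item executes the kernel body once.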
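The "What about OpenCL?" slide notes that OpenCL C adds vector data types such as `float4` to C99. In OpenCL C, `a + b` on two `float4` values adds all four lanes in one expression. Plain C has no such type, so this sketch emulates it with a hypothetical `float4_t` struct and helper functions; it only illustrates the semantics and says nothing about how the Mali compiler maps vectors onto its SIMD ALUs.

```c
#include <assert.h>

/* Hypothetical emulation of OpenCL C's built-in float4 vector type. */
typedef struct { float s[4]; } float4_t;

/* Lane access, similar to the .s0 ... .s3 component selectors in OpenCL C. */
static float float4_lane(float4_t v, int i)
{
    assert(i >= 0 && i < 4);
    return v.s[i];
}

/* Component-wise addition: what `a + b` does for two float4 values in OpenCL C. */
static float4_t float4_add(float4_t a, float4_t b)
{
    float4_t r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = a.s[i] + b.s[i];
    return r;
}

/* Sum of lane-wise products, the result of OpenCL C's built-in dot(a, b). */
static float float4_dot(float4_t a, float4_t b)
{
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i)
        sum += a.s[i] * b.s[i];
    return sum;
}
```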
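The "GPU computing with no compromises" slide says native 64-bit integer maths is much faster than software emulation and matters for pointer arithmetic past 4 GB and for large counters. The sketch below shows what emulation involves on hardware with only 32-bit integer registers: two halves and explicit carry propagation, compared with a single native addition. The `u64_emul` type and the function names are hypothetical, and the sketch makes no claim about actual Mali instruction counts.

```c
#include <stdint.h>

/* Hypothetical 64-bit value on hardware that only has 32-bit integer
 * registers: two halves that software must keep in sync. */
typedef struct { uint32_t lo, hi; } u64_emul;

/* Software-emulated 64-bit addition: several 32-bit operations plus
 * explicit carry handling. */
static u64_emul u64_emul_add(u64_emul a, u64_emul b)
{
    u64_emul r;
    r.lo = a.lo + b.lo;                         /* wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo) ? 1u : 0u;   /* a wrap-around means a carry out */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Recombine the two halves into a native 64-bit value. */
static uint64_t u64_emul_to_native(u64_emul v)
{
    return ((uint64_t)v.hi << 32) | v.lo;
}

/* With native 64-bit support the same addition is a single operation, and a
 * byte counter or offset can pass 4 GiB (2^32) without wrapping. */
static uint64_t u64_native_add(uint64_t a, uint64_t b)
{
    return a + b;
}
```

Adding 1 to 0xFFFFFFFF shows both points: a 32-bit counter wraps to 0, while the emulated and native 64-bit results are both 0x100000000 (4 GiB).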
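The platform-model and execution-model slides state that a whole work-group executes on a single core and that the job manager provides work-item IDs. This 1D sketch shows the standard OpenCL relation between IDs (global ID = group ID × local size + local ID, with no global offset) and records which core each work-item lands on. Handing work-groups to cores round-robin is an assumption made for illustration; the slides do not describe the job manager's actual scheduling policy, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL ID relation in one dimension (no global offset):
 * get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0). */
static size_t global_id(size_t group_id, size_t local_size, size_t local_id)
{
    return group_id * local_size + local_id;
}

/* Hypothetical dispatcher: records which core runs each work-item.
 * All work-items of a work-group get the same core, matching
 * "a whole work-group executes on a single core". Round-robin assignment
 * of groups to cores is an illustrative assumption only. */
static void dispatch_ndrange_1d(size_t global_size, size_t local_size,
                                size_t num_cores, size_t *core_of_item)
{
    assert(local_size > 0 && num_cores > 0);
    assert(global_size % local_size == 0);  /* required by OpenCL 1.x */
    size_t num_groups = global_size / local_size;
    for (size_t g = 0; g < num_groups; ++g) {
        size_t core = g % num_cores;
        for (size_t l = 0; l < local_size; ++l)
            core_of_item[global_id(g, local_size, l)] = core;
    }
}
```

With 16 work-items, a local size of 4 and four cores (as in a Mali-T600 MP4), each core receives exactly one work-group.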
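The execution-model slide attaches barriers and local memory to the work-group and global atomics to the whole NDRange. A sum reduction uses all three: each work-group adds its elements in local memory, synchronizing with barriers, and one work-item per group then atomically adds the partial sum to a global total. This C11 sketch runs each group's work-items serially, so the barriers are implied by loop order and only marked in comments (in OpenCL C they would be `barrier(CLK_LOCAL_MEM_FENCE)` calls); `atomic_fetch_add` from `<stdatomic.h>` stands in for OpenCL's `atomic_add`. The function name and the `MAX_LOCAL_SIZE` bound are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_LOCAL_SIZE 256  /* hypothetical upper bound on work-group size */

/* Sum `in` into `*total`, one work-group at a time.
 * local_size must be a power of two for the tree reduction below. */
static void reduce_sum(const int *in, size_t global_size, size_t local_size,
                       atomic_long *total)
{
    assert(local_size > 0 && local_size <= MAX_LOCAL_SIZE);
    assert((local_size & (local_size - 1)) == 0);   /* power of two */
    assert(global_size % local_size == 0);

    for (size_t g = 0; g < global_size / local_size; ++g) {
        long scratch[MAX_LOCAL_SIZE];               /* stands in for local memory */

        /* Each work-item copies its element into local memory. */
        for (size_t l = 0; l < local_size; ++l)
            scratch[l] = in[g * local_size + l];
        /* barrier(CLK_LOCAL_MEM_FENCE) would go here on a GPU. */

        /* Tree reduction: halve the number of active work-items each step. */
        for (size_t stride = local_size / 2; stride > 0; stride /= 2) {
            for (size_t l = 0; l < stride; ++l)
                scratch[l] += scratch[l + stride];
            /* ...with another barrier after every step. */
        }

        /* Work-item 0 publishes the group's partial sum with a global atomic. */
        atomic_fetch_add(total, scratch[0]);
    }
}
```

The design choice is the usual one for GPU reductions: one global atomic per work-group instead of one per element, which keeps contention on global memory low.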
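The second execution-model slide says multiple hardware queues are supported so that whilst one is being executed, the other is being built. That is double buffering (ping-pong). The sketch models two job queues that swap roles on each submit; the `double_queue` structure, the capacity and the function names are hypothetical, and on Mali-T600 the execution side is handled by the job manager in hardware, concurrently with the driver.

```c
#include <stddef.h>

#define QUEUE_CAPACITY 8  /* hypothetical number of job slots per queue */

/* A hypothetical job queue: a list of job IDs. */
typedef struct {
    int jobs[QUEUE_CAPACITY];
    size_t count;
} job_queue;

/* Two queues used ping-pong: the driver fills one while the GPU runs the other. */
typedef struct {
    job_queue q[2];
    int building;   /* index of the queue the driver is currently filling */
} double_queue;

static void dq_init(double_queue *dq)
{
    dq->q[0].count = 0;
    dq->q[1].count = 0;
    dq->building = 0;
}

/* Driver side: append a job to the queue being built. Returns 0 if it is full. */
static int dq_enqueue(double_queue *dq, int job_id)
{
    job_queue *b = &dq->q[dq->building];
    if (b->count == QUEUE_CAPACITY)
        return 0;
    b->jobs[b->count++] = job_id;
    return 1;
}

/* Swap roles and return the queue that is now ready to execute.
 * Assumes the queue being recycled has finished executing; a real driver
 * would wait for it first, e.g. on a per-job completion interrupt. */
static const job_queue *dq_submit(double_queue *dq)
{
    int ready = dq->building;
    dq->building = 1 - ready;
    dq->q[dq->building].count = 0;
    return &dq->q[ready];
}
```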