This book was basically written by NVIDIA; it is mainly about CUDA.

12/19/11

Hello world Example — Allocate host and device memory

```cuda
int *h_a, *d_a;
int *h_b, *d_b;
cutilSafeCall(cudaMallocHost((void**)&h_a, memsize));
cutilSafeCall(cudaMallocHost((void**)&h_b, memsize));
cutilSafeCall(cudaMalloc((void**)&d_a, memsize));
cutilSafeCall(cudaMalloc((void**)&d_b, memsize));
```

Hello world Example — Host code

```cuda
// Kernel parameters
dim3 threads(numthreads / blocksize, 1);
dim3 blocks(blocksize, 1);

// Copy the parameters to GPU global memory
cutilSafeCall(cudaMemcpy(d_a, h_a, memsize, cudaMemcpyHostToDevice));
cutilSafeCall(cudaMemcpy(d_b, h_b, memsize, cudaMemcpyHostToDevice));

// Invoke the kernel
helloworld<<<blocks, threads>>>(d_a, d_b);

// Copy the results back to the CPU
cutilSafeCall(cudaMemcpy(h_a, d_a, memsize, cudaMemcpyDeviceToHost));
cutilSafeCall(cudaMemcpy(h_b, d_b, memsize, cudaMemcpyDeviceToHost));
```

Hello world Example — Kernel code

```cuda
__global__ void helloworld(int *a, int *b)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    a[idx] = blockIdx.x;
    b[idx] = threadIdx.x;
}
```

To Try CUDA Programming
- SSH to 138.47.102.165
- Set environment variables in .bashrc in your home directory:

```
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=/usr/local/cuda/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

- Copy the SDK from /home/students/NVIDIA_GPU_Computing_SDK
- Compile the following directories:
  - NVIDIA_GPU_Computing_SDK/shared/
  - NVIDIA_GPU_Computing_SDK/C/common/
- The sample codes are in NVIDIA_GPU_Computing_SDK/C/src/

Demo
- Hello world: print out block and thread ids
- Vector add: C = A + B

CUDA Language Concept
- CUDA programming model
- CUDA memory model

Some terminologies
- Device = GPU = set of stream multiprocessors
- Stream Multiprocessor (SM) = set of processors + shared memory
- Kernel = GPU program
- Grid = array of thread blocks that execute a kernel
- Thread block = group of SIMD threads that execute a kernel and can communicate via shared memory

CUDA Programming Model
Parallel code (a kernel) is launched and executed on a device by many threads.
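As a concrete instance of this model (one kernel executed by many threads), the Hello world fragments above can be assembled into one compilable program. This is a minimal sketch, not the slides' exact code: the CUDA_CHECK macro stands in for the SDK's long-deprecated cutilSafeCall wrapper, and the sizes (numthreads = 16, blocksize = 4) are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Simple stand-in for cutilSafeCall: abort main() on any CUDA error.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err = (call);                                         \
        if (err != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err)); \
            return 1;                                                     \
        }                                                                 \
    } while (0)

__global__ void helloworld(int *a, int *b)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    a[idx] = blockIdx.x;   // record which block this thread ran in
    b[idx] = threadIdx.x;  // record the thread's index within its block
}

int main()
{
    const int numthreads = 16, blocksize = 4;   // illustrative sizes
    const size_t memsize = numthreads * sizeof(int);

    int *h_a, *h_b, *d_a, *d_b;
    CUDA_CHECK(cudaMallocHost((void**)&h_a, memsize));  // pinned host memory
    CUDA_CHECK(cudaMallocHost((void**)&h_b, memsize));
    CUDA_CHECK(cudaMalloc((void**)&d_a, memsize));      // device global memory
    CUDA_CHECK(cudaMalloc((void**)&d_b, memsize));

    // 4 blocks of 4 threads each cover all 16 elements.
    helloworld<<<numthreads / blocksize, blocksize>>>(d_a, d_b);

    CUDA_CHECK(cudaMemcpy(h_a, d_a, memsize, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaMemcpy(h_b, d_b, memsize, cudaMemcpyDeviceToHost));

    for (int i = 0; i < numthreads; i++)
        printf("idx %2d: block %d thread %d\n", i, h_a[i], h_b[i]);

    cudaFree(d_a); cudaFree(d_b);
    cudaFreeHost(h_a); cudaFreeHost(h_b);
    return 0;
}
```

Compile with `nvcc hello.cu -o hello`; each of the 16 indices prints the block and thread that wrote it.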
Threads are grouped into thread blocks, and parallel code is written for a single thread.

```cuda
// Kernel definition
__global__ void vecAdd(float* A, float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}
```

Thread hierarchy
- Threads launched for a parallel section are partitioned into thread blocks
- A thread block is a group of threads that can:
  - Synchronize their execution
  - Communicate via a low-latency shared memory
- Grid = all thread blocks for a given launch

[Figure: a grid of thread blocks Block(0,0) through Block(2,1), with Block(1,1) expanded into its threads Thread(0,0) through Thread(3,2)]

IDs and dimensions
- Threads: 3D IDs, unique within a block; two threads from two different blocks cannot cooperate
- Blocks: 2D and 3D IDs (depending on the hardware), unique within a grid
- Dimensions are set at launch time and can be unique for each section
- Built-in variables: threadIdx, blockIdx, blockDim, gridDim

[Figure: Kernel 1 launched as Grid 1, a 3×2 array of blocks; Kernel 2 launched as Grid 2, with Block(1,1) expanded into its threads]

Example: Increment Array Elements (NVIDIA)
- Increment an N-element vector a by a scalar b
- Let's assume N = 16 and blockDim = 4, so there are 4 blocks
- int idx = blockDim.x * blockIdx.x + threadIdx.x

| blockIdx.x | blockDim.x | threadIdx.x | idx            |
|------------|------------|-------------|----------------|
| 0          | 4          | 0, 1, 2, 3  | 0, 1, 2, 3     |
| 1          | 4          | 0, 1, 2, 3  | 4, 5, 6, 7     |
| 2          | 4          | 0, 1, 2, 3  | 8, 9, 10, 11   |
| 3          | 4          | 0, 1, 2, 3  | 12, 13, 14, 15 |
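The index computation in the increment example can be verified on the device with a tiny program. This is a minimal sketch; the kernel name write_idx and the printing loop are my own, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread writes its computed global index into out[idx].
// With N = 16 and blocks of 4 threads, the indices are exactly 0..15,
// matching the per-block breakdown in the increment example.
__global__ void write_idx(int *out)
{
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    out[idx] = idx;
}

int main()
{
    const int N = 16, blocksize = 4;
    int *d_out, h_out[N];
    cudaMalloc((void**)&d_out, N * sizeof(int));
    write_idx<<<N / blocksize, blocksize>>>(d_out);   // 4 blocks of 4 threads
    cudaMemcpy(h_out, d_out, N * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; i++)
        printf("%d ", h_out[i]);   // 0 1 2 ... 15, one index per thread
    printf("\n");
    cudaFree(d_out);
    return 0;
}
```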
Example: Increment Array Elements — CPU program vs. CUDA program

```cuda
// CPU program
void increment_cpu(float *a, float b, int N)
{
    for (int idx = 0; idx < N; idx++)
        a[idx] = a[idx] + b;
}

void main()
{
    .....
    increment_cpu(a, b, 16);
}
```

```cuda
// CUDA program
__global__ void increment_gpu(float *a, float b, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
        a[idx] = a[idx] + b;
}

void main()
{
    .....
    dim3 dimBlock(blocksize);
    dim3 dimGrid(ceil(N / (float)blocksize));
    increment_gpu<<<dimGrid, dimBlock>>>(a, b, 16);
}
```

© NVIDIA Corporation 2007

CUDA Memory Model
Each thread can:
- R/W per-thread registers
- R/W per-thread local memory
- R/W per-block shared memory
- R/W per-grid global memory
- Read only per-grid constant memory
- Read only per-grid texture memory

The host can R/W the global, constant, and texture memories.

[Figure: memory model — the device grid with per-block shared memory and per-thread registers; global, constant, and texture memories accessible from the host]
[Figure: hardware view — multiprocessors with registers and shared memory; device DRAM holding local, global, constant, and texture memory, with constant and texture caches]
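To make the per-block shared memory in this model concrete, here is a minimal sketch of a kernel that stages data through shared memory before writing it back to global memory. The reversal kernel, its names, and BLOCKSIZE = 8 are my own illustration, not from the slides.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCKSIZE 8

// Reverses BLOCKSIZE elements using per-block shared memory:
// each thread copies one element from global memory into the shared
// buffer, the whole block synchronizes, then each thread writes the
// mirrored element back out to global memory.
__global__ void reverse_block(int *d_data)
{
    __shared__ int buf[BLOCKSIZE];   // R/W per-block shared memory
    int t = threadIdx.x;             // held in a per-thread register
    buf[t] = d_data[t];              // read from per-grid global memory
    __syncthreads();                 // every thread in the block waits here
    d_data[t] = buf[BLOCKSIZE - 1 - t];
}

int main()
{
    int h[BLOCKSIZE], *d;
    for (int i = 0; i < BLOCKSIZE; i++) h[i] = i;
    cudaMalloc((void**)&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    reverse_block<<<1, BLOCKSIZE>>>(d);   // one block of 8 threads
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    for (int i = 0; i < BLOCKSIZE; i++)
        printf("%d ", h[i]);              // the input 0..7, reversed
    printf("\n");
    cudaFree(d);
    return 0;
}
```

The __syncthreads() barrier is what makes the staging safe: without it, a thread could read buf[BLOCKSIZE - 1 - t] before the thread responsible for that slot has written it. This also illustrates why threads in different blocks cannot cooperate this way: shared memory and __syncthreads() are per-block only.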