optimization - Optimizing code performance when odd/even threads are doing different things in CUDA


I have two large vectors and I'm trying to do a kind of element-wise multiplication, where an even-numbered element of the first vector is multiplied by the next odd-numbered element of the second vector, and an odd-numbered element of the first vector is multiplied by the preceding even-numbered element of the second vector (numbering from zero, as the thread index does). Example, with positions written 1-based:

Vector 1 is V1(1) V1(2) V1(3) V1(4)
Vector 2 is V2(1) V2(2) V2(3) V2(4)

The products are:

V1(1) * V2(2)
V1(3) * V2(4)
V1(2) * V2(1)
V1(4) * V2(3)

I have written the code for doing this (PDS holds the elements of the first vector in shared memory, NDS those of the second vector):

  // Instead of % 2, checking the lowest bit to decide whether a number is odd or even is faster
  if ((tx & 0x0001) == 0x0000)
      NDS[tx + 1] = PDS[tx] * NDS[tx + 1];
  else
      NDS[tx - 1] = PDS[tx] * NDS[tx - 1];
  __syncthreads();

Is there any way to accelerate this code or to avoid divergence?

You should be able to eliminate the branch like this:

  int tx_index = tx ^ 1; // equivalent to: tx_index = (tx & 1) ? tx - 1 : tx + 1
  NDS[tx_index] = PDS[tx] * NDS[tx_index];
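For context, here is a minimal sketch of how that line might sit inside a complete kernel. The names `pairSwapMultiply`, `partnerIndex`, `runPairSwap` and the block size `BLOCK` are made up for illustration and are not from the question. It assumes both vectors have the same length, that the length is a multiple of an even block size, and it omits error checking.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int BLOCK = 256;  // hypothetical block size; must be even

// Partner index inside an even/odd pair: 0<->1, 2<->3, ...
__host__ __device__ inline int partnerIndex(int tx) { return tx ^ 1; }

// Branch-free version of the odd/even update from the question.
__global__ void pairSwapMultiply(const float* v1, float* v2)
{
    __shared__ float PDS[BLOCK];  // tile of the first vector
    __shared__ float NDS[BLOCK];  // tile of the second vector

    int tx  = threadIdx.x;
    int gid = blockIdx.x * BLOCK + tx;

    // Assumption: the vector length is a multiple of BLOCK,
    // so every thread has a valid element to load.
    PDS[tx] = v1[gid];
    NDS[tx] = v2[gid];
    __syncthreads();

    // Every thread runs the same instructions; only the index differs.
    int p = partnerIndex(tx);
    NDS[p] = PDS[tx] * NDS[p];
    __syncthreads();

    v2[gid] = NDS[tx];
}

// Hypothetical host wrapper: copies the data over, launches the kernel,
// and returns the updated second vector. Error checking is omitted.
std::vector<float> runPairSwap(const std::vector<float>& v1,
                               std::vector<float> v2)
{
    std::size_t bytes = v1.size() * sizeof(float);
    float* d1 = nullptr;
    float* d2 = nullptr;
    cudaMalloc(&d1, bytes);
    cudaMalloc(&d2, bytes);
    cudaMemcpy(d1, v1.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d2, v2.data(), bytes, cudaMemcpyHostToDevice);

    unsigned int blocks = static_cast<unsigned int>(v1.size() / BLOCK);
    pairSwapMultiply<<<blocks, BLOCK>>>(d1, d2);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(v2.data(), d2, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d1);
    cudaFree(d2);
    return v2;
}
```

Since `tx ^ 1` only flips the lowest bit, threads 0 and 1 swap targets, as do 2 and 3, and so on. Each `NDS` slot is read and written by exactly one thread during the update, so the result at position `i` is `v1[i ^ 1] * v2[i]`, matching the branching version.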
