megatron.core.tensor_parallel.layers.RowParallelLinear
- class megatron.core.tensor_parallel.layers.RowParallelLinear(input_size, output_size, *, bias=True, input_is_parallel=False, init_method=<function xavier_normal_>, stride=1, keep_master_weight_for_test=False, skip_bias_add=False, params_dtype=torch.float32, use_cpu_initialization=False, perform_initialization=True, gradient_accumulation_fusion=False, sequence_parallel_enabled: bool = False, world_size: int | None = None)
- Bases: Module
- Linear layer with row parallelism.
- The linear layer is defined as Y = XA + b. A is parallelized along its first dimension and X along its second dimension as:
  $$A = \begin{bmatrix} A_1 \\ \vdots \\ A_p \end{bmatrix}, \qquad X = [X_1, \ldots, X_p]$$
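The partitioning above makes the product separable across ranks. A short derivation, assuming p tensor-parallel ranks, with X_i of shape [..., input_size/p] and A_i of shape [input_size/p, output_size]. The cross-rank summation (an all-reduce in typical implementations) is an assumption that this page does not spell out:

```latex
Y = XA + b
  = \begin{bmatrix} X_1 & \cdots & X_p \end{bmatrix}
    \begin{bmatrix} A_1 \\ \vdots \\ A_p \end{bmatrix} + b
  = \sum_{i=1}^{p} X_i A_i + b
```

Rank i needs only its own X_i and A_i to compute the partial product X_i A_i. Because the bias is not parallelized, b is added once, after the sum.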
 - Parameters:
- input_size – first dimension of matrix A. 
- output_size – second dimension of matrix A. 
 
- Keyword Arguments:
- bias – If true, add bias. Note that bias is not parallelized. 
- input_is_parallel – If true, we assume that the input is already split across the GPUs and we do not split again. 
- init_method – method to initialize weights. Note that bias is always set to zero. 
- stride – For the strided linear layers. 
- keep_master_weight_for_test – This was added for testing and should be set to False. It returns the master weights used for initialization. 
- skip_bias_add – This was added to enable performance optimization where bias can be fused with other elementwise operations. We skip adding bias but instead return it. 
- params_dtype – 
- use_cpu_initialization – 
- perform_initialization – 
- gradient_accumulation_fusion – 
- sequence_parallel_enabled – 
 
 - forward(input_)
- Parameters:
- input_ – 3D tensor whose dimensions are ordered [sequence, batch, hidden] 
- Returns:
- output – result of the linear layer 
- bias – the bias, returned rather than added when skip_bias_add is true
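The interaction between input_is_parallel, skip_bias_add, and the (output, bias) return pair can be illustrated with a single-process simulation. This is a minimal sketch, not the library's implementation: the helper names (matmul, split_columns, split_rows, row_parallel_forward) are hypothetical, the input is flattened to 2D instead of [sequence, batch, hidden], a Python sum stands in for the cross-rank reduction, and returning None for the bias when it has already been added is an assumption.

```python
# Single-process sketch of a row-parallel forward pass (no GPUs, no Megatron).
# All helper names are hypothetical; real ranks would each hold one shard and
# combine partial results with a collective instead of a Python sum.

def matmul(x, a):
    """Multiply a [rows, k] matrix by a [k, cols] matrix (lists of lists)."""
    return [[sum(row[t] * a[t][c] for t in range(len(a)))
             for c in range(len(a[0]))] for row in x]

def split_columns(x, parts):
    """Split x along its last dimension: X -> [X_1, ..., X_p]."""
    width = len(x[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in x] for i in range(parts)]

def split_rows(a, parts):
    """Split a along its first dimension: A -> [A_1, ..., A_p]."""
    height = len(a) // parts
    return [a[i * height:(i + 1) * height] for i in range(parts)]

def row_parallel_forward(x, weight_shards, bias,
                         input_is_parallel=False, skip_bias_add=False):
    parts = len(weight_shards)
    # If the input is already split, use the shards as given; otherwise split.
    x_shards = x if input_is_parallel else split_columns(x, parts)
    # Each "rank" multiplies its input shard by its weight shard.
    partials = [matmul(x_i, a_i) for x_i, a_i in zip(x_shards, weight_shards)]
    # Elementwise sum of the partial products (stands in for the reduction).
    rows, cols = len(partials[0]), len(partials[0][0])
    output = [[sum(p[r][c] for p in partials) for c in range(cols)]
              for r in range(rows)]
    if skip_bias_add:
        # Hand the bias back so the caller can fuse it with a later operation.
        return output, bias
    # Assumption: once the bias has been added, None is returned in its place.
    return [[v + b for v, b in zip(row, bias)] for row in output], None

# Example: 2 simulated ranks, input_size=4, output_size=2.
x = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
a = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, -1.0]]
b = [0.5, -0.5]
output, returned_bias = row_parallel_forward(x, split_rows(a, 2), b,
                                             skip_bias_add=True)
```

With skip_bias_add=True the caller receives the bias-free product together with the bias and is responsible for adding it later, for example inside a fused elementwise kernel.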