Attention processor that uses a custom RMSNorm kernel for Q/K normalization. NOTE: `attn.norm_q` and `attn.norm_k` have learnable per-channel weights (`elementwise_affine=True`), so the kernel must apply that scale after normalizing, not just divide by the RMS.
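The affine RMSNorm the note refers to can be sketched as below. This is a minimal reference implementation, not the custom kernel itself; the function name `rms_norm` and the call site passing `attn.norm_q.weight` are illustrative assumptions about how such a processor would wire the layers' weights into the kernel.

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # RMSNorm with elementwise affine: divide by the root-mean-square over the
    # last dimension, then apply the learned per-channel scale. A kernel that
    # skips `weight` would silently diverge from nn.RMSNorm(elementwise_affine=True).
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return (x / rms) * weight

# Hypothetical call site inside the processor, mirroring the layers named above:
#   query = rms_norm(query, attn.norm_q.weight)
#   key   = rms_norm(key,   attn.norm_k.weight)
```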