Incomplete implementation of SparseGEMV

https://github.com/FasterDecoding/TEAL/blob/fb7373c93ac3594817c9ee64d4e08b47430a1822/kernels/sparse_gemv.py#L271

Hi, I noticed that the SparseGEMV kernel only handles the case where `batch_size=1` and `seqlen=1`. For any other input shape, the kernel produces incorrect output.
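To make "incorrect output" concrete, here is a minimal sketch of the reference I would compare against for a multi-token input. It is plain Python so it runs without torch or Triton. The names `reference_sparse_matmul` and `threshold` are hypothetical and not part of the TEAL API, and I am assuming the intended semantics are: zero out activations whose magnitude is at or below a threshold, then multiply by the weight.

```python
def reference_sparse_matmul(x, weight, threshold):
    """Hypothetical reference, not TEAL's API.

    x:      list of rows, shape (tokens, in_features)
    weight: list of rows, shape (out_features, in_features)
    """
    out = []
    for row in x:
        # Each token gets its own mask: keep only activations whose
        # magnitude exceeds the threshold (assumed semantics).
        active = [(j, v) for j, v in enumerate(row) if abs(v) > threshold]
        # Only the weight columns matching active activations contribute.
        out.append([sum(v * w_row[j] for j, v in active) for w_row in weight])
    return out


# Two tokens (e.g. batch_size=2, seqlen=1 flattened), three input features.
x = [[0.05, -2.0, 1.0],
     [3.0, 0.01, -0.5]]
weight = [[1.0, 2.0, 3.0],
          [0.5, -1.0, 0.0]]

expected = reference_sparse_matmul(x, weight, threshold=0.1)
print(expected)
```

Note that the two rows have different active columns, so under these assumptions a kernel that handles a single activation vector cannot simply be reused unchanged for more than one token.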

Is this kernel expected to work only for the decoding stage? If so, where is the implementation for Appendix A4?