Data Detection in Single User Massive MIMO Using Re-Transmissions

Single user massive multiple input multiple output (MIMO) can be used to increase the spectral efficiency, since the data is transmitted simultaneously from a large number of antennas located at both the base station and mobile. It is feasible to have a large number of antennas in the mobile, in the millimeter wave frequencies. However, the major drawback of single user massive MIMO is the high complexity of data recovery at the receiver. In this work, we propose a low complexity method of data detection with the help of re-transmissions. A turbo code is used to improve the bit-error-rate (BER). Simulation results indicate significant improvement in BER with just two re-transmissions as compared to the single transmission case. We also show that the minimum average SNR per bit required for error free propagation over a massive MIMO channel with re-transmissions is identical to that of the additive white Gaussian noise (AWGN) channel, which is equal to $-1.6$ dB.


I. INTRODUCTION
The main idea behind single user massive multiple input multiple output (MIMO) [1]-[7] is to increase the bit rate between the transmitter and receiver over a wireless channel. This is made possible by sending the bits or symbols (groups of bits) simultaneously from a large number of transmit antennas. The signal at each receive antenna is a linear combination of the bits or symbols sent from all the transmit antennas plus additive white Gaussian noise (AWGN). We assume that the carrier frequency offset is absent or has been accurately estimated and canceled with the help of training symbols (preamble) [8]-[12]. The task of the receiver is to estimate the transmitted bits or symbols from the signals at all the receive antennas. Note that it is possible to have a large number of antennas in both the base station and the mobile at millimeter wave frequencies [13]-[21], due to the small size of the antennas.
If both the transmitter and receiver have $N$ antennas and the symbols are drawn from an $M$-ary constellation, the complexity of the maximum likelihood (ML) detector would be $M^N$, since it exhaustively searches all possible symbol combinations. Clearly, the ML detector is impractical for large $N$. On the other hand, the "zero-forcing" solution is to multiply the received signal vector by the inverse of the channel matrix, which eliminates the interference from the other symbols. However, the computational complexity of inverting the $N \times N$ channel matrix, for large values of $N$, makes this approach unattractive. Moreover, multiplying the noise vector by the inverse of the channel matrix usually results in noise enhancement, leading to poor bit-error-rate (BER) performance.
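To give a feel for the $M^N$ growth mentioned above, the following minimal sketch counts the candidate symbol vectors an exhaustive ML search would have to test. The function name `ml_search_size` is hypothetical and introduced only for illustration.

```python
def ml_search_size(M: int, N: int) -> int:
    """Number of candidate symbol vectors for an exhaustive ML search.

    M is the constellation size and N the number of transmit antennas.
    """
    return M ** N


# QPSK (M = 4) with a growing number of antennas.
for n_antennas in (1, 4, 16, 64):
    print(n_antennas, ml_search_size(4, n_antennas))
```

Even for QPSK with 16 antennas the search space already exceeds four billion candidates, which is why an exhaustive search is not considered further.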
In [22], a split pre-conditioned conjugate gradient method for data detection in massive MIMO is proposed. A low-complexity soft-output data detection scheme based on the Jacobi method is presented in [23]. Near optimal data detection based on steepest descent and the Jacobi method is presented in [24]. Matrix inversion based on Newton iteration for large antenna arrays is given in [25]. Subspace methods of data detection in MIMO are presented in [26], [27]. Data detection in large scale MIMO systems using successive interference cancellation (SIC) is given in [28]. MIMO data detection in the presence of phase noise is given in [29]. Detection of LDPC coded symbols in MIMO systems is discussed in [30]. Decoding of convolutional codes in MIMO systems is presented in [31]. Decoding of polar codes in MIMO systems is given in [32]. Sphere decoding procedures for the detection of symbols in MIMO systems are discussed in [33]-[35]. Large scale MIMO detection algorithms are presented in [36]. Multiuser detection in massive MIMO with power efficient low resolution ADCs is given in [37]. Joint ML detection and channel estimation in multiuser massive MIMO is presented in [38]. Detection of turbo coded offset QPSK in the presence of frequency and clock offsets and AWGN is presented in [39], [40]. Channel estimation in large antenna systems is given in [41]-[43]. Channel-aware data fusion for massive MIMO, in the context of wireless sensor networks (WSNs), is proposed in [44].
In the literature on data detection in massive MIMO, the main lacuna has been the lack of a definition of the signal-to-noise ratio (SNR). In fact, even the operating SNR of a mobile phone is not known [9]-[12], [45]. It may be noted that mobile phones indicate a typical signal strength of $-100$ dBm ($10^{-10}$ milliwatt). However, this is not the SNR. In this work, we use the SNR per bit as the performance measure, since there is a lower bound on the SNR per bit for error-free transmission over any type of channel, which is $-1.6$ dB [11], [12]. The so-called "capacity" of MIMO channels has been derived earlier in [46]-[49]. However, the channel capacity is derived differently in [11], [12], and in this work. Therefore, the question naturally arises: are the present-day wireless telecommunication systems operating anywhere near the channel capacity? This question assumes significance in the context of 5G wireless communications where not only humans, but also machines and devices would be connected to the internet to form the Internet of Things (IoT) [50]. Hence, in order to minimize the global energy consumption due to IoT, it is necessary for each device to operate as close as possible to the minimum average SNR per bit for error-free transmission [9]-[12], [45], [51]. Finally, one might ask the question: is it not possible to increase the bit-rate by increasing the size of the constellation, and using just one transmit and receive antenna? The answer is: increasing the size of the constellation increases the peak-to-average power ratio (PAPR), which poses a problem for the radio frequency (RF) front end amplifiers, in terms of the dynamic range. In other words, a large PAPR requires a large dynamic range, which translates to low power efficiency for the RF amplifiers [52].
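The two numerical values quoted above can be checked with a short sketch. It only uses the standard decibel definitions; the helper names `dbm_to_milliwatt` and `linear_to_db` are hypothetical.

```python
import math


def dbm_to_milliwatt(power_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (power_dbm / 10.0)


def linear_to_db(ratio: float) -> float:
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(ratio)


# A signal strength of -100 dBm expressed in milliwatts.
print(dbm_to_milliwatt(-100.0))

# The lower bound on the SNR per bit, ln(2), expressed in dB.
print(linear_to_db(math.log(2.0)))
```

The second value evaluates to approximately $-1.59$ dB, which is the figure usually rounded to $-1.6$ dB.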
In this work, we re-transmit a symbol $N_{rt}$ times and then take the average, which results in a lower interference power compared to the single transmission case. Perfect channel state information (CSI) is assumed. The BER is improved with the help of a turbo code. This paper is organized as follows. Section II describes the system model. The receiver design is presented in Section III. The bit-error-rate (BER) results from computer simulations are given in Section IV. Finally, Section V presents the conclusions and future work.
Figure 1. System model.

II. SYSTEM MODEL
Consider the system model in Figure 1. The data bits are organized into frames of length $L_{d1}$ bits. The recursive systematic convolutional (RSC) encoders 1 and 2 encode the data bits into Quadrature Phase Shift Keyed (QPSK) symbols having a total length of $L_d$. We assume a MIMO system with $N$ transmit and $N$ receive antennas. We also assume that $L_d/N$ is an integer, where $L_d = 2L_{d1}$ as shown in Figure 1. The $L_d$ QPSK symbols are transmitted, $N$ symbols at a time, from the $N$ transmit antennas.
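The received signal in the $k$-th re-transmission ($0 \le k \le N_{rt}-1$, $k$ is an integer) is given by [11], [12]
$$\tilde{\mathbf{R}}_k = \tilde{\mathbf{H}}_k \mathbf{S} + \tilde{\mathbf{W}}_k \tag{1}$$
where $\tilde{\mathbf{R}}_k \in \mathbb{C}^{N \times 1}$ is the received vector, $\tilde{\mathbf{H}}_k \in \mathbb{C}^{N \times N}$ is the channel matrix and $\tilde{\mathbf{W}}_k \in \mathbb{C}^{N \times 1}$ is the additive white Gaussian noise (AWGN) vector. The transmitted symbol vector is $\mathbf{S} \in \mathbb{C}^{N \times 1}$, whose elements are drawn from an $M$-ary constellation. Boldface letters denote vectors or matrices. Complex quantities are denoted by a tilde. However, the tilde is not used for the complex symbols $\mathbf{S}$. The elements of $\tilde{\mathbf{H}}_k$ are statistically independent with zero mean and variance per dimension equal to $\sigma_H^2$, that is
$$\frac{1}{2} E\left[\left|\tilde{H}_{k,i,j}\right|^2\right] = \sigma_H^2 \tag{2}$$
where $E[\cdot]$ denotes the expectation operator [53], [54] and $\tilde{H}_{k,i,j}$ denotes the element in the $i$-th row and $j$-th column of $\tilde{\mathbf{H}}_k$. Similarly, the elements of $\tilde{\mathbf{W}}_k$ are statistically independent with zero mean and variance per dimension equal to $\sigma_W^2$, that is
$$\frac{1}{2} E\left[\left|\tilde{W}_{k,i}\right|^2\right] = \sigma_W^2 \tag{3}$$
where $\tilde{W}_{k,i}$ denotes the element in the $i$-th row of $\tilde{\mathbf{W}}_k$. The real and imaginary parts of $\tilde{H}_{k,i,j}$ and $\tilde{W}_{k,i}$ are also assumed to be independent. The channel and noise are assumed to be independent across re-transmissions, that is
$$\frac{1}{2} E\left[\tilde{\mathbf{H}}_k^H \tilde{\mathbf{H}}_m\right] = N \sigma_H^2 \, \delta_K(k-m) \, \mathbf{I}_N, \qquad \frac{1}{2} E\left[\tilde{\mathbf{W}}_k \tilde{\mathbf{W}}_m^H\right] = \sigma_W^2 \, \delta_K(k-m) \, \mathbf{I}_N \tag{4}$$
where the superscript $(\cdot)^H$ denotes Hermitian (conjugate transpose of a matrix), $\mathbf{I}_N$ is an $N \times N$ identity matrix and $\delta_K(m)$ ($m$ is an integer) is the Kronecker delta function defined by
$$\delta_K(m) = \begin{cases} 1 & \text{for } m = 0 \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$
The receiver is assumed to have perfect knowledge of $\tilde{\mathbf{H}}_k$.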
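The signal model of this section can be sketched in a few lines of standard-library Python. This is only an illustration under the stated assumptions (independent zero-mean Gaussian channel and noise entries with variances $\sigma_H^2$ and $\sigma_W^2$ per dimension); the function names and the chosen parameter values are hypothetical and are not taken from the simulation setup of the paper.

```python
import random


def complex_gaussian(sigma, rng):
    """Zero-mean complex Gaussian sample with variance sigma**2 per dimension."""
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def random_channel(n, sigma_h, rng):
    """N x N channel matrix with independent complex Gaussian entries."""
    return [[complex_gaussian(sigma_h, rng) for _ in range(n)] for _ in range(n)]


def random_noise(n, sigma_w, rng):
    """N x 1 noise vector with independent complex Gaussian entries."""
    return [complex_gaussian(sigma_w, rng) for _ in range(n)]


def received_vector(H, S, W):
    """Received vector R = H S + W for one re-transmission."""
    return [sum(h * s for h, s in zip(row, S)) + w for row, w in zip(H, W)]


rng = random.Random(0)
N = 4  # illustrative number of antennas
# QPSK symbols with real and imaginary parts in {-1, +1} (illustrative mapping).
S = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(N)]
H = random_channel(N, 1.0, rng)
W = random_noise(N, 0.1, rng)
print(received_vector(H, S, W))
```

Each re-transmission would call `random_channel` and `random_noise` again with the same `S`, reflecting the assumption that the channel and noise are independent across re-transmissions.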

III. RECEIVER
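In this section, we describe the procedure for detecting $\mathbf{S}$ given the received signal $\tilde{\mathbf{R}}_k$ in (1). Consider
$$\tilde{\mathbf{Y}}_k = \tilde{\mathbf{H}}_k^H \tilde{\mathbf{R}}_k = \tilde{\mathbf{F}}_k \mathbf{S} + \tilde{\mathbf{V}}_k \tag{6}$$
where $\tilde{\mathbf{F}}_k = \tilde{\mathbf{H}}_k^H \tilde{\mathbf{H}}_k$ and $\tilde{\mathbf{V}}_k = \tilde{\mathbf{H}}_k^H \tilde{\mathbf{W}}_k$. Observe that similar to (4) we have
$$\frac{1}{2} E\left[\tilde{\mathbf{F}}_k\right] = N \sigma_H^2 \, \mathbf{I}_N. \tag{8}$$
The main aim of this work is to replace the expectation operator in (8) by time-averaging, in the form of re-transmissions, so that the right-hand-side of (8) is approximately satisfied. Now the $i$-th element of $\tilde{\mathbf{Y}}_k$ in (6) is
$$\tilde{Y}_{k,i} = F_{k,i,i} S_i + \tilde{I}_{k,i} + \tilde{V}_{k,i} \tag{10}$$
where
$$\tilde{F}_{k,i,j} = \sum_{l=1}^{N} \tilde{H}_{k,l,i}^* \tilde{H}_{k,l,j}, \qquad \tilde{I}_{k,i} = \sum_{j \ne i} \tilde{F}_{k,i,j} S_j, \qquad \tilde{V}_{k,i} = \sum_{j=1}^{N} \tilde{H}_{k,j,i}^* \tilde{W}_{k,j}.$$
Here it is understood that $F_{k,i,i}$ is real-valued, $\tilde{I}_{k,i}$ denotes the interference and $\tilde{V}_{k,i}$ denotes the noise term. Note that for large values of $N$, $\tilde{I}_{k,i}$ and $\tilde{V}_{k,i}$ are Gaussian distributed due to the central limit theorem [53]. Moreover, since $S_i$ and $\tilde{W}_{k,i}$ are independent, $\tilde{I}_{k,i}$ and $\tilde{V}_{k,i}$ are uncorrelated, that is
$$E\left[\tilde{I}_{k,i} \tilde{V}_{k,i}^*\right] = 0. \tag{12}$$
From (12) we have
$$E\left[\left|\tilde{I}_{k,i} + \tilde{V}_{k,i}\right|^2\right] = E\left[\left|\tilde{I}_{k,i}\right|^2\right] + E\left[\left|\tilde{V}_{k,i}\right|^2\right]. \tag{14}$$
The noise power is
$$E\left[\left|\tilde{V}_{k,i}\right|^2\right] = \sum_{j=1}^{N} \sum_{l=1}^{N} E\left[\tilde{H}_{k,j,i}^* \tilde{H}_{k,l,i}\right] E\left[\tilde{W}_{k,j} \tilde{W}_{k,l}^*\right] = \sum_{j=1}^{N} \sum_{l=1}^{N} 2\sigma_H^2 \, \delta_K(j-l) \, 2\sigma_W^2 \, \delta_K(j-l) = 4N\sigma_H^2 \sigma_W^2 \tag{15}$$
where we have used the sifting property of the Kronecker delta function. The interference power is
$$E\left[\left|\tilde{I}_{k,i}\right|^2\right] = \sum_{j \ne i} E\left[\left|\tilde{F}_{k,i,j}\right|^2\right] P_{av} = 4N(N-1) P_{av} \sigma_H^4 \tag{16}$$
and
$$P_{av} = E\left[\left|S_j\right|^2\right] \tag{17}$$
is the average symbol power. Substituting (15) and (16) in (14) we get
$$E\left[\left|\tilde{U}'_{k,i}\right|^2\right] = 4N(N-1) P_{av} \sigma_H^4 + 4N\sigma_H^2 \sigma_W^2, \qquad \tilde{U}'_{k,i} = \tilde{I}_{k,i} + \tilde{V}_{k,i}.$$
Averaging over the re-transmissions gives
$$\tilde{Y}_i = \frac{1}{N_{rt}} \sum_{k=0}^{N_{rt}-1} \tilde{Y}_{k,i} = F_i S_i + \tilde{U}_i \tag{20}$$
where $\tilde{Y}_{k,i}$ is defined in (10) and
$$F_i = \frac{1}{N_{rt}} \sum_{k=0}^{N_{rt}-1} F_{k,i,i}, \qquad \tilde{U}_i = \frac{1}{N_{rt}} \sum_{k=0}^{N_{rt}-1} \tilde{U}'_{k,i}. \tag{21}$$
Note that $F_i$ in (21) is real-valued. Since $\tilde{U}'_{k,i}$ is independent over $k$ we have
$$E\left[\left|\tilde{U}_i\right|^2\right] = \frac{1}{N_{rt}} \left(4N(N-1) P_{av} \sigma_H^4 + 4N\sigma_H^2 \sigma_W^2\right). \tag{22}$$
In other words, the interference plus noise power reduces due to averaging. The average signal-to-noise ratio per bit in decibels is defined as [11], [12] (see also the appendix)
$$\mathrm{SNR}_{av,b} \text{ (dB)} = 10 \log_{10}\left(\frac{2 N N_{rt} P_{av} \sigma_H^2}{\sigma_W^2}\right). \tag{23}$$
From (23) we can write, with $\mathrm{SNR}_{av,b}$ on a linear scale,
$$\sigma_W^2 = \frac{2 N N_{rt} P_{av} \sigma_H^2}{\mathrm{SNR}_{av,b}}. \tag{24}$$
Substituting (24) in (22) we get
$$E\left[\left|\tilde{U}_i\right|^2\right] = \underbrace{\frac{8 N^2 P_{av} \sigma_H^4}{\mathrm{SNR}_{av,b}}}_{\text{noise power, constant for a given SNR}} + \underbrace{\frac{4N(N-1) P_{av} \sigma_H^4}{N_{rt}}}_{\text{interference power, reduces with increasing } N_{rt}}. \tag{25}$$
After concatenation, the signals $\tilde{Y}_i$ and $F_i$ in (20) for $0 \le i \le L_d - 1$ are sent to the turbo decoder [54], as explained below.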
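The averaging step described in this section can be sketched as follows. The sketch assumes the matched-filter form $\tilde{\mathbf{H}}_k^H \tilde{\mathbf{R}}_k$ averaged over the re-transmissions, and it evaluates the two power terms as they follow from the i.i.d. channel model of Section II: a noise term $8N^2 P_{av} \sigma_H^4 / \mathrm{SNR}_{av,b}$ and an interference term $4N(N-1) P_{av} \sigma_H^4 / N_{rt}$. These closed forms are derived under those assumptions and should be read as an illustration; all function names are hypothetical.

```python
def matched_filter_average(H_list, R_list):
    """Average H_k^H R_k over the re-transmissions.

    H_list holds one N x N channel matrix per re-transmission and R_list the
    corresponding received vectors. Returns (Y, F), where Y[i] is the averaged
    matched-filter output and F[i] the averaged real-valued gain on symbol i.
    """
    n_rt = len(H_list)
    n = len(R_list[0])
    Y = [0j] * n
    F = [0.0] * n
    for H, R in zip(H_list, R_list):
        for i in range(n):
            Y[i] += sum(H[l][i].conjugate() * R[l] for l in range(n)) / n_rt
            F[i] += sum(abs(H[l][i]) ** 2 for l in range(n)) / n_rt
    return Y, F


def noise_power(n, snr_b_linear, p_av, sigma_h2):
    """Noise term after averaging, for a given linear SNR per bit (assumed form)."""
    return 8.0 * n * n * p_av * sigma_h2 ** 2 / snr_b_linear


def interference_power(n, n_rt, p_av, sigma_h2):
    """Interference term after averaging over n_rt re-transmissions (assumed form)."""
    return 4.0 * n * (n - 1) * p_av * sigma_h2 ** 2 / n_rt


# Noiseless single-antenna check with two re-transmissions of the symbol 1+1j.
H_list = [[[1 + 0j]], [[2j]]]
R_list = [[1 + 1j], [2j * (1 + 1j)]]
Y, F = matched_filter_average(H_list, R_list)
print(Y[0] / F[0])  # recovers the transmitted symbol when there is no noise

# The interference term halves when N_rt doubles; the noise term does not change.
print(interference_power(16, 1, 2.0, 0.5), interference_power(16, 2, 2.0, 0.5))
print(noise_power(16, 10.0, 2.0, 0.5))
```

Note that for `n = 1` the interference term is zero, which matches the remark in the simulation results that only noise is present in the single-antenna case.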
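A. Turbo Decoding - the BCJR Algorithm
The block diagram of the turbo decoder is depicted in Figure 2. The BCJR algorithm has the following components:
1) The forward recursion
2) The backward recursion
3) The computation of the extrinsic information and the final a posteriori probabilities.
Let $\mathscr{S}$ denote the number of states in the encoder trellis. Let $\mathscr{D}_n$ denote the set of states that diverge from state $n$, for $0 \le n \le \mathscr{S}-1$. For example, $\mathscr{D}_0 = \{0, 3\}$ implies that states 0 and 3 can be reached from state 0. Similarly, let $\mathscr{C}_n$ denote the set of states that converge to state $n$. Let $\alpha_{i,n}$ denote the forward sum-of-products (SOP) at time $i$ ($0 \le i \le L_{d1}-2$) at state $n$ ($0 \le n \le \mathscr{S}-1$). Then the forward SOP for decoder 1 can be recursively computed as follows (forward recursion) [54]
$$\begin{aligned} \alpha'_{i+1,n} &= \sum_{m \in \mathscr{C}_n} \alpha_{i,m} \, \gamma_{1,i,m,n} \, P(S_{b,i,m,n}) \\ \alpha_{0,n} &= 1 \quad \text{for } 0 \le n \le \mathscr{S}-1 \\ \alpha_{i+1,n} &= \alpha'_{i+1,n} \Big/ \sum_{n=0}^{\mathscr{S}-1} \alpha'_{i+1,n} \end{aligned} \tag{28}$$
where
$$P(S_{b,i,m,n}) = \begin{cases} F_{2,i+} & \text{if } S_{b,i,m,n} = +1 \\ F_{2,i-} & \text{if } S_{b,i,m,n} = -1 \end{cases} \tag{29}$$
denotes the a priori probability of the systematic (data) bit corresponding to the transition from state $m$ to state $n$, at decoder 1 at time $i$, obtained from the 2nd decoder at time $l$, after de-interleaving, that is, $i = \pi^{-1}(l)$ for some $0 \le l \le L_{d1}-1$, $l \ne i$. The branch metric $\gamma_{1,i,m,n}$ is defined in (30), where $S_{m,n}$ denotes the coded QPSK symbol corresponding to the transition from state $m$ to $n$ in the trellis. The normalization step in the last equation of (28) is done to prevent numerical instabilities.
Let $\beta_{i,n}$ denote the backward SOP at time $i$ ($1 \le i \le L_{d1}-1$) at state $n$ ($0 \le n \le \mathscr{S}-1$). Then the recursion for the backward SOP (backward recursion) at decoder 1 can be written as:
$$\begin{aligned} \beta'_{i,n} &= \sum_{m \in \mathscr{D}_n} \beta_{i+1,m} \, \gamma_{1,i,n,m} \, P(S_{b,i,n,m}) \\ \beta_{L_{d1},n} &= 1 \quad \text{for } 0 \le n \le \mathscr{S}-1 \\ \beta_{i,n} &= \beta'_{i,n} \Big/ \sum_{n=0}^{\mathscr{S}-1} \beta'_{i,n} \end{aligned} \tag{31}$$
Once again, the normalization step in the last equation of (31) is done to prevent numerical instabilities. Let $\rho^+(n)$ denote the state that is reached from state $n$ when the input symbol is $+1$. Similarly, let $\rho^-(n)$ denote the state that is reached from state $n$ when the input symbol is $-1$. Then the extrinsic information from decoder 1 to 2 is calculated for $0 \le i \le L_{d1}-1$ as in (32), and is further normalized to obtain $F_{1,i+}$ and $F_{1,i-}$ in (33). Equations (28), (31), (32) and (33) constitute the MAP recursions for the first decoder.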
The MAP recursions for the second decoder are similar, except that $\gamma_{1,i,m,n}$ is replaced by the corresponding branch metric in (34), which is computed from $\tilde{Y}_{i1}$ and $F_{i1}$ obtained by concatenating $\tilde{Y}_i$ and $F_i$ in (20), and $F_{1,i+}$ and $F_{1,i-}$ in (33) are replaced by $F_{2,i+}$ and $F_{2,i-}$ respectively (see Figure 2). After several iterations, the final a posteriori probabilities of the $i$-th data bit ($0 \le i \le L_{d1}-1$) are computed at the output of the first decoder, where again $F_{2,i+}$ and $F_{2,i-}$ denote the a priori probabilities obtained at the output of the second decoder (after de-interleaving) in the previous iteration. The final estimate of the $i$-th data bit is then obtained from these a posteriori probabilities (see Figure 2). Note that:
1) One iteration involves decoder 1 followed by decoder 2.
2) Since the terms $\alpha_{i,n}$ and $\beta_{i,n}$ depend on $F_{2,i+}$, $F_{2,i-}$ for decoder 1, and $F_{1,i+}$, $F_{1,i-}$ for decoder 2, they have to be recomputed for every decoder in every iteration according to (28) and (31) respectively.
In the computer simulations, robust turbo decoding [9] has been incorporated, that is, the exponent in (30) and (34) is normalized to the range $[-30, 0]$.
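The normalized forward recursion used by the MAP decoders above can be sketched generically as follows. This is only an illustration: the trellis, the branch weights and the all-ones initial condition are assumptions made for the example, each branch weight is taken to be the product of the branch metric and the a priori probability for that transition, and the names `forward_recursion`, `converging` and `branch_weights` are hypothetical.

```python
def forward_recursion(branch_weights, converging, n_states):
    """Normalized forward sum-of-products over a trellis.

    branch_weights[i][(m, n)] is the weight of the transition from state m to
    state n at time i (branch metric times a priori probability).
    converging[n] lists the states that have a transition into state n.
    Returns a list of per-time state vectors, each summing to one after the
    initial step.
    """
    # Assumed initial condition: all states equally weighted at time 0.
    alpha = [[1.0] * n_states]
    for weights in branch_weights:
        unnormalized = [
            sum(alpha[-1][m] * weights[(m, n)] for m in converging[n])
            for n in range(n_states)
        ]
        total = sum(unnormalized)
        # Normalization keeps the values in a numerically safe range.
        alpha.append([a / total for a in unnormalized])
    return alpha


# Toy two-state trellis in which every state can reach every state.
converging = {0: [0, 1], 1: [0, 1]}
branch_weights = [
    {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25},
    {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.9},
]
for step in forward_recursion(branch_weights, converging, 2):
    print(step)
```

The backward recursion has the same structure with the roles of the converging and diverging state sets exchanged and time running in reverse. Without the normalization, the products would shrink geometrically with the frame length and eventually underflow.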

IV. SIMULATION RESULTS AND DISCUSSION
In this section, we present the results from computer simulations. The simulation parameters are presented in Table I. At high SNR, the number of frames simulated is $10^6$, whereas for low and medium SNR, the number of frames simulated is $10^5$.
In Figure 3, we present the simulation results for a 4-state turbo code whose generating matrix is given by (38). From Figures 3(a) to (c) we see the following.
1) There is no significant degradation in the BER performance due to the increase in the number of antennas ($N$), for a given number of re-transmissions $N_{rt} > 1$. For example, with $N_{rt} = 2$ and $N = 16$, a BER of $10^{-4}$ is attained at an SNR per bit of 4 dB, whereas the same BER is attained at an SNR per bit of 4.25 dB for $N = 512$; this is just a 0.25 dB degradation in performance. Observe that the spectral efficiency with $N = 16$ antennas and $N_{rt} = 2$ re-transmissions is 4 bits/transmission or 4 bits/sec/Hz, since each QPSK symbol carries 1/4 bits of information (see appendix). However, the spectral efficiency with $N = 512$ antennas and $N_{rt} = 2$ re-transmissions is 128 bits/sec/Hz. In other words, an increase in the spectral efficiency by a factor of 32 results in only a 0.25 dB degradation in the BER performance.
2) With $N_{rt} = 2$, there is significant improvement in BER performance compared to $N_{rt} = 1$, for all values of $N$. However, the BER performance with $N_{rt} = 4$ is comparable to $N_{rt} = 2$. This is because, with increasing $N_{rt}$, the BER is limited by the variance of the noise term in (25), even though the variance of the interference term gets reduced due to averaging.
3) Note that when $N = 1$, the interference is zero and only noise is present. We see from Figure 3(a) that there is a significant improvement in performance for $N_{rt} = 2$, compared to $N_{rt} = 1$. This can be attributed to the fact that $F_i$ in (21) contains two positive terms (squared magnitudes of independent Rayleigh distributed random variables) for $N_{rt} = 2$, compared to one term for $N_{rt} = 1$. Hence, the probability that both terms are simultaneously close to zero is small.
4) It is interesting to compare the case $N = N_{rt} = 1$ in Figure 3(a) with Figure 12 in [12] with $N_r = 1$. Both systems are identical in terms of the received signal model, that is
$$\tilde{R}_i = \tilde{H}_i S_i + \tilde{W}_i \tag{39}$$
where $i$ denotes the time index. In this work, we obtain a BER of $10^{-4}$ at an average SNR per bit of 5 dB, whereas in [12] we obtain the same BER at an average SNR per bit of just 2.25 dB. What could be the reason for this difference? The answer lies in the computation of the gammas. In this work, the gammas are computed using (30) and (34), which is sub-optimum compared to (66) in [12]. This is because the noise term $\tilde{U}_i$ in (20) is equal to $\tilde{H}_i^* \tilde{W}_i$, which is not Gaussian for $N = 1$ (recall that $\tilde{U}_i$ is Gaussian only for large values of $N$, due to the central limit theorem). However, in this work, we assume that $\tilde{U}_i$ is Gaussian even for $N = 1$.
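The spectral-efficiency figures quoted in item 1 above follow from the statement in the appendix that each QPSK symbol carries $1/(2N_{rt})$ bits of information. A minimal sketch of that arithmetic is given below; the function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def bits_per_qpsk_symbol(n_rt: int) -> Fraction:
    """Information carried by one QPSK symbol when it is sent n_rt times."""
    return Fraction(1, 2 * n_rt)


def spectral_efficiency(n_antennas: int, n_rt: int) -> Fraction:
    """Bits per transmission from n_antennas transmit antennas."""
    return n_antennas * bits_per_qpsk_symbol(n_rt)


low = spectral_efficiency(16, 2)
high = spectral_efficiency(512, 2)
print(low, high, high / low)
```

The ratio of the two values is the factor of 32 mentioned in the text.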
In Figure 4, we present the simulation results for a 16-state turbo code whose generating matrix is given by (40) [54]. We see the following.
1) There is again a significant improvement in BER performance for $N_{rt} = 2$, compared to $N_{rt} = 1$. However, the improvement in BER for $N_{rt} = 4$ is not much, compared to $N_{rt} = 2$.
2) Comparing Figures 3 and 4, with $N = 16$ and $N_{rt} = 2$, the encoder in (40) gives only a 0.5 dB improvement at a BER of $10^{-4}$, over the encoder in (38).
3) Comparing Figures 3 and 4, with $N = 512$ and $N_{rt} = 2$, the encoder in (40) gives only a 0.75 dB improvement at a BER of $10^{-4}$, over the encoder in (38).
These results indicate that this may not be the best 16-state turbo code.

V. CONCLUSION
We have shown by analysis as well as computer simulations that, as the number of re-transmissions increases, the BER decreases. In terms of the BER, there is little improvement from using a 16-state turbo code instead of the 4-state code. This may not be the best 16-state turbo code. Future work could use iterative interference cancellation with no re-transmissions, since the re-transmissions reduce the spectral efficiency. Estimating the $N \times N$ channel matrix is also a good topic for future research.

VI. APPENDIX
We derive the minimum average SNR per bit required for error-free propagation over a massive MIMO channel with re-transmissions. Consider the signal
$$\tilde{r}_i = \tilde{x}_i + \tilde{w}_i \tag{41}$$
where the subscript $i$ denotes the time index, $\tilde{x}_i$ is the transmitted signal (message) and $\tilde{w}_i$ denotes samples of zero-mean noise, not necessarily Gaussian, with variance per dimension equal to $\sigma_w^2$. All the terms in (41) are complex-valued or two-dimensional. Here the term "dimension" refers to a communication link between the transmitter and the receiver carrying only real-valued signals [11], [12]. The number of bits per transmission, defined as the channel capacity, is given by [11], [12], [55]
$$C = \log_2(1 + \mathrm{SNR}) \text{ bits per transmission} \tag{42}$$
over a complex dimension, where the average SNR is given by
$$\mathrm{SNR} = \frac{E\left[\left|\tilde{x}_i\right|^2\right]}{E\left[\left|\tilde{w}_i\right|^2\right]} = \frac{E\left[\left|\tilde{x}_i\right|^2\right]}{2\sigma_w^2} \tag{43}$$
over a complex dimension. Recall that (42) gives the minimum SNR for the error-free propagation of $C$ bits.
Proposition 6.1: The channel capacity is additive over the number of complex dimensions. In other words, the channel capacity over $N$ complex dimensions is equal to the sum of the capacities over each complex dimension, provided the information is independent across the complex dimensions [9], [11], [12]. Independence of information also implies that the bits transmitted over one complex dimension are not the interleaved version of the bits transmitted over any other complex dimension.
Proposition 6.2: Conversely, if $C$ bits per transmission are sent over $N$ complex dimensions, it seems reasonable to assume that each complex dimension receives $C/N$ bits per transmission [9], [11], [12]. The reasoning for Proposition 6.2 is as follows. We assume that a "bit" denotes "information". Now, if each of the $N$ antennas (complex dimensions) receives the "same" $C$ bits of information, then we might as well have only one antenna, since the other antennas are not yielding any additional information. On the other hand, if each of the $N$ antennas receives "different" $C$ bits of information, then we end up receiving more information ($CN$ bits) than what we transmit ($C$ bits), which is not possible. Therefore, we assume that each complex dimension receives $C/N$ bits of "different" information.
Observe that the average SNR in (43) is not the average SNR per bit over a complex dimension. In order to compute the average SNR per bit, we note from Figure 1 that each data bit generates two QPSK symbols, and each QPSK symbol is repeated $N_{rt}$ times. Therefore, from Proposition 6.2, each QPSK symbol carries $1/(2N_{rt})$ bits of information. The information sent in one transmission is $N/(2N_{rt})$ bits, from the $N$ transmit antennas (Proposition 6.1). The information in each receive antenna in one transmission over a complex dimension is (Proposition 6.2):
$$\frac{N}{2 N N_{rt}} = \frac{1}{2N_{rt}} = C \text{ bits} \tag{44}$$
which is identical to the channel capacity in (42). Let us now consider the $i$-th element of $\tilde{\mathbf{R}}_k$ in (1). We have
$$\tilde{R}_{k,i} = \sum_{j=1}^{N} \tilde{H}_{k,i,j} S_j + \tilde{W}_{k,i}. \tag{45}$$
Now, if we substitute
$$\tilde{x}_i = \sum_{j=1}^{N} \tilde{H}_{k,i,j} S_j, \qquad \tilde{w}_i = \tilde{W}_{k,i} \tag{46}$$
in (41), the channel capacity remains unchanged, as given in (42), with SNR equal to
$$\mathrm{SNR} = \frac{2N\sigma_H^2 P_{av}}{2\sigma_W^2} = \frac{N P_{av} \sigma_H^2}{\sigma_W^2} \tag{47}$$
where $\sigma_H^2$, $\sigma_W^2$ and $P_{av}$ are defined in (2), (3) and (17) respectively. However, the information contained in $\tilde{R}_{k,i}$ in (45) is $1/(2N_{rt})$ bits (see (44)), hence the SNR in (47) is for $1/(2N_{rt})$ bits. Therefore, the SNR per bit is
$$\mathrm{SNR}_{av,b} = \frac{N P_{av} \sigma_H^2}{\sigma_W^2} \cdot 2N_{rt} = \frac{\mathrm{SNR}}{C} \tag{48}$$
where we have used (44). Substituting (48) in (42) we get
$$C = \log_2(1 + C \, \mathrm{SNR}_{av,b}) \text{ bits per transmission} \tag{49}$$
over a complex dimension. Re-arranging terms in (49) we get
$$\mathrm{SNR}_{av,b} = \frac{2^C - 1}{C}. \tag{50}$$
Thus (50) implies that as $C \to 0$, $\mathrm{SNR}_{av,b} \to \ln(2) \equiv -1.6$ dB, which is the minimum average SNR per bit required for error-free propagation over a massive MIMO channel with re-transmissions. Just as in the case of turbo codes, it may not be necessary for $C$ to approach zero in order to attain the channel capacity.
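The limit stated above can be checked numerically. Solving (49), $C = \log_2(1 + C \, \mathrm{SNR}_{av,b})$, for the SNR per bit gives $\mathrm{SNR}_{av,b} = (2^C - 1)/C$; the sketch below evaluates this expression in decibels for decreasing $C$. The function name `min_snr_per_bit_db` is hypothetical, and `math.expm1` is used only to keep the computation accurate for small $C$.

```python
import math


def min_snr_per_bit_db(capacity_bits: float) -> float:
    """Minimum average SNR per bit in dB for a capacity of C bits per transmission.

    Evaluates (2**C - 1) / C, written with expm1 for accuracy when C is small.
    """
    snr_linear = math.expm1(capacity_bits * math.log(2.0)) / capacity_bits
    return 10.0 * math.log10(snr_linear)


# C = 1/(2 N_rt) for N_rt = 1, 2, 4, followed by values approaching zero.
for c in (0.5, 0.25, 0.125, 1e-3, 1e-6):
    print(c, min_snr_per_bit_db(c))

# The limiting value ln(2) in dB.
print(10.0 * math.log10(math.log(2.0)))
```

The printed values decrease monotonically towards the limiting value as $C$ is reduced.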