FPGA-based rate-adaptive LDPC-coded modulation for the next generation of optical communication systems

DING ZOU* AND IVAN B. DJORDJEVIC

Department of Electrical and Computer Engineering, University of Arizona, 1230 E Speedway Blvd., Tucson, Arizona 85721, USA
*dingzou@email.arizona.edu

Abstract: In this paper, we propose a rate-adaptive FEC scheme based on LDPC codes, together with its software-reconfigurable unified FPGA architecture. By FPGA emulation, we demonstrate that the proposed class of rate-adaptive LDPC codes based on shortening, with an overhead from 25% to 42.9%, provides a coding gain ranging from 13.08 dB to 14.28 dB at a post-FEC BER of 10^{-15} for BPSK transmission. In addition, the proposed rate-adaptive LDPC coding has been demonstrated in combination with higher-order modulation formats, including QPSK, 8-QAM, 16-QAM, 32-QAM, and 64-QAM, which covers a wide range of signal-to-noise ratios. Furthermore, we apply unequal error protection by employing different LDPC codes on different bits in 16-QAM and 64-QAM, which results in an additional 0.5 dB gain compared to conventional LDPC-coded modulation at the same overall code rate.

© 2016 Optical Society of America

OCIS codes: (060.4510) Optical communications; (060.1660) Coherent communications.

1. Introduction

Current coherent optical transmission systems focus on single-carrier solutions at 400 Gbit/s to support traffic growth in optical fiber communications, together with frequency-division-multiplexed solutions using a few carriers for next-generation data rates approaching 1 Tb/s [1,2]. With the advance of analog-to-digital converter technologies, high-order modulation formats up to 64-QAM with symbol rates up to 72 Gbaud have been demonstrated experimentally with Raman amplification [3]. To accommodate such high-speed optical communication systems, FEC engines are needed that support throughputs of 400 Gbit/s or multiples thereof, have low power consumption, provide high net coding gains at a target bit-error rate (BER) of $10^{-15}$, and are preferably adaptable to time-varying optical channel conditions [4,5]. Recently, soft-decision binary and non-binary LDPC codes with an outer hard-decision code that pushes the system BER below the target BER have been proposed [6,7]. Meanwhile, spatially coupled LDPC codes have been demonstrated to have very low error floors, below the system's target BER [8,9]. While it is essential to design an optimized FEC code offering the best coding gain, determining the optimal tradeoff between high-order modulation formats and FEC overhead is also a major concern for next-generation 400 Gbit/s technology. Most recently, DP-QPSK, DP-16QAM, and DP-64QAM with varying code rates have been studied to achieve the highest generalized mutual information (GMI) at a given signal-to-noise ratio (SNR); that study explored a total of 10 modulation formats to find the best combination of spectral efficiency and span loss budget [10,11].

In this paper, we propose an adaptive FPGA-based LDPC-coded modulation scheme for the next generation of optical communication systems. Our motivation is two-fold. First, a well-constructed capacity-approaching LDPC code offers the promise of substantial performance gain. Second, a unified LDPC decoder architecture combined with various modulation formats has been shown to allow a wide range of performance for OTN, where a large number of parameters can be reconfigured to cope with time-varying optical channel conditions and service requirements. The contributions of our paper can be summarized as follows: i) To the best of our knowledge, this is the first work on real-time implementation of rate-adaptive LDPC codes with overhead ranging from 25% to 42.9% in BPSK, QPSK, 8-QAM, 16-QAM, 32-QAM, and 64-QAM transmission in an amplified spontaneous emission (ASE) noise scenario. ii) We provide a detailed hardware architecture together with its explicit resource utilization and power analysis to help researchers better assess its figure of merit. iii) We provide a detailed analysis of rate-adaptive LDPC codes applied to higher-order modulation formats and demonstrate that an enhanced net coding gain (NCG) can be achieved with the proposed rate-adaptive LDPC-coded modulation schemes.

The rest of this paper is organized as follows. In Section 2, we present the data flow of the LDPC-coded modulation emulator, the associated unified FPGA-based architecture, and the corresponding performance, as well as the logic utilization, power consumption, latency, and throughput analysis. In Section 3, we propose a rate-adaptive LDPC coding scheme combined with higher-order modulation formats. Section 4 concludes the paper.

2. FPGA-based LDPC-coded modulation emulator

Let $LLR(s_i)$ and $LLR(b_j)$ represent the symbol log-likelihood ratio (LLR) of symbol $s_i$ and the bit LLR of the $j$-th bit in one symbol, respectively, and let $P(s_i | r)$ represent the a posteriori probability of symbol $s_i$ given the received symbol $r$. For the LDPC decoder, let $R_{cv}^{k,l}$, $L_{vc}^{k,l}$, and $L_v$ represent the check $c$ to variable $v$ message, the variable $v$ to check $c$ message at the $k$-th iteration and $l$-th layer, and the LLR from the channel, respectively, where $k = 1, ..., I_{max}$ and $l = 1, ..., \gamma$. In addition, $L_v^{k,l}$ denotes the updated LLR of variable $v$ on which the bit decision is made. The layered scaled min-sum algorithm (with scaling factor $s$ set to 0.75) is adopted in this paper [12]. The emulation process can be summarized by Eqs. (1)-(6), where Eqs. (1)-(3) correspond to the symbol LLR calculation (Eq. (1)) and the bit LLR calculation (Eqs. (2) and (3)), while Eqs. (4)-(6) correspond to the layered decoding algorithm.

\[ LLR(s_i) = \log \frac{P(s_i | r)}{P(s_0 | r)} \tag{1} \]

\[ LLR(b_j) = \log \frac{\sum_{s_i : b_j = 0} P(s_i | r)}{\sum_{s_i : b_j = 1} P(s_i | r)} \tag{2} \]

\[ LLR(b_j) = \max_{s_i : b_j = 0} LLR(s_i) - \max_{s_i : b_j = 1} LLR(s_i) \tag{3} \]

\[ L_{vc}^{k,l} = L_v + \sum_{l' \neq l} R_{cv}^{k,l'} \tag{4} \]

\[ L_v^{k,l} = L_v + \sum_{l'} R_{cv}^{k,l'} \tag{5} \]

\[ R_{cv}^{k,l} = s \times \prod_{v' \neq v} \text{sign}(L_{v'c}^{k,l}) \min_{v' \neq v} |L_{v'c}^{k,l}| \tag{6} \]
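The symbol and bit LLR calculations can be illustrated with a short Python sketch. It is not the FPGA implementation: it assumes an AWGN channel with equiprobable symbols, floating-point arithmetic instead of quantized messages, and a hypothetical labeling in which symbol $i$ carries the binary expansion of $i$ (most significant bit first). The function names are illustrative only.

```python
def symbol_llrs(r, constellation, sigma):
    """Symbol LLRs relative to symbol 0 (the ratio log P(s_i|r) / P(s_0|r)).

    Assumes complex AWGN with variance sigma^2 per dimension, so that
    log P(s_i|r) = -|r - s_i|^2 / (2 sigma^2) up to a common constant.
    """
    metrics = [-abs(r - s) ** 2 / (2.0 * sigma ** 2) for s in constellation]
    return [m - metrics[0] for m in metrics]


def bit_llrs_max_log(sym_llrs, bits_per_symbol):
    """Bit LLRs using max instead of max* (the max-log approximation).

    Assumed labeling: symbol index i is the bit label, bit 0 is the MSB.
    """
    llrs = []
    for j in range(bits_per_symbol):
        shift = bits_per_symbol - 1 - j
        best0 = max(l for i, l in enumerate(sym_llrs) if (i >> shift) & 1 == 0)
        best1 = max(l for i, l in enumerate(sym_llrs) if (i >> shift) & 1 == 1)
        llrs.append(best0 - best1)
    return llrs


# QPSK example with the assumed natural labeling 00, 01, 10, 11.
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
example = bit_llrs_max_log(symbol_llrs(0.9 + 0.8j, qpsk, sigma=1.0), 2)
```

Because the bit LLR is a difference of two maxima, the normalization by $P(s_0 | r)$ in the symbol LLR cancels, which is why only relative symbol metrics matter.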

2.1 FPGA architecture

We study the performance of the proposed rate-adaptive LDPC-coded modulation on a field-programmable gate array (FPGA) platform, whose high-level diagram is illustrated in Fig. 1(a). The platform consists of the following components: a set of PRBS-31 generators, an M-QAM mapper, two Gaussian noise generators, a symbol LLR calculator, a bit LLR calculator, a rate-adaptive LDPC decoder based on the layered scaled min-sum algorithm, and an error counter circuit. Each PRBS-31 generator is based on a linear feedback shift register (LFSR) with a 31-bit initial value. The M-QAM mapper is stored in two read-only memories (ROMs). Each Gaussian noise generator uses two LFSR-based uniform generators combined with the Box-Muller algorithm to generate samples of white Gaussian noise. The generated sequence of samples is multiplied by the noise standard deviation \( \sigma \) and fed to the symbol LLR block, which is implemented based on Eq. (1). It is worth noting that the max-star operation is replaced by the max operation for simplicity. The quantized bit LLRs are then obtained based on Eq. (3) and fed to the LDPC decoder, which is based on Eqs. (4)-(6). In the architecture, a MicroBlaze-based software configuration interface is implemented to set up the initial configuration and to read from registers. The setup process includes configuring the noise variance, the number of iterations, and the shortening length. Meanwhile, the BER is obtained by accessing the registers that store the number of errors and the number of codewords that have been emulated.
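The noise-generation step can be sketched in Python as follows. This is only an illustration of the Box-Muller transform followed by scaling with \( \sigma \); Python's `random.Random` stands in for the LFSR-based uniform generators, and the names `box_muller` and `noisy_symbol` are hypothetical.

```python
import math
import random


def box_muller(u1, u2):
    """Map two uniforms (u1 in (0, 1], u2 in [0, 1)) to two independent N(0, 1) samples."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def noisy_symbol(symbol, sigma, rng):
    """Add complex Gaussian noise with standard deviation sigma per dimension."""
    u1 = 1.0 - rng.random()  # random() lies in [0, 1); shift to (0, 1] so log(u1) is defined
    u2 = rng.random()
    n_i, n_q = box_muller(u1, u2)
    return symbol + sigma * complex(n_i, n_q)
```

One transform yields two Gaussian samples, which matches the in-phase and quadrature noise components needed for one received symbol.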

The architecture of the code-rate-reconfigurable binary LDPC decoder is shown in Fig. 1(b). There are three types of processors shown in the figure: (i) the variable node unit (VNU), based on Eqs. (4) and (5), takes inputs from the memories \( L_v \) and \( R_{cv} \) and produces \( L_{vc}^{k,l} \) and \( L_v^{k,l} \); (ii) the scaled min-sum check node unit (CNU), based on Eq. (6), takes inputs \( L_{vc}^{k,l} \) and produces \( R_{cv}^{k,l} \); (iii) the early termination unit (ETU) makes a bit decision based on \( L_v^{k,l} \). In addition, the memories in the implementation include: (i) the memory for \( R_{cv} \) with size \( \gamma \times n \times W_r \), which stores \( R_{cv}^{k,l} \); (ii) the memory for \( L_v \) with size \( n \times W_l \), which stores the initial LLRs; (iii) the memory for \( \hat{c} \) with size \( n \), which stores the decoded bits. In the discussion above, \( \gamma \) denotes the column weight, \( n \) is the codeword length, and \( W_r \) and \( W_l \) represent the word lengths for \( R_{cv} \) and \( L_v \). The most computationally complex block is the CNU, shown in Fig. 1(c). The ABS block first takes the absolute values of the inputs, and the sign XOR array produces the output signs. Then we find the first minimum value via a binary tree and trace back the survivors to find the second minimum value as well as the position of the first minimum value. Finally, we reconstruct the output data from the sign bits and the three outputs of the scaled two-minimum finder block. Furthermore, we can take advantage of this compact representation to significantly reduce the memory usage.
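The check node update described above can be sketched in Python. This is a behavioral model under simplifying assumptions: floating-point messages, a linear scan in place of the binary tree with trace-back, and a zero input treated as positive. The function name `check_node_update` is hypothetical.

```python
def check_node_update(inputs, scale=0.75):
    """Scaled min-sum update for one check node.

    inputs: variable-to-check messages of the variables connected to this
    check (at least two). Returns the check-to-variable messages in the same order.
    """
    signs = [1 if x < 0 else 0 for x in inputs]
    mags = [abs(x) for x in inputs]

    # XOR of all sign bits; the sign for output i is this XOR with input i's own sign.
    total_sign = 0
    for s in signs:
        total_sign ^= s

    # First minimum, its position, and second minimum.
    min1 = min2 = float("inf")
    pos = -1
    for i, m in enumerate(mags):
        if m < min1:
            min2, min1, pos = min1, m, i
        elif m < min2:
            min2 = m

    outputs = []
    for i in range(len(inputs)):
        mag = scale * (min2 if i == pos else min1)
        outputs.append(-mag if total_sign ^ signs[i] else mag)
    return outputs
```

All outputs are reconstructed from the two minimums, the position of the first minimum, and the sign bits, so only these quantities need to be kept per check node rather than one full-width message per edge.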

2.2 Emulation results and analysis

The mother (3, 15)-regular LDPC code (34635, 27710) is constructed from permutation matrices because of their efficient implementation [13], and rate adaptation is achieved by eliminating several blocks from the mother code (shortening), which is done by setting the corresponding initial log-likelihood ratios (LLRs) to the largest representable integer value. We employ an 8-bit uniform quantization scheme for the messages \((L_v, L_{vc}, R_{cv})\) to ensure that any error floor is due to the code design itself rather than the finite-precision representation, while keeping the decoding complexity reasonably low.
TABLE 1. Coding gains (in dB) of LDPC-coded modulation scheme.

<table>
<thead>
<tr>
<th>Code Rate</th>
<th>BPSK</th>
<th>QPSK</th>
<th>16-QAM</th>
<th>32-QAM</th>
<th>64-QAM</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.7</td>
<td>14.28</td>
<td>14.28</td>
<td>13.92</td>
<td>13.79</td>
<td></td>
</tr>
<tr>
<td>0.727</td>
<td>14.02</td>
<td>14.02</td>
<td>13.58</td>
<td>13.53</td>
<td></td>
</tr>
<tr>
<td>0.75</td>
<td>13.71</td>
<td>13.71</td>
<td>13.33</td>
<td>13.29</td>
<td></td>
</tr>
<tr>
<td>0.77</td>
<td>13.52</td>
<td>13.52</td>
<td>13.12</td>
<td>13.12</td>
<td></td>
</tr>
<tr>
<td>0.786</td>
<td>13.33</td>
<td>13.33</td>
<td>12.92</td>
<td>12.91</td>
<td></td>
</tr>
<tr>
<td>0.8</td>
<td>13.08</td>
<td>13.08</td>
<td>12.78</td>
<td>12.78</td>
<td></td>
</tr>
</tbody>
</table>

The BER vs. SNR performance of the proposed rate-adaptive LDPC codes, with the number of layered iterations set to 45, is presented in Fig. 2. It shows a set of LDPC component codes with code rates \{0.8, 0.786, 0.77, 0.75, 0.727, 0.7\}, for which rate adaptation is performed via shortening, combined with a set of modulation formats, namely BPSK, QPSK, 8-QAM, 16-QAM, 32-QAM, and 64-QAM. Table 1 presents the coding gains at a BER of $10^{-15}$, obtained via extrapolation. One can clearly observe that NCGs ranging from 13.08 dB to 14.28 dB can be achieved by employing the proposed rate-adaptive LDPC coding. Additionally, rate adaptation applied to both the component code rate and the modulation format size offers highly flexible performance that can track time-varying optical channel conditions. It is worth noting that the coding gain decreases as the constellation size increases. We explain and address this observation in the next section.
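The listed code rates and the 25% to 42.9% overhead range can be reproduced with a short Python sketch. It assumes that each shortening step removes one block column of 2309 information bits from the (3, 15) mother code, and it uses the design rate (ignoring the two extra information bits in the stated mother code dimension of 27710). The constant and function names are hypothetical.

```python
from fractions import Fraction

BLOCK_SIZE = 2309      # size of each permutation block
ROW_WEIGHT = 15        # block columns in the mother code
COLUMN_WEIGHT = 3      # block rows (parity blocks)


def shortened_code(blocks_removed):
    """Return (n, k) after removing the given number of information block columns."""
    n = (ROW_WEIGHT - blocks_removed) * BLOCK_SIZE
    k = n - COLUMN_WEIGHT * BLOCK_SIZE
    return n, k


def design_rate(blocks_removed):
    n, k = shortened_code(blocks_removed)
    return Fraction(k, n)


def overhead_percent(blocks_removed):
    n, k = shortened_code(blocks_removed)
    return float(Fraction(n - k, k)) * 100.0


rates = [round(float(design_rate(s)), 3) for s in range(6)]
overheads = [round(overhead_percent(s), 1) for s in range(6)]
```

Under these assumptions, removing three block columns gives the (27708, 20781) code of rate 0.75, and removing five gives a rate of 0.7 with 42.9% overhead.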
2.3 Implementation analysis

Apart from the error correction performance of the rate-adaptive LDPC-coded modulation, logic utilization, power consumption, and latency are other important aspects. We compare the logic utilization and power consumption of six LDPC-coded modulation schemes, which have been implemented on a Xilinx xc6vsx475t. Each emulator comprises \( \log_2(M) \) PRBS generators (\( M \) is the signal constellation size), Gaussian noise generators (one for BPSK and two for the other modulation formats), one symbol LLR calculator, one bit LLR calculator, and one reconfigurable LDPC decoder. The resource utilization is summarized in Table 2. One can clearly notice that the number of occupied slices increases with the modulation format size, while the memory utilization is almost the same, since the memory used outside the LDPC decoder is negligible. In addition, the on-chip power consumption from clocks, logic, signals, BRAMs, DSPs, MMCMs, and IOs is shown in the last column of Table 2. The power consumption increases with the modulation format size, but the increase is reasonably small.

Table 2. Resource utilization and on-chip power consumption of the LDPC-coded modulation emulators.

<table>
<thead>
<tr>
<th>Modulation Formats</th>
<th>Occupied Slices</th>
<th>RAMB36E1</th>
<th>RAMB18E1</th>
<th>Power (W)</th>
</tr>
</thead>
<tbody>
<tr>
<td>BPSK</td>
<td>10% (7,724 out of 74,400)</td>
<td>10% (113 out of 1,064)</td>
<td>6% (137 out of 2,128)</td>
<td>1.205</td>
</tr>
<tr>
<td>QPSK</td>
<td>10% (7,860 out of 74,400)</td>
<td>10% (113 out of 1,064)</td>
<td>6% (137 out of 2,128)</td>
<td>1.223</td>
</tr>
<tr>
<td>8QAM</td>
<td>11% (8,758 out of 74,400)</td>
<td>10% (113 out of 1,064)</td>
<td>6% (137 out of 2,128)</td>
<td>1.267</td>
</tr>
<tr>
<td>16QAM</td>
<td>12% (9,181 out of 74,400)</td>
<td>10% (113 out of 1,064)</td>
<td>6% (137 out of 2,128)</td>
<td>1.357</td>
</tr>
<tr>
<td>32QAM</td>
<td>14% (10,465 out of 74,400)</td>
<td>10% (113 out of 1,064)</td>
<td>6% (137 out of 2,128)</td>
<td>1.538</td>
</tr>
<tr>
<td>64QAM</td>
<td>20% (15,182 out of 74,400)</td>
<td>10% (113 out of 1,064)</td>
<td>6% (137 out of 2,128)</td>
<td>2.024</td>
</tr>
</tbody>
</table>

We duplicate four LDPC-coded modulation emulators in each FPGA; with four FPGAs available in our rapid prototyping platform, 16 emulators are employed in total. Each decoder consists of 3 CNUs and 45 VNUs in the implementation; hence, the throughput of one decoder can be calculated as \( F_{\text{clk}} \times n / ((B / p + \delta) \times I_{\text{max}}) \), where \( F_{\text{clk}} = 200\,\text{MHz} \) is the FPGA clock frequency, \( n \) is the number of bits per codeword, \( B = 2309 \) is the block size, \( p = 3 \) is the pipeline depth, \( \delta = 7 \) is the latency of VNP and CNP, and \( I_{\text{max}} = 45 \) is the maximum number of layered iterations. It is worth noting that the decoder converges faster in the high-SNR regime (~24 iterations, verified by simulation). Summed over the 16 emulators, the throughput of the mother code is ~3.17 Gbit/s in the low-SNR regime and ~5.94 Gbit/s in the high-SNR regime, while the throughput of the rate-0.7 code is ~2.11 Gbit/s and ~3.96 Gbit/s, respectively.
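The throughput figures can be checked with a few lines of Python. The sketch assumes that one layered iteration takes \( B/p + \delta \) clock cycles and that the quoted numbers are the aggregate over all 16 emulators; both assumptions are inferred from the fact that they reproduce the stated values. The codeword length of 23090 bits for the rate-0.7 code assumes five block columns of the mother code are shortened. The function name is hypothetical.

```python
def throughput_bps(n_bits, iterations, f_clk=200e6, block=2309, p=3, delta=7, emulators=16):
    """Aggregate decoded throughput in bit/s for the given codeword length and iteration count."""
    cycles_per_iteration = block / p + delta
    cycles_per_codeword = cycles_per_iteration * iterations
    return emulators * f_clk * n_bits / cycles_per_codeword


mother_low_snr = throughput_bps(34635, 45)    # maximum number of iterations
mother_high_snr = throughput_bps(34635, 24)   # typical convergence at high SNR
rate07_low_snr = throughput_bps(23090, 45)
rate07_high_snr = throughput_bps(23090, 24)
```

With these inputs the four values come out close to 3.17, 5.95, 2.11, and 3.96 Gbit/s. The shortened code has lower throughput because the decoder spends the same number of cycles per codeword while delivering fewer bits.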

3. Proposed rate-adaptive LDPC-coded modulation

The uncoded and coded BER performance vs. SNR of each bit in BPSK, QPSK, 8-QAM, 16-QAM, 32-QAM, and 64-QAM is shown in Figs. 3(a) and 3(b), respectively. A close look at Fig. 3 reveals that the bits in higher-order modulation formats are protected unequally. For instance, in 16-QAM the first and third bits have the same performance, and so do the second and fourth bits. Additionally, at the input BER threshold of \( 4.2 \times 10^{-2} \) of the LDPC code with code rate 0.75, the corresponding SNR limits of the first and second bits in 16-QAM are 9.37 dB and 11.72 dB, respectively. This is visible in Fig. 3(b) as well, where the SNR gap in coded BER is approximately 2.4 dB. The overall SNR limit at a post-FEC BER of \( 10^{-15} \) is determined by the worst bit, which inspires us to use the proposed rate-adaptive LDPC codes for different component bits in higher-order modulation formats. Another interesting observation is that the best bit performance in 64-QAM is comparable to the worst bit performance in 16-QAM. In addition, the slopes of the performance curves associated with different bits in high-order modulation formats differ slightly, since the distribution of the bit LLRs is no longer Gaussian.

To bridge the gap between different bits in high-order modulation formats, non-binary LDPC codes can be employed [14]. Because of their extremely high implementation complexity [7], we instead propose to apply codes with different error correction performance to different bits in high-order modulation formats. Namely, instead of applying a code rate of 0.75 to all four bits in 16-QAM and all six bits in 64-QAM, we apply a code rate of 0.7 to the first and third bits and a code rate of 0.8 to the second and fourth bits in 16-QAM. Similarly, in 64-QAM we apply code rates of 0.7, 0.75, and 0.8 to the first and fourth bits, the second and fifth bits, and the third and sixth bits, respectively; both configurations result in the same overall code rate of 0.75. As shown in Fig. 4, the BER vs. SNR performance reveals a coding gain improvement of the proposed scheme over the conventional scheme. More specifically, the proposed scheme provides 0.5 dB of additional gain at a BER of 10^{-15} compared to the corresponding (3, 12)-regular QC-LDPC (27708, 20781) code when 16-QAM and 64-QAM are used. Meanwhile, no error floor is observed down to a BER of 10^{-15} after ~10^{16} bits have been emulated, which indicates the effectiveness of the high-girth QC-LDPC code design. It is worth noting that the gap between different bits in large constellations can be bridged further with more flexible component codes; however, addressing the differences in latency and throughput among the component decoders remains an interesting problem.
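That both bit-wise rate assignments keep the overall code rate at 0.75 can be verified with a short Python sketch. It assumes every bit position of a symbol carries the same number of coded bits, so the overall rate is the average of the per-bit rates. The function name is hypothetical.

```python
from fractions import Fraction


def overall_rate(per_bit_rates):
    """Average information rate over the bit positions of one symbol."""
    return sum(per_bit_rates) / len(per_bit_rates)


r07, r075, r08 = Fraction(7, 10), Fraction(3, 4), Fraction(4, 5)

# 16-QAM: rate 0.7 on bits 1 and 3, rate 0.8 on bits 2 and 4.
rate_16qam = overall_rate([r07, r08, r07, r08])

# 64-QAM: rates 0.7, 0.75, 0.8 on bits (1, 4), (2, 5), (3, 6).
rate_64qam = overall_rate([r07, r075, r08, r07, r075, r08])
```

Exact fractions are used so that the comparison with 3/4 is not affected by floating-point rounding.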

![BER performance vs. SNR with maximum number of layered iterations set to 45.](image)

**Fig. 4.** BER performance vs. SNR with maximum number of layered iterations set to 45.

4. Conclusion

In this paper, we have proposed a novel class of reconfigurable rate-adaptive LDPC codes with overhead ranging from 25% to 42.9% for high-speed optical transmission systems. The BER performance has been verified through an FPGA emulation system, and it has been shown that the proposed LDPC-coded modulation schemes exhibit superior waterfall performance and excellent error floor performance down to a BER of 10^{-15}. In addition, an extra SNR gain of 0.5 dB can be achieved by applying the rate-adaptive LDPC codes to different bits in 16-QAM and 64-QAM. To the best of our knowledge, these are the first FPGA implementation results for flexible LDPC-coded modulation. We believe that the proposed rate-adaptive QC-LDPC codes, together with the six modulation formats, are promising candidates for the next generation of optical communication systems.

Funding

National Science Foundation (NSF) CIAN ERC (EEC-0812072); ONR MURI program (N00014-13-1-0627).