# FPGA Realization of Braun's Multipliers

Muhammad H. Rais, Member, IAENG, Mohammed H. Al Mijalli, and Mohammad Nisar

*Abstract*—In this study we investigate the design of Braun's multipliers in Very High Speed Integrated Circuit Hardware Description Language (VHDL) and their implementation on Virtex-5 FPGA devices. Resource utilization is obtained for 4×4, 6×6, 8×8, and 12×12-bit Braun's multipliers on the XC5VLX30, XC5VLX30T, XC5VLX50, and XC5VLX50T Virtex-5 devices. The study shows that the devices differ in occupied slices, average connection delay, and maximum pin delay.

*Index Terms*—Braun's Multipliers, DSP, FPGA, Virtex-5, VHDL

## I. INTRODUCTION

MULTIPLICATION is an important and prevalent operation in scientific computation for digital signal processing (DSP) and its subfields, such as sonar and radar signal processing, sensor array processing, spectral estimation, statistical signal processing, digital image processing, signal processing for communications, control systems, biomedical signal processing, and seismic data processing. DSP algorithms are typically executed on specialized processors called digital signal processors, built on hardware such as application-specific integrated circuits (ASICs). Besides ASICs, other technologies used for DSP include more powerful general-purpose microprocessors, field-programmable gate arrays (FPGAs), digital signal controllers, and stream processors [1-5].

Recent developments in very large scale integration (VLSI) technology have made hardware implementation a desirable alternative, making the FPGA a feasible technology and an attractive alternative to ASICs [6].

DSP, image processing, and multimedia applications make extensive use of multiplication and squaring functions [7-8]. Cryptography requires not only a large number of multiplication and squaring operations but also operations on large integers [9].

Many research efforts have been reported in the literature to achieve hardware-efficient implementations of low-power multipliers [10-27].

Manuscript received December 04, 2011. This work was supported in part by the Cornea Research Chair, College of Applied Medical Sciences, King Saud University.

M. H. Rais is with the Biomedical Technology Department, College of Applied Medical Sciences, King Saud University, Riyadh 11433, Saudi Arabia (phone: +966-1-469-3660; fax: +966-1-469-3556; e-mail: mhrais@yahoo.com.au).

M. H. Al Mijalli is with the Biomedical Technology Department, College of Applied Medical Sciences, King Saud University, Riyadh 11433, Saudi Arabia (e-mail: almijalli@yahoo.com).

M. Nisar is with the Biomedical Technology Department, College of Applied Medical Sciences, King Saud University, Riyadh 11433, Saudi Arabia (e-mail: m\_nisar@yahoo.com).

In this study we use the contemporary Virtex-5 FPGA family. The purpose of this paper is to present the design and implementation of Braun's multipliers on the Virtex-5 devices XC5VLX30, XC5VLX30T, XC5VLX50, and XC5VLX50T, together with their resource utilization.

The rest of this paper is structured as follows. Section II describes Braun's multiplier and its mathematical basis. Section III addresses the architectural platform used in this study. Section IV presents the FPGA design and implementation results. Finally, Section V presents the conclusion.

## II. BRAUN'S MULTIPLIER

Braun's multiplier is an $n \times m$-bit parallel array multiplier, generally known as a carry-save multiplier, and is constructed from $m \times (n-1)$ adders and $m \times n$ AND gates. Braun's multiplier suffers from a glitching problem, which is due to the ripple-carry adder in its last stage.
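
To make this structure concrete, the Python sketch below models an unsigned Braun array at the bit level, with an $m$-bit multiplicand $Y$ and an $n$-bit multiplier $X$. It is an illustrative software model, not the authors' VHDL design. The function names (`full_adder`, `braun_multiply`) are hypothetical, and so is the arrangement: one row of $n$ AND gates per multiplicand bit, $m-1$ carry-save rows of $n-1$ full adders, and a final $(n-1)$-bit ripple-carry adder. That arrangement was chosen so that the element counts match the $m \times n$ AND gates and $m \times (n-1)$ adders quoted above.

```python
from __future__ import annotations


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def braun_multiply(y: int, x: int, m: int, n: int) -> tuple[int, int, int]:
    """Bit-level sketch of an unsigned m x n Braun array multiplier.

    y is the m-bit multiplicand, x the n-bit multiplier.
    Returns (product, and_gate_count, adder_count).
    """
    if m < 1 or n < 1 or not (0 <= y < (1 << m) and 0 <= x < (1 << n)):
        raise ValueError("operands must fit in m and n bits")
    y_bits = [(y >> i) & 1 for i in range(m)]
    x_bits = [(x >> j) & 1 for j in range(n)]

    # AND-gate plane: pp[i][j] = Y_i AND X_j, with weight 2**(i + j).
    pp = [[y_bits[i] & x_bits[j] for j in range(n)] for i in range(m)]
    and_gates = m * n
    adders = 0

    p = [0] * (m + n)      # product bits P_0 .. P_{m+n-1}
    s = pp[0][:]           # running sum bits: s[j] has weight 2**(row + j)
    k = [0] * (n - 1)      # saved carries:    k[j] has weight 2**(row + j + 1)
    p[0] = s[0]

    # m-1 carry-save rows of n-1 full adders; carries go to the next row, not sideways.
    for i in range(1, m):
        new_s = [0] * n
        new_k = [0] * (n - 1)
        for j in range(n - 1):
            new_s[j], new_k[j] = full_adder(pp[i][j], s[j + 1], k[j])
            adders += 1
        new_s[n - 1] = pp[i][n - 1]  # leftmost partial product passes straight down
        s, k = new_s, new_k
        p[i] = s[0]                  # lowest sum bit of each row is a final product bit

    # Last stage: (n-1)-bit ripple-carry adder merging the remaining sums and carries.
    carry = 0
    for j in range(n - 1):
        p[m + j], carry = full_adder(s[j + 1], k[j], carry)
        adders += 1
    p[m + n - 1] = carry

    product = sum(bit << pos for pos, bit in enumerate(p))
    return product, and_gates, adders


if __name__ == "__main__":
    for width in (4, 6, 8, 12):
        _, ands, adds = braun_multiply(0, 0, width, width)
        print(f"{width}x{width}: {ands} AND gates, {adds} adders")
    print(braun_multiply(11, 13, 4, 4)[0])  # 143
```

For the square widths studied here, the model reports 16, 36, 64, and 144 AND gates and 12, 30, 56, and 132 adders. This is a roughly quadratic growth that is also visible in the LUT counts of Tables I to IV. The final loop is the ripple-carry stage that the text identifies as the source of glitching.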

### A. Mathematical Basis of Braun's Multiplier

Consider a generic $m \times n$ multiplication of two unsigned numbers, the $m$-bit multiplicand $Y = Y_{m-1} \dots Y_0$ and the $n$-bit multiplier $X = X_{n-1} \dots X_0$, whose values are

$$Y = \sum_{i=0}^{m-1} Y_i \, 2^i \tag{1}$$

$$X = \sum_{j=0}^{n-1} X_j \, 2^j \tag{2}$$

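The product $P = P_{m+n-1} \dots P_1 P_0$, which results from multiplying the multiplicand $Y$ by the multiplier $X$, can be written as follows:

$$P = XY = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (Y_i \cdot X_j) \, 2^{i+j} \tag{3}$$

Each bit product $Y_i \cdot X_j$ is generated by one AND gate, which accounts for the $m \times n$ AND gates of the array. The adders then sum these weighted partial products to form $P$.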
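
As a worked instance of (3), take $m = n = 4$ with the illustrative operands $Y = 1011_2 = 11$ and $X = 1101_2 = 13$. Grouping the double sum by the multiplicand bits $Y_i$ shows that each row of the array is the word $X$, gated by $Y_i$ and shifted left by $i$ positions:

$$
\begin{aligned}
P &= \sum_{i=0}^{3} Y_i \, 2^{i} \Bigl( \sum_{j=0}^{3} X_j \, 2^{j} \Bigr) = \sum_{i=0}^{3} Y_i \, 2^{i} X \\
  &= 1 \cdot 13 + 1 \cdot 26 + 0 \cdot 52 + 1 \cdot 104 \\
  &= 143 = 10001111_2 ,
\end{aligned}
$$

which equals $11 \times 13$ and occupies exactly $m + n = 8$ bits.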

## III. ARCHITECTURE PLATFORM

Like an ASIC, an FPGA is designed using a hardware description language (HDL), but unlike an ASIC it can be configured by the customer or designer after manufacturing. The ability to reconfigure after shipping and the low cost relative to an ASIC design make it an ideal candidate for many applications. FPGAs can be used to implement any logical function that an ASIC can perform.

An FPGA contains programmable logic components called configurable logic blocks (CLBs) and a hierarchy of reconfigurable interconnects that allow the blocks to be wired together. Each CLB contains small RAM-based look-up tables (LUTs) for creating combinational logic functions. CLBs also contain memory elements, such as flip-flops for clocked storage, and multiplexers that route logic within the block and to and from external resources. FPGAs originally began as competitors to complex programmable logic devices (CPLDs) and competed in a similar space, that of glue logic for printed circuit boards.
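
RAM-based combinational logic can be illustrated with a small software model of a 4-input LUT, the resource counted in Tables I to IV. In this sketch, a LUT is simply a 16-bit memory addressed by its four inputs, so any 4-input Boolean function is "programmed" by storing its truth table. The helpers `lut_contents` and `lut_read` are hypothetical. The address bit ordering (input `i0` as least significant) is an assumption of the sketch, not a vendor configuration format. The example stores the sum and carry outputs of a full adder, the basic cell of a Braun array.

```python
from typing import Callable


def lut_contents(func: Callable[[int, int, int, int], int]) -> int:
    """Truth table of a 4-input Boolean function packed into a 16-bit integer.

    Bit number `addr` stores func(i0, i1, i2, i3), where addr = i0 + 2*i1 + 4*i2 + 8*i3
    (this ordering is an assumption of the sketch).
    """
    table = 0
    for addr in range(16):
        i0, i1, i2, i3 = (addr >> 0) & 1, (addr >> 1) & 1, (addr >> 2) & 1, (addr >> 3) & 1
        if func(i0, i1, i2, i3) & 1:
            table |= 1 << addr
    return table


def lut_read(table: int, i0: int, i1: int, i2: int, i3: int) -> int:
    """Evaluate the LUT: the four inputs form the address of one stored bit."""
    addr = i0 | (i1 << 1) | (i2 << 2) | (i3 << 3)
    return (table >> addr) & 1


# The two outputs of a full adder. Input i3 is unused,
# so the upper 8 bits of each table repeat the lower 8.
fa_sum = lut_contents(lambda a, b, c, _unused: a ^ b ^ c)
fa_carry = lut_contents(lambda a, b, c, _unused: (a & b) | (a & c) | (b & c))
print(f"sum LUT = 0x{fa_sum:04X}, carry LUT = 0x{fa_carry:04X}")  # 0x9696, 0xE8E8
```

In a real device, the synthesis and mapping tools decide how logic is packed into LUTs. The sketch therefore only illustrates the principle and does not predict the counts reported in the tables.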

The inherent parallelism of the logic resources on an FPGA allows considerable computational throughput even at low clock rates in the MHz range. The flexibility of the FPGA allows even higher performance by trading off precision and range in the number format for a larger number of parallel arithmetic units. This has driven a new type of processing called reconfigurable computing, in which time-intensive tasks are offloaded from software to FPGAs.

FPGAs offer the speed of hardware together with the flexibility of software. Three main factors play an important role in FPGA-based design: the targeted FPGA architecture, the electronic design automation (EDA) tools, and the design techniques employed at the algorithmic level using HDL. In FPGAs, the choice of the optimum multiplier involves three key factors: area, propagation delay, and reconfiguration time [6]. The following subsection briefly introduces the Xilinx Virtex-5 FPGA family.

### A. Virtex-5 FPGAs

The Virtex-5 devices [28] are a programmable alternative to custom ASIC technology. The Virtex-5 LX platform also contains many hard-IP system-level blocks. These include block RAM/first-in first-out (FIFO) memories, second-generation 25×18 DSP slices, SelectIO technology with built-in digitally controlled impedance, and ChipSync source-synchronous interface blocks. They also include enhanced clock management tiles with integrated digital clock managers (DCMs) and phase-locked loop (PLL) clock generators, as well as advanced configuration options. The advanced DSP48E slices in Virtex-5 FPGAs help accelerate computation-intensive DSP and image processing algorithms. These slices can operate at up to 550 MHz and draw only 1.38 mW of power at 100 MHz.

## IV. FPGA DESIGN AND IMPLEMENTATION RESULTS

The $4\times4$, $6\times6$, $8\times8$, and $12\times12$-bit Braun's multipliers were designed in VHDL and implemented on four devices of the Xilinx Virtex-5 FPGA family using the Xilinx ISE 9.2i design tool [29]: XC5VLX50 (package: ff676, speed grade: -3), XC5VLX50T (package: ff665, speed grade: -3), XC5VLX30 (package: ff676, speed grade: -3), and XC5VLX30T (package: ff665, speed grade: -3).

Tables I, II, III, and IV summarize the FPGA device resource utilization for the standard $4 \times 4$, $6 \times 6$, $8 \times 8$, and $12 \times 12$-bit Braun's multipliers. XC5VLX50 and XC5VLX50T give identical results, as do XC5VLX30 and XC5VLX30T. Between the two pairs, the numbers of four-input LUTs, bonded IOBs, and total equivalent gates are the same. The only differences are in occupied slices, average connection delay, and maximum pin delay.

In XC5VLX50 and XC5VLX50T, the average connection delay is almost the same for the $4 \times 4$, $6 \times 6$, and $8 \times 8$-bit multipliers (0.857 to 0.887 ns) but increases to 1.074 ns for the $12 \times 12$-bit multiplier. The maximum pin delay in these two devices shows a similar pattern. The value for the $4\times4$-bit multiplier (2.103 ns) is higher than for the $6\times6$ and $8\times8$-bit multipliers (1.795 and 1.733 ns), and it jumps to 2.834 ns for the $12\times12$-bit multiplier.

Compared with XC5VLX50, XC5VLX30 occupies more slices for the $4\times4$ and $8\times8$-bit multipliers (13 versus 11, and 45 versus 29) and fewer for the $6\times6$-bit multiplier (17 versus 19). For the $12\times12$-bit multiplier the count is almost unchanged (95 versus 96). The same result is obtained for the XC5VLX50T and XC5VLX30T Virtex-5 devices.

ISBN: 978-988-19251-1-4 ISSN: 2078-0958 (Print); ISSN: 2078-0966 (Online)

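Compared with XC5VLX50 and XC5VLX50T, the maximum pin delay in XC5VLX30 and XC5VLX30T is lower for the $4\times4$, $6\times6$, and $12\times12$-bit multipliers, most markedly for $4\times4$ (1.371 ns versus 2.103 ns). It is higher only for the $8\times8$-bit multiplier (2.148 ns versus 1.733 ns). Unlike in the XC5VLX50 devices, it also increases steadily with operand width (1.371, 1.674, 2.148, and 2.663 ns), following an approximately linear trend.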
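
The short script below makes the device-to-device comparison easy to reproduce. It transcribes Tables I and III and prints the per-width differences for each metric. Tables II and IV are identical to Tables I and III, so they are not repeated. The helper names `compare` and `is_increasing` are hypothetical. The script is only a convenience for checking the prose against the tables and is not part of the authors' tool flow.

```python
# Values transcribed from Table I (XC5VLX50) and Table III (XC5VLX30);
# Tables II and IV repeat them for the -T devices.
WIDTHS = [4, 6, 8, 12]
LX50 = {
    "slices": [11, 19, 29, 96],
    "avg_conn_ns": [0.887, 0.885, 0.857, 1.074],
    "max_pin_ns": [2.103, 1.795, 1.733, 2.834],
}
LX30 = {
    "slices": [13, 17, 45, 95],
    "avg_conn_ns": [0.824, 0.823, 0.975, 1.027],
    "max_pin_ns": [1.371, 1.674, 2.148, 2.663],
}


def compare(metric: str) -> list:
    """Per-width (width, LX50 value, LX30 value, LX30 - LX50) tuples for one metric."""
    return [(w, a, b, round(b - a, 3))
            for w, a, b in zip(WIDTHS, LX50[metric], LX30[metric])]


def is_increasing(values: list) -> bool:
    """True if every value is strictly larger than the one before it."""
    return all(x < y for x, y in zip(values, values[1:]))


for metric in ("slices", "avg_conn_ns", "max_pin_ns"):
    for w, a, b, d in compare(metric):
        print(f"{metric:<12} {w}x{w}: LX50={a}  LX30={b}  diff={d:+}")

print("LX30 max pin delay rises with width:", is_increasing(LX30["max_pin_ns"]))
print("LX50 max pin delay rises with width:", is_increasing(LX50["max_pin_ns"]))
```

The script confirms the slice differences of +2, -2, +16, and -1. It also shows that only the XC5VLX30 pin delays rise monotonically with operand width.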

## V. CONCLUSION

In this paper we have demonstrated the hardware design and implementation of an FPGA-based parallel architecture for Braun's multipliers using VHDL. The design was implemented on the Xilinx Virtex-5 devices XC5VLX30, XC5VLX50, XC5VLX30T, and XC5VLX50T using the Xilinx ISE 9.2i design tool.

The objective was to present a comparative study of Virtex-5 FPGA devices using $4\times4$, $6\times6$, $8\times8$, and $12\times12$-bit Braun's multipliers. Across the Virtex-5 devices, the numbers of four-input LUTs, bonded IOBs, and total equivalent gates are identical. The occupied slices, average connection delay, and maximum pin delay differ. Further study will be carried out to investigate the delay differences among devices.

## REFERENCES

- [1] L.V. Agostini, I.S. Silva, S. Bampi, "Multiplierless and fully pipelined JPEG compression soft IP targeting FPGAs", *Microprocessors & Microsystems*, vol. 31, no. (8), pp. 487-497, 2007.
- [2] V. Gierenz, C. Panis, J. Nurmi, "Parameterized MAC unit generation for a scalable embedded DSP core", *Microprocessors & Microsystems*, vol. 34, no. (5), pp. 138-150, 2010.
- [3] M.Y. Kong, J.M.P. Langlois, D. Al-Khalili, "Efficient FPGA implementation of complex multipliers using the logarithmic number system", in *Proc. of the IEEE International Symposium on Circuits and Systems*, pp. 3154-3157, 2008.
- [4] A. Zemva, M. Verderber, "FPGA-oriented HW/SW implementation of the MPEG-4 video decoder", *Microprocessors & Microsystems*, vol. 31, no. (5), pp. 313-325, 2007.
- [5] D. Stranneby, W. Walker, "Digital Signal Processing and Applications", Elsevier, 2nd ed., 2004.
- [6] C. Maxfield, "The Design Warrior's Guide to FPGAs: Devices, Tools and Flows", Newnes Publishers, MA, 2004.
- [7] J. A. Kalomiros, J. Lygouras, "Design and evaluation of a hardware/software FPGA-based system for fast image processing", *Microprocessors & Microsystems*, vol. 32, no. (2), pp. 95-106, 2008.
- [8] C.R. Baugh, B.A. Wooley, "A Two's Complement Parallel Array Multiplication Algorithm", *IEEE Trans. Comput.*, vol. C-22, no. (12), pp. 1045-1047, 1973.
- [9] W. Stallings, "Cryptography and Network Security: Principles and Practices", 4th ed., Prentice-Hall, Upper Saddle River, NJ, 2006.
- [10] M.H. Rais, "FPGA design and implementation of fixed width standard and truncated 6×6-bit multipliers: A comparative study", in *Proc. of the 4th IEEE International Design and Test Workshop*, IEEE Xplore Press, pp. 1-4, 2009.
- [11] M.H. Rais, "Efficient hardware realization of truncated multipliers using FPGA", *Int. J. of Applied Science, Engineering and Technology*, vol. 5, no. (2), pp. 124-128, 2009.
- [12] M.H. Rais, "Hardware implementation of truncated multipliers using Spartan 3AN, Virtex-4 and Virtex-5 devices", *American J. of Engineering and Applied Sciences*, vol. 3, no. (1), pp. 201-206, 2010.
- [13] M.H. Rais, B.M. Al-Harthi, S.I. Al-Askar, F.K. Al-Hussein, "Design and Field Programmable Gate Array Implementation of Basic Building Blocks for Power-Efficient Baugh-Wooley Multipliers", *Am. J. Eng. Applied Sci.*, vol. 3, no. (2), pp. 307-311, 2010.
- [14] M.H. Rais, M.H. Al Mijalli, "Braun's multipliers: Spartan-3AN based design and implementation", *J. Comput. Sci.*, vol. 7, no. (11), pp. 1629-1632, 2011.
- [15] K.N. Vijeyakumar, V. Sumathy, S. Komanduri, C.C.G. Suji, "Design of low-power high-speed error tolerant shift and add multiplier", *J. Comput. Sci.*, vol. 7, no. (12), pp. 1839-1845, 2011.
- [16] M.H. Rais, M.H. Al Mijalli, "Field programmable gate arrays based design, implementation and delay study of Braun's multipliers", *J. Comput. Sci.*, vol. 8, no. (2), pp. 227-231, 2012.
- [17] M.H. Rais, M.H. Al Mijalli, "Field programmable gate arrays based realization of truncated multipliers", *Am. J. Applied Sci.*, vol. 8, no. (7), pp. 681-684, 2011.
- [18] M.H. Al-Mijalli, "Spartan-3AN field programmable gate arrays truncated multipliers delay study", *Am. J. Applied Sci.*, vol. 8, no. (6), pp. 554-557, 2011.
- [19] M.H. Al Mijalli, "FPGA Based Truncated Multipliers: A Study of Latency in FPGA Devices", *European J. Sci. Res.*, vol. 60, no. (2), pp. 273-279, 2011.
- [20] M. H. Rais, M. H. Al Mijalli, "Reconfigurable Design and Implementation of Standard and Truncated Multipliers Using Spartan-3AN, Spartan-3E, Virtex-2 and Virtex-4 FPGAs", *European J. Sci. Res.*, vol. 60, no. (3), pp. 469-481, 2011.
- [21] M. H. Rais, M. H. Al Mijalli, M. Nisar, "Resource Efficient Design and Implementation of Standard and Truncated Multipliers using FPGAs", in *Proc. of the World Congress on Engineering*, vol. II, 2011.
- [22] M. H. Al Mijalli, "Delay Study of Virtex-2, Virtex-4 and Spartan-3E Based Truncated Multipliers", *Int. J. Comput. Sci. Network Security*, vol. 11, no. (7), pp. 68-71, 2011.
- [23] M. H. Rais, M. H. Al Mijalli, "FPGA Based Fixed Width 4×4, 6×6, 8×8 and 12×12-Bit Multipliers using Spartan-3AN", *Int. J. Comput. Sci. Network Security*, vol. 11, no. (2), pp. 61-68, 2011.
- [24] M. H. Rais, M. H. Al Mijalli, "Virtex-5 FPGA Based Braun's Multipliers", *Int. J. Comput. Sci. Network Security*, vol. 11, no. (8), pp. 81-84, 2011.
- [25] M.H. Rais, "Hardware design and implementation of fixed-width standard and truncated 4x4, 6x6, 8x8 and 12x12-bit multipliers using FPGA", in *AIP Conference Proceedings*, vol. 1239, pp. 192-196, 2010.
- [26] M.N.M. Isa, M.I. Ahmad, S.A.Z. Murad, M.K.M. Arshad, "FPGA Based SPWM Bridge Inverter", *Am. J. Applied Sci.*, vol. 4, no. (8), pp. 584-586, 2007.
- [27] Y.S. Algnabi, R. Teymourzadeh, M. Othman, M.S. Islam, M.V. Hong, "On-Chip Implementation of Pipeline Digit-Slicing Multiplier-Less Butterfly for Fast Fourier Transform Architecture", *Am. J. Eng. Applied Sci.*, vol. 3, no. (4), pp. 757-764, 2010.
- [28] Xilinx, Virtex-5 FPGA family datasheet, (2009). http://www.xilinx.com/support/documentation/data\_sheets/ds100.pdf
- [29] Xilinx, ISE 9.2i design tool, (2007). www.xilinx.com/prs\_rls/2007/software/0786\_ise92i.htm

TABLE I FPGA resource utilization for standard Braun's multiplier for Virtex-5 XC5VLX50 (Package: ff676, speed grade: -3) [24]

| Bit<br>Width | Multipliers | Four Input<br>LUTs<br>(28800) | Occupied<br>Slices<br>(7200) | Bonded<br>IOBs<br>(440) | Total<br>Equivalent<br>Gate Count | Average<br>Connection<br>delay (ns) | Maximum<br>Pin delay (ns) |
|--------------|-------------|-------------------------------|------------------------------|-------------------------|-----------------------------------|-------------------------------------|---------------------------|
| 4×4          | Standard    | 22                            | 11                           | 16                      | 154                               | 0.887                               | 2.103                     |
| 6×6          | Standard    | 43                            | 19                           | 24                      | 301                               | 0.885                               | 1.795                     |
| 8×8          | Standard    | 81                            | 29                           | 32                      | 567                               | 0.857                               | 1.733                     |
| 12×12        | Standard    | 202                           | 96                           | 48                      | 1414                              | 1.074                               | 2.834                     |

TABLE II FPGA resource utilization for standard Braun's multiplier for Virtex-5 XC5VLX50T (Package: ff665, speed grade: -3)

| Bit<br>Width | Multipliers | Four Input<br>LUTs<br>(28800) | Occupied<br>Slices<br>(7200) | Bonded<br>IOBs<br>(360) | Total<br>Equivalent<br>Gate Count | Average<br>Connection<br>delay (ns) | Maximum<br>Pin delay (ns) |
|--------------|-------------|-------------------------------|------------------------------|-------------------------|-----------------------------------|-------------------------------------|---------------------------|
| 4×4          | Standard    | 22                            | 11                           | 16                      | 154                               | 0.887                               | 2.103                     |
| 6×6          | Standard    | 43                            | 19                           | 24                      | 301                               | 0.885                               | 1.795                     |
| 8×8          | Standard    | 81                            | 29                           | 32                      | 567                               | 0.857                               | 1.733                     |
| 12×12        | Standard    | 202                           | 96                           | 48                      | 1414                              | 1.074                               | 2.834                     |

TABLE III FPGA resource utilization for standard Braun's multiplier for Virtex-5 XC5VLX30 (Package: ff676, speed grade: -3)

| Bit<br>Width | Multipliers | Four Input<br>LUTs<br>(19200) | Occupied<br>Slices<br>(4800) | Bonded<br>IOBs<br>(400) | Total<br>Equivalent<br>Gate Count | Average<br>Connection<br>delay (ns) | Maximum<br>Pin delay (ns) |
|--------------|-------------|-------------------------------|------------------------------|-------------------------|-----------------------------------|-------------------------------------|---------------------------|
| 4×4          | Standard    | 22                            | 13                           | 16                      | 154                               | 0.824                               | 1.371                     |
| 6×6          | Standard    | 43                            | 17                           | 24                      | 301                               | 0.823                               | 1.674                     |
| 8×8          | Standard    | 81                            | 45                           | 32                      | 567                               | 0.975                               | 2.148                     |
| 12×12        | Standard    | 202                           | 95                           | 48                      | 1414                              | 1.027                               | 2.663                     |

TABLE IV FPGA resource utilization for standard Braun's multiplier for Virtex-5 XC5VLX30T (Package: ff665, speed grade: -3)

| Bit<br>Width | Multipliers | Four Input<br>LUTs<br>(19200) | Occupied<br>Slices<br>(4800) | Bonded<br>IOBs<br>(360) | Total<br>Equivalent<br>Gate Count | Average<br>Connection<br>delay (ns) | Maximum<br>Pin delay (ns) |
|--------------|-------------|-------------------------------|------------------------------|-------------------------|-----------------------------------|-------------------------------------|---------------------------|
| 4×4          | Standard    | 22                            | 13                           | 16                      | 154                               | 0.824                               | 1.371                     |
| 6×6          | Standard    | 43                            | 17                           | 24                      | 301                               | 0.823                               | 1.674                     |
| 8×8          | Standard    | 81                            | 45                           | 32                      | 567                               | 0.975                               | 2.148                     |
| 12×12        | Standard    | 202                           | 95                           | 48                      | 1414                              | 1.027                               | 2.663                     |