© 2005 - 2014 JATIT & LLS. All rights reserved

ISSN: 1992-8645

www.jatit.org

AN EFFICIENT FLOATING-POINT MULTIPLIER DESIGN USING COMBINED BOOTH AND DADDA ALGORITHMS

# <sup>1</sup>DHANABAL R, <sup>2</sup>BHARATHI V, <sup>3</sup>NAAMATHEERTHAM R SAMHITHA, <sup>4</sup>PAVITHRA S, <sup>5</sup>PRATHIBA S, <sup>6</sup>JISHIA EUGINE

<sup>1</sup>Asst Prof. (Senior Grade), School of Electronics Engineering,
 <sup>2</sup>Asst. Prof., GGR College of Engineering, Anna University, Vellore,
 <sup>3</sup>M.Tech VLSI Student, School of Electronics Engineering,
 <sup>4, 5, 6</sup> B.Tech ECE Students, School of Electronics Engineering,
 VIT University, Vellore, Tamil Nadu, India
 E-mail: <sup>1</sup>rdhanabal@vit.ac.in, <sup>2</sup>bharathiveerappan@yahoo.co.in, <sup>3</sup>samhitha.nr@gmail.com,
 <sup>4</sup>pavithra.girija@gmail.com, <sup>5</sup>prathiba.sivakumar@yahoo.com, <sup>6</sup>jishhhia@gmail.com

# ABSTRACT

This paper includes the design of efficient double precision floating-point multiplier using radix-4 Modified Booth Algorithm (MBE) and Dadda Algorithm. This hybrid multiplier is designed by using the advantages in both the multiplier algorithms. MBE has the advantage of reducing partial products to be added. Dadda scheme has the advantage of adding the partial products in a faster manner. Our main objective is to combine these two schemes to make the multiplier design power efficient and area efficient. The floating-point multiplier is designed using Verilog HDL. The design is simulated using Altera ModelSim and synthesized using Cadence RTL compiler in TSMC 45 nanometre technology. It is found that multiplier has reduced power and area and it consumes 4619.23  $\mu W$  and 34880  $\mu m^2$ .

Keywords: Floating Point Multiplication, Dadda Reduction, Modified Booth, Computer Arithmetic

# 1. INTRODUCTION

Multipliers are among the fundamental components in modern electronic systems that run complex calculations in both digital signal processors and general purpose processors. As the technology is improving, many researchers are trying to develop efficient multiplier designs which can offer high speed or low power or low area or the combination of all these three in single multiplier. But for the portable devices power and area is mainly considered and it should be minimized as much as possible. Such type of multiplier is implemented here.

The Booth Algorithm is relatively straight forward way of doing signed multiplications [1]. Modified Booth algorithm reduces the partial products than any other method. Then comes addition of these partial products. In the recent days Dadda or Wallace are been used for addition of partial products. But when we reconsider Wallace and Dadda multipliers it is proved that the hardware requirement for Dadda is less than the Wallace multiplier [2, 3]. Dadda reduction method performs faster addition of partial products [2]. In this work a Booth encoded Dadda multiplier is implemented and the improved performance is compared with some regular multipliers.

E-ISSN: 1817-3195

The floating-point multiplier (FPM) is the major logical block in the floating-point unit (FPU) [11] [12] and ALU [13]. The IEEE-754 standard sets down specific rules and formats for any system that uses floating-point arithmetic's [4]. The main reason to consider double precision floating-point multiplier is that many standard FPUs support this format [5, 6].

This paper is divided into five sections. Basic floating-point multiplication algorithm is explained in section II. Section III provides the actual design of FPM. In section IV results and comparison is provided. Conclusion is given in section V.

# FLOATING POINT MULTIPLICATION Standard IEEE Floating-Point Representation

Our multiplier unit performs multiplication on double precision floating-point data. i.e., the inputs and outputs to the unit are in standard IEEE-754 double precision format [4]. The standard IEEE-754 double precision floating-point format consists

<u>30<sup>th</sup> June 2014. Vol. 64 No.3</u>

© 2005 - 2014 JATIT & LLS. All rights reserved

| ISSN: 1992-8645 | www.jatit.org | E-ISSN: 1817-3195 |
|-----------------|---------------|-------------------|
|-----------------|---------------|-------------------|

of a 64-bit vector split into three sections as shown in Fig. 1. The double precision floating-point number is calculated as shown in equation (1). To represent any floating-point number, the three fields are combined and interpreted as follows:

$$A = (-1)^{sign_A} \times 1. fraction_A \times 2^{\exp_A - blas}$$
(1)

#### Fig. 1. IEEE-754 Double Precision (64-bit) Floating-Point Format

#### 2.2 Floating-Point Multiplication

The floating-point multiplication involves following steps: Assume that the operands A and B are in IEEE-754 double precision format, performing floating-point multiplication  $A \times B = (-1)^{A_{exp}} (A_{man} \times 2^{A_{exp}}) \times (-1)^{B_{exp}} (B_{man} \times 2^{B_{exp}})$ where "map" represents matrices "avy" represents

where "man" represents mantissa, "exp" represents exponent.

- 1. Compute the sign of the result  $(A_{exp} \wedge B_{exp})$ .
- 2. Multiply the mantissa.
- 3. Normalize the product if needed.
- 4. Compute the exponent of the result: exponent of the result = biased exponent  $(A_{exp})$  + biased exponent  $(B_{exp})$  - bias
- 5. Round the result to the no. of mantissa bits.

The double precision floating-point multiplier architecture is as shown in Fig. 2. The architecture contains a multiplier tree which multiplies two 53bit numbers (1 hidden bit+52 bit mantissa). Output from the multiplier tree is in carry save format and it is passed to a combined add/round stage, where the carry save product is added and rounded. Both the sign and exponent calculation logic runs in parallel with the mantissa multiplication. To get the final exponent value, the exponent needs to be adjusted based on the rounding. Sometimes there is a chance of getting a wrong value for the exponent as the actual exponent value exceeds the bit range. So to avoid this in our design exception handling is included. Whenever the bit range is high, it generates an exception while performing exponent logic or exponent adjust.

#### 2.3 IEEE Rounding Modes

As per IEEE-754 standards there are four rounding modes as follows:

- 1. Round to nearest (RN)
- 2. Round to zero (RZ)
- 3. Round to positive infinity (RP)
- 4. Round to negative infinity (RN)

Implementation wise these rounding are further reduced to three. Round up (RU) with fix up, round to infinity, round to zero [7]. Round to nearest can be implemented as Round up with a fix up [7]. We are using RU in our floating-point arithmetics. Mathematically RU is given as follows

$$= \left\{ \begin{array}{c} \begin{bmatrix} x \\ y \end{bmatrix} & \text{if } x - \begin{bmatrix} x \\ y \end{bmatrix} \ge 0.5 \\ \begin{bmatrix} x \\ y \end{bmatrix} & \text{otherwise} \end{array} \right.$$

λ

Where |x| and  $\lfloor x \rfloor$  are the ceiling and floor functions.



Fig. 2. Floating-Point Multiplier Architecture

#### 3. FLOATING-POINT MULTIPLIER USING BOOTH AND DADDA ALGORITHMS

Simple multiplication operation involves three major steps:

- 1. Generation of partial products.
- 2. Reduce the generated partial products into two rows i.e., one row of final sums and one row of carries.
- 3. These final sums and carries need to be added to generate final result.

In our design for generation of partial products we have used radix-4 Modified Booth Encoding (MBE) approach. Dadda algorithm is used to reduce the generated partial products into one row of final sums and one row of carries [8] [9]. Finally these two rows are added using an efficient parallel prefix adders [10].

#### 3.1 Modified Booth Encoding(MBE)

In our floating-point multiplier design a modified booth encoding scheme is used as it reduces the

<u>30<sup>th</sup> June 2014. Vol. 64 No.3</u>

© 2005 - 2014 JATIT & LLS. All rights reserved

| ISSN: 19 | 92-86 | 45      |          |            | www.jat | t.org                       | E-ISSN: 1817-3195    |
|----------|-------|---------|----------|------------|---------|-----------------------------|----------------------|
| number   | of    | partial | products | generated. | MBE     | Each partial product can be | obtained using block |

generates at most  $\left|\frac{N}{2}\right| + 1$  partial products, where

N is the number of bits. MBE algorithm involves following steps:

- 1. Pad the LSB with one zero.
- 2. Pad the MSB with 2 zeros if n is even and 1 zero if n is odd.
- 3. Divide the multiplier into overlapping groups of 3-bits.
- 4. Determine the partial products scale factor from the recoding table.
- 5. Compute the multiplicand multiples which is nothing but partial products.

Radix-4 recoding, the most common modified booth's recoding scheme and is used with the digit set  $\{-2, -1, 0, 1, 2\}$  is shown in Table I.

Table. I. Radix-4 Modified Booth's Recoding (for  $A \times B$ )

| Bits             | of multip      | Encoding<br>operation on |                |
|------------------|----------------|--------------------------|----------------|
| C <sub>i+1</sub> | C <sub>i</sub> | C <sub>i-1</sub>         | multiplicand A |
| 0                | 0              | 0                        | 0              |
| 0                | 0              | 1                        | +B             |
| 0                | 1              | 0                        | +B             |
| 0                | 1              | 1                        | +2B            |
| 1                | 0              | 0                        | -2B            |
| 1                | 0              | 1                        | -В             |
| 1                | 1              | 0                        | -B             |
| 1                | 1              | 1                        | 0              |

Each three consecutive bits of the multiplier B represents the input to the booth recoding block. This block selects the right operation on multiplicand A which can be shift or invert (-2B) or invert (-B) or zero or no operation (B) or shift (2B). Fig. 3 shows the generation of partial product using MBE.

For double precision floating –point multiplication two 53-bit (1 hidden bit + mantissa 52 bits) numbers are to be multiplied. If we use normal method for generation of partial products 53 partial products will be obtained. But by using MBE the partial products can be reduced to 27. Each partial product can be obtained using block shown in Fig. 3.

#### 3.2 Dadda Reduction Approach

After the generation of partial products, the partial products need to be added. Usually addition of these partial products consumes time. For this reason Dadda scheme is used to minimize the number of adder stages, by which delay can be minimized. Reduction of these partial products into two rows is done in stages using half adders and full adders. The reduction in size of each stage is calculated by working back from the final stage. Each preceding stage height must be not greater than  $\lfloor 3 \cdot successorheight/2 \rfloor$  [9]. Heights for the various stages can be 2, 3, 4, 6, 9, 13, 19, 28, 42, 63, etc., The dot diagram for Dadda reduction for 9 bit by 9 bit multiplication is shown in Fig. 4, this is



Fig. 3. One Partial Product Generator Using MBE



Fig. 4. 9bit by 9 bit Dadda Reduction

#### 3.3 Parallel Prefix Adder

The final sums and carries are added using parallel prefix adders as it offer a highly efficient solution to the binary addition and suitable for VLSI implementations [10]. Among the parallel

<u>30<sup>th</sup> June 2014. Vol. 64 No.3</u>

© 2005 - 2014 JATIT & LLS. All rights reserved

| ISSN: 1992-8645 | www.jatit.org | E-ISSN: 1817-3195 |
|-----------------|---------------|-------------------|
|-----------------|---------------|-------------------|

prefix adders, Kogge-Stone architecture is the widely used and the popular one.

Kogge-Stone adder is the parallel version of carry-look-ahead adder. The term prefix means the outcome of operation depends on the initial inputs. Parallel means the execution of an operation is in parallel. This is done by segmenting into smaller pieces, which are computed in parallel. Koggestone adder is considered as the fastest adder design possible [10]. 8-bit Kogge-Stone adder is as shown in Fig. 5. In our design to add the final sums and carries a 106-bit Kogge-Stone adder is used.



Fig. 5. (A) 8-Bit Kogge-Stone Adder, (B) Logic Implementation of Each Block of Kogge-Stone Adder [10]

#### 3.4 Multiplier Using MBE and Dadda

As discussed in section II mantissas need to be multiplied. For mantissa multiplication, both the concepts of MBE and Dadda reduction is used. Initially the two 53 bit mantissas are considered to generate partial products using Radix-4 Modified Booth Encoding Algorithm as discussed in section III.A. From this 27 rows of partial products will be obtained. These 27 partial products are reduced using Dadda reduction scheme. For example consider Fig. 6, here multiplication of two 10-bit numbers is taken and 5 partial products are generated using MBE. Then these 5 rows of partial products are reduced to two rows in 3 reduction stages, where 4, 3, 2 is the height of each stage. The same context is extended for reduction of 27 partial products. These 27 rows of partial products are reduced to 2 rows in 7 reduction stages, where 19, 13, 9, 6, 4, 3, 2 is height of each stage as we go down in the reduction scheme.



Fig. 6. 10 bit by 10 bit Booth Encoding with Dadda Reduction Scheme

# 4. RESULTS AND DISCUSSION

The simulation results of the floating-point Booth encoded Dadda multiplier is shown in Fig. 7. Table II provides the synthesis results for the multiplier designed. The Booth encoded Dadda floating-point multiplier has been compared with Booth encoded Wallace tree and simple Modified Booth Encoding Algorithm. Comparison chart for power, area, and delay is given in Fig. 8, Fig. 9, and Fig. 10 respectively. In contrast to the Wallace reduction, Dadda method does the least reduction necessary at each stage.

Dadda reduction scheme uses fewer half adders and full adders when compared to the Wallace reduction scheme. Thus Dadda multipliers area and power is less than Wallace. In conventional MBE, partial products are reduced using CSA trees but it consumes time. To overcome this Dadda reduction scheme is used.

Table II. Synthesis Results Using TSMC 45nmTechnology

| 64-bit Floating<br>–Point<br>Multiplier | Total<br>Power (<br>μW) | Area ( $\mu m^2$ ) | Delay<br>(ns) |
|-----------------------------------------|-------------------------|--------------------|---------------|
| MBE with<br>Dadda                       | 4619.23                 | 34880.00           | 6.19          |
| MBE with<br>Wallace                     | 4765.80                 | 35337.30           | 6.19          |
| MBE                                     | 4883.52                 | 35393.56           | 6.49          |

www.jatit.org

30<sup>th</sup> June 2014. Vol. 64 No.3 © 2005 - 2014 JATIT & LLS. All rights reserved



E-ISSN: 1817-3195

| Messages                         |         |                                                   |   |
|----------------------------------|---------|---------------------------------------------------|---|
|                                  | 1100000 | 11000000010110101011111100000010011001111         | 0 |
|                                  | 0001111 | 00011111111111101010111111100000010011001111      | 0 |
| /FMUL_dadda_test/F1/A_sign       | St1     |                                                   |   |
| /FMUL_dadda_test/F1/B_sign       | St0     |                                                   |   |
| /FMUL_dadda_test/F1/A_exponent   | 1000000 | 1000000101                                        |   |
|                                  | 0011111 | 0011111111                                        |   |
| /FMUL_dadda_test/F1/A_mantissa   | 1010101 | 101010111111110000001100111110000010010           |   |
| - /FMUL_dadda_test/F1/B_mantissa | 1010101 | 101010111111110000001100111110000010010           |   |
| - /FMUL_dadda_test/F1/bias       | 0111111 | 0111111111                                        |   |
| /FMUL_dadda_test/F1/sign         | St1     |                                                   |   |
| /FMUL_dadda_test/F1/exp_out      |         | 01000000110                                       |   |
| /FMUL_dadda_test/F1/man_out      |         |                                                   |   |
|                                  | 1010000 | 0 <u>10 1000000 1 100 1 100 10 1 10 1 1 10 10</u> | 0 |

Fig. 7. Simulation results for 64-bit floating-point multiplier

ISSN: 1992-8645



Fig. 8. Power comparison chart for the 64-bit floatingpoint multiplier



Fig. 9. Area Comparison Chart for the 64-Bit Floating-Point Multiplier



Fig. 10. Delay Comparison Chart for the 64-Bit Floating-Point Multiplier

### 5. CONCLUSION

This paper presents a low power and area efficient double precision floating point multiplier using Modified Booth Encoding and Dadda reduction. The design has been compared with MBE and MBE with Wallace. It is found that multiplier has reduced power and area and it consumes  $4619.23 \,\mu W$  and  $34880 \,\mu m^2$ . The comparison results in the previous sections prove that the design is efficient. This multiplier design is suitable for high performance floating point units or floating point multiply-add units of the coprocessors.

As the future work this work can be extended to achieve better speeds by using Residue Number System [14].

# **REFRENCES:**

- [1] D. Booth, "A Signed Binary Multiplication Technique," *Quarterly J. Mechanical and Applied Math.*, vol. 4, pp.236-240, 1951.
- [2] K.C. Bickerstaff, E. E. Swartzlander, and M. J. Schulte, "Analysis of column compression multipliers," in *proceedings of the 15<sup>th</sup> IEEE symposium on Computer Arithmetic*, pp. 33-39, June 2001.
- [3] W. J. Townsend, E. E. Swartzlander, and J. A. Abraham, "A comparison of Dadda and Wallace multiplier delays," in Advanced Signal Processing Algorithms, Architectures, and Implementations XIII, vol. 5205 of Proceedings of the SPIE, pp. 552–560, August 2003.
- [4] *IEEE Standard for Binary Floating-Point Arithmetic,* ANSI/IEEE Standard 754-1985, Reaffirmed Dec. 6, 1990, Inc., 1985.

# Journal of Theoretical and Applied Information Technology <u>30<sup>th</sup> June 2014. Vol. 64 No.3</u>

|                                                                                                                                                                                                            | © 2005 - 2014 JATIT & LLS. All rights reserved                                                         | JATIT             |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|-------------------|
| ISSN: 1992-8645                                                                                                                                                                                            | www.jatit.org                                                                                          | E-ISSN: 1817-3195 |
| [5] R. K. Montoye, E. Hokenel<br>"Design of the IBM<br>floating-point execution of                                                                                                                         | RISC System/6000<br>unit," IBM J. Res.                                                                 |                   |
| <ul> <li>Development, vol. 34, pp. 5</li> <li>[6] E. Hokenek, R. Montoye,<br/>"Second-generation RISC<br/>multiply-add fused," <i>IE</i><br/><i>Circuits</i>, vol. 25, no. 5, p<br/>1990.</li> </ul>       | and P. W. Cook,<br>floating point with<br><i>EE J. Solid-State</i>                                     |                   |
| 7] N. Quach, N. Takagi, and<br>IEEE rounding," <i>Stanford</i><br><i>Tech. Rep.</i> CSL-TR-91-459                                                                                                          | Univ., Stanford, CA,                                                                                   |                   |
| <li>[8] L. Dadda, "Some sch<br/>multipliers," <i>IEEE Transac</i><br/>vol. 13, pp.14-17, 1964.</li>                                                                                                        | emes for parallel                                                                                      |                   |
| <li>Waters, R. S. Swartzlander<br/>Complexity Wallac<br/>Reduction," Computers, IE<br/>vol.59, no.8, pp.1134-1137</li>                                                                                     | e Multiplier<br>EE Transactions on,                                                                    |                   |
| <ul> <li>10] Giorgos Dimitrakopoulc<br/>Nikolos, "High-Speed Pa<br/>Ling Adders," <i>IEEE</i><br/><i>Computers</i>, vol. 54, no. 2<br/>2005.</li> </ul>                                                    | s and Dimitris<br>arallel-Prefix VLSI<br><i>Transactions on</i>                                        |                   |
| 11] Dhanabal R, Bharathi V, S<br>and Sahoo S.K, "Design<br>of Low Power Floating Poi<br>International Journal of A<br>Research, ISSN 0973-4562<br>339-346, 2014.                                           | and Implementation<br>nt Arithmetic Unit",<br><i>Ipplied Engineering</i><br>2, vol. 9, no. 3, pp.      |                   |
| 12] Ushasree G, Dhanabal R,<br>"VLSI Implementation of a<br>Precision Floating Po<br>Verilog", Proceedings of I<br>Information and Communi<br>(ICT 2013), pp. 803-808, 20<br>13] Dhanabal R, Bharathi V, S | a High Speed Single<br>int Unit Using<br>EEE Conference on<br>cation Technologies<br>013.              |                   |
| Soman H, and Sahoo.S.K<br>low power ALU-DBG<br>Journal of Engineering and<br>no. 3, pp. 2172 – 2180, Jur                                                                                                   | , "Design of 16-bit<br>PU", International<br>I Technology, vol. 5                                      |                   |
| 14] Dhanabal R, Sarat Kumar<br>N.R.Samhitha, Neethu Ao<br>Mariam Jacob, "Impleme<br>Point MAC using Residue                                                                                                | Sahoo, Barathi V,<br>cha Cherian, Pretty<br>ntation of Floating<br>e Number System",<br>al and Applied |                   |