

# Design of 2-D Discrete Wavelet Transform by using FPGA Radix-4 Booth Multiplier

Mr. Hemantkumar H. Nikhare<sup>1</sup> Prof. Ashish Singhadia<sup>2</sup>

Scholar, Electronics & Telecommunication, VIT, Bhopal, India<sup>1</sup>

Professor, Electronics & Telecommunication, VIT, Bhopal, India<sup>2</sup>

**Abstract:** A new architecture, namely, Multiplier-and accumulator (MAC) based Radix-4 Booth Multiplication Algorithm for high-speed arithmetic logics have been proposed and implemented on Xilinx FPGA device. By combining multiplication with accumulation and devising a hybrid type adder the performance was improved. The modified booth encoder will reduce the number of partial products generated by a factor of 2. Fast multipliers are essential parts of digital signal processing systems. The speed of multiply operation is of great importance in digital signal processing as well as in the general purpose processors. The number to be added is the multiplicand, the number of times that it is added is the multiplier, and the result is the product. Each step of addition generates a partial product.

Keywords: - VLSI, FPGA, Carry Select Adder(CSA), CarryLook Ahead Adder (CLA), ASM

## I. INTRODUCTION

The digital signal processing methods nonlinearfunctions such as discrete cosine transform (DCT) ordiscrete wavelet transform (DWT). Because they arebasically accomplished by repetitive application ofmultiplication and addition, the speed of themultiplication and addition arithmetic's determines theexecution speed and performance of the entirecalculation.

For high-speed multiplication, the modifiedradix-4 Booth's algorithm (MBA) is commonly used.Encoder, to compress the partial products, and finaladder. The most effective way to increase the speed of amultiplier is to reduce the number of the partial productsbecause multiplication proceeds a series of additions for the partial products.

To reduce the number of calculationsteps for the partial products, MBA algorithm has been applied [1][5].

## II. RADIX-4 BOOTH MULTIPLIER

With the help of recent advances in multimedia and communication systems, real-time signal processing like audio signal processing, video/image processing, or largecapacity data processing are increasingly being demanded.

The multiplier and multiplier-and-accumulator (MAC) are the essential elements of the digital signal processing such as filtering, convolution, and inner products.

Most digital signal processing methods use nonlinear functions such as discrete cosine transform (DCT) or discrete wavelet transform (DWT), because they are basically accomplished by repetitive application of multiplication and addition.

Booth recoding is a technique for high speed multiplication, by recoding the bits that are multiplied. The number of partial products reduced to half, using the technique of radix-4 Booth recoding [1],[2][6].

# **III. RADIX-4 MODIFIED BOOTHALGORITHM**

use The modified Booth algorithm minimizes the number of partial products by half. We used the modified Booth hey encoding(MBE) scheme.

It is known as the most efficient Boothencoding and decoding scheme. To multiply,multiplicand 'X' by multiplier 'Y' using the modified Boothalgorithm. First group the multiplier bits 'Y' by three bits and encoding into one of  $\{-2, -1, 0, 1, 2\}$ .

Prior to convert he multiplier, a zero is appended into the LeastSignificant Bit (LSB) of the multiplier. Table I shows therules to generate the encoded signals by MBE scheme and Fig. 2 (a) shows the corresponding logic diagram. The Boothdecoder generates the partial products using the encoded signals[1][7].



Fig1. Basic arithmetic steps of multiplication and accumulation[1],[2][8].

INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN ELECTRICAL, ELECTRONICS, INSTRUMENTATION AND CONTROL RING Vol. 2, Issue 10, October 2014



Fig 2. Booth Recoding [2][9].

Table 1 : Radix 4 Booth Table

| Select Line | Partial Products         |
|-------------|--------------------------|
| (Encoding)  | (Operation)              |
| 000         | Add 0                    |
| 001         | Add multiplicand         |
| 010         | Add multiplicand         |
| 011         | Add 2* multiplicand      |
| 100         | Subtract 2* multiplicand |
| 101         | Subtract multiplicand    |
| 110         | Subtract multiplicand    |
| 111         | Subtract 0               |

The recoding is done by appending one zero to the Least Significant Bit (LSB) and extending the Most Significant Bit (MSB) with the sign bit if necessary. Then the grouping of 3 bits from the LSB is done as shown in Fig 2. The obtained result is -1 -1 0 -2. This result is multiplied with the multiplier and the number of partial product is reduced [2][11].

## **IV. VLSI ARCHITECTURE IMPLEMENTATION**

The architecture of the proposed ECAT Booth multiplier is designed by using tree-based carry save reduction followed by parallel-prefix carry-propagate addition architecture. The whole architecture of the proposed ECAT Booth multiplier is shown in Fig.accumulator. In final adder both sum and carry is added to produce the 2N bits product.



Fig3. VLSI Architecture [2][12].

# V. ARCHITECTURE OF A MULTIPLIER

A multiplier can be divided into three operational steps:

- i. Radix-4 Booth algorithm in which a partial product is generated.
- ii. Carry save adder and Accumulator
- iii. The final addition in which the final multiplication result is produced by adding the sum and the carry



Fig. 4: MAC Multiplier[1][11].

Generally if N-bit data of multiplicand 'X' is multipliedwith N-bit multiplier 'Y' then it generates Npartial products. But if Radix-4 booth algorithm is used then number of partial products will be reduced to N/2. In addition, the signed multiplication based on 2's complement numbers is also possible.

$$X = -2^{N-1}x_{N-1} + \sum_{i=0}^{N-1} x_i 2^i, \quad x_i \in 0, 1$$
$$X \times Y = \sum_{i=0}^{\frac{N}{2}-1} d_i 2^{2i} Y$$
Where di =  $-2x_{2i+1} + x_{2i} + x_{2i-1}$ 

$$P = X \times Y + Z = \sum_{i=0}^{N/2-1} di \, 2^i \, Y + \sum_{j=0}^{2N-1} zi \, 2^j$$

In CSA, the sign extension is used in order to increase thebit density of the operands. Half adder is used to generate sum and carry in CSA. The generated carry is stored in accumulator [1][11].

## VI. CARRY SAVE ADDER

The summing of the partial products in parallel using atree of carry save adder. Half adders are used to implement the Carry save adder. First of all the partial products should be arranged such as. The second partial product had to be shifted by two bits before adding to the first partial product. The third partial product will be shifted left by four bits where as fourth partial product will be shifted by six bits. After rearrangement partial products will be added.In final adder both sum and carry is addedto produce the 2N bits product [1], [4][15].

INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN ELECTRICAL, ELECTRONICS, INSTRUMENTATION AND CONTROL RING Vol. 2, Issue 10, October 2014



Fig. 5. Carry lookahead adder [4][13].

#### **VII. 2-D DISCRETE WAVELET TRANSFORM**

The main challenges in the hardware architectures for 1-D DWT are the processing speed and the number of multipliers and adders while for 2-D DWT it is the memory issue that dominates the hardware cost and the architectural complexity. A 2-D DWT is a separable transform where 1-Dwavelet transform is taken along the rows and then a 1-D wavelet transform along the columns.The 2-DDWT operates by inserting array transposition between the two 1-DDWT. The rows of the array are processed first with only one level of decomposition.

This essentially divides the array into two vertical halves, with the first half storing the average coefficients, while the second vertical half stores the detail coefficients. This process is repeated again with the columns, resulting in four sub-bands within the array defined by filter output.as in three-level decomposition.

The LL sub-band represents an approximation of the original image, the LL1 sub-band can beconsidered as a [2] M. Gopinathan& D. Jessintha (2013) "An Error Compensated DCT Architecture with Booth Multiplier", IJAEEE- ISSN: 2278-8948, 2:1 sub-sampled version of the original image. The other three sub-bands HL1,LH1, and HH1 contain higher frequency detail information. This process is repeated for as manylevels of decomposition as desired. The JPEG2000 standard specifies five levels ofdecomposition, although three are usually considered acceptable in hardware.

JPEG2000, two points have to be taken intoaccount. Firstly, the 1-D DWT generates the control signal memory to compute 2-D DWT and managesthe internal memory access. Secondly, we need to store temporary results generated by 2-Dcolumn filter. The amount of the external memory access and the area occupied by the embeddedinternal buffer are considered the most critical [8] Kishore Andra, Chaitali Chakrabathi, and Tinku Acharya (2002), " A issues for the implementation of 2D-DWT.

As thecache is used to reduce the main memory access in the general processor architectures, in similarway, the internal buffer is used to reduce the external memory access for 2D-DWT. However, theinternal buffer would <sup>[10]</sup> occupy much area and power consumption. Three main architecture design approaches were proposed in the [11] L.Hongyu, M.K. Mandal, B.F. Cockburn,(2004), "Efficient literature with the aim to implement efficiently the 2D-DWT level by level, line-based and block based architectures. T

hese architectures address this difficulty in different ways. A typical level-by-level architecture as uses a single processing module that first processes the rows, and then the columns.Intermediate values between row and column processing are stored in memory.

Since this memory must be large enough to keep wavelet coefficients for the entire image, external memory is usually used.

Access to the external memory is sometimes done in rowwise order, and sometimes in column-wise order, so highbandwidth access modes cannot be used. 157 external memory access can become the performance bottleneck of the system for the given J level of decomposition [3][12][14].

#### VIII. CONCLUSION

In this paper, we have analyzed the existing Lifting based 2-dimensional DiscreteWavelet Transform based on the hardware complexities and FPGA Radix 4-booth multiplier and computational time for the differentarchitectures using Lifting schemes.

The architectures represented vary from direct mapped, folded, recursive to multilevel folded architectures. This review is useful for exploring a new method of pipelined architectures capable of handling multiple data streams suitable for application in image andvideo processing multimedia real time applications.

#### REFERENCES

- [1] G. Jaya Prada, N.C. Pant "Design and Verification of faster Multiplier", vol.1 issue 3, pp. 683-686, ISSN: 2248-9622.
- Volume-2, Issue-4, 2013.
- [3] UshaBhanu. N &Dr. A. Chilambuchelvan (2012) "A Detailed Survey on VLSI Architecture for Lifting based DWT for efficient hardware implantation",(VLSICS) Vol.3 No.2.
- [4] S. ShaffiullaBasha, Syed. Jahangir Badashah.(2012) "Design and implementation of Radix-4 based High Speed multiplier For ALU's using minimal partial product", IJAET- ISSN: -1963-Vol.4, Issue 1, pp. 314-325.
- In order toextend the 1-D filter to compute 2-D DWT in [5] Keshab K. Parhi and Takao Nishitani,(1993), "VLSI Architectures for Discrete WaveletTransforms".IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 1, No. 2.
  - [6] I. Daubechies, W. Sweldens, (1998), "Factoring wavelet transform into lifting steps", J. Fourier Anal.Appl. 4, 247-269.
  - Chao-Tsung Huang, Po-Chih Tseng, and Liang-Gee Chen,(2004), ".Flipping Structure: An EfficientVLSI Architecture for Lifting-Based Discrete Wavelet Transform", IEEETransactions On SignalProcessing, Vol. 52, No. 4.
  - VLSI architecture for Liftingbased Forward and inverse wavelet transform", IEEE Trans, on Signal processing, VOL.50.No.4.
  - [9] T. Acharya, C. Chakrabarti, (2006), "A survey on lifting-based transformsarchitectures", J. VLSI discrete wavelet Signal Process.42, 321-339.
  - S. Barua, J.E. Carletta, K.A. Kotteri and A.E. Bell,(2004), " An efficient architecture for lifting-basedtwo-dimensional discrete wavelet transforms". The VLSI Journal 38, 341-352.
  - architectures for 1-D and 2-D lifting-basedwavelet transforms", IEEE Trans. Signal Process.52 (5), 1315-1326.
  - S.A. Salehi, S. Sadri, (2009), ".Investigation of Lifting-Based [12] Hardware Architectures for DiscreteWavelet Transform",Journal of Circuits Systems and Signal Processing, vol. 28.

INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN ELECTRICAL, ELECTRONICS, INSTRUMENTATION AND CONTROL

Vol. 2, Issue 10, October 2014

- [13] M.S.Bhuyan, Nowshad Amin, MD.AzrulHasniMadesa,(2007), ".An Efficient VLSI Implementationof Lifting Based Forward Discrete Wavelet Transform Processor for JPEG2000", Proceedings of the7th WSEAS International Conference on Signal, Speech and Image Processing, Beijing, China.
- [14] Gab Cheon Jung, Duk Young Jin, Seong Mo Park, (2004), "An Efficient Line based VLSI Architecture for 2D Lifting DWT". IEEE.
- [15] P.Y Chen,(2004), "VLSI implementation for one-dimensional multilevel lifting based wavelettransform",IEEE Trans. on Computers, vol. 53, pp.386-398, 2004.

# BIOGRAPHY



Mr. Hemantkumar H. Nikhare M. Tech. (Digital & Communication.) Scholar in Vedica Institute of Technology Bhopal, under R.G.P.V. Bhopal.And Completed B.E. (Electronics Engg.) from Rajiv Gandhi College of Engineering, Research & Chandrapur under R.T.M. Nagpur

Technology University.