

Vol. 6, Issue 11, November 2018

# Design and Implementation of Dual Core and Quad Core Processor in Virtex-6 FPGA Using Pipelined RISC Architecture

## Mr. Rakesh M R

Assistant Professor, Dept. of ECE, AJIET Mangaluru, Karnataka, India

**Abstract:** This paper describes the design of dual core and quad core pipelined Reduced Instruction Set Computer (RISC) processors in Verilog HDL and their implementation on a Virtex-6 FPGA. The dual and quad core processors are designed for low power consumption and high efficiency. Instruction and data memories are physically and logically separate, following the Harvard memory architecture. Each core has a 5-bit opcode and a set of 23 instructions, and it is pipelined to increase processor speed. The RISC design comprises general purpose registers, an Arithmetic and Logical Unit (ALU), and data and program memories, combined with pipelining. Only load and store instructions access memory. The paper compares the dual core and quad core RISC architectures in terms of structure and power consumption and presents the design summaries.

Keywords: RISC characteristics, Pipelining, Memory, Load and Store, dual core processor, quad core processor

## I. INTRODUCTION

A processor is a programmable logic circuit that takes input, performs arithmetic or logical operations according to a program stored in memory, and produces output. It is the electronic circuit that acts as the Central Processing Unit (CPU) of a computer and controls all computational tasks. Because of its simple design and the falling cost of integrated circuits, the RISC processor is used in almost every area: the simple design gives higher performance, lower cost, and compatible systems. RISC processors are used in applications such as data processing, scientific and engineering computing, and real-time control. This work presents the design of dual core and quad core processors. Each core is a Reduced Instruction Set Computer (RISC) processor with a 4-bit data width, a complete instruction set, program and data memories, general purpose registers, and a simple Arithmetic Logical Unit (ALU) for basic operations.

In electronics, a Hardware Description Language (HDL) is a specialized computer language used to describe the structure, design, and operation of electronic circuits, most commonly digital logic circuits. Verilog is an HDL: a textual format for describing electronic circuits and systems, intended for verification through simulation, timing analysis, test analysis, and logic synthesis. An HDL can describe hardware at any level, and it allows a design to be simulated early in the design cycle to correct errors or to experiment with different architectures. Advantages of HDLs include technology independence, ease of design and debugging, and better readability than schematics, particularly for large circuits. The dual core and quad core processors were designed and synthesized in Verilog HDL following the RISC architecture, which offers high efficiency and a simple design, and were simulated using ModelSim.
Pipelining lets the processor work on several instructions at the same time, so it can run at a high clock frequency and achieve higher throughput. The RISC architecture targets single-cycle instruction execution.

## II. RISC ARCHITECTURE

RISC processors use a small, limited set of instructions, consume less power, and offer high performance. Each instruction is simple and consistent, with simple addressing modes and a uniform, fixed-length encoding. RISC processors use a large number of registers for passing arguments and holding local variables. The instruction set follows a load-store approach: only load and store instructions access memory. RISC is thus a microprocessor architecture built around a small, highly optimized instruction set. It reduces the cycles per instruction (CPI) at the cost of more instructions per program. Pipelining, in which the execution of several instructions is overlapped, is one of the defining features of RISC; it maximizes operation speed, minimizes execution time, and gives RISC a performance advantage over CISC. The block diagram of a general RISC architecture is shown in figure 1.

## **IJIREEICE**



## International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering





Figure 1: Block diagram of RISC architecture
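The trade-off described above, fewer cycles per instruction at the cost of more instructions per program, is captured by the standard processor performance equation (a general textbook relation, not a result derived in this paper):

$$
T_{\text{CPU}} = N_{\text{instr}} \times \text{CPI} \times T_{\text{clk}}
$$

where $N_{\text{instr}}$ is the number of instructions executed, CPI is the average number of clock cycles per instruction, and $T_{\text{clk}}$ is the clock period. As an illustration with a hypothetical 100-instruction program, assuming CPI = 1 and the 3.713 ns minimum period reported in Table 1, $T_{\text{CPU}} = 100 \times 1 \times 3.713\ \text{ns} = 371.3\ \text{ns}$.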

Pipelining improves both the effective cycles per instruction (CPI) and overall system performance. It allows a processor to work on different steps of several instructions at the same time, so many instructions complete in a short period. The RISC core uses a four-stage pipeline: Fetch, Decode, Execute, and Store result. In the Fetch stage, the content of the Program Counter is used to access memory and fetch the next instruction. In the Decode stage, the instruction is decoded and the required operands are read from the general purpose registers. In the Execute stage, all calculations are performed, including the effective address calculation for Load and Store instructions; the next Program Counter value is also computed in this stage so that branches can be executed. In the Store stage, the result from the Execute stage, or the data returned by a memory load, is written back to the general purpose registers. The instruction format is register type: each 17-bit instruction has a 5-bit opcode and three 4-bit operands, as shown in figure 2.
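The overlap between the four stages can be illustrated with a small cycle-level model. This is a Python sketch, not the author's Verilog; it assumes an ideal pipeline in which one instruction enters per clock and no hazards or stalls occur, and the labels I0, I1, ... are hypothetical.

```python
# Cycle-level sketch of an ideal four-stage pipeline (Fetch, Decode, Execute, Store).
# ASSUMPTIONS: one instruction enters per cycle; no data/branch hazards, no stalls.
STAGES = ["Fetch", "Decode", "Execute", "Store"]


def pipeline_schedule(num_instructions: int) -> list[list[str]]:
    """For each clock cycle, return which instruction occupies each stage ('-' if empty)."""
    k = len(STAGES)
    schedule = []
    for cycle in range(num_instructions + k - 1):
        row = []
        for stage in range(k):
            instr = cycle - stage  # instruction i is in stage s during cycle i + s
            row.append(f"I{instr}" if 0 <= instr < num_instructions else "-")
        schedule.append(row)
    return schedule


def cycles(num_instructions: int, pipelined: bool) -> int:
    """Total cycles for n instructions with or without stage overlap."""
    k = len(STAGES)
    return num_instructions + k - 1 if pipelined else num_instructions * k


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(5), start=1):
        print(f"cycle {cycle}: " + "  ".join(f"{s}={i}" for s, i in zip(STAGES, row)))
    print("pipelined:", cycles(5, True), "cycles; unpipelined:", cycles(5, False), "cycles")
```

With five instructions the pipeline finishes in 5 + 4 - 1 = 8 cycles instead of 5 x 4 = 20; a real core adds stall cycles whenever data or branch hazards occur.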



Figure 2: instruction format
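The 17-bit layout (5 + 3 x 4 bits) can be packed and unpacked with shifts and masks. The sketch below assumes the opcode occupies the most significant bits followed by the three operands in order; the exact field order in figure 2 may differ, and `encode`/`decode` are hypothetical helper names.

```python
# Sketch: pack/unpack the 17-bit register-type instruction (5-bit opcode + three 4-bit operands).
# ASSUMPTION: fields are ordered opcode[16:12], operand1[11:8], operand2[7:4], operand3[3:0];
# the actual order is defined by the paper's figure 2.
OPCODE_BITS = 5
OPERAND_BITS = 4
INSTR_BITS = OPCODE_BITS + 3 * OPERAND_BITS  # 5 + 3 * 4 = 17
OPERAND_MASK = (1 << OPERAND_BITS) - 1       # 0b1111
OPCODE_MASK = (1 << OPCODE_BITS) - 1         # 0b11111


def encode(opcode: int, op1: int, op2: int, op3: int) -> int:
    if not 0 <= opcode <= OPCODE_MASK:
        raise ValueError("opcode must fit in 5 bits")
    if any(not 0 <= op <= OPERAND_MASK for op in (op1, op2, op3)):
        raise ValueError("each operand must fit in 4 bits")
    return (opcode << 12) | (op1 << 8) | (op2 << 4) | op3


def decode(word: int) -> tuple[int, int, int, int]:
    return ((word >> 12) & OPCODE_MASK, (word >> 8) & OPERAND_MASK,
            (word >> 4) & OPERAND_MASK, word & OPERAND_MASK)


if __name__ == "__main__":
    word = encode(3, 2, 5, 9)  # hypothetical opcode 3 with operands 2, 5, 9
    print(f"{word:0{INSTR_BITS}b}", decode(word))
```

A 5-bit opcode provides 32 codes, enough for the 23 instructions; if the operand fields are register indices, 4 bits can select any of 16 registers.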

Program memory is implemented as a ROM containing the 23 instructions, which the CPU accesses during operation. Data memory is implemented as a RAM. The memory model pre-loads its initial data from the register file and dumps its contents to a file when the simulation finishes. For a load instruction, the data memory is read at the effective address computed in the previous cycle. For a store instruction, the data from the second register read from the register file is written to memory at the effective address.
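The Harvard organization and the load/store rule can be sketched behaviorally: instructions come only from a read-only program memory, data lives in a separate RAM, and only LOAD and STORE touch that RAM. This is a hypothetical Python model, not the author's design; the mnemonics, the 16-register/16-word sizes, and the effective address rule (base register plus offset) are all assumptions.

```python
# Hypothetical Harvard-style load/store core model (not the paper's Verilog).
# ASSUMPTIONS: 16 registers holding 4-bit data, a 16-word data RAM, and
# effective address = R[base] + offset (mod 16). Mnemonics are illustrative.
from dataclasses import dataclass, field

DATA_MASK = 0xF  # 4-bit data width
ADDR_MASK = 0xF  # 16-word data memory


@dataclass
class HarvardCore:
    program: tuple  # program memory (ROM): read-only, separate from the data RAM
    regs: list = field(default_factory=lambda: [0] * 16)
    data_ram: list = field(default_factory=lambda: [0] * 16)

    def run(self) -> None:
        for op, a, b, c in self.program:  # instructions are fetched only from the ROM
            if op == "LOAD":  # LOAD rd=a, base=b, offset=c
                ea = (self.regs[b] + c) & ADDR_MASK
                self.regs[a] = self.data_ram[ea]
            elif op == "STORE":  # STORE base=a, src=b, offset=c; data is the second register read
                ea = (self.regs[a] + c) & ADDR_MASK
                self.data_ram[ea] = self.regs[b]
            elif op == "ADD":  # ADD rd=a, rs1=b, rs2=c: registers only, no memory access
                self.regs[a] = (self.regs[b] + self.regs[c]) & DATA_MASK
            else:
                raise ValueError(f"unsupported instruction: {op}")


if __name__ == "__main__":
    core = HarvardCore(
        program=(("LOAD", 1, 0, 0), ("LOAD", 2, 0, 1), ("ADD", 3, 1, 2), ("STORE", 0, 3, 2)),
        data_ram=[9, 8] + [0] * 14,
    )
    core.run()
    print(core.regs[3], core.data_ram[:3])
```

Here 9 + 8 = 17 wraps to 1 in 4-bit arithmetic, so the program leaves 1 at data address 2; the ADD never touches memory, which is the load-store discipline described above.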

## III. DUAL AND QUAD CORE ARCHITECTURE

A multi-core processor is a single computing component with two or more independent processing units, called cores, that read and execute program instructions. The instructions are ordinary CPU instructions, but the processor can run multiple instructions on separate cores at the same time, increasing overall speed for programs amenable to parallel computing. In this architecture, load and store instructions move data between each core's internal registers and memory.

**i. Dual Core:** A dual core CPU has two distinct processors that work simultaneously on the same integrated circuit. Each core is as efficient as a single processor, but together they can perform operations up to twice as quickly when the operating system can run the tasks in parallel. The block diagram of the dual core processor, with two cores working in parallel and designed using the RISC architecture, is shown in figure 3.








Figure 3: Dual core RISC architecture
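The "up to twice as quickly" figure can be made concrete with a cycle count. The sketch below is a hypothetical Python model, not the author's design: it assumes each core is an ideal four-stage pipeline, the instruction streams are fully independent (no shared-memory contention), and the stream lengths are made up for illustration.

```python
# Cycle-count sketch: independent instruction streams on one core vs. one stream per core.
# ASSUMPTIONS: every core is an ideal 4-stage pipeline (one instruction per cycle after the
# pipeline fills), streams are fully independent, and all cores share one clock.
PIPELINE_DEPTH = 4


def single_core_cycles(stream_lengths: list[int]) -> int:
    # One core runs all streams back to back through a single pipeline.
    return sum(stream_lengths) + PIPELINE_DEPTH - 1


def multi_core_cycles(stream_lengths: list[int]) -> int:
    # One stream per core, all running concurrently: the longest stream sets the time.
    return max(stream_lengths) + PIPELINE_DEPTH - 1


if __name__ == "__main__":
    for streams in ([100, 100], [150, 50]):  # hypothetical program sizes
        one, two = single_core_cycles(streams), multi_core_cycles(streams)
        print(streams, one, two, f"speedup {one / two:.2f}")
```

Two balanced 100-instruction streams take 203 cycles on one core and 103 on two (about 1.97x); an unbalanced 150/50 split gives only about 1.33x, because the longer stream dominates.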

**ii. Quad Core:** A quad core CPU has four distinct processors that work simultaneously on the same integrated circuit and can perform operations up to four times as quickly when the operating system can run the tasks in parallel. The block diagram of the quad core processor, with four cores working in parallel and designed using the RISC architecture, is shown in figure 4.



Figure 4: Quad core RISC architecture
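The "up to four times" bound is reached only when the entire workload runs in parallel. Amdahl's law, a standard model rather than something measured in this paper, shows how a serial fraction limits the gain; the parallel fractions used below are hypothetical.

```python
# Amdahl's law: speedup on n cores when a fraction p of the work can run in parallel.
# Standard model, not a measurement from this paper; the p values are hypothetical.
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    if not 0.0 <= parallel_fraction <= 1.0 or cores < 1:
        raise ValueError("need 0 <= parallel_fraction <= 1 and cores >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


if __name__ == "__main__":
    for p in (1.0, 0.9, 0.5):
        print(f"p={p}: dual {amdahl_speedup(p, 2):.2f}x, quad {amdahl_speedup(p, 4):.2f}x")
```

With 90% of the work parallel, two cores give about 1.82x and four cores about 3.08x rather than 4x; with 50% parallel, four cores give only 1.6x.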

More cores also increase the power consumption of the processor: when the processor is switched on, power is supplied to all cores, not just one at a time, so a quad core processor draws more power than a dual core one. This paper uses a new method for the quad core design that reduces this power consumption effectively. A quad core computer has more computational power than a dual core computer, which benefits highly intensive tasks, and multitasking across applications and programs is smoother because the workload is distributed across multiple cores.

## IV. VIRTEX-6 FPGA

The Virtex-6 family is built on a 40 nm process technology and targets compute-intensive electronic systems. Virtex-6 FPGAs deliver higher-performance clocking and advanced power management while lowering cost, risk, and power consumption. Virtex FPGAs are based on Configurable Logic Blocks (CLBs), each equivalent to many ASIC gates. Each CLB is composed of multiple slices, whose construction differs between Virtex families. Virtex FPGAs include I/O Blocks that control the input/output pins and support a variety of signalling standards. All pins default to input mode (high impedance). I/O pins are grouped into I/O banks, and each bank can support a different voltage.




## V. RESULT

This section presents the dual core and quad core processor designs together with their power analysis. Each core is designed using the RISC architecture, and the designs are implemented on a Xilinx Virtex-6 FPGA.

**i. Dual Core Processor:** Figure 5 shows the schematic diagram of the dual core RISC processor, which has two cores on a single chip.



Figure 5: schematic diagram of dual core

The power analysis summary of the dual core RISC processor is shown in figure 6.

| Device      |           |
|-------------|-----------|
| Family      | Virtex6   |
| Part        | xc6vlx75t |
| Package     | ff484     |
| Grade       | C-Grade   |
| Process     | Typical   |
| Speed Grade | -3        |

| Environment           |                  |
|-----------------------|------------------|
| Ambient Temp (C)      | 50.0             |
| Use custom TJA?       | No               |
| Custom TJA (C/W)      | NA               |
| Airflow (LFM)         | 250              |
| Heat Sink             | Medium Profile   |
| Custom TSA (C/W)      | NA               |
| Board Selection       | Medium (10"x10") |
| # of Board Layers     | 8 to 11          |
| Custom TJB (C/W)      | NA               |
| Board Temperature (C) | NA               |

| On-Chip | Power (W) | Used | Available | Utilization (%) |
|---------|-----------|------|-----------|-----------------|
| Clocks  | 0.016     | 12   |           |                 |
| Logic   | 0.001     | 401  | 46560     | 1               |
| Signals | 0.002     | 616  |           |                 |
| BRAMs   | 0.004     |      |           |                 |
| DSPs    | 0.000     | 2    | 288       | 1               |
| IOs     | 0.003     | 68   | 240       | 28              |
| Leakage | 1.296     |      |           |                 |
| Total   | 1.323     |      |           |                 |

| Thermal Properties  | Value |
|---------------------|-------|
| Effective TJA (C/W) | 2.9   |
| Max Ambient (C)     | 81.1  |
| Junction Temp (C)   | 53.9  |

| Supply Source | Voltage (V) | Total Current (A) | Dynamic Current (A) | Quiescent Current (A) |
|---------------|-------------|-------------------|---------------------|-----------------------|
| Vccint        | 1.000       | 0.649             | 0.027               | 0.622                 |
| Vccaux        | 2.500       | 0.045             | 0.000               | 0.045                 |
| Vcco25        | 2.500       | 0.001             | 0.000               | 0.001                 |
| MGTAVcc       | 1.000       | 0.303             | 0.000               | 0.303                 |
| MGTAVtt       | 1.200       | 0.213             | 0.000               | 0.213                 |

| Supply Power | Total (W) | Dynamic (W) | Quiescent (W) |
|--------------|-----------|-------------|---------------|
| Summary      | 1.323     | 0.027       | 1.296         |

Figure 6: power analysis table for dual core
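The supply summary in figure 6 is internally consistent: the total power equals the sum over supply rails of voltage times total current. The snippet below repeats that arithmetic using only the values in the table; the dictionary and function names are illustrative.

```python
# Cross-check of the dual core supply summary (figure 6):
# total supply power = sum over rails of (voltage x total current).
DUAL_CORE_SUPPLIES = {  # rail: (voltage in V, total current in A), copied from figure 6
    "Vccint": (1.000, 0.649),
    "Vccaux": (2.500, 0.045),
    "Vcco25": (2.500, 0.001),
    "MGTAVcc": (1.000, 0.303),
    "MGTAVtt": (1.200, 0.213),
}


def supply_power(supplies: dict[str, tuple[float, float]]) -> float:
    return sum(volts * amps for volts, amps in supplies.values())


if __name__ == "__main__":
    print(f"{supply_power(DUAL_CORE_SUPPLIES):.4f} W")
```

The sum is 1.3226 W, which rounds to the reported 1.323 W; the same calculation on the quad core currents in figure 9 gives 1.3406 W, matching its reported 1.341 W.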

The device utilization summary (estimated values) for the dual core processor is shown in figure 7.

| Logic Utilization (estimated values) | Used | Available | Utilization |
|--------------------------------------|------|-----------|-------------|
| Number of Slice Registers            | 128  | 93120     | 0%          |
| Number of Slice LUTs                 | 431  | 46560     | 0%          |
| Number of fully used LUT-FF pairs    | 100  | 459       | 21%         |
| Number of bonded IOBs                | 68   | 240       | 28%         |
| Number of Block RAM/FIFO             | 1    | 156       | 0%          |
| Number of BUFG/BUFGCTRLs             | 4    | 32        | 12%         |
| Number of DSP48E1s                   | 2    | 288       | 0%          |

Figure 7: Device utilization summary for dual core






**ii. Quad Core Processor:** Figure 8 shows the schematic diagram of the quad core RISC processor, which has four cores on a single chip.



Figure 8: schematic diagram of quad core

The power analysis summary of the quad core RISC processor is shown in figure 9.

| Device      |           |
|-------------|-----------|
| Family      | Virtex6   |
| Part        | xc6vlx75t |
| Package     | ff484     |
| Grade       | C-Grade   |
| Process     | Typical   |
| Speed Grade | -2        |

| Environment           |                  |
|-----------------------|------------------|
| Ambient Temp (C)      | 50.0             |
| Use custom TJA?       | No               |
| Custom TJA (C/W)      | NA               |
| Airflow (LFM)         | 250              |
| Heat Sink             | Medium Profile   |
| Custom TSA (C/W)      | NA               |
| Board Selection       | Medium (10"x10") |
| # of Board Layers     | 8 to 11          |
| Custom TJB (C/W)      | NA               |
| Board Temperature (C) | NA               |

| On-Chip | Power (W) | Used | Available | Utilization (%) |
|---------|-----------|------|-----------|-----------------|
| Clocks  | 0.028     | 21   |           |                 |
| Logic   | 0.002     | 696  | 46560     | 1               |
| Signals | 0.004     | 1069 |           |                 |
| BRAMs   | 0.006     |      |           |                 |
| DSPs    | 0.001     | 4    | 288       | 1               |
| IOs     | 0.004     | 125  | 240       | 52              |
| Leakage | 1.296     |      |           |                 |
| Total   | 1.341     |      |           |                 |

| Thermal Properties  | Value |
|---------------------|-------|
| Effective TJA (C/W) | 2.9   |
| Max Ambient (C)     | 81.0  |
| Junction Temp (C)   | 54.0  |

| Supply Source | Voltage (V) | Total Current (A) | Dynamic Current (A) | Quiescent Current (A) |
|---------------|-------------|-------------------|---------------------|-----------------------|
| Vccint        | 1.000       | 0.667             | 0.045               | 0.622                 |
| Vccaux        | 2.500       | 0.045             | 0.000               | 0.045                 |
| Vcco25        | 2.500       | 0.001             | 0.000               | 0.001                 |
| MGTAVcc       | 1.000       | 0.303             | 0.000               | 0.303                 |
| MGTAVtt       | 1.200       | 0.213             | 0.000               | 0.213                 |

| Supply Power | Total (W) | Dynamic (W) | Quiescent (W) |
|--------------|-----------|-------------|---------------|
| Summary      | 1.341     | 0.045       | 1.296         |


Figure 9: power analysis table for quad core
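Both power tables report the same quiescent (leakage) power of 1.296 W; only the dynamic part grows with the number of cores. The short calculation below, using only numbers from figures 6 and 9, makes this explicit; `summarize` is an illustrative helper name.

```python
# Split total on-chip power into dynamic and quiescent parts for both designs.
# Values are copied from the power analysis tables (figures 6 and 9).
REPORTS = {"dual": (0.027, 1.296), "quad": (0.045, 1.296)}  # (dynamic W, quiescent W)


def summarize(dynamic_w: float, quiescent_w: float) -> tuple[float, float]:
    """Return (total power in W, dynamic share of the total in percent)."""
    total = dynamic_w + quiescent_w
    return total, 100.0 * dynamic_w / total


if __name__ == "__main__":
    for name, (dyn, quiescent) in REPORTS.items():
        total, share = summarize(dyn, quiescent)
        print(f"{name}: total {total:.3f} W, dynamic share {share:.1f}%")
    dual_total, _ = summarize(*REPORTS["dual"])
    quad_total, _ = summarize(*REPORTS["quad"])
    print(f"dynamic ratio quad/dual: {REPORTS['quad'][0] / REPORTS['dual'][0]:.2f}")
    print(f"total increase: {100.0 * (quad_total - dual_total) / dual_total:.1f}%")
```

Dynamic power rises about 1.67x from dual to quad core, but because leakage dominates (dynamic power is only 2.0% and 3.4% of the totals), total on-chip power rises by only about 1.4%.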




The device utilization summary (estimated values) for the quad core processor is shown in figure 10.

| Logic Utilization (estimated values) | Used | Available | Utilization |
|--------------------------------------|------|-----------|-------------|
| Number of Slice Registers            | 242  | 93120     | 0%          |
| Number of Slice LUTs                 | 750  | 46560     | 1%          |
| Number of fully used LUT-FF pairs    | 182  | 810       | 22%         |
| Number of bonded IOBs                | 125  | 240       | 52%         |
| Number of Block RAM/FIFO             | 2    | 156       | 1%          |
| Number of BUFG/BUFGCTRLs             | 5    | 32        | 15%         |
| Number of DSP48E1s                   | 4    | 288       | 1%          |

Figure 10: Device utilization summary for quad core
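The Utilization column in figures 7 and 10 can be recomputed from the Used and Available counts. The sketch below assumes the synthesis tool truncates percentages toward zero (an assumption, though it reproduces every reported row) and also prints how much each resource grows from dual to quad core.

```python
# Recompute the Utilization column of figures 7 and 10 from the Used/Available counts.
# ASSUMPTION: the tool truncates percentages toward zero; this reproduces every row below.
DUAL = {  # resource: (used, available, reported %)
    "Slice Registers": (128, 93120, 0),
    "Slice LUTs": (431, 46560, 0),
    "Fully used LUT-FF pairs": (100, 459, 21),
    "Bonded IOBs": (68, 240, 28),
    "Block RAM/FIFO": (1, 156, 0),
    "BUFG/BUFGCTRLs": (4, 32, 12),
    "DSP48E1s": (2, 288, 0),
}
QUAD = {
    "Slice Registers": (242, 93120, 0),
    "Slice LUTs": (750, 46560, 1),
    "Fully used LUT-FF pairs": (182, 810, 22),
    "Bonded IOBs": (125, 240, 52),
    "Block RAM/FIFO": (2, 156, 1),
    "BUFG/BUFGCTRLs": (5, 32, 15),
    "DSP48E1s": (4, 288, 1),
}


def utilization_pct(used: int, available: int) -> int:
    return (100 * used) // available  # integer division truncates the percentage


if __name__ == "__main__":
    for name, (used, available, reported) in QUAD.items():
        dual_used = DUAL[name][0]
        print(f"{name}: {utilization_pct(used, available)}% (reported {reported}%), "
              f"quad/dual = {used / dual_used:.2f}")
```

Block RAM and DSP48E1 usage exactly double from dual to quad core, while slice registers grow about 1.89x and slice LUTs about 1.74x.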

The dual core and quad core processors designed in Verilog HDL are implemented on the Virtex-6 FPGA. Table 1 lists important parameter values for both processors.

|                                                       | Dual core                                                      | Quad core                                                      |
|-------------------------------------------------------|----------------------------------------------------------------|----------------------------------------------------------------|
| Device                                                | xc6vlx75t-3-ff484                                              | xc6vlx75t-3-ff484                                              |
| Speed grade                                           | -3                                                             | -3                                                             |
| Minimum period                                        | 3.713 ns (maximum frequency: 269.347 MHz)                      | 3.713 ns (maximum frequency: 269.347 MHz)                      |
| Minimum input arrival time before clock               | 1.145 ns                                                       | 1.195 ns                                                       |
| Maximum output required time after clock              | 0.572 ns                                                       | 0.572 ns                                                       |
| Maximum combinational path delay                      | -                                                              | -                                                              |
| Clock period                                          | 3.713 ns (frequency: 269.347 MHz)                              | 3.713 ns (frequency: 269.347 MHz)                              |
| Total number of paths/destination ports               | 1362/276                                                       | 7516/544                                                       |
| Autotimespec constraint for clock net clk_BUFGP       | Worst case slack: 0.092 ns<br>Best case achievable: 4.032 ns   | Worst case slack: 0.090 ns<br>Best case achievable: 4.932 ns   |
| Number of registers                                   | 132                                                            | 264                                                            |
| Vccint                                                | 0.00276 W                                                      | 0.00478 W                                                      |
| Vccaux                                                | 0.00001 W                                                      | 0.00001 W                                                      |
| Vcco on-chip thermal                                  | 0.00013 W                                                      | 0.00025 W                                                      |
| Optimization goal                                     | Speed                                                          | Speed                                                          |
| Total power for 100 MHz clock frequency (W)           | 1.323                                                          | 1.348                                                          |

Table 1: parameter values for dual core and quad core
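Table 1 states the minimum period and maximum frequency together; they are related by f_max = 1 / T_min. The sketch below checks the two reported numbers against each other. Computing from the rounded period 3.713 ns gives about 269.324 MHz rather than 269.347 MHz, which is consistent with the tool deriving the frequency from an unrounded period of about 3.7127 ns. Both designs report the same minimum period, so doubling the number of cores did not change the critical-path timing.

```python
# Convert between minimum clock period and maximum clock frequency: f_max = 1 / T_min.
def max_frequency_mhz(period_ns: float) -> float:
    return 1000.0 / period_ns  # 1 / (period in ns) is in GHz; x1000 gives MHz


def min_period_ns(frequency_mhz: float) -> float:
    return 1000.0 / frequency_mhz  # inverse conversion, MHz -> ns


if __name__ == "__main__":
    print(f"{max_frequency_mhz(3.713):.3f} MHz")  # from the rounded period in Table 1
    print(f"{min_period_ns(269.347):.3f} ns")     # from the reported maximum frequency
```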

## VI. CONCLUSION

This paper presents the design of dual core and quad core processors based on the RISC architecture. The cores were written in Verilog and implemented on a Virtex-6 FPGA using Xilinx ISE 14.4. In the power analysis, going from two to four cores raised the estimated total power only from 1.323 W to 1.341 W, which helps keep power consumption and cost low while providing high efficiency and speed. In future, the design can be extended to more cores, implemented in other FPGA technologies, and given additional RISC instructions.







## BIOGRAPHY



**Mr. Rakesh M.R** received his B.E degree in Electronics and Communication from KVG College of Engineering Sullia and received his M. Tech degree in Electronics from Canara College of Engineering Bantwal. Currently he is working in AJIET Mangaluru. His areas of interest are VLSI and Image Processing.