# Energy-Efficient Dynamic Circuit for High Fan-In OR Gates

Mohammad Asyaei<sup>1\*</sup>

<sup>1</sup> School of Engineering, Damghan University, Damghan, Iran

Abstract-- Dynamic circuits are a promising alternative to static circuits because of their low power consumption and high performance. However, they have limitations, particularly in terms of robustness. This article presents a new dynamic circuit that reduces the power consumption and delay of high fan-in OR gates without a significant loss of robustness. In the new circuit, the pull-down network (PDN) is split to increase speed. Furthermore, a reference circuit that replicates the leakage current of the PDN is used to decrease the conflict current between the PDN and the keeper transistors, which reduces both the power and the delay of the presented circuit. In addition, the body effect decreases the sub-threshold leakage current, and hence the leakage power, of the PDN. Simulations of 64-input OR gates in a 90nm CMOS technology show 45% and 53% reductions in delay and power consumption, respectively, compared with the conventional circuit at the same level of robustness. Moreover, a tag comparator designed with the presented circuit shows a 2.65-times improvement in the figure of merit compared with the conventional design.

*Index Terms*- Dynamic circuit, leakage current, low-power design, robustness.

## I. INTRODUCTION

**D**YNAMIC circuits are preferred in the design of high-performance modules in modern microprocessors because they are faster than static circuits. However, dynamic circuits are less immune to noise and consume more energy [1].

The conventional dynamic CMOS circuit is illustrated in Fig. 1. It uses a keeper transistor to prevent noise from discharging the dynamic node. The keeper is typically upsized to enhance robustness. However, upsizing increases the conflict between the keeper and the pull-down network, which degrades performance. This conflict occurs when the keeper is ON and at least one transistor in the pull-down network turns ON. Performance therefore depends on the relative conductance of the keeper and the pull-down network, and the desired delay can be achieved by adjusting the keeper size. Accordingly, the following keeper ratio is defined [2]:

$$K = \frac{\mu_p \left(\frac{W}{L}\right)_{\text{keeper}}}{\mu_n \left(\frac{W}{L}\right)_{\text{PDN}}} \tag{1}$$

where W and L are the width and length of the transistors; for the pull-down network, they are the width and length of each transistor. In addition, $\mu_p$ and $\mu_n$ are the mobilities of holes and electrons, respectively. One significant limitation of the conventional dynamic circuit is that it cannot efficiently handle the large number of parallel branches found in very high fan-in gates, because these branches add parasitic capacitance to the dynamic node. This parasitic capacitance is mostly composed of the drain capacitance of the parallel transistors [3].
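As a quick illustration of ratio (1), the sketch below evaluates K for one set of hypothetical device values. The mobility ratio and the W/L values are assumptions chosen only to show the arithmetic; they are not the sizes used in this work.

```python
def keeper_ratio(mu_p: float, wl_keeper: float, mu_n: float, wl_pdn: float) -> float:
    """Keeper ratio K of (1): mu_p*(W/L)_keeper / (mu_n*(W/L)_PDN).

    wl_pdn is the W/L of a single PDN transistor, as stated in the text.
    """
    return (mu_p * wl_keeper) / (mu_n * wl_pdn)


# Hypothetical values: electron mobility twice the hole mobility (relative units),
# keeper W/L = 1, PDN transistor W/L = 2.
K = keeper_ratio(mu_p=1.0, wl_keeper=1.0, mu_n=2.0, wl_pdn=2.0)
print(f"K = {K:.2f}")  # prints "K = 0.25"
```

Because K is linear in the keeper W/L, doubling the keeper width doubles K: the robustness gained comes with a proportionally stronger keeper for the PDN to overpower.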



Fig. 1. Conventional dynamic CMOS circuit

To achieve the desired robustness in high fan-in circuits, strong keepers are necessary. Therefore, as more branches are added, power consumption and delay increase. As a result, high fan-in gates are often constructed from cascaded low fan-in gates [3].

Several dynamic circuits have been presented to decrease the delay and power without robustness degradation [4-9].

This article proposes a method to reduce the propagation delay and power consumption of high fan-in dynamic gates. To this end, the pull-down network of a high fan-in gate is split into two smaller networks.

Furthermore, the leakage current of the pull-down network is replicated and used to decrease the conflict. To demonstrate the efficacy of the presented circuit, high fan-in OR gates and tag comparators are implemented with it in a 90nm CMOS technology.

The rest of the article is organized as follows. Section II explains the presented circuit, and Section III presents the simulation results and comparisons. Section IV describes a tag comparator designed with the presented circuit. Finally, conclusions are drawn in Section V.

## II. PRESENTED DYNAMIC CIRCUIT

Figures 2 and 3 depict the presented circuit and its waveforms, respectively. The operation of the circuit in Fig. 2 is described below in two phases.



Fig. 2. The presented dynamic circuit

During the precharge phase, when the clock is low (CK='0') and its complement is high (CKB='1'), the precharge transistor (MPr), the discharge transistors (MD1 and MD2), and the keeper transistors (MK1 and MK2) are ON. Therefore, the dynamic node (D) is charged to V<sub>DD</sub> and the output node is discharged to 0 V. In this phase, transistors M1 and M2 are OFF. Since the inputs are driven by preceding stages built with the same dynamic logic, they fall to 0 V shortly after the clock, as shown in Fig. 3. Consequently, the transistors in PDN1 and PDN2 turn OFF, and the voltages of nodes A1 and A2 settle at 0 V.

During the evaluation phase, when the clock rises (CK='1' and CKB='0'), the precharge transistor (MPr) and the discharge transistors (MD1 and MD2) turn OFF.

The remaining transistors are ON or OFF depending on the input vector. Hence, two situations can arise in this phase: in the first, all inputs remain low; in the second, at least one input rises to the high level.

In the first situation, no transistor in the pull-down networks turns ON, so only leakage current flows. The voltage of the dynamic node is maintained by the keeper transistors MK1 and MK2, which therefore provide the required robustness.

In the second situation, at least one transistor turns ON in one of the pull-down networks (for example, PDN1 in Fig. 2), and its current increases. This pulls node A1 up toward $V_{DD} - V_{tn}$, where $V_{tn}$ is the threshold voltage of the NMOS transistors in PDN1. Once the voltage of node A1 exceeds the threshold voltage of transistor M1, M1 turns ON and quickly discharges node D to ground. Hence, the output node (Out) is charged to $V_{DD}$.
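The two phases described above can be summarized with an idealized logic-level model. This is a hypothetical abstraction for illustration only: node values are 0/1 flags (A1 = 1 stands for "near $V_{DD} - V_{tn}$"), the inputs are assumed to be split evenly between PDN1 and PDN2, and leakage, timing, and noise are ignored.

```python
from typing import Dict, Sequence


def split_or_gate(inputs: Sequence[int], ck: int) -> Dict[str, int]:
    """Idealized model of the split dynamic OR gate for one clock phase.

    inputs: 0/1 input values; ck: 0 for precharge, 1 for evaluation.
    """
    if ck == 0:
        # Precharge: MPr charges D to VDD, A1/A2 are reset to 0 V, Out is low.
        return {"A1": 0, "A2": 0, "D": 1, "Out": 0}
    half = len(inputs) // 2
    # Evaluation: any high input in a half-PDN pulls its intermediate node up.
    a1 = int(any(inputs[:half]))   # PDN1 -> node A1
    a2 = int(any(inputs[half:]))   # PDN2 -> node A2
    # A raised A1 or A2 turns on M1 or M2, which discharges D; otherwise the keepers hold D high.
    d = 0 if (a1 or a2) else 1
    return {"A1": a1, "A2": a2, "D": d, "Out": 1 - d}


# 64-input OR gate, evaluation phase, only input 40 (in PDN2) high.
vec = [0] * 64
vec[40] = 1
print(split_or_gate(vec, ck=1))  # {'A1': 0, 'A2': 1, 'D': 0, 'Out': 1}
```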

Note that limiting the maximum voltage of nodes A1 and A2 to $V_{DD} - V_{tn}$ reduces power consumption but increases delay. To compensate for this delay, the PDN is split into smaller PDNs. In addition, choosing proper sizes for transistors M1 and M2 mitigates the speed loss. Reducing the maximum voltage of nodes A1 and A2 does not affect robustness, because input noise generally has a lower amplitude and a shorter duration than a valid input signal, so noise alone does not drive nodes A1 and A2 to their maximum voltage.
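A first-order estimate helps explain why the reduced swing and the split PDN save power. Charging a node of capacitance $C$ from a supply at $V_{DD}$ by a swing $V_{sw}$ draws $E = C\,V_{DD}\,V_{sw}$ from the supply. The sketch compares a node loaded by all 64 branch capacitances with a full swing against a node loaded by 32 branches with a swing of $V_{DD} - V_{tn}$. The per-branch capacitance and $V_{tn}$ are hypothetical, and the body effect and all other nodes are ignored.

```python
def switching_energy(c_node: float, vdd: float, v_swing: float) -> float:
    """Energy drawn from a supply at vdd to charge c_node by v_swing: E = C * VDD * V_swing."""
    return c_node * vdd * v_swing


VDD = 1.0        # V, supply voltage used in Section III
C_BRANCH = 0.1e-15  # F, capacitance contributed per branch (hypothetical)
V_TN = 0.3       # V, NMOS threshold voltage (hypothetical)
N = 64           # inputs of the OR gate

# One node loaded by all N branches, full VDD swing.
e_full = switching_energy(N * C_BRANCH, VDD, VDD)
# One half-PDN node (N/2 branches) whose swing is limited to VDD - Vtn.
e_split = switching_energy((N // 2) * C_BRANCH, VDD, VDD - V_TN)
print(f"full: {e_full * 1e15:.2f} fJ, split: {e_split * 1e15:.2f} fJ, ratio: {e_split / e_full:.2f}")
```

In this simplified model, halving the load and cutting the swing by $V_{tn}$ act multiplicatively; the body effect, which raises $V_{tn}$ on these nodes, would reduce the swing slightly further.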



Fig. 3. Transient waveforms of the presented circuit for a 64-input OR gate

As illustrated in Fig. 2, the current of keeper transistors MK1 and MK2 is set by the current of the reference circuit, which is copied by an analog current mirror. This current is proportional to the leakage current flowing in the PDNs. Hence, unlike in conventional designs, the keeper current in the presented circuit is not fixed; it depends on the leakage of the parallel branches in the PDNs. Accordingly, the conflict between the keepers and the PDNs is reduced without significant robustness degradation, and both power and delay are reduced. Additionally, the replica current mirror helps decrease the impact of process variations [4].

The reference circuit in Fig. 2 consists of transistors MPr and MNr, sized to replicate the leakage of the PDNs. Since the reference circuit is shared among all gates with the same structure, its area and power overhead per gate is negligible.
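To illustrate why a leakage-tracking keeper reduces conflict, the sketch compares a fixed keeper, which must be sized for the worst-case leakage, with a keeper whose current is mirrored from a reference that replicates the actual PDN leakage. All currents, the branch count, and the safety margin are hypothetical, and the mirror is treated as ideal.

```python
def replica_keeper_current(i_leak_branch: float, n_branches: int, mirror_ratio: float) -> float:
    """Keeper current when an ideal mirror copies a reference that replicates PDN leakage.

    Assumes the reference conducts n_branches * i_leak_branch and the mirror scales it.
    """
    return mirror_ratio * n_branches * i_leak_branch


# Hypothetical per-branch OFF leakage (A) under three operating conditions.
leak_per_branch = {"fast/hot": 50e-9, "typical": 10e-9, "slow/cold": 1e-9}
N_BRANCHES = 32   # branches per half-PDN, assuming 64 inputs split evenly (hypothetical)
MARGIN = 2.0      # keeper supplies twice the leakage it compensates (hypothetical)

# A fixed keeper must cover the worst-case leakage under every condition.
i_fixed = MARGIN * N_BRANCHES * max(leak_per_branch.values())
# A replica keeper follows the leakage actually present.
replica = {c: replica_keeper_current(i, N_BRANCHES, MARGIN) for c, i in leak_per_branch.items()}

for corner, i_rep in replica.items():
    print(f"{corner:>9}: fixed {i_fixed * 1e6:.2f} uA, replica {i_rep * 1e6:.3f} uA")
```

Since the PDN must overpower the keeper during evaluation, in this model the keeper current it fights under typical conditions is five times smaller with the replica keeper, while the leakage-to-keeper ratio (and hence the margin) stays the same under every condition.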

In the presented dynamic circuit, power consumption is decreased through the following concepts: 1) the conflict is minimized by employing a reference circuit to monitor the leakage current of the pull-down network, and 2) the body effect reduces the leakage current.

Note that the leakage current is mostly due to the sub-threshold current, given by [10]:

$$I_{\text{sub-th}} = I_0 \left( 1 - \exp\left(\frac{-V_{DS}}{V_t}\right) \right) \exp\left( \frac{V_{GS} - V_{tn} + \eta V_{DS}}{n V_t} \right) \tag{2}$$

where

$$I_0 = \mu_0 C_{ox} \frac{W}{L} (n-1) V_t^2 \tag{3}$$

$V_{GS}$, $V_{DS}$, and $V_{tn}$ are the gate-source, drain-source, and threshold voltages of the NMOS transistor, respectively, $V_t$ is the thermal voltage, $\eta$ is the DIBL coefficient, $n$ and $C_{ox}$ are the sub-threshold swing coefficient and the gate-oxide capacitance per unit area, respectively, and $\mu_0$ is the zero-bias mobility.

In the presented circuit, when all inputs are low and the transistors in the PDNs are OFF, only sub-threshold current flows through the PDNs. As a result, nodes A1 and A2 are charged slightly by the leakage current. According to (2), this reduces the leakage current of the presented circuit for two reasons. First, because the source voltage of the PDN transistors is higher than their bulk voltage (i.e., $V_{SB} > 0$), their threshold voltage increases due to the body effect:

$$V_{tn} = V_{tn0} + \gamma \left( \sqrt{\phi_s + V_{SB}} - \sqrt{\phi_s} \right) \tag{4}$$

where $V_{tn0}$ is the zero-bias threshold voltage, $\phi_s$ is the surface potential, and $\gamma$ is the body-effect coefficient [11]. Second, since the gates of the PDN transistors are at 0 V while their sources are slightly above 0 V, their gate-source voltage ($V_{GS}$) becomes negative, which further decreases the leakage current. The body effect and the replica of the PDN leakage current also enhance the robustness of the presented dynamic circuit.

Based on the above, the presented circuit offers the following advantages over previously reported designs.

Compared with the circuit proposed in [4], the presented circuit has lower power consumption and better noise immunity owing to its reduced voltage swing and the body effect.

Compared with the circuits proposed in [5, 8], the presented circuit uses fewer transistors and its keeper creates less conflict, which reduces power and delay.

The technique proposed in [6] turns off the keeper at the beginning of the evaluation phase to reduce the conflict, at the cost of robustness. The presented circuit avoids this problem and also uses fewer transistors.

The circuit proposed in [7] consumes more power than the presented circuit because it uses two inverters, and it includes no technique to reduce short-circuit power.

Unlike the circuit proposed in [9], the presented circuit does not need an additional supply voltage for proper operation and therefore consumes less power.

The main novelties and contributions of this work are as follows:

- The voltage swing of the intermediate nodes (A1 and A2) is lowered to reduce power consumption.
- The drain capacitance is reduced by splitting the parallel transistors into two groups, which lowers power consumption.
- A reference circuit that tracks the leakage current of the PDNs improves the performance of the presented circuit without loss of robustness.
- The body effect decreases the leakage current of the PDNs and increases noise immunity.
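Equations (2)-(4) can be combined to estimate how much a small positive voltage on node A1 or A2 reduces the leakage of an OFF PDN transistor. The sketch assumes the gate and bulk are at 0 V and the drain is at $V_{DD}$; every device parameter ($V_{tn0}$, $\gamma$, $\phi_s$, $n$, $\eta$, $\mu_0 C_{ox}$, $W/L$) is a hypothetical placeholder, not a value from the 90nm model used here.

```python
import math

K_OVER_Q = 8.617333262e-5  # V/K, Boltzmann constant divided by the electron charge

# Hypothetical NMOS parameters (placeholders, not extracted from a real model card).
VTN0, GAMMA, PHI_S = 0.30, 0.20, 0.80   # V, sqrt(V), V
N_SWING, ETA = 1.4, 0.08                 # sub-threshold swing coefficient, DIBL coefficient
MU0_COX, W_OVER_L = 300e-6, 2.0          # A/V^2, aspect ratio
VDD, TEMP_C = 1.0, 110.0                 # V, degrees Celsius (Section III conditions)


def threshold(v_sb: float) -> float:
    """Body-effect threshold voltage, eq. (4)."""
    return VTN0 + GAMMA * (math.sqrt(PHI_S + v_sb) - math.sqrt(PHI_S))


def i_sub(v_gs: float, v_ds: float, v_sb: float, v_t: float) -> float:
    """Sub-threshold current, eqs. (2) and (3)."""
    i0 = MU0_COX * W_OVER_L * (N_SWING - 1) * v_t ** 2
    return (i0 * (1 - math.exp(-v_ds / v_t))
            * math.exp((v_gs - threshold(v_sb) + ETA * v_ds) / (N_SWING * v_t)))


def off_leakage(v_node: float) -> float:
    """OFF-transistor leakage with gate/bulk at 0 V, drain at VDD, source at v_node (assumed)."""
    v_t = K_OVER_Q * (TEMP_C + 273.15)  # thermal voltage at TEMP_C
    return i_sub(v_gs=-v_node, v_ds=VDD - v_node, v_sb=v_node, v_t=v_t)


for v_a in (0.0, 0.02, 0.05):
    print(f"V_A = {v_a:.2f} V: I_leak = {off_leakage(v_a) * 1e9:.3f} nA")
```

In this model a raised source voltage lowers the leakage through three terms of (2): a higher $V_{tn}$ from (4), a negative $V_{GS}$, and, as a smaller side effect, a slightly reduced DIBL term because $V_{DS}$ drops.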

## III. SIMULATION RESULTS

The investigated circuits and the presented circuit technique were simulated using HSPICE [12] with a 90nm CMOS model, a supply voltage of 1V, and an operating temperature of 110°C. The supply voltage is chosen according to the technology model, and the operating temperature is set to the worst-case temperature of the technology to address the worst conditions. All designs are simulated as 64-input OR gates.

All transistors are initially minimum-sized and are then resized until each design reaches a noise floor of 0.3V. The PMOS-to-NMOS width ratio of the output inverters is fixed at 2:1 (Wp/Wn = 2).

The simulations in this paper use the framework presented in [13]. As shown in Fig. 4, in this framework each logic gate under test is driven by a nominal copy of itself so that its input signals reflect realistic conditions. In the evaluation phase, the delay between the input and output signals is measured for each gate. The worst case is tested by raising only one input to $V_{DD}$ while keeping the other inputs low, and the power consumption of the gate is also evaluated under this condition.
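A common way to extract the input-to-output delay from simulated waveforms is to subtract the 50%-of-$V_{DD}$ crossing time of the input from that of the output, which is what an HSPICE `.MEASURE TRAN ... TRIG ... TARG` statement computes. The measurement thresholds are not stated in this article, so the 50% points below are an assumption; the sketch works on any sampled waveform.

```python
from typing import Sequence


def crossing_time(t: Sequence[float], v: Sequence[float], level: float, rising: bool) -> float:
    """First time v crosses `level` in the given direction, by linear interpolation."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(t, v_in, v_out, vdd, in_rising=True, out_rising=True) -> float:
    """Delay between the 50% VDD crossings of input and output (assumed measurement points)."""
    half = 0.5 * vdd
    return crossing_time(t, v_out, half, out_rising) - crossing_time(t, v_in, half, in_rising)


# Synthetic ramps (ps, V): the single active input rises, then the OR output rises.
t = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
v_in = [0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
v_out = [0.0, 0.0, 0.0, 0.5, 1.0, 1.0]
tpd = propagation_delay(t, v_in, v_out, vdd=1.0)
print(f"t_p = {tpd:.1f} ps")  # prints "t_p = 20.0 ps"
```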

Additionally, the robustness of the gates is compared using the unity noise average (UNA) metric, defined as the input noise amplitude that produces an output noise whose average voltage equals that of the input noise [14].

$$\text{UNA} = \left\{ V_{in} : V_{noise,\,avg} = V_{output,\,avg} \right\} \tag{5}$$

The most challenging condition for robustness occurs when all inputs receive noise signals simultaneously [15].
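Since the UNA in (5) is the amplitude at which the average output noise equals the average input noise, it can be located by bisecting the noise amplitude, with one transient simulation per trial. In the sketch below the simulator is replaced by a hypothetical closed-form model (`toy_avg_out`) in which the output, once the gate fails, stays high for half the period; only the bisection logic is meant to carry over.

```python
import math
from typing import Callable


def una(avg_out: Callable[[float], float], avg_in: Callable[[float], float],
        lo: float, hi: float, tol: float = 1e-6) -> float:
    """Bisect for the noise amplitude at which avg_out(a) == avg_in(a), as in (5)."""
    def f(a: float) -> float:
        return avg_out(a) - avg_in(a)

    if not (f(lo) < 0 < f(hi)):
        raise ValueError("need avg_out < avg_in at lo and avg_out > avg_in at hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical stand-ins for simulation results (all values are placeholders).
VDD, DUTY, HOLD = 1.0, 0.1, 0.5  # supply (V), noise duty cycle, fraction of period output stays high


def toy_avg_in(a: float) -> float:
    return a * DUTY  # average of a rectangular noise pulse train of amplitude a


def toy_avg_out(a: float) -> float:
    return VDD * HOLD / (1 + math.exp(-(a - 0.3) / 0.02))  # smooth failure near 0.3 V


print(f"UNA ~ {una(toy_avg_out, toy_avg_in, 0.05, 1.0):.3f} V")
```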

Fig. 5 compares the power consumption of the studied circuits, with each design's power normalized to that of the conventional circuit. As shown in this figure, the presented circuit reduces power consumption by 53% compared with the conventional dynamic circuit and by 1% compared with the best existing design.

Fig. 6 displays the normalized delays of the studied circuits. The simulation results indicate that the presented circuit has the shortest delay: 45% less than that of the conventional circuit and 3% less than that of the best existing design.

For a fair comparison, power consumption, robustness, and delay should be considered together. Hence, a figure of merit (FOM) that accounts for all of these design parameters simultaneously is used [16].



Fig. 5. Comparison of the normalized power consumption in the same robustness



Fig. 6. Comparison of the normalized delay in the same robustness

$$\text{FOM} = \frac{\text{UNA}}{P_t \times t_p^2} \tag{6}$$

where UNA, $t_p$, and $P_t$ are the normalized UNA, propagation delay, and total power of the 64-input OR gates, respectively. Since $P_t \times t_p$ is the energy per operation, squaring $t_p$ makes the denominator the energy-delay product.
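As a consistency check, the FOM defined above can be evaluated on the normalized results reported in this article. With UNA held at 1, the 53% power reduction and 45% delay reduction of the 64-input OR gate give an FOM of about 7, and the normalized power (0.67) and delay (0.75) of the tag comparator in Table I give about 2.65, matching the reported values.

```python
def fom(una: float, power: float, delay: float) -> float:
    """FOM = UNA / (P_t * t_p^2); all inputs normalized to the same reference design."""
    return una / (power * delay ** 2)


# 64-input OR gate: 53% less power, 45% less delay, same UNA (Section III).
fom_or = fom(una=1.0, power=1 - 0.53, delay=1 - 0.45)
# 40-input tag comparator: normalized values from Table I.
fom_tag = fom(una=1.0, power=0.67, delay=0.75)
print(f"64-input OR: {fom_or:.2f}, tag comparator: {fom_tag:.2f}")
```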

Fig. 7 compares the FOM of the presented dynamic circuit with those of the analyzed dynamic circuits for 64-input OR gates. The presented circuit has the highest FOM: seven times that of the conventional dynamic circuit and at least 42% higher than those of the other designs.

## IV. TAG COMPARATOR DESIGN

To illustrate the effectiveness of the presented dynamic circuit compared with the conventional circuit, tag comparators are designed using both circuits. Tag comparators are essential components of cache memories and lie on a critical path of modern microprocessors. Cache memories play a vital role in bridging the speed gap between off-chip main memory and high-speed processors. A typical cache consists of a tag comparator, a tag SRAM, and a data SRAM. A cache access cannot complete until the tag comparator delivers the hit/miss signal to the cache controller. Consequently, high-performance tag comparators are crucial for the efficiency of modern microprocessors.

Conventional tag comparators are often implemented with high fan-in dynamic circuits. For 64-bit microprocessors with a 50-bit physical address, a 40-bit (40-input) tag comparator is required.

The conventional tag comparator is structured in two stages, using low fan-in comparators followed by a 5-input OR gate, as illustrated in Fig. 8(a). Fig. 8(b) shows a 40-input tag comparator implemented with the presented dynamic circuit.
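Functionally, a tag comparator asserts miss when any stored tag bit differs from the corresponding address bit, i.e., it computes a wide OR of bitwise XORs. The sketch contrasts a single 40-input OR with a two-stage version resembling Fig. 8(a). The 8-bit group size is an inference (40 bits feeding a 5-input OR), not a detail stated in the text, and the model is purely logical.

```python
def miss_flat(stored: int, addr: int, width: int = 40) -> bool:
    """Miss if any of the `width` tag bits differ: one wide OR of bitwise XORs."""
    diff = (stored ^ addr) & ((1 << width) - 1)          # 1 wherever a bit mismatches
    return any((diff >> i) & 1 for i in range(width))    # OR over all XOR outputs


def miss_two_stage(stored: int, addr: int, width: int = 40, group: int = 8) -> bool:
    """Same function as width // group small comparators feeding a small OR gate."""
    diff = (stored ^ addr) & ((1 << width) - 1)
    mask = (1 << group) - 1
    # First stage: each low fan-in comparator flags a mismatch within its group of bits.
    group_miss = [((diff >> (g * group)) & mask) != 0 for g in range(width // group)]
    # Second stage: a (width // group)-input OR gate, i.e. 5 inputs for the default sizes.
    return any(group_miss)


tag = 0x3A5C_0F12_77
print(miss_flat(tag, tag), miss_two_stage(tag, tag ^ (1 << 39)))  # prints "False True"
```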

The 40-input tag comparators are designed with a low-threshold-voltage 90nm CMOS technology model, operate at 1V, and drive a 5fF output load. Simulations are performed under typical process conditions at 110°C. Additionally, transistor sizes are chosen to ensure a minimum UNA of 0.25 V in the worst case across all process variations. The propagation delay is defined as the time from the address signal to the miss signal under worst-case conditions, in which only one branch discharges the dynamic node and one of the two series NMOS transistors is held ON.



Fig. 7. Comparison of normalized FOMs



Fig. 8. Implementation of the 40-input tag comparator using (a) the conventional circuit and (b) the presented circuit

Table I summarizes the design parameters of the simulated 40-input comparators designed with the conventional and presented circuits, including power consumption, delay, UNA, and FOM. To emphasize the improvements, normalized values are given relative to the conventional design.

As shown in Table I, the presented circuit reduces the power consumption and delay of the 40-input tag comparator by 33% and 25%, respectively, compared with the conventional dynamic circuit, while maintaining the same noise immunity. Consequently, the presented circuit achieves an FOM 2.65 times that of its conventional counterpart.

To account for potential variations, simulations are conducted across all process corners, at three temperatures, and at five supply voltages. The results are shown in Figs. 9, 10, and 11, where the delay and power of the presented circuit are normalized to their values under typical process conditions at 1V and 110°C. Based on these results, the presented circuit operates properly across process, voltage, and temperature variations.

TABLE I

Comparison of Simulation Results for the Tag Comparators

| Parameter        | Conventional design | This work |
|------------------|---------------------|-----------|
| Power (µW)       | 71.8                | 48.2      |
| Normalized power | 1                   | 0.67      |
| Delay (ps)       | 189                 | 141.3     |
| Normalized delay | 1                   | 0.75      |
| UNA (V)          | 0.25                | 0.25      |
| Normalized UNA   | 1                   | 1         |
| Normalized FOM   | 1                   | 2.65      |



Fig. 9. The impact of process variation on the normalized delay and power of the presented tag comparator



Fig. 10. The impact of voltage variation on the normalized delay and power of the presented tag comparator



Fig. 11. The impact of temperature variation on the normalized delay and power of the presented tag comparator

## V. CONCLUSIONS

This article presented a new circuit technique to reduce the delay and power of high fan-in gates. To accomplish this, the PDN was split into smaller PDNs, which splits the switching capacitance of the dynamic node and decreases both delay and power. Furthermore, a replica of the leakage current was used to minimize the conflict between the keeper and the PDNs. To mitigate the energy dissipated in the large switching capacitance, the voltage swing was reduced.

The analyzed circuits were simulated with a 90nm CMOS model. A 40-input tag comparator was also implemented with the presented technique to reduce the delay and power consumption of high fan-in tag comparators during search operations. The simulation results demonstrated substantial improvements: for 64-input OR gates, delay and power were reduced by 45% and 53%, respectively, at the same robustness, and the tag comparator achieved a 2.65-times higher FOM than the conventional design.

The new dynamic circuit offers a promising approach to energy-efficient design, especially for the high fan-in gates required in modern microprocessors. However, gates with a very large number of inputs remain a potential limitation of the current work. A possible avenue for future research is to use carbon nanotube field-effect transistors (CNFETs) or other emerging devices instead of CMOS transistors to further improve the design parameters.

### References

- [1] A. Kumar and R.K. Nagaria, "A new process variation and leakage-tolerant domino circuit for wide fan-in OR gates," *Analog Integr. Circuits Signal Process.*, vol. 102, pp. 9-25, 2020.
- [2] A. Kumar and R.K. Nagaria, "Reduction of variation and leakage in wide fan-in OR logic domino gate," *Integration, the VLSI Journal*, vol. 89, pp. 229-240, 2023.
- [3] H. Mostafa, M. Anis, and M. Elmasry, "Novel timing yield improvement circuits for high-performance low-power wide fan-in dynamic OR gates," *IEEE Trans. Circuits Syst.*, vol. 58, pp. 1785-1797, 2011. doi:10.1109/TCSI.2011.2107171
- [4] Y. Lih, N. Tzartzanis, and W.W. Walker, "A leakage current replica keeper for dynamic circuits," *IEEE J. Solid-State Circuits*, vol. 42, pp. 48-55, 2007. doi:10.1109/JSSC.2006.885051
- [5] H. Suzuki, C.H. Kim, and K. Roy, "Fast tag comparator using diode partitioned domino for 64-bit microprocessors," *IEEE Trans. Circuits Syst.*, vol. 54, pp. 322-328, 2007. doi:10.1109/TCSI.2006.885998
- [6] A.A. Angeline and V.S.K. Bhaaskaran, "Speed enhancement techniques for clock-delayed dual keeper domino logic style," *International Journal of Electronics*, vol. 107, pp. 1239-1253, 2020. doi:10.1080/00207217.2020.1726486
- [7] R. Kannan and R. Rangarajan, "Low power noise immune node voltage comparison keeper design for high-speed architecture," *Microprocessors and Microsystems*, vol. 77, p. 103192, 2020. doi:10.1016/j.micpro.2020.103192
- [8] M. Asyaei, "New dynamic logic style for energy efficient tag comparators," *Microprocessors and Microsystems*, vol. 90, p. 104522, 2022. doi:10.1016/j.micpro.2022.104522
- [9] M. Asyaei, "New partitioned domino circuit for power-efficient wide gates," *Integration, the VLSI Journal*, vol. 80, pp. 320-327, 2023. doi:10.1016/j.vlsi.2022.10.010
- [10] J.M. Rabaey, A.P. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits*, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 2003.
- [11] L. Ding and P. Mazumder, "On circuit techniques to improve noise immunity of CMOS dynamic logic," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, pp. 910-925, 2004.
- [12] *HSPICE Simulation and Analysis User Guide*, Synopsys. [Online]. Available: https://www.synopsys.com/
- [13] M. Alioto, G. Palumbo, and M. Pennisi, "Understanding the effect of process variations on the delay of static and domino logic," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 18, pp. 697-710, 2010. doi:10.1109/TVLSI.2009.2015455
- [14] M. Asyaei, "A new low-power dynamic circuit for wide fan-in gates," *Integration, the VLSI Journal*, vol. 60, pp. 263-271, 2018. doi:10.1016/j.vlsi.2017.10.010
- [15] T.R. Kandpal, T. Pokhrel, S. Saini, and A. Majumder, "A variation resilient keeper design for high-performance domino logic applications," *Integration, the VLSI Journal*, vol. 88, pp. 1-9, 2023. doi:10.1016/j.vlsi.2022.08.007
- [16] M. Asyaei, "A new circuit scheme for wide dynamic circuits," *International Journal of Engineering, Transactions B: Applications*, vol. 31, pp. 699-704, 2018. doi:10.5829/ije.2018.31.04a.03