

# Journal of Modeling & Simulation in Electrical & Electronics Engineering (MSEEE)

Journal homepage: https://mseee.semnan.ac.ir/

ISSN: 2821-0786



# A Power-Efficient Noise-Tolerant Circuit Technique for Wide Dynamic Gates

Mohammad Asyaei\*1

Abstract— In this article, a dynamic circuit technique is suggested to lower the consumed power of wide gates without speed degradation. In the proposed circuit, the voltage swing on the evaluation network is reduced to decrease the consumed power of wide gates. To reduce the consumed power and delay of the suggested circuit, the structure of the output inverter is modified by utilizing the voltage of the footer node of the evaluation network. A current mirror that replicates the evaluation network's leakage current is employed to decrease the contention between the evaluation network and the keeper transistors. In addition, the subthreshold leakage current is reduced due to the stacking effect. As a result, the leakage power is lowered and the noise immunity is improved. Wide OR gates are simulated using HSPICE in a 90-nm CMOS technology. Simulation results demonstrate 51% power reduction and 1.82× noise immunity improvement at the same delay as the conventional dynamic circuit for 32-input OR gates. Moreover, a 128-input multiplexer is designed by employing the suggested dynamic circuit. The results demonstrate 13% power reduction and 33% speed improvement for the suggested 128-input multiplexer compared to the conventional multiplexer at the same robustness.

Index Terms— Dynamic circuit, wide gates, leakage current, noise immunity.

# I. INTRODUCTION

In new technologies, reducing consumed power has become an important topic in high-performance applications. This issue is especially critical in portable devices, which do not always have access to an unlimited power source. These devices may remain idle for significant periods, during which most consumed power is attributed to leakage power. Furthermore, as channel lengths continue to shrink, leakage currents increase; the subthreshold leakage current in particular increases exponentially. As a result, subthreshold leakage currents constitute a significant portion of the total power loss [1].

The leakage issue in new technologies becomes particularly critical in gates with very high fan-in, which are commonly used in registers and other circuits with many inputs. These circuits have a considerable impact on system performance [2]. For example, registers, one of the most critical modules in microprocessors, require very high fan-in gates to select a register. Therefore, the design of high fan-in gates, especially wide OR gates, has gained the attention of researchers. Consequently, new ideas for designing circuits to operate in nanoscale CMOS technologies are required.

To reduce the power of VLSI circuits, it is essential to identify the components that affect consumed power. The following equations show that the power loss in digital CMOS circuits includes three main parts: dynamic power resulting from the charging and discharging of capacitive nodes, short-circuit power, and static power from leakage currents [3].

$$P_{Avg/gate} = P_{Switching} + P_{Short-circuit} + P_{Leakage}$$
 (1)

$$P_{Avg/gate} = \alpha_{0 \to 1} C_L V_{swing} V_{DD} f_{clk} + I_{sc} V_{DD} + I_{leakage} V_{DD}$$
 (2)

The first component of consumed power, denoted as $P_{Switching}$, represents the switching or dynamic consumed power, where $C_L$ is the load capacitance, $f_{clk}$ is the clock frequency, $V_{swing}$ represents the maximum voltage swing, and $\alpha_{0 \to 1}$ (ranging from 0 to 1) is the activity factor of the node (the average number of times the node switches during a clock period and consumes power). This component is generated when the circuit's capacitive load is charged by PMOS transistors switching from '0' to '1'. Referring to (2), the switching power is proportional to the voltage swing. Therefore, reducing the voltage swing has a significant impact on the consumed power. Consequently, one of the methods for reducing $P_{Switching}$ is to decrease the voltage swing of nodes with high capacitance, which is the approach utilized in this article.

Received: 2025-07-22; Revised: 2025-09-24; Accepted: 2025-09-29

\* Corresponding author. Email: m.asyaei@du.ac.ir

<sup>1</sup> School of Engineering, Damghan University, Damghan, Iran.

#### Cite this article as:

Asyaei, M. (2025). A Power-Efficient Noise-Tolerant Circuit Technique for Wide Dynamic Gates. *Journal of Modeling & Simulation in Electrical & Electronics Engineering (MSEEE)*, 5(2), 29-36.

DOI: https://doi.org/10.22075/MSEEE.2025.38160.1221

The second component ($P_{Short-circuit}$) occurs when a direct path between the voltage source and ground is established. The final component ($P_{Leakage}$) is due to leakage current ($I_{leakage}$), which is mainly determined by manufacturing technology considerations. Leakage current, especially subthreshold leakage, increases significantly as the channel length is reduced in new technologies and also as the temperature rises, resulting in decreased noise immunity.
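As a rough numerical illustration of (2), the following sketch evaluates the three power components and shows that halving $V_{swing}$ halves the switching term while leaving the other two unchanged. All operating-point values (activity factor, capacitance, currents, frequency) are hypothetical and chosen only for illustration; they are not taken from the simulations in this article.

```python
def average_power(alpha, c_load, v_swing, v_dd, f_clk, i_sc, i_leak):
    """Return (switching, short-circuit, leakage, total) power in watts, following Eq. (2)."""
    p_switching = alpha * c_load * v_swing * v_dd * f_clk
    p_short_circuit = i_sc * v_dd
    p_leakage = i_leak * v_dd
    total = p_switching + p_short_circuit + p_leakage
    return p_switching, p_short_circuit, p_leakage, total


# Hypothetical operating point (illustrative values only, not from the article).
params = dict(alpha=0.5, c_load=20e-15, v_dd=1.0, f_clk=1e9, i_sc=2e-6, i_leak=1e-6)

full_swing = average_power(v_swing=1.0, **params)   # node swings rail to rail
half_swing = average_power(v_swing=0.5, **params)   # node swing limited to V_DD / 2

for name, result in (("full swing", full_swing), ("half swing", half_swing)):
    p_sw, p_sc, p_lk, p_tot = result
    print(f"{name}: switching={p_sw * 1e6:.2f} uW, "
          f"short-circuit={p_sc * 1e6:.2f} uW, "
          f"leakage={p_lk * 1e6:.2f} uW, total={p_tot * 1e6:.2f} uW")
```

Only the first term of (2) depends on the swing, so the overall saving from swing reduction depends on how much of the total power is switching power at the node in question.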

In this article, a circuit technique is suggested to lower the switching power of dynamic circuits. The voltage swing of the evaluation network is reduced to decrease switching power. To minimize contention, the leakage current of the evaluation network is mirrored to control the keeper circuit's current, and the output inverter's structure is modified. Additionally, due to the stacking effect, the leakage current of dynamic gates is lowered, and the noise immunity is improved.

The rest of the article is organized as follows: Firstly, Section II provides a review of previous works. Section III describes and examines the suggested dynamic circuit. Section IV presents and compares the simulation results. The 128-input multiplexers are simulated and compared in Section V. Section VI provides the article's conclusion.

# II. REVIEW OF PREVIOUS WORKS

Dynamic circuits are widely used to implement wide gates. Wide gates are utilized in critical units of microprocessors, including register banks and content-addressable memories. Fig. 1 shows a conventional dynamic circuit. Assuming the size of the pull-down network transistors remains constant in this circuit, increasing the size of the keeper transistor enhances the circuit's robustness. Therefore, the keeper ratio (*K*) is defined as follows [4]:

$$K = \frac{\mu_p \left(\frac{W}{L}\right)_{Keeper-transistor}}{\mu_n \left(\frac{W}{L}\right)_{Pull-Down-Network}} \tag{3}$$

where L is the length and W is the width of the MOSFET. Also,  $\mu_n$  is the mobility of electrons and  $\mu_p$  is the mobility of holes. Increasing the size of the keeper transistor is a common way to enhance the robustness of dynamic circuits. Nevertheless, increasing the keeper size raises the contention between the keeper and the pull-down network during the evaluation mode, increasing consumed power and reducing circuit performance. Thus, upsizing the keeper transistor to enhance the robustness will result in a power-delay trade-off.
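The keeper ratio in (3) can be evaluated with a small helper, sketched below. The mobility values and aspect ratios used in the example are hypothetical placeholders, not parameters of the 90-nm process used in this article; the sketch only shows that $K$ grows linearly with the keeper's $W/L$ when the pull-down network is fixed.

```python
def keeper_ratio(mu_p, mu_n, wl_keeper, wl_pull_down):
    """Keeper ratio K of Eq. (3): (mu_p * (W/L)_keeper) / (mu_n * (W/L)_pull-down)."""
    return (mu_p * wl_keeper) / (mu_n * wl_pull_down)


# Hypothetical values: hole mobility assumed to be 0.4 x electron mobility,
# pull-down transistors at a fixed aspect ratio of 2.
MU_N = 1.0
MU_P = 0.4
WL_PULL_DOWN = 2.0

for wl_keeper in (1.0, 2.0, 4.0):
    k = keeper_ratio(MU_P, MU_N, wl_keeper, WL_PULL_DOWN)
    print(f"(W/L)_keeper = {wl_keeper:.1f} -> K = {k:.2f}")
```

A larger $K$ corresponds to a stronger keeper, which is the robustness knob whose power and delay cost is discussed above.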

Various circuit designs have been proposed in the literature to tackle these issues. Conditional keeper domino (CKD) is one of them that utilizes clock delay to manage the strong keeper, as illustrated in Fig. 2(a) [5]. The circuit comprises a weak keeper transistor (K1) and a strong keeper transistor (K2). Consequently, the voltage level of the dynamic node is maintained by K1 at the beginning of the evaluation mode and by K2 for the remainder of this mode. However, this circuit faces challenges in minimizing the delay of inverters and NAND gates. Although increasing the size of inverters as the delay element enhances robustness, it leads to a considerable rise in power dissipation.

The high-speed domino (HSD) shown in Fig. 2(b) features a distinct design for keeper control utilizing clock delay [6]. In this circuit, when the dynamic node is at a high level during the evaluation mode, the transistor Mn1 turns ON, activating transistor MK as the keeper; otherwise, it turns OFF. A significant drawback of the HSD circuit technique is that the transistor MK is OFF at the onset of the evaluation mode, leaving the dynamic node floating. Although this technique accelerates the circuit, any noise at the beginning of the evaluation mode can lead to the dynamic node discharging. Furthermore, when the output is at $V_{DD}$, the gate of transistor MK is at the voltage $V_{DD} - V_{tn1}$ (where $V_{tn1}$ is the threshold voltage of transistor Mn1), meaning the keeper transistor is not entirely OFF, thus creating a direct path between MK and the evaluation network.

Another method proposed for reducing leakage and enhancing robustness is the diode-footed domino (DFD) [7]. Fig. 2(c) illustrates a wide OR gate implemented using this technique. In the DFD, M1 is a diode-connected transistor placed in series with the evaluation network. As a result, the leakage current is reduced due to the stacking effect created by the presence of transistor M1. Furthermore, employing this transistor enhances robustness against noise.

Fig. 1. Conventional dynamic circuit.

As illustrated in Fig. 2(d), the leakage current replica (LCR) incorporates a keeper transistor that replicates the leakage current to mitigate the effects of variations [8]. The conductivity of the keeper in this circuit is regulated according to process corners. This is accomplished through a current mirror that duplicates the gate leakage. However, it has drawbacks such as low robustness and high power consumption, particularly in wide gates.

The current comparison-based domino circuit (CCD) is depicted in Fig. 2(e). In this circuit, a current comparison against a reference current is employed to minimize power consumption [9]. However, the use of PMOS transistors increases the circuit's area.

Fig. 2(f) illustrates the controlled current comparison-based domino (C3D), which primarily aims to reduce contention between the evaluation network and the keeper transistor [10]. To accomplish this, two voltage-controlled current sources are employed in the C3D.

Node voltage-based conditional keeper (NCK) is another circuit scheme [11]. As illustrated in Fig. 2(g), the keeper transistor is controlled by the voltage of the footer node to minimize contention. However, the power consumption remains high due to the use of two inverters.

Fig. 2. Dynamic circuit techniques: (a) CKD [5], (b) HSD [6], (c) DFD [7], (d) LCR [8], (e) CCD [9], (f) C3D [10], (g) NCK [11], (h) CDDK [12], (i) [13].

As depicted in Fig. 2(h), the clock-delayed dual keeper (CDDK) was introduced to reduce power consumption in dynamic circuits [12]. In this approach, the keeper circuit is modified to lessen contention, albeit at the expense of robustness during the initial phase of the evaluation mode.

A circuit technique that merges the concepts of HSD and DFD is shown in Fig. 2(i) [13]. The keeper transistor is controlled using a transmission gate along with clock-delayed signals. Similar to HSD, the keeper transistor stays OFF at the start of the evaluation mode. Consequently, the speed increases, though this comes at the cost of diminished robustness during this period.

# III. SUGGESTED DYNAMIC CIRCUIT

The suggested circuit scheme is depicted in Fig. 3. Additionally, the waveforms related to the proposed circuit are illustrated in Fig. 4.

Regarding the suggested circuit in Fig. 3, its operational details during the two working modes are as follows. In the pre-charge mode, the signal CLK is at a low level. In this mode, the precharge transistor (Mpre) and discharge transistors (MD1 and MD2) are ON. Thus, the dynamic node (D) is charged up to $V_{DD}$, and nodes F and Vo are discharged to zero. The transistor MP2 is also ON, maintaining the voltage of the dynamic node at $V_{DD}$.

In the evaluation mode, where the signal CLK is high, transistors MPre, MD1, and MD2 are turned OFF, and the other transistors in the circuit can be either OFF or ON according to the input signals. Thus, two different scenarios may occur based on the input signals in the evaluation mode.

1) All input signals remain at the zero level.
2) By applying appropriate input signals, at least one conducting path is established in the evaluation network.

In the former scenario, there is no conducting path in the evaluation network, and the only current flowing in the network is leakage current. Most of this leakage current in the nanometer era is due to the sub-threshold leakage current flowing in the OFF transistors in the evaluation network.

Some voltage appears at the footer node (F) due to the leakage current. Note that the voltage generated at node F is small for the following reasons. First, the threshold voltage of the input transistors increases because of the body effect. Second, because of the presence of some voltage at node F and zero voltage at the inputs, the $V_{GS}$ of the NMOS transistors is negative. Thus, the sub-threshold leakage current in the evaluation network is lowered, and the voltage at node F does not increase enough to adversely affect the circuit's operation.
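The self-limiting behaviour of the footer-node voltage can be illustrated with the standard first-order subthreshold current model, $I_{sub} = I_0 \, e^{(V_{GS} - V_{th})/(n V_T)} \left(1 - e^{-V_{DS}/V_T}\right)$. The sketch below is a simplified illustration, not the device model used in the HSPICE simulations: the pre-factor `i0`, slope factor `n`, nominal threshold `v_th0`, and the linearized body-effect coefficient `body_coeff` are all assumed values, and drain-induced barrier lowering is ignored.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
Q_E = 1.602176634e-19    # elementary charge, C


def thermal_voltage(temp_c):
    """Thermal voltage kT/q in volts for a temperature given in degrees Celsius."""
    return K_B * (temp_c + 273.15) / Q_E


def subthreshold_current(v_gs, v_ds, v_th, temp_c, i0=1e-7, n=1.4):
    """First-order subthreshold current; i0 and n are assumed fitting parameters."""
    v_t = thermal_voltage(temp_c)
    return i0 * math.exp((v_gs - v_th) / (n * v_t)) * (1.0 - math.exp(-v_ds / v_t))


def leakage_with_footer(v_f, v_dd=1.0, v_th0=0.3, body_coeff=0.2, temp_c=110.0):
    """Leakage of one OFF input transistor: gate at 0 V, source at node F, drain at V_DD."""
    v_gs = 0.0 - v_f                      # negative gate-source voltage
    v_ds = v_dd - v_f
    v_th = v_th0 + body_coeff * v_f       # linearized body effect (assumption)
    return subthreshold_current(v_gs, v_ds, v_th, temp_c)


baseline = leakage_with_footer(0.0)
for v_f in (0.0, 0.05, 0.10):
    ratio = leakage_with_footer(v_f) / baseline
    print(f"V_F = {v_f:.2f} V -> leakage = {ratio:.3f} x baseline")
```

Under these assumptions, even a few tens of millivolts at node F reduce the leakage noticeably, because the exponent sees both the negative $V_{GS}$ and the raised threshold voltage.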

In the suggested circuit, any voltage drop at node D is compensated by utilizing the keeper transistors MP1 and MP2 so that the circuit's operation is not disrupted.

In the latter scenario, when at least one conducting path is created in the evaluation network, the current in the evaluation network increases. Due to the charge sharing between nodes D and F, the voltage at node D decreases. Therefore, transistor M3 turns ON, charging the output node (Vo) to $V_{DD}$. Additionally, the voltage at node F increases. Consequently, transistor M2 turns ON, discharging node E to ground. Finally, transistors M4 and M1 are turned OFF. To turn OFF these transistors more quickly and reduce the circuit delay, the reference voltage $V_R$ is selected lower than $V_{DD}$.

Fig. 3. The suggested dynamic circuit.

Fig. 4. Transient waveform of the suggested circuit.
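The two operating modes and the two evaluation scenarios described above can be summarized in a purely behavioural sketch. It is an idealized abstraction that only tracks steady-state levels: the value $V_{DD}/2$ for the discharged dynamic node is the approximate minimum reported for node D, the footer node is represented only as raised or not raised, and the small leakage-induced voltage at F is neglected. The function name and return format are hypothetical.

```python
def dynamic_or_gate(clk, inputs, v_dd=1.0):
    """Idealized steady-state node levels of the suggested gate (behavioural abstraction).

    clk    -- False for the pre-charge mode, True for the evaluation mode
    inputs -- iterable of logic values applied to the evaluation network
    """
    if not clk:
        # Pre-charge: D is charged to V_DD, F and Vo are discharged to zero.
        return {"D": v_dd, "F_raised": False, "Vo": 0.0}
    if any(inputs):
        # Evaluation with a conducting path: D drops to roughly V_DD / 2,
        # F rises, and the output is charged to V_DD.
        return {"D": v_dd / 2, "F_raised": True, "Vo": v_dd}
    # Evaluation with all inputs low: the keepers hold D at V_DD
    # (the small leakage-induced voltage at F is neglected here).
    return {"D": v_dd, "F_raised": False, "Vo": 0.0}


print(dynamic_or_gate(False, [0, 0, 0, 0]))   # pre-charge
print(dynamic_or_gate(True, [0, 0, 0, 0]))    # evaluation, no conducting path
print(dynamic_or_gate(True, [0, 1, 0, 0]))    # evaluation, one input high
```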

Since the amount of reference voltage directly affects the robustness and performance of the suggested circuit, the impact of process variations is minimized by using an appropriate reference voltage  $V_R$ . This voltage is generated using a circuit that can accurately track process variations [14]. Thus, the performance and robustness of the suggested circuit are not decreased.

Furthermore, the current in MP1 and MP2 is a replica of the leakage current of the evaluation network, which is produced by a common circuit. Therefore, the contention between the keeper transistors (MP1 and MP2) and the evaluation network is reduced, thereby decreasing the consumed power and propagation delay of the gate. Moreover, because the leakage current replica circuit of the evaluation network is used, the effects of process, supply voltage, and temperature variations are minimized.

In the suggested circuit, the replica current mirror includes transistors Mpr and Mnr. This circuit is shared for all gates and therefore does not increase the chip's power and area. The keeper current of each gate is proportional to the leakage current of that gate and is achieved by resizing the transistor MP1.

According to (2), the main components of consumed power in the suggested circuit are reduced for three reasons.

- a) The voltage swing at the dynamic node (D) is lowered to decrease switching power. For this purpose, if a conduction path is established in the evaluation network, the voltage at node D is not discharged to the ground. Its minimum voltage will be approximately  $V_{DD}/2$ , as seen in Fig. 4. Thus, the  $V_{swing}$  is lowered in the suggested circuit, and hence the switching power is decreased according to (2).
- b) In the suggested circuit, the replica current mirror is used to compensate for the leakage current of the evaluation network and avoid the voltage drop at node D, reducing contention and hence the short-circuit power.
- c) Due to the stacking effect in the suggested circuit, subthreshold leakage current and consequently leakage power are decreased [15]. The reduction in leakage current not only decreases leakage power but also increases immunity against noise in the suggested circuit.

# IV. RESULTS AND DISCUSSIONS

The studied circuits and the suggested circuit are designed using HSPICE software and a 90-nm CMOS technology. The supply voltage is set to 1V, the output load capacitance is 5fF, and the operating temperature is selected as  $110^{\circ}$ C. The two-stage OR gates with 8, 16, 32, and 64 inputs are simulated using the studied circuits and the suggested circuit, with delays of 70, 80, 90, and 110 picoseconds, respectively. The ratio of PMOS to NMOS transistor widths in the output inverters is set to 2 (Wp/Wn=2), and the initial size of other transistors is considered to be the minimum size. Then, the value of K in (3) was modified to obtain the specified delay.

The used framework is illustrated in Fig. 5. The gate under test is connected to a similar gate to generate the actual input waveform, as seen in Fig. 5. To calculate the circuit performance, the delay between *In* and *Out* signals of the gate under test is measured in the evaluation mode. This is carried out where only one input is changed while others remain in their previous states [16]. Under the same conditions, the consumed power of the gate is also calculated.

Table I shows a comparison of the consumed power of the circuits normalized to the consumed power of the conventional dynamic circuit (Fig. 1). According to this table, the reduction in the consumed power of the suggested circuit compared to other circuits is evident. Additionally, the proposed circuit demonstrates at least a 48% reduction in consumed power compared to the conventional one.
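The "at least 48%" figure can be checked directly from the power values listed in Table I for the conventional circuit and for this work. The short sketch below recomputes the normalized power and the corresponding reduction for each fan-in; the dictionary layout and function name are illustrative choices.

```python
# Power values from Table I: fan-in -> (conventional, this work).
TABLE_I_POWER = {
    8: (25.3, 13.1),
    16: (29.4, 13.67),
    32: (34.75, 17.0),
    64: (44.0, 22.8),
}


def power_reduction(conventional, proposed):
    """Fractional power reduction of the proposed circuit relative to the conventional one."""
    return 1.0 - proposed / conventional


reductions = {
    fan_in: power_reduction(conv, prop)
    for fan_in, (conv, prop) in TABLE_I_POWER.items()
}

for fan_in, (conv, prop) in TABLE_I_POWER.items():
    print(f"fan-in {fan_in:2d}: normalized power = {prop / conv:.2f}, "
          f"reduction = {reductions[fan_in] * 100:.1f}%")
```

The smallest reduction occurs at fan-ins of 8 and 64, and the 32-input case corresponds to the roughly 51% reduction quoted in the abstract.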

TABLE I
Comparing the Normalized Power in the Same Delay.

| Fan-in |                  | Conv. | CKD  | HSD  | DFD  | LCR  | CCD  | C3D  | NCK  | CDDK  | [13] | This work |
|--------|------------------|-------|------|------|------|------|------|------|------|-------|------|-----------|
| 8      | Power            | 25.3  | 40.5 | 26   | 23.4 | 24.5 | 23.7 | 20.3 | 20.4 | 28.8  | 26.4 | 13.1      |
|        | Normalized Power | 1     | 1.6  | 1.03 | 0.92 | 0.97 | 0.94 | 0.8  | 0.81 | 1.14  | 1.04 | 0.52      |
| 16     | Power            | 29.4  | 42.7 | 31.7 | 32.8 | 27.6 | 24   | 22.3 | 23.1 | 31.76 | 29.9 | 13.67     |
|        | Normalized Power | 1     | 1.45 | 1.08 | 1.12 | 0.94 | 0.82 | 0.76 | 0.79 | 1.08  | 1.02 | 0.46      |
| 32     | Power            | 34.75 | 51.5 | 38   | 34.8 | 33.7 | 27   | 26.6 | 31.1 | 37.7  | 41.5 | 17        |
|        | Normalized Power | 1     | 1.48 | 1.09 | 1    | 0.97 | 0.78 | 0.77 | 0.89 | 1.08  | 1.19 | 0.49      |
| 64     | Power            | 44    | 63   | 51   | 47   | 40   | 29   | 28   | 38.5 | 49.8  | 54.3 | 22.8      |
|        | Normalized Power | 1     | 1.43 | 1.16 | 1.07 | 0.91 | 0.66 | 0.64 | 0.88 | 1.13  | 1.23 | 0.52      |

Fig. 5. The framework used in simulations [16].

To calculate the immunity of the circuits against noise sources, the Unity Noise Average (UNA) is used, which is the amplitude of the input noise voltage that results in the same average output noise voltage. UNA is defined as follows [14]:

$$UNA = \{V_{in} : V_{noise,Avg} = V_{output,Avg}\}$$
 (4)

The values obtained for UNAs are shown in Table II. All UNAs are normalized to the UNA of the conventional circuit. Simulations are accomplished at the same speed for a given fan-in. Simultaneously, noisy signals are applied to all inputs to account for the worst-case scenario to measure the robustness against noise in OR gates. The simulation results indicate an improvement of 1.22 to 2.21 times in UNA of the suggested circuit compared to the conventional dynamic circuit.

Additionally, the suggested circuit exhibits better noise immunity than other circuits, except for DFD, CCD, C3D, and NCK circuits. However, the suggested circuit has lower consumed power compared to these circuits.

For a better comparison of the suggested circuit with other circuits, it is essential to concurrently consider design factors consisting of consumed power, delay, and noise immunity. Reference [17] described the relationship between noise, delay, and consumed power as a criterion for calculating the overall performance (OP) in the following way.

$$OP = \frac{Power \times Delay}{UNG}$$
 (5)

where UNG represents the unity noise gain. This factor is used to measure the noise immunity of circuits and is defined as the input noise voltage amplitude that causes noise to appear at the output with the same voltage amplitude. UNG is derived from the following relationship [18].

$$UNG = \{V_{in} : V_{noise} = V_{output}\}$$
 (6)

The OP criterion suffers from the following drawbacks. First, the power-delay product is employed instead of using the energy-delay product. The energy-delay product is more important than the power-delay product. Second, the output noise pulse width is not considered in UNG. Therefore, UNA should be used to measure the noise immunity of circuits. To simultaneously examine design factors, the following figure of merit (FOM) can be utilized [19].

$$FOM = \frac{UNA_{norm}}{P_{tot-norm} \times t_{P-norm}^{2}} \tag{7}$$

where $UNA_{norm}$, $P_{tot-norm}$, and $t_{P-norm}$ represent the unity noise average, average consumed power, and the circuit delay, respectively. Each factor is normalized to the corresponding factor in the conventional dynamic circuit.

TABLE II
Comparing the Normalized UNA in the Same Delay.

| Fan-in |                | Conv. | CKD  | HSD  | DFD  | LCR  | CCD  | C3D  | NCK  | CDDK | [13] | This work |
|--------|----------------|-------|------|------|------|------|------|------|------|------|------|-----------|
| 8      | UNA            | 0.45  | 0.46 | 0.36 | 0.72 | 0.37 | 0.68 | 0.84 | 0.82 | 0.44 | 0.51 | 0.55      |
|        | Normalized UNA | 1     | 1.02 | 0.8  | 1.6  | 0.82 | 1.51 | 1.87 | 1.82 | 0.98 | 1.13 | 1.22      |
| 16     | UNA            | 0.39  | 0.39 | 0.32 | 0.68 | 0.32 | 0.66 | 0.8  | 0.82 | 0.4  | 0.45 | 0.57      |
|        | Normalized UNA | 1     | 1    | 0.82 | 1.74 | 0.82 | 1.69 | 2.05 | 2.1  | 1.03 | 1.15 | 1.46      |
| 32     | UNA            | 0.34  | 0.35 | 0.29 | 0.67 | 0.29 | 0.61 | 0.75 | 0.73 | 0.39 | 0.48 | 0.62      |
|        | Normalized UNA | 1     | 1.03 | 0.85 | 1.97 | 0.85 | 1.79 | 2.21 | 2.15 | 1.15 | 1.41 | 1.82      |
| 64     | UNA            | 0.29  | 0.3  | 0.26 | 0.62 | 0.25 | 0.54 | 0.75 | 0.72 | 0.31 | 0.47 | 0.64      |
|        | Normalized UNA | 1     | 1.03 | 0.9  | 2.14 | 0.86 | 1.86 | 2.59 | 2.48 | 1.07 | 1.62 | 2.21      |

TABLE III
Comparison of Normalized FOMs.

|                  | Conv. | CKD  | HSD  | DFD  | LCR  | CCD  | C3D  | NCK  | CDDK | [13] | This work |
|------------------|-------|------|------|------|------|------|------|------|------|------|-----------|
| # of transistors | 36    | 47   | 44   | 40   | 37   | 41   | 43   | 38   | 42   | 41   | 41        |
| Power            | 34.75 | 51.5 | 38   | 31.2 | 33.7 | 27   | 26.6 | 31.1 | 37.7 | 41.5 | 17        |
| Normalized Power | 1     | 1.48 | 1.09 | 0.9  | 0.97 | 0.78 | 0.77 | 0.89 | 1.08 | 1.19 | 0.49      |
| Normalized delay | 1     | 1    | 1    | 1    | 1    | 1    | 1    | 1    | 1    | 1    | 1         |
| UNA              | 0.34  | 0.35 | 0.29 | 0.32 | 0.29 | 0.61 | 0.75 | 0.73 | 0.39 | 0.48 | 0.62      |
| Normalized UNA   | 1     | 1.03 | 0.85 | 0.94 | 0.85 | 1.79 | 2.21 | 2.15 | 1.15 | 1.41 | 1.82      |
| Normalized FOM   | 1     | 0.7  | 0.78 | 1.04 | 0.88 | 2.29 | 2.87 | 2.42 | 1.06 | 1.18 | 3.71      |

Table III compares the FOM of the dynamic circuits under investigation. According to this table, the suggested circuit presents a higher FOM compared to other dynamic circuits, due to its low consumed power and high noise immunity. Specifically, the FOM of the proposed circuit is 3.71 times greater than that of the conventional dynamic circuit.
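As a cross-check, the normalized FOM values can be recomputed from the normalized UNA, power, and delay entries of Table III, reading the figure of merit as $UNA_{norm} / (P_{tot-norm} \times t_{P-norm}^{2})$. The sketch below does this for every circuit; the dictionary layout and function name are illustrative. Because the inputs are already rounded to two decimals, small differences in the last digit relative to the tabulated FOM are possible.

```python
def figure_of_merit(una_norm, power_norm, delay_norm):
    """FOM = UNA_norm / (P_tot-norm * t_P-norm ** 2), all quantities normalized."""
    return una_norm / (power_norm * delay_norm ** 2)


# Normalized values from Table III: circuit -> (normalized UNA, normalized power).
# The normalized delay is 1 for every circuit in that table.
TABLE_III = {
    "Conv.": (1.0, 1.0),
    "CKD": (1.03, 1.48),
    "HSD": (0.85, 1.09),
    "DFD": (0.94, 0.9),
    "LCR": (0.85, 0.97),
    "CCD": (1.79, 0.78),
    "C3D": (2.21, 0.77),
    "NCK": (2.15, 0.89),
    "CDDK": (1.15, 1.08),
    "[13]": (1.41, 1.19),
    "This work": (1.82, 0.49),
}

fom = {name: figure_of_merit(una, power, 1.0) for name, (una, power) in TABLE_III.items()}

for name, value in fom.items():
    print(f"{name:10s} FOM = {value:.2f}")

best = max(fom, key=fom.get)
print("Highest FOM:", best)
```

The squared delay term only matters when delays differ, as in the multiplexer comparison of Section V, where the normalized power of 0.87 and normalized delay of 0.67 at equal noise immunity give `figure_of_merit(1.0, 0.87, 0.67)`, approximately 2.56.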

To examine the effect of process variations on the delay and consumed power of the suggested circuit, these variations are evaluated at all process corners, i.e., sPfN, fPfN, sPsN, and fPsN, at a supply voltage of 1V and a temperature of 110 °C.

Delay and consumed power are normalized to their corresponding factors in a typical process (tPtN), with the supply voltage of 1V and temperature at  $110\,^{\circ}$ C. The simulation results are illustrated in Fig. 6.

Furthermore, the effects of voltage and temperature variations on the delay and consumed power of the suggested circuit are considered, and simulation results for different values of $V_{DD}$ and temperature are illustrated in Fig. 7 and Fig. 8, respectively. Based on Figs. 6-8, it can be concluded that the suggested circuit operates effectively under process, voltage, and temperature variations.

To examine the impact of CMOS technology scaling on the suggested circuit, simulations are carried out in 45-nm and 16-nm CMOS technology nodes using high-performance predictive technology models (PTM) [20]. Simulations are performed for a 32-input OR gate at the same delay at the typical corner and 110 °C.



Fig. 6. The impact of process variation on the delay and power of the suggested circuit.



Fig. 7. The impact of voltage variation on the delay and power of the suggested circuit.



Fig. 8. The impact of temperature variation on the delay and power of the suggested circuit.

The power consumption and UNA of the suggested circuit are shown in Fig. 9. The results, normalized to the conventional circuit, show that the proposed circuit consumes less power and has a higher UNA even in the 16-nm technology, indicating that its benefits are maintained as CMOS technology scales down.



Fig. 9. The impact of technology scaling on the delay and power of the suggested circuit.

# V. WIDE FAN-IN MULTIPLEXER DESIGN

High-performance multiplexers, constructed using dynamic gates, are critical components in modern microprocessors, particularly within multi-port memories. These memories, essential for achieving high access speeds, contribute significantly to overall energy consumption. This is primarily due to the substantial capacitance associated with their extensive bus architectures. The problem is further exacerbated by the increasing demand for larger memory sizes and faster operational speeds, leading to a disproportionate increase in energy expenditure. Therefore, optimizing the energy efficiency of multiplexers and memory architectures is paramount in the design of future high-performance microprocessors.

To evaluate the efficacy of the suggested circuit design, simulations are carried out on wide multiplexers implemented using the proposed circuit and the conventional one. These multiplexers are designed with a 90nm CMOS technology process. Transistor sizing is performed to ensure a UNA of 0.3V across all process corners, thereby guaranteeing a fair comparison.

A common approach to wide multiplexer design involves a two-stage architecture. The initial stage comprises multiple narrow multiplexers operating in parallel. Subsequently, the outputs from these first-stage multiplexers are aggregated using a domino OR gate in the second stage.

A 128-input conventional multiplexer is realized using a multi-stage architecture. As illustrated in Fig. 10 (a), the first stage utilizes eight 16-input multiplexers. Each multiplexer selects one of its 16-input data based on a set of read select lines. The outputs of these eight 16-input multiplexers (N0...N7) then serve as inputs to the second stage. In Fig. 10 (b), these intermediate signals are fed into an 8-input OR gate. This OR gate effectively combines the outputs of the first-stage multiplexers to generate the final output signal of the 128-input multiplexer.

In addition, a 128-input multiplexer realized using the suggested circuit is depicted in Fig. 11. This design employs eight 16-input multiplexers in its first stage, as illustrated in Fig. 11 (a). The outputs of these multiplexers, denoted N0 through N7, are then fed into an output stage, depicted in Fig. 11 (b). This output stage utilizes an 8-input dynamic OR gate to produce the final multiplexed signal.
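The two-stage organization shared by both 128-input designs can be expressed as a logic-level sketch: eight 16-input multiplexers produce N0 to N7, and an 8-input OR gate combines them. The sketch below models function only, not the dynamic-circuit implementation; it assumes one-hot read-select lines across all 128 inputs, and the function names are hypothetical.

```python
def mux16(data, select):
    """16-input multiplexer modelled as an OR of (data AND select) terms."""
    if len(data) != 16 or len(select) != 16:
        raise ValueError("mux16 expects 16 data bits and 16 select lines")
    return any(d and s for d, s in zip(data, select))


def or8(inputs):
    """8-input OR gate used as the output stage."""
    if len(inputs) != 8:
        raise ValueError("or8 expects 8 inputs")
    return any(inputs)


def mux128(data, index):
    """128-input multiplexer built from eight mux16 blocks and one or8 stage."""
    if len(data) != 128 or not 0 <= index < 128:
        raise ValueError("mux128 expects 128 data bits and an index in [0, 127]")
    select = [i == index for i in range(128)]           # one-hot select lines (assumed)
    stage1 = [
        mux16(data[g * 16:(g + 1) * 16], select[g * 16:(g + 1) * 16])
        for g in range(8)                               # intermediate nodes N0..N7
    ]
    return or8(stage1)


# Example: an arbitrary data pattern; every index should return its own data bit.
data = [(i % 3 == 0) for i in range(128)]
print(all(mux128(data, i) == data[i] for i in range(128)))
```

Since only the group containing the selected input can drive its intermediate node high, the OR in the second stage never sees conflicting contributions from unselected groups.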

Table IV details the design parameters of simulated multiplexers. The data indicate that the suggested circuit exhibits a superior Figure of Merit (FOM) compared to the conventional design. Specifically, the suggested multiplexer achieves a 33% reduction in delay and a 13% reduction in power consumption for a 128-input configuration, relative to the conventional circuit. This results in a 2.56x improvement in the FOM under the same noise immunity. Furthermore, the area occupied by the shared replica current mirror is estimated to be a mere 8% of the suggested multiplexer's total area.



Fig. 10. 128-input conventional multiplexer using: (a) eight 16-input multiplexers, (b) the 8-input OR gate as the output stage.



Fig. 11. 128-input multiplexer using the suggested circuit: (a) eight 16-input multiplexers, (b) the 8-input dynamic OR gate for implementation of the output stage.

TABLE IV
Comparison of the Normalized FOMs in the 128-input Multiplexers.

|                  | Conventional design | This work |
|------------------|---------------------|-----------|
| Power (mW)       | 58.5                | 50.8      |
| Normalized Power | 1                   | 0.87      |
| Delay (ps)       | 247.6               | 166.5     |
| Normalized Delay | 1                   | 0.67      |
| UNG (V)          | 0.3                 | 0.3       |
| Normalized UNG   | 1                   | 1         |
| FOM              | 1                   | 2.56      |

# VI. CONCLUSION

This article suggested a circuit technique for reducing the consumed power of dynamic OR gates with a large number of inputs. The proposed circuit utilizes a low-swing network to decrease the consumed power of the wide gates. To reduce contention, a replica of the evaluation network's leakage current is used for the keeper transistors. Also, a modified structure for the output inverter has been implemented, which helps to decrease contention and hence consumed power. Additionally, due to the stacking effect, the leakage current of dynamic gates is reduced, enhancing their noise immunity. Furthermore, wide fan-in multiplexers are designed using the suggested dynamic circuit and the conventional circuit. The multiplexers are simulated using a 90-nm, 1-V CMOS process. The simulation results show lower power consumption and higher speed than the conventional design at the same noise immunity.

The suggested circuit and other circuit schemes were designed using 90-nm CMOS technology. The results demonstrate a meaningful reduction in consumed power and an increase in noise immunity of the suggested circuit compared to the conventional circuit.

# REFERENCES

- [1] A. Kumar, R.K. Nagaria, Reduction of variation and leakage in wide fan-in OR logic domino gate, Integration, the VLSI Journal, 89 (2023) 229-240.
- [2] M. Asyaei, Energy-Efficient Dynamic Circuit for High Fan-In OR Gates, Journal of Modeling & Simulation in Electrical & Electronics Engineering (MSEEE), 3 (2023) 35-40.
- [3] J.M. Rabaey, A.P. Chandrakasan, B. Nikolic, Digital integrated circuits, 2nd ed., Upper Saddle River, NJ: Prentice Hall, Englewood Cliffs, 2003.
- [4] P. Gronowski, Issues in dynamic logic design, in Design of High-Performance Microprocessor Circuits, A. Chandrakasan, W. J. Bowhill, and F. Fox, Eds. Piscataway, NJ: IEEE Press, Ch. 8, 2001, pp. 140–157.
- [5] A. Alvandpour, R. Krishnamurthy, K. Soumyanath, and S. Y. Borkar, A sub-130-nm conditional-keeper technique, IEEE Journal of Solid-State Circuits, 37 (2002) 633-638.
- [6] M. H. Anis, M. W. Allam, and M. I. Elmasry, Energy-efficient noise-tolerant dynamic styles for scaled-down CMOS and MTCMOS technologies, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 10 (2002) 71-78.
- [7] H. Mahmoodi-Meimand and K. Roy, Diode-footed domino: a leakage-tolerant high fan-in dynamic circuit design style, IEEE Transactions on Circuits and Systems, 51 (2004) 495-503.
- [8] Y. Lih, N. Tzartzanis, W.W. Walker, A leakage current replica keeper for dynamic circuits, IEEE Journal of Solid-State Circuits, 42 (2007) 48-55.
- [9] A. Peiravi and M. Asyaei, Current-comparison-based domino: a new low-leakage high speed domino circuit for wide fan-in gates, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 21 (2013) 934-943.
- [10] M. Asyaei and F. Moradi, A domino circuit technique for noise-immune high fan-in gates, Journal of Circuits, Systems, and Computers, 27 (2018) 1850151:1-23.
- [11] R. Kannan, R. Rangarajan, Low power noise immune node voltage comparison keeper design for high speed architectures, Microprocessors and Microsystems, 77 (2020) 103192.
- [12] A. A. Angeline, VSK. Bhaaskaran, Speed enhancement techniques for clock-delayed dual keeper domino logic style, International Journal of Electronics, 107 (2020) 1239-1253.
- [13] A. Kumar, N. Garg, R.K. Nagaria, A Low Power Noise Tolerant Wide Fan-In OR logic Domino Gate, Integration, the VLSI Journal, 104 (2025) 102468.
- [14] M. Asyaei and E. Ebrahimi, Low power dynamic circuit for power efficient bit lines, AEU-International Journal of Electronics and Communications, 83 (2018) 204-212.
- [15] S. Fisher, A. Teman, D. Vaysman, A. Gertsman, O. Yadid-Pecht, and A. Fish, Digital subthreshold logic design—motivation and challenges, IEEE 25th Convention of Electrical and Electronics Engineers, (2008) 702-706.
- [16] M. Alioto, G. Palumbo, M. Pennisi, Understanding the effect of process variations on the delay of static and domino logic, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 18 (2010) 697-710.
- [17] J. Wang, W. Wu, N. Gong, and L. Hou, Domino gate with modified voltage keeper, Paper presented at the ISQED, (2010) 443-446.
- [18] L. Wang and N. R. Shanbhag, An energy-efficient noise-tolerant dynamic circuit technique, IEEE Transactions on Circuits and Systems, 47 (2000) 1300-1306.
- [19] M. Asyaei, A New Circuit Scheme for Wide Dynamic Circuits, Inter. Journal of Engineering Transactions B: Applications, 31 (2018) 699-704.
- [20] Predictive Technology Model (PTM). 16 nm High-performance V2.1 technology of PTM model. (2022), http://ptm.asu.edu/modelcard/HP/45nm\_HP.pm.