# A Novel Non-cluster Based Architecture of Hybrid Electro-optical Network-on-Chip

Gaizhen Yan, Ning Wu, and Zhicheng Zhou

Abstract-With the growing scale of Multi-Processor Systems-on-Chip, large interconnection power and latency are becoming the bottleneck of system performance. Hybrid electro-optical Network-on-Chip (NoC), which delivers global and local traffic on optical and electronic paths respectively, is envisioned as a promising solution. Most existing work is cluster based: it breaks up node accessibility over the electronic links and may therefore result in poor locality support, lack of fault tolerance, or limited scalability. In this paper, we propose a novel non-cluster based hybrid electro-optical network architecture, in which the topology of the electronic NoC is kept unchanged and an auxiliary optical NoC is built to speed up global communication. Performance and power efficiency have been evaluated on 64-node and 100-node networks. Experiments show that, compared to the electronic network, throughput is improved by 39% and 57%, while latency is reduced by up to 63% and 51%, respectively. When the network is saturated, the per-bit energy consumption is reduced by about 29% and 21%.

#### Index Terms-hybrid, Electro-optical NoC, Non-cluster

## I. INTRODUCTION

With growing on-chip transistor budgets and diminishing returns from instruction-level parallelism, the semiconductor industry has been pushed towards multiprocessor systems-on-chip (MPSoCs) for task-level parallelism [1]. Related commercial products have evolved from the Intel 8-core superscalar CMP [2] to the Rapport 256-element reconfigurable processor [3]. The increasing core counts in commercial products and the newly developed kilo-core research prototype [4] are optimistically embracing the "new" Moore's Law: the number of on-chip cores will double roughly every 18 months [5]. Providing sufficient interconnection bandwidth is becoming a great challenge.

Due to their better scalability and concurrent communication ability, Networks-on-Chip (NoCs) are envisioned as an appealing interconnection solution for large-scale MPSoCs. However, as the network topology scales up, an Electronic NoC (ENoC) may incur large power consumption and latency for global communication because of its large network diameter [6]. The newly developed three-dimensional NoC helps to shorten global interconnects, but vertically stacking more active devices introduces severe thermal issues [7]. Thermal management schemes, such as node throttling and routing-based traffic migration, would in turn shrink the communication bandwidth.

N. WU is with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China (email: wunee@nuaa.edu.cn).

Z. ZHOU is with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China (email: 1162706805@qq.com).

Most recently, advances in silicon nano-photonic technology have provided another interconnect candidate for large-scale MPSoCs: the Optical NoC (ONoC), which is faster and less power-consuming than its electrical counterpart. Moreover, with dense wavelength division multiplexing, optical links can provide high bandwidth density and thus high throughput [8]. However, because optical-electronic conversion is necessary whenever signals move between the electronic process elements and the optical links, the power and latency savings in the optical links may not compensate for the additional cost of the optical-electronic interfaces for local communication. An ONoC also consumes a large amount of static power due to the laser and micro-ring tuning, so the energy efficiency of a full ONoC is very poor when the traffic load is low.

In summary, electronic interconnect suits local communication while optical interconnect benefits global communication; electronic links consume large dynamic power, whereas optical links suffer from large static power. Neither ENoC nor ONoC alone is therefore the ideal interconnect candidate. A hybrid optical-electrical NoC (HOE-NoC) lets the strengths of the two kinds of links complement each other, and has become one of the most active topics in NoC research.

Ye et al. [9] proposed THOE, a torus-based hierarchical optical-electronic NoC architecture in which the optical link is based on circuit switching. Although a quick-acknowledge and simultaneous tear-down protocol is designed to reduce the latency caused by path setup and teardown, the high performance of the optical link is still limited to large packets. Vantrease et al. [10] designed Corona for a 256-core interconnect, in which the optical link uses a large-radix crossbar and thus scales poorly. Tan et al. [11] proposed a fat-tree based hybrid NoC in which the root node is replaced by a generic wavelength optical router to provide high bandwidth. Joshi et al. [12] explored using photonics to implement low-diameter non-blocking crossbar and Clos networks. Morris et al. [13] presented an architecture that combines nano-photonic interconnects and 3D technology to design a reconfigurable NoC. Kao et al. [14] proposed the Sustained and Informed Dual Round-Robin Matching scheduling algorithm and the Distributed and Informed Path Allocation scheme to handle output contention and routing in a Clos network.

Manuscript received June 6, 2017, revised June 19, 2017. This work was supported in part by the National Natural Science Foundation of China (61376025), the Anhui Scientific Research Funds for University (KJ2017A501), the Jiangsu Innovation Program for Graduate Education (KYLX15\_0283), and the Natural Science Foundation of Jiangsu Province (BK20160806).

G. YAN is with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China, and with the School of Electrical and Electronic Engineering, Anhui Science and Technology University, Chuzhou, 233100, China (email: xucs\_yan@126.com).

The existing work has made great contributions to hybrid networks. From the viewpoint of how the links are hybridized, we observe that most of it is cluster based: processor elements within the same cluster are connected by electronic routers, and clusters are interconnected by optical links. Such a topology has several potential problems. First, it does not support local communication efficiently, because neighboring nodes in different clusters still communicate through optical links. Second, it cannot support fault tolerance. Optical links are the only means of inter-cluster communication, so if one optical node fails because of thermal variation, the processor elements within that cluster have no way to reach other clusters. Third, the cluster based topology does not scale well. Neither the number of nodes within a cluster nor the number of clusters can be increased indefinitely: a large-radix crossbar for intra-cluster node connection results in large area and power cost, while more optical nodes provide higher bandwidth only at the cost of higher static power. The number of clusters is therefore limited.

In our view, the main reason for the above problems is that an electronic link between any two nodes is not guaranteed. Our previous work [15] therefore proposed a new hybrid electro-optical NoC architecture in which the topology of the electronic NoC is kept unchanged and an auxiliary optical NoC is built to speed up global communication. However, that optical NoC is based on circuit switching and only serves large global packets. This paper improves on our previous work [15]. We keep the architecture unchanged but make the optical link crossbar based. To deliver global traffic on the optical links and local traffic on the electronic links, we further present a routing algorithm that determines the routing path for each packet.

The rest of this paper is organized as follows. Section II describes the proposed electro-optical NoC architecture. Section III optimizes the distribution of the optical interfaces. Section IV presents our strategy to distinguish global traffic from local traffic and deliver them on the optical or electronic path, respectively. Section V presents the experimental results and evaluates the performance of the proposed architecture. Section VI concludes the paper and outlines future work.

## II. PROPOSED ARCHITECTURE

The proposed optical-electronic architecture is shown in Fig. 1. It is composed of three layers: the Optical Die, the Electro-Optical Transceiver Die and the Electronic Die, as shown in Fig. 1(a). The Optical Die includes micro-rings and waveguides for optical signal modulation, propagation and detection. The Electro-Optical Transceiver Die includes the Electro-Optical Converters, the Opto-Electronic Converters and the electronic contacts for heating. Process Elements and Electronic Routers are all placed in the Electronic Die, which is closest to the heat sink for better thermal dissipation.

Fig. 1. Architecture of the proposed hybrid electro-optical NoC (a) The architecture (b) The network topology in Optical Die (c) The vertical link between electronic die and electro-optical transceiver die (d) The network topology in electronic die

As shown in Fig. 1(d), the Process Elements in the Electronic Die are interconnected through an electrical network in mesh topology, rather than being grouped into clusters as in previous work. Any two nodes can therefore reach each other through the electronic network, which provides better locality support and fault tolerance. Some of the routers, i.e., the ones with dark fill, are equipped with vertical ports that deliver global packets to the optical network via through-silicon vias (TSVs), as shown in Fig. 1(c). To benefit small global packets, reduce waveguide crossings and support optical broadcasting, the optical nodes are interconnected by an optical crossbar implemented with Multiple Write Single Read (MWSR) rings, as shown in Fig. 1(b).

Despite the many benefits of the proposed architecture, two problems remain to be solved.

(1) The more optical interfaces the optical network provides, the larger the available bandwidth, but the static power consumed by the laser and by micro-ring tuning also increases. Therefore, if the number of optical nodes in the optical die is limited, how should they be distributed?

(2) Both an electronic path and an optical path are available between any two nodes. For a given packet, how do we distinguish global traffic from local traffic and deliver it on the optical or electronic path, respectively?

Our solutions to these two problems are presented in Section III and Section IV, respectively.

## III. TOPOLOGY OPTIMIZATION

## A. Problem Description

In the proposed hybrid architecture, multiple nodes share the same optical interface for entering and leaving the optical link, which may cause global and local traffic to contend for link bandwidth in the electronic NoC. The placement of optical interfaces in the electronic NoC plays an important role in reducing contention between the two kinds of traffic. Fig. 2 shows two different placements of the optical interfaces in a $9 \times 9$ mesh topology. Under the uneven distribution shown in Fig. 2(a), a node may need $1 \sim 8$ hops to reach an optical interface. This increases not only the traffic load of the electronic routers near the optical interfaces, but also the cost of using the optical link for nodes far from an optical interface. By contrast, under the relatively even distribution shown in Fig. 2(b), each node can reach an optical interface within $1 \sim 2$ hops, and the problems of Fig. 2(a) are greatly alleviated.

Fig. 2 Different placement of optical interface (a) Extremely uneven placement (b) Relatively even placement

Based on the above observation, we are motivated to find an even optical interface placement over a given topology. The optimal optical interface placement should meet the following requirements.

(1) Every node can reach at least one optical interface within a given distance.

(2) The number of optical interfaces should be as small as possible.

## B. Problem Formulation

For a given $m \times n$ mesh based topology $T_{m \times n}$, we define the set of nodes that have an optical interface as $X$, the type of node $n_k$ as $x_k$ and the Manhattan distance between nodes $n_i$ and $n_j$ as $d_{ij}$. If node $n_k$ has an optical interface, $x_k$ is one; otherwise $x_k$ is zero, as shown in equation (1).

$$x_k = \begin{cases} 1, & n_k \in X \\ 0, & n_k \notin X \end{cases} \tag{1}$$

Given the maximum distance $d_{max}$, the optical interface searching region $R_k$ of any node $n_k$ can be expressed as equation (2).

$$R_k = \{n_m \mid d_{mk} \le d_{\max}\} \tag{2}$$
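The searching region of equation (2) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: node IDs are taken to be 0-based and row-major, with `m` columns and `n` rows (so that `k // m` is the row and `k % m` the column), and the function name `search_region` is hypothetical.

```python
def search_region(k, m, n, d_max):
    """Return R_k: IDs of all nodes within Manhattan distance d_max of node k.

    Assumes 0-based, row-major node IDs on a mesh with m columns and n rows.
    """
    row_k, col_k = divmod(k, m)
    region = set()
    # Only rows within d_max of node k can contain reachable nodes.
    for row in range(max(0, row_k - d_max), min(n - 1, row_k + d_max) + 1):
        # Hops left over for horizontal movement after the vertical offset.
        budget = d_max - abs(row - row_k)
        for col in range(max(0, col_k - budget), min(m - 1, col_k + budget) + 1):
            region.add(row * m + col)
    return region
```

For a node in the middle of a $9 \times 9$ mesh and $d_{max} = 2$ the region contains 13 nodes, while for a corner node the mesh boundary cuts it down to 6.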

For better understanding, Fig. 3 shows the nodes reachable from node $n_k$ when the maximum distance $d_{max}$ is 2. We further divide the searching region into four quadrants. The condition for finding a node with an optical interface in the first quadrant can then be expressed as equation (3).

$$S_{1} = \sum_{i=0}^{p_{1}} \sum_{j=0}^{q_{1}} x_{k - i \cdot m + j} \ge 1 \tag{3}$$

where $p_{1} = \min(k/m, d_{\max})$, $q_{1} = \min(d_{\max} - i, m - k\%m)$, $/$ denotes integer division and $\%$ the remainder.

The conditions for finding a node with an optical interface in the second, third and fourth quadrants can be expressed as equations (4)~(6), respectively.

$$S_{2} = \sum_{i=0}^{p_{2}} \sum_{j=0}^{q_{2}} x_{k - i \cdot m - j} \ge 1 \tag{4}$$

where $p_{2} = \min(k/m, d_{\max})$, $q_{2} = \min(d_{\max} - i, k\%m)$

Fig. 3 Searching region of the node $n_k$

TABLE 1 THE OPTIMIZED RESULTS FOR OPTICAL INTERFACE DISTRIBUTION

| Scale | Count | Location                                                                                                                                                                  |
|-------|-------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 6×6   | 10    | 2 4 7 12 15 19 23 28 32 36                                                                                                                                                |
| 7×7   | 12    | 3 5 8 14 18 23 27 32 36 42 45 47                                                                                                                                          |
| 8×8   | 16    | 4 7 9 10 15 21 27 32 33 38 44 50 55 56 58 61                                                                                                                              |
| 9×9   | 20    | 4 7 10 11 18 23 24 30 35 37 42 49 54 56 61 64 68<br>72 75 79                                                                                                              |
| 10×10 | 24    | 2 6 8 14 20 21 27 33 35 39 41 47 54 60 62 66 68<br>74 80 81 87 93 95 99                                                                                                   |
| 11×11 | 29    | 2 6 9 15 20 22 23 29 36 38 43 47 52 56 61 66 70<br>75 79 84 86 92 93 99 100 107 113 116 120                                                                               |
| 12×12 | 35    | 2 5 10 14 19 20 24 28 33 37 42 47 51 56 63 65 70<br>72 73 79 81 88 95 98 102 104 112 118 120 121<br>127 135 137 141 143                                                   |
| 13×13 | 40    | 3 7 11 14 18 22 26 29 34 36 40 45 51 56 61 67 72<br>76 78 83 87 92 94 98 103 109 114 119 125 130<br>134 136 141 144 148 152 156 159 163 167                               |
| 14×14 | 47    | 3 5 9 12 15 21 28 30 32 37 38 39 48 55 57 59 64<br>68 75 80 84 86 91 95 102 107 111 113 118 123<br>129 134 140 143 145 150 152 155 161 168 172<br>177 179 184 188 191 195 |

$$S_{3} = \sum_{i=0}^{p_{3}} \sum_{j=0}^{q_{3}} x_{k + i \cdot m - j} \ge 1 \tag{5}$$

where $p_{3} = \min(n - k/m, d_{\max})$, $q_{3} = \min(d_{\max} - i, k\%m)$

$$S_{4} = \sum_{i=0}^{p_{4}} \sum_{j=0}^{q_{4}} x_{k + i \cdot m + j} \ge 1 \tag{6}$$

where $p_4 = \min(n - k/m, d_{\max})$, $q_4 = \min(d_{\max} - i, m - k\%m)$

Based on equations (3)~(6), the requirement that every node can reach at least one optical interface within a given distance $d_{max}$ can be expressed as equation (7).

$$S_1 + S_2 + S_3 + S_4 \ge 1 \tag{7}$$

Further, the requirement that the number of optical interfaces be minimized can be expressed as equation (8).

$$f = \min \sum x_i \tag{8}$$

Finding the best optical interface placement in a given topology is an NP-hard problem. We take equation (7) as the constraint and equation (8) as the objective function, and use Integer Linear Programming (ILP) to solve this optimization problem.
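To make the constraint and the objective concrete, the following Python sketch solves the same placement problem by exhaustive search on very small meshes. It is not the ILP formulation used in this paper: brute force is swapped in only because it needs no solver, and it is impractical beyond a handful of nodes. The function names and the 0-based, row-major node numbering are assumptions made for illustration.

```python
from itertools import combinations


def region(k, m, n, d_max):
    """Nodes within Manhattan distance d_max of node k (0-based, row-major IDs)."""
    r, c = divmod(k, m)
    return frozenset(
        rr * m + cc
        for rr in range(n)
        for cc in range(m)
        if abs(rr - r) + abs(cc - c) <= d_max
    )


def minimum_placement(m, n, d_max=1):
    """Smallest set X such that every node has an interface within d_max hops."""
    total = m * n
    regions = [region(k, m, n, d_max) for k in range(total)]
    everything = frozenset(range(total))
    # Objective (8): try the smallest interface counts first.
    for count in range(1, total + 1):
        for candidate in combinations(range(total), count):
            # Distance is symmetric, so the union of the candidates' regions
            # is exactly the set of nodes that can reach some interface.
            reached = frozenset().union(*(regions[k] for k in candidate))
            if reached == everything:  # constraint (7) holds for every node
                return candidate
    return tuple(range(total))
```

On a $4 \times 4$ mesh with $d_{max} = 1$ the search returns four interfaces, which is also a lower bound because one interface serves at most five nodes.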

## C. Results of ILP Optimization

We set $d_{max}$ to 1, that is, every node can find at least one optical interface within one hop. Lingo (Linear Interactive and General Optimizer), an ILP solver, is used to find the best optical interface distribution for different topology scales. The resulting counts and locations of the optical interfaces are listed in Table 1. Node IDs are numbered row by row starting from 1, as in Fig. 4.

Fig. 4 Optimal optical interface placement under 8×8 topology

Fig. 5 Demonstration of the optical and electronic path

An optimization example for the $8 \times 8$ mesh topology is shown in Fig. 4. A total of 16 optical interfaces are evenly distributed within the topology, and every node can find one within one hop. Global traffic is first sent to the nearest node with an optical interface and then delivered over the optical link.
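The one-hop property of a placement from Table 1 can be checked mechanically. The Python sketch below does this for the $6 \times 6$ and $8 \times 8$ rows, assuming the locations are 1-based node IDs numbered row by row as in Fig. 4; the helper names are hypothetical.

```python
# Interface locations copied from Table 1 (1-based, row-major node IDs).
PLACEMENT_6X6 = [2, 4, 7, 12, 15, 19, 23, 28, 32, 36]
PLACEMENT_8X8 = [4, 7, 9, 10, 15, 21, 27, 32, 33, 38, 44, 50, 55, 56, 58, 61]


def distance_to_nearest(node, interfaces, m):
    """Manhattan distance from a node to its nearest optical interface."""
    r, c = divmod(node - 1, m)
    return min(
        abs(r - (g - 1) // m) + abs(c - (g - 1) % m) for g in interfaces
    )


def worst_case_distance(interfaces, m, n):
    """Largest distance any node has to travel to reach an interface."""
    return max(
        distance_to_nearest(node, interfaces, m) for node in range(1, m * n + 1)
    )
```

Both placements give a worst-case distance of 1, which matches the $d_{max} = 1$ setting.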

## IV. ROUTING ALGORITHM

The proposed architecture provides optical and electronic paths for global and local traffic, respectively. Therefore, the packet type must first be determined based on the power and latency cost of the different paths, and a routing algorithm is then needed to deliver the different packets on the different paths.

## A. Power and Latency Model

An example of the optical and electronic paths for a specific packet is shown in Fig. 5. Suppose the packet originates from node $n_s$ and is destined for node $n_d$. The number of router hops the packet traverses in the electronic network can be expressed as equation (9), where $/$ denotes integer division and $\%$ the remainder.

$$H = |n_d / m - n_s / m| + |n_d \% m - n_s \% m| + 1 \tag{9}$$

The power consumption for the packet delivered on the electronic path is then

$$P_e = P_R \cdot H + P_L(H-1) \tag{10}$$

where $P_R$ and $P_L$ are the power consumed by a router and by a link between routers, respectively, when the packet is forwarded by one hop.

When the packet is delivered on the optical path, the power consumption consists of the part consumed in the optical link and the part consumed in the electronic links, and can be expressed as equation (11).

$$P_{O} = 2P_{R} + (P_{R} + P_{L}) \cdot (d_{s} + d_{d}) + 2P_{OI} \tag{11}$$

where $x_s$ and $x_d$ are the types of the source and destination nodes, $d_s$ is the distance between the source node and its nearest optical interface, $d_d$ is the distance between the destination node and its nearest optical interface, and $P_{OI}$ is the power consumed when the traffic traverses an optical interface.

```
Algorithm: Path Differentiated Routing Algorithm
Input:
     Source (x_s, y_s);
     Destination (x_d, y_d);
     Current (x_c, y_c);
     Source Optical Interface G_s (x_gs, y_gs) = null;
     Destination Optical Interface G_d (x_gd, y_gd) = null;
     Optimized Topology T;
Output:
     dir;

 1   e_x = x_d - x_c; e_y = y_d - y_c;
 2   IF (x_c == x_s) && (y_c == y_s)
 3       H = |x_d - x_s| + |y_d - y_s| + 1;
 4       determine packet_type based on equation (14);
 5       IF (packet_type == Global Traffic)
 6           calculate G_s and G_d based on T;
 7   END IF
 8   IF (e_x == 0) && (e_y == 0)
 9       dir = LOCAL;
10   ELSE IF (G_s != null)
11       e_x = x_gs - x_c; e_y = y_gs - y_c;
12       IF (e_x == 0) && (e_y == 0)
13           dir = UP;
14           G_s = null;
15       ELSE
16           dir = routing_xy();
17       END IF
18   ELSE
19       dir = routing_xy();
20   END IF

routing_xy():
0_1  IF (e_x != 0)
0_2      IF (e_x > 0)
             dir = EAST;
0_3      ELSE
             dir = WEST;
0_4  ELSE IF (e_y != 0)
0_5      IF (e_y > 0)
             dir = SOUTH;
0_6      ELSE
             dir = NORTH;
0_7  END IF
```

Fig. 6 The proposed path differentiated routing algorithm

Similarly, we can derive the zero-load latency when the packet is delivered on the electronic and optical paths, as shown in equations (12) and (13).

$$D_e = D_R \cdot H + L \tag{12}$$

$$D_{O} = 2D_{R} + D_{R} \cdot (d_{s} + d_{d}) + 2D_{OI} + D_{P} + L \tag{13}$$

where $D_R$ and $D_P$ are the latencies for a flit to traverse one electronic hop and the optical link, respectively, $D_{OI}$ is the latency in the optical interface and *L* is the packet length in flits.

If both the zero-load latency and the power consumption of a packet are smaller on the optical path than on the electronic path, that is,

$$D_O < D_e \quad \text{and} \quad P_O < P_e \tag{14}$$

the packet is classified as global traffic and delivered on the optical path; otherwise, it is delivered on the electronic path. Based on this idea, we propose the path differentiated routing algorithm shown in Fig. 6.
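The classification rule built from equations (9) to (14) can be sketched in Python as follows. The cost values in `CostModel` are hypothetical placeholders chosen only to make the example runnable; they are not the values extracted in this paper. Node IDs are assumed to be 0-based and row-major with `m` columns, and all names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    # Hypothetical placeholder values in arbitrary units, not measured data.
    p_router: float = 1.0   # P_R: router power per hop
    p_link: float = 0.5     # P_L: link power per hop
    p_oi: float = 3.0       # P_OI: power per optical interface traversal
    d_router: int = 4       # D_R: cycles per electronic hop
    d_oi: int = 2           # D_OI: cycles per optical interface traversal
    d_optical: int = 1      # D_P: cycles on the optical link


def hops(n_s, n_d, m):
    """Equation (9): router hops on the electronic path."""
    return abs(n_d // m - n_s // m) + abs(n_d % m - n_s % m) + 1


def electronic_cost(h, length, c):
    """Equations (10) and (12): power and zero-load latency, electronic path."""
    return c.p_router * h + c.p_link * (h - 1), c.d_router * h + length


def optical_cost(d_s, d_d, length, c):
    """Equations (11) and (13): power and zero-load latency, optical path."""
    power = 2 * c.p_router + (c.p_router + c.p_link) * (d_s + d_d) + 2 * c.p_oi
    delay = (2 * c.d_router + c.d_router * (d_s + d_d)
             + 2 * c.d_oi + c.d_optical + length)
    return power, delay


def is_global(n_s, n_d, d_s, d_d, m, length, c=CostModel()):
    """Equation (14): global only if the optical path wins on both metrics."""
    p_e, d_e = electronic_cost(hops(n_s, n_d, m), length, c)
    p_o, d_o = optical_cost(d_s, d_d, length, c)
    return d_o < d_e and p_o < p_e
```

With these placeholder costs, a 4-flit packet crossing an $8 \times 8$ mesh from corner to corner is classified as global, while a packet sent to a direct neighbor stays on the electronic path.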

## B. The Proposed Routing Algorithm

The proposed path differentiated routing algorithm consists of two steps:

**Step 1**: At the source node, determine the *packet type*, the source optical interface and the destination optical interface, as described by lines 2-7 in Fig. 6. Determining the *packet type* requires the hop count *H* between the source and destination nodes, the distance $d_s$ between the source node and the source optical interface, and the distance $d_d$ between the destination node and the destination optical interface. *H* is easily obtained from equation (9). $G_s$ and $d_s$ do not change with the traffic and can be cheaply stored locally, whereas $G_d$ and $d_d$ depend on the destination and must be determined dynamically. We adopt the scheme shown in Fig. 7 for this purpose.

Fig. 7 Information maintenance and determination at the source node

TABLE 2 PARAMETER SETTING FOR SIMULATION

| Parameter                | Value                      |
|--------------------------|----------------------------|
| Topology Size            | 64 nodes, 100 nodes        |
| Router Architecture      | Four Stages (RC-VA-SA-ST)  |
| VCs Per Port             | 6 for E_Mesh, 3 for HEO_NC |
| Buffer Depth             | 5 flits                    |
| Flit Width               | 128 bits                   |
| Packet Size              | 4 flits                    |
| Optical Interface Buffer | 32 flits                   |
| Clock Frequency          | 2.5 GHz                    |
| Optical Baudrate         | 10 Gbps                    |
| Number of Wavelengths    | 32                         |
| Optical Parallel Level   | 1                          |
| Traffic                  | Uniform random             |
| Simulation Time          | 10000 cycles               |

Because multiple nodes share one optical gateway in the proposed architecture, the information storage can be condensed. The locations of the optical interfaces are stored in a Gateway Table. A packet first maps its destination node ID to an address in the table and then reads out the gateway ID of the destination node. In this way, the number of table entries can be greatly reduced.
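One possible software model of this two-level lookup is sketched below in Python for the $8 \times 8$ placement of Table 1. The exact hardware mapping is the one shown in Fig. 7; here it is assumed, for illustration only, that each node is assigned to its nearest interface, that ties are broken by the lower ID, and that IDs are 1-based and row-major. The names `build_tables` and `lookup_gateway` are hypothetical.

```python
# Interface locations for the 8x8 mesh, copied from Table 1 (1-based IDs).
PLACEMENT_8X8 = [4, 7, 9, 10, 15, 21, 27, 32, 33, 38, 44, 50, 55, 56, 58, 61]


def build_tables(interfaces, m, n):
    """Build a short Gateway Table plus a per-node index into it."""
    gateway_table = sorted(interfaces)            # one entry per interface
    entry_of = {g: i for i, g in enumerate(gateway_table)}
    node_to_entry = {}
    for node in range(1, m * n + 1):
        r, c = divmod(node - 1, m)
        # Nearest interface by Manhattan distance; lower ID wins ties (assumed).
        nearest = min(
            gateway_table,
            key=lambda g: (abs(r - (g - 1) // m) + abs(c - (g - 1) % m), g),
        )
        node_to_entry[node] = entry_of[nearest]
    return gateway_table, node_to_entry


def lookup_gateway(dest, gateway_table, node_to_entry):
    """Map a destination node ID to a table address, then read the gateway ID."""
    return gateway_table[node_to_entry[dest]]
```

In this model the Gateway Table holds 16 entries instead of 64, and each node only stores a 4-bit index into it.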

Based on the information available at the source node, the packet type can be determined. If the packet is global traffic, the source optical interface and the destination optical interface are carried in the head flit of the packet.

**Step 2**: Deliver the packet along the proper path, as shown in lines 8-20 in Fig. 6. At each router, the Source Optical Interface field in the head flit is checked first. If it is not set, the packet is delivered with dimension-order XY routing; otherwise, the packet is first routed to the Source Optical Interface with XY routing. Once the packet reaches the Source Optical Interface, the corresponding ID in the head flit is cleared and the packet is delivered to the Destination Optical Interface through the optical path.
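Step 2 can be sketched in Python as a per-router decision function that mirrors lines 8-20 of Fig. 6, together with a small helper that follows a packet hop by hop. The sketch assumes that $x$ grows towards EAST and $y$ towards SOUTH, and that the optical crossbar delivers a packet from the source interface to the destination interface in a single step; `HeadFlit`, `route` and `trace` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Coord = Tuple[int, int]  # (x, y); x grows EAST, y grows SOUTH (assumed)
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "SOUTH": (0, 1), "NORTH": (0, -1)}


@dataclass
class HeadFlit:
    dest: Coord
    source_gateway: Optional[Coord] = None  # G_s, cleared once it is reached
    dest_gateway: Optional[Coord] = None    # G_d


def routing_xy(e_x, e_y):
    """Dimension-order routing: resolve the X offset first, then Y."""
    if e_x != 0:
        return "EAST" if e_x > 0 else "WEST"
    return "SOUTH" if e_y > 0 else "NORTH"


def route(current, flit):
    """Output direction at the current router (lines 8-20 of Fig. 6)."""
    e_x, e_y = flit.dest[0] - current[0], flit.dest[1] - current[1]
    if e_x == 0 and e_y == 0:
        return "LOCAL"
    if flit.source_gateway is not None:
        e_x = flit.source_gateway[0] - current[0]
        e_y = flit.source_gateway[1] - current[1]
        if e_x == 0 and e_y == 0:
            flit.source_gateway = None      # clear G_s in the head flit
            return "UP"
    return routing_xy(e_x, e_y)


def trace(source, flit):
    """Follow a packet from its source and list the directions taken."""
    current, path = source, []
    while True:
        direction = route(current, flit)
        path.append(direction)
        if direction == "LOCAL":
            return path
        if direction == "UP":
            current = flit.dest_gateway     # assumed single optical transfer
        else:
            dx, dy = STEP[direction]
            current = (current[0] + dx, current[1] + dy)
```

A local packet follows plain XY routing, whereas a global packet first travels to its source interface, goes `UP` into the optical layer, and finishes with XY routing from the destination interface.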



Fig. 8 Throughput comparison under (a) 64 nodes network (b) 100 nodes network



Fig. 9 Latency comparison under (a) 64 nodes network (b) 100 nodes network

## V. EXPERIMENTS AND EVALUATION

We implement three networks, a mesh based electronic network (E\_Mesh), a ring based fully optical network (O\_Corona) and our non-cluster hybrid electro-optical network (HEO\_NC), in the cycle-accurate network simulation environment JADE [16]. The throughput, average latency and power efficiency of each network have been measured and compared under the parameter settings shown in Table 2.

## A. Throughput

The throughput of the three networks under the 64-node and 100-node topologies is shown in Fig. 8(a) and Fig. 8(b), respectively. E\_Mesh saturates at packet injection rates of 0.32 and 0.24 under the 64-node and 100-node topologies. The proposed HEO\_NC extends these points to 0.5 and 0.46. Furthermore, the saturation throughput is improved by 39% and 57% compared to E\_Mesh under the same topology size. O\_Corona has the best performance among the three networks, because the optical Corona is in fact a high-radix global crossbar, which provides extremely high communication bandwidth.

Fig. 10 Energy efficiency comparison under (a) 64 nodes network (b) 100 nodes network

## B. Latency

The latency comparison results under the 64-node and 100-node topologies are shown in Fig. 9(a) and Fig. 9(b). Compared to the E\_Mesh network, the latency of the proposed HEO\_NC is greatly reduced: reductions of up to 63% and 51% are observed. The packet injection rate at which the delay reaches 100 cycles is extended by about 31% and 58% under the 64-node and 100-node networks, respectively. The latency of O\_Corona remains the lowest and hardly increases with network load, because the optical Corona provides a direct link between any two nodes: abundant bandwidth is provided and no network congestion occurs.

## C. Energy Efficiency

We first extract the power models of the electrical and optical components with the Design Space Exploration of Networks Tool (DSENT) [17]. The main parameters used in the simulation are listed in Table 3.

Fig. 10 shows the per-bit energy comparison results. Compared to the E\_Mesh network, the proposed HEO\_NC with 64 and 100 nodes reduces the energy consumption by about 29% and 21%, respectively, when the network is saturated. Unlike in the 64-node network, the energy efficiency of the 100-node HEO\_NC is lower than that of the 100-node E\_Mesh network when the traffic load is small. This is because the 100-node network has more optical interfaces than the 64-node network, which results in larger static power from the laser and the ring thermal tuning. Bandwidth under-utilization is therefore severe when the traffic load is light.
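The load dependence described above follows from how static power is amortized over delivered traffic. The Python sketch below illustrates this with a generic first-order model, in which per-bit energy is static power divided by delivered bandwidth plus dynamic energy per bit. This model and every number in it are hypothetical illustrations, not the DSENT-derived model or results of this paper.

```python
def energy_per_bit(static_power_w, dynamic_j_per_bit, delivered_bits_per_s):
    """First-order per-bit energy: amortized static power plus dynamic energy."""
    return static_power_w / delivered_bits_per_s + dynamic_j_per_bit


# Hypothetical networks: one with low static power but costly bits, and one
# with high static power (for example laser and ring tuning) but cheap bits.
LOW_STATIC = {"static_power_w": 0.5, "dynamic_j_per_bit": 2e-12}
HIGH_STATIC = {"static_power_w": 2.0, "dynamic_j_per_bit": 1e-12}


def high_static_is_better(delivered_bits_per_s):
    """True when the high-static-power network spends less energy per bit."""
    return (energy_per_bit(delivered_bits_per_s=delivered_bits_per_s, **HIGH_STATIC)
            < energy_per_bit(delivered_bits_per_s=delivered_bits_per_s, **LOW_STATIC))
```

With these placeholder values the network with the larger static power is less efficient at light load and becomes more efficient only once enough traffic is delivered to amortize that static power.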

TABLE 3 PARAMETER SETTINGS FOR DSENT

| Parameter | Value                   |
|-----------|-------------------------|
|           | 1 mm                    |
|           | Bulk45LVT               |
|           | Standard                |
|           | ThermalWithBitReshuffle |
|           | 100 dB/meter            |
|           | 0.01 dB                 |
|           | 1 dB                    |

Although the optical Corona network O\_Corona shows excellent throughput and latency, its power efficiency is extremely poor. Its per-bit energy consumption is several times higher than that of E\_Mesh under the 64-node network. For the 100-node network, the power consumption even exceeds the plotted range. In other words, the power consumption of the fully optical NoC is unaffordable for a system-on-chip.

## VI. CONCLUSION

In this paper, we present a non-cluster based electro-optical hybrid NoC architecture, in which an electrical interconnection is kept between any two nodes and optical links are provided to accelerate the delivery of long-distance packets. Experiments show that the proposed hybrid NoC architecture improves system throughput, latency and power efficiency. However, as the network scales up, the optical links may incur large static power consumption, which may result in low power efficiency when the network load is light. In future work, we will therefore study ways to reconfigure the hybrid network and improve the power efficiency of the whole system.

#### REFERENCES

- [1] J. Ouyang, Y. Xie, et al., "LOFT: A high performance network-on-chip providing quality-of-service support", 45th Annual IEEE/ACM International Symposium on Microarchitecture, Atlanta, USA, pp. 409-420, Dec. 2012.
- [2] S. Rusu, S. Tam, H. Muljono, et al., "A 45nm 8-Core Enterprise Xeon Processor". International Solid-State Circuits Conference, pp.98–99, San Francisco, CA, USA, Feb. 2009.
- [3] KC256. http://en.wikipedia.org/wiki/Kilocore
- [4] B. Bohnenstiehl, A. Stillmaker; J. J. Pimentel, et al., "KiloCore: A 32-nm 1000-processor computational array", IEEE Journal of Solid-State Circuits, 2017, DOI: 10.1109/JSSC.2016.2638459.
- [5] S. B. Furber, D. R. Lester, L. A. Plana, et al., "Overview of the SpiNNaker system architecture", IEEE Transactions on Computers, vol. 62, no. 12, pp. 2454-2467, Dec. 2013.
- [6] P. K. Hamedani, N. E. Jerger, S.Hessabi, "QuT: A Low-power optical Network-on-Chip", The 8th IEEE/ACM International Symposium on Networks-on-Chip (NoCS), Ferrara, Italy, pp. 80-87, Sept. 2014.
- [7] M. Li, N. Wu, G. Yan, and Lei Zhou, "Temperature and Traffic Information Sharing Network in 3D NoC," Lecture Notes in Engineering and Computer Science: Proceedings of The World Congress on Engineering and Computer Science, San Francisco, USA, pp.12-16, Oct. 2015.
- [8] Y. Xie, "Future Memory and Interconnect Technologies", Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, pp.964 -969, Mar. 2013.
- [9] Y. Ye, J. Xu, X. Wu, et al. "A torus-based hierarchical optical-electronic Network-on-Chip for multiprocessor System-on-Chip", ACM Journal on Emerging Technologies in Computing Systems, vol. 8, no. 1, pp. 1-26, 2012.

- [10] D. Vantrease, R. Schreiber, M. Monchiero, et al., "Corona: System implications of emerging nanophotonic technology", ACM SIGARCH Computer Architecture News, vol. 36, no. 3, pp. 153-164, 2008.
- [11] X. Tan, M. Yang, L. Zhang, et al., "A hybrid optoelectronic Networks-on-Chip architecture", Journal of Lightwave Technology, vol. 32, no. 5, pp. 991-998, 2014.
- [12] A. Joshi, C. Batten, K. Yong-Jin, et al., "Silicon-photonic clos networks for global on-chip communication", Proceeding of ACM/IEEE International Symposium on Network-on-Chip, San Diego, CA, pp. 124-133, May 2009.
- [13] R. W. Morris, A. K. Kodi, A. Louri, et al., "Three-dimensional stacked nanophotonic network-on-chip architecture with minimal reconfiguration", IEEE Transactions on Computers, vol. 63, no. 1, pp. 243-255, 2014.
- [14] Y.-H. Kao and H. J. Chao, "Design of a bufferless photonic Clos network-on-chip architecture", IEEE Transactions on computers, vol.63, no. 3, pp. 764-776, 2014.
- [15] Z. Zhou, N. Wu, and G. Yan, "Topology Optimization of 3D Hybrid Optical-Electronic Networks-on-Chip", Lecture Notes in Engineering and Computer Science: Proceedings of The World Congress on Engineering and Computer Science 2016, San Francisco, USA, pp.8-12, Oct. 2016
- [16] R. K. V. Maeda, P. Yang, X. Wu, et al., "JADE: a Heterogeneous Multiprocessor System Simulation Platform Using Recorded and Statistical Application Models", International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems, Prague, Czech Republic, Jan. 2016
- [17] C. Sun, C.-H. O. Chen, G. Kurian, et al. "DSENT-A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic Networks-on-Chip Modeling". The 6th ACM/IEEE International Symposium on Networks-on-Chip (NOCS), Lyngby, Denmark, pp.201-210, May, 2012.