# Collaborative Optimization of Testing and Mapping for Network on Chip

Zhang Ying, Chen Xin, and Ge Fen

Abstract—How to efficiently map the specific application to NoC infrastructure is always an important topic for complex SoC chips. At the same time, there are many challenges for testing of NoC. This paper proposes a novel collaborative optimization scheme of IP cores testing and NoC mapping. Associated with the pre-designed test structure, IP cores are partitioned into parallel testing groups to minimize the testing time. We adapt the Multi-Objective Genetic Algorithm to collaboratively optimize testing time and NoC mapping performance. Besides the process time is relatively decreased, the balance of testing and mapping overheads can be adjusted for various applications. The results of simulations and experiments on testing platform with ITC'02 benchmark circuits showed the effectiveness and flexibility of the collaborative testing and mapping optimization for NoC.

*Index Terms*—NoC testing, NoC mapping, Multi-objective Genetic Algorithm, testing schedule

### I. INTRODUCTION

**N**OC (Network on Chip) has becomes an important solution to complex SoC (System on Chip) architecture and mapping is one of the hot issues in NoC research [1]. NoC mapping is meant to assign a specific application on the NoC platform. The existing mapping schemes are mainly based on various optimization algorithms to minimize the traffic or energy consumption [2-3], which only consider functional performance. However, the difficulty of complex system chip testing increases dramatically [4], which brings the requirement of considering test optimization while mapping.

NoC-based SoCs often reuse NoC communication architecture as TAM (Test Access Machine) for embedded IP cores testing. Parallel testing can be achieved by multiple ATEs that simultaneously access the NoC. There have been some research [5,6] discussing on the multiple ATEs scheme. Authors of [5] proposed a DFT scheme with three accessible

Manuscript received March 20, 2018; revised April 9, 2018. This work was supported in part by the National Natural Science Foundation of China (No. 61701228 and 61106029), the Aviation Science Foundation of China (No. 20162852028 and 20152052025) and the Fundamental Research Funds for the Central Universities (No. NS2015045 and NS2016046).

Y. Zhang is with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu Province, 211106 CHINA (phone: 86-25-84892381; fax: 86-25-84892452; e-mail: tracy403@ nuaa.edu.cn).

X. Chen is with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu Province, 211106 CHINA (e-mail: xin\_chen@nuaa.edu.cn).

F. Ge is with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu Province, 211106 CHINA (e-mail: gefen@nuaa.edu.cn). ATEs and described the associated testing structure. The diagram of NoC partition is provided and the test partition is aimed to minimize the testing time. However, it is worth noting that the proposed partition is only applied to manufactured chips. M. Agrawal et al. [6] proposed the test data delivery optimization for NoC. It co-optimizes quantity and position of access points and the assignment of cores to access points. Testing time minimization is modeled as a NoC partitioning problem and solved with some optimization algorithms. And yet the scheme is also only concerned with manufactured chips. Our proposed solution is to take into account the needs of parallel testing during NoC mapping process.

We propose a collaborative optimization scheme for IP cores testing and NoC mapping which is aimed to simultaneously minimizing testing time of embedded IP cores and mapping cost. Cooperating with the optimized test structure for 2D Mesh NoC, IP cores are required to partition into four groups for parallel testing and a Multi-Objective Genetic Algorithm is applied to implement the IP cores partition and NoC mapping.

Hereafter, Section II analyzes embedded IP cores testing time and introduces testing optimization NoC mapping problem. Section III presents the collaborative optimization scheme for testing time and NoC mapping. Section IV explains performance evaluation results of the collaborative optimization. Section V draws the conclusions.

## II. IP CORES TESTING TIME ANALYSYS AND THE PROBLEM DESCRIPTION

NoCs can be defined as a set of structured routers and point-to-point channels interconnecting IP cores (Resources). For regularity of its structure, Mesh network are easy to implement and have good scalability. Our proposed testing optimized NoC mapping is based on 2D Mesh NoC.

The optimization objective of IP cores parallel testing is testing time (the number of test cycles), so the independent testing time of each core needs to be firstly determined. Our testing optimized NoC mapping is applied to the ITC'02 benchmark circuits [7].

Circuits in benchmark are made up with cores (modules). For each module, number of test patterns is  $p_m$ , number of input signals is  $i_m$ , number of output signals is  $o_m$ , number of bidirectional signals is  $b_m$ , number of scan chains is  $s_m$ , length of scan chain k is  $l_{m,k}$ , total number of scan FFs is  $f_m$  and  $f_m = \sum_{k=1}^{s_m} l_{m,k}$ . The number of test stimulus and response for each test vector is  $s_{im}$  and  $s_{om}$ , so (1) and (2) can be obtained.

Proceedings of the World Congress on Engineering 2018 Vol I WCE 2018, July 4-6, 2018, London, U.K.

$$s_{im} = i_m + b_m + f_m \tag{1}$$

$$s_{om} = o_m + b_m + f_m \tag{2}$$

The input data and output data of each test vector can be transmitted simultaneously in the scan test except the last one. Moreover, there is an extra cycle needed for function. Therefore, the independent testing time for an IP core  $T_c$  without considering the TAM width is as (3).

$$T_{c} = \{1 + \max(s_{im}, s_{om})\} \times p_{m} + \min(s_{im}, s_{om})$$
(3)

Assume the TAM width is w,  $T_c$  will be as (4).

$$T_{c} = \left\lceil \frac{\max(\mathbf{s}_{im}, \mathbf{s}_{om}) \times p_{m} + \min(\mathbf{s}_{im}, \mathbf{s}_{om})}{w} \right\rceil + p_{m}$$
(4)

For NoC-based SoCs, TAM is usually the NoC communication channel and the width w is typically 16 or 32 bits. Test data are transmitted in the form of packets, which are composed of flits in wormhole routing. One flit is transferred in one clock cycle, so  $T_c$  is valued by clock cycles.

We provided performance evaluation of several ATE access modes and proposed the best solution [8]. According to the test access mode, IP cores of NoC need to be pre-grouped and then mapped to a 2D Mesh architecture. Therefore, NoC IP cores testing optimization has converted into the IP cores partition problems. It can be summarized as follows.

Given the set of IP cores  $C = \{c_1, c_2, ..., c_N\}$ , assume the width of NoC transfer channel is w, then the individual testing time of each IP core  $T_c$  can be induced. Partition C into four groups  $P_i$  (i = 0, 1, 2, 3), the core number and testing time of  $P_i$  is separately  $Num_i$  and  $TP_i$ , so that the grouping optimization objective is as (5).

$$Minimize\left(Max(TP_i)\right) \ i = 0, 1, 2, 3 \tag{5}$$

And the following constraints should be satisfied at the same time.

$$\sum_{i=0}^{3} Num_i = N \tag{6}$$

$$Num_i \le \left\lceil \frac{N}{4} \right\rceil \tag{7}$$

Meanwhile, 2D Mesh NoC mapping process is shown in Fig. 1. The mapping performance can be evaluated by the objective function.

The description of 2D Mesh NoC mapping problem needs to provide three definitions as follows firstly.

Definition 1: Given directed acyclic weighted graph G(V,E) as application characteristics chart, each vertex  $v_i \in V$  shows the associated IP core, directed arc  $e_{i,j} \in E$  shows the communication relationship between  $v_i$  and  $v_j$ , coefficient  $\omega_{i,j}$  shows the traffic between  $v_i$  and  $v_j$ .



Fig.1 2D Mesh NoC mapping

Definition 2: Given directed graph P(R,P) as NoC structure feature graph, each vertex  $r_i \in R$  shows resource node, directed arc  $p_{i,j} \in P$  shows the router between  $v_i$  and  $v_j, E(p_{i,j})$  shows the power consumption of one bit transfer between  $v_i$  and  $v_j$ .

Definition 3: power consumption functions are defined as (8) and (9).

$$Energy = \sum \omega_{i,j} \times E(p_{i,j})$$
(8)

$$E(p_{i,j}) = n_{router} \times E_{Sbit} + (n_{router} - 1) \times E_{Lbit}$$
<sup>(9)</sup>

 $n_{router}$  is number of routers,  $E_{sbit}$  and  $E_{Lbit}$  are separately power consumption of routers and interconnections. Considering the 2D Mesh architecture and XY routing algorithm,  $E(p_{i,j})$  is actually determined by the hamming distance between  $v_i$  and  $v_j$ , so that it can be induced as (10).

$$Cost = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} \omega_{i,j} * distance_{i,j}$$
(10)

Among them, *Cost* is the mapping overhead,  $\omega_{i,j}$  is the traffic between  $v_i$  and  $v_j$ , *distance*<sub>*i*,*j*</sub> is the hamming distance between  $v_i$  and  $v_j$ , and the NoC is N×M 2D Mesh architecture.

NoC mapping problem: given G and P, search for mapping function *map()*, for minimizing *Cost* and satisfying the following constraints.

$$\forall v_i \in V \Longrightarrow map\left(v_i\right) \in R \tag{11}$$

$$\forall v_i \neq v_i \Leftrightarrow map(v_i) \neq map(v_i) \tag{12}$$

$$size(G) \le size(P)$$
 (13)

### III. COLLABORATIVE OPTIMIZATION OF TESTING AND MAPPING

Considering the structure of optimization problem, we propose a Multi-Objective Genetic Algorithm for collaborative optimization of testing and mapping. The cost of the collaborative optimization function is shown in (14).

$$C_{coop} = \alpha \times T_{max} \times \varepsilon_1 + \beta \times C_{sum} \times \varepsilon_2$$
(14)

The optimization goal is to minimize  $C_{coop}$ , while  $\alpha$  and  $\beta$  are used to coordinate the proportion of testing time and mapping overhead. In addition,  $\varepsilon_1$  and  $\varepsilon_2$  are applied to adjust the magnitude of two optimization objectives to keep the magnitude consistency.

The algorithm includes the following stages.

(1) Dividing IP cores of  $N \times M$  Mesh NoC into *Num* (*Num* = 4) groups according to the test structure.

(2) Encoding the chromosome.

The length of the chromosome in NoC mapping problem is equal to the number of the IP core type, namely *k*. Each genes in the chromosome contains a IP core position in the NoC architecture. In 2D Mesh NoC, we can encode the resource nodes from left to right and from top to bottom position, so that the scope of genetic value is  $[1,N \times M]$ .

(3) Initializing the population.

(4) Calculating the fitness  $C_{coop}$ .

Proceedings of the World Congress on Engineering 2018 Vol I WCE 2018, July 4-6, 2018, London, U.K.

| Input: CCG(C, A), NAG(R, P), {testing time}                                 |  |  |  |  |  |  |  |
|-----------------------------------------------------------------------------|--|--|--|--|--|--|--|
| Output: Noc IP cores mapping                                                |  |  |  |  |  |  |  |
| Initialize_population(); // Initialize the population function              |  |  |  |  |  |  |  |
| set the parameters; // Set probability values of selection crossover        |  |  |  |  |  |  |  |
| for i=1 to N_max // Iterations from 1 to the maximum number                 |  |  |  |  |  |  |  |
| {                                                                           |  |  |  |  |  |  |  |
| for each solution // For each chromosome in the population                  |  |  |  |  |  |  |  |
| {                                                                           |  |  |  |  |  |  |  |
| for each $a_{i,j}$ in CCG in current mapping                                |  |  |  |  |  |  |  |
| {                                                                           |  |  |  |  |  |  |  |
| compute the coordinates of the mapped nodes $map(c_i)$ and $map(c_j)$ ;     |  |  |  |  |  |  |  |
| generate placement; //Generate the new mapping                              |  |  |  |  |  |  |  |
| count IP cores allocated in different ports;                                |  |  |  |  |  |  |  |
| compute test time for each port;                                            |  |  |  |  |  |  |  |
| compute <i>E<sub>hops</sub></i> ; //Compute communication power consumption |  |  |  |  |  |  |  |
| calculate cost; //Compute cost function                                     |  |  |  |  |  |  |  |
| if(cost is minimum) save mapping as the best one}                           |  |  |  |  |  |  |  |
| select();                                                                   |  |  |  |  |  |  |  |
| crossover();                                                                |  |  |  |  |  |  |  |
| mutate();                                                                   |  |  |  |  |  |  |  |
| upgrade_ population();                                                      |  |  |  |  |  |  |  |
| i=i+1; }                                                                    |  |  |  |  |  |  |  |
|                                                                             |  |  |  |  |  |  |  |

Fig.2 Collaborative optimization algorithm

(5) Selection, Crossover, Mutation.

Roulette selection method is applied to select individuals in the populations, in which the probability of an individual being selected is inversely proportional to its fitness.

Crossover operation is to reconstruct parts of two parent individuals to generate new individuals. A random number is generated within the length of the chromosome as a scheduled crossing point. Then two individual chromosomes exchange the order on this crossover point.

Mutation operation is to change the gene value of the individual, in order to ensure that the algorithm has the ability of random search and maintain the local population diversity. Usually selects one or more genes by random in the individuals and alters them at the preset probability.

(6) Updating the population.

(7) Determining whether satisfies the termination criteria, If satisfied, provides the output, otherwise return to step (4).

The pseudo code of collaborative optimization algorithm is as shown in Fig. 2.

#### IV. PERFORMANCE EVALUATION OF MAPPING SCHEMES

In order to evaluate the proposed testing optimization mapping scheme, some experiments are applied to four circuits of ITC'02 benchmark and they are d695, g1023, p22810, p93791. The collaborative optimization algorithm



Fig.3 Normalized testing time comparison results

TABLE I COMPARISON RESULTS OF G1023 CIRCUIT

| W    | $\alpha, \beta$             | Tmax_rand | Tmax_optimal | e_rand | e_optimal | C <sub>coop</sub> _rand | C <sub>coop</sub> _optimal |
|------|-----------------------------|-----------|--------------|--------|-----------|-------------------------|----------------------------|
| W=16 | $\alpha = \beta = 0.5$      | 17159     | 16928        | 409014 | 284623    | 29030                   | 22695                      |
|      | $\alpha = 1, \beta = 0$     | 15450     | 14953        | 601177 | 671450    | 15450                   | 14953                      |
|      | $\alpha = 0.8, \beta = 0.2$ | 15190     | 17159        | 409014 | 332166    | 20907                   | 18795                      |
| W=32 | $\alpha = \beta = 0.5$      | 24187     | 15767        | 324695 | 282728    | 28328                   | 22019                      |
|      | $\alpha = 1, \beta = 0$     | 15372     | 14953        | 810481 | 554495    | 15372                   | 14953                      |
|      | $\alpha = 0.8, \beta = 0.3$ | 17159     | 15031        | 409014 | 299129    | 21907                   | 18007                      |
|      |                             |           |              |        |           |                         |                            |

is written in C++, and its compiler and simulation environment is Visual C++ 9.0.

NoC mapping needs the determined traffic information between various IP cores. Since g1023 has the same number of IP cores as H.263, it adopts traffic diagram of H.263 [9], while other three ITC'02 circuits adopt TGFF randomly generated traffic [10].

To evaluate the performance of the algorithm, Table I shows the algorithm results of g1023 circuit for example. The comparison results of testing time  $T_{max}$ \_optimal, communication overhead  $e_optimal$ , collaborative optimization of overall mapping overhead  $C_{coop}$ \_optimal with random mapping scheme are provided under different proportion with  $\alpha$  and  $\beta$ .

Testing time comparison results of four ITC'02 circuits under three proportion of  $\alpha$  and  $\beta$  are shown in Fig.3. For the sake of effective comparison,  $T_{max}$  optimal and  $T_{max}$  rand value are normalized.

It can be concluded from Fig.3 that testing time of our proposed scheme is shorter than random mapping scheme under various optimal proportions. And the case of  $\alpha = 1, \beta = 0$ , for each circuit, is the best optimization result of testing time. It also can be concluded that with the decrease of  $\alpha$  proportion, testing time optimization results are also reduced. Furthermore, circuits have different testing optimization effects and that is not directly related to circuit complexity. However, under any circumstances, the optimization effect of W = 32 is better than that of W = 16.

Mapping cost results of collaborative optimization scheme for the ITC'02 circuits under test is shown in Fig.4. In order to compare effectively, proportion of mapping cost reduction is provided in Fig.4. The reduction proportion of mapping cost  $r_{cost}$  is based on (15).

$$r_{\cos t} = \frac{C_{coop\_rand} - C_{coop\_optimal}}{C_{coop\_rand}} \times 100\%$$
(15)





Fig.4 Proportion of mapping cost reduction comparison results

The experiment results in Fig.4 show, under different bandwidth, the best comprehensive proportion is when the proportion is  $\alpha = \beta = 0.5$  for the four ITC'02 circuits. That also reflects the effectiveness of proposed collaborative optimization algorithm. Whereas  $\alpha = 1, \beta = 0$ , with the lowest optimization effect, because it simply optimizes testing time. For each circuit, there are different optimization effects varied with differentiated proportion of testing time and mapping cost. For example, p22810 circuit has relatively little difference when proportions alter, and this is associated with traffic and testing characteristics of the specific circuit. So that the proportion of  $\alpha$  and  $\beta$  can be adjusted properly in order to improve the overall overhead or focus on a particular optimization target for the specific circuit.

### V. CONCLUSION

The paper firstly analyzed IP core testing time estimation problem when the transmission bandwidth of NoC is provided. Then it introduced the NoC mapping and testing problem. Afterwards, a novel collaborative optimization for testing time and NoC mapping performance is proposed to decrease the testing time and mapping overhead. Since mapping is processed with the consideration of testing time optimization, effectively improves parallel testing efficiency of the system chip based on NoC.

We adopted the Multi-Objective Genetic Algorithm to speed up the implementation process of collaborative optimization scheme. Moreover, this scheme can adjust scale parameters to coordinate the relationship between testing time and power consumption, which will contribute improved adaptability and controllability to NoC mapping structure. The experiment results on ITC'02 circuits showed the effectiveness of the collaborative optimization algorithm. The differentiation with the various proportion of testing time and NoC mapping cost is also analyzed. We will pursue on the optimal balance of the testing and mapping for NoC.

### References

- S. Yang, L. Li, M. Gao and Y. Zhang, "An energy- and delay-aware mapping method of NoC," *Acta Electronica Sinica*, vol. 36, no. 5, 2008, pp. 937-942.
- [2] Y. Xie and Y. Liu, "A research on NoC mapping with Quantum Ant Colony Algorithm," *in Proc. the WiSPNET 2017*, 22-24 March, 2017, Chennai, India, pp. 874-877.
- [3] L. Wang and X. Ling, "Energy-and latency-aware NoC mapping based on chaos discrete particle swarm optimization," *in Proc. the CMC 2010*, 12-14 April, 2010, Shenzhen, China, pp. 263-268.



- [4] K.J. Lee, B.R. Chen and M.A. Kochte, "On-Chip Self-Test Methodology with ALL Deterministic Compressed Test Patterns Recorded in Scan Chains," *IEEE Transactions on CAD*, vol. PP, no. 99, 2018, pp. 1-14.
- [5] A.M. Amory, F. Ferlini, M. Lubaszewski and F. Moraes, "DfT for the reuse of Networks-on-Chip as test access mechanism," *in Proc. the* VTS 2007, 6-10 May, 2007, Berkeley, CA, pp. 435-440.
- [6] M. Agrawal, M. Richter and K. Chakrabarty, "A dynamic programming solution for optimizing test delivery in multicore SOCs," *in Proc. the ITC 2012*, 5-8 November, 2012, Anaheim, CA, pp. 1-10.
- [7] E.J. Marinissen, V. Iyengar and K. Chakrabarty, "A set of benchmarks for modular testing of SOCs," *in Proc. the ITC 2002*, October, 2002, Baltimore, MD, USA, pp. 519-528.
- [8] Y. Zhang, N. Wu and F. Ge, "The co-design of test structure and test data transfer mode for 2D-Mesh NoC," *IAENG Trans. on Engineering Technologies*, 2013, pp. 171-184.
- [9] J.Hu and R. Marculescu, "Energy-aware mapping for tile-based NoC architectures under performance constraints," *in Proc. of the ASP-DAC* 2003, 21-24 January, 2003, Kitakyushu, Japan, pp. 233-239.
- [10] P. R. Dick, L. D. Rhodes and W. Wayne, "TGFF: Task Graphs for Free," *in Proc. the CODES 98*, 15-18 March, 1998, Seattle, WA, USA, pp. 97-101.