Few research fields can trace their birth to a single moment and place in history. Graph theory, the mathematical scaffold behind network science, can. Its roots go back to 1735 in Königsberg, the capital of Eastern Prussia, a thriving merchant city of its time. The trade supported by its busy fleet of ships allowed city officials to build seven bridges across the river Pregel that surrounded the town. Five of these connected to the mainland the elegant island Kneiphof, caught between the two branches of the Pregel. The remaining two crossed the two branches of the river (Image 2.1). This peculiar arrangement gave birth to a contemporary puzzle: Can one walk across all seven bridges and never cross the same one twice? Despite many attempts, no one could find such path. The problem remained unsolved until 1735, when Leonard Euler, a Swiss born mathematician, offered a rigorous mathematical proof that such path does not exist [6, 7].
Euler represented each of the four land areas separated by the river with letters A, B, C, and D (Image 2.1). Next he connected with lines each piece of land that had a bridge between them. He thus built a graph, whose nodes were pieces of land and links were the bridges. Then Euler made a simple observation: if there is a path crossing all bridges, but never the same bridge twice, then nodes with odd number of links must be either the starting or the end point of this path. Indeed, if you arrive to a node with an odd number of links, you may find yourself having no unused link for you to leave it.
A walking path that goes through all bridges can have only one starting and one end point. Thus such a path cannot exist on a graph that has more than two nodes with an odd number of links. The Königsberg graph had four nodes with an odd number of links, A, B, C, and D, so no path could satisfy the problem.
Euler’s proof was the first time someone solved a mathematical problem using a graph. For us the proof has two important messages: The first is that some problems become simpler and more tractable if they are represented as a graph. The second is that the existence of the path does not depend on our ingenuity to find it. Rather, it is a property of the graph. Indeed, given the structure of the Königsberg graph, no matter how smart we are, we will never find the desired path. In other words, networks have properties encoded in their structure that limit or enhance their behavior.
To understand the many ways networks can affect the properties of a system, we need to become familiar with graph theory, a branch of mathematics that grew out of Euler’s proof. In this chapter we learn how to represent a network as a graph and introduce the elementary characteristics of networks, from degrees to degree distributions, from paths to distances and learn to distinguish weighted, directed and bipartite networks. We will introduce a graph-theoretic formalism and language that will be used throughout this book
If we want to understand a complex system, we first need to know how its components interact with each other. In other words we need a map of its wiring diagram. A network is a catalog of a system’s components often called nodes or vertices and the direct interactions between them, called links or edges (BOX 2.1). This network representation offers a common language to study systems that may differ greatly in nature, appearance, or scope. Indeed, as shown in Image 2.2, three rather different systems have exactly the same network representation.
Image 2.2 introduces two basic network parameters:
Number of nodes, or N, represents the number of components in the system. We will often call N the size of the network. To distinguish the nodes, we label them with i = 1, 2, ..., N.
Number of links, which we denote with L, represents the total number of interactions between the nodes. Links are rarely labeled, as they can be identified through the nodes they connect. For example, the (2, 4) link connects nodes 2 and 4.
The networks shown in Image 2.2 have N = 4 and L = 4.
The links of a network can be directed or undirected. Some systems have directed links, like the WWW, whose uniform resource locators (URL) point from one web document to the other, or phone calls, where one person calls the other. Other systems have undirected links, like romantic ties: if I date Janet, Janet also dates me, or like transmission lines on the power grid, on which the electric current can flow in both directions.
A network is called directed (or digraph) if all of its links are directed; it is called undirected if all of its links are undirected. Some networks simultaneously have directed and undirected links. For example in the metabolic network some reactions are reversible (i.e., bidirectional or undirected) and others are irreversible, taking place in only one direction (directed).
The choices we make when we represent a system as a network will determine our ability to use network science successfully to solve a particular problem. For example, the way we define the links between two individuals dictates the nature of the questions we can explore:
While many links in these four networks overlap (some coworkers may be friends or may have an intimate relationship), these networks have different uses and purposes.
We can also build networks that may be valid from a graph theoretic perspective, but may have little practical utility. For example, if we link all individuals with the same first name, Johns with Johns and Marys with Marys, we do obtain a well-defined graph, whose properties can be analyzed with the tools of network science. Its utility is questionable, however. Hence in order to apply network theory to a system, careful considerations must precede our choice of nodes and links, ensuring their significance to the problem we wish to explore.
Throughout this book we will use ten networks to illustrate the tools of network science. These reference networks, listed in Table 2.1, span social systems (mobile call graph or email network), collaboration and affiliation networks (science collaboration network, Hollywood actor network), information systems (WWW), technological and infrastructural systems (Internet and power grid), biological systems (protein interaction and metabolic network), and reference networks (citations). They differ widely in their sizes, from as few as N =1,039 nodes in the E. coli metabolism, to almost half million nodes in the citation network. They cover several areas where networks are actively applied, representing ‘canonical’ datasets frequently used by researchers to illustrate key network properties. As we indicate in Table 2.1, some of them are directed, others are undirected. In the coming chapters we will discuss in detail the nature and the characteristics of each of these datasets, turning them into the guinea pigs of our journey to understand complex networks.
| Network | Nodes | Links | Directed / Undirected | N | L | ‹K› |
|---|---|---|---|---|---|---|
| Internet | Routers | Internet connections | Undirected | 192,244 | 609,066 | 6.34 |
| WWW | Webpages | Links | Directed | 325,729 | 1,497,134 | 4.60 |
| Power Grid | Power plants, transformers | Cables | Undirected | 4,941 | 6,594 | 2.67 |
| Mobile-Phone Calls | Subscribers | Calls | Directed | 36,595 | 91,826 | 2.51 |
| Email addresses | Emails | Directed | 57,194 | 103,731 | 1.81 | |
| Science Collaboration | Scientists | Co-authorships | Undirected | 23,133 | 93,437 | 8.08 |
| Actor Network | Actors | Co-acting | Undirected | 702,388 | 29,397,908 | 83.71 |
| Citation Network | Papers | Citations | Directed | 449,673 | 4,689,479 | 10.43 |
| E. Coli Metabolism | Metabolites | Chemical reactions | Directed | 1,039 | 5,802 | 5.58 |
| Protein Interactions | Proteins | Binding interactions | Undirected | 2,018 | 2,930 | 2.90 |
A key property of each node is its degree, representing the number of links it has to other nodes. The degree can represent the number of mobile phone contacts an individual has in the call graph (i.e. the number of different individuals the person has talked to), or the number of citations a research paper gets in the citation network.
We denote with ki the degree of the ithnode in the network. For example, for the undirected networks shown in Image 2.2 we have k1=2, k2=3, k3=2, k4=1. In an undirected network the total number of links, L, can be expressed as the sum of the node degrees:
Here the 1/2 factor corrects for the fact that in the sum (2.1) each link is counted twice. For example, the link connecting the nodes 2 and 4 in Image 2.2 will be counted once in the degree of node 1 and once in the degree of node 4.
An important property of a network is its average degree (BOX 2.2), which for an undirected network is
In directed networks we distinguish between incoming degree, kiin, representing the number of links that point to node i, and outgoing degree, kiout, representing the number of links that point from node i to other nodes. Finally, a node’s total degree, ki, is given by
For example, on the WWW the number of pages a given document points to represents its outgoing degree, kout, and the number of documents that point to it represents its incoming degree, kin. The total number of links in a directed network is
The 1/2 factor seen in (2.1) is now absent, as for directed networks the two sums in (2.4) separately count the outgoing and the incoming degrees. The average degree of a directed network is
The degree distribution, pk, provides the probability that a randomly selected node in the network has degree k. Since pk is a probability, it must be normalized, i.e.
For a network with N nodes the degree distribution is the normalized histogram (Image 2.3) is given by
where Nk is the number of degree-k nodes. Hence the number of degree-k nodes can be obtained from the degree distribution as Nk = Npk.
The degree distribution has assumed a central role in network theory following the discovery of scale-free networks [8]. One reason is that the calculation of most network properties requires us to know pk. For example, the average degree of a network can be written as
The other reason is that the precise functional form of pk determines many network phenomena, from network robustness to the spread of viruses.
A complete description of a network requires us to keep track of its links. The simplest way to achieve this is to provide a complete list of the links. For example, the network of Image 2.2 is uniquely described by listing its four links: {(1, 2), (1, 3), (2, 3), (2, 4)}. For mathematical purposes we often represent a network through its adjacency matrix. The adjacency matrix of a directed network of N nodes has N rows and N columns, its elements being:
Aij = 1 if there is a link pointing from node j to node i
Aij = 0 if nodes i and j are not connected to each other
The adjacency matrix of an undirected network has two entries for each link, e.g. link (1, 2) is represented as A12 = 1 and A21 = 1. Hence, the adjacency matrix of an undirected network is symmetric, Aij = Aji (Image 2.5b)
The degree ki of node i can be directly obtained from the elements of the adjacency matrix. For undirected networks a node’s degree is a sum over either the rows or the columns of the matrix, i.e.
For directed networks the sums over the adjacency matrix’ rows and columns provide the incoming and outgoing degrees, respectively
Given that in an undirected network the number of outgoing links equals the number of incoming links, we have
The number of nonzero elements of the adjacency matrix is 2L, or twice the number of links. Indeed, an undirected link connecting nodes i and j appears in two entries: Aij = 1, a link pointing from node j to node i, and Aji = 1, a link pointing from i to j (Image 2.5b).
In real networks the number of nodes (N) and links (L) can vary widely. For example, the neural network of the worm C. elegans, the only fully mapped nervous system of a living organism, has N = 302 neurons (nodes). In contrast the human brain is estimated to have about a hundred billion (N ≈ 1011) neurons. The genetic network of a human cell has about 20,000 genes as nodes; the social network consists of seven billion individuals (N ≈ 7×109) and the WWW is estimated to have over a trillion web documents (N > 1012).
These wide differences in size are noticeable in Table 2.1, which lists N and L for several network maps. Some of these maps offer a complete wiring diagram of the system they describe (like the actor network or the E. coli metabolism), while others are only samples, representing a subset of the full network (like the WWW or the mobile call graph)
Table 2.1 indicates that the number of links also varies widely. In a network of N nodes the number of links can change between L = 0 and Lmax, where
is the total number of links present in a complete graph of size N (Image 2.6). In a complete graph each node is connected to every other node.
In real networks L is much smaller than Lmax, reflecting the fact that most real networks are sparse. We call a network sparse if L‹‹ Lmax. For example, the WWW graph in Table 2.1 has about 1.5 million links. Yet, if the WWW were to be a complete graph, it should have Lmax ≈ 5x1010 links according to (2.12). Consequently the web graph has only a 3x10-5 fraction of the links it could have. This is true for all of the networks in Table 2.1: One can check that their number of links is only a tiny fraction of the expected number of links for a complete graph of the same number of nodes.
The sparsity of real networks implies that the adjacency matrices are also sparse. Indeed, a complete network has Aij = 1, for all (i, j), i.e. each of its matrix elements are equal to one. In contrast in real networks only a tiny fraction of the matrix elements are nonzero. This is illustrated in Image 2.7, which shows the adjacency matrix of the protein-protein interaction network listed in Table 2.1 and shown in Image 2.4a. One can see that the matrix is nearly empty.
Sparseness has important consequences on the way we explore and store real networks. For example, when we store a large network in our computer, it is better to store only the list of links (i.e. elements for which Aij ≠ 0), rather than the full adjacency matrix, as an overwhelming fraction of the Aij elements are zero. Hence the matrix representation will block a huge chunk of memory, filled mainly with zeros (Image 2.7).
So far we discussed only networks for which all links have the same weight, i.e. Aij = 1. In many applications we need to study weighted networks, where each link (i, j) has a unique weight wij. In mobile call networks the weight can represent the total number of minutes two individuals talk with each other on the phone; on the power grid the weight is the amount of current flowing through a transmission line.
For weighted networks the elements of the adjacency matrix carry the weight of the link as
Most networks of scientific interest are weighted, but we can not always measure the appropriate weights. Consequently we often approximate these networks with an unweighted graph. In this book we predominantly focus on unweighted networks, but whenever appropriate, we discuss how the weights alter the corresponding network property (BOX 2.3).
A bipartite graph (or bigraph) is a network whose nodes can be divided into two disjoint sets U and V such that each link connects a U-node to a V-node. In other words, if we color the U-nodes green and the V-nodes purple, then each link must connect nodes of different colors (Image 2.9).
We can generate two projections for each bipartite network. The first projection connects two U-nodes by a link if they are linked to the same V-node in the bipartite representation. The second projection connects the V-nodes by a link if they connect to the same U-node (Image 2.9).
In network theory we encounter numerous bipartite networks. A wellknown example is the Hollywood actor network, in which one set of nodes corresponds to movies (U), and the other to actors (V). A movie is connected to an actor if the actor plays in that movie. One projection of this bipartite network is the actor network, in which two nodes are connected to each other if they played in the same movie. This is the network listed in Table 2.1. The other projection is the movie network, in which two movies are connected if they share at least one actor in their cast.
Medicine offers another prominent example of a bipartite network: The Human Disease Network connects diseases to the genes whose mutations are known to cause or effect the corresponding disease (Image 2.10).
Finally, one can also define multipartite networks, like the tripartite recipe-ingredient-compound network shown in Image 2.11.
Physical distance plays a key role in determining the interactions between the components of physical systems. For example the distance between two atoms in a crystal or between two galaxies in the universe determine the forces that act between them.
In networks distance is a challenging concept. Indeed, what is the distance between two webpages, or between two individuals who do not know each other? The physical distance is not relevant here: Two webpages could be sitting on computers on the opposite sides of the globe, yet, have a link to each other. At the same time two individuals that live in the same building may not know each other.
In networks physical distance is replaced by path length. A path is a route that runs along the links of the network. A path’s length represents the number of links the path contains (Image 2.12a). Note that some texts require that each node a path visits is distinct.
In network science paths play a central role. Next we discuss some of their most important properties, many more being summarized in (Image 2.13).
The shortest path between nodes i and j is the path with the fewest number of links (Image 2.12b). The shortest path is often called the distance between nodes i and j, and is denoted by dij, or simply d. We can have multiple shortest paths of the same length d between a pair of nodes (Image 2.12b). The shortest path never contains loops or intersects itself.
In an undirected network dij = dji, i.e. the distance between node i and j is the same as the distance between node i and j. In a directed network often dij ≠ dji. Furthermore, in a directed network the existence of a path from node i to node j does not guarantee the existence of a path from j to i.
In real networks we often need to determine the distance between two nodes. For a small network, like the one shown in Image 2.12, this is an easy task. For a network with millions of nodes finding the shortest path between two nodes can be rather time consuming. The length of the shortest path and the number of such paths can be formally obtained from the adjacency matrix (BOX 2.4). In practice we use the breadth first search (BFS) algorithm discussed in BOX 2.5 for this purpose.
The diameter of a network, denoted by dmax, is the maximum shortest path in the network. In other words, it is the largest distance recorded between any pair of nodes. One can verify that the diameter of the network shown in Image 2.13 is dmax = 3. For larger networks the diameter can be determined using the BFS algorithm described in BOX 2.5.
The average path length, denoted by 〈d〉, is the average distance between all pairs of nodes in the network. For a directed network of N nodes, 〈d〉 is
Note that (2.14) is measured only for node pairs that are in the same component (SECTION 2.9). We can use the BFS algorithm to determine the average path length for a large network. For this we first determine the distances between the first node and all other nodes in the network using the algorithm described in BOX 2.5. We then determine the distances between the second node and all other nodes but the first one (if the network is undirected). We then repeat this procedure for all nodes. The sum
A phone would be of limited use as a communication device if we could not call any valid phone number; email would be rather useless if we could send emails to only certain email addresses, and not to others. From a network perspective this means that the network behind the phone or the Internet must be capable of establishing a path between any two nodes. This is in fact the key utility of most networks: they ensure connectedness. In this section we discuss the graph-theoretic formulation of connectedness.
In an undirected network nodes i and j are connected if there is a path between them. They are disconnected if such a path does not exist, in which case we have dij = ∞. This is illustrated in Image 2.15a, which shows a network consisting of two disconnected clusters. While there are paths between any two nodes on the same cluster (for example nodes 4 and 6), there are no paths between nodes that belong to different clusters (nodes 1 and 6).
A network is connected if all pairs of nodes in the network are connected. A network is disconnected if there is at least one pair with dij = ∞. Clearly the network shown in Image 2.15a is disconnected, and we call its two subnetworks components or clusters. A component is a subset of nodes in a network, so that there is a path between any two nodes that belong to the component, but one cannot add any more nodes to it that would have the same property.
If a network consists of two components, a properly placed single link can connect them, making the network connected (Image 2.15b). Such a link is called a bridge. In general a bridge is any link that, if cut, disconnects the network.
While for a small network visual inspection can help us decide if it is connected or disconnected, for a network consisting of millions of nodes connectedness is a challenging question. Mathematical and algorithmic tools can help us identify the connected components of a graph. For example, for a disconnected network the adjacency matrix can be rearranged into a block diagonal form, such that all nonzero elements in the matrix are contained in square blocks along the matrix’ diagonal and all other elements are zero (Image 2.15a). Each square block corresponds to a component. We can use the tools of linear algebra to decide if the adjacency matrix is block diagonal, helping us to identify the connected components.
In practice, for large networks the components are more efficiently identified using the BFS algorithm (BOX 2.6).
The clustering coefficient captures the degree to which the neighbors of a given node link to each other. For a node i with degree ki the local clustering coefficient is defined as [12]
where Li represents the number of links between the ki neighbors of node i. Note that Ci is between 0 and 1 (Image 2.16a):
In summary Ci measures the network’s local link density: The more densely interconnected the neighborhood of node i, the higher is its local clustering coefficient.
The degree of clustering of a whole network is captured by the average clustering coefficient, 〈C〉, representing the average of Ci over all nodes i = 1, ..., N [12],
In line with the probabilistic interpretation 〈C〉 is the probability that two neighbors of a randomly selected node link to each other.
While (2.16) is defined for undirected networks, the clustering coefficient can be generalized to directed and weighted [13, 14, 15, 16] networks as well. In the network literature we may encounter the global clustering coefficient as well, discussed in ADVANCED TOPICS 2.A.
The crash course offered in this chapter introduced some of the basic graph theoretical concepts and tools used in network science. The set of elementary network characteristics, summarized in Image 2.17, offer a formal language through which we can explore networks.
Many of the networks we study in network science consist of thousands or even millions of nodes and links (Table 2.1). To explore them, we need to go beyond the small graphs shown in Image 2.17. A glimpse of what we are about to encounter is offered by the protein-protein interaction network of yeast (Image 2.4a). The network is too complex to understand its properties through a visual inspection of its wiring diagram. We therefore need to turn to the tools of network science to characterize its topology.
In network science we often distinguish networks by some elementary property of the underlying graph. Here we summarize the most commonly encountered network types. We also list real systems that share the particular property. Note that many real networks combine several of these elementary network characteristics. For example the WWW is a directed multi-graph with self-interactions; the mobile call network is directed and weighted, without self-loops.
Undirected Network
A network whose links do not have a defined direction.
Examples: Internet, power grid, science collaboration networks.
Self-loops
In many networks nodes do not interact with themselves, so the diagonal elements of the adjacency matrix are zero, Aii = 0, i = 1,..., N. In some systems self-interactions are allowed; in such networks, self-loops represent the fact that node i interacts with itself.
Examples: WWW, protein interactions.
Multigraph/Simple Graphs
In a multigraph nodes are permitted to have multiple links (or parallel links) between them. Hence Aii can be any positive integer. Networks that do not allow multiple links are called simple.
Multigraph Examples: Social networks, where we distinguish friendship, family and professional ties.
Directed Network
A network whose links have selected directions.
Examples: WWW, mobile phone calls, citation network.
Weighted Network
A network whose links have a defined weight, strength or flow parameter. The elements of the adjacency matrix are Aij = wij if there is a link with weight wij between them. For unweighted (binary) networks, the adjacency matrix only indicates the presence (Aij = 1) or the absence (Aij = 0) of a link.
Examples: Mobile phone calls, email network.
Complete Graph (Clique)
In a complete graph, or a clique, all nodes are connected to each other.
Examples: Actors in the cast of the same movie, as they are all linked to each other in the actor network.
Let us use the measures we introduced so far to explore some basic characteristics of this network. The undirected network, shown in Image 2.4a, has N = 2,018 proteins as nodes and L=2,930 binding interactions as links. Hence its average degree, according to (2.2), is 〈k〉 = 2.90, suggesting that a typical protein interacts with approximately two to three other proteins. Yet, this number is somewhat misleading. Indeed, the degree distribution pk shown in Image 2.4b,c, indicates that the vast majority of nodes have only a few links. To be precise, in this network 69% of nodes have fewer than three links, i.e. for these k ‹ 〈k〉 . These numerous nodes with few links coexist with a few highly connected nodes, or hubs, the largest having as many as 92 links. Such wide differences in node degrees is a consequence of the network’s scale-free property, discussed in CHAPTER 4. We will see that the shape of the degree distribution determines a wide range of network properties, from the network’s robustness to the spread of viruses.
The breadth-first-search algorithm (BOX 2.5) helps us determine the network’s diameter, finding dmax = 14. We might be tempted to expect wide variations in d, as some nodes are close to each other, others, however, may be quite far. The distance distribution (Image 2.18a) indicates otherwise: pd has a prominent peak between 5 and 6, telling us that most distances are rather short, being in the vicinity of 〈d〉 =5.61. Also, pd decays fast for large d, suggesting that large distances are absent. Indeed, the variance of the distances is σd = 1.64, indicating that most path lengths are in the close vicinity of 〈d〉 . These are manifestations of the small world property discussed in CHAPTER 3.
The breadth first search algorithm also tells us that the protein interaction network is not connected, but consists of 185 components, shown as isolated clusters and nodes in Image 2.4a. The largest, called the giant component, contains 1,647 of the 2,018 nodes; all other components are tiny. As we will see in the coming chapters, such fragmentation is common in real networks.
The average clustering coefficient of the protein interaction network is 〈C〉 =0.12, which, as we will come to appreciate in the coming chapters, indicates a significant degree of local clustering. A further caveat is provided by the dependence of the clustering coefficient on the node’s degree, or the C(k) function (Image 2.18b). The fact that C(k) decreases for large k indicates that the local clustering coefficient of the small nodes is significantly higher than the local clustering coefficient of the hubs. Hence the small degree nodes are located in dense local network neighborhoods, while the neighborhood of the hubs is much sparser. This is a consequence of hierarchy, a network property discussed in CHAPTER 9.
Finally, a visual inspection reveals an interesting pattern: hubs have a tendency to connect to small nodes, giving the network a hub and spoke character (Image 2.4a). This is a consequence of degree correlations, discussed in CHAPTER 7. Such correlations influence a number of network based processes, from spreading phenomena to the number of driver nodes needed to control a network.
Taken together, Image 2.4 and 2.18 illustrate that the quantities we introduced in this chapter can help us diagnose several key properties of real networks. The purpose of the coming chapters is to study systematically these network characteristics and understand what they tell us about a particular complex system.
Which of the icons in Image 2.19 can be drawn without raising yourpencil from the paper, and without drawing any line more than once? Why?
Let A be the NxN adjacency matrix of an undirected unweighted network, without self-loops. Let 1 be a column vector of N elements, all equal to 1. In other words 1 = (1, 1, ..., 1)T , where the superscript T indicates the transpose operation. Use the matrix formalism (multiplicative constants, multiplication row by column, matrix operations like transpose and trace, etc, but avoid the sum symbol Σ) to write expressions for:
The adjacency matrix is a useful graph representation for many analytical calculations. However, when we need to store a network in a computer, we can save computer memory by offering the list of links in a Lx2 matrix, whose rows contain the starting and end point i and j of each link. Construct for the networks (a) and (b) in Imade 2.20:
Consider the bipartite network of Image 2.21
Consider a bipartite network with N1 and N2 nodes in the two sets.
In the network literature we ocassionally encounter the global clustering coefficient, which measures the total number of closed triangles in a network. Indeed, Li in (2.15) is the number of triangles that node i participates in, as each link between two neighbors of node i closes a triangle (Image 2.17). Hence the degree of a network’s global clustering can be also captured by the global clustering coefficient, defined as
where a connected triplet is an ordered set of three nodes ABC such that A connects to B and B connects to C. For example, an A, B, C triangle is made of three triplets, ABC, BCA and CAB. In contrast a chain of connected nodes A, B, C, in which B connects to A and C, but A does not link to C, forms a single open triplet ABC. The factor three in the numerator of (2.17) is due to the fact that each triangle is counted three times in the triplet count. The roots of the global clustering coefficient go back to the social network literature of the 1940s [17, 18], where CΔ is often called the ratio of transitive triplets.
Note that the average clustering coefficient ‹C› defined in (2.16) and the global clustering coefficient (2.17) are not equivalent. Indeed, take a network that is a double star, consisting of N nodes, where nodes 1 and 2 are joined to each other and to all other nodes, and there are no other links. Then the local clustering coefficient Ci is 1 for i ≥ 3 and 2/(N − 1) for i = 1, 2. It follows that the average clustering coefficient of the network is ‹C› = 1−O(1), while the global clustering coefficient is CΔ ~ 1/N. In less extreme networks the two definitions will give more comparable values, but they still differ from each other [19]. For example, for the network of in Image 2.16b we have ‹C› = 0.31 and CΔ = 0.375
[1] K.-I. Goh, M. E. Cusick, D. Valle, B. Childs, M. Vidal, and A.-L. Barabási. The human disease network. PNAS, 104:8685–8690, 2007.
[2] H.U. Obrist. Mapping it out: An alternative atlas of contemporary cartographies. Thames and Hudson, London, 2014.
[3] I. Meirelles. Design for Information. Rockport, 2013.
[4] K. Börner. Atlas of Science: Visualizing What We Know. The MIT Press, 2010.
[5] L. B. Larsen. Networks: Documents of Contemporary Art. MIT Press. 2014.
[6] L. Euler, Solutio Problemat is ad Geometriam Situs Pertinentis. Commentarii Academiae Scientiarum Imperialis Petropolitanae 8:128-140, 1741.
[7] G. Alexanderson. Euler and Königsberg’s bridges: a historical view. Bulletin of the American Mathematical Society 43: 567, 2006.
[8] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.
[9] G. Gilder. Metcalfe’s law and legacy. Forbes ASAP, 1993.
[10] B. Briscoe, A. Odlyzko, and B. Tilly. Metcalfe’s law is wrong. IEEE Spectrum, 43:34–39, 2006.
[11] Y.-Y. Ahn, S. E. Ahnert, J. P. Bagrow, A.-L. Barabási. Flavor network and the principles of food pairing, Scientific Reports, 196, 2011.
[12] D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393:440–442, 1998.
[13] A. Barrat, M. Barthélemy, R. Pastor-Satorras, and A. Vespignani. The architecture of complex weighted networks. PNAS, 101:3747–3752, 2004.
[14] J. P. Onnela, J. Saramäki, J. Kertész, and K. Kaski. Intensity and coherence of motifs in weighted complex networks. Physical Review E, 71:065103, 2005.
[15] B. Zhang and S. Horvath. A general framework for weighted gene coexpression network analysis. Statistical Applications in Genetics and Molecular Biology, 4:17, 2005.
[16] P. Holme, S. M. Park, J. B. Kim, and C. R. Edling. Korean university life in a network perspective: Dynamics of a large affiliation network. Physica A, 373:821–830, 2007.
[17] R. D. Luce and A. D. Perry. A method of matrix analysis of group structure. Psychometrika, 14:95–116, 1949.
[18] S. Wasserman and K Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
[19] B. Bollobás and O. M. Riordan. Mathematical results on scale-free random graphs, in Stefan Bornholdt, Hans Georg Schuster, Handbook of Graphs and Networks: From the Genome to the Internet (2003 Wiley-VCH Verlag GmbH & Co. KGaA).