I. INTRODUCTION
The Internet is nowadays a reality experienced daily by billions of human beings. This network has evolved in 40 years from a small experiment involving 4 nodes, funded by DARPA, into a web that crosses continents and connects hundreds of millions of servers and routers, making concrete the cyberspace “consensual hallucination” prophesied by William Gibson in 1982. It is now an essential component of the contemporary world. The Internet and the attached cyberspace are a concrete reality for the billions of users who “surf” on it each day to access information, social networks, and multimedia, and more generally to use all the applications that spin what is commonly named the World Wide Web.
However, the Internet was neither the first nor the most important network to have impacted humanity. Present-day humanity results from the large network of migrations that led the descendants of Lucy’s cousins, the Australopithecus afarensis, to come out of Africa and spread across the world. Recent phylogenetic studies show, through analysis of human mitochondrial DNA haplogroups, that the current human genome is a mixture of human genes that have evolved almost everywhere in the world, suggesting that the migration network has transported not only humans but also information in the form of DNA. The remainder of human history has largely been shaped by different networks: road networks, financial networks, social networks, family networks, etc. Networks therefore play a central role in the development of human societies, and their ubiquity makes networks and their properties a topic of study in their own right, the subject of a “Network Science”. We will describe later the particularity of this science.
The Internet is therefore just the latest in a long sequence of networks that have shaped human life, but with the peculiarity that it is the first general-purpose network dedicated to the transport of information in bits, i.e. in the information-theoretic sense. Strictly speaking, the Internet is not the first network to transport bits; before it, digital telephony, telegraph and telex networks were transporting bits. However, each of these networks was designed for a special purpose, while the Internet was not dedicated to a specific usage. In that sense the Internet fulfilled the information theory perspective opened by Shannon in the mid-twentieth century. Because of its specific and central position in our current life, and its peculiarity of transporting information in bits in the largest sense, the Internet per se is a topic of scientific investigation. A new science nicknamed “Internet Science” aims at studying the properties of the Internet. The aim of this manifesto is to investigate the state of current research and to describe the scientific community’s understanding of this new science.
II. ON BITS AND INFORMATION
In 1948, Claude E. Shannon published a groundbreaking paper, “A mathematical theory of communication” [1], that set the basis of Information Theory. This paper crowns a remarkable sequence of papers that changed the face of the contemporary world and became the foundations on which the Internet is built. Shannon’s contribution is essentially threefold. He first proved that the entropic representation of information through bits is universal, i.e. every band-limited physical signal can be represented with arbitrary precision by a bit-based 0/1 representation [2]. He moreover showed how one can trade the fidelity (or low distortion) of signal reconstruction against the bit rate, i.e. the number of bits used to represent one time unit of the signal. This first contribution is the basis of multimedia communication and of the digital era we are living in, as it enabled the representation of sound and music signals by an MP3 bit stream or the representation of video by an MPEG-2 bit stream.
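The trade-off between bit rate and distortion can be made tangible with a small numerical sketch. The uniform quantizer below, the sine test signal, and the function names `quantize` and `mean_squared_error` are illustrative assumptions that do not come from Shannon’s papers; the sketch only shows that spending more bits per sample lowers the reconstruction error.

```python
import math


def quantize(x: float, bits: int) -> float:
    """Uniform quantizer on [-1, 1] using 2**bits cells (illustrative choice).

    The value is mapped to the centre of the cell it falls into.
    """
    levels = 2 ** bits
    step = 2.0 / levels
    # Clamp the cell index so that x = 1.0 falls into the last cell.
    index = min(levels - 1, max(0, int((x + 1.0) / step)))
    return -1.0 + (index + 0.5) * step


def mean_squared_error(bits: int, n_samples: int = 1000) -> float:
    """Average squared reconstruction error for one period of a sine wave."""
    samples = [math.sin(2 * math.pi * k / n_samples) for k in range(n_samples)]
    return sum((s - quantize(s, bits)) ** 2 for s in samples) / n_samples


if __name__ == "__main__":
    for bits in (2, 4, 8):
        print(bits, "bits per sample -> MSE", mean_squared_error(bits))
```

Each additional bit halves the cell width, so the error can only shrink as the rate grows; practical codecs exploit the same trade-off with far more elaborate tools.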
The second contribution of Shannon was proving that information transmission is universal; in other terms, one can transmit information from one point to another with an arbitrarily small probability of error, provided that the rate of transmission is less than the channel capacity, a fundamental property of the channel that depends only on the amount of noise. This is the conceptual basis of all digital communication schemes that are used to transport information nowadays.
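As a worked illustration of a capacity that depends only on the noise, consider the binary symmetric channel, where each transmitted bit is flipped independently with probability p. This channel is chosen here as an assumed example (the text above does not name a specific channel); its capacity is the standard result C = 1 - H(p), with H the binary entropy function. The function names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a binary variable that equals 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(flip_prob: float) -> float:
    """Capacity, in bits per channel use, of a binary symmetric channel."""
    return 1.0 - binary_entropy(flip_prob)


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.11, 0.5):
        print(f"flip probability {p}: capacity {bsc_capacity(p):.4f} bits/use")
```

A noiseless channel carries one bit per use, while a channel that flips bits half of the time carries nothing, whatever the coding scheme.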
The third, and maybe the most important, contribution of Shannon came in his master’s thesis, which Howard Gardner of Harvard University called “possibly the most important, and also the most famous, master’s thesis of the century”. In his thesis, Shannon shows that all of Boolean algebra can be implemented using relays and switches. This paved the way to digital circuits and ultimately to current CPUs.
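The correspondence between switching circuits and Boolean algebra can be sketched in a few lines. The modelling below is a simplification made for illustration: a closed switch is `True`, switches in series behave like AND, switches in parallel like OR, and a normally closed relay contact like NOT. The half adder built on top is an assumed example, not one taken from the thesis.

```python
def series(a: bool, b: bool) -> bool:
    """Two switches in series conduct only if both are closed (AND)."""
    return a and b


def parallel(a: bool, b: bool) -> bool:
    """Two switches in parallel conduct if at least one is closed (OR)."""
    return a or b


def normally_closed(a: bool) -> bool:
    """A normally closed relay contact opens when its coil is energized (NOT)."""
    return not a


def xor(a: bool, b: bool) -> bool:
    """Exclusive OR built only from the three circuit primitives above."""
    return parallel(series(a, normally_closed(b)), series(normally_closed(a), b))


def half_adder(a: bool, b: bool) -> tuple[bool, bool]:
    """Add two one-bit numbers; returns (sum bit, carry bit)."""
    return xor(a, b), series(a, b)


if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            print(a, b, "->", half_adder(a, b))
```

Chaining such adders gives multi-bit arithmetic, which is the step from relay circuits toward the digital logic found in processors.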
The above three contributions show that the Internet, and more broadly the current digital world, can objectively be considered as conceptually rooted in Shannon’s work. In particular, the Internet fundamentally transports bits as described by Shannon in his “mathematical theory of communication”, so that information theory fundamentals directly impact the Internet’s properties. Shannon’s 1948 paper dealt mainly with the concept of entropy and with the capacity of point-to-point channels. The information theory research community understood early that there are fundamental differences between the point-to-point communication scenario and the multi-sender, multi-receiver case that the Internet represents. The information theory community began to work on multi-user scenarios in the eighties, and most multi-user scenarios, even very simple ones like the single relay channel, are still not completely understood.
Nonetheless, the main conceptual idea behind the information-theoretic treatment of both multi-user and point-to-point channels is the notion of cooperation through side information, i.e. the encoder in each communicating device is implemented such that each use of the communication channel transfers useful side information to the receivers. The role of the decoding function at a receiver is to fuse the different pieces of received side information in order to retrieve the messages sent over the channel. For example, side information is generated by adding redundancy through the application of an error-correcting code. The capacity of a communication channel is in fact the maximum amount of side information that can be transferred in each channel use.
The notion of cooperation is central in information theory and, as will be seen, is also central in networks. The capacity of a communication channel is achieved when one uses an encoding function that transfers the maximum amount of side information and a decoding function that retrieves the message from it. A fundamental property of point-to-point channels is the “separation principle”, i.e. the encoding function generating the side information to be sent over a point-to-point channel can be implemented, without any loss of performance, without knowing what messages are transferred over the channel. This separation principle is the conceptual basis of the well-known layered architecture. It is noteworthy that the separation principle is only valid for point-to-point communication; it is not valid for more general multi-sender, multi-receiver scenarios, meaning that applying a layered architecture results in a loss of performance in general scenarios. This last point is the theoretical motivation for several cross-layer techniques that are very appealing when resources are scarce and we cannot afford to waste them by using a layered approach, e.g. in wireless networks.
This section motivates the claim that information theory is a fundamental theoretical basis for Internet Science. However, it is not the only basis, and we will investigate others in the following sections.
III. A DEFINITION FOR NETWORKS
All in all, the Internet is a network of networks. As we said in the introduction, the Internet was not the first network, nor even the first data network, and it shares many similarities with other networks. A basic question in the area of Internet Science is “what is a network?” The answer to this question should encompass the largest set of networks while capturing the essence of the Internet. Interestingly, while the question seems obvious and trivial, there is no clear consensus on the answer. In an attempt to answer this question within a community of researchers in the European Internet and Network Science (EINS) network of excellence, we witnessed that it generated at least 10 different answers with strong backing. A major rift concerns whether a graph is necessary for having a network. On one side, the argument is that any system that can be naturally represented by a graph is a network; another view emphasizes the network as the whole setting of a situation, the “Zeitgeist”, in which a graph is often not involved. Another issue concerns whether the existence of a common goal is necessary for the constitution of a network. Clearly, there is a large set of networks where a common goal is involved; however, one can also consider as networks cases where a conscious commonality is not visible or pre-existing. One solution is to consider the emergence of the network itself as a fortuitous common goal. This highlights the importance of understanding the growth process of a network in order to analyse the resulting network.
Another source of discussion concerned integrating in the same framework networks that transport energy, goods or services, which we will call “goods networks” in what follows, and networks that mainly transport information, which will be called “information networks”. Very frequently these two types of networks are intertwined, e.g. the power grid, where the network transporting energy is coupled with a control network used to manage the energy flow, or a commercial network (like the Silk Road), where goods are transported along with the seller who handles the commercial information. However, there are fundamental differences between these two types of networks. While goods cannot be replicated or generated without spending resources and energy, information can be replicated and generated almost freely. As a result, a unit of good or service can be sold or given to only a single final consumer, while information can be given freely to several destinations. This means that goods networks follow conservation rules like the first law of thermodynamics, while information definitely follows the second law. There are nevertheless strong interactions between these two types of networks: for example, flow-based techniques developed for goods networks and assuming conservation laws have been applied, sometimes successfully, to information networks, and, as stated earlier, goods networks are frequently backed by information networks. However, there are some cases, for example a linear electrical circuit, where the associated information network is not clearly visible. My conjecture here is that any goods network has or had an associated information network that was used for growing it and defining the cooperation relationships.
In some cases the information network associated with a goods network disappears later, once the network has ossified; in other cases, where the network is still active and dynamic, the information network is still present. This means that one has to analyse the associated information network to understand the goods network resulting from it. For this reason, I believe that one can consider as a network only the information network. Based on all of these considerations and discussions, here is the definition of a network that I ended up with: “a network is a set of distributed elements that cooperate to exchange information”. On top of this exchanged information a goods network can grow later.
This definition contains the term “information”, which needs to be refined. The current Internet strictly exchanges information in Shannon’s terms, i.e. in the form of bits. Of course, more complex types of information might be transferred over these bits, e.g. music or video, or the emotions conveyed by a poem. However, this higher-level information needs transducers to reveal itself. Other types of networks might transfer other types of information; for example, a biological network might transfer information in the form of enzymes or proteins that might not be representable in the form of bits. The last term to define is the notion of cooperation. We will describe this in the next section.
IV. ON COOPERATION IN NETWORKS
The fundamental term in the definition of a network is cooperation. Cooperation is what motivates a network and keeps it growing. We stated in the previous section that nodes cooperate to exchange information. However, we still need to clarify further the notion of cooperation for information exchange. There are two stages of cooperation in networks: the first is the decision to connect, and the second consists in deciding what information to forward on established links. These two generally happen at different time scales: the decision to connect at a large time scale and the decision to forward at smaller ones. However, in some scenarios, like opportunistic communications in Delay Tolerant Networks or epidemic propagation in human networks, these two decisions can happen at the same time scale. Nonetheless, the connection decision shapes the topology of the network and can act on strong and deep cooperation parameters. Let’s first analyse this aspect; we will thereafter move to the second stage.
A. On the growth of a network
As explained before, Internet Science looks at networks as its main topic of study. Generally we are looking at an already constituted network that we want to analyse. One of the first questions that comes to mind is how the network has evolved from a set of unconnected nodes to the observed topology. Several research directions have targeted the development of meaningful models of network growth. My goal here is not to describe these models but rather to show that there are different theoretical approaches to this question, and that the question of understanding the growth of a network has deep implications for a large set of scientific topics. During the past decade, several research works uncovered fundamental observations about the graphs observed in a large set of practical large-scale networks. In particular, it was observed that almost all these graphs exhibit power-law node degree distributions, small-world properties, etc. Based on these observations, several constructive models were developed that try to recover the observed macroscopic, global power-law properties using mechanisms defined at the microscopic level (the level of nodes and links). One of the most famous is the “preferential attachment” principle and the “Barabasi-Albert” model, which is able to generate random scale-free network graphs [3]. While these works provide a descriptive mechanism able to mimic the growth of a network, these models do not explain, using objective socio-economic processes, how a set of unconnected nodes grows into an eventually fully connected graph.
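The preferential attachment mechanism can be sketched in a few lines. The version below is a simplified illustration in the spirit of the model of [3], not a faithful reproduction of it: the initial fully connected core, the parameter `m` (links created by each new node), and the function names are assumptions made for this example.

```python
import random


def preferential_attachment(n_nodes: int, m: int = 2, seed: int = 0):
    """Grow a graph in which new nodes prefer to link to high-degree nodes."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in this list once per incident edge, so a uniform
    # draw from it selects a node with probability proportional to its degree.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges


def degrees(edges):
    """Return a dictionary mapping each node to its degree."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


if __name__ == "__main__":
    deg = degrees(preferential_attachment(1000, m=2))
    print("largest degrees:", sorted(deg.values(), reverse=True)[:5])
```

Running the sketch typically shows a few early nodes accumulating many links while most nodes keep a degree close to `m`, which is the heavy-tailed pattern that these models aim to reproduce. As noted above, this mimics the outcome without explaining why nodes choose to connect.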
Understanding the emergence of cooperation in networks is one of the fundamental issues of Internet Science. In more general terms, a potential direction to explain network growth is through a prisoner’s dilemma, where two actors can cooperate or defect and there is a payoff or loss for each potential joint decision. Let’s consider the four potential cases for two nodes in a network: both nodes cooperate by exchanging information (a bidirectional link or a free peering is built between them); one of the nodes cooperates and the other defects, in either direction (the node builds a unidirectional link to the other, or accepts a paying client-provider agreement); or both defect (no link is built between them). Based on the expected gain in terms of received information or money generated, the two nodes might decide to apply any one of the above strategies. The prisoner’s dilemma arises when, despite the fact that cooperating is beneficial to both actors, they rationally decide not to cooperate because of the risk involved if the other actor defects. In an iterated extension of the prisoner’s dilemma, with several successive rounds of cooperation/defection, the decision to cooperate or defect at each round depends on the outcome/risk expectation at that stage, but also on the sequence of past cooperation/defection actions taken by each player. Theoretical analysis of this game-theoretic setting shows that for a large variety of scenarios the actors of the iterated game will end up not cooperating, i.e. implementing the “all defect” strategy, which translates in our network growth analysis into the nodes staying disconnected.
Axelrod studied the prisoner’s dilemma in his book [4] and showed through experimental evaluation that when the number of iterations of the cooperation/defection game is not known in advance, or when the risk/payoff of each decision is not clearly defined in advance, the defection strategy is not the best one and can lead to very bad results. He shows that “altruist” strategies frequently give better results, as they enable cooperation to emerge.
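A minimal sketch of the iterated game may help fix ideas. The payoff values (5, 3, 1, 0) are the conventional illustrative ones and are an assumption here, as are the two strategies shown: “always defect” and “tit for tat”, a reciprocating strategy that cooperates first and then copies the other player’s previous move. The sketch is not a reproduction of the experiments reported in [4].

```python
# Payoffs (player A, player B) for each pair of moves; "C" = cooperate,
# "D" = defect. The numeric values are conventional illustrative choices.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def always_defect(own_history, other_history):
    """Never cooperate, whatever the other player did."""
    return "D"


def tit_for_tat(own_history, other_history):
    """Cooperate first, then repeat the other player's previous move."""
    return other_history[-1] if other_history else "C"


def play(strategy_a, strategy_b, rounds=10):
    """Play the iterated game and return the two cumulative scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        gain_a, gain_b = PAYOFF[(move_a, move_b)]
        score_a += gain_a
        score_b += gain_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print("reciprocator vs reciprocator:", play(tit_for_tat, tit_for_tat))
    print("defector vs defector:", play(always_defect, always_defect))
    print("reciprocator vs defector:", play(tit_for_tat, always_defect))
```

Over ten rounds two reciprocating players each score 30 while two defectors each score 10. The defector still comes out ahead in a direct encounter with the reciprocator, which is why the comparison only favours cooperation when enough rounds are played against enough cooperative partners.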
It is in this context that networks play their role in helping cooperation emerge. By enriching the palette of possible cooperations between N actors from N^2 in bilateral terms to up to 2^N possible network configurations, networks strongly help the emergence of cooperation in decentralized situations. In fact, a network makes the decision to cooperate robust against the defection of other players by enabling other potential strategies. This line of thinking has been an active area of research in past years [5] and gives very promising perspectives for understanding the network growth process.
B. On Forwarding
The second stage of cooperation is carried out through information forwarding. Let’s clarify this type of cooperation. Let’s assume that a node at time T has obtained, through reception from other nodes or through its own existing knowledge, a sequence of messages {X_i}. The cooperation of a node in the network consists of forwarding to other nodes at time T a message Y_T that is obtained as a function F({X_i}) of the previously received messages. The cooperation of a node is thus realized through the particular forwarding function F that is implemented in the node. For example, a node might implement a gossiping (or epidemic) forwarding scheme that consists of forwarding whatever it can to any node it encounters regardless of its identity, or a node might forward each received packet to a specific egress link that satisfies a condition (for example, the attached node being on the path to the packet’s destination in the routing table). The forwarding function makes it possible to implement a wide palette of cooperation.
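The two forwarding behaviours mentioned above can be written as two instances of a forwarding function F. Everything in the sketch below is a hypothetical illustration: messages are modelled as (destination, payload) pairs, the routing table as a dictionary from destination to next hop, and the names `epidemic_forward` and `routed_forward` are invented for the example.

```python
def epidemic_forward(received, neighbors, routing_table=None):
    """Gossip: offer every known message to every neighbor encountered."""
    return {neighbor: list(received) for neighbor in neighbors}


def routed_forward(received, neighbors, routing_table):
    """Send each message only to the neighbor that is its next hop.

    Messages whose next hop is unknown or is not a current neighbor
    are not forwarded at all.
    """
    outgoing = {neighbor: [] for neighbor in neighbors}
    for destination, payload in received:
        next_hop = routing_table.get(destination)
        if next_hop in outgoing:
            outgoing[next_hop].append((destination, payload))
    return outgoing


if __name__ == "__main__":
    messages = [("d1", "hello"), ("d2", "world"), ("d9", "lost")]
    neighbors = ["n1", "n2"]
    table = {"d1": "n1", "d2": "n2", "d9": "n7"}
    print(epidemic_forward(messages, neighbors))
    print(routed_forward(messages, neighbors, table))
```

Both functions share the same signature, so a node’s degree of cooperation is captured entirely by which function it plugs in.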
In particular, among the different cooperation schemes there are some that can achieve high global performance (as defined by information theory or by any other economic, social, or technical performance metric), e.g. in the Internet a specific routing table setting can achieve the best revenue for the operator. When an actor of the network is willing to implement the particular forwarding function that achieves the best global performance, we are in the context of “full cooperation”.
However, there is also the case where the actors in the network are selfish and would like to optimize a personal performance metric. In this case the forwarding function is tailored to the particular needs of the node itself. For example, an actor would forward specific information only to specific users in order to protect his/her privacy. Here the issue is whether we can accommodate an overall performance or benefit metric in terms of information diffusion while achieving a personal benefit. This is clearly a game-theoretic question with a Nash equilibrium flavour. A network has to accommodate user selfishness in order to grow and be sustained. The previous discussion on the prisoner’s dilemma is also fully relevant here. In an iterated prisoner’s dilemma setting, the actors in a network have to adapt their forwarding functions in order to achieve cooperation.
A third context of forwarding cooperation is what I call the “non-rational” case, where a node implements a forwarding behaviour that is not compatible with any rationality understood by the network participants. The context of “non-rational” forwarding is very similar to what is known in social science as anti-social behaviour (asocial behaviour can also exist in a network, but it results in self-exclusion from the network). Law and social norms generally delimit the frontier between anti-social and social behaviour. Acceptable social behaviour, and the corresponding actor rationality, is generally defined in opposition to anti-social behaviour, i.e. everything that is not explicitly forbidden is acceptable. However, in several cases, in particular in data network security, the dual paradigm is also used: only what is explicitly defined as licit is authorized, and everything else is forbidden and irrational (with regard to the network rationality). The solution generally implemented in societies is to monitor actors’ behaviours, to detect “non-rational” behaviours, and to confine those responsible (in jail). The same approach is valid for managing actors in a network: network actors are monitored and the ones detected as “irrational” are ejected from the network by cutting their links. This approach is radically different from the current security paradigm in data networks like the Internet, where an authentication paradigm is used. If a user is authenticated and connects to a network, he can do everything he is explicitly authorized to do (following the system policies), and no behaviour monitoring is applied to him.
V. TOWARD A NEW ECONOMIC THEORY FOR INFORMATION NETWORKS
I described previously the differences that can be observed between information and goods networks. I also described the deep implications of game-theoretic arguments for growth and cooperation. All these premises lead us to think of the economics of networks as a very fundamental brick of Internet Science. As stated in the definition, the elements of a network cooperate to exchange information; we also saw that the decision to cooperate or not is a rational decision that takes into account costs and benefits. While the issue of costs and benefits is clear in the context of ISP networks, where real money is exchanged through transit agreements and through customer payments, the issue is more complex when we think about pure information exchange. One major issue concerns the particular nature of information, which cannot simply be fitted into classical economic terms. Very frequently in network economic analysis, one assumes a value function for information that translates the information into a monetary value, and continues the analysis from that point. However, the issue of determining this value function is generally swept under the carpet. One of the reasons is that information does not fit the classical utility-based formulation of the microeconomic consumer model. This model assumes that the utility of a good for a customer is a quasi-concave function of the amount of the good. However, information lacks a basic property of a good, namely a conservation property: when you have one unit of a good you can only give it to one customer, while when you have one unit of information (for example an MP3 file), you can duplicate it at almost zero cost and generate an almost infinite supply of it. Because of this conservation property, some economists define economics as the science of sharing scarce resources so as to obtain the highest resulting value. However, as explained, information does not fit this definition.
This, to some extent, explains the major difficulty that we have in evaluating the value of “.com” companies like Google or Facebook. For a company producing a manufactured product or a service, it is possible to base an economic evaluation on tangible production, sales, and activity constraints, while for a company based on information the lack of a conservation law can lead to explosive growth.
There have been several attempts to solve the issues described above. A first attempt defines the value of information as the amount of money that somebody can gain by knowing this piece of information; for example, knowing a company’s roadmap can give a person an edge that he can benefit from in the stock market. While this definition is meaningful in some cases, in the large majority of cases it is not very effective. For example, how can you evaluate the value of the information provided by a Facebook user on his profile page? A second attempt defines the value of information as the price of the amount of energy needed to create it. This definition is completely meaningful from a thermodynamic perspective. The second law of thermodynamics states that one has to consume energy to reduce entropy and therefore to produce information. A perfect target for this definition is the Google model. Google reduces the entropy (the uncertainty) that we have about a piece of content by giving a list of websites containing the keywords we enter. To do this Google consumes energy, and one might state that the value of the information provided by Google is at least the cost of the energy used to produce it. While this definition is meaningful in some scenarios, it is not really usable in other contexts, such as the value of information in an informal chat between friends in a friendship network. A third approach to information valuation is to state that, all in all, the only thing that one can trade for information is information itself. This definition can solve the issue we had with the value of Facebook. In fact, if somebody wants to understand the success of Facebook, he has to consider that Facebook is a market for sharing pieces of privacy between people. On Facebook you put some of your private information at stake and you receive in return some bits of the privacy of others. Facebook in fact earns money by providing the platform for this privacy market.
These three attempts show that we are still far from having a consistent economic theory of information, and that such a theory is needed to fully understand the Internet and, more generally, networks. Developing such a theory is one of the major challenges of Internet Science.
VI. CONCLUSION AND FUTURE WORK
This paper was an occasion to develop some theoretical bases and foundations for an emerging new science: Internet Science. A large part of this work has resulted from discussions and exchanges with members of the European Network of Excellence EINS, funded under the FP7 FIRE activity. The aim of this Network is to develop Internet Science and to put it on strong foundations. I hope this paper is one milestone on this long path.
REFERENCES
- [1] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
- [2] R. B. Blackman, H. W. Bode, and C. E. Shannon, “Data smoothing and prediction in fire-control systems, vol. 1, gunfire control,” National Defense Research Committee, Washington, DC, USA, Summary Technical Report AD 200795, 1946.
- [3] A. L. Barabasi and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, pp. 509–512, 1999.
- [4] R. Axelrod, The Evolution of Cooperation. New York: Basic Books, 1984.
- [5] G. Szabó and G. Fáth, “Evolutionary games on graphs,” Phys. Rep., vol. 446, no. 4-6, pp. 97–216, 2007.