A centralized self-adaptive fault tolerance approach based on feedback control for multiagent systems

Our research introduces a self-adaptive fault tolerance approach for multiagent systems that enables the system to avoid crash failures. It is a replication-based approach that exploits a feedback control loop and a proportional (P) controller within a replication infrastructure. The control loop lets us observe the agents' behaviors to estimate their criticalities and then determine the number of replicas in each replica group with respect to those criticalities and the number of available resources. As a result, the agents to be replicated and the number of replicas in each replica group are identified automatically and adaptively in dynamic environments. We implement this approach and demonstrate the performance gains in a set of experiments undertaken under different operating conditions.
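The idea of sizing replica groups with a proportional controller can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the setpoint rule (each agent's share of resources proportional to its criticality), the gain `kp`, and the function names `desired_replicas`, `p_control_step`, and `simulate` are all assumptions made for illustration.

```python
import math


def desired_replicas(criticality, total_criticality, available_resources):
    """Hypothetical setpoint: resources shared in proportion to criticality."""
    if total_criticality <= 0:
        return 1.0
    share = available_resources * criticality / total_criticality
    return max(1.0, share)  # assumption: every agent keeps at least one copy


def p_control_step(current, setpoint, kp=0.5):
    """One proportional-control update of a replica count.

    error = setpoint - current; the count moves by kp * error,
    is rounded half up, and is never allowed to drop below 1.
    """
    error = setpoint - current
    updated = current + kp * error
    return max(1, int(math.floor(updated + 0.5)))


def simulate(criticalities, available_resources, steps=10, kp=0.5):
    """Run the loop for a fixed number of sampling periods.

    `criticalities` maps agent names to estimated criticality values;
    in a real system these would be re-estimated from observed behavior
    at every period, whereas here they stay fixed (an assumption).
    """
    replicas = {name: 1 for name in criticalities}
    total = sum(criticalities.values())
    for _ in range(steps):
        for name, crit in criticalities.items():
            target = desired_replicas(crit, total, available_resources)
            replicas[name] = p_control_step(replicas[name], target, kp)
    return replicas


if __name__ == "__main__":
    print(simulate({"a": 6, "b": 3, "c": 1}, available_resources=10))
```

With fixed criticalities, the replica counts in this sketch approach each agent's setpoint over a few periods rather than jumping to it at once. A higher `kp` reacts faster to changes in criticality but may overshoot, which is the usual trade-off when tuning a P controller.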

