Any complex system is subject to errors in execution, and should therefore be designed to treat the detection of an error as a source of useful information. We propose here a strategy for SSE in a multi-agent system. The strategy attempts to infer the causes of agent errors from information about an agent's communication with other atomic agents.
Detection of abnormal communication with another agent is based on communication timing patterns. Each agent collects statistics on the timing of its communication with other agents. The two main forms of communication are 1) one-way data flow, or observer, communication and 2) master-slave signals. In the first form, one agent supplies another with a steady stream of data, often at regular intervals. In this case the quantity of interest is the elapsed time between successive updates.
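As an illustration of the first form, the elapsed time between successive updates could be recorded as in the following minimal sketch. The class name UpdateIntervalTracker, its method names, and the use of plain numeric timestamps are assumptions made for this example and are not part of the proposed system.

```python
class UpdateIntervalTracker:
    """Records the elapsed time between successive updates from one supplier agent."""

    def __init__(self):
        self.last_update = None  # timestamp of the most recent update, if any
        self.intervals = []      # elapsed times between successive updates

    def record_update(self, timestamp):
        # The first update has no predecessor, so it yields no interval.
        if self.last_update is not None:
            self.intervals.append(timestamp - self.last_update)
        self.last_update = timestamp


# Hypothetical arrival times (in seconds) of updates from another agent.
tracker = UpdateIntervalTracker()
for t in [0.0, 1.0, 2.5, 3.5]:
    tracker.record_update(t)
```

An agent receiving data from several suppliers would presumably keep one such tracker per supplier, so that the timing of each communication link is judged separately.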
In the second form, two agents act as a master-slave pair. The master sends the slave a signal to begin an operation. When the slave has completed its operation, either the slave sends an acknowledgement signal or the master detects that the operation has finished by monitoring the world state. In this case, we are interested in the elapsed time between the initial signal and the completion of the slave's operation.
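For the master-slave form, the measurement could be sketched as follows. The class CompletionTimer and the operation identifier op_id are hypothetical names introduced for this example. The sketch does not distinguish between completion reported by an acknowledgement and completion observed in the world state, since both yield the same elapsed time.

```python
class CompletionTimer:
    """Measures the time from a master's start signal to completion of the slave's operation."""

    def __init__(self):
        self.pending = {}    # operation id -> time at which the start signal was sent
        self.durations = []  # elapsed times of completed operations

    def signal_sent(self, op_id, timestamp):
        self.pending[op_id] = timestamp

    def completed(self, op_id, timestamp):
        # Called when an acknowledgement arrives or when the master
        # observes in the world state that the operation has finished.
        start = self.pending.pop(op_id)
        duration = timestamp - start
        self.durations.append(duration)
        return duration


# Hypothetical example: a signal sent at t = 10.0 s and completed at t = 12.5 s.
timer = CompletionTimer()
timer.signal_sent("op1", 10.0)
elapsed = timer.completed("op1", 12.5)
```

Operations that remain in pending indefinitely never produce a duration, so a practical version would also need some rule for operations that fail to complete at all.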
Each agent will collect histograms of the pertinent communication timing. Using the data from the histogram, each agent will classify the status of its communication with other atomic agents as normal or abnormal.
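The text does not fix a particular classification rule, so the following is only one possible sketch. It assumes fixed-width histogram bins and labels a timing as abnormal when it falls in a bin that holds less than a chosen fraction of past observations. The class name TimingHistogram and the parameters bin_width, min_fraction, and min_samples are all assumptions of this example.

```python
from collections import Counter


class TimingHistogram:
    """Histogram of elapsed times with a simple rarity-based classifier (an assumed rule)."""

    def __init__(self, bin_width, min_fraction=0.05, min_samples=20):
        self.bin_width = bin_width        # width of each histogram bin, in seconds
        self.min_fraction = min_fraction  # bins rarer than this count as abnormal
        self.min_samples = min_samples    # below this, there is too little data to judge
        self.counts = Counter()
        self.total = 0

    def _bin(self, elapsed):
        return int(elapsed // self.bin_width)

    def add(self, elapsed):
        self.counts[self._bin(elapsed)] += 1
        self.total += 1

    def classify(self, elapsed):
        # With too few observations, default to "normal" rather than raise false alarms.
        if self.total < self.min_samples:
            return "normal"
        fraction = self.counts[self._bin(elapsed)] / self.total
        return "normal" if fraction >= self.min_fraction else "abnormal"


# Hypothetical history: updates that usually arrive about one second apart.
hist = TimingHistogram(bin_width=0.5)
for elapsed in [1.0, 1.1, 1.2, 0.9] * 10:
    hist.add(elapsed)

typical = hist.classify(1.05)  # falls in a well-populated bin
late = hist.classify(5.0)      # falls in a bin with no past observations
```

Under this rule a timing is judged only against the agent's own history for that communication link, so no global notion of a normal delay is required. Other rules, such as thresholds on the tail of the distribution, would fit the same histogram data.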