|Centralized System Logging|
|Contributors||Roland Bijvank, Wiebe Wiersema, Christian Köppe|
|Last modification||May 17, 2017|
|Source||Bijvank, Wiersema & Köppe (2013)|
|Pattern formats||OPR Alexandrian|
The application needs to be able to log certain events or actions. Application developers are one target audience for this logging; the other is system administrators, who use it for system management, especially for monitoring systems and whole system landscapes.
Problem and forces:
Having a variety of logging formats and log-file locations makes it hard to monitor the state of a whole enterprise, including all running applications. When an error occurs, its cause is hard to find if the relevant logs are dispersed over hundreds of servers and the right tools are missing. Problems that can arise when trying to integrate this variety of log files and formats include:
Format Variety. A wide variety of logging formats increases the complexity of integrating the information held in those log files. Reconciling their different layouts becomes a burden.
Location Variety. When log files are spread over many different locations, they are hard to find.
Information Granularity. Not only the formats but also the granularity of the logged information may vary. This makes it hard to monitor all applications consistently, or to integrate the information consistently for other statistical purposes such as root cause analysis.
Therefore: Use the built-in system logging mechanism whenever possible (preferred solution). If it is not possible, then define a standard format to be used by all systems and implement your own logger (alternative solution).
First, the preferred solution. Many monitoring tools use the system's built-in logging mechanisms, and the connection between them is well defined and proven. It therefore helps system administrators if all applications use these built-in mechanisms, because the administrators can then rely on existing tools (e.g. Nagios or HP OpenView) to collect, centralize, and search the logs.
The built-in system logging mechanisms solve the log-file location problem. They also prescribe the format, which forces developers, but also helps them, to log consistently and at an appropriate granularity.
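To illustrate the preferred solution in Python (the language used in the student exercise described later), here is a minimal sketch built on the standard library's `logging.handlers.SysLogHandler`. It assumes a syslog daemon or collector is reachable at the given address; the application name `billing` used below and the default address are illustrative assumptions only.

```python
import logging
import logging.handlers


def make_syslog_logger(app_name: str, address=("127.0.0.1", 514)) -> logging.Logger:
    """Return a logger whose records go to a syslog daemon or collector.

    A (host, port) tuple sends UDP datagrams; a string such as "/dev/log"
    uses the local Unix socket instead. The default address is an assumption.
    """
    logger = logging.getLogger(app_name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=address, facility=logging.handlers.SysLogHandler.LOG_USER
    )
    # Prefix each message with the application name so administrators can
    # filter on it in the central store; the handler encodes the severity.
    handler.setFormatter(logging.Formatter(f"{app_name}: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

On a typical Linux host one would pass `address="/dev/log"` so the local daemon receives the records and can forward them to the central server. Calling the function twice for the same name would attach a second handler, so a real setup would configure logging once at start-up.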
It is also much easier to automatically generate incidents in an IT service management (ITSM) tool from specifically defined events in the built-in system log. The ITSM tool can be configured to forward these automatically generated incidents directly, without human intervention, to the second-line specialists. Incidents are then solved with less human intervention, saving the system administrators valuable time.
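Automatic incident creation can be sketched as a custom `logging.Handler` that reacts only to specifically defined events. Everything ITSM-related here is hypothetical: the `Incident` record, the `event_code` attribute (passed through logging's standard `extra` argument), the routing table, and the `submit` callable that stands in for whatever interface the actual ITSM tool offers.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Incident:
    """Hypothetical incident record; a real ITSM tool defines its own schema."""
    summary: str
    assignment_group: str


class IncidentHandler(logging.Handler):
    """Create an incident for selected error events, without human triage."""

    def __init__(self, routing: Dict[str, str], submit: Callable[[Incident], None]):
        # Only ERROR and above can ever become incidents.
        super().__init__(level=logging.ERROR)
        self.routing = routing  # event code -> second-line team (assumed mapping)
        self.submit = submit    # stand-in for the ITSM tool's interface

    def emit(self, record: logging.LogRecord) -> None:
        code = getattr(record, "event_code", None)
        group = self.routing.get(code)
        if group is None:
            return  # not a specifically defined event: it stays in the log only
        try:
            self.submit(Incident(
                summary=f"[{code}] {record.getMessage()}",
                assignment_group=group,
            ))
        except Exception:
            self.handleError(record)  # never let incident creation crash the app
```

With the routing `{"DB-001": "database-team"}`, a call such as `logger.error("Connection pool exhausted", extra={"event_code": "DB-001"})` would yield one incident for `database-team`, while errors without a mapped code remain log entries only.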
In many cases, logging has to be activated from within the system, so developers often have to program it explicitly. Using the built-in logging mechanism alone, however, does not ensure that developers actually log when it is appropriate. To address this, guidelines for including logging in the system can be defined and followed by the developers.
Now the alternative solution. If it is not possible to use the built-in system logging, e.g. because different operating systems are in use, then develop your own  and define a standard for your system landscape that works well with the administration tools in use. Use the properties of the built-in system logging mechanisms as the basis for the requirements of your own logging mechanism. The most important point is that this mechanism can be connected to the ITSM tools used by the system administrators. Ensure that this standard system is actually used for logging. This approach can be combined with . Another option is to use a "logging as a service" provider: one simply forwards the syslog (or installs the provider's agents) and the provider collects the data. An example of such a provider is Papertrail.
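For the alternative solution, the core of a home-grown logger is an agreed record layout. A minimal sketch, assuming JSON lines as the in-house standard, is a `logging.Formatter` subclass that every system attaches to its handlers. The field names are hypothetical; in practice they should follow from the properties of the built-in mechanism and from what the ITSM tooling needs.

```python
import json
import logging
import socket
from datetime import datetime, timezone


class StandardFormatter(logging.Formatter):
    """Render every record in one agreed, machine-readable layout (assumed schema)."""

    def __init__(self, system: str):
        super().__init__()
        self.system = system              # which system in the landscape logs this
        self.host = socket.gethostname()  # where the record was produced

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "host": self.host,
            "system": self.system,
            "severity": record.levelname,
            "component": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep tracebacks, but inside the same structure.
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, sort_keys=True)
```

Because every system emits the same keys, a collector or a logging-as-a-service provider can index all sources uniformly instead of reconciling per-application layouts.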
Some requirements a good log should meet to be valuable are:
—Log actions before they happen.
—Mind the file size if logs should be copied or archived.
—Split messages into different files depending on the intended audience and the way they are used.
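The three requirements above map onto standard `logging` features. This sketch assumes two audiences (developers and administrators) with separate, size-bounded files via `RotatingFileHandler`, and it logs each action before performing it. The logger name, file names, size limit, and the hypothetical `ship` callable are illustrative choices, not prescriptions.

```python
import logging
import logging.handlers
from pathlib import Path
from typing import Callable


def configure_audience_logs(log_dir: Path) -> logging.Logger:
    """One logger, two files split by audience, both bounded in size."""
    logger = logging.getLogger("orders")
    logger.setLevel(logging.DEBUG)
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")

    # Developer log: everything, rotated at ~1 MB so copies and archives stay small.
    dev = logging.handlers.RotatingFileHandler(
        log_dir / "developer.log", maxBytes=1_000_000, backupCount=5)
    dev.setLevel(logging.DEBUG)
    dev.setFormatter(fmt)

    # Administrator log: only what operations has to act on.
    admin = logging.handlers.RotatingFileHandler(
        log_dir / "admin.log", maxBytes=1_000_000, backupCount=5)
    admin.setLevel(logging.WARNING)
    admin.setFormatter(fmt)

    logger.addHandler(dev)
    logger.addHandler(admin)
    return logger


def ship_order(logger: logging.Logger, order_id: str, ship: Callable[[str], None]) -> None:
    # Log the action *before* performing it: if the process dies mid-action,
    # the log still shows what was being attempted.
    logger.info("Shipping order %s", order_id)
    try:
        ship(order_id)
    except Exception:
        logger.exception("Shipping order %s failed", order_id)
        raise
```

The INFO line reaches only the developer file, while the failure (with its traceback) reaches both, so administrators see what needs action without the routine noise.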
How robust the chosen solution has to be in daily use depends on the type of usage:
—When normal availability is sufficient, one can choose to recover the server and reload the logging of the various subsystems whenever the centralized logging system fails.
—When high availability is needed, the centralized logging system itself must be made highly available, e.g. as a High Availability cluster.
Otherwise the chosen solution could become a Single Point of Failure (SPoF).
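Besides making the central logging system itself highly available, an application can avoid depending on one collector node. A minimal client-side sketch, assuming syslog over TCP and a list of equivalent collector endpoints (e.g. the nodes of such a cluster): try each in turn and fall back to a local file if none accepts a connection. This only selects a target at start-up and does not fail over mid-run; connection attempts to unreachable remote hosts may also block until the operating system times out (Python 3.11 added a `timeout` parameter to `SysLogHandler`).

```python
import logging
import logging.handlers
import socket
from typing import Iterable, Tuple


def connect_collector(collectors: Iterable[Tuple[str, int]], fallback_file: str) -> logging.Handler:
    """Return a handler for the first collector that accepts a TCP connection.

    If no collector is reachable, fall back to a local file so that logging
    never depends on a single server being up.
    """
    for host, port in collectors:
        try:
            # With SOCK_STREAM the handler connects immediately and raises
            # OSError (e.g. ConnectionRefusedError) if the node is down.
            return logging.handlers.SysLogHandler(
                address=(host, port), socktype=socket.SOCK_STREAM)
        except OSError:
            continue  # try the next node
    return logging.FileHandler(fallback_file)
```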
Because one wants just one instance of a system logger,  seems to be the preferred way to implement it. However, to be able to test it alongside the production version, several instances should be possible, so the number of instances should be parameterizable.
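One way to reconcile "just one instance" with testability is a registry that hands out exactly one shared object per instance name and caps how many names may exist (a variant sometimes called a multiton). In this sketch the cap is the class attribute `max_instances`, which could be read from configuration; the class name, the instance names, and the `system.` logger prefix are assumptions.

```python
import logging
import threading


class SystemLogger:
    """Bounded registry: one shared object per instance name, at most `max_instances` names."""

    max_instances = 2  # e.g. "production" plus one "test" instance (assumed default)
    _instances = {}
    _lock = threading.Lock()

    def __init__(self, name: str):
        # Callers should use SystemLogger.instance() rather than the constructor.
        self.name = name
        self.logger = logging.getLogger(f"system.{name}")

    @classmethod
    def instance(cls, name: str = "production") -> "SystemLogger":
        with cls._lock:  # make creation safe when several threads start logging at once
            if name not in cls._instances:
                if len(cls._instances) >= cls.max_instances:
                    raise RuntimeError(
                        f"at most {cls.max_instances} logger instances allowed")
                cls._instances[name] = cls(name)
            return cls._instances[name]
```

Production code calls `SystemLogger.instance()` and always gets the same object, while a test suite can request `SystemLogger.instance("test")` without touching the production instance.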
Many monitoring tools provide a mechanism for gathering several logs in one central place, but a distributed log collector is even easier to use:
—Scribe is a scalable log aggregation server used and released as open source by Facebook. Scribe is written in C++ and uses Thrift for the protocol encoding; since it uses Thrift, virtually any language can work with it.
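To make the idea of a central collector concrete without reproducing Scribe (whose Thrift interface is not shown here), the following toy sketch receives plain syslog datagrams over UDP from any number of senders and appends them to one store. It decodes the `<PRI>` prefix, where PRI = facility * 8 + severity as defined for syslog. The function names and the in-memory list used as the store are hypothetical.

```python
import re
import socket
from typing import List

PRI_RE = re.compile(rb"^<(\d{1,3})>(.*)$", re.DOTALL)
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def parse_syslog(datagram: bytes) -> dict:
    """Split a syslog datagram into facility, severity and message text."""
    m = PRI_RE.match(datagram)
    if not m:
        # No priority prefix: keep the text, mark the metadata as unknown.
        return {"facility": None, "severity": None,
                "text": datagram.decode("utf-8", "replace")}
    pri = int(m.group(1))
    return {
        "facility": pri >> 3,            # PRI = facility * 8 + severity
        "severity": SEVERITIES[pri & 7],
        # Python's SysLogHandler appends a NUL byte by default; strip it.
        "text": m.group(2).rstrip(b"\x00").decode("utf-8", "replace"),
    }


def collect(sock: socket.socket, store: List[dict], count: int) -> None:
    """Receive `count` datagrams from any sender into one central store."""
    for _ in range(count):
        data, addr = sock.recvfrom(65535)
        entry = parse_syslog(data)
        entry["source"] = addr[0]  # sender's IP address
        store.append(entry)
```

A production collector such as Scribe adds buffering, forwarding between tiers, and durable storage, which this sketch deliberately leaves out.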
As an example of the implementation of we have performed some practical scripting exercises in Python with our second-year System and Network Engineering students. Amongst other things, they use standard Python libraries to log events to the system event log and afterwards create a statistical plot of these events with the Python library Matplotlib. An example of a call from Python to the Windows system log is:
This way the students get a feeling for how to integrate information from several sources (systems and applications) into one central store (the system event log), and how to transform that information into graphical output that can give insights into, e.g., the number of incidents per month at error level.
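The statistics part of such an exercise can be sketched as follows, assuming the event-log entries have already been read into `(timestamp, level)` pairs; reading them is platform-specific and not shown. The function names and the output file name are hypothetical, and the level string `"ERROR"` is an assumed convention.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def errors_per_month(entries: Iterable[Tuple[datetime, str]]) -> Dict[str, int]:
    """Count error-level entries per calendar month, keyed "YYYY-MM" in date order."""
    counts = Counter(ts.strftime("%Y-%m") for ts, level in entries if level == "ERROR")
    return dict(sorted(counts.items()))


def plot_errors_per_month(counts: Dict[str, int], path: str = "errors_per_month.png") -> None:
    """Save a bar chart of the monthly counts (requires Matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(counts.keys()), list(counts.values()))
    ax.set_xlabel("Month")
    ax.set_ylabel("Incidents at error level")
    fig.savefig(path)
    plt.close(fig)
```

`plot_errors_per_month(errors_per_month(entries))` would then write the chart. Matplotlib is imported inside the plotting function so the counting part also works where Matplotlib is not installed.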
- Bijvank, R., Wiersema, W., & Köppe, C. (2013). Software architecture patterns for system administration support. In Proceedings of the 20th Conference on Pattern Languages of Programs (PLoP 2013) (p. 1). The Hillside Group.
- Paschke, A., & Schnappinger-Gerull, E. (2006). A Categorization Scheme for SLA Metrics. Service Oriented Electronic Commerce, 80, 25-40.
- Nagios. (2014). Nagios. http://www.nagios.org/. Accessed: 25 April 2014.
- HP OpenView. (2014). HP OpenView. http://en.wikipedia.org/wiki/HP_OpenView. Accessed: 25 April 2014.
- Limoncelli, T. A. (2011). A plea from sysadmins to software vendors: 10 Do’s and Don’ts. Communications of the ACM, 54(2), 50–51.
- Harrison, N. B. (2011). Improving quality attributes of software systems through software architecture patterns. Ph.D. thesis, University Library Groningen.
- Papertrail. (2014). Papertrail. https://papertrailapp.com/. Accessed: 25 April 2014.
- Logging Anti Patterns. (2014). logging-anti-patterns.
- Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1994). Design Patterns: elements of reusable object-oriented software. Addison-Wesley: Boston, MA.
- Scribe. (2014). Scribe. https://github.com/facebook/scribe. Accessed: 25 April 2014.
- Thrift. (2014). Thrift. http://thrift.apache.org/. Accessed: 25 April 2014.
- Flume. (2014). Flume. http://flume.apache.org/. Accessed: 25 April 2014.
- HDFS. (2014). HDFS. http://hadoop.apache.org/. Accessed: 25 April 2014.