Logging (computing)
In computing, logging is the act of keeping a log of events that occur in a computer system, such as problems, errors, or general information on current operations. These events may occur in the operating system or in other software. A message or log entry is recorded for each such event. These log messages can then be used to monitor and understand the operation of the system, to debug problems, or during an audit. Logging is particularly important in multi-user software, to have a central overview of the operation of the system.
In the simplest case, messages are written to a file, called a log file.[1] Alternatively, the messages may be written to a dedicated logging system or to log management software, where they are stored in a database or on a different computer system.
Specifically, a transaction log is a log of the communications between a system and the users of that system,[2] or a data collection method that automatically captures the type, content, or time of transactions made by a person from a terminal with that system.[3] For Web searching, a transaction log is an electronic record of interactions that have occurred during a searching episode between a Web search engine and users searching for information on that Web search engine.
Many operating systems, software frameworks and programs include a logging system. A widely used logging standard is Syslog, defined in IETF RFC 5424.[4] The Syslog standard enables a dedicated, standardized subsystem to generate, filter, record, and analyze log messages. This relieves software developers of having to design and code their ad hoc logging systems.[5][6][7]
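As an illustration of this division of labor, the following minimal sketch hands messages from application code to the system-wide Syslog subsystem rather than an ad hoc file writer, using Python's standard library. It assumes a Unix-like host whose syslog daemon listens on the /dev/log socket; the logger name "myapp" is arbitrary.

```python
import logging
import logging.handlers

# Route application messages to the local syslog daemon instead of an
# ad hoc log file. '/dev/log' is the usual Unix domain socket; on other
# systems a (host, port) tuple such as ('localhost', 514) works instead.
logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)

handler = logging.handlers.SysLogHandler(
    address="/dev/log",
    facility=logging.handlers.SysLogHandler.LOG_USER,
)
handler.setFormatter(logging.Formatter("myapp: %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("service started")          # severity 6 (informational)
logger.error("disk quota exceeded")     # severity 3 (error)
```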
Types
Event logs
Event logs record events taking place in the execution of a system in order to provide an audit trail that can be used to understand the activity of the system and to diagnose problems. They are essential for understanding the activities of complex systems, particularly in the case of applications with little user interaction (such as server applications).
It can also be useful to combine log file entries from multiple sources. This approach, in combination with statistical analysis, may yield correlations between seemingly unrelated events on different servers. Other solutions employ network-wide querying and reporting.[8][9]
Transaction logs
Most database systems maintain some kind of transaction log, which is not mainly intended as an audit trail for later analysis and is not intended to be human-readable. These logs record changes to the stored data to allow the database to recover from crashes or other data errors and to maintain the stored data in a consistent state. Thus, database systems usually have both general event logs and transaction logs.[10][11][12][13]
The use of data stored in transaction logs of Web search engines, Intranets, and Web sites can provide valuable insight into understanding the information-searching process of online searchers.[14] This understanding can enlighten information system design, interface development, and devising the information architecture for content collections.
Message logs
Internet Relay Chat (IRC), instant messaging (IM) programs, peer-to-peer file sharing clients with chat functions, and multiplayer games (especially MMORPGs) commonly have the ability to automatically save textual communication, both public (IRC channel/IM conference/MMO public/party chat messages) and private chat between users, as message logs.[15] Message logs are almost universally plain text files, but IM and VoIP clients (which support textual chat, e.g. Skype) might save them in HTML files or in a custom format to ease reading or enable encryption.
In the case of IRC software, message logs often include system/server messages and entries related to channel and user changes (e.g. topic changes, user joins/exits/kicks/bans, nickname changes, user status changes), making them more like a combined message/event log of the channel in question, but such a log is not comparable to a true IRC server event log, because it only records user-visible events for the time frame the user spent connected to a certain channel.
Instant messaging and VoIP clients often offer the chance to store encrypted logs to enhance the user's privacy. These logs require a password to be decrypted and viewed, and they are often handled by their respective writing application. Some privacy-focused messaging services, such as Signal, record minimal logs about users, limiting their information to connection times.[16]
Server logs
A server log is a log file (or several files) automatically created and maintained by a server consisting of a list of activities it performed.
A typical example is a web server log which maintains a history of page requests. The W3C maintains a standard format (the Common Log Format) for web server log files, but other proprietary formats exist.[9] Some servers can log information in computer-readable formats (such as JSON) in addition to the human-readable standard.[17] More recent entries are typically appended to the end of the file. Information about the request, including client IP address, request date/time, page requested, HTTP code, bytes served, user agent, and referrer, is typically added. This data can be combined into a single file, or separated into distinct logs, such as an access log, error log, or referrer log. However, server logs typically do not collect user-specific information.
These files are usually not accessible to general Internet users, only to the webmaster or other administrative person of an Internet service. A statistical analysis of the server log may be used to examine traffic patterns by time of day, day of week, referrer, or user agent. Efficient web site administration, adequate hosting resources and the fine tuning of sales efforts can be aided by analysis of the web server logs.
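For illustration, a single line in the Common Log Format can be broken into its fields with a short script such as the following sketch. The sample line is the canonical CLF example, and the regular expression is one plausible way to tokenize the format, not a canonical parser.

```python
import re

# One request line in Common Log Format (CLF): host, identity, user,
# timestamp, request line, status code, and bytes sent.
line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'

CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

match = CLF.match(line)
if match:
    entry = match.groupdict()
    # Fields are now addressable for traffic or error analysis.
    print(entry["host"], entry["status"], entry["request"])
```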
See also
- Digital traces – One's unique set of traceable digital activities
- Log management – Process of managing log data
- Logging as a service – Software architecture for ingesting logs
- XML log
- Tracing (software) § Event logging - comparing software tracing with event logging
- Security event management § Event logs - with a focus on security management
References
[edit]- ^ DeLaRosa, Alexander (February 8, 2018). "Log Monitoring: not the ugly sister". Pandora FMS. Archived from the original on February 14, 2018. Retrieved February 14, 2018.
A log file is a text file or XML file used to register the automatically produced and time-stamped documentation of events, behaviors and conditions relevant to a particular system.
- ^ Peters, Thomas A. (1993-02-01). "The history and development of transaction log analysis". Library Hi Tech. 11 (2): 41–66. doi:10.1108/eb047884. ISSN 0737-8831.
- ^ Rice, Ronald E.; Borgman, Christine L. (1983). "The use of computer-monitored data in information science and communication research". Journal of the American Society for Information Science. 34 (4): 247–256. doi:10.1002/asi.4630340404. ISSN 0002-8231.
- ^ R. Gerhards (March 2009). The Syslog Protocol. Network Working Group. doi:10.17487/RFC5424. RFC 5424. Proposed Standard. Obsoletes RFC 3164.
- ^ "XML Logging :: WinSCP". winscp.net. 16 June 2022.
- ^ "Use XML for Log Files". CodeProject. August 22, 2008.
- ^ "Turn Your Log Files into Searchable Data Using Regex and the XML Classes". learn.microsoft.com. 24 June 2011.
- ^ "Log File Viewer - SQL Server". learn.microsoft.com. 28 February 2023.
- ^ a b "Extended Log File Format". www.w3.org.
- ^ "The Transaction Log (SQL Server) - SQL Server". learn.microsoft.com. 27 September 2023.
- ^ Stankovic, Ivan (February 11, 2014). "A beginner's guide to SQL Server transaction logs".
- ^ "Understanding the importance of transaction logs in SQL Server". TechRepublic. November 11, 2004.
- ^ "Logfiles". www.neurobs.com.
- ^ Jansen, Bernard J. (2006). "Search log analysis: What it is, what's been done, how to do it". Library & Information Science Research. 28 (3). Elsevier BV: 407–432. doi:10.1016/j.lisr.2006.06.005. ISSN 0740-8188.
- ^ "LogFile Class (Microsoft.SqlServer.Management.Smo)". learn.microsoft.com.
- ^ Brandom, Russell (2 January 2018). "Iran blocks encrypted messaging apps amid nationwide protests". The Verge. Vox Media. Archived from the original on 22 March 2018. Retrieved 23 March 2018.
- ^ "How Logging Works - Caddy Documentation". caddyserver.com.
Logging involves two complementary activities: instrumentation, in which developers insert calls to a logging API (such as Java's java.util.logging or Python's logging module) directly into source code to emit messages at specified severity levels such as DEBUG, INFO, WARN, or ERROR; and management, which encompasses collecting, storing, and analyzing these logs using tools for parsing, searching, and visualization.[1] Common challenges include avoiding excessive logging that impacts performance (log bloat) or insufficient detail that hinders diagnosis, often addressed through configurable levels and structured formats like JSON for machine-readable output.[6] Modern practices emphasize integration with DevOps pipelines, where logs feed into centralized systems for automated alerting and compliance reporting.[7]
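A minimal sketch of the instrumentation side, using Python's standard logging module (the logger name and messages are illustrative), shows how severity levels let one configurable threshold decide which statements are recorded:

```python
import logging

# Instrumentation: developers insert logging calls at the severity that
# matches each event, then filter at runtime via a configurable threshold.
logging.basicConfig(
    level=logging.INFO,  # DEBUG messages fall below this threshold
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("payments")

logger.debug("cache miss for rate table")      # suppressed: below INFO
logger.info("payment accepted")                # recorded
logger.warning("retrying gateway call")        # recorded
logger.error("payment declined by processor")  # recorded
```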
Fundamentals
Definition and Purpose
In computing, logging is the process of systematically collecting, storing, and managing records of events, operations, and states that occur within software applications, hardware components, operating systems, and networks.[8] This practice generates timestamped documentation of system activities, enabling visibility into runtime behavior and facilitating subsequent analysis.[9] In networked environments, standardized protocols such as syslog support the transmission of these event notifications across devices, ensuring consistent logging across distributed systems.[10]

The primary purposes of logging encompass debugging and troubleshooting, where developers and administrators use logs to trace errors, reproduce issues, and resolve software or system faults; auditing and compliance, by documenting user actions and transactions to satisfy regulatory requirements like FISMA, HIPAA, SOX, or PCI DSS; performance monitoring, to identify bottlenecks, track resource utilization, and optimize operational efficiency; and security analysis, for detecting intrusions, anomalies, and policy violations through examination of access patterns and event sequences.[8][11] For instance, authentication logs might record failed login attempts to aid in intrusion detection, while application traces help pinpoint performance degradation in real-time operations.[11]

Key benefits of effective logging include enabling post-event analysis for retrospective reviews of system incidents, supporting root cause identification to prevent recurrence of failures, and enhancing overall system reliability by providing actionable insights into operational health.[8] Common examples of logged data encompass timestamps for event sequencing, user IDs for accountability, error codes indicating specific faults, and resource usage metrics such as CPU or memory consumption to gauge system load.[8][11]

History and Evolution
Logging in computing traces its origins to the 1960s with the advent of mainframe systems, where mechanisms for recording system activities emerged to support auditing and error tracking in batch processing environments. IBM's System/360, announced in 1964, represented a pivotal development in this era, incorporating audit trails that logged operations to ensure reliability and compliance in large-scale data processing.[12] These early practices laid the groundwork for systematic event recording, primarily focused on operational integrity rather than real-time analysis.

During the 1970s and 1980s, logging evolved alongside Unix systems, shifting toward more standardized and centralized approaches. The syslog protocol, developed by Eric Allman in the early 1980s as part of the Sendmail project at the University of California, Berkeley, became a cornerstone for Unix-like operating systems by enabling the transmission and aggregation of log messages across networks.[13] This innovation addressed the growing need for remote logging in distributed environments, influencing system administration practices for decades.[14]

The 1990s marked significant advancements as logging integrated with web and database technologies amid the internet's expansion. Apache HTTP Server, first released in 1995, introduced access logs to capture details of HTTP requests, facilitating web traffic analysis and security monitoring in the burgeoning online ecosystem.[15] Concurrently, relational database management systems increasingly adopted transaction logging to support recovery and ACID properties, with evolving SQL standards providing transaction control statements like COMMIT and ROLLBACK to manage transaction boundaries.

In the 2000s and 2010s, the rise of distributed and cloud computing drove innovations in log structure and aggregation. Structured logging formats, such as JSON—specified by Douglas Crockford in the early 2000s—gained traction for their machine-readable properties, improving searchability and parsing in complex applications.[16] The ELK Stack, comprising Elasticsearch (launched in 2010 by Shay Banon), Logstash, and Kibana, revolutionized log aggregation by providing scalable search and visualization for distributed systems.[17]

Post-2020 developments have emphasized intelligent analysis and regulatory compliance in logging. AI-assisted techniques, including anomaly detection in tools like AWS CloudWatch Logs, have automated the identification of unusual patterns in log data using machine learning models.[18] Since the enforcement of the General Data Protection Regulation (GDPR) in 2018, logging practices have adapted to prioritize data minimization and access controls for personal information, ensuring compliance while maintaining audit trails.[19]

Logging Mechanisms
Levels and Severity
In logging systems, levels and severity provide a hierarchical classification for categorizing log entries based on their importance, urgency, and context, enabling developers and administrators to prioritize, filter, and manage logs effectively. This standardization originated with the syslog protocol, which defines eight severity levels ranging from the most critical to the least, allowing for consistent handling across diverse systems.[10]

The syslog severity levels, as specified in RFC 5424, are assigned numerical values from 0 to 7 and serve as a foundational model for many logging implementations. These levels guide the assignment of priorities to messages, ensuring that critical issues receive immediate attention while routine information does not overwhelm storage or analysis resources. The levels are defined as follows:

| Numerical Code | Severity Level | Description |
|---|---|---|
| 0 | Emergency | System is unusable |
| 1 | Alert | Action must be taken immediately |
| 2 | Critical | Critical conditions |
| 3 | Error | Error conditions |
| 4 | Warning | Warning conditions |
| 5 | Notice | Normal but significant condition |
| 6 | Informational | Informational messages |
| 7 | Debug | Debug-level messages |
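The numeric ordering is what makes threshold-based filtering possible: an entry is kept when its code is at or below the configured severity. The following sketch mirrors the table above; the event data and the default threshold of 4 (Warning) are illustrative.

```python
# Syslog severities from RFC 5424: a lower numeric code is more severe.
SEVERITIES = {
    0: "Emergency", 1: "Alert", 2: "Critical", 3: "Error",
    4: "Warning", 5: "Notice", 6: "Informational", 7: "Debug",
}

def keep(entry_severity: int, threshold: int = 4) -> bool:
    """Keep entries at the threshold severity or worse (numerically lower)."""
    return entry_severity <= threshold

events = [(3, "disk failure on /dev/sda"), (6, "user logged in"), (4, "low memory")]
for code, message in events:
    if keep(code):
        print(f"{SEVERITIES[code]}: {message}")
```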
Formats and Structures
Log formats in computing are broadly categorized into unstructured and structured types, each serving distinct purposes in data organization and analysis. Unstructured formats, typically consisting of plain text lines, prioritize human readability and simplicity, often following predefined patterns without rigid schemas. A prominent example is the Apache Common Log Format (CLF), which records web server access events in a space-separated line: remote host IP, RFC 1413 identity (often "-"), username (or "-"), timestamp in brackets, quoted request line, status code, and bytes sent. This format enables quick manual inspection but complicates automated parsing due to its free-form message components.[23]

In contrast, structured formats employ machine-readable schemas to facilitate parsing, querying, and integration with analytics tools. Common implementations include JSON, which encapsulates log data as key-value pairs—for instance, {"timestamp": "2023-01-01T00:00:00Z", "level": "error", "message": "Failed login"}—allowing nested fields for rich context; XML, which uses tagged elements for hierarchical representation; and Avro, a binary serialization format that pairs a JSON schema with compact data storage for efficient transmission in distributed systems. These formats support interoperability across tools like log aggregators and big data platforms by enforcing consistent field definitions.[24][25]
Regardless of format, log entries typically include mandatory components to ensure traceability and utility. The timestamp, standardized under ISO 8601 for unambiguous representation (e.g., YYYY-MM-DDTHH:MM:SSZ), captures the event occurrence; the logger name identifies the originating module or service; and the message conveys the core event description. Optional fields, such as thread ID or user context, provide additional diagnostic details without compromising core structure. Severity levels, like "error" or "info," are often embedded as fields in structured logs to classify events.[26][27][28]
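Combining these components, a structured logger might emit one JSON object per entry, as in this sketch built on Python's logging module; the field names follow the conventions described above and are otherwise illustrative.

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            # ISO 8601 timestamp for unambiguous event sequencing
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "logger": record.name,        # originating module or service
            "level": record.levelname,    # severity embedded as a field
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("auth")
logger.addHandler(handler)

logger.error("Failed login")
# {"timestamp": "...", "logger": "auth", "level": "ERROR", "message": "Failed login"}
```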
Established standards further promote consistency and interoperability in log data exchange. The Syslog protocol, defined in RFC 3164 for legacy BSD-style messages and updated in RFC 5424 for enhanced structure (including version, timestamp, hostname, app-name, process ID, and message ID), enables reliable transmission of event notifications across networks. For security-focused logging, the Common Event Format (CEF), developed by ArcSight (now Micro Focus), standardizes text-based events with a header (device vendor, product, version) followed by extensions for fields like signature ID and event description, supporting multi-vendor integration.[29][10][30]
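A CEF event line can be assembled from the pipe-delimited header fields and key=value extensions described above, as in the following simplified sketch; the vendor, product, and extension values are invented, and the character escaping that full CEF requires is omitted.

```python
def cef_line(vendor, product, version, signature_id, name, severity, **extensions):
    """Build a CEF event: pipe-delimited header, then key=value extensions."""
    ext = " ".join(f"{key}={value}" for key, value in extensions.items())
    return f"CEF:0|{vendor}|{product}|{version}|{signature_id}|{name}|{severity}|{ext}"

print(cef_line("Acme", "Firewall", "2.1", "1003", "Port scan detected", 7,
               src="10.0.0.5", dst="10.0.0.9", spt=443))
# CEF:0|Acme|Firewall|2.1|1003|Port scan detected|7|src=10.0.0.5 dst=10.0.0.9 spt=443
```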
Over time, log formats have evolved from predominantly human-readable unstructured text to machine-oriented structured schemas, driven by the demands of big data processing. This shift enhances scalability for tools like Splunk, where structured data enables efficient indexing, searching, and correlation of high-volume logs in distributed environments.[31][32]
Types of Logs
Event Logs
Event logs in computing are records of discrete events occurring within an operating system or application, such as startups, shutdowns, errors, or exceptions, serving primarily for diagnostic and historical analysis. These logs capture happenings that indicate system or software behavior, enabling administrators and developers to reconstruct sequences of actions for investigation. For instance, the Windows Event Log system maintains records of significant software and hardware events, including operational successes, warnings, and failures.[33][34]

Key characteristics of event logs include timestamping to denote when the event occurred, sequential ordering to preserve chronology, unique event IDs for identification, and source indicators distinguishing between origins like the kernel, drivers, or user applications. These attributes facilitate efficient querying and filtering, with logs often stored in structured formats such as binary files or databases for quick retrieval. In the Windows implementation, each event entry includes a timestamp, event ID, source name, and category to categorize the log entry precisely.[34][35]

Event logs are commonly used for troubleshooting system crashes, where details like stack traces in error entries help pinpoint failure points, and for ongoing system health monitoring to detect anomalies or performance degradation early. By analyzing these logs, IT teams can perform root cause analysis for incidents, such as hardware faults or software bugs, ensuring proactive maintenance and issue resolution.[36][34]

Examples of event logging implementations include the Windows Event Viewer, which aggregates OS-level events from sources like the system kernel or services for centralized viewing. In Linux environments, systemd-journald collects and stores structured event data from the system and applications in binary journals, supporting efficient searching and rotation. For application-level events, Java's java.util.logging package enables developers to record component-specific events via LogRecord objects, which include timestamps, levels, and messages for debugging purposes. Event logs may incorporate severity levels, such as debug or error, to filter entries during analysis.[33][37][38]
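As a rough illustration of such entries at the application level, Python's logging module can attach event IDs and source indicators to each timestamped record; the IDs and source names below are illustrative, not standard Windows codes.

```python
import logging

# Application-level event logging: each entry carries a timestamp, an
# event ID, and a source indicator, mirroring the fields described above.
logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(source)s] event=%(event_id)s %(message)s"
)
logger = logging.getLogger("system")

logger.warning("service restarted after crash",
               extra={"event_id": 7034, "source": "service-manager"})
logger.error("driver failed to load",
             extra={"event_id": 7000, "source": "kernel"})
```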
Transaction Logs
Transaction logs in computing are specialized records that capture the sequence and state of atomic operations, primarily in databases and distributed systems, to ensure the ACID properties of transactions—atomicity, consistency, isolation, and durability. These logs maintain a durable, sequential history of changes, enabling the system to recover from failures by replaying or rolling back operations as needed. A foundational technique for implementing transaction logs is write-ahead logging (WAL), where modifications to data are first appended to the log on stable storage before being applied to the primary data structures, guaranteeing that committed transactions survive crashes.[39][40]

Key elements of transaction logs include before-images and after-images of affected data, which provide the necessary information for undoing uncommitted changes (rollback) or reapplying committed ones (redo) during recovery. Commit and rollback points explicitly mark transaction boundaries, with commit records ensuring all prior log entries are flushed to durable storage, while rollback generates compensation records to reverse effects without altering the original log. Timestamps, often implemented as log sequence numbers (LSNs), order operations chronologically and correlate log entries with data page states, facilitating precise recovery. These elements collectively support fault tolerance by allowing the system to reconstruct the database state post-failure.[40]

Primary use cases for transaction logs involve failure recovery, such as replaying logs after a crash to redo committed transactions and undo incomplete ones, thereby restoring consistency without data loss. In replication scenarios, transaction logs enable high availability by streaming changes to secondary nodes; for instance, MySQL's binary logs record data-modifying events like inserts, updates, and deletes, which are then applied on replicas to maintain synchronized copies of the database. This approach supports point-in-time recovery and scales read operations across distributed systems.[39][41]

Seminal examples illustrate the impact of transaction logging. The ARIES algorithm, introduced in 1992, provides a robust recovery framework based on WAL, incorporating analysis, redo, and undo passes over the log to handle partial rollbacks and fine-granularity locking efficiently, influencing modern database engines like IBM DB2. In blockchain systems, transaction logs form an immutable ledger where transactions are bundled into timestamped blocks, cryptographically linked to prior blocks via hashes, ensuring tamper-evidence and permanence for applications like cryptocurrency transfers.[40][42]
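The write-ahead discipline can be illustrated with a toy key-value store; the file name, record layout, and redo-only recovery pass are deliberate simplifications of systems like ARIES.

```python
import json
import os

# Write-ahead logging in miniature: every change is appended and flushed
# to the log *before* it touches the in-memory store, so a crash between
# the two steps can be repaired by replaying the log.
class WalStore:
    def __init__(self, path="data.wal"):
        self.path = path
        self.data = {}

    def set(self, key, value):
        record = {"op": "set", "key": key, "after": value,
                  "before": self.data.get(key)}  # before-image enables undo
        with open(self.path, "a") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the record onto stable storage
        self.data[key] = value      # apply only after the log is durable

    def recover(self):
        if not os.path.exists(self.path):
            return
        with open(self.path) as log:
            for line in log:        # redo pass: replay logged changes
                record = json.loads(line)
                self.data[record["key"]] = record["after"]

store = WalStore()
store.set("balance", 100)
```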
Message Logs
Message logs in computing are specialized records that capture the exchange of messages within messaging systems, such as message queues, brokers, and protocols like SMTP, to track communication flows for operational and regulatory purposes. These logs document the lifecycle of messages from production to consumption, enabling visibility into asynchronous interactions in distributed environments. Unlike broader event logs, message logs focus specifically on inter-system communications, such as those in publish-subscribe models or request-response patterns.[43][44]

Key characteristics of message logs include details on the originator and destination of each message, often represented by identifiers like producer IDs, exchange names, or IP addresses; partial or redacted payload content to balance utility with privacy; and status indicators such as delivery confirmation, redelivery flags, or failure reasons. For instance, in SMTP protocol logging on Microsoft Exchange servers, entries record the date, time, client and server IP addresses, session identifiers, and the sequence of SMTP commands and responses exchanged during message transfer, which implicitly capture sender and recipient envelope information without full body content.[45] In Apache Kafka, topic logs store messages as immutable append-only sequences partitioned across brokers, including each message's key, value (payload), timestamp, and optional headers, with delivery semantics ensuring at-least-once guarantees via replication and acknowledgments.[43] Payloads in these logs are frequently truncated or anonymized in production systems to mitigate data exposure risks, particularly for sensitive communications.[11]

Message logs serve critical use cases in debugging and compliance. For debugging, they facilitate tracing integration failures, such as undelivered API payloads or stalled queue processing, by replaying message sequences to identify bottlenecks or errors in distributed workflows.[11] In compliance scenarios, especially for healthcare systems handling electronic protected health information (ePHI), message logs fulfill audit requirements under the HIPAA Security Rule, which mandates mechanisms to record and examine activity in systems transmitting ePHI, including message origins, destinations, and access events to demonstrate accountability and detect breaches.[46]

Representative examples illustrate these applications. In RabbitMQ, the firehose tracer logs all messages routed through a virtual host, capturing publish events with exchange names (as sender proxies), routing keys, properties, and full message bodies, alongside delivery events noting queue names (as receivers) and redelivery status for troubleshooting routing issues.[47] Similarly, Amazon Simple Queue Service (SQS) employs dead-letter queues to isolate and log messages that exceed maximum receive attempts due to processing failures, preserving the original message attributes, body, and enqueue timestamps for post-mortem analysis and redrive policies.[48] These mechanisms ensure that failed deliveries in asynchronous messaging do not propagate errors while providing durable records for recovery.
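The dead-letter pattern described above can be sketched generically, without any particular broker, as a queue that sets failed messages aside once a receive limit is reached; the limit, message body, and always-failing handler below are illustrative.

```python
from collections import deque

# Generic dead-letter pattern, as used by SQS-style queues: a message that
# fails processing more than MAX_RECEIVES times is moved aside with its
# body and metadata preserved for post-mortem analysis.
MAX_RECEIVES = 3
queue = deque([{"body": "charge order 42", "receive_count": 0}])
dead_letters = []

def process(message) -> bool:
    return False  # simulate a handler that always fails

while queue:
    msg = queue.popleft()
    msg["receive_count"] += 1
    if process(msg):
        continue                  # acknowledged and removed
    if msg["receive_count"] >= MAX_RECEIVES:
        dead_letters.append(msg)  # isolate for inspection and redrive
    else:
        queue.append(msg)         # redeliver for another attempt

print(dead_letters)  # [{'body': 'charge order 42', 'receive_count': 3}]
```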
Server Logs
Server logs, also known as access logs or web server logs, are detailed records generated by web servers to document incoming HTTP requests, server responses, and operational states in networked environments. These logs capture interactions between clients and the server, providing a chronological audit trail of activities such as resource access and error occurrences. For instance, Nginx access logs record all processed requests in a configurable format, enabling administrators to track server behavior across various contexts.[49] Similarly, Apache HTTP Server's access logs detail every request handled by the server, supporting both standard and custom formats for flexibility.[23] In Microsoft IIS, logging records HTTP transactions and errors to facilitate site management and diagnostics.[50]

The key elements typically included in server logs follow standardized formats like the Common Log Format (CLF) or extended variants, ensuring interoperability across servers. Essential fields encompass:

- Client IP address: Identifies the origin of the request (e.g., %h in Apache).[23]
- Request method and URL: Specifies the HTTP method (e.g., GET, POST) and targeted resource (e.g., "%r" in Apache or $request in Nginx).[23][49]
- Status code: Indicates the response outcome (e.g., 200 for success, 404 for not found; %>s in Apache or $status in Nginx).[23][49]
- Response time: Measures processing duration (e.g., $request_time in Nginx or time taken in IIS).[49][50]
- User agent: Reveals the client's browser or device details (e.g., %{User-agent}i in Apache).[23]

Additional fields, such as bytes sent (%b or $bytes_sent), timestamp (%t or $time_local), and referer (%{Referer}i or $http_referer), enhance analysis in combined formats used by Apache and Nginx.[23][49] In IIS, logs in W3C extended format include similar elements like server IP, bytes received/sent, and protocol status for comprehensive tracking.[50]
Server logs serve critical use cases in traffic analysis and security monitoring. For traffic analysis, they enable identification of peak loads by aggregating request volumes over time, revealing patterns in user activity and resource demands to inform capacity planning.[51] In security contexts, logs help detect anomalies such as DDoS attacks through spikes in 404 errors or excessive requests from specific IPs, allowing rapid response to mitigate threats like HTTP floods.[52] Examples include Apache's combined access and error logs, which integrate request details with failure diagnostics, and IIS logs, which support auditing for unauthorized access patterns.[23][50] Server logs may also incorporate API message details, linking to broader message logging practices.[49]
Implementation Approaches
In Software Development
In software development, logging is integrated into codebases through dedicated APIs and libraries that enable developers to insert log statements programmatically at key points in the application logic. These libraries provide structured ways to capture events, errors, and diagnostic information, facilitating debugging and monitoring without disrupting core functionality. For instance, Apache Log4j, a widely used Java logging framework originating in 1996, allows developers to log messages using simple method calls like logger.info("Message"), supporting configurable appenders for output to files, consoles, or remote systems.[53] Similarly, Serilog, a .NET library introduced in 2013, emphasizes structured logging with message templates that embed properties as key-value pairs, enabling queries on log data post-collection.[54] In Node.js environments, Winston serves as a versatile logger since its early adoption around 2010, offering transports for multiple outputs and levels of abstraction for custom formatting.[55]
Best integration practices emphasize centralized logging to maintain consistency across large codebases, often achieved through facades that abstract underlying libraries and allow swapping implementations without code changes. For example, using a facade like SLF4J in Java decouples application code from specific loggers, promoting portability and reducing vendor lock-in. Conditional logging further optimizes integration by enabling logs only in appropriate contexts, such as debug mode, to avoid performance overhead in production; this can be implemented via environment checks or level thresholds, like logging at DEBUG only when a flag is set.[56]
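Conditional logging of this kind might look as follows in Python, where the isEnabledFor check plays the role of Java's logger.isDebugEnabled() idiom and the expensive summary function is an invented stand-in for real diagnostic work:

```python
import logging

logger = logging.getLogger("importer")

def rows_summary(rows):
    """Expensive diagnostic computation we only want in debug mode."""
    return f"{len(rows)} rows, sample={rows[:3]}"

rows = list(range(10_000))

# Guarding the call avoids paying for the summary when DEBUG is disabled.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("import snapshot: %s", rows_summary(rows))
```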
For testing and debugging, developers incorporate unit tests to verify log output, ensuring that expected messages are emitted under specific conditions, such as error scenarios, using mocking to capture and assert log calls without side effects.[57] Correlation IDs enhance traceability by assigning unique identifiers to requests, which are propagated through log entries to link related events across modules or services, aiding in root cause analysis during debugging.
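One common way to propagate a correlation ID is a logging filter that annotates every record, sketched here with Python's logging module; the filter class and logger names are illustrative.

```python
import logging
import uuid

class CorrelationIdFilter(logging.Filter):
    """Stamp every record with the current request's correlation ID."""
    def __init__(self, correlation_id: str):
        super().__init__()
        self.correlation_id = correlation_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = self.correlation_id
        return True  # never drop records, only annotate them

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(correlation_id)s %(levelname)s %(message)s"))
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.addFilter(CorrelationIdFilter(str(uuid.uuid4())))

logger.warning("inventory low")   # every entry now carries the same ID,
logger.error("payment timeout")   # so related events can be linked later
```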
In microservices architectures, logging integrates with distributed tracing standards like OpenTelemetry, established in 2020 through the merger of OpenTracing and OpenCensus projects, to correlate logs with trace spans.[58] OpenTelemetry enables this by injecting trace and span IDs into log records, allowing developers to link application logs directly to performance traces for end-to-end visibility across services.[59]
