AUTOMATED PROCESSES AND SYSTEMS FOR PERFORMING LOG MESSAGE CURATION
Automated computer-implemented processes and systems described herein are directed to performing curation of log messages. The automated processes and systems filter unacceptable character strings from log messages to obtain curated text statements. The curated text statements contain human-readable text that enables a reader to understand the underlying messages contained in the log messages.
Processes and systems that perform log curation on log messages generated in a distributed computing system.
BACKGROUND
Data centers execute thousands of applications that enable businesses, governments, and other organizations to offer services over the Internet. These organizations cannot afford problems that result in downtime or slow performance of their applications. Problems frustrate users, damage a brand name, result in lost revenue, and deny people access to vital services. In order to aid system administrators and application owners with detection of problems, various management tools have been developed to collect performance information about applications, services, and hardware. A typical log management tool, for example, records log messages generated by various operating systems and applications executing in a data center. Each log message is an unstructured or semi-structured time-stamped message that records information about the state of an operating system, state of an application, state of a service, or state of computer hardware at a point in time. Most log messages record benign events, such as input/output operations, client requests, logins, logouts, and statistical information about the execution of applications, operating systems, computer systems, and other devices of a data center. For example, a web server executing on a computer system generates a stream of log messages, each of which describes a date and time of a client request, web address requested by the client, and IP address of the client. Other log messages record diagnostic information, such as alarms, warnings, errors, or emergencies.
Software engineers, developers, and troubleshooting teams use log messages to troubleshoot root causes of problems and monitor execution of an application and the systems that support execution of the application. However, problems with large distributed applications do not arise suddenly. Observable problems with large distributed applications often result from hidden problems that occur in the background or when no one is paying attention. Detection of a problem is further complicated because most large distributed applications running in a data center can generate millions of log messages per day, of which only a small fraction can be used to troubleshoot the root cause of a problem with an application. As a result, unnoticed problems are often recorded in log messages that are buried deep in log files that contain millions of log messages, making manual detection of such log messages challenging, error prone, and extremely time consuming and expensive. For example, consider a batch job that stores results in a data file in response to a user request. A batch job is a non-interactive program that runs off hours or runs in the background while interactive programs run in the foreground. Suppose that when the batch job runs, there is a Null Pointer Exception error in the program that was not noticed during debugging of the program. A Null Pointer Exception error occurs when a variable is declared in a program, but a value is not assigned to the variable before the variable is used to store a data value, resulting in data that should have been assigned to the variable not being written to a data file. When the error occurs during execution of the program, a log message describing the error is recorded in a log file along with millions of other log messages generated that day. However, because the batch job runs unnoticed by users, the problem with no data being written to the data file goes unnoticed until a user carefully inspects the data file.
Debugging an application, such as a batch job, at runtime is an ongoing challenge for developers, architects, and administrators of the application. Even with log management tools, discovering the root cause of an application problem is often performed by different teams of software engineers, including a field team, an escalation team, and a research and development team. Within each team, the search for a root cause is gradually narrowed by filtering millions of log messages through different sub-teams that examine and search for log messages that reveal specific problems. The troubleshooting process can take weeks and, in some cases, months. These long periods spent troubleshooting a problem often lead to increased cost for the organization and can lead to mistakes in processing transactions and to people being denied access to services provided by an organization. Developers, administrators, and application owners seek automated methods and systems that use log messages to reduce the time to discovery of root causes of problems in applications.
SUMMARY
Automated computer-implemented processes and systems described herein are directed to performing curation of log messages produced by log message sources of an application running in a distributed computing system. In one implementation, an automated process retrieves log messages that represent one or more classes of log messages with time stamps in a user-selected time interval from a log file stored in a log message database in response to receiving the time interval from a user via a graphical user interface (“GUI”). The process uses a Grok engine to construct a Grok expression for each of the log messages. The process filters unacceptable character strings from the log messages to obtain curated text statements based on the Grok expressions and acceptable character strings. The process displays the curated text statements. The curated text statements contain human-readable text that enables a reader to understand the underlying messages contained in the log messages.
This disclosure presents automated computer-implemented processes and systems that perform curation of log messages produced by log message sources of an application running in a distributed computing system. Log messages and log files are described below in a first subsection. An example of a log management server executed in a distributed computing system is described below in a second subsection. Processes and systems for performing curation of log messages are described below in a third subsection.
Log Messages and Log Files
As the log management server receives log messages from various event sources, the log messages are stored in corresponding log files in the order in which the log messages are received.
In large, distributed computing systems, such as a data center, terabytes of log messages may be generated each day. The log messages may be sent to a log management server that records the log messages in separate log files that correspond to event sources. The log files are in turn stored in data-storage appliances.
The log management server 702 uses Grok expressions that correspond to the log messages to extract character strings and parameters from the log messages. A Grok expression is a language parsing expression that is unique to the format of a class of log messages and is used by the log management server 702 to extract character strings (e.g., words, terms, and alphanumeric character strings) and parameters from log messages that match the format of the Grok expression. Grok expressions are formed from Grok patterns, which are in turn representations of regular expressions. A regular expression, also called a “regex,” is a sequence of symbols that defines a search pattern in text data. Regular expressions are specifically constructed to match strings of characters in log messages and can become lengthy and extremely complex. For example, because log messages are unstructured, different types of regular expressions are configured to match the various character strings used to record a date and time in the time stamp portion of a log message. Grok patterns are predefined symbolic representations of regular expressions that significantly reduce the complexity of manually constructing regular expressions. Grok patterns are categorized as either primary Grok patterns or composite Grok patterns that are formed from primary Grok patterns. A Grok pattern is called and executed using Grok syntax notation denoted by %{Grok pattern}. When a representative log message does not have a corresponding Grok expression, the log management server 702 automatically generates a corresponding Grok expression for the representative log message. The log management server 702 performs automated methods for constructing Grok expressions for each of the log messages using a Grok engine described in U.S. patent application Ser. No. 17/008,755, filed Sep. 1, 2020, which is owned by VMware Inc. and is herein incorporated by reference.
A composite Grok pattern is formed from two or more primary Grok patterns. Composite Grok patterns may also be formed from combinations of composite Grok patterns and combinations of composite Grok patterns and primary Grok patterns.
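The way composite Grok patterns resolve to regular expressions can be illustrated with a minimal Python sketch. The pattern names KEYVALUE and KEYVALUE_PAIR, the two-entry primary table, and the `expand` helper are hypothetical and chosen only for illustration; they are not the pattern definitions used by the log management server 702 or by any particular Grok library.

```python
import re

# Hypothetical, heavily simplified primary patterns (assumptions for illustration).
PRIMARY = {
    "INT": r"[+-]?\d+",
    "WORD": r"\w+",
}

# Hypothetical composite patterns.
COMPOSITE = {
    # Composite formed from two primary patterns.
    "KEYVALUE": r"%{WORD}=%{INT}",
    # Composite formed from other composite patterns.
    "KEYVALUE_PAIR": r"%{KEYVALUE},%{KEYVALUE}",
}

# Matches a pattern reference written in the %{NAME} notation.
REFERENCE = re.compile(r"%\{(\w+)\}")


def expand(name: str) -> str:
    """Recursively expand a pattern name into a plain regular expression."""
    if name in PRIMARY:
        return PRIMARY[name]
    template = COMPOSITE[name]
    # Each reference is replaced by the expansion of the pattern it names,
    # wrapped in a non-capturing group so that alternations stay contained.
    return REFERENCE.sub(lambda m: "(?:" + expand(m.group(1)) + ")", template)


keyvalue_regex = expand("KEYVALUE")
pair_regex = expand("KEYVALUE_PAIR")
```

In this sketch, `re.fullmatch(pair_regex, "cpu=4,mem=16")` succeeds because both halves expand to the same word-equals-integer regular expression.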
Composite Grok patterns also include user defined Grok patterns, such as composite Grok patterns defined by a user. User defined Grok patterns may be formed from any combination of composite and/or primary Grok patterns. For example, a user may define a Grok pattern MYCUSTOMPATTERN as the combination of Grok patterns %{TIMESTAMP_ISO8601} and %{HOSTNAME}, where TIMESTAMP_ISO8601 is a composite Grok pattern listed in the table of
The log management server 702 uses Grok patterns to map specific character strings into dedicated variable identifiers. Grok syntax for using a Grok pattern to map a character string to a variable identifier is given by:
%{GROK_PATTERN:variable_name}
where
- GROK_PATTERN represents a Grok pattern; and
- variable_name is a variable identifier assigned to a character string in text data that matches the GROK_PATTERN.
A Grok expression is a parsing expression that is constructed from Grok patterns that match character strings in text data and is used to parse character strings of a log message. Consider, for example, the following simple example segment of a log message:
34.5.243.1 GET index.html 14763 0.064
The five character strings of the segment are “34.5.243.1,” “GET,” “index.html,” “14763,” and “0.064.” A Grok expression that may be used to parse the example segment is given by:
^%{IP:ip_address}\s%{WORD:word}\s%{URIPATHPARAM:request}\s%{INT:bytes}\s%{NUMBER:duration}$
The hat symbol “^” identifies the beginning of a Grok expression. The dollar sign symbol “$” identifies the end of a Grok expression. The symbol “\s” matches spaces between character strings in the log message. The Grok expression parses the example segment by assigning the character strings of the log message to the variable identifiers of the Grok expression as follows:
- ip_address: 34.5.243.1
- word: GET
- request: index.html
- bytes: 14763
- duration: 0.064
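The assignment above can be reproduced with a short Python sketch that translates each %{GROK_PATTERN:variable_name} reference into a named capture group. The regular expressions in `PATTERNS` are simplified stand-ins chosen for this example, not the definitions shipped with any Grok library, and the sketch assumes the character strings of the segment are separated by single spaces. The helper names `grok_to_regex` and `parse` are hypothetical.

```python
import re

# Simplified stand-ins for the Grok patterns used in the example
# (assumptions for illustration only).
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "URIPATHPARAM": r"\S+",
    "INT": r"[+-]?\d+",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
}

# Matches a reference of the form %{GROK_PATTERN:variable_name}.
GROK_REFERENCE = re.compile(r"%\{(\w+):(\w+)\}")


def grok_to_regex(expression: str) -> re.Pattern:
    """Replace every %{PATTERN:name} with a named capture group."""
    def to_group(match):
        pattern_name, variable_name = match.group(1), match.group(2)
        return "(?P<{}>{})".format(variable_name, PATTERNS[pattern_name])
    return re.compile(GROK_REFERENCE.sub(to_group, expression))


def parse(expression: str, message: str):
    """Return a dict of variable identifiers to character strings, or None."""
    match = grok_to_regex(expression).match(message)
    return match.groupdict() if match else None


EXPRESSION = (
    r"^%{IP:ip_address}\s%{WORD:word}\s%{URIPATHPARAM:request}"
    r"\s%{INT:bytes}\s%{NUMBER:duration}$"
)
fields = parse(EXPRESSION, "34.5.243.1 GET index.html 14763 0.064")
```

Every captured value is returned as a character string; converting "bytes" or "duration" to numeric types would be a separate step.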
The log management server 702 forms tokens from the character strings and associated Grok patterns denoted by “character_string|Grok_pattern.” For example, a token formed from the character string “GET” and the corresponding Grok pattern “WORD” for the example segment above is “GET|WORD”. The log management server 702 performs a filtering operation in which Grok patterns of the tokens are compared with Grok patterns in a list of disallowed Grok patterns persisted in a data-storage device. A token with a Grok pattern that matches a Grok pattern in the list of disallowed Grok patterns is denied and not used in construction of a set of curated text of a log message. By contrast, a token with a Grok pattern that does not match any of the Grok patterns in the list of disallowed Grok patterns is allowed to proceed to a next phase of filtering in construction of a set of curated text.
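The token filtering step might be sketched in Python as follows. The contents of `DISALLOWED_GROK_PATTERNS` are an assumption made for this example; the disclosure does not state which Grok patterns are on the disallowed list. The function names are likewise hypothetical.

```python
# Hypothetical list of disallowed Grok patterns (an assumption for illustration).
DISALLOWED_GROK_PATTERNS = {"IP", "INT", "NUMBER", "URIPATHPARAM"}


def make_tokens(parsed):
    """Form "character_string|Grok_pattern" tokens from (string, pattern) pairs."""
    return ["{}|{}".format(character_string, grok_pattern)
            for character_string, grok_pattern in parsed]


def filter_tokens(tokens, disallowed=DISALLOWED_GROK_PATTERNS):
    """Keep only tokens whose Grok pattern is not on the disallowed list."""
    allowed = []
    for token in tokens:
        # Split on the last "|" so a "|" inside the character string is preserved.
        _character_string, grok_pattern = token.rsplit("|", 1)
        if grok_pattern not in disallowed:
            allowed.append(token)
    return allowed


tokens = make_tokens([
    ("34.5.243.1", "IP"),
    ("GET", "WORD"),
    ("index.html", "URIPATHPARAM"),
    ("14763", "INT"),
    ("0.064", "NUMBER"),
])
allowed_tokens = filter_tokens(tokens)
```

With this assumed list, only the token "GET|WORD" proceeds to the next phase of filtering.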
After filtering based on disallowed Grok patterns, the log management server 702 filters character strings of corresponding allowed Grok patterns by discarding character strings comprised of only special characters, such as brackets, parentheses, a comma, a period, an exclamation point, and any special symbols (e.g., @, #, $, %, &, and *). The log management server 702 determines the number of characters in non-discarded character strings and discards character strings that fail to satisfy the following condition:
maxstring_length > length(character_string) > minstring_length (1)
where
- length(character_string) is the string length or number of characters in the character string character_string;
- maxstring_length is the maximum string length (i.e., maximum number of characters); and
- minstring_length is the minimum string length (i.e., minimum number of characters).
For example, the maxstring_length may be set to 30 and the minstring_length may be set to 2.
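A minimal Python sketch of these two checks follows. It assumes that "special characters" can be approximated by the ASCII punctuation set in Python's `string.punctuation`, and it applies the strict inequalities of Equation (1) with the example limits of 30 and 2. The function names are hypothetical.

```python
import string

MAX_STRING_LENGTH = 30  # maxstring_length from the example above
MIN_STRING_LENGTH = 2   # minstring_length from the example above

# Assumption: ASCII punctuation stands in for the special characters.
SPECIAL_CHARACTERS = set(string.punctuation)


def is_only_special(character_string: str) -> bool:
    """True when every character of a non-empty string is a special character."""
    return len(character_string) > 0 and all(
        ch in SPECIAL_CHARACTERS for ch in character_string
    )


def satisfies_length(character_string: str,
                     max_len: int = MAX_STRING_LENGTH,
                     min_len: int = MIN_STRING_LENGTH) -> bool:
    """Equation (1): max_len > length(character_string) > min_len (strict)."""
    return max_len > len(character_string) > min_len


def filter_strings(character_strings):
    """Discard special-character-only strings and strings failing Equation (1)."""
    return [s for s in character_strings
            if not is_only_special(s) and satisfies_length(s)]


kept = filter_strings(
    ["GET", "[]", "##", "ok", "unavailable", "x" * 30, "connection-reset"]
)
```

Because the inequalities are strict, a two-character string such as "ok" and a thirty-character string are both discarded under the example limits.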
The log management server 702 compares each character string that satisfies the condition in Equation (1) to allowed character strings persisted in an allowed character string database. The allowed character string database includes a DBMS that stores and retrieves allowed character strings from a data-storage device. The allowed character string database comprises user-selected character strings that appear in log messages and are allowed in curated texts. The allowed character strings selected by a user may be terms created by software engineers that describe specific types of data center objects or resources utilized by an application or named components of an application that aid a reader in understanding how the curated text obtained from a log message relates to the application.
When a character string matches an allowed character string in the allowed character string database, the log management server 702 adds the character string to a set of curated text. On the other hand, when a character string does not match an allowed character string, the log management server 702 uses a natural language processor (“NLP”) engine to assign a probability to the character string denoted by Prob(character_string). The NLP engine is a trained neural network that receives a character string as input and outputs a probability that the character string is a word used in natural language. The log management server 702 tags the character string with the probability output from the NLP engine. If the probability of the character string satisfies the following condition:
Prob(character_string)>Thprob (2)
where Thprob is a probability threshold, the log management server 702 adds the corresponding character string to a set of curated text. For example, the probability threshold may be set to 0.60 or 0.70.
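The allowed-string lookup and the probability test of Equation (2) can be sketched in Python as follows. The function `toy_word_probability` is a placeholder for the NLP engine: it is a fixed vocabulary lookup, not a trained neural network, and its vocabulary and output values are invented for this example. The allowed character string "datastore" and the function name `build_curated_text` are also hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

PROBABILITY_THRESHOLD = 0.60  # Thprob; the text gives 0.60 or 0.70 as examples


def toy_word_probability(character_string: str) -> float:
    """Placeholder for the NLP engine (NOT a trained model)."""
    vocabulary = {"connection", "failed", "request", "timeout"}
    return 0.95 if character_string.lower() in vocabulary else 0.05


def build_curated_text(
    character_strings: Iterable[str],
    allowed_strings: Set[str],
    word_probability: Callable[[str], float] = toy_word_probability,
    threshold: float = PROBABILITY_THRESHOLD,
) -> Tuple[List[str], Dict[str, float]]:
    """Return the set of curated text (in order) and the probability tags."""
    curated: List[str] = []
    probability_tags: Dict[str, float] = {}
    for character_string in character_strings:
        if character_string in allowed_strings:
            # Allowed character strings are added without consulting the scorer.
            curated.append(character_string)
            continue
        probability = word_probability(character_string)
        probability_tags[character_string] = probability  # tag with Prob(...)
        if probability > threshold:                        # Equation (2)
            curated.append(character_string)
    return curated, probability_tags


curated, tags = build_curated_text(
    ["datastore", "connection", "xk9f2", "failed"],
    allowed_strings={"datastore"},
)
```

Passing the scorer in as a callable keeps the thresholding logic independent of whichever model actually produces the probabilities.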
The log management server 702 discards duplicate character strings from the set of curated text output from the process described above with reference to
The log management server 702 compares each of the character strings in the curated text statement with problem character strings that signify a problem with the application. When a problem character string is detected, the log management server 702 tags the curated text statement with the problem character string. Examples of problem character strings that signify a problem with the application include “error,” “warning,” “critical,” “alert,” “alarm,” “unavailable,” “not found,” “failed,” and “failure.” The log management server 702 uses the tags to identify curated text statements associated with problems in a GUI.
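Tagging a curated text statement with problem character strings might look like the following Python sketch. It assumes case-insensitive, whole-word matching, which is a design choice of this example rather than something the disclosure specifies; under that choice a plural such as "errors" would not match "error". The function name `tag_problems` is hypothetical.

```python
import re

# Problem character strings listed in the text above.
PROBLEM_STRINGS = [
    "error", "warning", "critical", "alert", "alarm",
    "unavailable", "not found", "failed", "failure",
]


def tag_problems(curated_text_statement: str) -> list:
    """Return the problem character strings found in the statement."""
    text = curated_text_statement.lower()
    tags = []
    for problem in PROBLEM_STRINGS:
        # \b anchors keep "alert" from matching inside a longer word.
        if re.search(r"\b" + re.escape(problem) + r"\b", text):
            tags.append(problem)
    return tags


example_tags = tag_problems("Connection to datastore failed host unavailable")
```

A statement with an empty tag list would be treated as not associated with a problem.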
The methods described below with reference to
Those skilled in the art will recognize that any of many different implementation and design parameters, including choice of operating system, virtualization layer, programming language, modular organization, control structures, data structures, and other such design and implementation parameters can be varied to generate a variety of alternative implementations of automated computer-implemented processes and systems for performing log message curation and discovery of problems in an application. The automated processes and systems described herein can be integrated into any of a variety of different automated-application-deployment facilities.
It is appreciated that the previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these embodiments will be apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. An automated computer-implemented process for curating log messages generated by event sources of an application, the process comprising:
- displaying a graphical user interface (“GUI”) that enables a user to input a start time and an end time of a time interval and start the automated computer-implemented process for curating the log messages;
- retrieving log messages that represent different classes of the log messages with time stamps in the time interval from a log file stored in a log message database;
- using a Grok engine to construct a Grok expression for each log message that represents one of the classes;
- filtering unacceptable character strings from the log messages that represent the one or more classes to obtain curated text statements based on the Grok expressions and acceptable character strings; and
- displaying the curated text statements in a GUI, the curated text statements containing human-readable text that enables a reader to understand the underlying messages contained in the log messages.
2. The process of claim 1 wherein the filtering unacceptable character strings from the log messages to obtain curated text statements comprises:
- for each log message, filtering disallowed character strings from the log message based on Grok patterns of the Grok expression; filtering character strings from the log message with special characters and character strings with string lengths that are greater than a maximum string length or less than a minimum string length; forming a set of curated text from acceptable character strings; and merging character strings of the set of curated text into a curated text statement.
3. The process of claim 2 wherein filtering disallowed character strings from the log message comprises:
- parsing character strings of the log message using a corresponding Grok expression;
- initializing a set of curated text to the empty set;
- for each character string of the log message and corresponding Grok pattern of the Grok expression,
- comparing the Grok pattern to Grok patterns of disallowed Grok patterns in a disallowed Grok patterns database; and
- discarding the character string when the Grok pattern matches a Grok pattern in the disallowed Grok pattern database.
4. The process of claim 2 wherein filtering character strings from the log message with special characters and character strings with string lengths that are greater than a maximum string length or less than a minimum string length comprises:
- for each character string of the log message, initializing a counter to zero; for each character in the character string, discarding the character string when a character in the character string matches a special character; incrementing the counter when the character does not match a space; comparing the counter to the maximum string length and the minimum string length when the character does not match a space; and discarding the character string when the counter is greater than the maximum string length or less than the minimum string length.
5. The process of claim 2 wherein forming the set of curated text from acceptable character strings comprises:
- for each character string of the log message, comparing the character string to character strings in an allowed character string database; adding the character string to a set of curated text associated with the log message when the character string matches a character string in the allowed character string database; inputting the character string to a natural language processing (“NLP”) engine that outputs a probability that the character string is a natural language word when the character string does not match a character string in the allowed character string database; tagging the character string with the probability output from the NLP engine; and adding the character string to a set of curated text associated with the log message when the probability is greater than a probability threshold.
6. A computer system for curating log messages generated by event sources of an application, the system comprising:
- one or more processors;
- one or more data-storage devices; and
- machine-readable instructions stored in the one or more data-storage devices that when executed using the one or more processors control the system to perform operations comprising: displaying a graphical user interface (“GUI”) that enables a user to input a start time and an end time of a time interval and start the automated computer-implemented process for curating the log messages; retrieving log messages that represent different classes of the log messages with time stamps in the time interval from a log file stored in a log message database; using a Grok engine to construct a Grok expression for each log message that represents one of the classes; filtering unacceptable character strings from the log messages that represent the one or more classes to obtain curated text statements based on the Grok expressions and acceptable character strings; and displaying the curated text statements in a GUI, the curated text statements containing human-readable text that enables a reader to understand the underlying messages contained in the log messages.
7. The computer system of claim 6 wherein the filtering unacceptable character strings from the log messages to obtain curated text statements comprises:
- for each log message, filtering disallowed character strings from the log message based on Grok patterns of the Grok expression; filtering character strings from the log message with special characters and character strings with string lengths that are greater than a maximum string length or less than a minimum string length; forming a set of curated text from acceptable character strings; and merging character strings of the set of curated text into a curated text statement.
8. The computer system of claim 7 wherein filtering disallowed character strings from the log message comprises:
- parsing character strings of the log message using a corresponding Grok expression;
- initializing a set of curated text to the empty set;
- for each character string of the log message and corresponding Grok pattern of the Grok expression, comparing the Grok pattern to Grok patterns of disallowed Grok patterns in a disallowed Grok patterns database; and discarding the character string when the Grok pattern matches a Grok pattern in the disallowed Grok pattern database.
9. The computer system of claim 7 wherein filtering character strings from the log message with special characters and character strings with string lengths that are greater than a maximum string length or less than a minimum string length comprises:
- for each character string of the log message, initializing a counter to zero; for each character in the character string, discarding the character string when a character in the character string matches a special character; incrementing the counter when the character does not match a space; comparing the counter to the maximum string length and the minimum string length when the character does not match a space; and discarding the character string when the counter is greater than the maximum string length or less than the minimum string length.
10. The computer system of claim 7 wherein forming the set of curated text from acceptable character strings comprises:
- for each character string of the log message, comparing the character string to character strings in an allowed character string database; adding the character string to a set of curated text associated with the log message when the character string matches a character string in the allowed character string database; inputting the character string to a natural language processing (“NLP”) engine that outputs a probability that the character string is a natural language word when the character string does not match a character string in the allowed character string database; tagging the character string with the probability output from the NLP engine; and adding the character string to a set of curated text associated with the log message when the probability is greater than a probability threshold.
11. A computer-readable medium encoded with machine-readable instructions that when executed by one or more processors of a computer system cause the computer system to perform operations comprising:
- displaying a graphical user interface (“GUI”) that enables a user to input a start time and an end time of a time interval and start the automated computer-implemented process for curating log messages generated by event sources of an application;
- retrieving log messages that represent different classes of the log messages with time stamps in the time interval from a log file stored in a log message database;
- using a Grok engine to construct a Grok expression for each log message that represents one of the classes;
- filtering unacceptable character strings from the log messages that represent the one or more classes to obtain curated text statements based on the Grok expressions and acceptable character strings; and
- displaying the curated text statements in a GUI, the curated text statements containing human-readable text that enables a reader to understand the underlying messages contained in the log messages.
12. The medium of claim 11 wherein the filtering unacceptable character strings from the log messages to obtain curated text statements comprises:
- for each log message, filtering disallowed character strings from the log message based on Grok patterns of the Grok expression; filtering character strings from the log message with special characters and character strings with string lengths that are greater than a maximum string length or less than a minimum string length; forming a set of curated text from acceptable character strings; and merging character strings of the set of curated text into a curated text statement.
13. The medium of claim 12 wherein filtering disallowed character strings from the log message comprises:
- parsing character strings of the log message using a corresponding Grok expression;
- initializing a set of curated text to the empty set;
- for each character string of the log message and corresponding Grok pattern of the Grok expression, comparing the Grok pattern to Grok patterns of disallowed Grok patterns in a disallowed Grok patterns database; and discarding the character string when the Grok pattern matches a Grok pattern in the disallowed Grok pattern database.
14. The medium of claim 12 wherein filtering character strings from the log message with special characters and character strings with string lengths that are greater than a maximum string length or less than a minimum string length comprises:
- for each character string of the log message, initializing a counter to zero; for each character in the character string, discarding the character string when a character in the character string matches a special character; incrementing the counter when the character does not match a space; comparing the counter to the maximum string length and the minimum string length when the character does not match a space; and discarding the character string when the counter is greater than the maximum string length or less than the minimum string length.
15. The medium of claim 12 wherein forming the set of curated text from acceptable character strings comprises:
- for each character string of the log message, comparing the character string to character strings in an allowed character string database; adding the character string to a set of curated text associated with the log message when the character string matches a character string in the allowed character string database; inputting the character string to a natural language processing (“NLP”) engine that outputs a probability that the character string is a natural language word when the character string does not match a character string in the allowed character string database; tagging the character string with the probability output from the NLP engine; and adding the character string to a set of curated text associated with the log message when the probability is greater than a probability threshold.
Type: Application
Filed: Oct 26, 2021
Publication Date: Apr 27, 2023
Applicant: VMware, Inc. (Palo Alto, CA)
Inventors: Chandrashekhar Jha (Bangalore), Siddartha Laxman LK (Bangalore), Akash Srivstava (Bangalore), Yash Bhatnagar (Bangalore), Naveen Mudnal (Bangalore)
Application Number: 17/511,341