Presentation: Fast Log Analysis by Automatically Parsing Heterogeneous Log
This presentation is now available to view on InfoQ.com
Watch videoWhat You’ll Learn
-
Hear how parsing logs is extremely challenging. However, there are approaches that originate in machine learning that can be used to make sense of automating the parsing of heterogeneous logs.
-
Learn interesting approaches to log parsing and backed by a reference implementation used in a commercial product.
-
Understand the challenges to parsing logs automatically.
Abstract
Most log analysis tools provide platforms for indexing, monitoring, and visualizing logs. Although these tools allow users to relatively easily perform ad-hoc queries and define rules in order to generate alerts, they do not provide automated log parsing support. In particular, most of these systems use regular expressions (RegEx) to parse log messages. These tools assume that the users know how to work with RegEx. and make them manually parse or define the fields of interest. By definition, these tools support only supervised parsing as human input is essential. However, human involvement is clearly non-scalable for heterogeneous and continuously evolving log message formats in systems such as IoTs and custom applications -- it is impossible to manually review the sheer number of logs generated in an hour, let alone days and weeks. On top of that, writing RegEx-based parsing rules is a long, frustrating, and error-prone process as RegEx rules may conflict with each other. In this talk, we present a solution inspired by the unsupervised machine learning techniques for automatically generating RegEx rules from a set of logs with no (or minimal) human involvement. Human involvement is limited to providing a set of training logs. In addition, we present a demo illustrating how to integrate our solution with the popular Elasticsearch-Logstash-Kibana (ELK) stack to analyze logs collected from the real-world applications.
Who is the main audience the talk is targeting?
The talk is mainly targeting people who design/architect log analytics solutions and are focused on making the troubleshooting operational problems faster by analyzing logs. When a computer operates, it generates logs to communicate with humans -- logs act as tweets to inform system status. If something fails, somebody has to understand the logs and take necessary steps to correct it. This talk is about how people parse those logs in a form that is one level up in analytics.
What's the motivation for the talk?
When we initially started building the log analysis product for commercial purposes, we experienced bottleneck situations pretty quick. You have a log, but, unless you parse it, you cannot build any useful tools/analytics with it -- this is kind of limited. Since every log is really different (I mean there is no consistent form of logging), it's become very hard to automate.
To solve this problem, we say: “Ok, if this is automated, it doesn't need to be 100% perfect to start log parsing with no (or minimal) human input about the logs, but at least it will help people to get it started with the log analysis. Over the time, if more input is provided, then the automated process will act like a human expert. ” So you throw any logs and the system comes up with some regular expression based patterns. Logs are usually unstructured and there is a lot of text in a log, but, once you run our method, it will generate patterns to parse logs into structured forms, and use that to make sense of the logs.
In our talk, we will discuss our approaches to solving this problem. For example, in the talk, we will cover one particular log which is very scary (almost one page long). Using our tool and the approach we took to solve the problem, the tool will show is that given any log you have a way to parse it.
How does it do that? Does it apply machine learning techniques to be able to identify the components of the log? What does it actually do?
Yes, it's a good question. What we found is if you apply pure machine learning the issue is run time. Machine learning is a time-consuming process, and it is still limiting. What we did is bring machine learning concepts into a blended approach with ways of understanding system logs. What I mean, although the logs are all in different formats, they are generated by a computer. A computer is dumb -- it’s just some programs writing information in the form of logs. Taking that assumption, it is not a storybook where all lines are different. Since logs are generated by some programs, it is only a few logging points which usually generate logs with fixed formats (maybe 10 to 100 formats or something like that). If you go with this kind of deep system knowledge, then it is just solving the problem systematically.
You have a tool that automates discovery of parsing logs now. Is this talk about a specific tool or about techniques you used to build that tool?
The talk will be very generic. Because we have built this tool, we developed a methodology for addressing the problem. We’ll mostly focus on the methodology and use the tool to demonstrate the reference implementation and various design trade-offs. BTW, we completely assume that you don't have any prior knowledge to be able to attend this talk.
Similar Talks
Scaling DB Access for Billions of Queries Per Day @PayPal
Software Engineer @PayPal
Petrica Voicu
Psychologically Safe Process Evolution in a Flat Structure
Director of Software Development @Hunter_Ind
Christopher Lucian
Let's talk locks!
Software Engineer @Samsara
Kavya Joshi
PID Loops and the Art of Keeping Systems Stable
Senior Principal Engineer @awscloud
Colm MacCárthaigh
Are We Really Cloud-Native?
Director of Technology @Luminis_eu
Bert Ertman
The Trouble With Learning in Complex Systems
Senior Cloud Advocate @Microsoft
Jason Hand
How Did Things Go Right? Learning More From Incidents
Site Reliability Engineering @Netflix
Ryan Kitchens
What Breaks Our Systems: A Taxonomy of Black Swans
Site Reliability Engineer @Slack, Contributor to Seeking SRE, & SRECon Steering Committee
Laura Nolan
Cultivating High-Performing Teams in Hypergrowth
Chief Scientist @n26