Data Troubleshooting Series
By JOSHUA NUDELL
Have you ever deployed a new technology and had it not go as expected?
It happens with most technologies. Recently we introduced Cribl LogStream into a customer’s observability pipeline and ran into a few issues. Knowing how your data is supposed to get from its originating source to its destination is critical to troubleshooting data delivery issues.
Being able to diagnose and remediate data delivery issues quickly will make the installation and management of LogStream (and most other tools) easier and more rewarding.
Imagine that you are trying to send Windows event logs to Splunk, with Cribl LogStream as the observability pipeline tool. Windows hosts are generating events that are read from the Windows event logs by a Splunk universal forwarder, but events are not arriving at Splunk as expected.
Determine the logical observation
When troubleshooting issues, separating the communication components into manageable testing or observation points will help you take an organized, logical approach to identifying the overall issue(s). There are several logical areas to consider when troubleshooting the issue in this scenario:
- The data analysis platform, Splunk
- The observability pipeline tool, Cribl LogStream
- The originating host or device, Windows
Ask questions for each observation point
Now that the observation points have been defined, questions related to each point can be used to find the cause(s) of the problem. Below are basic areas, and questions to ask within each, to help in troubleshooting data delivery.
Checking basic connectivity at each observation point is an important first step in diagnosing data issues.
- Is the Windows host able to communicate with other hosts on the same network?
- Is the Windows host able to communicate with other hosts on different networks?
- Can the Windows host establish communication to the LogStream hosts?
- Is the Windows system resolving names properly?
- Is LogStream receiving information from other data generating systems besides the problem Windows system?
- Can the LogStream host(s) contact the Splunk platform?
- Is Splunk receiving information from other data generating systems besides the problem Windows host?
- Is Splunk receiving information from other data generating systems that pass data through Cribl LogStream?
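As a rough sketch, the name-resolution and reachability questions above can be checked programmatically. The snippet below uses only the Python standard library; any hostnames and ports you pass in are placeholders for your own environment.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if this host can resolve the name via DNS."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_resolve("logstream.example.com")` and `can_reach("logstream.example.com", 9997)` would answer the name-resolution and connectivity questions for a hypothetical LogStream host (the hostname and port here are illustrative, not defaults from any product).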
Issues can lie with the actual port configured to receive information. A port is a reference to a logical slot on the network communication stack where two systems can exchange information. Verifying port connectivity will help to eliminate the receiving system as the culprit in the data delivery issue.
- Does the system have the permission to open the port?
- Is there a host-based firewall and could it be blocking the service from the assigned port?
- Can the Windows host open a connection on the port that LogStream is listening on?
- Can the LogStream host(s) open a connection on the Splunk platform component that is supposed to be listening?
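When testing a port, the way the connection fails is itself diagnostic: an actively refused connection usually means the host is reachable but nothing is listening, while a silent timeout often points to a firewall dropping packets. A minimal sketch of that distinction, assuming Python is available on the testing host:

```python
import socket


def diagnose_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port.

    "open"    -> connection succeeded
    "refused" -> host reachable, but no listener on the port (check the service)
    "timeout" -> no response at all (often a host or network firewall)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except TimeoutError:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For instance, `diagnose_port("splunk.example.com", 9997)` would test the port Splunk conventionally uses to receive forwarded data (9997 is a common convention; confirm the actual port configured in your environment).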
Validate that the services and/or processes required to handle the receiving and sending of data are operating properly.
- Is the service running?
- Does it have the required permissions to perform all functions?
- Is the Splunk Universal Forwarder service running on the Windows host generating the event logs?
- Is the LogStream service running and listening on the port expected?
- Is the Splunk service running and listening on the correct port on the Splunk platform component that is supposed to be listening?
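On the receiving host itself, one quick heuristic for "is something listening on the expected port?" is to try to bind that port: if the bind fails, a process already holds it. This is a sketch, not a full service check; it tells you a listener exists, not which service owns it (for that, consult the service manager, e.g. the Windows Services console or `systemctl`).

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding host:port fails, i.e. a process
    on this machine is already listening on that port."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind((host, port))
        return False  # bind succeeded, so the port was free
    except OSError:
        return True  # bind failed, so something already holds the port
```

Running `port_in_use(9997)` on the Splunk indexer, for example, would confirm whether anything is listening on the expected receiving port (again, substitute whatever port your deployment actually uses).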
Resolve the answers to the questions
Resolving the answers to the questions identified in the previous section requires a solid understanding of the tools involved.
Each of the tools used in the troubleshooting process above provides different information about the data path, as well as different levels of detail. Learning what each tool does and what its expected output is will help greatly with the troubleshooting process and with a general understanding of how data flows from origin to destination.
Part of resolving the answers to the questions is to correct any errors that are observed, reducing or eliminating any data flow issues.
Examine the original issue
After the answers to the questions have been resolved, and any problems that are found have been corrected, examining the original issue to determine if it persists is the last step. At this point, hopefully the original issue is resolved, and the data events are arriving in Splunk as expected.
Using this logical methodology to troubleshoot communication problems between the hosts generating event data and the data analysis platform can resolve issues faster, and in a way that is easy to understand and follow.
Start Saving Today with Concanon
This article covers just some of the ways to troubleshoot data flow. Have a more complex situation you need help fixing?
This article is written by Joshua Nudell, a Director at Concanon, a business intelligence and big data consulting firm specializing in big data analytics tools and solutions. Joshua has over 20 years of experience in IT systems and working with systems data.