Addressing Big Data Monitoring Challenges
March 2022
Konstantinos Anastasakos, Product Manager, Telco & Enterprise Software


Today, one of the key success indicators across all industries is the degree of digital transformation they have achieved. This transformation originates from, and is empowered by, the unprecedented evolution of Data Analytics and Big Data toward larger and more complex structures. The soaring demand for faster processing power and larger storage space has led to new and innovative technologies with a variety of sophisticated tools. In practice, multiple tools often need to be combined to meet hardware and software requirements, but controlling and monitoring such systems then becomes increasingly difficult due to their complexity and diversity.

The Challenges

Monitoring large data structures presents numerous challenges, from the collection of traces and log files, through the analysis phase, to the final resolution of an incident. It is equally hard to provide observability into individual applications and hardware units, since the volume of potentially displayed information is enormous and processing time must be kept to a minimum.

At the collection stage, the more diverse the gathered traces, the more access points must be maintained for the monitored resources, which are scattered across the Big Data ecosystem. Some applications or hardware units may completely lack monitoring capabilities and generate no log files at all, while others may be only partially supervised by the overall monitoring system.

At the analysis stage, during an incident investigation, the troubleshooter is usually confronted with massive volumes of generated log files and alerts, creating an overwhelming and chaotic situation. Furthermore, the processed log files might be unsynchronized and mutually incompatible, since they were generated at different times by different systems and applications. Lineage and historical data retention, which could contribute significantly to root cause exploration, might not be supported by all systems.

Significant manual effort and labor cost accompany the troubleshooting process, proportional to the number of faults and the complexity of the environment. As a result, the efficiency of monitoring mechanisms for large data analytics structures is rather poor, hard to audit and difficult to improve.

Traditional Approaches Are Insufficient

Typical solutions for monitoring Big Data clusters involve a combination of numerous monitoring tools, each dedicated to individual resources and targeting specific metrics. Some common tools integrated within enterprise data environments, such as Cloudera Manager for Cloudera distributions, are limited to basic monitoring functions. Other commercial tools, such as Datadog and Splunk, may offer a high degree of monitoring capability and customization, but their cost escalates in proportion to the volume of data stored and the functions served.

Alternative methods using open-source tools include Elastic Stack for log files and tracing, and Nagios for checking the status of servers, hosts and networks. Hardware structures, racks and power units are often monitored via their own specialized software, which typically lacks any graphical user interface.

There is a clear need for an innovative monitoring solution capable of handling metrics from a diverse range of resources and applications, correlating those metrics, and visualizing the results through a common tool with a user-friendly interface.

“The 4 Golden Signals”

Any proper monitoring system focuses its metrics on the "4 Golden Signals": Latency, Traffic, Errors and Saturation. "Latency" measures the time needed to send a request and receive a response, whether successful or unsuccessful, and is highly impacted by the number of distributed servers within the cluster. "Traffic" measures the total number of requests served by the system. "Errors" is the number of failed requests, while "Saturation" depicts the utilization of the system, with emphasis on its most critical resources.
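As an illustration, the four signals can be derived from a window of request records. The sketch below is a minimal Python example under simplifying assumptions: `capacity_rps` is a hypothetical nominal request capacity used as a simple proxy for saturation, and latency is summarized as a 95th percentile; none of these names come from a specific product.

```python
from dataclasses import dataclass

@dataclass
class Request:
    latency_ms: float   # time from request sent to response received
    ok: bool            # whether the request succeeded

def golden_signals(window: list[Request], capacity_rps: float, window_s: float) -> dict:
    """Compute the four golden signals over a window of observed requests.

    capacity_rps is an assumed nominal capacity: saturation is expressed
    as observed traffic relative to that capacity.
    """
    traffic = len(window) / window_s                      # requests per second
    errors = sum(1 for r in window if not r.ok)           # failed requests
    # Latency counts successful AND failed requests, per the definition above.
    latencies = sorted(r.latency_ms for r in window)
    p95 = latencies[int(0.95 * (len(latencies) - 1))] if latencies else 0.0
    saturation = traffic / capacity_rps                   # utilization proxy
    return {"latency_p95_ms": p95, "traffic_rps": traffic,
            "errors": errors, "saturation": saturation}
```

For example, four requests observed over a two-second window, one of them failed, yield a traffic of 2 requests per second and an error count of 1.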

A Solution Derived from Intracom Telecom's Rich Experience

Intracom Telecom has more than a decade of experience in the installation, configuration and consulting of Big Data ecosystems across various industries and business domains. From this knowledge and expertise emerges "BigStreamer™ Monitoring", a solution capable of collecting, analyzing and visualizing critical information generated by Big Data components, applications, data sources and hardware units.

BigStreamer™ Monitoring is capable of ingesting metrics either directly from data sources, or indirectly through metrics already processed by other monitoring tools (see Figure 1).

Figure 1: Correlate metrics from existing monitoring tools and various Big Data sources
Screen of a dashboard
Figure 2: Display critical metrics via customized widgets on central dashboard

Direct data sources include, but are not limited to, application servers, hardware structures, web servers, database servers and specific applications. Indirect data includes ready-made metrics generated by other commonly used monitoring tools such as Cloudera Manager, Nagios, Graphite, Elastic Stack and various hardware supervision tools.

Collected metrics are stored as time series and then further analyzed to create meaningful graphs and visualizations. Access and interaction with the tool are provided through an intuitive, web-based graphical user interface.

BigStreamer™ Monitoring offers a rich and interactive User Experience that supports a great number of visualizations, charts and graphs, while being highly customizable. Standard features may be extended through plug-ins to support additional functions if and when required (see Figure 2).

BigStreamer™ Monitoring Architecture

BigStreamer™ Monitoring consists of a main "Manager" module, which provides the central starting dashboard, and optional autonomous modules serving specialized monitoring functions (see Figure 3), including:

  • Advanced Statistics: A group of detailed performance metrics on flows and storage areas in a Big Data cluster.
  • User Statistics: A unique mechanism that allows monitoring of volume changes in critical data.
  • Alerting: A holistic alerting module with configured rule-based Alerts, Thresholds and KPIs per data entity.
  • Streaming Data: A mechanism to monitor real-time data processing tools, such as Kafka and StreamSets pipelines.
  • Docker & Kubernetes: A mechanism that allows monitoring of Docker containers and Kubernetes clusters.
  • API Integrator: A module that provides easy integration and communication with any new application. Northbound APIs allow communication, data transfer and the triggering of alerts towards any 3rd-party systems. Southbound APIs allow the smooth integration of new instruments, applications or data flows.
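The rule-based alerting idea from the module list above can be sketched as follows. The `AlertRule` structure, the `evaluate` function and the entity names are illustrative assumptions for this article, not part of the product's actual API.

```python
from dataclasses import dataclass
import operator

# Supported comparison operators for threshold rules.
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}

@dataclass
class AlertRule:
    entity: str       # data entity the rule applies to, e.g. "hdfs_usage"
    op: str           # how the metric is compared against the threshold
    threshold: float
    severity: str     # e.g. "warning" or "critical"

def evaluate(rules: list[AlertRule], metrics: dict[str, float]) -> list[dict]:
    """Return an alert record for every rule whose condition holds
    against the latest metric value of its entity."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.entity)
        if value is not None and OPS[rule.op](value, rule.threshold):
            alerts.append({"entity": rule.entity, "value": value,
                           "threshold": rule.threshold,
                           "severity": rule.severity})
    return alerts
```

An evaluator like this would run periodically; matching alert records could then be displayed on the dashboard or pushed northbound to a 3rd-party system.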
Key Benefits and Takeaways
  • The major advantage of BigStreamer™ Monitoring is that it provides a single access point for monitoring multiple applications and services.
  • It introduces the "5-second rule": a glance at the overview dashboard is enough to check the most critical information of the monitored systems.
  • When needed, it allows drilling down into more detailed dashboards and graphs that provide enhanced information.
  • Metrics can be collected and correlated from a range of different resource types and existing monitoring frameworks such as Cloudera Manager, Elastic Stack, etc., or directly from database servers, applications and hardware units.
  • Data metrics can also be collected from multiple networks and Big Data clusters.
  • Retrieved metrics can be further analyzed, and the cluster's performance can be audited.
  • All metrics can be synchronized via common tools and interfaces, allowing easy and fast access for maintenance, troubleshooting, dimensioning and resource planning purposes.
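Correlating metrics gathered by different tools typically requires aligning them on a common time grid, since each source samples on its own schedule. The sketch below shows one simple way to do that (last-observation-carried-forward resampling); the source names are illustrative, not actual integration identifiers.

```python
def align(series: dict[str, list[tuple[float, float]]], step: float) -> list[dict]:
    """Resample several metric series onto a common time grid so they can
    be compared side by side. Each series is a sorted list of (ts, value);
    at each grid point, the last sample at or before it carries forward."""
    start = min(s[0][0] for s in series.values())
    end = max(s[-1][0] for s in series.values())
    grid, t = [], start
    while t <= end:
        row = {"ts": t}
        for name, samples in series.items():
            current = None
            for ts, value in samples:
                if ts <= t:
                    current = value  # last observation carried forward
                else:
                    break
            row[name] = current      # None if the series starts after t
        grid.append(row)
        t += step
    return grid
```

Aligned rows like these are what make cross-source graphs possible, e.g. plotting a Cloudera Manager CPU metric against a Nagios host-load metric on the same time axis.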
Diagram of BigStreamer™ Monitoring Modules
Figure 3: BigStreamer™ Monitoring Modules

To find out more about BigStreamer™ Monitoring, please visit: