Figure 1. Source: IDC’s Future Enterprise Resiliency and Spending Survey, Wave 2, March 2022

For today’s teams, it is exceedingly complex and costly to support multiple generations of infrastructure and applications. What’s worse, according to an IDC report on network observability, this is the number one challenge to achieving digital transformation success.

The right data will lead you to the right root cause

The reality is that teams lost visibility and control when workloads started moving to cloud and SaaS environments. To get that visibility and control back, you need to be able to collect, correlate, and contextualize network and user experience data from all networks—whether you own the infrastructure or not.
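
To make “collect, correlate, and contextualize” concrete, here is a minimal sketch of one way a monitoring pipeline might attribute a slowdown to a single network domain by comparing per-segment latency samples against baselines. The segment names, baselines, and samples are hypothetical, not drawn from any particular product.

```python
from statistics import mean

# Hypothetical baseline latencies (ms) for each segment of the delivery path,
# from the corporate LAN out to a SaaS application.
BASELINE_MS = {"corporate_lan": 5, "isp": 25, "cloud_provider": 15, "saas_app": 40}

def attribute_slowdown(samples: dict[str, list[float]]) -> str:
    """Return the segment whose measured latency deviates most from baseline."""
    deviations = {
        segment: mean(values) - BASELINE_MS[segment]
        for segment, values in samples.items()
    }
    return max(deviations, key=deviations.get)

samples = {
    "corporate_lan": [6.1, 5.8, 6.4],
    "isp": [26.0, 24.5, 27.2],
    "cloud_provider": [15.3, 14.9, 16.0],
    "saas_app": [310.0, 295.5, 320.8],  # clear outlier versus its 40 ms baseline
}
print(attribute_slowdown(samples))  # -> saas_app
```

Real products layer topology awareness, synthetic tests, and user experience scoring on top of this basic idea, but the principle is the same: per-domain measurements turn “something is slow” into “this domain is slow.”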

Today, it is actually possible to realize complete network monitoring visibility, even across multiple generations of network infrastructure. You can establish unified views of bare metal infrastructures, VMs, and containers, even those hosted in ISP, cloud, and SaaS environments. 

In action: Full NetOps visibility and control

I recently caught up with an IT executive at a U.S.-based financial services institution. This organization provides services to banks all over the nation. When the organization began migrating services and workloads to the cloud and adapting to hybrid work realities, they realized they had an urgent network monitoring need. Customer and employee services were suddenly reliant upon internal corporate networks, ISPs, and cloud service providers. When customers and employees encountered downtime and performance issues, they needed to be able to quickly identify which domain the problem was arising in.

Their team was able to establish the comprehensive network monitoring capabilities outlined above, including across ISP networks, their data centers, and the cloud. Now, they’re tracking the user experience, no matter where customers or employees are located.

This visibility paid immediate dividends. For example, when a banking customer began reporting timeouts and latency issues, the financial services firm’s NetOps team was able to quickly identify the cause: a misconfigured load balancer running on the customer’s network. This is a great example of how teams can improve mean time to innocence (MTTI) when they have the right data in front of them. The NetOps team could quickly determine that the issue wasn’t arising in their environment.

Not only does this provide a significant improvement in operational efficiency and service levels, but it also enables better, more proactive customer service. As a senior systems manager with the financial services firm stated, “We showed the customer that we really do care about them and their business, and we can continue to improve the outcomes our services provide.”

Conclusion

Everyone is talking about network observability today, but any industry analyst or seasoned IT veteran will agree: network observability is really just about having a network monitoring system that collects a complete and diverse set of network data and delivers actionable insights. By harnessing these capabilities, this financial services firm was able to improve network delivery, optimize the user experience, maintain business continuity, and achieve better business outcomes.

Click here to learn more.

Around the world, organisations are fine-tuning the new digital focus brought about by the COVID-19 pandemic. To remain competitive, they have to grapple with multicloud-based data and operations, software as a service and hybrid working, among other trends.

In this dynamic environment, those that don’t know exactly how their business-critical applications are behaving and performing at all times are wasting their time and money – and losing ground against competitors.

Skills and integration issues

Because of the evolution of applications, traditional monitoring tools, skills and processes – often deployed in silos to focus on specific areas only – are no longer sufficient.

This lack of integration and intelligent data processing creates a visibility and governance nightmare. It limits organisations’ ability to ensure fast and effective fault tracking and performance improvements in a holistic manner. When an application goes down, they may not know the reason or impact – and, importantly, how to avoid another outage proactively.

They may also find it difficult to protect their network and users against attacks and vulnerabilities without the necessary real-time application security monitoring.

Visibility leads to insights – and action

To survive and thrive, organisations need to adopt a data-driven approach to modern observability.

Proactive performance management and automated issue identification can avert application problems – whether the applications are traditional or hybrid, running on traditional infrastructure or in the cloud – before they affect users or customers. Intelligent security insights across multicloud environments also strengthen application security.

Full-stack observability refers to real-time observability across an organisation’s technology stack – including applications, software-defined computing, storage and the network. The data it generates gives organisations actionable insights into the behaviour, performance and health of their applications and supporting infrastructure.

With all these benefits, observability seems an obvious choice, but it can be hard to adopt: a lack of technological skills and resources may keep many organisations from striving for full, data-driven observability.

For this reason, there is value in working with an experienced partner to make the shift to modern observability across the entire stack – not only during the implementation phase but also to ensure continual fine-tuning.

For example, the vast amounts of data generated by modern monitoring tools can be daunting. Without expert assistance and the application of artificial intelligence and machine learning, few organisations can interpret and contextualise all of this data to support their business outcomes.
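
As a simple illustration of the kind of statistical baselining such platforms automate, the sketch below flags metric values that deviate sharply from a trailing window. The series, window size and threshold are illustrative only; production systems apply far more sophisticated models across millions of series.

```python
from statistics import mean, stdev

def zscore_anomalies(values, window=30, threshold=3.0):
    """Flag points that deviate strongly from the trailing window's baseline."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window : i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            anomalies.append((i, values[i]))
    return anomalies

# Hypothetical response-time series (ms): steady around 120-124, one spike.
series = [120 + (i % 5) for i in range(60)] + [480, 121, 119, 123]
print(zscore_anomalies(series))  # -> [(60, 480)]
```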

From assessment to intelligent support

NTT’s 360 Observability, powered by Cisco’s suite of full-stack observability tools, delivers end-to-end monitoring visibility by unifying application and network performance management tools.

As a first step, our Observability Maturity Assessment helps organisations align their business and information technology strategies.

This assessment summarises their performance management goals and identifies the gaps and areas of overlap between their current and desired states of observability.

Next, it details recommendations and strategies for the people, processes and technologies (including high-level architectures) needed to implement and support full-stack observability, together with a roadmap for achieving the organisation’s goals.

From there, they can select one of three service levels linked to our Multicloud Application Monitoring Platform. These include regular platform checks (including security) and upgrades, with audit logs, and ongoing fine-tuning of the monitoring settings. Alerts and data dashboards can be customised, and there are regular reviews and recommendations – including the option of observability knowledge transfer and mentoring.

In this way, working with a partner like NTT delivers both full-stack observability and peace of mind to organisations, allowing them to focus fully on achieving their business goals.

Learn more about 360 Observability from NTT and discover your Observability Maturity.

James Wesley is Director of Offer Management: Application and Observability Services at NTT.

As organizations evolve and fully embrace digital transformation, the speed at which business is done increases. This also increases the pressure to do more in less time, with a goal of zero downtime and rapid problem resolution.

Real costs to the business are at stake. For instance, a 2021 ITIC report found that a single hour of server downtime costs at least $300,000 for 91% of mid-sized and large enterprises – and 44% of those companies said hourly outage costs range from $1 million to over $5 million.

The key to avoiding downtime is to get ahead of issues and slowdowns before they even happen. Thankfully, there’s a reliable recipe for how to achieve this. Let’s examine the power that comes with combining AIOps with observability to minimize downtime and the negative business consequences that come with it.

The power of AIOps

To really grasp the combined power of AIOps and observability, it’s important to first understand the capabilities of each of these technologies. Let’s start with AIOps and the crucial role automation and AI play in supporting enterprises struggling with the inherent challenge of scale.

A typical enterprise IT system may generate thousands of “events” per second. These events can be anything anomalous to the regular operation of multiple systems – storage, cloud, network equipment, etc. This makes it impossible to keep up with events manually, let alone separate the events that will have major business impact from those whose impact might be negligible.

AIOps allows you to put automation to work in separating the signal from the noise in this effort – to isolate the most impactful issues and, ideally, resolve them autonomously. It’s a value proposition that more and more companies are understanding and investing in. Indeed, analysts have found the AIOps market has already surpassed $13 billion and will likely top $40 billion by 2026.
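
As a rough sketch of what “separating the signal from the noise” means mechanically, the toy triage function below deduplicates a stream of events and surfaces the most severe, most frequent ones. The event fields and thresholds are hypothetical; real AIOps platforms add ML-based correlation, topology awareness, and automated remediation on top of this.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    source: str     # e.g., "storage", "cloud", "network"
    kind: str       # e.g., "disk_latency_high"
    severity: int   # 1 (info) through 5 (critical)

def triage(events: list[Event], min_count: int = 3) -> list[tuple[Event, int]]:
    """Collapse duplicates, drop rare noise, and rank by severity then volume."""
    counts = Counter(events)
    shortlist = [(e, n) for e, n in counts.items() if n >= min_count]
    return sorted(shortlist, key=lambda pair: (pair[0].severity, pair[1]), reverse=True)

stream = (
    [Event("network", "link_flap", 2)] * 40           # noisy but low severity
    + [Event("storage", "disk_latency_high", 4)] * 5  # fewer events, but critical
    + [Event("cloud", "api_throttled", 3)] * 2        # below the dedup threshold
)
for event, count in triage(stream):
    print(count, event)  # the storage issue ranks first despite far fewer events
```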

The value of full-stack observability

Organizations can reap further value from AIOps when these capabilities are combined with observability – the ability to measure the internal state of applications based on the data they generate, such as logs and key metrics. By looking at multiple indicators to get a full understanding of incidents and components within a system, a strong observability framework can help the enterprise identify not just what went wrong, but also why it went wrong, how to fix it, and how to prevent future occurrences.

One popular approach for comprehensive, full-stack observability is what’s known as a MELT (Metrics, Events, Logs, and Traces) framework of capabilities. Metrics indicate “what” is wrong with a system; understanding Events can help isolate the alerts that matter; Logs help pinpoint “why” a problem is occurring; and Traces of transaction paths can identify “where” the problem is happening.
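
To show how the four signal types fit together, here is a schematic of one hypothetical incident; all names and numbers are invented for illustration. The trace portion also shows a common trick: subtracting child-span time from each span’s total to find where the latency actually lives.

```python
# MELT signals for one hypothetical incident (all values are made up).
incident = {
    "metrics": {"checkout.latency_p99_ms": 2400, "checkout.error_rate": 0.07},  # WHAT is wrong
    "events": [{"time": "10:02Z", "msg": "deployed checkout v2.3.1"}],          # which change matters
    "logs": ["ERROR payment-svc: connection pool exhausted (max=20)"],          # WHY it is failing
}

# Traces show WHERE time goes: total duration (ms) per span, plus parent/child links.
spans = {"checkout": 2400, "payment-svc": 2350, "db": 40}
children = {"checkout": ["payment-svc"], "payment-svc": ["db"], "db": []}

# Self-time = a span's duration minus its children's; the largest self-time
# points at the component actually burning the time.
self_time = {
    name: total - sum(spans[child] for child in children[name])
    for name, total in spans.items()
}
print(max(self_time, key=self_time.get))  # -> payment-svc, not the database
```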

Although observability and AIOps can work alone, they complement each other when combined to form a holistic incident management solution. Blending observability with AIOps enhances the speed and accuracy of leveraging application data for proactive identification and auto-resolution of problems and anomalies – even to the point of heading off issues before they arise. This proactive optimization of systems can drastically reduce risk and downtime for the enterprise.

Combining AIOps and observability: A case study

An example comes to mind of a private investment company based in Canada – one of the largest institutional investors globally. They struggled to manually coordinate 15 decentralized monitoring tools, resulting in massive system noise and delays finding the root cause of issues. To solve these challenges, they implemented a combination of AIOps and observability tools that helped conduct end-to-end blueprinting of the entire IT ecosystem and then integrate all 15 monitoring tools to capture and prioritize alerts.

The new system now automatically eliminates false positives; generates tickets for real alerts; and then deploys suppression, aggregation, and closed-loop self-heal capabilities to autonomously resolve most issues. For the remaining unresolved tickets, the system does root cause analysis, logs all the relevant data along with the ticket and then sends it to the manual queue.
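
The flow the case study describes can be sketched as a small pipeline. Everything below (function names, fields, the confidence heuristic) is hypothetical, standing in for what a production AIOps platform does with ML models and ticketing and runbook integrations.

```python
import uuid

def is_false_positive(alert):
    return alert.get("confidence", 1.0) < 0.5  # stand-in for an ML classifier

def restart_service(alert):
    print(f"self-heal: restarting {alert['target']}")
    return True  # pretend the remediation succeeded

RUNBOOKS = {"service_down": restart_service}  # known auto-remediations
MANUAL_QUEUE = []

def handle_alert(alert):
    if is_false_positive(alert):
        return "discarded"
    ticket = {"id": str(uuid.uuid4())[:8], "alert": alert}
    action = RUNBOOKS.get(alert["kind"])
    if action and action(alert):               # closed-loop self-heal path
        return f"auto-resolved ({ticket['id']})"
    # No runbook matched: attach root-cause context and escalate to humans.
    ticket["root_cause_hints"] = f"correlated events near {alert['target']}"
    MANUAL_QUEUE.append(ticket)
    return f"escalated ({ticket['id']})"

print(handle_alert({"kind": "service_down", "target": "api-gw", "confidence": 0.9}))
print(handle_alert({"kind": "disk_full", "target": "db-01", "confidence": 0.8}))
print(handle_alert({"kind": "cpu_spike", "target": "web-02", "confidence": 0.2}))
```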

As this case study illustrates, pairing observability with AIOps capabilities allows an organization to link the performance of its applications to its operational results by isolating and resolving errors before they hamper the end-user experience. In doing so, enterprises can support closed-loop systems that get ahead of potential causes of downtime, reducing the number of incidents and – where incidents do occur – decreasing the mean time to detect (MTTD) and mean time to resolve (MTTR).
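
For readers who track these metrics, the definitions reduce to simple arithmetic over incident timestamps: MTTD averages the gap between occurrence and detection, and MTTR the gap between detection and resolution. The timestamps below are invented for illustration.

```python
from datetime import datetime

# Each row: (occurred, detected, resolved) -- hypothetical incidents.
incidents = [
    ("2022-05-01T09:00", "2022-05-01T09:04", "2022-05-01T09:30"),
    ("2022-05-03T14:10", "2022-05-03T14:11", "2022-05-03T14:25"),
]

def mean_minutes(pairs):
    return sum((b - a).total_seconds() / 60 for a, b in pairs) / len(pairs)

parsed = [tuple(datetime.fromisoformat(t) for t in row) for row in incidents]
mttd = mean_minutes([(occ, det) for occ, det, _ in parsed])
mttr = mean_minutes([(det, res) for _, det, res in parsed])
print(f"MTTD = {mttd:.1f} min, MTTR = {mttr:.1f} min")  # MTTD = 2.5, MTTR = 20.0
```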

Conclusion

Clearly, the business benefits that come from combining AIOps and observability are far greater than the sum of what either could deliver on its own. These advantages are critically important for organizations looking to minimize both downtime and the steep organizational costs that come with it.

To learn how to get ahead of issues and downtime before they arise, visit Digitate.

Over the past decade, an ever-growing number of organisations have taken their infrastructure and applications to the cloud, delivering measurable improvements to the bottom line and several other business metrics. This is why today, a cloud-first strategy is rightly recognised even by many non-IT corporate leaders as the catalyst for rapid digital transformation and a key enabler for businesses to respond to constantly evolving customer and employee needs.

By re-thinking their approach to applications – in either cloud-only or hybrid environments – organisations can introduce greater flexibility and freedom to their application development processes, unleashing innovation on a grander scale and speed. However, as anybody who has worked in an IT department over the past year or two knows, managing availability and performance across cloud-native applications and technology stacks is a huge challenge.

Traditional approaches to availability and performance were often based on long-lived physical or virtualised infrastructures. Ten years ago, IT departments operated a fixed number of servers and network links, and they worked from static dashboards for each layer of the IT stack. The introduction of cloud computing added a new level of complexity: organisations found themselves continually scaling their use of IT resources up and down based on real-time business needs. Monitoring solutions have adapted to accommodate deployments of cloud-based applications alongside traditional on-premises environments. The reality, however, is that most of these solutions are failing the stress test, because they were not designed to efficiently handle the dynamic and highly volatile cloud-native environments that we increasingly see today.

These highly distributed cloud and hybrid systems rely on thousands of containers and generate a massive volume of metrics, logs and traces (MLT) telemetry every second. And currently, most IT departments don’t have a monitoring solution that can cut through this crippling volume of data and noise when troubleshooting application availability and performance problems caused by infrastructure-related issues spanning cloud and hybrid environments.

Cloud-native observability solutions are necessary

In response to this spiralling complexity, IT departments need visibility across the application level, down into the supporting digital services (such as Kubernetes), and into the underlying infrastructure-as-code (IaC) services (such as compute, server, database, and network) that they’re leveraging from all their cloud providers. They also need visibility into the user and business impact of each resource to prioritise their actions. This is essential for IT teams to truly understand how their applications are performing and where they need to focus their time.
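
As one small, concrete example of infrastructure-level visibility, the sketch below uses the official Kubernetes Python client to surface restart-prone pods. It assumes a reachable cluster and a local kubeconfig; the namespace and threshold are illustrative.

```python
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running in-cluster
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod("production").items:
    restarts = sum(cs.restart_count for cs in (pod.status.container_statuses or []))
    if restarts > 5:  # arbitrary threshold for this example
        print(f"{pod.metadata.name}: {restarts} restarts on node {pod.spec.node_name}")
```

Full-stack tools correlate this kind of infrastructure signal with application traces and business metrics automatically; the point here is simply that the raw data is readily accessible.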

Technologists are increasingly recognising the need for full-stack insights and to map relationships and dependencies across siloed domains and teams. This explains why, according to an AppDynamics report, The Journey to Observability, more than half of global businesses (54%) have now started the transition to full-stack observability, and a further 36% plan to do so during 2022.

IT teams need new cloud-native observability solutions to manage the complexity of cloud-native applications and modern IT environments. They require a way to gain visibility into applications and the underlying infrastructure for large, managed Kubernetes environments running on one or more public clouds.

From a technology perspective, there are several key criteria that IT leaders and their teams should consider when evaluating cloud-native observability solutions to ensure they are future-proofed. They should seek out a solution that can observe distributed and dynamic cloud-native applications at scale; that embraces open standards, particularly OpenTelemetry; and that leverages AIOps and business intelligence to speed up the identification and resolution of issues and enable technologists to prioritise actions based on business outcomes.
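
To illustrate the open-standards point, here is a minimal OpenTelemetry tracing setup using the Python SDK, exporting spans to the console; in practice, an OTLP exporter would ship them to whichever observability backend the team chooses. The service, span and attribute names are illustrative.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Configure the SDK once at startup; swap ConsoleSpanExporter for an OTLP
# exporter to send spans to any OpenTelemetry-compatible backend.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")

with tracer.start_as_current_span("process_order") as span:
    span.set_attribute("order.items", 3)  # attach business context to the span
    with tracer.start_as_current_span("charge_card"):
        pass  # a downstream call would be instrumented here
```

Because the instrumentation follows an open standard, the same spans can be consumed by any compliant backend – which is exactly the vendor-neutrality argument made above.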

Organisations must have a new cloud-native mindset

Besides choosing the best cloud observability solution for the enterprise overall, IT managers must also make sure their solution delivers value to the emerging cloud specialists in their team, such as Site Reliability Engineers (SREs), DevOps and CloudOps. Not only do these technologists have new and highly specialised skill sets, but they also have very different needs, priorities, mindsets, and ways of working.

Traditionally, ITOps teams have focused on minimising the risks brought about by change. Their mission has been to maximise uptime and unify technology choices, and they tend to take a rigid, centralised approach to digital transformation.

But when it comes to SREs, DevOps or CloudOps teams, it’s a very different story. These new teams value agility over control and focus on giving each team the freedom to choose the best approach. They accept that there will always be massive complexity with cloud-native applications, but they see that giving up some level of control gives them speed and innovation. They can find peace in the chaos by adopting new solutions that allow them to cut through complexity and data noise and pinpoint what matters.

Similarly, when considering digital transformation initiatives, these teams aren’t unnerved by the scale and complexity involved in these programs. They don’t feel held back by legacy technology or scarred by previous attempts to innovate. They embrace change rather than resisting it and see transformation as an exciting and welcome part of business as usual. 

These new cloud-native technologists are unwilling to accept vendor lock-in; they believe they can deliver the most value within dynamic technology ecosystems, with all teams having the freedom to select and work with best-in-class solutions for each project.

Finally, cloud-native technologists (be they SREs, DevOps or CloudOps) will evolve to have a very business-focused mindset. They will increasingly strive to view IT performance and availability through a business lens and to understand how their actions and decisions can have the most significant impact on the business. 

The important thing for business leaders is to recognise the new mindsets and drivers of their cloud-native teams and empower these technologists with the culture, support, and solutions they need to deliver value. That means developing a strategy that enables these teams to operate in entirely new ways, while also ensuring existing teams can continue their vital work of monitoring large parts of the IT infrastructure.

IT leaders should consider these cultural factors when selecting a cloud-native observability solution to ensure their SREs, DevOps and CloudOps teams have a solution that offers them the scalability, flexibility and business metrics they need to perform to their full potential. 

By taking a holistic approach, considering both the technical and cultural needs of their IT teams, organisations can empower their technologists to cut through the complexity of cloud-native environments and deliver on the promise of this exciting new approach to application development.
