In the words of J.R.R. Tolkien, “shortcuts make long delays.” I get it, we live in an age of instant gratification, with Doordash and Grubhub meals on-demand, fast-paced social media and same-day Amazon Prime deliveries. But I’ve learned that in some cases, shortcuts are just not possible.

Such is the case with comprehensive AI implementations; you cannot shortcut success. Operationalizing AI at scale requires that your full suite of data (structured, unstructured and semi-structured) be organized and architected in a way that makes it usable, readily accessible and secure. Fortunately, the journey to AI is more than worth the time and effort.

AI Potential: Powering Our World and Your Business

That’s because AI promises to be one of the most transformational technologies of our time. Already, we see its impact across industries and applications. If you’ve experienced any of these, then you’re seeing AI in action:

Automated assistants such as Amazon Alexa, Microsoft Cortana and Google Assistant.

COVID vaccines and/or personalized medicine used to treat an illness or disease.

Smart cars that alert drivers like you, help you park and ping you when it’s time for maintenance.

Shopping preferences that are tailored to your specific tastes and proactively sent to you.

Despite these AI-powered examples, businesses have only just begun to embrace AI, with an estimated 12% fully using AI technology.[1] But this is changing rapidly, because AI holds massive potential. One Forrester study and financial analysis found that AI-enabled organizations can gain an ROI of 183% over three years.[2]

That’s why AI is a key determinant of your future success. Businesses that lead in fully deploying AI will be able to optimize customer experiences and efficiencies that help maximize customer retention and customer acquisition and gain a distinct advantage over the competition. The growing divide between AI haves and have-nots is underway and at a certain point, that chasm will not be crossable.

For example, airports today can use AI to keep passengers and employees safer. AI working on top of a data lakehouse can help quickly correlate passenger and security data, enabling real-time threat analysis and advanced threat detection.

To move AI forward, we first need to build and fortify the foundational layer: data architecture. This architecture is important because, to reap the full benefits of AI, it must be built to scale across an enterprise rather than for individual AI applications.

Constructing the right data architecture cannot be bypassed. That’s because several impeding factors are currently in play that must be resolved. All organizations need an optimized, future-proofed data architecture to move AI forward.

Complexity slows innovation

Data growth is skyrocketing. One estimate[3] states that by 2024, 149 zettabytes will be created every day: that’s 1.7 MB every second. A zettabyte has 21 zeroes. What does that mean? According to the World Economic Forum[4], “At the beginning of 2020, the number of bytes in the digital universe was 40 times bigger than the number of stars in the observable universe.”


Data’s size alone creates inherent complexity. Layered on top of that are the different types of data stored in various silos and locations throughout an organization. It all adds up to a “perfect storm” of complexity.

A complex data landscape prevents data scientists and data engineers from easily linking the right data together at the right time. Additionally, multiple systems of record create a confusing environment when those sources do not report the same answers.

Extracting value from data

Highly skilled data scientists, analysts and other users grapple with gaining ready access to data. This has become a bottleneck, hindering richer and real-time insights. For AI success, data scientists, analysts and other users need fast, concurrent access to data from all areas of the business.

Securing data as it grows

Securing mission-critical infrastructure, across all data in an enterprise, is a default task for every organization. However, as data grows within an enterprise, greater demand for access to and use of that data produces an increasing number of vulnerable security endpoints.

Catalyzing AI at Scale with Data Lakehouse

The good news is that data architectures are evolving to solve these challenges and fully enable AI deployments at scale. Let’s look at the data architecture journey to understand why and how data lakehouses help to solve complexity, value and security.

Traditionally, data warehouses have stored curated, structured data to support analytics and business intelligence, with fast, easy access to data. Data warehouses, however, were not designed to support the demands of AI or semi-structured and unstructured data sources. Data lakes emerged to help solve complex data organizational challenges and store data in its natural format. Data lakes, while helpful when used in tandem with data warehouses, simultaneously create more data silos and increase cost.[5]

Today, the ideal solution is a data lakehouse, which combines the benefits of data warehouses and data lakes. A data lakehouse handles all types of data via a single repository, eliminating the need for separate systems. This unification of access through the lakehouse removes multiple areas of ingress/egress and simplifies security and management, achieving both value extraction and security. Data lakehouses support AI and real-time data applications with streamlined, fast and effective access to data.

The benefits of a data lakehouse address complexity, value and security:

Create more value quickly and efficiently from all data sources

Simplify the data landscape via carefully engineered design features

Secure data and ensure data availability at the right time for the right requirements

For example, pharmacies can use a data lakehouse to help patients. By quickly matching drug availability with patient demand, pharmacies can ensure the right medication is at the right pharmacy for the correct patient.

Moving AI Forward

AI deployments at scale will change the trajectory of success around the world and across industries, company types and sizes. But first things first: the right data architecture must be put in place to fully enable AI. While data lake solutions help accelerate this process, the right architecture cannot be bypassed. As J.R.R. Tolkien intimated, anything worth achieving takes time.

Want to learn more?  Read this ESG paper.




[3] Finances Online, 53 Important Statistics About How Much Data Is Created Every Day, accessed April 2022



IT Leadership

By Hock Tan, Broadcom President & CEO

In the years that I have led Broadcom, I have found two things to be true for technology leaders: First, success with your customers starts with success with your ecosystem partners; and second, driving ecosystem growth is key to maintaining the growth of your own business.

This is why, at Broadcom, we bring innovation, investment and attention to making customer value a lasting reality through our pioneering partner programs. These programs help us drive two pivotal customer objectives: innovation in technology and innovation in business models.

From joint innovation to accessing new markets, our pioneering partner programs help us do more for customers. As digital transformation accelerates, customers need fully integrated solutions that address their needs.

Today, we have more than 35,000 partners in our IT infrastructure and cybersecurity software ecosystem, and every single one plays a vital role in bringing value and success to our customers. We work with many kinds of partners across the entire value chain, including the production, procurement, distribution and deployment of our products. They help us expand the reach of our technology and drive better business efficiency and experiences for customers.

When we set out to make any business decision, we always ask ourselves the following three questions:

Does it drive a better outcome for the customer?

Does it allow and enable profitability for a partner?

Does it drive better efficiencies for Broadcom?

If the answer to any of these is “no”, it’s not a path worth pursuing. Our partners and customers should always benefit from the decisions we make.

What partners bring to Broadcom’s customers

At Broadcom, we understand that the key to growth isn’t found in being all things to all people. Instead, we believe our customer-first mindset, coupled with purposeful partnerships, is key to delivering untapped value for customers.

Broadcom’s innovative and industry-first partnership models provide that purposeful plan for how our partners integrate into the overall value chain, and empower each company to leverage its core competencies and do what it does best. Our highly capable partners help us provide solutions for customers ranging from the world’s largest public and private organizations to small- and medium-sized businesses (SMBs). Through Broadcom’s unique, friction-free Expert Advantage Partner Program, partners deliver high-value services to customers of all sizes, including our largest enterprise accounts.

Yet the value our partners deliver goes far beyond services. On our Insights Marketplace, customers can find partner-built applications that extend our product capabilities and tailor them for specific use cases, unlocking more value from our customers’ investments. In short, for every challenge, there’s a Broadcom partner ready to deliver the solution and support the specialized needs of businesses, regardless of size.

What Broadcom brings to partners

At Broadcom, we are unique in how we engage with and support our partner ecosystem. Often, commercial vendors will attempt to control how their partners conduct business. But at Broadcom, we empower partners to identify and pursue their own commercial strategies, so they can bring sales and services to end-user customers on their own terms. We introduce industry-first, go-to-market partner models with shared risk and significant rewards. 

Our Global Cyber Security Aggregator Program (CSAP) is proof. CSAP was launched to expand our market reach and deliver enhanced levels of service to a subset of commercial enterprises with unique needs. The program brings together Broadcom’s Symantec cyber security solutions and partners’ resources, along with their in-country expertise, to offer a best-in-class customer experience. We have made significant investments, including in sales training, to ensure our distribution partners are well equipped to provide better customer support and a quicker response time to evolving threats.

Our customers can also receive hands-on technical help through our unique Broadcom Software Knights Program. We vet and provide certified partners with ongoing technical training, product presale and sales intelligence so that they can handle any complex issue put in front of them with hands-on technical support. We provide them with the best so that our customers experience the best.

Together, we have a shared goal and responsibility of addressing our customers’ needs and delivering superior outcomes. It’s a win-win-win. Our message to our customers, current partners and future partners is this: our goal is to deliver superior outcomes for customers of all sizes, and our partners’ success is our success. We understand the value our partner ecosystem brings to Broadcom and mutual customers, and we are committed to our partners’ and customers’ continued success.

Learn more about Broadcom here.

About Hock Tan:


Hock Tan is Broadcom President, Chief Executive Officer and Director. He has held this position since March 2006. From September 2005 to January 2008, he served as chairman of the board of Integrated Device Technology. Prior to becoming chairman of IDT, Mr. Tan was the President and Chief Executive Officer of Integrated Circuit Systems from June 1999 to September 2005. Prior to ICS, Mr. Tan was Vice President of Finance with Commodore International from 1992 to 1994, and previously held senior management positions with PepsiCo and General Motors. Mr. Tan served as managing director of Pacven Investment, a venture capital fund in Singapore from 1988 to 1992, and served as managing director for Hume Industries in Malaysia from 1983 to 1988.


By George Trujillo, Principal Data Strategist, DataStax

Increased operational efficiencies at airports. Instant reactions to fraudulent activities at banks. Improved recommendations for online transactions. Better patient care at hospitals. Investments in artificial intelligence are helping businesses to reduce costs, better serve customers, and gain competitive advantage in rapidly evolving markets. Titanium Intelligent Solutions, a global SaaS IoT organization, even saved one customer over 15% in energy costs across 50 distribution centers, thanks in large part to AI.  

To succeed with real-time AI, data ecosystems need to excel at handling fast-moving streams of events, operational data, and machine learning models to leverage insights and automate decision-making. Here, I’ll focus on why these three elements and capabilities are fundamental building blocks of a data ecosystem that can support real-time AI.


Real-time data and decisioning

First, a few quick definitions. Real-time data involves a continuous flow of data in motion. It’s streaming data that’s collected, processed, and analyzed on a continuous basis. Streaming data technologies unlock the ability to capture insights and take instant action on data that’s flowing into your organization; they’re a building block for developing applications that can respond in real-time to user actions, security threats, or other events. AI is the perception, synthesis, and inference of information by machines, to accomplish tasks that historically have required human intelligence. Finally, machine learning is essentially the use and development of computer systems that learn and adapt without following explicit instructions; it uses models (algorithms) to identify patterns, learn from the data, and then make data-based decisions.

Real-time decisioning can occur in minutes, seconds, milliseconds, or microseconds, depending on the use case. With real-time AI, organizations aim to provide valuable insights during the moment of urgency; it’s about making instantaneous, business-driven decisions. What kinds of decisions are necessary to be made in real-time? Here are some examples:

Fraud: It’s critical to identify bad actors using high-quality AI models and data.

Product recommendations: It’s important to stay competitive in today’s ever-expanding online ecosystem with excellent product recommendations and aggressive, responsive pricing against competitors. Ever wonder why an internet search for a product reveals similar prices across competitors, or why surge pricing occurs?

Supply chain: With companies trying to stay lean with just-in-time practices, it’s important to understand real-time market conditions, delays in transportation, and raw supply delays, and adjust for them as the conditions are unfolding.
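To make the fraud case concrete, here is a minimal sketch of a decision made on each event as it arrives, rather than in a later batch. It is an illustration only: the VelocityCheck class, the three-events-per-60-seconds threshold, and the card IDs are hypothetical, and a production system would rely on trained models rather than one fixed rule.

```python
from collections import defaultdict, deque


class VelocityCheck:
    """Flag a card that makes too many transactions inside a short window.

    The threshold and window are hypothetical values, not tuned rules.
    """

    def __init__(self, max_events=3, window_seconds=60.0):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # card_id -> timestamps still in window

    def observe(self, card_id, timestamp):
        """Record one event and decide immediately; True means suspicious."""
        recent = self._recent[card_id]
        recent.append(timestamp)
        # Drop timestamps that have aged out of the window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_events


def decide(events, check):
    """Consume (card_id, timestamp) events in arrival order."""
    return [(card, ts, check.observe(card, ts)) for card, ts in events]


events = [
    ("card-A", 0.0), ("card-A", 5.0), ("card-B", 6.0),
    ("card-A", 9.0), ("card-A", 12.0), ("card-A", 120.0),
]
decisions = decide(events, VelocityCheck())
flags = [suspicious for _, _, suspicious in decisions]
```

Only the fourth card-A event inside the same minute is flagged; by the time the last event arrives, the earlier ones have aged out of the window and the card is treated as normal again.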

Demand for real-time AI is accelerating

Software applications enable businesses to fuel their processes and revolutionize the customer experience. Now, with the rise of AI, this power is becoming even more evident. AI technology can autonomously drive cars, fly aircraft, create personalized conversations, and transform the customer and business experience into a real-time affair. ChatGPT and Stable Diffusion are two popular examples of how AI is becoming increasingly mainstream. 

With organizations looking for increasingly sophisticated ways to employ AI capabilities, data becomes the foundational energy source for such technology. There are plenty of examples of devices and applications that drive exponential growth with streaming data and real-time AI:  

Intelligent devices, sensors, and beacons are used by hospitals, airports, and buildings, or even worn by individuals. Devices like these are becoming ubiquitous and generate data 24/7. This has also accelerated the execution of edge computing solutions so compute and real-time decisioning can be closer to where the data is generated.

AI continues to transform customer engagements and interactions with chatbots that use predictive analytics for real-time conversations.

Augmented or virtual reality, gaming, and the combination of gamification with social media leverage AI for personalization and enhancing online dynamics.

Cloud-native apps, microservices and mobile apps drive revenue with their real-time customer interactions.

It’s clear how these real-time data sources generate data streams that need new data and ML models for accurate decisions. Data quality is crucial for real-time actions because decisions often can’t be taken back. Determining whether to close a valve at a power plant, offer a coupon to 10 million customers, or send a medical alert has to be dependable and on time. The need for real-time AI has never been more urgent.

Lessons not learned from the past

Over the past decade, organizations have put a tremendous amount of energy and effort into becoming data-driven, but many still struggle to achieve the ROI from data that they’ve sought. A 2023 NewVantage Partners/Wavestone executive survey highlights how being data-driven is not getting any easier, as many blue-chip companies still struggle to maximize ROI from their plunge into data and analytics and to embrace a real data-driven culture:

19.3% report they have established a data culture

26.5% report they have a data-driven organization

39.7% report they are managing data as a business asset

47.4% report they are competing on data and analytics

Outdated mindsets, institutional thinking, disparate siloed ecosystems, applying old methods to new approaches, and a general lack of a holistic vision will continue to impact success and hamper real change. 

Organizations have balanced competing needs to make more efficient data-driven decisions and to build the technical infrastructure to support that goal. While big data technologies like Hadoop were used to get large volumes of data into low-cost storage quickly, these efforts often lacked the appropriate data modeling, architecture, governance, and speed needed for real-time success.

This resulted in complex ETL (extract, transform, and load) processes and difficult-to-manage datasets. Many companies today struggle with legacy software applications and complex environments, which leads to difficulty in integrating new data elements or services. To truly become data- and AI-driven, organizations must invest in data and model governance, discovery, observability, and profiling while also recognizing the need for self-reflection on their progress towards these goals.

Achieving agility at scale with Kubernetes

As organizations move into the real-time AI era, there is a critical need for agility at scale. AI needs to be incorporated into their systems quickly and seamlessly to provide real-time responses and decisions that meet customer needs. This can only be achieved if the underlying data infrastructure is unified, robust, and efficient. A complex and siloed data ecosystem is a barrier to delivering on customer demands, as it prevents the speedy development of machine learning models with accurate, trustworthy data.

Kubernetes is a container orchestration system that automates the management, scaling, and deployment of microservices. It’s also used to deploy machine learning models, data streaming platforms, and databases. A cloud-native approach with Kubernetes and containers brings scalability and speed with increased reliability to data and AI the same way it does for microservices. Real-time needs a tool and an approach to support scaling requirements and adjustments; Kubernetes is that tool and cloud-native is the approach. Kubernetes can align a real-time AI execution strategy for microservices, data, and machine learning models, as it adds dynamic scaling to all of these things. 
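As an illustration of what "dynamic scaling" means in practice, the Kubernetes documentation describes the Horizontal Pod Autoscaler's core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is close enough to 1. The sketch below reproduces that arithmetic in plain Python. It is not Kubernetes code: the function name, the replica limits, and the sample numbers are assumptions made for the example.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Apply the documented HPA ratio rule, then clamp to the allowed range.

    min_replicas and max_replicas are illustrative defaults; tolerance mirrors
    the idea that small deviations from the target do not trigger scaling.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: leave as is
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical readings: average CPU utilization per pod against a 60% target.
scale_up = desired_replicas(4, current_metric=90, target_metric=60)
hold = desired_replicas(4, current_metric=63, target_metric=60)
scale_down = desired_replicas(4, current_metric=30, target_metric=60)
capped = desired_replicas(8, current_metric=120, target_metric=60)
```

The same rule applies whether the pods serve a microservice, a streaming consumer, or a model-inference endpoint, which is the sense in which one scaling mechanism can cover applications, data, and ML workloads.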

Kubernetes is a key tool to help do away with the siloed mindset. That’s not to say it’ll be easy. Kubernetes has its own complexities, and creating a unified approach across different teams and business units is even more difficult. However, a data execution strategy has to evolve for real-time AI to scale with speed. Kubernetes, containers, and a cloud-native approach will help. (Learn more about moving to cloud-native applications and data with Kubernetes in this blog post.)

Unifying your organization’s real-time data and AI strategies

Data, when gathered and analyzed properly, provides the inputs necessary for functional ML models. An ML model is an application created to find patterns and make decisions when accessing datasets. The application will contain ML mathematical algorithms. And, once ML models are trained and deployed, they help to more effectively guide decisions and actions that make the most of the data input. So it’s critical that organizations understand the importance of weaving together data and ML processes in order to make meaningful progress toward leveraging the power of data and AI in real-time. From architectures and databases to feature stores and feature engineering, a myriad of variables must work in sync for this to be accomplished.

ML models need to be built, trained, and then deployed in real-time. Flexible and easy-to-work-with data models are the oil that makes the engine for building models run smoothly. ML models require data for testing and developing the model and for inference when the ML models are put in production (ML inference is the process of an ML model making calculations or decisions on live data).

Data for ML is made up of individual variables called features. Features can be raw data that has been processed, analyzed, or derived. ML model development is about finding the right features for the algorithms. The ML workflow for creating these features is referred to as feature engineering. The storage for these features is referred to as a feature store. Data and ML model development fundamentally depend on one another.
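The relationship between raw data, feature engineering, and a feature store can be sketched in a few lines. This is a toy, in-memory illustration: the FeatureStore class, the engineer_features function, and the purchase data are all hypothetical, and a real feature store adds persistence, versioning, and low-latency serving.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class FeatureStore:
    """Toy in-memory feature store keyed by (entity_id, feature_name)."""
    _rows: dict = field(default_factory=dict)

    def put(self, entity_id, features):
        for name, value in features.items():
            self._rows[(entity_id, name)] = value

    def get(self, entity_id, names):
        """Return the features in the order a model expects them."""
        return [self._rows[(entity_id, name)] for name in names]


def engineer_features(purchases):
    """Feature engineering: derive model inputs from raw purchase amounts."""
    return {
        "purchase_count": len(purchases),
        "avg_purchase": mean(purchases),
        "max_purchase": max(purchases),
    }


# Hypothetical raw data: purchase amounts per customer.
raw_events = {"cust-1": [20.0, 35.0, 5.0], "cust-2": [250.0]}

store = FeatureStore()
for customer_id, purchases in raw_events.items():
    store.put(customer_id, engineer_features(purchases))

# At training or inference time, the model reads a feature vector by name.
vector = store.get("cust-1", ["purchase_count", "avg_purchase", "max_purchase"])
```

Because training and inference both read from the same store, the model sees features computed the same way in both places, which is one reason data and ML development depend on each other.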

That’s why it is essential for leadership to build a clear vision of the impact of data-and-AI alignment, one that can be understood by executives, lines of business, and technical teams alike. Doing so sets up an organization for success, creating a unified vision that serves as a foundation for turning the promise of real-time AI into reality.

A real-time AI data ingestion platform and operational data store

Real-time data and supporting machine learning models are about data flows and machine-learning-process flows. Machine learning models require quality data for model development and for decisioning when the machine learning models are put in production. Real-time AI needs the following from a data ecosystem:

A real-time data ingestion platform for messaging, publish/subscribe (“pub/sub” asynchronous messaging services), and event streaming

A real-time operational data store for persisting data and ML model features

An aligned data ingestion platform for data in motion and an operational data store working together to reduce the data complexity of ML model development

Change data capture (CDC) that can send high-velocity database events back into the real-time data stream or into analytics platforms or other destinations

An enterprise data ecosystem architected to optimize data flowing in both directions
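How these pieces fit together can be sketched with a toy, in-process example: events are published to a topic, a subscriber persists them in a store, and every write to the store emits a change event back onto the stream (the CDC path). The Broker and OperationalStore classes and the topic names are hypothetical, and delivery here is synchronous for simplicity, whereas real ingestion platforms are asynchronous, durable, and distributed.

```python
from collections import defaultdict


class Broker:
    """Toy pub/sub broker: each topic fans out to its subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in list(self._subscribers[topic]):
            handler(message)


class OperationalStore:
    """Toy key-value store that emits a change event for every write (CDC)."""

    def __init__(self, broker, cdc_topic="store.changes"):
        self._data = {}
        self._broker = broker
        self._cdc_topic = cdc_topic

    def upsert(self, key, value):
        before = self._data.get(key)
        self._data[key] = value
        # Send the change back into the stream for downstream consumers.
        self._broker.publish(
            self._cdc_topic, {"key": key, "before": before, "after": value}
        )

    def get(self, key):
        return self._data.get(key)


broker = Broker()
store = OperationalStore(broker)

# Ingestion path: events on the "orders" topic are persisted in the store.
broker.subscribe("orders", lambda e: store.upsert(e["order_id"], e["status"]))

# CDC path: a downstream consumer (for example, analytics) sees every change.
analytics_feed = []
broker.subscribe("store.changes", analytics_feed.append)

broker.publish("orders", {"order_id": "o-1", "status": "placed"})
broker.publish("orders", {"order_id": "o-1", "status": "shipped"})
```

Data flows in both directions here: from the stream into the store, and from the store back out as change events, which is the shape the list above describes.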


Let’s start with the real-time operational data store, as this is the central data engine for building ML models. A modern real-time operational data store excels at integrating data from multiple sources for operational reporting, real-time data processing, and support for machine learning model development and inference from event streams. Working with the real-time data and the features in one centralized database environment accelerates machine learning model execution.

Data that takes multiple hops through databases, data warehouses, and transformations moves too slowly for most real-time use cases. A modern real-time operational data store (Apache Cassandra® is a great example of a database used for real-time AI by the likes of Apple, Netflix, and FedEx) makes it easier to integrate data from real-time streams and CDC pipelines.

Apache Pulsar is an all-in-one messaging and streaming platform, designed as a cloud-native solution and a first-class citizen of Kubernetes. DataStax Astra DB, my employer’s database-as-a-service built on Cassandra, runs natively in Kubernetes. Astra Streaming is a cloud-native managed real-time data ingestion platform that completes the ecosystem with Astra DB. These stateful data solutions bring alignment to applications, data, and AI.

The operational data store needs a real-time data ingestion platform with the same type of integration capabilities, one that can ingest and integrate data from streaming events. The streaming platform and data store will be constantly challenged with new and growing data streams and use cases, so they need to be scalable and work well together. This reduces the complexity for developers, data engineers, SREs, and data scientists to build and update data models and ML models.  

A real-time AI ecosystem checklist

Despite all the effort that organizations put into being data-driven, the NewVantage Partners survey mentioned above highlights that organizations still struggle with data. Understanding the capabilities and characteristics for real-time AI is an important first step toward designing a data ecosystem that’s agile and scalable. Here is a set of criteria to start with:

A holistic strategic vision for data and AI that unifies an organization

A cloud-native approach designed for scale and speed across all components

A data strategy to reduce complexity and break down silos

A data ingestion platform and operational data store designed for real-time

Flexibility and agility across on-premises, hybrid-cloud, and cloud environments

Manageable unit costs for ecosystem growth

Wrapping up

Real-time AI is about making data actionable with speed and accuracy. Most organizations’ data ecosystems, processes and capabilities are not prepared to build and update ML models at the speed required by the business for real-time data. Applying a cloud-native approach to applications, data, and AI improves scalability, speed, reliability, and portability across deployments. Every machine learning model is underpinned by data. 

A powerful data store, along with enterprise streaming capabilities, turns a traditional ML workflow (train, validate, predict, re-train …) into one that is real-time and dynamic, where the model augments and tunes itself on the fly with the latest real-time data.
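One way to picture a model that tunes itself on the fly is online learning, where each new observation triggers a small update instead of a full retraining run. The sketch below uses a one-variable linear model updated by stochastic gradient descent. It is an assumption chosen for illustration, not the method of any particular product: the OnlineLinearModel class, the learning rate, and the synthetic stream (which follows y = 2x + 1) are all hypothetical.

```python
class OnlineLinearModel:
    """One-feature linear model, y ~ weight * x + bias, updated per event."""

    def __init__(self, learning_rate=0.05):
        self.weight = 0.0
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x + self.bias

    def update(self, x, y):
        """Take one gradient step on squared error; return the pre-update error."""
        error = self.predict(x) - y
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error
        return error


model = OnlineLinearModel()

# Synthetic event stream: observations that follow y = 2x + 1.
stream = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)] * 200

# Predict, observe the true outcome, then update, one event at a time.
errors = [abs(model.update(x, y)) for x, y in stream]
```

The prediction error shrinks as events arrive, with no separate retraining step. In practice the updates would be driven by the streaming platform and the model state kept in the operational data store.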

Success requires defining a vision and execution strategy that delivers speed and scale across developers, data engineers, SREs, DBAs, and data scientists. It takes a new mindset and an understanding that all the data and ML components in a real-time data ecosystem have to work together for success. 

Special thanks to Eric Hare at DataStax, Robert Chong at Employers Group, and Steven Jones of VMware for their contributions to this article.

Learn how DataStax enables real-time AI.

About George Trujillo:

George is principal data strategist at DataStax. Previously, he built high-performance teams for data-value driven initiatives at organizations including Charles Schwab, Overstock, and VMware. George works with CDOs and data executives on the continual evolution of real-time data strategies for their enterprise data ecosystem. 

Artificial Intelligence, IT Leadership

Many people associate high-performance computing (HPC), also known as supercomputing, with far-reaching government-funded research or consortia-led efforts to map the human genome or to pursue the latest cancer cure.

But HPC can also be tapped to advance more traditional business outcomes — from fraud detection and intelligent operations to helping advance digital transformation. The challenge: making complex compute-intensive technology accessible for mainstream use.

As companies digitally transform and steer toward becoming data-driven businesses, there is a need for increased computing horsepower to manage and extract business intelligence and drive data-intensive workloads at scale. The rise of artificial intelligence (AI), machine learning (ML), and real-time analytics applications, often deployed at the edge, can utilize HPC resources to unlock insights from data and efficiently run increasingly large and more complex models and simulations.

The convergence of HPC with AI-based analytics is impacting nearly every industry and across a wide range of applications, including space exploration, drug discovery, financial modeling, automotive design, and systems engineering.

“HPC is becoming a utility in our lives — people aren’t thinking about what it takes to design this tire, validate a chip design, parse and analyze customer preferences, do risk management, or build a 3D structure of the COVID-19 virus,” notes Max Alt, distinguished technologist and director of Hybrid HPC at HPE. “HPC is everywhere, but you don’t think about it, because it’s hidden at the core.”

HPC’s scalable architecture is particularly well suited for AI applications, given the nature of computation required and the unpredictable growth of data associated with these workflows. HPC’s use of graphics-processing-unit (GPU) parallel processing power — coupled with its simultaneous processing of compute, storage, interconnects, and software — raises the bar on AI efficiencies. At the same time, such applications and workflows can operate and scale more readily.

Even with widespread usage, there is more opportunity to leverage HPC for better and faster outcomes and insights. HPC architecture, typically clusters of CPUs and GPUs working in parallel and connected to a high-speed network and data storage system, is expensive, requiring a significant capital investment. HPC workloads are typically associated with vast data sets, which means that public cloud might be an expensive option given their latency and performance requirements. In addition, data security and data gravity concerns often rule out public cloud.

Another major barrier to more widespread deployment: a lack of in-house specialized expertise and talent. HPC infrastructure is far more complex than traditional IT infrastructure, requiring specialized skills for managing, scheduling, and monitoring workloads. “You have tightly coupled computing with HPC, so all of the servers need to be well synchronized and performing operations in parallel together,” Alt explains. “With HPC, everything needs to be in sync, and if one node goes down, it can fail a large, expensive job. So you need to make sure there is support for fault tolerance.”
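One common way to keep a single failure from wiping out a long job is checkpoint and restart: save progress at intervals, and resume from the last saved point instead of starting over. The sketch below shows the idea on a trivial single-process job. It is an illustration under stated assumptions, not a description of how any HPC scheduler or HPE product implements fault tolerance; the run_job function, the fail_at switch, and the file name are hypothetical.

```python
import json
import os
import tempfile


def run_job(total_steps, checkpoint_path, fail_at=None):
    """Sum the integers 0..total_steps-1, checkpointing after every step.

    fail_at simulates a node failure by raising before that step runs.
    """
    state = {"next_step": 0, "partial_sum": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # resume from the last saved point

    for step in range(state["next_step"], total_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated node failure at step {step}")
        state["partial_sum"] += step
        state["next_step"] = step + 1
        # Write to a temp file, then rename, so a crash never leaves a
        # half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["partial_sum"]


workdir = tempfile.mkdtemp()
checkpoint = os.path.join(workdir, "job.ckpt")

try:
    run_job(10, checkpoint, fail_at=6)
except RuntimeError:
    pass  # the "node" died; steps 0 through 5 are already checkpointed

# The restarted job picks up at step 6 rather than repeating earlier work.
result = run_job(10, checkpoint)
```

In a tightly coupled cluster the same idea has to be coordinated across every node, which is part of why the specialized skills mentioned above are needed.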

HPE GreenLake for HPC Is a Game Changer

An as-a-service approach can address many of these challenges and unlock the power of HPC for digital transformation. HPE GreenLake for HPC enables companies to unleash the power of HPC without having to make big up-front investments on their own. This as-a-service-based delivery model enables enterprises to pay for HPC resources based on the capacity they use. At the same time, it provides access to third-party experts who can manage and maintain the environment in a company-owned data center or colocation facility while freeing up internal IT departments.

“The trend of consuming what used to be a boutique computing environment now as-a-service is growing exponentially,” Alt says.

HPE GreenLake for HPC bundles the core components of an HPC solution (high-speed storage, parallel file systems, low-latency interconnect, and high-bandwidth networking) in an integrated software stack that can be assembled to meet an organization’s specific workload needs.

As part of the HPE GreenLake edge-to-cloud platform, HPE GreenLake for HPC gives organizations access to turnkey and easily scalable HPC capabilities through a cloud service consumption model that’s available on-premises. The HPE GreenLake platform experience provides transparency for HPC usage and costs and delivers self-service capabilities; users pay only for the HPC resources they consume, and built-in buffer capacity allows for scalability, including unexpected spikes in demand. HPE experts also manage the HPC environment, freeing up IT resources and delivering access to specialized performance tuning, capacity planning, and life cycle management skills.

To meet the needs of the most demanding compute and data-intensive workloads, including AI and ML initiatives, HPE has turbocharged HPE GreenLake for HPC with purpose-built HPC capabilities. Among the more notable features are expanded GPU capabilities, including NVIDIA Tensor Core models; support for high-performance HPE Parallel File System Storage; multicloud connector APIs; and HPE Slingshot, a high-performance Ethernet fabric designed to meet the needs of data-intensive AI workloads. HPE also released lower entry points to HPC to make the capabilities more accessible for customers looking to test and scale workloads.

As organizations pursue HPC capabilities, they should consider the following:

Stop thinking of HPC in terms of a specialized boutique technology; think of it more as a common utility used to drive business outcomes.

Look for HPC options that are supported by a rich ecosystem of complementary tools and services to drive better results and deliver customer excellence.

Evaluate the HPE GreenLake for HPC model. Organizations can dial capabilities up and down, depending on need, while simplifying access and lowering costs.

HPC horsepower is critical, as data-intensive workloads, including AI, take center stage. An as-a-service model democratizes what’s traditionally been out of reach for most, delivering an accessible path to HPC while accelerating data-first business.

For more information, visit

High-Performance Computing

IT leaders seeking to derive business value from the data their companies collect face myriad challenges. Perhaps the least understood is the lost opportunity of not making good on data that is created, and often stored, but seldom otherwise interacted with.

This so-called “dark data,” named after the dark matter of physics, is information routinely collected in the course of doing business: It’s generated by employees, customers, and business processes. It’s generated as log files by machines, applications, and security systems. It’s documents that must be saved for compliance purposes, and sensitive data that should never be saved, but still is.

According to Gartner, the majority of your enterprise information universe is composed of “dark data,” and many companies don’t even know how much of this data they have. Storing it increases compliance and cybersecurity risks, and, of course, doing so also increases costs.

Figuring out what dark data you have, where it is kept, and what information is in it is an essential step to ensuring the valuable parts of this dark data are secure, and those that shouldn’t be kept are deleted. But the real advantage to unearthing these hidden pockets of data may be in putting it to use to actually benefit the business.

But mining dark data is no easy task. It comes in a wide variety of formats and can be completely unformatted, locked away in scanned documents or audio or video files, for example.

Here is a look at how some organizations are transforming dark data into business opportunities, and what advice industry insiders have for IT leaders looking to leverage dark data.

Coded audio from race car drivers

For five years, Envision Racing has been collecting audio recordings from more than 100 Formula E races, each with more than 20 drivers.

“The radio streams are available on open frequencies for anyone to listen to,” says Amaresh Tripathy, global leader of analytics at Genpact, a consulting company that helped Envision Racing make use of this data.

Previously, the UK-based racing team’s race engineers tried to use these audio transmissions in real time during races, but the code names and acronyms drivers used made it difficult to figure out what was being said and how to make use of it. Understanding what other drivers were saying could help Envision Racing’s drivers with their racing strategy, Tripathy says.

“Such as when to use the attack mode. When to overtake a driver. When to apply brakes,” he says.

Envision Racing was also collecting sensor data from its own cars, such as from tires, batteries, and brakes, and purchasing external data from vendors, such as wind speed and precipitation.

Genpact and Envision Racing worked together to unlock the value of these data streams, making use of natural language processing to build deep learning models to analyze them. The process took six months, from preparing the data pipeline, to ingesting the data, to filtering out noise, to deriving meaningful conversations.

Tripathy says humans take five to ten seconds to figure out what they’re listening to, a delay that made the radio communications irrelevant. Thanks to the AI model’s predictions and insights, the team can now respond in one to two seconds.

In July, at the ABB FIA Formula E World Championship in New York, the Envision Racing team took first and third places, a result Tripathy credits to making use of what was previously dark data.

Dark data gold: Human-generated data

Envision Racing’s audio files are an example of dark data generated by humans, intended for consumption by other humans — not by machines. This kind of dark data can be extremely useful for enterprises, says Kon Leong, co-founder and CEO of ZL Technologies, a data archiving platform provider.

“It is incredibly powerful for understanding every element of the human side of the enterprise, including culture, performance, influence, expertise, and engagement,” he says. “Employees share absolutely massive amounts of digital information and knowledge every single day, yet to this point it’s been largely untapped.”

The information contained in emails, messages, and files can help organizations derive insights such as who the most influential people in the organization are. “Eighty percent of company time is spent communicating. Yet analytics often deals with data that only reflects 1% of our time spent,” Leong says.

Processing human-generated unstructured data is uniquely challenging. Data warehouses aren’t typically set up to handle these communications, for example. Moreover, collecting these communications can create new issues for companies to deal with, having to do with compliance, privacy, and legal discovery.

“These governance capabilities are not present in today’s concept of a data lake, and in fact by collecting data into a data lake, you create another silo which increases privacy and compliance risks,” Leong says.

Instead, companies can leave this data where it currently resides, simply adding a layer of indexing and metadata for searchability. Leaving the data in place will also keep it within existing compliance structures, he says.
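
As a rough illustration of this in-place approach, the sketch below builds a small searchable index that holds only metadata and pointers, never copies of the content. The record fields, locations, and function names are hypothetical and chosen purely for illustration; a real deployment would sit on top of whatever systems already hold the data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Metadata about one item; the content itself stays in its source system."""
    location: str    # hypothetical pointer, e.g. a mailbox or file-share path
    owner: str
    kind: str        # e.g. "email", "chat", "document"
    keywords: tuple


def build_index(records):
    """Map each lowercase keyword to the set of locations tagged with it."""
    index = defaultdict(set)
    for rec in records:
        for word in rec.keywords:
            index[word.lower()].add(rec.location)
    return index


def search(index, *terms):
    """Return the sorted locations that match every search term."""
    hits = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*hits)) if hits else []


# Hypothetical sample records; only pointers and tags are stored.
records = [
    Record("mail://alice/inbox/101", "alice", "email", ("pricing", "renewal")),
    Record("share://legal/contract-7.pdf", "legal", "document", ("renewal", "contract")),
    Record("chat://sales/room-3/55", "bob", "chat", ("pricing",)),
]
index = build_index(records)
matches = search(index, "renewal", "pricing")
```

Because the index stores only locations, access controls and retention rules on the underlying systems continue to apply when a result is opened.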

Effective governance is key

Another approach to handling dark data of questionable value and origin is to start with traceability.

“It’s a positive development in the industry that dark data is now recognized as an untapped resource that can be leveraged,” says Andy Petrella, author of Fundamentals of Data Observability, currently available in pre-release form from O’Reilly. Petrella is also the founder of data observability provider Kensu.

“The challenge with utilizing dark data is the low levels of confidence in it,” he says, in particular around where and how the data is collected. “Observability can make data lineage transparent, hence traceable. Traceability enables data quality checks that lead to confidence in employing these data to either train AI models or act on the intelligence that it brings.”

Chuck Soha, managing director at StoneTurn, a global advisory firm specializing in regulatory, risk, and compliance issues, agrees that the common approach to tackling dark data — throwing everything into a data lake — poses significant risks.

This is particularly true in the financial services industry, he says, where companies have been sending data into data lakes for years. “In a typical enterprise, the IT department dumps all available data at their disposal into one place with some basic metadata and creates processes to share with business teams,” he says.

That works for business teams that have the requisite analytics talent in-house or that bring in external consultants for specific use cases. But for the most part these initiatives are only partially successful, Soha says.

“CIOs transformed from not knowing what they don’t know to knowing what they don’t know,” he says.

Instead, companies should begin with data governance to understand what data there is and what issues it might have, data quality chief among them.

“Stakeholders can decide whether to clean it up and standardize it, or just start over with better information management practices,” Soha says, adding that investing in extracting insights from data that contains inconsistent or conflicting information would be a mistake.

Soha also advises connecting the dots between good operational data already available inside individual business units. Figuring out these relationships can create rapid and useful insights that might not require looking at any dark data right away, he says. “And it might also identify gaps that could prioritize where in the dark data to start to look to fill those gaps in.”

Finally, he says, AI can be very useful in helping make sense of the unstructured data that remains. “By using machine learning and AI techniques, humans can look at as little as 1% of dark data and classify its relevancy,” he says. “Then a reinforcement learning model can quickly produce relevancy scores for the remaining data to prioritize which data to look at more closely.”
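
The label-a-little, score-the-rest idea can be sketched in a few lines. Note that the example below is not a reinforcement learning model; it swaps in a much simpler supervised word-frequency scorer (smoothed log-odds, in the style of naive Bayes) just to show the workflow. The sample texts and function names are hypothetical.

```python
import math
from collections import Counter


def train(labeled):
    """Count word occurrences in relevant and irrelevant labeled samples."""
    counts = {True: Counter(), False: Counter()}
    for text, relevant in labeled:
        counts[relevant].update(text.lower().split())
    return counts


def relevancy_score(counts, text):
    """Sum of smoothed per-word log-odds; higher means more likely relevant."""
    vocab = set(counts[True]) | set(counts[False])
    total_rel = sum(counts[True].values()) + len(vocab)
    total_irr = sum(counts[False].values()) + len(vocab)
    score = 0.0
    for word in text.lower().split():
        p_rel = (counts[True][word] + 1) / total_rel
        p_irr = (counts[False][word] + 1) / total_irr
        score += math.log(p_rel / p_irr)
    return score


# Hypothetical hand-labeled sample (the small fraction humans reviewed).
labeled = [
    ("customer complaint about late delivery", True),
    ("customer asks about contract renewal", True),
    ("server heartbeat ok", False),
    ("cache heartbeat ok status ok", False),
]
# Hypothetical unreviewed items to be prioritized.
unlabeled = [
    "heartbeat ok",
    "complaint about contract renewal",
]
model = train(labeled)
ranked = sorted(unlabeled, key=lambda t: relevancy_score(model, t), reverse=True)
```

Items at the top of `ranked` are the ones worth a closer look first; in practice the labeled sample and the model would both be far larger.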

Using AI to extract value

Common AI-powered solutions for processing dark data include Amazon’s Textract, Microsoft’s Azure Cognitive Services, and IBM’s Datacap, as well as Google’s Cloud Vision, Document, AutoML, and NLP APIs.

In Genpact’s partnership with Envision Racing, Genpact coded the machine learning algorithms in-house, Tripathy says. This required knowledge of Docker, Kubernetes, Java, and Python, as well as NLP, deep learning, and machine learning algorithm development, he says, adding that an MLOps architect managed the complete process.

Unfortunately, these skills are hard to come by. In a report released last fall by Splunk, only 10% to 15% of more than 1,300 IT and business decision makers surveyed said their organizations are using AI to solve the dark data problem. Lack of necessary skills was a chief obstacle to making use of dark data, second only to the volume of the data itself.

A problem (and opportunity) on the rise

In the meantime, dark data remains a mounting trove of risk — and opportunity. Estimates of the portion of enterprise data that is dark vary from 40% to 90%, depending on industry.

According to a July report from Enterprise Strategy Group, sponsored by Quest, 47% of all data is dark data, on average, with a fifth of respondents saying more than 70% of their data is dark data. Splunk’s survey showed similar findings, with 55% of all enterprise data, on average, being dark data, and a third of respondents saying that 75% or more of their organization’s data is dark.

And the situation is likely to get worse before it gets better, as 60% of respondents say that more than half of the data in their organization is not captured at all and much of it is not even understood to exist. As that data is found and stored, the amount of dark data is going to continue to go up.

It’s high time CIOs put together a plan on how to deal with it — with an eye toward making the most of any dark data that shows promise in creating new value for the business.

Analytics, Data Management, Data Science

A modern, agile IT infrastructure has become the critical enabler for success, allowing organizations to unlock the potential of new technologies such as AI, analytics, and automation. Yet modernization journeys are often bumpy; IT leaders must overcome barriers such as resistance to change, management complexity, high costs, and talent shortages.

Those successful in their modernization endeavors can expect significant business gains. In Ampol’s case, the transport fuels provider enjoyed enhanced operational efficiency and business agility, along with maximized service uptime.

A vision for transformation, hampered by legacy

Ampol had a clear goal: intelligent operations for improved service reliability, increased agility, and reduced cost. To achieve this, Ampol created a vision centered on “uplifting and modernizing existing cloud environment and practices,” according to Lindsay Hoare, Ampol’s Head of Technology.

This meant having enterprise-wide visibility and environment transparency for real-time updates, modernizing its environment management capabilities with cloud-based and cloud-ready tools, building the right capabilities and skillsets for the cloud, redesigning the current infrastructure into a cloud-first one, and leveraging automation for enhanced operations.  

While Ampol had most workloads in the cloud, it was still highly dependent on its data center. This meant added complexity to infrastructure networking and management, which in turn drove up maintenance and management costs. The need for human intervention across the environment further increased the risk of error and resultant downtime. Its ambition to enable automation across the entire enterprise, at that point in time, felt unattainable as it lacked the technical expertise and capabilities to do so.

Realizing its ambitions with the right partner

Ampol knew it was not able to modernize its enterprise and bridge the ambition gap alone. It then turned to Accenture. “We needed a partner with a cloud mindset, one that could cover the technological breadth at which Ampol operates,” said Hoare. “Hence why we turned to Accenture, with whom we’ve built a strong partnership that has spanned over a decade.”

Accenture has been helping Ampol in its digital transformation journey across many aspects of its IT operations and as such has a deep understanding of Ampol’s automation ambitions.

“We brought to the table our AIOps capability that leverages automation, analytics, and AI for intelligent operations. Through our ongoing work with Ampol, we were able to accelerate cloud adoption alongside automation implementation, reducing implementation and deployment time,” said Duncan Eadie, Accenture’s Managing Director of Cloud, Infra, and Engineering for AAPAC.

Reaping business benefits through intelligent operations

Through its collaboration with Accenture, Ampol was able to realize its vision for intelligent operations, which in turn has translated into business benefits.

Visualization and monitoring

Ampol can now quickly pinpoint incidents to reduce the time to resolve. Recently, a device failure impacted Ampol’s retail network and service stations, but a map-based visualization of the network allowed engineers to identify the device and switch over to the secondary within the hour: an 85% improvement in downtime reduction.

Self-healing capabilities

Intelligent operations not only detect failures but also attempt to resolve them independently and create incidents for human intervention only when basic resolution is unsuccessful. As a result, Ampol’s network incidents have been reduced by 40% while business-impacting retail incidents are down by half.
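
The detect, attempt, then escalate pattern described above might look something like the following minimal sketch. It is not Ampol's or Accenture's implementation; the device names, health check, and remediation step are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    device: str
    reason: str


@dataclass
class Operations:
    """Detect a failure, try automated fixes, escalate only if they all fail."""
    incidents: list = field(default_factory=list)

    def handle(self, device, is_healthy, remediations):
        if is_healthy(device):
            return "healthy"
        for fix in remediations:
            fix(device)
            if is_healthy(device):
                return "self-healed"
        # Every automated fix failed, so raise an incident for a human.
        self.incidents.append(Incident(device, "automated remediation failed"))
        return "escalated"


# Hypothetical device state and remediation step for illustration.
state = {"router-1": "down", "router-2": "down"}


def is_healthy(device):
    return state[device] == "up"


def restart(device):
    # Pretend a restart only revives router-1.
    if device == "router-1":
        state[device] = "up"


ops = Operations()
results = [ops.handle(d, is_healthy, [restart]) for d in ("router-1", "router-2")]
```

Here only the second device ends up in `ops.incidents`, which mirrors the idea that humans see just the failures automation could not resolve.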

Automating mundane tasks

Automation now regularly takes care of mundane and routine tasks such as patching, updates, virtual machine builds, and software installs. This frees up employees’ time that is otherwise spent on maintenance, enabling them to innovate and add real business value through working on more strategic assignments and business growth.


As Ampol focuses on the global energy transition, it is investing in new energy solutions in a highly dynamic environment. A cloud-first infrastructure removes complexity, increases the levels of abstraction, and offers greater leverage of platform services, enabling agility and responsiveness. The right architecture and security zoning facilitate critical business-led experimentation and innovation to ensure Ampol continues to place at the front of the pack.

As IT infrastructure becomes a critical enabler across industries, organizations are compelled to embrace modernization. While significant roadblocks exist, a clear vision and the right partner can help overcome challenges and unlock the potential of the cloud, AI and analytics, and automation as true game-changers.

“This is a long journey,” says Hoare. “We’ve been at it for years now… It needs drive and tenacity. But when you get there, you’ll be in a great place.”

Learn more about getting started with a modern infrastructure here.

Cloud Management, Digital Transformation