As one of the largest IT service providers in the world, TCS produces and depends on a massive amount of data to conduct and grow its business. But like many enterprises, its data practices made it difficult to derive timely, actionable insights from ever-increasing volumes of data, preventing the Mumbai-based multinational from becoming a truly data-driven business.

Specifically, TCS faced three major issues in dealing with its siloed data. First, there was a lack of customized reporting and a reliance on application teams to develop those reports. Second, there was no centralized, secure data management platform; disparate data from various sources had to be consolidated and validated, leading to long turnaround times and data quality issues. And third, without a central data platform, TCS struggled to make use of AI-powered insights to help business users run their operations efficiently and effectively.

“The company was unable to derive actionable insights from soaring volumes of data as it grappled with traditional information architecture supporting data silos, which are difficult to interpret, thereby leading to extended cycle times of data retrieval,” says Abhijit Mazumder, CIO of TCS, adding that a lack of data visualization interfaces further complicated the process of data-driven decision making for the IT services provider, which employs nearly half a million consultants across 46 countries.

Leveraging the power of platform

Confronted with these challenges, Mazumder and TCS set about developing a self-service data analytics platform that could act as a single source of truth for data generated in its internal IT ecosystem, a project that has earned TCS a CIO 100 US Award for innovation and IT leadership.

The mandate for the platform was clear: It had to empower business users to identify patterns, detect anomalies, and glean tangible insights from the vast volumes of complex datasets at their disposal; provide centralized data for regulating and controlling data access and interaction at all levels; provide intuitive visualization platforms to facilitate customized interpretations of data that would help foster faster, informed decision-making; and reduce dependence on IT to help make TCS more agile in acting on its data.

“We built the solution on the fundamental principle that data has shape, color, and texture. The aim was to enable users to understand hidden patterns and insights in data, effortlessly and quickly,” says Mazumder.

To realize the solution, TCS IT had to develop a new information architecture, Mazumder says, one that could provide consistent, reliable, near real-time data across all user roles. The architecture includes a common central data integration layer that serves all the analytics needs of the organization across business functions. It also relies on data marts to enable business functions to harness the power of self-service analytics capabilities. A multi-layered security solution was also designed to secure access to data based on role access and digital rights management systems.
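The central-layer-plus-data-marts idea can be sketched in miniature. The roles, columns, and records below are invented for illustration and are not TCS's actual design: a central dataset is filtered per role before it reaches a function-specific data mart.

```python
# Toy sketch of role-based filtering over a central data layer.
# Roles, columns, and records are illustrative assumptions, not TCS's design.

CENTRAL_DATA = [
    {"project": "P1", "region": "EU", "revenue": 120, "margin": 0.18},
    {"project": "P2", "region": "US", "revenue": 300, "margin": 0.22},
    {"project": "P3", "region": "EU", "revenue": 90,  "margin": 0.15},
]

# Each role sees only permitted columns and rows (digital rights in miniature).
ROLE_POLICY = {
    "sales_eu": {"columns": {"project", "region", "revenue"}, "region": "EU"},
    "finance":  {"columns": {"project", "region", "revenue", "margin"}, "region": None},
}

def data_mart(role: str) -> list[dict]:
    """Project the central layer into a role-scoped data mart."""
    policy = ROLE_POLICY[role]
    rows = [r for r in CENTRAL_DATA
            if policy["region"] is None or r["region"] == policy["region"]]
    return [{k: v for k, v in r.items() if k in policy["columns"]} for r in rows]

print(data_mart("sales_eu"))
```

The point of the sketch is the layering: one governed source of truth, with access rules applied before data reaches each business function.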

To help business users make informed and timely decisions, self-service business intelligence with simple drag-and-drop features was built into the platform, enabling users to create their own visualizations without writing any scripts. Out-of-the-box interactive visualizations and enterprise dashboards instantly respond to interactions and changes in context, providing intuitive intelligence.

The platform also facilitated TCS making use of machine learning and AI. The former replaced manual activity for custom report generation, while the latter drove a conversational analytics experience to leadership, based on natural language queries, thereby providing a faster, easier way to ask questions, get insights, and make data-driven decisions.

“The storytelling feature of the platform provides multiple points of view with ability to dive back into source analysis at any point. It combines the narrative capabilities with data visualizations to deliver compelling and easily understandable insights. Besides, natural language queries with smart search helped to navigate complex information to accelerate data discovery,” says Mazumder.

Over 700 dashboards were deployed for use across 70 departments, with secure access ensured for 20,000 users across multiple geographies to enable data availability for business anytime anywhere. Moreover, the system now enables TCS business users to access data insights in near real-time, the CIO says.

“The solution enables interactive visualizations and enterprise dashboards that instantly respond to interactions and changes in context, providing intuitive intelligence,” says Mazumder.

The cloud-based solution, which took nearly a year to build starting in 2020, relied heavily on the work of a centralized, highly skilled team of 30 people, who implemented the common data fabric and visual analytics framework, performed the necessary data modelling, and established governance guidelines and policies. Various data integration, data modelling, visual analytics, business intelligence, and data analytics tools were evaluated and selected for inclusion in the project based on flexibility, scalability, pricing, and several other factors keeping overall benefits in mind, Mazumder says.

The data-driven advantage

In addition to driving data insights, Mazumder says the centralized data architecture has greatly reduced resource utilization, cutting data load times and data storage requirements by around 90%.

“There has been 85% reduction in projects with delivery risks because of proactive tracking enabled with interactive dashboards. The advanced use of analytics helped in tracking over 25,000 projects during remote way of working being practiced during the COVID-19 outbreak,” says Mazumder.

“There has been a reduction of turnaround time for delivery of analytics requirements by 50% compared to the earlier processes. It also improved productivity and enabled smart, quick decision-making across the organization with near real-time data analytics,” he says.

The in-house project has also helped the IT services provider streamline its IT operations. “With this new solution, TCS has been reducing enterprise technical debt by consolidating and retiring legacy platforms to the next-generation advanced analytics platforms,” says Mazumder.

But the biggest takeaway is how the homegrown self-service analytics platform has impacted business operations.

“The enabling of an analytics- and insights-driven enterprise is creating new business opportunities to drive superior business performance by empowering [business users across the organization] with rapid and actionable insights,” says Mazumder.

Business Intelligence and Analytics Software

Analytics have evolved dramatically over the past several years as organizations strive to unleash the power of data to benefit the business. While many organizations still struggle to get started, the most innovative organizations are using modern analytics to improve business outcomes, deliver personalized experiences, monetize data as an asset, and prepare for the unexpected.

Modern analytics is about scaling analytics capabilities with the aid of machine learning to take advantage of the mountains of data fueling today’s businesses, and delivering real-time information and insights to the people across the organization who need it. To meet the challenges and opportunities of the changing analytics landscape, technology leaders need a data strategy that addresses four critical needs:

- Deliver advanced analytics and machine learning that can scale and adapt to whatever evolutions in applications and data science the future may hold.
- Break down internal data silos to create boundaryless innovation while enabling greater collaboration with partners outside of their own organization.
- Embrace the democratization of data with low-code/no-code technologies that offer the insight and power of analytics to anyone in the organization.
- Embed analytics into business processes to create more compelling, relevant customer experiences and insights in real time.

Building a foundation for flexible and scalable analytics

Migrating analytics from on-premises systems to the cloud opens a realm of applications and capabilities and has allowed organizations to gradually shed the restraints of legacy architecture, with the proper controls in place.

“The migration of advanced analytics to the cloud has been an iterative, evolving process,” said Deirdre Toner, Go-To-Market leader for AWS’s analytics portfolio of services. AWS doesn’t recommend that organizations try to completely re-create their on-premises environments in the cloud. “Migration works best by considering the guardrails and processes needed to collect data, store it with the appropriate security and governance models, and then accelerate innovation,” Toner said. “Don’t just lift and shift with the old design principles that caused today’s bottlenecks. This is an opportunity to modernize and break down old architectural patterns that no longer serve the business.”

The goal is a data platform that can evolve and can scale almost infinitely, using an iterative approach to maintain flexibility, with guardrails in place. “IT leaders want to avoid having to re-do the architecture every couple of years to keep pace with changing market requirements,” said Toner. “As use cases change, or if unforeseen changes in market conditions suddenly emerge – and they surely did during the pandemic – organizations need to be able to respond quickly. Being locked into a data architecture that can’t evolve isn’t acceptable.”

Aurora – a company transforming the future of transportation by building self-driving technology for trucks and other vehicles – took advantage of the scalability of cloud-based analytics in the development of its autonomous driver technology. Aurora built a cloud testing environment on AWS to better understand the safety of its technology by seeing how it would react to scenarios too dangerous or rare to simulate in the real world. With AWS, Aurora can run 5 million simulations per day, the virtual equivalent of 6 billion miles of road testing. Aurora combined its proprietary technology with many AWS database, analytics, and machine learning solutions, including Amazon EMR, Amazon DynamoDB, AWS Glue, and Amazon SageMaker. The solutions helped Aurora reach levels of scale not possible in a real-world testing environment, which accelerated its innovation.

Moving beyond silos to “borderless” data

Integrating internal and external data and achieving a “borderless” state for sharing information is a persistent problem for many companies that want to make better use of all the data they’re collecting or can access in shared environments. Toner emphasized the importance of breaking down data silos to become truly data driven.

Organizations also need to explore new ways to harness third-party data from partners or customers, which increases the need for comprehensive governance policies to protect that data. Solutions such as data clean rooms are becoming more popular as a way to leverage data from outside providers, or monetize proprietary data sets, in a compliant and secure way.

AWS Data Exchange makes it easy for customers to find, subscribe to, and use third-party data from a wide range of sources, Toner said. For example, one financial services customer needed a better way to quickly find, procure, ingest, and process data provided by hundreds of vendors. But its existing data integration and analysis process took too long and used too many resources, putting at risk the bank’s reputation for providing expert insights to investors in fast-changing markets.

The company used AWS Data Exchange to streamline its consumption of third-party data, enabling teams across the company to build applications and analyze data more efficiently. AWS Data Exchange helped the firm eliminate the undifferentiated heavy lifting of ingesting and getting third-party data ready, freeing developers to dedicate more time toward generating insights for their clients.

Making analytics accessible to the masses

The consumerization of data and the broad applicability of machine learning have led to the emergence of low-code/no-code tools that make advanced analytics accessible to non-technical users.

“The simplification of tools is a crucial aspect of changing how a user prepares their data, picks the best model, and performs predictions without writing a single line of code,” said Toner. Amazon SageMaker Canvas and Amazon QuickSight are two examples of the low-code/no-code movement in machine learning and analytics, respectively.

SageMaker Canvas has a simple drag and click user interface that allows a non-technical person to create an entire machine learning workflow without writing a single line of code. QuickSight Q, powered by machine learning, makes it easy for any user to simply ask natural language questions and get answers in real time.  

Embedding insights and experiences

Toner emphasized the importance of understanding that the types of people who need access to data across the business are expanding. “You can’t just build an analytics environment that serves a handful of developers and data scientists,” she said. “You need to make sure that the people who need data for decision making can find it, access it, and interpret that data in the moment it is important to them and the business.”

A cloud-based data strategy makes it possible to embed the power of data directly into customer experiences and workflows by making relevant data available as it’s needed. Toner used the example of Best Western, the hotel and hospitality brand using real-time analytics to give its revenue management team the capability to set room rates at any given moment. The result: improved revenue gains and the ability to be more responsive to customers.

“Best Western used to rely on static reports and limited data sets to set room rates,” Toner said. “Now, with QuickSight, they can access a much broader set of data in real time to get the insights they need to make better decisions and improve the efficiency of every team member.”

Addressing these four core components of modern analytics will help CIOs, CDOs, and their teams develop and deploy a data strategy that delivers value across the business today, while being flexible enough to adapt to whatever may happen tomorrow. 

Learn more about ways to put your data to work on the most scalable, trusted, and secure cloud.


Data-informed decision-making is a key attribute of the modern digital business. But experienced data analysts and data scientists can be expensive and difficult to find and retain.

One potential solution to this challenge is to deploy self-service analytics, a type of business intelligence (BI) that enables business users to perform queries and generate reports on their own with little or no help from IT or data specialists.

Self-service analytics typically involves tools that are easy to use and have basic data analytics capabilities. Business professionals and leaders can leverage these to manipulate data so they can identify market trends and opportunities, for example. They’re not required to have any experience with analytics or background in statistics or other related disciplines.

Given the ongoing gap between the demand for experienced data analysts and the supply of these professionals — and the desire to quickly get valuable business insights into the hands of the users who need it most — it’s easy to see why enterprises would find self-service analytics appealing.

But there are right and wrong ways to deploy and use self-service analytics. Here are some tips for IT leaders looking to make good on the promise of self-service analytics strategies.

Have a clear, comprehensive analytics plan

Data analytics and analytics tools have gained such a high profile within many businesses that it’s easy to see how they can be overused or inappropriately applied. This is even more of an issue with self-service analytics, because it enables a much larger range and base of people to analyze data.

That’s why it’s important to establish a plan for where and when it makes sense to use analytics, and to have reasonable controls to keep your analytics strategy from becoming a free for all.

“Determine your mission, vision, and questions you need to answer around analytics before even starting,” says Brittany Meiklejohn, a business and sales process analyst at Swagelok, a developer of fluid system products and services for the oil, gas, chemical, and clean energy industries.

“It is extremely easy to get caught up on all the charts and graphs you can create, but that gets overwhelming very quickly,” Meiklejohn says. “Having that roadmap from the start helps to trim down and focus on the actual metrics to create. Have a data governance plan as well to validate and keep the metrics clean. As soon as one metric is not accurate it is hard to get the buy-in again, so routinely confirming accuracy on all analytics is extremely important.”

The analytics plan should emphasize the use of proactive data as much as possible, Meiklejohn says. “Focus [on] data that is actionable and can be implemented back into the business,” she says. “Incorporate learnings to transform processes and decision-making at an organizational scale. It is great to understand the historical side of the business, but it is hard to change if you are only looking at the past.”

At Swagelok, departments are using self-service analytics tools from Domo to determine whether customer orders will be late, schedule production runs, analyze sales performance, and make supply chain decisions.

“We have seen an increase in efficiency; everyone is able to get the data they need to drive decisions much faster than before,” Meiklejohn says. “We are making more responsible data-driven decisions, since each department is using the data for decision-making.”

Go for quick wins

While it’s important to have a long-range analytics strategy in place, that doesn’t mean organizations should move at a plodding pace with self-service analytics.

“In my previous company, our advanced material business had a saying, ‘Go fast, take risks, and learn,’” says Keith Carey, CIO at Hemlock Semiconductor, a maker of products for the electronic and solar power industries. “That would be my advice for those just getting started [with self-service analytics]. Don’t get me wrong, governance is very important and can come along a little later so as not to stifle creativity.”

It’s a good idea to find a small work group “and assign a moonshot mission to demonstrate the art of possible,” Carey says. He suggests teams focus “on the data pipelines that drive consistent business logic and metrics across the enterprise. Understand the importance of timeliness and quality of the data on which important decisions are being made. That’s a great place to start.”

Hemlock launched a self-service analytics initiative in 2018 using Tibco’s Spotfire platform, which is currently being used by all functions of the business. “Prior to that, IT would develop custom .NET applications that wrangled data and provided initial charting capability,” Carey says. “The most popular feature of these apps was an ‘export to Excel’ button, where [the Microsoft spreadsheet] became the analytics platform of choice.”

A handful of the company’s brightest engineers also created macros that would mash up new data sets, “which took overnight to run on someone’s PC,” Carey says. “And hopefully, if it didn’t crash, the data set was shared out amongst the engineering professionals.”

With self-service analytics capabilities, Hemlock has seen benefits such as faster decision-making and quicker results. Self-service enables all functions, including operations, finance, procurement, supply chain, and continuous improvement teams, to perform data discovery and create powerful visualizations.

“We shortened the learning curve, delivered results faster, and accelerated our understanding of our manufacturing processes, which led to improving our products and reducing cost,” Carey says. “Within a very short time, we saved millions of dollars by improving existing reporting methods and discovering new insights.”

Leverage natural language processing

Natural language processing (NLP) makes analytics more accessible to greater numbers of people by eliminating the need to understand SQL, database structures, and the concept of joining tables together, says Dave Menninger, senior vice president and research director at Ventana Research.

There are two main aspects of NLP as it relates to analytics, Menninger says: natural language search (also known as natural language query) and natural language presentation (also known as natural language generation).

“Natural language search allows people to ask questions and get responses without [any] special syntax,” Menninger says. “Just like typing a search into a Google search bar you can type, or in some cases speak, a query using everyday language.”

For example, a user could ask to see the products that had the biggest increase or decrease in sales for that month. The results would be displayed and then the user could refine the search, for instance, to determine the inventory on hand for certain products.
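As a toy sketch of the idea, the question about biggest sales increases or decreases can be mapped onto a structured lookup. Real NLQ engines use far richer language understanding; the keyword matching, product names, and figures here are invented for illustration.

```python
# Hypothetical sketch: mapping a natural-language question to a structured query.
# Real NLQ engines parse far more than keywords; this shows only the idea.

SALES = {"Widget": +18.5, "Gadget": -7.2, "Gizmo": +2.1, "Doodad": -12.9}  # % change, invented

def answer(question: str) -> str:
    q = question.lower()
    if "increase" in q:
        product = max(SALES, key=SALES.get)   # biggest gain
    elif "decrease" in q:
        product = min(SALES, key=SALES.get)   # biggest drop
    else:
        return "Sorry, I can only rank sales changes."
    return f"{product} ({SALES[product]:+.1f}%)"

print(answer("Which product had the biggest increase in sales this month?"))
# Widget (+18.5%)
```

A follow-up query ("inventory on hand for Widget") would route to a different structured lookup in the same way, which is the refinement loop the example in the text describes.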

Natural language presentation deals with the results of analyses rather than the query portion, Menninger says. “Once a query has been formulated, using NLP or otherwise, the results are displayed as narratives explaining what was found,” he says.

In the product example, instead of displaying a chart of products showing the sales increases or decreases, natural language presentation would generate a few sentences or a paragraph describing specific details about the products.
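The natural language presentation side can be sketched the same way: instead of charting the result set, generate a short narrative from it. The template wording and figures below are invented for illustration.

```python
# Hypothetical sketch of natural language presentation: turning query results
# into a short narrative instead of a chart. Data and wording are illustrative.

def narrate(results: list[tuple[str, float]]) -> str:
    ups = [(p, d) for p, d in results if d > 0]
    best = max(results, key=lambda r: r[1])
    worst = min(results, key=lambda r: r[1])
    return (f"{len(ups)} of {len(results)} products grew this month. "
            f"{best[0]} led with {best[1]:+.1f}%, while "
            f"{worst[0]} declined {abs(worst[1]):.1f}%.")

print(narrate([("Widget", 18.5), ("Gadget", -7.2), ("Gizmo", 2.1)]))
```

Because the takeaway is spelled out in prose, every reader extracts the same conclusion from the analysis, which is the consistency benefit Menninger describes below.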

“People have different learning styles,” Menninger says. “Some like tables of numbers. Some prefer charts. Others don’t know how to interpret tables or charts and prefer narratives. Natural language presentation makes it easier to know what to look for in an analysis. It also removes the inconsistency in the way data is interpreted by spelling out exactly what should be taken away from the analysis.”

Use embedded analytics

Embedded analytics involves the integration of analytical capabilities and data visualizations into business applications. Embedding real-time reports and dashboards into these applications enables business users to analyze the data in these applications.

“Embedded analytics brings the analytics to the applications that individuals are using in [their] day-to-day activities,” Menninger says. This might include line-of-business applications such as enterprise resource planning (ERP), customer relationship management (CRM), or human resources information systems (HRIS), as well as productivity tools such as collaboration, email, spreadsheets, presentations, and documents.

“In the context of business applications, pre-built analyses make it much easier for line-of-business personnel to access and utilize analytics,” Menninger says. “It also provides good governance, since the data is managed by the underlying application where access rights are already maintained.”
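A minimal sketch of embedding: a pre-built analysis surfaces directly inside an application screen rather than in a separate BI tool. The customer data, metric, and thresholds below are all invented for illustration.

```python
# Illustrative sketch of embedded analytics: a pre-built metric surfaces
# inside a business workflow instead of a separate BI tool.

ORDER_HISTORY = {"acme": [12, 9, 4], "globex": [5, 6, 7]}  # orders per month, invented

def churn_risk(customer: str) -> str:
    """Pre-built analysis: flag customers whose order volume is trending down."""
    h = ORDER_HISTORY.get(customer, [])
    return "high" if len(h) >= 2 and h[-1] < h[0] else "low"

def open_account_screen(customer: str) -> str:
    """Application workflow (e.g., a CRM screen) with the analysis embedded."""
    return f"Account: {customer} | churn risk: {churn_risk(customer)}"

print(open_account_screen("acme"))   # Account: acme | churn risk: high
```

The governance point in the quote above follows naturally: the application already knows who is allowed to see this account, so the embedded metric inherits those access rights.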

Choose the right tools

The difference between success and failure with self-service analytics can come down to the technology tools companies choose to deploy. Business executives need to work closely with IT leadership to evaluate tools and determine which ones best meet the needs of the organization and fit with its infrastructure.

Among the requirements financial services firm Western Union had when selecting a self-service analytics platform was that it be easy to integrate with multiple disparate data sources, be flexible and easy to use, have powerful analytical capabilities, and have minimal infrastructure requirements.

The company deployed a platform from Tableau to enable business users to make decisions based on their own queries and analyses in a governed environment, says Harveer Singh, chief data architect and head of data engineering and architecture at Western Union.

Business departments can create their own queries and reports and collaborate without the need for support from IT, Singh says. “Users have freedom to slice and dice the data without technical know-how,” he says. “Data can be derived from multiple sources in various formats.”

When organizations select the right analytics tools, self-service analytics “empowers business users to retrieve and analyze the data without the need for IT expertise/product specialists for report development and analysis,” Singh says. It’s an asset “that responds to dynamic business requirements.”


The benefits of analyzing vast amounts of data, long-term or in real-time, have captured the attention of businesses of all sizes. Big data analytics has moved beyond the rarified domain of government and university research environments equipped with supercomputers to include businesses of all kinds that are using modern high performance computing (HPC) solutions to get their analytics jobs done. It’s big data meets HPC ― otherwise known as high performance data analytics.

Bigger, Faster, More Compute-intensive Data Analytics

Big data analytics has relied on HPC infrastructure for many years to handle data mining processes. Today, parallel processing solutions handle massive amounts of data and run powerful analytics software that uses artificial intelligence (AI) and machine learning (ML) for highly demanding jobs.

A report by Intersect360 Research found that “Traditionally, most HPC applications have been deterministic; given a set of inputs, the computer program performs calculations to determine an answer. Machine learning represents another type of application that is experiential; the application makes predictions about new or current data based on patterns seen in the past.”

This shift to AI, ML, large data sets, and more compute-intensive analytical calculations has contributed to the growth of the global high performance data analytics market, which was valued at $48.28 billion in 2020 and is projected to grow to $187.57 billion in 2026, according to research by Mordor Intelligence. “Analytics and AI require immensely powerful processes across compute, networking and storage,” the report explained. “As a result, more companies are increasingly using HPC solutions for AI-enabled innovation and productivity.”

Benefits and ROI

Millions of businesses need to deploy advanced analytics at the speed of events. A subset of these organizations will require high performance data analytics solutions. Those HPC solutions and architectures will benefit from the integration of diverse datasets from on-premises to edge to cloud. The use of new sources of data from the Internet of Things to empower customer interactions and other departments will provide a further competitive advantage to many businesses. Simplified analytics platforms that are user-friendly resources open to every employee, customer, and partner will change the responsibilities and roles of countless professions.

How does a business calculate the return on investment (ROI) of high performance data analytics? It varies with different use cases.

For analytics used to help increase operational efficiency, key performance indicators (KPIs) contributing to ROI may include downtime, cost savings, time-to-market, and production volume. For sales and marketing, KPIs may include sales volume, average deal size, revenue by campaign, and churn rate. For analytics used to detect fraud, KPIs may include number of fraud attempts, chargebacks, and order approval rates. In a healthcare environment, analytics used to improve patient outcomes might include key performance indicators that track cost of care, emergency room wait times, hospital readmissions, and billing errors.
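A back-of-the-envelope version of that calculation might look like the following. Every figure is invented for illustration; in practice the gains would be estimated from the KPIs listed above.

```python
# Illustrative ROI arithmetic for an analytics investment; all figures invented.

investment = 500_000            # platform plus team cost, USD
gains = {                       # annual gains attributed to analytics, USD
    "downtime_avoided":      180_000,
    "fraud_prevented":       220_000,
    "faster_time_to_market": 350_000,
}

total_gain = sum(gains.values())              # 750,000
roi = (total_gain - investment) / investment  # (750k - 500k) / 500k = 0.5
print(f"ROI: {roi:.0%}")                      # ROI: 50%
```

The hard part in practice is not the division but the attribution: deciding how much of each KPI movement the analytics investment can legitimately claim.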

Customer Success Stories

Combining data analytics with HPC:

- A technology firm applies AI, machine learning, and data analytics to client drug diversion data from acute, specialty, and long-term care facilities and delivers insights within five minutes of receiving new data, while maintaining an HPC environment with 99.99% uptime to comply with service level agreements (SLAs).
- A research university was able to tap into 2 petabytes of data across two HPC clusters with 13,080 cores to create a mathematical model to predict behavior during the COVID-19 pandemic.
- A technology services provider is able to inspect 124 moving railcars ― a 120% reduction in inspection time ― and transmit results in eight minutes, based on processing and analyzing 1.31 terabytes of data per day.
- A race car designer is able to process and analyze 100,000 data points per second per car ― one billion in a two-hour race ― that are used by digital twins running hundreds of different race scenarios to inform design modifications and racing strategy.
- Scientists at a university research center are able to utilize hundreds of terabytes of data, processed at I/O speeds of 200 Gbps, to conduct cosmological research into the origins of the universe.

Data Scientists are Part of the Equation

High performance data analytics is gaining stature as more and more data is being collected.  Beyond the data and HPC systems, it takes expertise to recognize and champion the value of this data. According to Datamation, “The rise of chief data officers and chief analytics officers is the clearest indication that analytics has moved from the backroom to the boardroom, and more and more often it’s data experts that are setting strategy.” 

No wonder skilled data analysts continue to be among the most in-demand professionals in the world. The U.S. Bureau of Labor Statistics predicts that the field will be among the fastest-growing occupations for the next decade, with 11.5 million new jobs by 2026. 

For more information read “Unleash data-driven insights and opportunities with analytics: How organizations are unlocking the value of their data capital from edge to core to cloud” from Dell Technologies. 


Intel® Technologies Move Analytics Forward

Data analytics is the key to unlocking the most value you can extract from data across your organization. To create a productive, cost-effective analytics strategy that gets results, you need high performance hardware that’s optimized to work with the software you use.

Modern data analytics spans a range of technologies, from dedicated analytics platforms and databases to deep learning and artificial intelligence (AI). Just starting out with analytics? Ready to evolve your analytics strategy or improve your data quality? There’s always room to grow, and Intel is ready to help. With a deep ecosystem of analytics technologies and partners, Intel accelerates the efforts of data scientists, analysts, and developers in every industry. Find out more about Intel advanced analytics.

Data Management

Good cyber hygiene helps the security team reduce risk. So it’s not surprising that the line between IT operations and security is increasingly blurred. Let’s take a closer look.

One of the core principles in IT operations is “you can’t manage what you don’t know you have.” By extension, you also can’t secure what you don’t know you have. That’s why visibility is important to both IT operations and security. Another important aspect is dependency mapping, which is part of visibility: it shows the relationships between your servers and the applications or services they host.

There are many security use cases where dependency mapping comes into play. For example, if there’s a breach, dependency mapping offers visibility into what’s affected. If a server is compromised, what is it talking to? If it must be taken offline, what applications will break?
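That blast-radius question reduces to a graph traversal over the dependency map. The hosts, services, and edges below are invented for illustration.

```python
# Toy dependency map: which applications break if a server goes offline?
# Hosts and edges are invented for illustration.
from collections import deque

DEPENDS_ON = {                     # app/service -> things it needs to run
    "payroll-app": ["db-01", "auth-svc"],
    "auth-svc":    ["db-02"],
    "reports-app": ["db-01"],
}

def blast_radius(server: str) -> set[str]:
    """Everything directly or transitively dependent on `server`."""
    affected, queue = set(), deque([server])
    while queue:
        node = queue.popleft()
        for app, deps in DEPENDS_ON.items():
            if node in deps and app not in affected:
                affected.add(app)
                queue.append(app)
    return affected

print(sorted(blast_radius("db-02")))   # ['auth-svc', 'payroll-app']
```

Note the transitive hop: taking db-02 offline breaks auth-svc directly, and payroll-app indirectly through its dependence on auth-svc. That is exactly the question responders need answered when deciding whether to isolate a compromised server.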

To further erase the line between IT operations and security, many operations tools have a security dimension as well.

What is good cyber hygiene?

Good cyber hygiene is knowing what you have and controlling it. Do you have the licenses you need for your software? Are you out of compliance and at risk for penalties? Are you paying for licenses you’re not using? Are your endpoints configured properly? Is there software on an endpoint that shouldn’t be there? These questions are all issues of hygiene, and they can only be answered with visibility and control. 

To assess your cyber hygiene, ask yourself:

- What do you have?
- Is it managed?
- Do managed endpoints meet the criteria set for a healthy endpoint?

Think of endpoints in three categories: managed, unmanaged and unmanageable. Not all endpoints are computers or servers. That’s why good cyber hygiene requires tools that can identify and manage devices like cell phones, printers and machines on a factory floor.

There is no single tool that can identify and manage every type of endpoint. But the more visibility you have, the better your cyber hygiene. And the better your risk posture.

Work-from-home (WFH) made visibility much harder. If endpoints aren’t always on the network, how do you measure them? Many network tools weren’t built for that. But once you know what devices you have, where they are and what’s on them, you can enforce policies that ensure these devices behave as they should.

You also want the ability to patch and update software quickly. When Patch Tuesday comes around, can you get critical patches on all your devices in a reasonable time frame? Will you know in real time what’s been patched and what wasn’t? It’s about visibility.

That way, when security comes to operations and asks, “There’s a zero-day flaw in Microsoft Word. How many of your endpoints have this version?” operations can answer immediately: “We know about that, and we’ve already patched it.” That’s the power of visibility and cyber hygiene.
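With a current inventory in hand, that zero-day question becomes a simple lookup. A minimal sketch, assuming a hypothetical inventory format that a visibility tool might export:

```python
# Hypothetical endpoint inventory, as a visibility tool might report it:
# each record lists an endpoint and the software versions installed on it.
INVENTORY = [
    {"host": "laptop-001", "software": {"Word": "16.0.1"}},
    {"host": "laptop-002", "software": {"Word": "16.0.5"}},
    {"host": "desktop-003", "software": {"Word": "16.0.1"}},
]

def hosts_running(app: str, vulnerable_version: str) -> list[str]:
    """Answer the security team's question: which endpoints
    still run the vulnerable version?"""
    return [e["host"] for e in INVENTORY
            if e["software"].get(app) == vulnerable_version]
```

The hard part in practice is not this query but keeping `INVENTORY` fresh and complete, which is the whole argument for visibility.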

Good hygiene delivers fresh data for IT analytics

Good hygiene is critical for fresh, accurate data. But in terms of executive hierarchy, where does the push for good cyber hygiene start? Outside of IT and security, most executives probably don’t think about cyber hygiene. They think about getting answers to questions that rely on good IT hygiene.

For example, if CFOs have a financial or legal issue around license compliance, they probably assume the IT ops team can quickly provide answers. Those executives aren’t thinking about hygiene. They’re thinking about getting reliable answers quickly.

What C-level executives need are executive dashboards that can tell them whether their top 10 business services are healthy. The data the dashboards display will vary depending on the executive and business the organization is in.

CIOs may want to know how many Windows 10 licenses they’re paying for. The CFO wants to know if the customer billing service is operating. The CMO needs to know if the customer-facing website is running properly. The CISO wants to know about patch levels. This diverse set of questions all depends on fresh data for accuracy.

Fresh data can bring the most critical issues to the dashboard, so management doesn’t have to constantly pepper IT with questions. All this starts with good cyber hygiene.

Analytics supports alerting and baselining

When an issue arises, such as a critical machine’s CPU use going off the charts, an automated alert takes the burden off IT to continuously search for problems. This capability is important for anyone managing an environment at scale; don’t make IT search for issues.

Baselining goes hand-in-hand with alerting because alerts must have set thresholds. Organizations often need guidance on how to set thresholds. There are several ways to do it and no right way.

One approach is automatic baselining. If an organization thinks its environment is relatively healthy, the current state is the baseline. So it sets up alerts to notify IT when something varies from that.

Analytics can play an important role here by helping organizations determine whether normal is the same as healthy. Your tools should tell you what a healthy endpoint looks like and that’s the baseline. Alerts tell you when something happens that changes that baseline state.
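As a rough illustration of automatic baselining, the sketch below treats a window of (assumed healthy) CPU readings as the baseline and alerts on large deviations. The three-standard-deviation threshold and the sample values are illustrative, not a recommendation:

```python
import statistics

def build_baseline(samples: list[float]) -> tuple[float, float]:
    """Treat the current (assumed healthy) state as the baseline:
    record its mean and standard deviation."""
    return statistics.mean(samples), statistics.stdev(samples)

def should_alert(value: float, baseline: tuple[float, float],
                 threshold: float = 3.0) -> bool:
    """Alert when a new reading drifts more than `threshold`
    standard deviations from the baseline."""
    mean, stdev = baseline
    return abs(value - mean) > threshold * stdev

cpu_history = [22.0, 25.0, 21.0, 24.0, 23.0, 26.0]  # hypothetical healthy CPU %
baseline = build_baseline(cpu_history)
```

A reading of 95% CPU would trip the alert, while 24% would not; the open question the article raises, whether "normal" is actually "healthy," is exactly the judgment this simple approach cannot make on its own.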

Analytics helps operations and security master the basics

Visibility and control are the basics of cyber hygiene. Start with those. Know what’s in your environment and what’s running on those assets—not a month ago—right now. If your tools can’t provide that information, you need tools that can. You may have great hygiene on 50 percent of the machines you know about, but that won’t get the job done. Fresh data from every endpoint in the environment: that’s what delivers visibility and control.

Need help with cyber hygiene? Here’s a complete guide to get you started.


Cyber hygiene describes a set of practices, behaviors and tools designed to keep the entire IT environment healthy and at peak performance—and more importantly, it is a critical line of defense. Your cyber hygiene tools, as with all other IT tools, should fit the purpose for which they’re intended, but ideally should deliver the scale, speed, and simplicity you need to keep your IT environment clean.

What works best is dependent on the organization. A Fortune 100 company will have a much bigger IT group than a firm with 1,000 employees, hence the emphasis on scalability. Conversely, a smaller company with a lean IT team would prioritize simplicity.

It’s also important to classify your systems. Which ones are business critical? And which ones are external versus internal facing? External facing systems will be subject to greater scrutiny.

In many cases, budget or habit will prevent you from updating certain tools. If you’re stuck with a tool you can’t get rid of, you need to understand how your ideal workflow can be supported. Any platform or tool can be evaluated against the scale, speed and simplicity criteria.

An anecdote about scale, speed and simplicity

Imagine a large telecom company with millions of customers and a presence in nearly every business and consumer-facing digital service imaginable. If your organization is offering an IT tool or platform to customers like that, no question you’d love to get your foot in the door.

But look at it from the perspective of the telecom company. No tool they’ve ever purchased can handle the scale of their business. They’re always having to apply their existing tools to a subset of a subset of a subset of their environment. 

Any tool can look great when it’s dealing with 200 systems. But when you get to the enterprise size, those three pillars are even more important. The tool must work at the scale, speed, and simplicity that meets your needs.

The danger of complacency

With all the thought leadership put into IT operations and security best practices, why is it that many organizations are content with having only 75% visibility into their endpoint environment? Or 75% of endpoints under management? 

It’s because they’ve accepted failure as built into the tools and processes they’ve used over the years. If an organization wants to stick with the tools it has, it must:

- Realize their flaws and limitations
- Measure them on the scale, speed and simplicity criteria
- Determine the headcount required to do things properly

Organizations cannot remain attached to the way they’ve always done things. Technology changes too fast. The cliché of “future proof” is misleading. There’s no future proof. There’s only future adaptable.

Old data lies

To stay with the three criteria of strong cyber hygiene—scale, speed and simplicity—nothing is more critical than the currency of your data. Any software or practice that supports making decisions on old data should be suspect. 

Analytics help IT and security teams make better decisions. When they don’t, the reason is usually a lack of quality data. And the quality issue is often around data freshness. In IT, old data is almost never accurate. So decisions based on it are very likely to be wrong. Regardless of the data set, whether it’s about patching, compliance, device configuration, vulnerabilities or threats, old data is unreliable.

The old data problem is compounded by the number of systems a typical large organization relies on today. Many tools we still use were made for a decades-old IT environment that no longer exists. Nevertheless, today tools are available to give us real-time data for IT analytics.

IT hygiene and network data capacity

Whether you’re a 1,000-endpoint or 100,000-endpoint organization, streaming huge quantities of real-time data will require network bandwidth to carry it. You may not have the infrastructure to handle real-time data from every system you’re operating. So, focus on the basics. 

That means you need to understand and identify the core business services and applications that are most in need of fresh data. Those are the services that keep a business running. With that data, you can see what your IT operations and security posture look like for those systems. Prioritize. Use what you have wisely.

To simplify gathering the right data, streamline workflows

Once you’ve identified your core services, getting back to basics means streamlining workflows. Most organizations are in the mindset of “my tools dictate my workflow.” And that’s backward.

You want a high-performance network that has low vulnerability and strong threat response.  You want tools that can service your core systems, do efficient patching, perform antivirus protection and manage recovery should there be a breach. That’s what your tooling should support. Your workflows should help you weed out the tools that are not a good operational fit for your business.

Looking ahead

It’s clear the “new normal” will consist of remote, on-premises, and hybrid workforces. IT teams now have the experience to determine how to update and align processes and infrastructure without additional disruption.

Part of this process will center on evaluating and procuring tools that provide the scale, speed and simplicity necessary to manage operations in a hyperconverged world while:

- Maintaining superior IT hygiene as a foundational best practice
- Assessing risk posture to inform technology and operational decisions
- Strengthening cybersecurity programs without impeding worker productivity

Dive deeper into cyber hygiene with this eBook.


In the wake of the COVID-19 pandemic, airlines have struggled with bad weather, fewer air traffic controllers, and a shortage of pilots, all leading to an unprecedented number of cancelations in 2022. According to Reuters, more than 100,000 flights in the US were canceled between January and July, up 11% from pre-pandemic levels.

American Airlines, the world’s largest airline, is turning to data and analytics to minimize disruptions and streamline operations with the aim of giving travelers a smoother experience.

“Touchless, seamless, stressless. We’ve always had this vision, but it’s been hard to realize with the legacy systems and infrastructure we have,” says Maya Leibman, outgoing executive vice president and CIO of American Airlines. “As we modernize, we make more and more strides towards our vision. In the future, maybe airports will just be called Sky-Stops because, just like your average bus stop, they’ll require no more effort or stress than just simply showing up and getting on board.”

Leibman, who stepped down on Sept. 1 in favor of incoming Executive Vice President and Chief Digital and Information Officer (CDIO) Ganesh Jayaram, drove a major transformation of the 86-year-old airline to embrace data-driven decision-making.

“We have been on this transformation journey for a few years now, and prior to the pandemic we implemented a product mindset by restructuring our squads around our newly developed product taxonomy,” Leibman says. “This was a huge change for our teams. But because we had laid the foundation in 2019 for a product-oriented DevOps culture, we were able to pivot and reprioritize our work to quickly address pandemic-related customer issues, such as making it easier for customers to use travel credits from canceled flights.”

Leibman notes that American Airlines operates every hour of every day. It always has planes in the air around the world.

“We are an industry where our product is being consumed as it’s being produced,” she says. “The biggest challenge is turning that data into actionable insights that can be acted on easily and seamlessly in real-time in this 24-7-365 environment.”

Taking to the cloud

Luckily, Leibman has had an ace on her side. Poonam Mohan, vice president of corporate technology at American Airlines, oversees many of the airline’s AI and data analytics initiatives and has been fundamental to implementing Leibman’s vision.

Poonam Mohan, vice president of corporate technology, American Airlines

American Airlines

“We moved our major data platforms to the cloud and implemented data hubs for Customer and Operations,” Mohan says. “These systems allow real-time data from many of the massive moving parts of the world’s largest airline to be used not just for understanding how events affected us in the past, but rather allowing us to improve customer and operational outcomes as they happen.”

Mohan notes that her team simultaneously created DataOps frameworks that have improved the airline’s ability to ingest and consume new data sources in a matter of hours rather than weeks.

American Airlines has partnered with Microsoft to use Azure as its preferred cloud platform for its airline applications and key workloads. The partners are applying AI, machine learning, and data analytics to every aspect of the company’s operations, from reducing taxi time (thus saving thousands of gallons of jet fuel per year and giving connecting customers extra time to make their next flight) to putting real-time information at the fingertips of maintenance personnel, ground crews, pilots, flight attendants, and gate agents.

“When the pandemic started, all of a sudden we were canceling thousands of flights as travel bans were implemented. As a result, we were issuing a lot of refunds to customers who had their travel plans canceled because of the pandemic. To handle the incredible volume that our customer service agents were dealing with, we used machine learning and automated ingestion and processing to help with the volume and to get our customers their refunds processed faster,” Mohan explains by way of example.

When it comes to taxi times, an intelligent gating program deployed at the airline’s Dallas-Fort Worth (DFW) hub is providing real-time analysis of data points such as routing and runway information to automatically assign the nearest available gate to arriving aircraft, reducing the need for manual involvement from gate planners. The program is currently reducing taxi time by about 10 hours per day.

The airline is migrating and centralizing its strategic operational workloads — including its data warehouse and several legacy applications — into one Operations Hub on Azure, which it says will help it save costs, increase efficiency and scalability, and progress toward its sustainability goals.

“We are focused on automation in every function of the company,” Mohan says. “Robotic process automation has allowed us to automate a large number of repetitive manual processes in Finance, Loyalty, Revenue Management, Reservations, and HR, just to name a few. Combining automation with machine learning for natural language processing is very effective in helping solve many customer-facing issues.”

The importance of culture

Mohan also notes that the company has just scratched the surface of how digital twin and AI can help its operations and enhance the customer travel experience. Two of its more recent ML programs, started this spring, include HEAT (short for Hub Efficiency Analytics Tool) and the aforementioned intelligent gating program.

HEAT has already played a key role during severe thunderstorm events. It analyzes multiple data points, including weather conditions, load factors, customer connections, gate availability, and air traffic control to help American Airlines adjust departure and arrival times on hundreds of flights in a coordinated way.

“So far, we’ve been pleased with the results as it has reduced the number of cancelations during a weather event,” Mohan says. “While customers may be delayed, we are able to get them to their destinations as opposed to canceling their flights.”

As for the intelligent gating program at DFW, Mohan says that in March American Airlines was able to save nearly two minutes per flight in taxi time, which totals 10 hours per day.
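As a back-of-envelope check on those figures (assumed values, not American Airlines data), two minutes saved per flight only adds up to ten hours a day across roughly 300 arrivals:

```python
# Sanity-check the reported savings: minutes per flight vs. hours per day.
minutes_saved_per_flight = 2
hours_saved_per_day = 10

# 10 hours = 600 minutes; at 2 minutes per flight, that implies
# about 300 gated arrivals per day benefiting from the program.
implied_flights_per_day = hours_saved_per_day * 60 / minutes_saved_per_flight
```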

“We have reduced the instances where gate separation is more than 25 minutes by 50%,” she says. “This is directly related to the scenario we all have to face: Our flight actually arrives early but then we sit on the tarmac waiting for our gate to be cleared. Spreading the time out between when the previous flight leaves and when the next one arrives reduces that frustrating scenario.”

Mohan says the program has also helped the airline reduce the number of “close in” gate changes by 50%. These events are particularly annoying to customers who then have to hustle to a new location in the airport.

To drive all these changes throughout IT and the wider company has required building and maintaining the right culture. Leibman notes that she has an entire team dedicated to delivery transformation within the company. That team’s primary focus is to build the company’s culture around continuous learning and to engage business partners to adopt DevOps and product-based practices. Internally, they’ve developed an immersive coaching environment called “the Hangar,” to create space for product teams to work closely with coaches.

“We’ve also been building a developer experience platform, called Developer Runway, to create a frictionless experience for our developers to build and deliver applications,” Leibman says.

The platform enables teams to build and expose their services. Teams across the technology organization work directly with the Runway platform and the developer community is then able to leverage what is exposed on the platform to simplify their delivery experience.

“What is hard with a big company is that people like consistency, standards, and predictability, so processes get built around those things and it’s like a fence that prevents innovation,” Leibman says. “We can’t hire people and put them in a tiny pen because they’ll never achieve what we hired them for. As leaders, we need to have the judgment to understand that while we need standards and consistency, we can’t have it at the expense of people thinking their best thoughts, spreading their wings, and producing new, innovative approaches — not just to what we are doing but how we are doing it.”


What is a data engineer?

Data engineers design, build, and optimize systems for data collection, storage, access, and analytics at scale. They create data pipelines used by data scientists, data-centric applications, and other data consumers.

This IT role requires a significant set of technical skills, including deep knowledge of SQL database design and multiple programming languages. Data engineers also need communication skills to work across departments and to understand what business leaders want to gain from the company’s large datasets.

Data engineers are often responsible for building algorithms for accessing raw data, but to do this, they need to understand a company’s or client’s objectives, as aligning data strategies with business goals is important, especially when large and complex datasets and databases are involved.

Data engineers must also know how to optimize data retrieval and how to develop dashboards, reports, and other visualizations for stakeholders. Depending on the organization, data engineers may also be responsible for communicating data trends. Larger organizations often have multiple data analysts or scientists to help understand data, whereas smaller companies might rely on a data engineer to work in both roles.
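To make the pipeline idea concrete, here is a toy extract-transform-load flow in Python. The record format and the in-memory “warehouse” are hypothetical stand-ins for real sources and stores:

```python
import json

# Raw input as a data engineer might receive it: line-delimited JSON,
# including a malformed record the pipeline must tolerate.
RAW_EVENTS = [
    '{"user": "a", "amount": "19.99"}',
    '{"user": "b", "amount": "5.00"}',
    'not-json',  # bad record; in practice, route to a dead-letter queue
]

def extract(lines):
    """Parse raw lines, skipping records that fail to parse."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue

def transform(records):
    """Cast fields to the types downstream consumers expect."""
    for r in records:
        yield {"user": r["user"], "amount": float(r["amount"])}

def load(records, store):
    """Append cleaned records to the analytics store."""
    store.extend(records)
    return store

warehouse = load(transform(extract(RAW_EVENTS)), [])
```

Real pipelines add scheduling, monitoring, and scale, but the shape is the same: ingest, validate, reshape, and land the data where analysts and applications can use it.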

The data engineer role

According to Dataquest, there are three main roles that data engineers can fall into. These include:

- Generalist: Data engineers who typically work for small teams or small companies wear many hats as one of the few “data-focused” people in the company. These generalists are often responsible for every step of the data process, from managing data to analyzing it. Dataquest says this is a good role for anyone looking to transition from data science to data engineering, as smaller businesses often don’t need to engineer for scale.
- Pipeline-centric: Often found in midsize companies, pipeline-centric data engineers work alongside data scientists to help make use of the data they collect. Pipeline-centric data engineers need “in-depth knowledge of distributed systems and computer science,” according to Dataquest.
- Database-centric: In larger organizations, where managing the flow of data is a full-time job, data engineers focus on analytics databases. Database-centric data engineers work with data warehouses across multiple databases and are responsible for developing table schemas.

Data engineer job description

Data engineers are responsible for managing and organizing data, while also keeping an eye out for trends or inconsistencies that will impact business goals. It’s a highly technical position, requiring experience and skills in areas such as programming, mathematics, and computer science. But data engineers also need soft skills to communicate data trends to others in the organization and to help the business make use of the data it collects. Some of the most common responsibilities for a data engineer include:

- Develop, construct, test, and maintain architectures
- Align architecture with business requirements
- Data acquisition
- Develop data set processes
- Use programming languages and tools
- Identify ways to improve data reliability, efficiency, and quality
- Conduct research for industry and business questions
- Use large data sets to address business issues
- Deploy sophisticated analytics programs, machine learning, and statistical methods
- Prepare data for predictive and prescriptive modeling
- Find hidden patterns using data
- Use data to discover tasks that can be automated
- Deliver updates to stakeholders based on analytics

Data engineer vs. data scientist

Data engineers and data scientists often work closely together but serve very different functions. Data engineers are responsible for developing, testing, and maintaining data pipelines and data architectures. Data scientists use data science to discover insights from massive amounts of structured and unstructured data to shape or meet specific business needs and goals.

Data engineer vs. data architect

The data engineer and data architect roles are closely related and frequently confused. Data architects are senior visionaries who translate business requirements into technology requirements and define data standards and principles. They visualize and design an organization’s enterprise data management framework. Data engineers work with the data architect to create that vision, building and maintaining the data systems specified by the data architect’s data framework.

Data engineer salary

According to Glassdoor, the average salary for a data engineer is $117,671 per year, with a reported salary range of $87,000 to $174,000 depending on skills, experience, and location. Senior data engineers earn an average salary of $134,244 per year, while lead data engineers earn an average salary of $139,907 per year.

Here’s what some of the top tech companies pay their data engineers, on average, according to Glassdoor:

- Amazon: $130,787
- Apple: $168,046
- Capital One: $124,905
- Hewlett-Packard: $94,142
- Meta: $166,886
- IBM: $100,936
- Target: $183,819

Data engineer skills

The skills on your resume might impact your salary negotiations — in some cases by more than 15%. According to data from PayScale, the following data engineering skills are associated with a significant boost in reported salaries:

- Ruby: +32%
- Oracle: +26%
- MapReduce: +26%
- JavaScript: +24%
- Amazon Redshift: +21%
- Apache Cassandra: +18%
- Apache Sqoop: +12%
- Data Quality: +11%
- Apache HBase: +10%
- Statistical Analysis: +10%

Data engineer certifications

Only a few certifications specific to data engineering are available, though there are plenty of data science and big data certifications to pick from if you want to expand beyond data engineering skills.

Still, to prove your merit as a data engineer, any one of these certifications will look great on your resume:

- Amazon Web Services (AWS) Certified Data Analytics – Specialty
- Cloudera Data Platform Generalist
- Data Science Council of America (DASCA) Associate Big Data Engineer
- Google Professional Data Engineer

For more on these and other related certifications, see “Top 8 data engineer and data architect certifications.”

Becoming a data engineer

Data engineers typically have a background in computer science, engineering, applied mathematics, or any other related IT field. Because the role requires heavy technical knowledge, aspiring data engineers might find that a bootcamp or certification alone won’t cut it against the competition. Most data engineering jobs require at least a relevant bachelor’s degree in a related discipline, according to PayScale.

You’ll need experience with multiple programming languages, including Python and Java, and knowledge of SQL database design. If you already have a background in IT or a related discipline such as mathematics or analytics, a bootcamp or certification can help tailor your resume to data engineering positions. For example, if you’ve worked in IT but haven’t held a specific data job, you could enroll in a data science bootcamp or get a data engineering certification to prove you have the skills on top of your other IT knowledge.

If you don’t have a background in tech or IT, you might need to enroll in an in-depth program to demonstrate your proficiency in the field or invest in an undergraduate program. If you have an undergraduate degree, but it’s not in a relevant field, you can always look into master’s programs in data analytics and data engineering.

Ultimately, it will depend on your situation and the types of jobs you have your eye on. Take time to browse job openings to see what companies are looking for, and that will give you a better idea of how your background can fit into that role.


For enterprises looking to wrest the most value from their data, especially in real-time, the “data lakehouse” concept is starting to catch on.

The idea behind the data lakehouse is to merge together the best of what data lakes and data warehouses have to offer, says Gartner analyst Adam Ronthal.

Data warehouses, for their part, enable companies to store large amounts of structured data with well-defined schemas. They are designed to support a large number of simultaneous queries and to deliver the results quickly to many simultaneous users.

Data lakes, on the other hand, enable companies to collect raw, unstructured data in many formats for data analysts to hunt through. These vast pools of data have grown in prominence of late thanks to the flexibility they provide enterprises to store vast streams of data without first having to define the purpose of doing so.  
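The contrast is often described as schema-on-write (the warehouse) versus schema-on-read (the lake). A minimal sketch with hypothetical records, using in-memory lists in place of real storage:

```python
import json

# Warehouse (schema-on-write): the schema is enforced before storage,
# so every stored record is guaranteed to match it.
def warehouse_insert(table: list, record: dict, schema: dict) -> None:
    for field, ftype in schema.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError(f"schema violation on field {field!r}")
    table.append(record)

# Lake (schema-on-read): raw payloads land as-is, with no upfront
# definition of purpose; structure is applied only when read.
def lake_append(lake: list, raw_payload: str) -> None:
    lake.append(raw_payload)

def lake_read(lake: list) -> list:
    return [json.loads(p) for p in lake]
```

The warehouse path pays the validation cost at ingest and gets fast, predictable queries in return; the lake path defers that cost, which is what makes it flexible for data whose use isn't yet known. A lakehouse aims to offer both behaviors over one store.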

The market for these two types of big data repositories is “converging in the middle, at the lakehouse concept,” Ronthal says, with established data warehouse vendors adding the ability to manage unstructured data, and data lake vendors adding structure to their offerings.

For example, on AWS, enterprises can now pair Amazon Redshift, a data warehouse, with Amazon Redshift Spectrum, which enables Redshift to reach into Amazon’s unstructured S3 data lakes. Meanwhile, Snowflake can now support unstructured data with external tables, Ronthal says.

When companies have separate lakes and warehouses, and data needs to move from one to the other, it introduces latency and costs time and money, Ronthal adds. Combining the two in one platform reduces effort and data movement, thereby accelerating the pace of uncovering data insights.

And, depending on the platform, a data lakehouse can also offer other features, such as support for data streaming, machine learning, and collaboration, giving enterprises additional tools for making the most of their data.

Here is a look at the benefits of data lakehouses and how several leading organizations are making good on their promise as part of their analytics strategies.

Enhancing the video game experience

Sega Europe’s use of data repositories in support of its video games has evolved considerably in the past several years.

In 2016, the company began using the Amazon Redshift data warehouse to collect event data from its Football Manager video game. At first this event data consisted simply of players opening and closing games. The company had two staff members looking into this data, which streamed into Redshift at a rate of ten events per second.

“But there was so much more data we could be collecting,” says Felix Baker, the company’s head of data services. “Like what teams people were managing, or how much money they were spending.”

By 2017, Sega Europe was collecting 800 events a second, with five staff working on the platform. By 2020, the company’s system was capturing 7,000 events per second from a portfolio of 30 Sega games, with 25 staff involved.

At that point, the system was starting to hit its limits, Baker says. Because of the data structures needed for inclusion in the data warehouse, data was coming in batches and it took half an hour to an hour to analyze it, he says.

“We wanted to analyze the data in real-time,” he adds, but this functionality wasn’t available in Redshift at the time.

After performing proofs of concept with three platforms — Redshift, Snowflake, and Databricks — Sega Europe settled on using Databricks, one of the pioneers of the data lakehouse industry.

“Databricks offered an out-of-the-box managed services solution that did what we needed without us having to develop anything,” he says. That included not just real-time streaming but machine learning and collaborative workspaces.

In addition, the data lakehouse architecture enabled Sega Europe to ingest unstructured data, such as social media feeds, as well.

“With Redshift, we had to concentrate on schema design,” Baker says. “Every table had to have a set structure before we could start ingesting data. That made it clunky in many ways. With the data lakehouse, it’s been easier.”

Sega Europe’s Databricks platform went into production in the summer of 2020. Two or three consultants from Databricks worked alongside six or seven people from Sega Europe to get the streaming solution up and running, matching what the company had in place previously with Redshift. The new lakehouse is built in three layers, the base layer of which is just one large table that everything gets dumped into.

“If developers create new events, they don’t have to tell us to expect new fields — they can literally send us everything,” Baker says. “And we can then build jobs on top of that layer and stream out the data we acquired.”
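A simplified, in-memory sketch of that layered approach (the production system uses Spark Structured Streaming on Databricks; the event fields below are hypothetical):

```python
# Base layer: one wide table where every raw event lands unchanged.
base_layer: list[dict] = []

def ingest(event: dict) -> None:
    """Developers can send any fields; nothing is rejected at ingest."""
    base_layer.append(event)

def derive_stream(game: str) -> list[dict]:
    """A downstream job built on top of the base layer: filter and
    shape events for one consumer."""
    return [
        {"player": e["player"], "event": e["type"]}
        for e in base_layer
        if e.get("game") == game and "player" in e and "type" in e
    ]

ingest({"game": "Humankind", "player": "p1", "type": "battle_won"})
ingest({"game": "Football Manager", "player": "p2", "type": "match_start"})
ingest({"game": "Humankind", "player": "p1", "type": "level_complete",
        "new_field": "ok"})  # unexpected fields are simply kept
```

The key property is the one Baker describes: new fields flow into the base layer without any schema change, and only the derived jobs need to know which fields they care about.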

The transition to Databricks, which is built on top of Apache Spark, was smooth for Sega Europe, thanks to prior experience with the open-source engine for large-scale data processing.

“Within our team, we had quite a bit of expertise already with Apache Spark,” Baker says. “That meant that we could set up streams very quickly based on the skills we already had.”

Today, the company processes 25,000 events per second, with more than 30 data staffers and 100 game titles in the system. Instead of taking 30 minutes to an hour to process, the data is ready within a minute.

“The volume of data collected has grown exponentially,” Baker says. In fact, after the pandemic hit, usage of some games doubled.

The new platform has also opened up new possibilities. For example, Sega Europe’s partnership with Twitch, a streaming platform where people watch other people play video games, has been enhanced to include a data stream for its Humankind game, so that viewers can get a player’s history, including the levels they completed, the battles they won, and the civilizations they conquered.

“The overlay on Twitch is updating as they play the game,” Baker says. “That is a use case that we wouldn’t have been able to achieve before Databricks.”

The company has also begun leveraging the lakehouse’s machine learning capabilities. For example, Sega Europe data scientists have designed models to figure out why players stop playing games and to make suggestions for how to increase retention.

“The speed at which these models can be built has been amazing, really,” Baker says. “They’re just cranking out these models, it seems, every couple of weeks.”
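
A churn model of the kind Joshi's counterparts at Sega Europe describe can be sketched, at its simplest, as a logistic regression over engagement features. The features, data, and hyperparameters below are invented for illustration, and the plain-Python gradient descent stands in for what a team would normally do with a library such as scikit-learn or Spark MLlib.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, lr=0.1, epochs=2000):
    """Plain-Python logistic regression fit by stochastic gradient descent."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Toy features per player: (days since last session, sessions in past week);
# label 1 = churned. Invented numbers, for illustration only.
data = [(1, 6), (2, 5), (10, 1), (14, 0), (3, 4), (12, 1)]
labels = [0, 0, 1, 1, 0, 1]
w, b = train(data, labels)

def churn_probability(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

active = churn_probability((2, 5))   # recently active player
lapsed = churn_probability((13, 0))  # lapsed player
print(f"active={active:.2f} lapsed={lapsed:.2f}")
```

Scoring players this way lets a retention team target offers at those with the highest churn probability before they stop playing.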

The business benefits of data lakehouses

The flexibility and catch-all nature of data lakehouses are fast proving attractive to organizations looking to capitalize on their data assets, especially as part of digital initiatives that hinge on quick access to a wide array of data.

“The primary value driver is the cost efficiencies enabled by providing a source for all of an organization’s structured and unstructured data,” says Steven Karan, vice president and head of insights and data at consulting company Capgemini Canada, which has helped implement data lakehouses at leading organizations in financial services, telecom, and retail.

Moreover, data lakehouses store data in such a way that it is readily available for use by a wide array of technologies, from traditional business intelligence and reporting systems to machine learning and artificial intelligence, Karan adds. “Other benefits include reduced data redundancy, simplified IT operations, a simplified data schema to manage, and easier-to-enable data governance.”

One particularly valuable use case for data lakehouses is in helping companies get value from data previously trapped in legacy or siloed systems. For example, one Capgemini enterprise customer, which had grown through acquisitions over a decade, couldn’t access valuable data related to resellers of their products.

“By migrating the siloed data from legacy data warehouses into a centralized data lakehouse, the client was able to understand at an enterprise level which of their reseller partners were most effective, and how changes such as referral programs and structures drove revenue,” he says.

Putting data into a single data lakehouse makes it easier to manage, says Meera Viswanathan, senior product manager at Fivetran, a data pipeline company. Companies that have traditionally used both data lakes and data warehouses often have separate teams to manage them, making it confusing for the business units that need to consume the data, she says.

In addition to Databricks, Amazon Redshift Spectrum, and Snowflake, other vendors in the data lakehouse space include Microsoft, with its lakehouse platform Azure Synapse, and Google, with its BigLake on Google Cloud Platform, as well as data lakehouse platform Starburst.

Accelerating data processing for better health outcomes

One company capitalizing on these and other benefits of data lakehouses is life sciences analytics and services company IQVIA.

Before the pandemic, pharmaceutical companies running drug trials used to send employees to hospitals and other sites to collect data about things such as adverse effects, says Wendy Morahan, senior director of clinical data analytics at IQVIA. “That is how they make sure the patient is safe.”

Once the pandemic hit and sites were locked down, however, pharmaceutical companies had to scramble to figure out how to get the data they needed — and to get it in a way that was compliant with regulations and fast enough to enable them to spot potential problems as quickly as possible.

Moreover, with the rise of wearable devices in healthcare, “you’re now collecting hundreds of thousands of data points,” Morahan adds.

IQVIA has been building technology to do just that for the past 20 years, says her colleague Suhas Joshi, also a senior director of clinical data analytics at the company. About four years ago, the company began using data lakehouses for this purpose, including Databricks and the data lakehouse functionality now available with Snowflake.

“With Snowflake and Databricks you have the ability to store the raw data, in any format,” Joshi says. “We get a lot of images and audio. We get all this data and use it for monitoring. In the past, it would have involved manual steps, going to different systems. It would have taken time and effort. Today, we’re able to do it all in one single platform.”

The data collection process is also faster, he says. In the past, the company would have to write code to acquire data. Now, the data can even be analyzed without having to be processed first to fit a database format.

Take the example of a patient in a drug trial who gets a lab result that shows she’s pregnant, but the pregnancy form wasn’t filled out properly, and the drug is harmful during pregnancy. Or a patient who has an adverse event and needs blood pressure medication, but the medication was not prescribed. Not catching these problems quickly can have drastic consequences. “You might be risking a patient’s safety,” says Joshi.



What is business analytics?

Business analytics is the practical application of statistical analysis and technologies on business data to identify and anticipate trends and predict business outcomes. Research firm Gartner defines business analytics as “solutions used to build analysis models and simulations to create scenarios, understand realities, and predict future states.”

While quantitative analysis, operational analysis, and data visualizations are key components of business analytics, the goal is to use the insights gained to shape business decisions. The discipline is a key facet of the business analyst role.

Wake Forest University School of Business notes that key business analytics activities include:

- Identifying new patterns and relationships with data mining
- Using quantitative and statistical analysis to design business models
- Conducting A/B and multivariable testing based on findings
- Forecasting future business needs, performance, and industry trends with predictive modeling
- Communicating findings to colleagues, management, and customers
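
The A/B testing activity in the list above usually comes down to asking whether an observed difference between two variants is larger than chance would explain. Here is a minimal sketch using a two-proportion z-test in plain Python; the visitor and conversion counts are invented for illustration.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic for the difference between two conversion rates,
    using the pooled rate to estimate the standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Variant A: 200 conversions out of 5,000 visitors (4.0%).
# Variant B: 260 conversions out of 5,000 visitors (5.2%).
z = two_proportion_z(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(round(z, 2))  # |z| > 1.96 means significant at roughly the 5% level
```

In practice an analyst would also report the confidence interval and check the test had enough traffic (statistical power) before declaring a winner.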


What are the benefits of business analytics?

Business analytics can help you improve operational efficiency, better understand your customers, project future outcomes, glean insights to aid in decision-making, measure performance, drive growth, discover hidden trends, generate leads, and scale your business in the right direction, according to digital skills training company Simplilearn.


What is the difference between business analytics and data analytics?

Business analytics is a subset of data analytics. Data analytics is used across disciplines to find trends and solve problems using data mining, data cleansing, data transformation, data modeling, and more. Business analytics also involves data mining, statistical analysis, predictive modeling, and the like, but is focused on driving better business decisions.


What is the difference between business analytics and business intelligence?

Business analytics and business intelligence (BI) serve similar purposes and are often used as interchangeable terms, but BI can be considered a subset of business analytics. BI focuses on descriptive analytics, data collection, data storage, knowledge management, and data analysis to evaluate past business data and better understand currently known information. Whereas BI studies historical data to guide business decision-making, business analytics is about looking forward. It uses data mining, data modeling, and machine learning to answer “why” something happened and predict what might happen in the future.

Business analytics techniques

According to Harvard Business School Online, there are three primary types of business analytics:

- Descriptive analytics: What is happening in your business right now? Descriptive analytics uses historical and current data to describe the organization’s present state by identifying trends and patterns. This is the purview of BI.
- Predictive analytics: What is likely to happen in the future? Predictive analytics is the use of techniques such as statistical modeling, forecasting, and machine learning to make predictions about future outcomes.
- Prescriptive analytics: What do we need to do? Prescriptive analytics is the application of testing and other techniques to recommend specific solutions that will deliver desired business outcomes.

Simplilearn adds a fourth technique:

Diagnostic analytics: Why is it happening? Diagnostic analytics uses analytics techniques to discover the factors or reasons for past or current performance.

Examples of business analytics

San Jose Sharks build fan engagement

Starting in 2019, the San Jose Sharks began integrating its operational data, marketing systems, and ticket sales with front-end, fan-facing experiences and promotions to enable the NHL hockey team to capture and quantify the needs and preferences of its fan segments: season ticket holders, occasional visitors, and newcomers. It uses the insights to power targeted marketing campaigns based on actual purchasing behavior and experience data. When implementing the system, Neda Tabatabaie, vice president of business analytics and technology for the San Jose Sharks, said she anticipated a 12% increase in ticket revenue, a 20% projected reduction in season ticket holder churn, and a 7% increase in campaign effectiveness (measured in click-throughs).

GSK finds inventory reduction opportunities

As part of a program designed to accelerate its use of enterprise data and analytics, pharmaceutical titan GlaxoSmithKline (GSK) designed a set of analytics tools focused on inventory reduction opportunities across the company’s supply chain. The suite of tools included a digital value stream map, safety stock optimizer, inventory corridor report, and planning cockpit.

Shankar Jegasothy, director of supply chain analytics at GSK, says the tools helped GSK gain better visibility into its end-to-end supply chain and then use predictive and prescriptive analytics to guide decisions around inventory and planning.

Kaiser Permanente streamlines operations

Healthcare consortium Kaiser Permanente uses analytics to reduce patient waiting times and the amount of time hospital leaders spend manually preparing data for operational activities.

In 2018, the consortium’s IT function launched Operations Watch List (OWL), a mobile app that provides a comprehensive, near real-time view of key hospital quality, safety, and throughput metrics (including hospital census, bed demand and availability, and patient discharges).

In its first year, OWL reduced patient wait time for admission to the emergency department by an average of 27 minutes per patient. Surveys also showed hospital managers reduced the amount of time they spent manually preparing data for operational activities by an average of 323 minutes per month.

Business analytics tools

Business analytics professionals need to be fluent in a variety of tools and programming languages. According to the Harvard Business Analytics program, the top tools for business analytics professionals are:

- SQL: SQL is the lingua franca of data analysis. Business analytics professionals use SQL queries to extract and analyze data from transactional databases and to develop visualizations.
- Statistical languages: Business analytics professionals frequently use R for statistical analysis and Python for general programming.
- Statistical software: Business analytics professionals frequently use software including SPSS, SAS, Sage, Mathematica, and Excel to manage and analyze data.
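
The extract-and-aggregate SQL described above can be tried end to end with Python's built-in sqlite3 module; the table, columns, and figures below are invented for illustration.

```python
import sqlite3

# A hypothetical transactions table, populated in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("a", "EU", 120.0), ("b", "EU", 90.0), ("c", "NA", 200.0)],
)

# The kind of aggregate query that typically feeds a revenue-by-region chart.
rows = conn.execute(
    "SELECT region, SUM(amount) AS revenue "
    "FROM transactions GROUP BY region ORDER BY revenue DESC"
).fetchall()
print(rows)  # [('EU', 210.0), ('NA', 200.0)]
```

The same GROUP BY pattern scales from this toy example to the warehouse queries behind most BI dashboards.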

Business analytics dashboard components

According to analytics platform company OmniSci, the main components of a typical business analytics dashboard include:

- Data aggregation: Before it can be analyzed, data must be gathered, organized, and filtered.
- Data mining: Data mining sorts through large datasets using databases, statistics, and machine learning to identify trends and establish relationships.
- Association and sequence identification: Predictable actions that are performed in association with other actions or sequentially must be identified.
- Text mining: Text mining is used to explore and organize large, unstructured datasets for qualitative and quantitative analysis.
- Forecasting: Forecasting analyzes historical data from a specific period to make informed estimates predictive of future events or behaviors.
- Predictive analytics: Predictive business analytics use a variety of statistical techniques to create predictive models that extract information from datasets, identify patterns, and provide a predictive score for an array of organizational outcomes.
- Optimization: Once trends have been identified and predictions made, simulation techniques can be used to test best-case scenarios.
- Data visualization: Data visualization provides visual representations, such as charts and graphs, for easy and quick data analysis.
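
The forecasting component above can be as simple as a trailing moving average used as a naive estimate of the next period. The sales figures here are invented for illustration; real forecasts would account for trend and seasonality.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

# Six months of hypothetical sales figures.
monthly_sales = [100, 110, 120, 130, 140, 150]
print(moving_average_forecast(monthly_sales))  # (130 + 140 + 150) / 3 = 140.0
```

Even this naive baseline is useful in practice: more sophisticated models earn their place on a dashboard only if they beat it.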

Business analytics salaries

Here are some of the most popular job titles related to business analytics and the average salary for each position, according to data from PayScale:

- Analytics manager: $71K-$132K
- Business analyst: $48K-$84K
- Business analyst, IT: $51K-$100K
- Business intelligence analyst: $52K-$98K
- Data analyst: $46K-$88K
- Market research analyst: $42K-$77K
- Quantitative analyst: $61K-$131K
- Research analyst, operations: $47K-$115K
- Senior business analyst: $65K-$117K
- Statistician: $56K-$120K