For enterprises looking to extract the most value from their data, especially in real time, the “data lakehouse” concept is starting to catch on.

The idea behind the data lakehouse is to merge together the best of what data lakes and data warehouses have to offer, says Gartner analyst Adam Ronthal.

Data warehouses, for their part, enable companies to store large amounts of structured data with well-defined schemas. They are designed to support a large number of simultaneous queries and to deliver results quickly to many concurrent users.

Data lakes, on the other hand, enable companies to collect raw, unstructured data in many formats for data analysts to hunt through. These vast pools of data have grown in prominence of late thanks to the flexibility they provide enterprises to store vast streams of data without first having to define the purpose of doing so.  

The market for these two types of big data repositories is “converging in the middle, at the lakehouse concept,” Ronthal says, with established data warehouse vendors adding the ability to manage unstructured data, and data lake vendors adding structure to their offerings.

For example, on AWS, enterprises can now pair Amazon Redshift, a data warehouse, with Amazon Redshift Spectrum, which enables Redshift to reach into Amazon’s unstructured S3 data lakes. Meanwhile, Snowflake, which began as a cloud data warehouse, can now support unstructured data with external tables, Ronthal says.

When companies have separate lakes and warehouses, and data needs to move from one to the other, it introduces latency and costs time and money, Ronthal adds. Combining the two in one platform reduces effort and data movement, thereby accelerating the pace of uncovering data insights.

And, depending on the platform, a data lakehouse can also offer other features, such as support for data streaming, machine learning, and collaboration, giving enterprises additional tools for making the most of their data.

Here is a look at the benefits of data lakehouses and how several leading organizations are making good on their promise as part of their analytics strategies.

Enhancing the video game experience

Sega Europe’s use of data repositories in support of its video games has evolved considerably in the past several years.

In 2016, the company began using the Amazon Redshift data warehouse to collect event data from its Football Manager video game. At first this event data consisted simply of players opening and closing games. The company had two staff members looking into this data, which streamed into Redshift at a rate of ten events per second.

“But there was so much more data we could be collecting,” says Felix Baker, the company’s head of data services. “Like what teams people were managing, or how much money they were spending.”

By 2017, Sega Europe was collecting 800 events a second, with five staff working on the platform. By 2020, the company’s system was capturing 7,000 events per second from a portfolio of 30 Sega games, with 25 staff involved.

At that point, the system was starting to hit its limits, Baker says. Because of the data structures needed for inclusion in the data warehouse, data was coming in batches and it took half an hour to an hour to analyze it, he says.

“We wanted to analyze the data in real-time,” he adds, but this functionality wasn’t available in Redshift at the time.

After performing proofs of concept with three platforms — Redshift, Snowflake, and Databricks — Sega Europe settled on using Databricks, one of the pioneers of the data lakehouse industry.

“Databricks offered an out-of-the-box managed services solution that did what we needed without us having to develop anything,” he says. That included not just real-time streaming but machine learning and collaborative workspaces.

In addition, the data lakehouse architecture enabled Sega Europe to ingest unstructured data, such as social media feeds, as well.

“With Redshift, we had to concentrate on schema design,” Baker says. “Every table had to have a set structure before we could start ingesting data. That made it clunky in many ways. With the data lakehouse, it’s been easier.”

Sega Europe’s Databricks platform went into production in the summer of 2020. Two or three consultants from Databricks worked alongside six or seven people from Sega Europe to get the streaming solution up and running, matching what the company had in place previously with Redshift. The new lakehouse is built in three layers, the base layer of which is just one large table that everything gets dumped into.

“If developers create new events, they don’t have to tell us to expect new fields — they can literally send us everything,” Baker says. “And we can then build jobs on top of that layer and stream out the data we acquired.”
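The base-layer pattern Baker describes, in which every event lands in one permissive table and structure is derived by downstream jobs, is often called schema-on-read. Here is a minimal sketch in stdlib Python; the event names and fields are invented for illustration and are not Sega’s actual telemetry:

```python
import json

# Raw base layer: every event lands as-is, whatever fields it carries.
# (Event names and fields are hypothetical, for illustration only.)
raw_events = [
    json.dumps({"event": "session_start", "game": "Football Manager"}),
    json.dumps({"event": "match_won", "game": "Humankind", "level": 3}),
    json.dumps({"event": "purchase", "game": "Football Manager", "amount": 4.99}),
]

def base_layer(lines):
    """Parse raw events without enforcing any schema up front."""
    return [json.loads(line) for line in lines]

def derive_view(events, field):
    """A downstream 'job' that keeps only events carrying a given field."""
    return [e for e in events if field in e]

events = base_layer(raw_events)
purchases = derive_view(events, "amount")
print(len(purchases))  # 1: only the purchase event carries "amount"
```

Because new fields simply pass through the base layer, developers can add them without any schema change; only the derived views need to know which fields they care about.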

The transition to Databricks, which is built on top of Apache Spark, was smooth for Sega Europe, thanks to prior experience with the open-source engine for large-scale data processing.

“Within our team, we had quite a bit of expertise already with Apache Spark,” Baker says. “That meant that we could set up streams very quickly based on the skills we already had.”

Today, the company processes 25,000 events per second, with more than 30 data staffers and 100 game titles in the system. Instead of taking 30 minutes to an hour to process, the data is ready within a minute.

“The volume of data collected has grown exponentially,” Baker says. In fact, after the pandemic hit, usage of some games doubled.

The new platform has also opened up new possibilities. For example, Sega Europe’s partnership with Twitch, a streaming platform where people watch other people play video games, has been enhanced to include a data stream for its Humankind game, so that viewers can get a player’s history, including the levels they completed, the battles they won, and the civilizations they conquered.

“The overlay on Twitch is updating as they play the game,” Baker says. “That is a use case that we wouldn’t have been able to achieve before Databricks.”

The company has also begun leveraging the lakehouse’s machine learning capabilities. For example, Sega Europe data scientists have designed models to figure out why players stop playing games and to make suggestions for how to increase retention.

“The speed at which these models can be built has been amazing, really,” Baker says. “They’re just cranking out these models, it seems, every couple of weeks.”

The business benefits of data lakehouses

The flexibility and catch-all nature of data lakehouses are fast proving attractive to organizations looking to capitalize on their data assets, especially as part of digital initiatives that hinge on quick access to a wide array of data.

“The primary value driver is the cost efficiencies enabled by providing a source for all of an organization’s structured and unstructured data,” says Steven Karan, vice president and head of insights and data at consulting company Capgemini Canada, which has helped implement data lakehouses at leading organizations in financial services, telecom, and retail.

Moreover, data lakehouses store data in such a way that it is readily available for use by a wide array of technologies, from traditional business intelligence and reporting systems to machine learning and artificial intelligence, Karan adds. “Other benefits include reduced data redundancy, simplified IT operations, a simplified data schema to manage, and easier to enable data governance.”

One particularly valuable use case for data lakehouses is in helping companies get value from data previously trapped in legacy or siloed systems. For example, one Capgemini enterprise customer, which had grown through acquisitions over a decade, couldn’t access valuable data related to resellers of their products.

“By migrating the siloed data from legacy data warehouses into a centralized data lakehouse, the client was able to understand at an enterprise level which of their reseller partners were most effective, and how changes such as referral programs and structures drove revenue,” he says.

Putting data into a single data lakehouse makes it easier to manage, says Meera Viswanathan, senior product manager at Fivetran, a data pipeline company. Companies that have traditionally used both data lakes and data warehouses often have separate teams to manage them, making it confusing for the business units that need to consume the data, she says.

In addition to Databricks, Amazon Redshift Spectrum, and Snowflake, other vendors in the data lakehouse space include Microsoft, with its lakehouse platform Azure Synapse, and Google, with its BigLake on Google Cloud Platform, as well as data lakehouse platform Starburst.

Accelerating data processing for better health outcomes

One company capitalizing on these and other benefits of data lakehouses is life sciences analytics and services company IQVIA.

Before the pandemic, pharmaceutical companies running drug trials used to send employees to hospitals and other sites to collect data about things such as adverse effects, says Wendy Morahan, senior director of clinical data analytics at IQVIA. “That is how they make sure the patient is safe.”

Once the pandemic hit and sites were locked down, however, pharmaceutical companies had to scramble to figure out how to get the data they needed — and to get it in a way that was compliant with regulations and fast enough to enable them to spot potential problems as quickly as possible.

Moreover, with the rise of wearable devices in healthcare, “you’re now collecting hundreds of thousands of data points,” Morahan adds.

IQVIA has been building technology to do just that for the past 20 years, says her colleague Suhas Joshi, also a senior director of clinical data analytics at the company. About four years ago, the company began using data lakehouses for this purpose, including Databricks and the data lakehouse functionality now available with Snowflake.

“With Snowflake and Databricks you have the ability to store the raw data, in any format,” Joshi says. “We get a lot of images and audio. We get all this data and use it for monitoring. In the past, it would have involved manual steps, going to different systems. It would have taken time and effort. Today, we’re able to do it all in one single platform.”

The data collection process is also faster, he says. In the past, the company would have to write code to acquire data. Now, the data can even be analyzed without having to be processed first to fit a database format.

Take the example of a patient in a drug trial who gets a lab result that shows she’s pregnant, but the pregnancy form wasn’t filled out properly, and the drug is harmful during pregnancy. Or a patient who has an adverse event and needs blood pressure medication, but the medication was not prescribed. Not catching these problems quickly can have drastic consequences. “You might be risking a patient’s safety,” says Joshi.
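A monitoring check of the kind Joshi describes boils down to cross-referencing two sources that should agree. The sketch below flags trial participants whose lab results contradict their submitted forms; the patient IDs, field names, and the rule itself are hypothetical simplifications, not IQVIA’s actual logic:

```python
# Hypothetical, heavily simplified trial records for illustration only.
lab_results = {
    "P001": {"pregnancy_test": "positive"},
    "P002": {"pregnancy_test": "negative"},
}
forms = {
    "P001": {"pregnancy_form_complete": False},  # form not filled out properly
    "P002": {"pregnancy_form_complete": True},
}

def flag_safety_issues(lab_results, forms):
    """Flag patients with a positive lab result but no completed form."""
    flags = []
    for patient_id, labs in lab_results.items():
        form = forms.get(patient_id, {})
        if (labs.get("pregnancy_test") == "positive"
                and not form.get("pregnancy_form_complete")):
            flags.append(patient_id)
    return flags

print(flag_safety_issues(lab_results, forms))  # ['P001']
```

Running such checks continuously against one platform, rather than manually across separate systems, is what makes near real-time safety monitoring feasible.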


1. What is business analytics?

Business analytics is the practical application of statistical analysis and technologies on business data to identify and anticipate trends and predict business outcomes. Research firm Gartner defines business analytics as “solutions used to build analysis models and simulations to create scenarios, understand realities, and predict future states.”

While quantitative analysis, operational analysis, and data visualizations are key components of business analytics, the goal is to use the insights gained to shape business decisions. The discipline is a key facet of the business analyst role.

Wake Forest University School of Business notes that key business analytics activities include:

- Identifying new patterns and relationships with data mining
- Using quantitative and statistical analysis to design business models
- Conducting A/B and multivariable testing based on findings
- Forecasting future business needs, performance, and industry trends with predictive modeling
- Communicating findings to colleagues, management, and customers
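To make one of these activities concrete: the A/B testing mentioned above usually reduces to asking whether two conversion rates differ by more than chance. Here is a minimal two-proportion z-test in stdlib Python; the traffic and conversion numbers are invented:

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Variant B converts 120/1000 vs. A's 100/1000: is the lift significant?
z, p = two_proportion_z_test(100, 1000, 120, 1000)
print(round(z, 2), round(p, 3))
```

On this sample, variant B’s lift (12% vs. 10%) yields a p-value above 0.05, so the difference would not be called statistically significant; real experiments would also account for sample-size planning and multiple comparisons.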

2. What are the benefits of business analytics?

Business analytics can help you improve operational efficiency, better understand your customers, project future outcomes, glean insights to aid in decision-making, measure performance, drive growth, discover hidden trends, generate leads, and scale your business in the right direction, according to digital skills training company Simplilearn.

3. What is the difference between business analytics and data analytics?

Business analytics is a subset of data analytics. Data analytics is used across disciplines to find trends and solve problems using data mining, data cleansing, data transformation, data modeling, and more. Business analytics also involves data mining, statistical analysis, predictive modeling, and the like, but is focused on driving better business decisions.

4. What is the difference between business analytics and business intelligence?

Business analytics and business intelligence (BI) serve similar purposes and are often used as interchangeable terms, but BI can be considered a subset of business analytics. BI focuses on descriptive analytics, data collection, data storage, knowledge management, and data analysis to evaluate past business data and better understand currently known information. Whereas BI studies historical data to guide business decision-making, business analytics is about looking forward. It uses data mining, data modeling, and machine learning to answer “why” something happened and predict what might happen in the future.

Business analytics techniques

According to Harvard Business School Online, there are three primary types of business analytics:

- Descriptive analytics: What is happening in your business right now? Descriptive analytics uses historical and current data to describe the organization’s present state by identifying trends and patterns. This is the purview of BI.
- Predictive analytics: What is likely to happen in the future? Predictive analytics is the use of techniques such as statistical modeling, forecasting, and machine learning to make predictions about future outcomes.
- Prescriptive analytics: What do we need to do? Prescriptive analytics is the application of testing and other techniques to recommend specific solutions that will deliver desired business outcomes.

Simplilearn adds a fourth technique:

- Diagnostic analytics: Why is it happening? Diagnostic analytics uses analytics techniques to discover the factors or reasons for past or current performance.

Examples of business analytics

San Jose Sharks build fan engagement

Starting in 2019, the San Jose Sharks began integrating its operational data, marketing systems, and ticket sales with front-end, fan-facing experiences and promotions to enable the NHL hockey team to capture and quantify the needs and preferences of its fan segments: season ticket holders, occasional visitors, and newcomers. It uses the insights to power targeted marketing campaigns based on actual purchasing behavior and experience data. When implementing the system, Neda Tabatabaie, vice president of business analytics and technology for the San Jose Sharks, said she anticipated a 12% increase in ticket revenue, a 20% projected reduction in season ticket holder churn, and a 7% increase in campaign effectiveness (measured in click-throughs).

GSK finds inventory reduction opportunities

As part of a program designed to accelerate its use of enterprise data and analytics, pharmaceutical titan GlaxoSmithKline (GSK) designed a set of analytics tools focused on inventory reduction opportunities across the company’s supply chain. The suite of tools included a digital value stream map, safety stock optimizer, inventory corridor report, and planning cockpit.

Shankar Jegasothy, director of supply chain analytics at GSK, says the tools helped GSK gain better visibility into its end-to-end supply chain and then use predictive and prescriptive analytics to guide decisions around inventory and planning.

Kaiser Permanente streamlines operations

Healthcare consortium Kaiser Permanente uses analytics to reduce patient waiting times and the amount of time hospital leaders spend manually preparing data for operational activities.

In 2018, the consortium’s IT function launched Operations Watch List (OWL), a mobile app that provides a comprehensive, near real-time view of key hospital quality, safety, and throughput metrics (including hospital census, bed demand and availability, and patient discharges).

In its first year, OWL reduced patient wait time for admission to the emergency department by an average of 27 minutes per patient. Surveys also showed hospital managers reduced the amount of time they spent manually preparing data for operational activities by an average of 323 minutes per month.

Business analytics tools

Business analytics professionals need to be fluent in a variety of tools and programming languages. According to the Harvard Business Analytics program, the top tools for business analytics professionals are:

- SQL: SQL is the lingua franca of data analysis. Business analytics professionals use SQL queries to extract and analyze data from transactional databases and to develop visualizations.
- Statistical languages: Business analytics professionals frequently use R for statistical analysis and Python for general programming.
- Statistical software: Business analytics professionals frequently use software including SPSS, SAS, Sage, Mathematica, and Excel to manage and analyze data.

Business analytics dashboard components

According to analytics platform company OmniSci, the main components of a typical business analytics dashboard include:

- Data aggregation: Before it can be analyzed, data must be gathered, organized, and filtered.
- Data mining: Data mining sorts through large datasets using databases, statistics, and machine learning to identify trends and establish relationships.
- Association and sequence identification: Predictable actions that are performed in association with other actions or sequentially must be identified.
- Text mining: Text mining is used to explore and organize large, unstructured datasets for qualitative and quantitative analysis.
- Forecasting: Forecasting analyzes historical data from a specific period to make informed estimates predictive of future events or behaviors.
- Predictive analytics: Predictive business analytics use a variety of statistical techniques to create predictive models that extract information from datasets, identify patterns, and provide a predictive score for an array of organizational outcomes.
- Optimization: Once trends have been identified and predictions made, simulation techniques can be used to test best-case scenarios.
- Data visualization: Data visualization provides visual representations such as charts and graphs for easy and quick data analysis.

Business analytics salaries

Here are some of the most popular job titles related to business analytics and the average salary for each position, according to data from PayScale:

- Analytics manager: $71K-$132K
- Business analyst: $48K-$84K
- Business analyst, IT: $51K-$100K
- Business intelligence analyst: $52K-$98K
- Data analyst: $46K-$88K
- Market research analyst: $42K-$77K
- Quantitative analyst: $61K-$131K
- Research analyst, operations: $47K-$115K
- Senior business analyst: $65K-$117K
- Statistician: $56K-$120K