Economic instability and uncertainty are the leading causes of technology budget decreases, according to the 2022 IDG/Foundry annual State of the CIO survey. Yet even as budgets tighten, data remains a key factor in business success, especially during economic uncertainty. According to the Harvard Business Review, data-driven companies have better financial performance, are more likely to survive, and are more innovative.[1]

So how do companies strike this balance and build a cost-effective data stack that delivers real value to the business? A new survey from Databricks, Fivetran, and Foundry of more than 400 senior IT decision-makers in data analytics/AI roles at large global companies finds that 96% of respondents report negative business effects due to integration challenges. However, many IT and business leaders are discovering that modernizing their data stack overcomes those integration hurdles, providing the basis for a unified and cost-effective data architecture.

Building a performant & cost-effective data stack 

The Databricks, Fivetran, and Foundry report points to four investment priorities for data leaders:

1. Automated data movement. Data pipelines are critical to the modern data infrastructure. They ingest and move data from popular enterprise SaaS applications and from operational and analytic workloads to cloud-based destinations such as data lakehouses. As the volume, variety, and velocity of data grow, businesses need fully managed, secure, and scalable data pipelines that can automatically adapt as schemas and APIs change while continuously delivering high-quality, fresh data. Modernizing analytic environments with an automated data movement solution reduces operational risk, ensures high performance, and simplifies ongoing management of data integration.

2. A single system of insight. A data lakehouse incorporates integration tools that automate ELT, moving data to a central location in near real time. By combining structured and unstructured data and eliminating separate silos, a single system of insight like the data lakehouse enables data teams to handle all data types and workloads. This unified approach dramatically simplifies the data architecture and combines the best features of a data warehouse and a data lake, improving data management, security, and governance in a single architecture to increase efficiency and innovation. Finally, it supports all major data and AI workloads, making data more accessible for decision-making.

A unified data architecture results in a data-driven organization that gains BI, analytics, and AI/ML insights at speeds comparable to those of a data warehouse, an important differentiator for tomorrow’s winning companies.

3. Designed for AI/ML from the ground up. AI/ML is gaining momentum, as more than 80% of organizations are using or exploring the use of AI to stay competitive. “AI remains a foundational investment in digital transformation projects and programs,” says Carl W. Olofson, research vice president with IDC, who predicts worldwide AI spending will exceed $221B by 2025.[2] Despite that commitment, becoming a data-driven company fueled by BI analytics and AI insights is proving to be beyond the reach of many organizations that find themselves stymied by integration and complexity challenges. The data lakehouse solves this by providing a single solution for all major data workloads, from streaming analytics to BI, data science, and AI. It empowers data science and machine learning teams to access, prepare, and explore data at scale.

4. Solving the data quality issue. Data quality tools (59%) stand out as the most important technology for modernizing the data stack, according to IT leaders in the survey. Why is data quality so important? Traditionally, business intelligence (BI) systems enabled queries of structured data in data warehouses for insights. Data lakes, meanwhile, contained unstructured data retained for the purposes of AI and machine learning (ML). Maintaining these siloed systems, or attempting to integrate them through complex workarounds, is difficult and costly. In a data lakehouse, metadata layers on top of open file formats increase data quality, while advances in query engines improve speed and performance. This serves the needs of both BI analytics and AI/ML workloads, helping assure the accuracy, reliability, relevance, completeness, and consistency of data.
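The data-quality role of those metadata layers is easiest to see at the level of a single table write. The sketch below is illustrative only: it assumes PySpark with the open-source delta-spark package installed, uses Delta Lake purely as one example of a metadata layer over an open file format (Parquet), and the table path and column names are hypothetical.

```python
# Illustrative sketch: a metadata layer such as Delta Lake records a table's
# schema in a transaction log and rejects non-conforming writes.
# Table path and columns are hypothetical examples, not survey content.
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder
    .appName("lakehouse-data-quality-sketch")
    # Standard settings that enable Delta Lake's Spark integration.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# The first write defines the table's schema in the transaction log.
orders = spark.createDataFrame(
    [(1, "2023-01-05", 42.50), (2, "2023-01-06", 19.99)],
    ["order_id", "order_date", "amount"],
)
orders.write.format("delta").mode("overwrite").save("/tmp/lakehouse/orders")

# A later append whose "amount" column arrives as a string is rejected by
# the metadata layer instead of silently corrupting the table.
bad_batch = spark.createDataFrame(
    [(3, "2023-01-07", "not-a-number")],
    ["order_id", "order_date", "amount"],
)
try:
    bad_batch.write.format("delta").mode("append").save("/tmp/lakehouse/orders")
except Exception as err:  # Delta raises an AnalysisException on schema mismatch
    print(f"Write rejected by schema enforcement: {err}")
```

Because the same governed table then serves BI queries and ML jobs alike, one schema check protects both kinds of workloads.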

According to the Databricks, Fivetran, and Foundry report, nearly two-thirds of IT leaders are using a data lakehouse, and more than four out of five say they’re likely to consider implementing one. At a moment when cost pressure is calling into question open-ended investments in data warehouses and data lakes, savvy IT leaders are responding by placing a high priority on modernizing their data stack.

Download the full report to discover exclusive insights from IT leaders into their data pain points, how they plan to address them, and what roles they expect cloud and data lakehouses to play in their data stack modernization.

[1] https://mitsloan.mit.edu/ideas-made-to-matter/why-data-driven-customers-are-future-competitive-strategy

[2] IDC, Worldwide Artificial Intelligence Spending Guide, February 2022 (V1).


Building and managing infrastructure yourself gives you more control — but the effort to keep it all under control can take resources away from innovation in other areas. Matt Doka, CTO of FiveStars, a marketing platform for small businesses, doesn’t like that trade-off and goes out of his way to outsource whatever he can.

It shows in his reluctance to run his own servers, but it’s perhaps most obvious in his attitude to data engineering, where he’s nearing the end of a five-year journey to automate or outsource much of the mundane maintenance work and focus internal resources on data analysis.

FiveStars offers small businesses an online loyalty card service — the digital equivalent of “buy nine, get one free” stamp cards — that they can link to their customers’ telephone numbers and payment cards. Over 10,000 small businesses use its services, and Doka estimates around 70 million Americans have opted into loyalty programs it manages. More recently, it has moved into payment processing, an option adopted by around 20% of its clients, and offers its own PCI-compliant payment terminals.

Recording all those interactions generates a prodigious amount of data, but that’s not the half of it. To one-up the legacy payment processors that just drop off a terminal and leave customers to call for support if it stops working, FiveStars builds telemetry systems into its terminals, which regularly report their connection status, battery level and application performance information.

“The bulk of our load isn’t even the transactions, the points or the credit cards themselves,” he says. “It’s the huge amounts of device telemetry data to make sure that when somebody wants to make a payment or earn some points, it’s a best in class experience.”

Figuring that out from the data takes a lot of analysis, work the 10-person data team had less and less time for because just maintaining the data infrastructure was eating it all up.

The data team that built the first version of FiveStars’ data infrastructure started on the sales and marketing side of the business, not IT. That historical accident meant that while they really knew their way around data, they had little infrastructure management experience, says Doka.

When Doka took over the team, he discovered they had written everything by hand: server automation code, database queries, the analyses — everything. “They wrote bash scripts!” Doka says. “Even 10 years ago, you had systems that could abstract away bash scripts.”

The system was brittle, highly manual and based on a lot of tribal knowledge. The net effect was that the data analysts spent most of their time just keeping the system running. “They struggled to get new data insights developed into analyses,” he says.

Back in 2019, he adds, everyone’s answer to a problem like that was to use Apache Airflow, an open-source platform for managing data engineering workflows, written in and controlled with Python. It was originally developed at Airbnb to perform exactly the kinds of things Doka’s team was still doing by hand.
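For readers who haven’t used it, an Airflow workflow is a small Python file that declares tasks and their ordering. The sketch below is illustrative only, not FiveStars code; the task names and schedule are hypothetical, but it shows the kind of extract-transform-load sequencing a team would otherwise encode in hand-maintained bash scripts.

```python
# Minimal, illustrative Airflow DAG. Task names, schedule, and logic are
# hypothetical examples of the sort of work the article describes.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull raw records from a source system")


def transform():
    print("clean and reshape the extracted records")


def load():
    print("write the results to the analytics database")


with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Explicit dependencies replace the ordering implicit in shell scripts.
    t_extract >> t_transform >> t_load
```

What Airflow does not prescribe is how projects are structured or documented, which is the gap Doka describes below.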

Doka opted for a hosted version of Airflow to replace FiveStars’ resource-intensive homebrew system. “I wanted to get us out of the business of hosting our own infrastructure because these are data analysts or even data engineers, not experienced SREs,” he says. “It’s not a good use of our time either.”

Adopting Airflow meant Doka could stop worrying about more than just servers. “There was a huge improvement in standardization and the basics of running things,” he says. “You just inherit all these best practices that we were inventing or reinventing ourselves.”

But, he laments, “How you actually work in Airflow is entirely up to the development team, so you still spend a lot of mind cycles on just structuring every new project.” And a particular gripe of his was that you have to build your own documentation best practices.

So barely a year after beginning the migration to Airflow, Doka found himself looking for something better to help him automate more of his data engineering processes and standardize away some of the less business-critical decisions that took up so much time.

He cast his net wide, but many of the tools he found only addressed part of the problem.

“DBT just focused on how to change the data within a single Snowflake instance, for example,” he says. “It does a really good job of that, but how do you get data into Snowflake from all your sources?” For that, he adds, “there were some platforms that could abstract away all the data movement in a standardized way, like Fivetran, but they didn’t really give you a language to process.”

After checking out several other options, Doka eventually settled on Ascend.io. “I loved the fact there was a standard way to write a SQL query or Python code, and it generates a lineage and a topology,” he says. “The system can automatically know where all the data came from; how it made its way to this final analysis.”

This abstracts away not only the challenge of running servers but also that of deciding how you do the work, he says.

“This saves a ton of mental load for data engineers and data analysts,” he says. “They’re able to focus entirely on the question they’re trying to answer and the analysis they’re trying to do.”

Not only is it easier for analysts to focus on their own work, it’s also easier for them to follow one another’s, he adds.

“There’s all this documentation that was just built in by design where, without thinking about it, each analyst left a clear trail of crumbs as to how they got to where they are,” he says. “So if new people join the project, it’s easier to see what’s going on.”

Ascend uses another Apache project, Spark, as its analytics engine; Spark has its own Python API, PySpark.

Migrating the first few core use cases from Airflow took less than a month. “It took an hour to turn on, and two minutes to hook up Postgres and some of our data sources,” Doka says. “That was very fast.”

Replicating some of the workflows was as easy as copying the underlying SQL from Airflow to Ascend. “Once we had it working at parity, we would just turn the [old] flow off and put the [new] output connector where it needed to go,” he says.

The most helpful thing about Ascend was that it ran code changes quickly enough for the team to develop and fix things in real time. “The system can be aware of where pieces in the workflow have changed or not, and it doesn’t rerun everything if nothing’s changed, so you’re not wasting compute,” he says. “That was a really nice speed up.”

Some things still involved an overnight wait, though. “There’s an upstream service you can only download from between 2 a.m. and 5 a.m., so getting that code just right, to make sure it was downloading at the right time of day, was a pain but it wasn’t necessarily Ascend’s fault,” he says.

Mobilizing a culture shift

The move to Ascend didn’t lead to any major retraining or hiring needs either. “Building is pretty much zero now that we have everything abstracted,” Doka says. There are now three people running jobs on top of the new systems and around six analysts doing reporting and generating insights from the data.

“Most of the infrastructure work is gone,” he adds. “There’s still some ETL work, the transforming and cleansing that never goes away, but now it’s done in a standardized way. One thing that took time to digest, though, was that shift from what I call vanilla Python used with Airflow to Spark Python. It feels different than just writing procedural code.” It’s not esoteric knowledge, just something the FiveStars team hadn’t used before and needed to familiarize themselves with.
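As a rough illustration of that shift, the sketch below contrasts a procedural “vanilla” Python habit with the declarative PySpark equivalent; the file path and column names are hypothetical, not FiveStars code.

```python
# Illustrative sketch of the move from procedural Python to PySpark's
# declarative DataFrame style. Path and columns are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("telemetry-sketch").getOrCreate()

# Procedural habit: loop over rows and accumulate counters by hand.
# low_battery = {}
# for row in read_rows("telemetry.csv"):
#     if float(row["battery_level"]) < 0.2:
#         low_battery[row["terminal_id"]] = low_battery.get(row["terminal_id"], 0) + 1

# Spark style: describe the transformation; the engine plans and distributes it.
telemetry = spark.read.option("header", True).csv("/data/telemetry.csv")

low_battery = (
    telemetry
    .where(F.col("battery_level").cast("double") < 0.2)
    .groupBy("terminal_id")
    .count()
    .orderBy(F.col("count").desc())
)

low_battery.show()
```

The logic is the same; what changes is that the engine, not the analyst’s loop, decides how the work is executed and distributed.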

A recurring theme in Doka’s data engineering journey has been looking for new things he can stop building and buy instead.

“When you build, own, and run a piece of infrastructure in house, you have a greater level of control and knowledge,” he says. “But often you sacrifice a ton of time for it, and in many cases don’t have the best expertise to develop it.”

Convincing his colleagues of the advantages of doing less wasn’t easy. “I struggled with the team in both eras,” he says. “That’s always part of a transition to any more abstracted system.”

Doka says he’s worked with several startups as an investor or an advisor, and always tells technically minded founders to avoid running infrastructure themselves and pick a best-in-class vendor to host things for them — and not just because it saves time. “You’re also going to learn best practices much better working with them,” he says. He offers enterprise IT leaders the same advice when dealing with internal teams. “The most consistent thing I’ve seen across 11 years as a CTO is that gravity just pulls people to ‘build it here’ for some reason,” he says. “I never understood it.” It’s something that has to be continually resisted, or teams wind up wasting time maintaining things that aren’t part of the core business.


By Milan Shetti, CEO Rocket Software

For several months now, pundits and economists alike have indicated that we are likely to enter, or have already entered, a recession. Regardless, the National Bureau of Economic Research (NBER) has the final say on whether any period of economic decline qualifies as a recession, and that determination might not come for months.

Whether or not the U.S. enters a recession, businesses must have a plan. By recession-proofing their tech stacks, businesses can compete and thrive regardless of market conditions. Consider the following tips when planning to recession-proof your technology stack.

Avoid single sourcing

When it comes to recession-proofing a tech stack, leaders should avoid single sourcing. Supply chains are especially vulnerable: a single source can hamstring a business by causing shipment delays and driving up prices, feeding generalized inflation. If a company’s entire product portfolio is made in one location and that location becomes overwhelmed, its operations could come to a halt.

The same can be said of IT processes. If a business depends on a single cloud provider and that provider shuts down operations, whether purposefully or accidentally, the outcome can be catastrophic. As a recession becomes more likely, businesses must choose partners that do not box their customers into a single-source cloud solution. A hybrid approach to IT is always best.

Understand the power of automation

Understanding and investing in automation gives businesses a powerful tool for fighting the impacts of a recession. In a recession, businesses must do more with less, and automation can help fill the gaps and ease the pressure on overworked employees.

But don’t automate just for the sake of automating. Over-automation could ultimately result in a business spending more resources than necessary. Instead, take stock of where employees are spending most of their time, evaluate whether that work is best done by an employee or by automation, and adjust accordingly. Automation can help free up resources and allow employees to focus on more value-driven work.

Speaking of automation and technology in general, it’s always important to take stock of which technologies are mission-critical and which are not. This is especially true when the economy is on a downswing. If you’re continuously taking stock of which tools and technologies yield the most value, trimming the excess becomes easier.

Always prepare for a recession

Even in times of economic prosperity, business leaders must operate as if a recession is never too far away. As the late technology visionary and former chairman and CEO of Intel Andy Grove once said, “Success breeds complacency. Complacency breeds failure. Only the paranoid survive.” A healthy dose of paranoia can help businesses lessen the blow of an economic recession by reducing the urge to overspend on technology that does not bring clear value to the company. It’s important to plan not just for the good days, but for the days that might not be great as well.

Businesses should view a recession as an opportunity: a chance to reevaluate what’s important and make sure their tech stacks are fueled by technology that brings the highest level of return to the business.

To learn more about recession-proofing your tech stack, visit Rocket’s homepage.


Halkbank, founded in 1993, is one of the largest banks in Türkiye, offering corporate and retail banking, investor relations, and SME and commercial services to over 15 million customers. But during the pandemic lockdowns, customers were forced to switch to the bank’s digital channels, and mobile app users quickly soared from one million to 2.5 million.

Since the pandemic, however, absorbing this influx of digital customers hasn’t been entirely smooth. The bank recognized the need to scale its mobile banking platform to handle more than double the volume of traffic.

Namik Kemal Uçkan, head of IT operations at Halkbank, lists challenges across several areas: prioritizing network availability when traffic volumes surge; keeping services in high demand available during peaks; ensuring speedy identification and resolution of network issues; having sufficient capacity in network monitoring solutions; and ensuring faster incident resolution and troubleshooting across the networks.

As a result, complexity inside the enterprise IT ecosystem has steadily increased, and managing the networks that support it, while always important, has become critical.

Over the last 10 years, Halkbank has been using Riverbed SteelHead across more than 1,000 branches for WAN optimization and for network and application performance. Riverbed’s solutions have helped ensure Halkbank’s business-critical applications are always available to its business users.

“Riverbed SteelHead has been used to accelerate the performance of the internal banking applications that are utilized by its own employees,” says Mena Migally, regional VP, META, at Riverbed. “By deploying this solution, they’ve reduced the latency for applications at branch offices while also realizing bandwidth savings.”
