Peter Zhou, President of Huawei’s IT Product Line, is the public face of data storage technologies at the Chinese telecoms-to-IT giant. At MWC 2023, in between meetings with many of the 2,500 Huawei clients who made the trip to Barcelona, Peter described Europe’s buoyant market as one of the drivers behind 40% year-on-year growth in Huawei’s international on-premise data storage revenues.

In Europe, Huawei envisages continuing rapid growth as enterprises re-tool their private clouds to deal with accelerating cloudification.

“IT technology has been developing very quickly in Europe,” says Peter. “People here have accepted the case for cloudification quicker than in other regions.”

Peter adds: “For the future evolution of multi-cloud, we definitely believe that we need to continue innovating, particularly in data storage.”

Enterprises are investing in on-premise infrastructure in order to keep pace with runaway data volumes, mitigate security threats and cope with the rise of container-based and serverless application architectures.

At MWC, Peter spent much of his time discussing Huawei’s new multi-cloud storage solution, which supports intelligent cross-cloud data tiering, cross-cloud data visibility and enhanced data mobility.

“This is a must,” Peter says, referring to the last item on the list. “The data in data storage has to support data mobility, sharing between multiple clouds.”

“If we have an application in the public cloud, it must be able to access data in private clouds, rather than copying data from on-premise to the public cloud, which really isn’t cost-effective.”

For Peter’s on-premise offerings, the innovation agenda is also being driven by spiraling volumes of data, which accounts for Huawei’s aggressive focus on hardware size reduction, compression technologies and decreasing energy consumption.

Other on-premise solutions for enterprise data centers unveiled at MWC included an industry-first unified disaster recovery solution based on Storage & Optical Connection Coordination (SOCC) and multi-layer ransomware protection that integrates networking and storage to deliver 99.9% accuracy.

Huawei already offers a full enterprise storage portfolio, covering data production, backup and archiving, as well as tiered solutions for hot, warm and cold data. At MWC, however, Peter’s on-premise division announced that it will be broadening its focus to include small and medium-sized enterprises (SMEs).

This is part of a company-wide effort, involving the roll-out of more than 200 new products and services for SMEs, including cost-effective primary and backup storage offerings based around the OceanStor Dorado 2000 and OceanProtect X3000.

Peter doesn’t foresee an end to on-premise demand.

He says: “People may think the public cloud is expanding and that business in the on-premise data center is shrinking. People have that kind of worry. But the real results show that the facts are different.”

“In the beginning, people choose public cloud infrastructure, but then they become more rational in terms of the cost and the return. And they start to think about the future evolution of their technology needs.”

According to Peter, that’s precisely where European enterprises find themselves: investing in on-premise storage upgrades to future-proof their multi-cloud strategies.

“We think there’s a big change happening in Europe,” he adds. “The largest enterprises will be running multiple private clouds alongside multiple public clouds. That’s the reality.”

Find out more about Huawei’s MWC program here.

Data Management

Chief data and analytics officers (CDAOs) are poised to be of increasing strategic importance to their organizations, but many are struggling to make headway, according to data presented last week by Gartner at the Gartner Data & Analytics Summit 2023.

Fewer than half (44%) of data and analytics leaders say their teams are effective in providing value to their organization. That’s from a survey of 566 data and analytics leaders globally that Gartner conducted online from September to November 2022.

“It was kind of an eye-opener that one-third of them felt they were not as effective as they could be,” says Donna Medeiros, senior director analyst at Gartner. “There’s so much going on, so many things they are compelled to do versus what they really want to do, know they need to do, know they need to prioritize. They’re spending a lot of time on things like data quality, data management, things that might be tactical, helping with operational aspects of IT. But that’s not helping move the value of the organization as a business forward.”

The responsibilities of data and analytics leaders are many and varied: Sixty percent of respondents cited defining and implementing data and analytics strategy; 59% said oversight of data and analytics strategy was in their portfolio of responsibilities; 55% pointed to data and analytics governance; and 54% cited managing data-driven culture change.

Organizations are still investing in data and analytics functions. Respondents to the survey reported their organizations are increasing investment in data management (65%), data governance (63%), and advanced analytics (60%). The mean reported budget among respondents was $5.41 million, and 44% said their data and analytics teams increased in size over the past year.

Key obstacles to data success

Despite that increased investment, CDAOs say a lack of resources and funding is among their top impediments to delivering results, with 13% citing it as their top obstacle and 29% listing resource constraints among their top three hurdles.

The top impediment? Skills and staff shortages. One in six (17%) survey respondents said talent was their biggest issue, while 39% listed it among their top three. And the tight talent pool isn’t helping, Medeiros says. “CDAOs must have a talent strategy that doesn’t count on hiring data and analytics talent ready-made.”

To counter this, CDAOs need to build a robust talent management strategy that includes education, training, and coaching for data-driven culture and data literacy, Medeiros says. That strategy must apply not only to the core data and analytics team but also the broader business and technology communities in the organization.

Other obstacles to data and analytics success, according to Gartner, include:

Culture challenges to accept change (8%, top impediment; 26%, among top three)
Lack of business stakeholder involvement and support (10%, top impediment; 26%, among top three)
Not enough authority to execute the CDAO responsibilities (9%, top impediment; 24%, among top three)
Poor data literacy (5%, top impediment; 23%, among top three)

“Their life is very complex,” Medeiros says of the current state of the CDAO role. “They have lots of areas of primary responsibility — implementing data and analytics strategy, oversight of data and analytics initiatives, creating and implementing information systems and data management — and the people side — workforce development, upskilling, making the organization data-driven, artificial intelligence, and centers of excellence. They’ve got a lot of complexity and a lot of people they’re answering to.”

This lack of funding for data initiatives echoes the findings of Foundry/CIO.com’s 2022 Data & Analytics Study, which also identified competing digital transformation initiatives and a lack of executive advocacy for data initiatives as other key roadblocks to data-driven success.

What it takes to lead data strategy

Strategic missteps in realizing data goals may signal an organizational issue at the C-level, with company leaders recognizing the importance of data and analytics but falling short on making the strategic changes and investments necessary for success. According to a 2022 study from Alation and Wakefield Research, 71% of data leaders said they were “less than very confident” that their company’s leadership sees a link between investing in data and analytics and staying ahead of the competition.

Even when an organization taps a designated IT leader to helm data strategy, whether in a chief data officer or chief analytics officer role, the complexity of the role and how it interfaces with other business leaders need to be addressed for success.

Medeiros likens the CDAO role to a combination of three personas: an orchestra conductor, a composer, and a performer. The conductor looks across the organization and conducts how data and analytics is done, both across business lines with the help of domain experts, as well as in a centralized function. The composer creates and sells the storyline of the value of data and analytics. And sometimes, data leaders must be performers: helping to implement data management, data quality, data trust, spending time on data governance, compliance, and risk.

“These three personas require juggling soft, people skills and technical savvy,” Medeiros says, adding that “the CDAO serves multiple stakeholders across the organization and cannot operate in isolation. They need to align with organizational strategic priorities, collaborate and sell the overall vision and strategy for data and analytics, and get buy-in for their initiatives.”

The most successful data leaders, according to Gartner’s survey, outperformed their peers by projecting an executive presence while also building an agile and strategic data and analytics function capable of shaping data-driven business performance and operational excellence, Medeiros says. Gartner asked respondents to rate themselves across 17 executive leadership traits. There was a strong correlation between those leaders who said they were effective or very effective across those traits and those who reported high organizational and team performance. For example, 43% of top-performing data and analytics leaders said they were effective in committing time to their own professional development, versus only 19% of low performers.

Prominence matters

How CDAOs are positioned in the organization also impacts data and analytics success. According to Foundry’s 2023 State of the CIO survey, 53% of chief data officers and 45% of chief analytics officers report to the CIO, while just 35% and 38% report to the CEO, respectively. Moreover, only 37% of CDOs and 25% of CAOs report having budgets separate from IT overall.


Medeiros concedes that CDAOs who report to the CIO and sit within the IT function can still be effective, but, in general, the higher CDAOs sit in the org chart, the better, she says, as this gives them more visibility and better leverage to work on organizational goals.

“It depends on their roles, responsibilities, and how much time they’re allotted for what we call business enablement — not just enterprise IT but actually helping the organization do what matters,” Medeiros says. “It can be things like cost efficiency, but it’s also new products and services that data and analytics supports and can call out.”


Indeed, Rita Sallam, distinguished VP analyst at Gartner, says that by 2026 more than a quarter of Fortune 500 CDAOs will have become responsible for at least one data- and analytics-based product that becomes a top earner for their company.

To get there, though, Medeiros says CDAOs must prioritize strategy over tactics. While tactical elements such as data quality and data security are important, improving effectiveness relies on aligning the data and analytics function with organizational strategic priorities and selling the data and analytics vision to key influencers like the CEO, CIO, and CFO.

“Most CDAOs are delivering on immediate-term business goals, but for around half of CDAOs surveyed, delivery against goals for future-term growth and sustainability is lagging,” Medeiros says.

She notes that the most successful data leaders are focusing on improving decision-making capabilities, monetization of data products, and cost optimization, as well as improving data literacy and fostering a data-driven culture.

Chief Data Officer, Data Management, IT Leadership

Data is the powerhouse of digital transformation. That’s no surprise. But did you know that data is also one of the most significant factors in whether a company can achieve its sustainability goals? 

Business leaders are at a crossroads. On one hand, a perilous financial landscape threatens to stall growth, with companies of all sizes retreating to more well-established profit drivers. On the other hand, environmental stewardship has swelled into a legitimate consideration underpinning nearly every business decision, underscored by the severity of recent climate reports and a surge in consumer activism.  

This raises the question: as digital and environmental transitions both hit the top of corporate agendas, what are the most important changes needed to achieve this dual transition?  

In this webinar, Microsoft’s Rosie Mastrandrea, TCS’ Jai Mishra, and Equinor’s Vegard Torset explore the crossroads of data and digital transformation — and how the right approach can unlock your sustainability goals. 

Watch the webinar.  

Digital Transformation, Financial Services Industry

Data governance definition

Data governance is a system for defining who within an organization has authority and control over data assets and how those data assets may be used. It encompasses the people, processes, and technologies required to manage and protect data assets.

The Data Governance Institute defines it as “a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.”

The Data Management Association (DAMA) International defines it as the “planning, oversight, and control over management of data and the use of data and data-related sources.”

Data governance framework

Data governance may best be thought of as a function that supports an organization’s overarching data management strategy. Such a framework provides your organization with a holistic approach to collecting, managing, securing, and storing data. To help understand what a framework should cover, DAMA envisions data management as a wheel, with data governance as the hub from which the following 10 data management knowledge areas radiate:

Data architecture: The overall structure of data and data-related resources as an integral part of the enterprise architecture
Data modeling and design: Analysis, design, building, testing, and maintenance
Data storage and operations: Structured physical data assets storage deployment and management
Data security: Ensuring privacy, confidentiality, and appropriate access
Data integration and interoperability: Acquisition, extraction, transformation, movement, delivery, replication, federation, virtualization, and operational support
Documents and content: Storing, protecting, indexing, and enabling access to data found in unstructured sources and making this data available for integration and interoperability with structured data
Reference and master data: Managing shared data to reduce redundancy and ensure better data quality through standardized definition and use of data values
Data warehousing and business intelligence (BI): Managing analytical data processing and enabling access to decision support data for reporting and analysis
Metadata: Collecting, categorizing, maintaining, integrating, controlling, managing, and delivering metadata
Data quality: Defining, monitoring, maintaining data integrity, and improving data quality

When establishing a strategy, each of the above facets of data collection, management, archiving, and use should be considered.

The Business Application Research Center (BARC) warns that data governance is a highly complex, ongoing program, not a “big bang initiative,” and it runs the risk of participants losing trust and interest over time. To counter that, BARC recommends starting with a manageable or application-specific prototype project and then expanding across the company based on lessons learned.

BARC recommends the following steps for implementation:

1. Define goals and understand benefits
2. Analyze the current state and perform a delta analysis
3. Derive a roadmap
4. Convince stakeholders and budget the project
5. Develop and plan the data governance program
6. Implement the data governance program
7. Monitor and control

Data governance vs. data management

Data governance is just one part of the overall discipline of data management, though an important one. Whereas data governance is about the roles, responsibilities, and processes for ensuring accountability for and ownership of data assets, DAMA defines data management as “an overarching term that describes the processes used to plan, specify, enable, create, acquire, maintain, use, archive, retrieve, control, and purge data.”

While data management has become a common term for the discipline, it is sometimes referred to as data resource management or enterprise information management (EIM). Gartner describes EIM as “an integrative discipline for structuring, describing, and governing information assets across organizational and technical boundaries to improve efficiency, promote transparency, and enable business insight.”

Importance of data governance

Most companies already have some form of governance for individual applications, business units, or functions, even if the processes and responsibilities are informal. As a practice, it is about establishing systematic, formal control over these processes and responsibilities. Doing so can help companies remain responsive, especially as they grow to a size in which it is no longer efficient for individuals to perform cross-functional tasks. Several of the overall benefits of data management can only be realized after the enterprise has established systematic data governance. Some of these benefits include:

Better, more comprehensive decision support stemming from consistent, uniform data across the organization
Clear rules for changing processes and data that help the business and IT become more agile and scalable
Reduced costs in other areas of data management through the provision of central control mechanisms
Increased efficiency through the ability to reuse processes and data
Improved confidence in data quality and documentation of data processes
Improved compliance with data regulations

Goals of data governance

The goal is to establish the methods, set of responsibilities, and processes to standardize, integrate, protect, and store corporate data. According to BARC, an organization’s key goals should be to:

Minimize risks
Establish internal rules for data use
Implement compliance requirements
Improve internal and external communication
Increase the value of data
Facilitate the administration of the above
Reduce costs
Help to ensure the continued existence of the company through risk management and optimization

BARC notes that such programs always span the strategic, tactical, and operational levels in enterprises, and they must be treated as ongoing, iterative processes.

Data governance principles

According to the Data Governance Institute, eight principles are at the center of all successful data governance and stewardship programs:

1. All participants must have integrity in their dealings with each other. They must be truthful and forthcoming in discussing the drivers, constraints, options, and impacts for data-related decisions.
2. Data governance and stewardship processes require transparency. It must be clear to all participants and auditors how and when data-related decisions and controls were introduced into the processes.
3. Data-related decisions, processes, and controls subject to data governance must be auditable. They must be accompanied by documentation to support compliance-based and operational auditing requirements.
4. Programs must define who is accountable for cross-functional data-related decisions, processes, and controls.
5. Programs must define who is accountable for stewardship activities that are the responsibilities of individual contributors and groups of data stewards.
6. Programs must define accountabilities in a manner that introduces checks and balances between business and technology teams, and between those who create/collect information, those who manage it, those who use it, and those who introduce standards and compliance requirements.
7. Programs must introduce and support the standardization of enterprise data.
8. Programs must support proactive and reactive change management activities for reference data values and the structure/use of master data and metadata.

Best practices of data governance

Data governance strategies must be adapted to best suit an organization’s processes, needs, and goals. Still, there are six core best practices worth following:

Identify critical data elements and treat data as a strategic resource.
Set policies and procedures for the entire data lifecycle.
Involve business users in the governance process.
Don’t neglect master data management.
Understand the value of information.
Don’t over-restrict data use.

For more on doing data governance right, see “6 best practices for good data governance.”

Challenges in data governance

Good data governance is no simple task. It requires teamwork, investment, and resources, as well as planning and monitoring. Some of the top challenges of a data governance program include:

Lack of data leadership: Like other business functions, data governance requires strong executive leadership. The leader needs to give the governance team direction, develop policies for everyone in the organization to follow, and communicate with other leaders across the company.
Lack of resources: Data governance initiatives can struggle for lack of investment in budget or staff. Data governance must be owned by and paid for by someone, but it rarely generates revenue on its own. Data governance and data management overall, however, are essential to leveraging data to generate revenue.
Siloed data: Data has a way of becoming siloed and segmented over time, especially as lines of business or other functions develop new data sources, apply new technologies, and the like. Your data governance program needs to continually break down new silos.

For more on these difficulties and others, see “7 data governance mistakes to avoid.”

Data governance software and vendors

Data governance is an ongoing program rather than a technology solution, but there are tools with data governance features that can help support your program. The tool that suits your enterprise will depend on your needs, data volume, and budget. According to PeerSpot, some of the more popular solutions include:

Collibra Governance: Collibra is an enterprise-wide solution that automates many governance and stewardship tasks. It includes a policy manager, data helpdesk, data dictionary, and business glossary.
SAS Data Management: Built on the SAS platform, SAS Data Management provides a role-based GUI for managing processes and includes an integrated business glossary, SAS and third-party metadata management, and lineage visualization.
erwin Data Intelligence (DI) for Data Governance: erwin DI combines data catalog and data literacy capabilities to provide awareness of and access to available data assets. It provides guidance on the use of those data assets and ensures data policies and best practices are followed.
Informatica Axon: Informatica Axon is a collection hub and data marketplace for supporting programs. Key features include a collaborative business glossary and the ability to visualize data lineage and generate data quality measurements based on business definitions.
SAP Data Hub: SAP Data Hub is a data orchestration solution intended to help you discover, refine, enrich, and govern all types, varieties, and volumes of data across your data landscape. It helps organizations establish security settings and identity control policies for users, groups, and roles, and streamline best practices and processes for policy management and security logging.
Alation: Alation is an enterprise data catalog that automatically indexes data by source. One of its key capabilities, TrustCheck, provides real-time “guardrails” to workflows. Meant specifically to support self-service analytics, TrustCheck attaches guidelines and rules to data assets.
Varonis Data Governance Suite: Varonis’s solution automates data protection and management tasks, leveraging a scalable Metadata Framework that enables organizations to manage data access, view audit trails of every file and email event, identify data ownership across different business units, and find and classify sensitive data and documents.
IBM Data Governance: IBM Data Governance leverages machine learning to collect and curate data assets. The integrated data catalog helps enterprises find, curate, analyze, prepare, and share data.

Data governance certifications

Data governance is a discipline rather than a certification in itself, but some certifications can help your organization gain an edge, including the following:

DAMA Certified Data Management Professional (CDMP)
Data Governance and Stewardship Professional (DGSP)
edX Enterprise Data Management
SAP Certified Application Associate – SAP Master Data Governance

For related certifications, see “10 master data management certifications that will pay off.”

Data governance roles

Each enterprise composes its data governance differently, but there are some commonalities.

Steering committee

Governance programs span the enterprise, generally starting with a steering committee comprising senior management, often C-level individuals or vice presidents accountable for lines of business. Morgan Templar, author of Get Governed: Building World Class Data Governance Programs, says steering committee members’ responsibilities include setting the overall governance strategy with specific outcomes, championing the work of data stewards, and holding the governance organization accountable to timelines and outcomes.

Data owner

Templar says data owners are individuals responsible for ensuring that information within a specific data domain is governed across systems and lines of business. They are generally members of the steering committee, though they may not be voting members. Data owners are responsible for:

Approving data glossaries and other data definitions
Ensuring the accuracy of information across the enterprise
Directing data quality activities
Reviewing and approving master data management approaches, outcomes, and activities
Working with other data owners to resolve data issues
Conducting second-level reviews of issues identified by data stewards
Providing the steering committee with input on software solutions, policies, or regulatory requirements for their data domain

Data steward

Data stewards are accountable for the day-to-day management of data. They are subject matter experts (SMEs) who understand and communicate the meaning and use of information, Templar says, and they work with other data stewards across the organization as the governing body for most data decisions. Data stewards are responsible for:

Being SMEs for their data domain
Identifying data issues and working with other data stewards to resolve them
Acting as a member of the data steward council
Proposing, discussing, and voting on data policies and committee activities
Reporting to the data owner and other stakeholders within a data domain
Working cross-functionally across lines of business to ensure their domain’s data is managed and understood

More on data governance:

7 data governance mistakes to avoid
6 best practices for good data governance
The secrets of highly successful data analytics teams
What is data architecture? A framework for managing data
10 master data management certifications that will pay off

Big Data, Data and Information Security, Data Integration, Data Management, Data Mining, Data Science, IT Governance, IT Governance Frameworks, Master Data Management

Staying in control and securing your data has never been more important! As data privacy regulations continue to evolve, businesses have had to adapt how and where they store data. The EU’s General Data Protection Regulation (GDPR) has been the most newsworthy, requiring all businesses that operate in or have customers in the EU to change how they handle personal data. Regulations, compliance, and how data is controlled and managed are becoming more critical globally, with 157 countries around the world now having some form of data privacy law, putting a spotlight on sovereign clouds.1 

In addition to rights to transparency and security granted by regulations such as GDPR, more countries worldwide are starting to create rules around data sovereignty. This ‘protectionism’ restricts where data can go and who has jurisdiction over the data. New rules around data sovereignty are designed to keep data out of the hands of other countries, bad actors, and those without authorized access.  

Data sovereignty is the right to control citizens’ data collection, ownership, and application.2  

To ensure compliance with data privacy and sovereignty laws, organizations are looking to sovereign cloud solutions to protect their sensitive data. Sovereign clouds are operated by experienced national cloud providers who can provide dedicated cloud storage that complies with local regulations. 

There are four key use cases to consider around sovereign cloud. This post will cover all four in brief, or you can read the in-depth posts on each topic. 

Data Sovereignty in the Cloud
Data Security and Compliance
Data Access and Integrity
Data Independence and Mobility

Data Sovereignty in the Cloud 

A significant hurdle for complying with data sovereignty regulations is the dominance of U.S.-based companies in the public cloud computing market. These providers are subject to the U.S. CLOUD Act, which could result in the U.S. government accessing data, even if it is stored in another country but with a U.S.-based company.  

Sovereign cloud protects your data from interference by foreign authorities. All data, including metadata, resides locally, making it easier to comply with residency laws and other local sovereign requirements. Using a sovereign cloud allows you to stay in control of your data and ensure it’s compliant with regulations.  

Data Security and Compliance 

Sovereign cloud providers use multi-layered security and access controls to protect data. This prevents unauthorized access and data loss in the face of growing cyberattacks. Additional data protection steps should be taken by the provider, such as encryption and air-gapped storage.  

Compliance with data sovereignty laws is critical, covering everything from where data is stored to who can access it. As laws evolve, compliance staff must understand and follow the relevant local and industry regulations. Sovereign cloud providers have their security controls approved as part of a 20-point self-attestation process that provides consistent security, zero-trust principles, and micro-segmentation, and they employ local compliance experts to keep up with the latest laws. 

VMware Sovereign Cloud helps organizations comply with data privacy laws by partnering with local cloud providers to build sovereign clouds, based on VMware’s framework, that operate entirely within a local jurisdiction. These VMware Cloud Verified partners have local staff with security clearances (if required) and expertise with local laws to ensure the compliance of the sovereign cloud environment. These providers offer continuous compliance monitoring, reporting, and remediation so data follows local and industry regulations.

Data Access and Integrity 

Having data is useless if you can’t access it when you need it. That’s why access and integrity are required components of a sovereign cloud. With multiple in-region data centers, providers can offer 99.999% uptime in addition to backup and recovery protocols that meet data sovereignty requirements. 

VMware Sovereign Cloud provides secure access to sensitive data and protects its integrity to allow organizations to unlock value from their data and to ensure it is accurate and complete. In-region data centers with high availability, resilient infrastructure, and low latency make data accessible when needed. Secure access presents new opportunities for data analysis that can fuel innovation and improve local economies. 

Data Independence and Mobility 

Data sovereignty laws have placed restrictions on how data travels across national or regional borders. These data movement and sharing restrictions can cause companies to limit where they do business to avoid compliance headaches. Sovereign clouds can prevent these issues by keeping a company’s sensitive data compliant while operating as part of a broader multi-cloud ecosystem that supports the overall business.

VMware Sovereign Cloud helps organizations future-proof their cloud infrastructure with data independence, interoperability and mobility. Data can be shared and migrated as needed to respond to changes in technology or geopolitics. A sovereign cloud is compatible with multi-cloud or hybrid cloud strategies and is separate from the underlying infrastructure, preventing vendor lock-in. Workload migrations into or out of a sovereign cloud are secure, allowing organizations to deploy and move data anywhere as needed.

For more info on VMware Sovereign Cloud…
Download the Sovereign Cloud Solutions Brief or watch the Sovereign Cloud Overview video.
To learn more about sovereign cloud from VMware or to connect with a provider in your region, visit https://cloudsolutions.vmware.com/services/sovereign-cloud.html, or join the Sovereign Cloud conversation on LinkedIn.

Sources: 
1. Now 157 Countries: Twelve Data Privacy Laws in 2021/22, SSRN, Graham Greenleaf, University of New South Wales, Faculty of Law, March 2022 
2. Hinrich Foundation, Data is disruptive: How data sovereignty is challenging data governance, August 2021

Cloud Computing, Cloud Security

Industries increasingly rely on data and AI to enhance processes and decision-making. However, they face a significant challenge in ensuring privacy due to sensitive Personally Identifiable Information (PII) in most enterprise datasets. Safeguarding PII is not a new problem. Conventional IT and data teams query data containing PII, but only a select few require access. Rate-limiting access, role-based access protection, and masking have been widely adopted for traditional BI applications to govern sensitive data access. 
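As a simple illustration of the kind of column-level masking used in traditional BI pipelines, here is a minimal sketch that masks two hypothetical PII columns in a pandas DataFrame; the column names and masking rules are illustrative assumptions, not anything prescribed by this article.

```python
# A minimal sketch of column-level masking for a BI extract.
# Column names ("email", "ssn") are hypothetical; adapt to your schema.
import pandas as pd

def mask_email(addr: str) -> str:
    """Keep the domain, hide the local part (e.g. j***@example.com)."""
    local, _, domain = addr.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"

def mask_ssn(ssn: str) -> str:
    """Show only the last four digits."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    return f"***-**-{digits[-4:]}" if len(digits) >= 4 else "***"

df = pd.DataFrame({
    "customer_id": [1001, 1002],
    "email": ["jane.doe@example.com", "li.wang@example.org"],
    "ssn": ["123-45-6789", "987-65-4321"],
    "purchase_total": [250.0, 410.5],
})

# Analysts querying the masked extract never see raw PII values.
masked = df.assign(email=df["email"].map(mask_email),
                   ssn=df["ssn"].map(mask_ssn))
print(masked)
```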

Protecting sensitive data in the modern AI/ML pipeline has different requirements. The emerging and ever-growing class of data users consists of ML data scientists and applications requiring larger datasets. Data owners need to walk a tightrope to ensure parties in their AI/ML lifecycle get appropriate access to the data they need while maximizing the privacy of that PII.

Enter the new class 

ML data scientists require large quantities of data to train machine learning models. Then the trained models become consumers of vast amounts of data to gain insights to inform business decisions. Whether before or after model training, this new class of data consumers relies on the availability of large amounts of data to provide business value.

In contrast to conventional users who only need to access limited amounts of data, the new class of ML data scientists and applications require access to entire datasets to ensure that their models represent the data with precision. And even when traditional controls such as masking or encryption are applied, they may not be enough to prevent an attacker from inferring sensitive information by analyzing patterns in the encrypted or masked data. 

The new class often uses advanced techniques such as deep learning, natural language processing, and computer vision to analyze and extract insights from the data. These efforts are often slowed down or blocked as they face sensitive PII data entangled within a large proportion of datasets they require. Up to 44% of data is reported to be inaccessible in an organization. This limitation blocks the road to AI’s promised land in creating new and game-changing value, efficiencies, and use cases. 

The new requirements have led to the emergence of techniques such as differential privacy, federated learning, synthetic data, and homomorphic encryption, which aim to protect PII while still allowing ML data scientists and applications to access and analyze the data they need. However, there is still a market need for solutions deployed across the ML lifecycle (before and after model training) to protect PII while accessing vast datasets – without drastically changing the methodology and hardware used today.

Ensuring privacy and security in the modern ML lifecycle

The new breed of ML data consumers needs to implement privacy measures at both stages of the ML lifecycle: ML training and ML deployment (or inference).

In the training phase, the primary objective is to use existing examples to train a model.

The trained model must make accurate predictions, such as classifying data samples it did not see as part of the training dataset. The data samples used for training often have sensitive information (such as PII) entangled in each data record. When this is the case, modern privacy-preserving techniques and controls are needed to protect sensitive information.

In the ML deployment phase, the trained model makes predictions on new data that the model did not see during training: inference data. While it is critical to ensure that any PII used to train the ML model is protected and the model’s predictions do not reveal any sensitive information about individuals, it is equally critical to protect any sensitive information and PII within the inference data samples. Inferencing on encrypted data is prohibitively slow for most applications, even with custom hardware. As such, there is a critical need for viable, low-overhead privacy solutions to ensure data confidentiality throughout the ML lifecycle.
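One low-overhead illustration of the inference-side problem (not the specific solution discussed later in this article) is to scrub obvious PII from inference inputs before they ever reach the model. The sketch below uses simplistic, hypothetical regex patterns; production systems typically rely on trained entity recognizers.

```python
# A hedged sketch: regex-based redaction of obvious PII from inference inputs.
# Patterns here are illustrative only; names and free-form PII need NER-based detection.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched PII spans with a typed placeholder before sending text to a model."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

request = "Customer Jane Doe (jane.doe@example.com, 555-123-4567) reports a billing issue."
print(redact(request))
# -> "Customer Jane Doe ([EMAIL], [PHONE]) reports a billing issue."
```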

The modern privacy toolkit for ML and AI: Benefits and drawbacks

Various modern solutions have been developed to address PII challenges, such as federated learning, confidential computing, and synthetic data, which the new class of data consumers is exploring for Privacy in ML and AI. However, each solution has differing levels of efficacy and implementation complexities to satisfy user requirements.

Federated learning

Federated learning is a machine learning technique that enables training on a decentralized dataset distributed across multiple devices. Instead of sending data to a central server for processing, the training occurs locally on each device, and only model updates are transmitted to a central server.
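For illustration only, here is a minimal federated-averaging sketch in NumPy, with hypothetical toy data standing in for each device’s private dataset. It shows the core mechanic: local training, then averaging only the returned model weights on a central server.

```python
# A minimal federated-averaging (FedAvg) sketch with NumPy: each "device" trains a
# linear model locally, and only the model weights, never the raw data, are shared.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

def local_sgd(w, X, y, lr=0.1, epochs=20):
    """Plain gradient descent on one device's private data."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Three devices, each holding its own private dataset.
devices = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    devices.append((X, y))

global_w = np.zeros(2)
for _ in range(10):
    # Each device starts from the current global model and trains locally.
    local_updates = [local_sgd(global_w.copy(), X, y) for X, y in devices]
    # The server only averages the returned weights (the "model updates").
    global_w = np.mean(local_updates, axis=0)

print("learned weights:", global_w)  # should approach [2.0, -1.0]
```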

Limitation: Research published in 2020 by the Institute of Electrical and Electronics Engineers (IEEE) shows that an attacker could infer private information from model parameters in federated learning. Additionally, federated learning does not address the inference stage, which still exposes data to the ML model during cloud or edge device deployment.

Differential privacy

Differential privacy bounds how much any single data record in a training dataset can contribute to a machine-learning model. The guarantee is that if a single data record is removed from the dataset, the output should not change beyond a certain threshold, which limits what a membership test on the training records can reveal.
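To make the idea concrete, here is a minimal sketch of the classic Laplace mechanism applied to a simple counting query, a narrower setting than the differentially private model training discussed here; the epsilon value and data are illustrative only.

```python
# A hedged sketch of the Laplace mechanism: answer a counting query with noise
# calibrated to sensitivity/epsilon, so removing any one record changes the
# answer distribution only within the epsilon budget.
import numpy as np

rng = np.random.default_rng(42)

def dp_count(records, predicate, epsilon=0.5):
    """Differentially private count. The sensitivity of a count query is 1."""
    true_count = sum(1 for r in records if predicate(r))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [23, 35, 41, 52, 29, 61, 47, 33]
print("noisy count of records with age > 40:",
      dp_count(ages, lambda age: age > 40, epsilon=0.5))
```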

Limitation: While training with differential privacy has benefits, it still requires the data scientist to have access to large volumes of plain-text data. Additionally, it does not address the ML inference stage in any capacity. 

Homomorphic encryption

Homomorphic encryption is a type of encryption that allows computation to be performed on data while it remains encrypted. For modern users, this means that machine learning algorithms can operate on data that has been encrypted without the need to decrypt it first. This can provide greater privacy and security for sensitive data since the data never needs to be revealed in plain text form. 
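As a rough illustration, the sketch below uses the open-source python-paillier package (phe), which implements the additively homomorphic Paillier scheme; this tooling choice is an assumption for the example rather than anything named in this article, and fully homomorphic schemes suitable for ML are far more involved.

```python
# A hedged sketch using the open-source python-paillier package ("phe"), which
# implements the additively homomorphic Paillier scheme (partially, not fully,
# homomorphic). Assumes `pip install phe`.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# A client encrypts its sensitive values before sharing them.
salaries = [52_000, 61_500, 47_250]
encrypted = [public_key.encrypt(s) for s in salaries]

# A server can add ciphertexts and scale them without ever seeing plaintext.
encrypted_total = encrypted[0] + encrypted[1] + encrypted[2]
encrypted_mean = encrypted_total * (1 / len(salaries))

# Only the key holder can decrypt the result.
print("mean salary:", private_key.decrypt(encrypted_mean))
```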

Limitation: Homomorphic encryption is prohibitively costly because it operates on encrypted rather than plain-text data, which is computationally intensive. It often requires custom hardware to optimize performance, which can be expensive to develop and maintain. Finally, data scientists in many domains rely on deep neural networks, which are often difficult or impossible to implement in a homomorphically encrypted fashion.

Synthetic data

Synthetic data is computer-generated data that mimics real-world data. It is often used to train machine learning models and protect sensitive data in healthcare and finance. Synthetic data can generate large amounts of data quickly and bypass privacy risks. 
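Here is a deliberately simple sketch of the idea, assuming a Gaussian approximation is acceptable for the data in question; production-grade synthetic data generators use far richer models (GANs, copulas, diffusion models) to capture real-world structure.

```python
# A minimal sketch of synthetic tabular data: fit a multivariate Gaussian to the
# real data and sample new records from it. This only shows the basic idea.
import numpy as np

rng = np.random.default_rng(7)

# Stand-in for a small "real" dataset: two correlated numeric features.
real = rng.multivariate_normal(mean=[40.0, 65_000.0],
                               cov=[[25.0, 4_000.0], [4_000.0, 9e6]],
                               size=200)

# Fit simple summary statistics, then sample as many synthetic rows as needed.
mean = real.mean(axis=0)
cov = np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mean, cov, size=1_000)

print("real mean:     ", mean.round(1))
print("synthetic mean:", synthetic.mean(axis=0).round(1))
```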

Limitation: While synthetic data may help train a predictive model, it adequately covers only some of the possible real-world data subspaces. This can result in accuracy loss and undermine the model’s capabilities in the inference stage. Also, actual data must still be protected in the inference stage, which synthetic data cannot address. 

Confidential computing

Confidential computing is a security approach that protects data during use. Major companies, including Google, Intel, Meta, and Microsoft, have joined the Confidential Computing Consortium to promote hardware-based Trusted Execution Environments (TEEs). The solution isolates computations to these hardware-based TEEs to safeguard the data. 

Limitation: Confidential computing requires companies to incur additional costs to move their ML-based services to platforms that require specialized hardware. The solution is also not entirely risk-free. An attack in May 2021 collected and corrupted data from TEEs that rely on Intel SGX technology.

While these solutions are helpful, their limitations become apparent when training and deploying AI models. The next stage in PII privacy needs to be lightweight and complement existing privacy measures and processes while providing access to datasets entangled with sensitive information. 

Balancing the tightrope of PII confidentiality with AI: A new class of PII protection 

We’ve examined some modern approaches to safeguard PII and the challenges the new class of data consumers faces. There is a balancing act in which PII can’t be exposed to AI, but the data consumers must use as much data as possible to generate new AI use cases and value. Also, most modern solutions address data protection during the ML training stage without a viable answer for safeguarding real-world data during AI deployments.

Here, we need a future-proof solution to manage this balancing act. One such solution I have used is the stained glass transform, which enables organizations to extract ML insights from their data while protecting against the leakage of sensitive information. The technology developed by Protopia AI can transform any data type by identifying what AI models require, eliminating unnecessary information, and transforming the data as much as possible while retaining near-perfect accuracy. To safeguard users’ data while working on AI models, enterprises can choose stained glass transform to increase their ML training and deployment data to achieve better predictions and outcomes while worrying less about data exposure.  

More importantly, this technology also adds a new layer of protection throughout the ML lifecycle – for training and inference. This solves a significant gap in which privacy was left unresolved during the ML inference stage for most modern solutions.

The latest Gartner guide to AI TRiSM, for implementing Trust, Risk, and Security Management in AI, highlighted the same problem and solution. TRiSM guides analytics leaders and data scientists to ensure AI reliability, trustworthiness, and security. 

While there are multiple solutions to protect sensitive data, the end goal is to enable enterprises to leverage their data to the fullest to power AI.

Choosing the right solution(s) 

Choosing the right privacy-preserving solutions is essential for solving your ML and AI challenges. You must carefully evaluate each solution and select the ones that complement, augment, or stand alone to fulfil your unique requirements. For instance, synthetic data can enhance real-world data, improving the performance of your AI models. You can use synthetic data to simulate rare events that may be difficult to capture, such as natural disasters, and augment real-world data when it’s limited.

Another promising solution is confidential computing, which can transform data before entering the trusted execution environment. This technology is an additional barrier, minimizing the attack surface on a different axis. The solution ensures that plaintext data is not compromised, even if the TEE is breached. So, choose the right privacy-preserving solutions that fit your needs and maximize your AI’s performance without compromising data privacy.

Wrap up

Protecting sensitive data isn’t just a tech issue – it’s an enterprise-wide challenge. As new data consumers expand their AI and ML capabilities, securing Personally Identifiable Information (PII) becomes even more critical. To create high-performance models delivering honest value, we must maximize data access while safeguarding it. Every privacy-preserving solution must be carefully evaluated to solve our most pressing AI and ML challenges. Ultimately, we must remember that PII confidentiality is not just about compliance and legal obligations but about respecting and protecting the privacy and well-being of individuals.

Data Privacy, Data Science, Machine Learning

At UL Solutions, CIO Karriem Shakoor has identified clear cultural and architectural requirements for achieving data democratization so that IT can get out of the reports business and into driving revenue.

Recently, I had the chance to speak at length with Shakoor about data strategy at the global safety science company, which has over 15,000 employees in 40 countries. What follows is an edited version of our interview.

Martha Heller: How is software changing UL Solutions as a business?

Karriem Shakoor: UL Solutions’ ambition is to be our customers’ most trusted, science-based safety, security, and sustainability partner, which means that we need best-in-class technology infrastructure. For example, investing in industry-leading customer relationship management software lets us leverage the collective innovation of that software company’s entire customer base toward meeting our own transformation goals, rather than starting from scratch. That allows our sales teams to run and track their activities with feature-rich and fully integrated processes.

But the software tools are only as powerful as our ability to create a consistent view of our customer base. We can digitize our services and enable their appropriate pricing and configuration for a customer, but to fully leverage the software investment, we also need reliable, accurate customer and account data to support direct marketing, lead generation, and personalization.

What are the steps toward having a data strategy that fully leverages the software?

Good governance is a must if you want to harness the full power of data for new products and services and achieve data democratization.

Every company must be intentional about governing and proactively managing the quantities of data it creates each year, using effective standards and quality rules. If not, they risk diluting the value they can derive, and slowing decision-making.

Email offers a simple example. In order to use email marketing to engage customers, it’s critical to build an accurate and trusted repository of email addresses. Without enforcing a convention for how those addresses are formatted and ensuring that the systems that record those addresses — whether manually or using automation — conform to that convention, you jeopardize the usability of key data.
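To make such a convention concrete, here is a minimal, hypothetical sketch (not UL Solutions’ actual implementation) of normalizing and validating email addresses before they are written to a customer repository; the validation rule and field handling are illustrative assumptions.

```python
# A hedged sketch of enforcing one formatting convention for email addresses
# before they land in a CRM or marketing repository. The validation rule is
# deliberately simple; production systems usually also verify deliverability.
import re

EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")

def normalize_email(raw: str) -> str | None:
    """Trim whitespace, lowercase, and reject values that don't look like an address."""
    candidate = raw.strip().lower()
    return candidate if EMAIL_RE.match(candidate) else None

records = ["  Jane.Doe@Example.COM ", "li.wang@example", "ops@sub.example.org"]
cleaned = [normalize_email(r) for r in records]
print(cleaned)  # ['jane.doe@example.com', None, 'ops@sub.example.org']
```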

What is data democratization and why is it important?

Democratizing data empowers stakeholders to access and use that data to answer questions on their own without working through an IT broker. For example, a stakeholder should be able to run a report without having to request that IT pull the information. After IT certifies datasets that meet validated stakeholder needs and makes them available internally, end users can draw from those datasets on demand, speeding stakeholder decision-making and getting IT out of the business of running reports.

In addition to standards and governance, what else does an organization need for data democratization?

Effective data democratization requires a data management culture that empowers business stakeholders to define how certain information will be used. This also means holding people accountable for using that information appropriately and subject to good governance.

Data democratization also requires subject matter experts inside business units and functions who understand data analytics and reporting. IT alone simply cannot drive successful data democratization.

What is your architectural strategy for enabling the democratization of data?

There really is no single best architectural design. You need adherence to strong data governance and consistent practices for defining your data and mastering it with the right tools to achieve a standard, concise view of key data types across your business.

What is the CIO’s role in leading data strategy?

An effective data strategy must connect to a business imperative. Every CIO needs to understand the company’s multi-year strategy and desired outcomes, and the data-related capabilities necessary to drive those outcomes.

At UL Solutions, tapping into our data to build a deeper understanding of customer needs and buying behaviors can help expand our relationships with existing customers.

What advice do you have for CIOs on developing a culture of data democratization?

Start with a clear strategic intention. Connecting our data democratization proposals to the company’s business strategy went a long way toward helping our executive team appreciate why we prioritized building a single, consistent view of our customers. This approach really helped generate enthusiasm and build the commitment we needed.

I also recommend that CIOs resist trying to execute a data strategy with their IT teams alone. In any company, there are at least three different groups outside of IT that think about your key data every day. For example, pricing managers, product managers, and inside sales teams need to buy into the data strategy, and so do your executive peers. You need your chief revenue or chief commercial officers sitting right next to you, championing the importance of data governance and quality.

Finally, understand that your most important work as CIO is to bring the right data leadership into the IT organization. You cannot wait to be asked to build out a data team; as CIO, you have to be one step ahead.

Data Management, IT Leadership

KPN, the largest infrastructure provider in the Netherlands, offers a high-performance fixed-line and mobile network in addition to enterprise-class IT infrastructure and a wide range of cloud offerings, including Infrastructure-as-a-Service (IaaS) and Security-as-a-Service. Drawing on its extensive track record of success providing VMware Cloud Verified services and solutions, KPN is now one of a distinguished group of providers to have earned the VMware Sovereign Cloud distinction.

“With the exceptionally strong, high-performance network we offer, this is truly a sovereign cloud. Government agencies, healthcare companies, and organizations with highly sensitive and confidential data can confidently comply with industry-specific regulations such as GDPR, Government Information Security Baseline, Royal Netherlands Standardization Institute, and the Network and Information Security Directive,” said Babak Fouladi, Chief Technology & Digital Officer and Member of the Board of Management at KPN. “KPN places data and applications in a virtual private cloud that is controlled, tested, managed, and secured in the Netherlands, without third-party interference.”

KPN’s sovereign cloud, CloudNL, reflects a rapidly changing landscape in which many companies need to move data to a sovereign cloud. Reasons why include a dramatic increase in remote or hybrid work, evolving geopolitical events and threats, and fast-changing international regulations.

“The more you digitize an enterprise, the greater the variety of data and applications you must manage,” says Fouladi. “Each requires the right cloud environment based on the required security level, efficiency, and ease of use. On the one hand, this might include confidential customer information that requires extra protection, and which must remain within the nation’s boundaries. Just as importantly, the information must never be exposed to any foreign nationals at any time. On the other hand, you have workloads that are entirely appropriate for the public cloud and benefit from the economy and scale the cloud offers.”

Fouladi stresses that this is why so many organizations are embracing a multi-cloud strategy. It’s a strategy he believes is fundamentally enriched with a sovereign cloud.

Based on VMware technologies, CloudNL is designed to satisfy the highest security requirements and features stringent guarantees verified through independent audits. All data and applications are stored in KPN’s data centers within the Netherlands – all of which are operated and maintained by fully-vetted citizens of the Netherlands.

ValidSign, a KPN CloudNL customer, is a rapidly growing provider of cloud-based solutions that automate document signings. ValidSign’s CEO John Lageman notes that the company’s use of a fully sovereign cloud in Holland is particularly important for the security-minded organizations the company serves, among them notaries, law firms, and government institutions.

“The documents, permits, and contracts that we sign must remain guaranteed in the Netherlands,” says Lageman. “Digitally and legally signing and using certificates used to be very expensive. Moving to the cloud was the solution, but not with an American cloud provider – our customers would no longer be sure where the data would be stored or who could have access to it. With CloudNL they have that control.”

The Bottom Line

There are many reasons to move data to a sovereign cloud, among them an increase in remote or hybrid work, changing geopolitical events, or fast-changing international regulations. KPN CloudNL empowers enterprises to handle these challenges with ease by incorporating sovereign cloud into their multi-cloud strategy.

Learn more about KPN CloudNL here and its partnership with VMware here.

Cloud Computing

Organizations that are investing in analytics, artificial intelligence (AI), and other data-driven initiatives have exposed a growing challenge: a lack of integration across data sources that is limiting their ability to extract true value from these investments. It’s imperative for IT and business leaders to eliminate these data silos – some of which are operational, and some of which are cultural – to enable better insights for the business.

“The main challenge is how can companies extract data out of every silo and make it more meaningful,” says Manoj Palaniswamy, Principal Architect, Data & AI Services at Kyndryl. “They need to bring it into an environment where it can be used for analytics, reporting, AI, and machine learning.”

Increasingly, organizations are finding the cloud serves as the best environment for integrating, storing, managing, and analyzing large volumes of disparate data types. Cloud hyperscalers such as AWS offer a level of scale and performance that’s impossible to replicate in an on-premises environment. AWS also offers advanced services for analytics and AI/ML that in-house teams may not have the resources or expertise to develop themselves.

“With the cloud and its unlimited compute and storage, it is easier to collect and process structured and unstructured data, query multiple data types, and unlock insights from the data,” says Palaniswamy. Cloud environments can help companies scale for analytics, reporting, and AI/ML, while also reducing complexity – and costs – in IT operations by having a single platform to manage rather than multiple, siloed systems.

Most organizations and their leadership teams understand the value of data and are taking steps to implement a modern data strategy. Some are still at the early stages of the process, defining the data strategy and determining which data or workloads should be moved to the cloud, and which should remain on-premises. Others are further along and are now looking to capture more value from their data projects, or scale initiatives across the business. The most data-mature organizations are running operations from the cloud, in some cases deploying a managed services model for a scalable data infrastructure.

No matter where a company is in their data journey, partnering with a trusted and experienced partner is key. Kyndryl provides end-to-end services to consult, modernize, migrate, secure, and manage critical business applications and their data. “For the past 30-plus years, we have been designing, building, and managing mission-critical IT environments for Fortune 500 companies,” says Palaniswamy. “Customers trust us.” Kyndryl brings more than 5,000 certified resources and an integrated portfolio of services and technologies across practices such as the cloud, digital workplaces, applications, data, AI security, networking, and edge computing.

Learn more about how Kyndryl and AWS can help companies extract more value from their data systems.

Analytics

SAP’s Data Warehouse Cloud is evolving, gaining new features and a new name, Datasphere, as the company addresses the continued diversification of enterprise data.

It’s part of SAP’s move to become a more significant player in the business data fabric space, said Irfan Khan, SAP’s chief product officer for its HANA database and analytics.

Khan said SAP is going beyond the usual capabilities of a data fabric by preserving the business context of the data it carries. “We want to preserve the business semantics and the business context of that data,” he said. “We’re not going to have customers make a compromise between accessing the data virtually or federating the data.”

Threat recognition

The competitive threat SAP faces in this space, said IDC analyst Dan Vesset, is that the data landscape is becoming more diverse.

“You have SAP applications and you have more and more of somebody else’s applications in the same environment, and the question then is, where’s the center of gravity? Who has the most pull?” he said.

Khan acknowledged the threat is influencing SAP’s product development.

“A significant part of SAP’s evolution towards this new strategy is recognizing that no single vendor will own the entire customer stack,” he said. “That customer stack is in fact very heterogeneous.”

In the past, SAP and other vendors have assumed that if they create a new product, customers will move their data to it — but that has not always worked out well for either vendors or customers.

SAP is no longer taking a hard line on moving data to its applications, Khan said.

“If you’re running a marketing campaign, more likely you’ll need to have access to SAP data,” he said. “But it just makes it a lot easier to have access to the SAP context through the business data fabric, through Datasphere, without having to redundantly move the data, lose the context, lose semantics, and then have to go to the painful exercise of having to reconstitute all that again.”

The new functions Datasphere offers over Data Warehouse Cloud include automated data cataloging, simplified data replication, and improved data modeling.

The move from Data Warehouse Cloud to Datasphere will be easy, according to Khan: Existing customers will automatically have access to the new functionality and will be charged for usage under their regular SAP consumption agreement. “There’s nothing more to pay if you don’t use it,” he said.

SAP is also opening Datasphere up to partners to add new functionality and make it easier to access data from other platforms.

“We want to make it very easy for SAP’s data to be accessed and to be extended with business context through Datasphere,” he said. “But we’ll also use our new data ecosystem participants.”

Partner agreements

Four partners are signed up to offer Datasphere integrations at launch. Collibra plans to offer enterprises a way to build a complete catalog, with lineage, of all their SAP and non-SAP data. Confluent will connect its cloud-native data streaming platform to Datasphere, making it possible to connect SAP and external applications in real time. Plus, Databricks is making it possible for users of its data lakehouse to integrate it with SAP applications, preserving semantics when data are shared. And DataRobot is helping customers to build automated machine learning capabilities on Datasphere.

SAP has chosen its initial partners to cover a broad range of functions with little overlap between them, noted Vesset. But that doesn’t mean there’s no overlap with Datasphere itself.

Take Datasphere’s new data cataloging and governance functions, for instance. “Theoretically, one could just use Collibra,” Vesset said.

However, he said, SAP will have greater knowledge of the metadata associated with the data held in its applications that the catalog is supposed to capture, and it can invest more in integration with its partners. “If you’re an SAP ERP customer, or if you have multiple SAP enterprise applications, probably the easier path would be to use SAP’s product first, and then use something like Collibra for other data that’s not SAP,” he said.

Four partners are a start, but to make a success of this new strategy to help customers integrate SAP and non-SAP applications into their data fabric, “They absolutely need more, because they need to get to where their clients are, and their clients will have many different tools,” he said. SAP is offering nothing enterprises can’t find elsewhere — at a price. “You can build any of these tools yourself from open-source technologies, but there’s a cost associated with that. That’s the build, buy, or partner decision that every large organization needs to make,” Vesset said. “SAP is hoping that their solution will provide enough efficiency and cost savings for clients to come to them.”

Cloud Architecture, Cloud Management, Cloud Storage, SAP