Industries increasingly rely on data and AI to enhance processes and decision-making. However, they face a significant challenge in ensuring privacy because most enterprise datasets contain sensitive Personally Identifiable Information (PII). Safeguarding PII is not a new problem: in conventional IT and data teams, many users query data containing PII, but only a select few require direct access to it. Rate limiting, role-based access control, and masking have been widely adopted in traditional BI applications to govern access to sensitive data.
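
As a concrete illustration of these traditional controls, here is a minimal Python sketch of column-level masking and pseudonymization; the column names, salt, and masking rules are illustrative assumptions rather than a production design.

```python
# A minimal sketch of column-level masking, the kind of control a BI
# governance layer might apply before exposing a table to analysts.
# Column names, the salt, and the masking rules are illustrative.
import hashlib

import pandas as pd

def mask_email(email: str) -> str:
    """Keep the domain for analytics; redact the local part."""
    domain = email.partition("@")[2]
    return f"***@{domain}" if domain else "***"

def pseudonymize(value: str, salt: str = "demo-salt") -> str:
    """Replace a direct identifier with a stable salted hash."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

customers = pd.DataFrame({
    "ssn": ["123-45-6789", "987-65-4321"],
    "email": ["alice@example.com", "bob@example.org"],
    "purchase_total": [120.50, 89.99],
})

masked = customers.assign(
    ssn=customers["ssn"].map(pseudonymize),    # stable token, still joinable
    email=customers["email"].map(mask_email),  # partially redacted
)
print(masked)
```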

Protecting sensitive data in the modern AI/ML pipeline has different requirements. The emerging and ever-growing class of data users consists of ML data scientists and applications that require far larger datasets. Data owners must walk a tightrope to ensure every party in their AI/ML lifecycle gets appropriate access to the data they need while maximizing the privacy of that PII.

Enter the new class 

ML data scientists require large quantities of data to train machine learning models. The trained models then become consumers of vast amounts of data themselves, generating insights that inform business decisions. Whether before or after model training, this new class of data consumers relies on the availability of large amounts of data to provide business value.

In contrast to conventional users who only need to access limited amounts of data, the new class of ML data scientists and applications requires access to entire datasets to ensure that their models represent the data faithfully. And even when traditional controls such as encryption or masking are applied, they may not be enough to prevent an attacker from inferring sensitive information by analyzing patterns in the encrypted or masked data.

The new class often uses advanced techniques such as deep learning, natural language processing, and computer vision to analyze data and extract insights. These efforts are often slowed or blocked when they encounter sensitive PII entangled within a large proportion of the datasets they require; up to 44% of an organization’s data is reported to be inaccessible. This limitation blocks the road to AI’s promised land of new, game-changing value, efficiencies, and use cases.

The new requirements have led to the emergence of techniques such as differential privacy, federated learning, synthetic data, and homomorphic encryption, which aim to protect PII while still allowing ML data scientists and applications to access and analyze the data they need. However, there is still a market need for solutions deployed across the ML lifecycle (before and after model training) to protect PII while accessing vast datasets – without drastically changing the methodology and hardware used today.

Ensuring privacy and security in the modern ML lifecycle

The new breed of ML data consumers needs to implement privacy measures at both stages of the ML lifecycle: ML training and ML deployment (or inference).

In the training phase, the primary objective is to use existing examples to train a model. The trained model must then make accurate predictions, such as classifying data samples it did not see during training. The data samples used for training often have sensitive information, such as PII, entangled in each record. When this is the case, modern privacy-preserving techniques and controls are needed to protect that sensitive information.

In the ML deployment phase, the trained model makes predictions on new data it did not see during training, known as inference data. It is critical to ensure that any PII used to train the ML model is protected and that the model’s predictions do not reveal sensitive information about individuals, but it is equally critical to protect sensitive information and PII within the inference data samples themselves. Inferencing on encrypted data is prohibitively slow for most applications, even with custom hardware. As such, there is a critical need for viable low-overhead privacy solutions that ensure data confidentiality throughout the ML lifecycle.

The modern privacy toolkit for ML and AI: Benefits and drawbacks

Various modern solutions have been developed to address PII challenges, such as federated learning, confidential computing, and synthetic data, which the new class of data consumers is exploring for privacy in ML and AI. However, each solution offers a different level of efficacy and implementation complexity in satisfying user requirements.

Federated learning

Federated learning is a machine learning technique that enables training on a decentralized dataset distributed across multiple devices. Instead of sending data to a central server for processing, the training occurs locally on each device, and only model updates are transmitted to a central server.
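
A rough sketch of the mechanics, using plain NumPy and a linear model trained with federated averaging (FedAvg) across three simulated clients; the client data, learning rate, and round count are illustrative assumptions, and real systems add secure aggregation, client sampling, and a communication layer.

```python
# Federated averaging (FedAvg) sketch: each client trains locally on
# its private shard; the server only ever sees model updates.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Run a few gradient steps on one client's private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

# Three clients, each holding a private shard of (X, y).
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(20):  # communication rounds
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(updates, axis=0)  # server averages the updates

print("recovered weights:", global_w)  # approaches [2.0, -1.0]
```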

Limitation: Research published in 2020 by the Institute of Electrical and Electronics Engineers (IEEE) shows that an attacker can infer private information from the model parameters exchanged in federated learning. Additionally, federated learning does not address the inference stage, which still exposes data to the ML model when it is deployed in the cloud or on edge devices.

Differential privacy

Differential privacy bounds how much any single data record from a training dataset can contribute to a machine learning model. It guarantees that if a single record is added to or removed from the dataset, the model’s output should not change beyond a certain threshold, which limits what a membership inference test on the training records can reveal.
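
To make the guarantee concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query, whose sensitivity is 1 because adding or removing any single record changes the true count by at most 1; the epsilon value and dataset are illustrative assumptions, and DP model training (e.g., DP-SGD) extends the same principle to clipped per-example gradients.

```python
# Laplace mechanism sketch: noise scaled to sensitivity/epsilon hides
# any single individual's presence in the dataset.
import numpy as np

rng = np.random.default_rng(42)

def dp_count(records, predicate, epsilon=0.5):
    """Differentially private count; the sensitivity of a count is 1."""
    true_count = sum(predicate(r) for r in records)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [34, 45, 29, 61, 52, 38, 47, 55]
print("noisy count of ages > 40:", dp_count(ages, lambda a: a > 40))
```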

Limitation: While training with differential privacy has benefits, it still requires data scientists to access large volumes of plaintext data. Additionally, it does not address the ML inference stage in any capacity.

Homomorphic encryption

Homomorphic encryption is a type of encryption that allows computation to be performed on data while it remains encrypted. For modern users, this means that machine learning algorithms can operate on encrypted data without decrypting it first. This provides greater privacy and security for sensitive data, since the data never needs to be revealed in plaintext.
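
For intuition, here is a toy Paillier cryptosystem showing additive homomorphism: multiplying two ciphertexts produces a ciphertext of the sum of the plaintexts. The tiny hard-coded primes are purely illustrative; production systems use keys of 2,048 bits or more through a vetted library, never a hand-rolled sketch like this.

```python
# Toy Paillier cryptosystem (requires Python 3.9+ for math.lcm).
# INSECURE demo parameters; real keys are 2048+ bits.
import math
import random

p, q = 61, 53              # toy primes, for illustration only
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)       # valid because g = n + 1

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

m1, m2 = 42, 17
c_sum = (encrypt(m1) * encrypt(m2)) % n2  # multiply ciphertexts...
print(decrypt(c_sum))                     # ...to add plaintexts: 59
```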

Limitation: Homomorphic encryption is prohibitively costly because operating on encrypted rather than plaintext data is computationally intensive. It often requires custom hardware to achieve acceptable performance, which can be expensive to develop and maintain. Finally, the deep neural networks that data scientists rely on in many domains are often difficult or impossible to implement in a homomorphically encrypted fashion.

Synthetic data

Synthetic data is computer-generated data that mimics real-world data. It is often used to train machine learning models and to protect sensitive data in fields such as healthcare and finance. Synthetic data can be generated quickly in large volumes and sidesteps many privacy risks.
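
As a minimal sketch of the idea, the following example fits a simple generative model, a multivariate Gaussian, to stand-in “real” records and samples synthetic ones that preserve aggregate statistics; the columns and distributions are illustrative assumptions, and production generators use far richer models (GANs, copulas, diffusion models) with explicit checks against memorizing individual records.

```python
# Synthetic data sketch: fit a generative model to real records, then
# sample new records that mimic the aggregate structure.
import numpy as np

rng = np.random.default_rng(7)

# Stand-in for a sensitive real dataset: (age, income) pairs.
real = np.column_stack([
    rng.normal(45, 10, size=1000),        # age
    rng.lognormal(10.8, 0.4, size=1000),  # income
])

mean, cov = real.mean(axis=0), np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mean, cov, size=1000)

print("real means:     ", real.mean(axis=0).round(1))
print("synthetic means:", synthetic.mean(axis=0).round(1))
```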

Limitation: While synthetic data may help train a predictive model, it covers only some of the possible real-world data subspaces. This can result in accuracy loss and undermine the model’s capabilities in the inference stage. Moreover, the actual data used at inference time must still be protected, which synthetic data cannot address.

Confidential computing

Confidential computing is a security approach that protects data during use. Major companies, including Google, Intel, Meta, and Microsoft, have joined the Confidential Computing Consortium to promote hardware-based Trusted Execution Environments (TEEs). The solution isolates computations to these hardware-based TEEs to safeguard the data. 

Limitation: Confidential computing requires companies to incur additional costs to move their ML-based services to platforms that demand specialized hardware. Nor is the solution entirely risk-free: an attack in May 2021 collected and corrupted data from TEEs that rely on Intel SGX technology.

While these solutions are helpful, their limitations become apparent when training and deploying AI models. The next stage in PII privacy needs to be lightweight and complement existing privacy measures and processes while providing access to datasets entangled with sensitive information. 

Walking the tightrope of PII confidentiality with AI: A new class of PII protection

We’ve examined some modern approaches to safeguarding PII and the challenges the new class of data consumers faces. There is a balancing act: PII can’t be exposed to AI, yet data consumers must use as much data as possible to generate new AI use cases and value. Moreover, most modern solutions address data protection during the ML training stage without a viable answer for safeguarding real-world data during AI deployments.

Here, we need a future-proof solution to manage this balancing act. One such solution I have used is the stained glass transform, which enables organizations to extract ML insights from their data while protecting against the leakage of sensitive information. The technology, developed by Protopia AI, can transform any data type by identifying what the AI model requires, eliminating unnecessary information, and transforming the data as much as possible while retaining near-perfect accuracy. To safeguard users’ data while working on AI models, enterprises can apply stained glass transforms to their ML training and deployment data, achieving better predictions and outcomes while worrying less about data exposure.

More importantly, this technology adds a new layer of protection throughout the ML lifecycle, for both training and inference. This closes a significant gap: most modern solutions leave privacy unresolved during the ML inference stage.

The latest Gartner AI TRiSM guide for implementing Trust, Risk, and Security Management in AI highlights the same problem and solution. AI TRiSM guides analytics leaders and data scientists in ensuring AI reliability, trustworthiness, and security.

While there are multiple solutions to protect sensitive data, the end goal is to enable enterprises to leverage their data to the fullest to power AI.

Choosing the right solution(s) 

Choosing the right privacy-preserving solutions is essential for solving your ML and AI challenges. Carefully evaluate each solution and select the ones that complement each other, augment your existing controls, or stand alone to fulfill your unique requirements. For instance, synthetic data can enhance real-world data, improving the performance of your AI models. You can use synthetic data to simulate rare events that may be difficult to capture, such as natural disasters, and to augment real-world data where it is limited.

Another promising approach is to combine confidential computing with technologies that transform data before it enters the trusted execution environment. The transformation acts as an additional barrier, minimizing the attack surface on a different axis and ensuring that plaintext data is not compromised even if the TEE is breached. So, choose the privacy-preserving solutions that fit your needs and maximize your AI’s performance without compromising data privacy.

Wrap up

Protecting sensitive data isn’t just a tech issue; it’s an enterprise-wide challenge. As new data consumers expand their AI and ML capabilities, securing Personally Identifiable Information (PII) becomes even more critical. To create high-performance models that deliver real value, we must maximize data access while safeguarding the data itself. Every privacy-preserving solution must be carefully evaluated against our most pressing AI and ML challenges. Ultimately, we must remember that PII confidentiality is not just about compliance and legal obligations; it is about respecting and protecting the privacy and well-being of individuals.


In addition to showcasing your executive experience and accomplishments, effective and targeted personal branding can demonstrate thought leadership and expertise within specific domain areas, as well as make a statement about your core values, character, and attitude. It can also help you move roles, whether from an operational “keep the lights on” CIO position to a more forward-looking innovative one (or vice versa), or even a CDO, COO or CEO role.

There’s a financial component, too. The Thinkers360 2023 B2B Thought Leadership Outlook study, conducted in association with the British Computer Society (BCS), found that over 86% of thought leadership creators rate their content as adding over 25% to the brand premium they command in the marketplace, and over 48% said it added over 75%.

So no matter where you are in your personal branding journey, here are 10 best practices to help you maximize your personal brand, both in the near term and throughout your career.

Determine your commitment to personal branding – This is the “why” of your personal brand. What do you want your legacy to be? What do you want to be known for? Think about your personal branding goals for this year, but also where you want to be in up to 10 years’ time. It’s fine to adjust your personal brand as well. For example, if you’re known for your expertise in emerging technologies, it makes sense to keep your brand up to date with the latest trends (while being careful not to spread yourself too thin attempting to cover too many topics).

Pick your thought leadership persona – As a CIO, your primary persona is likely that of an executive, but think about other thought leadership personas that can help to amplify your primary persona. This might be as an author, influencer or speaker, for example, from your perspective as a CIO. If you’re uncomfortable with keynote speaking, you can be just as effective as a panelist at industry events and conferences, or on the receiving end of media interviews. The most important thing is to choose a persona that’s authentic to your personality and something you enjoy doing.

Pick your area of expertise – Once you’ve chosen your thought leadership persona, you’ll want to think about the area of expertise you’d like to anchor to your personal brand. This might be your CIO role itself, or even a specific technology or leadership discipline such as artificial intelligence, machine learning or change management. For example, Claire Rutkowski, CIO of infrastructure engineering software company Bentley Systems, gives advice from her perspective with actionable insights such as her experience with ProSci’s ADKAR model, which can be useful for change enablement.

Start small – If you’re new to thought leadership and wish to add this aspect to your personal brand, take a ‘land and expand’ approach: begin with an article or blog, a media interview, a speaking slot at an industry event or conference, or even by entering some suitable industry awards. This all builds credibility, adds to your personal profile, portfolio, and media kit, and can help land your next “win,” such as a book, a keynote, or a major award like the CIO 100 Awards. When selecting any of these outlets, choose wisely, since your personal brand will be shaped by the brands you associate with.

Amplify your personal brand – The Thinkers360 study found that specialist communities were the number-one destination for readers seeking thought leadership content, and a top-three destination for thought leaders to disseminate their content, after social media and individual websites. Depending on the business model, these specialist communities can often help you build, amplify, and monetize your personal brand as well.

Use your career journey to tell your brand story – Your life experiences and career journey all tell a story about your personal brand. Think about the various career moves you’ve made over the years, the rationale for each move, and how this helps shape the narrative about your personal brand. This may help influence your next move, too.

Round out your competency over time – Once you’ve become a world-class author, influencer, or speaker (no small feat in itself), the next step is to round out your skills so you’re even more versatile. Gartner encourages this among their analysts and advisors, so they develop their skills not only in terms of one-on-one advising and writing research reports, but also in public speaking in front of both small private groups and large audiences at conferences. This helps to develop skills to best connect with your audience regardless of the context.

Use your personal branding to promote your organization – As a CIO, you can be an excellent employee advocate for your own organization, and many CIOs do this to a greater or lesser extent based on personal preference. This may involve piloting solutions internally before they’re released to the public and helping with internal case studies. Many CIOs not only pilot internally but also hit the road with other members of the C-suite to meet key clients and share their experience.

Make your content insightful, engaging, and actionable – The Thinkers360 study found that thought leadership consumers cited insightful (94%), forward-looking (90%), engaging (89%), relevant (88%) and actionable (84%) as extremely important or very important attributes of thought leadership. In an era of increased competition for attention, thought leaders plan to cut through the noise by making their content highly actionable (73%), multichannel (59%), and shared via specialist communities (55%).

Treat your personal brand as your most valuable asset – According to Tom Koulopoulos, author of Revealing the Invisible, the great myth of the Internet is that if you have volumes of great content, you don’t need to worry about creating a thought leadership brand. That myth is no truer for a thought leadership brand than it is for a corporate brand. In many ways, you must be even more vigilant about how you present yourself to the market, prospects, and clients. His advice is to craft, curate, and care for your brand as though it were your most important asset, because it is.

As a CIO, you’ve put a lot of energy into advancing your organization and its mission. Putting some energy into your personal branding is well worth the effort, and it will benefit your organization, too.
