Automating Data Identification and Classification to Improve Workflow

Large and mid-sized companies have a data problem. Accumulating data to better understand consumer behavior has led to a surge of investment in data storage, but companies face a significant challenge today in figuring out what data they actually need and how to manage it.

Managing and organizing data is challenging for large companies with large departments, especially when data is siloed and an estimated 80% of all corporate data is unstructured. Companies typically hire teams of data managers, on-site or off-site, whose sole job is to ensure data is organized, stored, easily accessible, and secure from cyberthreats.

Data teams typically work independently of each other and are notoriously inefficient because companies simply have too much data. According to a report by Forrester, 75% of employees find it difficult to access the information they need through enterprise systems, and 62% report procrastinating on a task if it requires them to sign in to multiple systems.

For the many executives who probably don’t know where their company’s data is stored, hiring more data managers is not the answer to enterprise data issues. Instead, executives need to properly index all of their company’s data to unlock its true potential. Indexing reduces the burden of managing unstructured data, cuts the cost of storing redundant data, and lessens the company’s risk of data leaks or breaches.

Here’s how executives should view their data.

First, identify where the data is located. Whether data is spread across the U.S. or the world in massive data centers, at the edge, on on-site servers, in the cloud, on social media, or even on employees’ personal computers, identifying where the data is located is essential to regaining control.

Second, index all of the data across those locations so data management teams can access file metadata and content without delay, which also reduces human error. Enable your data managers to be the data stewards you need them to be by giving them access to data that can be searched, organized, copied, moved, deleted, and integrated with other applications. Data intelligence can give managers even more insight into how their data is used. Most importantly, ensure the indexed data can be audited to remove unknown “dark” data and redundant, obsolete, and trivial (ROT) files, which simply take up costly storage capacity.
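As a rough illustration of the indexing and audit step, here is a minimal Python sketch. It assumes the data sits on a single locally mounted file system, which is far simpler than the distributed locations described above. The function names (build_index, find_redundant, find_stale) and the age threshold are hypothetical, chosen for this example only. It records basic metadata and a content hash for each file, then flags byte-identical copies and files that have not been modified in a long time as candidates for review.

```python
import hashlib
import os
import time
from collections import defaultdict


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root):
    """Walk `root` and record path, size, modification time, and content hash."""
    index = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                stat = os.stat(path)
                digest = file_sha256(path)
            except OSError:
                continue  # unreadable or vanished file: leave it out of the index
            index.append({
                "path": path,
                "size": stat.st_size,
                "mtime": stat.st_mtime,
                "sha256": digest,
            })
    return index


def find_redundant(index):
    """Group paths whose contents are byte-identical (same SHA-256 digest)."""
    groups = defaultdict(list)
    for entry in index:
        groups[entry["sha256"]].append(entry["path"])
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def find_stale(index, max_age_days, now=None):
    """Return paths not modified within `max_age_days` (an assumed proxy for obsolete)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(entry["path"] for entry in index if entry["mtime"] < cutoff)
```

A script like this only produces candidates. Whether a duplicate or an old file is truly redundant or obsolete is still a decision for the data steward or for retention policy, and modification time alone is a crude signal.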

The overhead costs of data storage are astronomical, and many companies are entirely unaware of how much they spend. Simplifying the removal of dark data can reduce costs by approximately 60% through data and IT consolidation. Using actionable intelligence to move, archive, or purge data at the source can reduce data infrastructure and storage costs by an estimated 30% to 40%, a huge saving for companies whose data needs will only grow.

Third, classify your data to meet regulatory compliance obligations. California just passed Proposition 24, the California Privacy Rights Act of 2020 (CPRA), a ballot initiative that strengthens and clarifies data privacy rights under the California Consumer Privacy Act of 2018 (CCPA). The CPRA creates a new agency, the California Privacy Protection Agency, with the authority to develop legally binding regulations and to enforce the CCPA and CPRA more effectively.

If a company can’t comply with a user’s request to have their information deleted, transferred, or changed under the CCPA or CPRA, it is subject to large penalties and fines as well as private rights of action. While enforcement of the CPRA doesn’t start until 2023, companies need to get their data act together now, and the only way to do that is to modernize their approach to data.

Data classification for legal compliance will become a major challenge for companies that aren’t even sure what data they have or where it is located. Today, most companies that classify their information do so manually, which leaves the process open to human error. Companies can modernize their data storage with automatic classification programs to minimize the human risk factor. Automating data governance lowers the risks associated with data sovereignty, compliance, privacy, and cybersecurity.
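To make the idea of automatic classification concrete, here is a minimal rule-based sketch in Python. The patterns, the label names, and the classify_text function are assumptions made for this illustration. They are not legal definitions of personal information under the CCPA or CPRA, and production classification tools go well beyond a few regular expressions.

```python
import re

# Illustrative patterns only; real rules need tuning and legal review.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify_text(text):
    """Tag text with the pattern labels it matches and a coarse sensitivity level."""
    labels = sorted(label for label, pattern in PATTERNS.items() if pattern.search(text))
    sensitivity = "personal" if labels else "unclassified"
    return {"labels": labels, "sensitivity": sensitivity}
```

Applying the same rules to every file removes the inconsistency of manual tagging, although pattern matching will still produce false positives and misses that a reviewer needs to spot-check.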

By removing data bottlenecks, companies can improve their data team’s workflow, better understand their consumer data and its worth, protect the company from legal liability and cyber threats, and more.

Gary Lyng is the CMO of Aparavi, the leading data intelligence and automation software and services company that helps companies find and unlock the value of data – no matter where it lives. Aparavi’s cloud-based SaaS platform finds, automates, governs, and indexes distributed data. Aparavi ensures secure access for the modern data demands of analytics, machine learning, and collaboration.