QoreAudit: The mechanics of semantic contextualization

January 10, 2024 Alkis Papadopoullos

By Alkis Papadopoullos, CEO and CSO at Coginov

As privacy laws are enacted in more and more countries and regions, the ability to uncover relevant tags indicative of personal or sensitive data is becoming increasingly important. Such data is often referred to as PII (personally identifiable information) and SII (sensitive identifiable information). Uncovering it requires tools and algorithms that can put text-based keywords and expressions in context, in order to minimize false positives and false negatives when discovering PII or SII data. Typical examples include person or organization names, phone numbers, addresses, financial information or records, and citizen identification documents.

To address this need, we present some ideas around the mechanics of semantic contextualization. Platforms that attempt to discover and extract such information have a set of very tangible expectations, such as producing a list of possible PII or SII candidates for review, followed by calls to action regarding the disposal, storage, or protection of this data. However, this proves very difficult to accomplish if the analysis relies solely on brute-force extraction without any context. For example, is what seems to be a person’s name actually part of a street name, and hence part of an address? Is a sixteen-digit number actually a credit card number?
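To make the two questions above concrete, here is a minimal sketch of rule-based contextual checks. It is not how QoreAudit works (the product relies on semantic analysis and machine learning); it only illustrates why context matters. The cue lists, the 40-character window and the function names are hypothetical. A sixteen-digit run is labelled a credit card only if it passes the Luhn checksum used by payment card numbers and is preceded by a card-related cue; a name is treated as part of an address if a street suffix follows it.

```python
import re

# Hypothetical context cues; a real system would learn these, not hard-code them.
CARD_CUES = ("credit card", "card number", "visa", "mastercard", "expiry")
STREET_SUFFIXES = ("street", "avenue", "boulevard", "road", "lane")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_sixteen_digits(text: str, window: int = 40) -> list[tuple[str, str]]:
    """Label each 16-digit run (spaces or dashes allowed) using checksum plus context."""
    results = []
    for m in re.finditer(r"(?<!\d)(?:\d[ -]?){15}\d(?!\d)", text):
        digits = re.sub(r"\D", "", m.group())
        # Look only at the text immediately preceding the number.
        context = text[max(0, m.start() - window):m.start()].lower()
        has_cue = any(cue in context for cue in CARD_CUES)
        label = "credit_card" if luhn_valid(digits) and has_cue else "unconfirmed_number"
        results.append((digits, label))
    return results


def name_is_street(text: str, name: str) -> bool:
    """True if `name` is directly followed by a street suffix, i.e. likely an address."""
    suffixes = "|".join(re.escape(s) for s in STREET_SUFFIXES)
    pattern = re.escape(name) + r"\s+(?:" + suffixes + r")(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

For example, "My credit card number is 4111 1111 1111 1111." yields a `credit_card` label, while the same digits after "Order reference" stay `unconfirmed_number`. Brute-force pattern matching alone would flag both.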

Machine learning based semantic analysis can help achieve these goals, primarily because it associates each potentially sensitive piece of data with the meaning of that data; it is thus equivalent to extracting and storing concepts rather than keywords. By identifying concepts along with named entities (PII and SII data), it is possible to achieve three important goals that are the cornerstones of reliably identifying personal information:

Avoid breaking up conceptual units consisting of multiple words (“my credit card number is”).
Use different semantic categories to facilitate discovery of unexpected or related themes or concepts.
Establish correlations between concepts to provide context for analysis.
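The first goal, keeping multi-word conceptual units intact, can be illustrated with a greedy longest-match chunker. This is a minimal sketch under stated assumptions, not QoreAudit's algorithm. The small lexicon and its category names (`PAYMENT_CARD`, `NATIONAL_ID` and so on) are hypothetical stand-ins for concepts a semantic engine would supply. The point is that "credit card number" comes out as one concept rather than three unrelated keywords.

```python
import re
from typing import Optional

# Hypothetical concept lexicon: multi-word expressions mapped to semantic categories.
CONCEPT_LEXICON = {
    ("credit", "card", "number"): "PAYMENT_CARD",
    ("credit", "card"): "PAYMENT_CARD",
    ("social", "insurance", "number"): "NATIONAL_ID",
    ("phone", "number"): "PHONE",
    ("home", "address"): "ADDRESS",
}
MAX_LEN = max(len(k) for k in CONCEPT_LEXICON)


def tokenize(text: str) -> list[str]:
    """Lowercase word/number tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_concepts(tokens: list[str]) -> list[tuple[str, Optional[str]]]:
    """Greedy longest match: multi-word expressions become a single (phrase, concept) unit."""
    units: list[tuple[str, Optional[str]]] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible phrase first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in CONCEPT_LEXICON:
                units.append((" ".join(phrase), CONCEPT_LEXICON[phrase]))
                i += n
                break
        else:
            # No lexicon entry starts here: keep the token as a plain, concept-less unit.
            units.append((tokens[i], None))
            i += 1
    return units
```

Trying longer phrases first means "credit card number" wins over the shorter "credit card". The unit that follows a concept (here, the number) can then be related back to it, which is the kind of correlation the third goal refers to.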

Coginov’s QoreAudit product is designed to achieve precisely these goals. Using a natural language processing approach that combines semantic analysis with proprietary machine learning algorithms, we strive to help users reliably identify meaning in content and relate it to potential PII and SII data. This enhances customers’ ability to reliably mine data for actionable information, while reducing the time that must be spent analyzing data to draw reliable conclusions. In turn, this means that we can identify whether a concept evoked in a comment clearly refers to a piece of personal or sensitive data.

Another significant advantage of semantic contextualization is the ability to compute a document’s “semantic profile” based on the types of PII and SII data extracted. In so doing, it becomes possible to assess the level of “sensitivity” of a given document or set of documents, and to determine much more accurately whether the risk of identity theft, intellectual property theft, sensitive financial data acquisition, and similar threats is elevated. By mapping the most relevant concepts to potential PII and SII data, we can determine several very interesting things:

The real risk factor associated with a given document or set of documents.
How much conceptual and PII or SII overlap exists across multiple documents.
Which documents containing PII or SII data are duplicates, so that duplication can be reduced (duplicates multiply the opportunities to steal private data).
Where the riskiest documents are concentrated (i.e., in which data store or source).
What measures to take against the loss of private data, based on a tangible, risk-based assessment and call to action.
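A semantic profile and the questions listed above can be sketched with simple aggregation. This is a minimal illustration under explicit assumptions, not QoreAudit's scoring model. The profile is taken to be a count of extracted PII/SII categories per document. The per-category weights in `WEIGHTS` are made up for the example. Overlap is measured here with Jaccard similarity over `(category, value)` findings, which is one common choice among several. Highly overlapping pairs are flagged as candidate duplicates, and scores are summed per source to show where risk concentrates.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical sensitivity weights per category (illustrative values only).
WEIGHTS = {"NATIONAL_ID": 10, "PAYMENT_CARD": 8, "FINANCIAL": 6,
           "ADDRESS": 3, "PHONE": 2, "PERSON": 1}


@dataclass
class Document:
    doc_id: str
    source: str                       # data store the document came from
    findings: list[tuple[str, str]]   # (category, value) pairs from an extraction step

    def profile(self) -> Counter:
        """Semantic profile: how many findings of each category the document holds."""
        return Counter(cat for cat, _ in self.findings)

    def risk_score(self) -> int:
        """Weighted sum of the profile; unknown categories get weight 1."""
        return sum(WEIGHTS.get(cat, 1) * n for cat, n in self.profile().items())


def overlap(a: Document, b: Document) -> float:
    """Jaccard similarity of the distinct (category, value) findings of two documents."""
    va, vb = set(a.findings), set(b.findings)
    if not va and not vb:
        return 0.0
    return len(va & vb) / len(va | vb)


def near_duplicates(docs: list[Document], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Pairs of documents whose PII/SII findings overlap at or above `threshold`."""
    pairs = []
    for i, a in enumerate(docs):
        for b in docs[i + 1:]:
            if overlap(a, b) >= threshold:
                pairs.append((a.doc_id, b.doc_id))
    return pairs


def risk_by_source(docs: list[Document]) -> dict[str, int]:
    """Total risk score per data store, to locate where risky documents concentrate."""
    totals: dict[str, int] = defaultdict(int)
    for d in docs:
        totals[d.source] += d.risk_score()
    return dict(totals)
```

With these weights, a document holding a name, a card number and a phone number scores 1 + 8 + 2 = 11. A weighted sum keeps the score explainable, so a reviewer can see which categories drive it. A production system would also account for context and confidence.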

In summary, through semantic contextualization, Coginov’s QoreAudit product allows customers to gain actionable insights more rapidly from most or all of their data repositories, understand specifically why certain documents or sets of documents are riskier than others, and take tangible action to protect the PII and SII data they hold. Please feel free to contact us at sales@coginov.com if you are interested in further information or a demo of our product.

COGINOV

We create innovative solutions

COGINOV is recognized as a world leader in semantic technologies and information management. We are a Canadian software company offering our customers innovative solutions for managing structured and unstructured information. Our head office is based in Montreal.

Coginov’s Qore platform technology enhances the information value chain, transforming unstructured content into highly contextualized, accessible and valuable information. Coginov’s solutions enable you to capture, analyze, engage, automate and manage your information assets, with unrivalled accuracy and efficiency.

Discover our solutions QoreAudit, QoreUltima and QoreMail
