Classification Entity Types

A classification entity type is a custom classification model that plugs into the DryvIQ Platform. You can add an entity type to a policy so that the policy finds the associated information, and you can upload files against an entity type to see whether the corresponding information appears in the file. (Refer to Uploading Samples to learn how to upload individual files for analysis against any entity type.) DryvIQ includes the following preinstalled classification entity types.

 

Content Signature

The content signature is a unique identifier generated from the file contents. The Content Signature entity type identifies the signature for a given document, which can help you find duplicate data. Although this entity type was added for duplicate detection in the Discover scan, you can also upload samples against it to identify the content signature, and you can add it to a Govern policy to identify duplicate files that share the same content signature.
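To illustrate how a content-based signature can surface duplicates, here is a minimal sketch. It assumes the signature is a SHA-256 digest of the raw file bytes; the algorithm DryvIQ actually uses is not stated on this page, and the function names and sample files are hypothetical.

```python
import hashlib
from collections import defaultdict


def content_signature(data: bytes) -> str:
    """Hypothetical signature: SHA-256 hex digest of the file bytes (an assumption)."""
    return hashlib.sha256(data).hexdigest()


def group_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file paths whose contents produce the same signature."""
    groups = defaultdict(list)
    for path, data in files.items():
        groups[content_signature(data)].append(path)
    # Keep only signatures shared by more than one file.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


# Hypothetical sample data: two files with identical contents, one different.
files = {
    "a/report.docx": b"quarterly numbers",
    "b/report-copy.docx": b"quarterly numbers",
    "c/notes.txt": b"something else",
}
print(group_duplicates(files))  # [['a/report.docx', 'b/report-copy.docx']]
```

Because the signature depends only on the bytes, two files match even when their names and locations differ.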

Document Type

The Document Type Classifier is an AI classifier trained to classify a document as one of 188 different document types. You can upload individual documents to the classifier, and it will identify the document type and provide a confidence score. You also have the option of including this entity type in the policies you create to identify the document types in the content for a data source.

Some documents are categorized in document groups. When a document matches one of these document types, the group name will be displayed rather than the individual document type. You can download and review a complete list of the individual document types DryvIQ will identify using the link below.

 

File Metadata Signature

The file metadata signature is a unique identifier generated from the file name and size. The File Metadata Signature entity type identifies the file metadata signature for a given document, which can help you find duplicate data. Although this entity type was added for duplicate detection in the Discover scan, you can also upload samples against it to identify the file metadata signature, and you can add it to a Govern policy to identify duplicate files that share the same file metadata signature.
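A signature built from name and size can be sketched as follows. This is only an illustration: it assumes the base file name and byte size are joined and hashed with SHA-256, which this page does not confirm, and `file_metadata_signature` is a hypothetical helper.

```python
import hashlib
import os


def file_metadata_signature(path: str, size: int) -> str:
    """Hypothetical signature from the base file name and size in bytes (an assumption)."""
    key = f"{os.path.basename(path)}|{size}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Same name and size in different folders: the signatures match.
same = file_metadata_signature("x/invoice.pdf", 2048) == file_metadata_signature("y/invoice.pdf", 2048)
# Same name but a different size: the signatures differ.
different = file_metadata_signature("x/invoice.pdf", 2048) != file_metadata_signature("x/invoice.pdf", 4096)
print(same, different)  # True True
```

In this sketch the file contents are never read, so two files with the same name and size but different contents would share a signature.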

File Name

The File Name Classifier is an AI classifier trained to classify a document as one of almost 700 different document types based on the file name. You can upload individual documents to the classifier, and it will identify the document type and provide a confidence score. You also have the option of including this entity type in the policies you create to identify documents based on file names. You can download and review a complete list of the document types DryvIQ will identify using the link below.

 

Form Matcher

The Form Matcher is an AI classifier trained to classify a document as one of over 6000 government and other commonly used organization forms. The Form Matcher matches a “query” document to an indexed document: it compares the query document against all indexed documents and returns the indexed document with the highest similarity score. When you upload a file against the matcher, DryvIQ will include the confidence level for the matched form. You also have the option of including this entity type in the policies you create to identify forms in the content for a data source. The list of forms is too large to include on this page, but you can download the full list of forms below for your reference.
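The "return the indexed document with the highest similarity score" step can be sketched as below. The similarity measure here is a simple Jaccard overlap of word sets, chosen only for illustration; the Form Matcher is an AI classifier and its actual scoring is not described on this page. The index entries and function names are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Word-set overlap in [0, 1]; a stand-in for the real similarity score."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(query: str, index: dict[str, str]) -> tuple[str, float]:
    """Score the query against every indexed form and return the top (name, score)."""
    if not index:
        raise ValueError("index must contain at least one form")
    query_words = set(query.lower().split())
    scored = (
        (name, jaccard(query_words, set(text.lower().split())))
        for name, text in index.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Hypothetical index of two forms, keyed by an illustrative name.
index = {
    "tax-form": "request for taxpayer identification number and certification",
    "employment-form": "employment eligibility verification",
}
name, score = best_match("taxpayer identification number request", index)
print(name, round(score, 2))  # tax-form 0.57
```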

 

Language Detection

Language Detection is an AI classifier that identifies the language of a document. It currently detects 176 languages. When you upload a file against the module, DryvIQ will also include the confidence level for the detected language. You also have the option of including this entity type in the policies you create to identify the languages in the content for a data source. Download the complete list of languages below.

 

Microsoft Information Protection

The MIP Classifier extension allows you to extract your Microsoft Information Protection (MIP) security labels and use the MIP entity type to create tracking group assignment rules for your policies. This requires you to register an application in your Microsoft Azure account to obtain the Application (Client) ID and Directory (Tenant) ID required to allow DryvIQ to access the security labels through the Microsoft Information Protection Sync Service. See MIP Classifier Extension for more information.

PII Extraction Module

The Personally Identifiable Information (PII) Extraction Module is a pre-trained AI model that can reliably identify and extract PII elements contained in unstructured data. You can include this entity type in the policies you create to identify PII in content for a data source and specify rules to classify files based on the results. For example, if a Person’s Name or Address is found in a file in a “public” folder, the file can be classified as “Restricted.” You can also upload individual documents to the classifier, and it will identify any PII found in them and provide a confidence score. Download a list of the PII types DryvIQ will detect below.
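The example rule above (a name or address found in a "public" folder leads to a "Restricted" classification) can be expressed as a small sketch. This is not DryvIQ's policy syntax; the label strings, the "Unclassified" fallback, and the `classify` function are hypothetical and only show the shape of the logic.

```python
from pathlib import PurePosixPath

# Hypothetical PII labels that trigger the rule.
RESTRICTED_PII = {"Person Name", "Address"}


def classify(path: str, detected_pii: set[str]) -> str:
    """Return "Restricted" when trigger PII is found in a file under a "public" folder."""
    folders = PurePosixPath(path).parts[:-1]  # every path part except the file name
    in_public = any(part.lower() == "public" for part in folders)
    if in_public and detected_pii & RESTRICTED_PII:
        return "Restricted"
    return "Unclassified"  # hypothetical default when the rule does not apply


print(classify("/shares/public/hr/roster.xlsx", {"Address"}))   # Restricted
print(classify("/shares/private/hr/roster.xlsx", {"Address"}))  # Unclassified
```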

 

Sensitive Object Detection

Sensitive Object Detection is trained to identify and classify images of sensitive data, such as identification cards, fingerprints, and license plates. You can upload individual documents to the classifier, and it will identify any images of sensitive information. If an image contains multiple sensitive objects, all items will be identified. For example, if the document contains an image of a driver’s license, the scan will identify both the ID card and signature as detected sensitive objects. You also have the option of including this entity type in the policies you create to identify documents that contain images of sensitive data. Download the list of 12 sensitive objects DryvIQ will identify using the link below.

 

Supported Image Types

DryvIQ scans only the following image types.

  • BM

  • BMP

  • GIF

  • ICB

  • JFIF

  • JPEG

  • JPG

  • PBM

  • PDF

  • PNG

  • TGA

  • TIFF (See note below.)

  • VDA

  • VST

  • WEBP

A TIFF is a complex image file made up of multiple parts, so not all TIFF files can be scanned successfully. If a TIFF file is found but can't be scanned, an error is logged that identifies why it couldn't be scanned.
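If you want to pre-check files before submitting them, the list above can be turned into a simple filter. This sketch assumes the listed types correspond to file extensions and that matching is case-insensitive; neither is confirmed on this page, and `is_supported_image` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Image types copied from the list above, treated here as file extensions (an assumption).
SUPPORTED_IMAGE_TYPES = {
    "BM", "BMP", "GIF", "ICB", "JFIF", "JPEG", "JPG", "PBM",
    "PDF", "PNG", "TGA", "TIFF", "VDA", "VST", "WEBP",
}


def is_supported_image(filename: str) -> bool:
    """Check the final extension of the file name against the supported list."""
    extension = PurePosixPath(filename).suffix.lstrip(".").upper()
    return extension in SUPPORTED_IMAGE_TYPES


print(is_supported_image("scans/license.JPG"))   # True
print(is_supported_image("scans/contract.docx"))  # False
```

A passing check only means the type is on the list; as noted above, an individual TIFF file may still fail to scan.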

 

 

DryvIQ Platform Version: 5.9.2
Release Date: December 17, 2024