A classification entity type is a custom classification model that is plugged into the DryvIQ Platform. Refer to Uploading Samples to learn how to upload individual files for analysis against any entity type.
DryvIQ has the following preinstalled classification entity types.
Document Type
The Document Type Classifier is trained to identify and classify a document as one of 188 different document types. You can upload individual documents to the classifier, and it will identify the document type and provide a confidence score. You also have the option of including this entity type in the policies you create to identify the document types in the content for a data source.
Some documents are categorized in document groups. When a document matches one of the document types in a group, the group name is displayed rather than the individual document type. You can download and review a complete list of the individual document types DryvIQ will identify using the link below.
⬇ Download the Document Classifier List
File Name
The File Name Classifier is trained to identify and classify a document as one of more than 645 different document types based on the file name. You can upload individual documents to the classifier, and it will identify the document type and provide a confidence score. You also have the option of including this entity type in the policies you create to identify documents based on file names. You can download and review a complete list of the document types DryvIQ will identify using the link below.
⬇ Download the File Name Classifier List
Form Matcher
The Form Matcher currently supports over 6000 government and other commonly used organization forms. The Form Matcher matches a “query” document to an indexed document. It compares the query document against all indexed documents and returns the indexed document with the highest similarity score. When you upload a file against the matcher, DryvIQ will include the confidence level for the matched form. You also have the option of including this entity type in the policies you create to identify forms in the content for a data source. The list of forms is too large to include on this page, but you can download the full list of forms below for your reference.
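The matching step described above can be illustrated with a minimal sketch. It is not DryvIQ's implementation: the similarity measure (Jaccard overlap of word sets), the function names, and the two sample forms are all assumptions, chosen only to show the idea of scoring a query document against every indexed document and returning the best-scoring one.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Return the Jaccard similarity of two word sets (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(query_text: str, indexed: dict[str, str]) -> tuple[str, float]:
    """Score the query against every indexed form and return the top match."""
    query_words = set(query_text.lower().split())
    scores = {
        name: jaccard(query_words, set(text.lower().split()))
        for name, text in indexed.items()
    }
    top = max(scores, key=scores.get)  # indexed form with the highest score
    return top, scores[top]


# Hypothetical index: form name -> text of the indexed form.
indexed_forms = {
    "W-9": "request for taxpayer identification number and certification",
    "I-9": "employment eligibility verification",
}

match, score = best_match("taxpayer identification number request", indexed_forms)
print(match, round(score, 2))
```

A production matcher would likely use layout or embedding features rather than word overlap; the sketch only shows the "compare against all, keep the highest score" pattern.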
Language Detection
The Language Detection classifier identifies the language of a document. It currently detects 176 languages. When you upload a file against the module, DryvIQ will also include the confidence level for the detected language. You also have the option of including this entity type in the policies you create to identify the languages in the content for a data source. Download the complete list of languages below.
⬇ Download the Language Detection list
Microsoft Information Protection
The MIP Classifier extension allows you to extract your Microsoft Information Protection (MIP) security labels and use the MIP entity type to create tracking group assignment rules for your policies. This requires you to register an application in your Microsoft Azure account to obtain the Application (Client) ID and Directory (Tenant) ID that DryvIQ needs to access the security labels through the Microsoft Information Protection Sync Service. See MIP Classifier Extension for more information.
PII Extraction Module
The Personally Identifiable Information (PII) Extraction Module is a pre-trained artificial intelligence (AI) model that can reliably identify and extract PII elements contained in unstructured data. You can include this entity type in the policies you create to identify PII in the content for a data source and specify rules to classify files based on the results. For example, if a Person Name or Address is found in a file in a “public” folder, the file can be classified as “Restricted.” You can also upload individual documents to the classifier, and it will identify any PII found in them and provide a confidence score. Download a list of the PII DryvIQ will detect below.
⬇ Download PII Extraction list
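The “public” folder example above can be sketched as a simple rule. This is an illustration only, not DryvIQ's policy engine or API: the `ScanResult` type, the `classify` function, and the fallback label “Unclassified” are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ScanResult:
    """Hypothetical record of a scanned file and the PII types found in it."""
    path: str
    pii_entities: set[str] = field(default_factory=set)


# PII types that trigger the rule, taken from the example in the text.
TRIGGER_ENTITIES = {"Person Name", "Address"}


def classify(result: ScanResult) -> str:
    """Return "Restricted" when trigger PII appears under a "public" folder."""
    in_public = "public" in result.path.lower().split("/")
    has_trigger = bool(result.pii_entities & TRIGGER_ENTITIES)
    # "Unclassified" is an assumed fallback label, not a DryvIQ value.
    return "Restricted" if in_public and has_trigger else "Unclassified"


print(classify(ScanResult("/shares/public/roster.docx", {"Person Name"})))
print(classify(ScanResult("/shares/hr/roster.docx", {"Person Name"})))
```

In the product, rules like this are configured in the policy rather than written as code; the sketch only makes the condition explicit.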
Sensitive Object Detection
Sensitive Object Detection is trained to identify and classify images of sensitive data, such as identification cards, fingerprints, and license plates. You can upload individual documents to the classifier, and it will identify any images of sensitive information. If an image contains multiple sensitive objects, all of them will be identified. For example, if the document contains an image of a driver’s license, the scan will identify both the ID card and the signature as detected sensitive objects. You also have the option of including this entity type in the policies you create to identify documents that contain images of sensitive information.
Only the following image types will be scanned:
BM
BMP
GIF
ICB
JFIF
JPEG
JPG
PBM
PDF
PNG
TGA
TIFF (See note below.)
VDA
VST
WEBP
Info |
---|
A TIFF is a complex image file made up of multiple parts; therefore, not all TIFF files can be successfully scanned for various reasons. If a TIFF file is found but can't be scanned, an error will be logged identifying why it couldn't be scanned. |
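
If you want to estimate in advance which files are eligible for Sensitive Object Detection, a check against the list above can be sketched as follows. The sketch assumes eligibility can be approximated from the file extension alone, which the documentation does not state; `SCANNABLE_EXTENSIONS` and `is_scannable` are hypothetical names, not part of DryvIQ.

```python
from pathlib import Path

# Extensions copied from the supported image types listed above.
SCANNABLE_EXTENSIONS = {
    "bm", "bmp", "gif", "icb", "jfif", "jpeg", "jpg", "pbm",
    "pdf", "png", "tga", "tiff", "vda", "vst", "webp",
}


def is_scannable(file_name: str) -> bool:
    """Return True if the file's extension is in the supported list."""
    extension = Path(file_name).suffix.lower().lstrip(".")
    return extension in SCANNABLE_EXTENSIONS


for name in ["license.JPG", "scan.tiff", "notes.docx"]:
    print(name, is_scannable(name))
```

As the note above explains, a TIFF that passes this check may still fail to scan, so an extension match indicates eligibility only.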