Adding a Discover Scan
On This Page
- 1 Overview
- 2 Data Source
- 3 Discover Categories
- 4 Settings
- 4.1 Path Filters
- 4.1.1 Exclude folders
- 4.1.1.1 Selecting Folders to Exclude
- 4.1.1.2 Removing Excluded Folders
- 4.1.2 Patterns
- 4.1.2.1 Adding a Pattern
- 4.1.2.2 Removing a Pattern
- 4.2 Item Filters
- 4.2.1 File types
- 4.2.2 Minimum file size
- 4.2.3 Maximum file size
- 4.3 Content Extraction Limit
- 4.4 OCR
- 4.5 Hidden items
- 4.6 Shared items
- 5 Schedule
- 5.1 Starting on
- 5.2 Scan on
- 5.3 Scan every
- 5.3.1 First scan start time
- 5.3.2 Scan stop time
- 5.3.3 Maximum number of daily scans
- 5.4 At a specific time
- 5.4.1 Scan at
- 6 Review
Overview
You can add a Discover scan using the Add scan button at the top of the Scans page. This opens the Discover Scan Setup wizard. Work through the wizard to configure the scan: specify the location you want to scan, select the templates you want the scan to use, and set the schedule for the scan.
Data Source
The first step in the Discover Scan Setup is to select whether you will use a location or a document set for the scan. A location allows you to select a connection and path to scan. A document set allows you to select a saved scan result to rescan. Expand the section below based on the option you want to select to view the instructions.
Discover Categories
This step allows you to select the standard templates you want to add to the Discover scan. For locations, File Inventory is always included in the Discover scan and cannot be turned off. However, you can choose to add additional categories to the scan to meet your needs. For document sets, File Inventory will not be included in the scan since all the items in the document set will already have been run against the File Inventory template, and scanning them against it again would be redundant.
Each template provides a speed and scope key to help you understand the depth of analysis the template provides and how that will impact the speed of the Discover scan. Deeper scans will run slower but reveal more information or refined classifications. Keep this in mind when building your scan.
If a category is unavailable for selection, the supporting entity type is not installed in your DryvIQ Platform.
The Miscellaneous category contains all the custom entity types created in your DryvIQ Platform. If you want to add a custom entity type to the Discover scan, you should add this template to the scan. You can then use the configuration option for the Miscellaneous template to select just the specific custom entity type that you want to add to the scan you are creating.
Click to select the box in front of the template name. A check mark appears in the box to indicate it is selected.
Selecting a template selects all available categories in that template. DryvIQ recommends leaving the default categories for each template. However, if your use case requires a specific set of categories, you can edit the template categories as needed. The exception is the File Inventory template, which cannot be edited. To edit a template, click the gear icon on the template card.
The corresponding modal opens and displays the categories for the selected template. Expand a category to view the included entity types. Clear the box for an entity type you do not want to include in the template.
Click OK when you are finished adjusting the categories.
Repeat these steps for each template you want to select.
Click Next to advance to the Settings step.
If you selected more than one template, you will receive a warning that selecting multiple templates may impact your scan speed. Click Cancel to edit your selections or OK to continue with the current selections.
Settings
The Settings step allows you to customize the scan by adding filters based on path, file type, file size, and more. You can also specify how much of a file gets scanned and enable optical character recognition (OCR) if you want PDFs and images to be scanned for text. There are also advanced options to ignore hidden and shared items to improve scan results.
Path Filters
The Path Filters allow you to specify paths to exclude and patterns that should be included or excluded in the scan. These filters are only available for scans that use locations and are not available for scans that use document sets.
Exclude folders
The Exclude folders filter allows you to specify folders under the selected data source to exclude from the scan. This is useful if you know there are folders in the selected data source that contain files that do not need to be scanned. For example, if you are performing a scan for sensitive content such as PII and have a folder that contains purchased stock images used for marketing purposes, you can choose to exclude that folder since the files would not need to be scanned. This improves scan performance and outcomes since there will be fewer superfluous results to review.
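The effect of an excluded folder can be pictured with a short Python sketch. This is only an illustration of the idea, not DryvIQ's implementation: the function name `is_excluded`, the sample paths, and the assumption that exclusion applies to everything beneath the selected folder are all hypothetical.

```python
from pathlib import PurePosixPath

def is_excluded(item_path: str, excluded_folders: list[str]) -> bool:
    """Return True if item_path is an excluded folder or lies inside one."""
    item = PurePosixPath(item_path)
    for folder in excluded_folders:
        root = PurePosixPath(folder)
        # Compare whole path segments so that a sibling folder whose name
        # merely starts with the same text is not excluded by accident.
        if item == root or root in item.parents:
            return True
    return False

# Hypothetical data source contents (illustrative only).
excluded = ["/Marketing/Stock Images"]
paths = [
    "/Marketing/Stock Images/beach.jpg",
    "/Marketing/Stock Images Archive/old.jpg",
    "/HR/employees.xlsx",
]
to_scan = [p for p in paths if not is_excluded(p, excluded)]
```

In this sketch, only the file inside the excluded folder is dropped; the similarly named "Stock Images Archive" folder is still scanned because the comparison works on path segments rather than on raw string prefixes.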
Selecting Folders to Exclude
Click Add.
The Path window displays all the folders under the selected data source for the scan. Select the folder you want to filter. Click the Load More link to load additional folders in the list as needed. You can drill into each folder by selecting the right arrow that displays to the right of the folder name. You can also manually enter the path for the folder you want to exclude using the Manually enter a path button.
Click OK once you have selected the folder.
Repeat these steps for each folder you want to exclude from the scan.
Removing Excluded Folders
You can clear an excluded folder selection by clicking the X on the box for the folder. Alternatively, you can click Clear to clear all the excluded folder selections.
Patterns
This filter allows you to filter files and/or folders based on a name pattern. A pattern can be used for exact matches, or an asterisk can be added for prefix or suffix matches. For example, “*txt” would filter all .txt extensions. You can also use asterisks to surround a pattern to filter file and/or folder names. For example, using “*test*” would filter all names that contain “test.”
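To get a feel for how asterisk patterns behave, the following Python sketch uses the standard library's `fnmatch` module as a stand-in. It is an approximation under stated assumptions: DryvIQ's own matcher may differ (for example, in case sensitivity), `fnmatch` also gives special meaning to `?` and `[...]`, and the helper `passes_filter` and the sample names are hypothetical.

```python
from fnmatch import fnmatchcase

def passes_filter(name: str, pattern: str, filter_type: str) -> bool:
    """Return True if an item with this name stays in the scan."""
    hit = fnmatchcase(name, pattern)  # case-sensitive wildcard match
    # "include" keeps only matching names; "exclude" drops matching names.
    return hit if filter_type == "include" else not hit

# Hypothetical file names (illustrative only).
names = ["notes.txt", "test_plan.docx", "my_test_data.csv", "report.pdf"]

# "*txt" matches any name that ends in "txt".
only_txt = [n for n in names if passes_filter(n, "*txt", "include")]

# "*test*" matches any name that contains "test".
without_test = [n for n in names if passes_filter(n, "*test*", "exclude")]
```

Here `only_txt` keeps just `notes.txt`, while `without_test` drops both names containing "test" and keeps `notes.txt` and `report.pdf`.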
Adding a Pattern
Click Add. The Add pattern filter modal appears.
Use the Filter type list to select if you want to include or exclude the specified pattern.
Use the target list to select if the pattern applies to folders (containers), files (items), or both.
Type the pattern you want to use in the Pattern field.
Click OK.
Repeat these steps for each pattern you want to use for the scan.
Removing a Pattern
You can remove a pattern by clicking the X on the pattern box. Alternatively, you can click Clear to remove all patterns.
Item Filters
The Item Filters allow you to include only specific file types in the scan or to exclude specific file types from the scan. This is also where you can set the minimum and maximum file sizes that determine which files will be included in the scan. You can filter all files greater than or less than a specified size. You can also use a combination of both. Files that fall outside the size range set will be skipped. These filters are only available for scans that use locations and are not available for scans that use document sets.
File types
This filter allows you to filter specific file types. Select whether you want to include or exclude the file types that will be selected in the next field. Then, select the file type you want to filter. You can select multiple file types if needed. The filter options are temporary files, executables, movies, audio files, images, documents, and Windows OS/DB files. Click a selected file type to clear the selection as needed. Click the down arrow on the list or anywhere outside of the list to close it.
Minimum file size
Set the minimum file size for files to be included in a scan. Type the numeric value in the first field and select the size unit from the list. Leave the field blank if you do not want to set a minimum value. Make sure that the value entered is smaller than the maximum value set. The Next button will be unavailable if the minimum and maximum file sizes conflict.
Maximum file size
Set the maximum file size for files to be included in a scan. Type the numeric value in the first field and select the size unit from the list. Leave the field blank if you do not want to set a maximum value. Make sure that the value entered is larger than the minimum value set. The Next button will be unavailable if the minimum and maximum file sizes conflict.
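The relationship between the two size fields can be sketched in Python as follows. This is a minimal illustration, not the product's logic: the unit table assumes 1 KB = 1024 bytes, the function names are hypothetical, and whether a file exactly at a boundary is included is an assumption made here for the example.

```python
from typing import Optional

# Assumption: binary units (1 KB = 1024 bytes).
UNIT_BYTES = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

def to_bytes(value: Optional[float], unit: str) -> Optional[int]:
    """Convert a field value and unit to bytes; a blank field stays None."""
    return None if value is None else int(value * UNIT_BYTES[unit])

def range_is_valid(min_b: Optional[int], max_b: Optional[int]) -> bool:
    """The settings conflict only when both are set and min is not below max."""
    return min_b is None or max_b is None or min_b < max_b

def in_range(size: int, min_b: Optional[int], max_b: Optional[int]) -> bool:
    """Return True if a file of this size would be included in the scan."""
    if min_b is not None and size < min_b:
        return False
    if max_b is not None and size > max_b:
        return False
    return True

min_b = to_bytes(10, "KB")  # minimum of 10 KB
max_b = to_bytes(5, "MB")   # maximum of 5 MB
```

With these values, `range_is_valid(min_b, max_b)` holds, a 20,000-byte file is in range, and a 100-byte file is skipped. Swapping the two values produces the kind of conflict that makes the Next button unavailable.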
Content Extraction Limit
Content Extraction refers to how content within a file is scanned to discover different entities. This setting determines the maximum number of bytes to scan per file. By default, the maximum is 1 MB. Increase or decrease the limit as preferred. Scans with a higher extraction limit scan a larger amount of text and provide a deeper scope but run slower. Scans with a lower extraction limit run faster but do not provide as deep an analysis. This setting is only applicable when using templates that scan a file’s contents (File Categories, Sensitive Data Detection, and Miscellaneous templates).
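The trade-off described above can be illustrated with a small Python sketch. It assumes the limit behaves like a simple cap on the bytes read from the start of each file, which this page does not confirm; the function name `extract_for_scan`, the 1024 × 1024 interpretation of 1 MB, and the sample data are all hypothetical.

```python
import io

# Assumption: the 1 MB default is interpreted as 1024 * 1024 bytes.
DEFAULT_LIMIT_BYTES = 1024 * 1024

def extract_for_scan(stream: io.BufferedIOBase,
                     limit_bytes: int = DEFAULT_LIMIT_BYTES) -> bytes:
    """Read at most limit_bytes from the start of a file-like object."""
    return stream.read(limit_bytes)

# A made-up document whose interesting content sits after 2,000 filler bytes.
document = b"x" * 2000 + b"MARKER"

shallow = extract_for_scan(io.BytesIO(document), limit_bytes=1000)
deep = extract_for_scan(io.BytesIO(document), limit_bytes=4096)

found_shallow = b"MARKER" in shallow  # content beyond the limit is never seen
found_deep = b"MARKER" in deep        # a higher limit reaches it
```

Under this assumption, the lower limit processes less data per file but misses content that appears later in the file, which is the speed versus depth trade-off the setting controls.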
OCR
Optical character recognition (OCR) allows DryvIQ to scan image files and PDF files for text. This is a useful setting if your company has a lot of scanned documents. Enabling OCR provides a deeper scope for a scan, but the scan will run slower. Scans without OCR enabled will run faster but do not provide analysis of text in image or PDF files.
Enabling OCR is only useful when using templates that scan a file’s contents (File Categories, Sensitive Data Detection, and Miscellaneous templates). All text processing is limited by the content extraction limit, so even when OCR is enabled, that limit applies. Also, applying filters that exclude image and PDF files from the scan means the OCR setting will have no effect.
Hidden items
Hidden items are commonly created by utilities for storing user preferences and rarely need to be scanned. This setting is enabled by default for scans that use a location as the data source. When enabled, the scan will ignore hidden items. If you want to scan for hidden items, disable this setting. This setting is only available for scans that use locations and is not available for scans that use document sets.
Shared items
Shared items are items available to an account but that are not owned by it. This setting is enabled by default for scans that use a location as the data source. When enabled, the scan will ignore shared items; only items owned by an account will be scanned. This ensures items are not scanned multiple times when shared across accounts. If you want to scan shared items, disable this option. This setting is only available for scans that use locations and is not available for scans that use document sets.
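The combined effect of the Hidden items and Shared items settings can be sketched in Python. This is a conceptual model only: the `Item` class, its fields, the account names, and the function `items_to_scan` are hypothetical and do not reflect how DryvIQ represents items internally.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    owner: str            # account that owns the item
    hidden: bool = False  # True for hidden items such as preference files

def items_to_scan(items: list[Item], account: str,
                  ignore_hidden: bool = True,
                  ignore_shared: bool = True) -> list[str]:
    """Return the names of items that would be scanned for one account."""
    selected = []
    for item in items:
        if ignore_hidden and item.hidden:
            continue  # setting enabled: hidden items are skipped
        if ignore_shared and item.owner != account:
            continue  # setting enabled: only items owned by the account remain
        selected.append(item.name)
    return selected

# Hypothetical items visible to the account "alice".
visible = [
    Item("budget.xlsx", owner="alice"),
    Item(".preferences", owner="alice", hidden=True),
    Item("team-plan.docx", owner="bob"),  # shared with alice, owned by bob
]

defaults = items_to_scan(visible, "alice")
everything = items_to_scan(visible, "alice",
                           ignore_hidden=False, ignore_shared=False)
```

With both settings left at their defaults, only `budget.xlsx` is scanned for this account. Disabling both returns all three items, and the shared file would then also be scanned again under its owner's account.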
Schedule
The Schedule step allows you to assign a schedule to the scan. If no schedule is assigned, the scan must be run manually. If enabled, the scan will run automatically based on the defined schedule. There are multiple settings available to customize the scan schedule.
Select Enable schedule.
Complete the schedule fields based on the schedule you want to use. Expand the Scan Schedule Options below to learn about each schedule option.
Click Next to advance to the Review step.
Review
The Review page displays all the configuration options selected for the scan so you can review the settings and make edits as needed. You will also assign a name to your scan on this page, which is a required step before saving the scan.
Type the name you want to assign the scan in the Name field and click Done to save the name.
For document sets, the selected document set will be renamed with the scan name entered here and will be locked from further editing. The new name will be displayed on the Document sets tab in the Results saved views.
The rest of the page displays the configurations selected for each step of the Discover Scan Setup. Review the information. If you want to change anything, click Edit next to the section heading (or click the heading in the menu on the left).
This will take you back to the corresponding page so you can make the necessary edits. Click Next to advance through the remaining setup pages or click Review in the left menu to return to the Review page.
Once you are finished reviewing the setup, save the scan.
If you selected a location for your scan, you are presented with two options:
Save scan: This saves the scan with no further action. The scan will follow the schedule that has been set. If no schedule has been set, the scan must be triggered manually.
Note that the primary scan for mapped locations and document sets will run as soon as the scan is saved to create the child scans. However, the child scans will not run once the primary scan is complete. The child scans will run according to the schedule set for the scan. If no schedule has been set, the scans will need to be run manually.
Save scan and run it now: This saves the scan and triggers the scan to run. Once complete, the scan will then follow the schedule that has been set for the scan. If no schedule has been set, subsequent scans must be triggered manually.
The scan is added to the All Scans page.