Overview
A DataBlend collector defines how data should be pulled from an external system like Salesforce or FTP into DataBlend. It links how to access the external system (DataBlend credential), what data should be collected, and where the data should be stored (DataBlend data source and schema).
A collection is a specific execution of a collector. If the collection succeeds, then it creates a stream under the configured data source and schema. If it fails, an error will be reported in the collection log. Creating a schema with fields is not necessary before creating a collector. The collector will populate the schema.
Streams are created by collections, but exist independently of them; the collector configuration can be changed or the collection history purged without impacting the collected data.
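The relationship between collectors, collections, and streams can be pictured with a small data model. This is only an illustrative sketch, not DataBlend's implementation: the class names, the `fetch` callable, and the in-memory `streams` list are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Stream:
    """Collected data stored under a data source and schema."""
    data_source: str
    schema: str
    rows: list


@dataclass
class Collection:
    """One execution of a collector, with its outcome."""
    started_at: datetime
    succeeded: bool
    error: Optional[str] = None


@dataclass
class Collector:
    """Links a credential, a data source, and a schema (hypothetical model)."""
    credential: str
    data_source: str
    schema: str
    history: list = field(default_factory=list)

    def run(self, fetch, streams):
        """Execute once: success appends a stream, failure records an error."""
        started = datetime.now(timezone.utc)
        try:
            rows = fetch(self.credential)
        except Exception as exc:
            collection = Collection(started, False, str(exc))
        else:
            streams.append(Stream(self.data_source, self.schema, rows))
            collection = Collection(started, True)
        self.history.append(collection)
        return collection
```

Because the streams live outside the collector in this model, clearing `collector.history` or editing the collector's fields leaves previously collected streams untouched, which mirrors the independence described above.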
Running a Collector
Collectors are used to pull data from source systems. Each collector is tied to a Credential.
To set up a new Collector, click the Add button.
Select the type of system to collect data from. A credential for that system must already be set up to make the connection.
Depending on the Collector type, users can specify the schemas, fields, filters, and/or initial queries to collect from the source system. The available options vary by Collector type.
Parameters
Many collectors can filter the data they collect. For example, the QuickBooks Online Profit and Loss Detail collector accepts a date range and returns only transactions dated within that range. If the filter values will not change, it is appropriate to set them directly in the collector configuration. In other situations, the filter value may change over time (for example, the start of the fiscal quarter) or may need to be set through a workflow (for example, when multiple collectors run in sequence for the same set of accounts or the same date range). In these situations, users should set up collection parameters.
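The difference between fixed filter values and parameters can be sketched as follows. This is a hedged illustration only: `resolve_filters`, `in_range`, and the dictionary keys are hypothetical names, and DataBlend's own parameter handling may differ.

```python
from datetime import date


def resolve_filters(configured, parameters):
    """Return the filter values for one run.

    Values fixed in the collector configuration are used unless a
    run-time parameter with the same name supplies a value.
    """
    resolved = dict(configured)
    resolved.update({k: v for k, v in parameters.items() if v is not None})
    return resolved


def in_range(transactions, start_date, end_date):
    """Keep only transactions dated within the inclusive range."""
    return [t for t in transactions if start_date <= t["date"] <= end_date]


# Fixed values set directly on the (hypothetical) collector configuration.
configured = {"start_date": date(2024, 1, 1), "end_date": date(2024, 3, 31)}

# A workflow passes the same range to every collector it runs in sequence.
workflow_parameters = {"start_date": date(2024, 4, 1), "end_date": date(2024, 6, 30)}

filters = resolve_filters(configured, workflow_parameters)
transactions = [
    {"id": 1, "date": date(2024, 2, 10)},
    {"id": 2, "date": date(2024, 5, 20)},
]
selected = in_range(transactions, filters["start_date"], filters["end_date"])
```

With the workflow parameters applied, only the transaction dated within the second quarter is selected; with an empty parameter dictionary, the configured first-quarter range would be used instead.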
Custom Relative Date
Calendar Type
The “Default” calendar type lets users define an offset of a number of periods from a certain point in time, whether to use the start or end of the resulting period, and how the result is formatted.
The “Fiscal” calendar type limits the period type to Quarters and Years, and the start of the year is based on the group settings.
Period Type
Options are Hour, Day, Week, Month, Quarter, and Year.
Range Type
Options are Default, Start Of, and End Of. Default does not change the base date and will not truncate any values. This is useful if the parameter should be exactly one week ago instead of the beginning of last week.
Offset
Determines how many periods should be added or subtracted from the base date.
As Of
Allows the user to change the base date. If no value is set, the time (UTC) at which the job is run is used.
Format
Formats the resulting date according to .NET date and time format strings. This is useful in scripts.
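The way Period Type, Range Type, Offset, As Of, and Format combine for the Default calendar can be sketched in Python. This is a minimal illustration under stated assumptions, not DataBlend's implementation: the function names are hypothetical, weeks are assumed to start on Monday, the offset is assumed to be applied before Start Of or End Of, End Of is taken as the last microsecond of the period, and the format uses Python `strftime` codes rather than .NET format strings. The Fiscal calendar is not covered.

```python
from datetime import datetime, timedelta, timezone


def add_months(dt, months):
    """Shift dt by whole months, clamping the day to the target month's length."""
    index = dt.year * 12 + (dt.month - 1) + months
    year, month0 = divmod(index, 12)
    next_year, next_month0 = divmod(index + 1, 12)
    # Last day of the target month: first day of the next month minus one day.
    last_day = (datetime(next_year, next_month0 + 1, 1) - timedelta(days=1)).day
    return dt.replace(year=year, month=month0 + 1, day=min(dt.day, last_day))


def shift(dt, period, offset):
    """Add (or subtract, if negative) `offset` periods to dt."""
    if period == "Hour":
        return dt + timedelta(hours=offset)
    if period == "Day":
        return dt + timedelta(days=offset)
    if period == "Week":
        return dt + timedelta(weeks=offset)
    months_per_period = {"Month": 1, "Quarter": 3, "Year": 12}
    if period in months_per_period:
        return add_months(dt, months_per_period[period] * offset)
    raise ValueError(f"unknown period type: {period}")


def start_of(dt, period):
    """Truncate dt to the beginning of its period."""
    if period == "Hour":
        return dt.replace(minute=0, second=0, microsecond=0)
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "Day":
        return midnight
    if period == "Week":  # assumption: weeks start on Monday
        return midnight - timedelta(days=midnight.weekday())
    if period == "Month":
        return midnight.replace(day=1)
    if period == "Quarter":
        return midnight.replace(month=3 * ((dt.month - 1) // 3) + 1, day=1)
    if period == "Year":
        return midnight.replace(month=1, day=1)
    raise ValueError(f"unknown period type: {period}")


def end_of(dt, period):
    """Last microsecond of dt's period (assumed meaning of End Of)."""
    return shift(start_of(dt, period), period, 1) - timedelta(microseconds=1)


def relative_date(period, range_type="Default", offset=0, as_of=None, fmt=None):
    """Resolve a custom relative date; the base date defaults to now in UTC."""
    moved = shift(as_of or datetime.now(timezone.utc), period, offset)
    if range_type == "Start Of":
        moved = start_of(moved, period)
    elif range_type == "End Of":
        moved = end_of(moved, period)
    return moved.strftime(fmt) if fmt else moved


as_of = datetime(2024, 5, 15, 10, 30, tzinfo=timezone.utc)  # a Wednesday
one_week_ago = relative_date("Week", "Default", -1, as_of)
start_of_last_week = relative_date("Week", "Start Of", -1, as_of)
end_of_last_quarter = relative_date("Quarter", "End Of", -1, as_of, "%Y-%m-%d")
```

In this sketch, `one_week_ago` keeps the 10:30 time of day, while `start_of_last_week` is truncated to midnight on the Monday of the previous week, which is the distinction the Range Type description draws between Default and Start Of.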
String
A String parameter holds free-form text: characters, numbers, or symbols.
Date
Date parameters let users collect data within a specific window of time. The dates are entered as specific calendar dates, for example in Month/Day/Year form.
Encrypted Value
Encrypted values are useful for passing encrypted information from a Query to a Data Target. Please note that values are stored encrypted and cannot be decrypted by non-admins. Parameters must be used with the notation. For example, if a header value in the HTTP data target is an API key that a user wants encrypted, they can add a parameter.
Query Result
A Query Result parameter is useful to replace a query result with a specific value. Users must select a specific Query from the drop-down menu from which to insert the intended value.
Relative Date
Relative Date parameters let users collect data within a relative window of time. The dates are chosen from a wide variety of timeframes, such as the start of the first quarter or the end of the last quarter.
Boolean
A Boolean Parameter is useful for users wishing to utilize True, False, or NULL values.
Advanced
Field | Required/Optional | Comments
Schema Update Type | Required | Add New Columns (formerly Add Only): Existing schema columns are preserved regardless of whether they exist in the first record collected. New columns identified in the first record collected are added to the schema. If the collection returns no records, the existing schema is unchanged. Recreate Columns (formerly Auto): Default. Schema is recreated from the first record collected. If the collection returns no records, the existing schema is unchanged. Preserve Columns (formerly Manual): No changes are made to the schema during collection. If the schema is changed by the file upload, data in previous streams may no longer be available. (Columns may “go missing”.) When in doubt, set to Add New Columns.
History Retention (Days) | Optional | Default set as zero. Users may set the days they wish their collector data to be stored.
Timeout (seconds) | Optional | The Timeout section allows users to determine if they would like to timeout collections taking longer than a set number of seconds to collect data.
Run As | Required | Run As allows users to select from a drop-down list of users to run the Workflow.
Schedule | Optional | The Schedule option is a convenient way for users to make sure collections are running at the desired time. Simply select from the presets menu provided.
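The three Schema Update Type options described in the table above can be sketched as a single function. This is an illustrative model only: `update_schema` is a hypothetical name, the schema is reduced to a list of column names, and the first collected record is represented as a dictionary (or `None` when the collection returns no records).

```python
def update_schema(existing, first_record, mode):
    """Return the schema columns after a collection.

    existing:     current list of column names
    first_record: dict for the first record collected, or None if no records
    mode:         "Add New Columns", "Recreate Columns", or "Preserve Columns"
    """
    if mode not in ("Add New Columns", "Recreate Columns", "Preserve Columns"):
        raise ValueError(f"unknown schema update type: {mode}")
    # No records collected, or Preserve Columns: the schema is left unchanged.
    if first_record is None or mode == "Preserve Columns":
        return list(existing)
    incoming = list(first_record)
    if mode == "Recreate Columns":
        # Schema is rebuilt from the first record only.
        return incoming
    # Add New Columns: keep every existing column, append unseen ones.
    return list(existing) + [c for c in incoming if c not in existing]


existing = ["Id", "Name", "Region"]
first_record = {"Id": 7, "Name": "Acme", "Owner": "jdoe"}

added = update_schema(existing, first_record, "Add New Columns")
recreated = update_schema(existing, first_record, "Recreate Columns")
preserved = update_schema(existing, first_record, "Preserve Columns")
```

Here `recreated` no longer contains `Region`, which illustrates how columns can “go missing” from the schema, while `added` keeps `Region` and gains `Owner`.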
Is Paused
The Is Paused Toggle allows users to enable or disable a schedule. The toggle default is “false”. If the toggle is enabled (“true”) the schedule is paused.
Details
The Details section records who created and last updated the collector, along with the corresponding times. This allows for easy tracking of multiple collectors.
Logs
Latest Run
The latest run section documents when the Collector was created, started, completed and the total amount of data scanned. The status includes information regarding the state of the Collector. This allows for easy tracking of multiple collections.
Creating a Favorite
Creating a favorite is simple. Users may favorite a Credential, Collector, Data Target, Query, Data Source, or Workflow. To create a favorite, click the star icon in the upper left, next to Edit.
Please note that users cannot favorite an Unpivot, Data Quality Report, Schema, Agent or Notification.
Saved Views
Saved views are a DataBlend feature that lets users quickly return to filtered searches. To set a saved view, click the gear icon in the upper right corner. A drop-down will appear with options to save the current view, restore the default view, or copy a share URL. Copying a share URL allows other users with the URL to view the same saved view.
Want to see more? Visit our helpful demo page or attend an office hour.