Collector Basics

Overview

A DataBlend collector defines how data should be pulled from an external system like Salesforce or FTP into DataBlend. It links how to access the external system (DataBlend credential), what data should be collected, and where the data should be stored (DataBlend data source and schema).

A collection is a specific execution of a collector. If the collection succeeds, it creates a stream under the configured data source and schema. If it fails, an error is reported in the collection log. A schema with fields does not need to exist before the collector is created; the collector populates the schema.

Streams are created by collections, but exist independently of them; the collector configuration can be changed or the collection history purged without impacting the collected data.

Running a Collector

Collectors are used to pull data from source systems. Each collector is tied to a Credential. 

To set up a new Collector, click the Add button.

Select the type of system to collect data from. A credential for that system must already be set up to make the connection.

Depending on the Collector type, users can specify the schemas, fields, filters, and/or initial queries to collect from the source system.

Parameters

Many collectors can filter the data they collect. For example, the QuickBooks Online Profit and Loss Detail collector accepts a date range and returns only transactions dated within that range. If the filter values will not change, set them directly in the collector configuration. In other situations, the filter value may change over time (for example, the start of the fiscal quarter) or may need to be set through a workflow (for example, when multiple collectors run in sequence for the same set of accounts or the same date range). In these situations, users should set up collection parameters.

Custom Relative Date

Calendar Type

The “Default” calendar type lets users define a period offset from a point in time, choose the start or end of that period, and set how the result is formatted.

The “Fiscal” calendar type limits the period type to Quarters and Years, and the year’s start date is based on the group settings.
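As an illustration of how a fiscal calendar shifts period boundaries, the following is a minimal Python sketch, not DataBlend's implementation. It assumes the fiscal year begins on the first day of a month; the function names and the fiscal_start_month argument are hypothetical stand-ins for the group setting.

```python
from datetime import date


def fiscal_year_start(day, fiscal_start_month):
    """First day of the fiscal year containing `day`.

    Assumption: the fiscal year starts on the 1st of `fiscal_start_month`.
    """
    year = day.year if day.month >= fiscal_start_month else day.year - 1
    return date(year, fiscal_start_month, 1)


def fiscal_quarter_start(day, fiscal_start_month):
    """First day of the fiscal quarter containing `day`."""
    # Months elapsed since the fiscal year began (0..11).
    months_into_year = (day.month - fiscal_start_month) % 12
    # Months elapsed since the current fiscal quarter began (0..2).
    months_into_quarter = months_into_year % 3
    # Step back that many months, rolling over the calendar year if needed.
    index = day.year * 12 + (day.month - 1) - months_into_quarter
    year, month_zero_based = divmod(index, 12)
    return date(year, month_zero_based + 1, 1)
```

With a fiscal year that starts in July, a date in mid-May falls in the quarter that began on April 1, and in the fiscal year that began on July 1 of the previous calendar year.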

Period Type

Options are Hour, Day, Week, Month, Quarter, and Year.

Range Type

Options are Default, Start Of, and End Of. Default does not change the base date and will not truncate any values. This is useful if the parameter should be exactly one week ago instead of the beginning of last week.

Offset

Determines how many periods should be added or subtracted from the base date.

As Of

Allows the user to change the base date. Otherwise, the time (UTC) at which the job runs is used.

Format

Useful in scripts to format the date according to .NET date and time format strings.
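The way Period Type, Range Type, Offset, As Of, and Format combine can be sketched in Python. This is an assumption-laden illustration rather than DataBlend's code: it assumes the offset is applied before the Start Of or End Of truncation, that weeks start on Monday, and that End Of means the last second of the period. The function names are hypothetical, and Python strftime patterns stand in for the .NET format strings that DataBlend uses.

```python
import calendar
from datetime import datetime, timedelta, timezone


def shift_months(dt, months):
    """Move dt by whole months, clamping the day to the target month's length."""
    index = dt.year * 12 + (dt.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def shift(dt, period, offset):
    """Add (or subtract, if negative) `offset` periods to dt."""
    if period == "Hour":
        return dt + timedelta(hours=offset)
    if period == "Day":
        return dt + timedelta(days=offset)
    if period == "Week":
        return dt + timedelta(weeks=offset)
    months = {"Month": 1, "Quarter": 3, "Year": 12}[period]
    return shift_months(dt, months * offset)


def start_of(dt, period):
    """Truncate dt to the beginning of its period."""
    if period == "Hour":
        return dt.replace(minute=0, second=0, microsecond=0)
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "Day":
        return midnight
    if period == "Week":
        # Assumption: weeks start on Monday.
        return midnight - timedelta(days=midnight.weekday())
    if period == "Month":
        return midnight.replace(day=1)
    if period == "Quarter":
        return midnight.replace(month=3 * ((dt.month - 1) // 3) + 1, day=1)
    if period == "Year":
        return midnight.replace(month=1, day=1)
    raise ValueError(f"unknown period type: {period}")


def resolve(period, offset=0, range_type="Default", as_of=None, fmt="%Y-%m-%d"):
    """Resolve a relative date to a formatted string."""
    # As Of overrides the base date; otherwise use the current UTC time.
    base = as_of or datetime.now(timezone.utc)
    moved = shift(base, period, offset)
    if range_type == "Start Of":
        moved = start_of(moved, period)
    elif range_type == "End Of":
        # Assumption: End Of is one second before the next period begins.
        moved = shift(start_of(moved, period), period, 1) - timedelta(seconds=1)
    return moved.strftime(fmt)
```

For a base date of Wednesday, May 15, 2024, an offset of -1 Week with the Default range type gives exactly one week earlier (May 8), while Start Of gives the beginning of the previous week (May 6 under the Monday assumption).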

String

A String parameter holds characters, text, numbers, or symbols.

Date

Date parameters provide users the ability to collect data within a specific window of time. The dates are entered as specific dates such as Month/Day/Year.

Encrypted Value

Encrypted values are useful for passing encrypted information from a Query to a Data Target. Please note that values are stored encrypted and cannot be decrypted by non-admins. Parameters must be used with the notation. For example, if a header value in the HTTP data target is an API key that a user wants encrypted, they can add a parameter.

Query Result

A Query Result parameter is useful to replace a query result with a specific value. Users must select a specific Query from the drop-down menu from which to insert the intended value.

Relative Date

Relative Date parameters provide users the ability to collect data within a relative window of time. The dates are selected from a wide variety of timeframes, such as the start of the first quarter or the end of the last quarter.

Boolean

A Boolean parameter holds a True, False, or NULL value.

Advanced

Each field below is listed with its name, whether it is Required or Optional, and comments.

Schema Update Type

Required

Add New Columns (formerly Add Only): Existing schema columns are preserved regardless of whether they exist in the first record collected. New columns identified in the first record collected are added to the schema. If the collection returns no records, the existing schema is unchanged.

Recreate Columns (formerly Auto): Default. Schema is recreated from the first record collected. If the collection returns no records, the existing schema is unchanged.

Preserve Columns (formerly Manual): No changes are made to the schema during collection.

If the schema is changed by a collection, data in previous streams may no longer be available. (Columns may “go missing”.) When in doubt, set to Add New Columns.
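The three Schema Update Type options can be compared with a small Python sketch. It is a simplified model and not DataBlend's implementation: a schema is treated as an ordered list of column names only, and update_schema and its arguments are hypothetical names.

```python
def update_schema(existing, first_record, mode):
    """Return the schema's column names after a collection.

    existing:     column names currently in the schema
    first_record: dict for the first record collected, or None if the
                  collection returned no records
    mode:         "Add New Columns", "Recreate Columns", or "Preserve Columns"
    """
    if mode not in ("Add New Columns", "Recreate Columns", "Preserve Columns"):
        raise ValueError(f"unknown schema update type: {mode}")

    # Preserve Columns never changes the schema; an empty collection
    # leaves the schema unchanged in every mode.
    if mode == "Preserve Columns" or first_record is None:
        return list(existing)

    incoming = list(first_record.keys())
    if mode == "Recreate Columns":
        # The schema is rebuilt from the first record only.
        return incoming

    # Add New Columns: keep every existing column, append unseen ones.
    return list(existing) + [col for col in incoming if col not in existing]
```

In this model, a schema with columns id and name that receives a first record containing only id and email ends up with id, name, email under Add New Columns, but only id and email under Recreate Columns. The dropped name column is the kind of column that can “go missing” from earlier streams.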

History Retention (Days)

Optional

Defaults to zero. Users may set the number of days for which collector data is stored.

Timeout (seconds)

Optional

Timeout lets users time out collections that take longer than a set number of seconds to collect data.

Run As

Required

Run As allows users to select, from a drop-down list of users, the user under which the Collector runs.

Schedule

Optional

The Schedule option is a convenient way for users to make sure collections are running at the desired time. Simply select from the presets menu provided.

Is Paused

The Is Paused toggle allows users to pause or resume a schedule. The toggle defaults to “false”. If the toggle is enabled (“true”), the schedule is paused.

Details

The Details section documents who created and last updated the collector, and when. This allows for easy tracking of multiple collectors.

Logs

Job logs are accessible via the Latest Run section at the top of the page. Clicking the linked timestamp takes the user to the Details section, where users can view items, details, and logs related to the job that ran. Logs can be downloaded with the download log button at the lower left of the log section. Logs show how much data was collected, the steps taken, and the time at which each step occurred.

Latest Run

The Latest Run section documents when the latest run of the Collector was created, started, and completed, and the total amount of data scanned. The status describes the state of the run. This allows for easy tracking of multiple collections.

Creating a Favorite

Users may favorite a Credential, Collector, Data Target, Query, Data Source, or Workflow. To create a favorite, click the star icon on the upper left next to Edit.

Please note that users cannot favorite an Unpivot, Data Quality Report, Schema, Agent or Notification.

Saved Views

Saved views allow users to quickly return to filtered searches. To set a saved view, click the gear icon in the upper right corner. A drop-down will appear with options to save the current view, restore the default view, or copy a share URL. Copying a share URL allows other users with the URL to see the same saved view.
