CSV File Connector Configuration Guide

This is a general guide to configuring a CSV Connector for a custom data source. The CSV Connector configuration guides for specific data sources may also be useful references.

 

The general prerequisites for building a One Model custom integration with any application that has an API are:

  • The application must have a public API.
  • You should have the various URLs (base and endpoint/file), token URL(s) and scope(s) (if applicable) required by the API.  These are typically found in the API documentation published by the application’s owner.
  • You know what type of authentication the API requires (Basic, Bearer Token, OAuth or Header Token) and have the usernames and secret keys or passwords required for that authentication type.
  • You know what CSV files are available, how these are organized and what data is available from them.
  • You know any requirements or expectations the data source has when data is requested from the API.
  • You understand how, and in what format, the data source will return the requested data.

Most data sources publish documentation that describes the type of authentication used by the API, how to set up access credentials, where to access the API and its endpoints, what information must be passed to the data source when retrieving data, and how the queried data will be returned.  Having this documentation on hand when configuring a connector can be very helpful.

Be selective about which endpoints/files are needed from the API, as the data retrieved from one endpoint may make another unnecessary.  For example, if an endpoint provides a list of all departments and their details (/departments), it may be redundant to include an endpoint that queries each department individually (/departments/{departmentid}).  The API documentation can help with determining and comparing what each endpoint/file returns.
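As a rough illustration of that comparison, the sketch below flags per-item endpoints whose collection endpoint is already in the list. The function name and the rule it applies (a trailing {placeholder} segment marks a per-item endpoint) are assumptions made for this example; whether a flagged endpoint is really redundant still depends on what the API documentation says each one returns.

```python
import re


def possibly_redundant_endpoints(endpoints):
    """Return per-item endpoints whose parent collection endpoint is also listed."""
    listed = set(endpoints)
    flagged = []
    for path in endpoints:
        # A final "/{something}" segment is treated as a per-item lookup.
        match = re.fullmatch(r"(.+)/\{[^/{}]+\}", path)
        if match and match.group(1) in listed:
            flagged.append(path)
    return flagged


candidates = ["/departments", "/departments/{departmentid}", "/employees/{id}"]
flagged = possibly_redundant_endpoints(candidates)
```

Here only /departments/{departmentid} is flagged, because /departments is also in the list; /employees/{id} is kept since no /employees endpoint was selected.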

Please note that, for security reasons, any Password, Client Secret and/or Token (depending on the method of authentication) will need to be re-entered each time the connector is created or modified.

 


Connector Details

 
Name

The label that will be used to identify this data source.  Choose one that is meaningful and makes the data source easy to identify.

Data Loads Should Process Data

Enable this option to trigger processing (e.g. custom SQL, processing script and cache warming) after data has been loaded into the database.

This defaults to On; however, it may be preferable to leave this option Off until the data source configuration is finished and tested.

Enable Debug Mode

This option allows for logging of the API request and response.

No change to default configuration required.  In the event that debug mode is needed, One Model will organize the activation of debug mode and the retrieval of the logs.

Restricted Data

This option restricts downloading of the data files produced by this data source.  It is typically used with sensitive data (e.g. survey data) to ensure that the data cannot be accessed before it has been processed and aggregated.

In the event that restricting data is required, One Model will organize the activation of restricted data.

Message Queue Capacity

Sets the maximum number of Simple Queue Service (SQS) messages that can be processed at a time by the SQS queue of the API service.

No change to default configuration required.  In the event that a change to the message queue is needed, One Model will manage the appropriate capacity.

Schema

The label for the schema that will hold the data retrieved for this data source.  Choose one that is meaningful and makes the data source easy to identify when querying its data via SQL Explorer or in the Processing Script.

Processed To

No configuration required.  This will auto-populate with the most recent date the data source was run.

File Uri

The URL of the API’s hostname for your data source.  Refer to your API’s documentation to find the URL; it is typically listed as the “Base URL”, “Root Endpoint” or “Host”.  Note that, where both exist, a production environment’s API hostname may look similar to that of a test environment.

 


Authentication Types

Basic: key or user + password credentials

Username

The user identifier that has access to the API.  If the data source has provided a key for authentication, enter the key here.

Password

The password or secret key for the user identifier.  If the data source has provided a key for authentication, enter the key as the Username; the Password can then be any string (e.g. xx).

Bearer Token: digital key credentials

Bearer Token

An alphanumeric string, as provided by the data source.  Bearer tokens can expire, in which case a new token must be created.
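For reference, this is how Basic and Bearer credentials are conventionally placed in an HTTP Authorization header. It is a sketch of the standard HTTP schemes, not a description of the connector’s internals; the function names and example values are illustrative.

```python
import base64


def basic_auth_header(username, password="xx"):
    """Build a standard HTTP Basic Authorization header.

    When the data source issues a single key, pass it as `username`;
    the password is then only a placeholder (any string, e.g. "xx").
    """
    credentials = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


def bearer_auth_header(token):
    """Build a standard HTTP Bearer Authorization header."""
    return {"Authorization": f"Bearer {token}"}
```

The Basic header is only base64-encoded, not encrypted, which is one reason these credentials should be sent over HTTPS only.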
OAuth Client Credential: Client ID/Secret are traded for a temporary token

Access Token URL

The URL where the Client ID and Client Secret are sent to get the access token.  It may also be referred to as an OAuth Endpoint, Access Token, Token Request Endpoint or Identity Provider (IdP), or may appear in the API documentation as POST /token.  The Access Token URL is almost always different from the File Uri.

Client ID

The user identifier that has access to the API.

Client Secret

The secret key for the user identifier.

Scopes

A list of the permissions granted to the Client that will be used by the connector.  If a permission is listed here that has not been granted to the Client, the API will reject the connector.  Refer to your API documentation for the appropriate way to format and separate the scopes.

Scopes may be left empty; however, some APIs require scopes and will reject the connector if none are listed.
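An exchange like this typically follows the OAuth 2.0 client credentials grant. The sketch below builds the form-encoded body that would be sent in a POST to the Access Token URL. It assumes the API accepts the Client ID and Client Secret in the request body and separates scopes with a space; both should be checked against your API documentation. The function name and example values are hypothetical.

```python
from urllib.parse import urlencode


def client_credentials_form(client_id, client_secret, scopes=(), scope_separator=" "):
    """Build the form body for an OAuth 2.0 client credentials token request."""
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    # Leave the scope parameter out entirely when no scopes are configured.
    if scopes:
        form["scope"] = scope_separator.join(scopes)
    return urlencode(form)


body = client_credentials_form("abc", "s3cret", ["read", "write"])
```

Some APIs expect the Client ID and Secret in a Basic Authorization header rather than in the body, and some separate scopes with commas, which is why the separator is a parameter here.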

OAuth Static Refresh Token: a ‘master key’ used to generate access tokens

Access Token URL

The URL where the Client ID, Client Secret and Refresh Token are sent to get the access token.  It may also be referred to as an OAuth Endpoint, Access Token, Token Request Endpoint or Identity Provider (IdP), or may appear in the API documentation as POST /token.  The Access Token URL is almost always different from the File Uri.

Client ID

The user identifier that has access to the API.

Client Secret

The secret key for the user identifier.

Scopes

A list of the permissions granted to the Client that will be used by the connector.  If a permission is listed here that has not been granted to the Client, the API will reject the connector.  Refer to your API documentation for the appropriate way to format and separate the scopes.

Scopes may be left empty; however, some APIs require scopes and will reject the connector if none are listed.

Refresh Token

A secret key that rarely, if ever, changes.  This key is used in conjunction with the Client ID and Client Secret to gain access to the API.
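A refresh-token exchange usually follows the OAuth 2.0 refresh token grant: the Refresh Token is posted to the Access Token URL along with the Client ID and Client Secret, and the response carries the temporary access token. The sketch below builds that request body and reads the token from a typical JSON response. The function names, the sample response and the assumption that the API returns an access_token field are illustrative only.

```python
import json
from urllib.parse import urlencode


def refresh_token_form(client_id, client_secret, refresh_token):
    """Build the form body for an OAuth 2.0 refresh token request."""
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    })


def access_token_from_response(response_text):
    """Extract the access token from a JSON token response."""
    payload = json.loads(response_text)
    return payload["access_token"]


sample_response = '{"access_token": "t-1", "token_type": "Bearer", "expires_in": 3600}'
token = access_token_from_response(sample_response)
```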
Header Token: a key used in the API request header

Header Name

The name of the header that carries the token, as defined by the application that provides the API.  It is typically found in the application’s API documentation.

Header Token

The secret key.
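With header-token authentication the secret travels in a request header whose name the API defines. A minimal sketch, in which X-API-Key is a made-up header name and the function is hypothetical:

```python
def header_token_headers(header_name, header_token, extra_headers=None):
    """Return request headers with the API-defined token header added."""
    if not header_name:
        raise ValueError("header_name must be set for header-token authentication")
    headers = dict(extra_headers or {})
    headers[header_name] = header_token
    return headers


headers = header_token_headers("X-API-Key", "my-secret", {"Accept": "text/csv"})
```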

 


Files

 
Name

A label for the file being retrieved.  Choose one that is meaningful and makes the file easy to identify.

Relative URI

The path to the file.  This will be appended to the API’s base URL (the File Uri), so there is no need to include the full URL string.

Table Name

A label for the table that will hold the file’s data.  Choose one that is meaningful and makes the file’s data easy to identify when querying it via SQL Explorer or in the Processing Script.

Delimiter

The character used to separate the file’s contents into columns.  A comma, pipe or tab is most commonly used.

Has Header

Whether the file being retrieved has a header row.  Enabling this option directs the connector to use the first row for column names.  Column names in other rows are not supported.

Quote All Fields

Whether the column data should be enclosed in quotes.  Columns in the file that are already quoted will not be quoted again.
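To make the file settings concrete, the sketch below mimics them with Python’s standard csv module: joining the File Uri and Relative URI, splitting on a delimiter, taking column names from the first row, and quoting every field on output. It illustrates the concepts only and is not the connector’s actual implementation; the function names, the example URL and the generated column_N names used when there is no header are assumptions.

```python
import csv
import io


def file_url(file_uri, relative_uri):
    """Append the relative path to the base URL without doubling slashes."""
    return file_uri.rstrip("/") + "/" + relative_uri.lstrip("/")


def parse_file(text, delimiter=",", has_header=True):
    """Split delimited text into (column_names, data_rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    if has_header:
        return rows[0], rows[1:]
    # No header row: invent placeholder names (an assumption for this sketch).
    columns = [f"column_{i}" for i in range(1, len(rows[0]) + 1)]
    return columns, rows


def quote_all(columns, rows, delimiter=","):
    """Write rows back out with every field enclosed in double quotes."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return out.getvalue()


url = file_url("https://api.example.com/v1/", "/exports/departments.csv")
columns, rows = parse_file('id|name\n1|"Sales"\n', delimiter="|")
quoted = quote_all(columns, rows)
```

Because the reader strips the quotes around "Sales" before the writer adds its own, the already-quoted field ends up with a single pair of quotes rather than two.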

Media Type

Media types allow the connector to set the appropriate method of data transfer when retrieving the file.

None

Choose “None” if you are unsure of the file’s media/content type.  The file will be processed as a CSV.

Binary File

For files of content type “application/octet-stream”.

CSV

For files of content type “text/csv”.

Plain Text

For files of content type “text/plain”.
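The options above map onto standard content types as shown in the sketch below. The mapping comes from the list above; sending the value as an Accept header is only an assumption about how a client might use it, and the names are hypothetical.

```python
# Media Type option -> content type, as listed above.
MEDIA_TYPES = {
    "None": None,  # unknown type; the file is still processed as a CSV
    "Binary File": "application/octet-stream",
    "CSV": "text/csv",
    "Plain Text": "text/plain",
}


def accept_header(media_type_option):
    """Return an Accept header for the option, or no header for "None"."""
    content_type = MEDIA_TYPES[media_type_option]
    return {"Accept": content_type} if content_type else {}
```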

 


Running the Data Source

CSV connectors should typically use a destructive API run, which removes and replaces any data retrieved in previous runs.  Depending on the data source, historical data may be captured by a destructive run.  If it is not, please discuss options for merging datasets and building this history with Customer Success.
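The difference between replacing and merging can be pictured with the sketch below. It is a conceptual illustration only: the function names are hypothetical, and the merge rule shown (keep earlier rows, overwrite those with a matching key) is just one possible approach, not necessarily the one Customer Success would recommend.

```python
def destructive_load(existing_rows, incoming_rows):
    """Discard everything from earlier runs and keep only the new rows."""
    return list(incoming_rows)


def merge_load(existing_rows, incoming_rows, key):
    """Hypothetical merge: keep earlier rows, replacing any with the same key."""
    merged = {row[key]: row for row in existing_rows}
    merged.update({row[key]: row for row in incoming_rows})
    return list(merged.values())


previous = [{"id": 1, "name": "Sales"}, {"id": 2, "name": "Ops"}]
latest = [{"id": 2, "name": "Operations"}, {"id": 3, "name": "Legal"}]

replaced = destructive_load(previous, latest)
merged = merge_load(previous, latest, key="id")
```

After the destructive load, department 1 is gone because it was absent from the latest file; the merge keeps it and updates department 2.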

 

 
