Setting up a Data Source for a Delimited Input

Adding a Delimited Input file to a File data source.

 

Using the main menu, navigate to the Data dropdown and select Sources.

 

Determine the data source grouping you will add your new file to, or create a new one. Click the Add Input (+) button at the top of the data source, then select the 'Add Delimited Input' option.

 

You’ll be presented with an editor. It lists a set of fields that you fill out to define the file you are loading and how it is structured. Some fields have default values applied; a full list of fields and their meanings is provided in this article.

 

For our example, we are going to write a load definition for a basic mapping file:

 

Here is an overview of the fields we will fill in from the top section.

  • fileNamePattern: the name of the file that will be loaded as a full/destructive file (completely replacing our previous dataset). We will use Custom Location Mappings%.csv
    The % acts as a wildcard that allows the filename to include other characters (such as a date/timestamp), so it does not have to be an exact match; for example, a file named Custom Location Mappings 2024-06-30.csv would match this pattern. Note that files are matched in a case-insensitive manner, so capitalisation is not important.

  • incrementalFileNamePattern: we will leave this blank as it is for incremental files only. If we were setting up an incremental file, we would specify the incrementalFileNamePattern and also set isKey and keyOrder on all of the columns that are part of the incremental key (a sketch is shown after this list).

  • schema: defines the schema that the data is loaded into in OneModel's database. We recommend grouping files into schemas logically and keeping separate schemas for different data sources. This makes data easier to find later when logic is added to the site.

  • table: defines the table that the data is loaded into in OneModel's database. Again, we recommend naming this clearly so it is easy to find when logic is added later in OneModel.

  • delimiter: the character used as the column delimiter in your file.

  • encoding: the encoding of the file; if in doubt, the default is fine.

  • fileNameColumn: defines a custom column name in which the tool will store the original filename within our table. This can be helpful for incremental loads to see which file a record was loaded from. If you wish to use this, setting it to something such as 'filename' is recommended. We won't use this for our mapping file.

  • fileNameColumnIsKey: an alternative option for incremental file loads where the data is loaded incrementally based on the file name instead of (or as well as) key columns. We won't be using this.

  • required: true or false. We won't be using this so the default of true is fine.

  • hasHeaderRows: true or false depending on whether the file has at least one header row.

  • headerRowsCount: the number of header rows the file has. The tool will skip these rows at the start of the file and not load them into our table since they are not data.

  • quoteCharacter and escapeCharacter: we can process files that contain new lines within a row, as long as the field is wrapped in the appropriate quotes using a standard CSV format. Double quotes are the most common and this is what we'll set our file up with. Although our file is not quoted, we must still enter a value in these fields.

  • ignoreQuotes: true or false. We'll set this to false, which is recommended unless there is a specific need identified.

  • skipInvalidRows: true or false. We recommend this is set to false unless there is a specific need identified. With false, the Data Load will error on site and not load any data from the file if there is an issue with it, which means we don't lose our prior data in the case of a broken file. If set to true, the tool will skip the errored records and load the valid records as normal.
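
For reference, if this were an incremental file instead of a full load, a minimal sketch of the parts that change might look like the one below. The file name patterns and key column here are purely illustrative, and the isKey and keyOrder values are an assumption for illustration based on the notes above.

fileNamePattern: Custom Location Mappings Full%
incrementalFileNamePattern: Custom Location Mappings Delta%
columns:
- name: level_1
  type: String
  size: 100
  isKey: true
  keyOrder: 1

All of the other fields would be filled out in the same way as the full load below.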

Here is what we end up with for our mapping file:

fileNamePattern: Custom Location Mappings%
incrementalFileNamePattern:
schema: mapping
table: location
delimiter: ','
encoding: Utf8
fileNameColumn:
fileNameColumnIsKey: false
required: true
hasHeaderRows: true
headerRowsCount: 1
quoteCharacter: '"'
escapeCharacter: '"'
ignoreQuotes: false
skipInvalidRows: false

 

Next we need to set up the columns that the file is expected to contain. Each column in the file must be defined in the column list, in the same order as the file. The column names in the data source do not have to match the header names in the file, as long as they correctly represent the data in the file. Column names in your data source must be lower case, start with a letter and contain only letters, numbers and underscores (_); for example, level_1 is valid but Level 1 is not.


Our mapping file only has strings, but here are some examples of different data types.

 

For string fields we need to set these as a minimum:

- name: objid
  type: String
  size: 20

 

For integer fields we need to set these as a minimum:

- name: divgv
  type: Integer

 

For numeric fields we need to set these as a minimum:

- name: divgv
  type: Decimal
  precision: 18
  scale: 4

 

For date or timestamp fields we need to set these as a minimum:

- name: begda
  type: Timestamp
  dateFormat:
  - yyyyMMdd
  - yyyy-MM-dd

For details on setting the correct date format or timestamp format, please see the help article on it here.

 

Once you have all columns set up, press the Save button at the top of the page. If anything is configured incorrectly, it will fail to save and an error message will be displayed at the top of the script. Once saved, if you reload the script you might see some additional fields present with blank values; this is expected.

 

Our final data source script ready to save will look like this:

 

fileNamePattern: Custom Location Mappings%
incrementalFileNamePattern:
schema: mapping
table: location
delimiter: ','
encoding: Utf8
fileNameColumn:
fileNameColumnIsKey: false
required: true
hasHeaderRows: true
headerRowsCount: 1
quoteCharacter: '"'
escapeCharacter: '"'
ignoreQuotes: false
skipInvalidRows: false
columns:
- name: level_1
  type: String
  size: 100
- name: level_2
  type: String
  size: 100
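
To tie this together, a file matching this definition might look like the following; the header names and location values are purely illustrative.

Level 1,Level 2
Australia,Sydney
United Kingdom,London

The header row is skipped because hasHeaderRows is true and headerRowsCount is 1, and the two data columns load into level_1 and level_2 in order. If a value ever contained the delimiter or a new line, it would need to be wrapped in our quoteCharacter, for example "Sydney, NSW".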
