File Upload Connector Configuration Guide


This article covers how to configure the File Upload connector in One Model to ingest delimited, JSON, or XML files. It is intended for Data Admins setting up a new file-based data source. By the end you will have created a data source, configured a file input, and saved a working YAML file definition ready for data loading.

 

Before you start setting up a connector, ensure:

  • You have Data Admin permissions in One Model
  • You can access Data > Sources from the One Model menu
  • You know the details of the file you wish to upload: type of file, column delimiters, data formats

The File Upload method is used for one-off imports of simple files, either uploaded manually by a user or transferred automatically via One Model's hosted SFTP server. When using One Model hosted SFTP, customers push files to the One Model SFTP server, which monitors for new files and retrieves them automatically. SFTP users (for SFTP files) and file definitions are configured within the data source.

 

Creating a File Upload Data Source

From Data > Data Sources, click Add Data Source:

 

Select the One Model File Upload / SFTP connector:

 

Complete the data source configuration and click Add:

Configuring the Data Source
Name The label that will be used to identify this data source. One data source can contain multiple inputs - choose a name that is meaningful and helps with easy identification of the group of inputs included in the data source.
Data loads should process data Enable this option to trigger processing (e.g. custom SQL, processing script and cache warming) after data has been loaded into the database. This defaults to On; however, it may be preferable to leave it Off until the data source configuration is finished and tested.
Expire data load if files missing after (minutes) The data load will expire this many minutes after the last file was received if not all expected files for the source have been included.
Use batching Batches data together to improve the performance of loading a large number of files into a single table. Enable this if you have multiple files that are all intended for the same table destination - e.g. Absence-Jan, Absence-Feb, Absence-Mar files all loading into the same Absence table.

Your new File Upload data source will now be available in the Data Sources list. It can be managed via its options:

  1. Upload file/s to the data source
  2. Add/configure SFTP Users for the data source
  3. Schedule when files are expected (note: this can also be used to configure Alerts)
  4. Configure the data source
  5. Add File Input
  6. Delete the data source (note: this will also delete all file inputs)

 


File Input Types and YAML Configuration Overview

The File Upload connector accepts three types of files: delimited, JSON, and XML. A YAML editor is used to configure the file definitions (column names, data types, and format). YAML is a plain-text format that uses indented key-value pairs to define settings. Each field appears as fieldName: value. YAML is case-sensitive and uses consistent indentation (spaces, not tabs) to structure the configuration. If there are any syntax errors, the editor will not save and will display an error message at the top of the script. Some fields have default values applied automatically; after saving, if you reload the script you may see additional fields present with blank values - this is expected.

The configuration fields common to all delimited, JSON, and XML file inputs are listed below. Refer to each file type section for further details on their specific YAML configuration.

Configuration for Delimited, JSON and XML File Inputs
fileNamePattern The name of the file that will be loaded as a full/destructive file (completely replacing the previous dataset), e.g. Original Locations Mappings%.csv. The % acts as a wildcard that allows the filename to include other characters (such as a date/timestamp) so it does not have to be an exact match. Files are matched in a case-insensitive manner so capitalisation is not important.
incrementalFileNamePattern The name of the file that will be loaded incrementally (if applicable), e.g. New Locations Mappings%. Use for incremental files only. When incremental files are loaded, their contents are appended to any data that currently exists in the defined table. Ensure incremental file name patterns don't conflict with the defined fileNamePattern, particularly where wildcards are used. Incremental file data must be formatted in the same way (same columns in the same order) as the main file. fileNameColumn and fileNameColumnIsKey are also required where incremental files are used.
schema The name of the schema that the data is loaded into in One Model's database. We recommend grouping files into schemas logically and having separate schemas for different data sources. This makes data easier to find later when building data processing scripts and querying data via SQL Explorer.
fileNameColumn The name of a custom column in which the tool will store the input filename within the table. This can be helpful for incremental loads to see which file a record was loaded from. The name filename is often used, however any name (that has not been used anywhere else in the definition) may be used. Required where incrementalFileNamePattern has been defined.
fileNameColumnIsKey A true/false option used with incremental file loads where the input filename (from the fileNameColumn column) is used as (or part of) the data key. Required where incrementalFileNamePattern has been defined.
required Applicable where the data source has enabled Data loads should process data. If set to true, the file input needs to be successfully processed before beginning a process data load.
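The % wildcard behaviour described for fileNamePattern can be sketched in a few lines of Python. This is purely illustrative (One Model's actual matcher is not shown here); it assumes % matches any run of characters and that comparison is case-insensitive, as described above:

```python
import fnmatch

def matches_pattern(filename: str, pattern: str) -> bool:
    """Illustrative sketch: % matches any run of characters,
    and matching is case-insensitive."""
    # Translate the %-style pattern into fnmatch's *-style pattern,
    # lower-casing both sides to make matching case-insensitive.
    translated = pattern.replace("%", "*")
    return fnmatch.fnmatch(filename.lower(), translated.lower())

print(matches_pattern("Original Locations Mappings 2024-01.csv",
                      "Original Locations Mappings%.csv"))  # True
print(matches_pattern("New Locations Mappings 2024-01.csv",
                      "Original Locations Mappings%.csv"))  # False
```

Note how the wildcard lets a dated file such as Original Locations Mappings 2024-01.csv match a single fixed pattern.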

 


How to add and configure a delimited file input

Go to the data source grouping you will add your new file to and click + then Add Delimited Input:

 

Configure the file definition of the delimited input in the editor and click Save:

 

Here is an overview of the fields for the delimited file definition:

Configuration for Delimited File Inputs
table Defines the table that the data is loaded into in One Model's database. Using concise, descriptive names helps when using the data in data processing scripting or when querying the data in SQL Explorer.
delimiter The character used as the column delimiter in the input file, in single or double quotes. Use \t for tab delimiters.
encoding The character encoding of the file. Leave at the default (Utf8) if unsure.
hasHeaderRows Use true or false depending on whether the file has at least one header row.
headerRowsCount The number of header rows the file has. The tool will skip these rows at the start of the file and not load them into the table.
quoteCharacter The character used to wrap field values that contain the delimiter character, line breaks, or leading and trailing spaces. The most common quote character is a double quote ("). When a field is wrapped in the quote character, the connector reads everything between the opening and closing quote as a single value - including any delimiters or line breaks within it. Enter the character in single quotes, e.g. '"'.
escapeCharacter The character used to indicate that the character immediately following it should be treated as a literal value, not as a special character. For example, if your file uses a double quote as both the quote character and the escape character, a pair of double quotes ("") within a field tells the connector to read one literal double quote, not the end of the field. In most standard CSV files the escape character is the same as the quote character. Enter the character in single quotes, e.g. '"'.
ignoreQuotes Set to true if quote characters in the data should be ignored (treated as ordinary characters rather than field wrappers); otherwise false.
skipInvalidRows Use true or false to indicate whether the data load should ignore rows that do not meet the configuration definitions. We recommend setting this to false so that if there is an issue with the file input, the data load will error and not load any data from the file - retaining prior data in the case of a broken file. If set to true, the tool will skip the errored records and load the valid records as normal.
rowDelimiter The character used as the row delimiter in the input file, in single or double quotes.
ignoreFooterRows Defaults to false. Set to true if the file has at least one footer row that should be skipped and not loaded into the table.
footerRowsCount The number of footer rows to skip.
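The quoteCharacter and escapeCharacter behaviour described above follows standard CSV conventions, which Python's csv module also implements. A small illustration (not One Model code) of how a quoted field containing the delimiter and a doubled quote is read:

```python
import csv
import io

# A row where one field contains the delimiter (a comma) and another
# contains an escaped quote ("") inside a quoted value.
raw = 'E001,"Smith, Alex","She said ""hello"""\n'

reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar='"')
row = next(reader)
print(row)  # ['E001', 'Smith, Alex', 'She said "hello"']
```

The doubled quote inside the third field is read as one literal quote, and the comma inside the second field does not split it into two columns.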

 

Here is an example of the YAML file definition for a comma delimited file that has one header row, no footer rows and includes quotes on columns that contain the delimiter and/or line breaks:

fileNamePattern: Original Location Mappings%
incrementalFileNamePattern: New Location Mappings%
schema: mapping
table: location
delimiter: ','
encoding: Utf8
fileNameColumn: filename
fileNameColumnIsKey: true
required: true
hasHeaderRows: true
headerRowsCount: 1
quoteCharacter: '"'
escapeCharacter: '"'
ignoreQuotes: false
skipInvalidRows: false
rowDelimiter:
ignoreFooterRows: false
footerRowsCount: 0

 

Next, set up the definitions for the columns the file is expected to contain. Each column in the file must be defined in the column list, in the same order as it appears in the file. The column names in the YAML configuration do not have to match those in the data file, as long as they correctly represent the data. Column names in the definition must be lower case, start with a letter, and contain only letters, numbers, and underscores (_). As with table names, the column names set in the YAML configuration will be used in data processing and when querying the data via SQL Explorer, so it helps to be concise and descriptive.
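As a quick illustration of the column-name rules above (a sketch, not a One Model tool), a name can be checked with a simple pattern:

```python
import re

# Lower case, starts with a letter, only letters/numbers/underscores.
COLUMN_NAME_RE = re.compile(r"^[a-z][a-z0-9_]*$")

def is_valid_column_name(name: str) -> bool:
    return bool(COLUMN_NAME_RE.match(name))

print(is_valid_column_name("level_1"))    # True
print(is_valid_column_name("Level_1"))    # False (upper case)
print(is_valid_column_name("1st_level"))  # False (starts with a number)
```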

For string fields, set these as a minimum:

- name: objid
  type: String
  size: 20

For integer fields, set these as a minimum:

- name: divgv
  type: Integer

For decimal fields, set these as a minimum:

- name: divgv
  type: Decimal
  precision: 18
  scale: 4

For date or timestamp fields, set these as a minimum:

- name: begda
  type: Timestamp
  dateFormat:
  - yyyyMMdd
  - yyyy-MM-dd

Additional details on setting the correct date and timestamp format can be found in File Source - Delimited Input Date and Timestamp columns.
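To illustrate how listing multiple dateFormat patterns lets either representation parse, here is a Python sketch. The yyyyMMdd-style tokens are mapped to their rough strptime equivalents (%Y%m%d and %Y-%m-%d); One Model's own parser is not Python and is not shown here:

```python
from datetime import datetime

# Rough strptime equivalents of the two dateFormat patterns above:
# yyyyMMdd -> %Y%m%d, yyyy-MM-dd -> %Y-%m-%d.
FORMATS = ["%Y%m%d", "%Y-%m-%d"]

def parse_begda(value: str) -> datetime:
    """Try each configured format in order until one matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"{value!r} matches none of the configured formats")

print(parse_begda("20210315"))    # 2021-03-15 00:00:00
print(parse_begda("2021-03-15"))  # 2021-03-15 00:00:00
```

Both representations resolve to the same timestamp; a value matching neither pattern fails the parse.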

The final file definition, ready to save, combines the file and column definitions, e.g.:

fileNamePattern: Original Location Mappings%
incrementalFileNamePattern: New Location Mappings%
schema: mapping
table: location
delimiter: ','
encoding: Utf8
fileNameColumn: filename
fileNameColumnIsKey: true
required: true
hasHeaderRows: true
headerRowsCount: 1
quoteCharacter: '"'
escapeCharacter: '"'
ignoreQuotes: false
skipInvalidRows: false
rowDelimiter:
ignoreFooterRows: false
footerRowsCount: 0
columns:
- name: level_1
  type: String
  size: 100
- name: level_2
  type: String
  size: 100

 


How to add and configure a JSON file input

Go to the data source grouping you will add your new file to and click + then Add JSON Input:

 

Configure the file definition of the JSON input in the editor and click Save (note: the configuration will not save if there are syntax issues with the YAML configuration):

 

The JSON file input uses the same common YAML fields (fileNamePattern, schema, and others) described in the overview above. The JSON-specific configuration centres on the objects list, which tells One Model where to find the records in your JSON file and how to handle each set of data.

Configuration for JSON File Inputs
startAt An optional path to a nested node within the JSON file where processing should begin. Use this when the records you want are nested inside wrapper objects and you want to skip the outer structure. Leave empty to start from the root of the file. For nested paths, use dot notation, e.g. data.employees.
encoding The character encoding of the file. Leave at the default (Utf8) if unsure.
multiJsonObjectFileFormat Set to true if the file contains multiple JSON objects listed one after another on separate lines (newline-delimited JSON, sometimes called NDJSON or JSON Lines format). Set to false (the default) for standard JSON files that contain a single root object or array.

Next, set up the definitions for the JSON objects. Each entry in objects defines one table - the path to the repeating data, what to include, and what to skip. A single JSON file can define more than one object, loading data into multiple tables in one load.

Configuration for JSON Objects
name The name of the table that will be created in One Model's database for this object. Use lowercase letters, numbers, and underscores only.
path The path to the repeating element in the JSON that will become the rows of this table. Use dot notation for nested paths, e.g. employees or data.records. Wildcards are accepted, e.g. [*]. Leave empty to load from the root of the file or from the position defined in startAt.
cascadePaths A list of key names within the object that should be unpivoted and included as additional rows in this table, rather than being loaded as nested child data. Each entry is a plain key name as a string, e.g. - key1. Key names are case-sensitive and must match exactly as they appear in the JSON. Leave as an empty list ([]) if not required.
ignorePaths A list of paths within the object that should be excluded from loading entirely. Leave as an empty list ([]) if not required.
ignoreWhenNull Set to true to skip a row entirely when the value at the object path is null. Set to false (the default) to load the row with null values.
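The dot-notation paths used in startAt and path simply walk nested keys. An illustrative Python sketch of the idea (not One Model's parser):

```python
import json

def resolve_path(document: dict, path: str):
    """Walk a dot-notation path like 'data.records' through nested objects."""
    node = document
    for key in path.split("."):
        node = node[key]
    return node

doc = json.loads('{"data": {"records": [{"id": "001"}, {"id": "002"}]}}')
print(resolve_path(doc, "data.records"))  # [{'id': '001'}, {'id': '002'}]
```

Here data.records skips the outer wrapper and lands directly on the repeating records, which is exactly what startAt or path do in the configuration.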

 

Example: a simple JSON file

For a JSON file structured like this:

{
  "employees": [
    { "id": "E001", "name": "Alex Smith", "department": "Finance" },
    { "id": "E002", "name": "Jordan Lee", "department": "HR" }
  ]
}

The complete YAML definition would be:

fileNamePattern: employees%.json
incrementalFileNamePattern:
schema: hr
fileNameColumn:
fileNameColumnIsKey: false
required: true
startAt: ''
encoding: Utf8
multiJsonObjectFileFormat: false
objects:
- name: employees
  path: employees
  cascadePaths: []
  ignorePaths: []
  ignoreWhenNull: false

Example: a nested JSON file with a wrapper object

For a JSON file where records are nested inside a wrapper object, use startAt to point directly to the nested records:

{
  "data": {
    "records": [
      { "id": "001", "status": "active" },
      { "id": "002", "status": "inactive" }
    ]
  }
}

The complete YAML definition would be:

fileNamePattern: records%.json
incrementalFileNamePattern:
schema: hr
fileNameColumn:
fileNameColumnIsKey: false
required: true
startAt: 'data'
encoding: Utf8
multiJsonObjectFileFormat: false
objects:
- name: records
  path: records
  cascadePaths: []
  ignorePaths: []
  ignoreWhenNull: false

Example: a newline-delimited JSON (NDJSON) file

For files where each line is a separate JSON object, set multiJsonObjectFileFormat: true:

{ "id": "E001", "name": "Alex Smith" }
{ "id": "E002", "name": "Jordan Lee" }

The complete YAML definition would be:

fileNamePattern: employees%.json
incrementalFileNamePattern:
schema: hr
fileNameColumn:
fileNameColumnIsKey: false
required: true
startAt: ''
encoding: Utf8
multiJsonObjectFileFormat: true
objects:
- name: employees
  path: ''
  cascadePaths: []
  ignorePaths: []
  ignoreWhenNull: false

When using multiJsonObjectFileFormat: true, the path within the object definition can be left empty - each line is itself the record.
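Reading newline-delimited JSON differs from reading a standard JSON document: each line is parsed as its own object. A minimal Python sketch of the idea:

```python
import json

ndjson = '{"id": "E001", "name": "Alex Smith"}\n{"id": "E002", "name": "Jordan Lee"}\n'

# Each non-empty line is parsed as a separate JSON object (one record per line),
# rather than parsing the whole file as a single document.
records = [json.loads(line) for line in ndjson.splitlines() if line.strip()]
print(len(records))        # 2
print(records[0]["name"])  # Alex Smith
```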

 


How to add and configure an XML file input

Go to the data source grouping you will add your new file to and click + then Add XML Input:

 

Configure the file definition of the XML input in the editor and click Save (note: the configuration will not save if there are syntax issues with the YAML configuration):

 

XML files are hierarchical - data is structured as nested elements and attributes rather than flat rows and columns. The YAML configuration for an XML input maps this structure into flat database tables by defining table paths (the repeating elements that become rows) and using path syntax to identify the values to extract as columns.

Unlike delimited and JSON inputs, XML inputs do not have a table field at the top level. Instead, one or more tables are defined within the tables list, each with its own path into the XML structure. A single XML file can therefore populate multiple tables in one load.

Configuration for XML File Inputs
cascadeRowIdOrNull Set to true to add a unique system-generated ID to every record loaded from this file. This is useful when the XML data has no natural unique identifier and you need a reliable key for joins or incremental loads. Leave empty or set to null to disable this feature entirely.
cascadeRowIdOrNull > isKey Set to true to use the generated ID as the table's primary key, or false to include it as a non-key column. Requires cascadeRowIdOrNull to be configured.
dataElementName An optional field used when the XML file uses a non-standard wrapper element name for its data records. Leave empty in most cases.

Next, set up the definitions for the XML tables. Each entry maps a repeating element in the XML to a database table:

Configuration for XML Repeating Elements (Tables)
name The name of the table that will be created in One Model's database. Use lowercase letters, numbers, and underscores only. Must begin with a letter and be no more than 128 characters.
path The path to the repeating XML element that will become the rows of this table. Table paths always begin from the root of the XML document using // notation, e.g. //employees/employee. Any XML element that has two or more occurrences at the same path must be defined as a table.
recursive Set to true if the table element can contain nested elements of the same type (e.g. an org chart where a manager element can contain subordinate elements of the same structure). Otherwise set to the default false.
cascadePaths A list of child paths within the table element that should be unpivoted and loaded as additional rows into this table, rather than being treated as a separate child table. Each entry requires: name (the column name for the extracted value), path (relative to the current table element), and isKey (true/false). Leave empty if not required.
renamePaths A list of paths whose extracted values should be loaded under a different column name. Use this when the XML element name is not suitable as a column name. Each entry requires: name (the new column name) and path (relative to the current table element). Leave empty if not required.
ignorePaths A list of paths within the table element that should be excluded from loading entirely. Each entry requires only a path. Use this to filter out elements you do not need or that would cause column conflicts. Leave empty if not required.

Path syntax reference

All paths in the XML YAML configuration use the following token syntax. Table paths always begin from the root of the document; column paths (used in cascadePaths, renamePaths, and ignorePaths) always begin from the current table element.

Token Description Example Result
// Root accessor. Navigates from the root of the XML document. Used for table paths. //employees/employee The <employee> element from the root
/ Child accessor. Navigates to the next child element. Used in both table and column paths. //a/b   ./a/b Child element from root, or child of current element
. Current element accessor. Navigates from the current table element. Used in column paths. ./id The <id> child of the current element
@ Attribute accessor. Extracts an attribute value from the current element. ./@id The id attribute on the current element
.. Parent accessor. Navigates to the parent or a sibling element. ../dept   .. The sibling <dept> element, or the parent element

Rules to follow when constructing paths:

  • Table paths always begin from the root of the XML document (use //).
  • Column paths always begin from the current table element (use ./ or ../).
  • Any XML element with two or more occurrences at the same path must be defined as a table - it cannot be a column.
  • Columns cannot be duplicated within a table. If the same path would produce two columns, either define it as a table or use renamePaths to give one a distinct name.
  • All names (tables[].name, cascadePaths[].name, renamePaths[].name) must begin with a letter, contain only letters, numbers, and underscores, and be no longer than 128 characters.
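The path tokens above resemble, but are not identical to, XPath. As a rough illustration (using Python's ElementTree, not One Model's path engine, which may differ in detail), here is what a table path of //employees/employee with column paths ./@id and ./name would select:

```python
import xml.etree.ElementTree as ET

xml_doc = """
<employees>
  <employee id="E001"><name>Alex Smith</name></employee>
  <employee id="E002"><name>Jordan Lee</name></employee>
</employees>
"""

root = ET.fromstring(xml_doc)
# Table path //employees/employee: each repeating <employee> becomes a row.
# Column path ./@id reads an attribute; ./name reads a child element's text.
rows = [(employee.get("id"), employee.findtext("name"))
        for employee in root.findall("./employee")]
print(rows)  # [('E001', 'Alex Smith'), ('E002', 'Jordan Lee')]
```

Each repeating element yields one row, with the attribute and child-element values extracted as columns.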

Example: a simple XML file

In this example, each <employee> element becomes one row in the employees table. The id attribute and child elements (name, department, startDate) are automatically extracted as columns.

<employees>
  <employee id="E001">
    <name>Alex Smith</name>
    <department>Finance</department>
    <startDate>2021-03-15</startDate>
  </employee>
  <employee id="E002">
    <name>Jordan Lee</name>
    <department>HR</department>
    <startDate>2019-08-01</startDate>
  </employee>
</employees>

The complete YAML definition would be:

fileNamePattern: employees%.xml
incrementalFileNamePattern:
schema: hr
fileNameColumn:
fileNameColumnIsKey: false
required: true
cascadeRowIdOrNull:
  isKey: false
tables:
- name: employees
  path: //employees/employee
  recursive: false
  cascadePaths: []
  renamePaths: []
  ignorePaths: []
dataElementName:

Example: an XML file with a nested child table

Because <role> repeats, it must be defined as its own table. The two tables are linked automatically by One Model using a generated key - cascadeRowIdOrNull is set to isKey: true so that One Model generates a unique ID for each employee record that can be used to join the two tables. The ignorePaths entry on the employees table prevents the <roles> content from also being loaded as columns into the parent table.

<employees>
  <employee id="E001">
    <name>Alex Smith</name>
    <roles>
      <role>
        <title>Analyst</title>
        <from>2019-01-01</from>
      </role>
      <role>
        <title>Senior Analyst</title>
        <from>2021-06-01</from>
      </role>
    </roles>
  </employee>
</employees>

The complete YAML definition would be:

fileNamePattern: employees%.xml
incrementalFileNamePattern:
schema: hr
fileNameColumn:
fileNameColumnIsKey: false
required: true
cascadeRowIdOrNull:
  isKey: true
tables:
- name: employees
  path: //employees/employee
  recursive: false
  cascadePaths: []
  renamePaths: []
  ignorePaths:
  - path: ./roles/role
- name: employee_roles
  path: //employees/employee/roles/role
  recursive: false
  cascadePaths: []
  renamePaths: []
  ignorePaths: []
dataElementName:

 


Next steps

You have now configured your File Upload data source and file input. From here you can:

  • Upload a file to test the configuration via the Upload file/s option on the data source
  • Monitor your data load from Data > Loads. For additional information on data loads see the guide to Data Loads.
  • For additional assistance with date and timestamp column formats, see File Source - Delimited Input Date and Timestamp columns.
  • For questions, contact your One Model Customer Success team.

 

 
