Can I send data to another destination via an S3 bucket?
Yes, we have an Amazon S3 Data Destination so you can send data to another location via an S3 bucket.
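If you want to confirm files are arriving in your bucket without using the UI, here is a minimal sketch using the AWS SDK for Python (boto3). The bucket name and prefix are placeholders; substitute the values configured on your Amazon S3 Data Destination, and use whatever AWS credentials you normally use for that bucket.

```python
import boto3

# Placeholders - substitute the bucket and prefix configured
# on your Amazon S3 Data Destination.
BUCKET = "my-destination-bucket"
PREFIX = "exports/"

s3 = boto3.client("s3")  # uses your locally configured AWS credentials

# List the most recently delivered objects under the destination prefix.
response = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
objects = sorted(response.get("Contents", []),
                 key=lambda o: o["LastModified"], reverse=True)
for obj in objects[:5]:
    print(obj["LastModified"], obj["Key"], f'{obj["Size"]} bytes')
```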
How do I check on the data that was sent in my data destination?
A copy of the file that was sent can be downloaded. Navigate to Data > Destination > select the run > View Data Destination History > View, then expand the "Files" section and its subsections; each subsection shows a Download button for the file created in that section. Click the Download button and your download should begin automatically.
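Once downloaded, you can spot-check the file locally. A minimal sketch, assuming the delivered file is a CSV with a header row (the filename below is a placeholder):

```python
import csv

# Placeholder path to the file downloaded from the "Files" section.
path = "downloaded_destination_file.csv"

with open(path, newline="") as f:
    reader = csv.reader(f)
    header = next(reader)              # first row is assumed to be the header
    row_count = sum(1 for _ in reader)

print(f"{len(header)} columns: {header}")
print(f"{row_count} data rows")
```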
What do I do if I notice an error on a data destination?
Navigate to Data Destinations > Data Destination History > select the failed run. You should see a helpful error message under either the Files section or the Run Events section, depending on whether the error occurred while generating the files or while sending them to the final destination.
The error message should enable you to troubleshoot the issue (for example, if Credentials have expired, or if a Processing Script references a table that no longer exists). If not, our Customer Success team can also assist you in understanding the problem.
Can I set up an alert to notify me if a run has failed or errored?
Yes. Follow the steps in the Data Alerts and Notifications article to create a subscription for your data loads and data destinations, with notifications delivered to your email inbox.
If I use a data destination, can I access a copy of the file that is sent?
Yes, see the answer to How do I check on the data that was sent in my data destination? above for further information.
How do scheduled destinations interact with data loads?
There are some considerations when you use the Schedule > Time option. A data load rarely completes in the same amount of time for each extract, because the number of records being generated and extracted at the source varies, among other factors, so the schedule should allow for variation in the data load completion time. Even with a variation allowance, there is a risk of data overlap, as explained below.
Let’s look at the “Redshift - Direct Connect - One” destination from our screenshot examples.
It has a number of 'one schema' tables selected to be extracted at 6am local time. The daily data extract that feeds those one schema tables runs at 12am and typically completes loading and processing by 5am. Here are some potential scenarios and what would happen with the scheduled 6am data destination.
Note: on Jan-04, if the destination were extracting source data tables (not one schema tables), processing wouldn't be disrupted, but the destination could be extracting from either the Jan-03 or the Jan-04 load, because there is a 21-minute window in which the Jan-04 extract is still loading some tables. The destination data may therefore contain mismatches, and the destination should be run manually to correct them.
Please remember, this is an example only. Speak with your Customer Success Lead to discuss your specific schedule requirements.
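As a rough illustration of the "allow for variation" point above, the sketch below picks a schedule time from a handful of recent data load completion times plus a buffer. The completion times and the 30-minute buffer are made-up placeholders; substitute your own history and allowance.

```python
from datetime import datetime, timedelta

# Hypothetical recent data load completion times (placeholders).
recent_completions = [
    datetime(2024, 1, 1, 4, 48),
    datetime(2024, 1, 2, 5, 2),
    datetime(2024, 1, 3, 4, 55),
    datetime(2024, 1, 4, 5, 21),  # an unusually slow load
]

buffer = timedelta(minutes=30)  # allowance for variation

# Latest observed completion (time of day) plus the buffer gives a
# conservative earliest time at which to schedule the destination.
latest = max(c.time() for c in recent_completions)
suggested = (datetime.combine(datetime.today().date(), latest) + buffer).time()
print(f"Schedule the destination no earlier than {suggested.strftime('%H:%M')}")
```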
Can I run multiple destinations for different data sources at the same time?
It is possible to run many separate Data Destinations at the same time. These may be set up to point to the same Database Tables, different Database Tables, Processing Script tables, and/or Query Source tables. They can also be sent to multiple different locations if you wish (e.g. Azure, SFTP, and the other output locations described here). However, if they are scheduled to run simultaneously, or the data load completion times overlap, the time to complete may be extended due to slower performance across multiple runs (particularly for large extracts).
If you do have multiple destinations configured, we recommend staggering the schedules if possible.
Can I run a single destination for a single data source being extracted multiple times a day?
Yes, you can, but the runs execute one after another rather than simultaneously, because they will be configured to start "On data load completion". This avoids the data mismatches you might otherwise see if a destination run started before the latest data extract had finished loading into the data tables. A destination run will be rejected if the prior destination run is still in progress. If a run is Rejected, you can manually start a new run, or it will run at the next "data load completion" time.
These scenarios may or may not apply to your requirements, so it is best to discuss your specific scenario with your Customer Success Lead.
Why didn’t my Destination Run start at the scheduled time?
If you have your destination set to a specific time, there are some scenarios where you may see a delayed start or rejection:
- The start time will be delayed if the related data load is between the "Starting data processing" and "Data processing ran successfully" steps. The destination will commence once the "Starting Cache Warming" step is reached. Refer to the example below, where the scheduled time is 8.00pm but the Data Destination History page shows the start time as 8.06pm. Click on View to get to the Overview page, then expand the Run Events section and you will see the message "Waiting for Model Processing to complete" with the associated timings.
- The destination run will be rejected if, for some reason, the previous destination run is still in progress. This is more common if a manual destination run is attempted in addition to the scheduled one, or if you have multiple data loads set to run upon "Data Load Completion". Once again, you can use the View button to check the Message details. In this case, the message will read "Rejected - Previous load still running link to history" and give you a link to the destination that was still running.
My destination data isn’t the latest data, or seems to be a mix of old and new. Why?
If a destination is set to run at a scheduled time but the latest data load hasn't reached the Processing step, the destination will force Processing to wait until the destination completes.
For destinations extracting one schema Database Tables, Processing Script tables, or Query Source tables, this means the latest data hasn't been processed yet, so the extract will contain the previously processed data.
For destinations extracting source Database Tables, if the destination starts at the scheduled time but the latest data load hasn't reached the Processing step, processing isn't disrupted, but the destination could be extracting a mix of previously processed and currently loading data.
If you do notice (or suspect) that you have some corrupt or missing data, manually executing a destination run (or waiting until the next scheduled run) will correct the data extract, because destination runs are always a full (i.e. destructive) load. Feel free to discuss or troubleshoot any issues with your Customer Success Lead.
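One simple way to spot a suspect extract is to compare the latest delivered file against the previous run's file. A minimal sketch, assuming two CSV files downloaded from consecutive runs (the filenames are placeholders):

```python
import csv

# Placeholder paths to files from two consecutive destination runs,
# e.g. downloaded from the Data Destination History page or your S3 bucket.
previous_run = "destination_2024-01-03.csv"
latest_run = "destination_2024-01-04.csv"

def row_count(path):
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f)) - 1  # exclude the header row

prev, curr = row_count(previous_run), row_count(latest_run)
print(f"Previous run: {prev} rows, latest run: {curr} rows")
if curr < prev:
    print("Latest extract has fewer rows - it may have run against a "
          "partially loaded dataset; consider a manual destination run.")
```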
Read more in our Introduction to Data Destinations.