Once you have set up and configured your data destination, you can run it manually or set it to run on a schedule. Follow these instructions to run your data destination.
If you haven't already set up your data destinations, then start by reading this article about setting up and configuring your data destination.
1. Run a Destination
Navigate to Data > Data Destinations when you are ready to run your data destination.
There are two main options:
- Manual
- Schedule
  - Time/s
  - On data load completion
You will see the option buttons along the row of each destination that has been configured.
1. Manual
A permissioned user can manually run a destination at any time. Once you click the play icon, you will see a popup to confirm that you are sure. Click on Run Data Destination and it will begin immediately. You can track progress in the Data Destination History screen by clicking on the list icon.
Clicking Cancel will return you to the Data Destinations page without initiating a run.
2. Schedule
There are many different ways to schedule your data destination to suit your downstream data needs including scheduling by time or on data load completion.
- On data load completion
Data load completion times can vary from day to day for a range of reasons, including the size of the data extract. Running a destination after a daily data load is very common, and we generally recommend the On data load completion option: it ensures the destination run does not start before the load completes, so the run extracts from the latest data load.
- Time
If you would prefer the destination to run at a specific time, or you want a destination to deliver the configured file/s even if no new data has been received, then you should select the Time option.
If you are using the Day option, select the relevant time and time zone, then click Save. If you don’t see the Day or time options, check that Enabled is ticked at the top.
If you want a less frequent but scheduled destination, you will also see options to configure Weekly or Monthly, including the option for multiple days per week, or specific dates per month. Once your preferred schedule is set up, click Save.
The destination will automatically run at the nominated time. You can check the Data Destination History page for current or past runs, and you can also create a Notification to be emailed to you when the destination completes, if it errors or is rejected. Learn more about setting up your notifications here.
** After reviewing the run information in this section, please also read the FAQs below to understand the advantages and disadvantages of using the schedule option. **
2. FAQs
How do scheduled destinations interact with data loads?
There are some considerations when you use the Schedule > Time option.
A data load will rarely complete in the same amount of time for each extract, because the number of records generated and extracted at the source varies, along with other factors. The schedule should therefore allow for variation in the data load completion time. Even with that allowance, there is a risk of data overlap, as explained in the example below.
Let’s look at the “Redshift - Direct Connect - One” destination from our screenshot examples.
It has a number of ‘one schema’ tables selected to be extracted at 6am local time. The daily data extract that feeds into those one schema tables runs daily at 12am, and typically completes loading and processing by 5am. Here are some potential scenarios and what would happen with the scheduled 6am data destination.
Note that on Jan-04, if the destination were extracting source data tables (not one schema tables), then processing wouldn’t be disrupted, but the destination could be extracting from the Jan-03 or Jan-04 load because there are 21 minutes where the Jan-04 extract is still loading some tables. The destination data may therefore contain mismatches, and the destination should be run manually to correct them.
Please remember, this is an example only. Please speak with your Customer Success Lead to discuss your specific schedule requirements.
Can I run multiple destinations for different data sources at the same time?
It is possible to run many separate Data Destinations at the same time. These may be set up to point to the same Database Tables, different Database Tables, Processing Script and/or Query Source tables. They could also be sent to multiple different locations if you wish (e.g. Azure, SFTP and other output locations as described here). However, if they are scheduled to run simultaneously, or the data load completion times overlap, then the runs may take longer to complete because performance is slower across multiple concurrent runs (particularly for large extracts).
If you do have multiple destinations configured, we recommend staggering the schedules if possible.
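As a rough illustration of staggering, the sketch below works out evenly spaced start times that you could then enter on each destination's Schedule > Time screen. It is a planning aid only and does not call any product API. The function name `stagger_start_times`, the 30-minute gap, and the "Azure export" and "SFTP export" destination names are assumptions made for the example.

```python
from datetime import datetime, timedelta


def stagger_start_times(first_start, destination_names, gap_minutes):
    """Return {destination name: "HH:MM"} with each start offset by gap_minutes.

    Hypothetical helper: it only computes times to type into the schedule
    screen by hand. Times wrap around midnight.
    """
    base = datetime.strptime(first_start, "%H:%M")
    return {
        name: (base + timedelta(minutes=gap_minutes * position)).strftime("%H:%M")
        for position, name in enumerate(destination_names)
    }


# Assumed example: three destinations, 30 minutes apart, starting at 06:00.
schedule = stagger_start_times(
    "06:00",
    ["Redshift - Direct Connect - One", "Azure export", "SFTP export"],
    gap_minutes=30,
)
for name, start in schedule.items():
    print(f"{start}  {name}")
```

The right gap depends on how long each extract usually takes, which you can check in the Data Destination History page.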
Can I run a single destination for a single data source being extracted multiple times a day?
Yes, you can. The destination runs will be configured to start “On data load completion”, so they run one after another rather than simultaneously. This is because you may see some data mismatches if a destination run starts before the latest data extract has finished loading into the data tables. A destination run will be rejected if the prior destination run is still running. If a run is Rejected, you can manually start a new run, or it will run at the next ‘data load completion’ time.
These scenarios may or may not apply to your requirements, so it is best to discuss your specific scenario with your Customer Success Lead.
Why didn’t my Destination Run start at the scheduled time?
If you have your destination set to a specific time, there are some scenarios where you may see a delayed start or rejection:
- The start time will be delayed if the related data load is in between the steps of "Starting data processing" and "Data processing ran successfully". The destination will commence after the step of “Starting Cache Warming” is reached. Refer to the example below, where the scheduled time is 8.00pm but the Data Destination History page shows that the start time was 8.06pm. Click on View to get to the Overview page, then expand the Run Events section and you will see a message “Waiting for Model Processing to complete” with the associated timings.
- The destination run will be rejected if for some reason the previous destination run is still in progress. This will be more common if a manual destination run is tried in addition to the scheduled time, or you have multiple data loads set to run upon “Data Load Completion”. Once again, you can use the View button to check the Message details. In this case, the message will read “Rejected - Previous load still running link to history” and give you a link to the destination that was still running.
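The two scenarios above can be summarised as a small decision function. This is a simplified sketch of the behaviour described in this article, not the product's actual implementation. The names `Outcome` and `scheduled_run_outcome` are hypothetical, and checking for rejection before delay is an assumption, since the article does not say which condition takes precedence.

```python
from enum import Enum


class Outcome(Enum):
    START = "starts at the scheduled time"
    DELAYED = "waits for model processing to complete"
    REJECTED = "rejected because the previous run is still running"


def scheduled_run_outcome(previous_run_in_progress, load_is_processing):
    """Predict what happens to a destination run at its scheduled time.

    previous_run_in_progress: an earlier run of this destination has not finished.
    load_is_processing: the related data load is between "Starting data
        processing" and "Data processing ran successfully".
    """
    # Assumption: rejection is evaluated first.
    if previous_run_in_progress:
        return Outcome.REJECTED
    if load_is_processing:
        return Outcome.DELAYED
    return Outcome.START


# The 8.00pm example: the load was still processing, so the run was delayed.
print(scheduled_run_outcome(previous_run_in_progress=False, load_is_processing=True).value)
```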
My destination data isn’t the latest data, or seems to be a mix of old and new. Why?
If a destination is set to run at a scheduled time but the latest data load hasn’t reached the Processing step, the destination will force Processing to wait until the destination run completes.
For destinations extracting one schema Database Tables, Processing Script, or Query Source tables, this means the latest data hasn’t been processed and the extract will contain the previously processed data.
For destinations extracting source Database Tables, if your destination starts at the scheduled time, but the latest data load hasn’t reached the Processing step then processing wouldn’t be disrupted, but the destination could be extracting from the previously processed OR currently loading data.
If you notice (or suspect) that you have some corrupt or missing data, then manually executing a destination run (or waiting until the next scheduled run) will correct the data extract, because every destination run is a full (aka destructive) load. Feel free to discuss or troubleshoot any issues with your Customer Success Lead.
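The cases in this answer can be condensed into a lookup of what a time-scheduled run is likely to contain. This is an interpretive sketch, not product code. The function name, the table-kind labels, and the three load states ("loading", "processing", "complete") are assumptions chosen to mirror the wording of this article.

```python
# Table kinds whose contents only change once Processing has run.
PROCESSED_KINDS = {"one_schema", "processing_script", "query_source"}


def expected_extract_contents(table_kind, load_state):
    """Return "latest", "previous" or "mixed" for a time-scheduled run.

    table_kind: "one_schema", "processing_script", "query_source" or "source".
    load_state: "loading" (load has not reached the Processing step),
        "processing" (the run waits until processing completes), or
        "complete" (the latest load has finished).
    """
    if table_kind not in PROCESSED_KINDS and table_kind != "source":
        raise ValueError(f"unknown table kind: {table_kind}")
    if load_state in ("processing", "complete"):
        return "latest"
    if load_state != "loading":
        raise ValueError(f"unknown load state: {load_state}")
    # The load has started but has not reached Processing yet.
    if table_kind in PROCESSED_KINDS:
        return "previous"  # only previously processed data is available
    return "mixed"  # source tables may be part old, part newly loaded


print(expected_extract_contents("one_schema", "loading"))
print(expected_extract_contents("source", "loading"))
```

A "previous" or "mixed" result is the situation where a manual run, or the next scheduled run, replaces the extract with a full load.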