Use the Google BigQuery Destination component to store the output of your package in a Google BigQuery table.
Destination components are always the last component in a package.
To define the Google BigQuery destination:
- Add a Google BigQuery Destination component at the end of your dataflow.
- Open the component and name it.
- Define the parameters for connecting to your Google BigQuery dataset as follows (for more information, see Allow Xplenty access to my Google BigQuery dataset):
- Google BigQuery connection - either click the drop-down arrow and select an existing connection, or click create new to create a new connection.
- Target table - the name of an existing table in your database. If the table doesn't exist or cannot be created, the job fails. Data can be inserted into a partitioned table by specifying the partition to insert into using the notation table$partition (e.g. events$20160405).
- Max bad records - the number of errors tolerated when loading data into Google BigQuery. If loading returns fewer errors than this value, the job continues without failing.
- Operation type - the method of data insertion
- Insert - default behavior. Data will only be appended to the target table.
- Truncate and Insert - the target table is truncated before the dataflow executes, and the data is then inserted into it.
- Merge - data will be inserted and/or updated in the target table according to the defined key(s). See details below.
- Create table - check Create table if you want Xplenty to attempt to create the target table. If a table with the same name already exists in the dataset, data insertion proceeds unless the mapping, as described in the next paragraph, fails.
Then click the Test Connection button to verify that the connection works and that the target table exists.
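The table$partition notation is simply the table name with a `$` and the partition date (YYYYMMDD) appended. A minimal sketch of building such a Target table value in Python (the `events` table name is taken from the example above; the helper function is illustrative, not part of any API):

```python
from datetime import date

def partition_decorator(table: str, partition_day: date) -> str:
    """Build a BigQuery partition-decorated table name: table$YYYYMMDD."""
    return f"{table}${partition_day:%Y%m%d}"

# Reproduces the example from the docs: events$20160405
print(partition_decorator("events", date(2016, 4, 5)))
```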
Merging data into an existing table
The merge operation is done in the following steps:
- A staging table is created and the data flow's data is stored in it.
- Note: The incoming data must be unique on the key fields you selected, or the merge will fail. To guarantee uniqueness, use a Limit component: partition by the key field(s) and limit each partition to 1, which removes duplicates. An Aggregate component grouped by the key fields works as well.
- The target table is copied to a new table (target-copy), and the target table is then emptied.
- Data that exists in the staging table but not in target-copy is inserted into the target table.
- Data that exists in both the staging table and target-copy is joined; column values are taken from the staging table unless they are null, and the joined rows are inserted into the target table.
- Both staging table and target-copy table are dropped.
- In case of an error, the process attempts to copy target-copy back onto the original table.
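The deduplication and merge steps above can be sketched in Python. This is an illustrative model of the documented semantics only, not Xplenty's actual implementation: rows are modeled as dicts, the key fields identify each record, and null is modeled as `None`.

```python
def dedupe(rows, keys):
    """Keep one row per key combination (the 'limit 1 per partition' workaround)."""
    seen = {}
    for row in rows:
        seen.setdefault(tuple(row[k] for k in keys), row)
    return list(seen.values())

def merge(target, staging, keys):
    """Merge staging rows into target: insert rows with new keys, and for
    existing keys take column values from staging unless they are null."""
    staging = dedupe(staging, keys)
    result = {tuple(r[k] for k in keys): dict(r) for r in target}
    for row in staging:
        key = tuple(row[k] for k in keys)
        if key not in result:
            result[key] = dict(row)          # key only in staging: plain insert
        else:
            merged = result[key]
            for col, val in row.items():     # key in both: staging wins unless null
                if val is not None:
                    merged[col] = val
    return list(result.values())

# Hypothetical data: id 1 is updated (null score keeps the old value),
# id 2 is inserted, and the duplicate id 2 row is dropped by dedupe.
target = [{"id": 1, "name": "old", "score": 10}]
staging = [{"id": 1, "name": "new", "score": None},
           {"id": 2, "name": "added", "score": 5},
           {"id": 2, "name": "dup", "score": 7}]
print(merge(target, staging, ["id"]))
```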