Use the Google BigQuqery destination component to store the output of a data flow in a BigQuery table.
The destination component stores the data intermediately into Google Cloud Storage and then uses BigQuery's API to import the data into it.
Select an existing Google BigQuery connection or create a new one.
- Target table - the name of the target table in your BigQuery dataset. By default, if the table doesn't exist, it will be created automatically.
- Automatically create table if it doesn't exist - if unchecked and the table doesn't exist, the job fails.
- Automatically add missing columns - when checked, the job will check if each of the specified columns exist in the table and if one does not exist, it will add it. Note: Uncheck this if your target is a partitioned table (See below).
Inserting data into partitioned tables
Data can be inserted into a partitioned table by specifying the partition to insert into using the notation table$partition (e.g. events$20160405).
Append (Insert only) - default behaviour. Data will only be appended to the target table
Overwrite (Truncate and insert) - truncate the target table before data is inserted into the target table.
Merge with existing data - incoming data is merged with existing data in the table. Requires setting the merge keys correctly in field mapping. Merge is done in the following manner:
- A staging table is created and the data flow's data is stored in it.
- The target table is copied onto a new table (target-copy) and is cleaned up.
- Data that exists in the staging table that doesn't exist in target-copy is inserted into the target table. (new data)
- Data that exists in both staging table and target-copy is joined, columns values are taken from the staging table unless they are null. (old/updated data).
- Both staging table and target-copy table are dropped.
- In case of an error, the process tries to copy the target-copy back on to the original table.
Note: In merge, the incoming data must be unique according to the key fields you selected. You may use the aggregate component or the limit component (partition by the key field(s) and limit to 1 record per partition) to make sure that the key fields are indeed unique.
- Maximum errors - If this number of errors occurs in BigQuery while loading data into the table, the job fails.
Map the dataflow fields to the target table's columns.