Using components: Amazon Redshift Destination

Use the Amazon Redshift destination component to store the output of a data flow in Amazon Redshift. The component first stages the data in Amazon S3 and then uses Amazon Redshift's COPY command to load it into the target table.

Connection

Select an existing Amazon Redshift connection or create a new one (for more information, see Allowing Integrate.io ETL access to my Redshift cluster).

Destination Properties

  • Target schema - the target table's schema. If empty, the default schema is used.
  • Target table - the name of the target table in your Redshift cluster. By default, if the table doesn't exist, it will be created automatically.
  • Automatically create table if it doesn't exist - if unchecked and the table doesn't exist, the job fails.
  • Automatically add missing columns - when checked, the job checks whether each of the specified columns exists in the table and adds any column that is missing. Key columns can't be automatically added to a table.

Operation type

Append (Insert only) - default behaviour. Data is only appended to the target table.

Overwrite (Truncate and insert) - truncates the target table before the data is inserted. The connection's user must be the owner of the table or a superuser.

Overwrite (Delete all rows on table and insert) - deletes all rows from the target table before the data flow executes. Use this option if a truncate statement can't be executed on the target table due to permissions or other constraints. This operation does not clear the schema.

Merge with existing data using delete and insert - incoming data is merged with existing data in the table by deleting target table rows whose keys appear in both data sets and then inserting all the incoming data into the target table. Requires setting the merge keys correctly in field mapping. Merge is done in a single transaction:

  1. The dataflow's output is copied into a temporary table with the same schema as the target table.
  2. Rows with keys that exist in the temporary table are deleted from the target table.
  3. All rows in the temporary table are inserted into the target table.
  4. The temporary table is dropped.
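
The four steps above can be sketched in a few lines of Python. This is only an illustration of the delete-and-insert pattern, not the code Integrate.io ETL runs: it uses SQLite (from the Python standard library) as a stand-in for Redshift, and the function name merge_delete_insert, the users table and its columns are all hypothetical. Table and column names are assumed to be trusted identifiers.

```python
import sqlite3


def merge_delete_insert(conn, target, rows, columns, keys):
    """Merge rows into target by deleting matching keys, then inserting."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    key_match = " AND ".join(f"{target}.{k} = staging.{k}" for k in keys)
    conn.execute("BEGIN")  # run every step in one transaction
    try:
        # 1. Copy the incoming rows into a temporary table shaped like the target.
        conn.execute(
            f"CREATE TEMP TABLE staging AS SELECT {col_list} FROM {target} WHERE 0"
        )
        conn.executemany(
            f"INSERT INTO staging ({col_list}) VALUES ({placeholders})", rows
        )
        # 2. Delete target rows whose keys exist in the temporary table.
        conn.execute(
            f"DELETE FROM {target} "
            f"WHERE EXISTS (SELECT 1 FROM staging WHERE {key_match})"
        )
        # 3. Insert all rows from the temporary table into the target.
        conn.execute(
            f"INSERT INTO {target} ({col_list}) SELECT {col_list} FROM staging"
        )
        # 4. Drop the temporary table.
        conn.execute("DROP TABLE staging")
        conn.commit()
    except Exception:
        conn.rollback()
        raise


# Hypothetical example data: id is the merge key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.commit()

merge_delete_insert(conn, "users", [(2, "bobby"), (3, "cy")], ["id", "name"], ["id"])
merged = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(merged)
```

In this example the row with id 2 is replaced by the incoming row, id 3 is added, and id 1 is left untouched.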

Merge with existing data using update and insert - incoming data is merged with existing data in the table by updating existing data and inserting new data. Requires setting the merge keys correctly in field mapping. Merge is done in the following manner:

  1. The dataflow's output is copied into a temporary table with the same schema as the target table.
  2. Target table rows that exist in the temporary table are updated (according to the keys defined in the destination component).
  3. Rows with keys that exist in the target table are deleted from the temporary table.
  4. All rows in the temporary table are inserted into the target table.
  5. The temporary table is dropped.
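
The update-and-insert steps can be sketched the same way. Again, this is an illustration under stated assumptions rather than the product's implementation: SQLite stands in for Redshift, the function name merge_update_insert and the users table are hypothetical, identifiers are assumed to be trusted, and the incoming rows are assumed to contain each key at most once.

```python
import sqlite3


def merge_update_insert(conn, target, rows, columns, keys):
    """Merge rows into target by updating matching keys and inserting the rest."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    key_match = " AND ".join(f"{target}.{k} = staging.{k}" for k in keys)
    non_keys = [c for c in columns if c not in keys]
    conn.execute("BEGIN")
    try:
        # 1. Copy the incoming rows into a temporary table shaped like the target.
        conn.execute(
            f"CREATE TEMP TABLE staging AS SELECT {col_list} FROM {target} WHERE 0"
        )
        conn.executemany(
            f"INSERT INTO staging ({col_list}) VALUES ({placeholders})", rows
        )
        # 2. Update target rows whose keys exist in the temporary table.
        if non_keys:
            set_clause = ", ".join(
                f"{c} = (SELECT staging.{c} FROM staging WHERE {key_match})"
                for c in non_keys
            )
            conn.execute(
                f"UPDATE {target} SET {set_clause} "
                f"WHERE EXISTS (SELECT 1 FROM staging WHERE {key_match})"
            )
        # 3. Remove from the temporary table the rows already present in the target.
        conn.execute(
            f"DELETE FROM staging "
            f"WHERE EXISTS (SELECT 1 FROM {target} WHERE {key_match})"
        )
        # 4. Insert the remaining (new) rows into the target.
        conn.execute(
            f"INSERT INTO {target} ({col_list}) SELECT {col_list} FROM staging"
        )
        # 5. Drop the temporary table.
        conn.execute("DROP TABLE staging")
        conn.commit()
    except Exception:
        conn.rollback()
        raise


# Hypothetical example data: id is the merge key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.commit()

merge_update_insert(conn, "users", [(2, "bobby"), (3, "cy")], ["id", "name"], ["id"])
merged = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(merged)
```

Unlike the delete-and-insert variant, existing rows are modified in place here, and only rows with new keys are inserted.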

Pre and post action SQL

Pre-action SQL - SQL code to execute before inserting the data into the target table. If a merge operation is selected, the SQL code is executed before the staging table is created.

Post-action SQL - SQL code to execute after inserting the data into the target table. If a merge operation is selected, the SQL code is executed after the staging table is merged into the target table.

Advanced options

  • Intermediate compression - Data is stored in Amazon S3 prior to loading it into Redshift. Select whether to compress the data before storing it in Amazon S3. Compression may improve performance when the data is relatively large and your process is not CPU intensive.
  • Maximum errors - If this number of errors occurs in Redshift while loading data into the table, the job fails.
  • Truncate columns - Truncates string values in order for them to fit in the target column specification.
  • Trim white space - Trims trailing white space inserted into *CHAR columns.
  • Load empty data as null - Loads empty string values as null into *CHAR columns.
  • Load blank data as null - Loads fields consisting only of white space as null into *CHAR columns.
  • Null string - String fields that match this value will be replaced with NULL.
  • Replacement character for invalid UTF-8 characters - By default, invalid UTF-8 characters in input will be replaced by ?. You can select any other single ASCII character. Note that 0x00(NUL) characters are automatically removed by Integrate.io ETL.
  • Round decimal values - Rounds up numeric values whose scale exceeds the scale of the target column.
  • Input contains explicit identity values - Check this option if the target table contains an identity column and you'd like to override the auto-generated values. Only works with append or overwrite operations.
  • Apply compression during data copy - By default (automatic), data inserted into an empty target table will be compressed only if the table columns have RAW encoding or no encoding. If you select On, data inserted into an empty target table will be compressed regardless of existing column encoding. If you select Off, automatic compression is disabled. Refer here for more information.
  • Compression sample size - Specifies the number of rows to be used as the sample size for compression analysis.
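
Several of the options above resemble parameters of Redshift's COPY command (GZIP, MAXERROR, TRUNCATECOLUMNS, TRIMBLANKS, EMPTYASNULL, BLANKSASNULL, NULL AS, ACCEPTINVCHARS, ROUNDEC, EXPLICIT_IDS, COMPUPDATE, COMPROWS). The sketch below shows how such a statement could be assembled. The mapping from each option to a COPY parameter is an assumption, not something this page confirms, and the function name build_copy_statement, the bucket path and the IAM role ARN are hypothetical. Data format parameters are omitted for brevity.

```python
def build_copy_statement(table, s3_path, iam_role, *, gzip=False, max_errors=0,
                         truncate_columns=False, trim_blanks=False,
                         empty_as_null=False, blanks_as_null=False,
                         null_string=None, replacement_char=None,
                         round_decimals=False, explicit_ids=False,
                         compupdate=None, comprows=None):
    """Assemble a Redshift COPY statement from destination-style options.

    The option-to-parameter mapping is an assumed one, for illustration only.
    """
    def quote(value):
        # Escape single quotes for use inside a SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    clauses = [
        f"COPY {table}",
        f"FROM {quote(s3_path)}",
        f"IAM_ROLE {quote(iam_role)}",
    ]
    if gzip:
        clauses.append("GZIP")  # intermediate compression
    if max_errors:
        clauses.append(f"MAXERROR {int(max_errors)}")
    if truncate_columns:
        clauses.append("TRUNCATECOLUMNS")
    if trim_blanks:
        clauses.append("TRIMBLANKS")
    if empty_as_null:
        clauses.append("EMPTYASNULL")
    if blanks_as_null:
        clauses.append("BLANKSASNULL")
    if null_string is not None:
        clauses.append(f"NULL AS {quote(null_string)}")
    if replacement_char is not None:
        clauses.append(f"ACCEPTINVCHARS AS {quote(replacement_char)}")
    if round_decimals:
        clauses.append("ROUNDEC")
    if explicit_ids:
        clauses.append("EXPLICIT_IDS")
    if compupdate is not None:  # None leaves Redshift's automatic behaviour
        clauses.append("COMPUPDATE ON" if compupdate else "COMPUPDATE OFF")
    if comprows:
        clauses.append(f"COMPROWS {int(comprows)}")
    return "\n".join(clauses) + ";"


# Hypothetical table, bucket and role, shown only to exercise the function.
statement = build_copy_statement(
    "public.events",
    "s3://example-bucket/staging/run-1/",
    "arn:aws:iam::123456789012:role/ExampleRole",
    gzip=True,
    max_errors=10,
    truncate_columns=True,
    null_string="NULL",
    replacement_char="?",
    compupdate=False,
)
print(statement)
```

Options left at their defaults produce no clause, so Redshift's own defaults apply for anything not selected.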

Schema Mapping

Map the dataflow fields to the target table's columns. Columns defined as key will be used as the sort key when Integrate.io ETL creates the table. If a merge operation is used, you must select one or more fields as keys, which will be used to uniquely identify rows in the table for the merge operation.

The data types in Integrate.io ETL are mapped as follows when the table is created automatically. Note that since Integrate.io ETL doesn't have a notion of maximum string length, the string columns are created with the maximum length allowed in Redshift.

Integrate.io ETL    Redshift
String              VARCHAR(65535)
Integer             INT
Long                BIGINT
Float               REAL
Double              DOUBLE PRECISION
DateTime            TIMESTAMP
Boolean             BOOLEAN
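
As a rough illustration of the mapping, the sketch below generates a CREATE TABLE statement from a list of fields, with the key fields used as the sort key. The exact statement that Integrate.io ETL issues is not documented here, so treat this as an assumption; the function name create_table_ddl and the example table and field names are hypothetical.

```python
# Integrate.io ETL data type -> Redshift column type, as listed above.
TYPE_MAP = {
    "String": "VARCHAR(65535)",
    "Integer": "INT",
    "Long": "BIGINT",
    "Float": "REAL",
    "Double": "DOUBLE PRECISION",
    "DateTime": "TIMESTAMP",
    "Boolean": "BOOLEAN",
}


def create_table_ddl(table, fields, keys=()):
    """Build a CREATE TABLE statement from (column_name, etl_type) pairs."""
    columns = ",\n".join(f"  {name} {TYPE_MAP[etl_type]}" for name, etl_type in fields)
    ddl = f"CREATE TABLE IF NOT EXISTS {table} (\n{columns}\n)"
    if keys:
        # Fields marked as keys become the table's sort key.
        ddl += "\nSORTKEY (" + ", ".join(keys) + ")"
    return ddl + ";"


# Hypothetical schema mapping with id marked as the key.
ddl = create_table_ddl(
    "public.events",
    [("id", "Long"), ("name", "String"), ("created_at", "DateTime")],
    keys=["id"],
)
print(ddl)
```

Every String field becomes VARCHAR(65535) in this sketch, which mirrors the note above about maximum string length.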