Using components: Amazon Redshift Destination

Use the Amazon Redshift Destination component to store the output of your package in Amazon Redshift. This requires that you have an existing Redshift cluster in your AWS account.

The Redshift Destination component is always the last component in a package.

To define the Redshift destination:

  1. Add a Redshift Destination component at the end of your dataflow.
  2. Open the component and name it.

Destination settings

  1. Define the parameters for connecting to your Redshift cluster as follows (for more information, see Allow Xplenty access to my Redshift cluster; an illustrative COPY statement that relates several of these settings to Redshift COPY options follows these steps):
    • redshift connection - either click the drop-down arrow and select an existing connection, or click create new to create a new connection.
    • target schema - the target table's schema. If empty, the default schema is used.
    • target table - the name of the target table in your Redshift cluster.
    • create table - automatically create the table defined in the component if it doesn't exist. See below for more information.
    • add columns - automatically add columns that are mapped in the destination and do not exist in the table. See below for more information. 
    • date and time formats - the date and time formats used when loading string fields in Xplenty into date and time columns in a Redshift table. For more information on the formats, see Amazon's documentation about the COPY statement and date and time formats.
    • replacement character - When set to a character, enables loading of data into VARCHAR columns even if the data contains invalid UTF-8 characters. Each invalid UTF-8 character in the input string is replaced with the replacement character (see ACCEPTINVCHARS in Redshift's COPY documentation). Note that 0x00(NUL) characters are automatically removed by Xplenty.
    • max errors - if the number of errors encountered while loading data into Redshift is lower than this value, the load continues without failing.
    • null string - string fields that match this value will be replaced with NULL.
    • compression - By default (automatic), data inserted into an empty target table will be compressed only if the table columns have RAW encoding or no encoding. If you select On, data inserted into an empty target table will be compressed regardless of the existing column encoding. If you select Off, automatic compression is disabled. See COMPUPDATE in Redshift's COPY documentation for more information.
    • compression rows - the number of rows sampled when determining compression encodings. The default sample size is 100,000 records.
    • intermediate compression - Data may be stored in Amazon S3 prior to loading it into Redshift. Select whether or not to compress the data before storing it in Amazon S3. Compressing the data may improve performance when it is relatively large and your process is not CPU bound.
    • empty as null - loads empty string fields as NULL.
    • blank as null - loads blank fields (consisting only of white space) as NULL.
    • explicit ids - use this if your target table has an IDENTITY column and you wish to override the auto-generated values with values from the dataflow.
    • truncate columns - truncates string values so that they fit the target column specification.
    • round decimals - rounds up numeric values whose scale exceeds the scale of the target column.
    • operation type - the method of data insertion:
      • insert - default behaviour. Data will only be appended to the target table.
      • truncate and insert - truncates the target table before the data flow executes.
      • delete and insert - deletes all rows from the target table before the data flow executes. Use this instead of truncate and insert if a truncate statement can't be executed on the target table due to permissions or other constraints.
      • merge (delete and insert) - incoming data is merged with existing data in the table by deleting target table rows whose keys exist in the incoming data and then inserting all the incoming data into the target table. Requires setting the merge keys correctly in field mapping. Merge is done in a single transaction (a SQL sketch appears after these steps):
        1. The dataflow's output is copied into a temporary table with the same schema as the target table.
        2. Rows with keys that exist in the temporary table are deleted from the target table.
        3. All rows in the temporary table are inserted into the target table.
        4. The temporary table is dropped.
      • merge (update and insert) - incoming data is merged with existing data in the table by updating existing rows and inserting new rows. Requires setting the merge keys correctly in field mapping. Merge is done in the following manner (a SQL sketch appears after these steps):
        1. The dataflow's output is copied into a temporary table with the same schema as the target table.
        2. Target table rows that exist in the temporary table are updated (according to the keys defined in the destination component).
        3. Rows with keys that exist in the target table are deleted from the temporary table.
        4. All rows in the temporary table are inserted into the target table.
        5. The temporary table is dropped.
    • pre action sql - SQL code to execute before inserting the data into the target table. If a merge operation is selected, the SQL code is executed before the staging table is created.
    • post action sql - SQL code to execute after inserting the data into the target table. If a merge operation is selected, the SQL code is executed after the staging table is merged into the target table.
    • Then click the Test Connection button to verify that the connection works and that the target table exists.

      Note: If you did not select 'empty as null' or 'blank as null', or use the 'null string' option, string fields in the target table will contain empty strings instead of null values.

  2. Map fields from your dataflow to columns in the target table. You can click the green Auto-fill fields icon to automatically populate the list of fields and column names.
  3. If a merge operation type is used, select the proper key columns. If the insert (append) operation type is used, key selection can be ignored.
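
The exact load statement is generated by Xplenty, but many of the settings above correspond to options of Redshift's COPY command. The following is a hypothetical sketch of such a COPY statement, for illustration only; the table name, S3 path, IAM role, and option values are placeholders, not what Xplenty actually issues.

  -- Hypothetical illustration only: table, S3 path, IAM role, and values are placeholders.
  COPY public.target_table
  FROM 's3://intermediate-bucket/prefix/part'
  IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
  GZIP                              -- intermediate compression
  DATEFORMAT 'YYYY-MM-DD'           -- date and time formats
  TIMEFORMAT 'YYYY-MM-DD HH:MI:SS'
  ACCEPTINVCHARS AS '?'             -- replacement character
  MAXERROR 100                      -- max errors
  NULL AS '\N'                      -- null string
  EMPTYASNULL                       -- empty as null
  BLANKSASNULL                      -- blank as null
  EXPLICIT_IDS                      -- explicit ids
  TRUNCATECOLUMNS                   -- truncate columns
  ROUNDEC                           -- round decimals
  COMPUPDATE ON                     -- compression
  COMPROWS 100000;                  -- compression rows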
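
For the merge (delete and insert) operation, the transaction described above is roughly equivalent to the SQL sketch below. Table and column names (public.target_table, stage, id) are placeholders, assuming a single merge key named id.

  BEGIN;

  -- 1. Copy the dataflow's output into a temporary table with the target table's schema.
  CREATE TEMP TABLE stage (LIKE public.target_table);
  COPY stage FROM 's3://intermediate-bucket/prefix/part'
  IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';

  -- 2. Delete target table rows whose keys exist in the temporary table.
  DELETE FROM public.target_table
  USING stage
  WHERE public.target_table.id = stage.id;

  -- 3. Insert all rows from the temporary table into the target table.
  INSERT INTO public.target_table SELECT * FROM stage;

  -- 4. Drop the temporary table.
  DROP TABLE stage;

  COMMIT;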
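
Similarly, the merge (update and insert) operation roughly follows the sketch below, again with placeholder names: a single merge key id and hypothetical non-key columns col_a and col_b.

  BEGIN;

  -- 1. Copy the dataflow's output into a temporary table with the target table's schema.
  CREATE TEMP TABLE stage (LIKE public.target_table);
  COPY stage FROM 's3://intermediate-bucket/prefix/part'
  IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';

  -- 2. Update target table rows whose keys exist in the temporary table.
  UPDATE public.target_table
  SET col_a = stage.col_a,
      col_b = stage.col_b
  FROM stage
  WHERE public.target_table.id = stage.id;

  -- 3. Delete rows from the temporary table whose keys exist in the target table.
  DELETE FROM stage
  USING public.target_table
  WHERE stage.id = public.target_table.id;

  -- 4. Insert the remaining rows from the temporary table into the target table.
  INSERT INTO public.target_table SELECT * FROM stage;

  -- 5. Drop the temporary table.
  DROP TABLE stage;

  COMMIT;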

Automatically creating and altering destination table

Xplenty can automatically create the destination table for you if it doesn't exist and it can also append columns to an existing table. If you define columns as key (regardless of the operation type), Xplenty defines them as the sort key in a new table. The data types in Xplenty are automatically mapped as follows. Note that since Xplenty doesn't have a notion of maximum string length, the string columns are created with the maximum length allowed in Redshift.

Xplenty    Redshift
String     VARCHAR(65535)
Integer    INT
Long       BIGINT
Float      REAL
Double     DOUBLE PRECISION
DateTime   TIMESTAMP
Boolean    BOOLEAN
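
As an illustration, if a dataflow maps the fields id (Integer, defined as a key), name (String), and created_at (DateTime) to a table that doesn't exist, the automatically created table would look roughly like the hypothetical DDL below, and a field mapped later would be appended with an ALTER TABLE. The table, column, and field names are placeholders, not the exact statements Xplenty issues.

  -- Hypothetical result of 'create table' for the mapping described above.
  CREATE TABLE public.target_table (
      id         INT,
      name       VARCHAR(65535),
      created_at TIMESTAMP
  )
  SORTKEY (id);   -- columns defined as keys become the sort key

  -- Hypothetical result of 'add columns' when a new field (score, Double) is mapped later.
  ALTER TABLE public.target_table ADD COLUMN score DOUBLE PRECISION;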
