Using components: File Storage Destination

Use the file storage destination component to define where and how your package output is written.
Destination components are always the last component in a package.

To define the file destination:

  1. Add a file storage destination component at the end of your dataflow.
  2. Open the component and name it.

Destination location

  1. connection - either click the drop-down arrow and select an existing connection, or click create new to create a new connection (see Defining connections).
  2. bucket/container - the name of the target cloud storage bucket or container where the package output folder and files will be written. Relevant only for object stores such as Amazon S3 and Google Cloud Storage.
  3. target directory - your destination folder, in the form folder1/folder2/. It must not exist (unless you select Delete destination directory before writing or Replace existing files - see More options below).
  4. target file names - because Xplenty is a parallel processing platform, the output may consist of multiple files in the destination (you can also merge the output to a single file - see More options below). The default file name pattern is part-m-00000, and an extension may be added according to the destination type and compression. Select custom pattern to control the prefix and suffix around the file number (see the naming sketch after this list).
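
For a rough idea of how the file number slots between a custom prefix and suffix, here is a small Python sketch. The helper name and its defaults are illustrative only, not part of Xplenty:

```python
# Illustrative helper, not Xplenty code: mimics how parallel output
# file names are composed from a prefix, a 5-digit file number,
# a suffix, and an extension.
def output_file_names(num_partitions, prefix="part-m-", suffix="", extension=""):
    return [f"{prefix}{i:05d}{suffix}{extension}" for i in range(num_partitions)]

# Default pattern with three parallel writers and gzip-compressed CSV output:
print(output_file_names(3, extension=".csv.gz"))
# ['part-m-00000.csv.gz', 'part-m-00001.csv.gz', 'part-m-00002.csv.gz']

# Custom pattern controlling the prefix and suffix around the file number:
print(output_file_names(2, prefix="daily_export_", suffix="_v1", extension=".json"))
# ['daily_export_00000_v1.json', 'daily_export_00001_v1.json']
```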

Then click the Test Connection button to verify that the connection works, that the bucket/container exists, and whether the target directory already exists.

Note: Paths that contain variables are not validated correctly.

Destination type

Define the type of your destination object (a short sketch of all three formats follows this list):

  • flat file - contains one record per line. Within a record, the individual fields are separated by a delimiter such as a comma or tab character. The output data is UTF-8 encoded.
  • json - JavaScript Object Notation. Each file contains one JSON object per line. The JSON objects contain unordered key-value pairs. Values can be strings, numbers, Booleans, arrays, or JSON objects. The output data is UTF-8 encoded.
  • parquet - Apache Parquet is a columnar storage format popular with Hive and Impala. You can control the Parquet output with the system variables _PARQUET_COMPRESSION, _PARQUET_PAGE_SIZE, and _PARQUET_BLOCK_SIZE. Note that datetime and complex data types are not currently supported with Parquet.
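
To make the three formats concrete, here is a minimal Python sketch that writes the same two records in each format. The pyarrow keyword arguments are a rough mapping to the Xplenty system variables, an assumption noted in the comments:

```python
import csv
import json

import pyarrow as pa
import pyarrow.parquet as pq

records = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]

# Flat file: one record per line, fields separated by a delimiter, UTF-8.
with open("out.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name"], delimiter=",")
    writer.writerows(records)

# JSON: one JSON object per line (JSON Lines), UTF-8.
with open("out.json", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Parquet: columnar storage. The keyword arguments below are pyarrow's
# rough analogues of the Xplenty system variables (an assumption;
# Xplenty resolves the _PARQUET_* variables internally).
table = pa.Table.from_pylist(records)
pq.write_table(
    table,
    "out.parquet",
    compression="gzip",          # ~ _PARQUET_COMPRESSION
    data_page_size=1024 * 1024,  # bytes per data page, ~ _PARQUET_PAGE_SIZE
    row_group_size=100_000,      # rows per row group; _PARQUET_BLOCK_SIZE
                                 # is the byte-size analogue
)
```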

Flat file parameters

If you selected flat file as the destination type, you need to define the delimiter character that separates the fields in your records and whether the field data is quote enclosed.

  1. In the field delimiter drop-down list, select one of the predefined characters (comma or tab). You can also type a single character or one of the following escape sequences:
    • \b (backspace)
    • \f (formfeed)
    • \n (newline)
    • \r (carriage return)
    • \t (tab)
    • \' (single quote)
    • \" (double quote)
    • \\ (backslash)
  2. String qualifier - if double quote or single quote is selected, fields that contain the delimiter will be enclosed in the selected quote character. String qualifiers within the field data will be escaped by doubling them (e.g. " becomes "") - see the quoting sketch after this list.
  3. Check Write field names in first row if you want the first row of the destination file to contain column headings.
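
The quoting and escaping described above matches what Python's csv module calls minimal quoting with doubled quote characters. A short sketch, illustrative only (not Xplenty code):

```python
import csv
import io

rows = [
    ["1", "plain"],
    ["2", "has,comma"],     # contains the delimiter -> field gets quoted
    ["3", 'has "quotes"'],  # qualifier inside the data -> doubled to ""
]

buf = io.StringIO()
writer = csv.writer(
    buf,
    delimiter=",",
    quotechar='"',
    doublequote=True,            # escape qualifiers by doubling them
    quoting=csv.QUOTE_MINIMAL,   # quote only fields that need it
    lineterminator="\n",
)
writer.writerows(rows)
print(buf.getvalue())
# 1,plain
# 2,"has,comma"
# 3,"has ""quotes"""
```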

More options

  • compression type - select the type of compression for your data (Gzip, Bzip2, or none).
  • Merge to single file - select this option to make sure that only one file is written to the destination directory. Otherwise, the output may consist of multiple files in the destination path.
  • Delete destination directory before writing - check this option to delete the destination directory before writing. If the directory already exists and is not deleted, the job will fail.
  • When destination directory exists:
    • Fail job - by default, the job will fail if the destination directory already exists.
    • Replace existing files - when selected, the job will not fail if the destination directory exists, and files will be written to the existing directory. If files with the same names exist in the directory, they will be overwritten. If you'd like to add files to the directory instead, make sure to use a custom file name pattern that is unique for every job execution (e.g. use the variable $_JOB_ID in the prefix), as shown in the sketch after this list.
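
To see why a job-unique prefix lets successive runs add files rather than overwrite them, here is an illustrative Python sketch. Xplenty substitutes $_JOB_ID itself; the helper below only simulates that substitution:

```python
# Illustrative only: simulates how a custom file name pattern with
# $_JOB_ID in the prefix yields unique names per job execution.
def resolve_pattern(prefix_template, job_id, file_number, extension=".csv"):
    prefix = prefix_template.replace("$_JOB_ID", str(job_id))
    return f"{prefix}{file_number:05d}{extension}"

# Two job executions writing into the same existing directory:
print(resolve_pattern("export_$_JOB_ID_", 1001, 0))  # export_1001_00000.csv
print(resolve_pattern("export_$_JOB_ID_", 1002, 0))  # export_1002_00000.csv
# Different prefixes, so the second run adds files instead of overwriting.
```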
