Using components: File Storage Destination

Use the file storage destination component to define where and how your package output is written.
Destination components are always the last component in a package.

To define the file destination:

  1. Add a file storage destination component at the end of your dataflow.
  2. Open the component and name it.

Destination location

  1. Connection - click the drop-down arrow and select an existing connection, or click Create new to create a new connection (see Defining connections).
  2. Bucket/container - the name of the target cloud storage bucket or container to which the package output folder and files will be written. Relevant only for object stores such as Amazon S3 and Google Cloud Storage.
  3. Target directory - the destination folder, in the form folder1/folder2/. It must not already exist (unless you select one of the options for handling an existing directory - see More options below).
  4. Target file names - Xplenty is a parallel processing platform, so the output may be split across multiple files in the destination (you can also merge the output into a single file - see below). The default file name pattern is part-m-00000, and an extension may be added according to the destination type and compression. Select custom pattern to control the prefix and suffix around the file number; see the example after this list.
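For example, assuming the default pattern with Gzip compression, the destination might contain files named as follows (the part numbers and extension are illustrative):

  part-m-00000.gz
  part-m-00001.gz

With a custom pattern, a prefix such as sales_ and a suffix such as _daily (hypothetical values) would produce names like sales_00000_daily.gz.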

Then click the Test Connection button to verify that the connection works, that the bucket/container exists, and whether the path already exists.

Note: Paths that contain variables are not validated correctly.

Destination type

Define the type of your destination object:

  • Flat file - contains one record per line. Within a record, the individual fields are separated by a delimiter such as a comma or tab character. The output data is UTF-8 encoded.
  • JSON - JavaScript Object Notation. Each file contains one JSON object per line. The JSON objects contain unordered key-value pairs whose values can be strings, numbers, Booleans, arrays or JSON objects. The output data is UTF-8 encoded.
  • Parquet - Apache Parquet is a columnar storage format popular with Hive and Impala. You can control the Parquet output with the system variables _PARQUET_COMPRESSION, _PARQUET_PAGE_SIZE and _PARQUET_BLOCK_SIZE. Note that datetime and complex data types are not currently supported with Parquet.
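For example, with the JSON destination type, each output file contains one object per line, such as the following (field names and values are illustrative):

  {"id": 101, "name": "Ada", "active": true}
  {"id": 102, "name": "Grace", "tags": ["a", "b"]}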

Flat file parameters

If you selected flat file as the destination type, you need to define the delimiter character used to separate fields and whether the field data is quote-enclosed.

  1. In the field delimiter drop-down list, select one of the predefined characters (comma or tab). You can also type a single character or one of the following escape sequences:
    • \b (backspace)
    • \f (formfeed)
    • \n (newline)
    • \r (carriage return)
    • \t (tab)
    • \' (single quote)
    • \" (double quote)
    • \\ (backslash)
  2. String qualifier - if double quote or single quote is selected, fields that contain the delimiter will be enclosed in that character. String qualifiers within the field data are escaped by doubling them (e.g. " becomes "").
  3. Check Write field names in first row if you want the first row of the destination file to contain column headings.
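For example, assuming a comma delimiter, a double-quote string qualifier and Write field names in first row checked, the output might look like this (field names and values are illustrative):

  id,name,comment
  1,Ada,"says ""hello, world"""
  2,Grace,no delimiter here

The first record's comment contains both the delimiter and double quotes, so it is enclosed in double quotes and the embedded quotes are doubled.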

More options

  • Compression type - select the type of compression for your data (Gzip, Bzip2 or none).
  • Merge to single file - select this option to ensure that only one file is written to the destination directory. Otherwise, the output may be split across many files in the destination path.
  • Delete destination directory before writing - check this option to delete the destination directory before writing. If the directory already exists and is not deleted, the job will fail.
  • When destination directory exists:
    • Fail job - by default, the job fails if the destination directory already exists.
    • Replace existing files - when selected, the job does not fail when the destination directory exists, and files are written to the existing directory. If files with the same names already exist there, they are overwritten. If you'd like to add files to the directory instead, use a custom file name pattern that is unique for every job execution (e.g. use the variable $_JOB_ID in the prefix, as shown in the example below).
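For example, assuming a custom file name pattern with the variable $_JOB_ID in the prefix, each execution writes files whose names are unique to that job, so earlier output is not overwritten (the job ID and extension below are hypothetical):

  prefix $_JOB_ID_  ->  1234567_00000.gz, 1234567_00001.gz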
