Using components: File Storage Destination

Use the File storage destination component to store the output of a data flow into files in a designated directory on a file server (SFTP, HDFS) or object store (Amazon S3, Google Cloud Storage, Azure Blob Storage).

Connection

Select an existing File storage connection or create a new one.

Destination Properties

  • Target bucket - the name of the target cloud storage bucket to which the package output directory and files will be written. Relevant only for object stores such as Amazon S3 and Google Cloud Storage.
  • Target directory - the name of the target directory (within the bucket for object stores). One or more files will be created in the directory. By default, if the target directory already exists, the job will fail (see below).
  • Destination format - defines the target format to use.
    • Delimited values options - produce CSV, TSV or any other delimited values format. The output files are UTF-8 encoded.
    • Line delimited JSON - produce one JSON object per record on each line of the output files. The output files are UTF-8 encoded.
    • Parquet - Apache Parquet is a columnar storage format popular with Impala, AWS Athena, Presto and other open-source data warehouse solutions. You can control the Parquet output with the system variables _PARQUET_COMPRESSION, _PARQUET_PAGE_SIZE and _PARQUET_BLOCK_SIZE. Note that datetime and complex data types are not supported with Parquet.
  • Delimited values options
    • Delimiter - select or type a single character to separate values in the output file (tab by default).
    • String Qualifier - if double quote or single quote is selected, fields that contain the selected delimiter will be enclosed in the chosen qualifier. String qualifiers within the field data will be escaped by doubling them (e.g. " becomes "").
    • Write field names in header - check to add a header line containing the field names in each output file.
  • Output compression - select the type of compression for your data (Gzip, Bzip2 or none). Using Gzip or Bzip2 compression adds a .gz or .bz2 suffix to the output directory name.
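As a rough sketch of the delimited-values behavior described above (delimiter, string qualifier with doubling, header line, UTF-8 encoding), the following Python snippet mimics it with the standard csv module. This is an illustration of the rules, not Xplenty's actual implementation:

```python
import csv
import io

def write_delimited(records, fieldnames, delimiter=","):
    """Illustrative sketch: fields containing the delimiter are enclosed
    in double quotes, and embedded qualifiers are escaped by doubling."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter,
                        quotechar='"', doublequote=True,
                        quoting=csv.QUOTE_MINIMAL,
                        lineterminator="\n")
    writer.writerow(fieldnames)  # header line containing the field names
    for rec in records:
        writer.writerow([rec[f] for f in fieldnames])
    # The output files are UTF-8 encoded
    return buf.getvalue().encode("utf-8")

data = [{"id": 1, "note": 'say "hi", please'}]
print(write_delimited(data, ["id", "note"]).decode("utf-8"))
```

Here the value containing a quote is emitted as `"say ""hi"", please"`, matching the qualifier-doubling rule.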

Destination Action

  • Write all files directly and fail the job if target directory already exists - Files will be written directly to the target directory. By default, the job will fail if the destination directory already exists. You can use variables to dynamically generate new directory names every time a job is executed (e.g. /output/${_JOB_ID}).
  • Write all files directly and delete target directory if already exists - Files will be written directly to the target directory. During execution, the job checks whether the target directory exists and, if so, deletes it before writing.
  • Write all files directly and replace files in directory if they already exist - Files will be written directly to the target directory. When selected, the job will not fail if the destination directory exists, and files will be written to the existing directory. If files with the same names already exist in the directory, they will be overwritten. If you'd like to add files to the directory instead, make sure to use a custom file pattern that is unique for every job execution (e.g. use the variable ${_JOB_ID} in the file prefix; see below).
  • Use intermediate storage and copy files to an existing directory in destination - Files will be written to intermediate storage and then copied to the target directory.

  • Merge output to single file - check to ensure that only a single file is written to the destination directory. File size limitations on certain platforms may cause your job to fail.

Target file names

The default file pattern is part-[mr]-[0-9]{5} (for example, part-m-00000). To change it, select a custom pattern:

  • File name prefix - leave empty to keep the default prefix (part-[mr]-) or change to your custom prefix. Use variables to set the prefix dynamically.
  • File name suffix - Xplenty automatically suggests suffixes according to the file format and compression type you selected.
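The default naming above can be sketched as follows. The helper and the m/r reading (the Hadoop convention of "m" for map output and "r" for reduce output) are assumptions for illustration, not part of Xplenty's documented API:

```python
import re

# Default output-file pattern: part-[mr]-[0-9]{5}
DEFAULT_PATTERN = re.compile(r"part-[mr]-[0-9]{5}")

def make_file_name(prefix="part-m-", index=0, suffix=""):
    """Hypothetical helper: build a file name from a prefix, a
    zero-padded 5-digit sequence number, and an optional suffix
    (e.g. ".csv.gz" for Gzip-compressed delimited output)."""
    return f"{prefix}{index:05d}{suffix}"

name = make_file_name()
print(name)                                   # part-m-00000
print(bool(DEFAULT_PATTERN.fullmatch(name)))  # True
```

A custom prefix simply replaces the part-[mr]- portion, e.g. make_file_name("output-", 7, ".csv.gz") gives output-00007.csv.gz.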
