Using components: File Storage Source

Use the file storage source component to read data stored in one or more files in object stores such as Amazon S3, Google Cloud Storage, and Azure Blob Storage, or on file servers accessed over SFTP.

Connection

Select an existing file storage connection or create a new one.

Source Properties

Source location

  • Source bucket - The name of the cloud storage bucket that contains the folders and objects defined in the path. Relevant only for object stores such as Amazon S3 and Google Cloud Storage.
  • Source path - The path to your input folder or file. Examples:
    • Folder: sales/2015/01/
    • File or object: sales/2015/01/log.csv
    • Pattern: sales/2015/{01,02}/
      You can use wildcard characters for pattern globbing.

Note - File and directory names that begin with an underscore (_) or a dot (.) are ignored, and the data they contain will not be read.
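The pattern and ignore rules above can be illustrated in plain Python. This is a conceptual sketch, not Xplenty's actual matcher: it expands one brace group at a time, treats a trailing `/` as "anything below", and, as a simplification, applies the underscore/dot ignore rule only to the file name itself.

```python
import fnmatch

def expand_braces(pattern):
    """Expand one {a,b,...} group per recursion into plain glob patterns."""
    start = pattern.find("{")
    if start == -1:
        return [pattern]
    end = pattern.find("}", start)
    results = []
    for option in pattern[start + 1:end].split(","):
        results.extend(expand_braces(pattern[:start] + option + pattern[end + 1:]))
    return results

def matches(path, pattern):
    """Check a path against a source-path pattern like sales/2015/{01,02}/.

    Illustrative only: file names starting with "_" or "." are ignored,
    and a trailing "*" makes a folder pattern match everything below it.
    """
    name = path.rsplit("/", 1)[-1]
    if name.startswith(("_", ".")):
        return False
    return any(fnmatch.fnmatch(path, p + "*") for p in expand_braces(pattern))
```

For example, `matches("sales/2015/01/log.csv", "sales/2015/{01,02}/")` is true, while a path under `sales/2015/03/` or a file named `_temp.csv` is skipped.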

Source file format

  • Record delimiter - Defines what breaks the data into records.
    • New line (\n, \r\n, \r) - each line in your files is a record.
    • End of file - each file is treated as a single record.
  • Record type - Defines the format of each record.
    • Delimited values - fields are separated by a delimiter you define, such as a tab, a comma, or another character (see delimited values parameters below).
    • JSON object - each record is a JSON object (enclosed in curly brackets).
    • Raw - the record is read in its entirety into a single string/binary field.

Read more about selecting the right record delimiter and format.

Note: The source data can be compressed (gzip or bzip2) or uncompressed, and must be UTF-8 encoded.
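The difference between the two record delimiters can be sketched in a few lines of Python. This is an illustration, not Xplenty's reader: UTF-8 is assumed (as the note above requires), and gzip is handled with the standard library.

```python
import gzip

def read_records(raw_bytes, end_of_file=False, gzipped=False):
    """Split source bytes into records per the record delimiter setting.

    Illustrative sketch: data is assumed UTF-8 encoded, optionally
    gzip-compressed.
    """
    data = gzip.decompress(raw_bytes) if gzipped else raw_bytes
    text = data.decode("utf-8")
    if end_of_file:
        return [text]          # End of file: the whole file is one record
    return text.splitlines()   # New line: each \n, \r\n or \r ends a record
```

With the new line setting, `b"a,1\nb,2\r\nc,3"` yields three records; with the end-of-file setting, a gzipped JSON file yields a single record containing the whole document.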

Delimited values parameters

If you selected the new line record delimiter and the delimited values record type, define the delimiter character that separates fields in your objects and whether the data is enclosed in quotes.

  1. In the field delimiter drop-down list, select one of the preset characters (comma or tab). You can also type a single character or one of the following escape sequences:
    • \b (backspace)
    • \f (formfeed)
    • \n (newline)
    • \r (carriage return)
    • \t (tab)
    • \' (single quote)
    • \" (double quote)
    • \\ (backslash)

  2. If some or all of the fields are enclosed in single or double quotes, select ' or " in the string qualifier drop-down list. If the fields may also contain line breaks, select " (newline inside) or ' (newline inside), according to the string qualifier used in the files. Use the "newline inside" options with caution: unbalanced quotes may have undesired effects on job performance.
  3. Check First row contains column names if there is a header row in each source file and you wish to skip it.
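The three settings above map naturally onto standard CSV parsing. A minimal sketch using Python's csv module (names like `parse_delimited` are illustrative, not part of Xplenty): `delimiter` and `string_qualifier` mirror the field delimiter and string qualifier settings, and with quoting enabled a quoted field may also contain the delimiter or line breaks ("newline inside").

```python
import csv
import io

def parse_delimited(text, delimiter=",", string_qualifier='"', skip_header=False):
    """Parse delimited records as the settings above describe (a sketch)."""
    reader = csv.reader(io.StringIO(text),
                        delimiter=delimiter,
                        quotechar=string_qualifier)
    rows = list(reader)
    # skip_header mirrors "First row contains column names"
    return rows[1:] if skip_header else rows
```

For example, with a tab delimiter and a header row, `'name\tcity\nann\t"New\tYork"'` parses to a single record whose quoted second field safely contains a tab.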

Json parameters

  • Base record JSONPath Expression - You can use a custom JSONPath expression to define the base record and extract nested objects and arrays. Select the object preset to use the keys of the JSON object as the input fields. Select the array preset to use the keys of the JSON objects within the array as the input fields. Alternatively, type a custom JSONPath expression to extract fields from nested objects or arrays (e.g., $.data[*] uses the keys of the JSON objects within the array named "data" in the input JSON).
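The effect of a base-record expression such as $.data[*] can be shown with plain Python dictionary access on an invented sample document (the document and its keys are made up for illustration):

```python
import json

# Invented sample input; the "data" array holds the records of interest.
doc = json.loads(
    '{"status": "ok", "data": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]}'
)

# $.data[*] - each object inside the "data" array becomes one base record,
# and that object's keys become the input fields.
base_records = doc["data"]
input_fields = sorted(base_records[0].keys())
```

Here each element of `data` is one record, and `id` and `name` become the input fields, rather than the top-level keys `status` and `data`.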

Source action 

  • Process all files directly from source - by default, files are read and data is manipulated within the same process.
  • Process only new files (Incremental load) - Define a connection and path for storing the file manifest (see below) so that Xplenty reads only new files in your source bucket/path.
    • Manifest connection - By default set to the source's connection. You may use another connection that has read/write access to store the manifest file.
    • Manifest path - Fill in bucket/directory/filename.gz. This is where the manifest file will be stored when the job is executed. The manifest lists all files that have been processed. Each job execution compares the current file listing in the input path to the manifest and only processes new files. Backups of the previous manifests will be stored with each job execution for debugging/rollback purposes.
  • Copy, merge and process all files - Use this when the source path contains many small files and you'd like to process all of them. Xplenty first reads all the files, merges them into larger files, and then processes the larger files, which are faster to read and process.
    Note: The process fails if the input path points to a single file. You may get unexpected results if your delimited data doesn't end in a line break or if your files contain header lines.

Source Schema

After defining the source location and format, select the fields to use in the source.

  • With delimited values, fields are read in order, so make sure you define all the fields that exist in the source files, in order.
  • With JSON input, you may define only the fields that you wish to use in your package.
  • With raw input, there is only a single field that contains the entire record data.
  • Define how you will refer to these fields (alias) in the other components of your package. If you use illegal characters, we'll let you know before you close the dialog.
  • You can also add the file_path field to get the path of the source file as a field in your data.

For each field, define the alias used to refer to the field in the following components, and its data type. For JSON input, also define the key in the source JSON file for each field. Read more about processing JSON data here.
