Using and setting variables in your packages

Using variables in your package


You can use variables in most components and fields of a package. For example, you cannot use variables in the Limit component or in alias fields.
Variable values are expressions, so you can use functions and operators to assign dynamic values to your variables.
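
Because variable values are expressions, a default value can be computed rather than typed in as a literal. A minimal illustration, assuming the date-time functions listed under Xplenty Functions (the variable name yesterday is just an example):

  yesterday = ToString(SubtractDuration(CurrentTime(),'P1D'),'yyyy-MM-dd')

When the job runs, $yesterday resolves to the previous day's date as a string and can be referenced in other fields of the package.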

You can use three types of variables:

  • user variables that you define. You also set the default value.
  • system variables for which you can change the default value:
    • _ADWORDS_API_MAX_INPUT_SPLITS - maximum number of concurrent Google AdWords requests.
    • _ADWORDS_API_REQUEST_READ_TIMEOUT - request timeout (in milliseconds) for Google AdWords source components.
    • _BQ_READER_POLL_INTERVAL - Sets the interval in milliseconds between retries when polling data export from Google BigQuery.
    • _BQ_READER_POLL_RETRIES - Controls the number of retries when polling data export from Google BigQuery.
    • _BYTES_PER_REDUCER - the amount of data, in bytes, to be processed by a single reduce task. Used to determine the number of reducers when _DEFAULT_PARALLELISM is 0.
    • _CACHED_BAG_MEMORY_PERCENT - Percentage of the heap allocated for all bags in a map or reduce task. When the amount is filled, data is spilled to disk. Higher value reduces spills to disk but increases likelihood of running out of heap memory.
    • _COPY_TARGET_PARTITIONS - Controls how many partitions the data is divided into by the copy pre-process action. Setting this variable's value to 0 forces the process not to merge files.
    • _COPY_TARGET_SIZE - Controls the maximum size per file in partition for files that are concatenated by the copy pre-process action.
    • _DEFAULT_TIMEZONE - default time zone for date-time datatype fields.
    • _DEFAULT_PARALLELISM - Sets the default number of parallel reduce tasks to use in the package. Generally speaking, the number of reducers depends on the size of your data and its distribution. If your data is relatively big but skewed (for example, when you aggregate by a field and most records fall into one group), adding more reducers will not have a positive effect on performance. The default value is 0, which means that the number of reducers is calculated from _BYTES_PER_REDUCER.
    • _FS_SFTP_MAX_RETRIES - number of retry attempts when trying to find files or directories in SFTP (default: 5).
    • _FS_SFTP_RETRY_SLEEP - interval, in milliseconds, between retry attempts when trying to find files or directories in SFTP (default: 500).
    • _GA_API_MAX_INPUT_SPLITS - maximum number of concurrent Google Analytics requests.
    • _GA_API_REQUEST_MAX_RESULTS - maximum results per page for Google Analytics source components.
    • _GA_API_REQUEST_READ_TIMEOUT - request timeout (in milliseconds) for Google Analytics source components.
    • _INTERMEDIATE_COMPRESSION - compression for intermediate results. Defaults to false.
    • _LINE_RECORD_READER_MAX_LENGTH - maximum length, in bytes, for lines read from files. Lines longer than this value will be discarded.
    • _MAP_MAX_ATTEMPTS - number of times to try to execute a map task before failing the job.
    • _MAP_MAX_FAILURES_PERCENT - Controls the maximum percentage of map tasks that are allowed to fail without triggering job failure. The value range is 0-100.
    • _MAP_TASK_TIMEOUT - number of milliseconds before a map task is killed if it doesn't update its status.
    • _MAX_COMBINED_SPLIT_SIZE - amount of data, in bytes, to be processed by a single task. Smaller files are combined until this size is reached. Larger files are split if they are uncompressed or compressed using Bzip2.
    • _PARQUET_BLOCK_SIZE - Size of a row group being buffered in memory for Apache Parquet.
    • _PARQUET_COMPRESSION - Compression type for Apache Parquet. Available values are: UNCOMPRESSED, GZIP, SNAPPY.
    • _PARQUET_PAGE_SIZE - Page size for Apache Parquet compression.
    • _REDUCER_MAX_ATTEMPTS - number of times to try to execute a reduce task before failing the job.
    • _REDUCER_MAX_FAILURES_PERCENT - maximum percentage of reduce tasks that are allowed to fail without triggering job failure. The value range is 0-100.
    • _SHUFFLE_INPUT_BUFFER_PERCENT - The percentage of memory to be allocated from the maximum heap size to storing map outputs during the shuffle.
    • _COPY_PARALLELISM - Controls how many processes are used in the copy pre-process action.
    • _SYNC_WAIT_TIME - time in seconds to wait between staging data for an Amazon Redshift destination and executing COPY on the Redshift cluster.
  • variables predefined by Xplenty, whose values are set by the system when the job is run:
    • _CLUSTER_ID_S3_ESCAPED - The S3 escaped ID of the cluster on which the job is running.
    • _CLUSTER_ID - The ID of the cluster on which the job is running.
    • _CLUSTER_NODES_COUNT - The number of nodes in the cluster that executes the job.
    • _JOB_ID - Xplenty Job Identifier
    • _JOB_ID_S3_ESCAPED - S3 escaped Xplenty Job Identifier
    • _JOB_SUBMISSION_TIMESTAMP - ISO-8601 date-time value of the time the job was submitted in UTC. For example: 2013-04-22T14:18:17Z
    • _JOB_SUBMISSION_TIMESTAMP_S3_ESCAPED - S3 escaped date-time value of the time the job was submitted in UTC. For example: 2013-01-09T14-52-21Z
    • _JOB_SUBMITTER_EMAIL - The email address of the user who submitted the job. For example: helpdesk@xplenty.com
    • _JOB_SUBMITTER_EMAIL_S3_ESCAPED - The S3 escaped email address of the user who submitted the job. For example: helpdesk-xplenty-com
    • _PACKAGE_OWNER_EMAIL - The email address of the user who owns the package.
    • _PACKAGE_OWNER_EMAIL_S3_ESCAPED - The S3 escaped email address of the user who owns the package.
    • _ACCOUNT_ID - The internal id of the account under which the package and job were created.
    • _ACCOUNT_ID_S3_ESCAPED - The S3 escaped internal id of the account under which the package and job were created.
    • _ACCOUNT_NAME - The name of the account under which the package and job were created.
    • _ACCOUNT_NAME_S3_ESCAPED - The S3 escaped name of the account under which the package and job were created.
    • _PACKAGE_LAST_SUCCESSFUL_JOB_SUBMISSION_TIMESTAMP - The timestamp (in ISO-8601 date-time format, in UTC) of the last successful job for the same package. This variable can be used in incremental loads to read data that is newer than the previously executed job.
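
The S3-escaped variants are useful when a value that can contain characters such as ':' or '@' (for example, a timestamp or an email address) is embedded in an object path. For instance, a file storage destination path could combine several predefined variables (the bucket and folder names here are illustrative):

  mybucket/output/$_ACCOUNT_ID/$_JOB_ID/
  mybucket/logs/$_JOB_SUBMITTER_EMAIL_S3_ESCAPED/$_JOB_SUBMISSION_TIMESTAMP_S3_ESCAPED/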

To use a variable in a field or in another variable:

In the field where you want to use the variable, type $ followed by the variable name. For example, if the variable name is country, type $country. Note that variables are simply substituted with their values. Therefore, if you use a variable where a string value is expected, you should enclose it in single quotes, as in this example: SUBSTRING('$country',3,5)
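
For example, the predefined _PACKAGE_LAST_SUCCESSFUL_JOB_SUBMISSION_TIMESTAMP variable can be referenced in a database source query to implement an incremental load (the table and column names here are illustrative):

  SELECT id, name, updated_at
  FROM customers
  WHERE updated_at > '$_PACKAGE_LAST_SUCCESSFUL_JOB_SUBMISSION_TIMESTAMP'

Because the variable is substituted as plain text, it is enclosed in single quotes so that the database receives a quoted timestamp literal.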

Setting variable values in your package

Set user and system variable default values in the package designer. If required, you can override these default values when you run a job through the UI (see Running jobs), the scheduler or the API.
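
For example, when starting a job through the API you can pass a variables hash that overrides the package defaults. The exact endpoint, authentication and payload are described in the Xplenty API documentation; the sketch below only illustrates the idea, with placeholder IDs and a hypothetical user variable named country:

  POST https://api.xplenty.com/<account_id>/api/jobs
  {
    "cluster_id": 123,
    "package_id": 456,
    "variables": { "country": "FR" }
  }

A job started this way runs with country set to FR instead of the default value defined in the package designer.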

To define a user variable and set its value:
  1. Click ... (More options), then click Set variables ...
  2. On the user variables tab, type a name and a default value in the relevant text boxes.
  3. Add additional variables as required.
You can also reference these variables in a user variable's value (see the example after this list):
  • Predefined variables
  • Another user variable that is listed before the new variable
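
For example, you could define (in this order) a user variable and a second user variable that builds on it and on a predefined variable (the names and values here are illustrative):

  country        'US'
  output_path    'exports/$country/$_JOB_ID/'

Because variables are substituted in the order in which they are listed, output_path can reference country only if country appears before it.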

To modify a system variable default value in the package:

  1. Click ... (More options), then click Set variables ...
  2. Click system variables.
  3. Type a new default value in the relevant text box.
You can also reference these variables in a system variable's value (see the example after this list):
  • Predefined variables
  • User variables
  • Another system variable that is listed before this system variable
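
For example, to write Apache Parquet output with Snappy compression and to interpret date-time fields in a specific time zone, you could override the corresponding system variable defaults (the values shown are illustrative):

  _PARQUET_COMPRESSION    'SNAPPY'
  _DEFAULT_TIMEZONE       'America/New_York'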
