Using components: Join Transformation

Use the Join transformation to combine records from two different inputs. The join component can be used to add information from one data source to another data source or to filter data that exists in both data sources or exists in only one of them. It is important to select join keys (see below) that are unique in at least one of the data inputs to avoid data duplication which may result undesired consequences (e.g. billions of records in output or cluster errors due to disk or memory problems).

Join inputs

Select 2 inputs and drag and drop the component menu button onto the join component:

Within the component you can define which input is the left input and which is the right input.


Join properties

Join type

  • Inner - returns only those records that have a matching value in the join keys from both the left and right inputs. Note that null values are not considered matching values.
  • Left - returns the same records as an inner join, as well as records from the left input that have no matches in the join key in the right input. Such records will have null values in the right input fields.
  • Right - returns the same records as an inner join, as well as records from the right input that have no matches in the join key in the left input. Such records will have null values in the left input fields.
  • Full - returns all records that would be returned by a inner (matches on both inputs), records from the left input that have no matches in the right input and records from the right input that have no matches in the left input.

Join optimization

  • Default - uses Hash join - both inputs are read, tagged by input and are sorted and put into buckets according to the join keys. Then for each key, the records are cross joined by input tags.
  • Replicated - use when one input is small enough to fit into main memory, thereby improving efficiency. The large relation should be the left input and the small one should be the right input. If the small relation doesn't fit into main memory, the process fails and an error is generated. Replicated join only works with inner or left joins.
  • Skewed - use if the underlying key values are very skewed, so that processing isn't evenly distributed. This will affect performance and may cause the process that deals with most of the data to go out of memory. When Skewed join is used, a histogram is computed on the join key using the left input and this data is used to allocate more reducers for a given key.

Join keys

Inputs are joined on equal key values. Select the key fields that should have equal values in both inputs. If multiple key fields are selected, the join component's output the records that meet all the join conditions (logical "AND").

If the join's inputs contain fields with the same names, they will appear in the join output prefixed by the input's name (e.g. adwords1::customer_id). Use a select component to fix the field aliases or remove fields that contain the same values.

Creating packages

  1. Creating a new package in New Xplenty
  2. Creating a workflow
  3. Working in the new package designer
  4. Validating a package
  5. Using components: Amazon Redshift Source
  6. Using components: Bing Ads Source
  7. Using components: Database Source
  8. Using components: Facebook Ads Insights Source
  9. Using components: File Storage Source
  10. Using components: Google Adwords source
  11. Using components: Google Analytics Source
  12. Using components: Google BigQuery Source
  13. Using components: Google Cloud Spanner Source
  14. Using components: MongoDB Source
  15. Using components: NetSuite Source
  16. Using components: Salesforce source
  17. Using components: Rest API Source
  18. Using components: Aggregate Transformation
  19. Using components: Assert Transformation
  20. Using components: Clone transformation
  21. Using components: Cross Join Transformation
  22. Using components: Distinct Transformation
  23. Using components: Filter Transformation
  24. Using components: Join Transformation
  25. Using components: Limit Transformation
  26. Using components: Rank Transformation
  27. Using components: Sort Transformation
  28. Using components: Union Transformation
  29. Using components: Window Transformation
  30. Using components: Sample Transformation
  31. Using components: Cube transformation
  32. Using components: Amazon Redshift Destination
  33. Using components: Database Destination
  34. Using components: File Storage Destination
  35. Using components: Google BigQuery Destination
  36. Using components: Google Spanner Destination
  37. Using components: MongoDB Destination
  38. Using components: Salesforce Destination
  39. Using components: Snowflake Destination
  40. Using Components: Rest API Destination
  41. Using and setting variables in your packages
  42. System and pre-defined variables
  43. Using pattern-matching in source component paths
  44. Using ISO 8601 string functions
  45. Using Expressions in Xplenty
  46. Xplenty Functions

Feedback and Knowledge Base