Using components: Join

Use the Join component to combine records from two dataflows, which the Join component refers to as the left source and the right source.

To join two dataflows, drag and drop the component menu button onto the Join component.

A join condition is formed by selecting a field from the left dataflow and a field with a similar datatype from the right dataflow.

The join condition is an equijoin: it compares the values in the joined fields for equality. Each output record includes all the fields from both the left and right sources.

If you define more than one join condition, the Join component returns records with values that meet all the join conditions specified (logical "AND").
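The equality-and-AND semantics can be illustrated with a minimal Python sketch. It is not Xplenty's implementation: records are modeled as dicts, `None` stands in for null, and the function name `inner_equijoin` and the `left.`/`right.` field prefixes are hypothetical choices made only for this illustration.

```python
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def inner_equijoin(left: List[Record], right: List[Record],
                   conditions: Sequence[Tuple[str, str]]) -> List[Record]:
    """Hypothetical sketch: keep left/right pairs where EVERY condition holds (logical AND).

    `conditions` is a list of (left_field, right_field) pairs compared for equality.
    A null (None) value never matches anything, not even another null.
    """
    result = []
    for l in left:
        for r in right:
            if all(l.get(lf) is not None and l.get(lf) == r.get(rf)
                   for lf, rf in conditions):
                # Include all fields from both sources; the prefixes are illustrative only.
                row = {f"left.{k}": v for k, v in l.items()}
                row.update({f"right.{k}": v for k, v in r.items()})
                result.append(row)
    return result


orders = [{"cust": 1, "region": "EU", "amount": 10},
          {"cust": 2, "region": "US", "amount": 20},
          {"cust": None, "region": "EU", "amount": 30}]
customers = [{"id": 1, "region": "EU", "name": "Ann"},
             {"id": 2, "region": "EU", "name": "Bob"},
             {"id": None, "region": "EU", "name": "Nil"}]

# Two join conditions: cust = id AND region = region.
rows = inner_equijoin(orders, customers, [("cust", "id"), ("region", "region")])
print(rows)
```

Only the first order survives: the second fails the region condition, and the third has a null key, which does not match even the customer whose key is also null.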

The records returned depend on the type of join, as follows:

  • Inner - returns only those records that have a matching value in the joined fields from both the left and right sources. Note that null values are not considered matching values.
  • Left - returns the same records as an inner join, as well as records from the left source that have no matches in the joined field in the right source. Such records will have null values in the right source fields.
  • Right - returns the same records as an inner join, as well as records from the right source that have no matches in the joined field in the left source. Such records will have null values in the left source fields.
  • Full - returns all records that would be returned by an inner, a left and a right join (all records from both sources are returned).
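The four join types differ only in which unmatched records they keep. The following Python sketch illustrates this on a single join key; it is a hypothetical, single-machine model (dicts as records, `None` as null, an invented `equijoin` function and `how` parameter), not the component's actual code.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def equijoin(left: List[Record], right: List[Record], left_key: str,
             right_key: str, how: str = "inner") -> List[Record]:
    """Hypothetical sketch of inner/left/right/full equijoin semantics.

    Unmatched records are padded with None (null) for the other source's fields.
    Null keys never match.
    """
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unknown join type: {how}")
    left_fields = sorted({k for rec in left for k in rec})
    right_fields = sorted({k for rec in right for k in rec})

    def combine(l: Optional[Record], r: Optional[Record]) -> Record:
        # A missing side contributes null values for all of its fields.
        row = {f"left.{f}": (l or {}).get(f) for f in left_fields}
        row.update({f"right.{f}": (r or {}).get(f) for f in right_fields})
        return row

    rows: List[Record] = []
    matched_right = set()  # indices of right records that found at least one match
    for l in left:
        key = l.get(left_key)
        found = False
        for j, r in enumerate(right):
            if key is not None and key == r.get(right_key):
                rows.append(combine(l, r))
                matched_right.add(j)
                found = True
        if not found and how in ("left", "full"):
            rows.append(combine(l, None))  # right source fields are null
    if how in ("right", "full"):
        for j, r in enumerate(right):
            if j not in matched_right:
                rows.append(combine(None, r))  # left source fields are null
    return rows


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": None, "a": "z"}]
right = [{"id": 1, "b": "p"}, {"id": 3, "b": "q"}]
counts = {how: len(equijoin(left, right, "id", "id", how))
          for how in ("inner", "left", "right", "full")}
print(counts)  # {'inner': 1, 'left': 3, 'right': 2, 'full': 4}
```

The full join's row count equals the inner matches plus the unmatched records from each side, which is why it returns everything the inner, left and right joins return.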

Hint: After a Join, you usually add a Select component to fix up the aliases and field order.

To join two sources by one or more fields:

  1. Add a Join component in your package where two dataflows can be joined.
  2. Open the component and name it.
  3. Under general options, define the type of join.
  4. In the first row, define the fields to be joined for each source (left and right).
  5. If required, add fields for additional join criteria.

To optimize a join according to the data in the records:

  1. Under join type, click advanced options.
  2. Select one of the optimization types as follows:

    • Default - uses a hash join: both inputs are read, tagged by source, sorted and put into buckets according to the join keys. Then, for each key, the records are cross joined by source tag.
    • Replicated - use when one input is small enough to fit into main memory, thereby improving efficiency. The large relation should be the left source and the small one should be the right source. If the small relation doesn't fit into main memory, the process fails and an error is generated. Replicated join only works with inner or left joins.
    • Skewed - use if the underlying key values are heavily skewed, so that processing isn't evenly distributed. Such skew hurts performance and may cause the reducer that handles most of the data to run out of memory. When Skewed join is used, a histogram of the join key is computed from the left source, and this data is used to allocate more reducers to a given key.
    • Merge - use if both inputs are already sorted on the join key, enabling a significant performance improvement. Merge join only works with inner joins.
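To make the three main strategies concrete, here is a single-machine Python sketch of the default (tag, bucket, cross join), replicated (in-memory table for the small right source) and merge (two cursors over sorted inputs) approaches for an inner join. It is an illustration under stated assumptions, not Xplenty's implementation: the real component distributes this work across many machines, the function names are hypothetical, and the merge sketch assumes non-null keys that are already sorted.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
Pair = Tuple[Record, Record]


def default_join(left: List[Record], right: List[Record], key: str) -> List[Pair]:
    """Hypothetical sketch: tag records by source, bucket by key, cross join per key."""
    buckets: Dict[Any, Dict[str, List[Record]]] = defaultdict(lambda: {"L": [], "R": []})
    for tag, source in (("L", left), ("R", right)):
        for rec in source:
            if rec.get(key) is not None:  # null keys never match
                buckets[rec[key]][tag].append(rec)
    pairs: List[Pair] = []
    for k in sorted(buckets):  # keys are visited in sorted order
        pairs.extend((l, r) for l in buckets[k]["L"] for r in buckets[k]["R"])
    return pairs


def replicated_join(left: List[Record], right: List[Record], key: str) -> List[Pair]:
    """Hypothetical sketch: hold the small right source in memory, stream the large left one."""
    table: Dict[Any, List[Record]] = defaultdict(list)
    for r in right:
        if r.get(key) is not None:
            table[r[key]].append(r)
    # .get() does not insert into the defaultdict, so unmatched keys yield [].
    return [(l, r) for l in left for r in table.get(l.get(key), [])]


def merge_join(left: List[Record], right: List[Record], key: str) -> List[Pair]:
    """Hypothetical sketch: both inputs pre-sorted on `key` (no nulls); one forward pass."""
    i = j = 0
    pairs: List[Pair] = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then cross join the two runs.
            i_end = i
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][key] == rk:
                j_end += 1
            pairs.extend((l, r) for l in left[i:i_end] for r in right[j:j_end])
            i, j = i_end, j_end
    return pairs


big = [{"k": k, "v": n} for n, k in enumerate([1, 1, 2, 3, 5, 5])]  # sorted on k
small = [{"k": 1, "w": "a"}, {"k": 3, "w": "b"}, {"k": 5, "w": "c"}, {"k": 5, "w": "d"}]


def norm(pairs: List[Pair]) -> List[Tuple[int, str]]:
    return sorted((l["v"], r["w"]) for l, r in pairs)


print(norm(default_join(big, small, "k")))
```

All three functions return the same seven pairs; they differ only in how much memory they need and whether they require sorted input, which mirrors the trade-offs between the optimization types listed above.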
