Tableau Data Relationships vs Tableau Joins
Tableau data connections can be a little complicated if you are unfamiliar with the data models. There are 3 different ways to merge data together from different data sources, Data Relationships, Data Joins and Blends. Tableau’s new default way is the data relationships which makes things a lot easier for the novice. You can still do the traditional joins but you now don’t have to worry about duplication problems which was a major concern with doing joins historically. In this blog post we will be talking about the difference between the new relationships and joins so you know the difference between the two conceptually.
Enter your email to download the data
Simple Data Relationship by Order ID
When connecting to raw data and brining 2 or more tables into the data connection screen the default interface will be the data relationship screen. This screen is known as the logical layer because as you bring tables out, they will stay independent of each other outside of the logical connection you give it. This is a lot different then in the join interface because you can use these tables separate of each other compared to a join where you combined the data together into one table. Once you bring out the tables Tableau will ask you to let it know how these tables are related, if the column names are the same in both files usually Tableau will guess it correctly for you, if not then just select the correct column names from the dropdown. Now when you go to create a chart you will be able to use columns from both data sources and it will only activate each table IF you choose a column from that table, if not it will not use it. The important part here is when it comes to measures because this route will not duplicate the rows and give you the correct sum.
Simple Join by Order ID
The traditional join interface is still there you just have to double click on one of the tables in the Tableau data canvas to open it. The join interface looks similar but behaves a lot differently. You still have to select how the tables are related, which is by Order ID in our data. The difference here is this will merge the 2 tables together into one table compared to having 2 separate tables that work together as needed. The main problem people have to this is when there is a one-to-many problem, which will result in duplication and mess up your numbers. For example, if in File 1 you have 1 row with Order ID A1 and in File 2 you have 3 rows with Order ID A1, then this will duplicate the measures in File by 3x. Now you can solve this by using LOD calculations but that is pretty advanced and you don’t need to do that if using the relationship route.
Data Relationships and Join Combo
You can use data relationships and joins together in one data connection. Let’s say we sticking with the same simple join on Order ID with 2 tables. You could join for example the sales person in the first table to create one table with that new column then do a data relationship with the second table. You could also just do another relationship with the 3rd table instead of a join. You can mix and match as needed to get your desired results. Usually, you will not have any problems if you are bringing into a dimension and there is a one-to-one relationship which will not cause duplications.
Conclusion
I would say to just stick to data relationship by default and if for some reason you are not getting the correct number you are picturing or run into some limitations then look into seeing if a join can solve it. I think most of the time you should be ok with just doing the data relationship especially if you are not familiar with joins. The main thing to remember is the data relationships will solve your duplication problems most the time compared to doing a join, you might have duplication so always double check your numbers.