1--- 2stage: none 3group: unassigned 4info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments 5--- 6 7# Polymorphic Associations 8 9**Summary:** always use separate tables instead of polymorphic associations. 10 11Rails makes it possible to define so called "polymorphic associations". This 12usually works by adding two columns to a table: a target type column, and a 13target ID. For example, at the time of writing we have such a setup for 14`members` with the following columns: 15 16- `source_type`: a string defining the model to use, can be either `Project` or 17 `Namespace`. 18- `source_id`: the ID of the row to retrieve based on `source_type`. For 19 example, when `source_type` is `Project` then `source_id` contains a 20 project ID. 21 22While such a setup may appear to be useful, it comes with many drawbacks; enough 23that you should avoid this at all costs. 24 25## Space Wasted 26 27Because this setup relies on string values to determine the model to use, it 28wastes a lot of space. For example, for `Project` and `Namespace` the 29maximum size is 9 bytes, plus 1 extra byte for every string when using 30PostgreSQL. While this may only be 10 bytes per row, given enough tables and 31rows using such a setup we can end up wasting quite a bit of disk space and 32memory (for any indexes). 33 34## Indexes 35 36Because our associations are broken up into two columns this may result in 37requiring composite indexes for queries to be performed efficiently. While 38composite indexes are not wrong at all, they can be tricky to set up as the 39ordering of columns in these indexes is important to ensure optimal performance. 40 41## Consistency 42 43One really big problem with polymorphic associations is being unable to enforce 44data consistency on the database level using foreign keys. For consistency to be 45enforced on the database level one would have to write their own foreign key 46logic to support polymorphic associations. 47 48Enforcing consistency on the database level is absolutely crucial for 49maintaining a healthy environment, and thus is another reason to avoid 50polymorphic associations. 51 52## Query Overhead 53 54When using polymorphic associations you always need to filter using both 55columns. For example, you may end up writing a query like this: 56 57```sql 58SELECT * 59FROM members 60WHERE source_type = 'Project' 61AND source_id = 13083; 62``` 63 64Here PostgreSQL can perform the query quite efficiently if both columns are 65indexed. As the query gets more complex, it may not be able to use these 66indexes effectively. 67 68## Mixed Responsibilities 69 70Similar to functions and classes, a table should have a single responsibility: 71storing data with a certain set of pre-defined columns. When using polymorphic 72associations, you are storing different types of data (possibly with 73different columns set) in the same table. 74 75## The Solution 76 77Fortunately, there is a solution to these problems: use a 78separate table for every type you would otherwise store in the same table. Using 79a separate table allows you to use everything a database may provide to ensure 80consistency and query data efficiently, without any additional application logic 81being necessary. 82 83Let's say you have a `members` table storing both approved and pending members, 84for both projects and groups, and the pending state is determined by the column 85`requested_at` being set or not. Schema wise such a setup can lead to various 86columns only being set for certain rows, wasting space. It's also possible that 87certain indexes are only set for certain rows, again wasting space. Finally, 88querying such a table requires less than ideal queries. For example: 89 90```sql 91SELECT * 92FROM members 93WHERE requested_at IS NULL 94AND source_type = 'GroupMember' 95AND source_id = 4 96``` 97 98Instead such a table should be broken up into separate tables. For example, you 99may end up with 4 tables in this case: 100 101- project_members 102- group_members 103- pending_project_members 104- pending_group_members 105 106This makes querying data trivial. For example, to get the members of a group 107you'd run: 108 109```sql 110SELECT * 111FROM group_members 112WHERE group_id = 4 113``` 114 115To get all the pending members of a group in turn you'd run: 116 117```sql 118SELECT * 119FROM pending_group_members 120WHERE group_id = 4 121``` 122 123If you want to get both you can use a `UNION`, though you need to be explicit 124about what columns you want to `SELECT` as otherwise the result set uses the 125columns of the first query. For example: 126 127```sql 128SELECT id, 'Group' AS target_type, group_id AS target_id 129FROM group_members 130 131UNION ALL 132 133SELECT id, 'Project' AS target_type, project_id AS target_id 134FROM project_members 135``` 136 137The above example is perhaps a bit silly, but it shows that there's nothing 138stopping you from merging the data together and presenting it on the same page. 139Selecting columns explicitly can also speed up queries as the database has to do 140less work to get the data (compared to selecting all columns, even ones you're 141not using). 142 143Our schema also becomes easier. No longer do we need to both store and index the 144`source_type` column, we can define foreign keys easily, and we don't need to 145filter rows using the `IS NULL` condition. 146 147To summarize: using separate tables allows us to use foreign keys effectively, 148create indexes only where necessary, conserve space, query data more 149efficiently, and scale these tables more easily (for example, by storing them on 150separate disks). A nice side effect of this is that code can also become easier, 151as a single model isn't responsible for handling different kinds of 152data. 153