1---
2stage: none
3group: unassigned
4info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
5---
6
7# Polymorphic Associations
8
9**Summary:** always use separate tables instead of polymorphic associations.
10
11Rails makes it possible to define so called "polymorphic associations". This
12usually works by adding two columns to a table: a target type column, and a
13target ID. For example, at the time of writing we have such a setup for
14`members` with the following columns:
15
16- `source_type`: a string defining the model to use, can be either `Project` or
17  `Namespace`.
18- `source_id`: the ID of the row to retrieve based on `source_type`. For
19  example, when `source_type` is `Project` then `source_id` contains a
20  project ID.
21
22While such a setup may appear to be useful, it comes with many drawbacks; enough
23that you should avoid this at all costs.
24
25## Space Wasted
26
27Because this setup relies on string values to determine the model to use, it
28wastes a lot of space. For example, for `Project` and `Namespace` the
29maximum size is 9 bytes, plus 1 extra byte for every string when using
30PostgreSQL. While this may only be 10 bytes per row, given enough tables and
31rows using such a setup we can end up wasting quite a bit of disk space and
32memory (for any indexes).
33
34## Indexes
35
36Because our associations are broken up into two columns this may result in
37requiring composite indexes for queries to be performed efficiently. While
38composite indexes are not wrong at all, they can be tricky to set up as the
39ordering of columns in these indexes is important to ensure optimal performance.
40
41## Consistency
42
43One really big problem with polymorphic associations is being unable to enforce
44data consistency on the database level using foreign keys. For consistency to be
45enforced on the database level one would have to write their own foreign key
46logic to support polymorphic associations.
47
48Enforcing consistency on the database level is absolutely crucial for
49maintaining a healthy environment, and thus is another reason to avoid
50polymorphic associations.
51
52## Query Overhead
53
54When using polymorphic associations you always need to filter using both
55columns. For example, you may end up writing a query like this:
56
57```sql
58SELECT *
59FROM members
60WHERE source_type = 'Project'
61AND source_id = 13083;
62```
63
64Here PostgreSQL can perform the query quite efficiently if both columns are
65indexed. As the query gets more complex, it may not be able to use these
66indexes effectively.
67
68## Mixed Responsibilities
69
70Similar to functions and classes, a table should have a single responsibility:
71storing data with a certain set of pre-defined columns. When using polymorphic
72associations, you are storing different types of data (possibly with
73different columns set) in the same table.
74
75## The Solution
76
77Fortunately, there is a solution to these problems: use a
78separate table for every type you would otherwise store in the same table. Using
79a separate table allows you to use everything a database may provide to ensure
80consistency and query data efficiently, without any additional application logic
81being necessary.
82
83Let's say you have a `members` table storing both approved and pending members,
84for both projects and groups, and the pending state is determined by the column
85`requested_at` being set or not. Schema wise such a setup can lead to various
86columns only being set for certain rows, wasting space. It's also possible that
87certain indexes are only set for certain rows, again wasting space. Finally,
88querying such a table requires less than ideal queries. For example:
89
90```sql
91SELECT *
92FROM members
93WHERE requested_at IS NULL
94AND source_type = 'GroupMember'
95AND source_id = 4
96```
97
98Instead such a table should be broken up into separate tables. For example, you
99may end up with 4 tables in this case:
100
101- project_members
102- group_members
103- pending_project_members
104- pending_group_members
105
106This makes querying data trivial. For example, to get the members of a group
107you'd run:
108
109```sql
110SELECT *
111FROM group_members
112WHERE group_id = 4
113```
114
115To get all the pending members of a group in turn you'd run:
116
117```sql
118SELECT *
119FROM pending_group_members
120WHERE group_id = 4
121```
122
123If you want to get both you can use a `UNION`, though you need to be explicit
124about what columns you want to `SELECT` as otherwise the result set uses the
125columns of the first query. For example:
126
127```sql
128SELECT id, 'Group' AS target_type, group_id AS target_id
129FROM group_members
130
131UNION ALL
132
133SELECT id, 'Project' AS target_type, project_id AS target_id
134FROM project_members
135```
136
137The above example is perhaps a bit silly, but it shows that there's nothing
138stopping you from merging the data together and presenting it on the same page.
139Selecting columns explicitly can also speed up queries as the database has to do
140less work to get the data (compared to selecting all columns, even ones you're
141not using).
142
143Our schema also becomes easier. No longer do we need to both store and index the
144`source_type` column, we can define foreign keys easily, and we don't need to
145filter rows using the `IS NULL` condition.
146
147To summarize: using separate tables allows us to use foreign keys effectively,
148create indexes only where necessary, conserve space, query data more
149efficiently, and scale these tables more easily (for example, by storing them on
150separate disks). A nice side effect of this is that code can also become easier,
151as a single model isn't responsible for handling different kinds of
152data.
153