===================================================
Adding reference counters (krefs) to kernel objects
===================================================

:Author: Corey Minyard <minyard@acm.org>
:Author: Thomas Hellstrom <thellstrom@vmware.com>

A lot of this was lifted from Greg Kroah-Hartman's 2004 OLS paper and
presentation on krefs, which can be found at:

  - http://www.kroah.com/linux/talks/ols_2004_kref_paper/Reprint-Kroah-Hartman-OLS2004.pdf
  - http://www.kroah.com/linux/talks/ols_2004_kref_talk/

Introduction
============

krefs allow you to add reference counters to your objects.  If you
have objects that are used in multiple places and passed around, and
you don't have refcounts, your code is almost certainly broken.  If
you want refcounts, krefs are the way to go.

To use a kref, embed one in your data structure like this::

    struct my_data
    {
	.
	.
	struct kref refcount;
	.
	.
    };

The kref can occur anywhere within the data structure.
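
The position does not matter because code that is handed only a pointer
to the embedded kref can recover the enclosing object with
container_of().  A minimal user-space sketch, assuming a stand-in
``struct kref`` and a simplified ``container_of()`` (the kernel supplies
the real ones; ``to_my_data()`` is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel type; for illustration only. */
struct kref {
	int refcount;
};

/* Simplified user-space version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct my_data {
	int id;
	struct kref refcount;	/* deliberately not the first member */
	char name[16];
};

/* Hypothetical helper: map an embedded kref back to its my_data. */
static struct my_data *to_my_data(struct kref *ref)
{
	assert(ref != NULL);
	return container_of(ref, struct my_data, refcount);
}
```

Given ``struct my_data d``, ``to_my_data(&d.refcount)`` yields ``&d``
wherever the member is placed in the structure.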

Initialization
==============

You must initialize the kref after you allocate the structure that
contains it.  To do this, call kref_init() as follows::

     struct my_data *data;

     data = kmalloc(sizeof(*data), GFP_KERNEL);
     if (!data)
            return -ENOMEM;
     kref_init(&data->refcount);

This sets the refcount in the kref to 1.

Kref rules
==========

Once you have an initialized kref, you must observe the following
rules:

1) If you make a non-temporary copy of a pointer, especially if
   it can be passed to another thread of execution, you must
   increment the refcount with kref_get() before passing it off::

       kref_get(&data->refcount);

   If you already have a valid pointer to a kref-ed structure (the
   refcount cannot go to zero) you may do this without a lock.

2) When you are done with a pointer, you must call kref_put()::

       kref_put(&data->refcount, data_release);

   If this is the last reference to the object, the release
   routine will be called.  If the code never tries to get
   a valid pointer to a kref-ed structure without already
   holding a valid pointer, it is safe to do this without
   a lock.

3) If the code attempts to gain a reference to a kref-ed structure
   without already holding a valid pointer, it must serialize access
   so that a kref_put() cannot occur during the kref_get(), and the
   structure must remain valid during the kref_get().
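
The counting behind these rules can be modelled in user space.  The
sketch below is an illustration only, not the kernel implementation
(the kernel's kref is built on refcount_t); the ``model_kref`` names and
``count_release()`` are hypothetical, and C11 atomics stand in for the
kernel primitives:

```c
#include <assert.h>
#include <stdatomic.h>

/* User-space model of a kref; not the kernel implementation. */
struct model_kref {
	atomic_int count;
};

static void model_kref_init(struct model_kref *k)
{
	atomic_init(&k->count, 1);	/* the creator holds the first reference */
}

static void model_kref_get(struct model_kref *k)
{
	/* Rule 1: only legal while the caller already holds a reference. */
	int old = atomic_fetch_add(&k->count, 1);

	assert(old > 0);
	(void)old;
}

/* Returns 1 if this call dropped the last reference, 0 otherwise. */
static int model_kref_put(struct model_kref *k,
			  void (*release)(struct model_kref *k))
{
	/* Rule 2: the release routine runs exactly once, on the last put. */
	if (atomic_fetch_sub(&k->count, 1) == 1) {
		release(k);
		return 1;
	}
	return 0;
}

static int released;	/* hypothetical counter, bumped by the release routine */

static void count_release(struct model_kref *k)
{
	(void)k;
	released++;
}
```

After an init and one get, the first put returns 0 and leaves the object
alone; only the second put returns 1 and runs the release routine.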

For example, if you allocate some data and then pass it to another
thread to process::

    void data_release(struct kref *ref)
    {
	struct my_data *data = container_of(ref, struct my_data, refcount);
	kfree(data);
    }

    int more_data_handling(void *cb_data)
    {
	struct my_data *data = cb_data;
	.
	. do stuff with data here
	.
	kref_put(&data->refcount, data_release);
	return 0;
    }

    int my_data_handler(void)
    {
	int rv = 0;
	struct my_data *data;
	struct task_struct *task;

	data = kmalloc(sizeof(*data), GFP_KERNEL);
	if (!data)
		return -ENOMEM;
	kref_init(&data->refcount);

	kref_get(&data->refcount);
	task = kthread_run(more_data_handling, data, "more_data_handling");
	if (IS_ERR(task)) {
		rv = PTR_ERR(task);
		kref_put(&data->refcount, data_release);
		goto out;
	}

	.
	. do stuff with data here
	.
    out:
	kref_put(&data->refcount, data_release);
	return rv;
    }

This way, it doesn't matter in what order the two threads handle the
data; kref_put() knows when the data is no longer referenced and
releases it.  The kref_get() does not require a lock, since we already
have a valid pointer that we own a refcount for.  The put needs no lock
because nothing tries to get the data without already holding a
pointer.

In the above example, kref_put() is called twice in both the success
and error paths.  This is necessary because the reference count was set
to 1 by kref_init() and then incremented to 2 by kref_get().
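
The accounting can be checked with a single-threaded user-space model.
This is only a sketch under stated assumptions: ``model_kref``,
``model_put()``, the ``spawn_ok`` switch (standing in for whether
kthread_run() succeeded) and the ``freed`` counter are all hypothetical,
and the worker is called directly instead of running as a thread:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_kref { int count; };	/* single-threaded model */

struct my_data {
	struct model_kref refcount;
	int *freed;			/* hypothetical: bumped on release */
};

static void data_release(struct model_kref *ref)
{
	struct my_data *data = container_of(ref, struct my_data, refcount);

	(*data->freed)++;
	free(data);
}

static void model_put(struct model_kref *k,
		      void (*release)(struct model_kref *))
{
	if (--k->count == 0)
		release(k);
}

/* Stand-in for the worker thread: it consumes the reference it was given. */
static void more_data_handling(struct my_data *data)
{
	model_put(&data->refcount, data_release);
}

static int my_data_handler(int spawn_ok, int *freed)
{
	int rv = 0;
	struct my_data *data = malloc(sizeof(*data));

	if (!data)
		return -ENOMEM;
	data->freed = freed;
	data->refcount.count = 1;	/* models kref_init() */
	data->refcount.count++;		/* models kref_get() for the worker */

	if (spawn_ok) {
		more_data_handling(data);	/* worker drops its reference */
	} else {
		rv = -ENOMEM;
		/* No worker exists, so drop the worker's reference here. */
		model_put(&data->refcount, data_release);
	}

	model_put(&data->refcount, data_release);	/* drop our own reference */
	return rv;
}
```

In both paths the count goes 1, 2, 1, 0, so the release routine runs
exactly once per call.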

Note that the "before" in rule 1 is very important.  You should never
do something like::

	task = kthread_run(more_data_handling, data, "more_data_handling");
	if (IS_ERR(task)) {
		rv = PTR_ERR(task);
		goto out;
	} else
		/* BAD BAD BAD - get is after the handoff */
		kref_get(&data->refcount);

By the time the kref_get() runs, the new thread may already have called
kref_put() and freed the data.

Don't assume you know what you are doing and use the above construct.
First of all, you may not know what you are doing.  Second, you may
know what you are doing (there are some situations where locking is
involved where the above may be legal) but someone else who doesn't
know what they are doing may change the code or copy the code.  It's
bad style.  Don't do it.

There are some situations where you can optimize the gets and puts.
For instance, if you are done with an object and are enqueuing it or
passing it off to something else, there is no reason to do a get
followed by a put::

	/* Silly extra get and put */
	kref_get(&obj->ref);
	enqueue(obj);
	kref_put(&obj->ref, obj_cleanup);

Just do the enqueue.  A comment about this is always welcome::

	enqueue(obj);
	/* We are done with obj, so we pass our refcount off
	   to the queue.  DON'T TOUCH obj AFTER HERE! */

The last rule (rule 3) is the nastiest one to handle.  Say, for
instance, you have a list of items that are each kref-ed, and you wish
to get the first one.  You can't just pull the first item off the list
and kref_get() it.  That violates rule 3 because you are not already
holding a valid pointer.  You must add a mutex (or some other lock).
For instance::

	static DEFINE_MUTEX(mutex);
	static LIST_HEAD(q);
	struct my_data
	{
		struct kref      refcount;
		struct list_head link;
	};

	static struct my_data *get_entry(void)
	{
		struct my_data *entry = NULL;
		mutex_lock(&mutex);
		if (!list_empty(&q)) {
			entry = container_of(q.next, struct my_data, link);
			kref_get(&entry->refcount);
		}
		mutex_unlock(&mutex);
		return entry;
	}

	static void release_entry(struct kref *ref)
	{
		struct my_data *entry = container_of(ref, struct my_data, refcount);

		list_del(&entry->link);
		kfree(entry);
	}

	static void put_entry(struct my_data *entry)
	{
		mutex_lock(&mutex);
		kref_put(&entry->refcount, release_entry);
		mutex_unlock(&mutex);
	}

kref_put() returns 1 if the object was released and 0 otherwise.  This
return value is useful if you do not want to hold the lock during the
whole release operation.  Say you didn't want to call kfree() with the
lock held in the example above (since it is kind of pointless to do
so).  You could use kref_put() as follows::

	static void release_entry(struct kref *ref)
	{
		/* All work is done after the return from kref_put(). */
	}

	static void put_entry(struct my_data *entry)
	{
		mutex_lock(&mutex);
		if (kref_put(&entry->refcount, release_entry)) {
			list_del(&entry->link);
			mutex_unlock(&mutex);
			kfree(entry);
		} else
			mutex_unlock(&mutex);
	}

This is really more useful if you have to call other routines as part
of the free operations that could take a long time or might claim the
same lock.  Note that doing everything in the release routine is still
preferred as it is a little neater.
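
The ordering in this variant of put_entry() (unlink under the lock, free
after dropping it) can be traced with a small user-space model.  This is
a sketch under stated assumptions: a plain flag stands in for the mutex,
and ``struct entry``, ``model_put()``, ``model_free()`` and the counters
are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdlib.h>

static int lock_held;		/* hypothetical stand-in for the mutex */
static int freed_total;		/* number of entries freed so far */
static int freed_while_locked;	/* frees performed with the lock held */

static void model_lock(void)   { assert(!lock_held); lock_held = 1; }
static void model_unlock(void) { assert(lock_held);  lock_held = 0; }

struct entry {
	int refcount;
	int on_list;
};

static void release_entry(struct entry *e)
{
	(void)e;	/* all work is done by the caller of model_put() */
}

/* Returns 1 if the last reference was dropped, 0 otherwise. */
static int model_put(struct entry *e, void (*release)(struct entry *))
{
	if (--e->refcount == 0) {
		release(e);
		return 1;
	}
	return 0;
}

static void model_free(struct entry *e)
{
	if (lock_held)
		freed_while_locked++;
	freed_total++;
	free(e);
}

static void put_entry(struct entry *e)
{
	model_lock();
	if (model_put(e, release_entry)) {
		e->on_list = 0;		/* models list_del(), under the lock */
		model_unlock();
		model_free(e);		/* slow work happens outside the lock */
	} else {
		model_unlock();
	}
}
```

With an entry holding two references, the first put_entry() leaves it on
the list and the second unlinks and frees it without the lock held.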

The get_entry()/put_entry() list example can also be restructured using
kref_get_unless_zero() in the following way::

	static struct my_data *get_entry(void)
	{
		struct my_data *entry = NULL;
		mutex_lock(&mutex);
		if (!list_empty(&q)) {
			entry = container_of(q.next, struct my_data, link);
			if (!kref_get_unless_zero(&entry->refcount))
				entry = NULL;
		}
		mutex_unlock(&mutex);
		return entry;
	}

	static void release_entry(struct kref *ref)
	{
		struct my_data *entry = container_of(ref, struct my_data, refcount);

		mutex_lock(&mutex);
		list_del(&entry->link);
		mutex_unlock(&mutex);
		kfree(entry);
	}

	static void put_entry(struct my_data *entry)
	{
		kref_put(&entry->refcount, release_entry);
	}

This removes the mutex around kref_put() in put_entry(), but it is
important that kref_get_unless_zero() is enclosed in the same critical
section that finds the entry in the lookup table; otherwise
kref_get_unless_zero() may reference already freed memory.
Note that it is illegal to use kref_get_unless_zero() without checking
its return value.  If you are sure (by already having a valid pointer)
that kref_get_unless_zero() will return true, then use kref_get()
instead.
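
The reason the return value must be checked is easier to see from a
model of the operation.  The following user-space sketch is not the
kernel implementation; ``model_kref`` and
``model_kref_get_unless_zero()`` are hypothetical names, and a C11
compare-and-exchange loop stands in for the kernel primitive:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* User-space model of a kref; not the kernel implementation. */
struct model_kref {
	atomic_int count;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool model_kref_get_unless_zero(struct model_kref *k)
{
	int old = atomic_load(&k->count);

	while (old != 0) {
		/* On failure, old is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&k->count, &old, old + 1))
			return true;
	}
	return false;	/* the object is already being released */
}
```

A false return means no reference was taken, so the caller must not
touch the object, which is why get_entry() resets ``entry`` to NULL in
that case.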

Krefs and RCU
=============

The function kref_get_unless_zero() also makes it possible to use RCU
locking for lookups in the above example::

	struct my_data
	{
		struct rcu_head rhead;
		.
		struct kref refcount;
		.
		.
	};

	static struct my_data *get_entry_rcu(void)
	{
		struct my_data *entry = NULL;
		rcu_read_lock();
		if (!list_empty(&q)) {
			entry = container_of(q.next, struct my_data, link);
			if (!kref_get_unless_zero(&entry->refcount))
				entry = NULL;
		}
		rcu_read_unlock();
		return entry;
	}

	static void release_entry_rcu(struct kref *ref)
	{
		struct my_data *entry = container_of(ref, struct my_data, refcount);

		mutex_lock(&mutex);
		list_del_rcu(&entry->link);
		mutex_unlock(&mutex);
		kfree_rcu(entry, rhead);
	}

	static void put_entry(struct my_data *entry)
	{
		kref_put(&entry->refcount, release_entry_rcu);
	}

But note that the struct kref member needs to remain in valid memory
for an RCU grace period after release_entry_rcu() is called.  That can
be accomplished by using kfree_rcu(entry, rhead) as done above, or by
calling synchronize_rcu() before using kfree(), but note that
synchronize_rcu() may sleep for a substantial amount of time.