From: herbs@cntc.com (Herb Sutter)
Subject: Guru of the Week #29: Solution
Date: 22 Jan 1998 00:00:00 GMT
Message-ID: <6a8q26$9qa@netlab.cs.rpi.edu>
Newsgroups: comp.lang.c++.moderated


 .--------------------------------------------------------------------.
 |  Guru of the Week problems and solutions are posted regularly on   |
 |   news:comp.lang.c++.moderated. For past problems and solutions    |
 |            see the GotW archive at http://www.cntc.com.            |
 | Is there a topic you'd like to see covered? mailto:herbs@cntc.com  |
 `--------------------------------------------------------------------'
_______________________________________________________

GotW #29:   Strings

Difficulty: 7 / 10
_______________________________________________________


>Write a ci_string class which is identical to the
>standard 'string' class, but is case-insensitive in the
>same way as the C function stricmp():

The "how can I make a case-insensitive string?"
question is so common that it probably deserves its own
FAQ -- hence this issue of GotW.

Note 1:  The stricmp() case-insensitive string
comparison function is not part of the C standard, but
it is a common extension on many C compilers.

Note 2:  What "case insensitive" actually means depends
entirely on your application and language.  For
example, many languages do not have "cases" at all, and
for languages that do, you have to decide whether you
want accented characters to compare equal to unaccented
characters, and so on.  This GotW provides guidance on
how to implement case-insensitivity for standard
strings in whatever sense applies to your particular
situation.


Here's what we want to achieve:

>    ci_string s( "AbCdE" );
>
>    // case insensitive
>    assert( s == "abcde" );
>    assert( s == "ABCDE" );
>
>    // still case-preserving, of course
>    assert( strcmp( s.c_str(), "AbCdE" ) == 0 );
>    assert( strcmp( s.c_str(), "abcde" ) != 0 );

The key here is to understand what a "string" actually
is in standard C++.  If you look in your trusty string
header, you'll see something like this:

  typedef basic_string<char> string;

So string isn't really a class... it's a typedef of a
template.  In turn, the basic_string<> template is
declared as follows, in all its glory:

  template<class charT,
           class traits = char_traits<charT>,
           class Allocator = allocator<charT> >
      class basic_string;

So "string" really means "basic_string<char,
char_traits<char>, allocator<char> >".  We don't need
to worry about the allocator part, but the key here is
the char_traits part because char_traits defines how
characters interact and compare(!).
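
The claim that "string" is only a name for one instantiation of
basic_string can be checked by the compiler itself.  The following
is a minimal sketch, assuming a C++11 or later compiler; the helper
names is_plain_string() and check_string_typedef() are hypothetical
and exist only for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// Compile-time check: std::string and the fully spelled-out
// instantiation of basic_string are one and the same type.
static_assert(
    std::is_same<std::string,
                 std::basic_string<char,
                                   std::char_traits<char>,
                                   std::allocator<char> > >::value,
    "std::string names the default instantiation of basic_string");

// Hypothetical helper (not from the article): true only when T is
// exactly std::string.
template <class T>
constexpr bool is_plain_string() {
  return std::is_same<T, std::string>::value;
}

// Run-time spelling of the same check, callable from a test or main().
inline void check_string_typedef() {
  assert(is_plain_string<std::string>());
  assert(!is_plain_string<std::wstring>());
}
```

Any basic_string that names a different traits class is therefore a
different type from std::string, which is exactly the hook used in
the rest of this article.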

basic_string supplies useful comparison functions that
let you compare whether a string is equal to another,
less than another, and so on.  These string comparison
functions are built on top of character comparison
functions supplied in the char_traits template.  In
particular, the char_traits template supplies character
comparison functions named eq(), ne(), and lt() for
equality, inequality, and less-than comparisons, and
compare() and find() functions to compare and search
sequences of characters.
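
To see these building blocks in isolation, here is a small sketch
that calls the default char_traits<char> directly.  The typedef
traits_t and the helper traits_compare() are hypothetical names made
up for this example; traits_compare() only imitates, in simplified
form, how a string comparison can be layered on traits::compare(),
and is not a copy of any particular library's code.  Note that the
published standard's char_traits provides eq() and lt() but no ne(),
so the sketch does not call it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

typedef std::char_traits<char> traits_t;  // the default traits

// Hypothetical helper: three-way comparison of two strings built
// only from traits_t::compare() plus a length tie-break.
inline int traits_compare(const std::string& a, const std::string& b) {
  const std::size_t n = a.size() < b.size() ? a.size() : b.size();
  const int r = traits_t::compare(a.data(), b.data(), n);
  if (r != 0) return r;                 // common prefix differs
  if (a.size() < b.size()) return -1;   // a is a proper prefix of b
  if (a.size() > b.size()) return 1;    // b is a proper prefix of a
  return 0;
}

// Small self-check showing that the default traits are case-sensitive.
inline void check_default_traits() {
  assert(traits_t::eq('a', 'a'));
  assert(!traits_t::eq('a', 'A'));
  const char* s = "hello";
  assert(traits_t::find(s, 5, 'l') == s + 2);
  assert(traits_t::find(s, 5, 'L') == 0);
}
```

Because every comparison funnels through the traits class, swapping
the traits is enough to change what "equal" and "less than" mean for
the whole string.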

If we want these to behave differently, all we have to
do is provide a different char_traits template!  Here's
the easiest way:

  struct ci_char_traits : public char_traits<char>
                // just inherit all the other functions
                //  that we don't need to override
  {
    static bool eq( char c1, char c2 ) {
      return tolower(c1) == tolower(c2);
    }

    static bool ne( char c1, char c2 ) {
      return tolower(c1) != tolower(c2);
    }

    static bool lt( char c1, char c2 ) {
      return tolower(c1) < tolower(c2);
    }

    static int compare( const char* s1,
                        const char* s2,
                        size_t n ) {
      return strnicmp( s1, s2, n );
             // if available on your compiler,
             //  otherwise you can roll your own
    }

    static const char*
    find( const char* s, int n, char a ) {
      while( n-- > 0 && tolower(*s) != tolower(a) ) {
          ++s;
      }
      return n >= 0 ? s : 0;
    }
  };

[N.B. A bug in the original code has been fixed for the
GCC documentation; the corrected code was taken from
Herb Sutter's book, Exceptional C++.]

And finally, the key that brings it all together:

  typedef basic_string<char, ci_char_traits> ci_string;

All we've done is create a typedef named "ci_string"
which operates exactly like the standard "string",
except that it uses ci_char_traits instead of
char_traits<char> to get its character comparison
rules.  Since we've handily made the ci_char_traits
rules case-insensitive, we've made ci_string itself
case-insensitive without any further surgery -- that
is, we have a case-insensitive string without having
touched basic_string at all!
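
For compilers that lack strnicmp(), the same idea can be written
with standard facilities only.  What follows is a sketch under a few
assumptions rather than the article's exact code: the helper fold()
and the function ci_equal() are hypothetical names, compare() is
hand-rolled, ne() is left out because the standard traits interface
does not use it, and each character is converted to unsigned char
before tolower() is called, since passing a negative char value to
tolower() is undefined.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

struct ci_char_traits : public std::char_traits<char> {
  // Hypothetical helper: fold one character to lower case.
  static int fold(char c) {
    return std::tolower(static_cast<unsigned char>(c));
  }

  static bool eq(char c1, char c2) { return fold(c1) == fold(c2); }
  static bool lt(char c1, char c2) { return fold(c1) <  fold(c2); }

  // Hand-rolled stand-in for strnicmp().  Unlike strnicmp(), it
  // always examines all n characters, so embedded NULs take part
  // in the comparison.
  static int compare(const char* s1, const char* s2, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
      const int a = fold(s1[i]);
      const int b = fold(s2[i]);
      if (a != b) return a < b ? -1 : 1;
    }
    return 0;
  }

  // Returns a pointer to the first match in [s, s + n), or null.
  static const char* find(const char* s, std::size_t n, char a) {
    for (std::size_t i = 0; i < n; ++i) {
      if (fold(s[i]) == fold(a)) return s + i;
    }
    return 0;
  }
};

typedef std::basic_string<char, ci_char_traits> ci_string;

// Hypothetical convenience wrapper for comparing two C strings.
inline bool ci_equal(const char* a, const char* b) {
  return ci_string(a) == ci_string(b);
}

// The checks from the problem statement.
inline void check_ci_string() {
  ci_string s("AbCdE");
  assert(s == "abcde");                           // case insensitive
  assert(s == "ABCDE");
  assert(std::strcmp(s.c_str(), "AbCdE") == 0);   // case preserving
  assert(std::strcmp(s.c_str(), "abcde") != 0);
}
```

The stored characters are never modified; only the comparison and
search rules change, which is why c_str() still returns the original
spelling.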

This GotW should give you a flavour for how the
basic_string template works and how flexible it is in
practice.  If you want different comparisons than the
ones stricmp() and tolower() give you, just replace the
five functions shown above with your own code that
performs character comparisons the way that's
appropriate in your particular application.
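
As one illustration of "your own code", here is a sketch of a traits
class with a slightly different notion of equivalence: it ignores
case and also treats '-' and '_' as the same character.  The rule
itself and the names loose_char_traits, loose_string, and fold() are
invented for this example; the point is only that a single folding
function can drive all of the comparison functions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct loose_char_traits : public std::char_traits<char> {
  // The one place that defines "equivalent": map '-' to '_' and
  // fold everything else to lower case.
  static int fold(char c) {
    if (c == '-') return '_';
    return std::tolower(static_cast<unsigned char>(c));
  }

  static bool eq(char c1, char c2) { return fold(c1) == fold(c2); }
  static bool lt(char c1, char c2) { return fold(c1) <  fold(c2); }

  static int compare(const char* s1, const char* s2, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
      const int a = fold(s1[i]);
      const int b = fold(s2[i]);
      if (a != b) return a < b ? -1 : 1;
    }
    return 0;
  }

  static const char* find(const char* s, std::size_t n, char a) {
    for (std::size_t i = 0; i < n; ++i) {
      if (fold(s[i]) == fold(a)) return s + i;
    }
    return 0;
  }
};

typedef std::basic_string<char, loose_char_traits> loose_string;

// Small self-check of the invented equivalence rule.
inline void check_loose_string() {
  assert(loose_string("Content-Type") == "content_type");
  assert(loose_string("a-b") != loose_string("a.b"));
}
```

Keeping the rule in one function makes it harder for eq(), lt(),
compare(), and find() to drift out of agreement with one another.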



Exercise for the reader:

Is it safe to inherit ci_char_traits from
char_traits<char> this way?  Why or why not?