From: herbs@cntc.com (Herb Sutter)
Subject: Guru of the Week #29: Solution
Date: 22 Jan 1998 00:00:00 GMT
Message-ID: <6a8q26$9qa@netlab.cs.rpi.edu>
Newsgroups: comp.lang.c++.moderated


 .--------------------------------------------------------------------.
 |  Guru of the Week problems and solutions are posted regularly on   |
 |   news:comp.lang.c++.moderated.  For past problems and solutions   |
 |            see the GotW archive at http://www.cntc.com.            |
 | Is there a topic you'd like to see covered?  mailto:herbs@cntc.com |
 `--------------------------------------------------------------------'
_______________________________________________________

GotW #29:   Strings

Difficulty: 7 / 10
_______________________________________________________


>Write a ci_string class which is identical to the
>standard 'string' class, but is case-insensitive in the
>same way as the C function stricmp():

The "how can I make a case-insensitive string?"
question is so common that it probably deserves its own
FAQ -- hence this issue of GotW.

Note 1:  The stricmp() case-insensitive string
comparison function is not part of the C standard, but
it is a common extension on many C compilers.

Note 2:  What "case insensitive" actually means depends
entirely on your application and language.  For
example, many languages do not have "cases" at all, and
for languages that do you have to decide whether you
want accented characters to compare equal to unaccented
characters, and so on.  This GotW provides guidance on
how to implement case-insensitivity for standard
strings in whatever sense applies to your particular
situation.

Here's what we want to achieve:

>   ci_string s( "AbCdE" );
>
>   // case insensitive
>   assert( s == "abcde" );
>   assert( s == "ABCDE" );
>
>   // still case-preserving, of course
>   assert( strcmp( s.c_str(), "AbCdE" ) == 0 );
>   assert( strcmp( s.c_str(), "abcde" ) != 0 );

The key here is to understand what a "string" actually
is in standard C++.  If you look in your trusty string
header, you'll see something like this:

  typedef basic_string<char> string;

So string isn't really a class... it's a typedef of a
template.  In turn, the basic_string<> template is
declared as follows, in all its glory:

  template<class charT,
           class traits = char_traits<charT>,
           class Allocator = allocator<charT> >
      class basic_string;

So "string" really means "basic_string<char,
char_traits<char>, allocator<char> >".  We don't need
to worry about the allocator part, but the key here is
the char_traits part because char_traits defines how
characters interact and compare(!).

basic_string supplies useful comparison functions that
let you compare whether a string is equal to another,
less than another, and so on.  These string comparison
functions are built on top of character comparison
functions supplied in the char_traits template.  In
particular, the char_traits template supplies character
comparison functions named eq(), ne(), and lt() for
equality, inequality, and less-than comparisons, and
compare() and find() functions to compare and search
sequences of characters.

If we want these to behave differently, all we have to
do is provide a different char_traits template!
Here's the easiest way:

  struct ci_char_traits : public char_traits<char>
                // just inherit all the other functions
                //  that we don't need to override
  {
    static bool eq( char c1, char c2 ) {
      return tolower(c1) == tolower(c2);
    }

    static bool ne( char c1, char c2 ) {
      return tolower(c1) != tolower(c2);
    }

    static bool lt( char c1, char c2 ) {
      return tolower(c1) <  tolower(c2);
    }

    static int compare( const char* s1,
                        const char* s2,
                        size_t n ) {
      return strnicmp( s1, s2, n );
             // if available on your compiler,
             //  otherwise you can roll your own
    }

    static const char*
    find( const char* s, int n, char a ) {
      while( n-- > 0 && tolower(*s) != tolower(a) ) {
          ++s;
      }
      return n >= 0 ? s : 0;
    }
  };

[N.B. A bug in the original code has been fixed for the
GCC documentation; the corrected code was taken from
Herb Sutter's book, Exceptional C++]

And finally, the key that brings it all together:

  typedef basic_string<char, ci_char_traits> ci_string;

All we've done is create a typedef named "ci_string"
which operates exactly like the standard "string",
except that it uses ci_char_traits instead of
char_traits<char> to get its character comparison
rules.  Since we've handily made the ci_char_traits
rules case-insensitive, we've made ci_string itself
case-insensitive without any further surgery -- that
is, we have a case-insensitive string without having
touched basic_string at all!

This GotW should give you a flavour for how the
basic_string template works and how flexible it is in
practice.  If you want different comparisons than the
ones stricmp() and tolower() give you, just replace the
five functions shown above with your own code that
performs character comparisons the way that's
appropriate in your particular application.
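
For readers whose compiler lacks strnicmp(), here is a
minimal sketch of the same idea that rolls its own
compare().  It is an illustration under stated
assumptions, not the code from the article or the book:
ci_fold() and check_ci_string() are hypothetical helper
names introduced here, and the headers and std::
qualifications assume a standard-conforming library.
ne() is left out because std::char_traits in ISO C++
defines only eq() and lt() for single-character
comparison.  The casts to unsigned char are there
because tolower() is only defined for values
representable as unsigned char (or EOF).

```cpp
#include <cassert>   // assert
#include <cctype>    // std::tolower
#include <cstddef>   // std::size_t
#include <cstring>   // std::strcmp
#include <string>    // std::basic_string, std::char_traits

// Hypothetical helper (not in the article): fold one character to
// lower case, going through unsigned char so that negative char
// values are never passed to std::tolower.
inline int ci_fold(char c) {
    return std::tolower(static_cast<unsigned char>(c));
}

// Inherit everything we do not override from char_traits<char>.
struct ci_char_traits : public std::char_traits<char> {
    static bool eq(char c1, char c2) { return ci_fold(c1) == ci_fold(c2); }
    static bool lt(char c1, char c2) { return ci_fold(c1) <  ci_fold(c2); }

    // Hand-rolled stand-in for strnicmp(): compares exactly n
    // characters and returns <0, 0 or >0.
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }

    // Returns a pointer to the first of the n characters at s that
    // matches a case-insensitively, or a null pointer if none does.
    static const char* find(const char* s, std::size_t n, char a) {
        for (std::size_t i = 0; i < n; ++i) {
            if (eq(s[i], a)) return s + i;
        }
        return nullptr;
    }
};

typedef std::basic_string<char, ci_char_traits> ci_string;

// Hypothetical demo function: replays the asserts from the article.
inline void check_ci_string() {
    ci_string s("AbCdE");

    // case insensitive
    assert(s == "abcde");
    assert(s == "ABCDE");

    // still case-preserving
    assert(std::strcmp(s.c_str(), "AbCdE") == 0);
    assert(std::strcmp(s.c_str(), "abcde") != 0);
}
```

Calling check_ci_string() replays the asserts from the
top of this article.  Only the comparison rules change;
the stored characters keep their original case.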

Exercise for the reader:

Is it safe to inherit ci_char_traits from
char_traits<char> this way?  Why or why not?