NAME

    String::Tagged - string buffers with value tags on extents

SYNOPSIS

        use String::Tagged;

        my $st = String::Tagged->new( "An important message" );

        $st->apply_tag( 3, 9, bold => 1 );

        $st->iter_substr_nooverlap(
           sub {
              my ( $substring, %tags ) = @_;

              print $tags{bold} ? "<b>$substring</b>"
                                : $substring;
           }
        );

DESCRIPTION

    This module implements an object class, instances of which store a
    (mutable) string buffer that supports tags. A tag is a name/value pair
    that applies to some non-empty extent of the underlying string.

    The types of tag names ought to be strings, or at least values that are
    well-behaved as strings, as the names will often be used as the keys in
    hashes or applied to the eq operator.

    The types of tag values are not restricted - any scalar will do. This
    could be a simple integer or string, ARRAY or HASH reference, or even a
    CODE reference containing an event handler of some kind.

    Tags may be arbitrarily overlapped. Any given offset within the string
    has, in effect, a set of uniquely named tags. Tags of different names
    are independent. For tags of the same name, only the latest, shortest
    tag takes effect.

    For example, consider a string with three tags represented here:

        Here is my string with tags
        [-------------------------] foo => 1
                [-------]           foo => 2
             [---]                  bar => 3

    Every character in this string has a tag named foo. The value of this
    tag is 2 for the words my and string and the space in between, and 1
    elsewhere. Additionally, the words is and my and the space between them
    also have the tag bar with a value 3.
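
    As a minimal sketch, the diagram above could be built and queried as
    follows. The offsets and lengths are worked out by hand from the
    example string, and the variable names are illustrative only.

```perl
use strict;
use warnings;
use String::Tagged;

# Rebuild the three tags from the diagram; offsets are 0-based.
my $st = String::Tagged->new( "Here is my string with tags" );
$st->apply_tag( 0, 27, foo => 1 );   # the whole string
$st->apply_tag( 8,  9, foo => 2 );   # "my string"
$st->apply_tag( 5,  5, bar => 3 );   # "is my"

my $foo_in_here   = $st->get_tag_at(  0, "foo" );   # 1
my $foo_in_string = $st->get_tag_at( 11, "foo" );   # 2, the shorter tag wins
my $bar_in_my     = $st->get_tag_at(  8, "bar" );   # 3
my $bar_in_tags   = $st->get_tag_at( 23, "bar" );   # undef, bar does not reach here

printf "foo=%d foo=%d bar=%d\n", $foo_in_here, $foo_in_string, $bar_in_my;
```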

    Since String::Tagged does not understand the significance of the tag
    values, it cannot detect if two neighbouring tags really contain the
    same semantic idea. Consider the following string:

        A string with words
        [-------]           type => "message"
                 [--------] type => "message"

    This string contains two tags. String::Tagged will treat this as two
    different tag values as far as iter_tags_nooverlap is concerned, even
    though get_tag_at yields the same value for the type tag at any
    position in the string. The merge_tags method may be used to merge tag
    extents of tags that should be considered as equal.
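
    The following sketch illustrates that situation. The count_extents
    helper and the string-equality callback are assumptions made for this
    example, not part of the module.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "A string with words" );
$st->apply_tag( 0,  9, type => "message" );   # "A string "
$st->apply_tag( 9, 10, type => "message" );   # "with words"

# Hypothetical helper: count the extents reported by iter_tags_nooverlap.
sub count_extents {
    my ( $str ) = @_;
    my $count = 0;
    $str->iter_tags_nooverlap( sub { $count++ } );
    return $count;
}

my $before = count_extents( $st );

# Consider two values equal when they compare equal as strings.
$st->merge_tags( sub {
    my ( $name, $value_a, $value_b ) = @_;
    return $value_a eq $value_b;
} );

my $after = count_extents( $st );

print "extents before merge: $before, after merge: $after\n";
```

    With the two neighbouring tags merged into one, the iterator should
    report a single extent rather than two.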

NAMING

    I spent a lot of time considering the name for this module. It seems
    that a number of people across a number of languages all created
    similar functionality, though named very differently. For the benefit
    of keyword-based search tools and similar, here's a list of some other
    names this sort of object might be known by:

    * Extents

    * Overlays

    * Attribute or attributed strings

    * Markup

    * Out-of-band data

CONSTRUCTOR

  new

        $st = String::Tagged->new( $str )

    Returns a new instance of a String::Tagged object. It will contain no
    tags. If the optional $str argument is supplied, the string buffer will
    be initialised from this value.

    If $str is a String::Tagged object then it will be cloned, as if
    calling the clone method on it.

  new_tagged

        $st = String::Tagged->new_tagged( $str, %tags )

    Shortcut for creating a new String::Tagged object with the given tags
    applied to the entire length. The tags will not be anchored at either
    end.

  clone (class)

        $new = String::Tagged->clone( $orig, %opts )

    Returns a new instance of String::Tagged made by cloning the original,
    subject to the options provided. The returned instance will be in the
    requested class, which need not match the class of the original.

    The following options are recognised:

    only_tags => ARRAY

        If present, gives an ARRAY reference containing tag names. Only those
        tags named here will be copied; others will be ignored.

    except_tags => ARRAY

        If present, gives an ARRAY reference containing tag names. All tags
        will be copied except those named here.

    convert_tags => HASH

        If present, gives a HASH reference containing tag conversion
        functions. For any tags in the original to be copied whose names
        appear in the hash, the name and value are passed into the
        corresponding function, which should return an even-sized key/value
        list giving a tag, or a list of tags, to apply to the new clone.

            my @new_tags = $convert_tags->{$orig_name}->( $orig_name, $orig_value )
            # Where @new_tags is ( $new_name, $new_value, $new_name_2, $new_value_2, ... )

        As a further convenience, if the value for a given tag name is a
        plain string instead of a code reference, it gives the new name for
        the tag, and will be applied with its existing value.

        If only_tags is being used too, then the source names of any tags to
        be converted must also be listed there, or they will not be copied.
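
    A short sketch of these options working together, assuming
    hypothetical tag names (b, colour, note) chosen only for illustration:

```perl
use strict;
use warnings;
use String::Tagged;

my $orig = String::Tagged->new( "Hello world" );
$orig->apply_tag( 0,  5, b      => 1 );
$orig->apply_tag( 6,  5, colour => "red" );
$orig->apply_tag( 0, 11, note   => "internal" );

my $new = String::Tagged->clone( $orig,
    # "note" is not listed, so it is not copied at all.
    only_tags    => [ "b", "colour" ],
    convert_tags => {
        # Plain string: rename the tag, keeping its value.
        b      => "bold",
        # Code reference: return a new name/value list.
        colour => sub {
            my ( $name, $value ) = @_;
            return ( fg => $value );
        },
    },
);

print join( ", ", sort $new->tagnames ), "\n";
```

    The source names b and colour appear in only_tags as well as in
    convert_tags, as the last paragraph above requires.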

  clone (instance)

        $new = $orig->clone( %args )

    Called as an instance (rather than a class) method, the newly-cloned
    instance is returned in the same class as the original.

  from_sprintf

        $str = String::Tagged->from_sprintf( $format, @args )

    Since version 0.15.

    Returns a new instance of a String::Tagged object, initialised by
    formatting the supplied arguments using the supplied format.

    The $format string is similar to that supported by the core sprintf
    operator, though a few features such as out-of-order argument indexing
    and vector formatting are missing. This format string may be a plain
    perl string, or an instance of String::Tagged. In the latter case, any
    tags within it are preserved in the result.

    In the case of a %s conversion, the value of the argument consumed may
    itself be a String::Tagged instance. In this case it will be appended
    to the returned object, preserving any tags within it.

    All other conversions are handled individually by the core sprintf
    operator and appended to the result.
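
    For instance, a tagged argument might be interpolated like this. This
    is only a sketch; the format text and the bold tag are invented for
    the example.

```perl
use strict;
use warnings;
use String::Tagged;

# A tagged value to be consumed by the %s conversion.
my $name = String::Tagged->new_tagged( "world", bold => 1 );

# %s keeps the tags of $name; %d is formatted by the core sprintf.
my $st = String::Tagged->from_sprintf(
    "Hello, %s! You are visitor %d.", $name, 42
);

print "$st\n";

my $bold_on_name   = $st->get_tag_at( 7, "bold" );   # inside "world"
my $bold_on_prefix = $st->get_tag_at( 0, "bold" );   # inside "Hello"
```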

  join

        $str = String::Tagged->join( $sep, @parts )

    Since version 0.17.

    Returns a new instance of a String::Tagged object, formed by
    concatenating each of the component pieces together, joined with the
    separator string.

    The result will be much like the core join function, except that it
    will preserve tags in the resulting string.
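
    A small sketch of joining tagged pieces; the colour tag and the
    values used are assumptions for illustration.

```perl
use strict;
use warnings;
use String::Tagged;

my @words = (
    String::Tagged->new_tagged( "red",  colour => "red" ),
    String::Tagged->new_tagged( "blue", colour => "blue" ),
);

# The separator is a plain string; each part keeps its own tags.
my $list = String::Tagged->join( ", ", @words );

print "$list\n";

my $first_colour  = $list->get_tag_at( 0, "colour" );   # within "red"
my $second_colour = $list->get_tag_at( 5, "colour" );   # within "blue"
```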

METHODS

  str

        $str = $st->str

        $str = "$st"

    Returns the plain string contained within the object.

    This method is also called for stringification; so the String::Tagged
    object can be used in a plain string interpolation such as

        my $message = String::Tagged->new( "Hello world" );
        print "My message is $message\n";

  length

        $len = $st->length

        $len = length( $st )

    Returns the length of the plain string. Because stringification works
    on this object class, the normal core length function works correctly
    on it.

  substr

        $str = $st->substr( $start, $len )

    Returns a String::Tagged instance representing a section from within
    the given string, containing all the same tags at the same conceptual
    positions.

  plain_substr

        $str = $st->plain_substr( $start, $len )

    Returns, as a plain perl string, the substring at the given position.
    This will be the same string data as returned by substr, only as a
    plain string without the tags.

  apply_tag

        $st->apply_tag( $start, $len, $name, $value )

    Apply the named tag value to the given extent. The tag will start on
    the character at the $start index, and continue for the next $len
    characters.

    If $start is given as -1, the tag will be considered to start "before"
    the actual string. If $len is given as -1, the tag will be considered
    to end "after" the end of the actual string. These special limits are
    used by set_substr when deciding whether to move a tag boundary. The
    start of any tag that starts "before" the string is never moved, even
    if more text is inserted at the beginning. Similarly, a tag which ends
    "after" the end of the string will continue to the end even if more
    text is appended.

    This method returns the $st object.

        $st->apply_tag( $e, $name, $value )

    Alternatively, an existing extent object can be passed as the first
    argument instead of two integers. The new tag will apply at the given
    extent.
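
    The effect of the "after" anchor can be sketched as below. The tag
    names exact and anchored are made up for this example.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello" );
$st->apply_tag( 0,  5, exact    => 1 );   # covers exactly five characters
$st->apply_tag( 0, -1, anchored => 1 );   # ends "after" the string

# Appending text moves only the anchored tag's end boundary.
$st->append( ", world" );

my $exact_extent    = $st->get_tag_extent( 0, "exact" );
my $anchored_extent = $st->get_tag_extent( 0, "anchored" );

printf "exact covers %d characters, anchored covers %d\n",
    $exact_extent->length, $anchored_extent->length;
```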

  unapply_tag

        $st->unapply_tag( $start, $len, $name )

    Unapply the named tag value from the given extent. If the tag extends
    beyond this extent, then any partial fragment of the tag will be left
    in the string.

    This method returns the $st object.

        $st->unapply_tag( $e, $name )

    Alternatively, an existing extent object can be passed as the first
    argument instead of two integers.

  delete_tag

        $st->delete_tag( $start, $len, $name )

    Delete the named tag within the given extent. Entire tags are removed,
    even if they extend beyond this extent.

    This method returns the $st object.

        $st->delete_tag( $e, $name )

    Alternatively, an existing extent object can be passed as the first
    argument instead of two integers.
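
    The difference between unapply_tag and delete_tag can be sketched by
    applying each to a copy of the same string. The em tag and the
    variable names are illustrative assumptions.

```perl
use strict;
use warnings;
use String::Tagged;

my $trimmed = String::Tagged->new( "Hello, world" );
$trimmed->apply_tag( 0, 12, em => 1 );
my $wiped = $trimmed->clone;

# Remove the tag from "Hello" only; the rest of the tag remains.
$trimmed->unapply_tag( 0, 5, "em" );

# Remove the entire tag, because it touches the given extent.
$wiped->delete_tag( 0, 5, "em" );

my $trimmed_head = $trimmed->get_tag_at( 2, "em" );   # undef
my $trimmed_tail = $trimmed->get_tag_at( 8, "em" );   # 1
my $wiped_tail   = $wiped->get_tag_at( 8, "em" );     # undef
```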

  merge_tags

        $st->merge_tags( $eqsub )

    Merge neighbouring or overlapping tags of the same name and equal
    values.

    For each pair of tags of the same name that apply on neighbouring or
    overlapping extents, the $eqsub callback is called, as

        $equal = $eqsub->( $name, $value_a, $value_b )

    If this function returns true then the tags are merged.

    The equality test function is free to perform any comparison of the
    values that may be relevant to the application; for example it may
    deeply compare referred structures and check for equivalence in some
    application-defined manner. When two tags are merged, the first tag of
    the pair is retained and the second is deleted. This may be relevant
    if the tag value is a reference to some object.

  iter_extents

        $st->iter_extents( $callback, %opts )

    Iterate the tags stored in the string. For each tag, the CODE reference
    in $callback is invoked once, being passed an extent object that
    represents the extent of the tag.

        $callback->( $extent, $tagname, $tagvalue )

    Options passed in %opts may include:

    start => INT

        Start at the given position; defaults to 0.

    end => INT

        End after the given position; defaults to end of string. This option
        overrides len.

    len => INT

        End after the given length beyond the start position; defaults to end
        of string. This option only applies if end is not given.

    only => ARRAY

        Select only the tags named in the given ARRAY reference.

    except => ARRAY

        Select all the tags except those named in the given ARRAY reference.

  iter_tags

        $st->iter_tags( $callback, %opts )

    Iterate the tags stored in the string. For each tag, the CODE reference
    in $callback is invoked once, being passed the start point and length
    of the tag.

        $callback->( $start, $length, $tagname, $tagvalue )

    Options passed in %opts are the same as for iter_extents.

  iter_extents_nooverlap

        $st->iter_extents_nooverlap( $callback, %opts )

    Iterate non-overlapping extents of tags stored in the string. The CODE
    reference in $callback is invoked for each extent in the string where
    no tags change. The entire set of tags active in that extent is given
    to the callback. Because the extent covers possibly-multiple tags, it
    will not define the anchor_before and anchor_after flags.

        $callback->( $extent, %tags )

    The callback will be invoked over the entire length of the string,
    including any extents with no tags applied.

    Options may be passed in %opts to control the range of the string
    iterated over, in the same way as the iter_extents method.

    If the only or except filters are applied, then only the tags that
    survive filtering will be present in the %tags hash. Tags that are
    excluded by the filtering will not be present, nor will their bounds be
    used to split the string into extents.

  iter_tags_nooverlap

        $st->iter_tags_nooverlap( $callback, %opts )

    Iterate extents of the string using iter_extents_nooverlap, but passing
    the start and length of each extent to the callback instead of the
    extent object.

        $callback->( $start, $length, %tags )

    Options may be passed in %opts to control the range of the string
    iterated over, in the same way as the iter_extents method.
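
    As a sketch of how the non-overlapping iteration covers the whole
    string, including untagged parts, consider the following. The
    @segments array and its layout are assumptions of this example.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello, world" );
$st->apply_tag( 0, 5, bold => 1 );   # "Hello"

# Record each extent as [ start, length, comma-separated tag names ].
my @segments;
$st->iter_tags_nooverlap( sub {
    my ( $start, $length, %tags ) = @_;
    push @segments, [ $start, $length, join( ",", sort keys %tags ) ];
} );

printf "%2d +%-2d [%s]\n", @$_ for @segments;
```

    One extent is reported for the tagged word and a second, with an
    empty %tags hash, for the remainder of the string.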

  iter_substr_nooverlap

        $st->iter_substr_nooverlap( $callback, %opts )

    Iterate extents of the string using iter_extents_nooverlap, but passing
    the substring of data instead of the extent object.

        $callback->( $substr, %tags )

    Options may be passed in %opts to control the range of the string
    iterated over, in the same way as the iter_extents method.

  tagnames

        @names = $st->tagnames

    Returns the set of tag names used in the string, in no particular
    order.

  get_tags_at

        $tags = $st->get_tags_at( $pos )

    Returns a HASH reference of all the tag values active at the given
    position.

  get_tag_at

        $value = $st->get_tag_at( $pos, $name )

    Returns the value of the named tag at the given position, or undef if
    the tag is not applied there.

  get_tag_extent

        $extent = $st->get_tag_extent( $pos, $name )

    If the named tag applies to the given position, returns the extent of
    the tag at that position. If it does not, undef is returned. If an
    extent is returned it will define the anchor_before and anchor_after
    flags if appropriate.

  get_tag_missing_extent

        $extent = $st->get_tag_missing_extent( $pos, $name )

    If the named tag does not apply at the given position, returns the
    extent of the string around that position that does not have the tag.
    If the tag does apply there, undef is returned. If an extent is
    returned it will not define the anchor_before and anchor_after flags,
    as these do not make sense for the range in which a tag is absent.
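
    The two extent queries complement each other, as this sketch suggests.
    The positions are chosen by hand for the example string.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello, world" );
$st->apply_tag( 7, 5, bold => 1 );   # "world"

# Position 9 is inside the tag; position 2 is outside it.
my $present = $st->get_tag_extent( 9, "bold" );
my $absent  = $st->get_tag_missing_extent( 2, "bold" );
my $nothing = $st->get_tag_extent( 2, "bold" );   # undef

printf "bold applies on %d..%d and is missing on %d..%d\n",
    $present->start, $present->end, $absent->start, $absent->end;
```

    Note that end is the index of the first character beyond the extent,
    so the two ranges meet at index 7 without overlapping.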

  set_substr

        $st->set_substr( $start, $len, $newstr )

    Modifies an extent of the underlying plain string to that given. The
    extents of tags in the string are adjusted to cope with the modified
    region, and the adjustment in length.

    Tags entirely before the replaced extent remain unchanged.

    Tags entirely within the replaced extent are deleted.

    Tags entirely after the replaced extent are moved by the appropriate
    amount to ensure they still apply to the same characters as before.

    Tags that start before and end after the extent remain, and have their
    lengths suitably adjusted.

    Tags that span just the start or end of the extent, but not both, are
    truncated, so as to remove the part of the tag applied on the modified
    extent but preserving that applied outside.

    If $newstr is a String::Tagged object, then its tags will be applied to
    $st as appropriate. Edge-anchored tags in $newstr will not be extended
    through $st, though they will apply as edge-anchored if they now sit at
    the edge of the new string.
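
    The first four rules above can be sketched with one tag per case. The
    tag names (around, before, inside, after) are hypothetical labels for
    the case each one demonstrates.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello, big world" );
$st->apply_tag(  0, 16, around => 1 );   # starts before and ends after
$st->apply_tag(  0,  5, before => 1 );   # "Hello", entirely before
$st->apply_tag(  7,  3, inside => 1 );   # "big", entirely within
$st->apply_tag( 11,  5, after  => 1 );   # "world", entirely after

# Replace the 3 characters of "big" with a 9-character word.
$st->set_substr( 7, 3, "wonderful" );

my $around = $st->get_tag_extent(  0, "around" );
my $before = $st->get_tag_extent(  0, "before" );
my $after  = $st->get_tag_extent( 17, "after" );

my @names = $st->tagnames;
@names = sort @names;

print "$st\n";
print "remaining tags: @names\n";
```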

  insert

        $st->insert( $start, $newstr )

    Insert the given string at the given position. A shortcut around
    set_substr.

    If $newstr is a String::Tagged object, then its tags will be applied to
    $st as appropriate. If $start is 0, any before-anchored tags in $newstr
    will become before-anchored in $st.

  append

        $st->append( $newstr )

        $st .= $newstr

    Append to the underlying plain string. A shortcut around set_substr.

    If $newstr is a String::Tagged object, then its tags will be applied to
    $st as appropriate. Any after-anchored tags in $newstr will become
    after-anchored in $st.

  append_tagged

        $st->append_tagged( $newstr, %tags )

    Append to the underlying plain string, and apply the given tags to the
    newly-inserted extent.

    Returns $st itself so that the method may be easily chained.
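
    A brief sketch of chaining append_tagged to build a string piece by
    piece; the tag names and message text are invented for the example.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new;

# Each call tags only the text it appends, then returns $st.
$st->append_tagged( "Error", bold => 1 )
   ->append_tagged( ": disk full", fg => "red" );

print "$st\n";

my $bold_on_error  = $st->get_tag_at( 0, "bold" );   # 1
my $bold_on_detail = $st->get_tag_at( 5, "bold" );   # undef
my $fg_on_detail   = $st->get_tag_at( 5, "fg" );     # "red"
```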

  concat

        $ret = $st->concat( $other )

        $ret = $st . $other

    Returns a new String::Tagged containing the two strings concatenated
    together, preserving any tags present. This method overloads the normal
    string concatenation operator, so expressions involving String::Tagged
    values retain their tags.

    This method or operator tries to respect subclassing; preferring to
    return a new object of a subclass if either argument or operand is a
    subclass of String::Tagged. If they are both subclasses, it will prefer
    the type of the invocant or first operand.

  matches

        @subs = $st->matches( $regexp )

    Returns a list of substrings (as String::Tagged instances) for every
    non-overlapping match of the given $regexp.

    This could be used, for example, to build a formatted string from a
    formatted template containing variable expansions:

        my $template = ...
        my %vars = ...

        my $ret = String::Tagged->new;
        foreach my $m ( $template->matches( qr/\$\w+|[^$]+/ ) ) {
           if( $m =~ m/^\$(\w+)$/ ) {
              $ret->append_tagged( $vars{$1}, %{ $m->get_tags_at( 0 ) } );
           }
           else {
              $ret->append( $m );
           }
        }

    This iterates segments of the template containing variable expansions
    starting with a $ symbol, and replaces them with values from the %vars
    hash, taking care to preserve all the formatting tags from the original
    template string.

  split

        @parts = $st->split( $regexp, $limit )

    Returns a list of substrings by applying the regexp to the string
    content; similar to the core perl split function. If $limit is
    supplied, the method will stop at that number of elements, returning
    the entire remainder of the input string as the final element. If the
    $regexp contains a capture group then the content of the first one will
    be added to the return list as well.
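
    As a sketch, splitting on a comma keeps each field's tags with the
    field. The sample text and the bold tag are assumptions, and $limit
    is simply omitted here.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "alpha,beta,gamma" );
$st->apply_tag( 6, 4, bold => 1 );   # "beta"

# Each returned part is itself a tagged string.
my @parts = $st->split( qr/,/ );

foreach my $part ( @parts ) {
    my $flag = $part->get_tag_at( 0, "bold" ) ? " (bold)" : "";
    print "$part$flag\n";
}
```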

  sprintf

        $ret = $st->sprintf( @args )

    Since version 0.15.

    Returns a new string by using the given instance as the format string
    for a "from_sprintf" constructor call. The returned instance will be of
    the same class as the invocant.

  debug_sprintf

        $ret = $st->debug_sprintf

    Returns a representation of the string data and all the tags, suitable
    for debug printing or other similar use. This is a format such as is
    given in the DESCRIPTION section above.

    The output will consist of a number of lines, the first containing the
    plain underlying string, then one line per tag. The line shows the
    extent of the tag given by [---] markers, or a | in the special case of
    a tag covering only a single character. Special markings of < and >
    indicate tags which are "before" or "after" anchored.

    For example:

         Hello, world
         [---]         word       => 1
        <[----------]> everywhere => 1
               |       space      => 1

Extent Objects

    These objects represent a range of characters within the containing
    String::Tagged object. The range they represent is fixed at the time of
    creation. If the containing string is modified by a call to set_substr
    then the effect on the extent object is not defined. These objects
    should be considered as relatively short-lived - used briefly for the
    purpose of querying the result of an operation, then discarded soon
    after.

  $extent->string

    Returns the containing String::Tagged object.

  $extent->start

    Returns the start index of the extent. This is the index of the first
    character within the extent.

  $extent->end

    Returns the end index of the extent. This is the index of the first
    character beyond the end of the extent.

  $extent->anchor_before

    True if this extent begins "before" the start of the string. Only
    certain methods return extents with this flag defined.

  $extent->anchor_after

    True if this extent ends "after" the end of the string. Only certain
    methods return extents with this flag defined.

  $extent->length

    Returns the number of characters within the extent.

  $extent->substr

    Returns the substring contained by the extent.

  $extent->plain_substr

    Returns the substring of the underlying plain string buffer contained
    by the extent.

TODO

    * There are likely variations on the rules for set_substr that could
      equally apply to some uses of tagged strings. Consider whether the
      behaviour of modification is chosen per-method, per-tag, or
      per-string.

    * Consider how to implement a clone from one tag format to another
      which wants to merge multiple different source tags together into a
      single new one.

AUTHOR

    Paul Evans <leonerd@leonerd.org.uk>