
NAME

    String::Tagged - string buffers with value tags on extents

SYNOPSIS

     use String::Tagged;

     my $st = String::Tagged->new( "An important message" );

     $st->apply_tag( 3, 9, bold => 1 );

     $st->iter_substr_nooverlap(
        sub {
           my ( $substring, %tags ) = @_;

           print $tags{bold} ? "<b>$substring</b>"
                             : $substring;
        }
     );

DESCRIPTION

    This module implements an object class, instances of which store a
    (mutable) string buffer that supports tags. A tag is a name/value pair
    that applies to some non-empty extent of the underlying string.

    The types of tag names ought to be strings, or at least values that are
    well-behaved as strings, as the names will often be used as the keys in
    hashes or applied to the eq operator.

    The types of tag values are not restricted - any scalar will do. This
    could be a simple integer or string, ARRAY or HASH reference, or even a
    CODE reference containing an event handler of some kind.

    Tags may be arbitrarily overlapped. Any given offset within the string
    has, in effect, a set of uniquely named tags. Tags of different names
    are independent. For tags of the same name, only the latest, shortest
    tag takes effect.

    For example, consider a string with three tags represented here:

     Here is my string with tags
     [-------------------------]  foo => 1
             [-------]            foo => 2
          [---]                   bar => 3

    Every character in this string has a tag named foo. The value of this
    tag is 2 for the words my and string and the space in between, and 1
    elsewhere. Additionally, the words is and my and the space between them
    also have the tag bar with a value 3.
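
    The tag layout in this example can be rebuilt with apply_tag and
    inspected with get_tag_at and get_tags_at. The following is a minimal
    sketch, assuming String::Tagged is installed from CPAN. The offsets are
    read off the diagram: foo => 2 covers the 9 characters of "my string"
    from offset 8, and bar => 3 covers the 5 characters of "is my" from
    offset 5.

```perl
use strict;
use warnings;
use String::Tagged;

# Rebuild the three overlapping tags from the diagram.
my $st = String::Tagged->new( "Here is my string with tags" );
$st->apply_tag( 0, 27, foo => 1 );   # the whole 27-character string
$st->apply_tag( 8,  9, foo => 2 );   # "my string"
$st->apply_tag( 5,  5, bar => 3 );   # "is my"

# Where the two foo tags overlap, the later, shorter one takes effect.
my $foo_at_start = $st->get_tag_at( 0,  "foo" );   # outside "my string"
my $foo_inside   = $st->get_tag_at( 12, "foo" );   # inside "my string"

# get_tags_at returns every tag active at one position (here the "y" of "my").
my $tags_at_9 = $st->get_tags_at( 9 );

print "foo at 0: $foo_at_start, foo at 12: $foo_inside\n";
print join( ", ", map { "$_ => $tags_at_9->{$_}" } sort keys %$tags_at_9 ), "\n";

# debug_sprintf draws a diagram in the same style as the one above.
print $st->debug_sprintf;
```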

    Since String::Tagged does not understand the significance of the tag
    values, it cannot detect whether two neighbouring tags really contain
    the same semantic idea. Consider the following string:

     A string with words
     [-------]            type => "message"
              [--------]  type => "message"

    This string contains two tags. String::Tagged will treat this as two
    different tag values as far as iter_tags_nooverlap is concerned, even
    though get_tag_at yields the same value for the type tag at any
    position in the string. The merge_tags method may be used to merge tag
    extents of tags that should be considered equal.

NAMING

    I spent a lot of time considering the name for this module. It seems
    that a number of people across a number of languages all created
    similar functionality, though named very differently. For the benefit
    of keyword-based search tools and similar, here's a list of some other
    names this sort of object might be known by:

      * Extents

      * Overlays

      * Attribute or attributed strings

      * Markup

      * Out-of-band data

CONSTRUCTOR

 new

       $st = String::Tagged->new( $str )

    Returns a new instance of a String::Tagged object. It will contain no
    tags. If the optional $str argument is supplied, the string buffer will
    be initialised from this value.

    If $str is a String::Tagged object then it will be cloned, as if
    calling the clone method on it.

 new_tagged

       $st = String::Tagged->new_tagged( $str, %tags )

    Shortcut for creating a new String::Tagged object with the given tags
    applied to the entire length. The tags will not be anchored at either
    end.

 clone (class)

       $new = String::Tagged->clone( $orig, %opts )

    Returns a new instance of String::Tagged made by cloning the original,
    subject to the options provided. The returned instance will be in the
    requested class, which need not match the class of the original.

    The following options are recognised:

    only_tags => ARRAY

      If present, gives an ARRAY reference containing tag names. Only those
      tags named here will be copied; others will be ignored.

    except_tags => ARRAY

      If present, gives an ARRAY reference containing tag names. All tags
      will be copied except those named here.

    convert_tags => HASH

      If present, gives a HASH reference containing tag conversion
      functions. For any tags in the original to be copied whose names
      appear in the hash, the name and value are passed into the
      corresponding function, which should return an even-sized key/value
      list giving a tag, or a list of tags, to apply to the new clone.

       my @new_tags = $convert_tags->{$orig_name}->( $orig_name, $orig_value )
       # Where @new_tags is ( $new_name, $new_value, $new_name_2, $new_value_2, ... )

      As a further convenience, if the value for a given tag name is a
      plain string instead of a code reference, it gives the new name for
      the tag, and will be applied with its existing value.

      If only_tags is being used too, then the source names of any tags to
      be converted must also be listed there, or they will not be copied.
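
    To illustrate these options, the following minimal sketch clones one
    string twice. The first clone keeps only one tag name. The second
    renames one tag using a plain string and transforms another with a CODE
    reference. The tag names (bold, italic, colour, b, fg) are arbitrary
    choices for illustration, and String::Tagged is assumed to be installed.

```perl
use strict;
use warnings;
use String::Tagged;

# apply_tag returns the string itself, so calls can be chained.
my $orig = String::Tagged->new( "Hello world" )
   ->apply_tag( 0, 5, bold => 1 )
   ->apply_tag( 6, 5, italic => 1 )
   ->apply_tag( 0, 11, colour => "red" );

# Class-method form: keep only the bold tag.
my $bold_only = String::Tagged->clone( $orig, only_tags => [ "bold" ] );

# Instance-method form: rename bold to b (plain string), and turn colour
# into an upper-cased fg tag (CODE reference). Tags not named in
# convert_tags, such as italic, are copied unchanged.
my $converted = $orig->clone(
   convert_tags => {
      bold   => "b",
      colour => sub {
         my ( $name, $value ) = @_;
         return ( fg => uc $value );
      },
   },
);

my @names = $converted->tagnames;
print join( " ", sort @names ), "\n";
```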

 clone (instance)

       $new = $orig->clone( %args )

    When clone is called as an instance (rather than a class) method, the
    newly-cloned instance is returned in the same class as the original.

 from_sprintf

       $str = String::Tagged->from_sprintf( $format, @args )

    Since version 0.15.

    Returns a new instance of a String::Tagged object, initialised by
    formatting the supplied arguments using the supplied format.

    The $format string is similar to that supported by the core sprintf
    operator, though a few features such as out-of-order argument indexing
    and vector formatting are missing. This format string may be a plain
    perl string, or an instance of String::Tagged. In the latter case, any
    tags within it are preserved in the result.

    In the case of a %s conversion, the value of the argument consumed may
    itself be a String::Tagged instance. In this case it will be appended
    to the returned object, preserving any tags within it.

    All other conversions are handled individually by the core sprintf
    operator and appended to the result.
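
    For example, a tagged argument to %s keeps its tags in the formatted
    result, while other conversions such as %d produce untagged text. The
    following is a short sketch; the bold tag name is only an example.

```perl
use strict;
use warnings;
use String::Tagged;

# A tagged argument: the whole name carries bold => 1.
my $name = String::Tagged->new_tagged( "Alice", bold => 1 );

# %s consumes the String::Tagged instance and preserves its tags;
# %d is formatted by the core sprintf operator.
my $msg = String::Tagged->from_sprintf(
   "Hello, %s! You have %d new messages", $name, 3
);

print $msg->str, "\n";
```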

 join

       $str = String::Tagged->join( $sep, @parts )

    Since version 0.17.

    Returns a new instance of a String::Tagged object, formed by
    concatenating each of the component pieces together, joined with the
    separator string.

    The result will be much like the core join function, except that it
    will preserve tags in the resulting string.
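
    The following minimal sketch joins three separately tagged words. Each
    part keeps its own tag value in the result. Because new_tagged does not
    anchor its tags, the separators are expected to remain untagged. The
    colour tag name is illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

# Each word is tagged with its own name as the value.
my @parts = map { String::Tagged->new_tagged( $_, colour => $_ ) } qw( red green blue );

my $list = String::Tagged->join( ", ", @parts );

print "$list\n";
```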

METHODS

 str

       $str = $st->str

       $str = "$st"

    Returns the plain string contained within the object.

    This method is also called for stringification, so the String::Tagged
    object can be used in a plain string interpolation such as:

     my $message = String::Tagged->new( "Hello world" );
     print "My message is $message\n";

 length

       $len = $st->length

       $len = length( $st )

    Returns the length of the plain string. Because stringification works
    on this object class, the normal core length function works correctly
    on it.

 substr

       $str = $st->substr( $start, $len )

    Returns a String::Tagged instance representing a section from within
    the given string, containing all the same tags at the same conceptual
    positions.

 plain_substr

       $str = $st->plain_substr( $start, $len )

    Returns, as a plain perl string, the substring at the given position.
    This will be the same string data as returned by substr, only as a
    plain string without the tags.

 apply_tag

       $st->apply_tag( $start, $len, $name, $value )

    Apply the named tag value to the given extent. The tag will start on
    the character at the $start index, and continue for the next $len
    characters.

    If $start is given as -1, the tag will be considered to start "before"
    the actual string. If $len is given as -1, the tag will be considered
    to end "after" the end of the actual string. These special limits are
    used by set_substr when deciding whether to move a tag boundary. The
    start of any tag that starts "before" the string is never moved, even
    if more text is inserted at the beginning. Similarly, a tag which ends
    "after" the end of the string will continue to the end even if more
    text is appended.

    This method returns the $st object.

       $st->apply_tag( $e, $name, $value )

    Alternatively, an existing extent object can be passed as the first
    argument instead of two integers. The new tag will apply at the given
    extent.
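
    The effect of the -1 limits appears when the string later grows. In
    the following sketch, one tag is applied with a length of -1 so that it
    ends "after" the string, and another is applied with an ordinary
    length. Only the first tag is expected to follow the appended text. The
    tag names line and word are illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello" );
$st->apply_tag( 0, -1, line => 1 );   # ends "after" the string
$st->apply_tag( 0,  5, word => 1 );   # fixed extent over "Hello"

$st->append( " world" );

# The after-anchored tag now also covers the appended text.
my $line = $st->get_tag_extent( 0, "line" );
printf "line: %d..%d anchor_after=%s\n",
   $line->start, $line->end, $line->anchor_after ? "yes" : "no";

# The ordinary tag still covers only "Hello".
my $word = $st->get_tag_extent( 0, "word" );
printf "word: %d..%d\n", $word->start, $word->end;
```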

 unapply_tag

       $st->unapply_tag( $start, $len, $name )

    Unapply the named tag value from the given extent. If the tag extends
    beyond this extent, then any partial fragment of the tag will be left
    in the string.

    This method returns the $st object.

       $st->unapply_tag( $e, $name )

    Alternatively, an existing extent object can be passed as the first
    argument instead of two integers.

 delete_tag

       $st->delete_tag( $start, $len, $name )

    Delete the named tag within the given extent. Entire tags are removed,
    even if they extend beyond this extent.

    This method returns the $st object.

       $st->delete_tag( $e, $name )

    Alternatively, an existing extent object can be passed as the first
    argument instead of two integers.
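
    The difference between unapply_tag and delete_tag is easiest to see
    when both are given the same one-character range inside a longer tag.
    The following minimal sketch uses an illustrative bold tag that covers
    the whole string.

```perl
use strict;
use warnings;
use String::Tagged;

my $unapplied = String::Tagged->new_tagged( "Hello world", bold => 1 );
my $deleted   = String::Tagged->new_tagged( "Hello world", bold => 1 );

# Remove bold from just the space at offset 5; the fragments on either side remain.
$unapplied->unapply_tag( 5, 1, "bold" );

# Delete any bold tag within offset 5; the whole tag goes, not just one character.
$deleted->delete_tag( 5, 1, "bold" );

print $unapplied->debug_sprintf;
print $deleted->debug_sprintf;
```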

 merge_tags

       $st->merge_tags( $eqsub )

    Merge neighbouring or overlapping tags of the same name and equal
    values.

    For each pair of tags of the same name that apply on neighbouring or
    overlapping extents, the $eqsub callback is called, as

      $equal = $eqsub->( $name, $value_a, $value_b )

    If this function returns true then the tags are merged.

    The equality test function is free to perform any comparison of the
    values that may be relevant to the application; for example it may
    deeply compare referred structures and check for equivalence in some
    application-defined manner. When two tags are merged, the first tag of
    the pair is retained and the second is deleted. This may be relevant if
    the tag value is a reference to some object.
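
    To return to the "A string with words" example in the DESCRIPTION, a
    simple string-equality callback is enough to merge the two adjacent
    type tags into one. The following sketch lists the tags with iter_tags
    before and after merging. The tag_extents helper is defined locally for
    this example and is not part of String::Tagged.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "A string with words" );
$st->apply_tag( 0,  9, type => "message" );   # "A string "
$st->apply_tag( 9, 10, type => "message" );   # "with words"

# Local helper (hypothetical, not a library function): list tags as "start+length".
sub tag_extents {
   my ( $s ) = @_;
   my @out;
   $s->iter_tags( sub {
      my ( $start, $len ) = @_;
      push @out, "$start+$len";
   } );
   return @out;
}

my @before = tag_extents( $st );

# Merge neighbouring tags whose values compare equal as strings.
$st->merge_tags( sub {
   my ( $name, $value_a, $value_b ) = @_;
   return $value_a eq $value_b;
} );

my @after = tag_extents( $st );

print "before: @before\nafter:  @after\n";
```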

 iter_extents

       $st->iter_extents( $callback, %opts )

    Iterate the tags stored in the string. For each tag, the CODE reference
    in $callback is invoked once, being passed an extent object that
    represents the extent of the tag.

     $callback->( $extent, $tagname, $tagvalue )

    Options passed in %opts may include:

    start => INT

      Start at the given position; defaults to 0.

    end => INT

      End after the given position; defaults to end of string. This option
      overrides len.

    len => INT

      End after the given length beyond the start position; defaults to end
      of string. This option only applies if end is not given.

    only => ARRAY

      Select only the tags named in the given ARRAY reference.

    except => ARRAY

      Select all the tags except those named in the given ARRAY reference.

 iter_tags

       $st->iter_tags( $callback, %opts )

    Iterate the tags stored in the string. For each tag, the CODE reference
    in $callback is invoked once, being passed the start point and length
    of the tag.

     $callback->( $start, $length, $tagname, $tagvalue )

    Options passed in %opts are the same as for iter_extents.
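
    The following minimal sketch contrasts the two callback styles. It
    uses iter_tags to report every tag as a start and length, then uses
    iter_extents with the only option to collect the text under one tag
    name. The bold and italic tag names are illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello world" )
   ->apply_tag( 0, 5, bold => 1 )
   ->apply_tag( 6, 5, italic => 1 );

# iter_tags: one call per tag, given its start and length.
my @all;
$st->iter_tags( sub {
   my ( $start, $len, $name, $value ) = @_;
   push @all, "$name $start $len";
} );

# iter_extents filtered to italic tags only; each call receives an extent object.
my @italic;
$st->iter_extents( sub {
   my ( $e, $name, $value ) = @_;
   push @italic, $e->plain_substr;
}, only => [ "italic" ] );

print join( "; ", sort @all ), "\n";
print "italic text: @italic\n";
```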

 iter_extents_nooverlap

       $st->iter_extents_nooverlap( $callback, %opts )

    Iterate non-overlapping extents of tags stored in the string. The CODE
    reference in $callback is invoked for each extent in the string where
    no tags change. The entire set of tags active in that extent is given
    to the callback. Because the extent may cover multiple tags, it will
    not define the anchor_before and anchor_after flags.

     $callback->( $extent, %tags )

    The callback will be invoked over the entire length of the string,
    including any extents with no tags applied.

    Options may be passed in %opts to control the range of the string
    iterated over, in the same way as the iter_extents method.

    If the only or except filters are applied, then only the tags that
    survive filtering will be present in the %tags hash. Tags that are
    excluded by the filtering will not be present, nor will their bounds be
    used to split the string into extents.

 iter_tags_nooverlap

       $st->iter_tags_nooverlap( $callback, %opts )

    Iterate extents of the string using iter_extents_nooverlap, but passing
    the start and length of each extent to the callback instead of the
    extent object.

     $callback->( $start, $length, %tags )

    Options may be passed in %opts to control the range of the string
    iterated over, in the same way as the iter_extents method.
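
    With two partly overlapping tags, the string divides into runs at
    every tag boundary, including a final run with no tags. The following
    minimal sketch records each run as "start+length [tag names]". The tag
    names are illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello world" );
$st->apply_tag( 0, 5, bold => 1 );     # "Hello"
$st->apply_tag( 3, 5, italic => 1 );   # "lo wo"

# Each callback covers a run where the set of active tags does not change,
# including the trailing run with no tags at all.
my @runs;
$st->iter_tags_nooverlap( sub {
   my ( $start, $len, %tags ) = @_;
   push @runs, sprintf "%d+%d [%s]", $start, $len, join( ",", sort keys %tags );
} );

print "$_\n" for @runs;
```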

 iter_substr_nooverlap

       $st->iter_substr_nooverlap( $callback, %opts )

    Iterate extents of the string using iter_extents_nooverlap, but passing
    the substring of data instead of the extent object.

     $callback->( $substr, %tags )

    Options may be passed in %opts to control the range of the string
    iterated over, in the same way as the iter_extents method.

 tagnames

       @names = $st->tagnames

    Returns the set of tag names used in the string, in no particular
    order.

 get_tags_at

       $tags = $st->get_tags_at( $pos )

    Returns a HASH reference of all the tag values active at the given
    position.

 get_tag_at

       $value = $st->get_tag_at( $pos, $name )

    Returns the value of the named tag at the given position, or undef if
    the tag is not applied there.

 get_tag_extent

       $extent = $st->get_tag_extent( $pos, $name )

    If the named tag applies to the given position, returns the extent of
    the tag at that position. If it does not, undef is returned. If an
    extent is returned it will define the anchor_before and anchor_after
    flags if appropriate.

 get_tag_missing_extent

       $extent = $st->get_tag_missing_extent( $pos, $name )

    If the named tag does not apply at the given position, returns the
    extent of the string around that position that does not have the tag.
    If the tag does apply there, undef is returned. If an extent is
    returned it will not define the anchor_before and anchor_after flags,
    as these do not make sense for the range in which a tag is absent.
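
    The two lookups are complementary: for any position and tag name,
    exactly one of them returns an extent. The following minimal sketch
    uses an illustrative place tag over "world".

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello world" );
$st->apply_tag( 6, 5, place => 1 );   # "world"

# Offset 8 is inside the tag; offset 2 is in the untagged run before it.
my $found = $st->get_tag_extent( 8, "place" );
my $gap   = $st->get_tag_missing_extent( 2, "place" );

printf "tag at 8 covers %d..%d (%s)\n", $found->start, $found->end, $found->plain_substr;
printf "no tag at 2 over %d..%d (%s)\n", $gap->start, $gap->end, $gap->plain_substr;
```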

 set_substr

       $st->set_substr( $start, $len, $newstr )

    Replaces the $len characters starting at $start in the underlying plain
    string with $newstr. The extents of tags in the string are adjusted to
    cope with the modified region, and the adjustment in length.

    Tags entirely before the replaced extent remain unchanged.

    Tags entirely within the replaced extent are deleted.

    Tags entirely after the replaced extent are moved by the appropriate
    amount to ensure they still apply to the same characters as before.

    Tags that start before and end after the extent remain, and have their
    lengths suitably adjusted.

    Tags that span just the start or end of the extent, but not both, are
    truncated, so as to remove the part of the tag applied on the modified
    extent while preserving the part applied outside.

    If $newstr is a String::Tagged object, then its tags will be applied to
    $st as appropriate. Edge-anchored tags in $newstr will not be extended
    through $st, though they will apply as edge-anchored if they now sit at
    the edge of the new string.
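
    The following minimal sketch covers two of the cases above. A tag that
    lies entirely after the replaced extent moves with its text as the
    length changes. When the replacement is itself tagged, its tags are
    carried into the result. The place and greeting tag names are
    illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello world" );
$st->apply_tag( 6, 5, place => 1 );   # "world"

# Replace "Hello" (offset 0, length 5) with a longer word; the tag lies
# entirely after the replaced extent, so it moves right by 2.
$st->set_substr( 0, 5, "Goodbye" );
my $moved_start = $st->get_tag_extent( 8, "place" )->start;

# Replace "Goodbye" with a tagged string; its tag is carried across
# and the place tag moves left by 5.
$st->set_substr( 0, 7, String::Tagged->new_tagged( "Hi", greeting => 1 ) );

my $place = $st->get_tag_extent( $st->length - 1, "place" );
printf "%s: place covers %d..%d\n", $st->str, $place->start, $place->end;
```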

 insert

       $st->insert( $start, $newstr )

    Insert the given string at the given position. A shortcut around
    set_substr.

    If $newstr is a String::Tagged object, then its tags will be applied to
    $st as appropriate. If $start is 0, any before-anchored tags in $newstr
    will become before-anchored in $st.

 append

       $st->append( $newstr )

       $st .= $newstr

    Append to the underlying plain string. A shortcut around set_substr.

    If $newstr is a String::Tagged object, then its tags will be applied to
    $st as appropriate. Any after-anchored tags in $newstr will become
    after-anchored in $st.

 append_tagged

       $st->append_tagged( $newstr, %tags )

    Append to the underlying plain string, and apply the given tags to the
    newly-inserted extent.

    Returns $st itself so that the method may be easily chained.

 concat

       $ret = $st->concat( $other )

       $ret = $st . $other

    Returns a new String::Tagged containing the two strings concatenated
    together, preserving any tags present. This method overloads the normal
    string concatenation operator, so expressions involving String::Tagged
    values retain their tags.

    This method or operator tries to respect subclassing, preferring to
    return a new object of a subclass if either argument or operand is a
    subclass of String::Tagged. If they are both subclasses, it will prefer
    the type of the invocant or first operand.
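
    The following minimal sketch builds a status line by chaining
    append_tagged, then uses the overloaded . operator. The operator returns
    a new object and leaves the original unchanged. The colour tag name and
    its values are illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

# append_tagged returns $st, so calls chain.
my $st = String::Tagged->new( "Status: " );
$st->append_tagged( "OK", colour => "green" )
   ->append_tagged( " (3 warnings)", colour => "yellow" );

# The overloaded . operator returns a new object and leaves $st unchanged.
my $line = $st . "\n";

printf "%s", $line;
```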

 matches

       @subs = $st->matches( $regexp )

    Returns a list of substrings (as String::Tagged instances) for every
    non-overlapping match of the given $regexp.

    This could be used, for example, to build a formatted string from a
    formatted template containing variable expansions:

     my $template = ...
     my %vars = ...

     my $ret = String::Tagged->new;
     foreach my $m ( $template->matches( qr/\$\w+|[^\$]+/ ) ) {
        if( $m =~ m/^\$(\w+)$/ ) {
           $ret->append_tagged( $vars{$1}, %{ $m->get_tags_at( 0 ) } );
        }
        else {
           $ret->append( $m );
        }
     }

    This iterates segments of the template containing variable expansions
    starting with a $ symbol, and replaces them with values from the %vars
    hash, taking care to preserve all the formatting tags from the original
    template string.

 split

       @parts = $st->split( $regexp, $limit )

    Returns a list of substrings by applying the regexp to the string
    content; similar to the core perl split function. If $limit is
    supplied, the method will stop at that number of elements, returning
    the entire remainder of the input string as the final element. If the
    $regexp contains a capture group then the content of the first one will
    be added to the return list as well.
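
    The following minimal sketch shows the effect of $limit. The string
    is split once without a limit and once with a limit of 2. The parts are
    stringified for display, so the example does not rely on their exact
    class. The field tag name is illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new_tagged( "alpha,beta,gamma", field => 1 );

my @all   = $st->split( qr/,/ );      # every separator splits
my @first = $st->split( qr/,/, 2 );   # stop at two elements; the rest stays joined

print join( "|", map { "$_" } @all ), "\n";
print join( "|", map { "$_" } @first ), "\n";
```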

 sprintf

       $ret = $st->sprintf( @args )

    Since version 0.15.

    Returns a new string by using the given instance as the format string
    for a "from_sprintf" constructor call. The returned instance will be of
    the same class as the invocant.

 debug_sprintf

       $ret = $st->debug_sprintf

    Returns a representation of the string data and all the tags, suitable
    for debug printing or other similar use. This is a format such as is
    given in the DESCRIPTION section above.

    The output will consist of a number of lines, the first containing the
    plain underlying string, then one line per tag. The line shows the
    extent of the tag given by [---] markers, or a | in the special case of
    a tag covering only a single character. Special markings of < and >
    indicate tags which are "before" or "after" anchored.

    For example:

      Hello, world
      [---]         word       => 1
     <[----------]> everywhere => 1
            |       space      => 1

Extent Objects

    These objects represent a range of characters within the containing
    String::Tagged object. The range they represent is fixed at the time of
    creation. If the containing string is modified by a call to set_substr
    then the effect on the extent object is not defined. These objects
    should be considered as relatively short-lived - used briefly for the
    purpose of querying the result of an operation, then discarded soon
    after.

 $extent->string

    Returns the containing String::Tagged object.

 $extent->start

    Returns the start index of the extent. This is the index of the first
    character within the extent.

 $extent->end

    Returns the end index of the extent. This is the index of the first
    character beyond the end of the extent.

 $extent->anchor_before

    True if this extent begins "before" the start of the string. Only
    certain methods return extents with this flag defined.

 $extent->anchor_after

    True if this extent ends "after" the end of the string. Only certain
    methods return extents with this flag defined.

 $extent->length

    Returns the number of characters within the extent.

 $extent->substr

    Returns the substring contained by the extent.

 $extent->plain_substr

    Returns the substring of the underlying plain string buffer contained
    by the extent.
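
    The following minimal sketch fetches an extent with get_tag_extent and
    reads its fields immediately, in keeping with the short-lived usage
    described above. It applies a before-anchored tag so that the flags
    have something to report. The greeting tag name is illustrative, and
    Scalar::Util's refaddr is used only to check object identity.

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );
use String::Tagged;

my $st = String::Tagged->new( "Hello world" );
$st->apply_tag( -1, 5, greeting => 1 );   # starts "before" the string

# Query the extent and read it right away, before any further set_substr
# on $st, since an extent's behaviour after modification is undefined.
my $e = $st->get_tag_extent( 0, "greeting" );

printf "start=%d end=%d length=%d text=%s\n",
   $e->start, $e->end, $e->length, $e->plain_substr;
printf "anchor_before=%s anchor_after=%s\n",
   $e->anchor_before ? "yes" : "no", $e->anchor_after ? "yes" : "no";

# $e->string is the containing object itself; $e->substr keeps the tags.
my $same = refaddr( $e->string ) == refaddr( $st );
my $sub  = $e->substr;
```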

TODO

      * There are likely variations on the rules for set_substr that could
      equally apply to some uses of tagged strings. Consider whether the
      behaviour of modification is chosen per-method, per-tag, or
      per-string.

      * Consider how to implement a clone from one tag format to another
      which wants to merge multiple different source tags together into a
      single new one.

AUTHOR

    Paul Evans <leonerd@leonerd.org.uk>