1
2=encoding utf8
3
4=for comment
5Consistent formatting of this file is achieved with:
6  perl ./Porting/podtidy pod/perlhacktips.pod
7
8=head1 NAME
9
10perlhacktips - Tips for Perl core C code hacking
11
12=head1 DESCRIPTION
13
14This document will help you learn the best way to go about hacking on
15the Perl core C code.  It covers common problems, debugging, profiling,
16and more.
17
18If you haven't read L<perlhack> and L<perlhacktut> yet, you might want
19to do that first.
20
21=head1 COMMON PROBLEMS
22
23Perl source now permits some specific C99 features which we know are
24supported by all platforms, but mostly plays by ANSI C89 rules.  You
25don't care about some particular platform having broken Perl?  I hear
26there is still a strong demand for J2EE programmers.
27
28=head2 Perl environment problems
29
30=over 4
31
32=item *
33
34Not compiling with threading
35
Compiling with threading (-Duseithreads) completely rewrites the
function prototypes of Perl.  You had better try your changes with that.
38Related to this is the difference between "Perl_-less" and "Perl_-ly"
39APIs, for example:
40
41  Perl_sv_setiv(aTHX_ ...);
42  sv_setiv(...);
43
The first one explicitly passes in the context, which is needed for
e.g. threaded builds.  The second one does that implicitly; do not get
them mixed.  If you are not passing in an aTHX_, you will need to do a
dTHX as the first thing in the function.
48
49See L<perlguts/"How multiple interpreters and concurrency are
50supported"> for further discussion about context.
51
52=item *
53
54Not compiling with -DDEBUGGING
55
56The DEBUGGING define exposes more code to the compiler, therefore more
57ways for things to go wrong.  You should try it.
58
59=item *
60
61Introducing (non-read-only) globals
62
63Do not introduce any modifiable globals, truly global or file static.
64They are bad form and complicate multithreading and other forms of
65concurrency.  The right way is to introduce them as new interpreter
66variables, see F<intrpvar.h> (at the very end for binary
67compatibility).
68
69Introducing read-only (const) globals is okay, as long as you verify
70with e.g. C<nm libperl.a|egrep -v ' [TURtr] '> (if your C<nm> has
71BSD-style output) that the data you added really is read-only.  (If it
72is, it shouldn't show up in the output of that command.)
73
74If you want to have static strings, make them constant:
75
76  static const char etc[] = "...";
77
78If you want to have arrays of constant strings, note carefully the
79right combination of C<const>s:
80
81    static const char * const yippee[] =
82        {"hi", "ho", "silver"};
83
84=item *
85
86Not exporting your new function
87
88Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any
89function that is part of the public API (the shared Perl library) to be
90explicitly marked as exported.  See the discussion about F<embed.pl> in
91L<perlguts>.
92
93=item *
94
95Exporting your new function
96
97The new shiny result of either genuine new functionality or your
98arduous refactoring is now ready and correctly exported.  So what could
99possibly go wrong?
100
101Maybe simply that your function did not need to be exported in the
102first place.  Perl has a long and not so glorious history of exporting
103functions that it should not have.
104
105If the function is used only inside one source code file, make it
106static.  See the discussion about F<embed.pl> in L<perlguts>.
107
108If the function is used across several files, but intended only for
109Perl's internal use (and this should be the common case), do not export
110it to the public API.  See the discussion about F<embed.pl> in
111L<perlguts>.
112
113=back
114
115=head2 C99
116
Starting from 5.35.5 we permit some C99 features in the core C
source.  However, code in dual-life extensions still needs to be C89
only, because it needs to compile against earlier versions of Perl
running on older platforms.  Also note that our headers also need to be
valid as C++, because XS extensions written in C++ need to include
them, hence I<member structure initialisers> can't be used in headers.
123
124C99 support is still far from complete on all platforms we currently
125support. As a baseline we can only assume C89 semantics with the
126specific C99 features described below, which we've verified work
127everywhere.  It's fine to probe for additional C99 features and use
128them where available, providing there is also a fallback for compilers
129that don't support the feature.  For example, we use C11 thread local
130storage when available, but fall back to POSIX thread specific APIs
131otherwise, and we use C<char> for booleans if C<< <stdbool.h> >> isn't
132available.
133
Code can use (and rely on) the following C99 features being present:
135
136=over
137
138=item *
139
140mixed declarations and code
141
142=item *
143
14464 bit integer types
145
146For consistency with the existing source code, use the typedefs C<I64>
147and C<U64>, instead of using C<long long> and C<unsigned long long>
148directly.
149
150=item *
151
152variadic macros
153
    void greet(const char *file, unsigned int line,
               const char *format, ...);
    #define logged_greet(...) greet(__FILE__, __LINE__, __VA_ARGS__)
156
157Note that C<__VA_OPT__> is standardized as of C23 and C++20.  Before
158that it was a gcc extension.
159
160=item *
161
162declarations in for loops
163
164    for (const char *p = message; *p; ++p) {
165        putchar(*p);
166    }
167
168=item *
169
170member structure initialisers
171
172But not in headers, as support was only added to C++ relatively
173recently.
174
175Hence this is fine in C and XS code, but not headers:
176
177    struct message {
178        char *action;
179        char *target;
180    };
181
182    struct message mcguffin = {
183        .target = "member structure initialisers",
184        .action = "Built"
    };
186
187You cannot use the similar syntax for compound literals, since we also
188build perl using C++ compilers:
189
190    /* this is fine */
191    struct message m = {
192        .target = "some target",
193        .action = "some action"
194    };
195    /* this is not valid in C++ */
196    m = (struct message){
197        .target = "some target",
198        .action = "some action"
199    };
200
201While structure designators are usable, the related array designators
202are not, since they aren't supported by C++ at all.
203
204=item *
205
206flexible array members
207
208This is standards conformant:
209
210    struct greeting {
211        unsigned int len;
212        char message[];
213    };
214
215However, the source code already uses the "unwarranted chumminess with
216the compiler" hack in many places:
217
218    struct greeting {
219        unsigned int len;
220        char message[1];
221    };
222
223Strictly it B<is> undefined behaviour accessing beyond C<message[0]>,
224but this has been a commonly used hack since K&R times, and using it
225hasn't been a practical issue anywhere (in the perl source or any other
226common C code). Hence it's unclear what we would gain from actively
227changing to the C99 approach.
228
229=item *
230
231C<//> comments
232
233All compilers we tested support their use. Not all humans we tested
234support their use.
235
236=back
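
As a rough illustration of how several of the permitted features fit
together, here is a small standalone sketch in plain C99.  It
deliberately avoids the perl headers; C<greeting_new> and
C<count_vowels> are hypothetical names invented for this example, and
real core code would use perl's own allocation macros rather than
C<malloc>.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Flexible array member: storage for message[] is allocated by the
 * caller, directly after the fixed part of the struct. */
struct greeting {
    unsigned int len;
    char message[];
};

/* Hypothetical helper; the caller releases the result with free(). */
static struct greeting *greeting_new(const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text);          /* declaration after a statement */
    struct greeting *g = malloc(sizeof(*g) + len + 1);
    if (g == NULL)
        return NULL;
    g->len = (unsigned int)len;
    memcpy(g->message, text, len + 1);  /* copies the trailing NUL too */
    return g;
}

/* Hypothetical helper: counts lowercase vowels. */
static int count_vowels(const char *s)
{
    int n = 0;
    for (const char *p = s; *p; ++p) {  /* declaration in the for loop */
        if (strchr("aeiou", *p) != NULL)
            n++;
    }
    return n;
}
```

Note that the allocation size is computed from C<sizeof(*g)> plus the
payload, which is what makes the flexible array member usable.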
237
Code explicitly should not use any other C99 features.  For example:
239
240=over 4
241
242=item *
243
244variable length arrays
245
246Not supported by B<any> MSVC, and this is not going to change.
247
248Even "variable" length arrays where the variable is a constant
249expression are syntax errors under MSVC.
250
251=item *
252
253C99 types in C<< <stdint.h> >>
254
Use C<PERL_INT_FAST8_T> etc. as defined in F<handy.h>.
256
257=item *
258
259C99 format strings in C<< <inttypes.h> >>
260
261C<snprintf> in the VMS libc only added support for C<PRIdN> etc very
262recently, meaning that there are live supported installations without
263this, or formats such as C<%zu>.
264
(perl's C<sv_catpvf> etc. use parser code in F<sv.c>, which
supports the C<z> modifier, along with perl-specific formats such as
C<SVf>.)
268
269=back
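
Since variable length arrays are off limits, a buffer whose size is
only known at run time has to come from the heap.  The following
standalone sketch shows the general shape in plain C; C<sum_of_squares>
is a hypothetical function, and C<malloc>/C<free> stand in here for
whatever allocator the surrounding code actually uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1*1 + 2*2 + ... + n*n, or -1 if the
 * scratch buffer cannot be allocated. */
static long sum_of_squares(size_t n)
{
    if (n == 0)
        return 0;

    /* Guard against overflow in the size computation below. */
    assert(n <= (size_t)-1 / sizeof(long));

    /* "long squares[n];" would be a variable length array: avoid it. */
    long *squares = malloc(n * sizeof(*squares));
    if (squares == NULL)
        return -1;

    for (size_t i = 0; i < n; ++i)
        squares[i] = (long)(i + 1) * (long)(i + 1);

    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += squares[i];

    free(squares);
    return total;
}
```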
270
271If you want to use a C99 feature not listed above then you need to do
272one of
273
274=over 4
275
276=item *
277
278Probe for it in F<Configure>, set a variable in F<config.sh>, and add
279fallback logic in the headers for platforms which don't have it.
280
281=item *
282
283Write test code and verify that it works on platforms we need to
284support, before relying on it unconditionally.
285
286=back
287
288Likely you want to repeat the same plan as we used to get the current
289C99 feature set. See the message at
290L<https://markmail.org/thread/odr4fjrn72u2fkpz> for the C99 probes we
291used before. Note that the two most "fussy" compilers appear to be MSVC
292and the vendor compiler on VMS. To date all the *nix compilers have
293been far more flexible in what they support.
294
295On *nix platforms, F<Configure> attempts to set compiler flags
296appropriately. All vendor compilers that we tested defaulted to C99 (or
297C11) support. However, older versions of gcc default to C89, or permit
298I<most> C99 (with warnings), but forbid I<declarations in for loops>
299unless C<-std=gnu99> is added. The alternative C<-std=c99> B<might>
seem better, but using it on some platforms can prevent
C<< <unistd.h> >> from declaring some prototypes, which breaks the build.
302gcc's C<-ansi> flag implies C<-std=c89> so we can no longer set that,
303hence the Configure option C<-gccansipedantic> now only adds
304C<-pedantic>.
305
306The Perl core source code files (the ones at the top level of the
307source code distribution) are automatically compiled with as many as
308possible of the C<-std=gnu99>, C<-pedantic>, and a selection of C<-W>
flags (see F<cflags.SH>).  Files in F<ext/>, F<dist/>, F<cpan/>, etc. are
310compiled with the same flags as the installed perl would use to compile
311XS extensions.
312
313Basically, it's safe to assume that F<Configure> and F<cflags.SH> have
314picked the best combination of flags for the version of gcc on the
315platform, and attempting to add more flags related to enforcing a C
316dialect will cause problems either locally, or on other systems that
317the code is shipped to.
318
319We believe that the C99 support in gcc 3.1 is good enough for us, but
320we don't have a 19 year old gcc handy to check this :-) If you have
321ancient vendor compilers that don't default to C99, the flags you might
322want to try are
323
324=over 4
325
326=item AIX
327
328C<-qlanglvl=stdc99>
329
=item HP-UX
331
332C<-AC99>
333
334=item Solaris
335
336C<-xc99>
337
338=back
339
340=head2 Symbol Names and Namespace Pollution
341
342=head3 Choosing legal symbol names
343
344C reserves for its implementation any symbol whose name begins with an
345underscore followed immediately by either an uppercase letter C<[A-Z]>
346or another underscore.  C++ further reserves any symbol containing two
347consecutive underscores, and further reserves in the global name space
348any symbol beginning with an underscore, not just ones followed by a
349capital.  We care about C++ because header files (F<*.h>) need to be
350compilable by it, and some people do all their development using a C++
351compiler.
352
353The consequences of failing to do this are probably none.  Unless you
354stumble on a name that the implementation uses, things will work.
355Indeed, the perl core has more than a few instances of using
356implementation-reserved symbols.  (These are gradually being changed.)
357But your code might stop working any time that the implementation
358decides to use a name you already had chosen, potentially many years
359before.
360
361It's best then to:
362
363=over
364
365=item B<Don't begin a symbol name with an underscore>; (I<e.g.>, don't
366use: C<_FOOBAR>)
367
368=item B<Don't use two consecutive underscores in a symbol name>;
369(I<e.g.>, don't use C<FOO__BAR>)
370
371=back
372
373POSIX also reserves many symbols.  See Section 2.2.2 in
374L<https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html>.
375Perl also has conflicts with that.
376
Perl reserves for its use any symbol beginning with C<Perl>, C<perl>,
or C<PL_>.  Any time you introduce a macro into a header file that
doesn't follow that convention, you are creating the possibility of a
namespace clash with an existing XS module, unless you restrict it by,
say,
382
383 #ifdef PERL_CORE
384 #  define my_symbol
385 #endif
386
There are many symbols in header files that aren't of this form, and
which are accessible from XS namespace, intentionally or not: just
about anything in F<config.h>, for example.
390
391Having to use one of these prefixes detracts from the readability of
392the code, and hasn't been an actual issue for non-trivial names. Things
393like perl defining its own C<MAX> macro have been problematic, but they
394were quickly discovered, and a S<C<#ifdef PERL_CORE>> guard added.
395
396So there's no rule imposed about using such symbols, just be aware of
397the issues.
398
399=head3 Choosing good symbol names
400
Ideally, a symbol name should correctly and precisely describe its
intended purpose.  But there is a tension between that and getting
names that are overly long and hence awkward to type and read.
Metaphors could be helpful (a poetic name), but those tend to be
culturally specific, and may not translate for someone whose native
language isn't English, or who comes from a different cultural
background.  Besides, the talent of writing poetry seems to be rare in
programmers.
409
410Certain symbol names don't reflect their purpose, but are nonetheless
411fine to use because of long-standing conventions.  These often
412originated in the field of Mathematics, where C<i> and C<j> are
413frequently used as subscripts, and C<n> as a population count.  Since
414at least the 1950's, computer programs have used C<i>, I<etc.> as loop
415variables.
416
417Our guidance is to choose a name that reasonably describes the purpose,
418and to comment its declaration more precisely.
419
One certainly shouldn't use misleading or ambiguous names.  C<last_foo>
421could mean either the final C<foo> or the previous C<foo>, and so could
422be confusing to the reader, or even to the writer coming back to the
423code after a few months of working on something else. Sometimes the
424programmer has a particular line of thought in mind, and it doesn't
425occur to them that ambiguity is present.
426
427There are probably still many off-by-1 bugs around because the name
428L<perlapi/C<av_len>> doesn't correspond to what other I<-len>
429constructs mean, such as L<perlapi/C<sv_len>>.  Awkward (and
430controversial) synonyms were created to use instead that conveyed its
431true meaning (L<perlapi/C<av_top_index>>).  Eventually, though, someone
432had the better idea to create a new name to signify what most people
433think C<-len> signifies.  So L<perlapi/C<av_count>> was born.  And we
434wish it had been thought up much earlier.
435
436=head2 Writing safer macros
437
438Macros are used extensively in the Perl core for such things as hiding
439internal details from the caller, so that it doesn't have to be
440concerned about them.  For example, most lines of code don't need to
441know if they are running on a threaded versus unthreaded perl.  That
442detail is automatically mostly hidden.
443
It is often better to use an inline function instead of a macro.
Inline functions are immune to name collisions with the caller, and
don't magnify problems when called with parameters that are expressions
with side effects.  There was a time when one might choose a macro over
an inline function because compiler support for inline functions was
quite limited.  Some compilers would actually inline only the first two
or three encountered in a compilation.  But those days are long gone,
and inline functions are fully supported in modern compilers.
452
Nevertheless, there are situations where a function won't do, and a
macro is required.  One example is when a parameter can be any of
several types.  A function has to be declared with a single explicit
type for each parameter.
456
457Or maybe the code involved is so trivial that a function would be just
458complicating overkill, such as when the macro simply creates a mnemonic
459name for some constant value.
460
461If you do choose to use a non-trivial macro, be aware that there are
462several avoidable pitfalls that can occur.  Keep in mind that a macro
463is expanded within the lexical context of each place in the source it
464is called.  If you have a token C<foo> in the macro and the source
465happens also to have C<foo>, the meaning of the macro's C<foo> will
466become that of the caller's.  Sometimes that is exactly the behavior
467you want, but be aware that this tends to be confusing later on.  It
468effectively turns C<foo> into a reserved word for any code that calls
469the macro, and this fact is usually not documented nor considered.  It
470is safer to pass C<foo> as a parameter, so that C<foo> remains freely
471available to the caller and the macro interface is explicitly
472specified.
473
474Worse is when the equivalence between the two C<foo>'s is coincidental.
475Suppose for example, that the macro declares a variable
476
477 int foo
478
479That works fine as long as the caller doesn't define the string C<foo>
480in some way.  And it might not be until years later that someone comes
481along with an instance where C<foo> is used.  For example a future
482caller could do this:
483
484 #define foo  bar
485
486Then that declaration of C<foo> in the macro suddenly becomes
487
488 int bar
489
490That could mean that something completely different happens than
491intended.  It is hard to debug; the macro and call may not even be in
492the same file, so it would require some digging and gnashing of teeth
493to figure out.
494
495Therefore, if a macro does use variables, their names should be such
496that it is very unlikely that they would collide with any caller, now
497or forever.  One way to do that, now being used in the perl source, is
498to include the name of the macro itself as part of the name of each
variable in the macro.  Suppose the macro is named C<SvPV>.  Then we
500could have
501
502 int foo_svpv_ = 0;
503
504This is harder to read than plain C<foo>, but it is pretty much
505guaranteed that a caller will never naively use C<foo_svpv_> (and run
506into problems).  (The lowercasing makes it clearer that this is a
507variable, but assumes that there won't be two elements whose names
508differ only in the case of their letters.)  The trailing underscore
509makes it even more unlikely to clash, as those, by convention, signify
510a private variable name.  (See L</Choosing legal symbol names> for
511restrictions on what names you can use.)
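
A minimal sketch of that naming convention, using a hypothetical
C<SWAP_INT> macro that is not part of the perl API.  If the temporary
were called plain C<tmp>, a caller swapping its own variable named
C<tmp> would silently get the wrong result; embedding the macro name
and a trailing underscore makes such a clash very unlikely.

```c
#include <assert.h>

/* Hypothetical macro.  The temporary carries the macro's name and a
 * trailing underscore, so it will not capture a caller's variable. */
#define SWAP_INT(a, b)                      \
    do {                                    \
        int tmp_swap_int_ = (a);            \
        (a) = (b);                          \
        (b) = tmp_swap_int_;                \
    } while (0)

/* Hypothetical caller that happens to own a variable named "tmp". */
static int swap_demo(void)
{
    int tmp = 1;
    int other = 2;
    SWAP_INT(tmp, other);
    assert(tmp == 2);
    return tmp * 10 + other;    /* 21 when the swap worked */
}
```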
512
513This kind of name collision doesn't happen with the macro's formal
514parameters, so they don't need to have complicated names.  But there
are pitfalls when a parameter is an expression, or has some Perl
516magic attached.  When calling a function, C will evaluate the parameter
517once, and pass the result to the function.  But when calling a macro,
518the parameter is copied as-is by the C preprocessor to each instance
519inside the macro.  This means that when evaluating a parameter having
520side effects, the function and macro results differ.  This is
521particularly fraught when a parameter has overload magic, say it is a
522tied variable that reads the next line in a file upon each evaluation.
523Having it read multiple lines per call is probably not what the caller
524intended.  If a macro refers to a potentially overloadable parameter
525more than once, it should first make a copy and then use that copy the
526rest of the time. There are macros in the perl core that violate this,
527but are gradually being converted, usually by changing to use inline
528functions instead.
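
The difference is easy to see in a standalone sketch.  Everything here
is hypothetical: C<struct line_source> merely stands in for something
with a side effect on each read, such as the tied variable described
above.

```c
#include <assert.h>

/* Hypothetical stand-in for a value that changes every time it is read. */
struct line_source {
    int lines_read;
};

static int read_line(struct line_source *src)
{
    return ++src->lines_read;
}

/* The macro pastes its argument text in twice, so it is evaluated twice. */
#define DOUBLE_MACRO(x) ((x) + (x))

/* The function receives a copy of the value, evaluated exactly once. */
static inline int double_inline(int x)
{
    return x + x;
}

static void double_demo(void)
{
    struct line_source m = { 0 };
    struct line_source f = { 0 };

    int from_macro = DOUBLE_MACRO(read_line(&m));  /* reads two lines */
    int from_func  = double_inline(read_line(&f)); /* reads one line  */

    assert(m.lines_read == 2 && from_macro == 3);
    assert(f.lines_read == 1 && from_func == 2);
}
```

The macro version consumes two "lines" and returns their sum, which is
rarely what a caller who wrote a single call expects.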
529
530Above we said "first make a copy".  In a macro, that is easier said
531than done, because macros are normally expressions, and declarations
532aren't allowed in expressions.  But the S<C<STMT_START> .. C<STMT_END>>
533construct, described in L<perlapi|perlapi/STMT_START>, allows you to
534have declarations in most contexts, as long as you don't need a return
535value.  If you do need a value returned, you can make the interface
536such that a pointer is passed to the construct, which then stores its
537result there.  (Or you can use GCC brace groups.  But these require a
538fallback if the code will ever get executed on a platform that lacks
539this non-standard extension to C.  And that fallback would be another
540code path, which can get out-of-sync with the brace group one, so doing
541this isn't advisable.)  In situations where there's no other way, Perl
542does furnish L<perlintern/C<PL_Sv>> and L<perlapi/C<PL_na>> to use
543(with a slight performance penalty) for some such common cases.  But
544beware that a call chain involving multiple macros using them will zap
545the other's use.  These have been very difficult to debug.
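
To make the pointer-passing approach concrete, here is a standalone
sketch.  C<SKETCH_STMT_START> and C<SKETCH_STMT_END> are local
stand-ins defined as the usual C<do> ... C<while (0)> idiom, on the
assumption that perl's real C<STMT_START> and C<STMT_END> behave
equivalently; C<STORE_MAX> is a hypothetical macro.

```c
#include <assert.h>

/* Stand-ins for perl's STMT_START / STMT_END (assumed equivalent). */
#define SKETCH_STMT_START do
#define SKETCH_STMT_END   while (0)

/* Hypothetical macro: stores the larger of a and b through result_ptr.
 * Each argument is copied once, so side effects happen only once. */
#define STORE_MAX(a, b, result_ptr)                             \
    SKETCH_STMT_START {                                         \
        int a_store_max_ = (a);                                 \
        int b_store_max_ = (b);                                 \
        *(result_ptr) = a_store_max_ > b_store_max_             \
                            ? a_store_max_                      \
                            : b_store_max_;                     \
    } SKETCH_STMT_END

static int store_max_demo(int x, int y)
{
    int result = 0;
    int calls = 0;

    /* Behaves as one statement, so it is safe in an unbraced if/else. */
    if (x != y)
        STORE_MAX((++calls, x), y, &result);
    else
        result = x;

    assert(calls <= 1);
    return result;
}
```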
546
547For a concrete example of these pitfalls in action, see
548L<https://perlmonks.org/?node_id=11144355>.
549
550=head2 Portability problems
551
552The following are common causes of compilation and/or execution
failures, not specific to Perl as such.  The C FAQ is good bedtime
554reading.  Please test your changes with as many C compilers and
555platforms as possible; we will, anyway, and it's nice to save oneself
556from public embarrassment.
557
558Also study L<perlport> carefully to avoid any bad assumptions about the
559operating system, filesystems, character set, and so forth.
560
561Do not assume an operating system indicates a certain compiler.
562
563=over 4
564
565=item *
566
567Casting pointers to integers or casting integers to pointers
568
569    void castaway(U8* p)
570    {
571      IV i = p;
572
573or
574
575    void castaway(U8* p)
576    {
577      IV i = (IV)p;
578
579Both are bad, and broken, and unportable.  Use the PTR2IV() macro that
580does it right.  (Likewise, there are PTR2UV(), PTR2NV(), INT2PTR(), and
581NUM2PTR().)
582
583=item *
584
585Casting between function pointers and data pointers
586
Technically speaking, casting between function pointers and data
pointers is unportable and undefined.  Practically speaking it seems
to work, but you should use the FPTR2DPTR() and DPTR2FPTR() macros.
Sometimes you can also play games with unions.
591
592=item *
593
594Assuming C<sizeof(int) == sizeof(long)>
595
596There are platforms where longs are 64 bits, and platforms where ints
597are 64 bits, and while we are out to shock you, even platforms where
598shorts are 64 bits.  This is all legal according to the C standard. (In
599other words, C<long long> is not a portable way to specify 64 bits, and
600C<long long> is not even guaranteed to be any wider than C<long>.)
601
Instead, use the definitions C<IV>, C<UV>, C<IVSIZE>, C<I32SIZE>, and
so forth.  Avoid things like C<I32> because they are B<not> guaranteed
to be I<exactly> 32 bits (they are I<at least> 32 bits), nor are they
guaranteed to be C<int> or C<long>.  If you explicitly need 64-bit
variables, use C<I64> and C<U64>.
607
608=item *
609
610Assuming one can dereference any type of pointer for any type of data
611
612  char *p = ...;
613  long pony = *(long *)p;    /* BAD */
614
Many platforms, quite rightly so, will give you a core dump instead of
a pony if C<p> happens not to be correctly aligned.
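
One portable way around this, sketched below in standalone C, is to
copy the bytes into a correctly typed variable with C<memcpy> rather
than dereferencing a cast pointer.  C<read_long_unaligned> is a
hypothetical helper, not a perl API function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reads a long stored at any byte address. */
static long read_long_unaligned(const char *p)
{
    long value;
    assert(p != NULL);
    memcpy(&value, p, sizeof(value));   /* no alignment requirement */
    return value;
}
```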
617
618=item *
619
620Lvalue casts
621
622  (int)*p = ...;    /* BAD */
623
624Simply not portable.  Get your lvalue to be of the right type, or maybe
625use temporary variables, or dirty tricks with unions.
626
627=item *
628
Assuming B<anything> about structs (especially the ones you don't
control, like the ones coming from the system headers)
631
632=over 8
633
634=item *
635
636That a certain field exists in a struct
637
638=item *
639
640That no other fields exist besides the ones you know of
641
642=item *
643
644That a field is of certain signedness, sizeof, or type
645
646=item *
647
648That the fields are in a certain order
649
650=over 8
651
652=item *
653
654While C guarantees the ordering specified in the struct definition,
655between different platforms the definitions might differ
656
657=back
658
659=item *
660
661That the C<sizeof(struct)> or the alignments are the same everywhere
662
663=over 8
664
665=item *
666
667There might be padding bytes between the fields to align the fields -
668the bytes can be anything
669
670=item *
671
672Structs are required to be aligned to the maximum alignment required by
673the fields - which for native types is usually equivalent to
674C<sizeof(the_field)>.
675
676=back
677
678=back
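
In practice this means asking the compiler for sizes and offsets, and
comparing structs field by field rather than byte by byte.  A
standalone sketch, in which C<struct sample>, C<samples_equal>, and
C<offset_of_big> are all hypothetical:

```c
#include <assert.h>
#include <stddef.h>

struct sample {
    char tag;
    long big;   /* the compiler may insert padding before this field */
};

/* Compare the fields; memcmp() would also compare padding bytes,
 * whose contents can be anything. */
static int samples_equal(const struct sample *a, const struct sample *b)
{
    assert(a != NULL && b != NULL);
    return a->tag == b->tag && a->big == b->big;
}

/* Ask for the offset instead of assuming it equals sizeof(char). */
static size_t offset_of_big(void)
{
    return offsetof(struct sample, big);
}
```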
679
680=item *
681
682Assuming the character set is ASCIIish
683
684Perl can compile and run under EBCDIC platforms.  See L<perlebcdic>.
685This is transparent for the most part, but because the character sets
686differ, you shouldn't use numeric (decimal, octal, nor hex) constants
687to refer to characters.  You can safely say C<'A'>, but not C<0x41>.
688You can safely say C<'\n'>, but not C<\012>.  However, you can use
689macros defined in F<utf8.h> to specify any code point portably.
690C<LATIN1_TO_NATIVE(0xDF)> is going to be the code point that means
691LATIN SMALL LETTER SHARP S on whatever platform you are running on (on
692ASCII platforms it compiles without adding any extra code, so there is
693zero performance hit on those).  The acceptable inputs to
694C<LATIN1_TO_NATIVE> are from C<0x00> through C<0xFF>.  If your input
695isn't guaranteed to be in that range, use C<UNICODE_TO_NATIVE> instead.
696C<NATIVE_TO_LATIN1> and C<NATIVE_TO_UNICODE> translate the opposite
697direction.
698
699If you need the string representation of a character that doesn't have
700a mnemonic name in C, you should add it to the list in
701F<regen/unicode_constants.pl>, and have Perl create C<#define>'s for
702you, based on the current platform.
703
704Note that the C<isI<FOO>> and C<toI<FOO>> macros in F<handy.h> work
705properly on native code points and strings.
706
707Also, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26 upper
708case alphabetic characters.  That is not true in EBCDIC.  Nor for 'a'
709to 'z'.  But '0' - '9' is an unbroken range in both systems.  Don't
710assume anything about other ranges.  (Note that special handling of
711ranges in regular expression patterns and transliterations makes it
712appear to Perl code that the aforementioned ranges are all unbroken.)
713
714Many of the comments in the existing code ignore the possibility of
715EBCDIC, and may be wrong therefore, even if the code works.  This is
716actually a tribute to the successful transparent insertion of being
717able to handle EBCDIC without having to change pre-existing code.
718
719UTF-8 and UTF-EBCDIC are two different encodings used to represent
720Unicode code points as sequences of bytes.  Macros  with the same names
721(but different definitions) in F<utf8.h> and F<utfebcdic.h> are used to
722allow the calling code to think that there is only one such encoding.
723This is almost always referred to as C<utf8>, but it means the EBCDIC
724version as well.  Again, comments in the code may well be wrong even if
725the code itself is right.  For example, the concept of UTF-8
726C<invariant characters> differs between ASCII and EBCDIC.  On ASCII
727platforms, only characters that do not have the high-order bit set
728(i.e.  whose ordinals are strict ASCII, 0 - 127) are invariant, and the
729documentation and comments in the code may assume that, often referring
730to something like, say, C<hibit>.  The situation differs and is not so
731simple on EBCDIC machines, but as long as the code itself uses the
732C<NATIVE_IS_INVARIANT()> macro appropriately, it works, even if the
733comments are wrong.
734
735As noted in L<perlhack/TESTING>, when writing test scripts, the file
736F<t/charset_tools.pl> contains some helpful functions for writing tests
737valid on both ASCII and EBCDIC platforms.  Sometimes, though, a test
738can't use a function and it's inconvenient to have different test
739versions depending on the platform.  There are 20 code points that are
740the same in all 4 character sets currently recognized by Perl (the 3
741EBCDIC code pages plus ISO 8859-1 (ASCII/Latin1)).  These can be used
742in such tests, though there is a small possibility that Perl will
743become available in yet another character set, breaking your test.  All
744but one of these code points are C0 control characters.  The most
745significant controls that are the same are C<\0>, C<\r>, and C<\N{VT}>
746(also specifiable as C<\cK>, C<\x0B>, C<\N{U+0B}>, or C<\013>).  The
747single non-control is U+00B6 PILCROW SIGN.  The controls that are the
748same have the same bit pattern in all 4 character sets, regardless of
749the UTF8ness of the string containing them.  The bit pattern for U+B6
750is the same in all 4 for non-UTF8 strings, but differs in each when its
751containing string is UTF-8 encoded.  The only other code points that
752have some sort of sameness across all 4 character sets are the pair
7530xDC and 0xFC. Together these represent upper- and lowercase LATIN
754LETTER U WITH DIAERESIS, but which is upper and which is lower may be
755reversed: 0xDC is the capital in Latin1 and 0xFC is the small letter,
756while 0xFC is the capital in EBCDIC and 0xDC is the small one.  This
757factoid may be exploited in writing case insensitive tests that are the
758same across all 4 character sets.
759
760=item *
761
762Assuming the character set is just ASCII
763
ASCII is a 7 bit encoding, but bytes have 8 bits in them.  The 128
extra characters have different meanings depending on the locale.
Absent a locale, these extra characters were generally considered to be
unassigned, and this presented some problems.  This has been changed
starting in 5.12 so that these characters can be considered to be
Latin-1 (ISO-8859-1).
770
771=item *
772
773Mixing #define and #ifdef
774
775  #define BURGLE(x) ... \
776  #ifdef BURGLE_OLD_STYLE        /* BAD */
777  ... do it the old way ... \
778  #else
779  ... do it the new way ... \
780  #endif
781
782You cannot portably "stack" cpp directives.  For example in the above
783you need two separate BURGLE() #defines, one for each #ifdef branch.
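
A sketch of the portable form: the conditional goes on the outside and
each branch gets its own complete definition.  The macro bodies here
are placeholders invented for illustration, as is C<burgle_demo>.

```c
#include <assert.h>

#ifdef BURGLE_OLD_STYLE
   /* placeholder for "do it the old way" */
#  define BURGLE(x) ((x) - 1)
#else
   /* placeholder for "do it the new way" */
#  define BURGLE(x) ((x) + 1)
#endif

/* Hypothetical caller: sees exactly one definition of BURGLE(). */
static int burgle_demo(int x)
{
    return BURGLE(x);
}
```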
784
785=item *
786
787Adding non-comment stuff after #endif or #else
788
789  #ifdef SNOSH
790  ...
791  #else !SNOSH    /* BAD */
792  ...
793  #endif SNOSH    /* BAD */
794
The #endif and #else cannot portably have anything non-comment after
them.  If you want to document what is going on (which is a good idea
especially if the branches are long), use (C) comments:
798
799  #ifdef SNOSH
800  ...
801  #else /* !SNOSH */
802  ...
803  #endif /* SNOSH */
804
805The gcc option C<-Wendif-labels> warns about the bad variant (by
806default on starting from Perl 5.9.4).
807
808=item *
809
810Having a comma after the last element of an enum list
811
812  enum color {
813    CERULEAN,
814    CHARTREUSE,
815    CINNABAR,     /* BAD */
816  };
817
818is not portable.  Leave out the last comma.
819
Also note that whether enums are implicitly morphable to ints varies
between compilers; you might need an explicit C<(int)> cast.
822
823=item *
824
825Mixing signed char pointers with unsigned char pointers
826
827  int foo(char *s) { ... }
828  ...
829  unsigned char *t = ...; /* Or U8* t = ... */
830  foo(t);   /* BAD */
831
832While this is legal practice, it is certainly dubious, and downright
fatal on at least one platform: for example VMS cc considers this a
834fatal error.  One cause for people often making this mistake is that a
835"naked char" and therefore dereferencing a "naked char pointer" have an
836undefined signedness: it depends on the compiler and the flags of the
837compiler and the underlying platform whether the result is signed or
838unsigned.  For this very same reason using a 'char' as an array index
839is bad.
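
The array-index hazard can be sketched in plain C as follows.  The
function name C<count_bytes> is hypothetical; the only point is the
cast to C<unsigned char> before the value is used as an index.

```c
#include <assert.h>
#include <stddef.h>

/* Count how often each byte value occurs in buf[0 .. len-1].
 * counts must point to 256 elements. */
void count_bytes(const char *buf, size_t len, size_t counts[256])
{
    size_t i;

    assert(buf != NULL && counts != NULL);
    for (i = 0; i < 256; i++)
        counts[i] = 0;
    for (i = 0; i < len; i++) {
        /* BAD: counts[buf[i]]++ would index with a negative value for
         * bytes above 0x7F on platforms where plain char is signed. */
        counts[(unsigned char)buf[i]]++;
    }
}
```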
840
841=item *
842
843Macros that have string constants and their arguments as substrings of
844the string constants
845
846  #define FOO(n) printf("number = %d\n", n)    /* BAD */
847  FOO(10);
848
849Pre-ANSI semantics for that was equivalent to
850
851  printf("10umber = %d\10");
852
which is probably not what you were expecting.  Unfortunately at least
one reasonably common and modern C compiler does "real backward
compatibility" here: on AIX that is what still happens even though the
rest of the AIX compiler is very happily C89.
857
858=item *
859
860Using printf formats for non-basic C types
861
862   IV i = ...;
863   printf("i = %d\n", i);    /* BAD */
864
While this might by accident work on some platform (where IV happens to
be an C<int>), in general it cannot.  IV might be something larger.
The situation is even worse with more specific types (defined by Perl's
configuration step in F<config.h>):
869
870   Uid_t who = ...;
871   printf("who = %d\n", who);    /* BAD */
872
873The problem here is that Uid_t might be not only not C<int>-wide but it
874might also be unsigned, in which case large uids would be printed as
875negative values.
876
There is no simple solution to this because of printf()'s limited
intelligence, but for many types the right format is available with
either an 'f' or '_f' suffix, for example:

   IVdf /* IV in decimal */
   UVxf /* UV in hexadecimal */
883
884   printf("i = %"IVdf"\n", i); /* The IVdf is a string constant. */
885
886   Uid_t_f /* Uid_t in decimal */
887
888   printf("who = %"Uid_t_f"\n", who);
889
890Or you can try casting to a "wide enough" type:
891
892   printf("i = %"IVdf"\n", (IV)something_very_small_and_signed);
893
894See L<perlguts/Formatted Printing of Size_t and SSize_t> for how to
895print those.
896
897Also remember that the C<%p> format really does require a void pointer:
898
899   U8* p = ...;
900   printf("p = %p\n", (void*)p);
901
902The gcc option C<-Wformat> scans for such problems.
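
The format-macro idea can be tried outside the Perl tree with a
stand-alone sketch.  Here C<my_iv> and C<MY_IVdf> are hypothetical
stand-ins for C<IV> and C<IVdf>, whose real definitions come from
Perl's headers and depend on the configuration; the sketch assumes a
C99 C<snprintf()>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for Perl's IV and IVdf, which really come from
 * Perl's own headers and are chosen at configuration time. */
typedef long long my_iv;
#define MY_IVdf "lld"   /* format letters only, without the leading % */

/* Format v into buf; returns what snprintf() returns. */
int format_iv(char *buf, size_t size, my_iv v)
{
    assert(buf != NULL && size > 0);
    /* Adjacent string literals are concatenated: "i = %" "lld". */
    return snprintf(buf, size, "i = %" MY_IVdf, v);
}
```

Because the type and its format string are defined side by side, a
platform that needs a different width only has to change those two
lines.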
903
904=item *
905
906Blindly passing va_list
907
908Not all platforms support passing va_list to further varargs (stdarg)
909functions.  The right thing to do is to copy the va_list using the
910Perl_va_copy() if the NEED_VA_COPY is defined.
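
As an illustration of why a copy is needed, here is a plain C99 sketch
that walks its argument list twice.  It uses the standard C<va_copy>
rather than C<Perl_va_copy()>, and the function name C<format_alloc>
is hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Return a malloc()ed string formatted per fmt, or NULL on failure.
 * The argument list is walked twice, so the second walk uses a copy. */
char *format_alloc(const char *fmt, ...)
{
    va_list args, args_copy;
    char *buf;
    int len;

    assert(fmt != NULL);
    va_start(args, fmt);
    va_copy(args_copy, args);   /* C99; the text above names Perl_va_copy() */

    len = vsnprintf(NULL, 0, fmt, args);   /* first walk: measure */
    va_end(args);
    if (len < 0) {
        va_end(args_copy);
        return NULL;
    }

    buf = malloc((size_t)len + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, args_copy);   /* second walk */
    va_end(args_copy);
    return buf;
}
```

Reusing C<args> for the second C<vsnprintf()> call instead of the copy
is exactly the "blind passing" that fails on some platforms.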
911
912=for apidoc_section $genconfig
913=for apidoc Amnh||NEED_VA_COPY
914
915=item *
916
917Using gcc statement expressions
918
919   val = ({...;...;...});    /* BAD */
920
921While a nice extension, it's not portable.  Historically, Perl used
922them in macros if available to gain some extra speed (essentially as a
923funky form of inlining), but we now support (or emulate) C99 C<static
924inline> functions, so use them instead. Declare functions as
925C<PERL_STATIC_INLINE> to transparently fall back to emulation where
926needed.
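
A sketch of the replacement, using plain C99 C<static inline> so that
it compiles outside the Perl tree (core code would declare it with
C<PERL_STATIC_INLINE>).  The function C<clamp_int> is a hypothetical
example, not part of the Perl API.

```c
#include <assert.h>

/* With a gcc statement expression this might have been written as
 *   #define CLAMP(v, lo, hi) \
 *       ({ int v_ = (v); v_ < (lo) ? (lo) : v_ > (hi) ? (hi) : v_; })
 * which is not portable.  A static inline function also evaluates each
 * argument exactly once, and its arguments are type-checked. */
static inline int clamp_int(int v, int lo, int hi)
{
    assert(lo <= hi);
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}
```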
927
928=item *
929
930Binding together several statements in a macro
931
932Use the macros C<STMT_START> and C<STMT_END>.
933
934   STMT_START {
935      ...
936   } STMT_END
937
938But there can be subtle (but avoidable if you do it right) bugs
939introduced with these; see L<perlapi/C<STMT_START>> for best practices
940for their use.
941
942=item *
943
944Testing for operating systems or versions when you should be testing
945for features
946
947  #ifdef __FOONIX__    /* BAD */
948  foo = quux();
949  #endif
950
Unless you know with 100% certainty that quux() is only ever available
for the "Foonix" operating system B<and> that it is available B<and>
correctly working for B<all> past, present, B<and> future versions of
"Foonix", the above is very wrong.  This is more correct (though still
not perfect, because the below is a compile-time check):
956
957  #ifdef HAS_QUUX
958  foo = quux();
959  #endif
960
961How does the HAS_QUUX become defined where it needs to be?  Well, if
962Foonix happens to be Unixy enough to be able to run the Configure
963script, and Configure has been taught about detecting and testing
964quux(), the HAS_QUUX will be correctly defined.  In other platforms,
965the corresponding configuration step will hopefully do the same.
966
967In a pinch, if you cannot wait for Configure to be educated, or if you
968have a good hunch of where quux() might be available, you can
969temporarily try the following:
970
971  #if (defined(__FOONIX__) || defined(__BARNIX__))
972  # define HAS_QUUX
973  #endif
974
975  ...
976
977  #ifdef HAS_QUUX
978  foo = quux();
979  #endif
980
981But in any case, try to keep the features and operating systems
982separate.
983
984A good resource on the predefined macros for various operating systems,
985compilers, and so forth is
986L<https://sourceforge.net/p/predef/wiki/Home/>.
987
988=item *
989
990Assuming the contents of static memory pointed to by the return values
991of Perl wrappers for C library functions doesn't change.  Many C
992library functions return pointers to static storage that can be
993overwritten by subsequent calls to the same or related functions.  Perl
994has wrappers for some of these functions.  Originally many of those
995wrappers returned those volatile pointers.  But over time almost all of
996them have evolved to return stable copies.  To cope with the remaining
997ones, do a L<perlapi/savepv> to make a copy, thus avoiding these
998problems.  You will have to free the copy when you're done to avoid
999memory leaks.  If you don't have control over when it gets freed,
1000you'll need to make the copy in a mortal scalar, like so
1001
1002 SvPVX(sv_2mortal(newSVpv(volatile_string, 0)))
1003
1004=back
1005
1006=head2 Problematic System Interfaces
1007
1008=over 4
1009
1010=item *
1011
1012Perl strings are NOT the same as C strings:  They may contain C<NUL>
1013characters, whereas a C string is terminated by the first C<NUL>. That
1014is why Perl API functions that deal with strings generally take a
1015pointer to the first byte and either a length or a pointer to the byte
1016just beyond the final one.
1017
1018And this is the reason that many of the C library string handling
1019functions should not be used.  They don't cope with the full generality
1020of Perl strings.  It may be that your test cases don't have embedded
1021C<NUL>s, and so the tests pass, whereas there may well eventually arise
1022real-world cases where they fail.  A lesson here is to include C<NUL>s
1023in your tests.  Now it's fairly rare in most real world cases to get
1024C<NUL>s, so your code may seem to work, until one day a C<NUL> comes
1025along.
1026
1027Here's an example.  It used to be a common paradigm, for decades, in
1028the perl core to use S<C<strchr("list", c)>> to see if the character
C<c> is any of the ones given in C<"list">, a double-quote-enclosed
string containing the set of characters to test C<c> against.
1031As long as C<c> isn't a C<NUL>, it works.  But when C<c> is a C<NUL>,
1032C<strchr> returns a pointer to the terminating C<NUL> in C<"list">.
1033This likely will result in a segfault or a security issue when the
1034caller uses that end pointer as the starting point to read from.
1035
1036A solution to this and many similar issues is to use the C<mem>I<-foo>
1037C library functions instead.  In this case C<memchr> can be used to see
1038if C<c> is in C<"list"> and works even if C<c> is C<NUL>.  These
1039functions need an additional parameter to give the string length. In
1040the case of literal string parameters, perl has defined macros that
1041calculate the length for you.  See L<perlapi/String Handling>.
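
The difference between the two functions can be sketched in plain C.
The names C<is_one_of> and C<LITERAL_LEN> are hypothetical helpers for
this illustration, not Perl's own macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if c is one of the len bytes starting at list, else 0.
 * Unlike strchr(), the terminating NUL of a literal is not searched
 * when len excludes it, so c == '\0' is handled correctly. */
int is_one_of(char c, const char *list, size_t len)
{
    assert(list != NULL);
    return memchr(list, (unsigned char)c, len) != NULL;
}

/* Length of a string literal without its terminating NUL; a local,
 * hypothetical helper, not one of Perl's own macros. */
#define LITERAL_LEN(s) (sizeof(s) - 1)
```

With C<"abc"> as the list, C<is_one_of('\0', ...)> reports no match,
while C<strchr("abc", '\0')> returns a non-NULL pointer to the
terminator.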
1042
1043=item *
1044
1045malloc(0), realloc(0), calloc(0, 0) are non-portable.  To be portable
1046allocate at least one byte.  (In general you should rarely need to work
1047at this low level, but instead use the various malloc wrappers.)
1048
1049=item *
1050
1051snprintf() - the return type is unportable.  Use my_snprintf() instead.
1052
1053=back
1054
1055=head2 Security problems
1056
1057Last but not least, here are various tips for safer coding. See also
1058L<perlclib> for libc/stdio replacements one should use.
1059
1060=over 4
1061
1062=item *
1063
1064Do not use gets()
1065
1066Or we will publicly ridicule you.  Seriously.
1067
1068=item *
1069
1070Do not use tmpfile()
1071
1072Use mkstemp() instead.
1073
1074=item *
1075
1076Do not use strcpy() or strcat() or strncpy() or strncat()
1077
1078Use my_strlcpy() and my_strlcat() instead: they either use the native
1079implementation, or Perl's own implementation (borrowed from the public
1080domain implementation of INN).
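
For readers unfamiliar with the C<strlcpy()> contract, here is a
hand-written sketch of its semantics.  C<bounded_copy> is a
hypothetical helper for illustration only; it is not Perl's
implementation, and core code should call my_strlcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper with strlcpy()-style semantics: copy at most
 * size - 1 bytes of src into dst, always NUL-terminate when size > 0,
 * and return strlen(src) so that the caller can detect truncation. */
size_t bounded_copy(char *dst, const char *src, size_t size)
{
    size_t src_len;

    assert(src != NULL && (dst != NULL || size == 0));
    src_len = strlen(src);
    if (size > 0) {
        size_t n = src_len < size - 1 ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

A return value greater than or equal to C<size> tells the caller that
the copy was truncated, which strncpy() cannot report.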
1081
1082=item *
1083
1084Do not use sprintf() or vsprintf()
1085
1086If you really want just plain byte strings, use my_snprintf() and
1087my_vsnprintf() instead, which will try to use snprintf() and
1088vsnprintf() if those safer APIs are available.  If you want something
1089fancier than a plain byte string, use L<C<Perl_form>()|perlapi/form> or
1090SVs and L<C<Perl_sv_catpvf()>|perlapi/sv_catpvf>.
1091
1092Note that glibc C<printf()>, C<sprintf()>, etc. are buggy before glibc
1093version 2.17.  They won't allow a C<%.s> format with a precision to
1094create a string that isn't valid UTF-8 if the current underlying locale
1095of the program is UTF-8.  What happens is that the C<%s> and its
1096operand are simply skipped without any notice.
1097L<https://sourceware.org/bugzilla/show_bug.cgi?id=6530>.
1098
1099=item *
1100
1101Do not use atoi()
1102
1103Use grok_atoUV() instead.  atoi() has ill-defined behavior on
1104overflows, and cannot be used for incremental parsing.  It is also
1105affected by locale, which is bad.
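
The concerns listed above (overflow, incremental parsing, locale) can
be illustrated with a small hand-rolled parser.  This is a hypothetical
sketch, not the implementation or exact contract of grok_atoUV(); the
name C<parse_ulong> and its return convention are assumptions made for
the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical parser, not Perl's grok_atoUV(): parse the leading ASCII
 * decimal digits of s into *out.  Returns 1 on success and sets *end to
 * the first unparsed character; returns 0 if there are no digits or
 * the value would overflow unsigned long.  No locale is consulted. */
int parse_ulong(const char *s, unsigned long *out, const char **end)
{
    unsigned long value = 0;
    const char *p = s;

    assert(s != NULL && out != NULL && end != NULL);
    if (*p < '0' || *p > '9')
        return 0;
    while (*p >= '0' && *p <= '9') {
        unsigned long digit = (unsigned long)(*p - '0');
        if (value > (ULONG_MAX - digit) / 10)
            return 0;   /* value * 10 + digit would overflow */
        value = value * 10 + digit;
        p++;
    }
    *out = value;
    *end = p;
    return 1;
}
```

Because the end pointer is handed back, the caller can continue
parsing right after the number, which atoi() does not allow.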
1106
1107=item *
1108
1109Do not use strtol() or strtoul()
1110
Use grok_atoUV() instead.  strtol() or strtoul() (or their
IV/UV-friendly macro disguises, Strtol() and Strtoul(), or Atol() and
Atoul()) are affected by locale, which is bad.
1114
1115=for apidoc_section $numeric
1116=for apidoc AmhD||Atol|const char * nptr
1117=for apidoc AmhD||Atoul|const char * nptr
1118
1119=back
1120
1121=head1 DEBUGGING
1122
You can compile a special debugging version of Perl, which allows you
to use the C<-D> option of Perl to tell more about what Perl is doing.
But sometimes there is no alternative but to dive in with a debugger,
either to see the stack trace of a core dump (very useful in a bug
report), or to figure out what went wrong before the core dump
happened, or how we ended up with wrong or unexpected results.
1129
1130=head2 Poking at Perl
1131
1132To really poke around with Perl, you'll probably want to build Perl for
1133debugging, like this:
1134
1135    ./Configure -d -DDEBUGGING
1136    make
1137
1138C<-DDEBUGGING> turns on the C compiler's C<-g> flag to have it produce
debugging information which will allow us to step through a running
program, and to see which C function we are in (without the
1141debugging information we might see only the numerical addresses of the
1142functions, which is not very helpful). It will also turn on the
1143C<DEBUGGING> compilation symbol which enables all the internal
1144debugging code in Perl. There are a whole bunch of things you can debug
1145with this: L<perlrun|perlrun/-Dletters> lists them all, and the best
1146way to find out about them is to play about with them.  The most useful
1147options are probably
1148
1149    l  Context (loop) stack processing
1150    s  Stack snapshots (with v, displays all stacks)
1151    t  Trace execution
1152    o  Method and overloading resolution
1153    c  String/numeric conversions
1154
1155For example
1156
1157    $ perl -Dst -e '$x + 1'
1158    ....
1159    (-e:1)	gvsv(main::x)
1160        =>  UNDEF
1161    (-e:1)	const(IV(1))
1162        =>  UNDEF  IV(1)
1163    (-e:1)	add
1164        =>  NV(1)
1165
1166
1167Some of the functionality of the debugging code can be achieved with a
1168non-debugging perl by using XS modules:
1169
1170    -Dr => use re 'debug'
1171    -Dx => use O 'Debug'
1172
1173=head2 Using a source-level debugger
1174
1175If the debugging output of C<-D> doesn't help you, it's time to step
1176through perl's execution with a source-level debugger.
1177
1178=over 3
1179
1180=item *
1181
1182We'll use C<gdb> for our examples here; the principles will apply to
1183any debugger (many vendors call their debugger C<dbx>), but check the
1184manual of the one you're using.
1185
1186=back
1187
1188To fire up the debugger, type
1189
1190    gdb ./perl
1191
1192Or if you have a core dump:
1193
1194    gdb ./perl core
1195
1196You'll want to do that in your Perl source tree so the debugger can
1197read the source code.  You should see the copyright message, followed
1198by the prompt.
1199
1200    (gdb)
1201
1202C<help> will get you into the documentation, but here are the most
1203useful commands:
1204
1205=over 3
1206
1207=item * run [args]
1208
1209Run the program with the given arguments.
1210
1211=item * break function_name
1212
1213=item * break source.c:xxx
1214
1215Tells the debugger that we'll want to pause execution when we reach
1216either the named function (but see L<perlguts/Internal Functions>!) or
1217the given line in the named source file.
1218
1219=item * step
1220
1221Steps through the program a line at a time.
1222
1223=item * next
1224
1225Steps through the program a line at a time, without descending into
1226functions.
1227
1228=item * continue
1229
1230Run until the next breakpoint.
1231
1232=item * finish
1233
1234Run until the end of the current function, then stop again.
1235
1236=item * 'enter'
1237
1238Just pressing Enter will do the most recent operation again - it's a
1239blessing when stepping through miles of source code.
1240
1241=item * ptype
1242
1243Prints the C definition of the argument given.
1244
1245  (gdb) ptype PL_op
1246  type = struct op {
1247      OP *op_next;
1248      OP *op_sibparent;
1249      OP *(*op_ppaddr)(void);
1250      PADOFFSET op_targ;
1251      unsigned int op_type : 9;
1252      unsigned int op_opt : 1;
1253      unsigned int op_slabbed : 1;
1254      unsigned int op_savefree : 1;
1255      unsigned int op_static : 1;
1256      unsigned int op_folded : 1;
1257      unsigned int op_spare : 2;
1258      U8 op_flags;
1259      U8 op_private;
1260  } *
1261
1262=item * print
1263
Execute the given C code and print its results.  B<WARNING>: Perl makes
heavy use of macros, and F<gdb> does not necessarily support macros
(see later L</"gdb macro support">).  You'll have to substitute them
yourself, or invoke cpp on the source code files (see L</"The .i
Targets">).  So, for instance, you can't say
1269
1270    print SvPV_nolen(sv)
1271
1272but you have to say
1273
1274    print Perl_sv_2pv_nolen(sv)
1275
1276=back
1277
1278You may find it helpful to have a "macro dictionary", which you can
1279produce by saying C<cpp -dM perl.c | sort>.  Even then, F<cpp> won't
1280recursively apply those macros for you.
1281
1282=head2 gdb macro support
1283
1284Recent versions of F<gdb> have fairly good macro support, but in order
1285to use it you'll need to compile perl with macro definitions included
1286in the debugging information.  Using F<gcc> version 3.1, this means
1287configuring with C<-Doptimize=-g3>.  Other compilers might use a
1288different switch (if they support debugging macros at all).
1289
1290=head2 Dumping Perl Data Structures
1291
1292One way to get around this macro hell is to use the dumping functions
1293in F<dump.c>; these work a little like an internal
1294L<Devel::Peek|Devel::Peek>, but they also cover OPs and other
1295structures that you can't get at from Perl.  Let's take an example.
1296We'll use the C<$x = $y + $z> we used before, but give it a bit of
1297context: C<$y = "6XXXX"; $z = 2.3;>.  Where's a good place to stop and
1298poke around?
1299
1300What about C<pp_add>, the function we examined earlier to implement the
1301C<+> operator:
1302
1303    (gdb) break Perl_pp_add
1304    Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.
1305
1306Notice we use C<Perl_pp_add> and not C<pp_add> - see
1307L<perlguts/Internal Functions>.  With the breakpoint in place, we can
1308run our program:
1309
1310    (gdb) run -e '$y = "6XXXX"; $z = 2.3; $x = $y + $z'
1311
1312Lots of junk will go past as gdb reads in the relevant source files and
1313libraries, and then:
1314
1315    Breakpoint 1, Perl_pp_add () at pp_hot.c:309
1316    1396    dSP; dATARGET; bool useleft; SV *svl, *svr;
1317    (gdb) step
1318    311           dPOPTOPnnrl_ul;
1319    (gdb)
1320
1321We looked at this bit of code before, and we said that
1322C<dPOPTOPnnrl_ul> arranges for two C<NV>s to be placed into C<left> and
1323C<right> - let's slightly expand it:
1324
1325 #define dPOPTOPnnrl_ul  NV right = POPn; \
1326                         SV *leftsv = TOPs; \
1327                         NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0
1328
1329C<POPn> takes the SV from the top of the stack and obtains its NV
1330either directly (if C<SvNOK> is set) or by calling the C<sv_2nv>
1331function.  C<TOPs> takes the next SV from the top of the stack - yes,
1332C<POPn> uses C<TOPs> - but doesn't remove it.  We then use C<SvNV> to
1333get the NV from C<leftsv> in the same way as before - yes, C<POPn> uses
1334C<SvNV>.
1335
1336Since we don't have an NV for C<$y>, we'll have to use C<sv_2nv> to
1337convert it.  If we step again, we'll find ourselves there:
1338
1339    (gdb) step
1340    Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
1341    1669        if (!sv)
1342    (gdb)
1343
1344We can now use C<Perl_sv_dump> to investigate the SV:
1345
1346    (gdb) print Perl_sv_dump(sv)
1347    SV = PV(0xa057cc0) at 0xa0675d0
1348    REFCNT = 1
1349    FLAGS = (POK,pPOK)
1350    PV = 0xa06a510 "6XXXX"\0
1351    CUR = 5
1352    LEN = 6
1353    $1 = void
1354
1355We know we're going to get C<6> from this, so let's finish the
1356subroutine:
1357
1358    (gdb) finish
1359    Run till exit from #0  Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
1360    0x462669 in Perl_pp_add () at pp_hot.c:311
1361    311           dPOPTOPnnrl_ul;
1362
1363We can also dump out this op: the current op is always stored in
1364C<PL_op>, and we can dump it with C<Perl_op_dump>.  This'll give us
similar output to the CPAN module L<B::Debug>.
1366
1367=for apidoc_section $debugging
1368=for apidoc Amnh||PL_op
1369
1370    (gdb) print Perl_op_dump(PL_op)
1371    {
1372    13  TYPE = add  ===> 14
1373        TARG = 1
1374        FLAGS = (SCALAR,KIDS)
1375        {
1376            TYPE = null  ===> (12)
1377              (was rv2sv)
1378            FLAGS = (SCALAR,KIDS)
1379            {
1380    11          TYPE = gvsv  ===> 12
1381                FLAGS = (SCALAR)
1382                GV = main::b
1383            }
1384        }
1385
1386# finish this later #
1387
1388=head2 Using gdb to look at specific parts of a program
1389
1390With the example above, you knew to look for C<Perl_pp_add>, but what
1391if there were multiple calls to it all over the place, or you didn't
1392know what the op was you were looking for?
1393
1394One way to do this is to inject a rare call somewhere near what you're
1395looking for.  For example, you could add C<study> before your method:
1396
1397    study;
1398
1399And in gdb do:
1400
1401    (gdb) break Perl_pp_study
1402
1403And then step until you hit what you're looking for.  This works well
1404in a loop if you want to only break at certain iterations:
1405
1406    for my $i (1..100) {
1407        study if $i == 50;
1408    }
1409
1410=head2 Using gdb to look at what the parser/lexer are doing
1411
1412If you want to see what perl is doing when parsing/lexing your code,
1413you can use C<BEGIN {}>:
1414
1415    print "Before\n";
1416    BEGIN { study; }
1417    print "After\n";
1418
1419And in gdb:
1420
1421    (gdb) break Perl_pp_study
1422
1423If you want to see what the parser/lexer is doing inside of C<if>
1424blocks and the like you need to be a little trickier:
1425
1426    if ($x && $y && do { BEGIN { study } 1 } && $z) { ... }
1427
1428=head1 SOURCE CODE STATIC ANALYSIS
1429
Various tools exist for analysing C source code B<statically>, as
opposed to B<dynamically>, that is, without executing the code.  It is
possible to detect resource leaks, undefined behaviour, type
mismatches, portability problems, code paths that would cause illegal
memory accesses, and other similar problems by just parsing the C code
and examining what the resulting graph says about the execution and
data flows.  As a matter of fact, this is exactly how C compilers know
to give warnings about dubious code.
1438
1439=head2 lint
1440
The good old C code quality inspector, C<lint>, is available on several
1442platforms, but please be aware that there are several different
1443implementations of it by different vendors, which means that the flags
1444are not identical across different platforms.
1445
1446There is a C<lint> target in Makefile, but you may have to diddle with
1447the flags (see above).
1448
1449=head2 Coverity
1450
Coverity (L<https://www.coverity.com/>) is a product similar to lint.
As a testbed for their product, they periodically check several open
source projects, and they give open source developers accounts on the
defect databases.

There is a Coverity setup for the perl5 project:
L<https://scan.coverity.com/projects/perl5>
1458
1459=head2 HP-UX cadvise (Code Advisor)
1460
HP has a C/C++ static analyzer product for HP-UX called Code Advisor.
1462(Link not given here because the URL is horribly long and seems
1463horribly unstable; use the search engine of your choice to find it.)
1464The use of the C<cadvise_cc> recipe with C<Configure ...
1465-Dcc=./cadvise_cc> (see cadvise "User Guide") is recommended; as is the
1466use of C<+wall>.
1467
1468=head2 cpd (cut-and-paste detector)
1469
1470The cpd tool detects cut-and-paste coding.  If one instance of the
1471cut-and-pasted code changes, all the other spots should probably be
1472changed, too.  Therefore such code should probably be turned into a
1473subroutine or a macro.
1474
1475cpd (L<https://docs.pmd-code.org/latest/pmd_userdocs_cpd.html>) is part
1476of the pmd project (L<https://pmd.github.io/>).  pmd was originally
1477written for static analysis of Java code, but later the cpd part of it
1478was extended to parse also C and C++.
1479
Download the pmd-bin-X.Y.zip from the SourceForge site, extract the
pmd-X.Y.jar from it, and then run that on source code thusly:
1482
1483  java -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD \
1484   --minimum-tokens 100 --files /some/where/src --language c > cpd.txt
1485
1486You may run into memory limits, in which case you should use the -Xmx
1487option:
1488
1489  java -Xmx512M ...
1490
1491=head2 gcc warnings
1492
1493Though much can be written about the inconsistency and coverage
1494problems of gcc warnings (like C<-Wall> not meaning "all the warnings",
1495or some common portability problems not being covered by C<-Wall>, or
1496C<-ansi> and C<-pedantic> both being a poorly defined collection of
1497warnings, and so forth), gcc is still a useful tool in keeping our
1498coding nose clean.
1499
C<-Wall> is on by default.

It would be nice for C<-pedantic> to be on always, but unfortunately
it is not safe on all platforms; for example, it can cause fatal
conflicts with the system headers (Solaris being a prime example).  If
Configure C<-Dgccansipedantic> is used, the C<cflags> frontend selects
C<-pedantic> for the platforms where it is known to be safe.
1507
1508The following extra flags are added:
1509
1510=over 4
1511
1512=item *
1513
1514C<-Wendif-labels>
1515
1516=item *
1517
1518C<-Wextra>
1519
1520=item *
1521
1522C<-Wc++-compat>
1523
1524=item *
1525
1526C<-Wwrite-strings>
1527
1528=item *
1529
1530C<-Werror=pointer-arith>
1531
1532=item *
1533
1534C<-Werror=vla>
1535
1536=back
1537
1538The following flags would be nice to have but they would first need
1539their own Augean stablemaster:
1540
1541=over 4
1542
1543=item *
1544
1545C<-Wshadow>
1546
1547=item *
1548
1549C<-Wstrict-prototypes>
1550
1551=back
1552
1553The C<-Wtraditional> is another example of the annoying tendency of gcc
1554to bundle a lot of warnings under one switch (it would be impossible to
1555deploy in practice because it would complain a lot) but it does contain
1556some warnings that would be beneficial to have available on their own,
1557such as the warning about string constants inside macros containing the
1558macro arguments: this behaved differently pre-ANSI than it does in
1559ANSI, and some C compilers are still in transition, AIX being an
1560example.
1561
1562=head2 Warnings of other C compilers
1563
1564Other C compilers (yes, there B<are> other C compilers than gcc) often
1565have their "strict ANSI" or "strict ANSI with some portability
1566extensions" modes on, like for example the Sun Workshop has its C<-Xa>
1567mode on (though implicitly), or the DEC (these days, HP...) has its
1568C<-std1> mode on.
1569
1570=head1 MEMORY DEBUGGERS
1571
1572B<NOTE 1>: Running under older memory debuggers such as Purify,
1573valgrind or Third Degree greatly slows down the execution: seconds
1574become minutes, minutes become hours.  For example as of Perl 5.8.1,
1575the F<ext/Encode/t/Unicode.t> test takes extraordinarily long to
1576complete under e.g. Purify, Third Degree, and valgrind.  Under valgrind
1577it takes more than six hours, even on a snappy computer.  Said test
1578must be doing something that is quite unfriendly for memory debuggers.
1579If you don't feel like waiting, you can simply kill the perl process.
Roughly, valgrind slows down execution by a factor of 10, and
AddressSanitizer by a factor of 2.
1582
1583B<NOTE 2>: To minimize the number of memory leak false alarms (see
1584L</PERL_DESTRUCT_LEVEL> for more information), you have to set the
1585environment variable C<PERL_DESTRUCT_LEVEL> to 2.  For example, like
1586this:
1587
1588    env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib ...
1589
1590B<NOTE 3>: There are known memory leaks when there are compile-time
1591errors within C<eval> or C<require>; seeing C<S_doeval> in the call
1592stack is a good sign of these.  Fixing these leaks is non-trivial,
1593unfortunately, but they must be fixed eventually.
1594
1595B<NOTE 4>: L<DynaLoader> will not clean up after itself completely
1596unless Perl is built with the Configure option
1597C<-Accflags=-DDL_UNLOAD_ALL_AT_EXIT>.
1598
1599=head2 valgrind
1600
1601The valgrind tool can be used to find out both memory leaks and illegal
1602heap memory accesses.  As of version 3.3.0, Valgrind only supports
1603Linux on x86, x86-64 and PowerPC and Darwin (OS X) on x86 and x86-64.
1604The special "test.valgrind" target can be used to run the tests under
1605valgrind.  Found errors and memory leaks are logged in files named
1606F<testfile.valgrind> and by default output is displayed inline.
1607
1608Example usage:
1609
1610    make test.valgrind
1611
1612Since valgrind adds significant overhead, tests will take much longer
1613to run.  The valgrind tests support being run in parallel to help with
1614this:
1615
1616    TEST_JOBS=9 make test.valgrind
1617
1618Note that the above two invocations will be very verbose as reachable
1619memory and leak-checking is enabled by default.  If you want to just
1620see pure errors, try:
1621
1622    VG_OPTS='-q --leak-check=no --show-reachable=no' TEST_JOBS=9 \
1623        make test.valgrind
1624
1625Valgrind also provides a cachegrind tool, invoked on perl as:
1626
1627    VG_OPTS=--tool=cachegrind make test.valgrind
1628
As system libraries (most notably glibc) also trigger errors, valgrind
allows such errors to be suppressed using suppression files.  The
default suppression file that comes with valgrind already catches a lot
of them.  Some additional suppressions are defined in F<t/perl.supp>.
1633
1634To get valgrind and for more information see L<https://valgrind.org/>.
1635
1636=head2 AddressSanitizer
1637
1638AddressSanitizer ("ASan") consists of a compiler instrumentation module
1639and a run-time C<malloc> library. ASan is available for a variety of
1640architectures, operating systems, and compilers (see project link
1641below). It checks for unsafe memory usage, such as use after free and
1642buffer overflow conditions, and is fast enough that you can easily
1643compile your debugging or optimized perl with it. Modern versions of
1644ASan check for memory leaks by default on most platforms, otherwise
1645(e.g. x86_64 OS X) this feature can be enabled via
1646C<ASAN_OPTIONS=detect_leaks=1>.
1647
1648
1649To build perl with AddressSanitizer, your Configure invocation should
1650look like:
1651
1652    sh Configure -des -Dcc=clang \
1653       -Accflags=-fsanitize=address -Aldflags=-fsanitize=address \
1654       -Alddlflags=-shared\ -fsanitize=address \
1655       -fsanitize-blacklist=`pwd`/asan_ignore
1656
1657where these arguments mean:
1658
1659=over 4
1660
1661=item * -Dcc=clang
1662
1663This should be replaced by the full path to your clang executable if it
1664is not in your path.
1665
1666=item * -Accflags=-fsanitize=address
1667
1668Compile perl and extensions sources with AddressSanitizer.
1669
1670=item * -Aldflags=-fsanitize=address
1671
1672Link the perl executable with AddressSanitizer.
1673
1674=item * -Alddlflags=-shared\ -fsanitize=address
1675
1676Link dynamic extensions with AddressSanitizer.  You must manually
1677specify C<-shared> because using C<-Alddlflags=-shared> will prevent
1678Configure from setting a default value for C<lddlflags>, which usually
1679contains C<-shared> (at least on Linux).
1680
1681=item * -fsanitize-blacklist=`pwd`/asan_ignore
1682
1683AddressSanitizer will ignore functions listed in the C<asan_ignore>
1684file.  (This file should contain a short explanation of why each of the
1685functions is listed.)
1686
1687=back
1688
1689See also L<https://github.com/google/sanitizers/wiki/AddressSanitizer>.
1690
1691=head2 Dr Memory
1692
1693Dr. Memory is a tool similar to valgrind which is usable on Windows
1694and Linux.
1695
1696It supports heap checking like C<memcheck> from valgrind.  There are
1697also other tools included.
1698
1699See L<https://drmemory.org/>.
1700
1701
1702=head1 PROFILING
1703
1704Depending on your platform there are various ways of profiling Perl.
1705
1706There are two commonly used techniques of profiling executables:
1707I<statistical time-sampling> and I<basic-block counting>.
1708
The first method periodically takes samples of the CPU program counter,
and since the program counter can be correlated with the code generated
for functions, we get a statistical view of which functions the
program is spending its time in.  The caveats are that very small/fast
functions have a lower probability of showing up in the profile, and
that periodically interrupting the program (this is usually done rather
frequently, on the scale of milliseconds) imposes an additional
overhead that may skew the results.  The first problem can be
alleviated by running the code for longer (in general this is a good
idea for profiling); the second problem is usually kept in check by the
profiling tools themselves.
1720
1721The second method divides up the generated code into I<basic blocks>.
1722Basic blocks are sections of code that are entered only in the
1723beginning and exited only at the end.  For example, a conditional jump
1724starts a basic block.  Basic block profiling usually works by
1725I<instrumenting> the code by adding I<enter basic block #nnnn>
1726book-keeping code to the generated code.  During the execution of the
1727code the basic block counters are then updated appropriately.  The
1728caveat is that the added extra code can skew the results: again, the
1729profiling tools usually try to factor their own effects out of the
1730results.
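
To make the instrumentation idea concrete, here is a minimal hand-written sketch of basic-block counting.  It is only an illustration: the counter array C<bb_hits>, the C<BB_*> names and the function C<count_evens()> are hypothetical, and a real tool such as an instrumenting compiler inserts the book-keeping itself and in its own format.

```c
#include <stddef.h>

/* Hypothetical counters, one per basic block of count_evens(). */
enum { BB_ENTRY, BB_LOOP_BODY, BB_EVEN_BRANCH, BB_EXIT, BB_COUNT };
unsigned long bb_hits[BB_COUNT];

/* Count the even values in vals[0..n-1].  Each bb_hits[...]++ stands in
 * for the "enter basic block #nnnn" code a profiler would add. */
size_t count_evens(const int *vals, size_t n)
{
    size_t evens = 0;
    size_t i;

    bb_hits[BB_ENTRY]++;
    for (i = 0; i < n; i++) {
        bb_hits[BB_LOOP_BODY]++;
        if (vals[i] % 2 == 0) {
            bb_hits[BB_EVEN_BRANCH]++;
            evens++;
        }
    }
    bb_hits[BB_EXIT]++;
    return evens;
}
```

After a run, the counters show exactly how often each block executed, which is the kind of data a statistical sampler can only estimate.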
1731
1732=head2 Gprof Profiling
1733
I<gprof> is a profiling tool available on many Unix platforms which
uses I<statistical time-sampling>.  You can build a profiled version of
1736F<perl> by compiling using gcc with the flag C<-pg>.  Either edit
1737F<config.sh> or re-run F<Configure>.  Running the profiled version of
1738Perl will create an output file called F<gmon.out> which contains the
1739profiling data collected during the execution.
1740
Quick hint:
1742
1743    $ sh Configure -des -Dusedevel -Accflags='-pg' \
1744        -Aldflags='-pg' -Alddlflags='-pg -shared' \
1745        && make perl
1746    $ ./perl ... # creates gmon.out in current directory
1747    $ gprof ./perl > out
1748    $ less out
1749
(You probably need to add C<-shared> to the C<-Alddlflags> line until RT
#118199 is resolved.)
1752
1753The F<gprof> tool can then display the collected data in various ways.
1754Usually F<gprof> understands the following options:
1755
1756=over 4
1757
1758=item * -a
1759
1760Suppress statically defined functions from the profile.
1761
1762=item * -b
1763
1764Suppress the verbose descriptions in the profile.
1765
1766=item * -e routine
1767
1768Exclude the given routine and its descendants from the profile.
1769
1770=item * -f routine
1771
1772Display only the given routine and its descendants in the profile.
1773
1774=item * -s
1775
1776Generate a summary file called F<gmon.sum> which then may be given to
1777subsequent gprof runs to accumulate data over several runs.
1778
1779=item * -z
1780
1781Display routines that have zero usage.
1782
1783=back
1784
1785For more detailed explanation of the available commands and output
1786formats, see your own local documentation of F<gprof>.
1787
1788=head2 GCC gcov Profiling
1789
I<Basic block profiling> is officially available in gcc 3.0 and later.
1791You can build a profiled version of F<perl> by compiling using gcc with
1792the flags C<-fprofile-arcs -ftest-coverage>.  Either edit F<config.sh>
1793or re-run F<Configure>.
1794
Quick hint:
1796
1797    $ sh Configure -des -Dusedevel -Doptimize='-g' \
1798        -Accflags='-fprofile-arcs -ftest-coverage' \
1799        -Aldflags='-fprofile-arcs -ftest-coverage' \
1800        -Alddlflags='-fprofile-arcs -ftest-coverage -shared' \
1801        && make perl
1802    $ rm -f regexec.c.gcov regexec.gcda
1803    $ ./perl ...
1804    $ gcov regexec.c
1805    $ less regexec.c.gcov
1806
(You probably need to add C<-shared> to the C<-Alddlflags> line until RT
#118199 is resolved.)
1809
1810Running the profiled version of Perl will cause profile output to be
1811generated.  For each source file an accompanying F<.gcda> file will be
1812created.
1813
1814To display the results you use the I<gcov> utility (which should be
1815installed if you have gcc 3.0 or newer installed).  F<gcov> is run on
1816source code files, like this
1817
1818    gcov sv.c
1819
1820which will cause F<sv.c.gcov> to be created.  The F<.gcov> files
1821contain the source code annotated with relative frequencies of
1822execution indicated by "#" markers.  If you want to generate F<.gcov>
1823files for all profiled object files, you can run something like this:
1824
1825    for file in `find . -name \*.gcno`
1826    do sh -c "cd `dirname $file` && gcov `basename $file .gcno`"
1827    done
1828
1829Useful options of F<gcov> include C<-b> which will summarise the basic
1830block, branch, and function call coverage, and C<-c> which instead of
1831relative frequencies will use the actual counts.  For more information
1832on the use of F<gcov> and basic block profiling with gcc, see the
1833latest GNU CC manual.  As of gcc 4.8, this is at
1834L<https://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html#Gcov-Intro>.
1835
1836=head2 callgrind profiling
1837
1838callgrind is a valgrind tool for profiling source code. Paired with
1839kcachegrind (a Qt based UI), it gives you an overview of where code is
1840taking up time, as well as the ability to examine callers, call trees,
1841and more. One of its benefits is you can use it on perl and XS modules
1842that have not been compiled with debugging symbols.
1843
1844If perl is compiled with debugging symbols (C<-g>), you can view the
1845annotated source and click around, much like L<Devel::NYTProf>'s HTML
1846output.
1847
1848For basic usage:
1849
1850    valgrind --tool=callgrind ./perl ...
1851
1852By default it will write output to F<callgrind.out.PID>, but you can
1853change that with C<--callgrind-out-file=...>
1854
1855To view the data, do:
1856
1857    kcachegrind callgrind.out.PID
1858
1859If you'd prefer to view the data in a terminal, you can use
1860F<callgrind_annotate>.  In its basic form:
1861
1862    callgrind_annotate callgrind.out.PID | less
1863
1864Some useful options are:
1865
1866=over 4
1867
1868=item * --threshold
1869
Percentage of counts (of primary sort event) we are interested in. The
default is 99%; setting it to 100% might show things that seem to be
missing.
1872
1873=item * --auto
1874
1875Annotate all source files containing functions that helped reach the
1876event count threshold.
1877
1878=back
1879
1880=head2 C<profiler> profiling (Cygwin)
1881
1882Cygwin allows for C<gprof> profiling and C<gcov> coverage testing, but
1883this only profiles the main executable.
1884
You can use the C<profiler> tool to perform sample-based profiling; it
requires no special preparation of the executables beyond debugging
symbols.
1888
1889This produces sampling data which can be processed with C<gprof>.
1890
1891There is L<limited
1892documentation|https://www.cygwin.com/cygwin-ug-net/profiler.html> on
1893the Cygwin web site.
1894
1895=head2 Visual Studio Profiling
1896
1897You can use the Visual Studio profiler to profile perl if you've built
1898perl with MSVC, even though we build perl at the command-line.  You
1899will need to build perl with C<CFG=Debug> or C<CFG=DebugSymbols>.
1900
1901The Visual Studio profiler is a sampling profiler.
1902
See L<the Visual Studio
documentation|https://github.com/MicrosoftDocs/visualstudio-docs/blob/main/docs/profiling/beginners-guide-to-performance-profiling.md>
to get started.
1906
1907=head1 MISCELLANEOUS TRICKS
1908
1909=head2 PERL_DESTRUCT_LEVEL
1910
1911If you want to run any of the tests yourself manually using e.g.
1912valgrind, please note that by default perl B<does not> explicitly clean
1913up all the memory it has allocated (such as global memory arenas) but
1914instead lets the C<exit()> of the whole program "take care" of such
1915allocations, also known as "global destruction of objects".
1916
There is a way to tell perl to do complete cleanup: set the environment
variable C<PERL_DESTRUCT_LEVEL> to a non-zero value.  The F<t/TEST>
wrapper sets this to 2, and this is what you need to do too if you
don't want to see the "global leaks".  For example, to run under
valgrind:
1922
1923    env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib t/foo/bar.t
1924
1925(Note: the mod_perl Apache module uses this environment variable for
1926its own purposes and extends its semantics.  Refer to L<the mod_perl
1927documentation|https://perl.apache.org/docs/> for more information.
1928Also, spawned threads do the equivalent of setting this variable to the
1929value 1.)
1930
1931If, at the end of a run, you get the message I<N scalars leaked>, you
1932can recompile with C<-DDEBUG_LEAKING_SCALARS> (C<Configure
1933-Accflags=-DDEBUG_LEAKING_SCALARS>), which will cause the addresses of
1934all those leaked SVs to be dumped along with details as to where each
1935SV was originally allocated.  This information is also displayed by
1936L<Devel::Peek>.  Note that the extra details recorded with each SV
1937increase memory usage, so it shouldn't be used in production
1938environments.  It also converts C<new_SV()> from a macro into a real
1939function, so you can use your favourite debugger to discover where
1940those pesky SVs were allocated.
1941
1942If you see that you're leaking memory at runtime, but neither valgrind
1943nor C<-DDEBUG_LEAKING_SCALARS> will find anything, you're probably
1944leaking SVs that are still reachable and will be properly cleaned up
1945during destruction of the interpreter.  In such cases, using the C<-Dm>
1946switch can point you to the source of the leak.  If the executable was
1947built with C<-DDEBUG_LEAKING_SCALARS>, C<-Dm> will output SV
1948allocations in addition to memory allocations.  Each SV allocation has
1949a distinct serial number that will be written on creation and
1950destruction of the SV.  So if you're executing the leaking code in a
1951loop, you need to look for SVs that are created, but never destroyed
1952between each cycle.  If such an SV is found, set a conditional
1953breakpoint within C<new_SV()> and make it break only when
1954C<PL_sv_serial> is equal to the serial number of the leaking SV.  Then
1955you will catch the interpreter in exactly the state where the leaking
1956SV is allocated, which is sufficient in many cases to find the source
1957of the leak.
1958
1959As C<-Dm> is using the PerlIO layer for output, it will by itself
1960allocate quite a bunch of SVs, which are hidden to avoid recursion. You
1961can bypass the PerlIO layer if you use the SV logging provided by
1962C<-DPERL_MEM_LOG> instead.
1963
1964=for apidoc_section $debugging
1965=for apidoc Amnh||PL_sv_serial
1966
1967=head2 Leaked SV spotting: sv_mark_arenas() and sv_sweep_arenas()
1968
1969These functions exist only on C<DEBUGGING> builds. The first marks all
1970live SVs which can be found in the SV arenas with the C<SVf_BREAK> flag.
1971The second lists any such SVs which don't have the flag set, and resets
1972the flag on the rest. They are intended to identify SVs which are being
1973created, but not freed, between two points in code. They can be used
1974either by temporarily adding calls to them in the relevant places in the
1975code, or by calling them directly from a debugger.
1976
1977For example, suppose the following code was found to be leaking:
1978
1979    while (1) { eval '\(1..3)' }
1980
1981A F<gdb> session on a threaded perl might look something like this:
1982
1983    $ gdb ./perl
1984    (gdb) break Perl_pp_entereval
1985    (gdb) run -e'while (1) { eval q{\(1..3)} }'
1986    ...
1987    Breakpoint 1, Perl_pp_entereval ....
1988    (gdb) call Perl_sv_mark_arenas(my_perl)
1989    (gdb) continue
1990    ...
    Breakpoint 1, Perl_pp_entereval ....
1992    (gdb) call Perl_sv_sweep_arenas(my_perl)
1993    Unmarked SV: 0xaf23a8: AV()
1994    Unmarked SV: 0xaf2408: IV(1)
1995    Unmarked SV: 0xaf2468: IV(2)
1996    Unmarked SV: 0xaf24c8: IV(3)
1997    Unmarked SV: 0xace6c8: PV("AV()"\0)
1998    Unmarked SV: 0xace848: PV("IV(1)"\0)
1999    (gdb)
2000
Here, at the start of the first call to C<pp_entereval()>, all existing
SVs are marked. Then at the start of the second call, we list all the SVs
which have since been created but not yet freed. It is quickly clear
2004that an array and its three elements are likely not being freed, perhaps
2005as a result of a bug during constant folding. The final two SVs are just
2006temporaries created during the debugging output and can be ignored.
2007
2008This trick relies on the C<SVf_BREAK> flag not otherwise being used. This
2009flag is typically used only during global destruction, but also sometimes
2010for a mark and sweep operation when looking for common elements on the two
2011sides of a list assignment. The presence of the flag can also alter the
2012behaviour of some specific actions in the core, such as choosing whether to
2013copy or to COW a string SV. So turning it on can occasionally alter the
2014behaviour of code slightly.
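
The mark-then-sweep idea itself can be shown with a toy model.  The sketch below is B<not> how the SV arenas are implemented; C<toy_arena>, C<TOY_LIVE>, C<TOY_BREAK> and the C<toy_*> functions are hypothetical names, with C<TOY_BREAK> merely playing the role that C<SVf_BREAK> plays above.

```c
#include <stddef.h>

#define TOY_ARENA_SIZE 8
#define TOY_LIVE  0x1u   /* slot currently holds an allocated value */
#define TOY_BREAK 0x2u   /* stand-in for the role of SVf_BREAK */

/* Hypothetical miniature "arena": just one flags word per slot. */
unsigned toy_arena[TOY_ARENA_SIZE];

/* Claim the first free slot; returns its index, or -1 if full. */
int toy_alloc(void)
{
    int i;
    for (i = 0; i < TOY_ARENA_SIZE; i++) {
        if (!(toy_arena[i] & TOY_LIVE)) {
            toy_arena[i] = TOY_LIVE;
            return i;
        }
    }
    return -1;
}

/* Release a slot and drop all of its flags. */
void toy_free(int slot)
{
    toy_arena[slot] = 0;
}

/* Mark: flag every slot that is live right now. */
void toy_mark(void)
{
    int i;
    for (i = 0; i < TOY_ARENA_SIZE; i++)
        if (toy_arena[i] & TOY_LIVE)
            toy_arena[i] |= TOY_BREAK;
}

/* Sweep: count live slots that lack the flag (created since the mark
 * and not yet freed), and clear the flag on the rest. */
int toy_sweep(void)
{
    int i, unmarked = 0;
    for (i = 0; i < TOY_ARENA_SIZE; i++) {
        if (!(toy_arena[i] & TOY_LIVE))
            continue;
        if (toy_arena[i] & TOY_BREAK)
            toy_arena[i] &= ~TOY_BREAK;
        else
            unmarked++;
    }
    return unmarked;
}
```

Anything allocated and freed between the two calls never shows up, which is why the sweep reports only the values that survived the interval.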
2015
2016=head2 PERL_MEM_LOG
2017
2018If compiled with C<-DPERL_MEM_LOG> (C<-Accflags=-DPERL_MEM_LOG>), both
2019memory and SV allocations go through logging functions, which is handy
2020for breakpoint setting.
2021
Unless C<-DPERL_MEM_LOG_NOIMPL> (C<-Accflags=-DPERL_MEM_LOG_NOIMPL>) is
also defined at compile time, the logging functions read
C<$ENV{PERL_MEM_LOG}> to determine whether to log the event, and if so
how:
2025
2026    $ENV{PERL_MEM_LOG} =~ /m/           Log all memory ops
2027    $ENV{PERL_MEM_LOG} =~ /s/           Log all SV ops
2028    $ENV{PERL_MEM_LOG} =~ /c/           Additionally log C backtrace for
2029                                        new_SV events
    $ENV{PERL_MEM_LOG} =~ /t/           Include timestamp in log
2031    $ENV{PERL_MEM_LOG} =~ /^(\d+)/      write to FD given (default is 2)
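
As a reading aid for the table above, here is a small sketch of how such a value could be decoded.  It is an assumption-laden illustration, not the code perl uses: C<struct toy_mem_log_opts> and C<toy_parse_mem_log()> are hypothetical, and the sketch follows only the rules listed in the table.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical decoded form of a PERL_MEM_LOG value. */
struct toy_mem_log_opts {
    int fd;          /* leading digits; defaults to 2 (stderr) */
    int log_mem;     /* 'm' present: log memory ops */
    int log_sv;      /* 's' present: log SV ops */
    int backtrace;   /* 'c' present: C backtrace for new_SV events */
    int timestamp;   /* 't' present: include timestamp */
};

/* Decode a value such as "3ms" according to the table above. */
struct toy_mem_log_opts toy_parse_mem_log(const char *value)
{
    struct toy_mem_log_opts o = { 2, 0, 0, 0, 0 };
    const char *p = value;

    /* Leading digits, if any, select the file descriptor. */
    if (isdigit((unsigned char)*p)) {
        o.fd = 0;
        while (isdigit((unsigned char)*p))
            o.fd = o.fd * 10 + (*p++ - '0');
    }

    /* The remaining options are single letters anywhere in the value. */
    o.log_mem   = strchr(value, 'm') != NULL;
    o.log_sv    = strchr(value, 's') != NULL;
    o.backtrace = strchr(value, 'c') != NULL;
    o.timestamp = strchr(value, 't') != NULL;
    return o;
}
```

Under these rules a value of C<3ms> would mean "log memory and SV operations to file descriptor 3", while C<sct> would log SV operations with backtraces and timestamps to the default descriptor.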
2032
2033Memory logging is somewhat similar to C<-Dm> but is independent of
2034C<-DDEBUGGING>, and at a higher level; all uses of Newx(), Renew(), and
2035Safefree() are logged with the caller's source code file and line
2036number (and C function name, if supported by the C compiler).  In
2037contrast, C<-Dm> is directly at the point of C<malloc()>.  SV logging
2038is similar.
2039
2040Since the logging doesn't use PerlIO, all SV allocations are logged and
2041no extra SV allocations are introduced by enabling the logging.  If
2042compiled with C<-DDEBUG_LEAKING_SCALARS>, the serial number for each SV
2043allocation is also logged.
2044
2045The C<c> option uses the C<Perl_c_backtrace> facility, and therefore
2046additionally requires the Configure C<-Dusecbacktrace> compile flag in
2047order to access it.
2048
2049=head2 DDD over gdb
2050
2051Those debugging perl with the DDD frontend over gdb may find the
2052following useful:
2053
2054You can extend the data conversion shortcuts menu, so for example you
2055can display an SV's IV value with one click, without doing any typing.
To do that simply edit the F<~/.ddd/init> file and add after:
2057
2058  ! Display shortcuts.
2059  Ddd*gdbDisplayShortcuts: \
2060  /t ()   // Convert to Bin\n\
2061  /d ()   // Convert to Dec\n\
2062  /x ()   // Convert to Hex\n\
  /o ()   // Convert to Oct\n\
2064
2065the following two lines:
2066
2067  ((XPV*) (())->sv_any )->xpv_pv  // 2pvx\n\
2068  ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx
2069
so now you can do ivx and pvx lookups, or you can plug in the sv_peek
"conversion" there:
2072
2073  Perl_sv_peek(my_perl, (SV*)()) // sv_peek
2074
(The my_perl is for threaded builds.)  Just remember that every line
except the last one should end with C<\n\>.
2077
2078Alternatively edit the init file interactively via: 3rd mouse button ->
2079New Display -> Edit Menu
2080
2081Note: you can define up to 20 conversion shortcuts in the gdb section.
2082
2083=head2 C backtrace
2084
2085On some platforms Perl supports retrieving the C level backtrace
2086(similar to what symbolic debuggers like gdb do).
2087
2088The backtrace returns the stack trace of the C call frames, with the
2089symbol names (function names), the object names (like "perl"), and if
2090it can, also the source code locations (file:line).
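
For readers who have not seen a C-level backtrace facility before, the sketch below shows the general idea using the C<backtrace()> and C<backtrace_symbols_fd()> functions from F<execinfo.h>.  This assumes a C library that provides that header (glibc and OS X do); it is not the Perl API, and the C<toy_*> names are hypothetical.

```c
#include <execinfo.h>

#define TOY_MAX_FRAMES 16

/* Capture up to `max` return addresses from the calling thread's
 * stack.  Returns the number of frames stored in `frames`. */
int toy_capture_frames(void **frames, int max)
{
    return backtrace(frames, max);
}

/* Write one line per captured frame to the file descriptor `fd`.
 * How much symbol information appears depends on the platform and
 * on how the program was linked. */
void toy_dump_frames(int fd)
{
    void *frames[TOY_MAX_FRAMES];
    int n = toy_capture_frames(frames, TOY_MAX_FRAMES);

    backtrace_symbols_fd(frames, n, fd);
}
```

Calling C<toy_dump_frames(2)> from some deeply nested function prints the chain of callers to standard error.  On its own this yields addresses and, at best, symbol names; source locations need the debug information.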
2091
The supported platforms are Linux and OS X (some *BSD might work at
least partly, but they have not yet been tested).
2094
2095This feature hasn't been tested with multiple threads, but it will only
2096show the backtrace of the thread doing the backtracing.
2097
2098The feature needs to be enabled with C<Configure -Dusecbacktrace>.
2099
2100The C<-Dusecbacktrace> also enables keeping the debug information when
2101compiling/linking (often: C<-g>).  Many compilers/linkers do support
2102having both optimization and keeping the debug information.  The debug
2103information is needed for the symbol names and the source locations.
2104
2105Static functions might not be visible for the backtrace.
2106
Source code locations, even if available, can often be missing or
misleading if the compiler has e.g. inlined code.  The optimizer can
make matching the source code and the object code quite challenging.
2110
2111=over 4
2112
2113=item Linux
2114
2115You B<must> have the BFD (-lbfd) library installed, otherwise C<perl>
2116will fail to link.  The BFD is usually distributed as part of the GNU
2117binutils.
2118
2119Summary: C<Configure ... -Dusecbacktrace> and you need C<-lbfd>.
2120
2121=item OS X
2122
2123The source code locations are supported B<only> if you have the
2124Developer Tools installed.  (BFD is B<not> needed.)
2125
2126Summary: C<Configure ... -Dusecbacktrace> and installing the Developer
2127Tools would be good.
2128
2129=back
2130
2131Optionally, for trying out the feature, you may want to enable
2132automatic dumping of the backtrace just before a warning or croak (die)
2133message is emitted, by adding C<-Accflags=-DUSE_C_BACKTRACE_ON_ERROR>
2134for Configure.
2135
2136Unless the above additional feature is enabled, nothing about the
2137backtrace functionality is visible, except for the Perl/XS level.
2138
Furthermore, even if you have enabled this feature to be compiled, you
need to enable it at runtime with an environment variable:
C<PERL_C_BACKTRACE_ON_ERROR=10>.  It must be an integer greater than
zero, giving the desired frame count.
2143
2144Retrieving the backtrace from Perl level (using for example an XS
2145extension) would be much less exciting than one would hope: normally
2146you would see C<runops>, C<entersub>, and not much else.  This API is
2147intended to be called B<from within> the Perl implementation, not from
2148Perl level execution.
2149
2150The C API for the backtrace is as follows:
2151
2152=over 4
2153
2154=item get_c_backtrace
2155
2156=item free_c_backtrace
2157
2158=item get_c_backtrace_dump
2159
2160=item dump_c_backtrace
2161
2162=back
2163
2164=head2 Poison
2165
If you see in a debugger a memory area mysteriously full of 0xABABABAB
or 0xEFEFEFEF, you may be seeing the effect of the Poison() macros; see
L<perlclib>.
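
The following sketch shows why those particular patterns appear.  It imitates the effect with plain C<memset()>: freshly allocated memory is filled with the byte 0xAB and freed memory with 0xEF.  The C<toy_*> helpers are hypothetical stand-ins, not the real macros.

```c
#include <stdint.h>
#include <string.h>

/* Scribble the "newly allocated" pattern over n bytes. */
void toy_poison_new(void *dst, size_t n)
{
    memset(dst, 0xAB, n);
}

/* Scribble the "already freed" pattern over n bytes. */
void toy_poison_free(void *dst, size_t n)
{
    memset(dst, 0xEF, n);
}

/* Read four bytes as a debugger would show them in a 32-bit word.
 * memcpy avoids alignment and aliasing problems. */
uint32_t toy_peek32(const void *src)
{
    uint32_t v;
    memcpy(&v, src, sizeof v);
    return v;
}
```

Because every byte holds the same value, the word reads as 0xABABABAB or 0xEFEFEFEF regardless of byte order, which makes use of uninitialised or freed memory easy to recognise.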
2169
2170=head2 Read-only optrees
2171
2172Under ithreads the optree is read only.  If you want to enforce this,
2173to check for write accesses from buggy code, compile with
2174C<-Accflags=-DPERL_DEBUG_READONLY_OPS> to enable code that allocates op
2175memory via C<mmap>, and sets it read-only when it is attached to a
2176subroutine. Any write access to an op results in a C<SIGBUS> and abort.
2177
2178This code is intended for development only, and may not be portable
2179even to all Unix variants.  Also, it is an 80% solution, in that it
2180isn't able to make all ops read only.  Specifically it does not apply
2181to op slabs belonging to C<BEGIN> blocks.
2182
2183However, as an 80% solution it is still effective, as it has caught
2184bugs in the past.
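
The underlying technique can be sketched in a few lines of POSIX C: obtain memory with C<mmap>, fill it in, then use C<mprotect> to drop write permission.  This is only an illustration under the assumption of a Unix-like system with anonymous mappings; C<toy_freeze_copy()> is a hypothetical helper and has nothing to do with the op slab allocator itself.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copy `len` bytes from `src` into a fresh page-aligned mapping and
 * make that mapping read-only.  Returns NULL on failure.  The mapping
 * is never unmapped on success; this sketch keeps it for the life of
 * the process. */
void *toy_freeze_copy(const void *src, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t map_len;
    void *mem;

    if (page <= 0 || len == 0)
        return NULL;

    /* Round the length up to a whole number of pages. */
    map_len = ((len + (size_t)page - 1) / (size_t)page) * (size_t)page;

    mem = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, src, len);

    /* From here on, any store into the mapping raises a signal. */
    if (mprotect(mem, map_len, PROT_READ) != 0) {
        munmap(mem, map_len);
        return NULL;
    }
    return mem;
}
```

Reads from the returned pointer keep working, while a stray write terminates the program at the faulting instruction, which is what makes the bug easy to locate in a debugger.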
2185
2186=head2 When is a bool not a bool?
2187
2188There wasn't necessarily a standard C<bool> type on compilers prior to
2189C99, and so some workarounds were created.  The C<TRUE> and C<FALSE>
2190macros are still available as alternatives for C<true> and C<false>.
And the C<cBOOL> macro was created to correctly cast to a true/false
value in all circumstances, but should no longer be necessary.  Using
S<C<(bool)> I<expr>> should now always work.
2194
There are no plans to remove any of C<TRUE>, C<FALSE>, or C<cBOOL>.
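
To see why a workaround was ever needed, consider a pre-C99 build where the boolean type was just a small integer type.  The sketch below models that with a hypothetical C<toy_legacy_bool> typedef and a C<TOY_CBOOL> macro written in the spirit of C<cBOOL>; neither is the actual core definition.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a pre-C99 "bool": an ordinary small integer. */
typedef unsigned char toy_legacy_bool;

/* Reduce any scalar to exactly 0 or 1 before narrowing. */
#define TOY_CBOOL(x) ((x) ? (toy_legacy_bool)1 : (toy_legacy_bool)0)

/* A plain cast keeps only the low bits, so 0x100 becomes 0. */
toy_legacy_bool legacy_plain_cast(int flags)
{
    return (toy_legacy_bool)flags;
}

/* Testing the value first preserves its truth. */
toy_legacy_bool legacy_cbool(int flags)
{
    return TOY_CBOOL(flags);
}

/* With a real C99 bool, conversion yields true for any non-zero value. */
bool c99_cast(int flags)
{
    return (bool)flags;
}
```

For an input of C<0x100>, the plain cast to the legacy type loses the only set bit, while both the macro and the C99 cast report a true value.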
2196
2197=head2 Finding unsafe truncations
2198
2199You may wish to run C<Configure> with something like
2200
2201    -Accflags='-Wconversion -Wno-sign-conversion -Wno-shorten-64-to-32'
2202
2203or your compiler's equivalent to make it easier to spot any unsafe
2204truncations that show up.
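
As an illustration of what such warnings are meant to catch, the sketch below contrasts an implicit narrowing, which a compiler run with C<-Wconversion> would be expected to flag, with a range-checked version.  Both functions are hypothetical examples rather than core code, and the expected values assume 8-bit bytes.

```c
#include <limits.h>

/* Implicit narrowing from int to unsigned char: values above
 * UCHAR_MAX silently lose their high bits.  This is the kind of line
 * a conversion warning is expected to point at. */
unsigned char low_byte_unchecked(int value)
{
    unsigned char b = value;
    return b;
}

/* Range-checked alternative: stores the value and returns 1 on
 * success, or returns 0 without storing if it would not fit. */
int to_uchar_checked(int value, unsigned char *out)
{
    if (value < 0 || value > UCHAR_MAX)
        return 0;
    *out = (unsigned char)value;   /* explicit, and known to be safe */
    return 1;
}
```

The explicit cast in the checked version documents that the narrowing is intended, so it should stay quiet under the same warning flags.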
2205
2206=head2 The .i Targets
2207
2208You can expand the macros in a F<foo.c> file by saying
2209
2210    make foo.i
2211
2212which will expand the macros using cpp.  Don't be scared by the
2213results.
2214
2215=head1 AUTHOR
2216
2217This document was originally written by Nathan Torkington, and is
2218maintained by the perl5-porters mailing list.
2219
2220