# From laura_fairhead@talk21.com  Fri May 10 11:24:41 2002
# Return-Path: <laura_fairhead@talk21.com>
# Received: from localhost (aahz [127.0.0.1])
# 	by skeeve.com (8.11.2/8.11.2) with ESMTP id g4A8OdU01822
# 	for <arnold@localhost>; Fri, 10 May 2002 11:24:40 +0300
# Received: from actcom.co.il [192.114.47.1]
# 	by localhost with POP3 (fetchmail-5.7.4)
# 	for arnold@localhost (single-drop); Fri, 10 May 2002 11:24:40 +0300 (IDT)
# Received: by actcom.co.il (mbox arobbins)
#  (with Cubic Circle's cucipop (v1.31 1998/05/13) Fri May 10 11:30:42 2002)
# X-From_: laura_fairhead@talk21.com Fri May 10 05:39:57 2002
# Received: from lmail.actcom.co.il by actcom.co.il  with ESMTP
# 	(8.11.6/actcom-0.2) id g4A2dpw26380 for <arobbins@actcom.co.il>;
# 	Fri, 10 May 2002 05:39:52 +0300 (EET DST)
# 	(rfc931-sender: mail.actcom.co.il [192.114.47.13])
# Received: from f7.net (consort.superb.net [209.61.216.22])
# 	by lmail.actcom.co.il (8.11.6/8.11.6) with ESMTP id g4A2dxl10851
# 	for <arobbins@actcom.co.il>; Fri, 10 May 2002 05:39:59 +0300
# Received: from fencepost.gnu.org (fencepost.gnu.org [199.232.76.164])
# 	by f7.net (8.11.6/8.11.6) with ESMTP id g4A2dwN11097
# 	for <arnold@skeeve.com>; Thu, 9 May 2002 22:39:58 -0400
# Received: from [194.73.242.6] (helo=wmpmta04-app.mail-store.com)
# 	by fencepost.gnu.org with smtp (Exim 3.34 #1 (Debian))
# 	id 1760K4-0001QX-00
# 	for <bug-gawk@gnu.org>; Thu, 09 May 2002 22:39:56 -0400
# Received: from wmpmtavirtual ([10.216.84.15])
#           by wmpmta04-app.mail-store.com
#           (InterMail vM.5.01.02.00 201-253-122-103-101-20001108) with SMTP
#           id <20020510023921.EEW24107.wmpmta04-app.mail-store.com@wmpmtavirtual>
#           for <bug-gawk@gnu.org>; Fri, 10 May 2002 03:39:21 +0100
# Received: from 213.1.102.243 by t21web05-lrs ([10.216.84.15]); Fri, 10 May 02 03:38:42 GMT+01:00
# X-Mailer: talk21 v1.24 - http://talk21.btopenworld.com
# From: laura_fairhead@talk21.com
# To: bug-gawk@gnu.org
# X-Talk21Ref: none
# Date: Fri, 10 May 2002 03:38:42 GMT+01:00
# Subject: bug in gawk 3.1.0 regex code
# Mime-Version: 1.0
# Content-type: multipart/mixed; boundary="--GgOuLpDpIyE--1020998322088--"
# Message-Id: <20020510023921.EEW24107.wmpmta04-app.mail-store.com@wmpmtavirtual>
# X-SpamBouncer: 1.4 (10/07/01)
# X-SBClass: OK
# Status: RO
#
# Multipart Message Boundary - attachment/bodypart follows:
#
#
# ----GgOuLpDpIyE--1020998322088--
# Content-Type: text/plain
# Content-Transfer-Encoding: 7bit
#
#
# I believe I've just found a bug in the gawk 3.1.0 implementation of
# extended regular expressions. It seems to be down to the alternation
# operator: when an end anchor '$' is used as a subexpression in an
# alternation and the entire matched RE is the null string, it fails
# to match at the end of the string. For example:
#
# gsub(/$|2/,"x")
# print
#
# input           = 12345
# expected output = 1x345x
# actual output   = 1x345
#
# The start anchor '^' always works as expected:
#
# gsub(/^|2/,"x")
# print
#
# input           = 12345
# expected output = x1x345
# actual output   = x1x345
#
# This was with POSIX compliance enabled, although that doesn't
# affect the result.
#
# I checked on gawk 3.0.6 and got exactly the same results; however,
# gawk 2.15.6 gives the expected results.
#
# All the following platforms produced the same results:
#
# gawk3.0.6 / Win98 / i386
# gawk3.1.0 / Win98 / i386
# gawk3.0.5 / Linux2.2.16 / i386
#
# Complete test results were as follows:
#
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# regex               input     expected  actual    bug?
# -------------------------------------------------------------
# (^)                 12345     x12345    x12345
# ($)                 12345     12345x    12345x
# (^)|($)             12345     x12345x   x12345x
# ($)|(^)             12345     x12345x   x12345x
# 2                   12345     1x345     1x345
# (^)|2               12345     x1x345    x1x345
# 2|(^)               12345     x1x345    x1x345
# ($)|2               12345     1x345x    1x345     **BUG**
# 2|($)               12345     1x345x    1x345     **BUG**
# (2)|(^)             12345     x1x345    x1x345
# (^)|(2)             12345     x1x345    x1x345
# (2)|($)             12345     1x345x    1x345     **BUG**
# ($)|(2)             12345     1x345x    1x345     **BUG**
# ((2)|(^)).          12345     xx45      xx45
# ((^)|(2)).          12345     xx45      xx45
# .((2)|($))          12345     x34x      x34x
# .(($)|(2))          12345     x34x      x34x
# (^)|6               12345     x12345    x12345
# 6|(^)               12345     x12345    x12345
# ($)|6               12345     12345x    12345x
# 6|($)               12345     12345x    12345x
# 2|6|(^)             12345     x1x345    x1x345
# 2|(^)|6             12345     x1x345    x1x345
# 6|2|(^)             12345     x1x345    x1x345
# 6|(^)|2             12345     x1x345    x1x345
# (^)|6|2             12345     x1x345    x1x345
# (^)|2|6             12345     x1x345    x1x345
# 2|6|($)             12345     1x345x    1x345     **BUG**
# 2|($)|6             12345     1x345x    1x345     **BUG**
# 6|2|($)             12345     1x345x    1x345     **BUG**
# 6|($)|2             12345     1x345x    1x345     **BUG**
# ($)|6|2             12345     1x345x    1x345     **BUG**
# ($)|2|6             12345     1x345x    1x345     **BUG**
# 2|4|(^)             12345     x1x3x5    x1x3x5
# 2|(^)|4             12345     x1x3x5    x1x3x5
# 4|2|(^)             12345     x1x3x5    x1x3x5
# 4|(^)|2             12345     x1x3x5    x1x3x5
# (^)|4|2             12345     x1x3x5    x1x3x5
# (^)|2|4             12345     x1x3x5    x1x3x5
# 2|4|($)             12345     1x3x5x    1x3x5     **BUG**
# 2|($)|4             12345     1x3x5x    1x3x5     **BUG**
# 4|2|($)             12345     1x3x5x    1x3x5     **BUG**
# 4|($)|2             12345     1x3x5x    1x3x5     **BUG**
# ($)|4|2             12345     1x3x5x    1x3x5     **BUG**
# ($)|2|4             12345     1x3x5x    1x3x5     **BUG**
# x{0}((2)|(^))       12345     x1x345    x1x345
# x{0}((^)|(2))       12345     x1x345    x1x345
# x{0}((2)|($))       12345     1x345x    1x345     **BUG**
# x{0}(($)|(2))       12345     1x345x    1x345     **BUG**
# x*((2)|(^))         12345     x1x345    x1x345
# x*((^)|(2))         12345     x1x345    x1x345
# x*((2)|($))         12345     1x345x    1x345     **BUG**
# x*(($)|(2))         12345     1x345x    1x345     **BUG**
# x{0}^               12345     x12345    x12345
# x{0}$               12345     12345x    12345x
# (x{0}^)|2           12345     x1x345    x1x345
# (x{0}$)|2           12345     1x345x    1x345     **BUG**
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
#
# Here's the test program I used. A few of the cases use the ERE {n[,[m]]}
# operators and so need '-W posix' (although the same results, minus
# those tests, came out without POSIX compliance enabled).
#
# [ Invocation was 'gawk -W posix -f tregex.awk' ]
#
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# tregex.awk
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
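BEGIN {
	# Print the column header, then a rule of dashes: assigning to field
	# length(_)+1 rebuilds $0 as empty fields joined by OFS ("-").
	print _ = sprintf("%-20s%-10s%-10s%-10s%-10s\n", "regex", "input", "expected", "actual", "bug?")
	OFS = "-"
	$(length(_) + 1) = ""
	print $0

	# Each input record holds: regex, input string, expected result.
	#while(getline <ARGV[1]) # ADR: was testre.dat
	while ((getline) > 0) # ADR: use stdin so can automate generation of test
	{
		RE = $1; IN = $2; OUT = $3
		$0 = IN
		gsub(RE, "x")	# substitute in $0 using the dynamic regex
		printf "%-20s%-10s%-10s%-10s%-10s\n", RE, IN, OUT, $0, ($0 == OUT ? "" : "**BUG**")
	}
}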
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# This is the test data file used:
#
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# testre.dat
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# (^)             12345           x12345
# ($)             12345           12345x
# (^)|($)         12345           x12345x
# ($)|(^)         12345           x12345x
# 2               12345           1x345
# (^)|2           12345           x1x345
# 2|(^)           12345           x1x345
# ($)|2           12345           1x345x
# 2|($)           12345           1x345x
# (2)|(^)         12345           x1x345
# (^)|(2)         12345           x1x345
# (2)|($)         12345           1x345x
# ($)|(2)         12345           1x345x
# ((2)|(^)).      12345           xx45
# ((^)|(2)).      12345           xx45
# .((2)|($))      12345           x34x
# .(($)|(2))      12345           x34x
# (^)|6           12345           x12345
# 6|(^)           12345           x12345
# ($)|6           12345           12345x
# 6|($)           12345           12345x
# 2|6|(^)         12345           x1x345
# 2|(^)|6         12345           x1x345
# 6|2|(^)         12345           x1x345
# 6|(^)|2         12345           x1x345
# (^)|6|2         12345           x1x345
# (^)|2|6         12345           x1x345
# 2|6|($)         12345           1x345x
# 2|($)|6         12345           1x345x
# 6|2|($)         12345           1x345x
# 6|($)|2         12345           1x345x
# ($)|6|2         12345           1x345x
# ($)|2|6         12345           1x345x
# 2|4|(^)         12345           x1x3x5
# 2|(^)|4         12345           x1x3x5
# 4|2|(^)         12345           x1x3x5
# 4|(^)|2         12345           x1x3x5
# (^)|4|2         12345           x1x3x5
# (^)|2|4         12345           x1x3x5
# 2|4|($)         12345           1x3x5x
# 2|($)|4         12345           1x3x5x
# 4|2|($)         12345           1x3x5x
# 4|($)|2         12345           1x3x5x
# ($)|4|2         12345           1x3x5x
# ($)|2|4         12345           1x3x5x
# x{0}((2)|(^))   12345           x1x345
# x{0}((^)|(2))   12345           x1x345
# x{0}((2)|($))   12345           1x345x
# x{0}(($)|(2))   12345           1x345x
# x*((2)|(^))     12345           x1x345
# x*((^)|(2))     12345           x1x345
# x*((2)|($))     12345           1x345x
# x*(($)|(2))     12345           1x345x
# x{0}^           12345           x12345
# x{0}$           12345           12345x
# (x{0}^)|2       12345           x1x345
# (x{0}$)|2       12345           1x345x
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# I've attached a full copy of this e-mail in ZIP format
# in case of e-mail transport errors corrupting the data.
#
# I've posted the same bug report to gnu.utils.bug and
# it's being discussed in this thread on comp.lang.awk:
#
# From: laura@madonnaweb.com (laura fairhead)
# Newsgroups: comp.lang.awk
# Subject: bug in gawk3.1.0 regex code
# Date: Wed, 08 May 2002 23:31:40 GMT
# Message-ID: <3cd9b0f7.29675926@NEWS.CIS.DFN.DE>
#
#
# bye from
#
# Laura Fairhead
#
#
#
#
# --------------------
# talk21 your FREE portable and private address on the net at http://www.talk21.com
# ----GgOuLpDpIyE--1020998322088----
#
#
#