/*
 * Copyright (c) 2004, 2015, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

/**
 * @test
 * @bug 5033550
 * @summary  JDWP back end uses modified UTF-8
 * @author jjh
 *
 * @run build TestScaffold VMConnection TargetListener TargetAdapter
 * @run compile -g UTF8Test.java
 * @run driver UTF8Test
 */

/*
  There is UTF-8 and there is modified UTF-8, which I will call M-UTF-8.
  The two differ in the representation of binary 0, and
  in some other more esoteric representations.
  See
      http://java.sun.com/developer/technicalArticles/Intl/Supplementary/#Modified_UTF-8
      http://java.sun.com/javase/6/docs/technotes/guides/jni/spec/types.html#wp16542

  All the following are observations of the treatment
  of binary 0.  In UTF-8, this is represented as one byte:
      0x00

  while in modified UTF-8, it is represented as two bytes:
      0xc0 0x80

  ** I haven't investigated if the other differences between UTF-8 and
     M-UTF-8 are handled in the same way.
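
  As a small illustration of the two encodings of binary 0, the sketch
  below prints both byte sequences.  It relies on DataOutputStream.writeUTF,
  which is documented to write modified UTF-8 after a two-byte length
  prefix.  The class name NulEncodingDemo and its helpers are made up for
  this note and are not part of the test.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of UTF8Test.
class NulEncodingDemo {
    // Standard UTF-8 bytes of a String.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8 bytes of a String. writeUTF emits a two-byte length
    // prefix followed by modified UTF-8, so the prefix is dropped here.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = bos.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Space-separated two-digit hex dump of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:   " + hex(standardUtf8("\u0000")));
        System.out.println("M-UTF-8: " + hex(modifiedUtf8("\u0000")));
    }
}
```

  Running main should show 00 for UTF-8 and c0 80 for M-UTF-8.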

 Here is how these are handled in our back end (BE), JDWP, and front end (FE):

 - Strings in .class files are M-UTF-8.

 - To get the value of a string object from the VM, our BE calls
       char * utf = JNI_FUNC_PTR(env,GetStringUTFChars)(env, string, NULL);
   which returns M-UTF-8.

 - To create a string object in the VM, our BE VirtualMachine.createString() calls
       string = JNI_FUNC_PTR(env,NewStringUTF)(env, cstring);
   This function expects the string to be M-UTF-8.
   BUG:  If the string came from JDWP, then it is actually UTF-8.

 - I haven't investigated strings in JVMTI.

 - The JDWP spec says that strings are UTF-8.  The intro
   says this for all strings, and the createString command and
   the StringReference.value command say it explicitly.

 - Our FE java writes strings to JDWP as UTF-8.

 - BE function outStream_writeString uses strlen, meaning
   it expects no 0 bytes, meaning that it expects M-UTF-8.
   This function writes the byte length and then calls
   outStream.c::writeBytes, which just writes the bytes to JDWP as is.

   BUG: If such a string came from the VM via JNI, it is actually
        M-UTF-8.
   FIX:  - scan the string to see if it contains an M-UTF-8 char.
           if yes,
              - call String(bytes, 0, len, "UTF8")
                to get a java string.  Will this work, i.e., when the
                input is M-UTF-8 instead of real UTF-8?
              - call some java method (NOT JNI, which
                would just come back with M-UTF-8)
                on the String to get real UTF-8


 - The JDWP StringReference.value command reads a string
   from the BE out of the JDWP stream and does this to
   create a Java String for it (see PacketStream.readString):
          String readString() {
              String ret;
              int len = readInt();

              try {
                  ret = new String(pkt.data, inCursor, len, "UTF8");
              } catch(java.io.UnsupportedEncodingException e) {

   This String ctor converts _both_ the M-UTF-8 0xc0 0x80
   and UTF-8 0x00 into a Java char containing 0x0000.

   Does it do this for the other differences too?
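
 One possible shape for the FIX described in the outStream_writeString
 item, written in Java rather than in the BE's C, is sketched below.  It
 swaps the String(bytes, 0, len, "UTF8") ctor for DataInputStream.readUTF,
 because readUTF is specified to decode modified UTF-8, and then asks the
 String for standard UTF-8.  The class and method names are hypothetical,
 and the sketch assumes the input is at most 65535 bytes, which is the
 limit of readUTF's two-byte length prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the JDWP back end or of UTF8Test.
class ModifiedToStandard {
    // Convert modified UTF-8 bytes to standard UTF-8 bytes by going
    // through a java.lang.String.
    static byte[] toStandardUtf8(byte[] mutf8) {
        if (mutf8.length > 0xFFFF) {
            throw new IllegalArgumentException("input too long for readUTF");
        }
        // readUTF expects a two-byte big-endian length before the data.
        byte[] framed = new byte[mutf8.length + 2];
        framed[0] = (byte) (mutf8.length >>> 8);
        framed[1] = (byte) mutf8.length;
        System.arraycopy(mutf8, 0, framed, 2, mutf8.length);

        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(framed))) {
            String s = in.readUTF();                  // decodes M-UTF-8
            return s.getBytes(StandardCharsets.UTF_8); // encodes real UTF-8
        } catch (IOException e) {
            // Includes UTFDataFormatException for malformed input.
            throw new UncheckedIOException(e);
        }
    }
}
```

 With this sketch, the bytes 78 c0 80 79 come back as 78 00 79, and a
 supplementary character stored as two three-byte surrogates comes back as
 one four-byte sequence.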

Summary:
1.  JDWP says strings are UTF-8.
    We interpret this to mean standard UTF-8.

2.  JVMTI will be changed to match JNI, saying that strings
    are M-UTF-8.

3.  The BE gets UTF-8 strings off JDWP and must convert them to
    M-UTF-8 before giving them to JVMTI or JNI.

4.  The BE gets M-UTF-8 strings from JNI and JVMTI and
    must convert them to UTF-8 when writing to JDWP.
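
The conversion in item 3 can also be done directly on the bytes, which is
closer to what C code in the BE would have to do.  The sketch below is only
an illustration under the assumption that the input is well-formed standard
UTF-8; it performs no validation, and the names StandardToModified and
toModifiedUtf8 are invented for this note.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper; not part of the JDWP back end or of UTF8Test.
class StandardToModified {
    // Rewrite standard UTF-8 as modified UTF-8. Only two cases change:
    // a 0x00 byte, and four-byte sequences (supplementary characters).
    static byte[] toModifiedUtf8(byte[] utf8) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(utf8.length + 8);
        int i = 0;
        while (i < utf8.length) {
            int b = utf8[i] & 0xff;
            if (b == 0x00) {
                // Binary 0 becomes the two-byte form c0 80.
                out.write(0xc0);
                out.write(0x80);
                i += 1;
            } else if ((b & 0xf8) == 0xf0 && i + 3 < utf8.length) {
                // Four-byte sequence: decode the code point, split it into
                // a surrogate pair, and write each half as three bytes.
                int cp = ((b & 0x07) << 18)
                       | ((utf8[i + 1] & 0x3f) << 12)
                       | ((utf8[i + 2] & 0x3f) << 6)
                       |  (utf8[i + 3] & 0x3f);
                int v = cp - 0x10000;
                writeThreeByte(out, 0xd800 + (v >> 10));
                writeThreeByte(out, 0xdc00 + (v & 0x3ff));
                i += 4;
            } else {
                // One-, two- and three-byte sequences are identical.
                out.write(b);
                i += 1;
            }
        }
        return out.toByteArray();
    }

    // Three-byte encoding of a 16-bit value in the range 0x0800..0xffff.
    private static void writeThreeByte(ByteArrayOutputStream out, int c) {
        out.write(0xe0 | (c >> 12));
        out.write(0x80 | ((c >> 6) & 0x3f));
        out.write(0x80 | (c & 0x3f));
    }
}
```

For example, f0 90 80 80 (U+10000) becomes ed a0 80 ed b0 80, and
78 00 79 becomes 78 c0 80 79.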


 Here is how the supplementals are represented in java Strings.
 This is from the java.lang.Character doc:
    The Java 2 platform uses the UTF-16 representation in char arrays and
    in the String and StringBuffer classes. In this representation,
    supplementary characters are represented as a pair of char values,
    the first from the high-surrogates range, (\uD800-\uDBFF), the second
    from the low-surrogates range (\uDC00-\uDFFF).
  See utf8.txt
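
 The surrogate-pair representation quoted above can be checked with a few
 lines of Java.  This is just a sketch; the class name SurrogateDemo and
 the helper hexChars are made up for this note.

```java
// Hypothetical demo class; not part of UTF8Test.
class SurrogateDemo {
    // Space-separated four-digit hex dump of the chars in a String.
    static String hexChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First and last supplementary code points: U+10000 and U+10FFFF.
        String first = new String(
            Character.toChars(Character.MIN_SUPPLEMENTARY_CODE_POINT));
        String last = new String(
            Character.toChars(Character.MAX_CODE_POINT));

        System.out.println(hexChars(first)); // d800 dc00
        System.out.println(hexChars(last));  // dbff dfff

        // One code point, but two chars.
        System.out.println(first.length() + " chars, "
            + first.codePointCount(0, first.length()) + " code point");
    }
}
```

 These are the same two pairs that the debuggee's vals array uses for its
 first and last supplementary test strings.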


----

NSK Packet.java in the nsk/share/jdwp framework does this to write
a string to JDWP:
    public void addString(String value) {
        final int count = JDWP.TypeSize.INT + value.length();
        addInt(value.length());
        try {
            addBytes(value.getBytes("UTF-8"), 0, value.length());
        } catch (UnsupportedEncodingException e) {
            throw new Failure("Unsupported UTF-8 ecnoding while adding string value to JDWP packet:\n\t"
                                + e);
        }
    }
 ?? Does this get the standard UTF-8?  I would expect so.
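
 On the question above: String.getBytes("UTF-8") does produce standard
 UTF-8.  What the quoted addString also does, though, is use
 value.length(), a count of chars, where a count of bytes is needed, and
 the two differ for any non-ASCII text.  The sketch below shows the
 difference and what a length-correct write might look like, assuming the
 JDWP string layout of a four-byte big-endian length followed by the UTF-8
 bytes.  JdwpStringSketch and encode are hypothetical names, not NSK or
 JDI APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the NSK framework or of UTF8Test.
class JdwpStringSketch {
    // Encode a String as a four-byte big-endian byte count followed by
    // its standard UTF-8 bytes.
    static byte[] encode(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + utf8.length); // big-endian by default
        buf.putInt(utf8.length);  // byte count, not value.length()
        buf.put(utf8);
        return buf.array();
    }

    public static void main(String[] args) {
        String s = "xx\ud800\udc00yy";
        int byteCount = s.getBytes(StandardCharsets.UTF_8).length;
        System.out.println("chars = " + s.length());      // 6
        System.out.println("bytes = " + byteCount);       // 8
        System.out.println("packet = " + encode(s).length + " bytes");
    }
}
```

 For the first supplementary test string, length() reports 6 chars while
 the UTF-8 form is 8 bytes, so a writer that uses length() would drop the
 last two bytes.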

and the readString method does this:
        for (int i = 0; i < len; i++)
            s[i] = getByte();

        try {
            return new String(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new Failure("Unsupported UTF-8 ecnoding while extracting string value from JDWP packet:\n\t"
                                + e);
        }
Thus, this won't notice the modified UTF-8 coming in from JDWP.


*/

import com.sun.jdi.*;
import com.sun.jdi.event.*;
import com.sun.jdi.request.*;
import java.io.UnsupportedEncodingException;
import java.util.*;

    /********** target program **********/

/*
 * The debuggee has a few Strings the debugger reads via JDI
 */
class UTF8Targ {
    static String[] vals = new String[] {"xx\u0000yy",           // standard UTF-8 0
                                         "xx\ud800\udc00yy",     // first supplementary
                                         "xx\udbff\udfffyy"      // last supplementary
                                         // d800 = 1101 1000 0000 0000   dc00 = 1101 1100 0000 0000
                                         // dbff = 1101 1011 1111 1111   dfff = 1101 1111 1111 1111
    };

    static String aField;

    public static void main(String[] args){
        System.out.println("Howdy!");
        gus();
        System.out.println("Goodbye from UTF8Targ!");
    }
    static void gus() {
    }
}

    /********** test program **********/

public class UTF8Test extends TestScaffold {
    ClassType targetClass;
    ThreadReference mainThread;
    Field targetField;
    UTF8Test (String args[]) {
        super(args);
    }

    public static void main(String[] args)      throws Exception {
        new UTF8Test(args).startTests();
    }

    /********** test core **********/

    protected void runTests() throws Exception {
        /*
         * Get to the top of main()
         * to determine targetClass and mainThread
         */
        BreakpointEvent bpe = startToMain("UTF8Targ");
        targetClass = (ClassType)bpe.location().declaringType();
        targetField = targetClass.fieldByName("aField");

        ArrayReference targetVals = (ArrayReference)targetClass.getValue(targetClass.fieldByName("vals"));

        /* For each string in the debuggee's 'vals' array, verify that we can
         * read that value via JDI.
         */

        for (int ii = 0; ii < UTF8Targ.vals.length; ii++) {
            StringReference val = (StringReference)targetVals.getValue(ii);
            String valStr = val.value();

            /*
             * Verify that we can read a value correctly.
             * We read it via JDI, and access it directly from the static
             * var in the debuggee class.
             */
            if (!valStr.equals(UTF8Targ.vals[ii]) ||
                valStr.length() != UTF8Targ.vals[ii].length()) {
                failure("     FAILED: Expected /" + printIt(UTF8Targ.vals[ii]) +
                        "/, but got /" + printIt(valStr) + "/, length = " + valStr.length());
            }
        }

        /* Test 'all' unicode chars - send them to the debuggee via JDI
         * and then read them back.
         */
        doFancyVersion();

        resumeTo("UTF8Targ", "gus", "()V");
        try {
            Thread.sleep(1000);
        } catch (InterruptedException ee) {
        }


        /*
         * resume the target listening for events
         */

        listenUntilVMDisconnect();

        /*
         * deal with results of test
         * if anything has called failure("foo") testFailed will be true
         */
        if (!testFailed) {
            println("UTF8Test: passed");
        } else {
            throw new Exception("UTF8Test: failed");
        }
    }

    /**
     * For each unicode value, send a string containing
     * it to the debuggee via JDI, read it back via JDI, and see if
     * we get the same value.
     */
    void doFancyVersion() throws Exception {
        // This does 4 chars at a time just to save time.
        for (int ii = Character.MIN_CODE_POINT;
             ii < Character.MIN_SUPPLEMENTARY_CODE_POINT;
             ii += 4) {
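            // Skip the surrogates.  The loop increment then moves ii to
            // MAX_SURROGATE + 1, so the chars after the surrogate range
            // are still tested; a break here would skip them all.
            if (ii == Character.MIN_SURROGATE) {
                ii = Character.MAX_SURROGATE - 3;
                continue;
            }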
            doFancyTest(ii, ii + 1, ii + 2, ii + 3);
        }

        // Do the supplemental chars.
        for (int ii = Character.MIN_SUPPLEMENTARY_CODE_POINT;
             ii <= Character.MAX_CODE_POINT;
             ii += 2000) {
            // Too many of these so just do a few
            doFancyTest(ii, ii + 1, ii + 2, ii + 3);
        }

    }

    void doFancyTest(int ... args) throws Exception {
        String ss = new String(args, 0, 4);
        targetClass.setValue(targetField, vm().mirrorOf(ss));

        StringReference returnedVal = (StringReference)targetClass.getValue(targetField);
        String returnedStr = returnedVal.value();

        if (!ss.equals(returnedStr)) {
            failure("Set: FAILED: Expected /" + printIt(ss) +
                    "/, but got /" + printIt(returnedStr) + "/, length = " + returnedStr.length());
        }
    }

    /**
     * Return a String containing the hex values of
     * the chars in a String.
     */
    String printIt(String arg) {
        StringBuilder bb = new StringBuilder(arg.length() * 5);
        for (int ii = 0; ii < arg.length(); ii++) {
            int ccc = arg.charAt(ii);
            bb.append(String.format("%1$04x ", ccc));
        }
        return bb.toString();
    }

    /**
     * Return a String containing the hex values of
     * the standard UTF-8 bytes of a String.
     */
    String printIt1(String arg) {
        // Using the Charset constant avoids the checked exception, and with
        // it the empty catch block that could leave the array null.
        byte[] barray = arg.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        StringBuilder bb = new StringBuilder(barray.length * 3);
        for (int ii = 0; ii < barray.length; ii++) {
            bb.append(String.format("%1$02x ", barray[ii]));
        }
        return bb.toString();
    }

}