/****************************************************************************
**
** Copyright (C) 2016 The Qt Company Ltd.
** Copyright (C) 2016 Intel Corporation.
** Contact: https://www.qt.io/licensing/
**
** This file is part of the QtCore module of the Qt Toolkit.
**
** $QT_BEGIN_LICENSE:LGPL$
** Commercial License Usage
** Licensees holding valid commercial Qt licenses may use this file in
** accordance with the commercial license agreement provided with the
** Software or, alternatively, in accordance with the terms contained in
** a written agreement between you and The Qt Company. For licensing terms
** and conditions see https://www.qt.io/terms-conditions. For further
** information use the contact form at https://www.qt.io/contact-us.
**
** GNU Lesser General Public License Usage
** Alternatively, this file may be used under the terms of the GNU Lesser
** General Public License version 3 as published by the Free Software
** Foundation and appearing in the file LICENSE.LGPL3 included in the
** packaging of this file. Please review the following information to
** ensure the GNU Lesser General Public License version 3 requirements
** will be met: https://www.gnu.org/licenses/lgpl-3.0.html.
**
** GNU General Public License Usage
** Alternatively, this file may be used under the terms of the GNU
** General Public License version 2.0 or (at your option) the GNU General
** Public license version 3 or any later version approved by the KDE Free
** Qt Foundation. The licenses are as published by the Free Software
** Foundation and appearing in the file LICENSE.GPL2 and LICENSE.GPL3
** included in the packaging of this file. Please review the following
** information to ensure the GNU General Public License requirements will
** be met: https://www.gnu.org/licenses/gpl-2.0.html and
** https://www.gnu.org/licenses/gpl-3.0.html.
**
** $QT_END_LICENSE$
**
****************************************************************************/

/*!
    \class QUrl
    \inmodule QtCore

    \brief The QUrl class provides a convenient interface for working
    with URLs.

    \reentrant
    \ingroup io
    \ingroup network
    \ingroup shared

    It can parse and construct URLs in both encoded and unencoded
    form. QUrl also has support for internationalized domain names
    (IDNs).

    The most common way to use QUrl is to initialize it via the
    constructor by passing a QString. Otherwise, setUrl() can also
    be used.

    URLs can be represented in two forms: encoded or unencoded. The
    unencoded representation is suitable for showing to users, but
    the encoded representation is typically what you would send to
    a web server. For example, the unencoded URL
    "http://bühler.example.com/List of applicants.xml"
    would be sent to the server as
    "http://xn--bhler-kva.example.com/List%20of%20applicants.xml".
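
    The encoding step for the path can be illustrated with a standalone
    sketch that does not use QUrl. The name percentEncodePath() is
    hypothetical; the sketch keeps only the RFC 3986 unreserved characters
    and "/" as literals (QUrl itself leaves more characters, such as the
    sub-delimiters, untouched) and does not cover the host, which is
    converted to IDNA ("xn--") labels rather than percent-encoded.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not Qt API): percent-encodes each UTF-8 byte of a
// path except RFC 3986 unreserved characters and '/', using uppercase hex
// digits as QUrl does.
std::string percentEncodePath(const std::string &utf8Path)
{
    static const char hexDigits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8Path) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_'
                || c == '~';
        if (unreserved || c == '/') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hexDigits[c >> 4];    // high nibble
            out += hexDigits[c & 0x0F];  // low nibble
        }
    }
    return out;
}
```

    With this sketch, "/List of applicants.xml" becomes
    "/List%20of%20applicants.xml", matching the path in the example above.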

    A URL can also be constructed piece by piece by calling
    setScheme(), setUserName(), setPassword(), setHost(), setPort(),
    setPath(), setQuery() and setFragment(). Some convenience
    functions are also available: setAuthority() sets the user name,
    password, host and port. setUserInfo() sets the user name and
    password at once.

    Call isValid() to check if the URL is valid. This can be done at any point
    during the construction of a URL. If isValid() returns \c false, you should
    clear() the URL before proceeding, or start over by parsing a new URL with
    setUrl().

    Constructing a query is particularly convenient through the use of the \l
    QUrlQuery class and its methods QUrlQuery::setQueryItems(),
    QUrlQuery::addQueryItem() and QUrlQuery::removeQueryItem(). Use
    QUrlQuery::setQueryDelimiters() to customize the delimiters used for
    generating the query string.

    For the convenience of generating encoded URL strings or query
    strings, there are two static functions, fromPercentEncoding() and
    toPercentEncoding(), which handle percent decoding and encoding of
    QString objects.
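
    Decoding reverses the byte-level step: every "%" followed by two
    hexadecimal digits stands for one byte. The standalone sketch below
    shows only that step; percentDecode() is a hypothetical name, and it
    copies malformed sequences unchanged. QUrl::fromPercentEncoding()
    additionally converts the decoded bytes from UTF-8 to produce a QString.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: value of a hexadecimal digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

// Hypothetical helper (not Qt API): replaces each "%XY" sequence with the
// byte 0xXY. A '%' not followed by two hex digits is copied unchanged.
// The result is a byte string; no UTF-8 validation is performed.
std::string percentDecode(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        int high = -1;
        int low = -1;
        if (in[i] == '%' && i + 2 < in.size()) {
            high = hexValue(in[i + 1]);
            low = hexValue(in[i + 2]);
        }
        if (high >= 0 && low >= 0) {
            out += static_cast<char>(high * 16 + low);
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}
```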

    fromLocalFile() constructs a QUrl by parsing a local
    file path. toLocalFile() converts a URL to a local file path.

    The human-readable representation of the URL is fetched with
    toString(). This representation is appropriate for displaying a
    URL to a user in unencoded form. The encoded form, however, as
    returned by toEncoded(), is for internal use, passing to web
    servers, mail clients and so on. Both forms are technically correct
    and represent the same URL unambiguously -- in fact, passing either
    form to QUrl's constructor or to setUrl() will yield the same QUrl
    object.

    QUrl conforms to the URI specification from
    \l{RFC 3986} (Uniform Resource Identifier: Generic Syntax), and includes
    scheme extensions from \l{RFC 1738} (Uniform Resource Locators). Case
    folding rules in QUrl conform to \l{RFC 3491} (Nameprep: A Stringprep
    Profile for Internationalized Domain Names (IDN)). It is also compatible with the
    \l{http://freedesktop.org/wiki/Specifications/file-uri-spec/}{file URI specification}
    from freedesktop.org, provided that the locale encodes file names using
    UTF-8 (required by IDN).

    \section2 Relative URLs vs Relative Paths

    isRelative() returns whether the URL is relative.
    A relative URL has no \l {scheme}. For example:

    \snippet code/src_corelib_io_qurl.cpp 8

    Notice that a URL can be absolute while containing a relative path, and
    vice versa:

    \snippet code/src_corelib_io_qurl.cpp 9

    A relative URL can be resolved by passing it as an argument to resolved(),
    which returns an absolute URL. isParentOf() is used for determining whether
    one URL is a parent of another.
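
    Resolution of relative references is specified in RFC 3986, section 5.2.
    As part of that algorithm, the merged path is passed through
    remove_dot_segments (section 5.2.4), which removes "." and ".." segments.
    The standalone sketch below transcribes that step into standard C++;
    removeDotSegments() is a hypothetical name, and QUrl's own
    implementation may differ in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch of RFC 3986, section 5.2.4 (remove_dot_segments). Not Qt code.
std::string removeDotSegments(std::string in)
{
    auto startsWith = [&in](const std::string &prefix) {
        return in.rfind(prefix, 0) == 0;  // true if in begins with prefix
    };
    std::string out;
    while (!in.empty()) {
        if (startsWith("../")) {               // rule A
            in.erase(0, 3);
        } else if (startsWith("./")) {         // rule A
            in.erase(0, 2);
        } else if (startsWith("/./")) {        // rule B: "/./" becomes "/"
            in.replace(0, 3, "/");
        } else if (in == "/.") {               // rule B: "/." becomes "/"
            in = "/";
        } else if (startsWith("/../") || in == "/..") {  // rule C
            const std::size_t len = (in == "/..") ? 3 : 4;
            in.replace(0, len, "/");
            // drop the last segment (and its leading '/') from the output
            const std::size_t slash = out.rfind('/');
            out.erase(slash == std::string::npos ? 0 : slash);
        } else if (in == "." || in == "..") {  // rule D
            in.clear();
        } else {                               // rule E: move one segment
            std::size_t end = in.find('/', 1);
            if (end == std::string::npos)
                end = in.size();
            out += in.substr(0, end);
            in.erase(0, end);
        }
    }
    return out;
}
```

    The first two test inputs used here are the worked examples from
    RFC 3986 itself: "/a/b/c/./../../g" becomes "/a/g" and
    "mid/content=5/../6" becomes "mid/6".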

    \section2 Error checking

    QUrl is capable of detecting many errors in URLs while parsing them or when
    components of the URL are set with individual setter methods (like
    setScheme(), setHost() or setPath()). If the parsing or setter function is
    successful, any previously recorded error conditions will be discarded.

    By default, QUrl setter methods operate in QUrl::TolerantMode, which means
    they accept some common mistakes and misrepresentations of data. An
    alternate method of parsing is QUrl::StrictMode, which applies further
    checks. See QUrl::ParsingMode for a description of the differences between
    the parsing modes.

    QUrl only checks for conformance with the URL specification. It does not
    try to verify that high-level protocol URLs are in the format they are
    expected to be by handlers elsewhere. For example, the following URIs are
    all considered valid by QUrl, even if they do not make sense when used:

    \list
      \li "http:/filename.html"
      \li "mailto://example.com"
    \endlist

    When the parser encounters an error, it signals the event by making
    isValid() return \c false and toString() / toEncoded() return an empty string.
    If it is necessary to show the user the reason why the URL failed to parse,
    the error condition can be obtained from QUrl by calling errorString().
    Note that this message is highly technical and may not make sense to
    end-users.

    QUrl is capable of recording only one error condition. If more than one
    error is found, it is undefined which error is reported.

    \section2 Character Conversions

    Follow these rules to avoid erroneous character conversion when
    dealing with URLs and strings:

    \list
    \li When creating a QString to contain a URL from a QByteArray or a
       char*, always use QString::fromUtf8().
    \endlist
*/

/*!
    \enum QUrl::ParsingMode

    The parsing mode controls the way QUrl parses strings.

    \value TolerantMode QUrl will try to correct some common errors in URLs.
                        This mode is useful for parsing URLs coming from sources
                        not known to be strictly standards-conforming.

    \value StrictMode Only valid URLs are accepted. This mode is useful for
                      general URL validation.

    \value DecodedMode QUrl will interpret the URL component in the fully-decoded form,
                       where percent characters stand for themselves, not as the beginning
                       of a percent-encoded sequence. This mode is only valid for the
                       setters setting components of a URL; it is not permitted in
                       the QUrl constructor, in fromEncoded() or in setUrl().
                       For more information on this mode, see the documentation for
                       \l {QUrl::ComponentFormattingOption}{QUrl::FullyDecoded}.

    In TolerantMode, the parser has the following behavior:

    \list

    \li Spaces and "%20": unencoded space characters will be accepted and will
    be treated as equivalent to "%20".

    \li Single "%" characters: Any occurrences of a percent character "%" not
    followed by exactly two hexadecimal characters (e.g., "13% coverage.html")
    will be replaced by "%25". Note that one lone "%" character will trigger
    the correction mode for all percent characters.

    \li Reserved and unreserved characters: An encoded URL should only
    contain a few characters as literals; all other characters should
    be percent-encoded. In TolerantMode, these characters will be
    accepted if they are found in the URL:
            space / double-quote / "<" / ">" / "\" /
            "^" / "`" / "{" / "|" / "}"
    Those same characters can be decoded again by passing QUrl::DecodeReserved
    to toString() or toEncoded(). In the getters of individual components,
    those characters are often returned in decoded form.

    \endlist
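
    The lone-percent rule can be expressed in a few lines. The standalone
    sketch below uses the hypothetical name fixLonePercents() and models only
    the "%" rule, not the space or reserved-character handling listed above:
    if at least one "%" does not start a valid escape, every "%" in the
    string is rewritten as "%25".

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not Qt API) sketching the TolerantMode "%" rule.
std::string fixLonePercents(const std::string &in)
{
    auto isHexDigit = [](char c) {
        return std::isxdigit(static_cast<unsigned char>(c)) != 0;
    };
    // Pass 1: look for a '%' not followed by exactly two hex digits.
    bool hasLonePercent = false;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%'
                && !(i + 2 < in.size() && isHexDigit(in[i + 1]) && isHexDigit(in[i + 2])))
            hasLonePercent = true;
    }
    if (!hasLonePercent)
        return in;  // all escapes are well-formed: leave the string alone
    // Pass 2: one lone '%' switches every '%' to its literal form "%25".
    std::string out;
    for (char c : in) {
        if (c == '%')
            out += "%25";
        else
            out += c;
    }
    return out;
}
```

    Note the second pass: once a lone "%" is found, even well-formed
    sequences such as "%20" are reinterpreted as a literal percent sign.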

    When in StrictMode, if a parsing error is found, isValid() will return
    \c false and errorString() will return a message describing the error.
    If more than one error is detected, it is undefined which error gets
    reported.

    Note that TolerantMode is not usually enough for parsing user input, which
    often contains more errors and expectations than the parser can deal with.
    When dealing with data coming directly from the user -- as opposed to data
    coming from data-transfer sources, such as other programs -- it is
    recommended to use fromUserInput().

    \sa fromUserInput(), setUrl(), toString(), toEncoded(), QUrl::FormattingOptions
*/

/*!
    \enum QUrl::UrlFormattingOption

    The formatting options define how the URL is formatted when written out
    as text.

    \value None The format of the URL is unchanged.
    \value RemoveScheme  The scheme is removed from the URL.
    \value RemovePassword  Any password in the URL is removed.
    \value RemoveUserInfo  Any user information in the URL is removed.
    \value RemovePort      Any specified port is removed from the URL.
    \value RemoveAuthority The authority (user information, host, and port) is removed.
    \value RemovePath   The URL's path is removed, leaving only the scheme,
                        host address, and port (if present).
    \value RemoveQuery  The query part of the URL (following a '?' character)
                        is removed.
    \value RemoveFragment The fragment part of the URL (following a '#' character)
                        is removed.
    \value RemoveFilename The filename (i.e. everything after the last '/' in the path) is removed.
            The trailing '/' is kept, unless StripTrailingSlash is set.
            Only valid if RemovePath is not set.
    \value PreferLocalFile If the URL is a local file according to isLocalFile()
     and contains no query or fragment, a local file path is returned.
    \value StripTrailingSlash  The trailing slash is removed from the path, if one is present.
    \value NormalizePathSegments  Modifies the path to remove redundant directory separators,
             and to resolve "."s and ".."s (as far as possible). For non-local paths, adjacent
             slashes are preserved.

    Note that the case folding rules in \l{RFC 3491}{Nameprep}, which QUrl
    conforms to, require host names to always be converted to lower case,
    regardless of the QUrl::FormattingOptions used.

    The options from QUrl::ComponentFormattingOptions are also possible.

    \sa QUrl::ComponentFormattingOptions
*/

/*!
    \enum QUrl::ComponentFormattingOption
    \since 5.0

    The component formatting options define how the components of a URL will
    be formatted when written out as text. They can be combined with the
    options from QUrl::FormattingOptions when used in toString() and
    toEncoded().

    \value PrettyDecoded   The component is returned in a "pretty form", with
                           most percent-encoded characters decoded. The exact
                           behavior of PrettyDecoded varies from component to
                           component and may also change from Qt release to Qt
                           release. This is the default.

    \value EncodeSpaces    Leave space characters in their encoded form ("%20").

    \value EncodeUnicode   Leave non-US-ASCII characters encoded in their UTF-8
                           percent-encoded form (e.g., "%C3%A9" for the U+00E9
                           codepoint, LATIN SMALL LETTER E WITH ACUTE).

    \value EncodeDelimiters Leave certain delimiters in their encoded form, as
                            would appear in the URL when the full URL is
                            represented as text. The delimiters affected
                            by this option change from component to component.
                            This flag has no effect in toString() or toEncoded().

    \value EncodeReserved  Leave US-ASCII characters not permitted in the URL by
                           the specification in their encoded form. This is the
                           default on toString() and toEncoded().

    \value DecodeReserved  Decode the US-ASCII characters that the URL specification
                           does not allow to appear in the URL. This is the
                           default on the getters of individual components.

    \value FullyEncoded    Leave all characters in their properly-encoded form,
                           as this component would appear as part of a URL. When
                           used with toString(), this produces a fully-compliant
                           URL in QString form, exactly equal to the result of
                           toEncoded().

    \value FullyDecoded    Attempt to decode as much as possible. For individual
                           components of the URL, this decodes every percent
                           encoding sequence, including control characters (U+0000
                           to U+001F) and UTF-8 sequences found in percent-encoded form.
                           Use of this mode may cause data loss; see below for more information.

    The values of EncodeReserved and DecodeReserved should not be used together
    in one call. The behavior is undefined if that happens. They are provided
    as separate values because the behavior of the "pretty mode" with regard
    to reserved characters is different on certain components and especially on
    the full URL.

    \section2 Full decoding

    The FullyDecoded mode is similar to the behavior of the functions returning
    QString in Qt 4.x, in that every character represents itself and never has
    any special meaning. This is true even for the percent character ('%'),
    which should be interpreted to mean a literal percent, not the beginning of
    a percent-encoded sequence. The same actual character, in all other
    decoding modes, is represented by the sequence "%25".

    Whenever re-applying data obtained with QUrl::FullyDecoded into a QUrl,
    care must be taken to use the QUrl::DecodedMode parameter to the setters
    (like setPath() and setUserName()). Failure to do so may cause
    reinterpretation of the percent character ('%') as the beginning of a
    percent-encoded sequence.

    This mode is quite useful when portions of a URL are used in a non-URL
    context. For example, to extract the username, password or file paths in an
    FTP client application, the FullyDecoded mode should be used.

    This mode should be used with care, since there are two conditions that
    cannot be reliably represented in the returned QString. They are:

    \list
      \li \b{Non-UTF-8 sequences:} URLs may contain sequences of
      percent-encoded characters that do not form valid UTF-8 sequences. Since
      URLs need to be decoded using UTF-8, any decoder failure will result in
      the QString containing one or more replacement characters where the
      sequence existed.

      \li \b{Encoded delimiters:} URLs are also allowed to make a distinction
      between a delimiter found in its literal form and its equivalent in
      percent-encoded form. This is most commonly found in the query, but is
      permitted in most parts of the URL.
    \endlist

    The following example illustrates the problem:

    \snippet code/src_corelib_io_qurl.cpp 10

    If the two URLs were used via HTTP GET, the interpretation by the web
    server would probably be different. In the first case, it would interpret
    the query as one parameter, with a key of "q" and value "a+=b&c". In the
    second case, it would probably interpret it as two parameters, one with a
    key of "q" and value "a =b", and the second with a key "c" and no value.
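
    The effect can be reproduced without QUrl. The standalone sketch below
    uses a hypothetical splitQuery() that splits a query on "&" and each
    item on its first "=", roughly the way many form parsers do; it neither
    decodes percent sequences nor treats "+" as a space. A query whose "="
    and "&" are percent-encoded yields one item, while the same text with
    those delimiters decoded yields two.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not Qt API): splits on '&', then each item on its
// first '='. Items without '=' get an empty value.
std::vector<std::pair<std::string, std::string>> splitQuery(const std::string &query)
{
    std::vector<std::pair<std::string, std::string>> items;
    std::size_t start = 0;
    while (start <= query.size()) {
        std::size_t end = query.find('&', start);
        if (end == std::string::npos)
            end = query.size();
        const std::string item = query.substr(start, end - start);
        const std::size_t eq = item.find('=');
        if (eq == std::string::npos)
            items.emplace_back(item, "");
        else
            items.emplace_back(item.substr(0, eq), item.substr(eq + 1));
        start = end + 1;  // continue after the '&' (or stop past the end)
    }
    return items;
}
```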

    \sa QUrl::FormattingOptions
*/

/*!
    \enum QUrl::UserInputResolutionOption
    \since 5.4

    The user input resolution options define how fromUserInput() should
    interpret strings that could either be a relative path or the short
    form of an HTTP URL. For instance, \c{file.pl} can be either a local file
    or the URL \c{http://file.pl}.

    \value DefaultResolution  The default resolution mechanism is to check
                              whether a local file exists, in the working
                              directory given to fromUserInput(), and only
                              return a local path in that case. Otherwise a URL
                              is assumed.
    \value AssumeLocalFile    This option makes fromUserInput() always return
                              a local path unless the input contains a scheme, such as
                              \c{http://file.pl}. This is useful for applications
                              such as text editors, which are able to create
                              the file if it doesn't exist.
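
    The decision between the two options can be sketched as follows. This
    is a much-simplified model, not the actual implementation of
    fromUserInput(): the enum Resolution and the function treatAsLocalFile()
    are hypothetical, and scheme detection is reduced to looking for "://".

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical stand-in for QUrl::UserInputResolutionOption.
enum class Resolution { Default, AssumeLocalFile };

// Hypothetical helper: returns true if input should be treated as a local
// path relative to workingDir, false if it should be turned into a URL.
bool treatAsLocalFile(const std::string &input, const std::filesystem::path &workingDir,
                      Resolution mode)
{
    if (input.find("://") != std::string::npos)
        return false;  // an explicit scheme always wins
    if (mode == Resolution::AssumeLocalFile)
        return true;   // e.g. a text editor that may create the file
    std::error_code ec;  // non-throwing overload: errors count as "missing"
    return std::filesystem::exists(workingDir / input, ec);
}
```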

    \sa fromUserInput()
*/

/*!
    \fn QUrl::QUrl(QUrl &&other)

    Move-constructs a QUrl instance, making it point at the same
    object that \a other was pointing to.

    \since 5.2
*/

/*!
    \fn QUrl &QUrl::operator=(QUrl &&other)

    Move-assigns \a other to this QUrl instance.

    \since 5.2
*/

#include "qurl.h"
#include "qurl_p.h"
#include "qplatformdefs.h"
#include "qstring.h"
#include "qstringlist.h"
#include "qdebug.h"
#include "qhash.h"
#include "qdir.h"         // for QDir::fromNativeSeparators
#include "qdatastream.h"
#if QT_CONFIG(topleveldomain) // ### Qt6: Remove section
#include "qtldurl_p.h"
#endif
#include "private/qipaddress_p.h"
#include "qurlquery.h"
#include "private/qdir_p.h"
#include <private/qmemory_p.h>

QT_BEGIN_NAMESPACE

inline static bool isHex(char c)
{
    // ORing in 0x20 lowercases 'A'-'F' and leaves '0'-'9' unchanged
    c |= 0x20;
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
}

static inline QString ftpScheme()
{
    return QStringLiteral("ftp");
}

static inline QString fileScheme()
{
    return QStringLiteral("file");
}

static inline QString webDavScheme()
{
    return QStringLiteral("webdavs");
}

static inline QString webDavSslTag()
{
    return QStringLiteral("@SSL");
}

class QUrlPrivate
{
public:
    enum Section : uchar {
        Scheme = 0x01,
        UserName = 0x02,
        Password = 0x04,
        UserInfo = UserName | Password,
        Host = 0x08,
        Port = 0x10,
        Authority = UserInfo | Host | Port,
        Path = 0x20,
        Hierarchy = Authority | Path,
        Query = 0x40,
        Fragment = 0x80,
        FullUrl = 0xff
    };

    enum Flags : uchar {
        IsLocalFile = 0x01
    };

    enum ErrorCode {
        // the high byte of the error code matches the Section
        // the first item in each group must be the generic "Invalid xxx Error"
        InvalidSchemeError = Scheme << 8,

        InvalidUserNameError = UserName << 8,

        InvalidPasswordError = Password << 8,

        InvalidRegNameError = Host << 8,
        InvalidIPv4AddressError,
        InvalidIPv6AddressError,
        InvalidCharacterInIPv6Error,
        InvalidIPvFutureError,
        HostMissingEndBracket,

        InvalidPortError = Port << 8,
        PortEmptyError,

        InvalidPathError = Path << 8,

        InvalidQueryError = Query << 8,

        InvalidFragmentError = Fragment << 8,

        // the following three cases are only possible in combination with
        // presence/absence of the path, authority and scheme. See validityError().
        AuthorityPresentAndPathIsRelative = Authority << 8 | Path << 8 | 0x10000,
        AuthorityAbsentAndPathIsDoubleSlash,
        RelativeUrlPathContainsColonBeforeSlash = Scheme << 8 | Authority << 8 | Path << 8 | 0x10000,

        NoError = 0
    };

    struct Error {
        QString source;
        ErrorCode code;
        int position;
    };

    QUrlPrivate();
    QUrlPrivate(const QUrlPrivate &copy);
    ~QUrlPrivate();

    void parse(const QString &url, QUrl::ParsingMode parsingMode);
    bool isEmpty() const
    { return sectionIsPresent == 0 && port == -1 && path.isEmpty(); }

    std::unique_ptr<Error> cloneError() const;
    void clearError();
    void setError(ErrorCode errorCode, const QString &source, int supplement = -1);
    ErrorCode validityError(QString *source = nullptr, int *position = nullptr) const;
    bool validateComponent(Section section, const QString &input, int begin, int end);
    bool validateComponent(Section section, const QString &input)
    { return validateComponent(section, input, 0, uint(input.length())); }

    // no QString scheme() const;
    void appendAuthority(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
    void appendUserInfo(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
    void appendUserName(QString &appendTo, QUrl::FormattingOptions options) const;
    void appendPassword(QString &appendTo, QUrl::FormattingOptions options) const;
    void appendHost(QString &appendTo, QUrl::FormattingOptions options) const;
    void appendPath(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
    void appendQuery(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
    void appendFragment(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;

    // the "end" parameters are like STL iterators: they point to one past the last valid element
    bool setScheme(const QString &value, int len, bool doSetError);
    void setAuthority(const QString &auth, int from, int end, QUrl::ParsingMode mode);
    void setUserInfo(const QString &userInfo, int from, int end);
    void setUserName(const QString &value, int from, int end);
    void setPassword(const QString &value, int from, int end);
    bool setHost(const QString &value, int from, int end, QUrl::ParsingMode mode);
    void setPath(const QString &value, int from, int end);
    void setQuery(const QString &value, int from, int end);
    void setFragment(const QString &value, int from, int end);

    inline bool hasScheme() const { return sectionIsPresent & Scheme; }
    inline bool hasAuthority() const { return sectionIsPresent & Authority; }
    inline bool hasUserInfo() const { return sectionIsPresent & UserInfo; }
    inline bool hasUserName() const { return sectionIsPresent & UserName; }
    inline bool hasPassword() const { return sectionIsPresent & Password; }
    inline bool hasHost() const { return sectionIsPresent & Host; }
    inline bool hasPort() const { return port != -1; }
    inline bool hasPath() const { return !path.isEmpty(); }
    inline bool hasQuery() const { return sectionIsPresent & Query; }
    inline bool hasFragment() const { return sectionIsPresent & Fragment; }

    inline bool isLocalFile() const { return flags & IsLocalFile; }
    QString toLocalFile(QUrl::FormattingOptions options) const;

    QString mergePaths(const QString &relativePath) const;

    QAtomicInt ref;
    int port;

    QString scheme;
    QString userName;
    QString password;
    QString host;
    QString path;
    QString query;
    QString fragment;

    std::unique_ptr<Error> error;

    // not used for:
    //  - Port (port == -1 means absence)
    //  - Path (there's no path delimiter, so we optimize its use out of existence)
    // Schemes are never supposed to be empty, but we keep the flag anyway
    uchar sectionIsPresent;
    uchar flags;

    // 32-bit: 2 bytes tail padding available
    // 64-bit: 6 bytes tail padding available
};

inline QUrlPrivate::QUrlPrivate()
    : ref(1), port(-1),
      sectionIsPresent(0),
      flags(0)
{
}

inline QUrlPrivate::QUrlPrivate(const QUrlPrivate &copy)
    : ref(1), port(copy.port),
      scheme(copy.scheme),
      userName(copy.userName),
      password(copy.password),
      host(copy.host),
      path(copy.path),
      query(copy.query),
      fragment(copy.fragment),
      error(copy.cloneError()),
      sectionIsPresent(copy.sectionIsPresent),
      flags(copy.flags)
{
}

inline QUrlPrivate::~QUrlPrivate()
    = default;

std::unique_ptr<QUrlPrivate::Error> QUrlPrivate::cloneError() const
{
    return error ? qt_make_unique<Error>(*error) : nullptr;
}

inline void QUrlPrivate::clearError()
{
    error.reset();
}

inline void QUrlPrivate::setError(ErrorCode errorCode, const QString &source, int supplement)
{
    if (error) {
        // don't overwrite an error set in a previous section during parsing
        return;
    }
    error = qt_make_unique<Error>();
    error->code = errorCode;
    error->source = source;
    error->position = supplement;
}

640 // From RFC 3986, Appendix A Collected ABNF for URI
641 //    URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
642 //[...]
643 //    scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
644 //
645 //    authority     = [ userinfo "@" ] host [ ":" port ]
646 //    userinfo      = *( unreserved / pct-encoded / sub-delims / ":" )
647 //    host          = IP-literal / IPv4address / reg-name
648 //    port          = *DIGIT
649 //[...]
650 //    reg-name      = *( unreserved / pct-encoded / sub-delims )
651 //[..]
652 //    pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
653 //
654 //    query         = *( pchar / "/" / "?" )
655 //
656 //    fragment      = *( pchar / "/" / "?" )
657 //
658 //    pct-encoded   = "%" HEXDIG HEXDIG
659 //
660 //    unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
661 //    reserved      = gen-delims / sub-delims
662 //    gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
663 //    sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
//                  / "*" / "+" / "," / ";" / "="
// the path component has a complex ABNF that basically boils down to
// slash-separated segments of "pchar"

// The above is the strict definition of the URL components and we mostly
// adhere to it, with a few exceptions. QUrl obeys the following behavior:
//  - percent-encoding sequences always use uppercase HEXDIG;
//  - unreserved characters are *always* decoded, no exceptions;
//  - the space character and bytes with the high bit set are controlled by
//    the EncodeSpaces and EncodeUnicode bits;
//  - control characters, the percent sign itself, and bytes with the high
//    bit set that don't form valid UTF-8 sequences are always encoded,
//    except in FullyDecoded mode;
//  - sub-delims are always left alone, except in FullyDecoded mode;
//  - gen-delims change behavior depending on which section of the URL (or
//    the entire URL) we're looking at; see below;
//  - characters not mentioned above, like "<" and ">", are usually
//    decoded in individual sections of the URL, but encoded when the full
//    URL is put together (this may change depending on the subjective
//    definition of "pretty").
//
// The behavior for the delimiters bears some explanation. The spec says in
// section 2.2:
//     URIs that differ in the replacement of a reserved character with its
//     corresponding percent-encoded octet are not equivalent.
// (note: the QUrl API mistakenly uses the term "reserved", so we will refer
// to them here as "delimiters").
//
// For that reason, we cannot encode delimiters found in decoded form and we
// cannot decode the ones found in encoded form if that would change the
// interpretation. Conversely, we *can* perform the transformation if it would
// not change the interpretation. From the last component of a URL to the first,
// here are the gen-delims we can unambiguously transform when the field is
// taken in isolation:
//  - fragment: none, since it's the last
//  - query: "#" is unambiguous
//  - path: "#" and "?" are unambiguous
//  - host: completely special but never ambiguous, see setHost() below.
//  - password: the "#", "?", "/", "[", "]" and "@" characters are unambiguous
//  - username: the "#", "?", "/", "[", "]", "@", and ":" characters are unambiguous
//  - scheme: doesn't accept any delimiter, see setScheme() below.
//
// Internally, QUrl stores each component in the format that corresponds to the
// default mode (PrettyDecoded). It deviates from the "strict" FullyEncoded
// mode in the following ways:
//  - spaces are decoded
//  - valid UTF-8 sequences are decoded
//  - gen-delims that can be unambiguously transformed are decoded
//  - characters controlled by DecodeReserved are often decoded, though this behavior
//    can change depending on the subjective definition of "pretty"
//
// Note that the list of gen-delims that we can transform is different for the
// user info (user name + password) and the authority (user info + host +
// port).
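
// Worked example (illustrative only, not a test; the expected values were
// traced by hand through the userNameIn* tables below). Parsing
//     QUrl url(QStringLiteral("http://us%3Aer@host/"));
// stores the user name as "us:er", because ':' is unambiguous in a user name
// taken in isolation. Reassembling the full URL re-encodes it, because there
// a literal ':' would start the password:
//     url.userName(QUrl::PrettyDecoded);   // expected: "us:er"
//     url.toString();                      // expected: "http://us%3Aer@host/"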


// List the recoding-table modifications to be used with the recodeFromUser and
// appendToUser functions, according to the rules above. Spaces and UTF-8
// sequences are handled outside the tables.

// the encodedXXX tables are run with the delimiters set to "leave" by default;
// the decodedXXX tables are run with the delimiters set to "decode" by default
// (except for the query, which doesn't use these functions)

#define decode(x) ushort(x)
#define leave(x)  ushort(0x100 | (x))
#define encode(x) ushort(0x200 | (x))
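
// For reference: each table entry keeps the character in the low byte and
// the action in bits 8-9. With ':' == 0x3A, for instance:
//     decode(':') == 0x003A, leave(':') == 0x013A, encode(':') == 0x023A
// Every table below is terminated by a 0 entry.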

static const ushort userNameInIsolation[] = {
    decode(':'), // 0
    decode('@'), // 1
    decode(']'), // 2
    decode('['), // 3
    decode('/'), // 4
    decode('?'), // 5
    decode('#'), // 6

    decode('"'), // 7
    decode('<'),
    decode('>'),
    decode('^'),
    decode('\\'),
    decode('|'),
    decode('{'),
    decode('}'),
    0
};
static const ushort * const passwordInIsolation = userNameInIsolation + 1;
static const ushort * const pathInIsolation = userNameInIsolation + 5;
static const ushort * const queryInIsolation = userNameInIsolation + 6;
static const ushort * const fragmentInIsolation = userNameInIsolation + 7;

static const ushort userNameInUserInfo[] = {
    encode(':'), // 0
    decode('@'), // 1
    decode(']'), // 2
    decode('['), // 3
    decode('/'), // 4
    decode('?'), // 5
    decode('#'), // 6

    decode('"'), // 7
    decode('<'),
    decode('>'),
    decode('^'),
    decode('\\'),
    decode('|'),
    decode('{'),
    decode('}'),
    0
};
static const ushort * const passwordInUserInfo = userNameInUserInfo + 1;

static const ushort userNameInAuthority[] = {
    encode(':'), // 0
    encode('@'), // 1
    encode(']'), // 2
    encode('['), // 3
    decode('/'), // 4
    decode('?'), // 5
    decode('#'), // 6

    decode('"'), // 7
    decode('<'),
    decode('>'),
    decode('^'),
    decode('\\'),
    decode('|'),
    decode('{'),
    decode('}'),
    0
};
static const ushort * const passwordInAuthority = userNameInAuthority + 1;

static const ushort userNameInUrl[] = {
    encode(':'), // 0
    encode('@'), // 1
    encode(']'), // 2
    encode('['), // 3
    encode('/'), // 4
    encode('?'), // 5
    encode('#'), // 6

    // no need to list encode(x) for the other characters
    0
};
static const ushort * const passwordInUrl = userNameInUrl + 1;
static const ushort * const pathInUrl = userNameInUrl + 5;
static const ushort * const queryInUrl = userNameInUrl + 6;
static const ushort * const fragmentInUrl = userNameInUrl + 6;

static inline void parseDecodedComponent(QString &data)
{
    data.replace(QLatin1Char('%'), QLatin1String("%25"));
}

static inline QString
recodeFromUser(const QString &input, const ushort *actions, int from, int to)
{
    QString output;
    const QChar *begin = input.constData() + from;
    const QChar *end = input.constData() + to;
    if (qt_urlRecode(output, begin, end, {}, actions))
        return output;

    return input.mid(from, to - from);
}

// appendXXXX functions: copy from the internal form to the external, user form.
// the internal value is stored in its PrettyDecoded form, so that case is easy.
static inline void appendToUser(QString &appendTo, const QStringRef &value, QUrl::FormattingOptions options,
                                const ushort *actions)
{
    // Test ComponentFormattingOptions, ignore FormattingOptions.
    if ((options & 0xFFFF0000) == QUrl::PrettyDecoded) {
        appendTo += value;
        return;
    }

    if (!qt_urlRecode(appendTo, value.data(), value.end(), options, actions))
        appendTo += value;
}

static inline void appendToUser(QString &appendTo, const QString &value, QUrl::FormattingOptions options,
                                const ushort *actions)
{
    appendToUser(appendTo, QStringRef(&value), options, actions);
}


inline void QUrlPrivate::appendAuthority(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    if ((options & QUrl::RemoveUserInfo) != QUrl::RemoveUserInfo) {
        appendUserInfo(appendTo, options, appendingTo);

        // add '@' only if we added anything
        if (hasUserName() || (hasPassword() && (options & QUrl::RemovePassword) == 0))
            appendTo += QLatin1Char('@');
    }
    appendHost(appendTo, options);
    if (!(options & QUrl::RemovePort) && port != -1)
        appendTo += QLatin1Char(':') + QString::number(port);
}

inline void QUrlPrivate::appendUserInfo(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    if (Q_LIKELY(!hasUserInfo()))
        return;

    const ushort *userNameActions;
    const ushort *passwordActions;
    if (options & QUrl::EncodeDelimiters) {
        userNameActions = userNameInUrl;
        passwordActions = passwordInUrl;
    } else {
        switch (appendingTo) {
        case UserInfo:
            userNameActions = userNameInUserInfo;
            passwordActions = passwordInUserInfo;
            break;

        case Authority:
            userNameActions = userNameInAuthority;
            passwordActions = passwordInAuthority;
            break;

        case FullUrl:
            userNameActions = userNameInUrl;
            passwordActions = passwordInUrl;
            break;

        default:
            // can't happen
            Q_UNREACHABLE();
            break;
        }
    }

    if (!qt_urlRecode(appendTo, userName.constData(), userName.constEnd(), options, userNameActions))
        appendTo += userName;
    if (options & QUrl::RemovePassword || !hasPassword()) {
        return;
    } else {
        appendTo += QLatin1Char(':');
        if (!qt_urlRecode(appendTo, password.constData(), password.constEnd(), options, passwordActions))
            appendTo += password;
    }
}

inline void QUrlPrivate::appendUserName(QString &appendTo, QUrl::FormattingOptions options) const
{
    // only called from QUrl::userName()
    appendToUser(appendTo, userName, options,
                 options & QUrl::EncodeDelimiters ? userNameInUrl : userNameInIsolation);
}

inline void QUrlPrivate::appendPassword(QString &appendTo, QUrl::FormattingOptions options) const
{
    // only called from QUrl::password()
    appendToUser(appendTo, password, options,
                 options & QUrl::EncodeDelimiters ? passwordInUrl : passwordInIsolation);
}

inline void QUrlPrivate::appendPath(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    QString thePath = path;
    if (options & QUrl::NormalizePathSegments) {
        thePath = qt_normalizePathSegments(path, isLocalFile() ? QDirPrivate::DefaultNormalization : QDirPrivate::RemotePath);
    }

    QStringRef thePathRef(&thePath);
    if (options & QUrl::RemoveFilename) {
        const int slash = path.lastIndexOf(QLatin1Char('/'));
        if (slash == -1)
            return;
        thePathRef = path.leftRef(slash + 1);
    }
    // check if we need to remove trailing slashes
    if (options & QUrl::StripTrailingSlash) {
        while (thePathRef.length() > 1 && thePathRef.endsWith(QLatin1Char('/')))
            thePathRef.chop(1);
    }

    appendToUser(appendTo, thePathRef, options,
                 appendingTo == FullUrl || options & QUrl::EncodeDelimiters ? pathInUrl : pathInIsolation);
}

inline void QUrlPrivate::appendFragment(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    appendToUser(appendTo, fragment, options,
                 options & QUrl::EncodeDelimiters ? fragmentInUrl :
                 appendingTo == FullUrl ? nullptr : fragmentInIsolation);
}

inline void QUrlPrivate::appendQuery(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    appendToUser(appendTo, query, options,
                 appendingTo == FullUrl || options & QUrl::EncodeDelimiters ? queryInUrl : queryInIsolation);
}

// setXXX functions

inline bool QUrlPrivate::setScheme(const QString &value, int len, bool doSetError)
{
    // schemes are strictly RFC-compliant:
    //    scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    // we also lowercase the scheme

    // schemes in URLs are not allowed to be empty, but they can be in
    // "Relative URIs" which QUrl also supports. QUrl::setScheme does
    // not call us with len == 0, so this can only be from parse()
    scheme.clear();
    if (len == 0)
        return false;

    sectionIsPresent |= Scheme;

    // validate it:
    int needsLowercasing = -1;
    const ushort *p = reinterpret_cast<const ushort *>(value.constData());
    for (int i = 0; i < len; ++i) {
        if (p[i] >= 'a' && p[i] <= 'z')
            continue;
        if (p[i] >= 'A' && p[i] <= 'Z') {
            needsLowercasing = i;
            continue;
        }
        if (i) {
            if (p[i] >= '0' && p[i] <= '9')
                continue;
            if (p[i] == '+' || p[i] == '-' || p[i] == '.')
                continue;
        }

        // found something else
        // don't call setError needlessly:
        // if we've been called from parse(), it will try to recover
        if (doSetError)
            setError(InvalidSchemeError, value, i);
        return false;
    }

    scheme = value.left(len);

    if (needsLowercasing != -1) {
        // schemes are ASCII only, so we don't need the full Unicode toLower
        QChar *schemeData = scheme.data(); // force detaching here
        for (int i = needsLowercasing; i >= 0; --i) {
            ushort c = schemeData[i].unicode();
            if (c >= 'A' && c <= 'Z')
                schemeData[i] = QChar(c + 0x20);
        }
    }

    // did we set to the file protocol?
    if (scheme == fileScheme()
#ifdef Q_OS_WIN
        || scheme == webDavScheme()
#endif
       ) {
        flags |= IsLocalFile;
    } else {
        flags &= ~IsLocalFile;
    }
    return true;
}

inline void QUrlPrivate::setAuthority(const QString &auth, int from, int end, QUrl::ParsingMode mode)
{
    sectionIsPresent &= ~Authority;
    sectionIsPresent |= Host;
    port = -1;

    // we never actually _loop_
    while (from != end) {
        int userInfoIndex = auth.indexOf(QLatin1Char('@'), from);
        if (uint(userInfoIndex) < uint(end)) {
            setUserInfo(auth, from, userInfoIndex);
            if (mode == QUrl::StrictMode && !validateComponent(UserInfo, auth, from, userInfoIndex))
                break;
            from = userInfoIndex + 1;
        }

        int colonIndex = auth.lastIndexOf(QLatin1Char(':'), end - 1);
        if (colonIndex < from)
            colonIndex = -1;

        if (uint(colonIndex) < uint(end)) {
            if (auth.at(from).unicode() == '[') {
                // check if colonIndex isn't inside the "[...]" part
                int closingBracket = auth.indexOf(QLatin1Char(']'), from);
                if (uint(closingBracket) > uint(colonIndex))
                    colonIndex = -1;
            }
        }
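
        // Illustration (hand-traced, not a test): for the authority
        // "[::1]:8080", the last ':' lies after the closing bracket, so it is
        // kept and the port becomes 8080; for "[::1]" the only colons are
        // inside the brackets, so colonIndex is reset to -1. For "host:" the
        // colon is the last character: no port is parsed (port stays -1),
        // but the host is still cut at the colon by the qMin() below.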

        if (uint(colonIndex) < uint(end) - 1) {
            // found a colon with digits after it
            unsigned long x = 0;
            for (int i = colonIndex + 1; i < end; ++i) {
                ushort c = auth.at(i).unicode();
                if (c >= '0' && c <= '9') {
                    x *= 10;
                    x += c - '0';
                } else {
                    x = ulong(-1); // x != ushort(x)
                    break;
                }
            }
            if (x == ushort(x)) {
                port = ushort(x);
            } else {
                setError(InvalidPortError, auth, colonIndex + 1);
                if (mode == QUrl::StrictMode)
                    break;
            }
        }

        setHost(auth, from, qMin<uint>(end, colonIndex), mode);
        if (mode == QUrl::StrictMode && !validateComponent(Host, auth, from, qMin<uint>(end, colonIndex))) {
            // clear host too
            sectionIsPresent &= ~Authority;
            break;
        }

        // success
        return;
    }
    // clear all sections but host
    sectionIsPresent &= ~Authority | Host;
    userName.clear();
    password.clear();
    host.clear();
    port = -1;
}

inline void QUrlPrivate::setUserInfo(const QString &userInfo, int from, int end)
{
    int delimIndex = userInfo.indexOf(QLatin1Char(':'), from);
    setUserName(userInfo, from, qMin<uint>(delimIndex, end));

    if (uint(delimIndex) >= uint(end)) {
        password.clear();
        sectionIsPresent &= ~Password;
    } else {
        setPassword(userInfo, delimIndex + 1, end);
    }
}

inline void QUrlPrivate::setUserName(const QString &value, int from, int end)
{
    sectionIsPresent |= UserName;
    userName = recodeFromUser(value, userNameInIsolation, from, end);
}

inline void QUrlPrivate::setPassword(const QString &value, int from, int end)
{
    sectionIsPresent |= Password;
    password = recodeFromUser(value, passwordInIsolation, from, end);
}

inline void QUrlPrivate::setPath(const QString &value, int from, int end)
{
    // sectionIsPresent |= Path; // not used, save some cycles
    path = recodeFromUser(value, pathInIsolation, from, end);
}

inline void QUrlPrivate::setFragment(const QString &value, int from, int end)
{
    sectionIsPresent |= Fragment;
    fragment = recodeFromUser(value, fragmentInIsolation, from, end);
}

inline void QUrlPrivate::setQuery(const QString &value, int from, int iend)
{
    sectionIsPresent |= Query;
    query = recodeFromUser(value, queryInIsolation, from, iend);
}

// Host handling
// The RFC says the host is:
//    host          = IP-literal / IPv4address / reg-name
//    IP-literal    = "[" ( IPv6address / IPvFuture  ) "]"
//    IPvFuture     = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
//  [a strict definition of IPv6Address and IPv4Address]
//    reg-name      = *( unreserved / pct-encoded / sub-delims )
//
// We deviate from the standard in all but IPvFuture. For IPvFuture we accept
// and store only exactly what the RFC says we should. No percent-encoding is
// permitted in this field, so neither Unicode characters nor spaces are
// permitted either.
//
// For IPv4 addresses, we accept broken addresses like inet_aton does (that is,
// with fewer than three dots). However, we correct the address to the proper
// form and store the corrected address. After correction, we comply with the
// RFC and the address is composed exclusively of unreserved characters.
//
// For IPv6 addresses, we accept addresses including trailing (embedded) IPv4
// addresses, the so-called v4-compat and v4-mapped addresses. We also store
// those addresses like that in the hostname field, which violates the spec.
// IPv6 hosts are stored with the square brackets in the QString. They require
// no further transformation.
//
// As for registered names, it's the other way around: we accept only valid
// hostnames as specified by STD 3 and IDNA. That means everything we accept is
// valid in the RFC definition above, but there are many valid reg-names
// according to the RFC that we do not accept in the name of security. Since we
// do accept IDNA, reg-names are subject to ACE encoding and decoding, which is
// controlled by the EncodeUnicode flag. The hostname is stored in its Unicode form.
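
// Example (hand-traced through QIPAddressUtils::parseIp4, illustrative only):
// an inet_aton-style host such as "127.1" is accepted, and setHost() stores it
// in its corrected dotted-quad form, "127.0.0.1".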

inline void QUrlPrivate::appendHost(QString &appendTo, QUrl::FormattingOptions options) const
{
    if (host.isEmpty())
        return;
    if (host.at(0).unicode() == '[') {
        // IPv6 addresses might contain a zone-id which needs to be recoded
        if (options != 0)
            if (qt_urlRecode(appendTo, host.constBegin(), host.constEnd(), options, nullptr))
                return;
        appendTo += host;
    } else {
        // this is either an IPv4Address or a reg-name
        // if it is a reg-name, it is already stored in Unicode form
        if (options & QUrl::EncodeUnicode && !(options & 0x4000000))
            appendTo += qt_ACE_do(host, ToAceOnly, AllowLeadingDot);
        else
            appendTo += host;
    }
}

// the whole IPvFuture is passed and parsed here, including brackets;
// returns null if the parsing was successful, or a pointer to the QChar of the first failure
static const QChar *parseIpFuture(QString &host, const QChar *begin, const QChar *end, QUrl::ParsingMode mode)
{
    //    IPvFuture     = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
    static const char acceptable[] =
            "!$&'()*+,;=" // sub-delims
            ":"           // ":"
            "-._~";       // unreserved

    // the brackets and the "v" have been checked
    const QChar *const origBegin = begin;
    if (begin[3].unicode() != '.')
        return &begin[3];
    if ((begin[2].unicode() >= 'A' && begin[2].unicode() <= 'F') ||
            (begin[2].unicode() >= 'a' && begin[2].unicode() <= 'f') ||
            (begin[2].unicode() >= '0' && begin[2].unicode() <= '9')) {
        // this is so unlikely that we'll just go down the slow path
        // decode the whole string, skipping the "[vH." and "]" which we already know to be there
        host += QString::fromRawData(begin, 4);

        // uppercase the version, if necessary
        if (begin[2].unicode() >= 'a')
            host[host.length() - 2] = begin[2].unicode() - 0x20;

        begin += 4;
        --end;

        QString decoded;
        if (mode == QUrl::TolerantMode && qt_urlRecode(decoded, begin, end, QUrl::FullyDecoded, nullptr)) {
            begin = decoded.constBegin();
            end = decoded.constEnd();
        }

        for ( ; begin != end; ++begin) {
            if (begin->unicode() >= 'A' && begin->unicode() <= 'Z')
                host += *begin;
            else if (begin->unicode() >= 'a' && begin->unicode() <= 'z')
                host += *begin;
            else if (begin->unicode() >= '0' && begin->unicode() <= '9')
                host += *begin;
            else if (begin->unicode() < 0x80 && strchr(acceptable, begin->unicode()) != nullptr)
                host += *begin;
            else
                return decoded.isEmpty() ? begin : &origBegin[2];
        }
        host += QLatin1Char(']');
        return nullptr;
    }
    return &origBegin[2];
}

// ONLY the IPv6 address is parsed here, WITHOUT the brackets
static const QChar *parseIp6(QString &host, const QChar *begin, const QChar *end, QUrl::ParsingMode mode)
{
    // ### Update to use QStringView once QStringView::indexOf and QStringView::lastIndexOf exist
    QString decoded;
    if (mode == QUrl::TolerantMode) {
        // this array is kept in automatic storage because it's only 4 bytes
        const ushort decodeColon[] = { decode(':'), 0 };
        if (qt_urlRecode(decoded, begin, end, QUrl::ComponentFormattingOption::PrettyDecoded, decodeColon) == 0)
            decoded = QString(begin, end-begin);
    } else {
        decoded = QString(begin, end-begin);
    }

    const QLatin1String zoneIdIdentifier("%25");
    QIPAddressUtils::IPv6Address address;
    QString zoneId;

    const QChar *endBeforeZoneId = decoded.constEnd();

    int zoneIdPosition = decoded.indexOf(zoneIdIdentifier);
    if ((zoneIdPosition != -1) && (decoded.lastIndexOf(zoneIdIdentifier) == zoneIdPosition)) {
        zoneId = decoded.mid(zoneIdPosition + zoneIdIdentifier.size());
        endBeforeZoneId = decoded.constBegin() + zoneIdPosition;

        // was there anything after the zone ID separator?
        if (zoneId.isEmpty())
            return end;
    }

    // did the address become empty after removing the zone ID?
    // (it might have always been empty)
    if (decoded.constBegin() == endBeforeZoneId)
        return end;

    const QChar *ret = QIPAddressUtils::parseIp6(address, decoded.constBegin(), endBeforeZoneId);
    if (ret)
        return begin + (ret - decoded.constBegin());

    host.reserve(host.size() + (decoded.constEnd() - decoded.constBegin()));
    host += QLatin1Char('[');
    QIPAddressUtils::toString(host, address);

    if (!zoneId.isEmpty()) {
        host += zoneIdIdentifier;
        host += zoneId;
    }
    host += QLatin1Char(']');
    return nullptr;
}

inline bool QUrlPrivate::setHost(const QString &value, int from, int iend, QUrl::ParsingMode mode)
{
    const QChar *begin = value.constData() + from;
    const QChar *end = value.constData() + iend;

    const int len = end - begin;
    host.clear();
    sectionIsPresent |= Host;
    if (len == 0)
        return true;

    if (begin[0].unicode() == '[') {
        // IPv6Address or IPvFuture
        // smallest IPv6 address is      "[::]"   (len = 4)
        // smallest IPvFuture address is "[v7.X]" (len = 6)
        if (end[-1].unicode() != ']') {
            setError(HostMissingEndBracket, value);
            return false;
        }

        if (len > 5 && begin[1].unicode() == 'v') {
            const QChar *c = parseIpFuture(host, begin, end, mode);
            if (c)
                setError(InvalidIPvFutureError, value, c - value.constData());
            return !c;
        } else if (begin[1].unicode() == 'v') {
            setError(InvalidIPvFutureError, value, from);
        }

        const QChar *c = parseIp6(host, begin + 1, end - 1, mode);
        if (!c)
            return true;

        if (c == end - 1)
            setError(InvalidIPv6AddressError, value, from);
        else
            setError(InvalidCharacterInIPv6Error, value, c - value.constData());
        return false;
    }

    // check if it's an IPv4 address
    QIPAddressUtils::IPv4Address ip4;
    if (QIPAddressUtils::parseIp4(ip4, begin, end)) {
        // yes, it was
        QIPAddressUtils::toString(host, ip4);
        return true;
    }

    // This is probably a reg-name.
    // But it can also be an encoded string that, when decoded, becomes one
    // of the types above.
    //
    // Two types of encoding are possible:
    //  percent encoding (e.g., "%31%30%2E%30%2E%30%2E%31" -> "10.0.0.1")
    //  Unicode encoding (some non-ASCII characters case-fold to digits
    //                    when nameprepping is done)
    //
    // The qt_ACE_do function below applies nameprepping and the STD3 check.
    // That means a Unicode string may become an IPv4 address, but it cannot
    // produce a '[' or a '%'.

    // check for percent-encoding first
    QString s;
    if (mode == QUrl::TolerantMode && qt_urlRecode(s, begin, end, { }, nullptr)) {
        // something was decoded
        // anything encoded left?
        int pos = s.indexOf(QChar(0x25)); // '%'
        if (pos != -1) {
            setError(InvalidRegNameError, s, pos);
            return false;
        }

        // recurse
        return setHost(s, 0, s.length(), QUrl::StrictMode);
    }

    s = qt_ACE_do(QString::fromRawData(begin, len), NormalizeAce, ForbidLeadingDot);
    if (s.isEmpty()) {
        setError(InvalidRegNameError, value);
        return false;
    }

    // check IPv4 again
    if (QIPAddressUtils::parseIp4(ip4, s.constBegin(), s.constEnd())) {
        QIPAddressUtils::toString(host, ip4);
    } else {
        host = s;
    }
    return true;
}

inline void QUrlPrivate::parse(const QString &url, QUrl::ParsingMode parsingMode)
{
    //   URI-reference = URI / relative-ref
    //   URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
    //   relative-ref  = relative-part [ "?" query ] [ "#" fragment ]
    //   hier-part     = "//" authority path-abempty
    //                 / other path types
    //   relative-part = "//" authority path-abempty
    //                 / other path types here
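    //
    // Worked example (hand-traced, illustrative only): for "http://h/p?q#f",
    // the scan below finds colon = 4, question = 10 and hash = 12, which
    // yields scheme "http", authority "h", path "/p", query "q" and
    // fragment "f".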
1396 
1397     sectionIsPresent = 0;
1398     flags = 0;
1399     clearError();
1400 
1401     // find the important delimiters
1402     int colon = -1;
1403     int question = -1;
1404     int hash = -1;
1405     const int len = url.length();
1406     const QChar *const begin = url.constData();
1407     const ushort *const data = reinterpret_cast<const ushort *>(begin);
1408 
1409     for (int i = 0; i < len; ++i) {
1410         uint uc = data[i];
1411         if (uc == '#' && hash == -1) {
1412             hash = i;
1413 
1414             // nothing more to be found
            break;
        }

        if (question == -1) {
            if (uc == ':' && colon == -1)
                colon = i;
            else if (uc == '?')
                question = i;
        }
    }

    // check if we have a scheme
    int hierStart;
    if (colon != -1 && setScheme(url, colon, /* don't set error */ false)) {
        hierStart = colon + 1;
    } else {
        // recover from a failed scheme: it might not have been a scheme at all
        scheme.clear();
        sectionIsPresent = 0;
        hierStart = 0;
    }

    int pathStart;
    int hierEnd = qMin<uint>(qMin<uint>(question, hash), len);
    if (hierEnd - hierStart >= 2 && data[hierStart] == '/' && data[hierStart + 1] == '/') {
        // we have an authority; it ends at the first slash after the "//"
        int authorityEnd = hierEnd;
        for (int i = hierStart + 2; i < authorityEnd; ++i) {
            if (data[i] == '/') {
                authorityEnd = i;
                break;
            }
        }

        setAuthority(url, hierStart + 2, authorityEnd, parsingMode);

        // even if we failed to set the authority properly, let's try to recover
        pathStart = authorityEnd;
        setPath(url, pathStart, hierEnd);
    } else {
        userName.clear();
        password.clear();
        host.clear();
        port = -1;
        pathStart = hierStart;

        if (hierStart < hierEnd)
            setPath(url, hierStart, hierEnd);
        else
            path.clear();
    }

    if (uint(question) < uint(hash))
        setQuery(url, question + 1, qMin<uint>(hash, len));

    if (hash != -1)
        setFragment(url, hash + 1, len);

    if (error || parsingMode == QUrl::TolerantMode)
        return;

    // The parsing so far was partially tolerant of errors, except for the
    // scheme parser (which is always strict) and the authority (which was
    // executed in strict mode).
    // If we haven't found any errors so far, continue the strict-mode parsing
    // from the path component onwards.

    if (!validateComponent(Path, url, pathStart, hierEnd))
        return;
    if (uint(question) < uint(hash) && !validateComponent(Query, url, question + 1, qMin<uint>(hash, len)))
        return;
    if (hash != -1)
        validateComponent(Fragment, url, hash + 1, len);
}
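
/*
    Illustrative sketch, not part of QUrl: the index bookkeeping above (scheme
    colon, first '?', first '#', and a "//" prefix that opens the authority)
    follows the generic component split of RFC 3986, section 3. The standalone
    C++ below reproduces that split on a std::string, assuming a scheme is any
    non-empty text before a ':' that precedes '/', '?' and '#'; unlike
    QUrlPrivate::setScheme(), the scheme characters are not validated.
    UrlParts and splitUrl() are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type (not QUrl API); the bools record which optional
// components were present, playing the role of QUrl's -1 sentinels.
struct UrlParts {
    std::string scheme, authority, path, query, fragment;
    bool hasScheme = false, hasAuthority = false, hasQuery = false, hasFragment = false;
};

UrlParts splitUrl(const std::string &url)
{
    UrlParts p;
    const std::size_t len = url.size();
    // The fragment starts at the first '#'.
    const std::size_t hash = url.find('#');
    const std::size_t hashPos = (hash == std::string::npos) ? len : hash;
    // A '?' only starts the query if it comes before the fragment.
    const std::size_t question = url.find('?');
    const std::size_t questionPos = (question < hashPos) ? question : hashPos;

    // A scheme needs a ':' before any '/', '?' or '#'.
    const std::size_t colon = url.find(':');
    const std::size_t delim = url.find_first_of("/?#");
    std::size_t hierStart = 0;
    if (colon != std::string::npos && colon > 0 && colon < delim) {
        p.scheme = url.substr(0, colon);
        p.hasScheme = true;
        hierStart = colon + 1;
    }

    const std::size_t hierEnd = questionPos;
    std::size_t pathStart = hierStart;
    if (hierEnd - hierStart >= 2 && url.compare(hierStart, 2, "//") == 0) {
        // The authority ends at the first '/' after the "//" prefix.
        std::size_t authorityEnd = url.find('/', hierStart + 2);
        if (authorityEnd > hierEnd)
            authorityEnd = hierEnd;
        p.authority = url.substr(hierStart + 2, authorityEnd - (hierStart + 2));
        p.hasAuthority = true;
        pathStart = authorityEnd;
    }
    p.path = url.substr(pathStart, hierEnd - pathStart);

    if (questionPos < hashPos) {
        p.query = url.substr(questionPos + 1, hashPos - questionPos - 1);
        p.hasQuery = true;
    }
    if (hashPos < len) {
        p.fragment = url.substr(hashPos + 1);
        p.hasFragment = true;
    }
    return p;
}
```

    Keeping a "present" flag per component, rather than testing for empty
    strings, mirrors how QUrl distinguishes an empty query from an absent one.
*/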

QString QUrlPrivate::toLocalFile(QUrl::FormattingOptions options) const
{
    QString tmp;
    QString ourPath;
    appendPath(ourPath, options, QUrlPrivate::Path);

    // magic for shared drive on windows
    if (!host.isEmpty()) {
        tmp = QLatin1String("//") + host;
#ifdef Q_OS_WIN // QTBUG-42346, WebDAV is visible as local file on Windows only.
        if (scheme == webDavScheme())
            tmp += webDavSslTag();
#endif
        if (!ourPath.isEmpty() && !ourPath.startsWith(QLatin1Char('/')))
            tmp += QLatin1Char('/');
        tmp += ourPath;
    } else {
        tmp = ourPath;
#ifdef Q_OS_WIN
        // magic for drives on windows
        if (ourPath.length() > 2 && ourPath.at(0) == QLatin1Char('/') && ourPath.at(2) == QLatin1Char(':'))
            tmp.remove(0, 1);
#endif
    }
    return tmp;
}
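
/*
    Illustrative sketch, not part of QUrl: the host and drive-letter handling
    of toLocalFile() reduced to plain C++. The bool `windows` stands in for the
    Q_OS_WIN compile-time switch, the WebDAV branch is omitted, the path is
    assumed to be already decoded, and toLocalPath() is a hypothetical name.

```cpp
#include <cassert>
#include <string>

std::string toLocalPath(const std::string &host, const std::string &path, bool windows)
{
    if (!host.empty()) {
        // Shared drive: "//host" followed by the path, with a '/' in between.
        std::string result = "//" + host;
        if (!path.empty() && path[0] != '/')
            result += '/';
        return result + path;
    }
    // Drive letter: "/C:/dir" becomes "C:/dir" on Windows only.
    if (windows && path.size() > 2 && path[0] == '/' && path[2] == ':')
        return path.substr(1);
    return path;
}
```
*/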

/*
    From http://www.ietf.org/rfc/rfc3986.txt, 5.2.3: Merge paths

    Returns a merge of the current path with the relative path passed
    as argument.

    Note: \a relativePath is relative (does not start with '/').
*/
inline QString QUrlPrivate::mergePaths(const QString &relativePath) const
{
    // If the base URI has a defined authority component and an empty
    // path, then return a string consisting of "/" concatenated with
    // the reference's path; otherwise,
    if (!host.isEmpty() && path.isEmpty())
        return QLatin1Char('/') + relativePath;

    // Return a string consisting of the reference's path component
    // appended to all but the last segment of the base URI's path
    // (i.e., excluding any characters after the right-most "/" in the
    // base URI path, or excluding the entire base URI path if it does
    // not contain any "/" characters).
    QString newPath;
    if (!path.contains(QLatin1Char('/')))
        newPath = relativePath;
    else
        newPath = path.leftRef(path.lastIndexOf(QLatin1Char('/')) + 1) + relativePath;

    return newPath;
}
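
/*
    Illustrative sketch, not part of QUrl: the same RFC 3986 section 5.2.3
    merge on std::string. `baseHasAuthority` plays the role of the
    `!host.isEmpty()` test in mergePaths(); the function name is hypothetical.

```cpp
#include <cassert>
#include <string>

std::string mergePaths(const std::string &basePath, bool baseHasAuthority,
                       const std::string &relativePath)
{
    // Authority present but empty base path: the result is rooted at "/".
    if (baseHasAuthority && basePath.empty())
        return "/" + relativePath;
    const std::size_t slash = basePath.rfind('/');
    if (slash == std::string::npos)
        return relativePath;  // no '/' at all: drop the whole base path
    // Keep everything up to and including the right-most '/'.
    return basePath.substr(0, slash + 1) + relativePath;
}
```
*/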

/*
    From http://www.ietf.org/rfc/rfc3986.txt, 5.2.4: Remove dot segments

    Removes unnecessary ../ and ./ from the path. Used for normalizing
    the URL.
*/
static void removeDotsFromPath(QString *path)
{
    // The input buffer is initialized with the now-appended path
    // components and the output buffer is initialized to the empty
    // string.
    QChar *out = path->data();
    const QChar *in = out;
    const QChar *end = out + path->size();

    // If the input buffer consists only of
    // "." or "..", then remove that from the input
    // buffer;
    if (path->size() == 1 && in[0].unicode() == '.')
        ++in;
    else if (path->size() == 2 && in[0].unicode() == '.' && in[1].unicode() == '.')
        in += 2;
    // While the input buffer is not empty, loop:
    while (in < end) {

        // otherwise, if the input buffer begins with a prefix of "../" or "./",
        // then remove that prefix from the input buffer;
        if (path->size() >= 2 && in[0].unicode() == '.' && in[1].unicode() == '/')
            in += 2;
        else if (path->size() >= 3 && in[0].unicode() == '.'
                 && in[1].unicode() == '.' && in[2].unicode() == '/')
            in += 3;

        // otherwise, if the input buffer begins with a prefix of
        // "/./" or "/.", where "." is a complete path segment,
        // then replace that prefix with "/" in the input buffer;
        if (in <= end - 3 && in[0].unicode() == '/' && in[1].unicode() == '.'
                && in[2].unicode() == '/') {
            in += 2;
            continue;
        } else if (in == end - 2 && in[0].unicode() == '/' && in[1].unicode() == '.') {
            *out++ = QLatin1Char('/');
            in += 2;
            break;
        }

        // otherwise, if the input buffer begins with a prefix
        // of "/../" or "/..", where ".." is a complete path
        // segment, then replace that prefix with "/" in the
        // input buffer and remove the last segment and its
        // preceding "/" (if any) from the output buffer;
        if (in <= end - 4 && in[0].unicode() == '/' && in[1].unicode() == '.'
                && in[2].unicode() == '.' && in[3].unicode() == '/') {
            while (out > path->constData() && (--out)->unicode() != '/')
                ;
            if (out == path->constData() && out->unicode() != '/')
                ++in;
            in += 3;
            continue;
        } else if (in == end - 3 && in[0].unicode() == '/' && in[1].unicode() == '.'
                   && in[2].unicode() == '.') {
            while (out > path->constData() && (--out)->unicode() != '/')
                ;
            if (out->unicode() == '/')
                ++out;
            in += 3;
            break;
        }

        // otherwise move the first path segment in
        // the input buffer to the end of the output
        // buffer, including the initial "/" character
        // (if any) and any subsequent characters up
        // to, but not including, the next "/"
        // character or the end of the input buffer.
        *out++ = *in++;
        while (in < end && in->unicode() != '/')
            *out++ = *in++;
    }
    path->truncate(out - path->constData());
}
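
/*
    Illustrative sketch, not part of QUrl: a direct transcription of the
    RFC 3986 section 5.2.4 algorithm with separate input and output strings,
    which can be easier to check against the specification than the in-place
    pointer walk in removeDotsFromPath(). It is not guaranteed to match that
    function on every edge case; removeDotSegments() is a hypothetical name and
    the rule letters refer to the RFC's steps 2A to 2E.

```cpp
#include <cassert>
#include <string>

std::string removeDotSegments(std::string in)
{
    std::string out;
    while (!in.empty()) {
        if (in.compare(0, 3, "../") == 0) {                         // rule A
            in.erase(0, 3);
        } else if (in.compare(0, 2, "./") == 0) {                   // rule A
            in.erase(0, 2);
        } else if (in.compare(0, 3, "/./") == 0) {                  // rule B
            in.replace(0, 3, "/");
        } else if (in == "/.") {                                    // rule B
            in = "/";
        } else if (in.compare(0, 4, "/../") == 0 || in == "/..") {  // rule C
            in.replace(0, in == "/.." ? 3 : 4, "/");
            // Drop the last output segment together with its leading '/'.
            const std::size_t last = out.rfind('/');
            out.erase(last == std::string::npos ? 0 : last);
        } else if (in == "." || in == "..") {                       // rule D
            in.clear();
        } else {                                                    // rule E
            // Move one segment (with its leading '/', if any) to the output.
            const std::size_t next = in.find('/', 1);
            const std::size_t n = (next == std::string::npos) ? in.size() : next;
            out += in.substr(0, n);
            in.erase(0, n);
        }
    }
    return out;
}
```

    The RFC's own examples ("/a/b/c/./../../g" becomes "/a/g", and
    "mid/content=5/../6" becomes "mid/6") make a convenient regression check
    for either version.
*/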

inline QUrlPrivate::ErrorCode QUrlPrivate::validityError(QString *source, int *position) const
{
    Q_ASSERT(!source == !position);
    if (error) {
        if (source) {
            *source = error->source;
            *position = error->position;
        }
        return error->code;
    }

    // There are three more cases of invalid URLs that QUrl recognizes and they
    // are only possible with constructed URLs (setXXX methods), not with
    // parsing. Therefore, they are tested here.
    //
    // Two cases involve a non-empty path that doesn't start with a slash,
    // combined with:
    //  - an authority
    //  - no authority and no scheme, but a colon before the first slash in
    //    the path
    // The third case is an absent authority and a non-empty path that starts
    // with "//".
    // Those cases are considered invalid because toString() would produce a URL
    // that wouldn't be parsed back to the same QUrl.

    if (path.isEmpty())
        return NoError;
    if (path.at(0) == QLatin1Char('/')) {
        if (hasAuthority() || path.length() == 1 || path.at(1) != QLatin1Char('/'))
            return NoError;
        if (source) {
            *source = path;
            *position = 0;
        }
        return AuthorityAbsentAndPathIsDoubleSlash;
    }

    if (sectionIsPresent & QUrlPrivate::Host) {
        if (source) {
            *source = path;
            *position = 0;
        }
        return AuthorityPresentAndPathIsRelative;
    }
    if (sectionIsPresent & QUrlPrivate::Scheme)
        return NoError;

    // check for a path of "text:text/"
    for (int i = 0; i < path.length(); ++i) {
        ushort c = path.at(i).unicode();
        if (c == '/') {
            // found the slash before the colon
            return NoError;
        }
        if (c == ':') {
            // found the colon before the slash, it's invalid
            if (source) {
                *source = path;
                *position = i;
            }
            return RelativeUrlPathContainsColonBeforeSlash;
        }
    }
    return NoError;
}
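
/*
    Illustrative sketch, not part of QUrl: the three checks for constructed
    URLs in validityError(), restated over plain values. For brevity a single
    `hasAuthority` flag replaces both hasAuthority() and the
    `sectionIsPresent & QUrlPrivate::Host` test, and the result is a label
    string rather than a QUrlPrivate::ErrorCode. constructedUrlError() is a
    hypothetical name.

```cpp
#include <cassert>
#include <string>

std::string constructedUrlError(const std::string &path, bool hasAuthority, bool hasScheme)
{
    if (path.empty())
        return "ok";
    if (path[0] == '/') {
        // A path of "//x" without an authority would re-parse as an authority.
        if (hasAuthority || path.size() == 1 || path[1] != '/')
            return "ok";
        return "AuthorityAbsentAndPathIsDoubleSlash";
    }
    // Relative path after an authority: "//host" + "path" would merge.
    if (hasAuthority)
        return "AuthorityPresentAndPathIsRelative";
    if (hasScheme)
        return "ok";
    // "text:text/" would re-parse with "text" as the scheme.
    const std::size_t pos = path.find_first_of(":/");
    if (pos != std::string::npos && path[pos] == ':')
        return "RelativeUrlPathContainsColonBeforeSlash";
    return "ok";
}
```
*/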

bool QUrlPrivate::validateComponent(QUrlPrivate::Section section, const QString &input,
                                    int begin, int end)
{
    // What we need to look out for that the regular parser tolerates:
    //  - percent signs not followed by two hex digits
    //  - forbidden characters, which should always appear encoded
    //    '"' / '<' / '>' / '\' / '^' / '`' / '{' / '|' / '}' / DEL
    //    control characters and space
    //  - delimiters not allowed in certain positions
    //    . scheme: parser is already strict
    //    . user info: gen-delims except ":" disallowed ("/" / "?" / "#" / "[" / "]" / "@")
    //    . host: parser is stricter than the standard
    //    . port: parser is stricter than the standard
    //    . path: all delimiters allowed
    //    . fragment: all delimiters allowed
    //    . query: all delimiters allowed
    static const char forbidden[] = "\"<>\\^`{|}\x7F";
    static const char forbiddenUserInfo[] = ":/?#[]@";

    Q_ASSERT(section != Authority && section != Hierarchy && section != FullUrl);

    const ushort *const data = reinterpret_cast<const ushort *>(input.constData());
    for (uint i = uint(begin); i < uint(end); ++i) {
        uint uc = data[i];
        if (uc >= 0x80)
            continue;

        bool error = false;
        if ((uc == '%' && (uint(end) < i + 2 || !isHex(data[i + 1]) || !isHex(data[i + 2])))
                || uc <= 0x20 || strchr(forbidden, uc)) {
            // found an error
            error = true;
        } else if (section & UserInfo) {
            if (section == UserInfo && strchr(forbiddenUserInfo + 1, uc))
                error = true;
            else if (section != UserInfo && strchr(forbiddenUserInfo, uc))
                error = true;
        }

        if (!error)
            continue;

        ErrorCode errorCode = ErrorCode(int(section) << 8);
        if (section == UserInfo) {
            // is it the user name or the password?
            errorCode = InvalidUserNameError;
            for (uint j = uint(begin); j < i; ++j)
                if (data[j] == ':') {
                    errorCode = InvalidPasswordError;
                    break;
                }
        }

        setError(errorCode, input, i);
        return false;
    }

    // no errors
    return true;
}
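
/*
    Illustrative sketch, not part of QUrl: the core of the strict-mode scan for
    path, query and fragment text. It uses the same `forbidden` set as
    validateComponent(), rejects control characters and space (<= 0x20),
    requires two hex digits after every '%', and skips non-ASCII bytes. The
    user-info delimiter rules are left out. firstStrictError() is a
    hypothetical name that returns the offending index, or -1.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <string>

int firstStrictError(const std::string &s)
{
    static const char forbidden[] = "\"<>\\^`{|}\x7F";
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char uc = static_cast<unsigned char>(s[i]);
        if (uc >= 0x80)
            continue;  // non-ASCII is left for the encoder
        bool bad = uc <= 0x20 || std::strchr(forbidden, uc) != nullptr;
        if (uc == '%') {
            // '%' must be followed by exactly two hexadecimal digits.
            bad = i + 2 >= s.size()
                  || !std::isxdigit(static_cast<unsigned char>(s[i + 1]))
                  || !std::isxdigit(static_cast<unsigned char>(s[i + 2]));
        }
        if (bad)
            return static_cast<int>(i);
    }
    return -1;
}
```
*/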

#if 0
inline void QUrlPrivate::validate() const
{
    QUrlPrivate *that = (QUrlPrivate *)this;
    that->encodedOriginal = that->toEncoded(); // may detach
    parse(ParseOnly);

    QURL_SETFLAG(that->stateFlags, Validated);

    if (!isValid)
        return;

    QString auth = authority(); // causes the non-encoded forms to be valid

    // authority() calls canonicalHost() which sets this
    if (!isHostValid)
        return;

    if (scheme == QLatin1String("mailto")) {
        if (!host.isEmpty() || port != -1 || !userName.isEmpty() || !password.isEmpty()) {
            that->isValid = false;
            that->errorInfo.setParams(0, QT_TRANSLATE_NOOP(QUrl, "expected empty host, username, "
                                                           "port and password"),
                                      0, 0);
        }
    } else if (scheme == ftpScheme() || scheme == httpScheme()) {
        if (host.isEmpty() && !(path.isEmpty() && encodedPath.isEmpty())) {
            that->isValid = false;
            that->errorInfo.setParams(0, QT_TRANSLATE_NOOP(QUrl, "the host is empty, but not the path"),
                                      0, 0);
        }
    }
}
#endif

/*!
    \macro QT_NO_URL_CAST_FROM_STRING
    \relates QUrl

    Disables automatic conversions from QString (or char *) to QUrl.

    Compiling your code with this define is useful when you have a lot of
    code that uses QString for file names and you wish to convert it to
    use QUrl for network transparency. In any code that uses QUrl, it can
    help avoid missing QUrl::resolved() calls, and other misuses of
    QString to QUrl conversions.

    \oldcode
        url = filename; // probably not what you want
    \newcode
        url = QUrl::fromLocalFile(filename);
        url = baseurl.resolved(QUrl(filename));
    \endcode

    \sa QT_NO_CAST_FROM_ASCII
*/


/*!
    Constructs a URL by parsing \a url. QUrl will automatically percent-encode
    all characters that are not allowed in a URL and decode the percent-encoded
    sequences that represent an unreserved character (letters, digits, hyphens,
    underscores, dots and tildes). All other characters are left in their
    original forms.

    Parses the \a url using the parser mode \a parsingMode. In TolerantMode
    (the default), QUrl will correct certain mistakes, notably the presence of
    a percent character ('%') not followed by two hexadecimal digits, and it
    will accept any character in any position. In StrictMode, encoding mistakes
    will not be tolerated and QUrl will also check that certain forbidden
    characters are not present in unencoded form. If an error is detected in
    StrictMode, isValid() will return false. The parsing mode DecodedMode is not
    permitted in this context and will produce a run-time warning.

    Example:

    \snippet code/src_corelib_io_qurl.cpp 0

    To construct a URL from an encoded string, you can also use fromEncoded():

    \snippet code/src_corelib_io_qurl.cpp 1

    Both functions are equivalent and, in Qt 5, both functions accept encoded
    data. Usually, the choice of the QUrl constructor or setUrl() versus
    fromEncoded() will depend on the source data: the constructor and setUrl()
    take a QString, whereas fromEncoded() takes a QByteArray.

    \sa setUrl(), fromEncoded(), TolerantMode
*/
QUrl::QUrl(const QString &url, ParsingMode parsingMode) : d(nullptr)
{
    setUrl(url, parsingMode);
}
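
/*
    Illustrative sketch, not part of QUrl: the unreserved-character
    normalization described in the constructor documentation, decoding "%XX"
    only when it encodes a letter, digit, '-', '.', '_' or '~' and leaving every
    other escape untouched. QUrl's real parser also encodes disallowed
    characters, which this sketch does not attempt; decodeUnreserved() is a
    hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <string>

std::string decodeUnreserved(const std::string &in)
{
    auto hexValue = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                const char c = static_cast<char>(hi * 16 + lo);
                // RFC 3986 unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
                if (std::isalnum(static_cast<unsigned char>(c))
                        || c == '-' || c == '.' || c == '_' || c == '~') {
                    out += c;
                    i += 2;
                    continue;
                }
            }
        }
        out += in[i];  // keep everything else, including other escapes
    }
    return out;
}
```
*/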

/*!
    Constructs an empty QUrl object.
*/
QUrl::QUrl() : d(nullptr)
{
}

/*!
    Constructs a copy of \a other.
*/
QUrl::QUrl(const QUrl &other) : d(other.d)
{
    if (d)
        d->ref.ref();
}

/*!
    Destructor; called immediately before the object is deleted.
*/
QUrl::~QUrl()
{
    if (d && !d->ref.deref())
        delete d;
}

/*!
    Returns \c true if the URL is non-empty and valid; otherwise returns \c false.

    The URL is run through a conformance test. Every part of the URL
    must conform to the standard encoding rules of the URI standard
    for the URL to be reported as valid.

    \snippet code/src_corelib_io_qurl.cpp 2
*/
bool QUrl::isValid() const
{
    if (isEmpty()) {
        // also catches d == nullptr
        return false;
    }
    return d->validityError() == QUrlPrivate::NoError;
}

/*!
    Returns \c true if the URL has no data; otherwise returns \c false.

    \sa clear()
*/
bool QUrl::isEmpty() const
{
    if (!d) return true;
    return d->isEmpty();
}

/*!
    Resets the content of the QUrl. After calling this function, the
    QUrl is equal to one that has been constructed with the default
    empty constructor.

    \sa isEmpty()
*/
void QUrl::clear()
{
    if (d && !d->ref.deref())
        delete d;
    d = nullptr;
}

/*!
    Parses \a url and sets this object to that value. QUrl will automatically
    percent-encode all characters that are not allowed in a URL and decode the
    percent-encoded sequences that represent an unreserved character (letters,
    digits, hyphens, underscores, dots and tildes). All other characters are
    left in their original forms.

    Parses the \a url using the parser mode \a parsingMode. In TolerantMode
    (the default), QUrl will correct certain mistakes, notably the presence of
    a percent character ('%') not followed by two hexadecimal digits, and it
    will accept any character in any position. In StrictMode, encoding mistakes
    will not be tolerated and QUrl will also check that certain forbidden
    characters are not present in unencoded form. If an error is detected in
    StrictMode, isValid() will return false. The parsing mode DecodedMode is
    not permitted in this context and will produce a run-time warning.

    \sa url(), toString()
*/
void QUrl::setUrl(const QString &url, ParsingMode parsingMode)
{
    if (parsingMode == DecodedMode) {
        qWarning("QUrl: QUrl::DecodedMode is not permitted when parsing a full URL");
    } else {
        detach();
        d->parse(url, parsingMode);
    }
}
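
/*
    Illustrative sketch, not part of QUrl: the TolerantMode correction of stray
    '%' characters. It assumes the behavior described in the QUrl::ParsingMode
    documentation, where a stray '%' is replaced by "%25" and a single stray
    '%' switches that correction on for every '%' in the input. Only '%' is
    handled here (spaces and other characters are left alone); the function
    names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True if s[i] is a '%' that is not followed by exactly two hex digits.
static bool isStrayPercent(const std::string &s, std::size_t i)
{
    return s[i] == '%'
           && (i + 2 >= s.size()
               || !std::isxdigit(static_cast<unsigned char>(s[i + 1]))
               || !std::isxdigit(static_cast<unsigned char>(s[i + 2])));
}

// If any '%' is stray, re-encode every '%' as "%25"; otherwise return s.
std::string correctStrayPercents(const std::string &s)
{
    bool anyStray = false;
    for (std::size_t i = 0; i < s.size() && !anyStray; ++i)
        anyStray = isStrayPercent(s, i);
    if (!anyStray)
        return s;
    std::string out;
    for (char c : s) {
        if (c == '%')
            out += "%25";
        else
            out += c;
    }
    return out;
}
```
*/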

/*!
    \fn void QUrl::setEncodedUrl(const QByteArray &encodedUrl, ParsingMode parsingMode)
    \deprecated
    Parses the contents of \a encodedUrl and sets this object to that value.

    \a encodedUrl is assumed to be a URL string in percent-encoded
    form, containing only ASCII characters.

    The parsing mode \a parsingMode is used for parsing \a encodedUrl.

    \obsolete Use setUrl(QString::fromUtf8(encodedUrl), parsingMode)

    \sa setUrl()
*/

/*!
    Sets the scheme of the URL to \a scheme. As a scheme can only
    contain ASCII characters, no conversion or decoding is done on the
    input. It must also start with an ASCII letter.

    The scheme describes the type (or protocol) of the URL. It's
    represented by one or more ASCII characters at the start of the URL.

    A scheme is strictly \l {http://www.ietf.org/rfc/rfc3986.txt} {RFC 3986}-compliant:
        \tt {scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )}

    The following example shows a URL where the scheme is "ftp":

    \image qurl-authority2.png

    To set the scheme, the following call is used:
    \snippet code/src_corelib_io_qurl.cpp 11

    The scheme can also be empty, in which case the URL is interpreted
    as relative.

    \sa scheme(), isRelative()
*/
void QUrl::setScheme(const QString &scheme)
{
    detach();
    d->clearError();
    if (scheme.isEmpty()) {
        // an empty scheme is not allowed, so treat it as removing the scheme
        d->sectionIsPresent &= ~QUrlPrivate::Scheme;
        d->flags &= ~QUrlPrivate::IsLocalFile;
        d->scheme.clear();
    } else {
        d->setScheme(scheme, scheme.length(), /* do set error */ true);
    }
}
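
/*
    Illustrative sketch, not part of QUrl: checking a candidate scheme against
    the RFC 3986 grammar quoted in the setScheme() documentation and folding it
    to lower case, as scheme() documents. normalizedScheme() is a hypothetical
    name; it returns an empty string for an invalid scheme instead of setting an
    error as QUrlPrivate::setScheme() does.

```cpp
#include <cassert>
#include <string>

// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
std::string normalizedScheme(const std::string &scheme)
{
    auto isAlpha = [](char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); };
    auto isDigit = [](char c) { return c >= '0' && c <= '9'; };
    if (scheme.empty() || !isAlpha(scheme[0]))
        return std::string();
    std::string out;
    for (char c : scheme) {
        if (!isAlpha(c) && !isDigit(c) && c != '+' && c != '-' && c != '.')
            return std::string();
        // Schemes are case-insensitive; fold ASCII upper case to lower case.
        out += (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
    }
    return out;
}
```
*/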

/*!
    Returns the scheme of the URL. If an empty string is returned,
    this means the scheme is undefined and the URL is then relative.

    The scheme can only contain US-ASCII letters, digits, '+', '-' and '.',
    which means it cannot contain any character that would otherwise require
    encoding. Additionally, schemes are always returned in lowercase form.

    \sa setScheme(), isRelative()
*/
QString QUrl::scheme() const
{
    if (!d) return QString();

    return d->scheme;
}

/*!
    Sets the authority of the URL to \a authority.

    The authority of a URL is the combination of user info, a host
    name and a port. All of these elements are optional; an empty
    authority is therefore valid.

    The user info and host are separated by a '@', and the host and
    port are separated by a ':'. If the user info is empty, the '@'
    must be omitted; a stray ':' is permitted, however, if the port is
    empty.

    The following example shows a valid authority string:

    \image qurl-authority.png

    The \a authority data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode (the default), all characters are accepted in undecoded form
    and the tolerant parser will correct stray '%' not followed by two hex
    characters.

    This function does not allow \a mode to be QUrl::DecodedMode. To set fully
    decoded data, call setUserName(), setPassword(), setHost() and setPort()
    individually.

    \sa setUserInfo(), setHost(), setPort()
*/
void QUrl::setAuthority(const QString &authority, ParsingMode mode)
{
    detach();
    d->clearError();

    if (mode == DecodedMode) {
        qWarning("QUrl::setAuthority(): QUrl::DecodedMode is not permitted in this function");
        return;
    }

    d->setAuthority(authority, 0, authority.length(), mode);
    if (authority.isNull()) {
        // QUrlPrivate::setAuthority cleared almost everything
        // but it leaves the Host bit set
        d->sectionIsPresent &= ~QUrlPrivate::Authority;
    }
}
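
/*
    Illustrative sketch, not part of QUrl: splitting an authority string into
    user info, host and port as described in the setAuthority() documentation.
    Two choices here are assumptions rather than QUrl's documented behavior:
    the user info ends at the last '@', and a ':' only starts the port when it
    comes after any ']' closing an IPv6 literal. Authority and
    splitAuthority() are hypothetical names.

```cpp
#include <cassert>
#include <string>

struct Authority {
    std::string userInfo, host, port;
    bool hasUserInfo = false;
};

Authority splitAuthority(const std::string &a)
{
    Authority r;
    std::size_t hostStart = 0;
    const std::size_t at = a.rfind('@');
    if (at != std::string::npos) {
        r.userInfo = a.substr(0, at);
        r.hasUserInfo = true;
        hostStart = at + 1;
    }
    std::string hostPort = a.substr(hostStart);
    const std::size_t colon = hostPort.rfind(':');
    const std::size_t bracket = hostPort.rfind(']');
    // Colons inside "[...]" belong to an IPv6 address, not to the port.
    if (colon != std::string::npos && (bracket == std::string::npos || colon > bracket)) {
        r.port = hostPort.substr(colon + 1);  // may be empty: a stray ':'
        hostPort.erase(colon);
    }
    r.host = hostPort;
    return r;
}
```
*/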

/*!
    Returns the authority of the URL if it is defined; otherwise
    an empty string is returned.

    This function returns an unambiguous value, which may contain some
    characters still percent-encoded, plus some control sequences not
    representable in decoded form in QString.

    The \a options argument controls how to format the authority component. The
    value of QUrl::FullyDecoded is not permitted in this function. If you need
    to obtain fully decoded data, call userName(), password(), host() and
    port() individually.

    \sa setAuthority(), userInfo(), userName(), password(), host(), port()
*/
QString QUrl::authority(ComponentFormattingOptions options) const
{
    QString result;
    if (!d)
        return result;

    if (options == QUrl::FullyDecoded) {
        qWarning("QUrl::authority(): QUrl::FullyDecoded is not permitted in this function");
        return result;
    }

    d->appendAuthority(result, options, QUrlPrivate::Authority);
    return result;
}

/*!
    Sets the user info of the URL to \a userInfo. The user info is an
    optional part of the authority of the URL, as described in
    setAuthority().

    The user info consists of a user name and optionally a password,
    separated by a ':'. If the password is empty, the colon must be
    omitted. The following example shows a valid user info string:

    \image qurl-authority3.png

    The \a userInfo data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode (the default), all characters are accepted in undecoded form
    and the tolerant parser will correct stray '%' not followed by two hex
    characters.

    This function does not allow \a mode to be QUrl::DecodedMode. To set fully
    decoded data, call setUserName() and setPassword() individually.

    \sa userInfo(), setUserName(), setPassword(), setAuthority()
*/
void QUrl::setUserInfo(const QString &userInfo, ParsingMode mode)
{
    detach();
    d->clearError();
    QString trimmed = userInfo.trimmed();
    if (mode == DecodedMode) {
        qWarning("QUrl::setUserInfo(): QUrl::DecodedMode is not permitted in this function");
        return;
    }

    d->setUserInfo(trimmed, 0, trimmed.length());
    if (userInfo.isNull()) {
        // QUrlPrivate::setUserInfo cleared almost everything
        // but it leaves the UserName bit set
        d->sectionIsPresent &= ~QUrlPrivate::UserInfo;
    } else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::UserInfo, userInfo)) {
        d->sectionIsPresent &= ~QUrlPrivate::UserInfo;
        d->userName.clear();
        d->password.clear();
    }
}
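
/*
    Illustrative sketch, not part of QUrl: splitting user info into a user name
    and a password at the first ':'. That matches validateComponent(), which
    allows ':' in the UserInfo section and reports InvalidPasswordError only
    when a ':' precedes the offending character. UserInfo and splitUserInfo()
    are hypothetical names.

```cpp
#include <cassert>
#include <string>

struct UserInfo {
    std::string userName, password;
    bool hasPassword = false;
};

// A user name cannot contain an unencoded ':', so later colons belong to the
// password.
UserInfo splitUserInfo(const std::string &info)
{
    UserInfo r;
    const std::size_t colon = info.find(':');
    if (colon == std::string::npos) {
        r.userName = info;
    } else {
        r.userName = info.substr(0, colon);
        r.password = info.substr(colon + 1);
        r.hasPassword = true;
    }
    return r;
}
```
*/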

/*!
    Returns the user info of the URL, or an empty string if the user
    info is undefined.

    This function returns an unambiguous value, which may contain some
    characters still percent-encoded, plus some control sequences not
    representable in decoded form in QString.

    The \a options argument controls how to format the user info component. The
    value of QUrl::FullyDecoded is not permitted in this function. If you need
    to obtain fully decoded data, call userName() and password() individually.

    \sa setUserInfo(), userName(), password(), authority()
*/
QString QUrl::userInfo(ComponentFormattingOptions options) const
{
    QString result;
    if (!d)
        return result;

    if (options == QUrl::FullyDecoded) {
        qWarning("QUrl::userInfo(): QUrl::FullyDecoded is not permitted in this function");
        return result;
    }

    d->appendUserInfo(result, options, QUrlPrivate::UserInfo);
    return result;
}

/*!
    Sets the URL's user name to \a userName. The \a userName is part
    of the user info element in the authority of the URL, as described
    in setUserInfo().

    The \a userName data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode (the default), all characters are accepted in undecoded form
    and the tolerant parser will correct stray '%' not followed by two hex
    characters. In DecodedMode, '%' characters stand for themselves and encoded
    characters are not possible.

    QUrl::DecodedMode should be used when setting the user name from a data
    source which is not a URL, such as a login dialog shown to the user, or
    when the user name was obtained by calling userName() with the
    QUrl::FullyDecoded formatting option.

    \sa userName(), setUserInfo()
*/
void QUrl::setUserName(const QString &userName, ParsingMode mode)
{
    detach();
    d->clearError();

    QString data = userName;
    if (mode == DecodedMode) {
        parseDecodedComponent(data);
        mode = TolerantMode;
    }

    d->setUserName(data, 0, data.length());
    if (userName.isNull())
        d->sectionIsPresent &= ~QUrlPrivate::UserName;
    else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::UserName, userName))
        d->userName.clear();
}
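
/*
    Illustrative sketch, not part of QUrl: in DecodedMode a '%' stands for
    itself, so it has to be stored as "%25" before the component joins the
    encoded URL. The sketch assumes that escaping '%' is all the preprocessing
    needed (parseDecodedComponent() itself is defined elsewhere in this file)
    and pairs it with a full percent-decoder to show the round trip between
    DecodedMode input and QUrl::FullyDecoded output. escapePercent() and
    percentDecode() are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Escapes every '%' as "%25" so decoded text can be stored in encoded form.
std::string escapePercent(const std::string &decoded)
{
    std::string out;
    for (char c : decoded) {
        if (c == '%')
            out += "%25";
        else
            out += c;
    }
    return out;
}

// Decodes every well-formed "%XX" sequence; malformed ones are copied as-is.
std::string percentDecode(const std::string &encoded)
{
    auto hexValue = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    std::string out;
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        if (encoded[i] == '%' && i + 2 < encoded.size()
                && hexValue(encoded[i + 1]) >= 0 && hexValue(encoded[i + 2]) >= 0) {
            out += static_cast<char>(hexValue(encoded[i + 1]) * 16 + hexValue(encoded[i + 2]));
            i += 2;
        } else {
            out += encoded[i];
        }
    }
    return out;
}
```
*/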

/*!
    Returns the user name of the URL if it is defined; otherwise
    an empty string is returned.

    The \a options argument controls how to format the user name component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain some percent-encoded sequences for some control sequences not
    representable in decoded form in QString.

    Note that QUrl::FullyDecoded may cause data loss if those non-representable
    sequences are present. It is recommended to use that value when the result
    will be used in a non-URL context, such as setting it in QAuthenticator or
    negotiating a login.

    \sa setUserName(), userInfo()
*/
QString QUrl::userName(ComponentFormattingOptions options) const
{
    QString result;
    if (d)
        d->appendUserName(result, options);
    return result;
}

/*!
    \fn void QUrl::setEncodedUserName(const QByteArray &userName)
    \deprecated
    \since 4.4

    Sets the URL's user name to the percent-encoded \a userName. The \a
    userName is part of the user info element in the authority of the
    URL, as described in setUserInfo().

    \obsolete Use setUserName(QString::fromUtf8(userName))

    \sa setUserName(), encodedUserName(), setUserInfo()
*/

/*!
    \fn QByteArray QUrl::encodedUserName() const
    \deprecated
    \since 4.4

    Returns the user name of the URL if it is defined; otherwise
    an empty string is returned. The returned value will have its
    non-ASCII and other control characters percent-encoded, as in
    toEncoded().

    \obsolete Use userName(QUrl::FullyEncoded).toLatin1()

    \sa setEncodedUserName()
*/
2256 
2257 /*!
2258     Sets the URL's password to \a password. The \a password is part of
2259     the user info element in the authority of the URL, as described in
2260     setUserInfo().
2261 
2262     The \a password data is interpreted according to \a mode: in StrictMode,
2263     any '%' characters must be followed by exactly two hexadecimal characters
2264     and some characters (including space) are not allowed in undecoded form. In
2265     TolerantMode, all characters are accepted in undecoded form and the
2266     tolerant parser will correct stray '%' not followed by two hex characters.
2267     In DecodedMode, '%' stand for themselves and encoded characters are not
2268     possible.
2269 
2270     QUrl::DecodedMode should be used when setting the password from a data
2271     source which is not a URL, such as a password dialog shown to the user or
2272     with a password obtained by calling password() with the QUrl::FullyDecoded
2273     formatting option.
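
    The StrictMode and TolerantMode rules for '%' can be illustrated with a
    small standalone sketch on plain std::string values. It is not QUrl's
    implementation, the function names are hypothetical, and it only models
    the '%' handling (StrictMode also rejects other characters, such as
    spaces). The tolerant correction is assumed to re-encode a stray '%' as
    "%25", which is how the QUrl::TolerantMode documentation describes it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if s[i] and s[i + 1] both exist and are hex digits.
static bool hexPairAt(const std::string &s, std::size_t i)
{
    return i + 1 < s.size()
        && std::isxdigit(static_cast<unsigned char>(s[i]))
        && std::isxdigit(static_cast<unsigned char>(s[i + 1]));
}

// StrictMode-style check (sketch): every '%' must start a "%XX" escape.
bool percentEscapesAreStrict(const std::string &s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && !hexPairAt(s, i + 1))
            return false;
    }
    return true;
}

// TolerantMode-style correction (sketch): a stray '%' becomes "%25",
// while well-formed "%XX" escapes are copied unchanged.
std::string fixStrayPercents(const std::string &s)
{
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && !hexPairAt(s, i + 1))
            out += "%25";
        else
            out += s[i];
    }
    return out;
}
```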

    \sa password(), setUserInfo()
*/
void QUrl::setPassword(const QString &password, ParsingMode mode)
{
    detach();
    d->clearError();

    QString data = password;
    if (mode == DecodedMode) {
        parseDecodedComponent(data);
        mode = TolerantMode;
    }

    d->setPassword(data, 0, data.length());
    if (password.isNull())
        d->sectionIsPresent &= ~QUrlPrivate::Password;
    else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Password, password))
        d->password.clear();
}

/*!
    Returns the password of the URL if it is defined; otherwise
    an empty string is returned.

    The \a options argument controls how to format the password component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain some percent-encoded sequences for control sequences that are not
    representable in decoded form in QString.

    Note that QUrl::FullyDecoded may cause data loss if those non-representable
    sequences are present. It is recommended to use that value when the result
    will be used in a non-URL context, such as when passing it to QAuthenticator
    or negotiating a login.
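
    The data loss mentioned above occurs when percent-decoding produces bytes
    that are not valid UTF-8 and therefore have no exact QString
    representation. As a rough standalone sketch (hypothetical names; the
    UTF-8 check only verifies lead and continuation bytes and does not reject
    overlong forms or surrogates), a component can be decoded to raw bytes
    and tested like this:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of a hex digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Replaces each well-formed "%XX" escape with the byte it encodes;
// anything else (including a stray '%') is copied as-is.
std::string percentDecodeBytes(const std::string &s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size()
                && hexValue(s[i + 1]) >= 0 && hexValue(s[i + 2]) >= 0) {
            out += static_cast<char>(hexValue(s[i + 1]) * 16 + hexValue(s[i + 2]));
            i += 2;
        } else {
            out += s[i];
        }
    }
    return out;
}

// Simplified structural UTF-8 check: each lead byte must be followed by
// the right number of 10xxxxxx continuation bytes.
bool looksLikeUtf8(const std::string &bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;
        if (b < 0x80)
            extra = 0;
        else if ((b & 0xE0) == 0xC0)
            extra = 1;
        else if ((b & 0xF0) == 0xE0)
            extra = 2;
        else if ((b & 0xF8) == 0xF0)
            extra = 3;
        else
            return false;            // continuation byte or 0xF8-0xFF as lead
        if (bytes.size() - i <= extra)
            return false;            // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((static_cast<unsigned char>(bytes[i + k]) & 0xC0) != 0x80)
                return false;        // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```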

    \sa setPassword()
*/
QString QUrl::password(ComponentFormattingOptions options) const
{
    QString result;
    if (d)
        d->appendPassword(result, options);
    return result;
}

/*!
    \fn void QUrl::setEncodedPassword(const QByteArray &password)
    \deprecated
    \since 4.4

    Sets the URL's password to the percent-encoded \a password. The \a
    password is part of the user info element in the authority of the
    URL, as described in setUserInfo().

    \obsolete Use setPassword(QString::fromUtf8(password)).

    \sa setPassword(), encodedPassword(), setUserInfo()
*/

/*!
    \fn QByteArray QUrl::encodedPassword() const
    \deprecated
    \since 4.4

    Returns the password of the URL if it is defined; otherwise an
    empty string is returned. The returned value will have its
    non-ASCII and control characters percent-encoded, as in
    toEncoded().

    \obsolete Use password(QUrl::FullyEncoded).toLatin1().

    \sa setEncodedPassword(), toEncoded()
*/

/*!
    Sets the host of the URL to \a host. The host is part of the
    authority of the URL, as described in setAuthority().

    The \a host data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode, all characters are accepted in undecoded form and the
    tolerant parser will correct any stray '%' not followed by two hex
    characters. In DecodedMode, '%' characters stand for themselves and encoded
    characters are not possible.

    Note that, in all cases, the result of the parsing must be a valid hostname
    according to STD 3 rules, as modified by the Internationalized Resource
    Identifiers specification (RFC 3987). Invalid hostnames are not permitted
    and will cause isValid() to become false.
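
    The implementation of setHost() retries a failed host with '[' and ']'
    added, so that an IPv6 or IPvFuture literal can be passed without
    brackets, and reports an IPv6 error if the retry also fails and the input
    contains ':'. The following sketch mirrors only that control flow. The
    parse callback is a hypothetical stand-in for the real hostname parser,
    and the generic error code is a simplification (QUrl keeps the error
    reported by its own parser).

```cpp
#include <cassert>
#include <functional>
#include <string>

enum class HostError { None, InvalidHost, InvalidIPv6Address };

// Sketch of the retry logic in setHost(): `parse` is a hypothetical parser
// that accepts bare hostnames or bracketed IP literals.
HostError setHostSketch(std::string host,
                        const std::function<bool(const std::string &)> &parse)
{
    if (parse(host))
        return HostError::None;
    if (!host.empty() && host.front() == '[')
        return HostError::InvalidHost;        // already bracketed: no retry
    host = "[" + host + "]";                  // maybe an unbracketed IP literal
    if (parse(host))
        return HostError::None;
    return host.find(':') != std::string::npos ? HostError::InvalidIPv6Address
                                               : HostError::InvalidHost;
}
```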

    \sa host(), setAuthority()
*/
void QUrl::setHost(const QString &host, ParsingMode mode)
{
    detach();
    d->clearError();

    QString data = host;
    if (mode == DecodedMode) {
        parseDecodedComponent(data);
        mode = TolerantMode;
    }

    if (d->setHost(data, 0, data.length(), mode)) {
        if (host.isNull())
            d->sectionIsPresent &= ~QUrlPrivate::Host;
    } else if (!data.startsWith(QLatin1Char('['))) {
        // setHost failed; it might be an IPv6 or IPvFuture address in need of bracketing
        Q_ASSERT(d->error);

        data.prepend(QLatin1Char('['));
        data.append(QLatin1Char(']'));
        if (!d->setHost(data, 0, data.length(), mode)) {
            // failed again
            if (data.contains(QLatin1Char(':'))) {
                // source data contains ':', so it's an IPv6 error
                d->error->code = QUrlPrivate::InvalidIPv6AddressError;
            }
        } else {
            // succeeded
            d->clearError();
        }
    }
}

/*!
    Returns the host of the URL if it is defined; otherwise
    an empty string is returned.

    The \a options argument controls how the hostname will be formatted. The
    QUrl::EncodeUnicode option will cause this function to return the hostname
    in the ASCII-Compatible Encoding (ACE) form, which is suitable for use in
    channels that are not 8-bit clean or that require the legacy hostname (such
    as in DNS requests or HTTP request headers). If that flag is not present,
    this function returns the International Domain Name (IDN) in Unicode form,
    according to the list of permissible top-level domains (see
    idnWhitelist()).

    All other flags are ignored. Host names cannot contain control or percent
    characters, so the returned value can be considered fully decoded.

    \sa setHost(), idnWhitelist(), setIdnWhitelist(), authority()
*/
QString QUrl::host(ComponentFormattingOptions options) const
{
    QString result;
    if (d) {
        d->appendHost(result, options);
        if (result.startsWith(QLatin1Char('[')))
            result = result.mid(1, result.length() - 2);
    }
    return result;
}

/*!
    \fn void QUrl::setEncodedHost(const QByteArray &host)
    \deprecated
    \since 4.4

    Sets the URL's host to the ACE- or percent-encoded \a host. The \a
    host is part of the authority of the URL, as described in
    setAuthority().

    \obsolete Use setHost(QString::fromUtf8(host)).

    \sa setHost(), encodedHost(), setAuthority(), fromAce()
*/

/*!
    \fn QByteArray QUrl::encodedHost() const
    \deprecated
    \since 4.4

    Returns the host part of the URL if it is defined; otherwise
    an empty string is returned.

    Note: encodedHost() does not return percent-encoded hostnames. Instead,
    the ACE-encoded (bare ASCII in Punycode encoding) form will be
    returned for any non-ASCII hostname.

    This function is equivalent to calling QUrl::toAce() on the return
    value of host().

    \obsolete Use host(QUrl::FullyEncoded).toLatin1() or toAce(host()).

    \sa setEncodedHost()
*/

/*!
    Sets the port of the URL to \a port. The port is part of the
    authority of the URL, as described in setAuthority().

    \a port must be between 0 and 65535 inclusive. Setting the
    port to -1 indicates that the port is unspecified. Any other value
    is rejected: the port is reset to -1 and an error is recorded, so
    isValid() returns false.
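
    The range check can be sketched in isolation as follows (hypothetical
    names, mirroring the check in this function's implementation): a value
    outside -1 to 65535 is reported as an error and replaced by -1, and a
    stored -1 makes port() return the default value passed to it.

```cpp
#include <cassert>

struct PortResult {
    int port;      // -1 means "unspecified"
    bool error;    // true if the requested value was out of range
};

// Accepts -1 (unspecified) or 0 to 65535; anything else is rejected and
// replaced by -1, mirroring the check in QUrl::setPort().
PortResult checkPort(int port)
{
    if (port < -1 || port > 65535)
        return {-1, true};
    return {port, false};
}

// Counterpart of port(defaultPort): fall back when no port is stored.
int portOr(int storedPort, int defaultPort)
{
    return storedPort == -1 ? defaultPort : storedPort;
}
```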
*/
void QUrl::setPort(int port)
{
    detach();
    d->clearError();

    if (port < -1 || port > 65535) {
        d->setError(QUrlPrivate::InvalidPortError, QString::number(port), 0);
        port = -1;
    }

    d->port = port;
    if (port != -1)
        d->sectionIsPresent |= QUrlPrivate::Host;
}

/*!
    \since 4.1

    Returns the port of the URL, or \a defaultPort if the port is
    unspecified.

    Example:

    \snippet code/src_corelib_io_qurl.cpp 3
*/
int QUrl::port(int defaultPort) const
{
    if (!d) return defaultPort;
    return d->port == -1 ? defaultPort : d->port;
}

/*!
    Sets the path of the URL to \a path. The path is the part of the
    URL that comes after the authority but before the query string.

    \image qurl-ftppath.png

    For non-hierarchical schemes, the path will be everything
    following the scheme declaration, as in the following example:

    \image qurl-mailtopath.png

    The \a path data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode, all characters are accepted in undecoded form and the
    tolerant parser will correct any stray '%' not followed by two hex
    characters. In DecodedMode, '%' characters stand for themselves and encoded
    characters are not possible.

    QUrl::DecodedMode should be used when setting the path from a data source
    that is not a URL, such as a dialog shown to the user, or with a path
    obtained by calling path() with the QUrl::FullyDecoded formatting option.
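
    QUrl's implementation handles DecodedMode by pre-processing the input
    with an internal helper and then parsing it in TolerantMode. Assuming that
    helper escapes every literal '%' as "%25" (so that none of them can start
    an escape sequence), the idea can be sketched as follows; the function
    name is hypothetical:

```cpp
#include <cassert>
#include <string>

// Escapes each literal '%' as "%25" so that the result can be handed to a
// tolerant parser without any '%' being read as an escape sequence.
std::string escapePercentForDecodedMode(const std::string &decoded)
{
    std::string out;
    out.reserve(decoded.size());
    for (char c : decoded) {
        if (c == '%')
            out += "%25";
        else
            out += c;
    }
    return out;
}
```

    In this model, a path such as "/a%20b" set in DecodedMode keeps a literal
    percent sign followed by "20" instead of being turned into a space.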

    \sa path()
*/
void QUrl::setPath(const QString &path, ParsingMode mode)
{
    detach();
    d->clearError();

    QString data = path;
    if (mode == DecodedMode) {
        parseDecodedComponent(data);
        mode = TolerantMode;
    }

    d->setPath(data, 0, data.length());

    // optimized out, since there is no path delimiter
//    if (path.isNull())
//        d->sectionIsPresent &= ~QUrlPrivate::Path;
//    else
    if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Path, path))
        d->path.clear();
}

/*!
    Returns the path of the URL.

    \snippet code/src_corelib_io_qurl.cpp 12

    The \a options argument controls how to format the path component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain some percent-encoded sequences for control sequences that are not
    representable in decoded form in QString.

    Note that QUrl::FullyDecoded may cause data loss if those non-representable
    sequences are present. It is recommended to use that value when the result
    will be used in a non-URL context, such as sending to an FTP server.

    An example of data loss is when the path contains percent-encoded sequences
    that do not form valid UTF-8 and you use FullyDecoded (the default):

    \snippet code/src_corelib_io_qurl.cpp 13

    In this example, there will be some level of data loss because \c %FF cannot
    be converted.

    Data loss can also occur when the path contains sub-delimiters (such as \c +):

    \snippet code/src_corelib_io_qurl.cpp 14

    Other decoding examples:

    \snippet code/src_corelib_io_qurl.cpp 15

    \sa setPath()
*/
QString QUrl::path(ComponentFormattingOptions options) const
{
    QString result;
    if (d)
        d->appendPath(result, options, QUrlPrivate::Path);
    return result;
}

/*!
    \fn void QUrl::setEncodedPath(const QByteArray &path)
    \deprecated
    \since 4.4

    Sets the URL's path to the percent-encoded \a path. The path is
    the part of the URL that comes after the authority but before the
    query string.

    \image qurl-ftppath.png

    For non-hierarchical schemes, the path will be everything
    following the scheme declaration, as in the following example:

    \image qurl-mailtopath.png

    \obsolete Use setPath(QString::fromUtf8(path)).

    \sa setPath(), encodedPath(), setUserInfo()
*/

/*!
    \fn QByteArray QUrl::encodedPath() const
    \deprecated
    \since 4.4

    Returns the path of the URL if it is defined; otherwise an
    empty string is returned. The returned value will have its
    non-ASCII and control characters percent-encoded, as in
    toEncoded().

    \obsolete Use path(QUrl::FullyEncoded).toLatin1().

    \sa setEncodedPath(), toEncoded()
*/

/*!
    \since 5.2

    Returns the name of the file, excluding the directory path.

    Note that if this QUrl object is given a path ending in a slash, the file
    name is considered empty.

    If the path does not contain any slash, the entire path is returned as the
    file name.

    Example:

    \snippet code/src_corelib_io_qurl.cpp 7

    The \a options argument controls how to format the file name component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain some percent-encoded sequences for control sequences that are not
    representable in decoded form in QString.
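
    The rule above amounts to taking everything after the last '/' of the
    path. A standalone sketch of that rule on a plain std::string (the
    function name is hypothetical; fileName() applies the same idea to the
    result of path(options)):

```cpp
#include <cassert>
#include <string>

// Returns the part of `path` after its last '/', or the whole path if it
// contains no slash. A trailing slash therefore yields an empty name.
std::string fileNameOf(const std::string &path)
{
    const std::string::size_type slash = path.rfind('/');
    if (slash == std::string::npos)
        return path;
    return path.substr(slash + 1);
}
```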

    \sa path()
*/
QString QUrl::fileName(ComponentFormattingOptions options) const
{
    const QString ourPath = path(options);
    const int slash = ourPath.lastIndexOf(QLatin1Char('/'));
    if (slash == -1)
        return ourPath;
    return ourPath.mid(slash + 1);
}

/*!
    \since 4.2

    Returns \c true if this URL contains a query (i.e., if a '?' was seen in it).

    \sa setQuery(), query(), hasFragment()
*/
bool QUrl::hasQuery() const
{
    if (!d) return false;
    return d->hasQuery();
}

/*!
    Sets the query string of the URL to \a query.

    This function is useful if you need to pass a query string that
    does not fit into the key-value pattern, or that uses a different
    scheme for encoding special characters than the one QUrl suggests.

    Passing a value of QString() to \a query (a null QString) unsets
    the query completely. However, passing a value of QString("")
    will set the query to an empty value, as if the original URL
    had a lone "?".
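
    The difference between a null and an empty query can be modelled with
    std::optional: an absent value means that no '?' is written, while an
    empty string produces a lone '?'. This is a hypothetical serialization
    sketch, not QUrl code:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Appends the query section to `base`: nothing when the query is absent
// (null), a lone "?" when it is present but empty.
std::string withQuery(const std::string &base,
                      const std::optional<std::string> &query)
{
    if (!query)
        return base;
    return base + "?" + *query;
}
```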

    The \a query data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode, all characters are accepted in undecoded form and the
    tolerant parser will correct any stray '%' not followed by two hex
    characters. In DecodedMode, '%' characters stand for themselves and encoded
    characters are not possible.

    Query strings often contain percent-encoded sequences, so use of
    DecodedMode is discouraged. One special sequence to be aware of is that of
    the plus character ('+'). QUrl does not convert spaces to plus characters,
    even though HTML forms posted by web browsers do. In order to represent an
    actual plus character in a query, the sequence "%2B" is usually used. This
    function will leave "%2B" sequences untouched in TolerantMode or
    StrictMode.
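
    When a query is assembled by hand, a literal '+' in a value should
    therefore be written as "%2B" if the receiver follows the HTML form
    convention of reading '+' as a space. The sketch below (a hypothetical
    helper that assumes '&' and '=' as delimiters) percent-encodes only the
    handful of characters that would otherwise be ambiguous in such a query;
    a complete encoder would also have to handle non-ASCII bytes and the
    other reserved characters.

```cpp
#include <cassert>
#include <string>

// Percent-encodes '%', '+', '&', '=', '#' and space so that a value can be
// embedded in a "key=value&key=value" query without being misread.
std::string encodeQueryValue(const std::string &value)
{
    std::string out;
    for (char c : value) {
        switch (c) {
        case '%': out += "%25"; break;
        case '+': out += "%2B"; break;
        case '&': out += "%26"; break;
        case '=': out += "%3D"; break;
        case '#': out += "%23"; break;
        case ' ': out += "%20"; break;
        default:  out += c;     break;
        }
    }
    return out;
}
```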

    \sa query(), hasQuery()
*/
void QUrl::setQuery(const QString &query, ParsingMode mode)
{
    detach();
    d->clearError();

    QString data = query;
    if (mode == DecodedMode) {
        parseDecodedComponent(data);
        mode = TolerantMode;
    }

    d->setQuery(data, 0, data.length());
    if (query.isNull())
        d->sectionIsPresent &= ~QUrlPrivate::Query;
    else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Query, query))
        d->query.clear();
}

/*!
    \fn void QUrl::setEncodedQuery(const QByteArray &query)
    \deprecated

    Sets the query string of the URL to \a query. The string is
    inserted as-is, and no further encoding is performed when calling
    toEncoded().

    This function is useful if you need to pass a query string that
    does not fit into the key-value pattern, or that uses a different
    scheme for encoding special characters than the one QUrl suggests.

    Passing a value of QByteArray() to \a query (a null QByteArray) unsets
    the query completely. However, passing a value of QByteArray("")
    will set the query to an empty value, as if the original URL
    had a lone "?".

    \obsolete Use setQuery(), which has the same null / empty behavior.

    \sa encodedQuery(), hasQuery()
*/

/*!
    \overload
    \since 5.0
    Sets the query string of the URL to \a query.

    This function reconstructs the query string from the QUrlQuery object and
    sets it on this QUrl object. This function does not have parsing parameters
    because the QUrlQuery contains data that is already parsed. If \a query is
    empty, the query is unset, as if a null QString had been passed to the
    other overload.

    \sa query(), hasQuery()
*/
void QUrl::setQuery(const QUrlQuery &query)
{
    detach();
    d->clearError();

    // we know the data is in the right format
    d->query = query.toString();
    if (query.isEmpty())
        d->sectionIsPresent &= ~QUrlPrivate::Query;
    else
        d->sectionIsPresent |= QUrlPrivate::Query;
}

/*!
    \fn void QUrl::setQueryItems(const QList<QPair<QString, QString> > &query)
    \deprecated

    Sets the query string of the URL to an encoded version of \a
    query. The contents of \a query are converted to a string
    internally, each pair delimited by the character returned by
    \l {QUrlQuery::queryPairDelimiter()}{queryPairDelimiter()}, and the key and value are delimited by
    \l {QUrlQuery::queryValueDelimiter()}{queryValueDelimiter()}.
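
    Assuming the default delimiters ('&' between pairs and '=' between a key
    and its value, which are QUrlQuery's defaults) and keys and values that
    are already encoded, the joining step described above reduces to the
    following sketch; the names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using QueryItems = std::vector<std::pair<std::string, std::string>>;

// Joins already-encoded key/value pairs into a query string.
std::string joinQueryItems(const QueryItems &items,
                           char pairDelimiter = '&', char valueDelimiter = '=')
{
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i != 0)
            out += pairDelimiter;
        out += items[i].first;
        out += valueDelimiter;
        out += items[i].second;
    }
    return out;
}
```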

    \note This method does not encode spaces (ASCII 0x20) as plus (+) signs,
    like HTML forms do. If you need that kind of encoding, you must encode
    the value yourself and use QUrl::setEncodedQueryItems.

    \obsolete Use QUrlQuery and setQuery().

    \sa queryItems(), setEncodedQueryItems()
*/

/*!
    \fn void QUrl::setEncodedQueryItems(const QList<QPair<QByteArray, QByteArray> > &query)
    \deprecated
    \since 4.4

    Sets the query string of the URL to the encoded version of \a
    query. The contents of \a query are converted to a string
    internally, each pair delimited by the character returned by
    \l {QUrlQuery::queryPairDelimiter()}{queryPairDelimiter()}, and the key and value are delimited by
    \l {QUrlQuery::queryValueDelimiter()}{queryValueDelimiter()}.

    \obsolete Use QUrlQuery and setQuery().

    \sa encodedQueryItems(), setQueryItems()
*/

/*!
    \fn void QUrl::addQueryItem(const QString &key, const QString &value)
    \deprecated

    Inserts the pair \a key = \a value into the query string of the
    URL.

    The key-value pair is encoded before it is added to the query. The
    pair is converted into separate strings internally. The \a key and
    \a value are first encoded into UTF-8 and then delimited by the
    character returned by \l {QUrlQuery::queryValueDelimiter()}{queryValueDelimiter()}.
    Each key-value pair is delimited by the character returned by
    \l {QUrlQuery::queryPairDelimiter()}{queryPairDelimiter()}.

    \note This method does not encode spaces (ASCII 0x20) as plus (+) signs,
    like HTML forms do. If you need that kind of encoding, you must encode
    the value yourself and use QUrl::addEncodedQueryItem.

    \obsolete Use QUrlQuery and setQuery().

    \sa addEncodedQueryItem()
*/

/*!
    \fn void QUrl::addEncodedQueryItem(const QByteArray &key, const QByteArray &value)
    \deprecated
    \since 4.4

    Inserts the pair \a key = \a value into the query string of the
    URL.

    \obsolete Use QUrlQuery and setQuery().

    \sa addQueryItem()
*/

/*!
    \fn QList<QPair<QString, QString> > QUrl::queryItems() const
    \deprecated

    Returns the query string of the URL, as a list of key-value pairs.

    \note This method does not decode plus (+) signs as spaces (ASCII
    0x20), like HTML forms do. If you need that kind of decoding, you must
    use QUrl::encodedQueryItems and decode the data yourself.
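
    One way to do that decoding yourself, sketched under the assumption that
    '&' and '=' are the delimiters, is to split the encoded query into pairs
    and replace each '+' with a space before percent-decoding, so that an
    encoded "%2B" still ends up as a literal '+'. The names are hypothetical
    and the percent-decoding step itself is omitted:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using QueryItems = std::vector<std::pair<std::string, std::string>>;

// Splits "k1=v1&k2=v2" into pairs; a pair without '=' gets an empty value
// and empty segments (as in "a&&b") are skipped.
QueryItems splitQuery(const std::string &query)
{
    QueryItems items;
    std::string::size_type start = 0;
    while (start <= query.size()) {
        std::string::size_type end = query.find('&', start);
        if (end == std::string::npos)
            end = query.size();
        const std::string pair = query.substr(start, end - start);
        if (!pair.empty()) {
            const std::string::size_type eq = pair.find('=');
            if (eq == std::string::npos)
                items.emplace_back(pair, std::string());
            else
                items.emplace_back(pair.substr(0, eq), pair.substr(eq + 1));
        }
        start = end + 1;
    }
    return items;
}

// HTML-form convention: '+' stands for a space. Run this before
// percent-decoding so that an encoded "%2B" still becomes a real '+'.
std::string plusToSpace(std::string s)
{
    for (char &c : s) {
        if (c == '+')
            c = ' ';
    }
    return s;
}
```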

    \obsolete Use QUrlQuery.

    \sa setQueryItems(), setEncodedQuery()
*/

/*!
    \fn QList<QPair<QByteArray, QByteArray> > QUrl::encodedQueryItems() const
    \deprecated
    \since 4.4

    Returns the query string of the URL, as a list of encoded key-value pairs.

    \obsolete Use QUrlQuery.

    \sa setEncodedQueryItems(), setQueryItems(), setEncodedQuery()
*/

/*!
    \fn bool QUrl::hasQueryItem(const QString &key) const
    \deprecated

    Returns \c true if the query string of the URL contains a pair whose
    key is equal to \a key.

    \obsolete Use QUrlQuery.

    \sa hasEncodedQueryItem()
*/

/*!
    \fn bool QUrl::hasEncodedQueryItem(const QByteArray &key) const
    \deprecated
    \since 4.4

    Returns \c true if the query string of the URL contains a pair whose
    key is equal to \a key.

    \obsolete Use QUrlQuery.

    \sa hasQueryItem()
*/

/*!
    \fn QString QUrl::queryItemValue(const QString &key) const
    \deprecated

    Returns the value of the first query string pair in the URL whose key
    is equal to \a key.

    \note This method does not decode plus (+) signs as spaces (ASCII
    0x20), like HTML forms do. If you need that kind of decoding, you must
    use QUrl::encodedQueryItemValue and decode the data yourself.

    \obsolete Use QUrlQuery.

    \sa allQueryItemValues()
*/

/*!
    \fn QByteArray QUrl::encodedQueryItemValue(const QByteArray &key) const
    \deprecated
    \since 4.4

    Returns the value of the first query string pair in the URL whose key
    is equal to \a key.

    \obsolete Use QUrlQuery.

    \sa queryItemValue(), allQueryItemValues()
*/

/*!
    \fn QStringList QUrl::allQueryItemValues(const QString &key) const
    \deprecated

    Returns a list of the values of all query string pairs in the URL whose
    key is equal to \a key.

    \note This method does not decode plus (+) signs as spaces (ASCII
    0x20), like HTML forms do. If you need that kind of decoding, you must
    use QUrl::allEncodedQueryItemValues and decode the data yourself.

    \obsolete Use QUrlQuery.

    \sa queryItemValue()
*/

/*!
    \fn QList<QByteArray> QUrl::allEncodedQueryItemValues(const QByteArray &key) const
    \deprecated
    \since 4.4

    Returns a list of the values of all query string pairs in the URL whose
    key is equal to \a key.

    \obsolete Use QUrlQuery.

    \sa allQueryItemValues(), queryItemValue(), encodedQueryItemValue()
*/

/*!
    \fn void QUrl::removeQueryItem(const QString &key)
    \deprecated

    Removes the first query string pair whose key is equal to \a key
    from the URL.

    \obsolete Use QUrlQuery.

    \sa removeAllQueryItems()
*/

/*!
    \fn void QUrl::removeEncodedQueryItem(const QByteArray &key)
    \deprecated
    \since 4.4

    Removes the first query string pair whose key is equal to \a key
    from the URL.

    \obsolete Use QUrlQuery.

    \sa removeQueryItem(), removeAllQueryItems()
*/

/*!
    \fn void QUrl::removeAllQueryItems(const QString &key)
    \deprecated

    Removes all the query string pairs whose key is equal to \a key
    from the URL.
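
    The functions in this group differ mainly in whether they act on the
    first matching pair or on all of them. Modelling the query as an ordered
    list of pairs (a hypothetical representation, not QUrl's internal one),
    the two removal variants can be sketched as:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using QueryItems = std::vector<std::pair<std::string, std::string>>;

// Like removeQueryItem(): drop only the first pair whose key matches.
void removeFirst(QueryItems &items, const std::string &key)
{
    auto it = std::find_if(items.begin(), items.end(),
                           [&](const auto &p) { return p.first == key; });
    if (it != items.end())
        items.erase(it);
}

// Like removeAllQueryItems(): drop every pair whose key matches.
void removeAll(QueryItems &items, const std::string &key)
{
    items.erase(std::remove_if(items.begin(), items.end(),
                               [&](const auto &p) { return p.first == key; }),
                items.end());
}
```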

    \obsolete Use QUrlQuery.

    \sa removeQueryItem()
*/

/*!
    \fn void QUrl::removeAllEncodedQueryItems(const QByteArray &key)
    \deprecated
    \since 4.4

    Removes all the query string pairs whose key is equal to \a key
    from the URL.

    \obsolete Use QUrlQuery.

    \sa removeQueryItem()
*/

/*!
    \fn QByteArray QUrl::encodedQuery() const
    \deprecated

    Returns the query string of the URL in percent-encoded form.

    \obsolete Use query(QUrl::FullyEncoded).toLatin1().

    \sa setEncodedQuery(), query()
*/

/*!
    Returns the query string of the URL if there's a query string, or an empty
    result if not. To determine if the parsed URL contained a query string, use
    hasQuery().

    The \a options argument controls how to format the query component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain some percent-encoded sequences for control sequences that are not
    representable in decoded form in QString.

    Note that use of QUrl::FullyDecoded in queries is discouraged, as queries
    often contain data that is supposed to remain percent-encoded, including
    the use of the "%2B" sequence to represent a plus character ('+').

    \sa setQuery(), hasQuery()
*/
QString QUrl::query(ComponentFormattingOptions options) const
{
    QString result;
    if (d) {
        d->appendQuery(result, options, QUrlPrivate::Query);
        // a present but empty query is returned as an empty, non-null string
        if (d->hasQuery() && result.isNull())
            result.detach();
    }
    return result;
}

/*!
    Sets the fragment of the URL to \a fragment. The fragment is the
    last part of the URL, represented by a '#' followed by a string of
    characters. It is typically used in HTTP for referring to a
    certain link or point on a page:

    \image qurl-fragment.png

    The fragment is sometimes also referred to as the URL "reference".
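
    Since the fragment runs from the first '#' to the end of the URL, and the
    query lies between the first '?' before that '#' and the '#' itself, both
    can be split off a URL string as in the following simplified sketch
    (hypothetical names, no validation). std::optional separates an absent
    component from an empty one, which is the distinction reported by
    hasQuery() and hasFragment():

```cpp
#include <cassert>
#include <optional>
#include <string>

struct QueryAndFragment {
    std::optional<std::string> query;     // absent if no '?' was seen
    std::optional<std::string> fragment;  // absent if no '#' was seen
};

// Splits the query and fragment off a URL string. Simplified: there is no
// validation, and the scheme/authority/path part is discarded.
QueryAndFragment splitQueryAndFragment(const std::string &url)
{
    QueryAndFragment result;
    std::string rest = url;
    const auto hash = rest.find('#');
    if (hash != std::string::npos) {
        result.fragment = rest.substr(hash + 1);
        rest.erase(hash);                 // a '?' after '#' is fragment data
    }
    const auto question = rest.find('?');
    if (question != std::string::npos)
        result.query = rest.substr(question + 1);
    return result;
}
```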
3043 
3044     Passing an argument of QString() (a null QString) will unset the fragment.
3045     Passing an argument of QString("") (an empty but not null QString) will set the
3046     fragment to an empty string (as if the original URL had a lone "#").
3047 
3048     The \a fragment data is interpreted according to \a mode: in StrictMode,
3049     any '%' characters must be followed by exactly two hexadecimal characters
3050     and some characters (including space) are not allowed in undecoded form. In
3051     TolerantMode, all characters are accepted in undecoded form and the
3052     tolerant parser will correct stray '%' not followed by two hex characters.
3053     In DecodedMode, '%' stand for themselves and encoded characters are not
3054     possible.
3055 
3056     QUrl::DecodedMode should be used when setting the fragment from a data
3057     source which is not a URL or with a fragment obtained by calling
3058     fragment() with the QUrl::FullyDecoded formatting option.
3059 
3060     \sa fragment(), hasFragment()
3061 */
setFragment(const QString & fragment,ParsingMode mode)3062 void QUrl::setFragment(const QString &fragment, ParsingMode mode)
3063 {
3064     detach();
3065     d->clearError();
3066 
3067     QString data = fragment;
3068     if (mode == DecodedMode) {
3069         parseDecodedComponent(data);
3070         mode = TolerantMode;
3071     }
3072 
3073     d->setFragment(data, 0, data.length());
3074     if (fragment.isNull())
3075         d->sectionIsPresent &= ~QUrlPrivate::Fragment;
3076     else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Fragment, fragment))
3077         d->fragment.clear();
3078 }
3079 
3080 /*!
3081     Returns the fragment of the URL. To determine if the parsed URL contained a
3082     fragment, use hasFragment().
3083 
3084     The \a options argument controls how to format the fragment component. All
3085     values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain some percent-encoded sequences for some control sequences not
    representable in decoded form in QString.

    Note that QUrl::FullyDecoded may cause data loss if those non-representable
    sequences are present. It is recommended to use that value when the result
    will be used in a non-URL context.

    \sa setFragment(), hasFragment()
*/
QString QUrl::fragment(ComponentFormattingOptions options) const
{
    QString result;
    if (d) {
        d->appendFragment(result, options, QUrlPrivate::Fragment);
        if (d->hasFragment() && result.isNull())
            result.detach();
    }
    return result;
}

/*!
    \fn void QUrl::setEncodedFragment(const QByteArray &fragment)
    \deprecated
    \since 4.4

    Sets the URL's fragment to the percent-encoded \a fragment. The fragment is the
    last part of the URL, represented by a '#' followed by a string of
    characters. It is typically used in HTTP for referring to a
    certain link or point on a page:

    \image qurl-fragment.png

    The fragment is sometimes also referred to as the URL "reference".

    Passing an argument of QByteArray() (a null QByteArray) will unset the fragment.
    Passing an argument of QByteArray("") (an empty but not null QByteArray)
    will set the fragment to an empty string (as if the original URL
    had a lone "#").

    \obsolete Use setFragment(), which has the same behavior of null / empty.

    \sa setFragment(), encodedFragment()
*/

/*!
    \fn QByteArray QUrl::encodedFragment() const
    \deprecated
    \since 4.4

    Returns the fragment of the URL if it is defined; otherwise an
    empty string is returned. The returned value will have its
    non-ASCII and other control characters percent-encoded, as in
    toEncoded().

    \obsolete Use fragment(QUrl::FullyEncoded).toLatin1().

    \sa setEncodedFragment(), toEncoded()
*/

/*!
    \since 4.2

    Returns \c true if this URL contains a fragment (that is, if a '#'
    was present in it).
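
    The difference between an empty fragment and no fragment can be seen
    with two arbitrary example URLs (the comments show the expected results):

    \code
    QUrl withHash(QStringLiteral("http://example.com/page#"));
    withHash.hasFragment();        // true: a '#' is present
    withHash.fragment().isEmpty(); // true: the fragment itself is empty

    QUrl noHash(QStringLiteral("http://example.com/page"));
    noHash.hasFragment();          // false
    \endcode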

    \sa fragment(), setFragment()
*/
bool QUrl::hasFragment() const
{
    if (!d) return false;
    return d->hasFragment();
}

#if QT_DEPRECATED_SINCE(5, 15)
#if QT_CONFIG(topleveldomain)
/*!
    \since 4.8

    \deprecated

    Returns the TLD (Top-Level Domain) of the URL (for example, .co.uk or .net).
    Note that the return value is prefixed with a '.' unless the
    URL does not contain a valid TLD, in which case the function returns
    an empty string.

    Note that this function considers a TLD to be any domain under which users
    can register subdomains, including many home and dynamic DNS websites and
    blogging providers. This is useful for determining whether two websites
    belong to the same infrastructure and communication should be allowed, such
    as browser cookies: two domains should be considered part of the same
    website if they share at least one label in addition to the value
    returned by this function.

    \list
      \li \c{foo.co.uk} and \c{foo.com} do not share a top-level domain
      \li \c{foo.co.uk} and \c{bar.co.uk} share the \c{.co.uk} domain, but the next label is different
      \li \c{www.foo.co.uk} and \c{ftp.foo.co.uk} share the same top-level domain and one more label,
          so they are considered part of the same site
    \endlist

    If \a options includes EncodeUnicode, the returned string will be in
    ASCII Compatible Encoding.
*/
QString QUrl::topLevelDomain(ComponentFormattingOptions options) const
{
    QString tld = qTopLevelDomain(host());
    if (options & EncodeUnicode) {
        return qt_ACE_do(tld, ToAceOnly, AllowLeadingDot);
    }
    return tld;
}
#endif
#endif // QT_DEPRECATED_SINCE(5, 15)
/*!
    Returns the result of the merge of this URL with \a relative. This
    URL is used as a base to convert \a relative to an absolute URL.

    If \a relative is not a relative URL, this function will return \a
    relative directly. Otherwise, the paths of the two URLs are
    merged, and the new URL returned has the scheme and authority of
    the base URL, but with the merged path, as in the following
    example:

    \snippet code/src_corelib_io_qurl.cpp 5

    Calling resolved() with ".." returns a QUrl whose directory is
    one level higher than the original. Similarly, calling resolved()
    with "../.." removes two levels from the path. If \a relative is
    "/", the path becomes "/".
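
    As an illustration, using an arbitrary base URL whose directory is
    "/a/b/" (the comments show the expected results):

    \code
    QUrl base(QStringLiteral("http://example.com/a/b/c"));
    base.resolved(QUrl(QStringLiteral(".."))).toString();    // "http://example.com/a/"
    base.resolved(QUrl(QStringLiteral("../.."))).toString(); // "http://example.com/"
    base.resolved(QUrl(QStringLiteral("/"))).toString();     // "http://example.com/"
    \endcode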

    \sa isRelative()
*/
QUrl QUrl::resolved(const QUrl &relative) const
{
    if (!d) return relative;
    if (!relative.d) return *this;

    QUrl t;
    if (!relative.d->scheme.isEmpty()) {
        t = relative;
        t.detach();
    } else {
        if (relative.d->hasAuthority()) {
            t = relative;
            t.detach();
        } else {
            t.d = new QUrlPrivate;

            // copy the authority
            t.d->userName = d->userName;
            t.d->password = d->password;
            t.d->host = d->host;
            t.d->port = d->port;
            t.d->sectionIsPresent = d->sectionIsPresent & QUrlPrivate::Authority;

            if (relative.d->path.isEmpty()) {
                t.d->path = d->path;
                if (relative.d->hasQuery()) {
                    t.d->query = relative.d->query;
                    t.d->sectionIsPresent |= QUrlPrivate::Query;
                } else if (d->hasQuery()) {
                    t.d->query = d->query;
                    t.d->sectionIsPresent |= QUrlPrivate::Query;
                }
            } else {
                t.d->path = relative.d->path.startsWith(QLatin1Char('/'))
                            ? relative.d->path
                            : d->mergePaths(relative.d->path);
                if (relative.d->hasQuery()) {
                    t.d->query = relative.d->query;
                    t.d->sectionIsPresent |= QUrlPrivate::Query;
                }
            }
        }
        t.d->scheme = d->scheme;
        if (d->hasScheme())
            t.d->sectionIsPresent |= QUrlPrivate::Scheme;
        else
            t.d->sectionIsPresent &= ~QUrlPrivate::Scheme;
        t.d->flags |= d->flags & QUrlPrivate::IsLocalFile;
    }
    t.d->fragment = relative.d->fragment;
    if (relative.d->hasFragment())
        t.d->sectionIsPresent |= QUrlPrivate::Fragment;
    else
        t.d->sectionIsPresent &= ~QUrlPrivate::Fragment;

    removeDotsFromPath(&t.d->path);

#if defined(QURL_DEBUG)
    qDebug("QUrl(\"%ls\").resolved(\"%ls\") = \"%ls\"",
           qUtf16Printable(url()),
           qUtf16Printable(relative.url()),
           qUtf16Printable(t.url()));
#endif
    return t;
}

/*!
    Returns \c true if the URL is relative; otherwise returns \c false. A URL is
    a relative reference if its scheme is undefined; this function is therefore
    equivalent to calling scheme().isEmpty().

    Relative references are defined in RFC 3986 section 4.2.
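
    For example (arbitrary URLs; the comments show the expected results),
    a network-path reference that names a host but has no scheme is still
    relative:

    \code
    QUrl(QStringLiteral("docs/index.html")).isRelative();          // true
    QUrl(QStringLiteral("//example.com/index.html")).isRelative(); // true
    QUrl(QStringLiteral("https://example.com/")).isRelative();     // false
    \endcode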

    \sa {Relative URLs vs Relative Paths}
*/
bool QUrl::isRelative() const
{
    if (!d) return true;
    return !d->hasScheme();
}

/*!
    Returns a string representation of the URL. The output can be customized by
    passing flags with \a options. The option QUrl::FullyDecoded is not
    permitted in this function since it would generate ambiguous data.

    The resulting QString can be passed back to a QUrl later on.

    Synonym for toString(options).

    \sa FormattingOptions, toEncoded(), toString()
*/
QString QUrl::url(FormattingOptions options) const
{
    return toString(options);
}

/*!
    Returns a string representation of the URL. The output can be customized by
    passing flags with \a options. The option QUrl::FullyDecoded is not
    permitted in this function since it would generate ambiguous data.

    The default formatting option is \l{QUrl::FormattingOptions}{PrettyDecoded}.
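
    The following sketch shows how removal flags can be combined. The URL is
    an arbitrary example, and the comments give the expected results:

    \code
    QUrl url(QStringLiteral("http://user:pass@example.com:8080/path?q=1#frag"));
    url.toString();
    // "http://user:pass@example.com:8080/path?q=1#frag"
    url.toString(QUrl::RemoveUserInfo | QUrl::RemovePort);
    // "http://example.com/path?q=1#frag"
    url.toString(QUrl::RemoveQuery | QUrl::RemoveFragment);
    // "http://user:pass@example.com:8080/path"

    // With PreferLocalFile, a local file URL without query or fragment is
    // returned as a plain path (a Unix-style path is assumed here).
    QUrl::fromLocalFile(QStringLiteral("/tmp/notes.txt")).toString(QUrl::PreferLocalFile);
    // "/tmp/notes.txt"
    \endcode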

    \sa FormattingOptions, url(), setUrl()
*/
QString QUrl::toString(FormattingOptions options) const
{
    QString url;
    if (!isValid()) {
        // also catches isEmpty()
        return url;
    }
    if ((options & QUrl::FullyDecoded) == QUrl::FullyDecoded) {
        qWarning("QUrl: QUrl::FullyDecoded is not permitted when reconstructing the full URL");
        options &= ~QUrl::FullyDecoded;
        //options |= QUrl::PrettyDecoded; // no-op, value is 0
    }

    // return just the path if:
    //  - QUrl::PreferLocalFile is passed
    //  - QUrl::RemovePath isn't passed (rather stupid if the user did...)
    //  - there's no query or fragment to return
    //    that is, either they aren't present, or we're removing them
    //  - it's a local file
    if (options.testFlag(QUrl::PreferLocalFile) && !options.testFlag(QUrl::RemovePath)
            && (!d->hasQuery() || options.testFlag(QUrl::RemoveQuery))
            && (!d->hasFragment() || options.testFlag(QUrl::RemoveFragment))
            && isLocalFile()) {
        url = d->toLocalFile(options | QUrl::FullyDecoded);
        return url;
    }

    // for the full URL, we consider that the reserved characters are prettier if encoded
    if (options & DecodeReserved)
        options &= ~EncodeReserved;
    else
        options |= EncodeReserved;

    if (!(options & QUrl::RemoveScheme) && d->hasScheme())
        url += d->scheme + QLatin1Char(':');

    bool pathIsAbsolute = d->path.startsWith(QLatin1Char('/'));
    if (!((options & QUrl::RemoveAuthority) == QUrl::RemoveAuthority) && d->hasAuthority()) {
        url += QLatin1String("//");
        d->appendAuthority(url, options, QUrlPrivate::FullUrl);
    } else if (isLocalFile() && pathIsAbsolute) {
        // Comply with the XDG file URI spec, which requires triple slashes.
        url += QLatin1String("//");
    }

    if (!(options & QUrl::RemovePath))
        d->appendPath(url, options, QUrlPrivate::FullUrl);

    if (!(options & QUrl::RemoveQuery) && d->hasQuery()) {
        url += QLatin1Char('?');
        d->appendQuery(url, options, QUrlPrivate::FullUrl);
    }
    if (!(options & QUrl::RemoveFragment) && d->hasFragment()) {
        url += QLatin1Char('#');
        d->appendFragment(url, options, QUrlPrivate::FullUrl);
    }

    return url;
}

/*!
    \since 5.0

    Returns a human-displayable string representation of the URL.
    The output can be customized by passing flags with \a options.
    The option RemovePassword is always enabled, since passwords
    should never be shown back to users.

    With the default options, the resulting QString can be passed back
    to a QUrl later on, but any password that was present initially will
    be lost.
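
    For example, with an arbitrary FTP URL that carries a password (the
    comments show the expected results):

    \code
    QUrl url(QStringLiteral("ftp://user:secret@ftp.example.com/pub/"));
    url.toString();        // "ftp://user:secret@ftp.example.com/pub/"
    url.toDisplayString(); // "ftp://user@ftp.example.com/pub/"
    \endcode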

    \sa FormattingOptions, toEncoded(), toString()
*/

QString QUrl::toDisplayString(FormattingOptions options) const
{
    return toString(options | RemovePassword);
}

/*!
    \since 5.2

    Returns an adjusted version of the URL.
    The output can be customized by passing flags with \a options.

    The encoding options from QUrl::ComponentFormattingOption are not
    meaningful for this function, nor is QUrl::PreferLocalFile.

    This is always equivalent to QUrl(url.toString(options)).
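
    For example, the path-related options can derive a "directory" URL
    (arbitrary URLs; the comments show the expected results):

    \code
    QUrl file(QStringLiteral("http://example.com/dir/file.html"));
    file.adjusted(QUrl::RemoveFilename);    // QUrl("http://example.com/dir/")

    QUrl dir(QStringLiteral("http://example.com/dir/"));
    dir.adjusted(QUrl::StripTrailingSlash); // QUrl("http://example.com/dir")
    \endcode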

    \sa FormattingOptions, toEncoded(), toString()
*/
QUrl QUrl::adjusted(QUrl::FormattingOptions options) const
{
    if (!isValid()) {
        // also catches isEmpty()
        return QUrl();
    }
    QUrl that = *this;
    if (options & RemoveScheme)
        that.setScheme(QString());
    if ((options & RemoveAuthority) == RemoveAuthority) {
        that.setAuthority(QString());
    } else {
        if ((options & RemoveUserInfo) == RemoveUserInfo)
            that.setUserInfo(QString());
        else if (options & RemovePassword)
            that.setPassword(QString());
        if (options & RemovePort)
            that.setPort(-1);
    }
    if (options & RemoveQuery)
        that.setQuery(QString());
    if (options & RemoveFragment)
        that.setFragment(QString());
    if (options & RemovePath) {
        that.setPath(QString());
    } else if (options & (StripTrailingSlash | RemoveFilename | NormalizePathSegments)) {
        that.detach();
        QString path;
        d->appendPath(path, options | FullyEncoded, QUrlPrivate::Path);
        that.d->setPath(path, 0, path.length());
    }
    return that;
}

/*!
    Returns the encoded representation of the URL if it's valid;
    otherwise an empty QByteArray is returned. The output can be
    customized by passing flags with \a options.

    The user info, path, query and fragment are all converted to UTF-8, and
    all non-ASCII characters are then percent encoded. The host name
    is encoded using Punycode.
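
    For example, with an arbitrary URL containing a non-ASCII host name and
    path (this assumes the source file is UTF-8 encoded; the comment shows
    the expected result):

    \code
    QUrl url(QStringLiteral("http://bücher.example/café"));
    url.toEncoded(); // "http://xn--bcher-kva.example/caf%C3%A9"
    \endcode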
*/
QByteArray QUrl::toEncoded(FormattingOptions options) const
{
    options &= ~(FullyDecoded | FullyEncoded);
    return toString(options | FullyEncoded).toLatin1();
}

/*!
    \fn QUrl QUrl::fromEncoded(const QByteArray &input, ParsingMode parsingMode)

    Parses \a input and returns the corresponding QUrl. \a input is
    assumed to be in encoded form, containing only ASCII characters.

    Parses the URL using \a parsingMode. See setUrl() for more information on
    this parameter. QUrl::DecodedMode is not permitted in this context.

    \sa toEncoded(), setUrl()
*/
QUrl QUrl::fromEncoded(const QByteArray &input, ParsingMode mode)
{
    return QUrl(QString::fromUtf8(input.constData(), input.size()), mode);
}

/*!
    Returns a decoded copy of \a input. \a input is first decoded from
    percent encoding, then converted from UTF-8 to Unicode.

    \note Given invalid input (such as a string containing the sequence "%G5",
    which is not a valid hexadecimal number), the output will be invalid as
    well. For example, the sequence "%G5" could be decoded to 'W'.
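
    For valid input, decoding works as in the following arbitrary example
    (the comment shows the expected result):

    \code
    QString s = QUrl::fromPercentEncoding("caf%C3%A9%20au%20lait");
    // s == QStringLiteral("café au lait")
    \endcode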
*/
QString QUrl::fromPercentEncoding(const QByteArray &input)
{
    QByteArray ba = QByteArray::fromPercentEncoding(input);
    return QString::fromUtf8(ba, ba.size());
}

/*!
    Returns an encoded copy of \a input. \a input is first converted
    to UTF-8, and all characters that are not in the unreserved group
    are percent encoded. To prevent characters from being percent encoded,
    pass them to \a exclude. To force characters to be percent encoded, pass
    them to \a include.

    Unreserved is defined as:
       \tt {ALPHA / DIGIT / "-" / "." / "_" / "~"}

    \snippet code/src_corelib_io_qurl.cpp 6
*/
QByteArray QUrl::toPercentEncoding(const QString &input, const QByteArray &exclude, const QByteArray &include)
{
    return input.toUtf8().toPercentEncoding(exclude, include);
}

/*!
    \internal
    \since 5.0
    Used in the setEncodedXXX compatibility functions. Converts \a ba to
    QString form.
*/
QString QUrl::fromEncodedComponent_helper(const QByteArray &ba)
{
    return qt_urlRecodeByteArray(ba);
}

/*!
    \fn QByteArray QUrl::toPunycode(const QString &uc)
    \obsolete
    Returns \a uc in Punycode encoding.

    Punycode is a Unicode encoding used for internationalized domain
    names, as defined in RFC 3492. If you want to convert a domain name from
    Unicode to its ASCII-compatible representation, use toAce().
*/

/*!
    \fn QString QUrl::fromPunycode(const QByteArray &pc)
    \obsolete
    Returns the Punycode decoded representation of \a pc.

    Punycode is a Unicode encoding used for internationalized domain
    names, as defined in RFC 3492. If you want to convert a domain from
    its ASCII-compatible encoding to the Unicode representation, use
    fromAce().
*/

/*!
    \since 4.2

    Returns the Unicode form of the given domain name
    \a domain, which is encoded in the ASCII Compatible Encoding (ACE).
    The result of this function is considered equivalent to \a domain.

    If the value in \a domain cannot be decoded, it will be converted
    to QString and returned.

    The ASCII Compatible Encoding (ACE) is defined by RFC 3490, RFC 3491
    and RFC 3492. It is part of the Internationalizing Domain Names in
    Applications (IDNA) specification, which allows for domain names
    (like \c "example.com") to be written using international
    characters.
*/
QString QUrl::fromAce(const QByteArray &domain)
{
    return qt_ACE_do(QString::fromLatin1(domain), NormalizeAce, ForbidLeadingDot /*FIXME: make configurable*/);
}

/*!
    \since 4.2

    Returns the ASCII Compatible Encoding of the given domain name \a domain.
    The result of this function is considered equivalent to \a domain.

    The ASCII-Compatible Encoding (ACE) is defined by RFC 3490, RFC 3491
    and RFC 3492. It is part of the Internationalizing Domain Names in
    Applications (IDNA) specification, which allows for domain names
    (like \c "example.com") to be written using international
    characters.

    This function returns an empty QByteArray if \a domain is not a valid
    hostname. Note, in particular, that IPv6 literals are not valid domain
    names.
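
    For example, with an arbitrary internationalized domain name (this
    assumes the source file is UTF-8 encoded; the comment shows the expected
    result):

    \code
    QByteArray ace = QUrl::toAce(QStringLiteral("bücher.example"));
    // ace == "xn--bcher-kva.example"
    \endcode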
*/
QByteArray QUrl::toAce(const QString &domain)
{
    return qt_ACE_do(domain, ToAceOnly, ForbidLeadingDot /*FIXME: make configurable*/).toLatin1();
}

/*!
    \internal

    Returns \c true if this URL is "less than" the given \a url. This
    provides a means of ordering URLs.
*/
bool QUrl::operator <(const QUrl &url) const
{
    if (!d || !url.d) {
        bool thisIsEmpty = !d || d->isEmpty();
        bool thatIsEmpty = !url.d || url.d->isEmpty();

        // sort an empty URL first
        return thisIsEmpty && !thatIsEmpty;
    }

    int cmp;
    cmp = d->scheme.compare(url.d->scheme);
    if (cmp != 0)
        return cmp < 0;

    cmp = d->userName.compare(url.d->userName);
    if (cmp != 0)
        return cmp < 0;

    cmp = d->password.compare(url.d->password);
    if (cmp != 0)
        return cmp < 0;

    cmp = d->host.compare(url.d->host);
    if (cmp != 0)
        return cmp < 0;

    if (d->port != url.d->port)
        return d->port < url.d->port;

    cmp = d->path.compare(url.d->path);
    if (cmp != 0)
        return cmp < 0;

    if (d->hasQuery() != url.d->hasQuery())
        return url.d->hasQuery();

    cmp = d->query.compare(url.d->query);
    if (cmp != 0)
        return cmp < 0;

    if (d->hasFragment() != url.d->hasFragment())
        return url.d->hasFragment();

    cmp = d->fragment.compare(url.d->fragment);
    return cmp < 0;
}

/*!
    Returns \c true if this URL and the given \a url are equal;
    otherwise returns \c false.
*/
bool QUrl::operator ==(const QUrl &url) const
{
    if (!d && !url.d)
        return true;
    if (!d)
        return url.d->isEmpty();
    if (!url.d)
        return d->isEmpty();

    // First, compare which sections are present, since it speeds up the
    // processing considerably. We just have to ignore the host-is-present flag
    // for local files (the "file" protocol), due to the requirements of the
    // XDG file URI specification.
    int mask = QUrlPrivate::FullUrl;
    if (isLocalFile())
        mask &= ~QUrlPrivate::Host;
    return (d->sectionIsPresent & mask) == (url.d->sectionIsPresent & mask) &&
            d->scheme == url.d->scheme &&
            d->userName == url.d->userName &&
            d->password == url.d->password &&
            d->host == url.d->host &&
            d->port == url.d->port &&
            d->path == url.d->path &&
            d->query == url.d->query &&
            d->fragment == url.d->fragment;
}

/*!
    \since 5.2

    Returns \c true if this URL and the given \a url are equal after
    applying \a options to both; otherwise returns \c false.

    This is equivalent to calling adjusted(options) on both URLs
    and comparing the resulting URLs, but faster.
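
    For example (arbitrary URLs; the comments show the expected results),
    two URLs that differ only in a trailing slash and in the query match
    once those differences are removed:

    \code
    QUrl a(QStringLiteral("http://example.com/path/?q=1"));
    QUrl b(QStringLiteral("http://example.com/path?q=2"));
    a == b;                                                     // false
    a.matches(b, QUrl::StripTrailingSlash | QUrl::RemoveQuery); // true
    \endcode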

*/
bool QUrl::matches(const QUrl &url, FormattingOptions options) const
{
    if (!d && !url.d)
        return true;
    if (!d)
        return url.d->isEmpty();
    if (!url.d)
        return d->isEmpty();

    // First, compare which sections are present, since it speeds up the
    // processing considerably. We just have to ignore the host-is-present flag
    // for local files (the "file" protocol), due to the requirements of the
    // XDG file URI specification.
    int mask = QUrlPrivate::FullUrl;
    if (isLocalFile())
        mask &= ~QUrlPrivate::Host;

    if (options.testFlag(QUrl::RemoveScheme))
        mask &= ~QUrlPrivate::Scheme;
    else if (d->scheme != url.d->scheme)
        return false;

    if (options.testFlag(QUrl::RemovePassword))
        mask &= ~QUrlPrivate::Password;
    else if (d->password != url.d->password)
        return false;

    if (options.testFlag(QUrl::RemoveUserInfo))
        mask &= ~QUrlPrivate::UserName;
    else if (d->userName != url.d->userName)
        return false;

    if (options.testFlag(QUrl::RemovePort))
        mask &= ~QUrlPrivate::Port;
    else if (d->port != url.d->port)
        return false;

    if (options.testFlag(QUrl::RemoveAuthority))
        mask &= ~QUrlPrivate::Host;
    else if (d->host != url.d->host)
        return false;

    if (options.testFlag(QUrl::RemoveQuery))
        mask &= ~QUrlPrivate::Query;
    else if (d->query != url.d->query)
        return false;

    if (options.testFlag(QUrl::RemoveFragment))
        mask &= ~QUrlPrivate::Fragment;
    else if (d->fragment != url.d->fragment)
        return false;

    if ((d->sectionIsPresent & mask) != (url.d->sectionIsPresent & mask))
        return false;

    if (options.testFlag(QUrl::RemovePath))
        return true;

    // Compare paths, after applying path-related options
    QString path1;
    d->appendPath(path1, options, QUrlPrivate::Path);
    QString path2;
    url.d->appendPath(path2, options, QUrlPrivate::Path);
    return path1 == path2;
}

/*!
    Returns \c true if this URL and the given \a url are not equal;
    otherwise returns \c false.
*/
bool QUrl::operator !=(const QUrl &url) const
{
    return !(*this == url);
}

/*!
    Assigns the specified \a url to this object.
*/
QUrl &QUrl::operator =(const QUrl &url)
{
    if (!d) {
        if (url.d) {
            url.d->ref.ref();
            d = url.d;
        }
    } else {
        if (url.d)
            qAtomicAssign(d, url.d);
        else
            clear();
    }
    return *this;
}

/*!
    Assigns the specified \a url to this object.
*/
QUrl &QUrl::operator =(const QString &url)
{
    if (url.isEmpty()) {
        clear();
    } else {
        detach();
        d->parse(url, TolerantMode);
    }
    return *this;
}

/*!
    \fn void QUrl::swap(QUrl &other)
    \since 4.8

    Swaps URL \a other with this URL. This operation is very
    fast and never fails.
*/

/*!
    \internal

    Forces a detach.
*/
void QUrl::detach()
{
    if (!d)
        d = new QUrlPrivate;
    else
        qAtomicDetach(d);
}

/*!
    \internal
*/
bool QUrl::isDetached() const
{
    return !d || d->ref.loadRelaxed() == 1;
}


/*!
    Returns a QUrl representation of \a localFile, interpreted as a local
    file. This function accepts paths separated by slashes as well as the
    native separator for this platform.

    This function also accepts paths with a doubled leading slash (or
    backslash) to indicate a remote file, as in
    "//servername/path/to/file.txt". Note that only certain platforms can
    actually open this file using QFile::open().
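
    For example, the first component of such a path becomes the URL's
    host (the server and share names are arbitrary; the comments show the
    expected results):

    \code
    QUrl url = QUrl::fromLocalFile(QStringLiteral("//server/share/file.txt"));
    url.host();     // "server"
    url.path();     // "/share/file.txt"
    url.toString(); // "file://server/share/file.txt"
    \endcode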

    An empty \a localFile leads to an empty URL (since Qt 5.4).

    \snippet code/src_corelib_io_qurl.cpp 16

    In the first line of the snippet above, a file URL is constructed from a
    local, relative path. A file URL with a relative path only makes sense
    if there is a base URL to resolve it against. For example:

    \snippet code/src_corelib_io_qurl.cpp 17

    To resolve such a URL, it's necessary to remove the scheme beforehand:

    \snippet code/src_corelib_io_qurl.cpp 18

    For this reason, it is better to use a relative URL (that is, no scheme)
    for relative file paths:

    \snippet code/src_corelib_io_qurl.cpp 19

    \sa toLocalFile(), isLocalFile(), QDir::toNativeSeparators()
*/
QUrl QUrl::fromLocalFile(const QString &localFile)
{
    QUrl url;
    if (localFile.isEmpty())
        return url;
    QString scheme = fileScheme();
    QString deslashified = QDir::fromNativeSeparators(localFile);

    // magic for drives on windows
    if (deslashified.length() > 1 && deslashified.at(1) == QLatin1Char(':') && deslashified.at(0) != QLatin1Char('/')) {
        deslashified.prepend(QLatin1Char('/'));
    } else if (deslashified.startsWith(QLatin1String("//"))) {
        // magic for shared drive on windows
        int indexOfPath = deslashified.indexOf(QLatin1Char('/'), 2);
        QStringRef hostSpec = deslashified.midRef(2, indexOfPath - 2);
        // Check for Windows-specific WebDAV specification: "//host@SSL/path".
        if (hostSpec.endsWith(webDavSslTag(), Qt::CaseInsensitive)) {
            hostSpec.truncate(hostSpec.size() - 4);
            scheme = webDavScheme();
        }

        // hosts can't be IPv6 addresses without [], so we can use QUrlPrivate::setHost
        url.detach();
        if (!url.d->setHost(hostSpec.toString(), 0, hostSpec.size(), StrictMode)) {
            if (url.d->error->code != QUrlPrivate::InvalidRegNameError)
                return url;

            // Path hostname is not a valid URL host, so set it entirely in the path
            // (by leaving deslashified unchanged)
        } else if (indexOfPath > 2) {
            deslashified = deslashified.right(deslashified.length() - indexOfPath);
        } else {
            deslashified.clear();
        }
    }

    url.setScheme(scheme);
    url.setPath(deslashified, DecodedMode);
    return url;
}

/*!
    Returns the path of this URL formatted as a local file path. The path
    returned will use forward slashes, even if it was originally created
    from one with backslashes.

    If this URL contains a non-empty hostname, it will be encoded in the
    returned value in the form found on SMB networks (for example,
    "//servername/path/to/file.txt").

    \snippet code/src_corelib_io_qurl.cpp 20

    \note If the path component of this URL contains a non-UTF-8 binary
    sequence (such as %80), the behaviour of this function is undefined.

    \sa fromLocalFile(), isLocalFile()
*/
QString QUrl::toLocalFile() const
{
    // the call to isLocalFile() also ensures that we're parsed
    if (!isLocalFile())
        return QString();

    return d->toLocalFile(QUrl::FullyDecoded);
}

/*!
    \since 4.8
    Returns \c true if this URL is pointing to a local file path. A URL is a
    local file path if the scheme is "file".

    Note that this function considers URLs with hostnames to be local file
    paths, even if the eventual file path cannot be opened with
    QFile::open().

    \sa fromLocalFile(), toLocalFile()
*/
bool QUrl::isLocalFile() const
{
    return d && d->isLocalFile();
}

/*!
    Returns \c true if this URL is a parent of \a childUrl. \a childUrl is a child
    of this URL if the two URLs share the same scheme and authority (an empty
    scheme or authority in \a childUrl also matches), and this URL's path is a
    parent of the path of \a childUrl.
*/
bool QUrl::isParentOf(const QUrl &childUrl) const
{
    QString childPath = childUrl.path();

    if (!d)
        return ((childUrl.scheme().isEmpty())
            && (childUrl.authority().isEmpty())
            && childPath.length() > 0 && childPath.at(0) == QLatin1Char('/'));

    QString ourPath = path();

    return ((childUrl.scheme().isEmpty() || d->scheme == childUrl.scheme())
            && (childUrl.authority().isEmpty() || authority() == childUrl.authority())
            &&  childPath.startsWith(ourPath)
            && ((ourPath.endsWith(QLatin1Char('/')) && childPath.length() > ourPath.length())
                || (!ourPath.endsWith(QLatin1Char('/'))
                    && childPath.length() > ourPath.length() && childPath.at(ourPath.length()) == QLatin1Char('/'))));
}


#ifndef QT_NO_DATASTREAM
/*! \relates QUrl

    Writes the URL \a url to the stream \a out and returns a reference
    to the stream.

    \sa{Serializing Qt Data Types}{Format of the QDataStream operators}
*/
QDataStream &operator<<(QDataStream &out, const QUrl &url)
{
    QByteArray u;
    if (url.isValid())
        u = url.toEncoded();
    out << u;
    return out;
}

/*! \relates QUrl

    Reads a URL into \a url from the stream \a in and returns a
    reference to the stream.

    \sa{Serializing Qt Data Types}{Format of the QDataStream operators}
*/
QDataStream &operator>>(QDataStream &in, QUrl &url)
{
    QByteArray u;
    in >> u;
    url.setUrl(QString::fromLatin1(u));
    return in;
}
#endif // QT_NO_DATASTREAM

#ifndef QT_NO_DEBUG_STREAM
QDebug operator<<(QDebug d, const QUrl &url)
{
    QDebugStateSaver saver(d);
    d.nospace() << "QUrl(" << url.toDisplayString() << ')';
    return d;
}
#endif

static QString errorMessage(QUrlPrivate::ErrorCode errorCode, const QString &errorSource, int errorPosition)
{
    QChar c = uint(errorPosition) < uint(errorSource.length()) ?
                errorSource.at(errorPosition) : QChar(QChar::Null);

    switch (errorCode) {
    case QUrlPrivate::NoError:
        Q_ASSERT_X(false, "QUrl::errorString",
                   "Impossible: QUrl::errorString should have treated this condition");
        Q_UNREACHABLE();
        return QString();

    case QUrlPrivate::InvalidSchemeError: {
        auto msg = QLatin1String("Invalid scheme (character '%1' not permitted)");
        return msg.arg(c);
    }

    case QUrlPrivate::InvalidUserNameError:
        return QLatin1String("Invalid user name (character '%1' not permitted)")
                .arg(c);

    case QUrlPrivate::InvalidPasswordError:
        return QLatin1String("Invalid password (character '%1' not permitted)")
                .arg(c);

    case QUrlPrivate::InvalidRegNameError:
        if (errorPosition != -1)
            return QLatin1String("Invalid hostname (character '%1' not permitted)")
                    .arg(c);
        else
            return QStringLiteral("Invalid hostname (contains invalid characters)");
    case QUrlPrivate::InvalidIPv4AddressError:
        return QString(); // doesn't happen yet
    case QUrlPrivate::InvalidIPv6AddressError:
        return QStringLiteral("Invalid IPv6 address");
    case QUrlPrivate::InvalidCharacterInIPv6Error:
        return QLatin1String("Invalid IPv6 address (character '%1' not permitted)").arg(c);
    case QUrlPrivate::InvalidIPvFutureError:
        return QLatin1String("Invalid IPvFuture address (character '%1' not permitted)").arg(c);
    case QUrlPrivate::HostMissingEndBracket:
        return QStringLiteral("Expected ']' to match '[' in hostname");

    case QUrlPrivate::InvalidPortError:
        return QStringLiteral("Invalid port or port number out of range");
    case QUrlPrivate::PortEmptyError:
        return QStringLiteral("Port field was empty");

    case QUrlPrivate::InvalidPathError:
        return QLatin1String("Invalid path (character '%1' not permitted)")
                .arg(c);

    case QUrlPrivate::InvalidQueryError:
        return QLatin1String("Invalid query (character '%1' not permitted)")
                .arg(c);

    case QUrlPrivate::InvalidFragmentError:
        return QLatin1String("Invalid fragment (character '%1' not permitted)")
                .arg(c);

    case QUrlPrivate::AuthorityPresentAndPathIsRelative:
        return QStringLiteral("Path component is relative and authority is present");
    case QUrlPrivate::AuthorityAbsentAndPathIsDoubleSlash:
        return QStringLiteral("Path component starts with '//' and authority is absent");
    case QUrlPrivate::RelativeUrlPathContainsColonBeforeSlash:
        return QStringLiteral("Relative URL's path component contains ':' before any '/'");
    }

    Q_ASSERT_X(false, "QUrl::errorString", "Cannot happen, unknown error");
    Q_UNREACHABLE();
    return QString();
}

static inline void appendComponentIfPresent(QString &msg, bool present, const char *componentName,
                                            const QString &component)
{
    if (present) {
        msg += QLatin1String(componentName);
        msg += QLatin1Char('"');
        msg += component;
        msg += QLatin1String("\",");
    }
}

/*!
    \since 4.2

    Returns an error message if the last operation that modified this QUrl
    object ran into a parsing error. If no error was detected, this function
    returns an empty string and isValid() returns \c true.

    The error message returned by this function is technical in nature and may
    not be understood by end users. It is mostly useful to developers trying to
    understand why QUrl will not accept some input.
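
    The message is assembled in a fixed shape. It starts with the error text
    and the offending source. Each component that is present follows,
    separated by commas, and the trailing comma is removed.

    The standalone sketch below reproduces only that shape with std::string,
    not the exact text QUrl produces for a given input. The names
    buildErrorString() and appendIfPresent() are hypothetical. Presence is
    approximated by non-emptiness, whereas QUrl uses its own section flags,
    and only the scheme and host components are shown.

```cpp
#include <string>

// Hypothetical mirror of appendComponentIfPresent(): appends
// <name>"<value>", only when the component is present.
void appendIfPresent(std::string &msg, bool present, const char *name,
                     const std::string &value)
{
    if (present) {
        msg += name;
        msg += '"';
        msg += value;
        msg += "\",";
    }
}

// Builds a message in the same shape as QUrl::errorString(): error text,
// the offending source, then each present component, minus the last comma.
std::string buildErrorString(const std::string &error, const std::string &source,
                             const std::string &scheme, const std::string &host)
{
    std::string msg = error + "; source was \"" + source + "\";";
    appendIfPresent(msg, !scheme.empty(), " scheme = ", scheme);
    appendIfPresent(msg, !host.empty(), " host = ", host);
    if (!msg.empty() && msg.back() == ',')
        msg.pop_back();
    return msg;
}
```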

    \sa QUrl::ParsingMode
*/
QString QUrl::errorString() const
{
    QString msg;
    if (!d)
        return msg;

    QString errorSource;
    int errorPosition = 0;
    QUrlPrivate::ErrorCode errorCode = d->validityError(&errorSource, &errorPosition);
    if (errorCode == QUrlPrivate::NoError)
        return msg;

    msg += errorMessage(errorCode, errorSource, errorPosition);
    msg += QLatin1String("; source was \"");
    msg += errorSource;
    msg += QLatin1String("\";");
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Scheme,
                             " scheme = ", d->scheme);
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::UserInfo,
                             " userinfo = ", userInfo());
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Host,
                             " host = ", d->host);
    appendComponentIfPresent(msg, d->port != -1,
                             " port = ", QString::number(d->port));
    appendComponentIfPresent(msg, !d->path.isEmpty(),
                             " path = ", d->path);
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Query,
                             " query = ", d->query);
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Fragment,
                             " fragment = ", d->fragment);
    if (msg.endsWith(QLatin1Char(',')))
        msg.chop(1);
    return msg;
}

/*!
    \since 5.1

    Converts a list of \a urls into a list of QString objects, using toString(\a options).
*/
QStringList QUrl::toStringList(const QList<QUrl> &urls, FormattingOptions options)
{
    QStringList lst;
    lst.reserve(urls.size());
    for (const QUrl &url : urls)
        lst.append(url.toString(options));
    return lst;
}

/*!
    \since 5.1

    Converts a list of strings representing \a urls into a list of URLs, using QUrl(str, \a mode).
    Note that this means all strings must be URLs, not, for instance, local paths.
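
    To see why local paths do not work here, consider how a scheme is
    recognized. The sketch below applies the RFC 3986 scheme grammar; its
    helper startsWithScheme() is hypothetical and is not part of the Qt API.
    A path such as "/home/user/file.txt" has no scheme, so it would be parsed
    as a relative URL consisting only of a path. A Windows path such as
    "C:/Users/file.txt" has its drive letter interpreted as a scheme. Use
    fromLocalFile() for local paths instead.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: reports whether a string begins with an RFC 3986
// scheme, i.e. a letter, then letters, digits, '+', '-' or '.', then ':'.
bool startsWithScheme(const std::string &s)
{
    if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0])))
        return false;
    for (std::size_t i = 1; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c == ':')
            return true;
        if (!std::isalnum(c) && c != '+' && c != '-' && c != '.')
            return false;
    }
    return false; // no ':' found
}
```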
*/
QList<QUrl> QUrl::fromStringList(const QStringList &urls, ParsingMode mode)
{
    QList<QUrl> lst;
    lst.reserve(urls.size());
    for (const QString &str : urls)
        lst.append(QUrl(str, mode));
    return lst;
}

/*!
    \typedef QUrl::DataPtr
    \internal
*/

/*!
    \fn DataPtr &QUrl::data_ptr()
    \internal
*/

/*!
    Returns the hash value for the \a url. If specified, \a seed is used to
    initialize the hash.
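
    The hash is a plain XOR of per-component hashes. In the implementation
    below, the seed only enters through the port term.

    The standalone sketch below mirrors that structure with std::hash. The
    UrlParts struct and the hashUrlParts() function are hypothetical, and
    combining the seed with the port hash through XOR is an assumption of
    the sketch.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for the URL components hashed by qHash().
struct UrlParts {
    std::string scheme, userName, password, host;
    int port = -1;
    std::string path, query, fragment;
};

// XOR-combines per-component hashes; as in qHash(const QUrl &, uint),
// the seed is applied only to the port component.
std::size_t hashUrlParts(const UrlParts &u, std::size_t seed)
{
    const std::hash<std::string> h{};
    return h(u.scheme) ^ h(u.userName) ^ h(u.password) ^ h(u.host)
        ^ (std::hash<int>{}(u.port) ^ seed)
        ^ h(u.path) ^ h(u.query) ^ h(u.fragment);
}
```

    XOR is commutative, and the user name and password are hashed with the
    same function. Swapping those two values therefore leaves the hash
    unchanged, both in the sketch and in the XOR scheme it mirrors. Such
    collisions are permitted, because equality is still decided by
    operator==().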

    \relates QHash
    \since 5.0
*/
uint qHash(const QUrl &url, uint seed) noexcept
{
    if (!url.d)
        return qHash(-1, seed); // the hash of an unset port (-1)

    return qHash(url.d->scheme) ^
            qHash(url.d->userName) ^
            qHash(url.d->password) ^
            qHash(url.d->host) ^
            qHash(url.d->port, seed) ^
            qHash(url.d->path) ^
            qHash(url.d->query) ^
            qHash(url.d->fragment);
}

static QUrl adjustFtpPath(QUrl url)
{
    if (url.scheme() == ftpScheme()) {
        QString path = url.path(QUrl::PrettyDecoded);
        if (path.startsWith(QLatin1String("//")))
            url.setPath(QLatin1String("/%2F") + path.midRef(2), QUrl::TolerantMode);
    }
    return url;
}

static bool isIp6(const QString &text)
{
    QIPAddressUtils::IPv6Address address;
    return !text.isEmpty() && QIPAddressUtils::parseIp6(address, text.begin(), text.end()) == nullptr;
}

/*!
    Returns a valid URL from a user-supplied \a userInput string if one can be
    deduced. If that is not possible, an invalid QUrl() is returned.

    This overload takes a \a workingDirectory path, in order to be able to
    handle relative paths. This is especially useful when handling command
    line arguments.
    If \a workingDirectory is empty, no handling of relative paths will be done,
    so this method will behave like its one-argument overload.

    By default, an input string that looks like a relative path will only be treated
    as such if the file actually exists in the given working directory.

    If the application can handle files that don't exist yet, it should pass the
    flag AssumeLocalFile in \a options.
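
    The order in which this overload tries its interpretations can be
    summarized as a small decision function. In this sketch,
    resolveUserInput() and the Resolution enum are hypothetical. The boolean
    parameters stand in for the real checks: isIp6(), QFileInfo::exists(),
    QUrl::isRelative() and QDir::isAbsolutePath().

```cpp
#include <string>

// Hypothetical labels for the outcomes of the three-argument fromUserInput().
enum class Resolution { Invalid, Ipv6Host, ExistingFile, AssumedLocalFile, Fallback };

// The flags are assumed inputs standing in for the real checks.
Resolution resolveUserInput(const std::string &trimmed, bool isIpv6, bool fileExists,
                            bool assumeLocalFile, bool isRelativeUrl, bool isAbsolutePath)
{
    if (trimmed.empty())
        return Resolution::Invalid;          // empty input gives QUrl()
    if (isIpv6)
        return Resolution::Ipv6Host;         // http URL with the address as host
    if (fileExists)
        return Resolution::ExistingFile;     // file URL resolved against workingDirectory
    if (assumeLocalFile && isRelativeUrl && !isAbsolutePath)
        return Resolution::AssumedLocalFile; // file URL even though it does not exist yet
    return Resolution::Fallback;             // defer to the one-argument overload
}
```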

    \since 5.4
*/
QUrl QUrl::fromUserInput(const QString &userInput, const QString &workingDirectory,
                         UserInputResolutionOptions options)
{
    QString trimmedString = userInput.trimmed();

    if (trimmedString.isEmpty())
        return QUrl();

    // Check for IPv6 addresses, since a path starting with ":" is absolute (a resource)
    // and IPv6 addresses can start with "c:" too
    if (isIp6(trimmedString)) {
        QUrl url;
        url.setHost(trimmedString);
        url.setScheme(QStringLiteral("http"));
        return url;
    }

    const QFileInfo fileInfo(QDir(workingDirectory), userInput);
    if (fileInfo.exists()) {
        return QUrl::fromLocalFile(fileInfo.absoluteFilePath());
    }

    QUrl url = QUrl(userInput, QUrl::TolerantMode);
    // Check both QUrl::isRelative (to detect full URLs) and QDir::isAbsolutePath (since on Windows drive letters can be interpreted as schemes)
    if ((options & AssumeLocalFile) && url.isRelative() && !QDir::isAbsolutePath(userInput)) {
        return QUrl::fromLocalFile(fileInfo.absoluteFilePath());
    }

    return fromUserInput(trimmedString);
}

/*!
    Returns a valid URL from a user-supplied \a userInput string if one can be
    deduced. If that is not possible, an invalid QUrl() is returned.

    \since 4.6

    Most applications that can browse the web allow the user to input a URL
    in the form of a plain string. This string can be manually typed into
    a location bar, obtained from the clipboard, or passed in via command
    line arguments.

    When the string is not already a valid URL, a best guess is performed,
    making various web-related assumptions.

    If the string is an absolute local file path, a file:// URL is
    constructed using QUrl::fromLocalFile().

    If that is not the case, an attempt is made to turn the string into an
    http:// or ftp:// URL, the latter if the part of the string before the
    first '.' is "ftp" (compared case-insensitively). The result is then
    passed through QUrl's tolerant parser. If parsing succeeds, a valid QUrl
    is returned; otherwise an invalid QUrl() is returned.

    \section1 Examples:

    \list
    \li qt-project.org becomes http://qt-project.org
    \li ftp.qt-project.org becomes ftp://ftp.qt-project.org
    \li hostname becomes http://hostname
    \li /home/user/test.html becomes file:///home/user/test.html
    \endlist
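
    The examples above can be reproduced by a much simplified sketch. This is
    not QUrl's algorithm, and guessUrl() is a hypothetical name. The sketch
    skips the IPv6 check, Windows drive letters, the host:port special case,
    tolerant parsing and validation, and the FTP path adjustment.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical, much simplified version of the guessing steps described
// above; it performs no real URL parsing or validation.
std::string guessUrl(const std::string &input)
{
    // Absolute local paths become file URLs (Unix-style paths only here).
    if (!input.empty() && input[0] == '/')
        return "file://" + input;
    // Input that already carries "://" is taken as a full URL.
    if (input.find("://") != std::string::npos)
        return input;
    // Otherwise use ftp if the part before the first '.' is "ftp"
    // (case-insensitively), else http.
    std::string first = input.substr(0, input.find('.'));
    std::transform(first.begin(), first.end(), first.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return (first == "ftp" ? "ftp://" : "http://") + input;
}
```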
*/
QUrl QUrl::fromUserInput(const QString &userInput)
{
    QString trimmedString = userInput.trimmed();

    // Check for IPv6 addresses, since a path starting with ":" is absolute (a resource)
    // and IPv6 addresses can start with "c:" too
    if (isIp6(trimmedString)) {
        QUrl url;
        url.setHost(trimmedString);
        url.setScheme(QStringLiteral("http"));
        return url;
    }

    // Check first for files, since on Windows drive letters can be interpreted as schemes
    if (QDir::isAbsolutePath(trimmedString))
        return QUrl::fromLocalFile(trimmedString);

    QUrl url = QUrl(trimmedString, QUrl::TolerantMode);
    QUrl urlPrepended = QUrl(QLatin1String("http://") + trimmedString, QUrl::TolerantMode);

    // Check the most common case: a valid URL with a scheme.
    // Also verify that prepending "http://" does not yield a port, to handle
    // "host:port" input where the host would otherwise be interpreted as the scheme.
    if (url.isValid()
        && !url.scheme().isEmpty()
        && urlPrepended.port() == -1)
        return adjustFtpPath(url);

    // Else, try the prepended one and adjust the scheme from the host name
    if (urlPrepended.isValid() && (!urlPrepended.host().isEmpty() || !urlPrepended.path().isEmpty()))
    {
        int dotIndex = trimmedString.indexOf(QLatin1Char('.'));
        const QStringRef hostscheme = trimmedString.leftRef(dotIndex);
        if (hostscheme.compare(ftpScheme(), Qt::CaseInsensitive) == 0)
            urlPrepended.setScheme(ftpScheme());
        return adjustFtpPath(urlPrepended);
    }

    return QUrl();
}

QT_END_NAMESPACE