|URL Scheme to Fragment|
|URL Scheme to Query|
|URL Scheme to Path|
|URL Scheme to Port|
|URL Scheme to Host|
|URL Scheme to Userinfo|
This regular expression finds URLs in arbitrary text. Its variants match from the scheme up to a specific component (scheme to query, to path, to host, and so on), or match the entire URL.
- This regex does not use lookahead, lookbehind, or \w, so it behaves the same way in most regex engines.
- This regular expression is not a strict check on the format of the URL.
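
To illustrate the idea of matching "up to" a given component, here is a minimal Python sketch. The two patterns are hypothetical, heavily simplified stand-ins and not the regular expression described in this article; they only show how a shorter pattern stops at the host while a longer one continues through the port and path, using nothing beyond character classes and plain groups.

```python
import re

# Hypothetical, simplified patterns (NOT the article's actual regex).
# They use only literals, character classes, and non-capturing groups:
# no lookahead, no lookbehind, no \w.
SCHEME_TO_HOST = r"https?://[A-Za-z0-9.-]+"
SCHEME_TO_PATH = (
    SCHEME_TO_HOST
    + r"(?::[0-9]+)?"                             # optional port
    + r"(?:/[A-Za-z0-9._~!$&'()*+,;=:@%/-]*)?"    # optional path
)

text = "See https://example.com:8080/docs/index.html?lang=en#top for details."

print(re.findall(SCHEME_TO_HOST, text))
# ['https://example.com']
print(re.findall(SCHEME_TO_PATH, text))
# ['https://example.com:8080/docs/index.html']
```

Because neither pattern validates the host or path, they accept many strings that are not well-formed URLs, which is in line with the note above that the regex is not a strict format check.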
Schemes other than http/https
This regular expression matches only the http(s) scheme; if you want to match schemes other than http(s), replace the https? part with (https?|ftp|mailto|telnet|file) and so on.
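
The substitution can be sketched in Python as follows. The `TAIL` pattern and the `find_urls` helper are assumptions made for this example only (a rough stand-in for the rest of the regex); only the scheme alternation is taken from the text above.

```python
import re

# Hypothetical stand-in for everything after the scheme: a colon followed by
# a run of URL characters. This is NOT the article's actual regex.
TAIL = r":[A-Za-z0-9._~:/?#@!$&'()*+,;=%-]+"

def find_urls(scheme_part, text):
    """Return every match of scheme_part followed by the hypothetical TAIL."""
    pattern = re.compile(scheme_part + TAIL)
    return [m.group(0) for m in pattern.finditer(text)]

text = ("Docs at https://example.com/a and files at "
        "ftp://ftp.example.org/pub/ or mailto:info@example.org")

# Original scheme part: only http and https are found.
print(find_urls(r"https?", text))
# ['https://example.com/a']

# Replaced scheme part: the additional schemes are found as well.
print(find_urls(r"(https?|ftp|mailto|telnet|file)", text))
# ['https://example.com/a', 'ftp://ftp.example.org/pub/', 'mailto:info@example.org']
```

Note that schemes such as mailto do not use `//` after the colon, so a pattern that hard-codes `://` after the scheme would need that part adjusted too.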
Characters allowed in URL
This regular expression covers the reserved and unreserved characters defined in RFC 3986, as well as % (to support percent-encoding).
A-Z), DIGIT (
- Hosts are not allowed to use
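
As a rough illustration of the character sets involved, the following Python sketch builds a single character class from the unreserved, gen-delims, and sub-delims sets of RFC 3986 plus `%`. The names `URL_CHARS` and `disallowed_chars` are hypothetical and exist only for this example; the article's regex may group or restrict these characters differently per component.

```python
import re
import string

# Character sets as listed in RFC 3986, section 2.
UNRESERVED = string.ascii_letters + string.digits + "-._~"
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="

# One character class covering all three sets plus "%" for percent-encoding.
URL_CHARS = "[" + re.escape(UNRESERVED + GEN_DELIMS + SUB_DELIMS + "%") + "]"

def disallowed_chars(candidate):
    """Return the characters of candidate that fall outside URL_CHARS."""
    return [ch for ch in candidate if not re.fullmatch(URL_CHARS, ch)]

print(disallowed_chars("https://example.com/%E3%81%82?q=1#x"))  # []
print(disallowed_chars("https://example.com/a b"))              # [' ']
```

A check like this only confirms that each character is permitted somewhere in a URL; it says nothing about whether the characters appear in a valid position.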
The scheme indicates the means of communication and cannot be omitted. Its name usually matches the communication protocol, but a scheme is not necessarily a protocol; in some cases an application name is used.
The userinfo is the user name and password used for access. Together, userinfo, host, and port are referred to as the authority.
The host indicates the server to be accessed, such as an FQDN or an IP address.
The port is the port number of the server to be accessed, starting with :.
The path is the destination within the host, starting with /; for http(s), it is the directory name and file name.
The query starts with ? and ends at # or the end of the URL. It has no fixed syntax, but for http(s) it usually carries the parameters of a GET request in key=value format, separated by &.
The fragment starts with #. It has no fixed syntax, but for HTML documents it specifies the id attribute of an element, and the browser scrolls to that element's position when displaying the page (anchor).
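
The component breakdown described above can be checked against a concrete URL. This sketch uses Python's standard `urllib.parse` module instead of the regular expression, purely to show which part of a URL each component name refers to; the example URL itself is made up.

```python
from urllib.parse import urlsplit, parse_qs

# Made-up URL containing every component described above.
url = "https://user:pass@www.example.com:8080/dir/file.html?key1=value1&key2=value2#section"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.netloc)    # user:pass@www.example.com:8080  (the authority)
print(parts.username)  # user
print(parts.password)  # pass
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /dir/file.html
print(parts.query)     # key1=value1&key2=value2
print(parts.fragment)  # section

# For http(s), the query is usually a set of key=value pairs joined by "&".
print(parse_qs(parts.query))  # {'key1': ['value1'], 'key2': ['value2']}
```

`urlsplit` returns the query and fragment without their leading `?` and `#` delimiters, and the port without its leading `:`, so the values differ slightly from what a scheme-to-component regex match would contain.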