Regex: Search for URLs in Text

Summary	Regex	Example matches
URL Scheme to Fragment (Entire URL)	`https?://[a-zA-Z0-9-._~:/?#\[\]@!$&'()*+,;=%]+`	https://user:pass@host.com:8080/path/path.html?query=1&query=val#fragment and http://host.com/path/?query=1&query=val#fragment
URL Scheme to Query	`https?://[a-zA-Z0-9-._~:/?\[\]@!$&'()*+,;=%]+`	https://user:pass@host.com:8080/path/path.html?query=1&query=val and http://host.com/path/?query=1&query=val
URL Scheme to Path	`https?://[a-zA-Z0-9-._~:/\[\]@!$&'()*+,;=%]+`	https://user:pass@host.com:8080/path/path.html and http://host.com/path/
URL Scheme to Port	`https?://([a-zA-Z0-9-._%:]*@)?[a-zA-Z0-9-.]+(:[0-9]+)?`	https://user:pass@host.com:8080 and http://host.com
URL Scheme to Host	`https?://([a-zA-Z0-9-._%:]*@)?[a-zA-Z0-9-.]+`	https://user:pass@host.com and http://host.com
URL Scheme to Userinfo	`https?://([a-zA-Z0-9-._%:]*@)?`	https://user:pass@ and http://

2022-12-23


These regular expressions find URLs in arbitrary text. Besides matching the entire URL, each variant can stop at a specific component: scheme to query, scheme to path, scheme to host, and so on.
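As a rough illustration of how the variants differ, the following Python sketch runs three of the patterns from the summary table over the same text. The sample sentence and the helper name first_match are made up for this example; only the patterns come from the table.

```python
import re

# Patterns copied from the summary table.
ENTIRE = re.compile(r"https?://[a-zA-Z0-9-._~:/?#\[\]@!$&'()*+,;=%]+")
TO_PATH = re.compile(r"https?://[a-zA-Z0-9-._~:/\[\]@!$&'()*+,;=%]+")
TO_HOST = re.compile(r"https?://([a-zA-Z0-9-._%:]*@)?[a-zA-Z0-9-.]+")

# Hypothetical input text for the demonstration.
text = ("Docs are at https://user:pass@host.com:8080/path/path.html"
        "?query=1&query=val#fragment for now")


def first_match(pattern, s):
    """Return the full text of the first match, or None if nothing matches."""
    m = pattern.search(s)
    # group(0) is the whole match; some patterns contain a capturing group,
    # so group(0) is used instead of relying on findall().
    return m.group(0) if m else None


print(first_match(ENTIRE, text))   # whole URL, up to the fragment
print(first_match(TO_PATH, text))  # https://user:pass@host.com:8080/path/path.html
print(first_match(TO_HOST, text))  # https://user:pass@host.com
```

The host and port patterns contain a capturing group for userinfo, so in Python re.findall would return that group rather than the whole match; finditer or search with group(0) avoids the surprise.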

Readme

This regex does not use lookahead, lookbehind, or \w, so it works the same way in many regex engines.
This regular expression is not a strict check on the format of the URL.
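One practical consequence of the loose check: punctuation that follows a URL in a sentence is itself a permitted URL character, so it ends up inside the match. The sketch below shows this with the entire-URL pattern. The trimming step is only a heuristic assumed for this example, not part of the regex, and it would also cut a URL that legitimately ends with one of those characters.

```python
import re

# Entire-URL pattern from the summary table.
URL = re.compile(r"https?://[a-zA-Z0-9-._~:/?#\[\]@!$&'()*+,;=%]+")

text = "See https://example.com/page."

raw = URL.search(text).group(0)
print(raw)      # https://example.com/page.   (the sentence's period is included)

# Assumed heuristic: drop common sentence punctuation from the end of the match.
trimmed = raw.rstrip(".,;:!?")
print(trimmed)  # https://example.com/page
```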

Schemes other than http/https

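These patterns match only the http and https schemes. To match other schemes, replace the https? part with an alternation such as (https?|ftp|telnet|file). Schemes that are not followed by //, such as mailto, need a separate pattern because every pattern here requires ://.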
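A minimal Python sketch of that substitution, applied to the entire-URL pattern with a small subset of schemes. The sample text and the names MULTI_SCHEME and found are invented for the example.

```python
import re

# Entire-URL pattern with the scheme part replaced by an alternation.
MULTI_SCHEME = re.compile(
    r"(https?|ftp|file)://[a-zA-Z0-9-._~:/?#\[\]@!$&'()*+,;=%]+"
)

# Hypothetical input text.
text = "Mirror at ftp://ftp.example.org/pub/file.txt or https://example.org/ today"

# finditer + group(0) returns whole matches even though the scheme is a capturing group.
found = [m.group(0) for m in MULTI_SCHEME.finditer(text)]
print(found)  # ['ftp://ftp.example.org/pub/file.txt', 'https://example.org/']
```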

Characters allowed in URL

This regular expression covers the reserved and unreserved characters defined in RFC 3986, plus % to support percent-encoding.

Reserved Characters

:, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, ,, ;, =

Unreserved Characters

ALPHA (a-z, A-Z), DIGIT (0-9), -, ., _, ~

The host part of these patterns does not accept _ or ~.
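The effect of the narrower host character set can be seen by comparing the host pattern with the entire-URL pattern on a host name that contains an underscore. This is only a sketch; the URL is a made-up example.

```python
import re

# Patterns copied from the summary table.
TO_HOST = re.compile(r"https?://([a-zA-Z0-9-._%:]*@)?[a-zA-Z0-9-.]+")
ENTIRE = re.compile(r"https?://[a-zA-Z0-9-._~:/?#\[\]@!$&'()*+,;=%]+")

url = "http://my_host.example.com/path"  # hypothetical host with an underscore

host_match = TO_HOST.search(url).group(0)
entire_match = ENTIRE.search(url).group(0)

print(host_match)    # http://my   (the host match stops at the underscore)
print(entire_match)  # http://my_host.example.com/path
```

The entire-URL pattern does not distinguish components, so it still accepts the underscore; only the patterns that model the host separately cut the match short.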

URL syntax

scheme://userinfo@host:port/path?query#fragment
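To see how this syntax maps onto a concrete URL, the sketch below splits the sample URL from the summary table with urlsplit from Python's standard urllib.parse module. This is a separate parser, not the regex described here, and is shown only to label the components.

```python
from urllib.parse import urlsplit

parts = urlsplit(
    "https://user:pass@host.com:8080/path/path.html?query=1&query=val#fragment"
)

print(parts.scheme)    # https
print(parts.netloc)    # user:pass@host.com:8080  (userinfo@host:port, the authority)
print(parts.username)  # user
print(parts.password)  # pass
print(parts.hostname)  # host.com
print(parts.port)      # 8080
print(parts.path)      # /path/path.html
print(parts.query)     # query=1&query=val
print(parts.fragment)  # fragment
```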

scheme

Indicates the means of access and cannot be omitted. The name usually matches the communication protocol, but a scheme is not necessarily a protocol; in some cases an application name is used.

userinfo

The user name and password used for access. userinfo, host, and port are collectively referred to as authority.

host

Indicates the server to be accessed, such as an FQDN or an IP address.

port

The port number of the server to be accessed, preceded by :.

path

The location within the host, starting with /; for http(s), it is typically a directory name and file name.

query

Starts with ? and ends at # or at the end of the URL. It has no fixed syntax, but for http(s) it usually carries GET parameters in key=value form, separated by &.
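For the common key=value form, the query can be decoded with parse_qs from Python's standard urllib.parse module. A short sketch using one of the sample URLs from the summary table:

```python
from urllib.parse import parse_qs, urlsplit

url = "http://host.com/path/?query=1&query=val#fragment"

# The query is the text between ? and #.
query = urlsplit(url).query
print(query)   # query=1&query=val

# Repeated keys are collected into a list of values.
params = parse_qs(query)
print(params)  # {'query': ['1', 'val']}
```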

fragment

Starts with #. It has no fixed syntax, but for HTML documents it names the id attribute of an element, and the browser scrolls to that element when displaying the page (an anchor).