Regex: Search for URLs in text

Summary / Regex

URL Scheme to Fragment (Entire URL)
https?://[a-zA-Z0-9-._~:/?#\[\]@!$&'()*+,;=%]+
e.g. https://user:pass@host.com:8080/path/path.html?query=1&query=val#fragment
e.g. http://host.com/path/?query=1&query=val#fragment

URL Scheme to Query
https?://[a-zA-Z0-9-._~:/?\[\]@!$&'()*+,;=%]+
e.g. https://user:pass@host.com:8080/path/path.html?query=1&query=val
e.g. http://host.com/path/?query=1&query=val

URL Scheme to Path
https?://[a-zA-Z0-9-._~:/\[\]@!$&'()*+,;=%]+
e.g. https://user:pass@host.com:8080/path/path.html
e.g. http://host.com/path/

URL Scheme to Port
https?://([a-zA-Z0-9-._%:]*@)?[a-zA-Z0-9-.]+(:[0-9]+)?
e.g. https://user:pass@host.com:8080
e.g. http://host.com

URL Scheme to Host
https?://([a-zA-Z0-9-._%:]*@)?[a-zA-Z0-9-.]+
e.g. https://user:pass@host.com
e.g. http://host.com

URL Scheme to Userinfo
https?://([a-zA-Z0-9-._%:]*@)?
e.g. https://user:pass@
e.g. http://

These regular expressions search arbitrary text for URLs. Besides the entire URL, variants are provided that match only up to a given component: scheme to query, scheme to path, scheme to host, and so on.
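As a rough illustration of how the variants differ, the sketch below runs two of the patterns listed above over the same text. It assumes Python's standard re module; the sample text and the variable names are made up for the example.

```python
import re

# Patterns copied verbatim from the list above.
ENTIRE_URL = re.compile(r"https?://[a-zA-Z0-9-._~:/?#\[\]@!$&'()*+,;=%]+")
TO_HOST = re.compile(r"https?://([a-zA-Z0-9-._%:]*@)?[a-zA-Z0-9-.]+")

# Sample text (an assumption for this example).
text = (
    "Docs: https://user:pass@host.com:8080/path/path.html?query=1#fragment "
    "and http://host.com/path/ (mirror)"
)

# finditer + group(0) is used instead of findall, because findall would
# return only the captured userinfo group for patterns that contain one.
entire = [m.group(0) for m in ENTIRE_URL.finditer(text)]
hosts = [m.group(0) for m in TO_HOST.finditer(text)]

print(entire)
print(hosts)
```

The first list holds the full URLs, while the second stops right after the host, so the port, path, query, and fragment are left out.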

Readme

  • This regex does not use lookahead, lookbehind, or \w, so it should behave the same way in many regex engines.
  • This regular expression does not strictly validate the format of a URL.
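To make the "not a strict check" point concrete, here is a small sketch, assuming Python's re module. It shows two consequences: punctuation that belongs to the surrounding sentence is captured, and a string that is not a well-formed URL still matches. The helper strip_trailing_punctuation is hypothetical and not part of the original regex; it is one possible post-processing step, not a recommendation from the author.

```python
import re

ENTIRE_URL = re.compile(r"https?://[a-zA-Z0-9-._~:/?#\[\]@!$&'()*+,;=%]+")

# The comma is an allowed URL character, so it ends up in the match.
sentence = "See https://host.com/path/, then reply."
raw = ENTIRE_URL.search(sentence).group(0)


def strip_trailing_punctuation(url: str) -> str:
    """Hypothetical helper: drop punctuation that usually belongs to the sentence."""
    return url.rstrip(".,;:!?'")


cleaned = strip_trailing_punctuation(raw)

# Not a well-formed URL, yet every character is in the allowed set.
accepts_malformed = ENTIRE_URL.fullmatch("http://???###") is not None

print(raw, cleaned, accepts_malformed)
```

If strict validation matters, the matches would need a separate check, for example with a URL parser.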

Schemes other than http/https

This regular expression matches only the http(s) scheme. To match other schemes, replace the https? part with an alternation such as (https?|ftp|mailto|telnet|file).
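The replacement described above can be sketched as follows, assuming Python's re module. The scheme list and sample text are chosen for the example. Note that mailto URIs are written without // (mailto:user@host), so they would not match while the pattern still requires :// after the scheme; the sketch therefore sticks to ftp and file.

```python
import re

# Schemes to accept (an assumption for this example); each entry is a regex fragment.
SCHEMES = ["https?", "ftp", "file"]

# Same character class as the entire-URL pattern, with the scheme part swapped out.
pattern = re.compile(
    "(" + "|".join(SCHEMES) + r")://[a-zA-Z0-9-._~:/?#\[\]@!$&'()*+,;=%]+"
)

text = "Get ftp://host.com/pub/file.txt or https://host.com/ or file:///tmp/a.txt"

# group(0) is the whole match; group(1) would be just the scheme.
found = [m.group(0) for m in pattern.finditer(text)]
print(found)
```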

Characters allowed in URL

This regular expression covers the reserved and unreserved characters defined in RFC 3986, plus % to support percent-encoding.

Reserved Characters

:, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, ,, ;, =

Unreserved Characters

ALPHA (a-z, A-Z), DIGIT (0-9), -, ., _, ~

  • In the host part, _ and ~ are not allowed.
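One way to see how the character lists above map onto the pattern is to build the character class from them. This is a sketch assuming Python's re module; the constant names are invented for the example, and re.escape is used only so the symbols are safe inside the brackets.

```python
import re

# Character sets as listed above (constant names are assumptions).
RESERVED = ":/?#[]@!$&'()*+,;="
UNRESERVED_SYMBOLS = "-._~"
allowed = RESERVED + UNRESERVED_SYMBOLS + "%"  # % for percent-encoding

# ALPHA and DIGIT are written as ranges; the symbols are escaped.
char_class = "[a-zA-Z0-9" + re.escape(allowed) + "]"
ENTIRE_URL = re.compile("https?://" + char_class + "+")

url = ENTIRE_URL.search("x http://host.com/a_b~c?d=[1] y").group(0)

# The host-specific pattern from the list above stops at "_".
TO_HOST = re.compile(r"https?://([a-zA-Z0-9-._%:]*@)?[a-zA-Z0-9-.]+")
host_match = TO_HOST.search("http://my_host.com/").group(0)

print(url)
print(host_match)
```

Characters outside these sets, such as spaces, <, or ", end the match, which is what lets the pattern pick URLs out of running text.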

URL syntax

scheme://userinfo@host:port/path?query#fragment

scheme

Indicates the means of communication and cannot be omitted. The name usually matches the communication protocol, but a scheme is not necessarily a protocol; in some cases an application name is used.

userinfo

The user name and password used for access. userinfo, host, and port are collectively referred to as authority.

host

Information that indicates the server to be accessed, such as FQDN or IP address.

port

The port number of the server to be accessed, prefixed with :.

path

The location within the host, starting with /; for http(s), it is the directory name and file name.

query

Starts with ? and runs until # or the end of the URL. It has no fixed syntax, but for http(s) it conventionally carries GET parameters in key=value form, separated by &.
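For the http(s) convention just described, a minimal sketch using Python's standard urllib.parse module (a choice made for this example, not something the regex itself does) shows how a matched URL's query can be split into key=value pairs:

```python
from urllib.parse import parse_qs, urlsplit

# Example URL taken from the list at the top of the page.
url = "http://host.com/path/?query=1&query=val#fragment"

# The query is the part between "?" and "#".
query = urlsplit(url).query

# Repeated keys are collected into a list of values.
params = parse_qs(query)

print(query)
print(params)
```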

fragment

Starts with #. It has no fixed syntax, but for HTML documents it names the id attribute of an element, and the browser scrolls to that element when displaying the page (an anchor).
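The components described above can be checked against a concrete URL. The sketch below assumes Python's standard urllib.parse module, which is separate from the regexes on this page, and splits one of the example URLs into its parts:

```python
from urllib.parse import urlsplit

# Example URL taken from the list at the top of the page.
url = "https://user:pass@host.com:8080/path/path.html?query=1&query=val#fragment"
parts = urlsplit(url)

components = {
    "scheme": parts.scheme,
    "userinfo": f"{parts.username}:{parts.password}",
    "host": parts.hostname,
    "port": parts.port,            # parsed as an int
    "authority": parts.netloc,     # userinfo, host, and port together
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name}: {value}")
```

A parser like this is stricter than the regexes, so it can serve as a second step after the regex has located candidate URLs in text.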

Hirota Yano / Japan / Programmer
I am publishing a web tool I created as a hobby. It is free of charge, so please feel free to use it.
© Hirota Yano