
URLs

What Are URLs?

In this tutorial we will talk about what URLs are. To summarise the last tutorial: web browsers let us access websites, which are hosted on web servers (if any of this jargon is unfamiliar, read over the last tutorial first).

Websites consist of webpages that often link to each other and to other resources on the website. Links are small elements of a webpage that take us to a new webpage or resource when we click them (you probably used one to get to this page). But before we can talk about links in more detail (in the next tutorial), we have to talk about URLs, which are what make links work.

URL stands for Uniform Resource Locator and it is what allows both us and web browsers to uniquely identify a resource on the Web. When we are on a webpage, a URL is what we see in the address bar of our web browser. For example, the URL of this page is: https://1x1.nicolldouglas.dev/tutorials/the-web/urls.

This URL uniquely identifies this webpage as a resource on the Web. If you type it into the address bar of any web browser, the browser will know to request this webpage and display it. A URL is made up of several components. The example above uses some of them, and others are optional. Let's unpack the structure of a URL below.

Structure of a URL

The figure below shows the structure of a URL. It consists of 6 parts, but you don't need to know what each part means in detail right now; we will go over each one below.

Figure 1 - The structure of a URL
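To see these six parts pulled out of an actual string, here is a minimal sketch using Python's standard `urllib.parse` module. The URL is assembled from the parts this tutorial names for Figure 1: example.com, port 80, /path/to/file.html and the two query parameters. The anchor `somewhere` is a made-up placeholder, since the text does not say what anchor the figure uses.

```python
from urllib.parse import urlsplit

# URL built from the parts described for Figure 1.
# The anchor "somewhere" is a made-up placeholder.
url = "http://example.com:80/path/to/file.html?key1=value1&key2=value2#somewhere"

parts = urlsplit(url)

print("scheme:", parts.scheme)    # http
print("domain:", parts.hostname)  # example.com
print("port:", parts.port)        # 80
print("path:", parts.path)        # /path/to/file.html
print("query:", parts.query)      # key1=value1&key2=value2
print("anchor:", parts.fragment)  # somewhere
```

Python calls the domain part `hostname` and the anchor `fragment`. "Fragment" is also the name the URL standards use for the anchor.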

Scheme

The first part of a URL is the scheme. The scheme tells the browser which protocol to use to request the resource. You will commonly see either HTTP or HTTPS (this page uses HTTPS, for example). We will go over these in more detail in a future tutorial. For now, all you need to know is that HTTP and HTTPS are standard protocols that browsers and servers use on the Web to communicate and transfer data. Think of a protocol as a language that browsers and servers have agreed to use when speaking to each other.

There are also other schemes you might see in URLs on the Web, for example the mailto: scheme, which web browsers use to open your email client. These are less important for now; just know that the scheme is the protocol a web browser uses to get a resource or perform an action.

Also, when we type a domain name like youtube.com into the address bar, we can usually omit the scheme. Modern web browsers can work out whether to use HTTP or HTTPS based on their interactions with the web server. But the scheme is always part of the URL (when shown, it is separated from the domain name by ://).
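Here is a minimal Python sketch of the scheme in practice, using the standard `urllib.parse` module. It shows that a parser reads the scheme as the part before the first colon. It also shows that, without a scheme, the string youtube.com is not even recognised as a domain. The `with_default_scheme` helper is hypothetical and deliberately simplified: it always assumes HTTPS, whereas real browsers use more involved logic. The mailto address is made up.

```python
from urllib.parse import urlsplit

# The scheme is the part before the first ":".
print(urlsplit("https://1x1.nicolldouglas.dev/tutorials/the-web/urls").scheme)  # https
print(urlsplit("mailto:someone@example.com").scheme)  # mailto (made-up address)

# With no scheme, "youtube.com" is not recognised as a domain at all:
# the domain part (netloc) is empty and the whole string is treated as a path.
bare = urlsplit("youtube.com")
print(repr(bare.netloc), repr(bare.path))  # '' 'youtube.com'


def with_default_scheme(address: str, default: str = "https") -> str:
    """Hypothetical helper: add a scheme if the typed address has none.

    Simplification: real browsers use more involved logic to choose
    between HTTP and HTTPS; this always uses one default.
    """
    if "://" in address:  # treat anything containing "://" as already having a scheme
        return address
    return f"{default}://{address}"


print(with_default_scheme("youtube.com"))         # https://youtube.com
print(with_default_scheme("http://youtube.com"))  # http://youtube.com
```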

Domain

We've already talked a lot about domains. This part of the URL can contain either a domain name or an IP address (as we have used previously), and it tells the browser which web server to request the resource from. In the figure, the domain name used is "example.com".
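As a small illustration in Python's standard library, the hypothetical `domain_of` helper below extracts the domain part of a URL, whether it is a domain name or an IP address. The address 192.0.2.10 is an assumption. It comes from a block reserved for documentation examples and does not point to a real server.

```python
from typing import Optional
from urllib.parse import urlsplit


def domain_of(url: str) -> Optional[str]:
    """Hypothetical helper: return the domain part of a URL.

    .hostname drops the port and lowercases the name; it returns None
    when the URL has no domain part (e.g. a mailto: URL).
    """
    return urlsplit(url).hostname


print(domain_of("http://example.com:80/path/to/file.html"))  # example.com
print(domain_of("http://192.0.2.10/path/to/file.html"))      # 192.0.2.10
```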

Port

The next part of the URL is the port number, which follows the domain name and is separated from it by a colon. You can think of the port as a "gateway" into the server. A server might host web content as well as provide other services (like email, for example), so the port indicates which "gate"/service to use. This also makes it easier for the server to keep its services separate on the internet and to tell what each request-maker wants.

In the figure, the port number used is 80, which is the default/standard port servers use for the HTTP protocol. The default/standard port for HTTPS is 443. We often don't see port numbers in URLs on the Web (including on this page). If a server uses the standard HTTP/HTTPS ports (which it usually does), we can omit them and the browser will infer the port from the scheme. These defaults are the only exception, though: if a server uses any other port, the port number must be included in the URL.

If you add port 443 to the URL of this page (https://1x1.nicolldouglas.dev:443/tutorials/the-web/urls), you will get exactly the same page, since this page uses HTTPS. If you add port 80 instead, you will likely get an error, because the browser makes an HTTPS request to a port that expects plain HTTP.
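The port rules above can be sketched in Python as follows. This is a simplified model that assumes only HTTP and HTTPS matter. `effective_port` and `DEFAULT_PORTS` are hypothetical names. The function uses an explicit port if the URL has one and otherwise falls back to the scheme's default.

```python
from typing import Optional
from urllib.parse import urlsplit

# Default ports per scheme; only HTTP and HTTPS are covered in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> Optional[int]:
    """Return the port a browser would connect to for this URL.

    An explicit port wins; otherwise fall back to the scheme's default.
    Returns None for schemes this sketch does not know about.
    """
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


page = "https://1x1.nicolldouglas.dev/tutorials/the-web/urls"
with_443 = "https://1x1.nicolldouglas.dev:443/tutorials/the-web/urls"
with_80 = "https://1x1.nicolldouglas.dev:80/tutorials/the-web/urls"

print(effective_port(page))      # 443
print(effective_port(with_443))  # 443
print(effective_port(with_80))   # 80
```

The first two URLs use the same scheme and end up on the same port, so they reach the same page. The third sends an HTTPS request to port 80, which is why it is likely to fail.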

File Path

Once we know the scheme, domain, and port to use for the request, next comes the file path. In the figure, the file path is /path/to/file.html. Just as we have files and folders on our personal computers, web servers have files and folders that they make publicly available on the Web. In this case, the file is file.html, which lives in the folder /to, which in turn lives in the top-level folder /path. A web server could make thousands of files available if it has the infrastructure. These are often HTML files, images like PNGs and JPEGs, MP4 videos, and many other file formats supported on the Web. The file path in the URL tells the web server which of these files we want.

The file path for this webpage is /tutorials/the-web/urls. You might notice that there is no file extension (like .html). Web servers can be configured so that file extensions can be left out, but the path still points to some kind of resource on the server, such as an HTML page or an image. In this case, it is an HTML page stored in the folder that holds the other tutorials in this module.
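To make the folder/file idea concrete, here is a Python sketch with a hypothetical `split_file_path` helper. It breaks a URL's path into its folders, the final name, and the file extension (if any). It only splits the text of the URL; how the path actually maps onto files is decided by the server's configuration.

```python
from pathlib import PurePosixPath
from typing import List, Tuple
from urllib.parse import urlsplit


def split_file_path(url: str) -> Tuple[List[str], str, str]:
    """Hypothetical helper: return (folders, name, extension) for a URL's path."""
    # URL paths always use "/" as the separator, so PurePosixPath fits on any OS.
    path = PurePosixPath(urlsplit(url).path)
    folders = list(path.parts[1:-1])  # drop the leading "/" and the final name
    return folders, path.name, path.suffix


print(split_file_path("http://example.com:80/path/to/file.html"))
# (['path', 'to'], 'file.html', '.html')
print(split_file_path("https://1x1.nicolldouglas.dev/tutorials/the-web/urls"))
# (['tutorials', 'the-web'], 'urls', '')
```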

Other Parts

The last two parts of a URL are the query parameters and the anchor. These are not so important for now, but I will explain them briefly.

Query parameters come after the file path and begin with a ?. They are key-value pairs, each joined by an =, that provide extra data to the server, which can then use them however it wants. When there is more than one query parameter, each pair is separated from the next by an & symbol. So the query parameter section in the example is ?key1=value1&key2=value2, which essentially says: "for this request, key1 has value1 and key2 has value2".
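Python's standard `urllib.parse` module can both read and build query strings, which makes the key-value structure described above visible. This sketch uses the example query from the figure. `parse_qsl` and `urlencode` are real standard-library functions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

url = "http://example.com:80/path/to/file.html?key1=value1&key2=value2"

query = urlsplit(url).query     # text after "?", without the "?" itself
pairs = dict(parse_qsl(query))  # split on "&", then each pair on "="
print(query)  # key1=value1&key2=value2
print(pairs)  # {'key1': 'value1', 'key2': 'value2'}

# The reverse: join each key and value with "=" and separate pairs with "&".
print(urlencode(pairs))  # key1=value1&key2=value2
```

Converting to a `dict` is a simplification. If a key appears more than once, only its last value is kept.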

Finally, there is the anchor, which comes last and begins with a #. Anchors will be more relevant when we talk about links in the next tutorial. If the resource we are requesting is an HTML page, the anchor acts as a kind of "bookmark" that tells the browser which part of the document to show when it loads.
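Here is a short Python sketch that uses `urldefrag` from the standard library to separate the anchor from the rest of this page's URL. The part before the # is what the browser requests from the server. The anchor itself is not sent to the server; the browser only uses it to jump to the right place in the page.

```python
from urllib.parse import urldefrag

url = "https://1x1.nicolldouglas.dev/tutorials/the-web/urls#absolute-urls-vs-relative-urls"

# urldefrag splits off everything after "#".
page, anchor = urldefrag(url)
print(page)    # https://1x1.nicolldouglas.dev/tutorials/the-web/urls
print(anchor)  # absolute-urls-vs-relative-urls
```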

Overall, this structure allows a URL to uniquely identify any resource on the Web, whatever its type. With these unique identifiers, web browsers and web servers can easily communicate and retrieve the resource a specific request asks for.

Absolute URLs vs. Relative URLs

The structure we discussed above is what's known as an absolute URL. There is also another type called a relative URL, and we will discuss the differences below. First, note that which parts of a URL are required depends on where the URL is used. In the address bar of a browser, for example, there is essentially no context, so all parts are required. You can leave some of them out when typing, as discussed earlier, but the browser still fills them in because it needs every part.

URLs can also be embedded inside a document (this is how links are formed), and there the context is the document the URL is in. When that document is viewed in a web browser, the browser knows its full URL. The key point is that a URL inside a document can omit certain parts, and the web browser will infer the missing parts from the URL of the page it is on (the existing context). These are known as relative URLs.

There are a few types of relative URLs, so I will go over the key ones below:

Scheme-relative URL: //1x1.nicolldouglas.dev/tutorials — only the scheme is missing, so the browser will use the same protocol used to load the document the URL is in.
Domain-relative URL: /tutorials — the scheme and the domain are missing, so the browser will use the same protocol and domain used to load the document the URL is in. For example, a link with the URL /tutorials on any page of this website would take you to the tutorials page.
Sub-resources: the-web/urls — the scheme and domain are missing, and the file path doesn't begin with /. The browser will look for the resource relative to the folder that contains the current document. Imagine we are on a page whose URL is /tutorials/ (the trailing slash marks tutorials as the current folder), and there is a link with the relative URL the-web/urls. The document we will get is /tutorials/the-web/urls (/the-web being a subfolder of /tutorials). Without the trailing slash, the browser would treat tutorials as a document in the top-level folder and resolve the link to /the-web/urls instead. You may also see these relative URLs written in the form ./the-web/urls. The ./ just means "the current folder", so the two URLs are equivalent.
Going back in the folder tree: ../html/introduction — the scheme and domain are missing, and the file path starts with .., which means "go up one folder". Consider this document (/tutorials/the-web/urls), which is in the folder /the-web. The URL tells the browser to go up into the parent folder /tutorials, then into the folder /html, and then to the document introduction, giving /tutorials/html/introduction.
Anchor-only: #absolute-urls-vs-relative-urls — all parts of the URL are missing except the anchor. In a link, the browser will add the anchor to the current document's URL and take you to that location in the document (and that is what the navigation links at the beginning of these tutorials do). Try it.
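Browsers resolve relative URLs against the URL of the current page. Python's `urllib.parse.urljoin` follows essentially the same resolution rules for cases like these, so it is a handy way to experiment. The sketch below resolves each kind of relative URL from the list against this page's URL. The sub-resource case is shown against two base URLs, /tutorials/ and /tutorials, because whether the base ends in a slash changes the result. Treat these as illustrations rather than a description of how this particular site is set up.

```python
from urllib.parse import urljoin

# Base: the URL of the page the links appear on (this tutorial).
page = "https://1x1.nicolldouglas.dev/tutorials/the-web/urls"

print(urljoin(page, "//1x1.nicolldouglas.dev/tutorials"))
# https://1x1.nicolldouglas.dev/tutorials  (scheme-relative: scheme taken from page)
print(urljoin(page, "/tutorials"))
# https://1x1.nicolldouglas.dev/tutorials  (domain-relative: scheme and domain from page)
print(urljoin(page, "../html/introduction"))
# https://1x1.nicolldouglas.dev/tutorials/html/introduction  (up one folder from /the-web)
print(urljoin(page, "#absolute-urls-vs-relative-urls"))
# https://1x1.nicolldouglas.dev/tutorials/the-web/urls#absolute-urls-vs-relative-urls

# Sub-resources are resolved against the folder containing the current page,
# so a trailing slash on the base URL changes the result.
print(urljoin("https://1x1.nicolldouglas.dev/tutorials/", "the-web/urls"))
# https://1x1.nicolldouglas.dev/tutorials/the-web/urls
print(urljoin("https://1x1.nicolldouglas.dev/tutorials/", "./the-web/urls"))
# https://1x1.nicolldouglas.dev/tutorials/the-web/urls  ("./" means the current folder)
print(urljoin("https://1x1.nicolldouglas.dev/tutorials", "the-web/urls"))
# https://1x1.nicolldouglas.dev/the-web/urls  (no trailing slash: "tutorials" is a document)
```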

These are the main kinds of relative URLs. You don't have to memorise them all now, but it is important to know that URLs on the Web fall into two categories: absolute URLs (full resource identifiers) and relative URLs (context-dependent). In the next tutorial we will talk about what links are and how they tie into URLs.

Key Concepts Learnt

URL stands for Uniform Resource Locator and it allows us and web browsers to uniquely identify a resource on the Web.
A URL contains a scheme which indicates what protocol to use to request the resource.
A URL contains a domain name or IP address which tells what web server to make the request to.
A URL contains a port number to indicate what "gateway" to use on the server.
The default ports for HTTP and HTTPS are 80 and 443 respectively and can usually be omitted in a URL.
A URL contains a file path which tells the location of the file or resource on the server.
A URL can contain other parts, such as query parameters and an anchor, which provide extra information about the resource we are requesting.
URLs can be either absolute URLs (the full identifier) or relative URLs (context dependent).