Decoding the Internet's Pathways: How a Web Page Travels Across the Digital Landscape

How does the web work? Knowing the answer is important for anyone who wants to be a software developer, whether in frontend, backend, DevOps, or any other field that deals with computers. It seems like a simple question, yet many people cannot explain it. In this post, let's work through it. Developer tools and frameworks have become so powerful that beginners tend to jump ahead without understanding the basics of how the internet and the web work. I have made the same mistake myself.

So let's start. Every internet user has typed an address like www.google.com into a browser such as Chrome, Edge, or Firefox, and a moment later the browser shows the Google search page. How does the browser know what google.com is, and where does it look for it? This may seem like a simple task, but many steps are involved. Before we follow the journey, let's go over some basic terms from the world of the web.

Time for some Technical terms

Client: A client is an application, like Chrome, that runs on your computer and communicates over the internet by sending requests and receiving responses. Although the browser is the program doing the communicating, the whole computer is often referred to as the client. Every client has an IP address on the internet, which other computers use to identify it.

Server: A server is also a computer with an IP address, but its responsibility is to send back the right data for each request. Servers run special software that listens for requests sent by clients and responds to each one with either the requested data or an error message if the request cannot be processed. This arrangement is known as the client-server model.
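
To make the client-server model concrete, here is a minimal sketch in Python that runs both sides on one machine. The function names (serve_once, ask_server) and the tiny "hello" exchange are made up for illustration; a real web server speaks HTTP and handles many clients at once.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Server side: accept one client, read its request, send a reply."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024).decode("utf-8")
        if request == "hello":
            conn.sendall(b"data: hello from the server")
        else:
            # The server answers with an error if it cannot process the request.
            conn.sendall(b"error: cannot process request")


def ask_server(message: str) -> str:
    """Start a one-shot server on the loopback address and query it as a client."""
    # Port 0 asks the operating system to pick any free port.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        host, port = listener.getsockname()
        worker = threading.Thread(target=serve_once, args=(listener,))
        worker.start()
        # Client side: connect, send a request, read the response.
        with socket.create_connection((host, port), timeout=5) as client:
            client.sendall(message.encode("utf-8"))
            reply = client.recv(1024).decode("utf-8")
        worker.join()
    return reply


print(ask_server("hello"))
print(ask_server("something else"))
```

The first call gets data back and the second gets an error message, which are the two outcomes described above.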

IP address stands for Internet Protocol address. It is a numerical identifier for a device on a TCP/IP network: every computer on the internet has an IP address that it uses to identify itself and communicate with other computers. An IPv4 address consists of four numbers, each from 0 to 255, separated by dots (e.g. 244.155.65.2). This is called the “logical address”. To deliver data to a device on the local network, the TCP/IP software maps the logical IP address to a physical address using the Address Resolution Protocol (ARP). This physical address (i.e. the MAC address) is built into your network hardware.
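
If you want to poke at the "four numbers" idea yourself, Python's standard ipaddress module can parse an address and show its parts. This is just a small sketch; the helper is_valid_ipv4 is a name I made up for the example.

```python
import ipaddress

# The example address from the text above.
addr = ipaddress.ip_address("244.155.65.2")

print(addr.version)        # 4, i.e. an IPv4 address
print(list(addr.packed))   # the four numbers as a list
print(int(addr))           # the same address as one 32-bit integer


def is_valid_ipv4(text: str) -> bool:
    """Return True if text is a well-formed IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        # Raised for things like "256.1.1.1" (number too large) or "1.2.3".
        return False


print(is_valid_ipv4("256.1.1.1"))
```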

ISP stands for internet service provider, the middleman between the client and the server. When we enter an address like www.google.com, our browser doesn't know where to look. By default, it is typically a DNS resolver run by the ISP that performs the DNS lookup for the IP address of the site the client has requested.

DNS stands for Domain Name System, a distributed database that keeps track of domain names and their corresponding IP addresses. A domain name is used to identify one or more IP addresses on the internet.
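
"Distributed" means no single server knows every name. A lookup is passed along a chain: a root server, then the server for the top-level domain (like ".com"), then the name server for the domain itself. Below is a toy sketch of that chain using plain dictionaries. All the server names and the IP address are invented for the example (192.0.2.10 is a documentation-only address, not GitHub's real one), and a real resolver also caches answers and talks over the network.

```python
# Hypothetical, hard-coded data standing in for real name servers.
ROOT_SERVER = {"com": "tld-server-for-com"}
TLD_SERVERS = {"tld-server-for-com": {"github.com": "ns-for-github"}}
AUTHORITATIVE_SERVERS = {"ns-for-github": {"www.github.com": "192.0.2.10"}}


def resolve(hostname: str) -> tuple[str, list[str]]:
    """Walk root -> TLD -> authoritative and return (ip, servers_asked)."""
    name = hostname.rstrip(".")
    labels = name.split(".")
    tld = labels[-1]                    # "com"
    domain = ".".join(labels[-2:])      # "github.com"

    asked = ["root"]
    tld_server = ROOT_SERVER[tld]                  # root points to the .com server
    asked.append(tld_server)
    auth_server = TLD_SERVERS[tld_server][domain]  # .com points to github.com's server
    asked.append(auth_server)
    ip = AUTHORITATIVE_SERVERS[auth_server][name]  # that server knows the address
    return ip, asked


ip, asked = resolve("www.github.com")
print(ip)
print(" -> ".join(asked))
```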

TCP/IP stands for Transmission Control Protocol/Internet Protocol, the most widely used protocol suite in network communications. A “protocol” is simply a standard set of rules for doing something. TCP/IP is the standard for transmitting data over networks.
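
One of the rules TCP follows is that data is split into numbered pieces and put back together in order at the other end, even if the pieces arrive shuffled or twice. Here is a rough simulation of that idea. It is only a sketch: the function names are my own, and real TCP works on byte streams with acknowledgements and timers that are left out here.

```python
import random


def split_into_packets(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Break a message into (sequence_number, chunk) pairs."""
    return [(offset, message[offset:offset + size])
            for offset in range(0, len(message), size)]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Put chunks back in order by sequence number, ignoring duplicates."""
    unique = dict(packets)   # a duplicate simply lands on the same key
    return b"".join(unique[seq] for seq in sorted(unique))


message = b"GET / HTTP/1.1 is on its way across the network"
packets = split_into_packets(message, size=8)

random.shuffle(packets)        # packets may arrive out of order
packets.append(packets[0])     # and a resent packet may arrive twice

print(reassemble(packets) == message)
```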

Port Number is a 16-bit integer (0 to 65535) that identifies a specific port on a server and is always associated with an IP address. It serves as a way to identify a specific process on the server so that network requests can be forwarded to it.
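
To see what "16-bit" means in practice, this small sketch checks the valid range and packs a port into the two bytes it occupies in a TCP header. The helper is_valid_port is a name invented for the example.

```python
import struct

MAX_PORT = 2**16 - 1   # the largest value that fits in 16 bits


def is_valid_port(port: int) -> bool:
    """A port number must fit in an unsigned 16-bit integer."""
    return 0 <= port <= MAX_PORT


# "!H" means network byte order (big-endian), unsigned 16-bit integer.
packed = struct.pack("!H", 443)
print(len(packed))                      # number of bytes used
print(struct.unpack("!H", packed)[0])   # the port number read back
print(is_valid_port(70000))
```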

Host is a computer connected to the internet that has an IP address, just like a client. A server that serves the web pages of a website is one type of host, that is, a specific machine. On the other hand, “host” can also refer to an entire organization that provides a hosting service and maintains multiple web servers.

HTTP stands for Hypertext Transfer Protocol, which is the protocol that web browsers and web servers use to communicate with each other on the internet.
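
HTTP/1.1 messages are plain text, so it is easy to look at one. The sketch below builds a GET request by hand and parses a hand-written response. Nothing is sent over the network here, the sample response is made up, and the function names are my own. Real browsers send many more headers.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",   # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), headers, body


print(build_get_request("github.com").decode("ascii"))

# A hand-written response, standing in for what a server might send.
sample = b"HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n<h1>Not Found</h1>"
status, headers, body = parse_response(sample)
print(status, headers["content-type"], body)
```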

URL stands for Uniform Resource Locator, which is used to identify a particular web resource. A simple example is https://github.com/someone. The URL specifies the protocol (“https”), the hostname (github.com) and the path (“/someone”, i.e. someone’s profile page). A user can obtain the web resource identified by this URL via HTTP from a network host whose domain name is github.com.
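
Here is a small sketch of how a program can pull those pieces out of a URL, using Python's standard urllib.parse module. The describe_url helper and the DEFAULT_PORTS table are my own additions for the example; they fill in the usual port when the URL does not name one.

```python
from urllib.parse import urlsplit

# Well-known default ports, used when the URL does not specify one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url: str) -> dict:
    """Return the protocol, hostname, port and path of a URL."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "hostname": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",   # no path means the main page
    }


print(describe_url("https://github.com/someone"))
print(describe_url("https://www.github.com"))
```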

The journey of websites

Now that we have the essentials out of the way, let's walk through an example to see how it all works, using www.github.com/.

  1. Type the URL into your browser

  2. The browser parses the information contained in the URL. This includes the protocol (“https”), the domain name (“github.com”) and the resource (“/”). In this case, there isn’t anything after the “.com” to indicate a specific resource, so the browser knows to retrieve just the main (index) page.

  3. The browser communicates with a DNS service, usually run by your ISP, to look up the IP address of the web server that hosts www.github.com. If the answer is not already cached, the DNS service first contacts a Root Name Server, which looks at the name www.github.com and replies with the IP address of a name server for the “.com” top-level domain. The DNS service then asks the “.com” name server about www.github.com and gets back the address of the name server responsible for github.com. Finally, the DNS service asks that name server, which replies with the IP address for www.github.com.

  4. Once the ISP receives the IP address of the destination server, it sends it to your web browser.

  5. Your browser takes the IP address and the port number (443 by default for HTTPS) and opens a TCP/IP socket to the server. Now the client and the web server are connected.

  6. Now the browser sends an HTTP request to the web server for the main page of GitHub (this is a GET request, the kind used to fetch a resource).

  7. The web server receives the request and looks for that HTML page. If the page exists, the web server prepares the response and sends it back to your browser. If the server cannot find the requested page, it sends an HTTP 404 response, which means “Not Found”.

  8. Your web browser takes the HTML page it receives and then parses through it doing a full head-to-toe scan looking for other assets that are listed, such as images, CSS files, JavaScript files, etc.

  9. For each asset listed, the browser repeats the process above as needed, making an additional HTTP request for each resource. Results from earlier steps, such as DNS answers and open connections, are often reused.

  10. Once the browser has finished loading all other assets that were listed in the HTML page, the page will finally be loaded in the browser window and the connection will be closed.

    Note:

    When you make a request, that information is broken up into many tiny chunks called packets. Each packet is tagged with a TCP header, which includes the source and destination port numbers, and an IP header, which includes the source and destination IP addresses, to give it its identity. The packet is then transmitted over Ethernet, Wi-Fi or a cellular network and is allowed to travel on any route and take as many hops as it needs to reach its final destination.

    The packets get to their destination thanks to TCP/IP.

    TCP/IP is a two-part system, functioning as the Internet’s fundamental “control system”. IP stands for Internet Protocol; its job is to send and route packets to other computers using the IP headers (i.e. the IP addresses) on each packet. The second part, Transmission Control Protocol (TCP), is responsible for breaking the message or file into smaller packets, routing packets to the correct application on the destination computer using the TCP headers, resending the packets if they get lost on the way, and reassembling the packets in the correct order once they’ve reached the other end.

  11. Now that the browser has all the code, style sheets and images it needs, it goes through several steps to display the web page to you.

  12. Your browser has a rendering engine that’s responsible for displaying the content. The rendering engine receives the content of the resources in small chunks. Then there’s an HTML parsing algorithm that tells the browser how to parse the resources.

  13. Once parsed, it generates a tree structure of the DOM elements. DOM stands for Document Object Model and it is a convention for how to represent objects located in an HTML document. These objects — or “nodes” — of every document can be manipulated using scripting languages like JavaScript.

  14. Once the DOM tree is built, the stylesheets are parsed to understand how to style each node. Using this information, the browser traverses the DOM nodes and calculates the CSS style, position, coordinates, etc. for each node.

  15. Once the browser has the DOM nodes and their styles, it is finally ready to paint the page to your screen accordingly. The result: everything you’ve ever looked at on the Internet.
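
Steps 8 and 13 of the list above (scanning the HTML for assets and building a tree of elements) can be sketched with Python's built-in html.parser module. This is only an illustration: the PageScanner class, the VOID_TAGS set and the tiny sample page are made up for the example, and a real browser's parser is far more forgiving and complete.

```python
from html.parser import HTMLParser

# A few tags that never have a closing tag, so they cannot contain children.
VOID_TAGS = {"img", "link", "meta", "br", "hr", "input"}


class PageScanner(HTMLParser):
    """Collect asset URLs and an indented outline of the element tree."""

    def __init__(self) -> None:
        super().__init__()
        self.assets: list[str] = []
        self.outline: list[str] = []
        self.depth = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        self.outline.append("  " * self.depth + tag)
        # Images and scripts point to their file with "src".
        if tag in ("img", "script") and attributes.get("src"):
            self.assets.append(attributes["src"])
        # Stylesheets are linked with <link rel="stylesheet" href="...">.
        if tag == "link" and attributes.get("rel") == "stylesheet" and attributes.get("href"):
            self.assets.append(attributes["href"])
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1


PAGE = """<html><head><link rel="stylesheet" href="/site.css">
<script src="/app.js"></script></head>
<body><h1>Hello</h1><img src="/logo.png"></body></html>"""

scanner = PageScanner()
scanner.feed(PAGE)
print(scanner.assets)               # the extra files the browser would request
print("\n".join(scanner.outline))   # a rough picture of the DOM tree
```

The list of assets is what the browser goes back to the server for in step 9, and the indented outline is a rough picture of the DOM tree from step 13.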

So, in a nutshell, this is how the internet works, or, if you prefer, how a web page travels across the internet. This is a simplified version, without the complications, so that everyone can follow it.

If you liked the article, give it a like. If you want to read more articles like this, bookmark the blog or subscribe to the newsletter so that you won't miss any future articles.

Hey, viewer, if you like the articles I am writing and want to support me, you can go to the Support Me tab, or you can sponsor me. Thank you!

Did you find this article valuable?

Support Saravan Krishna's blog by becoming a sponsor. Any amount is appreciated!