tadhg.com

Guide to How the Web Works I: For Web Users

22:47 Sun 28 Jul 2013

What I think every web user should know about the technical side of the web. This is intended to be the first in a series of guides aimed at increasingly advanced levels of use[1].

This post covers the basics; enough so that after reading it you won’t mistake a blog post for the new Facebook redesign.

This is a work in progress. Please let me know if you see errors, or if you don’t understand something here—that’s valuable feedback!

How to Get There: URLs

Your browser almost certainly has a URL bar, also called a “location bar” or “address bar”. This is where you enter the address of the website you want to go to. In some browsers, this bar also accepts other input that it will send to a search engine, so that if you enter “waxy” instead of waxy.org you’ll be directed to a results page on which waxy.org is likely to be the first result.

The difference between these two things is similar to the difference between dialing a phone number directly and calling an operator[3] and asking them to find you the number and connect you, or, perhaps, between dialing a number directly and saying the name of the person you want to call into your smartphone and having it dial for you.

If your browser does turn non-URLs into search terms, you’ll still end up following one of the results, at which point a real URL is in the location bar and directing the browser. So what is a URL? It stands for Uniform Resource Locator, but that’s not so important. More important are its parts.

Here’s a simple one:


http://tadhg.com/

All web addresses start with http:// or https:// (denoting a “secure” connection)[4]. This first part is called the scheme, and the piece that might matter to you is whether it’s http or https. Many browsers now don’t display the scheme at all, so that in your address bar you’d just see:

tadhg.com

That part is the host, or domain—the core part of the address. With nothing after it, or with just a trailing slash, you’re at the top level, or “home”.

Many addresses have a third piece of information after scheme and host: path. This is a more specific location for whatever the resource is, and uses slashes as separators. For example:


http://tadhg.com/wp/2013/07/21/comedic-interlude/

Eliminating what we’ve covered already (scheme of http, host of tadhg.com), we have:

/wp/2013/07/21/comedic-interlude/

The slashes are supposed to denote directories or folders, so this means: in the comedic-interlude section of the 21 section of the 07 section of the 2013 section of the wp section.

Paths don’t always end in slashes, so might look like this:


http://docs.python.org/2/library/functools.html

Which means: the functools.html object in the library section of the 2 section of the domain docs.python.org using the http: scheme.

One implication of this folder-like structure is that you can alter the address manually and stand a good chance of getting to something useful; without visiting the page, I can guess that the library documentation for Python 2 is located at:


http://docs.python.org/2/library/

And just give that address to my browser instead.
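If you wanted to do the same "chop off the last piece" guess in code, here is a minimal sketch using Python's standard urllib.parse module. The variable names are mine, and nothing here checks that the resulting address actually exists on the server; it only works out what the guessed address would be.

```python
from urllib.parse import urljoin

page = "http://docs.python.org/2/library/functools.html"

# "." means "the folder this page lives in"; ".." means one level further up.
containing_folder = urljoin(page, ".")
parent_folder = urljoin(page, "..")

print(containing_folder)  # http://docs.python.org/2/library/
print(parent_folder)      # http://docs.python.org/2/
```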

To review:

http://    docs.python.org    /2/library/functools.html
  ^               ^                       ^
scheme          domain                   path
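The same three-way split can be reproduced with Python's standard urllib.parse module. This is only an illustration of the diagram above; note that Python calls the domain part "netloc".

```python
from urllib.parse import urlsplit

parts = urlsplit("http://docs.python.org/2/library/functools.html")

print(parts.scheme)  # http
print(parts.netloc)  # docs.python.org
print(parts.path)    # /2/library/functools.html
```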

There is more to URLs, some of it important[5]. The full set of parts, in order:

scheme
e.g. http://

user/password
e.g. guido:monty@. Optional and deprecated.

domain
e.g. docs.python.org

port
e.g. :8080. Optional.

path
e.g. /2/library. Optional; defaults to /.

query string
e.g. ?parameter=value. Optional.

fragment
e.g. #section1. Optional.

Also see Wikipedia’s URL entry.
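To see every part at once, here is a small sketch that glues the example pieces from the list above into one made-up URL and takes it apart again with Python's urllib.parse. The combined URL is an assumption for illustration only; it is not a real address.

```python
from urllib.parse import urlsplit, parse_qs

# A made-up URL assembled from the example pieces listed above.
full = urlsplit(
    "http://guido:monty@docs.python.org:8080/2/library?parameter=value#section1"
)

print(full.scheme)    # http
print(full.username)  # guido
print(full.password)  # monty
print(full.hostname)  # docs.python.org
print(full.port)      # 8080
print(full.path)      # /2/library
print(full.query)     # parameter=value
print(full.fragment)  # section1

# The query string can be broken down further into names and values.
query_values = parse_qs(full.query)
print(query_values)   # {'parameter': ['value']}
```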

Who’s Giving What to Whom: Servers and Clients

The thing you’re giving the URL to, in the hopes of reaching the resource you want, is a web browser: Firefox, Chrome, or Safari, for example. That browser is a software program running on your hardware device: phone, laptop, desktop computer, perhaps your television. There’s a layer in between the browser and the device, called an operating system, that handles common tasks without requiring each program to do everything itself. Common operating systems include Windows, Mac OS X, Linux, iOS, and Android.

So you have a device, running an operating system, running a browser. The browser operates locally, that is, on your device, and classically it makes requests to the server.

The server is the machine, somewhere, that’s storing or creating the content that is returned to the browser after a request is made.

Here’s a typical interaction, simplified:

  1. You enter the URL popehat.com into your browser’s location bar.
  2. Your browser turns this into a well-formed address of http://popehat.com/
  3. Your browser uses your internet connection to send this request to the machine with the name popehat.com[6].
  4. That machine, the web server for popehat.com, sees the request and recognizes that it’s HTTP and that it’s asking for the top level of the site’s content.
  5. The web server sends a response with the appropriate content. This content is likely to be HTML.
  6. Your browser sees that the response is HTML, and displays it to you as such, also making additional requests to the server for content specified by that HTML.
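The request-and-response part of those steps can be acted out on a single machine. The sketch below is a toy, not how popehat.com or any real site is set up: it starts a tiny web server on your own computer using Python's standard http.server module, then plays the browser's role with http.client. The names PAGE, HomePageHandler, and fetch_home_page are made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# The content our toy server hands out for the top level of the "site".
PAGE = b"<html><body><h1>Hello from the server</h1></body></html>"


class HomePageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Step 4: the server sees an HTTP request and which path it asks for.
        if self.path == "/":
            # Step 5: it responds with the content, labelled as HTML.
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(PAGE)))
            self.end_headers()
            self.wfile.write(PAGE)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_home_page():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HomePageHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Step 3: the client sends a request for "/" to the server.
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/")
        response = connection.getresponse()
        result = (response.status, response.getheader("Content-Type"), response.read())
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, content_type, body = fetch_home_page()
    # Step 6: a real browser would now see "text/html" and render the body.
    print(status, content_type)
    print(body.decode("utf-8"))
```

A real browser does a good deal more than this (finding the machine by name, making follow-up requests for images and styles), but the basic shape of "ask for a path, get labelled content back" is the same.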

What a Web Page (Probably) is: HTML, CSS, and JavaScript

Most of those steps summarize a lot of complexity, and this is particularly true of the last one. What your browser displays to you is not strictly speaking like a picture or a printed page, but rather your browser’s estimation of how the content is supposed to look.

HTML is HyperText Markup Language; a “markup language” is text that’s marked up with special syntax to describe its appearance, function, and meaning, while “hypertext” is essentially text that supports linking. This is the agreed-upon standard that browsers expect from web servers; by definition, a web browser can display HTML.

HTML is a formally specified markup language and not a programming language.

CSS (Cascading Style Sheets) is a presentation syntax for HTML—that is, it describes to browsers how marked-up sections of HTML pages should be styled. It is also not a programming language.

JavaScript is a programming language, the standard language that browsers expect to execute on HTML pages to handle complex interactive behavior. (Note that it is unrelated to Java, and is called JavaScript because someone thought at the time that this made marketing sense.)

The classic separation between the three is that HTML contains the content and broad markup describing what various sections are, CSS contains rules for how sections should look, and JavaScript contains rules for interactive behavior. The browser understands all three and does the work on your device of putting them together and making them function.
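As a rough illustration of that separation, here is a sketch that takes one tiny made-up page and sorts its pieces into the three layers using Python's standard html.parser module. The page, the LayerSplitter class, and the layer names are all assumptions for the example; a real browser does far more than this, and real pages often keep their CSS and JavaScript in separate files.

```python
from html.parser import HTMLParser

# A tiny made-up page containing all three layers.
PAGE = """<html>
<head>
<style>h1 { color: green; }</style>
<script>document.title = "Changed by JavaScript";</script>
</head>
<body>
<h1>Hello</h1>
<p>Some content.</p>
</body>
</html>"""


class LayerSplitter(HTMLParser):
    """Sorts the text of a simple page into content, style, and behavior."""

    def __init__(self):
        super().__init__()
        self.current = None  # the tag we are currently inside, if any
        self.layers = {"html": [], "css": [], "javascript": []}

    def handle_starttag(self, tag, attrs):
        self.current = tag

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # ignore whitespace between tags
        if self.current == "style":
            self.layers["css"].append(text)
        elif self.current == "script":
            self.layers["javascript"].append(text)
        else:
            self.layers["html"].append(text)


splitter = LayerSplitter()
splitter.feed(PAGE)
splitter.close()

for layer, pieces in splitter.layers.items():
    print(layer, "->", pieces)
```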


Again, I’d really like to improve this over time, so feedback and suggestions are welcome.

Website owners, you need to know the above… and more—that will hopefully be the next entry in the series.

[1] The plan is:

1
Web users.

2
Website owners.

3
Web developers.

4
Large-scale web application engineers.

5
Network and security engineers? I’m not sure about this one yet—and it might take me a while to learn what would need to go into such a guide.

[2] I find it interesting that, contrary to what one might have expected from science fiction, “the web” is rarely used as a bare noun, but more often as part of terms like “website”, “web page”, and “web developer”, and that “the web” still sounds clunky to me except as part of the phrase “on the web”.

[3] Or 411, or directory enquiries, etc.

[4] HTTP stands for HyperText Transfer Protocol, and the S in HTTPS stands for Secure.

[5] Particularly because of optional sections in which things can be hidden. Consider these two fictional examples:


https://myfictionalbankingsite.com/account/login/forgot_password/forgot_password.jsp?sid=123321042000&token=1490c555-442a-47b0-b097-8d03f9b067be


https://myfictionalbankingsite.com∕account∕login∕forgot_password:forgot_password.jspsid@123.32.104.20/token=1490c555-442a-47b0-b097-8d03f9b067be

One of them goes to the forgot password service of a fictional banking site; the other to some very likely dangerous site designed to lure users to give up their banking credentials. The second one is the suspicious one: the “slashes” before the @ are lookalike characters rather than real slashes, so it is telling the browser to go to the host 123.32.104.20 and to present a username of myfictionalbankingsite.com∕account∕login∕forgot_password and a password of forgot_password.jspsid; these will be ignored, as the entire setup is there purely to present a technically-valid URL that looks like an entirely different URL.
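You can check this reading of the two URLs with Python's urllib.parse. This is a sketch under one assumption: that the lookalike slash in the second URL is the Unicode character U+2215 (DIVISION SLASH), which is what it appears to be here. The variable names are mine.

```python
from urllib.parse import urlsplit

LOOKALIKE = "\u2215"  # DIVISION SLASH: looks like "/" but is a different character

genuine = urlsplit(
    "https://myfictionalbankingsite.com/account/login/forgot_password/"
    "forgot_password.jsp?sid=123321042000&token=1490c555-442a-47b0-b097-8d03f9b067be"
)

suspicious = urlsplit(
    f"https://myfictionalbankingsite.com{LOOKALIKE}account{LOOKALIKE}login"
    f"{LOOKALIKE}forgot_password:forgot_password.jspsid@123.32.104.20"
    "/token=1490c555-442a-47b0-b097-8d03f9b067be"
)

# The genuine URL really does point at the bank's host, with no username.
print(genuine.hostname)     # myfictionalbankingsite.com
print(genuine.username)     # None

# The suspicious URL points somewhere else entirely; everything before
# the @ is just a username and password.
print(suspicious.hostname)  # 123.32.104.20
print(suspicious.username)
print(suspicious.password)  # forgot_password.jspsid
```

The practical lesson is the one in the footnote: whatever comes right before the first real slash after the scheme, and after any @, is where the browser will actually go.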

[6] DNS, caching, and ports are all beyond the scope of this post.


One Response to “Guide to How the Web Works I: For Web Users”

  1. Steve Casey Says:

    Your footnotes have become mis-numbered at some point.

    I wasn’t aware that the URL encoded u/p has been deprecated? I agree it’s troublesome but didn’t know the reaction was to remove it?
