If you look at the URL of almost any website you will notice it starts with https://. This hasn't always been the case. I don't know how old you are, but in the early days of the internet, up to about 10 years ago, most URLs started with http://. This is no longer the case, as HTTP doesn't provide any sort of communication encryption and is thus considered insecure. In this article we will go through a summary of how the internet works and how https:// makes it secure.
How does the internet work?
Let's say you really enjoy watching cats and you want to use the power of the internet to look at pictures and videos of them. So what do you do? You open up your browser of choice and visit one of the websites dedicated to cats.
But what is a website / web application?
A website or web application basically allows you to view structured documents along with media content like images, videos, audio files, etc. This content can be static (it never changes) or dynamic, tailored only to you. For example, when you are viewing your email messages you are reading content that is only available to you.
Where does a website / web application live?
In its most basic form (please ignore all the complications of cloud computing, load balancers, CDNs, etc.), a website lives on a special type of computer (let's call it a server). This computer stores all the text and video, or can generate them for you on the fly, and it must be running 24/7 for when you need to watch cat videos at 3 am 😊.
How does a server serve my requests?
When you want to see pictures of cats, you need to ask this special computer to bring you pictures of cats. If you want to watch videos, you need to ask it for videos.
Basically, in order to get the content you want, you need to:
- Know the address of the computer (server) that has the content that you need.
- Ask the computer to get you the content that you want.
- Receive the content and view it.
So the address of a computer is the URL cats.com, right? No. Computers work sort of like telephones, in the sense that the only addresses they understand are numbers (please ignore the differences between IPv4 and IPv6 for now 🙏). This address format is called an IP address, and it is the only thing that a computer network understands (exactly like a telephone number).
An example IP address is
123.45.67.89
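To make the "addresses are only numbers" point concrete, here is a small sketch using Python's standard `ipaddress` module. It only shows that the dotted form of the example address above is another way of writing a single 32-bit number; the variable names are made up for this illustration.

```python
import ipaddress

# The dotted form is just a friendlier way to write one 32-bit number.
address = ipaddress.IPv4Address("123.45.67.89")
as_number = int(address)

# Each of the four parts is one byte of that number.
parts = [int(part) for part in str(address).split(".")]
rebuilt = (parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]

print(as_number)                         # the address as a plain integer
print(rebuilt == as_number)              # both ways of computing it agree
print(ipaddress.IPv4Address(as_number))  # and the number converts back
```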
When was the last time you memorised a telephone number? Most of us only see a telephone number when we save a new contact in our phones 😂. The same thing goes for a computer's IP address. Instead of memorising it, we use something called DNS.
What is DNS?
In its simplest form, DNS (the Domain Name System) is a directory: it maps human-readable text, e.g. www.cats.com, to the actual address that is understood by any network (the internet included). There are global DNS servers out there, and almost every ISP runs a smaller one of its own to make these lookups quick.
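The directory idea can be sketched with a plain dictionary. This is only a toy: real DNS is a distributed system spread over many servers, and `TOY_DNS` and `resolve` are names invented for this example. The address is the example one used in this article, not the real address of any site.

```python
# A toy "directory": real DNS is a distributed system, not one dictionary.
TOY_DNS = {
    "www.cats.com": "123.45.67.89",  # example address from the article
}

def resolve(hostname: str) -> str:
    """Return the address stored for hostname, or raise if it is unknown."""
    try:
        # Host names are case-insensitive, so normalise before looking up.
        return TOY_DNS[hostname.lower()]
    except KeyError:
        raise LookupError(f"no address known for {hostname!r}") from None

print(resolve("www.cats.com"))
```

In a real program you would ask the operating system's resolver instead, for example through `socket.getaddrinfo` in Python's standard library.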
OK, so I know the address. How do requests work?
Once you have a human-readable URL (website address), you start making requests. You open your browser and type "cats.com". Under the hood, the browser asks the DNS server in your network to translate this human text into a network address. The requests and responses happen as follows:
- You visit cats.com/videos
- Your browser asks the server to get you "/videos". This request is sent over the network.
- The server prepares some cat videos for you and sends them back.
- Your browser displays these videos for you.
This is basically how HTTP works.
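To show roughly what travels over the network in the steps above, here is a minimal sketch that builds the text of an HTTP request and reads a response. It never touches the network: the response is a made-up stand-in for what a server might send back, and `build_request` and `parse_response` are hypothetical helper names.

```python
from typing import Tuple

def build_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request as the bytes sent over the network."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def parse_response(raw: bytes) -> Tuple[int, bytes]:
    """Split a raw HTTP response into its status code and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n")[0]       # e.g. b"HTTP/1.1 200 OK"
    status_code = int(status_line.split()[1])
    return status_code, body

request = build_request("cats.com", "/videos")

# A made-up response, standing in for what the server might send back.
canned_response = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\ncat videos!"

print(request.decode("ascii"))
print(parse_response(canned_response))
```

Notice that both the request and the response are plain, readable text. That matters for the next section.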
Why is HTTP not secure?
Computers use messages called packets to send and receive data using the HTTP request / response model that we explained. These packets are carried wirelessly and through wires, and they keep navigating the network until they reach their destination. For example, if you are communicating with cats.com and your request message fits in 1 packet, then:
- The packet is created by your browser.
- Your laptop's wireless interface sends this packet to your home router.
- Your home router sends this packet to your Internet Service Provider / mobile carrier.
- The ISP sees where this packet should go and forwards it along the best route it knows towards the destination.
- ...the packet keeps hopping across the network.
- The packet reaches the destination computer.
- The packet is processed.
The problem here is that any node on the way can intercept and read this packet, and it can also modify it. Anyone on your wireless / wired network can do something called "sniffing" and capture all the packets that go through the network.
With cat and dog videos this might not be a big deal. But if you are sending the email + password of your bank account to the bank's server (to log in / see your balance), the possibility of someone reading your packets means all your information can be stolen very easily.
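Here is a small sketch of why that is so easy. It assumes a login form sent over plain HTTP; the host name, path, and credentials are all invented for illustration, and `sniff_credentials` is a hypothetical function standing in for what any node on the path could do with the bytes it carries.

```python
from urllib.parse import parse_qs

# A made-up login request as it would travel over plain HTTP (headers trimmed).
packet = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: bank.example\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"\r\n"
    b"email=me%40example.com&password=hunter2"
)

def sniff_credentials(raw_packet: bytes) -> dict:
    """Read the form fields straight out of the unencrypted packet."""
    _, _, body = raw_packet.partition(b"\r\n\r\n")
    fields = parse_qs(body.decode("ascii"))
    return {name: values[0] for name, values in fields.items()}

print(sniff_credentials(packet))
```

No cracking is involved here. The data is simply sitting there in readable form.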
What is HTTP(S)?
HTTPS adds security (the "S") to regular HTTP. It does so by introducing end-to-end (e2e) encryption between you and the computer that you are talking to. If your message is locked in a secure box and only the recipient has a key to open it, then you don't care who is carrying it. Anyone who tries to read it along the way will only see a locked box and cannot do anything with it.
This idea revolutionised the internet. It is what allowed application types like health, banking, etc. to even exist, because without this transit security anyone could easily steal anybody's information.
There is only one problem. If you lock your message with a key, how will the server know the key to unlock your message? And if, on the other hand, the server made its key public, then we have achieved nothing, as anyone who captured your locked message could use the key published by the website to unlock your message!
Symmetric vs asymmetric keys
Symmetric keys are keys used for both locking and unlocking. They are like your home keys: anyone with a copy can lock / unlock your door.
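A tiny sketch of "one key for both directions", using XOR with a repeating key. This is a toy chosen for brevity and is not secure; real systems use vetted ciphers such as AES. The key and message are invented for the example.

```python
from itertools import cycle

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Toy symmetric 'lock': XOR every byte with the repeating key.

    Applying the same function with the same key again undoes it.
    This is NOT secure; it only illustrates one key working both ways.
    """
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

shared_key = b"house-key"        # hypothetical key both sides already hold
message = b"cat videos at 3 am"

locked = xor_with_key(message, shared_key)
unlocked = xor_with_key(locked, shared_key)

print(locked != message)    # the locked form differs from the original
print(unlocked == message)  # the same key opens it again
```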
Asymmetric keys, on the other hand, are quite different: you have a set of two keys, a locking key and an unlocking key. You can lock your door with the locking key, but you can only unlock it with the unlocking key. This means that if you have only the locking key you can lock stuff up, but only a person with the unlocking key can unlock it.
In the computer world these are called public and private keys.
Now, in order to communicate securely, you can use this concept as follows:
- You and the server both have public and private keys.
- You share your public (locking) keys with each other.
- When you want to send something, you lock it with the server's locking key.
- The server opens the message with its private (unlocking) key.
- The server prepares its response and locks it with your locking key.
- Because you are the only one with the unlocking key, only you can unlock the server's response.
In this scheme the locking (public) keys can be shared on the internet without any problem, as a hacker cannot use them to unlock the message 😂.
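The locking / unlocking idea can be sketched with textbook RSA and tiny numbers. This is an illustration only: the primes are far too small to be secure, real implementations add padding and use vetted libraries, and the `lock` / `unlock` names just mirror the wording of this article.

```python
# Textbook RSA with tiny primes: far too small to be secure, illustration only.
p, q = 61, 53
n = p * q                    # the modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                       # public (locking) exponent
d = pow(e, -1, phi)          # private (unlocking) exponent, needs Python 3.8+

def lock(message: int, public_key: tuple) -> int:
    """Encrypt a number with the public (locking) key."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)

def unlock(ciphertext: int, private_key: tuple) -> int:
    """Decrypt a number with the private (unlocking) key."""
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)

server_public, server_private = (e, n), (d, n)

message = 65                 # any number smaller than n
locked = lock(message, server_public)

print(locked != message)                          # the message is hidden
print(unlock(locked, server_private) == message)  # the private key opens it
print(unlock(locked, server_public) == message)   # the public key does not
```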
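To be honest, it is more complicated than that, as HTTPS uses both asymmetric and symmetric encryption to work. Roughly, the asymmetric keys are used at the start of a connection to authenticate the server and agree on a shared secret, and the much faster symmetric encryption then protects the rest of the conversation. If you want to learn more you can find details online.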
How does HTTPS work?
Asymmetric keys guarantee end-to-end encryption between you and the entity that you are talking to. But what guarantees that you are talking to the right entity and not an imposter?
Remember the DNS server that your ISP or local company might have? What if someone broke into it and changed the directory mapping for "cats.com" from 123.45.67.89 to a machine that the attacker controls, like 79.10.200.23? In this case your end-to-end communication will still be protected (no one can eavesdrop on it), but you might be talking to an imposter. If it is a bank website and you are sending it your email + password, these can easily be stolen by the imposter and later used to break into your bank account. So how can we guarantee that this never happens?!
The only way is for a third party to keep track of websites and authenticate the public information that they provide. When a server hands out its public key along with the website information, this information needs to be validated before the communication can actually happen.
First, the website needs to have an SSL certificate issued by a publicly trusted certificate authority. An SSL certificate is a lot like a passport — but it’s for websites, not people. An SSL certificate includes details such as:
- The website’s URL(s),
- A public key (which is linked to a private key only possessed by the website),
- The certificate authority that issued the certificate,
- The certificate’s expiration date, and
- The legal organization that runs the website (optional).
To get a valid SSL certificate, the website owner will have to go through a few steps:
- Generate a public key and a private key (more on how they’re used later).
- Go through a specific process to prove to the certificate authority that they’re the actual owner of the website.
- In the case of some certificates, the website owner also has to prove that they’re an actual, legally registered organization.
Once those steps are completed, the certificate authority issues an SSL certificate to the website owner. This certificate is installed on the web server and is automatically provided every time someone visits the website via an https:// URL.
Certificate Authorities : en.wikipedia.org/wiki/Certificate_authority
How does a browser trust a Certificate Authority that produces certificates? Each browser maintains a list of trusted Certificate Authorities (CAs) along with their public keys. When a user visits a website whose certificate was issued by one of these CAs, the browser uses that CA's public key to check the CA's signature on the certificate and so confirm that the information in it is genuine.
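Here is a rough sketch of that check, reusing toy textbook RSA numbers. Everything in it is an assumption made for illustration: the CA name, the certificate text, and the tiny key are invented, and real certificates use a standard format (X.509) with far larger keys and many more checks. The point is only that the CA signs with its private key and the browser verifies with the CA's public key from its trusted list.

```python
import hashlib

# Toy CA key pair (textbook RSA with tiny primes, illustration only).
CA_N = 61 * 53
CA_PUBLIC_E = 17
CA_PRIVATE_D = 2753          # kept secret by the CA

def digest(cert_text: str) -> int:
    """Hash the certificate contents and shrink the hash to fit the toy key."""
    raw = hashlib.sha256(cert_text.encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % CA_N

def ca_sign(cert_text: str) -> int:
    """Done by the CA: transform the digest with its private key."""
    return pow(digest(cert_text), CA_PRIVATE_D, CA_N)

def browser_verify(cert_text: str, signature: int,
                   trusted_cas: dict, issuer: str) -> bool:
    """Done by the browser: undo the signature with the CA's public key."""
    if issuer not in trusted_cas:
        return False         # issuer is not in the trusted list at all
    exponent, modulus = trusted_cas[issuer]
    return pow(signature, exponent, modulus) == digest(cert_text)

trusted_cas = {"Toy CA": (CA_PUBLIC_E, CA_N)}   # hypothetical trusted list
cert = "site=cats.com;public_key=(17,3233);expires=2030-01-01"
signature = ca_sign(cert)

print(browser_verify(cert, signature, trusted_cas, "Toy CA"))
print(browser_verify(cert, signature, trusted_cas, "Unknown CA"))
```

Note that the browser accepts a valid signature from any CA on its list, which is exactly what the next section is about.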
The weakest link problem with HTTPS
Because any one of the trusted CAs (there can be dozens of them) can issue and sign a certificate for any website, it only takes one of the Certificate Authorities being hacked for the whole HTTPS security model to fall apart. Anyone can use the hacked CA to issue a fake certificate for Facebook / Google / your bank, and because the browser sees that it was issued by one of the trusted CAs, it will continue to allow the user to communicate with this server. This is why CAs have to go through rigorous security checks and are audited regularly to make sure they don't have breaches and don't sell fake certificates. It only takes one of them to be hacked and you can no longer trust any website that you visit on the internet. This has happened before, multiple times.
Are we really safe?
History has taught us that a defender has to keep his eyes open 24/7, 365 days a year, while an attacker only has to get lucky once. Whether this has already happened, or how soon it will happen again, we do not know. Question everything and always have second thoughts 😉.