Virtualization for dummies

What are the types of virtualization? What do they correspond to in practice?

Are you lost in virtualization jargon?

I’ll explain.

Level zero of virtualization: no virtualization.

When we talk about virtualization, we must first define what a non-virtualized environment is. This was still the norm a few years ago.

Basically, every physical machine/server has one operating system (Windows, Linux or other), with applications running on that system.

The physical machine:

Is installed in a server room, in a rack
Consumes electricity directly
Heats up and must therefore be cooled
Must be physically cabled by an operator
Has a lifespan limited to that of its hardware
Is physically connected to one or more networks
Wastes any unused resource, because hardware wears out even when it is not used

The operating system:

Is installed directly on the physical machine
Has a lifespan less than or equal to that of the hardware
Must be installed on each machine (automatically or manually)
Uses all the resources of the physical machine
Cannot obtain additional resources without physical intervention on the host machine

The application:

Runs directly on the operating system
Must share the resources (CPU, RAM, network) left free by the operating system with other applications
Can conflict with other applications over shared resources (ports, files, others)
Depends on libraries installed on the system
Has dependencies that can conflict with those of other applications or of the operating system
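
To make the port conflict concrete, here is a minimal sketch in Python. The helper name try_bind is mine, purely for illustration; it plays the role of two applications on the same system asking for the same TCP port.

```python
import socket
from typing import Optional


def try_bind(port: int) -> Optional[socket.socket]:
    """Try to listen on 127.0.0.1:port; return None if the port is already taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen()
        return sock
    except OSError:
        # Typically "address already in use": another socket owns this port.
        sock.close()
        return None


# "Application 1" asks the OS for any free port (port 0 means "pick one for me").
first = try_bind(0)
port = first.getsockname()[1]

# "Application 2" wants the very same port on the same system.
second = try_bind(port)
port_conflict = second is None
print(f"port {port}: second application refused = {port_conflict}")

first.close()
```

On a shared operating system, the second application has no choice but to use another port or wait for the first one to stop.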

In this mode of operation, each server is unique. Usually, they are managed in small numbers by system administrators who watch over each of them, give them little names, etc… Some teams manage them in larger numbers, but the approach remains the same. We manage servers like pets.

Level 1 virtualization: Hypervisors

Adding physical servers is heavily constrained, especially by budget, and their lifecycle and resources are poorly optimized. This pushed system teams to change their approach.

The hypervisor was born from this desire to pool resources. There are many of them; the best known include VMware ESX (with or without an “i” at the end), VirtualBox, Proxmox and Xen.

Let’s take our previous diagram:

On this diagram, we still have the physical machine (there always will be one, whatever happens: IT is not magic, even in “serverless”), on which a hypervisor is installed in place of the operating system from the previous diagram.

I’m going to spoil the rest: the hypervisor is itself an operating system (at least for bare-metal hypervisors such as ESXi; VirtualBox instead runs as an application on top of an existing system). However, it has several particularities. It is built to emulate physical machines. In other words, it runs programs that are supposed to offer operating systems the same functionalities as physical machines would.

The goal of these programs is to make the operating systems we install on them believe they are running on physical machines, with their own file systems, processors, RAM, network, etc…

The hypervisor can emulate machines as well as the network infrastructure that goes with them. Technically, on a hypervisor, we can build a complete information system with network switches, servers, and applications installed on these servers. All of this is remarkably efficient.

By connecting several hypervisors, we can create substantial infrastructures at little cost. For 450€/month, I at one point ran between 55 and 70 virtual machines across 3 OVH physical servers, each VM with its own operating system and applications.
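
As a quick back-of-the-envelope check on those figures, here is a small Python sketch. It only divides the numbers quoted above (450€/month, 3 servers, 55 to 70 VMs); the function name cost_per_vm is mine.

```python
def cost_per_vm(monthly_cost_eur: float, vm_count: int) -> float:
    """Monthly cost of one VM if the bill is split evenly across all VMs."""
    return monthly_cost_eur / vm_count


MONTHLY_COST_EUR = 450
SERVERS = 3

for vm_count in (55, 70):
    per_vm = cost_per_vm(MONTHLY_COST_EUR, vm_count)
    per_server = vm_count / SERVERS
    print(f"{vm_count} VMs: {per_vm:.2f} EUR per VM per month, "
          f"about {per_server:.0f} VMs per physical server")
```

That is roughly 6 to 8€ per machine per month, hardware included.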

VMs, virtual machines, work as follows:

The virtualized machine looks a lot like a machine installed without virtualization and behaves the same way. The only difference, from its point of view, is that it has a few extra applications and drivers installed to optimize its operation in a virtualized environment.

The advantages of a hypervisor are:

Virtual machines are no longer tied to the physical host: we can move, back up, duplicate or destroy them without impacting the physical machine.
On certain hypervisors, machines can be moved from one hypervisor to another while running (live migration). (In my tests, only a single ping was lost during the migration.)
We pool available resources (overcommitment): we provision more virtual resources than are physically available. We assume that machines don’t all consume 100% of their resources 100% of the time; each has its activity peaks, which are not necessarily simultaneous. Memory ballooning is one of the mechanisms hypervisors use to reclaim memory that a guest is not using.
It’s not necessary to physically install a server for each virtual machine. Installations are done from templates (pre-installed images) and can therefore be done remotely.
Pooled resources and the ability to move machines help limit downtime: virtual machines that were running on a crashed hypervisor can be automatically restarted on its neighbors. A physical machine that fails no longer necessarily means a lost service.
Simpler management also means that not every critical machine needs a duplicate, which saves precious hardware and money.
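
The overcommitment bet described above can be sketched in a few lines of Python. Every number and name here is hypothetical (a 64 GB host and three made-up VMs); it is a toy model of the reasoning, not how a real hypervisor accounts for memory.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    name: str
    provisioned_ram_gb: int  # what the VM believes it owns
    idle_ram_gb: int         # what it really uses most of the day
    peak_ram_gb: int         # what it really uses during its peak
    peak_hour: int           # hour of the day (0-23) of that peak


def overcommit_ratio(vms: List[VM], physical_ram_gb: int) -> float:
    """Provisioned virtual RAM divided by physical RAM (> 1 means overcommitted)."""
    return sum(vm.provisioned_ram_gb for vm in vms) / physical_ram_gb


def demand_at(vms: List[VM], hour: int) -> int:
    """Real RAM demand at a given hour, with each VM peaking only at its own hour."""
    return sum(vm.peak_ram_gb if vm.peak_hour == hour else vm.idle_ram_gb for vm in vms)


def worst_case_demand(vms: List[VM]) -> int:
    """Highest real demand over the 24 hours of the day."""
    return max(demand_at(vms, hour) for hour in range(24))


PHYSICAL_RAM_GB = 64  # hypothetical host
vms = [
    VM("web", provisioned_ram_gb=32, idle_ram_gb=8, peak_ram_gb=28, peak_hour=14),
    VM("batch", provisioned_ram_gb=32, idle_ram_gb=4, peak_ram_gb=30, peak_hour=2),
    VM("db", provisioned_ram_gb=32, idle_ram_gb=12, peak_ram_gb=24, peak_hour=10),
]

ratio = overcommit_ratio(vms, PHYSICAL_RAM_GB)
worst = worst_case_demand(vms)
print(f"overcommit ratio: {ratio:.2f}, worst real demand: {worst} GB of {PHYSICAL_RAM_GB} GB")
```

Here 96 GB are promised on a 64 GB host, yet real demand never exceeds 50 GB because the peaks fall at different hours. If the peaks coincided, the bet would fail, which is why overcommitment has to be monitored.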

Thanks to all these advantages, the door is open to standardized server management. An important DevOps concept in recent years is “pets vs. cattle”.

On one side, the pet is a server, physical or virtual, that is managed with care, has a name, and is repaired by its admin(s) when it breaks.

On the other, cattle: each server is standardized, and automated installation procedures let us manage it like a head of cattle. It no longer gets individual care. If it breaks, it’s replaced.

Now, the disadvantages of a hypervisor:

An additional layer sits between the physical machine and the VM’s operating system, so fewer resources are available in total because of what the hypervisor itself consumes.
Access to certain physical resources becomes harder: specific physical cards, the administration console (which comes naturally on a physical machine, since it appears on the screen connected to it), USB ports or GPUs. None of this is insurmountable, but it becomes non-trivial.
The hypervisor, although often accompanied by a simple-to-use graphical interface, remains a very complex environment. I once lost a cluster of 3 hypervisors (with the 70 machines mentioned above) for a week because of a network bug and a ghost network card that was not listed. Fortunately, this is rare (once in 6 years of hypervisor management).

Higher level: container virtualization

Last point of this post, containers.

The best known container management system is undoubtedly Docker. (See my post Why no new project should see the light of day without docker)

How does it work and how do we use it?

Container virtualization creates a kind of virtual machine, much lighter than a classic virtual machine, because it uses the kernel of the operating system on which it runs. It is therefore dependent on that kernel, but in return it only carries the bare minimum, hence its lightness.

We install a daemon/service on the operating system, such as dockerd, and this daemon creates containers isolated from each other, each with its own environment and file system. The daemon also sets up a virtual network between the containers, the host and the outside.

Container isolation relies on features implemented directly in the operating system kernel (on Linux, mainly namespaces and cgroups).
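
On Linux, the kernel features in question include namespaces, and any process can see which ones it belongs to under /proc. The sketch below is a small Python illustration; the function name list_namespaces is mine, and it simply returns an empty result on systems without /proc (macOS or Windows, for example).

```python
import os
from typing import Dict


def list_namespaces(pid: str = "self") -> Dict[str, str]:
    """Map each namespace type (net, mnt, pid, ...) of a process to its identifier."""
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):
        # Not on Linux, or the process does not exist.
        return {}
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink whose target looks like "net:[4026531992]".
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry not readable with our permissions
    return result


namespaces = list_namespaces()
for name, ident in namespaces.items():
    print(f"{name:20s} {ident}")
```

Run on the host and then inside a container, this should typically print different identifiers for most namespaces: same kernel, but separate views of the network, the processes and the mount points.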

The huge advantage is that the application doesn’t run, functionally, on the host machine, although it uses its resources. It doesn’t depend on the libraries installed on the system, and its own dependencies won’t affect the system. (To go deeper, again, see my post Why no new project should see the light of day without docker)

This light form of virtualization can be combined with the heavier, hypervisor-based kind: we can start containers in a virtual machine just as well as on a physical machine.

The next level: Kubernetes, Swarm, etc…

The next step in container management consists of organizing them without worrying about the underlying infrastructure. The principle is to run them on host machines that are standardized down to their installed applications, whose only purpose is to run containers, and that are managed in a uniform way by a control tower. The control tower receives our resource allocation requests and automatically chooses where to run the applications we need.
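
To give an idea of what “automatically chooses where to run” means, here is a toy placement function in Python. It is a deliberately naive sketch (pick the node with the most free RAM that can fit the request); the Node class, the schedule function and all numbers are hypothetical, and real schedulers such as the one in Kubernetes weigh many more criteria.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_cpu: float
    free_ram_gb: float
    containers: List[str] = field(default_factory=list)


def schedule(nodes: List[Node], container: str, cpu: float, ram_gb: float) -> Optional[Node]:
    """Place a container on the fitting node with the most free RAM, or return None."""
    candidates = [n for n in nodes if n.free_cpu >= cpu and n.free_ram_gb >= ram_gb]
    if not candidates:
        return None  # no node can host this request
    best = max(candidates, key=lambda n: (n.free_ram_gb, n.free_cpu))
    # Reserve the resources so the next request sees the updated capacity.
    best.free_cpu -= cpu
    best.free_ram_gb -= ram_gb
    best.containers.append(container)
    return best


nodes = [Node("node-a", free_cpu=4, free_ram_gb=8), Node("node-b", free_cpu=2, free_ram_gb=16)]

api_node = schedule(nodes, "api", cpu=1, ram_gb=2)        # both fit, node-b has more RAM
worker_node = schedule(nodes, "worker", cpu=2, ram_gb=4)  # node-b has only 1 CPU left
big_node = schedule(nodes, "big", cpu=8, ram_gb=4)        # nobody has 8 CPUs

for label, node in (("api", api_node), ("worker", worker_node), ("big", big_node)):
    print(label, "->", node.name if node else "unschedulable")
```

We only say what we need; the placement decision belongs to the control tower.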

A Kubernetes worker node, the machine that runs the containers with our applications in k8s, can be schematically represented like this:

It’s a normal machine on which we install the Kubernetes node components (notably the kubelet agent). It keeps its control tower informed of available resources and running containers, and responds to resource requests. It takes orders from the control tower and runs the requested containers on the machine.

Swarm and Mesos work according to an equivalent scheme. We interact with the masters (the distributed control tower), and the system takes care of deploying applications where it sees fit, with the desired number of instances, and provides a simple way to reach them.
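
The “desired number of instances” idea boils down to comparing what we asked for with what is actually running, and correcting the difference. Here is a minimal Python sketch of that comparison; the function name reconcile and the application names are hypothetical, and a real orchestrator runs this kind of loop continuously against live cluster state.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, int], running: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Return the (action, application, count) steps needed to reach the desired state."""
    actions = []
    for app in sorted(set(desired) | set(running)):
        want = desired.get(app, 0)
        have = running.get(app, 0)
        if have < want:
            actions.append(("start", app, want - have))
        elif have > want:
            actions.append(("stop", app, have - want))
    return actions


desired = {"web": 3, "api": 2}
running = {"web": 1, "api": 2, "old-job": 1}

actions = reconcile(desired, running)
for action, app, count in actions:
    print(f"{action} {count} x {app}")
```

If a node dies and takes two “web” instances with it, the next pass of the same comparison simply asks for two more to be started elsewhere.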

It’s quite impressive as a concept, but it works.

So, we deploy applications “somewhere” and… that’s it!

I’ll come back to Kubernetes in a future post to explain this in more detail.

And the cloud in all this?

Today, cloud providers’ offerings often mix these approaches.

If we take the example of Amazon Web Services (AWS):

EC2: virtual machines on a full (heavy) hypervisor
ECS: standalone Docker containerization, a simple solution for running a few containers
EKS: hosted Kubernetes solution
Lambda: we just provide the application we want to run and Amazon manages it for us. This relies on the solutions seen previously

We realize that the technologies and concepts behind provider platforms are complex and intertwined, and that it’s difficult to classify them. Overall, the more knowledge you have, the more you can afford to use low-level services, which will be cheaper.

Cloud providers, given their volumes, tend to develop their own systems, like Amazon with Nitro, their new hypervisor, which promises to run virtual machines as fast as if they were physical machines.

In conclusion

Virtualization has existed for a long time and is widely used. Today, however, you need to know how to pick the right level for your applications. What level of detail and mastery do you expect? Is it profitable for your company, your project?

Today, everyone wants to move to containers because it’s the trend. But they are not the solution to everything.

The ecosystem is rich. Use it according to your needs and your skills.