Today we all need storage space and we get it in all kinds of flavors: local disks, file servers on the local network, cloud storage, etc. In this post I'll show you how you can quickly create a basic storage infrastructure for your data.
As you can see in the diagram, the topology consists of 2 clients, 2 switches and 2 storage servers. All the components in the diagram are KVM virtual machines running on a single host, so you can easily recreate this setup on your own workstation. The clients need to be able to access the files on the storage servers as they would on their local drives. To do that we can export file system paths on the storage servers over protocols such as NFS or SMB/CIFS. NFS is my favorite, so that's what I'll use in this example. The clients will be basic Debian VMs running the NFS client.
Hardware always fails, so in order for our system to stay available we need redundant components. The first storage server will act as the primary server, which all the clients access directly. Data on the primary server will be replicated to the second one, which acts as a slave (standby). For the storage file systems we will use ZFS, which is both a file system and a logical volume manager. ZFS performs data integrity checks and automatic repairs, provides various software RAID levels, and offers many other great features such as snapshots, compression, deduplication and replication. In addition, ZFS is a 128-bit file system, so it can address huge amounts of storage. Unfortunately ZFS is licensed under Sun's CDDL, which is not compatible with the Linux kernel license, GPLv2. Work is currently being done to port ZFS to the Linux kernel (ZFS on Linux). In our topology the storage servers will run ZFS on FreeBSD 9.1.
The network infrastructure is very important when we need redundant systems. Each of the storage servers will have 2 NICs grouped in a bridge and connected to 2 different switches. In order to prevent bridge loops we need a loop prevention mechanism such as STP. In our environment we will use plain STP (802.1d), but be aware that in a production network you will want other flavors such as RSTP for reduced convergence time. The switches will run Debian as the OS, with Open vSwitch providing the virtual switch.
We can now proceed and start creating our infrastructure. I will be running Debian jessie, the current testing release, on both the host machine and the virtual machines. For the VM connectivity on the hypervisor I will be using Linux bridges. The other tools we need for VM management are libvirt and virt-install, or virt-manager, which provides a GUI.
Installing the required packages for VM management:
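Something along these lines should do it on a jessie host (package names may differ slightly on other releases):

    apt-get install qemu-kvm libvirt-bin virtinst virt-manager bridge-utils qemu-utils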
Now let's create a bridge for each of the VM links, plus an additional one used for out-of-band management. eth0 is my public interface, so I will use it for the management bridge.
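Here is the general pattern I use; the bridge names below are just examples and the exact number of link bridges depends on how many links you count in the diagram. Keep in mind that the host's public IP address has to move from eth0 to the management bridge, otherwise you lose connectivity to the host.

    # out-of-band management bridge on top of the public interface
    brctl addbr br-mgmt
    brctl addif br-mgmt eth0
    ip link set br-mgmt up

    # one isolated bridge per link in the diagram (names are examples)
    for br in br-vm1sw1 br-vm2sw2 br-sw1sw2 br-st1sw1 br-st1sw2 br-st2sw1 br-st2sw2; do
        brctl addbr $br
        ip link set $br up
    done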
The next thing to do is to create the virtual drives for the VMs. I will use qcow2 files as they provide copy-on-write support and snapshots. You can use the qemu-img tool to create the files. Each of the VMs will be assigned one 10GB virtual drive for the OS. The storage servers will have 8 additional 260TB disks used for storage.
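For example (the paths and file names are my own convention):

    # 10GB OS drive for each VM, thin-provisioned qcow2
    qemu-img create -f qcow2 /var/lib/libvirt/images/storage1-os.qcow2 10G

    # 8 x 260TB data drives per storage server; qcow2 files are sparse,
    # so they take almost no space on the host until data is written
    for i in $(seq 1 8); do
        qemu-img create -f qcow2 /var/lib/libvirt/images/storage1-data$i.qcow2 260T
    done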
Now we are ready to launch the installers. I prefer virt-install, a CLI tool for VM installations, but you can also use virt-manager, which provides a nice GUI. I will do a net install for the Debian VMs and run the installer from an attached CD-ROM for FreeBSD. You can use different drivers for the I/O devices: for the Linux machines I will use paravirtualized virtio drivers since they offer better performance, while FreeBSD doesn't include native virtio support, so I will use SCSI disks and Intel e1000 NICs for it. You can even select the CPU model and which CPU features are exposed to the VM. The extra-args option passes priority=low to the kernel command line, which puts the Debian installer into expert mode.
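The exact invocations depend on your bridge names, image paths and mirror URLs, but they look roughly like this:

    # Debian net install, virtio disk and NIC, text install on the serial console
    virt-install --name vm1 --ram 1024 --vcpus 1 \
        --disk path=/var/lib/libvirt/images/vm1-os.qcow2,format=qcow2,bus=virtio \
        --network bridge=br-vm1sw1,model=virtio \
        --location http://ftp.debian.org/debian/dists/jessie/main/installer-amd64/ \
        --extra-args "priority=low console=ttyS0" \
        --graphics none

    # FreeBSD install from the ISO, SCSI disks and e1000 NICs
    # (repeat the --disk line for the remaining data drives)
    virt-install --name storage1 --ram 2048 --vcpus 2 \
        --disk path=/var/lib/libvirt/images/storage1-os.qcow2,format=qcow2,bus=scsi \
        --disk path=/var/lib/libvirt/images/storage1-data1.qcow2,format=qcow2,bus=scsi \
        --network bridge=br-st1sw1,model=e1000 \
        --network bridge=br-st1sw2,model=e1000 \
        --cdrom /var/lib/libvirt/images/FreeBSD-9.1-RELEASE-amd64-disc1.iso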
The installers should be pretty straightforward. Once they complete we can move forward. We first need to set the IP addresses for the network interfaces and install the required packages. Please note that you need to edit /etc/network/interfaces to make the IP addresses persistent.
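On a Debian machine that looks something like this (the addresses are examples):

    # /etc/network/interfaces on vm1
    auto eth0
    iface eth0 inet static
        address 10.10.10.11
        netmask 255.255.255.0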
Quick tip: when you debug network connectivity issues, always check the link status of the interfaces (physical, virtual, bridges, etc.), and install tcpdump right alongside your favorite text editor.
NFS Clients:
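On the clients we only need the NFS client utilities (assuming Debian clients):

    apt-get install nfs-common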
Now let's configure the switches/bridges. One important thing we need to take care of is the STP root bridge of our topology. We need to make sure that one of the switches becomes the root bridge, otherwise all the traffic will be forwarded through the storage servers' bridges, and that is not their job.
Open vSwitch:
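A rough sketch of the switch configuration; the interface names and the bridge priority value are assumptions, so adapt them to your setup. The lower priority on switch1 is what makes it win the root bridge election.

    apt-get install openvswitch-switch

    # create the virtual switch and plug in the links to the other nodes
    ovs-vsctl add-br br0
    ovs-vsctl add-port br0 eth1
    ovs-vsctl add-port br0 eth2
    ovs-vsctl add-port br0 eth3

    # enable 802.1d STP; on switch1 lower the priority so it becomes the root bridge
    ovs-vsctl set bridge br0 stp_enable=true
    ovs-vsctl set bridge br0 other_config:stp-priority=0x1000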
Storage machines networking:
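On FreeBSD the two NICs go into an if_bridge with STP enabled on both members. A minimal sketch for /etc/rc.conf, assuming the NICs show up as em0 and em1 and using an example address:

    cloned_interfaces="bridge0"
    ifconfig_bridge0="addm em0 stp em0 addm em1 stp em1 inet 10.10.10.1/24 up"
    ifconfig_em0="up"
    ifconfig_em1="up"

After a service netif restart (or a reboot), ifconfig bridge0 shows the STP state of each member port.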
Notice that one of the physical interfaces is in the forwarding state while the other one is discarding. The same settings need to be applied on the other machines, apart from IP addresses, hostnames and everything else that has to be unique. At this point the network infrastructure is ready and basic connectivity is ensured for all the nodes.
We can go ahead and do the storage configuration. As mentioned before, storage1 will be the primary storage machine, so we'll start with it.
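On FreeBSD you can list the attached devices with camcontrol:

    camcontrol devlist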
The list shows that we have one DVD-ROM (the installer) and 9 drives (1 for the OS and 8 for storage) attached.
ZFS:
Let's create the ZFS storage pool. The pool will be made up of 4 mirrors of 2 x 260TB drives each. ZFS mirrors are similar to RAID1. All the mirrors are striped, so the resulting pool has RAID10-like fault tolerance. The total capacity of the pool will be roughly 1 petabyte :)
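Assuming the OS lives on da0 and the eight data drives show up as da1 through da8, the pool (named storage, which is how I'll refer to it from here on) is created like this:

    zpool create storage \
        mirror da1 da2 \
        mirror da3 da4 \
        mirror da5 da6 \
        mirror da7 da8

    zpool status storage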
We can now create datasets inside the storage pool and set specific attributes on them. For instance, let's create a dataset that has compression enabled:
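The dataset name and the compression algorithm below are just examples:

    zfs create -o compression=lzjb storage/compressed
    zfs get compression storage/compressed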
Now let's actually see the benefits:
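A quick test is to drop some plain-text data into the dataset and compare the file size with what ZFS actually stores on disk:

    # copy some easily compressible text data
    cp /usr/share/dict/words /storage/compressed/

    # logical size vs. on-disk size, plus the ratio ZFS reports
    ls -lh /storage/compressed/words
    du -h /storage/compressed/words
    zfs get compressratio storage/compressed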
Let's say we need storage space for users. We can create a parent dataset for that and give each user a child dataset with a quota:
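For example (the user names and quota sizes are made up):

    zfs create storage/users
    zfs create -o quota=10G storage/users/alice
    zfs create -o quota=10G storage/users/bob
    zfs list -r storage/users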
Snapshots:
ZFS provides snapshot functionality; a snapshot is an image of the file system at the moment you take it. Let's create a new dataset called set, create some random files inside it and then snapshot it, calling the snapshot snap1.
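Something like this:

    zfs create storage/set

    # a couple of throw-away test files
    dd if=/dev/random of=/storage/set/file1 bs=1M count=100
    dd if=/dev/random of=/storage/set/file2 bs=1M count=100

    zfs snapshot storage/set@snap1
    zfs list -t snapshot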
How can we replicate all the data to the other storage machine? ZFS lets you transfer snapshots to another machine by piping zfs send into zfs receive, which is pretty awesome. The receiving end will extract the snapshot and recreate the file system.
We have an empty storage pool called storage on storage2, where we'll transfer the snap1 snapshot we have just created. We'll use netcat for raw TCP piping, but you can also use ssh.
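The port number is arbitrary, and storage2 has to be resolvable (or just use its IP address):

    # on storage2: listen and receive the stream into the local pool
    nc -l 8023 | zfs receive storage/set

    # on storage1: send the snapshot over raw TCP
    zfs send storage/set@snap1 | nc storage2 8023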
You can also send incremental data using zfs send. We’ll create an additional file in /storage/set, create a snapshot and transfer only the differences between the second and the first snapshot. We’ll use ssh this time:
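The incremental run looks roughly like this, using the same example names as before:

    # on storage1: new data, second snapshot
    dd if=/dev/random of=/storage/set/file3 bs=1M count=50
    zfs snapshot storage/set@snap2

    # send only the delta between snap1 and snap2
    zfs send -i storage/set@snap1 storage/set@snap2 | ssh storage2 zfs receive storage/set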
What happens if we delete a file?
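As long as a snapshot referencing it exists, nothing is lost: the old version is still reachable through the hidden .zfs/snapshot directory, and you can copy it back or roll the whole dataset back. A quick sketch, using the example files from above:

    rm /storage/set/file1

    # the deleted file is still there in the snapshot...
    ls /storage/set/.zfs/snapshot/snap1/

    # ...so it can be copied back, or the dataset rolled back entirely
    cp /storage/set/.zfs/snapshot/snap1/file1 /storage/set/
    # zfs rollback storage/set@snap1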
Using the snapshot and zfs send/receive features we could easily set up a cron job that does periodic incremental transfers, keeping all the data on storage1 replicated to storage2.
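A very rough sketch of such a job; the script, snapshot naming and schedule are all hypothetical, and there is no error handling:

    #!/bin/sh
    # /root/replicate.sh - take a new snapshot and send the delta to storage2
    DS=storage/set
    prev=$(zfs list -H -t snapshot -o name -s creation | grep "^$DS@" | tail -1 | cut -d@ -f2)
    new="auto-$(date +%Y%m%d%H%M)"
    zfs snapshot $DS@$new
    zfs send -i $DS@$prev $DS@$new | ssh storage2 zfs receive -F $DS

    # /etc/crontab entry on storage1: replicate every hour
    0 * * * * root /root/replicate.sh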
Now that we have all the storage prepared, let's mount it on the clients over NFS. On the storage server prepare the /etc/exports file:
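A minimal FreeBSD-style export, assuming the clients live in 10.10.10.0/24 and we export the storage/set dataset:

    # /etc/exports on storage1
    /storage/set -alldirs -maproot=root -network 10.10.10.0 -mask 255.255.255.0

    # /etc/rc.conf: enable the NFS server, then reboot or run 'service nfsd start'
    rpcbind_enable="YES"
    nfs_server_enable="YES"
    mountd_enable="YES"

On the client side the export is mounted the usual way:

    mount -t nfs storage1:/storage/set /mnt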
Now let's do a failover test of the bridged interfaces on the storage machine. We'll start writing a random file using dd to vm1:/mnt, shut down the forwarding interface on storage1 and see what happens.
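The test itself is nothing fancy (the interface names are the ones assumed earlier):

    # on vm1: start a long write to the NFS mount
    dd if=/dev/urandom of=/mnt/failover-test bs=1M count=2048

    # on storage1, while the dd is running: kill the forwarding member of the bridge
    ifconfig em0 down

    # watch STP move the standby port from discarding to forwarding
    ifconfig bridge0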
The result: the transfer was very slow, but the data on the remote storage is uncorrupted.
I hope you enjoyed this long tutorial :) Let me know if you have any questions or other observations and I'll be more than happy to answer.