
Linux: Storage

Basics

Let's learn how to work with Linux storage and partitions. Linux classifies devices as either block devices or character devices. Block devices, such as HDDs and SSDs, read and write data in fixed-size blocks. Character devices, such as keyboards, mice, and serial ports, read and write data as a stream.

File system

A file system is a data structure used by the operating system to store, retrieve, and manage files and directories. It also maintains the metadata of the files and directories.

Linux supports different file systems:

  • FAT: File Allocation Table. Compatible with many operating systems
  • ext2: A Linux native file system
  • ext3: Adds journaling to ext2, which makes recovery after a crash much faster
  • ext4: Supports volumes of up to 1 exabyte and a file size of up to 16 terabytes
  • BTRFS: Supports volumes of up to 16 exabytes and up to 18 quintillion files in each volume
  • XFS: A 64-bit, high-performance journaling file system.

Other file systems function as network protocols:

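  • SMB: Server Message Block allows sharing data over a network.
  • CIFS: Common Internet File System
  • NFS: Network File System
  • VFS: Virtual File System. Strictly speaking, this is not a network protocol but the kernel layer that gives applications one common interface to every mounted file system, local or remote.

NFS and SMB are different protocols and are not compatible with each other: an NFS client cannot mount an SMB share, and vice versa.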

Index Node (Inode)

An inode (index node) stores metadata about a file or directory on a file system, such as its owner, permissions, size, timestamps, and the location of its data blocks. It does not include the file name or contents. Every file and every directory requires one inode. Use the command stat <filename> to view a file's inode information. We can run out of inodes before we run out of disk space; when that happens, no new files can be created on the disk even though free space remains. Inode exhaustion typically occurs when a file system holds a very large number of small files.

df -i to view inode usage on Linux.

ls -i to view the inode number of each file in a directory.

It is important to monitor inode usage.
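To see where the df -i numbers come from, here is a minimal Python sketch (assuming Python 3 on Linux; inode_usage is a hypothetical helper name) that reads the same inode counters for one file system through os.statvfs:

```python
import os


def inode_usage(path: str = "/") -> dict:
    """Inode totals for the file system that contains `path`, similar to one row of `df -i`."""
    st = os.statvfs(path)
    total = st.f_files   # total number of inodes
    free = st.f_ffree    # inodes still available
    used = total - free
    # File systems that allocate inodes dynamically (e.g. Btrfs) may report 0 total inodes.
    use_pct = round(100 * used / total, 1) if total else 0.0
    return {"total": total, "used": used, "free": free, "use_pct": use_pct}


print(inode_usage("/"))
```

A use_pct close to 100 means new files will soon fail to be created, even if df -h still shows free space.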

Partitions

A partition is a section of a drive that acts as a separate drive. It allows dividing a drive into smaller, more manageable chunks.

There are 3 types of partitions on an MBR-partitioned disk:

  1. Primary partitions. A primary partition contains one file system or logical drive, also called a volume. An MBR disk can hold up to 4 primary partitions. The boot partition and the swap space are usually created as primary partitions. Swap is used as overflow memory when the system runs out of RAM.
  2. Extended partitions. An extended partition is a container for multiple logical partitions. It does not contain any data itself and has its own partition table.
  3. Logical partitions. A logical partition is created inside an extended partition, is allocated as an independent unit, and functions as a separate drive.

fdisk is used to create, modify, and delete partitions on a drive, and it supports multiple options. It is traditionally used with MBR partition tables on smaller disks, although recent util-linux versions of fdisk can also handle GPT.

gdisk supports GPT partition tables.

parted is the GNU Parted partition editor, used to create, destroy, and resize partitions. Like fdisk, it is an interactive, menu-driven tool, but more modern. It supports large disks and GPT partition tables, and it allows resizing partitions without losing data.

growpart extends partitions in cloud environments and virtual machines without destroying data; it does not create or delete partitions. The generic syntax is growpart <device> <partition number>. Ex: growpart /dev/sdb 1 expands partition 1 on /dev/sdb; then grow the file system to match, e.g. resize2fs /dev/sdb1 for ext4 (or xfs_growfs <mount point> for XFS). growpart is part of the cloud-guest-utils package (Debian) and the cloud-utils-growpart package (RHEL), which may need to be installed separately.

partprobe informs the kernel of changes to the partition table so they take effect without a reboot.

mkfs is used to build a Linux file system on a device, usually a drive partition. Run it as mkfs [options] <device>, or specify the file system type with mkfs -t <file system type> [options] <device name>. Type-specific helpers such as mkfs.ext4 and mkfs.xfs are also available.

fstab stores information about storage devices and partitions and where and how they should be mounted.

The /etc/crypttab file stores information about encrypted devices and partitions that must be unlocked and mounted at system boot.

The process of setting up a storage device on a Linux system includes:

  1. Creating partitions using tools like fdisk or parted

Run fdisk /dev/sdb to enter the menu, then use:

  • n to create a new partition. Provide the first sector (the default value is usually fine) and the last sector to set the partition size, e.g. +4096M to allocate 4096 MiB (4 GiB) starting from the first sector.
  • p to print the partition information
  • w to write the changes
  • q to quit the menu without saving the changes
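The +4096M answer is turned into a last sector number. A rough Python sketch of that arithmetic, assuming 512-byte logical sectors and the common default first sector of 2048, and ignoring any alignment fdisk may apply (last_sector is a hypothetical helper):

```python
SECTOR_SIZE = 512  # bytes per logical sector (assumption; `fdisk -l` shows the real value)


def last_sector(first_sector: int, size_mib: int, sector_size: int = SECTOR_SIZE) -> int:
    """Last sector covered by a partition of `size_mib` MiB starting at `first_sector`."""
    sectors = size_mib * 1024 * 1024 // sector_size
    return first_sector + sectors - 1  # the last sector is inclusive


print(last_sector(2048, 4096))  # 8390655
```

With 4096-byte logical sectors the same partition spans eight times fewer sectors, so it is worth checking the sector size fdisk reports before typing sector numbers by hand.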

Once the partition is created, use partprobe /dev/sdb to inform the kernel of the changes in the partition table.

parted /dev/sdb to enter the menu of the GNU parted tool. print to see the partitions on the disk. mkpart to create a new partition. Type primary to select primary as the partition type.

example:

parted /dev/sdb to enter into the menu

mkpart to make a partition. You can also use mkpart PARTITION-TYPE [FS-TYPE] START END to create the partition in one go. ex: mkpart primary ext3 2048M 4096M or mkpart primary ext3 0% 30%

Partition type? primary/extended? primary to specify the type of partition

File system type? [ext2]? ext4 to specify the type of file system (ext2, ext3, ext4, xfs, ...)

Start? 2048M to set the start of the partition

End? 4298M to set the end of the partition

print to view partition information

help to view parted commands details

quit to quit parted

Selecting the file system type does not format the partition with that file system; it only records in the partition table which file system the partition is intended to hold. The partition still has to be formatted, for example with mkfs.

  2. Formatting the partitions with a file system using tools like mkfs

mkfs.xfs /dev/sdb1 to create an XFS file system on the sdb1 partition. You might have to use the -f flag to force formatting if the partition already contains a file system.

  3. Labeling the partition

For xfs partitions, use:

xfs_admin -l /dev/sdb1 to print out the label.

xfs_admin -L LABEL /dev/sdb1 to label the partition. ex: xfs_admin -L SystemBackup /dev/sdb1 to label the partition sdb1 SystemBackup.

For ext file systems, use a different command:

e2label /dev/sdb2 to print out the current label

e2label /dev/sdb2 LABEL to add a label to the partition. ex: e2label /dev/sdb2 DataBackup

  4. Adding the formatted partition to the /etc/fstab file so the system mounts it at boot time

/dev is a special directory that contains the device files for all the devices.

/dev/null, /dev/zero, and /dev/urandom are special character devices used in Linux systems.

/dev/null (the Black Hole) is a special virtual device that discards anything written to it.

/dev/zero (the Zero Generator) is also a special virtual device that returns null bytes (0x00) whenever you read from it.

/dev/urandom (the Pseudorandom Source) is also a special virtual device that returns a stream of pseudorandom bytes.
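A short Python sketch (assuming a Linux system; read_bytes is a hypothetical helper) showing how the three devices behave when read from and written to:

```python
def read_bytes(device: str, n: int) -> bytes:
    """Read up to `n` bytes from a device file."""
    with open(device, "rb") as f:
        return f.read(n)


zeros = read_bytes("/dev/zero", 8)      # always null bytes
noise = read_bytes("/dev/urandom", 8)   # 8 pseudorandom bytes
empty = read_bytes("/dev/null", 8)      # reading /dev/null hits end-of-file immediately

with open("/dev/null", "wb") as sink:
    sink.write(b"this data is discarded")

print(zeros)              # b'\x00\x00\x00\x00\x00\x00\x00\x00'
print(len(noise), empty)  # 8 b''
```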

Expanding a partition

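  • lsblk to see all available disks and partitions
  • fdisk /dev/sdb to enter the fdisk menu and create a new partition using the n command
  • parted /dev/sdb to enter the parted menu and resize the partition we just created: run resizepart 1 5GB (the second argument is the new end position of partition 1, not an amount to add), then quit
  • grow the file system to fill the enlarged partition, e.g. with resize2fs for ext4 or xfs_growfs for XFS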

Logical Volumes

Partitions are not the only way to divide storage devices logically.

In Linux, Device Mapper creates virtual devices and passes data from a virtual device to one or more physical devices. DM-Multipath, built on Device Mapper, provides redundancy and improved performance for block storage devices by using multiple paths to the same device. mdadm is used to create, modify, and manage software RAID arrays in Linux. In a RAID array, data is stored across multiple storage devices, and those devices are presented as a single storage device. Using a RAID array is an alternative to using Device Mapper and DM-Multipath.

RAID: Redundant Array of Independent (or Inexpensive) Disks. Common levels are RAID 0, RAID 1, RAID 5, RAID 6, and RAID 10.

  • Striping spreads data across multiple physical disks so that they act as a single larger, faster logical disk. ex: combining two 2 TB disks to form a 4 TB logical disk.

  • Mirroring combines two physical hard drives into a single logical volume where an identical copy of everything is stored on both drives. If one disk fails, the data can be fully recovered from the second drive because it contains a full copy of the data.

  • Parity provides fault tolerance in RAID arrays: a parity value (typically an XOR) is calculated from the data on some drives and stored on a different drive, so the data of one failed drive can be rebuilt from the remaining drives and the parity.
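To make the parity idea concrete, here is a simplified Python sketch of single XOR parity, the scheme RAID 5 is based on. It assumes two data blocks of equal size; real arrays also rotate the parity block across drives, and xor_blocks is a hypothetical helper:

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized data blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)


disk1 = b"HELLO"
disk2 = b"WORLD"
parity = xor_blocks(disk1, disk2)   # stored on a third drive

# Simulate losing disk2: XOR the surviving data block with the parity block.
rebuilt = xor_blocks(disk1, parity)
print(rebuilt)  # b'WORLD'
```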

RAID 0 (Striping) is great for speed but provides no data redundancy. If one disk fails, the data cannot be recovered. On the other hand, no space is lost to redundancy.

RAID 1 (Mirroring): each disk contains a full copy of the data, so the usable capacity equals that of a single disk.

RAID 5 (Striping with Parity). You must have at least 3 disks to configure a RAID 5. RAID 5 is more space-efficient than RAID 1, because only one disk's worth of capacity is used for parity. It is one of the most popular types of RAID in server environments.

RAID 6 (Striping with Dual Parity). RAID 6 works like RAID 5 but stores two independent parity blocks per stripe, so the array can lose up to 2 drives, compared to only one in RAID 5. It requires at least 4 disks.

RAID 10 (Striping + Mirroring). It stripes data (RAID 0) across mirrored pairs (RAID 1), e.g. 2 RAID 1 pairs inside a RAID 0. RAID 10 requires a minimum of 4 drives, and 50% of the raw disk space goes to mirroring.
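The capacity trade-offs above can be summarized in a small Python sketch. It assumes identical drives and uses the commonly cited minimum disk counts; it ignores metadata overhead, and usable_capacity is a hypothetical helper:

```python
# Commonly cited minimum number of disks per RAID level.
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}


def usable_capacity(level: int, disks: int, disk_size: float) -> float:
    """Usable capacity of `disks` identical drives of `disk_size` each."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        return disks * disk_size        # striping: every disk stores data
    if level == 1:
        return disk_size                # mirroring: every disk holds the same copy
    if level == 5:
        return (disks - 1) * disk_size  # one disk's worth of parity
    if level == 6:
        return (disks - 2) * disk_size  # two disks' worth of parity
    if disks % 2:
        raise ValueError("this sketch assumes RAID 10 uses an even number of disks")
    return disks // 2 * disk_size       # RAID 10: stripe across mirrored pairs


for level, disks in ((0, 2), (1, 2), (5, 3), (6, 4), (10, 4)):
    print(f"RAID {level} with {disks} x 2 TB disks: {usable_capacity(level, disks, 2)} TB usable")
```

For example, four 2 TB disks give 8 TB in RAID 0, 6 TB in RAID 5, and 4 TB in RAID 6 or RAID 10.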

The /proc/mdstat file provides information about the RAID arrays on a Linux system. Use cat /proc/mdstat to read the RAID configuration.

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sd[b-c] to create a software RAID 1 (mirror) array named /dev/md0 from /dev/sdb and /dev/sdc.
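When scripting health checks, the contents of /proc/mdstat can be parsed. A minimal Python sketch that handles only the simple layout shown in the sample; the sample text is illustrative, not captured from a real system, and parse_mdstat is a hypothetical helper. An underscore in the [UU] status field marks a missing or failed member:

```python
import re

SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdc[1] sdb[0]
      2095104 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""


def parse_mdstat(text: str) -> dict:
    """Map each md array to its state, level, members, and health."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\S+) : (\S+) (raid\d+) (.*)$", line)
        if header:
            name, state, level, members = header.groups()
            current = {
                "state": state,
                "level": level,
                # member entries look like sdb[0]; keep only the device names
                "members": sorted(re.findall(r"(\w+)\[\d+\]", members)),
            }
            arrays[name] = current
            continue
        status = re.search(r"\[(\d+)/(\d+)\] \[([U_]+)\]", line)
        if current is not None and status:
            current["expected"] = int(status.group(1))  # devices the array should have
            current["active"] = int(status.group(2))    # devices currently working
            current["degraded"] = "_" in status.group(3)
    return arrays


print(parse_mdstat(SAMPLE_MDSTAT))
# On a real system: parse_mdstat(open("/proc/mdstat").read())
```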

Logical Volume Manager (LVM)

LVM maps whole physical devices and partitions into one or more virtual containers called volume groups. Logical volumes are then created from the volume groups, and these logical volumes are the storage devices that the system, users, and applications interact with. With LVM we can dynamically create, delete, and resize volumes without rebooting the system. A logical volume can also span multiple physical devices, and we can create snapshots of logical volumes.

/dev/mapper contains all logical volumes in a system that are being managed by LVM. Devices in the /dev/mapper directory are usually named /dev/mapper/<volume group name>-<logical volume name>.
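The naming rule is easy to get wrong when a volume group or logical volume name itself contains a hyphen: LVM doubles such hyphens so the separator stays unambiguous. A small Python sketch of that rule (mapper_path and escape_dm are hypothetical helpers):

```python
def escape_dm(name: str) -> str:
    """Double every hyphen, as LVM does when it builds device-mapper names."""
    return name.replace("-", "--")


def mapper_path(vg: str, lv: str) -> str:
    """/dev/mapper path for logical volume `lv` in volume group `vg`."""
    return f"/dev/mapper/{escape_dm(vg)}-{escape_dm(lv)}"


print(mapper_path("backups", "sysbk"))  # /dev/mapper/backups-sysbk
print(mapper_path("my-vg", "data-lv"))  # /dev/mapper/my--vg-data--lv
```

LVM also creates /dev/<volume group>/<logical volume> symbolic links, which avoid the escaping altogether and are the paths used in the examples below.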

The Logical Volume Manager provides logical volume management tools based on three layers:

  1. Physical volume tools:

LVM physical volumes are raw physical drives or partitions used by LVM.

pvscan scans for all physical devices being used as physical volumes

pvcreate initializes a drive or partition to use as a physical volume

pvdisplay lists attributes of physical volumes

pvchange changes attributes of physical volumes

pvmove moves data from one physical volume to another without loss of data. Useful when we want to migrate data from one disk to another. pvmove [options] <source pv> [destination pv] ex: pvmove /dev/sdb1 /dev/sdc1 to move data from /dev/sdb1 to /dev/sdc1. --background flag allows running the command in the background. --abort is used to cancel a running move.

pvs displays information about physical volumes

pvck checks the metadata of physical volumes

pvremove removes physical volumes. The command will fail if the PV is part of a volume group.

pvresize makes LVM recognize the increased size of a physical volume. ex: pvresize /dev/sdb

  2. Volume group tools:

A volume group is a pool of storage created from one or more LVM physical volumes.

vgscan scans all physical devices for volume groups

vgcreate creates volume groups

vgdisplay lists attributes of volume groups

vgchange changes attributes of volume groups

vgs displays information about volume groups

vgck checks the metadata of volume groups

vgrename renames a volume group

vgreduce removes physical volumes from a group to reduce its size

vgextend adds physical volumes to volume groups

vgmerge merges two volume groups

vgsplit splits a volume group into two

vgremove removes volume groups

  3. Logical volume tools:

A logical volume is a partition created from a volume group that will act as a storage device in the system.

lvscan scans all physical devices for logical volumes

lvcreate creates logical volumes in a volume group

lvdisplay displays attributes of logical volumes

lvchange changes attributes of the logical volumes

lvs displays information about logical volumes

lvrename renames a logical volume

lvreduce reduces the size of a logical volume

lvextend extends the size of the logical volume

lvresize resizes logical volumes

lvremove removes logical volumes

Managing Logical Volumes

ls /dev/mapper to see logical volumes

pvcreate /dev/sdb1 /dev/sdb2 to create two physical volumes

vgcreate backups /dev/sdb1 /dev/sdb2 to create a volume group named backups

lvcreate --name sysbk --size 4GB backups

lvcreate --name appbk --size 2GB backups

lvcreate --name logbk --size 0.5GB backups

lvscan and lvdisplay will show the 3 newly created logical volumes sysbk, appbk, and logbk in the backups volume group.

So we created a volume group (backups) that groups 2 physical volumes (sdb1 and sdb2). Then we created 3 logical volumes (sysbk, appbk, and logbk) in the volume group.

Let's extend logbk to 1GB:

lvextend -L1G /dev/backups/logbk

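Let's reduce appbk to 1GB. This is safe here only because appbk does not contain a file system yet; shrinking a logical volume that holds data without first shrinking its file system destroys data:

lvreduce -L1G /dev/backups/appbk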

Let's create file systems on the volumes we just created:

mkfs.xfs /dev/backups/sysbk

mkfs.ext4 /dev/backups/appbk

mkfs.ext4 /dev/backups/logbk

Mounting/Unmounting File Systems

File systems have to be mounted before the OS can read from or write to them.

A mount point is an access point, typically an empty directory, where a file system is loaded (mounted) to make it accessible to users.

The mount command is used to load a file system to a specified directory to make it accessible to users and applications.

mount [options] <device name> <mount point> to mount a file system. ex: mount /dev/sdb1 /home to mount sdb1 on the /home directory.

Mount options:

auto the device can be mounted automatically (at boot or with mount -a)

noauto the device is not mounted automatically and must be mounted explicitly

nouser only the root user can mount a device or a file system

user ordinary (non-root) users can mount the device or file system

exec allow binaries in a file system to be executed

noexec prevent binaries in a file system from being executed

ro mount a file system as read-only

rw mount a file system with read write permissions

sync input and output operations should be done synchronously

async input and output operations should be done asynchronously

remount changes the options of a mounted file system without unmounting it first, for example switching from rw to ro. Ex: mount -o remount,ro /mnt/data to remount /mnt/data read-only.

noatime prevents updating a file's access timestamp (atime) every time it is read. This can improve performance because it reduces disk writes.

nodiratime stops updates for directory access times

flush (vfat only) makes the file system flush data to disk earlier than normal

Unmounting a File System

We can use the umount command to unmount a file system. The file system must not be in use when being unmounted.

Use umount [options] <mount point> to unmount a file system.

fstab (File System Table, /etc/fstab) contains the list of file systems to be mounted, their mount points, and any options that might be needed for specific file systems. We can also create a systemd mount unit (documented in systemd.mount) to mount a file system.

Filesystem in USErspace (FUSE) lets non-privileged users create their own file systems without editing the underlying kernel code.

Let's mount our logical volumes now:

mkdir -p /backups/sys /backups/app /backups/log to create necessary directories

mount /dev/backups/sysbk /backups/sys

mount /dev/backups/appbk /backups/app

mount /dev/backups/logbk /backups/log

mount will show the associations of volumes and mount points.
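The kernel's own list of mounts lives in /proc/mounts. A minimal Python sketch (assuming Linux; parse_mounts and decode_octal are hypothetical helpers) that reads it and prints the file systems mounted under /backups:

```python
import re


def decode_octal(field: str) -> str:
    """Undo the octal escapes (e.g. \\040 for a space) used in /proc/mounts."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mounts(path: str = "/proc/mounts") -> list:
    """Return one dict per mounted file system: device, mount point, type, options."""
    entries = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 4:
                continue
            device, mount_point, fs_type, options = fields[:4]
            entries.append({
                "device": decode_octal(device),
                "mount_point": decode_octal(mount_point),
                "fs_type": fs_type,
                "options": options.split(","),
            })
    return entries


for entry in parse_mounts():
    if entry["mount_point"].startswith("/backups"):
        print(entry["device"], "on", entry["mount_point"], "type", entry["fs_type"])
```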

umount /backups/log will unmount the volume mounted at /backups/log

Mounting Volumes at Boot

Let's make sure that volumes are mounted at boot time. For that we need to manipulate the fstab configuration file.

nano /etc/fstab to start editing the configuration file.

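Add all volumes to be mounted at boot time at the end of the file. Each line lists the device, the mount point, the file system type, the mount options, the dump flag, and the fsck pass order. Ex:

/dev/backups/sysbk /backups/sys xfs defaults 0 0

/dev/backups/appbk /backups/app ext4 defaults 0 0

/dev/backups/logbk /backups/log ext4 defaults 0 0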

Then use mount -a to test the fstab configurations to make sure that all volumes listed can be mounted at boot time.
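Every fstab line has six whitespace-separated fields: device, mount point, file system type, mount options, dump flag, and fsck pass order; the last two default to 0 when omitted. A rough Python sketch of a pre-check to run before mount -a (check_fstab_line is a hypothetical helper, and its checks are heuristics, not a full validator):

```python
def check_fstab_line(line: str) -> list:
    """Return the problems found in one /etc/fstab line (an empty list means it looks OK)."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return []  # blank lines and comments are ignored
    fields = stripped.split()
    if not 4 <= len(fields) <= 6:
        return [f"expected 4 to 6 fields, found {len(fields)}"]
    mount_point, options = fields[1], fields[3]
    dump, pass_no = (fields[4:] + ["0", "0"])[:2]  # fields 5 and 6 default to 0
    problems = []
    if options.isdigit():
        problems.append(f"options field is {options!r}: is the options column "
                        "(e.g. 'defaults') missing?")
    if mount_point not in ("none", "swap") and not mount_point.startswith("/"):
        problems.append(f"mount point {mount_point!r} is not an absolute path")
    if not dump.isdigit():
        problems.append(f"dump field {dump!r} is not a number")
    if pass_no not in ("0", "1", "2"):
        problems.append(f"pass field {pass_no!r} should be 0, 1, or 2")
    return problems


print(check_fstab_line("/dev/backups/sysbk /backups/sys xfs defaults 0 0"))  # []
print(check_fstab_line("/dev/backups/sysbk /backups/sys xfs 0 0"))
```

mount -a remains the real test; a check like this only catches obvious typos before they reach the next boot.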

Managing File Systems

There are lots of tools and configuration files that can be used to manage Linux file systems. For example:

/etc/mtab reports the currently mounted file systems. It resembles /proc/mounts, but /proc/mounts is maintained by the kernel and is always up to date; on most modern distributions /etc/mtab is simply a symbolic link to /proc/self/mounts.

/proc/partitions lists the block devices and partitions known to the kernel, with their major and minor numbers and their sizes in 1 KiB blocks.
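Because sizes in /proc/partitions are given in 1 KiB blocks, scripts usually convert them. A minimal Python sketch; the sample text is illustrative, not captured from a real system, and parse_partitions is a hypothetical helper:

```python
import os

SAMPLE_PARTITIONS = """\
major minor  #blocks  name

   8        0   20971520 sda
   8        1   20970496 sda1
   8       16   10485760 sdb
"""


def parse_partitions(text: str) -> dict:
    """Map each device or partition name to its size in bytes."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # skip the header and blank lines
        _major, _minor, blocks, name = fields
        sizes[name] = int(blocks) * 1024  # #blocks counts 1 KiB blocks
    return sizes


print(parse_partitions(SAMPLE_PARTITIONS)["sda"])  # 21474836480 (20 GiB)

if os.path.exists("/proc/partitions"):  # the live file on this machine
    with open("/proc/partitions") as f:
        print(parse_partitions(f.read()))
```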

lsblk displays information about block storage devices currently available on the system.

lsblk [options] [device name] has multiple options:

  • -a list all devices, including empty ones

  • -r display output in raw format

  • -f display file system information (type, label, UUID, mount point)

  • -l display results in list format

  • -m display device permission information (owner, group, mode)
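lsblk can also print its tree as JSON with -J (--json), which is easier to process in scripts. A minimal Python sketch that walks that tree; the sample is a trimmed, illustrative output of the kind produced by lsblk -J -o NAME,TYPE, not a real capture, and walk is a hypothetical helper:

```python
import json

# Trimmed, illustrative `lsblk -J -o NAME,TYPE` output.
SAMPLE = """
{"blockdevices": [
  {"name": "sdb", "type": "disk", "children": [
    {"name": "sdb1", "type": "part"},
    {"name": "sdb2", "type": "part", "children": [
      {"name": "backups-sysbk", "type": "lvm"}
    ]}
  ]}
]}
"""


def walk(devices, depth=0):
    """Yield (depth, name, type) for every device, parents before children."""
    for dev in devices:
        yield depth, dev["name"], dev["type"]
        yield from walk(dev.get("children", []), depth + 1)


tree = json.loads(SAMPLE)["blockdevices"]
for depth, name, dev_type in walk(tree):
    print("  " * depth + f"{name} ({dev_type})")
```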

blkid prints the attributes of block devices, such as the UUID, label, and file system type, one device per line.

The following are common tools used for managing ext type file systems:

e2fsck used to check for file system issues

resize2fs used to resize ext2, ext3, and ext4 file systems

tune2fs used to adjust various tunable parameters of ext2, ext3, and ext4 file systems. It can also be used to add a journal to an existing ext2 file system, converting it to ext3.

dumpe2fs prints the superblock and block group information of the selected device.

Use fsck command to check the correctness and validity of a file system.

A file system superblock contains metadata about the file system, including its size, type, and status. If it becomes corrupted, we must use a tool like fsck to repair it.

The following are tools used to manage xfs file systems:

xfs_info displays details about an XFS file system

xfs_admin changes the parameters of an XFS file system

xfs_metadump copies the metadata of an XFS file system to a file

xfs_growfs expands an XFS file system to fill the size of the underlying device

xfs_copy copies the contents of an XFS file system to another location

xfs_repair repairs and recovers a corrupt XFS file system

xfs_db debugs an XFS file system

For SCSI devices, use lsscsi to list information about SCSI devices connected to a Linux system.

fcstat is used to interact with and display statistics of Fibre Channel connected devices.

Directory Structure

Directories are containers for other files.

Special files (device files) represent devices and are stored in the /dev directory.

Links make a file accessible in multiple parts of the system's file tree.

Domain sockets (Unix domain sockets) provide inter-process communication, with access controlled by the file system's permissions.

Named pipes (FIFOs) enable processes to communicate with each other without using network sockets.

Use the command file [options] <file name> to determine the type of file you are working with.
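The same classification can be done from a program with the file mode returned by lstat. A minimal Python sketch (assuming Linux; file_kind is a hypothetical helper):

```python
import os
import stat
import tempfile


def file_kind(path: str) -> str:
    """Classify `path` by its file type bits, without following symbolic links."""
    mode = os.lstat(path).st_mode
    checks = (
        (stat.S_ISLNK, "symbolic link"),
        (stat.S_ISDIR, "directory"),
        (stat.S_ISREG, "regular file"),
        (stat.S_ISCHR, "character device"),
        (stat.S_ISBLK, "block device"),
        (stat.S_ISFIFO, "named pipe"),
        (stat.S_ISSOCK, "socket"),
    )
    for test, kind in checks:
        if test(mode):
            return kind
    return "unknown"


with tempfile.TemporaryDirectory() as tmp:
    pipe = os.path.join(tmp, "pipe")
    os.mkfifo(pipe)              # create a named pipe
    link = os.path.join(tmp, "link")
    os.symlink(pipe, link)       # and a symbolic link pointing at it
    for path in (tmp, pipe, link, "/dev/null"):
        print(path, "->", file_kind(path))
```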

Security Mount Options

nodev prevents device files on the file system from being interpreted as devices. ex: mount -o nodev /dev/sdb1 /mnt/safe, or in /etc/fstab: /dev/sdb1 /mnt/safe ext4 defaults,nodev 0 2. It may be used on /home, /mnt/usb, /var/tmp, and more. It should not be used on /dev.

nosuid prevents the setuid and setgid bits from taking effect, so files cannot grant extra privileges when they run. It may be used on /home, /var/tmp, network shares, and removable drives.

noexec prevents binaries from executing directly from the file system, even if they have the execute permission. Ex: mount -o noexec /dev/sdb1 /mnt/usb, or in /etc/fstab: /dev/sdb1 /mnt/usb ext4 defaults,noexec 0 2. noexec blocks direct execution (e.g. ./program.sh) but not scripts passed to an interpreter (e.g. python ./myscript.py).
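A quick way to audit a mount is to check which of the three options its option string lacks. Note that defaults implies dev, suid, and exec, so it counts as lacking all three. A minimal Python sketch (missing_security_options is a hypothetical helper; which mounts need which options is a policy decision):

```python
SECURITY_OPTIONS = ("nodev", "nosuid", "noexec")


def missing_security_options(options: str) -> list:
    """Which of nodev/nosuid/noexec are absent from a comma-separated option string."""
    present = set(options.split(","))
    return [opt for opt in SECURITY_OPTIONS if opt not in present]


# The fourth field of an fstab line holds the options.
line = "/dev/sdb1 /mnt/usb ext4 defaults,noexec 0 2"
print(missing_security_options(line.split()[3]))  # ['nodev', 'nosuid']
```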

Network Mounts

NFS: Network File System. Linux to Linux sharing. Syntax: mount -t nfs <server>:/<remote-path> <local mountpoint>. ex: mount -t nfs 192.168.1.2:/shared/data /mnt/shared or in fstab 192.168.1.10:/shared/data /mnt/shared nfs defaults,_netdev 0 0

SMB: Server Message Block. Cross-platform sharing. On Linux, SMB shares are mounted with the cifs file system type (CIFS, the Common Internet File System, is a dialect of SMB). Syntax: mount -t cifs //<server>/<share> <local mountpoint> -o username=<user>,password=<pass>. The local mount point is the directory where the share appears once mounted. ex: mount -t cifs //192.168.1.2/shared /mnt/shared -o username=demo,password=vEreaSaCrEat. Passing the password on the command line exposes it in the shell history; mount.cifs also accepts a credentials=<file> option instead.

Troubleshooting Storage Issues

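ulimit views and sets resource limits for the current shell and the processes started from it. Persistent per-user limits are usually configured in /etc/security/limits.conf.

ulimit -n 512 limits the number of open files to 512 for the current shell session

ulimit -a displays all the current limits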
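A process can inspect and change its own limits the same way. A minimal Python sketch using the standard resource module (Unix only). Unlike ulimit -n 512, which in bash sets both the soft and the hard limit, this changes only the soft limit, closer to ulimit -S -n 512:

```python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open files: soft =", soft, "hard =", hard)

# Set only the soft limit: 512, or the hard limit if that is lower (soft may never exceed hard).
new_soft = 512 if hard == resource.RLIM_INFINITY else min(512, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
print("new soft limit:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```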

df/du (disk free/disk usage) help track storage space. df reports the total, used, and available space of mounted file systems, and du reports how much space files and directories consume.

df -h to display human readable format of the amount of space in the system

df -h /dev/backups/appbk to see the space in appbk logical volume

du -h /backups to see how much space the files under /backups use
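Both measurements are available from Python as well. A minimal sketch (df and du_apparent are hypothetical helpers): shutil.disk_usage returns the same total/used/free figures as df, while the du-style function sums apparent file sizes, so it can differ from du's block-based numbers for sparse files, and it counts hard-linked files more than once:

```python
import os
import shutil
import tempfile


def df(path: str) -> dict:
    """Total, used, and free bytes of the file system holding `path`."""
    usage = shutil.disk_usage(path)
    return {"total": usage.total, "used": usage.used, "free": usage.free}


def du_apparent(path: str) -> int:
    """Sum of file sizes under `path` (apparent size, symbolic links skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


print(df("/"))

with tempfile.TemporaryDirectory() as tmp:
    for name, size in (("a.log", 100), ("b.log", 200)):
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(b"x" * size)
    sample_total = du_apparent(tmp)

print(sample_total)  # 300
```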

iostat generates reports on CPU and device usage. It can be used to diagnose I/O performance issues such as IOPS bottlenecks.

iostat -d /dev/backups/appbk to view the I/O stats of appbk

ioping generates a report of device I/O latency in real-time

ioping -c 4 /dev/backups/logbk to check for latency in real-time

Storage Quota allows managing the storage space per user. Here are tools used to manage storage quotas:

quotacheck -cug <mount point> create quota database file for a file system and check for user and group quotas

edquota -u <username> edit quotas for a specific user

edquota -g <group name> edit quotas for a specific group

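setquota -u <username> <block-soft> <block-hard> <inode-soft> <inode-hard> <file system> set quotas for a specific user non-interactively

setquota -g <group name> <block-soft> <block-hard> <inode-soft> <inode-hard> <file system> set quotas for a specific group non-interactively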

Before using these commands, we have to activate user and group quotas on the file system. To do that, edit the fstab file to add the options usrquota and grpquota to the relevant file system, remount it, then create the quota files with quotacheck and turn quotas on with quotaon.

With the XFS file system, quotas are enabled with mount options (such as usrquota or uquota) and managed with the xfs_quota tool.

To generate quota reports, use the following commands:

repquota -a display the reports for all file systems listed in the mtab file as read-write with quotas

repquota -u <mount point> display the user quota report for a file system

quota -uv <username> display the quota report for a particular user with verbose output

warnquota -u check whether users are exceeding their allotted quota limits

warnquota -g check whether groups are exceeding their allotted quota limits

To troubleshoot a device, start with simple things:

  • make sure it is powered and recognized by the system, for example by checking /proc/partitions
  • check for errors in configuration files such as /etc/fstab
  • make sure to reload the configuration files if changes have been made to them
  • confirm that there is enough capacity
  • confirm that I/O is not overwhelming the device
  • use partprobe to make the kernel re-read partition tables after partitions have changed
  • select the appropriate I/O scheduler to optimize performance, e.g. noop for flash drives (on newer multi-queue kernels the equivalent is none). To change the scheduler, write to /sys/block/[sda]/queue/scheduler, e.g. echo noop > /sys/block/[sda]/queue/scheduler
  • to prevent users from consuming excessive space, edit the fstab config file to enable quotas, e.g. /dev/backups/appbk /backups/app ext4 defaults,usrquota 0 0. Then use edquota -u john to edit the soft and hard limits of the user john.
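The active scheduler is the entry shown in square brackets in the scheduler file. A minimal Python sketch that reports it for every block device (assuming Linux with sysfs mounted; active_scheduler is a hypothetical helper). Changing the scheduler still requires writing to the file as root:

```python
import glob


def active_scheduler(contents: str) -> str:
    """Return the bracketed (active) entry, e.g. 'mq-deadline' from '[mq-deadline] kyber bfq none'."""
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return ""  # no bracketed entry found


for path in sorted(glob.glob("/sys/block/*/queue/scheduler")):
    with open(path) as f:
        device = path.split("/")[3]  # /sys/block/<device>/queue/scheduler
        print(device, active_scheduler(f.read()))
```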