Linux: Storage
Basics
Let's learn how to work with Linux storage and partitions. Linux exposes devices as either block or character devices. Block devices read and write data in fixed-size blocks (HDDs and SSDs). Character devices read and write streams of data (keyboards, mice, serial ports).
File system
A file system is a data structure used by the operating system to store, retrieve, and manage files and directories. It also maintains the metadata of the files and directories.
Linux supports different file systems:
- FAT: File Allocation Table. Compatible with many operating systems
- ext2: A Linux native file system
- ext3: ext2 with journaling added, which makes recovery after a crash much faster
- ext4: Supports volumes of up to 1 exabyte and a file size of up to 16 terabytes
- BTRFS: Supports volumes of up to 16 exabytes and up to 18 quintillion files in each volume
- XFS: A 64-bit, high-performance journaling file system
Other file systems function as network protocols:
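- SMB: Server Message Block. Allows sharing data over a network
- CIFS: Common Internet File System. An older dialect of SMB
- NFS: Network File System
- VFS: Virtual File System. Not a network protocol but the kernel abstraction layer that gives applications one uniform interface to all of these file systems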
NFS and SMB are not compatible with each other.
Index Node (Inode)
An inode stores metadata about a file or directory on a file system (owner, permissions, timestamps, size, and the location of the data blocks). It does not include the file name or contents. Each file requires one inode, and so does each directory. Use the command stat <filename> to view a file's inode information. We can run out of inodes before we run out of disk space. When that happens, we can no longer create files on that file system. Inode exhaustion typically occurs when we have lots of small files.
df -i
to view inode usage on Linux.
ls -i
to view the inode numbers of the files in a directory.
It is important to monitor inode usage.
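As a small illustration of the point that the file name lives outside the inode, the sketch below creates a file and a hard link to it, then shows that both names resolve to the same inode number. The temporary directory and file names are made up for the example, and `stat -c` assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch: two names (hard links) that share one inode.
# The directory and file names are illustrative only.
workdir=$(mktemp -d)
echo "hello" > "$workdir/original.txt"
ln "$workdir/original.txt" "$workdir/hardlink.txt"   # second name, same inode

# stat -c %i prints the inode number, %h the hard link count (GNU coreutils).
inode_a=$(stat -c %i "$workdir/original.txt")
inode_b=$(stat -c %i "$workdir/hardlink.txt")
link_count=$(stat -c %h "$workdir/original.txt")

echo "original inode: $inode_a"
echo "hardlink inode: $inode_b"
echo "link count:     $link_count"

ls -i "$workdir"   # inode number printed next to each name
df -i "$workdir"   # inode totals of the file system that holds workdir

rm -rf "$workdir"
```

Both names consume a single inode, which is why creating hard links does not contribute to inode exhaustion while creating many small files does.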
Partitions
A partition is a section of a drive that acts as a separate drive. It allows dividing a drive into smaller and more manageable chunks.
There are 3 types of partitions we can use:
- Primary partitions. Each contains one file system or logical drive, also called a volume. The swap area and the boot partition are usually created as primary partitions. Swap space is used as overflow memory when the system runs low on RAM. An MBR partition table holds at most four primary partitions.
- Extended partitions. An extended partition is a container for multiple logical drives. It holds no data of its own and has a separate partition table.
- Logical partitions. A logical partition lives inside an extended partition. It is allocated as an independent unit and functions as a separate drive.
fdisk
is used to create, modify, and delete partitions on a drive. It is traditionally used on systems with MBR partition tables and small disks (MBR cannot address more than 2 TiB with 512-byte sectors).
gdisk
supports GPT partition tables.
parted
is the GNU Parted utility, used to create, destroy, and resize partitions. It is an interactive tool like fdisk but more modern: it supports large disks and GPT partition tables, and it can resize partitions without loss of data.
growpart
is a tool used for extending partitions in cloud environments and virtual machines without destroying the data. The generic syntax is growpart <device> <partition number>. ex: growpart /dev/sdb 1 to expand partition 1 on /dev/sdb into the free space that follows it, then resize2fs /dev/sdb1 to grow the ext file system to match (use xfs_growfs for XFS). growpart is part of the cloud-guest-utils package (Debian) or the cloud-utils-growpart package (RHEL), which may need to be installed separately. growpart does not create or delete partitions.
partprobe
informs the kernel of changes made to the partition table.
mkfs
is used to build a Linux file system on a device, which is usually a drive partition. The generic syntax is mkfs -t <file system type> [options] <device>. The other way is to call the type-specific helper directly: mkfs.<file system type> [options] <device>, as in mkfs.ext4 /dev/sdb1.
fstab
(/etc/fstab) stores information about storage devices and partitions and where and how they should be mounted.
The /etc/crypttab file stores information about encrypted devices and partitions that must be unlocked at system boot.
The process of setting up a storage device on a Linux system includes:
- Creating partitions using tools like fdisk or parted
fdisk /dev/sdb
to enter the fdisk menu for the disk
n
to create a new partition. Provide the first sector (can be left at the default value) and the last sector to delimit the size of the partition. For example, +4096M makes the partition 4 GiB long, counted from the first sector.
p
to print the partition table
w
to write the changes and exit
q
to quit the menu without saving the changes
Once the partition is created, use partprobe /dev/sdb to inform the kernel of the changes in the partition table.
parted /dev/sdb
to enter the menu of the GNU parted tool. Use print to see the partitions on the disk and mkpart to create a new partition. Type primary to select primary as the type of partition.
example:
parted /dev/sdb
to enter the menu
mkpart
to make a partition. Can also use mkpart PARTITION-TYPE [FS-TYPE] START END to create the partition in one go. ex: mkpart primary ext3 2048M 4096M or mkpart primary ext3 0% 30%
Partition type? primary/extended? primary
to specify the type of partition
File system type? [ext2]? ext4
to specify the type of file system (ext2, ext3, ext4, xfs, ...)
Start? 2048M
to set the start of the partition
End? 4298M
to set the end of the partition
print
to view partition information
help
to view details of the parted commands
quit
to quit parted
Selecting a file system type in parted does not format the partition. It only records the intended type in the partition table, so the partition must still be formatted with mkfs.
- Formatting the partitions with a file system using tools like mkfs
mkfs.xfs /dev/sdb1
to create an xfs file system on the sdb1 partition. You might have to use the -f flag to force formatting if the partition already contains a file system.
- Labeling the partition
For xfs partitions, use:
xfs_admin -l /dev/sdb1
to print out the label.
xfs_admin -L LABEL /dev/sdb1
to label the partition. ex: xfs_admin -L SystemBackup /dev/sdb1 to label the partition sdb1 as SystemBackup.
For ext file systems, we use a different command:
e2label /dev/sdb2
to print out the current label
e2label /dev/sdb2 LABEL
to add a label to the partition. ex: e2label /dev/sdb2 DataBackup
- Adding the formatted partition to the fstab file so the system mounts it at boot time
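The four steps above can be strung together in one script. What follows is only a dry-run sketch under stated assumptions: the `run` helper and the `APPLY`, `DISK`, `LABEL`, and `MOUNT_POINT` variables are hypothetical names introduced for illustration, the disk is assumed to already carry a partition table, and the 1MiB to 4GiB range is an arbitrary example. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run sketch of: partition -> format -> label -> fstab entry.
# DISK, LABEL and MOUNT_POINT are illustrative values. Nothing is executed
# unless APPLY=1 is set in the environment (hypothetical switch).
DISK=${DISK:-/dev/sdb}
PART="${DISK}1"
LABEL=${LABEL:-SystemBackup}
MOUNT_POINT=${MOUNT_POINT:-/backups/sys}

# run: print the command, and execute it only when APPLY=1.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

setup_disk() {
    # 1. Create a primary partition (assumes an existing partition table).
    run parted -s "$DISK" mkpart primary xfs 1MiB 4GiB
    run partprobe "$DISK"
    # 2. Format the new partition.
    run mkfs.xfs "$PART"
    # 3. Label it so it can be referred to by name.
    run xfs_admin -L "$LABEL" "$PART"
    # 4. Print the fstab line; appending it is left to the administrator.
    echo "fstab: LABEL=$LABEL $MOUNT_POINT xfs defaults 0 0"
}

setup_disk
```

Printing first and applying later is a deliberate choice here, since partitioning and formatting commands are destructive when pointed at the wrong disk. The fstab line uses LABEL= rather than the device path so the entry keeps working if the device name changes.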
/dev
is a special directory that contains the device files.
/dev/null, /dev/zero, and /dev/urandom are special character devices used in Linux systems.
/dev/null
(the Black Hole) is a special virtual device that discards anything that is redirected to it.
/dev/zero
(the Zero Generator) is also a special virtual device that returns null bytes (0x00) for as long as you read from it.
/dev/urandom
(the Pseudorandom Source) is also a special virtual device that returns a stream of pseudorandom bytes.
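A quick way to see the three devices behave as described is to read a few bytes from each. This is a minimal sketch that assumes a Linux system with the usual `head`, `od`, `tr`, and `wc` utilities. The variable names are illustrative.

```shell
#!/bin/sh
# /dev/null: anything written to it is discarded, and reading it yields
# end-of-file immediately.
echo "this text disappears" > /dev/null
null_bytes=$(wc -c < /dev/null | tr -d ' ')

# /dev/zero: an endless supply of 0x00 bytes. Dump the first 16 as hex.
head -c 16 /dev/zero | od -An -tx1

# Count the non-zero bytes among the first 1024 read from /dev/zero.
nonzero=$(head -c 1024 /dev/zero | tr -d '\000' | wc -c | tr -d ' ')

# /dev/urandom: pseudorandom bytes. 8 bytes rendered as 16 hex digits.
random_hex=$(head -c 8 /dev/urandom | od -An -tx1 | tr -d ' \n')

echo "bytes read from /dev/null:     $null_bytes"
echo "non-zero bytes from /dev/zero: $nonzero"
echo "sample from /dev/urandom:      $random_hex"
```

The first two counts are always zero, while the sample from /dev/urandom changes on every run.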
Expanding a partition
lsblk
to see all available disks and partitions
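fdisk /dev/sdb
to enter the fdisk menu and create a new partition using the n command
parted /dev/sdb
to enter the parted menu and resize the partition we just created. Run resizepart 1 5GB then quit. The file system inside the partition must then be grown separately (resize2fs for ext, xfs_growfs for XFS).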
Logical Volumes
Partitions are not the only way to divide storage devices logically.
In Linux, Device Mapper creates virtual devices and passes data from those virtual devices to one or more physical devices. DM-Multipath provides redundancy and improved performance for block storage devices by using multiple I/O paths to the same device. mdadm is used to create, modify, and manage software-defined RAID arrays in Linux. In RAID arrays, data is stored across multiple storage devices, and those devices are presented as a single storage device. Using a RAID array is an alternative to using Device Mapper and DM-Multipath.
RAID stands for Redundant Array of Independent (or Inexpensive) Disks. The common levels are RAID 0, RAID 1, RAID 5, RAID 6, and RAID 10.
- Striping combines multiple smaller physical disks to logically act as a single larger disk. ex: combining 2 x 2 TB disks to form a 4 TB logical disk.
- Mirroring combines two physical drives into a single logical volume where an identical copy of everything is put on both drives. If one disk fails, the data can be entirely recovered from the second drive because it contains a full copy of the data.
- Parity is used in RAID arrays for fault tolerance: a value (typically an XOR) is calculated from the data on the other drives and stored, so the contents of a failed drive can be rebuilt from the remaining data and the parity.
RAID 0
(Striping) is great for speed but provides no data redundancy. If one disk fails, the data cannot be recovered. There is no loss of capacity: all of the disk space is usable.
RAID 1
(Mirroring): each disk contains a full copy of the data, so the usable capacity is that of a single disk.
RAID 5
(Striping with Parity). You must have at least 3 disks to configure a RAID 5, and the equivalent of one disk is used for parity. It is more space-efficient than RAID 1, and it is one of the most popular types of RAID in server environments.
RAID 6
(Striping with Dual Parity). It requires at least 4 disks and uses the equivalent of two disks for parity. We can lose up to 2 drives in RAID 6 compared to only one in RAID 5.
RAID 10
(Striping + Mirroring). It consists of two or more RAID 1 mirrors striped together in a RAID 0. RAID 10 requires a minimum of 4 drives. Half of the raw disk space is lost to mirroring.
/proc/mdstat
reports the state of the software RAID arrays on a Linux system. Use cat /proc/mdstat to read the RAID configuration of the system.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sd[b-c]
to create a software RAID 1 array named /dev/md0 from /dev/sdb and /dev/sdc.
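The space trade-offs between the RAID levels can be checked with simple arithmetic. The helper below is a hypothetical function written for this note (it is not part of mdadm) and assumes all disks have the same size. For RAID 1 it treats every disk as a mirror of the same data, and for RAID 10 it assumes two-way mirrors.

```shell
#!/bin/sh
# raid_usable LEVEL DISKS SIZE
# Prints the usable capacity, in the same unit as SIZE, of an array built
# from DISKS equally sized drives. Illustrative helper, not an mdadm feature.
raid_usable() {
    level=$1
    disks=$2
    size=$3
    case $level in
        0)  min=2; usable=$(( disks * size )) ;;        # striping only
        1)  min=2; usable=$size ;;                      # every disk is a copy
        5)  min=3; usable=$(( (disks - 1) * size )) ;;  # one disk of parity
        6)  min=4; usable=$(( (disks - 2) * size )) ;;  # two disks of parity
        10) min=4; usable=$(( disks / 2 * size )) ;;    # striped 2-way mirrors
        *)  echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
    if [ "$disks" -lt "$min" ]; then
        echo "RAID $level needs at least $min disks" >&2
        return 1
    fi
    if [ "$level" = "10" ] && [ $(( disks % 2 )) -ne 0 ]; then
        echo "this sketch expects an even number of disks for RAID 10" >&2
        return 1
    fi
    echo "$usable"
}

for level in 0 1 5 6 10; do
    echo "RAID $level with 4 x 2 TB disks: $(raid_usable "$level" 4 2) TB usable"
done
```

With four 2 TB disks, the loop shows how the same hardware yields anywhere from 2 TB (RAID 1) to 8 TB (RAID 0) depending on how much redundancy is bought with the space.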
Logical Volume Manager (LVM)
LVM
maps whole physical devices and partitions into one or more virtual containers called volume groups. Logical volumes carved out of those volume groups become the storage devices that the system, users, and applications interact with. With LVM we can dynamically create, delete, and resize volumes without having to reboot the system. We can also span logical volumes across multiple physical devices, and we can create snapshots of each logical volume.
/dev/mapper
contains all logical volumes in a system that are managed by LVM. Devices in the /dev/mapper directory are usually called /dev/mapper/<volume group name>-<logical volume name>
The Logical Volume Manager provides logical volume management tools based on three layers:
- Physical volume tools:
LVM physical volumes are raw physical drives or partitions used by LVM.
pvscan
scans for all physical devices being used as physical volumes
pvcreate
initializes a drive or partition to use as a physical volume
pvdisplay
lists attributes of physical volumes
pvchange
changes attributes of a physical volume
pvmove
moves data from one physical volume to another without loss of data. Useful when we want to migrate data from one disk to another. Syntax: pvmove [options] <source pv> [destination pv]. ex: pvmove /dev/sdb1 /dev/sdc1 to move data from /dev/sdb1 to /dev/sdc1. The --background flag runs the move in the background, and --abort cancels a running move.
pvs
displays information about physical volumes
pvck
checks the metadata of physical volumes
pvremove
removes physical volumes. The command will fail if the physical volume is part of a volume group.
pvresize
makes LVM recognize the increased size of a physical volume. ex: pvresize /dev/sdb
- Volume group tools:
A volume group is a pool of storage created from one or more LVM physical volumes.
vgscan
scans all physical devices for volume groups
vgcreate
creates volume groups
vgdisplay
lists attributes of volume groups
vgchange
changes attributes of volume groups
vgs
displays information about volume groups
vgck
checks the metadata of volume groups
vgrename
renames a volume group
vgreduce
removes physical volumes from a group to reduce its size
vgextend
adds physical volumes to volume groups
vgmerge
merges two volume groups
vgsplit
splits a volume group into two
vgremove
removes volume groups
- Logical volume tools:
A logical volume is a partition created from a volume group that will act as a storage device in the system.
lvscan
scans all physical devices for logical volumes
lvcreate
creates logical volumes in a volume group
lvdisplay
displays attributes of logical volumes
lvchange
changes attributes of the logical volumes
lvs
displays information about logical volumes
lvrename
renames a logical volume
lvreduce
reduces the size of a logical volume
lvextend
extends the size of the logical volume
lvresize
resizes logical volumes
lvremove
removes logical volumes
Managing Logical Volumes
ls /dev/mapper
to see logical volumes
pvcreate /dev/sdb1 /dev/sdb2
to create the physical volumes
vgcreate backups /dev/sdb1 /dev/sdb2
to create a volume group
lvcreate --name sysbk --size 4G backups
lvcreate --name appbk --size 2G backups
lvcreate --name logbk --size 512M backups
lvscan
and lvdisplay
will show the 3 newly created logical volumes sysbk
, appbk
, and logbk
in the backups
volume group.
So we created a volume group (backups) that groups 2 physical volumes (sdb1 and sdb2). Then we created 3 logical volumes (sysbk, appbk, and logbk) in the volume group.
Let's extend logbk to 1GB:
lvextend -L1G /dev/backups/logbk
Let's reduce appbk to 1GB:
lvreduce -L1G /dev/backups/appbk
Shrinking is safe here because the volume holds no file system yet. A formatted volume needs its file system shrunk first (lvreduce -r does both steps for ext file systems), and XFS cannot be shrunk at all.
Let's create file systems on the volumes we just created:
mkfs.xfs /dev/backups/sysbk
mkfs.ext4 /dev/backups/appbk
mkfs.ext4 /dev/backups/logbk
Mounting/Unmounting File Systems
File systems have to be mounted before the OS can read from or write to them.
A mount point is an access point, typically an empty directory, where a file system is attached to make it accessible to users.
The mount command attaches a file system to a specified directory to make it accessible to users and applications.
mount [options] <device name> <mount point>
to mount a file system. ex: mount /dev/sdb1 /home to mount sdb1 on the /home directory.
Mount options:
auto
device must be mounted automatically
noauto
device should not be mounted automatically
nouser
only the root user can mount a device or a file system
user
all users can mount a device or file system
exec
allow binaries in a file system to be executed
noexec
prevent binaries in a file system from being executed
ro
mount a file system as read-only
rw
mount a file system with read write permissions
sync
input and output operations should be done synchronously
async
input and output operations should be done asynchronously
remount
changes the options of a mounted file system without having to unmount it first, for example from rw to ro. Ex: mount -o remount,ro /mnt/data to remount /mnt/data in ro mode.
noatime
disables access-time updates when files are read. This can improve performance as it reduces disk writes.
nodiratime
stops updates of directory access times
flush
(FAT file systems) writes data to disk sooner than normal, which helps with removable media that may be unplugged early
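To check which of these options are actually in effect, the kernel's mount table can be read directly. The two functions below are hypothetical helpers written for this note. They assume the standard /proc/mounts layout (device, mount point, type, options, dump, pass) and accept an alternative file as a last argument only so they can be tried against sample data.

```shell
#!/bin/sh
# mount_options MOUNT_POINT [MOUNTS_FILE]
# Prints the comma-separated options of the file system mounted at
# MOUNT_POINT, or fails if nothing is mounted there. When several entries
# share a mount point, the last one (the visible mount) wins.
mount_options() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }
        END { if (opts != "") print opts; else exit 1 }
    ' "${2:-/proc/mounts}"
}

# has_option MOUNT_POINT OPTION [MOUNTS_FILE]
# Succeeds when OPTION appears in the option list of MOUNT_POINT.
has_option() {
    opts=$(mount_options "$1" "${3:-/proc/mounts}") || return 1
    case ",$opts," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example: show the options of the root file system.
mount_options / || echo "no entry for / in /proc/mounts"
```

A typical use is a guard such as `has_option /mnt/data ro && echo "read-only"` after a remount, to confirm the change took effect.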
Unmounting a File System
We can use the umount command to unmount a file system. The file system must not be in use when being unmounted.
Use umount [options] <mount point> to unmount a file system.
fstab
(File System Table) contains the list of file systems to be mounted, their mount points, and any options that might be needed for specific file systems. We can also create a systemd mount unit (see systemd.mount) to mount a file system.
Filesystem in USErspace
(FUSE) lets non-privileged users create their own file systems without editing the underlying kernel code.
Let's mount our logical volumes now:
mkdir -p /backups/sys /backups/app /backups/log
to create necessary directories
mount /dev/backups/sysbk /backups/sys
mount /dev/backups/appbk /backups/app
mount /dev/backups/logbk /backups/log
mount
will show the associations of volumes and mount points.
umount /backups/log
will unmount the volume mounted at /backups/log
Mounting Volumes at Boot
Let's make sure that the volumes are mounted at boot time. For that we need to edit the /etc/fstab configuration file.
nano /etc/fstab
to start editing the configuration file.
Add all volumes to be mounted at boot time at the end of the file. Each line has six fields: device, mount point, file system type, mount options, dump flag, and fsck pass number. Ex:
/dev/backups/sysbk /backups/sys xfs defaults 0 0
/dev/backups/appbk /backups/app ext4 defaults 0 0
/dev/backups/logbk /backups/log ext4 defaults 0 0
Then use mount -a to mount everything listed in fstab and confirm that all the volumes can be mounted at boot time.
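A frequent fstab mistake is leaving out a field, most often the mount options. The sketch below is a hypothetical `check_fstab` helper, not a standard tool, that flags entries without all six fields described in fstab(5). mount itself tolerates missing trailing fields, so this check is intentionally stricter. It only counts fields and does not validate their contents.

```shell
#!/bin/sh
# check_fstab [FILE]
# Reports entries that do not have six fields. FILE defaults to /etc/fstab.
# Exit status is non-zero when at least one entry is malformed.
check_fstab() {
    awk '
        NF == 0    { next }    # blank line
        $1 ~ /^#/  { next }    # comment
        NF != 6 {
            printf "line %d: expected 6 fields, found %d: %s\n", NR, NF, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "${1:-/etc/fstab}"
}

# Try it on a sample file whose last entry is missing the options field.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# device            mount point   type  options   dump pass
/dev/backups/sysbk  /backups/sys  xfs   defaults  0    0
/dev/backups/appbk  /backups/app  ext4  0 0
EOF
check_fstab "$sample" || echo "fix the entries above before running mount -a"
```

Running the check before mount -a catches structural errors early, while mount -a remains the real test that each entry can be mounted.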
Managing File Systems
There are a lot of tools and configuration files that can be used to manage Linux file systems. For example:
/etc/mtab
reports the status of currently mounted file systems. It resembles /proc/mounts, but /proc/mounts comes straight from the kernel and is therefore more accurate and up to date.
/proc/partitions
contains information about each partition attached to the system.
lsblk
displays information about block storage devices currently available on the system.
lsblk [options] [device name]
has multiple options:
- -a also list empty devices
- -e exclude the devices with the given major numbers
- -f display file system information (type, label, UUID, mount point)
- -l display results in list format
- -m display device owner, group, and permission information
blkid
prints the attributes of each block device, such as its UUID, label, and file system type.
The following are common tools used for managing ext file systems:
e2fsck
used to check for file system issues
resize2fs
used to resize ext2, ext3, and ext4 file systems
tune2fs
used to adjust various tunable parameters of ext2, ext3, and ext4 file systems. It can also be used to add a journal to an existing ext2 file system.
dumpe2fs
prints the superblock and block group information of the selected device.
Use the fsck command to check the correctness and validity of a file system.
A file system superblock contains metadata about the file system, including its size, type, and status. If it becomes corrupted, we must use a tool like fsck to repair it.
The following are tools used to manage xfs file systems:
xfs_info
display details about the XFS file system
xfs_admin
change the parameters of an XFS file system
xfs_metadump
copy the metadata of the XFS file system to a file
xfs_growfs
expand the XFS file system to fill the drive size
xfs_copy
copy the contents of the XFS file system to another location
xfs_repair
repair and recover a corrupt XFS file system
xfs_db
debug the XFS file system
For SCSI devices, use lsscsi to list information about the SCSI devices connected to a Linux system.
fcstat
is used to interact with and display statistics of Fiber Channel connected devices.
Directory Structure
Directories are containers for other files
Special files
are system files stored in the /dev
directory.
Links
make a file accessible in multiple parts of the system's file tree.
Domain Sockets
provide inter-process networking that is protected by the file system's access control
Named pipes
enable processes to communicate with each other without using network sockets
Use the command file [options] <file name> to determine the type of file you are working with.
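The file types listed above can also be told apart from a script. The sketch below creates one of each kind that an unprivileged user can make and classifies them with the shell's built-in test operators, which are used here instead of the file command only so the example runs where file is not installed. `file_kind` and the temporary paths are hypothetical names for this illustration.

```shell
#!/bin/sh
# file_kind PATH: name the type of PATH using the shell's test operators.
# The symbolic link test comes first because the other tests follow links.
file_kind() {
    if   [ -L "$1" ]; then echo "symbolic link"
    elif [ -d "$1" ]; then echo "directory"
    elif [ -p "$1" ]; then echo "named pipe"
    elif [ -S "$1" ]; then echo "socket"
    elif [ -b "$1" ]; then echo "block device"
    elif [ -c "$1" ]; then echo "character device"
    elif [ -f "$1" ]; then echo "regular file"
    else echo "unknown or missing"; return 1
    fi
}

demo=$(mktemp -d)
touch "$demo/regular"
mkdir "$demo/dir"
ln -s "$demo/regular" "$demo/link"
mkfifo "$demo/pipe"

for path in "$demo/regular" "$demo/dir" "$demo/link" "$demo/pipe" /dev/null; do
    printf '%-40s %s\n' "$path" "$(file_kind "$path")"
done
```

On a system with the file command installed, `file "$demo"/*` gives the same classification with more detail for regular files.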
Security Mount Options
nodev
prevents special device files on the file system from being interpreted as devices. ex: mount -o nodev /dev/sdb1 /mnt/safe or in fstab: /dev/sdb1 /mnt/safe ext4 defaults,nodev 0 2. It may be used on /home, /mnt/usb, /var/tmp, and more. It should not be used on /dev.
nosuid
prevents setuid and setgid files from granting extra privileges when they run. It may be used on /home, /var/tmp, network shares, and removable drives.
noexec
prevents a binary program from executing directly from the file system even if it has execute permission. Ex: mount -o noexec /dev/sdb1 /mnt/usb or, in /etc/fstab, /dev/sdb1 /mnt/usb ext4 defaults,noexec 0 2. noexec prevents direct execution (ex: ./program.sh) but not scripts passed to an interpreter (ex: python ./myscript.py).
Network Mounts
NFS
: Network File System. Linux to Linux sharing. Syntax: mount -t nfs <server>:/<remote-path> <local mountpoint>. ex: mount -t nfs 192.168.1.2:/shared/data /mnt/shared or in fstab: 192.168.1.2:/shared/data /mnt/shared nfs defaults,_netdev 0 0
SMB
: Server Message Block. Cross-platform sharing. On Linux, SMB shares are mounted with the cifs file system type (CIFS, the Common Internet File System, is an older dialect of SMB). Syntax: mount -t cifs //<server>/<share> <local mountpoint> -o username=<user>,password=<pass>. The local mount point is the directory where the share appears once mounted. ex: mount -t cifs //192.168.1.2/shared /mnt/shared -o username=demo,password=vEreaSaCrEat
Troubleshooting Storage Issues
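ulimit
sets or reports the resource limits that apply to the current shell and the processes it starts. Persistent per-user limits are set in /etc/security/limits.conf.
ulimit -n 512
limits the number of open files to 512 for the current shell session
ulimit -a
displays all the current limits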
df/du
(disk free/disk usage) facilitate storage space tracking. df reports the used and free space of each mounted file system, and du reports how much space files and directories consume.
df -h
to display the amount of space in the system in human-readable format
df -h /dev/backups/appbk
to see the space in the appbk logical volume
du -h /backups
to see the space used by each directory under /backups
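Space tracking is easier to automate than to eyeball. As a minimal sketch, the hypothetical `over_threshold` filter below reads the POSIX output format of df (`df -P`) and prints the mount points that have reached a given usage percentage. The 90 percent threshold is an arbitrary example, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# over_threshold PERCENT
# Reads `df -P` output on stdin and prints the mount points whose usage is
# at or above PERCENT. Columns in -P format: file system, 1024-blocks,
# used, available, capacity, mount point.
over_threshold() {
    awk -v limit="$1" '
        NR == 1 { next }            # skip the header line
        {
            use = $5
            sub(/%/, "", use)       # "83%" -> "83"
            if (use + 0 >= limit + 0) print $6 " is " $5 " full"
        }
    '
}

# Report every mounted file system that is at least 90% full.
df -P | over_threshold 90
```

The same filter works on inode usage by feeding it `df -Pi` instead, since the percentage stays in the fifth column.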
iostat
generates reports on CPU and device usage. It can be used to diagnose IOPS problems.
iostat -d /dev/backups/appbk
to view the I/O stats of appbk
ioping
reports device I/O latency in real time
ioping -c 4 /dev/backups/logbk
to send 4 requests and check the latency in real time
Storage Quota
allows managing the storage space per user. Here are tools used to manage storage quotas:
quotacheck -cug <mount point>
create quota database file for a file system and check for user and group quotas
edquota -u <username>
edit quotas for a specific user
edquota -g <group name>
edit quotas for a specific group
setquota -u <username> <block-soft> <block-hard> <inode-soft> <inode-hard> <file system>
set quotas for a specific user from the command line
setquota -g <group name> <block-soft> <block-hard> <inode-soft> <inode-hard> <file system>
set quotas for a specific group from the command line
Before using these commands, we have to activate user and group quotas on the file system. To do that, edit the fstab file to add the options usrquota and grpquota to the relevant file system, then remount it.
With an XFS file system, use xfs_quota to configure quotas.
To generate quota reports, use the following commands:
repquota -a
display the reports for all file systems listed in the mtab file as read-write with quotas
repquota -u <mount point>
display the user quota report for a particular file system
quota -uv <username>
display the quota report for a particular user with verbose output
warnquota -u
check user quotas and email the users who exceed their allotted limit
warnquota -g
check group quotas and email the groups that exceed their allotted limit
To troubleshoot a device, start with simple things:
- make sure it is powered and recognized by the system by checking the /proc directory (for example /proc/partitions)
- check for errors in configuration files such as /etc/fstab
- make sure to reload the config files if changes have been made to them
- confirm that there is enough capacity
- confirm that I/O is not overwhelming the device
- use partprobe to scan for new storage devices and partitions and load them into the kernel
- select the appropriate scheduler to optimize performance, ex: noop for flash drives (called none on recent kernels). To change the scheduler, write its name to /sys/block/<device>/queue/scheduler, ex: echo noop > /sys/block/sda/queue/scheduler
- to prevent users from consuming excessive space, edit the fstab config file to enable quotas with /dev/backups/appbk /backups/app ext4 defaults,usrquota 0 0. Use edquota -u john to edit the soft and hard limits of the user john.
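Before changing a scheduler it helps to see which one is active. The scheduler file lists the available schedulers on one line and marks the active one with square brackets, for example `noop deadline [cfq]`. The sketch below extracts the bracketed name. `active_scheduler` is a hypothetical helper written for this note.

```shell
#!/bin/sh
# active_scheduler: read the contents of a queue/scheduler file on stdin
# and print the entry shown in square brackets (the scheduler in use).
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# List the active scheduler of every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    echo "$dev: $(active_scheduler < "$f")"
done
```

Reading the file needs no special privileges, while writing a new scheduler name to it requires root, and the change does not survive a reboot unless it is made persistent by other means.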