CH_24 Storage System in DBMS

by Jasleen Chhabra | Updated on 29 September 2024
  • Storage System in DBMS
  • Storage Hierarchy in DBMS
  • Redundant Array of Independent Disks (RAID)

Storage System in DBMS

A database system requires efficient data storage to maintain and retrieve vast amounts of information. These systems make use of various types of storage media, each offering different trade-offs in speed, accessibility, cost, and data retention. While the database management system (DBMS) provides a logical view of the data, the actual storage occurs at the hardware level, as bits and bytes spread across one or more storage devices. Let's explore the types of storage systems used in DBMS and their roles in handling data.


Types of Data Storage in DBMS

Data storage in a DBMS is broadly classified into three categories based on the nature of data persistence and speed of access:

  1. Primary Storage
  2. Secondary Storage
  3. Tertiary Storage

Each type of storage serves a specific function in maintaining the balance between fast data access and reliable long-term storage. These storage types vary significantly in terms of performance, cost, and volatility.


1. Primary Storage

Primary storage is the fastest and most easily accessible form of data storage in a computer system. It is directly accessible by the central processing unit (CPU) and is used for storing data that needs to be accessed quickly. However, primary storage is volatile, meaning the data is lost when the power is turned off. There are two main types of primary storage:

  • Main Memory (RAM): The main memory, also known as Random Access Memory (RAM), holds data that is currently being processed by the CPU. It plays a crucial role in executing applications and performing operations. Although RAM can store a considerable amount of data (typically gigabytes), it is usually not large enough to hold an entire database. Main memory is volatile: once the power goes off, everything in it is lost. In terms of speed, main memory provides fast read and write operations, which are essential for system performance.

  • Cache Memory: Cache is a small, high-speed memory that sits between the CPU and main memory. It stores frequently accessed data to reduce the average time to access data from the main memory. Because cache memory is extremely fast, it is often used to improve the performance of the CPU by reducing the latency involved in fetching data. However, cache is limited in size and is more expensive compared to main memory. Designers of data structures and query processors often take cache effects into account to optimize performance in DBMS operations.
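The effect of a cache on average access time can be sketched with a simple weighted average. This is a simplified model, not a description of any particular CPU: the function name, the parameter names, and the latency figures (1 ns for cache, 100 ns for main memory) are illustrative assumptions, and the model ignores the cost of the cache lookup on a miss.

```python
def average_access_time(hit_ratio: float, cache_ns: float, memory_ns: float) -> float:
    """Expected time per access when a fraction `hit_ratio` is served by the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Hits cost the cache latency; misses cost the main-memory latency.
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * memory_ns


# Illustrative latencies only, not measurements of real hardware.
for ratio in (0.0, 0.5, 0.9, 0.99):
    print(f"hit ratio {ratio:.2f}: {average_access_time(ratio, 1.0, 100.0):.2f} ns")
```

Under these assumed numbers, raising the hit ratio is what brings the average close to the cache latency, which is why cache-aware data structures matter in query processing.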

Despite its speed, primary storage is limited by its volatility. Once a system is shut down or a power failure occurs, all data stored in primary memory is lost. This limitation necessitates the use of secondary and tertiary storage for persistent data retention.


2. Secondary Storage

Secondary storage is non-volatile: it provides long-term data storage that does not depend on a continuous power supply, so data is retained even after the system is powered off or rebooted. Secondary storage is generally slower than primary storage, but it can store significantly larger amounts of data, including entire databases. Common secondary storage devices include:

  • Flash Memory: Flash memory is widely used in devices like USB drives, solid-state drives (SSDs), and memory cards. Unlike RAM, it retains data after a power loss, and it offers faster read and write speeds than magnetic disks. This combination makes it suitable for servers that need high performance while storing large datasets, and database systems increasingly use flash as a cache for frequently accessed data.

  • Magnetic Disk Storage: Hard disk drives (HDDs) are the most common form of magnetic disk storage. They are widely used in both personal computers and large-scale data centers to store massive amounts of data. HDDs consist of spinning disks (platters) coated with magnetic material, where data is written and read by magnetic heads. The advantage of magnetic disks is their ability to store huge databases and support direct access to data. Although they are slower than flash memory, magnetic disks are reliable and cost-effective, making them suitable for long-term data storage. In DBMS, magnetic disks are often used for storing primary copies of databases, with regular backups to prevent data loss from disk failure.

Secondary storage plays a crucial role in DBMS because it provides permanent storage for data. Unlike primary storage, which is cleared when the system powers off, secondary storage ensures that data remains intact, offering a safer and more reliable option for long-term data retention.
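The idea of using a faster tier as a cache in front of a slower disk can be sketched as follows. This is a minimal illustration, not how any specific DBMS manages its cache: the class name `LRUPageCache`, the least-recently-used eviction policy, and the dictionary standing in for the disk are all assumptions made for the example.

```python
from collections import OrderedDict


class LRUPageCache:
    """Keeps the most recently used pages in a small, fast tier."""

    def __init__(self, capacity, backing_store):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.backing_store = backing_store  # stands in for the slower disk
        self.pages = OrderedDict()          # stands in for the fast tier
        self.hits = 0
        self.misses = 0

    def read(self, page_id):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        self.misses += 1
        data = self.backing_store[page_id]   # slow path: fetch from the disk
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used page
        return data


disk = {n: f"page-{n}" for n in range(10)}
cache = LRUPageCache(capacity=2, backing_store=disk)
for page_id in (1, 2, 1, 3, 2):
    cache.read(page_id)
print(cache.hits, cache.misses)
```

Only reads are modeled here. A real cache would also have to write modified pages back to the disk so that nothing is lost when the fast tier is cleared.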


3. Tertiary Storage

Tertiary storage, sometimes referred to as offline storage, is an external storage medium used primarily for data backup and archiving. While it offers the slowest data access speed, tertiary storage can hold extremely large volumes of data and is cost-effective. Two common types of tertiary storage are:

  • Optical Storage: Optical storage devices, such as CDs (Compact Discs) and DVDs (Digital Versatile Discs), use lasers to read and write data. While a CD can store around 700 MB of data, a single-sided DVD can hold 4.7 GB with one layer or 8.5 GB with two layers. Optical storage is frequently used for distributing media and software, as well as for long-term backups. However, due to its slower access time and smaller capacity compared to modern storage solutions, optical storage is gradually being replaced by flash and cloud storage.

  • Tape Storage: Tape storage is one of the oldest forms of data storage still in use today. It is primarily used for data archiving and backup purposes. Tape drives store data sequentially, which means that access times are slower compared to disk storage. However, tapes are highly cost-effective for storing large amounts of data. In many data centers, tape storage is used for backing up critical information, which may not need to be accessed frequently but must be preserved for legal or business purposes.

Tertiary storage is an important component of data management, as it provides affordable and reliable storage for historical data and backup archives. Although slower, it plays a vital role in disaster recovery by ensuring data can be retrieved even if primary and secondary storage systems fail.
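The cost of sequential access on tape can be illustrated with a toy model that counts how many blocks the tape must wind past to serve a series of requests. The function names, the block numbers, and the assumption that winding cost is proportional to distance are all hypothetical simplifications.

```python
def tape_blocks_passed(head_position: int, target_block: int) -> int:
    """Blocks the tape must wind past to reach `target_block` (toy model)."""
    return abs(target_block - head_position)


def total_tape_movement(requests, start=0):
    """Sum of blocks wound past when serving `requests` in the given order."""
    position, moved = start, 0
    for block in requests:
        moved += tape_blocks_passed(position, block)
        position = block
    return moved


requests = [900, 10, 500]
print(total_tape_movement(requests))          # served in arrival order
print(total_tape_movement(sorted(requests)))  # served in tape order
```

In this model, scattered requests cost far more movement than requests served in tape order, which is consistent with tape being used for backups and archives that are written and read in long sequential runs.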


Storage Hierarchy in DBMS

The different storage types discussed above can be arranged into a storage hierarchy, where devices are organized based on speed, cost, and volatility. Typically, the higher levels of the hierarchy (like cache and main memory) are fast but expensive and volatile. As we move down the hierarchy, storage devices become slower and cheaper per byte, offer more capacity, and (from flash memory downward) retain data without power. The basic structure of the storage hierarchy is as follows:

  1. Cache Memory (Fastest, most expensive, volatile)
  2. Main Memory (RAM)
  3. Flash Memory (SSDs)
  4. Magnetic Disk Storage (HDDs)
  5. Optical and Tape Storage (Slowest, cheapest, non-volatile)

The purpose of the storage hierarchy is to balance cost and performance by using fast, expensive storage for active data processing and slower, cheaper storage for long-term data retention.
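The hierarchy above can be written down as a small data structure, which makes the trade-off between speed and persistence easy to query. This is only a sketch: the `Tier` class and the `fastest_tier` helper are hypothetical names, and the levels and volatility flags simply restate the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    level: int      # 1 = top of the hierarchy (fastest, most expensive)
    volatile: bool  # True if contents are lost when power is removed


HIERARCHY = [
    Tier("Cache Memory", 1, True),
    Tier("Main Memory (RAM)", 2, True),
    Tier("Flash Memory (SSDs)", 3, False),
    Tier("Magnetic Disk Storage (HDDs)", 4, False),
    Tier("Optical and Tape Storage", 5, False),
]


def fastest_tier(must_survive_power_loss: bool) -> Tier:
    """Return the highest tier that satisfies the persistence requirement."""
    candidates = [
        t for t in HIERARCHY
        if not (must_survive_power_loss and t.volatile)
    ]
    return min(candidates, key=lambda t: t.level)


print(fastest_tier(must_survive_power_loss=False).name)
print(fastest_tier(must_survive_power_loss=True).name)
```

The second call skips the volatile tiers, which mirrors why a DBMS processes active data in memory but keeps the durable copy of the database on flash or disk.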


Redundant Array of Independent Disks (RAID)

To further optimize performance and reliability, database systems often employ RAID (Redundant Array of Independent Disks), a technology that connects multiple storage drives into a unified array. This setup offers various benefits like data redundancy, faster access, and protection against disk failures. Different RAID levels offer different combinations of performance and data protection:

  • RAID 0 (Striping): RAID 0 divides data into blocks and distributes these blocks across multiple disks. This improves read/write performance but offers no redundancy: if any one disk fails, all data in the array is lost.

  • RAID 1 (Mirroring): RAID 1 writes an identical copy of the data to two or more disks. If one disk fails, the data can still be read from a surviving copy. The price of this full redundancy is that the usable capacity equals that of a single disk.

  • RAID 2 (Error-Correction): This level stripes data at the bit level and stores error-correcting codes (Hamming codes) on dedicated disks. While it offers high data integrity, RAID 2 is rarely used due to its complexity and high cost.

  • RAID 3 (Byte-level Striping with Parity): In RAID 3, data is striped across disks at the byte level, and parity information is stored on a separate, dedicated disk. This setup allows recovery from a single disk failure.

  • RAID 4 (Block-level Striping with Parity): Similar to RAID 3, but data is striped at the block level rather than the byte level. RAID 4 provides good read performance, but every write must also update the single dedicated parity disk, which can become a bottleneck.

  • RAID 5 (Distributed Parity): RAID 5 distributes both data and parity information across all disks in the array, which removes the dedicated parity disk bottleneck of RAID 4. It tolerates the failure of any single disk while using only one disk's worth of capacity for parity.

  • RAID 6 (Dual Parity): RAID 6 extends RAID 5 by adding a second, independent parity block per stripe, so the array can survive the simultaneous failure of up to two disks.
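Two of the ideas above, striping and parity-based recovery, can be sketched in a few lines. This is an illustration under simplifying assumptions, not a RAID implementation: the function names `stripe` and `xor_blocks` are hypothetical, blocks are small in-memory byte strings of equal size, and only the single XOR parity used by RAID 3, 4, and 5 is shown (the second parity of RAID 6 requires a different code).

```python
from functools import reduce


def stripe(blocks, disk_count):
    """RAID 0 style placement: block i goes to disk i % disk_count."""
    disks = [[] for _ in range(disk_count)]
    for i, block in enumerate(blocks):
        disks[i % disk_count].append(block)
    return disks


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Striping four blocks over two disks.
print(stripe([b"b0", b"b1", b"b2", b"b3"], 2))

# Parity over one stripe of three data blocks.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second block and rebuilding it from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields the missing block, which is why a single parity block is enough to recover from one disk failure.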


Conclusion

The storage systems in DBMS play a crucial role in ensuring efficient data handling, from rapid processing in primary storage to long-term backup in tertiary storage. The hierarchy of storage devices, combined with RAID technology, ensures that database systems can offer high performance, reliability, and scalability, while safeguarding against data loss. Understanding these storage systems helps optimize both the design and the functioning of modern database systems.

