Skouperd
08-12-2011, 12:48 PM
Since Onyx helped me with the most difficult sections in the article, I figured I may just as well post it for future reference... Thank you for your help Onyx.
BTW, this is also posted on my blog, here: http://blog.skoups.com/?p=236
Introduction
Redundant Array of Inexpensive Disks, better known as RAID, has been in existence for several years. The reasons why most people would consider using RAID are that they need faster data transfer speed from the drives, require redundancy, need a bigger single drive, or, in most instances, a combination of these reasons.
The first section of this document will deal with explaining what factors impact the speed of mechanical hard drives. At the end of the section, you should know what to look for when purchasing a new hard drive and what to avoid. The second section will cover the popular RAID formats as well as discussing the pros and cons of each. The final section covers nested RAID.
Rotational based hard drives
Before we start delving into RAID and more specifically the read/write speed, it is imperative that we review the inner workings of spinning (or mechanical) hard drives. The basic mechanical concept of the magnetic hard drive has not changed for several decades. The concept still relies on a spinning platter that can hold a magnetic charge, with a magnetic head to read and write bits (0 or 1) to and from the platter. By changing the number of platters, the hard drive manufacturers could tailor their products for more energy conservation or more performance. A typical hard drive usually contains several platters.
The first drives weighed several tons and stored only a couple of megabytes. Today, the biggest single drive amounts to 3TB (or 3,000,000,000,000 bytes). The actual space available on a 3TB drive, however, only amounts to about 2.7TiB in the operating system. This difference boils down to the way hard drive manufacturers use steps of 1000 to differentiate between kilo, mega, giga and tera, whereas most operating systems use 1024 for each step. The correct way to differentiate between the two formats is to use the terms terabyte (TB) and tebibyte (TiB). The latter term refers to the binary number typically associated with the space available in the operating system. In other words, the drive is actually a 3TB drive but it only has about 2.7TiB of space in the operating system.
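As a quick check of the TB versus TiB arithmetic above, here is a minimal Python sketch. The function name tb_to_tib is made up for this illustration.

```python
def tb_to_tib(terabytes):
    """Convert decimal terabytes (1000^4 bytes) to binary tebibytes (1024^4 bytes)."""
    size_in_bytes = terabytes * 1000 ** 4
    return size_in_bytes / 1024 ** 4

# A drive sold as 3TB shows up as roughly 2.7TiB in the operating system.
print(round(tb_to_tib(3), 2))
```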
Key things that differentiate hard drives, apart from the size, are the following:
1. Rotation speed
2. Speed variations on the same drive (read/write speed at the beginning and end of the drive)
3. Consecutive read / write speed
4. Random read / write speed
5. Seek time
Rotation Speed
The rotation speed indicates the speed at which the platters spin. A typical laptop drive spins at 5400RPM, while a desktop SATA drive spins at 7200RPM. SAS / SCSI and high end SATA drives spin at 10,000RPM, with the most expensive SAS and SCSI drives peaking at 15,000RPM. The faster the drive spins, the quicker the magnetic head will reach a specific section on the platter. The slower a drive spins, the less energy it will consume, the less heat it will produce, and the less noisy it will be. It is thus important to realize and appreciate the purpose for which you buy a hard drive. Also, when we eventually get to describing various RAID functions, keep in mind that the optimal performance of a RAID array is achieved when all the drives spin at the same speed.
Speed variations on the same drive
Depending on where the data is read from on the platter, you will observe different transfer speeds. One way to visualise this is to think of a spinning merry-go-round. If the merry-go-round spins at 100 revolutions per minute, the people the furthest from the centre will travel faster (in terms of metres per second) than the people closer to the centre point. Hard drives work the same way: the further out from the centre you are, the quicker the hard drive will be able to read and process the data, because the platter underneath moves so much quicker. In practice, hard drives will start filling up from the outside towards the centre. CDs and DVDs start from the inside and write / read towards the outside.
Now let us make a couple of assumptions for me to demonstrate the mathematics behind a hard drive:
1. Our drive spins at 7200 rotations per minute.
2. Our drive (supposedly a normal 3.5” drive), has a platter with a diameter of 3.4” or 86 millimetres.
3. The innermost tracks where data can be stored are located 10mm from the centre of the platter.
4. The outer most tracks where data could be stored are located 43mm from the centre of the platter.
5. For this example, let us just assume the sectors are packed optimally and equally dense everywhere.
6. Our final assumption is that the picture below represents the hard drive platter, and that the green line represents the innermost tracks, and the red line the outermost tracks.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-HDD-Platter-001.jpg
One can calculate the speed at which the innermost circle travels by taking the circumference of the green circle and multiplying that by the number of rotations the drive makes in a minute. We know that the drive makes 7200 rotations per minute and we can calculate the circumference by using the formula circumference = 2 x radius x Pi which, if we substitute the numbers, results in c = 2 x 10mm x 3.14159265 = 62.831853mm. So the speed at which the green circle travels will be 62.831853mm, multiplied by 7200RPM, multiplied by 60 (converting from per minute to per hour). This gives 27,143,360mm per hour, which tells us that the green circle is traveling at 27.14km/h.
Doing the same calculation on the red circle produces the following: 2 x 43mm (Radius) x 3.14159265 (Pi) x 7200 (RPM) x 60 (min to hour) = 116.72km/h. You can therefore clearly see that the actual distance that is covered between the inside and the outside of the platter is vastly different.
Just as a theoretical mind tease, the outer track of a 15,000RPM drive will travel at a speed of about 243.16km/h. This raw speed is partly the reason why rotational hard drives are reaching their limit in terms of how fast they can transfer data. The only place where rotational drives are now gaining is in how densely they can pack the data, so that each millimetre of platter can now hold a lot more data. That is the reason why you will now find large drives spinning at only 5400RPM outperforming older drives spinning at 10,000RPM.
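The track speed figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, using the same assumed radii (10mm and 43mm); the function name track_speed_kmh is made up for this illustration.

```python
import math

def track_speed_kmh(radius_mm, rpm):
    """Linear speed of a track at the given radius, in km/h."""
    circumference_mm = 2 * math.pi * radius_mm
    mm_per_hour = circumference_mm * rpm * 60   # rotations per minute -> per hour
    return mm_per_hour / 1_000_000              # 1 km = 1,000,000 mm

print(round(track_speed_kmh(10, 7200), 2))    # innermost track, 7200RPM
print(round(track_speed_kmh(43, 7200), 2))    # outermost track, 7200RPM
print(round(track_speed_kmh(43, 15000), 2))   # outermost track, 15,000RPM
```

With the assumed radii, the outermost track moves 4.3 times faster than the innermost one, whatever the rotation speed.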
Consecutive read/write speed
Consecutive read/write speed looks at the rate at which a hard drive reads and writes bits that are placed next to each other on the platter. Given the numbers I quoted above, this figure could differ quite substantially based on where the data is located on the platter, and therefore it is always important to consider the fastest, slowest and average consecutive read/write speeds and, if possible, the standard deviation. Good luck finding a standard deviation from the sales agent! Then again, it is quite cool to watch their expression when you do ask for it…
Random read/write speed
Random read/write looks at read/write speeds when the data you are looking for is located at different locations on the platter. The poor hard drive has to move the head and wait for the platter to rotate into position for each piece of data it is trying to read; compare this to consecutive read/writes, where the data sits in bits next to each other and is accessible with a single rotation. To ease this problem, hard drives brought in NCQ (native command queuing), which lets the drive service outstanding requests in the fastest possible order from the platter and then reorganize the results in the hard drive’s cache before sending them on to the host. This is one of the reasons why hard drives have some cache as well.
Seek time
Seek time refers to how long the head takes to move to the track that holds the requested data; the worst case is a move from the beginning of the drive to the end. The lower the seek time on a hard drive, the faster the random read/writes will be as well.
RAID Basics
Ok, now that we understand the basics of a hard drive, let us consider RAID. As mentioned before, RAID (or Redundant Array of Inexpensive Disks) has been around for decades. Originally the main objective of RAID was not so much to increase speed, but to improve reliability. In other words, if my drive head crashes, or my platter fails, then I do not want to lose all the data. There are various combinations of RAID and each motherboard manufacturer, OEM and software designer has his own specific way of implementing each combination. I will discuss the most obvious ones, i.e. RAID 0, RAID 1, RAID 5 and RAID 6, before moving on to nested RAID solutions such as RAID 10, RAID0+1 and RAID 50, and then finally also discuss Matrix RAID and JBOD.
By the way, the read/write speeds calculated below are best-case scenarios; in practice, you have to allow for a little overhead from the controller.
RAID 0
RAID 0, also known as striping, was not on the original list of specifications for the various RAID implementations. This is because, technically, it is not redundant at all. The example below explains how a RAID 0 operates.
Assume we have two drives in a RAID 0 array and want to write the following text “Hello_world!.” The following will occur: the controller will split the data across both drives, first writing to the one drive, then to the second, then back on the first and so forth until it is finished. Assuming that the controller is set to write two letters at a time to a single drive, the letters “He” will be written to the first drive while at the same time, “ll” is written to the second drive. Then “o_” and “wo” will be written to the two drives (again at the same time), followed by one more write of “rl” to the first and “d!” to the second drive.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-0-data-striped.jpg
Because you are writing to both drives simultaneously, the speed at which you write data to the drive is equal to the number of drives in the array, multiplied by the slowest drive. In our example, assume the slowest drive can write 100 letters per second. This means the two drives combined will be able to write 200 letters per second. When reading data off the drive, the same principle applies. In other words, the read speed is as fast as the slowest of the drives, multiplied by the number of drives in the array.
The space you obtain from a RAID 0 configuration is the number of drives in the array multiplied by the smallest drive. If you have four drives and they are all 100GB in size, you will have a 400GB array. The benefit of a RAID 0 array is speed, both from a reading and from a writing perspective. The downside of a RAID 0 is that should any one of the hard drives in the array crash, all the data in that array will be lost. In our example above, if the second drive crashes, there is no way to determine or “guess” that the missing letters on the drive were “ll”, “wo”, and “d!”.
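The striping of “Hello_world!” described above can be mimicked with a short Python sketch. It only models which chunk lands on which drive; it is not how any particular controller is implemented, and the function names stripe and unstripe are made up for this illustration.

```python
def stripe(data, drives=2, chunk=2):
    """Split data into chunks and deal them out to the drives in turn."""
    layout = [[] for _ in range(drives)]
    for i in range(0, len(data), chunk):
        layout[(i // chunk) % drives].append(data[i:i + chunk])
    return layout

def unstripe(layout):
    """Read the chunks back in the order they were written."""
    chunks = []
    for i in range(max(len(drive) for drive in layout)):
        for drive in layout:
            if i < len(drive):
                chunks.append(drive[i])
    return "".join(chunks)

layout = stripe("Hello_world!")
print(layout[0])          # chunks on the first drive
print(layout[1])          # chunks on the second drive
print(unstripe(layout))   # the original text again
```

Dropping either list from layout leaves only every second chunk, which is why a single failed drive takes the whole RAID 0 array with it.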
I will only recommend using a RAID 0 if you have no concern whatsoever about losing the data, so typically a RAID 0 array is for somebody who requires speed, and lots of it. For instance, your game installation folder will be a good example of where you can use a RAID 0. If one of the drives fails, big deal, you just reinstall the game... I would not install my saved games on that RAID 0 array though. Another use for RAID 0 is as a temporary workspace, which allows you to have the source data located on a different redundant array, but all the processing and data manipulations occur on this RAID 0 array. If the array fails, then you still have the source data and can just redo the manipulations.
RAID1
In RAID 1 the data on one drive is an exact replica of the data on a second drive, which is why it is also known as mirroring. Again, let’s assume we want to write “Hello_World!” to the RAID 1 array. The controller writes the data “Hello_World!” to both the first and the second drive simultaneously. If the one drive crashes, all the data is still available on the second drive. This solution therefore provides you with full redundancy.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-1-data-mirrored.jpg
Since the controller feeds the same data to both of the drives, the write speed of a RAID 1 is equal to the slowest drive in the array. The read speed, however, is equal to the slowest drive, multiplied by the number of drives in the array. This is because you can read from both drives simultaneously - at least in theory. Few controllers implement RAID 1 correctly, so the read speed is usually slower.
The main purpose of RAID 1 is not speed, but redundancy. The size of the array is equal to the size of the smallest drive in the array. In other words, due to data replication, two drives of 100GB each will still only result in a 100GB RAID 1 array. From a monetary perspective, Rand per GB, this is the most expensive setup but it does provide very good redundancy.
A typical use of RAID 1 in a home environment is photo and document storage. In a corporate environment, you may find the operating system and some databases such as e-mail, installed on a RAID 1 array.
RAID 5
RAID 5 uses distributed parity calculations to reconstruct data in the event of a drive failure. It requires a minimum of three drives, because it uses a form of striping (RAID 0) that, for each stripe, writes data on two of the three drives and writes a parity calculation value on the third. A real RAID 5 rotates the parity from drive to drive; to keep the example simple, I will keep all the parity on the third drive. In other words, if either drive 1 or drive 2 fails, then you will be able to reconstruct what was on the failed drive using the parity information from the third drive.
Writing our “Hello_World!” to a RAID 5 array will do the following (we’re writing one character per drive at a time):
Data written to drive 1 will be:
“H l o W r d”
Data written to drive 2 will be:
“e l _ o l !”
Drive 3 will hold the parity information.
In hexadecimal: “2D 00 30 38 1E 45”
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-5-hello-world-text.jpg
RAID 5 Parity
Because the mathematics behind the parity calculations on RAID 5 is rarely explained, let us do so now. The secret to parity calculation is an “Exclusive OR”, also known as “XOR” calculation. XOR calculation states the following:
0 xor 0 = 0
0 xor 1 = 1
1 xor 0 = 1
1 xor 1 = 0
In English, when the two values are the same, the answer is “0”, otherwise it is “1”.
When writing “Hello_World!” to a RAID 5 drive the controller will convert each letter into a binary number, so in our example “Hello_World!” will be converted to:
H = 01001000
e = 01100101
l = 01101100
l = 01101100
o = 01101111
_ = 01011111
W = 01010111
o = 01101111
r = 01110010
l = 01101100
d = 01100100
! = 00100001
If we look at the data that we wrote to the two drives (in binary format this time), then drive 1 will contain the data 01001000 01101100 01101111 01010111 01110010 01100100, and drive 2 will contain 01100101 01101100 01011111 01101111 01101100 00100001.
Using the XOR calculation, the parity string would be 00101101 00000000 00110000 00111000 00011110 01000101.
Put all three underneath each other:
D1: 01001000 01101100 01101111 01010111 01110010 01100100
D2: 01100101 01101100 01011111 01101111 01101100 00100001
P1: 00101101 00000000 00110000 00111000 00011110 01000101
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-5-hello-world-binary.jpg
Now we can see exactly how the parity calculation worked.
If any one of the three drives is removed, you can recalculate the missing values from the remaining two drives with the same XOR function.
If we destroy D1, the data will look like this:
D1: …….. …….. …….. …….. …….. ……..
D2: 01100101 01101100 01011111 01101111 01101100 00100001
P1: 00101101 00000000 00110000 00111000 00011110 01000101
Using the remaining two drives, XOR-ing each bit from D2 with the corresponding bit on P1, we can rebuild D1’s data.
D1: 01001000 …….. …….. …….. …….. ……..
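The three-drive example above is small enough to try in Python. The sketch below follows the same simplification as the text (all parity on one drive) and is not meant to mirror a real controller; the helper name xor_bytes is made up for this illustration.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings, one byte at a time."""
    return bytes(x ^ y for x, y in zip(a, b))

text = b"Hello_World!"
d1 = text[0::2]               # H l o W r d
d2 = text[1::2]               # e l _ o l !
parity = xor_bytes(d1, d2)

print(" ".join(f"{p:02X}" for p in parity))   # parity drive in hexadecimal
print(" ".join(f"{p:08b}" for p in parity))   # parity drive in binary

# Pretend D1 is destroyed: XOR what is left (D2 and parity) to get it back.
rebuilt_d1 = xor_bytes(d2, parity)
print(rebuilt_d1)
```

The same call rebuilds D2 from D1 and the parity, because XOR does not care which of the three values is the missing one.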
RAID 5 may extend to more than 3 drives, but the process remains the same. Assume we have 5 drives in a RAID 5 configuration then the data will look like this:
D1: 01001000 01101111 01110010
D2: 01100101 01011111 01101100
D3: 01101100 01010111 01100100
D4: 01101100 01101111 00100001
P1: 00101101 00001000 01011011
The way that XOR works now is (((D1 XOR D2) XOR D3) XOR D4) = P1.
Looking at the first 8 bits on each drive, this is the process: (((D1 XOR D2) XOR D3) XOR D4) = P1
Written out as: (((01001000 XOR 01100101) XOR 01101100) XOR 01101100) = P1
First, get rid of the first set of brackets by doing the first XOR calculation:
((00101101 XOR 01101100) XOR 01101100) = P1
Getting rid of the second XOR calculation results in:
(01000001 XOR 01101100) = P1
And finally:
00101101 = P1
An easy way to see how XOR works is to think that P1 is the value needed to make the total number of 1’s in that specific column an even number.
You have to agree that mathematics is a cool subject!
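For those who would rather let the computer do the sums, here is a small Python sketch of the five-drive example above (four data drives plus parity). It also checks the "even number of 1's" rule. The layout is the simplified one from the text, and the names are made up for this illustration.

```python
from functools import reduce
from operator import xor

text = b"Hello_World!"
# Deal the characters out over four data drives: D1 = H o r, D2 = e _ l, and so on.
data_drives = [text[i::4] for i in range(4)]

# P1 is the XOR of the bytes that sit in the same position on every data drive.
p1 = bytes(reduce(xor, column) for column in zip(*data_drives))
print(" ".join(f"{p:08b}" for p in p1))

# Even-parity check: XOR-ing every drive including P1 gives all zeroes,
# which is the same as saying each bit column holds an even number of 1's.
check = bytes(reduce(xor, column) for column in zip(*data_drives, p1))
print(check == bytes(len(p1)))
```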
Performance
The biggest problem with RAID 5 is the speed of the parity calculation. For that reason, RAID 5 generally does not lend itself to statements like “twice the speed of the slowest drive”. It really boils down to how fast the RAID controller can do the calculation. Generally, if you have a superfast RAID controller, then the theoretical speed of a RAID 5 array will be the speed of the slowest drive, multiplied by the number of drives in the array less one.
RAID 5 is also the cheapest form of having redundant data (per GB) since the size of a RAID 5 partition is the equivalent of the number of drives in the array, minus 1 for parity, multiplied by the smallest drive in the array. In other words, if you have four drives of 100GB each, then the formula is (4 – 1) x 100GB = 300GB. The weakest point in a RAID 5 array is that if one drive fails, the array cannot survive a further failure among the remaining drives until it has been rebuilt. In other words, as long as you only lose 1 drive, you should be fine.
Since RAID 5 is the cheapest form of redundancy (price per GB), it is typically used in arrays where the objective is more space, with some redundancy added for good measure. Home users might use RAID 5 for their movie collections, or even ISO images. Corporations use RAID 5 for their file servers and, with high-end RAID controllers, even for databases.
RAID 5 has always been one of those “magic” solutions. Magic however only remains magic until one understands the inner workings. I trust that this little piece of information helped remove some of the mystique associated with RAID 5. Just a warning with regard to cheap RAID controllers: they generally cannot do RAID 5 calculations in hardware and pass the calculations back to the operating system and/or the CPU. If you do intend to use RAID 5, ensure that you do research before going out to buy a RAID controller or else you will get frustrated with the slow speed of the array.
RAID 6
This brings us to RAID 6. The concept of RAID 6 is effectively a RAID 5 on steroids. Whereas RAID 5 sets aside one drive's worth of space for parity, RAID 6 uses a double distributed parity algorithm and sets aside two. Users have found that, too often, when one drive crashes and is replaced, another one tends to go very quickly, whether because the drives are all approaching their mean time between failures (MTBF) together or because the technician unplugs the wrong drive. RAID 6 introduces a second drive for parity, thus allowing you to lose two drives before you lose data. The space calculation is the total number of drives minus two drives for parity, multiplied by the smallest drive.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-dual-parity.jpg
Mathematics of RAID 6
RAID 5 uses XOR Boolean calculations whereas the normal RAID 6 mostly uses Reed-Solomon coding. I say mostly because RAID 6 is defined by the Storage Networking Industry Association as “any form of RAID that can continue to execute read and write requests to all of a RAID array’s virtual disks in the presence of any two concurrent disk failures.” This definition resulted in various vendors implementing different solutions to achieve RAID 6 compatibility. The method implemented by most vendors is the Reed-Solomon error correction method, or variations thereof. Reed-Solomon coding uses Galois field (finite field) mathematics. The mathematics behind the proper implementation is a bit scary as it relies on polynomials, matrix multiplication and linear feedback shifts.
Doing some research into the above mathematics made the implementation thereof even more confusing. So if somebody has a simplified way to explain exactly how RS works in a RAID 6 environment, feel free to let me know. (I will give you credit if I end up using it). Explaining the mathematics of a Reed-Solomon implementation as such will have to wait for the time being. We may call a spade a shovel, but at the end of the day, the use of it is to bury some bodies. Herewith is my view on how RAID 6 works.
Assume you have six drives (the minimum number for RAID 6 is four); then four of the six drives will be data drives (marked as D1 to D4 below) and two will be parity drives (marked P1 and P2). The first parity drive (P1) will use a row level XOR calculation, similar to that explained in RAID 5, across the four data drives. For now, let us just ignore the second, diagonal parity calculation.
The first row contains data (a1, b1, c1, and d1). A row level XOR calculation is performed, similar to that of RAID 5 and the value is stored in P1 (marked as r1). Continuing with the process, after 4 rows of writing data, the array will look something like this. You need to write the same number of rows as the number of data drives you have in the RAID 6 array.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-001.jpg
Now, in order for us to do the parity 2 diagonal XOR calculation, we need to generate a linear shift in the array as is marked by the different colours below.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-002.jpg
Just to confirm: just because the cells refer to a1 in the different rows does not mean the data is replicated in each cell. It merely refers to the fact that those cells marked a1 (and the same colour) are used together in the diagonal XOR calculation. For clarity, in the next array, I have renamed r1, r2, r3 and r4 so that they match the different diagonal colouring stripes as well. The data in them still remains the row-XOR calculation. If we then calculate the diagonal XOR and store it in P2, then we end up with the following array:
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-003.jpg
The value for the first row diagonal XOR is the calculation of XOR for all the orange cells (b1 cells). The value of the second row will be the XOR calculation of all the blue cells (c1), row 3 of all the purple cells (d1) and row 4 of all the yellow cells (e1). Those astute readers amongst you will note that I am neither performing nor storing an XOR calculation on the cells marked (a1).
That is because it is not necessary, as I will show later on.
If we continue with the above trend, then the final array will look like this:
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-004.jpg
Ok, this all looks pretty and nice, but let us see how we can recover data if we lose various combinations of drives.
RAID 6 - Recover from two data drive failures
Assume for starters that we have lost Data Drive 1 (d1) and Data Drive 2 (d2). (The same logic will hold true if you lose any two data drives.) Losing those two drives results in the following array:
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-006.jpg
As with a Sudoku puzzle or breaking an encryption algorithm, you need to find that first weak point before solving it. In the above scenario, we are unable to use the row-level parity drive P1 since we are missing two sets of data. However, we can use the diagonal parity XOR calculation since we have three of the four data cells of the orange diagonal (b1). Solving b1, (row 1 disk 2) we now have three of the four bits to use with the row parity P1 drive to solve row 1, disk 1.
The array now looks like this:
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-007.jpg
It would have been easy to follow the same pattern we used for row 1. However, remember I said we do not need to store an XOR value for the white diagonal (a1 data): we can still solve disk 1, row 4 using the blue diagonal parity (c1). Once d1, r4 is solved, the row parity is used to solve d2, r4.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-008.jpg
The image below indicates the order in which the various question marks are solved.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-009.jpg
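For anybody who wants to play with the row plus diagonal idea, here is a rough Python sketch. Since the colour tables are images, the exact diagonal layout is an assumption on my side: I number the columns D1 to D4 and P1 as 0 to 4 and put a cell on diagonal (column - row) mod 5, which agrees with the cells named in the text (b1 at row 1 disk 2, c1 at row 4 disk 1). Diagonal a (number 0) is not stored, as described above. All function names are made up, and this is not Reed-Solomon.

```python
from functools import reduce
from operator import xor

ROWS, COLS = 4, 5   # four rows; columns are D1..D4 plus the row parity drive P1

def diagonal(row, col):
    """Assumed layout: 0 = a (not stored), 1 = b, 2 = c, 3 = d, 4 = e."""
    return (col - row) % COLS

def diagonal_cells(d):
    return [(r, c) for r in range(ROWS) for c in range(COLS) if diagonal(r, c) == d]

def build(data):
    """data is 4 rows of 4 values. Returns the grid with P1 appended, and P2."""
    grid = [row + [reduce(xor, row)] for row in data]
    p2 = [reduce(xor, (grid[r][c] for r, c in diagonal_cells(d)))
          for d in range(1, COLS)]
    return grid, p2

def recover(grid, p2):
    """Fill in cells set to None, using any row or stored diagonal
    that has exactly one unknown cell, until nothing more can be solved."""
    # Each group is (cells, value their XOR must equal); a full row XORs to 0.
    groups = [([(r, c) for c in range(COLS)], 0) for r in range(ROWS)]
    groups += [(diagonal_cells(d), p2[d - 1]) for d in range(1, COLS)]
    progress = True
    while progress:
        progress = False
        for cells, target in groups:
            missing = [(r, c) for r, c in cells if grid[r][c] is None]
            if len(missing) == 1:
                r, c = missing[0]
                known = [grid[i][j] for i, j in cells if (i, j) != (r, c)]
                grid[r][c] = reduce(xor, known, target)
                progress = True
    return grid

data = [list(b"Hell"), list(b"o_Wo"), list(b"rld!"), list(b"RAID")]
grid, p2 = build(data)

# Lose D1 and D2 (columns 0 and 1), then solve the question marks.
broken = [[None if c in (0, 1) else v for c, v in enumerate(row)] for row in grid]
print(recover(broken, p2) == grid)
```

The loop does not follow the exact solving order from the picture; it simply keeps picking any row or diagonal with one question mark left, which ends up at the same answer.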
The above showed us how to restore a two data drive failure, but what happens if a data drive and perhaps the row-XOR drive fails? Here are just some of the solutions:
RAID 6 - Recover from a row parity drive and a data drive failure
In the following case, the data drive D2 and the row parity drive, P1 failed leaving our array in the following condition:
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-010.jpg
In order to solve the above broken array, you first need to solve the row 3, parity 1 data by using the blue diagonal (c1) XOR calculation stored in r2 p2. Once that is solved, we can use the value from r3 p1 to solve r3 d2, and so the process continues. The image below shows the order in which to recover the whole drive.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-011.jpg
RAID 6 - Recover from a diagonal parity drive and a data drive failure
Ok, now that we can recover from a data drive and the row parity, what about a data drive and the diagonal parity?
The broken array looks like this:
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-012.jpg
One can do it the complicated way or the easy way, which is to recover the data drive using the row-parity drive and then just recalculate (from scratch) the diagonal parity drive.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-013.jpg
RAID 6 - Recover from both parity drives failing
Ok, smarty pants. What happens if both parity drives fail? The solution is just to go and recalculate both the drives from scratch. In other words, not recover them, just recalculate from scratch.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-014.jpg
RAID 6 - Conclusion
Now the above method is probably not the most efficient method to do the calculations, according to the clever people that is, but in my view this is compliant with the definition of RAID 6 and I am sure, somewhere along the lines, somebody has implemented it.
Having said the above, I am curious to learn from anybody who can actually explain, without using any mathematical equations, signs or symbols that are not covered by basic algebra, how a real RAID 6 Reed-Solomon implementation works in practice.
To summarise, the benefit of a RAID 6 is that you could lose any two drives and the array will continue to function. Having 6x3TB drives will yield an effective capacity of 12TB. This is on par with a RAID 50 solution but with the benefit that any two drives could be lost. RAID 50 survives only specific combinations of a 2 drive failure. The only other benefit of going with RAID 6 over RAID 50 is that you could use an odd number of drives in a RAID 6 configuration, whereas for RAID 50 you need an even number.
RAID 6 is typically used where one has a very large number of drives and/or a big size array. I would say that as soon as you start exceeding a couple of Terabytes, or more than six drives in a single array, then it is time to start considering RAID 6 as a solution, depending on the importance of the data.
As individual drive space increases and the size of individual arrays increases with it, the time needed to rebuild a RAID 5 grows, and with it the risk of losing that second drive. RAID 6 is not something new, but only recently has the drop in the price of hard drives, together with the increase in processing power, made it a more feasible solution. Just be careful though, only new RAID controllers will support RAID 6 in hardware mode.
Nested Raids
The next set of RAID arrays is known as nested RAID. This is where you combine the single level RAID formats above with each other. Any RAID that is denominated with two numbers is effectively two of the aforementioned kinds of RAID set up to work as a single array. The first number denotes the configuration of the drives within each individual array. The second number denotes how those individual arrays are configured in relation to each other.
RAID 50, for instance, could have two RAID 5 arrays (say four drives per array), and these two RAID 5 arrays are then striped with a RAID 0. Once the single level RAIDs are understood, nested RAIDs are very easy to follow.
RAID0+1
The first nested array that I want to deal with is RAID0+1. Assume we have eight drives then a RAID0+1 will be:
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-0+1.jpg
We first create two RAID 0 arrays and then we mirror them, most likely in the operating system. You will not find RAID0+1 in enterprise settings and that is because it is a very risky affair.
RAID 10 is a much safer method, as I will explain later.
The biggest risk with a RAID 0+1 is that if any single drive fails, then all your data becomes reliant on another high-risk array. Therefore, losing one drive from each of the two arrays means a complete loss of data. The key benefit of RAID0+1 is that it is a cheap-man’s solution to speed, online capacity expansion and using several different RAID controllers on the same array. Allow me to explain.
Most motherboards will allow you to create a RAID 0 array. Let us assume the motherboard has eight SATA ports. For argument's sake, 1 is used for the OS drive and 1 is used for the optical drive, thus leaving you six ports for your data array. You use 6x1TB drives for the array. So, create two RAID 0 arrays of 3TB each using the motherboard’s on-board RAID controller. In Windows Disk Management you will see two individual arrays, each being 3TB in size. Mirror the two arrays and you end up with a 3TB logical drive in Windows that is redundant. Any one of the arrays could crash and you will not lose your data. This not only provides you with a form of redundancy but, because the RAID 5 controllers that get shipped with most motherboards perform very poorly, it also provides you with a lot of speed by comparison.
So, how about that online capacity expansion on RAID0+1? Now assume that you need to increase the capacity of your array and that you do not have money for a proper RAID controller. You could do the following. Purchase any cheap SATA controller that can do RAID 0 and buy as many drives as you can afford. (The capacity of the new drives does not need to be the same as the original ones, as long as the total space of the two arrays is the same.) For our example, we will be buying two 1TB drives. Remember, we have six 1TB drives already.
In Windows, delete the mirror array. This will leave you with two arrays (each 3TB) that contain the exact same information. Remove 2 hard drives from any one of the arrays on the motherboard and install them with the two new drives on the SATA controller. You now have 4 drives on the SATA controller and 4 data drives on the motherboard. Create a new 4TB array on the SATA controller (RAID 0). In Windows you still have an array that contains all the data that was on the original 3TB array. Copy the data onto the new 4TB array. Once the copy is finished, destroy the 3TB array and recreate the array on the motherboard, but this time using all four drives. (This will produce a 4TB array from the motherboard.) Back in Windows, now just create a mirror (RAID 1) between the two 4TB drives.
The above is the cheap-man’s method to do Online Capacity Expansion. The biggest problem is that while you increase the arrays, your data is very vulnerable, and if anything happens to that single array while you copy the data from the one to the other, you could lose data. Most high-end RAID controllers can do RAID 10, but only a few RAID controllers will do a RAID0+1. The reason is that with most high-end RAID controllers, the key advantage of RAID0+1 is nullified by their built-in “Online Capacity Expansion Technology” or OCE. OCE allows the user to add more drives into the array without deleting the existing data that is stored on the array.
Your cheap motherboard RAID controllers do not support that feature which is where RAID0+1 comes in handy.
Negatives of RAID0+1? As mentioned, redundancy is a problem. The other problem is that should you lose a single drive, then in order to rebuild, all the data in the total array needs to be rebuilt. Assume you have 8 drives in RAID0+1 (4 per array). Then instead of just recreating the failed drive, it will recreate 4 drives. This increases the time before the array is back to optimal health and increases the risk of losing a second drive.
Having said all the above, I have used RAID0+1 for many years on my own setup and never once did I lose data due to drive failures. (Due to stupidity yes, but not due to drive failures.) It gave me the benefit of building up my drives and capacity as I needed it. At some stage, it becomes more economical to buy a proper RAID controller and to convert to a RAID 5 configuration. The cost of the additional space offsets the cost of the RAID controller. For instance, the space available from 12x1TB drives is as follows:
RAID 0+1 = 6TB
RAID 5 = 11TB
RAID 6 = 10TB
In other words, instead of trying to increase a RAID0+1 to 10TB, (requiring 8 new drives) it is cheaper to buy a real RAID controller and use your existing drives.
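The capacity figures above can be reproduced with a few lines of Python. This is only a sketch under idealised assumptions (equal-sized drives, no controller or file system overhead); the function names usable_tb and drives_for_mirror are my own and not part of any RAID tool.

```python
import math

def usable_tb(level, drives, size_tb=1):
    """Usable capacity in TB for equal-sized drives (idealised)."""
    if level in ("0+1", "10"):
        return (drives // 2) * size_tb   # half the drives hold the mirror copy
    if level == "5":
        return (drives - 1) * size_tb    # one drive's worth of parity
    if level == "6":
        return (drives - 2) * size_tb    # two drives' worth of parity
    raise ValueError(f"unknown RAID level: {level}")

def drives_for_mirror(target_tb, size_tb=1):
    """Number of drives a RAID 0+1 needs to offer target_tb of usable space."""
    return 2 * math.ceil(target_tb / size_tb)

for level in ("0+1", "5", "6"):
    print(f"RAID {level}: {usable_tb(level, 12)}TB")

# Growing the RAID 0+1 to 10TB when you already own 12 drives.
print("extra drives needed:", drives_for_mirror(10) - 12)
```

With twelve 1TB drives this gives 6TB, 11TB and 10TB, and shows the eight extra drives a 10TB RAID0+1 would need.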
Bottom line, having RAID 0 arrays on the bottom provides you with flexibility and creativity that you generally don’t have with other RAID formats, especially if the topmost RAID is done in the operating system. For example, Matrix RAID is quite easy when the bottom array is RAID 0. Using 3 independent RAID 0 arrays, you can create a RAID0+1 partition for your photos and important stuff, a RAID0+0 for game installations and a RAID0+5 for your semi important files. If a single drive fails, then you will lose all the data on the RAID0+0 configuration but you will not lose your photos nor any of your semi important stuff.
RAID10
RAID10 uses multiple RAID 1 arrays and then stripes them using RAID 0. See the example below.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-10.jpg
As opposed to RAID0+1, as long as you don’t lose two drives from the same 2 drive array, you are fine. You can lose several drives in this setup without incurring any data loss. If a drive fails, then only that drive’s array is rebuilt, so the rebuilding process is also a lot faster than RAID 5 or RAID0+1.
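To put numbers to that difference, here is a small brute-force sketch that tries every possible pair of failed drives. It assumes eight drives, with RAID0+1 built as two stripes of four and RAID10 as four mirrored pairs; the drive numbering and function names are purely illustrative.

```python
from itertools import combinations

DRIVES = 8

def raid01_survives(failed):
    """Two 4-drive stripes mirrored: alive while one stripe is untouched."""
    half = DRIVES // 2
    first_stripe_ok = all(d >= half for d in failed)   # no failure in drives 0-3
    second_stripe_ok = all(d < half for d in failed)   # no failure in drives 4-7
    return first_stripe_ok or second_stripe_ok

def raid10_survives(failed):
    """Drives 2k and 2k+1 form a mirrored pair: dead only if a whole pair dies."""
    pairs_hit = {d // 2 for d in failed}
    return len(pairs_hit) == len(failed)

def survivable(check, failures):
    """Count the failure combinations that the given layout survives."""
    combos = list(combinations(range(DRIVES), failures))
    return sum(check(c) for c in combos), len(combos)

print("RAID0+1:", survivable(raid01_survives, 2))
print("RAID10: ", survivable(raid10_survives, 2))
```

Under these assumptions RAID0+1 survives 12 of the 28 possible two-drive failures, while RAID10 survives 24 of them.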
As such, you will often see RAID10 used in a corporate environment where both speed and redundancy are more important than the cost of the drives. RAID10 (and RAID0+1) is quite expensive in that you are only able to use 50% of the available space for your data.
RAID50
We have covered RAID 5 in detail. Assume we have 12 x 1TB drives for our RAID50 example. Create two RAID5 arrays of six drives each. The six drives, less the parity drive, times the space of the smallest drive will provide 5TB space for each of the two arrays.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-50.jpg
Now we can stripe the two arrays using RAID0. Thus, you will end up with a 10TB array at the end of the day.
The benefit of this is that you could now lose a single drive from each of the two arrays and still have all your data. The problem is that you will still need to rebuild a 5TB RAID 5 array when you have a drive crash. While the array is rebuilding, that array is vulnerable to data loss in case of another drive failure. If you lose two drives from any single array, then you will also lose all your data.
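The RAID50 arithmetic and its failure rule can be sketched as follows. The helper names are hypothetical, and the sketch assumes equal-sized drives split evenly between the RAID 5 groups.

```python
def raid5_tb(drives, size_tb=1):
    """RAID 5 capacity: all drives less one drive's worth of parity."""
    return (drives - 1) * size_tb

def raid50_tb(total_drives, groups=2, size_tb=1):
    """RAID 50 capacity: a RAID 0 stripe across equal RAID 5 groups."""
    if total_drives % groups:
        raise ValueError("drives must divide evenly between the RAID 5 groups")
    return groups * raid5_tb(total_drives // groups, size_tb)

def raid50_survives(failed_per_group):
    """The stripe survives only while every RAID 5 group has lost at most one drive."""
    return all(failures <= 1 for failures in failed_per_group)

print(raid50_tb(12))            # two groups of six 1TB drives
print(raid50_survives([1, 1]))  # one drive lost in each group
print(raid50_survives([2, 0]))  # two drives lost in the same group
```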
RAID51
I have never seen RAID 51 actually implemented for real. The benefit of RAID51 is super redundancy. Assume we have twelve 1TB drives. RAID 51 needs two RAID 5 arrays, which in our case will be six drives each, and each array will thus produce 5TB. The two 5TB arrays are then mirrored with each other, keeping the total usable space to just 5TB. The benefit of this is that you could lose many drives; in fact, you could lose seven of the twelve drives (best case) and still have all your data available. Even losing three drives, in any combination, will still not result in data loss. The problem is that this kind of redundancy is expensive and as such is very rarely used, if at all.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-51.jpg
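The RAID 51 failure claims are easy to check by brute force. A minimal sketch, assuming drives 0 to 5 form the first RAID 5 array and drives 6 to 11 the second (the numbering is just for illustration):

```python
from itertools import combinations

DRIVES = 12
ARRAY_SIZE = 6

def raid51_survives(failed):
    """A RAID 5 array tolerates one failure; the mirror needs one live array."""
    in_first = sum(1 for d in failed if d < ARRAY_SIZE)
    in_second = len(failed) - in_first
    return in_first <= 1 or in_second <= 1

def always_survives(failures):
    """True if every combination of this many failed drives is survivable."""
    return all(raid51_survives(c) for c in combinations(range(DRIVES), failures))

def can_survive(failures):
    """True if at least one combination of this many failures is survivable."""
    return any(raid51_survives(c) for c in combinations(range(DRIVES), failures))

print(always_survives(3))  # any three drives
print(always_survives(4))  # four drives can already be fatal (two per array)
print(can_survive(7))      # best case: one whole array plus one more drive
print(can_survive(8))
```

Any three failures are survivable, four can be fatal if they fall two per array, and seven is the most you can lose in the best case.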
RAID60
RAID60 consists of bottom arrays of RAID 6, striped for additional performance. RAID 60 is mainly used for when your array becomes quite large. I would say you need 16 or more drives before a setup like this is justified.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-60.jpg
Exotic RAID
The following RAID levels are just for show (ooh, look everybody, I am running RAID 100!). If you are still reading by this stage, the following diagrams should be self-explanatory. The only reason why you would actually want to do something like this is if you have multiple RAID controllers or have multiple enclosures each running their own RAID configuration.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-100.jpg
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-500.jpg
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-600.jpg
Other RAIDs
Until now, we have covered the basics of RAID. The following section covers topics that are generally associated with RAID in one way or another.
JBOD
“Just a bunch of drives”, better known as “JBOD” or “spanning”, is not RAID in the strictest sense. Effectively, the computer will write data to the first drive. When that is full, it will continue to the second and when that is full, to the third and so on.
The most popular use is to get odd drives, say for instance 2x200GB drives and a 600GB drive, into something that is manageable. You could JBOD them and create just one array that is 1TB in size. This array in turn could then be used with another 1TB drive in RAID 0 or RAID 1 for instance.
In theory, if a drive within a JBOD fails, you will only lose the data from that drive. In practice though, if you create a JBOD array in Windows and one drive fails, you will lose all your data. (You could recover it with software but that is a whole new topic.)
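Spanning is simple enough to sketch in a few lines. The function below is a hypothetical illustration that maps a position in the 1TB span from the example (2x200GB plus 600GB) to the drive that holds it; it is not how any particular operating system implements it.

```python
def locate(offset_gb, drive_sizes_gb):
    """Map an offset in the spanned volume to (drive index, offset on that drive)."""
    for index, size in enumerate(drive_sizes_gb):
        if offset_gb < size:
            return index, offset_gb
        offset_gb -= size  # skip past this drive and carry on to the next
    raise ValueError("offset is beyond the end of the span")

span = [200, 200, 600]      # the odd drives from the example, in GB
print(sum(span))            # total size of the span in GB
print(locate(0, span))      # start of the span sits on the first drive
print(locate(250, span))    # 50GB into the second drive
print(locate(999, span))    # near the end of the third drive
```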
MATRIX RAID
Finally, the last kind of RAID I want to discuss today is Matrix RAID. Matrix RAID is patented by Intel and you will therefore only find it on Intel chipset motherboards. What you do with Matrix RAID is apply multiple different RAID levels to a single set of drives. Assume that you have 2x100GB drives. You want both speed and redundancy. Using normal (non-Matrix) RAID you could only have one, either RAID0 or RAID1. With Matrix RAID you could use 50% of each drive in a RAID0 configuration and the other 50% in a RAID1 configuration. Each drive is split into two partitions of 100GB / 2 = 50GB. The RAID1 partition will be (50GB + 50GB) / 2 = 50GB. The RAID 0 partition will be 50GB + 50GB = 100GB. Total usable space = 150GB of 200GB. Redundant space = 50GB. Speed is the fastest on the RAID 0 partition but redundancy is only found on the RAID1 partition. This is a very handy setup for gamers who don’t have big budget systems. Basically, you set up your operating system and install drive on the RAID 0 partition, and you store your save games and photos on the RAID1 partition.
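The Matrix RAID arithmetic above, as a quick sketch. The function name and the raid0_share parameter are my own, and the sketch assumes two equal drives that are each split into one RAID 0 slice and one RAID 1 slice.

```python
def matrix_raid_gb(drive_gb, raid0_share=0.5, drives=2):
    """Return (RAID 0 size, RAID 1 size) in GB for a two-volume Matrix RAID."""
    raid0_slice = drive_gb * raid0_share   # part of each drive used for striping
    raid1_slice = drive_gb - raid0_slice   # remainder of each drive used for mirroring
    raid0 = raid0_slice * drives           # striped: every slice adds capacity
    raid1 = raid1_slice                    # mirrored: only one slice's worth is usable
    return raid0, raid1

raid0, raid1 = matrix_raid_gb(100)
print(f"RAID 0: {raid0:.0f}GB, RAID 1: {raid1:.0f}GB, total: {raid0 + raid1:.0f}GB of 200GB")
```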
Setting up RAID
Ok, now that we have covered the different RAIDs that exist, let’s tackle the next section, which is how to set up RAID. I mentioned before that there are three different ways you could set RAID up.
The first is software RAID, the second is RAID as shipped with the motherboard (also known as FAKE RAID), and the third is a proper RAID controller that you plug into your PCI, or even your PCI Express, slot.
The problem with both FAKE RAID and software RAID is that they use your CPU cycles to perform the calculations, which means that you will require drivers for your RAID and … Expensive RAID controllers ship with a battery backup, a serious amount of on-board cache, as well as their own dedicated CPUs. On-board RAID controllers are not all created equal but, as mentioned, they tend to rely on the main CPU for starters, and also on the bandwidth provided by the south bridge. The 680i chipset from NVidia, for instance, is restricted to about 140MB per second on a single-threaded read/write, which basically bottlenecks the drives in a big RAID0 array.
If you do want to go with RAID and the goal is performance, then I would recommend you consider getting yourself a dedicated RAID controller card rather than relying on just the motherboard’s on-board RAID. The easiest way for me to explain this is: think of your on-board graphics card, now compare that to your dedicated 9600GSO graphics card, then compare that to a GTX570. There is no comparison. Likewise with RAID, you get what you pay for.
RAID Capacity Summary
Ok, we have been talking a long time about RAID. The following table should make clear the capacity you will obtain from various numbers of drives, as well as how suitable I consider each RAID level to be at that size. I prepared this table for my own purposes, but it should give a reasonable level of clarity. I am assuming, for the purpose of the table, that all the drives are 2TB drives. The colour scheme is as follows:
Green: I will consider using it
Red: Really getting nervous with this setup
Blue: This is overkill
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-Capacity-index.jpg
Conclusions
I hope that this article has removed some of the magic and mystery from RAID levels. If you think there is anything that I did not cover in sufficient detail, or if you think I’ve got something wrong, let me know.
Kind regards
Skouperd
BTW, this is also posted on my blog, here: http://blog.skoups.com/?p=236
Introduction
Redundant Array of Inexpensive Disks, better known as RAID has been in existence for several years. The reasons why most people would consider using RAID is that they need faster data transfer speed from the drives, require redundancy, need a bigger single drive, or, in most instances, a combination of these reasons.
The first section of this document will deal with explaining what factors impact the speed of mechanical hard drives. At the end of the section, you should know what to look for when purchasing a new hard drive and what to avoid. The second section will cover the popular RAID formats as well as discussing the pros and cons of each. The final section covers nested RAID.
Rotational based hard drives
Before we start delving into RAID and more specifically the read/write speed, it is imperative that we review the inner workings of spinning (or mechanical) hard drives. The basic mechanical concept of the magnetic hard drive has not changed for several decades. The concept still relies on a spinning platter that can hold a magnetic charge, with a magnetic head to read and write bits (0 or 1) to and from the platter. By changing the number of platters, the hard drive manufacturers could tailor their products for more energy conservation or more performance. A typical hard drive usually contains several platters.
The first drives weighed several tons and held only a couple of megabytes. Today, the biggest single drive amounts to 3TB (or 3,000,000,000,000 bytes). The actual space available on a 3TB drive, however, only amounts to 2.7TiB in the operating system. This difference boils down to the way the hard drive manufacturers use 1000s to differentiate between kilo, mega, giga and tera, whereas standard operating systems use 1024 between each jump. The correct way to differentiate between the two formats is to use the terms terabyte (TB) and tebibyte (TiB). The latter term refers to the binary number typically associated with the space available in the operating system. In other words, the drive is actually a 3TB drive but it only has 2.7TiB space in the operating system.
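The TB versus TiB conversion is a one-liner. A small sketch (the function name is just for illustration):

```python
def tb_to_tib(terabytes):
    """Convert decimal terabytes (1000^4 bytes) to binary tebibytes (1024^4 bytes)."""
    return terabytes * 1000**4 / 1024**4

print(f"3TB = {tb_to_tib(3):.1f}TiB")
```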
Key things that differentiate hard drives, apart from the size, are the following:
1. Rotation speed
2. Speed variations on the same drive (read/write speed at the beginning and end of the drive)
3. Consecutive read / write speed
4. Random read / write speed
5. Seek time
Rotation Speed
The rotation speed indicates the speed at which the platters spin. A typical laptop drive spins at 5400RPM, while a SATA drive spins at 7200RPM. SAS / SCSI and high end SATA drives spin at 10,000RPM, with the most expensive SAS and SCSI drives peaking at 15,000RPM. The faster the drive spins, the quicker the magnetic head will reach a specific section on the platter. The slower a drive spins, the less energy it will use, the less heat it will give off, and the less noisy it will be. It is thus important to realize and appreciate the purpose for which you buy a hard drive. Also, when we eventually get to describing various RAID functions, keep in mind that the optimal performance of a RAID array is achieved when all the drives spin at the same speed.
Speed variations on the same drive
Depending on where the data is read from on the platter, you will observe different transfer speeds. One way to visualise this is to think of a spinning merry-go-round. If the merry-go-round spins at 100 revolutions per minute, the people the furthest from the centre will travel faster (in terms of metres per second) than the people closer to the centre point. Hard drives work the same: the further out from the centre you are, the quicker the hard drive will be able to read and process the data because the platter underneath moves so much quicker. In practice, hard drives will start filling up from the outside towards the centre. CDs and DVDs start from the inside and write / read to the outside.
Now let us make a couple of assumptions for me to demonstrate the mathematics behind a hard drive:
1. Our drive spins at 7200 rotations per minute.
2. Our drive (supposedly a normal 3.5” drive), has a platter with a diameter of 3.4” or 86 millimetres.
3. The innermost tracks where data can be stored are located 10mm from the centre of the platter.
4. The outer most tracks where data could be stored are located 43mm from the centre of the platter.
5. For this example, let us just assume the sectors are packed optimally and equally dense everywhere.
6. Our final assumption is that the picture below represents the hard drive platter, and that the green line represents the innermost tracks, and the red line the outermost tracks.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-HDD-Platter-001.jpg
One can calculate the speed at which the innermost circle travels by taking the circumference of the green circle and multiplying it by the number of rotations the drive makes in a minute. We know that the drive makes 7200 rotations per minute, and we can calculate the circumference using the formula circumference = 2 x radius x Pi which, if we substitute the numbers, results in c = 2 x 10mm x 3.14159265 = 62.831853mm. So the distance the green circle covers in an hour will be 62.831853mm, multiplied by 7200RPM, multiplied by 60 (converting per minute to per hour). This then tells us that the green circle is travelling at 27.14km/h.
Doing the same calculation on the red circle produces the following: 2 x 43mm (Radius) x 3.14159265 (Pi) x 7200 (RPM) x 60 (min to hour) = 116.72km/h. You can therefore clearly see that the distance covered per rotation on the inside and on the outside of the platter is vastly different.
Just as a theoretical mind teaser, the outer track of a 15,000RPM drive will travel at a speed of about 243.16km/h. This raw speed is partly the reason why rotational hard drives are reaching their limit in terms of how fast they can transfer data. The only place where rotational drives are now gaining is in how densely they can pack the data, so that each millimetre of platter can now hold a lot more data. That is the reason why you will now find large drives spinning at only 5400RPM outperforming older drives spinning at 10,000RPM.
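The track speed calculation can be written as a small function. This is a sketch using the same assumptions as the example (10mm inner radius, 43mm outer radius); track_speed_kmh is a name I made up for it.

```python
import math

def track_speed_kmh(radius_mm, rpm):
    """Linear speed of a track in km/h, given its radius and the spindle speed."""
    circumference_mm = 2 * math.pi * radius_mm
    mm_per_hour = circumference_mm * rpm * 60   # rotations per minute -> per hour
    return mm_per_hour / 1_000_000              # millimetres -> kilometres

print(f"inner track, 7200RPM:  {track_speed_kmh(10, 7200):.2f}km/h")
print(f"outer track, 7200RPM:  {track_speed_kmh(43, 7200):.2f}km/h")
print(f"outer track, 15000RPM: {track_speed_kmh(43, 15000):.2f}km/h")
```

Note that for a fixed RPM the speed scales directly with the radius, so the outer track in this example moves 4.3 times faster than the inner one.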
Consecutive read/write speed
Consecutive read/write speed looks at the rate at which a hard drive reads and writes bits that are placed next to each other on the platter. Given the numbers I quoted above, this figure could differ quite substantially based on where the data is located on the platter, and therefore it is always important to consider the fastest, slowest and average consecutive read/write speeds and, if possible, the standard deviation. Good luck getting a standard deviation from the sales agent! Then again, it is quite cool to watch their expression when you do ask for it…
Random read/write speed
Random read/write looks at read/write speeds when the data you are looking for is located at different locations on the platter. The poor hard drive will need to complete rotations for each bit it is trying to read. Compare this to consecutive read/writes, where the data was contained in bits all sitting next to each other and accessible with a single rotation. To solve this problem, hard drives brought in NCQ (native command queuing), which will basically read the bits in the fastest possible order from the platter and then reorganize them in the hard drive’s cache before sending them on to the CPU. This is why hard drives have some cache as well.
Seek time
Seek time refers to how long the head takes to move to the track that holds the data; the worst case is a move from the beginning of the drive to the end. The lower the seek time on a hard drive, the faster the random read/writes will be as well.
RAID Basics
Ok, now that we understand the basics of a hard drive, let us consider RAID. As mentioned before, RAID (or Redundant Array of Inexpensive Disks) has been around for decades. Originally the main objective of RAID was not so much to increase speed, but to improve reliability. In other words, if my drive head crashes, or my platter fails, then I do not want to lose all the data. There are various combinations of RAID, and each motherboard manufacturer, OEM and software designer has their own specific way of implementing each combination. I will discuss the most obvious ones, i.e. RAID 0, RAID1, RAID5 and RAID6, before moving on to nested RAID solutions such as RAID 10, RAID0+1 and RAID 50, and then finally also discuss Matrix RAID and JBOD.
By the way, when calculating the speed of the read/write calculations (in the best-case scenarios), you have to allow, in practice, for a little overhead from the controllers’ perspective.
RAID 0
RAID 0, also known as striping, was not on the original list of specifications for the various RAID implementations. This is because, technically, it is not redundant at all. The example below explains how a RAID 0 operates.
Assume we have two drives in a RAID 0 array and want to write the following text: “Hello_world!”. The following will occur: the controller will split the data across both drives, first writing to the one drive, then to the second, then back to the first and so forth until it is finished. Assuming that the controller is set to write two letters at a time to a single drive, the letters “He” will be written to the first drive while at the same time, “ll” is written to the second drive. Then “o_” and “wo” will be written to the two drives (again at the same time), followed by one more write of “rl” to the first and “d!” to the second drive.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-0-data-striped.jpg
Because you are writing to both drives simultaneously, the speed at which you write data to the drive is equal to the number of drives in the array, multiplied by the slowest drive. In our example, assume the slowest drive can write 100 letters per second. This means the two drives combined will be able to write 200 letters per second. When reading data off the drive, the same principle applies. In other words, the read speed is as fast as the slowest of the drives, multiplied by the number of drives in the array.
The space you obtain from a RAID 0 configuration is the number of drives in the array multiplied by the smallest drive. If you have four drives and they are all 100GB in size, you will have a 400GB array. The benefit of a RAID 0 array is speed, both from a reading and from a writing perspective. The downside of a RAID 0 is that should any one of the hard drives in the array crash, all the data in that array will be lost. In our example above, if the second drive crashes, there is no way to determine or “guess” that the missing letters on the drive were “ll”, “wo”, and “d!”.
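The striping example can be simulated in a few lines. This is a toy sketch that works on letters instead of disk blocks; the stripe function and its chunk parameter (two letters at a time, as in the example) are only for illustration.

```python
def stripe(data, drives=2, chunk=2):
    """Deal the data out to the drives, one chunk at a time, round-robin."""
    layout = [[] for _ in range(drives)]
    for start in range(0, len(data), chunk):
        target = (start // chunk) % drives
        layout[target].append(data[start:start + chunk])
    return layout

drive1, drive2 = stripe("Hello_world!")
print("drive 1:", drive1)
print("drive 2:", drive2)

# Capacity: number of drives multiplied by the smallest drive.
sizes_gb = [100, 100, 100, 100]
print("array size:", len(sizes_gb) * min(sizes_gb), "GB")
```

If drive 2 dies, drive 1 only holds “He”, “o_” and “rl”, and nothing on it tells you what the missing letters were.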
I will only recommend using a RAID 0 if you have no concern whatsoever about losing the data, so typically a RAID 0 array is for somebody who requires speed, and lots of it. For instance, your game installation folder will be a good example of where you can use a RAID 0. If one of the drives fails, big deal, you just reinstall the game... I would not install my saved games on that RAID 0 array though. Another use for RAID 0 is as a temporary workspace, which allows you to have the source data located on a different redundant array, but all the processing and data manipulations occur on this RAID 0 array. If the array fails, then you still have the source data and can just redo the manipulations.
RAID1
In RAID 1 the data on one drive is an exact replica of the data on a second drive, which is why it is also known as mirroring. Again, let’s assume we want to write “Hello_World!” to the RAID 1 array. The controller writes the data “Hello_World!” to both the first and the second drive simultaneously. If the one drive crashes, all the data is still available on the second drive. This solution therefore provides you with full redundancy.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-1-data-mirrored.jpg
Since the controller feeds the same data to both of the drives, the write speed of a RAID 1 is equal to the slowest drive in the array. The read speed, however, is equal to the slowest drive, multiplied by the number of drives in the array. This is because you can read from both drives simultaneously - at least in theory. Few controllers implement RAID 1 correctly, so the read speed is usually slower.
The main purpose of RAID 1 is not speed, but redundancy. The size of the array is equal to the size of the smallest drive in the array. In other words, due to data replication, two drives of 100GB each will still only result in a 100GB RAID 1 array. From a monetary perspective, Rand per GB, this is the most expensive setup but it does provide very good redundancy.
A typical use of RAID 1 in a home environment is photo and document storage. In a corporate environment, you may find the operating system and some databases such as e-mail, installed on a RAID 1 array.
RAID 5
RAID 5 uses distributed parity calculations to reconstruct data in the event of a drive failure. It requires a minimum of three drives, because it uses a form of striping (RAID0) that writes data on two of the three drives and writes a parity calculation value on the third drive. (In a real array the parity is spread over all the drives in turn rather than kept on one drive, but a fixed parity drive keeps the example simple.) In other words, if either drive 1 or drive 2 fails, then you will be able to reconstruct what was on the failed drive using the parity information from the 3rd drive.
Writing our “Hello_World!” to a RAID 5 array will do the following (we’re writing one character per drive at a time):
Data written to drive 1 will be:
“H l o W r d”
Data written to drive 2 will be:
“e l _ o l !”
Drive 3 will hold the parity information.
In hexadecimal: “2D 00 30 38 1E 45”
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-5-hello-world-text.jpg
RAID 5 Parity
Because the mathematics behind the parity calculations on RAID 5 is rarely explained, let us do so now. The secret to parity calculation is an “Exclusive OR”, also known as “XOR” calculation. XOR calculation states the following:
0 xor 0 = 0
0 xor 1 = 1
1 xor 0 = 1
1 xor 1 = 0
In English, when the two values are the same, the answer is “0”, otherwise it is “1”.
When writing “Hello_World!” to a RAID 5 drive the controller will convert each letter into a binary number, so in our example “Hello_World!” will be converted to:
H = 01001000
e = 01100101
l = 01101100
l = 01101100
o = 01101111
_ = 01011111
W = 01010111
o = 01101111
r = 01110010
l = 01101100
d = 01100100
! = 00100001
If we look at the data that we wrote to the two drives (in binary format this time), then drive 1 will contain the data 01001000 01101100 01101111 01010111 01110010 01100100, and drive 2 will contain 01100101 01101100 01011111 01101111 01101100 00100001.
Using the XOR calculation, the parity string would be 00101101 00000000 00110000 00111000 00011110 01000101.
Put all three underneath each other:
D1: 01001000 01101100 01101111 01010111 01110010 01100100
D2: 01100101 01101100 01011111 01101111 01101100 00100001
P1: 00101101 00000000 00110000 00111000 00011110 01000101
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-5-hello-world-binary.jpg
Now we can see exactly how the parity calculation worked.
By removing either one of the three drives, using the remaining two drives, you can recalculate the missing values with the same XOR function.
If we destroy D1, the data will look like this:
D1: …….. …….. …….. …….. …….. ……..
D2: 01100101 01101100 01011111 01101111 01101100 00100001
P1: 00101101 00000000 00110000 00111000 00011110 01000101
Using the remaining two drives, XOR-ing each bit from D2 with the corresponding bit on P1, we can rebuild D1’s data.
D1: 01001000 …….. …….. …….. …….. ……..
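The parity calculation and the rebuild can be checked with a few lines of Python. This is a sketch of the three-drive example only, with a fixed parity drive as in the text; xor_bytes is a helper name I made up.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings, byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))

text = b"Hello_World!"
d1 = text[0::2]          # every other character, starting with "H"
d2 = text[1::2]          # every other character, starting with "e"
p1 = xor_bytes(d1, d2)   # parity drive

print("P1 in hex:   ", " ".join(f"{b:02X}" for b in p1))
print("P1 in binary:", " ".join(f"{b:08b}" for b in p1))

# Lose D1, then rebuild it from the two drives that are left.
rebuilt_d1 = xor_bytes(d2, p1)
print("rebuilt D1:  ", rebuilt_d1)
```

The same function rebuilds D2 from D1 and P1, which is the whole trick behind RAID 5.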
RAID 5 may extend to more than 3 drives, but the process remains the same. Assume we have 5 drives in a RAID 5 configuration then the data will look like this:
D1: 01001000 01101111 01110010
D2: 01100101 01011111 01101100
D3: 01101100 01010111 01100100
D4: 01101100 01101111 00100001
P1: 00101101 00001000 01011011
The way that XOR works now is (((D1 XOR D2) XOR D3) XOR D4) = P1.
Looking at the first 8 bits on each drive, this is the process: (((D1 XOR D2) XOR D3) XOR D4) = P1
Written out as: (((01001000 XOR 01100101) XOR 01101100) XOR 01101100) = P1
First, get rid of the first set of brackets by doing the first XOR calculation:
((00101101 XOR 01101100) XOR 01101100) = P1
Doing the second XOR calculation results in:
(01000001 XOR 01101100) = P1
And finally:
00101101 = P1
An easy way to see how XOR works is to think that P1 is the value needed to make the total number of 1’s in that specific column an even number.
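Here is the five-drive example as a sketch, including the “even number of 1’s” check. It again assumes a fixed parity drive, and the parity helper is my own naming.

```python
from functools import reduce

def parity(blocks):
    """XOR any number of equal-length byte strings together, column by column."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

text = b"Hello_World!"
data = [text[i::4] for i in range(4)]   # D1..D4, one character per drive at a time
p1 = parity(data)
print("P1:", " ".join(f"{b:08b}" for b in p1))

# Even parity: XOR-ing all five drives together leaves nothing but zeros.
print("all drives XOR-ed:", parity(data + [p1]))

# Lose D3 and rebuild it from the other data drives plus the parity.
survivors = [data[0], data[1], data[3], p1]
print("rebuilt D3:", parity(survivors))
```

The order of the XOR operations does not matter, which is why the rebuild works no matter which single drive goes missing.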
You have to agree that mathematics is a cool subject!
Performance
The biggest problem with RAID 5 is the speed of the parity calculation. Due to that reason RAID 5 generally does not go hand in hand with stating things like “twice the speed of the slowest drive” etc. It really boils down to how fast the RAID controller can do the calculation. Generally, if you have a superfast RAID controller, then the theoretical speed of a RAID 5 array will be the speed of the slowest drive, multiplied by the number of drives in the array less one.
RAID 5 is also the cheapest form of having redundant data (per GB), since the size of a RAID 5 partition is the number of drives in the array, minus 1 for parity, multiplied by the smallest drive in the array. In other words, if you have four drives of 100GB each, then the formula is (4 - 1) x 100GB = 300GB. The weakest point in a RAID 5 array is that if one drive fails, the remaining drives have zero tolerance for another failure until the array has been rebuilt. In other words, as long as you only lose 1 drive, you should be fine.
Since RAID 5 is the cheapest form of redundancy (price per GB), it is typically used in arrays where the objective is more space, with some redundancy added for good measure. Home users might use RAID 5 for their movie collections, or even ISO images. Corporations use RAID 5 for their file servers and, with high-end RAID controllers, even for databases.
RAID 5 has always been one of those “magic” solutions. Magic however only remains magic until one understands the inner workings. I trust that this little piece of information helped remove some of the mystique associated with RAID 5. Just a warning with regard to cheap RAID controllers: they generally cannot do RAID 5 calculations in hardware and pass the calculations back to the operating system and/or the CPU. If you do intend to use RAID 5, ensure that you do research before going out to buy a RAID controller or else you will get frustrated with the slow speed of the array.
RAID 6
This brings us to RAID 6, which is effectively RAID 5 on steroids. Whereas RAID 5 sets aside a single drive’s worth of space for parity, RAID 6 uses a double distributed parity algorithm and sets aside two. Users have found too often that if one drive crashes and is replaced, another one tends to go very quickly, whether that is down to the mean time between failures (MTBF) of the drives or to a technician unplugging the wrong drive. RAID 6 introduces another drive for parity, thus allowing you to lose two drives before you lose data. The space calculation is the total number of drives minus two drives for parity, multiplied by the smallest drive.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-dual-parity.jpg
Mathematics of RAID 6
RAID 5 uses XOR Boolean calculations whereas RAID 6 mostly uses Reed-Solomon coding. I am saying mostly because RAID 6 is defined by the Storage Networking Industry Association as “any form of RAID that can continue to execute read and write requests to all of a RAID array’s virtual disks in the presence of any two concurrent disk failures.” This definition resulted in various vendors implementing different solutions to achieve RAID 6 compatibility. The method implemented by most vendors is the Reed-Solomon error correction method, or variations thereof. Reed-Solomon coding uses Galois field (finite field) mathematics. The mathematics behind the proper implementation is a bit scary as it relies on polynomials, matrix multiplication and linear feedback shifts.
Doing some research into the above mathematics made the implementation thereof even more confusing. So if somebody has a simplified way to explain exactly how RS works in a RAID 6 environment, feel free to let me know. (I will give you credit if I end up using it.) Explaining the mathematics of a Reed-Solomon implementation as such will have to wait for the time being. We may call a spade a shovel, but at the end of the day, the use of it is to bury some bodies. Herewith is my view on how RAID 6 works.
Assume you have six drives (the minimum number is four for RAID 6). Then four of the six drives will be data drives (marked as D1 to D4 below) and two will be parity drives (marked P1 and P2). The first parity drive (P1) will use a row level XOR calculation, similar to that explained in RAID 5, across the four data drives. For now, let us just ignore the second, diagonal, parity calculation.
The first row contains data (a1, b1, c1, and d1). A row level XOR calculation is performed, similar to that of RAID 5 and the value is stored in P1 (marked as r1). Continuing with the process, after 4 rows of writing data, the array will look something like this. You need to write the same number of rows as the number of data drives you have in the RAID 6 array.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-001.jpg
Now, in order for us to do the parity 2 diagonal XOR calculation, we need to generate a linear shift in the array as is marked by the different colours below.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-002.jpg
Just to confirm: just because the cells refer to a1 in the different rows, it does not mean that the data is replicated in each cell. It merely refers to the fact that those cells marked a1 (and the same colour) are used together in the diagonal XOR calculation. For clarity, in the next array, I have renamed r1, r2, r3 and r4 so that they match the different diagonal colouring stripes as well. The data still remains the row-XOR calculation. If we then calculate the diagonal XOR and store it in P2, then we end up with the following array:
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-003.jpg
The value for the first row diagonal XOR is the calculation of XOR for all the orange cells (b1 cells). The value of the second row will be the XOR calculation of all the blue cells (c1), row 3 of all the purple cells (d1) and row 4 of all the yellow cells (e1). Those astute readers amongst you will note that I am neither performing nor storing an XOR calculation on the cells marked (a1).
That is because it is not necessary, as I will show later on.
If we continue with the above trend, then the final array will look like this:
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-004.jpg
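For readers who prefer code to coloured tables, the two parity columns can be sketched roughly as follows. This is only an illustration of the scheme described above, not how any real controller does it. The diagonal layout (column minus row, modulo 5) is an assumption reconstructed from the cells named in the text, and the function names and sample byte values are made up. Each cell is a single number standing in for a whole block of data.

```python
from functools import reduce
from operator import xor

DATA_DRIVES = 4           # D1..D4
ROWS = DATA_DRIVES        # one row per data drive, as described above
COLS = DATA_DRIVES + 1    # D1..D4 plus the row parity drive P1
LABELS = "abcde"          # diagonal names; 'a' is the one that is never stored


def diagonal(row, col):
    """Assumed layout: every row shifts the diagonal labels one cell to the right."""
    return (col - row) % COLS


def encode(data):
    """data: ROWS lists of DATA_DRIVES ints. Returns the (p1, p2) parity columns."""
    p1 = [reduce(xor, row) for row in data]            # row-level XOR, as in RAID 5
    full = [row + [p] for row, p in zip(data, p1)]     # data cells plus P1
    p2 = []
    for k in range(1, COLS):                           # diagonals b, c, d and e
        cells = [full[r][c] for r in range(ROWS) for c in range(COLS)
                 if diagonal(r, c) == k]
        p2.append(reduce(xor, cells))                  # stored in row k of P2
    return p1, p2


data = [[0x11, 0x22, 0x33, 0x44],
        [0x55, 0x66, 0x77, 0x88],
        [0x99, 0xAA, 0xBB, 0xCC],
        [0xDD, 0xEE, 0xFF, 0x10]]
p1, p2 = encode(data)

# Print which diagonal each cell of D1..D4 and P1 belongs to
for r in range(ROWS):
    print(" ".join(LABELS[diagonal(r, c)] for c in range(COLS)))
print("P1:", [hex(v) for v in p1])
print("P2:", [hex(v) for v in p2])
```

The printed letter grid can be compared against the colour stripes in the tables above. If the two do not line up, the diagonal() function is the only thing that needs adjusting.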
Ok, this all looks pretty and nice, but let's see how we can recover data if we lose various combinations of drives.
RAID 6 - Recover from two data drive failures
Assume for starters that we have lost Data Drive 1 (D1) and Data Drive 2 (D2). (The same logic will hold true if you lose any two data drives.) Losing those two drives results in the following array:
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-006.jpg
As with a Sudoku puzzle or breaking an encryption algorithm, you need to find that first weak point before solving it. In the above scenario, we are unable to use the row-level parity drive P1 since we are missing two sets of data. However, we can use the diagonal parity XOR calculation since we have three of the four data cells of the orange diagonal (b1). Solving b1, (row 1 disk 2) we now have three of the four bits to use with the row parity P1 drive to solve row 1, disk 1.
The array now looks like this:
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-007.jpg
It would have been easy to simply continue the pattern we used for row 1, but the cell we just solved (row 1, disk 1) sits on the white diagonal (a1), which has no stored parity. Remember, I said we do not need to create an XOR value for the white diagonal. That is because we can still solve disk 1, row 4 using the blue diagonal parity (c1). Once disk 1, row 4 is solved, the row parity is used to solve disk 2, row 4.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-008.jpg
The table below indicates the order in which the various question marks are solved.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-009.jpg
The above showed us how to restore a two data drive failure, but what happens if a data drive and perhaps the row-XOR drive fails? Here are just some of the solutions:
RAID 6 - Recover from a row parity drive and a data drive failure
In the following case, the data drive D2 and the row parity drive, P1 failed leaving our array in the following condition:
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-010.jpg
In order to solve the above broken array, you first need to solve the row 3, parity 1 cell by using the blue diagonal (c1) XOR calculation stored in r2 p2. Once that is solved, we can use the value from r3 p1 to solve r3 d2, and so the process continues. Below is the order in which to recover both drives.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-011.jpg
RAID 6 - Recover from a diagonal parity drive and a data drive failure
Ok, now that we can recover from a data drive and the row parity, what about a data drive and the diagonal parity?
The broken array looks like this:
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-012.jpg
One can do it the complicated way, or the easy way, which is to recover the data drive using the row parity drive and then just recalculate the diagonal parity drive from scratch.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-013.jpg
RAID 6 - Recover from both parity drives failing
Ok, smarty pants. What happens if both parity drives fail? The solution is just to go and recalculate both the drives from scratch. In other words, not recover them, just recalculate from scratch.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-6-table-014.jpg
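All four recovery scenarios above boil down to the same loop: look for a row or a stored diagonal with exactly one missing cell, solve it with XOR, and repeat. A minimal sketch of that loop follows. The diagonal layout (column minus row, modulo 5) is again an assumption reconstructed from the cells named in the text, the function names are hypothetical, and each cell is a single number standing in for a block of data.

```python
from functools import reduce
from itertools import combinations
from operator import xor

ROWS, COLS = 4, 6            # columns 0-3 = D1..D4, 4 = P1, 5 = P2


def constraints():
    """Every group of cells that must XOR to zero: 4 rows plus 4 stored diagonals."""
    groups = [[(r, c) for c in range(5)] for r in range(ROWS)]
    for k in range(1, 5):    # diagonal 0 (the white a1 cells) is never stored
        cells = [(r, c) for r in range(ROWS) for c in range(5) if (c - r) % 5 == k]
        groups.append(cells + [(k - 1, 5)])    # its parity lives in P2, row k
    return groups


def encode(data):
    """Append P1 and P2 to 4 rows of 4 data values."""
    grid = [row + [reduce(xor, row), 0] for row in data]
    for group in constraints()[ROWS:]:
        *cells, (pr, pc) = group
        grid[pr][pc] = reduce(xor, (grid[r][c] for r, c in cells))
    return grid


def recover(grid, lost):
    """Rebuild the lost columns in place; returns the order the cells were solved in."""
    missing = {(r, c) for r in range(ROWS) for c in lost}
    order = []
    progress = True
    while missing and progress:
        progress = False
        for group in constraints():
            unknown = [cell for cell in group if cell in missing]
            if len(unknown) == 1:              # the weak point: only one gap
                r, c = unknown[0]
                grid[r][c] = reduce(xor, (grid[i][j] for i, j in group
                                          if (i, j) != (r, c)))
                missing.discard((r, c))
                order.append((r, c))
                progress = True
    return order


data = [[0x11, 0x22, 0x33, 0x44],
        [0x55, 0x66, 0x77, 0x88],
        [0x99, 0xAA, 0xBB, 0xCC],
        [0xDD, 0xEE, 0xFF, 0x10]]
good = encode(data)
for lost in combinations(range(COLS), 2):      # every possible pair of dead drives
    damaged = [[None if c in lost else v for c, v in enumerate(row)] for row in good]
    recover(damaged, lost)
    print(lost, "recovered" if damaged == good else "FAILED")
```

Because the loop sweeps through all rows and diagonals on each pass, the order it solves cells in will not match the numbered tables exactly, but the first cell it finds is the same weak point described above.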
RAID 6 - Conclusion
Now the above method is probably not the most efficient way to do the calculations, according to the clever people that is, but in my view it is compliant with the definition of RAID 6 and I am sure that, somewhere along the line, somebody has implemented it.
Having said the above, I am curious to hear from anybody who can actually explain, without using any mathematical equations, signs or symbols that are not covered by basic algebra, how a real Reed-Solomon RAID 6 implementation works in practice.
To summarise, the benefit of RAID 6 is that you can lose any two drives and the array will continue to function. Having 6x3TB drives will yield an effective capacity of 12TB. This is on par with a RAID 50 solution, but with the benefit that any two drives can be lost, whereas RAID 50 only survives specific combinations of a two-drive failure. The only other benefit of going with RAID 6 over RAID 50 is that you can use an odd number of drives in a RAID 6 configuration, whereas RAID 50 needs an even number.
RAID 6 is typically used where one has a very large number of drives and/or a big size array. I would say that as soon as you start exceeding a couple of Terabytes, or more than six drives in a single array, then it is time to start considering RAID 6 as a solution, depending on the importance of the data.
As individual drive sizes increase, and the size of individual arrays with them, a RAID 5 takes longer to rebuild, which increases the risk of losing a second drive before the rebuild completes. RAID 6 is not something new, but only recently have the drop in hard drive prices and the increase in processing power made it a feasible solution. Just be careful though: only newer RAID controllers support RAID 6 in hardware mode.
Nested Raids
The next set of RAID arrays is known as nested RAID. This is where you combine the single-level RAID formats above with each other. Any RAID that is denoted with two numbers is effectively two of the aforementioned kinds of RAID set up to work as a single array. The first number denotes the configuration of the drives within each individual array. The second number denotes how those individual arrays are configured in relation to each other.
RAID 50, for instance, will have two RAID 5 arrays (each array has four drives) and these two RAID 5 arrays are striped with RAID 0. Once the single-level RAIDs are understood, nested RAID is very easy to follow.
RAID0+1
The first nested array that I want to deal with is RAID0+1. Assume we have eight drives then a RAID0+1 will be:
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-0+1.jpg
We first create two RAID 0 arrays and then we mirror them, most likely in the operating system. You will not find RAID0+1 in enterprise settings and that is because it is a very risky affair.
RAID 10 is a much safer method, as I will explain later.
The biggest risk with a RAID 0+1 is that if any single drive fails, then all your data becomes reliant on another high-risk array. Therefore, losing a single drive from each of the two arrays means a complete loss of data. The key benefit of RAID0+1 is that it is a cheap man’s solution to speed, online capacity expansion and using several different RAID controllers on the same array. Allow me to explain.
Most motherboards will allow you to create a RAID 0 array. Let us assume the motherboard has eight SATA ports. For argument's sake, one is used for the OS drive and one is used for the optical drive, thus leaving you six ports for your data array. You use 6x1TB drives for the array. So, create two RAID0 arrays of 3TB each using the motherboard’s on-board RAID controller. In Windows Disk Management you will see two individual arrays, each being 3TB in size. Mirror the two arrays and you end up with a 3TB logical drive in Windows that is redundant. Any one of the arrays could crash and you will not lose your data. This not only provides you with a form of redundancy, but because the RAID 5 controllers that ship with most motherboards perform very poorly, this also provides you with a lot more speed than they would.
So, how about that online capacity expansion on RAID0+1? Now assume that you need to increase the capacity of your array and that you do not have money for a proper RAID controller. You could do the following. Purchase any cheap SATA controller that can do RAID 0 and buy as many drives as you can afford. (The capacity of the new drives does not need to be the same as the original ones, as long as the total space of the two arrays ends up the same.) For our example, we will be buying two 1TB drives. Remember, we have six 1TB drives already.
In Windows, delete the mirror array. This will leave you with two arrays (each 3TB) that contain the exact same information. Remove the 3 hard drives of one of the arrays from the motherboard, move one of them to the other on-board array, and install the other two together with the two new drives on the SATA controller. You now have 4 drives on the SATA controller and 4 data drives on the motherboard. Create a new 4TB array on the SATA controller (RAID 0). In Windows you still have an array that contains all the data that was on the original 3TB array. Copy the data onto the new 4TB array. Once the copy is finished, destroy the 3TB array and recreate the array on the motherboard, but this time using all four drives. (This will produce a 4TB array from the motherboard.) Back in Windows, now just create a mirror (RAID1) between the two 4TB drives.
The above is the cheap man’s method to do Online Capacity Expansion. The biggest problem is that while you increase the arrays, your data is very vulnerable: if anything happens to that single array while you copy the data from the one to the other, you could lose data. Most high-end RAID controllers can do RAID 10, but only a few RAID controllers will do RAID0+1. The reason is that with most high-end RAID controllers, the key advantage of RAID0+1 is nullified by their built-in “Online Capacity Expansion Technology” or OCE. OCE allows the user to add more drives into the array without deleting the existing data that is stored on the array.
Your cheap motherboard RAID controllers do not support that feature which is where RAID0+1 comes in handy.
Negatives of RAID0+1? As mentioned, redundancy is a problem. The other problem is that should you lose a single drive, then in order to rebuild, all the data in the whole RAID 0 array that drive belonged to needs to be rebuilt. Assume you have 8 drives in RAID0+1 (4 per array). Then instead of just recreating the failed drive, it will recreate 4 drives. This increases the time before the array is back to optimal health and increases the risk of losing a second drive.
Having said all the above, I have used RAID0+1 for many years on my own setup and never once did I lose data due to drive failures. (Due to stupidity yes, but not due to drive failures.) It allowed me the benefit of building up my drives and capacity as I needed it. At some stage, it becomes more economical to buy a proper RAID controller and to convert to a RAID 5 configuration. The cost of the additional space offsets the cost of the RAID controller. For instance, the space available from 12x1TB drives is as follows:
RAID 0+1 = 6TB
RAID 5 = 11TB
RAID 6 = 10TB
In other words, instead of trying to increase a RAID0+1 to 10TB, (requiring 8 new drives) it is cheaper to buy a real RAID controller and use your existing drives.
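The figures above follow from simple rules of thumb, which can be written out as a small sketch. The function name usable_tb is made up for this illustration, and it assumes all drives are the same size.

```python
def usable_tb(level, drives, size_tb):
    """Usable capacity in TB for an array of equal-sized drives (simplified)."""
    if level in ("0+1", "10"):
        return drives // 2 * size_tb       # half the drives hold the mirror copy
    if level == "5":
        return (drives - 1) * size_tb      # one drive's worth of parity
    if level == "6":
        return (drives - 2) * size_tb      # two drives' worth of parity
    raise ValueError(f"unknown RAID level: {level}")


for level in ("0+1", "5", "6"):
    print(f"RAID {level} = {usable_tb(level, 12, 1)}TB")

# A 10TB RAID 0+1 built from 1TB drives needs 20 drives; we already own 12
extra_drives = 2 * 10 - 12
print("extra drives needed for a 10TB RAID 0+1:", extra_drives)
```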
Bottom line, having RAID 0 arrays at the bottom provides you with flexibility and creativity that you generally don’t have with other RAID formats, especially if the topmost RAID is done in the operating system. For example, Matrix RAID is quite easy when the bottom array is RAID 0. Using 3 independent RAID 0 arrays, you can create a RAID0+1 partition for your photos and important stuff, a RAID0+0 for game installations and a RAID0+5 for your semi-important files. If a single drive fails, then you will lose all the data on the RAID0+0 configuration but you will not lose your photos or any of your semi-important stuff.
RAID10
RAID10 uses multiple RAID 1 arrays and then stripes them using RAID 0. See the example below.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-10.jpg
As opposed to RAID0+1, as long as you don’t lose two drives from the same 2 drive array, you are fine. You can lose several drives in this setup without incurring any data loss. If a drive fails, then only that drive’s array is rebuilt, so the rebuilding process is also a lot faster than RAID 5 or RAID0+1.
As such, you will often see RAID10 used in a corporate environment where both speed and redundancy are more important than the cost of the drives. RAID10 (and RAID0+1) is quite expensive in that you are only able to use 50% of the available space for your data.
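To put some numbers on the difference between RAID0+1 and RAID10, one can simply try every possible combination of failed drives. The sketch below does that for eight drives. The drive numbering and function names are made up for the illustration, and it ignores rebuild time altogether.

```python
from itertools import combinations

DRIVES = range(8)
STRIPES = [{0, 1, 2, 3}, {4, 5, 6, 7}]        # RAID0+1: two RAID 0 sets, mirrored
PAIRS = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]      # RAID10: four mirror pairs, striped


def raid01_survives(failed):
    """Data survives while at least one RAID 0 set has lost no drive at all."""
    return any(not (stripe & failed) for stripe in STRIPES)


def raid10_survives(failed):
    """Data survives while every mirror pair still has one working drive."""
    return all(pair - failed for pair in PAIRS)


def survival_count(survives, n_failed):
    """Return (survivable combinations, total combinations) for n failed drives."""
    cases = [set(c) for c in combinations(DRIVES, n_failed)]
    return sum(survives(c) for c in cases), len(cases)


for n in (1, 2, 3, 4):
    print(n, "failed:",
          "RAID0+1 %d/%d," % survival_count(raid01_survives, n),
          "RAID10 %d/%d" % survival_count(raid10_survives, n))
```

With two failed drives, RAID0+1 survives 12 of the 28 possible combinations while RAID10 survives 24 of them.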
RAID50
We have covered RAID 5 in detail. Assume we have 12 x 1TB drives for our RAID50 example. Create two RAID5 arrays of six drives each. The six drives, less the parity drive, times the space of the smallest drive will provide 5TB space for each of the two arrays.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-50.jpg
Now we can stripe the two arrays using RAID0. Therefore, your final array will be 10TB in size.
The benefit of this is that you could now lose a single drive from each of the two arrays and still have all your data. The problem is you will still need to rebuild a 5TB RAID 5 array when you have a drive crash. While the array is rebuilding, that array is vulnerable to data loss in case of another drive failure. If you lose two drives from any single array, then you will also lose all your data.
RAID51
I have never seen RAID 51 actually implemented for real. The benefit of RAID51 is super redundancy. Assume we have twelve 1TB drives. RAID 51 needs two RAID 5 arrays, which in our case will be six drives each, and each array will thus produce 5TB. These two 5TB arrays mirror each other, keeping the total usable space to just 5TB. The benefit of this is that you could lose many drives; in fact, you could lose seven of the twelve drives (best case) and still have all your data available. Even losing three drives, in any combination, will still not result in data loss. The problem is that this kind of redundancy is expensive, and as such it is very rarely used, if at all.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-51.jpg
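The "any three drives" and "seven drives, best case" claims for RAID 51 are easy to check by brute force. A minimal sketch, assuming drives 0 to 5 form the first RAID 5 array and drives 6 to 11 the second (the numbering and function names are made up):

```python
from itertools import combinations

ARRAYS = [set(range(0, 6)), set(range(6, 12))]   # two six-drive RAID 5 arrays


def raid51_survives(failed):
    """A RAID 5 array tolerates one lost drive; the mirror needs one live array."""
    return any(len(array & failed) <= 1 for array in ARRAYS)


def every_case_survives(n_failed):
    """True if no combination of n_failed dead drives loses data."""
    return all(raid51_survives(set(c)) for c in combinations(range(12), n_failed))


print("any 3 drives lost, always safe:", every_case_survives(3))
print("any 4 drives lost, always safe:", every_case_survives(4))
print("7 lost, best case survives:", raid51_survives({0, 1, 2, 3, 4, 5, 6}))
print("8 lost, some case survives:",
      any(raid51_survives(set(c)) for c in combinations(range(12), 8)))
```

Three failures are always survivable, four are not (two drives from each array kills both), and seven is the most you can lose if they fall in exactly the right places.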
RAID60
RAID60 consists of bottom arrays of RAID 6, striped for additional performance. RAID 60 is mainly used when your array becomes quite large. I would say you need 16 or more drives before a setup like this is justified.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-60.jpg
Exotic RAID
The following RAID levels are just for show (ooh, look everybody, I am running RAID 100!). If you are still reading by this stage, the following diagrams should be self-explanatory. The only reason why you would actually want to do something like this is if you have multiple RAID controllers or have multiple enclosures each running their own RAID configuration.
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-100.jpg
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-500.jpg
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-600.jpg
Other RAIDs
Until now, we have covered the basics of RAID. The following section covers topics that are generally associated with RAID in one way or another.
JBOD
“Just a bunch of drives”, better known as “JBOD” or “spanning”, is not RAID in the strictest sense. Effectively, the computer will write data to the first drive. When that is full, it will continue to the second and when that is full, to the third and so on.
The most popular use is to get odd drives, say for instance 2x200GB drives and a 600GB drive, into something that is manageable. You could JBOD them and create just one array that is 1TB in size. This array in turn could then be used with another 1TB drive in RAID 0 or RAID 1 for instance.
In theory, if a drive within a JBOD fails, you will only lose the data from that drive. In practice though, if you create a JBOD array in Windows and one drive fails, you will lose all your data. (You could recover it with software but that is a whole new topic.)
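The "fill the first drive, then the next" behaviour of spanning can be sketched as a simple lookup from a position in the big array to a position on one of the member drives. This is only an illustration: the function name locate is made up, and it works in whole GB where a real implementation works in sectors.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(offset_gb, drive_sizes_gb):
    """Return (drive index, offset within that drive) for a position in the span."""
    ends = list(accumulate(drive_sizes_gb))      # running total: where each drive ends
    if not 0 <= offset_gb < ends[-1]:
        raise ValueError("offset is outside the spanned volume")
    drive = bisect_right(ends, offset_gb)        # first drive that ends past the offset
    start = ends[drive] - drive_sizes_gb[drive]  # where that drive begins in the span
    return drive, offset_gb - start


sizes = [200, 200, 600]          # the odd drives from the example above
print(sum(sizes))                # total size of the span in GB: 1000
print(locate(150, sizes))        # (0, 150): still on the first drive
print(locate(200, sizes))        # (1, 0): first GB of the second drive
print(locate(450, sizes))        # (2, 50): 50GB into the 600GB drive
```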
MATRIX RAID
Finally, the last kind of RAID I want to discuss today is Matrix RAID. Matrix RAID is patented by Intel and you will therefore only find it on Intel chipset motherboards. What you do with Matrix RAID is apply multiple different RAID levels to the same set of drives. Assume that you have 2x100GB drives and you want both speed and redundancy. Using normal (non-Matrix) RAID you could only have one, either RAID0 or RAID1. With Matrix RAID you could use 50% of each drive in a RAID0 configuration and the other 50% in a RAID1 configuration. Each drive is thus split into two 50GB partitions (100GB / 2). The RAID1 partition will be (50GB + 50GB) / 2 = 50GB and the RAID 0 partition will be 50GB + 50GB = 100GB, giving a total usable space of 150GB out of 200GB, of which 50GB is redundant. Speed is fastest on the RAID 0 partition but redundancy is only found on the RAID1 partition. This is a very handy setup for gamers who don’t have big-budget systems. Basically, you set up your operating system and install drive on the RAID 0 partition, and you store your save games and photos on the RAID1 partition.
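The Matrix RAID arithmetic can be written out as a small sketch, which also makes it easy to try other splits than 50/50. The function name and the raid0_share parameter are made up for this illustration, and it assumes exactly two identical drives.

```python
def matrix_raid(drive_gb, raid0_share):
    """Split each of two drives into a RAID 0 slice and a RAID 1 slice."""
    raid0_slice = drive_gb * raid0_share
    raid1_slice = drive_gb - raid0_slice
    return {
        "raid0_gb": raid0_slice * 2,    # striped: both slices add capacity
        "raid1_gb": raid1_slice,        # mirrored: only one copy's worth is usable
    }


volumes = matrix_raid(drive_gb=100, raid0_share=0.5)
usable = sum(volumes.values())
print(volumes)
print("usable:", usable, "GB of", 2 * 100, "GB raw")
```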
Setting up RAID
Ok, now that we have covered the different RAIDs that exist, let’s tackle the next section, which is how to set up RAID. I mentioned before that there are three different ways you could set RAID up.
The first is software RAID, the second is RAID as shipped with the motherboard (also known as FAKE RAID), and the third is a proper RAID controller that you plug into your PCI, or even your PCI Express, slot.
The problem with both FAKE RAID and software RAID is that they use your CPU cycles to perform the calculations, which means that you will require drivers for your RAID and … Expensive RAID controllers ship with a battery backup, a serious amount of on-board cache as well as their own dedicated CPUs. On-board RAID controllers are not all created equal but, as mentioned, they tend to rely not only on the main CPU but also on the bandwidth provided by the south bridge. The 680i chipset from NVidia, for instance, is restricted to about 140MB per second on a single-threaded read/write, which basically bottlenecks the drives in a big RAID0 array.
If you do want to go with RAID and the goal is performance, then I would recommend you consider getting yourself a dedicated RAID controller card rather than relying on just the motherboard’s on-board RAID. The easiest way for me to explain this is: think of your on-board graphics card, now compare that to your dedicated 9600GSO graphics card, then compare that to a GTX570. There is no comparison. Likewise with RAID: you get what you pay for.
RAID Capacity Summary
Ok, we have been talking a long time about RAID. The following table should make clear the capacity you will obtain from a given number of drives, as well as how seriously each RAID level should be considered for that setup. I prepared this table for my own purposes, but it should give a reasonable level of clarity. For the purpose of the table, I am assuming that all the drives are 2TB drives. The colour scheme is as follows:
Green: I will consider using it
Red: Really getting nervous with this setup
Blue: This is overkill
http://blog.skoups.com/wp-content/uploads/2011/12/RAID-RAID-Capacity-index.jpg
Conclusions
I hope that this article has removed some of the magic and mystery from RAID levels. If you think there is anything that I did not cover in sufficient detail, or if you think I’ve got something wrong, let me know.
Kind regards
Skouperd