Backblaze, which provides cloud storage and data backup services, has published another quarterly report on failure statistics for the hard drive models in its fleet. Following the broader industry trend, the company also set out to explore whether artificial intelligence could help reduce the number of failures.
At the end of the second calendar quarter of 2024, Backblaze had 284,876 hard drives in operation. The company excluded models with fewer than 100 units in operation, as well as those that accumulated fewer than 10,000 drive days during the quarter, leaving 284,386 drives across 29 models in the report. Given how widely AI technologies are being adopted across industries, Backblaze wondered whether they could be used to predict hard drive failures. Testing that idea would require training a large language model on the company's statistics and checking whether it can predict the probability that a given drive will fail over time. It is not yet clear, however, whether the statistics for one model can be applied to another, since failure profiles can differ dramatically between models.
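The inclusion rule can be expressed as a simple filter. The sketch below is illustrative only: the `ModelStats` record, its field names, and the sample model names are hypothetical, and it assumes a model must meet both thresholds (at least 100 drives and at least 10,000 drive days in the quarter) to stay in the report.

```python
from dataclasses import dataclass

MIN_DRIVES = 100          # minimum drives in service at the end of the quarter
MIN_DRIVE_DAYS = 10_000   # minimum drive days accumulated during the quarter


@dataclass
class ModelStats:
    """Quarterly totals for one drive model (hypothetical record layout)."""
    model: str
    drive_count: int
    drive_days: int


def eligible_models(stats: list[ModelStats]) -> list[ModelStats]:
    """Keep only models that meet both thresholds."""
    return [s for s in stats if s.drive_count >= MIN_DRIVES and s.drive_days >= MIN_DRIVE_DAYS]


# Made-up sample: only MODEL-A passes both checks.
sample = [
    ModelStats("MODEL-A", drive_count=5_000, drive_days=450_000),
    ModelStats("MODEL-B", drive_count=60, drive_days=5_400),
    ModelStats("MODEL-C", drive_count=150, drive_days=9_000),
]
print([s.model for s in eligible_models(sample)])  # ['MODEL-A']
```

MODEL-C has enough drives but too few drive days, so under this reading of the rule it is dropped as well.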
The latest report puts the annualized failure rate (AFR) for Q2 at 1.71%, down from 2.28% in the same quarter last year but up from 1.41% in Q1 2024. The biggest concern was the 12TB HGST model (HUH721212ALN604), whose quarterly AFR jumped to 7.17%, raising its lifetime AFR from 0.99% to 1.57%. Notably, two models, the 14TB Seagate ST14000NM000J and the 16TB Seagate ST16000NM002J, did not record a single failure during the quarter, although Backblaze has relatively few of these drives in service.
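AFR is computed from drive days rather than from a simple count of drives, which makes models with different fleet sizes and deployment dates comparable. A minimal sketch, assuming the formula Backblaze publishes with its Drive Stats (failures divided by drive-years, where one drive-year is 365 drive days, times 100); the function names are hypothetical. It also shows why one bad quarter moves a lifetime AFR far less than the quarterly figure: the lifetime value pools every drive day since deployment.

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """AFR in percent: failures per drive-year of operation, times 100."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100


def lifetime_afr(quarters: list[tuple[int, int]]) -> float:
    """Pool (failures, drive_days) pairs from every quarter before computing AFR."""
    total_failures = sum(f for f, _ in quarters)
    total_days = sum(d for _, d in quarters)
    return annualized_failure_rate(total_failures, total_days)


# Made-up numbers: 10 failures over 365,000 drive days (1,000 drive-years).
print(annualized_failure_rate(10, 365_000))  # 1.0
```

With two made-up quarters of 10 and 30 failures over 365,000 drive days each, the second quarter alone has a 3% AFR, but the pooled lifetime AFR is only 2%.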
The oldest model in operation is a 4TB Seagate (ST4000DM000), and the company plans to migrate the data on these drives to newer, larger drives within the next quarter or two. The longest-running individual drive, however, is a 4TB HGST (HMS5C4040ALE640), which had been in operation for 9 years, 11 months, and 23 days by the end of Q2; the storage system that houses it is currently being migrated.
According to Backblaze, the goal of collecting and processing these statistics is to build a failure profile for each drive model over time, which helps the company develop replacement and migration strategies. To illustrate this, it prepared three charts based on failure statistics for models that have each logged at least 1 million drive days at the company. The first chart shows AFRs for 14 models with an average age of 60 months or less, and the second shows models with an average age of more than 60 months. The split was chosen because 60 months is a typical warranty period for enterprise-class hard drives.
In the first chart, drives in quadrant I are performing well, with an AFR below 1.5%; those in quadrant II are performing acceptably, with an AFR above 1.5%; and models in quadrant IV are relatively new, so their failure profiles are only beginning to emerge. Quadrant III is empty. In the second chart, quadrant I again contains well-performing models, quadrants II and III hold the “drives we should worry about”, and quadrant IV contains a single model that gives no cause for concern.
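The split between the two charts and the 1.5% AFR boundary can be sketched as a small classifier. This is an illustration under stated assumptions, not Backblaze's tooling: the function name is hypothetical, it encodes only the thresholds stated in the report (at least 1 million drive days, the 60-month average-age split, and the 1.5% AFR line), and it does not try to reproduce the exact quadrant boundaries, which depend on axis ranges not given here.

```python
from typing import Optional

MIN_DRIVE_DAYS = 1_000_000   # models below this are left off the charts
WARRANTY_MONTHS = 60         # typical enterprise warranty; separates chart 1 from chart 2
AFR_THRESHOLD = 1.5          # percent; "performing well" sits below this line


def place_model(avg_age_months: float, lifetime_afr: float, drive_days: int) -> Optional[tuple[int, str]]:
    """Return (chart number, AFR band) for a model, or None if it is not charted."""
    if drive_days < MIN_DRIVE_DAYS:
        return None
    chart = 1 if avg_age_months <= WARRANTY_MONTHS else 2
    # Exactly 1.5% is treated as "high" here; the report does not specify.
    band = "low" if lifetime_afr < AFR_THRESHOLD else "high"
    return chart, band


print(place_model(avg_age_months=42, lifetime_afr=0.8, drive_days=3_000_000))  # (1, 'low')
print(place_model(avg_age_months=75, lifetime_afr=2.3, drive_days=5_000_000))  # (2, 'high')
```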
A third chart shows how failure rates evolve. It tracks the failure rate over the entire service life of the nine models older than 60 months; for clarity, the curves start at 24 months. Most curves fall in quadrants I and II, and five of the nine models were in quadrant I as of Q2 2024. Models whose lines are almost vertical (red, brown, and purple) show a stable failure rate over time. The models on the blue and gray lines show failure rates that increase with age, although the blue one (Seagate ST8000DM002) is still within the normal range, since its AFR stayed around 1% for the first 60 months. The three models that reached quadrant III have similar profiles: their curves bend further to the right as the failure rate increases. Finally, the black line represents a 4TB Seagate model that is being actively migrated and replaced by other drives.
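One way to build a curve like those in the third chart is to recompute the lifetime AFR at each month of age. The sketch below assumes that each point on a curve is the cumulative AFR up to that age and that per-model totals are available month by month; the input layout and the function name are hypothetical. A near-vertical line corresponds to values that barely change from month to month, while a curve bending to the right corresponds to a rising cumulative AFR.

```python
from itertools import accumulate


def lifetime_afr_curve(
    days_per_month: list[int],
    failures_per_month: list[int],
    start_month: int = 24,
) -> list[tuple[int, float]]:
    """Return (age in months, cumulative AFR %) pairs from start_month onward.

    Index i of each input list holds the totals for month i + 1 of service
    (hypothetical layout).
    """
    if len(days_per_month) != len(failures_per_month):
        raise ValueError("input lists must have the same length")
    cum_days = list(accumulate(days_per_month))
    cum_failures = list(accumulate(failures_per_month))
    curve = []
    for age in range(start_month, len(days_per_month) + 1):
        days = cum_days[age - 1]
        if days == 0:
            continue  # no operating time yet, so AFR is undefined
        curve.append((age, cum_failures[age - 1] / (days / 365) * 100))
    return curve
```

With made-up data of 36,500 drive days and one failure every month, the curve stays at about 1% from month 24 to month 30 (a vertical line); if failures only begin after month 24, the cumulative AFR climbs from 0% and the curve bends to the right.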