NVIDIA GeForce RTX Failure Rates


The GeForce RTX range of cards has been taking a lot of flak since launch, mainly from people disappointed with the performance and the lack of RTX support in games, but also partly for reportedly high failure rates, with users on the NVIDIA forums and Reddit reporting either black screens or artifacting.

We have already covered the performance, and why it is wrong to crucify the RTX cards for that reason, in this Technology Explained article, but we have not touched on the failure rates, mainly because we don’t have enough accurate data to do so.

Electronic failure rates follow a bathtub curve. Failure rates are highest near the beginning of a product’s life, a phase known as “infant mortality” that is caused mainly by poor manufacturing (this is inevitable, as not every single component can be perfectly manufactured; it’s the nature of the game), and again a good way down the line, due to wear and tear.

The so-called infant mortality failure rates start off high and drop rapidly, while the wear and tear failure rates start off low and gradually increase. The third type of failure is the random failures that stay fairly consistent during the entire period. These three combined cause the observed failure rate to follow the shape of a bathtub when plotted on a graph, from high to low to high. As the RTX cards have only just been launched, what we’re seeing at the moment is the initial high point.
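To make the shape concrete, here is a minimal sketch that adds the three components together. It assumes Weibull hazard functions for the infant mortality and wear and tear terms, which is a common textbook way to model them, and every parameter value is made up purely for illustration; none of it is measured data for any graphics card.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate: (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Sum of the three failure components at age t (in months, t > 0).

    All parameters below are illustrative assumptions, not real figures.
    """
    infant = weibull_hazard(t, shape=0.5, scale=200.0)  # shape < 1: starts high, falls
    random_failures = 0.001                              # constant over the whole life
    wear = weibull_hazard(t, shape=5.0, scale=60.0)      # shape > 1: starts low, rises
    return infant + random_failures + wear


months = [1, 6, 12, 24, 36, 48, 60]
rates = [bathtub_hazard(m) for m in months]

for m, r in zip(months, rates):
    print(f"month {m:2d}: hazard {r:.4f}")
```

With these assumed parameters the printed values fall for the first couple of years and then climb again, which is the high, low, high pattern described above.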

The news being reported is that the infant mortality failure rate is unusually high on the RTX cards, but nobody is actually looking at a decently sized data set. A thread on the NVIDIA forums or Reddit reporting 100 failures says nothing on its own: if 1,000 cards have been sold in total, the failure rate is an alarmingly high 10%, while if 100,000 cards have been sold in total, it’s an insignificant 0.1%.

If we see 100 GeForce GTX 10 series cards reportedly dying compared to 1,500 RTX cards dying, it’s easy to assume that the RTX is a far less reliable card with fifteen times as many failures. Yes, in hard numbers more have failed, but if 500 GTX 10 series cards were sold and 60,000 RTX cards, the failure rate for the GTX 10 series is 20% while the rate for the RTX series is 2.5%.
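The arithmetic behind these hypothetical figures is simple enough to sketch in a few lines. The function name is just an illustrative choice, and the inputs are the made-up numbers from the examples above, not real sales or failure data.

```python
def failure_rate_percent(failed, sold):
    """Return failures as a percentage of units sold."""
    if sold <= 0:
        raise ValueError("units sold must be positive")
    return 100.0 * failed / sold


# Hypothetical figures from the text, not real sales or failure data.
gtx_rate = failure_rate_percent(failed=100, sold=500)
rtx_rate = failure_rate_percent(failed=1_500, sold=60_000)

print(f"GTX 10 series: {gtx_rate}%")
print(f"RTX series:    {rtx_rate}%")
```

The raw failure count is fifteen times higher for the RTX cards in this example, yet the rate is eight times lower, which is why the count alone tells you nothing.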

Without the total number of cards sold to compare against the total number that have failed, you cannot calculate the failure rate as a percentage. And without being able to compare that percentage for the RTX cards against that of the GeForce GTX 10 series cards (and preferably several generations before that), it’s impossible to say whether the failure rate is lower than normal, higher than normal, or on par with the norm.

There are two problems faced in getting the information required to make an accurate statement regarding the reliability of these cards and whether they’re fundamentally flawed. The first is getting access to a large enough data set including the total number of cards sold, as just knowing the number of failures alone is useless. Vendors don’t openly share this information, so we’re left with having to track down the information ourselves, which brings the second problem.

Journalism as we once knew it is all but dead, unfortunately, and it’s easier for the masses to run with a story of how the cards have a disturbingly high failure rate than to research the actual percentage of failures. A news headline such as “NVIDIA GeForce RTX Cards Are Dying Left, Right and Centre” is something that catches attention, and no media outlet (or YouTuber) wants to miss out on the wave of interest it causes. By the time the research has been completed it’s likely already old news, and they’ve missed the opportunity to maximize views.

Thankfully we have a source that solves the first problem, in the form of Roman “der8auer” Hartung, an extreme overclocker who works for CaseKing, the largest retailer in Europe. As you can imagine, being the largest in Europe gives them access to a much larger data set than most, with enough information available to give a far more accurate indication of how good or bad the situation really is.

While he can’t give the sales figures, he can give us something even more useful in the failure rates that CaseKing is experiencing. CaseKing doesn’t sell the NVIDIA Founders Edition cards, only selling the partner cards. As such, these statistics can’t necessarily be applied directly to the Founders Edition cards, but should still give you an overall idea as to the reliability of the GPUs.

That said, several partners have cards which are based on the reference PCB, so the numbers aren’t necessarily completely exclusive of the Founders Edition cards, which follow a very similar design.

For the GeForce GTX 1080 the failure rate they saw was 7.1% out of “thousands sold”, for the GeForce GTX 1080 Ti it was 4.6% out of “thousands sold” (the figure is lower as the cards haven’t been out for as long as the GTX 1080), for the RTX 2080 it was 0.17% out of “nearly a four digit number sold”, and lastly for the RTX 2080 Ti it was 1.4% out of an unknown number sold. To give some perspective, CaseKing considers a failure rate of less than 3% as very good, and only once failure rates get to more than 15-20% is a product considered problematic.
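As a rough way to line the reported figures up against those thresholds, here is a small sketch. The rates are the ones quoted above. Two things are assumptions on our part: the problematic cut-off is placed at 15%, the lower end of the quoted 15-20% range, and the “in between” label for anything between the two thresholds is ours, as CaseKing gave no name for that band.

```python
# Failure rates in percent, as quoted above.
reported_rates = {
    "GeForce GTX 1080": 7.1,
    "GeForce GTX 1080 Ti": 4.6,
    "GeForce RTX 2080": 0.17,
    "GeForce RTX 2080 Ti": 1.4,
}

VERY_GOOD_BELOW = 3.0     # "less than 3%" is considered very good
PROBLEMATIC_ABOVE = 15.0  # assumption: lower end of the quoted 15-20% range


def classify(rate_percent):
    """Place a failure rate into one of three bands."""
    if rate_percent < VERY_GOOD_BELOW:
        return "very good"
    if rate_percent > PROBLEMATIC_ABOVE:
        return "problematic"
    return "in between"  # our own label for the unnamed middle band


labels = {card: classify(rate) for card, rate in reported_rates.items()}

for card, label in labels.items():
    print(f"{card}: {reported_rates[card]}% -> {label}")
```

Under these assumptions both RTX cards land in the very good band and none of the four cards comes anywhere near the problematic one.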

To understand these numbers, one needs to keep in mind that a higher end product has a much higher chance of failure than an entry level product, as not only is it more likely to be run in a stressful environment, but there are also more components involved in the individual piece of hardware, increasing the number of potential points of failure.

Overall, the failure rates are not worth worrying about at this stage. If a distributor the size of CaseKing says there’s nothing to worry about, you can safely take their word that there really is nothing to worry about with RTX failure rates, and you can buy with confidence regardless of what the news is reporting on the basis of the very small sample sizes available to it.

Source: der8auer on YouTube