{"id":188,"date":"2014-11-29T22:29:24","date_gmt":"2014-11-30T06:29:24","guid":{"rendered":"http:\/\/nramkumar.org\/tech\/?p=188"},"modified":"2014-11-30T18:46:36","modified_gmt":"2014-12-01T02:46:36","slug":"replacing-a-failing-snapraid-parity-drive-on-my-ubuntu-home-server","status":"publish","type":"post","link":"https:\/\/nramkumar.org\/tech\/blog\/2014\/11\/29\/replacing-a-failing-snapraid-parity-drive-on-my-ubuntu-home-server\/","title":{"rendered":"Replacing a failing SnapRAID parity drive on my Ubuntu Home Server"},"content":{"rendered":"<p>About 3 weeks ago, I got an email notification from smartd that the regular SMART short self-test on one of the drives in my home server had failed. Specifically, the drive had been unable to read one of its sectors (the current pending sector count went from 0 to 1). It is hard to predict whether this type of error is transient or a precursor to drive failure. I re-ran the test and it passed, so I decided to wait and watch. Two days later, the same error was reported, but for a different sector. Given that my SnapRAID setup tolerates only a single disk failure, I decided to be prudent and order a new drive (<a title=\"HGST 3TB NAS Drive\" href=\"http:\/\/www.newegg.com\/Product\/Product.aspx?Item=N82E16822145911\" target=\"_blank\">this<\/a> one if you are curious). While waiting for it to arrive, I ran a long self-test on the failing drive (<a title=\"WD Green 2TB\" href=\"http:\/\/www.newegg.com\/Product\/Product.aspx?Item=N82E16822136514\" target=\"_blank\">this<\/a> one, with slightly over 3.5 years of use). The long test passed, but as before, two days later the scheduled short test failed again, this time on yet another sector. 
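<\/p>\n<p>If you want to check for the same symptom by hand, smartctl (part of smartmontools, the package that also provides smartd) can read the SMART attributes and start self-tests. Here is a minimal sketch rather than exactly what I ran; \/dev\/sdX is a placeholder for the suspect drive, and Current_Pending_Sector (attribute 197) is the counter that went from 0 to 1 in my case.<\/p>\n<pre>\r\n# Print the raw Current_Pending_Sector value (RAW_VALUE is the 10th column\r\n# of smartctl -A output); anything above 0 means the drive has sectors it\r\n# could not read. \/dev\/sdX is a placeholder for the suspect drive.\r\nsudo smartctl -A \/dev\/sdX | awk '\/Current_Pending_Sector\/ { print $10 }'\r\n# Start a long (extended) self-test; it runs inside the drive in the background\r\nsudo smartctl -t long \/dev\/sdX\r\n# Once the estimated test time has passed, review the self-test log\r\nsudo smartctl -l selftest \/dev\/sdX\r\n<\/pre>\n<p>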
The drive in question was the parity drive for my SnapRAID setup.<\/p>\n<p>Once the new drive arrived, the first thing I did was copy the parity file from the failing drive to an external hard drive. Per the <a title=\"SnapRAID - Replacing a parity drive\" href=\"http:\/\/snapraid.sourceforge.net\/faq.html#reppardisk\" target=\"_blank\">SnapRAID FAQ for replacing a parity drive<\/a>, the process goes a lot faster if you can salvage as much of the old parity data as possible (this is not required; it is simply an optimization).<\/p>\n<p>After that, I shut down the machine, swapped the failing drive for the new one and restarted my Ubuntu server (pro tip: if you&#8217;re running the OS off a USB stick like I am, remove the USB stick before moving the machine around or opening it up). At this point, the new drive shows up as a disk but is not mounted, because it doesn&#8217;t have a filesystem yet. The boot process will also seem a bit perilous, because the system notices that one of the drives it is supposed to mount (via \/etc\/fstab) is missing. You can tell Ubuntu that it is OK to skip mounting that drive and continue booting.<\/p>\n<p>After the boot completed, the first order of business was to partition the drive and create a filesystem. For drives larger than 2TB, you&#8217;re best off using parted: an MBR partition table can&#8217;t address more than 2TiB, so the drive needs a GPT label, which parted handles well. In my case, I simply created a single GPT partition spanning the whole drive. With the partition in place, I used mkfs.ext4 to create an ext4 filesystem on it. Once the drive was ready, I used blkid to look up the partition&#8217;s UUID and replaced the old drive&#8217;s UUID in \/etc\/fstab with the new one. Running mount -a as root then mounts the new drive, along with any other entries in \/etc\/fstab (such as mounts that depend on it) that aren&#8217;t mounted yet. 
Assuming \/dev\/sde is the new drive that needs to be prepared, here are the steps:<\/p>\n<pre>\r\n# Double-check the device name first (e.g. with lsblk): this wipes \/dev\/sde\r\n# Create a GPT label and one partition spanning the whole drive\r\nsudo parted -s \/dev\/sde mklabel gpt\r\nsudo parted -s -a optimal \/dev\/sde mkpart primary ext4 0% 100%\r\n# Create an ext4 filesystem on the new partition\r\nsudo mkfs.ext4 \/dev\/sde1\r\n# Print the partition's UUID, then put it in \/etc\/fstab in place of the old one\r\nsudo blkid \/dev\/sde1\r\n# Mount everything listed in \/etc\/fstab\r\nsudo mount -a\r\n<\/pre>\n<p>Next, I copied the parity file from the external drive to the new drive. After that, I ran snapraid fix and spot-checked a few files to make sure the data looked OK. Finally, I ran snapraid sync manually to make sure everything was in sync.<\/p>\n<p>Overall, replacing the drive was extremely simple: I was up and running within 2 hours, including the time to physically swap the drive (and clean out a year&#8217;s worth of gunk inside the machine). The only wrinkle was that after snapraid fix, the Plex media server running on that machine seemed to have lost all its configured libraries. I suspect the restore left the Plex library database in an inconsistent state: the SnapRAID manual warns that fix reverts files to their state as of the last sync, so it may have rolled back a database Plex had updated since then, or I may have made things worse by running Plex during the restore. In any case, this was a very easy fix too: I copied over the most recent backup of my library database (Plex backs it up on a schedule you can configure), and things started working again.<\/p>\n<p>While I&#8217;m not thrilled that I had to replace a hard drive on the home server, I am very happy with how easy and seamless the process was and that I didn&#8217;t lose any data. Even more impressive was how quick the whole recovery was. 
SnapRAID remains an absolutely fantastic piece of work for a home server, and based on this experience, I have no hesitation in recommending it as the first choice for protecting a home NAS against drive failures.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>About 3 weeks ago, I got an email notification from smartd that the regular SMART short check on one of the drives in my home server failed. The specific failure was a failure to read one of the sectors in the disk (current pending sector count went to 1 from 0). Now, it is hard&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55,53,6],"tags":[],"class_list":["post-188","post","type-post","status-publish","format-standard","hentry","category-drive-replacement","category-snapraid","category-ubuntu"],"_links":{"self":[{"href":"https:\/\/nramkumar.org\/tech\/wp-json\/wp\/v2\/posts\/188","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nramkumar.org\/tech\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nramkumar.org\/tech\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nramkumar.org\/tech\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nramkumar.org\/tech\/wp-json\/wp\/v2\/comments?post=188"}],"version-history":[{"count":4,"href":"https:\/\/nramkumar.org\/tech\/wp-json\/wp\/v2\/posts\/188\/revisions"}],"predecessor-version":[{"id":192,"href":"https:\/\/nramkumar.org\/tech\/wp-json\/wp\/v2\/posts\/188\/revisions\/192"}],"wp:attachment":[{"href":"https:\/\/nramkumar.org\/tech\/wp-json\/wp\/v2\/media?parent=188"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nramkumar.org\/tech\/wp-json\/wp\/v2\/categories?post=188"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nramkumar.org\/tech\/wp-json\/wp\/v2\/tags?post=188"}
],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}