2022/10/07

TrueNAS (FreeNAS) - 7 years later

Seven years after the original FreeNAS build, a lot has changed. Guess it's time to write a recap of the additions and modifications to my NAS, and my experiences along the way. It's gonna be a long one; for the impatient, here is the TL;DR:

  • Maxed the RAM to 128GB
  • Added a PCIe NVMe expansion card, and
    • Replaced the USB boot devices with mirrored NVMe drives
    • Added another NVMe drive for L2ARC
  • Upgraded the disks in place to 8TB
    • Shucked external drives and taped a pin on the SATA power connector
    • Stress-tested every disk with smartctl and badblocks
    • Resilvered the array one disk at a time
  • Upgraded the switch to MultiGig (10G/2.5G), finally unleashing the onboard 10G NIC - not directly related to the NAS itself
  • Peak CIFS got a nice bump, from 3.5Gbps to 5.45Gbps


If you are still interested, keep going...


At the time, the total cost of building the NAS was around $2,500 USD. Depreciating it to zero over 7 years, with 12TB of usable storage, that's about $2.5/TB/month. If we only count the disks ($1,500), that's about $1.5/TB/month.
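Spelled out, the arithmetic is simply cost / (months × usable TB):

    2500 / (84 × 12) ≈ $2.48/TB/month   (whole box)
    1500 / (84 × 12) ≈ $1.49/TB/month   (disks only)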

Since I paid for all the RAM slots, maxing the RAM to 128GB was a no-brainer knee reflex, and a few fat VMs will happily eat however much RAM you give them. It provided enough room to run VMs on the NAS, but it also means that whenever you need to turn off the NAS, the VMs and the services they host go offline with it. So at the end of the day I decided to move all the services to docker on a different machine.

The PCIe NVMe expansion card was an unplanned upgrade. FreeNAS used to be loaded from USB sticks and run from RAM, but since 11.1 the boot device sees frequent writes, and the official recommendation is to install on proper drives. (Of course, I only learned this after one of my USB boot devices went belly up.) When I went looking for somewhere to add a mirrored boot device, there was nothing left onboard: the motherboard has 6 SATA ports and 1 NVMe M.2 slot, nothing else. I'd need to expand to fit a mirrored boot device.



With a single PCIe x16 slot left, I could choose an HBA, or this new shiny thingy, since my board happens to support PCIe bifurcation. My knees placed the order before my brain caught up, again. The rest was reinstalling FreeNAS on NVMe and mirroring the boot drive. And look, the card has some slots to spare! I paid for the whole speedometer, I'm gonna use the whole speedometer: another no-brainer, L2ARC on NVMe.
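For reference, attaching the extra NVMe drive as L2ARC boils down to a single zpool command; the TrueNAS UI does the same thing under the hood. The pool and device names below are placeholders, not my actual setup:

    # add the spare NVMe drive as an L2ARC (cache) vdev
    zpool add tank cache /dev/nvd2
    # confirm the cache vdev shows up and starts warming
    zpool iostat -v tank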

However, the biggest performance improvement came not from any one of these upgrades (or maybe from the combination of all of them): I upgraded my primary switch to MultiGig with this badass, and with the help of this the NAS is now on a 2x10G LACP bond. The limiter on the onboard 10G NICs was finally gone, and even on my puny 4TB WD Reds, the six-disk pool could do 3.5Gbps over CIFS consistently, which is very impressive.
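TrueNAS builds the LACP bond from its network settings UI; on plain FreeBSD the equivalent rc.conf looks roughly like this (ix0/ix1 are assumed interface names, and the switch ports have to be in a matching LACP group):

    ifconfig_ix0="up"
    ifconfig_ix1="up"
    cloned_interfaces="lagg0"
    ifconfig_lagg0="laggproto lacp laggport ix0 laggport ix1 DHCP"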



Fast forward: COVID happened, the California wildfires went full Blade Runner 2020, and then came the COVID variants. Fuck.

The disks were slowly filling up after 7 years of use, and a not-so-large 6x4TB RaidZ2 would eventually be full. But COVID put a huge strain on the supply chain, and between that and the "sustainable" Chia coin, disks got expensive, so the upgrade plan sat on the shelf for a while.

Then one day I saw this deal at a nearby Costco. Problem solved, with a few new minor problems:

The practice of taking the contents out of the shell is called shucking, like shucking oysters; in this case the shell is the USB enclosure and the juicy meat is the hard drive. (It would be spot on if WD still used the Caviar name.) After watching a YouTube tutorial, all the disks were out in 5 minutes. However, they were not ready to go into the NAS yet!

First, the SATA power on these drives needs a small hack: cover the 3rd pin with a piece of tape (a Molex-to-SATA adapter works too). On these newer drives, pin 3 carries the Power Disable signal, so a PSU that puts 3.3V on that pin will keep a shucked drive from ever spinning up.

 



After that comes the long part: the disk burn-in test.

smartctl is pretty standard: do a short test, then a long test. The point of the short test is to fail fast; there's no point spending days burning in a faulty disk that's still under warranty, as most manufacturers will just replace it.

Then comes badblocks, specifically a write-read pass. (WARNING: IT WILL WIPE THE DISK!) For this particular 8TB drive, it took 110 hours to finish.

Finally, another smartctl long test to check that every block is still fine. So, 5 days in total just to test the disks, and since my spare system has only 4 SATA ports, I had to run 2 batches (10 days). If you can, hook up all the disks and stress them in one go. The whole per-disk sequence is sketched below.
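For the record, the sequence per disk looks roughly like this; the device name is a placeholder, and the badblocks step is the destructive one:

    # fail fast: a short self-test only takes a couple of minutes
    smartctl -t short /dev/ada1
    smartctl -a /dev/ada1              # review the result once it completes
    # full surface read
    smartctl -t long /dev/ada1
    # destructive write-read pass; -b 4096 is needed on big drives,
    # and a larger -c tests more blocks at a time
    badblocks -b 4096 -c 65536 -ws /dev/ada1
    # one more long test to confirm nothing died along the way
    smartctl -t long /dev/ada1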

(When running badblocks, use a larger -c so it runs faster. Another interesting observation from the burn-in: 2 out of the 6 disks ran badblocks about 2% faster. I wonder if there's a silicon-lottery factor in disk drives too, maybe the controller chips, or it's just normal variance.)

Once the burn-in finished and all the SMART values looked normal, it was just a matter of replacing the drives one by one and letting TrueNAS resilver. Each disk took about 10 hours, and it got faster as I went, probably because the new disks are 7200rpm (vs. the old 5400rpm 4TB WD Reds), so each swap nudged up the pool's read performance and shortened the next resilver. After the last resilver, the RaidZ2 pool expanded automatically to 8TB x (6-2).
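Under the hood, each swap is the usual ZFS replace dance; the TrueNAS UI wraps these exact steps, and the pool/device names here are placeholders:

    # take the old disk offline and pull it
    zpool offline tank gptid/old-disk-id
    # drop in the burned-in 8TB and kick off the resilver
    zpool replace tank gptid/old-disk-id gptid/new-disk-id
    # watch progress; let it finish before touching the next disk
    zpool status tank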

Finally, the CIFS test: a nice 5.45Gbps sustained on a cache miss (a fresh reboot with a mostly empty ARC/L2ARC), with the disks around 80% busy. This nice NAS box will keep rolling *slaps case* for a few more years, at least until I fully upgrade the LAN to 10GbE.
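If you want to reproduce this kind of test, a large sequential read against the mounted share will do; here's a sketch with fio, where the mount path and file size are made up:

    # sequential 1M reads from a file on the SMB mount
    fio --name=cifs-seqread --filename=/mnt/nas/bigfile --rw=read --bs=1M --size=20G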




 
