Smartctl

From wikinotes
Revision as of 03:07, 9 May 2021 by Will

The most complete guide I've seen on interpreting SMART HDD info, written by the author of the tool: https://www.linuxjournal.com/article/6983?page=0%2C1

Documentation

Wikipedia: SMART attributes https://en.wikipedia.org/wiki/S.M.A.R.T.#ATA_S.M.A.R.T._attributes

Tutorials

Watching a hard drive die (in smartctl) http://notemagnet.blogspot.com/2009/10/watching-hard-drive-die.html
Dan Langille: monitoring HDD health https://www.freebsddiary.org/smart.php

Install

sudo pacman -S smartmontools    # ArchLinux
sudo pkg install smartmontools  # FreeBSD

Usage

Error Logs

sudo smartctl -l error /dev/sda  # print the SMART error log (see the man page for help reading the result)
...
Error 2 occurred at disk power-on lifetime: 153 hours (6 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 08      00:00:17.300  IDENTIFY DEVICE

Error 1 occurred at disk power-on lifetime: 154 hours (6 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 01 00 00 a0  Error: UNC 1 sectors at LBA = 0x00000001 = 1

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 01 01 00 00 a0 ff      11:17:17.000  READ DMA EXT
  25 03 01 01 00 00 a0 ff      11:17:17.000  READ DMA EXT
  25 03 30 5e 00 d4 48 04      11:17:05.800  READ DMA EXT
  25 03 40 4f 00 d4 40 00      11:16:56.600  READ DMA EXT
  35 03 08 4f 00 9c 40 00      11:16:56.600  WRITE DMA EXT
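
To judge whether logged errors are recent, it helps to pull out just the power-on hour of each entry and compare it with the drive's current Power_On_Hours attribute. A minimal sketch, assuming the entry headers look like the ones in the log above; error_hours is a hypothetical helper name, not part of smartmontools:

```shell
# Hypothetical helper: read `smartctl -l error` output on stdin and print
# one "<error number> <power-on hours>" pair per logged error.
error_hours() {
    sed -n 's/^Error \([0-9][0-9]*\) occurred at disk power-on lifetime: \([0-9][0-9]*\) hours.*/\1 \2/p'
}

# Usage (needs root and a real device):
#   sudo smartctl -l error /dev/sda | error_hours
```

For the log above this prints "2 153" and "1 154", which can be read against the Power_On_Hours raw value to see how old each error is.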

Auto Health Assessment (not always accurate)

sudo smartctl -H /dev/sda        # overall health assessment
SMART overall-health self-assessment test result: PASSED
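
Besides the printed result, smartctl also reports problems through its exit status, which is a bitmask described in the man page. A rough sketch that decodes a few of those bits for use in scripts; smart_status_flags is a hypothetical helper, and only a subset of the bits is covered here:

```shell
# Hypothetical helper: decode some bits of smartctl's exit status.
# Bit values follow the smartctl man page; the message wording is my own.
smart_status_flags() {
    status=$1
    if [ $((status & 8)) -ne 0 ];   then echo "disk failing"; fi
    if [ $((status & 16)) -ne 0 ];  then echo "prefail attribute at or below threshold"; fi
    if [ $((status & 64)) -ne 0 ];  then echo "error log has entries"; fi
    if [ $((status & 128)) -ne 0 ]; then echo "self-test log has errors"; fi
    return 0
}

# Usage (needs root and a real device):
#   sudo smartctl -H /dev/sda >/dev/null; smart_status_flags $?
```

An empty result only means none of the decoded bits were set; it is no stronger a guarantee than the PASSED line itself.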

SmartCtl Attributes and values

sudo smartctl -a /dev/sda        # print all SMART information, including the attribute table


  • check the RAW_VALUE of Reallocated_Sector_Ct and Reallocated_Event_Count. If either is over 0, the drive has already remapped bad sectors and is starting to die.
  • check Power_On_Hours to verify that your error log entries pertain to the drive's current issues. Each error log entry records the power-on hour at which it occurred.
  • check Total_LBAs_Written on SSDs. SSDs tolerate a limited number of writes. The raw value may count LBAs (sectors) or a vendor-specific unit: on Intel drives, for example, it is incremented by 1 for every 32MB written. When it counts LBAs, multiply by the logical sector size, which you can find using fdisk -l. http://www.dewassoc.com/kbase/hard_drives/lba.htm
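
Converting the write counter into a data volume is simple arithmetic: multiply the raw value by the number of bytes each increment stands for. A small sketch, assuming either a 512-byte logical sector or the 32MB-per-increment Intel convention (taken here as 32 MiB, which is an assumption); written_gib is a hypothetical helper:

```shell
# Hypothetical helper: written_gib RAW_VALUE BYTES_PER_UNIT
# Prints the amount written in whole GiB (integer division, rounds down).
written_gib() {
    echo $(( $1 * $2 / 1073741824 ))
}

# Raw value counts 512-byte sectors:
#   written_gib 1000000000 512        # prints 476
# Raw value counts 32 MiB units (assumed Intel-style counter):
#   written_gib 3200 33554432         # prints 100
```

Check the drive's documentation for which unit applies before trusting the result.
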
...
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   092   092   016    Pre-fail  Always       -       2621443
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   111   111   024    Pre-fail  Always       -       600 (Average 660)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       74
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       7
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       154
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       69
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       77
193 Load_Cycle_Count        0x0012   100   100   050    Old_age   Always       -       77
194 Temperature_Celsius     0x0002   114   114   000    Old_age   Always       -       48 (Lifetime Min/Max 18/56)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       12
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       4
...
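
The reallocation check can be scripted by filtering the attribute table. A minimal sketch, assuming the ten-column layout shown above, where the attribute name is the second field and the raw value the tenth; nonzero_realloc_attrs is a hypothetical helper name:

```shell
# Hypothetical helper: read `smartctl -a` (or -A) output on stdin and print
# the reallocation attributes whose RAW_VALUE is greater than zero.
nonzero_realloc_attrs() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Reallocated_Event_Count)$/ && $10 + 0 > 0 {
            print $2 "=" $10
        }
    '
}

# Usage (needs root and a real device):
#   sudo smartctl -A /dev/sda | nonzero_realloc_attrs
```

Fed the table above, this prints Reallocated_Sector_Ct=7 and Reallocated_Event_Count=12; no output means both counters are still 0.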

Self Testing


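sudo smartctl -t short /dev/sda   # start a short self-test (runs in the background on the drive)
sudo smartctl -t long /dev/sda    # start a long self-test (runs in the background on the drive)

sudo smartctl -l selftest /dev/sda  # print the self-test log once the test has finished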
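
Because the test runs on the drive in the background, the log is only useful after waiting for it to finish. A rough sketch of that start, wait, read sequence; run_short_test and the SMARTCTL override are hypothetical, and the 120-second default wait is an assumption (smartctl prints the drive's own estimate when the test starts):

```shell
# Hypothetical helper: run_short_test DEVICE [WAIT_SECONDS]
# Starts a short self-test, waits, then prints the self-test log.
# SMARTCTL can be set to override the command (e.g. "sudo smartctl").
run_short_test() {
    dev=$1
    wait_s=${2:-120}   # assumed default; adjust to the estimate smartctl prints
    ${SMARTCTL:-smartctl} -t short "$dev" || return 1
    sleep "$wait_s"
    ${SMARTCTL:-smartctl} -l selftest "$dev"
}

# Usage (needs root and a real device):
#   SMARTCTL="sudo smartctl" run_short_test /dev/sda
```

A long test can take hours, so for that case it is more practical to start it and check the log later by hand.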