Hi! Thanks for the pointers. Unfortunately, `dmesg` and the system logs were the first places I looked, but I found nothing at the time. I tried again just now to give you the output after a `zpool clear`; you can obviously ignore the failed email attempt. `journalctl`:
Jun 07 08:06:24 truenas kernel: WARNING: Pool 'tank-02' has encountered an uncorrectable I/O failure and has been suspended.
Jun 07 08:06:24 truenas zed[799040]: eid=309 class=statechange pool='tank-02' vdev=xxx-xxx-xxx-xxx-xxx vdev_state=ONLINE
Jun 07 08:06:24 truenas zed[799049]: eid=310 class=statechange pool='tank-02' vdev=xxx-xxx-xxx-xxx-xxx vdev_state=FAULTED
Jun 07 08:06:24 truenas zed[799057]: eid=313 class=data pool='tank-02' priority=3 err=28 flags=0x20004000 bookmark=0:0:0:1
Jun 07 08:06:24 truenas zed[799058]: eid=311 class=vdev_clear pool='tank-02' vdev=xxx-xxx-xxx-xxx-xxx vdev_state=FAULTED
Jun 07 08:06:24 truenas zed[799067]: eid=312 class=data pool='tank-02' priority=3 err=28 flags=0x20004000 bookmark=0:62:0:0
Jun 07 08:06:24 truenas zed[799081]: eid=316 class=io_failure pool='tank-02'
Jun 07 08:06:24 truenas zed[799082]: eid=315 class=data pool='tank-02' priority=3 err=28 flags=0x20004000 bookmark=0:0:-1:0
Jun 07 08:06:24 truenas zed[799090]: eid=314 class=data pool='tank-02' priority=3 err=28 flags=0x20004000 bookmark=0:0:1:0
Jun 07 08:06:24 truenas find_alias_for_smtplib.py[799114]: sending mail to
To: root
Subject: ZFS device fault for pool tank-02 on truenas
MIME-Version: 1.0
Content-Type: text/plain; charset="ANSI_X3.4-1968"
Content-
Jun 07 08:06:24 truenas find_alias_for_smtplib.py[799114]: No aliases found to send email to root
Jun 07 08:06:24 truenas zed[799144]: error: statechange-notify.sh: eid=310: mail exit=1
`dmesg` says even less.
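Since the zed lines above all share one timestamp and are not printed in `eid` order, here is a minimal Python sketch of how they could be re-sorted by event id to read the sequence more easily. The regexes and the `parse_zed_events` helper are my own assumptions based only on the lines pasted above, not an official zed format parser.

```python
import re

# Sample lines copied from the journalctl output above.
LOG = """\
Jun 07 08:06:24 truenas zed[799040]: eid=309 class=statechange pool='tank-02' vdev=xxx-xxx-xxx-xxx-xxx vdev_state=ONLINE
Jun 07 08:06:24 truenas zed[799049]: eid=310 class=statechange pool='tank-02' vdev=xxx-xxx-xxx-xxx-xxx vdev_state=FAULTED
Jun 07 08:06:24 truenas zed[799057]: eid=313 class=data pool='tank-02' priority=3 err=28 flags=0x20004000 bookmark=0:0:0:1
Jun 07 08:06:24 truenas zed[799058]: eid=311 class=vdev_clear pool='tank-02' vdev=xxx-xxx-xxx-xxx-xxx vdev_state=FAULTED
Jun 07 08:06:24 truenas zed[799067]: eid=312 class=data pool='tank-02' priority=3 err=28 flags=0x20004000 bookmark=0:62:0:0
Jun 07 08:06:24 truenas zed[799081]: eid=316 class=io_failure pool='tank-02'
Jun 07 08:06:24 truenas zed[799082]: eid=315 class=data pool='tank-02' priority=3 err=28 flags=0x20004000 bookmark=0:0:-1:0
Jun 07 08:06:24 truenas zed[799090]: eid=314 class=data pool='tank-02' priority=3 err=28 flags=0x20004000 bookmark=0:0:1:0
Jun 07 08:06:24 truenas zed[799144]: error: statechange-notify.sh: eid=310: mail exit=1
"""

# Only lines where the zed message itself starts with "eid=" are events;
# the "error: statechange-notify.sh" line is skipped on purpose.
ZED_LINE = re.compile(r"zed\[\d+\]: (eid=\d+ .*)$")
# key=value pairs, where the value is either 'quoted' or a run of non-spaces.
FIELD = re.compile(r"(\w+)=('[^']*'|\S+)")


def parse_zed_events(text):
    """Return the zed events found in text as dicts, sorted by eid."""
    events = []
    for line in text.splitlines():
        match = ZED_LINE.search(line)
        if not match:
            continue
        fields = {key: value.strip("'") for key, value in FIELD.findall(match.group(1))}
        fields["eid"] = int(fields["eid"])
        events.append(fields)
    return sorted(events, key=lambda event: event["eid"])


if __name__ == "__main__":
    for event in parse_zed_events(LOG):
        print(event["eid"], event["class"], event.get("vdev_state", ""))
```

This only reorders what is already in the log; it does not add any information about why the vdev went to FAULTED.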
I also tried rebooting the machine with the drive detached and then attaching it at runtime while tailing `dmesg` and `journalctl`. They are pretty verbose, so I will only add the interesting parts here (I didn’t notice anything new, however):
[...]
[ 221.952569] usb 2-4: Enable of device-initiated U1 failed.
[ 221.954164] usb 2-4: Enable of device-initiated U2 failed.
[ 221.965756] usbcore: registered new interface driver usb-storage
[ 221.983528] usb 2-4: Enable of device-initiated U1 failed.
[ 221.983997] usb 2-4: Enable of device-initiated U2 failed.
[ 221.987603] scsi host2: uas
[ 221.987831] usbcore: registered new interface driver uas
[...]
[ 222.040564] sd 2:0:0:0: Attached scsi generic sg1 type 0
[ 222.049860] sd 2:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[ 222.051867] sd 2:0:0:0: [sdb] Write Protect is off
[ 222.051879] sd 2:0:0:0: [sdb] Mode Sense: 37 00 00 08
[ 222.056719] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 222.058407] sd 2:0:0:0: [sdb] Preferred minimum I/O size 512 bytes
[ 222.058413] sd 2:0:0:0: [sdb] Optimal transfer size 33553920 bytes
[ 222.252607] sdb: sdb1
[ 222.253015] sd 2:0:0:0: [sdb] Attached SCSI disk
[ 234.935926] usb 2-4: USB disconnect, device number 2
[ 234.983962] sd 2:0:0:0: [sdb] Synchronizing SCSI cache
[ 235.227936] sd 2:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[...]
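To put a number on how long the disk stayed attached before the USB disconnect, here is a small Python sketch that reads the bracketed timestamps from the excerpt above. The helper names (`first_timestamp`, `seconds_between`) are hypothetical, and it assumes the default `dmesg` format with seconds since boot in square brackets.

```python
import re

# Two lines copied from the dmesg excerpt above.
DMESG = """\
[ 222.253015] sd 2:0:0:0: [sdb] Attached SCSI disk
[ 234.935926] usb 2-4: USB disconnect, device number 2
"""

# "[ seconds.micros] message" as printed by dmesg without -T.
STAMPED = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def first_timestamp(text, needle):
    """Return the timestamp of the first line whose message contains needle."""
    for line in text.splitlines():
        match = STAMPED.match(line)
        if match and needle in match.group(2):
            return float(match.group(1))
    return None


def seconds_between(text, start_needle, end_needle):
    """Return the seconds elapsed between two messages, or None if either is missing."""
    start = first_timestamp(text, start_needle)
    end = first_timestamp(text, end_needle)
    if start is None or end is None:
        return None
    return end - start


if __name__ == "__main__":
    gap = seconds_between(DMESG, "Attached SCSI disk", "USB disconnect")
    print(f"disk attached for {gap:.3f} s before the disconnect")
```

For the excerpt above this works out to roughly 12.7 seconds between the disk being attached and the disconnect.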
Thanks for the advice, it was worth another try. Does anything else come to mind?
Hi,
Thanks for sharing. I agree with you 100%, and I think everybody commenting here does. The whole point of the thread, however, was to understand if and how you can identify the location of the problem without guessing. In reality, I came to the conclusion that people… don’t. Like you said, people know ZFS is fussy about how it talks to the disks, and at the slightest issue it throws a tantrum. So people just swap things until they work (or buy expensive motherboards with many ports). I don’t like the idea of not knowing “why”, so I will just add to my notes that, for my specific use case, I cannot trust ZFS + OS (TrueNAS SCALE) to use the USB disk for backups via ZFS send/receive.
I would like to add that I am not trying to mirror my main disk with a USB one. I just wanted to copy the ZFS snapshots to the USB drive once a day at midnight. ZFS is just (don’t throw stones at me for this, it is just my opinion) too brittle to use this way too. I mean, when I try to clean/recover the pool it just refuses (and nothing is writing to it).
In my case there was no switching, however. It was a single NVMe drive on a single USB line in an enclosure. It was a separate stripe that just received data once a day.
Not without good logs or debugging tools.
I decided I cannot trust it, so unfortunately I will take the USB enclosure with the NVMe drive, format it with ext4, and use Kopia to back up the datasets there once a day. It is not what I wanted, but it is the best I can get for now.
As for better solutions for my play-NAS in general, I am constrained by the ports I have. I (again, a personal choice; I understand people disagree with this) don’t want to go SATA. Unfortunately, since I could not find any PCIe switch with the ASM2812I (https://www.asmedia.com.tw/product/866yq74SPBqRdtgC/7c5YQ79xz8urEGr1), I am unable to get more out of my M.2 NVMe PCIe 3.0 x4 slot (speed loss is not an issue for me; my main bottleneck is the network). It is interesting how you can find many more interesting attempts at this in the Pi ecosystem but not for mini PCs.