[zfs-discuss] Big send/receive hangs on 2009.06
Posted: Feb 8, 2010 5:35 AM
So, I was running my full backup last night, backing up my main data pool zp1, and it seems to have hung.
Any suggestions for additional data gathering?
-bash-3.2$ zpool status zp1
  pool: zp1
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zp1         ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0

errors: No known data errors
The backup goes to one of my external USB drives, which holds pool bup-wrack:
-bash-3.2$ zpool status bup-wrack
  pool: bup-wrack
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        bup-wrack   ONLINE       0     0     0
          c7t0d0    ONLINE       0     0     0

errors: No known data errors
The line in the script that starts the send and receive is
zfs send -Rv "$srcsnap" | zfs recv -Fudv "$BUPPOOL/$HOSTNAME/$FS"
The -v causes the start and stop of each incremental stream to be announced, of course. The last output from it was:
sending from @bup-20090315-190807UTC to zp1/ddb@bup-20090424-034702UTC
receiving incremental stream of zp1/ddb@bup-20090424-034702UTC into bup-wrack/fsfs/zp1/ddb@bup-20090424-034702UTC
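A note on the names in that output: as far as the documented behavior of zfs recv -d goes, the pool name is dropped from the sent snapshot's path and the remainder is appended to the receive target. The sketch below only illustrates that name mapping; recv_target is a hypothetical helper, not a zfs command, and it does not touch any pool.

```shell
# recv_target DEST SENT_SNAPSHOT
# Illustrates the name mapping of "zfs recv -d DEST": drop the leading
# pool name from the sent snapshot path and append the rest to DEST.
# (Hypothetical helper for illustration only.)
recv_target() {
    dest=$1
    sent=$2
    case "$sent" in
        */*) echo "$dest/${sent#*/}" ;;   # child dataset: strip "pool/"
        *)   echo "$dest@${sent#*@}" ;;   # snapshot of the pool's root dataset
    esac
}
```

With the target bup-wrack/fsfs/zp1 from the ps output, recv_target bup-wrack/fsfs/zp1 zp1/ddb@bup-20090424-034702UTC gives the same destination name that the receive announced.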
And it appears hung when I got up this morning. No activity on the drive, zpool iostat shows no activity on the backup pool and no unexplained activity on the data pool. The server is responsive, and the data pool is responsive. ps shows considerable accumulated time on the backup and receive processes, but no change in the last half hour.
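The "no change in the last half hour" check can be made repeatable by sampling the accumulated CPU time twice and comparing. This is a minimal sketch, assuming the TIME value is in MM:SS or HH:MM:SS form (a day prefix such as 1-02:03:04 is not handled); cpu_seconds and cpu_progress are hypothetical helper names.

```shell
# Convert a ps TIME value (MM:SS or HH:MM:SS) to seconds.
cpu_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Compare two TIME samples of the same process, taken some time apart.
cpu_progress() {
    before=$(cpu_seconds "$1")
    after=$(cpu_seconds "$2")
    if [ "$after" -gt "$before" ]; then
        echo "progressing"
    else
        echo "stalled"
    fi
}

# Example use against the receive process (PID 3153 in the ps output):
#   t1=$(ps -o time= -p 3153); sleep 1800; t2=$(ps -o time= -p 3153)
#   cpu_progress "$t1" "$t2"
```

A process blocked in the kernel accumulates no CPU time, so "stalled" here is consistent with a hang but does not prove one; an idle process looks the same.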
zpool list shows that quite a lot of data has not yet been transferred to the backup pool (which was newly-created when this backup started).
-bash-3.2$ zpool list
NAME        SIZE   USED   AVAIL  CAP   HEALTH  ALTROOT
bup-wrack   928G   438G   490G   47%   ONLINE  /backups/bup-wrack
rpool       74G    6.35G  67.7G  8%    ONLINE  -
zp1         744G   628G   116G   84%   ONLINE  -
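As a rough cross-check of how much is still missing, the USED columns can be subtracted (628G on zp1 against 438G on bup-wrack). This is only an estimate, assuming the backup pool ends up using about as much space as the source; used_gap_gb is a hypothetical helper that parses zpool list output in the column layout shown above.

```shell
# used_gap_gb SRC_POOL DST_POOL
# Reads `zpool list` output on stdin and prints SRC USED minus DST USED
# in gigabytes. Assumes USED is the third column and carries a K/M/G/T
# suffix, as in the listing above. (Hypothetical helper.)
used_gap_gb() {
    awk -v src="$1" -v dst="$2" '
        function gb(v,   n, u) {
            n = v + 0                    # numeric prefix, e.g. 438 from "438G"
            u = substr(v, length(v))     # trailing unit letter
            if (u == "T") n *= 1024
            else if (u == "M") n /= 1024
            else if (u == "K") n /= 1048576
            return n
        }
        $1 == src { s = gb($3) }
        $1 == dst { d = gb($3) }
        END { print s - d }'
}

# Example use:
#   zpool list | used_gap_gb zp1 bup-wrack
```

For the listing above this comes to about 190G not yet on the backup pool.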
ps -ef shows
root  3153  3145   0 23:09:07 pts/3      19:59 zfs recv -Fudv bup-wrack/fsfs/zp1
root  3145  3130   0 23:09:04 pts/3       0:00 /bin/bash ./bup-backup-full zp1 bup-wrack
root  3152  3145   0 23:09:07 pts/3      17:06 zfs send -Rv zp1@bup-20100208-050907GMT
zfs list shows:
-bash-3.2$ zfs list -t snapshot,filesystem -r zp1
NAME                                USED   AVAIL  REFER  MOUNTPOINT
zp1                                 628G   104G   33.8M  /home
zp1@bup-20090223-033745UTC          0      -      33.8M  -
zp1@bup-20090225-184857UTC          0      -      33.8M  -
zp1@bup-20090302-032437UTC          0      -      33.8M  -
zp1@bup-20090309-033514UTC          0      -      33.8M  -
zp1@bup-20090315-190807UTC          0      -      33.8M  -
zp1@bup-20090424-034702UTC          22K    -      33.8M  -
zp1@bup-20090619-063536GMT          0      -      33.8M  -
zp1@bup-20090619-143851UTC          0      -      33.8M  -
zp1@bup-20090804-024506UTC          0      -      33.8M  -
zp1@bup-20090906-192431UTC          0      -      33.8M  -
zp1@bup-20100102-035216UTC          0      -      33.8M  -
zp1@bup-20100102-184101UTC          0      -      33.8M  -
zp1@bup-20100208-050707GMT          0      -      33.8M  -
zp1@bup-20100208-050907GMT          0      -      33.8M  -
zp1/ddb                             494G   104G   452G   /home/ddb
zp1/ddb@bup-20090223-033745UTC      5.12M  -      326G   -
zp1/ddb@bup-20090225-184857UTC      4.15M  -      328G   -
zp1/ddb@bup-20090302-032437UTC      16.6M  -      329G   -
zp1/ddb@bup-20090309-033514UTC      8.95M  -      330G   -
zp1/ddb@bup-20090315-190807UTC      35.3M  -      330G   -
zp1/ddb@bup-20090424-034702UTC      140M   -      345G   -
zp1/ddb@bup-20090619-063536GMT      43.9M  -      386G   -
zp1/ddb@bup-20090619-143851UTC      44.9M  -      386G   -
zp1/ddb@bup-20090804-024506UTC      4.30G  -      418G   -
zp1/ddb@bup-20090906-192431UTC      8.43G  -      440G   -
zp1/ddb@bup-20100102-035216UTC      4.13G  -      435G   -
zp1/ddb@bup-20100102-184101UTC      108M   -      431G   -
zp1/ddb@bup-20100208-050707GMT      142K   -      452G   -
zp1/ddb@bup-20100208-050907GMT      140K   -      452G   -
zp1/jmf                             33.5G  104G   33.3G  /home/jmf
zp1/jmf@bup-20090223-033745UTC      0      -      33.2G  -
zp1/jmf@bup-20090225-184857UTC      0      -      33.2G  -
zp1/jmf@bup-20090302-032437UTC      0      -      33.2G  -
zp1/jmf@bup-20090309-033514UTC      0      -      33.2G  -
zp1/jmf@bup-20090315-190807UTC      0      -      33.2G  -
zp1/jmf@bup-20090424-034702UTC      0      -      33.3G  -
zp1/jmf@bup-20090619-063536GMT      0      -      33.3G  -
zp1/jmf@bup-20090619-143851UTC      0      -      33.3G  -
zp1/jmf@bup-20090804-024506UTC      0      -      33.3G  -
zp1/jmf@bup-20090906-192431UTC      42K    -      33.3G  -
zp1/jmf@bup-20100102-035216UTC      0      -      33.3G  -
zp1/jmf@bup-20100102-184101UTC      0      -      33.3G  -
zp1/jmf@bup-20100208-050707GMT      0      -      33.3G  -
zp1/jmf@bup-20100208-050907GMT      0      -      33.3G  -
zp1/lydy                            31.1G  104G   31.1G  /home/lydy
zp1/lydy@bup-20090223-033745UTC     0      -      31.1G  -
zp1/lydy@bup-20090225-184857UTC     0      -      31.1G  -
zp1/lydy@bup-20090302-032437UTC     0      -      31.1G  -
zp1/lydy@bup-20090309-033514UTC     0      -      31.1G  -
zp1/lydy@bup-20090315-190807UTC     0      -      31.1G  -
zp1/lydy@bup-20090424-034702UTC     0      -      31.1G  -
zp1/lydy@bup-20090619-063536GMT     0      -      31.1G  -
zp1/lydy@bup-20090619-143851UTC     0      -      31.1G  -
zp1/lydy@bup-20090804-024506UTC     0      -      31.1G  -
zp1/lydy@bup-20090906-192431UTC     0      -      31.1G  -
zp1/lydy@bup-20100102-035216UTC     0      -      31.1G  -
zp1/lydy@bup-20100102-184101UTC     0      -      31.1G  -
zp1/lydy@bup-20100208-050707GMT     0      -      31.1G  -
zp1/lydy@bup-20100208-050907GMT     0      -      31.1G  -
zp1/music                           24.8G  104G   24.8G  /home/music
zp1/music@bup-20090223-033745UTC    1.03M  -      24.3G  -
zp1/music@bup-20090225-184857UTC    619K   -      24.3G  -
zp1/music@bup-20090302-032437UTC    287K   -      24.3G  -
zp1/music@bup-20090309-033514UTC    0      -      24.3G  -
zp1/music@bup-20090315-190807UTC    0      -      24.3G  -
zp1/music@bup-20090424-034702UTC    1.38M  -      24.3G  -
zp1/music@bup-20090619-063536GMT    0      -      24.3G  -
zp1/music@bup-20090619-143851UTC    0      -      24.3G  -
zp1/music@bup-20090804-024506UTC    2.08M  -      24.8G  -
zp1/music@bup-20090906-192431UTC    2.04M  -      24.8G  -
zp1/music@bup-20100102-035216UTC    906K   -      24.8G  -
zp1/music@bup-20100102-184101UTC    932K   -      24.8G  -
zp1/music@bup-20100208-050707GMT    0      -      24.8G  -
zp1/music@bup-20100208-050907GMT    0      -      24.8G  -
zp1/pddb                            2.05G  104G   2.05G  /home/pddb
zp1/pddb@bup-20090223-033745UTC     0      -      2.05G  -
zp1/pddb@bup-20090225-184857UTC     0      -      2.05G  -
zp1/pddb@bup-20090302-032437UTC     0      -      2.05G  -
zp1/pddb@bup-20090309-033514UTC     0      -      2.05G  -
zp1/pddb@bup-20090315-190807UTC     0      -      2.05G  -
zp1/pddb@bup-20090424-034702UTC     0      -      2.05G  -
zp1/pddb@bup-20090619-063536GMT     0      -      2.05G  -
zp1/pddb@bup-20090619-143851UTC     0      -      2.05G  -
zp1/pddb@bup-20090804-024506UTC     0      -      2.05G  -
zp1/pddb@bup-20090906-192431UTC     0      -      2.05G  -
zp1/pddb@bup-20100102-035216UTC     0      -      2.05G  -
zp1/pddb@bup-20100102-184101UTC     0      -      2.05G  -
zp1/pddb@bup-20100208-050707GMT     0      -      2.05G  -
zp1/pddb@bup-20100208-050907GMT     0      -      2.05G  -
zp1/public                          43.1G  104G   33.7G  /home/public
zp1/public@bup-20090223-033745UTC   191K   -      33.8G  -
zp1/public@bup-20090225-184857UTC   58K    -      33.8G  -
zp1/public@bup-20090302-032437UTC   59K    -      33.8G  -
zp1/public@bup-20090309-033514UTC   104K   -      33.9G  -
zp1/public@bup-20090315-190807UTC   335K   -      33.9G  -
zp1/public@bup-20090424-034702UTC   29.0M  -      26.1G  -
zp1/public@bup-20090619-063536GMT   234K   -      26.6G  -
zp1/public@bup-20090619-143851UTC   235K   -      26.6G  -
zp1/public@bup-20090804-024506UTC   943K   -      27.1G  -
zp1/public@bup-20090906-192431UTC   8.97M  -      27.3G  -
zp1/public@bup-20100102-035216UTC   1.67M  -      33.5G  -
zp1/public@bup-20100102-184101UTC   1.66M  -      33.5G  -
zp1/public@bup-20100208-050707GMT   0      -      33.7G  -
zp1/public@bup-20100208-050907GMT   0      -      33.7G  -
zp1/raphael                         69K    104G   20K    /home/raphael
zp1/raphael@bup-20090223-033745UTC  0      -      18K    -
zp1/raphael@bup-20090225-184857UTC  0      -      18K    -
zp1/raphael@bup-20090302-032437UTC  0      -      18K    -
zp1/raphael@bup-20090309-033514UTC  0      -      18K    -
zp1/raphael@bup-20090315-190807UTC  0      -      18K    -
zp1/raphael@bup-20090424-034702UTC  0      -      18K    -
zp1/raphael@bup-20090619-063536GMT  0      -      18K    -
zp1/raphael@bup-20090619-143851UTC  0      -      18K    -
zp1/raphael@bup-20090804-024506UTC  0      -      18K    -
zp1/raphael@bup-20090906-192431UTC  0      -      18K    -
zp1/raphael@bup-20100102-035216UTC  0      -      20K    -
zp1/raphael@bup-20100102-184101UTC  0      -      20K    -
zp1/raphael@bup-20100208-050707GMT  0      -      20K    -
zp1/raphael@bup-20100208-050907GMT  0      -      20K    -
--
David Dyer-Bennet, dd-b@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Big send/receive hangs on 2009.06
Posted: Feb 8, 2010 7:16 PM, in response to: dd-b
Nobody has any ideas? It's still hung after work.
I wonder what it will take to stop the backup and export the pool? Well, that's nice; a straight "kill" terminated the processes, at least.
zpool status shows no errors. zfs list shows backup filesystems mounted.
zpool export -f is running...no disk I/O now...starting to look hung.
Ah, the zfs receive process is still in the process table. kill -9 doesn't help.
kill and kill -9 won't touch the zpool export process, either.
Pulling the USB cable on the drive doesn't seem to be helping any either.
zfs list now hangs, but giving it a little longer just in case.
kill -9 doesn't touch any of the hung jobs.
Closing the ssh sessions doesn't touch any of them either.
zfs list on pools other than bup-wrack works. zpool list works, and shows bup-wrack.
Attempting to set failmode=continue gives an I/O error.
Plugging the USB back in and then setting failmode gives the same I/O error.
cfgadm -al lists known disk drives and usb3/9 as "usb-storage connected". I think that's the USB disk that's stuck.
cfgadm -cremove usb3/9 failed "configuration operation not supported".
cfgadm -cdisconnect usb3/9 queried if I wanted to suspend activity, then failed with "cannot issue devctl to ap_id: /devices/pci@0,0/pci10de,cb84@2,1:9"
cfgadm -al still shows the same thing.
cfgadm -cunconfigure gave the same error as disconnect.
I was able to list properties on bup-wrack:
bash-3.2$ zpool get all bup-wrack
NAME       PROPERTY       VALUE                SOURCE
bup-wrack  size           928G                 -
bup-wrack  used           438G                 -
bup-wrack  available      490G                 -
bup-wrack  capacity       47%                  -
bup-wrack  altroot        /backups/bup-wrack   local
bup-wrack  health         UNAVAIL              -
bup-wrack  guid           2209605264342513453  default
bup-wrack  version        14                   default
bup-wrack  bootfs         -                    default
bup-wrack  delegation     on                   default
bup-wrack  autoreplace    off                  default
bup-wrack  cachefile      none                 local
bup-wrack  failmode       wait                 default
bup-wrack  listsnapshots  off                  default
It's not healthy, alright. And the attempt to set failmode really did fail.
I've been here before, and it has always required a reboot.
Other than setting failmode=continue earlier, anybody have any ideas?
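For reference, a minimal sketch of what "setting failmode=continue earlier" could look like as a pre-backup step, while the pool is still healthy. Whether this would actually avoid the hang is not established in this thread. zpool get and zpool set failmode=continue are the real commands; pool_prop and prepare_backup_pool are hypothetical helper names.

```shell
# Print the VALUE column for one property from `zpool get` output on stdin.
pool_prop() {
    awk -v prop="$1" '$2 == prop { print $3 }'
}

# Hypothetical pre-backup step: switch failmode before starting the
# send/receive, since setting it on the already-stuck pool failed with
# an I/O error. Defined here but not called.
prepare_backup_pool() {
    pool=$1
    current=$(zpool get failmode "$pool" | pool_prop failmode)
    if [ "$current" != "continue" ]; then
        zpool set failmode=continue "$pool"
    fi
}

# Example use, before starting the backup:
#   prepare_backup_pool bup-wrack
```

The same pool_prop helper can read the health line from the zpool get all listing above, which is how the UNAVAIL state showed up here.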