Thread: [zfs-discuss] Big send/receive hangs on 2009.06


Replies: 1 - Last Post: Feb 8, 2010 7:16 PM by: dd-b
dd-b

Posts: 438
From: US

Registered: 7/7/06
[zfs-discuss] Big send/receive hangs on 2009.06
Posted: Feb 8, 2010 5:35 AM


So, I was running my full backup last night, backing up my main data
pool zp1, and it seems to have hung.

Any suggestions for additional data gathering?

-bash-3.2$ zpool status zp1
pool: zp1
state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on older software versions.
scrub: none requested
config:

NAME          STATE     READ WRITE CKSUM
zp1           ONLINE       0     0     0
  mirror      ONLINE       0     0     0
    c5t0d0    ONLINE       0     0     0
    c5t1d0    ONLINE       0     0     0
  mirror      ONLINE       0     0     0
    c6t0d0    ONLINE       0     0     0
    c6t1d0    ONLINE       0     0     0

errors: No known data errors

The backup target is one of my external USB drives, holding pool bup-wrack:

-bash-3.2$ zpool status bup-wrack
pool: bup-wrack
state: ONLINE
scrub: none requested
config:

NAME          STATE     READ WRITE CKSUM
bup-wrack     ONLINE       0     0     0
  c7t0d0      ONLINE       0     0     0

errors: No known data errors

The line in the script that starts the send and receive is

zfs send -Rv "$srcsnap" | zfs recv -Fudv "$BUPPOOL/$HOSTNAME/$FS"

The -v makes zfs announce the start and stop of each incremental
stream. The last output from it was:

sending from @bup-20090315-190807UTC to zp1/ddb@bup-20090424-034702UTC
receiving incremental stream of zp1/ddb@bup-20090424-034702UTC into
bup-wrack/fsfs/zp1/ddb@bup-20090424-034702UTC
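
For anyone matching the recv target in the script to the name in that last output line: with -d, zfs recv drops the pool name from the sent snapshot and appends the rest to the target. A minimal sketch of that naming rule, assuming the target was bup-wrack/fsfs/zp1 (as the ps output suggests); recv_d_name is a hypothetical helper for illustration, not a zfs command:

```shell
# Sketch of how `zfs recv -d TARGET` names what it receives:
# drop the source pool name, append the remainder to TARGET.
# recv_d_name is a hypothetical helper, not part of zfs.
recv_d_name() {
    target=$1   # e.g. bup-wrack/fsfs/zp1
    sent=$2     # e.g. zp1/ddb@bup-20090424-034702UTC
    case $sent in
        */*) printf '%s/%s\n' "$target" "${sent#*/}" ;;   # child dataset
        *@*) printf '%s@%s\n' "$target" "${sent#*@}" ;;   # pool's top-level dataset
        *)   printf '%s\n' "$target" ;;                   # no snapshot part given
    esac
}

# Prints bup-wrack/fsfs/zp1/ddb@bup-20090424-034702UTC
recv_d_name bup-wrack/fsfs/zp1 zp1/ddb@bup-20090424-034702UTC
```

That matches the "receiving incremental stream ... into" line, so the stream that stalled was the ddb increment ending at the 20090424 snapshot.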

And it appeared hung when I got up this morning. No activity on the
drive, zpool iostat shows no activity on the backup pool and no
unexplained activity on the data pool. The server is responsive, and
the data pool is responsive. ps shows considerable accumulated time on
the backup and receive processes, but no change in the last half hour.
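
The "no change in the last half hour" check can be scripted by sampling a process's accumulated CPU time twice and comparing. This is only a sketch: cpu_time_of and no_cpu_progress are hypothetical helper names, and the PID and interval in the usage comment are placeholders taken from this report.

```shell
# Hypothetical helpers for spotting a process that has stopped using CPU.
cpu_time_of() {
    # Accumulated CPU time of one process, as ps prints it (e.g. 19:59).
    ps -o time= -p "$1" | tr -d ' '
}

no_cpu_progress() {
    # True when two non-empty samples of accumulated CPU time are identical.
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage sketch (3153 is the zfs recv PID in this report):
#   before=$(cpu_time_of 3153); sleep 1800; after=$(cpu_time_of 3153)
#   no_cpu_progress "$before" "$after" && echo "no CPU time accumulated"
```

An unchanged CPU time only says the process is not running on a CPU; it does not say what it is blocked on.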

zpool list shows that quite a lot of data has not yet been transferred
to the backup pool (which was newly-created when this backup started).

-bash-3.2$ zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
bup-wrack 928G 438G 490G 47% ONLINE /backups/bup-wrack
rpool 74G 6.35G 67.7G 8% ONLINE -
zp1 744G 628G 116G 84% ONLINE -
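
A rough estimate of how far the backup got, from the USED figures above. This assumes the finished backup would occupy about the same space as zp1 does, which need not hold exactly:

```shell
# USED column from the zpool list output above, in GB.
src_used=628   # zp1
dst_used=438   # bup-wrack (pool was empty when the backup started)

remaining=$((src_used - dst_used))
percent=$((dst_used * 100 / src_used))   # integer division, rounds down

echo "roughly ${remaining}G still to copy, about ${percent}% done"
```

So on the order of 190G was still outstanding when the transfer stopped.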

ps -ef shows

root 3153 3145 0 23:09:07 pts/3 19:59 zfs recv -Fudv bup-wrack/fsfs/zp1
root 3145 3130 0 23:09:04 pts/3 0:00 /bin/bash ./bup-backup-full zp1 bup-wrack
root 3152 3145 0 23:09:07 pts/3 17:06 zfs send -Rv zp1@bup-20100208-050907GMT

zfs list shows:

-bash-3.2$ zfs list -t snapshot,filesystem -r zp1
NAME USED AVAIL REFER MOUNTPOINT
zp1 628G 104G 33.8M /home
zp1@bup-20090223-033745UTC 0 - 33.8M -
zp1@bup-20090225-184857UTC 0 - 33.8M -
zp1@bup-20090302-032437UTC 0 - 33.8M -
zp1@bup-20090309-033514UTC 0 - 33.8M -
zp1@bup-20090315-190807UTC 0 - 33.8M -
zp1@bup-20090424-034702UTC 22K - 33.8M -
zp1@bup-20090619-063536GMT 0 - 33.8M -
zp1@bup-20090619-143851UTC 0 - 33.8M -
zp1@bup-20090804-024506UTC 0 - 33.8M -
zp1@bup-20090906-192431UTC 0 - 33.8M -
zp1@bup-20100102-035216UTC 0 - 33.8M -
zp1@bup-20100102-184101UTC 0 - 33.8M -
zp1@bup-20100208-050707GMT 0 - 33.8M -
zp1@bup-20100208-050907GMT 0 - 33.8M -
zp1/ddb 494G 104G 452G /home/ddb
zp1/ddb@bup-20090223-033745UTC 5.12M - 326G -
zp1/ddb@bup-20090225-184857UTC 4.15M - 328G -
zp1/ddb@bup-20090302-032437UTC 16.6M - 329G -
zp1/ddb@bup-20090309-033514UTC 8.95M - 330G -
zp1/ddb@bup-20090315-190807UTC 35.3M - 330G -
zp1/ddb@bup-20090424-034702UTC 140M - 345G -
zp1/ddb@bup-20090619-063536GMT 43.9M - 386G -
zp1/ddb@bup-20090619-143851UTC 44.9M - 386G -
zp1/ddb@bup-20090804-024506UTC 4.30G - 418G -
zp1/ddb@bup-20090906-192431UTC 8.43G - 440G -
zp1/ddb@bup-20100102-035216UTC 4.13G - 435G -
zp1/ddb@bup-20100102-184101UTC 108M - 431G -
zp1/ddb@bup-20100208-050707GMT 142K - 452G -
zp1/ddb@bup-20100208-050907GMT 140K - 452G -
zp1/jmf 33.5G 104G 33.3G /home/jmf
zp1/jmf@bup-20090223-033745UTC 0 - 33.2G -
zp1/jmf@bup-20090225-184857UTC 0 - 33.2G -
zp1/jmf@bup-20090302-032437UTC 0 - 33.2G -
zp1/jmf@bup-20090309-033514UTC 0 - 33.2G -
zp1/jmf@bup-20090315-190807UTC 0 - 33.2G -
zp1/jmf@bup-20090424-034702UTC 0 - 33.3G -
zp1/jmf@bup-20090619-063536GMT 0 - 33.3G -
zp1/jmf@bup-20090619-143851UTC 0 - 33.3G -
zp1/jmf@bup-20090804-024506UTC 0 - 33.3G -
zp1/jmf@bup-20090906-192431UTC 42K - 33.3G -
zp1/jmf@bup-20100102-035216UTC 0 - 33.3G -
zp1/jmf@bup-20100102-184101UTC 0 - 33.3G -
zp1/jmf@bup-20100208-050707GMT 0 - 33.3G -
zp1/jmf@bup-20100208-050907GMT 0 - 33.3G -
zp1/lydy 31.1G 104G 31.1G /home/lydy
zp1/lydy@bup-20090223-033745UTC 0 - 31.1G -
zp1/lydy@bup-20090225-184857UTC 0 - 31.1G -
zp1/lydy@bup-20090302-032437UTC 0 - 31.1G -
zp1/lydy@bup-20090309-033514UTC 0 - 31.1G -
zp1/lydy@bup-20090315-190807UTC 0 - 31.1G -
zp1/lydy@bup-20090424-034702UTC 0 - 31.1G -
zp1/lydy@bup-20090619-063536GMT 0 - 31.1G -
zp1/lydy@bup-20090619-143851UTC 0 - 31.1G -
zp1/lydy@bup-20090804-024506UTC 0 - 31.1G -
zp1/lydy@bup-20090906-192431UTC 0 - 31.1G -
zp1/lydy@bup-20100102-035216UTC 0 - 31.1G -
zp1/lydy@bup-20100102-184101UTC 0 - 31.1G -
zp1/lydy@bup-20100208-050707GMT 0 - 31.1G -
zp1/lydy@bup-20100208-050907GMT 0 - 31.1G -
zp1/music 24.8G 104G 24.8G /home/music
zp1/music@bup-20090223-033745UTC 1.03M - 24.3G -
zp1/music@bup-20090225-184857UTC 619K - 24.3G -
zp1/music@bup-20090302-032437UTC 287K - 24.3G -
zp1/music@bup-20090309-033514UTC 0 - 24.3G -
zp1/music@bup-20090315-190807UTC 0 - 24.3G -
zp1/music@bup-20090424-034702UTC 1.38M - 24.3G -
zp1/music@bup-20090619-063536GMT 0 - 24.3G -
zp1/music@bup-20090619-143851UTC 0 - 24.3G -
zp1/music@bup-20090804-024506UTC 2.08M - 24.8G -
zp1/music@bup-20090906-192431UTC 2.04M - 24.8G -
zp1/music@bup-20100102-035216UTC 906K - 24.8G -
zp1/music@bup-20100102-184101UTC 932K - 24.8G -
zp1/music@bup-20100208-050707GMT 0 - 24.8G -
zp1/music@bup-20100208-050907GMT 0 - 24.8G -
zp1/pddb 2.05G 104G 2.05G /home/pddb
zp1/pddb@bup-20090223-033745UTC 0 - 2.05G -
zp1/pddb@bup-20090225-184857UTC 0 - 2.05G -
zp1/pddb@bup-20090302-032437UTC 0 - 2.05G -
zp1/pddb@bup-20090309-033514UTC 0 - 2.05G -
zp1/pddb@bup-20090315-190807UTC 0 - 2.05G -
zp1/pddb@bup-20090424-034702UTC 0 - 2.05G -
zp1/pddb@bup-20090619-063536GMT 0 - 2.05G -
zp1/pddb@bup-20090619-143851UTC 0 - 2.05G -
zp1/pddb@bup-20090804-024506UTC 0 - 2.05G -
zp1/pddb@bup-20090906-192431UTC 0 - 2.05G -
zp1/pddb@bup-20100102-035216UTC 0 - 2.05G -
zp1/pddb@bup-20100102-184101UTC 0 - 2.05G -
zp1/pddb@bup-20100208-050707GMT 0 - 2.05G -
zp1/pddb@bup-20100208-050907GMT 0 - 2.05G -
zp1/public 43.1G 104G 33.7G /home/public
zp1/public@bup-20090223-033745UTC 191K - 33.8G -
zp1/public@bup-20090225-184857UTC 58K - 33.8G -
zp1/public@bup-20090302-032437UTC 59K - 33.8G -
zp1/public@bup-20090309-033514UTC 104K - 33.9G -
zp1/public@bup-20090315-190807UTC 335K - 33.9G -
zp1/public@bup-20090424-034702UTC 29.0M - 26.1G -
zp1/public@bup-20090619-063536GMT 234K - 26.6G -
zp1/public@bup-20090619-143851UTC 235K - 26.6G -
zp1/public@bup-20090804-024506UTC 943K - 27.1G -
zp1/public@bup-20090906-192431UTC 8.97M - 27.3G -
zp1/public@bup-20100102-035216UTC 1.67M - 33.5G -
zp1/public@bup-20100102-184101UTC 1.66M - 33.5G -
zp1/public@bup-20100208-050707GMT 0 - 33.7G -
zp1/public@bup-20100208-050907GMT 0 - 33.7G -
zp1/raphael 69K 104G 20K /home/raphael
zp1/raphael@bup-20090223-033745UTC 0 - 18K -
zp1/raphael@bup-20090225-184857UTC 0 - 18K -
zp1/raphael@bup-20090302-032437UTC 0 - 18K -
zp1/raphael@bup-20090309-033514UTC 0 - 18K -
zp1/raphael@bup-20090315-190807UTC 0 - 18K -
zp1/raphael@bup-20090424-034702UTC 0 - 18K -
zp1/raphael@bup-20090619-063536GMT 0 - 18K -
zp1/raphael@bup-20090619-143851UTC 0 - 18K -
zp1/raphael@bup-20090804-024506UTC 0 - 18K -
zp1/raphael@bup-20090906-192431UTC 0 - 18K -
zp1/raphael@bup-20100102-035216UTC 0 - 20K -
zp1/raphael@bup-20100102-184101UTC 0 - 20K -
zp1/raphael@bup-20100208-050707GMT 0 - 20K -
zp1/raphael@bup-20100208-050907GMT 0 - 20K -



--
David Dyer-Bennet, dd-b@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


dd-b

Re: [zfs-discuss] Big send/receive hangs on 2009.06
Posted: Feb 8, 2010 7:16 PM   in response to: dd-b

Nobody has any ideas? It's still hung after work.

I wonder what it will take to stop the backup and export the pool? Well, that's nice; a straight "kill" terminated the processes, at least.

zpool status shows no errors. zfs list shows backup filesystems mounted.

zpool export -f is running...no disk I/O now...starting to look hung.

Ah, the zfs receive process is still in the process table. kill -9 doesn't help.

Kill and kill -9 won't touch the zpool export process, either.

Pulling the USB cable on the drive doesn't seem to be helping any either.

zfs list now hangs, but I'm giving it a little longer just in case.

Kill -9 doesn't touch any of the hung jobs.

Closing the ssh sessions doesn't touch any of them either.

zfs list on pools other than bup-wrack works. zpool list works, and shows bup-wrack.

Attempting to set failmode=continue gives an I/O error.

Plugging the USB back in and then setting failmode gives the same I/O error.

cfgadm -al lists known disk drives and usb3/9 as "usb-storage connected". I think that's the USB disk that's stuck.

cfgadm -c remove usb3/9 failed with "configuration operation not supported".

cfgadm -c disconnect usb3/9 asked whether I wanted to suspend activity, then failed with "cannot issue devctl to ap_id: /devices/pci@0,0/pci10de,cb84@2,1:9".

cfgadm -al still shows the same.

cfgadm -c unconfigure gave the same error as disconnect.

I was able to list properties on bup-wrack:

bash-3.2$ zpool get all bup-wrack
NAME PROPERTY VALUE SOURCE
bup-wrack size 928G -
bup-wrack used 438G -
bup-wrack available 490G -
bup-wrack capacity 47% -
bup-wrack altroot /backups/bup-wrack local
bup-wrack health UNAVAIL -
bup-wrack guid 2209605264342513453 default
bup-wrack version 14 default
bup-wrack bootfs - default
bup-wrack delegation on default
bup-wrack autoreplace off default
bup-wrack cachefile none local
bup-wrack failmode wait default
bup-wrack listsnapshots off default

It's not healthy, all right. And the attempt to set failmode really did fail.

I've been here before, and it has always required a reboot.

Other than setting failmode=continue earlier, anybody have any ideas?
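
For reference, "setting failmode=continue earlier" would mean doing it while the pool is still healthy, before the send/receive starts. A minimal sketch, assuming it goes into the backup script ahead of the pipeline; set_failmode_continue and the ZPOOL override are hypothetical names, and whether failmode=continue would actually have avoided this hang is untested:

```shell
# ZPOOL can be overridden for a dry run; by default the real zpool is used.
ZPOOL=${ZPOOL:-zpool}

# Hypothetical helper: set failmode=continue on a pool, then show the
# property so the log records what the pool is actually using.
set_failmode_continue() {
    pool=$1
    "$ZPOOL" set failmode=continue "$pool" || return 1
    "$ZPOOL" get failmode "$pool"
}

# Usage sketch, before starting the send/receive:
#   set_failmode_continue bup-wrack || exit 1
```

Bailing out when the set fails keeps the backup from starting against a pool that is already refusing property changes.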



