I'm not sure whether this is the right place to ask this ...
I have been running a pre-release kernel (4.2.0-rc5), and yesterday I had a possible disk error on my 6-disk btrfs raid-1 filesystem. When I tried to run btrfs device remove ... I repeatedly hit a kernel bug check that logged the message kernel BUG at fs/btrfs/extent-tree.c:1833!
Aug 15 10:42:44 capella kernel: ------------[ cut here ]------------
Aug 15 10:42:44 capella kernel: kernel BUG at fs/btrfs/extent-tree.c:1833!
Aug 15 10:42:44 capella kernel: invalid opcode: 0000 [#1] SMP
Aug 15 10:42:44 capella kernel: Modules linked in: nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 dns_resolver nfs lockd grace sunrpc fscache qt1010 af9013 dvb_usb_af9015 dvb_usb_v2 dvb_core rc_core sp5100_tco kvm_amd kvm pcspkr snd_hda_codec_hdmi evdev amd64_edac_mod edac_mce_amd edac_core nvidia(PO) i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core sg snd_hwdep snd_pcm tpm_infineon tpm_tis tpm snd_timer snd soundcore drm acpi_cpufreq processor thermal_sys button shpchp md_mod k10temp jc42 i2c_core loop parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd2 crc32c_generic btrfs xor raid6_pq dm_mod hid_generic usbhid hid sd_mod usb_storage ohci_pci tg3 ptp pps_core libphy ahci libahci libata ehci_pci ohci_hcd ehci_hcd scsi_mod usbcore usb_common
Aug 15 10:42:44 capella kernel: CPU: 1 PID: 95 Comm: kworker/u8:7 Tainted: P O 4.2.0-rc5-derek-00033-g6c84461-dirty #5
Aug 15 10:42:44 capella kernel: Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
Aug 15 10:42:44 capella kernel: Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
Aug 15 10:42:44 capella kernel: task: ffff880213b8cd00 ti: ffff880213b90000 task.ti: ffff880213b90000
Aug 15 10:42:44 capella kernel: RIP: 0010:[<ffffffffa020e4b3>] [<ffffffffa020e4b3>] insert_inline_extent_backref+0xe3/0xf0 [btrfs]
Aug 15 10:42:44 capella kernel: RSP: 0018:ffff880213b93af8 EFLAGS: 00010293
Aug 15 10:42:44 capella kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
Aug 15 10:42:44 capella kernel: RDX: ffff880000000000 RSI: 0000000000000001 RDI: 0000000000000000
Aug 15 10:42:44 capella kernel: RBP: ffff8800daef2800 R08: 0000000000004000 R09: ffff880213b93a08
Aug 15 10:42:44 capella kernel: R10: 0000000000000000 R11: 0000000000000003 R12: ffff8801d797dad0
Aug 15 10:42:44 capella kernel: R13: 0000000000004b18 R14: 0000000000000000 R15: 00001c7de0f8c000
Aug 15 10:42:44 capella kernel: FS: 00007fab947cb8c0(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
Aug 15 10:42:44 capella kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 15 10:42:44 capella kernel: CR2: 00000000010928d8 CR3: 00000000aa465000 CR4: 00000000000006e0
Aug 15 10:42:44 capella kernel: Stack:
Aug 15 10:42:44 capella kernel: 00001c7de0f8c000 0000000000004b18 0000000000000000 0000000000000000
Aug 15 10:42:44 capella kernel: 0000000000000001 0000000000000282 ffff8801d797dad0 ffffffffa020f9c0
Aug 15 10:42:44 capella kernel: ffff880213b93bb4 00000000000034bd ffff8801d797dae0 ffff8800db81b000
Aug 15 10:42:44 capella kernel: Call Trace:
Aug 15 10:42:44 capella kernel: [<ffffffffa020f9c0>] ? __btrfs_free_extent.isra.68+0x320/0xd50 [btrfs]
Aug 15 10:42:44 capella kernel: [<ffffffffa020e927>] ? __btrfs_inc_extent_ref.isra.52+0xa7/0x280 [btrfs]
Aug 15 10:42:44 capella kernel: [<ffffffffa0273fb2>] ? find_ref_head+0x52/0x70 [btrfs]
Aug 15 10:42:44 capella kernel: [<ffffffffa0213ce1>] ? __btrfs_run_delayed_refs+0xc41/0x1070 [btrfs]
Aug 15 10:42:44 capella kernel: [<ffffffff8101c6a5>] ? sched_clock+0x5/0x10
Aug 15 10:42:44 capella kernel: [<ffffffff811b9292>] ? __sb_start_write+0x42/0xe0
Aug 15 10:42:44 capella kernel: [<ffffffffa0216c81>] ? btrfs_run_delayed_refs.part.73+0x71/0x270 [btrfs]
Aug 15 10:42:44 capella kernel: [<ffffffff8109aef0>] ? update_curr+0xb0/0xf0
Aug 15 10:42:44 capella kernel: [<ffffffffa0216f18>] ? delayed_ref_async_start+0x78/0x90 [btrfs]
Aug 15 10:42:44 capella kernel: [<ffffffffa0259bb0>] ? btrfs_scrubparity_helper+0xc0/0x280 [btrfs]
Aug 15 10:42:44 capella kernel: [<ffffffff81083401>] ? process_one_work+0x1a1/0x430
Aug 15 10:42:44 capella kernel: [<ffffffff810836d7>] ? worker_thread+0x47/0x4a0
Aug 15 10:42:44 capella kernel: [<ffffffff81083690>] ? process_one_work+0x430/0x430
Aug 15 10:42:44 capella kernel: [<ffffffff810890f1>] ? kthread+0xc1/0xe0
Aug 15 10:42:44 capella kernel: [<ffffffff81089030>] ? kthread_worker_fn+0x170/0x170
Aug 15 10:42:44 capella kernel: [<ffffffff81539b9f>] ? ret_from_fork+0x3f/0x70
Aug 15 10:42:44 capella kernel: [<ffffffff81089030>] ? kthread_worker_fn+0x170/0x170
Aug 15 10:42:44 capella kernel: Code: 89 d9 4c 89 34 24 4d 89 e8 4c 89 f9 4c 89 e6 48 89 ef 48 89 44 24 10 8b 84 24 a8 00 00 00 89 44 24 08 e8 f1 d6 ff ff 31 c0 eb b3 <0f> 0b 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 56 41
Aug 15 10:42:44 capella kernel: RIP [<ffffffffa020e4b3>] insert_inline_extent_backref+0xe3/0xf0 [btrfs]
Aug 15 10:42:44 capella kernel: RSP <ffff880213b93af8>
Aug 15 10:42:44 capella kernel: ---[ end trace 1cdbb5a82e302412 ]---
Looking at the source, the relevant function is int insert_inline_extent_backref(...), which calls lookup_inline_extent_backref(...); presumably that returned zero, so line 1833, BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID), was reached and fired because owner was below BTRFS_FIRST_FREE_OBJECTID.
static noinline_for_stack
int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
				 struct btrfs_root *root,
				 struct btrfs_path *path,
				 u64 bytenr, u64 num_bytes, u64 parent,
				 u64 root_objectid, u64 owner,
				 u64 offset, int refs_to_add,
				 struct btrfs_delayed_extent_op *extent_op)
{
	struct btrfs_extent_inline_ref *iref;
	int ret;

	ret = lookup_inline_extent_backref(trans, root, path, &iref,
					   bytenr, num_bytes, parent,
					   root_objectid, owner, offset, 1);
	if (ret == 0) {
		BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
		update_inline_extent_backref(root, path, iref,
					     refs_to_add, extent_op, NULL);
	} else if (ret == -ENOENT) {
		setup_inline_extent_backref(root, path, iref, parent,
					    root_objectid, owner, offset,
					    refs_to_add, extent_op);
		ret = 0;
	}
	return ret;
}
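To make my reading of that branch concrete, here is a minimal user-space sketch of the decision logic only. It assumes BTRFS_FIRST_FREE_OBJECTID is 256, as in the kernel headers; the enum and classify_backref() are hypothetical names made up for illustration and are not kernel code. lookup_ret stands in for whatever lookup_inline_extent_backref() returned.

```c
#include <errno.h>

/* Assumed to match the kernel's definition (256ULL). */
#define BTRFS_FIRST_FREE_OBJECTID 256ULL

/* Hypothetical labels for the possible paths through the function. */
enum backref_outcome {
	OUTCOME_BUG,	/* BUG_ON() at line 1833 would fire */
	OUTCOME_UPDATE,	/* update_inline_extent_backref() path */
	OUTCOME_SETUP,	/* setup_inline_extent_backref() path */
	OUTCOME_ERROR	/* any other error is passed back to the caller */
};

/*
 * Models only the branching in insert_inline_extent_backref().
 * No btrfs structures are touched; lookup_ret is supplied by the caller.
 */
static enum backref_outcome classify_backref(int lookup_ret,
					      unsigned long long owner)
{
	if (lookup_ret == 0) {
		/* An inline backref already exists for this extent. */
		if (owner < BTRFS_FIRST_FREE_OBJECTID)
			return OUTCOME_BUG;
		return OUTCOME_UPDATE;
	}
	if (lookup_ret == -ENOENT)
		return OUTCOME_SETUP;	/* no backref yet: create one */
	return OUTCOME_ERROR;
}
```

In this model the crash corresponds to classify_backref(0, owner) with owner below 256: the lookup found an existing inline backref for an owner value the code does not expect to see on the update path. The sketch does not show why that combination arose, which is the part I cannot tell from the trace.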
This happened repeatedly with the RC5 kernel, so I tried the Debian 4.1.0-1 amd64 release kernel, which completed the device remove successfully.
Obvious question: is something corrupt in my filesystem that the earlier kernel fails to notice, or is this just a bug in 4.2.0-RC5 that will hopefully be fixed in the final release?
I ran a scrub of the data, which reported no errors. Is there anything else I should check?
FYI: I originally downloaded the source, then built and ran a pre-release kernel, because I wanted to convert the data in a filesystem and found that the command btrfs balance start -dconvert=RAID1 ... was broken in the released version!