Eu tenho um programa do usuário Linux x86_64 multithreaded que está gravando em um SCTP soquete usando a chamada do sistema writev (). Eu queria confirmar a atomicidade de a chamada do sistema writev ().
A página man do writev () afirma:
ssize_t writev(int fd, const struct iovec *iov, int iovcnt);
The data transfers performed by readv() and writev() are atomic: the data written by writev()
is written as a single block that is not intermingled with output from writes in other processes
(but see pipe(7) for an exception); analogously, readv() is guaranteed to read a contiguous
block of data from the file, regardless of read operations performed in other threads or processes
that have file descriptors referring to the same open file description (see open(2)).
Então, quando olhei para a implementação do writev (), pensei que veria claramente um bloqueio. Quando eu não vi o bloqueio na implementação writev () comecei a rastrear as chamadas. Aqui está o que eu encontrei. Este é meu pela primeira vez percorrendo a fonte do kernel do Linux, então, por favor, desculpe os mal entendidos.
O kernel do Linux analisado é o 4.4.0 no x86.
A implementação de writev () inicia em fs / read_write.c: 896:
SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,u nsigned long, vlen)
e chama vfs_writev () definido no mesmo arquivo fs / read_write.c: 863
ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
unsigned long vlen, loff_t *pos)
{
if (!(file->f_mode & FMODE_WRITE))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_WRITE))
return -EINVAL;
return do_readv_writev(WRITE, file, vec, vlen, pos);
}
onde do_readv_writev () também está em fs / read_write.c: 798, e para o tipo WRITE será executado,
fn = (io_fn_t)file->f_op->write;
iter_fn = file->f_op->write_iter;
file_start_write(file);
file_start_write () é uma função inline em include / linux / fs.h: 2512,
static inline void file_start_write(struct file *file)
{
if (!S_ISREG(file_inode(file)->i_mode))
return;
__sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, true);
}
S_ISREG () é definido em include / uapi / linux / stat.h: 20 para verificar se o descritor é um arquivo regular.
E __sb_start_write é definido em fs / super.c: 1252
/*
* This is an internal function, please use sb_start_{write,pagefault,intwrite}
* instead.
*/
int __sb_start_write(struct super_block *sb, int level, bool wait)
{
bool force_trylock = false;
int ret = 1;
#ifdef CONFIG_LOCKDEP
/*
* We want lockdep to tell us about possible deadlocks with freezing
* but it's it bit tricky to properly instrument it. Getting a freeze
* protection works as getting a read lock but there are subtle
* problems. XFS for example gets freeze protection on internal level
* twice in some cases, which is OK only because we already hold a
* freeze protection also on higher level. Due to these cases we have
* to use wait == F (trylock mode) which must not fail.
*/
if (wait) {
int i;
for (i = 0; i < level - 1; i++)
if (percpu_rwsem_is_held(sb->s_writers.rw_sem + i)) {
force_trylock = true;
break;
}
}
#endif
if (wait && !force_trylock)
percpu_down_read(sb->s_writers.rw_sem + level-1);
else
ret = percpu_down_read_trylock(sb->s_writers.rw_sem + level-1);
WARN_ON(force_trylock & !ret);
return ret;
}
EXPORT_SYMBOL(__sb_start_write);
Eu não acredito que meu kernel tenha sido compilado com CONFIG_LOCKDEP baseado neste this
O bloqueio do sistema de arquivos é descrito nos comentários que começam em fs / super.c: 1322
/**
* freeze_super - lock the filesystem and force it into a consistent state
* @sb: the super to lock
*
* Syncs the super to make sure the filesystem is consistent and calls the fs's
* freeze_fs. Subsequent calls to this without first thawing the fs will return
* -EBUSY.
*
* During this function, sb->s_writers.frozen goes through these values:
*
* SB_UNFROZEN: File system is normal, all writes progress as usual.
*
* SB_FREEZE_WRITE: The file system is in the process of being frozen. New
* writes should be blocked, though page faults are still allowed. We wait for
* all writes to complete and then proceed to the next stage.
*
* SB_FREEZE_PAGEFAULT: Freezing continues. Now also page faults are blocked
* but internal fs threads can still modify the filesystem (although they
* should not dirty new pages or inodes), writeback can run etc. After waiting
* for all running page faults we sync the filesystem which will clean all
* dirty pages and inodes (no new dirty pages or inodes can be created when
* sync is running).
*
* SB_FREEZE_FS: The file system is frozen. Now all internal sources of fs
* modification are blocked (e.g. XFS preallocation truncation on inode
* reclaim). This is usually implemented by blocking new transactions for
* filesystems that have them and need this additional guard. After all
* internal writers are finished we call ->freeze_fs() to finish filesystem
* freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
* mostly auxiliary for filesystems to verify they do not modify frozen fs.
*
* sb->s_writers.frozen is protected by sb->s_umount.
*/
E, finalmente, no kernel / locking / percpu-rwsem.c: 70
/*
* Like the normal down_read() this is not recursive, the writer can
* come after the first percpu_down_read() and create the deadlock.
*
* Note: returns with lock_is_held(brw->rw_sem) == T for lockdep,
* percpu_up_read() does rwsem_release(). This pairs with the usage
* of ->rw_sem in percpu_down/up_write().
*/
void percpu_down_read(struct percpu_rw_semaphore *brw)
{
might_sleep();
rwsem_acquire_read(&brw->rw_sem.dep_map, 0, 0, _RET_IP_);
if (likely(update_fast_ctr(brw, +1)))
return;
/* Avoid rwsem_acquire_read() and rwsem_release() */
__down_read(&brw->rw_sem);
atomic_inc(&brw->slow_read_ctr);
__up_read(&brw->rw_sem);
}
EXPORT_SYMBOL_GPL(percpu_down_read);
Então, há o bloqueio.