This reminded me of a conversation I had with Alexander Larsson, the lead developer of Nautilus and of other projects, including GVFS.
Giles's answer, specifically the bit about Nautilus looking inside file contents, touches on the main reason Nautilus is "slow". However, Giles does not explain why that is slow, which may be obvious to some but not to others. Here is what Alex had to say:
Say you start with a blank slate, i.e. you have not accessed the filesystem at all. Now say you run stat("/some/dir/file"). First the kernel has to find the file, which in technical terms is called the inode. It starts by looking in the filesystem superblock, which stores the inode of the root directory. Then it opens the root directory, finds “some”, opens that, finds “dir”, etc. eventually finding the inode for “file”.
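To make that lookup concrete, here is a minimal C sketch (the path is hypothetical) that asks the kernel to resolve a path into an inode via stat(2) and prints the inode number it found:

```c
/* Minimal sketch: resolve a path into its inode via stat(2).
 * "/some/dir/file" is a placeholder path, not a real file. */
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;

    /* The kernel walks "/", then "some", then "dir", to find "file". */
    if (stat("/some/dir/file", &st) != 0) {
        perror("stat");
        return 1;
    }

    printf("inode: %lu, size: %ld bytes\n",
           (unsigned long)st.st_ino, (long)st.st_size);
    return 0;
}
```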
Then you have to actually read the inode data. After the first read this is also cached in RAM, so a read only has to happen once.
Think of the HD like an old record player: once you’re in the right place with the needle you can keep reading stuff fast as it rotates. However, once you need to move to a different place (called “seeking”), you’re doing something very different. You need to physically move the arm, then wait for the platter to spin until the right place is under the needle. This kind of physical motion is inherently slow, so seek times for disks are pretty long.
So, when do we seek? It depends on the filesystem layout, of course. Filesystems try to store files consecutively so as to increase read performance, and they generally also try to store inodes for a single directory near each other, but it all depends on things like when the files are written, filesystem fragmentation, etc. So, in the worst case, each stat of a file will cause a seek and then each open of the file will cause a second seek. That’s why things take such a long time when nothing is cached.
Some filesystems are better than others, and defragmentation might help. You can also do some things in apps. For instance, GIO sorts the inodes received from readdir() before stat()ing them, hoping that the inode number has some sort of relation to disk order (it generally has), thus minimizing random seeks back and forth.
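Here is a rough sketch, in plain C rather than GIO's actual code, of the readdir()-sort-then-stat pattern Alex describes: collect the entries, sort them by inode number, and only then stat them in that order.

```c
/* Sketch of the readdir()-sort-then-stat pattern: sorting by inode
 * number tends to turn random seeks into a mostly forward sweep
 * over the inode table. /usr/bin is just an example directory. */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

struct entry {
    ino_t ino;
    char  name[256];
};

static int by_inode(const void *a, const void *b)
{
    const struct entry *ea = a, *eb = b;
    return (ea->ino > eb->ino) - (ea->ino < eb->ino);
}

int main(void)
{
    const char *path = "/usr/bin";
    DIR *dir = opendir(path);
    if (!dir) { perror("opendir"); return 1; }

    struct entry *entries = NULL;
    size_t n = 0;
    struct dirent *de;

    /* First pass: collect names and inode numbers only (no stat yet). */
    while ((de = readdir(dir)) != NULL) {
        entries = realloc(entries, (n + 1) * sizeof *entries);
        entries[n].ino = de->d_ino;
        snprintf(entries[n].name, sizeof entries[n].name, "%s", de->d_name);
        n++;
    }
    closedir(dir);

    /* Sort by inode number, then stat in that (roughly on-disk) order. */
    qsort(entries, n, sizeof *entries, by_inode);

    for (size_t i = 0; i < n; i++) {
        char full[4096];
        struct stat st;
        snprintf(full, sizeof full, "%s/%s", path, entries[i].name);
        if (stat(full, &st) == 0)
            printf("%-10lu %s\n", (unsigned long)st.st_ino, entries[i].name);
    }

    free(entries);
    return 0;
}
```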
One important thing is to design your data storage and apps to minimize seeking. For instance, this is why Nautilus reading /usr/bin is slow: because the files in there generally have no extension, we need to do magic sniffing for each one. So, we need to open each file => one seek per file => slooooow. Another example is apps that store information in lots of small files, like gconf used to do, also a bad idea. Anyway, in practice I don’t think there is much you can do except try to hide the latencies.
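As an illustration (not Nautilus's real sniffing code), the sketch below shows why extensionless files are costly: guessing a file's type means opening it and reading its first bytes, which on a cold cache is roughly one extra seek per file.

```c
/* Simplified content sniffing: open each file in a directory and read
 * its first bytes to guess the type. Every open()+read() can cost a
 * seek when the cache is cold. /usr/bin is just an example directory. */
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/usr/bin";
    DIR *dir = opendir(path);
    if (!dir) { perror("opendir"); return 1; }

    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        char full[4096];
        unsigned char buf[4];
        snprintf(full, sizeof full, "%s/%s", path, de->d_name);

        /* One open() + read() per file just to peek at its header. */
        int fd = open(full, O_RDONLY);
        if (fd < 0)
            continue;
        ssize_t n = read(fd, buf, sizeof buf);
        close(fd);

        if (n >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
            printf("%-20s ELF binary\n", de->d_name);
        else if (n >= 2 && buf[0] == '#' && buf[1] == '!')
            printf("%-20s script\n", de->d_name);
    }

    closedir(dir);
    return 0;
}
```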
He ended with the following note:
The real fix for this whole dilemma is to move away from rotating media. I hear the Intel SSDs are awesome. Linus swears by them.
:-)