Now that we’ve learned what a translator looks like and how to build one, it’s time to run one and actually watch it work. The best way to do this is good old-fashioned gdb, as follows (using some of the examples from last time).
1 2 3 4 5 6 7 |
[root@gfs-i8c-01 xlator_example]# gdb glusterfs GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6) ... (gdb) r --debug -f my.vol /play/import Starting program: /usr/sbin/glusterfs --debug -f my.vol /play/import ... [2011-11-23 11:23:16.495516] I [fuse-bridge.c:2971:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13 |
If you get to this point, your glusterfs client process is already running. You can go to another window to see the mountpoint, do file operations, etc.
[root@gfs-i8c-01 ~]# df /play/import Filesystem 1K-blocks Used Available Use% Mounted on /root/xlator_example/my.vol 114506240 2643968 106045568 3% /play/import [root@gfs-i8c-01 ~]# ls /play/import a_file [root@gfs-i8c-01 ~]# cat /play/import/a_file hello
Now let’s interrupt the process and see where we are.
1 2 3 4 5 6 7 8 9 10 11 |
^C Program received signal SIGINT, Interrupt. 0x0000003a0060b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) info threads 5 Thread 0x7fffeffff700 (LWP 27206) 0x0000003a002dd8c7 in readv () from /lib64/libc.so.6 4 Thread 0x7ffff50e3700 (LWP 27205) 0x0000003a0060b75b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 3 Thread 0x7ffff5f02700 (LWP 27204) 0x0000003a0060b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 2 Thread 0x7ffff6903700 (LWP 27203) 0x0000003a0060f245 in sigwait () from /lib64/libpthread.so.0 * 1 Thread 0x7ffff7957700 (LWP 27196) 0x0000003a0060b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 |
Like any non-toy server, this one has multiple threads. What are they all doing? Honestly, even I don’t know. Thread 1 turns out to be in event_dispatch_epoll, which means it’s the one handling all of our network I/O. Note that with socket multi-threading patch this will change, with one thread in socket_poller per connection. Thread 2 is in glusterfs_sigwaiter which means signals will be isolated to that thread. Thread 3 is in syncenv_task, so it’s a worker process for synchronous requests such as those used by the rebalance and repair code. Thread 4 is in janitor_get_next_fd, so it’s waiting for a chance to close no-longer-needed file descriptors on the local filesystem. (I admit I had to look that one up, BTW.) Lastly, thread 5 is in fuse_thread_proc, so it’s the one fetching requests from our FUSE interface. You’ll often see many more threads than this, but it’s a pretty good basic set. Now, let’s set a breakpoint so we can actually watch a request.
1 2 3 4 |
(gdb) b rot13_writev Breakpoint 1 at 0x7ffff50e4f0b: file rot-13.c, line 119. (gdb) c Continuing. |
At this point we go into our other window and do something that will involve a write.
1 2 3 4 5 6 7 |
[root@gfs-i8c-01 ~]# echo goodbye > /play/import/another_file (back to the first window) [Switching to Thread 0x7fffeffff700 (LWP 27206)] Breakpoint 1, rot13_writev (frame=0x7ffff6e4402c, this=0x638440, fd=0x7ffff409802c, vector=0x7fffe8000cd8, count=1, offset=0, iobref=0x7fffe8001070) at rot-13.c:119 119 rot_13_private_t *priv = (rot_13_private_t *)this->private; |
Remember how we built with debugging symbols enabled and no optimization? That will be pretty important for the next few steps. As you can see, we’re in rot13_writev, with several parameters.
- frame is our always-present frame pointer for this request. Also, frame->local will point to any local data we created and attached to the request ourselves.
- this is a pointer to our instance of the rot-13 translator. You can examine it if you like to see the name, type, options, parent/children, inode table, and other stuff associated with it.
- fd is a pointer to a file-descriptor object (fd_t, not just a file-descriptor index which is what most people use “fd” for). This in turn points to an inode object (inode_t) and we can associate our own rot-13-specific data with either of these.
- vector and count together describe the data buffers for this write, which we’ll get to in a moment.
- offset is the offset into the file at which we’re writing.
- iobref is a buffer-reference object, which is used to track the life cycle of buffers containing read/write data. If you look closely, you’ll notice that vector[0].iov_base points to the same address as iobref->iobrefs[0].ptr, which should give you some idea of the inter-relationships between vector and iobref.
OK, now what about that vector? We can use it to examine the data being written, like this.
(gdb) p vector[0] $2 = {iov_base = 0x7ffff7936000, iov_len = 8} (gdb) x/s 0x7ffff7936000 0x7ffff7936000: "goodbye\n"
It’s not always safe to view this data as a string, because it might just as well be binary data, but since we’re generating the write this time it’s safe and convenient. With that knowledge, let’s step through things a bit.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
(gdb) s 120 if (priv->encrypt_write) (gdb) 121 rot13_iovec (vector, count); (gdb) rot13_iovec (vector=0x7fffe8000cd8, count=1) at rot-13.c:57 57 for (i = 0; i < count; i++) { (gdb) 58 rot13 (vector[i].iov_base, vector[i].iov_len); (gdb) rot13 (buf=0x7ffff7936000 "goodbye\n", len=8) at rot-13.c:45 45 for (i = 0; i < len; i++) { (gdb) 46 if (buf[i] >= 'a' && buf[i] <= 'z') (gdb) 47 buf[i] = 'a' + ((buf[i] - 'a' + 13) % 26); |
Here we’ve stepped into rot13_iovec, which iterates through our vector calling rot13, which in turn iterates through the characters in that chunk doing the rot-13 operation if/as appropriate. This is pretty straightforward stuff, so let’s skip to the next interesting bit.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
(gdb) fin Run till exit from #0 rot13 (buf=0x7ffff7936000 "goodbye\n", len=8) at rot-13.c:47 rot13_iovec (vector=0x7fffe8000cd8, count=1) at rot-13.c:57 57 for (i = 0; i < count; i++) { (gdb) fin Run till exit from #0 rot13_iovec (vector=0x7fffe8000cd8, count=1) at rot-13.c:57 rot13_writev (frame=0x7ffff6e4402c, this=0x638440, fd=0x7ffff409802c, vector=0x7fffe8000cd8, count=1, offset=0, iobref=0x7fffe8001070) at rot-13.c:123 123 STACK_WIND (frame, (gdb) b 129 Breakpoint 2 at 0x7ffff50e4f35: file rot-13.c, line 129. (gdb) b rot13_writev_cbk Breakpoint 3 at 0x7ffff50e4db3: file rot-13.c, line 106. (gdb) c |
So we’ve set breakpoints on both the callback and the statement following the STACK_WIND. Which one will we hit first?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
Breakpoint 3, rot13_writev_cbk (frame=0x7ffff6e4402c, cookie=0x7ffff6e440d8, this=0x638440, op_ret=8, op_errno=0, prebuf=0x7fffefffeca0, postbuf=0x7fffefffec30) at rot-13.c:106 106 STACK_UNWIND_STRICT (writev, frame, op_ret, op_errno, prebuf, postbuf); (gdb) bt #0 rot13_writev_cbk (frame=0x7ffff6e4402c, cookie=0x7ffff6e440d8, this=0x638440, op_ret=8, op_errno=0, prebuf=0x7fffefffeca0, postbuf=0x7fffefffec30) at rot-13.c:106 #1 0x00007ffff52f1b37 in posix_writev (frame=0x7ffff6e440d8, this=<value optimized out>, fd=<value optimized out>, vector=<value optimized out>, count=1, offset=<value optimized out>, iobref=0x7fffe8001070) at posix.c:2217 #2 0x00007ffff50e513e in rot13_writev (frame=0x7ffff6e4402c, this=0x638440, fd=0x7ffff409802c, vector=0x7fffe8000cd8, count=1, offset=0, iobref=0x7fffe8001070) at rot-13.c:123 |
Surprise! We’re in rot13_writev_cbk now, called (indirectly) while we’re still in rot13_writev before STACK_WIND returns (still at rot-13.c:123). If you did any request cleanup here, then you need to be careful about what you do in the remainder of rot13_writev because data may have been freed etc. It’s tempting to say you should just do the cleanup in rot13_writev after the STACK_WIND, but that’s not valid because it’s also possible that some other translator returned without calling STACK_UNWIND – i.e. before rot13_writev is called, so then it would be the one getting null-pointer errors instead. To put it another way, the callback and the return from STACK_WIND can occur in either order or even simultaneously on different threads. Even if you were to use reference counts, you’d have to make sure to use locking or atomic operations to avoid races, and it’s not worth it. Unless you really understand the possible flows of control and know what you’re doing, it’s better to do cleanup in the callback and nothing after STACK_WIND.
At this point all that’s left is a STACK_UNWIND and a return. The STACK_UNWIND invokes our parent’s completion callback, and in this case our parent is FUSE so at that point the VFS layer is notified of the write being complete. Finally, we return through several levels of normal function calls until we come back to fuse_thread_proc, which waits for the next request.
So that’s it. For extra fun, you might want to repeat this exercise by stepping through some other call – stat or setxattr might be good choices – but you’ll have to use a translator that actually implements those calls to see much that’s interesting. Then you’ll pretty much know everything I knew when I started writing my first for-real translators, and probably even a bit more. I hope you’ve enjoyed this series, or at least found it useful, and if you have any suggestions for other topics I should cover please let me know (via comments or email, IRC or Twitter).
저자 소개
유사한 검색 결과
채널별 검색
오토메이션
기술, 팀, 인프라를 위한 IT 자동화 최신 동향
인공지능
고객이 어디서나 AI 워크로드를 실행할 수 있도록 지원하는 플랫폼 업데이트
오픈 하이브리드 클라우드
하이브리드 클라우드로 더욱 유연한 미래를 구축하는 방법을 알아보세요
보안
환경과 기술 전반에 걸쳐 리스크를 감소하는 방법에 대한 최신 정보
엣지 컴퓨팅
엣지에서의 운영을 단순화하는 플랫폼 업데이트
인프라
세계적으로 인정받은 기업용 Linux 플랫폼에 대한 최신 정보
애플리케이션
복잡한 애플리케이션에 대한 솔루션 더 보기
오리지널 쇼
엔터프라이즈 기술 분야의 제작자와 리더가 전하는 흥미로운 스토리
제품
- Red Hat Enterprise Linux
- Red Hat OpenShift Enterprise
- Red Hat Ansible Automation Platform
- 클라우드 서비스
- 모든 제품 보기
툴
체험, 구매 & 영업
커뮤니케이션
Red Hat 소개
Red Hat은 Linux, 클라우드, 컨테이너, 쿠버네티스 등을 포함한 글로벌 엔터프라이즈 오픈소스 솔루션 공급업체입니다. Red Hat은 코어 데이터센터에서 네트워크 엣지에 이르기까지 다양한 플랫폼과 환경에서 기업의 업무 편의성을 높여 주는 강화된 기능의 솔루션을 제공합니다.