5. The Cygwin Architecture
We now turn to the actual
architecture of the Cygwin library.
When a binary linked
against the library is executed, the Cygwin DLL is loaded into the
application's text segment. Because we are trying to emulate a UNIX
kernel which needs access to all processes running under it, the first
Cygwin DLL to run creates shared memory areas that other processes
using separate instances of the DLL can access. This is used to keep
track of open file descriptors and assist fork and exec, among other
purposes. In addition to the shared memory regions, every process
also has a per_process structure that contains information such as
process id, user id, signal masks, and other similar process-specific
information.
The DLL is implemented using the Win32 API, which
allows it to run on all Win32 hosts. Because processes run under the
standard Win32 subsystem, they can access both the UNIX compatibility
calls provided by Cygwin as well as any of the Win32 API calls.
This gives the programmer complete flexibility in designing the
structure of their program in terms of the APIs used. For example,
they could write a Win32-specific GUI using Win32 API calls on top of
a UNIX back-end that uses Cygwin.
Early on in the development
process, we made the important design decision that we would not
strictly adhere to existing UNIX standards like POSIX.1 where doing
so was impossible or would significantly diminish the
usability of the tools on the Win32 platform. In many cases, an
environment variable can be set to override the default behavior and
force standards compliance.
5.1. Windows NT != Windows 95/98
While Windows 95 and Windows 98 are similar enough to
each other that we can safely ignore the distinction when implementing
Cygwin, Windows NT is an extremely different operating system. For
this reason, whenever the DLL is loaded, the library checks which
operating system is active so that it can act accordingly. In
some cases, the Win32 API differs only for historical reasons.
In this situation, the same basic functionality is available under
95/98 and NT but the method used to gain this functionality differs.
A trivial example: in our implementation of uname, the library
examines the sysinfo.dwProcessorType structure member to figure out
the processor type under 95/98. This field is not supported in NT,
which has its own operating system-specific structure member called
sysinfo.wProcessorLevel.
Other differences between NT and 95/98
are much more fundamental in nature. The best example is that only NT
provides a security model.
5.2. Permissions and Security
Windows NT includes a sophisticated security model
based on Access Control Lists (ACLs). Although some modern UNIX
operating systems include support for ACLs, Cygwin maps Win32 file
ownership and permissions to the more standard, older UNIX model. The
chmod call maps UNIX-style permissions back to the Win32 equivalents.
Because many programs expect to be able to find the /etc/passwd and
/etc/group files, we provide utilities that can be used to construct
them from the user and group information provided by the operating
system.
Under Windows NT, the administrator is permitted to
chown files. There is currently no mechanism to support the setuid
concept or API call. Although we hope to support this functionality
at some point in the future, in practice, the programs we have ported
have not needed it.
Under Windows 95/98, the situation is
considerably different. Since a security model is not provided,
Cygwin fakes file ownership by making all files look like they are
owned by a default user and group id. As under NT, file permissions
can still be determined by examining their read/write/execute status.
Rather than return an unimplemented error, under Windows 95/98, the
chown call succeeds immediately without actually performing any action
whatsoever. This is appropriate since essentially all users jointly
own the files when no concept of file ownership exists.
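The ownership behavior described above can be sketched in a few lines of C. This is only an illustration: the enum, the struct, the function names, and the default ids are all hypothetical and do not come from the Cygwin sources. The NT branch simply records the new owner as a stand-in for whatever the real library does with the Win32 security information.

```c
#include <assert.h>
#include <stddef.h>

/* All names and values below are hypothetical, chosen for illustration. */
enum host_os_sketch { HOST_WIN9X, HOST_WINNT };

#define DEFAULT_UID_SKETCH 500   /* placeholder default user id  */
#define DEFAULT_GID_SKETCH 500   /* placeholder default group id */

struct owner_sketch {
    int uid;
    int gid;
};

/* Report the owner of a file. Under 95/98 every file looks like it is
 * owned by the same default user and group. */
struct owner_sketch get_owner_sketch(enum host_os_sketch os,
                                     const struct owner_sketch *stored)
{
    struct owner_sketch fake = { DEFAULT_UID_SKETCH, DEFAULT_GID_SKETCH };

    assert(stored != NULL);
    return (os == HOST_WIN9X) ? fake : *stored;
}

/* Change the owner of a file. Under 95/98 the call succeeds at once
 * without doing anything; under NT this sketch just records the ids. */
int chown_sketch(enum host_os_sketch os, struct owner_sketch *stored,
                 int uid, int gid)
{
    assert(stored != NULL);
    if (os == HOST_WIN9X)
        return 0;            /* success, but nothing was changed */
    stored->uid = uid;
    stored->gid = gid;
    return 0;
}
```

Returning success rather than an error keeps ported programs that call chown unconditionally from failing on a host where ownership has no meaning.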
We must also
discuss the security implications of our "kernel" using
shared memory areas to store information about Cygwin processes.
Because these areas are not yet protected in any way, in principle a
malicious user could modify them to cause unexpected behavior in
Cygwin processes. While this is not a new problem under Windows
95/98 (because of the lack of operating system security), it does
constitute a security hole under Windows NT. This is because one user
could affect the Cygwin programs run by another user by changing the
shared memory information in ways that they could not in a more
typical Windows NT program. For this reason, it is not appropriate to use
Cygwin in high-security applications. In practice, this will not be
a major problem for most uses of the library.
5.3. Files
Cygwin supports both Win32- and
POSIX-style paths, using either forward or back slashes as the
directory delimiter. Paths coming into the DLL are translated from
Win32 to POSIX as needed. As a result, the library believes that the
file system is a POSIX-compliant one, translating paths back to Win32
paths whenever it calls a Win32 API function. UNC pathnames (starting
with two slashes) are supported.
The layout of this POSIX view
of the Windows file system space is stored in the Windows registry.
While the slash ('/') directory points to the system partition by
default, this is easy to change with the Cygwin mount utility. In
addition to selecting the slash partition, it allows mounting
arbitrary Win32 paths into the POSIX file system space. Many people
use the utility to mount each drive letter under the slash partition
(e.g., C:\ to /c, D:\ to /d, etc.).
The library exports several
Cygwin-specific functions that can be used by external programs to
convert a path or path list from Win32 to POSIX or vice versa. Shell
scripts and Makefiles cannot call these functions directly. Instead,
they can do the same path translations by executing the "cygpath"
utility program that we provide with Cygwin.
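To make the translation concrete, here is a minimal sketch of a Win32-to-POSIX conversion. It assumes the common convention mentioned above of mounting each drive letter under the slash partition (C:\ at /c). The function name is hypothetical; it is not one of the functions the library exports, and it ignores the mount table stored in the registry.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a Cygwin export: translate a Win32 path such
 * as "C:\src\foo.c" into a POSIX-style path, assuming drive X: is
 * mounted at /x. Returns 0 on success, -1 if `out` is too small.
 */
int win32_to_posix_sketch(const char *win32, char *out, size_t outlen)
{
    size_t i, len;

    assert(win32 != NULL && out != NULL);

    /* The result is exactly as long as the input ("X:" becomes "/x"). */
    len = strlen(win32);
    if (len + 1 > outlen)
        return -1;

    /* Copy the path, turning back slashes into forward slashes. */
    for (i = 0; i < len; i++)
        out[i] = (win32[i] == '\\') ? '/' : win32[i];
    out[len] = '\0';

    /* A leading drive specifier "X:" becomes the mount point "/x". */
    if (len >= 2 && isalpha((unsigned char)out[0]) && out[1] == ':') {
        out[1] = (char)tolower((unsigned char)out[0]);
        out[0] = '/';
    }
    return 0;
}
```

Paths that already use forward slashes and have no drive letter pass through unchanged, which matches the idea that either delimiter is accepted on the way in.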
Win32 file
systems are case preserving but case insensitive. Cygwin does not
currently support case distinction because, in practice, few UNIX
programs actually rely on it. While we could mangle file names to
support case distinction, this would add unnecessary overhead to the
library and make it more difficult for non-Cygwin applications to
access those files.
Symbolic links are emulated by files
containing a magic cookie followed by the path to which the link
points. They are marked with the System attribute so that only files
with that attribute have to be read to determine whether they are
symbolic links. Hard links are fully supported under Windows
NT on NTFS file systems. On a FAT file system, the call falls back to
simply copying the file, a strategy that works in many
cases.
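The symbolic link emulation can be illustrated with a small parser. The sketch below assumes the file contents have already been read into memory after the System attribute check. The cookie string and the function name are assumptions made for the example; the text above does not specify the cookie's value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed cookie value, for illustration only. */
#define SYMLINK_COOKIE_SKETCH "!<symlink>"

/*
 * Hypothetical helper: decide whether `contents` (of length `len`) is an
 * emulated symbolic link. If it is, copy the link target into `target`
 * as a NUL-terminated string and return 1; otherwise return 0.
 */
int parse_symlink_sketch(const char *contents, size_t len,
                         char *target, size_t targetlen)
{
    const size_t cookie_len = strlen(SYMLINK_COOKIE_SKETCH);
    size_t path_len;

    assert(contents != NULL && target != NULL);

    /* No cookie, or nothing after it: treat as an ordinary file. */
    if (len <= cookie_len ||
        memcmp(contents, SYMLINK_COOKIE_SKETCH, cookie_len) != 0)
        return 0;

    path_len = len - cookie_len;
    if (path_len >= targetlen)
        return 0;                /* target would not fit */

    memcpy(target, contents + cookie_len, path_len);
    target[path_len] = '\0';
    return 1;
}
```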
The inode number for a file is calculated by hashing its
full Win32 path. The inode number generated by the stat call always
matches the one returned in d_ino of the dirent structure. It is
worth noting that the number produced by this method is not guaranteed
to be unique. However, we have not found this to be a significant
problem because of the low probability of generating a duplicate inode
number.
5.4. Text Mode vs. Binary Mode
Interoperability
with other Win32 programs such as text editors was critical to the
success of the port of the development tools. Most Cygnus customers
upgrading from the older DOS-hosted toolchains expected the new
Win32-hosted ones to continue to work with their old development
sources.
Unfortunately, UNIX and Win32 use different end-of-line
terminators in text files. Consequently, each carriage-return/line-feed
pair has to be translated on the fly by Cygwin into a single newline
when reading in text mode. The control-z character is interpreted as
a valid end-of-file character for a similar reason.
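The translation performed on a text-mode read can be sketched as a simple filter over a buffer. The function name is hypothetical and the sketch ignores the case of a carriage return and its line feed being split across two reads, which a real implementation would have to handle.

```c
#include <assert.h>
#include <stddef.h>

#define CTRL_Z_SKETCH 0x1a   /* control-z, treated as end of file */

/*
 * Hypothetical text-mode filter: copy `len` raw bytes from `in` to `out`
 * (which must hold at least `len` bytes), collapsing each CR LF pair
 * into a single LF and stopping at the first control-z.
 * Returns the number of bytes written to `out`.
 */
size_t text_mode_read_sketch(const char *in, size_t len, char *out)
{
    size_t i, n = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        if (in[i] == CTRL_Z_SKETCH)
            break;                       /* logical end of file */
        if (in[i] == '\r' && i + 1 < len && in[i + 1] == '\n')
            continue;                    /* drop CR; LF is copied next */
        out[n++] = in[i];
    }
    return n;
}
```

Note that the count returned can be smaller than the number of bytes consumed from the file, so a byte count obtained this way is not a file offset.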
This
solution addresses the compatibility requirement at the expense of
violating the POSIX standard, which requires that text and binary modes
be identical. Consequently, processes that attempt to lseek
through text files can no longer rely on the number of bytes read as
an accurate indicator of position in the file. For this reason, an
environment variable can be set to override this behavior.
5.5. ANSI C Library
We chose to include
Cygnus' own existing ANSI C library,
"newlib", as part of the library, rather than write all of the libc
and math calls from scratch. Newlib is a BSD-derived ANSI C library,
previously only used by cross-compilers for embedded systems
development.
The reuse of existing free implementations of such things
as the glob, regexp, and getopt libraries saved us considerable
effort. In addition, Cygwin uses Doug Lea's free malloc
implementation, which successfully balances speed and compactness. The
library accesses the malloc calls via an exported function pointer.
This makes it possible for a Cygwin process to provide its own
malloc if it so desires.
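The function-pointer indirection can be sketched as follows. All names here are hypothetical, and the default allocator is simply the host C library's malloc rather than Doug Lea's implementation; the point is only to show how routing allocations through a pointer lets a process substitute its own allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook type: any function with malloc's signature. */
typedef void *(*malloc_fn_sketch)(size_t);

/* The library allocates through this pointer, never via malloc directly. */
static malloc_fn_sketch current_malloc = malloc;

/* Let the application install its own allocator. */
void set_malloc_sketch(malloc_fn_sketch fn)
{
    assert(fn != NULL);
    current_malloc = fn;
}

/* What library code would call whenever it needs memory. */
void *lib_alloc_sketch(size_t n)
{
    return current_malloc(n);
}

/* Example replacement: counts calls, then defers to the host malloc. */
static size_t counted_calls;

void *counting_malloc_sketch(size_t n)
{
    counted_calls++;
    return malloc(n);
}
```

A process that wants its own allocator would call set_malloc_sketch(counting_malloc_sketch) once at startup; every later lib_alloc_sketch call then goes through the replacement.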
5.6. Process Creation
The fork call in Cygwin is particularly interesting because it
does not map well on top of the Win32 API. This makes it very
difficult to implement correctly. Currently, the Cygwin fork is a
non-copy-on-write implementation similar to what was present in early
flavors of UNIX.
The first thing that happens when a parent process
forks a child process is that the parent initializes a space in the
Cygwin process table for the child. It then creates a suspended
child process using the Win32 CreateProcess call. Next, the parent
process calls setjmp to save its own context and sets a pointer to
this in a Cygwin shared memory area (shared among all Cygwin
tasks). It then fills in the child's .data and .bss sections by
copying from its own address space into the suspended child's address
space. After the child's address space is initialized, the child is
run while the parent waits on a mutex. The child discovers it has
been forked and calls longjmp with the saved jump buffer. The child then
releases the mutex the parent is waiting on and blocks on another mutex.
This is the signal for the parent to copy its stack and heap into the
child, after which it releases the mutex the child is waiting on and
returns from the fork call. Finally, the child wakes from blocking on
the last mutex, recreates any memory-mapped areas passed to it via the
shared area, and returns from fork itself.
While we have some
ideas as to how to speed up our fork implementation by reducing the
number of context switches between the parent and child process, fork
will almost certainly always be inefficient under Win32. Fortunately,
in most circumstances the spawn family of calls provided by Cygwin
can be substituted for a fork/exec pair with only a little effort.
These calls map cleanly on top of the Win32 API. As a result, they
are much more efficient. Changing the compiler's driver program to
call spawn instead of fork was trivial and increased
compilation speeds by twenty to thirty percent in our tests.
However, spawn and exec present their own set of
difficulties. Because there is no way to do an actual exec under
Win32, Cygwin has to invent its own Process IDs (PIDs). As a
result, when a process performs multiple exec calls, there will be
multiple Windows PIDs associated with a single Cygwin PID. In some
cases, stubs of each of these Win32 processes may linger, waiting for
their exec'd Cygwin process to exit.
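The relationship between one Cygwin PID and several Windows PIDs can be pictured as a small table entry. The structure, its field names, and the fixed-size array are assumptions made for this sketch and are not taken from the Cygwin process table.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EXECS_SKETCH 8   /* arbitrary limit for the sketch */

/* Hypothetical process-table entry: one Cygwin PID, many Win32 PIDs. */
struct pid_entry_sketch {
    int cygwin_pid;                              /* stable across exec */
    unsigned long win32_pids[MAX_EXECS_SKETCH];  /* one per exec so far */
    size_t n_win32;
};

/* Record an exec: a new Win32 process now carries the same Cygwin PID.
 * Returns 0 on success, -1 if the entry has no room left. */
int record_exec_sketch(struct pid_entry_sketch *e, unsigned long win32_pid)
{
    assert(e != NULL);
    if (e->n_win32 >= MAX_EXECS_SKETCH)
        return -1;
    e->win32_pids[e->n_win32++] = win32_pid;
    return 0;
}

/* The most recently created Win32 process is the one doing the work;
 * earlier ones are the stubs that may linger. Returns 0 if none. */
unsigned long current_win32_pid_sketch(const struct pid_entry_sketch *e)
{
    assert(e != NULL);
    return e->n_win32 ? e->win32_pids[e->n_win32 - 1] : 0;
}
```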
5.7. Signals
When a Cygwin process starts, the library starts a secondary thread for
use in signal handling. This thread waits for Windows events used to
pass signals to the process. When a process notices it has a signal,
it scans its signal bitmask and handles the signal in the appropriate
fashion.
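Scanning a signal bitmask and dispatching handlers might look like the sketch below. The structure, the blocked mask, and the function names are assumptions for illustration, and the sketch leaves out the secondary thread and the Windows events that deliver the signals in the first place.

```c
#include <assert.h>
#include <stddef.h>

#define NSIG_SKETCH 32   /* assumed number of signals */

typedef void (*handler_fn_sketch)(int);

/* Hypothetical per-process signal state. */
struct sig_state_sketch {
    unsigned long pending;   /* bit n set: signal n awaits delivery */
    unsigned long blocked;   /* bit n set: signal n is masked       */
    handler_fn_sketch handlers[NSIG_SKETCH];
};

/* Scan the pending mask and run the handler of every signal that is
 * pending and not blocked. Returns the number of signals delivered. */
int dispatch_pending_sketch(struct sig_state_sketch *s)
{
    int sig, delivered = 0;

    assert(s != NULL);
    for (sig = 1; sig < NSIG_SKETCH; sig++) {
        unsigned long bit = 1UL << sig;

        if ((s->pending & bit) && !(s->blocked & bit)) {
            s->pending &= ~bit;          /* consume the signal */
            if (s->handlers[sig] != NULL)
                s->handlers[sig](sig);
            delivered++;
        }
    }
    return delivered;
}

/* Example handler that remembers the last signal it saw. */
static int last_sig_seen;

void record_sig_sketch(int sig)
{
    last_sig_seen = sig;
}
```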
Several complications in the implementation arise from the
fact that the signal handler operates in the same address space as the
executing program. The immediate consequence is that Cygwin system
functions are interruptible unless special care is taken to avoid
this. We go to some lengths to prevent the sig_send function that
sends signals from being interrupted. In the case of a process
sending a signal to another process, we place a mutex around sig_send
such that sig_send will not be interrupted until it has completely
finished sending the signal.
In the case of a process sending
itself a signal, we use a separate semaphore/event pair instead of the
mutex. sig_send starts by resetting the event and incrementing the
semaphore that flags the signal handler to process the signal. After
the signal is processed, the signal handler signals the event that it
is done. This handshake keeps intraprocess signals synchronous, as
required by POSIX.
Most standard UNIX signals are provided. Job
control works as expected in shells that support
it.
5.8. Sockets
Socket-related calls in Cygwin simply
call the functions by the same name in Winsock, Microsoft's
implementation of Berkeley sockets. Only a few changes were needed to
match the expected UNIX semantics. One of the most troublesome
differences was that Winsock must be initialized before the first
socket function is called. As a result, Cygwin has to perform this
initialization when appropriate. In order to support sockets across
fork calls, child processes initialize Winsock if any inherited file
descriptor is a socket.
Unfortunately, implicitly loading DLLs
at process startup is usually a slow affair. Because many processes
do not use sockets, Cygwin explicitly loads the Winsock DLL the
first time it calls the Winsock initialization routine. This single
change sped up GNU configure times by thirty
percent.
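The deferred loading described above is an instance of lazy initialization, sketched here in portable C. The stub stands in for the expensive step (loading the Winsock DLL and calling its initialization routine); all names are hypothetical and no actual Win32 calls are made.

```c
#include <assert.h>

static int winsock_ready;   /* 0 until the first socket call */
static int init_calls;      /* how often the slow path actually ran */

/* Stand-in for the expensive step: loading the DLL and initializing it. */
static void slow_init_stub(void)
{
    init_calls++;
}

/* Pay the startup cost only once, and only if sockets are used at all. */
static void ensure_winsock_sketch(void)
{
    if (!winsock_ready) {
        slow_init_stub();
        winsock_ready = 1;
    }
}

/* Every socket-related entry point would begin like this. */
int socket_call_sketch(void)
{
    ensure_winsock_sketch();
    return 0;   /* the real call into Winsock would go here */
}
```

A process that never touches a socket never runs the slow path, which is where the startup savings come from. This sketch is not thread-safe; a real implementation would need to guard the flag.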
5.9. Select
The UNIX select function is another
call that does not map cleanly on top of the Win32 API. Much to our
dismay, we discovered that the Win32 select in Winsock only worked on
socket handles. Our implementation allows select to function normally
when given different types of file descriptors (sockets, pipes,
handles, and a custom /dev/windows pseudo-device for Windows
messages).
Upon entry into the select function, the first
operation is to sort the file descriptors into the different types.
There are then two cases to consider. The simple case is when at
least one file descriptor is a type that is always known to be ready
(such as a disk file). In that case, select returns as
soon as it has polled each of the other types to see if they are
ready. The more complex case involves waiting for socket or pipe file
descriptors to be ready. This is accomplished by the main thread
suspending itself, after starting one thread for each type of file
descriptor present. Each thread polls the file descriptors of its
respective type with the appropriate Win32 API call. As soon as a
thread identifies a ready descriptor, that thread signals the main
thread to wake up. This case is now the same as the first one since
we know at least one descriptor is ready. So select returns, after
polling all of the file descriptors one last
time.
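The first step of select, sorting descriptors by type and deciding which of the two cases applies, can be sketched as below. The enum, the structure, and the function names are hypothetical, and the sketch only counts descriptors; it does not poll anything or start threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor types, following the list given above. */
enum fd_kind_sketch {
    KIND_DISK_FILE,
    KIND_SOCKET,
    KIND_PIPE,
    KIND_WINDOWS_MSG,
    KIND_COUNT
};

struct fd_groups_sketch {
    size_t count[KIND_COUNT];   /* descriptors seen of each type */
    int any_always_ready;       /* nonzero: the simple case applies */
};

/* Disk files are assumed to be the only always-ready type here. */
static int always_ready_sketch(enum fd_kind_sketch k)
{
    return k == KIND_DISK_FILE;
}

/* Sort the descriptors into per-type groups. */
void sort_fds_sketch(const enum fd_kind_sketch *kinds, size_t n,
                     struct fd_groups_sketch *g)
{
    size_t i;

    assert(kinds != NULL && g != NULL);
    for (i = 0; i < KIND_COUNT; i++)
        g->count[i] = 0;
    g->any_always_ready = 0;

    for (i = 0; i < n; i++) {
        g->count[kinds[i]]++;
        if (always_ready_sketch(kinds[i]))
            g->any_always_ready = 1;
    }
}

/* Simple case: no waiting threads. Complex case: one thread per type
 * of descriptor present. */
size_t threads_needed_sketch(const struct fd_groups_sketch *g)
{
    size_t i, threads = 0;

    assert(g != NULL);
    if (g->any_always_ready)
        return 0;
    for (i = 0; i < KIND_COUNT; i++)
        if (g->count[i] > 0)
            threads++;
    return threads;
}
```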