Re: Summary of the 2009-01-06 Packaging Committee meeting

On Wed, 2009-01-07 at 16:51 -0800, Toshio Kuratomi wrote:
> Doug Ledford wrote:
> > On Wed, 2009-01-07 at 14:48 -0800, Toshio Kuratomi wrote:
> >> It depends on how you interpret the FHS, I suppose.  In the old
> >> packages, the config files are in /etc, the arch independent data (help
> >> files) are in a subdir of /usr/share/openmpi/, and most of the
> >> arch-specific files are under /usr/lib/openmpi/.  This satisfies the
> >> overarching goal of the FHS, separation of sharable and unsharable data.
> >>  It also satisfies the goal of separating arch specific and arch
> >> independent files.
> >>
> >> The question is whether the binaries can go there or have to go in
> >> /usr/bin and whether the libraries can go there or must go directly in
> >> /usr/lib.  For the libraries, we often put private libraries in a
> >> subdirectory of /usr/lib.  These differ in that they're public
> >> libraries.  I lean towards this being okay.  The binaries being in the
> >> subdirectory of %{_libdir} doesn't have as much precedent.  Perhaps we
> >> need to make that usage explicit in the Guidelines just like %{_libexecdir}?
> >>
> >> Looking at the new package I see that there are config files under
> >> %{_libdir}/openmpi.  I think these need to go in %{_sysconfdir} instead.
> >>  This is more important than binaries and libraries for several reasons:
> >>
> >> 1) Having them in %{_libdir} breaks the sharable/unsharable boundary
> > 
> > Not really, but that's due to typical usage of these specific files.  I
> > would tend to agree that files normally in /etc are something that are
> > intended to be edited on a per machine basis.  These files, even though
> > they are in %{_libdir}/%{mpidir}/etc, are not something that you would
> > edit on a per machine basis.  If anything, things like the
> > openmpi-default-hostfile would be edited on a per version basis (and
> > with this layout they have a per version etc directory to be contained
> > in).  This is because on a large cluster, you are likely to either allow
> > all the machines in the cluster to participate and would put all the
> > machines in the cluster in this config file, or you would have a segment
> > of the cluster that is dedicated to running this version of openmpi and
> > only those machines would be in this file.  Either way, for all the
> > machines you want running this version of openmpi by default, the file
> > would be the same (this assumes that a person might start the openmpi
> > job from any machine in the cluster that's part of the appropriate
> > group, you may have a control machine doing things instead, in which
> > case you really only have to edit the file on that one machine and all
> > the others will be passive clients and not care about the contents of
> > this file).
> > 
> Okay.. but then you preclude the possibility of running multiple
> instances of one mpi version within a cluster.  It sounds like that's
> not typical in your experience but it doesn't sound like a necessary
> limitation.

No, nothing's precluded.  None of these files are essential to the
openmpi operation (well, the mpivars.* files are essential to
mpi-selector operation, but they are files that should never be edited
by admins, they are created during the build process and are static, I
just stuck them there) and every single one of them has the option to be
overridden by an instance specific version of the file.

> > Now, the even more common scenario is that you have multiple different
> > MPI apps.  The admins typically would do a login per app so that the
> > default login environment for a given app is already pre-configured.
> > Amongst that would be things like selection of the right mpi, and host
> > files specific to what machines that app is allowed to run on.  Those
> > would all be in the home directory for the login and wouldn't require
> > editing the system wide etc files in here.
> > 
> Despite the environment being somewhat different than normal this kind
> of configuration is normal for any apps.
> >> 2) They are files edited by system admins and looked at by the user.
> >> They should be in a predictable place for this reason.
> > 
> > In truth, they aren't edited much at all, and relying upon them is
> > frowned upon.  But, as I noted above, even if they are edited, they are
> > still generally shareable due to the nature of MPI clusters.
> > 
> This is true of other applications as well, though....
> So even if we don't care about people having multiple different openmpi
> instances within their cluster, this still doesn't answer what breaks by
> putting the config files in /etc.  Which is important because deviating
> breaks other sysadmin assumptions.

No, putting the files in /etc would actually break more sysadmin
assumptions than anything else.  OpenMPI and the mvapich stacks have
been installed under static prefix installations for far longer than
we've been interested in shipping them.  People totally new to the
openmpi realm/usage might have some assumptions broken, but people who
have been using openmpi and similar packages for years would have theirs
broken by our changes.

And it's not just the users.  The package itself has a configure option
to enable --prefix behavior by default, and if you read the man page
you'll see that there is a specific option for passing in the --prefix
to the run time environment, and in fact if you even just start the run
time environment using a full path such as /usr/local/bin/mpirun, it
automatically enables --prefix mode and sets the prefix to one directory
up from the one the binary is in, and then it passes %{prefix}/bin in
the path and %{prefix}/lib in the ld library path to the remote nodes.
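
As a rough illustration of the prefix derivation described above, here
is a minimal Python sketch.  It is not Open MPI's code: the function
names derive_prefix and remote_environment are hypothetical, and the
choice to prepend (rather than append) the entries is an assumption.
The only rule taken from the text is that the prefix is one directory
up from the directory holding the binary, and that prefix/bin and
prefix/lib end up in the path and the ld library path.

```python
import os.path


def derive_prefix(mpirun_path):
    """Return the directory one level above the one containing the binary.

    Hypothetical helper: it models only the rule described in the text.
    """
    if not os.path.isabs(mpirun_path):
        raise ValueError("prefix is only inferred from a full path")
    bindir = os.path.dirname(os.path.normpath(mpirun_path))
    return os.path.dirname(bindir)


def remote_environment(mpirun_path, path="", ld_library_path=""):
    """Build the search paths that would be handed to the remote nodes.

    Assumption: the prefix entries are placed in front of existing ones.
    """
    prefix = derive_prefix(mpirun_path)

    def prepend(entry, existing):
        return entry if not existing else entry + os.pathsep + existing

    return {
        "PATH": prepend(os.path.join(prefix, "bin"), path),
        "LD_LIBRARY_PATH": prepend(os.path.join(prefix, "lib"),
                                   ld_library_path),
    }
```

Calling derive_prefix("/usr/local/bin/mpirun") yields "/usr/local",
which is why a relocated install keeps working without any files in
/etc.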

>   For instance, if I was backing up
> all configuration files on these machines by backing up /etc, this would
> miss the openmpi configs.  If I was mounting the /usr filesystem
> read-only, this would prevent me from updating the config file on-the-fly.

True on both counts.  Of course, given clusters, totally irrelevant
points.  They don't back up individual /etc directories on any of the
cluster nodes.  All the nodes are set up so that they can be installed
by the cluster manager, and they can be reinstalled as needed.  Nodes
are disposable.  Also, they *do* mount /usr read-only in lots of
clusters, and they couldn't care less that default files are in there.
If they need to edit them, they go to the cluster disk controller node
and edit it where it isn't read-only and then all the nodes see the
changes.  More commonly, the batch scheduler they use has its own
private data directory on the controlling node and it writes the
necessary files on the fly (or passes the options entirely on the
command line) based upon what nodes it intends to start the job on.  My
point is that these "we care about single node" issues simply do not
exist in clusters, and *can't* exist or they make the cluster

> >> As you noted, there's also some FHS regressions compared to the current
> >> package:
> >>
> >> - include files are under %{_libdir} instead of under %{_includedir} --
> >> If these are arch specific include files then this makes sense.  If not,
> >> they belong in %{_includedir}.  What things were broken by doing that?
> > 
> > Two things here.  Remember that we allow simultaneous installs of
> > different versions of OpenMPI (you can't get it out of the yum channel
> > this way, and you can't do upgrades of OpenMPI or it wipes older
> > versions out, but you can download anything after the openmpi-1.2.5 I
> > think and install different copies of different versions, although that
> > does not include multiple releases of the same version, I only use n-v
> > in the naming, not full n-v-r, so for instance you couldn't have 1.2.7-5
> > and 1.2.7-6 installed, but you can have 1.1.8 and 1.2.7 installed at the
> > same time) in order to meet user requirements.  Differing versions can
> > have differing header files, so we can't just use %{_includedir}/%{name}
> > or they might conflict.  Putting the includes alongside the libs works
> > for just about any devel package that needs to use it because you can
> > just use --prefix to configure it to the right place.  Of course, the
> > gcc wrappers also know about where the right include files are, so it
> > works with mpicc without doing anything.  The second reason is that for
> > fortran use in particular, the header file produced during build is
> > different for different arches.
> The correct way to do this is by having the version in the includedir:
>   %{_includedir}/openmpi-1.2.7/*.h
> >  So aside from the multi-install issue,
> > there is an arch specific component to the headers that can't be worked
> > around due to limitations in the fortran language (or that's my
> > understanding, I haven't touched fortran since 1991 or so).
> > 
> So is it only the fortran headers that are arch specific or are all of
> them arch specific and only fortran doesn't have a way to workaround that?

All the headers except fortran have the ability to do things like #ifdef
__i386__ so that a single header works on all arches.  The fortran header
can't and must be specific to the arch it's referencing.  In the past,
what I tried to do was put the headers under %{_includedir}/openmpi and
then I created an arch specific dir and moved the arch specific header
into that.  Because of how openmpi's mpicc works, this then meant that I
had to run a sed script during the install on the files in
%{_datadir}/%{name}/help/*-wrapper.txt to edit in the additional include
directory into the default include search list (I also had to edit in -m
%{mode} on multilib capable arches).  This is why the datadir help files
had to be placed in an arch specific location.
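
To make the install-time edit concrete, here is a minimal Python sketch
of that kind of rewrite.  It is not the sed script from the spec file.
The key=value layout, the key name "compiler_flags", the function name
add_wrapper_flags, and the paths in the usage note are all assumptions
for illustration, as is the -m32/-m64 style of the mode flag.

```python
import re


def add_wrapper_flags(text, include_dir, mode=None, key="compiler_flags"):
    """Append -I<include_dir> (and optionally -m<mode>) to one key=value line.

    Assumption: the wrapper data is plain "key=value" lines and the flags
    belong on the line named by `key`; the real files may differ.
    """
    extra = ["-I" + include_dir]
    if mode is not None:
        extra.append("-m" + mode)

    # Match the whole "key=..." line; MULTILINE makes ^ and $ per-line.
    pattern = re.compile(r"^(%s=)(.*)$" % re.escape(key), re.MULTILINE)

    def repl(match):
        flags = match.group(2).split() + extra
        return match.group(1) + " ".join(flags)

    new_text, count = pattern.subn(repl, text)
    if count == 0:
        raise KeyError(key)
    return new_text
```

For example, feeding it a line "compiler_flags=-pthread" with
include_dir="/usr/include/openmpi/64" and mode="64" appends the extra
include directory and the mode flag to that one line and leaves every
other line untouched.  Because the result differs per arch, the edited
file can no longer live in an arch independent directory.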

> arch specific headers do belong in a subdir of %{_libdir}.  But most of
> the times just that file goes into %{_libdir}.  If you take a look at
> glib-devel, for instance, you have:
>   /usr/lib/glib/include/glibconfig.h
>   /usr/include/glib-1.2
>   /usr/include/glib-1.2/glib.h
>   /usr/include/glib-1.2/gmodule.h

I actually think that doing the above is worse than what I'm doing with
all of the openmpi include files in one place.  And when I *did* have
the openmpi includes inside %{_includedir}, I *still* kept all the
includes there and just created bit size specific include dirs under the
main openmpi include dir.  I find either of those alternatives superior
to the junk above.

> >> - man dirs are now under %{_libdir} instead of under %{_datadir}.  What
> >> broke by having these under %{_datadir}?
> > 
> > Multiple installs
> This shouldn't be the case.  Once again, the correct solution to this
> problem is including the version in the directory name.
> > and also if we put it under datadir, then we have to
> > fiddle with manpath when we set up the environment.  With them where
> > they are, the presence of %{_libdir}/%{mpidir}/bin in the exec path is
> > enough for man to track down the right man page automatically.
> > 
> And this should be something that environment modules takes care of.

Sorry, not convincing.  The openmpi package has unique requirements.  It
has assumptions about being in its own prefix coded into its actual
runtime operation.  And although openmpi might be able to do these
things, neither mvapich nor mvapich2 even allow installing their files in
anything other than their own private prefix.  So making all these
changes to openmpi wouldn't solve the issue on the other two and would
simply serve to fragment how people handle the various MPI
implementations, taking us from one standard to two.  And this all
because the people that put out the FHS decided that if you are an ISV
then you can put code into /opt under a private prefix, but if you are
the OS vendor, then even if the code really *is* highly optional and not
shipped by default and really *should* be in /opt with its own prefix,
you can't do it.  My response to that is that the FHS people were being
dumbasses and should have left us with a bit more flexibility to do the
right thing depending on circumstances.  In fact, if I were to do
*anything* with the openmpi package, it would be to go ahead and move it
under /opt in defiance of FHS because that's where it really needs to

Doug Ledford <dledford redhat com>
              GPG KeyID: CFBFF194

Infiniband specific RPMs available at
