
Re: Discussion: what would not blocking on btrfs look like?



On Wed, Aug 28, 2019 at 03:01:16PM -0400, Josh Boyer wrote:
> On Wed, Aug 28, 2019 at 2:40 PM Josef Bacik <josef toxicpanda com> wrote:
> >
> > On Wed, Aug 28, 2019 at 02:35:39PM -0400, Laura Abbott wrote:
> > > On 8/28/19 1:58 PM, Josef Bacik wrote:
> > > > On Tue, Aug 27, 2019 at 07:53:20AM -0400, Laura Abbott wrote:
> > > > > On 8/26/19 11:39 PM, Neal Gompa wrote:
> > > > > > On Mon, Aug 26, 2019 at 11:16 AM Laura Abbott <labbott redhat com> wrote:
> > > > > > >
> > > > > > > On 8/23/19 9:00 PM, Chris Murphy wrote:
> > > > > > > > On Fri, Aug 23, 2019 at 1:17 PM Adam Williamson
> > > > > > > > <adamwill fedoraproject org> wrote:
> > > > > > > >
> > > > > > > > > So, there was recently a Thing where btrfs installs were broken, and
> > > > > > > > > this got accepted as a release blocker:
> > > > > > > > >
> > > > > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1733388
> > > > > > > >
> > > > > > > > Summary: This bug was introduced and discovered in linux-next, it
> > > > > > > > started to affect Fedora 5.3.0-rc0 kernels in openqa tests, patch
> > > > > > > > appeared during rc1, and the patch was merged into 5.3.0-rc2. The bug
> > > > > > > > resulted in a somewhat transient deadlock which caused installs to
> > > > > > > > hang, but no corruption. The fix, 2 files changed, 12 insertions, 8
> > > > > > > > deletions (1/2 the insertions are comments).
> > > > > > > >
> > > > > > > > How remarkable or interesting is this bug? And in particular, exactly
> > > > > > > > how much faster should it have been fixed in order to avoid worrying
> > > > > > > > about it being a blocker bug?
> > > > > > > >
> > > > > > > > 7/25 14:27 utc bug patch was submitted to linux-btrfs@
> > > > > > > > 7/25 22:33 utc bug was first reported in Fedora bugzilla
> > > > > > > > 7/26 19:20 utc I confirmed upstream's patch related to this bug with
> > > > > > > > upstream and updated the Fedora bug
> > > > > > > > 7/26 22:50 utc I confirmed it was merged into rc2, and updated the Fedora bug
> > > > > > > >
> > > > > > > > So in the context of status quo, where Btrfs is presented as an option
> > > > > > > > in the installer and if there are bugs they are Beta blocking, how could
> > > > > > > > or should this have been fixed sooner? What about the handling should
> > > > > > > > have been different?
> > > > > > > >
> > > > > > >
> > > > > > > That's a fair question. This bug actually represents how this _should_
> > > > > > > work. The concern is that we haven't seen a lot of engagement
> > > > > > > in the past. Maybe today that has changed, as demonstrated by this thread.
> > > > > > > I'm still concerned about having this be a blocker vs. just keeping it
> > > > > > > as an option, simply because a blocker stops the entire release and it
> > > > > > > can be a last minute scramble to get things fixed. This was the ideal
> > > > > > > case for a blocker bug and I'm skeptical about all bugs going this well.
> > > > > > > If we had a few more people who were willing to be on the btrfs alias and
> > > > > > > do the work for blocker bugs it would be a much stronger case.
> > > > > > >
> > > > > >
> > > > > > Out of curiosity, how many such issues have we had in the past 2
> > > > > > years? I personally can't recall any monumental occasions where people
> > > > > > were scrambling over *Btrfs* in Fedora. If anything, we continue to
> > > > > > inherit the work that SUSE and Facebook are doing upstream as part of
> > > > > > us continually updating our kernels, which I'm grateful for.
> > > > > >
> > > > > > And in the instances where we've had such issues, has anyone reached
> > > > > > out to btrfs folks in Fedora? Chris and myself are the current ones,
> > > > > > but there have been others in the past. Both of us are subscribed to
> > > > > > the linux-btrfs mailing list, and Chris has a decent rapport with most
> > > > > > of the btrfs developers.
> > > > > >
> > > > > > What more do you want? Actual btrfs developers in Fedora? We don't
> > > > > > have any for the majority of filesystems Fedora supports, only XFS. Is
> > > > > > there some kind of problem with communicating with the upstream kernel
> > > > > > developers about Fedora bugs that I'm not aware of?
> > > > > >
> > > > >
> > > > > Again, it's about length of overall development. ext and XFS have
> > > > > a much longer history in general which is something that's important
> > > > > for file system stability in general. It's also a bit of a catch-22
> > > > > where the rate of btrfs use in Fedora is so low we don't actually
> > > > > see issues.
> > > > >
> > > > > > > > I note here that ext2 and ext3 are offered as file systems in
> > > > > > > > Custom/Advanced partitioning and in this sense have parity with Btrfs.
> > > > > > > > If this same bug occurred in ext2 or ext3 would or should that cause
> > > > > > > > discussion to drop them from the installer, even if the bug were fixed
> > > > > > > > within 24 hours of discovery and patch? What about vfat? That's
> > > > > > > > literally the only truly required filesystem that must work, for the
> > > > > > > > most commonly supported hardware so it can't be dropped, we'd just be
> > > > > > > > stuck until it got fixed. That work would have to be done upstream,
> > > > > > > > yes?
> > > > > > > >
> > > > > > >
> > > > > > > I don't think that's really a fair comparison. Just because options
> > > > > > > are presented doesn't mean all of them are equal. ext2/ext3 and vfat
> > > > > > > have been in development for much longer than btrfs and length of development
> > > > > > > is something that's particularly important for file system stability
> > > > > > > from talking with file system developers. It's not impossible for there
> > > > > > > to be bugs in ext4 for example (we've certainly seen them before) but
> > > > > > > btrfs is only now gaining overall stability and we're still more likely to see
> > > > > > > bugs, especially with custom setups where people are likely to find
> > > > > > > edge cases.
> > > > > > >
> > > > > >
> > > > > > Nope. We can totally use this because LVM has not existed as long (we
> > > > > > use LVM + filesystem by default, not plain partitions), and we still
> > > > > > encounter quirks with things like thinp LVM combined with these
> > > > > > filesystems. OverlayFS is mostly hot garbage (kernel people know it,
> > > > > > container people know it, filesystem people know it, etc.), and yet we
> > > > > > continue to try to use it in more places. Stratis is in an odd state
> > > > > > of limbo now, since its main developer and advocate left Red Hat.
> > > > > > > There are plenty of examples of Red Hat doing crazy/experimental
> > > > > > things... I'd like to think Red Hat isn't supposed to be special here,
> > > > > > but in this realm, it seems like it is...
> > > > > >
> > > > > >
> > > > >
> > > > > btrfs still doesn't give me the warm fuzzies and I also think this
> > > > > is a bigger issue than other features simply because user data is at
> > > > > stake. We do need to consider that the failure case is not "I can't do X"
> > > > > but "my precious data which I have been trying to snapshot is now
> > > > > inaccessible" in a way that's even worse than say rpm database
> > > > > corruption. Even if it is in the advanced partitioning or not the
> > > > > default, we can still end up with people clicking in because they
> > > > > read an article about how btrfs was the hot new thing.
> > > > >
> > > > > There are two parts to this here: killing off btrfs entirely and
> > > > > btrfs as release criteria. I think you are correct that there's
> > > > > enough community support to justify keeping btrfs around at least
> > > > > > in the kernel (I can't speak for anaconda here).
> > > > >
> > > > > As for btrfs as release criteria, I'd feel much more confident
> > > > > about that if we could have a file system developer on the btrfs
> > > > > alias. I'm glad to hear the btrfs upstream community has been
> > > > > receptive to bugs but it's still much easier to make things
> > > > > happen if we have contributors who are active in the Fedora
> > > > > community, especially if we want the advanced features that
> > > > > btrfs has (which is why people want it anyway). So, who would
> > > > > you suggest to work with us in Fedora?
> > > >
> > > > You can always CC me, if I get an email from you or anybody else I recognize
> > > > from the fedora kernel team I'm going to pay attention to it.
> > > >
> > > > Facebook runs more btrfs file systems than Fedora has installs, so we're pretty
> > > > happy with how it works stability wise.  That being said we're slightly more
> > > > fault tolerant than most users.  If you guys are hitting problems chances are
> > > > we'll hit them eventually as well, so it makes sense for us to be on top of
> > > > them.
> > > >
> > > > I agree it would be better if somebody inside Fedora was able to help out, but
> > > > again I'm only an email away.  Thanks,
> > > >
> > >
> > > So it appears you are on the btrfs alias already:
> > >
> > > fedora-kernel-btrfs: fs-maint redhat com,josef toxicpanda com,bugzilla colorremedies com
> > >
> > > This technically meets the requirements if you are willing to stay on this
> > > alias and (continue) to help with requests as needed. I would feel more
> > > confident if we had a few more people involved as well. Even better
> > > would be proactively going through the bugzillas to help find the
> > > btrfs ones.
> >
> > Yeah that goes into a bucket that basically is ignored.  The only time I'll peek
> > in there is if somebody specifically pokes me, because generally speaking we hit
> > the problems and fix them welllllll before Fedora users start to notice them.
> 
> Fedora chugs along at the rate of daily upstream Linus snapshots.  If
> you're hitting and fixing issues before Fedora users see them, I'm
> curious why Fedora users would ever see them.
> 
> Where does the lag come from?  Are the fixes queued internally?
> Staged in an upstream subsystem tree?  Is there a way for interested
> btrfs people to proactively just get those fixed in Fedora before
> users hit them?

For this particular example we saw the problem in testing and had a patch on the
mailing list before you hit the problem.  It was in a tree and sent to Linus, and
was merged the day after the bugzilla was reported.  So yes, before users see
them, unless they are subscribed to the daily snapshots, which I assume is just
for testing, right?  Or were you guys going to ship 5.3-rc0?

On one hand I understand all of the consternation around making btrfs bugs
blockers for Fedora, but on the other hand it seems a bit silly to be having
this conversation at all based on hitting a bug that went into the merge window
and then was fixed before rc1 was even cut.  Thanks,

Josef

