One of the funny things about research is that you never know what you're going to get. In fact, the uncertainty of research is not just unavoidable; it's desirable. Scientific breakthroughs like penicillin and even X-rays were the result of attentive scientists noticing something interesting while pursuing something else, then applying the same rigor to the new path that they would have applied to their original thesis.
If this teaches us anything beyond the virtue of attention to detail in research, surely it is that in planning and especially in funding research, we must pay as much attention to the researcher doing the work and the field they work in as to the specific question they propose to answer.
Unikernel Linux
What launched me down this line of thinking is our long-running Unikernel Linux (UKL) project, which had the original aim of producing a Linux-based unikernel. A unikernel is an application compiled into a single binary together with the operating system kernel that supports it, and the resulting program runs in the same privileged space as the kernel. There are, of course, security concerns with running an application this way, but major performance gains can be realized by bypassing the parts of the kernel the application does not need.
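To see where those gains come from, it helps to remember that every conventional system call traps across the user/kernel boundary. The microbenchmark below is my own illustration, not UKL code: it times a trivial system call in a tight loop. In a unikernel build, a request like this can be linked as an ordinary function call into the kernel, and the trap, along with its cost, disappears.

```c
/* Illustrative microbenchmark (not part of UKL): measure the average
 * round-trip cost of a trivial system call, i.e., the user/kernel
 * boundary crossing a unikernel can avoid. */
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    const long iters = 1000000;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++)
        syscall(SYS_getpid);            /* each call traps into the kernel */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (end.tv_sec - start.tv_sec) * 1e9
              + (end.tv_nsec - start.tv_nsec);
    printf("average syscall round trip: %.0f ns\n", ns / iters);
    return 0;
}
```

On typical hardware each crossing costs on the order of a few hundred nanoseconds; for an I/O-heavy workload making millions of such calls per second, removing that overhead is where unikernel-style gains come from.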
The research team at the Red Hat Collaboratory at Boston University has made significant progress on this front: UKL preserves Linux’s battle-tested codebase, community, and ecosystem of tools, applications, and hardware support while demonstrating performance gains of up to 26%. The next steps will be getting the project into the hands of more developers and working with commercial and individual partners to test their applications with UKL. Users with workloads requiring the highest performance and the lowest latencies stand to gain the most.
But there’s also been a twist: the discovery that building the application along with the kernel may be unnecessary. Instead, we may actually have discovered a way to write a user-space application that can effectively act as a device driver through controlled privilege escalation. This is not at all what we were looking for, but it may turn out to be substantially more useful. We've had similar happy accidents applying machine learning to compiler optimization, as well as in tuning dynamic systems for energy efficiency.
Testing for mission-critical systems
That said, embracing uncertainty doesn’t mean embracing aimlessness. As part of Red Hat Research’s partnership with Czech Technical University (CTU), we work with professor Miroslav Bureš, an expert in system testing and test automation. His work in testing unreliable systems, grounded in the long-time collaboration between Red Hat Czech and CTU, is finding applications with NATO troops as well as in commercial systems.
In a recent interview about his experiments simulating unreliable connections in large IoT installations to provide a testbed for system designers, he was asked what the near future holds for his work.
“Honestly, I cannot say,” he answered. “Research is an adventure. When we find something more useful, we will go for it.”
Focusing on usefulness is what led Miroslav from learning software testing on the job while working as a banking project manager to eventually founding the System Testing IntelLigent Lab (STILL) at CTU, where Red Hat shares lab space with faculty and students. One of the first initiatives we collaborated on with Miroslav was the PATRIOT project, which created a new test framework for IoT solutions—and that project, now complete, led to his current work in creating sensor network technology to help first responders and field doctors in defense settings do their jobs more safely and effectively.
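How do you test a system against connectivity you can't rely on? One common approach is to inject faults into the communication path itself. The sketch below is my own illustration of the idea, not code from PATRIOT or STILL, and every name and parameter in it is hypothetical. It wraps a real transmit function so a test harness can randomly drop or delay messages, approximating the flaky links of a large IoT deployment.

```c
/* Minimal fault-injection sketch: wrap a transmit function so tests can
 * simulate packet loss and jitter. Illustrative only; these names are
 * not from any Red Hat or CTU framework. */
#include <stdlib.h>
#include <unistd.h>

typedef int (*send_fn)(const void *buf, size_t len);

int flaky_send(send_fn real_send, const void *buf, size_t len,
               double loss_prob, unsigned max_delay_ms)
{
    if ((double)rand() / RAND_MAX < loss_prob)
        return 0;                                     /* drop: simulated loss */
    if (max_delay_ms > 0)
        usleep((rand() % (max_delay_ms + 1)) * 1000); /* simulated jitter */
    return real_send(buf, len);                       /* deliver normally */
}
```

Sweeping loss_prob from zero toward 0.5 while a protocol's test suite runs quickly shows which retry and acknowledgment paths the design actually exercises, which is exactly the kind of evidence a system headed for the field needs.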
It’s not easy to see a clear path that starts with the abstract mathematics of testing algorithms and ends at saving the life of a wounded soldier, but following the instincts of a good researcher isn’t a bad way to find it.
Generative AI and large language models
One last word on uncertainty. It’s hard to get more uncertain than the future of the generative AI systems that have taken the world by storm over the last year. Understanding both what they are and what they mean for open source development has become a Quadrant I (important and urgent) task for those of us in research.
To get a handle on the problem, I asked our AI lead in Red Hat Research, Sanjay Arora, to collaborate with leading Red Hat software licensing expert Richard Fontana on an article for the Red Hat Research Quarterly painting a comprehensive picture of what generative AI is, what it can actually be used for, whether it can be trusted (not in most cases, in my view), and whether it makes sense to talk about models being "open" or not.
Writing, as it turns out, is a lot like research. I didn’t know exactly where the story would go when we conceived it, and it went in a very different and much better direction from what I was expecting. I'm not a domain expert in AI like Sanjay, but these are my key technical takeaways:
- In the software industry, large language models (LLMs) provide ways to significantly improve production, performance, and customer service.
- Ready or not, some software companies will soon be using LLMs to gain a competitive advantage.
- That said, LLMs may not be quite ready for prime time, given their propensity to hallucinate output—that is, state “facts” that cannot be inferred from the training data.
- We will need to develop both training methods and hardware that are more efficient to bring down the high cost and energy consumption of training.
Richard also makes the important point that all the open source licenses in use today were developed before machine learning models were a topic of widespread commercial interest. The open source community is far from reaching consensus on how and whether to place limits on the use of source code for training data or even model weights. These debates cut right to the basic question of the definition and purpose of open source as a development model.
We can’t foresee where open source licenses or even copyright law more generally will go in response to ChatGPT and similar projects, but Sanjay and Richard’s final point is well taken: “Realizing the potential of Generative AI and LLMs…will depend on open source communities, industry, and AI/ML researchers working together in the open. The more roadblocks we set up, the slower the progress.”
I think that sums up our approach to research at Red Hat nicely: start with an open mind about where you’ll end up, bring in brilliant collaborators from both academia and industry, and then give them room to come up with answers to questions you hadn’t even thought to ask yet. It’s been working for us pretty well.
About the Red Hat Research Quarterly
Four times a year, the Red Hat Research Quarterly highlights the collaborations between Red Hat engineers and our university partners. The open source research featured in the magazine offers a glimpse of the most innovative and promising ideas we think will be shaping technology in the next 3-5+ years. Read more articles like this and subscribe to the magazine for free to stay up to date with open source research.
About the author
Hugh Brock is the Research Director for Red Hat, coordinating Red Hat research and collaboration with universities, governments, and industry worldwide. A Red Hatter since 2002, Hugh brings intimate knowledge of the complex relationship between upstream projects and shippable products to the task of finding research to bring into the open source world.