
A Language for the Web



About the episode

The Hypertext Markup Language (HTML) gave everyone a foundation for building and viewing the World Wide Web. In 1995, its standardization led to dominance. Its simplicity helped it spread. And its solid common foundation helped shape the internet.

Dr. Belinda Barnet explains what kind of framework was initially needed to build and navigate the web. Jeff Veen describes the three ingredients Tim Berners-Lee combined to create HTML: the ideal language for the web. Gavin Nicol recounts the need to standardize the quickly growing language. And Gretchen McCulloch points out how HTML instills an inherent bias toward English-speaking web developers.

Command Line Heroes Team Red Hat original show


Transcript

In Medieval Europe, scholars had to converse in... Latin. For centuries the British monarchs spoke... French. And today the business language of India is... English. Official languages have the power to unify people, but they don't always reflect everybody's lived experience. And when we look at not just a country, but a world wide web, that struggle to impose a standard language can grow to epic proportions. This season, we've been exploring a pivotal year in the history of tech: 1995. We already heard how it launched the dot-com bubble, and how it led to the privatization of the internet. But 1995 was also the year when HTML, the language of the web, was standardized. HTML's rapid evolution was crucial to the web's development and growth, but some basic assumptions about who a coder is and whose language mattered were locked into place at the same time. And once we began digging into HTML's past, we realized: a language can become a standard, but it can never be neutral.
I'm Saron Yitbarek and this is Command Line Heroes, an original podcast from Red Hat.
Today, hypertext markup language, HTML, is the mother tongue of the web. It's the standard markup language for pretty much everything you see in a browser, but right there in its name is a much older concept that predates HTML, and that's the idea of hypertext. Back in 1945, the engineer Vannevar Bush wrote an article for Life magazine, where he imagined a futuristic machine. This machine would allow you to display information on a screen, and that information could be retrieved from a microfiche storage device that sat under the desk. Bush was thinking this up decades before anything like the world wide web, but what he proposed was the start of something big. Belinda Barnet, Senior Lecturer in Media and Communications at Swinburne University of Technology, explains.
What was most interesting is that you could create links between pieces of information from different articles in order to create what he called a trail through information. And so this was really the first instance of a technical device that would create hyperlinks.
For years, Bush worked as the head of the U.S. Office of Scientific Research and Development. So even when he was delving into a bit of fantasy, he was still being influenced by academic practices, like that classic rule of academia where one person's work is constantly linked to other authorities. Bush imagined a machine that would make those links come to life, a machine that would work like an academic's mind, connecting to every other mind it had ever encountered. He called this imaginary machine the "Memex," combining the words memory and extender, and that's what it was: an extension of human memory, an extension of human thought. Bush's Memex inspired generations of computer scientists to pursue that Holy Grail of preserved and interlinked knowledge. But yeah, he still lived in the 1940s. He'd have to wait another decade until the Memex started coming to life. In the 1950s, Douglas Engelbart, who we've talked about on this podcast before, was inspired to build a system of links, a living network of linked information.
And he brought together this idea of using a computer screen to display knowledge and information and link it together in the manner of Vannevar Bush's trails, and created a system, which he eventually got funding for at the Stanford Research Institute, that was the first hypertext system.
Of course, he didn't call it hypertext yet. The word itself was coined a decade later, in the 1960s, by philosopher Ted Nelson. Though, as Barnet tells it, Nelson's version differed from Engelbart's in important ways.
Ted wanted something far more freeform, more like, as he put it, thought itself, which kind of meanders between things, and there are no restrictions to what you can connect to or at what level you can connect it. He had also imagined that hyperlinks would not be one-way, but that they would be two-way. But this basic concept of connecting together different pieces of information associatively and forming trails through the information was certainly evident in Ted's thinking in the 60s.
That distinction between one-way links and two-way links has pretty profound consequences. A web composed of two-way links would arguably create an entirely different online experience, and at that point in history, there was no obvious form that hypertext had to take. Our linked future was still being imagined. Douglas Engelbart's hypertext system, which was called the oN-Line System, was not especially user-friendly. Only the truly technical were able to use it. And for years, most early hypertext attempts had the same roadblock. But then along came a computer scientist named Tim Berners-Lee. While working as a contractor for CERN in 1980, he created a document-sharing program called Enquire. And nine years after that, he wrote a memo laying out a plan to use hypertext to take his work onto the global stage. The result? A Hypertext Markup Language that was unlike any hypertext system that had come before. Jeff Veen is a partner at True Ventures.
Tim Berners-Lee took three things that already existed and mashed them together in this brilliant way. He took a markup language, that's a way of marking up documents to give them structure. That existed. He took hypertext, which is a way of linking from one document to another. That existed. And he took networking so that these documents could be stored on different machines around the world. And that also existed. He took those three things and mashed them together and we got HTML. We got HTTP. And we got URLs. And the web was born.
But here's the thing. Berners-Lee wasn't just combining three existing concepts. He was also making crucial decisions about how they combined. And those decisions would echo through the decades. For example, remember how Belinda Barnet told us that Ted Nelson imagined bidirectional links? Well, Berners-Lee made a very different choice.
Tim Berners-Lee made the decision that links were one-way. That when you clicked on a link, you went somewhere else, but the place you went didn't necessarily know that you came from somewhere. And he made that decision specifically because he thought it would make the web easier to implement and manage. And he was right. It did make it easier, but it also meant that we lost some of the richness that bidirectional links would have provided.
That decision to go with one-way links instead of bidirectional links was all about practicality. It made the web easier to build and maintain. But it also shaped the kind of web we ended up with: a web where links break, where you can't easily see who's linking to you, where the fabric of connection is a bit more fragile than it might have been. Other decisions were more directly related to language. When Berners-Lee designed HTML, he used English words for the tags: head, body, title, paragraph. These weren't just arbitrary choices. They reflected the fact that Berners-Lee was working in an English-speaking environment, and he assumed that the people who would be using HTML would also be comfortable with English.
HTML was designed to be simple. And part of that simplicity came from using familiar English words for the tags. It made HTML easy to learn and easy to remember. But it also meant that HTML was inherently biased toward English speakers.
That bias toward English wasn't necessarily intentional, but it was definitely consequential. As the web grew, HTML's English-centric design became a barrier for non-English speakers who wanted to create web content.
By 1993, there were about 500 websites in the world. By 1994, that number had grown to 10,000. The web was exploding, and HTML was at the center of that explosion. But with that rapid growth came new challenges. Different browsers were implementing HTML in different ways, and there was no standard to ensure compatibility. Enter the World Wide Web Consortium, or W3C, founded by Tim Berners-Lee in 1994. The W3C's mission was to develop standards for the web, and one of their first priorities was standardizing HTML. The standardization of HTML was crucial for the web's growth. Without standards, different browsers would have implemented HTML in incompatible ways, and we would have ended up with a fragmented web where content worked on some browsers but not others. The process of standardizing HTML wasn't easy. There were competing visions for what the web should be, and different stakeholders had different priorities. Browser makers wanted features that would differentiate their products. Developers wanted simplicity and compatibility. And academics wanted rigor and completeness. The W3C had to balance all these competing interests while also trying to keep HTML simple enough that anyone could learn it. It was a delicate balancing act, and they didn't always get it right. But they managed to create a standard that was good enough to support the web's continued growth. HTML 2.0 was officially published in 1995, and it represented a major milestone in the web's evolution. For the first time, there was an official, standardized version of HTML that all browsers could implement consistently. But even as HTML was being standardized, there were growing concerns about its limitations. The original version of HTML was designed for simple documents with text and links. As the web grew, people wanted to do more complex things: they wanted to add images, tables, forms, and multimedia content. HTML was constantly playing catch-up with what people wanted to do on the web. 
Every time someone had a new idea for web content, HTML had to be extended to support it. It was a reactive process rather than a proactive one. This reactive approach to HTML's evolution created some interesting dynamics. Browser makers would often implement new features before they were officially standardized, creating a kind of de facto standard that the W3C would then have to formalize. The browser wars of the late 1990s were partly driven by this dynamic. Netscape and Microsoft were constantly trying to one-up each other by adding new HTML features. Some of these features were genuinely useful and eventually became part of the standard. Others were proprietary and created compatibility problems.
Despite these challenges, HTML's standardization in 1995 was a turning point. It provided a stable foundation that allowed the web to grow from thousands of sites to millions, and eventually billions. But there was another aspect of HTML's evolution that was equally important: its relationship with human language. As we mentioned earlier, HTML was designed with English words and English speakers in mind. But as the web became truly global, this English-centrism became a significant limitation.
Programming languages tend to be very biased toward English. And this is not just true of HTML; it's true of most programming languages. The keywords are in English, the documentation is in English, the community discussions are often in English. This creates a significant barrier for people whose first language is not English.
Gretchen McCulloch is an internet linguist and the author of "Because Internet: Understanding the New Rules of Language." She's studied how language and technology intersect, and she's particularly interested in how English dominance in programming affects global participation in tech.
The dominance of English in programming is partly historical accident and partly network effect.
English happened to be the language of the people who were developing the early internet and programming languages. And once that precedent was set, it became self-reinforcing. New programming languages were created in English because that's what people expected.
This English bias in HTML and other programming languages isn't just a minor inconvenience. It can be a significant barrier to entry for people whose first language isn't English. And it can also affect the kinds of content and applications that get created.
When you force people to code in a language that's not their native language, you're not just making it harder for them to learn programming. You're also potentially limiting their creativity and their ability to think about problems in ways that are natural to their own cultural and linguistic background.
This is where the story gets more complex and more interesting. Because while HTML's standardization in 1995 locked in certain English-centric assumptions, it also opened up new possibilities for international participation in the web. One of the people who recognized this challenge early on was Gavin Nicol. In the 1990s, Nicol was working on internationalization issues for web technologies. He saw firsthand how HTML's English-centrism was limiting global participation in the web.
When I was working in Japan in the 1990s, I could see how difficult it was for Japanese developers to work with web technologies. The tools were designed for English, the documentation was in English, and even the character encoding systems were biased toward Latin scripts.
Nicol became passionate about making web technologies more accessible to non-English speakers. He worked on developing international character encoding standards and pushed for better support for non-Latin scripts in HTML and other web technologies. It was partly practical, but it was also a matter of principle.
I believed that the web should be truly global, and that meant making it accessible to people regardless of their linguistic background. If we limited the web to English speakers, we would be cutting ourselves off from a huge portion of human creativity and knowledge.
Nicol's work on internationalization was crucial for the web's global expansion. By the late 1990s, web technologies had much better support for non-Latin scripts and international character sets. This made it possible for people around the world to create web content in their own languages.
But even with better support for international characters, the underlying programming languages and markup languages were still fundamentally English-centric. The tags in HTML were still English words; the programming keywords were still English words. We had solved the problem of displaying non-English text, but we hadn't solved the problem of non-English programming.
This distinction is important. There's a difference between being able to display content in multiple languages and being able to program in multiple languages. HTML gained the ability to display text in any language, but the HTML tags themselves remained in English. This raises some interesting questions about the nature of programming languages and their relationship to human languages. Should programming languages be culturally neutral? Is it possible to create programming languages that work equally well for speakers of different human languages? There have been some attempts to create programming languages that use non-English keywords. For example, there are versions of programming languages that use Chinese or Arabic keywords. But these haven't gained widespread adoption, partly because the global programming community has already converged on English-based languages. The network effects are powerful.
If most programmers are using English-based languages, then new programmers have an incentive to learn English-based languages so they can participate in the broader programming community. This creates a feedback loop that reinforces English dominance. But it's important to remember that this dominance isn't permanent or inevitable. Language dominance can shift over time. Latin was once the dominant language of scholarship in Europe, but that changed. English is dominant in programming now, but that could change too, especially as more programming is done in countries where English isn't the primary language.
This brings us back to the broader themes we've been exploring this season. 1995 was a year when many of the basic structures of the modern internet were established. The standardization of HTML was one of those foundational moments. But as we've seen throughout this season, these foundational moments often involved choices that seemed obvious at the time but had far-reaching consequences.
It was a mess, to be frank.
Nicol was the one who looked at HTML, which at the time had no real character-processing model, and decided to find a way to let everybody use it. His solution was to use Unicode, a standard that handles text in almost all the world's languages. By adopting Unicode as HTML's document character set, he managed to establish a formal model for the internationalization of HTML.
Partially it was practical, but also there was a mission aspect to it as well. You know, the practicality came from the fact that I was working at NIC while living in Japan. I was like, "Hey, listen, you know, I would love to be part of this global conversation." But then also I still believe there's an under-representation of non-Latin voices in the global commentary. So I really wanted to accelerate the pace of that.
Despite the great work that the W3C folks had done in 1995, it was up to people like Nicol to take the HTML standard and open it up to all the world's languages.
And it became kind of a mission for me to make that easy for non-native speakers, because I firmly believe that if you force everyone to speak in English, you force everyone to sort of think in English, and that's a very sad thing, because you end up losing the culture that is associated with the language.
Nicol believes people must be allowed to communicate and work in their own language. To make the web an English-only zone would mean cutting off part of our shared humanity.
There's a thing called Conway's law, and it's kind of like: systems tend to evolve to represent the organizational structures that created them.
To me, Conway's law is a kind of warning. Make sure the organizational structures represent all of us, or else don't be surprised when the systems that evolve lock some people out. 1995 was a quarter-century ago, and HTML has evolved to HTML5 today, but the work is far from finished. Look around at the coding landscape and English is still taking up a lot of the oxygen. Sometimes it can feel like a foregone conclusion. Pascal, for example, was created by a Swiss computer scientist who made it in English to appeal to the rest of the world. Python, same thing: written in English by a creator in the Netherlands. And Ruby uses English too, even though it was made in Japan. Here's Gretchen McCulloch again.
So if you're a non-native English speaker and you're thinking, oh, I want my programming language to be adopted by the greatest number of people, you might say, well, I know that people are used to coding in English-based things. I know that I've gotten used to coding in English-based programming languages, so I'm also going to create my programming language that has English-based keywords, because that's what people are used to.
It's a feedback loop and not a great one.
One of the things that I think we could do as a short-term way of calling attention to the problem is when we talk about programming languages where the keywords are based on English, we could call them that. The first website wasn't written in HTML; it was written in English HTML, which opens up the possibility of what would Spanish HTML look like? What would Russian HTML look like?
What does your HTML look like? How will you get to program on the web, and how will you make sure everyone else can do the same? I mentioned at the top of this episode that, in Medieval Europe, reading and writing meant working in Latin, even if you didn't speak Latin every day. Only Latin was allowed as a tool for accessing the technology of writing, the technology of the printing press. Today, we look back on that and it makes little sense, but how different are we really? Isn't it just as ridiculous to expect everyone to code in English?
And by the way, all you English-speaking coders out there, one day the shoe could be on the other foot. I don't think it's likely in the short term for a programming language based on a language other than English to become dominant, but it's entirely possible in the long term, because we know that Latin didn't last forever as the lingua franca.
1995 was the year that HTML became standardized, but that moment in tech history sparked a decades-long discussion that's continuing to this day. We're still finding ways to make the language of the web a platform for everybody. And this matters because we have no way of knowing what people from different backgrounds, different languages, will build; what apps they'll design; what code they might write once they're given the chance to work with their own voice. We just might be amazed by our own diversity, and maybe that is the standard we should all be reaching for. Next time, we're diving into another of 1995's biggest transformations, the fantastic emergence of web designers.
Until then find bonus material about HTML and all our 1995 stories over at redhat.com/commandlineheroes. I'm Saron Yitbarek and this is Command Line Heroes, an original podcast from Red Hat. Keep on coding.

About the show

Command Line Heroes

During its run from 2018 to 2022, Command Line Heroes shared the epic true stories of developers, programmers, hackers, geeks, and open source rebels, and how they revolutionized the technology landscape. Relive our journey through tech history, and use #CommandLinePod to share your favorite episodes.