10 YAML tips for people who hate YAML

Do you hate YAML? These tips might ease your pain.

There are lots of formats for configuration files: a list of values, key and value pairs, INI files, YAML, JSON, XML, and many more. Of these, YAML sometimes gets cited as a particularly difficult one to handle for a few different reasons. While its ability to reflect hierarchical values is significant and its minimalism can be refreshing to some, its Python-like reliance upon syntactic whitespace can be frustrating.

However, the open source world is diverse and flexible enough that no one has to suffer through abrasive technology, so if you hate YAML, here are 10 things you can (and should!) do to make it tolerable. Starting with zero, as any sensible index should.

0. Make your editor do the work

Whatever text editor you use probably has plugins to make dealing with syntax easier. If you're not using a YAML plugin for your editor, find one and install it. The effort you spend on finding a plugin and configuring it as needed will pay off tenfold the very next time you edit YAML.

For example, the Atom editor comes with a YAML mode by default, and while GNU Emacs ships with minimal support, you can add additional packages like yaml-mode to help.

Emacs in YAML and whitespace mode

If your favorite text editor lacks a YAML mode, you can address some of your grievances with small configuration changes. For instance, the default text editor for the GNOME desktop, Gedit, doesn't have a YAML mode available, but it does provide YAML syntax highlighting by default and features configurable tab width:

Configuring tab width and type in Gedit

With the drawspaces Gedit plugin package, you can make white space visible in the form of leading dots, removing any question about levels of indentation.

Take some time to research your favorite text editor. Find out what the editor, or its community, does to make YAML easier, and leverage those features in your work. You won't be sorry.

1. Use a linter

Ideally, programming languages and markup languages use predictable syntax. Computers tend to do well with predictability, so the concept of a linter was invented in 1978. If you're not using a linter for YAML, then it's time to adopt this 40-year-old tradition and use yamllint.

You can install yamllint on Linux using your distribution's package manager. For instance, on Red Hat Enterprise Linux 8 or Fedora:

$ sudo dnf install yamllint

Invoking yamllint is as simple as telling it to check a file. Here's an example of yamllint's response to a YAML file containing an error:

$ yamllint errorprone.yaml
errorprone.yaml
23:10     error    syntax error: mapping values are not allowed here
23:11     error    trailing spaces  (trailing-spaces)

That's not a time stamp on the left. It's the error's line and column number. You may or may not understand what error it's talking about, but now you know the error's location. Taking a second look at the location often makes the error's nature obvious.

Success is eerily silent, so if you want feedback based on the lint's success, you can add a conditional second command with a double-ampersand (&&). In a POSIX shell, the command after && runs only if the command before it returns 0, so upon success, your echo command makes that clear. This tactic is somewhat superficial, but some users prefer the assurance that the command did run correctly, rather than failing silently. Here's an example:

$ yamllint perfect.yaml && echo "OK"
OK

The reason yamllint is so silent when it succeeds is that it has nothing to report: when there are no errors, it prints nothing and exits with a status of 0.
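If you'd rather script that check than type && every time, the same exit-status logic can be sketched in Python. This is only a minimal sketch, not part of yamllint: the function name lint_ok and its command parameter are made up for illustration, and it assumes yamllint is installed and on your path.

```python
#!/usr/bin/python3
import subprocess


def lint_ok(path, command=("yamllint",)):
    """Return True when the lint command exits with status 0."""
    # command is a hypothetical parameter, so a different linter
    # (or a stand-in while testing) can be substituted.
    result = subprocess.run([*command, path])
    return result.returncode == 0
```

Calling lint_ok("perfect.yaml") should then return True for a clean file and False for one with errors, while yamllint's own messages still appear on the terminal.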

2. Write in Python, not YAML

If you really hate YAML, stop writing in YAML, at least in the literal sense. You might be stuck with YAML because that's the only format an application accepts, but if the only requirement is to end up in YAML, then work in something else and then convert. Python, along with the excellent pyyaml library, makes this easy, and you have two methods to choose from: self-conversion or scripted.

Self-conversion

In the self-conversion method, your data files are also Python scripts that produce YAML. This works best for small data sets. Just write your JSON data into a Python variable, prepend an import statement, and end the file with a simple three-line output statement.

#!/usr/bin/python3
import yaml

d = {
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

# Write the data out as YAML
f = open('output.yaml', 'w')
f.write(yaml.dump(d))
f.close()

Run the file with Python to produce a file called output.yaml.

$ python3 ./example.json
$ cat output.yaml
glossary:
  GlossDiv:
    GlossList:
      GlossEntry:
        Abbrev: ISO 8879:1986
        Acronym: SGML
        GlossDef:
          GlossSeeAlso: [GML, XML]
          para: A meta-markup language, used to create markup languages such as DocBook.
        GlossSee: markup
        GlossTerm: Standard Generalized Markup Language
        ID: SGML
        SortAs: SGML
    title: S
  title: example glossary

This output is perfectly valid YAML, although yamllint does issue a warning that the file is not prefaced with ---, which is something you can adjust either in the Python script or manually.
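One way to handle that warning in the script itself is PyYAML's explicit_start option, which asks yaml.dump to write the --- marker for you. Here's a minimal sketch, using a small stand-in dictionary instead of the full glossary:

```python
#!/usr/bin/python3
import yaml

# A small stand-in for the glossary data used above.
d = {"glossary": {"title": "example glossary"}}

# explicit_start=True asks PyYAML to begin the document with '---'.
text = yaml.dump(d, explicit_start=True)
print(text)
```

The rest of the output is unchanged, so the same option can be added to the dump call in either conversion method.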

Scripted conversion

In this method, you write in JSON and then run a Python conversion script to produce YAML. This scales better than self-conversion, because it keeps the converter separate from the data.

Create a JSON file and save it as example.json. Here is an example from json.org:

{
  "glossary": {
    "title": "example glossary",
    "GlossDiv": {
      "title": "S",
      "GlossList": {
        "GlossEntry": {
          "ID": "SGML",
          "SortAs": "SGML",
          "GlossTerm": "Standard Generalized Markup Language",
          "Acronym": "SGML",
          "Abbrev": "ISO 8879:1986",
          "GlossDef": {
            "para": "A meta-markup language, used to create markup languages such as DocBook.",
            "GlossSeeAlso": ["GML", "XML"]
          },
          "GlossSee": "markup"
        }
      }
    }
  }
}

Create a simple converter and save it as json2yaml.py. This script imports both the YAML and JSON Python modules, loads a JSON file defined by the user, performs the conversion, and then writes the data to output.yaml.

#!/usr/bin/python3
import json
import sys

import yaml

# Load the JSON file named on the command line
with open(sys.argv[1], 'r') as infile:
    data = json.load(infile)

# Write the same data out as YAML
with open('output.yaml', 'w') as outfile:
    yaml.dump(data, outfile)

Save this script somewhere in your path, make it executable, and run it as needed:

$ ~/bin/json2yaml.py example.json

3. Parse early, parse often

Sometimes it helps to look at a problem from a different angle. If your problem is YAML, and you're having a difficult time visualizing the data's relationships, you might find it useful to restructure that data, temporarily, into something you're more familiar with.

If you're more comfortable with dictionary-style lists or JSON, for instance, you can load YAML into a Python dictionary with a few commands in an interactive Python shell. Assume your YAML file is called mydata.yaml.

$ python3
>>> import yaml
>>> f=open('mydata.yaml','r')
>>> yaml.safe_load(f)
{'document': 34843, 'date': datetime.date(2019, 5, 23), 'bill-to': {'given': 'Seth', 'family': 'Kenlon', 'address': {'street': '51b Mornington Road\n', 'city': 'Brooklyn', 'state': 'Wellington', 'postal': 6021, 'country': 'NZ'}}, 'words': 938, 'comments': 'Good article. Could be better.'}
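The result above is a Python dictionary, not JSON text. If you want actual JSON, json.dumps gets you most of the way there, with one catch: the parsed date is a datetime.date object, which the json module can't serialize on its own. Here's a minimal sketch, using a trimmed copy of the dictionary above in place of the parsed file, and assuming it's acceptable to render dates as plain strings:

```python
#!/usr/bin/python3
import datetime
import json

# A trimmed copy of the parsed data shown above.
data = {
    "document": 34843,
    "date": datetime.date(2019, 5, 23),
    "bill-to": {"given": "Seth", "family": "Kenlon"},
    "words": 938,
}

# default=str is called for any value json can't serialize by itself,
# so the date is written out as the string "2019-05-23".
as_json = json.dumps(data, indent=2, default=str)
print(as_json)
```

Without the default argument, json.dumps raises a TypeError as soon as it reaches the date.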

There are many other examples, and there are plenty of online converters and local parsers, so don't hesitate to reformat data when it starts to look more like a laundry list than markup.

4. Read the spec

After I've been away from YAML for a while and find myself using it again, I go straight back to yaml.org to re-read the spec. If you've never read the specification for YAML and you find YAML confusing, a glance at the spec may provide the clarification you never knew you needed. The specification is surprisingly easy to read, with the requirements for valid YAML spelled out with lots of examples in chapter 6.

5. Pseudo-config

Before I started writing my book, Developing Games on the Raspberry Pi, Apress, 2019, the publisher asked me for an outline. You'd think an outline would be easy. By definition, it's just the titles of chapters and sections, with no real content. And yet, out of the 300 pages published, the hardest part to write was that initial outline.

YAML can be the same way. You may have a notion of the data you need to record, but that doesn't mean you fully understand how it's all related. So before you sit down to write YAML, try doing a pseudo-config instead.

A pseudo-config is like pseudo-code. You don't have to worry about structure or indentation, parent-child relationships, inheritance, or nesting. You just create iterations of data in the way you currently understand it inside your head.

A pseudo-config.

Once you've got your pseudo-config down on paper, study it, and transform your results into valid YAML.

6. Resolve the spaces vs. tabs debate

OK, maybe you won't definitively resolve the spaces-vs-tabs debate, but you should at least resolve the debate within your project or organization. Whether you resolve this question with a post-process sed script, text editor configuration, or a blood-oath to respect your linter's results, anyone in your team who touches a YAML project must agree to use spaces (in accordance with the YAML spec).

Any good text editor allows you to define a number of spaces instead of a tab character, so the choice shouldn't negatively affect fans of the Tab key.

Tabs and spaces are, as you probably know all too well, essentially invisible. And when something is out of sight, it rarely comes to mind until the bitter end, when you've tested and eliminated all of the "obvious" problems. An hour wasted to an errant tab or group of spaces is your signal to create a policy to use one or the other, and then to develop a fail-safe check for compliance (such as a Git hook to enforce linting).
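As one possible shape for such a compliance check, here's a minimal sketch that flags tab characters in a line's leading white space. The function name find_leading_tabs is made up for this example, and a real Git hook would more likely just call your linter; this only illustrates the idea.

```python
#!/usr/bin/python3

def find_leading_tabs(text):
    """Return the 1-based numbers of lines indented with a tab."""
    bad_lines = []
    for number, line in enumerate(text.splitlines(), start=1):
        # The indent is everything before the first non-blank character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


# Line 2 of this sample is indented with a tab; line 3 uses spaces.
sample = "key:\n\tchild: value\n  other: value\n"
print(find_leading_tabs(sample))  # prints [2]
```

A hook built on this would read each staged YAML file, call the function, and refuse the commit if the returned list isn't empty.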

7. Less is more (or more is less)

Some people like to write YAML to emphasize its structure. They indent vigorously to help themselves visualize chunks of data. It's a sort of cheat to mimic markup languages that have explicit delimiters.

Here's a good example from Ansible's documentation:

# Employee records
-  martin:
        name: Martin D'vloper
        job: Developer
        skills:
            - python
            - perl
            - pascal
-  tabitha:
        name: Tabitha Bitumen
        job: Developer
        skills:
            - lisp
            - fortran
            - erlang

For some users, this approach is a helpful way to lay out a YAML document, while other users miss the structure for the void of seemingly gratuitous white space.

If you own and maintain a YAML document, then you get to define what "indentation" means. If blocks of horizontal white space distract you, then use the minimal amount of white space required by the YAML spec. For example, the same YAML from the Ansible documentation can be represented with fewer indents without losing any of its validity or meaning:

---
- martin:
   name: Martin D'vloper
   job: Developer
   skills:
   - python
   - perl
   - pascal
- tabitha:
   name: Tabitha Bitumen
   job: Developer
   skills:
   - lisp
   - fortran
   - erlang
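If you're ever unsure whether trimming indentation changed a document's meaning, you can ask a parser. Here's a minimal sketch, assuming PyYAML is installed and using shortened versions of the two layouts above:

```python
#!/usr/bin/python3
import yaml

# The generously indented style.
roomy = """
-  martin:
        name: Martin D'vloper
        skills:
            - python
            - perl
"""

# The same record with minimal indentation.
compact = """
- martin:
   name: Martin D'vloper
   skills:
   - python
   - perl
"""

# Both layouts should load into identical Python data.
same = yaml.safe_load(roomy) == yaml.safe_load(compact)
print(same)
```

If the comparison ever comes back False, the reformatting changed the structure, not just the look.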

8. Make a recipe

I'm a big fan of repetition breeding familiarity, but sometimes repetition just breeds repeated stupid mistakes. Luckily, a clever peasant woman experienced this very phenomenon back in 396 AD (don't fact-check me), and invented the concept of the recipe.

If you find yourself making YAML document mistakes over and over, you can embed a recipe or template in the YAML file as a commented section. When you're adding a section, copy the commented recipe and overwrite the dummy data with your new real data. For example:

---
# - <common name>:
#    name: Given Surname
#    job: JOB
#    skills:
#    - LANG
- martin:
   name: Martin D'vloper
   job: Developer
   skills:
   - python
   - perl
   - pascal
- tabitha:
   name: Tabitha Bitumen
   job: Developer
   skills:
   - lisp
   - fortran
   - erlang

9. Use something else

I'm a fan of YAML, generally, but sometimes YAML isn't the answer. If you're not locked into YAML by the application you're using, then you might be better served by some other configuration format. Sometimes config files outgrow themselves and are better refactored into simple Lua or Python scripts.
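As a rough illustration of what that refactoring can look like, here's the employee data from earlier expressed as a small Python script. The layout and the helper name developer are invented for this sketch; the point is only that a script can factor out repetition that a static file has to spell out.

```python
#!/usr/bin/python3

def developer(name, skills):
    """Build one employee record; every record here shares the same job."""
    return {"name": name, "job": "Developer", "skills": list(skills)}


employees = {
    "martin": developer("Martin D'vloper", ["python", "perl", "pascal"]),
    "tabitha": developer("Tabitha Bitumen", ["lisp", "fortran", "erlang"]),
}

print(employees["tabitha"]["skills"])
```

Whether this is an improvement depends on who edits the file: a script is more flexible, but it also asks every contributor to read and write Python.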

YAML is a great tool and is popular among users for its minimalism and simplicity, but it's not the only tool in your kit. Sometimes it's best to part ways. One of the benefits of YAML is that parsing libraries are common, so as long as you provide migration options, your users should be able to adapt painlessly.

If YAML is a requirement, though, keep these tips in mind and conquer your YAML hatred once and for all!


Seth Kenlon

Seth Kenlon is a UNIX geek and free software enthusiast.
