
Osquery is an open source, cross-platform tool that lets you obtain information about your system using SQL queries. My previous article explained how to use Osquery to query data about a system interactively. Running a query when needed is an excellent way to become comfortable with Osquery's SQL language, and it provides a convenient method for quick data collection on a system.


However, the real power of Osquery is its ability to scrape data about a system regularly. Osquery can run as a daemon and execute scheduled queries, allowing you to collect and process data on a regular cadence and respond to changes in the state of your systems.

Run a basic scheduled query

Setting up a basic scheduled query involves adding the query to Osquery's configuration file and starting the Osquery daemon. The default configuration file is located at /etc/osquery/osquery.conf, although you can change this by passing flags to the service.

The configuration is a JSON object that specifies certain global options and defines a schedule of queries to execute. The example file below will run a query every five seconds to obtain the user ID (UID), username, and shell for any users with a UID greater than or equal to 1000:

[root@fedora ~]# cat /etc/osquery/osquery.conf 
{
  "options": {
    "host_identifier": "hostname"
  },
  "schedule": {
    "users": {
      "query": "SELECT uid,username,shell FROM users WHERE uid >= 1000;",
      "interval": 5
    }
  }
}
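
Because the configuration is plain JSON, a short script can sanity-check it before you start the daemon. The following is a minimal sketch in Python, not part of Osquery itself: the helper name summarize_schedule is hypothetical, and it checks only the two keys used in this article ("query" and "interval"). The configuration is embedded as a string to keep the example self-contained.

```python
import json

# Same configuration as shown above, embedded as a string so the sketch is
# self-contained. In practice you would read /etc/osquery/osquery.conf.
CONFIG_TEXT = """
{
  "options": {
    "host_identifier": "hostname"
  },
  "schedule": {
    "users": {
      "query": "SELECT uid,username,shell FROM users WHERE uid >= 1000;",
      "interval": 5
    }
  }
}
"""


def summarize_schedule(config_text):
    """Return (name, interval, query) tuples for each scheduled query.

    Hypothetical helper: raises ValueError if the text is not valid JSON or
    if an entry lacks the "query" or "interval" keys used in this article.
    """
    config = json.loads(config_text)  # JSONDecodeError is a ValueError
    summary = []
    for name, entry in config.get("schedule", {}).items():
        if "query" not in entry or "interval" not in entry:
            raise ValueError(f"scheduled query {name!r} needs 'query' and 'interval'")
        summary.append((name, int(entry["interval"]), entry["query"]))
    return summary


if __name__ == "__main__":
    for name, interval, query in summarize_schedule(CONFIG_TEXT):
        print(f"{name}: every {interval}s -> {query}")
```

A malformed file (for example, a trailing comma) fails in json.loads before the daemon ever sees it, which is usually easier to debug than a service that refuses to start.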

Once the configuration file is in place, you can use the osqueryctl command to start, restart, or stop the Osquery daemon:

[root@fedora ~]# osqueryctl start

The daemon will start and begin executing the scheduled queries. By default, Osquery logs to the filesystem, writing JSON output to a log file located at /var/log/osquery/osqueryd.results.log. Each log entry contains metadata, such as the query execution time, along with the columns of data that the query returned:

[root@fedora ~]# cat /var/log/osquery/osqueryd.results.log | jq
{
  "name": "users",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct  4 16:40:10 2022 UTC",
  "unixTime": 1664901610,
  "epoch": 0,
  "counter": 0,
  "numerics": false,
  "columns": {
    "shell": "/sbin/nologin",
    "uid": "65534",
    "username": "nobody"
  },
  "action": "added"
}
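
Osquery writes each result to the log as a single-line JSON object (jq pretty-prints it above), so an entry is easy to parse in a script. The sketch below is one hedged way to do that in Python; the function name describe_result is hypothetical, and the sample line is the entry shown above collapsed onto one line.

```python
import json
from datetime import datetime, timezone

# One line from osqueryd.results.log, copied from the output above.
SAMPLE_LINE = (
    '{"name":"users","hostIdentifier":"fedora",'
    '"calendarTime":"Tue Oct  4 16:40:10 2022 UTC","unixTime":1664901610,'
    '"epoch":0,"counter":0,"numerics":false,'
    '"columns":{"shell":"/sbin/nologin","uid":"65534","username":"nobody"},'
    '"action":"added"}'
)


def describe_result(line):
    """Turn one differential log line into a short human-readable string."""
    entry = json.loads(line)
    # unixTime is seconds since the epoch; convert it to an aware UTC datetime.
    when = datetime.fromtimestamp(entry["unixTime"], tz=timezone.utc)
    columns = ", ".join(
        f"{key}={value}" for key, value in sorted(entry["columns"].items())
    )
    return f'{when.isoformat()} {entry["name"]} {entry["action"]}: {columns}'


print(describe_result(SAMPLE_LINE))
```

Using unixTime rather than parsing calendarTime avoids depending on the human-readable date format.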


By default, a scheduled query reports a differential: only the rows that changed since the previous run. The "action" field in the JSON log indicates whether a row was added or removed since the query last ran. You can see this in action by adding two users to the system and looking at the query results:

[root@fedora ~]# useradd testuser
[root@fedora ~]# useradd testuser2

The query results show the addition of two users. The action is added, indicating that these rows of data have been added since the last query:

[root@fedora ~]# cat /var/log/osquery/osqueryd.results.log | jq
{
  "name": "users",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct  4 16:40:10 2022 UTC",
  "unixTime": 1664901610,
  "epoch": 0,
  "counter": 0,
  "numerics": false,
  "columns": {
    "shell": "/sbin/nologin",
    "uid": "65534",
    "username": "nobody"
  },
  "action": "added"
}
{
  "name": "users",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct  4 16:41:35 2022 UTC",
  "unixTime": 1664901695,
  "epoch": 0,
  "counter": 1,
  "numerics": false,
  "columns": {
    "shell": "/bin/bash",
    "uid": "1000",
    "username": "testuser"
  },
  "action": "added"
}
{
  "name": "users",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct  4 16:41:40 2022 UTC",
  "unixTime": 1664901700,
  "epoch": 0,
  "counter": 2,
  "numerics": false,
  "columns": {
    "shell": "/bin/bash",
    "uid": "1001",
    "username": "testuser2"
  },
  "action": "added"
}

Finally, you can delete the users from the system, and Osquery will report them as removed:

[root@fedora ~]# userdel -r testuser
[root@fedora ~]# userdel -r testuser2

The next set of query results shows that the rows (and therefore the users) have been removed since the last run of the query:

[root@fedora ~]# tail -n 2 /var/log/osquery/osqueryd.results.log | jq
{
  "name": "users",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct  4 16:47:10 2022 UTC",
  "unixTime": 1664902030,
  "epoch": 0,
  "counter": 3,
  "numerics": false,
  "columns": {
    "shell": "/bin/bash",
    "uid": "1000",
    "username": "testuser"
  },
  "action": "removed"
}
{
  "name": "users",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct  4 16:47:15 2022 UTC",
  "unixTime": 1664902035,
  "epoch": 0,
  "counter": 4,
  "numerics": false,
  "columns": {
    "shell": "/bin/bash",
    "uid": "1001",
    "username": "testuser2"
  },
  "action": "removed"
}

This differential approach allows you to build tooling that monitors Osquery logs and reports on system state changes, which is useful for observability. Being able to answer questions about what changed on a system, and when, is important for security, incident response, troubleshooting, and outage response.
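
As a rough illustration of such tooling, the sketch below replays "added" and "removed" entries to reconstruct the current set of rows for one query. It is a minimal example under stated assumptions, not an Osquery feature: the function name replay is hypothetical, and the sample entries are taken from the output above but trimmed to the fields the sketch needs.

```python
import json

# Differential entries from the article's output, reduced to the fields
# this sketch needs. Real log lines carry the full metadata.
LOG_LINES = [
    '{"name":"users","columns":{"shell":"/sbin/nologin","uid":"65534","username":"nobody"},"action":"added"}',
    '{"name":"users","columns":{"shell":"/bin/bash","uid":"1000","username":"testuser"},"action":"added"}',
    '{"name":"users","columns":{"shell":"/bin/bash","uid":"1001","username":"testuser2"},"action":"added"}',
    '{"name":"users","columns":{"shell":"/bin/bash","uid":"1000","username":"testuser"},"action":"removed"}',
]


def replay(lines, query_name):
    """Rebuild the current row set for one query from its added/removed log."""
    rows = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("name") != query_name:
            continue
        # Freeze the row so it can be stored in a set and matched on removal.
        row = tuple(sorted(entry["columns"].items()))
        if entry["action"] == "added":
            rows.add(row)
        elif entry["action"] == "removed":
            rows.discard(row)
    return rows


current = replay(LOG_LINES, "users")
# After replaying the four entries, only nobody and testuser2 remain.
print(sorted(dict(row)["username"] for row in current))
```

To run this against a live system, you could pass the lines of /var/log/osquery/osqueryd.results.log instead of the embedded sample.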

Scheduled snapshots

A differential query doesn't always make sense for a particular dataset. Sometimes, you need a complete point-in-time view of the data that a query returns. For example, a query to monitor memory utilization should return a complete picture each time it runs. Osquery supports this type of scheduled query with the "snapshot" parameter in the query's configuration.

The configuration below schedules an additional query to run every 15 seconds and collect data about memory utilization on the system:

[root@fedora ~]# cat /etc/osquery/osquery.conf
{
  "options": {
    "host_identifier": "hostname"
  },
  "schedule": {
    "users": {
      "query": "SELECT uid,username,shell FROM users WHERE uid >= 1000;",
      "interval": 5
    },
    "memory_info": {
      "query": "SELECT memory_total,memory_free,memory_available,buffers FROM memory_info;",
      "interval": 15,
      "snapshot": true
    }
  }
}

Once you modify the configuration, you must restart Osquery for the changes to take effect:

[root@fedora ~]# osqueryctl restart

Snapshot results are saved as JSON to a different log file located at /var/log/osquery/osqueryd.snapshots.log. Each JSON object in this log file contains a complete picture of the query results at the time the query ran: the rows appear in a "snapshot" array, and the "action" field is set to "snapshot" rather than "added" or "removed":

[root@fedora ~]# cat /var/log/osquery/osqueryd.snapshots.log | jq
{
  "snapshot": [
    {
      "buffers": "1708032",
      "memory_available": "1471823872",
      "memory_free": "855764992",
      "memory_total": "2066640896"
    }
  ],
  "action": "snapshot",
  "name": "memory_info",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct  4 17:12:28 2022 UTC",
  "unixTime": 1664903548,
  "epoch": 0,
  "counter": 0,
  "numerics": false
}
{
  "snapshot": [
    {
      "buffers": "1708032",
      "memory_available": "1471827968",
      "memory_free": "855764992",
      "memory_total": "2066640896"
    }
  ],
  "action": "snapshot",
  "name": "memory_info",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct  4 17:12:42 2022 UTC",
  "unixTime": 1664903562,
  "epoch": 0,
  "counter": 0,
  "numerics": false
}

Snapshots allow you to collect a complete view of data at a specific point in time, and they are excellent for queries where a differential view of data doesn't make sense.
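
Because every snapshot entry is self-contained, you can compute a metric from a single log line. The sketch below is a minimal Python example that derives a memory-in-use percentage from the first snapshot shown above. The function name memory_used_percent is hypothetical, and defining "used" as memory_total minus memory_available is an assumption made for illustration. The values are converted with int() because they appear as strings in the log.

```python
import json

# First entry from osqueryd.snapshots.log above, collapsed onto one line.
SNAPSHOT_LINE = (
    '{"snapshot":[{"buffers":"1708032","memory_available":"1471823872",'
    '"memory_free":"855764992","memory_total":"2066640896"}],'
    '"action":"snapshot","name":"memory_info","hostIdentifier":"fedora",'
    '"calendarTime":"Tue Oct  4 17:12:28 2022 UTC","unixTime":1664903548,'
    '"epoch":0,"counter":0,"numerics":false}'
)


def memory_used_percent(line):
    """Return the percentage of memory in use for one memory_info snapshot.

    Assumption for this sketch: "used" means total minus available.
    """
    entry = json.loads(line)
    if entry.get("action") != "snapshot":
        raise ValueError("not a snapshot entry")
    row = entry["snapshot"][0]  # memory_info returns a single row
    total = int(row["memory_total"])
    available = int(row["memory_available"])
    return 100.0 * (total - available) / total


print(f"{memory_used_percent(SNAPSHOT_LINE):.1f}% of memory in use")
```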

Wrap up

In this article, you extended your Osquery knowledge to build scheduled queries that regularly collect data about a system. You learned how to run Osquery as a daemon and saw how scheduled queries can provide either a differential view or a point-in-time snapshot of the system state.

Osquery is a very powerful tool, and this two-part series has only scratched the surface of its capabilities. If you are interested in learning more about Osquery, check out the official documentation for a deeper dive into Osquery's underlying architecture and features.


About the author

Anthony Critelli is a Linux systems engineer with interests in automation, containerization, tracing, and performance. He started his professional career as a network engineer and eventually made the switch to the Linux systems side of IT. He holds a B.S. and an M.S. from the Rochester Institute of Technology.
