Osquery is an open source, cross-platform tool that allows you to obtain information about your system using a SQL query language. My previous article explained how to use Osquery to query data about a system interactively. Running a query when needed is an excellent way to become comfortable with Osquery's SQL language and provides a convenient method for quick data collection on a system.
However, the real power of Osquery is its ability to scrape data about a system regularly. Osquery can run as a daemon and execute scheduled queries, allowing you to collect and process data on a regular cadence and respond to changes in the state of your systems.
Run a basic scheduled query
Setting up a basic scheduled query involves adding the query to Osquery's configuration file and starting the Osquery daemon. The default configuration file is located at /etc/osquery/osquery.conf, although you can change this by passing flags to the service.
The configuration is a JSON object that specifies certain global options and defines a schedule of queries to execute. The example file below will run a query every five seconds to obtain the user ID (UID), username, and shell for any users with a UID greater than or equal to 1000:
[root@fedora ~]# cat /etc/osquery/osquery.conf
{
  "options": {
    "host_identifier": "hostname"
  },
  "schedule": {
    "users": {
      "query": "SELECT uid,username,shell FROM users WHERE uid >= 1000;",
      "interval": 5
    }
  }
}
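Before starting the daemon, it can be useful to review what a configuration will actually schedule. The following is a minimal Python sketch, not part of Osquery: it parses the same configuration as JSON and lists each scheduled query with its interval. The `summarize_schedule` helper is a hypothetical name, and the configuration is embedded as a string so the example is self-contained rather than reading /etc/osquery/osquery.conf.

```python
import json

# Same configuration as above, embedded as a string so the sketch is self-contained.
CONFIG_TEXT = """
{
  "options": {"host_identifier": "hostname"},
  "schedule": {
    "users": {
      "query": "SELECT uid,username,shell FROM users WHERE uid >= 1000;",
      "interval": 5
    }
  }
}
"""


def summarize_schedule(config_text):
    """Return (name, interval_seconds, query) for each scheduled query.

    Hypothetical helper for illustration; it is not part of Osquery.
    """
    config = json.loads(config_text)  # raises ValueError on malformed JSON
    summary = []
    for name, entry in config.get("schedule", {}).items():
        summary.append((name, int(entry["interval"]), entry["query"]))
    return summary


if __name__ == "__main__":
    for name, interval, query in summarize_schedule(CONFIG_TEXT):
        print(f"{name}: every {interval}s -> {query}")
```

Parsing the file this way also catches JSON syntax mistakes, such as a trailing comma, before the daemon is started.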
Once the configuration file is in place, you can use the osqueryctl command to start, restart, or stop the Osquery daemon:
[root@fedora ~]# osqueryctl start
The daemon will start and begin executing the scheduled queries. By default, Osquery logs to the filesystem, writing JSON output to a log file located at /var/log/osquery/osqueryd.results.log. Each log entry contains metadata, such as the query execution time, along with the columns of data that the query returned:
[root@fedora ~]# cat /var/log/osquery/osqueryd.results.log | jq
{
  "name": "users",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct 4 16:40:10 2022 UTC",
  "unixTime": 1664901610,
  "epoch": 0,
  "counter": 0,
  "numerics": false,
  "columns": {
    "shell": "/sbin/nologin",
    "uid": "65534",
    "username": "nobody"
  },
  "action": "added"
}
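Because each entry is plain JSON, it is straightforward to consume from a script. The sketch below is a minimal Python example that assumes one JSON object per line in the results log; it uses an entry copied from the output above as sample data. The `describe_entry` helper is a hypothetical name introduced for illustration.

```python
import json

# One entry in the shape shown above (sample data copied from the output).
SAMPLE_LINE = (
    '{"name":"users","hostIdentifier":"fedora",'
    '"calendarTime":"Tue Oct 4 16:40:10 2022 UTC","unixTime":1664901610,'
    '"epoch":0,"counter":0,"numerics":false,'
    '"columns":{"shell":"/sbin/nologin","uid":"65534","username":"nobody"},'
    '"action":"added"}'
)


def describe_entry(line):
    """Turn one results-log line into a short human-readable string."""
    entry = json.loads(line)
    columns = entry["columns"]
    # Column values appear as strings in this output, so convert explicitly.
    uid = int(columns["uid"])
    return (
        f'{entry["name"]}: {entry["action"]} {columns["username"]} '
        f'(uid {uid}) at {entry["unixTime"]}'
    )


if __name__ == "__main__":
    print(describe_entry(SAMPLE_LINE))
```

The explicit `int()` conversion matters when comparing or sorting values such as UIDs, since the values in the sample output are quoted strings.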
By default, scheduled queries log differential results: only the rows that changed between one run of the query and the next. The "action" field in the JSON log indicates whether a row in the table was added or removed since the query was last run. You can see this in action by adding two users to the system and looking at the query results:
[root@fedora ~]# useradd testuser
[root@fedora ~]# useradd testuser2
The query results show the addition of two users. The action is added, indicating that these rows of data have been added since the last query:
[root@fedora ~]# cat /var/log/osquery/osqueryd.results.log | jq
{
  "name": "users",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct 4 16:40:10 2022 UTC",
  "unixTime": 1664901610,
  "epoch": 0,
  "counter": 0,
  "numerics": false,
  "columns": {
    "shell": "/sbin/nologin",
    "uid": "65534",
    "username": "nobody"
  },
  "action": "added"
}
{
  "name": "users",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct 4 16:41:35 2022 UTC",
  "unixTime": 1664901695,
  "epoch": 0,
  "counter": 1,
  "numerics": false,
  "columns": {
    "shell": "/bin/bash",
    "uid": "1000",
    "username": "testuser"
  },
  "action": "added"
}
{
  "name": "users",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct 4 16:41:40 2022 UTC",
  "unixTime": 1664901700,
  "epoch": 0,
  "counter": 2,
  "numerics": false,
  "columns": {
    "shell": "/bin/bash",
    "uid": "1001",
    "username": "testuser2"
  },
  "action": "added"
}
Finally, you can delete the users from the system, and Osquery will report them as removed:
[root@fedora ~]# userdel -r testuser
[root@fedora ~]# userdel -r testuser2
The next set of query results shows that the data (and subsequently the users) have been removed since the last run of the query:
[root@fedora ~]# tail -n 2 /var/log/osquery/osqueryd.results.log | jq
{
  "name": "users",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct 4 16:47:10 2022 UTC",
  "unixTime": 1664902030,
  "epoch": 0,
  "counter": 3,
  "numerics": false,
  "columns": {
    "shell": "/bin/bash",
    "uid": "1000",
    "username": "testuser"
  },
  "action": "removed"
}
{
  "name": "users",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct 4 16:47:15 2022 UTC",
  "unixTime": 1664902035,
  "epoch": 0,
  "counter": 4,
  "numerics": false,
  "columns": {
    "shell": "/bin/bash",
    "uid": "1001",
    "username": "testuser2"
  },
  "action": "removed"
}
This differential approach lets you build tooling that monitors Osquery logs and reports on system state changes, which is valuable for observability. Being able to answer questions about what changed on a system, and when, is important for security, incident response, troubleshooting, and outage response.
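As one illustration of such tooling, the added and removed entries can be replayed in order to reconstruct which rows are currently present. The following Python sketch makes several assumptions: the log holds one JSON object per line, the sample entries are trimmed to the fields the code uses, and `replay_events` is a hypothetical helper rather than anything Osquery provides.

```python
import json

# Sample differential entries, trimmed to the fields this sketch uses.
SAMPLE_EVENTS = [
    '{"name": "users", "columns": {"shell": "/bin/bash", "uid": "1000", "username": "testuser"}, "action": "added"}',
    '{"name": "users", "columns": {"shell": "/bin/bash", "uid": "1001", "username": "testuser2"}, "action": "added"}',
    '{"name": "users", "columns": {"shell": "/bin/bash", "uid": "1000", "username": "testuser"}, "action": "removed"}',
]


def replay_events(lines, query_name):
    """Replay differential entries for one query and return the rows still present.

    Each row is stored as a sorted tuple of (column, value) pairs so that it is
    hashable and can be kept in a set.
    """
    state = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("name") != query_name:
            continue  # ignore entries from other scheduled queries
        row = tuple(sorted(entry["columns"].items()))
        if entry["action"] == "added":
            state.add(row)
        elif entry["action"] == "removed":
            state.discard(row)
    return state


if __name__ == "__main__":
    for row in sorted(replay_events(SAMPLE_EVENTS, "users")):
        print(dict(row))
```

With the sample entries, only the testuser2 row remains, because testuser was added and then removed. A real monitor would read lines from the results log instead of a list and would likely alert on each change rather than only tracking the final state.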
Scheduled snapshots
A differential query doesn't always make sense for a particular dataset. Sometimes, you need a complete point-in-time view of the data that a query returns. For example, a query to monitor memory utilization should return a complete picture each time it runs. Osquery supports this type of scheduled query with the "snapshot" parameter in the query's configuration.
The configuration below schedules an additional query to run every 15 seconds and collect data about memory utilization on the system:
[root@fedora ~]# cat /etc/osquery/osquery.conf
{
  "options": {
    "host_identifier": "hostname"
  },
  "schedule": {
    "users": {
      "query": "SELECT uid,username,shell FROM users WHERE uid >= 1000;",
      "interval": 5
    },
    "memory_info": {
      "query": "SELECT memory_total,memory_free,memory_available,buffers FROM memory_info;",
      "interval": 15,
      "snapshot": true
    }
  }
}
Once you modify the configuration, you must restart Osquery for the changes to take effect:
[root@fedora ~]# osqueryctl restart
Snapshot results are saved as JSON to a different log file, located at /var/log/osquery/osqueryd.snapshots.log. Each JSON object in this log file holds the complete query results at the time the query ran in a "snapshot" array, and the "action" field is always "snapshot" rather than "added" or "removed":
[root@fedora ~]# cat /var/log/osquery/osqueryd.snapshots.log | jq
{
  "snapshot": [
    {
      "buffers": "1708032",
      "memory_available": "1471823872",
      "memory_free": "855764992",
      "memory_total": "2066640896"
    }
  ],
  "action": "snapshot",
  "name": "memory_info",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct 4 17:12:28 2022 UTC",
  "unixTime": 1664903548,
  "epoch": 0,
  "counter": 0,
  "numerics": false
}
{
  "snapshot": [
    {
      "buffers": "1708032",
      "memory_available": "1471827968",
      "memory_free": "855764992",
      "memory_total": "2066640896"
    }
  ],
  "action": "snapshot",
  "name": "memory_info",
  "hostIdentifier": "fedora",
  "calendarTime": "Tue Oct 4 17:12:42 2022 UTC",
  "unixTime": 1664903562,
  "epoch": 0,
  "counter": 0,
  "numerics": false
}
Snapshots allow you to collect a complete view of data at a specific point in time, and they are excellent for queries where a differential view of data doesn't make sense.
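Since every snapshot entry is self-contained, a single log line is enough to compute a metric. The Python sketch below derives a memory utilization percentage from one entry in the shape shown above. Two assumptions are worth flagging: "used" is defined here as memory_total minus memory_available, which is one reasonable definition among several, and `memory_used_percent` is a hypothetical helper name.

```python
import json

# One snapshot entry, using values from the output above and trimmed to the
# fields this sketch needs.
SAMPLE_SNAPSHOT = json.dumps(
    {
        "snapshot": [
            {
                "buffers": "1708032",
                "memory_available": "1471823872",
                "memory_free": "855764992",
                "memory_total": "2066640896",
            }
        ],
        "action": "snapshot",
        "name": "memory_info",
        "unixTime": 1664903548,
    }
)


def memory_used_percent(line):
    """Return memory utilization as a percentage for one snapshot log line.

    Assumption: "used" means memory_total minus memory_available.
    """
    entry = json.loads(line)
    row = entry["snapshot"][0]  # this query returns a single row
    total = int(row["memory_total"])  # values are strings in the sample output
    available = int(row["memory_available"])
    return 100.0 * (total - available) / total


if __name__ == "__main__":
    print(f"memory used: {memory_used_percent(SAMPLE_SNAPSHOT):.1f}%")
```

No state needs to be carried between entries here, which is the practical difference from processing the differential log.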
Wrap up
In this article, you extended your Osquery knowledge to build scheduled queries that regularly collect data about a system. You learned how to run Osquery as a daemon and saw how scheduled queries can provide either a differential view or a point-in-time snapshot of the system state.
Osquery is a very powerful tool, and this two-part series has only scratched the surface of its capabilities. If you are interested in learning more about Osquery, check out the official documentation for a deeper dive into Osquery's underlying architecture and features.
About the author
Anthony Critelli is a Linux systems engineer with interests in automation, containerization, tracing, and performance. He started his professional career as a network engineer and eventually made the switch to the Linux systems side of IT. He holds a B.S. and an M.S. from the Rochester Institute of Technology.