This is a guest post by Jonathan Katz, Vice President of Platform Engineering at Crunchy Data & PostgreSQL Core Team Member.
Edge computing helps businesses become more proactive and dynamic by placing applications and processing power closer to the devices and sensors that create or consume data. This enables them to gather, analyze, and turn large data flows into actionable insights faster. We see this in delivery services that produce vast arrays of sensor data and benefit from the rich analytics that result from aggregating and processing that data. Connected vehicles and the fleet management systems that support them stand to benefit from cloud-native data analytics services that use AI-powered intelligent applications to gather insights and solve problems in real time.
One powerful application of cloud-native data analytics services built on PostgreSQL and OpenShift is spatial data analytics for connected vehicles.
The raw data generated by fleets of moving objects fitted with sensors (aka “telematics”) is voluminous and fast changing. This data contains many analytical insights that can be mined using systems that combine stream processing, to extract the data of interest, with edge databases, to collate and analyze that data.
Implementing databases and data analytics with cloud-native applications involves several steps and tools, often including data ingestion, preliminary data storage, data preparation, data storage for analytics, and data analysis. An open, adaptable architecture will help you execute this process more effectively, and it requires several key technologies. Container and Kubernetes platforms, such as Red Hat OpenShift, provide a consistent foundation for deploying databases, data analytics tools, and cloud-native applications across infrastructure, as well as self-service capabilities for developers and integrated compute acceleration.
Change Data Capture (CDC) captures row-level changes to database tables and passes corresponding change events to a data streaming bus. Applications can read these change event streams and access these change events in the order in which they occurred. Thus, CDC helps to bridge traditional data stores and new cloud-native event-driven architectures.
Red Hat Integration includes the Red Hat Change Data Capture (Debezium) connectors for Red Hat AMQ Streams, which are deployed and managed using Kafka Connect. Debezium is a set of distributed services that captures row-level changes in databases so that applications can see and respond to those changes. Debezium is built upon the Apache Kafka project and uses Kafka to transport the changes from one system to another.
The most interesting aspect of Debezium is that, at its core, it uses CDC to capture the data and push it into Kafka. The source PostgreSQL database remains untouched in the sense that we do not have to add triggers or log tables, which is a huge advantage because triggers and log tables degrade performance. In addition, PostgreSQL manages the changes in such a way that we do not lose them if we restart.
This makes the system much more flexible. If you want to add a new microservice, simply subscribe to the topic in Kafka that is pertinent to the service.
Let’s dive in and see how to actually make this work.
Deploying Red Hat AMQ Streams, Red Hat Integration, and Crunchy PostgreSQL on OpenShift
Red Hat OpenShift and the Kubernetes Operator pattern provide the ability to deploy this cloud-native data analytic solution using the automation and standardization afforded by containers and Kubernetes.
The following section provides steps for deploying Red Hat AMQ Streams, Red Hat Integration, and Crunchy PostgreSQL on OpenShift using the Red Hat Marketplace and Operators.
Set up the Crunchy Postgres Operator. We will create a new project called "pgo" that will serve as the home of both our PostgreSQL cluster and the Debezium components.
Create a new project that will be used for this exercise, which will include the Postgres Operator:
oc new-project pgo
You can then proceed to install the Postgres Operator. Be sure to install it into the "pgo" namespace.
(Prior to PostgreSQL Operator version 4.6.2, on OpenShift you will need to edit the "pgo.yaml" entry in the "pgo-config" ConfigMap, change "DisableFSGroup" to "true", and restart the PostgreSQL Operator Pod.)
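One way to apply that change from the command line is sketched below; it assumes the default "pgo-config" ConfigMap and "postgres-operator" Deployment names used by the Postgres Operator:
# Edit the pgo.yaml entry and set DisableFSGroup to "true"
oc edit configmap pgo-config -n pgo
# Restart the Operator so the new setting takes effect
oc rollout restart deployment postgres-operator -n pgo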
Once the Postgres Operator is up and running and "DisableFSGroup" is set correctly, go ahead and create a PostgreSQL cluster named "hippo" with some explicit passwords set. We will return and use this cluster later:
pgo create cluster hippo -n pgo --password-superuser=datalake --password=datalake
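Before moving on, you can confirm that the "hippo" cluster Pods have reached the Running state, for example:
# The Postgres Operator labels cluster Pods with pg-cluster=<cluster name>
oc get pods -n pgo --selector=pg-cluster=hippo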
As mentioned previously, Debezium uses Kafka to transport changes between systems, so we will need to set up Kafka. This can be done using the Strimzi Kafka Operator. First, download the Strimzi Kafka Operator to your local environment:
export STRIMZI_VERSION=0.18.0
git clone --depth 1 -b $STRIMZI_VERSION https://github.com/strimzi/strimzi-kafka-operator
cd strimzi-kafka-operator
To install the Strimzi Kafka Operator, you will need to make some substitutions in the "install/cluster-operator" directory. Find all values that say:
"namespace: myproject"
and replace them with:
"namespace: pgo"
Then run the following commands to set up the Strimzi Kafka Operator:
oc create -f install/cluster-operator && oc create -f examples/templates/cluster-operator
Once this operator is deployed, run the following command to set up a Kafka broker:
oc process strimzi-ephemeral -p CLUSTER_NAME=broker -p ZOOKEEPER_NODE_COUNT=1 -p KAFKA_NODE_COUNT=1 -p KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 -p KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=1 | oc apply -f -
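It can take a minute or two for ZooKeeper and Kafka to start. You can watch the broker Pods come up by filtering on the label that the Strimzi Cluster Operator applies:
oc get pods -n pgo -l strimzi.io/cluster=broker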
To run Debezium in a container, we need to build an image that contains the connector we require, in this case PostgreSQL. Download the Debezium PostgreSQL connector plugin to your local environment:
CONNECTOR_VERSION="1.4.2.Final" curl "https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/${CONNECTOR_VERSION}/debezium-connector-postgres-${CONNECTOR_VERSION}-plugin.tar.gz" | tar xvz
Next, create a Dockerfile that includes the Debezium connector as a Kafka plugin:
# Start from the Strimzi Kafka image that matches the Operator version
FROM strimzi/kafka:0.18.0-kafka-2.5.0
USER root:root
# Copy the Debezium PostgreSQL connector into the Kafka Connect plugin path
RUN mkdir -p /opt/kafka/plugins/debezium
COPY ./debezium-connector-postgres/ /opt/kafka/plugins/debezium/
# Drop back to the non-root user expected by the base image
USER 1001
Build and push the image to your local repository. For example, if you are running CodeReady Containers, this would look similar to:
podman build . -t default-route-openshift-image-registry.apps-crc.testing/pgo/connect-debezium
podman push default-route-openshift-image-registry.apps-crc.testing/pgo/connect-debezium
Enable local lookup on the image stream so the image can be referenced by name:
oc set image-lookup connect-debezium
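You can confirm that the image stream exists and that the pushed image landed in it with:
oc get imagestream connect-debezium -n pgo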
Now, we need to set up a secret that contains the credentials for Debezium to access the Postgres cluster. We can do this with the command below:
cat <<EOF > debezium-postgres-credentials.properties
postgres_username: postgres
postgres_password: datalake
EOF
oc create secret generic postgres-credentials \
--from-file=debezium-postgres-credentials.properties
Next, create a Kafka Connect custom resource. This will allow for Debezium to transfer data between your PostgreSQL cluster and Kafka:
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaConnect
metadata:
  name: connect-debezium
  annotations:
    # use-connector-resources configures this KafkaConnect
    # to use KafkaConnector resources to avoid
    # needing to call the Connect REST API directly
    strimzi.io/use-connector-resources: "true"
spec:
  image: pgo/connect-debezium
  replicas: 1
  bootstrapServers: broker-kafka-bootstrap:9093
  tls:
    trustedCertificates:
      - secretName: broker-cluster-ca-cert
        certificate: ca.crt
  config:
    config.storage.replication.factor: 1
    offset.storage.replication.factor: 1
    status.storage.replication.factor: 1
    config.providers: file
    config.providers.file.class: org.apache.kafka.common.config.provider.FileConfigProvider
  externalConfiguration:
    volumes:
      - name: connector-config
        secret:
          secretName: postgres-credentials
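Save the manifest to a file (for example, "connect-debezium.yaml", a name chosen here purely for illustration), apply it, and wait for the Kafka Connect Pod to become ready:
oc apply -f connect-debezium.yaml -n pgo
# Strimzi labels the Connect Pods with the name of the KafkaConnect resource
oc get pods -n pgo -l strimzi.io/cluster=connect-debezium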
Let's set up some data. Connect to the "hippo" database and create some entries:
oc exec -it $(oc get pods --selector=pg-cluster=hippo,role=master,pgo-pg-database -o name) -- psql hippo
CREATE TABLE users (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
username text NOT NULL,
email text NOT NULL
);
INSERT INTO users (username, email) VALUES ('hippo', 'hippo@example.com'), ('zebra', 'zebra@example.com');
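While still connected with psql, you can double-check that the rows were written before wiring up the connector:
SELECT id, username, email FROM users;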
Finally, we need to set up the Debezium "inventory-connector", which facilitates the extraction of data from the PostgreSQL database into Kafka:
oc exec -i -c kafka broker-kafka-0 -- curl -X POST \
-H "Accept:application/json" \
-H "Content-Type:application/json" \
http://connect-debezium-connect-api:8083/connectors -d @- <<EOF
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "hippo",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "datalake",
    "database.dbname": "hippo",
    "database.server.name": "hippo",
    "table.whitelist": "public.users"
  }
}
EOF
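You can confirm that the connector registered successfully and is running by querying the Kafka Connect REST API:
oc exec -i -c kafka broker-kafka-0 -- curl -s \
  http://connect-debezium-connect-api:8083/connectors/inventory-connector/status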
We can verify that the data is streaming into Kafka by using the "kafka-console-consumer.sh", which acts as our sink. For example:
oc exec -it broker-kafka-0 -- /opt/kafka/bin/kafka-console-consumer.sh \
--bootstrap-server localhost:9092 \
--from-beginning \
--property print.key=true \
--topic hippo.public.users
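With the consumer still running, you can insert a sample row from a second terminal and watch the corresponding change event arrive on the topic:
oc exec -it $(oc get pods --selector=pg-cluster=hippo,role=master,pgo-pg-database -o name) -- \
  psql hippo -c "INSERT INTO users (username, email) VALUES ('giraffe', 'giraffe@example.com');"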
Build Your Data Analytics Pipeline on OpenShift With the Red Hat Marketplace
Red Hat AMQ Streams, Red Hat Integration, and Crunchy PostgreSQL for OpenShift are available in the Red Hat Marketplace, an open cloud marketplace that enables enterprise companies to easily discover and access certified software for container-based environments across the hybrid cloud. Through the marketplace, customers can take advantage of responsive support, streamlined billing, and simplified governance, with a single dashboard that provides visibility and insights across clouds.