This is a guest post by Jonathan Katz, Vice President of Platform Engineering at Crunchy Data & PostgreSQL Core Team Member.
Edge computing helps businesses become more proactive and dynamic by placing applications and processing power closer to the devices and sensors that create or consume data. This enables them to gather, analyze, and turn large data flows into actionable insights faster. We see this in delivery services that produce vast arrays of sensor data and benefit from the rich analytics that result from aggregating and processing that data. Connected vehicles and the fleet management systems that support them stand to benefit from cloud-native data analytics services that use AI-powered intelligent applications to gather insights and solve problems in real time.
One powerful application of cloud-native data analytics services built on PostgreSQL and OpenShift is spatial data analytics for connected vehicles.
The raw data generated by fleets of moving objects fitted with sensors (aka “telematics”) is voluminous and fast changing. This data contains many analytical insights that can be mined using systems that combine stream processing, to extract the data of interest, with edge databases, to collate and analyze that data.
Implementing databases and data analytics with cloud-native applications involves several steps and tools, often including data ingestion, preliminary data storage, data preparation, data storage for analytics, and data analysis. An open, adaptable architecture will help you execute this process more effectively, and it requires several key technologies. Container and Kubernetes platforms, such as Red Hat OpenShift, provide a consistent foundation for deploying databases, data analytics tools, and cloud-native applications across infrastructure, as well as self-service capabilities for developers and integrated compute acceleration.
Change Data Capture (CDC) captures row-level changes to database tables and passes corresponding change events to a data streaming bus. Applications can read these change event streams and access these change events in the order in which they occurred. Thus, CDC helps to bridge traditional data stores and new cloud-native event-driven architectures.
Red Hat Integration includes the Red Hat Change Data Capture (Debezium) connectors for Red Hat AMQ Streams, which are deployed and managed using Kafka Connect. Debezium is a set of distributed services that captures row-level changes in databases so that applications can see and respond to those changes. Debezium is built upon the Apache Kafka project and uses Kafka to transport the changes from one system to another.
The most interesting aspect of Debezium is that, at its core, it uses CDC to capture the data and push it into Kafka. The source PostgreSQL database remains untouched in the sense that we do not have to add triggers or log tables, which is a huge advantage because triggers and log tables degrade performance. In addition, PostgreSQL manages the changes in such a way that we do not lose them if we restart.
This makes the system much more flexible. If you want to add a new microservice, simply subscribe to the topic in Kafka that is pertinent to the service.
Let’s dive in and see how to actually make this work.
Deploying Red Hat AMQ Streams, Red Hat Integration, and Crunchy PostgreSQL on OpenShift
Red Hat OpenShift and the Kubernetes Operator pattern provide the ability to deploy this cloud-native data analytic solution using the automation and standardization afforded by containers and Kubernetes.
The following section provides steps for deploying Red Hat AMQ Streams, Red Hat Integration, and Crunchy PostgreSQL on OpenShift using the Red Hat Marketplace and Operators.
Set up the Crunchy Postgres Operator. We will create a new project called "pgo" that will serve as the home of both our PostgreSQL cluster and the Debezium components.
Create a new project that will be used for this exercise, which will include the Postgres Operator:
oc new-project pgo
You can then proceed to install the Postgres Operator. Be sure to install it into the "pgo" namespace.
(Prior to PostgreSQL Operator version 4.6.2, on OpenShift you will need to edit the "pgo.yaml" entry in the "pgo-config" ConfigMap, change "DisableFSGroup" to "true", and restart the PostgreSQL Operator Pod.)
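One way to apply that change from the command line is sketched below; it assumes the default "pgo-config" ConfigMap and "postgres-operator" Deployment names used by the Postgres Operator:
# Edit the pgo.yaml entry and set DisableFSGroup to "true"
oc edit configmap pgo-config -n pgo
# Restart the Operator so the new setting takes effect
oc rollout restart deployment postgres-operator -n pgo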
Once the Postgres Operator is up and running and "DisableFSGroup" is set correctly, go ahead and create a PostgreSQL cluster named "hippo" with some explicit passwords set. We will return and use this cluster later:
pgo create cluster hippo -n pgo --password-superuser=datalake --password=datalake
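Before moving on, you can confirm that the "hippo" cluster Pods have reached the Running state, for example:
# The Postgres Operator labels cluster Pods with pg-cluster=<cluster name>
oc get pods -n pgo --selector=pg-cluster=hippo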
As mentioned previously, Debezium uses Kafka to transport changes between systems, so we will need to set up Kafka. This can be done using the Strimzi Kafka Operator. First, download the Strimzi Kafka Operator to your local environment:
export STRIMZI_VERSION=0.18.0
git clone --depth 1 -b $STRIMZI_VERSION https://github.com/strimzi/strimzi-kafka-operator
cd strimzi-kafka-operator
To install the Strimzi Kafka Operator, you will need to make some substitutions in the "install/cluster-operator" directory. Find all values that say:
"namespace: myproject"
and replace them with:
"namespace: pgo"
Then run the following commands to set up the Strimzi Kafka Operator:
oc create -f install/cluster-operator && oc create -f examples/templates/cluster-operator
Once this operator is deployed, run the following command to set up a Kafka broker:
oc process strimzi-ephemeral -p CLUSTER_NAME=broker -p ZOOKEEPER_NODE_COUNT=1 -p KAFKA_NODE_COUNT=1 -p KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 -p KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=1 | oc apply -f -
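It can take a minute or two for ZooKeeper and Kafka to start. You can watch the broker Pods come up by filtering on the label that the Strimzi Cluster Operator applies:
oc get pods -n pgo -l strimzi.io/cluster=broker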
To run Debezium in a container, we need to build an image that contains the connector we require, in this case PostgreSQL. Download the Debezium PostgreSQL connector plugin to your local environment:
CONNECTOR_VERSION="1.4.2.Final" curl "https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/${CONNECTOR_VERSION}/debezium-connector-postgres-${CONNECTOR_VERSION}-plugin.tar.gz" | tar xvz
Next, create a Dockerfile that includes the Debezium connector as a Kafka plugin:
# Start from the Strimzi Kafka image that matches the Operator version
FROM strimzi/kafka:0.18.0-kafka-2.5.0
USER root:root
# Copy the Debezium PostgreSQL connector into the Kafka Connect plugin path
RUN mkdir -p /opt/kafka/plugins/debezium
COPY ./debezium-connector-postgres/ /opt/kafka/plugins/debezium/
# Drop back to the non-root user expected by the base image
USER 1001
Build and push the image to your local repository. For example, if you are running CodeReady Containers, this would look similar to:
podman build . -t default-route-openshift-image-registry.apps-crc.testing/pgo/connect-debezium
podman push default-route-openshift-image-registry.apps-crc.testing/pgo/connect-debezium
Enable local lookup on the image stream so the image can be referenced by name:
oc set image-lookup connect-debezium
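You can confirm that the image stream exists and that the pushed image landed in it with:
oc get imagestream connect-debezium -n pgo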
Now, we need to set up a secret that contains the credentials for Debezium to access the Postgres cluster. We can do this with the command below:
cat <<EOF > debezium-postgres-credentials.properties
postgres_username: postgres
postgres_password: datalake
EOF
oc create secret generic postgres-credentials \
--from-file=debezium-postgres-credentials.properties
Next, create a Kafka Connect custom resource. This will allow for Debezium to transfer data between your PostgreSQL cluster and Kafka:
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaConnect
metadata:
  name: connect-debezium
  annotations:
    # use-connector-resources configures this KafkaConnect
    # to use KafkaConnector resources to avoid
    # needing to call the Connect REST API directly
    strimzi.io/use-connector-resources: "true"
spec:
  image: pgo/connect-debezium
  replicas: 1
  bootstrapServers: broker-kafka-bootstrap:9093
  tls:
    trustedCertificates:
      - secretName: broker-cluster-ca-cert
        certificate: ca.crt
  config:
    config.storage.replication.factor: 1
    offset.storage.replication.factor: 1
    status.storage.replication.factor: 1
    config.providers: file
    config.providers.file.class: org.apache.kafka.common.config.provider.FileConfigProvider
  externalConfiguration:
    volumes:
      - name: connector-config
        secret:
          secretName: postgres-credentials
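Save the manifest to a file (for example, "connect-debezium.yaml", a name chosen here purely for illustration), apply it, and wait for the Kafka Connect Pod to become ready:
oc apply -f connect-debezium.yaml -n pgo
# Strimzi labels the Connect Pods with the name of the KafkaConnect resource
oc get pods -n pgo -l strimzi.io/cluster=connect-debezium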
Let's set up some data. Connect to the "hippo" database and create some entries:
oc exec -it $(oc get pods --selector=pg-cluster=hippo,role=master,pgo-pg-database -o name) -- psql hippo
CREATE TABLE users (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
username text NOT NULL,
email text NOT NULL
);
INSERT INTO users (username, email) VALUES ('hippo', 'hippo@example.com'), ('zebra', 'zebra@example.com');
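While still connected with psql, you can double-check that the rows were written before wiring up the connector:
SELECT id, username, email FROM users;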
Finally, we need to set up the Debezium "inventory-connector", which facilitates the extraction of data from the PostgreSQL database into Kafka:
oc exec -i -c kafka broker-kafka-0 -- curl -X POST \
-H "Accept:application/json" \
-H "Content-Type:application/json" \
http://connect-debezium-connect-api:8083/connectors -d @- <<EOF
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "hippo",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "datalake",
    "database.dbname": "hippo",
    "database.server.name": "hippo",
    "table.whitelist": "public.users"
  }
}
EOF
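You can confirm that the connector registered successfully and is running by querying the Kafka Connect REST API:
oc exec -i -c kafka broker-kafka-0 -- curl -s \
  http://connect-debezium-connect-api:8083/connectors/inventory-connector/status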
We can verify that the data is streaming into Kafka by using the "kafka-console-consumer.sh", which acts as our sink. For example:
oc exec -it broker-kafka-0 -- /opt/kafka/bin/kafka-console-consumer.sh \
--bootstrap-server localhost:9092 \
--from-beginning \
--property print.key=true \
--topic hippo.public.users
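With the consumer still running, you can insert a sample row from a second terminal and watch the corresponding change event arrive on the topic:
oc exec -it $(oc get pods --selector=pg-cluster=hippo,role=master,pgo-pg-database -o name) -- \
  psql hippo -c "INSERT INTO users (username, email) VALUES ('giraffe', 'giraffe@example.com');"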
Build Your Data Analytics Pipeline on OpenShift With the Red Hat Marketplace
Red Hat AMQ Streams, Red Hat Integration, and Crunchy PostgreSQL for OpenShift are available in the Red Hat Marketplace, an open cloud marketplace that enables enterprise companies to easily discover and access certified software for container-based environments across the hybrid cloud. Through the marketplace, customers can take advantage of responsive support, streamlined billing, and simplified governance, with a single dashboard that provides visibility and insights across clouds.