Documentation

What is Venice?

Venice is an open-source framework that enables users to deploy and manage stream processing pipelines.

Venice is designed for users who want to set up a streaming pipeline with minimal hassle. It may be most suitable for developers who want to quickly deploy a streaming pipeline for a small application.

Figure: A Venice pipeline in action.

Features

Venice’s main features are:

An easy-to-use command line interface for common development and pipeline management tasks.
A default pipeline to quickly support stream processing applications.
Simple configuration of new connections and components.
A web-based UI for monitoring the message brokers.

Components

The Venice default pipeline consists of multiple components:

Kafka brokers for logging events
Zookeeper for managing the Kafka brokers
Confluent Schema Registry for managing schemas
ksqlDB and ksqlCLI for performing aggregations, filtration, transformations, and joins over real-time data
Kafka Connect for pushing data into and pulling data out of Kafka and for simplifying the process of adding new sources and sinks
PostgreSQL data sink for storing or presenting the data in multiple formats
Kafdrop Kafka UI for visually inspecting messages, schemas, and connectors

The components used in the pipeline are packaged in Docker containers, making them easy to build and deploy or swap for others to better meet your needs.

To learn why we chose these components, read our Case Study.

Architecture

When does it fit?

Venice offers a middle ground between proprietary and DIY solutions (more details here). Venice is open-source, abstracts away some of the complexity, comes with reasonable defaults, and is fully customizable and extensible. It is not meant for every use case; the criteria below can help you decide.

Venice may be a good fit if:

You want to start with a basic stream processing pipeline for a small application.
Most of the default settings would work for your use case.

When does it not fit?

Venice may NOT be a good fit if:

Your application has to handle a large volume of events.
Your use case is complex and requires substantial customization.

Quickstart

Installation Instructions

Dependencies

Install Node and npm

Node (the CLI was created and tested with version 13.8.0)
npm (the CLI was created and tested with version 6.13.6)

Install Venice CLI

Use this command:
git clone https://github.com/venice-framework/venice-cli.git && cd venice-cli && npm install && sudo npm link

Or do it step-by-step:

Clone the repository.
git clone https://github.com/venice-framework/venice-cli.git
Navigate into the newly cloned venice-cli directory.
cd venice-cli
Install packages.
npm install or npm i
Create a symbolic link.
sudo npm link

Choose a pipeline

Navigate to a repo containing a pipeline, or install a Venice default pipeline.

You can use the Venice pipeline with a PostgreSQL sink if you do not have a pipeline you want to test.

NOTE: The following command will clone a repo into the current working directory. If you want the repo to be cloned into another folder, navigate away from venice-cli now (e.g., cd ..).

venice install

Select "venice - Base pipeline with a Postgres sink".

cd venice-postgres-sink

Add a producer

You can use your own or a Venice producer example.

Open the docker-compose.yml file in a text editor.

Uncomment the section for the producer so that the producer entry has the same indentation as the other services.

Figure: The producer service in the default docker-compose.yml file.

If you want to use your own producer, package it in a container and have it use the environment variables BROKER, TOPIC_NAME, and SCHEMA_REGISTRY_URL. Then, swap out image: veniceframework/python-producer-test-key-string for the image you want to use.
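As a rough illustration of the environment-variable contract described above, a custom Python producer might read its settings at startup along these lines. This is only a sketch: the names ProducerConfig and load_producer_config are hypothetical and are not part of Venice, and the code only reads and validates the three variables. It does not connect to Kafka.

```python
import os
from typing import Mapping, NamedTuple


class ProducerConfig(NamedTuple):
    """Settings the pipeline passes to the producer container (hypothetical type)."""
    broker: str
    topic_name: str
    schema_registry_url: str


def load_producer_config(env: Mapping[str, str] = os.environ) -> ProducerConfig:
    """Read BROKER, TOPIC_NAME, and SCHEMA_REGISTRY_URL from the environment."""
    required = ("BROKER", "TOPIC_NAME", "SCHEMA_REGISTRY_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail fast so a misconfigured container exits with a clear message.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return ProducerConfig(
        broker=env["BROKER"],
        topic_name=env["TOPIC_NAME"],
        schema_registry_url=env["SCHEMA_REGISTRY_URL"],
    )
```

Calling load_producer_config() once at startup and passing the result to whichever Kafka client library you use keeps the container configurable from docker-compose.yml without code changes.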

Choose whether to insert or upsert the data into PostgreSQL

Open the docker-compose.yml file in a text editor if it is not yet open.

The insert mode writes every record to the table as a new row. The upsert mode adds a new row only if no row with the same key exists yet; if a row with the same key exists, that row is updated.
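The difference between the two modes can be sketched with a small in-memory simulation. This is plain Python standing in for the database table, not the connector's actual implementation, and the sample keys and values are made up for illustration.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, dict]  # (key, value) pair standing in for a Kafka record


def apply_insert(table: List[Record], record: Record) -> None:
    """insert: every record becomes a new row, even if the key repeats."""
    table.append(record)


def apply_upsert(table: Dict[str, dict], record: Record) -> None:
    """upsert: one row per key; a repeated key overwrites the existing row."""
    key, value = record
    table[key] = value


# Made-up sample records; the third one reuses the key of the first.
records = [
    ("bus-1", {"lat": 1.0}),
    ("bus-2", {"lat": 2.0}),
    ("bus-1", {"lat": 1.5}),
]

insert_table: List[Record] = []
upsert_table: Dict[str, dict] = {}
for record in records:
    apply_insert(insert_table, record)
    apply_upsert(upsert_table, record)

print(len(insert_table))  # 3: every record is kept as its own row
print(len(upsert_table))  # 2: bus-1 holds only its latest value
```

In short, insert suits an append-only history of events, while upsert suits a table that should hold the latest state per key.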

The PostgreSQL pipeline uses upsert by default. Within the commands for the connector-init service, change upsert to insert if you want to use insert instead.

Figure: The connector-init service in the default docker-compose.yml file, with the configuration line for insert or upsert highlighted.

Save and close the file.

Change the topic name and PostgreSQL environment variables.

Open the .env file in a text editor. If you are following along with the Venice producer and PostgreSQL pipeline, change the values for TOPIC_NAME to bus_locations and POSTGRES_DB to buses. Save and close the file.
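If you would rather script this edit than make it by hand, a helper along the following lines could work. It is a minimal sketch that assumes the .env file uses plain KEY=VALUE lines; set_env_values is a hypothetical name, not a Venice command.

```python
from typing import Dict


def set_env_values(text: str, updates: Dict[str, str]) -> str:
    """Return .env text with the given keys set, leaving other lines untouched."""
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in remaining:
            # Replace the value of the first matching assignment.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any keys that were not already present in the file.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```

For the bus example, you could read the contents of .env, pass them through set_env_values with {"TOPIC_NAME": "bus_locations", "POSTGRES_DB": "buses"}, and write the result back to the same file.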

Start the pipeline.

venice up

Show the status of all components of the pipeline.

venice status or venice -st

Follow the logs of one or more components.

venice logs

Press the space bar to select each component, then press Enter/Return.

The output is piped to less, so you can navigate up and down through the logs using the arrow keys or j and k. Use the space bar to scroll down one page at a time, and press q to quit.

Show topics, schemas, and connectors

venice topics or venice -t
venice schemas or venice -s
venice connectors or venice -c

Figure: Viewing topics, schemas, and connectors with the venice commands.

Open the KSQL CLI to see the data being written to the topic.

This will download the KSQL CLI if you have not already used it. Expect a noticeable delay while it loads for the first time.

venice ksql

If you get an error message that Remote server at http://ksql-server:8088 does not appear to be a valid KSQL server, exit the KSQL CLI and try again.
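The error above typically just means the server has not finished starting, so retrying after a short wait is usually enough. If you want to automate the wait, a generic retry helper could look like the sketch below. The names wait_until_ready and http_ok are hypothetical, and the sketch assumes you supply a URL that is reachable from wherever the script runs; the hostname in the error message may only resolve inside the Docker network.

```python
import time
import urllib.request
from typing import Callable


def wait_until_ready(check: Callable[[], bool], attempts: int = 10,
                     delay_seconds: float = 3.0) -> bool:
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        try:
            if check():
                return True
        except OSError:
            # Connection refused or similar: the service is still starting.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a plain HTTP GET to url answers with status 200."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status == 200
```

A call such as wait_until_ready(lambda: http_ok(server_url)), where server_url is an address you have confirmed is reachable, returns True once the server responds and False if it never does within the allotted attempts.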

Try these KSQL commands:

Show all topics. You should see bus_locations among the list of topics.
SHOW TOPICS;
Print all data from the beginning.
PRINT bus_locations FROM BEGINNING;
Follow any new input. Press CTRL+C to stop printing.
PRINT bus_locations;

You can issue additional queries over the data, such as aggregations and other transformations. See the KSQL documentation for more commands.

Press CTRL+D to exit the KSQL console.

Launch the PostgreSQL CLI to see the data populating in the database.

venice psql

If you are following along with the bus example, enter “venice_user” for the name of the PostgreSQL user, and “buses” for the name of the PostgreSQL database.

PostgreSQL commands:

Show all tables. You should see bus_locations.
\dt
Show the first ten rows.
SELECT * FROM bus_locations LIMIT 10;
Display a count of how many records there are in the table.
SELECT COUNT(*) FROM bus_locations;
Press CTRL+D to exit the psql console. You can also use exit, quit, \q, or CTRL+C.

GUI

For a graphical interface to the Kafka cluster, navigate to localhost:9000 in your web browser to open Kafdrop. There you can view Kafka topics and consumer groups, examine individual messages to ensure they are being serialized and deserialized correctly, and verify that messages are distributed evenly among partitions.
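One simple way to put a number on partition balance is to compare the busiest partition with the average. The sketch below assumes you have noted the per-partition message counts yourself, for example by reading them from the web UI. The function partition_skew and this particular metric are illustrative choices, not something Venice or Kafdrop provides.

```python
from typing import Dict


def partition_skew(counts: Dict[int, int]) -> float:
    """Ratio of the busiest partition's message count to the mean count.

    A value of 1.0 means perfectly even distribution; larger values mean
    one partition is receiving a disproportionate share of messages.
    """
    total = sum(counts.values())
    if not counts or total == 0:
        return 1.0  # nothing to compare yet
    mean = total / len(counts)
    return max(counts.values()) / mean
```

For example, three partitions holding 100 messages each give a skew of 1.0, while 300 messages all landing in one of three partitions give a skew of 3.0, which usually points to a small or unevenly distributed set of message keys.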

Figure: Viewing partition messages in Kafdrop.

Shut down the Venice pipeline

venice down