PostgreSQL connector client tutorial | Enterprise Search documentation [8.7]

IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

› ›

« Getting started with the MongoDB connector Getting started with machine learning »

PostgreSQL connector client tutorialedit

This tutorial walks you through the process of creating a connector client for a PostgreSQL data source. You’ll be using the Build a connector workflow in the Kibana UI. This means you’ll be deploying the connector on your own infrastructure. Refer to the Elastic PostgreSQL connector reference for more information about this connector.

You’ll use the Python connector framework to create the connector. In this exercise, you’ll be working in both the terminal (or your IDE) and the Kibana UI.

If you want to build a connector for another data source, use this tutorial as a blueprint. Refer to the list of available connector clients.

You can test the PostgreSQL connector client in an isolated environment, using a real PostgreSQL source and a Dockerized Elasticsearch instance. This requires no configuration and is as simple as running a single command in your terminal.

Prerequisitesedit

Elastic prerequisitesedit

First, ensure you satisfy the prerequisites for building a connector. Importantly, your Enterprise Search service needs at least 4GB RAM per zone.

PostgreSQL prerequisitesedit

You need:

PostgreSQL version 11+.
Tables must be owned by a PostgreSQL user.
Database superuser privileges are required to index all database tables.

You should enable recording of the commit time of PostgreSQL transactions. Otherwise, all data will be indexed in every sync. By default, track_commit_timestamp is off.

Enable this by running the following command on the PosgreSQL server command line:

ALTER SYSTEM SET track_commit_timestamp = on;

Then restart the PostgreSQL server.

Stepsedit

To complete this tutorial, you’ll need to complete the following steps:

Create an Elasticsearch indexedit

Elastic connectors enable you to create searchable, read-only replicas of your data sources in Elasticsearch. The first step in setting up your connector client is to create an index.

In the Kibana UI go to Enterprise Search > Content > Elasticsearch indices.

Create a new connector index:

Choose Create new index.
Choose Build a connector.
Provide the connector name and optionally change the language analyzer to match the human language of your data source. (The name you provide is automatically prefixed with search-.)
Save your changes.

The index is created and ready to configure.

Gather Elastic details

Before you can configure the connector, you need to gather some details about your Elastic deployment:

Elasticsearch endpoint.
- If you’re an Elastic Cloud user, find your deployment’s Elasticsearch endpoint in the Cloud UI under Cloud > Deployments > <your-deployment> > Elasticsearch.
- If you’re running your Elastic deployment and the connector service in Docker, the default Elasticsearch endpoint is http://host.docker.internal:9200.
API key. You’ll need this key to configure the connector. Use an existing key or create a new one.
Connector ID. Your unique connector ID is automatically generated when you create the connector. Find this in the Kibana UI.

Set up the connectoredit

Once you’ve created an index, you can set up the connector. You will be guided through this process in the UI.

Edit the name and description for the connector. This will help your team identify the connector.

Clone and edit the connector service code. For this example, we’ll use the Python framework. Follow these steps:

Clone or fork that repository locally with the following command: git clone https://github.com/elastic/connectors-python.
Native mode must be set to false to use this as a connector client. Make sure "is_native": False is set in connectors-python/connectors/kibana.py file.
Open the config.yml configuration file in your editor of choice.

Replace the values for host, api_key, and connector_id with the values you gathered earlier. Use the service_type value postgresql for this connector.

Expand to see an example config.yml file

Replace the values for host, api_key, and connector_id with your own values. Use the service_type value postgresql for this connector.

elasticsearch:
  host: <https://<my-elastic-deployment.es.us-west2.gcp.elastic-cloud.com>> # Your Elasticsearch endpoint
  api_key: '<YOUR-API-KEY>' # Your API key
  ssl: true
  bulk:
    queue_max_size: 1024
    queue_max_mem_size: 25
    display_every: 100
    chunk_size: 1000
    max_concurrency: 5
    chunk_max_mem_size: 5
    concurrent_downloads: 10
  request_timeout: 120
  max_wait_duration: 120
  initial_backoff_duration: 1
  backoff_multiplier: 2
  log_level: info

service:
  idling: 30
  heartbeat: 300
  max_errors: 20
  max_errors_span: 600
  max_concurrent_syncs: 1
  job_cleanup_interval: 300
  log_level: INFO

# native_service_types:
#  - mongodb
#  - mysql
#  - network_drive
#  - s3
#  - google_cloud_storage
#  - azure_blob_storage
#  - postgresql
#  - oracle
#  - dir
#  - sharepoint
#  - mssql
#  - jira


# Connector client settings
# Uncomment and update the following settings to use the connector client

connector_id: '<YOUR-CONNECTOR-ID>' # Your connector ID
service_type: 'postgresql'  # The service type for your connector

sources:
  # mongodb: connectors.sources.mongo:MongoDataSource
  # s3: connectors.sources.s3:S3DataSource
  # dir: connectors.sources.directory:DirectoryDataSource
  # mysql: connectors.sources.mysql:MySqlDataSource
  # network_drive: connectors.sources.network_drive:NASDataSource
  # google_cloud_storage: connectors.sources.google_cloud_storage:GoogleCloudStorageDataSource
  # azure_blob_storage: connectors.sources.azure_blob_storage:AzureBlobStorageDataSource
  postgresql: connectors.sources.postgresql:PostgreSQLDataSource
  # oracle: connectors.sources.oracle:OracleDataSource
  # sharepoint: connectors.sources.sharepoint:SharepointDataSource
  # mssql: connectors.sources.mssql:MSSQLDataSource
  # jira: connectors.sources.jira:JiraDataSource

Run the connector serviceedit

Now that you’ve configured the connector code, you can run the connector service.

In your terminal or IDE:

cd into the root of your connectors-python clone/fork.
Run the following command: make run.

The connector service should now be running. The UI will let you know that the connector has successfully connected to Enterprise Search.

Here we’re working locally. In production setups, you’ll deploy the connector service to your own infrastructure. If you prefer to use Docker, refer to the repo docs for instructions.

Sync your PostgreSQL data sourceedit

Enter your PostgreSQL data source detailsedit

Once you’ve configured the connector, you can use it to index your data source.

You can now enter your PostgreSQL instance details in the Kibana UI.

Enter the following information:

Host. Server host address for your PostgreSQL instance.
Port. Port number for your PostgreSQL instance.
Username. Username of the PostgreSQL account.
Password. Password for that user.
Database. Name of the PostgreSQL database.
Comma-separated list of tables. * will fetch data from all tables in the configured database.

Once you’ve entered all these details, select Save configuration.

Launch a syncedit

If you navigate to the Overview tab in the Kibana UI, you can see the connector’s ingestion status. This should now have changed to Configured.

It’s time to launch a sync by selecting the Sync button.

If you navigate to the terminal window where you’re running the connector service, you should see output like the following:

[FMWK][13:22:26][INFO] Fetcher <create: 499 update: 0 |delete: 0>
[FMWK][13:22:26][INF0] Fetcher <create: 599 update: 0 |delete: 0>
[FMWK][13:22:26][INFO] Fetcher <create: 699 update: 0 |delete: 0>
...
[FMWK][23:22:28][INF0] [oRXQwYYBLhXTs-qYpJ9i] Sync done: 3864 indexed, 0 deleted.
(27 seconds)

This confirms the connector has fetched records from your PostgreSQL table(s) and transformed them into documents in your Elasticsearch index.

Verify your Elasticsearch documents in the Documents tab in the Kibana UI.

If you’re happy with the results, set a recurring sync schedule in the Scheduling tab. This will ensure your searchable data in Elasticsearch is always up to date with changes to your PostgreSQL data source.

Learn moreedit

« Getting started with the MongoDB connector Getting started with machine learning »