Postgres synchronization solution P&L: -8 (≃ -96 USD)
Data synchronization is a challenging computing problem, but I have an idea.
I plan to use an (admittedly inefficient) rolling hash and sorting to solve the synchronization problem, similar to how rsync works.
I implemented a hash that takes the previous hash and the current column's data to produce a hash of the entire database.
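Roughly what I mean, as a minimal Python sketch (the row layout, encoding, and function name are assumptions, not the real implementation):

```python
import hashlib

def chained_hash(rows):
    """Fold each row's columns into a running hash.

    `rows` is an iterable of tuples (e.g. cursor.fetchall() output),
    already sorted by primary key so both sides hash in the same order.
    """
    digest = b""
    for row in rows:
        h = hashlib.sha256(digest)  # seed with the previous hash
        for column in row:
            h.update(str(column).encode("utf-8"))  # fold in each column
        digest = h.digest()
    return digest.hex()
```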
This should allow synchronization with a minimum of data transmission: when I write the synchronizer, it will rehash all of its own data, then binary-search the sorted hashes to find where the two copies diverge.
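Something like the sketch below, assuming both sides keep the running hash after every row (the `prefix_hashes` lists are my invention). Because the hash is chained, matching prefix hashes mean everything up to that row matches, so only the handful of hashes the search probes need to cross the wire.

```python
def first_divergence(local_prefix_hashes, remote_prefix_hashes):
    """Binary-search for the first row whose prefix hash differs."""
    lo, hi = 0, min(len(local_prefix_hashes), len(remote_prefix_hashes))
    while lo < hi:
        mid = (lo + hi) // 2
        if local_prefix_hashes[mid] == remote_prefix_hashes[mid]:
            lo = mid + 1  # everything up to and including mid matches
        else:
            hi = mid      # the divergence is at mid or earlier
    return lo  # rows from this index onward need to be exchanged
```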
I have an idea for solving the "winning copy" problem: keep a separate table that hashes every row and column field and assigns each cell a version. That version is what gets compared during a sync; a sketch follows below.
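One hypothetical shape for that shadow table (every name here is an assumption, as is the psycopg2 connection string):

```python
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS sync_versions (
    table_name  text   NOT NULL,
    row_pk      text   NOT NULL,
    column_name text   NOT NULL,
    cell_hash   text   NOT NULL,
    version     bigint NOT NULL DEFAULT 1,
    PRIMARY KEY (table_name, row_pk, column_name)
);
"""

with psycopg2.connect("dbname=app") as conn:  # assumed DSN
    with conn.cursor() as cur:
        cur.execute(DDL)  # the `with conn:` block commits on success
```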
My Vagrant setup uses persistent disks and Ansible to deploy the cron job and sync script, configured by a YAML file. I've also installed psycopg2, and I found documentation on how to retrieve the tables in a Postgres database. It's just a matter of writing the sync algorithm now.
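For the record, listing the tables boils down to a query against the standard information_schema catalog (connection string again assumed):

```python
import psycopg2

with psycopg2.connect("dbname=app") as conn:
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT table_name
            FROM information_schema.tables
            WHERE table_schema = 'public'
              AND table_type = 'BASE TABLE'
            ORDER BY table_name
            """
        )
        tables = [name for (name,) in cur.fetchall()]
print(tables)
```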
My problem is detecting which side is the winning copy.
When one side changes the data, the hashes diverge and the changed rows are detected. That part I understand.
The problem is detecting which side holds the latest change and should win. If I had a last-updated timestamp field I could use that, or a version column, but I am expressly trying to avoid introducing new columns to the schema, which makes it a lot harder.
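If the shadow-table idea above pans out, the comparison itself would be simple. A sketch, with the cell records shaped like the hypothetical sync_versions rows (this only dodges the problem rather than solving it, since the versions have to be bumped consistently on both sides):

```python
def pick_winner(local_cell, remote_cell):
    """Decide which side wins for one cell.

    Each argument is a dict with 'cell_hash' and 'version' keys,
    mirroring the hypothetical sync_versions table.
    """
    if local_cell["cell_hash"] == remote_cell["cell_hash"]:
        return None  # identical data, nothing to sync
    if local_cell["version"] >= remote_cell["version"]:
        return "local"
    return "remote"
```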