Postgres synchronization solution P&L: -8 (≃ -96 USD)
Data synchronization is a challenging computing problem, but I have an idea.
I plan to use an (admittedly inefficient) rolling hash and sorting to solve the synchronization problem, similar to how rsync works.
I implemented a hash that takes the previous hash and the current column's data to produce a hash of the entire database.
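Roughly what I mean, as a minimal Python sketch (the row layout, encoding, and function name are assumptions, not the real implementation):

```python
import hashlib

def chained_hash(rows):
    """Fold each row's columns into a running hash.

    `rows` is an iterable of tuples (e.g. cursor.fetchall() output),
    already sorted by primary key so both sides hash in the same order.
    """
    digest = b""
    for row in rows:
        h = hashlib.sha256(digest)  # seed with the previous hash
        for column in row:
            h.update(str(column).encode("utf-8"))  # fold in each column
        digest = h.digest()
    return digest.hex()
```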
This should allow synchronization with a minimum of data transmission: when I write the synchronizer, it will rehash all of its own data, then binary-search the sorted hashes to find where the two copies diverge.
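Something like the sketch below, assuming both sides keep the running hash after every row (the `prefix_hashes` lists are my invention). Because the hash is chained, matching prefix hashes mean everything up to that row matches, so only the handful of hashes the search probes need to cross the wire.

```python
def first_divergence(local_prefix_hashes, remote_prefix_hashes):
    """Binary-search for the first row whose prefix hash differs."""
    lo, hi = 0, min(len(local_prefix_hashes), len(remote_prefix_hashes))
    while lo < hi:
        mid = (lo + hi) // 2
        if local_prefix_hashes[mid] == remote_prefix_hashes[mid]:
            lo = mid + 1  # everything up to and including mid matches
        else:
            hi = mid      # the divergence is at mid or earlier
    return lo  # rows from this index onward need to be exchanged
```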
I have an idea for solving the "winning copy" problem: keep a separate table that hashes every row and column field and assigns each cell a version. That version is what gets compared during a sync; a sketch follows below.
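One hypothetical shape for that shadow table (every name here is an assumption, as is the psycopg2 connection string):

```python
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS sync_versions (
    table_name  text   NOT NULL,
    row_pk      text   NOT NULL,
    column_name text   NOT NULL,
    cell_hash   text   NOT NULL,
    version     bigint NOT NULL DEFAULT 1,
    PRIMARY KEY (table_name, row_pk, column_name)
);
"""

with psycopg2.connect("dbname=app") as conn:  # assumed DSN
    with conn.cursor() as cur:
        cur.execute(DDL)  # the `with conn:` block commits on success
```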
My Vagrant setup uses persistent disks and Ansible to deploy the cron job and sync script, configured by a YAML file. I've also installed psycopg2, and I found documentation on how to retrieve the tables in a Postgres database. It's just a matter of writing the sync algorithm now.
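For the record, listing the tables boils down to a query against the standard information_schema catalog (connection string again assumed):

```python
import psycopg2

with psycopg2.connect("dbname=app") as conn:
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT table_name
            FROM information_schema.tables
            WHERE table_schema = 'public'
              AND table_type = 'BASE TABLE'
            ORDER BY table_name
            """
        )
        tables = [name for (name,) in cur.fetchall()]
print(tables)
```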
My problem is detecting which side is the winning copy.
When one side changes the data, the hashes diverge and the changed rows are detected. That part I understand.
The problem is detecting which side holds the latest change and should win. If I had a last-updated timestamp field I could use that, or a version column, but I am expressly trying to avoid introducing new columns to the schema, which makes it a lot harder.
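If the shadow-table idea above pans out, the comparison itself would be simple. A sketch, with the cell records shaped like the hypothetical sync_versions rows (this only dodges the problem rather than solving it, since the versions have to be bumped consistently on both sides):

```python
def pick_winner(local_cell, remote_cell):
    """Decide which side wins for one cell.

    Each argument is a dict with 'cell_hash' and 'version' keys,
    mirroring the hypothetical sync_versions table.
    """
    if local_cell["cell_hash"] == remote_cell["cell_hash"]:
        return None  # identical data, nothing to sync
    if local_cell["version"] >= remote_cell["version"]:
        return "local"
    return "remote"
```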