Automated API traversal - program synthesis
If computers could code interactions with systems for us, we wouldn't have to write code, they could interoperate automatically
We have a REST API or standard library functions in our chosen programming language. Each rest call or method call takes arguments and returns data and causes side effects. Take a HTTP client for instance, it takes data in and returns data.
We typically have to write interactions with APIs manually.
What if the programming against APIs could be automatically generated and the source code or AST generated to interact with the API.
We need a way to map a simple definition of the problem into a more detailed representation of the problem which uses APIs
In other words, computers program themselves. Rather than general intelligence we rely on mappings and equivalencies and graph traversal and scheduling.
Say I want to backup files on all my computers, compress them, encrypt them, sign them, deduplicate them, upload.
That's a lot of APIs I need to talk to (encryption, operating system file APIs) and the APIs for them require multiple lines of code to perform. There's also a data flow through a pipeline of steps.
The call graph of the code to fulfil the pipeline looks a certain way.
Each step, "encrypt" takes in an input and has an output.
These need to be defined.
I want to be able to define a list of steps to perform and have the computer synthesise all the intermediate and data transformation steps in between.
I give the computer this list: { "steps": ["deduplicate", "compress", "encrypt", "upload"] }
The computer generates code to do the following, performing all intermediate conversions such as files to bytes.
Upload(encrypt(compress(deduplicate(input)))
The actual code may look more like
A(b(c(d(e(f(g(h(I(j(k(input)))))))))
But this is a simplified example, upload might be multiple steps or multiple steps in the AST.
To implement this I need a database of records representing APIs and their inputs, outputs and types.
We also need a database of code sequences (or code snippets) which are named sequences of code that perform an overarching task - such as signing or encrypting some bytes A sequence references API method calls in order and had variables used between method calls.
We also have sequences to convert between types. And collections of types
I tell the computer what I want to be done in a simple specification of the problem and the computer works out what APIs need to be used to perform the actions. It's synthesis of a set of steps or operators to perform a function.
UseMe files are definitely an application/related to this idea. I really like UseMe files.
The unsolved problem in this idea is the expanding of one graph to another.
Without defining what "encrypt" means - without marking the following code as "encrypt" -
Generate key Get algorithm Algorithm.encrypt(input)
You cannot expand "encrypt" to those steps. You've got to do a lot of work to indicate what encrypt means.
Maybe we can use types to solve this problem. As long as you refer to the correct types, the system can infer the correct code operations.
I understand what you want, and understand that such program synthesis must be possible without a programmer in the loop. I had even attempted to write software for making APIs from software libraries and websites more reusable, which I call "metadrive". I think we need drivers, because not every API is documented and not in the same standard, remember UseMe Files files idea? Once we solve automatic reusability, the graph of functions can definitely be traversed, and paths of execution to get final result composed like routes on a geographical map. Heck of a multilevel problem, but definitely solvable.
[+]
I should point out that OpenAPI is a specification format for rest APIs that could be used to traverse APIs the only problem is that it isn't very good.
If programs advertised their functionality, types, inputs and outputs we could have systems that program themselves to talk to other systems.