Comment by @grandrew:
What do you think about Open Interpreter? Could it be used together with the Espruino command-line tool to automate development? (I don't have experience with the command-line tool myself.)
These two actually intersect a lot, as there is no clear line between "which calls to make and in what order" and "make the computer do the thing you want". Technically they are the same problem, known as AI planning and program synthesis, and the difference probably lies in the grounding level of the generated code: either the AI writes the calls one by one with loops unrolled, or it produces higher-order logic that describes the program with flow-control primitives. Lifting the logic to that higher level is computationally expensive, though.
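To make the two grounding levels concrete, here is a minimal sketch in Python. The `Recorder` class and its `draw_string` method are hypothetical stand-ins for a watch drawing API; they only record which calls were made. The "unrolled" version spells out every call, while the "lifted" version expresses the same behaviour with a loop, and both produce an identical call trace.

```python
from typing import List, Tuple

Call = Tuple[str, tuple]


class Recorder:
    """Hypothetical stand-in for a device API: it only records the calls made."""

    def __init__(self) -> None:
        self.trace: List[Call] = []

    def draw_string(self, text: str, x: int, y: int) -> None:
        self.trace.append(("draw_string", (text, x, y)))


def unrolled(api: Recorder) -> None:
    # Low grounding level: every call is written out one by one.
    api.draw_string("0", 10, 0)
    api.draw_string("1", 10, 20)
    api.draw_string("2", 10, 40)
    api.draw_string("3", 10, 60)


def lifted(api: Recorder, n: int = 4) -> None:
    # Higher-order version: the same behaviour expressed with flow control.
    for i in range(n):
        api.draw_string(str(i), 10, 20 * i)


a, b = Recorder(), Recorder()
unrolled(a)
lifted(b)
print(a.trace == b.trace)  # True
```

Generating the unrolled form only requires getting each call right, whereas producing the lifted form means the model has to discover the pattern (`y = 20 * i`) behind the calls, which is the harder "lifting" step.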
Where this becomes really promising is that Bangle.js provides a confined and simple enough "world model" for the LLM to handle it much more efficiently than in the "open-world" setting of writing generic Python code.
I believe that if automatic programming ever happens in a mass-consumer product, it will happen here, in watch applications, first. Or at least it will be among the first batch of breakthroughs.
I don't believe the multi-shot approach (feeding the interpreter's output back to the model) is feasible in general, but it might improve accuracy by a significant number of percentage points. The problem is cost: trying a couple of attempts might be OK, but running a thousand run/feedback shots of an LLM is extremely expensive, only to throw away all the failed results.
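The cost argument can be sketched in Python. `multi_shot` is a hypothetical retry loop: `generate(request, feedback)` and `run(code)` are assumed callables standing in for an LLM call and an emulator or interpreter run, not any real tool's API. The two helper functions assume, for simplicity, that each attempt succeeds independently with probability `p`.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: generate(request, feedback) -> code,
# run(code) -> (succeeded, error_message_or_None).
Generate = Callable[[str, Optional[str]], str]
Run = Callable[[str], Tuple[bool, Optional[str]]]


def multi_shot(request: str, generate: Generate, run: Run,
               max_attempts: int = 3) -> Tuple[Optional[str], int]:
    """Ask for code, run it, feed any error back, and stop at the first success."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(request, feedback)
        ok, feedback = run(code)
        if ok:
            return code, attempt
    return None, max_attempts  # every attempt failed; all that work is discarded


def expected_attempts(p: float, max_attempts: int) -> float:
    """Expected number of LLM calls if each attempt independently succeeds with
    probability p: attempt i+1 only happens when the first i all failed."""
    return sum((1 - p) ** i for i in range(max_attempts))


def success_rate(p: float, max_attempts: int) -> float:
    """Probability that at least one of max_attempts independent attempts succeeds."""
    return 1 - (1 - p) ** max_attempts


# Toy stand-ins: the "model" fixes its bug once it sees an error message.
def fake_generate(request: str, feedback: Optional[str]) -> str:
    return "fixed" if feedback else "buggy"


def fake_run(code: str) -> Tuple[bool, Optional[str]]:
    return (True, None) if code == "fixed" else (False, "ReferenceError")


print(multi_shot("show the time", fake_generate, fake_run))  # ('fixed', 2)
print(expected_attempts(0.5, 3), success_rate(0.5, 3))      # 1.75 0.875
```

Under the independence assumption, a request that succeeds half the time costs 1.75 calls on average for an 87.5% success rate, but for a request with a small per-attempt success probability the expected number of calls approaches roughly `1/p`, which is where thousands of run/feedback shots come from. In practice, feeding the error back makes attempts correlated, which is exactly the accuracy gain being hoped for.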
The problem of writing larger applications could be solved by fine-tuning the LLM on more code examples, and/or by training a world model through some kind of metaheuristic-guided exploration on a simulator, then mass-training on that simulator in a GAN-like approach to obtain a corpus of feasible, working programs. That is still orders of magnitude cheaper than solving the open-world problem.
As a product development "feature" (if you allow me to hypothesize wildly), one could collect the user feature requests that the LLM was not able to implement in a one-shot approach and use them to retrain the model, so that these "featureless features" gradually become available for one-shot implementation on user request.
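A minimal Python sketch of that collection loop might look like the following. `FailedRequestLog` is entirely hypothetical, and the `prompt`/`completion` field names are an assumed JSONL layout, not any particular provider's fine-tuning format. Failed requests are logged, a working program is attached later (by a developer or a costlier offline search), and only solved requests are exported as training examples.

```python
import json
from typing import Dict, List, Optional


class FailedRequestLog:
    """Hypothetical store of user requests the LLM could not implement one-shot."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Optional[str]]] = []

    def log_failure(self, request: str) -> None:
        # Remember the request; there is no working program for it yet.
        self.records.append({"prompt": request, "completion": None})

    def attach_solution(self, request: str, code: str) -> None:
        # Later, a developer (or a costlier offline search) supplies a working program.
        for record in self.records:
            if record["prompt"] == request and record["completion"] is None:
                record["completion"] = code

    def training_jsonl(self) -> str:
        # Only solved requests become fine-tuning examples, one JSON object per line.
        return "\n".join(json.dumps(r) for r in self.records
                         if r["completion"] is not None)


log = FailedRequestLog()
log.log_failure("buzz twice when my step goal is reached")
log.log_failure("show tide times for my beach")
log.attach_solution("buzz twice when my step goal is reached", "/* working app code */")
print(log.training_jsonl())
```

Unsolved requests stay in the log, so they remain a to-do list of "featureless features" until someone supplies a working program for them.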