• These two actually overlap a lot, since there is no clear line between "which calls to make and in what order" and "make the computer do the thing you want". Technically they are the same problem, known as AI planning and program synthesis. The difference is probably in the grounding level of the generated code: either the AI writes the calls out one by one with loops unrolled, or it produces higher-order logic that describes the program using flow-control primitives. Lifting the logic is computationally expensive, though.
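
    To make the "grounding level" distinction concrete, here is a minimal sketch. The `makeRecorder` helper and its `drawDot` call are hypothetical stand-ins, not part of any real watch API; they only record what was called so the two styles can be compared.

    ```javascript
    // Hypothetical stand-in for a watch drawing API: it only records calls.
    function makeRecorder() {
      const calls = [];
      return { calls, drawDot: (x, y) => calls.push(`drawDot(${x},${y})`) };
    }

    // Planning-style output: a fully grounded, call-by-call sequence.
    function unrolledProgram(api) {
      api.drawDot(0, 10);
      api.drawDot(20, 10);
      api.drawDot(40, 10);
      api.drawDot(60, 10);
    }

    // Synthesis-style output: the same behaviour lifted into a loop.
    function liftedProgram(api, count, step) {
      for (let i = 0; i < count; i++) api.drawDot(i * step, 10);
    }

    const a = makeRecorder();
    const b = makeRecorder();
    unrolledProgram(a);
    liftedProgram(b, 4, 20);

    // Both programs produce the same trace of calls.
    const sameTrace = JSON.stringify(a.calls) === JSON.stringify(b.calls);
    console.log(sameTrace);
    ```

    The unrolled version is trivial to generate but grows with every repetition; the lifted one stays short, but the generator has to discover the loop structure.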

    Where this is becoming really promising is that Bangle.js provides a confined and simple enough "world model" for the LLM to be able to handle it much more efficiently than the "open-world" approach of generic Python code writing.

    1. We have JS, which is extensively represented in any LLM training dataset
    2. We have the realm of "watch applications", which is a much smaller scope than any other application domain (probably the smallest consumer electronics scope ever)
    3. We have a significantly smaller set of library calls and possible I/O scenarios, basically due to the nature of the Bangle.js project
    4. All of these combined make the total problem space much, much smaller than in the "open world" rhetoric from OpenAI et al., given that every step up in scope is a combinatorial explosion.
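
    A rough back-of-the-envelope illustration of the combinatorial point. The vocabulary sizes (20 vs. 200 available calls) and the program length (5 calls) are made-up numbers chosen only to show the scaling, and `countSequences` is a hypothetical helper.

    ```javascript
    // Number of distinct call sequences of a given length when each
    // position can be any one of `vocabularySize` calls.
    function countSequences(vocabularySize, length) {
      let total = 1;
      for (let i = 0; i < length; i++) total *= vocabularySize;
      return total;
    }

    const small = countSequences(20, 5);  // confined API: 3,200,000 sequences
    const large = countSequences(200, 5); // 10x more calls: 320,000,000,000 sequences
    const ratio = large / small;          // 100,000x larger search space

    console.log(small, large, ratio);
    ```

    Under these assumptions, a 10x larger call vocabulary gives a 100,000x larger space for programs of only five calls, which is why keeping the scope confined matters so much.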

    I believe that if automatic programming ever happens "in a mass consumer product", it will happen here first, in watch applications. Or at least in the first batch of breakthroughs.

    I don't believe the multi-shot approach (feedback from the interpreter) is feasible in general, though it might improve accuracy by a few significant percentage points. The problem is the cost: trying a couple of attempts might be OK, but running a thousand run/feedback shots of the LLM is extremely expensive, just to throw away all the missed results.
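
    A minimal sketch of a budgeted multi-shot loop, to show where the cost comes from. `multiShot`, `fakeGenerate` and `fakeRun` are hypothetical names; the stubs stand in for a real LLM call and a real interpreter, neither of which is specified here.

    ```javascript
    // Ask the generator for code, run it, and feed the error back,
    // but give up after `maxAttempts` so the cost stays bounded.
    function multiShot(generate, run, request, maxAttempts) {
      let feedback = null;
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const code = generate(request, feedback); // one (expensive) LLM call
        const result = run(code);                 // interpreter or simulator
        if (result.ok) return { code, attempts: attempt };
        feedback = result.error;                  // reused in the next shot
      }
      return { code: null, attempts: maxAttempts };
    }

    // Stub generator: produces a typo first, then fixes it once it sees an error.
    const fakeGenerate = (request, feedback) =>
      feedback === null ? "drawClok()" : "drawClock()";

    // Stub interpreter: only the corrected call "works".
    const fakeRun = (code) =>
      code === "drawClock()"
        ? { ok: true }
        : { ok: false, error: "drawClok is not defined" };

    const outcome = multiShot(fakeGenerate, fakeRun, "show a clock", 3);
    const gaveUp = multiShot(fakeGenerate, fakeRun, "show a clock", 1);
    console.log(outcome.attempts, gaveUp.code);
    ```

    Every failed attempt is a paid generation that gets discarded, so the attempt budget directly caps the cost per request.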

    The problem of writing larger applications can be solved by fine-tuning the LLM on more code examples, and/or by training a world model using some kind of metaheuristic-guided exploration on a simulator and then mass-training on that simulator in a GAN-like approach to get a corpus of feasible working programs. That is still orders of magnitude cheaper than the open-world problem.

    As a product development "feature" (if you allow me to hypothesize wildly), one could collect the feature requests from users that the LLM was not able to code in a one-shot approach, and then use those to re-train the LLM so that these "featureless features" gradually become available for one-shot implementation on user request.
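
    As a sketch of what that collection step could look like: the `createBacklog` helper below is purely hypothetical, and the example requests are invented. It only counts requests that failed one-shot generation, so the most-wanted ones can be prioritised for re-training.

    ```javascript
    // Keeps a tally of user requests the LLM could not implement in one shot.
    function createBacklog() {
      const counts = new Map();
      return {
        record(request, oneShotSucceeded) {
          if (oneShotSucceeded) return; // only failures are worth re-training on
          const key = request.trim().toLowerCase(); // crude de-duplication
          counts.set(key, (counts.get(key) || 0) + 1);
        },
        // Failed requests, most frequently asked first.
        mostRequested() {
          return [...counts.entries()]
            .sort((x, y) => y[1] - x[1])
            .map(([request, failures]) => ({ request, failures }));
        },
      };
    }

    const backlog = createBacklog();
    backlog.record("Show tide times", false);
    backlog.record("show tide times ", false);
    backlog.record("Show step count", true);
    backlog.record("Vibrate on low battery", false);

    const queue = backlog.mostRequested();
    console.log(queue);
    ```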

@grandrew started