I agree. So there are two options:
- Option 1: do this for the "end user", i.e. we're aiming at the use case where the user follows "the few shots" and gets his custom app working. Properties:
  - code sharing mostly serves to provide more examples, not as an end artifact; millions of apps get created & "published"
  - no "settings" in apps; apps are simpler and "just do the job"
  - to change an app, the user continues with the same 'few shots' approach to change whatever setting he likes
- Option 2: do this for the "developer": the 'few shot' approach helps the developer create & package the app for "conventional usage scenarios". Properties:
  - an installable packaged app is the final artifact of the flow
  - apps contain familiar interfaces: settings, menus, etc.
  - when developing an app feature, this becomes an "enhanced copilot"
I'm for option 1. Although this "end user" is probably still a developer enthusiast, for me personally it is exactly the fun I'm looking for, and it shoots directly into the vast space of unknowns about what the future interface will look like.
I suggest experimenting with a completely different UX, like this:
- the user browses an examples gallery (like a gallery of short screen-recording videos)
- picks an idea of what he wants on his watch, e.g.:
  - I want this watch face but with a smaller font and in red
  - or I want an app that looks like "notes" but lists the top items from the slashdot RSS feed instead
  - I want the watch face from app X, but also showing the temperature from my BLE coffee mug at the bottom with a tiny mug icon, and a one-tap 2 min timer for my toaster, but show that only until 10am, and afterwards show the steps counter...
- ChatGPT generates the app, packages it and pushes it to the watch
- If the user doesn't like the result, they ask it to fix it, or delete the app
- The user asks to add some functions to an existing app on his watch, picked from the list of "ideas" in the gallery from step 1
It's just an idea and still needs a lot of polishing and working out of the details, but I guess its high-level properties are something like:
- every app is custom
- apps can have any number of functions, e.g. I can ask the sleep info app to also show today's weather forecast, and to light up when the wake-up alarm triggers
- a very social, story-like experience, with users and chatbots discussing these apps
- the number of "apps" created is beyond listing
- no "settings" or other hard-to-use "menus" on the watch: just ask ChatGPT to change the news feed from slashdot to hackernews... or to add another app...
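The BLE mug request above is concrete enough to show what a generated app would actually have to decide. A minimal sketch in plain JavaScript, reading the request as "mug temperature and toaster timer until 10am, steps counter afterwards": it only picks which widgets to show for a given time. The function and widget names are hypothetical, and reading the mug over BLE and drawing on the screen are left out.

```javascript
// Hypothetical sketch: which extra widgets the custom watch face shows.
// Only the time rule from the request is modelled; the BLE mug read,
// the toaster timer itself and all drawing code are left out.
const CUTOFF_HOUR = 10; // "show that only until 10am"

function widgetsFor(date) {
  const widgets = ["clock"]; // the base watch face from "app X"
  if (date.getHours() < CUTOFF_HOUR) {
    widgets.push("mugTemperature"); // BLE coffee mug temperature with a tiny mug icon
    widgets.push("toasterTimer2min"); // one-tap 2 min toaster timer
  } else {
    widgets.push("steps"); // steps counter after 10am
  }
  return widgets;
}

console.log(widgetsFor(new Date(2024, 0, 1, 8, 30)).join(",")); // prints "clock,mugTemperature,toasterTimer2min"
```

The point is that the "setting" (the 10am cutoff) lives in the generated code, so changing it means asking for a new version rather than opening a menu on the watch.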
So I created the RSS reader app with ChatGPT, though not without issues.
Where it failed:
- It used the wrong HTTP API instead of the Gadgetbridge API, but was able to correct itself when I told it to use that.
- It did not know the Gadgetbridge API .resp trick with the returned "data", so I had to fix that manually.
- It completely broke when attempting to write an XML parser with RegExp, apparently because the RegExp implementation in Espruino is too non-standard; I ended up asking it to rewrite the parser with string slicing, and that worked.
I believe it would have been doable in one shot if I had fed it the API nuances ahead of time with tiny use-case examples, specifically for the "rough edges" like the unsupported RegExp.
I can definitely create a neat "GPTs" demo case where it writes the app in one shot.
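A minimal sketch of what "feeding the API nuances ahead of time" could look like: a hypothetical helper that prepends a short list of platform notes and tiny examples to the user's request before it goes to the model. The notes only restate the three failures listed above; the helper name, the note wording and the prompt layout are assumptions, not an existing GPTs configuration.

```javascript
// Hypothetical sketch: put known platform "rough edges" in front of the request
// so the model sees them before writing any code. The notes paraphrase the
// failures from the RSS reader attempt; the prompt layout is an assumption.
const PLATFORM_NOTES = [
  "Target: Bangle.js (Espruino JavaScript) running on the watch.",
  "Fetch URLs through the Gadgetbridge API, not a generic HTTP client.",
  "Read the result from the .resp field of the returned data.",
  "Espruino's RegExp support appears non-standard: parse text (e.g. XML) with indexOf/substring instead.",
];

// examples: array of tiny code snippets showing the API used correctly
function buildPrompt(request, examples) {
  const parts = ["Platform notes:"];
  PLATFORM_NOTES.forEach((note) => parts.push("- " + note));
  examples.forEach((ex, i) => {
    parts.push("Example " + (i + 1) + ":");
    parts.push(ex);
  });
  parts.push("Request: " + request);
  return parts.join("\n");
}
```

Keeping the notes as data rather than prose makes it easy to append a new "rough edge" each time a generated app breaks on the same thing.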
Full transcript up to the point where I understood it wouldn't fix itself: https://chat.openai.com/share/18d6df03-7af0-41da-a660-f419c1abd686
Final RSS reader app attached.
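The attached app isn't reproduced here, but the string-slicing approach that finally worked can be sketched roughly like this: plain indexOf/substring, no RegExp. This is an independent sketch, not the code ChatGPT produced. It assumes a reasonably well-formed feed (no nested items, titles as plain text or CDATA), leaves XML entities such as &amp; undecoded, and on the watch the XML string would come from the .resp field of the Gadgetbridge response mentioned above.

```javascript
// Find the next <tag>...</tag> (or <tag attr="...">...</tag>) at or after `from`,
// using only indexOf/substring. Returns { body, next } or null if none is found.
function nextElement(text, tag, from) {
  let start = text.indexOf("<" + tag, from);
  while (start >= 0) {
    const c = text.charAt(start + tag.length + 1);
    // Accept "<item>" or "<item attr...>", skip longer names such as "<items>".
    if (c !== "" && "> \t\r\n".indexOf(c) >= 0) break;
    start = text.indexOf("<" + tag, start + 1);
  }
  if (start < 0) return null;
  const gt = text.indexOf(">", start);
  if (gt < 0) return null;
  const closeTag = "</" + tag + ">";
  const end = text.indexOf(closeTag, gt + 1);
  if (end < 0) return null;
  return { body: text.substring(gt + 1, end), next: end + closeTag.length };
}

// Unwrap <![CDATA[...]]> if the whole value is wrapped in it.
function stripCdata(s) {
  const open = "<![CDATA[";
  const close = "]]>";
  s = s.trim();
  if (s.indexOf(open) === 0 && s.length >= open.length + close.length &&
      s.lastIndexOf(close) === s.length - close.length) {
    return s.substring(open.length, s.length - close.length);
  }
  return s;
}

// Collect up to `max` item titles; works for RSS 2.0 and RDF-style feeds
// where an <items> list precedes the actual <item> elements.
function rssTitles(xml, max) {
  const titles = [];
  let pos = 0;
  while (titles.length < max) {
    const item = nextElement(xml, "item", pos);
    if (!item) break;
    const title = nextElement(item.body, "title", 0);
    if (title) titles.push(stripCdata(title.body));
    pos = item.next;
  }
  return titles;
}
```

Stopping after `max` items keeps memory use bounded, which matters more on the watch than parsing completeness.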
I'm not familiar with Open Interpreter. It looks cool, but from what I can see in the video it is more about controlling the machine with commands that are "interpreted" by an LLM, and we don't have that capacity on the watch. We need to write very small programs, then store them and make them launchable & usable.
It would be interesting to look inside; they seem to have some neat ideas and a lot of experience with feeding just enough context for it to work even with open-source models.
Btw, ChatGPT seems to know Bangle.js already and can write code without being fed context first.
These two actually intersect a lot, as there is no clear line between "which calls to make and in what order" and "make the computer do the thing you want". Technically they are the same problem, known as AI planning and program synthesis respectively, and the difference is probably in the grounding level of the generated code: either the AI writes the calls one by one with loops unrolled, or it creates higher-order logic that describes the program with flow-control primitives. Lifting the logic is computationally expensive, though.
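To make the "grounding level" distinction concrete, here are two programs with the same effect, written against a hypothetical recorder object standing in for the watch's drawing API (the `drawString(text, x, y)` shape is only for illustration): one with the calls unrolled, one lifted into a loop.

```javascript
// Hypothetical stand-in for a drawing API: it just records every call.
function makeRecorder() {
  const calls = [];
  return {
    calls,
    drawString(text, x, y) {
      calls.push([text, x, y]);
    },
  };
}

// Low grounding level ("planning"): every call spelled out, loop unrolled.
function unrolled(g) {
  g.drawString("item 1", 0, 0);
  g.drawString("item 2", 0, 20);
  g.drawString("item 3", 0, 40);
}

// Higher-order logic ("program synthesis"): the same effect via flow control.
function lifted(g, items) {
  items.forEach((text, i) => g.drawString(text, 0, i * 20));
}

const a = makeRecorder();
const b = makeRecorder();
unrolled(a);
lifted(b, ["item 1", "item 2", "item 3"]);
console.log(JSON.stringify(a.calls) === JSON.stringify(b.calls)); // true
```

The lifted version also covers 2 or 20 items without being regenerated; producing it is the expensive "lifting" step, since the model has to infer the pattern rather than just list the calls.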
Where this becomes really promising is that Bangle.js provides a confined and simple enough "world model" for the LLM to handle much more efficiently than the "open-world" approach of writing generic Python code:
- We have JS, which is heavily represented in every LLM training dataset
- We have the realm of "watch applications", which is a much smaller scope than any other application domain (probably the smallest consumer electronics scope ever)
- We have a significantly smaller scope of library calls and possible I/O scenarios due, basically, to the nature of the Bangle.js project
- All of these combined make the total problem space much, much smaller than the "open world" of the rhetoric from OpenAI et al., given that every step up in scope is a combinatorial explosion.
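To put a rough number on "combinatorial explosion": counting only the possible orderings of API calls, a program of L steps drawn from A distinct calls has A^L variants. The API sizes and program length below are made-up numbers to show the scaling, not measurements of Bangle.js or any other library.

```javascript
// Back-of-envelope count of straight-line programs: with `calls` distinct API
// calls and programs `length` steps long, there are calls^length possible call
// sequences (arguments, branches and loops are ignored).
function sequenceCount(calls, length) {
  return Math.pow(calls, length);
}

// Hypothetical API sizes, chosen only to show the scaling:
const watchSized = sequenceCount(100, 10); // a small, watch-sized API surface
const openWorld = sequenceCount(10000, 10); // a generic "open world" library surface

// Growing the API 100x multiplies the space by 100^10, i.e. about 1e20.
console.log(openWorld / watchSized);
```

Arguments, branches and I/O scenarios only multiply this further, which is why shrinking each dimension (language, app domain, library surface) compounds.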
I believe that if automatic programming ever happens "in a mass consumer product", it will happen here, in watch applications, first. Or at least among the first batch of breakthroughs.
I don't believe the multi-shot approach (feedback from the interpreter) is feasible in general, though it might improve accuracy by a significant number of percentage points. The problem is its cost: while a couple of attempts might be OK, running a thousand LLM run/feedback shots just to throw away all the misses is extremely expensive.
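A back-of-envelope model of that cost argument, assuming each attempt succeeds independently with the same probability p. This is a simplification: with interpreter feedback, attempts are not independent, so treat it as a rough picture only. The p values are hypothetical.

```javascript
// Chance that at least one of `attempts` independent generations works,
// when a single attempt succeeds with probability p.
function successWithin(p, attempts) {
  return 1 - Math.pow(1 - p, attempts);
}

// Expected number of paid attempts until the first success
// (mean of a geometric distribution).
function expectedAttempts(p) {
  return 1 / p;
}

// Hypothetical: a request the model gets right half the time...
console.log(successWithin(0.5, 3)); // 0.875
// ...versus one it gets right once in a thousand tries.
console.log(expectedAttempts(0.001));
```

So a few retries pay off when p is already decent, while a request with a tiny p costs on the order of 1/p runs, which is the "thousand shots" case.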
The problem of writing larger applications can be solved by fine-tuning the LLM on more code examples, and/or by training a world model with some kind of metaheuristic-guided exploration on a simulator, then mass-training on that simulator in a GAN-like approach to get a corpus of feasible working programs. That is still orders of magnitude cheaper than the open-world problem.
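Only the simplest piece of that pipeline is sketched here: a generate-and-filter loop that runs candidate programs in a simulator and keeps the ones that pass. The proposer, simulator and acceptance check are hypothetical callbacks; the metaheuristic guidance and the GAN-like training are not shown.

```javascript
// Generate-and-filter sketch: propose candidates, run each in a simulator,
// keep only those that pass a check. All three callbacks are hypothetical.
function buildCorpus(propose, simulate, accept, budget) {
  const corpus = [];
  for (let i = 0; i < budget; i++) {
    const program = propose(i);
    let result;
    try {
      result = simulate(program);
    } catch (e) {
      continue; // programs that crash in the simulator are discarded
    }
    if (accept(program, result)) corpus.push(program);
  }
  return corpus;
}

// Toy usage: "programs" are numbers, multiples of 3 crash, and only even
// squares are accepted.
const toy = buildCorpus(
  (i) => i,
  (x) => {
    if (x % 3 === 0) throw new Error("crash");
    return x * x;
  },
  (program, result) => result % 2 === 0,
  10
);
console.log(toy); // [ 2, 4, 8 ]
```

The filtered corpus is what would then be used for training; the expensive part the comment refers to is making `propose` good enough that the budget isn't spent on hopeless candidates.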
As a product development "feature" (if you'll allow me to hypothesize wildly), one could collect the feature requests from users that the LLM was not able to code in one shot, and then use those to retrain the LLM so that these "featureless features" gradually become available for one-shot implementation on user request.
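A minimal sketch of the bookkeeping side of that idea: log each request together with whether the one-shot result worked, and rank the failures by how often they were asked for. Everything here (`createRequestLog`, the entry fields) is hypothetical; how the ranked list feeds into retraining is left open.

```javascript
// Hypothetical request log for the "featureless features" idea.
function createRequestLog() {
  const entries = [];
  return {
    // Remember a user request and whether the one-shot app worked.
    record(request, workedInOneShot) {
      entries.push({ request, workedInOneShot });
    },
    // Failed requests, most frequently requested first: candidates for retraining data.
    retrainingCandidates() {
      const counts = new Map();
      for (const e of entries) {
        if (!e.workedInOneShot) counts.set(e.request, (counts.get(e.request) || 0) + 1);
      }
      return [...counts.entries()]
        .sort((x, y) => y[1] - x[1])
        .map(([request, failures]) => ({ request, failures }));
    },
  };
}

const log = createRequestLog();
log.record("show BLE mug temperature", false);
log.record("slashdot RSS list", true);
log.record("light up on wake-up alarm", false);
log.record("light up on wake-up alarm", false);
console.log(log.retrainingCandidates());
```

Ranking by frequency is one possible choice: it spends retraining effort on the features most users are actually missing.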
Physicist.