-
• #2
Could you point ChatGPT to all the available Bangle.js/Espruino documentation in the initial setup prompts?
I think I tried that a while back but without too much luck. It was kind of cool using it as an aid, but not super helpful yet. I guess mainly since it probably didn't/couldn't parse websites and I'm not sure the docs would fit within the token limit.
-
• #3
"Point to" - no, still not enough compute/context
But doing smart prompting will definitely do it.
I would approach it like this:
- user writes the description of a program
- in the "first-level" prompt there is a pre-loaded summarized list of applications, so ChatGPT selects applicable programs to use as examples
- same thing with documentation pages, select what is likely to be relevant topic-wise
- in the second turn, load the selected examples and documentation pages into context, prioritizing by relevance score
- write code based on the examples, potentially running the simulator to get useful feedback
- if the software runs in the simulator, package it as an app; if it does not, show the code to be fixed manually
Even better results will come from changing existing applications - like modifying or inventing new watch faces by editing their code.
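The "select what is likely to be relevant" step above could be sketched as a simple keyword-overlap scorer. Everything here (doc names, summaries, the scoring rule) is a made-up illustration, not a real retrieval method:

```javascript
// Made-up sketch of the "select relevant docs" step: rank documentation
// pages by keyword overlap with the user's request.
function score(query, summary) {
  var words = query.toLowerCase().split(/\s+/);
  var text = summary.toLowerCase();
  var s = 0;
  words.forEach(function (w) {
    // ignore very short words; count each query word found in the summary
    if (w.length > 2 && text.indexOf(w) >= 0) s++;
  });
  return s;
}

function topDocs(query, docs, n) {
  return docs
    .map(function (d) { return { doc: d, s: score(query, d.summary) }; })
    .sort(function (a, b) { return b.s - a.s; })
    .slice(0, n)
    .map(function (x) { return x.doc; });
}
```

The selected pages would then be pasted into the second-turn prompt, together with the highest-scoring example apps.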
-
• #4
If it gets to that point it would be very cool!
I feel like you've almost got two types of programming - you've got one where you have a task and there's some real problem-solving involved in trying to figure out how to make a computer do the thing you want.
Then you've got the other one, where the task is quite straightforward but you just have to figure out which functions to call and in what order - normally you'd spend all your time consulting API references, Google and StackOverflow.
I reckon the first one I'm not sure an AI will help much with at the moment - but it's already semi-useful at the second one, and that will end up being awesome.
I'm sure I saw something a few months ago where someone had the AI actually rigged up to Python and it was able to try things out for itself and iterate?
I guess if it were possible to actually connect the AI to your Bangle so it could run the code it created, it could iterate itself and might actually be able to come up with something working in its first answer
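That run-and-iterate idea can be sketched as a small loop: try the generated code, and if it throws, feed the error text back for the next attempt. Here `generateCode` is a hypothetical stand-in for the LLM call, and `eval` stands in for running the code on the emulator or watch:

```javascript
// Sketch of the run-and-iterate loop. generateCode receives the last
// error message as feedback and returns a new candidate program.
function iterate(generateCode, maxTries) {
  var feedback = "";
  for (var i = 0; i < maxTries; i++) {
    var code = generateCode(feedback);
    try {
      eval(code);           // run the candidate program
      return code;          // it ran without throwing: accept it
    } catch (e) {
      feedback = String(e); // pass the error text back for the next attempt
    }
  }
  return null;              // gave up after maxTries attempts
}
```

A real setup would replace `eval` with uploading to the emulator and capturing console errors, which is exactly the connect-it-to-the-Bangle idea.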
-
• #5
These two actually intersect a lot, as there is no fine line between "which calls to make and in what order" and "make the computer do the thing you want". Technically they are the same thing, called A.I. planning and program synthesis, and the difference is probably in the grounding level of the generated code: either the AI writes calls with unrolled loops, call by call, or it creates higher-order logic describing the program with flow-control primitives. Lifting the logic is computationally expensive, though.
Where this is becoming really promising is that Bangle.js provides a confined and simple enough "world model" for the LLM to be able to handle it much more efficiently than the "open-world" approach of generic Python code writing.
- We have JS, which is exhaustively represented in any LLM dataset
- We have the realm of "watch applications", which is a much smaller scope than any other application scope (probably the smallest consumer-electronics scope ever)
- We have a significantly smaller set of library calls and possible I/O scenarios due to, basically, the nature of the Bangle.js project
- All these combined make the total problem space much, much smaller than the "open world" approach of OpenAI et al., given that every step up in scope is a combinatorial explosion.
I believe that if automatic programming ever happens "in a mass consumer product", it will happen here, in watch applications, first - or at least in the first batch of breakthroughs.
I don't believe a multi-shot (feedback-from-interpreter) approach is feasible in general, though it might improve accuracy by some significant number of percentage points. The problem is cost: while trying a couple of attempts might be OK, running a thousand run/feedback shots of an LLM is extremely expensive just to throw away all the missed results.
The problem of writing larger applications could be solved by fine-tuning the LLM on more code examples, and/or by training a world model using some kind of metaheuristic-guided exploration of a simulator and then mass-training against that simulator in a GAN-like approach to build a corpus of feasible working programs. That is still orders of magnitude cheaper than the open-world problem.
As a product-development "feature" (if you allow me to wildly hypothesize), one could collect the feature requests that the LLM was not able to code in a one-shot approach, and then use those to re-train the model so that these missing features gradually become available for one-shot implementation on user request.
-
• #6
What do you think about Open Interpreter? Could it be used together with the Espruino command-line tool to automate development? (I don't have experience with the command-line tool myself)
-
• #7
I'm not familiar with Open Interpreter. It looks cool, but from what I can see in the video it is more about controlling the machine with commands that are "interpreted" by the LLM, and we don't have that capacity on the watch. We need to write very small code, then store it and make it launchable and usable.
It would be interesting to look inside; they seem to have some neat ideas and a lot of experience with feeding just enough context for it to work even on open-source models.
Btw, ChatGPT seems to know Bangle.js already and can write code without pre-feeding context.
-
• #8
Last night I let a GPT-4 assistant I set up as 'Espruino Guide' start writing the work logger app I first thought up in the App Ideas conversation (I probably could have done a better job in this setup phase, adding more documentation or something).
The first bump in the road was that it tried using an 'append' storage method that Espruino doesn't have. I just gave it the error from the console, and it switched to a write call, which worked: it created the file with the first line, but that file then could not be appended to. I told it this, but it struggled to find a correct solution. Then I asked something like 'but isn't there another way to write to storage, as described in the hardware reference?', and with that it figured out it should be using the 'open' method. With that I had a working proof of concept!
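For reference, the fix it landed on looks roughly like this: Storage.write() makes files that can't grow, while require("Storage").open(name, "a") returns a StorageFile that supports appending. The file name and line format below are made up, and appendLog only runs on Espruino, not in Node.js:

```javascript
// Build a simple CSV-style log line: timestamp,task,note
function makeLogLine(task, note) {
  return Date.now() + "," + task + "," + note + "\n";
}

// Espruino-only: open the log in append mode ("a") and add a line.
// Storage.open() returns a StorageFile, which (unlike files created
// with Storage.write()) can be appended to.
function appendLog(task, note) {
  var f = require("Storage").open("worklog", "a");
  f.write(makeLogLine(task, note));
}
```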
I've since let it implement a menu with options to add annotations to a work session, as well as creating and switching between different tasks (each writing to a separate file). It struggled with clearing and setting up input handlers when moving between parts of the UI, so I fixed that myself, mostly without prompting 'Espruino Guide'. It also needed some help with adhering to the system theme.
Pretty neat, but not entirely trivial.
Edit: Added the code, screenshots and a log with entries and annotations.
Edit2: Now saving my progress here: https://github.com/thyttan/BangleApps/blob/worklog/apps/worklog/app.js
7 Attachments
-
• #9
This is super cool!
I will see if I can enhance this process and get back
-
• #10
One thing I think I noticed: to keep the cost of API calls down, I can clear the chat and reinitialize the 'Espruino Guide', pasting the current iteration of the full code (inside code quotes) into the first user prompt, together with the next change I want made.
-
• #11
The setup (see attachment):
Instructions:
You are an expert on the Espruino platform and specifically at developing for the Bangle.js 2 watch.
You know the Hardware Reference at "https://www.espruino.com/Reference" by heart and also all relevant repositories on "https://github.com/espruino".
1 Attachment
-
• #12
So I created the RSS reader app with ChatGPT, not without issues though.
Where it failed was:
- It used an incorrect http API instead of the Gadgetbridge API, but was able to correct itself when I told it to use that.
- It did not know the Gadgetbridge API ".resp" trick with the returned "data", so I had to fix that manually.
- It completely broke attempting to write an XML parser with RegExp, apparently because Espruino's RegExp implementation is too non-standard; I ended up asking it to rewrite the parser with string slicing, and that worked.
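The string-slicing rewrite it ended up with would look something like this (a generic sketch, not the actual generated code), pulling tag contents out of the RSS XML with indexOf/substring instead of RegExp:

```javascript
// Collect the contents of every <tag>...</tag> pair in an XML string
// using only indexOf/substring, avoiding RegExp entirely.
function getTags(xml, tag) {
  var open = "<" + tag + ">";
  var close = "</" + tag + ">";
  var out = [];
  var i = 0;
  for (;;) {
    var s = xml.indexOf(open, i);
    if (s < 0) break;                        // no more opening tags
    var e = xml.indexOf(close, s);
    if (e < 0) break;                        // unclosed tag: stop
    out.push(xml.substring(s + open.length, e));
    i = e + close.length;                    // continue after the closing tag
  }
  return out;
}
```

This won't handle attributes or CDATA, but for pulling titles out of a well-formed feed it is enough.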
I believe it would have been doable in one shot if I had fed it the API nuances ahead of time, with tiny use-case examples, specifically covering the "rough edges" like unsupported RegExp features.
I can definitely create a neat "GPTs" demo case where it writes the app in one shot.
Full transcript to the point where I understood it won't fix itself: https://chat.openai.com/share/18d6df03-7af0-41da-a660-f419c1abd686
Final RSS reader app attached.
1 Attachment
-
• #13
... and also an Ember BLE mug temperature reader that I wrote with ChatGPT.
1 Attachment
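A hedged sketch of what such a reader might look like on the Bangle. The NRF.requestDevice / gatt chain is Espruino's Web-Bluetooth-style API, but the UUIDs and the temperature encoding (little-endian uint16 in hundredths of a degree) are assumptions, not the documented Ember protocol:

```javascript
// Decode a temperature reading. The encoding (little-endian uint16 in
// hundredths of a degree C) is an ASSUMPTION, not the real Ember format.
function decodeTemp(bytes) {
  return (bytes[0] | (bytes[1] << 8)) / 100;
}

// Placeholder UUIDs: NOT the real Ember service/characteristic UUIDs.
var SERVICE_UUID = "0000ffff-0000-1000-8000-00805f9b34fb";
var CHAR_UUID = "0000fffe-0000-1000-8000-00805f9b34fb";

// Espruino-only: connect to the first device named "Ember..." and read
// one temperature value via Espruino's Web-Bluetooth-style API.
function readMugTemp(onTemp) {
  NRF.requestDevice({ filters: [{ namePrefix: "Ember" }] })
    .then(function (device) { return device.gatt.connect(); })
    .then(function (gatt) { return gatt.getPrimaryService(SERVICE_UUID); })
    .then(function (service) { return service.getCharacteristic(CHAR_UUID); })
    .then(function (characteristic) { return characteristic.readValue(); })
    .then(function (dv) {
      onTemp(decodeTemp([dv.getUint8(0), dv.getUint8(1)]));
    });
}
```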
-
• #14
Maybe we could compile a list of the things it gets wrong, together with instructions/nudges toward what it should do instead? That list could then be included in the setup phase.
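That list could live as data and get appended to the setup prompt, something like this (the pitfall entries are taken from this thread; the format itself is just an illustration):

```javascript
// Known-pitfalls list, collected from mistakes observed in this thread.
var pitfalls = [
  "Storage has no 'append' method; use require('Storage').open(name, 'a') to get an appendable StorageFile.",
  "Use the Gadgetbridge API for internet access, not a generic http module, and remember the '.resp' handling of returned data.",
  "Espruino's RegExp support is non-standard; prefer string slicing for parsing."
];

// Append the numbered list to the assistant's setup instructions.
function buildSetupPrompt(base) {
  return base + "\n\nKnown pitfalls to avoid:\n" +
    pitfalls.map(function (p, i) { return (i + 1) + ". " + p; }).join("\n");
}
```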
-
• #15
Yes, that was exactly my thinking. We should also think of a compatible method to deploy and publish those apps, as regular DevOps doesn't make sense here, presumably because the app is defined by English, not code.
-
• #16
We should also think of a compatible method to deploy and publish those apps, as regular DevOps doesn't make sense here, presumably because the app is defined by English, not code.
I'm not sure I follow. Do you suggest we store the English prompt in the BangleApps repository instead of the code we get from working with the GPT model?
Isn't a good intermediate solution to make the GPT assistant as competent as possible through good initial setup prompts, and then, once working app code has been generated, store that as an app in the BangleApps repo like all the other apps?
Another question I have not thought about is copyright and licensing for apps created this way. Can they safely be added to the repo, @Gordon? (I guess this could be an argument for storing the prompts rather than the generated code, @grandrew. I just feel it's much harder to make that reliable...)
-
• #17
I suggest experimenting with a completely different UX.
Like,
- the user looks at an examples gallery (like a gallery of short screenshot videos)
- picks an idea of what he wants on his watch, e.g.
- I want this watch face but smaller font and red color
- or I want an app that looks like "notes" but lists the top items from the Slashdot RSS feed instead
- I want the watch face from app X, but also show the temperature from my BLE coffee mug at the bottom with a tiny mug icon, plus a one-tap 2-minute timer for my toaster - but show that only until 10am; afterwards show the step counter...
- ChatGPT generates the app, packages it, and pushes it to the watch
- If the user doesn't like the result, they ask for a fix, or delete the app
- The user asks to add functions to an existing app on their watch, drawing from the list of "ideas" in the gallery from point 1
It's just an idea and still needs a lot of polishing and figuring out of details, but I guess the high-level properties are something like:
- every app is custom
- apps can have any number of functions, e.g. I can ask the sleep-info app to also show today's weather forecast, and ask it to light up when the wake-up alarm triggers
- very social/storylike experience with users and chatbots discussing these apps
- the number of "apps" created is beyond listable
- no "settings" or other hard-to-use "menus" on the watch - just ask ChatGPT to change news feed from slashdot to hackernews... or add another app...
-
• #18
That sounds like a cool end point! To me it feels like a big task getting all those steps to work. Doesn't mean it can't be done of course, but I would think it makes sense to start with getting it to reliably create app code in as few shots as possible. When that's nailed down it would be really cool to look at those other things though. My two cents!
-
• #19
I agree. So there are two options:
- Option 1: do this for the "end user" - we shoot for the use case where the user follows the "few shots" and gets their custom app working, with these properties:
- code sharing is to provide more examples mostly, not as end artifact, millions of apps created & "published"
- no "settings" in apps, apps are simpler and "just do the job"
- to change the app, the user continues this few-shot approach to change whatever setting they like
- Option 2: do this for the "developer" - the few-shot approach helps the developer create and package the app for conventional usage scenarios, with these properties:
- installable packaged app is a final artifact of the flow
- apps contain familiar interfaces, settings, menus, etc.
- when developing an app feature, this becomes an "enhanced copilot"
I'm for option 1. Although this "end user" is probably still a developer-enthusiast, for me personally it is exactly the fun I'm looking for, and it shoots directly into the vast space of unknowns of what future interfaces will look like.
-
• #20
Just coming back to Open Interpreter: the point I meant to make wasn't to get Open Interpreter to run on the Bangle, but rather to let Open Interpreter run on the computer, creating, running and debugging apps in an automated fashion.
-
• #21
I experimented some today with a setup where I advise against things, like trying to use an 'append' storage method. I also tried writing a longer, somewhat detailed description of the worklogger app as the first user prompt. I get the sense it will be much harder to give it a bigger instruction and expect good results. It seemed to perform better the way I did it the first time: starting with a small and simple description and iterating in smaller steps to arrive at my vision for the worklog app.
-
• #22
As Open Interpreter is just an alternative (and weaker) version of ChatGPT, I would see using Open Interpreter as like using a Motorola 6800 when the Intel 8086 was already available...
-
• #23
It depends on the LLM you choose to use under the hood. As I understand it, you can configure it to make API calls to OpenAI's GPT-4.
I'm still processing this, but it is insane
Imagine a watch that has no functions at start.
You describe functions in English, ChatGPT writes code.
I was able to combine ChatGPT with code examples and it wrote me almost-working code to read and display the current tea temperature from my Ember smart mug over BLE. For smaller programs it would just write working code directly, potentially with no user interaction needed, today.
Here is how it works:
I believe it can be automated to the point of just asking for specific functionality and having it write the code. Maybe it will need emulator feedback to fix errors, and some additional documentation trickery, like asking it to summarize the documentation first and pasting that summary as the initial prompt so it is more aware of custom function-call patterns.
This is absolutely stunning. Wow. I have the smartwatch with all imaginable functions at once.