A comment by @fanoush and its replies.
  • The Pico code is for a Cortex M0+ and what you have in your snippet is incomplete.
    The correct code for a Cortex M4 is this (from assembled and disassembled libgcc, changed to return the result in S0 as well as R0, since the compiler apparently ignores the softfp ABI for functions in the same compilation unit):

    let c = E.compiledC(`
    // int boop(double)
    
    extern "C" float __attribute__((naked)) d2f(double) {__asm__ volatile(R"(
     	mov.w	r2, r1, lsl 1
     	subs.w	r3, r2, 0x70000000
     	itt	cs
     	subscs.w ip, r3, 0x200000
     	rsbscs	ip, ip, 0x1fc00000
     	bls.n	2f
    1:
     	and.w	ip, r1, 0x80000000
     	mov.w	r2, r0, lsl 3
     	orr.w	r0, ip, r0, lsr 29
     	cmp.w	r2, 0x80000000
     	adc.w	r0, r0, r3, lsl 2
     	it	eq
     	biceq.w	r0, r0, 1
     	b 6f
    2:
     	tst.w	r1, 0x40000000
     	bne.n	3f
     	adds.w	r2, r3, 0x2e00000
     	itt	lt
     	andlt.w	r0, r1, 0x80000000
     	blt	6f
     	orr.w	r1, r1, 0x100000
     	mov.w	r2, r2, lsr 21
     	rsb	r2, r2, 24
     	rsb	ip, r2, 32
     	lsls.w	r3, r0, ip
     	lsr.w	r0, r0, r2
     	it	ne
     	orrne.w	r0, r0, 1
     	mov.w	r3, r1, lsl 11
     	mov.w	r3, r3, lsr 11
     	lsl.w	ip, r3, ip
     	orr.w	r0, r0, ip
     	lsr.w	r3, r3, r2
     	mov.w	r3, r3, lsl 1
     	b.n	1b
    3:
     	mvns.w	r3, r2, asr 21
     	bne.n	5f
     	orrs.w	r3, r0, r1, lsl 12
     	ittt	ne
     	movne.w	r0, 0x7f000000
     	orrne.w	r0, r0, 0xc00000
     	bne	6f
    5:
     	and.w	r0, r1, 0x80000000
     	orr.w	r0, r0, 0x7f000000
     	orr.w	r0, r0, 0x800000
    6:
      vmov s0, r0
     	bx	lr
     	nop
    )");}
    
    
    int boop(double d) {
      return d2f(d) * 3;
    }
    `);
    
    print('boop:', c.boop(3.5)); // 3.5 * 3 = 10.5 cast to int = 10
    
    

    Note that you have to call the function explicitly: I couldn't use a float cast and get it to link as __aeabi_d2f. It might work if we compile the assembly into a library and then link to that. If one were to go that far, one might as well link to libgcc and let the linker throw away everything that isn't used.
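
    To sanity-check the value printed by boop above on a desktop, here is a rough model of it in plain JavaScript. It is only a sketch under assumptions: boopModel is a made-up name, Math.fround stands in for d2f, and it needs an engine that has Math.fround and Math.trunc (Node.js or a browser; not checked against the Espruino interpreter). It also ignores NaN and out-of-range inputs, where the C conversion to int is undefined anyway.

    ```javascript
    // Hypothetical desktop model of boop(): not Espruino code, just the arithmetic.
    function boopModel(d) {
      const f = Math.fround(d);           // stand-in for d2f: round the double to the nearest float
      const product = Math.fround(f * 3); // keep the multiplication result in single precision
      return Math.trunc(product);         // a C float-to-int conversion truncates toward zero
    }

    console.log('boopModel:', boopModel(3.5)); // 10, same as the comment on the print line above
    ```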

  • why do you think so?

    -  a6:	f7ff ffad 	bl	4 <__aeabi_d2fx>
    -  aa:	eef0 7a08 	vmov.f32	s15, #8	; 0x40400000  3.0
    -  ae:	ee20 0a27 	vmul.f32	s0, s0, s15
    -  b2:	eefd 7ac0 	vcvt.s32.f32	s15, s0
    -  b6:	ee17 0a90 	vmov	r0, s15
    -  ba:	bd08      	pop	{r3, pc}
    
    +  a6:	f7ff ffad 	bl	4 <__aeabi_d2f>
    +  aa:	ee07 0a10 	vmov	s14, r0
    +  ae:	eef0 7a08 	vmov.f32	s15, #8	; 0x40400000  3.0
    +  b2:	ee67 7a27 	vmul.f32	s15, s14, s15
    +  b6:	eefd 7ae7 	vcvt.s32.f32	s15, s15
    +  ba:	ee17 0a90 	vmov	r0, s15
    +  be:	bd08      	pop	{r3, pc}
    

    The top part of the diff is calling the function I pasted earlier (d2f), in the same compilation unit.
    It gets the return value directly from S0.

    The bottom part is calling libgcc's __aeabi_d2f, in a library. The difference is the vmov s14, r0: the float comes back as raw bits in the integer register r0 and has to be moved into a float register first.

    Both used the exact same compilation flags.
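
    To see what "a float value in an int register" means, the same bit-level reinterpretation can be sketched in desktop JavaScript with a DataView. The helper names bitsToFloat and floatToBits are invented for this illustration; nothing here runs on the device.

    ```javascript
    // Treat a 32-bit integer as the bit pattern of an IEEE 754 single,
    // which is what moving r0 into a float register amounts to.
    function bitsToFloat(bits) {
      const view = new DataView(new ArrayBuffer(4));
      view.setUint32(0, bits);   // write the raw 32 bits
      return view.getFloat32(0); // read the same bytes back as a float
    }

    // The reverse direction: the raw bits of a number rounded to single precision.
    function floatToBits(x) {
      const view = new DataView(new ArrayBuffer(4));
      view.setFloat32(0, x);
      return view.getUint32(0);
    }

    console.log(bitsToFloat(0x40400000));       // 3, the constant shown in the disassembly
    console.log(floatToBits(3.5).toString(16)); // 40600000
    ```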

    But anyway, why use doubles? The Cortex M4F can only do floats in hardware, isn't this enough for most stuff discussed here?

    Because JavaScript uses doubles, you need to be able to convert to float and back.
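
    A quick way to see what that conversion costs is to do the round trip in desktop JavaScript. This is a sketch, assuming an engine with Math.fround (Node.js or a browser); roundTrip is a hypothetical helper name. Values that fit in a float survive unchanged, others are rounded, very large ones become Infinity and very small ones become 0.

    ```javascript
    // double -> float -> double, reporting whether the value survived unchanged.
    function roundTrip(d) {
      const f = Math.fround(d); // nearest single-precision value, widened back to a double
      return { value: f, exact: f === d };
    }

    for (const d of [3.5, 0.1, 1e40, 1e-50]) {
      const r = roundTrip(d);
      console.log(d, '->', r.value, r.exact ? '(exact)' : '(changed)');
    }
    ```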

    and btw it is possible to set calling convention per method via attributes and when calling one from another it works https://gist.github.com/fanoush/85ebe50c5c4a54ca15bf2867e27f7cd3

    Interesting, I didn't know about those. That's probably what's going on in the diff above. I'll do some more testing.
    Edit: It seems d2f gets called with hardfp by default, and with __attribute__((pcs("aapcs"))) added it gets called with softfp (tested on the online EspruinoCompiler, not my own). Anyway, d2f can be called in either mode now, since it returns the result in both R0 and S0.
