Comment by @FManga:
  • Just in case anyone finds this interesting or would like to make suggestions:

    Before trying to implement texture mapping, I decided to try lighting support.
    For that I need to calculate each triangle's normal vector, to find out if it is facing towards the light.
    This requires a bit of math: inverseLength = 1 / sqrt(x * x + y * y + z * z)
    All of the code so far has been using 23.8 fixed-point math, and while the formula above can be implemented without floats, it'd be much slower.
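
    For comparison, an integer-only version could look roughly like the sketch below. It is not taken from the project: the names `isqrt64` and `invlength_int` are made up for illustration, and it assumes inputs with 8 fractional bits and a result with 16 fractional bits. The bit-by-bit square root loop followed by a division is the part that would probably cost more than doing the same work on an FPU.

```c
#include <stdint.h>

/* Bit-by-bit integer square root: returns floor(sqrt(n)). */
static uint32_t isqrt64(uint64_t n) {
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;   /* highest power of four in 64 bits */
    while (bit > n) bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)root;
}

/* Hypothetical integer-only inverse length.
   Assumption: x, y, z have 8 fractional bits; the result has 16. */
int32_t invlength_int(int32_t x, int32_t y, int32_t z) {
    /* Each square has 16 fractional bits, so the sum does too. */
    uint64_t sum = (uint64_t)((int64_t)x * x)
                 + (uint64_t)((int64_t)y * y)
                 + (uint64_t)((int64_t)z * z);
    uint32_t len = isqrt64(sum);        /* back to 8 fractional bits */
    if (len == 0) return INT32_MAX;     /* zero vector: saturate instead of dividing by 0 */
    /* 1.0 with 24 fractional bits divided by an 8-bit-fraction length
       leaves 16 fractional bits. */
    return (int32_t)(((uint32_t)1 << 24) / len);
}
```

    With these assumptions, `invlength_int(768, 1024, 0)` (the vector 3, 4, 0) returns 13107, which is 0.2 with 16 fractional bits. Because the length only keeps 8 fractional bits, results for very short vectors are coarse.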

    Of course, we can't use math.h in compiledC code, so this is where the fun begins:

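    int __attribute__((naked)) v_invlength(int x, int y, int z) {__asm__ volatile(
            push {lr}
            VMOV.F32 S0, r0             // S0 = x (raw bits)
            VCVT.F32.S32 S0, S0, 8      // fixed-point, 8 fractional bits -> float
            VMUL.F32 S0, S0, S0         // S0 = x * x
            VMOV.F32 S1, r1             // S1 = y (raw bits)
            VCVT.F32.S32 S1, S1, 8
            VMLA.F32 S0, S1, S1         // S0 += y * y
            VMOV.F32 S2, r2             // S2 = z (raw bits)
            VCVT.F32.S32 S2, S2, 8
            VMLA.F32 S0, S2, S2         // S0 += z * z
            VMOV.F32 S1, #1.0e+0        // S1 = 1.0
            VSQRT.F32 S0, S0            // S0 = length
            VDIV.F32 S0, S1, S0         // S0 = 1 / length
            VCVT.S32.F32 S0, S0, 16     // float -> fixed-point, 16 fractional bits
            VMOV.F32 r0, S0             // return value in r0
            pop {pc}
    );}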

    (Quotes and newlines omitted for clarity. It would've been nice if we could use raw string literals for inline assembly.)
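
    Note that the input and output formats differ: the arguments are read with 8 fractional bits, while the result is written with 16. A minimal sketch of just those two conversions in plain C, for checking values on a desktop machine. The names `q8_to_float` and `float_to_q16` are hypothetical, and the sketch assumes the float-to-fixed conversion truncates toward zero, saturates on overflow, and turns NaN into 0.

```c
#include <stdint.h>

/* Models VCVT.F32.S32 Sd, Sd, 8: signed fixed-point with 8 fractional
   bits to float. (Hypothetical helper name.) */
float q8_to_float(int32_t v) {
    return (float)v / 256.0f;
}

/* Models VCVT.S32.F32 Sd, Sd, 16: float to signed fixed-point with
   16 fractional bits. Assumed behaviour: truncate toward zero,
   saturate on overflow, NaN becomes 0. */
int32_t float_to_q16(float f) {
    if (f != f) return 0;                               /* NaN */
    float scaled = f * 65536.0f;
    if (scaled >= 2147483648.0f) return INT32_MAX;      /* too large */
    if (scaled <= -2147483648.0f) return INT32_MIN;     /* too small */
    return (int32_t)scaled;                             /* C casts truncate toward zero */
}
```

    For example, `q8_to_float(256)` is 1.0f and `float_to_q16(1.0f)` is 65536, so a unit-length vector should come back as 65536 rather than 256.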

    Unfortunately, that doesn't work either, because gcc really doesn't want to work with floats. To get around that, I assembled the function locally and replaced each instruction with its machine-code encoding, emitted as data:

    int __attribute__((naked)) v_invlength(int x, int y, int z) {__asm__ volatile(
    ".short 0xb500 \n               \n"   // push {lr}
    ".short 0xee00 \n .short 0x0a10 \n"   // vmov s0, r0
    ".short 0xeeba \n .short 0x0acc \n"   // vcvt.f32.s32 s0, s0, #8
    ".short 0xee20 \n .short 0x0a00 \n"   // vmul.f32 s0, s0, s0
    ".short 0xee00 \n .short 0x1a90 \n"   // vmov s1, r1
    ".short 0xeefa \n .short 0x0acc \n"   // vcvt.f32.s32 s1, s1, #8
    ".short 0xee00 \n .short 0x0aa0 \n"   // vmla.f32 s0, s1, s1
    ".short 0xee01 \n .short 0x2a10 \n"   // vmov s2, r2
    ".short 0xeeba \n .short 0x1acc \n"   // vcvt.f32.s32 s2, s2, #8
    ".short 0xee01 \n .short 0x0a01 \n"   // vmla.f32 s0, s2, s2
    ".short 0xeef7 \n .short 0x0a00 \n"   // vmov.f32 s1, #112 ; 0x3f800000  1.0
    ".short 0xeeb1 \n .short 0x0ac0 \n"   // vsqrt.f32 s0, s0
    ".short 0xee80 \n .short 0x0a80 \n"   // vdiv.f32 s0, s1, s0
    ".short 0xeebe \n .short 0x0ac8 \n"   // vcvt.s32.f32 s0, s0, #16
    ".short 0xee10 \n .short 0x0a10 \n"   // vmov r0, s0
    ".short 0xbd00 \n               \n"   // pop {pc}
    );}

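    Turning a disassembly listing into `.short` lines is mechanical. A minimal sketch, assuming each 32-bit encoding is copied from the listing as one value (for example 0xee000a10 for "ee00 0a10"); the names `Thumb32Halves`, `split_thumb32` and `format_thumb32` are made up for illustration. The leading halfword has to be emitted first. 16-bit instructions such as push and pop need only a single `.short`.

```c
#include <stdint.h>
#include <stdio.h>

/* The two halfwords of a 32-bit Thumb-2 instruction, in emission order. */
typedef struct {
    uint16_t first;
    uint16_t second;
} Thumb32Halves;

/* Split an encoding written as one 32-bit value into its halfwords. */
Thumb32Halves split_thumb32(uint32_t encoding) {
    Thumb32Halves h;
    h.first = (uint16_t)(encoding >> 16);
    h.second = (uint16_t)(encoding & 0xffffu);
    return h;
}

/* Write one line of inline-assembly source, with the mnemonic kept as a
   trailing comment. Returns what snprintf returns. */
int format_thumb32(char *out, size_t size, uint32_t encoding,
                   const char *mnemonic) {
    Thumb32Halves h = split_thumb32(encoding);
    return snprintf(out, size,
                    "\".short 0x%04x \\n .short 0x%04x \\n\"   // %s",
                    (unsigned)h.first, (unsigned)h.second, mnemonic);
}
```

    Calling `format_thumb32(buf, sizeof buf, 0xee000a10u, "vmov s0, r0")` fills `buf` with the same text as the second line of the listing above.
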
    The result of this abomination is attached below.

    1 Attachment

    • hyperspace.png
