Implementing an Object Index to replace the current 32 bit register array can be done in at least two ways.
First, I need to use an SRAM buffer to hold objects rather than the current 32 bit register array, and I need to replace the 16 bit index in the Data Descriptor with a 16 bit offset. This gives us a maximum data range of 64 KB per module, which is more than I need. We should probably rename the Data Descriptor to Object Descriptor?
I can now either replace the 16 bit register index with a 32 bit Data Descriptor everywhere in the instructions, or use a 16 bit Object Index and implement an Object Index table. The first option will use more flash, but I am a bit concerned about performance with the second. If the lookup table is in flash, a single instruction may need to look up multiple Data Descriptors, and reading from flash is slow. We could of course move the table to SRAM, as it is a very small table (40 bytes for 10 variables).
I think I will prefer to just use 32 bit Data Descriptors in the instruction set to keep things simple, but I need to do some math on Flash/SRAM usage and think through the full impact.
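A minimal sketch of the Object Index table option may help when doing that math. The struct layout, the names (ObjectDescriptor, obj_lookup, obj_resolve, obj_table_bytes) and the 256 byte buffer are assumptions made for illustration, not the actual VM code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Object Descriptor: 8 bit type, 8 bit byte length and a 16 bit
   offset into the module's SRAM object buffer. The field order is assumed. */
typedef struct {
    uint8_t  type;
    uint8_t  length;
    uint16_t offset;
} ObjectDescriptor;

/* Illustrative buffer size; a 16 bit offset can address up to 64 KB. */
#define OBJ_BUFFER_SIZE 256u

static uint8_t obj_buffer[OBJ_BUFFER_SIZE];

/* A 16 bit Object Index selects one entry in the descriptor table.
   Returns NULL for an index outside the table. */
const ObjectDescriptor *obj_lookup(const ObjectDescriptor *table,
                                   uint16_t count, uint16_t index)
{
    return (index < count) ? &table[index] : NULL;
}

/* Turn a descriptor into a pointer into the buffer, or NULL if the object
   would run past the end of the buffer. */
uint8_t *obj_resolve(const ObjectDescriptor *d)
{
    if (d == NULL || (uint32_t)d->offset + d->length > OBJ_BUFFER_SIZE)
        return NULL;
    return &obj_buffer[d->offset];
}

/* Memory cost of the table: 4 bytes per entry on typical compilers. */
size_t obj_table_bytes(uint16_t count)
{
    return (size_t)count * sizeof(ObjectDescriptor);
}
```

With 4 byte descriptors, obj_table_bytes(10) gives the 40 bytes mentioned above. The extra cost of this option is one table read per operand, which is why the table would want to live in SRAM.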
Obsolete 21.feb.2017 – New design.
The Call & Raise instructions are identical in design; only the op-code differs. I re-use the extended length option from Assign and add an array of 32 bit Data Descriptors to define the parameters.
Moving on, I have several unsolved design questions with the VM:
One is the details around the stack as we enter and leave a function + the details of how we return an event. I have loose ends here I need to dig into.
The second is that parameters in a call can be expressions. My initial thought was to let the assembler split this into a sequence of Assign instructions and a Call if required, but it is fully possible to embed the P-Code directly as a sub-table with a special “type”. The win is that we move back to a 1:1 mapping between assembler statements and instructions, as we can do the same for If and While. The VM actually has 6 different If instructions that we could replace with one if we embed the expression P-Code directly.
The third issue is how to handle different data types during math. If we add a uint32 to a float32 the VM needs to convert one of them, meaning the VM needs to know what types they are. Using the 32 bit Data Descriptor solves this, but it would bloat the instructions as every register reference becomes 32 bits rather than 16. The alternative I was planning was to use 8 extra bits on each register, but if I go down this path I will be using more SRAM.
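To make the type question concrete, here is a small sketch of a mixed-type add where each value carries a type tag. The type codes, the Value struct and the rule "promote to float32 if either side is a float" are assumptions for illustration; the VM may well pick a different promotion rule.

```c
#include <stdint.h>

/* Hypothetical type codes; the real values would come from the 8 bit
   data type field. */
typedef enum { VT_UINT32 = 1, VT_FLOAT32 = 2 } ValueType;

typedef struct {
    ValueType type;
    union { uint32_t u32; float f32; } v;
} Value;

/* Read a value as float32, converting from uint32 when needed. */
static float value_as_float(Value a)
{
    return (a.type == VT_FLOAT32) ? a.v.f32 : (float)a.v.u32;
}

/* Add two values. Two uint32 operands stay uint32; any other mix is
   computed and returned as float32 (assumed promotion rule). */
Value value_add(Value a, Value b)
{
    Value r;
    if (a.type == VT_UINT32 && b.type == VT_UINT32) {
        r.type = VT_UINT32;
        r.v.u32 = a.v.u32 + b.v.u32;
    } else {
        r.type = VT_FLOAT32;
        r.v.f32 = value_as_float(a) + value_as_float(b);
    }
    return r;
}
```

The point is only that the VM cannot choose the right branch without a type tag on both operands, whether that tag lives in a Data Descriptor or in 8 extra bits per register.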
Another related question is whether we should ditch the 32 bit register array and only use objects. We have the objects anyway, and having a generic 32 bit register array in addition complicates things. My original idea was to let the VM only use 32 bit values, with a bit of influence from legacy Modbus/CAN. But what if we change this to a 64K index of objects and let even simple data values be objects? I need to think this one through and calculate the SRAM impact.
As mentioned before – work in progress –
Dealing with math, it is quite handy to use a standard algebraic expression format. But to do so we need a parser that translates the expression into a sequence of simple math instructions. In this case we expect the Assembler to “compile” the expression into P-Code that we later execute as part of the VM’s Assign instruction. The parser is a C++ class library used by the Plain Assembler. The module was actually created prior to Plain as part of a different project.
The parser logic is quite simple:
- You parse the 1st sequence, consisting of start parentheses, a value, end parentheses and an operator. End of expression is also an “operator” in this sense.
- You parse the 2nd sequence.
- You compare the priorities of the 1st and 2nd operators. If the 1st operator has higher priority than the 2nd, you compute the 1st by generating an output table entry (P-Code).
- Repeat from the second step.
Example 1: 3 + 4 * 5
This is the classic example to test priority as we know that 4 * 5 must be computed before 3 + … The parser will build a tree as listed below:
0 : 3 +
1 : 4 * // + lower pri than * so continue
2 : 5 END // 'END' is lowest pri
This will generate the following P-Code:
T1 = 4 * 5
T0 = 3 + T1
In this case we could end up with an Assign instruction with 2 P-Code table entries. I say ‘could’ because the actual parser will in this case detect that the expression contains only constants and pre-calculate the value 23.
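The priority comparison can be sketched in C as below. This is not the actual C++ parser: it takes already-tokenized "value operator" sequences, ignores parentheses and constant folding, and uses "higher or equal priority" so that operators of the same priority are computed left to right. All names (Seq, PCode, pcode_compile) and the 'E' marker for END are assumptions for illustration.

```c
#include <stdint.h>

#define SEQ_MAX 16   /* illustrative limit on pending sequences */

typedef struct { int is_temp; int value; } Operand;   /* constant, or temporary T<value> */
typedef struct { Operand val; char op; } Seq;         /* one "value operator" sequence */
typedef struct { int dest; Operand lhs; char op; Operand rhs; } PCode; /* T<dest> = lhs op rhs */

/* Operator priority; 'E' (END) and anything unknown is lowest. */
static int pri(char op)
{
    switch (op) {
    case '*': case '/': return 2;
    case '+': case '-': return 1;
    default:            return 0;
    }
}

/* Compile n sequences (the last one carrying 'E') into P-Code entries in
   execution order. out must hold at least n - 1 entries.
   Returns the number of entries, or -1 if n exceeds SEQ_MAX. */
int pcode_compile(const Seq *in, int n, PCode *out)
{
    Seq pending[SEQ_MAX];
    int sp = 0, count = 0;

    if (n > SEQ_MAX)
        return -1;

    for (int i = 0; i < n; i++) {
        Seq cur = in[i];
        /* While the pending operator binds at least as tightly as the new
           one, compute it now and store the result in that slot's temporary. */
        while (sp > 0 && pri(pending[sp - 1].op) >= pri(cur.op)) {
            sp--;
            out[count].dest = sp;
            out[count].lhs  = pending[sp].val;
            out[count].op   = pending[sp].op;
            out[count].rhs  = cur.val;
            count++;
            cur.val.is_temp = 1;
            cur.val.value   = sp;
        }
        pending[sp++] = cur;
    }
    return count;
}
```

Feeding it the three sequences from Example 1 produces two entries, T1 = 4 * 5 followed by T0 = 3 + T1, matching the P-Code listed above.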
I don’t intend to describe all instructions in detail here, but I will draft the more critical ones. Assign is one of those because it contains a small micro-VM of its own dealing with mathematical expressions.
The instruction format is illustrated above. It contains a standard op-code, but we now also add a rule that if Length=0x3F we fetch the real length from the 3rd entry. This is to allow larger expression tables if they are needed. The reality is that I don’t expect to see this extended format in use, but I don’t like limitations on issues like this.
The last part of the instruction is a P-Code table adapted to our Plain VM. The Expression Parser has computed a list of mathematical operations that, if executed in sequence top-down, will provide a correct result. P1 & P2 RIX are Register IX. The P1 & P2 flags indicate whether the operand is a Register (0 0) or a TIX (0 1). TIX is Temporary Index and refers to the internal, temporary stack of the Assign. TIX=0 is the same as the resulting RIX. I will dig up some old work and describe the Expression Parser next…
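As a rough sketch of how such a table could be executed: each entry reads two operands, either from a register or from a temporary, and writes a temporary; TIX 0 ends up in the result register. The entry layout (PEntry), the destination field, the int32-only math and the function names are assumptions, since the real bit layout is only given in the illustration.

```c
#include <stdint.h>

#define TEMP_MAX 8   /* illustrative size of the Assign's temporary stack */

/* Operand source, matching the flag values named in the text. */
typedef enum { SRC_REG = 0, SRC_TIX = 1 } SrcKind;

/* Hypothetical unpacked P-Code entry: T<dest_tix> = P1 op P2. */
typedef struct {
    char     op;        /* '+', '-' or '*' in this sketch */
    uint8_t  dest_tix;
    SrcKind  p1_kind;
    uint16_t p1_ix;
    SrcKind  p2_kind;
    uint16_t p2_ix;
} PEntry;

/* Fetch one operand with bounds checking. Returns 0 on success. */
static int operand(const int32_t *regs, uint16_t reg_count, const int32_t *temps,
                   SrcKind kind, uint16_t ix, int32_t *out)
{
    if (kind == SRC_TIX) {
        if (ix >= TEMP_MAX) return -1;
        *out = temps[ix];
    } else {
        if (ix >= reg_count) return -1;
        *out = regs[ix];
    }
    return 0;
}

/* Run the table top-down and store temporary 0 in regs[result_rix].
   Returns 0 on success, -1 on a bad index or unknown operator. */
int assign_execute(int32_t *regs, uint16_t reg_count, uint16_t result_rix,
                   const PEntry *table, int count)
{
    int32_t temps[TEMP_MAX] = {0};

    if (result_rix >= reg_count)
        return -1;

    for (int i = 0; i < count; i++) {
        const PEntry *e = &table[i];
        int32_t a, b;
        if (e->dest_tix >= TEMP_MAX ||
            operand(regs, reg_count, temps, e->p1_kind, e->p1_ix, &a) != 0 ||
            operand(regs, reg_count, temps, e->p2_kind, e->p2_ix, &b) != 0)
            return -1;
        switch (e->op) {
        case '+': temps[e->dest_tix] = a + b; break;
        case '-': temps[e->dest_tix] = a - b; break;
        case '*': temps[e->dest_tix] = a * b; break;
        default:  return -1;
        }
    }
    regs[result_rix] = temps[0];   /* TIX 0 is the result register */
    return 0;
}
```

With registers R1=3, R2=4 and R3=5, the two entries T1 = R2 * R3 and T0 = R1 + T1 leave 23 in the result register.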
This is an updated illustration of the VM Core. I will not describe it in detail right now because I am in the process of making proper documentation that will be available.
This is the new 16 bit instruction format. I did use a more generic 32 bit instruction set earlier, but concluded that a specific 16 bit format was better. Op-Code is reduced to 6 bits and Length is extended to 6 bits. As Length describes the number of additional 16 bit entries, an instruction can be up to 64 x 16 bit. Only the first 16 bit entry is mandatory for all instructions.
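Decoding the first 16 bit entry could look like the sketch below. The text only fixes the field widths (6 bit op-code, 6 bit length), so the bit positions used here (op-code in the top 6 bits, length in the next 6, 4 low bits left over) and all names are assumptions.

```c
#include <stdint.h>

typedef struct {
    uint8_t opcode;   /* 6 bits */
    uint8_t length;   /* 6 bits: number of additional 16 bit entries */
    uint8_t extra;    /* remaining 4 bits, meaning not specified here */
} InsHeader;

/* Split the first 16 bit entry into its fields (assumed bit positions). */
InsHeader ins_decode(uint16_t word)
{
    InsHeader h;
    h.opcode = (uint8_t)((word >> 10) & 0x3Fu);
    h.length = (uint8_t)((word >> 4) & 0x3Fu);
    h.extra  = (uint8_t)(word & 0x0Fu);
    return h;
}

/* Build the first 16 bit entry from its fields. */
uint16_t ins_encode(uint8_t opcode, uint8_t length, uint8_t extra)
{
    return (uint16_t)(((opcode & 0x3Fu) << 10) |
                      ((length & 0x3Fu) << 4) |
                      (extra & 0x0Fu));
}

/* Total size in 16 bit entries: the mandatory first entry plus Length. */
unsigned ins_total_words(InsHeader h)
{
    return 1u + h.length;
}
```

A Length of 63 gives 1 + 63 = 64 entries, which is where the "up to 64 x 16 bit" limit comes from.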
I have also added a 32 bit Data Descriptor. This contains an 8 bit data type, an 8 bit byte length of the data and a 16 bit register index to the start of the data. This is used on Call/Raise instructions as well as on stack entries. The latter means I will not push the data itself on the stack, only Data Descriptors. This has some consequences for what happens with parameters on Call/Raise that we need to discuss later.
The 32 bit Data Descriptor is in effect a safe pointer for data. As always, this is work in progress and nothing is set in stone, as this needs to be adjusted as I wrap up the VM.
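A sketch of the descriptor packed into one 32 bit word, plus a stack that holds descriptors rather than data. The field order (type in the top byte, then length, then the register index), the stack depth and the names are assumptions for illustration.

```c
#include <stdint.h>

/* Pack type, byte length and register index into one 32 bit word. */
uint32_t dd_pack(uint8_t type, uint8_t length, uint16_t rix)
{
    return ((uint32_t)type << 24) | ((uint32_t)length << 16) | rix;
}

uint8_t  dd_type(uint32_t dd)   { return (uint8_t)(dd >> 24); }
uint8_t  dd_length(uint32_t dd) { return (uint8_t)(dd >> 16); }
uint16_t dd_rix(uint32_t dd)    { return (uint16_t)dd; }

#define DD_STACK_MAX 16   /* illustrative stack depth */

typedef struct {
    uint32_t entry[DD_STACK_MAX];
    int      top;              /* number of entries in use */
} DDStack;

/* Push a descriptor; the data it refers to stays where it is. */
int dd_push(DDStack *s, uint32_t dd)
{
    if (s->top >= DD_STACK_MAX) return -1;
    s->entry[s->top++] = dd;
    return 0;
}

/* Pop the most recent descriptor. Returns -1 if the stack is empty. */
int dd_pop(DDStack *s, uint32_t *dd)
{
    if (s->top <= 0) return -1;
    *dd = s->entry[--s->top];
    return 0;
}
```

Because a descriptor carries both the type and the byte length, the receiver can check both before touching the data, which is what makes it behave like a safe pointer.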
The RTL (Real-Time Linker) part of the VM is a module that receives code as it is being downloaded and performs the last step of linking to create an executable binary. The way I do this is actually quite simple. I let the assembler create a special RTL format file that is sent one instruction at a time. The RTL instructions are simple commands as follows:
- Verify firmware name and version
- Verify C module
- Verify Plain Module
- Add Instruction
Each instruction contains a list of components as follows:
- Lookup user opcode
- Insert binary content “as is”.
- Lookup function IX
At the end we are left with a binary instruction array that can be saved into a VM and started. I might add other bits, but I also want to keep this simple with a small footprint.
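The Add Instruction step can be sketched as a loop over the components, where each component either resolves a name or copies a word "as is". The component and symbol structs, the one-word-per-component rule and the function names are assumptions for illustration; the real RTL file format is not described here.

```c
#include <stdint.h>
#include <string.h>

typedef enum { RTL_USER_OPCODE, RTL_RAW, RTL_FUNCTION_IX } RtlKind;

/* One component of an Add Instruction command (hypothetical layout). */
typedef struct {
    RtlKind     kind;
    const char *name;   /* used by the two lookup kinds */
    uint16_t    raw;    /* used by RTL_RAW */
} RtlComponent;

typedef struct { const char *name; uint16_t value; } RtlSymbol;

/* Look up a name in a symbol table. Returns 0 and sets *out if found. */
static int rtl_find(const RtlSymbol *table, int count, const char *name, uint16_t *out)
{
    for (int i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *out = table[i].value;
            return 0;
        }
    }
    return -1;
}

/* Append one linked instruction to the binary instruction array.
   *length is only advanced if every component could be resolved. */
int rtl_add_instruction(const RtlComponent *parts, int nparts,
                        const RtlSymbol *opcodes, int nopcodes,
                        const RtlSymbol *functions, int nfunctions,
                        uint16_t *code, int capacity, int *length)
{
    int n = *length;
    for (int i = 0; i < nparts; i++) {
        uint16_t word = parts[i].raw;
        if (n >= capacity)
            return -1;
        if (parts[i].kind == RTL_USER_OPCODE &&
            rtl_find(opcodes, nopcodes, parts[i].name, &word) != 0)
            return -1;
        if (parts[i].kind == RTL_FUNCTION_IX &&
            rtl_find(functions, nfunctions, parts[i].name, &word) != 0)
            return -1;
        code[n++] = word;
    }
    *length = n;
    return 0;
}
```

An unresolved name makes the whole instruction fail, which gives the device a natural point to reject a download that was built against a different firmware or module set.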
Interfacing C modules to the VM is done by letting the C modules add their interface to a small repository. We specify the name, the address of the callback function and the parameters. When the RTL looks up the ix for a function, we use the last part of the instruction array as a virtual index. Any attempt to call a function in virtual space will cause the VM to look for the IX in the C Interface repository and call that C function.
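A sketch of the repository and the virtual index check. The boundary VM_CODE_WORDS, the callback signature, the entry layout and the example function c_add are assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical callback signature: int32 arguments in, int32 result out. */
typedef int32_t (*CFunction)(const int32_t *args, uint8_t argc);

typedef struct {
    const char *name;
    CFunction   callback;
    uint8_t     argc;       /* number of parameters the function expects */
} CInterfaceEntry;

/* Assumed boundary: indexes at or above this value are virtual. */
#define VM_CODE_WORDS 1024u

/* Example C module function registered in the repository. */
int32_t c_add(const int32_t *args, uint8_t argc)
{
    int32_t sum = 0;
    for (uint8_t i = 0; i < argc; i++)
        sum += args[i];
    return sum;
}

/* Returns 1 if ix is an ordinary instruction index (the VM handles it),
   0 if a C function was called, -1 if the virtual ix or argc is invalid. */
int vm_call_virtual(const CInterfaceEntry *repo, uint16_t repo_count, uint16_t ix,
                    const int32_t *args, uint8_t argc, int32_t *result)
{
    uint16_t slot;
    if (ix < VM_CODE_WORDS)
        return 1;
    slot = (uint16_t)(ix - VM_CODE_WORDS);
    if (slot >= repo_count || repo[slot].argc != argc)
        return -1;
    *result = repo[slot].callback(args, argc);
    return 0;
}
```

The VM only needs one comparison per call to decide between a Plain function and a C function, so the cost of the mechanism is small.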
What is still a bit open in the design is easyIPC objects. Any attempt to read/write virtual registers will call C functions that do the job. But the actual read/write will need to happen on copied variables, and we need a trigger mechanism to actually transfer a collection of variables known as an entity. The issue here is that we want to make sure C functions read/write a consistent set of variables. I will return to the details here.
My VM basically has a 16 bit array of instructions and starts by decoding and executing the one at index 0. We run a controlled loop executing one instruction at a time. A sample C code snippet is included below. Each instruction does its job and sets the next ix to be used. Once the instruction is executed we return to the OS, which will execute system tasks (if requested) before we continue with the next instruction.
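void vm_Execute_Instruction(VM *vm, uint32_t *ix)
{
    vm_Decode_Instruction(vm, ix, &Ins);            /* decode the instruction at *ix into Ins */
    /* ... dispatch on the op-code; each handler sets the next ix ... */
    vm_Error(vm, ix, VM_ERR_UNKNOWN_INSTRUCTION);   /* reached only for an unknown op-code */
}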
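A fuller, self-contained sketch of the decode and dispatch step is shown below. It re-uses the function names from the snippet, but the VM and Instruction structs, the two op-codes, the bit layout, the extra error code and the "halt on error" behaviour are all assumptions for illustration.

```c
#include <stdint.h>

enum { OP_NOP = 0, OP_JUMP = 1 };                       /* hypothetical op-codes */
enum { VM_OK = 0, VM_ERR_UNKNOWN_INSTRUCTION = 1, VM_ERR_BAD_OPERAND = 2 };

typedef struct { const uint16_t *code; uint32_t count; int error; } VM;
typedef struct { uint8_t opcode; uint8_t length; } Instruction;

/* Decode the first 16 bit entry (assumed layout: op-code in the top 6 bits,
   length in the next 6). */
static void vm_Decode_Instruction(const VM *vm, const uint32_t *ix, Instruction *ins)
{
    uint16_t w = vm->code[*ix];
    ins->opcode = (uint8_t)((w >> 10) & 0x3Fu);
    ins->length = (uint8_t)((w >> 4) & 0x3Fu);
}

/* Record the error and stop the VM by moving ix past the end of the code. */
static void vm_Error(VM *vm, uint32_t *ix, int err)
{
    vm->error = err;
    *ix = vm->count;
}

/* Execute exactly one instruction and set *ix to the next one. */
void vm_Execute_Instruction(VM *vm, uint32_t *ix)
{
    Instruction ins;

    if (*ix >= vm->count)
        return;                                  /* nothing left to run */

    vm_Decode_Instruction(vm, ix, &ins);
    switch (ins.opcode) {
    case OP_NOP:
        *ix += 1u + ins.length;                  /* skip header and extra entries */
        break;
    case OP_JUMP:
        if (ins.length < 1u || *ix + 1u >= vm->count)
            vm_Error(vm, ix, VM_ERR_BAD_OPERAND);
        else
            *ix = vm->code[*ix + 1u];            /* target is the first extra entry */
        break;
    default:
        vm_Error(vm, ix, VM_ERR_UNKNOWN_INSTRUCTION);
        break;
    }
}
```

The caller owns ix and calls vm_Execute_Instruction once per cycle, so control returns to the OS between every two VM instructions.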
In the case of the 32 x Servo/IO controller we will be executing a hard real-time bit-banger as a system task. The RTOS gives me µs accuracy, so I can schedule a bit-bang every 0.1 ms with no problem. This gives me a 10,000 Hz update rate on all IO. In this case we do this as a priority in the main loop and only execute the VM in idle time.
Idle time means we have a defined time for a full cycle and will only be executing the VM if the cycle of system tasks is shorter than this. The way this will work is that on each cycle we execute exactly one instruction. As we control the speed of everything else, we basically let the VM run as fast as possible. We will be running on a 72 MHz RISC processor, so I hope for an average speed of 10,000++ VM instructions per second.
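The idle-time rule and the resulting instruction rate can be written down as two small helpers. The function names and the use of a free-running microsecond counter are assumptions; the real RTOS API is not shown here.

```c
#include <stdint.h>

/* Returns 1 if the VM may execute its one instruction in this cycle, i.e.
   the system tasks finished before the cycle time ran out. start_us and
   now_us come from a free-running microsecond counter; the unsigned
   subtraction stays correct when the counter wraps around. */
int vm_may_run(uint32_t start_us, uint32_t now_us, uint32_t cycle_us)
{
    return (uint32_t)(now_us - start_us) < cycle_us;
}

/* Upper bound on VM instructions per second at one instruction per cycle. */
uint32_t vm_max_rate(uint32_t cycle_us)
{
    return cycle_us ? 1000000u / cycle_us : 0u;
}
```

With a 0.1 ms (100 µs) cycle, vm_max_rate(100) gives 10,000 instructions per second, reached only if every cycle has idle time left for the VM.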
Keep in mind that the VM will be idle most of the time as the logic only responds to events, so I don’t need speed alone, I need a responsive system. Obviously we can code with forever loops if we need to, but the intention is that you respond to an event, process a bit of logic and go idle. If we need performance we create a module in C and control it from the VM. I am looking forward to testing this to see what performance we actually get.
I used to work with sound back in the day, and looking into STM32 capabilities I actually wonder if I should attempt to create the basis for a DIY sound synthesizer. The idea is to use a Sound IO Hat with 1-2 channels, add an STM32F405 (or similar) and stack the Hats on a Raspberry PI 3 to create a multi-channel synthesizer & mixer.
Raspberry PI 3 delivers quite awesome CPU power itself, but an M4 should in theory be capable of adding sound effect processing in modules, as it contains DSP-like instructions and capabilities.
The electronics in this case will be an MCU + analogue input and output amplifiers. STM32 contains ADC/DAC with 12 bit resolution, but I would in this case look for components with 16 to 32 bit resolution and target a sampling rate of up to 64 kHz (I think).
I need to return to assembly syntax later. For now I want to implement what we have described on the VM to start testing it using real applications. One of the design issues that I need to solve is storage space for downloaded applications.
The STM32F10x series of MCUs organize their embedded Flash as 1 or 2 KB pages. This was actually far better than I expected. With 128 KB on the STM32F105RB we should be able to set aside 16 KB for Plain VM applications. Flash can only be re-written ca 100,000 times, but that is a lot for application download. If you download an application every day, we can keep doing that for ca 270 years or so.
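The arithmetic behind those figures is simple enough to write down. The function names are made up for this sketch, and the endurance figure is just the number quoted above passed in as a parameter.

```c
#include <stdint.h>

/* Number of flash pages needed for a region of the given size. */
uint32_t flash_pages(uint32_t region_bytes, uint32_t page_bytes)
{
    return page_bytes ? region_bytes / page_bytes : 0u;
}

/* Whole years of one re-write per day before the endurance limit is hit. */
uint32_t years_of_daily_writes(uint32_t erase_cycles)
{
    return erase_cycles / 365u;
}
```

A 16 KB application area is 8 pages of 2 KB or 16 pages of 1 KB, and 100,000 daily re-writes last 100,000 / 365, which is 273 whole years.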
A bit more complex is the implementation of a persistent storage object mapped to object registers. This is SRAM that we write to flash on power shut-down and reload on MCU start. I need to check if this is possible on my devices. The MCU will give us an interrupt as power drops and with sufficient capacitors we should be able to store a single page or so.
The actual download of VM Applications is done using standard easyIPC. I will set up a few special ID tags for this purpose. The high level part of this protocol allows for transport of tagged data: a 2 byte ID followed by an object.
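A tagged item can be sketched as a 2 byte ID followed by the raw object bytes. The byte order (low byte first), the assumption that the surrounding frame carries the total length, and the function names are illustrative guesses, not the easyIPC specification.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 2 byte ID followed by the object bytes into out.
   Returns the number of bytes written, or 0 if out is too small. */
size_t tag_encode(uint16_t id, const void *obj, size_t len, uint8_t *out, size_t cap)
{
    if (len > cap || cap - len < 2u)
        return 0;
    out[0] = (uint8_t)(id & 0xFFu);      /* assumed byte order: low byte first */
    out[1] = (uint8_t)(id >> 8);
    if (len)
        memcpy(out + 2, obj, len);
    return len + 2u;
}

/* Split a received item of n bytes into its ID and object bytes.
   Returns 0 on success, -1 if the item is too short to hold an ID. */
int tag_decode(const uint8_t *in, size_t n, uint16_t *id,
               const uint8_t **obj, size_t *len)
{
    if (n < 2u)
        return -1;
    *id  = (uint16_t)(in[0] | ((uint16_t)in[1] << 8));
    *obj = in + 2;
    *len = n - 2u;
    return 0;
}
```

Reserved IDs for download would then simply be a range of id values that the device routes to the VM loader instead of to an application object.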
I need to consolidate the notes on the high layer protocol, but the idea is that it is an array of objects, each with a PID and a data value. Each device will at initialization send a list of its objects to provide full plug & play. But we will also reserve a range of PIDs for common things like download. Done correctly, we can download new applications while the old one is in full operation and smoothly swap over modules without interrupting applications.
I need to think about the details. To be honest, I need to re-visit my old notes about this design and usage. But I am thinking of a sequence as follows:
- VM Download Initiate Request
- VM Download Proceed
- VM Download Code
- VM Download Commit/Rollback
- VM Download Completed
Top-side initiates the download with a VM Download Initiate Request, which causes the device to send a VM Download Proceed, followed by a series of VM Download Code messages from top-side. The device can control re-sends with a new VM Download Proceed, while top-side will finish with a VM Download Commit/Rollback. At the end the device should send a VM Download Completed to indicate whether the download was completed or aborted.
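The device side of this sequence can be sketched as a two-state machine. The message and state names, the block counter and the simplification that re-send requests are left out are all assumptions for illustration.

```c
typedef enum {
    MSG_NONE, MSG_INITIATE, MSG_PROCEED, MSG_CODE,
    MSG_COMMIT, MSG_ROLLBACK, MSG_COMPLETED
} DlMsg;

typedef enum { DL_IDLE, DL_RECEIVING } DlState;

typedef struct {
    DlState  state;
    unsigned blocks;      /* VM Download Code messages received so far */
    int      committed;   /* 1 if the last download ended with Commit */
} Download;

/* Handle one message from top-side and return the device's reply
   (MSG_NONE when no reply is sent). Messages that do not fit the
   current state are ignored. */
DlMsg dl_device_handle(Download *d, DlMsg in)
{
    switch (d->state) {
    case DL_IDLE:
        if (in == MSG_INITIATE) {
            d->state = DL_RECEIVING;
            d->blocks = 0;
            d->committed = 0;
            return MSG_PROCEED;
        }
        break;
    case DL_RECEIVING:
        if (in == MSG_CODE) {
            d->blocks++;
            return MSG_NONE;
        }
        if (in == MSG_COMMIT || in == MSG_ROLLBACK) {
            d->committed = (in == MSG_COMMIT);
            d->state = DL_IDLE;
            return MSG_COMPLETED;
        }
        break;
    }
    return MSG_NONE;
}
```

Keeping the old application running until Commit arrives is what would allow a rollback to leave the device exactly as it was.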
The concept shown on the Lab PSU earlier is that I can create a battery module as illustrated above. The MCU is simply used to monitor output voltage and current towards preset values that can be controlled by a robot control system if required.
We can also add a charger so we can just connect an external PSU to re-charge the battery. For now I will focus on the PSU module and see if I can get the concept tested.
Returning to the Lab PSU, we use the same module as the core, but we could also add a programmable, analogue regulator. The latter will reduce noise and make the PSU usable in audio applications. The idea is that the switched PSU regulates the majority of the power because it is more efficient, but we add an analogue regulator on the last 1V or so as a “super filter”, as it can deal with ripple and noise better than a switched PSU. As we only regulate the last volt or so, the classic power loss will be limited. We can also make this an optional step for high-current applications (motors etc).