PScript Interpreter – Part 5 – Footprint

I am currently at ca. 13 KB Flash usage and 1 KB SRAM usage. SRAM usage can be adjusted, but Flash usage is fixed and will increase as I define more built-in functionality. This is on a CISC, so I am not 100% sure how that converts to a RISC MCU, but it looks decently good. I notice that expressions bloat a bit, so I might need to make a trade-off and simplify expressions or investigate smarter ways of coding. A typical trick to save Flash is to move logic to tables. As an example, I have a lot of strncmp() calls in if statements that can be optimized for a lower footprint. But it is not exactly critical yet. My objective is below 20 KB. I am aware that PBASIC was implemented in 8 KB back in 1978 using assembly, but this is not BASIC. PScript is actually a full, proper programming language despite the “script” part of the name.
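To illustrate the table trick, here is a minimal sketch of a keyword lookup where one constant table replaces a chain of strncmp() tests. The Keyword enum, the table contents and the name lookupKeyword are assumptions made for this example, not the actual PScript source.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical token ids; the real interpreter's names are not shown here.
enum class Keyword { None, Func, End, For, To, Uint32 };

struct KeywordEntry {
    const char* text;
    Keyword     id;
};

// One constant table instead of a chain of if (strncmp(...) == 0) tests.
// Being const, a typical ARM toolchain can keep it in Flash.
static const KeywordEntry kKeywords[] = {
    {"func",   Keyword::Func},
    {"end",    Keyword::End},
    {"for",    Keyword::For},
    {"to",     Keyword::To},
    {"uint32", Keyword::Uint32},
};

// Look up a word of the given length.
// Returns Keyword::None if the word is not in the table.
Keyword lookupKeyword(const char* word, std::size_t len) {
    for (const KeywordEntry& e : kKeywords) {
        if (std::strlen(e.text) == len && std::strncmp(word, e.text, len) == 0) {
            return e.id;
        }
    }
    return Keyword::None;
}
```

Adding a keyword then costs one table row rather than another compare-and-branch sequence, which is where the Flash saving would come from.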

One option is to extend the Interpreter later to support full Plain syntax. And to be honest – it is not that far from having an Interpreter to building a compiler. The difference is that in an Interpreter I execute as I parse, while in a compiler I need to output the stack as byte code. Let’s see. I did not intend to let this grow into a full Plain, but I see the option. I will need far more than 20 KB for full Plain syntax, however.

I am also coding in C++, and my experience with GCC on STM32 is that I quickly reach 64 KB. I have suffered Flash starvation on 32 KB and 16 KB devices and been forced to switch to C. I have never investigated the reasons, as I believe this can be sorted with some GCC tricks – I have a friend who is more specialized in this that I can ask. But I think a minimum requirement of 128 KB Flash is OK these days.

One option that I consider is for Plain to skip the C/C++ stage and compile straight to assembly. It is years since I even touched assembly, and there is a huge difference between generating C/C++ code and generating assembly. The latter will require that I generate for every single MCU and handle the differences. But it is possible. The advantage will be a smaller footprint, as every automation step adds bloat, but the advantages of using C/C++ as a middle step are so many that I doubt it will be worth it. Let’s finish PScript and see where we go, because this is just an experimental break from BSA at the moment.

A bit back to PScript footprint – I think 20 KB is OK if I can maintain that, and it actually looks good – I think I have ca. 75% of the code in place and the full Interpreter with comments will be < 2000 lines of code. It will be a nice component to add to a CLI.

PScript Interpreter – Part 4 – Operators

As with any programming language we need to define a list of operators and how they behave. After years of C/C++ programming I simply copy the list from that language as a start. This first list shows mathematical operators.

Operator   Description
+          Addition: a+b
-          Subtraction: a-b
/          Division: a/b
*          Multiplication: a*b
+a         Positive signed variable/number
-a         Negative signed variable/number
%          Modulo: a%b
--         Decrement by 1: --a or a--
++         Increment by 1: ++a or a++
~          Binary invert: ~a
|          Binary OR
&          Binary AND
^          Binary XOR
>>         Bit shift right
<<         Bit shift left

This next list shows boolean operators that are only valid for boolean expressions.

Operator   Description
==         Equal
!=         Not equal
>          Greater than
<          Less than
>=         Greater than or equal
<=         Less than or equal
!          NOT
||         Logical OR
&&         Logical AND

The trick with this is that I might have 3 operators related to a variable: a pre operator like a sign, ++ or --, a post operator like ++ or --, and a main operator. This needs to be handled by my expression parser.
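A rough sketch of how one expression component with its three operators could be read from text. The Component struct and readComponent are hypothetical names for this illustration, assignment is left out, and the longest-match rule for ++ and -- is borrowed from C as an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical layout: one expression component and its three optional operators.
struct Component {
    std::string pre;      // "", "+", "-", "++", "--", "~" or "!"
    std::string operand;  // identifier or number
    std::string post;     // "", "++" or "--"
    std::string op;       // main operator, "" at end of line
};

static void skipSpaces(const std::string& s, std::size_t& i) {
    while (i < s.size() && s[i] == ' ') ++i;
}

// Read one component starting at position i and advance i past it.
Component readComponent(const std::string& s, std::size_t& i) {
    Component c;
    skipSpaces(s, i);

    // Pre operator: try the two-character forms first (longest match).
    if (s.compare(i, 2, "++") == 0 || s.compare(i, 2, "--") == 0) {
        c.pre = s.substr(i, 2);
        i += 2;
    } else if (i < s.size() && std::string("+-~!").find(s[i]) != std::string::npos) {
        c.pre = s.substr(i, 1);
        i += 1;
    }

    // Operand: letters, digits and underscore.
    while (i < s.size() &&
           (std::isalnum(static_cast<unsigned char>(s[i])) || s[i] == '_')) {
        c.operand += s[i++];
    }

    // Post operator must follow the operand directly.
    if (s.compare(i, 2, "++") == 0 || s.compare(i, 2, "--") == 0) {
        c.post = s.substr(i, 2);
        i += 2;
    }
    skipSpaces(s, i);

    // Main operator: two-character operators before single-character ones.
    static const char* const kTwoChar[] = {"==", "!=", ">=", "<=", "||", "&&", ">>", "<<"};
    for (const char* t : kTwoChar) {
        if (s.compare(i, 2, t) == 0) {
            c.op = t;
            i += 2;
            return c;
        }
    }
    if (i < s.size() && std::string("+-*/%|&^<>").find(s[i]) != std::string::npos) {
        c.op = s.substr(i, 1);
        i += 1;
    }
    return c;
}
```

Reading "-a++ * b" this way gives a first component with pre "-", operand "a", post "++" and main operator "*", followed by a second component holding only the operand "b".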

PScript Interpreter – Part 3 – Built-in Content

PScript would only be able to perform math unless we integrate it with the target platform. In the drawing above I illustrate (1) the built-in function “print” and (2) a variable “a”.

Both are declared as a C/C++ struct in a table that points to a function with a specific format that accepts a PScript stack as input. These structs are declared in a separate Flash table in C/C++ so that we don’t use SRAM.

Print is a function, so we simply parse the call by executing the expressions and putting the results on the stack, as print accepts an endless number of parameters. Each stack entry has a data type, so in the C function we iterate and convert each parameter to text while printing them out.

The variable a is similar – I call a setget function to either set or get the variable. How this is done is up to C code as it might not be an actual variable at all.

The usage of Flash here is important because I usually have limited SRAM, but plenty of Flash. And as function calls create a stack that is released afterwards, I only need to account for the maximum stack depth in SRAM.

To declare names I use static strings which also are on Flash.

To declare parameters I also use a string. ‘.’ means endless list of parameters. ‘*’ means a parameter of any type while ‘A’ to ‘Z’ are specific data types. This makes it easy to define the functions.
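As a sketch of what such a table could look like, the block below declares two built-ins with a name string, a parameter string and a function pointer, plus a check of an argument list against the parameter string. The Entry layout, the type code 'U' for uint32, the “delay” entry and all function names are assumptions for illustration only.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical stack entry: all we know is that each entry carries a data type.
struct Entry {
    char type;   // 'A'..'Z' type code; 'U' is assumed here to mean uint32
    long value;
};

// Assumed handler format: receives the arguments that were pushed on the stack.
using BuiltinFn = int (*)(const Entry* args, std::size_t count);

struct Builtin {
    const char* name;    // static string
    const char* params;  // '.' endless list, '*' any type, 'A'..'Z' specific type
    BuiltinFn   fn;
};

// Check an argument list against a parameter string.
bool paramsMatch(const char* params, const Entry* args, std::size_t count) {
    std::size_t i = 0;
    for (; *params != '\0'; ++params) {
        if (*params == '.') return true;   // anything may follow
        if (i >= count) return false;      // too few arguments
        if (*params != '*' && *params != args[i].type) return false;
        ++i;
    }
    return i == count;                     // no surplus arguments
}

// Iterate over the arguments and print each one; returns the number printed.
static int builtinPrint(const Entry* args, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) std::printf("%ld ", args[i].value);
    std::printf("\n");
    return static_cast<int>(count);
}

// Placeholder for a second, purely hypothetical built-in.
static int builtinDelay(const Entry*, std::size_t) { return 0; }

// const table: on STM32 with GCC this normally ends up in Flash (.rodata).
static const Builtin kBuiltins[] = {
    {"print", ".", builtinPrint},
    {"delay", "U", builtinDelay},
};

// Find a built-in by name, or nullptr if it does not exist.
const Builtin* findBuiltin(const char* name) {
    for (const Builtin& b : kBuiltins) {
        if (std::strcmp(b.name, name) == 0) return &b;
    }
    return nullptr;
}
```

With this layout, adding a built-in is one more row in the table, and the parameter string is the only thing the interpreter needs in order to validate a call before invoking the handler.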

PScript Interpreter – Part 2 – Expressions

I must admit that PScript is a nice break from the larger BSA task. Sadly BSA is written in C# and PScript in C++, because PScript is perfect for extending BSA. But I can actually use C++ from C#, so let’s see.

I have created a stack that I use while I interpret and I am very happy with how that worked out, but I have so far only interpreted the core language and I need expressions. I have done several expression parsers both interpreters and parsers in the past, so what I want goes hand in hand with the existing stack. Let’s annotate an example:

uint32 d
d = 3 + 4 * 5
print(d)

This is a classic mathematical expression test that should print out 23. Let’s interpret this step by step:

  1. “uint32 d” will set up an entry on the stack for the variable d.
  2. Next line “d” tells me that this is a variable and as such an assign operation.
  3. “=” confirms that this is an assign operation.
  4. “3+” gets added on the stack.
  5. “4*” gets added on the stack. As we now have two or more expression components we evaluate if we should calculate, and since * has higher priority than + we just continue.
  6. “5 eol” is added to the stack.
  7. As eol (end of line) is the lowest priority we now calculate “4 * 5” and replace the two last stack entries with “20 eol”.
  8. We now evaluate the two remaining expression entries and calculate “3 + 20”, replacing the two last entries with “23 eol”.
  9. With only one entry left, nothing more to parse and “eol” tagged, we are finished. The result is 23, so we assign 23 to the variable d.
  10. “print (d)” is parsed. A call to print and d is pushed on the stack and 23 gets printed out.
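The reduction in steps 4 to 9 can be sketched as below: each stack entry holds a value and the operator that follows it, and the two top entries are combined whenever the older operator has at least the priority of the newer one. This is a simplified illustration that only handles unsigned integer constants and + - * /; the names ExprEntry and evaluate are made up for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

struct ExprEntry {
    long value;
    char op;   // operator that follows the value; '\0' marks eol
};

static int priority(char op) {
    switch (op) {
        case '*': case '/': return 2;
        case '+': case '-': return 1;
        default:            return 0;   // eol, or anything unknown
    }
}

static long apply(long a, char op, long b) {
    switch (op) {
        case '+': return a + b;
        case '-': return a - b;
        case '*': return a * b;
        default:  return b != 0 ? a / b : 0;   // '/'; divide by zero gives 0 in this sketch
    }
}

// Evaluate an expression of integer constants.
// Returns false on malformed input or if more than 10 entries would be needed.
bool evaluate(const std::string& text, long& result) {
    constexpr std::size_t kMaxEntries = 10;
    ExprEntry stack[kMaxEntries];
    std::size_t depth = 0;
    std::size_t i = 0;

    for (;;) {
        while (i < text.size() && text[i] == ' ') ++i;
        if (i >= text.size() || !std::isdigit(static_cast<unsigned char>(text[i]))) {
            return false;                       // a number was expected
        }
        long value = 0;
        while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
            value = value * 10 + (text[i++] - '0');
        }
        while (i < text.size() && text[i] == ' ') ++i;

        char op = '\0';                         // eol unless an operator follows
        if (i < text.size()) {
            op = text[i++];
            if (priority(op) == 0) return false;   // unknown operator
        }
        if (depth == kMaxEntries) return false;
        stack[depth++] = {value, op};

        // Combine the two top entries while the older operator binds at least as tightly.
        while (depth >= 2 && priority(stack[depth - 2].op) >= priority(stack[depth - 1].op)) {
            ExprEntry& left = stack[depth - 2];
            const ExprEntry& right = stack[depth - 1];
            left.value = apply(left.value, left.op, right.value);
            left.op = right.op;
            --depth;
        }
        if (op == '\0') break;                  // eol reached, one entry remains
    }
    result = stack[0].value;
    return true;
}
```

Calling evaluate("3 + 4 * 5", r) leaves 23 in r, following the same sequence of pushes and reductions as the list above.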

I have only used a very simple example, as we also need to support parentheses and variables in addition to constants. We also need to support functions that return values, but it is all pretty much straightforward.

Once expressions and return functions are supported we pretty much have our own math script where we can combine math expressions and logic for more complex calculations. My experience with expressions tells me that 10 stack entries are a very complex expression. This is no limit, but it is an estimate that we need ca 10 entries (120 bytes) in spare while interpreting an expression. After an expression is interpreted we release the stack.

 

PScript Interpreter – Part 1 – Stack

I surprised myself with how easy it was to implement the PScript Interpreter and would like to share the basic technique that I am using. The language is still under development, but the idea is to have a minimalistic Interpreter that can be used embedded, so I need a very small footprint for the core language. Current tests indicate < 20 KB Flash and 1020 bytes SRAM as a minimum. In addition to that we obviously always have the text – the actual source code that is stored. But as I target small scripts we can actually store a lot in 1-2 KB of text. And some of the MCUs have quite a lot of Flash/SRAM available. Let’s annotate an example:

func test(uint32 v)
    print(v)
end
uint32 b
test(14)
for b=1 to 10
   test(b)
end

The example above is one of my small test scripts, and to illustrate the inside of the Interpreter I want to annotate stack usage and design. Each statement is a struct with the variables and pointers that I need. So as I parse the statements I put these entries on a stack that basically is a 1 KB buffer, as follows:

As I interpret “func” I add a func entry on the stack. Func body is parsed without execution.

Next I interpret “uint32 b” and insert the variable b on the stack.

Next I interpret “test(14)” and a call to function test is put on the stack. As this is outside a func we execute the function “test” with 14 as parameter.

What happens now is stack magic. I parse func test again and this time I add v as a parameter on the stack before I assign v the value of 14. v is now a variable with local scope inside the function.

Next I execute test by parsing and executing its body. The next statement is print and I do the same trick of calling print with the value 14. At this point we should get “14” printed out. Stack entry for print and parameter is removed after execution.

The next statement is “end”, and as the previous stack entry now is the call to test and its parameters, I remove these from the stack and continue on the next line after the call to test. I am now only left with func and var b on the stack again.

Next I interpret the “for” statement and put that on the stack. For is now executed so b is assigned the value 1.

Next I interpret “test(b)” and set up a call to test using the current value of b as parameter. We now execute function “test” as illustrated before and will get the value 1 printed out.

Finally I reach “end” and with “for” being my previous entry I increment and test b with +1 until I reach the value 10. With b=10 I delete the for entry on the stack and continue. As we now are at the end of the script we are finished.
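A minimal sketch of the stack idea walked through above: entries are pushed into a fixed buffer, and “end” releases everything back to the matching call or for entry. The entry layout, the 6-character name limit and the class name ScriptStack are assumptions for this illustration; the real entries also carry pointers into the script text.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class Kind : std::uint8_t { Func, Var, Call, Param, For };

// Hypothetical entry layout, 12 bytes on typical 32-bit targets.
struct StackEntry {
    Kind          kind;
    char          name[7];   // short names only in this sketch
    std::uint32_t value;
};

class ScriptStack {
public:
    // As many entries as fit in a 1 KB buffer.
    static constexpr std::size_t kCapacity = 1024 / sizeof(StackEntry);

    std::size_t depth() const { return count_; }

    // Push a new entry; returns false when the buffer is full.
    bool push(Kind kind, const char* name, std::uint32_t value) {
        if (count_ == kCapacity) return false;
        StackEntry& e = entries_[count_++];
        e.kind = kind;
        std::strncpy(e.name, name, sizeof(e.name) - 1);
        e.name[sizeof(e.name) - 1] = '\0';
        e.value = value;
        return true;
    }

    // Remove the most recent entry of the given kind and everything above it.
    // This is what "end" would do for a Call or a For entry.
    bool releaseTo(Kind kind) {
        for (std::size_t i = count_; i > 0; --i) {
            if (entries_[i - 1].kind == kind) {
                count_ = i - 1;
                return true;
            }
        }
        return false;
    }

    // Search from the top so parameters of the current call shadow globals.
    StackEntry* findVariable(const char* name) {
        for (std::size_t i = count_; i > 0; --i) {
            StackEntry& e = entries_[i - 1];
            bool isVariable = (e.kind == Kind::Var || e.kind == Kind::Param);
            if (isVariable && std::strcmp(e.name, name) == 0) return &e;
        }
        return nullptr;
    }

private:
    StackEntry  entries_[kCapacity];
    std::size_t count_ = 0;
};
```

Pushing func test, var b, a call to test and the parameter v = 14 gives a depth of 4. After releaseTo(Kind::Call) only func and var b remain, which matches the state described after the first “end”.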

There might be details I need to change, but the key concept of the stack usage means that I can execute quite large scripts with very little SRAM. I obviously need sufficient stack to support functions and variables, but this simple technique is already working very well. And with my MCUs mostly being 168 MHz monsters it is quite fast as well. That said, an Interpreter will always be slow compared to native code.

PScript Interpreter

I drafted a specification for a PScript interpreter some time ago and have implemented a minimalistic version. Basically I am down to a very small footprint, using only 1020 bytes SRAM to execute a decent size script. I actually think I will try implementing this on an Arduino Uno just for fun, it is that small.

Using an interpreter on an Arduino Uno is a joke – I wonder if I should tweak the syntax to look like C# or Java because that is actually doable… it would be a rather minimalistic version – nanoJava :). The main challenge is that those languages are not targeting small footprints so they might bloat a bit too much. PScript is minimalistic and designed for small, functional scripts extending static configuration.

XPortHub3 – Draft

I use XPortHub1 a lot, but have not assembled XPortHub2 with Ethernet yet. What I would like is an XPortHub with radio, and I am thinking of 5 km LoRa. With that I could use this as the master on my robots.

I also need a SPI Flash or something to store config. I can as an emergency store that in internal flash, but it’s a bit better with an external Flash.

I like to keep RPI as backbone, but I am not sure about the Watchdog version of DeviceBus.

I need 2 x CAN of which one must be on the backbone.

2 x RS485’s are always handy.

USB is basically mandatory.

The key thing with this is that it acts as a wireless hub. As for Ethernet, Wifi and Bluetooth, I can always add a Raspberry PI.

I have not added too much on this yet, so let’s see.

XPortHub3 w/Watchdog

I decided to start making some changes to the DeviceBus experiment. Basically I squeezed the function up in the right corner. Next I will replace RS232 with 2 x RS485 and add a LoRa or maybe Wifi and LoRa. I have little usage for Wired Ethernet on mobile devices, but I do lack radio options.

BSA – XPortHub2/32xIO/Watchdog

I have a version of XPortHub that works fine, so I have not been in any hurry to make this second unit with Ethernet, but I will finally order the PCBs. XPortHub is very simple as it is only a switch between USB, CAN, RS485, RS232 and Ethernet. But that exact functionality is what I use the most. The only thing I would change on this is the Ethernet connector, as I want to switch to a low profile unit that is lower, but also partly below the PCB, to make it easier to put this into a stack.

Just to remind everyone – all these cards are in Raspberry PI Hat format, but they do not depend on a Raspberry PI. The backbone is mostly 5V supply, CAN and SPI (DeviceBus), allowing the cards to be clicked together as a system. Behind the RJ45 connector there is also an SPI Flash and a TTL UART connection, and on the PCB backside is the battery for the RTC.

  • DeviceBus/Raspberry PI Backbone bus allowing multiple cards to be used together.
  • 2 x CAN HS ports
  • 2 x RS485 ports
  • 2 x RS232 ports
  • 1 x USB port. Unit can be powered through USB.
  • 1 x Ethernet port.
  • 1 x TTL UART port.
  • SPI Flash.
  • RTC w/x-tal & battery.

I have made ca. 12 different Raspberry PI Hat designs that I will start upgrading, as well as adding some new designs. The STM32F405RG ticking at 168 MHz with 1 MB Flash and 192 KB SRAM is very powerful. And the LQFP64 package is just right for these boards. What they need, however, is an upgrade and more software.

More important is that I will start interconnecting these with BSA, meaning I get GUI and config utilities as well as the capability to design logic that executes on XPortHub from BSA.

This mock-up shows XPortHub together with a 32xIO board. Each IO pin has its own connector with signal/ground. This can be used for multiple purposes, as all pins are both input and output and have decent TVS protection. More importantly, it has the same DeviceBus/Raspberry PI Backbone so it can be stacked. This is a dead simple, yet powerful board.

DeviceBus consists of 5V, 1 x CAN, 1 x SPI and 3 x IO pins. SPI can be used in full or half duplex mode. In half duplex mode the SPI bus can act as a TDM bus, allowing any device to send. SPI is much faster than CAN, while CAN arbitration means any board can transmit. This is easier for start-up config etc. And tbh CAN is sufficient for most needs. Sadly RPI can’t communicate on CAN, only on SPI. But SPI is perfect for larger data amounts and higher bandwidth. These are so far the only boards that have the full DeviceBus added.

This 3rd board is a test board where I consider adding a 2nd MCU as Watchdog with the capability to switch the main MCU on/off. I used an XPortHub, ripped off the right side and added the new functionality. I have to go through the articles and concept, but I will order the board for fun. Part of the idea here is that the 2nd MCU also focuses on high speed SPI, using a high speed, full duplex UART to the main MCU. This will allow the main MCU to avoid the challenge of a high speed SPI bus, but it adds space and complexity. My opinion is that it is a bit of over-design, but well – let’s order the PCB – fun for dark & dingy days in autumn 🙂 – it is an experiment.

 

BSA – Example UML Class Diagram

This example lacks a few details, but it illustrates the need for a specialized line where we in this case have a common horizontal level. Without this the diagram would be messy. In this case I have a common horizontal line because all symbols are at the same Y position, but I will need to be able to control that. The solution is that these exact lines need a fixed horizontal level rather than just the mid position as used here, and I need to be able to manually control the level of that combined line.

You don’t need to draw class diagrams like this with parts of the line in common, but it makes them easier to read. This is just a mockup example, so I will return to this detail later as I do UML Class Diagrams properly.

I learned object orientation with OMT, which is one of the methods used to create UML, and class diagrams are IMO the most important diagrams in modelling information because a single, quite simple diagram helps you remember better. The more you remember, the better the structure and quality of your system becomes. In BSA I will use classes and class diagrams to model data first and foremost, but you will be able to locate functions drawn in PLD to classes as well.

Presentation is a topic I will return to. Physical classes can contain a lot of details that I am not interested in showing on “this” exact diagram, so I need a presentation layer where I can select parts of a model to draw a non-physical diagram for the purpose of illustrating selected functionality.

The most interesting detail in this example is actually the speed of drawing this example – this is done with BSA and it was drawn in a few seconds. If you try this in most available UML tools you will notice that it actually is a bit of work to get this drawn.