-
Notifications
You must be signed in to change notification settings - Fork 41
All the things you need to know in a nutshell
My attempt to create a BASIC interpreter from scratch using frugal programming techniques and minimal memory. The project was inspired by Steve Wozniak's statement that the Apple 1 basic interpreter was the biggest challenge in his professional life. Bill Gates also had started his career and created his fortune by writing a BASIC interpreter for the Altair microcomputer.
The program is written in C. The C stack is not used for arithemtic or runtime pruposes to keep it minimal. The interpreter uses a set of global variables, an own stack and a static memory array to emulate the low memory environment of the early microcomputers.
Arithmetic is 16 bit by default but can be changed to any other value (32 bit and 64 bit integer and 32 bit float has been tested). The full set of logical expresssions with NOT, AND, OR is implemented C style. Conditions are part of the arithemtic and not separate like in many BASIC dialects. This makes the call stack of the recursive descent deeper but simplifies other code. To reduce the memory footprint an own multi purpose stack is added.
Memory access is strictly 8bit. For memory, stack and variable access, array logic is used and not C pointers. This allows all pointers to be integers which can be stored on the arithmetic stack if needed. An exception is string handling.
The interpreted can be compiled with standard gcc on almost any architecture or in the Arduino IDE.
See also:
- https://en.wikipedia.org/wiki/Recursive_descent_parser
- https://rosettacode.org/wiki/BNF_Grammar
- https://en.wikipedia.org/wiki/Tiny_BASIC
- https://github.com/slviajero/tinybasic/wiki/Unforgotten:-Palo-Alto-BASIC
- https://github.com/slviajero/tinybasic/wiki/The-original-Apple-1-BASIC-manual
- https://github.com/slviajero/tinybasic/blob/main/misc/ECMA-55_1st_edition_january_1978.pdf
The language is now fully compatible to Palo Alto BASIC and Apple Integer BASIC offered as an add on to the Apple 1. Floating point and Dartmouth BASIC language features can be switched on.
16bit, 32 bit and 64 bit arithmetic possible with the respective number ranges. 32 bit and 64 bit float is also possible. The internal typedef number_t and the compiler flag HASFLOAT controls this.
26 integer variable A-Z stored in a static array. In addition a heap of dynamic variables is added like in Apple Integer BASIC. Variable names can be A-Z[0-9] for variables, strings and arrays. Arrays need to be created with DIM. Array indices start with 1. They can be as big as the memory permits.
An array @() gives access to the entire fre memory. (See Dr. Wang BASIC in my Wiki). On an Arduino @E() has a similar purpose. It gives access to the EEPROM. These arrays do not have to be created with DIM.
Strings and general arrays have to be dimensioned with the DIM statement and are placed on a heap. If a string is used without dimenioning it with DIM before it is created with a default length of SBUFSIZE which is 32.
@$ is a predefined string. It is stored in the same memory location as the input buffer and overwritten on user input.
Assess to the static vars A-Z, the string @$, and the array @() is approximately 30% faster than to the objects stored on the heap.
Expressions include basic arithmetic, conditions and logical operators AND, OR, NOT.
Conditions work C style as part of the expression evaluation with 0 as FALSE and all other values as TRUE.
Functions ABS, SGN, SQR, POW, RND, FRE, PEEK are implemented.
SQR is an approximate square root. It is exact for perfect squares with integer number types. For floats the standard C library is used.
POW is exponentiation and requires two arguments. It replaces ^
. If floats are used it is just the standard C pow. For integers it is implemented standalone.
RND is a 16 bit constant seed random number generator which always delivers the same sequence.
FRE takes a dummy argument and has the value of the rest of the core basic memory. Negative values of the argument mean EEPROM (see PEEK and POKE).
PEEK access to memory works in a standard way. On an Arduino, negative values of the argument access the EEPROM.
Basic statements taken from Palo Alto BASIC are PRINT, LET, INPUT, IF - THEN, GOTO, FOR - TO - STEP - NEXT - BREAK, GOSUB - RETURN, STOP.
PRINT is standard. Palo Alto BASIC's # notation for formatting is added.
LET is standard and can be ommited for assignments.
INPUT is standard. Entering # as first character of input end the program (Arduino and Unix).
IF THEN is standard, THEN allows a line number or a statement as an argument. THEN can also be omitted.
GOTO expects an expression, evaluates it and then jumps to the line number. This is similar to the old small basic variants.
FOR TO STEP NEXT is implemented with a maximum nesting size of 4. STEP 0 is legal and generates an infinite loop. Leaving away TO and STEP is allowed. Array elements cannot be used as control variables in FOR to reduce complexity.
BREAK is "apocryphal" and not generally found in BASIC. It can be used to abort a FOR loop, clearing the FOR stack.
GOSUB, RETURN work in a standard way but GOSUB accepts expressions like GOTO (it is the same function). GOSUB stack depth is 4 by default.
Program control statements include RUN, CLR, NEW, LIST, DUMP, SAVE, LOAD.
RUN starts the program. It has maximum one argument. Running programs on an Arduino can be interrupted using #. This is not implemented on Unix.
CLR sets all variable to zero.
NEW sets all variables to zero and deletes the entire program.
LIST allows maximum 2 arguments separated by commas (beginning line, end line).
DUMP prints the memory as a decimal dump. It allows a maximum two arguments to specify the memory range.
END ends a program and preserves the interpreter state completely. It is more like STOP in standard basic. SCR is a synonym of END.
CONT restarts the program where it ended, using only the here variable - (buggy, has to be rewritten).
REM is a comment but behaves different than in other BASICs. The tokenizer tries to interpret the REM string which may change its format. I recommend to use string constants in REM.
SAVE "!" on an Arduino writes a program to the EEPROM, LOAD "!" reloads it. An EEPROM autorun feature is added as well. See the page on SD Card DOS for more details. On a platform with a filesystem like Arduino with an SD as well as Mac, Windows, Linux LOAD "filename" and SAVE "filename" just load and save programs. Programs are stored as ASCII file. Without a parameter the default file name is "file.bas". On ESP8266 SPIFFS is used as filesystem in the internal flash allowing true single board standalone computers.
DWRITE, DREAD, AWRITE, AREAD, PINM, DELAY and MILLIS are the Arduino I/O functions.
Optionally PULSEIN and TONE are supported.
POKE is added to access the basic memory. Like for peek, negative values access the EEPROM.
GET reads user input. It is non blocking only on the Arduino and returns one character in an integer variable.
PUT writes one character. Its argument is an integer. It is mostly intended for control characters.
SET is a general command changing internal interpreter settings: implemented so far is the command group 1 trigged by set 1, x. set 1, 1 sets a program saved in EEPROM to autorun on Arduino, set 1, 0 sets the program back to normal mode, set 1, 255 removes the program flag and makes it unloadable. set 2, 1 sets the output device to the LCD display (see the LCD page for more information), set 2, 0 sets it back to serial mode. set 4, 1 activates a PS2 keyboard as input device, set 4, 0 sets this flag back to serial (see the keyboard page for details).
USR is a function calling low level functions from BASIC.
OPEN, CLOSE, DELETE, CATALOG are very basic file I/O functions.
If you compile for float SIN, COS, TAN, ATAN, LOG, EXP and INT are added and do what they are supposed to do.
READ, DATA, RESTORE and ON GOTO/GOSUB is available following the Dartmouth standards.
More arcane features like a MALLOC, FIND and EVAL command are added now if the interpreter is compiled with the DARKARTS flag.
No makefile or headers are provided. All is in one C file, function definitions are in a initial section in the code. The code can be directly compiled in the arduino ide with the build in C++ compiler. The standard serial library costs 180 bytes RAM and 1 kB of flash memory. It is the single biggest library on an arduino. See the article on memory in this Wiki for more information. My main test system is a Arduino Uno, 32 kB flash, 2 kB memory. Other devices I have tested can be found in Tested Hardware.
The interpreter is build with the goal of minimal RAM footprint. Static global variables are used for most interpreter functions. The C stack is avoided. A set of global variables controls the interpreter.
mem: the basic memory array of 8 bit characters.
stack: the arithemtic stack and the integer stack pointer sp.
ibuffer: a generic string input buffer and its pointer bi.
x, y: accumulators.
xc, yc: 8 bit accumulators.
z: a union type as a combined number and 8 bit accumulator. Mostly used for data type conversion.
here, top, himem: program memory pointers implemented as integers.
ir: a general character pointer for the internal string operations.
FOR and GOSUB have own shallow stacks controlled by constants.
Addresses and numbers are 16 bit on an Arduino by default but both can be changed by redefining the number_t and address_t declaration. It is assumed that number_t is always at least as big as address_t. On 64 bit platforms like Mac, Windows and Linux the number type is a 32 bit signed integer and the address type is a 16 unsigned integer by default.
The interpreter is written in 3 layers
Layer 0: variable handling, errors, stack, mathematics and input/output. Layer 0 functions do not change the interpreter state. Also the Arduino I/O functions are layer 0.
Layer 1: lexical analysis, storing and retrieving programs, expression evaluation. Layer 1 changes global variables and the intepreter state.
Layer 2: Statements, control commands and the statement loop.
Strings are implemented Apple Integer BASIC style. See the manual in this Wiki for more details. In expressions strings can be mixed with numbers (sometimes). This is unusual. a$=a
assigns the character value of a
to the first element of the string. a=a$(1)
also works. Strings can be used as character arrays. The implicit character type is signed which is not surprising as the basic type in C is signed character
.
Syntax of the Arduino functions:
"PINM pin, mode": mode is 0 for OUTPUT, 1 for INPUT, 2 for INPUT_PULLUP
"DWRITE pin, value": value can be 0 or 1
"A = DREAD(pin)"
"AWRITE pin, value": value can be between 0 and 255.
"A = AREAD(pin)": A can have a value between 0 and 1023.
"DELAY time": time is measured in miliseconds, maximum delay is 32 seconds.
"MILLIS(divisor)": the milliseconds since program start divided by the argument.
"PULSEIN(pin, value, timeout)": the pulse length on a pin in 10 microsecond units. Timeout is measures in ms. The units work well with ultrasonic pulses even with only 16 bit arithmetic available (see https://www.arduino.cc/reference/de/language/functions/advanced-io/pulsein/ for further info).
The pin value is never checked for validity on the BASIC level. For all analog functions the numeric pin values need to be known. The constant AZERO is the lowest analog input pin. In general it isthe next pin after the last digital pin. On an UNO it is 14 and on a Mega it is 54. See https://www.arduino.cc/reference/de/language/functions/analog-io/analogwrite/ for more information. Other analog pins can be addressed by adding to AZERO.
The program was originally written with a 16 bit arithmetic 8 bit machine in mind. This can be changed by defining the types number_t
, address_t
, and changing the constant strindexsize
. number_t
is the arithmetic type, including the stack and variables. address_t
controls the addressing of memory. 32 and 64 bit arithmetics have been tested on all platforms. number_t
always as to be at least as big as address_t
. strindexsize
is the length of strings. Setting it to 1 creates 1 byte string code allowing a maximum string length of 255. Setting it to two changes the maximum string length to 65535.
number_t
can be set to a float but this should only be done through the HASFLOAT definition. Although the code is mostly number type independent, input and output functions differ for floats and numbers as basic types.
Error handling is very tolerant. Many things are allowed that should be forbidden. This is an anarchic piece of software. There are many small inconsistencies. Picoserial doesn't work on a Mega. The program storage is not a chained list as in many BASIC interpreters but just a list of tokens. In long programs finding a line can take a lot of time. In practice this doesn't matter. Programs on Arduinos are small and the larger systems are so fast that one can afford this overhead.
A main difference to other BASICs is the implementation of FOR loops. Leaving out the beginning, end and step value is allowed and causes defaults 1 for the beginning, the maximum positive value for the end and 1 for the step to be used. Typical code could look like this: FOR I : PRINT I : IF I>10 THEN BREAK : NEXT
.
Some things other BASICs to as commands have been implemented as special variables. @R is the random number seed. Setting it to any values initialises the random number generator. @R=MILLIS(1) sets it to the milliseconds since start of the program.
There is no exponentiation operator ^
this is replaced by the POW() function, no DEF FN
, and no READ/DATA
mechanism so for. READ/DATA
has to be implemented with a fileio or as constants.