Scripting Language

I am trying to learn how scripting languages are made. It seems simple, but I can’t find proper resource material that demonstrates a small working example. There is a scripting language called TrumpScript which I found hilarious, but it is a little objectified.

So let me get this straight.
A token is a subset of data from a line. These include: +, numbers, strings, -, , =, *, variable names.
Each token has its priority of execution. In the line: var a = 5 + 6 * 2;, * will be the highest, + is second, var a is third and = is the last.
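To make the token and priority idea concrete, here is a rough sketch of how a line could be split into tokens and how the arithmetic part could be reordered so that higher-priority operators come out first. The token pattern, the precedence table, and the function names `tokenize` and `to_postfix` are all made up for illustration. The reordering is the standard shunting-yard approach, limited to left-associative binary operators with no parentheses.

```python
import re

# Hypothetical token pattern: numbers, names, and a few one-character operators.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/=;])")

# Assumed precedence table: a higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(line):
    """Split a source line into a flat list of token strings."""
    tokens, pos = [], 0
    line = line.rstrip()
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at column {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def to_postfix(tokens):
    """Reorder an infix expression into postfix (shunting-yard, no parentheses)."""
    output, operators = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Emit any waiting operator that binds at least as tightly.
            while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]:
                output.append(operators.pop())
            operators.append(tok)
        else:
            output.append(tok)  # numbers and names go straight to the output
    while operators:
        output.append(operators.pop())
    return output


tokens = tokenize("var a = 5 + 6 * 2;")
print(tokens)                    # ['var', 'a', '=', '5', '+', '6', '*', '2', ';']
print(to_postfix(tokens[3:-1]))  # ['5', '6', '2', '*', '+']
```

In the postfix output the `*` appears before the `+`, which is the order a stack machine would execute them in. The `var a =` part is not handled here; a real parser would treat it as a statement wrapped around the expression.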

These tokens are to be written out into a list of commands, making up byte code.
The byte code should be a state based thing in which you push values and call an operation… such as…

push 32
push a
add

Does add pop 32 and pop a from the state? Or should future pushes pop the second push, so that pop has to be run twice to edit the first push? Or am I getting this process wrong?

If we’re talking about a stack based (virtual) machine:

  • the push instruction would take one operand (in your example 32 or a) and push it onto the stack.
  • the add instruction would pop the top 2 values from the stack, add them together, and place the result back onto the stack.
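The two bullets above can be sketched as a tiny interpreter. This is only a minimal illustration, not how any particular VM does it: the tuple-based instruction format, the `run` function, the extra `mul` and `store` instructions, and the rule that a string operand means "look up this variable" are all assumptions made for the example.

```python
def run(program, variables=None):
    """Execute a list of (opcode, *operands) tuples on a simple stack machine."""
    variables = dict(variables or {})
    stack = []
    for op, *operands in program:
        if op == "push":
            value = operands[0]
            # Assumption of this sketch: a string operand names a variable.
            stack.append(variables[value] if isinstance(value, str) else value)
        elif op == "add":
            right = stack.pop()   # top of the stack
            left = stack.pop()    # value below it
            stack.append(left + right)
        elif op == "mul":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op == "store":
            # Pop the top value and save it under the given variable name.
            variables[operands[0]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack, variables


# The example from the question, with a assumed to hold 10.
stack, _ = run([("push", 32), ("push", "a"), ("add",)], {"a": 10})
print(stack)  # [42]

# var a = 5 + 6 * 2; compiled by hand into stack code.
program = [("push", 5), ("push", 6), ("push", 2), ("mul",), ("add",), ("store", "a")]
stack, variables = run(program)
print(variables["a"])  # 17
```

After `add` runs, both operands are gone from the stack and only the result remains, so nothing has to be popped twice by later instructions.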

There are many resources on the Internet explaining how basic stack based VMs work.

That makes sense. Thanks for your reply.

This led me to a blog which lays out some nice information.

It described two types of VMs: stack and register. I like the register-based one, which was obviously promoted more. I rarely deal with stacks anyway.

As of now I am just messing around, trying to discover how things work at a deeper level. Everyone eventually writes the ‘basic’ language at some point. Aha :)

That blog is good, though it doesn’t address the most important advantage of a stack-based VM: portability.

A stack-based VM can be run on a processor with any number of registers.
Equally, a stack-based VM can be easily transposed onto a register-based VM (as demonstrated by the translation of Java bytecode into Dalvik bytecode).
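To give a feel for why that transposition is mechanical, here is a rough sketch that walks stack code and tracks register names instead of values. It is not the algorithm the Java-to-Dalvik tooling actually uses, just an illustration; the tuple instruction format, the `load` instruction, the `r0`, `r1`, ... register names, and the function `stack_to_register` are all invented for the example, and it never reuses a register.

```python
def stack_to_register(program):
    """Translate (opcode, *operands) stack code into three-address register code."""
    virtual_stack = []   # holds register names, not runtime values
    register_code = []
    next_register = 0

    def fresh():
        nonlocal next_register
        name = f"r{next_register}"
        next_register += 1
        return name

    for op, *operands in program:
        if op == "push":
            reg = fresh()
            # A push becomes a load of the operand into a new register.
            register_code.append(("load", reg, operands[0]))
            virtual_stack.append(reg)
        elif op in ("add", "mul"):
            right = virtual_stack.pop()
            left = virtual_stack.pop()
            reg = fresh()
            # The implicit stack operands become explicit register operands.
            register_code.append((op, reg, left, right))
            virtual_stack.append(reg)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return register_code


for instruction in stack_to_register([("push", 32), ("push", "a"), ("add",)]):
    print(instruction)
# ('load', 'r0', 32)
# ('load', 'r1', 'a')
# ('add', 'r2', 'r0', 'r1')
```

Because the stack depth at every instruction is known ahead of time, each stack slot can be given a name without running the program. Going the other way, or fitting the result into a fixed number of physical registers, takes more work.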

Essentially a stack-based VM is more abstract, and thus more flexible.
True, it leaves more work for the JIT to do, but (beyond the most naive of implementations) that doesn’t mean it’s necessarily less efficient.

Rather than attempting to write your own VM, I’d suggest just studying Java’s VM specification. It’s an excellent document:

https://docs.oracle.com/javase/specs/jvms/se7/html/index.html

Recommendation: learn Assembly.

I sat down to start learning assembly a while ago, but that never took off.
I get the idea of what assembly does. Drawing a connection to that is good. I never really got into actually writing assembly. Getting started writing assembly code isn’t necessarily a pick-up-and-go thing. You have to understand everything about your specific environment. I never really got past a tutorial demonstrating the tools to do so, because the tools themselves weren’t available or something - I don’t remember. Maybe it was for Linux. I get the whole register business, but didn’t draw a connection to this.

I think I would have to actually write out a VM and try to convert it to another bytecode format to understand the flexibility, rather than just being told, ‘it does and here is why.’ I do get that the instructions are the same, but I am not seeing why stack-based would be easier than register-based. Right now having fewer registers or more registers doesn’t seem like a problem in this case, but maybe that perspective will change in the future.

I will take a look at the Oracle docs for this. I am looking to make a crappy little high-level language.