I guess you don't want to use a third-party API for your animations and instead want to learn how to do it yourself.
If that is the case, then it is important to know how you are modeling your animation data.
I know a few ways, and one of them may be similar to what you are doing. All of them require a skeleton, which is just a hierarchy of transforms. If you want to be compatible with the standards, check the H-Anim site and name your transforms according to their convention.
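To make the "hierarchy of transforms" idea concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the `Bone` class and its fields are made up, it works in 2D with one angle per bone instead of full matrices, and the joint names are only meant to look like H-Anim names, so check the spec for the exact ones.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bone:
    """Hypothetical 2D bone: a local rotation plus a local offset from its parent."""
    name: str
    parent: Optional["Bone"] = None
    angle: float = 0.0            # local rotation in radians, relative to the parent
    offset: tuple = (0.0, 0.0)    # local translation, expressed in the parent's space

    def world(self):
        """Return (angle, (x, y)) in world space by composing with all ancestors."""
        if self.parent is None:
            return self.angle, self.offset
        p_angle, (px, py) = self.parent.world()
        c, s = math.cos(p_angle), math.sin(p_angle)
        ox, oy = self.offset
        # Rotate the local offset by the parent's world rotation, then translate.
        return p_angle + self.angle, (px + c * ox - s * oy, py + s * ox + c * oy)


# A tiny arm: rotating the shoulder moves everything below it.
root = Bone("HumanoidRoot", offset=(0.0, 1.0))
shoulder = Bone("l_shoulder", parent=root, angle=math.pi / 2, offset=(1.0, 0.0))
elbow = Bone("l_elbow", parent=shoulder, offset=(2.0, 0.0))
```

The point is that each bone only stores a transform relative to its parent, and the world-space transform falls out of walking up the chain.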
As for the data, you could attach a piece of data to each transform (bone). It can be physical data, like a force vector applied at a certain transform. It can also be animation data: a table that associates relative time values with a transformation matrix (mostly rotations). I suppose you use the timeline method.
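A per-bone animation table could look roughly like the sketch below. The names `walk_tracks` and `sample` are hypothetical, the key values are invented, and I store a single angle per key to keep it short where a real system would store a rotation matrix or quaternion.

```python
from bisect import bisect_right

# One track per bone: (time in seconds relative to the animation start, angle in degrees).
# Keys must be sorted by time. The values here are made up for illustration.
walk_tracks = {
    "l_hip":  [(0.0, 0.0), (0.5, 30.0), (1.0, 0.0)],
    "l_knee": [(0.0, 0.0), (0.5, 45.0), (1.0, 0.0)],
}


def sample(track, t):
    """Return the value of the last key whose time is <= t (no interpolation)."""
    times = [key_time for key_time, _ in track]
    i = bisect_right(times, t) - 1
    # Before the first key, fall back to the first key.
    return track[max(i, 0)][1]
```

For example, `sample(walk_tracks["l_knee"], 0.6)` picks the key at 0.5 s. Interpolating between the two surrounding keys would give smoother motion, but that is an extra step on top of this lookup.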
In this case it's necessary to record the absolute time when the animation starts (for instance, when the forward key is pressed, read the current time and store it as the start of the current animation). Then, each time a frame is about to be rendered, compute the time elapsed since the animation started (current time minus start time) and use it to choose the required matrix for each of your bones.
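The timing part can be sketched like this. `AnimationPlayer` and `pick_key` are names I made up, the tracks hold plain angles instead of matrices, and clamping to the end of the animation is my own choice (you might prefer to loop).

```python
import time


def pick_key(track, elapsed):
    """Return the value of the last key at or before `elapsed` (track sorted by time)."""
    value = track[0][1]
    for key_time, key_value in track:
        if key_time > elapsed:
            break
        value = key_value
    return value


class AnimationPlayer:
    """Hypothetical player: remembers when the animation started, samples it per frame."""

    def __init__(self, tracks, duration):
        self.tracks = tracks          # {bone name: [(relative time, value), ...]}
        self.duration = duration      # length of the animation in seconds
        self.start_time = None        # absolute time of the start, None if not playing

    def start(self, now=None):
        # Call this when the triggering event happens, e.g. the forward key is pressed.
        self.start_time = time.monotonic() if now is None else now

    def pose(self, now=None):
        # Call this once per rendered frame.
        if self.start_time is None:
            return {}
        now = time.monotonic() if now is None else now
        elapsed = min(now - self.start_time, self.duration)  # clamp at the last key
        return {bone: pick_key(track, elapsed) for bone, track in self.tracks.items()}
```

The optional `now` argument is only there so you can feed in your engine's frame time (or a fixed value when testing) instead of reading the system clock.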
To simplify things, allow only one animation to play at a time; otherwise you will need some kind of blending process.
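If you do end up needing blending later, the simplest form is a weighted mix of two poses, bone by bone. This is only a rough sketch under my assumptions: `blend_poses` is a hypothetical helper, poses are dictionaries of one angle per bone, and a plain linear mix like this is not correct for full 3D rotations, where you would typically interpolate quaternions instead.

```python
def blend_poses(pose_a, pose_b, weight):
    """Mix two poses: weight 0.0 gives pose_a, weight 1.0 gives pose_b."""
    blended = {}
    for bone, a in pose_a.items():
        # A bone missing from pose_b simply keeps its pose_a value.
        b = pose_b.get(bone, a)
        blended[bone] = a + (b - a) * weight
    return blended


# Example: a quarter of the way from standing to walking.
standing = {"l_hip": 0.0, "l_knee": 0.0}
walking = {"l_hip": 30.0, "l_knee": 40.0}
mixed = blend_poses(standing, walking, 0.25)
```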