That seems reasonable to me. Getting a baseline like this is always a good idea. As Jeff suggested, there's a fair amount of work being done for each Task. Note that a big-T Task is managed through the DataService and, if it's not also a ManagedObject, removed after it runs successfully. In other words, your code above does the following work:
- Get the next task from the scheduler
- Create a transactional context
- Get the Task object from the DataService (see note below)
- Invoke the Task, which in turn (your code) invokes the TaskService, creating a new ManagedObject
- Remove the managed Task
- Commit the Transaction, which causes the next Task to get scheduled
- Repeat…
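The lock-step effect of those steps can be sketched in plain Java. This is a simulation with `java.util.concurrent`, not the Darkstar API: `runSerial`, the thread pool, and the comments standing in for the transactional work are all illustrative. The key point it shows is that when each "commit" is what schedules the next task, at most one task is ever in flight, no matter how many threads the pool has.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class LockStepDemo {
    // Runs 'total' tasks where each task schedules the next one only
    // after it completes -- a stand-in for "commit schedules the next Task".
    static int runSerial(int total) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4); // 3 of these sit idle
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);

        Runnable[] task = new Runnable[1]; // holder so the lambda can reference itself
        task[0] = () -> {
            // Real code would: begin a transaction, fetch the Task from the
            // DataService, invoke it, remove it, commit -- and only then
            // would the next task get scheduled.
            if (completed.incrementAndGet() < total) {
                pool.submit(task[0]); // next task starts only after this one finishes
            } else {
                done.countDown();
            }
        };

        pool.submit(task[0]);
        done.await();
        pool.shutdown();
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed=" + runSerial(1000) + " tasks in lock-step");
    }
}
```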
Note that tasks are managed by the TaskManager because, in the event of a failure, we need to make sure that they still get run in the future if the current Transaction commits. So, in the case of your Task (which is not a ManagedObject and therefore not already persisted), the TaskManager is managing a new object that keeps a serialized copy of your Task. This is part of the "fairly un-optimized" code that I mentioned in my previous message. In particular, there's a lot done here with name-mapping that's going away. Still, some of this work is unavoidable if you want to guarantee durability.
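To make the "serialized copy" cost concrete, here is a sketch using only `java.io` (not Darkstar internals; `MyTask` and `persistCopy` are hypothetical names). Tasks handed to the TaskManager have to be serializable precisely so that a durable copy like this can be written and re-run after a crash:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TaskSerializationSketch {
    // A hypothetical application task; not a ManagedObject, so the
    // TaskManager has no persisted form of it until it makes one itself.
    static class MyTask implements Serializable {
        private static final long serialVersionUID = 1L;
        int counter;
        MyTask(int counter) { this.counter = counter; }
    }

    // Stand-in for the durable copy the TaskManager keeps: serialize the
    // task to bytes that can be written to the data store.
    static byte[] persistCopy(Serializable task) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(task);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] copy = persistCopy(new MyTask(42));
        System.out.println("serialized task is " + copy.length + " bytes");
    }
}
```

Even for a trivial task, that serialization (plus the name-mapping bookkeeping mentioned above) happens on every schedule/run cycle, which is where much of the per-task overhead goes.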
You're absolutely right to be digging in at this level and asking questions about throughput. In fact, I'm glad to see people doing this, because data can only help us improve the system! I just wanted to be clear about what the expected model is for the stack, and also that what you're investigating right now is a version of the system that tries to meet its functional contract, not necessarily one that performs where we'd like it to. We're working on that now, and I expect the system you see in a few months to have very different performance characteristics.
The 1600 number is about what I expected, but the slow-down is curious.
The reason I expect the 1600 number is that you're running in lock-step. You schedule one task, and then run another only once the first one is finished. This means that only one scheduler thread is ever running, so there's no parallel execution. It also means that you're seeing well under 1ms of overhead for each Transaction. When I cited 15k tasks, I was talking about parallel execution, which is what an application normally looks like. Now, it turns out that you can't just run with 10 threads and get a 10x improvement; it's sad, but true.
You should see a much better result, however, and once I address the earlier comment about name bindings and management in the DataManager, you should see closer to a 10x improvement here. Note that by default there are only 4 threads running in the scheduler; if you want, I can show you how to tune that.
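For contrast with the lock-step case, here is the parallel shape sketched the same way (again plain `java.util.concurrent`, not the Darkstar scheduler; `runParallel` is an illustrative name). When independent tasks are queued up front, all the worker threads stay busy at once, so throughput is bounded by the thread count (4 by default, per the note above) and contention rather than by one serial chain of transactions:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelDemo {
    // Submits 'total' independent tasks to a pool of 'threads' workers.
    // Unlike the lock-step case, no task waits for another to finish
    // before being scheduled, so all threads can run tasks concurrently.
    static int runParallel(int total, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(total);

        for (int i = 0; i < total; i++) {
            pool.submit(() -> {
                completed.incrementAndGet(); // stand-in for the per-task work
                done.countDown();
            });
        }

        done.await();
        pool.shutdown();
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed=" + runParallel(1000, 4) + " tasks in parallel");
    }
}
```

In the real system the tasks also contend on the data store, which is why 10 threads don't buy a clean 10x; but keeping the scheduler's queue full is the difference between the ~1600/sec you measured and the parallel numbers I quoted.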
The slow-down is a little strange. Nothing you're doing should cause a significant backlog of work in the system. Have you tried running with profiling (as discussed in a previous thread)? It might show a little of what's going on. If you post your AppListener code, I'll try running your example and see if I can replicate this behavior. If I can figure out what's going on, I'll let you know. Thanks again for helping to dig into this!
seth