Reinforcement learning toy project


My toy project for learning/applying reinforcement learning:
- An agent tries to reach a target state "safely" and "quickly".
- But projectiles/rockets are launched at the agent along the way.
- The agent can determine a rocket's position (with some noise) only when it is "near".
- The agent should learn to avoid crashing into these rockets.
- The agent has fuel that recharges with time and is consumed according to its speed.
- Continuous actions: moving forward, turning by an angle.


My questions:
- I think it is a POMDP, but can I model it as an MDP and just ignore the noise?
- If it is a POMDP, what is the recommended way to estimate the probabilities?
- Which is better to use in this case: value functions or policy iteration?
- Can I use a NN to model the environment dynamics instead of using explicit equations?
- If yes, is there a specific type/architecture of NN you would recommend?
- I think the actions should be discrete, right?

I know learning about this subject will take time and effort, but I am curious. Feel free to answer only some of the questions if you cannot answer them all.

If this is your first experiment with reinforcement learning, I would recommend starting with something much simpler, so you can get the hang of things, and then moving on to a complex project like this. I have trouble with POMDPs myself, and I have been working in RL for a while. I will now try to answer the questions that I can.

I think it is a POMDP, but can I model it as an MDP and just ignore the noise?

Yes. POMDP stands for Partially Observable Markov Decision Process. The "partially observable" part refers to the fact that the agent does not fully know its state, but can estimate it based on its observations. In your case, you would have the location of each rocket as an observation that may contain some noise, and based on the agent's previous knowledge you would update its belief about where the rockets really are. That adds a lot of complexity. It would be much easier to treat the rocket locations as exact and not deal with the uncertainty at all, so you do not have to use a POMDP.
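
As a rough illustration of that simplification, here is a minimal sketch in which the agent simply trusts whatever it observes as the true state; the state layout and names (make_state, observed_rockets) are made up for this example, not taken from the question.

```python
# Hypothetical sketch of the MDP simplification: use the (possibly noisy)
# observed rocket positions as if they were the true state, instead of
# maintaining a belief distribution over where the rockets really are.
def make_state(agent_pos, agent_heading, fuel, observed_rockets):
    """Build the agent's state directly from what it currently observes.

    observed_rockets: positions of nearby rockets as observed, noise included;
    a POMDP formulation would keep a probability distribution here instead.
    """
    return (agent_pos, agent_heading, fuel, tuple(observed_rockets))
```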

If it is a POMDP, what is the recommended way to estimate the probabilities?

I do not fully understand your question. But you would use some form of Bayes' rule. That is, you would have some sort of distribution that is your belief state (the probability of being in any given state); that would be your prior distribution, and based on the observation you would adjust it to obtain a posterior distribution. Look into Bayes' rule if you need more information.
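
To make that update concrete, here is a minimal sketch of a discrete Bayes filter over a handful of possible rocket positions. The transition and observation models below are illustrative placeholders, not anything from the project.

```python
# Minimal sketch of a discrete Bayes belief update (prediction + correction).
def bayes_update(belief, observation, transition_prob, observation_prob):
    """belief: dict mapping state -> P(state), the prior belief state.
    transition_prob(s_next, s): P(s_next | s) under the environment dynamics.
    observation_prob(obs, s): P(obs | s), likelihood of the noisy observation.
    Returns the posterior belief after one prediction/correction cycle."""
    # Prediction step: push the belief through the dynamics.
    predicted = {}
    for s_next in belief:
        predicted[s_next] = sum(transition_prob(s_next, s) * p for s, p in belief.items())
    # Correction step: weight by the observation likelihood (Bayes' rule) and normalize.
    posterior = {s: observation_prob(observation, s) * p for s, p in predicted.items()}
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()}

# Toy example: three possible rocket positions, a rocket that tends to stay put,
# and a sensor that reports the true position 70% of the time.
states = [0, 1, 2]
belief = {s: 1.0 / len(states) for s in states}          # uniform prior
stay = lambda s_next, s: 0.8 if s_next == s else 0.1      # toy dynamics
noisy = lambda obs, s: 0.7 if obs == s else 0.15          # toy sensor model
belief = bayes_update(belief, observation=1, transition_prob=stay, observation_prob=noisy)
```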

Which is better to use in this case: value functions or policy iteration?

Most of my experience has been with value functions, and I find them relatively easy to use/understand. But I do not know what else to tell you. I think it is probably your choice; I would have to spend time working on the project to make a better choice.

Can I use a NN to model the environment dynamics instead of using explicit equations? If yes, is there a specific type/architecture of NN you would recommend?

I do not know anything about using NNs to model environments, sorry.

I think the actions should be discrete, right?

Yes. You should have a discrete list of actions and a discrete list of states. Generally the algorithm will choose the best action for any given state, and for the simplest algorithms (something like Q-learning) you just keep track of a value for each state-action pair.
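
To make that concrete, here is a minimal sketch of a tabular Q-learning update with an epsilon-greedy policy; the learning rate, discount factor, and epsilon are arbitrary example values, not recommendations.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch: one value tracked per (state, action) pair.
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # arbitrary example values

Q = defaultdict(float)  # maps (state, action) -> estimated value, defaults to 0.0

def choose_action(state, actions):
    """Epsilon-greedy: mostly pick the best-known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    """Standard Q-learning update toward reward + discounted best next value."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```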

If you want to see a simple example of an RL algorithm, I have a very simple base class and an example that uses it (in Python). abstract_rl is meant to be extended for RL tasks, but it is very simple. simple_rl.py is an example of a simple task (an agent moving toward a goal position, using Q-learning as the algorithm); it can be run with base_rl and will print a few lines showing the reward over time. Neither is very complicated, but if you are just starting out, it may help give you some ideas. I hope this helps; let me know if you have more or more specific questions.
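
Those files are not reproduced here, but as a rough, purely hypothetical sketch of the kind of interface such a base class and task might expose (this is not the actual abstract_rl / simple_rl.py code):

```python
# Hypothetical sketch of a minimal RL task interface and episode loop,
# just to convey the idea; not the actual abstract_rl / simple_rl.py code.
class AbstractRLTask:
    def reset(self):
        """Return the initial state of an episode."""
        raise NotImplementedError

    def step(self, state, action):
        """Apply an action and return (next_state, reward, done)."""
        raise NotImplementedError

    def actions(self, state):
        """Return the discrete list of actions available in this state."""
        raise NotImplementedError

def run_episode(task, agent, max_steps=1000):
    """Run one episode and return the total reward collected."""
    state, total = task.reset(), 0.0
    for _ in range(max_steps):
        action = agent.choose_action(state, task.actions(state))
        state, reward, done = task.step(state, action)
        total += reward
        if done:
            break
    return total
```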

