python - Getting a state from gym-minigrid for Q-learning

I'm trying to create a Q-learner in the gym-minigrid environment, based on an implementation I found online. The implementation works just fine, but it uses a standard OpenAI Gym environment, which exposes some variables that are either not present in the gym-minigrid library or not presented in the same way. In the "Taxi-v3" environment, for instance, I can get the current state with env.s and the size of the state space with env.observation_space.n, but neither of these is available in gym-minigrid.
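To make the contrast concrete, this is roughly what I mean (with the gym version I'm on, where env.step still returns four values):

import gym

env = gym.make("Taxi-v3")
env.reset()
print(env.observation_space.n)  # 500 -- the size of the discrete state space
print(env.s)                    # the current state as a single integer

In gym-minigrid, env.observation_space is a Dict space rather than a Discrete one, so there is no single .n to read off.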

This is especially challenging because I cannot simply do new_state, reward, done, info = env.step(action) and use new_state to index into my Q-table. Using the "MiniGrid-Empty-8x8-v0" environment, for instance, taking a step with an action and printing the resulting state gives the following output:
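A minimal sketch of the code that produces this (the sampled action is just for illustration):

import gym
import gym_minigrid  # registers the MiniGrid-* environments

env = gym.make("MiniGrid-Empty-8x8-v0")
obs = env.reset()

# Take one arbitrary step and look at what comes back as the "state".
new_state, reward, done, info = env.step(env.action_space.sample())
print(new_state)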

{'image': array([[[2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0],
    [2, 5, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0]],

   [[2, 5, 0],
    [2, 5, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0]]], dtype=uint8), 'direction': 0, 'mission': 'get to the green goal square'}

As you can see, this is not a single value I can plug into my Q-table as a state. Is there a way to transform the above into a single value for a specific state, which I can then use to look up the corresponding Q-table entry? Similarly, is there an easy, non-hardcoded way to obtain the size of the state space, analogous to env.observation_space.n?

I had initially thought to build tuples out of the (position, direction) variables, as in state_tup = (tuple(env.agent_pos), env.agent_dir), and use those as keys in a dict, with each entry holding one value per action; a rough sketch of what I had in mind is below. With that I could build a Q-table and let my agent learn on the environment. The downside is that this gets trickier for environments other than the Empty one, say "MiniGrid-DoorKey-8x8-v0", where the wall, key, and door are placed randomly. How would I approach getting the state space in that scenario in order to build my Q-table?
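Here is the sketch of the (position, direction) keying I have in mind; the defaultdict and the alpha/gamma values are just illustrative, not taken from the implementation I'm adapting:

from collections import defaultdict
import gym
import gym_minigrid  # registers the MiniGrid-* environments

env = gym.make("MiniGrid-Empty-8x8-v0")
n_actions = env.action_space.n  # 7 in the default MiniGrid action space

# Q-table keyed by (agent position, agent direction); rows are created lazily.
Q = defaultdict(lambda: [0.0] * n_actions)

obs = env.reset()
state = (tuple(env.agent_pos), env.agent_dir)

action = env.action_space.sample()  # placeholder for an epsilon-greedy choice
new_obs, reward, done, info = env.step(action)
new_state = (tuple(env.agent_pos), env.agent_dir)

# Standard Q-learning update, with illustrative alpha/gamma values.
alpha, gamma = 0.1, 0.99
Q[state][action] += alpha * (reward + gamma * max(Q[new_state]) - Q[state][action])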



1 Answer

Waiting for an expert to reply.
