StarCraft II Reinforcement Learning Agent

This project is an implementation of DeepMind's PySC2 RL agent. It focuses on comparing the performance of different neural networks.

    • Before Training
    • After Training

DeepMind's PySC2

PySC2 is DeepMind's Python component of the StarCraft II Learning Environment (SC2LE), which provides full external control of StarCraft II.


Instead of normal RGB pixels, the game exposes feature layers; there are roughly 20 of them, split between the screen and the minimap.

Minimap Feature Layers:
     • height_map: Shows the terrain levels.
     • visibility: Which parts of the map are hidden, have been seen, or are currently visible.
     • creep: Which parts have zerg creep.
     • camera: Which part of the map is visible in the screen layers.
     • player_id: Who owns the units, with absolute ids.
     • player_relative: Which units are friendly vs hostile. Takes values in [0, 4], denoting [background, self, ally, neutral, enemy] units respectively.
     • selected: Which units are selected.
Screen Feature Layers:
     • height_map: Shows the terrain levels.
     • visibility: Which parts of the map are hidden, have been seen, or are currently visible.
     • creep: Which parts have zerg creep.
     • power: Which parts have protoss power; only shows your power.
     • player_id: Who owns the units, with absolute ids.
     • player_relative: Which units are friendly vs hostile. Takes values in [0, 4], denoting [background, self, ally, neutral, enemy] units respectively.
     • unit_type: The unit type id.
     • selected: Which units are selected.
     • hit_points: How many hit points the unit has.
     • energy: How much energy the unit has.
     • shields: How many shields the unit has. Only for protoss units.
     • unit_density: How many units are in this pixel.
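
As a rough illustration of how an agent reads these layers (a minimal sketch; the observation key is "screen" in older PySC2 releases and "feature_screen" in newer ones, and the agent name here is my own):

    import numpy as np

    from pysc2.agents import base_agent
    from pysc2.lib import actions, features

    # Index of the player_relative screen layer; the value 4 marks enemy units
    # (see the [background, self, ally, neutral, enemy] encoding above).
    _PLAYER_RELATIVE = features.SCREEN_FEATURES.player_relative.index
    _ENEMY = 4


    class InspectAgent(base_agent.BaseAgent):
        """Toy agent that only reads a feature layer and issues no_op."""

        def step(self, obs):
            super(InspectAgent, self).step(obs)
            # Older PySC2 releases expose the stacked screen layers under
            # obs.observation["screen"]; newer releases use "feature_screen".
            screen = obs.observation["screen"]
            player_relative = np.array(screen[_PLAYER_RELATIVE])
            enemy_y, enemy_x = np.nonzero(player_relative == _ENEMY)
            print("hostile pixels on screen:", len(enemy_x))
            return actions.FunctionCall(actions.FUNCTIONS.no_op.id, [])

Such an agent can be run with the stock launcher, e.g. python -m pysc2.bin.agent --map DefeatRoaches --agent my_module.InspectAgent (the module name here is hypothetical).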

A3C Algorithm

A3C is short for Asynchronous Advantage Actor-Critic. The algorithm was released by Google's DeepMind group in June 2016, and it is a big improvement over DQN: it is faster, simpler, and achieves better results on many reinforcement learning tasks.
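
In code, each worker minimizes a combined actor and critic loss roughly like the following (a minimal TensorFlow 1.x sketch with illustrative sizes and loss weights; the actual A3C_Network below has more structure):

    import tensorflow as tf

    num_actions = 4    # illustrative action-space size
    state_size = 64    # illustrative flattened state size

    # Inputs: states, one-hot chosen actions, and bootstrapped n-step returns R.
    inputs = tf.placeholder(tf.float32, [None, state_size])
    actions_onehot = tf.placeholder(tf.float32, [None, num_actions])
    target_v = tf.placeholder(tf.float32, [None])

    # Tiny shared trunk with separate policy (actor) and value (critic) heads.
    hidden = tf.layers.dense(inputs, 128, activation=tf.nn.relu)
    policy = tf.layers.dense(hidden, num_actions, activation=tf.nn.softmax)
    value = tf.squeeze(tf.layers.dense(hidden, 1), axis=1)

    # Advantage A = R - V(s); stop_gradient keeps the critic out of the actor loss.
    advantage = target_v - value
    responsible = tf.reduce_sum(policy * actions_onehot, axis=1)

    value_loss = 0.5 * tf.reduce_sum(tf.square(advantage))
    entropy = -tf.reduce_sum(policy * tf.log(policy + 1e-10))
    policy_loss = -tf.reduce_sum(tf.log(responsible + 1e-10) * tf.stop_gradient(advantage))

    # The entropy bonus encourages exploration; 0.5 and 0.01 are common weights.
    loss = 0.5 * value_loss + policy_loss - 0.01 * entropy
    train_op = tf.train.RMSPropOptimizer(1e-4).minimize(loss)
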
In my implementation, the outline of the architecture is:

    • A3C_Network -- The neural network class; it contains all the TensorFlow ops.
    • Worker -- The class that interacts with the environment. It holds its own copy of A3C_Network, collects experience, and updates the global network.
    • General -- Creates all the worker instances and runs them at the same time (see the sketch after this list).
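
As a rough sketch of how those pieces interact (illustrative names and method signatures only; the real Worker and General classes do more), each worker runs in its own thread, repeatedly syncing weights from the global network, collecting a short rollout, and pushing gradients back:

    import threading

    ROLLOUT_LENGTH = 40  # illustrative n-step rollout length


    class Worker(object):
        """Sketch of a worker; make_copy/sync_from/act/gradients/apply_gradients
        are assumed helpers, and env is an assumed gym-style wrapper around SC2Env."""

        def __init__(self, env, global_net):
            self.env = env
            self.global_net = global_net
            self.local_net = global_net.make_copy()  # per-worker copy of A3C_Network

        def work(self):
            obs = self.env.reset()
            while True:
                self.local_net.sync_from(self.global_net)  # pull the latest global weights
                rollout = []
                for _ in range(ROLLOUT_LENGTH):
                    action = self.local_net.act(obs)
                    next_obs, reward, done = self.env.step(action)
                    rollout.append((obs, action, reward))
                    obs = self.env.reset() if done else next_obs
                # Gradients are computed on the local copy but applied to the global net.
                self.global_net.apply_gradients(self.local_net.gradients(rollout))


    def run_workers(workers):
        """What General does: start every worker in its own thread and wait."""
        threads = [threading.Thread(target=w.work) for w in workers]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
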

Maps

The map I'm using is DefeatRoaches, which is made by Blizzard. I also made some changes to the unit types to test the agent.

Description
A map with 9 Marines and a group of 4 Roaches on opposite sides. Rewards are earned by using the Marines to defeat Roaches, with optimal combat strategy requiring the Marines to perform focus fire on the Roaches. Whenever all 4 Roaches have been defeated, a new group of 4 Roaches is spawned and the player is awarded 5 additional Marines at full health, with all other surviving Marines retaining their existing health (no restore). Whenever new units are spawned, all unit positions are reset to opposite sides of the map.
Initial State
• 9 Marines in a vertical line at a random side of the map (preselected)
• 4 Roaches in a vertical line at the opposite side of the map from the Marines
Rewards
• Roach defeated: +10
• Marine defeated: -1
End Conditions
• Time elapsed
• All Marines defeated
Time Limit
• 120 seconds
Additional Notes
• Fog of War disabled
• No camera movement required (single-screen)
• This map includes an automatic, mid-episode state change for player-controlled units. The Marine units are automatically moved back to a neutral position (at a random side of the map opposite the Roaches) when new units are spawned, which occurs whenever the current set of Roaches is defeated. This is done in order to guarantee that new units do not spawn within combat range of one another.
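
For reference, this is roughly how a DefeatRoaches episode is created and stepped from Python (a minimal sketch; the constructor arguments shown follow the newer PySC2 AgentInterfaceFormat style and differ in older releases):

    from pysc2.env import sc2_env
    from pysc2.lib import actions, features

    with sc2_env.SC2Env(
            map_name="DefeatRoaches",
            players=[sc2_env.Agent(sc2_env.Race.terran)],
            agent_interface_format=features.AgentInterfaceFormat(
                feature_dimensions=features.Dimensions(screen=84, minimap=64)),
            step_mul=8,            # the agent acts once every 8 game steps
            visualize=False) as env:
        total_reward = 0
        timesteps = env.reset()
        while not timesteps[0].last():
            # A real agent would pick an action here; no_op just advances the game.
            timesteps = env.step(
                [actions.FunctionCall(actions.FUNCTIONS.no_op.id, [])])
            total_reward += timesteps[0].reward
        print("episode reward:", total_reward)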