HSVI

The algorithmic scheme of Heuristic Search Value Iteration (opens new window) (HSVI) was firstly introduced by Trey Smith and Reid Simmons.

Composantes

Components used in HSVI

Schéma algorithmique

Le schéma algorithmique général d'HSVI est représenté par le schéma ci-dessous. Pour en définir une instance, celui-ci nécessite de définir les notions d'état sts_t, d'action ata_t, de borne inférieur V\underline{V} et borne supérieur V¯\bar{V}.

HSVI scheme

Exemple : une instance d'HSVI est l'algorithme oHSVI. Cette instance permet de résoudre un Dec-POMDP formulé comme un occupancy-state MDP. Le type d'état dans ce cas est un état d'occupation, noté ξt=p(xt,otιt)\xi_t = p\left( x_t, o_t \mid \iota_t \right). Le type d'action est un ensemble de règles de décision individuelles, noté dt=(dt1,...,dtn)=(p(u1ot1),p(u2ot2),...,p(unotn))\mathbf{d}_t = (d_t^1, ..., d_t^n) = \left(p(u^1 \mid o_t^1), p(u^2 \mid o_t^2),..., p(u^n \mid o_t^n)\right). La borne inférieure est représentée par une ensemble d'hyperplan et la borne supérieure par un ensemble de point.

|

PROFILING

Parameters :

  • PROBLEM = Tiger, h=3
  • STOCKAGE_OCCUPANCY_STATES = True
  • STOCKAGE_DECISION_RULES = True
  • BELIEF_PRECISION = 0.01;
  • OCCUPANCY_STATE_PRECISION = 0.01
  • COMPRESS_PRECISION = 0.1
  • LB_INIT = MIN, UB_INIT = MDP

TABULAR & SAWTOOTH / EXHAUSTIVE

NAME TIME PERCENT
TOTAL_TIME 3.96201 s 100.00000 %
------------------------------ ----------------------- -----------------------
HSVI::TIME_INITIALIZATION 0.00293 s 0.07400 %
HSVI::TIME_IN_SELECT_ACTION 3.66150 s 92.41528 %
HSVI::TIME_IN_SELECT_STATE 0.08857 s 2.23553 %
HSVI::TIME_IN_UPDATE_LB 0.02557 s 0.64533 %
HSVI::TIME_IN_UPDATE_UB 0.16753 s 4.22849 %
------------------------------ ----------------------- -----------------------
OccMDP::TIME_IN_GET_ACTION 0.19639 s 4.95673 %
OccMDP::TIME_IN_NEXT_STATE 0.24157 s 6.09714 %
OccMDP::TIME_IN_COMPRESS 0.13510 s 3.40979 %
OccMDP::TIME_IN_GET_REWARD 2.98747 s 75.40284 %
------------------------------ ----------------------- -----------------------

MAXPLAN & SAWTOOTH / EXHAUSTIVE

NAME TIME PERCENT
TOTAL_TIME 173.07457 s 100.00000 %
------------------------------ ----------------------- -----------------------
HSVI::TIME_INITIALIZATION 0.00304 s 0.00176 %
HSVI::TIME_IN_SELECT_ACTION 4.10709 s 2.37302 %
HSVI::TIME_IN_SELECT_STATE 0.10788 s 0.06233 %
HSVI::TIME_IN_UPDATE_LB 168.62403 s 97.42854 %
HSVI::TIME_IN_UPDATE_UB 0.21085 s 0.12183 %
------------------------------ ----------------------- -----------------------
OccMDP::TIME_IN_GET_ACTION 0.21684 s 0.12529 %
OccMDP::TIME_IN_NEXT_STATE 78.62268 s 45.42706 %
OccMDP::TIME_IN_COMPRESS 30.56853 s 17.66206 %
OccMDP::TIME_IN_GET_REWARD 3.36106 s 1.94197 %
------------------------------ ----------------------- -----------------------

MAXPLAN & SAWTOOTH / LP

NAME TIME PERCENT
TOTAL_TIME 13.54949 s 100.00000 %
------------------------------ ----------------------- -----------------------
HSVI::TIME_INITIALIZATION 0.00406 s 0.02996 %
HSVI::TIME_IN_SELECT_ACTION 5.71125 s 42.15106 %
HSVI::TIME_IN_SELECT_STATE 0.16362 s 1.20760 %
HSVI::TIME_IN_UPDATE_LB 1.39889 s 10.32428 %
HSVI::TIME_IN_UPDATE_UB 6.24559 s 46.09467 %
------------------------------ ----------------------- -----------------------
OccMDP::TIME_IN_GET_ACTION 0.00000 s 0.00000 %
OccMDP::TIME_IN_NEXT_STATE 0.21809 s 1.60954 %
OccMDP::TIME_IN_COMPRESS 0.13124 s 0.96858 %
OccMDP::TIME_IN_GET_REWARD 0.01084 s 0.07997 %
------------------------------ ----------------------- -----------------------

Parameters :

  • PROBLEM = mabc, h=100
  • STOCKAGE_OCCUPANCY_STATES = True
  • STOCKAGE_DECISION_RULES = True
  • BELIEF_PRECISION = 0.01;
  • OCCUPANCY_STATE_PRECISION = 0.01
  • COMPRESS_PRECISION = 0.1
  • LB_INIT = MIN, UB_INIT = MDP

TABULAR & SAWTOOTH / EXHAUSTIVE

NAME TIME PERCENT
TOTAL_TIME 73.86029 s 100.00000 %
------------------------------ ----------------------- -----------------------
HSVI::TIME_INITIALIZATION 0.93803 s 1.27000 %
HSVI::TIME_IN_SELECT_ACTION 47.51947 s 64.33696 %
HSVI::TIME_IN_SELECT_STATE 0.17276 s 0.23389 %
HSVI::TIME_IN_UPDATE_LB 1.50819 s 2.04196 %
HSVI::TIME_IN_UPDATE_UB 23.73140 s 32.13012 %
------------------------------ ----------------------- -----------------------
OccMDP::TIME_IN_GET_ACTION 1.23505 s 1.67214 %
OccMDP::TIME_IN_NEXT_STATE 13.12697 s 17.77270 %
OccMDP::TIME_IN_COMPRESS 6.17419 s 8.35928 %
OccMDP::TIME_IN_GET_REWARD 1.41560 s 1.91660 %
------------------------------ ----------------------- -----------------------

MAXPLAN & SAWTOOTH / LP

NAME TIME PERCENT
TOTAL_TIME 1651.38171 s 100.00000 %
------------------------------ ----------------------- -----------------------
HSVI::TIME_INITIALIZATION 0.97255 s 0.05889 %
HSVI::TIME_IN_SELECT_ACTION 590.64403 s 35.76666 %
HSVI::TIME_IN_SELECT_STATE 3.77463 s 0.22857 %
HSVI::TIME_IN_UPDATE_LB 72.39404 s 4.38385 %
HSVI::TIME_IN_UPDATE_UB 984.20183 s 59.59869 %
------------------------------ ----------------------- -----------------------
OccMDP::TIME_IN_GET_ACTION 0.00000 s 0.00000 %
OccMDP::TIME_IN_NEXT_STATE 6.87834 s 0.41652 %
OccMDP::TIME_IN_COMPRESS 3.13532 s 0.18986 %
OccMDP::TIME_IN_GET_REWARD 0.25612 s 0.01551 %
------------------------------ ----------------------- -----------------------

Tiger 3 with store_states=false

NAME TIME PERCENT
TOTAL_TIME 21.81457 s 100.00000 %
------------------------------ ----------------------- -----------------------
HSVI::TIME_INITIALIZATION 0.00284 s 0.01302 %
HSVI::TIME_IN_SELECT_ACTION 7.24417 s 33.20793 %
HSVI::TIME_IN_SELECT_STATE 0.13929 s 0.63852 %
HSVI::TIME_IN_UPDATE_LB 7.18607 s 32.94160 %
HSVI::TIME_IN_UPDATE_UB 7.22577 s 33.12361 %
------------------------------ ----------------------- -----------------------
OccMDP::TIME_IN_GET_ACTION 0.00000 s 0.00000 %
OccMDP::TIME_IN_NEXT_STATE 4.95182 s 22.69962 %
OccMDP::TIME_IN_COMPRESS 3.08612 s 14.14706 %
OccMDP::TIME_IN_GET_REWARD 10.10381 s 46.31682 %
------------------------------ ----------------------- -----------------------