Theoretical foundations

Notation

  • $x_t$: state at time $t$
  • $u_t$: action at time $t$
  • $z_t$: observation at time $t$
  • $o_t$: history at time $t$
  • $d_t$: decision rule at time $t$
  • $s_t$: state of the controller at time $t$
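To make the notation concrete, here is a minimal Python sketch. It assumes, as is common in the Dec-POMDP literature but not spelled out above, that a history $o_t$ is the sequence of past actions and observations $(u_0, z_1, \dots, u_{t-1}, z_t)$ and that a stochastic decision rule $d_t$ maps each history to a distribution over actions. The helper names `extend_history` and `sample_action` and the action labels are hypothetical.

```python
import random
from typing import Dict, Hashable, Tuple

# Assumed representation: a history o_t is a flat tuple (u_0, z_1, ..., u_{t-1}, z_t);
# the empty tuple is the history at t = 0.
History = Tuple[Hashable, ...]
# Assumed representation: a stochastic decision rule d_t maps histories to
# distributions over actions.
DecisionRule = Dict[History, Dict[Hashable, float]]


def extend_history(history: History, action: Hashable, observation: Hashable) -> History:
    """Return o_{t+1} = (o_t, u_t, z_{t+1})."""
    return history + (action, observation)


def sample_action(rule: DecisionRule, history: History, rng: random.Random) -> Hashable:
    """Draw u_t from d_t(. | o_t)."""
    dist = rule[history]
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# Usage: a rule that puts all mass on "listen" at t = 0; the history then
# grows once the agent receives the (hypothetical) observation "z1".
d0: DecisionRule = {(): {"listen": 1.0, "open": 0.0}}
u0 = sample_action(d0, (), random.Random(0))
o1 = extend_history((), u0, "z1")
print(o1)  # ('listen', 'z1')
```

Keeping histories as immutable tuples lets them serve directly as dictionary keys, which is what a tabular decision rule needs.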

Formalisms

| Problem | State | Action | Observation |
|---|---|---|---|
| MDP | $x_t$ | $u_t$ | $x_t$ |
| POMDP | $b_t = p(x_t \mid o_t)$ | $u_t$ | $z_t$ |
| MMDP | $s_t$ | $\mathbf{u}_t = (u_t^1, u_t^2, \dots, u_t^n)$ | $x_t$ |
| MPOMDP | $b_t = p(x_t \mid \mathbf{o}_t)$ | $\mathbf{u}_t = (u_t^1, u_t^2, \dots, u_t^n)$ | $\mathbf{z}_t = (z_t^1, z_t^2, \dots, z_t^n)$ |
| NDPOMDP | $\boldsymbol{\xi}_t = \left(p(b^i_t \mid b^i_0, d^i_{0..t})\right)_{i=1..n}$ | $\mathbf{d}_t = (d_t^i)_{i=1..n} = \left(p(u^i \mid o_t^i)\right)_{i=1..n}$ | $\mathbf{z}_t = (z_t^1, z_t^2, \dots, z_t^n)$ |
| Dec-POMDP | $\xi_t = p(x_t, o_t \mid \iota_t)$ | $\mathbf{d}_t = (d_t^i)_{i=1..n} = \left(p(u^i \mid o_t^i)\right)_{i=1..n}$ | |
| Extensive-Form Dec-POMDP | $\xi_t^i = p(x_t, o_t, u_t^{0:i-1} \mid \iota_t)$ | $d_t^i = p(u_t^i \mid o_t^i)$ | |
| (2p)-ZS-SG | $s_t$ | $(p(u_t^1), p(u_t^2))$ | |
| (2p)-ZS-POSG | $\xi_t = p(x_t, o_t \mid \iota_t)$ | $\mathbf{d}_t = (d_t^i)_{i=1..n} = \left(p(u^i \mid o_t^i)\right)_{i=1..n}$ | |
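In the POMDP row, the belief $b_t = p(x_t \mid o_t)$ does not have to be recomputed from the whole history: it can be updated recursively with Bayes' rule after each action and observation. The sketch below assumes a finite model given as nested lists, with `T[x][u][x2]` $= p(x_{t+1} \mid x_t, u_t)$ and `O[u][x2][z]` $= p(z_{t+1} \mid u_t, x_{t+1})$; these model names and layouts are illustrative assumptions, since the text above does not define the dynamics.

```python
from typing import List

# Assumed finite model layout (illustrative, not from the text):
#   T[x][u][x2] = p(x_{t+1} = x2 | x_t = x, u_t = u)
#   O[u][x2][z] = p(z_{t+1} = z | u_t = u, x_{t+1} = x2)


def belief_update(b: List[float], u: int, z: int,
                  T: List[List[List[float]]],
                  O: List[List[List[float]]]) -> List[float]:
    """Bayes filter: b_{t+1}(x2) is proportional to O[u][x2][z] * sum_x T[x][u][x2] * b(x)."""
    n = len(b)
    unnormalized = [
        O[u][x2][z] * sum(T[x][u][x2] * b[x] for x in range(n))
        for x2 in range(n)
    ]
    total = sum(unnormalized)  # equals p(z_{t+1} = z | b_t, u_t)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Two-state example: the single action 0 leaves the state unchanged and
# reports the true state correctly with probability 0.85.
T = [[[1.0, 0.0]], [[0.0, 1.0]]]        # T[x][u=0][x2]
O = [[[0.85, 0.15], [0.15, 0.85]]]      # O[u=0][x2][z]
b1 = belief_update([0.5, 0.5], u=0, z=0, T=T, O=O)
print(b1)  # [0.85, 0.15]
```

With joint actions $\mathbf{u}_t$ and joint observations $\mathbf{z}_t$ used as the indices, the same update gives the MPOMDP belief $p(x_t \mid \mathbf{o}_t)$.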
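In the Dec-POMDP row, the state $\xi_t = p(x_t, o_t \mid \iota_t)$ is a distribution over hidden states and joint histories. The text does not define $\iota_t$; the sketch below assumes it is the initial state distribution together with the decision rules applied so far, in which case one step of $\xi_t$ is deterministic and needs no normalization, because it sums over every joint observation instead of conditioning on one. It also assumes separable local rules $d^i(u^i \mid o^i)$, each local history stored as a tuple $(u_0^i, z_1^i, \dots)$, and hypothetical model functions `T(x, u, x2)` and `O(u, x2, z)` over joint actions and joint observations.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, Hashable, List, Tuple

History = Tuple[Hashable, ...]                          # one agent's local history o_t^i
JointHistory = Tuple[History, ...]                      # o_t = (o_t^1, ..., o_t^n)
Occupancy = Dict[Tuple[Hashable, JointHistory], float]  # xi_t(x, o)
LocalRule = Dict[History, Dict[Hashable, float]]        # d_t^i(u^i | o^i)


def next_occupancy(
    xi: Occupancy,
    rules: List[LocalRule],
    states: List[Hashable],
    joint_observations: List[Tuple[Hashable, ...]],
    T: Callable[[Hashable, Tuple, Hashable], float],
    O: Callable[[Tuple, Hashable, Tuple], float],
) -> Occupancy:
    """xi_{t+1}(x2, (o, u, z)) = sum_x xi_t(x, o) * prod_i d^i(u^i | o^i) * T(x, u, x2) * O(u, x2, z)."""
    nxt: Dict[Tuple[Hashable, JointHistory], float] = defaultdict(float)
    for (x, o), p_xo in xi.items():
        # Enumerate joint actions; their probability factorizes over agents.
        local_choices = [list(rules[i][o[i]].items()) for i in range(len(rules))]
        for combo in product(*local_choices):
            u = tuple(a for a, _ in combo)
            p_u = 1.0
            for _, p_a in combo:
                p_u *= p_a
            if p_u == 0.0:
                continue
            for x2 in states:
                p_x2 = T(x, u, x2)
                if p_x2 == 0.0:
                    continue
                for z in joint_observations:
                    p_z = O(u, x2, z)
                    if p_z == 0.0:
                        continue
                    # Each agent appends only its own action and observation.
                    o2 = tuple(o[i] + (u[i], z[i]) for i in range(len(o)))
                    nxt[(x2, o2)] += p_xo * p_u * p_x2 * p_z
    return dict(nxt)


# Toy two-agent model (hypothetical): the hidden state never changes and
# both agents observe it perfectly; agent 1 randomizes between "a" and "b".
def T(x, u, x2):
    return 1.0 if x2 == x else 0.0


def O(u, x2, z):
    return 1.0 if z == (x2, x2) else 0.0


states = ["s0", "s1"]
joint_obs = [(z1, z2) for z1 in states for z2 in states]
xi0 = {("s0", ((), ())): 0.5, ("s1", ((), ())): 0.5}
rules = [{(): {"a": 0.5, "b": 0.5}}, {(): {"c": 1.0}}]
xi1 = next_occupancy(xi0, rules, states, joint_obs, T, O)
print(len(xi1), sum(xi1.values()))  # 4 1.0
```

The (2p)-ZS-POSG row uses the same form of state, so the same step applies there with $n = 2$. Storing $\xi_t$ sparsely as a dictionary keeps only reachable (state, joint history) pairs, which matters because the number of joint histories grows exponentially with $t$.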

Problem reformulations