msmtools.estimation.transition_matrix¶

msmtools.estimation.transition_matrix(C, reversible=False, mu=None, method='auto', **kwargs)¶

Estimate the transition matrix from the given countmatrix.

Parameters:

C (numpy ndarray or scipy.sparse matrix) – Count matrix
reversible (bool (optional)) – If True restrict the ensemble of transition matrices to those having a detailed balance symmetry otherwise the likelihood optimization is carried out over the whole space of stochastic matrices.
mu (array_like) – The stationary distribution of the MLE transition matrix.
method (str) – Select which implementation to use for the estimation. One of ‘auto’, ‘dense’ and ‘sparse’, optional, default=’auto’. ‘dense’ always selects the dense implementation, ‘sparse’ always selects the sparse one. ‘auto’ selects the most efficient implementation according to the sparsity structure of the matrix: if the occupation of the C matrix is less then one third, select sparse. Else select dense. The type of the T matrix returned always matches the type of the C matrix, irrespective of the method that was used to compute it.
**kwargs (Optional algorithm-specific parameters. See below for special cases) –
Xinit ((M, M) ndarray) – Optional parameter with reversible = True. initial value for the matrix of absolute transition probabilities. Unless set otherwise, will use X = diag(pi) t, where T is a nonreversible transition matrix estimated from C, i.e. T_ij = c_ij / sum_k c_ik, and pi is its stationary distribution.
maxiter (1000000 : int) – Optional parameter with reversible = True. maximum number of iterations before the method exits
maxerr (1e-8 : float) – Optional parameter with reversible = True. convergence tolerance for transition matrix estimation. This specifies the maximum change of the Euclidean norm of relative stationary probabilities (\(x_i = \sum_k x_{ik}\)). The relative stationary probability changes \(e_i = (x_i^{(1)} - x_i^{(2)})/(x_i^{(1)} + x_i^{(2)})\) are used in order to track changes in small probabilities. The Euclidean norm of the change vector, \(|e_i|_2\), is compared to maxerr.
rev_pisym (bool, default=False) – Fast computation of reversible transition matrix by normalizing \(x_{ij} = \pi_i p_{ij} + \pi_j p_{ji}\). \(p_{ij}\) is the direct (nonreversible) estimate and \(pi_i\) is its stationary distribution. This estimator is asympotically unbiased but not maximum likelihood.
return_statdist (bool, default=False) – Optional parameter with reversible = True. If set to true, the stationary distribution is also returned
return_conv (bool, default=False) – Optional parameter with reversible = True. If set to true, the likelihood history and the pi_change history is returned.
warn_not_converged (bool, default=True) – Prints a warning if not converged.
sparse_newton (bool, default=False) – If True, use the experimental primal-dual interior-point solver for sparse input/computation method.

Returns:

P ((M, M) ndarray or scipy.sparse matrix) – The MLE transition matrix. P has the same data type (dense or sparse) as the input matrix C.
The reversible estimator returns by default only P, but may also return
(P,pi) or (P,lhist,pi_changes) or (P,pi,lhist,pi_changes) depending on the return settings
P (ndarray (n,n)) – transition matrix. This is the only return for return_statdist = False, return_conv = False
(pi) (ndarray (n)) – stationary distribution. Only returned if return_statdist = True
(lhist) (ndarray (k)) – likelihood history. Has the length of the number of iterations needed. Only returned if return_conv = True
(pi_changes) (ndarray (k)) – history of likelihood history. Has the length of the number of iterations needed. Only returned if return_conv = True

Notes

The transition matrix is a maximum likelihood estimate (MLE) of the probability distribution of transition matrices with parameters given by the count matrix.

References

[1]	Prinz, J H, H Wu, M Sarich, B Keller, M Senne, M Held, J D Chodera, C Schuette and F Noe. 2011. Markov models of molecular kinetics: Generation and validation. J Chem Phys 134: 174105

[2]	Bowman, G R, K A Beauchamp, G Boxer and V S Pande. 2009. Progress and challenges in the automated construction of Markov state models for full protein systems. J. Chem. Phys. 131: 124101

[3]	Trendelkamp-Schroer, B, H Wu, F Paul and F. Noe. 2015 Estimation and uncertainty of reversible Markov models. J. Chem. Phys. 143: 174101

Examples

>>> import numpy as np
>>> from msmtools.estimation import transition_matrix

>>> C = np.array([[10, 1, 1], [2, 0, 3], [0, 1, 4]])

Non-reversible estimate

>>> T_nrev = transition_matrix(C)
>>> T_nrev
array([[ 0.83333333,  0.08333333,  0.08333333],
       [ 0.4       ,  0.        ,  0.6       ],
       [ 0.        ,  0.2       ,  0.8       ]])

Reversible estimate

>>> T_rev = transition_matrix(C, reversible=True)
>>> T_rev
array([[ 0.83333333,  0.10385551,  0.06281115],
       [ 0.35074677,  0.        ,  0.64925323],
       [ 0.04925323,  0.15074677,  0.8       ]])

Reversible estimate with given stationary vector

>>> mu = np.array([0.7, 0.01, 0.29])
>>> T_mu = transition_matrix(C, reversible=True, mu=mu)
>>> T_mu
array([[ 0.94771371,  0.00612645,  0.04615984],
       [ 0.42885157,  0.        ,  0.57114843],
       [ 0.11142031,  0.01969477,  0.86888491]])