Entropy estimation of very short symbolic sequences

Phys Rev E Stat Nonlin Soft Matter Phys. 2009 Apr;79(4 Pt 2):046208. doi: 10.1103/PhysRevE.79.046208. Epub 2009 Apr 7.

Abstract

While entropy per unit time is a meaningful index to quantify the dynamic features of experimental time series, its estimation is often hampered in practice by the finite length of the data. We here investigate the performance of entropy estimation procedures, relying either on block entropies or Lempel-Ziv complexity, when only very short symbolic sequences are available. Heuristic analytical arguments point to the influence of temporal correlations on the bias and statistical fluctuations, and put forward a reduced effective sequence length suitable for error estimation. Numerical studies are conducted using, as benchmarks, the wealth of different dynamic regimes generated by the family of logistic maps and the stochastic evolutions generated by a Markov chain with tunable correlation time. Practical guidelines and validity criteria are proposed. For instance, block entropy leads to a dramatic overestimation for sequences of low entropy, whereas it outperforms Lempel-Ziv complexity at high entropy. As a general result, the quality of entropy estimation is sensitive to the temporal correlations of the sequence and hence self-consistently depends on the entropy value itself, thus motivating a two-step procedure. Lempel-Ziv complexity is to be preferred in the first step and remains the best estimator for highly correlated sequences.
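To make the two estimators concrete, the sketch below (not the authors' code) computes an entropy-rate estimate in bits per symbol from (i) block-entropy differences h_n = H(n) - H(n-1) and (ii) the normalized Lempel-Ziv (LZ76) phrase count c(N) log2(N)/N, applied to a short binary sequence obtained by thresholding the logistic map x -> r x (1 - x) at x = 0.5. The block length n = 4, sequence length N = 500, and parameter values r are illustrative assumptions, not choices taken from the paper.

```python
# Minimal sketch of block-entropy and Lempel-Ziv (LZ76) entropy-rate estimates
# on a short binary sequence from the logistic map. Parameter values are
# illustrative assumptions, not the paper's settings.
from collections import Counter
from math import log2

def logistic_symbols(r, x0=0.3, N=500, burn=100):
    """Binary symbolic sequence from the logistic map (0 if x < 0.5, else 1)."""
    x, out = x0, []
    for i in range(burn + N):
        x = r * x * (1.0 - x)
        if i >= burn:
            out.append(0 if x < 0.5 else 1)
    return out

def block_entropy(seq, n):
    """Shannon entropy (bits) of length-n blocks, from empirical frequencies."""
    blocks = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
    counts = Counter(blocks)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def block_entropy_rate(seq, n):
    """Entropy-rate estimate h_n = H(n) - H(n-1) in bits per symbol."""
    return block_entropy(seq, n) - (block_entropy(seq, n - 1) if n > 1 else 0.0)

def lz76_phrase_count(seq):
    """Number of phrases in the Lempel-Ziv 1976 parsing of the sequence."""
    s = ''.join(map(str, seq))
    i, c = 0, 0
    while i < len(s):
        l = 1
        # grow the phrase while it already appears earlier in the string
        # (overlap with the phrase itself is allowed, as in LZ76)
        while i + l <= len(s) and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def lz76_entropy_rate(seq):
    """Entropy-rate estimate (bits/symbol) from the normalized LZ76 phrase count."""
    N = len(seq)
    return lz76_phrase_count(seq) * log2(N) / N

if __name__ == "__main__":
    for r in (3.58, 4.0):  # weakly chaotic vs. fully chaotic regimes
        s = logistic_symbols(r, N=500)
        print(f"r = {r}: block h_4 = {block_entropy_rate(s, 4):.3f} bits/symbol, "
              f"LZ76 h = {lz76_entropy_rate(s):.3f} bits/symbol")
```

Running the sketch for a strongly correlated (low-entropy) regime and for the fully chaotic regime at this short sequence length gives a feel for the estimator discrepancies that the abstract describes, with each estimator carrying its own finite-size bias.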