"Unfolding Entropic Statistics" by Jialin Zhang
 

Unfolding Entropic Statistics

Document Type

Lecture

Publication Date

9-14-2022

Abstract

This talk is organized into three parts.

1) Entropy estimation from Turing's perspective is described. Given an i.i.d. sample from a countable alphabet under a probability distribution, Turing's formula (introduced by Good (1953), hence also known as the Good-Turing formula) is a mind-bending non-parametric estimator of the total probability associated with letters of the alphabet that are not represented in the sample. Some interesting facts and thoughts about entropy estimators are introduced.

2) Turing's formula brought about a new characterization of probability distributions on general countable alphabets, which provides a new way to do statistics on alphabets, where the usual statistical concepts associated with random variables (on the real line) no longer exist. This new perspective, in turn, inspires some thoughts on characterizing a probability distribution when the underlying sample space is unclear. An application to authorship attribution is given at the end.

3) Shannon's entropy is finitely defined only for distributions with fast-decaying tails on a countable alphabet. Its unboundedness over thick-tailed distributions prevents its potential utility from being fully realized. Zhang (2020) proposed the generalized Shannon's entropy (GSE), which is finitely defined everywhere. Some interesting results about GSE and a new test of independence inspired by it are introduced. The new test does not require knowledge of the alphabet's cardinality; it is consistent and, given a sufficiently large sample, detects any form of dependence structure in the general alternative space.
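For readers unfamiliar with the two quantities named in the abstract, the following minimal Python sketch (not part of the talk; the function names are illustrative only) computes Turing's formula T = N1/n, where N1 is the number of letters observed exactly once in a sample of size n, and the ordinary plug-in estimate of Shannon's entropy H = -sum p_k log p_k, with each p_k replaced by its sample relative frequency.

    from collections import Counter
    from math import log

    def turing_missing_mass(sample):
        # Turing's formula: T = N1 / n, where N1 counts letters that
        # appear exactly once. T estimates the total probability of
        # letters of the alphabet not represented in the sample.
        counts = Counter(sample)
        n1 = sum(1 for c in counts.values() if c == 1)
        return n1 / len(sample)

    def plugin_entropy(sample):
        # Plug-in estimate of Shannon's entropy (in nats):
        # H = -sum p_k * log(p_k), with p_k the relative frequencies.
        n = len(sample)
        counts = Counter(sample)
        return -sum((c / n) * log(c / n) for c in counts.values())

    sample = list("abracadabra")        # a:5, b:2, r:2, c:1, d:1
    print(turing_missing_mass(sample))  # 2/11, since c and d are singletons
    print(plugin_entropy(sample))       # about 1.41 nats

Note that the plug-in estimator above is exactly the kind of entropy estimator whose behavior, per part 1) of the abstract, Turing's perspective puts in a new light.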

Relational Format

Presentation

This document is currently not available for download.
