How Many 14-digit Numbers are Sensible Timestamps?

Sunday, August 31, 2025

Consider the following two numbers:

20240510235959
12345678901234

One of them is a proper timestamp, and the other just a 14-digit number:

2024-05-10 23:59:59
1234-56-78 90:12:34

Imagine a situation where you are recovering creation dates from a set of filenames. You know some filenames contain a timestamp in the pattern above, and write the following regex to detect those filenames:

[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}
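As a rough sketch, this is how such a pattern could be applied in Python. The helper name `looks_like_timestamp` and the example filenames are made up for illustration, and the pattern is written with two digits for every field, seconds included.

```python
import re

# Formatted pattern: YYYY-MM-DD hh:mm:ss, two digits for each field after the year.
TIMESTAMP_RE = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}"
)


def looks_like_timestamp(filename: str) -> bool:
    """Return True if the filename contains the formatted pattern anywhere."""
    return TIMESTAMP_RE.search(filename) is not None


# Hypothetical filenames, invented for this example.
print(looks_like_timestamp("IMG 2024-05-10 23:59:59.jpg"))  # True
print(looks_like_timestamp("IMG 1234-56-78 90:12:34.jpg"))  # True: only the shape is checked
print(looks_like_timestamp("holiday.jpg"))                  # False
```

The second filename matches even though month 56 and hour 90 do not exist, because the pattern only checks the shape of the digits.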

The regex is not perfect, as there is a risk of false positives: some filenames might match the regular expression without having been intended as timestamps. For the sake of this blog post, let’s ignore the fact that the formatting characters, such as : and -, remove 99% of the doubt about whether a matching filename was meant as a timestamp. What’s the risk of mistakenly identifying a filename as a timestamp?

We’ll first define what it means for filenames to be timestamped, and then calculate the probability of getting it wrong.

Warning: I’m not a statistics geek, so if you’re prone to getting pedantic about statistics, this might not be the blog post for you… If you want to read more about statistics as used in this blog post, chapter five of Online Statistics Education: A Multimedia Course of Study is a good place to start.

Plain and timestamped filenames

Let’s first make clear what I mean by mistakenly identifying a filename. I define a “filenumber” to be a sequence of 14 digits. We assume that from each filename a filenumber can be extracted, as done above with e.g. a regex. Each filenumber has an “intention” when the file was first created. The intention is either timestamp when it refers to an actual time, or plain when it’s just intended for identification of the file.

We call a file, filename or filenumber with intention timestamp a timestamped file/filename/filenumber, and likewise one with intention plain a plain file/filename/filenumber.

For example, when Signal exports a picture, it names it something like signal-2025-08-29-22-51-40-350.jpg. This name contains the filenumber 20250829225140. As this is a timestamp of the moment I exported the picture, this is a timestamped file. Conversely, a hypothetical app could also generate a random 14-digit number with the purpose of just identifying the picture. In that case, it’s a plain file.
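To make the extraction concrete, here is a minimal sketch. It assumes the filenumber is simply the first 14 digits that appear in the filename once everything else is stripped; `extract_filenumber` is a hypothetical helper, not something Signal or any other app provides.

```python
import re
from typing import Optional


def extract_filenumber(filename: str) -> Optional[str]:
    """Return the first 14 digits found in the filename, or None if there are fewer."""
    digits = re.sub(r"\D", "", filename)  # drop every non-digit character
    return digits[:14] if len(digits) >= 14 else None


print(extract_filenumber("signal-2025-08-29-22-51-40-350.jpg"))  # 20250829225140
print(extract_filenumber("holiday.jpg"))                         # None
```

Note that this naive version also picks up digits from other parts of the name, such as an extension like `.mp4`, so a real extraction would need to be stricter.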

Note that an intention requires both a file and a filenumber. Given only a filenumber, it is impossible to decide what the intention is. In other words, the same filenumber can carry either intention, depending on the file it belongs to.

With an intention detection technique, the intention of a filenumber can be guessed. For example, a detection technique $G$ (for Guess) might make the following prediction:

$G(\texttt{199811230900}) = \mathtt{timestamp}$

Here, $G$ takes as an input the filenumber, and outputs either plain or timestamp. Hypothetically $G$ could take more inputs, such as the file, or metadata from the file, which can be used to make the detection more accurate. In this blog post, we only consider the filenumber as an input.

How inaccurate is $G_b$ ?

The problem is that $G$ can be wrong. For example, here’s the $G$ I used the other day to classify my photos. Let’s call it $G_b$ :

$G_b(n)\equiv \begin{cases} \mathtt{timestamp}, & \text{if } \texttt{1970-1-1 00:00} \leq n \leq \texttt{2025-08-15 18:30} \\ \mathtt{plain}, & \text{otherwise} \end{cases}$

Essentially, whenever the filenumber $n$ looks like a timestamp and falls in a reasonable range, classify it as a timestamp.
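A possible Python reading of $G_b$ is sketched below. It is an assumption about how the check could be implemented, not the exact code I ran: the function name `g_b` is made up, and `datetime.strptime` is stricter than the rough counting used later in this post, because it rejects impossible dates such as February 31.

```python
from datetime import datetime

# Bounds taken from the definition of G_b.
LOWER = datetime(1970, 1, 1, 0, 0)
UPPER = datetime(2025, 8, 15, 18, 30)


def g_b(filenumber: str) -> str:
    """Classify a 14-digit filenumber as 'timestamp' or 'plain'."""
    if len(filenumber) != 14 or not (filenumber.isascii() and filenumber.isdigit()):
        return "plain"
    try:
        # Read the digits as YYYYMMDDhhmmss; invalid fields raise ValueError.
        moment = datetime.strptime(filenumber, "%Y%m%d%H%M%S")
    except ValueError:
        return "plain"
    return "timestamp" if LOWER <= moment <= UPPER else "plain"


print(g_b("20240510235959"))  # timestamp
print(g_b("12345678901234"))  # plain
```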

Now here’s the problem. Assuming my filenumbers are a mix of plain and timestamped filenumbers, and that plain filenumbers are uniformly distributed, what are the chances my $G_b$ will classify one or more as timestamp, when they are in fact plain?

Average of misclassified files

There are a few assumptions we can make to make the analysis more robust:

Ranges: Every timestamp must lie somewhere between 1970 inclusive and 2026 exclusive. This is not generally true, but in my case the files were photos, and we’re about halfway through 2025, which makes this a sensible assumption.
Uniform distribution: We assume that plain filenumbers are uniformly distributed over all possible 14-digit numbers. Meaning, each filenumber has the same chance to be used as any other filenumber. An app that uses a non-uniform distribution for ID generation would be non-standard, to say the least.
Imprecision: This is just a calculation done for fun, so I don’t need to take into account leap days, seconds, etc. I’ll settle for an approximation of the exact answer.
Population sizes: For the sake of the example, let’s say my collection has 4000 files, 3000 of which are timestamped, and 1000 of which are plain.

Let’s first determine the fraction of filenumbers that look like timestamps:

Number of filenumbers = $10^{14}$
Number of valid timestamps given above assumptions: $56 \times 12 \times 31 \times 24 \times 60 \times 60 = 1799884800$ . Here, we count all timestamps until 2026 exclusive.
Fraction of valid timestamps in set of filenumbers: $1799884800 \div 10^{14} = 0.000017998848$ . That’s a small number; let’s call this fraction $\alpha$ .
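The arithmetic above can be checked with a few lines of Python. This sketch uses the same deliberate approximation (every month counted as 31 days, no leap handling); the variable names are mine.

```python
from fractions import Fraction

YEARS = 2026 - 1970  # 1970 inclusive up to 2026 exclusive: 56 years

# Approximate count of "sensible" timestamps: 12 months of 31 days each.
valid = YEARS * 12 * 31 * 24 * 60 * 60
total = 10 ** 14     # all 14-digit filenumbers, leading zeros allowed

alpha = Fraction(valid, total)  # exact ratio, no rounding

print(valid)         # 1799884800
print(float(alpha))  # 1.7998848e-05
```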

Given $\alpha$ , and the fact that plain filenumbers are uniformly distributed, it’s easy to calculate how many $G_b$ will get wrong. We’ll just multiply the number of plain files with $\alpha$ :

$\alpha \times 1000 = 0.017998848$

So, on average, we’ll get fewer than one file wrong! While that sounds nice, ultimately it doesn’t say much. Intuitively speaking, maybe there’ll be an unlucky streak and five filenumbers end up in the sensible timestamp range. Is there not a more robust way to get an indicator of how dangerous $G_b$ is?

Chances of getting one or more wrong

There is! I actually want to know the following:

$\text{``chance that one or more filenumbers are misclassified as timestamped''}$

Unfortunately, we can’t (yet) put plain English into a calculator. Let’s use a basic statistics trick to invert probabilities and reduce the formula to something we can actually calculate. The trick is: if something happens with chance $p$ , then the chance of the thing not happening is $1 - p$ . So, the chance that we wrongly classify one or more plain filenames as timestamped can also be written as:

$1 - \text{``chance that no filenumbers are misclassified as timestamped''}$

We’ll have to further unpack $\text{``chance that no filenumbers are misclassified as timestamped''}$ by considering $G_b$ . Essentially, $G_b$ does detection by checking if the filenumber falls inside a certain range. Therefore, the chance that no plain files are misclassified is equal to the chance that all plain filenumbers fall outside of that range to begin with.

That sounds tricky; let’s first calculate the chance that a single plain filenumber lies outside the range of timestamps. This we can calculate using $\alpha$ , the chance that a plain filenumber is classified as timestamped. Using the probability inversion trick, the chance that one particular filenumber is outside the range of timestamps is $1 - \alpha$ . Assuming the plain filenumbers are drawn independently, the chance that all 1000 of them lie outside the range of timestamps is $(1-\alpha)^{1000}$ .

We now have enough to define $\text{``chance that one or more filenumbers are misclassified as timestamped''}$ , using, again, the probability inversion trick. If the chance that all plain filenumbers lie outside of the range of timestamps is $(1-\alpha)^{1000}$ , then the chance that one or more lie inside the range of timestamps is:

$1 - (1 - \alpha)^{1000}$

To actually calculate this, you need a very precise calculator. If the decimal component is not handled correctly while computing $(1 - \alpha)^{1000}$ , the result will be meaningless. Thankfully, my trusty built-in Android calculator app actually has very good precision, so we’ll just use that. By contrast, my Samsung tablet cannot go beyond a precision of 10 decimals.

Here we go:

$1 - (1 - \alpha)^{1000} = 0.0178\dotsc$

So that makes about a $1.78\%$ chance of misclassifying one or more filenumbers as timestamps. That’s actually not as safe as I thought! A chance below $0.1\%$ would’ve given me a safe “gut feeling”. Luckily no lives depend on the classification of these filenames so I’m not too worried 🙂.¹
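For readers without a high-precision calculator, the same number can be reproduced in Python. This is a sketch under the assumptions of this post (1000 independent, uniformly distributed plain filenumbers); the function names are mine. It uses `math.log1p` and `math.expm1` so that the tiny $\alpha$ is not lost to rounding, and it also gives the chance of exactly $k$ misclassifications via the binomial distribution, which speaks to the “unlucky streak” worry from earlier.

```python
import math

ALPHA = 1799884800 / 10 ** 14  # fraction of filenumbers that look like timestamps
N_PLAIN = 1000                 # number of plain files in the example collection


def p_at_least_one(alpha: float, n: int) -> float:
    """1 - (1 - alpha)^n, computed as -expm1(n * log1p(-alpha)) for accuracy."""
    return -math.expm1(n * math.log1p(-alpha))


def p_exactly(k: int, alpha: float, n: int) -> float:
    """Binomial chance that exactly k of n plain filenumbers land in the range."""
    return math.comb(n, k) * alpha ** k * (1 - alpha) ** (n - k)


print(f"{p_at_least_one(ALPHA, N_PLAIN):.2%}")  # roughly 1.78%
for k in range(6):
    print(k, p_exactly(k, ALPHA, N_PLAIN))
```

Under these assumptions the chance of five simultaneous misclassifications comes out many orders of magnitude smaller than the chance of one, so almost all of the risk sits in the single-file case.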

Generated with BYOB. License: CC-BY-SA. This page is designed to last.
