The Blog of Bob Rubbens

How Many 14-digit Numbers are Sensible Timestamps?

Sunday, August 31, 2025

Consider the following two numbers:

20240510235959
12345678901234

One of them is a proper timestamp, and the other just a 14-digit number:

2024-05-10 23:59:59
1234-56-78 90:12:34

Imagine a situation where you are recovering creation dates from a set of filenames. You know some filenames contain a timestamp in the pattern above, and you write the following regex to detect those filenames:

[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}
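As a minimal sketch of how such a regex could be applied in Python (assuming the final group is meant to be two digits like the others; the function name `looks_like_timestamp` and the example filenames are made up for illustration):

```python
import re

# Pattern from the post, assuming the seconds field is two digits
# like every other two-digit field.
TIMESTAMP_RE = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}"
)


def looks_like_timestamp(filename: str) -> bool:
    """Return True if the filename contains something shaped like a timestamp."""
    return TIMESTAMP_RE.search(filename) is not None


if __name__ == "__main__":
    # Hypothetical filenames, only for illustration.
    for name in ["photo 2024-05-10 23:59:59.jpg",
                 "scan 1234-56-78 90:12:34.png",
                 "holiday.jpg"]:
        print(name, looks_like_timestamp(name))
```

Note that the second filename also matches: the regex only checks the shape, not whether the digits form a real date and time.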

The regex is not perfect, as there is a risk of false positives, meaning some filenames might match the regular expression which were not actually intended as timestamps. For the sake of this blog post, let’s ignore the fact that all formatting characters used, such as : and -, take 99% of the doubt away that this filename might not be a timestamp. What’s the risk of mistakenly identifying a filename as a timestamp?

We’ll first define what it means for filenames to be timestamped, and then calculate the probability of getting it wrong.

Warning: I’m not a statistics geek, so if you’re prone to getting pedantic about statistics, this might not be the blog post for you… If you want to read more about statistics as used in this blog post, chapter five of Online Statistics Education: A Multimedia Course of Study is a good place to start.

Plain and timestamped filenames

Let’s first make clear what I mean by mistakenly identifying a filename. I define a “filenumber” to be a sequence of 14 digits. We assume that from each filename a filenumber can be extracted, as done above with e.g. a regex. Each filenumber has an “intention” when the file was first created. The intention is either timestamp when it refers to an actual time, or plain when it’s just intended for identification of the file.

We call a file, filename or filenumber with intention timestamp a timestamped file/filename/filenumber, and likewise for plain.

For example, when Signal exports a picture, it names it something like signal-2025-08-29-22-51-40-350.jpg. This name contains the filenumber 20250829225140. As this is a timestamp of the moment I exported the picture, this is a timestamped file. Conversely, a hypothetical app could also generate a random 14-digit number with the purpose of just identifying the picture. In that case, it’s a plain file.
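One naive way to pull a filenumber out of a name like this is to strip everything that is not a digit and keep the first 14 digits. This is only a sketch: the helper `extract_filenumber` and its extraction rule are assumptions for illustration, and the rule would be thrown off by other digits earlier in the name.

```python
import re
from typing import Optional


def extract_filenumber(filename: str) -> Optional[str]:
    """Return the first 14 digits found in the filename, or None if there are fewer."""
    digits = re.sub(r"[^0-9]", "", filename)  # drop every non-digit character
    if len(digits) < 14:
        return None
    return digits[:14]


if __name__ == "__main__":
    print(extract_filenumber("signal-2025-08-29-22-51-40-350.jpg"))
```

For the Signal filename this yields 20250829225140; the trailing 350 (presumably milliseconds) is cut off.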

Note that an intention requires both a file and a filenumber. Given only a filenumber, it is impossible to decide what the intention is. In other words, the same filenumber might carry two different intentions, depending on the files that have that filenumber.

With an intention detection technique, the intention of a filenumber can be guessed. For example, given a detection technique $G$ (for Guess), it might make the following predictions:

$$G(\texttt{199811230900}) = \mathtt{timestamp}$$

Here, $G$ takes as an input the filenumber, and outputs either plain or timestamp. Hypothetically $G$ could take more inputs, such as the file, or metadata from the file, which can be used to make the detection more accurate. In this blog post, we only consider the filenumber as an input.

How inaccurate is $G_b$?

The problem is that $G$ can be wrong. For example, here’s the $G$ I used the other day to classify my photos. Let’s call it $G_b$:

$$G_b(n) \equiv \begin{cases} \mathtt{timestamp}, & \text{if } \texttt{1970-1-1 00:00} \leq n \leq \texttt{2025-08-15 18:30} \\ \mathtt{plain}, & \text{otherwise} \end{cases}$$

Essentially, whenever the filenumber $n$ looks like a timestamp and falls in a reasonable range, classify it as a timestamp.
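A minimal Python sketch of a classifier in the spirit of $G_b$ could look as follows. The function name `g_b` and the way the 14 digits are split into fields are assumptions of this sketch, not necessarily what I ran on my photos.

```python
from datetime import datetime

# Bounds taken from the definition of G_b above.
LOWER = datetime(1970, 1, 1, 0, 0)
UPPER = datetime(2025, 8, 15, 18, 30)


def g_b(filenumber: str) -> str:
    """Classify a 14-digit filenumber as 'timestamp' or 'plain'."""
    if len(filenumber) != 14 or not filenumber.isdigit():
        return "plain"
    try:
        # Read the digits as YYYY MM DD hh mm ss.
        moment = datetime(
            int(filenumber[0:4]), int(filenumber[4:6]), int(filenumber[6:8]),
            int(filenumber[8:10]), int(filenumber[10:12]), int(filenumber[12:14]),
        )
    except ValueError:
        # Not a real date or time, e.g. month 56 or hour 90.
        return "plain"
    return "timestamp" if LOWER <= moment <= UPPER else "plain"


if __name__ == "__main__":
    for n in ["20240510235959", "12345678901234"]:
        print(n, g_b(n))
```

Using `datetime` rejects impossible dates such as February 30, which may be stricter than a check that only looks at each field separately; that is a choice of this sketch.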

Now here’s the problem. Assuming my filenumbers are a mix of plain and timestamped filenumbers, and that plain filenumbers are uniformly distributed, what are the chances my $G_b$ will classify one or more as timestamp, when they are in fact plain?

Average of misclassified files

There are a few assumptions we can make to make the analysis more robust:

Let’s first determine the fraction of filenumbers that look like timestamps:
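One way to arrive at a value of $\alpha$ consistent with the figures used below is to count every digit string whose fields are individually plausible: 56 years (1970 through 2025), 12 months, 31 days, 24 hours, 60 minutes and 60 seconds, out of $10^{14}$ possible filenumbers. That counting rule is an assumption of this sketch; it slightly overcounts, since it allows dates like February 31 and does not cut off at 2025-08-15 18:30.

```python
from fractions import Fraction

# Assumed counting rule: each field is checked on its own.
YEARS = 2025 - 1970 + 1                    # 1970 through 2025 inclusive
sensible = YEARS * 12 * 31 * 24 * 60 * 60  # plausible-looking filenumbers
total = 10**14                             # all 14-digit strings

# Fraction keeps the value exact instead of rounding to a float.
alpha = Fraction(sensible, total)

if __name__ == "__main__":
    print(sensible)      # number of plausible-looking filenumbers
    print(float(alpha))  # fraction of all filenumbers
```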

Given $\alpha$, and the fact that plain filenumbers are uniformly distributed, it’s easy to calculate how many files $G_b$ will get wrong. We’ll just multiply the number of plain files by $\alpha$:

$$\alpha \times 1000 = 0.017998848$$

So, on average, we’ll get fewer than 1 file wrong! While that sounds nice, ultimately it doesn’t say much. Intuitively speaking, maybe there’ll be an unlucky streak and five filenumbers end up in the sensible timestamp range. Is there not a more robust way to get an indicator of how dangerous $G_b$ is?

Chances of getting one or more wrong

There is! I actually want to know the following:

$$\text{``chance that one or more filenumbers are misclassified as timestamped''}$$

Unfortunately, we can’t (yet) put plain English into a calculator. Let’s use a basic statistics trick to invert probabilities and reduce the formula to something we can actually calculate. The trick is: if something happens with chance $p$, then the chance of the thing not happening is $1 - p$. So, the chance that we wrongly classify one or more plain filenumbers as timestamped can also be written as:

$$1 - \text{``chance that no filenumbers are misclassified as timestamped''}$$

We’ll have to further unpack “chance that no filenumbers are misclassified as timestamped” by considering $G_b$. Essentially, $G_b$ does detection by checking if the filenumber falls inside a certain range. Therefore, the chance that no plain files are misclassified is equal to the chance that all plain files fall outside of that range to begin with.

That sounds tricky; let’s first calculate what the chance is that a plain filenumber lies outside the range of timestamps. This we can calculate using $\alpha$, the chance that a plain filenumber is classified as timestamped. Using the probability inversion trick, the chance that one particular filenumber is outside the range of timestamps is $1 - \alpha$. Generalizing this to all plain files, the chance that all of them lie outside the range of timestamps is $(1 - \alpha)^{1000}$.

We now have a value for “chance that no filenumbers are misclassified as timestamped”, so we can apply the probability inversion trick once more. If the chance that all plain filenumbers lie outside of the range of timestamps is $(1 - \alpha)^{1000}$, then the chance that one or more lie inside the range of timestamps is:

$$1 - (1 - \alpha)^{1000}$$

To actually calculate this, you need a very precise calculator. If the decimal component is not handled correctly while computing $(1 - \alpha)^{1000}$, the result will be meaningless. Thankfully, my trusty built-in Android calculator app actually has very good precision, so we’ll just use that. For comparison, the calculator on my Samsung tablet cannot go beyond a precision of 10 decimals.

Here we go:

$$1 - (1 - \alpha)^{1000} = 0.0178\dotsc$$
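If you don’t have a precise calculator at hand, the number can be cross-checked with a few lines of Python. This is a sketch under the figures used in this post: $\alpha$ is taken from $\alpha \times 1000 = 0.017998848$, and the variable names are made up. It computes the result twice, once with exact fractions and once with floats via `log1p` and `expm1`, which are designed to avoid the rounding problems of working with numbers very close to 1.

```python
import math
from fractions import Fraction

# alpha = 0.017998848 / 1000, written as an exact fraction.
alpha = Fraction(17998848, 10**12)
n_plain = 1000  # number of plain files assumed in this post

# Exact arithmetic: no rounding until the final conversion to float.
p_exact = float(1 - (1 - alpha) ** n_plain)

# Float arithmetic done carefully:
# 1 - (1 - a)^n  ==  -expm1(n * log1p(-a))
p_float = -math.expm1(n_plain * math.log1p(-float(alpha)))

if __name__ == "__main__":
    print(p_exact)
    print(p_float)
```

Both approaches should agree on the leading digits, matching the $0.0178\dotsc$ above.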

So that makes about a $1.78\%$ chance of misclassifying one or more filenumbers as timestamp. That’s actually not as safe as I thought! A chance below $0.1\%$ would’ve given me a safe “gut feeling”. Luckily no lives depend on the classification of these filenames so I’m not too worried 🙂.




Generated with BYOB. License: CC-BY-SA. This page is designed to last.