In this article, we will take a detailed look at how ClickHouse parses dates and times depending on the date_time_input_format setting. We will cover general information and technical implementation details provide practical examples compare performance and offer recommendations on choosing the optimal parsing mode.


1. Overview of the date_time_input_format Setting

ClickHouse has a special setting called date_time_input_format, which determines how the server interprets incoming string data when converting it to DateTime, DateTime64 types, and during input/output formats (e.g., FORMAT JSONEachRow, FORMAT CSV, etc.).

1.1. Main Modes: 'base' and 'best_effort'

The most commonly mentioned modes are:

  1. base (or alias basic):
    • A more strict parsing mode.
    • Expects a date and time format strictly like YYYY-MM-DD HH:MM:SS or with an additional fractional part of seconds YYYY-MM-DD HH:MM:SS[.fractional] (e.g., 2025-01-14 09:31:30.123456).
    • Parsing errors occur if deviations from this format are detected.
  2. best_effort:
    • A flexible mode that supports a wide range of common date/time formats (ISO 8601 with T separator, Z suffix, fractional seconds, timezone offsets, etc.).
    • Can correctly parse strings like YYYY-MM-DDTHH:MM:SSZ, 2025-01-14T09:31:30.725617089Z, and others.
    • However, due to the extra logic for recognizing multiple formats, it might work slightly slower on large data volumes.

1.2. Why Multiple Modes Are Needed

Different applications, systems, and libraries may output date and time differently. The most common standard is ISO 8601, but there are other variations (with or without spaces, fractional seconds, timezone letters, etc.). To simplify integration as much as possible, ClickHouse introduced the idea of having a "strict" mode (base), which expects a single format, and a "flexible" mode (best_effort), which attempts to recognize the input date in several popular formats.


2. Technical Details

The implementation of date and time parsing in ClickHouse is divided into several functions. Two key ones are:

  1. parseDateTimeBestEffort.cpp — for the 'best_effort' mode.
  2. parseDateTime.cpp — for the 'base' mode (sometimes referred to as 'basic').

The relevant source code links in ClickHouse (GitHub repository) are:

2.1. Parsing in best_effort Mode

In best_effort mode, the input string is sequentially matched against predefined templates in ClickHouse. These include:

If the string doesn't fit one format, the algorithm tries the next one, and so on. If no matches are found, parsing fails with an error.

2.2. Parsing in base Mode (or basic)

The base mode (often referred to as basic in documentation and code) assumes a strictly defined format:

Any deviation (e.g., T instead of a space, presence of Z, missing seconds section) will result in an error:

Code: 41. DB::ParsingException: Cannot parse datetime ...

3. Practical Examples

Below are several scenarios that illustrate the differences between base and best_effort.

3.1. When best_effort Is Essential

Imagine receiving data in ISO 8601 format, including a T separator and Z (UTC) suffix. For instance, 2025-01-14T09:31:30.725617089Z.

In base mode, ClickHouse expects a string like 2025-01-14 09:31:30.725617089 (with a space, without Z). Therefore, an insertion attempt will produce an error. In best_effort mode, ClickHouse correctly recognizes this timestamp, accounts for fractional seconds, and interprets Z as UTC.

3.2. Example of Manual INSERT with JSONEachRow

Suppose we have a table:

CREATE TABLE events (
    event_date_time DateTime64(9),
    event_value Int32
)
ENGINE = MergeTree
ORDER BY event_date_time;

And we want to insert data in JSONEachRow format:

INSERT INTO events FORMAT JSONEachRow
{"event_date_time": "2025-01-14T09:31:30.725617089Z", "event_value": 100}

3.3. Using SET date_time_input_format='best_effort'

To enable flexible parsing locally within the current session, you can execute:

SET date_time_input_format = 'best_effort';

After this, any date-time insertion/read within the session will work in "best effort" mode.

An analogous example with INSERT:

INSERT INTO events
SETTINGS date_time_input_format='best_effort'
FORMAT JSONEachRow
{"event_date_time": "2025-01-14T09:31:30.725617089Z", "event_value": 100}

3.4. Globally enabling best_effort via users.xml

If you want all requests from a particular user or role profile to use best_effort by default, you can configure this in the users.xml configuration file. For example:

<yandex>
    <profiles>
        <my_profile>
            ...
            <date_time_input_format>best_effort</date_time_input_format>
        </my_profile>
    </profiles>
    <users>
        <my_user>
            <profile>my_profile</profile>
            ...
        </my_user>
    </users>
</yandex>

When ClickHouse is restarted, all new sessions of user my_user will run in best_effort mode.

3.5. Example of inserting a large JSON string with several columns

When parsing JSON data in ClickHouse, you often encounter a situation where a JSON file may contain fields that are not present in the target table. For example:

{"event_date_time": "2025-01-14T09:31:30Z", "event_value": 42, "some_extra_field": "ignored"}

If FORMAT JSONEachRow is used, the default “extra” fields may cause an error. To skip them, you can enable the setting:

SET input_format_skip_unknown_fields = 1;

Then ClickHouse will safely ignore some_extra_field, and parse the event_date_time according to the logic of the current date_time_input_format. If we have best_effort, then 2025-01-14T09:31:30Z will be successfully read.


4. Comparative performance analysis

4.1. Why base is faster

The base mode uses a clearly fixed parsing function: it checks strictly positively that the year consists of 4 digits, followed by a hyphen, followed by 2 digits of the month, and so on. Virtually no branching or checks for “what if there's a T or a space, if there's a Z suffix” and the like.

The best_effort mode, on the other hand, stores a set of patterns - sometimes quite large (especially in modern versions of ClickHouse that support time zone offsets, different delimiter options, fractional seconds, etc.). Each string is checked sequentially by several rules, which gives additional load.

4.2. When this can become noticeable

On small amounts of data, the difference is unlikely to be noticeable. However, if you are loading, say, billions of rows, and a large number of values have to be converted from rows to dates/times, the parsing time can become a bottle neck.

4.3. A small benchmark

To test the difference, you can prepare a dataset yourself (e.g. with 10^7 or 10^8 rows in CSV/JSON format) and run the insert twice - once with date_time_input_format='base', another time with best_effort'. Measure the query execution time or average insertion speed (rows/sec).

There are no official benchmarks from ClickHouse developers on this topic (at least at the time of writing), so you will have to make your own estimation based on actual usage conditions.

Benchmark:

-- Data preparation (10M rows)
SELECT now() + number AS ts FROM numbers(10000000) INTO OUTFILE 'data.tsv'

-- Insertion time measurement
clickhouse-client --query "INSERT INTO test_table FORMAT TSV" < data.tsv

5. Recommendations and Best Practices

5.1. When to Enable best_effort

5.2. When to Prefer base

5.3. Accuracy Support (DateTime64(N))

For DateTime64(N) types, fractional second parsing is particularly important. In base mode, the exact number of digits after the decimal point must match the declared precision. In best_effort, ClickHouse can recognize and truncate fractional seconds to the required precision.

5.4. Impact of T, Z, and Timezone Offsets

5.5. Debugging Parsing Issues


6. Conclusions

The choice between 'base' and 'best_effort' primarily depends on the nature of your data and performance requirements:

In practice, if you exchange data with microservices that output ISO 8601 formats (T, Z), enabling 'best_effort' is reasonable. For fixed formats with performance-critical requirements, 'base' is better.

Thus, the date_time_input_format setting in ClickHouse helps balance strict format control with flexibility in handling diverse date/time formats. Choose the approach that best fits your project.