Databases Interview Questions
Question

What is Index Cardinality and why does it matter?

Answer

Index Cardinality and Its Importance

Index cardinality is the number of distinct values stored in an indexed column (or combination of columns) in a database. It is usually judged relative to the table's row count and described as either high or low:

  • High Cardinality: This means that the column contains a large number of unique values. For example, a column storing unique user IDs or email addresses would have high cardinality.
  • Low Cardinality: This means that the column contains a small number of unique values. For example, a column storing gender values (e.g., 'Male' and 'Female') would have low cardinality.
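
As a minimal sketch of the definition above, cardinality can be counted by collecting a column's values into a set. The function name `column_cardinality` and the sample data are assumptions made up for illustration; they do not come from any particular database.

```python
from typing import Any, Iterable


def column_cardinality(values: Iterable[Any]) -> int:
    """Return the number of distinct values in a column."""
    return len(set(values))


# Hypothetical sample columns.
user_emails = ["a@example.com", "b@example.com", "c@example.com", "d@example.com"]
genders = ["Male", "Female", "Female", "Male"]

print(column_cardinality(user_emails))  # 4: every row has its own value
print(column_cardinality(genders))      # 2: values repeat across rows
```

With only four rows the gap is small, but the email column's cardinality grows with the table while the gender column's stays at two.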

Why Index Cardinality Matters

  1. Query Performance:

    • High Cardinality: High cardinality indexes are generally more selective, meaning they can significantly narrow down the search space for a query. This improves query performance because the database engine can quickly locate the rows that match the query criteria without scanning the entire table[1][3][8].
    • Low Cardinality: Low cardinality indexes are less selective and may not provide significant performance benefits. In some cases, a full table scan might be more efficient than using an index on a low cardinality column[2][3][17].
  2. Index Selectivity:

    • Selectivity is a measure of how well an index can filter out rows. It is calculated as the ratio of the number of distinct values to the total number of rows. For example, 1,000 distinct values across 1,000,000 rows gives a selectivity of 0.001. High selectivity (close to 1) means the index is very effective at filtering rows, while low selectivity (close to 0) means it is less effective[9][13][20].
  3. Storage and Maintenance:

    • High Cardinality: High cardinality indexes can consume more storage space because every distinct value must be stored and there are few repeated values for the engine to compress or deduplicate. However, they can reduce the number of data blocks read during query execution, so the performance gain can offset the storage cost[4][6][14].
    • Low Cardinality: Low cardinality indexes consume less storage space but may not justify the overhead of maintaining the index, especially if the index is not frequently used by queries[10][17].
  4. Impact on Write Operations:

    • Indexes must be updated whenever data is inserted, updated, or deleted, so every index adds write overhead. High cardinality indexes can slow down writes more because the database has to maintain a larger index structure. Low cardinality indexes may have less impact on write performance, but they offer limited benefit for reads[4][6][15].
  5. Query Optimization:

    • Database query optimizers use cardinality estimates to determine the most efficient execution plan for a query. Accurate cardinality statistics help the optimizer choose the best indexes and join strategies, leading to faster query execution[5][11][16].
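
The selectivity ratio and the optimizer statistics described above can be inspected with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the `users` table, its data, and the index names are hypothetical, and SQLite is used only because it ships with Python. Other engines expose similar statistics through their own commands and catalog tables.

```python
import sqlite3

# Hypothetical table and index names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO users (email, gender) VALUES (?, ?)",
    [(f"user{i}@example.com", "Male" if i % 2 else "Female") for i in range(1000)],
)
conn.execute("CREATE INDEX idx_users_email ON users (email)")
conn.execute("CREATE INDEX idx_users_gender ON users (gender)")


def selectivity(column: str) -> float:
    """Distinct values divided by total rows for one column of `users`."""
    # The column name is interpolated directly, so pass trusted names only.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM users"
    ).fetchone()
    return distinct / total


email_selectivity = selectivity("email")    # 1000 distinct / 1000 rows
gender_selectivity = selectivity("gender")  # 2 distinct / 1000 rows

# ANALYZE gathers the per-index statistics that SQLite's planner consults.
conn.execute("ANALYZE")
index_stats = dict(
    conn.execute("SELECT idx, stat FROM sqlite_stat1 WHERE idx IS NOT NULL")
)

print(f"email selectivity:  {email_selectivity}")
print(f"gender selectivity: {gender_selectivity}")
for index_name, stat in sorted(index_stats.items()):
    print(index_name, stat)
```

In SQLite's `sqlite_stat1` table, the `stat` string begins with the approximate number of rows in the index, followed by the approximate number of rows that share the same value in the first indexed column. A smaller second number means a more selective index.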

Practical Examples

  • High Cardinality: A column storing unique customer IDs in an e-commerce database. Indexing this column can significantly speed up queries that search for specific customers.
  • Low Cardinality: A column storing boolean values (e.g., 'is_active' with values 0 and 1). Indexing this column might not be beneficial because the index would not significantly reduce the search space.
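
To put rough numbers on these two examples, the sketch below computes the fraction of rows that an equality filter would return. The helper `matched_fraction` and the generated data are hypothetical, and the even split of `is_active` values is an assumption chosen for simplicity.

```python
from collections import Counter
from typing import Hashable, Sequence


def matched_fraction(values: Sequence[Hashable], target: Hashable) -> float:
    """Fraction of rows an equality filter on `target` would return."""
    return Counter(values)[target] / len(values)


# Hypothetical data: 10,000 customers, half of them active.
customer_ids = list(range(1, 10_001))
is_active = [i % 2 for i in range(10_000)]

print(matched_fraction(customer_ids, 4242))  # 0.0001
print(matched_fraction(is_active, 1))        # 0.5
```

A lookup by customer ID touches one row in 10,000, whereas filtering on `is_active = 1` still leaves half the table to read, which is why an index on the boolean column narrows the search so little here.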

In summary, understanding index cardinality is crucial for database performance tuning. High cardinality indexes can greatly improve query performance because they are highly selective, while low cardinality indexes often filter out too few rows to be worth their storage and write overhead.

Citations:
[1] https://stackoverflow.com/questions/2566211/what-is-cardinality-in-mysql
[2] https://distributedsystemsauthority.com/index-efficiency-and-maintenance-postgresql-12-high-performance-guide-part-5-12/
[3] https://www.actian.com/what-is-cardinality/
[4] https://stackoverflow.com/questions/2597179/mysql-index-cardinality-performance-vs-storage-efficiency
[5] https://vertabelo.com/blog/cardinality-in-database/
[6] https://www.linkedin.com/pulse/day11-compression-cardinality-farhan-khan
[7] https://www.elastic.co/blog/improving-the-performance-of-high-cardinality-terms-aggregations-in-elasticsearch
[8] https://www.mysqltutorial.org/mysql-index/mysql-index-cardinality/
[9] https://developer.couchbase.com/tutorial-understanding-cardinality-and-selectivity/
[10] https://aws.amazon.com/blogs/database/detect-and-fix-low-cardinality-indexes-in-amazon-documentdb/
[11] https://orangematter.solarwinds.com/2020/01/05/what-is-cardinality-in-a-database/
[12] https://www.timescale.com/blog/what-is-high-cardinality-how-do-time-series-databases-influxdb-timescaledb-compare/
[13] https://severalnines.com/blog/understanding-indexes-mysql-part-three/
[14] https://www.cockroachlabs.com/blog/data-cardinality-ultimate-tournament/
[15] https://www.mongodb.com/blog/post/performance-best-practices-indexing
[16] https://www.freecodecamp.org/news/database-indexing-at-a-glance-bb50809d48bd/
[17] https://stackoverflow.com/questions/2113181/does-it-make-sense-to-use-an-index-that-will-have-a-low-cardinality
[18] https://logicalread.com/mysql-index-cardinality-mc12/
[19] https://www.lullabot.com/articles/slow-queries-check-the-cardinality-of-your-mysql-indexes
[20] https://planetscale.com/learn/courses/mysql-for-developers/indexes/index-selectivity
