Published on: 23 January, 2022

State the steps involved in query processing. Why is the query optimization needed?


Query processing refers to the range of activities involved in extracting data from a database. The activities include translation of queries in high-level database languages into expressions that can be used at the physical level of the file system, a variety of query-optimizing transformations, and actual evaluation of queries. A database query is the vehicle for instructing a DBMS to update or retrieve specific data to/from the physically stored medium.

There are three phases that a query passes through during the DBMS’ processing of that query:

  1. Parsing and translation - Check syntax and verify relations. – Translate the query into an equivalent relational algebra expression.
  2. Optimization - Generate an optimal evaluation plan (with lowest cost) for the query plan.
  3. Evaluation - The query-execution engine takes an (optimal) evaluation plan, executes that plan, and returns the answers to the query.

Purpose of the Query Optimizer

The optimizer attempts to generate the best execution plan for a SQL statement. The best execution plan is defined as the plan with the lowest cost among all considered candidate plans. The cost computation accounts for factors of query execution such as I/O, CPU, and communication.

The best method of execution depends on numerous conditions including how the query is written, the size of the data set, the layout of the data, and which access structures exist. The optimizer determines the best plan for a SQL statement by examining multiple access methods, such as full table scan or index scans, and different join methods such as nested loops and hash joins.

Because the database has many internal statistics and tools at its disposal, the optimizer is usually in a better position than the user to determine the best method of statement execution. For this reason, all SQL statements use the optimizer.

Consider a user who queries records for employees who are managers. If the database statistics indicate that 80% of employees are managers, then the optimizer may decide that a full table scan is most efficient. However, if statistics indicate that few employees are managers, then reading an index followed by a table access by rowed may be more efficient than a full table scan.

Random questions