澳门新葡8455手机版-澳门新葡8455最新网站

您的位置:澳门新葡8455手机版 > 新闻资讯 > BitmapHeapScan那篇小说.

BitmapHeapScan那篇小说.

2019-10-05 10:19

四、参照他事他说加以考察资料

allpaths.ccost.hcostsize.cPG Document:Query Planning

二、源码解读

choose_bitmap_and函数create_index_paths->choose_bitmap_and函数,该函数给定非空的位图访谈路径链表,施行AND操作后统一到一条路径中,最终获得位图索引围观访问路线节点.

 /* * choose_bitmap_and * Given a nonempty list of bitmap paths, AND them into one path. * 给定非空的位图访问路径链表,执行AND操作后合并到一条路径中 * * This is a nontrivial decision since we can legally use any subset of the * given path set. We want to choose a good tradeoff between selectivity * and cost of computing the bitmap. * 这是一个非常重要的策略,因为这样可以合法地使用给定路径集的任何子集。 * * The result is either a single one of the inputs, or a BitmapAndPath * combining multiple inputs. * 输出结果要么是输出的其中之一,要么是融合多个输入之后的BitmapAndPath */ static Path * choose_bitmap_and(PlannerInfo *root, RelOptInfo *rel, List *paths) { int npaths = list_length; PathClauseUsage **pathinfoarray; PathClauseUsage *pathinfo; List *clauselist; List *bestpaths = NIL; Cost bestcost = 0; int i, j; ListCell *l; Assert(npaths > 0); /* else caller error */ if (npaths == 1) return  linitial; /* easy case */ /* * In theory we should consider every nonempty subset of the given paths. * In practice that seems like overkill, given the crude nature of the * estimates, not to mention the possible effects of higher-level AND and * OR clauses. Moreover, it's completely impractical if there are a large * number of paths, since the work would grow as O. * 理论上,我们应该考虑给定路径的所有非空子集。在实践中, * 考虑到估算的不确定性和成本,以及更高级别的AND和OR约束可能产生的影响,这样的做法并不合适. * 此外,它并不切合实际,如果有大量的路径,这项工作的复杂度会是指数级的O。 * * As a heuristic, we first check for paths using exactly the same sets of * WHERE clauses + index predicate conditions, and reject all but the * cheapest-to-scan in any such group. This primarily gets rid of indexes * that include the interesting columns but also irrelevant columns. (In * situations where the DBA has gone overboard on creating variant * indexes, this can make for a very large reduction in the number of * paths considered further.) * 作为一种启发式方法,首先使用完全相同的WHERE子句+索引谓词条件集检查路径, * 并去掉这类条件组中除成本最低之外的所有路径。 * 这主要是去掉了包含interesting列和不相关列的索引。 * (在DBA过度创建索引的情况下,这会大大减少进一步考虑的路径数量。) * * We then sort the surviving paths with the cheapest-to-scan first, and * for each path, consider using that path alone as the basis for a bitmap * scan. Then we consider bitmap AND scans formed from that path plus * each subsequent (higher-cost) path, adding on a subsequent path if it * results in a reduction in the estimated total scan cost. This means we * consider about O rather than O path combinations, which is * quite tolerable, especially given than N is usually reasonably small * because of the prefiltering step. The cheapest of these is returned. * 然后,我们首先使用成本最低的扫描路径对现存的路径进行排序, * 对于每个路径,考虑单独使用该路径作为位图扫描的基础。 * 然后我们考虑位图和从该路径形成的扫描加上每个后续的路径, * 如果后续路径导致估算的总扫描成本减少,那么就添加一个后续路径。 * 这意味着我们只需要处理O,而不是O个路径组合, * 这样的成本完全可以接受,特别是N通常相当小时。函数返回成本最低的路径。 * * We will only consider AND combinations in which no two indexes use the * same WHERE clause. This is a bit of a kluge: it's needed because * costsize.c and clausesel.c aren't very smart about redundant clauses. * They will usually double-count the redundant clauses, producing a * too-small selectivity that makes a redundant AND step look like it * reduces the total cost. Perhaps someday that code will be smarter and * we can remove this limitation. (But note that this also defends * against flat-out duplicate input paths, which can happen because * match_join_clauses_to_index will find the same OR join clauses that * extract_restriction_or_clauses has pulled OR restriction clauses out * of.) * 我们将只考虑没有两个索引同时使用相同的WHERE子句的AND组合。 * 这是一个有点蹩脚的做法:之所以这样是因为cost.c和clausesel.c未能足够聪明的处理多余的子句。 * 它们通常会重复计算冗余子句,从而产生很小的选择性,使冗余子句看起来像是减少了总成本。 * 也许有一天,代码会变得更聪明,我们可以消除这个限制。 * (但是要注意,这也可以防止完全重复的输入路径, * 因为match_join_clauses_to_index会找到相同的OR连接子句,而这些子句 * 已通过extract_restriction_or_clauses函数提升到外面去了.) * * For the same reason, we reject AND combinations in which an index * predicate clause duplicates another clause. Here we find it necessary * to be even stricter: we'll reject a partial index if any of its * predicate clauses are implied by the set of WHERE clauses and predicate * clauses used so far. This covers cases such as a condition "x = 42" * used with a plain index, followed by a clauseless scan of a partial * index "WHERE x >= 40 AND x < 50". The partial index has been accepted * only because "x = 42" was present, and so allowing it would partially * double-count selectivity. (We could use predicate_implied_by on * regular qual clauses too, to have a more intelligent, but much more * expensive, check for redundancy --- but in most cases simple equality * seems to suffice.) * 出于同样的原因,我们不会组合索引谓词子句与另一个重复的子句。 * 在这里,有必要更加严格 : 如果部分索引的任何谓词子句 * 隐含在WHERE子句中,则不能使用此索引。 * 这里包括了形如使用普通索引的“x = 42”和使用部分索引“x >= 40和x < 50”的情况。 * 部分索引被接受,是因为存在“x = 42”,因此允许它部分重复计数选择性。 * (我们也可以在普通的qual子句上使用predicate_implied_by函数, * 这样就可以更智能但更昂贵地检查冗余——但在大多数情况下,简单的等式似乎就足够了。) */ /* * Extract clause usage info and detect any paths that use exactly the * same set of clauses; keep only the cheapest-to-scan of any such groups. * The surviving paths are put into an array for qsort'ing. * 提取子句使用信息并检测使用完全相同子句集的所有路径; * 只保留这类路径中成本最低的,这些路径被放入一个数组中进行qsort'ing */ pathinfoarray = (PathClauseUsage **) palloc(npaths * sizeof(PathClauseUsage *));//数组 clauselist = NIL; npaths = 0; foreach//遍历paths { Path *ipath =  lfirst; pathinfo = classify_index_clause_usage(ipath, &clauselist);//归类路径信息 for (i = 0; i < npaths; i++) { if (bms_equal(pathinfo->clauseids, pathinfoarray[i]->clauseids)) break;//只要发现子句集一样,就继续执行 } if (i < npaths)//发现相同的 { /* duplicate clauseids, keep the cheaper one */ //相同的约束条件,只保留成本最低的 Cost ncost; Cost ocost; Selectivity nselec; Selectivity oselec; cost_bitmap_tree_node(pathinfo->path, &ncost, &nselec);//计算成本 cost_bitmap_tree_node(pathinfoarray[i]->path, &ocost, &oselec); if (ncost < ocost) pathinfoarray[i] = pathinfo; } else//没有发现条件一样的,添加到数组中 { /* not duplicate clauseids, add to array */ pathinfoarray[npaths++] = pathinfo; } } /* If only one surviving path, we're done */ if (npaths == 1)//结果只有一条,则返回之 return pathinfoarray[0]->path; /* Sort the surviving paths by index access cost */ qsort(pathinfoarray, npaths, sizeof(PathClauseUsage *), path_usage_comparator);//以索引访问成本排序现存路径 /* * For each surviving index, consider it as an "AND group leader", and see * whether adding on any of the later indexes results in an AND path with * cheaper total cost than before. Then take the cheapest AND group. * 对于现存的索引,把它视为"AND group leader", * 并查看是否添加了以后的索引后,会得到一个总成本比以前更低的AND路径。 * 选择成本最低的AND组. * */ for (i = 0; i < npaths; i++)//遍历这些路径 { Cost costsofar; List *qualsofar; Bitmapset *clauseidsofar; ListCell *lastcell; pathinfo = pathinfoarray[i];//PathClauseUsage结构体 paths = list_make1(pathinfo->path);//路径链表 costsofar = bitmap_scan_cost_est(root, rel, pathinfo->path);//当前的成本 qualsofar = list_concat(list_copy(pathinfo->quals), list_copy(pathinfo->preds)); clauseidsofar = bms_copy(pathinfo->clauseids); lastcell = list_head; /* 用于快速删除,for quick deletions */ for (j = i + 1; j < npaths; j++)//扫描后续的路径 { Cost newcost; pathinfo = pathinfoarray[j]; /* Check for redundancy */ if (bms_overlap(pathinfo->clauseids, clauseidsofar)) continue; /* 多余的路径,consider it redundant */ if (pathinfo->preds)//部分索引? { bool redundant = false; /* we check each predicate clause separately */ //单独检查每一个谓词 foreach(l, pathinfo->preds) { Node *np =  lfirst; if (predicate_implied_by(list_make1, qualsofar, false)) { redundant = true; break; /* out of inner foreach loop */ } } if (redundant) continue; } /* tentatively add new path to paths, so we can estimate cost */ //尝试在路径中添加新路径,这样我们就可以估算成本 paths = lappend(paths, pathinfo->path); newcost = bitmap_and_cost_est(root, rel, paths);//估算成本 if (newcost < costsofar)//新成本更低 { /* keep new path in paths, update subsidiary variables */ costsofar = newcost; qualsofar = list_concat(qualsofar, list_copy(pathinfo->quals));//添加此条件 qualsofar = list_concat(qualsofar, list_copy(pathinfo->preds));//添加此谓词 clauseidsofar = bms_add_members(clauseidsofar, pathinfo->clauseids);//添加此子句ID lastcell = lnext; } else { /* reject new path, remove it from paths list */ paths = list_delete_cell(paths, lnext, lastcell);//去掉新路径 } Assert(lnext == NULL); } /* Keep the cheapest AND-group (or singleton) */ if (i == 0 || costsofar < bestcost)//单条路径或者取得最小的成本 { bestpaths = paths; bestcost = costsofar; } /* some easy cleanup (we don't try real hard though) */ list_free(qualsofar); } if (list_length(bestpaths) == 1) return  linitial(bestpaths); /* 无需AND路径,no need for AND */ return  create_bitmap_and_path(root, rel, bestpaths);//生成BitmapAndPath } //-------------------------------------------------------------------------- bitmap_scan_cost_est /* * Estimate the cost of actually executing a bitmap scan with a single * index path (no BitmapAnd, at least not at this level; but it could be * a BitmapOr). */ static Cost bitmap_scan_cost_est(PlannerInfo *root, RelOptInfo *rel, Path *ipath) { BitmapHeapPath bpath; Relids required_outer; /* Identify required outer rels, in case it's a parameterized scan */ required_outer = get_bitmap_tree_required_outer; /* Set up a dummy BitmapHeapPath */ bpath.path.type = T_BitmapHeapPath; bpath.path.pathtype = T_BitmapHeapScan; bpath.path.parent = rel; bpath.path.pathtarget = rel->reltarget; bpath.path.param_info = get_baserel_parampathinfo(root, rel, required_outer); bpath.path.pathkeys = NIL; bpath.bitmapqual = ipath; /* * Check the cost of temporary path without considering parallelism. * Parallel bitmap heap path will be considered at later stage. */ bpath.path.parallel_workers = 0; cost_bitmap_heap_scan(&bpath.path, root, rel, bpath.path.param_info, ipath, get_loop_count(root, rel->relid, required_outer));//BitmapHeapPath计算成本 return bpath.path.total_cost; } //-------------------------------------------------------------------------- bitmap_and_cost_est /* * Estimate the cost of actually executing a BitmapAnd scan with the given * inputs. * 给定输入,估算实际执行BitmapAnd扫描的实际成本 */ static Cost bitmap_and_cost_est(PlannerInfo *root, RelOptInfo *rel, List *paths) { BitmapAndPath apath; BitmapHeapPath bpath; Relids required_outer; /* Set up a dummy BitmapAndPath */ apath.path.type = T_BitmapAndPath; apath.path.pathtype = T_BitmapAnd; apath.path.parent = rel; apath.path.pathtarget = rel->reltarget; apath.path.param_info = NULL; /* not used in bitmap trees */ apath.path.pathkeys = NIL; apath.bitmapquals = paths; cost_bitmap_and_node(&apath, root); /* Identify required outer rels, in case it's a parameterized scan */ required_outer = get_bitmap_tree_required_outer &apath); /* Set up a dummy BitmapHeapPath */ bpath.path.type = T_BitmapHeapPath; bpath.path.pathtype = T_BitmapHeapScan; bpath.path.parent = rel; bpath.path.pathtarget = rel->reltarget; bpath.path.param_info = get_baserel_parampathinfo(root, rel, required_outer); bpath.path.pathkeys = NIL; bpath.bitmapqual =  &apath; /* * Check the cost of temporary path without considering parallelism. * Parallel bitmap heap path will be considered at later stage. */ bpath.path.parallel_workers = 0; /* Now we can do cost_bitmap_heap_scan */ cost_bitmap_heap_scan(&bpath.path, root, rel, bpath.path.param_info,  &apath, get_loop_count(root, rel->relid, required_outer));//BitmapHeapPath计算成本 return bpath.path.total_cost; } //-------------------------------------------------------------------------- create_bitmap_and_path /* * create_bitmap_and_path * Creates a path node representing a BitmapAnd. */ BitmapAndPath * create_bitmap_and_path(PlannerInfo *root, RelOptInfo *rel, List *bitmapquals) { BitmapAndPath *pathnode = makeNode(BitmapAndPath); pathnode->path.pathtype = T_BitmapAnd; pathnode->path.parent = rel; pathnode->path.pathtarget = rel->reltarget; pathnode->path.param_info = NULL; /* not used in bitmap trees */ /* * Currently, a BitmapHeapPath, BitmapAndPath, or BitmapOrPath will be * parallel-safe if and only if rel->consider_parallel is set. So, we can * set the flag for this path based only on the relation-level flag, * without actually iterating over the list of children. */ pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel; pathnode->path.parallel_workers = 0; pathnode->path.pathkeys = NIL; /* always unordered */ pathnode->bitmapquals = bitmapquals; /* this sets bitmapselectivity as well as the regular cost fields: */ cost_bitmap_and_node(pathnode, root);//计算成本 return pathnode; }//----------------------------------------------------- cost_bitmap_and_node /* * cost_bitmap_and_node * Estimate the cost of a BitmapAnd node * 估算BitmapAnd节点成本 * * Note that this considers only the costs of index scanning and bitmap * creation, not the eventual heap access. In that sense the object isn't * truly a Path, but it has enough path-like properties (costs in particular) * to warrant treating it as one. We don't bother to set the path rows field, * however. */ void cost_bitmap_and_node(BitmapAndPath *path, PlannerInfo *root) { Cost totalCost; Selectivity selec; ListCell *l; /* * We estimate AND selectivity on the assumption that the inputs are * independent. This is probably often wrong, but we don't have the info * to do better. * * The runtime cost of the BitmapAnd itself is estimated at 100x * cpu_operator_cost for each tbm_intersect needed. Probably too small, * definitely too simplistic? */ totalCost = 0.0; selec = 1.0; foreach(l, path->bitmapquals) { Path *subpath =  lfirst; Cost subCost; Selectivity subselec; cost_bitmap_tree_node(subpath, &subCost, &subselec); selec *= subselec; totalCost += subCost; if (l != list_head(path->bitmapquals)) totalCost += 100.0 * cpu_operator_cost; } path->bitmapselectivity = selec; path->path.rows = 0; /* per above, not used */ path->path.startup_cost = totalCost; path->path.total_cost = totalCost; }

这一小节继续介绍查询物理优化中的create_index_paths->create_bitmap_heap_path函数,该函数创立位图堆扫描访谈路线节点。关于BitmapHeapScan的相关知识,请仿照效法PostgreSQL DBA - SeqScan vs IndexScan vs BitmapHeapScan那篇小说.本节未有描述具体的Cost开支总计方法,后续再行详述。

这一小节继续介绍查询物理优化中的create_index_paths->choose_bitmap_and,该函数奉行Bitmap AND操作后创立位图索引围观访谈路线(BitmapAndPath)节点。关于Bitmap Scan的连锁文化,请参见PostgreSQL DBA - SeqScan vs IndexScan vs BitmapHeapScan这篇文章.

二、源码解读

create_bitmap_heap_path函数create_index_paths->create_bitmap_heap_path函数,创造位图堆扫描访问路线节点.

 /* * create_bitmap_heap_path * Creates a path node for a bitmap scan. * 创建位图堆扫描访问路径节点 * * 'bitmapqual' is a tree of IndexPath, BitmapAndPath, and BitmapOrPath nodes. * bitmapqual-IndexPath, BitmapAndPath, and BitmapOrPath节点组成的树 * 'required_outer' is the set of outer relids for a parameterized path. * required_outer-参数化路径中依赖的外部relids * 'loop_count' is the number of repetitions of the indexscan to factor into * estimates of caching behavior. * loop_count-上一节已介绍 * * loop_count should match the value used when creating the component * IndexPaths. */ BitmapHeapPath * create_bitmap_heap_path(PlannerInfo *root, RelOptInfo *rel, Path *bitmapqual, Relids required_outer, double loop_count, int parallel_degree) { BitmapHeapPath *pathnode = makeNode(BitmapHeapPath);//创建节点 pathnode->path.pathtype = T_BitmapHeapScan; pathnode->path.parent = rel; pathnode->path.pathtarget = rel->reltarget; pathnode->path.param_info = get_baserel_parampathinfo(root, rel, required_outer); pathnode->path.parallel_aware = parallel_degree > 0 ? true : false; pathnode->path.parallel_safe = rel->consider_parallel; pathnode->path.parallel_workers = parallel_degree; pathnode->path.pathkeys = NIL; /* always unordered */ pathnode->bitmapqual = bitmapqual; cost_bitmap_heap_scan(&pathnode->path, root, rel, pathnode->path.param_info, bitmapqual, loop_count);//成本估算 return pathnode;//返回结果 } //-------------------------------------------------------- cost_bitmap_heap_scan /* * cost_bitmap_heap_scan * Determines and returns the cost of scanning a relation using a bitmap * index-then-heap plan. * 确定并返回使用BitmapIndexScan和BitmapHeapScan的成本. * * 'baserel' is the relation to be scanned * baserel-需扫描的Relation * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL * param_info-参数化信息 * 'bitmapqual' is a tree of IndexPaths, BitmapAndPaths, and BitmapOrPaths * bitmapqual-位图条件表达式,IndexPath, BitmapAndPath, and BitmapOrPath节点组成的树 * 'loop_count' is the number of repetitions of the indexscan to factor into * estimates of caching behavior * * Note: the component IndexPaths in bitmapqual should have been costed * using the same loop_count. */ void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel, ParamPathInfo *param_info, Path *bitmapqual, double loop_count) { Cost startup_cost = 0;//启动成本 Cost run_cost = 0;//执行成本 Cost indexTotalCost;//索引扫描总成本 QualCost qpqual_cost;//表达式成本 Cost cpu_per_tuple; Cost cost_per_page; Cost cpu_run_cost; double tuples_fetched; double pages_fetched; double spc_seq_page_cost, spc_random_page_cost; double T; /* Should only be applied to base relations */ Assert(IsA(baserel, RelOptInfo)); Assert(baserel->relid > 0); Assert(baserel->rtekind == RTE_RELATION); /* Mark the path with the correct row estimate */ if (param_info) path->rows = param_info->ppi_rows; else path->rows = baserel->rows; if (!enable_bitmapscan)//不允许位图扫描 startup_cost += disable_cost;//禁用之 pages_fetched = compute_bitmap_pages(root, baserel, bitmapqual, loop_count, &indexTotalCost, &tuples_fetched);//计算页面数 startup_cost += indexTotalCost;//启动成本为BitmapIndexScan的总成本 T = (baserel->pages > 1) ?  baserel->pages : 1.0;//页面数 /* Fetch estimated page costs for tablespace containing table. */ get_tablespace_page_costs(baserel->reltablespace, &spc_random_page_cost, &spc_seq_page_cost);//访问表空间页面成本 /* * For small numbers of pages we should charge spc_random_page_cost * apiece, while if nearly all the table's pages are being read, it's more * appropriate to charge spc_seq_page_cost apiece. The effect is * nonlinear, too. For lack of a better idea, interpolate like this to * determine the cost per page. * 对于少量的页面,每个页面的成本为spc_random_page_cost, * 而如果几乎所有的页面都被读取,则每个页面的成本为spc_seq_page_cost。 * 这种影响也是非线性的。由于缺乏更好的方法,通过插值法确定每页的成本。 */ if (pages_fetched >= 2.0) cost_per_page = spc_random_page_cost - (spc_random_page_cost - spc_seq_page_cost) * sqrt(pages_fetched / T); else cost_per_page = spc_random_page_cost; run_cost += pages_fetched * cost_per_page;//执行成本 /* * Estimate CPU costs per tuple. * 为每个元组估算CPU成本(Rechck步骤的成本) * * Often the indexquals don't need to be rechecked at each tuple ... but * not always, especially not if there are enough tuples involved that the * bitmaps become lossy. For the moment, just assume they will be * rechecked always. This means we charge the full freight for all the * scan clauses. * 通常情况下,索引约束条件不需要在每个元组上重新检查,但现实并非如此理想, * 尤其是当涉及到较多的元组时。就目前而言, * 优化器会假设它们总是会被重新检查。这意味着我们需要为所有扫描条件计算成本。 */ get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);//获取条件表达式 startup_cost += qpqual_cost.startup;//增加启动成本 cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;//增加处理每个元组的CPU成本 cpu_run_cost = cpu_per_tuple * tuples_fetched;//CPU运行成本 /* Adjust costing for parallelism, if used. */ if (path->parallel_workers > 0)//是否并行? { double parallel_divisor = get_parallel_divisor; /* The CPU cost is divided among all the workers. */ cpu_run_cost /= parallel_divisor; path->rows = clamp_row_est(path->rows / parallel_divisor); } //计算最终成本 run_cost += cpu_run_cost; /* tlist eval costs are paid per output row, not per tuple scanned */ startup_cost += path->pathtarget->cost.startup; run_cost += path->pathtarget->cost.per_tuple * path->rows; path->startup_cost = startup_cost; path->total_cost = startup_cost + run_cost; }//--------------------------------------- compute_bitmap_pages /* * compute_bitmap_pages * * compute number of pages fetched from heap in bitmap heap scan. * 计算页面数 */ double compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual, int loop_count, Cost *cost, double *tuple) { Cost indexTotalCost; Selectivity indexSelectivity; double T; double pages_fetched; double tuples_fetched; double heap_pages; long maxentries; /* * Fetch total cost of obtaining the bitmap, as well as its total * selectivity. * 获取位图的总成本,以及它的总选择性。 */ cost_bitmap_tree_node(bitmapqual, &indexTotalCost, &indexSelectivity); /* * Estimate number of main-table pages fetched. * 估算主表的页面数 */ tuples_fetched = clamp_row_est(indexSelectivity * baserel->tuples);//计算总元组数 T = (baserel->pages > 1) ?  baserel->pages : 1.0; /* * For a single scan, the number of heap pages that need to be fetched is * the same as the Mackert and Lohman formula for the case T <= b (ie, no * re-reads needed). * 对于单个扫描,需要获取的堆页面数量与T <= b的Mackert和Lohman公式相同。 */ pages_fetched = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched); /* * Calculate the number of pages fetched from the heap. Then based on * current work_mem estimate get the estimated maxentries in the bitmap. * (Note that we always do this calculation based on the number of pages * that would be fetched in a single iteration, even if loop_count > 1. * That's correct, because only that number of entries will be stored in * the bitmap at one time.) * 计算从堆中读取的页面数. * 根据当前的work_mem估算得到位图中粗略的最大访问入口。 * (请注意,我们总是根据单个迭代中获取的页面数来进行计算, * 即使loop_count > 1也是如此。因为只有该数量的条目在位图中只存储一次。 */ heap_pages = Min(pages_fetched, baserel->pages);//堆页面数 maxentries = tbm_calculate_entries(work_mem * 1024L);//位图最大入口数 if (loop_count > 1) { /* * For repeated bitmap scans, scale up the number of tuples fetched in * the Mackert and Lohman formula by the number of scans, so that we * estimate the number of pages fetched by all the scans. Then * pro-rate for one scan. */ pages_fetched = index_pages_fetched(tuples_fetched * loop_count, baserel->pages, get_indexpath_pages(bitmapqual), root); pages_fetched /= loop_count; } if (pages_fetched >= T) pages_fetched = T;//数据字典中的页面数 else pages_fetched = ceil(pages_fetched); if (maxentries < heap_pages)//最大入口数小于堆页面数 { double exact_pages; double lossy_pages; /* * Crude approximation of the number of lossy pages. Because of the * way tbm_lossify() is coded, the number of lossy pages increases * very sharply as soon as we run short of memory; this formula has * that property and seems to perform adequately in testing, but it's * possible we could do better somehow. * 粗略估计缺页的数目。由于tbm_lossify()的编码方式, * 一旦内存不足,缺页的数量就会急剧增加; * 这个公式有这个性质,在测试中表现得很好,但有可能做得更好。 */ lossy_pages = Max(0, heap_pages - maxentries / 2); exact_pages = heap_pages - lossy_pages; /* * If there are lossy pages then recompute the number of tuples * processed by the bitmap heap node. We assume here that the chance * of a given tuple coming from an exact page is the same as the * chance that a given page is exact. This might not be true, but * it's not clear how we can do any better. * 如果存在缺页面,则重新计算位图堆节点处理的元组数量。 * 这里假设给定元组来自精确页面的概率与给定页面的概率相同。 * 但这可能不符合实际情况,但我们不清楚如何才能做得更好:( */ if (lossy_pages > 0) tuples_fetched = clamp_row_est(indexSelectivity * (exact_pages / heap_pages) * baserel->tuples + (lossy_pages / heap_pages) * baserel->tuples); } if  *cost = indexTotalCost; if  *tuple = tuples_fetched; return pages_fetched; }//--------------------------- tbm_calculate_entries /* * tbm_calculate_entries * * Estimate number of hashtable entries we can have within maxbytes. */ long tbm_calculate_entries(double maxbytes) { long nbuckets; /* * Estimate number of hashtable entries we can have within maxbytes. This * estimates the hash cost as sizeof(PagetableEntry), which is good enough * for our purpose. Also count an extra Pointer per entry for the arrays * created during iteration readout. * 估计maxbytes中可以包含的哈希表条目的数量。 * 这将散列成本估计为sizeof(PagetableEntry),这已经足够好了。 * 还要为迭代读出期间创建的数组中每个条目计算额外的指针。 */ nbuckets = maxbytes / (sizeof(PagetableEntry) + sizeof + sizeof;//桶数 nbuckets = Min(nbuckets, INT_MAX - 1); /* safety limit */ nbuckets = Max(nbuckets, 16); /* sanity limit */ return nbuckets; }//--------------------------- cost_bitmap_tree_node /* * cost_bitmap_tree_node * Extract cost and selectivity from a bitmap tree node (index/and/or) */ void cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec) { if (IsA(path, IndexPath))//索引访问路径 { *cost = ((IndexPath *) path)->indextotalcost; *selec = ((IndexPath *) path)->indexselectivity; /* * Charge a small amount per retrieved tuple to reflect the costs of * manipulating the bitmap. This is mostly to make sure that a bitmap * scan doesn't look to be the same cost as an indexscan to retrieve a * single tuple. * 对每个检索到的元组计算少量成本,以反映操作位图的成本。 * 这主要是为了确保位图扫描与索引扫描检索单个元组的成本不一样。 */ *cost += 0.1 * cpu_operator_cost * path->rows; } else if (IsA(path, BitmapAndPath))//BitmapAndPath { *cost = path->total_cost; *selec = ((BitmapAndPath *) path)->bitmapselectivity; } else if (IsA(path, BitmapOrPath))//BitmapOrPath { *cost = path->total_cost; *selec = ((BitmapOrPath *) path)->bitmapselectivity; } else { elog(ERROR, "unrecognized node type: %d", nodeTag; *cost = *selec = 0; /* keep compiler quiet */ } }

上边是BitmapAnd访问路线的样例:

三、追踪剖判

测量检验脚本如下

select t1.* from t_dwxx t1 where dwbh > '10000' and dwbh < '30000';

启动gdb跟踪

 b create_bitmap_heap_pathBreakpoint 1 at 0x78f1c1: file pathnode.c, line 1090. cContinuing.Breakpoint 1, create_bitmap_heap_path (root=0x23d93d8, rel=0x248a788, bitmapqual=0x2473a08, required_outer=0x0, loop_count=1, parallel_degree=0) at pathnode.c:10901090 BitmapHeapPath *pathnode = makeNode(BitmapHeapPath);

创建节点,并赋值

1090 BitmapHeapPath *pathnode = makeNode(BitmapHeapPath); n1092 pathnode->path.pathtype = T_BitmapHeapScan; n1093 pathnode->path.parent = rel; n1094 pathnode->path.pathtarget = rel->reltarget; n1095 pathnode->path.param_info = get_baserel_parampathinfo(root, rel, 1097 pathnode->path.parallel_aware = parallel_degree > 0 ? true : false; 1098 pathnode->path.parallel_safe = rel->consider_parallel; 1099 pathnode->path.parallel_workers = parallel_degree; 1100 pathnode->path.pathkeys = NIL; /* always unordered */ 1102 pathnode->bitmapqual = bitmapqual;

进入cost_bitmap_heap_scan函数

 1104 cost_bitmap_heap_scan(&pathnode->path, root, rel, stepcost_bitmap_heap_scan (path=0x24737d8, root=0x23d93d8, baserel=0x248a788, param_info=0x0, bitmapqual=0x2473a08, loop_count=1) at costsize.c:949949 Cost startup_cost = 0;

输入参数,在那之中bitmapqual为T_IndexPath节点路线的别样主要消息:rows = 2223, startup_cost = 0.28500000000000003, total_cost = 169.23871600907944

 p *(IndexPath *)bitmapqual$2 = {path = {type = T_IndexPath, pathtype = T_IndexScan, parent = 0x248a788, pathtarget = 0x248a998, param_info = 0x0, parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 2223, startup_cost = 0.28500000000000003, total_cost = 169.23871600907944, pathkeys = 0x0}, indexinfo = 0x23b63b8, indexclauses = 0x2473948, indexquals = 0x2473b38, indexqualcols = 0x2473b88, indexorderbys = 0x0, indexorderbycols = 0x0, indexscandir = ForwardScanDirection, indextotalcost = 50.515000000000001, indexselectivity = 0.22227191011235958}

始于图谋本金

...980 startup_cost += indexTotalCost; p indexTotalCost$16 = 51.070750000000004 p startup_cost$17 = 0 p pages_fetched$18 = 64 p baserel->pages$19 = 64... p qpqual_cost$20 = {startup = 0, per_tuple = 0.0050000000000000001}

谈到底的寻访路线消息

 p *(BitmapHeapPath *)path$22 = {path = {type = T_BitmapHeapPath, pathtype = T_BitmapHeapScan, parent = 0x248a788, pathtarget = 0x248a998, param_info = 0x0, parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 2223, startup_cost = 51.070750000000004, total_cost = 148.41575, pathkeys = 0x0}, bitmapqual = 0x2473a08}

除此而外BitmapHeapPath,还会有BitmapOr和BitmapAnd,那二种Path的深入分析后续再详述.

一、数据结构

Cost相关小心:实际利用的参数值通过系统布署文件定义,实际不是这里的常量定义!

 typedef double Cost; /* execution cost (in page-access units) */ /* defaults for costsize.c's Cost parameters */ /* NB: cost-estimation code should use the variables, not these constants! */ /* 注意:实际值通过系统配置文件定义,而不是这里的常量定义! */ /* If you change these, update backend/utils/misc/postgresql.sample.conf */ #define DEFAULT_SEQ_PAGE_COST 1.0 //顺序扫描page的成本 #define DEFAULT_RANDOM_PAGE_COST 4.0 //随机扫描page的成本 #define DEFAULT_CPU_TUPLE_COST 0.01 //处理一个元组的CPU成本 #define DEFAULT_CPU_INDEX_TUPLE_COST 0.005 //处理一个索引元组的CPU成本 #define DEFAULT_CPU_OPERATOR_COST 0.0025 //执行一次操作或函数的CPU成本 #define DEFAULT_PARALLEL_TUPLE_COST 0.1 //并行执行,从一个worker传输一个元组到另一个worker的成本 #define DEFAULT_PARALLEL_SETUP_COST 1000.0 //构建并行执行环境的成本 #define DEFAULT_EFFECTIVE_CACHE_SIZE 524288 /*先前已有介绍, measured in pages */ double seq_page_cost = DEFAULT_SEQ_PAGE_COST; double random_page_cost = DEFAULT_RANDOM_PAGE_COST; double cpu_tuple_cost = DEFAULT_CPU_TUPLE_COST; double cpu_index_tuple_cost = DEFAULT_CPU_INDEX_TUPLE_COST; double cpu_operator_cost = DEFAULT_CPU_OPERATOR_COST; double parallel_tuple_cost = DEFAULT_PARALLEL_TUPLE_COST; double parallel_setup_cost = DEFAULT_PARALLEL_SETUP_COST; int effective_cache_size = DEFAULT_EFFECTIVE_CACHE_SIZE; Cost disable_cost = 1.0e10;//1后面10个0,通过设置一个巨大的成本,让优化器自动放弃此路径 int max_parallel_workers_per_gather = 2;//每次gather使用的worker数

PathClauseUsage

 /* Per-path data used within choose_bitmap_and() */ typedef struct { Path *path; /* 访问路径链表,IndexPath, BitmapAndPath, or BitmapOrPath */ List *quals; /* 限制条件子句链表,the WHERE clauses it uses */ List *preds; /* 部分索引谓词链表,predicates of its partial index */ Bitmapset *clauseids; /* 位图集合,quals+preds represented as a bitmapset */ } PathClauseUsage;

一、数据结构

Cost相关瞩目:实际使用的参数值通过系统布局文件定义,并非这里的常量定义!

 typedef double Cost; /* execution cost (in page-access units) */ /* defaults for costsize.c's Cost parameters */ /* NB: cost-estimation code should use the variables, not these constants! */ /* 注意:实际值通过系统配置文件定义,而不是这里的常量定义! */ /* If you change these, update backend/utils/misc/postgresql.sample.conf */ #define DEFAULT_SEQ_PAGE_COST 1.0 //顺序扫描page的成本 #define DEFAULT_RANDOM_PAGE_COST 4.0 //随机扫描page的成本 #define DEFAULT_CPU_TUPLE_COST 0.01 //处理一个元组的CPU成本 #define DEFAULT_CPU_INDEX_TUPLE_COST 0.005 //处理一个索引元组的CPU成本 #define DEFAULT_CPU_OPERATOR_COST 0.0025 //执行一次操作或函数的CPU成本 #define DEFAULT_PARALLEL_TUPLE_COST 0.1 //并行执行,从一个worker传输一个元组到另一个worker的成本 #define DEFAULT_PARALLEL_SETUP_COST 1000.0 //构建并行执行环境的成本 #define DEFAULT_EFFECTIVE_CACHE_SIZE 524288 /*先前已有介绍, measured in pages */ double seq_page_cost = DEFAULT_SEQ_PAGE_COST; double random_page_cost = DEFAULT_RANDOM_PAGE_COST; double cpu_tuple_cost = DEFAULT_CPU_TUPLE_COST; double cpu_index_tuple_cost = DEFAULT_CPU_INDEX_TUPLE_COST; double cpu_operator_cost = DEFAULT_CPU_OPERATOR_COST; double parallel_tuple_cost = DEFAULT_PARALLEL_TUPLE_COST; double parallel_setup_cost = DEFAULT_PARALLEL_SETUP_COST; int effective_cache_size = DEFAULT_EFFECTIVE_CACHE_SIZE; Cost disable_cost = 1.0e10;//1后面10个0,通过设置一个巨大的成本,让优化器自动放弃此路径 int max_parallel_workers_per_gather = 2;//每次gather使用的worker数

三、追踪剖析

测试脚本如下

select t1.* from t_dwxx t1 where (dwbh > '10000' and dwbh < '15000') AND (dwdz between 'DWDZ10000' and 'DWDZ15000');

启动gdb跟踪

 b choose_bitmap_andBreakpoint 1 at 0x74e8c2: file indxpath.c, line 1372. cContinuing.Breakpoint 1, choose_bitmap_and (root=0x1666638, rel=0x1666a48, paths=0x166fdf0) at indxpath.c:13721372 int npaths = list_length;

输入参数

 p *paths$1 = {type = T_List, length = 2, head = 0x166fe20, tail = 0x16706b8} p *paths->head->data.ptr_value$2 = {type = T_IndexPath} p *paths->head->next->data.ptr_value$3 = {type = T_IndexPath} set $p1=(IndexPath *)paths->head->data.ptr_value set $p2=(IndexPath *)paths->head->next->data.ptr_value p *$p1$4 = {path = {type = T_IndexPath, pathtype = T_IndexScan, parent = 0x1666a48, pathtarget = 0x166d988, param_info = 0x0, parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 33, startup_cost = 0.28500000000000003, total_cost = 116.20657683302848, pathkeys = 0x0}, indexinfo = 0x166e420, indexclauses = 0x166f528, indexquals = 0x166f730, indexqualcols = 0x166f780, indexorderbys = 0x0, indexorderbycols = 0x0, indexscandir = ForwardScanDirection, indextotalcost = 18.205000000000002, indexselectivity = 0.059246954595791879} p *$p2$5 = {path = {type = T_IndexPath, pathtype = T_IndexScan, parent = 0x1666a48, pathtarget = 0x166d988, param_info = 0x0, parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 33, startup_cost = 0.28500000000000003, total_cost = 111.33157683302848, pathkeys = 0x0}, indexinfo = 0x1666c58, indexclauses = 0x166fed0, indexquals = 0x166ffc8, indexqualcols = 0x1670018, indexorderbys = 0x0, indexorderbycols = 0x0, indexscandir = ForwardScanDirection, indextotalcost = 13.855, indexselectivity = 0.055688888888888899}

paths中的第三个成分对应(dwbh > '一千0' and dwbh < '14000') ,第3个要素对应(dwdz between 'DWDZ10000' and 'DWDZ1四千')

 set $ri1=(RestrictInfo *)$p1->indexclauses->head->data.ptr_value set $tmp=(RelabelType *)($ri1->clause)->args->head->data.ptr_value p *$tmp->arg$17 = {xpr = {type = T_Var}, varno = 1, varattno = 3, vartype = 1043, vartypmod = 104, varcollid = 100, varlevelsup = 0, varnoold = 1, varoattno = 3, location = 76} p *($ri1->clause)->args->head->next->data.ptr_value$18 = {type = T_Const} p *($ri1->clause)->args->head->next->data.ptr_value$19 = {xpr = {type = T_Const}, consttype = 25, consttypmod = -1, constcollid = 100, constlen = -1, constvalue = 23636608, constisnull = false, constbyval = false, location = 89}

最早遍历paths,提取子句条件并检查评定是不是选择千篇一律子句集的兼具路径,只保留这个路子中资金最低的,那么些门路被归入一个数组中举办qsort.

... 1444 npaths = 0; 1445 foreach 

采摘音讯到PathClauseUsage数组中

... n1471 pathinfoarray[npaths++] = pathinfo; 1445 foreach 1476 if (npaths == 1) p npaths$26 = 2 

按花费排序

 n1480 qsort(pathinfoarray, npaths, sizeof(PathClauseUsage *),

遍历路线,找到开销最低的AND group

1488 for (i = 0; i < npaths; i++) n1495 pathinfo = pathinfoarray[i]; 1496 paths = list_make1(pathinfo->path); 1497 costsofar = bitmap_scan_cost_est(root, rel, pathinfo->path); 1499 list_copy(pathinfo->preds));

取伏贴前的资本,设置当前的尺度子句

 p costsofar$27 = 89.003250000000008 n1498 qualsofar = list_concat(list_copy(pathinfo->quals),

试行AND操作,花费更低,调治当前资金财产和有关变量

 n1531 newcost = bitmap_and_cost_est(root, rel, paths); 1532 if (newcost < costsofar) p newcost$30 = 88.375456720095343 n1535 costsofar = newcost; n1537 list_copy(pathinfo->quals)); 1536 qualsofar = list_concat(qualsofar, 1539 list_copy(pathinfo->preds));

拍卖下贰个AND条件,单个AND条件比上三个规范化开支高,保留原本的

1488 for (i = 0; i < npaths; i++) 1495 pathinfo = pathinfoarray[i]; 1496 paths = list_make1(pathinfo->path); 1497 costsofar = bitmap_scan_cost_est(root, rel, pathinfo->path); 1499 list_copy(pathinfo->preds)); p costsofar$34 = 94.053250000000006 n1498 qualsofar = list_concat(list_copy(pathinfo->quals), 1500 clauseidsofar = bms_copy(pathinfo->clauseids); 1501 lastcell = list_head; /* for quick deletions */ 1503 for (j = i + 1; j < npaths; j++) 1553 if (i == 0 || costsofar < bestcost) p i$35 = 1 p costsofar$36 = 94.053250000000006 p bestcost$37 = 88.375456720095343 

构建BitmapAndPath,返回

 n1563 if (list_length(bestpaths) == 1) 1565 return  create_bitmap_and_path(root, rel, bestpaths); 1566 }

DONE!

 ncreate_index_paths (root=0x1666638, rel=0x1666a48) at indxpath.c:337337 bpath = create_bitmap_heap_path(root, rel, bitmapqual,

四、参照他事他说加以考察资料

allpaths.ccost.hcostsize.cPG Document:Query Planning

testdb=# explain verbose select t1.* testdb-# from t_dwxx t1 testdb-# where (dwbh > '10000' and dwbh < '15000') AND (dwdz between 'DWDZ10000' and 'DWDZ15000');QUERY PLAN ---------------------------------------------------------------------------------------------- Bitmap Heap Scan on public.t_dwxx t1 (cost=32.33..88.38 rows=33 width=20) Output: dwmc, dwbh, dwdz Recheck Cond: (::text > '10000'::text) AND ::text < '15000'::text) AND ::text >= 'DWDZ10000'::text) AND ::text <= 'DWDZ15000'::text)) -> BitmapAnd (cost=32.33..32.33 rows=33 width=0) -->BitmapAnd -> Bitmap Index Scan on t_dwxx_pkey (cost=0.00..13.86 rows=557 width=0) Index Cond: (::text > '10000'::text) AND ::text < '15000'::text)) -> Bitmap Index Scan on idx_dwxx_dwdz (cost=0.00..18.21 rows=592 width=0) Index Cond: (::text >= 'DWDZ10000'::text) AND ::text <= 'DWDZ15000'::text))

本文由澳门新葡8455手机版发布于新闻资讯,转载请注明出处:BitmapHeapScan那篇小说.

关键词:

  • 上一篇:没有了
  • 下一篇:没有了