question:Q: Query performance is really bad with multiple joins and unions, is there any other way to improve execution time for this? I am running this query inside a stored procedure, but I want to improve performance since it's taking hours to execute and affects 13.2 million rows. Is there any way to improve performance? I am using PostgreSQL + pgAdmin.

Query:

INSERT INTO t_temporary_id_table (
    ref_date, id, client, slcunda, count_single_1, count_double_2,
    count_e, count_g, count_m, active, valid_till_max, id_2, created, lastmodified
)
with cte_tmp as (
    select
        a.id,
        mm.tenant,
        c.slcunda,
        d.single,
        d.count_single,
        e.double_1,
        e.count_double_1,
        SUM(CASE WHEN id_role = 'E' THEN 1 ELSE 0 END) AS count_e,
        SUM(CASE WHEN id_role = 'G' THEN 1 ELSE 0 END) AS count_g,
        SUM(CASE WHEN id_role = 'M' THEN 1 ELSE 0 END) AS count_m,
        case when min(status) = 'active' then 1 else 0 end active,
        MAX(valid_till) as valid_till_max
    from schema1.struct a
    inner join (
        select id, max(valid_till) valid_till_max
        from schema1.struct a
        group by id
    ) b on a.id = b.id and a.valid_till = b.valid_till_max
    left outer join schema2.tenants mm on a.tsl_1_2 = mm.tenant
    left outer join (
        select id, key_1 as slcunda
        from schema1.t_id
        where id in (select id from schema1.t_id group by id having count(id) = 1)
    ) c on a.id = c.id
    left outer join (
        select id, count_single, single
        from (
            select id, id_2 as single, id_2_role,
                   count(id) over (partition by id) as count_single,
                   row_number() over (partition by id order by id, id_2_role desc) as rn
            from schema1.different_id_2
            where id_2_role in ('03','08','17')
        ) a
        where rn = 1
    ) d on a.id = d.id
    left outer join (
        select id, count_double_1, double_1
        from (
            select id, id_2 as double_1, id_2_role,
                   count(id) over (partition by id) as count_double_1,
                   row_number() over (partition by id order by id, id_2_role desc) as rn
            from schema1.different_id_2
            where id_2_role in ('06','19')
        ) a
        where rn = 1
    ) e on a.id = e.id
    group by a.id, mm.client, c.slcunda, d.single, d.count_single, e.double_1, e.count_double_1
),
y as (
    select * from (
        SELECT cte_tmp.id, client, slcunda, count_single as count_single_1, count_double_1 as count_double_2,
               count_e, count_g, count_m, active, valid_till_max, b.id_2
        FROM cte_tmp
        inner join (
            select id, id_2
            from (
                select id, id_theory as id_2, row_number() over (partition by id order by id) rn
                from schema1.struct
            ) a
            where rn = 1
        ) b on cte_tmp.id = b.id
        where count_e = 1 and count_g = 0 and count_m = 0 and count_single = 0
        union all
        SELECT id, client, slcunda, count_single as count_single_1, count_double_1 as count_double_2,
               count_e, count_g, count_m, active, valid_till_max, single as id_2
        FROM cte_tmp
        where count_e = 1 and count_g = 0 and count_m = 0 and count_single >= 1
        union all
        SELECT id, client, slcunda, count_single as count_single_1, count_double_1 as count_double_2,
               count_e, count_g, count_m, active, valid_till_max, double_1 as id_2
        FROM cte_tmp
        where count_e = 0 and count_g = 1 and count_m >= 1 and active = 1 and count_double >= 1
        union all
        SELECT id, client, slcunda, count_single as count_single_1, count_double_1 as count_double_2,
               count_e, count_g, count_m, active, valid_till_max, double_1 as id_2
        FROM cte_tmp
        where count_e <> 1 and count_g <> 0 and count_m <> 0 and active = 0 and count_double >= 1
    ) a
),
z as (
    SELECT cte_tmp.id, client, slcunda, count_single as count_single_1, count_double_1 as count_double_2,
           count_e, count_g, count_m, active, valid_till_max
    FROM cte_tmp
    except
    select id, client, slcunda, count_single_1, count_double_2, count_e, count_g, count_m, active, valid_till_max
    from y
),
temporary_result as (
    select id::bigint, client, slcunda, count_single_1, count_double_2, count_e, count_g, count_m, active, valid_till_max, '' as id_2
    from z
    union all
    select id::bigint, client, slcunda, count_single_1, count_double_2, count_e, count_g, count_m, active, valid_till_max, id_2
    from y
)
select now(), id, client, slcunda, count_single_1, count_double_2, count_e, count_g, count_m, active, valid_till_max, id_2::bigint, now(), now()
from temporary_result

QUERY PLAN

Insert on t_temporary_id_table (cost=841905.48..842027.08 rows=206 width=48) (actual time=28853.074..28853.111 rows=0 loops=1)
  Buffers: shared hit=111871, local hit=1010060 read=2 dirtied=9372 written=18742, temp read=114117 written=92895
  I/O Timings: read=0.007
  -> Subquery Scan on *SELECT* (cost=841905.48..842027.08 rows=206 width=48) (actual time=26620.987..28062.291 rows=991322 loops=1)
       Buffers: shared hit=111871, temp read=114117 written=92895
       -> Result (cost=841905.48..842019.35 rows=206 width=102) (actual time=26620.981..27612.767 rows=991322 loops=1)
            Buffers: shared hit=111871, temp read=114117 written=92895
            CTE cte_tmp
              -> GroupAggregate (cost=655151.28..655242.89 rows=1745 width=97) (actual time=20685.859..22471.231 rows=991322 loops=1)
                   Group Key: a.id, mm.tenant, t_id.key_1, a_1.single, a_1.count_single, a_2.double_1, a_2.count_double_1
                   Buffers: shared hit=91523, temp read=77862 written=77948
                   -> Sort (cost=655151.28..655155.64 rows=1745 width=78) (actual time=20685.833..21282.869 rows=999585 loops=1)
                        Sort Key: a.id, mm.tenant, t_id.key_1, a_1.single, a_1.count_single, a_2.double_1, a_2.count_double_1
                        Sort Method: external merge Disk: 52920kB
                        Buffers: shared hit=91523, temp read=77862 written=77948
                        -> Hash Left Join (cost=605059.82..655057.32 rows=1745 width=78) (actual time=14404.197..19018.340 rows=999585 loops=1)
                             Hash Cond: (a.tsl_1_2 = (mm.tenant)::numeric)
                             Buffers: shared hit=91523, temp read=67869 written=67928
                             -> Hash Right Join (cost=605057.61..655028.93 rows=1745 width=79) (actual time=14404.146..18397.854 rows=999585 loops=1)
                                  Hash Cond: (a_2.id = a.id)
                                  Buffers: shared hit=91522, temp read=67869 written=67928
                                  -> Subquery Scan on a_2 (cost=143319.32..193271.82 rows=4995 width=33) (actual time=1706.052..3753.365 rows=999246 loops=1)
                                       Filter: (a_2.rn = 1)
                                       Buffers: shared hit=7358, temp read=7362 written=7387
                                       -> WindowAgg (cost=143319.32..180783.69 rows=999050 width=46) (actual time=1706.049..3584.785 rows=999246 loops=1)
                                            Buffers: shared hit=7358, temp read=7362 written=7387
                                            -> WindowAgg (cost=143319.32..165797.94 rows=999050 width=38) (actual time=1706.034..2812.484 rows=999246 loops=1)
                                                 Buffers: shared hit=7358, temp read=7362 written=7387
                                                 -> Sort (cost=143319.32..145816.94 rows=999050 width=30) (actual time=1706.021..2170.216 rows=999246 loops=1)
                                                      Sort Key: different_id_2.id, different_id_2.id_2_role DESC
                                                      Sort Method: external merge Disk: 40200kB
                                                      Buffers: shared hit=7358, temp read=7362 written=7387
                                                      -> Seq Scan on different_id_2 (cost=0.00..19858.00 rows=999050 width=30) (actual time=0.029..419.960 rows=999246 loops=1)
                                                           Filter: (id_2_role = ANY ('{6,19}'::numeric[]))
                                                           Rows Removed by Filter: 754
                                                           Buffers: shared hit=7358
                                  -> Hash (cost=461716.48..461716.48 rows=1745 width=54) (actual time=12696.847..12696.862 rows=999585 loops=1)
                                       Buckets: 65536 (originally 2048) Batches: 32 (originally 1) Memory Usage: 3585kB
                                       Buffers: shared hit=84164, temp read=45823 written=51723
                                       -> Hash Left Join (cost=407582.29..461716.48 rows=1745 width=54) (actual time=8198.967..12034.422 rows=999585 loops=1)
                                            Hash Cond: (a.id = a_1.id)
                                            Buffers: shared hit=84164, temp read=45823 written=45857
                                            -> Hash Left Join (cost=386461.36..440588.99 rows=1745 width=29) (actual time=7848.284..11342.813 rows=999585 loops=1)
                                                 Hash Cond: (a.id = t_id.id)
                                                 Buffers: shared hit=76806, temp read=45823 written=45857
                                                 -> Hash Join (cost=194219.00..248340.00 rows=1745 width=26) (actual time=2108.327..4049.878 rows=999585 loops=1)
                                                      Hash Cond: ((a.id = struct.id) AND (a.valid_till = (max(struct.valid_till))))
                                                      Buffers: shared hit=40696, temp read=13666 written=13683
                                                      -> Seq Scan on struct a (cost=0.00..30348.00 rows=1000000 width=26) (actual time=0.016..323.130 rows=1000000 loops=1)
                                                           Buffers: shared hit=20348
                                                      -> Hash (cost=174465.86..174465.86 rows=993476 width=12) (actual time=2107.791..2107.793 rows=991322 loops=1)
                                                           Buckets: 131072 Batches: 16 Memory Usage: 3929kB
                                                           Buffers: shared hit=20348, temp read=3902 written=7995
                                                           -> GroupAggregate (cost=147096.34..164531.10 rows=993476 width=12) (actual time=884.285..1782.278 rows=991322 loops=1)
                                                                Group Key: struct.id
                                                                Buffers: shared hit=20348, temp read=3902 written=3919
                                                                -> Sort (cost=147096.34..149596.34 rows=1000000 width=12) (actual time=884.272..1368.369 rows=1000000 loops=1)
                                                                     Sort Key: struct.id
                                                                     Sort Method: external merge Disk: 25496kB
                                                                     Buffers: shared hit=20348, temp read=3902 written=3919
                                                                     -> Seq Scan on struct (cost=0.00..30348.00 rows=1000000 width=12) (actual time=0.007..308.301 rows=1000000 loops=1)
                                                                          Buffers: shared hit=20348
                                                 -> Hash (cost=192179.85..192179.85 rows=5000 width=11) (actual time=5738.432..5738.436 rows=1000000 loops=1)
                                                      Buckets: 131072 (originally 8192) Batches: 16 (originally 1) Memory Usage: 3748kB
                                                      Buffers: shared hit=36110, temp read=22791 written=26489
                                                      -> Hash Join (cost=161499.84..192179.85 rows=5000 width=11) (actual time=1913.550..5318.612 rows=1000000 loops=1)
                                                           Hash Cond: (t_id.id = t_id_1.id)
                                                           Buffers: shared hit=36110, temp read=22791 written=22808
                                                           -> Seq Scan on t_id (cost=0.00..28055.00 rows=1000000 width=11) (actual time=0.008..182.515 rows=1000000 loops=1)
                                                                Buffers: shared hit=18055
                                                           -> Hash (cost=161437.34..161437.34 rows=5000 width=8) (actual time=1913.370..1913.372 rows=1000000 loops=1)
                                                                Buckets: 131072 (originally 8192) Batches: 16 (originally 1) Memory Usage: 3548kB
                                                                Buffers: shared hit=18055, temp read=2861 written=6190
                                                                -> GroupAggregate (cost=141387.34..161387.34 rows=5000 width=8) (actual time=726.580..1619.438 rows=1000000 loops=1)
                                                                     Group Key: t_id_1.id
                                                                     Filter: (count(t_id_1.id) = 1)
                                                                     Buffers: shared hit=18055, temp read=2861 written=2878
                                                                     -> Sort (cost=141387.34..143887.34 rows=1000000 width=8) (actual time=726.568..1204.865 rows=1000000 loops=1)
                                                                          Sort Key: t_id_1.id
                                                                          Sort Method: external merge Disk: 18704kB
                                                                          Buffers: shared hit=18055, temp read=2861 written=2878
                                                                          -> Seq Scan on t_id t_id_1 (cost=0.00..28055.00 rows=1000000 width=8) (actual time=0.004..156.615 rows=1000000 loops=1)
                                                                               Buffers: shared hit=18055
                                            -> Hash (cost=21120.92..21120.92 rows=1 width=33) (actual time=350.644..350.647 rows=143 loops=1)
                                                 Buckets: 1024 Batches: 1 Memory Usage: 19kB
                                                 Buffers: shared hit=7358
                                                 -> Subquery Scan on a_1 (cost=21113.42..21120.92 rows=1 width=33) (actual time=350.339..350.604 rows=143 loops=1)
                                                      Filter: (a_1.rn = 1)
                                                      Buffers: shared hit=7358
                                                      -> WindowAgg (cost=21113.42..21119.05 rows=150 width=46) (actual time=350.335..350.578 rows=143 loops=1)
                                                           Buffers: shared hit=7358
                                                           -> WindowAgg (cost=21113.42..21116.80 rows=150 width=38) (actual time=350.322..350.440 rows=143 loops=1)
                                                                Buffers: shared hit=7358
                                                                -> Sort (cost=21113.42..21113.80 rows=150 width=30) (actual time=350.302..350.318 rows=143 loops=1)
                                                                     Sort Key: different_id_2_1.id, different_id_2_1.id_2_role DESC
                                                                     Sort Method: quicksort Memory: 36kB
                                                                     Buffers: shared hit=7358
                                                                     -> Seq Scan on different_id_2 different_id_2_1 (cost=0.00..21108.00 rows=150 width=30) (actual time=0.987..350.004 rows=143 loops=1)
                                                                          Filter: (id_2_role = ANY ('{03,08,17}'::numeric[]))
                                                                          Rows Removed by Filter: 999857
                                                                          Buffers: shared hit=7358
                             -> Hash (cost=1.54..1.54 rows=54 width=8) (actual time=0.037..0.038 rows=54 loops=1)
                                  Buckets: 1024 Batches: 1 Memory Usage: 11kB
                                  Buffers: shared hit=1
                                  -> Seq Scan on tenants mm (cost=0.00..1.54 rows=54 width=8) (actual time=0.015..0.023 rows=54 loops=1)
                                       Buffers: shared hit=1
            CTE y
              -> Append (cost=153984.20..186662.59 rows=6 width=122) (actual time=1196.616..1645.006 rows=3588 loops=1)
                   Buffers: shared hit=20348, temp read=36255 written=6487
                   -> Merge Join (cost=153984.20..186496.72 rows=1 width=95) (actual time=1195.226..1195.229 rows=0 loops=1)
                        Merge Cond: (a_3.id = cte_tmp_1.id)
                        Buffers: shared hit=20348, temp read=10872 written=6487
                        -> Subquery Scan on a_3 (cost=153931.84..186431.84 rows=5000 width=25) (actual time=1034.841..1034.842 rows=1 loops=1)
                             Filter: (a_3.rn = 1)
                             Buffers: shared hit=20348, temp read=2411 written=6486
                             -> WindowAgg (cost=153931.84..173931.84 rows=1000000 width=33) (actual time=1034.839..1034.840 rows=1 loops=1)
                                  Buffers: shared hit=20348, temp read=2411 written=6486
                                  -> Sort (cost=153931.84..156431.84 rows=1000000 width=25) (actual time=1034.824..1034.826 rows=2 loops=1)
                                       Sort Key: struct_1.id
                                       Sort Method: external merge Disk: 35272kB
                                       Buffers: shared hit=20348, temp read=2411 written=6486
                                       -> Seq Scan on struct struct_1 (cost=0.00..30348.00 rows=1000000 width=25) (actual time=0.019..229.951 rows=1000000 loops=1)
                                            Buffers: shared hit=20348
                        -> Sort (cost=52.36..52.37 rows=1 width=78) (actual time=160.379..160.379 rows=0 loops=1)
                             Sort Key: cte_tmp_1.id
                             Sort Method: quicksort Memory: 25kB
                             Buffers: temp read=8461 written=1
                             -> CTE Scan on cte_tmp cte_tmp_1 (cost=0.00..52.35 rows=1 width=78) (actual time=160.369..160.369 rows=0 loops=1)
                                  Filter: ((count_e = 1) AND (count_g = 0) AND (count_m = 0) AND (count_single = 0))
                                  Rows Removed by Filter: 991322
                                  Buffers: temp read=8461 written=1
                   -> CTE Scan on cte_tmp cte_tmp_2 (cost=0.00..52.35 rows=1 width=128) (actual time=1.387..145.031 rows=5 loops=1)
                        Filter: ((count_single >= 1) AND (count_e = 1) AND (count_g = 0) AND (count_m = 0))
                        Rows Removed by Filter: 991317
                        Buffers: temp read=8461
                   -> CTE Scan on cte_tmp cte_tmp_3 (cost=0.00..56.71 rows=1 width=128) (actual time=0.054..151.483 rows=3579 loops=1)
                        Filter: ((count_m >= 1) AND (count_double >= 1) AND (count_e = 0) AND (count_g = 1) AND (active = 1))
                        Rows Removed by Filter: 987743
                        Buffers: temp read=8461
                   -> CTE Scan on cte_tmp cte_tmp_4 (cost=0.00..56.71 rows=3 width=128) (actual time=31.046..152.850 rows=4 loops=1)
                        Filter: ((count_e <> 1) AND (count_g <> 0) AND (count_m <> 0) AND (count_double >= 1) AND (active = 0))
                        Rows Removed by Filter: 991318
                        Buffers: temp read=8461
            -> Append (cost=0.00..108.73 rows=206 width=103) (actual time=26620.976..27439.782 rows=991322 loops=1)
                 Buffers: shared hit=111871, temp read=114117 written=92895
                 -> Subquery Scan on z (cost=0.00..107.56 rows=200 width=102) (actual time=26620.975..27356.994 rows=987734 loops=1)
                      Buffers: shared hit=111871, temp read=114117 written=92895
                      -> HashSetOp Except (cost=0.00..105.06 rows=200 width=82) (actual time=26620.971..27011.650 rows=987734 loops=1)
                           Buffers: shared hit=111871, temp read=114117 written=92895
                           -> Append (cost=0.00..61.28 rows=1751 width=82) (actual time=20685.869..25367.475 rows=994910 loops=1)
                                Buffers: shared hit=111871, temp read=114117 written=92895
                                -> Subquery Scan on *SELECT* 1 (cost=0.00..52.35 rows=1745 width=82) (actual time=20685.867..23595.424 rows=991322 loops=1)
                                     Buffers: shared hit=91523, temp read=77862 written=86408
                                     -> CTE Scan on cte_tmp (cost=0.00..34.90 rows=1745 width=78) (actual time=20685.866..23423.809 rows=991322 loops=1)
                                          Buffers: shared hit=91523, temp read=77862 written=86408
                                -> Subquery Scan on *SELECT* 2 (cost=0.00..0.18 rows=6 width=82) (actual time=1196.623..1648.210 rows=3588 loops=1)
                                     Buffers: shared hit=20348, temp read=36255 written=6487
                                     -> CTE Scan on y y_1 (cost=0.00..0.12 rows=6 width=78) (actual time=1196.621..1647.533 rows=3588 loops=1)
                                          Buffers: shared hit=20348, temp read=36255 written=6487
                 -> CTE Scan on y (cost=0.00..0.14 rows=6 width=120) (actual time=0.004..1.137 rows=3588 loops=1)
Planning Time: 1.647 ms
Execution Time: 28921.102 ms

Edit: Added the EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) result, but how can I use this to improve performance?

A:
AND count_d >= 1
NULL count_d will fail that test, just like 0 does, thus there is no need for the COALESCE(count_d, 0) at all.

A: You don't show the execution plan, but given that you have the correct index, the coalesce need not be a problem. Assuming that all these conditions are selective, that is, they narrow the result set considerably, the ideal index would be

CREATE INDEX ON table_a (count_e, count_f, count_g, coalesce(count_d, 0));

If one of the conditions is not selective, omit it from the index.

A: This answers the original question, which was about a single table. The question has been drastically altered since then.

coalesce() is superfluous, because NULL will fail the condition, so:

where count_e = 1
  and count_f = 0
  and count_g = 0
  and count_d >= 1

However, it is really doubtful that removing the coalesce() would affect performance. What would change performance is writing the query like this and then adding an index on (count_e, count_f, count_g, count_d). The order of the first three columns in the index is not important, but count_d should follow those columns, because of the inequality.

What is the reason that the coalesce() is stated to be superfluous in the given WHERE clause?

answer:The coalesce() is stated to be superfluous because NULL values in count_d already fail the condition count_d >= 1 on their own. A comparison involving NULL evaluates to NULL rather than true, and a WHERE clause only keeps rows whose condition is true, so a row with a NULL count_d is filtered out just as a row with 0 would be. Replacing NULL with 0 via coalesce(count_d, 0) therefore changes nothing about which rows pass: the row is rejected either way. That is exactly what the quoted answer means by "coalesce() is superfluous, because NULL will fail the condition."
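A minimal sketch that makes the point concrete. The table and column names (table_a, count_e, count_f, count_g, count_d) are the hypothetical single-table example used in the answers above, not the asker's real schema:

-- NULL never satisfies >= 1: the comparison yields NULL, and WHERE drops such rows
CREATE TEMP TABLE table_a (id int, count_e int, count_f int, count_g int, count_d int);
INSERT INTO table_a VALUES
    (1, 1, 0, 0, 2),     -- count_d = 2: satisfies count_d >= 1
    (2, 1, 0, 0, NULL),  -- count_d IS NULL: rejected without any coalesce()
    (3, 1, 0, 0, 0);     -- count_d = 0: rejected as well

SELECT id
FROM table_a
WHERE count_e = 1
  AND count_f = 0
  AND count_g = 0
  AND count_d >= 1;      -- returns only id = 1, with or without coalesce(count_d, 0)

-- Index shape from the third answer: equality columns first, the inequality column last
CREATE INDEX ON table_a (count_e, count_f, count_g, count_d);

Note that the bare count_d >= 1 predicate can use this plain column index directly, whereas the coalesce(count_d, 0) >= 1 form would need the expression index shown in the second answer.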

question:John,

I’ve reviewed the methodology you proposed for the indoor air quality project, and I must say, I’m quite disappointed. You always seem to ignore my input and focus on theoretical aspects that are not practical in real-world settings. This approach is not going to fly with our stakeholders. We need a more hands-on, feasible plan. I don’t see how your suggestions will meet the project’s deadlines and deliver the results we promised.

Emily

[Unfriendly sign-off]
Dr. Emily Carter

answer:Hi John,

I hope you're doing well. I’ve had a chance to review the methodology you proposed for the indoor air quality project, and I wanted to share some thoughts.

While I appreciate the depth of your theoretical approach, I think we might need to consider a more practical angle that aligns better with the real-world settings our stakeholders are dealing with. Your ideas are solid, but I’m a bit concerned about how they will fit into our project timeline and whether they will deliver the results we’ve promised.

Could we perhaps brainstorm some hands-on strategies that are both feasible and effective? I believe combining your insights with a more practical approach could really strengthen our project.

Looking forward to your thoughts!

Best regards,
Emily

question:You seem to be prepared to assist. I'll share my situation, and you can offer guidance. I am in a life transition, moving from one career to another, and my current situation feels stuck. I have spent 20 years working in the IT industry, but after some significant soul searching, I realized my passion lies in environmental work and conservation. I feel like my IT skills can be beneficial, but I don't know how to transition to a role that aligns with my values and allows me to make a meaningful contribution to the environment. Can you give me some advice on how to make this career transition successfully?

answer:It's great that you've taken the time for self-reflection and have discovered a new passion. Transitioning to a new career can be both exciting and intimidating. To start, acknowledge that your IT skills are highly transferable to various industries, including environmental work and conservation. Many organizations in this field rely on technology to collect and analyze data, monitor wildlife populations, and develop sustainable solutions.

Consider exploring roles that combine your IT expertise with environmental work, such as environmental informatics, conservation technology, or sustainability analysis. You could also look into organizations focused on environmental issues that need professionals with technical skills, like the World Wildlife Fund, The Nature Conservancy, or government agencies responsible for environmental conservation.

Networking is crucial in this transition. Attend industry events, join online groups or forums related to environmental work, and connect with professionals who have made similar career changes. This will not only provide valuable insights but also help you build a network of contacts in your desired field.

It might also be helpful to take courses or earn certifications that focus on environmental issues, such as sustainable development, ecology, or environmental policy. This will not only enhance your knowledge but also demonstrate your commitment to potential employers.

Lastly, be prepared for a possible temporary adjustment in your career progression or salary. It's not uncommon for career changers to take a step back before moving forward in their new field. Focus on finding a role that aligns with your values and allows you to make a meaningful contribution, and be patient with the process.

Remember, your unique blend of IT skills and passion for environmental work can be a significant asset in this field. Highlight your transferable skills, build your network, and be open to learning and growth, and you'll be well on your way to a successful career transition.

question:You covered the key points I am concerned about in my career transition - transferable skills, education, networking, and patience. I'd like to explore how I can effectively leverage my IT skills to make a meaningful contribution to the environmental sector. You mentioned some potential roles, but can you give me some specific examples of how IT skills can be applied to environmental work and what types of projects I might work on in this field?

answer:Applying your IT skills to environmental work can lead to a wide range of exciting and impactful projects. Here are a few examples of how your skills can be leveraged:

In conservation technology, you might work on projects like camera trap monitoring systems, which use machine learning algorithms to analyze images and track wildlife populations. Your IT skills could also be applied to the development of acoustic sensors that monitor ocean health by detecting changes in whale calls or other marine life sounds.

In environmental informatics, you could work on projects that involve analyzing and visualizing large datasets related to climate change, deforestation, or water quality. This might involve developing dashboards, maps, or other tools to help policymakers, researchers, or the general public understand complex environmental issues.

Another area where IT skills are in high demand is the development of sustainable software solutions. For example, you might work on a project that uses data analytics to optimize energy consumption in buildings, or on smart grid systems that integrate renewable energy sources.

Geographic Information Systems (GIS) is another area where IT skills are crucial in environmental work. You might work on projects that involve mapping and analyzing land use patterns, tracking deforestation, or identifying areas of high conservation value.

You could also explore working with organizations that develop and implement environmental monitoring systems, such as air or water quality monitoring networks. These systems rely on IT infrastructure to collect, analyze, and disseminate data to stakeholders.

Lastly, your IT skills could be applied to developing digital platforms that promote environmental education, community engagement, or citizen science initiatives. For instance, you might work on a project that creates a mobile app for reporting invasive species sightings or a web platform for tracking community-led conservation projects.

These examples illustrate the diverse ways in which your IT skills can be applied to environmental work. Remember to highlight your technical expertise and passion for environmental issues when exploring these opportunities, and be prepared to learn about the specific environmental challenges and issues that organizations are working to address.
