[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [Spacewalk-devel] Patch review for exportLib.py



Using rhnChecksum and rhnChecksumType directly instead of rhnChecksumView seems to fix the performance problem as well.  'explain analyze' output is included to show the difference.

I saw the ill effects of using the same table twice within a subquery, but honestly, it doesn't make a whole lot of sense to me.  The subquery returns the correct number of rows, but the database engine seems to do something else with it when joining to the outer query.  Makes me glad I don't have to deal much with SQL :-)

diff --git a/backend/satellite_tools/exporter/exportLib.py b/backend/satellite_tools/exporter/exportLib.
index 1646c58..4eb7b36 100644
--- a/backend/satellite_tools/exporter/exportLib.py
+++ b/backend/satellite_tools/exporter/exportLib.py
@@ -750,12 +750,14 @@ class _PackageDumper(BaseRowDumper):
                 pc.name, pf.device, pf.inode, pf.file_mode, pf.username,
                 pf.groupname, pf.rdev, pf.file_size,
                 TO_CHAR(mtime, 'YYYYMMDDHH24MISS') mtime,
-                c.checksum_type as "checksum-type",
+                ct.label as "checksum-type",
                 c.checksum, pf.linkto, pf.flags, pf.verifyflags, pf.lang
             from rhnPackageFile pf
-            left join rhnChecksumView c
-              on pf.checksum_id = c.id,
-                rhnPackageCapability pc
+              left join rhnChecksum c
+                on pf.checksum_id = c.id
+              left join rhnChecksumType ct
+                on ct.id = c.checksum_type_id,
+              rhnPackageCapability pc
             where pf.capability_id = pc.id
             and pf.package_id = :package_id
         """)

Original:
spaceschema=# explain analyze select
spaceschema-#                 pc.name, pf.device, pf.inode, pf.file_mode, pf.username,
spaceschema-#                 pf.groupname, pf.rdev, pf.file_size,
spaceschema-#                 TO_CHAR(mtime, 'YYYYMMDDHH24MISS') mtime,
spaceschema-#                 c.checksum_type as "checksum-type",
spaceschema-#                 c.checksum, pf.linkto, pf.flags, pf.verifyflags, pf.lang
spaceschema-#             from rhnPackageFile pf
spaceschema-#               left join rhnChecksumView c
spaceschema-#                 on pf.checksum_id = c.id,
spaceschema-#               rhnPackageCapability pc
spaceschema-#             where pf.capability_id = pc.id
spaceschema-#             and pf.package_id = 15000;
                                                                             QUERY PLAN

--------------------------------------------------------------------------------------------------------
-------------------------------------------------------------
 Nested Loop  (cost=143272.87..186332.37 rows=1449 width=215) (actual time=2679.659..2981.784 rows=100 l
oops=1)
   ->  Hash Left Join  (cost=143272.87..173756.88 rows=1449 width=154) (actual time=2679.623..2980.960 r
ows=100 loops=1)
         Hash Cond: (pf.checksum_id = c.id)
         ->  Index Scan using rhn_package_file_pid_cid_uq on rhnpackagefile pf  (cost=0.00..4310.77 rows
=1449 width=110) (actual time=0.033..0.145 rows=100 loops=1)
               Index Cond: (package_id = 15000::numeric)
         ->  Hash  (cost=86806.33..86806.33 rows=2429483 width=60) (actual time=2679.115..2679.115 rows=
2435257 loops=1)
               ->  Hash Join  (cost=1.11..86806.33 rows=2429483 width=60) (actual time=0.027..1511.900 r
ows=2435257 loops=1)
                     Hash Cond: (c.checksum_type_id = ct.id)
                     ->  Seq Scan on rhnchecksum c  (cost=0.00..53399.83 rows=2429483 width=61) (actual
time=0.004..235.896 rows=2435257 loops=1)
                     ->  Hash  (cost=1.05..1.05 rows=5 width=13) (actual time=0.010..0.010 rows=5 loops=
1)
                           ->  Seq Scan on rhnchecksumtype ct  (cost=0.00..1.05 rows=5 width=13) (actual
 time=0.002..0.004 rows=5 loops=1)
   ->  Index Scan using rhn_pkg_capability_id_pk on rhnpackagecapability pc  (cost=0.00..8.66 rows=1 wid
th=77) (actual time=0.004..0.005 rows=1 loops=100)
         Index Cond: (pc.id = pf.capability_id)
 Total runtime: 2981.895 ms
(14 rows)


Patched:
spaceschema=# explain analyze select
spaceschema-#                 pc.name, pf.device, pf.inode, pf.file_mode, pf.username,
spaceschema-#                 pf.groupname, pf.rdev, pf.file_size,
spaceschema-#                 TO_CHAR(mtime, 'YYYYMMDDHH24MISS') mtime,
spaceschema-#                 ct.label as "checksum-type",
spaceschema-#                 c.checksum, pf.linkto, pf.flags, pf.verifyflags, pf.lang
spaceschema-#             from rhnPackageFile pf
spaceschema-#               left join rhnChecksum c
spaceschema-#                 on pf.checksum_id = c.id
spaceschema-#               left join rhnChecksumType ct
spaceschema-#                 on ct.id = c.checksum_type_id,
spaceschema-#               rhnPackageCapability pc
spaceschema-#             where pf.capability_id = pc.id
spaceschema-#             and pf.package_id = 15000;

                                                                                QUERY PLAN

--------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..29281.60 rows=1449 width=215) (actual time=0.071..2.022 rows=100 loo
ps=1)
   ->  Nested Loop Left Join  (cost=0.00..28872.14 rows=1449 width=216) (actual time=0.057..1.415 rows=1
00 loops=1)
         ->  Nested Loop  (cost=0.00..16882.63 rows=1449 width=171) (actual time=0.044..0.839 rows=100 l
oops=1)
               ->  Index Scan using rhn_package_file_pid_cid_uq on rhnpackagefile pf  (cost=0.00..4310.7
7 rows=1449 width=110) (actual time=0.029..0.095 rows=100 loops=1)
                     Index Cond: (package_id = 15000::numeric)
               ->  Index Scan using rhn_pkg_capability_id_pk on rhnpackagecapability pc  (cost=0.00..8.6
6 rows=1 width=77) (actual time=0.005..0.006 rows=1 loops=100)
                     Index Cond: (pc.id = pf.capability_id)
         ->  Index Scan using rhnchecksum_pk on rhnchecksum c  (cost=0.00..8.26 rows=1 width=61) (actual
 time=0.005..0.005 rows=1 loops=100)
               Index Cond: (pf.checksum_id = c.id)
   ->  Index Scan using rhn_checksumtype_id_pk on rhnchecksumtype ct  (cost=0.00..0.27 rows=1 width=13)
(actual time=0.002..0.002 rows=1 loops=100)
         Index Cond: (ct.id = c.checksum_type_id)
 Total runtime: 2.154 ms
(12 rows)

/aron

-----Original Message-----
From: Parsons, Aron 
Sent: Friday, February 03, 2012 12:25 PM
To: 'jpazdziora redhat com'
Subject: Re: [Spacewalk-devel] Patch review for exportLib.py

Jan,
The lookup* change you mention is for a separate change in satellite-sync.  This issue with rhnChecksumView is a change in rhn-satellite-exporter.  Both are nearly unusable on PostgreSQL right now.

/aron

-----Original Message-----

Message: 4
Date: Fri, 3 Feb 2012 17:02:31 +0100
From: Jan Pazdziora <jpazdziora redhat com>
To: spacewalk-devel redhat com
Subject: Re: [Spacewalk-devel] Patch review for exportLib.py
Message-ID: <20120203160231 GB6087 redhat com>
Content-Type: text/plain; charset=us-ascii

On Fri, Feb 03, 2012 at 03:55:53PM +0000, Parsons, Aron wrote:
> Can someone verify that this is a sound change?  rhn-satellite-exporter is extremely slow on PostgreSQL when getting the file checksums.  This speeds it up dramatically.
> 
> It's the same issue we ran into with PostgreSQL doing a left join on rhnChecksumView in 'query_client_get_file'.  PostgreSQL does a sequence scan on a table with millions of rows.  I patched it in 1e64dc4546d96c58dd84a88c3c634b2d2ce164a4 but Michael had to fix it in 79e57d3b215340a54e911a073408f0eb7c6afc1f.  I just want to be sure my SQL is proper on this one before committing this time.
> 
> diff --git a/backend/satellite_tools/exporter/exportLib.py b/backend/satellite_t
> index 1646c58..1f265a2 100644
> --- a/backend/satellite_tools/exporter/exportLib.py
> +++ b/backend/satellite_tools/exporter/exportLib.py
> @@ -753,7 +753,12 @@ class _PackageDumper(BaseRowDumper):
>                  c.checksum_type as "checksum-type",
>                  c.checksum, pf.linkto, pf.flags, pf.verifyflags, pf.lang
>              from rhnPackageFile pf
> -            left join rhnChecksumView c
> +            left join
> +              (select c.*
> +               from rhnPackageFile pf
> +                 inner join rhnChecksumView c
> +                   on pf.checksum_id = c.id
> +               where pf.package_id = :package_id) c
>                on pf.checksum_id = c.id,
>                  rhnPackageCapability pc
>              where pf.capability_id = pc.id

I'm affraid you'd get the same

	having a table twice in select is mostly a bug

response on this one. What are you trying to achieve? This query
doesn't even have any lookup_* function.

-- 
Jan Pazdziora
Principal Software Engineer, Satellite Engineering, Red Hat



------------------------------

_______________________________________________
Spacewalk-devel mailing list
Spacewalk-devel redhat com
https://www.redhat.com/mailman/listinfo/spacewalk-devel

End of Spacewalk-devel Digest, Vol 45, Issue 4
**********************************************


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]