[Freeipa-devel] [PATCH] 1031 run cleanallruv task

Martin Kosek mkosek at redhat.com
Mon Sep 17 15:49:18 UTC 2012


On 09/17/2012 04:15 PM, Rob Crittenden wrote:
> Martin Kosek wrote:
>> On 09/17/2012 04:04 PM, Rob Crittenden wrote:
>>> Martin Kosek wrote:
>>>> On 09/14/2012 09:17 PM, Rob Crittenden wrote:
>>>>> Martin Kosek wrote:
>>>>>> On 09/06/2012 11:17 PM, Rob Crittenden wrote:
>>>>>>> Martin Kosek wrote:
>>>>>>>> On 09/06/2012 05:55 PM, Rob Crittenden wrote:
>>>>>>>>> Rob Crittenden wrote:
>>>>>>>>>> Rob Crittenden wrote:
>>>>>>>>>>> Martin Kosek wrote:
>>>>>>>>>>>> On 09/05/2012 08:06 PM, Rob Crittenden wrote:
>>>>>>>>>>>>> Rob Crittenden wrote:
>>>>>>>>>>>>>> Martin Kosek wrote:
>>>>>>>>>>>>>>> On 07/05/2012 08:39 PM, Rob Crittenden wrote:
>>>>>>>>>>>>>>>> Martin Kosek wrote:
>>>>>>>>>>>>>>>>> On 07/03/2012 04:41 PM, Rob Crittenden wrote:
>>>>>>>>>>>>>>>>>> Deleting a replica can leave a replication update vector (RUV) on the
>>>>>>>>>>>>>>>>>> other servers. This can confuse things if the replica is re-added, and
>>>>>>>>>>>>>>>>>> it also causes the server to calculate changes against a server that
>>>>>>>>>>>>>>>>>> may no longer exist.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 389-ds-base provides a new task that propagates itself to all
>>>>>>>>>>>>>>>>>> available replicas to clean this RUV data.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This patch will create this task at deletion time to hopefully
>>>>>>>>>>>>>>>>>> clean things up.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It isn't perfect. If any replica is down or unavailable at the time
>>>>>>>>>>>>>>>>>> the cleanruv task fires, and then comes back up, the old RUV data may
>>>>>>>>>>>>>>>>>> be re-propagated around.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> To make things easier in this case I've added two new commands to
>>>>>>>>>>>>>>>>>> ipa-replica-manage. The first lists the replication ids of all the
>>>>>>>>>>>>>>>>>> servers we have a RUV for. Using this you can call clean_ruv with the
>>>>>>>>>>>>>>>>>> replication id of a server that no longer exists to try the
>>>>>>>>>>>>>>>>>> cleanallruv step again.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This is quite dangerous though. If you run cleanruv against a replica
>>>>>>>>>>>>>>>>>> id that does exist it can cause a loss of data. I believe I've put in
>>>>>>>>>>>>>>>>>> enough scary warnings about this.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> rob
>>>>>>>>>>>>>>>>>>
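A side note on the task described above: the CLEANALLRUV task is an entry added
under cn=cleanallruv,cn=tasks,cn=config on one master, and 389-ds-base then
propagates the cleanup to the other replicas on its own. Below is a rough,
untested sketch of creating such a task with python-ldap; the attribute names
(replica-base-dn, replica-id) follow the 389-ds-base task interface as I
understand it, and the connection details are assumptions, not code from the
patch.

    import ldap
    import ldap.modlist

    def create_cleanallruv_task(conn, suffix_dn, replica_id):
        # One task entry is enough; 389-ds-base propagates it to all replicas.
        rid = int(replica_id)
        dn = "cn=clean %d,cn=cleanallruv,cn=tasks,cn=config" % rid
        attrs = {
            'objectclass': ['top', 'extensibleObject'],
            'cn': ['clean %d' % rid],
            'replica-base-dn': [suffix_dn],   # e.g. 'dc=example,dc=com'
            'replica-id': [str(rid)],
        }
        conn.add_s(dn, ldap.modlist.addModlist(attrs))

    # Example usage (connection details assumed):
    #   conn = ldap.initialize('ldap://localhost:389')
    #   conn.simple_bind_s('cn=Directory Manager', 'password')
    #   create_cleanallruv_task(conn, 'dc=example,dc=com', 5)
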
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Good work there, this should make cleaning RUVs much easier than
>>>>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>>>>> previous version.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is what I found during review:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1) The list_ruv and clean_ruv command help in the man page is quite
>>>>>>>>>>>>>>>>> lost. I think it would help if we, for example, had all the info for
>>>>>>>>>>>>>>>>> the commands indented; as it is, a user could simply overlook the new
>>>>>>>>>>>>>>>>> commands in the man page.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2) I would rename the new commands to clean-ruv and list-ruv to make
>>>>>>>>>>>>>>>>> them consistent with the rest of the commands (re-initialize,
>>>>>>>>>>>>>>>>> force-sync).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 3) It would be nice to be able to run the clean_ruv command in an
>>>>>>>>>>>>>>>>> unattended way (for better testing), i.e. respect the --force option
>>>>>>>>>>>>>>>>> as we already do for ipa-replica-manage del. This fix would aid test
>>>>>>>>>>>>>>>>> automation in the future.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 4) (minor) The new question (and the del one too) does not react well
>>>>>>>>>>>>>>>>> to CTRL+D:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # ipa-replica-manage clean_ruv 3 --force
>>>>>>>>>>>>>>>>> Clean the Replication Update Vector for
>>>>>>>>>>>>>>>>> vm-055.idm.lab.bos.redhat.com:389
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>>>>>>>>>>>>>> longer replicate so it may miss updates while the process
>>>>>>>>>>>>>>>>> is running. It would need to be re-initialized to maintain
>>>>>>>>>>>>>>>>> consistency. Be very careful.
>>>>>>>>>>>>>>>>> Continue to clean? [no]: unexpected error:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 5) Help for the clean_ruv command without the required parameter is
>>>>>>>>>>>>>>>>> quite confusing, as it reports that the command is wrong rather than
>>>>>>>>>>>>>>>>> the parameter:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # ipa-replica-manage clean_ruv
>>>>>>>>>>>>>>>>> Usage: ipa-replica-manage [options]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ipa-replica-manage: error: must provide a command [clean_ruv |
>>>>>>>>>>>>>>>>> force-sync |
>>>>>>>>>>>>>>>>> disconnect | connect | del | re-initialize | list | list_ruv]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It seems you just forgot to specify the error message in the command
>>>>>>>>>>>>>>>>> definition.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 6) When the remote replica is down, the clean_ruv command fails with
>>>>>>>>>>>>>>>>> an unexpected error:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [root at vm-086 ~]# ipa-replica-manage clean_ruv 5
>>>>>>>>>>>>>>>>> Clean the Replication Update Vector for
>>>>>>>>>>>>>>>>> vm-055.idm.lab.bos.redhat.com:389
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>>>>>>>>>>>>>> longer replicate so it may miss updates while the process
>>>>>>>>>>>>>>>>> is running. It would need to be re-initialized to maintain
>>>>>>>>>>>>>>>>> consistency. Be very careful.
>>>>>>>>>>>>>>>>> Continue to clean? [no]: y
>>>>>>>>>>>>>>>>> unexpected error: {'desc': 'Operations error'}
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> /var/log/dirsrv/slapd-IDM-LAB-BOS-REDHAT-COM/errors:
>>>>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: failed
>>>>>>>>>>>>>>>>> to connect to repl agreement connection (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,
>>>>>>>>>>>>>>>>> cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping tree,cn=config), error 105
>>>>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: replica
>>>>>>>>>>>>>>>>> (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping
>>>>>>>>>>>>>>>>> tree,cn=config) has not been cleaned. You will need to rerun the
>>>>>>>>>>>>>>>>> CLEANALLRUV task on this replica.
>>>>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: Task failed (1)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In this case I think we should inform the user that the command failed,
>>>>>>>>>>>>>>>>> possibly because of disconnected replicas, and that they could enable
>>>>>>>>>>>>>>>>> the replicas and try again.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 7) (minor) "pass" is now redundant in replication.py:
>>>>>>>>>>>>>>>>> +        except ldap.INSUFFICIENT_ACCESS:
>>>>>>>>>>>>>>>>> +            # We can't make the server we're removing read-only but
>>>>>>>>>>>>>>>>> +            # this isn't a show-stopper
>>>>>>>>>>>>>>>>> +            root_logger.debug("No permission to switch replica to read-only, continuing anyway")
>>>>>>>>>>>>>>>>> +            pass
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think this addresses everything.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rob
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks, almost there! I just found one more issue which needs to be
>>>>>>>>>>>>>>> fixed
>>>>>>>>>>>>>>> before we push:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
>>>>>>>>>>>>>>> Directory Manager password:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unable to connect to replica vm-055.idm.lab.bos.redhat.com, forcing
>>>>>>>>>>>>>>> removal
>>>>>>>>>>>>>>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc':
>>>>>>>>>>>>>>> "Can't
>>>>>>>>>>>>>>> contact LDAP server"}
>>>>>>>>>>>>>>> Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There were issues removing a connection: %d format: a number is
>>>>>>>>>>>>>>> required, not str
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc':
>>>>>>>>>>>>>>> "Can't
>>>>>>>>>>>>>>> contact LDAP server"}
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is a traceback I retrieved:
>>>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>>>       File "/sbin/ipa-replica-manage", line 425, in del_master
>>>>>>>>>>>>>>>         del_link(realm, r, hostname, options.dirman_passwd, force=True)
>>>>>>>>>>>>>>>       File "/sbin/ipa-replica-manage", line 271, in del_link
>>>>>>>>>>>>>>>         repl1.cleanallruv(replica_id)
>>>>>>>>>>>>>>>       File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", line 1094, in cleanallruv
>>>>>>>>>>>>>>>         root_logger.debug("Creating CLEANALLRUV task for replica id %d" % replicaId)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem here is that you don't convert replica_id to int in this part:
>>>>>>>>>>>>>>> +    replica_id = None
>>>>>>>>>>>>>>> +    if repl2:
>>>>>>>>>>>>>>> +        replica_id = repl2._get_replica_id(repl2.conn, None)
>>>>>>>>>>>>>>> +    else:
>>>>>>>>>>>>>>> +        servers = get_ruv(realm, replica1, dirman_passwd)
>>>>>>>>>>>>>>> +        for (netloc, rid) in servers:
>>>>>>>>>>>>>>> +            if netloc.startswith(replica2):
>>>>>>>>>>>>>>> +                replica_id = rid
>>>>>>>>>>>>>>> +                break
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Martin
>>>>>>>>>>>>>>>
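The fix for the traceback above amounts to making sure the RID pulled out of
the RUV is an integer before it reaches cleanallruv(), which formats it with
%d. A minimal sketch of that conversion (helper names such as get_ruv() and
repl1 come from the patch and are assumed to be in scope):

    def find_replica_id(servers, replica2):
        # servers is the list of (netloc, rid) pairs returned by get_ruv();
        # the rid arrives as a string, so convert it before %d formatting.
        for (netloc, rid) in servers:
            if netloc.startswith(replica2):
                return int(rid)
        return None

    # e.g.
    #   servers = get_ruv(realm, replica1, dirman_passwd)
    #   replica_id = find_replica_id(servers, replica2)
    #   if replica_id is not None:
    #       repl1.cleanallruv(replica_id)
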
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Updated patch using the new mechanism in 389-ds-base. This should more
>>>>>>>>>>>>>> thoroughly clean out RUV data when a replica is being deleted, and
>>>>>>>>>>>>>> provide a way to delete RUV data afterwards too, if necessary.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> rob
>>>>>>>>>>>>>
>>>>>>>>>>>>> Rebased patch
>>>>>>>>>>>>>
>>>>>>>>>>>>> rob
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 0) As I wrote in a review for your patch 1041, the changelog entry slipped
>>>>>>>>>>>> elsewhere.
>>>>>>>>>>>>
>>>>>>>>>>>> 1) The following KeyboardInterrupt except clause looks suspicious. I know
>>>>>>>>>>>> why you have it there, but since it is generally a bad thing to do, a
>>>>>>>>>>>> comment on why it is needed would be useful.
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -256,6 +263,17 @@ def del_link(realm, replica1, replica2, dirman_passwd, force=False):
>>>>>>>>>>>>           repl1.delete_agreement(replica2)
>>>>>>>>>>>>           repl1.delete_referral(replica2)
>>>>>>>>>>>>
>>>>>>>>>>>> +    if type1 == replication.IPA_REPLICA:
>>>>>>>>>>>> +        if repl2:
>>>>>>>>>>>> +            ruv = repl2._get_replica_id(repl2.conn, None)
>>>>>>>>>>>> +        else:
>>>>>>>>>>>> +            ruv = get_ruv_by_host(realm, replica1, replica2, dirman_passwd)
>>>>>>>>>>>> +
>>>>>>>>>>>> +        try:
>>>>>>>>>>>> +            repl1.cleanallruv(ruv)
>>>>>>>>>>>> +        except KeyboardInterrupt:
>>>>>>>>>>>> +            pass
>>>>>>>>>>>> +
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe you just wanted to do some cleanup and then "raise" again?
>>>>>>>>>>>
>>>>>>>>>>> No, it is there because it is safe to break out of it. The task will
>>>>>>>>>>> continue to run. I added some verbiage.
>>>>>>>>>>>
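Since the whole point of that except clause is that Ctrl+C only abandons the
local wait while the 389-ds task keeps running, the handler mostly needs to say
so. A small sketch of the pattern; the messages are the ones that show up later
in this thread, and the function itself is illustrative only:

    def wait_for_cleanallruv(repl, replica_id):
        print("Background task created to clean replication data. This may take a while.")
        print("This may be safely interrupted with Ctrl+C")
        try:
            repl.cleanallruv(replica_id)
        except KeyboardInterrupt:
            # The task runs inside 389-ds-base, not in this process, so
            # breaking out of the wait does not abort the cleanup itself.
            print("Wait for task interrupted. It will continue to run in the background")
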
>>>>>>>>>>>>
>>>>>>>>>>>> 2) This is related to 1): "ipa-replica-manage del" may wait indefinitely
>>>>>>>>>>>> when some remote replica is down, right?
>>>>>>>>>>>>
>>>>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
>>>>>>>>>>>> Deleting a master is irreversible.
>>>>>>>>>>>> To reconnect to the remote master you will need to prepare a new
>>>>>>>>>>>> replica file
>>>>>>>>>>>> and re-install.
>>>>>>>>>>>> Continue to delete? [no]: y
>>>>>>>>>>>> ipa: INFO: Setting agreement
>>>>>>>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping
>>>>>>>>>>>> tree,cn=config schedule to 2358-2359 0 to force synch
>>>>>>>>>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
>>>>>>>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping
>>>>>>>>>>>> tree,cn=config
>>>>>>>>>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica
>>>>>>>>>>>> acquired
>>>>>>>>>>>> successfully: Incremental update succeeded: start: 0: end: 0
>>>>>>>>>>>> Background task created to clean replication data
>>>>>>>>>>>>
>>>>>>>>>>>> ... after about a minute I hit CTRL+C
>>>>>>>>>>>>
>>>>>>>>>>>> ^CDeleted replication agreement from
>>>>>>>>>>>> 'vm-086.idm.lab.bos.redhat.com' to
>>>>>>>>>>>> 'vm-055.idm.lab.bos.redhat.com'
>>>>>>>>>>>> Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: NS record
>>>>>>>>>>>> does not contain 'vm-055.idm.lab.bos.redhat.com.'
>>>>>>>>>>>> You may need to manually remove them from the tree
>>>>>>>>>>>>
>>>>>>>>>>>> I think it would be better to inform the user that some remote replica is
>>>>>>>>>>>> down, or at least that we are waiting for the task to complete. Something
>>>>>>>>>>>> like this:
>>>>>>>>>>>>
>>>>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
>>>>>>>>>>>> ...
>>>>>>>>>>>> Background task created to clean replication data
>>>>>>>>>>>> Replication data clean up may take very long time if some replica is
>>>>>>>>>>>> unreachable
>>>>>>>>>>>> Hit CTRL+C to interrupt the wait
>>>>>>>>>>>> ^C Clean up wait interrupted
>>>>>>>>>>>> ....
>>>>>>>>>>>> [continue with del]
>>>>>>>>>>>
>>>>>>>>>>> Yup, did this in #1.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 3) (minor) When there is a cleanruv task running and you run
>>>>>>>>>>>> "ipa-replica-manage del", there is an unexpected error message about a
>>>>>>>>>>>> duplicate task object in LDAP:
>>>>>>>>>>>>
>>>>>>>>>>>> # ipa-replica-manage del vm-072.idm.lab.bos.redhat.com --force
>>>>>>>>>>>> Unable to connect to replica vm-072.idm.lab.bos.redhat.com, forcing
>>>>>>>>>>>> removal
>>>>>>>>>>>> FAIL
>>>>>>>>>>>> Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc':
>>>>>>>>>>>> "Can't
>>>>>>>>>>>> contact LDAP server"}
>>>>>>>>>>>> Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
>>>>>>>>>>>>
>>>>>>>>>>>> There were issues removing a connection: This entry already exists
>>>>>>>>>>>> <<<<<<<<<
>>>>>>>>>>>>
>>>>>>>>>>>> Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc':
>>>>>>>>>>>> "Can't
>>>>>>>>>>>> contact LDAP server"}
>>>>>>>>>>>> Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS record
>>>>>>>>>>>> does not contain 'vm-072.idm.lab.bos.redhat.com.'
>>>>>>>>>>>> You may need to manually remove them from the tree
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I think it should be enough to just catch "entry already exists" in the
>>>>>>>>>>>> cleanallruv function, and in such a case print a relevant error message
>>>>>>>>>>>> and bail out.
>>>>>>>>>>>> Then self.conn.checkTask(dn, dowait=True) would not be called either.
>>>>>>>>>>>
>>>>>>>>>>> Good catch, fixed.
>>>>>>>>>>>
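For reference, the duplicate-task case comes down to catching the "entry
already exists" error that the ipaldap wrapper raises (errors.DuplicateEntry,
as a later traceback in this thread shows) and skipping the wait. A hedged
sketch, not the literal patch code:

    from ipalib import errors

    def start_cleanallruv(conn, task_entry, task_dn):
        # Add the CLEANALLRUV task entry; tolerate a task that already exists.
        try:
            conn.addEntry(task_entry)
        except errors.DuplicateEntry:
            print("A CLEANALLRUV task for this replica is already running")
            return False
        conn.checkTask(task_dn, dowait=True)
        return True
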
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 4) (minor): In the make_readonly function, there is a redundant "pass"
>>>>>>>>>>>> statement:
>>>>>>>>>>>>
>>>>>>>>>>>> +    def make_readonly(self):
>>>>>>>>>>>> +        """
>>>>>>>>>>>> +        Make the current replication agreement read-only.
>>>>>>>>>>>> +        """
>>>>>>>>>>>> +        dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'),
>>>>>>>>>>>> +                ('cn', 'plugins'), ('cn', 'config'))
>>>>>>>>>>>> +
>>>>>>>>>>>> +        mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'on')]
>>>>>>>>>>>> +        try:
>>>>>>>>>>>> +            self.conn.modify_s(dn, mod)
>>>>>>>>>>>> +        except ldap.INSUFFICIENT_ACCESS:
>>>>>>>>>>>> +            # We can't make the server we're removing read-only but
>>>>>>>>>>>> +            # this isn't a show-stopper
>>>>>>>>>>>> +            root_logger.debug("No permission to switch replica to read-only, continuing anyway")
>>>>>>>>>>>> +            pass         <<<<<<<<<<<<<<<
>>>>>>>>>>>
>>>>>>>>>>> Yeah, this is one of my common mistakes. I put in a pass initially, then
>>>>>>>>>>> add logging in front of it and forget to delete the pass. It's gone now.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 5) In clean_ruv, I think allowing a --force option to bypass the
>>>>>>>>>>>> user_input
>>>>>>>>>>>> would be helpful (at least for test automation):
>>>>>>>>>>>>
>>>>>>>>>>>> +    if not ipautil.user_input("Continue to clean?", False):
>>>>>>>>>>>> +        sys.exit("Aborted")
>>>>>>>>>>>
>>>>>>>>>>> Yup, added.
>>>>>>>>>>>
>>>>>>>>>>> rob
>>>>>>>>>>
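The --force handling asked for in 5) reduces to guarding the prompt, roughly as
in the sketch below; ipautil.user_input() and the prompt text come from this
thread, while the helper itself is illustrative:

    import sys
    from ipapython import ipautil

    def confirm_clean(options):
        # Skip the interactive question entirely when --force was given.
        if options.force:
            return
        if not ipautil.user_input("Continue to clean?", False):
            sys.exit("Aborted")
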
>>>>>>>>>> Slightly revised patch. I still had a window open with one unsaved
>>>>>>>>>> change.
>>>>>>>>>>
>>>>>>>>>> rob
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Apparently there were two unsaved changes, one of which was lost. This
>>>>>>>>> adds in
>>>>>>>>> the 'entry already exists' fix.
>>>>>>>>>
>>>>>>>>> rob
>>>>>>>>>
>>>>>>>>
>>>>>>>> Just one last thing (otherwise the patch is OK) - I don't think this is
>>>>>>>> what we
>>>>>>>> want :-)
>>>>>>>>
>>>>>>>> # ipa-replica-manage clean-ruv 8
>>>>>>>> Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
>>>>>>>>
>>>>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>>>>> longer replicate so it may miss updates while the process
>>>>>>>> is running. It would need to be re-initialized to maintain
>>>>>>>> consistency. Be very careful.
>>>>>>>> Continue to clean? [no]: y   <<<<<<
>>>>>>>> Aborted
>>>>>>>>
>>>>>>>>
>>>>>>>> Nor this exception (you are checking for the wrong exception):
>>>>>>>>
>>>>>>>> # ipa-replica-manage clean-ruv 8
>>>>>>>> Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
>>>>>>>>
>>>>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>>>>> longer replicate so it may miss updates while the process
>>>>>>>> is running. It would need to be re-initialized to maintain
>>>>>>>> consistency. Be very careful.
>>>>>>>> Continue to clean? [no]:
>>>>>>>> unexpected error: This entry already exists
>>>>>>>>
>>>>>>>> This is the exception:
>>>>>>>>
>>>>>>>> Traceback (most recent call last):
>>>>>>>>       File "/sbin/ipa-replica-manage", line 651, in <module>
>>>>>>>>         main()
>>>>>>>>       File "/sbin/ipa-replica-manage", line 648, in main
>>>>>>>>         clean_ruv(realm, args[1], options)
>>>>>>>>       File "/sbin/ipa-replica-manage", line 373, in clean_ruv
>>>>>>>>         thisrepl.cleanallruv(ruv)
>>>>>>>>       File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", line 1136, in cleanallruv
>>>>>>>>         self.conn.addEntry(e)
>>>>>>>>       File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line 503, in addEntry
>>>>>>>>         self.__handle_errors(e, arg_desc=arg_desc)
>>>>>>>>       File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line 321, in __handle_errors
>>>>>>>>         raise errors.DuplicateEntry()
>>>>>>>> ipalib.errors.DuplicateEntry: This entry already exists
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>
>>>>>>> Fixed that and a couple of other problems. When doing a disconnect we
>>>>>>> should not also call clean-ruv.
>>>>>>
>>>>>> Ah, good self-catch.
>>>>>>
>>>>>>>
>>>>>>> I also got tired of seeing crappy error messages, so I added a little
>>>>>>> convert utility.
>>>>>>>
>>>>>>> rob
>>>>>>
>>>>>> 1) There is CLEANALLRUV stuff included in 1050-3 and not here. There are
>>>>>> also some findings for this new code.
>>>>>>
>>>>>>
>>>>>> 2) We may want to bump Requires to a higher version of 389-ds-base
>>>>>> (389-ds-base-1.2.11.14-1) - it contains a fix for the CLEANALLRUV+winsync bug
>>>>>> I found earlier.
>>>>>>
>>>>>>
>>>>>> 3) I just discovered another suspicious behavior. When we are deleting a
>>>>>> master that also has links to other master(s), we delete those too. But we
>>>>>> also automatically run CLEANALLRUV in these cases, so we may end up with
>>>>>> multiple tasks being started on different masters - this does not look right.
>>>>>>
>>>>>> I think we may rather want to delete all links first and then run the
>>>>>> CLEANALLRUV task just once. This is what I get with the current code:
>>>>>>
>>>>>> # ipa-replica-manage del vm-072.idm.lab.bos.redhat.com
>>>>>> Directory Manager password:
>>>>>>
>>>>>> Deleting a master is irreversible.
>>>>>> To reconnect to the remote master you will need to prepare a new replica
>>>>>> file
>>>>>> and re-install.
>>>>>> Continue to delete? [no]: yes
>>>>>> ipa: INFO: Setting agreement
>>>>>> cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping
>>>>>> tree,cn=config schedule to 2358-2359 0 to force synch
>>>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
>>>>>> cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping
>>>>>> tree,cn=config
>>>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired
>>>>>> successfully: Incremental update succeeded: start: 0: end: 0
>>>>>> Background task created to clean replication data. This may take a while.
>>>>>> This may be safely interrupted with Ctrl+C
>>>>>>
>>>>>> ^CWait for task interrupted. It will continue to run in the background
>>>>>>
>>>>>> Deleted replication agreement from 'vm-055.idm.lab.bos.redhat.com' to
>>>>>> 'vm-072.idm.lab.bos.redhat.com'
>>>>>> ipa: INFO: Setting agreement
>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping
>>>>>> tree,cn=config schedule to 2358-2359 0 to force synch
>>>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping
>>>>>> tree,cn=config
>>>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired
>>>>>> successfully: Incremental update succeeded: start: 0: end: 0
>>>>>> Background task created to clean replication data. This may take a while.
>>>>>> This may be safely interrupted with Ctrl+C
>>>>>>
>>>>>> ^CWait for task interrupted. It will continue to run in the background
>>>>>>
>>>>>> Deleted replication agreement from 'vm-086.idm.lab.bos.redhat.com' to
>>>>>> 'vm-072.idm.lab.bos.redhat.com'
>>>>>> Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS record does not
>>>>>> contain 'vm-072.idm.lab.bos.redhat.com.'
>>>>>> You may need to manually remove them from the tree
>>>>>>
>>>>>> Martin
>>>>>>
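The reordering suggested in 3) would look roughly like this: tear down every
agreement pointing at the dead master first, then create a single CLEANALLRUV
task from one surviving master. A sketch only; del_link() and cleanallruv() are
the names used in this thread, and the loop structure is an assumption:

    def remove_dead_master(realm, other_masters, dead_master, dirman_passwd,
                           repl, replica_id):
        for master in other_masters:
            # Drop the agreement only; no per-link CLEANALLRUV here.
            del_link(realm, master, dead_master, dirman_passwd, force=True)
        # One task is enough: 389-ds-base propagates it to every replica.
        repl.cleanallruv(replica_id)
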
>>>>>
>>>>> All issues addressed and I pulled in abort-clean-ruv from 1050. I added a
>>>>> list-clean-ruv command as well.
>>>>>
>>>>> rob
>>>>
>>>> 1) Patch 1031-9 needs to get squashed with 1031-8
>>>>
>>>>
>>>> 2) Patch needs a rebase (conflict in freeipa.spec.in)
>>>>
>>>>
>>>> 3) New list-clean-ruv man entry is not right:
>>>>
>>>>          list-clean-ruv [REPLICATION_ID]
>>>>                 - List all running CLEANALLRUV and abort CLEANALLRUV tasks.
>>>>
>>>> REPLICATION_ID is not its argument.
>>>
>>> Fixed 1-3.
>>>
>>>> Btw, the new list-clean-ruv command proved very useful for me.
>>>>
>>>> 4) I just found out we need to do a better job with the make_readonly() command.
>>>> I got into trouble when disconnecting one link to a remote replica: it was marked
>>>> read-only and I was then unable to manage the disconnected replica properly
>>>> (vm-072 is the replica made read-only):
>>>
>>> Ok, I reset read-only after we delete the agreements. That fixed things up for
>>> me. I disconnected a replica and was able to modify entries on that replica
>>> afterwards.
>>>
>>> This affected the --cleanup command too; it would otherwise have succeeded, I
>>> think.
>>>
>>> I tested with an A - B - C - A agreement loop. I disconnected A and C and
>>> confirmed I could still update entries on C. Then I deleted C, then B, and made
>>> sure the output looked right, that I could still manage entries, etc.
>>>
>>> rob
>>>
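Resetting read-only after the agreements are gone is just the mirror image of
the make_readonly() snippet quoted earlier, flipping nsslapd-readonly back to
"off". A sketch built from that snippet; the method name and the import paths
are assumptions, not necessarily what the patch uses:

    import ldap
    from ipapython.dn import DN                          # assumed import path
    from ipapython.ipa_log_manager import root_logger    # assumed import path

    def unmake_readonly(conn):
        # Switch the userRoot backend back to read-write.
        dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'),
                ('cn', 'plugins'), ('cn', 'config'))
        mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'off')]
        try:
            conn.modify_s(str(dn), mod)
        except ldap.INSUFFICIENT_ACCESS:
            # Same reasoning as make_readonly(): not being able to flip the
            # flag is not a show-stopper.
            root_logger.debug("No permission to switch replica to read-write, "
                              "continuing anyway")
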
>>>>
>>>> [root at vm-055 ~]# ipa-replica-manage disconnect vm-072.idm.lab.bos.redhat.com
>>>>
>>>> [root at vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
>>>> Deleting a master is irreversible.
>>>> To reconnect to the remote master you will need to prepare a new replica file
>>>> and re-install.
>>>> Continue to delete? [no]: yes
>>>> Deleting replication agreements between vm-055.idm.lab.bos.redhat.com and
>>>> vm-072.idm.lab.bos.redhat.com
>>>> ipa: INFO: Setting agreement
>>>> cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping
>>>> tree,cn=config schedule to 2358-2359 0 to force synch
>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
>>>> cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping
>>>> tree,cn=config
>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired
>>>> successfully: Incremental update succeeded: start: 0: end: 0
>>>> Deleted replication agreement from 'vm-072.idm.lab.bos.redhat.com' to
>>>> 'vm-055.idm.lab.bos.redhat.com'
>>>> Unable to remove replication agreement for vm-055.idm.lab.bos.redhat.com from
>>>> vm-072.idm.lab.bos.redhat.com.
>>>> Background task created to clean replication data. This may take a while.
>>>> This may be safely interrupted with Ctrl+C
>>>> ^CWait for task interrupted. It will continue to run in the background
>>>>
>>>> Failed to cleanup vm-055.idm.lab.bos.redhat.com entries: Server is
>>>> unwilling to perform: database is read-only arguments:
>>>> dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat.com@IDM.LAB.BOS.REDHAT.COM,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
>>>>
>>>> You may need to manually remove them from the tree
>>>> ipa: INFO: Unhandled LDAPError: {'info': 'database is read-only', 'desc':
>>>> 'Server is unwilling to perform'}
>>>>
>>>> Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: Server is
>>>> unwilling to perform: database is read-only
>>>>
>>>> You may need to manually remove them from the tree
>>>>
>>>>
>>>> --cleanup did not work for me either:
>>>> [root at vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
>>>> --cleanup
>>>> Cleaning a master is irreversible.
>>>> This should not normally be require, so use cautiously.
>>>> Continue to clean master? [no]: yes
>>>> unexpected error: Server is unwilling to perform: database is read-only
>>>> arguments:
>>>> dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat.com@IDM.LAB.BOS.REDHAT.COM,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
>>>>
>>>> Martin
>>>>
>>>
>>
>> I think you sent the wrong patch...
>>
>> Martin
>>
> 
> I hate Mondays.
> 
> rob

Maybe you will like this one a little bit more :-)

ACK. Pushed to master, ipa-3-0.

Martin



