[Freeipa-users] re-initialize replica

Andrew E. Bruno aebruno2 at buffalo.edu
Tue Oct 6 13:52:52 UTC 2015


On Tue, Oct 06, 2015 at 09:35:08AM -0400, Rob Crittenden wrote:
> Andrew E. Bruno wrote:
> > On Mon, Oct 05, 2015 at 02:48:48PM -0400, Rob Crittenden wrote:
> >> Andrew E. Bruno wrote:
> >>> On Mon, Oct 05, 2015 at 12:40:42PM +0200, Martin Kosek wrote:
> >>>> On 10/02/2015 06:00 PM, Andrew E. Bruno wrote:
> >>>>> On Fri, Oct 02, 2015 at 09:56:47AM -0400, Andrew E. Bruno wrote:
> >>>>>> What's the best way to re-initialize a replica? 
> >>>>>>
> >>>>>> Suppose one of your replicas goes south... is there a command to tell
> >>>>>> that replica to re-initialize from the first master (instead of
> >>>>>> removing/re-adding the replica from the topology)?
> >>>>>
> >>>>> Found the command I was looking for:
> >>>>>    ipa-replica-manage re-initialize --from xxx
> >>>>>
> >>>>> However, one of our replicas is down and we can't seem to re-initialize
> >>>>> it. Starting IPA fails (via systemctl restart ipa):
> >>>>>
> >>>>> ipactl status
> >>>>> Directory Service: RUNNING
> >>>>> krb5kdc Service: STOPPED
> >>>>> kadmin Service: STOPPED
> >>>>> named Service: STOPPED
> >>>>> ipa_memcached Service: STOPPED
> >>>>> httpd Service: STOPPED
> >>>>> pki-tomcatd Service: STOPPED
> >>>>> ipa-otpd Service: STOPPED
> >>>>> ipa: INFO: The ipactl command was successful
> >>>>>
> >>>>>
> >>>>> Errors from the dirsrv show:
> >>>>>
> >>>>> : GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (No Kerberos credentials available)) errno 0 (Success)
> >>>>> [02/Oct/2015:11:45:05 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] authentication mechanism [GSSAPI]: error -2 (Local error)
> >>>>> [02/Oct/2015:11:50:05 -0400] set_krb5_creds - Could not get initial credentials for principal [ldap/server@realm] in keytab [FILE:/etc/dirsrv/ds.keytab]: -1765328228 (Cannot contact any KDC for requested realm)
> >>>>> [02/Oct/2015:11:50:05 -0400] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -2 (Local error) (SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (No Kerberos credentials available)) errno 0 (Success)
> >>>>> [02/Oct/2015:11:50:05 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] authentication mechanism [GSSAPI]: error -2 (Local error)
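> >>>>>
> >>>>> As a sanity check on the keytab itself (principal and keytab path as
> >>>>> reported in the log above), credentials can be fetched by hand:
> >>>>>
> >>>>>   kinit -kt /etc/dirsrv/ds.keytab ldap/$(hostname -f)
> >>>>>
> >>>>> If that succeeds, the keytab is fine and the problem is reaching the
> >>>>> KDC; the "Cannot contact any KDC" error above suggests the latter.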
> >>>>>
> >>>>>
> >>>>> Attempting to re-initialize fails:
> >>>>>
> >>>>> ipa-replica-manage re-initialize --from master
> >>>>> Connection timed out.
> >>>>>
> >>>>>
> >>>>> I verified time is in sync and DNS forward/reverse resolution is working.
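> >>>>>
> >>>>> (Verified roughly like this, with the replica's FQDN and address
> >>>>> substituted for the placeholders:
> >>>>>
> >>>>>   ntpstat
> >>>>>   dig +short replica.example.com
> >>>>>   dig +short -x 10.0.0.1
> >>>>> )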
> >>>>>
> >>>>> Any pointers on what else to try?
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> --Andrew
> >>>>
> >>>> Given that your Kerberos server instance is down, I would start investigating
> >>>> Kerberos logs to see why.
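> >>>>
> >>>> (On a stock install the KDC typically logs to /var/log/krb5kdc.log,
> >>>> per the [logging] section of /etc/krb5.conf.)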
> >>>
> >>>
> >>> So it looks like the dirsrv service comes up, but with GSS errors about
> >>> Kerberos credentials. However, the rest of the services, including
> >>> krb5kdc, fail to come up. Errors from the KDC logs suggest DNS:
> >>
> >> DS complaining about GSS is somewhat normal during startup as it is a
> >> bit noisy. The other errors suggest there is no data in the backend. An
> >> ldapsearch would confirm that.
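> >>
> >> Something like the following (the base DN is a placeholder; substitute
> >> your suffix):
> >>
> >>   ldapsearch -x -D "cn=Directory Manager" -W \
> >>     -b "dc=example,dc=com" -s base
> >>
> >> If even the suffix entry comes back as "no such object", the backend is
> >> empty.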
> >>
> >>>
> >>>  LOOKING_UP_CLIENT: DNS/replica@REALM Server error
> >>>
> >>> FreeIPA is configured to serve DNS, and this replica resolves its own
> >>> DNS via /etc/resolv.conf (127.0.0.1).
> >>>
> >>> I tried pointing /etc/resolv.conf at another (good) replica, and even
> >>> tried adjusting /etc/krb5.conf to point at another KDC in order to get
> >>> a ticket; however, it still tries to connect to the local KDC (which
> >>> fails to start).
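> >>>
> >>> For reference, the /etc/krb5.conf change was along these lines (the
> >>> realm and KDC host below are placeholders):
> >>>
> >>>   [realms]
> >>>    EXAMPLE.COM = {
> >>>      kdc = good-replica.example.com:88
> >>>    }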
> >>>
> >>> I'm inclined to re-install this replica and start fresh. I'm curious
> >>> whether we can re-kickstart this host from a fresh OS/FreeIPA install
> >>> and run the ipa-replica-manage re-initialize --from master command. The
> >>> replica will have the same name... is this possible? Would we need to
> >>> back up the /var/lib/ipa/replica-info-XXX.gpg file?
> >>
> >> It needs to have its own principal in order to re-initialize. It sounds
> >> like it has nothing which is why replication is failing.
> >>
> >> I'd recommend generating a new replica file. There is no value in
> >> re-using the old one and it could be harmful if the certificates are
> >> expired.
> >>
> >> You'll need to delete all replication agreements this master had and
> >> you'll need to use the --force option since it won't be accessible. When
> >> you re-install the master it will get all the current data as part of
> >> the setup so no need to re-initialize after that.
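> >>
> >> Roughly, on a working master:
> >>
> >>   ipa-replica-manage del srv-m14-30.cbls.ccr.buffalo.edu --force
> >>   ipa-replica-prepare srv-m14-30.cbls.ccr.buffalo.edu
> >>
> >> and then a fresh ipa-replica-install on the rebuilt host using the new
> >> replica file.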
> > 
> > I force-removed the replica but am still seeing the RUVs show up:
> > 
> >   # ipa-replica-manage -v --force del srv-m14-30.cbls.ccr.buffalo.edu
> > 
> > 
> > From the logs:
> > 
> > [06/Oct/2015:07:43:47 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Initiating CleanAllRUV Task...
> > [06/Oct/2015:07:43:47 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Retrieving maxcsn...
> > [06/Oct/2015:07:43:47 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Found maxcsn (5600051d001000050000)
> > [06/Oct/2015:07:43:47 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Cleaning rid (5)...
> > [06/Oct/2015:07:43:47 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting to process all the updates from the deleted replica...
> > [06/Oct/2015:07:43:47 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to be online...
> > [06/Oct/2015:07:43:47 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to receive all the deleted replica updates...
> > [06/Oct/2015:07:43:48 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Sending cleanAllRUV task to all the replicas...
> > [06/Oct/2015:07:43:48 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Cleaning local ruv's...
> > [06/Oct/2015:07:43:48 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to be cleaned...
> > [06/Oct/2015:07:43:48 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Replica is not cleaned yet (agmt="cn=meTosrv-m14-31-02.cbls.ccr.buffalo.edu" (srv-m14-31-02:389))
> > [06/Oct/2015:07:43:48 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Replicas have not been cleaned yet, retrying in 10 seconds
> > [06/Oct/2015:07:43:59 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to finish cleaning...
> > [06/Oct/2015:07:43:59 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Successfully cleaned rid(5).
> > 
> > The replica is not showing up when running ipa-replica-manage list.
> > 
> >   # ipa-replica-manage list
> >   srv-m14-32.cbls.ccr.buffalo.edu: master
> >   srv-m14-31-02.cbls.ccr.buffalo.edu: master
> > 
> > 
> > However, I'm still seeing the RUVs in ldapsearch:
> > 
> > ldapsearch -Y GSSAPI -b "cn=mapping tree,cn=config" objectClass=nsDS5ReplicationAgreement -LL
> > 
> > 
> > nsds50ruv: {replica 5 ldap://srv-m14-30.cbls.ccr.buffalo.edu:389} 55afec6b0000
> >  00050000 55b2aa68000200050000
> > 
> > 
> > ..
> > 
> > nsds50ruv: {replica 91 ldap://srv-m14-30.cbls.ccr.buffalo.edu:389} 55afecb0000
> >  0005b0000 55b13e740000005b0000
> > 
> > 
> > Should I clean these manually, or can I run: ipa-replica-manage clean-ruv 5
> > 
> > Thanks again for all the help.
> > 
> > --Andrew
> > 
> > 
> 
> Note that the list of masters comes from entries in IPA, not from
> replication agreements.
> 
> ipa-replica-manage list-ruv will show the RUV data in a simpler way.
> 
> Yeah, I'd use clean-ruv to clean them up.
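> 
> For the two stale entries above that would be:
> 
>   # ipa-replica-manage clean-ruv 5
>   # ipa-replica-manage clean-ruv 91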
> 
> rob
> 
> 

I get an error trying to clean-ruv:

  # ipa-replica-manage clean-ruv 5
  Replica ID 5 not found

Can these safely be ignored? Or will we hit problems when adding the
replica back in?
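
(If it comes to it, I understand the clean can also be kicked off directly
as a 389-ds cleanallruv task; a sketch, with our suffix and the rid
substituted for the placeholders:

  # clean5.ldif
  dn: cn=clean 5,cn=cleanallruv,cn=tasks,cn=config
  objectclass: extensibleObject
  replica-base-dn: dc=example,dc=com
  replica-id: 5
  cn: clean 5

  ldapmodify -a -x -D "cn=Directory Manager" -W -f clean5.ldif

but I'd rather not go behind the tooling's back without confirmation.)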

Thanks again.



