[Freeipa-devel] [PATCH] 1079 address CA subsystem renewal issues

Petr Viktorin pviktori at redhat.com
Mon Jan 14 13:47:10 UTC 2013


On 01/12/2013 12:49 AM, Rob Crittenden wrote:
> Rob Crittenden wrote:
>> Petr Viktorin wrote:
>>> On 01/07/2013 05:42 PM, Rob Crittenden wrote:
>>>> Petr Viktorin wrote:
>>>>> On 01/07/2013 03:09 PM, Rob Crittenden wrote:
>>>>>> Petr Viktorin wrote:
>>> [...]
>>>>>>>
>>>>>>> Works for me, but I have some questions (this is an area I know
>>>>>>> little
>>>>>>> about).
>>>>>>>
>>>>>>> Can we be 100% sure these certs are always renewed together? Is
>>>>>>> certmonger the only possible mechanism to update them?
>>>>>>
>>>>>> You raise a good point. If though some mechanism someone replaces
>>>>>> one of
>>>>>> these certs it will cause the script to fail. Some notification of
>>>>>> this
>>>>>> failure will be logged though, and of course, the certs won't be
>>>>>> renewed.
>>>>>>
>>>>>> One could conceivably manually renew one of these certificates. It is
>>>>>> probably a very remote possibility but it is non-zero.
>>>>>>
>>>>>>> Can we be sure certmonger always does the updates in parallel? If it
>>>>>>> managed to update the audit cert before starting on the others, we'd
>>>>>>> get
>>>>>>> no CA restart for the others.
>>>>>>
>>>>>> These all get issued at the same time so should expire at the same
>>>>>> time
>>>>>> as well (see problem above). The script will hang around for 10
>>>>>> minutes
>>>>>> waiting for the renewal to complete, then give up.
>>>>>
>>>>> The certs might take different amounts of time to update, right?
>>>>> Eventually, the expirations could go out of sync enough for it to
>>>>> matter.
>>>>> AFAICS, without proper locking we still get a race condition when the
>>>>> other certs start being renewed some time (much less than 10 min)
>>>>> after
>>>>> the audit one:
>>>>>
>>>>> (time axis goes down)
>>>>>
>>>>>          audit cert                  other cert
>>>>>          ----------                  ----------
>>>>>      certmonger does renew                .
>>>>>    post-renew script starts               .
>>>>>   check state of other certs: OK          .
>>>>>              .                   certmonger starts renew
>>>>>   certutil modifies NSS DB  +  certmonger modifies NSS DB  == boom!
>>>>
>>>> This can't happen because we count the # of expected certs and wait
>>>> until all are in MONITORING before continuing.
>>>
>>> The problem is that they're also in MONITORING before the whole renewal
>>> starts. If the script happens to check just before the state changes
>>> from MONITORING to GENERATING_CSR or whatever, we can get corruption.
>>>
>>>> The worse that would
>>>> happen is the trust wouldn't be set on the audit cert and dogtag
>>>> wouldn't be restarted.
>>>>
>>>>>
>>>>>
>>>>>> The state the system would be in is this:
>>>>>>
>>>>>> - audit cert trust not updated, so next restart of CA will fail
>>>>>> - CA is not restarted so will not use updated certificates
>>>>>>
>>>>>>> And anyway, why does certmonger do renewals in parallel? It seems
>>>>>>> that
>>>>>>> if it did one at a time, always waiting until the post-renew
>>>>>>> script is
>>>>>>> done, this patch wouldn't be necessary.
>>>>>>>
>>>>>>
>>>>>>  From what Nalin told me certmonger has some coarse locking such that
>>>>>> renewals in a the same NSS database are serialized. As you point
>>>>>> out, it
>>>>>> would be nice to extend this locking to the post renewal scripts. We
>>>>>> can
>>>>>> ask Nalin about it. That would fix the potential corruption issue.
>>>>>> It is
>>>>>> still much nicer to not have to restart dogtag 4 times.
>>>>>>
>>>>>
>>>>> Well, three extra restarts every few years seems like a small price to
>>>>> pay for robustness.
>>>>
>>>> It is a bit of a problem though because the certs all renew within
>>>> seconds so end up fighting over who is restarting dogtag. This can
>>>> cause
>>>> some renewals go into a failure state to be retried later. This is fine
>>>> functionally but makes QE a bit of a pain. You then have to make sure
>>>> that renewal is basically done, then restart certmonger and check
>>>> everything again, over and over until all the certs are renewed.
>>>> This is
>>>> difficult to automate.
>>>
>>> So we need to extend the certmonger lock, and wait until Dogtag is back
>>> up before exiting the script. That way it'd still take longer than 1
>>> restart, but all the renews should succeed.
>>>
>>
>> Right, but older dogtag versions don't have the handy servlet to tell
>> that the service is actually up and responding. So it is difficult to
>> tell from tomcat alone whether the CA is actually up and handling
>> requests.
>>
>
> Revised patch that takes advantage of new version of certmonger.
> certmonger-0.65 adds locking from the time renewal begins to the end of
> the post_save_command. This lets us be sure that no other certmonger
> renewals will have the NSS database open in read-write mode.
>
> We need to be sure that tomcat is shut down before we let certmonger
> save the certificate to the NSS database because dogtag opens its
> database read/write and two writers can cause corruption.
>
> rob
>

stop_pkicad and start_pkicad need the Dogtag version check to select 
pki_cad/pki_tomcatd.

A more serious issue is that stop_pkicad needs to be installed on 
upgrades. Currently the whole enable_certificate_renewal step in 
ipa-upgradeconfig is skipped if it was done before.

In stop_pkicad can you change the first log message to "certmonger 
stopping %sd"? It's before the action so we don't want past tense.

-- 
Petr³




More information about the Freeipa-devel mailing list