[Freeipa-devel] [PATCH] 1079 address CA subsystem renewal issues

Rob Crittenden rcritten at redhat.com
Mon Jan 7 19:14:13 UTC 2013


Petr Viktorin wrote:
> On 01/07/2013 05:42 PM, Rob Crittenden wrote:
>> Petr Viktorin wrote:
>>> On 01/07/2013 03:09 PM, Rob Crittenden wrote:
>>>> Petr Viktorin wrote:
> [...]
>>>>>
>>>>> Works for me, but I have some questions (this is an area I know little
>>>>> about).
>>>>>
>>>>> Can we be 100% sure these certs are always renewed together? Is
>>>>> certmonger the only possible mechanism to update them?
>>>>
>>>> You raise a good point. If through some mechanism someone replaces
>>>> one of
>>>> these certs it will cause the script to fail. Some notification of this
>>>> failure will be logged though, and of course, the certs won't be
>>>> renewed.
>>>>
>>>> One could conceivably manually renew one of these certificates. It is
>>>> probably a very remote possibility but it is non-zero.
>>>>
>>>>> Can we be sure certmonger always does the updates in parallel? If it
>>>>> managed to update the audit cert before starting on the others, we'd
>>>>> get
>>>>> no CA restart for the others.
>>>>
>>>> These all get issued at the same time so should expire at the same time
>>>> as well (see problem above). The script will hang around for 10 minutes
>>>> waiting for the renewal to complete, then give up.
>>>
>>> The certs might take different amounts of time to update, right?
>>> Eventually, the expirations could go out of sync enough for it to
>>> matter.
>>> AFAICS, without proper locking we still get a race condition when the
>>> other certs start being renewed some time (much less than 10 min) after
>>> the audit one:
>>>
>>> (time axis goes down)
>>>
>>>          audit cert                  other cert
>>>          ----------                  ----------
>>>      certmonger does renew                .
>>>    post-renew script starts               .
>>>   check state of other certs: OK          .
>>>              .                   certmonger starts renew
>>>   certutil modifies NSS DB  +  certmonger modifies NSS DB  == boom!
>>
>> This can't happen because we count the # of expected certs and wait
>> until all are in MONITORING before continuing.
>
> The problem is that they're also in MONITORING before the whole renewal
> starts. If the script happens to check just before the state changes
> from MONITORING to GENERATING_CSR or whatever, we can get corruption.
>
>> The worst that would
>> happen is the trust wouldn't be set on the audit cert and dogtag
>> wouldn't be restarted.
>>
>>>
>>>
>>>> The state the system would be in is this:
>>>>
>>>> - audit cert trust not updated, so next restart of CA will fail
>>>> - CA is not restarted so will not use updated certificates
>>>>
>>>>> And anyway, why does certmonger do renewals in parallel? It seems that
>>>>> if it did one at a time, always waiting until the post-renew script is
>>>>> done, this patch wouldn't be necessary.
>>>>>
>>>>
>>>>  From what Nalin told me certmonger has some coarse locking such that
>>>> renewals in the same NSS database are serialized. As you point
>>>> out, it
>>>> would be nice to extend this locking to the post renewal scripts. We
>>>> can
>>>> ask Nalin about it. That would fix the potential corruption issue.
>>>> It is
>>>> still much nicer to not have to restart dogtag 4 times.
>>>>
>>>
>>> Well, three extra restarts every few years seems like a small price to
>>> pay for robustness.
>>
>> It is a bit of a problem though because the certs all renew within
>> seconds so end up fighting over who is restarting dogtag. This can cause
>> some renewals to go into a failure state to be retried later. This is fine
>> functionally but makes QE a bit of a pain. You then have to make sure
>> that renewal is basically done, then restart certmonger and check
>> everything again, over and over until all the certs are renewed. This is
>> difficult to automate.
>
> So we need to extend the certmonger lock, and wait until Dogtag is back
> up before exiting the script. That way it'd still take longer than 1
> restart, but all the renews should succeed.
>

Right, but older dogtag versions don't have the handy servlet to tell 
that the service is actually up and responding. So it is difficult to 
tell from tomcat alone whether the CA is actually up and handling requests.
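
Where a status endpoint does exist, the "wait until Dogtag is back up" 
step could be sketched roughly as below. This is only an illustration 
under assumptions: wait_for_ca, http_ok and the URL passed in are 
hypothetical names, not anything in dogtag or certmonger, and an HTTP 200 
from tomcat alone would not prove the CA is handling requests.

```python
import time
import urllib.error
import urllib.request


def http_ok(url):
    """Return True if a GET on url answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error status, etc.
        return False


def wait_for_ca(url, timeout=300.0, interval=5.0, probe=http_ok):
    """Poll probe(url) until it succeeds or timeout seconds have passed.

    url is a placeholder for whatever status endpoint is available;
    it is an assumption, not a known dogtag path.
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A post-renewal script would call wait_for_ca() after the restart and only 
exit once it returns, treating a False result as a failure to log.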

rob



