[Freeipa-devel] [PATCH] 1079 address CA subsystem renewal issues

Rob Crittenden rcritten at redhat.com
Fri Jan 11 23:49:08 UTC 2013


Rob Crittenden wrote:
> Petr Viktorin wrote:
>> On 01/07/2013 05:42 PM, Rob Crittenden wrote:
>>> Petr Viktorin wrote:
>>>> On 01/07/2013 03:09 PM, Rob Crittenden wrote:
>>>>> Petr Viktorin wrote:
>> [...]
>>>>>>
>>>>>> Works for me, but I have some questions (this is an area I know
>>>>>> little
>>>>>> about).
>>>>>>
>>>>>> Can we be 100% sure these certs are always renewed together? Is
>>>>>> certmonger the only possible mechanism to update them?
>>>>>
>>>>> You raise a good point. If through some mechanism someone replaces
>>>>> one of
>>>>> these certs it will cause the script to fail. Some notification of
>>>>> this
>>>>> failure will be logged though, and of course, the certs won't be
>>>>> renewed.
>>>>>
>>>>> One could conceivably manually renew one of these certificates. It is
>>>>> probably a very remote possibility but it is non-zero.
>>>>>
>>>>>> Can we be sure certmonger always does the updates in parallel? If it
>>>>>> managed to update the audit cert before starting on the others, we'd
>>>>>> get
>>>>>> no CA restart for the others.
>>>>>
>>>>> These all get issued at the same time so should expire at the same
>>>>> time
>>>>> as well (see problem above). The script will hang around for 10
>>>>> minutes
>>>>> waiting for the renewal to complete, then give up.
>>>>
>>>> The certs might take different amounts of time to update, right?
>>>> Eventually, the expirations could go out of sync enough for it to
>>>> matter.
>>>> AFAICS, without proper locking we still get a race condition when the
>>>> other certs start being renewed some time (much less than 10 min) after
>>>> the audit one:
>>>>
>>>> (time axis goes down)
>>>>
>>>>          audit cert                  other cert
>>>>          ----------                  ----------
>>>>      certmonger does renew                .
>>>>    post-renew script starts               .
>>>>   check state of other certs: OK          .
>>>>              .                   certmonger starts renew
>>>>   certutil modifies NSS DB  +  certmonger modifies NSS DB  == boom!
>>>
>>> This can't happen because we count the # of expected certs and wait
>>> until all are in MONITORING before continuing.
>>
>> The problem is that they're also in MONITORING before the whole renewal
>> starts. If the script happens to check just before the state changes
>> from MONITORING to GENERATING_CSR or whatever, we can get corruption.
>>
>>> The worst that would
>>> happen is the trust wouldn't be set on the audit cert and dogtag
>>> wouldn't be restarted.
>>>
>>>>
>>>>
>>>>> The state the system would be in is this:
>>>>>
>>>>> - audit cert trust not updated, so next restart of CA will fail
>>>>> - CA is not restarted so will not use updated certificates
>>>>>
>>>>>> And anyway, why does certmonger do renewals in parallel? It seems
>>>>>> that
>>>>>> if it did one at a time, always waiting until the post-renew
>>>>>> script is
>>>>>> done, this patch wouldn't be necessary.
>>>>>>
>>>>>
>>>>> From what Nalin told me certmonger has some coarse locking such that
>>>>> renewals in the same NSS database are serialized. As you point
>>>>> out, it
>>>>> would be nice to extend this locking to the post renewal scripts. We
>>>>> can
>>>>> ask Nalin about it. That would fix the potential corruption issue.
>>>>> It is
>>>>> still much nicer to not have to restart dogtag 4 times.
>>>>>
>>>>
>>>> Well, three extra restarts every few years seems like a small price to
>>>> pay for robustness.
>>>
>>> It is a bit of a problem though because the certs all renew within
>>> seconds so end up fighting over who is restarting dogtag. This can cause
>>> some renewals to go into a failure state to be retried later. This is fine
>>> functionally but makes QE a bit of a pain. You then have to make sure
>>> that renewal is basically done, then restart certmonger and check
>>> everything again, over and over until all the certs are renewed. This is
>>> difficult to automate.
>>
>> So we need to extend the certmonger lock, and wait until Dogtag is back
>> up before exiting the script. That way it'd still take longer than 1
>> restart, but all the renews should succeed.
>>
>
> Right, but older dogtag versions don't have the handy servlet to tell
> that the service is actually up and responding. So it is difficult to
> tell from tomcat alone whether the CA is actually up and handling requests.
>

Revised patch that takes advantage of a new version of certmonger.
certmonger-0.65 adds locking from the time renewal begins to the end of 
the post_save_command. This lets us be sure that no other certmonger 
renewals will have the NSS database open in read-write mode.
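
As a rough illustration of why that matters (this is neither code from the 
patch nor certmonger's implementation), the sketch below holds a single 
lock across both the save and the post-save command, so a second renewal 
cannot touch the database in between. The names nss_db_lock, renew and 
renew_all, and the certificate labels in the usage note, are hypothetical.

```python
import threading

# Hypothetical stand-in for certmonger's per-database lock. The real
# locking lives inside certmonger and is not shown in this thread.
nss_db_lock = threading.Lock()

# Ordered record of which renewal touched the database, and in which phase.
events = []


def renew(cert_name, save_cert, post_save_command):
    """Save one certificate and run its post-save command under one lock."""
    with nss_db_lock:
        events.append((cert_name, "save"))
        save_cert(cert_name)
        # Still holding the lock: no other renewal can start saving
        # while this post-save command is running.
        events.append((cert_name, "post_save"))
        post_save_command(cert_name)


def renew_all(cert_names, save_cert, post_save_command):
    """Start every renewal at once, as happens when certs expire together."""
    threads = [
        threading.Thread(target=renew, args=(name, save_cert, post_save_command))
        for name in cert_names
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Calling renew_all(["audit", "other-1"], save, post_save) with any two 
callables leaves events as strictly alternating save/post_save pairs for 
the same certificate, whichever thread happens to win the lock first.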

We need to be sure that tomcat is shut down before we let certmonger 
save the certificate to the NSS database because dogtag opens its 
database read/write and two writers can cause corruption.
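
The ordering can be sketched as follows. This is only an illustration 
under assumptions, not the contents of the patch: stop_ca, start_ca and 
save_cert are hypothetical callables standing in for whatever actually 
stops tomcat, starts it again, and writes the certificate.

```python
from contextlib import contextmanager


@contextmanager
def service_stopped(stop, start):
    """Keep a service down for the duration of the block, then start it."""
    stop()
    try:
        yield
    finally:
        # Start the service again even if the save failed, so the CA is
        # not left down after an error.
        start()


def save_with_ca_down(stop_ca, start_ca, save_cert):
    """Write the certificate only while the CA is not running."""
    with service_stopped(stop_ca, start_ca):
        # The CA is down here, so this is the only writer to the database.
        save_cert()
```

Using a context manager keeps the stop and start paired, so a failure 
while saving still brings the service back up.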

rob

-------------- next part --------------
A non-text attachment was scrubbed...
Name: freeipa-rcrit-1079-2-renewal.patch
Type: text/x-patch
Size: 21679 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/freeipa-devel/attachments/20130111/5d70ab1a/attachment.bin>
