[rdo-list] [TripleO] Newton large baremetal deployment issues

Charles Short cems at ebi.ac.uk
Tue Nov 15 22:39:19 UTC 2016


Hi,

So I have finally tried OSP9, and here are the results:

3 controllers and 40 compute nodes: 1 hour 20 mins to deploy.

This is much more the sort of deployment time I was expecting :)

I then tried TripleO Newton stable again with 3 controllers and 40 compute nodes:

4 hours and counting.....

The two deployment scripts (for OSP9 and TripleO Newton) were pretty
much identical, allowing for any changes between releases.

During the OSP9 deployment I could use nova list to list the nodes. The 
Undercloud API access was in general very responsive.

During the TripleO Newton deployment, 'nova list' hangs, eventually returning:
ERROR (ClientException): The server has either erred or is incapable of performing the requested operation. (HTTP 500)
Undercloud API access was very sluggish.
I noticed Keystone was stuck at 140% CPU for most of the deployment (albeit
multi-threaded), which is not the case with OSP9.

I know it is hard to compare two releases, but the difference is enormous.
I will stick with OSP9 for now, as for me it works properly out of the
box for large deployments.

Charles

On 14/11/2016 09:01, Charles Short wrote:
> Hi Graeme,
>
> Thanks for the reply.
>
> I used these images -
>
> http://buildlogs.centos.org/centos/7/cloud/x86_64/tripleo_images/newton/delorean/
>
> I installed the stable repo following the documentation here -
>
> http://docs.openstack.org/developer/tripleo-docs/installation/installation.html
>
> For example:
>
> sudo curl -L -o /etc/yum.repos.d/delorean-newton.repo \
> https://trunk.rdoproject.org/centos7-newton/current/delorean.repo
>
> sudo curl -L -o /etc/yum.repos.d/delorean-deps-newton.repo \
> http://trunk.rdoproject.org/centos7-newton/delorean-deps.repo
>
>
> The difficulty I am having is that a small test deployment works
> fine, so you would assume that just adding more compute nodes would
> not be an issue.
> Testing this is painful because of how long a large deployment takes
> to fail. Scale seems to be the only issue.
>
> I will try to get you some logs.
>
> Regards
>
> Charles
>
>
>
>> So the symptoms you are showing me above almost certainly lead me to
>> believe that neutron-server failed on the undercloud, which would
>> explain why the deploy and nova failed to work. It could have failed
>> before or during the deploy. We regularly see instances where
>> neutron-server times out on system boot (it takes slightly longer to
>> start than systemd expects), so we need to start it manually.
>>
>> To be clear, the undercloud has been installed using this repo:
>>
>> http://buildlogs.centos.org/centos/7/cloud/x86_64/rdo-trunk-newton-tested/
>>
>> Which overcloud images are you using? I'm not seeing any provided in
>> that repo, and I just want to make sure the undercloud and overcloud
>> packages match (as the tripleo-heat-templates package on the undercloud
>> has to align with the openstack-puppet-modules package on the overcloud
>> images).
>>
>> Also, is it possible to get a copy of the full neutron-server log from
>> the undercloud? If we can understand why neutron-server failed, that is
>> the first step towards getting a working deployment.
>>
>> It would be great if we could get a full sosreport with all the system
>> logs to check for other errors. I'm assuming there were no problems
>> with the 'openstack undercloud install' process?
>>
>> Regards,
>>
>> Graeme
>>
>

-- 
Charles Short
Cloud Engineer
Virtualization and Cloud Team
European Bioinformatics Institute (EMBL-EBI)
Tel: +44 (0)1223 494205



