GSoC 2008: Transifex - Federated architecture

Andreas Louca alouca at gmail.com
Thu Apr 3 14:41:16 UTC 2008


Dear all,

As part of Google Summer of Code, I will apply to the Fedora Project
with a proposal to improve Transifex. The project title is "Transifex:
Server-federated architecture"; it was originally suggested by
Dimitris Glezos, and this proposal builds on the ideas I discussed
with him earlier.

I would really appreciate your comments, so that I can adjust the
proposal to your thoughts and needs and the project can produce
something useful.

Thank you for your time

Proposal:

Summary of idea: (Taken from
http://fedoraproject.org/wiki/SummerCoding/2008/Ideas#tx_federated)

One of the most important assets of Tx is its community-bridging
architecture. When multiple Transifex instances exist, they might want
to share data such as users, translation statistics and maybe
submissions. E.g. a Fedora user could submit something to the Debian
server, where it will wait for the approval of a Debian language leader.

    * For a single Tx instance to scale well, one might want to split
its functionality into (say) el.fooTx.org, pt.fooTx.org, etc.
    * This splitting/joining requires something like a server-to-server
architecture/protocol and the ability to aggregate and delegate
work and data on both sides
    * Some projects might want to have their own Tx instance containing
internal projects that are not publicly visible. At the same time,
they might have public projects that want to receive translations
freely, and they might want to let their internal translators use
this instance as a gateway to all other Tx servers.
    * Reducing the isolation between the scattered Tx instances
(i.e. building bridges between the Tx "islands") will take the whole
"community bridging" idea of Transifex to a new level. This is
the long-term goal.
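The split/join idea can be sketched in a few lines of Python. This is a minimal illustration only, assuming each language instance (the hypothetical el.fooTx.org and pt.fooTx.org) reports counts of translated and total messages. `LanguageStats`, `server_for`, `aggregate` and the `LANGUAGE_SERVERS` table are hypothetical names, not part of Transifex: "delegate" picks the instance responsible for a language, and "aggregate" merges the per-instance reports back into one view.

```python
from dataclasses import dataclass

# Hypothetical mapping from language code to the instance that handles it.
LANGUAGE_SERVERS = {"el": "el.fooTx.org", "pt": "pt.fooTx.org"}


@dataclass(frozen=True)
class LanguageStats:
    """Statistics one language instance might report (illustrative shape)."""
    server: str      # e.g. "el.fooTx.org"
    language: str    # e.g. "el"
    translated: int  # translated messages
    total: int       # messages in the source template


def server_for(language: str, default: str = "fooTx.org") -> str:
    """Delegate: pick the instance responsible for a language."""
    return LANGUAGE_SERVERS.get(language, default)


def aggregate(reports: list[LanguageStats]) -> dict[str, float]:
    """Aggregate: merge per-instance reports into language -> percent complete."""
    return {
        r.language: (100.0 * r.translated / r.total) if r.total else 0.0
        for r in reports
    }
```

For example, a report of 50 of 200 messages from el.fooTx.org and 200 of 200 from pt.fooTx.org aggregates to `{"el": 25.0, "pt": 100.0}`.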

The most important aspect of this idea is the architecture design. The
student will need a very good understanding of the content and
translation workflow and the process Transifex adopts, and will need to
study how open source projects work together.

Deliverables:

* Implementation of a server-to-server API supporting all the features
discussed in the proposal
* Documentation of the new interfaces added
* Documentation of the implemented Server API

Implementation proposal:

Since the system is required to scale, each server must be an
independent instance while maintaining its link with the rest of the
project (e.g. the EL Tx server must know that it is part of the Fedora
Tx server). This requires each server to expose the following services
(most probably over XML-RPC):

    * A common authentication mechanism (e.g. using OpenID, or
exposing an authentication mechanism for the local user base)
    * Ability to list local projects
    * Ability to associate local projects with remote projects
          o Automatically: when the project name is the same
across Tx servers (e.g. the el and en Fedora Tx servers)
          o Manually: when two servers from different
distributions share data and might not use the same name for a
project (e.g. Ubuntu and Fedora)
    * Ability to export translation statistics for a project
    * Ability to pass or retrieve translation tasks from other servers
    * Ability to export or import translation files when a
translation is accepted by another Tx server for upstream commit
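Some of the services listed above could be exposed with Python's standard `xmlrpc.server` module. The sketch below is a minimal illustration, not a proposed API: the method names (`list_projects`, `get_statistics`, `associate`), the statistics format and the manual-link table are all assumptions. It covers listing, statistics export and both kinds of project association (a manual link wins, otherwise projects are matched automatically by identical name); authentication, task passing and file transfer are left out.

```python
from xmlrpc.server import SimpleXMLRPCServer


class TxFederationService:
    """Hypothetical federation endpoint; every public method is callable over XML-RPC."""

    def __init__(self, projects, stats, manual_links=None):
        self._projects = set(projects)  # names of projects hosted locally
        self._stats = stats  # project -> {language: percent translated}
        # (remote_server, remote_project) -> local project, set up by an admin
        self._manual_links = dict(manual_links or {})

    def list_projects(self):
        """List local projects, sorted for stable output."""
        return sorted(self._projects)

    def get_statistics(self, project):
        """Export translation statistics for one local project."""
        if project not in self._projects:
            raise ValueError(f"unknown project: {project}")
        return self._stats.get(project, {})

    def associate(self, remote_server, remote_project):
        """Map a remote project to a local one.

        A manual link wins (e.g. different names on Ubuntu and Fedora);
        otherwise fall back to automatic matching by identical name.
        Returns "" when nothing matches, since XML-RPC cannot send None
        unless allow_none is enabled.
        """
        link = self._manual_links.get((remote_server, remote_project))
        if link is not None:
            return link
        if remote_project in self._projects:
            return remote_project
        return ""


def serve(service, host="localhost", port=8000):
    """Expose the service over XML-RPC; blocks until interrupted."""
    with SimpleXMLRPCServer((host, port)) as server:
        server.register_instance(service)
        server.serve_forever()
```

A remote Tx server could then call, for instance, `xmlrpc.client.ServerProxy("http://localhost:8000/").list_projects()`; an exception such as the `ValueError` above reaches the caller as an XML-RPC fault.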


The Tx servers will have to share an API key, unique per project and
Tx server pair, to be used for identification and permissions.
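A minimal sketch of the shared-key idea, assuming keys are generated with Python's `secrets` module and stored per project and unordered server pair (so the EL and EN servers share one key for a given project), each with a set of permitted actions. `ApiKeyRegistry` and action names such as `"read_stats"` are hypothetical.

```python
import hmac
import secrets


class ApiKeyRegistry:
    """Hypothetical store of shared keys, one per project and Tx server pair."""

    def __init__(self):
        # (project, (server_a, server_b)) -> (key, allowed actions)
        self._entries = {}

    @staticmethod
    def _pair(server_a, server_b):
        # Order-independent, so (A, B) and (B, A) name the same pair.
        return tuple(sorted((server_a, server_b)))

    def issue(self, project, server_a, server_b, actions=("read_stats",)):
        """Create (or replace) the key shared by two servers for one project."""
        key = secrets.token_urlsafe(32)
        self._entries[(project, self._pair(server_a, server_b))] = (key, frozenset(actions))
        return key

    def authorize(self, project, server_a, server_b, presented_key, action):
        """Identify the caller by its key, then check the requested action."""
        entry = self._entries.get((project, self._pair(server_a, server_b)))
        if entry is None:
            return False
        key, actions = entry
        # Constant-time comparison avoids leaking key prefixes through timing.
        if not hmac.compare_digest(key.encode(), presented_key.encode()):
            return False
        return action in actions
```

Identification and permissions stay separate steps here: a correct key for the wrong project or pair is rejected, and a correct key still only unlocks the actions recorded for it. How the key travels with each call (e.g. as an extra XML-RPC parameter over HTTPS) is left open.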

Each Tx server will keep a local (cached) copy of the state of each
project it shares with other Tx servers, and this state will be
refreshed automatically every N minutes. File export calls will be
executed on demand.
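The caching scheme can be sketched as below, assuming `fetch_state` is a hypothetical callable that queries the remote server (for example through an XML-RPC proxy). As a simplification, this version refreshes lazily on read once the cached state is older than N minutes; a periodic job could call `refresh()` directly to follow a strict "every N minutes" schedule. File exports are deliberately not cached, since they are executed on demand.

```python
import time
from typing import Callable, Optional


class RemoteProjectCache:
    """Local (cached) copy of a project's state on another Tx server."""

    def __init__(self, fetch_state: Callable[[], dict], refresh_minutes: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_state = fetch_state  # hypothetical remote call
        self._max_age = refresh_minutes * 60.0  # N minutes, in seconds
        self._clock = clock  # injectable so the refresh logic can be tested
        self._state: Optional[dict] = None
        self._fetched_at: Optional[float] = None

    def refresh(self) -> None:
        """Fetch fresh state now; a periodic job could call this every N minutes."""
        self._state = self._fetch_state()
        self._fetched_at = self._clock()

    def state(self) -> dict:
        """Return the cached state, refreshing it first if older than N minutes."""
        if self._fetched_at is None or self._clock() - self._fetched_at >= self._max_age:
            self.refresh()
        return self._state
```

Injecting the clock keeps the class independent of wall time, so the expiry rule can be checked without waiting N real minutes.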

Roadmap:

    * Initial Protocol and API design [1 week]
          o Data flow diagrams, to demonstrate the process
          o Preliminary web-services API documentation and design
          o SOAP web-service definition
    * Implementation of web services, as discussed above (Milestone 1) [3 weeks]
    * Milestone 1 Complete - Interim testing and evaluation period [1 week]
    * Improvement and further development using feedback from the testing
period [2 weeks]
    * Milestone 2 Complete - Interim testing and evaluation period -
first public test [1 week]
    * Improvement and further development using feedback from the public
test [2 weeks]
    * Milestone 3 Complete - Integration with the Transifex codebase and
continued public testing [1 week]
    * Bug fixes and minor features, with continued testing [1 week]


About me:
I'm a second-year Computer Science student at Lancaster University.

I've been developing web apps for some time now, both for fun and for
work, mostly in PHP (using the Zend Framework or developing WordPress
plugins) and lately in Python (using the Django framework). I use Linux
myself, on both desktops and servers, so I am familiar with
open-source concepts and how communities work.

I also have a deep interest in distributed systems, sparked by this
year's course at university. This project would allow me to put the
skills I've learned into practice by designing and implementing a
distributed Transifex server architecture.

For more information about my experience, please see my CV at:
http://andreas.louca.org/wp-content/2008/03/alouca_cv.pdf




More information about the Fedora-trans-list mailing list