In December 2012 Red Hat announced its participation in a three-year European Union initiative to create a Large-scale Elastic Architecture for Data-as-a-Service (LEADS). Those three years have come to an end, so what did we learn? The major takeaway is that LEADS can work in principle, and that it has the potential to alter how companies work with distributed datacenters.
The intention with LEADS was to create a technology that could provide democratized access to public data sets such as from the open web, social media, and data exposed from various open data movements, index that data, and open it to query. This would in turn enable a third party to create a Data-as-a-Service offering.
It is important to note that while this is an EU initiative involving various academic institutions, the proposal was not necessarily to provide a ‘public good’. It was to provide a technology that a commercial entity could use to offer a paid-for service, targeted either at smaller companies for which doing the same thing would be less feasible economically or technically, or at larger companies that do not wish to perform the operations in-house. Reindexing large amounts of data can be expensive for many companies, including those for which it is not a core requirement. Large organizations such as Google and Apple may be able to afford to do it, but the costs are likely prohibitive for average-sized corporations.
Important to the success of LEADS would be its ability to work across a widely distributed geographic area. One objective was therefore to spread the LEADS infrastructure across several ‘micro’ datacenters. Not only could this architecture make big data more affordable, since it would be a shared resource, but it could also reduce latency, as users would have a greater chance of being close to a datacenter, and improve resiliency due to the shared nature of the data.
To index the web in this way, micro datacenters in France would, for example, index sites hosted in France to help lower the cost of peering and improve access to the bandwidth; datacenters in Germany would do German sites and so on. Ultimately, aggregated together, this could become a global index.
Another potential advantage of this architecture is the freedom of choice of supplier that it delivers. If you have technology that enables you to run the same software seamlessly across different datacenters in an efficient way, and if for some reason one of your Infrastructure-as-a-Service providers goes down or makes you unhappy, you may be able to switch suppliers more easily.
One of our goals now is to demonstrate that if one datacenter goes down, the system can still work. Already, we have demonstrated that even as a distributed system, efficiency can be maintained.
To provide a more robust, reliable infrastructure, the LEADS project members needed to enable a ‘multi-site, locality-aware distributed storage architecture’. Put simply, that means the data is not stored on a single machine or in a single datacenter; it is distributed across multiple machines and multiple datacenters. This is where Infinispan and the LEADS-provided improvements came into the project. Locality-aware means the framework lets users decide where they want to push data. It is this ability to push data where users want that powers the potential advantages of LEADS: lower costs, reduced latency and greater resilience.
There are existing tools that let users push data into two or three datacenters, but with LEADS we have a framework that can give users a more fine-tuned and granular choice as to how and where they can push data. This capability can be used to make placement decisions based on where the index data comes from and the kind of queries being executed.
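To make the idea of a placement decision concrete, here is a minimal sketch in Python. It is not the LEADS or Infinispan API: the `MicroDatacenter` class, the `choose_site` and `put` functions, and the datacenter names are all hypothetical, and the rule shown (place data in the datacenter of its country of origin, otherwise fall back to a default) is only one assumed policy among many.

```python
from dataclasses import dataclass, field


@dataclass
class MicroDatacenter:
    """Hypothetical micro datacenter holding part of the index."""
    name: str
    country: str
    store: dict = field(default_factory=dict)


def choose_site(origin_country, sites, default):
    """Assumed placement rule: prefer the site in the data's country of origin."""
    for site in sites:
        if site.country == origin_country:
            return site
    return default


def put(key, value, origin_country, sites, default):
    """Store an entry at the site picked by the placement rule; return its name."""
    site = choose_site(origin_country, sites, default)
    site.store[key] = value
    return site.name


# Illustrative sites and keys; none of these names come from the project.
paris = MicroDatacenter("dc-paris", "FR")
berlin = MicroDatacenter("dc-berlin", "DE")
sites = [paris, berlin]

placed_fr = put("example.fr/page", "indexed terms", "FR", sites, default=paris)
placed_de = put("example.de/seite", "indexed terms", "DE", sites, default=paris)
placed_other = put("example.org/page", "indexed terms", "US", sites, default=paris)
```

A real policy could also take into account the kind of queries being executed, as described above; the sketch keeps only the origin-based part to stay short.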
We see a lot of interesting potential in this capability to push data out across multiple sites. Consequently, we have put this technology into the Infinispan community (Infinispan Ensemble) to assess uptake and to see whether there are technical limitations that we need to overcome.
Of course, indexing the data is not enough: you also need to be able to query it, and the LEADS platform lets users do that in a distributed fashion. The system can maximize local query execution to help eliminate most of the cross-datacenter communication. This can keep costs down, reduce latency and improve resiliency.
Infinispan users are able to perform distributed queries across datacenters. In practice, that means there is a single data grid in each of the datacenters, and a framework on top talks to all of the individual data grids. The results are then aggregated so that the whole looks like a single data grid. This helps keep the programming model simple while offering the multi-datacenter approach.
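The pattern described here, one grid per datacenter with a thin layer on top that aggregates results, can be sketched as follows. This is an illustration under stated assumptions, not the Infinispan or Infinispan Ensemble API: `LocalGrid`, `FederatedGrid` and `count_matches` are hypothetical names, and the "query" is a simple word count chosen only to show local execution followed by aggregation.

```python
from collections import Counter


class LocalGrid:
    """Hypothetical stand-in for the single data grid inside one datacenter."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents  # document id -> text

    def count_matches(self, term):
        """Run the query locally and return only a small partial result."""
        counts = Counter()
        for doc_id, text in self.documents.items():
            n = text.lower().split().count(term.lower())
            if n:
                counts[doc_id] = n
        return counts


class FederatedGrid:
    """Hypothetical layer on top that makes several grids look like one."""

    def __init__(self, grids):
        self.grids = grids

    def count_matches(self, term):
        """Ask each grid to run the query locally, then merge the partial results."""
        total = Counter()
        for grid in self.grids:
            total.update(grid.count_matches(term))
        return total


# Illustrative data only.
fr = LocalGrid("dc-paris", {"fr-1": "great phone great battery", "fr-2": "slow delivery"})
de = LocalGrid("dc-berlin", {"de-1": "great camera"})

result = FederatedGrid([fr, de]).count_matches("great")
```

The caller uses the same `count_matches` call whether it talks to one grid or to the federated layer, and only the per-document counts, not the documents themselves, cross datacenter boundaries.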
What might this all look like in a commercial setting? Why build solutions on top of indexed public data? Say you sell goods in a competitive market. You could build software on top of the query system that provides sentiment analysis to determine whether a given item or item category is liked more or less over time, and why. This can help detect whether a blogger or group of influencers is unhappy about a certain product, and help map trends over time. The goal is to spot a trend and then predict what could be popular. If you can predict what could be popular four to six months in the future, you may be able to gain a competitive advantage.