Unified TM

CAT agnostic

Open Source

Encourages open collaboration

Solid framework

For efficient data sharing

Lower translation costs

Benefit by organising translation resources

Data bridge

For Public Administrations and translation contractors

The NEC TM Data consortium’s objective is to organise unexploited national bilingual assets that can be used as open data and general data for machine learning, in order to lower translation costs at a national level and across member states. It will gather translation memories from previous national contract awards from Member States and help them to centralise these language assets with the fast-performing NEC TM database, following industry best practices.

New translation contracts will benefit from fuzzy matching analysis and translation companies will be able to work online and connect to each national NEC TM version. Translation data will be categorised and classified per domain in NEC TM, and a connection provided to eTranslation and ELRC.

The consortium will deliver a pan-European data-sharing awareness program, engaging national public administrations by providing a solid legal framework for data sharing.

It will also provide a solid framework for public administrations to adopt as policy for general data sharing from public translation contracts and real adoption scenarios at a national level. NEC TM Data promotes better data-sharing practices using open standards in translation contracts between public administrations and translation service providers.

In sum, the NEC TM Data consortium advocates for the facilitation of a single digital market. It will act as a meeting point for European data gathering efforts and the collection of national digital big data. By building a data bridge between public administrations and translation vendors, NEC TM Data will promote the free flow of data between Public Administrations and translation professionals.

EU national public administrations are huge buyers of translation services, purchasing many millions of euros of translations annually. However, since translation contracts often do not require translation service providers to return Translation Memories to the contracting body, public administrations do not receive bilingual assets along with their completed translations. Consequently, public administrations in Europe are not leveraging valuable bilingual assets.

The NEC TM Data consortium will aim to inform public administrations at the national level about translation technologies available for language resources, as well as to lobby for TM gathering to be enshrined in national translation contracts. Given that the majority of translated data generated in public contracts is currently not returned to public administrations and therefore its potential is not being maximised, NEC TM Data will ensure that public administrations make better use of the translated data generated in public contracts.

To support these central aims, NEC TM Data will also provide the centralised infrastructure for efficient data sharing, TM matching, TM retrieval, and domain categorisation of resources generated in 10 Member States/EEA countries (including the three participating countries), with an emphasis on countries with low language resources. This will enable the development of NEC TM, which will be open source software developed from Pangeanic’s translation memory database ActivaTM.

In order to provide maximum awareness, NEC TM Data will engage national language programs and data collection efforts in Spain and Latvia, and national authorities in Croatia.

After an initial study of the number of translation contracts in the EU, the compiled data will be shared with the European Commission and national authorities, so that these bodies are made aware of the costs of keeping their translation data closed off, as well as of the data being generated that they could be capitalising on.

In order to accomplish this, NEC TM Data proposes the following central activities:

  • Contact all companies that were awarded a translation contract between 2015 and 2018 on behalf of national authorities and the European Commission, to obtain public administration data (in TMX format) that is currently not being put to use.
  • Deploy a central pan-European data-sharing platform for uploading and sharing TMs (TMX files) directly between public administrations and translation service providers, as well as sharing TMs between translation professionals working on translations for the public sector.
  • Provide NEC TM with a number of API connectors to popular translation tools used by national administrations (commercial tools such as Trados Studio, MemoQ, Memsource; or free tools like OmegaT or MateCat) so that system users can connect directly to the service, thereby supporting their work and boosting their productivity when working on translation contracts for public administrations.
  • Establish NEC TM as a national central repository of public administration data with full classification and categorisation features.
  • Install NEC TM as an online central translation memory at a national level, to which translation companies can connect for all national translation contracts.
  • Add anonymisation features for data shared between Public Administrations.

Moreover, this activity aims to provide an exploitation plan based on software supply agreements for a customised Docker system containing NEC TM Data.

To aid this, the most suitable business models will be defined to exploit the outcomes of the project. For that purpose, the size of the market (in terms of potential beneficiaries), their needs, and the feasibility of reusing central repositories and live translation capabilities will be analysed.

The business model and the approach towards the potential clients identified, from both the public and private sectors, will result in the commercialisation plan. Implementing the commercialisation plan will ensure the long-term sustainability of the software, since it will guarantee its use in the long run.

ActivaTM

The NEC TM Data proposal includes the provision of a central TM-sharing repository, called the NEC TM Data platform. The platform will be based on Pangeanic’s commercial tool ActivaTM and works on a similar concept, following industry practices used by other commercial tools and private organisations such as Memsource and TAUS. Pangeanic will release this commercial software under the GPL (the open source General Public License) and customise it for free use by Public Administrations.

Data Sharing

Member State institutions will be actively involved in the implementation and deployment of the NEC TM Data platform (these are listed in the Consortium Members listing). Additionally, the ELRC initiative will be invited to contribute relevant training data. Vendors of translation services will be contacted so that they can provide the TMs to populate the NEC TM versions at a national level. This data can then be shared with ELRC as each national body sees fit, whilst benefitting from a connection to eTranslation. This will help to improve the quality of the Automated Translation platform and foster broader usage and acceptance of automated translation services across member states.

The NEC TM platform will connect to the ELRC repository (ELRC-SHARE) to exchange data for fuzzy matching or full TM import-export. It can also be deployed at a national level so that each EU Member State can centralise all translation memories. This will allow the involved bodies to access and interface with the centralised data repository. Universities, research centres, and industry will benefit from this.
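As an illustration of what a TM import involves, the following is a minimal sketch of reading source and target segments from a TMX file using only the Python standard library. The sample document, the function name read_translation_units and the language codes are assumptions made for this example; they are not part of the NEC TM software or its API.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny illustrative TMX document (invented sample data).
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The contract was awarded in 2017.</seg></tuv>
      <tuv xml:lang="es"><seg>El contrato se adjudicó en 2017.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Please submit the translation memory.</seg></tuv>
      <tuv xml:lang="es"><seg>Envíe la memoria de traducción.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_translation_units(tmx_text, source_lang, target_lang):
    """Return a list of (source, target) segment pairs from TMX text.

    Hypothetical helper: languages are compared on the primary subtag
    only, so "en-GB" is treated as "en".
    """
    root = ET.fromstring(tmx_text.encode("utf-8"))
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            lang = tuv.get(XML_LANG, "").lower().split("-")[0]
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                segments[lang] = "".join(seg.itertext()).strip()
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


pairs = read_translation_units(SAMPLE_TMX, "en", "es")
```

A real import would also need to handle the header metadata, large files and domain classification, which this sketch leaves out.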

NEC TM Data will contain a number of API connectors to popular translation tools used by national administrations, so that system users can connect directly to the service. This will support their work and boost productivity when working on translation contracts for public administrations.

Furthermore, NEC TM will have a secure (ESens4, Domibus) API connection to eTranslation with a register and access rights so that eTranslation can be used whenever there is no match from the TM. This will ensure that the CEF Automated Translation platform will be integrated into several public services using multilingual best practices and architectures.
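The fallback described above can be sketched as follows. This is a simplified illustration only: the function names are hypothetical, the TM lookup is reduced to an exact match on normalised text, and machine_translate is a stand-in callable, not the eTranslation API. The secure transport, the register and the access rights are omitted.

```python
from typing import Callable, Dict, Tuple


def normalise(segment: str) -> str:
    """Collapse whitespace and lower-case a segment for lookup."""
    return " ".join(segment.split()).lower()


def translate_segment(
    segment: str,
    memory: Dict[str, str],
    machine_translate: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (translation, origin), where origin is "tm" or "mt".

    The translation memory is consulted first; the machine translation
    backend is called only when the memory has no match.
    """
    key = normalise(segment)
    if key in memory:
        return memory[key], "tm"
    return machine_translate(segment), "mt"


def fake_mt(segment: str) -> str:
    """Placeholder backend used here instead of a real MT service."""
    return f"[MT] {segment}"


# Invented sample memory, keyed by normalised source text.
memory = {normalise("Public tender"): "Licitación pública"}

from_tm = translate_segment("  Public   TENDER ", memory, fake_mt)
from_mt = translate_segment("Annual report", memory, fake_mt)
```

Passing the backend in as a callable keeps the lookup logic independent of whichever machine translation service is actually connected.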

Fuzzy Matching

Fuzzy matching finds matches that are less than 100% exact between new text segments and the segments stored in a previously built translation memory database.

This translation memory (TM) tool searches the previously built database and detects matching sentences and phrases. It then suggests these to the translator, giving the translator the option to use the proposed translation. In this way, fuzzy matching can lead to quicker translations and increased productivity.
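The idea can be sketched in a few lines of Python. This is not how NEC TM or ActivaTM computes matches: the similarity score here comes from the standard library's difflib.SequenceMatcher, used as a stand-in for the scoring a production TM system would apply. The function name, the sample memory and the 0.75 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher

# Invented sample memory of (source, target) segment pairs.
MEMORY = [
    ("The contract was awarded in 2017.", "El contrato se adjudicó en 2017."),
    ("Please submit the translation memory.", "Envíe la memoria de traducción."),
]


def fuzzy_lookup(segment, memory, threshold=0.75):
    """Return the best (score, source, target) at or above threshold, or None.

    The score is a similarity ratio between 0.0 and 1.0, where 1.0 means
    the new segment and the stored source are identical (ignoring case).
    """
    best = None
    for source, target in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


exact = fuzzy_lookup("The contract was awarded in 2017.", MEMORY)
near = fuzzy_lookup("The contract was awarded in 2018.", MEMORY)
none = fuzzy_lookup("Good morning.", MEMORY)
```

Here the second query differs from a stored segment only in the year, so the stored translation is suggested with a score below 1.0 for the translator to review and correct, while the unrelated third query produces no suggestion.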

License pending.

Objectives

Enable a Single Digital Market

Promote the flow of translation data (specifically Translation Memories) from translation companies to public administrations

Encourage pan-European Data Sharing

Increase the volumes of parallel data available to the European Commission

Organise Big Data

Organise bilingual Big Data that is currently lost in translation companies’ internal translation memories and processes

Maximise Translation Profits

Enable public administrations to fully leverage TMs

Support Translators

Support the work of translators working on public sector texts

Our partners

Pangeanic specialises in the automation of as many language processes as possible, serving cross-national institutions, multinationals and government agencies all over the world. Notably, Pangeanic developed the DIY self-training platform PangeaMT, which includes data-cleansing and other user-empowerment features with implementations at Sony Europe, Sybase, Veritone, and the US government (IBWC).
Tilde specialises in developing multilingual data technologies, such as custom machine translation and content analytics tools. Tilde boasts expertise in developing language technology services for high-demand applications. Tilde has participated in numerous EU programmes, building language technologies for various scenarios, from government e-services to multilingual financial data analysis and writing for digital media.
The Secretary of State for Digital Progress (SEAD) is the body responsible for promoting the coordination of plans, technological projects and action programmes for the digital transformation of Spain. Among other functions, SEAD is responsible for the implementation of the Plan for the Promotion of Language Technologies for Spanish and the co-official languages of Spain. Moreover, SEAD is also the National Anchor Point of the ELRC initiative.

Ciklopea is a renowned industry-focused provider of linguistic solutions enabling companies to reach, engage and support global clients. Ciklopea’s solutions include translation, localization and consulting in more than 150 language pairs, with a strong emphasis on Croatian resources.
