The NEC TM Data consortium’s objective is to organise unexploited national bilingual assets so that they can be used as open data and as general data for machine learning, lowering translation costs at a national level and across Member States. It will gather translation memories from previous national contract awards in Member States and help those states centralise these language assets in the fast-performing NEC TM database, following industry best practices.
New translation contracts will benefit from fuzzy matching analysis and translation companies will be able to work online and connect to each national NEC TM version. Translation data will be categorised and classified per domain in NEC TM, and a connection provided to eTranslation and ELRC.
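For readers unfamiliar with the term, fuzzy matching compares a new source segment against the segments already stored in a translation memory and returns the closest one with a similarity score, so that a translator can reuse or adapt an existing translation. The following is only a minimal sketch in Python: the in-memory `TM` dictionary, the `best_fuzzy_match` function and the 0.75 threshold are hypothetical, and the similarity measure is the standard library's `difflib` ratio, not the scoring that NEC TM itself uses, which is not described here.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: source segment -> target segment.
TM = {
    "The contract must be signed by both parties.":
        "El contrato debe ser firmado por ambas partes.",
    "Payment is due within 30 days.":
        "El pago vence en un plazo de 30 días.",
}


def best_fuzzy_match(segment, memory, threshold=0.75):
    """Return (score, source, target) for the closest TM entry, or None.

    The score is difflib's similarity ratio between 0.0 and 1.0;
    entries scoring below `threshold` are not reported as matches.
    """
    best = None
    for source, target in memory.items():
        # Compare case-insensitively so capitalisation does not lower the score.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is not None and best[0] >= threshold:
        return best
    return None


if __name__ == "__main__":
    result = best_fuzzy_match("The contract must be signed by all parties.", TM)
    if result is not None:
        score, source, target = result
        print(f"{score:.0%} match: {source} -> {target}")
```

In a fuzzy matching analysis of this kind, a new sentence that differs from a stored one by only a word or two is reported as a high-percentage match, while unrelated sentences fall below the threshold and are treated as new text.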
The consortium will deliver a pan-European data-sharing awareness programme, engaging national public administrations by providing a solid legal framework for data sharing.
It will also give public administrations a framework that they can adopt as policy for sharing data from public translation contracts, together with real adoption scenarios at a national level. NEC TM Data promotes better data-sharing practices, based on open standards, in translation contracts between public administrations and translation service providers.
In sum, the NEC TM Data consortium works to facilitate a single digital market. It will act as a meeting point for European data-gathering efforts and the collection of national digital big data. By building a data bridge between public administrations and translation vendors, NEC TM Data will promote the free flow of data between public administrations and translation professionals.
EU national public administrations are huge buyers of translation services, purchasing many millions of euros of translations annually. However, translation contracts often do not require translation service providers to return Translation Memories to the contracting body, so public administrations do not receive bilingual assets along with their completed translations. Consequently, public administrations in Europe are not leveraging valuable bilingual assets.
The NEC TM Data consortium aims to inform public administrations at the national level about the translation technologies available for language resources, and to lobby for TM gathering to be enshrined in national translation contracts. Most of the translated data generated in public contracts is currently not returned to public administrations, so its potential is not being maximised. NEC TM Data will ensure that public administrations make better use of this data.
To support these central aims, NEC TM Data will also provide the centralised infrastructure for efficient data sharing, TM matching, TM retrieval, and domain categorisation of resources generated in 10 Member States/EEA countries (including the 3 participating countries), with an emphasis on countries with low language resources. This will enable the development of NEC TM, which will be open-source software developed from Pangeanic’s translation memory database ActivaTM.
To maximise awareness, NEC TM Data will engage national language programmes and data collection efforts in Spain and Latvia, and national authorities in Croatia.
Enable a Single Digital Market
Promote the flow of translation data (specifically Translation Memories) from translation companies to public administrations
Encourage pan-European Data Sharing
Increase the volumes of parallel data available to the European Commission
Organise Big Data
Organise bilingual Big Data that is currently lost in translation companies’ internal translation memories and processes
Maximise Translation Profits
Enable public administrations to fully leverage TMs
Support the work of translators working on public sector texts