NEC TM launches second dissemination day on 23rd September in Zagreb

The NEC TM team will welcome the East and Central European National Anchor Points on 23rd September in Zagreb. Talks by members of the NEC TM consortium will focus on disseminating the project’s goals and the benefits of fully exploiting data generated by public administrations. These will be followed by country report presentations and findings, along with talks from other European central translation memory projects.

The European Commission’s aim to promote and foster digital economies has made data generated by the public sector increasingly relevant. The NEC TM consortium’s objective is to organise unexploited national bilingual assets generated by public administrations. These language assets can be put to use as open data and general data for machine learning.

What’s more, the NEC TM project aims to provide a free national and European translation memory server that member states can use to store and exploit translation memories produced as a result of public administration contracts. This will lower translation costs at a national level and across the EU.

The consortium is also working on market research, an EU-wide investigation to ascertain how much public authorities in each member state spend on outsourced translations. This research will help National Anchor Points understand the current size of public expenditure on translation, and how much could be saved by centralising data and adopting the fast-performing NEC TM database.

The current market research will be presented at the one-day event in Zagreb, along with presentations on the technical aspects and functionality of the NEC TM translation memory server. The NEC TM dissemination day is an exciting step towards further centralisation and efficient data sharing across the EU.


Beta version of NEC TM Launched

Alex Helle is Chief of Operations for the open-sourcing of Pangeanic’s ActivaTM into the National and European Central Translation Memory. With the Beta version now available, we share an interview with Alex to find out more about the NEC TM project and how Member States can benefit from it.

What exactly are the aims of NEC TM?

We want to provide the tool with which European Administrations will organise their translation procurement and, in parallel, create national linguistic assets and bilingual data. By having a central repository where Public Administrations can run fuzzy matching and centralise their translation memories, they not only save money but also have a digital infrastructure where all the bilingual text data created through translation procurement contracts is stored. This data can be shared at different levels, or not shared at all, and several Administrations can each have their own deployment. The point is that each Member State can increase this “national language treasure” with every translation contract, and this can be done on-the-fly or at the end of a translation contract.

In short, NEC TM provides a centralised infrastructure for efficient data sharing, TM matching, TM retrieval, and domain categorisation of resources generated in Member States/EEA, with an emphasis on countries with low language resources.

NEC TM will be open-source software developed from Pangeanic’s translation memory database, ActivaTM.

What are the benefits of NEC TM?

The benefits of NEC TM are as follows:

  • Unified TM: NEC TM is CAT agnostic, so it can be used from any CAT tool used by the translation departments of the Member States/EEA or by their external providers.
  • Open Source: Pangeanic will release this commercial software under the GPL (the open-source General Public License) and customise it, free of charge, for use by Public Administrations.
  • Solid framework: NEC TM will also provide a centralised infrastructure for efficient data sharing, TM matching, TM retrieval, and domain categorisation of resources generated in the Member States/EEA.
  • Lower translation costs: The NEC TM Data consortium’s objective is to organise unexploited national bilingual assets that can be used as open data and general data for machine learning, in order to lower translation costs at a national level and across member states. It will gather translation memories from previous national contract awards in Member States and help them to centralise these language assets with the fast-performing NEC TM.
  • Data bridge: NEC TM will allow Public Administrations to share data among themselves and with their translation contractors.

How was this project conceived?

The EC’s Programme was quite clear on the objectives: data gathering and language tools. We believed an initiative like NEC TM could fulfil the language tool option, as it empowers Public Administrations to gather data which is otherwise lost and remains in silos on translation companies’ internal servers or PCs. European Public Administrations are losing valuable assets they pay for with public money because they simply lack the tool to organise the repositories (live or as TMX after the translation contract is over). In reality, most translation companies run translation servers in one way or another. The point here was to come up with a robust solution that could be implemented at a national level.

However, NEC TM will not be implemented if we do not know the size of the expenditure in each country. We can’t provide the cure if we do not know there is a “problem”. I don’t like to call it a problem, because it is just the level of expenditure, but it’s difficult to organise something if we do not know the size of it. So, in parallel to the software development, half of our project is devoted to a country-by-country market study that will help public institutions and the EC itself to understand the size of public expenditure on translation. This report will be the basis for NAPs to speak to relevant authorities and push for national adoption. There are strong dissemination efforts in three European areas: Zagreb in September for Central Europe and the Balkans; Spain, Malta and Poland as national dissemination; and Latvia for the Northern region, co-hosted with ELRC. We will also co-host in France and Luxembourg to maximise influence and awareness about market size and the advantages of a national translation memory.

How has Pangeanic helped reach this milestone for NEC TM?

The NEC TM Data proposal includes the provision of a central TM-sharing repository, called the NEC TM Data platform. The platform will be based on Pangeanic’s commercial tool ActivaTM and works on a similar concept, following industry practices used by other commercial tools and private organizations such as Memsource and TAUS. Pangeanic will release this commercial software under the GPL (the open-source General Public License) and customise it, free of charge, for use by Public Administrations.

How does this software differ from that implemented by other projects?

NEC TM emphasizes fuzzy matching and leveraging for translation departments and their own translators. For the scope of the project, plugins for different CAT tools will be provided so that Translation Project Managers or translators can use NEC TM directly.

Moreover, access to the tool can be live, so translators feed the national repository as they work. We are offering a “live” tool, not a static repository. ELRI, for instance, will be a collection of bilingual assets, from which a TMX is created for translators to work with.
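As a rough illustration of the fuzzy matching discussed above, the sketch below scores a new segment against a tiny in-memory translation memory using Python's standard `difflib` module. It is not the matching algorithm used by NEC TM or ActivaTM: the `fuzzy_lookup` helper, the sample segments and the 75% threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: (source segment, target segment) pairs.
TRANSLATION_MEMORY = [
    ("The contract enters into force on 1 January.",
     "El contrato entra en vigor el 1 de enero."),
    ("The contract ends on 31 December.",
     "El contrato finaliza el 31 de diciembre."),
    ("Payment is due within 30 days.",
     "El pago vence en un plazo de 30 días."),
]


def fuzzy_lookup(segment, memory, threshold=0.75):
    """Return (score, source, target) tuples scoring >= threshold, best first.

    The score is difflib's similarity ratio (0.0 to 1.0), used here as a
    simple stand-in for a real TM fuzzy-match score.
    """
    matches = []
    for source, target in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            matches.append((score, source, target))
    return sorted(matches, key=lambda match: match[0], reverse=True)


if __name__ == "__main__":
    query = "The contract enters into force on 1 March."
    for score, source, target in fuzzy_lookup(query, TRANSLATION_MEMORY):
        print(f"{score:.0%}  {source}  ->  {target}")
```

A translator working on the query segment would be offered the stored translation of the closest source segment and would only need to edit the part that differs, which is where the savings from a shared memory come from.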

What are the future steps?

We are halfway through the project. These are exciting times, with a lot of work ahead. Our next steps are:

  • To identify national administrations’ translation contractors from public sources (Official Gazettes) to create a pan-European report identifying the sector contractors, main contractors and main contracts in Member States.
  • To set a secure legal framework for Public Administrations (PPAA) and vendors to share data (IP clearance).
  • To closely collaborate with ELRC so that Info Days for PPAA become part of the conference agenda, and of translation organizations’ agendas, to disseminate information about the data creation, flow and gathering initiative as well as the legal framework.
  • To create plugins for the different CAT tools used by the PPAA.

What type of license and hardware will be used to implement NEC TM?

NEC TM will be released under the GPL (the open-source General Public License) and will be free for Public Administrations to use.

Here is a brief summary of the hardware and software requirements:

  • Hardware:
    • RAM: 64GB recommended, 16GB minimum
    • CPU: Not important
    • Disk: SSD of 1TB recommended, 256GB minimum
  • Software:
    • OS: Ubuntu 16.04 or later recommended
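For anyone sizing a server against these figures, the comparison can be sketched in a few lines of Python. This is only an illustrative helper, not part of NEC TM: the function names are hypothetical, the RAM and disk values are passed in by hand rather than detected, and 1TB is treated as 1000GB.

```python
# Thresholds taken from the requirements list above, in GB.
# Treating "1TB" as 1000 GB is an assumption made for this sketch.
MIN_RAM_GB, RECOMMENDED_RAM_GB = 16, 64
MIN_DISK_GB, RECOMMENDED_DISK_GB = 256, 1000


def classify(value_gb, minimum_gb, recommended_gb):
    """Classify a resource as 'recommended', 'minimum' or 'insufficient'."""
    if value_gb >= recommended_gb:
        return "recommended"
    if value_gb >= minimum_gb:
        return "minimum"
    return "insufficient"


def check_host(ram_gb, disk_gb):
    """Compare a host's RAM and SSD size (in GB) with the listed figures."""
    return {
        "ram": classify(ram_gb, MIN_RAM_GB, RECOMMENDED_RAM_GB),
        "disk": classify(disk_gb, MIN_DISK_GB, RECOMMENDED_DISK_GB),
    }


if __name__ == "__main__":
    # Example values for a hypothetical server with 32 GB RAM and a 512 GB SSD.
    print(check_host(ram_gb=32, disk_gb=512))
```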

Is there anything else in particular that is needed in order to launch NEC TM?

NEC TM will run on Docker, an operating-system-level virtualization platform that works on Linux, Windows and macOS. Using Docker makes NEC TM easy to install and update.

In which countries will the beta version be launched?

Early adopters are Spain, Malta and Croatia, with Slovenia coming close. It is already in use in Latvia as part of the Hugo.lv project. The dissemination activities will help us introduce NEC TM in more Member States. This is an ongoing discussion with NAPs.

Carmen Herranz-Carr

Carmen is a Data Analyst currently working on the NEC TM project, where she is part of a team collecting and managing translation data across the EU. Her interests lie in AI, social affairs, and language technologies.

NECTM engages National Anchor Points at Language Resource Board in Amsterdam

On 27th and 28th March 2019, members of NECTM joined the 8th LRB Meeting hosted by ELRC’s Language Resource Board in Amsterdam. Talks were given by representatives of the European Commission, such as June Lowery-Kingston, Head of Unit G.3 Accessibility, Multilingualism and Safer Internet, followed by several country report presentations from National Anchor Points. Overall, the meeting yielded valuable insights into the main challenges of data sharing across member states.

The NECTM team worked to engage National Anchor Points at the meeting by discussing and sharing knowledge on the benefits of centralising data to aid the development of sustainable data supply chains.

“We are very happy to have engaged National Anchor Points from all over the EU at this event. It has become clear that there is a huge need for language data centralization at national levels and also the tools to manage it.”

– Amando Estela, CHIEF TECHNOLOGY OFFICER

A crucial aspect of the NECTM project is to promote better data-sharing practices. Currently, public administrations lack awareness of the benefits of open data and of how language data, specifically TMs (translation memories), can be capitalised on as a resource for AI. The 8th LRB Meeting facilitated the creation of synergies between the different projects across Europe and allowed the NECTM team to establish further contact and engagement with the National Anchor Points.

“Our project, the National and European Central Translation Memory (NEC TM), offers knowledge about public spending on translation services (the size of the public translation market in each Member State) and also the software for public administrations to manage a central TM. This benefits the teams of translators at public administrations by centralising all TMs and creating national Big Data. Another advantage is fuzzy matching (an industry standard), which the buyer (the Public Administration) can now manage.”

– Manuel Herranz Perez, CEO


IP Legal Services Contract for EU project (Public Administration TM Collection and Data Sharing)

IP Legal Services for NEC TM EU project

Pangeanic is currently inviting bids to help with the legal framework of its EU NEC TM project. Our firm is well established and has built a solid reputation in the language technology field. It provides a wide variety of translation, data collection and language technology services to governments, institutions and commercial clients.

The legal services contract will become part of NEC TM’s solid framework in its Public Administration data sharing project. It will facilitate the organisation of translation data assets (translation memories), which are currently held by contractors after the service is provided and are not returned to the Public Administration with the translation service.

Bids are open to law firms across EU and EEA countries.

Responsibilities and Scope of the Contract:

  • Establish a legal framework for the national implementation of NEC TM (as based on ActivaTM) in all Member States.
  • Define the scope of IP for data in national translation contracts
  • Establish general guidelines for Member States to incorporate translation memory collection as part of their legislation
  • Define and establish a legal framework for data sharing from Member States’ Public Administrations to the EC’s central translation memory

You will need:

  • A law degree and at least 2 years’ experience in the legal profession
  • Sound knowledge of IP as related to translation memory ownership or the translation sector
  • Ability to use existing Open Data and Data Sharing agreements as a reference
  • Familiarity with EC contracts and policies with regard to data collection initiatives
  • Willingness to engage with existing initiatives that are establishing legal frameworks for translation memory and data collection for the EC
  • Experience in mediation and negotiation with a confident and measured approach to conflict resolution
  • Liaison with Pangeanic’s NEC TM team
  • Excellent written and verbal communication skills, including strong presentation skills
  • Effective use of computer systems for the generation of professional reports, scheduling and client database management
  • Strong relationship-building skills
  • Ability to work autonomously and as part of a team
  • Good organisational skills and capacity to be detail-oriented
  • High level of integrity and professional accountability

This role represents a great opportunity within a dynamic, challenging and professional environment. If you are interested and meet the selection criteria, please send your offer for the legal services contract and a cover letter to Víctor Ignacio, v.ignacio@pangeanic.es

Find more information about Career opportunities at: https://www.pangeanic.com/professional-translation-company/careers/

Consortium publishes first documents on NEC TM

The NEC TM consortium has published two initial documents on the NEC TM platform to meet the first milestone agreed with the European Commission. The Functional Description details the functional specifications of the NEC TM Platform. This document was drafted by Pangeanic and was presented to other consortium members at the kick-off conference in Paris on 20-22 September with the aim of improving the consortium’s understanding of the core functionalities of the NEC TM System. The Technical Description describes the software design, architecture and system design of NEC TM. This document was also drafted by Pangeanic and was published on 28/09/2018.

Info Day in Paris

The NEC TM Data consortium will develop and build on existing experiences with national administrations to run an Info Day, in which decision makers and stakeholders at the ministry level in several countries are made aware of the value of gathering and organising translation resources (the generation of language data, multilingual Big Data, the savings in translation costs, the benefits of using machine translation, etc.). Insofar as possible, the Info Days will be co-located with the ELRC workshops, which will be held throughout 2018-2019.

Where to find us

At the next meeting of the ELRC Language Resource Board (LRB), the NEC TM Data consortium will have an information point where it will present the NEC TM Data project, its objectives and the project methodology, and where participants will be able to obtain information.

Time: 20th September 2018, 08:30-17:15

Place: 7th LRB meeting, Mercure Paris 19 Philharmonie, La Villette, 216 Jean Jaures, 75019 Paris, France.