A language assistance platform
»Made in Germany«
Become an associated partner of this project
A partnership gives you the opportunity to meet your market-specific requirements for a natural language assistance system.
Voice assistants are a core technology for human-machine interaction and provide access to product offerings and services via natural language. So far, companies in the US and Asia have dominated the market for voice assistance technology. However, the demand for voice assistant solutions in the German production and retail industries is enormous, especially with regard to data sovereignty, as there is a need for better protection and secure exchange of personal data. A German-made voice assistant solution would make this possible by implementing European standards of data security. At the same time, a new level of quality in human-machine communication, going far beyond the semantic capabilities of current systems, will enable much more user-friendly solutions.
To this end, experts from the fields of speech signal processing, natural language understanding, artificial intelligence and software engineering have joined forces at Fraunhofer IIS and Fraunhofer IAIS. Fraunhofer IIS already holds a world-leading position in the field of acoustic signal processing technology, which forms the basis for the high reliability and robustness of speech processing. Fraunhofer IAIS has developed leading algorithms in the field of automatic speech recognition and question answering. The goal is to further expand this technological leadership and integrate it into a scalable, multilingual and open voice assistant platform. Fraunhofer technology can then be adapted to specific company requirements and support the data sovereignty required in the production and retail industry.
As part of the »Artificial intelligence as a driver for economically relevant ecosystems« innovation competition, Fraunhofer is working on a concept for SPEAKER, a large-scale research and development project supported by funding from the German Federal Ministry for Economic Affairs and Climate Action.
Summary
The SPEAKER project seeks to develop a leading German-made voice assistant platform for business-to-business (B2B) applications. This platform is to be open, modular and scalable and to provide technologies, services and data via service interfaces. The SPEAKER platform will be embedded in a comprehensive ecosystem of large industrial companies, SMEs, start-ups and research partners that ensures a high capacity for innovation. The Fraunhofer Institutes for Intelligent Analysis and Information Systems IAIS and for Integrated Circuits IIS, which already possess the relevant technologies and experience in the field of voice assistant technologies, platforms (e.g. AI4EU – European AI on-demand platform) and global marketing strategies for voice and audio technologies (e.g. MP3), will ensure the development of the platform and the ecosystem.
The two Fraunhofer Institutes IIS and IAIS have conducted workshops with numerous companies to establish requirements, identify obstacles and recommend actions that will serve as a basis for platform design and development. Key arguments for a German-made voice assistant platform include data protection, security, privacy and trust. The lack of these has become particularly evident from recently reported incidents of non-GDPR-compliant speech analysis involving Google Assistant, Alexa and Siri. This applies all the more in the B2B environment, where internal company data needs to be protected. The SPEAKER platform therefore addresses the issues of data and technology sovereignty in this important emerging field of human-machine communication. Requirements were also identified with respect to domain-specific customizability, flexibility in the choice and use of modules, open interfaces to databases and applications, multilinguality, paralinguistics (e.g. recognizing emotions in voices), and the participation in and development of a user community. In parallel with the survey of requirements, current market research predicting strong growth in the voice assistant market was evaluated. On average, a 25 percent annual increase in devices with voice assistant functions is expected over the next four years.
The SPEAKER platform’s aim is to provide open, transparent and secure voice assistant applications. To achieve this, leading technologies for audio preprocessing, speech recognition, natural-language understanding (NLU), question answering (QA), dialogue management and speech synthesis based on artificial intelligence (AI) and machine learning must be made available for simple, straightforward use. These key modules will be used to develop industrial voice assistant applications that, in turn, can be made available to other market players via the platform in the form of ready-made skills.
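As an illustration only, the sketch below shows how such modular services might be chained from a developer's perspective via REST-style service interfaces. The host, endpoint paths, payload fields and voice names are assumptions made for this example and do not describe the actual SPEAKER API.

```python
# Minimal sketch (hypothetical): chaining modular voice assistant services
# exposed via service interfaces. All endpoint paths and payload fields
# below are illustrative assumptions, not the actual SPEAKER API.
import requests

BASE_URL = "https://platform.example.com/api"  # placeholder host

def transcribe(audio_path: str) -> str:
    """Send audio to a speech recognition module and return the transcript."""
    with open(audio_path, "rb") as f:
        resp = requests.post(f"{BASE_URL}/asr", files={"audio": f}, timeout=30)
    resp.raise_for_status()
    return resp.json()["transcript"]

def understand(text: str) -> dict:
    """Ask an NLU module for the intent and entities of the transcript."""
    resp = requests.post(f"{BASE_URL}/nlu", json={"text": text, "lang": "de"}, timeout=30)
    resp.raise_for_status()
    return resp.json()  # e.g. {"intent": ..., "entities": [...]}

def answer(nlu_result: dict) -> str:
    """Let a dialogue management / question answering module produce a reply."""
    resp = requests.post(f"{BASE_URL}/dialogue", json=nlu_result, timeout=30)
    resp.raise_for_status()
    return resp.json()["reply"]

def synthesize(text: str, out_path: str) -> None:
    """Convert the textual reply back to speech via a TTS module."""
    resp = requests.post(f"{BASE_URL}/tts", json={"text": text, "voice": "de-female"}, timeout=30)
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)

if __name__ == "__main__":
    transcript = transcribe("request.wav")
    nlu_result = understand(transcript)
    reply = answer(nlu_result)
    synthesize(reply, "reply.wav")
```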
Compared with existing voice assistant environments (Alexa, Google Assistant), the following key characteristics are guaranteed and highlighted: modularity, data protection and privacy, openness with respect to technologies, connectivity and dissemination through an open ecosystem, and innovation capability. In addition, data diversity for B2B applications will be made possible by providing a data platform and integrating data and application partners. The infrastructure of the SPEAKER platform will enable data exchange (community approach), with international networks (MetaNet, European Language Grid) providing access to numerous language corpora. The SPEAKER platform will use industrial scaling mechanisms (e.g. Docker, Kubernetes, Redis). To this end, SPEAKER is working with the German company iNNOVO Cloud. This cooperation enables us to guarantee not only scalability, but also data protection based on GDPR principles. After the platform is transferred to the operating company, the public launch of the platform will help it become established quickly, setting up SPEAKER for a sustainable future. SPEAKER will be offered at a similar cost to established platforms and will focus primarily on B2B applications.
Consortium managers
Collaborative partners
Collaborative partners, also called consortium partners, have agreed to define a use case and implement it together with the SPEAKER consortium.
Associated partners
Companies, associations, municipalities or other organizations that do not apply for funding can be included as associated partners in the project network and thus benefit from free access to the SPEAKER platform during the implementation phase.
Events
Current events
SPEAKER project final event
on September 28th, 2023 in Berlin
Past events
Hub.Berlin on 18. and 19.04.2021 in Berlin
Specialist seminar „Smart Living – intelligent, vernetzt, energieeffizient“ (Smart Living – intelligent, networked, energy-efficient)
on 16. and 17.09.2020 in Nürnberg
Hannover Messe 2020 from 13.07. to 27.07.2020 in Hannover
1st International Workshop on Language Technology Platforms (IWLTP 2020)
on 16.05.2020 in Marseille
Voice Connected Business on 14. and 15.05.2020 in Frankfurt
Start of the implementation phase of the SPEAKER Project on 01.04.2020
ITG expert group meeting „Signalverarbeitung und maschinelles Lernen“ (Signal Processing and Machine Learning)
on 06.03.2020 in Sankt Augustin
ITG workshop on voice assistants (Sprachassistenten) on 03.03.2020 in Magdeburg
Submission of the overall project description on 15.10.2019
Opening ceremony of the Forum Digitale Technologien & announcement of the winners of the
AI innovation competition (KI-Innovationswettbewerb) on 19.09.2019 in Berlin
Lecture series on natural language processing with Dr. Xin Wang at Fraunhofer IIS
on 13.09.2019 in Erlangen
Submission of the implementation concept for the implementation phase on 16.08.2019
Project-internal workshops
07.04.2020 Project kick-off
30.07.2020 Voice UX Workshop
08.10.2020 1st Milestone Meeting
13.11.2020 Data Annotation Workshop
26.11.2020 Platform workshop
09.12.2020 Model workshop for speech recognition
23.02.2021 Wikispeech workshop
04.03.2021 Workshop on dialogue manager, dialogue editor and NLU
16.03.2021 Multimodality Workshop
18.03.2021 Text-to-Speech workshop
15.04.2021 2nd Milestone Meeting
24.06.2021 Data Annotation Workshop
20.12.2021 3rd Milestone Meeting
24.02.2022 Question Answering over Knowledge Graphs
Key:
sponsors, promoters, collaborative partners | collaborative and associated partners | collaborative partners
Papers
The AudioLabs System for the Blizzard Challenge 2023
F. Zalkow et al.: ISCA Proceedings, 2023
Subjective Evaluation of Text-to-Speech Models: Comparing Absolute Category Rating and Ranking by Elimination Tests
K. Kayyar, C. Dittmar, N. Pia, and E. A. P. Habets: ISCA Proceedings, 2023
Multi-Speaker Text-to-Speech Using ForwardTacotron with Improved Duration Prediction
K. Kayyar, C. Dittmar, N. Pia, and E. A. P. Habets: ITG Conference on Speech Communication Proceedings, 2023
Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation
K. Kayyar, C. Dittmar, N. Pia, and E. A. P. Habets: EUSIPCO Proceedings, 2023
Improving the Naturalness of Synthesized Spectrograms for TTS Using GAN-Based Post-Processing
P. Sani, J. Bauer, F. Zalkow, E. A. P. Habets, and C. Dittmar: ITG Conference on Speech Communication Proceedings, 2023
Evaluating Speech–Phoneme Alignment and its Impact on Neural Text-To-Speech Synthesis
F. Zalkow, P. Govalkar, M. Müller, E. A. P. Habets, and C. Dittmar: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
Uncertain yet rational - Uncertainty as an Evaluation Measure of Rational Privacy Decision-Making in Conversational AI
A. Leschanowsky, B. Popp, and N. Peters: 25th International Conference On Human-Computer Interaction, 2023
Knowledge Distillation Meets Few-Shot Learning: An Approach for Few-Shot Intent Classification Within and Across Domains
A. Sauer, S. Asaadi, and F. Küch: NLP Proceedings, 2022
WoS - Open Source Wizard of Oz for Speech Systems
B. Brüggemeier & P. Lalone: IUI Proceedings, 2019
A Comparison of Recent Neural Vocoders for Speech Signal Reconstruction
P. Govalkar, J. Fischer, F. Zalkow & C. Dittmar: ISCA SSW Proceedings, 2019
Segmenting multi-intent queries for spoken language understanding
R. Shet, E. Davcheva & C. Uhle: ESSV Proceedings, 2019
Privacy in Speech Interfaces
T. Bäckström, B. Brüggemeier & J. Fischer: ITG News, 2020 (not available online)
User Experience of Alexa, Siri and Google Assistant when controlling music – comparison of four questionnaires
B. Brüggemeier, M. Breiter, M. Kurz & J. Schiwy: HCII 2020 – Late Breaking Papers, Springer LNCS Proceedings, Copenhagen, Denmark, 2020 (not freely available)
User Experience of Alexa when controlling music – comparison of face and construct validity of four questionnaires
B. Brüggemeier, M. Breiter, M. Kurz & J. Schiwy: 2nd Conference on Conversational User Interfaces (CUI 2020), Bilbao, Spain, 2020
Development of a leading language assistance platform
B. Brüggemeier, J. Fischer, D. Laqua, C. Möller, R. Usbeck, K. Wagener, H. Wedig, P. Theile, D. Steinigen & C. Dittmar: Final report on the SPEAKER project (Schlussbericht zum Vorhaben SPEAKER), 2020 (not available online)
Message Passing for Hyper-Relational Knowledge Graphs
M. Galkin, P. Trivedi, G. Maheshwari, R. Usbeck & J. Lehmann: 2020
Language Model Transformers as Evaluators for Open-domain Dialogues
R. Nedelchev, J. Lehmann & R. Usbeck: Proceedings of the 28th International Conference on Computational Linguistics, pages 6797–6808, Barcelona, Spain (Online), 2020. International Committee on Computational Linguistics
Towards an interoperable ecosystem of AI and LT platforms: A roadmap for the implementation of different levels of interoperability
G. Rehm, D. Galanis, P. Labropoulou, S. Piperidis, M. Welß, R. Usbeck, J. Köhler, M. Deligiannis, K. Gkirtzou, J. Fischer, C. Chiarcos, N. Feldhus, J. Moreno Schneider, F. Kintzel, E. Montiel-Ponsoda, V. Rodríguez-Doncel, J. Philip McCrae, D. Laqua, I. P. Theile, C. Dittmar, K. Bontcheva, I. Roberts, A. Vasiljevs & A. Lagzdins: In G. Rehm, K. Bontcheva, K. Choukri, J. Hajic, S. Piperidis & A. Vasiljevs (editors): Proceedings of the 1st International Workshop on Language Technology Platforms (IWLTP@LREC 2020), Marseille, France, pages 96–107. European Language Resources Association, 2020
User Preference and Categories for Error Responses in Conversational User Interfaces
S. Yuan, B. Brüggemeier, S. Hillmann & T. Michael: 2nd Conference on Conversational User Interfaces (CUI 2020), Bilbao, Spain, 2020 (registration required)
Crowdsourcing Ecologically-Valid Dialogue Data for German
Y. Frommherz and A. Zarcone: Frontiers in Computer Science 2021
New Domain, Major Effort? How Much Data is Necessary to Adapt a Temporal Tagger to the Voice Assistant Domain
T. Alam, A. Zarcone and S. Padó: Proceedings of the 14th International Conference on Computational Semantics (IWCS), June 2021, Groningen, The Netherlands (online), Association for Computational Linguistics
Design Implications for Human-Machine Interactions from a Qualitative Pilot Study on Privacy
A. Leschanowsky, B. Brüggemeier & N. Peters: Proc. 2021 ISCA Symposium on Security and Privacy in Speech Communication, pp. 76–79, 2021, doi: 10.21437/SPSC.2021-16
A Lightweight Neural TTS System for High-quality German Speech Synthesis
P. Govalkar, A. Mustafa, N. Pia, J. Bauer, M. Yurt, Y. Özer & C. Dittmar: ITG Conference on Speech Communication, Kiel, 2021
Not So Fast, Classifier – Accuracy and Entropy Reduction in Incremental Intent Classification
Hrycyk, A. Zarcone & L. Hahn: Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI, Punta Cana, 2021
Small Data in NLU: Proposals towards a Data-Centric Approach
A. Zarcone, J. Lehmann & E. A. P. Habets: Proceedings of the NeurIPS Data-centric AI Workshop, pp. 52–67, 2021
Success is not Final; Failure is not Fatal – Task Success and User Experience in Interactions with Alexa, Google Assistant and Siri
M. Kurz, B. Brüggemeier & M. Breiter: HCI, 2021
Perceptions and reactions to conversational privacy initiated by a conversational user interface
B. Brüggemeier & P. Lalone: Computer Speech & Language, vol. 71, 101269, ISSN 0885-2308, 2022, https://doi.org/10.1016/j.csl.2021.101269
Adapting Debiasing Strategies for Conversational AI
A. Leschanowsky, B. Popp & N. Peters: Proceedings of the International Conference on Privacy-friendly and Trustworthy Technology for Society – COST Action CA19121 – Network on Privacy-Aware Audio- and Video-Based Applications for Active and Assisted Living, 2022 (not available online)
Predicting Request Success with Objective Features in German Multimodal Speech Assistants
M. Weber, M. M. Halimeh, W. Kellermann & B. Popp: Proceedings of Human Computer Interaction International (HCII) 2022, Artificial Intelligence in HCI, LNAI 13336, volume 35
Chatbot Language – crowdsource perceptions and reactions to dialogue systems to inform dialogue design decisions
B. Popp, P. Lalone & A. Leschanowsky: Behavior Research Methods, 2022
Would you like to become an associated partner
or do you have questions regarding this project?
Feel free to contact me directly by phone
or use one of the other contact options:
E-Mail: johannes.fischer@iis.fraunhofer.de or speaker@iais.fraunhofer.de
Johannes Fischer
Fraunhofer IIS
+49 (0) 9131 / 776 – 6297