OpenSpeaks/toolkit/av

From Wikimedia Commons, the free media repository
Audio-visual Toolkit

When languages die, they take with them the knowledge preserved in them. At least one language dies every two weeks. Think about the indigenous cultures, cuisines, weaving techniques, the uniquely soothing music, the dance forms, and much more that have only ever been recorded in one particular language: they are too valuable to lose. We can create a lot with AI, but who would not want to play a digital game, or even a board game, recreated from indigenous gameplay?

It is a great benefit to live in an era with such powerful digital tools to document and grow languages for the many generations yet to come. This is probably the right time to think about how we can take advantage of openness (open source software, open educational resources, open processes and communities, and a diverse range of outcomes in open standards) to transform the state of many endangered languages.

Workflow

OpenSpeaks workflow.svg

1. Mapping the status quo of the endangered languages of India, mostly but not only in the following areas that affect the growth of languages:

  • state language policy
  • native language education and state literacy
  • media, internet and mobile penetration
  • (digital) tools to access and contribute knowledge
  • electronic accessibility, e.g. availability of screen readers, text-to-speech and speech-to-text, and accessibility tools in public services and devices like ATMs, bus stations, and smartphones
  • open-licensed resources like corpus and audio libraries
  • available linguistic tools for machine learning and Natural Language Processing
  • organizations working for the development of the endangered languages

2. Identifying demographic zones in need of immediate intervention based on the mapping research. A great inspiration can be the "Language Hotspot" model created by the Living Tongues Institute for Endangered Languages that considers a) highest level of linguistic diversity, b) highest levels of endangerment, and c) least-studied languages to identify the "Language Hotspots".

3. Toolkit development and pilot

The toolkit consists of a) a collection of FOSS (I will try to leverage all the available software, or create some where nothing is available), b) user documentation that can be translated into other languages and used across the world, c) sample datasets from the test runs to help with using the toolkit, and d) other Open Educational Resources.

4. Train citizen archivists in select zones

5. Pilot toolkit + Document + Localize toolkit

Some bilingual native speakers who are conversant in either English or an official language of their region will be trained. These "Citizen Archivists" will use the toolkit to create documentation in their languages, and will help annotate it.

The documentation can include either journalistic reports or different linguistic aspects (folklore, folk songs, narration of traditional games/festivals).

6. Building communities of citizen archivists by providing them with constant training. Their inputs will constantly improve the toolkit and help grow the reporting described in the next step.

7. Audio-visual reporting by the citizen archivists

8. Building a repository of stories that matter to the many native-language speaker communities and to language research. The annotated audio-visual documentation will not just help grow a historical record of many people in their own language, but will also create resources for linguistic research to revive the language. For instance, a recorded audio library is essential for building text-to-speech and speech-to-text engines. Such tools help not just people with visual disabilities or limited literacy, but everyone.

There are hundreds of reasons why many languages are dying. This toolkit aims at solving one problem at a time.

Check out some of the frequently asked questions.

Getting started with this toolkit

Audio recording:

Home-studio recording setup for project Kathabhidhana that consists of a computer, a USB microphone, and monitor headphones.
1. Home studio: You need a microphone to record audio. If you can, record in a small home studio setup like the one pictured above (a USB microphone, a computer, and monitor headphones).
A digital audio recorder is used to record audio during field recording.
2. Field recording with a recorder or phone: The recording setup will vary a lot if you are meeting someone outside your home for a field recording. In that case you will need to carry an audio recorder, or a smartphone with a recording app installed, along with earphones. If you are using a portable recorder, cover the top of the mic with a soft cotton cloth or fake fur to a) keep dust out, and b) reduce the sound of the wind during outdoor recording. Use a rubber band to tighten the base, and never touch the cloth/fur while recording. Mics pick up even very small movements, which can completely distort the audio.
3. Recording from phone: Earphones that come with phones generally work better, on both phones and computers, than the default microphone of the device itself. However, avoid sitting in an open space, as a lot of noise is likely to be captured unless you are using a shotgun microphone (as shown in the picture on your right).
Tutorial on making your vocals clear and loud in Audacity (audio editing software).
4. Audio editing software: If you are editing on a computer, Audacity, a free and open source audio editor, is the first choice for many seasoned recording artists. It is robust, easy to use, and runs on multiple platforms. If you are using your phone or tablet to record and edit the audio, use your native recording app or look for a good free alternative in your app store. Ideally the recording/editing app should let you record in decent lossless quality (minimum requirement: 44100 Hz; above 16-bit PCM, i.e. 24 or 32 bit; above 220 kbps; check your settings to find these). Save the audio as .WAV or .FLAC (Audacity supports both). If your recorder/phone does not support these formats, use an app or online converter like this (MP3→FLAC or M4A→FLAC) to convert the audio into .FLAC.
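If you would like to double-check a saved .WAV file against the minimum recording quality mentioned above, the following is a minimal sketch in Python using only the standard `wave` module. The function names `check_wav` and `pcm_bitrate_kbps` are made up for this illustration, and the default of 24 bits reflects one reading of "above 16 bit"; lower `min_bits` to 16 if your project accepts 16-bit recordings. It only handles plain PCM .WAV files, not .FLAC.

```python
import wave


def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bitrate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def check_wav(path, min_rate_hz=44100, min_bits=24, min_kbps=220):
    """Return a list of problems found; an empty list means the file passes.

    The thresholds are assumptions taken from the guidance in this toolkit
    and can be changed through the keyword arguments.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()          # samples per second
        bits = wav.getsampwidth() * 8      # bytes per sample -> bits
        channels = wav.getnchannels()

    problems = []
    if rate < min_rate_hz:
        problems.append(f"sample rate {rate} Hz is below {min_rate_hz} Hz")
    if bits < min_bits:
        problems.append(f"bit depth {bits} is below {min_bits}")
    kbps = pcm_bitrate_kbps(rate, bits, channels)
    if kbps <= min_kbps:
        problems.append(f"bitrate {kbps:.0f} kbps is not above {min_kbps} kbps")
    return problems
```

For example, calling `check_wav("interview.wav")` (a hypothetical file name) returns an empty list when the recording meets all three thresholds, and otherwise a list of short messages telling you which setting to change in your recorder or app.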

Video recording:

A shotgun microphone is mounted on a video camera or a boom pole and is pointed towards the subject.

1. Which camera to use: Frankly speaking, the video matters less here than the audio. Viewers can still manage with low-quality video if the audio is loud and clear. So if you are keen on investing, invest in a good-quality microphone that can either be connected to the camera or used as a secondary recorder. Do not trust your camera's default microphone; it can jeopardize your hard work. As for the camera, you can use any camera that records in decent quality, i.e. above 720p (1280×720 px), from your phone to a point-and-shoot camera to a DSLR.

a) Using a camera: Use a shotgun microphone that can be connected directly to your camera so that you don't need to spend much effort on audio syncing during post-production.
b) Using a phone for recording video: These days most phones come with high-quality hardware capable of recording good video. But the real key to recording quality video on a phone lies in stabilizing the shot while recording. The simplest way to do that is to invest in a small tripod that can hold your phone (they are generally cheap and do the job). For this particular project, a tripod is the best option.

2. How to edit the videos: You do not need to edit the videos, as we will do that for you. You only need to compress the video using free software like HandBrake and upload it to YouTube or a similar service without making it public. We will download it and then ask you to delete it, so that you don't have to worry about the space it takes up on your hard drive.
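If you are comfortable with the command line, the compression step can also be scripted. The sketch below is only one possible approach and assumes you have HandBrake's command-line tool, HandBrakeCLI, installed and on your PATH; the toolkit itself only asks you to use HandBrake, so the graphical app works just as well. The helper names, the "-compressed" file suffix, the x264 encoder, and the quality value of 22 are illustrative assumptions, not project requirements.

```python
import subprocess
from pathlib import Path


def handbrake_command(source, quality=22, encoder="x264"):
    """Build a HandBrakeCLI argument list for compressing one video.

    The output is written next to the source as '<name>-compressed.mp4'.
    The encoder and quality defaults are illustrative assumptions.
    """
    src = Path(source)
    dst = src.with_name(src.stem + "-compressed.mp4")
    return [
        "HandBrakeCLI",
        "-i", str(src),        # input file
        "-o", str(dst),        # output file
        "-e", encoder,         # video encoder
        "-q", str(quality),    # constant quality (lower means larger files)
    ]


def compress(source):
    """Run HandBrakeCLI; raises CalledProcessError if the tool reports failure."""
    subprocess.run(handbrake_command(source), check=True)
```

Calling `compress("interview.mov")` (a hypothetical file name) would leave the original untouched and create `interview-compressed.mp4` beside it, which is the file you would then upload.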

Interview process:

It takes years to learn to capture the best emotion in an interview, but you can master some basic gestures that will let you document really valuable information in your audio/video. Before you start the interview, spend some time engaging the interviewee in a friendly manner. Ask about them, what they ate today, and so on, just like a friend would. Explain why you are interviewing them and how it will help preserve their language. Some people are intimidated by knowing that their voice/video is going to be public. Explain to them how they are contributing to preserving their language in a form that future generations can also access. Languages are changing rapidly because of many external influences. The best way to preserve them is to record them and make the recordings available to others. Share the fact that at least one language dies every two weeks.

Below is a set of things you need to ask in whichever language you are interviewing. Questions #1 to 10 are mandatory, and the remaining are optional. Read the following to the interviewee right before the interview (you can modify it as appropriate and even translate it into whichever language you speak with them):

Hi, my name is XYZ. I'm calling from THE PLACE YOU'RE CALLING FROM to document a few details about your language "LANGUAGE NAME" (optional: for our project PROJECT NAME) so that the valuable knowledge of your language gets recorded in an accessible form. Based on the form that you filled out, I am recording this call.

This interview will take about 30 minutes. I will upload the recorded interview publicly under a Creative Commons Attribution-ShareAlike license called CC BY-SA 4.0. This license allows anyone to use, share, and modify the content, even for commercial purposes. Can I have your permission to proceed?

Comic - an interviewer interviewing a lady about her native language

Ask the following questions if they allow you to proceed:

  1. Can you pronounce your name the way you'd do in your native language/mother tongue?
  2. Where were you born? (skip if they're not comfortable)
  3. What games did you play as a child? Can you share a little about those games?
  4. We all have our grandma stories. (with some curiosity on your face) Can you tell me one that you listened to as a kid from your grandparents or someone elderly? (nod appropriately and show your emotions while listening to their stories; smile or frown, but do not make any noise, as we want only their voice to be recorded.)
  5. (with a smile on your face) Who doesn't like songs, even though not everyone is a great singer? Your language must have many songs. Would you mind singing one for me? (same gestures as above)
  6. Did you visit a local fair with friends and family as a kid? Can you share your experience?
  7. Imagine I cannot see anything. Can you explain to me in words all the activities that you do from daybreak until you go to bed?
  8. What's your favorite traditional food? How is it prepared? (again, react to them while listening, with curiosity on your face; nod appropriately)
  9. Can you tell me some words in your language? Maybe things that you use every day?
  10. If I learn your language (if you yourself are not a native speaker), how do I greet a guest in the house, talk with them or offer them some food?
    (if you speak the same language as they do) Can you act out how you'd welcome a guest to your home and explain to me the meaning of each of the greetings? (act as the guest and ask the meaning of all the greeting/conversation phrases they say)

Have some questions?

You might find answers to some of them here. If you don't, ask us here. If you have any ideas to improve this toolkit, please feel free to add them in the same place.

What's next

  • You can of course use this toolkit for free to document your language. You can also translate this page into your language so that other people who speak it can use these resources.

Language list