Microsoft Community Insights Podcast

Episode 47 - Inside the Co‑op Translator Journey with Minseok Song

What if your documentation never drifted out of sync across 54 languages? We sit down with Minseok Song, Microsoft AI MVP and open-source maintainer, to unpack how a hackathon prototype grew into a robust translation automation pipeline now living under the Microsoft Azure GitHub organisation. The story starts with a simple pain: reading English technical docs as a non-native speaker. It evolves into a system that watches your repo, translates Markdown, images and Jupyter notebooks, and keeps everything aligned as source files change.

Enjoy the episode, and if it sparks ideas, share it with your team, subscribe for more community-driven engineering stories, and leave a review with the one translation challenge you want solved next.

SPEAKER_00:

Hello everyone, welcome to the Microsoft Community Insights Podcast, where we share insights from community experts to stay up to date in Microsoft. Today we have a special episode called Inside the Co-op Translator Journey, and a special guest, Minseok. Can you please introduce yourself?

SPEAKER_02:

Hi Nicolas, and hi everyone. Thanks for having me. My name is Minseok, and I'm a Microsoft AI MVP and open-source maintainer. Most of my work focuses on building AI-powered tooling for developer documentation, and one of the main projects I maintain is called Co-op Translator.

SPEAKER_00:

Yeah, so I know it's within the Microsoft GitHub repo. We just want to dive into the story behind it. So how did you create it? What was the idea you thought of?

SPEAKER_02:

Oh yeah, there are so many stories I want to tell you. I started building Co-op Translator directly out of my frustration with reading English documentation because, as you know, I'm not a native English speaker. So for me it is very hard to read English documentation, especially technical documentation. When I was contributing to the Phi Cookbook, which is the official documentation of a small language model called Phi, I suggested Korean translations to the maintainers, and during that conversation one of them introduced me to the Co-op Translator project. At the time it was just a prototype, a POC built by a student hackathon team. I decided to take it over and re-architected it into a fully functional command-line tool, and it eventually moved under the Microsoft Azure organization.

SPEAKER_00:

Was it part of a hackathon where you live in Korea? Or was it one of those online hackathons that someone created? So the team created it, and then you just took it over.

SPEAKER_02:

Yes, absolutely. And it has evolved a lot since I took over.

SPEAKER_00:

Okay, so what hackathon was it part of when it started?

SPEAKER_02:

Oh, I'm sorry.

SPEAKER_00:

What hackathon? You said it was first created at a hackathon. Did you do a hackathon with groups of people?

SPEAKER_02:

I meant it was a kind of hackathon project built by students. Yeah.

SPEAKER_00:

Nice. And you took it over. At the moment I can see it's got lots of likes, and now it's within the Microsoft Azure GitHub repo. So can you tell me the main purpose of it? Is it just translating documents?

SPEAKER_02:

Technically I would like to focus on automation, not translation. But you're right, the main purpose is translating documents. The real reason Co-op Translator exists, though, is keeping multilingual documentation always up to date. There are so many translation tools, you know, but most of them focus on just the translation feature. Co-op Translator focuses on automation, so that the translations stay in sync every time the source changes.
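
[Editor's sketch, not from the episode] One way to picture "translations stay in sync every time the source changes" is to record a fingerprint of the source text alongside each translation and flag any translation whose recorded fingerprint no longer matches. This is a minimal sketch under that assumption; the function names and the in-memory `recorded` mapping are hypothetical and are not taken from Co-op Translator.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a stable fingerprint of a source document."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale(sources, recorded, languages):
    """List (path, language) pairs whose translation is missing or outdated.

    sources:   {path: current source text}
    recorded:  {(path, language): hash of the source when it was translated}
    languages: target language codes
    """
    stale = []
    for path, text in sources.items():
        current = content_hash(text)
        for lang in languages:
            # A missing entry or a different hash both mean "retranslate".
            if recorded.get((path, lang)) != current:
                stale.append((path, lang))
    return stale


# Hypothetical repository state: README.md changed after its Korean translation.
sources = {"README.md": "# Hello\nUpdated intro.", "docs/setup.md": "Install steps."}
recorded = {
    ("README.md", "ko"): content_hash("# Hello\nOld intro."),
    ("docs/setup.md", "ko"): content_hash("Install steps."),
}
print(find_stale(sources, recorded, ["ko", "ja"]))
```

Only the pairs returned by `find_stale` would need to be sent for translation again; unchanged files are skipped.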

SPEAKER_00:

So whenever someone creates a document in English, it would just translate it to different languages using that translator, whether it's a new Markdown file...

SPEAKER_02:

Yes, Markdown files, images and notebook files as well. And it is treated as an automated process.

SPEAKER_00:

Okay. So at the moment, are you the only maintainer, or are there other active maintainers as well?

SPEAKER_02:

Currently there are officially two maintainers, including me. The other maintainer is not currently active, but he sometimes contributes, especially to the image translation feature. I'm focusing on Markdown and notebook translation and the other overall translation features.

SPEAKER_00:

Okay, so you break down the features between different maintainers to look after it. Yeah. Yeah, exactly. I saw that it's got different upgrades now. I also saw it's being used by a lot of repos within Microsoft, so it's been downloaded and used for translation quite a lot. How are you handling the languages? Are you using Azure AI Speech or Language Studio?

SPEAKER_02:

We use Azure OpenAI for translation because of its flexibility. Azure AI Translator supports a lot of languages, but some languages, such as Nigerian Pidgin, which is a variation of English, are not supported by Azure AI Translator. Since we use Azure OpenAI instead of Translator, we can support those languages. The other reason is that it makes it very easy to handle Markdown documentation that combines code and text. Azure OpenAI can handle that very well, so I decided to use Azure OpenAI.
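
[Editor's sketch, not from the episode] The difficulty with Markdown that mixes code and text is that prose should be translated while code must come through unchanged. The sketch below illustrates that constraint with a simple fence-aware splitter. It is not how Co-op Translator works (the episode says the model handles mixed content directly); `translate` is a hypothetical stand-in for whatever translation call is used.

```python
FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this listing clean


def split_fenced(markdown: str):
    """Split Markdown into ("prose", text) and ("code", text) segments."""
    segments = []
    buffer = []
    in_code = False
    for line in markdown.splitlines(keepends=True):
        if line.lstrip().startswith(FENCE):
            if in_code:
                # Closing fence: finish the code segment.
                buffer.append(line)
                segments.append(("code", "".join(buffer)))
                buffer = []
                in_code = False
            else:
                # Opening fence: flush any prose collected so far.
                if buffer:
                    segments.append(("prose", "".join(buffer)))
                buffer = [line]
                in_code = True
        else:
            buffer.append(line)
    if buffer:
        # An unclosed fence is kept as code so it is never translated.
        segments.append(("code" if in_code else "prose", "".join(buffer)))
    return segments


def translate_markdown(markdown: str, translate) -> str:
    """Apply `translate` to prose segments only; code passes through as-is."""
    return "".join(
        translate(text) if kind == "prose" else text
        for kind, text in split_fenced(markdown)
    )


doc = "Intro text.\n" + FENCE + "python\nprint('hi')\n" + FENCE + "\nOutro.\n"
# str.upper stands in for a real translator so the effect is easy to see.
print(translate_markdown(doc, str.upper))
```

In the printed result the prose lines are upper-cased while the fenced `print('hi')` line is untouched.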

SPEAKER_00:

Can anyone contribute? If they want to help within the repo, do they need to contact you before starting?

SPEAKER_02:

Yes, of course. Fortunately, a lot of contributors have been contributing to Co-op Translator. In this podcast I'd especially like to say thank you to the contributors who are native speakers, because reviewing translations is something I can't do myself. I can speak English and Korean, but for Japanese or Chinese or Estonian, there are so many languages that need review. Fortunately, many contributors from all over the world come to my repo and gladly review the translations. I'm really grateful for those contributions.

SPEAKER_00:

Yeah, because you definitely need someone to review the translation. You can't just trust Azure AI language translation to do it; you need to make sure the accuracy of the language is always good.

SPEAKER_02:

Yes, we need a review process to add a new language to Co-op Translator.

SPEAKER_00:

Okay, so are there any new features being added to it recently?

SPEAKER_02:

Yes. Currently we are focusing on stabilization instead of adding new features. That said, right now I'm working on an extension of Co-op Translator called Localized Flow. One of the challenges we found using Co-op Translator is that when you deal with really large documentation, the GitHub Actions six-hour limit can be restrictive. So for those larger workflows, we are moving to Azure container jobs instead of GitHub Actions. That way we can handle longer translation processes without hitting the time limit. So I'm focusing on stabilization by building a new service with a centralized server to handle large documentation. Microsoft repositories especially can be very large, and sometimes that causes a lot of errors while a run is in progress. Can you see my screen well? So that's the reason I started building Localized Flow: to overcome the GitHub Actions six-hour limit. It is a kind of control panel for managing translations across your repos.
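
[Editor's sketch, not from the episode] To see why a per-run time limit becomes restrictive for large repositories, consider how much work fits into one run. The sketch below greedily groups files into batches that each stay under a time budget. It is only an illustration of the constraint: the episode describes moving long runs to Azure container jobs rather than batching, and the file names and per-file durations here are invented.

```python
def plan_batches(durations, budget_seconds):
    """Group files into consecutive batches whose total estimated time fits the budget.

    durations: {file name: estimated translation time in seconds}
    """
    batches = []
    current = []
    used = 0.0
    for name, seconds in durations.items():
        if seconds > budget_seconds:
            raise ValueError(f"{name} alone exceeds the time budget")
        # Start a new batch when adding this file would overflow the current one.
        if current and used + seconds > budget_seconds:
            batches.append(current)
            current = []
            used = 0.0
        current.append(name)
        used += seconds
    if current:
        batches.append(current)
    return batches


SIX_HOURS = 6 * 60 * 60  # the six-hour limit mentioned in the episode, in seconds

# Hypothetical estimates for translating each file into every target language.
estimates = {"a.md": 9000, "b.md": 9000, "c.md": 9000, "d.md": 3000}
print(plan_batches(estimates, SIX_HOURS))
```

With these invented numbers the work no longer fits in a single six-hour run and has to be split in two, which is the kind of pressure a longer-running job host removes.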

SPEAKER_00:

So what do you mean by Localized Flow? Is it something you run locally on your computer?

SPEAKER_02:

Think about it this way: it is a kind of Co-op Translator, but hosted on a centralized server. When you push this Automate button, from then on Localized Flow watches your repo and localizes it whenever the source changes, the same as Co-op Translator does. It uses Co-op Translator and is built on top of it.

SPEAKER_00:

Okay. But how is it connecting to the repo? Is it just using the API?

SPEAKER_02:

Yes, it is a kind of GitHub App, so you're right, it uses the GitHub API to connect. If you click this button, you can connect your repo on the GitHub page, and then you can use it like Co-op Translator, very easily, from the website.

SPEAKER_00:

So if someone wants to connect to the Localized Flow API without a GUI, like calling it from code, can they do that?

SPEAKER_02:

Currently we only support the UI, by connecting through this page. For users who want to use code to connect, I'd recommend just using Co-op Translator instead of Localized Flow.

SPEAKER_01:

Yeah, yeah.

SPEAKER_02:

That's the easier way.

SPEAKER_00:

Okay, so once you connect to the repo, what is it doing?

SPEAKER_02:

Yes, we can configure the translation types and the target languages. We can manage jobs here, and if you just click the automation button, you receive an automated PR whenever the source changes.

SPEAKER_00:

So you can be specific about which files you want to translate, and the languages you've chosen as well. You don't need to translate the whole repo.

SPEAKER_02:

Oh yes, you're right.

SPEAKER_00:

Okay, how many languages does it support at the moment?

SPEAKER_02:

Currently 54 languages are supported. All the languages are contributed by contributors who are native speakers.

SPEAKER_00:

And you're always looking to expand that. I'm not sure how many languages there are around the world, but it's good to expand it.

SPEAKER_02:

Yes, I'm happy to do so. I'm happy to find contributors to contribute new languages to Co-op Translator.

SPEAKER_00:

Okay, which other languages are you looking for translators for?

SPEAKER_02:

While receiving so many contributions, I realized there are regional languages from India. India has a lot of different languages, and I'd like to expand support to those region-specific languages. That's what I'd like to do.

SPEAKER_00:

So for India it would be Hindi or something. So make it specific by country: for example, the UK, US or Ireland would be English, and somewhere in South America would be Spanish. Otherwise, if you do languages specific to each city, it'll be too much.

SPEAKER_02:

Yes, you're right. And technically, I have to add UK English, because currently we only support US-based English. We have to support UK English in the near future.

SPEAKER_00:

Yeah, because remember, the language in the UK and the US is slightly different. It's mostly the same. You could maybe say the same about India as well, so there are quite a few different languages in India.

SPEAKER_02:

Yes, in the first is Vingia, yeah.

SPEAKER_00:

Okay, so aside from India, are there any other languages you want? Then we can try to get native speakers of those languages to help.

SPEAKER_02:

From what I've seen so far, Azure Translator is currently used in many projects, and it supports a lot of languages, but it has some weak points around language variations. For example, English has a lot of variations. Nigerian Pidgin is one kind of variation, and there are so many countries using English that is slightly different from the original English. I'd like to support those with Co-op Translator.

SPEAKER_00:

Okay, so your main goal would be supporting all the languages that Azure Translator supports, but natively integrated with Co-op Translator.

SPEAKER_02:

Yes, I think that is one of the benefits of using Co-op Translator.

SPEAKER_00:

Yeah. I'm not 100% sure how many languages Azure Translator supports, but I think it's quite a lot. So you could do a comparison with Co-op Translator to see which languages you're missing, and then try to find potential translators who could review later on if you need to. Because the aim is to be the main translator of documents for different languages around the world, to make them more accessible to people who, for example, don't know English well enough to understand them.

SPEAKER_02:

Oh, yeah.

SPEAKER_00:

Okay. So if you want to translate a document into, for example, Chinese, you would just use Co-op Translator and point it at your repo. Does it only support MD files? What kinds of files does it support?

SPEAKER_02:

Currently we support Markdown files, images and Jupyter Notebook files. We also specialize in supporting images.

SPEAKER_01:

Yeah.

SPEAKER_02:

We use Azure AI Vision for image recognition and Azure OpenAI for translation.

SPEAKER_00:

Okay, so it supports quite a few file types. Would there be any future support for different files, for example PDF?

SPEAKER_02:

Since we are focusing on automating translation for educational GitHub repositories, which are made up of Markdown files, images and notebooks, we currently don't have any plan to support other file types. But if users need it in the future, I'd gladly add it.

SPEAKER_00:

Yeah. So if there were any issues, can someone raise an issue on the GitHub repo, and it will go to you or to another maintainer? Or if there are any recommendations or feature improvements?

SPEAKER_02:

Oh yes, honestly I've received a lot of issues and recommendations. The one issue I'd like to mention here is that Co-op Translator deals with image translation, and image translation produces a lot of files. Images have a much larger file size compared to Markdown files, and that increases the repo size a lot. When we translate into 54 languages, we have to make 54 language versions of each image, so it kind of booms. So many users who want to clone a Microsoft educational repo translated into 54 languages struggle to download it because of the heavy size of the images. So I have to find a way to resize the images, or a better way to handle it.
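
[Editor's sketch, not from the episode] The growth described here is simple multiplication: every translated image is stored once per language. The numbers below are invented purely to show the scale; only the figure of 54 languages comes from the conversation.

```python
def translated_image_bytes(image_sizes, language_count):
    """Total bytes added when each image is stored once per target language."""
    return sum(image_sizes) * language_count


# Hypothetical repo: 200 images of about 300 KB each, translated into 54 languages.
image_sizes = [300_000] * 200
total = translated_image_bytes(image_sizes, 54)
print(f"{total / 1e9:.2f} GB of translated images")
```

Halving the average image size, or cloning only the languages a reader needs, reduces the total in direct proportion.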

SPEAKER_00:

Yeah, by reducing the resolution and the pixels, you can change the size. Okay, let's dive into some of the community work that you did. I remember you were previously a Student Ambassador. Do you want to explain how you got started and how you progressed to become an MVP?

SPEAKER_02:

The story of going from Student Ambassador to Microsoft MVP, am I right? Let me think for five seconds. I didn't expect that, honestly.

SPEAKER_00:

So how long were you a Student Ambassador for? And how did you become an MVP?

SPEAKER_02:

Almost two years ago I became a Student Ambassador, and I think I suggested some technical documentation to the Microsoft Tech Community.

SPEAKER_00:

Yeah, like Microsoft Learn, where there are Microsoft Learn modules and stuff.

SPEAKER_02:

Yeah, it's something like that. It is a kind of tech blog, and as far as I know, Student Ambassadors can publish posts there once approved. So I made a lot of contributions to the Microsoft Tech Community, and through those contributions I fortunately had a lot of opportunities to talk with other MVPs, in Korea and around the world, and with advocates. There are so many advocates at Microsoft, and they gave me a lot of inspiration about how to use Azure OpenAI and how to learn cutting-edge technology. I learned a lot from them. I also eagerly contributed to open-source repositories, including, as I said before, one whose maintainer is a Microsoft advocate, and he introduced me to Co-op Translator. I started building Co-op Translator, and I think that project is what led me to become a Microsoft AI MVP.

SPEAKER_00:

Yeah, that's part of it, but part of it is also that you're impacting other people and helping them learn. I think that's the main aim. So aside from that, do you have any hobbies? Do you do anything in your spare time other than maintaining Co-op Translator?

SPEAKER_02:

Yes, I have a hobby. Originally my hobby was open-source contributions, but these days I'm really occupied with maintaining Co-op Translator, so I have to find another hobby. My hobby is reading books. You can see behind me I have a lot of them here, though they're in Korean.

SPEAKER_00:

They might all be in Korean, yeah.

SPEAKER_02:

Yeah, they are in Korean, but someday I want to read books in their original English. One of my hobbies is studying English.

SPEAKER_00:

Oh really? You can pick it up if you watch more English movies and TV as well.

SPEAKER_02:

Yes, that's right. I love watching English movies, and I really enjoy animation dubbed in English. I think it really helps me speak English.

SPEAKER_00:

Yeah. And are you going to any events? Are you coming to the MVP Summit in March, by any chance?

SPEAKER_02:

Not yet, because I'm currently occupied with building the Localized Flow project. I don't have any schedule yet, but if I have room on my plate, I'll gladly participate.

SPEAKER_00:

Any last-minute words or advice for people who want to build but don't know how to get started in open source?

SPEAKER_02:

I think that's a really great question because many students, including me, struggle with contributing to open source. The main reason is that they don't know how to start. I'd like to tell my story about starting open-source contributions. My first contribution was just fixing a typo in a project in the Apache Foundation organization. That is a really simple process, and anyone can do it. And during open-source contributions, we get a chance to discuss with maintainers and other committers. Through that process, we can learn how to contribute to open source in the right way. And I think that experience is what led me to become the maintainer of Co-op Translator.

SPEAKER_00:

Also, yeah. I would say the main goal of any open-source contributor is always to see if you can help and improve something. Whether it's a repo or a specific process you want to improve, you can make the suggestion to the maintainers. That can make you an open-source contributor, because you're suggesting improvements.

SPEAKER_02:

Yeah, I agree with you. The simplest way is suggesting a new feature, or just replying on any issue, "I can do it, I can handle it." I think that's a starting point for contributing to open source.

SPEAKER_00:

So hopefully everyone now knows the importance of contributing to open-source projects. And if you want to contribute to Co-op Translator, you go to the repo, the Microsoft repo, and find any issues you would like to work on, whether it's errors, feature issues or grammatical issues, and then you write it up there or just tag Minseok if there are any issues.

SPEAKER_02:

I would do it. Thank you, everyone.
