
GitHub

Accelerating researchers and developers building multilingual AI with a new open dataset


Compiled by KHAO Editorial — aggregated from 1 source. See llms.txt for citation guidance.



Software may be written in programming languages, but human language is at the heart of developer collaboration.


Summary

As AI becomes a bigger part of how developers build software, multilingual developer content matters more than ever. Today, GitHub is publishing the GitHub Multilingual Repositories Dataset, a repository-level metadata dataset designed to help researchers and developers discover public GitHub repositories with evidence of non-English natural-language content. The dataset is available now.

The GitHub Multilingual Repositories Dataset is intentionally not a dump of repository content. Instead, it records language classifications of each repository's README, most-commented issue, and most-commented pull request, using the first 150 characters of each as the input sample.
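The summary describes a per-repository sampling step: pick three artifacts, keep the first 150 characters of each, and classify their language. The sketch below illustrates that step under stated assumptions. The `Thread` record, the output keys (`readme`, `issue`, `pull_request`), and the classifier are all hypothetical. The source does not say which language classifier GitHub used, so a crude Unicode-script heuristic stands in for it here. That heuristic is much weaker than real language identification: it cannot tell English from other Latin-script languages such as Spanish.

```python
"""Hypothetical sketch of per-repository language sampling.

Assumptions (not from the source): the Thread shape, the output keys,
and the script-based stand-in classifier are illustrative only.
"""
from __future__ import annotations

import unicodedata
from dataclasses import dataclass

SAMPLE_CHARS = 150  # the summary says the first 150 characters are sampled


@dataclass
class Thread:
    """Hypothetical issue or pull request: its body text and comment count."""
    body: str
    comment_count: int


def most_commented(threads: list[Thread]) -> Thread | None:
    # Pick the thread with the most comments; None when the list is empty.
    return max(threads, key=lambda t: t.comment_count, default=None)


def sample(text: str) -> str:
    # Keep the first SAMPLE_CHARS characters (Python slices by code point).
    return text[:SAMPLE_CHARS]


def guess_script(text: str) -> str:
    # Crude stand-in for a language classifier: return the most frequent
    # leading word of the Unicode names of alphabetic characters,
    # e.g. "LATIN", "CYRILLIC", "CJK". This detects scripts, not languages.
    counts: dict[str, int] = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else "UNKNOWN"


def classify_repository(
    readme: str, issues: list[Thread], pulls: list[Thread]
) -> dict[str, str]:
    # Sample the README plus the most-commented issue and pull request,
    # skipping artifacts the repository does not have.
    samples = {"readme": sample(readme)}
    issue = most_commented(issues)
    pull = most_commented(pulls)
    if issue is not None:
        samples["issue"] = sample(issue.body)
    if pull is not None:
        samples["pull_request"] = sample(pull.body)
    return {name: guess_script(text) for name, text in samples.items()}


def has_non_latin_evidence(labels: dict[str, str]) -> bool:
    # Flag a repository if any sampled artifact is in a non-Latin script.
    return any(label not in ("LATIN", "UNKNOWN") for label in labels.values())


labels = classify_repository(
    readme="# 项目简介\n这是一个示例仓库。",
    issues=[Thread("Bug report", 2), Thread("Ошибка при сборке", 7)],
    pulls=[],
)
print(labels)  # {'readme': 'CJK', 'issue': 'CYRILLIC'}
print(has_non_latin_evidence(labels))  # True
```

Storing only short samples and their labels, rather than full text, matches the article's point that the dataset is metadata and not a dump of repository content.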

Read full article at GitHub Blog →
