A Polished Way to Present, Parse, and Analyze Programming Language Trends on GitHub

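Original GitHut site: http://githut.info/. It is a very simple single-page website. It pulls large-scale data about GitHub repositories from GitHub Archive, analyzes it, and presents information about the programming languages used on GitHub.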

GITHUT

GitHut is an attempt to visualize and explore the complexity of the universe of programming languages used across the repositories hosted on GitHub.

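GitHut analyzes and explores the programming languages used across the many intricate projects on GitHub and presents them visually, so that readers can more easily follow where today's popular languages are heading.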
_Programming languages are not simply the tool developers use to create programs or express algorithms but also instruments to code and decode creativity. By observing the history of languages we can enjoy the quest of human kind for a better way to solve problems, to facilitate collaboration between people and to reuse the effort of others._

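A programming language is not merely the tool that programmers use to build applications and implement algorithms; it also embodies and reveals human creativity. Looking at the history of languages, it is gratifying to see that people have always sought better ways to solve problems, to make collaboration easier, and to reuse the work of others instead of wasting effort.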

_GitHub is the largest code host in the world, with 3.4 million users. It’s the place where the open-source development community offers access to most of its projects. By analyzing how languages are used in GitHub it is possible to understand the popularity of programming languages among developers and also to discover the unique characteristics of each language._

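GitHub is the largest code host in the world, with 3.4 million users. It is where the open-source development community makes most of its projects accessible. By analyzing how languages are used on GitHub, it becomes easy to gauge how popular each language currently is among developers and to see the distinctive characteristics of each language.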

DATA

GitHub provides a publicly available API to interact with its huge dataset of events and interactions with the hosted repositories.

GitHub offers a public API for working with its huge dataset of events and with the repositories it hosts.

GitHub Archive takes this data a step further by aggregating and storing it for public consumption. The GitHub Archive dataset is also available via Google BigQuery.
The quantitative data used in GitHut is collected from GitHub Archive. The data is updated on a quarterly basis.

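GitHub Archive goes one step further by aggregating this huge volume of data and storing it for public use. The GitHub Archive dataset can also be accessed through Google BigQuery. The quantitative data used in GitHut is collected from GitHub Archive and is updated every quarter.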
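As a rough illustration of quarterly aggregation, the sketch below buckets event timestamps into calendar quarters. The record layout (a `created_at` ISO 8601 string alongside a `type` field) is an assumption made for this example rather than a confirmed GitHub Archive schema, the function names are hypothetical, and the sample records are invented.

```python
from collections import Counter
from datetime import datetime


def quarter_key(timestamp: str) -> str:
    """Map an ISO 8601 timestamp such as '2014-05-17T10:00:00Z' to '2014-Q2'."""
    # Swap the trailing 'Z' for an explicit offset so that
    # datetime.fromisoformat also accepts it on older Python 3 versions.
    moment = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    quarter = (moment.month - 1) // 3 + 1
    return f"{moment.year}-Q{quarter}"


def events_per_quarter(events):
    """Count events per calendar quarter."""
    return Counter(quarter_key(event["created_at"]) for event in events)


# Hypothetical sample records, invented for illustration only.
SAMPLE_EVENTS = [
    {"type": "PushEvent", "created_at": "2014-01-15T08:30:00Z"},
    {"type": "PushEvent", "created_at": "2014-03-31T23:59:59Z"},
    {"type": "CreateEvent", "created_at": "2014-04-01T00:00:00Z"},
]

if __name__ == "__main__":
    for quarter, total in sorted(events_per_quarter(SAMPLE_EVENTS).items()):
        print(quarter, total)
```

Grouping by a string key such as `2014-Q2` keeps the buckets sortable in chronological order, which is convenient when comparing one quarter with the next.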

An additional note about the data concerns the large number of records in which the programming language is not specified. This characteristic is extremely evident for the Create Events (of repositories), so it is not possible to visualize the trending languages in terms of newly created repositories. For this reason the Activity value (the number of changes pushed) has been considered the best metric for the popularity of programming languages.

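One more point worth mentioning about this data: a large share of the records in GitHub Archive carry no programming language information at all. This is especially evident in the create events produced when a repository is first created. (Note from 天地会珠海分舵: creating a repository is much like creating a folder to hold code. Until actual code has been stored in it, how could anyone know which programming language the repository will use?) As a result, there is no way to analyze language trends from newly created repositories. For this reason, the best angle for analyzing language popularity is the events produced when code changes are pushed.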
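The reasoning above can be sketched in a few lines: count pushes per language as the activity metric, and measure how often each event type lacks a language. This is only a minimal sketch under stated assumptions. The flat `type` and `language` fields, the function names, and the sample records are all hypothetical and do not reproduce GitHut's actual queries or the real GitHub Archive schema.

```python
from collections import Counter


def push_activity_by_language(events):
    """Count push events per language, ignoring records with no language."""
    counts = Counter()
    for event in events:
        language = event.get("language")
        if event.get("type") == "PushEvent" and language:
            counts[language] += 1
    return counts


def missing_language_share(events, event_type):
    """Return the fraction of events of one type that carry no language."""
    matching = [e for e in events if e.get("type") == event_type]
    if not matching:
        return 0.0
    missing = sum(1 for e in matching if not e.get("language"))
    return missing / len(matching)


# Hypothetical sample records, invented for illustration only.
SAMPLE_EVENTS = [
    {"type": "PushEvent", "language": "JavaScript"},
    {"type": "PushEvent", "language": "JavaScript"},
    {"type": "PushEvent", "language": "Python"},
    {"type": "PushEvent", "language": None},
    {"type": "CreateEvent", "language": None},
    {"type": "CreateEvent", "language": None},
    {"type": "CreateEvent", "language": "Go"},
]

if __name__ == "__main__":
    print(push_activity_by_language(SAMPLE_EVENTS).most_common())
    for event_type in ("PushEvent", "CreateEvent"):
        print(event_type, missing_language_share(SAMPLE_EVENTS, event_type))
```

In the invented sample, create events lack a language far more often than push events do, which mirrors the reason given for preferring push activity as the popularity metric.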
_The release year of the programming language is based on the table Timeline of programming languages from Wikipedia._

The release years of the languages shown in the chart are taken from the Wikipedia table Timeline of programming languages.

For more information on the methodology of the data collection, check out the publicly available GitHub repository of GitHut.

For more details on how this data was collected, check out the public GitHut repository on GitHub.

-------End----------