

Programming linguistics ― Morphology since then


In a blog article titled “Linguistics of programming languages ― now an inquiry on my graduate thesis came from France”, I wrote that I had received an inquiry from a French person about the “Programming Linguistics” that I proposed in my graduate thesis. Yesterday I received another inquiry, this time from a Japanese person, so I searched for related work. As I wrote in the above article, there have been only a few studies that regard programs as human-written linguistic expressions. This time, however, I found a paper by Masaru Ohba and Katsuhiko Gondow that analyzes the structure of identifiers.

In my graduate thesis [Kan 81] (in Section 4.2), I pointed out that some identifiers consist of multiple elements (i.e., morphemes), and that these elements are articulated by white space, as in “towers of hanoi”, by underscores (“_”) or hyphens (“-”), or by capitalizing the first character of each element, as in “FileOfInteger”.
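
The three conventions above can be illustrated with a small splitting routine. The following is only a minimal sketch in Python, not code from the thesis: the function name `split_identifier` and the regular expressions are assumptions chosen for illustration, and real identifier conventions vary by language and project.

```python
import re

def split_identifier(identifier):
    """Split an identifier into its elements (morphemes), lowercased.

    Hypothetical helper: handles white space, underscores, hyphens,
    and capitalization boundaries, as described in the text.
    """
    parts = []
    # First split on explicit separators: white space, "_" and "-".
    for chunk in re.split(r"[\s_\-]+", identifier):
        if not chunk:
            continue  # skip empty pieces from leading/trailing separators
        # Then split on capitalization: an all-caps run, a (capitalized)
        # word, or a run of digits.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [p.lower() for p in parts]

print(split_identifier("towers of hanoi"))
print(split_identifier("FileOfInteger"))
print(split_identifier("kbd_read_line"))
```

Lowercasing the elements is a design choice made here so that “File” and “file” count as the same element; a study of capitalization itself would keep the original case.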

Ohba et al. [Ohb 05] call such elements of identifiers “concept keywords”. They tried to find concept keywords automatically. For this purpose, they developed the ckTF/IDF method, which is based on the so-called TF/IDF method that has been used for analyzing natural languages. A feature of the ckTF/IDF method is that prefixes such as “kbd_” are not regarded as concept keywords. This is probably because such prefixes often represent abbreviated module names and have no bearing on the meaning of the identifiers.
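
To give a rough feeling for the idea, here is a sketch of plain TF/IDF applied to identifier elements. It is not the ckTF/IDF method of [Ohb 05], whose details are not reproduced here. It assumes that each source file is treated as one “document”, that elements are separated by underscores, and that the prefixes to ignore are given explicitly. The file names, identifiers, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

def elements(identifier, prefixes=()):
    """Split an underscore-separated identifier, dropping a known leading prefix."""
    parts = [p.lower() for p in re.split(r"_+", identifier) if p]
    if len(parts) > 1 and parts[0] in prefixes:
        parts = parts[1:]  # assumption: prefixes such as "kbd" are supplied by hand
    return parts

def tf_idf(documents, prefixes=()):
    """documents: {file name: [identifiers]} -> {file name: {element: score}}."""
    counts = {
        name: Counter(e for ident in idents for e in elements(ident, prefixes))
        for name, idents in documents.items()
    }
    n_docs = len(counts)
    # Document frequency: in how many files does each element occur?
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in c.items()
        }
    return scores

# Hypothetical example data, invented for illustration only.
documents = {
    "keyboard.c": ["kbd_read_key", "kbd_flush_buffer", "kbd_read_line"],
    "disk.c": ["dsk_read_block", "dsk_flush_buffer"],
}
scores = tf_idf(documents, prefixes={"kbd", "dsk"})
for name, table in scores.items():
    print(name, sorted(table.items(), key=lambda kv: -kv[1]))
```

In this toy data, elements that occur in every file (such as “read”) receive a score of zero, while elements specific to one file (such as “key” or “block”) score higher, which is the usual behavior of TF/IDF.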

The authors of this paper do not seem to have felt that they were studying human beings (i.e., they probably did not think of their work as belonging to the humanities). However, this work is evidently a part of the Morphology of programming languages that I defined in my graduate thesis.


  • [Kan 81] Kanada, Y., “Toward Programming Linguistics”, Master's Thesis, University of Tokyo Graduate School, 1981 (in Japanese).
  • [Ohb 05] Ohba, Masaru and Gondow, Katsuhiko, “Toward Mining "Concept Keywords" from Identifiers in Large Software Projects”, Int'l Workshop on Mining Software Repositories 2005, pp. 1-5, 2005.
Keywords: Double articulation, Programming linguistics, Semiotics, Semiology




This page contains a single entry from the blog posted on June 9, 2008 10:26 PM.


This weblog is licensed under a Creative Commons License.