The R Journal                                             Volume 1/1, May 2009

     A peer-reviewed, open-access publication of the R Foundation
                       for Statistical Computing




Contents

Editorial

Special section: The Future of R
Facets of R
Collaborative Software Development Using R-Forge

Contributed Research Articles
Drawing Diagrams with R
The hwriter Package
AdMit: Adaptive Mixtures of Student-t Distributions
expert: Modeling Without Data Using Expert Opinion
mvtnorm: New Numerical Algorithm for Multivariate Normal Probabilities
EMD: A Package for Empirical Mode Decomposition and Hilbert Spectrum
Sample Size Estimation while Controlling False Discovery Rate for Microarray Experiments Using the ssize.fdr Package
Easier Parallel Computing in R with snowfall and sfCluster
PMML: An Open Standard for Sharing Models

News and Notes
Forthcoming Events: Conference on Quantitative Social Science Research Using R
Forthcoming Events: DSC 2009
Forthcoming Events: useR! 2009
Conference Review: The 1st Chinese R Conference
Changes in R 2.9.0
Changes on CRAN
News from the Bioconductor Project
R Foundation News
The R Journal is a peer-reviewed publication of the R Foundation for
Statistical Computing. Communications regarding this publication should be
addressed to the editors. All articles are copyrighted by the respective
authors. Please send submissions to regular columns to the respective column
editor and all other submissions to the editor-in-chief or another member of
the editorial board. More detailed submission instructions can be found on the
Journal’s homepage.


                                             Editor-in-Chief:
                                               Vince Carey
                                           Channing Laboratory
                                      Brigham and Women’s Hospital
                                               75 Francis St.
                                          Boston, MA 02115 USA

                                             Editorial Board:
                               John Fox, Heather Turner, and Peter Dalgaard.

                                        Editor Programmer’s Niche:
                                                Bill Venables

                                             Editor Help Desk:
                                                Uwe Ligges

                                           R Journal Homepage:
                                      http://journal.r-project.org/

                                    Email of editors and editorial board:
                                   firstname.lastname@R-project.org




The R Journal Vol. 1/1, May 2009                                                             ISSN 2073-4859
Editorial

by Vince Carey

Early in the morning of January 7, 2009 (very early, from the perspective of
readers in Europe and the Americas, as I was in Almaty, Kazakhstan), I noticed
a somewhat surprising title on the front web page of the New York Times: “Data
Analysts Captivated by R’s Power”. My prior probability that a favorite
programming language would find its way to the Times front page was rather
low, so I navigated to the article to check. There it was: an early version,
minus the photographs. Later a few nice pictures were added, and a Bits Blog
(http://bits.blogs.nytimes.com/2009/01/08/r-you-ready-for-r/) created. As of
early May, the blog includes 43 entries received from January 8 to March 12
and includes comments from Trevor Hastie clarifying historical roots of R in
S, from Alan Zaslavsky reminding readers of John Chambers’s ACM award and its
role in funding an American Statistical Association prize for students’
software development, and from numerous others on aspects of various
proprietary and non-proprietary solutions to data analytic computing problems.
Somewhat more recently, the Times blog was back in the fold, describing a
SAS-to-R interface (February 16). As a native New Yorker, I am naturally
aware, and glad, of the Times’ coverage of R; consumers of other mainstream
media products are invited to notify me of comparable coverage events
elsewhere.

As R’s impact continues to expand, R News has transitioned to The R Journal.
You are now reading the first issue of this new periodical, which continues
the tradition of the newsletter in various respects: articles are
peer-reviewed, copyright of articles is retained by authors, and the journal
is published electronically through an archive of freely downloadable PDF
issues, easily found on the R project homepage and on CRAN. My role as
Editor-In-Chief of this first issue of the new Journal is purely accidental;
the transition has been in development for several years now, with critical
guidance and support from the R Foundation, from a number of R Core members,
and from previous editors. I am particularly thankful for John Fox’s
assistance with this transition, and we are all indebted to co-editors Peter
Dalgaard and Heather Turner for technical work on the new Journal’s
computational infrastructure, undertaken on top of already substantial
commitments of editorial effort.

While contemplating the impending step into a new publishing vehicle, John Fox
conceived a series of special invited articles addressing views of “The Future
of R”. This issue includes two articles in this projected series. In the
first, John Chambers analyzes R into facets that help us to understand the
‘richness’ and ‘messiness’ of R’s programming model. In the second, Stefan
Theußl and Achim Zeileis describe R-Forge, a new infrastructure for
contributing and managing source code of packages destined for CRAN. A number
of other future-oriented articles have been promised for this series; we
expect to issue a special edition of the Journal collecting these when all
have arrived.

The peer-reviewed contributions section includes material on programming for
diagram construction, HTML generation, probability model elicitation,
integration of the multivariate normal density, Hilbert spectrum analysis,
microarray study design, parallel computing support, and the predictive
modeling markup language. Production of this issue involved voluntary efforts
of 22 referees on three continents, who will be acknowledged in the year-end
issue.

Vince Carey
Channing Laboratory, Brigham and Women’s Hospital
Vince.Carey@R-project.org




Invited Section: The Future of R



Facets of R

Special invited paper on “The Future of R”

by John M. Chambers

We are seeing today a widespread, and welcome, tendency for
non-computer-specialists among statisticians and others to write collections
of R functions that organize and communicate their work. Along with the flood
of software sometimes comes an attitude that one need only learn, or teach, a
sort of basic how-to-write-the-function level of R programming, beyond which
most of the detail is unimportant or can be absorbed without much discussion.
As delusions go, this one is not very objectionable if it encourages
participation. Nevertheless, a delusion it is. In fact, functions are only one
of a variety of important facets that R has acquired by intent or circumstance
during the three-plus decades of the history of the software and of its
predecessor S. To create valuable and trustworthy software using R often
requires an understanding of some of these facets and their interrelations.
This paper identifies six facets, discussing where they came from, how they
support or conflict with each other, and what implications they have for the
future of programming with R.

Facets

Any software system that has endured and retained a reasonably happy user
community will likely have some distinguishing characteristics that form its
style. The characteristics of different systems are usually not totally
unrelated, at least among systems serving roughly similar goals: in computing
as in other fields, the set of truly distinct concepts is not that large. But
in the mix of different characteristics and in the details of how they work
together lies much of the flavor of a particular system.

Understanding such characteristics, including a bit of the historical
background, can be helpful in making better use of the software. It can also
guide thinking about directions for future improvements of the system itself.

The R software and the S software that preceded it reflect a rather large
range of characteristics, resulting in software that might be termed rich or
messy according to one’s taste. Since R has become a very widely used
environment for applications and research in data analysis, understanding
something of these characteristics may be helpful to the community.

This paper considers six characteristics, which we will call facets. They
characterize R as:

   1. an interface to computational procedures of many kinds;
   2. interactive, hands-on in real time;
   3. functional in its model of programming;
   4. object-oriented, “everything is an object”;
   5. modular, built from standardized pieces; and,
   6. collaborative, a world-wide, open-source effort.

None of these facets is original to R, but while many systems emphasize one or
two of them, R continues to reflect them all, resulting in a programming model
that is indeed rich but sometimes messy.

The thousands of R packages available from CRAN, Bioconductor, R-Forge, and
other repositories, the uncounted other software contributions from groups and
individuals, and the many citations in the scientific literature all testify
to the useful computations created within the R model. Understanding how the
various facets arose and how they can work together may help us improve and
extend that software.

We introduce the facets more or less chronologically, in three pairs that
entered into the software during successive intervals of roughly a decade. To
provide some context, here is a brief chronology, with some references. The S
project began in our statistics research group at Bell Labs in 1976, evolved
into a generally licensed system through the 1980s and continues in the S+
software, currently owned by TIBCO Software Inc. The history of S is
summarized in the Appendix to Chambers (2008); the standard books introducing
still-relevant versions of S include Becker et al. (1988) (no longer in
print), Chambers and Hastie (1992), and Chambers (1998). R was announced to
the world in Ihaka and Gentleman (1996) and evolved fairly soon to be
developed and managed by the R-core group and other contributors. Its history
is harder to pin down, partly because the story is very much still happening
and partly because, as a collaborative open-source project, individual
responsibility is sometimes hard to attribute; in addition, not all the
important new ideas have been described separately by their authors. The
http://www.r-project.org site points to the on-line manuals, as well as a list
of over 70 related books and other material. The manuals are invaluable as a
technical resource, but short on background information. Articles in R News,
the predecessor of this journal, give descriptions of some of the key
contributions, such as namespaces (Tierney, 2003) and internationalization
(Ripley, 2005).

An interactive interface

The goals for the first version of S in 1976 already combined two facets of
computing usually kept
apart, interactive computing and the development of procedural software for
scientific applications.

One goal was a better interface to new computational procedures for scientific
data analysis, usually implemented then as Fortran subroutines. Bell Labs
statistics research had collected and written a variety of these while at the
same time numerical libraries and published algorithm collections were
establishing many of the essential computational techniques. These procedures
were the essence of the actual data analysis; the initial goal was essentially
a better interface to procedural computation. A graphic from the first plans
for S (Chambers, 2008, Appendix, page 476) illustrated this.

But that interface was to be interactive, specifically via a language used by
the data analyst communicating in real time with the software. At this time, a
few systems had pointed the way for interactive scientific computing, but most
of them were highly specialized with limited programming facilities; that is,
a limited ability for the user to express novel computations. The most
striking exception was APL, created by Kenneth Iverson (1962), an
operator-based interactive language with heavy emphasis on general multiway
arrays. APL introduced conventions that now seem obvious; for example, the
result of evaluating a user’s expression was either an assignment or a printed
display of the computed object (rather than expecting users to type a print
command). But APL at the time had no notion of an interface to procedures,
which along with a limited, if exotic, syntax caused us to reject it, although
S adopted a number of features from it, such as its approach to general
multidimensional arrays.

From its first implementation, the S software combined these two facets: users
carried out data analysis interactively by writing expressions in the S
language, but much of the programming to extend the system was via new Fortran
subroutines accompanied by interfaces to call them from S. The interfaces were
written in a special language that was compiled into Fortran.

The result was already a mixed system that had much of the tension between
human and machine efficiency that still stimulates debate among R users and
programmers. At the same time, the flexibility from that same mixture helped
the software to grow and adapt with a growing user community.

Later versions of the system replaced the interface language with individual
functions providing interfaces to C or Fortran, but the combination of an
interactive language and an interface to compiled procedures remains a key
feature of R. Increasingly, interfaces to other languages and systems have
been added as well, for example to database systems, spreadsheets and user
interface software.

Functional and object-oriented programming

During the 1980s the topics of functional programming and object-oriented
programming stimulated growing interest in the computer science literature. At
the same time, I was exploring possibilities for the next S, perhaps “after
S”. These ideas were eventually blended with other work to produce the “new
S”, later called Version 3 of S, or S3 (Becker et al., 1988).

As before, user expressions were still interpreted, but were now parsed into
objects representing the language elements, mainly function calls. The
functions were also objects, the result of parsing expressions in the language
that defined the function. As the motto expressed it, “Everything is an
object”. The procedural interface facet was not abandoned, however; from the
user’s perspective it was now defined by functions that provided an interface
to a C or Fortran subroutine.

The new programming model was functional in two senses. It largely followed
the functional programming paradigm, which defined programming as the
evaluation of function calls, with the value uniquely determined by the
objects supplied as arguments and with no external side-effects. Not all
functions followed this strictly, however.

The new version was functional also in the sense that nearly everything that
happened in evaluation was a function call. Evaluation in R is even more
functional in that language components such as assignment and loops are
formally function calls internally, reflecting in part the influence of Lisp
on the initial implementation of R.

Subsequently, the programming model was extended to include classes of objects
and methods that specialized functions according to the classes of their
arguments. That extension itself came in two stages. First, a thin layer of
additional code implemented classes and methods by two simple mechanisms:
objects could have an attribute defining their class as a character string or
a vector of multiple strings; and an explicitly invoked method dispatch
function would match the class of the function’s first argument against
methods, which were simply saved function objects with a class string appended
to the object’s name. In the second stage, the version of the system known as
S4 provided formal class definitions, stored as a special metadata object, and
similarly formal method definitions associated with generic functions, also
defined via object classes (Chambers, 2008, Chapters 9 and 10, for the R
version).

The functional and object-oriented facets were combined in S and R, but they
appear more frequently in separate, incompatible languages. Typical
object-oriented languages such as C++ or Java are not functional; instead,
their programming model is of methods associated with a class rather than with
a
function and invoked on an object. Because the object is treated as a
reference, the methods can usually modify the object, again quite a different
model from that leading to R. The functional use of methods is more
complicated, but richer and indeed necessary to support the essential role of
functions. A few other languages do combine functional structure with methods,
such as Dylan (Shalit, 1996, chapters 5 and 6), and comparisons have proved
useful for R, but came after the initial design was in place.

Modular design and collaborative support

Our third pair of facets arrives with R. The paper by Ihaka and Gentleman
(1996) introduced a statistical software system, characterized later as “not
unlike S”, but distinguished by some ideas intended to improve on the earlier
system. A facet of R related to some of those ideas and to important later
developments is its approach to modularity. In a number of fields, including
architecture and computer science, modules are units designed to be useful in
themselves and to fit together easily to create a desired result, such as a
building or a software system. R has two very important levels of modules:
functions and packages.

Functions are an obvious modular unit, especially functions as objects. R
extended the modularity of functions by providing them with an environment,
usually the environment in which the function object was created. Objects
assigned in this environment can be accessed by name in a call to the
function, and even modified, by stepping outside the strict functional
programming model. Functions sharing an environment can thus be used together
as a unit. The resulting programming technique is usually called programming
with closures in R; for a discussion and comparison with other techniques, see
Chambers (2008, section 5.4). Because the evaluator searches for external
objects in the function’s environment, dependencies on external objects can be
controlled through that environment. The namespace mechanism (Tierney, 2003),
an important contribution to trustworthy programming with R, uses this
technique to help ensure consistent behavior of functions in a package.

The larger module introduced by R is probably the most important for the
system’s growing use, the package. As it has evolved, and continues to evolve,
an R package is a module combining in most cases R functions, their
documentation, and possibly data objects and/or code in other languages.

While S always had the idea of collections of software for distribution,
somewhat formalized in S4 as chapters, the R package greatly extends and
improves the ability to share computing results as effective modules. The
mechanism benefits from some essential tools provided in R to create, develop,
test, and distribute packages. Among many benefits for users, these tools help
programmers create software that can be installed and used on the three main
operating systems for computing with data: Windows, Linux, and Mac OS X.

The sixth facet for our discussion is that R is a collaborative enterprise,
which a large group of people share in and contribute to. Central to the
collaborative facet is, of course, that R is a freely available, open-source
system, both the core R software and most of the packages built upon it. As
most readers of this journal will be aware, the combination of the
collaborative efforts and the package mechanism has enormously increased the
availability of techniques for data analysis, especially the results of new
research in statistics and its applications. The package management tools, in
fact, illustrate the essential role of collaboration. Another highly
collaborative contribution has facilitated use of R internationally (Ripley,
2005), both by the use of locales in computations and by the contribution of
translations for R documentation and messages.

While the collaborative facet of R is the least technical, it may eventually
have the most profound implications. The growth of the collaborative community
involved is unprecedented. For example, a plot in Fox (2008) shows that the
number of packages in the CRAN repository exhibited literal exponential growth
from 2001 to 2008. The current size of this and other repositories implies at
a conservative estimate that several hundreds of contributors have mastered
quite a bit of technical expertise and satisfied some considerable formal
requirements to offer their efforts freely to the worldwide scientific
community.

This facet does have some technical implications. One is that, being an
open-source system, R is generally able to incorporate other open-source
software and does indeed do so in many areas. In contrast, proprietary systems
are more likely to build extensions internally, either to maintain competitive
advantage or to avoid the free distribution requirements of some open-source
licenses. Looking for quality open-source software to extend the system should
continue to be a favored strategy for R development.

On the downside, a large collaborative enterprise with a general practice of
making collective decisions has a natural tendency towards conservatism.
Radical changes do threaten to break currently working features. The future
benefits they might bring will often not be sufficiently persuasive. The very
success of R, combined with its collaborative facet, poses a challenge to
cultivate the “next big step” in software for data analysis.
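
The closure technique described earlier in this section (functions that share
the environment in which they were created, and can read and even modify
objects assigned there) can be illustrated with a minimal sketch. The names
make_counter, increment and current are hypothetical, chosen only for this
illustration; they do not come from the paper or from any package.

```r
# Hypothetical example: two functions created in the same call share one
# enclosing environment and can therefore be used together as a unit.
make_counter <- function(start = 0) {
  count <- start                     # assigned in the environment of this call

  increment <- function(by = 1) {
    # '<<-' steps outside the strict functional model: it modifies 'count'
    # in the enclosing (shared) environment rather than creating a local copy.
    count <<- count + by
    invisible(count)
  }

  current <- function() count        # looks up 'count' by name in that environment

  list(increment = increment, current = current)
}

counter <- make_counter()
counter$increment()
counter$increment(by = 2)
counter$current()                    # 3

# Both functions carry the same environment object.
identical(environment(counter$increment), environment(counter$current))  # TRUE
```

Each call to make_counter() creates a fresh environment, so separate counters
do not interfere with one another; this is one simple way in which
dependencies on external objects can be confined to a known environment.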


Implications

A number of specific implications of the facets of R, and of their
interaction, have been noted in previous sections. Further implications are
suggested in considering the current state of R and its future possibilities.

Many reasons can be suggested for R’s increasing popularity. Certainly part of
the picture is a productive interaction among the interface facet, the
modularity of packages, and the collaborative, open-source approach, all in an
interactive mode. From the very beginning, S was not viewed as a set of
procedures (the model for SAS, for example) but as an interactive system to
support the implementation of new procedures. The growth in packages noted
previously reflects the same view.

The emphasis on interfaces has never diminished and the range of possible
procedures across the interface has expanded greatly. That R’s growth has been
especially strong in academic and other research environments is at least
partly due to the ability to write interfaces to existing and new software of
many forms. The evolution of the package as an effective module and of the
repositories for contributed software from the community has resulted in many
new interfaces.

The facets also have implications for the possible future of R. The hope is
that R can continue to support the implementation and communication of
important techniques for data analysis, contributed and used by a growing
community. At the same time, those interested in new directions for
statistical computing will hope that R or some future alternative engages a
variety of challenges for computing with data. The collaborative community
seems the only likely generator of the new features required but, as noted in
the previous section, combining the new with maintenance of a popular current
collaborative system will not be easy.

The encouragement for new packages generally and for interfaces to other
software in particular is relevant for this question also. Changes that might
be difficult if applied internally to the core R implementation can often be
supplied, or at least approximated, by packages. Two examples of enhancements
considered in discussions are an optional object model using mutable
references and computations using multiple threads. In both cases, it can be
argued that adding the facility to R could extend its applicability and
performance. But a substantial programming effort and knowledge of the current
implementation would be required to modify the in- [...] “workarounds” are a
good thing or not. One may feel that the total effort devoted to multiple
approximate solutions would have been more productive if coordinated and
applied to improving the core implementation. In practice, however, the nature
of the collaborative effort supporting R makes it much easier to obtain a
modest effort from one or a few individuals than to organize and execute a
larger project. Perhaps the growth of R will encourage the community to push
for such projects, with a new view of funding and organization. Meanwhile, we
can hope that the growing communication facilities in the community will
assess and improve on the existing efforts. Particularly helpful facilities in
this context include the CRAN task views
(http://cran.r-project.org/web/views/), the mailing lists and publications
such as this journal.

Bibliography

R. A. Becker, J. M. Chambers, and A. R. Wilks. The New S Language. Chapman &
  Hall, London, 1988.

J. M. Chambers. Programming with Data. Springer, New York, 1998. URL
  http://cm.bell-labs.com/cm/ms/departments/sia/Sbook/. ISBN 0-387-98503-4.

J. M. Chambers. Software for Data Analysis: Programming with R. Springer, New
  York, 2008. ISBN 978-0-387-75935-7.

J. M. Chambers and T. J. Hastie. Statistical Models in S. Chapman & Hall,
  London, 1992. ISBN 9780412830402.

J. Fox. Editorial. R News, 8(2):1–2, October 2008. URL
  http://CRAN.R-project.org/doc/Rnews/.

R. Ihaka and R. Gentleman. R: A language for data analysis and graphics.
  Journal of Computational and Graphical Statistics, 5:299–314, 1996.

K. E. Iverson. A Programming Language. Wiley, 1962.

B. D. Ripley. Internationalization features of R 2.1.0. R News, 5(1):2–7, May
  2005. URL http://CRAN.R-project.org/doc/Rnews/.

A. Shalit. The Dylan Reference Manual. Addison-Wesley Developers Press,
  Reading, Mass., 1996. ISBN 0-201-44211-6.

L. Tierney. Name space management for R. R News, 3(1):2–6, June 2003. URL
  http://CRAN.R-project.org/doc/Rnews/.
ternal object model or to make the core code thread-
safe. Issues of back compatibility are likely to arise
                                                           John M. Chambers
as well. Meanwhile, less ambitious approaches in
                                                           Department of Statistics
the form of packages that approximate the desired
                                                           Stanford University
behavior have been written. For the second exam-
                                                           USA
ple especially, these may interface to other software
                                                           jmc@r-project.org
designed to provide this facility.
    Opinions may reasonably differ on whether these
The R Journal Vol. 1/1, May 2009                                                                 ISSN 2073-4859
INVITED SECTION: THE FUTURE OF R                                                                                 9



Collaborative Software Development
Using R-Forge
Special invited paper on “The Future of R”                  Forge, host a project-specific website, and how pack-
                                                            age maintainers submit a package to the Compre-
by Stefan Theußl and Achim Zeileis                          hensive R Archive Network (CRAN, http://CRAN.
                                                            R-project.org/). Finally, we summarize recent de-
                                                            velopments and give a brief outlook to future work.
Introduction
Open source software (OSS) is typically created in a        R-Forge
decentralized self-organizing process by a commu-
nity of developers having the same or similar inter-        R-Forge offers a central platform for the develop-
ests (see the famous essay by Raymond, 1999). A             ment of R packages, R-related software and other
key factor for the success of OSS over the last two         projects.
decades is the Internet: Developers who rarely meet             R-Forge is based on GForge (Copeland et al.,
face-to-face can employ new means of communica-             2006) which is an open source fork of the 2.61 Source-
tion, both for rapidly writing and deploying software       Forge code maintained by Tim Perdue, one of the
(in the spirit of Linus Torvalds’ “release early, release   original SourceForge authors. GForge has been mod-
often” paradigm). Therefore, many tools emerged             ified to provide additional features for the R commu-
that assist a collaborative software development pro-       nity, namely a CRAN-style repository for hosting de-
cess, including in particular tools for source code         velopment releases of R packages as well as a quality
management (SCM) and version control.                       management system similar to that of CRAN. Pack-
     In the R world, SCM is not a new idea; in fact,        ages hosted on R-Forge are provided in source form
the R Development Core Team has always been us-             as well as in binary form for Mac OS X and Windows.
ing SCM tools for the R sources, first by means of           They can be downloaded from the website of the cor-
Concurrent Versions System (CVS, see Cederqvist et          responding project on R-Forge or installed directly in
al., 2006), and then via Subversion (SVN, see Pilato        R; for a package foo, say, install.packages("foo",
et al., 2004). A central repository is hosted by ETH        repos = "http://R-Forge.R-project.org").
Zürich mainly for managing the development of the               On R-Forge, developers organize their work in
base R system. Mailing lists like R-help, R-devel and       so-called “Projects”. Every project has various tools
many others are currently the main communication            and web-based features for software development,
channels in the R community.                                communication and other services. All features men-
     Also beyond the base system, many R con-               tioned in the following sections are accessible via so-
tributors employ SCM tools for managing their R             called “Tabs”: e.g., user accounts can be managed in
packages, e.g., via web-based SVN repositories like         the My Page tab or a list of available projects can be
SourceForge (http://SourceForge.net/) or Google             displayed using the Project Tree tab.
Code (http://Code.Google.com/). However, there                  Since starting the platform in early 2007, more
has been no central SCM repository providing ser-           and more interested users have registered their projects on
vices suited to the specific needs of R package de-         R-Forge. Now, after more than two years of develop-
velopers. Since early 2007, the R-project has offered such  ment and testing, around 350 projects and more than
a central platform to the R community. R-Forge              900 users are registered on R-Forge. This and the
(http://R-Forge.R-project.org/) provides a set of           steadily growing list of feature requests show that
tools for source code management and various web-           there is a high demand for centralized source code
based features. It aims to provide a platform for           management tools and for releasing prototype code
collaborative development of R packages, R-related          frequently among the R community.
software or further projects. R-Forge is closely re-            In the next three sections, we summarize the core
lated to the most famous of such platforms—the              features of R-Forge and what R-Forge offers to the R
world’s largest OSS development website—namely              community in terms of collaborative development of
http://SourceForge.net/.                                    R-related software projects.
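
As a rough sketch of the direct installation mentioned above, the following R code builds the install.packages() call for a package hosted on R-Forge without evaluating it. The package name foo is the article's placeholder, the helper name rforge_install_call is hypothetical, and the repository URL is the one printed in the text, which may have changed since 2009.

```r
# Repository URL as printed in the article (may have changed since 2009).
rforge_repos <- "http://R-Forge.R-project.org"

# Hypothetical helper: build, but do not evaluate, the installation call.
rforge_install_call <- function(pkg, repos = rforge_repos) {
  bquote(install.packages(.(pkg), repos = .(repos)))
}

# "foo" is a placeholder, not a real package.
cl <- rforge_install_call("foo")
print(cl)

# Evaluating the call performs the installation and needs network access:
# eval(cl)
```

Keeping the call unevaluated makes it possible to inspect what would be run before anything is downloaded.
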
     The remainder of this article is organized as fol-
lows. First, we present the core features that R-           Source code management
Forge offers to the R community. Second, we give a
hands-on tutorial on how users and developers can           When carrying out software projects, source files
get started with R-Forge. In particular, we illustrate      change over time, new files get created and old files
how people can register, set up new projects, use R-        deleted. Typically, several authors work on several
Forge’s SCM facilities, provide their packages on R-        computers on the same and/or different files and




keeping track of every change can become a tedious         velopers so that they can improve the package to
task. In the open source community, the general so-        pass all tests on R-Forge and subsequently on CRAN.
lution to this problem is to use version control, typi-        Second, bug tracking systems allow users to no-
cally provided by the majority of SCM tools. For this      tify package authors about problems they encounter.
reason R-Forge utilizes SVN to facilitate the devel-       In the spirit of OSS—given enough eyeballs, all
oper’s work when creating software.                        bugs are shallow (Raymond, 1999)—peer code re-
    A central repository ensures that the developer        view leads to an overall improvement of the quality
always has access to the current version of the            of software projects.
project’s source code. Any of the authorized col-
laborators can “check out” (i.e., download) or “up-
                                                           Additional features
date” the project file structure, make the necessary
changes or additions, delete files from the current re-     A number of further tools, of increased interest for
vision and finally “commit” changes or additions to         larger projects, help developers to coordinate their
the repository. More than that, SVN keeps track of         work and to communicate with their user base.
the complete history of the project file structure. At      These tools include:
any point in the development stage it is possible to
go back to any previous stage in the history to inspect       • Project websites: Developers may present their
and restore old files. This is called version control, as        work on a subdomain of R-Forge, e.g., http:
every stage automatically is assigned a unique ver-             //foo.R-Forge.R-project.org/, or via a link
sion number which increases over time.                          to an external website.
    On R-Forge such a version-controlled repository
is automatically created for each project. To get             • Mailing lists: By default a list foo-commits@
started, the project members just have to install the           lists.R-Forge.R-project.org is automati-
client of their choice (e.g., Tortoise SVN on Windows           cally created for each project. Additional mail-
or svnX on Mac OS X) and check out the repository.              ing lists can be set up easily.
In addition to the inherent backup of every version           • Project categorization: Administrators may
within the repository a backup of the whole reposi-             categorize their project in several so-called
tory is generated daily.                                        “Trove Categories” in the Admin tab of their
    A rights management system assures that, by de-             project (under Trove Categorization). For each
fault, anonymous users have read access and devel-              category three different items can be selected.
opers have write access to the data associated with a           This enables users to quickly find what they are
certain project on R-Forge. More precisely, registered          looking for using the Project Tree tab.
users can be granted one of several roles: e.g., the
“Administrator” has all rights including the right to         • News: Announcements and other information
add new users to the project or release packages di-            related to a project can be put on the project
rectly to CRAN. He/she is usually the package main-             summary page as well as on the home page of
tainer, the project leader or has registered the project        R-Forge. The latter needs approval by one of
originally. Other members of a project typically have           the R-Forge administrators. All items are avail-
either the role “Senior Developer” or “Junior Devel-            able as RSS feeds.
oper” which both have permission to commit to the
project SVN repository and examine the log files in            • Forums: Discussion forums can be set up sepa-
the R Packages tab (the differences between the two             rately by the project administrators.
roles are subtle, e.g., senior developers additionally
have administrative privileges in several places in
the project). When we speak of developers in sub-          How to get started
sequent sections we refer to project members having
the rights at least of a junior developer.                 This section is intended to be a hands-on tutorial for
                                                           new users. Depending on familiarity with the sys-
                                                           tems/tools involved the instructions might be some-
Release and quality management                             what brief. In case of problems we recommend con-
                                                           sulting the user’s manual (R-Forge Administration
Development versions of a software project are typ-        and Development Team, 2008) which contains de-
ically prototypes and are subject to many changes.         tailed step-by-step instructions.
Thus, R-Forge offers two tools which assist the devel-
                                                               When accessing the URL http://R-Forge.
opers in improving the quality of their source code.
                                                           R-project.org/ the home page is presented (see
    First, it offers a quality management system sim-      Figure 1). Here one can
ilar to that of CRAN. Packages on R-Forge are
checked in a standardized way on different plat-              • login,
forms based on R CMD check at least once daily. The
resulting log files can be examined by the project de-         • register a user or a project,






                                 Figure 1: Home page of R-Forge on 2009-03-10


   • download the documentation,                           Registering a project
                                                           There are two possibilities to register a project: Click-
   • examine the latest news and changes,
                                                           ing on Register Your Project on the home page or going
                                                           to the My Page tab and clicking on Register Project (see
   • go to a specific project website either by search-     Figure 2 below the main tabs). Both links lead to a
     ing for available projects (top middle of the         form which has to be filled out in order to finish the
     page), by clicking on one of the projects listed      registration process. In the text field “Project Pub-
     on the right, or by going through the listing in      lic Description” the registrant is supposed to enter a
     the Project Tree tab.                                 concise description of the project. This text and the
                                                           “Project Full Name” will be shown in several places
    Registered users can access their personal page        on R-Forge (see e.g., Figure 1). The text entered in the
via the tab named My Page. Figure 2 shows the per-         field “Project Purpose And Summarization” is addi-
sonal page of the first author.                             tional information for the R-Forge administrators, in-
                                                           spected for approval. “Project Unix Name” refers to
                                                           the name which uniquely determines the project. In
Registering as a new user                                  the case of a project that contains a single R pack-
                                                           age, the project Unix name typically corresponds to
To use R-Forge as a developer, one has to register as      the package name (in its lower-case version). Restric-
a site user. A link on the main web site called New        tions according to the Unix file system convention
Account on the top right of the home page leads to         force Unix names to be in lower case (and will be
the corresponding registration form.                       converted automatically if they are typed in upper
    After submitting the completed form, an e-mail is      case).
sent to the given address containing a link for activat-       After the completed form is submitted, the
ing the account. Subsequently, all R-Forge features,       project has to be approved by the R-Forge admin-
including joining existing or creating new projects,       istrators and a confirmation e-mail is sent to the
are available to logged-in users.                          registrant upon approval. After that, the regis-






                                  Figure 2: The My Page tab of the first author


trant automatically becomes the project administra-       if the package is already version-controlled some-
tor and the standardized web area of the project          where else—the corresponding parts of the reposi-
(http://R-Forge.R-project.org/projects/foo/) is           tory including the history can be migrated to R-Forge
immediately available on R-Forge. This web area in-       (see Section 6 of the user’s manual).
cludes a Summary page, an Admin tab visible only to
                                                              The SCM tab of a project explains how
the project administrators, and various other tabs de-
                                                          the corresponding SVN repository located at
pending on the features enabled for this project in the
                                                          svn://svn.R-Forge.R-project.org/svnroot/foo
Admin tab. To present the new project to a broader
                                                          can be checked out. From this URL the sources are
community the name of the project additionally is
                                                          checked out either anonymously without write per-
promoted on the home page under “Recently Reg-
                                                          mission (enabled by default) or as developer using
istered Projects” (see Figure 1).
                                                          an encrypted authentication method, namely secure
    Furthermore, within an hour after approval a de-      shell (ssh). Via this secure protocol (svn+ssh:// fol-
fault mailing list and the project’s SVN repository       lowed by the registered user name) developers have
containing a ‘README’ file and two pre-defined di-          full access to the repository. Typically, the developer
rectories called ‘pkg’ and ‘www’ are created. The con-    is authenticated via the registered password but it is
tent of these directories is used by the R-Forge server   also possible to upload a secure shell key (updated
for creating R packages and a project website, respec-    hourly) to make use of public/private key authenti-
tively.                                                   cation. Section 3.2 of the user’s manual explains this
                                                          process in more detail.
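
The two access modes can be sketched in R by assembling the corresponding Subversion URLs and command-line arguments. The helper names, the project name foo and the user name jdoe are hypothetical, and the user@host form of the svn+ssh URL is an assumption based on the description above; the SCM tab of a project shows the authoritative commands.

```r
# Host and path pattern taken from the repository URL quoted in the text.
svn_host <- "svn.R-Forge.R-project.org"

# Hypothetical helper: repository URL for a project.
# user = NULL gives the anonymous (read-only) URL; otherwise the
# developer URL over ssh, assuming the usual user@host form.
rforge_svn_url <- function(project, user = NULL) {
  if (is.null(user)) {
    sprintf("svn://%s/svnroot/%s", svn_host, project)
  } else {
    sprintf("svn+ssh://%s@%s/svnroot/%s", user, svn_host, project)
  }
}

# Hypothetical helper: arguments for 'svn checkout', suitable for
# system2("svn", args) on a machine with a Subversion client installed.
svn_checkout_args <- function(project, user = NULL) {
  c("checkout", rforge_svn_url(project, user))
}

anon_args <- svn_checkout_args("foo")
dev_args  <- svn_checkout_args("foo", user = "jdoe")
print(anon_args)
print(dev_args)

# To actually check out the sources (requires svn and network access):
# system2("svn", anon_args)
```
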
SCM and R packages                                            To make use of the package builds and checks
                                                          the package source code has to be put into the ‘pkg’
The first step after creation of a project is typically    directory of the repository (i.e., ‘pkg/DESCRIPTION’,
to start generation of content for one (or more) R        ‘pkg/R’, ‘pkg/man’, etc.) or, alternatively, a subdirec-
package(s) in the ‘pkg’ directory. Developers can ei-     tory of ‘pkg’. The latter structure allows developers to
ther start committing changes via SVN as usual or—        have more than one package in a single project; e.g.,




if a project consists of the packages foo and bar, then   Recent and future developments
the source code is located in ‘pkg/foo’ and ‘pkg/bar’,
respectively.                                             In this section, we briefly summarize the major
     R-Forge automatically examines the ‘pkg’ direc-      changes and updates to the R-Forge system during
tory of every repository and builds the package           the last few months and give an outlook to future
sources as well as the package binaries on a daily        developments.
basis for Mac OS X and Windows (if applicable).              Recently added features and major changes in-
The package builds are provided in the R Pack-            clude:
ages tab (see Figure 3) for download or can be in-
stalled directly in R using install.packages("foo",          • New defaults for freshly registered projects:
repos="http://R-Forge.R-project.org"). Further-                Only the tabs Lists, SCM and R packages are en-
more, in the R Packages tab developers can examine             abled initially. Forums, Tracker and News can
logs of the build and check process on different plat-         be activated separately (in order not to over-
forms.                                                         whelm new users with too many features) us-
     To release a package to CRAN the project ad-              ing Edit Public Info in the Admin tab of the
ministrator clicks on Submit this package to CRAN in           project. Experienced users can decide which
the R Packages tab. Upon confirmation, a message                features they want and activate them.
will be sent to CRAN@R-project.org and the latest
successfully built source package (the ‘.tar.gz’ file)        • An enhanced structure in the SVN repository
is automatically copied to ftp://CRAN.R-project.               allowing multiple packages in a single project
org/incoming/. Note that packages are built once               (see above).
daily, i.e., the latest source package does not include      • The R package RForgeTools (Theußl, 2009)
more recent code committed to the SVN repository.              contains platform-independent package build-
     Figure 3 shows the R Packages tab of the tm               ing and quality management code used on the
project (http://R-Forge.R-project.org/projects/                R-Forge servers.
tm/) as one of the project administrators would see
it. Depending on whether you are a member of the             • A modified News submit page offering two
project or not and your rights you will see only parts         types of submissions: project-specific and
of the information provided for each package.                  global news. The latter needs approval by one
                                                               of the R-Forge administrators (default: project-
Further steps                                                  only submission).

A customized project website, accessible through             • Circulation of SVN commit messages can be
http://foo.R-Forge.R-project.org/ where foo                    enabled in the SCM Admin tab by project ad-
corresponds to the Unix name of the project, is man-           ministrators. The mailing list mentioned in the
aged via the ‘www’ directory. The website gets up-             text field is used for delivering the SVN commit
dated every hour.                                              messages (default: off).
   The changes made to the project can be exam-
                                                             • Mailing list search facilities provided by the
ined by entering the corresponding standardized
                                                               Swish-e engine which can be accessed via the
web area. On entry, the Summary page is shown.
                                                               List tab (private lists are not included in the
Here, one can
                                                               search index).
   • examine the details of the project including a
                                                             Further features and improvements which are
     short description and a listing of the adminis-
                                                          currently on the wishlist or under development in-
     trators and developers,
                                                          clude
   • follow a link leading to the project homepage,
                                                             • a Wiki,
   • examine the latest news announcements (if
     available),                                             • task management facilities,

   • go to other sections of the project like Forums,        • a re-organized tracker more compatible with R
     Tracker, Lists, R Packages, etc.                          package development and,

   • follow the download link leading directly to            • an update of the underlying GForge sys-
     the available packages of the project (i.e., the          tem to its successor FusionForge (http://
     R Packages tab).                                          FusionForge.org/).

   Furthermore, meta-information about a project             For suggestions, problems, feature requests, and
can be supplied in the Admin tab via so-called “Trove     other questions regarding R-Forge please contact
Categorization”.                                          R-Forge@R-project.org.
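
The repository layout described in the section on SCM and R packages can be illustrated with a small R sketch that creates the expected directory skeleton in a temporary directory. The package names foo and bar follow the article; the DESCRIPTION field values are placeholders, and a real package needs further content before it passes R CMD check.

```r
# Throwaway directory standing in for a checked-out working copy.
root <- file.path(tempdir(), "rforge-layout-demo")

# Placeholder package names used in the article.
pkgs <- c("foo", "bar")

for (p in pkgs) {
  # Each package gets its own subdirectory of 'pkg' with 'R' and 'man'.
  for (d in c("R", "man")) {
    dir.create(file.path(root, "pkg", p, d),
               recursive = TRUE, showWarnings = FALSE)
  }
  # Minimal DESCRIPTION file; all values are illustrative only.
  writeLines(c(paste("Package:", p),
               "Version: 0.0-1",
               "Title: Placeholder Package",
               "Description: Skeleton used to illustrate the layout.",
               "License: GPL-2"),
             file.path(root, "pkg", p, "DESCRIPTION"))
}

# 'www' holds the content of the project website.
dir.create(file.path(root, "www"), showWarnings = FALSE)

# List the resulting structure.
layout <- list.files(root, recursive = TRUE, include.dirs = TRUE)
print(layout)
```

For a project with a single package, the same files would sit directly in 'pkg' (i.e., 'pkg/DESCRIPTION', 'pkg/R', 'pkg/man').
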






                                   Figure 3: R Packages tab of the tm project


Acknowledgments                                             2004.   Full book available online at http://
                                                            svnbook.red-bean.com/.
Setting up this project would not have been possi-
ble without Douglas Bates and the University of Wis-      R-Forge Administration and Development Team. R-
consin as they provided us with a server for hosting        Forge User’s Manual, 2008. URL http://download.
this platform. Furthermore, the authors would like          R-Forge.R-project.org/R-Forge.pdf.
to thank the Computer Center of the Wirtschaftsuni-
                                                          E. S. Raymond. The Cathedral and the Bazaar.
versität Wien for their support and for providing us
                                                            Knowledge, Technology, and Policy, 12(3):23–49, 1999.
with additional hardware as well as a professional
server infrastructure.                                    S. Theußl. RForgeTools: R-Forge Build and Check
                                                            Tools, 2009. URL http://R-Forge.R-project.org/
                                                            projects/site/. R package version 0.3-3.
Bibliography
P. Cederqvist et al. Version Management with CVS.         Stefan Theußl
   Network Theory Limited, Bristol, 2006.     Full        Department of Statistics and Mathematics
   book available online at http://ximbiot.com/           WU Wirtschaftsuniversität Wien
   cvs/manual/.                                           Austria
T. Copeland, R. Mas, K. McCullagh, T. Per-                stefan.theussl@wu.ac.at
  due, G. Smet, and R. Spisser.    GForge Man-
  ual, 2006. URL http://gforge.org/gf/download/           Achim Zeileis
  docmanfileversion/8/1700/gforge_manual.pdf.             Department of Statistics and Mathematics
                                                          WU Wirtschaftsuniversität Wien
C. M. Pilato, B. Collins-Sussman, and B. W. Fitz-         Austria
  patrick. Version Control with Subversion. O’Reilly,     achim.zeileis@wu.ac.at




CONTRIBUTED RESEARCH ARTICLES                                                                                                      15



Drawing Diagrams with R
by Paul Murrell                                                      of creating this sort of diagram by pushing objects
                                                                     around the screen with a mouse fills me with dread.
R provides a number of well-known high-level facil-                  Maybe I’m just not a very GUI guy.
ities for producing sophisticated statistical plots, in-                 Before we look at drawing diagrams with the core
cluding the “traditional” plots in the graphics pack-                R graphics facilities, it is important to acknowledge
age (R Development Core Team, 2008), the Trellis-                    that several contributed R packages already pro-
style plots provided by lattice (Sarkar, 2008), and the              vide facilities for drawing diagrams. The Rgraphviz
grammar-of-graphics-inspired approach of ggplot2                     (Gentry et al., 2008) and igraph (Csardi and Nepusz,
(Wickham, 2009).                                                     2006) packages provide automated layout of node-
    However, R also provides a powerful set of low-                  and-edge graphs, and the shape and diagram pack-
level graphics facilities for drawing basic shapes and,              ages (Soetaert, 2008b,a) provide functions for draw-
more importantly, for arranging those shapes relative                ing nodes of various shapes with lines and arrows
to each other, which can be used to draw a wide vari-                between them, with manual control over layout.
ety of graphical images. This article highlights some                    In this article, we will only be concerned with
of R’s low-level graphics facilities by demonstrating                drawing diagrams with a small number of elements,
their use in the production of diagrams. In particular,              so we do not need the automated layout facilities
the focus will be on some of the useful things that can              of Rgraphviz or igraph. Furthermore, while the
be done with the low-level facilities provided by the                shape and diagram packages provide flexible tools
grid graphics package (Murrell, 2002, 2005b,a).                      for building node-and-edge diagrams, the point of
                                                                     this article is to demonstrate low-level grid func-
                                                                     tions. We will use a node-and-edge diagram as the
Starting at the end                                                  motivation, but the underlying ideas can be applied
                                                                     to a much wider range of applications.
An example of the type of diagram that we are going                      In each of the following sections, we will meet a
to work towards is shown below. We have several                      basic low-level graphical tool and demonstrate how
“boxes” that describe table schema for a database,                   it can be used in the generation of the pieces of an
with lines and arrows between the boxes to show the                  overall diagram, or how the tool can be used to com-
relationships between tables.                                        bine pieces together in convenient ways.


[Diagram: five database table boxes joined by curved arrows:         Graphical primitives
book_author_table (ID, book, author), book_table (ISBN, title,
pub), publisher_table (ID, name, country), author_table (ID,         One of the core low-level facilities of R graphics is the
name, gender) and gender_table (ID, gender).]                        ability to draw basic shapes. The typical graphical
                                                                     primitives such as text, circles, lines, and rectangles
                                                                     are all available.
                                                                         In this case, the shape of each box in our dia-
To forestall some possible misunderstandings, the                    gram is not quite as simple as a rectangle because it
sort of diagram that we are talking about is one that                has rounded corners. However, a rounded rectangle
is designed by hand. This is not a diagram that has                  is also one of the graphical primitives that the grid
been automatically laid out.                                         package provides.1
    The sort of diagram being addressed is one where                     The code below draws a rounded rectangle with
the author of the diagram has a clear idea of what the               a text label in the middle.
end result will roughly look like—the sort of diagram
that can be sketched with pen and paper. The task is                 > library(grid)
to produce a pre-planned design, using a computer
to get a nice crisp result.                                          > grid.roundrect(width=.25)
    That being said, a reasonable question is “why                   > grid.text("ISBN")
not draw it by hand?”, for example, using a free-
hand drawing program such as Dia (Larsson, 2008).
The advantage of using R code to produce this sort
                                                                                                   ISBN
of image is that code is easier to reproduce, reuse,
maintain, and fine-tune with accuracy. The thought
   1 From R version 2.9.0; prior to that, a simpler rounded rectangle was available via the grid.roundRect() function in the RGraphics package.


The R Journal Vol. 1/1, May 2009                                                                                   ISSN 2073-4859
16                                                                          C ONTRIBUTED R ESEARCH A RTICLES


Viewports

A feature of the boxes in the diagram at the beginning of this article is
that the text is carefully positioned relative to the rounded rectangle;
the text is left-aligned within the rectangle. This careful positioning
requires knowing where the left edge of the rectangle is on the page.
Calculating those positions is annoyingly tricky and only becomes more
annoying if at some later point the position of the box is adjusted and
the positions of the text labels have to be calculated all over again.
    Using a grid viewport makes this sort of positioning very simple. The
basic idea is that we can create a viewport where the box is going to be
drawn and then do all of our drawing within that viewport. Positioning
text at the left edge of a viewport is very straightforward, and if we
need to shift the box, we simply shift the viewport and the text
automatically tags along for the ride. All of this applies equally to
positioning the text vertically within the box.
    In the code below, we create a viewport for the overall box, we draw a
rounded rectangle occupying the entire viewport, then we draw text 2 mm
from the left hand edge of the viewport and 1.5 lines of text up from the
bottom of the viewport. A second line of text is also added, 0.5 lines of
text from the bottom.

> pushViewport(viewport(width=.25))
> grid.roundrect()
> grid.text("ISBN",
            x=unit(2, "mm"),
            y=unit(1.5, "lines"),
            just="left")
> grid.text("title",
            x=unit(2, "mm"),
            y=unit(0.5, "lines"),
            just="left")
> popViewport()

[Output: a rounded rectangle with the labels ISBN and title left-aligned
inside it.]

Coordinate systems

The positioning of the labels within the viewport in the previous example
demonstrates another useful feature of the grid graphics system: the fact
that locations can be specified in a variety of coordinate systems or
units. In that example, the text was positioned horizontally in terms of
millimetres and vertically in terms of lines of text (which is based on
the font size in use).
    As another example of the use of these different units, we can size
the overall viewport so that it is just the right size to fit the text
labels. In the following code, the height of the viewport is based on the
number of labels and the width of the viewport is based on the width of
the largest label, plus a 2 mm gap either side. This code also simplifies
the labelling by drawing both labels in a single grid.text() call.

> labels <- c("ISBN", "title")
> vp <-
    viewport(width=max(stringWidth(labels))+
                   unit(4, "mm"),
             height=unit(length(labels),
                         "lines"))
> pushViewport(vp)
> grid.roundrect()
> grid.text(labels,
            x=unit(2, "mm"),
            y=unit(2:1 - 0.5, "lines"),
            just="left")
> popViewport()

[Output: a rounded rectangle sized to fit the labels ISBN and title.]

Clipping

Another feature of the boxes that we want to produce is that they have
shaded backgrounds. Looking closely, there are some relatively complex
shapes involved in this shading. For example, the grey background for the
“heading” of each box has a curvy top, but a flat bottom. These are not
simple rounded rectangles, but some unholy alliance of a rounded rectangle
and a normal rectangle.
    It is possible, in theory, to achieve any sort of shape with R because
there is a general polygon graphical primitive. However, as with the
positioning of the text labels, determining the exact boundary of this
polygon is not trivial and there are easier ways to work.
    In this case, we can achieve the result we want using clipping, so
that any drawing that we do is only visible on a restricted portion of the
page. R does not provide clipping to arbitrary regions, but it is possible
to set the clipping region to any rectangular region.
    The basic idea is that we will draw the complete rounded rectangle,
then set the clipping region for the box viewport so that no drawing can
occur in the last line of text in the box and then draw the rounded
rectangle again, this time with a different background. If we continue
doing this, we end up with bands of different shading.
    The following code creates an overall viewport for a box and draws a
rounded rectangle with a grey fill. The code then sets the clipping region
to start one line of text above the bottom of the viewport and draws
another rounded rectangle with a white fill. The effect is to leave just
the last line of the original
grey rounded rectangle showing beneath the white rounded rectangle that
has had its last line clipped.

> pushViewport(viewport(width=.25))
> grid.roundrect(gp=gpar(fill="grey"))
> grid.clip(y=unit(1, "lines"),
            just="bottom")
> grid.roundrect(gp=gpar(fill="white"))
> popViewport()

Drawing curves

Another basic shape that is used in the overall diagram is a nice curve
from one box to another.
    In addition to the basic functions to draw straight lines in R, there
are functions that draw curves. In particular, R provides a graphical
primitive called an X-spline (Blanc and Schlick, 1995). The idea of an
X-spline is that we define a set of control points and a curve is drawn
either through or near to the control points. Each control point has a
parameter that specifies whether to create a sharp corner at the control
point, or draw a smooth curve through the control point, or draw a smooth
curve that passes nearby.
    The following code sets up sets of three control points and draws an
X-spline relative to each set of control points. The first curve makes a
sharp corner at the middle control point, the second curve makes a smooth
corner through the middle control point, and the third curve makes a
smooth corner near the middle control point. The control points are drawn
as grey dots for reference (code not shown).

> x1 <- c(0.1, 0.2, 0.2)
> y1 <- c(0.2, 0.2, 0.8)
> grid.xspline(x1, y1)
> x2 <- c(0.4, 0.5, 0.5)
> y2 <- c(0.2, 0.2, 0.8)
> grid.xspline(x2, y2, shape=-1)
> x3 <- c(0.7, 0.8, 0.8)
> y3 <- c(0.2, 0.2, 0.8)
> grid.xspline(x3, y3, shape=1)

[Output: three X-splines, each shown with its three control points as
grey dots.]

    Determining where to place the control points for a curve between two
boxes is another one of those annoying calculations, so a more convenient
option is provided by a curve graphical primitive in grid. The idea of
this primitive is that we simply specify the start and end points of the
curve and R figures out a set of reasonable control points to produce an
appropriate X-spline. It is also straightforward to add an arrow to either
end of any straight or curvy line that R draws.
    The following code draws three curves between pairs of end points. The
first curve draws the default “city-block” line between end points, with a
smooth corner at the turning point, the second curve is similar, but with
an extra corner added, and the third curve draws a single wide, smooth
corner that is distorted towards the end point. The third curve also has
an arrow at the end.

> x1a <- 0.1; x1b <- 0.2
> y1a <- 0.2; y1b <- 0.8
> grid.curve(x1a, y1a, x1b, y1b)
> x2a <- 0.4; x2b <- 0.5
> y2a <- 0.2; y2b <- 0.8
> grid.curve(x2a, y2a, x2b, y2b,
             inflect=TRUE)
> x3a <- 0.7; x3b <- 0.8
> y3a <- 0.2; y3b <- 0.8
> grid.curve(x3a, y3a, x3b, y3b,
             ncp=8, angle=135,
             square=FALSE,
             curvature=2,
             arrow=arrow(angle=15))

[Output: three curves, each shown with its two end points as dots.]

Graphical functions

Graphical primitives, viewports, coordinate systems, and clipping, as
described so far, can be used to produce a box of the style shown in the
diagram at the start of the article. For example, the following code
produces a box containing three labels, with background shading to assist
in differentiating among the labels.

> labels <- c("ISBN", "title", "pub")
> vp <-
    viewport(width=max(stringWidth(labels))+
                   unit(4, "mm"),
             height=unit(length(labels),
                         "lines"))
> pushViewport(vp)
> grid.roundrect()
> grid.clip(y=unit(1, "lines"),
            just="bottom")
> grid.roundrect(gp=gpar(fill="grey"))
> grid.clip(y=unit(2, "lines"),
            just="bottom")
> grid.roundrect(gp=gpar(fill="white"))
> grid.clip()
> grid.text(labels,
            x=unit(rep(2, 3), "mm"),
            y=unit(3:1 - .5, "lines"),
            just="left")
> popViewport()

[Output: a box containing the labels ISBN, title and pub, with shaded
bands.]

    However, in the sort of diagram that we want to produce, there will be
several such boxes. Rather than write separate code for each box, it makes
sense to write a general function that will work for any set of labels.
Such a function is shown in Figure 1 and the code below uses this function
to draw two boxes side by side.

> tableBox(c("ISBN", "title", "pub"),
           x=0.3)
> tableBox(c("ID", "name", "country"),
           x=0.7)

[Output: two boxes side by side, one containing ISBN, title and pub, the
other containing ID, name and country.]

    This function represents the simplest way to efficiently reuse
graphics code and to provide graphics code for others to use. However,
there are benefits to be gained from going beyond this procedural
programming style to a slightly more complicated object-oriented approach.

Graphical objects

In order to achieve the complete diagram introduced at the start of this
article, we need one more step: we need to draw lines and arrows from one
box to another. We already know how to draw lines and curves between two
points; the main difficulty that remains is calculating the exact position
of the start and end points, because these locations depend on the
locations and dimensions of the boxes. The calculations could be done by
hand for each individual curve, but as we have seen before, there are
easier ways to work. The crucial idea for this step is that we want not
just to create a graphical function that encapsulates how to draw a box,
but to define a graphical object that encapsulates information about a
box.
    The code in Figure 2 defines such a graphical object, plus a few other
things that we will get to shortly. The first thing to concentrate on is
the boxGrob() function. This function creates a "box" graphical object. In
order to do this, all it has to do is call the grob() function and supply
all of the information that we want to record about "box" objects. In this
case, we just record the labels to be drawn within the box and the
location where we want to draw the box.
    This function does not draw anything. For example, the following code
creates two "box" objects, but produces no graphical output whatsoever.

> box1 <- boxGrob(c("ISBN", "title",
                    "pub"), x=0.3)
> box2 <- boxGrob(c("ID", "name",
                    "country"), x=0.7)

    The grid.draw() function can be used to draw any graphical object, but
we need to supply the details of how "box" objects get drawn. This is the
purpose of the second function in Figure 2. This function is a method for
the drawDetails() function; it says how to draw "box" objects. In this
case, the function is very simple because it can call the tableBox()
function that we defined in Figure 1. The important detail is that the
boxGrob() function specified a special class, cl="box", for "box" objects,
which meant that we could define a drawDetails() method specifically for
this sort of object and control what gets drawn.
    With this drawDetails() method defined, we can draw the boxes that we
created earlier by calling the grid.draw() function. This function will
draw any grid graphical object by calling the appropriate method for the
drawDetails() generic function (among other things). The following code
calls grid.draw() to draw the two boxes.

> grid.draw(box1)
> grid.draw(box2)

[Output: the same two boxes side by side, one containing ISBN, title and
pub, the other containing ID, name and country.]

    At this point, we appear to have achieved only a more complicated
equivalent of the previous graphics function. However, there are a number
of other functions that can do useful things with grid graphical objects.
For example, the grobX() and grobY() functions can be used to calculate
locations on the boundary of a graphical object. As with grid.draw(),
which has to call drawDetails() to find out how to draw a particular class
of graphical object, these functions call generic functions to find out
how to calculate locations on the boundary for a particular class of
object. The generic functions are called xDetails() and yDetails() and
methods for our special "box" class are defined in the last two functions
in Figure 2.
    These methods work by passing the buck. They both create a rounded
rectangle at the correct location and the right size for the box, then
call grobX() (or grobY()) to determine a location on the boundary of the
rounded rectangle. In other words, they rely on code within the grid
package that already exists to calculate the boundary of rounded
rectangles.
    With these methods defined, we are now in a position to draw a curved
line between our boxes. The key idea is that we can use grobX() and
grobY() to specify a start and end point for the curve. For example, we
can start the curve at the right hand edge of

> tableBox <- function(labels, x=.5, y=.5) {
      nlabel <- length(labels)
      tablevp <-
          viewport(x=x, y=y,
                   width=max(stringWidth(labels)) +
                         unit(4, "mm"),
                   height=unit(nlabel, "lines"))
      pushViewport(tablevp)
      grid.roundrect()
      if (nlabel > 1) {
          for (i in 1:(nlabel - 1)) {
              fill <- c("white", "grey")[i %% 2 + 1]
              grid.clip(y=unit(i, "lines"), just="bottom")
              grid.roundrect(gp=gpar(fill=fill))
          }
      }
      grid.clip()
      grid.text(labels,
                x=unit(2, "mm"), y=unit(nlabel:1 - .5, "lines"),
                just="left")
      popViewport()
  }

Figure 1: A function to draw a diagram box, for a given set of labels, centred at the specified (x, y) location.
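
The alternating bands drawn by tableBox() come from the index expression c("white", "grey")[i %% 2 + 1]. As a small sketch of that logic only (the helper name bandFills() is hypothetical and is not part of grid or of the code in Figure 1), the following computes the fill chosen at each clipping step, working from the bottom of the box upwards, without drawing anything.

```r
# Hypothetical helper, for illustration only: the fill colour that
# tableBox() selects at each clipping step i = 1, ..., nlabel - 1.
bandFills <- function(nlabel) {
    # A single label needs no clipping steps at all, which is why
    # tableBox() guards its loop with 'if (nlabel > 1)'.
    if (nlabel <= 1) return(character(0))
    i <- seq_len(nlabel - 1)
    c("white", "grey")[i %% 2 + 1]
}

# Three labels give two clipping steps: everything above the bottom
# line is overdrawn in grey, then everything above the second line is
# overdrawn in white.
fills <- bandFills(3)
print(fills)
```

The bottom line of the box is never overdrawn, so it keeps whatever fill the first grid.roundrect() call used.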




> boxGrob <- function(labels, x=.5, y=.5) {
      grob(labels=labels, x=x, y=y, cl="box")
  }
> drawDetails.box <- function(x, ...) {
      tableBox(x$labels, x$x, x$y)
  }
> xDetails.box <- function(x, theta) {
      nlines <- length(x$labels)
      height <- unit(nlines, "lines")
      width <- unit(4, "mm") + max(stringWidth(x$labels))
      grobX(roundrectGrob(x=x$x, y=x$y, width=width, height=height),
            theta)
  }
> yDetails.box <- function(x, theta) {
      nlines <- length(x$labels)
      height <- unit(nlines, "lines")
      width <- unit(4, "mm") + max(stringWidth(x$labels))
      grobY(roundrectGrob(x=x$x, y=x$y, width=width, height=height),
            theta)
  }

Figure 2: Some functions that define a graphical object representing a diagram box. The boxGrob() function
constructs a "box" object, the drawDetails() method describes how to draw a "box" object, and the xDetails()
and yDetails() functions calculate locations on the boundary of a "box" object.
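
Because "box" objects can be passed to grobX() and grobY(), the repetitive parts of a curve between two boxes can be wrapped in a small helper. The following is a sketch under stated assumptions rather than code from the article: boxCurveGrob() and its fromLine and toLine arguments are hypothetical names introduced here, and label lines are counted up from the bottom of each box. The helper only builds a description of the curve with curveGrob(); nothing is drawn.

```r
library(grid)

# Constructor copied from Figure 2 so that this sketch runs on its own.
boxGrob <- function(labels, x=.5, y=.5) {
    grob(labels=labels, x=x, y=y, cl="box")
}

# Hypothetical wrapper (not part of grid): describe a two-corner curve
# from the right edge of 'from' to the left edge of 'to'. 'fromLine'
# and 'toLine' count label lines up from the bottom of each box.
boxCurveGrob <- function(from, to, fromLine, toLine, ...) {
    curveGrob(grobX(from, "east"),
              grobY(from, "south") + unit(fromLine - 0.5, "lines"),
              grobX(to, "west"),
              grobY(to, "south") + unit(toLine - 0.5, "lines"),
              inflect=TRUE,
              arrow=arrow(type="closed", angle=15,
                          length=unit(2, "mm")),
              gp=gpar(fill="black"), ...)
}

box1 <- boxGrob(c("ISBN", "title", "pub"), x=0.3)
box2 <- boxGrob(c("ID", "name", "country"), x=0.7)

# "pub" is the first line from the bottom of box1; "ID" is the third
# line from the bottom of box2.
link <- boxCurveGrob(box1, box2, fromLine=1, toLine=3)
```

The boundary locations are stored as unevaluated units, so they are only resolved at drawing time. Drawing the result with grid.draw(link) therefore additionally requires the xDetails.box() and yDetails.box() methods from Figure 2 to be defined.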




box1 by specifying grobX(box1, "east"). The vertical position is slightly
trickier because we do not want the line starting at the top or bottom of
the box, but we can simply add or subtract the appropriate number of lines
of text to get the right spot.
    The following code uses these ideas to draw a curve from the pub label
of box1 to the ID label of box2. The curve has two corners (inflect=TRUE)
and it has a small arrow at the end.
    This call to grid.curve() is relatively verbose, but in a diagram
containing many similar curves, this burden can be significantly reduced
by writing a simple function that hides away the common features, such as
the specification of the arrow head.
    The major gain from this object-oriented approach is that the start
and end points of this curve are described by simple expressions that will
automatically update if the locations of the boxes are modified.

> grid.curve(grobX(box1, "east"),
             grobY(box1, "south") +
               unit(0.5, "lines"),
             grobX(box2, "west"),
             grobY(box2, "north") -
               unit(0.5, "lines"),
             inflect=TRUE,
             arrow=
               arrow(type="closed",
                  angle=15,
                  length=unit(2, "mm")),
             gp=gpar(fill="black"))

[Output: the two boxes, with a curved arrow from the pub label of the
first box to the ID label of the second.]

Conclusion

This article has demonstrated a number of useful low-level graphical
facilities in R with an example of how they can be combined to produce a
diagram consisting of non-trivial nodes with smooth curves between them.
    The code examples provided in this article have ignored some details
in order to keep things simple. For example, there are no checks that the
arguments have sensible values in the functions tableBox() and boxGrob().
However, for creating one-off diagrams, this level of detail is not
necessary anyway.
    One detail that would be encountered quite quickly in practice, in
this particular sort of diagram, is that a curve from one box to another
that needs to go across-and-down rather than across-and-up would require
the addition of curvature=-1 to the grid.curve() call. Another thing that
is missing is complete code to produce the example diagram from the
beginning of this article, where there are five interconnected boxes and
the boxes have some additional features, such as a distinct “header” line
at the top. This complete code was excluded to save on space, but a simple
R package is provided at http://www.stat.auckland.ac.nz/~paul/ with code
to draw that complete diagram and the package also contains a more
complete implementation of code to create and draw "box" graphical
objects.
    One final point is that using R graphics to draw diagrams like this is
not fast. In keeping with the S tradition, the emphasis is on developing
code quickly and on having code that is not a complete nightmare to
maintain. In this case particularly, the speed of developing a diagram
comes at the expense of the time taken to draw the diagram. For small,
one-off diagrams this is not likely to be an issue, but the approach
described in this article would not be appropriate, for example, for
drawing a node-and-edge graph of the Internet.

Bibliography

C. Blanc and C. Schlick. X-splines: a spline model designed for the
  end-user. In SIGGRAPH ’95: Proceedings of the 22nd annual conference on
  Computer graphics and interactive techniques, pages 377–386, New York,
  NY, USA, 1995. ACM. ISBN 0-89791-701-4.
  doi: http://doi.acm.org/10.1145/218380.218488.

G. Csardi and T. Nepusz. The igraph software package for complex network
  research. InterJournal, Complex Systems:1695, 2006.
  URL http://igraph.sf.net.

J. Gentry, L. Long, R. Gentleman, S. Falcon, F. Hahne, and D. Sarkar.
  Rgraphviz: Provides plotting capabilities for R graph objects, 2008.
  R package version 1.18.1.

A. Larsson. Dia, 2008. http://www.gnome.org/projects/dia/.

P. Murrell. The grid graphics package. R News, 2(2):14–19, June 2002.
  URL http://CRAN.R-project.org/doc/Rnews/.

P. Murrell. Recent changes in grid graphics. R News, 5(1):12–20, May
  2005a. URL http://CRAN.R-project.org/doc/Rnews/.

P. Murrell. R Graphics. Chapman & Hall/CRC, 2005b.
  URL http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html.
  ISBN 1-584-88486-X.

R Development Core Team. R: A Language and Environment for Statistical
  Computing. R Foundation for Statistical Computing, Vienna, Austria,
  2008. URL http://www.R-project.org. ISBN 3-900051-07-0.

D. Sarkar. Lattice: Multivariate Data Visualization with R. Springer,
  2008. URL http://amazon.com/o/ASIN/0387759689/. ISBN 9780387759685.

K. Soetaert. diagram: Functions for visualising simple graphs (networks),
  plotting flow diagrams, 2008a. R package version 1.2.

K. Soetaert. shape: Functions for plotting graphical shapes, colors,
  2008b. R package version 1.2.

H. Wickham. ggplot2. Springer, 2009. http://had.co.nz/ggplot2/.

Paul Murrell
Department of Statistics
The University of Auckland
New Zealand
paul@stat.auckland.ac.nz





The hwriter Package
Composing HTML documents with R objects

by Gregoire Pau and Wolfgang Huber

Introduction
HTML documents are structured documents made of diverse elements such as paragraphs, sections, columns, figures and tables organized in a hierarchical layout. Combining HTML documents with hyperlinks is a useful way to report analysis results; examples include the package arrayQualityMetrics (Kauffmann et al., 2009), which estimates the quality of microarray data sets, and cellHTS2 (Boutros et al., 2006), which performs the analysis of cell-based screens.

There are several tools for exporting data from R into HTML documents. The package R2HTML is able to render a large diversity of R objects in HTML but does not easily support combining them in a structured layout and has a complex syntax. On the other hand, the package xtable can render R matrices with simple commands but cannot combine HTML elements and has limited formatting options.

The package hwriter allows rendering R objects in HTML and combining the resulting elements in a structured layout. It uses a simple syntax, supports extensive formatting options and takes full advantage of the ellipsis ’...’ argument and R vector recycling rules.

Comprehensive documentation and examples of hwriter are generated by running the command example(hwriter), which creates the package web page http://www.ebi.ac.uk/~gpau/hwriter.

The function hwrite
The generic function hwrite is the core function of the package hwriter. hwrite(x, page=NULL, ...) renders the object x in HTML using the formatting arguments specified in ’...’ and returns a character vector containing the HTML element code of x.

If page is a filename, the HTML element is written to this file. If it is an R connection/file, the element is appended to this connection, allowing the sequential building of HTML documents. If page is NULL, the returned HTML element code can be concatenated with other elements with paste or nested inside other elements through subsequent hwrite calls. These operations are the tree structure equivalents of adding a sibling node and adding a child node in the document tree.

Formatting objects
The most basic call of hwrite is to output a character vector into an HTML document.

> hwrite('Hello world !', 'doc.html')

    Hello world !

Character strings can be rendered with a variety of formatting options. The argument link adds a hyperlink to the HTML element pointing to an external document. The argument style specifies an inline Cascading Style Sheets (CSS) style (color, alignment, margins, font, ...) to render the element. Other arguments allow a finer rendering control.

> hwrite('Hello world !', 'doc.html',
  link='http://cran.r-project.org/')

    Hello world !

> hwrite('Hello world !', 'doc.html',
  style='color: red; font-size: 20pt;
  font-family: Gill Sans Ultra Bold')

    Hello world !

Images can be included using hwriteImage, which supports diverse formatting options.

> img=system.file('images', c('iris1.jpg',
  'iris3.jpg'), package='hwriter')
> hwriteImage(img[2], 'doc.html')

Arguments recycling
Objects written with formatting arguments containing more than one element are recycled by hwrite, producing a vector of HTML elements.

> hwrite('test', 'doc.html', style=paste(
  'color:', c('peru', 'purple', 'green')))

    test test test
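
The recycling of formatting arguments and the sibling/child composition described in this article can be mimicked in base R, which may help to see why a single string combined with three style values yields three elements. The sketch below does not use hwriter: the helper tag() is hypothetical, and the markup it emits is an assumption chosen for illustration, not the HTML that hwrite actually generates.

```r
# Hypothetical helper, NOT part of hwriter and not its implementation:
# wrap text in an HTML tag with an optional inline CSS style.
# paste0() is vectorised, so arguments of different lengths are recycled,
# in the same spirit as hwrite's formatting arguments.
tag <- function(x, name = 'span', style = NULL) {
  open <- if (is.null(style)) {
    paste0('<', name, '>')
  } else {
    paste0('<', name, ' style="', style, '">')
  }
  paste0(open, x, '</', name, '>')
}

# One string, three style values: the string is recycled three times.
styles <- paste('color:', c('peru', 'purple', 'green'))
elements <- tag('test', style = styles)   # character vector of length 3

# Sibling nodes: concatenate element strings with paste().
siblings <- paste(elements, collapse = ' ')

# Child node: nest the concatenated elements inside a parent element.
parent <- tag(siblings, name = 'div')
cat(parent, '\n')
```

With hwriter itself, the corresponding steps would be calls to hwrite with page=NULL, whose return values are combined with paste and passed to a further hwrite call.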


  • 2. The R Journal is a peer-reviewed publication of the R Foundation for Statistical Computing. Communications regarding this publication should be addressed to the editors. All articles are copyrighted by the respective authors. Please send submissions to regular columns to the respective column editor and all other submissions to the editor-in-chief or another member of the editorial board. More detailed submission instructions can be found on the Journal’s homepage.

    Editor-in-Chief: Vince Carey, Channing Laboratory, Brigham and Women’s Hospital, 75 Francis St., Boston, MA 02115, USA
    Editorial Board: John Fox, Heather Turner, and Peter Dalgaard.
    Editor Programmer’s Niche: Bill Venables
    Editor Help Desk: Uwe Ligges
    R Journal Homepage: http://journal.r-project.org/
    Email of editors and editorial board: firstname.lastname@R-project.org

    The R Journal Vol. 1/1, May 2009 ISSN 2073-4859
  • 3. Editorial
    by Vince Carey

    Early in the morning of January 7, 2009 (very early, from the perspective of readers in Europe and the Americas, as I was in Almaty, Kazakhstan), I noticed a somewhat surprising title on the front web page of the New York Times: “Data Analysts Captivated by R’s Power”. My prior probability that a favorite programming language would find its way to the Times front page was rather low, so I navigated to the article to check. There it was: an early version, minus the photographs. Later a few nice pictures were added, and a Bits Blog http://bits.blogs.nytimes.com/2009/01/08/r-you-ready-for-r/ created. As of early May, the blog includes 43 entries received from January 8 to March 12 and includes comment from Trevor Hastie clarifying historical roots of R in S, from Alan Zaslavsky reminding readers of John Chambers’s ACM award and its role in funding an American Statistical Association prize for students’ software development, and from numerous others on aspects of various proprietary and non-proprietary solutions to data analytic computing problems. Somewhat more recently, the Times blog was back in the fold, describing a SAS-to-R interface (February 16). As a native New Yorker, it is natural for me to be aware, and glad, of the Times’ coverage of R; consumers of other mainstream media products are invited to notify me of comparable coverage events elsewhere.

    As R’s impact continues to expand, R News has transitioned to The R Journal. You are now reading the first issue of this new periodical, which continues the tradition of the newsletter in various respects: articles are peer-reviewed, copyright of articles is retained by authors, and the journal is published electronically through an archive of freely downloadable PDF issues, easily found on the R project homepage and on CRAN. My role as Editor-In-Chief of this first issue of the new Journal is purely accidental; the transition has been in development for several years now, with critical guidance and support from the R Foundation, from a number of R Core members, and from previous editors. I am particularly thankful for John Fox’s assistance with this transition, and we are all indebted to co-editors Peter Dalgaard and Heather Turner for technical work on the new Journal’s computational infrastructure, undertaken on top of already substantial commitments of editorial effort.

    While contemplating the impending step into a new publishing vehicle, John Fox conceived a series of special invited articles addressing views of “The Future of R”. This issue includes two articles in this projected series. In the first, John Chambers analyzes R into facets that help us to understand the ‘richness’ and ‘messiness’ of R’s programming model. In the second, Stefan Theußl and Achim Zeileis describe R-Forge, a new infrastructure for contributing and managing source code of packages destined for CRAN. A number of other future-oriented articles have been promised for this series; we expect to issue a special edition of the Journal collecting these when all have arrived.

    The peer-reviewed contributions section includes material on programming for diagram construction, HTML generation, probability model elicitation, integration of the multivariate normal density, Hilbert spectrum analysis, microarray study design, parallel computing support, and the predictive modeling markup language. Production of this issue involved voluntary efforts of 22 referees on three continents, who will be acknowledged in the year-end issue.

    Vince Carey
    Channing Laboratory, Brigham and Women’s Hospital
    Vince.Carey@R-project.org
  • 5. Invited Section: The Future of R

    Facets of R
    Special invited paper on “The Future of R”
    by John M. Chambers

    We are seeing today a widespread, and welcome, tendency for non-computer-specialists among statisticians and others to write collections of R functions that organize and communicate their work. Along with the flood of software sometimes comes an attitude that one need only learn, or teach, a sort of basic how-to-write-the-function level of R programming, beyond which most of the detail is unimportant or can be absorbed without much discussion. As delusions go, this one is not very objectionable if it encourages participation. Nevertheless, a delusion it is. In fact, functions are only one of a variety of important facets that R has acquired by intent or circumstance during the three-plus decades of the history of the software and of its predecessor S. To create valuable and trustworthy software using R often requires an understanding of some of these facets and their interrelations. This paper identifies six facets, discussing where they came from, how they support or conflict with each other, and what implications they have for the future of programming with R.

    Facets

    Any software system that has endured and retained a reasonably happy user community will likely have some distinguishing characteristics that form its style. The characteristics of different systems are usually not totally unrelated, at least among systems serving roughly similar goals; in computing as in other fields, the set of truly distinct concepts is not that large. But in the mix of different characteristics and in the details of how they work together lies much of the flavor of a particular system.

    Understanding such characteristics, including a bit of the historical background, can be helpful in making better use of the software. It can also guide thinking about directions for future improvements of the system itself.

    The R software and the S software that preceded it reflect a rather large range of characteristics, resulting in software that might be termed rich or messy according to one’s taste. Since R has become a very widely used environment for applications and research in data analysis, understanding something of these characteristics may be helpful to the community.

    This paper considers six characteristics, which we will call facets. They characterize R as:

      1. an interface to computational procedures of many kinds;
      2. interactive, hands-on in real time;
      3. functional in its model of programming;
      4. object-oriented, “everything is an object”;
      5. modular, built from standardized pieces; and,
      6. collaborative, a world-wide, open-source effort.

    None of these facets is original to R, but while many systems emphasize one or two of them, R continues to reflect them all, resulting in a programming model that is indeed rich but sometimes messy.

    The thousands of R packages available from CRAN, BioConductor, R-Forge, and other repositories, the uncounted other software contributions from groups and individuals, and the many citations in the scientific literature all testify to the useful computations created within the R model. Understanding how the various facets arose and how they can work together may help us improve and extend that software.

    We introduce the facets more or less chronologically, in three pairs that entered into the software during successive intervals of roughly a decade. To provide some context, here is a brief chronology, with some references. The S project began in our statistics research group at Bell Labs in 1976, evolved into a generally licensed system through the 1980s and continues in the S+ software, currently owned by TIBCO Software Inc. The history of S is summarized in the Appendix to Chambers (2008); the standard books introducing still-relevant versions of S include Becker et al. (1988) (no longer in print), Chambers and Hastie (1992), and Chambers (1998). R was announced to the world in Ihaka and Gentleman (1996) and evolved fairly soon to be developed and managed by the R-core group and other contributors. Its history is harder to pin down, partly because the story is very much still happening and partly because, as a collaborative open-source project, individual responsibility is sometimes hard to attribute; in addition, not all the important new ideas have been described separately by their authors. The http://www.r-project.org site points to the on-line manuals, as well as a list of over 70 related books and other material. The manuals are invaluable as a technical resource, but short on background information. Articles in R News, the predecessor of this journal, give descriptions of some of the key contributions, such as namespaces (Tierney, 2003), and internationalization (Ripley, 2005).

    An interactive interface

    The goals for the first version of S in 1976 already combined two facets of computing usually kept
  • 6. apart, interactive computing and the development of procedural software for scientific applications.

    One goal was a better interface to new computational procedures for scientific data analysis, usually implemented then as Fortran subroutines. Bell Labs statistics research had collected and written a variety of these while at the same time numerical libraries and published algorithm collections were establishing many of the essential computational techniques. These procedures were the essence of the actual data analysis; the initial goal was essentially a better interface to procedural computation. A graphic from the first plans for S (Chambers, 2008, Appendix, page 476) illustrated this.

    But that interface was to be interactive, specifically via a language used by the data analyst communicating in real time with the software. At this time, a few systems had pointed the way for interactive scientific computing, but most of them were highly specialized with limited programming facilities; that is, a limited ability for the user to express novel computations. The most striking exception was APL, created by Kenneth Iverson (1962), an operator-based interactive language with heavy emphasis on general multiway arrays. APL introduced conventions that now seem obvious; for example, the result of evaluating a user’s expression was either an assignment or a printed display of the computed object (rather than expecting users to type a print command). But APL at the time had no notion of an interface to procedures, which along with a limited, if exotic, syntax caused us to reject it, although S adopted a number of features from it, such as its approach to general multidimensional arrays.

    From its first implementation, the S software combined these two facets: users carried out data analysis interactively by writing expressions in the S language, but much of the programming to extend the system was via new Fortran subroutines accompanied by interfaces to call them from S. The interfaces were written in a special language that was compiled into Fortran.

    The result was already a mixed system that had much of the tension between human and machine efficiency that still stimulates debate among R users and programmers. At the same time, the flexibility from that same mixture helped the software to grow and adapt with a growing user community.

    Later versions of the system replaced the interface language with individual functions providing interfaces to C or Fortran, but the combination of an interactive language and an interface to compiled procedures remains a key feature of R. Increasingly, interfaces to other languages and systems have been added as well, for example to database systems, spreadsheets and user interface software.

    Functional and object-oriented programming

    During the 1980s the topics of functional programming and object-oriented programming stimulated growing interest in the computer science literature. At the same time, I was exploring possibilities for the next S, perhaps “after S”. These ideas were eventually blended with other work to produce the “new S”, later called Version 3 of S, or S3 (Becker et al., 1988).

    As before, user expressions were still interpreted, but were now parsed into objects representing the language elements, mainly function calls. The functions were also objects, the result of parsing expressions in the language that defined the function. As the motto expressed it, “Everything is an object”. The procedural interface facet was not abandoned, however, but from the user’s perspective it was now defined by functions that provided an interface to a C or Fortran subroutine.

    The new programming model was functional in two senses. It largely followed the functional programming paradigm, which defined programming as the evaluation of function calls, with the value uniquely determined by the objects supplied as arguments and with no external side-effects. Not all functions followed this strictly, however.

    The new version was functional also in the sense that nearly everything that happened in evaluation was a function call. Evaluation in R is even more functional in that language components such as assignment and loops are formally function calls internally, reflecting in part the influence of Lisp on the initial implementation of R.

    Subsequently, the programming model was extended to include classes of objects and methods that specialized functions according to the classes of their arguments. That extension itself came in two stages. First, a thin layer of additional code implemented classes and methods by two simple mechanisms: objects could have an attribute defining their class as a character string or a vector of multiple strings; and an explicitly invoked method dispatch function would match the class of the function’s first argument against methods, which were simply saved function objects with a class string appended to the object’s name. In the second stage, the version of the system known as S4 provided formal class definitions, stored as a special metadata object, and similarly formal method definitions associated with generic functions, also defined via object classes (Chambers, 2008, Chapters 9 and 10, for the R version).

    The functional and object-oriented facets were combined in S and R, but they appear more frequently in separate, incompatible languages. Typical object-oriented languages such as C++ or Java are not functional; instead, their programming model is of methods associated with a class rather than with a
  • 7. function and invoked on an object. Because the object is treated as a reference, the methods can usually modify the object, again quite a different model from that leading to R. The functional use of methods is more complicated, but richer and indeed necessary to support the essential role of functions. A few other languages do combine functional structure with methods, such as Dylan (Shalit, 1996, chapters 5 and 6), and comparisons have proved useful for R, but came after the initial design was in place.

    Modular design and collaborative support

    Our third pair of facets arrives with R. The paper by Ihaka and Gentleman (1996) introduced a statistical software system, characterized later as “not unlike S”, but distinguished by some ideas intended to improve on the earlier system. A facet of R related to some of those ideas and to important later developments is its approach to modularity. In a number of fields, including architecture and computer science, modules are units designed to be useful in themselves and to fit together easily to create a desired result, such as a building or a software system. R has two very important levels of modules: functions and packages.

    Functions are an obvious modular unit, especially functions as objects. R extended the modularity of functions by providing them with an environment, usually the environment in which the function object was created. Objects assigned in this environment can be accessed by name in a call to the function, and even modified, by stepping outside the strict functional programming model. Functions sharing an environment can thus be used together as a unit. The programming technique resulting is usually called programming with closures in R; for a discussion and comparison with other techniques, see Chambers (2008, section 5.4). Because the evaluator searches for external objects in the function’s environment, dependencies on external objects can be controlled through that environment. The namespace mechanism (Tierney, 2003), an important contribution to trustworthy programming with R, uses this technique to help ensure consistent behavior of functions in a package.

    The larger module introduced by R is probably the most important for the system’s growing use, the package. As it has evolved, and continues to evolve, an R package is a module combining in most cases R functions, their documentation, and possibly data objects and/or code in other languages.

    While S always had the idea of collections of software for distribution, somewhat formalized in S4 as chapters, the R package greatly extends and improves the ability to share computing results as effective modules. The mechanism benefits from some essential tools provided in R to create, develop, test, and distribute packages. Among many benefits for users, these tools help programmers create software that can be installed and used on the three main operating systems for computing with data: Windows, Linux, and Mac OS X.

    The sixth facet for our discussion is that R is a collaborative enterprise, which a large group of people share in and contribute to. Central to the collaborative facet is, of course, that R is a freely available, open-source system, both the core R software and most of the packages built upon it. As most readers of this journal will be aware, the combination of the collaborative efforts and the package mechanism has enormously increased the availability of techniques for data analysis, especially the results of new research in statistics and its applications. The package management tools, in fact, illustrate the essential role of collaboration. Another highly collaborative contribution has facilitated use of R internationally (Ripley, 2005), both by the use of locales in computations and by the contribution of translations for R documentation and messages.

    While the collaborative facet of R is the least technical, it may eventually have the most profound implications. The growth of the collaborative community involved is unprecedented. For example, a plot in Fox (2008) shows that the number of packages in the CRAN repository exhibited literal exponential growth from 2001 to 2008. The current size of this and other repositories implies at a conservative estimate that several hundreds of contributors have mastered quite a bit of technical expertise and satisfied some considerable formal requirements to offer their efforts freely to the worldwide scientific community.

    This facet does have some technical implications. One is that, being an open-source system, R is generally able to incorporate other open-source software and does indeed do so in many areas. In contrast, proprietary systems are more likely to build extensions internally, either to maintain competitive advantage or to avoid the free distribution requirements of some open-source licenses. Looking for quality open-source software to extend the system should continue to be a favored strategy for R development.

    On the downside, a large collaborative enterprise with a general practice of making collective decisions has a natural tendency towards conservatism. Radical changes do threaten to break currently working features. The future benefits they might bring will often not be sufficiently persuasive. The very success of R, combined with its collaborative facet, poses a challenge to cultivate the “next big step” in software for data analysis.
  • 8. Implications

    A number of specific implications of the facets of R, and of their interaction, have been noted in previous sections. Further implications are suggested in considering the current state of R and its future possibilities.

    Many reasons can be suggested for R’s increasing popularity. Certainly part of the picture is a productive interaction among the interface facet, the modularity of packages, and the collaborative, open-source approach, all in an interactive mode. From the very beginning, S was not viewed as a set of procedures (the model for SAS, for example) but as an interactive system to support the implementation of new procedures. The growth in packages noted previously reflects the same view.

    The emphasis on interfaces has never diminished and the range of possible procedures across the interface has expanded greatly. That R’s growth has been especially strong in academic and other research environments is at least partly due to the ability to write interfaces to existing and new software of many forms. The evolution of the package as an effective module and of the repositories for contributed software from the community has resulted in many new interfaces.

    The facets also have implications for the possible future of R. The hope is that R can continue to support the implementation and communication of important techniques for data analysis, contributed and used by a growing community. At the same time, those interested in new directions for statistical computing will hope that R or some future alternative engages a variety of challenges for computing with data. The collaborative community seems the only likely generator of the new features required but, as noted in the previous section, combining the new with maintenance of a popular current collaborative system will not be easy.

    The encouragement for new packages generally and for interfaces to other software in particular is relevant for this question also. Changes that might be difficult if applied internally to the core R implementation can often be supplied, or at least approximated, by packages. Two examples of enhancements considered in discussions are an optional object model using mutable references and computations using multiple threads. In both cases, it can be argued that adding the facility to R could extend its applicability and performance. But a substantial programming effort and knowledge of the current implementation would be required to modify the internal object model or to make the core code thread-safe. Issues of back compatibility are likely to arise as well. Meanwhile, less ambitious approaches in the form of packages that approximate the desired behavior have been written. For the second example especially, these may interface to other software designed to provide this facility.

    Opinions may reasonably differ on whether these “workarounds” are a good thing or not. One may feel that the total effort devoted to multiple approximate solutions would have been more productive if coordinated and applied to improving the core implementation. In practice, however, the nature of the collaborative effort supporting R makes it much easier to obtain a modest effort from one or a few individuals than to organize and execute a larger project. Perhaps the growth of R will encourage the community to push for such projects, with a new view of funding and organization. Meanwhile, we can hope that the growing communication facilities in the community will assess and improve on the existing efforts. Particularly helpful facilities in this context include the CRAN task views (http://cran.r-project.org/web/views/), the mailing lists and publications such as this journal.

    Bibliography

    R. A. Becker, J. M. Chambers, and A. R. Wilks. The New S Language. Chapman & Hall, London, 1988.

    J. M. Chambers. Programming with Data. Springer, New York, 1998. URL http://cm.bell-labs.com/cm/ms/departments/sia/Sbook/. ISBN 0-387-98503-4.

    J. M. Chambers. Software for Data Analysis: Programming with R. Springer, New York, 2008. ISBN 978-0-387-75935-7.

    J. M. Chambers and T. J. Hastie. Statistical Models in S. Chapman & Hall, London, 1992. ISBN 9780412830402.

    J. Fox. Editorial. R News, 8(2):1–2, October 2008. URL http://CRAN.R-project.org/doc/Rnews/.

    R. Ihaka and R. Gentleman. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5:299–314, 1996.

    K. E. Iverson. A Programming Language. Wiley, 1962.

    B. D. Ripley. Internationalization features of R 2.1.0. R News, 5(1):2–7, May 2005. URL http://CRAN.R-project.org/doc/Rnews/.

    A. Shalit. The Dylan Reference Manual. Addison-Wesley Developers Press, Reading, Mass., 1996. ISBN 0-201-44211-6.

    L. Tierney. Name space management for R. R News, 3(1):2–6, June 2003. URL http://CRAN.R-project.org/doc/Rnews/.

    John M. Chambers
    Department of Statistics
    Stanford University
    USA
    jmc@r-project.org
  • 9. Collaborative Software Development Using R-Forge
    Special invited paper on “The Future of R”
    by Stefan Theußl and Achim Zeileis

    Introduction

    Open source software (OSS) is typically created in a decentralized self-organizing process by a community of developers having the same or similar interests (see the famous essay by Raymond, 1999). A key factor for the success of OSS over the last two decades is the Internet: developers who rarely meet face-to-face can employ new means of communication, both for rapidly writing and deploying software (in the spirit of Linus Torvalds’s “release early, release often” paradigm). Therefore, many tools emerged that assist a collaborative software development process, including in particular tools for source code management (SCM) and version control.

    In the R world, SCM is not a new idea; in fact, the R Development Core Team has always been using SCM tools for the R sources, first by means of Concurrent Versions System (CVS, see Cederqvist et al., 2006), and then via Subversion (SVN, see Pilato et al., 2004). A central repository is hosted by ETH Zürich mainly for managing the development of the base R system. Mailing lists like R-help, R-devel and many others are currently the main communication channels in the R community.

    Also beyond the base system, many R contributors employ SCM tools for managing their R packages, e.g., via web-based SVN repositories like SourceForge (http://SourceForge.net/) or Google Code (http://Code.Google.com/). However, there has been no central SCM repository providing services suited to the specific needs of R package developers. Since early 2007, the R-project has offered such a central platform to the R community. R-Forge (http://R-Forge.R-project.org/) provides a set of tools for source code management and various web-based features. It aims to provide a platform for collaborative development of R packages, R-related software or further projects. R-Forge is closely related to the most famous of such platforms, the world’s largest OSS development website, namely http://SourceForge.net/.

    The remainder of this article is organized as follows. First, we present the core features that R-Forge offers to the R community. Second, we give a hands-on tutorial on how users and developers can get started with R-Forge. In particular, we illustrate how people can register, set up new projects, use R-Forge’s SCM facilities, provide their packages on R-Forge, host a project-specific website, and how package maintainers submit a package to the Comprehensive R Archive Network (CRAN, http://CRAN.R-project.org/). Finally, we summarize recent developments and give a brief outlook to future work.

    R-Forge

    R-Forge offers a central platform for the development of R packages, R-related software and other projects.

    R-Forge is based on GForge (Copeland et al., 2006) which is an open source fork of the 2.61 SourceForge code maintained by Tim Perdue, one of the original SourceForge authors. GForge has been modified to provide additional features for the R community, namely a CRAN-style repository for hosting development releases of R packages as well as a quality management system similar to that of CRAN. Packages hosted on R-Forge are provided in source form as well as in binary form for Mac OS X and Windows. They can be downloaded from the website of the corresponding project on R-Forge or installed directly in R; for a package foo, say, install.packages("foo", repos = "http://R-Forge.R-project.org").

    On R-Forge, developers organize their work in so-called “Projects”. Every project has various tools and web-based features for software development, communication and other services. All features mentioned in the following sections are accessible via so-called “Tabs”: e.g., user accounts can be managed in the My Page tab or a list of available projects can be displayed using the Project Tree tab.

    Since starting the platform in early 2007, more and more interested users registered their projects on R-Forge. Now, after more than two years of development and testing, around 350 projects and more than 900 users are registered on R-Forge. This and the steadily growing list of feature requests show that there is a high demand for centralized source code management tools and for releasing prototype code frequently among the R community.

    In the next three sections, we summarize the core features of R-Forge and what R-Forge offers to the R community in terms of collaborative development of R-related software projects.

    Source code management

    When carrying out software projects, source files change over time, new files get created and old files deleted. Typically, several authors work on several computers on the same and/or different files and
keeping track of every change can become a tedious task. In the open source community, the general solution to this problem is to use version control, typically provided by the majority of SCM tools. For this reason R-Forge utilizes SVN to facilitate the developer's work when creating software.

A central repository ensures that the developer always has access to the current version of the project's source code. Any of the authorized collaborators can "check out" (i.e., download) or "update" the project file structure, make the necessary changes or additions, delete files from the current revision and finally "commit" changes or additions to the repository. More than that, SVN keeps track of the complete history of the project file structure. At any point in the development stage it is possible to go back to any previous stage in the history to inspect and restore old files. This is called version control, as every stage automatically is assigned a unique version number which increases over time.

On R-Forge such a version-controlled repository is automatically created for each project. To get started, the project members just have to install the client of their choice (e.g., Tortoise SVN on Windows or svnX on Mac OS X) and check out the repository. In addition to the inherent backup of every version within the repository a backup of the whole repository is generated daily.

A rights management system assures that, by default, anonymous users have read access and developers have write access to the data associated with a certain project on R-Forge. More precisely, registered users can be granted one of several roles: e.g., the "Administrator" has all rights including the right to add new users to the project or release packages directly to CRAN. He/she is usually the package maintainer, the project leader or has registered the project originally. Other members of a project typically have either the role "Senior Developer" or "Junior Developer" which both have permission to commit to the project SVN repository and examine the log files in the R Packages tab (the differences between the two roles are subtle, e.g., senior developers additionally have administrative privileges in several places in the project). When we speak of developers in subsequent sections we refer to project members having the rights at least of a junior developer.

Release and quality management

Development versions of a software project are typically prototypes and are subject to many changes. Thus, R-Forge offers two tools which assist the developers in improving the quality of their source code.

First, it offers a quality management system similar to that of CRAN. Packages on R-Forge are checked in a standardized way on different platforms based on R CMD check at least once daily. The resulting log files can be examined by the project developers so that they can improve the package to pass all tests on R-Forge and subsequently on CRAN.

Second, bug tracking systems allow users to notify package authors about problems they encounter. In the spirit of OSS ("given enough eyeballs, all bugs are shallow"; Raymond, 1999), peer code review leads to an overall improvement of the quality of software projects.

Additional features

A number of further tools, of increased interest for larger projects, help developers to coordinate their work and to communicate with their user base. These tools include:

• Project websites: Developers may present their work on a subdomain of R-Forge, e.g., http://foo.R-Forge.R-project.org/, or via a link to an external website.

• Mailing lists: By default a list foo-commits@lists.R-Forge.R-project.org is automatically created for each project. Additional mailing lists can be set up easily.

• Project categorization: Administrators may categorize their project in several so-called "Trove Categories" in the Admin tab of their project (under Trove Categorization). For each category three different items can be selected. This enables users to quickly find what they are looking for using the Project Tree tab.

• News: Announcements and other information related to a project can be put on the project summary page as well as on the home page of R-Forge. The latter needs approval by one of the R-Forge administrators. All items are available as RSS feeds.

• Forums: Discussion forums can be set up separately by the project administrators.

How to get started

This section is intended to be a hands-on tutorial for new users. Depending on familiarity with the systems/tools involved the instructions might be somewhat brief. In case of problems we recommend consulting the user's manual (R-Forge Administration and Development Team, 2008) which contains detailed step-by-step instructions.

When accessing the URL http://R-Forge.R-project.org/ the home page is presented (see Figure 1). Here one can

• login,

• register a user or a project,
• download the documentation,

• examine the latest news and changes,

• go to a specific project website either by searching for available projects (top middle of the page), by clicking on one of the projects listed on the right, or by going through the listing in the Project Tree tab.

Figure 1: Home page of R-Forge on 2009-03-10

Registered users can access their personal page via the tab named My Page. Figure 2 shows the personal page of the first author.

Figure 2: The My Page tab of the first author

Registering as a new user

To use R-Forge as a developer, one has to register as a site user. A link on the main web site called New Account on the top right of the home page leads to the corresponding registration form.

After submitting the completed form, an e-mail is sent to the given address containing a link for activating the account. Subsequently, all R-Forge features, including joining existing or creating new projects, are available to logged-in users.

Registering a project

There are two possibilities to register a project: Clicking on Register Your Project on the home page or going to the My Page tab and clicking on Register Project (see Figure 2 below the main tabs). Both links lead to a form which has to be filled out in order to finish the registration process. In the text field "Project Public Description" the registrant is supposed to enter a concise description of the project. This text and the "Project Full Name" will be shown in several places on R-Forge (see e.g., Figure 1). The text entered in the field "Project Purpose And Summarization" is additional information for the R-Forge administrators, inspected for approval. "Project Unix Name" refers to the name which uniquely determines the project. In the case of a project that contains a single R package, the project Unix name typically corresponds to the package name (in its lower-case version). Restrictions according to the Unix file system convention force Unix names to be in lower case (and will be converted automatically if they are typed in upper case).

After the completed form is submitted, the project has to be approved by the R-Forge administrators and a confirmation e-mail is sent to the registrant upon approval. After that, the registrant automatically becomes the project administrator and the standardized web area of the project (http://R-Forge.R-project.org/projects/foo/) is immediately available on R-Forge. This web area includes a Summary page, an Admin tab visible only to the project administrators, and various other tabs depending on the features enabled for this project in the Admin tab. To present the new project to a broader community the name of the project additionally is promoted on the home page under "Recently Registered Projects" (see Figure 1).

Furthermore, within an hour after approval a default mailing list and the project's SVN repository containing a 'README' file and two pre-defined directories called 'pkg' and 'www' are created. The content of these directories is used by the R-Forge server for creating R packages and a project website, respectively.

SCM and R packages

The first step after creation of a project is typically to start generation of content for one (or more) R package(s) in the 'pkg' directory. Developers can either start committing changes via SVN as usual or, if the package is already version-controlled somewhere else, the corresponding parts of the repository including the history can be migrated to R-Forge (see Section 6 of the user's manual).

The SCM tab of a project explains how the corresponding SVN repository located at svn://svn.R-Forge.R-project.org/svnroot/foo can be checked out. From this URL the sources are checked out either anonymously without write permission (enabled by default) or as developer using an encrypted authentication method, namely secure shell (ssh). Via this secure protocol (svn+ssh:// followed by the registered user name) developers have full access to the repository. Typically, the developer is authenticated via the registered password but it is also possible to upload a secure shell key (updated hourly) to make use of public/private key authentication. Section 3.2 of the user's manual explains this process in more detail.

To make use of the package builds and checks the package source code has to be put into the 'pkg' directory of the repository (i.e., 'pkg/DESCRIPTION', 'pkg/R', 'pkg/man', etc.) or, alternatively, a subdirectory of 'pkg'. The latter structure allows developers to have more than one package in a single project; e.g.,
if a project consists of the packages foo and bar, then the source code is located in 'pkg/foo' and 'pkg/bar', respectively.

R-Forge automatically examines the 'pkg' directory of every repository and builds the package sources as well as the package binaries on a daily basis for Mac OS X and Windows (if applicable). The package builds are provided in the R Packages tab (see Figure 3) for download or can be installed directly in R using install.packages("foo", repos="http://R-Forge.R-project.org"). Furthermore, in the R Packages tab developers can examine logs of the build and check process on different platforms.

To release a package to CRAN the project administrator clicks on Submit this package to CRAN in the R Packages tab. Upon confirmation, a message will be sent to CRAN@R-project.org and the latest successfully built source package (the '.tar.gz' file) is automatically copied to ftp://CRAN.R-project.org/incoming/. Note that packages are built once daily, i.e., the latest source package does not include more recent code committed to the SVN repository.

Figure 3 shows the R Packages tab of the tm project (http://R-Forge.R-project.org/projects/tm/) as one of the project administrators would see it. Depending on whether you are a member of the project or not and your rights you will see only parts of the information provided for each package.

Further steps

A customized project website, accessible through http://foo.R-Forge.R-project.org/ where foo corresponds to the unix name of the project, is managed via the 'www' directory. The website gets updated every hour.

The changes made to the project can be examined by entering the corresponding standardized web area. On entry, the Summary page is shown. Here, one can

• examine the details of the project including a short description and a listing of the administrators and developers,

• follow a link leading to the project homepage,

• examine the latest news announcements (if available),

• go to other sections of the project like Forums, Tracker, Lists, R Packages, etc.

• follow the download link leading directly to the available packages of the project (i.e., the R Packages tab).

Furthermore, meta-information about a project can be supplied in the Admin tab via so-called "Trove Categorization".

Recent and future developments

In this section, we briefly summarize the major changes and updates to the R-Forge system during the last few months and give an outlook to future developments.

Recently added features and major changes include:

• New defaults for freshly registered projects: Only the tabs Lists, SCM and R packages are enabled initially. Forums, Tracker and News can be activated separately (in order not to overwhelm new users with too many features) using Edit Public Info in the Admin tab of the project. Experienced users can decide which features they want and activate them.

• An enhanced structure in the SVN repository allowing multiple packages in a single project (see above).

• The R package RForgeTools (Theußl, 2009) contains platform-independent package building and quality management code used on the R-Forge servers.

• A modified News submit page offering two types of submissions: project-specific and global news. The latter needs approval by one of the R-Forge administrators (default: project-only submission).

• Circulation of SVN commit messages can be enabled in the SCM Admin tab by project administrators. The mailing list mentioned in the text field is used for delivering the SVN commit messages (default: off).

• Mailing list search facilities provided by the Swish-e engine which can be accessed via the List tab (private lists are not included in the search index).

Further features and improvements which are currently on the wishlist or under development include

• a Wiki,

• task management facilities,

• a re-organized tracker more compatible with R package development and,

• an update of the underlying GForge system to its successor FusionForge (http://FusionForge.org/).

For suggestions, problems, feature requests, and other questions regarding R-Forge please contact R-Forge@R-project.org.
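The install.packages() route described under "SCM and R packages" can be sketched as follows. This is only an illustration under stated assumptions: foo is the article's placeholder project name rather than a real package, the helper name rforge_repos() and the CRAN mirror URL are introduced here and are not part of R-Forge, and listing a CRAN mirror as a second repository is simply one way to let R resolve dependencies that are not hosted on R-Forge.

```r
# Hypothetical helper (an assumption, not part of R or R-Forge): returns
# the repositories to search, R-Forge first and a CRAN mirror second.
rforge_repos <- function(cran = "https://cloud.r-project.org") {
  c(RForge = "http://R-Forge.R-project.org", CRAN = cran)
}

# Not run: "foo" is a placeholder project name, so the call stays commented.
# install.packages("foo", repos = rforge_repos())
```

Because packages are built once daily, a package installed this way may lag behind the most recent commits in the SVN repository.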
Figure 3: R Packages tab of the tm project

Acknowledgments

Setting up this project would not have been possible without Douglas Bates and the University of Wisconsin as they provided us with a server for hosting this platform. Furthermore, the authors would like to thank the Computer Center of the Wirtschaftsuniversität Wien for their support and for providing us with additional hardware as well as a professional server infrastructure.

Bibliography

P. Cederqvist et al. Version Management with CVS. Network Theory Limited, Bristol, 2006. Full book available online at http://ximbiot.com/cvs/manual/.

T. Copeland, R. Mas, K. McCullagh, T. Perdue, G. Smet, and R. Spisser. GForge Manual, 2006. URL http://gforge.org/gf/download/docmanfileversion/8/1700/gforge_manual.pdf.

C. M. Pilato, B. Collins-Sussman, and B. W. Fitzpatrick. Version Control with Subversion. O'Reilly, 2004. Full book available online at http://svnbook.red-bean.com/.

R-Forge Administration and Development Team. R-Forge User's Manual, 2008. URL http://download.R-Forge.R-project.org/R-Forge.pdf.

E. S. Raymond. The Cathedral and the Bazaar. Knowledge, Technology, and Policy, 12(3):23–49, 1999.

S. Theußl. RForgeTools: R-Forge Build and Check Tools, 2009. URL http://R-Forge.R-project.org/projects/site/. R package version 0.3-3.

Stefan Theußl
Department of Statistics and Mathematics
WU Wirtschaftsuniversität Wien
Austria
stefan.theussl@wu.ac.at

Achim Zeileis
Department of Statistics and Mathematics
WU Wirtschaftsuniversität Wien
Austria
achim.zeileis@wu.ac.at
CONTRIBUTED RESEARCH ARTICLES

Drawing Diagrams with R

by Paul Murrell

R provides a number of well-known high-level facilities for producing sophisticated statistical plots, including the "traditional" plots in the graphics package (R Development Core Team, 2008), the Trellis-style plots provided by lattice (Sarkar, 2008), and the grammar-of-graphics-inspired approach of ggplot2 (Wickham, 2009).

However, R also provides a powerful set of low-level graphics facilities for drawing basic shapes and, more importantly, for arranging those shapes relative to each other, which can be used to draw a wide variety of graphical images. This article highlights some of R's low-level graphics facilities by demonstrating their use in the production of diagrams. In particular, the focus will be on some of the useful things that can be done with the low-level facilities provided by the grid graphics package (Murrell, 2002, 2005b,a).

Starting at the end

An example of the type of diagram that we are going to work towards is shown below. We have several "boxes" that describe table schema for a database, with lines and arrows between the boxes to show the relationships between tables.

[Figure: five boxes labelled book_table, publisher_table, book_author_table, author_table and gender_table, each listing its column names, with arrows linking related columns]

To forestall some possible misunderstandings, the sort of diagram that we are talking about is one that is designed by hand. This is not a diagram that has been automatically laid out.

The sort of diagram being addressed is one where the author of the diagram has a clear idea of what the end result will roughly look like: the sort of diagram that can be sketched with pen and paper. The task is to produce a pre-planned design, using a computer to get a nice crisp result.

That being said, a reasonable question is "why not draw it by hand?", for example, using a free-hand drawing program such as Dia (Larsson, 2008). The advantage of using R code to produce this sort of image is that code is easier to reproduce, reuse, maintain, and fine-tune with accuracy. The thought of creating this sort of diagram by pushing objects around the screen with a mouse fills me with dread. Maybe I'm just not a very GUI guy.

Before we look at drawing diagrams with the core R graphics facilities, it is important to acknowledge that several contributed R packages already provide facilities for drawing diagrams. The Rgraphviz (Gentry et al., 2008) and igraph (Csardi and Nepusz, 2006) packages provide automated layout of node-and-edge graphs, and the shape and diagram packages (Soetaert, 2008b,a) provide functions for drawing nodes of various shapes with lines and arrows between them, with manual control over layout.

In this article, we will only be concerned with drawing diagrams with a small number of elements, so we do not need the automated layout facilities of Rgraphviz or igraph. Furthermore, while the shape and diagram packages provide flexible tools for building node-and-edge diagrams, the point of this article is to demonstrate low-level grid functions. We will use a node-and-edge diagram as the motivation, but the underlying ideas can be applied to a much wider range of applications.

In each of the following sections, we will meet a basic low-level graphical tool and demonstrate how it can be used in the generation of the pieces of an overall diagram, or how the tool can be used to combine pieces together in convenient ways.

Graphical primitives

One of the core low-level facilities of R graphics is the ability to draw basic shapes. The typical graphical primitives such as text, circles, lines, and rectangles are all available.

In this case, the shape of each box in our diagram is not quite as simple as a rectangle because it has rounded corners. However, a rounded rectangle is also one of the graphical primitives that the grid package provides.[1]

The code below draws a rounded rectangle with a text label in the middle.

> library(grid)

> grid.roundrect(width=.25)
> grid.text("ISBN")

[Output: a rounded rectangle with the label ISBN centred inside it]

[1] From R version 2.9.0; prior to that, a simpler rounded rectangle was available via the grid.roundRect() function in the RGraphics package.
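The other primitives named in "Graphical primitives" (text, circles, lines, and rectangles) can be drawn in the same way as the rounded rectangle. The following is a minimal sketch rather than code from the article: the positions, sizes, and grob names ("box", "dot", "edge", "label") are arbitrary choices made here for illustration.

```r
library(grid)

# Start from an empty page so that only the shapes below are recorded.
grid.newpage()

# Each primitive is given a name so that it can be identified afterwards.
grid.rect(x = 0.2, y = 0.5, width = 0.2, height = 0.3, name = "box")
grid.circle(x = 0.5, y = 0.5, r = 0.1, name = "dot")
grid.lines(x = c(0.7, 0.9), y = c(0.4, 0.6), name = "edge")
grid.text("label", x = 0.5, y = 0.8, name = "label")

# Collect the names of the grobs that grid recorded on its display list.
drawn <- grid.ls(print = FALSE)$name
```

Naming the shapes is optional, but it makes it possible to inspect or edit them later by name.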
Viewports

A feature of the boxes in the diagram at the beginning of this article is that the text is carefully positioned relative to the rounded rectangle; the text is left-aligned within the rectangle. This careful positioning requires knowing where the left edge of the rectangle is on the page. Calculating those positions is annoyingly tricky and only becomes more annoying if at some later point the position of the box is adjusted and the positions of the text labels have to be calculated all over again.

Using a grid viewport makes this sort of positioning very simple. The basic idea is that we can create a viewport where the box is going to be drawn and then do all of our drawing within that viewport. Positioning text at the left edge of a viewport is very straightforward, and if we need to shift the box, we simply shift the viewport and the text automatically tags along for the ride. All of this applies equally to positioning the text vertically within the box.

In the code below, we create a viewport for the overall box, we draw a rounded rectangle occupying the entire viewport, then we draw text 2 mm from the left hand edge of the viewport and 1.5 lines of text up from the bottom of the viewport. A second line of text is also added, 0.5 lines of text from the bottom.

> pushViewport(viewport(width=.25))
> grid.roundrect()
> grid.text("ISBN",
            x=unit(2, "mm"),
            y=unit(1.5, "lines"),
            just="left")
> grid.text("title",
            x=unit(2, "mm"),
            y=unit(0.5, "lines"),
            just="left")
> popViewport()

[Output: a rounded rectangle containing the left-aligned labels ISBN and title]

Coordinate systems

The positioning of the labels within the viewport in the previous example demonstrates another useful feature of the grid graphics system: the fact that locations can be specified in a variety of coordinate systems or units. In that example, the text was positioned horizontally in terms of millimetres and vertically in terms of lines of text (which is based on the font size in use).

As another example of the use of these different units, we can size the overall viewport so that it is just the right size to fit the text labels. In the following code, the height of the viewport is based on the number of labels and the width of the viewport is based on the width of the largest label, plus a 2 mm gap either side. This code also simplifies the labelling by drawing both labels in a single grid.text() call.

> labels <- c("ISBN", "title")
> vp <-
    viewport(width=max(stringWidth(labels))+
                   unit(4, "mm"),
             height=unit(length(labels),
                         "lines"))
> pushViewport(vp)
> grid.roundrect()
> grid.text(labels,
            x=unit(2, "mm"),
            y=unit(2:1 - 0.5, "lines"),
            just="left")
> popViewport()

[Output: a rounded rectangle sized to fit the labels ISBN and title]

Clipping

Another feature of the boxes that we want to produce is that they have shaded backgrounds. Looking closely, there are some relatively complex shapes involved in this shading. For example, the grey background for the "heading" of each box has a curvy top, but a flat bottom. These are not simple rounded rectangles, but some unholy alliance of a rounded rectangle and a normal rectangle.

It is possible, in theory, to achieve any sort of shape with R because there is a general polygon graphical primitive. However, as with the positioning of the text labels, determining the exact boundary of this polygon is not trivial and there are easier ways to work.

In this case, we can achieve the result we want using clipping, so that any drawing that we do is only visible on a restricted portion of the page. R does not provide clipping to arbitrary regions, but it is possible to set the clipping region to any rectangular region.

The basic idea is that we will draw the complete rounded rectangle, then set the clipping region for the box viewport so that no drawing can occur in the last line of text in the box and then draw the rounded rectangle again, this time with a different background. If we continue doing this, we end up with bands of different shading.

The following code creates an overall viewport for a box and draws a rounded rectangle with a grey fill. The code then sets the clipping region to start one line of text above the bottom of the viewport and draws another rounded rectangle with a white fill. The effect is to leave just the last line of the original
grey rounded rectangle showing beneath the white rounded rectangle that has had its last line clipped.

> pushViewport(viewport(width=.25))
> grid.roundrect(gp=gpar(fill="grey"))
> grid.clip(y=unit(1, "lines"),
            just="bottom")
> grid.roundrect(gp=gpar(fill="white"))
> popViewport()

[Output: a white rounded rectangle with a grey band along its bottom line]

Drawing curves

Another basic shape that is used in the overall diagram is a nice curve from one box to another.

In addition to the basic functions to draw straight lines in R, there are functions that draw curves. In particular, R provides a graphical primitive called an X-spline (Blanc and Schlick, 1995). The idea of an X-spline is that we define a set of control points and a curve is drawn either through or near to the control points. Each control point has a parameter that specifies whether to create a sharp corner at the control point, or draw a smooth curve through the control point, or draw a smooth curve that passes nearby.

The following code sets up sets of three control points and draws an X-spline relative to each set of control points. The first curve makes a sharp corner at the middle control point, the second curve makes a smooth corner through the middle control point, and the third curve makes a smooth corner near the middle control point. The control points are drawn as grey dots for reference (code not shown).

> x1 <- c(0.1, 0.2, 0.2)
> y1 <- c(0.2, 0.2, 0.8)
> grid.xspline(x1, y1)
> x2 <- c(0.4, 0.5, 0.5)
> y2 <- c(0.2, 0.2, 0.8)
> grid.xspline(x2, y2, shape=-1)
> x3 <- c(0.7, 0.8, 0.8)
> y3 <- c(0.2, 0.2, 0.8)
> grid.xspline(x3, y3, shape=1)

[Output: the three X-splines, with their control points shown as grey dots]

Determining where to place the control points for a curve between two boxes is another one of those annoying calculations, so a more convenient option is provided by a curve graphical primitive in grid. The idea of this primitive is that we simply specify the start and end points of the curve and R figures out a set of reasonable control points to produce an appropriate X-spline. It is also straightforward to add an arrow to either end of any straight or curvy line that R draws.

The following code draws three curves between pairs of end points. The first curve draws the default "city-block" line between end points, with a smooth corner at the turning point, the second curve is similar, but with an extra corner added, and the third curve draws a single wide, smooth corner that is distorted towards the end point. The third curve also has an arrow at the end.

> x1a <- 0.1; x1b <- 0.2
> y1a <- 0.2; y1b <- 0.8
> grid.curve(x1a, y1a, x1b, y1b)
> x2a <- 0.4; x2b <- 0.5
> y2a <- 0.2; y2b <- 0.8
> grid.curve(x2a, y2a, x2b, y2b,
             inflect=TRUE)
> x3a <- 0.7; x3b <- 0.8
> y3a <- 0.2; y3b <- 0.8
> grid.curve(x3a, y3a, x3b, y3b,
             ncp=8, angle=135,
             square=FALSE,
             curvature=2,
             arrow=arrow(angle=15))

[Output: the three curves, with their end points shown as dots]

Graphical functions

The use of graphical primitives, viewports, coordinate systems, and clipping, as described so far, can be used to produce a box of the style shown in the diagram at the start of the article. For example, the following code produces a box containing three labels, with background shading to assist in differentiating among the labels.

> labels <- c("ISBN", "title", "pub")
> vp <-
    viewport(width=max(stringWidth(
                         labels))+
                   unit(4, "mm"),
             height=unit(length(labels),
                         "lines"))
> pushViewport(vp)
> grid.roundrect()
> grid.clip(y=unit(1, "lines"),
            just="bottom")
> grid.roundrect(gp=gpar(fill="grey"))
> grid.clip(y=unit(2, "lines"),
            just="bottom")
> grid.roundrect(gp=gpar(fill="white"))
> grid.clip()
> grid.text(labels,
            x=unit(rep(2, 3), "mm"),
            y=unit(3:1 - .5, "lines"),
            just="left")
> popViewport()

[Output: a box with the labels ISBN, title and pub on alternately shaded bands]

However, in the sort of diagram that we want to produce, there will be several such boxes. Rather than write separate code for each box, it makes sense to write a general function that will work for any set of labels. Such a function is shown in Figure 1 and the code below uses this function to draw two boxes side by side.

> tableBox(c("ISBN", "title", "pub"),
           x=0.3)
> tableBox(c("ID", "name", "country"),
           x=0.7)

[Output: two boxes side by side, one with the labels ISBN, title, pub and one with ID, name, country]

This function represents the simplest way to efficiently reuse graphics code and to provide graphics code for others to use. However, there are benefits to be gained from going beyond this procedural programming style to a slightly more complicated object-oriented approach.

Graphical objects

In order to achieve the complete diagram introduced at the start of this article, we need one more step: we need to draw lines and arrows from one box to another. We already know how to draw lines and curves between two points; the main difficulty that remains is calculating the exact position of the start and end points, because these locations depend on the locations and dimensions of the boxes. The calculations could be done by hand for each individual curve, but as we have seen before, there are easier ways to work. The crucial idea for this step is that we want to create not just a graphical function that encapsulates how to draw a box, but define a graphical object that encapsulates information about a box.

The code in Figure 2 defines such a graphical object, plus a few other things that we will get to shortly. The first thing to concentrate on is the boxGrob() function. This function creates a "box" graphical object. In order to do this, all it has to do is call the grob() function and supply all of the information that we want to record about "box" objects. In this case, we just record the labels to be drawn within the box and the location where we want to draw the box.

This function does not draw anything. For example, the following code creates two "box" objects, but produces no graphical output whatsoever.

> box1 <- boxGrob(c("ISBN", "title",
                    "pub"), x=0.3)
> box2 <- boxGrob(c("ID", "name",
                    "country"), x=0.7)

The grid.draw() function can be used to draw any graphical object, but we need to supply the details of how "box" objects get drawn. This is the purpose of the second function in Figure 2. This function is a method for the drawDetails() function; it says how to draw "box" objects. In this case, the function is very simple because it can call the tableBox() function that we defined in Figure 1. The important detail is that the boxGrob() function specified a special class, cl="box", for "box" objects, which meant that we could define a drawDetails() method specifically for this sort of object and control what gets drawn.

With this drawDetails() method defined, we can draw the boxes that we created earlier by calling the grid.draw() function. This function will draw any grid graphical object by calling the appropriate method for the drawDetails() generic function (among other things). The following code calls grid.draw() to draw the two boxes.

> grid.draw(box1)
> grid.draw(box2)

[Output: the same two boxes side by side]

At this point, we appear to have achieved only a more complicated equivalent of the previous graphics function. However, there are a number of other functions that can do useful things with grid graphical objects. For example, the grobX() and grobY() functions can be used to calculate locations on the boundary of a graphical object. As with grid.draw(), which has to call drawDetails() to find out how to draw a particular class of graphical object, these functions call generic functions to find out how to calculate locations on the boundary for a particular class of object. The generic functions are called xDetails() and yDetails() and methods for our special "box" class are defined in the last two functions in Figure 2.

These methods work by passing the buck. They both create a rounded rectangle at the correct location and the right size for the box, then call grobX() (or grobY()) to determine a location on the boundary of the rounded rectangle. In other words, they rely on code within the grid package that already exists to calculate the boundary of rounded rectangles.

With these methods defined, we are now in a position to draw a curved line between our boxes. The key idea is that we can use grobX() and grobY() to specify a start and end point for the curve. For example, we can start the curve at the right hand edge of
> tableBox <- function(labels, x=.5, y=.5) {
    nlabel <- length(labels)
    tablevp <-
      viewport(x=x, y=y,
               width=max(stringWidth(labels)) +
                     unit(4, "mm"),
               height=unit(nlabel, "lines"))
    pushViewport(tablevp)
    grid.roundrect()
    if (nlabel > 1) {
      for (i in 1:(nlabel - 1)) {
        fill <- c("white", "grey")[i %% 2 + 1]
        grid.clip(y=unit(i, "lines"),
                  just="bottom")
        grid.roundrect(gp=gpar(fill=fill))
      }
    }
    grid.clip()
    grid.text(labels,
              x=unit(2, "mm"),
              y=unit(nlabel:1 - .5, "lines"),
              just="left")
    popViewport()
  }

Figure 1: A function to draw a diagram box, for a given set of labels, centred at the specified (x, y) location.

> boxGrob <- function(labels, x=.5, y=.5) {
    grob(labels=labels, x=x, y=y, cl="box")
  }
> drawDetails.box <- function(x, ...) {
    tableBox(x$labels, x$x, x$y)
  }
> xDetails.box <- function(x, theta) {
    nlines <- length(x$labels)
    height <- unit(nlines, "lines")
    width <- unit(4, "mm") +
             max(stringWidth(x$labels))
    grobX(roundrectGrob(x=x$x, y=x$y,
                        width=width,
                        height=height),
          theta)
  }
> yDetails.box <- function(x, theta) {
    nlines <- length(x$labels)
    height <- unit(nlines, "lines")
    width <- unit(4, "mm") +
             max(stringWidth(x$labels))
    grobY(rectGrob(x=x$x, y=x$y,
                   width=width,
                   height=height),
          theta)
  }

Figure 2: Some functions that define a graphical object representing a diagram box. The boxGrob() function constructs a "box" object, the drawDetails() method describes how to draw a "box" object, and the xDetails() and yDetails() functions calculate locations on the boundary of a "box" object.
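The xDetails() and yDetails() methods in Figure 2 hand their work to grobX() and grobY() applied to a built-in grob. As a minimal sketch of what those two calls return, the code below applies them to a plain rectGrob(); the location, size, and variable names are illustrative assumptions, not values taken from the article.

```r
library(grid)

# A built-in rectangle grob centred at (0.3, 0.5); creating it draws nothing.
r <- rectGrob(x = 0.3, y = 0.5, width = 0.2, height = 0.2)

# grobX() and grobY() return unit objects describing boundary locations;
# they are only evaluated when they are used for drawing or converted.
east  <- grobX(r, "east")
north <- grobY(r, "north")

# Converting to npc evaluates the locations in the current viewport
# (this opens the default graphics device if none is open yet).
east_npc  <- convertX(east, "npc", valueOnly = TRUE)
north_npc <- convertY(north, "npc", valueOnly = TRUE)
```

Because the results are units, they can be combined with other units by addition or subtraction, for example to move a location by half a line of text.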
  • 20. 20 C ONTRIBUTED R ESEARCH A RTICLES box1 by specifying grobX(box1, "east"). The ver- are five interconnected boxes and the boxes have tical position is slightly trickier because we do not some additional features, such as a distinct “header” want the line starting at the top or bottom of the box, line at the top. This complete code was excluded but we can simply add or subtract the appropriate to save on space, but a simple R package is pro- number of lines of text to get the right spot. vided at http://www.stat.auckland.ac.nz/~paul/ The following code uses these ideas to draw a with code to draw that complete diagram and the curve from the pub label of box1 to the ID label of package also contains a more complete implementa- box2. The curve has two corners (inflect=TRUE) and tion of code to create and draw "box" graphical ob- it has a small arrow at the end. jects. This call to grid.curve() is relatively verbose, One final point is that using R graphics to draw but in a diagram containing many similar curves, diagrams like this is not fast. In keeping with the S this burden can be significantly reduced by writing tradition, the emphasis is on developing code quickly a simple function that hides away the common fea- and on having code that is not a complete nightmare tures, such as the specification of the arrow head. to maintain. In this case particularly, the speed of The major gain from this object-oriented ap- developing a diagram comes at the expense of the proach is that the start and end points of this curve time taken to draw the diagram. For small, one-off are described by simple expressions that will auto- diagrams this is not likely to be an issue, but the matically update if the locations of the boxes are approach described in this article would not be ap- modified. propriate, for example, for drawing a node-and-edge graph of the Internet. 
> grid.curve(grobX(box1, "east"), grobY(box1, "south") + unit(0.5, "lines"), Bibliography grobX(box2, "west"), grobY(box2, "north") - C. Blanc and C. Schlick. X-splines: a spline model de- unit(0.5, "lines"), signed for the end-user. In SIGGRAPH ’95: Proceed- inflect=TRUE, ings of the 22nd annual conference on Computer graph- arrow= ics and interactive techniques, pages 377–386, New arrow(type="closed", York, NY, USA, 1995. ACM. ISBN 0-89791-701-4. angle=15, doi: http://doi.acm.org/10.1145/218380.218488. length=unit(2, "mm")), gp=gpar(fill="black")) G. Csardi and T. Nepusz. The igraph software package for complex network research. InterJour- nal, Complex Systems:1695, 2006. URL http:// ISBN ID title name igraph.sf.net. pub country J. Gentry, L. Long, R. Gentleman, S. Falcon, F. Hahne, and D. Sarkar. Rgraphviz: Provides plotting capabil- ities for R graph objects, 2008. R package version Conclusion 1.18.1. This article has demonstrated a number of useful A. Larsson. Dia, 2008. http://www.gnome.org/ low-level graphical facilities in R with an example of projects/dia/. how they can be combined to produce a diagram consisting of non-trivial nodes with smooth curves P. Murrell. The grid graphics package. R News, 2(2): between them. 14–19, June 2002. URL http://CRAN.R-project. The code examples provided in this article have org/doc/Rnews/. ignored some details in order to keep things simple. P. Murrell. Recent changes in grid graphics. R For example, there are no checks that the arguments News, 5(1):12–20, May 2005a. URL http://CRAN. have sensible values in the functions tableBox() and R-project.org/doc/Rnews/. boxGrob(). However, for creating one-off diagrams, this level of detail is not necessary anyway. P. Murrell. R Graphics. Chapman & Hall/CRC, One detail that would be encountered quite 2005b. URL http://www.stat.auckland.ac.nz/ quickly in practice, in this particular sort of dia- ~paul/RGraphics/rgraphics.html. 
ISBN 1-584- gram, is that a curve from one box to another that 88486-X. needs to go across-and-down rather than across-and- up would require the addition of curvature=-1 to R Development Core Team. R: A Language and Envi- the grid.curve() call. Another thing that is miss- ronment for Statistical Computing. R Foundation for ing is complete code to produce the example dia- Statistical Computing, Vienna, Austria, 2008. URL gram from the beginning of this article, where there http://www.R-project.org. ISBN 3-900051-07-0. The R Journal Vol. 1/1, May 2009 ISSN 2073-4859
  • 21. C ONTRIBUTED R ESEARCH A RTICLES 21 D. Sarkar. Lattice: Multivariate Data Visualization with H. Wickham. ggplot2. Springer, 2009. http://had. R. Springer, 2008. URL http://amazon.com/o/ co.nz/ggplot2/. ASIN/0387759689/. ISBN 9780387759685. K. Soetaert. diagram: Functions for visualising simple Paul Murrell graphs (networks), plotting flow diagrams, 2008a. R Department of Statistics package version 1.2. The University of Auckland K. Soetaert. shape: Functions for plotting graphical New Zealand shapes, colors, 2008b. R package version 1.2. paul@stat.auckland.ac.nz The R Journal Vol. 1/1, May 2009 ISSN 2073-4859
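The diagram article notes that the verbose grid.curve() call can be shortened, in a diagram with many similar curves, by a simple function that hides the common features such as the arrow head. A minimal sketch of such a wrapper might look as follows. The name arrowCurve() is hypothetical (it is not part of grid or of the article), and plain roundrectGrob() objects stand in for the article's "box" objects so that the example runs on its own.

```r
library(grid)

# Hypothetical helper (not from the article or the grid package):
# fixes the inflection, arrow head and fill once, so each call only
# names the start and end points. Extra arguments, for example
# curvature=-1, are passed through to grid.curve().
arrowCurve <- function(x1, y1, x2, y2, ...) {
    grid.curve(x1, y1, x2, y2,
               inflect=TRUE,
               arrow=arrow(type="closed", angle=15,
                           length=unit(2, "mm")),
               gp=gpar(fill="black"),
               ...)
}

# Stand-ins for the article's box1 and box2 (an assumption made to
# keep the sketch self-contained): two rounded rectangles, three
# lines of text high.
box1 <- roundrectGrob(x=0.3, y=0.5, width=unit(3, "cm"),
                      height=unit(3, "lines"))
box2 <- roundrectGrob(x=0.7, y=0.5, width=unit(3, "cm"),
                      height=unit(3, "lines"))

grid.newpage()
grid.draw(box1)
grid.draw(box2)

# Same start and end expressions as in the article's grid.curve() call.
arrowCurve(grobX(box1, "east"),
           grobY(box1, "south") + unit(0.5, "lines"),
           grobX(box2, "west"),
           grobY(box2, "north") - unit(0.5, "lines"))
```

Because the end points are still given as grobX() and grobY() expressions, moving either rectangle and redrawing should move the curve with it, which is the gain the article attributes to the object-oriented approach.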
The hwriter Package
Composing HTML documents with R objects

by Gregoire Pau and Wolfgang Huber

Introduction

HTML documents are structured documents made of diverse elements such as paragraphs, sections, columns, figures and tables organized in a hierarchical layout. Combination of HTML documents and hyperlinking is useful to report analysis results; for example, in the package arrayQualityMetrics (Kauffmann et al., 2009), estimating the quality of microarray data sets and cellHTS2 (Boutros et al., 2006), performing the analysis of cell-based screens.

There are several tools for exporting data from R into HTML documents. The package R2HTML is able to render a large diversity of R objects in HTML but does not easily support combining them in a structured layout and has a complex syntax. On the other hand, the package xtable can render R matrices with simple commands but cannot combine HTML elements and has limited formatting options.

The package hwriter allows rendering R objects in HTML and combining resulting elements in a structured layout. It uses a simple syntax, supports extensive formatting options and takes full advantage of the ellipsis ’...’ argument and R vector recycling rules.

Comprehensive documentation and examples of hwriter are generated by running the command example(hwriter), which creates the package web page http://www.ebi.ac.uk/~gpau/hwriter.

The function hwrite

The generic function hwrite is the core function of the package hwriter. hwrite(x, page=NULL, ...) renders the object x in HTML using the formatting arguments specified in ’...’ and returns a character vector containing the HTML element code of x.

If page is a filename, the HTML element is written in this file. If it is an R connection/file, the element is appended to this connection, allowing the sequential building of HTML documents. If NULL, the returned HTML element code can be concatenated with other elements with paste or nested inside other elements through subsequent hwrite calls. These operations are the tree structure equivalents of adding a sibling node and adding a child node in the document tree.

Formatting objects

The most basic call of hwrite is to output a character vector into an HTML document.

    > hwrite('Hello world !', 'doc.html')

[Rendered output: Hello world !]

Character strings can be rendered with a variety of formatting options. The argument link adds a hyperlink to the HTML element pointing to an external document. The argument style specifies an inline Cascading Style Sheets (CSS) style (color, alignment, margins, font, ...) to render the element. Other arguments allow a finer rendering control.

    > hwrite('Hello world !', 'doc.html',
        link='http://cran.r-project.org/')

[Rendered output: Hello world !]

    > hwrite('Hello world !', 'doc.html',
        style='color: red; font-size: 20pt;
        font-family: Gill Sans Ultra Bold')

[Rendered output: Hello world !]

Images can be included using hwriteImage which supports diverse formatting options.

    > img=system.file('images', c('iris1.jpg',
        'iris3.jpg'), package='hwriter')
    > hwriteImage(img[2], 'doc.html')

[Rendered output: image]

Arguments recycling

Objects written with formatting arguments containing more than one element are recycled by hwrite, producing a vector of HTML elements.

    > hwrite('test', 'doc.html', style=paste(
        'color:', c('peru', 'purple', 'green')))

[Rendered output: test test test]
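The recycling example builds its style argument with paste(), which itself recycles the string 'color:' across the three color names, so hwrite receives three styles for the single string 'test'. The sketch below shows that vector in base R. The hwrite() call is guarded with requireNamespace(), which is an assumption about the reader's setup rather than anything in the article, and it relies on the behaviour described in the article: with page left at NULL, the HTML code is returned instead of being written to a file.

```r
# The style vector passed to hwrite() in the recycling example:
# paste() recycles 'color:' across the three color names, giving
# a character vector of length 3.
styles <- paste('color:', c('peru', 'purple', 'green'))
print(styles)

# Run the hwriter part only when the package is installed.
if (requireNamespace("hwriter", quietly=TRUE)) {
    # page is left at its default (NULL), so the HTML element code
    # is returned as a character vector rather than written to a file.
    html <- hwriter::hwrite('test', style=styles)
    print(html)
}
```

Printing the returned value is a convenient way to inspect the generated markup before committing it to a page.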