Summer of Code: Ideas

www.perl.org
 

Good projects...

  • ...build on or integrate well with existing code
  • ...are a development of an existing project, rather than starting from scratch
  • ...provide basic infrastructure or libraries, giving others the widest opportunity to build on the results of the proposal

Some ideas...

  • svk has several tasks which would make good Summer of Code projects:
    • Fully implement svk Public Key infrastructure.
      svk allows you to sign and verify changesets. Implement the public key and policy infrastructure on top of it.
    • Web GUI for svk users.
      A web tool that at least allows svk users to merge one branch into another. Much of this could be reused elsewhere. For example, a web implementation of three-way merge would benefit wikis and other online editing and versioning applications.
    • Thin client for svk.
      Make the command layer of svk networked. This would allow svk users in a well-connected environment to use a specialized svk server without mirroring the repository.
    • Revamp foreign version control system support.
      svk currently uses VCP, which is not actively developed and also suffers from a complicated abstraction built to cope with the CVS model. Investigate and integrate tailor or other technology to provide robust foreign version control system integration. Treat CVS changeset aggregation as a separate issue.
  • Devel::Cover provides code coverage metrics for Perl code. There are plenty of tasks available which would make good projects. See the projects page for more information.
    • Use the new "MAD" code to report on coverage.
    • Provide support for unreachable code.
    • Improve on the current reports.
    • Integrate profiling into reports.
    • Threads support.
    • Analyse tests and indicate how to improve coverage.
    • Improve cpancover and create a central coverage repository.
    • Implement path coverage.
    • Implement mutation coverage.
    • Test suite optimization.
    • Improve tests.
    • Something else useful and interesting.
    Hopefully someone will find something here that piques their interest.
  • The POE project has several tasks that would make good Summer of Code projects. For example:
    • Bring POE closer to 1.0. One of the documented goals for POE 1.0 is test coverage as complete as humanly possible. We're currently at 54%. Bringing it up to 80% is probably a realistic goal for the Summer of Code.
    • Implement base classes for POE::Stage. POE::Stage is designed to be a standard class library for POE components. It encapsulates some of the strongest component design patterns that have emerged in eight years of POE development, but it lacks many of the basic features POE already provides. The goal is to duplicate POE's feature set in POE::Stage. Luckily POE::Stage delegates most of the hard work to POE itself, so the bulk of the work will be designing and writing interfaces. POE::Stage is on the CPAN, and its project is hosted at http://thirdlobe.com/.

    You can discuss your ideas with the POE developers on #poe on irc.perl.org

  • Reimplement the DBI v1 API in Pugs
    Design an implementation of the DBI API in Perl 6 using Pugs. The goal is to maintain the familiar DBI API while radically refactoring the internals to make best use of Perl 6 and so enable greater functionality and extensibility. (Likely mentor: Tim Bunce)
  • Create a "Web Application Toolkit" for the NMS project.
    NMS has been very successful at providing well-written re-implementations of certain popular but insecure CGI scripts, but the requirement that they be "drop-in" replacements has constrained what could be done. The next phase is to provide entirely new programs based around a "Web Application Toolkit". The toolkit should offer a wide variety of functionality without requiring any programming knowledge on the part of the person configuring it, and it should be simple to install on the majority of shared web hosts that support Perl CGI programs. The work has already started: TFmail is the recommended replacement for the FormMail program.
    The project would be to write this toolkit and scripts that use it. The toolkit should incorporate all of TFmail's functionality, but also support configurations such as Guestbook, Web Discussion Board, Content Search, Link Exchange, or any combination of these. It is likely to need code to deal with:
    • Authentication
    • Rate limiting of requests
    • Session management
    • Secure file handling
    • Generalized and flexible error reporting
    • Message threading
    • Improved templating engine
    See http://nms-cgi.sourceforge.net/phase2.html for more information. (likely mentors: Dave Cross, Jonathan Stowe)
  • Using Parrot and the compiler tools, write a parser for C header files that can generate Perl 5 NCI or Parrot NCI code.
    It should let the developer write an SDL binding for either language by running the parser against SDL.h and its friends. For output, it would generate either or both Perl 5 and Parrot code that:
    • opens the appropriate library with NCI or Parrot NCI
    • binds to the functions with the appropriate data types
    • declares the defined structs appropriately
    Developers should be able to include this generated code in their distributions so that end users never have to run the compiler tools themselves, or even have the headers for the libraries installed. End users will, of course, still need NCI and the bound libraries installed, but not the headers or a C compiler.
  • Implement a Perl 5 regular expression module for Parrot along the same lines as the Perl 6 grammar engine. This will allow the use of current Perl 5 regular expression syntax with the new engine.
    There is an initial Perl 5 syntax parser to build from, which already passes some 600 tests, although the Perl 5 test suite provides plenty more! Of course, the tests it does not yet pass are the syntactically more complex ones (i.e., (?...) extensions) or those that are otherwise more involved to implement, so this should provide plenty of interest. Once the syntax is fully implemented, the next stage is to analyze the optimizations in the Perl 5 engine and start porting them over. (likely mentor: Patrick Michaud)
  • A Perl 5 to Perl 6 translator.
    As the first stage of translating Perl 5 to Perl 6, Larry Wall has been working on adapting the existing Perl 5 parser to output a complete abstract syntax tree, something it had never needed to do before. The current implementation can convert about 75% of the core Perl 5 distribution back into byte-for-byte identical source code (i.e. everything is restored, including the exact whitespace and comment text normally discarded by a tokenizer). His work has been merged back into the core Perl 5 development tree. What is needed is for someone to complete this first-stage, proof-of-concept identity transformation, then work with the Perl 5 and Perl 6 teams on converting the code from generating Perl 5 output to generating Perl 6 output.
  • Improve Parrot's Tree Grammar Engine:
    • allow complex node selectors in rules
    • find and implement a rule syntax other than PIR
    • complete and document the default PAST and POST objects
    • make the default objects available to all projects
    • refactor out a default compiler framework for all projects
    • optimize for memory use
    • allow interpretation as well as compilation
    • create a system that provides scaffolding of a new language implementation
    • create default tree optimizations (SSA would be nice)
    • create default POST -> PIR rules
  • Benchmarks for the Parrot Grammar Engine. This will involve developing profiling and other code analysis tools for Parrot, as well as investigating approaches to improve the algorithms used in the engine and their implementations. (likely mentor: Patrick Michaud)
  • LLVM <-> Parrot interop (likely mentor: Leopold Tötsch)
    • Parrot as LLVM backend (LLVM -> PASM)
    • PIR -> LLVM (not really, but evaluate optimizations)
    • Implement well-known optimizations in PIR compiler (SSA -> register allocation)
    • Runtime & offline bytecode optimization
  • Create a compiler for LISS (Language of Integers, Sequences and Sets, an educational language developed at Universidade do Minho) that targets Parrot. The expected deliverables are the compiler (runnable on the three major architectures) and a tutorial/how-to explaining the compiler architecture, to be added to the Parrot documentation as an example compiler. (Likely mentor: Alberto Simões or Leopold Tötsch)
  • Parrot JIT: improve and complete the JIT for some set of platforms, anywhere from Alpha to Z8000. This can be split into several pieces, one per platform.
  • Parrot threads: implement STM for Parrot (software transactional memory, as in GHC, et al.)
  • Any collection of items from the Perl 5 TODO list that would combine to a project of a suitable length. Good examples:
    • "A decent benchmark" + "Profile Perl"
      Measure the performance of the Perl interpreter using free tools such as cachegrind, gprof, and dtrace, and work to reduce the bottlenecks they reveal.
    • "The regexp optimizer is not optional" + "Make the peephole optimizer optional"
      Perl 5's parser and the Perl 5 regexp engine both have optimizers. However, both optimizers are somewhat misnamed: as well as performing optimizations, they also perform transformations that are essential parts of the compilation process. This means that they can't currently be disabled. Each would benefit from being split into two parts, separating the essential compilation-related transformations from the non-essential optimizations. The optimizations could then be made optional, making the occasional optimizer bug easier to trace and eliminate.
    • "iCOW" -- a plan for Copy-On-Write to speed up thread creation.
      Perl 5's threading model runs each thread in a distinct interpreter. This has the benefit of low contention between threads, but the disadvantage that thread creation is slow. Sarathy and Arthur have a design for a Copy-On-Write system which will work across threads, allowing thread creation times to be dramatically reduced. The first stage of this project would be to test out the new model by implementing Copy-On-Write within a single thread. With the technology proven and debugged, the project would then move on to implementing the full COW semantics for thread creation.
    • "Properly Unicode safe tokeniser and pads"
      The tokeniser isn't actually very UTF-8 clean. Whilst Perl provides use utf8; to allow scripts to be written with variable names in UTF-8, globals are actually stored in stashes as raw bytes, without the internal UTF-8 flag set. Lexicals are stored in "PAD"s, and the API to access them is all bytes. If the input file handle for the program source code is marked by the IO system as UTF-8, the tokenizer ignores this. All this should be fixed.
    • "Attach/detach debugger from running program"
      Debuggers such as gdb, and other tools such as truss, can be attached to an existing running program. It should be possible to provide similar functionality with the Perl debugger, which would allow long-running processes to be inspected without restarting them. On Unix, ssh and screen address the security concerns of connecting unrelated processes by using named pipes in /tmp; perhaps Perl can do the same. This project is likely to evolve into many more debugger-related improvements as well.
    • "fix tainting bugs" + "Make tainting consistent"
      Running the test suite with the taint warnings enabled reveals that many tests and some core modules are not written to be taint safe. It would be good for someone to work through these, identify and address the issues, so that as much of the core as possible can be used under the added protection that -T provides.
      Related to this, tainting is documented as being allowed to take shortcuts, which lets the implementation be more efficient. As computers have become faster but humans remain immune to Moore's law, this tradeoff is no longer optimal. The implementation of tainting should be changed so that taint propagates exactly in step with the data, so that programmers don't get hard-to-understand surprises when working with tainting enabled.
  • Polish up and finish off Perl 5 Native Call Interface. (likely mentor: chromatic).
  • Automated testing for binary compatibility between versions of Perl.
    Stable minor releases of Perl should be binary compatible, but there is no formal testing of this. The project would develop a testing architecture to verify that XS modules compiled with earlier minor versions still pass all their tests when run with a Perl binary built from the maintenance branch checked out of version control. The initial challenge is automating the system well enough that it runs reliably without supervision and without filling the local disks. Once that's done, the interesting part can be tackled, namely correctly identifying all the test failure modes:
    • When a module fails its tests with an existing released Perl version, is that the module author's fault, or a local setup problem?
    • When an already built module's tests fail with the new Perl version, is that really a binary compatibility problem, or do they also fail if the module is built from clean with the new Perl?
    • Is the observed failure actually because a dependency failed in the same way?
    • Do you get the same results when you build dependencies with the old Perl but the module in question with the new Perl?
    And all this while minimizing both false positives (reporting a binary compatibility problem where none exists) and false negatives (missing one). Doing all this would also provide very thorough testing of upcoming Perl releases against the CPAN codebase.
  • Work on the Perl Image Testing Architecture, intended to comprehensively solve the problem of mass testing Perl packages against a wide variety of different platforms (possible mentor: Adam Kennedy).
  • Build tools capable of migrating the Perl source repository from Perforce to Subversion.
    This isn't a trivial task to get right. Existing tools such as VCP are great for migrating most repositories, but every tool chokes when asked to migrate the nearly 30000 changes racked up over the past 9 years in the Perforce repository that holds the core Perl source code. In particular, they fail to preserve the existing complex branching history, which we need to retain for code auditing purposes. The task would begin by researching the limits of the existing migration tools, learning what they do, and deciding whether it is preferable to enhance them or instead steal their best ideas. The bulk of the project would be an iterative cycle of battle testing: running a migration, evaluating the corner cases where it failed (there is scope for automation here too), then working out how to enhance the migration tools to cope. Getting the repository migrated is merely a side effect; the real deliverables are reliable tools capable of manipulating and migrating everyone else's complex repositories.
  • Ajax library for Template Toolkit, akin to the Ajax support in Ruby on Rails (likely mentor: the Catalyst project).
  • Build a suite of tools around wxWidgets for cross-platform GUI apps with low overhead (i.e., don't require Tk or GNOME overhead).
  • A Perl SOAP library including WSDL support.
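The "Web GUI for svk users" idea mentions a web implementation of three-way merge. The core decision rule of such a merge can be sketched as below. This is a minimal Python illustration, not svk's merge code, and it assumes the three versions are already line-aligned (edits only replace lines, never insert or delete them); a real diff3-style merge first aligns base, ours and theirs with a diff algorithm. The function name `merge3_aligned` is hypothetical.

```python
from typing import List, Tuple, Union

# A conflict records (base, ours, theirs) for a line both sides changed differently.
Conflict = Tuple[str, str, str]


def merge3_aligned(base: List[str], ours: List[str], theirs: List[str]) -> List[Union[str, Conflict]]:
    """Line-by-line three-way merge of three line-aligned versions (simplified)."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles line-aligned versions")
    merged: List[Union[str, Conflict]] = []
    for b, o, t in zip(base, ours, theirs):
        if o == t:        # both sides agree (unchanged, or changed identically)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed it differently: leave it for a human
            merged.append((b, o, t))
    return merged
```

For example, merging `["A", "b", "c"]` and `["a", "b", "C"]` against the base `["a", "b", "c"]` yields `["A", "b", "C"]`, while a line edited differently on both sides comes back as a `(base, ours, theirs)` tuple that a web UI could present for manual resolution.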
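Mutation coverage, one of the Devel::Cover ideas, asks whether the tests would notice if the code were subtly wrong: each small change ("mutant") that still passes the test suite marks a gap that statement coverage alone cannot reveal. The idea can be sketched in Python with the standard `ast` module. This is not Devel::Cover's design; the code under test (`is_adult`) and the single `>=` to `>` mutation operator are illustrative assumptions.

```python
import ast
from typing import Callable, Dict, List

# Hypothetical code under test, kept as source text so it can be mutated.
SOURCE = """
def is_adult(age):
    return age >= 18
"""

Namespace = Dict[str, object]
Test = Callable[[Namespace], bool]


class BoundaryMutator(ast.NodeTransformer):
    """Mutation operator: turn every '>=' comparison into '>'."""

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        node.ops = [ast.Gt() if isinstance(op, ast.GtE) else op for op in node.ops]
        return node


def load(tree: ast.Module) -> Namespace:
    # Compile the (possibly mutated) tree and return the names it defines.
    namespace: Namespace = {}
    exec(compile(ast.fix_missing_locations(tree), "<mutant>", "exec"), namespace)
    return namespace


def passes(namespace: Namespace, tests: List[Test]) -> bool:
    return all(test(namespace) for test in tests)


def mutant_survives(tests: List[Test]) -> bool:
    """Return True if the mutant passes every test, i.e. the tests missed the change."""
    if not passes(load(ast.parse(SOURCE)), tests):
        raise AssertionError("tests must pass on the unmutated code first")
    mutant = BoundaryMutator().visit(ast.parse(SOURCE))
    return passes(load(mutant), tests)
```

With tests for only `is_adult(30)` and `is_adult(5)`, every line is executed, yet the boundary mutant survives; adding a test for `is_adult(18)` kills it.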
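The first stage of the "iCOW" idea is Copy-On-Write within a single thread. The mechanism can be sketched as follows: copies share one buffer plus a count of owners, and a writer takes a private copy only while the buffer is still shared. This is a toy Python model with hypothetical class names, not Perl's internals; it is not thread-safe, and a cross-thread version would need atomic reference counts or locking.

```python
from typing import List


class _Storage:
    """Backing buffer shared by CowArray handles, plus how many handles use it."""

    def __init__(self, data: List[int]) -> None:
        self.data = data
        self.owners = 1


class CowArray:
    """Toy copy-on-write array: clones share storage until one of them writes."""

    def __init__(self, data: List[int]) -> None:
        self._s = _Storage(list(data))

    def clone(self) -> "CowArray":
        # Cheap "copy": no elements are copied, only the owner count changes.
        other = CowArray.__new__(CowArray)
        other._s = self._s
        self._s.owners += 1
        return other

    def __getitem__(self, i: int) -> int:
        return self._s.data[i]

    def __setitem__(self, i: int, value: int) -> None:
        if self._s.owners > 1:
            # Still shared: detach onto a private copy before writing.
            self._s.owners -= 1
            self._s = _Storage(list(self._s.data))
        self._s.data[i] = value

    def shares_storage_with(self, other: "CowArray") -> bool:
        return self._s is other._s
```

The saving comes from `clone()`: it does constant work regardless of the buffer size, and the copying cost is paid only for data that is actually modified afterwards.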
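The "Make tainting consistent" item asks for taint to be propagated exactly in step with data. One way to picture that rule is a value that carries its taint flag through every operation, as in this Python sketch. The `Value` class, `run_command` and `untaint_if_matches` are hypothetical illustrations rather than Perl's implementation, although the last loosely mirrors Perl's practice of untainting data by extracting it through a validating regular expression match.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Value:
    """A string plus a flag recording whether it came from outside the program."""
    text: str
    tainted: bool = False

    def __add__(self, other: Union["Value", str]) -> "Value":
        other_v = other if isinstance(other, Value) else Value(other)
        # Exact propagation: the result is tainted iff either input is.
        return Value(self.text + other_v.text, self.tainted or other_v.tainted)

    def __radd__(self, other: str) -> "Value":
        # Handles "literal" + Value, which would otherwise lose the flag.
        return Value(other) + self

    def upper(self) -> "Value":
        # Transformations keep the taint of their input.
        return Value(self.text.upper(), self.tainted)

    def untaint_if_matches(self, pattern: str) -> "Value":
        # Hypothetical analogue of untainting via a validating regex match.
        match = re.fullmatch(pattern, self.text)
        if match is None:
            raise ValueError("refusing to untaint data that fails validation")
        return Value(match.group(0), tainted=False)


def run_command(cmd: Value) -> str:
    """Hypothetical sensitive operation: rejects tainted input."""
    if cmd.tainted:
        raise PermissionError(f"insecure dependency in command: {cmd.text!r}")
    return f"would run: {cmd.text}"
```

Because every operation derives the flag from its inputs, there is no shortcut path where tainted data quietly loses its mark, which is the kind of surprise the idea aims to eliminate.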
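The failure-mode questions in the binary compatibility idea amount to a decision procedure over a few test runs per module. A minimal Python sketch of that triage follows. The `Results` fields are hypothetical names for observations a harness would have to collect, and the fourth question (mixing old-built dependencies with a newly built module) is left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class Results:
    """Hypothetical observations for one XS module (True means all tests passed)."""
    old_build_old_perl: bool  # built and tested with the released Perl
    old_build_new_perl: bool  # same binaries, tested with the new Perl binary
    new_build_new_perl: bool  # rebuilt from clean with the new Perl, then tested
    dependency_failed: bool   # a prerequisite already failed in the same way


def classify(r: Results) -> str:
    if not r.old_build_old_perl:
        # Broken before the new Perl was involved.
        return "pre-existing failure: module bug or local setup problem"
    if r.old_build_new_perl:
        return "ok"
    if r.dependency_failed:
        return "inherited from a failing dependency"
    if not r.new_build_new_perl:
        # A clean rebuild fails too, so the old binaries are not the cause.
        return "fails even when rebuilt: behaviour change, not binary compatibility"
    # Only the old binaries fail under the new Perl.
    return "binary compatibility break"
```

Ordering the checks this way keeps false positives down: a failure is reported as a binary compatibility break only after the pre-existing, dependency and rebuild explanations have all been ruled out.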
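Software transactional memory, as proposed for Parrot threads, lets code run a block "atomically": reads and writes are logged, and at commit time the transaction checks that nothing it read has changed, retrying from scratch if it has. The toy Python sketch below shows that optimistic scheme. It is not GHC's or Parrot's implementation: the names are hypothetical, a single global lock serializes commits, and a running transaction may briefly observe inconsistent values before it is rejected at commit.

```python
import threading
from typing import Any, Callable, Dict


class TVar:
    """A transactional variable: a value plus a version number."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self.version = 0


class _Conflict(Exception):
    """Raised when a transaction's reads are no longer valid."""


_commit_lock = threading.Lock()  # simplification: one global lock for commits


class Transaction:
    def __init__(self) -> None:
        self.reads: Dict[TVar, int] = {}   # TVar -> version first observed
        self.writes: Dict[TVar, Any] = {}  # TVar -> tentative new value

    def read(self, tv: TVar) -> Any:
        if tv in self.writes:
            return self.writes[tv]
        with _commit_lock:  # snapshot value and version together
            value, version = tv.value, tv.version
        if self.reads.setdefault(tv, version) != version:
            raise _Conflict()  # changed since this transaction first read it
        return value

    def write(self, tv: TVar, value: Any) -> None:
        self.writes[tv] = value

    def commit(self) -> None:
        with _commit_lock:
            # Validate: everything read must still be at the version seen.
            for tv, version in self.reads.items():
                if tv.version != version:
                    raise _Conflict()
            for tv, value in self.writes.items():
                tv.value = value
                tv.version += 1


def atomically(fn: Callable[[Transaction], Any]) -> Any:
    """Run fn as a transaction, retrying from scratch on conflict."""
    while True:
        tx = Transaction()
        try:
            result = fn(tx)
            tx.commit()
            return result
        except _Conflict:
            continue
```

Running a transfer between two `TVar`s from several threads at once should never lose an update, because any transaction whose reads were overtaken by another commit is discarded and re-run.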