Google Summer of Code 2012
List of ideas and other information for Google Summer of Code 2012.
- Mentoring organizations
- List of Ideas
- NDT, M-Lab
- DONAR, Princeton
- Paris Traceroute, UMPC Sorbonne Universites
Mentoring organizations
M-Lab is a collaborative effort aimed at empowering users, researchers, and regulators with good data on scientific performance. M-Lab was founded by Vint Cerf and a large consortium of academic and industry partners. It operates as a consortium, with many organizations contributing work that furthers M-Lab’s mission. More information about M-Lab at measurementlab.net.
In the spirit of collaboration that defines M-Lab, we are acting as an umbrella organization for a number of GSoC projects that contribute to M-Lab and that are mentored by different organizations. M-Lab acts as the primary PoC, coordinating these different efforts. Each organization provides mentors specific to their own project(s).
List of ideas
(1) NDT
NDT is a TCP diagnostic tool developed by Internet2 and runs on the M-Lab platform. During each test run, NDT collects over 150 variables representing full TCP states, in addition to TCP dumps and other device-specific information. This means that NDT can provide data not only on *what* broadband performance occurred, but also on *why* specific performance was evidenced, and, implicitly, what could be done to improve it. NDT’s rich data is at the heart of a huge (+500TB), open, globally comparable repository of measurement data that has increasingly been used to set the tone of the debate, in favor of transparency, regarding broadband regulation and measurement.
- Homepage: http://measurementlab.net/
- IRC: irc://#m-lab@irc.freenode.org
- Mailing list: ndt-users at internet2 dot edu
(1.1) Add path information to NDT.
Brief description
An NDT test measures end-to-end performance between an NDT client and an M-Lab server.
The project proposal is to extend NDT to also collect information about the client-server path used by the test (e.g. by running a traceroute from/to the client to/from the server).
Having such information would allow to correlate end-to-end performance with specific network characteristics and, therefore, to answer questions like:
- How much is the performance of an end-user affected by its access network? How much is it affected by inter-ISP peering relationships?
- What’s the relation between end-to-end performance and path length?
- Which ISP in a country provides best access or transit (for TCP traffic)?
Given that all the data collected by NDT is made publicly available, researchers, regulators and the whole Internet community will benefit from this path data.
Expected results
New version of the NDT client, which runs one or more traceroutes as part of the test. Possibly correla the paths with end-to-end performance, as measured by NDT.
Develop this idea as a library. This way, it will be possible to also intergrate it within other M-Lab tools.
Our preference is to use Paris Traceroute to the standard traceroute. Development on the Paris Traceroute is also part of this M-Lab Summer of Code application (see below).
Knowledge prerequisite
- Java, (possibly) JavaScript, (possibly) HTML5.
- Basic experience with networking.
Mentor
- Thomas Gideon, New America Foundation’s Open Technology Initiative
(1.2) Display NDT test results along with wider context.
Brief description
The current NDT clients display a single result -- those of the test just concluded.
Displaying the addition of data from tests conducted near a given tester can provide a richer context for understanding a single result vs. some relevant aggregate, like a median or average throughput.
Displaying historical test results for a given IP address at the time of producing new test results can provide a different, equally informative context in terms of identifying trends in a link’s performance over time.
The ultimate goal is to provide NDT users with contextual information that can allow them to get a better understanding of the performance of their Internet connections, compared to other users in the same area as well as their own performance at different points in time.
Expected results
New version of the NDT client that also displays results
- From tests conducted by users in the same area and same ISP.
- From tests conducted in the past by the same user.
- And more ...
Examples of the kinds of context this project may produce can be seen at the following sites.
- https://sites.google.com/site/mlabcharts/eu-hackathon-results/android-uis
- http://www.youtube.com/my_speed
Knowledge prerequisite
- Java, (possibly) JavaScript, (possibly) HTML5 for customizing existing NDT clients.
- Experience with networking.
- Experience with data mining and big data preferred.
Mentor
- Thomas Gideon, New America Foundation’s Open Technology Initiative
(1.3) Develop iPhone client for NDT.
Brief description
The kinds of diagnostics and measurements NDT performs are increasingly useful to emerging technologies, like mobile broadband networks. NDT currently includes an Android client. Expanding the reach of NDT measurements to other mobile platforms will help produce a more complete picture of performance and capabilities.
Expected results
A mobile client suitable for distribution to iOS users that has a UI comparable to the beta 2 version of the Android NDT client. The iOS client should present the same computed summary results as well as allowing the user to review the more detailed measurement data for their individual test from which these metrics are calculated.
Knowledge prerequisite
- Objective-C, iOS development.
- C in order to extract or port the CLI version of the NDT client for use on iOS.
Mentor
- Thomas Gideon, New America Foundation’s Open Technology Initiative
(1.4) Develop desktop NDT client.
Brief description
The current web based version of the NDT client is most useful for single, on-demand tests. For a user who wants to build up a view of their connection’s performance over time, it would be ideal to take advantage of the capabilities of a full, desktop version that could run periodically using the target OS’s scheduling capabilities.
Expected results
Clients for Windows, OS X and Linux that can be installed and configured to perform NDTs battery of tests on a user configurable frequency. The client should include an affordance for viewing historical results to represent the data as it accumulates over time.
Knowledge prerequisite
- Rich client application development.
- Experience with a portable application framework, e.g. Air, JavaFX, Qt, GTK+, et. al.
Mentor
- Thomas Gideon, New America Foundation’s Open Technology Initiative
(1.5) Research and prototype NDT tests using multiple TCP connections.
Brief description
NDT currently utilizes only a single TCP connection for testing a network connection. Most modern network aware applications can take advantage of multithreading to utilize parallel connections at the same time. Ideally NDT should be able to operate in a similar fashion, to measure a connection while multiple connections are being used at the same time to compare and contrast with the data produced under the current result.
Expected results
This task may end up being mostly research with development efforts being more exploratory, in the form of a proof of concept or prototype. Many questions will be answered, in terms of how multiple streams affect the results of a given test and how using them impacts server resources and the ability to scale the M-Lab platform to better handle testing of this type.
Knowledge prerequisite
- Familiarity with network measurement and TCP/IP.
- C programming skills to work with the exist NDT client and server code.
- Multi-threaded and network programming in C to work through building simple multiple stream support on top of the existing NDT code.
Mentor
- Thomas Gideon, New America Foundation’s Open Technology Initiative
(2) DONAR, Princeton
With the advent of cloud computing and the growth of popular Web services, many networked applications are replicated at multiple geographic locations. Such distributed services face the challenge of server selection — that is, directing an incoming client request to the appropriate server or data center, in the hope of reducing network latency or carefully tuning server loads. M-Lab runs on a federated network of world-wide machines. Thus, optimal server selection is critical to ensure the accuracy and performance of the tools and diagnostics which are built on the M-Lab platform.
DONAR is a distributed system that provides name resolution and server selection for M-Lab experiments. It provides replica registration through a service API and offers client routing to replicas via DNS. DONAR explores many research issues surrounding mutli-replica services, including API design and optimal client-server assignment.
- Homepage: http://donardns.org/
- IRC: irc://#m-lab@irc.freenode.org
- Mailing list: m-lab-dev at measurementlab dot net
(2.1) Add IPv6 support to DONAR.
Brief description
The current version of DONAR does not support IPv6. This project involves extending DONAR to support IPv6 by modifying its storage layer and updating its registration API to let services register IPv6 addresses. This would also involve modifying DONAR’s interface to the underlying DNS serving engine (powerDNS).
Expected results
A version of DONAR that supports IPv6.
Knowledge prerequisite
- Java (at least intermediate level - some experience with concurrent programming in java required)
- Knowledge about how DNS works (including AAAA resolution)
- Development on Linux/UNIX (at least passing familiarity with BASH scripting)
- Basic knowledge of IPv6
Mentor
- Patrick Wendell, DONAR developer and Berkeley CS Ph.D
(2.2) Add server selection mechanism based on RTT.
Brief description
DONAR facilitates server selection based on various metrics. In particular, the metrics currently implemented are geographical distance and server load.
A number of tools running on M-Lab are latency sensitive and prefer to select a server based on expected latency, which isn’t always correlated exactly with geographic distance.
This project would involve adding an RTT-based server selection metric to DONAR, in order to increase the accuracy of server-selection for the latency-sensitive M-Lab tools.
Most likely this project would also involve synthesizing data from existing sources in real-time and integrating them into DONAR’s serving engine.
Expected results
The goal of this project is to have a working version of DONAR that also supports a new metric that can allow to choose an M-Lab server based on the latency between a client and the server.
Knowledge prerequisite
- Java (at least intermediate level - some experience with concurrent programming in java required).
- Knowledge about how DNS works.
- Development on Linux/UNIX (at least passing familiarity with BASH scripting)
- Knowledge of networking and network protocols.
- Familiarity or experience with network measurement.
Mentor
- Patrick Wendell, DONAR developer and Berkeley CS Ph.D
(2.3) Analize DONAR's usage trends.
Brief description
Improve DONAR’s data collection and logging and analyze the data that DONAR is collecting via M-Lab tools’ use in order to understand usage trends. Such analysis will guide us in making changes to the deployment of M-Lab services or the design of DONAR.
Expected results
A better understanding of DONAR’s usage behavior in the M-Lab context, and a well-informed ability to design changes and improvements based on this knowledge.
Knowledge prerequisite
- Basic Java experience.
- Knowledge about how DNS works.
- Data analysis skills.
- Development on Linux/UNIX (at least passing familiarity with BASH scripting).
- Knowledge of networking and network protocols.
Mentor
- Patrick Wendell, DONAR developer and Berkeley CS Ph.D
(2.4) Increase robustness and durability of DONAR.
Brief description
DONAR began as a research project out of Princeton University, but it is being used in production contexts on M-Lab. In fact, DONAR currently servers more than 500k tests per day. A variety of small additions, such as better testing, monitoring, and administrative interface development, would vastly improve the robustness of DONAR.
Expected results
Improve overall code quality of DONAR and find latent correctness bugs in the DONAR code.
Knowledge prerequisite
- Java (at least intermediate level - some experience with concurrent programming in java required)
- Knowledge about how DNS works.
- Development on Linux/UNIX (at least passing familiarity with BASH scripting).
- Experience writing testable, extensible code. General understanding of unit tests.
- Knowledge of networking and network protocols.
Mentor
- Patrick Wendell, DONAR developer and Berkeley CS Ph.D
(3) Paris Traceroute, UPMC
UPMC Sorbonne Universites is France’s leading technical university in science, engineering, and medicine, with over 30,000 students at its campuses in central Paris. It can be thought of as the Sorbonne’s “technical wing”. Its networking group is working on Paris Traceroute, a fundamental upgrade to the Traceroute tool that currently exists in all major operating systems.
The standard Traceroute was not designed to take into account the presence of load balancing routers, which are widely deployed in today’s Internet, both at the core and at its edges. As a result, the standard Traceroute is not aware that there are often multiple paths between a source and a destination in the Internet: the paths will split at a load balancing router at one point along the route and then re-converge some hops later. Standard traceroute is incapable of furnishing this information to its users and instead reports what it states to be a single path but is instead a confusing mixture of pieces of multiple paths. Paris Traceroute is aware of the multiple paths and can report on any single one of them accurately, as well as on all of them. The UPMC team has developed Paris Traceroute and released it as free, open source software. Its impact in the network measurement research community has been felt broadly, as evidenced by well over 100 scientific papers that a casual search for “Paris Traceroute” will reveal. There is a “paris-traceroute” package available at all of the official Debian repositories.
While Paris Traceroute is currently used by researchers (see the magnitude of scientific papers), it could be made much more flexible, and therefore useful. Notably, we are well advanced on a fundamental refactoring of the code into a modular library that will allow a variety of tools, beyond the standard command-line tool, to use Paris Traceroute functionality. Researchers will be able to use the library to control the low level processes of probe packet formation, sending and receiving, or make use of the high level algorithms for sending sufficient numbers of packets to ensure a high level of confidence that all paths between a source and a destination have truly been discovered. Improving the code and utility of Paris Traceroute will also allow to more easily integrate the Paris Traceroute into M-Lab tools.
- Homepage: http://www.paris-traceroute.net/
- Code development site: http://code.google.com/p/paris-traceroute/
- IRC: irc://#paris-traceroute@irc.freenode.org
- Mailing list: paris-traceroute at googlegroups dot com
(3.1) Development of the library core
The libparistraceroute library is organized into a core and a set of modules. For a high level picture of how the library is organized, please see the Overview section. The first set of ideas concerns the core.
(3.1.1) Complete the core
Brief description
The central development task, on which all other tasks are based, is to complete the development of the core. While it is already well advanced, work is still required to polish the base functionalities and tidy up the interface that the library offers to its users.
Expected results
A first release of libparistraceroute.
Knowledge prerequisite
- C programming
- Development on Linux/UNIX
- Basic knowledge of active network measurements
Mentors
- Jordan Augé (UPMC Sorbonne Universités)
- Marc-Olivier Buob (UPMC Sorbonne Universités)
(3.1.2) Build probes according to a set of constraints
Brief description
At present, the library offers a simple abstraction through which probe packets are constructed. It allows an application to describe the stack of protocols, assign values to fields, and provide the payload. The next step will be to allow applications to delegate more of the packet construction labor to the core. The application would simply define a set of predicates that constrain the way that a probe packet is forged, and the core would do the rest.
For instance, suppose that we detect a load balancing router during a traceroute measurement (as indicated by an interface followed by several successor interfaces encountered during a trace towards a target destination). The router could be a per-flow load balancer, a per-packet load balancer, or a per-destination load balancer. In order to know whether a node corresponds to a per-packet load balancer, we must forge a set of probes belong to the same flow (same source IP, destination IP, source port, destination port and protocol). The constraints on these probes could be modeled using predicates.
There are several other applications that could be implemented thanks to predicates. A port scanning application is one example.
Expected results
A set of functions that allow to design easily any probe packet as easily as possible.
Knowledge prerequisite
- C programming
- Development on Linux/UNIX
- Knowledge of networking and network protocols.
Mentors
- Jordan Augé (UPMC Sorbonne Universités)
- Marc-Olivier Buob (UPMC Sorbonne Universités)
(3.1.3) Develop schedulers and rate limits
Brief description
The library currently sends a packet immediately after it has been forged. We would like to be able to assign a precise sending time to each packet. Goals include allowing the sending of packets according to a given delay distribution, or allowing an application to replace the simple FIFO queue that is used for sending packets with a more complex scheduler. The mechanism could integrate a rate limiting process that would protect the user and the network by capping the amount of traffic being sent (during a time slot, towards a destination, etc.).
Expected results
A mechanism that schedule and regulate the probe packets sent.
Knowledge prerequisite
- C programming
- Development on Linux/UNIX
Mentors
- Jordan Augé (UPMC Sorbonne Universités)
- Marc-Olivier Buob (UPMC Sorbonne Universités)
(3.1.4) Enhance the performance of the library
Brief description
Currently, all the processing to craft and send probes is managed in a single thread. The idea is to make the following improvements to library performance:
- allow multiple threads to coexist and make data structures used by the different algorithms thread-safe
- introduce a preliminary phase during which a packet is built so that it can be sent more promptly at the required moment, or even sent many times
Expected results
A thread-safe design of the library.
Knowledge prerequisite
- C programming (threads...)
- Development on Linux/UNIX
Mentors
- Jordan Augé (UPMC Sorbonne Universités)
- Marc-Olivier Buob (UPMC Sorbonne Universités)
(3.2) Development of library modules
The organization of libparistraceroute around a set of modules, each one for an independent aspect, such as a protocol or an algorithm, makes the library highly flexible. The following are ideas for enlarging the scope of the library by bringing in new protocols and functionalities.
(3.2.1) Development of protocol modules
Brief description
Through a protocol module, the library both crafts packets from a probe description, and determines which sent probe corresponds to a received probe reply. Currently, the library has IPv4, UDP, and ICMP modules, which together allow for the sending of UDP traceroute probes and the parsing of the corresponding ICMP replies. The idea is to develop modules that support IPv6, and/or TCP, which is more complex to handle.
Expected results
Support of IPv6 and TCP in libparistraceroute.
Knowledge prerequisite
- C programming
- Familiarity with IPv6 and TCP/IP.
- Development on Linux/UNIX
Mentors
- Jordan Augé (UPMC Sorbonne Universités)
- Marc-Olivier Buob (UPMC Sorbonne Universités)
(3.2.2) Development of algorithm modules
Brief description
Node and link queries
Today, the library has two primitives for network graph discovery: "node query" and "link query". The former sends a single probe in order to discover a router interface, and the latter sends a pair of probes in order to discover two successive interfaces. These simple algorithms are building blocks from which network probing tools can be built.
The idea is to develop a richer toolbox of primitives of this sort. One would apply the essential Paris Traceroute principle of maintaining a constant flow identifier across multiple probes. Another would discover all links along a path following a given interface. Yet others remain to be defined.
Multipath Discovery Algorithm (MDA)
The Multipath Discovery Algorithm (MDA) allows Paris Traceroute to discover all paths between a source and a destination, bounding the failure probability under some basic assumptions and returning a significance level for the result. In the process, the algorithm identifies all per-flow, per-destination and per-packet load balancing routers along those paths.
The current MDA implementation in libparistraceroute would gain much in clarity if it were to be broken down into more basic bricks, such as node and link queries (mentioned above), modules that provide statistical tests, etc. This would open the way for additional statistical algorithms to be tried in Paris Traceroute.
Expected results
A set of useful algorithm modules (link query and node query for instance) that make libparistraceroute as user-friendly as possible.
Knowledge prerequisite
- C programming
- Familiarity with network measurement.
- Development on Linux/UNIX
Mentors
- Jordan Augé (UPMC Sorbonne Universités)
- Marc-Olivier Buob (UPMC Sorbonne Universités)
(3.3) Provide Python and Ruby bindings
Brief description
Another idea would be to wrap libparistraceroute to provide Python and Ruby bindings and thus increase the visibility of the library. Ideally, these bindings should take advantage of particular language capabilities. For example, the Python binding might be inspired by scapy.
Expected results
Python and/or ruby bindings.
Knowledge prerequisite
- C programming
- python/ruby programming
- Development on Linux/UNIX
Mentors
- Jordan Augé (UPMC Sorbonne Universités)
- Marc-Olivier Buob (UPMC Sorbonne Universités)
(3.4) Packaging libparistraceroute
Brief description
Prepare a libparistraceroute-based Paris Traceroute command-line tool, of a quality sufficient to be integrated into one or more of the common variants of Linux. The current command-line tool is based on the earlier, monolithic, version of Paris Traceroute. The new tool should be thoroughly backwards-compatible, both at the command line and in the textual output, while also providing the new features that libparistraceroute offers, such as the ability to trace all the paths from a source to a destination using the MDA algorithm. In addition, the command-line tool will need to be packaged for distribution. We're looking for someone here who already has a track record as a committer to a significant open source project, preferably involving Unix system code.
Expected results
A backward-compatible traceroute tool using libparistraceroute and the corresponding Debian and RPM packages.
Knowledge prerequisite
- C programming
- Development on Linux/UNIX
- Basic knowledge of Linux packaging tools (APT and RPM).
Mentors
- Jordan Augé (UPMC Sorbonne Universités)
- Marc-Olivier Buob (UPMC Sorbonne Universités)
(3.5) Implementation of network measurement tools and algorithms
Brief description
There are several existing network measurement tools that are highly used but that do not yet benefit from the advancements of Paris Traceroute:
- Doubletree
- Reverse Traceroute
- MIDAR
- Motu
- iffinder
- kapar
- scamper
- ...
The idea is to choose any one of these tools and to incorporate libparistraceroute, either into the existing tool or by reimplementing the tool, as appropriate.
There are several considerations to keep in mind:
- It could be a convenient way to improve existing tools and to test whether an improvement is promising.
- If libparistraceroute is well-designed, we should be able to optimize these tools or get comparable performance.
- Conducting this work will be helpful in defining the set of primitives that libparistraceroute needs to support. For example, the standard Traceroute tool measures a path towards a given target IP. Such a measurement can be seen as a "per-path" query. However, the MDA algorithm or Doubletree aims at discovering paths from a set of source node towards a set of target destinations. "Per-path" queries would induce a large number of redundant measurements, because several paths share common subpaths. Thus, "per-node" and "per-link" queries are two primitives that significantly decrease the amount of probes used by these two algorithms.
Expected results
An implementation of these tools using libparistraceroute.
Knowledge prerequisite
- C programming
- Familiarity with network measurement.
- Development on Linux/UNIX
Mentors
- Jordan Augé (UPMC Sorbonne Universités)
- Marc-Olivier Buob (UPMC Sorbonne Universités)
- Timur Friedman (UPMC Sorbonne Universités)
(3.6) Develop distributed measurement techniques
Brief description
Typically, an active measurement tool enabled by libparistraceroute will be designed to run on a single machine. However, there are occasionally good reasons to design a distributed architecture for performing network measurements. We list below two applications that could be created thanks to libparistraceroute if it were to allow distributed measurements.
Expected results
A set of function in order to develop easily distributed algorithm thanks to libparistraceroute.
Knowledge prerequisite
- C programming
- Development on Linux/UNIX
- Knowledge of networking and network protocols.
Applications
Distributed traceroute measurements
As we know from the Doubletree and MDA algorithms, distributed traceroute measurements can involve considerable probe redundancy, for instance when several traceroute paths cross the same set of interfaces. In these algorithms, measurement nodes share information that allow the overall system to reduce its level of redundant measurements. One can also imagine more sophisticated approaches, and libparistraceroute should support them through its probe sender and response handler.
IP/router aliasing problem
The IP/router aliasing problem consists in discovering groups of IP addresses that all belong to the same router. iffinder is an example of a tool that tackles this problem. libparistraceroute could be used as the basis for a tool that solves this problem using a different approach.
Let us consider a A sender that emits a probe (for a given TTL and a given destination) towards a target T while spoofing the source IP address of receiver B. T will send the response to B. Thus, B collects the result of the probe sent by A and discovers an interface of T. We can iterate this approach by spoofing a set of IP addresses such that each address correspond to a a different receiver capable of handling T's responses. This would allow us to discover the set of interfaces belonging to T.
Mentors
- Jordan Augé (UPMC Sorbonne Universités)
- Marc-Olivier Buob (UPMC Sorbonne Universités)
- Timur Friedman (UPMC Sorbonne Universités)
(3.7) Looking glass servers
Brief description
A final idea is the development and deployment of Looking Glass Servers for Paris Traceroute. There are two ways for people to benefit from the improved route tracing provided by Paris Traceroute: one is for them to use a Paris Traceroute command-line tool that is installed on their own operating system, and the other is to access a remotely-deployed instance of Paris Traceroute through a website or XML-RPC. There are already widely-deployed remote traceroute facilities throughout the world known as Looking Glass servers. The idea here is to create new Looking Glass Servers that incorporate the tracing improvements of Paris Traceroute. We would first deploy them on the global PlanetLab system. After testing and debugging, we would submit them for approval to be deployed on Measurement Lab, the PlanetLab system, sponsored by Google, that is dedicated to the testing of broadly-useful network measurement tools.
Expected results
A set of function in order to develop easily distributed algorithm thanks to libparistraceroute.
Knowledge prerequisite
- Python programming
- Development on Linux/UNIX
Mentors
- Jordan Augé (UPMC Sorbonne Universités)
- Marc-Olivier Buob (UPMC Sorbonne Universités)
- Timur Friedman (UPMC Sorbonne Universités)

