Monday, July 1, 2013

Code reuse and Global Operations - Google Summer of Code Week 2

This is the second week of my Google Summer of Code project. Last week, I implemented only two small and simple code actions. This week, I got a bit more ambitious.
Technically, some of this work was finished after midnight on Sunday. I'll leave it to the reader to decide if that's "cheating".

Null coalescing and pattern matching

The first action I'll be talking about is the ConvertIfToNullCoalescing action.
//This action takes this:
object x = Foo();
if (x == null) x = Bar();

//And replaces it with this:
object x = Foo() ?? Bar();
This is a very simple action - similar to the stuff I wrote last week.
What changes this time is the heavy use of pattern matching.

With pattern matching, it's possible to define a certain pattern (e.g. if (someExpression) doSomething();) and check if a node matches that pattern. For instance, the pattern if (<Any>) { <Any> } recognizes if (true) { return 0; }, but not while (true) {} nor if (true) return 0;.

It's also possible to have named nodes. With named nodes, it's possible to know what matched certain parts of the pattern. For instance, when the pattern if (<condition> = <Any>) {} is used to match if (true) {}, the named node tells us that the condition is true.

Of course, this is a very simple example. Pattern matching gets more useful in more complex cases.
Also, note that patterns are created as normal C# objects - the string representation I used above was merely intended as an example.
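
To make the idea concrete, here is a toy sketch of pattern matching with named nodes. It is not NRefactory's API: Node, Any and PatternDemo are made-up names for illustration, and the "syntax tree" is just a kind string plus children.

```csharp
using System;
using System.Collections.Generic;

// Toy syntax tree (hypothetical, not an NRefactory type): a kind plus child nodes.
public class Node
{
    public string Kind;
    public List<Node> Children = new List<Node>();

    public Node(string kind, params Node[] children)
    {
        Kind = kind;
        Children.AddRange(children);
    }

    // A plain node matches a node of the same kind whose children match pairwise.
    public virtual bool Matches(Node other, Dictionary<string, Node> captures)
    {
        if (other == null || Kind != other.Kind || Children.Count != other.Children.Count)
            return false;
        for (int i = 0; i < Children.Count; i++)
            if (!Children[i].Matches(other.Children[i], captures))
                return false;
        return true;
    }
}

// Matches any node; when given a name, it records what it matched.
public class Any : Node
{
    private readonly string name;

    public Any(string name = null) : base("any") { this.name = name; }

    public override bool Matches(Node other, Dictionary<string, Node> captures)
    {
        if (other == null) return false;
        if (name != null) captures[name] = other;
        return true;
    }
}

public static class PatternDemo
{
    // Returns the kind of the node captured as "condition", or null if there is no match.
    public static string ConditionOf(Node statement)
    {
        // Roughly: if (<condition> = <Any>) { <Any> }
        var pattern = new Node("if", new Any("condition"), new Node("block", new Any()));
        var captures = new Dictionary<string, Node>();
        return pattern.Matches(statement, captures) ? captures["condition"].Kind : null;
    }

    public static void Main()
    {
        var ifWithBlock = new Node("if", new Node("true"), new Node("block", new Node("return 0")));
        var ifWithoutBlock = new Node("if", new Node("true"), new Node("return 0"));
        Console.WriteLine(ConditionOf(ifWithBlock));            // true
        Console.WriteLine(ConditionOf(ifWithoutBlock) == null); // True
    }
}
```

The pattern is an ordinary object graph, so building one is just a matter of calling constructors, which is the point made above about patterns being normal C# objects.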

Dispose and IDisposable

Another simple action I wrote searches for types that have a void Dispose() method but don't implement IDisposable. The issue detects and fixes the problem for classes; it is disabled for interfaces.
public class Test
{
    public void Dispose() {}
}
//Is converted to
public class Test : System.IDisposable
{
    public void Dispose() {}
}
I initially implemented the fix for interfaces as well, but ultimately decided to remove it because of a problematic edge case with explicit interface implementation:
interface IA {
    void Dispose();
}
class B : IA {
    void IA.Dispose() {}
}
//Let's say the action converted this to:
interface IA : System.IDisposable
{
}
class B : IA {
    void IA.Dispose() {} //ERROR here: IA no longer declares Dispose() itself
}
The only way for this action to be effective for explicitly implemented interfaces would be to use a global operation, and I only figured out how to do this effectively on Sunday. More on global operations later.

LINQ Query and LINQ Fluent

One of the best features of .NET and C# is LINQ.
LINQ brings a declarative/functional taste to C# and greatly improves the readability of list operations - like SQL but better.

There are many ways to use LINQ in C#.
  1. One could call the static methods of System.Linq.Enumerable directly, such as Enumerable.Where(myEnumerable, SomeFunction). This would be acceptable if there were no better way of doing things -- which there is.
  2. Use those same methods from Enumerable but in a different way. They are extension methods and, therefore, can be used as myEnumerable.Where(SomeFunction). This is much better than the first option, especially because it's a Fluent Interface and therefore method calls can be chained -- myEnumerable.Where(Foo).Select(Bar).
  3. The last option is the query syntax. Instead of using the method calls, the programmer can use special syntax designed specifically for this purpose. More on this later.
Query syntax is translated by the C# compiler to fluent syntax.
var seq = from item in myEnumerable
          where Foo(item)
          select Bar(item);

//Is converted to:
var seq = myEnumerable.Where(item => Foo(item)).Select(item => Bar(item));
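
As a side-by-side comparison, the sketch below runs the same filter-and-project operation in all three styles listed earlier. The class and method names (ThreeWaysDemo, IsEven and so on) are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ThreeWaysDemo
{
    public static bool IsEven(int n) { return n % 2 == 0; }

    // 1. Plain static calls on System.Linq.Enumerable.
    public static List<int> StaticCalls(IEnumerable<int> xs)
    {
        return Enumerable.Select(Enumerable.Where(xs, IsEven), n => n * 10).ToList();
    }

    // 2. The same methods, called as extension methods and chained.
    public static List<int> Fluent(IEnumerable<int> xs)
    {
        return xs.Where(IsEven).Select(n => n * 10).ToList();
    }

    // 3. Query syntax, which the compiler rewrites into calls like the ones in Fluent.
    public static List<int> Query(IEnumerable<int> xs)
    {
        return (from n in xs
                where IsEven(n)
                select n * 10).ToList();
    }

    public static void Main()
    {
        var xs = new[] { 1, 2, 3, 4 };
        Console.WriteLine(string.Join(",", StaticCalls(xs))); // 20,40
        Console.WriteLine(string.Join(",", Fluent(xs)));      // 20,40
        Console.WriteLine(string.Join(",", Query(xs)));       // 20,40
    }
}
```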
Because query syntax is blindly converted to method calls, it can be used for a lot more than just Enumerables. Anything with the correct methods (e.g. Where and Select) can be used with query syntax. Those methods don't need to be extension methods - normal instance methods work just fine.

And, in fact, the argument those methods take doesn't even need to be of type System.Func.
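
Here is a minimal sketch of that idea. Box<T> is a hypothetical type invented for this example: it is not an IEnumerable and has no extension methods, yet query syntax compiles against it because it has instance methods named Where and Select with suitable shapes.

```csharp
using System;

// Hypothetical container holding at most one value.
public sealed class Box<T>
{
    public T Value { get; private set; }
    public bool HasValue { get; private set; }

    public static readonly Box<T> Empty = new Box<T>();

    public Box(T value) { Value = value; HasValue = true; }
    private Box() { HasValue = false; }

    // Instance methods with the names the query translation looks for.
    public Box<T> Where(Func<T, bool> predicate)
    {
        return HasValue && predicate(Value) ? this : Empty;
    }

    public Box<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        return HasValue ? new Box<TResult>(selector(Value)) : Box<TResult>.Empty;
    }
}

public static class QueryOnCustomType
{
    public static Box<string> Describe(Box<int> box)
    {
        // Compiles to box.Where(n => n > 0).Select(n => "positive " + n)
        return from n in box
               where n > 0
               select "positive " + n;
    }

    public static void Main()
    {
        Console.WriteLine(Describe(new Box<int>(5)).Value);     // positive 5
        Console.WriteLine(Describe(new Box<int>(-5)).HasValue); // False
    }
}
```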

LINQ is not just used for enumerables. Some of the best uses of LINQ are related to databases.
int idToFind = 10;
//The query yields a sequence, so Single() picks out the one matching user
var userWithId = (from user in database.Users
                  where user.Id == idToFind
                  select new { user.Name, user.Score }).Single();
Console.WriteLine("User {0} has {1} point(s).", userWithId.Name, userWithId.Score);
//The query is translated to
int idToFind = 10;
var userWithId = database.Users.Where(user => user.Id == idToFind)
                     .Select(user => new { user.Name, user.Score })
                     .Single();
Console.WriteLine("User {0} has {1} point(s).", userWithId.Name, userWithId.Score);
In this case, database.Users is not an enumerable. It's an instance of IQueryable.

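Instances of IQueryable actually read the contents of the lambdas passed to them. The lambdas arrive as Expression Trees (data describing the code) rather than as compiled delegates, so a query provider can inspect them and translate them, for example into a database query. And it works just fine.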
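
To show what "reading the contents of a lambda" means, here is a small sketch that uses only System.Linq.Expressions. No database is involved, and ExpressionTreeDemo and DescribeBody are names made up for this example. It assumes the lambda body is a simple comparison between a parameter and a constant.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    // Because the parameter type is Expression<Func<...>>, the compiler passes the
    // lambda as a tree of objects instead of compiling it to a delegate.
    public static string DescribeBody(Expression<Func<int, bool>> predicate)
    {
        var body = (BinaryExpression) predicate.Body;
        var left = (ParameterExpression) body.Left;
        var right = (ConstantExpression) body.Right;
        return body.NodeType + " " + left.Name + " " + right.Value;
    }

    public static void Main()
    {
        Console.WriteLine(DescribeBody(id => id == 10)); // Equal id 10

        // The same tree can still be compiled and run like a normal lambda.
        Expression<Func<int, bool>> predicate = id => id == 10;
        Console.WriteLine(predicate.Compile()(10));      // True
    }
}
```

A real query provider walks trees like this one in a far more general way, but the principle is the same.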

So which is best? Query syntax or fluent syntax? It depends.
In some cases, fluent syntax is the simplest and most compact solution.
In other cases, query syntax is the more readable option - it expresses joins and other complex queries without complex method calls and anonymous types.

The action I implemented converts query syntax to fluent syntax. It should be capable of handling any query. So, whenever a query would be best written with fluent syntax, there's no need to convert it manually (and risk introducing new bugs!) - just let the code action do it.

This code action turned out to require less effort than I expected, since NRefactory already had a class to do what I needed - QueryExpressionExpander. Sadly, I didn't know about that class, so I ended up reading the relevant portion of the C# specification and writing my own code. Later, when I found out about QueryExpressionExpander, I rewrote a significant portion of my own work to use it - fixing a bug in NRefactory and adding a feature to QueryExpressionExpander along the way.

As it turns out, the C# specification is surprisingly readable. I was expecting an impenetrable wall of technical text, but instead I was greeted by a text that is simple to understand and full of examples.

Convert to Enum - Global Operations

The last action I wrote this week is again a tale of finding the right class to use.

The story of this action started with an old bug report (and by old I mean half a year old). A user with lots of Java code automatically converted to C# had lots of fields like public const int SOME_PREFIX_SOME_NAME and wanted to convert all those fields to enumerations. Code actions to the rescue!
//We have this
public const int SOME_PREFIX_A = 1;
public const int SOME_PREFIX_B = 2;
//We want this
public enum SOME_PREFIX : int
{
    A = 1,
    B = 2
}
I think it's not an overstatement to say this was the hardest action to implement so far. And I'm not even sure I'm done yet.

The main problem with this action is that references to SOME_PREFIX_A become references to SOME_PREFIX.A. This means changing files besides the one with the field declaration.
It's not a simple find-and-replace either. We don't want to change strings or comments. And we don't want to replace different fields that happen to have the same name.
As if things weren't bad enough, enumerations aren't implicitly converted to integers. The "perfect" version of this action would also replace method parameter types. This would soon become tremendously complex.
I opted to develop a more limited version of this conversion. Instead of replacing types everywhere, this just adds a cast to the enum's underlying type. So instead of SOME_PREFIX.A, we'd have ((int) SOME_PREFIX.A).
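
The sketch below shows why the cast is needed at existing call sites. It reuses the enum from the example above; Twice is a hypothetical method standing in for any code that still expects the old int constants.

```csharp
using System;

public enum SOME_PREFIX : int
{
    A = 1,
    B = 2
}

public static class EnumCastDemo
{
    // Hypothetical caller whose parameter type is left untouched by the action.
    public static int Twice(int value)
    {
        return value * 2;
    }

    public static void Main()
    {
        // Twice(SOME_PREFIX.B) would not compile: there is no implicit
        // conversion from the enum to int, so the action inserts a cast.
        Console.WriteLine(Twice((int) SOME_PREFIX.B)); // 4
    }
}
```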

So, back to global operations. A global operation is an operation that changes files other than the current one. NRefactory has a method just for that: script.DoGlobalOperationOn.

Now, it would have been just fine if I had used that method and been done with it. Unfortunately, I only discovered it (and what it really did) very late. Before finding it, I went on a tour of the MonoDevelop source code. While what I learned has helped me understand MonoDevelop better, I came full circle and ended up where I started - in NRefactory.

This code action is still not fully tested, so it might not work perfectly in some edge cases yet.

Conclusion

This week's work was harder than last week's. Still, I managed to get most of my planned work done.
Additionally, I'm gradually learning about new APIs of NRefactory, so I expect things to get easier as I get more experienced.

The main problem this week was spending time figuring out how to do things, only to find out there was an API just for that. Fortunately, the number of APIs in NRefactory is finite, so I'm hoping this situation won't last forever.

See Also

Google Summer of Code - Week 1
Last week's post

mono-soc-2013/NRefactory
As usual, my work is available in the mono-soc-2013 NRefactory github repository. Check the branches with names starting with "luiscubal-". The exception is code that's been merged back into the master branch of the official repository. Those branches have been deleted.

Download C# Language Specification 5.0 from Official Microsoft Download Center
The spec also comes bundled with Visual Studio (except Express). So if you already have it installed, just go to the VC#\Specifications\1033 folder. The exact location of this folder will depend on where you decided to install Visual Studio, but Program Files is a good place to start the search.
