In another project that I’m working on (like I don’t have enough of them) I stumbled across the MediaWiki API. MediaWiki is the workhorse wiki software that powers such sites as Wikipedia. What you may (or may not) know is that the MediaWiki API is pretty slick, offering access to any content on a MediaWiki powered site.

In this post we’ll walk through building a Windows Phone 7 app to browse the content on Wikipedia using JSON and the MediaWiki API.

First create a new Windows Phone 7 app. Any app template will do, but I find the Windows Phone Databound Application template handy because it sets up a few things for you out of the box, like a ViewModel class that's already wired into the app. It's nothing you can't do yourself, but it does save a little time.

Next you'll want to be able to read the data coming from Wikipedia. We'll be using the MediaWiki API (no download required), which can serve data up in XML format, but we'll opt for JSON. Rather than using the native JSON methods in .NET, let's use the Json.NET library, a wicked cool library by James Newton-King that makes serializing and deserializing JSON into .NET objects a breeze.

Unfortunately, at the time of this writing, the NuGet package for Json.NET doesn't install properly on Windows Phone 7 projects, so you have to download the file, unzip it, and add the references manually. Hopefully someone updates the NuGet package so this five-minute task can be avoided in the future.

The default app has a listbox whose items link to a detail page. For this sample, we'll fill the list with categories and drill into a category to display the pages associated with it. The first task is to retrieve the categories from Wikipedia. The full documentation for the API is online at MediaWiki.org. Getting the categories takes a straightforward API call that looks like this:

http://en.wikipedia.org/w/api.php?action=query&list=allcategories&format=json

The first part is where the API page is located on Wikipedia (it may be somewhere else on other MediaWiki sites, so check with the site owner). Then we specify an action, in this case a query. We ask for a list of items by specifying "allcategories", and we want the result in JSON format.
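To make the anatomy of that URL concrete, here's a minimal sketch of a helper that assembles a MediaWiki API query from name/value pairs. MediaWikiUrl and its Build method are hypothetical. They are not part of Json.NET or MediaWiki, and the only assumption about the API is the parameter layout described above. Each name and value goes through Uri.EscapeDataString so spaces or an & in a value can't break the query string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of any library): builds a MediaWiki API URL
// such as api.php?action=query&list=allcategories&format=json.
public static class MediaWikiUrl
{
    // Wikipedia's endpoint, as used in this post; other MediaWiki sites may differ.
    public const string WikipediaApi = "http://en.wikipedia.org/w/api.php";

    public static string Build(string endpoint, IEnumerable<KeyValuePair<string, string>> parameters)
    {
        // Escape every name and value, then join them as name=value pairs.
        string[] pairs = parameters
            .Select(p => Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value))
            .ToArray();
        return endpoint + "?" + string.Join("&", pairs);
    }
}
```

Adding acprop, acprefix or aclimit pairs to the list produces the longer URLs used later in this post.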

Here’s the output:

{
    "query": {
        "allcategories": [
            {
                "*": "!"
            },
            {
                "*": "!!! EPs"
            },
            {
                "*": "!!! albumns"
            },
            {
                "*": "!!! albums"
            },
            {
                "*": "!!! songs"
            },
            {
                "*": "!!AFRICA!!"
            },
            {
                "*": "!910s science fiction novels"
            },
            {
                "*": "!928 births"
            },
            {
                "*": "!936 births"
            },
            {
                "*": "!946 poems"
            }
        ]
    },
    "query-continue": {
        "allcategories": {
            "acfrom": "!949 births"
        }
    }
}

MediaWiki queries come back in two groups. First come the query results, and then a section titled "query-continue" that contains the value to start from on a subsequent query. You may need to make several queries to get everything: MediaWiki supports up to 500 items per call, but a full list (especially on a site the size of Wikipedia) can easily run into the thousands. It's up to you how to do the queries (all at once or as you go), but think of it as picking up where you left off. The default limit is 10 items, which is fine for now.
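Here's a sketch of that "pick up where you left off" loop, kept synchronous for clarity. It assumes a hypothetical fetchPage delegate. The delegate takes the continuation value, which is null on the first call and otherwise the acfrom value from the previous response's query-continue section. It makes the request with that value appended as &acfrom=... and returns one batch. On Windows Phone 7 the WebClient call is asynchronous, so in the app you'd chain the next request from the completion handler instead of looping, but the bookkeeping is the same.

```csharp
using System;
using System.Collections.Generic;

// One batch of results: the category titles plus the "acfrom" value from
// query-continue (null when the response has no query-continue section).
public class CategoryBatch
{
    public List<string> Titles = new List<string>();
    public string ContinueFrom;
}

public static class CategoryPager
{
    // Hypothetical helper: keeps calling fetchPage, feeding each response's
    // continuation value into the next request, until none remains.
    // maxCalls guards against running away on a very large wiki.
    public static List<string> FetchAll(Func<string, CategoryBatch> fetchPage, int maxCalls)
    {
        var all = new List<string>();
        string continueFrom = null;
        for (int call = 0; call < maxCalls; call++)
        {
            CategoryBatch batch = fetchPage(continueFrom);
            all.AddRange(batch.Titles);
            if (batch.ContinueFrom == null) break;   // nothing left to fetch
            continueFrom = batch.ContinueFrom;
        }
        return all;
    }
}
```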

However, the results aren't very pretty, and the list of categories is somewhat bizarre. First we'll add some more information about each category. This is done by adding more parameters to the API call:

http://en.wikipedia.org/w/api.php?action=query&list=allcategories&format=json&acprop=size&acprefix=A

All we've done is add "&acprop=size&acprefix=A" to the call. The acprop=size parameter brings in the number of pages, files, and subcategories in each category (the size property is the sum of all three). The acprefix=A parameter returns only categories that start with the letter "A", which avoids the categories named "!". Here are the results:

{
    "query": {
        "allcategories": [
            {
                "*": "A",
                "size": 0,
                "pages": 0,
                "files": 0,
                "subcats": 0
            },
            {
                "*": "A&E Network shows",
                "size": 0,
                "pages": 0,
                "files": 0,
                "subcats": 0
            },
            {
                "*": "A&E Shows",
                "size": 0,
                "pages": 0,
                "files": 0,
                "subcats": 0
            },
            {
                "*": "A&E Television Network shows",
                "size": 0,
                "pages": 0,
                "files": 0,
                "subcats": 0
            },
            {
                "*": "A&E Television Networks",
                "size": 17,
                "pages": 14,
                "files": 0,
                "subcats": 3
            },
            {
                "*": "A&E Television network shows",
                "size": 0,
                "pages": 0,
                "files": 0,
                "subcats": 0
            },
            {
                "*": "A&E network shows",
                "size": 0,
                "pages": 0,
                "files": 0,
                "subcats": 0
            },
            {
                "*": "A&E shows",
                "size": 66,
                "pages": 66,
                "files": 0,
                "subcats": 0
            },
            {
                "*": "A&E television network shows",
                "size": 0,
                "pages": 0,
                "files": 0,
                "subcats": 0
            },
            {
                "*": "A&M Records",
                "size": 0,
                "pages": 0,
                "files": 0,
                "subcats": 0
            }
        ]
    },
    "query-continue": {
        "allcategories": {
            "acfrom": "A&M Records EPs"
        }
    }
}

That looks better. Unfortunately there's no way to include additional filters like "don't include items with size = 0", so you'll have to make multiple calls and do the filtering in your app (LINQ is great for this). Still, this is good enough to get started.
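A sketch of that client-side filtering with LINQ, using a plain CategoryInfo class that mirrors the fields in the JSON above. The class name and the sort by page count are illustrative choices, not part of the API. Because it takes any sequence, you can run it over the concatenated results of several paged calls.

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative stand-in for one entry in the "allcategories" array.
public class CategoryInfo
{
    public string Title;
    public int Size;     // pages + files + subcats
    public int Pages;
    public int Files;
    public int Subcats;
}

public static class CategoryFilter
{
    // Keep only categories that actually contain something, biggest first.
    public static List<CategoryInfo> NonEmpty(IEnumerable<CategoryInfo> categories)
    {
        return categories
            .Where(c => c.Size > 0)
            .OrderByDescending(c => c.Pages)
            .ToList();
    }
}
```

With the ten results above, only "A&E shows" (66 pages) and "A&E Television Networks" (14 pages) survive, in that order.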

Json.NET can deserialize these results into an object graph, but you need to create the classes for it. One way to do this is to use the JSON C# Class Generator, a pretty handy tool if you're starting with JSON as your format and don't have any classes yet. It doesn't know anything about the Json.NET library, so it uses native C# calls for the structure and helpers, but it will give you something to start from.

We’ll just build our classes manually as there are only a few we need. Here’s the first cut (based on the JSON data above):

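using Newtonsoft.Json;

public class MediaWiki
{
    public Query Query { get; set; }

    // The JSON key is "query-continue". A hyphen isn't legal in a C# name, so
    // without this mapping Json.NET leaves QueryContinue null.
    [JsonProperty("query-continue")]
    public QueryContinue QueryContinue { get; set; }
}

public class Query
{
    public Allcategory[] Allcategories { get; set; }
}

public class Allcategory
{
    // No property for the category name ("*") yet; we'll deal with that shortly.
    public int Size { get; set; }
    public int Pages { get; set; }
    public int Files { get; set; }
    public int Subcats { get; set; }
}

public class QueryContinue
{
    public Allcategories Allcategories { get; set; }
}

public class Allcategories
{
    public string Acfrom { get; set; }
}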

The class and property names here are not the most intuitive but we’ll fix that. First let’s make the call to the API to get our JSON then deserialize it into this object graph. Replace the LoadData method in the MainViewModel with this code:

public void LoadData()
{
    var address = @"http://en.wikipedia.org/w/api.php?action=query&list=allcategories&format=json&acprop=size&acprefix=A&aclimit=500";
    var webclient = new WebClient();
    webclient.DownloadStringCompleted += OnDownloadStringCompleted;
    webclient.DownloadStringAsync(new Uri(address));
}

This kicks off the download and sets up the callback to invoke when the download is complete (I also added "&aclimit=500" to the end of the query to get more than the default 10 results). When the download finishes, WebClient calls our handler:

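private void OnDownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
    // Turn the JSON string into our MediaWiki object graph.
    var json = JsonConvert.DeserializeObject<MediaWiki>(e.Result);

    // Skip empty categories and add one list item per remaining category.
    foreach (var category in json.Query.Allcategories.Where(category => category.Size > 0))
    {
        Items.Add(
            new ItemViewModel
                {
                    LineOne = "Pages: " + category.Pages,
                    LineTwo = "Subcats: " + category.Subcats,
                });
    }
    IsDataLoaded = true;
}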

This takes the result of the download and calls the JsonConvert method DeserializeObject. This is a generic method that we pass our MediaWiki class from above to. Then just loop over the Allcategories array to pluck out each category and create our ItemViewModel items manually. We use LINQ in the loop to filter out any categories with a size of 0.

Here's the result on our phone:

This isn’t too exciting because frankly we don’t know what the name of each category is. Remember the JSON?

{
    "*": "A&E Television Network shows",
    "size": 0,
    "pages": 0,
    "files": 0,
    "subcats": 0
}

Hmm... how are we going to get that name into a property in our class? The Json.NET library matches the names of attributes in the JSON with the names of properties in your class, and we can't create a property called "*" because that's not a valid C# identifier.

The answer is to introduce a new property called Title and decorate it with the JsonPropertyAttribute. Here's our updated Allcategory class:

public class Allcategory
{
    [JsonProperty("*")]
    public string Title { get; set; }
    public int Size { get; set; }
    public int Pages { get; set; }
    public int Files { get; set; }
    public int Subcats { get; set; }
}

This tells Json.NET that when it comes across an attribute named "*" in the JSON, it should deserialize its value into the Title property.

Now we can update our OnDownloadStringCompleted handler to use the title instead:

private void OnDownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
    var json = JsonConvert.DeserializeObject<MediaWiki>(e.Result);
    foreach (var category in json.Query.Allcategories.Where(category => category.Size > 0))
    {
        Items.Add(
            new ItemViewModel
                {
                    LineOne = category.Title,
                    LineTwo = string.Format("Pages: {0} Subcats: {1}", 
                        category.Pages, category.Subcats),
                });
    }
    IsDataLoaded = true;
}

Which now looks like this on the phone:

That’s a little better.

With the [JsonProperty] attribute we can also give any property a C# name that differs from the name used in the JSON, so we're not tied to MediaWiki's naming. This lets us make our C# classes a little more readable. Here are a couple of examples:

public class Query
{
    [JsonProperty("allcategories")]
    public Allcategory[] Categories { get; set; }
}
 
public class Allcategory
{
    [JsonProperty("*")]
    public string Title { get; set; }
    public int Size { get; set; }
    public int Pages { get; set; }
    public int Files { get; set; }
    [JsonProperty("subcats")]
    public int Categories { get; set; }
}

The JsonProperty attribute matches whatever name MediaWiki (or whoever is providing your JSON feed) uses, and we can use a friendlier name in our code. (P.S. The class names can be whatever you want; it's the property names that matter.)

The default app already has the code to display the DetailsPage when you tap an item in the list. It passes the index of the tapped ItemViewModel to the page. The page retrieves that item from the Items property of the ViewModel stored in the App class and uses its LineOne property as the title of the DetailsPage.

This property is really the title of the category, and it's the value we can use to get more information from the MediaWiki API. We'll use the "categorymembers" list to get all pages in a given category. Here's the URL we're going to use:

http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&format=json&cmtitle=Category:Zombies&cmprop=type|ids|title

We’re going after all pages in the “Zombies” category and want to include the type of page, the id, and the title.

Here’s the JSON from this call:

{
    "query": {
        "categorymembers": [
            {
                "pageid": 8375,
                "ns": 0,
                "title": "Draugr",
                "type": "page"
            },
            {
                "pageid": 27279250,
                "ns": 0,
                "title": "Felicia Felix-Mentor",
                "type": "page"
            },
            {
                "pageid": 143895,
                "ns": 0,
                "title": "Jiang Shi",
                "type": "page"
            },
            {
                "pageid": 1776116,
                "ns": 0,
                "title": "Clairvius Narcisse",
                "type": "page"
            },
            {
                "pageid": 781891,
                "ns": 0,
                "title": "Philosophical zombie",
                "type": "page"
            },
            {
                "pageid": 7568400,
                "ns": 0,
                "title": "Zombie Squad",
                "type": "page"
            },
            {
                "pageid": 5048737,
                "ns": 0,
                "title": "Zombie walk",
                "type": "page"
            },
            {
                "pageid": 22328159,
                "ns": 0,
                "title": "Zombeatles",
                "type": "page"
            },
            {
                "pageid": 34509,
                "ns": 0,
                "title": "Zombie",
                "type": "page"
            },
            {
                "pageid": 6569013,
                "ns": 14,
                "title": "Category:Hoodoo",
                "type": "subcat"
            }
        ]
    },
    "query-continue": {
        "categorymembers": {
            "cmcontinue": "subcat|31450932|ZOMBIES AND REVENANTS IN POPULAR CULTURE"
        }
    }
}

Note that we have a new attribute called "categorymembers" instead of "allcategories" (but the top-level attribute is still "query"). We'll definitely need a class to handle the categorymembers array returned by the call, but should it go into the existing Query class?

Technically you could do it. Json.NET will deserialize it for you. Any attribute it can match to a C# property, either by name or through a property decorated with the JsonProperty attribute, gets filled in. The other properties are simply left null.

It's up to you whether you build a special query class for each type of query. I suggest you do, to keep things clean (you can even create a generic MediaWikiQuery<T> class that takes things like a CategoryMember class or an Allcategory class). Otherwise you're violating a few SOLID principles and laying the foundation for a God class.
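One way to sketch the generic MediaWikiQuery<T> idea: the array's key inside "query" changes with the list you ask for ("allcategories", "categorymembers", ...), so this version models "query" as a dictionary keyed by list name. Json.NET can deserialize a JSON object into a Dictionary<string, T>, so JsonConvert.DeserializeObject<MediaWikiQuery<CategoryMember>>(e.Result) should populate it. This sketch leaves out the "query-continue" section and assumes every entry under "query" has the same item shape. The class shape is an assumption, not something MediaWiki or Json.NET provides.

```csharp
using System.Collections.Generic;

// Hypothetical generic wrapper for a MediaWiki list query. The JSON looks like
// { "query": { "<list name>": [ ... ] } }, so "query" is modelled as a
// dictionary keyed by list name.
public class MediaWikiQuery<T>
{
    public Dictionary<string, T[]> Query { get; set; }

    // Returns the items for one list, or an empty array if it's missing.
    public T[] ItemsFor(string listName)
    {
        T[] items;
        if (Query != null && Query.TryGetValue(listName, out items) && items != null)
            return items;
        return new T[0];
    }
}

// Example item type for list=categorymembers (fields from the JSON above).
public class CategoryMember
{
    public int PageId { get; set; }
    public string Title { get; set; }
    public string Type { get; set; }
}
```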

For demo purposes I’ll just add the CategoryMember class to our Query class but like I said, it’s a demo only. Here’s the new CategoryMember class and the modified Query class:

public class CategoryMember
{
    public int PageId { get; set; }
    public string Title { get; set; }
    public string Type { get; set; }
}
 
public class Query
{
    [JsonProperty("allcategories")]
    public Allcategory[] Categories { get; set; }
 
    [JsonProperty("categorymembers")]
    public CategoryMember[] Pages { get; set; }
}

Now when the DetailsPage loads we’ll figure out the index of the item based on the parameter passed in and call a new method on our MainViewModel to load the pages.

protected override void OnNavigatedTo(NavigationEventArgs e)
{
    string selectedIndex = "";
    if (NavigationContext.QueryString.TryGetValue("selectedItem", out selectedIndex))
    {
        int index = int.Parse(selectedIndex);
        App.ViewModel.LoadPages(index);
        DataContext = App.ViewModel.SelectedCategory;
    }
}

Note that LoadPages makes another call to the service. We'll probably want a handler in our view that displays a "Loading" indicator and removes it in response to something like a property changed event on our ViewModel. Here we'll just rely on the binding to the ViewModel, and the page will update (eventually) with the list of pages. It's not the UX you want to build, but this post is already getting long and I'm sure you're pretty tired of reading it.

We also set the DataContext of our DetailsPage to a new property we added to the ViewModel called SelectedCategory. This way MainViewModel hangs onto whatever category the user selected so we can reference it (and its properties) later.
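The post doesn't show the SelectedCategory property itself, so here's a minimal sketch of it, following the same change-notification pattern as PageViewModel below. It also adds a hypothetical IsLoading flag for the "Loading" indicator mentioned above. You would set it to true in LoadPages before starting the download, set it back to false in OnDownloadPagesCompleted, and bind an indicator to it (for example through a value converter). ItemViewModel here is a cut-down stand-in for the template's class, not the real one.

```csharp
using System.ComponentModel;

// Cut-down stand-in for the template's ItemViewModel (only what this sketch needs).
public class ItemViewModel
{
    public string LineOne { get; set; }
}

// Sketch of the MainViewModel properties discussed above.
public class MainViewModel : INotifyPropertyChanged
{
    private ItemViewModel _selectedCategory;
    private bool _isLoading;

    // The category the user tapped; DetailsPage uses it as its DataContext.
    public ItemViewModel SelectedCategory
    {
        get { return _selectedCategory; }
        set
        {
            if (value == _selectedCategory) return;
            _selectedCategory = value;
            NotifyPropertyChanged("SelectedCategory");
        }
    }

    // Hypothetical: true while a download is in flight.
    public bool IsLoading
    {
        get { return _isLoading; }
        set
        {
            if (value == _isLoading) return;
            _isLoading = value;
            NotifyPropertyChanged("IsLoading");
        }
    }

    public event PropertyChangedEventHandler PropertyChanged;

    private void NotifyPropertyChanged(string propertyName)
    {
        var handler = PropertyChanged;
        if (handler != null)
            handler(this, new PropertyChangedEventArgs(propertyName));
    }
}
```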

To load the pages, we'll first call out to Wikipedia to fetch them based on our category. We could pass in the title we want, but here we'll pass the index of the category and look up its title in the method:

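public void LoadPages(int index)
{
    SelectedCategory = Items[index];

    // Start from an empty list so revisiting a category doesn't duplicate its pages.
    SelectedCategory.Pages.Clear();

    // Escape the title: names like "A&E shows" contain characters (&, spaces)
    // that would otherwise break the query string.
    var address =
        string.Format(
            "http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&format=json&cmtitle=Category:{0}&cmprop=type|ids|title",
            Uri.EscapeDataString(SelectedCategory.LineOne));
    var webclient = new WebClient();
    webclient.DownloadStringCompleted += OnDownloadPagesCompleted;
    webclient.DownloadStringAsync(new Uri(address));
}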

This sets up SelectedCategory based on the index we passed in and crafts the url to the MediaWiki API to fetch all pages for whatever the category is (based on the title).
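One thing to watch for when building this URL: category titles can contain characters that mean something in a query string. "A&E shows", from the earlier results, contains an &. If you paste it in verbatim, cmtitle is cut short at "Category:A" and a stray "E shows" parameter gets sent along. This hypothetical helper contrasts the raw string with one that runs the title (and the pipe-separated cmprop value) through Uri.EscapeDataString. It also counts the parameters a server would see.

```csharp
using System;

// Hypothetical helper contrasting an unescaped and an escaped categorymembers URL.
public static class CategoryMembersUrl
{
    private const string Api = "http://en.wikipedia.org/w/api.php";

    // Title inserted verbatim, as plain string concatenation would do.
    public static string Unescaped(string categoryTitle)
    {
        return Api + "?action=query&list=categorymembers&format=json&cmtitle=Category:"
            + categoryTitle + "&cmprop=type|ids|title";
    }

    // Title and cmprop value escaped, so '&' and spaces stay inside their parameter.
    public static string Escaped(string categoryTitle)
    {
        return Api + "?action=query&list=categorymembers&format=json&cmtitle=Category:"
            + Uri.EscapeDataString(categoryTitle)
            + "&cmprop=" + Uri.EscapeDataString("type|ids|title");
    }

    // Number of name=value pairs after splitting the query string on '&'.
    public static int ParameterCount(string url)
    {
        string query = url.Substring(url.IndexOf('?') + 1);
        return query.Split('&').Length;
    }
}
```

For "A&E shows" the unescaped URL yields six parameters instead of the intended five.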

Now we need to process the JSON when the download of the page list completes. For this we're going to need a new ViewModel. Here's a quick and dirty one that just uses the title and pageid properties:

public class PageViewModel : INotifyPropertyChanged
{
    private int _pageid;
    private string _title;
 
    public int PageId
    {
        get { return _pageid; }
        set
        {
            if (value == _pageid) return;
            _pageid = value;
            NotifyPropertyChanged("PageId");
        }
    }
 
    public string Title
    {
        get { return _title; }
        set
        {
            if (value == _title) return;
            _title = value;
            NotifyPropertyChanged("Title");
        }
    }
 
    public event PropertyChangedEventHandler PropertyChanged;
    private void NotifyPropertyChanged(String propertyName)
    {
        var handler = PropertyChanged;
        if (null != handler)
        {
            handler(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}

We’ll display the Title to the user but the pageid needs to be stored because later we’ll want to retrieve all the details about a single page.

In the ItemViewModel (our category) we add an ObservableCollection of PageViewModel objects called Pages. This mimics the Items property in the MainViewModel. Here's the declaration:

public ObservableCollection<PageViewModel> Pages { get; private set; }

And here’s the constructor creating them:

public ItemViewModel()
{
    Pages = new ObservableCollection<PageViewModel>();
}

Back in the MainViewModel we deserialize the JSON and add the PageViewModel objects to our selected category:

private void OnDownloadPagesCompleted(object sender, DownloadStringCompletedEventArgs e)
{
    var json = JsonConvert.DeserializeObject<MediaWiki>(e.Result);
    foreach (var page in json.Query.Pages.Where(page => page.Type.Equals("page")))
    {
        SelectedCategory.Pages.Add(
            new PageViewModel
                {
                    PageId = page.PageId, 
                    Title = page.Title
                });
    }
}

Just like when we loaded the categories, here we only select items whose Type is "page". In our Zombies example, one of the items is a "subcat". In a real app we would handle that too, for example by creating a link to another category (since a subcat is itself a category, we could reuse a lot of this code for that).
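A sketch of how the members could be split so subcategories aren't simply dropped. Pages go to the list as before. Subcategory titles lose their "Category:" prefix so the name can be passed back into the same categorymembers query. The Member and MemberSorter names are hypothetical. The type values ("page", "subcat", and "file" for media) come from the cmprop=type output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative member record mirroring pageid/title/type from the JSON above.
public class Member
{
    public int PageId;
    public string Title;
    public string Type;   // "page", "subcat" or "file"
}

public static class MemberSorter
{
    private const string CategoryPrefix = "Category:";

    // Ordinary articles, as shown in the DetailsPage list.
    public static List<Member> Pages(IEnumerable<Member> members)
    {
        return members.Where(m => m.Type == "page").ToList();
    }

    // Subcategory titles come back as "Category:Hoodoo"; strip the prefix so the
    // name can be reused as a category title in another categorymembers query.
    public static List<string> SubcategoryNames(IEnumerable<Member> members)
    {
        return members
            .Where(m => m.Type == "subcat")
            .Select(m => m.Title.StartsWith(CategoryPrefix, StringComparison.Ordinal)
                ? m.Title.Substring(CategoryPrefix.Length)
                : m.Title)
            .ToList();
    }
}
```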

The last part is changing the DetailsPage.xaml to display a list of pages. Here’s the LayoutRoot grid updated:

<Grid x:Name="LayoutRoot" Background="Transparent" d:DataContext="{Binding SelectedCategory}">
    <Grid.RowDefinitions>
        <RowDefinition Height="Auto"/>
        <RowDefinition Height="*"/>
    </Grid.RowDefinitions>
 
    <!--TitlePanel contains the name of the application and page title-->
    <StackPanel x:Name="TitlePanel" Grid.Row="0" Margin="12,17,0,28">
        <TextBlock x:Name="PageTitle" Text="MY APPLICATION" Style="{StaticResource PhoneTextNormalStyle}"/>
        <TextBlock x:Name="ListTitle" Text="{Binding LineOne}" Margin="9,-7,0,0" Style="{StaticResource PhoneTextTitle1Style}"/>
    </StackPanel>
 
    <!--ContentPanel contains details text. Place additional content here-->
    <Grid x:Name="ContentPanel" Grid.Row="1" Margin="12,0,12,0">
        <TextBlock x:Name="ContentText" Text="{Binding LineThree}" TextWrapping="Wrap" Style="{StaticResource PhoneTextNormalStyle}"/>
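        <!-- If DetailsPage.xaml.cs doesn't define MainListBox_SelectionChanged yet, add a handler or remove this attribute, otherwise the page won't compile. -->
        <ListBox x:Name="MainListBox" Margin="0,0,-12,0" ItemsSource="{Binding Pages}" SelectionChanged="MainListBox_SelectionChanged">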
            <ListBox.ItemTemplate>
                <DataTemplate>
                    <StackPanel Margin="0,0,0,17" Width="432">
                        <TextBlock Text="{Binding Title}" TextWrapping="Wrap" Style="{StaticResource PhoneTextExtraLargeStyle}"/>
                    </StackPanel>
                </DataTemplate>
            </ListBox.ItemTemplate>
        </ListBox>
    </Grid>
</Grid>

The result is a details page that shows our category as the title and a listbox full of pages.

At this point you can add an additional handler to the listbox to drill down into the page itself. From there you can pluck out a list of images, links to other pages, and even the wiki content and sections.
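As a starting point for that drill-down, here's a sketch of URLs keyed on the pageid we stored in PageViewModel. The helper names are hypothetical. prop=images, prop=links and prop=revisions with rvprop=content are standard MediaWiki query modules, but check the API documentation for their limits, continuation and output format before relying on them.

```csharp
using System.Globalization;

// Hypothetical helpers for drilling into one page by its pageid.
public static class PageUrls
{
    private const string Api = "http://en.wikipedia.org/w/api.php";

    private static string PageQuery(int pageId, string extra)
    {
        // InvariantCulture keeps the number free of locale-specific formatting.
        return Api + "?action=query&format=json&pageids="
            + pageId.ToString(CultureInfo.InvariantCulture) + extra;
    }

    // Images used on the page.
    public static string Images(int pageId) { return PageQuery(pageId, "&prop=images"); }

    // Links from the page to other pages.
    public static string Links(int pageId) { return PageQuery(pageId, "&prop=links"); }

    // Wikitext of the page's latest revision.
    public static string Content(int pageId) { return PageQuery(pageId, "&prop=revisions&rvprop=content"); }
}
```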

Check the MediaWiki API documentation for more info on getting into all of this stuff. It's very cool being able to poke into MediaWiki, and it beats the hell out of screen scraping!

A Call to Action!

This just gives you an intro to accessing a resource like Wikipedia using the public API and deserializing results via JSON into a set of classes you can use to bind to a Windows Phone 7 app. I’m going to leave the rest up to you. A few ideas to think about if you were to build on this example:

  • Calling the API directly from the ViewModel isn't a production practice; it was for the demo only. You'll probably want to create a MediaWikiService and inject it into your ViewModel.
  • The call can take a few seconds, so you'll need to handle this in your service and update the UI accordingly.
  • Drill down into a single page, pluck out the images, and create a visual MediaWiki browser experience or something.
  • You can even post content *to* a MediaWiki wiki (after you log in with an account that has the rights to do so), so it can be an editing experience as well as a browsing one.
  • This is just one example of using the MediaWiki API by fetching categories and page content, but there are a lot of other types of queries you can make, like getting a list of recent changes, comment history, or even random pages.
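For the first two ideas, here's a rough sketch of what injecting a service could look like. IMediaWikiService, CategoriesViewModel and FakeMediaWikiService are hypothetical names. A real implementation of the interface would wrap the WebClient and JsonConvert code from this post. The interface uses a callback because WP7's WebClient reports completion through an event rather than something you can await.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical service abstraction over the MediaWiki API calls.
public interface IMediaWikiService
{
    // Callback style, matching WebClient's event-based completion on WP7.
    void GetCategoryTitles(string prefix, Action<IList<string>> onCompleted);
}

// View model that depends only on the interface, never on WebClient directly.
public class CategoriesViewModel
{
    private readonly IMediaWikiService _service;

    public CategoriesViewModel(IMediaWikiService service)
    {
        if (service == null) throw new ArgumentNullException("service");
        _service = service;
        Categories = new ObservableCollection<string>();
    }

    public ObservableCollection<string> Categories { get; private set; }

    public void Load(string prefix)
    {
        _service.GetCategoryTitles(prefix, titles =>
        {
            // Replace (not append) so reloading doesn't duplicate entries.
            Categories.Clear();
            foreach (var title in titles) Categories.Add(title);
        });
    }
}

// In-memory fake returning canned titles, e.g. for unit tests or design time.
public class FakeMediaWikiService : IMediaWikiService
{
    public void GetCategoryTitles(string prefix, Action<IList<string>> onCompleted)
    {
        onCompleted(new List<string> { prefix + "&E shows", prefix + "&M Records" });
    }
}
```

Because the view model only sees the interface, a unit test or design-time view can hand it the fake and never touch the network.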

Use your imagination and above all, have fun. If you get stuck feel free to leave questions in the comments section and I’ll do my best to answer them.