2011-07-25

WebDAV protocol for Dummies

A little bit of why and how


Onion gets WebDAV support


This last week I've been adding WebDAV support to onion, so sharing resources (file systems) is now really easy. There is a lot of work left, but for a quick export of files it works. It has no security measures and has not been thoroughly tested.

Anyway, in this post I want to share what I learnt about the WebDAV protocol, in case anybody else wants to implement it, so they don't have to go through the 100+ pages of specification. I did not read it fully, although I looked at it whenever I felt it necessary.

Reading the full specification is always the recommended way, but you don't actually need everything to get a server working.

I also want to warn that I don't like XML very much. It's nice for some scenarios, but I think it was pushed into uses where it is just overkill and makes things more difficult. Some might say that it was the standard when WebDAV was created, but then we should look for a new standard centered on JSON or anything saner. Anyway, WebDAV is built into all operating systems, and more or less well integrated, so we need to support it.

The WebDAV implementation in libonion is, as always, on GitHub.

How to reverse engineer it


To reverse engineer the protocol I installed an apache2 web server with WebDAV, over plain HTTP, and used wireshark. Actually I did it on a virtual machine and had problems capturing data, so I resorted to tcpdump:
tcpdump -i eth0 -s 65535 -w webdav.tcp
Then I did some normal tasks, such as listing, copying, moving and deleting files, and checked how the client does each one. That gave me several important methods, the most important probably being PROPFIND.

The protocol


WebDAV is actually a very powerful and extensible protocol that allows you to do almost anything you can think of with resources. This includes a lot about metadata, locking... Not all clients use everything, though, and if you want advanced features both your client and your server must support them. In any case, the protocol allows graceful degradation of features.

Each of the methods described below is the HTTP method used in the request.

All XML elements use the "DAV:" namespace, normally declared as you will see in the examples.

PROPFIND

For C programmers, PROPFIND is like stat and opendir/readdir. You can ask for whatever properties you want from a resource, including listing its subresources. This is maybe the most important method, as it allows navigating through the content, which is later retrieved via a normal GET.

First you set the depth of the query, normally 0 or 1 (the spec also talks about infinity). This is set in the Depth header. With Depth 0 you query only the resource itself, for example to stat a directory or a file. With Depth 1 you also query its subresources, like the files inside a directory.
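As a rough illustration of the depth handling, here is a minimal Python sketch of how a server could read the header. The function name `parse_depth` and the plain dict of headers are my own assumptions for the example, not part of onion, and treating a missing header as infinity is an assumption you should check against the spec.

```python
def parse_depth(headers):
    """Return 0, 1 or "infinity" from a dict of request headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumption: a request without a Depth header is treated as "infinity".
    raw = lowered.get("depth", "infinity").strip().lower()
    if raw in ("0", "1"):
        return int(raw)
    if raw == "infinity":
        return "infinity"
    raise ValueError("unsupported Depth value: %r" % raw)
```

A simple server might decide to answer only depths 0 and 1 and reject infinity, since walking a whole tree can be expensive.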

Then the PROPFIND request carries an XML document as its data. In it the client asks for some properties of the resource, for example resource type, content length, last modified and creation time:

<propfind xmlns="DAV:">
  <prop>
    <resourcetype xmlns="DAV:"/>
    <getcontentlength xmlns="DAV:"/>
    <getlastmodified xmlns="DAV:"/>
    <creationdate xmlns="DAV:"/>
  </prop>
</propfind>
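On the server side, the first job is to find out which properties the client asked for. The following is only a sketch using Python's standard `xml.etree.ElementTree`; the names `requested_properties` and `EXAMPLE_BODY` are made up for the example, and it ignores properties outside the "DAV:" namespace.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"

# A well-formed version of the request body shown above.
EXAMPLE_BODY = """<propfind xmlns="DAV:">
  <prop>
    <resourcetype xmlns="DAV:"/>
    <getcontentlength xmlns="DAV:"/>
    <getlastmodified xmlns="DAV:"/>
    <creationdate xmlns="DAV:"/>
  </prop>
</propfind>"""


def requested_properties(body):
    """Return the local names of the DAV: properties a PROPFIND body asks for."""
    root = ET.fromstring(body)
    names = []
    for child in root:
        # ElementTree spells qualified names as "{namespace}local".
        if child.tag != "{%s}prop" % DAV_NS:
            continue
        for prop in child:
            if prop.tag.startswith("{%s}" % DAV_NS):
                names.append(prop.tag.split("}", 1)[1])
    return names
```

Calling `requested_properties(EXAMPLE_BODY)` gives the four property names in the order the client sent them.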

The answer is a multistatus answer, with one response for each resource, and for each resource one part for the known properties and a second one for the unknown ones. This is the verbose answer to a depth 1 query:
<D:multistatus xmlns:D="DAV:">
  <D:response xmlns:lp1="DAV:" xmlns:g0="DAV:">
    <D:href>/CMakeFiles</D:href>
    <D:propstat>
      <D:prop>
        <lp1:resourcetype>
          <D:collection/>
        </lp1:resourcetype>
        <lp1:getlastmodified>Mon, 25 Jul 2011 08:49:40 GMT</lp1:getlastmodified>
        <lp1:creationdate>2011-07-25T08:49:40Z</lp1:creationdate>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>

    <D:propstat>
      <D:prop>
        <g0:getcontentlength/>
      </D:prop>
      <D:status>HTTP/1.1 404 Not Found</D:status>
    </D:propstat>

  </D:response>
</D:multistatus>
The answer is sent with the HTTP return code 207.
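To make the structure of that answer more concrete, here is a sketch that builds a similar multistatus document with Python's `xml.etree.ElementTree`. It is not how onion does it (onion is C); the helper names `dav`, `add_response` and `build_multistatus` are invented for this example, and the sketch always emits resourcetype even if the client did not ask for it.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
# Ask ElementTree to serialise the DAV: namespace with the "D" prefix.
ET.register_namespace("D", DAV_NS)


def dav(name):
    """Qualified ElementTree tag name for an element in the DAV: namespace."""
    return "{%s}%s" % (DAV_NS, name)


def add_response(multistatus, href, known, unknown, is_collection=False):
    """Append one response: known props with 200, unknown ones with 404."""
    response = ET.SubElement(multistatus, dav("response"))
    ET.SubElement(response, dav("href")).text = href

    # First propstat: the properties the server knows about.
    found = ET.SubElement(response, dav("propstat"))
    prop = ET.SubElement(found, dav("prop"))
    resourcetype = ET.SubElement(prop, dav("resourcetype"))
    if is_collection:
        ET.SubElement(resourcetype, dav("collection"))
    for name, value in known.items():
        ET.SubElement(prop, dav(name)).text = value
    ET.SubElement(found, dav("status")).text = "HTTP/1.1 200 OK"

    # Second propstat: requested properties the server cannot provide.
    if unknown:
        missing = ET.SubElement(response, dav("propstat"))
        prop = ET.SubElement(missing, dav("prop"))
        for name in unknown:
            ET.SubElement(prop, dav(name))
        ET.SubElement(missing, dav("status")).text = "HTTP/1.1 404 Not Found"
    return response


def build_multistatus(entries):
    """Build the body of a 207 answer from a list of add_response arguments."""
    multistatus = ET.Element(dav("multistatus"))
    for entry in entries:
        add_response(multistatus, **entry)
    return ET.tostring(multistatus, encoding="unicode")
```

For a depth 1 query on a directory you would pass one entry for the directory itself and one for each file inside it.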

So, in this verbose XML we have the multistatus element, and inside it one response per resource; with depth 1 on a directory, each file is its own response.

Then comes the URL of the resource in the server (href). This is not a relative URL but an absolute one, so your server needs to know where it is, or guess it from the query.

Next come the known/applicable properties for this resource. If it is a collection (directory listing), resourcetype must be returned as shown in the example; otherwise it is just empty. For all the other properties just return the right value. Dates are in the typical HTTP format or in ISO format. I don't really know when to use which, so I copied what apache does. Very important here is the status XML tag, with status 200, which marks these as the known properties.
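In the apache answer above, getlastmodified uses the HTTP date format and creationdate uses the ISO one. A small sketch of how both strings could be produced with the Python standard library follows; the function names `http_date` and `iso_date` are just for this example.

```python
import calendar
from datetime import datetime, timezone
from email.utils import formatdate


def http_date(timestamp):
    """Format a Unix timestamp like 'Mon, 25 Jul 2011 08:49:40 GMT'."""
    return formatdate(timestamp, usegmt=True)


def iso_date(timestamp):
    """Format a Unix timestamp like '2011-07-25T08:49:40Z' (always UTC)."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


# The timestamp used in the apache example, built from its UTC date parts.
EXAMPLE_TIMESTAMP = calendar.timegm((2011, 7, 25, 8, 49, 40))
```

With `EXAMPLE_TIMESTAMP`, the two functions reproduce the two date strings of the example answer.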

Next we have the unknown properties, which is just a list of them with the status 404.

Depending on whether we are listing the properties of a directory (collection in WebDAV jargon) or of a file (resource), we have to return one set of properties or another. A quick table follows:

Property          Collection            Resource
resourcetype      <collection/>         empty
getlastmodified   yes                   yes
creationdate      yes                   yes
getcontentlength  no                    yes
getcontenttype    httpd/unix-directory  optional
getetag           yes                   yes

This list is a work in progress and only contains my findings, so it might not be correct. As always, if you want it 100% correct, check the spec.

If the PROPFIND asks for an unknown resource, the server just returns the HTTP return code 404 Not Found, or another error. 207 is the only valid answer if everything went OK.
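The table can be sketched as a function that looks at a path on disk and returns the properties to report. This is only an illustration under several assumptions: the name `properties_for` is invented, `st_ctime` stands in for the creation date (on Unix it is really the last metadata change), and the ETag scheme is a hypothetical one built from modification time and size.

```python
import os
import stat
from datetime import datetime, timezone
from email.utils import formatdate


def properties_for(path):
    """Return a dict of property name -> text for a file or directory."""
    info = os.stat(path)
    is_collection = stat.S_ISDIR(info.st_mode)
    created = datetime.fromtimestamp(info.st_ctime, tz=timezone.utc)
    props = {
        # "collection" here means: emit <collection/> inside resourcetype.
        "resourcetype": "collection" if is_collection else "",
        "getlastmodified": formatdate(info.st_mtime, usegmt=True),
        # Assumption: st_ctime is used as an approximation of creation time.
        "creationdate": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Assumption: hypothetical ETag made from mtime and size.
        "getetag": '"%x-%x"' % (int(info.st_mtime), info.st_size),
    }
    if is_collection:
        props["getcontenttype"] = "httpd/unix-directory"
    else:
        props["getcontentlength"] = str(info.st_size)
    return props
```

Any requested property that is missing from the returned dict would then go into the 404 part of the answer.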

GET

Just returns the file contents. This is a mistake in the WebDAV protocol: because of it, you cannot mix the results of some server processing and a WebDAV share on the same URL, as you don't want GET to do the same thing on a web server (return the processed result) and on a WebDAV share (return the raw file). The solution is to put the WebDAV share on another URL.

PUT

Creates a new file: the file data is the HTTP request data, and the name is the URL.

DELETE

Deletes a resource.

MOVE

Moves the resource at the request path to the path given in the "Destination:" header. The destination is a full URI, including the server, so some processing of the data is needed.
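The processing of the Destination header could look like the following sketch, which uses Python's `urllib.parse`. The function name `destination_path` and the `base_path` parameter (the URL prefix where the share is mounted) are assumptions for the example, and the sketch does not guard against ".." segments, which a real server must do.

```python
from urllib.parse import unquote, urlsplit


def destination_path(destination, base_path="/"):
    """Return the path inside the share that a Destination header points to."""
    # Drop the scheme and server, keep only the (percent-decoded) path.
    path = unquote(urlsplit(destination).path)
    if not path.startswith(base_path):
        raise ValueError("destination is outside the share")
    # Make the result relative to the share root, with a single leading slash.
    return "/" + path[len(base_path):].lstrip("/")
```

For example, with the share mounted at `/dav/`, a destination of `http://localhost:8080/dav/new%20name.txt` maps to `/new name.txt` inside the share.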

WebDAV drawbacks


The only real advantage of WebDAV is that it is a standard, so you can be sure that it will be supported by several vendors and implementations. Also, the standard is "quite" well known and not so complex, which gives some peace of mind.

But the protocol is not nice at all. Here I list some reasons:
  1. It's too verbose. XML is verbose per se, and namespaces are used everywhere. The same info in JSON would take quite a few less bytes, and be easier.
  2. It mixes HTTP headers, HTTP methods, HTTP data and XML in the request. A clean solution would be just XML or just headers. There was no need to create new methods.
  3. Some things are powerful but unnecessary on the server side, like MOVE receiving the full URI.
  4. The GET mistake is really a problem. As it uses the same method as normal HTTP, with no header to differentiate it or anything, the only solution is to keep WebDAV resources and normal resources separate, when a proper and nicer way would be to get the raw resource with the WebDAV GET and the processed resource with the "server" GET.

Use with onion

To use it in onion, just add the onion_webdav handler, included in onion_handlers. Check ofileserver for a real world example.
