Rexiology::Work

Microsoft, Information Technologies...

  • From Taiwan, living and working in Tokyo, Japan.




This work is licensed under a Creative Commons License.



The fun of URL Rewriting...

 

This blog site has finally moved to CommunityServer 1.1.

While transferring the site, I moved from the sub-domain blog.rex.la to the root domain rex.la and, at the same time, moved the archived picture files and other publicly referenced files to a separate IIS virtual root under a different domain name. It's a big change of URLs on the website, although few of the actual files moved. In the past I used all of my registered domain names interchangeably to point to files (most of them pointed to the same virtual root, so any of them worked in a path), so unifying the domain names in the URLs took some time as I reorganized all the blog content.

The .Text conversion tool provided by Kevin Harder offers a string replacement step in the middle of the conversion, while old .Text posts are moved into the new CommunityServer database. This saved me a lot of time re-editing posts. Note that it is not possible to change BLOB fields (the post body) with a plain SQL statement that uses the Replace function. To do string replacement in those posts, one must write a small program that reads each post out of the database, modifies it in memory, and then writes it back, which is what Kevin's conversion tool does.
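The read, modify in memory, and write back loop described above can be sketched roughly as follows. This is only an illustration, not Kevin's tool: it uses Python's built-in sqlite3 as a stand-in for the real database, and the `posts` table, its `id` and `body` columns, and the URL prefixes in `REPLACEMENTS` are all hypothetical.

```python
import sqlite3

# Hypothetical old-to-new URL prefixes; the real list depends on the site.
REPLACEMENTS = {
    "http://blog.rex.la/images/": "http://rex.la/blogs/past/images/",
}


def rewrite_post_bodies(conn, replacements):
    """Read each post body, replace URL prefixes in memory, write it back."""
    # "posts", "id" and "body" are assumed names, not the CommunityServer schema.
    rows = conn.execute("SELECT id, body FROM posts").fetchall()
    changed = 0
    for post_id, body in rows:
        new_body = body
        for old, new in replacements.items():
            new_body = new_body.replace(old, new)
        if new_body != body:
            conn.execute(
                "UPDATE posts SET body = ? WHERE id = ?", (new_body, post_id)
            )
            changed += 1
    conn.commit()
    return changed


def demo():
    """Run the replacement against a tiny in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO posts (id, body) VALUES (?, ?)",
        [
            (1, '<img src="http://blog.rex.la/images/a.png">'),
            (2, "no links here"),
        ],
    )
    changed = rewrite_post_bodies(conn, REPLACEMENTS)
    bodies = [row[0] for row in conn.execute("SELECT body FROM posts ORDER BY id")]
    return changed, bodies
```

Doing the replacement in application code sidesteps the limitation on BLOB fields, at the cost of pulling every post through the program once.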

After the first conversion had moved all the posts, links, referrals, and so on to the new CommunityServer blog site, the URLs of files moved to the other virtual root had already been replaced with the correct links. I still needed to run some database queries to find the posts whose links used rex.la as the domain name to point to files, and change them manually to the new domain name. Fortunately it was only about 30 posts or so.

After confirming that all the content had correct URLs, it was time to consider all the outside links around the world that point to my site. Those links all use the old domain name blog.rex.la, so I needed a way to redirect the old links to the new locations on my new blog site.

I found that CommunityServer now has a very good URL rewriting subsystem that does all its rewriting in an HTTP module and stores the rewriting configuration in the SiteUrls.config file. Each rewriting item has two sets of settings for two functions. The first lets the system work out the path of a page: a format string (the path attribute) lets the program pass in parameters to form the URL. The second applies when that URL is actually handled by another aspx file: the same item then carries two more XML attributes, pattern and vanity. The regular expression in pattern is matched against the requested path, which is then transformed into the real URL given by vanity, with parameters captured by the pattern, to call the real aspx file. Pretty cool!
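To make the two functions of a rewriting item concrete, here is a rough sketch in Python rather than in CommunityServer's own code. The rule values are made up for illustration, and details such as `{0}`-style placeholders in path, whole-string matching, and case-insensitive matching are assumptions about how such a subsystem might behave.

```python
import re

# Hypothetical rule modelled on the description above; the attribute values
# are illustrative and not copied from a real SiteUrls.config.
RULE = {
    "path": "archive/{0}/{1}.aspx",                         # builds outgoing links
    "pattern": r"/blogs/archive/(\d{4})/(\d{1,2})\.aspx",   # matches incoming requests
    "vanity": "/blogs/month.aspx?y=$1&m=$2",                # real page plus captures
}


def build_link(rule, base, *args):
    """First function: fill the path format string with parameters."""
    return base + rule["path"].format(*args)


def resolve(rule, request_path):
    """Second function: map a requested path to the page that serves it."""
    match = re.fullmatch(rule["pattern"], request_path, flags=re.IGNORECASE)
    if match is None:
        return None
    # Convert $1-style references into Python's \g<1> form before expanding.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", rule["vanity"])
    return match.expand(template)
```

With this rule, `build_link(RULE, "/blogs/", 2005, 11)` produces the friendly path `/blogs/archive/2005/11.aspx`, and `resolve` turns that same path back into `/blogs/month.aspx?y=2005&m=11`.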

So I wanted to use this rewriting system to point my old posts to the new posts. The first step is to point the old domain name blog.rex.la at the new virtual root, so that blog.rex.la now also reaches the new website rex.la and the old blog traffic goes to the new site too. The second step is to map the old .Text URLs to the new blog site. The old .Text site's web.config file has a bunch of HTTP handler settings that also use pattern matching to map URLs to the aspx programs that process them, which makes a good reference for mapping to the new locations.

The former .Text URL rewriting settings were as follows (inside the web.config file):

<HandlerConfiguration  defaultPageLocation = "DTP.aspx" type = "Dottext.Common.UrlManager.HandlerConfiguration, Dottext.Common">
<HttpHandlers>
<HttpHandler pattern = "^(?:/rss20\.aspx)$" type = "Dottext.Common.Syndication.RssHandler, Dottext.Common" handlerType = "Direct" />
<HttpHandler pattern = "^(?:/atom20\.aspx)$" type = "Dottext.Common.Syndication.AtomHandler, Dottext.Common" handlerType = "Direct" />
<HttpHandler pattern = "^(?:/comments/commentRss/\d+\.aspx)$" type = "Dottext.Common.Syndication.RssCommentHandler, Dottext.Common" handlerType = "Direct"/>
<HttpHandler pattern = "^(?:/aggbug/\d+\.aspx)$" type = "Dottext.Framework.Tracking.AggBugHandler, Dottext.Framework" handlerType = "Direct"/>
<HttpHandler pattern = "^(?:/customcss\.aspx)$" type = "Dottext.Web.UI.Handlers.BlogSecondaryCssHandler, Dottext.Web" handlerType = "Direct" />
<HttpHandler pattern = "^(?:/category\/(\d|\w|\s)+\.aspx/rss)$" type = "Dottext.Common.Syndication.RssCategoryHandler, Dottext.Common" handlerType = "Direct" />
<HttpHandler pattern = "^(?:((\/default\.aspx)?|(\/?))?)$"  controls = "homepage.ascx"/>
<HttpHandler pattern = "^(?:/articles/\d+\.aspx)$" controls = "viewpost.ascx,Comments.ascx,PostComment.ascx" />
<HttpHandler pattern = "^(?:/articles/\w+\.aspx)$" controls = "viewpost.ascx,Comments.ascx,PostComment.ascx" />               
<HttpHandler pattern = "^(?:/archive/\d{4}/\d{2}/\d{2}/\d+\.aspx)$" controls = "viewpost.ascx,Comments.ascx,PostComment.ascx" />
<HttpHandler pattern = "^(?:/archive/\d{4}/\d{2}/\d{2}/\w+\.aspx)$" controls = "viewpost.ascx,Comments.ascx,PostComment.ascx" />
<HttpHandler pattern = "^(?:/archive/\d{4}/\d{1,2}/\d{1,2}\.aspx)$" controls = "ArchiveDay.ascx" />
<HttpHandler pattern = "^(?:/archive/\d{4}/\d{1,2}\.aspx)$" controls = "ArchiveMonth.ascx" />
<HttpHandler pattern = "^(?:/contact\.aspx)$" controls="Contact.ascx" />
<HttpHandler pattern = "/posts/|/story/|/archive/" type="Dottext.Web.UI.Handlers.RedirectHandler,Dottext.Web"  handlerType = "Direct"/>
<HttpHandler pattern = "^(?:/gallery\/\d+\.aspx)$" controls="GalleryThumbNailViewer.ascx" />
<HttpHandler pattern = "^(?:/gallery\/image\/\d+\.aspx)$" controls="ViewPicture.ascx" />
<HttpHandler pattern = "^(?:/(?:category|stories)/(\w|\s)+\.aspx)$" controls="CategoryEntryList.ascx" />
<HttpHandler pattern = "^(?:/comments\/\d+\.aspx)$" type = "Dottext.Common.Syndication.CommentHandler, Dottext.Common" handlerType = "Direct" />
<HttpHandler pattern = "^(?:/services\/trackbacks/\d+\.aspx)$" type = "Dottext.Framework.Tracking.TrackBackHandler, Dottext.Framework" handlerType = "Direct" />
<HttpHandler pattern = "^(?:/services\/pingback\.aspx)$" type = "Dottext.Framework.Tracking.PingBackService, Dottext.Framework" handlerType = "Direct" />
<HttpHandler pattern = "^(?:/services\/metablogapi\.aspx)$" type = "Dottext.Framework.XmlRpc.MetaWeblog, Dottext.Framework" handlerType = "Direct" />
</HttpHandlers>
</HandlerConfiguration>

Just do a one-to-one mapping of those URLs to the new URLs by editing the SiteUrls.config file. First, add a new location with an empty start path (this goes in the "location" section):

<location name="empty" path="" />

Then start the mapping in the "url" section:

<!-- below is the general mapping from .text to cs files -->
<url name = "oldcat03" location="empty" path="/rss\.aspx" pattern="http://blog.rex.la/rss\.aspx" vanity="http://rextang.net/blogs/past/rss.aspx" />
<url name = "oldcat04" location="empty" path="/atom\.aspx" pattern="http://blog.rex.la/atom\.aspx" vanity="http://rextang.net/blogs/past/atom.aspx" />
<url name = "oldcat05" location="empty" path="/comments/commentRss/\d+\.aspx" pattern="http://blog.rex.la/comments/commentRss/(\d+)\.aspx" vanity="http://rextang.net/blogs/past/commentrss.aspx?PostID=$1" />
<url name = "oldcat06" location="empty" path="/aggbug/\d+\.aspx" pattern="http://blog.rex.la/aggbug/(\d+)\.aspx" vanity="http://rextang.net/aggbug.aspx?PostID=$1" />
<url name = "oldcat07" location="empty" path="/customcss\.aspx" pattern="http://blog.rex.la/customcss.aspx" vanity="http://rextang.net/blogs/blogs/customcss.aspx?app=past" />
<url name = "oldcat08" location="empty" path="/category\/(\d|\w|\s)+\.aspx/rss" pattern="http://blog.rex.la/category/(\d+)\.aspx/rss" vanity="http://rextang.net/blogs/past/rss.aspx?CategoryID=$1" />
<url name = "oldcat09" location="empty" path="((\/default\.aspx)?|(\/?))?" pattern="http://blog.rex.la/default.aspx" vanity="http://rextang.net/" />
<url name = "oldcat10" location="empty" path="/articles/\d+\.aspx" pattern="http://blog.rex.la/articles/(\d+)\.aspx" vanity="http://rextang.net/blogs/past/articles/$1.aspx" />
<url name = "oldcat11" location="empty" path="/articles/\w+\.aspx" pattern="http://blog.rex.la/articles/(\w+)\.aspx" vanity="http://rextang.net/blogs/past/articles/$1.aspx" />
<url name = "oldcat12" location="empty" path="/archive/\d{4}/\d{2}/\d{2}/\d+\.aspx" pattern="http://blog.rex.la/archive/(\d{4})/(\d{1,2})/(\d{1,2})/(\d+)\.aspx" vanity="http://rextang.net/blogs/post.aspx?App=past&amp;y=$1&amp;m=$2&amp;d=$3&amp;PostID=$4" />
<url name = "oldcat13" location="empty" path="/archive/\d{4}/\d{2}/\d{2}/\w+\.aspx" pattern="http://blog.rex.la/archive/(\d{4})/(\d{1,2})/(\d{1,2})/(\w+)\.aspx" vanity="http://rextang.net/blogs/post.aspx?App=past&amp;y=$1&amp;m=$2&amp;d=$3&amp;PostName=$4" />
<url name = "oldcat14" location="empty" path="/archive/\d{4}/\d{1,2}/\d{1,2}\.aspx" pattern="http://blog.rex.la/archive/(\d{4})/(\d{1,2})/(\d{1,2})\.aspx" vanity="http://rextang.net/blogs/day.aspx?App=past&amp;y=$1&amp;m=$2&amp;d=$3" />
<url name = "oldcat15" location="empty" path="/archive/\d{4}/\d{1,2}\.aspx" pattern="http://blog.rex.la/archive/(\d{4})/(\d{1,2})\.aspx" vanity="http://rextang.net/blogs/month.aspx?App=past&amp;y=$1&amp;m=$2&amp;d=1" />
<url name = "oldcat16" location="empty" path="/contact\.aspx" pattern="http://blog.rex.la/contact\.aspx" vanity="http://rextang.net/blogs/past/contact.aspx" />
<url name = "oldcat17" location="empty" path="/gallery\/\d+\.aspx" pattern="http://blog.rex.la/gallery/(\d+)\.aspx" vanity="http://rextang.net/photos/past/category$1.aspx" />
<url name = "oldcat18" location="empty" path="/gallery\/image\/\d+\.aspx" pattern="http://blog.rex.la/gallery/image/(\d+)\.aspx" vanity="http://rextang.net/photos/past/picture$1.aspx" />
<url name = "oldcat19" location="empty" path="/(?:category|stories)/(\w|\s)+\.aspx" pattern="http://blog.rex.la/(?:category|stories)/(\d+)\.aspx" vanity="http://rextang.net/blogs/past/archive/category/$1.aspx" />
<url name = "oldcat20" location="empty" path="/comments\/\d+\.aspx" pattern="http://blog.rex.la/comments/(\d+)\.aspx" vanity="http://rextang.net/blogs/past/comments/$1.aspx" />
<url name = "oldcat21" location="empty" path="/services\/trackbacks/\d+\.aspx" pattern="http://blog.rex.la/services/trackbacks/(\d+)\.aspx" vanity="http://rextang.net/blogs/trackback.aspx?PostID=$1" />

Since the path attribute is for programs to pass parameters, it is not used here and just serves as a comment. The pattern attribute specifies the original .Text site path pattern, and vanity points to the actual CommunityServer program path. Note that the mapped path might not be the final path of a program: it may be rewritten several more times before reaching its real HTTP handler, which is defined elsewhere in the SiteUrls.config file.

The CommunityServer source code must be modified to support the rewriting above with full URL paths that include the leading http://, otherwise the output URL will not redirect to the right place. The URL rewriting subsystem was originally designed only for rewriting within the site, so the HTTP module code needs some care to make the mapping above work.

So, after all the modifications, my old links now lead to the new places without breaking. All outside post links reach the new location with an HTTP 301 status code, indicating that the old URL has permanently moved there (I added code to do this!). Furthermore, the same approach lets me redirect all my RSS links and category links out to another RSS service provider such as FeedBurner, as my site currently does, without modifying skin pages or changing code to point the category RSS and site RSS outside. Just write one more URL rewrite and it's all out!
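The decision the modified module has to make can be sketched like this. It is not the actual CommunityServer change, only a guess at its shape in Python: the function name `plan_response` and the rule "absolute target on another host means 301, anything else is rewritten internally" are assumptions.

```python
from urllib.parse import urlsplit


def plan_response(request_host, target):
    """Decide between an internal rewrite and a permanent redirect.

    `target` is the URL produced by a pattern/vanity rule.
    """
    parts = urlsplit(target)
    is_absolute = bool(parts.scheme and parts.netloc)
    if is_absolute and parts.netloc.lower() != request_host.lower():
        # Another host cannot be served by an internal rewrite, so tell the
        # client (and search engines) that the URL has moved permanently.
        return {"status": 301, "headers": {"Location": target}}
    # Same host or a relative target: keep serving it inside the site.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return {"status": None, "rewrite_to": path}
```

A request arriving on blog.rex.la whose rule yields `http://rextang.net/blogs/past/rss.aspx` would get a 301 with that Location header, while a relative target stays an ordinary internal rewrite.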

Everything so far applies to HTTP requests that make it to CommunityServer's HTTP modules. Static HTML files, image files, and the like are served by default IIS behavior, long before CommunityServer's HTTP module can capture and rewrite them. How do we rewrite those?

It can't be done inside the CommunityServer site with the rewrite subsystem alone. A rewriter that sits lower down, at the IIS level, must be used. I've used ISAPI-Rewrite for a long time to defend against spammers, so it is a good place to rewrite those HTML and image files.

For example, I have a big directory of pictures that some outside links reference through the new domain name; that domain name has now become the blog site, and I moved the files to the other domain name. To do the rewrite in ISAPI Rewrite, simply use rules like these:

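# Pictures under /misc moved from rex.la to the archive domain
RewriteCond Host: rex\.la
RewriteRule /misc/(.*) http\://archive\.tang\.tw/misc/$1 [I,R]
# Old blog images now live under the new blog site
RewriteCond Host: blog\.rex\.la
RewriteRule /images/(.*) http\://rex\.la/blogs/past/images/$1 [I,R]
# A single static page that moved to the archive domain
RewriteCond Host: rex\.la
RewriteRule /rex-resume\.htm http\://archive\.tang\.tw/rex-resume\.htm [I,R,L]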


Note that the Host header is used as the rewrite condition to prevent infinite rewriting loops, since both virtual roots have the same sub-directory pattern.
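Why the Host condition stops the loop can be seen in a small Python model of the first two rules. This is only a sketch of the idea, not of ISAPI Rewrite itself: matching from the start of the path and comparing hosts by simple equality are assumptions.

```python
import re

# (host, path regex, redirect template) modelled on the first two rules above.
RULES = [
    ("rex.la", r"/misc/(.*)", r"http://archive.tang.tw/misc/\1"),
    ("blog.rex.la", r"/images/(.*)", r"http://rex.la/blogs/past/images/\1"),
]


def redirect_for(host, path):
    """Return the redirect target for a request, or None to serve it as is."""
    for rule_host, pattern, target in RULES:
        if host.lower() != rule_host:
            continue  # the Host condition failed, so this rule is skipped
        match = re.match(pattern, path, flags=re.IGNORECASE)
        if match:
            return match.expand(target)
    return None
```

A request for `/misc/photo.jpg` on rex.la is redirected to archive.tang.tw, but the follow-up request on archive.tang.tw matches no rule and is served directly. Without the host check, the same `/misc/` path on the second virtual root would match again and redirect forever.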

And that should cover all the outside links to my site without breaking any.

URL rewriting is fun and convenient for site transfers!

Technorati Tags: communityserver , asp.net , programming

 

Comments

Rexiology::Work said:

Just a follow-up reading about URL Rewriting in ASP.NET.
I've did the Url Rewriting of this...
# November 30, 2005 5:55 PM

Rexiology::Work said:

So I finally got some time to start the process of upgrading this blog site to CommunityServer...
# October 24, 2006 8:56 PM

Rexiology... said:

crosspost from http://rextang.net/blogs/work/ So I finally got some time to start the process of upgrading
# October 24, 2006 8:57 PM