This script maps the Wikipedia network around a given topic. It first finds the outlinks from and the inlinks to that page within the wikipedia.org domain. It uses the common list as starting points and calculates the number of pages that link to each page found this way. You will notice that this has a long-tail distribution. For all pages whose number of links exceeds the 'long-tail cut-off point', the outlinks are retrieved and the shared links are calculated again for that set of links. This process is repeated crawlDepth - 1 times.
Warning: crawling is exponential! More and more links are found in each iteration. Pages are fetched at a rate of one per second, so even a small network takes a long time.
Tips: exclude dates and categories. Use ctrl-f to search for 'iteration' to get to the results quickly.
Note: 'what links here' is used only to get the first starting points. We might want to use it for the other pages as well, instead of just counting shared (out)links. That way we would get shared (out)links that we are sure also have an inlink to a previous iteration. The implications of both methods need to be thought through.
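
The first step described at the top (starting points, then counting shared links) can be sketched roughly as follows. This is a minimal illustration on a made-up in-memory link graph, not the script's actual code: the names OUTLINKS, inlinks, starting_points and shared_link_counts are hypothetical, and it assumes that the 'common list' means the pages that appear both as outlinks and as inlinks of the topic page.

```python
from collections import Counter

# Toy link graph (hypothetical data): page -> pages it links to.
OUTLINKS = {
    "Seed": ["A", "B", "C"],
    "A": ["Seed", "B", "D"],
    "B": ["Seed", "A", "D"],
    "C": ["D"],
    "E": ["Seed"],
}

def inlinks(page, graph):
    """Pages that link to `page` (the 'what links here' list)."""
    return {src for src, targets in graph.items() if page in targets}

def starting_points(seed, graph):
    """Assumption: the common list is outlinks(seed) intersected with inlinks(seed)."""
    return set(graph.get(seed, ())) & inlinks(seed, graph)

def shared_link_counts(pages, graph):
    """For every link target, count how many of `pages` link to it."""
    counts = Counter()
    for page in pages:
        # set() so that a page linking twice to the same target counts once
        for target in set(graph.get(page, ())):
            counts[target] += 1
    return counts

start = starting_points("Seed", OUTLINKS)
counts = shared_link_counts(start, OUTLINKS)
```

In this toy graph the starting points are A and B, and D is linked from both of them, so it gets a count of 2. In a real crawl the graph would be filled by fetching pages, and the counts would go through the long-tail cut-off before the next iteration.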

wikipedia url:
do not include these pages in the network (1 url per line):
also remove:
(multiple select possible)
links that are automatically removed: non-Wikipedia URLs, User:, Talk:, Special:, Portal:, Image:, Help:, Main_page, Wikipedia:, Template:, &action=, /w/index.php (plus their Dutch equivalents)
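
A filter along these lines could implement the automatic removal listed above. It is only a sketch under assumptions: the names EXCLUDED_MARKERS and is_kept are hypothetical, matching is done case-insensitively on the whole URL (my choice, so that Main_Page is also caught), and the Dutch equivalents are left out because they are not listed here.

```python
from urllib.parse import urlparse

# Markers taken from the list above; the Dutch equivalents are not included.
EXCLUDED_MARKERS = (
    "User:", "Talk:", "Special:", "Portal:", "Image:", "Help:",
    "Main_page", "Wikipedia:", "Template:", "&action=", "/w/index.php",
)

def is_kept(url):
    """Return True if the URL is a Wikipedia page that is not auto-removed."""
    host = urlparse(url).hostname or ""
    # Drop anything outside the wikipedia.org domain.
    if host != "wikipedia.org" and not host.endswith(".wikipedia.org"):
        return False
    lowered = url.lower()
    # Assumption: a simple case-insensitive substring match is good enough.
    return not any(marker.lower() in lowered for marker in EXCLUDED_MARKERS)

links = [
    "https://en.wikipedia.org/wiki/Network_theory",
    "https://en.wikipedia.org/wiki/Talk:Network_theory",
    "https://example.com/wiki/Network_theory",
]
kept = [link for link in links if is_kept(link)]
```

Substring matching is crude: an article whose title happens to contain one of the markers would be dropped too, so a stricter version might check only the namespace prefix of the page title.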

depth of crawl:
cut-off of long tail (in each iteration): +- %. This uses only the pages in the top x% by number of inlinks, with one modification: pages with the same number of inlinks as the page at the cut-off point are also kept. Note to self: find a good cut-off point based on the actual distribution of the links (see probability distributions with long tails).
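
The cut-off with ties described above might look roughly like this. It is a sketch, not the script's own code: the function name cut_long_tail is hypothetical, and rounding the percentage up to a whole number of pages (and keeping at least one page) is an assumption.

```python
import math

def cut_long_tail(counts, top_percent):
    """Keep the top `top_percent` of pages by inlink count, plus ties at the cut."""
    if not counts:
        return []
    # Highest inlink counts first; sorted() is stable, so ties keep input order.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # Number of pages the percentage alone would keep (assumption: round up,
    # keep at least one page, never more than there are).
    n_keep = min(len(ranked), max(1, math.ceil(len(ranked) * top_percent / 100)))
    # Inlink count of the page at the cut-off point.
    threshold = ranked[n_keep - 1][1]
    # Keep every page with at least that many inlinks, so ties survive the cut.
    return [page for page, n in ranked if n >= threshold]

example = {"A": 5, "B": 3, "C": 3, "D": 1, "E": 1}
kept_pages = cut_long_tail(example, 40)  # ["A", "B", "C"]
```

With five pages and a 40% cut, the percentage alone would keep two pages (A and B), but C has the same number of inlinks as B and is therefore kept as well.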
show removed links: no / yes

last update 13/2/2007: added Dutch pages to be removed, fixed whatlinkshere to be universal