c# - ASP.NET Crawler WebResponse "Operation Timed Out"


Hi, I have created a simple ThreadPool-based web crawler within my web application. Its task is to crawl its own application space and build a Lucene index, with meta content, for each legitimate web page. Here's the problem: when I run the crawler from the Visual Studio Express debug server instance and provide an IIS instance URL as the starting point, it works fine. However, when I do not provide the IIS instance and the crawler takes its own URL as the starting point (i.e. it crawls its own domain space), I get an "operation has timed out" exception on the WebResponse statement. Can anyone tell me what I should or should not be doing here? Here's my code to fetch a page; it is executed in a multi-threaded environment.

    // Requires: using System.IO; using System.Net;
    private string GetWebText(string url)
    {
        string htmlText = "";
        HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
        request.UserAgent = "My crawler";

        // Dispose the response, stream and reader so the connection is released
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            htmlText = reader.ReadToEnd();
        }
        return htmlText;
    }

And the following is my stack trace:

    at System.Net.HttpWebRequest.GetResponse()
    at CSharpCrawler.Crawler.GetWebText(String url) in C:\myAppDev\myApp\site\App_Code\CrawlerLibs\Crawler.cs:line 366
    at CSharpCrawler.Crawler.CrawlPage(String url, List`1 …) in C:\myAppDev\myApp\site\App_Code\CrawlerLibs\Crawler.cs:line 105
    at CSharpCrawler.Crawler.CrawlSiteBuildIndex(String hostUrl, String …, List`1 …) in C:\myAppDev\myApp\site\App_Code\CrawlerLibs\Crawler.cs
    at crawler_Default.threadedCrawlSiteBuildIndex(Object threadedCrawlerObj) in C:\myAppDev\myApp\site\crawler\Default.aspx.cs:line 108
    at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(Object state)
    at System.Threading.ExecutionContext.runTryCode(Object userData)
    at System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode code, CleanupCode backoutCode, Object userData)
    at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
    at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
    at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
    at System.Threading.ThreadPoolWorkQueue.Dispatch()
    at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

Thanks and Cheers, Leon.

How many concurrent requests is your crawler making? You can easily starve the thread pool, especially since the crawler is running inside the website's own code.

Each request it makes will use two threads from the pool: one to issue the request and wait for the response, and a second one to serve that request, because the site is crawling itself.
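A minimal sketch of one way to act on this, assuming .NET 4.5 or later: instead of queuing one ThreadPool work item per page, run the fetches through `Parallel.ForEach` with `MaxDegreeOfParallelism`, so the crawler never occupies more than a fixed number of pool threads at once and leaves threads free for the site to serve its own pages. `ThrottledCrawl`, `FetchAll` and `PeakConcurrency` are hypothetical names; `fetch` stands in for a method like the `GetWebText` above, and a suitable limit would have to be found by experiment.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: fetches a set of URLs with a hard cap on how many
// fetches run simultaneously, so the crawler cannot tie up every pool
// thread that the same site needs in order to answer those requests.
public static class ThrottledCrawl
{
    // Highest number of fetches observed running at once (diagnostics only).
    public static int PeakConcurrency;

    public static IDictionary<string, string> FetchAll(
        IEnumerable<string> urls,
        Func<string, string> fetch,   // e.g. a GetWebText-style method
        int maxConcurrent)
    {
        var results = new ConcurrentDictionary<string, string>();
        int running = 0;
        PeakConcurrency = 0;

        // Parallel.ForEach never runs more than maxConcurrent bodies at a time.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrent };
        Parallel.ForEach(urls, options, url =>
        {
            int now = Interlocked.Increment(ref running);

            // Record the peak with a lock-free compare-and-swap loop.
            int peak;
            while (now > (peak = Volatile.Read(ref PeakConcurrency)) &&
                   Interlocked.CompareExchange(ref PeakConcurrency, now, peak) != peak)
            {
            }

            try
            {
                results[url] = fetch(url);
            }
            finally
            {
                Interlocked.Decrement(ref running);
            }
        });

        return results;
    }
}
```

Passing the page-fetching method as `fetch` with a small limit keeps the number of concurrent self-requests bounded; `PeakConcurrency` exists only to make the cap observable while tuning it.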

