Crawl websites from a Java web application without using bin/nutch

I am trying to use Nutch (1.1) without bin/nutch from my (Java) Mojarra 2.0.2 webapp. I searched Google for examples, but found none showing how to do this :/ ... I get an exception and the job fails :/ (I can't think of the reason ...). Here is my code:

    public void run() throws Exception {
        final String[] args = new String[] {
            String.format("%s%s%s%s", JSFUtils.getWebAppRoot(), "nutch", File.separator, DIRECTORY_URLS),
            "-dir", String.format("%s%s%s%s", JSFUtils.getWebAppRoot(), "nutch", File.separator, DIRECTORY_CRAWL),
            "-threads", this.preferences.get("threads"),
            "-depth", this.preferences.get("depth"),
            "-topN", this.preferences.get("topN"),
            "-solr", this.preferences.get("solr")
        };
        Crawl.main(args);
    }

and a part of the log:

    10/05/17 10:42:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    10/05/17 10:42:54 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    10/05/17 10:42:54 INFO mapred.FileInputFormat: Total input paths to process : 1
    10/05/17 10:42:54 INFO mapred.JobClient: Running job: job_local_0001
    10/05/17 10:42:54 INFO mapred.FileInputFormat: Total input paths to process : 1
    10/05/17 10:42:55 INFO mapred.MapTask: numReduceTasks: 1
    10/05/17 10:42:55 INFO mapred.MapTask: io.sort.mb = 100
    java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:211)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
        at lan.localhost.process.NutchCrawling.run(NutchCrawling.java:108)
        at lan.localhost.main.Index.indexing(Index.java:71)
        at lan.localhost.bean.FeedingBean.actionStart(FeedingBean.java)
        ....

Can someone help me or tell me how I can crawl from a Java application? I have increased Xms to 256m and Xmx to 768m, but nothing changed ...

Best regards, Marcel

You probably need to add the Nutch config files to your classpath. When the bin/nutch script is called, this is set through the NUTCH_CONF_DIR environment variable.

-Dhadoop.log.dir may also need to be set.

Take the time to read the bin/nutch script to learn more about these settings.
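As a rough illustration of the two points above, the following sketch checks whether the Nutch config files are visible on the classpath and sets `hadoop.log.dir` if it is missing, before any call to `Crawl.main`. The class name `NutchEnvCheck`, the helper names, and the fallback log directory are assumptions made for this example; only `nutch-default.xml`, `nutch-site.xml` and the `hadoop.log.dir` property come from Nutch and Hadoop themselves. Whether this resolves the failed job depends on the actual cause, which the log excerpt does not show.

```java
import java.io.File;
import java.net.URL;

public class NutchEnvCheck {

    /** Returns true if the named resource is visible to the given class loader. */
    public static boolean onClasspath(ClassLoader loader, String resource) {
        URL url = loader.getResource(resource);
        return url != null;
    }

    /** Sets hadoop.log.dir to the fallback if it is unset; returns the effective value. */
    public static String ensureLogDir(String fallbackDir) {
        String current = System.getProperty("hadoop.log.dir");
        if (current == null || current.isEmpty()) {
            System.setProperty("hadoop.log.dir", fallbackDir);
            current = fallbackDir;
        }
        return current;
    }

    public static void main(String[] args) {
        // In a webapp the context class loader normally sees WEB-INF/classes.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String name : new String[] {"nutch-default.xml", "nutch-site.xml"}) {
            System.out.println(name + " on classpath: " + onClasspath(loader, name));
        }

        // Hypothetical fallback location; pick a directory the webapp may write to.
        String fallback = new File(System.getProperty("java.io.tmpdir"), "nutch-logs").getPath();
        System.out.println("hadoop.log.dir = " + ensureLogDir(fallback));

        // Only once both checks look right would the webapp go on to call Crawl.main(args).
    }
}
```

If either config file is reported as missing, copying the contents of the Nutch conf directory into the webapp's WEB-INF/classes is one way to make them visible, since bin/nutch would otherwise have put that directory on the classpath.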
