    Moderator(s): Prowler, jcokos

    joe_vimal
    Joined: Mar 22, 2001
    # Posts: 104

    Posted: 2005-Sep-30 08:32

    Has anyone heard of a scraper script that extracts job ads from many sites and dumps them into a database?

    I can see tons of other scripts in all the usual places, but not one along these lines. Writing a script from scratch seems daunting. Any help would be much appreciated.



    masidani
    Joined: Oct 21, 2005
    # Posts: 10

    Posted: 2005-Oct-24 08:24

    Joe,

    I think it would have to be custom-written, I'm afraid. The reason is that the script needs to know how the HTML code on each of the sites is written in order to know where to find the job data in the HTML.

    If you visit each of the job sites yourself and look at the HTML source code, you'll see that each one is different. The "screen scraper" program will need to know where to look in each page to find things like job title, salary, location etc., which will be different in each case. Hence it will need to be custom-written.
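The point about every site needing its own rules can be sketched as a small lookup table of patterns. This is only an illustration, written in Python rather than Perl; the site names, the HTML markup, and the names `SITE_PATTERNS` and `extract_job` are all hypothetical, not taken from any real job site.

```python
import re

# Hypothetical per-site rules. Each site marks up title, salary and
# location differently, so each site needs its own set of patterns.
SITE_PATTERNS = {
    "site-a.example": {
        "title": re.compile(r'<h2 class="job-title">(.*?)</h2>', re.S),
        "salary": re.compile(r'<span class="salary">(.*?)</span>', re.S),
        "location": re.compile(r'<span class="loc">(.*?)</span>', re.S),
    },
    "site-b.example": {
        "title": re.compile(r'<td id="position">(.*?)</td>', re.S),
        "salary": re.compile(r'<td id="pay">(.*?)</td>', re.S),
        "location": re.compile(r'<td id="where">(.*?)</td>', re.S),
    },
}


def extract_job(site, html):
    """Return a dict of fields found in `html` using the rules for `site`.

    A field is None when its pattern does not match, which is usually
    the first sign that the site has changed its layout.
    """
    job = {}
    for field, pattern in SITE_PATTERNS[site].items():
        match = pattern.search(html)
        job[field] = match.group(1).strip() if match else None
    return job
```

Adding a new site then means adding one more entry to the table, but someone still has to read that site's HTML source to write the patterns.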

    That said, a Perl program using the LWP::UserAgent library and a few regular expressions will suffice, so long as there are no login/registration procedures etc. that need to be dealt with.

    Simon
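As a rough sketch of the fetch, extract, and store loop described above, here is a Python analogue of the Perl approach, with `urllib.request` standing in for LWP::UserAgent. The pattern `JOB_ROW`, the markup it expects, the User-Agent string, the table layout, and the choice of SQLite are all assumptions made for illustration.

```python
import re
import sqlite3
import urllib.request

# Hypothetical pattern for one imaginary site; a real site needs its own.
JOB_ROW = re.compile(
    r'<li class="job">\s*<b>(?P<title>.*?)</b>\s*<i>(?P<location>.*?)</i>\s*</li>',
    re.S,
)


def fetch(url):
    """Download a page and return it as text (no login handling)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "job-scraper-sketch/0.1"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_jobs(html):
    """Return a list of (title, location) tuples found in the page."""
    return [
        (m.group("title").strip(), m.group("location").strip())
        for m in JOB_ROW.finditer(html)
    ]


def store_jobs(conn, source, jobs):
    """Insert the parsed jobs into a simple table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (source TEXT, title TEXT, location TEXT)"
    )
    conn.executemany(
        "INSERT INTO jobs (source, title, location) VALUES (?, ?, ?)",
        [(source, title, location) for title, location in jobs],
    )
    conn.commit()
```

Usage would be along the lines of `store_jobs(conn, "site-a.example", parse_jobs(fetch(url)))`, where `conn` comes from `sqlite3.connect(...)` and `url` is a listing page whose owner permits this kind of access.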



    joe_vimal
    Joined: Mar 22, 2001
    # Posts: 104

    Posted: 2005-Oct-26 16:02

    Thanks, Simon. I was afraid I would have to start from the beginning. There are other issues too. Will I be infringing some copyright law if the script scrapes a couple of lines from many sites?





    bhartzer
    Staff
    Joined: Jun 08, 2000
    # Posts: 7042

    Posted: 2005-Oct-26 17:44

    "Will I be infringing on some copyright law"

    Yes.



    joe_vimal
    Joined: Mar 22, 2001
    # Posts: 104

    Posted: 2005-Oct-27 08:22

    Thanks, bhartzer. I knew something like this would happen. OK. I have read somewhere that if you quote a couple of lines from another site on your own site and give appropriate credit, you will not be hauled up for violation of copyright. Is this true?

    I am sorry I am asking this in a Perl forum.



    lizardz
    Joined: Nov 12, 2004
    # Posts: 1394

    Posted: 2005-Oct-27 20:00

    Use of a few lines is fair use, I believe, and that's not copyright infringement.

    That's why you can quote from somebody's article, for example, but not duplicate the whole article.



    excell
    Staff
    Joined: Mar 19, 2001
    # Posts: 14512

    Posted: 2005-Oct-27 20:04

    A scraper script automates the process of taking content... so yes, I would be careful with what you create.



    joe_vimal
    Joined: Mar 22, 2001
    # Posts: 104

    Posted: 2005-Oct-28 06:43

    No way, excell. I perfectly understand, and abhor, the stealing of content from others. What I am interested in is this: we want to populate the database of a job site with enough job offers to make the site attractive to job seekers. Our client does not wish to infringe any laws, and neither do we.

    Scraping a line of content from other sites is perfectly acceptable if you don't overdo it. For example, for SEO purposes, many people scrape the search results pages of search engines:

    Results 1 - 100 of about 3,640,000 for 'keyword'
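Pulling the numbers out of a line like the one above takes a single regular expression. A minimal sketch in Python, assuming the line always has exactly this wording; the names `RESULT_COUNT` and `parse_result_count` are made up for the example.

```python
import re

# Matches the wording of the sample line only; other phrasings or
# other languages would need their own pattern.
RESULT_COUNT = re.compile(
    r"Results\s+(\d[\d,]*)\s*-\s*(\d[\d,]*)\s+of\s+about\s+(\d[\d,]*)"
)


def parse_result_count(line):
    """Return (first, last, total) as integers, or None if no match."""
    match = RESULT_COUNT.search(line)
    if match is None:
        return None
    first, last, total = (int(g.replace(",", "")) for g in match.groups())
    return first, last, total
```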

    In the same way, we use snippets of information from weather sites too, usually with the express consent of the webmasters.

    In our case, even a couple of lines might be frowned upon, as the snippet of information has some commercial value.

    I am confused. We don't want to be associated with any route that could even remotely land us in trouble. Losing this client would be preferable. What is the consensus of the ladies and gentlemen here?



    © 1995  ·  iWeb, Inc  ·  DBA JimWorld Productions