Browsing From The Command Line / Server With cURL

cURL is a tool used for browsing the web from the shell or command-line. It supports many internet protocols, such as HTTP, FTM, POP3, IMAP, SMTP and more. See the full list here.

With libcurl installed in your system and the C API, you can browse using a C program. You can also install the extension cURL for PHP.

Steps of a cURL Program

  1. Obtain a curl handle
    For example:

    $curl_handle = curl_init();
  2. Set options to the handle.
    Define the behavior of curl when executing the request: setting the URL, cookie location, variables passed to the server, etc.
    In PHP, you can use curl_setop to set a single option, or curl_setopt_array to pass a PHP array.
    For example:

    curl_setopt($curl_handle,CURLOPT_RETURNTRANSFER,true);

    will cause cURL to store the output in a string; the value ‘false’ will cause cURL to send it to the standard output.

  3. Execute the request.
    $out=curl_exec($curl_handle);
  4. Close the handle
    curl_close($curl_handle);

As long as the handle is open, you can repeat steps 2 and 3 as many times as you need.

Another useful cURL function is curl_getinfo. In the example below, I have used

"$httpCode = curl_getinfo($curl_handle, CURLINFO_HTTP_CODE);"

to determine if the login action was successful.

 An Example – Sharing a Message in LinkedIn

Publishing a text message in LinkedIn is simple: surf to LinkedIn, find the relevant form in the response and submit it. I’ve found the login form and the publish form using HTML Dom documents. Then populated post vars according to them, and connected to the URL given in the “action” attribute of the form. The code is run in PHP CLI in Linux (“stty -echo” is a system call that suppresses the echoing of input characters in Linux).

Step I – Surf to LinkedIn

In this step cURL will send a request to get the content in linkedin.com This is the first time, so the user is not logged in, and there are no cookies. The option CURLOPT_USERAGENT will make the server believe that the request has been sent from a real browser. The option CURLOPT_COOKIEJAR will define the file from which to store and retrieve cookies.

Following is the code:

<?php
error_reporting(E_ERROR | E_PARSE); //Suppress warnings

$curl_handle = curl_init();

// Connect to the site for the first time.
curl_setopt($curl_handle,CURLOPT_URL,"https://www.linkedin.com");
curl_setopt($curl_handle,CURLOPT_USERAGENT,'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:35.0) Gecko/20100101 Firefox/35.0');
curl_setopt($curl_handle,CURLOPT_RETURNTRANSFER,true);
curl_setopt($curl_handle,CURLOPT_COOKIEJAR,'/tmp/cookies');
$out = curl_exec($curl_handle);
if (!$out){
  echo "Error: " . curl_error($curl_handle) . "\n";
  die();
}

Step II – Get the Login Form, Populate It, and Submit It

In this step your script will read your e-mail address and password from the standard input (“php://stdin”), then populate the login form, and submit it. Using the Firefox extension DOM Inspector, I found that the Id of the form element is ‘login’, the username (e-mail) field’s name is “session_key”, and the password field’s name is “session_password”. The script willl submit the form with the input fields of type ‘hidden’ and with the entered e-mail and password. If the login was successful, the http code returned in the header would be 302, which means the output is returned from another address.

Following is the code:

$stdin = fopen('php://stdin','r');
echo "Enter e-mail:";
$email = trim(fgets($stdin));
system("stty -echo");
echo "Enter Password:";
$pass = trim(fgets($stdin));
system("stty echo");
echo "\n";
// Get the form inputs.
$doc = new DOMDocument();
$doc->loadHTML($out);

$form = $doc->getElementById('login');
$inputElements = $form->getElementsByTagName('input');
$length = $inputElements->length;

$inputs = Array();
for ($i=0;$i<$length;$i++){
  $elem=$inputElements->item($i);
  $name = $elem->getAttribute('name');
  $value = $elem->getAttribute('value');
  $inputs[$name]=$value;
}
$inputs['session_key']=$email;
$inputs['session_password']=$pass;
$keys = array_keys($inputs);
$postvars = '';

$firstInput=true;
foreach ($keys as $key){
  if (!$firstInput)
    $postvars .= '&';
  $firstInput = false;
  $postvars .= $key . "=" . urlencode($inputs[$key]);
}
$submitUrl = $form->getAttribute('action');

curl_setopt_array($curl_handle, Array(
  CURLOPT_URL=>$submitUrl,
  CURLOPT_POST=>true,
  CURLOPT_POSTFIELDS=>$postvars
));
$out=curl_exec($curl_handle);
$httpCode = curl_getinfo($curl_handle, CURLINFO_HTTP_CODE);

if ($httpCode != 302)
  die("Error - could not connect: $httpCode\n");

Step III – Post the Silly Message

After a successful login, the relevant data is stored in the cookie jar associated with the cURL handle. This time the script will read the content of the home page with the user logged in. A logged-in user can post status updates. This time, the operation is not complete until the “browser” is referred to the new address. So, we set the cURL option “CURLOPT_FOLLOWLOCATION” to true. In addition, PHP cURL allows to send an associative array as the value of the option “CURLOPT_POSTFIELDS”, a more elegant way to send POST data.

Following is the code:

// Post the message
curl_setopt($curl_handle, CURLOPT_URL, 'https://www.linkedin.com');
$out = curl_exec($curl_handle);
$doc = new DOMDocument();
$doc->loadHTML($out);
$form=$doc->getElementById('share-form');
$inputElements = $form->getElementsByTagName('input');
$length = $inputElements->length;
$inputs=Array();
for ($i=0;$i<$length;$i++){
  $elem=$inputElements->item($i);
  $name = $elem->getAttribute('name');
  $value = $elem->getAttribute('value');
  $inputs[$name]=$value;
}
$inputs['postText']="Hello! I am a message sent by a PHP script.";
$inputs['postVisibility2']='EVERYONE';
$keys=array_keys($inputs);


$formAction = $form->getAttribute('action');
if (substr($formAction,0,5)!='http:')
  $formAction = 'http://www.linkedin.com' . $formAction;

curl_setopt_array($curl_handle, Array(
  CURLOPT_URL=>$formAction,
  CURLOPT_POST=>true,
  CURLOPT_FOLLOWLOCATION=>true,
  CURLOPT_POSTFIELDS=>$inputs
));
$out = curl_exec($curl_handle);

curl_close($curl_handle);
?>
Advertisements