Discussion:
[wwwsearch-general] python mechanize - clicking javascript button
James
2008-06-25 01:33:49 UTC
Permalink
Hi,

I'm having a bit of a problem and am hoping that someone can assist (I
realize this may be slightly off-topic, but I haven't found a good
place to post this question, so... ;))

I'm writing a script to interact with a website. There's a form
similar to the following in the website:

<form name="form" method="post" action="push.do">
...stuff...

<input type="button" name="button" value="delete"
onclick="javascript:delete();" class="button">

...stuff...

Everything in this form is sent to the server and push.do is the
script that runs in the backend. There is *one* button, however, will
change the action on the form to "delete.do", as can be seen in the
javascript function definition below:

function delete()
{
document.form.action = "delete.do";
document.form.submit();
}

Seems simple enough, right?

When I use mechanize and print the form that I'm working with
br.select_form(name="form")
print br.form
<form POST delete() application/x-www-form-urlencoded
<HiddenControl(number=2321) (readonly)>
<HiddenControl(sessionIdentification=115) (readonly)>
<IgnoreControl(button=<None>)>
<TextControl(string=Hello World)>
<SelectControl(code=[ code1, *code2, code3, code4, code5])>
<SelectControl(codeClass=[1, 2, 3, 4, 5, 6, *7, 8])>
<SubmitControl(button=Commit) (readonly)>
<IgnoreControl(button=<None>)>>

The only other button on this form is "Commit", which apparently
results in a POST to "push.do". The javascript "delete" button is the
*only* entity in the form that POSTs to delete.do.

The Python mechanize website states that in this situation the best
thing to do is set up a POST to the web server (since mechanize cannot
interpret javascript).

A few things boggle me:
a) When submitting the form, how do I deal with cookies? I'm unsure
about how to pass the web tool the appropriate cookie (I'm not even
sure the web server wants a cookie in the first place)

b) Do I have to pass *all* the values listed in the "print br.form"?
If not, then how do I figure out what precisely the server "requires"
from a POST? (TamperData seems to indicate that most of the stuff is
sent to the server on *either* button click, but I'm not sure...is
there a better way to find out?)

d) Is POST my only option, or is there a simpler way? I realize the
only thing the javascript snippet is doing is changing the "form
action" from push.do to delete.do...seems like something simple enough
to change without writing code to set up a POST. (I've tried, but have
not had any success, unfortunately). Can I "modify" the "form"? (how
would I go about modifying br.form, anyways?)

...

I found a website that explains how to set up a POST in Python using
urllib2, below:
http://love-python.blogspot.com/2008/04/get-content-html-source-of-url-by-http.html

It structures the post as follows:

url = 'http://www.example.com'

values = {'key1' : 'value1',
'key2' : 'value2',
'key3' : 'value3',
}

try:
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
print the_page

except Exception, detail:
print "Err ", detail


Is this the best way to set up a POST? (by configuring a dictionary
with all the values that are required for the post?) I believe the
last time I tried doing this the server returned a 501 error.

Any thoughts *greatly* appreciated!

- j
John J Lee
2008-06-26 20:41:10 UTC
Permalink
Post by James
I'm having a bit of a problem and am hoping that someone can assist (I
realize this may be slightly off-topic, but I haven't found a good
place to post this question, so... ;))
Not off topic.

[...]
Post by James
Everything in this form is sent to the server and push.do is the
script that runs in the backend. There is *one* button, however, will
change the action on the form to "delete.do", as can be seen in the
[...]
Post by James
The Python mechanize website states that in this situation the best
thing to do is set up a POST to the web server (since mechanize cannot
interpret javascript).
a) When submitting the form, how do I deal with cookies? I'm unsure
about how to pass the web tool the appropriate cookie (I'm not even
sure the web server wants a cookie in the first place)
The web tool? What's that? If the cookie arrives in an HTTP header, it
should be handled automatically by mechanize. If JS code sets the cookie,
you have to call browser.set_cookie(cookie_string) (see the docstring for
that method).
Post by James
b) Do I have to pass *all* the values listed in the "print br.form"?
If not, then how do I figure out what precisely the server "requires"
from a POST? (TamperData seems to indicate that most of the stuff is
sent to the server on *either* button click, but I'm not sure...is
there a better way to find out?)
Not really. Your options are: experimentation, or reading the JS, or
using a tool that understands JS (e.g. Mozilla or Java HtmlUnit).
Post by James
d) Is POST my only option, or is there a simpler way? I realize the
only thing the javascript snippet is doing is changing the "form
action" from push.do to delete.do...seems like something simple enough
to change without writing code to set up a POST. (I've tried, but have
not had any success, unfortunately). Can I "modify" the "form"? (how
would I go about modifying br.form, anyways?)
Unfortunately that's something that never got properly documented, though
there's a clear and increasing need for it. So the answer is a qualified
"yes". You can mutate the form object and the form's control objects.
Please ask specific questions about things you can't seem to do.

In the specific case you asked about (untested snippet):

import mechanize
br = mechanize.Browser()
br.open(url)
br.select_form(name="some_form")
br.form.action = "delete.do"
br.submit()
Post by James
I found a website that explains how to set up a POST in Python using
http://love-python.blogspot.com/2008/04/get-content-html-source-of-url-by-http.html
[...]
Post by James
Is this the best way to set up a POST? (by configuring a dictionary
with all the values that are required for the post?) I believe the
last time I tried doing this the server returned a 501 error.
That's one way to do it -- see br.form.click_request_data() -- but the
mechanize way would be to mutate the form / controls, as above. The API
docs are in the docstrings (yeah, there should be HTML docs generated from
those). But as I say, I think the API docs are sadly incomplete here,
which probably means tests are also incomplete, which probably means this
functionality is incomplete / broken in places.


John

Loading...