pyquery

required import
from pyquery import PyQuery as pq

read document

open document from file
document = pq(filename='/path/to/file')
open document from url
document = pq(url='[url]')
open document from raw html
document = pq('<html><h1>header</h1></html>')

extract data

query css selector
p = document("[selector]")
get outer html
print(p.outerHtml())
get inner html
print(p.html())
get contained text
print(p.text())

documentation

https://pyquery.readthedocs.io/en/latest/api.html#pyquery.pyquery.PyQuery