Extracting usable text from webpages and documents is more challenging than it appears. HTML pages blend meaningful content with layout elements, scripts, ads, and styling rules, while PDF files are structured around printing logic rather than natural text flow. Without a reliable way to interpret these formats, even simple extraction can become inconsistent and difficult to automate.

Reader solves this by rendering the source exactly as a browser would and then extracting the text from the fully resolved document. This approach avoids guesswork and produces stable output across a wide range of real-world websites and files.

The Jina Reader service provides two endpoints, offering different services:

  • r.jina.ai converts any URL into LLM-ready text via https://r.jina.ai/<URL>. It generates cleaner, more reliable input for your agents and RAG systems.
  • s.jina.ai performs live web search for any query (https://s.jina.ai/your+query), allowing your LLMs to retrieve the latest information from across the web.

Together, these endpoints give developers a simple, reliable way to convert complex HTML or PDF sources into clean Markdown or well-structured JSON with a single request, making Reader a practical foundation for any ingestion or processing pipeline.

Quick start

To get a feel for Reader, you can use any web browser directly, without writing any code or configuring an API key.

HTML → Markdown

Open https://www.elastic.co/elasticsearch. You should see the Elasticsearch product page (screenshot as of 17 November 2025).

Now open the same page through Reader by prefixing the URL with r.jina.ai:

https://r.jina.ai/https://www.elastic.co/elasticsearch

Reader will return a plain-text response in Markdown format representing the page content.

Accessing Reader directly in a browser via the r.jina.ai prefix is a convenient way to experiment. Free, unauthenticated use is rate-limited and may be restricted for some URLs. For production or higher-priority workflows, you should use the authenticated API with an API key. You can get a key from the Jina Reader API website and insert it into your request headers, as shown in the next section.

Using the

Convert a page to Markdown

The simplest way to use Reader is to pass a URL to https://r.jina.ai/ and save the output using any tool that can make HTTP requests.

With curl:

Here, the flag and argument -o elasticsearch.md writes the result to a local file named elasticsearch.md.

With wget:

The argument -O elasticsearch.md writes the result to the named file.

With Python:
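
A minimal sketch of this request using only Python's standard library. The helper names reader_url, fetch_markdown, and main are illustrative assumptions, and the 30-second timeout is an arbitrary choice, not a value recommended by Reader.

```python
import urllib.request

READER_PREFIX = "https://r.jina.ai/"


def reader_url(target_url: str) -> str:
    """Prefix a page URL with the Reader endpoint."""
    return READER_PREFIX + target_url


def fetch_markdown(target_url: str, timeout: float = 30.0) -> str:
    """Fetch the Reader rendering of target_url and return it as text."""
    with urllib.request.urlopen(reader_url(target_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


def main() -> None:
    # Needs network access; unauthenticated use is rate-limited.
    print(fetch_markdown("https://www.elastic.co/elasticsearch"))
```

Calling main() performs the request, so it needs network access.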

This code prints the result to the screen.

We also offer an interactive interface that lets you experiment with different parameter settings and automatically generate sample code for other programming languages.

Authenticated usage and output modes

For higher-priority access and fewer limitations, include your Jina API key in the Authorization header (replace YOUR_API_KEY in the examples below with your key) and, optionally, choose an explicit output mode via X-Return-Format.

With curl:

With wget:

With Python:
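
A sketch of the authenticated request with Python's standard library. The function names are hypothetical, and "markdown" as a value for X-Return-Format is an assumption that should be checked against the Reader API documentation.

```python
import urllib.request
from typing import Optional

READER_PREFIX = "https://r.jina.ai/"


def build_reader_request(target_url: str, api_key: str,
                         return_format: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated Reader request."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if return_format is not None:
        # Optional explicit output mode, e.g. "markdown" (assumed value).
        headers["X-Return-Format"] = return_format
    return urllib.request.Request(READER_PREFIX + target_url, headers=headers)


def fetch_authenticated(target_url: str, api_key: str,
                        return_format: Optional[str] = None,
                        timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    request = build_reader_request(target_url, api_key, return_format)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect the headers before any network traffic happens. For example, fetch_authenticated("https://www.elastic.co/elasticsearch", "YOUR_API_KEY", "markdown") would send the request with both headers set.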

JSON mode

In production, you may prefer to wrap your Reader output in a JSON object so that it’s easier to parse. To enable JSON mode on r.jina.ai, add the header Accept: application/json. For example, in Python:
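
A sketch of JSON mode with Python's standard library. The function names are hypothetical, and the assumption that the page fields sit under a top-level "data" object should be verified against a real response; the parser falls back to the whole payload if that key is absent.

```python
import json
import urllib.request
from typing import Any, Optional


def build_json_request(target_url: str,
                       api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a Reader request that asks for a JSON response."""
    headers = {"Accept": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request("https://r.jina.ai/" + target_url, headers=headers)


def parse_reader_json(body: str) -> Any:
    """Decode the response body and unwrap the page object if present."""
    payload = json.loads(body)
    # Assumption: page fields are nested under "data"; otherwise return as-is.
    if isinstance(payload, dict) and isinstance(payload.get("data"), dict):
        return payload["data"]
    return payload


def fetch_json(target_url: str, api_key: Optional[str] = None,
               timeout: float = 30.0) -> Any:
    """Send the request (needs network access) and return the parsed result."""
    request = build_json_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_reader_json(response.read().decode("utf-8"))
```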

For curl, use the flag -H "Authorization: Bearer <YOUR_API_KEY>", and for wget, use --header="Authorization: Bearer <YOUR_API_KEY>", as in the examples above, to add the header to your request. The core of the resulting JSON output will look like this:

For more information on the services available via r.jina.ai, visit the Reader API documentation page.

Search with s.jina.ai

Reader also performs web search through the https://s.jina.ai API: you give it a query, and it processes the results from a search engine. Its job is to standardize SERPs (search engine results pages), the pages of results you get when you search on Google or another search engine. A SERP typically contains a set of links with snippets, preview text, and small metadata fields, but these are difficult to extract directly because the page often includes a variety of other material, including ads and links to optional services or other products.

To use it, just retrieve the following URL, with your own query terms:

Replace any spaces between search terms with “+”, as in the example below:
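
In Python, the standard library can do this encoding for you. The sketch below assumes the https://s.jina.ai/your+query URL form described above; search_url is a hypothetical helper name.

```python
from urllib.parse import quote_plus

SEARCH_PREFIX = "https://s.jina.ai/"


def search_url(query: str) -> str:
    """Encode a free-text query and append it to the search endpoint."""
    # quote_plus turns spaces into "+" and percent-encodes reserved
    # characters, so a literal "+" in the query becomes "%2B".
    return SEARCH_PREFIX + quote_plus(query)
```

For example, search_url("elasticsearch vector search") returns https://s.jina.ai/elasticsearch+vector+search.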

The Reader API will perform your search on Google and process the result into a structured Markdown page that includes, for the top 10 matches, the page title, URL, and the description or summary provided by the underlying search engine. By default, it will also try to fetch each webpage and convert it to Markdown, using the same model as the HTML/PDF-to-Markdown converter described above. That request may time out or fail for other reasons, so the page content might be absent.

As with all Reader API services, if you use your Jina API key, you will get higher priority, and therefore faster results, with fewer limits and less risk of your request being dropped because of traffic.

As a concrete example:

This returns Markdown-formatted text that looks like this:

You can do search-only, without attempting to retrieve the underlying page, by adding the header X-Respond-With: no-content. For example:
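
A sketch of a search-only request in Python, using only the standard library. The function names and the include_content parameter are hypothetical; the header name and value are the ones given above.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote_plus


def build_search_request(query: str, api_key: Optional[str] = None,
                         include_content: bool = True) -> urllib.request.Request:
    """Build a search request, optionally skipping page retrieval."""
    headers = {}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    if not include_content:
        # Ask for titles, URLs, and descriptions only.
        headers["X-Respond-With"] = "no-content"
    return urllib.request.Request("https://s.jina.ai/" + quote_plus(query),
                                  headers=headers)


def search(query: str, api_key: Optional[str] = None,
           include_content: bool = True, timeout: float = 60.0) -> str:
    """Send the search (needs network access) and return the Markdown text."""
    request = build_search_request(query, api_key, include_content)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```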

This will return Markdown-formatted text that starts like this:

Reader also provides a structured extraction mode that returns JSON-structured data if you add the header Accept: application/json. To perform the same search as above, for example:
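
A sketch of the JSON variant in Python. The function names are hypothetical, and the field layout (a list of results under "data", each with "title", "url", and "description") is an assumption to verify against a real response; missing fields default to empty strings.

```python
import json
import urllib.request
from typing import Dict, List, Optional
from urllib.parse import quote_plus


def build_json_search_request(query: str,
                              api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a search request that asks for a JSON response."""
    headers = {"Accept": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request("https://s.jina.ai/" + quote_plus(query),
                                  headers=headers)


def summarize_results(body: str) -> List[Dict[str, str]]:
    """Reduce a JSON search response to title, URL, and description."""
    payload = json.loads(body)
    # Assumption: results are a list under "data" (or the payload itself).
    results = payload.get("data", []) if isinstance(payload, dict) else payload
    if not isinstance(results, list):
        return []
    return [
        {key: str(item.get(key, "")) for key in ("title", "url", "description")}
        for item in results
        if isinstance(item, dict)
    ]
```

Keeping only these three fields gives a compact list that is easy to pass to an LLM or to store, while the full response remains available if you need the page content or token usage.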

This will return the same information in a JSON format that is readily accessible to other applications, in addition to some of the page metadata and a summary of token usage. In the example output below, a lot has been cut (replaced with <...>) for readability and space:

There are more options for this API. For fuller information, including optional arguments and their corresponding headers, visit the Reader API page.

Reader and Elasticsearch

Reader API is a handy tool for doing what, on the surface, seems like a simple job, but isn’t. The web is complicated enough that you really need AI to get the most out of it. Extracting data from web pages and PDFs is a messy job for a programmer, but one that suits AI language models very well.

Reader standardizes on JSON and Markdown because they are the most useful and widely supported data formats that are still simple and human-accessible.

You can put Markdown directly into your Elasticsearch document store and index using any of the methods available via the Elastic Document API. The best strategy is generally to fetch documents with Reader first and then index them in bulk, or in the largest batches your resources allow. You can combine Reader output with any text storage or indexing strategy.
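
As a sketch of bulk indexing, the code below builds a newline-delimited JSON body for the Elasticsearch _bulk endpoint from documents already fetched with Reader. The index name "web-pages", the field names "url" and "content", and the unsecured local cluster at http://localhost:9200 are all assumptions for illustration; a real deployment will typically need HTTPS and credentials.

```python
import json
import urllib.request
from typing import Dict, Iterable


def build_bulk_body(index_name: str, docs: Iterable[Dict]) -> str:
    """Build an NDJSON bulk body: one action line, then one source line, per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # json.dumps escapes newlines in Markdown, so each doc stays on one line.
        lines.append(json.dumps(doc))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


def send_bulk(body: str, bulk_url: str = "http://localhost:9200/_bulk",
              timeout: float = 30.0) -> Dict:
    """POST the bulk body (needs a reachable cluster) and return the JSON reply."""
    request = urllib.request.Request(
        bulk_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical flow would collect several Reader results into a list such as [{"url": page_url, "content": markdown_text}, ...], build one body for the whole batch, and send it in a single request.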

Elasticsearch supports JSON directly in storage, indexing, and retrieval. JSON is open to human inspection, transmissible as plain text, and has reliable support in all major programming languages and frameworks, as well as built-in support in browsers.

Reader makes your Elastic installation more useful by bringing AI to the kind of messy data the real world produces.
