Frequently asked questions
==========================

What is character encoding?
---------------------------

When you think of “text”, you probably think of “characters and symbols I see on my computer screen”. But computers don’t deal in characters and symbols; they deal in bits and bytes. Every piece of text you’ve ever seen on a computer screen is actually stored in a particular *character encoding*. There are many different character encodings, some optimized for particular languages like Russian or Chinese or English, and others that can be used for multiple languages. Very roughly speaking, the character encoding provides a mapping between the stuff you see on your screen and the stuff your computer actually stores in memory and on disk.

In reality, it’s more complicated than that. Many characters are common to multiple encodings, but each encoding may use a different sequence of bytes to actually store those characters in memory or on disk. So you can think of the character encoding as a kind of decryption key for the text. Whenever someone gives you a sequence of bytes and claims it’s “text”, you need to know what character encoding they used so you can decode the bytes into characters and display them (or process them, or whatever).

What is character encoding auto-detection?
------------------------------------------

It means taking a sequence of bytes in an unknown character encoding, and attempting to determine the encoding so you can read the text. It’s like cracking a code when you don’t have the decryption key.

Isn’t that impossible?
----------------------

In general, yes. However, some encodings are optimized for specific languages, and languages are not random. Some character sequences pop up all the time, while other sequences make no sense.
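This statistical regularity is exactly what this library exploits. A minimal, hedged usage sketch (the sample text is arbitrary, and the detected encoding and confidence values are illustrative, not guaranteed):

```python
import chardet

# Russian text stored in a legacy single-byte encoding.
raw = "Съешь же ещё этих мягких французских булок".encode("koi8-r")

result = chardet.detect(raw)
# result is a dict with 'encoding' and 'confidence' keys,
# e.g. {'encoding': 'KOI8-R', 'confidence': 0.99, ...}

# detect() can return None for the encoding when it has no guess,
# so check before decoding.
if result["encoding"] is not None:
    text = raw.decode(result["encoding"])
```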
A person fluent in English who opens a newspaper and finds “txzqJv 2!dasd0a QqdKjvz” will instantly recognize that that isn’t English (even though it is composed entirely of English letters). By studying lots of “typical” text, a computer algorithm can simulate this kind of fluency and make an educated guess about a text’s language.

In other words, encoding detection is really language detection, combined with knowledge of which languages tend to use which character encodings.

Who wrote this detection algorithm?
-----------------------------------

This library is a port of `the auto-detection code in Mozilla <https://www-archive.mozilla.org/projects/intl/chardet.html>`__. I have attempted to maintain as much of the original structure as possible (mostly for selfish reasons, to make it easier to maintain the port as the original code evolves). I have also retained the original authors’ comments, which are quite extensive and informative.

You may also be interested in the research paper which led to the Mozilla implementation, `A composite approach to language/encoding detection <http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html>`__.

Yippie! Screw the standards, I’ll just auto-detect everything!
--------------------------------------------------------------

Don’t do that. Virtually every format and protocol contains a method for specifying character encoding.

- HTTP can define a ``charset`` parameter in the ``Content-type`` header.
- HTML documents can define a ``<meta http-equiv="content-type">`` element in the ``<head>`` of a web page.
- XML documents can define an ``encoding`` attribute in the XML prolog.

If text comes with explicit character encoding information, you should use it. If the text has no explicit information, but the relevant standard defines a default encoding, you should use that. (This is harder than it sounds, because standards can overlap.
If you fetch an XML document over HTTP, you need to support both standards *and* figure out which one wins if they give you conflicting information.)

Despite the complexity, it’s worthwhile to follow standards and `respect explicit character encoding information <http://www.w3.org/2001/tag/doc/mime-respect>`__. It will almost certainly be faster and more accurate than trying to auto-detect the encoding. It will also make the world a better place, since your program will interoperate with other programs that follow the same standards.

Why bother with auto-detection if it’s slow, inaccurate, and non-standard?
--------------------------------------------------------------------------

Sometimes you receive text with verifiably inaccurate encoding information. Or text without any encoding information, and the specified default encoding doesn’t work. There are also some poorly designed standards that have no way to specify encoding at all.

If following the relevant standards gets you nowhere, *and* you decide that processing the text is more important than maintaining interoperability, then you can try to auto-detect the character encoding as a last resort. An example is my `Universal Feed Parser <https://pythonhosted.org/feedparser/>`__, which calls this auto-detection library `only after exhausting all other options <https://pythonhosted.org/feedparser/character-encoding.html>`__.

.. _faq:

Frequently Asked Questions
==========================

This part of the documentation answers common questions about Requests.

Encoded Data?
-------------

Requests automatically decompresses gzip-encoded responses, and does its best to decode response content to unicode when possible. When either the `brotli <https://pypi.org/project/Brotli/>`_ or `brotlicffi <https://pypi.org/project/brotlicffi/>`_ package is installed, requests also decodes Brotli-encoded responses.
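What “decompresses gzip-encoded responses” means can be sketched with the standard library alone; Requests performs the equivalent of this transparently whenever a server sends ``Content-Encoding: gzip`` (the payload below is a stand-in, not a real HTTP response):

```python
import gzip

payload = "héllo, wörld".encode("utf-8")  # text → bytes in a chosen encoding
wire = gzip.compress(payload)             # what a gzip-encoding server would put on the wire

# Requests undoes both steps for you: the response's content is the
# decompressed bytes, and its text is those bytes decoded to unicode.
assert gzip.decompress(wire).decode("utf-8") == "héllo, wörld"
```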
You can get direct access to the raw response (and even the socket), if needed as well.

Custom User-Agents?
-------------------

Requests allows you to easily override User-Agent strings, along with any other HTTP Header. See :ref:`documentation about headers <custom-headers>`.

Why not Httplib2?
-----------------

Chris Adams gave an excellent summary on `Hacker News <http://news.ycombinator.com/item?id=2884406>`_:

    httplib2 is part of why you should use requests: it's far more respectable as a client but not as well documented and it still takes way too much code for basic operations. I appreciate what httplib2 is trying to do, that there's a ton of hard low-level annoyances in building a modern HTTP client, but really, just use requests instead. Kenneth Reitz is very motivated and he gets the degree to which simple things should be simple whereas httplib2 feels more like an academic exercise than something people should use to build production systems [1].

    Disclosure: I'm listed in the requests AUTHORS file but can claim credit for, oh, about 0.0001% of the awesomeness.

    1. http://code.google.com/p/httplib2/issues/detail?id=96 is a good example: an annoying bug which affected many people, there was a fix available for months, which worked great when I applied it in a fork and pounded a couple TB of data through it, but it took over a year to make it into trunk and even longer to make it onto PyPI where any other project which required "httplib2" would get the working version.

Python 3 Support?
-----------------

Yes! Requests officially supports Python 3.8+ and PyPy.

Python 2 Support?
-----------------

No! As of Requests 2.28.0, Requests no longer supports Python 2.7. Users who have been unable to migrate should pin to ``requests<2.28``. Full information can be found in `psf/requests#6023 <https://github.com/psf/requests/issues/6023>`_.
It is *highly* recommended that users migrate to Python 3.8+ now, since Python 2.7 has not received bug fixes or security updates since January 1, 2020.

What are "hostname doesn't match" errors?
-----------------------------------------

These errors occur when :ref:`SSL certificate verification <verification>` fails to match the certificate the server responds with to the hostname Requests thinks it's contacting. If you're certain the server's SSL setup is correct (for example, because you can visit the site with your browser) and you're using Python 2.7, a possible explanation is that you need Server-Name-Indication.

`Server-Name-Indication`_, or SNI, is an official extension to SSL where the client tells the server what hostname it is contacting. This is important when servers are using `Virtual Hosting`_. When such servers host more than one SSL site, they need to be able to return the appropriate certificate based on the hostname the client is connecting to.

Python 3 already includes native support for SNI in its SSL modules.

.. _`Server-Name-Indication`: https://en.wikipedia.org/wiki/Server_Name_Indication
.. _`virtual hosting`: https://en.wikipedia.org/wiki/Virtual_hosting
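If you need to confirm whether your interpreter's SSL stack supports SNI, the standard library exposes a flag for it directly (a quick diagnostic sketch; on any supported Python 3 with a modern OpenSSL this is expected to be ``True``):

```python
import ssl

# True when the underlying OpenSSL build supports Server Name Indication.
print(ssl.HAS_SNI)
```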